Revision 14 - 2023-09-20 - JeffreyBarrick

Changed:

<
<

META TOPICPARENT	name="ToolList"

>
>

META TOPICPARENT	name="SoftwareList"

Warning!
These instructions and the associated scripts have not been tested with recent versions of R, Perl, MATLAB, etc. The procedure may need updates to work.

Marker Divergence Experiments

This workflow implements a method for extracting effective beneficial mutation rates (μ) and selection coefficients (s) from marker divergence experiments [1]. This is a way of parameterizing the evolvability of a bacterial strain.


Example Data

Ratios	Counts	Reference
EW1.tab EW2.tab EL1.tab EL2.tab	EW_EL_counts.xlsx	Woods, R.J., Barrick, J.E., Cooper, T.F., Shrestha, U., Kauth, M.R., Lenski, R.E. (2011) Second-order selection for evolvability in a large Escherichia coli population. Science 331:1433-1436.

The Counts files have raw colony counts for each of the two types in the population (e.g., red and white for Ara^– marker).
The Ratios files have the fractions of one type within the population (e.g., red divided by red plus white). They are used as input into the procedure below.
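
As a minimal sketch of how a Ratios value relates to the Counts data, the following Python snippet computes the fraction of one marker type from raw colony counts. The function name and the example counts are hypothetical; they are not taken from EW_EL_counts.xlsx.

```python
def marker_fraction(red: int, white: int) -> float:
    """Return the fraction of red colonies: red / (red + white)."""
    total = red + white
    if total == 0:
        raise ValueError("no colonies counted")
    return red / total


# Hypothetical counts from one plate, for illustration only.
print(f"{marker_fraction(203, 197):.4f}")
```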

Requirements and Installation

The current version of the md package is available from our public Mercurial repository:

Download Current Source Archive

Browse Mercurial Repository

The Perl scripts require the module Math::Random::MT::Auto and its prerequisites for random number generation. They can be installed from CPAN. The Math::Random::MT::Auto module has a code component that must be compiled. If you have root access on a system you can probably install these from the command line as follows:


 >sudo perl -MCPAN -e shell

 >Password: ********

 >install Math::Random::MT::Auto

  answer yes to any prompts about installing prerequisites

MATLAB is required for calculating establishment probabilities.

R is required for fitting marker divergence curves. It should be present in your $PATH, so that the Perl scripts can invoke it. You must also have the R package nortest installed for the Lilliefors test employed by the curve fitting script. This can be done from the command line within R by issuing the following command:


 >install.packages("nortest")
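
Before running the scripts, it can help to confirm that the required programs are visible on your $PATH. The sketch below uses Python's standard shutil.which for this; the helper name find_executables is hypothetical, and the check says nothing about whether the installed versions are compatible.

```python
import shutil


def find_executables(names):
    """Map each program name to its full path on $PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


for name, path in find_executables(["perl", "R"]).items():
    print(f"{name}: {path if path else 'NOT FOUND on $PATH'}")
```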

You may want to add the location of the Perl scripts to your $PATH. If your Perl executable is not located at the path below, change the first line of each script to the correct path:


 #!/usr/bin/perl

Help for each individual Perl script can be obtained as follows:


 >perldoc <script>

1. Fit α and τ empirical parameters from experimental data

The basic command is:


 >marker_divergence_fit.pl -i input.tab -o output.fit -p output.fit.pdf

output.fit is a tab-delimited file containing information about each curve fit. output.fit.pdf shows plots of each fit, so that you can judge whether they accurately reflect the data.

Input file data format

The input file is tab-delimited. The header row begins with "transfer", and each remaining column header is a label naming one experimental time series of marker ratio measurements. Each following row begins with the transfer number, followed by the marker ratio measurement for each series at that time point. Marker ratios may be given in several formats: pass the -m option followed by ratio, fraction, log_ratio, or log10_ratio, depending on the format of your data values. The default mode is ratio.

Portion of an example marker ratio input file:


transfer	exp-1	exp-2	exp-3
0	0.5087	0.5068	0.4990
3	0.5000	0.4844	0.5174
6	0.4853	0.5393	0.5115
9	0.4802	0.4862	0.4522
12	0.4884	0.4431	0.5170
15	0.5277	0.5196	0.5266
18	0.4983	0.4638	0.4607
21	0.5221	0.5361	0.5000
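
The layout described above can be sketched in Python as follows. This is only an illustration of the file format, not part of the md package: the function names are hypothetical, the values are treated as fractions A/(A+B), and the conversion to a natural-log ratio is an assumption about what the log_ratio mode means. Check perldoc marker_divergence_fit.pl for the definitions the script actually uses.

```python
import csv
import io
import math

EXAMPLE = (
    "transfer\texp-1\texp-2\texp-3\n"
    "0\t0.5087\t0.5068\t0.4990\n"
    "3\t0.5000\t0.4844\t0.5174\n"
)


def read_marker_table(handle):
    """Parse a tab-delimited marker table into {series: [(transfer, value), ...]}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if header[0] != "transfer":
        raise ValueError('header row must begin with "transfer"')
    series = {label: [] for label in header[1:]}
    for row in reader:
        if not row:
            continue  # skip blank lines
        transfer = int(row[0])
        for label, cell in zip(header[1:], row[1:]):
            series[label].append((transfer, float(cell)))
    return series


def fraction_to_log_ratio(fraction):
    """Assumed conversion: fraction f = A/(A+B) -> natural log of A/B."""
    return math.log(fraction / (1.0 - fraction))


table = read_marker_table(io.StringIO(EXAMPLE))
for label, points in table.items():
    print(label, [(t, round(fraction_to_log_ratio(f), 4)) for t, f in points])
```

Under this assumed definition, a fraction of exactly 0.5 maps to a log ratio of 0, so a balanced starting population sits on the zero line.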

The output file is tab-delimited, with columns containing data as labeled.

Baseline correction

If some of your experimental curves do not start at a 1:1 ratio of the neutral marker states, also pass the -b option followed by the number of initial points (not transfers) to fit as a baseline. The script corrects for the initial marker imbalance by fitting τ and α to a modified equation. The modification accounts for the fact that, in a population diverging toward marker state A, the marker ratio shifts sooner when A initially made up less than 50% of the population than when it made up exactly 50%.

Example:


 >marker_divergence_fit.pl -m log_ratio -b 5 -i input.tab -o output.fit -p output.fit.pdf

With -b 5, the script corrects for the baseline by taking the average of the first 5 points.
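
To make the baseline idea concrete, the sketch below averages the first few points of one series from the example input file. It illustrates only the averaging step; it does not reproduce the modified equation that the script fits, and the function name is hypothetical.

```python
from statistics import mean


def baseline(values, n_points):
    """Average the first n_points measurements of one time series."""
    if n_points < 1 or n_points > len(values):
        raise ValueError("n_points must be between 1 and len(values)")
    return mean(values[:n_points])


# exp-1 column from the example input file above.
exp1 = [0.5087, 0.5000, 0.4853, 0.4802, 0.4884, 0.5277, 0.4983, 0.5221]
print(round(baseline(exp1, 5), 4))
```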

2. Generate a table of establishment probabilities with MATLAB

The population genetics model assumes that each new beneficial mutation that is generated has a certain probability of establishing (escaping loss due to dilution) during the serial transfer regime of the experiment. A table of these probabilities must be calculated with the MATLAB script.

First, add the directory containing the two ".m" files that come with the distribution to the MATLAB path.

In MATLAB:


 >>establishment_probability_table(6.64, 5E6, 0.001, 1, 'pr_establishment_T_6.64_No_5E6.tab')

The arguments are the number of generations per transfer, the initial population size immediately after each transfer, the precision of the file to be generated, the maximum selection coefficient to consider, and the output filename. Output is a tab-delimited list of selection coefficient and probability of establishment when a new mutant has this advantage relative to the population average [2].

It is important to set the maximum selection coefficient several-fold greater than the expected effective selection coefficient (s), because multiple mutations may occur that give a large benefit relative to the population average. Reasonable values are typically 0.0001 to 0.001 for the precision and 1 to 5 for the maximum.

A file (pr_establishment_T_6.64_N_5E6_LT.tab) is provided with the distribution that can be used for experiments conducted under the conditions of the long-term E. coli evolution experiment.
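
The value 6.64 used for the generations per transfer is consistent with a 100-fold dilution at each transfer, since log2(100) is about 6.64. The sketch below shows this arithmetic, assuming growth by binary fission and regrowth to the same final density in every cycle; the function names are hypothetical.

```python
import math


def generations_per_transfer(dilution_factor: float) -> float:
    """Doublings needed to regrow after a dilution (assumes binary fission)."""
    return math.log2(dilution_factor)


def final_population_size(n0: float, generations: float) -> float:
    """Population size at the end of a growth cycle that starts from n0 cells."""
    return n0 * 2.0 ** generations


T = generations_per_transfer(100)      # 100-fold dilution at each transfer
print(round(T, 2))                     # 6.64
print(final_population_size(5e6, T))   # about 5e8 cells before the next transfer
```

For a different dilution regime, recompute the first two arguments of establishment_probability_table in the same way.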

3. Simulate and fit idealized marker divergence curves

The next step is to simulate marker divergence curves generated by a simplified population genetics model in which there is only one category of beneficial mutation, with selection coefficient s. These beneficial mutations occur at rate μ. Selection coefficients are defined such that w_new = w_initial (1 + s). The model takes into account the population bottlenecks that occur during a serial transfer evolution experiment.
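
As a small worked example of this fitness definition, the sketch below applies w_new = w_initial (1 + s) once per mutation and reports the resulting advantage over a reference fitness. Stacking mutations multiplicatively follows from applying the definition repeatedly; the function names and the choice of two mutations are illustrative assumptions.

```python
def fitness_after_mutations(w_initial: float, s: float, k: int) -> float:
    """Apply w_new = w_initial * (1 + s) once per mutation, k times."""
    w = w_initial
    for _ in range(k):
        w *= 1.0 + s
    return w


def advantage_over_mean(w_mutant: float, w_mean: float) -> float:
    """Selection coefficient of a mutant measured against the population mean."""
    return w_mutant / w_mean - 1.0


w2 = fitness_after_mutations(1.0, 0.08, 2)
print(round(w2, 4))                            # 1.1664
print(round(advantage_over_mean(w2, 1.0), 4))  # 0.1664
```

A double mutant with s = 0.08 therefore has roughly twice the advantage of a single mutant over the ancestor, which is why the establishment probability table should extend well beyond the expected value of s.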

For each combination of s and μ, a distribution of the effective parameters α and τ is determined from the idealized data. Comparing the effective parameters extracted from the experimental data to all of these distributions allows the maximum likelihood values of s and μ that best explain the experimental data to be determined.

3.1 Simulate marker divergence data for a set of μ and s effective beneficial mutation parameters


 >marker_divergence_pop_gen_simulation.pl -T 6.64 -N 5E6 -u 1E-8 -s 0.08 -p pr_establishment_T_6.64_No_5E6.tab -k 22 -i 3 -r 100 -o pop_gen_s_0.08_u_1E-8.tab

The options are: the generations per transfer (-T), the initial population size at the beginning of each growth cycle (-N), the per-generation mutation rate (-u), the per-generation selection coefficient (-s), the file of establishment probabilities produced by MATLAB (-p), the number of generations during outgrowth, before any data are printed (-k), the interval in transfers at which the marker ratio is printed (-i), the number of simulation replicates to perform (-r), and the output file of simulated marker ratios (-o).

3.2 Fit α and τ empirical parameters from simulated curves

This step is the same as that used to fit the experimental data.


 >marker_divergence_fit.pl -m log_ratio -i pop_gen_s_0.08_u_1E-8.tab -o pop_gen_s_0.08_u_1E-8.fit

3.3 Automating and parallelizing this step

Generally, many combinations of μ and s must be calculated to determine the maximum likelihood effective parameters. This script combines the two previous steps to serially create marker ratio and fit files over a range of these parameters.


 >marker_divergence_background_model.pl -T 6.64 -N 5E6 -u -8:-6:0.5 -s 0.06:0.2:0.02 -p pr_establishment_T_6.64_No_5E6.tab -i 1  -r 100 -k 22

Parameters are the same as in marker_divergence_pop_gen_simulation.pl, except -u and -s are supplied as start:end:step_size combinations, and -u is in log10 units, i.e. passing a value of -8 gives a mutation rate of 10^-8.
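
To see which parameter combinations a start:end:step_size specification implies, the following sketch expands the two ranges from the example command. It assumes that the end value is included, which may not match how the Perl script treats the upper bound; the function name is hypothetical.

```python
import math


def expand_range(spec: str) -> list:
    """Expand 'start:end:step' into a list of values, end assumed inclusive."""
    start, end, step = (float(x) for x in spec.split(":"))
    if step <= 0:
        raise ValueError("step must be positive")
    # Small tolerance so floating-point error does not drop the last value.
    count = int(math.floor((end - start) / step + 1e-9)) + 1
    return [round(start + i * step, 10) for i in range(count)]


log10_mu = expand_range("-8:-6:0.5")       # exponents, as passed to -u
mu_values = [10.0 ** e for e in log10_mu]  # mutation rates per generation
s_values = expand_range("0.06:0.2:0.02")   # selection coefficients for -s

print(log10_mu)
print(s_values)
print(len(mu_values) * len(s_values), "parameter combinations")
```

Under the inclusive-end assumption, the example command covers 5 mutation rates and 8 selection coefficients, so 40 simulation and fit jobs could be distributed across a cluster.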

This procedure can be farmed out to a computer cluster. Consolidate all of the output files into a single directory before proceeding to the next step. (It's okay if they are scattered across subdirectories, as long as they are the only files ending in .fit.)
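
One way to gather the fit files produced on different cluster nodes is sketched below with Python's standard library. The directory names and the helper consolidate_fit_files are hypothetical, and the demonstration runs on throwaway placeholder files in a temporary directory rather than on real output.

```python
import shutil
import tempfile
from pathlib import Path


def consolidate_fit_files(source_root: Path, target_dir: Path) -> list:
    """Copy every *.fit file found under source_root into target_dir."""
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for fit_file in sorted(source_root.rglob("*.fit")):
        destination = target_dir / fit_file.name
        if destination.exists():
            raise FileExistsError(f"name collision: {fit_file.name}")
        shutil.copy2(fit_file, destination)
        copied.append(destination)
    return copied


# Self-contained demonstration with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "cluster_output"
    for node in ("node1", "node2"):
        (root / node).mkdir(parents=True)
        (root / node / f"pop_gen_{node}.fit").write_text("placeholder\n")
    collected = consolidate_fit_files(root, Path(tmp) / "fits")
    print([p.name for p in collected])
```

The sketch refuses to overwrite files with the same name, so fit files from different parameter combinations need distinct names. Keep the target directory outside the source tree so copied files are not picked up again.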

4. Determine the maximum likelihood effective parameters

Finally, we determine what values of μ and s produce α and τ distributions in the simulated data that are statistically indistinguishable from those fit from the experimental data.


 >marker_divergence_significance.pl -i experimental.fit -d path/to/simulation/fits -o experimental.sig -p experimental.sig.pdf

The output file experimental.sig has starred lines where the experimental data and simulations agree. Since α and τ are not independent, this represents a greater than 95% confidence contour. The output PDF file should have a black square, representing the best parameter combination, and a blue region, indicating the >95% confidence contour. (If the plot is all blue, then none of your simulated data agreed with the experimental data.)

Note that you can now pass the -n flag, and the program will use a 2-dimensional version of the KS-test to determine the 95% confidence contour.
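
For intuition about how a KS-type test compares two distributions, the sketch below computes the ordinary one-dimensional two-sample Kolmogorov-Smirnov statistic in plain Python. It is not the 2-dimensional test selected by -n and is not taken from marker_divergence_significance.pl; the function name and the τ values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    for x in a + b:  # the gap can only change at observed values
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical τ values, for illustration only.
experimental_tau = [10.2, 11.5, 9.8, 12.1]
simulated_tau = [10.0, 10.9, 11.7, 12.4, 9.5, 11.1]
print(round(ks_statistic(experimental_tau, simulated_tau), 3))
```

A statistic near 0 means the two samples have similar empirical distributions, while a value of 1 means they do not overlap at all.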

References

[1] Hegreness, M., Shoresh, N., Hartl, D., and Kishony, R. (2006) An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311:1615-1617.
[2] Wahl, L.M., and Gerrish, P.J. (2001) The probability that beneficial mutations are lost in populations with periodic bottlenecks. Evolution 55:2606-2610.

Acknowledgments

Many thanks to Noam Shoresh for extensive discussions that made it possible for me to reproduce his methods and for providing his raw data to check the results from these tools.


Revision 13 - 2014-11-20 - JeffreyBarrick

META TOPICPARENT	name="ToolList"

Warning!
These instructions and the associated scripts have not been tested with more modern versions of R / Perl / Matlab / etc. The procedure may require updates to function.

Marker Divergence Experiments

This workflow implements a method for extracting effective beneficial mutation rates (μ) and selection coefficients (s) from marker divergence experiments [1]. This is a way of parameterizing the evolvability of a bacterial strain.


Example Data

Ratios	Counts	Reference

Changed:

<
<

[[][EW1.tab]] [[][EW2.tab]] [[][EL1.tab]] [[][EL2.tab]]

[[][counts.xls]]

Woods, R.J.*, Barrick, J.E.*, Cooper, T.F., Shrestha, U., Kauth, M.R., Lenski, R.E. (2011) Second-order selection for evolvability in a large Escherichia coli population. Science 331:1433-1436.

>
>

EW1.tab EW2.tab
EL1.tab EL2.tab

EW_EL_counts.xlsx

Woods, R.J.*, Barrick, J.E.*, Cooper, T.F., Shrestha, U., Kauth, M.R., Lenski, R.E. (2011) Second-order selection for evolvability in a large Escherichia coli population. Science 331:1433-1436.

Changed:

<
<

The counts.xls (Counts) files have raw colony counts for each of the two types in the population (e.g., red and white for Ara^– marker).
The input.tab (Ratios) files have the fractions of one type within the population (e.g., red divided by red plus white). They are used as input into the procedure below.

>
>

The Counts files have raw colony counts for each of the two types in the population (e.g., red and white for Ara^– marker).
The Ratios files have the fractions of one type within the population (e.g., red divided by red plus white). They are used as input into the procedure below.


Added:

>
>

META FILEATTACHMENT	attachment="EW1.tab" attr="" comment="" date="1416519449" name="EW1.tab" path="EW1.tab" size="2773" stream="EW1.tab" tmpFilename="/usr/tmp/CGItemp48156" user="JeffreyBarrick" version="1"
META FILEATTACHMENT	attachment="EW2.tab" attr="" comment="" date="1416519503" name="EW2.tab" path="EW2.tab" size="2789" stream="EW2.tab" tmpFilename="/usr/tmp/CGItemp48263" user="JeffreyBarrick" version="1"
META FILEATTACHMENT	attachment="EL1.tab" attr="" comment="" date="1416519547" name="EL1.tab" path="EL1.tab" size="3339" stream="EL1.tab" tmpFilename="/usr/tmp/CGItemp48192" user="JeffreyBarrick" version="1"
META FILEATTACHMENT	attachment="EL2.tab" attr="" comment="" date="1416519579" name="EL2.tab" path="EL2.tab" size="2919" stream="EL2.tab" tmpFilename="/usr/tmp/CGItemp48532" user="JeffreyBarrick" version="1"
META FILEATTACHMENT	attachment="EW_EL_counts.xlsx" attr="" comment="" date="1416519951" name="EW_EL_counts.xlsx" path="EW_EL_counts.xlsx" size="76831" stream="EW_EL_counts.xlsx" tmpFilename="/usr/tmp/CGItemp48533" user="JeffreyBarrick" version="1"

Revision 12 - 2014-11-20 - JeffreyBarrick

META TOPICPARENT	name="ToolList"

Warning!
These instructions and the associated scripts have not been tested with more modern versions of R / Perl / Matlab / etc. The procedure may require updates to function.

Marker Divergence Experiments

This workflow implements a method for extracting effective beneficial mutation rates (μ) and selection coefficients (s) from marker divergence experiments [1]. This is a way of parameterizing the evolvability of a bacterial strain.


Example Data

Ratios	Counts	Reference

Changed:

<
<

[[][input.tab]]

[[][counts.xls]]

Woods, R.J.*, Barrick, J.E.*, Cooper, T.F., Shrestha, U., Kauth, M.R., Lenski, R.E. (2011) Second-order selection for evolvability in a large Escherichia coli population. Science 331:1433-1436.

>
>

[[][EW1.tab]] [[][EW2.tab]] [[][EL1.tab]] [[][EL2.tab]]

[[][counts.xls]]

Woods, R.J.*, Barrick, J.E.*, Cooper, T.F., Shrestha, U., Kauth, M.R., Lenski, R.E. (2011) Second-order selection for evolvability in a large Escherichia coli population. Science 331:1433-1436.

The counts.xls (Counts) files have raw colony counts for each of the two types in the population (e.g., red and white for Ara^– marker).
The input.tab (Ratios) files have the fractions of one type within the population (e.g., red divided by red plus white). They are used as input into the procedure below.

Requirements and Installation

The current version of the md package is available from our public Mercurial repository:

Download Current Source Archive

Browse Mercurial Repository

The Perl scripts require the module Math::Random::MT::Auto and its prerequisites for random number generation. They can be installed from CPAN. The Math::Random::MT::Auto module has a code component that must be compiled. If you have root access on a system you can probably install these from the command line as follows:


 >sudo perl -MCPAN -e shell

 >Password: ********

 >install Math::Random::MT::Auto

  answer yes to any prompts about installing prerequisites

MATLAB is required for calculating establishment probabilities.

R is required for fitting marker divergence curves. It should be present in your $PATH, so that the Perl scripts can invoke it. You must also have the R package nortest installed for the Lilliefors test employed by the curve fitting script. This can be done from the command line within R by issuing the following command:


 >install.packages("nortest")

You may want to add the location of the perl scripts to your $PATH. You may need to change the first line of each script to the correct path to your Perl executable if it is not located at:


 #!/usr/bin/perl

Help for each individual Perl script can be obtained as follows


 >perldoc <script>

1. Fit α and τ empirical parameters from experimental data

The basic command is:


 >marker_divergence_fit.pl -i input.tab -o output.fit -p output.fit.pdf

output.fit is a tab-delimited file containing information about each curve fit. output.fit.pdf shows plots of each fit, so that you can judge whether they accurately reflect the data.

Input file data format

The input file is tab-delimited. The header row begins with "transfer", and the other columns are labels indicating the name of an experimental time series of marker ratio measurements. Each following row begins with the number of the transfer followed by the marker ratio measurements for that series at that time point. Marker ratios may be given in a variety of formats. Pass the -m option followed by ratio, fraction, log_ratio, or log10_ratio to this script depending on the format of you data values. The default mode is ratio.

Portion of an example marker ratio input file:


transfer	exp-1 	exp-2 	exp-3	

0       	0.5087	0.5068	0.4990

3       	0.5000	0.4844	0.5174

6       	0.4853	0.5393	0.5115

9       	0.4802	0.4862	0.4522

12       0.4884	0.4431	0.5170

15       0.5277	0.5196	0.5266

18       0.4983	0.4638	0.4607

21       0.5221	0.5361	0.5000

The output file is tab-delimited, with columns containing data as labeled.

Baseline correction

If some of your experimental curves do not start at a 1:1 ratio of the neutral marker states, you will also want to pass the -b option followed by the number of initial points (not transfers) that you would like to fit as a baseline. The script corrects for the initial marker imbalance by fitting τ and α to a modified equation that accounts for the fact that for a population diverging toward marker state A, where A was initially present in less than 50% of the population, the marker ratio will be shifted sooner than in a population where it was initially present in 50% of the population.

Example:


 >marker_divergence_fit.pl -m log_ratio -i input.tab -o output.fit -p output.fit.pdf

Corrects for the baseline by taking the average of the first 5 points.

2. Generate a table of establishment probabilities with MATLAB.

The population genetics model assumes that each new beneficial mutation that is generated has a certain probability of establishing (escaping loss due to dilution) during the serial transfer regime of the experiment. A table of these probabilities must be calculated with the MATLAB script.

First, add the directory containing the two ".m" files that come with the distribution to the MATLAB path.

In MATLAB:


 >>establishment_probability_table(6.64, 5E6, 0.001, 1, 'pr_establishment_T_6.64_No_5E6.tab')

The arguments are the number of generations per transfer, the initial population size immediately after each transfer, the precision of the file to be generated, the maximum selection coefficient to consider, and the output filename. Output is a tab-delimited list of selection coefficient and probability of establishment when a new mutant has this advantage relative to the population average [2].

It is important to allow a maximum selection coefficient value several fold greater than the expected effective selection coefficient (s) because multiple mutations may occur that give a large benefit relative to the population average. Reasonable values are typically 0.0001 to 0.001 for the precision and 1 to 5 for the maximum.

A file (pr_establishment_T_6.64_N_5E6_LT.tab) is provided with the distribution that can be used for experiments conducted under the conditions of the long-term E. coli evolution experiment.

3. Simulate and fit idealized marker divergence curves

The next step is to simulate marker divergence curves generated by a simplified population genetics model where there is only one category of beneficial mutation with selection coefficient s. These beneficial mutations occur with a rate μ. Selection coefficients are defined such that w_new=w_initial (1+s). This model takes into account the population bottlenecks that occur during a serial transfer evolution experiment.

For each combination of s and μ, a distribution of the effective parameters α and &tau is determined from the idealized data. Comparing the effective parameters extracted from the experimental data to all of these distributions allows the maximum likelihood values of s and μ that best explain the experimental data to be determined.

3.1 Simulate marker divergence data for a set of μ and s effective beneficial mutation parameters


 >marker_divergence_pop_gen_simulation.pl -T 6.64 -N 5E6 -u 1E-8 -s 0.08 -p pr_establishment_T_6.64_No_5E6.tab -k 22 -i 3 -r 100 -o pop_gen_s_0.08_u_1E-8.tab

The generations per transfer (-T), initial population size at the beginning of each growth cycle (-N), per generation mutations rate (-u), per generation selection coefficient (-s), file of establishment probabilities produced by MATLAB (-p), number of generations during outgrowth (before printing any data) (-k), print out marker ratio each time this many transfers pass (-i), number of simulation replicates to perform (-r).

3.2 Fit α and τ empirical parameters from simulated curves

This step is the same as that used to fit the experimental data.


 >marker_divergence_fit.pl -m log_ratio -i pop_gen_s_0.08_u_1E-8.tab -o pop_gen_s_0.08_u_1E-8.fit

3.3 Automating and parallelizing this step

Generally, many combinations of μ and s must be calculated to determine the maximum likelihood effective parameters. This script combines the two previous steps to serially create marker ratio and fit files over a range of these parameters.


 >marker_divergence_background_model.pl -T 6.64 -N 5E6 -u -8:-6:0.5 -s 0.06:0.2:0.02 -p pr_establishment_T_6.64_No_5E6.tab -i 1  -r 100 -k 22

Parameters are the same as in marker_divergence_pop_gen_simulation.pl, except -u and -s are supplied as start:end:step_size combinations, and -u is in log10 units, i.e. passing a value of -8 gives a mutation rate of 10^-8.

This procedure can be farmed out to a computer cluster. Consolidate all of the output files into a single directory before proceeding to the next step. (It's okay if they are scattered across subdirectories, as long as they are the only files ending in .fit.)
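Before moving on, it can help to confirm which .fit files sit under the consolidated directory. The sketch below uses the standard `find` utility; the helper name `list_fit_files` and the directory and file names in the demonstration are made up for illustration.

```shell
# List every .fit file under a directory, including subdirectories.
list_fit_files() {
    find "$1" -type f -name '*.fit' | sort
}

# Demonstration on a throwaway directory tree (hypothetical file names).
demo=$(mktemp -d)
mkdir -p "$demo/node1" "$demo/node2"
touch "$demo/node1/pop_gen_s_0.08_u_1E-8.fit" "$demo/node2/pop_gen_s_0.1_u_1E-8.fit"
touch "$demo/node1/pop_gen_s_0.08_u_1E-8.tab"   # not a .fit file, so not listed
list_fit_files "$demo"
rm -rf "$demo"
```

Any stray .fit file that does not come from the simulations would show up in this listing and should be moved elsewhere.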

4. Determine the maximum likelihood effective parameters

Finally, we determine what values of μ and s produce α and τ distributions in the simulated data that are statistically indistinguishable from those fit from the experimental data.


 >marker_divergence_significance.pl -i experimental.fit -d path/to/simulation/fits -o experimental.sig -p experimental.sig.pdf

The output file experimental.sig has starred lines where the experimental data and simulations agree. Since α and τ are not independent, this represents a greater than 95% confidence contour. The output PDF file should have a black square, representing the best parameter combination, and a blue region, indicating the >95% confidence contour. (If the plot is all blue, then none of your simulated data agreed with the experimental data.)
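The starred lines can be pulled out of the output file with `grep`. This is a sketch under an assumption: it treats "starred" as meaning the line contains a literal asterisk somewhere, and the column layout in the demonstration file is invented, since the exact format of experimental.sig is not described here.

```shell
# Print the lines of a .sig file that contain an asterisk.
# Assumption: "starred" means the line contains a literal '*'.
starred_lines() {
    grep -F '*' "$1"
}

# Demonstration on a made-up file; the real column layout may differ.
sig=$(mktemp)
printf '0.06\t-8\n0.08\t-8\t*\n0.10\t-8\t*\n' > "$sig"
starred_lines "$sig"
rm -f "$sig"
```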

Note that you can now pass the -n flag, and the program will use a 2-dimensional version of the KS-test to determine the 95% confidence contour.

References

Hegreness, M., Shoresh, N., Hartl, D., and Kishony, R. (2006) An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311, 1615-1617.
Wahl, L.M., and Gerrish, P.J. (2001) The probability that beneficial mutations are lost in populations with periodic bottlenecks. Evolution 55, 2606-2610.

Acknowledgments

Many thanks to Noam Shoresh for extensive discussions that made it possible for me to reproduce his methods and for providing his raw data to check the results from these tools.

Revision 11 2014-11-20 - JeffreyBarrick

META TOPICPARENT	name="ToolList"

Changed:

<
<

Warning!
This tool is under active development. Its capabilities and usage may change without warning.

>
>

Warning!
These instructions and the associated scripts have not been tested with more modern versions of R / Perl / Matlab / etc. The procedure may require updates to function.

Marker Divergence Experiments

This workflow implements a method for extracting effective beneficial mutation rates (μ) and selection coefficients (s) from marker divergence experiments [1]. This is a way of parameterizing the evolvability of a bacterial strain.

Marker Divergence Experiments

Added:

>
>

Example Data

Ratios	Counts	Reference
[[][input.tab]]	[[][counts.xls]]	Woods, R.J., Barrick, J.E., Cooper, T.F., Shrestha, U., Kauth, M.R., Lenski, R.E. (2011) Second-order selection for evolvability in a large Escherichia coli population. Science 331:1433-1436.

The counts.xls (Counts) files have raw colony counts for each of the two types in the population (e.g., red and white for Ara^– marker).
The input.tab (Ratios) files have the fractions of one type within the population (e.g., red divided by red plus white). They are used as input into the procedure below.
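The relationship between the two kinds of files is a single division. As a minimal sketch (the function name `marker_fraction` and the colony counts are made up for illustration):

```shell
# Fraction of one marker type: red / (red + white).
marker_fraction() {
    # $1 = red colony count, $2 = white colony count
    awk -v red="$1" -v white="$2" 'BEGIN { printf "%.4f\n", red / (red + white) }'
}

marker_fraction 254 246   # made-up counts; prints 0.5080
```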

Requirements and Installation

The current version of the md package is available from our public Mercurial repository:

Download Current Source Archive

Browse Mercurial Repository

The Perl scripts require the module Math::Random::MT::Auto and its prerequisites for random number generation. They can be installed from CPAN. The Math::Random::MT::Auto module has a code component that must be compiled. If you have root access on a system you can probably install these from the command line as follows:


 >sudo perl -MCPAN -e shell

 >Password: ********

 >install Math::Random::MT::Auto

  answer yes to any prompts about installing prerequisites

MATLAB is required for calculating establishment probabilities.

R is required for fitting marker divergence curves. It should be present in your $PATH, so that the Perl scripts can invoke it. You must also have the R package nortest installed for the Lilliefors test employed by the curve fitting script. This can be done from the command line within R by issuing the following command:


 >install.packages("nortest")

You may want to add the location of the perl scripts to your $PATH. You may need to change the first line of each script to the correct path to your Perl executable if it is not located at:


 #!/usr/bin/perl

Help for each individual Perl script can be obtained as follows


 >perldoc <script>

1. Fit α and τ empirical parameters from experimental data

The basic command is:


 >marker_divergence_fit.pl -i input.tab -o output.fit -p output.fit.pdf

output.fit is a tab-delimited file containing information about each curve fit. output.fit.pdf shows plots of each fit, so that you can judge whether they accurately reflect the data.

Input file data format

The input file is tab-delimited. The header row begins with "transfer", and the other columns are labels indicating the name of an experimental time series of marker ratio measurements. Each following row begins with the number of the transfer followed by the marker ratio measurements for that series at that time point. Marker ratios may be given in a variety of formats. Pass the -m option followed by ratio, fraction, log_ratio, or log10_ratio to this script depending on the format of your data values. The default mode is ratio.

Portion of an example marker ratio input file:


transfer	exp-1 	exp-2 	exp-3	

0       	0.5087	0.5068	0.4990

3       	0.5000	0.4844	0.5174

6       	0.4853	0.5393	0.5115

9       	0.4802	0.4862	0.4522

12       0.4884	0.4431	0.5170

15       0.5277	0.5196	0.5266

18       0.4983	0.4638	0.4607

21       0.5221	0.5361	0.5000

The output file is tab-delimited, with columns containing data as labeled.
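A quick format check on the input file can catch problems before fitting. The sketch below is not part of the md package; the helper name `check_marker_input` is hypothetical. It only verifies that the header begins with "transfer" and that every row has as many tab-delimited fields as the header, so blank lines or rows separated by spaces are reported as errors.

```shell
# Check a marker ratio input file: header starts with "transfer" and
# every row has the same number of tab-delimited fields as the header.
check_marker_input() {
    awk -F '\t' '
        NR == 1 {
            if ($1 != "transfer") { print "header must begin with transfer"; bad = 1; exit }
            ncol = NF; next
        }
        NF != ncol { printf "line %d has %d fields, expected %d\n", NR, NF, ncol; bad = 1 }
        END { exit bad }
    ' "$1"
}

# Demonstration on a small file built from the example values above.
tmp=$(mktemp)
printf 'transfer\texp-1\texp-2\n0\t0.5087\t0.5068\n3\t0.5000\t0.4844\n' > "$tmp"
check_marker_input "$tmp" && echo "format looks consistent"
rm -f "$tmp"
```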

Baseline correction

If some of your experimental curves do not start at a 1:1 ratio of the neutral marker states, you will also want to pass the -b option followed by the number of initial points (not transfers) that you would like to fit as a baseline. The script corrects for the initial marker imbalance by fitting τ and α to a modified equation that accounts for the fact that for a population diverging toward marker state A, where A was initially present in less than 50% of the population, the marker ratio will be shifted sooner than in a population where it was initially present in 50% of the population.

Example:


 >marker_divergence_fit.pl -m log_ratio -b 5 -i input.tab -o output.fit -p output.fit.pdf

Corrects for the baseline by taking the average of the first 5 points.
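To get a feel for what a five-point baseline looks like, the average of the first points of a series can be computed by hand. This sketch only illustrates the averaging arithmetic; it does not reproduce the modified equation that the script fits, and the helper name `baseline_mean` is made up for this example.

```shell
# Mean of the first n data points of one series in a marker ratio file.
baseline_mean() {
    # $1 = file, $2 = column number (2 = first series), $3 = number of initial points
    awk -F '\t' -v col="$2" -v n="$3" '
        NR > 1 && NR <= n + 1 { sum += $col; count++ }
        END { if (count > 0) printf "%.5f\n", sum / count }
    ' "$1"
}

# Demonstration with the exp-1 values from the example input file.
tmp=$(mktemp)
printf 'transfer\texp-1\n0\t0.5087\n3\t0.5000\n6\t0.4853\n9\t0.4802\n12\t0.4884\n15\t0.5277\n' > "$tmp"
baseline_mean "$tmp" 2 5   # prints 0.49252
rm -f "$tmp"
```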

2. Generate a table of establishment probabilities with MATLAB.

The population genetics model assumes that each new beneficial mutation that is generated has a certain probability of establishing (escaping loss due to dilution) during the serial transfer regime of the experiment. A table of these probabilities must be calculated with the MATLAB script.

First, add the directory containing the two ".m" files that come with the distribution to the MATLAB path.

In MATLAB:


 >>establishment_probability_table(6.64, 5E6, 0.001, 1, 'pr_establishment_T_6.64_No_5E6.tab')

The arguments are the number of generations per transfer, the initial population size immediately after each transfer, the precision of the file to be generated, the maximum selection coefficient to consider, and the output filename. Output is a tab-delimited list of selection coefficient and probability of establishment when a new mutant has this advantage relative to the population average [2].

It is important to allow a maximum selection coefficient value several fold greater than the expected effective selection coefficient (s) because multiple mutations may occur that give a large benefit relative to the population average. Reasonable values are typically 0.0001 to 0.001 for the precision and 1 to 5 for the maximum.

A file (pr_establishment_T_6.64_N_5E6_LT.tab) is provided with the distribution that can be used for experiments conducted under the conditions of the long-term E. coli evolution experiment.
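Once a table exists, individual entries can be inspected from the shell. The sketch below assumes the table has exactly two tab-delimited columns (selection coefficient, then probability) and no header row, which should be checked against the actual file. The helper name `lookup_establishment` is hypothetical, and the probabilities in the demonstration are placeholders, not real output.

```shell
# Print the table row whose selection coefficient is closest to a query value.
# Assumption: two tab-delimited columns (s, probability) and no header row.
lookup_establishment() {
    # $1 = table file, $2 = selection coefficient to look up
    awk -F '\t' -v q="$2" '
        {
            d = $1 - q; if (d < 0) d = -d
            if (NR == 1 || d < best) { best = d; s = $1; p = $2 }
        }
        END { if (NR > 0) printf "%s\t%s\n", s, p }
    ' "$1"
}

# Demonstration with placeholder probabilities (not real values).
tab=$(mktemp)
printf '0.080\t0.11\n0.081\t0.12\n0.082\t0.13\n' > "$tab"
lookup_establishment "$tab" 0.0812   # nearest row is s = 0.081
rm -f "$tab"
```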

3. Simulate and fit idealized marker divergence curves

The next step is to simulate marker divergence curves generated by a simplified population genetics model where there is only one category of beneficial mutation with selection coefficient s. These beneficial mutations occur with a rate μ. Selection coefficients are defined such that w_new=w_initial (1+s). This model takes into account the population bottlenecks that occur during a serial transfer evolution experiment.

For each combination of s and μ, a distribution of the effective parameters α and τ is determined from the idealized data. Comparing the effective parameters extracted from the experimental data to all of these distributions allows the maximum likelihood values of s and μ that best explain the experimental data to be determined.

3.1 Simulate marker divergence data for a set of μ and s effective beneficial mutation parameters


 >marker_divergence_pop_gen_simulation.pl -T 6.64 -N 5E6 -u 1E-8 -s 0.08 -p pr_establishment_T_6.64_No_5E6.tab -k 22 -i 3 -r 100 -o pop_gen_s_0.08_u_1E-8.tab

The options are the generations per transfer (-T), the initial population size at the beginning of each growth cycle (-N), the per-generation mutation rate (-u), the per-generation selection coefficient (-s), the file of establishment probabilities produced by MATLAB (-p), the number of generations of outgrowth before any data are printed (-k), the interval in transfers at which the marker ratio is printed (-i), the number of simulation replicates to perform (-r), and the output file (-o).

3.2 Fit α and τ empirical parameters from simulated curves

This step is the same as that used to fit the experimental data.


 >marker_divergence_fit.pl -m log_ratio -i pop_gen_s_0.08_u_1E-8.tab -o pop_gen_s_0.08_u_1E-8.fit

3.3 Automating and parallelizing this step

Generally, many combinations of μ and s must be calculated to determine the maximum likelihood effective parameters. This script combines the two previous steps to serially create marker ratio and fit files over a range of these parameters.


 >marker_divergence_background_model.pl -T 6.64 -N 5E6 -u -8:-6:0.5 -s 0.06:0.2:0.02 -p pr_establishment_T_6.64_No_5E6.tab -i 1  -r 100 -k 22

Parameters are the same as in marker_divergence_pop_gen_simulation.pl, except -u and -s are supplied as start:end:step_size combinations, and -u is in log10 units, i.e. passing a value of -8 gives a mutation rate of 10^-8.

This procedure can be farmed out to a computer cluster. Consolidate all of the output files into a single directory before proceeding to the next step. (It's okay if they are scattered across subdirectories, as long as they are the only files ending in .fit.)

4. Determine the maximum likelihood effective parameters

Finally, we determine what values of μ and s produce α and τ distributions in the simulated data that are statistically indistinguishable from those fit from the experimental data.


 >marker_divergence_significance.pl -i experimental.fit -d path/to/simulation/fits -o experimental.sig -p experimental.sig.pdf

The output file experimental.sig has starred lines where the experimental data and simulations agree. Since α and τ are not independent, this represents a greater than 95% confidence contour. The output PDF file should have a black square, representing the best parameter combination, and a blue region, indicating the >95% confidence contour. (If the plot is all blue, then none of your simulated data agreed with the experimental data.)

Note that you can now pass the -n flag, and the program will use a 2-dimensional version of the KS-test to determine the 95% confidence contour.

References

Hegreness, M., Shoresh, N., Hartl, D., and Kishony, R. (2006) An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311, 1615-1617.
Wahl, L.M., and Gerrish, P.J. (2001) The probability that beneficial mutations are lost in populations with periodic bottlenecks. Evolution 55, 2606-2610.

Acknowledgments

Many thanks to Noam Shoresh for extensive discussions that made it possible for me to reproduce his methods and for providing his raw data to check the results from these tools.

Revision 10 2010-09-28 - JeffreyBarrick

META TOPICPARENT	name="ToolList"

Changed:

<
<

Warning!
This tool is under active development. Its capabilities and usage may change without warning, and it has not been adequately tested.

>
>

Warning!
This tool is under active development. Its capabilities and usage may change without warning.

Marker Divergence Experiments

This workflow implements a method for extracting effective beneficial mutation rates (μ) and selection coefficients (s) from marker divergence experiments [1]. This is a way of parameterizing the evolvability of a bacterial strain.

Marker Divergence Experiments

Requirements and Installation

Changed:

<
<

Download

version 0.01

21 September 2008

>
> The current version of the md package is available from our public Mercurial repository:

Added:

>
>

Download Current Source Archive

Browse Mercurial Repository

Changed:

<
< The Perl scripts require the module Math::Random::MT::Auto and its prerequisites for random number generation. They can be installed from CPAN. The Math::Random::MT::Auto module has a code component that must be compiled. If you have root access on a system you can probably install these from the command line as follows:

>
> The Perl scripts require the module Math::Random::MT::Auto and its prerequisites for random number generation. They can be installed from CPAN. The Math::Random::MT::Auto module has a code component that must be compiled. If you have root access on a system you can probably install these from the command line as follows:


 >sudo perl -MCPAN -e shell

 >Password: ********

 >install Math::Random::MT::Auto

  answer yes to any prompts about installing prerequisites

MATLAB is required for calculating establishment probabilities.

R is required for fitting marker divergence curves. It should be present in your $PATH, so that the Perl scripts can invoke it. You must also have the R package nortest installed for the Lilliefors test employed by the curve fitting script. This can be done from the command line within R by issuing the following command:


 >install.packages("nortest")

You may want to add the location of the perl scripts to your $PATH. You may need to change the first line of each script to the correct path to your Perl executable if it is not located at:


 #!/usr/bin/perl

Help for each individual Perl script can be obtained as follows


 >perldoc <script>

1. Fit α and τ empirical parameters from experimental data

The basic command is:


 >marker_divergence_fit.pl -i input.tab -o output.fit -p output.fit.pdf

output.fit is a tab-delimited file containing information about each curve fit. output.fit.pdf shows plots of each fit, so that you can judge whether they accurately reflect the data.

Input file data format

The input file is tab-delimited. The header row begins with "transfer", and the other columns are labels indicating the name of an experimental time series of marker ratio measurements. Each following row begins with the number of the transfer followed by the marker ratio measurements for that series at that time point. Marker ratios may be given in a variety of formats. Pass the -m option followed by ratio, fraction, log_ratio, or log10_ratio to this script depending on the format of your data values. The default mode is ratio.

Portion of an example marker ratio input file:


transfer	exp-1 	exp-2 	exp-3	

0       	0.5087	0.5068	0.4990

3       	0.5000	0.4844	0.5174

6       	0.4853	0.5393	0.5115

9       	0.4802	0.4862	0.4522

12       0.4884	0.4431	0.5170

15       0.5277	0.5196	0.5266

18       0.4983	0.4638	0.4607

21       0.5221	0.5361	0.5000

The output file is tab-delimited, with columns containing data as labeled.

Baseline correction

If some of your experimental curves do not start at a 1:1 ratio of the neutral marker states, you will also want to pass the -b option followed by the number of initial points (not transfers) that you would like to fit as a baseline. The script corrects for the initial marker imbalance by fitting τ and α to a modified equation that accounts for the fact that for a population diverging toward marker state A, where A was initially present in less than 50% of the population, the marker ratio will be shifted sooner than in a population where it was initially present in 50% of the population.

Example:


 >marker_divergence_fit.pl -m log_ratio -b 5 -i input.tab -o output.fit -p output.fit.pdf

Corrects for the baseline by taking the average of the first 5 points.

2. Generate a table of establishment probabilities with MATLAB.

The population genetics model assumes that each new beneficial mutation that is generated has a certain probability of establishing (escaping loss due to dilution) during the serial transfer regime of the experiment. A table of these probabilities must be calculated with the MATLAB script.

First, add the directory containing the two ".m" files that come with the distribution to the MATLAB path.

In MATLAB:


 >>establishment_probability_table(6.64, 5E6, 0.001, 1, 'pr_establishment_T_6.64_No_5E6.tab')

The arguments are the number of generations per transfer, the initial population size immediately after each transfer, the precision of the file to be generated, the maximum selection coefficient to consider, and the output filename. Output is a tab-delimited list of selection coefficient and probability of establishment when a new mutant has this advantage relative to the population average [2].

It is important to allow a maximum selection coefficient value several fold greater than the expected effective selection coefficient (s) because multiple mutations may occur that give a large benefit relative to the population average. Reasonable values are typically 0.0001 to 0.001 for the precision and 1 to 5 for the maximum.

A file (pr_establishment_T_6.64_N_5E6_LT.tab) is provided with the distribution that can be used for experiments conducted under the conditions of the long-term E. coli evolution experiment.

3. Simulate and fit idealized marker divergence curves

The next step is to simulate marker divergence curves generated by a simplified population genetics model where there is only one category of beneficial mutation with selection coefficient s. These beneficial mutations occur with a rate μ. Selection coefficients are defined such that w_new=w_initial (1+s). This model takes into account the population bottlenecks that occur during a serial transfer evolution experiment.

For each combination of s and μ, a distribution of the effective parameters α and τ is determined from the idealized data. Comparing the effective parameters extracted from the experimental data to all of these distributions allows the maximum likelihood values of s and μ that best explain the experimental data to be determined.

3.1 Simulate marker divergence data for a set of μ and s effective beneficial mutation parameters


 >marker_divergence_pop_gen_simulation.pl -T 6.64 -N 5E6 -u 1E-8 -s 0.08 -p pr_establishment_T_6.64_No_5E6.tab -k 22 -i 3 -r 100 -o pop_gen_s_0.08_u_1E-8.tab

The options are the generations per transfer (-T), the initial population size at the beginning of each growth cycle (-N), the per-generation mutation rate (-u), the per-generation selection coefficient (-s), the file of establishment probabilities produced by MATLAB (-p), the number of generations of outgrowth before any data are printed (-k), the interval in transfers at which the marker ratio is printed (-i), the number of simulation replicates to perform (-r), and the output file (-o).

3.2 Fit α and τ empirical parameters from simulated curves

This step is the same as that used to fit the experimental data.


 >marker_divergence_fit.pl -m log_ratio -i pop_gen_s_0.08_u_1E-8.tab -o pop_gen_s_0.08_u_1E-8.fit

3.3 Automating and parallelizing this step

Generally, many combinations of μ and s must be calculated to determine the maximum likelihood effective parameters. This script combines the two previous steps to serially create marker ratio and fit files over a range of these parameters.


 >marker_divergence_background_model.pl -T 6.64 -N 5E6 -u -8:-6:0.5 -s 0.06:0.2:0.02 -p pr_establishment_T_6.64_No_5E6.tab -i 1  -r 100 -k 22

Parameters are the same as in marker_divergence_pop_gen_simulation.pl, except -u and -s are supplied as start:end:step_size combinations, and -u is in log10 units, i.e. passing a value of -8 gives a mutation rate of 10^-8.

This procedure can be farmed out to a computer cluster. Consolidate all of the output files into a single directory before proceeding to the next step. (It's okay if they are scattered across subdirectories, as long as they are the only files ending in .fit.)

4. Determine the maximum likelihood effective parameters

Finally, we determine what values of μ and s produce α and τ distributions in the simulated data that are statistically indistinguishable from those fit from the experimental data.


 >marker_divergence_significance.pl -i experimental.fit -d path/to/simulation/fits -o experimental.sig -p experimental.sig.pdf

The output file experimental.sig has starred lines where the experimental data and simulations agree. Since α and τ are not independent, this represents a greater than 95% confidence contour. The output PDF file should have a black square, representing the best parameter combination, and a blue region, indicating the >95% confidence contour. (If the plot is all blue, then none of your simulated data agreed with the experimental data.)

Changed:

<
< Note that you can now pass the -n flag, and the program will use a 2-dimensional version KS-test to determine a true 95% confidence contour, but this has not been extensively tested.

>
> Note that you can now pass the -n flag, and the program will use a 2-dimensional version of the KS-test to determine the 95% confidence contour.

References

Hegreness, M., Shoresh, N., Hartl, D., and Kishony, R. (2006) An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311, 1615-1617.
Wahl, L.M., and Gerrish, P.J. (2001) The probability that beneficial mutations are lost in populations with periodic bottlenecks. Evolution 55, 2606-2610.

Acknowledgments

Changed:

<
< Many thanks to Noam Shoresh for extensive discussions that made it possible for me to reproduce his methods and for providing his raw data to check the results from these tools.

>
> Many thanks to Noam Shoresh for extensive discussions that made it possible for me to reproduce his methods and for providing his raw data to check the results from these tools.

Deleted:

<
<

META FILEATTACHMENT	attachment="md_v2.tgz" attr="" comment="" date="1230410029" name="md_v2.tgz" path="md_v2.tgz" size="458895" stream="md_v2.tgz" user="Main.JeffreyBarrick" version="2"
META FILEATTACHMENT	attr="" autoattached="1" comment="" date="1230408662" name="md_v1.tgz" path="md_v1.tgz" size="452888" user="Main.JeffreyBarrick" version="1"

Revision 9 2008-12-27 - JeffreyBarrick

  META TOPICPARENT 
 name="ToolList" 

 Warning!
This tool is under active development.  Its capabilities and usage may change without warning, and it has not been adequately tested.

 Marker Divergence Experiments 

This workflow implements a method for extracting effective beneficial mutation rates (μ) and selection coefficients (s) from marker divergence experiments [1]. This is a way of parameterizing the evolvability of a bacterial strain.

 
  Marker Divergence Experiments 
  Example Data
   Requirements and Installation
   1. Fit α and τ empirical parameters from experimental data 
  Input file data format
   Baseline correction
 
   2. Generate a table of establishment probabilities with MATLAB.
   3. Simulate and fit idealized marker divergence curves 
  3.1 Simulate marker divergence data for a set of μ and s effective beneficial mutation parameters
   3.2 Fit α and τ empirical parameters from simulated curves
   3.3 Automating and parallelizing this step
 
   4. Determine the maximum likelihood effective parameters
   References
   Acknowledgments
 
 


 Requirements and Installation
-<
<
+  Download 
 version 0.01 
 21 September 2008
-  Download
+September 2008
->
>
+  Download 
 version 0.01 
 21 September 2008
-  Download
+September 2008
 The Perl scripts require the module Math::Random::MT::Auto and its prerequisites for random number generation. They can be installed from CPAN. The Math::Random::MT::Auto module has a code component that must be compiled. If you have root access on a  system you can probably install these from the command line as follows:


 >sudo perl -MCPAN -e shell

 >Password: ********

 >install Math::Random::MT::Auto

  answer yes to any prompts about installing prerequisites


MATLAB is required for calculating establishment probabilities.
-<
<
+R is required for fitting marker divergence curves. It should be present in your $PATH, so that the Perl scripts can invoke it.
->
>
+R is required for fitting marker divergence curves. It should be present in your $PATH, so that the Perl scripts can invoke it. You must also have the R package nortest installed for the Lilliefors test employed by the curve fitting script. This can be done from the command line within R by issuing the following command:
->
>
+ >install.packages("nortest")
 You may want to add the location of the perl scripts to your $PATH. You may need to change the first line of each script to the correct path to your Perl executable if it is not located at:
-<
<
+#!/usr/bin/perl
->
>
+ #!/usr/bin/perl
 Help for each individual Perl script can be obtained as follows
-<
<
+ > perldoc <script>
->
>
+ >perldoc <script>
 1. Fit α and τ empirical parameters from experimental data 

The basic command is:
-<
<
+ >marker_divergence_fit.pl -i input.tab > output.fit
->
>
+ >marker_divergence_fit.pl -i input.tab -o output.fit -p output.fit.pdf
->
>
+output.fit is a tab-delimited file containing information about each curve fit. output.fit.pdf shows plots of each fit, so that you can judge whether they accurately reflect the data.
  Input file data format
-<
<
+The input file is tab-delimited. The header row begins with "transfer", and the other columns are labels indicating the name of an experimental time series of marker ratio measurements. Each following row begins with the number of the transfer followed by the marker ratio measurements for that series at that time point. Marker ratios may be given in a variety of formats. Pass the -m option followed by ratio, log_ratio, log10_ratio to this script depending on the format of you data values. The default mode is ratio.
->
>
+The input file is tab-delimited. The header row begins with "transfer", and the other columns are labels indicating the name of an experimental time series of marker ratio measurements. Each following row begins with the number of the transfer followed by the marker ratio measurements for that series at that time point. Marker ratios may be given in a variety of formats. Pass the -m option followed by ratio, fraction, log_ratio, or log10_ratio to this script depending on the format of your data values. The default mode is ratio.
 Portion of an example marker ratio input file:

transfer	exp-1 	exp-2 	exp-3	

0       	0.5087	0.5068	0.4990

3       	0.5000	0.4844	0.5174

6       	0.4853	0.5393	0.5115

9       	0.4802	0.4862	0.4522

12       0.4884	0.4431	0.5170

15       0.5277	0.5196	0.5266

18       0.4983	0.4638	0.4607

21       0.5221	0.5361	0.5000



The output file is tab-delimited, with columns containing data as labeled. 

 Baseline correction
-<
<
+If some of your experimental curves do not start at a 1:1 ratio of the neutral marker states, you will also want to pass the -b option followed by the number of initial points (not transfers) . The script corrects for the initial marker imbalance by fitting τ and α to a modified equation that accounts for the fact that for a population diverging toward marker state A, where A was initially present in less than 50% of the population, the marker ratio will be shifted sooner than in a population where it was initially present in 50% or more of the population.
->
>
+If some of your experimental curves do not start at a 1:1 ratio of the neutral marker states, you will also want to pass the -b option followed by the number of initial points (not transfers) that you would like to fit as a baseline. The script corrects for the initial marker imbalance by fitting τ and α to a modified equation that accounts for the fact that for a population diverging toward marker state A, where A was initially present in less than 50% of the population, the marker ratio will be shifted sooner than in a population where it was initially present in 50% of the population.
 Example:
-<
<
+ >marker_divergence_fit.pl -m log_ratio -i input.tab > output.fit
->
>
+ >marker_divergence_fit.pl -m log_ratio -i input.tab -o output.fit -p output.fit.pdf
 Corrects for the baseline by taking the average of the first 5 points.

 2. Generate a table of establishment probabilities with MATLAB. 

The population genetics model assumes that each new beneficial mutation that is generated has a certain probability of establishing (escaping loss due to dilution) during the serial transfer regime of the experiment. A table of these probabilities must be calculated with the MATLAB script.

First, add the directory containing the two ".m" files that come with the distribution to the MATLAB path.

In MATLAB:

 >>establishment_probability_table(6.64, 5E6, 0.001, 1, 'pr_establishment_T_6.64_No_5E6.tab')


The arguments are the number of generations per transfer, the initial population size immediately after each transfer, the precision of the file to be generated, the maximum selection coefficient to consider, and the output filename. Output is a tab-delimited list of selection coefficient and probability of establishment when a new mutant has this advantage relative to the population average [2].

It is important to allow a maximum selection coefficient value several fold greater than the expected effective selection coefficient (s) because multiple mutations may occur that give a large benefit relative to the population average. Reasonable values are typically 0.0001 to 0.001 for the precision and 1 to 5 for the maximum. 

A file (pr_establishment_T_6.64_N_5E6_LT.tab) is provided with the distribution that can be used for experiments conducted under the conditions of the long-term E. coli evolution experiment.

 3. Simulate and fit idealized marker divergence curves 

The next step is to simulate marker divergence curves generated by a simplified population genetics model where there is only one category of beneficial mutation with selection coefficient s. These beneficial mutations occur with a rate μ. Selection coefficients are defined such that w_new=w_initial (1+s). This model takes into account the population bottlenecks that occur during a serial transfer evolution experiment.

For each combination of s and μ, a distribution of the effective parameters α and τ is determined from the idealized data. Comparing the effective parameters extracted from the experimental data to all of these distributions allows the maximum likelihood values of s and μ that best explain the experimental data to be determined.

 3.1 Simulate marker divergence data for a set of μ and s effective beneficial mutation parameters
-<
<
+ >marker_divergence_pop_gen_simulation.pl -T 6.64 -N 5E6 -u 1E-8 -s 0.08 -p pr_establishment_T_6.64_No_5E6.tab -k 22 -i 3 -r 100 > pop_gen_s_0.08_u_1E-8.tab
->
>
+ >marker_divergence_pop_gen_simulation.pl -T 6.64 -N 5E6 -u 1E-8 -s 0.08 -p pr_establishment_T_6.64_No_5E6.tab -k 22 -i 3 -r 100 -o pop_gen_s_0.08_u_1E-8.tab
 The options are: generations per transfer (-T), initial population size at the beginning of each growth cycle (-N), per-generation mutation rate (-u), per-generation selection coefficient (-s), file of establishment probabilities produced by MATLAB (-p), number of generations of outgrowth before any data are printed (-k), interval in transfers between printed marker ratios (-i), and number of simulation replicates to perform (-r).

 3.2 Fit α and τ empirical parameters from simulated curves 

This step is the same as that used to fit the experimental data.
-<
<
+ >marker_divergence_fit.pl -i
->
>
+ >marker_divergence_fit.pl -m log_ratio -i pop_gen_s_0.08_u_1E-8.tab -o pop_gen_s_0.08_u_1E-8.fit
 3.3 Automating and parallelizing this step 

Generally, many combinations of μ and s must be calculated to determine the maximum likelihood effective parameters. This script combines the two previous steps to serially create marker ratio and fit files over a range of these parameters. 


 >marker_divergence_background_model.pl -T 6.64 -N 5E6 -u -8:-6:0.5 -s 0.06:0.2:0.02 -p pr_establishment_T_6.64_No_5E6.tab -i 1  -r 100 -k 22 


Parameters are the same as in marker_divergence_pop_gen_simulation.pl, except -u and -s are supplied as start:end:step_size combinations, and -u is in log10 units, i.e. passing a value of -8 gives a mutation rate of 10^-8.
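
To see how many parameter combinations a given pair of ranges implies, the start:end:step_size notation can be expanded by hand. The following Python sketch does this for the ranges in the command above. It is an illustration only: the function name is hypothetical, and treating the end value as inclusive is an assumption about how the script interprets the range.

```python
from decimal import Decimal

def expand_range(spec):
    """Expand 'start:end:step_size' into a list of Decimal values (end assumed inclusive)."""
    start, end, step = (Decimal(part) for part in spec.split(":"))
    if step <= 0:
        raise ValueError("step_size must be positive")
    values = []
    current = start
    while current <= end:
        values.append(current)
        current += step
    return values

log10_u = expand_range("-8:-6:0.5")       # -u is given in log10 units
s_values = expand_range("0.06:0.2:0.02")  # -s is given directly
mutation_rates = [10 ** float(u) for u in log10_u]

# Every (mutation rate, selection coefficient) pair that would be simulated.
grid = [(u, float(s)) for u in mutation_rates for s in s_values]
print(len(log10_u), len(s_values), len(grid))
```

Decimal arithmetic is used so that repeated addition of the step does not accumulate floating-point error and drop the final value.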
-<
<
+This procedure can be farmed out to a computer cluster. Consolidate all of the output files into a single directory before proceeding to the next step.
->
>
+This procedure can be farmed out to a computer cluster. Consolidate all of the output files into a single directory before proceeding to the next step. (It's okay if they are scattered across subdirectories, as long as they are the only files ending in .fit.)
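
One way to split the work for a cluster is to write out a separate simulate-and-fit command pair for each parameter combination and submit each pair as its own job. The Python sketch below only builds and prints the command lines, using the options shown in steps 3.1 and 3.2. The function name and the choice of parameter values are hypothetical, and how the jobs are submitted depends on your scheduler.

```python
import shlex

TABLE = "pr_establishment_T_6.64_No_5E6.tab"

def job_commands(u, s, table=TABLE):
    """Return (simulate, fit) command lines for one mutation rate and selection coefficient."""
    stem = f"pop_gen_s_{s}_u_{u}"
    simulate = [
        "marker_divergence_pop_gen_simulation.pl",
        "-T", "6.64", "-N", "5E6", "-u", u, "-s", s,
        "-p", table, "-k", "22", "-i", "3", "-r", "100",
        "-o", f"{stem}.tab",
    ]
    fit = [
        "marker_divergence_fit.pl", "-m", "log_ratio",
        "-i", f"{stem}.tab", "-o", f"{stem}.fit",
    ]
    return shlex.join(simulate), shlex.join(fit)

# Illustrative values only; substitute the ranges you actually want to scan.
jobs = [job_commands(u, s) for u in ("1E-8", "1E-7") for s in ("0.08", "0.10")]
for simulate, fit in jobs:
    # Fit only if the simulation succeeded.
    print(f"{simulate} && {fit}")
```

Each printed line is independent of the others, so the lines can be distributed across nodes in any order.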
 4. Determine the maximum likelihood effective parameters 

Finally, we determine what values of μ and s produce α and τ distributions in the simulated data that are statistically indistinguishable from those fit from the experimental data.
-<
<
+ >marker_divergence_significance.pl -i experimental.fit -d path/to/simulation/fits > experimental.sig
->
>
+ >marker_divergence_significance.pl -i experimental.fit -d path/to/simulation/fits -o experimental.sig -p experimental.sig.pdf
-<
<
+The output file experimental.sig has starred lines where the experimental data and simulations agree. Since α and τ are not independent, this represents a greater than 95% confidence cloud.
->
>
+The output file experimental.sig has starred lines where the experimental data and simulations agree. Since α and τ are not independent, this represents a greater than 95% confidence contour. The output PDF file should have a black square, representing the best parameter combination, and a blue region, indicating the >95% confidence contour. (If the plot is all blue, then none of your simulated data agreed with the experimental data.)
->
>
+Note that you can now pass the -n flag, and the program will use a 2-dimensional version of the KS test to determine a true 95% confidence contour, but this has not been extensively tested.
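
The mention of a KS test suggests that agreement between experimental and simulated fits rests on comparing empirical distributions. As an illustration only, here is a minimal Python sketch of the standard one-dimensional two-sample Kolmogorov-Smirnov statistic. It is not taken from marker_divergence_significance.pl, whose actual test and thresholds may differ, and the function name is hypothetical.

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # The gap between two step functions can only change at observed values.
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap

print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

A statistic near 0 means the two samples look alike, and a value of 1 means they do not overlap at all.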
  References 
 
 Hegreness, M., Shoresh, N., Hartl, D., and Kishony, R. (2006) An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311, 1615-1617.
  Wahl, L.M., and Gerrish, P.J. (2001) The probability that beneficial mutations are lost in populations with periodic bottlenecks. Evolution 55, 2606-2610.
 

 Acknowledgments 
Many thanks to Noam Shoresh for extensive discussions that made it possible for me to reproduce his methods and for providing his raw data to check the results from these tools.
-<
<
+ META FILEATTACHMENT 
 attachment="md_v0_01.tgz" attr="" comment="" date="1222016303" name="md_v0_01.tgz" path="md_v0_01.tgz" size="451651" stream="md_v0_01.tgz" user="Main.JeffreyBarrick" version="1"
- META FILEATTACHMENT
+ attachment="md_v0_01.tgz" attr="" comment="" date="1222016303" name="md_v0_01.tgz" path="md_v0_01.tgz" size="451651" stream="md_v0_01.tgz" user="Main.JeffreyBarrick" version="1"
->
>
+ META FILEATTACHMENT 
 attachment="md_v2.tgz" attr="" comment="" date="1230410029" name="md_v2.tgz" path="md_v2.tgz" size="458895" stream="md_v2.tgz" user="Main.JeffreyBarrick" version="2"
- META FILEATTACHMENT
+ attachment="md_v2.tgz" attr="" comment="" date="1230410029" name="md_v2.tgz" path="md_v2.tgz" size="458895" stream="md_v2.tgz" user="Main.JeffreyBarrick" version="2"
->
>
+ META FILEATTACHMENT 
 attr="" autoattached="1" comment="" date="1230408662" name="md_v1.tgz" path="md_v1.tgz" size="452888" user="Main.JeffreyBarrick" version="1"
- META FILEATTACHMENT
+ attr="" autoattached="1" comment="" date="1230408662" name="md_v1.tgz" path="md_v1.tgz" size="452888" user="Main.JeffreyBarrick" version="1"

Revision 82008-09-22 - JeffreyBarrick

META TOPICPARENT	name="ToolList"

Warning!
This tool is under active development. Its capabilities and usage may change without warning, and it has not been adequately tested.

Marker Divergence Experiments

This workflow implements a method for extracting effective beneficial mutation rates (μ) and selection coefficients (s) from marker divergence experiments [1]. This is a way of parameterizing the evolvability of a bacterial strain.

Marker Divergence Experiments

Requirements and Installation

Download

version 0.01

21 September 2008

The Perl scripts require the module Math::Random::MT::Auto and its prerequisites for random number generation. They can be installed from CPAN. The Math::Random::MT::Auto module has a code component that must be compiled. If you have root access on a system you can probably install these from the command line as follows:


 >sudo perl -MCPAN -e shell

 >Password: ********

 >install Math::Random::MT::Auto

  answer yes to any prompts about installing prerequisites

MATLAB is required for calculating establishment probabilities.

Changed:

<
< R is required for fitting marker divergence curves. It should be present in your $PATH, so that the Perl scripts can invoke it.

>
> R is required for fitting marker divergence curves. It should be present in your $PATH, so that the Perl scripts can invoke it.

You may want to add the location of the perl scripts to your $PATH. You may need to change the first line of each script to the correct path to your Perl executable if it is not located at:


#!/usr/bin/perl

Changed:

<
< Help for each Perl script can be obtained

>
> Help for each individual Perl script can be obtained as follows

Added:

>
>


 > perldoc <script>

Deleted:

<
<

1. Fit α and τ empirical parameters from experimental data

The basic command is:


 >marker_divergence_fit.pl -i input.tab > output.fit

Input file data format

The input file is tab-delimited. The header row begins with "transfer", and the other columns are labels indicating the name of an experimental time series of marker ratio measurements. Each following row begins with the number of the transfer followed by the marker ratio measurements for that series at that time point. Marker ratios may be given in several formats. Pass the -m option followed by ratio, log_ratio, or log10_ratio to this script, depending on the format of your data values. The default mode is ratio.

Portion of an example marker ratio input file:


transfer	exp-1 	exp-2 	exp-3	

0       	0.5087	0.5068	0.4990

3       	0.5000	0.4844	0.5174

6       	0.4853	0.5393	0.5115

9       	0.4802	0.4862	0.4522

12       0.4884	0.4431	0.5170

15       0.5277	0.5196	0.5266

18       0.4983	0.4638	0.4607

21       0.5221	0.5361	0.5000
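
Before running the fit, it can help to check that a file follows this layout. The Python sketch below reads a table in the format described above into one list per series. It is an independent illustration, not the parser used by marker_divergence_fit.pl; the function name is hypothetical, and ignoring empty fields (such as one produced by a trailing tab) is an assumption.

```python
import io

def read_marker_ratios(handle):
    """Parse a tab-delimited marker ratio table into (transfers, {label: [values]})."""
    header = [field.strip() for field in handle.readline().split("\t")]
    header = [field for field in header if field]  # drop empty fields from trailing tabs
    if not header or header[0] != "transfer":
        raise ValueError('header row must begin with "transfer"')
    labels = header[1:]
    transfers = []
    series = {label: [] for label in labels}
    for line in handle:
        fields = [field.strip() for field in line.split("\t")]
        fields = [field for field in fields if field]
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(labels) + 1:
            raise ValueError(f"expected {len(labels) + 1} columns, got {len(fields)}")
        transfers.append(int(fields[0]))
        for label, value in zip(labels, fields[1:]):
            series[label].append(float(value))
    return transfers, series

example = (
    "transfer\texp-1\texp-2\texp-3\n"
    "0\t0.5087\t0.5068\t0.4990\n"
    "3\t0.5000\t0.4844\t0.5174\n"
    "6\t0.4853\t0.5393\t0.5115\n"
)
transfers, series = read_marker_ratios(io.StringIO(example))
print(transfers)
print(series["exp-2"])
```

A row with too few or too many values raises an error instead of being silently misaligned.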

The output file is tab-delimited, with columns containing data as labeled.

Baseline correction

If some of your experimental curves do not start at a 1:1 ratio of the neutral marker states, you will also want to pass the -b option followed by the number of initial points (not transfers) to average when estimating the baseline. The script corrects for the initial marker imbalance by fitting τ and α to a modified equation. This equation accounts for the fact that a population diverging toward marker state A, where A was initially present in less than 50% of the population, shows a shift in its marker ratio sooner than a population where A was initially present in 50% or more of the population.

Example:


 >marker_divergence_fit.pl -m log_ratio -b 5 -i input.tab > output.fit

This corrects for the baseline by taking the average of the first 5 points.
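
The averaging step itself is simple. The Python sketch below computes a baseline from the first 5 points of the exp-1 series in the example input file. It illustrates only the average; how the script uses this value in its modified equation is not reproduced here, and the function name is hypothetical.

```python
def baseline(values, n_points):
    """Mean of the first n_points measurements of one time series."""
    if n_points < 1 or n_points > len(values):
        raise ValueError("n_points must be between 1 and the series length")
    return sum(values[:n_points]) / n_points

# exp-1 column from the example marker ratio file (transfers 0 through 21).
exp_1 = [0.5087, 0.5000, 0.4853, 0.4802, 0.4884, 0.5277, 0.4983, 0.5221]
print(round(baseline(exp_1, 5), 5))  # 0.49252
```

Note that the count refers to data points, so with measurements every 3 transfers the first 5 points span transfers 0 through 12.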

2. Generate a table of establishment probabilities with MATLAB.

The population genetics model assumes that each new beneficial mutation that is generated has a certain probability of establishing (escaping loss due to dilution) during the serial transfer regime of the experiment. A table of these probabilities must be calculated with the MATLAB script.

First, add the directory containing the two ".m" files that come with the distribution to the MATLAB path.

In MATLAB:


 >>establishment_probability_table(6.64, 5E6, 0.001, 1, 'pr_establishment_T_6.64_No_5E6.tab')

The arguments are the number of generations per transfer, the initial population size immediately after each transfer, the precision of the file to be generated, the maximum selection coefficient to consider, and the output filename. Output is a tab-delimited list of selection coefficient and probability of establishment when a new mutant has this advantage relative to the population average [2].

It is important to allow a maximum selection coefficient value several fold greater than the expected effective selection coefficient (s) because multiple mutations may occur that give a large benefit relative to the population average. Reasonable values are typically 0.0001 to 0.001 for the precision and 1 to 5 for the maximum.

A file (pr_establishment_T_6.64_N_5E6_LT.tab) is provided with the distribution that can be used for experiments conducted under the conditions of the long-term E. coli evolution experiment.

3. Simulate and fit idealized marker divergence curves

The next step is to simulate marker divergence curves generated by a simplified population genetics model where there is only one category of beneficial mutation with selection coefficient s. These beneficial mutations occur with a rate μ. Selection coefficients are defined such that w_new=w_initial (1+s). This model takes into account the population bottlenecks that occur during a serial transfer evolution experiment.

For each combination of s and μ, a distribution of the effective parameters α and τ is determined from the idealized data. Comparing the effective parameters extracted from the experimental data to all of these distributions allows the maximum likelihood values of s and μ that best explain the experimental data to be determined.

3.1 Simulate marker divergence data for a set of μ and s effective beneficial mutation parameters


 >marker_divergence_pop_gen_simulation.pl -T 6.64 -N 5E6 -u 1E-8 -s 0.08 -p pr_establishment_T_6.64_No_5E6.tab -k 22 -i 3 -r 100 > pop_gen_s_0.08_u_1E-8.tab

The options are: generations per transfer (-T), initial population size at the beginning of each growth cycle (-N), per-generation mutation rate (-u), per-generation selection coefficient (-s), file of establishment probabilities produced by MATLAB (-p), number of generations of outgrowth before any data are printed (-k), interval in transfers between printed marker ratios (-i), and number of simulation replicates to perform (-r).

3.2 Fit α and τ empirical parameters from simulated curves

This step is the same as that used to fit the experimental data.


 >marker_divergence_fit.pl -i

3.3 Automating and parallelizing this step

Generally, many combinations of μ and s must be calculated to determine the maximum likelihood effective parameters. This script combines the two previous steps to serially create marker ratio and fit files over a range of these parameters.


 >marker_divergence_background_model.pl -T 6.64 -N 5E6 -u -8:-6:0.5 -s 0.06:0.2:0.02 -p pr_establishment_T_6.64_No_5E6.tab -i 1  -r 100 -k 22

Parameters are the same as in marker_divergence_pop_gen_simulation.pl, except -u and -s are supplied as start:end:step_size combinations, and -u is in log10 units, i.e. passing a value of -8 gives a mutation rate of 10^-8.

This procedure can be farmed out to a computer cluster. Consolidate all of the output files into a single directory before proceeding to the next step.

4. Determine the maximum likelihood effective parameters

Finally, we determine what values of μ and s produce α and τ distributions in the simulated data that are statistically indistinguishable from those fit from the experimental data.


 >marker_divergence_significance.pl -i experimental.fit -d path/to/simulation/fits > experimental.sig

The output file experimental.sig has starred lines where the experimental data and simulations agree. Since α and τ are not independent, this represents a greater than 95% confidence cloud.

References

Changed:

<
<

Hegreness, M., Shoresh, N., Hartl, D., and Kishony, R. (2006) An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311, 1615-1617.
Wahl, L.M., and Gerrish, P.J. (2001) The probability that beneficial mutations are lost in populations with periodic bottlenecks. Evolution 55, 2606-2610.

>
>

Hegreness, M., Shoresh, N., Hartl, D., and Kishony, R. (2006) An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311, 1615-1617.
Wahl, L.M., and Gerrish, P.J. (2001) The probability that beneficial mutations are lost in populations with periodic bottlenecks. Evolution 55, 2606-2610.

Acknowledgments

Many thanks to Noam Shoresh for extensive discussions that made it possible for me to reproduce his methods and for providing his raw data to check the results from these tools.

META FILEATTACHMENT	attachment="md_v0_01.tgz" attr="" comment="" date="1222016303" name="md_v0_01.tgz" path="md_v0_01.tgz" size="451651" stream="md_v0_01.tgz" user="Main.JeffreyBarrick" version="1"

Revision 72008-09-21 - JeffreyBarrick

META TOPICPARENT	name="ToolList"

Warning!
This tool is under active development. Its capabilities and usage may change without warning, and it has not been adequately tested.

Marker Divergence Experiments

This workflow implements a method for extracting effective beneficial mutation rates (μ) and selection coefficients (s) from marker divergence experiments [1]. This is a way of parameterizing the evolvability of a bacterial strain.

Marker Divergence Experiments

Requirements and Installation

Added:

>
>

Download

version 0.01

21 September 2008

The Perl scripts require the module Math::Random::MT::Auto and its prerequisites for random number generation. They can be installed from CPAN. The Math::Random::MT::Auto module has a code component that must be compiled. If you have root access on a system you can probably install these from the command line as follows:


 >sudo perl -MCPAN -e shell

 >Password: ********

 >install Math::Random::MT::Auto

  answer yes to any prompts about installing prerequisites

MATLAB is required for calculating establishment probabilities.

R is required for fitting marker divergence curves. It should be present in your $PATH, so that the Perl scripts can invoke it.

You may want to add the location of the perl scripts to your $PATH. You may need to change the first line of each script to the correct path to your Perl executable if it is not located at:


#!/usr/bin/perl

Help for each Perl script can be obtained

1. Fit α and τ empirical parameters from experimental data

The basic command is:


 >marker_divergence_fit.pl -i input.tab > output.fit

Input file data format

The input file is tab-delimited. The header row begins with "transfer", and the other columns are labels indicating the name of an experimental time series of marker ratio measurements. Each following row begins with the number of the transfer followed by the marker ratio measurements for that series at that time point. Marker ratios may be given in several formats. Pass the -m option followed by ratio, log_ratio, or log10_ratio to this script, depending on the format of your data values. The default mode is ratio.

Portion of an example marker ratio input file:


transfer	exp-1 	exp-2 	exp-3	

0       	0.5087	0.5068	0.4990

3       	0.5000	0.4844	0.5174

6       	0.4853	0.5393	0.5115

9       	0.4802	0.4862	0.4522

12       0.4884	0.4431	0.5170

15       0.5277	0.5196	0.5266

18       0.4983	0.4638	0.4607

21       0.5221	0.5361	0.5000

The output file is tab-delimited, with columns containing data as labeled.

Baseline correction

If some of your experimental curves do not start at a 1:1 ratio of the neutral marker states, you will also want to pass the -b option followed by the number of initial points (not transfers) to average when estimating the baseline. The script corrects for the initial marker imbalance by fitting τ and α to a modified equation. This equation accounts for the fact that a population diverging toward marker state A, where A was initially present in less than 50% of the population, shows a shift in its marker ratio sooner than a population where A was initially present in 50% or more of the population.

Example:


 >marker_divergence_fit.pl -m log_ratio -b 5 -i input.tab > output.fit

This corrects for the baseline by taking the average of the first 5 points.

2. Generate a table of establishment probabilities with MATLAB.

The population genetics model assumes that each new beneficial mutation that is generated has a certain probability of establishing (escaping loss due to dilution) during the serial transfer regime of the experiment. A table of these probabilities must be calculated with the MATLAB script.

First, add the directory containing the two ".m" files that come with the distribution to the MATLAB path.

In MATLAB:


 >>establishment_probability_table(6.64, 5E6, 0.001, 1, 'pr_establishment_T_6.64_No_5E6.tab')

The arguments are the number of generations per transfer, the initial population size immediately after each transfer, the precision of the file to be generated, the maximum selection coefficient to consider, and the output filename. Output is a tab-delimited list of selection coefficient and probability of establishment when a new mutant has this advantage relative to the population average [2].

It is important to allow a maximum selection coefficient value several fold greater than the expected effective selection coefficient (s) because multiple mutations may occur that give a large benefit relative to the population average. Reasonable values are typically 0.0001 to 0.001 for the precision and 1 to 5 for the maximum.

A file (pr_establishment_T_6.64_N_5E6_LT.tab) is provided with the distribution that can be used for experiments conducted under the conditions of the long-term E. coli evolution experiment.

3. Simulate and fit idealized marker divergence curves

The next step is to simulate marker divergence curves generated by a simplified population genetics model where there is only one category of beneficial mutation with selection coefficient s. These beneficial mutations occur with a rate μ. Selection coefficients are defined such that w_new=w_initial (1+s). This model takes into account the population bottlenecks that occur during a serial transfer evolution experiment.

For each combination of s and μ, a distribution of the effective parameters α and τ is determined from the idealized data. Comparing the effective parameters extracted from the experimental data to all of these distributions allows the maximum likelihood values of s and μ that best explain the experimental data to be determined.

3.1 Simulate marker divergence data for a set of μ and s effective beneficial mutation parameters


 >marker_divergence_pop_gen_simulation.pl -T 6.64 -N 5E6 -u 1E-8 -s 0.08 -p pr_establishment_T_6.64_No_5E6.tab -k 22 -i 3 -r 100 > pop_gen_s_0.08_u_1E-8.tab

The options are: generations per transfer (-T), initial population size at the beginning of each growth cycle (-N), per-generation mutation rate (-u), per-generation selection coefficient (-s), file of establishment probabilities produced by MATLAB (-p), number of generations of outgrowth before any data are printed (-k), interval in transfers between printed marker ratios (-i), and number of simulation replicates to perform (-r).

3.2 Fit α and τ empirical parameters from simulated curves

This step is the same as that used to fit the experimental data.


 >marker_divergence_fit.pl -i

3.3 Automating and parallelizing this step

Generally, many combinations of μ and s must be calculated to determine the maximum likelihood effective parameters. This script combines the two previous steps to serially create marker ratio and fit files over a range of these parameters.


 >marker_divergence_background_model.pl -T 6.64 -N 5E6 -u -8:-6:0.5 -s 0.06:0.2:0.02 -p pr_establishment_T_6.64_No_5E6.tab -i 1  -r 100 -k 22

Parameters are the same as in marker_divergence_pop_gen_simulation.pl, except -u and -s are supplied as start:end:step_size combinations, and -u is in log10 units, i.e. passing a value of -8 gives a mutation rate of 10^-8.

This procedure can be farmed out to a computer cluster. Consolidate all of the output files into a single directory before proceeding to the next step.

4. Determine the maximum likelihood effective parameters

Finally, we determine what values of μ and s produce α and τ distributions in the simulated data that are statistically indistinguishable from those fit from the experimental data.


 >marker_divergence_significance.pl -i experimental.fit -d path/to/simulation/fits > experimental.sig

The output file experimental.sig has starred lines where the experimental data and simulations agree. Since α and τ are not independent, this represents a greater than 95% confidence cloud.

References

Hegreness, M., Shoresh, N., Hartl, D., and Kishony, R. (2006) An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311, 1615-1617.
Wahl, L.M., and Gerrish, P.J. (2001) The probability that beneficial mutations are lost in populations with periodic bottlenecks. Evolution 55, 2606-2610.

Acknowledgments

Many thanks to Noam Shoresh for extensive discussions that made it possible for me to reproduce his methods and for providing his raw data to check the results from these tools.

Added:

>
>

META FILEATTACHMENT	attachment="md_v0_01.tgz" attr="" comment="" date="1222016303" name="md_v0_01.tgz" path="md_v0_01.tgz" size="451651" stream="md_v0_01.tgz" user="Main.JeffreyBarrick" version="1"

Revision 62008-09-20 - JeffreyBarrick

META TOPICPARENT	name="ToolList"

Added:

>
>

Warning!
This tool is under active development. Its capabilities and usage may change without warning, and it has not been adequately tested.

Marker Divergence Experiments

This workflow implements a method for extracting effective beneficial mutation rates (μ) and selection coefficients (s) from marker divergence experiments [1]. This is a way of parameterizing the evolvability of a bacterial strain.

Marker Divergence Experiments

Requirements and Installation

The Perl scripts require the module Math::Random::MT::Auto and its prerequisites for random number generation. They can be installed from CPAN. The Math::Random::MT::Auto module has a code component that must be compiled. If you have root access on a system you can probably install these from the command line as follows:


 >sudo perl -MCPAN -e shell

 >Password: ********

 >install Math::Random::MT::Auto

  answer yes to any prompts about installing prerequisites

MATLAB is required for calculating establishment probabilities.

Changed:

<
< R is required for fitting marker divergence curves. It should be present in your $PATH, so that Perl scripts can invoke it.

>
> R is required for fitting marker divergence curves. It should be present in your $PATH, so that the Perl scripts can invoke it.

Added:

>
>

You may want to add the location of the perl scripts to your $PATH. You may need to change the first line of each script to the correct path to your Perl executable if it is not located at:


#!/usr/bin/perl

Help for each Perl script can be obtained

1. Fit α and τ empirical parameters from experimental data

The basic command is:


 >marker_divergence_fit.pl -i input.tab > output.fit

Input file data format

The input file is tab-delimited. The header row begins with "transfer", and the other columns are labels indicating the name of an experimental time series of marker ratio measurements. Each following row begins with the number of the transfer followed by the marker ratio measurements for that series at that time point. Marker ratios may be given in several formats. Pass the -m option followed by ratio, log_ratio, or log10_ratio to this script, depending on the format of your data values. The default mode is ratio.

Portion of an example marker ratio input file:


transfer	exp-1 	exp-2 	exp-3	

0       	0.5087	0.5068	0.4990

3       	0.5000	0.4844	0.5174

6       	0.4853	0.5393	0.5115

9       	0.4802	0.4862	0.4522

12       0.4884	0.4431	0.5170

15       0.5277	0.5196	0.5266

18       0.4983	0.4638	0.4607

21       0.5221	0.5361	0.5000

Added:

>
> The output file is tab-delimited, with columns containing data as labeled.

Baseline correction

If some of your experimental curves do not start at a 1:1 ratio of the neutral marker states, you will also want to pass the -b option followed by the number of initial points (not transfers) to average when estimating the baseline. The script corrects for the initial marker imbalance by fitting τ and α to a modified equation. This equation accounts for the fact that a population diverging toward marker state A, where A was initially present in less than 50% of the population, shows a shift in its marker ratio sooner than a population where A was initially present in 50% or more of the population.

Example:


 >marker_divergence_fit.pl -m log_ratio -b 5 -i input.tab > output.fit

This corrects for the baseline by taking the average of the first 5 points.

2. Generate a table of establishment probabilities with MATLAB.

The population genetics model assumes that each new beneficial mutation that is generated has a certain probability of establishing (escaping loss due to dilution) during the serial transfer regime of the experiment. A table of these probabilities must be calculated with the MATLAB script.

First, add the directory containing the two ".m" files that come with the distribution to the MATLAB path.

In MATLAB:


 >>establishment_probability_table(6.64, 5E6, 0.001, 1, 'pr_establishment_T_6.64_No_5E6.tab')

The arguments are the number of generations per transfer, the initial population size immediately after each transfer, the precision of the file to be generated, the maximum selection coefficient to consider, and the output filename. Output is a tab-delimited list of selection coefficient and probability of establishment when a new mutant has this advantage relative to the population average [2].

It is important to allow a maximum selection coefficient value several fold greater than the expected effective selection coefficient (s) because multiple mutations may occur that give a large benefit relative to the population average. Reasonable values are typically 0.0001 to 0.001 for the precision and 1 to 5 for the maximum.

Added:

>
> A file (pr_establishment_T_6.64_N_5E6_LT.tab) is provided with the distribution that can be used for experiments conducted under the conditions of the long-term E. coli evolution experiment.

3. Simulate and fit idealized marker divergence curves

The next step is to simulate marker divergence curves generated by a simplified population genetics model where there is only one category of beneficial mutation with selection coefficient s. These beneficial mutations occur with a rate μ. Selection coefficients are defined such that w_new=w_initial (1+s). This model takes into account the population bottlenecks that occur during a serial transfer evolution experiment.

For each combination of s and μ, a distribution of the effective parameters α and τ is determined from the idealized data. Comparing the effective parameters extracted from the experimental data to all of these distributions allows the maximum likelihood values of s and μ that best explain the experimental data to be determined.

3.1 Simulate marker divergence data for a set of μ and s effective beneficial mutation parameters


 >marker_divergence_pop_gen_simulation.pl -T 6.64 -N 5E6 -u 1E-8 -s 0.08 -p pr_establishment_T_6.64_No_5E6.tab -k 22 -i 3 -r 100 > pop_gen_s_0.08_u_1E-8.tab

The options are: generations per transfer (-T), initial population size at the beginning of each growth cycle (-N), per-generation mutation rate (-u), per-generation selection coefficient (-s), file of establishment probabilities produced by MATLAB (-p), number of generations of outgrowth before any data are printed (-k), interval in transfers between printed marker ratios (-i), and number of simulation replicates to perform (-r).

3.2 Fit α and τ empirical parameters from simulated curves

This step is the same as that used to fit the experimental data.


 >marker_divergence_fit.pl -i

3.3 Automating and parallelizing this step

Generally, many combinations of μ and s must be calculated to determine the maximum likelihood effective parameters. This script combines the two previous steps to serially create marker ratio and fit files over a range of these parameters.


 >marker_divergence_background_model.pl -T 6.64 -N 5E6 -u -8:-6:0.5 -s 0.06:0.2:0.02 -p pr_establishment_T_6.64_No_5E6.tab -i 1  -r 100 -k 22

Changed:

<
< Parameters are the same as in marker_divergence_pop_gen_simulation.pl, except -u and -s are supplied as start:end:step_size combinations, and -u is in log10 units, i.e. 10^-u.

>
> Parameters are the same as in marker_divergence_pop_gen_simulation.pl, except -u and -s are supplied as start:end:step_size combinations, and -u is in log10 units, i.e. passing a value of -8 gives a mutation rate of 10^-8.

This procedure can be farmed out to a computer cluster. Consolidate all of the output files into a single directory before proceeding to the next step.

4. Determine the maximum likelihood effective parameters

Finally, we determine what values of μ and s produce α and τ distributions in the simulated data that are statistically indistinguishable from those fit from the experimental data.


 >marker_divergence_significance.pl -i experimental.fit -d path/to/simulation/fits > experimental.sig

The output file experimental.sig has starred lines where the experimental data and simulations agree. Since α and τ are not independent, this represents a greater than 95% confidence cloud.

References

Hegreness, M., Shoresh, N., Hartl, D., and Kishony, R. (2006) An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311, 1615-1617.
Wahl, L.M., and Gerrish, P.J. (2001) The probability that beneficial mutations are lost in populations with periodic bottlenecks. Evolution 55, 2606-2610.

Acknowledgments

Many thanks to Noam Shoresh for extensive discussions that made it possible for me to reproduce his methods and for providing his raw data to check the results from these tools.

Revision 52008-09-20 - JeffreyBarrick

META TOPICPARENT	name="ToolList"

Marker Divergence Experiments

This workflow implements a method for extracting effective beneficial mutation rates (μ) and selection coefficients (s) from marker divergence experiments [1]. This is a way of parameterizing the evolvability of a bacterial strain.

Marker Divergence Experiments

Requirements and Installation

The Perl scripts require the module Math::Random::MT::Auto and its prerequisites for random number generation. They can be installed from CPAN. The Math::Random::MT::Auto module has a code component that must be compiled. If you have root access on a system you can probably install these from the command line as follows:


 >sudo perl -MCPAN -e shell

 >Password: ********

 >install Math::Random::MT::Auto

  answer yes to any prompts about installing prerequisites

MATLAB is required for calculating establishment probabilities.

R is required for fitting marker divergence curves. It should be present in your $PATH, so that Perl scripts can invoke it.

1. Fit α and τ empirical parameters from experimental data

The basic command is:


 >marker_divergence_fit.pl -i input.tab > output.fit

Input file data format

The input file is tab-delimited. The header row begins with "transfer", and the other columns are labels naming each experimental time series of marker ratio measurements. Each following row begins with the transfer number, followed by the marker ratio measurement for each series at that time point. Marker ratios may be given in several formats. Pass the -m option followed by ratio, log_ratio, or log10_ratio to this script, depending on the format of your data values. The default mode is ratio.

Portion of an example marker ratio input file:


transfer	exp-1	exp-2	exp-3
0	0.5087	0.5068	0.4990
3	0.5000	0.4844	0.5174
6	0.4853	0.5393	0.5115
9	0.4802	0.4862	0.4522
12	0.4884	0.4431	0.5170
15	0.5277	0.5196	0.5266
18	0.4983	0.4638	0.4607
21	0.5221	0.5361	0.5000
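To check a file against this layout before fitting, a small reader along the following lines may help. It is a Python sketch, not part of the md package, and the function name is hypothetical. It splits on any whitespace rather than strictly on tabs, which tolerates stray spaces but assumes series labels contain no spaces.

```python
def read_marker_ratios(lines):
    """Parse a header row plus data rows into {series: [(transfer, value), ...]}."""
    rows = [line.split() for line in lines if line.strip()]
    header, data = rows[0], rows[1:]
    if header[0] != "transfer":
        raise ValueError('header row must begin with "transfer"')
    series = {name: [] for name in header[1:]}
    for row in data:
        transfer = int(row[0])
        for name, value in zip(header[1:], row[1:]):
            series[name].append((transfer, float(value)))
    return series

# First rows of the example file shown above.
example = [
    "transfer\texp-1\texp-2\texp-3\n",
    "0\t0.5087\t0.5068\t0.4990\n",
    "3\t0.5000\t0.4844\t0.5174\n",
]
parsed = read_marker_ratios(example)
```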

Baseline correction

If some of your experimental curves do not start at a 1:1 ratio of the neutral marker states, also pass the -b option followed by the number of initial points (not transfers) to average for the baseline. The script corrects for the initial marker imbalance by fitting τ and α to a modified equation. The modification accounts for the fact that a population diverging toward marker state A, where A was initially present in less than 50% of the population, will show a shifted marker ratio sooner than a population where A was initially present in 50% or more of the population.

Example:


 >marker_divergence_fit.pl -m log_ratio -b 5 -i input.tab > output.fit

This command corrects for the baseline by taking the average of the first 5 points.
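The averaging itself is simple arithmetic, sketched here in Python with the exp-1 column from the example input file. The function name is hypothetical, and how the fitting script then uses this baseline inside its modified equation is not shown here.

```python
def baseline(values, n_points):
    """Average the first n_points measurements (points, not transfers)."""
    if not 0 < n_points <= len(values):
        raise ValueError("n_points must be between 1 and the number of values")
    return sum(values[:n_points]) / n_points

# exp-1 series from the example input file (transfers 0 through 21).
exp1 = [0.5087, 0.5000, 0.4853, 0.4802, 0.4884, 0.5277, 0.4983, 0.5221]
exp1_baseline = baseline(exp1, 5)
```

For this series the first five points average to about 0.4925, slightly below an even 1:1 starting ratio.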

2. Generate a table of establishment probabilities with MATLAB.

The population genetics model assumes that each new beneficial mutation that is generated has a certain probability of establishing (escaping loss due to dilution) during the serial transfer regime of the experiment. A table of these probabilities must be calculated with the MATLAB script.

First, add the directory containing the two ".m" files that come with the distribution to the MATLAB path.

In MATLAB:


 >>establishment_probability_table(6.64, 5E6, 0.001, 1, 'pr_establishment_T_6.64_No_5E6.tab')

The arguments are the number of generations per transfer, the initial population size immediately after each transfer, the precision of the file to be generated, the maximum selection coefficient to consider, and the output filename. Output is a tab-delimited list of selection coefficient and probability of establishment when a new mutant has this advantage relative to the population average [2].

It is important to allow a maximum selection coefficient value several fold greater than the expected effective selection coefficient (s) because multiple mutations may occur that give a large benefit relative to the population average. Reasonable values are typically 0.0001 to 0.001 for the precision and 1 to 5 for the maximum.
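As an illustration of how such a table might be consulted, the Python sketch below reads two-column rows and returns the probability at the grid point nearest to a requested selection coefficient. The function names are hypothetical, nearest-point lookup is an assumption (the Perl simulation may interpolate or round differently), and the three example rows are placeholder numbers, not real establishment probabilities.

```python
import bisect

def read_establishment_table(lines):
    """Parse 'selection coefficient <tab> probability' rows into sorted tuples."""
    table = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            table.append((float(fields[0]), float(fields[1])))
        except ValueError:
            continue  # skip a header row, if the file has one
    table.sort()
    return table

def lookup(table, s):
    """Return the probability at the grid point nearest to s."""
    coefficients = [row[0] for row in table]
    i = bisect.bisect_left(coefficients, s)
    candidates = table[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda row: abs(row[0] - s))[1]

# Placeholder rows on a 0.001 grid; the probabilities are invented.
example = ["0.000\t0.0\n", "0.001\t0.1\n", "0.002\t0.2\n"]
table = read_establishment_table(example)
nearest = lookup(table, 0.0012)
```

A request beyond the largest tabulated coefficient falls back to the last row, which is one reason to generate the table with a generous maximum.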

3. Simulate and fit idealized marker divergence curves

The next step is to simulate marker divergence curves generated by a simplified population genetics model where there is only one category of beneficial mutation with selection coefficient s. These beneficial mutations occur with a rate μ. Selection coefficients are defined such that w_new=w_initial (1+s). This model takes into account the population bottlenecks that occur during a serial transfer evolution experiment.
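The fitness definition and the bottleneck size can be made concrete with a little arithmetic, sketched in Python below. Two assumptions are made that are not stated above: successive beneficial mutations each multiply fitness by (1+s), and each generation is one doubling, so that the generations per transfer fix the fold growth between bottlenecks.

```python
def relative_fitness(n_mutations, s, w_initial=1.0):
    """Apply w_new = w_initial * (1 + s) once per beneficial mutation."""
    return w_initial * (1.0 + s) ** n_mutations

def fold_growth_per_transfer(generations):
    """Fold increase in population size over one growth cycle (binary fission)."""
    return 2.0 ** generations

single = relative_fitness(1, 0.08)
double = relative_fitness(2, 0.08)
growth = fold_growth_per_transfer(6.64)
```

With s = 0.08, one mutation gives a relative fitness of 1.08 and two give about 1.166. With 6.64 generations per transfer, each growth cycle is a roughly 100-fold expansion, which is consistent with a 1:100 dilution at each transfer.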

For each combination of s and μ, a distribution of the effective parameters α and τ is determined from the idealized data. Comparing the effective parameters extracted from the experimental data to all of these distributions allows the maximum likelihood values of s and μ that best explain the experimental data to be determined.

3.1 Simulate marker divergence data for a set of μ and s effective beneficial mutation parameters


 >marker_divergence_pop_gen_simulation.pl -T 6.64 -N 5E6 -u 1E-8 -s 0.08 -p pr_establishment_T_6.64_No_5E6.tab -k 22 -i 3 -r 100 > pop_gen_s_0.08_u_1E-8.tab

The options are: generations per transfer (-T), initial population size at the beginning of each growth cycle (-N), per-generation mutation rate (-u), per-generation selection coefficient (-s), file of establishment probabilities produced by MATLAB (-p), number of generations of outgrowth before any data are printed (-k), interval in transfers between printed marker ratios (-i), and number of simulation replicates to perform (-r).

3.2 Fit α and τ empirical parameters from simulated curves

This step is the same as that used to fit the experimental data.


 >marker_divergence_fit.pl -i

3.3 Automating and parallelizing this step

Generally, many combinations of μ and s must be calculated to determine the maximum likelihood effective parameters. This script combines the two previous steps to serially create marker ratio and fit files over a range of these parameters.


 >marker_divergence_background_model.pl -T 6.64 -N 5E6 -u -8:-6:0.5 -s 0.06:0.2:0.02 -p pr_establishment_T_6.64_No_5E6.tab -i 1  -r 100 -k 22

Parameters are the same as in marker_divergence_pop_gen_simulation.pl, except -u and -s are supplied as start:end:step_size combinations, and -u is in log10 units, i.e. passing a value of -8 gives a mutation rate of 10^-8.

This procedure can be farmed out to a computer cluster. Consolidate all of the output files into a single directory before proceeding to the next step.

4. Determine the maximum likelihood effective parameters

Finally, we determine what values of μ and s produce α and τ distributions in the simulated data that are statistically indistinguishable from those fit from the experimental data.


 >marker_divergence_significance.pl -i experimental.fit -d path/to/simulation/fits > experimental.sig

The output file experimental.sig has starred lines where the experimental data and simulations agree. Since α and τ are not independent, this represents a greater than 95% confidence cloud.

References

1. Hegreness, M., Shoresh, N., Hartl, D., and Kishony, R. (2006) An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311, 1615-1617.
2. Wahl, L.M., and Gerrish, P.J. (2001) The probability that beneficial mutations are lost in populations with periodic bottlenecks. Evolution 55, 2606-2610.

Acknowledgments

Many thanks to Noam Shoresh for extensive discussions that made it possible for me to reproduce his methods and for providing his raw data to check the results from these tools.

Revision 4 2008-09-19 - JeffreyBarrick

  META TOPICPARENT 
 name="ToolList" 

 Marker Divergence Experiments 

This workflow implements a method for extracting effective beneficial mutation rates (μ) and selection coefficients (s) from marker divergence experiments [1]. This is a way of parameterizing the evolvability of a bacterial strain.

 
  Marker Divergence Experiments 
  Example Data
   Requirements and Installation
   1. Fit α and τ empirical parameters from experimental data 
  Input file data format
   Baseline correction
 
   2. Generate a table of establishment probabilities with MATLAB.
   3. Simulate and fit idealized marker divergence curves 
  3.1 Simulate marker divergence data for a set of μ and s effective beneficial mutation parameters
   3.2 Fit α and τ empirical parameters from simulated curves
   3.3 Automating and parallelizing this step
 
   4. Determine the maximum likelihood effective parameters
   References
   Acknowledgments
 
 


 Requirements and Installation 

The Perl scripts require the module Math::Random::MT::Auto and its prerequisites for random number generation. They can be installed from CPAN. The Math::Random::MT::Auto module has a code component that must be compiled. If you have root access on a  system you can probably install these from the command line as follows:


 >sudo perl -MCPAN -e shell

 >Password: ********

 >install Math::Random::MT::Auto

  answer yes to any prompts about installing prerequisites


MATLAB is required for calculating establishment probabilities.

R is required for fitting marker divergence curves. It should be present in your $PATH, so that Perl scripts can invoke it.
- META TOPICPARENT
+ name="ToolList"
-<
<
+. Fit α and τ Empirical Parameters from Experimental Data
->
>
+. Fit α and τ empirical parameters from experimental data
 The basic command is:

 >marker_divergence_fit.pl -i input.tab > output.fit
-<
<
+ Input File Data Format
->
>
+ Input file data format
 The input file is tab-delimited. The header row begins with "transfer", and the other columns are labels naming each experimental time series of marker ratio measurements. Each following row begins with the transfer number, followed by the marker ratio measurement for each series at that time point. Marker ratios may be given in several formats. Pass the -m option followed by ratio, log_ratio, or log10_ratio to this script, depending on the format of your data values. The default mode is ratio.
-<
<
+Portion of an example marker ratio input file:
->
>
+Portion of an example marker ratio input file:
 transfer	exp-1 	exp-2 	exp-3	

0       	0.5087	0.5068	0.4990

3       	0.5000	0.4844	0.5174

6       	0.4853	0.5393	0.5115

9       	0.4802	0.4862	0.4522

12       0.4884	0.4431	0.5170

15       0.5277	0.5196	0.5266

18       0.4983	0.4638	0.4607

21       0.5221	0.5361	0.5000
-<
<
+ Baseline Correction
->
>
+ Baseline correction
 If some of your experimental curves do not start at a 1:1 ratio of the neutral marker states, also pass the -b option followed by the number of initial points (not transfers) to average for the baseline. The script corrects for the initial marker imbalance by fitting τ and α to a modified equation. The modification accounts for the fact that a population diverging toward marker state A, where A was initially present in less than 50% of the population, will show a shifted marker ratio sooner than a population where A was initially present in 50% or more of the population.

Example:

 >marker_divergence_fit.pl -m log_ratio -b 5 -i input.tab > output.fit

This command corrects for the baseline by taking the average of the first 5 points.
-<
<
+. Generate a Table of establishment probabilities with MATLAB.
->
>
+. Generate a table of establishment probabilities with MATLAB.
-<
<
+The population genetics model assumes that each new beneficial mutation that is generated has a certain probability of establishing (not going extinct) during the serial transfer regime of the experiment. A table of these probabilities must be calculated with the MATLAB script.
->
>
+The population genetics model assumes that each new beneficial mutation that is generated has a certain probability of establishing (escaping loss due to dilution) during the serial transfer regime of the experiment. A table of these probabilities must be calculated with the MATLAB script.
 First, add the directory containing the two ".m" files that come with the distribution to the MATLAB path.

In MATLAB:
-<
<
+ >>establishment_probability_table(6.64, 5E6, 0.001, 1, 'pr_establishment_T=6.64_No=5E6.tab')
->
>
+ >>establishment_probability_table(6.64, 5E6, 0.001, 1, 'pr_establishment_T_6.64_No_5E6.tab')
-<
<
+. Generate α and τ Values from the Population Genetics Simulation
->
>
+The arguments are the number of generations per transfer, the initial population size immediately after each transfer, the precision of the file to be generated, the maximum selection coefficient to consider, and the output filename. Output is a tab-delimited list of selection coefficient and probability of establishment when a new mutant has this advantage relative to the population average [2].
-<
<
+.1 Generate Simulated Marker Divergence Data for a μ and s Combination
->
>
+It is important to allow a maximum selection coefficient value several fold greater than the expected effective selection coefficient (s) because multiple mutations may occur that give a large benefit relative to the population average. Reasonable values are typically 0.0001 to 0.001 for the precision and 1 to 5 for the maximum.
-<
<
+.2 Fit α and τ Empirical Parameters from Experimental Data
->
>
+. Simulate and fit idealized marker divergence curves
-<
<
+.3 Helper Script: Automating this Step
->
>
+The next step is to simulate marker divergence curves generated by a simplified population genetics model where there is only one category of beneficial mutation with selection coefficient s. These beneficial mutations occur with a rate μ. Selection coefficients are defined such that w_new=w_initial (1+s). This model takes into account the population bottlenecks that occur during a serial transfer evolution experiment.
-<
<
+Determine the Effective Parameters where Simulations and Experimental Data Produce the Same Distributions of Empirical Parameters
->
>
+For each combination of s and μ, a distribution of the effective parameters α and τ is determined from the idealized data. Comparing the effective parameters extracted from the experimental data to all of these distributions allows the maximum likelihood values of s and μ that best explain the experimental data to be determined.
->
>
+.1 Simulate marker divergence data for a set of μ and s effective beneficial mutation parameters 


 >marker_divergence_pop_gen_simulation.pl -T 6.64 -N 5E6 -u 1E-8 -s 0.08 -p pr_establishment_T_6.64_No_5E6.tab -k 22 -i 3 -r 100 > pop_gen_s_0.08_u_1E-8.tab


The options are: generations per transfer (-T), initial population size at the beginning of each growth cycle (-N), per-generation mutation rate (-u), per-generation selection coefficient (-s), file of establishment probabilities produced by MATLAB (-p), number of generations of outgrowth before any data are printed (-k), interval in transfers between printed marker ratios (-i), and number of simulation replicates to perform (-r).

 3.2 Fit α and τ empirical parameters from simulated curves 

This step is the same as that used to fit the experimental data.


 >marker_divergence_fit.pl -i 


 3.3 Automating and parallelizing this step 

Generally, many combinations of μ and s must be calculated to determine the maximum likelihood effective parameters. This script combines the two previous steps to serially create marker ratio and fit files over a range of these parameters. 


 >marker_divergence_background_model.pl -T 6.64 -N 5E6 -u -8:-6:0.5 -s 0.06:0.2:0.02 -p pr_establishment_T_6.64_No_5E6.tab -i 1  -r 100 -k 22 


Parameters are the same as in marker_divergence_pop_gen_simulation.pl, except -u and -s are supplied as start:end:step_size combinations, and -u is in log10 units, i.e. passing a value of -8 gives a mutation rate of 10^-8.

This procedure can be farmed out to a computer cluster. Consolidate all of the output files into a single directory before proceeding to the next step. 

 4. Determine the maximum likelihood effective parameters 

Finally, we determine what values of μ and s produce α and τ distributions in the simulated data that are statistically indistinguishable from those fit from the experimental data.


 >marker_divergence_significance.pl -i experimental.fit -d path/to/simulation/fits > experimental.sig


The output file experimental.sig has starred lines where the experimental data and simulations agree. Since α and τ are not independent, this represents a greater than 95% confidence cloud.
  References 
 
 Hegreness, M., Shoresh, N., Hartl, D., and Kishony, R. (2006) An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311, 1615-1617.
  Wahl, L.M., and Gerrish, P.J. (2001) The probability that beneficial mutations are lost in populations with periodic bottlenecks. Evolution 55, 2606-2610.
 

 Acknowledgments 
Many thanks to Noam Shoresh for extensive discussions that made it possible for me to reproduce his methods and for providing his raw data to check the results from these tools.

Revision 3 2008-09-19 - JeffreyBarrick

  META TOPICPARENT 
 name="ToolList" 

 Marker Divergence Experiments 

This workflow implements a method for extracting effective beneficial mutation rates (μ) and selection coefficients (s) from marker divergence experiments [1]. This is a way of parameterizing the evolvability of a bacterial strain.

 
  Marker Divergence Experiments 
  Example Data
   Requirements and Installation
   1. Fit α and τ empirical parameters from experimental data 
  Input file data format
   Baseline correction
 
   2. Generate a table of establishment probabilities with MATLAB.
   3. Simulate and fit idealized marker divergence curves 
  3.1 Simulate marker divergence data for a set of μ and s effective beneficial mutation parameters
   3.2 Fit α and τ empirical parameters from simulated curves
   3.3 Automating and parallelizing this step
 
   4. Determine the maximum likelihood effective parameters
   References
   Acknowledgments
 
 


 Requirements and Installation 

The Perl scripts require the module Math::Random::MT::Auto and its prerequisites for random number generation. They can be installed from CPAN. The Math::Random::MT::Auto module has a code component that must be compiled. If you have root access on a  system you can probably install these from the command line as follows:


 >sudo perl -MCPAN -e shell

 >Password: ********

 >install Math::Random::MT::Auto
- META TOPICPARENT
+ name="ToolList"
-<
<
+  _answer yes to any prompts about installing prerequisites_
->
>
+  answer yes to any prompts about installing prerequisites
->
>
+MATLAB is required for calculating establishment probabilities.

R is required for fitting marker divergence curves. It should be present in your $PATH, so that Perl scripts can invoke it.
 . Fit α and τ Empirical Parameters from Experimental Data 

The basic command is:
-<
<
+ >sudo perl -MCPAN -e shell
->
>
+ >marker_divergence_fit.pl -i input.tab > output.fit
-<
<
+ >Password: ********

 >install Math::Random::MT::Auto

  _answer yes to any prompts about installing prerequisites_
  Input File Data Format
-<
<
+The input file is a tab-delimited.
->
>
+The input file is tab-delimited. The header row begins with "transfer", and the other columns are labels naming each experimental time series of marker ratio measurements. Each following row begins with the transfer number, followed by the marker ratio measurement for each series at that time point. Marker ratios may be given in several formats. Pass the -m option followed by ratio, log_ratio, or log10_ratio to this script, depending on the format of your data values. The default mode is ratio.
-<
<
+You should pass the -m option followed by ratio, log_ratio, log10_ratio to this script depending on the format of you data values.
->
>
+Portion of an example marker ratio input file:
->
>
+transfer	exp-1 	exp-2 	exp-3	

0       	0.5087	0.5068	0.4990

3       	0.5000	0.4844	0.5174

6       	0.4853	0.5393	0.5115

9       	0.4802	0.4862	0.4522

12       0.4884	0.4431	0.5170

15       0.5277	0.5196	0.5266

18       0.4983	0.4638	0.4607

21       0.5221	0.5361	0.5000
  Baseline Correction
-<
<
+If some of your experimental curves do not start at a 1:1 ratio of the neutral marker states, you will also want to pass the -b option followed by the number of initial points (not transfers) . The script corrects for the initial marker imbalance by fitting τ and α to a modified equation that accounts for the fact that for a population diverging toward marker state A where A was initially present in less than 50% of the population, the marker ratio will be shifted sooner than in a population where it was initially present in 50% or more of the population.
->
>
+If some of your experimental curves do not start at a 1:1 ratio of the neutral marker states, also pass the -b option followed by the number of initial points (not transfers) to average for the baseline. The script corrects for the initial marker imbalance by fitting τ and α to a modified equation. The modification accounts for the fact that a population diverging toward marker state A, where A was initially present in less than 50% of the population, will show a shifted marker ratio sooner than a population where A was initially present in 50% or more of the population.
->
>
+Example:
-<
<
+ >sudo perl -MCPAN -e shell
->
>
+ >marker_divergence_fit.pl -m log_ratio -b 5 -i input.tab > output.fit
-<
<
+ >Password: ********

 >install Math::Random::MT::Auto

  _answer yes to any prompts about installing prerequisites_
-<
<
 Corrects for the baseline by taking the average of the first 5 points.

 2. Generate a Table of establishment probabilities with MATLAB.
-<
<
+. Generate α and τ Values from a Population Genetics Simulation
->
>
+The population genetics model assumes that each new beneficial mutation that is generated has a certain probability of establishing (not going extinct) during the serial transfer regime of the experiment. A table of these probabilities must be calculated with the MATLAB script.
-<
<
+.1 Generate Simulated Data for a μ and s Combination
->
>
+First, add the directory containing the two ".m" files that come with the distribution to the MATLAB path.
->
>
+In MATLAB:

 >>establishment_probability_table(6.64, 5E6, 0.001, 1, 'pr_establishment_T=6.64_No=5E6.tab')


 3. Generate α and τ Values from the Population Genetics Simulation 

 3.1 Generate Simulated Marker Divergence Data for a μ and s Combination
 .2 Fit α and τ Empirical Parameters from Experimental Data 

 3.3 Helper Script: Automating this Step 

 4 Determine the Effective Parameters where Simulations and Experimental Data Produce the Same Distributions of Empirical Parameters 

 References 
 
 Hegreness, M., Shoresh, N., Hartl, D., and Kishony, R. (2006) An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311, 1615-1617.
  Wahl, L.M., and Gerrish, P.J. (2001) The probability that beneficial mutations are lost in populations with periodic bottlenecks. Evolution 55, 2606-2610.
 

 Acknowledgments
-<
<
+Many thanks to Noam Shoresh for extensive discussions that made it possible for me to reproduce his methods and providing his raw data to check the results from these tools.
->
>
+Many thanks to Noam Shoresh for extensive discussions that made it possible for me to reproduce his methods and for providing his raw data to check the results from these tools.

Revision 2 2008-09-19 - JeffreyBarrick

  META TOPICPARENT 
 name="ToolList" 

 Marker Divergence Experiments
- META TOPICPARENT
+ name="ToolList"
-<
<
+This workflow implements a method for extracting effective beneficial mutation rates (μ) and selection coefficients (s)from marker divergence experiments [1]. This is a way of parameterizing the evolvability of a bacterial strain.
->
>
+This workflow implements a method for extracting effective beneficial mutation rates (μ) and selection coefficients (s) from marker divergence experiments [1]. This is a way of parameterizing the evolvability of a bacterial strain.
   Marker Divergence Experiments 
  Example Data
   Requirements and Installation
   1. Fit α and τ empirical parameters from experimental data 
  Input file data format
   Baseline correction
 
   2. Generate a table of establishment probabilities with MATLAB.
   3. Simulate and fit idealized marker divergence curves 
  3.1 Simulate marker divergence data for a set of μ and s effective beneficial mutation parameters
   3.2 Fit α and τ empirical parameters from simulated curves
   3.3 Automating and parallelizing this step
 
   4. Determine the maximum likelihood effective parameters
   References
   Acknowledgments
 
 


 Requirements and Installation
-<
<
+The Perl scripts require the module Math::Random::MT::Auto and its prerequisites. They can be installed from CPAN. The Math::Random::MT::Auto module has a code component that must be compiled. On many systems you can install these from the command line as follows:
->
>
+The Perl scripts require the module Math::Random::MT::Auto and its prerequisites for random number generation. They can be installed from CPAN. The Math::Random::MT::Auto module has a code component that must be compiled. If you have root access on a  system you can probably install these from the command line as follows:
  >sudo perl -MCPAN -e shell

 >Password: ********

 >install Math::Random::MT::Auto
-<
<
+answer yes to any prompts about installing prerequisites
->
>
+  _answer yes to any prompts about installing prerequisites_
 . Fit α and τ Empirical Parameters from Experimental Data
->
>
+The basic command is:

 >sudo perl -MCPAN -e shell

 >Password: ********

 >install Math::Random::MT::Auto

  _answer yes to any prompts about installing prerequisites_


 Input File Data Format 

The input file is tab-delimited.

You should pass the -m option followed by ratio, log_ratio, or log10_ratio to this script, depending on the format of your data values.

 Baseline Correction 

If some of your experimental curves do not start at a 1:1 ratio of the neutral marker states, also pass the -b option followed by the number of initial points (not transfers) to average for the baseline. The script corrects for the initial marker imbalance by fitting τ and α to a modified equation. The modification accounts for the fact that a population diverging toward marker state A, where A was initially present in less than 50% of the population, will show a shifted marker ratio sooner than a population where A was initially present in 50% or more of the population.


 >sudo perl -MCPAN -e shell

 >Password: ********

 >install Math::Random::MT::Auto

  _answer yes to any prompts about installing prerequisites_


Corrects for the baseline by taking the average of the first 5 points.
 . Generate a Table of establishment probabilities with MATLAB. 

 3. Generate α and τ Values from a Population Genetics Simulation 

 3.1 Generate Simulated Data for a μ and s Combination 

 3.2 Fit α and τ Empirical Parameters from Experimental Data 

 3.3 Helper Script: Automating this Step 

 4 Determine the Effective Parameters where Simulations and Experimental Data Produce the Same Distributions of Empirical Parameters 

 References 
 
 Hegreness, M., Shoresh, N., Hartl, D., and Kishony, R. (2006) An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311, 1615-1617.
  Wahl, L.M., and Gerrish, P.J. (2001) The probability that beneficial mutations are lost in populations with periodic bottlenecks. Evolution 55, 2606-2610.
 

 Acknowledgments 
Many thanks to Noam Shoresh for extensive discussions that made it possible for me to reproduce his methods and for providing his raw data to check the results from these tools.

Revision 1 2008-09-18 - JeffreyBarrick

META TOPICPARENT	name="ToolList"

Marker Divergence Experiments

This workflow implements a method for extracting effective beneficial mutation rates (μ) and selection coefficients (s) from marker divergence experiments [1]. This is a way of parameterizing the evolvability of a bacterial strain.

Marker Divergence Experiments

Requirements and Installation

The Perl scripts require the module Math::Random::MT::Auto and its prerequisites. They can be installed from CPAN. The Math::Random::MT::Auto module has a code component that must be compiled. On many systems you can install these from the command line as follows:


 >sudo perl -MCPAN -e shell

 >Password: ********

 >install Math::Random::MT::Auto

answer yes to any prompts about installing prerequisites

1. Fit α and τ Empirical Parameters from Experimental Data

2. Generate a Table of establishment probabilities with MATLAB.

3. Generate α and τ Values from a Population Genetics Simulation

3.1 Generate Simulated Data for a μ and s Combination

3.2 Fit α and τ Empirical Parameters from Experimental Data

3.3 Helper Script: Automating this Step

4 Determine the Effective Parameters where Simulations and Experimental Data Produce the Same Distributions of Empirical Parameters

References

Hegreness, M., Shoresh, N., Hartl, D., and Kishony, R. (2006) An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311, 1615-1617.
Wahl, L.M., and Gerrish, P.J. (2001) The probability that beneficial mutations are lost in populations with periodic bottlenecks. Evolution 55, 2606-2610.

Acknowledgments

Many thanks to Noam Shoresh for extensive discussions that made it possible for me to reproduce his methods and for providing his raw data to check the results from these tools.

Difference: ToolsMarkerDivergence (1 vs. 14)

Revision 14 2023-09-20 - JeffreyBarrick

Marker Divergence Experiments

Example Data

Requirements and Installation

1. Fit α and τ empirical parameters from experimental data

Input file data format

Baseline correction

2. Generate a table of establishment probabilities with MATLAB.

3. Simulate and fit idealized marker divergence curves

3.1 Simulate marker divergence data for a set of μ and s effective beneficial mutation parameters

3.2 Fit α and τ empirical parameters from simulated curves

3.3 Automating and parallelizing this step

4. Determine the maximum likelihood effective parameters

References

Acknowledgments

Revision 13 2014-11-20 - JeffreyBarrick

Marker Divergence Experiments

Example Data

Requirements and Installation

1. Fit α and τ empirical parameters from experimental data

Input file data format

Baseline correction

2. Generate a table of establishment probabilities with MATLAB.

3. Simulate and fit idealized marker divergence curves

3.1 Simulate marker divergence data for a set of μ and s effective beneficial mutation parameters

3.2 Fit α and τ empirical parameters from simulated curves

3.3 Automating and parallelizing this step

4. Determine the maximum likelihood effective parameters

References

Acknowledgments

Revision 12 2014-11-20 - JeffreyBarrick

Marker Divergence Experiments

Example Data

Requirements and Installation

1. Fit α and τ empirical parameters from experimental data

Input file data format

Baseline correction

2. Generate a table of establishment probabilities with MATLAB.

3. Simulate and fit idealized marker divergence curves

3.1 Simulate marker divergence data for a set of μ and s effective beneficial mutation parameters

3.2 Fit α and τ empirical parameters from simulated curves

3.3 Automating and parallelizing this step

4. Determine the maximum likelihood effective parameters

References

Acknowledgments

Revision 11 2014-11-20 - JeffreyBarrick

Marker Divergence Experiments

Example Data

Requirements and Installation

1. Fit α and τ empirical parameters from experimental data

Input file data format

Baseline correction

2. Generate a table of establishment probabilities with MATLAB.

3. Simulate and fit idealized marker divergence curves

3.1 Simulate marker divergence data for a set of μ and s effective beneficial mutation parameters

3.2 Fit α and τ empirical parameters from simulated curves

3.3 Automating and parallelizing this step

4. Determine the maximum likelihood effective parameters

References

Acknowledgments

Revision 10 2010-09-28 - JeffreyBarrick

Marker Divergence Experiments

Requirements and Installation

1. Fit α and τ empirical parameters from experimental data

Input file data format

Baseline correction

2. Generate a table of establishment probabilities with MATLAB.

3. Simulate and fit idealized marker divergence curves

3.1 Simulate marker divergence data for a set of μ and s effective beneficial mutation parameters

3.2 Fit α and τ empirical parameters from simulated curves

3.3 Automating and parallelizing this step

4. Determine the maximum likelihood effective parameters

References

Acknowledgments

Revision 9 2008-12-27 - JeffreyBarrick

Marker Divergence Experiments

Requirements and Installation

1. Fit α and τ empirical parameters from experimental data

Input file data format