Molecular markers - computer practicals
Analysis of microsatellite (SSR) data

The presentation can be found here.
Training dataset A is here. It consists of 15
samples of Nuphar lutea genotyped for four loci (one multiplex).
Training data matrix B is here. It contains 24 diploid and 24 tetraploid
individuals of Betula scored at four loci (GenoDive format).
The fragment analysis data we generated during the lab part of the course
will be sent to you by e-mail.
A ZIP file with the necessary software is here.
Download it, unzip it and install or run the programs listed below:
GeneMarker - software for visualization and analysis of fragment analysis
data; installation required
Arlequin - software for population genetic analysis; just run the .exe file
MSA (Microsatellite Analyser) - program for the evaluation of diploid data;
just run the .exe file
PAST - simple statistics program that can also perform PCoA and build trees
from distance matrices; just run the .exe file
neighbor - program for constructing neighbour-joining (NJ) trees, part of the
PHYLIP software package; just run the .exe file
Tasks A (to work with Nuphar data)
1. Analysis of raw data using GeneMarker and manual scoring
- open the sequencer files in GeneMarker and run the analysis with 'GS500_1'
as the size standard - Were all the data analyzed successfully?
- inspect the peak patterns of all four loci in the three dye colours, using
the expected lengths given in the accompanying table - Can you recognize
homozygotes and heterozygotes? How? Can you distinguish stutter bands and
+/-A peaks? How?
- score the alleles at all loci according to the MSA scheme (code
homozygotes as two alleles of the same length); see the protocols (page 24)
for the correct file structure and use Excel to create the table
- save the data as a text file
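The scoring rule above (a single allele peak is written twice) can be sketched in Python. This is a minimal illustration under stated assumptions: stutter and +/-A peaks are assumed to be already removed, the helper names `diploid_genotype` and `genotype_row` and the sample and locus names are hypothetical, and the column order is illustrative only; the real MSA file structure is the one described in the protocols (page 24).

```python
# Hypothetical helpers: code one diploid sample as a tab-separated row in
# which every locus occupies two allele columns.
# Assumption: stutter and +/-A peaks were already discarded, so each locus has
# one (homozygote) or two (heterozygote) allele sizes left.
# Column order is illustrative; follow the MSA layout given in the protocols.


def diploid_genotype(peaks):
    """Return a sorted (allele1, allele2) pair; one peak means a homozygote."""
    if len(peaks) == 1:
        return (peaks[0], peaks[0])  # homozygote: same length written twice
    if len(peaks) == 2:
        return tuple(sorted(peaks))
    raise ValueError(f"expected 1 or 2 allele peaks, got {len(peaks)}")


def genotype_row(sample, calls):
    """calls maps locus name -> list of allele sizes (bp)."""
    fields = [sample]
    for locus in sorted(calls):
        allele1, allele2 = diploid_genotype(calls[locus])
        fields += [str(allele1), str(allele2)]
    return "\t".join(fields)


# "NL01", "locA" and "locB" are made-up names, not the course loci.
print(genotype_row("NL01", {"locA": [152], "locB": [201, 195]}))
```

Writing homozygotes as two identical columns keeps every row the same width, so the table can be saved as plain text without gaps.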
2. Calculating lengths of flanking regions
- find the sequence in GenBank using the accession number given in the
table, switch the view to FASTA and copy the sequence
- find the forward primer sequence (e.g., in Word using CTRL+F) and label it
- make the reverse complement of the reverse primer sequence (e.g., using
RC.exe), then find and label it in the sequence
- find the repetitive sequence (microsatellite motif) and label it
- subtract the length of the repeat from the total amplicon length
(including primers); the difference is the combined length of the flanking
regions
- submit all the values using this Google Form
- enter the flanking-region lengths into the data table
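The same calculation can be done programmatically. The sketch below is a minimal illustration, assuming the repeat motif is given as it appears on the GenBank strand and that the repeat is a single uninterrupted run; compound or interrupted repeats still need to be checked by eye. The function names and the toy sequence are made up; `reverse_complement` does the job of RC.exe.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (the RC.exe step for the reverse primer)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def flanking_length(template, fwd_primer, rev_primer, motif):
    """Return (amplicon length incl. primers, repeat length, flanking length)."""
    template = template.upper()
    start = template.find(fwd_primer.upper())
    if start < 0:
        raise ValueError("forward primer not found")
    rev_site = reverse_complement(rev_primer)  # how the reverse primer appears on this strand
    end = template.find(rev_site, start + len(fwd_primer))
    if end < 0:
        raise ValueError("reverse-complemented reverse primer not found")
    amplicon = template[start:end + len(rev_site)]
    # Assumption: the microsatellite is the longest uninterrupted run of the motif.
    runs = re.findall(f"(?:{motif.upper()})+", amplicon)
    repeat_len = max((len(run) for run in runs), default=0)
    return len(amplicon), repeat_len, len(amplicon) - repeat_len


# Made-up toy sequence: 3 bp + forward primer + 2 bp + (GA)6 + 2 bp
# + reverse-primer site + 3 bp
toy = "GGG" + "ATGCATGC" + "TT" + "GA" * 6 + "CC" + "TTAGCCGG" + "AAA"
print(flanking_length(toy, "ATGCATGC", "CCGGCTAA", "GA"))  # (32, 12, 20)
```

The reverse primer is always written 5'->3' on the opposite strand, which is why it has to be reverse-complemented before it can be found in the GenBank sequence.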
3. Calculations using MSA
- copy MSA into the folder containing your input data matrix
- open MSA and follow the instructions in the protocols (page 25) to
calculate F-statistics, distance matrices and input files for other software
- go through the generated result files and familiarize yourself with their
structure (use the MSA documentation)
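To see what an among-population F-statistic measures, the sketch below computes Nei's G_ST = (H_T - H_S) / H_T for one locus from diploid genotypes. This is an illustration only and not necessarily the estimator MSA reports: populations are weighted equally, no sample-size correction is applied, and the function names and data are hypothetical.

```python
from collections import Counter


def allele_freqs(genotypes):
    """Allele frequencies from a list of diploid (allele1, allele2) genotypes."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}


def gene_diversity(freqs):
    """Nei's gene diversity 1 - sum(p^2), without sample-size correction."""
    return 1.0 - sum(p * p for p in freqs.values())


def nei_gst(populations):
    """G_ST = (H_T - H_S) / H_T for one locus; populations weighted equally."""
    pop_freqs = [allele_freqs(pop) for pop in populations]
    h_s = sum(gene_diversity(f) for f in pop_freqs) / len(pop_freqs)  # mean within-pop diversity
    alleles = set().union(*pop_freqs)
    pooled = {a: sum(f.get(a, 0.0) for f in pop_freqs) / len(pop_freqs) for a in alleles}
    h_t = gene_diversity(pooled)  # diversity of the pooled allele frequencies
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Made-up genotypes of two populations at one locus
pop1 = [(150, 152), (150, 152)]
pop2 = [(150, 150), (150, 150)]
print(nei_gst([pop1, pop2]))  # about 0.333
```

Here H_S = 0.25 and H_T = 0.375, so one third of the total gene diversity lies between the two populations.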
4. AMOVA analysis using Arlequin
- following the instructions in the protocols (pages 25-26), run AMOVA using
both FST-like and RST-like approaches (two separate calculations are
necessary) - What are the differences?
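The two AMOVA runs differ in how the distance between two haplotypes is defined. The sketch below contrasts a count of differing alleles (FST-like) with the sum of squared differences in repeat number (RST-like, which assumes a stepwise mutation model). Allele sizes, repeat lengths and function names are made up for illustration; the exact Arlequin settings are in the protocols.

```python
def n_different_alleles(hap1, hap2):
    """FST-like: number of loci at which two haplotypes carry different alleles."""
    return sum(1 for a, b in zip(hap1, hap2) if a != b)


def sum_squared_repeat_diff(hap1, hap2, repeat_lens):
    """RST-like: sum over loci of squared differences in repeat number."""
    return sum(((a - b) / r) ** 2 for a, b, r in zip(hap1, hap2, repeat_lens))


# Made-up haplotypes (allele sizes in bp at three loci) and repeat lengths (bp)
h1 = (150, 200, 96)
h2 = (152, 200, 108)
repeats = (2, 4, 2)
print(n_different_alleles(h1, h2))               # 2
print(sum_squared_repeat_diff(h1, h2, repeats))  # 37.0
```

Under the FST-like distance, a one-repeat difference at the first locus counts as much as a six-repeat difference at the third; under the RST-like distance the third locus contributes 36 of the 37 units.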
5. PCoA and trees in PAST
- following the instructions in the protocols (page 26), create a PCoA plot
and a tree from the distance matrices generated by MSA (both inter-individual
and inter-population); try different distance matrices (Nei, DMS, etc.) -
What are the differences?
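Different distance matrices give different plots because they measure different things. The sketch below computes two contrasting population distances: Nei's standard distance, which depends only on allele identity, and (δμ)², which depends only on the mean allele size in repeat units. The data layout (one flat list of allele sizes per locus and population), the function names and all values are hypothetical, and MSA's implementation may differ in detail.

```python
import math
from collections import Counter


def freqs(alleles):
    """Allele frequencies from a flat list of allele sizes (one locus, one population)."""
    counts = Counter(alleles)
    return {a: c / len(alleles) for a, c in counts.items()}


def nei_ds(pop_x, pop_y):
    """Nei's standard distance; pop_x/pop_y hold one allele list per locus."""
    jx = jy = jxy = 0.0
    for ax, ay in zip(pop_x, pop_y):
        fx, fy = freqs(ax), freqs(ay)
        jx += sum(p * p for p in fx.values())
        jy += sum(p * p for p in fy.values())
        jxy += sum(p * fy.get(a, 0.0) for a, p in fx.items())
    # Sums are used instead of means over loci: the 1/L factors cancel in the ratio.
    if jxy == 0:
        return math.inf  # no shared alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))


def delta_mu_squared(pop_x, pop_y, repeat_lens):
    """(delta mu)^2: mean over loci of the squared difference in mean repeat number."""
    total = 0.0
    for ax, ay, r in zip(pop_x, pop_y, repeat_lens):
        mu_x = sum(ax) / len(ax) / r
        mu_y = sum(ay) / len(ay) / r
        total += (mu_x - mu_y) ** 2
    return total / len(repeat_lens)


# Two made-up populations scored at two loci (repeat length 2 bp at both)
x = [[150, 150], [96, 98]]
y = [[150, 152], [96, 98]]
print(nei_ds(x, y))                    # about 0.203
print(delta_mu_squared(x, y, (2, 2)))  # 0.125
```

The populations differ only by one 2 bp allele at the first locus: Nei's distance reacts to the appearance of a new allele, whereas (δμ)² reacts only to the resulting shift in mean size.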
6. Population-genetics calculations in FSTAT (optional)
- convert the Genepop file generated by MSA to FSTAT format using
Utilities-File Conversion, then open the converted file
- calculate gene diversities and F-statistics (per locus and per sample,
global statistics, Hardy-Weinberg tests)
- check the output file with the results
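For a feel of what the per-sample statistics mean, the sketch below computes observed heterozygosity (Ho), Nei's unbiased expected heterozygosity (He) and F_IS = 1 - Ho/He for one locus in one sample of diploids. FSTAT uses its own estimators (e.g. Weir and Cockerham's for F-statistics), so its values will not match this simplified version exactly; the function name and genotypes are hypothetical.

```python
from collections import Counter


def ho_he_fis(genotypes):
    """Ho, unbiased He and F_IS for one locus in one sample of diploids."""
    n = len(genotypes)  # number of individuals (2n gene copies)
    ho = sum(1 for a, b in genotypes if a != b) / n
    counts = Counter(allele for g in genotypes for allele in g)
    sum_p2 = sum((c / (2 * n)) ** 2 for c in counts.values())
    he = 2 * n / (2 * n - 1) * (1 - sum_p2)  # small-sample correction 2n/(2n-1)
    fis = 1 - ho / he if he > 0 else float("nan")
    return ho, he, fis


# Made-up genotypes of four individuals
sample = [(150, 152), (150, 150), (152, 152), (150, 152)]
print(ho_he_fis(sample))  # roughly (0.5, 0.571, 0.125)
```

A positive F_IS, as here, means fewer heterozygotes were observed than expected under Hardy-Weinberg equilibrium.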
Tasks B (to work with Betula data)
1. Analyze the matrix using the polysat package in R
- open R and install the following packages: polysat, ade4, ape (e.g.,
install.packages("polysat"))
- import the data into R, then check and complete the ploidy information
using the commands in the presentation (slide 24)
- calculate the Bruvo distance among individuals and make a PCA plot
(command on slide 24)
- recode the allelic data to binary format (allele presence/absence) and
perform PCoA (command on slide 24) - Is there any difference between the two
plots?
- repeat the previous step with the other two coefficients (SMC, simple
matching coefficient; Sorensen) - Does this make a difference?
- calculate the number of alleles per locus for each ploidy level/population
(command on slide 24) - What are the differences?
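The commands on slide 24 do this in R with polysat; the Python sketch below only illustrates what the underlying measures compute, which helps when comparing the plots. `bruvo_distance` follows the Bruvo et al. (2004) idea (allele distance 1 - 2^(-steps), averaged over the best pairing of alleles) but, unlike polysat, handles only individuals with equal numbers of alleles. The binary recoding ignores allele size and dosage; SMC counts shared absences, Sorensen does not. All function names and values are hypothetical.

```python
from itertools import permutations


def bruvo_distance(alleles_x, alleles_y, repeat_len):
    """Bruvo-type distance at one locus; equal allele numbers only (simplification)."""
    if len(alleles_x) != len(alleles_y):
        raise ValueError("this sketch only handles equal numbers of alleles")
    k = len(alleles_x)
    best = None
    for perm in permutations(alleles_y):  # try every pairing of alleles
        steps = [abs(a - b) / repeat_len for a, b in zip(alleles_x, perm)]
        d = sum(1 - 2 ** -s for s in steps) / k
        best = d if best is None else min(best, d)
    return best


def to_binary(alleles, all_alleles):
    """Presence/absence (1/0) vector over all alleles seen at the locus."""
    return [1 if a in alleles else 0 for a in all_alleles]


def smc(u, v):
    """Simple matching coefficient (a + d) / n: shared absences also count."""
    return sum(1 for x, y in zip(u, v) if x == y) / len(u)


def sorensen(u, v):
    """Sorensen coefficient 2a / (2a + b + c): shared absences are ignored."""
    a = sum(1 for x, y in zip(u, v) if x == 1 and y == 1)
    b_plus_c = sum(1 for x, y in zip(u, v) if x != y)
    denom = 2 * a + b_plus_c
    return 2 * a / denom if denom else float("nan")


def n_alleles(genotypes):
    """Number of distinct alleles in a group (e.g. one ploidy level) at one locus."""
    return len(set().union(*genotypes))


print(bruvo_distance((150, 152), (152, 158), repeat_len=2))  # 0.46875
all_alleles = [150, 152, 154, 156, 158]  # made-up allele list for one locus
u = to_binary({150, 152, 154, 156}, all_alleles)
v = to_binary({150, 156}, all_alleles)
print(smc(u, v), sorensen(u, v))  # 0.6 and about 0.667
print(n_alleles([(150, 152, 154, 156), (150, 156)]))  # 4
```

Bruvo's distance rewards alleles that are only a few repeats apart, while the presence/absence coefficients treat 150 vs 152 the same as 150 vs 158; this is one reason the two ordinations can look different.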
Good luck, and thank you for joining the practical course!