Analysis of microsatellite (SSRs) data

Molecular markers - computer practicals

Nuphar lutea

Presentation can be found here.

Training dataset A is here. It consists of 15 samples of Nuphar lutea genotyped for four loci (one multiplex).
Training data matrix B is here. It contains 24 diploid and 24 tetraploid individuals from Betula scored for four loci in GenoDive format.
The fragment analysis data we generated during the lab part of the course will be sent to you by e-mail.

A ZIP file with the necessary software is here. Download it, unzip it, and install or run the individual programs as noted below:
GeneMarker - software for visualization and analysis of fragment analysis data, installation necessary
Arlequin - software for population genetic analysis, just run the exe file
MSA - MicrosatelliteAnalyser, program for evaluation of diploid data, just run the exe file
PAST - simple statistical program allowing also PCoA and tree building from distance matrices, just run the exe file
neighbor - program for creating NJ trees, part of the PHYLIP software package, just run the exe file

Tasks A (to work with Nuphar data)

1. Analysis of raw data using GeneMarker + manual scoring

open the files from the sequencer in GeneMarker and run the analysis with 'GS500_1' as the size standard - Were all the data successfully analyzed?
look through the peak patterns of all four loci (in three colours), taking into account the allele lengths given in the associated table - Can you recognize homo- and heterozygotes? How? Can you distinguish stutter bands and +/- A peaks? How?
score alleles for all loci according to the scheme for MSA (code homozygotes as two alleles with the same length), look at the protocols (page 24) for the correct file structure, use Excel to create the table
save the data as a text file
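
The coding rule for homozygotes can also be applied in R instead of Excel. The following is only a minimal sketch: the sample names, locus names and allele lengths are made up, and the resulting layout (one row per sample, two columns per locus) is not the complete MSA input format, which is described in the protocols (page 24).

```r
# Hypothetical scoring results: one row per sample and locus.
# allele2 is NA when only one peak was scored (homozygote).
scored <- data.frame(
  sample  = c("N01", "N01", "N02", "N02"),
  locus   = c("Nl01", "Nl02", "Nl01", "Nl02"),
  allele1 = c(152, 201, 152, 199),
  allele2 = c(156, NA, NA, 203)
)

# Code homozygotes as two alleles with the same length.
scored$allele2 <- ifelse(is.na(scored$allele2), scored$allele1, scored$allele2)

# Reshape to one row per sample with two allele columns per locus.
wide <- reshape(scored, idvar = "sample", timevar = "locus", direction = "wide")

# Save as a tab-delimited text file (written to a temporary folder here).
out_file <- file.path(tempdir(), "nuphar_scored.txt")
write.table(wide, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)
print(wide)
```

The header rows and population columns that MSA expects still have to be added by hand according to the protocols.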

2. Calculating lengths of flanking regions

find the sequence in GenBank according to the accession number in the table, switch the view to FASTA and copy the sequence
find the forward primer sequence (e.g., in Word using CTRL+F) and label it
make the reverse complement of the reverse primer sequence (e.g., using RC.exe), then find and label it in the sequence
find the repetitive sequence (microsatellite motif) and label it
subtract the length of the repeat region from the total amplicon length (incl. primers) to get the length of the flanking regions
submit all the values using this Google Form
enter the lengths of the flanking regions in the data table
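
The steps above can also be checked in R. This is a sketch under stated assumptions: the sequence, both primers and the CT motif are invented for illustration, and `revcomp` is a helper defined here, not a function of any package. For a real locus, paste in the GenBank sequence and the primers from the table.

```r
# Reverse complement of a DNA string (handles A, C, G, T only).
revcomp <- function(seq) {
  comp <- chartr("ACGTacgt", "TGCAtgca", seq)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Made-up example locus: forward primer, flank, (CT)6 repeat, flank,
# binding site of the reverse primer, plus a few extra bases on both ends.
fwd_primer <- "ACGTTGCA"
rev_primer <- "CTGTAATC"   # as ordered, i.e. given 5'->3' on the opposite strand
locus <- paste0("TTT", fwd_primer, "GGATC", "CTCTCTCTCTCT", "TTGAC",
                revcomp(rev_primer), "AAA")

# Locate the forward primer and the reverse-complemented reverse primer.
fwd_start <- regexpr(fwd_primer, locus, fixed = TRUE)
rev_site  <- regexpr(revcomp(rev_primer), locus, fixed = TRUE)
rev_end   <- rev_site + attr(rev_site, "match.length") - 1

# Amplicon = everything from the first base of the forward primer
# to the last base of the reverse primer site.
amplicon     <- substr(locus, fwd_start, rev_end)
amplicon_len <- nchar(amplicon)

# Locate the repeat (two or more consecutive CT units) within the amplicon.
rep_match  <- regexpr("(CT){2,}", amplicon)
repeat_len <- attr(rep_match, "match.length")

# Flanking regions (incl. primers) = amplicon length - repeat length.
flank_len <- amplicon_len - repeat_len
cat("amplicon:", amplicon_len, "repeat:", repeat_len, "flanking:", flank_len, "\n")
```

A simple regular expression like this only finds perfect repeats; interrupted or compound microsatellites still need to be checked by eye.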

3. Calculations using MSA

copy MSA to the folder with your input data matrix
open MSA and follow the instructions in the protocols (page 25) to calculate F-statistics and distance matrices and to generate input files for other software
go through the file structure of the results generated and get familiar with them (use MSA documentation)

4. AMOVA analysis using Arlequin

following the instructions in the protocols (pages 25-26), run the AMOVA using both FST- and RST-like approaches (two separate calculations are necessary) - What are the differences?

5. PCoA and trees in PAST

following the instructions in the protocols (page 26), create a PCoA and a tree from the distance matrices generated by MSA (both inter-individual and inter-population), try different distance matrices (Nei, DMS etc.) - What are the differences?

6. Population-genetics calculations in FSTAT (optional)

convert the Genepop file (generated by MSA) to FSTAT format using Utilities-File Conversion, open the converted file
calculate gene diversities and F-statistics (per locus and samples, global statistics, HW-testing)
check the output file with results
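
To get a feeling for what the gene diversity values in the output mean, the basic quantities can be computed by hand for a single locus. The sketch below uses made-up diploid genotypes and the plain textbook formulas without any sample-size correction; programs such as FSTAT use corrected estimators, so their values will usually differ slightly from these.

```r
# Hypothetical diploid genotypes at one locus (allele lengths in bp),
# homozygotes coded as two alleles with the same length.
geno <- data.frame(
  a1 = c(152, 152, 156, 152),
  a2 = c(156, 152, 156, 160)
)

# Allele frequencies from the pooled allele copies.
alleles <- c(geno$a1, geno$a2)
p <- table(alleles) / length(alleles)

n_alleles <- length(p)               # number of different alleles
He <- 1 - sum(p^2)                   # expected heterozygosity (gene diversity)
Ho <- mean(geno$a1 != geno$a2)       # observed heterozygosity
Fis <- 1 - Ho / He                   # simple inbreeding coefficient

print(p)
cat("alleles:", n_alleles, " He:", He, " Ho:", Ho, " F:", round(Fis, 3), "\n")
```

A positive value of F indicates fewer heterozygotes than expected under Hardy-Weinberg proportions, a negative value an excess.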

Tasks B (to work with Betula data)

1. Analyze the matrix using polysat package in R

open R and install the following packages: polysat, ade4, ape (e.g. install.packages("polysat"))
import the data into R, then check and complete the ploidy information using the commands in the presentation (slide 24)
calculate Bruvo distances among individuals and make a PCoA plot (command on slide 24)
recode the allelic data to binary format (allele presence/absence) and perform PCoA (command on slide 24) - Is there any difference between the two plots?
repeat the previous step with the other two coefficients (SMC - simple matching coefficient, Sorensen) - Does this make a difference?
calculate the number of alleles per locus and per ploidy level/population (command on slide 24) - What are the differences?
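
The commands for the real data are on slide 24. The base-R sketch below only illustrates what the binary recoding and the two similarity coefficients do; it does not use polysat, the genotypes are invented, and `similarity` is a helper written for this example. Similarities are turned into distances as d = sqrt(1 - s), the same transformation that `dist.binary` in ade4 applies.

```r
# Hypothetical allele sets at one locus (lengths in bp). Polyploids may show
# more than two alleles, and allele dosage is unknown.
genotypes <- list(
  dip1 = c(180, 184),
  dip2 = c(180),
  tet1 = c(180, 184, 188, 192),
  tet2 = c(184, 188, 192)
)

# Recode to presence/absence: rows = individuals, columns = alleles.
all_alleles <- sort(unique(unlist(genotypes)))
bin <- t(sapply(genotypes, function(g) as.integer(all_alleles %in% g)))
colnames(bin) <- all_alleles

# Pairwise similarity: SMC counts shared absences (d), Sorensen ignores them.
similarity <- function(bin, coef = c("smc", "sorensen")) {
  coef <- match.arg(coef)
  n <- nrow(bin)
  s <- matrix(1, n, n, dimnames = list(rownames(bin), rownames(bin)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      x <- bin[i, ] == 1
      y <- bin[j, ] == 1
      a <- sum(x & y)      # allele present in both
      b <- sum(x & !y)     # present only in i
      cc <- sum(!x & y)    # present only in j
      d <- sum(!x & !y)    # absent in both
      s[i, j] <- if (coef == "smc") (a + d) / (a + b + cc + d)
                 else 2 * a / (2 * a + b + cc)
    }
  }
  s
}

smc <- similarity(bin, "smc")
sor <- similarity(bin, "sorensen")

# PCoA (classical multidimensional scaling) on d = sqrt(1 - s).
pcoa_smc <- cmdscale(as.dist(sqrt(1 - smc)), k = 2)
pcoa_sor <- cmdscale(as.dist(sqrt(1 - sor)), k = 2)

alleles_per_ind <- rowSums(bin)   # number of alleles seen in each individual
print(round(smc, 2))
print(round(sor, 2))
```

Plotting both results, e.g. with plot(pcoa_smc) and plot(pcoa_sor), shows how much the treatment of shared absences changes the ordination.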

Good luck and thank you for joining the practical course...