This section of the documentation is provided for experienced computer users who wish to know how the data is stored in SPARKS. By knowing this you may be able to access the files from your own analysis programs to go beyond the capabilities of SPARKS. However, please keep in mind that you also have the power to destroy all your records. Make sure you know what you are doing and that you have a backup of your data. Please do not modify any SPARKS data files from outside the SPARKS program. Doing so bypasses SPARKS’ powerful edit checking.
SPARKS is written in an extended version of the dBASE III PLUS language. It is compiled using WordTech’s Quicksilver version 1.3 compiler. All the data files, lookup table files, and indexes are in standard dBASE formats that can be read by any dBASE program. Each studbook kept in SPARKS is stored in a separate sub-directory under the SPARKS sub-directory.
There are four main .dbf files, each with associated indexes. A complete description of them is stored in a master dictionary called metafile.dbf.
Structure for database: MASTER.DBF
| Field | Type | Width | Description | File |
|---|---|---|---|---|
| STUD_ID | Character | 6 | KEY-Studbook Number | MASTER |
| SEX | Numeric | 1 | Sex | MASTER |
| HYBRID | Logical | 1 | Hybrid flag | MASTER |
| DAM_ID | Character | 6 | Dam's Studbook Number | MASTER |
| SIRE_ID | Character | 6 | Sire's Studbook Number | MASTER |
| BIRTH_TYPE | Character | 1 | Birth type | MASTER |
| BDATE | Date | 8 | Birth Date | MASTER |
| BIRTH_EST | Character | 1 | Confidence in Birth Date | MASTER |
| REARING | Character | 1 | How young was raised | MASTER |
| MGMT_PLAN | Logical | 1 | Global Management Plan | MASTER |
| SURPLUS | Logical | 1 | Management Plan Surplus | MASTER |
| UPDATE | Date | 8 | Last update to record | MASTER |
| SEND | Logical | 1 | Data has been changed | MASTER |
| RSORT | Character | 9 | Report Sort Order | MASTER |
| SELECT1 | Logical | 1 | Report Select Flag 1 | MASTER |
| SELECT2 | Logical | 1 | Report Select Flag 2 | MASTER |
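Because MASTER.DBF and the other data files are standard dBASE files, your own programs can read them directly. The following is a minimal, read-only sketch of a dBASE III PLUS parser, assuming the standard layout: a 32-byte file header, 32-byte field descriptors ending in a 0x0D byte, and fixed-width records that begin with a one-byte deletion flag. The function name `read_dbf` is hypothetical, memo fields are not handled, and the cp437 (DOS code page) text encoding is an assumption.

```python
import struct

def read_dbf(data):
    """Parse an in-memory dBASE III PLUS .dbf image into (fields, records).

    Read-only sketch: deleted records are skipped, memo fields are not handled.
    """
    # Header: byte 0 version, bytes 1-3 last-update date (YY MM DD),
    # bytes 4-7 record count, 8-9 header length, 10-11 record length (little-endian)
    n_records, header_len, record_len = struct.unpack_from("<IHH", data, 4)

    # 32-byte field descriptors follow the header until a 0x0D terminator
    fields = []
    pos = 32
    while data[pos] != 0x0D:
        name = data[pos:pos + 11].split(b"\x00", 1)[0].decode("ascii")
        ftype = chr(data[pos + 11])
        length, decimals = data[pos + 16], data[pos + 17]
        fields.append((name, ftype, length, decimals))
        pos += 32

    records = []
    for i in range(n_records):
        start = header_len + i * record_len
        rec = data[start:start + record_len]
        if rec[:1] == b"*":  # first byte is the deletion flag
            continue
        offset, values = 1, {}
        for name, ftype, length, decimals in fields:
            # cp437 (DOS code page) is an assumption about the text encoding
            raw = rec[offset:offset + length].decode("cp437").strip()
            offset += length
            if ftype == "L":  # T/Y = true, F/N = false, ? or blank = not set
                values[name] = None if raw in ("", "?") else raw.upper() in ("T", "Y")
            elif ftype == "N":
                values[name] = None if raw == "" else (float(raw) if decimals else int(raw))
            else:  # C fields as text; D fields stay as "YYYYMMDD" strings
                values[name] = raw
        records.append(values)
    return fields, records
```

A typical call opens a copy of the file in binary read mode, for example `fields, records = read_dbf(open(path, "rb").read())`. Reading, never writing, keeps SPARKS' edit checking intact.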
For each studbook specimen, there is only one master record. However, in the moves file there is at least one record and often more. The key field, STUD_ID, is common to all four files and is the key that links all the data for a single studbook specimen together.
Structure for database: MOVES.DBF
| Field | Type | Width | Description | File |
|---|---|---|---|---|
| STUD_ID | Character | 6 | KEY-Studbook Number | MOVES |
| TRAN_CODE | Character | 2 | Type of transaction | MOVES |
Technical Reference 39
| PHYSICAL | Logical | 1 | Physical transfer | MOVES |
| OWNER | Logical | 1 | Ownership transfer | MOVES |
| LOCATION | Character | 9 | Physical location | MOVES |
| LOCAL_ID | Character | 6 | Institution's local ID | MOVES |
| TRAN_DATE | Date | 8 | Date of transaction | MOVES |
| TDATE_EST | Character | 1 | Confidence in Trans Date | MOVES |
| REM_DATE | Date | 8 | Removal Date | MOVES |
| RDATE_EST | Character | 1 | Confidence in Rem Date | MOVES |
| INSTCODE | Character | 9 | Institution code | MOVES |
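The one-to-many link between MASTER and MOVES through STUD_ID can be followed in your own programs by grouping the moves under each master record. This is a sketch under stated assumptions: records are dictionaries keyed by field name, STUD_ID values may carry dBASE space padding (so they are stripped), and TRAN_DATE is the raw "YYYYMMDD" text of a dBASE date field. The function name `moves_by_specimen` is hypothetical.

```python
from collections import defaultdict

def moves_by_specimen(master_records, move_records):
    """Group MOVES records under their MASTER record, oldest transaction first."""
    grouped = defaultdict(list)
    for move in move_records:
        grouped[move["STUD_ID"].strip()].append(move)

    history = {}
    for master in master_records:
        stud_id = master["STUD_ID"].strip()
        moves = grouped.pop(stud_id, [])
        if not moves:  # every specimen should have at least one move
            raise ValueError(f"specimen {stud_id} has no MOVES records")
        # dBASE dates read as "YYYYMMDD" text sort chronologically as strings
        history[stud_id] = sorted(moves, key=lambda m: m["TRAN_DATE"])
    if grouped:  # MOVES records that point to no MASTER record
        raise ValueError(f"orphan MOVES records for {sorted(grouped)}")
    return history
```

Raising on a specimen without moves, or on moves without a master record, mirrors the rule that every specimen has exactly one master record and at least one move.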
Structure for database: SPECIALS.DBF
| Field | Type | Width | Description | File |
|---|---|---|---|---|
| STUD_ID | Character | 6 | KEY-Studbook Number | SPECIALS |
| CODE | Character | 2 | Type of Special Data | SPECIALS |
| COMMENT | Character | 65 | Text | SPECIALS |
| SPEC_DATE | Date | 8 | Date of Special Data | SPECIALS |
The UDF file contains records only if the user has defined UDFs (user-defined fields). Again, the key field is always present, followed by any UDF fields.
Structure for database: UDF.DBF
| Field | Type | Width | Description | File |
|---|---|---|---|---|
| STUD_ID | Character | 6 | KEY-Studbook Number | UDF |
Each .dbf file has an index file (.NDX) of the same name to speed retrieval. The STUD_ID field is always part of the index key field.
In SPARKS file paths, the macro variable &xSPARKS stands for the sub-directory in which the SPARKS system is installed, and &xSTUD for the name of the studbook.
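For an external program, locating a studbook's files amounts to substituting the two variables. The sketch below assumes each studbook sub-directory holds MASTER, MOVES, SPECIALS, and UDF as .DBF files, each with a same-named .NDX index; the directory and studbook names in the usage line are hypothetical, and `studbook_files` is an illustrative helper, not part of SPARKS.

```python
from pathlib import PureWindowsPath

# The four studbook data files described in this section
DATA_FILES = ("MASTER", "MOVES", "SPECIALS", "UDF")

def studbook_files(sparks_dir, studbook):
    """Map each data file to its (.DBF, .NDX) paths for one studbook.

    Assumes each studbook sub-directory holds NAME.DBF plus a same-named NAME.NDX.
    """
    base = PureWindowsPath(sparks_dir) / studbook  # &xSPARKS / &xSTUD
    return {name: (base / f"{name}.DBF", base / f"{name}.NDX") for name in DATA_FILES}
```

For example, `studbook_files("C:\\SPARKS", "MYSTUD")["MASTER"][0]` gives `C:\SPARKS\MYSTUD\MASTER.DBF`. Pure paths are used so the sketch only builds names and never touches the disk.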
The largest data file by far is the institution lookup table, with over 6250 records. This very important list contains the names of most of the world's zoos and aquariums, museums, dealers and exchanges, many individual collectors, and non-exhibit centers. There is room to store the complete address, although only some are provided. Each entry is assigned a mnemonic code, often a city abbreviation, and SPARKS uses this code to record all locations. Without the consistency SPARKS enforces by requiring the same code for a given location, it would not be possible to retrieve reports based on any geographic criteria.
Structure for database: ISISISF.DBF
| Field | Type | Width | Description |
|---|---|---|---|
| INSTCODE | Character | 9 | ISIS numeric geographic code |
| ISIS_MEMB | Character | 1 | P for ISIS, A for ARKS |
| MNEMONIC | Character | 9 | ISIS alpha geographic code |
| INST_NAME | Character | 40 | Full institution name |
| ADDRESS | Character | 35 | Institution address |
| CITY | Character | 20 | City |
| STATE | Character | 20 | State/province |
| COUNTRY | Character | 20 | Country |
| MAILCODE | Character | 10 | Mail code / ZIP code |
| PHONE | Character | 20 | Telephone number |
| **Total** | | 185 | Record length, including the 1-byte deletion flag |
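Because all locations are recorded as codes from this table, an external analysis can check that every location code in its data actually appears in ISISISF.DBF. In this sketch, which MOVES field holds which institution code is an assumption: it defaults to comparing MOVES.LOCATION against ISISISF.MNEMONIC, and the field names can be overridden. The function `unknown_locations` and the codes in the usage note are hypothetical.

```python
def unknown_locations(institutions, moves, move_field="LOCATION", lookup_field="MNEMONIC"):
    """Return MOVES records whose location code is missing from ISISISF.DBF.

    Which MOVES field holds which institution code is an assumption;
    pass move_field/lookup_field to match your data.
    """
    # Strip dBASE space padding so "ZOO_A    " matches "ZOO_A"
    valid = {inst[lookup_field].strip() for inst in institutions}
    return [m for m in moves if m[move_field].strip() not in valid]
```

With institution codes ZOO_A and ZOO_B, a move recorded at ZOO_X would be returned for review.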
This file also has index files, in this case two:
Assuming that you have exported an appropriate pedigree data set using the SPARKS Export Report, run GENES by typing GENES at the operating-system prompt and answering the questions that appear on the screen.
GENES will, if asked politely, do:
In addition to the statistics calculated in the GENEDROP program written by Georgina, the GENES version also calculates:
Target founder representations -- parity representations corrected for the irreversible loss of founder alleles that has already likely occurred in the pedigree: algorithm developed by Jon Ballou. Note that living wild-caught animals have the highest target representations, because none of their genes are yet irreversibly lost.
Mean allelic retention -- the fraction of a founder's genes that are present in at least one copy in the living descendant population.
Founder genomes surviving -- the summed allelic retention; i.e., the number of founder alleles still in the population.
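Allelic retention is estimated by gene-drop simulation. The GENES source is not reproduced here, so the following is an independent, simplified sketch of the technique: each founder receives two unique alleles, alleles are passed down the pedigree by random Mendelian segregation, and retention is the average fraction of each founder's two alleles found in the living animals. Assumptions: the pedigree is a dictionary listing parents before offspring, founders are marked `(None, None)`, and every non-founder has two known parents (one-unknown-parent cases need the pseudo-founder treatment described later). `gene_drop` is a hypothetical name.

```python
import random

def gene_drop(pedigree, living, iterations=1000, seed=1):
    """Estimate each founder's allelic retention in the living animals.

    pedigree: dict animal -> (sire, dam), parents listed before offspring,
    founders marked (None, None). Returns (retention per founder, summed retention).
    """
    rng = random.Random(seed)
    founders = [a for a, parents in pedigree.items() if parents == (None, None)]
    totals = dict.fromkeys(founders, 0.0)
    for _ in range(iterations):
        genotype = {}
        for animal, (sire, dam) in pedigree.items():
            if (sire, dam) == (None, None):
                # each founder carries two unique alleles, labelled (founder, 0) and (founder, 1)
                genotype[animal] = ((animal, 0), (animal, 1))
            else:
                # Mendelian segregation: one randomly chosen allele from each parent
                genotype[animal] = (rng.choice(genotype[sire]), rng.choice(genotype[dam]))
        surviving = {allele for animal in living for allele in genotype[animal]}
        for f in founders:
            totals[f] += ((f, 0) in surviving) / 2 + ((f, 1) in surviving) / 2
    retention = {f: t / iterations for f, t in totals.items()}
    return retention, sum(retention.values())
```

The summed value corresponds to "founder genomes surviving"; dividing it by the number of founders gives the mean allelic retention. For a living founder A and a living offspring of A with founder B, A's retention is 1.0 and B's is exactly 0.5, because the offspring always carries one of B's two alleles.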
Genetic Evaluation using GENES 43
"Heterozygosity" is used by geneticists for several different, though closely related, concepts. Most simply, the heterozygosity of a population is the proportion of the individuals that are heterozygous at the locus or loci of interest. This is often termed the "observed heterozygosity" of a population.
In a randomly mating population (i.e., one in Hardy-Weinberg-Castle equilibrium), the mean heterozygosity is expected to be $H = 1 - \sum_i p_i^2$, in which $p_i$ is the frequency of allele $i$. (The expected frequency of homozygotes for each allele is $p_i^2$.) The heterozygosity expected under Hardy-Weinberg-Castle equilibrium is often termed the "expected heterozygosity" of a population.
For many genetic loci (typically 50% to 90%), all individuals of a population are homozygous for a single allele, i.e., the locus is monomorphic. In population management, as in other evolutionary processes, such invariant loci are of relatively little interest. (Evolution requires variation.) Often, we are concerned not with the absolute heterozygosity (observed or expected), but rather with the heterozygosity of a population relative to that of some starting reference population. This fractional heterozygosity is termed the "gene diversity" of a population and is sometimes symbolized $P$ ($P_t = H_t/H_0$, in which $P_t$ is the gene diversity at time $t$, and $H_t$ and $H_0$ are the expected heterozygosities at times $t$ and 0).
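The three quantities just defined can be computed for a single locus as follows. This sketch assumes genotypes are given as pairs of allele labels and estimates the allele frequencies $p_i$ from the sample itself; the function names are illustrative only.

```python
from collections import Counter

def observed_heterozygosity(genotypes):
    """Proportion of individuals that are heterozygous at one locus."""
    return sum(a != b for a, b in genotypes) / len(genotypes)

def expected_heterozygosity(genotypes):
    """Hardy-Weinberg-Castle expectation H = 1 - sum(p_i^2)."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

def gene_diversity(h_t, h_0):
    """P_t = H_t / H_0: expected heterozygosity relative to the reference population."""
    return h_t / h_0
```

For the genotypes AA, AB, BB, AB, both alleles have frequency 0.5, so observed and expected heterozygosity are each 0.5. A population that drops from $H_0 = 0.5$ to $H_t = 0.45$ retains a gene diversity of 0.9.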
Inbreeding reduces the probability that an individual is heterozygous at any given locus, and the inbreeding coefficient, $F$, of an individual is defined as the fractional reduction of that individual's heterozygosity (across all loci) relative to the mean expected heterozygosity of the population [$F_i = (H_e - H_i)/H_e$, in which $F_i$ is the inbreeding coefficient of individual $i$, $H_e$ is the expected heterozygosity of the population at some reference time point, and $H_i$ is the (observed) heterozygosity of individual $i$]. Note that the mean inbreeding coefficient (at time $t$, relative to reference time 0) of a small population that is in Hardy-Weinberg-Castle equilibrium is given by $F_t = (H_0 - H_t)/H_0 = 1 - P_t$.
In the gene drop simulation in GENES (and typically in any founder analysis), the starting (observed) heterozygosity is set at 1.0, because each founder is given two unique alleles. The expected heterozygosity among the founders is $1 - \sum_{i=1}^{2N_f} \left[1/(2N_f)\right]^2 = 1 - 1/(2N_f)$, in which $N_f$ is the number of founders, because each of the $2N_f$ founder alleles has frequency $p_i = 1/(2N_f)$. For reasons I won’t explain here, this expected heterozygosity of the founders is also equal to the fraction of the (expected) heterozygosity of the wild population that is expected in the founder stock (i.e., the "gene diversity" of the founders relative to the wild population from which they came, $P_f = 1 - \sum_{i=1}^{2N_f} \left[1/(2N_f)\right]^2$).
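As a worked check of the founder formula, the sketch below sums $p_i^2$ over all $2N_f$ unique founder alleles and compares the result with the closed form $1 - 1/(2N_f)$. The function name is illustrative.

```python
def founder_expected_heterozygosity(n_founders):
    """Expected heterozygosity (= gene diversity P_f) of N_f founders with 2*N_f unique alleles."""
    n_alleles = 2 * n_founders
    p = 1.0 / n_alleles  # every founder allele starts at frequency 1/(2 N_f)
    return 1.0 - sum(p * p for _ in range(n_alleles))
```

With 5 founders there are 10 alleles at frequency 0.1, so the founders carry about 0.9 (90%) of the wild population's expected heterozygosity; with 20 founders the figure rises to 0.975.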
With this clarifying (?!) background on the distinction between observed heterozygosity, expected heterozygosity, gene diversity, and inbreeding coefficients, we now continue with the output from GENES:
Fraction of wild heterozygosity retained -- the "gene diversity" of the captive population: the expected heterozygosity in the living population relative to the wild population from which the founders were taken.
Fraction of wild heterozygosity lost – 1 minus the heterozygosity retained. If the population were randomly mating (few populations are), then the fraction of heterozygosity lost would be equal to the mean inbreeding coefficient of the population.
Mean inbreeding coefficient realized – the mean inbreeding coefficient within the living descendant population. This is also equal to one minus the observed heterozygosity of the descendant population.
Founder genome equivalents – the number of equally represented founders, with no loss of founder alleles, that would yield the amount of genetic diversity in the living descendant population. Thus, this is the number of newly wild-caught animals that would be needed to obtain the genetic diversity in the present captive population.
Founder equivalents – the number of equally represented founders, with the observed losses of founder alleles, that would yield the amount of genetic diversity observed in the living descendant population. Founder equivalents do not correct for the losses of alleles in population bottlenecks, whereas founder genome equivalents do. (See Lacy's 1989 paper in Zoo Biology.)
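The two founder counts can be sketched from founder representations and allelic retentions using the formulas commonly given in the literature following Lacy (1989): founder equivalents $f_e = 1/\sum_i p_i^2$ and founder genome equivalents $f_g = 1/\sum_i (p_i^2/r_i)$, where $p_i$ is founder $i$'s proportional contribution to the living population and $r_i$ its mean allelic retention. That GENES uses exactly these expressions is an assumption, and the function names are hypothetical.

```python
def founder_equivalents(representation):
    """f_e = 1 / sum(p_i^2); representation maps founder -> p_i (values sum to 1)."""
    return 1.0 / sum(p * p for p in representation.values())

def founder_genome_equivalents(representation, retention):
    """f_g = 1 / sum(p_i^2 / r_i); retention maps founder -> mean allelic retention r_i."""
    # founders with zero representation contribute nothing and are skipped
    return 1.0 / sum(p * p / retention[f] for f, p in representation.items() if p > 0)
```

Four equally represented founders with no allelic loss give $f_e = f_g = 4$. If each has lost half its alleles ($r_i = 0.5$), $f_e$ stays at 4 while $f_g$ falls to 2, which is the distinction described above.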
Each of the above is calculated on the total pedigree and also on a subset that excludes contributions from animals with unknown parents (which otherwise are treated as founders). Also given are the summary statistics (mean retention, heterozygosity, founder equivalents, etc.) attainable with "perfect" management in the future, i.e., if all target founder representations are met and no further allelic losses occur.
Before running GENES, the directory should contain:
To this, GENES will add data matrices xxxxxxxx.rf and xxxxxxxx.des, and output files INBREED.PRN, FOUNDER.PRN, and GD.PRN. The program also creates several temporary files that will be deleted when the program terminates normally.
The inbreeding analysis assumes that UNK and WILD parents are unrelated to all other animals – it cannot do otherwise. Thus, animals with unknown parents will be treated as wild-caught founders.
If one parent is known (and captive), but the other parent is WILD or UNK (as would occur if a wild-caught female gave birth to an offspring sired in the wild), GENES will treat the unknown parent as a founder. The "studbook number" of that pseudo-founder is set equal to the negative of the studbook number of the known (captive) parent. (This pseudo-founder is not added to the studbook; it is simply assumed to exist for the genetic calculations.) If an animal gives birth to several offspring with an UNK or WILD animal as the other parent, the program assumes that the unknown (pseudo-founder) parent is the same for all those offspring. The gene drop program outputs summary statistics for the entire data set (treating unknowns as wild-caught founders) and for only those founders recorded as truly WILD. (A few statistics cannot be calculated on the subset without unknown "founders"; those spots are left blank on the output.)
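The pseudo-founder rule can be sketched as a preprocessing step on the pedigree. Assumptions: studbook numbers are integers, parents are listed before offspring, WILD and UNK appear as those literal strings, and founders are rewritten as `(None, None)`. The function `add_pseudo_founders` is hypothetical; it only mirrors the rule described above and is not GENES code.

```python
UNKNOWN = {"WILD", "UNK"}

def add_pseudo_founders(pedigree):
    """Replace a single WILD/UNK parent by a pseudo-founder numbered -(known parent).

    pedigree: dict studbook number -> (sire, dam), parents listed before offspring.
    Animals with both parents WILD/UNK become founders, marked (None, None).
    """
    result = {}
    for animal, (sire, dam) in pedigree.items():
        sire_unknown, dam_unknown = sire in UNKNOWN, dam in UNKNOWN
        if sire_unknown and dam_unknown:
            result[animal] = (None, None)  # treated as a wild-caught founder
            continue
        if sire_unknown or dam_unknown:
            known = dam if sire_unknown else sire
            pseudo = -known  # same pseudo-founder for all offspring of this parent
            if pseudo not in result:
                result[pseudo] = (None, None)  # exists only for the calculations
            if sire_unknown:
                sire = pseudo
            else:
                dam = pseudo
        result[animal] = (sire, dam)
    return result
```

Two offspring of animal 1 with WILD or UNK mates both receive pseudo-founder -1 as their other parent, and -1 is inserted before the first of them so that parents still precede offspring.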
One unknown parent causes no problem for the inbreeding calculations, beyond the obvious loss of information (and possible under-estimation of F) if the unknown parent is in fact related to other animals in the studbook.
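Inbreeding coefficients can be computed without simulation as the kinship of each animal's two parents. The following is a sketch of the standard tabular (kinship) method written recursively, not GENES' own code. It assumes parents are listed before offspring and that `None` stands for a WILD or UNK parent, which, as stated above, is treated as unrelated to every other animal. Very deep pedigrees may need a raised recursion limit.

```python
from functools import lru_cache

def inbreeding_coefficients(pedigree):
    """F for every animal; pedigree: dict id -> (sire, dam), None = WILD/UNK parent."""
    order = {animal: i for i, animal in enumerate(pedigree)}

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a is None or b is None:
            return 0.0  # unknown/wild parent: assumed unrelated to everyone
        if a == b:
            sire, dam = pedigree[a]
            return 0.5 * (1.0 + kinship(sire, dam))  # (1 + F_a) / 2
        if order[a] < order[b]:
            a, b = b, a  # recurse through the parents of the later-listed animal
        sire, dam = pedigree[a]
        return 0.5 * (kinship(sire, b) + kinship(dam, b))

    # F of an animal equals the kinship coefficient of its sire and dam
    return {animal: kinship(sire, dam) for animal, (sire, dam) in pedigree.items()}
```

An offspring of a full-sib mating gets $F = 0.25$, as does an offspring of a parent-offspring mating. An animal with one unknown parent gets $F = 0$, which is the possible under-estimate noted above if that parent was in fact related to other animals.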
If none of this makes sense, try the program and see what happens.
GENES is dimensioned to handle up to 2000 animals. Let Lacy know if you want a version with larger capacity.
GENES can be made quite fast by running it from a RAM disk. Run from a floppy disk, the program will be very slow and will handle only very small studbooks. If the computer has a math coprocessor, the program will use it and run noticeably faster.
GENES assumes that the studbook data are in ASCII format, as produced by the SPARKS Export utility. At a minimum, it requires a file containing the following fields:
Lacy welcomes comments on GENES, to which he will respond if he has time. No guarantees of any sort are provided with this software: little effort has gone into testing and debugging. Use at your own risk.