LOVD Scripts
Reference Sequence Parser
LOVD currently comes with one script that is very useful if you want to create coding DNA reference sequences for the genes in your database. For it to work, you will need to make the refseq directory writable. If you want the script to generate a GenBank file for you, you will also need to make the genbank directory writable.
Making the refseq and genbank directories writable
Your LOVD package comes with two directories called 'refseq' and 'genbank'. As explained above, these directories need to be made writable. Making the 'genbank' directory writable is only necessary if you want the reference sequence parser to generate a GenBank file for you. Below we will explain how to do that on Unix/Linux systems. If your LOVD is installed on a different server (Windows, Macintosh), please contact the person responsible for that server in order to make the directory world writable.
For advanced users: it is more secure to 'chgrp' the folder to the group of the user running the webserver than to make the directory world writable. If you don't know which user the webserver runs as (e.g. 'www-data' or 'httpd'), run 'ps' to find out. After that, make sure the group permission bits are set to 7 (rwx). This way, the Reference Sequence Parser works without the refseq directory having to be world writable.
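As an illustration of the group-permission approach, here is a minimal Python sketch that adds rwx for the owning group of a directory while leaving all other permission bits untouched. The function name grant_group_rwx is hypothetical and not part of LOVD. The sketch assumes the directory's group has already been changed (for instance with 'chgrp') to the group the webserver runs under, and that you are allowed to change the directory's permissions.

```python
import os
import stat


def grant_group_rwx(path):
    """Add read, write and execute permission for the group owning `path`.

    Hypothetical helper, not part of LOVD. Owner and world bits are kept
    exactly as they were; only the group bits are raised to 7 (rwx).
    """
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current | stat.S_IRWXG)
    # Return the resulting permission bits so the caller can verify them.
    return stat.S_IMODE(os.stat(path).st_mode)


# Example call (run from the directory where LOVD is installed):
#   grant_group_rwx("refseq")
#   grant_group_rwx("genbank")
```

On a directory that started out as 755, this results in 775 instead of the world-writable 777.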
Using the commandline
- Log in to the server using SSH, telnet or any non-graphical (S)FTP client.
- Move into the directory where LOVD is installed using the 'cd' command.
- Type in 'chmod 777 refseq genbank'.
- Check if the Reference Sequence Parser is still generating an error. If not, you're done.
Using a graphical (S)FTP client
Of course, there are so many different FTP clients that I can't write an exact manual on how to change the rights on a folder. The steps below describe a commonly used method.
- Log in to the server using your favorite (S)FTP client.
(FTP is a protocol used to transfer files to or from the internet. If you have uploaded your LOVD files to a server, you can use the same program you used then.)
- Move into the directory where LOVD is installed (usually by double clicking on the folder names).
- Right click on the folder 'refseq'; this will bring up a menu.
- Select 'chmod' if it is an option. Otherwise, click 'Properties' and look for the folder permissions/rights.
- Fill in '777' as the new value.
- Repeat steps 3-5 for the 'genbank' directory.
- Check if the Reference Sequence Parser is still generating an error. If not, you're done.
Input formats for the Reference Sequence Parser
The Reference Sequence Parser accepts specific input formats for each step. Step 1 requires a GenBank file, and will generate the formats for steps 2 and 3 automatically, so starting at this step is recommended.
Step 2 requires the genomic sequence formatted to provide the positions of the upstream sequence, exons, introns, downstream sequence and the start of the translation.
Step 3 requires the coding DNA sequence formatted to provide the positions of the exon borders and the start of the translation.
All formats are case-insensitive. It does not matter whether you use capital or non-capital nucleotides.
Input format for step 1
Step 1 requires a valid GenBank file, uploaded into the 'genbank' directory. Preferably, the GenBank file contains only your gene of interest and defines only one transcript. If more than one transcript is defined, enter the number of the transcript you want to use as the "mRNA field" value on the form of step 1.
Step 1 will create the correct format for step 2, so this is the recommended step to start out with.
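To see how many transcripts a GenBank file defines before filling in the form, you could list its mRNA features. The Python sketch below is only an illustration: the function name list_mrna_features is hypothetical, and it assumes that the transcript number expected by the form corresponds to the order in which the mRNA features appear in the file, which you should verify against your own LOVD installation. It relies on the standard GenBank feature table layout, in which a feature key such as 'mRNA' is indented by five spaces.

```python
import re


def list_mrna_features(genbank_text):
    """Return a list of (number, location) for each mRNA feature, in file order.

    Hypothetical helper, not part of LOVD. Only the first line of each
    location is captured; long locations that wrap onto following lines
    are therefore truncated.
    """
    features = []
    for line in genbank_text.splitlines():
        # Feature keys start after exactly five spaces in a GenBank flat file.
        match = re.match(r"^ {5}mRNA\s+(\S+)", line)
        if match:
            features.append((len(features) + 1, match.group(1)))
    return features


# Example call, assuming a file named 'GENE.gb' in the genbank directory:
#   with open("genbank/GENE.gb") as handle:
#       for number, location in list_mrna_features(handle.read()):
#           print(number, location)
```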
Input format for step 2
UPSTREAM<EXON>INTRON<EXON>INTRON<EXON>INTRON (...etc...) <EXON>DOWNSTREAM
Make sure you include the starting point of the translation by putting a '|' in front of the 'a' of the 'atg' starting codon. If you started at step 1, this is done for you automatically.
Example:
cccccccc<gggggg|at>tttttttt<gggggggg>aaaaaaaa
will parse:
- 'cccccccc' as the upstream sequence.
- 'ggggggat' as exon 1, with the translation starting at the 'a'.
- 'tttttttt' as intron 1.
- 'gggggggg' as exon 2.
- 'aaaaaaaa' as the downstream sequence.
Step 2 will create the upstream sequence, intronic sequences and the downstream sequence and save them in the refseq directory. It will also create the input for step 3 for you.
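The step 2 format can be made concrete with a small Python sketch that splits such a string into its parts. This is not the code LOVD itself uses; the function name parse_step2 and the shape of the returned dictionary are assumptions made for illustration. Input is lowercased because the formats are case-insensitive.

```python
import re


def parse_step2(formatted):
    """Split a step 2 string into upstream, exons, introns and downstream.

    Hypothetical helper, not part of LOVD.
    """
    text = "".join(formatted.split()).lower()

    # The brackets must strictly alternate: '<' ... '>' for every exon.
    brackets = re.sub(r"[^<>]", "", text)
    if not brackets or brackets != "<>" * (len(brackets) // 2):
        raise ValueError("each exon must be wrapped in '<' and '>'")

    # After splitting on the brackets, odd pieces are exons and even pieces
    # are upstream, introns and downstream, in that order.
    pieces = re.split(r"[<>]", text)
    outside, raw_exons = pieces[0::2], pieces[1::2]

    # Exactly one '|' is expected, and it must sit inside an exon.
    if text.count("|") != 1 or not any("|" in exon for exon in raw_exons):
        raise ValueError("expected exactly one '|' inside an exon")
    start_exon = next(i for i, exon in enumerate(raw_exons, 1) if "|" in exon)
    start_offset = raw_exons[start_exon - 1].index("|")

    return {
        "upstream": outside[0],
        "exons": [exon.replace("|", "") for exon in raw_exons],
        "introns": outside[1:-1],
        "downstream": outside[-1],
        # (exon number, 0-based offset of the translation start in that exon)
        "translation_start": (start_exon, start_offset),
    }
```

Called on the example string above, the sketch reports 'cccccccc' as upstream, 'ggggggat' and 'gggggggg' as the exons, 'tttttttt' as the only intron, 'aaaaaaaa' as downstream, and the translation start in exon 1 at offset 6.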
Input format for step 3
EXON;EXON;EXON; (...etc...) ;EXON
Make sure you include the starting point of the translation by putting a '|' in front of the 'a' of the 'atg' starting codon. If you started at step 1 or 2, the same start codon is used.
Example:
gggggg|at;gggggggg
will parse:
- 'ggggggat' as exon 1, with the translation starting at the 'a'.
- 'gggggggg' as exon 2.
Step 3 will create the coding DNA sequence including the translation and save it in the refseq directory.
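The step 3 format can be read in the same spirit. The Python sketch below is an illustration only, not LOVD's own implementation; the name parse_step3 and its return value are assumptions. It splits the input on ';' and reports where the translation starts within the joined coding DNA sequence.

```python
def parse_step3(formatted):
    """Split a step 3 string into exons and locate the translation start.

    Hypothetical helper, not part of LOVD. Returns (exons, start), where
    `start` is the 0-based position of the translation start in the
    sequence obtained by joining all exons.
    """
    text = "".join(formatted.split()).lower()
    if text.count("|") != 1:
        raise ValueError("expected exactly one '|' marking the translation start")

    raw_exons = text.split(";")
    if any(exon == "" for exon in raw_exons):
        raise ValueError("empty exon found; check the ';' separators")

    # With the ';' separators removed, the index of '|' equals the number of
    # nucleotides that precede the translation start.
    start = text.replace(";", "").index("|")
    exons = [exon.replace("|", "") for exon in raw_exons]
    return exons, start
```

For the example above, this returns the exons 'ggggggat' and 'gggggggg', with the translation starting at position 6 of the joined sequence.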
For examples of what these reference sequences look like, see the reference sequences at www.DMD.nl, such as the CAPN3 reference sequence.