Help:

What are the various ID types?

1. AlphaFold Accessions: These accessions are derived directly from UniProtKB accessions (for example, AF-Q9ULZ0-F1 for Q9ULZ0) and are used to download structural coordinate files from the AlphaFold Database.

2. UniProt Accessions: These are stable identifiers for UniProtKB entries, 6 or 10 alphanumeric characters long. They can be searched directly in the AlphaFold Structure Database.

3. RefSeq Accessions: The NCBI Reference Sequence (RefSeq) project provides non-redundant, curated sequence records and related information for numerous organisms, and serves as a baseline for medical, functional, and comparative studies. A RefSeq record can be identified by its distinct accession number format, which begins with two characters followed by an underscore (e.g., WP_). Several such prefixes exist.

4. Locus Tags: Locus tags (locus_tag) are identifiers that are systematically applied to every protein-coding and non-coding gene in a genome. Their prefixes are unique to every organism, and each gene has exactly one locus_tag. The locus_tag prefix can contain only alphanumeric characters and must be at least 3 characters long.

5. Old Locus Tags: Old locus tags were in use before re-annotation by the Prokaryotic Genome Annotation Pipeline (PGAP); the current locus tags come from that newer annotation. Current RefSeq bacterial genome records still carry the earlier tag in the /old_locus_tag qualifier.

6. TaxIDs: Each NCBI Taxonomy entry, or TaxNode, is assigned a public, stable, unique numerical identifier, the taxonomy identifier (TaxId), which is shared by all names for that TaxNode.
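
As a rough illustration of how these formats differ, the shell sketch below guesses the ID type from the shape of an identifier. The function name guess_id_type and its patterns are assumptions made for illustration only; they are not the service's own validation rules. The names it prints simply reuse the <id-type> values listed in the API section.

```shell
# Heuristic guess of the ID type from the shape of an identifier.
# The patterns below are illustrative assumptions, not the service's rules.
guess_id_type() {
  case "$1" in
    '')                 echo unknown ;;
    AF-*-F[0-9]*)       echo alphafold ;;    # e.g. AF-Q9ULZ0-F1
    [A-Z][A-Z]_[0-9]*)  echo refseq ;;       # two characters, then "_"
    *_RS[0-9]*)         echo locustag ;;     # e.g. GVO57_RS04600
    *_*)                echo oldlocustag ;;  # e.g. GVO57_04600
    *[!0-9]*)                                # contains a non-digit
      case ${#1} in
        6|10) echo uniprot ;;                # 6 or 10 characters long
        *)    echo unknown ;;
      esac ;;
    *)                  echo taxonomy ;;     # digits only
  esac
}

guess_id_type WP_004157415.1   # prints: refseq
guess_id_type A0A7Z2NV19       # prints: uniprot
```

Because locus tag prefixes must be at least 3 characters long, a two-character prefix followed by an underscore is treated as RefSeq here. Anything the patterns cannot place is reported as unknown rather than guessed.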


Sample Inputs and Outputs:

If the input is AlphaFold Accessions or UniProt Accessions, the respective AlphaFold prediction will be downloaded. Some UniProt accessions have no AlphaFold predicted structure; no files are downloaded in such cases.


Input (either AlphaFold IDs or UniProt Accessions) and the structure files downloaded:

AlphaFold ID | UniProt Accession | Structure files downloaded
AF-Q9ULZ0-F1 | Q9ULZ0            | AF-Q9ULZ0-F1.pdb, AF-Q9ULZ0-F1.cif, AF-Q9ULZ0-F1_error.json
AF-Q8R143-F1 | Q8R143            | AF-Q8R143-F1.pdb, AF-Q8R143-F1.cif, AF-Q8R143-F1_error.json
AF-Q9LU47-F1 | Q9LU47            | AF-Q9LU47-F1.pdb, AF-Q9LU47-F1.cif, AF-Q9LU47-F1_error.json
AF-P05453-F1 | P05453            | AF-P05453-F1.pdb, AF-P05453-F1.cif, AF-P05453-F1_error.json
AF-Q7WTR3-F1 | Q7WTR3            | AF-Q7WTR3-F1.pdb, AF-Q7WTR3-F1.cif, AF-Q7WTR3-F1_error.json
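
The file names in the sample output above follow a regular pattern, which the sketch below reproduces. The function name af_files is hypothetical, and the code assumes a single fragment (F1) and only the three files shown in this sample; other samples in this help also list a .bcif file.

```shell
# Print the file names expected for one accession, following the sample above.
# Accepts either a UniProt accession (Q9ULZ0) or an AlphaFold ID (AF-Q9ULZ0-F1).
# Assumption: fragment F1 only, and only the three files listed in the sample.
af_files() {
  acc=${1#AF-}      # drop a leading "AF-" if present
  acc=${acc%-F1}    # drop a trailing "-F1" if present
  printf 'AF-%s-F1.pdb\n'        "$acc"
  printf 'AF-%s-F1.cif\n'        "$acc"
  printf 'AF-%s-F1_error.json\n' "$acc"
}

af_files Q9ULZ0
# prints:
# AF-Q9ULZ0-F1.pdb
# AF-Q9ULZ0-F1.cif
# AF-Q9ULZ0-F1_error.json
```

A list like this can be compared against the contents of a downloaded archive to spot accessions for which nothing was returned.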


A RefSeq identifier given as input can map to a single species or to multiple species, and hence to one or several UniProt accessions (examples 1 and 3 in the table below). When no structure file is downloaded, either there is no corresponding UniProt Accession (example 2) or there is no predicted structure (example 4).


Example | RefSeq Protein ID | Mapped UniProt Accession   | Structure files downloaded
1       | NP_666037.1       | Q8R143                     | AF-Q8R143-F1.pdb, AF-Q8R143-F1.cif, AF-Q8R143-F1_error.json
2       | WP_296409975.1    | Not found                  | No files downloaded
3       | WP_004157415.1    | A0A831EQJ9, Q7WTR3, D4I2C8 | The files corresponding to all 3 UniProt accessions will be downloaded
4       | NP_056986.2       | O95071                     | No predicted structures for this accession are available on AlphaFoldDB; no files downloaded


Old locus tags given as input are mapped to the corresponding UniProt accession. When no structure file is downloaded, either there is no corresponding UniProt Accession (examples 2 and 4) or there is no predicted structure (example 3).


Old Locus Tags (input)
GVO57_04600
L3078_27850
CL52_18575
LN051_09145

Example | Organism Name                        | Locus Tag     | Old Locus Tag | UniProt Accession | Files Downloaded
1       | Sphingomonas changnyeongensis        | GVO57_RS04600 | GVO57_04600   | A0A7Z2NV19        | AF-A0A7Z2NV19-F1.pdb, AF-A0A7Z2NV19-F1.cif, AF-A0A7Z2NV19-F1.bcif, AF-A0A7Z2NV19-F1_error.json
2       | Streptomyces deccanensis             | L3078_RS27850 | L3078_27850   | Not found         | No files downloaded
3       | Stutzerimonas balearica DSM 6083     | CL52_RS18275  | CL52_18575    | A0A8D3Y4T1        | No predicted structures for this accession are available on AlphaFoldDB; no files downloaded
4       | Staphylococcus ratti strain CCM 9025 | LN051_RS09145 | LN051_09145   | Not found         | No files downloaded


Locus tags given as input are first mapped to old locus tags, and these old locus tags are then mapped to UniProt accessions. This is done because the old locus tags have a one-to-one association with UniProt accessions. No structure file is downloaded when no corresponding UniProt accession is found (examples 2 and 4) or when no predicted structure is available (example 3).


Locus Tags (input)
GVO57_RS04600
L3078_RS27850
CL52_RS18275
J0917_RS17640

Example | Organism Name                    | Locus Tag     | Old Locus Tag | UniProt Accession | Files Downloaded
1       | Sphingomonas changnyeongensis    | GVO57_RS04600 | GVO57_04600   | A0A7Z2NV19        | AF-A0A7Z2NV19-F1.pdb, AF-A0A7Z2NV19-F1.cif, AF-A0A7Z2NV19-F1.bcif, AF-A0A7Z2NV19-F1_error.json
2       | Streptomyces deccanensis         | L3078_RS27850 | L3078_27850   | Not found         | No files downloaded
3       | Stutzerimonas balearica DSM 6083 | CL52_RS18275  | CL52_18575    | A0A8D3Y4T1        | No predicted structures for this accession are available on AlphaFoldDB; no files downloaded
4       | Streptomyces nodosus             | J0917_RS17640 | Not found     | Not found         | No files downloaded


For NCBI TaxIDs as input, the structure files will be downloaded for all available AlphaFold predictions of proteins in that organism.



API:

This service allows you to download structures programmatically from your terminal.

Template URL for single accession as input:
https://project.iith.ac.in/sharmaglab/alphafoldextractor/api/<id-type>/<id>

Template URL for multiple accessions as input:
https://project.iith.ac.in/sharmaglab/alphafoldextractor/api/<id-type>

<id-type>: oldlocustag / locustag / uniprot / alphafold / refseq / taxonomy
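
The two URL templates can be assembled with a small helper. This is a minimal sketch: the function name af_url and the variable AFX_BASE are hypothetical, and the only checking done is against the <id-type> values listed above.

```shell
# Base URL taken from the templates above.
AFX_BASE="https://project.iith.ac.in/sharmaglab/alphafoldextractor/api"

# Build the request URL.
#   $1 = id-type (one of the values listed above)
#   $2 = id (optional; omit it for the multiple-accession endpoint)
af_url() {
  case "$1" in
    oldlocustag|locustag|uniprot|alphafold|refseq|taxonomy) ;;
    *) echo "unsupported id-type: $1" >&2; return 1 ;;
  esac
  if [ -n "${2:-}" ]; then
    printf '%s/%s/%s\n' "$AFX_BASE" "$1" "$2"
  else
    printf '%s/%s\n' "$AFX_BASE" "$1"
  fi
}

af_url oldlocustag MXAN_1028
# prints: https://project.iith.ac.in/sharmaglab/alphafoldextractor/api/oldlocustag/MXAN_1028
```

Rejecting an unknown id-type locally avoids sending a request that the service could not route.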

For multiple accessions, the request body accepts only ids and, optionally, format. format defaults to 'pdb'.

ids : Comma-separated values. Spaces and '_' are tolerated. Unsupported symbols are converted to '_'.
format : [pdb] / cif / bcif / pae / all
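
The request body for the multiple-accession endpoint can be built as sketched below. The function name af_body is hypothetical. Removing spaces locally is a choice made here for a tidy body (the service is described as tolerating them), and the format value is checked only against the list above.

```shell
# Build the POST body for the multiple-accession endpoint.
#   $1 = comma-separated IDs
#   $2 = format (optional; "pdb" is the documented default)
af_body() {
  fmt=${2:-pdb}
  case "$fmt" in
    pdb|cif|bcif|pae|all) ;;
    *) echo "unsupported format: $fmt" >&2; return 1 ;;
  esac
  ids=$(printf '%s' "$1" | tr -d ' ')   # local choice: strip spaces
  printf 'ids=%s&format=%s\n' "$ids" "$fmt"
}

af_body "MXAN_1028, GVO57_04600" bcif
# prints: ids=MXAN_1028,GVO57_04600&format=bcif
```

The resulting string can be passed to curl with -d or to wget with --post-data.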

Examples:

Single accession as input:

curl -JL https://project.iith.ac.in/sharmaglab/alphafoldextractor/api/oldlocustag/MXAN_1028 > structures.zip

wget --output-document=structures.zip https://project.iith.ac.in/sharmaglab/alphafoldextractor/api/oldlocustag/MXAN_1028


Multiple accessions as input:

curl -OJL -d "ids=MXAN_1028,GVO57_04600" https://project.iith.ac.in/sharmaglab/alphafoldextractor/api/oldlocustag
curl -JL -d "ids=MXAN_1028,GVO57_04600&format=bcif" https://project.iith.ac.in/sharmaglab/alphafoldextractor/api/oldlocustag > MXAN_1028.zip
curl -JL -d "ids=MXAN_1028,GVO57_04600" -d "format=bcif" https://project.iith.ac.in/sharmaglab/alphafoldextractor/api/oldlocustag > MXAN_1028.zip

wget --post-data="ids=MXAN_1028,GVO57_04600" --content-disposition --trust-server-names https://project.iith.ac.in/sharmaglab/alphafoldextractor/api/oldlocustag
wget --post-data="ids=MXAN_1028&format=bcif" --output-document=MXAN_1028.zip https://project.iith.ac.in/sharmaglab/alphafoldextractor/api/oldlocustag
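
For longer lists, the IDs can be kept in a file with one ID per line and joined into the ids value before posting. The sketch below follows the curl examples above. The function names join_ids and fetch_batch and the file names are hypothetical, and the request itself is left commented out.

```shell
# Join a file with one ID per line into the comma-separated "ids" value.
# Blank lines and stray spaces, tabs and carriage returns are dropped.
join_ids() {
  tr -d ' \t\r' < "$1" | grep -v '^$' | paste -s -d ',' -
}

# Hypothetical wrapper: $1 = id-type, $2 = file of IDs, $3 = output zip,
# $4 = format (optional; "pdb" is the documented default).
fetch_batch() {
  curl -L -o "$3" \
    -d "ids=$(join_ids "$2")" -d "format=${4:-pdb}" \
    "https://project.iith.ac.in/sharmaglab/alphafoldextractor/api/$1"
}

# Usage (performs a network request, so it is left commented out):
# fetch_batch oldlocustag ids.txt structures.zip bcif
```

Passing ids and format as two separate -d options is equivalent to joining them with a single '&' in one body.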