About the tool:
The AlphaFold Protein Structure Database now hosts an avalanche of predicted structures, which has brought a paradigm shift in the analysis and interpretation of protein function. Access to the structure files in PDB and mmCIF formats makes it easy for structural biologists and bioinformaticians to perform analyses. AlphaFold DB offers bulk structure downloads for 48 organisms (model organisms and global health proteomes), and bulk download was very recently enabled for up to 100 structures. However, there is no direct way to selectively download all the structures of interest at once. In addition, many bioinformaticians work with sequence data from NCBI and have various protein sequence identifiers at hand, but cannot retrieve structures using those identifiers.
To bridge these gaps, we have developed AlphaFoldDBExtractor, a tool that lets users download all available predicted structures from AlphaFold DB for their dataset of interest using just a list of taxonomy IDs or protein accessions.
Input:
AlphaFoldDBExtractor accepts input in several formats:

| List of specific proteins |
|---|
| AlphaFold IDs |
| UniProt IDs |
| Locus tags |
| Old locus tags |
| RefSeq Protein IDs |

| All proteins of an organism |
|---|
| NCBI TaxID |
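
To illustrate how these input types differ in shape, here is a minimal Python sketch that guesses the type of a single identifier from its pattern. It is not taken from AlphaFoldDBExtractor, and the tool may detect input types differently (for example, by asking the user to state the type). The function names and returned labels are hypothetical. The UniProt pattern follows the accession format published by UniProt. The RefSeq prefixes (AP, NP, YP, XP, WP) and the `AF-<accession>-F<n>` layout of AlphaFold IDs are assumptions based on the usual conventions of those databases. Locus tags and old locus tags have no universal format, so the sketch treats them as the fallback.

```python
import re

# UniProt accession format as published in the UniProt documentation.
UNIPROT_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Assumed AlphaFold ID layout: AF-<UniProt accession>-F<fragment number>.
ALPHAFOLD_RE = re.compile(r"AF-(?P<acc>[A-Z0-9]+)-F\d+")
# Assumed RefSeq protein layout: prefix, underscore, digits, optional version.
REFSEQ_PROTEIN_RE = re.compile(r"(?:AP|NP|YP|XP|WP)_\d+(?:\.\d+)?")


def classify_identifier(identifier: str) -> str:
    """Guess the input type of one identifier (hypothetical helper)."""
    token = identifier.strip()
    if not token:
        raise ValueError("empty identifier")
    if ALPHAFOLD_RE.fullmatch(token):
        return "alphafold_id"
    if REFSEQ_PROTEIN_RE.fullmatch(token):
        return "refseq_protein"
    if UNIPROT_RE.fullmatch(token):
        return "uniprot_accession"
    if token.isascii() and token.isdigit():
        return "ncbi_taxid"
    # Locus tags and old locus tags cannot be told apart by pattern alone.
    return "locus_tag"


def accession_from_alphafold_id(alphafold_id: str) -> str:
    """Extract the UniProt accession embedded in an AlphaFold ID."""
    match = ALPHAFOLD_RE.fullmatch(alphafold_id.strip())
    if match is None:
        raise ValueError(f"not an AlphaFold ID: {alphafold_id!r}")
    return match.group("acc")
```

Checking the RefSeq pattern before the locus-tag fallback matters because a RefSeq accession such as `WP_000000001.1` would otherwise look like a generic tag. An all-digit token is read as a TaxID here only because no protein identifier in the list above is purely numeric.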
Workflow:
Structures are downloaded from the FTP page of the AlphaFold Database using UniProt accessions, so the core protocol of this tool is the mapping of the other ID formats to UniProt accessions. The mapping works as follows:
