Reference
RESTful API Reference
🚧 stands for waiting for implementation
- PDBe REST API
- PDBe Graph API (Neo4j Graph DataBase)
- ModelServer API (A successor of the CoordinateServer)
PDBe
: https://www.ebi.ac.uk/pdbe/model-server/RCSB
: https://models.rcsb.org/ (NOTE: Its’ update frequency may fall behind the PDBe’s.)
- PDBe CoordinateServer API
- 🚧PDBe DensityServer API
- 🚧PDBe VolumeServer API
- SWISS-MODEL Repository API
- EBI Proteins API
- UniProt API
- Interactome3D API
- Ensembl REST API
- Eutils API
- https://eutils.ncbi.nlm.nih.gov/entrez/eutils/ (NOTE: currently only support minimum use)
- Download data from PDB Archive against unexpected needs
- RCSB Data API
- https://data.rcsb.org/
- both RESTful API & GraphQL API
- RCSB Search API
- RCSB PDB 1D Coordinate Server API
Package Method API Reference
Base, Compounds, PDB, SIFTS
from pdb_profiling.processors import Base, Compounds, PDB, SIFTS
PDBAssembly, PDBInterface
from pdb_profiling.processors import PDBAssembly, PDBInterface
PDBs, SIFTSs
from pdb_profiling.processors import PDBs, SIFTSs
Identifier, Identifiers
from pdb_profiling.processors import Identifier, Identifiers
PDBeModelServer, PDBeCoordinateServer, PDBArchive, PDBVersioned
from pdb_profiling.processors import PDBeModelServer, PDBeCoordinateServer, PDBArchive, PDBVersioned
UniProts, UniProtFASTA, ProteinsAPI, EnsemblAPI, EutilsAPI, SMR
from pdb_profiling.processors import UniProts, UniProtFASTA, ProteinsAPI, EnsemblAPI, EutilsAPI, SMR
Interactome3D
from pdb_profiling.processors.i3d.api import Interactome3D
cif_gz_stream
from pdb_profiling import cif_gz_stream
PISAErrorWarning
from pdb_profiling.warnings import PISAErrorWarning
Column Name Reference
Terminology in PDB
Defined by PDBe
pdb_id
: PDB Entry IDresolution
: (pdb-101-explanation)experimental_method_class
: (pdb-101-explanation)experimental_method
: x-ray, nmr, em, otherrevision_date
: as name saiddeposition_date
: as name saidentity_id
: the entity identifier of a PDB Entity; for what is PDB Entity? please look at this linkmolecule_type
- polypeptide(L)
- cyclic-pseudo-peptide
- e.g. 3q9h,6iqg BUT polypeptide(L) was asigned to entity_poly.type
- cyclic-pseudo-peptide
- polypeptide(D)
- e.g.3ue7, struct_asym_id: A
- polydeoxyribonucleotide
- polyribonucleotide
- polydeoxyribonucleotide/polyribonucleotide hybrid
- e.g. 6ozg
- carbohydrate_polymer
- e.g. 2wmg
- bound
- ligand
- sugar
- other
- water
- polypeptide(L)
sequence
: one-letter-code of SEQRESpdb_sequence
: SEQRES sequence that use one-letter-code for standard amino-acid or nucleotide, three-letter-code-with-brackets for non-standard amino-acid or non-standard nucleotide or UNKca_p_only
: whether the PDB Entity only contains C-alpha atom for each observed residuechain_id
: the chain identifier of a PDB Chain defined by author (highly possible to clash among different entity)chain_id
: implement by PDBe REST APIauth_asym_id
: implement by PDBe ModelServer APIauthAsymId
: implement by PDBe CoordinateServer API
struct_asym_id
: the chain identifier of a PDB Chain (unique across all PDB Entity)struct_asym_id
: implement by PDBe REST APIlabel_asym_id
: implement by PDBe ModelServer APIasymId
: implement by PDBe CoordinateServer API
residue_number
: residue index in the aspect of the PDB Chain’s Sequence (SEQRES, Index from 1)author_residue_number
: residue index in the aspect of the PDB Chain’s Sequence but assigned by the author of this PDB Entryauthor_insertion_code
: residue insertion code provided by author in the PDB filemultiple_conformers
: those alternate conformers modelled for this residueassembly_id
- 0 stands for asymmetric unit
- 1 stands for biological assembly 1
- 2 stands for biological assembly 2
- and so on
- for what is asymmetric unit & biological assembly? please click the link
interface_id
- Defined by
PISA
- The interface identifier in the corresponding biological assembly
- Defined by
UniProt
: UniProt Isoform ID- isoform suffix not shown for canonical sequence
is_canonical
: whether the UniProt Isoform is the canonical isoform defined by UniProt-KBunp_residue_number
: residue index in the aspect of the UniProt Isoform’s Sequence (Index from 1)identity
: provided by SIFTS: sequence identity of PDB Entity SEQRES with UniProt Isoform (0-1)pdb_start
: The starting index of the SIFTS alignment segment in PDB entity SEQRES (Index from 1)pdb_end
: The ending index of the SIFTS alignment segment in PDB entity SEQRES (Index from 1)unp_start
: The starting index of the SIFTS alignment segment in UniProt Isoform Sequenceunp_end
: The starting index of the SIFTS alignment segment in UniProt Isoform Sequence
Defined by pdb-profiling
multi_method
: whether the PDB entry was determined by multiple method1/resolution
: as name saidid_score
- calculated by
-sum(ord(i) for i in chain_id)
- used for multi-criteria sorting of chains according to their chain_id
- calculated by
SEQRES_COUNT
: the count of the residues in SEQRESOBS_INDEX
: the index of observed/modelled (with coordinates) residues of the PDB Chain InstanceOBS_COUNT
: the count of observed/modelled (with coordinates) residues of the PDB Chain InstanceOBS_RATIO_SUM
: the sum of the observed/modelled (with coordinates) residues’s ratio of the PDB Chain InstanceBINDING_LIGAND_INDEX
: the residues that binding to ligands(including carbohydrate polymer) of the PDB Chain InstanceBINDING_LIGAND_COUNT
: the count of residues that binding to ligands(including carbohydrate polymer) of the PDB Chain InstanceSTD_INDEX
: the index of the standard residues of the PDB EntitySTD_COUNT
: the count of the standard residues of the PDB EntityOBS_STD_INDEX
: the index of the standard residues of the PDB EntityOBS_STD_COUNT
: the count of the observed standard residues of the PDB Chain InstanceNON_INDEX
: the index of non-standard residues of the PDB Entity (including UNK)NON_COUNT
: the count of non-standard residues of the PDB Entity (including UNK)UNK_INDEX
: the index of the UNK residues of the PDB EntityUNK_COUNT
: the count of the UNK residues of the PDB EntitydNTP_INDEX
: the index (in SEQRES) of DA|DT|DC|DG|DI in that PDB EntitydNTP_COUNT
: the count of DA|DT|DC|DG|DI in that PDB EntityNTP_INDEX
: the index (in SEQRES) of A|U|C|G|I in that PDB EntityNTP_COUNT
: the count of A|U|C|G|I in that PDB EntityARTIFACT_INDEX
- the index of those artifact residues in the PDB Entity
- including:
- Cloning artifact
- Expression tag
- Initiating methionine
- Linker
Entry
: UniProt Entry ID/Accessionunp_range
: mapped range of the UniProt Isoform’s Sequence with its corresponding PDB Chain’s Sequence (Index from 1)- generated from
unp_start
andunp_end
- generated from
pdb_range
: mapped range of the PDB Chain’s Sequence Sequence with its corresponding UniProt Isoform’s (Index from 1)- generated from
pdb_start
andpdb_end
- generated from
new_unp_range
: fixed(deal with InDel) mapped range of the UniProt Isoform’s Sequence with its corresponding PDB Chain’s Sequence (Index from 1)new_pdb_range
: fixed(deal with InDel) mapped range of the PDB Chain’s Sequence Sequence with its corresponding UniProt Isoform’s (Index from 1)sifts_range_tag
- (example)
Safe
Insertion
Deletion
InDel
reversed
- whether there is reversed mapped range in the aspect of UniProt Isoform Sequence (P00441 5j0c A)
- SIFTS是以PDB Chain Sequence的视角来匹配序列片段,少数情况会有把部分unp序列反向匹配
- “5j0c - it the circular permutant structure where authors have swapped the few chunk protein from front and back -(figure 2 in https://pubs.acs.org/doi/pdf/10.1021/jacs.6b05151). That’s why you see “the head of the UniProt sequence is mapped with the tail of the PDB-Chain sequence” – from Preeti Choudhary
repeated
- whether there is repeated mapped range in the aspect of UniProt Isoform Sequence (i.e. Q7KZ85-3 6gmh M)
- “In SIFTS, the segment generation is done from PDB point of view, that’s why you will see continuous pdb ranges. Seldom, in protein structures, you may see a same protein (uniprot accession) is present in copies/or is repeated” – from Preeti Choudhary
conflict_pdb_index
: the index(dictionary) of those residues that confilct with the reference sequence(UniProt Isoform) in the pdb mapped rangeconflict_pdb_range
: the range of those residues that confilct with the reference sequence(UniProt Isoform) in the pdb mapped rangeconflict_unp_range
: the range of those residues that confilct with the reference sequence(UniProt Isoform) in the unp mapped rangeunp_len
: the length of the UniProt Isoform SequenceInDel_sum
: SEQRES residues that fall into the range of Insertion or Deletion or InDel of the PDB Chain Instanceselect_tag
: whether in the recommanded representative setselect_rank
: the rank among all the candidate chains for the UniProt Isoform (1st denoted as the best)RAW_BS
: the weighted score that measure the correspondence between the UniProt Isoform and the PDB chain instance (apo-state) in the sequence level defined by pdb-profilingRAW_BS_IG3
: set the weight of the binding ligand residues to zero and then calculate the RAW_BS_1
: denoted as the partner chain 1_2
: denoted as the partner chain 2model_id
: the model ID of the chain in the corresponding biological assembly PDB format file- 0 denoted as the first model
asym_id_rank
- This column is defined in response to replication and rotation operation in the biological assembly
- It is the index of the order of occurrence of that
struct_asym_id
in the corresponding biological assembly - Both of the name and the content of this column defined by pdb-profiling
struct_asym_id_in_assembly
- This column is defined in response to replication and rotation operation in the biological assembly
- It is the regenerated
struct_asym_id
of the chain instance in the corresponding biological assembly - For asymmetric unit, the content in this column is the same as the content in the
struct_asym_id
column - The name of this column defined by
pdb-profiling
- The content of this column defined by
PDBe
unp_range_DSC
: the Dice Similarity Coefficient ofnew_unp_range_1
&new_unp_range_2
interface_range_1
: the range of the interaction’s interface in the aspect of partner1 chain (Index from 1)interface_range_2
: the range of the interaction’s interface in the aspect of partner2 chain (Index from 1)unp_interface_range_1
: the range of the interaction’s interface in the aspect of partner1 chain (mapped to the UniProt Isoforom)unp_interface_range_2
: the range of the interaction’s interface in the aspect of partner2 chain (mapped to the UniProt Isoforom)i_select_tag
: whether in the recommanded interaction representative seti_select_rank
: the rank among all the interacting-chains (1st denoted as the best)i_group
: the Interaction Groupin_i3d
: judge whether the PPI is in theinteraction3d
dataset (condisider both of the UniProt entry interaction and the PDB chain interaction wtih corresponding biological assembly and model-id)