Command Line Tutorial
Folder Tree
To start with, choose a folder as the output directory of PDB-Profiling. The current path is the default, so you do not need to pass the --folder argument if you intend to use the current path.
# You do not need to run the following commands in your actual task.
# They only show the sub-folders and files that the `pdb_profiling` command creates if they do not already exist.
pdb_profiling --folder $your_output_folder init # or just: pdb_profiling init
tree $your_output_folder
.
├── api
│   ├── mappings
│   │   ├── all_isoforms
│   │   ├── best_structures
│   │   ├── cath
│   │   ├── cath_b
│   │   ├── ec
│   │   ├── ensembl
│   │   ├── go
│   │   ├── hmmer
│   │   ├── homologene
│   │   ├── homologene_uniref90
│   │   ├── interpro
│   │   ├── isoforms
│   │   ├── pfam
│   │   ├── scop
│   │   ├── sequence_domains
│   │   ├── structural_domains
│   │   ├── uniprot
│   │   ├── uniprot_publications
│   │   ├── uniprot_segments
│   │   ├── uniprot_to_pfam
│   │   └── uniref90
│   ├── nucleic_mappings
│   │   ├── rfam
│   │   └── sequence_domains
│   ├── pdb
│   │   └── entry
│   │       ├── assembly
│   │       ├── binding_sites
│   │       ├── carbohydrate_polymer
│   │       ├── cofactor
│   │       ├── drugbank
│   │       ├── electron_density_statistics
│   │       ├── entities
│   │       ├── experiment
│   │       ├── files
│   │       ├── ligand_monomers
│   │       ├── modified_AA_or_NA
│   │       ├── molecules
│   │       ├── mutated_AA_or_NA
│   │       ├── observed_residues_ratio
│   │       ├── polymer_coverage
│   │       ├── related_experiment_data
│   │       ├── residue_listing
│   │       ├── secondary_structure
│   │       ├── status
│   │       └── summary
│   ├── pisa
│   │   ├── asiscomponent
│   │   ├── interfacedetail
│   │   └── interfacelist
│   └── validation
│       ├── global-percentiles
│       │   └── entry
│       ├── key_validation_stats
│       │   └── entry
│       ├── nmr_cyrange_cores
│       │   └── entry
│       ├── nmr_ensemble_clustering
│       │   └── entry
│       ├── outliers
│       │   └── all
│       ├── protein-ramachandran-sidechain-outliers
│       │   └── entry
│       ├── protein-RNA-DNA-geometry-outlier-residues
│       │   └── entry
│       ├── rama_sidechain_listing
│       │   └── entry
│       ├── residuewise_outlier_summary
│       │   └── entry
│       ├── RNA_pucker_suite_outliers
│       │   └── entry
│       ├── summary_quality_scores
│       │   └── entry
│       ├── vdw_clashes
│       │   └── entry
│       └── xray_refine_data_stats
│           └── entry
├── coordinate-server
│   ├── ambientResidues
│   ├── assembly
│   ├── backbone
│   ├── cartoon
│   ├── chainsentities
│   ├── full
│   ├── het
│   ├── ligandInteraction
│   ├── residueRange
│   ├── residues
│   ├── sidechain
│   ├── symmetryMates
│   ├── trace
│   └── water
├── data_rcsb
│   ├── assembly
│   ├── branched_entity
│   ├── branched_entity_instance
│   ├── entry
│   ├── graphql
│   ├── nonpolymer_entity
│   ├── nonpolymer_entity_instance
│   ├── polymer_entity
│   ├── polymer_entity_instance
│   └── search
├── ensembl
│   ├── archive
│   │   └── id
│   └── sequence
│       └── id
├── eutils
│   └── efetch
├── graph-api
│   ├── compound
│   │   ├── atoms
│   │   ├── bonds
│   │   ├── cofactors
│   │   └── summary
│   ├── mappings
│   │   ├── all_isoforms
│   │   ├── best_structures
│   │   ├── ensembl
│   │   ├── homologene
│   │   ├── isoforms
│   │   ├── sequence_domains
│   │   ├── uniprot
│   │   └── uniprot_segments
│   ├── pdb
│   │   ├── bound_excluding_branched
│   │   ├── bound_molecule_interactions
│   │   ├── bound_molecules
│   │   ├── funpdbe
│   │   ├── funpdbe_annotation
│   │   │   ├── 14-3-3-pred
│   │   │   ├── 3Dcomplex
│   │   │   ├── 3dligandsite
│   │   │   ├── akid
│   │   │   ├── camkinet
│   │   │   ├── canSAR
│   │   │   ├── cath-funsites
│   │   │   ├── ChannelsDB
│   │   │   ├── depth
│   │   │   ├── dynamine
│   │   │   ├── FoldX
│   │   │   ├── M-CSA
│   │   │   ├── MetalPDB
│   │   │   ├── Missense3D
│   │   │   ├── p2rank
│   │   │   ├── POPScomp_PDBML
│   │   │   └── ProKinO
│   │   ├── ligand_monomers
│   │   ├── modified_AA_or_NA
│   │   ├── mutated_AA_or_NA
│   │   ├── secondary_structure
│   │   └── sequence_conservation
│   ├── pdbe_pages
│   │   ├── annotations
│   │   ├── binding_sites
│   │   ├── domains
│   │   ├── interfaces
│   │   ├── rfam
│   │   ├── secondary_structure
│   │   └── uniprot_mapping
│   ├── residue_mapping
│   └── uniprot
│       ├── annotations
│       ├── domains
│       ├── interface_residues
│       ├── ligand_sites
│       ├── secondary_structures
│       ├── sequence_conservation
│       └── unipdb
├── interactome3d
│   └── split
├── local_db
│   ├── custom.db
│   ├── I3DDB.db
│   ├── PDBeDB.db
│   ├── proteinsAPI.db
│   ├── RCSBDB.db
│   └── uniprot.db
├── model-server
│   ├── assembly
│   ├── atoms
│   ├── full
│   ├── ligandresidueSurroundings
│   ├── query-many
│   ├── residueInteraction
│   └── symmetryMates
├── pdb
│   └── data
│       └── structures
│           ├── divided
│           │   ├── mmCIF
│           │   ├── pdb
│           │   └── XML
│           └── obsolete
│               ├── mmCIF
│               ├── pdb
│               └── XML
├── pdbe_assembly_cif
├── proteins
│   └── api
│       └── proteins
├── swissmodel
│   └── repository
│       └── uniprot
└── UniProt
    ├── fasta
    └── uploadlists
Map from transcript/protein identifier to UniProt Isoform
For a canonical isoform, the isoform suffix is dropped.
pdb_profiling --folder $your_output_folder id-mapping --input $id_file
--folder $your_output_folder can be omitted; the default value is the current path.
Demo content for $id_file
ENSP00000491589
ENST00000379268
ENSP00000427757
ENSP00000266732
NP_001291289.1
ENST00000335295
NP_001165602.1
P21359-3
ENST00000402254
ENST00000371100
NM_000267.3
P68871
The mapping results are stored in the IDMapping table of $your_output_folder/local/CustomDB.db.
If your $id_file has column name(s), you can add --column $column_name, and the command becomes:
pdb_profiling --folder $your_output_folder id-mapping --input $id_file --column $column_name
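To inspect the mapping results from Python, the IDMapping table can be read with the standard-library sqlite3 module. The sketch below assumes that the database is a regular SQLite file and that the table is named IDMapping, as stated above. The column names are not documented here, so the query selects every column. The helper name read_id_mapping is hypothetical and is not part of pdb_profiling.

```python
import sqlite3
from pathlib import Path


def read_id_mapping(db_path, table="IDMapping", limit=5):
    """Return (column_names, rows) for the first `limit` rows of `table`.

    Assumption: `db_path` is a SQLite file. The table name follows the
    tutorial text; the columns are whatever the tool wrote.
    """
    if not Path(db_path).is_file():
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(str(db_path))
    try:
        # Table names cannot be bound as parameters, so the identifier is quoted.
        cur = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,))
        columns = [description[0] for description in cur.description]
        return columns, cur.fetchall()
    finally:
        conn.close()
```

Calling read_id_mapping with the path of the DB file described above would return the column names together with the first five rows, which is usually enough to check that the mapping ran as expected.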
Map from UniProt Isoform to available PDB Chain instance
pdb_profiling --folder $your_output_folder sifts-mapping --func pipe_base --output $output_file
By default, this command loads UniProt Isoforms from the IDMapping table of $your_output_folder/local/CustomDB.db and performs the SIFTS mapping procedure.
Note that if you start a new task in the same folder, you should first delete the IDMapping table of $your_output_folder/local/CustomDB.db, or delete the whole DB file.
Load External File to process
If you already have a file containing UniProt Isoform identifiers, you can run the sifts-mapping command directly, without running the id-mapping command first:
pdb_profiling --folder $your_output_folder sifts-mapping --input $id_file --func pipe_base --output $output_file
As before, if your $id_file has column name(s), you can add --column $column_name.
Note that you need to drop duplicate identifiers yourself.
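One way to remove duplicates before passing the file to --input is a small Python helper. This is a minimal sketch that assumes a single-column file with one identifier per line. The function names drop_duplicate_ids and dedupe_id_file are hypothetical. A header line, if present, is treated like any other line and therefore stays first.

```python
def drop_duplicate_ids(lines):
    """Return identifiers in first-seen order, skipping blanks and repeats."""
    seen = set()
    unique = []
    for line in lines:
        identifier = line.strip()
        if identifier and identifier not in seen:
            seen.add(identifier)
            unique.append(identifier)
    return unique


def dedupe_id_file(src_path, dst_path):
    """Write the unique identifiers of `src_path` to `dst_path`, one per line."""
    with open(src_path, encoding="utf-8") as src:
        unique = drop_duplicate_ids(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(unique) + "\n")
    return len(unique)
```

Keeping the first-seen order means the output file can be compared line by line with the original list.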
Perform Selection
Type | Args |
---|---|
Monomer | sifts-mapping --func pipe_select_mo (default) |
Homodimer | sifts-mapping --func pipe_select_ho |
Heterodimer | sifts-mapping --func pipe_select_he |
Protein-Ligand Interaction Pair | sifts-mapping --func pipe_select_else --kwargs '{"func": "pipe_protein_ligand_interface", "focus_assembly_ids": (0,)}' |
Protein-Nucleotide Interaction Pair | sifts-mapping --func pipe_select_else --kwargs '{"func": "pipe_protein_nucleotide_interface"}' |
Retrieve SMR Data
pdb_profiling --folder $your_output_folder sifts-mapping --input $id_file --output $output_file --func pipe_select_smr_mo
Map from PDB to UniProt Isoform
pdb_profiling --folder $your_output_folder sifts-mapping --input $id_file --output $output_file --func pipe_base --column pdb
Demo content for $id_file
pdb
1a01
2xyn
3hl2
4hho
Residue Mapping expanded from SIFTS mapping results
pdb_profiling --folder $your_output_folder residue-mapping --input $sifts_file --output $output_file
Note that the default separator is \t. If your input file uses another separator, you can add --sep $your_sep.
Demo content for $sifts_file
UniProt | pdb_id | struct_asym_id | new_pdb_range | new_unp_range | conflict_pdb_index |
---|---|---|---|---|---|
P16144 | 3f7p | C | [[4,247]] | [[1126,1369]] | {} |
P49366 | 6wkz | A | [[4,372]] | [[1,369]] | {} |
Q96RI1-4 | 5q1b | A | [[5,233]] | [[254,482]] | {"38":"E","111":"E"} |
P49366 | 6xxh | A | [[1,369]] | [[1,369]] | {} |
O14558 | 4jus | A | [[1,104]] | [[57,160]] | {} |
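In the demo rows above, each PDB segment is paired with a UniProt segment of the same length. As a rough illustration of what expanding such a row to residue level means, the sketch below walks the paired ranges and yields one (PDB index, UniProt index) pair per residue. It assumes that both columns are JSON lists of inclusive [start, end] intervals in matching order. The function name expand_ranges is hypothetical, and this is not the tool's own implementation.

```python
import json


def expand_ranges(pdb_range, unp_range):
    """Yield (pdb_index, unp_index) pairs from two paired range strings."""
    pdb_segments = json.loads(pdb_range)
    unp_segments = json.loads(unp_range)
    if len(pdb_segments) != len(unp_segments):
        raise ValueError("range lists must contain the same number of segments")
    for (pdb_beg, pdb_end), (unp_beg, unp_end) in zip(pdb_segments, unp_segments):
        if pdb_end - pdb_beg != unp_end - unp_beg:
            raise ValueError("paired segments must have equal length")
        # Intervals are inclusive, hence the +1.
        for offset in range(pdb_end - pdb_beg + 1):
            yield pdb_beg + offset, unp_beg + offset


# First demo row: P16144 / 3f7p / chain C
pairs = list(expand_ranges("[[4,247]]", "[[1126,1369]]"))
print(len(pairs), pairs[0], pairs[-1])  # 244 (4, 1126) (247, 1369)
```

The conflict_pdb_index column is not used in this sketch; it is left out because its exact semantics are not described here.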
Download Complete PDB Structure from PDB Archive
.pdb format
pdb_profiling --folder $your_output_folder sifts-mapping --input $id_file --func fetch_from_PDBArchive --kwargs 'dict(api_suffix=\"divided/pdb/\")'
By default, the file suffix is .ent.gz. You can specify a different file suffix:
pdb_profiling --folder $your_output_folder sifts-mapping --input $id_file --func fetch_from_PDBArchive --kwargs 'dict(api_suffix=\"divided/pdb/\", file_suffix=\".pdb.gz\")'
.cif format
pdb_profiling --folder $your_output_folder sifts-mapping --input $id_file --func fetch_from_PDBArchive --kwargs 'dict(api_suffix=\"divided/mmCIF/\")'
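For orientation, the divided/ directories of the PDB archive group entries by the two middle characters of the PDB ID, which is also where the default .ent.gz suffix comes from. The sketch below builds the archive-relative path for an entry under that convention. It describes the archive layout only; how pdb_profiling names the files it saves locally is not covered here, and the function name archive_relative_path is hypothetical.

```python
def archive_relative_path(pdb_id, fmt="pdb"):
    """Relative path of an entry under the archive's structures/ directory."""
    pdb_id = pdb_id.lower()
    if len(pdb_id) != 4:
        raise ValueError("expected a 4-character PDB ID")
    middle = pdb_id[1:3]  # the divided layout groups entries by these two characters
    if fmt == "pdb":
        return f"divided/pdb/{middle}/pdb{pdb_id}.ent.gz"
    if fmt == "mmCIF":
        return f"divided/mmCIF/{middle}/{pdb_id}.cif.gz"
    raise ValueError(f"unsupported format: {fmt}")


print(archive_relative_path("1a01"))           # divided/pdb/a0/pdb1a01.ent.gz
print(archive_relative_path("3hl2", "mmCIF"))  # divided/mmCIF/hl/3hl2.cif.gz
```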
Collect PDB-Based Annotations
From SIFTS Resources
Sequence Domain
- api/mappings/sequence_domains/ (interpro & pfam)
- interpro: api/mappings/interpro/
- pfam: api/mappings/pfam/
Structural Domain
- api/mappings/structural_domains/ (scop & cath)
- api/mappings/scop/
- api/mappings/cath/
- api/mappings/cath_b/
Others
- api/mappings/go/
- api/mappings/ec/
- api/mappings/hmmer/
- api/pdb/entry/secondary_structure/
From PDBe Graph Database (PDBe Graph API/Aggregated API)
FunPDBe Resources
- graph-api/pdb/funpdbe_annotation/
- graph-api/pdb/funpdbe_annotation/depth/
- graph-api/pdb/funpdbe_annotation/cath-funsites/
- graph-api/pdb/funpdbe_annotation/3Dcomplex/
- graph-api/pdb/funpdbe_annotation/akid/
- graph-api/pdb/funpdbe_annotation/3dligandsite/
- graph-api/pdb/funpdbe_annotation/camkinet/
- graph-api/pdb/funpdbe_annotation/canSAR/
- graph-api/pdb/funpdbe_annotation/ChannelsDB/
- graph-api/pdb/funpdbe_annotation/dynamine/
- graph-api/pdb/funpdbe_annotation/FoldX/
- graph-api/pdb/funpdbe_annotation/MetalPDB/
- graph-api/pdb/funpdbe_annotation/M-CSA/
- graph-api/pdb/funpdbe_annotation/p2rank/
- graph-api/pdb/funpdbe_annotation/Missense3D/
- graph-api/pdb/funpdbe_annotation/POPScomp_PDBML/
- graph-api/pdb/funpdbe_annotation/ProKinO/
- graph-api/pdb/funpdbe_annotation/14-3-3-pred/
Others
- graph-api/pdb/bound_molecules/
- graph-api/pdb/bound_molecule_interactions
- graph-api/pdb/sequence_conservation/
- graph-api/uniprot/sequence_conservation/
- graph-api/uniprot/annotations/
- …
Code:
PDB('1a01').fetch_from_pdbe_api('api/mappings/sequence_domains/').result()
PDB(['1a01', '3hl2']).fetch('fetch_from_pdbe_api', api_suffix='api/mappings/structural_domains/').run().result()
Command line:
pdb_profiling sifts-mapping \
--input pdbs.dat \
--func fetch_from_pdbe_api \
--kwargs 'dict(api_suffix="api/mappings/interpro/")' \
--chunksize 50 \
--skip_pdbs ''
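When the same command has to be run for several API suffixes, it can help to assemble the arguments in Python and hand them to subprocess. The sketch below only builds the argument list from the flags shown above and does not execute anything. The helper name build_fetch_command is hypothetical, and the --skip_pdbs option is left out because its accepted values are not described here.

```python
import shlex


def build_fetch_command(input_file, api_suffix, folder=None, chunksize=50):
    """Assemble the documented `sifts-mapping` call as an argument list."""
    argv = ["pdb_profiling"]
    if folder is not None:
        # --folder is optional; the tool falls back to the current path.
        argv += ["--folder", str(folder)]
    argv += [
        "sifts-mapping",
        "--input", str(input_file),
        "--func", "fetch_from_pdbe_api",
        "--kwargs", f'dict(api_suffix="{api_suffix}")',
        "--chunksize", str(chunksize),
    ]
    return argv


argv = build_fetch_command("pdbs.dat", "api/mappings/pfam/")
# shlex.join renders the list as a copy-pasteable POSIX shell command.
print(shlex.join(argv))
```

Passing the list to subprocess.run(argv, check=True) avoids shell quoting issues around the --kwargs value, since each element is delivered to the program unchanged.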