The databases available for searching, along with their corresponding versions, are shown in the following table.
|PDB Chain||as of 2019-06-26|
You can search by a structure id or an uploaded PDB file.
The structure id can be any SCOPe, CATH, or ECOD domain identifier, or a PDB id concatenated with a PDB chain id to identify a whole chain within the PDB. The structure id does not have to match the database type you are searching; this is a feature of RUPEE that arose naturally when implementing the PDB file upload feature.
When uploading a PDB file, only the first chain of the first model is considered. Additionally, all backbone atoms (i.e. N, CA, C, and O) should be present for the search to be effective. If you want to find structures similar to a given domain, then upload the domain. If you want to find structures similar to a protein chain, then upload the chain and search the PDB Chain database.
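Before uploading, it can help to check that the first chain of the first model has a complete backbone. The following is a minimal sketch of such a check, not RUPEE's own code: it assumes standard fixed-column PDB `ATOM` records, and the function name `first_chain_backbone_report` is a hypothetical name chosen for illustration.

```python
BACKBONE = {"N", "CA", "C", "O"}

def first_chain_backbone_report(pdb_text):
    """Inspect only the first chain of the first model of a PDB file.

    Returns (chain_id, missing), where missing maps (resSeq, iCode) to the
    sorted list of backbone atom names absent from that residue.
    """
    chain_id = None
    residues = {}  # (resSeq, iCode) -> set of atom names seen
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "ENDMDL":
            break  # stop at the end of the first model
        if record != "ATOM" or len(line) < 27:
            continue
        this_chain = line[21:22]  # column 22: chain identifier
        if chain_id is None:
            chain_id = this_chain
        elif this_chain != chain_id:
            break  # a second chain has started
        key = (line[22:26].strip(), line[26:27].strip())  # resSeq, iCode
        residues.setdefault(key, set()).add(line[12:16].strip())  # atom name
    missing = {
        key: sorted(BACKBONE - atoms)
        for key, atoms in residues.items()
        if not BACKBONE <= atoms
    }
    return chain_id, missing
```

An empty `missing` dictionary means every residue in the chain that would be considered has all four backbone atoms.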
The available search filters change dynamically based on the selected search database and search by criteria. SCOPe, CATH, and ECOD are hierarchical classifications. CATH designates a representative domain for each grouping at each level of its hierarchy, whereas SCOPe and ECOD do not (or at least I'm not aware of this being the case). Whole PDB chains, on the other hand, are not classified into a hierarchy at all. Accordingly, the SCOPe, CATH, and ECOD databases allow you to filter the search results to those whose classification differs from that of the query structure at a chosen hierarchy level. In addition, the CATH database allows you to filter by hierarchy level representatives. Currently, no search filters are provided for the PDB Chain database. Search filters allow you to discover structural similarities between differently classified domains without the results being buried by known similarities.
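To illustrate the idea of filtering by classification differences, here is a minimal sketch using CATH-style dotted codes (class.architecture.topology.superfamily). It is an assumption about how such a filter could work, not RUPEE's implementation; the function names, the hit identifiers, and the example codes are placeholders for illustration only.

```python
CATH_LEVELS = ("class", "architecture", "topology", "superfamily")

def differs_at(query_code, hit_code, level):
    """True if the hit's classification differs from the query's
    anywhere from the top of the hierarchy down to the given level."""
    depth = CATH_LEVELS.index(level) + 1
    return query_code.split(".")[:depth] != hit_code.split(".")[:depth]

def filter_hits(query_code, hits, level):
    """Keep only (hit_id, code) pairs classified differently from the query."""
    return [hit for hit in hits if differs_at(query_code, hit[1], level)]

# Placeholder hits: (identifier, dotted classification code)
hits = [
    ("hitA", "3.40.50.300"),
    ("hitB", "3.40.50.720"),
    ("hitC", "3.30.70.100"),
]
different_topology = filter_hits("3.40.50.300", hits, "topology")
different_superfamily = filter_hits("3.40.50.300", hits, "superfamily")
```

Filtering at the topology level drops every hit that shares the query's topology, which leaves only the similarities that the classification does not already record.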
Full-Length: Return structures similar to the full length of the query.
Contained In: Return structures that contain the query structure.
Contains: Return structures similar to a fragment of the query structure.
For Top-Aligned mode, the initial filtering with min-hashing and LSH still functions as described in the PLoS ONE paper. However, once filtered matches are obtained, instead of adjusting Jaccard similarity estimates, a simplified Needleman-Wunsch alignment of residue descriptors is performed between the query structure and each filtered structure. Mismatches and gaps each score -1 and matches score +1. For containment searches, depending on whether the search type is Contained In or Contains, one side of the dynamic programming matrix is not penalized for the opening gap and end gap. Likewise, for containment searches, the length of one of the structures is used to normalize the TM-score, rather than the average length of the compared structures as is used for the Full-Length search type. Once alignment scores are obtained for the filtered structures, TM-align alignments are run as described in the PLoS ONE paper.
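The scoring scheme described above can be sketched as a small dynamic program. This is an illustrative sketch under stated assumptions, not RUPEE's code: residue descriptors are represented here as plain integers, and the function name `align_score` and its `free_end_gaps` parameter are hypothetical. Setting `free_end_gaps` to `"target"` leaves unaligned target residues at either end unpenalized (the query is contained in the target), and `"query"` does the same for the query.

```python
def align_score(query, target, free_end_gaps=None):
    """Needleman-Wunsch score with match +1, mismatch -1, gap -1.

    free_end_gaps: None    -> global alignment, all gaps penalized
                   "target" -> leading/trailing target overhang is free
                   "query"  -> leading/trailing query overhang is free
    """
    n, m = len(query), len(target)
    gap = -1
    dp = [[0] * (m + 1) for _ in range(n + 1)]

    # First column: query residues skipped before the target starts.
    for i in range(1, n + 1):
        dp[i][0] = 0 if free_end_gaps == "query" else i * gap
    # First row: target residues skipped before the query starts.
    for j in range(1, m + 1):
        dp[0][j] = 0 if free_end_gaps == "target" else j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = 1 if query[i - 1] == target[j - 1] else -1
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair,  # align the two residues
                dp[i - 1][j] + gap,       # gap in target
                dp[i][j - 1] + gap,       # gap in query
            )

    if free_end_gaps == "target":
        return max(dp[n])                 # free trailing target overhang
    if free_end_gaps == "query":
        return max(row[m] for row in dp)  # free trailing query overhang
    return dp[n][m]

query = [1, 2, 3]
target = [9, 9, 1, 2, 3, 9]
global_score = align_score(query, target)
contained_score = align_score(query, target, free_end_gaps="target")
```

In this example the global score is pulled down by the three unavoidable gaps, while the containment score reflects only the three matching descriptors.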
All-Aligned mode skips the min-hashing and LSH steps. Instead, for each structure in the searched database, it performs the simplified Needleman-Wunsch algorithm to obtain an alignment, which is then used as the initial alignment in a modified TM-align algorithm that does not attempt to find initial alignments by other means. This allows RUPEE to apply the modified TM-align to all available structures in a reasonable amount of time, typically between 5 and 10 minutes. The results of All-Aligned mode are virtually identical to those of Top-Aligned mode for known structures. However, for novel structures, such as those output from protein structure prediction protocols, All-Aligned mode is an improvement.