COMPARE FAQ
Potential sequences for inclusion in the database will be identified by searching publicly available protein databases (NCBI) using keyword filtering approaches. These sequences will then be manually curated. The criteria for this filtering process will be publicly available. Once sequences are identified, a panel of academic peer reviewers will consider these sequences and their associated published literature to determine if they should be included in the 2017 database. The entire process and filtering algorithm will be transparent and the steps of the process will be documented. Criteria for inclusion or exclusion will also be developed and published.
Professional scientific staff at HESI will provide management oversight for the sequence search (to be conducted by informatics experts), academic peer review panel, and public release of the final database. A public-private steering team convened by HESI will provide input into matters of process but will not have any influence on decisions regarding sequence inclusion/exclusion in the database.
As genomic sequencing technology has become widespread, the number of sequences to be filtered has grown exponentially. The COMPARE process will accommodate this growth by implementing an automated, high-throughput bioinformatics platform that identifies a meaningful subset of sequences for scientific review by a diverse group of recognized allergy experts. The COMPARE process will also meet contemporary expectations for how a well-documented and sustainable allergen database is populated.
The COMPARE database relies on the contribution of scientific expertise as well as in-kind and direct financial support from both public and private scientific organizations to develop this public resource. If you would like to learn more about how you or your organization can contribute, please contact us here.
The first iteration of the database (COMPARE 2017) was released and publicly available on this website as of 03 February 2017. The database will be updated annually with the release of a new version.
The second iteration of the database, COMPARE 2018, is the current version, accessible through the “Database” tab of this website.
The complete 2017 database consists of 14 new unique sequences as well as all the allergens listed in the 2016 AllergenOnline (AOL) database, for a total of 1,970 allergens. For the 2016 process, a total of 55,641 entries were downloaded from NCBI using the keyword “allerg*”. After bioinformatic filtering and a manual quality control examination, 251 entries were submitted to an independent Peer Review Panel (PRP) for review. The PRP qualified 43 of these entries for inclusion in the database. After removal of redundancies, 14 unique sequences were identified for inclusion. A more detailed explanation of each quality control step can be found in a white paper describing the “COMPARE Process” on the website www.comparedatabase.org, after the public release of the COMPARE database.
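As a rough illustration of two of the steps mentioned above (wildcard keyword matching and removal of redundancies), here is a minimal Python sketch. It is not the COMPARE pipeline: the function names, the demo entries, the reading of “allerg*” as a word-prefix match, and the choice to treat two entries as redundant only when their sequences are identical are all assumptions made for this example.

```python
import re

def matches_keyword(text: str, pattern: str = "allerg*") -> bool:
    """True if any word in text starts with the stem before the trailing '*'.

    Assumption: the wildcard is treated as a simple word-prefix match.
    """
    stem = re.escape(pattern.rstrip("*"))
    return re.search(rf"\b{stem}\w*", text, flags=re.IGNORECASE) is not None

def unique_by_sequence(entries):
    """Keep the first entry for each distinct sequence, preserving order.

    Assumption: redundancy means an identical sequence (case-insensitive).
    """
    seen = set()
    unique = []
    for entry in entries:
        seq = entry["sequence"].upper()
        if seq not in seen:
            seen.add(seq)
            unique.append(entry)
    return unique

# Hypothetical demo entries, not real database records.
entries = [
    {"id": "demo1", "description": "major allergen, hypothetical", "sequence": "MKTAYIAK"},
    {"id": "demo2", "description": "allergenic protein isoform", "sequence": "mktayiak"},
    {"id": "demo3", "description": "ribosomal protein", "sequence": "MSLLTEVE"},
]

hits = [e for e in entries if matches_keyword(e["description"])]
print([e["id"] for e in unique_by_sequence(hits)])
```

In this toy data set, two of the three entries match the keyword, and one of those two is dropped because it repeats a sequence already seen.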
There is an international nomenclature group – the WHO-IUIS Allergen Nomenclature Subcommittee (http://www.allergen.org/) – that is responsible for assigning names to allergens and, where necessary, renaming already listed allergens in case of inconsistencies or changes in the biological names of organisms. Allergen names are built from the first three letters of the genus and the first letter of the species, separated by a space, followed by another space and a number related to the order of discovery. As an example, the first allergen from the organism Blomia tropicalis was named Blo t 1. Subsequent, structurally distinct allergens from the same organism are then named sequentially: Blo t 2, Blo t 3, etc.
Homologous allergens in different species are given the same number where possible, but the system contains some inconsistencies because a number may already have been used for another allergen that was discovered earlier. For example, the peanut homologue of the major birch pollen allergen Bet v 1 is known as Ara h 8, because the name Ara h 1 was already taken.
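The basic naming rule described above can be sketched in a few lines of Python. This is an illustration only: the function name `allergen_name` is hypothetical, the genus and species names for birch (Betula verrucosa) and peanut (Arachis hypogaea) are supplied here and do not appear in the text above, and the sketch covers only the simple three-letter, one-letter pattern, not any exceptions the nomenclature subcommittee may apply.

```python
def allergen_name(genus: str, species: str, number: int) -> str:
    """Build a name such as 'Blo t 1' from a binomial name and a number.

    Follows only the simple rule: first three letters of the genus,
    first letter of the species, then the number, separated by spaces.
    """
    genus, species = genus.strip(), species.strip()
    if len(genus) < 3 or not species or number < 1:
        raise ValueError("need a genus of at least 3 letters, a species, and a positive number")
    return f"{genus[:3].capitalize()} {species[0].lower()} {number}"

print(allergen_name("Blomia", "tropicalis", 1))   # Blo t 1
print(allergen_name("Betula", "verrucosa", 1))    # Bet v 1
print(allergen_name("Arachis", "hypogaea", 8))    # Ara h 8
```

The number is passed in rather than computed, because it is assigned by the subcommittee and, as the Ara h 8 example shows, cannot always be derived from homology alone.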
Note: the IUIS database is not a comprehensive database of clinically reviewed allergens; its purpose is the standardization and regulation of allergen nomenclature.