PubMLST molecular typing databases
Next-generation sequencing has opened up opportunities for bacterial population genomics that require novel bioinformatics platforms. The PubMLST molecular typing databases, powered by the Bacterial Isolate Genome Sequence Database (BIGSdb), are scalable, web-accessible databases that meet these needs. BIGSdb enables phenotype and sequence data, which can range from a single sequence read to whole genome data, to be efficiently linked to an unlimited number of bacterial specimens. Previously, it was not possible to analyse whole genomes using the gene-by-gene approach. The PubMLST databases now make this possible, enabling the structure and function of bacteria to be elucidated by means of a population genomics approach.
Sequencing the bacterial genome
Researchers at the University of Oxford have developed databases for hosting nomenclatures linked to whole genome sequences for molecular characterisation of bacteria.
The PubMLST website (https://pubmlst.org/) hosts curated molecular typing data for over a hundred microorganisms, providing sequence and allelic profile definitions for multi-locus sequence typing (MLST) and single-gene methods. In recent years, these have expanded to cover the whole genome with schemes such as core genome MLST (cgMLST) cataloguing the allelic diversity found in hundreds to thousands of genes. These methods provide a common nomenclature for high-resolution strain identification and comparison.
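To make the idea of an allelic profile concrete, the following is a minimal sketch of how a profile could be mapped to a sequence type and how two isolates could be compared locus by locus. The locus names, allele numbers and sequence types are invented for illustration and are not taken from any PubMLST scheme. The function names are likewise hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical scheme with three loci. Real MLST schemes typically use
# around seven loci, and cgMLST schemes hundreds to thousands.
LOCI: Tuple[str, ...] = ("locus_a", "locus_b", "locus_c")

# Invented profile table: ordered allele numbers -> sequence type (ST).
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 2, 4): 3,
}


def sequence_type(alleles: Dict[str, int]) -> Optional[int]:
    """Return the ST for an allelic profile, or None if it is not defined."""
    key = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(key)


def allelic_distance(a: Dict[str, int], b: Dict[str, int]) -> int:
    """Count the loci at which two isolates carry different alleles."""
    return sum(1 for locus in LOCI if a[locus] != b[locus])


isolate_1 = {"locus_a": 1, "locus_b": 2, "locus_c": 1}
isolate_2 = {"locus_a": 1, "locus_b": 1, "locus_c": 1}
```

Because every isolate is reduced to the same ordered list of allele numbers, two laboratories can compare strains by exchanging a sequence type or a count of differing loci rather than raw sequence.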
The underlying genomics platform, BIGSdb, links molecular typing information to isolate provenance, phenotype, and increasingly genome assemblies, providing a rich resource for outbreak investigation and research into population structure, gene association, global epidemiology and vaccine coverage.
Scaling up with population genomics
Databases include those for Neisseria spp., Campylobacter spp., Staphylococcus aureus and Streptococcus pneumoniae, which between them contain over 61,000 genomes, linked to typing nomenclatures, structured catalogues of gene variants and provenance information.
Data are made available on an open access basis through the PubMLST website and its application programming interface. For private, commercial use, Oxford University Innovation offers mirror-site licences, either to selected databases or on a fully flexible basis. This facilitates local linking and integration of private data to the large amount of available genome data and authoritative
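As an illustration of programmatic access through the application programming interface, the sketch below builds a request URL for a single allele record and fetches it as JSON. The base address, the path layout and the example database and locus names are assumptions made for this example and should be checked against the current API documentation before use. The helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base address of the REST interface; verify before relying on it.
BASE_URL = "https://rest.pubmlst.org"


def allele_url(database: str, locus: str, allele_id: int) -> str:
    """Build the URL for one allele record (path layout is an assumption)."""
    return (
        f"{BASE_URL}/db/{quote(database, safe='')}"
        f"/loci/{quote(locus, safe='')}/alleles/{allele_id}"
    )


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Fetch a URL and decode the JSON body. Requires network access."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example database and locus names are assumptions for illustration.
example_url = allele_url("pubmlst_neisseria_seqdef", "abcZ", 5)
```

Calling `fetch_json(example_url)` would then return the record as a dictionary, provided the assumed address and path are correct and the network is reachable.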