Clust – Optimised software for consensus clustering of multiple heterogeneous datasets
The number of gene expression studies being conducted has risen exponentially as high-throughput techniques have become more cost-effective. Many datasets generated from these studies are customised to answer specific research questions but could provide useful insights beyond their original purpose.
Cluster analysis segments data, such as that produced by gene expression studies, into meaningful groups, providing an understanding of the natural structure of the data. Existing cluster analysis tools for gene expression datasets can only process datasets individually.
Oxford University researchers have developed software that enables automated, simultaneous cluster analysis of multiple gene expression datasets, irrespective of their data type or source organism. Datasets generated with different techniques and from different source organisms can therefore be analysed together to obtain novel biological insights.
Cluster analysis in isolation
Cluster analysis is routinely used on gene expression data to identify groups of genes that are regulated under certain conditions, such as diseased or non-diseased states.
Existing tools only perform their analyses on one dataset in isolation. This method fails to exploit the huge amount of data generated in the field, which may provide additional, potentially vital, information. Moreover, these tools cannot incorporate datasets produced from different sources, such as those that are from a different technique or a different source organism.
Automatic, simultaneous cluster analysis of multiple datasets
Researchers at the University of Oxford have developed Clust, a software package that enables simultaneous analysis of multiple heterogeneous datasets, producing clusters with dramatically improved accuracy. It is also fully automated with almost no need for manual intervention. Clust can take advantage of data from previous and current gene expression studies, irrespective of their techniques (RNA-sequencing, microarray analysis, protein expression etc.) or source organism, to greatly enhance our ability to derive meaningful conclusions from various datasets.
Further benefits of Clust include:
- No need to pre-process data
- No need to preset the number of clusters
- Control of cluster tightness through a single parameter
- Support for data that require different types of normalisation
- Tolerance of missing values and multiple replicates
Clust is not limited to gene expression datasets and can be applied to any data that is numerical.
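The general idea of consensus clustering across heterogeneous datasets can be illustrated with a toy sketch. This is not Clust's algorithm or its interface; it is a simplified stand-in that uses only the Python standard library. The function names, the `min_corr` and `min_agreement` parameters, and the example data are all hypothetical. Each dataset is normalised independently, genes are paired when their profiles correlate strongly within a dataset, and a pair is kept only if enough datasets agree.

```python
from itertools import combinations
from statistics import mean, pstdev


def zscore(profile):
    """Normalise one profile to zero mean and unit variance."""
    m, s = mean(profile), pstdev(profile)
    return [(x - m) / s if s else 0.0 for x in profile]


def correlation(a, b):
    """Pearson correlation of two equal-length profiles."""
    za, zb = zscore(a), zscore(b)
    return sum(x * y for x, y in zip(za, zb)) / len(za)


def co_clustered_pairs(dataset, min_corr):
    """Gene pairs whose profiles correlate at least min_corr in one dataset."""
    return {
        frozenset((g, h))
        for g, h in combinations(sorted(dataset), 2)
        if correlation(dataset[g], dataset[h]) >= min_corr
    }


def consensus_clusters(datasets, min_corr=0.9, min_agreement=1.0):
    """Group genes that co-cluster in at least min_agreement of the datasets.

    min_agreement is a hypothetical single knob standing in for a
    'tightness' control: 1.0 demands agreement in every dataset.
    """
    genes = sorted(set.intersection(*(set(d) for d in datasets)))
    votes = {}
    for d in datasets:
        shared = {g: d[g] for g in genes}
        for pair in co_clustered_pairs(shared, min_corr):
            votes[pair] = votes.get(pair, 0) + 1

    # Merge agreed pairs into clusters with a small union-find.
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    needed = min_agreement * len(datasets)
    for pair, count in votes.items():
        if count >= needed:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


# Made-up data: two datasets with different scales and numbers of conditions.
rnaseq = {"geneA": [1, 2, 3, 4], "geneB": [10, 20, 30, 40],
          "geneC": [4, 3, 2, 1], "geneD": [8, 6, 4, 2]}
microarray = {"geneA": [0.1, 0.5, 0.9], "geneB": [2.0, 2.4, 2.8],
              "geneC": [0.9, 0.5, 0.1], "geneD": [0.2, 0.9, 0.3]}

strict = consensus_clusters([rnaseq, microarray], min_agreement=1.0)
loose = consensus_clusters([rnaseq, microarray], min_agreement=0.5)
```

In this made-up example, geneA and geneB move together in both datasets, so they form the only cluster under the strict setting. geneC and geneD agree in only one dataset, so they are grouped only when the agreement threshold is relaxed.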
Oxford University Innovation would like to hear from companies who may wish to employ this software to support their research and development.