OSCAR Team Joint Winner of Data Hackathon Awards

The team from the Digital Health Lab led by Professor David Clifton at the Oxford Suzhou Centre for Advanced Research (OSCAR) is a joint winner of the Data Hackathon Awards for the IEEE COVID-19 Bioinformatics Drug Target Challenge. The challenge was initiated by the 2021 IEEE Healthcare Summit.

What’s the IEEE COVID-19 Bioinformatics Drug Target Challenge?

The IEEE COVID-19 Bioinformatics Drug Target Challenge lasted about one month (the data were released on 27 August and the submission deadline was 2 October). Its goal was to develop effective ML/AI-based models that can accurately predict the docking scores of candidate drug molecules on SARS-CoV-2 protein targets within a limited time period.

The traditional drug discovery process is expensive and time-consuming. To accelerate it, an ML/AI-based approach can be used for high-throughput virtual pre-screening of very large numbers of drug candidates, identifying the most potent ones for experimental testing and further validation.

The full dataset for the Challenge included docking scores of 300,457 drug molecules on 18 different COVID-19 protein docking targets, of which 90% was used for training models and the remaining 10% for testing. Drug candidates were represented as SMILES strings and selected from known drug databases, including ENAMINE, ZINC and DrugBank. The COVID-19 protein targets were provided by researchers at Argonne National Laboratory (ANL). The organising committee compared the predicted scores with the ground-truth docking scores and assessed model accuracy as the mean absolute error (MAE) averaged over all of the targets.
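To make the evaluation metric concrete, here is a minimal Python sketch of an MAE averaged over targets. It assumes the MAE is computed separately for each target and the per-target values are then averaged with equal weight; the organisers' exact scoring procedure is not described here. The function names, target names and docking scores are made up for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted scores."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def averaged_mae(truth_by_target, pred_by_target):
    """MAE per target, then an unweighted mean over targets (an assumption)."""
    per_target = {
        name: mean_absolute_error(truth_by_target[name], pred_by_target[name])
        for name in truth_by_target
    }
    return sum(per_target.values()) / len(per_target), per_target


# Hypothetical docking scores for two illustrative targets.
truth = {"target_A": [-7.0, -5.5, -6.0], "target_B": [-8.0, -4.0]}
pred = {"target_A": [-6.5, -5.5, -7.0], "target_B": [-7.0, -5.0]}

score, per_target = averaged_mae(truth, pred)
print(per_target)  # per-target MAE values
print(score)       # unweighted mean of the per-target MAEs
```

With these toy numbers the per-target errors are 0.5 and 1.0, so the averaged MAE is 0.75. A lower value means the predicted docking scores are closer to the ground truth.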

What’s the OSCAR solution to the Challenge?

The OSCAR team first encoded the SMILES strings with four types of commonly used structural molecular fingerprints, then used XGBoost (a gradient-boosted tree ensemble learning method) to determine the best type of fingerprint and set benchmark results. Finally, they applied random search and cross-validation to choose the architecture and parameters of feed-forward neural networks. The resulting models obtained the best prediction performance on the test set.
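The model selection step, random search combined with cross-validation, can be illustrated with a small Python sketch. This is not the team's code: a toy k-nearest-neighbour regressor on synthetic one-dimensional data stands in for the feed-forward networks and molecular fingerprints, and every name and value below is hypothetical. The point is only the pattern: sample candidate settings at random, score each by cross-validated MAE, and keep the best.

```python
import random


def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def cross_val_mae(xs, ys, k, n_folds=5):
    """MAE of the toy model, averaged over n_folds held-out folds."""
    fold_errors = []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(xs)) if i % n_folds == fold]
        train_idx = [i for i in range(len(xs)) if i % n_folds != fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        errors = [
            abs(ys[i] - knn_predict(train_x, train_y, xs[i], k)) for i in test_idx
        ]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / n_folds


def random_search(xs, ys, candidates, n_trials, rng):
    """Try n_trials randomly sampled settings; keep the lowest CV error."""
    best_k, best_score = None, float("inf")
    for _ in range(n_trials):
        k = rng.choice(candidates)
        score = cross_val_mae(xs, ys, k)
        if score < best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Synthetic data: a noisy curve standing in for docking scores.
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(100)]
ys = [-((x - 5.0) ** 2) / 5.0 - 5.0 + rng.gauss(0.0, 0.3) for x in xs]

best_k, best_score = random_search(
    xs, ys, candidates=list(range(1, 31)), n_trials=10, rng=rng
)
print(best_k, best_score)
```

In a setting like the Challenge, the sampled configuration would typically cover several choices at once, such as the number of layers, layer widths and learning rate, rather than a single value, but the selection loop has the same shape.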

OSCAR Recipients of Data Hackathon Awards