
Data Science Notebooks

The pages in this section describe various projects I have undertaken with publicly available datasets, mostly from Kaggle. They provide practical demonstrations of my data science work.

Clustering Proteins in Breast Cancer Patients
Using the Breast Cancer Proteome dataset, I identified clusters of proteins with related activity, and investigated using them to predict clinical outcomes. Illustrates data reduction, hierarchical clustering, logistic regression and linear regression
The Entropy of Alice In Wonderland
Using Montemurro and Zanette’s algorithm to identify significant words and sentences in the text of Alice’s Adventures in Wonderland. Illustrates information theory
The Grammar of Truth and Lies
Using grammatical features to distinguish real from fake news. Illustrates latent semantic indexing, logistic regression and random forests
Is It A Mushroom or Is It A Toadstool?
Predicting whether or not fungi are edible. Illustrates Bayes’ theorem and information theory
Part of Speech Tagging
A video and an associated Binder notebook discussing different approaches to part-of-speech tagging. Illustrates Hidden Markov Models

by Dr Peter J Bleackley