Professional Development Winter School on Clinical Data Analytics, Statistical and Machine Learning using R & Python, 23rd-24th January, 2017

The IEEE EMB MSRIT Student Chapter, together with the IEEE EMB Bangalore Chapter, organized a winter school on clinical data analytics, statistical and machine learning using R and Python on 23rd-24th January, 2017.

The objective of the school was as follows. The genomic revolution in biology makes it possible to answer many questions in the medical sciences, such as those of personalized medicine and the etiology of diseases like cancer and HIV. However, these questions cannot be answered without powerful computational and statistical tools that help uncover the underlying network design principles responsible for these diseases. Predictive modelling of biological processes and drugs is also becoming significantly more sophisticated and widespread. By leveraging the diversity of available molecular and clinical data, predictive modelling could help identify new candidate molecules with a high probability of being successfully developed into drugs that act on biological targets safely and effectively.

Also, with the advent of new biotechnological techniques, massive amounts of genomic data are generated at a rapid pace, and analyzing these data requires a tremendous amount of domain knowledge, a solid computational background and strong programming skills. Entry into this highly interdisciplinary field requires a good understanding of molecular biology, genomics, algorithms, programming, statistical computation, machine learning, stochastic processes, and other mathematical techniques that underlie biological design principles. It is therefore imperative to bring together biology, statistics, algorithms and mathematical models to analyze and interpret large-scale clinical and biological data. Although the need for and potential applications of computational biology and bioinformatics in India are tremendous, currently very few groups have strength and capability in this area.

The speakers for the sessions were:

Day-1: Sessions 1 & 2

Mr. Gunasekaran, Biostatistics Manager

ICON, Clinical Research Organization for Drug Development

Headquartered in Dublin, Ireland

Day-1: Sessions 3 & 4; Days 2, 3 & 4

Dr. S. Mahesh Anand, Founder & CTO, SCS-India

The topics covered were:

Day-1


  • Introduction to Clinical Data & Scope for Analytics
  • Different Definitions and Perspectives of Data
  • Sampling – Random & Non-random Sampling Methods
  • Clinical Data – Bio-statistician Perspective



  • Standard Error, Confidence Interval & Sampling Distribution
  • Basic Principles of Testing of Hypothesis
  • Tests of Significance – t-test, one-way ANOVA, repeated-measures ANOVA, Chi-square and Non-parametric Methods
  • Sample Size in Clinical Data Analytics
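To make the Day-1 statistics topics concrete, here is a minimal Python sketch (the sessions themselves used R; Python is used here only so that all examples share one language). It computes the standard error of the mean, an approximate confidence interval, Welch's two-sample t statistic and a textbook sample-size formula for estimating a mean, n = (zσ/E)². Assumptions: the interval uses a normal (z) quantile, which is reasonable only for larger samples (a t quantile is the usual choice for small n), and the blood-pressure readings are hypothetical.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean, s / sqrt(n), using the (n - 1) sample SD."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def normal_ci(sample, level=0.95):
    """Approximate CI for the mean using a normal (z) quantile.

    Assumption: n is large enough for the normal approximation;
    for small samples a t quantile would normally be used instead.
    """
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    m = statistics.mean(sample)
    half_width = z * standard_error(sample)
    return m - half_width, m + half_width

def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    se2 = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)

def sample_size_for_mean(sigma, margin, level=0.95):
    """Sample size to estimate a mean within +/- margin: n = (z * sigma / E)^2, rounded up."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical systolic blood pressure readings for two groups (illustrative only).
treated = [118, 122, 115, 120, 119, 121, 117, 116]
control = [125, 130, 128, 124, 127, 129, 126, 131]

print(normal_ci(treated))
print(welch_t(treated, control))
print(sample_size_for_mean(sigma=10, margin=2))  # 97
```

A p-value for the t statistic needs the t distribution, which the Python standard library does not provide; in R, `t.test()` reports it directly.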

Session 3

  • Role of R in Statistical Learning & Data


  • Numerical, Semantic and Sensory Measurement Techniques
  • Data Analysis vs. Data Analytics
  • R Installation Instructions / Package Installation Procedure
  • Basics of R: Introduction to R Programming: Vectors, Matrices & Arrays, Objects and Attributes
  • Introduction to Lists & Data Frames, Loops and Conditional Expressions


Session 4

  • Special Functions in R
  • Creating User-defined Functions in R
  • Graphical Representation of Data
  • Multiple Graphical Options: Box Plot, Scatter Plot, Histogram & Correlation Plot


Day-2


  • Retrieving Data from File (CSV, TEXT & XLS type) in R
  • Introduction to Data Pruning, Cleaning and Visualization, Descriptive Statistical Measures



  • Introduction to Inferential Statistics, Hypothesis Testing, Univariate (t-test) and Bivariate Analysis (F-test) in R
  • Multivariate Analysis, Correlation, Regression Techniques
  • Fitting a Linear Regression Model in R
  • Interpretation of Beta Values in Regression Models
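The linear-regression topics can be illustrated with a short Python sketch of ordinary least squares for one predictor, y = β₀ + β₁x, together with the Pearson correlation coefficient. It is a from-scratch illustration under stated assumptions (a single predictor, hypothetical dose/response numbers), not the R workflow used in the session, where `lm()` does the fitting. The comments show how the beta values are read: β₁ is the expected change in y per one-unit increase in x, and β₀ is the predicted y at x = 0.

```python
import math
import statistics

def fit_simple_linear(x, y):
    """Ordinary least squares for y = beta0 + beta1 * x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sxy / sxx              # expected change in y per one-unit increase in x
    beta0 = y_bar - beta1 * x_bar  # predicted y when x = 0
    return beta0, beta1

def pearson_r(x, y):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy)."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical dose (mg) and response values, for illustration only.
dose = [10, 20, 30, 40, 50]
response = [12, 19, 31, 42, 48]

b0, b1 = fit_simple_linear(dose, response)
print(f"response = {b0:.2f} + {b1:.3f} * dose, r = {pearson_r(dose, response):.3f}")
```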


Session 3

  • Quadratic & Polynomial Regression Models
  • Model Flexibility (Degrees of Freedom) Versus Inference
  • Need for Logistic Regression for Qualitative Data
  • Implementation of Logistic Regression for Clinical Data
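Logistic regression models the probability of a binary outcome as P(y = 1 | x) = 1 / (1 + e^−(b + wx)). The sketch below fits b and w by plain batch gradient descent on the mean log-loss, which is one simple way to reach the maximum-likelihood estimate; R's `glm(..., family = binomial)` uses iteratively reweighted least squares instead. Assumptions: a single predictor, a fixed learning rate and number of epochs chosen for this tiny example, and hypothetical biomarker/disease data.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(b + w * x) by batch gradient descent
    on the mean log-loss (cross-entropy)."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        # The gradient of the mean log-loss is mean((p - y) * feature).
        errors = [sigmoid(b + w * xi) - yi for xi, yi in zip(x, y)]
        grad_w = sum(e * xi for e, xi in zip(errors, x)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return b, w

# Hypothetical biomarker level (x) and disease status (y: 1 = present).
level = [1, 2, 3, 4, 5, 6]
status = [0, 0, 1, 0, 1, 1]

b, w = fit_logistic(level, status)
for xi in (1, 6):
    print(xi, round(sigmoid(b + w * xi), 3))
```

A positive fitted w means the estimated probability of disease rises with the biomarker level; e^w is the odds ratio for a one-unit increase.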


Session 4

  • Machine Learning Models for Clinical Data Analytics
  • Parametric & Non-parametric Models
  • Introduction to Estimation, Prediction and Association Models
  • K-Nearest Neighbors (KNN) Model for Estimation, Prediction and Classification
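K-nearest neighbors is a non-parametric method: it stores the training data and, for a new point, looks at the k closest training points. The sketch below shows both uses listed above, classification by majority vote and numeric estimation by averaging neighbor targets. Assumptions: Euclidean distance on already-scaled features, ties in the vote resolved by whichever label was counted first, and hypothetical two-feature data.

```python
import math
from collections import Counter

def k_nearest(train_X, query, k):
    """Indices of the k training points closest to `query` (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return order[:k]

def knn_classify(train_X, train_y, query, k=3):
    """Classification: majority vote among the labels of the k nearest points."""
    votes = Counter(train_y[i] for i in k_nearest(train_X, query, k))
    return votes.most_common(1)[0][0]

def knn_estimate(train_X, train_t, query, k=3):
    """Estimation/prediction of a numeric target: mean of the k nearest targets."""
    idx = k_nearest(train_X, query, k)
    return sum(train_t[i] for i in idx) / len(idx)

# Hypothetical 2-D feature vectors (e.g. two scaled lab values) with risk labels and scores.
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["low", "low", "low", "high", "high", "high"]
scores = [1.0, 1.5, 1.2, 9.0, 9.5, 9.1]

print(knn_classify(X, labels, (1.5, 1.5)))  # low
print(knn_estimate(X, scores, (8.5, 8.5)))
```

Choosing an odd k for two-class problems avoids tied votes; in practice k is usually tuned by cross-validation.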


Day-3


  • Machine Learning Algorithms for Clinical Data

Tea/Coffee Break



Speaker: Clinical Data Scientist from Quintiles (Bangalore) or Data Scientist from Biocon

  • A Case Study Demonstration of Machine Learning Algorithms for Healthcare Data

Lunch

Session 3

  • Introduction to Classification Models in R
  • Classification & Regression Trees (CART)
  • Random Forest Classification Algorithm
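At the heart of a CART classification tree is a repeated search for the split that makes the child nodes as pure as possible; a random forest grows many such trees on bootstrap samples, considering a random subset of features at each split, and combines their votes. The sketch below shows only the core step under simplifying assumptions: Gini impurity, one numeric feature, and candidate thresholds at midpoints between consecutive distinct values. It is not a full tree or forest implementation.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum over classes of p_k^2 (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(x, y):
    """Try midpoint thresholds on one numeric feature.

    Returns (threshold, weighted Gini of the two children); threshold is
    None if no split improves on the unsplit node.
    """
    values = sorted(set(x))
    best = (None, gini(y))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical feature (e.g. a lab value) and diagnosis labels.
lab_value = [1.2, 2.0, 2.8, 3.5, 4.1, 5.0]
diagnosis = ["neg", "neg", "neg", "pos", "pos", "pos"]
print(best_split(lab_value, diagnosis))
```

A full CART tree applies this search recursively to each child node, over all features, until a stopping rule (depth, minimum node size) is reached.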


Session 4

  • Clustering Algorithm for Clinical Data
  • Neural Network Models in R
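As an illustration of the clustering topic, the sketch below implements plain k-means (Lloyd's algorithm), which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points until the assignments stop changing. Assumptions: Euclidean distance, the first k points used as starting centroids (a simplification; library implementations typically use random or k-means++ initialization), and hypothetical 2-D data. R's built-in `kmeans()` is what the session would typically use.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain k-means. Returns (labels, centroids).

    Simplification: the first k points are used as initial centroids.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids

# Hypothetical 2-D measurements for four patients.
pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(kmeans(pts, 2))
```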


Day-4


  • Introduction to Python & its significance in Clinical Data Analytics
  • Python Installation & Usage of Jupyter Notebook for Python
  • Getting Started with Python
  • Python programming essentials



  • Numerical Python
  • Array & Matrix Handling
  • Indexing Data
  • Control Structures (if statements, for loops & while loops)
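The Python essentials above (matrix handling, indexing and control structures) can be previewed with plain lists. "Numerical Python" presumably refers to NumPy, whose arrays do the same operations in vectorized form; this sketch deliberately uses only built-in lists so that it runs without extra packages. The lab readings and the limit of 7.0 are hypothetical.

```python
# A 3 x 3 "matrix" of hypothetical lab readings stored as a list of rows.
readings = [
    [5.1, 6.3, 7.8],
    [4.9, 5.5, 8.2],
    [6.0, 7.1, 6.6],
]

def column(matrix, j):
    """Return column j by indexing each row."""
    return [row[j] for row in matrix]

def flag_high(values, limit):
    """for loop with if/else: label each value relative to a limit."""
    flags = []
    for v in values:
        if v > limit:
            flags.append("high")
        else:
            flags.append("normal")
    return flags

def first_index_above(values, limit):
    """while loop: scan until the first value above limit; -1 if there is none."""
    i = 0
    while i < len(values):
        if values[i] > limit:
            return i
        i += 1
    return -1

print(readings[1][2])                       # row 1, column 2 -> 8.2
print(column(readings, 0))                  # [5.1, 4.9, 6.0]
print(flag_high(column(readings, 2), 7.0))  # ['high', 'high', 'normal']
print(first_index_above(readings[0], 7.0))  # 2
```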


Session 3

  • Modules and Packages in Python: pandas Library for Data Analytics & ML
  • Loading Data from CSV, Converting Strings to Float & Converting Strings to Integers
  • Data Pre-processing: Normalizing & Standardizing the Data
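The loading and pre-processing steps can be sketched with the standard library alone: read CSV text, convert the string fields to int and float, then apply min-max normalization, (v − min)/(max − min), and z-score standardization, (v − mean)/sd. With pandas, as taught in the session, `pandas.read_csv` performs the loading and type conversion in one call. Assumptions: the column names and values are hypothetical, the CSV is held in a string instead of a file, and the sample standard deviation is used for z-scores.

```python
import csv
import io
import statistics

# Hypothetical CSV content; with a real file, use open("data.csv", newline="") instead.
RAW = """patient_id,age,cholesterol
1,54,212.5
2,61,240.0
3,47,189.5
"""

def load_columns(text):
    """Parse CSV text and convert string fields to numbers."""
    rows = list(csv.DictReader(io.StringIO(text)))
    ids = [int(r["patient_id"]) for r in rows]       # string -> int
    ages = [int(r["age"]) for r in rows]             # string -> int
    chol = [float(r["cholesterol"]) for r in rows]   # string -> float
    return ids, ages, chol

def min_max_normalize(values):
    """Rescale to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """z-scores: (v - mean) / sample standard deviation."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]

ids, ages, chol = load_columns(RAW)
print(min_max_normalize(chol))
print(standardize(ages))
```

Normalization keeps features on a common 0 to 1 scale, which matters for distance-based methods such as KNN; standardization centres each feature at zero with unit spread.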


Session 4: Machine Learning & Data Analytics Prediction Models in Python

  • Implementation of Clinical Data Analytics using Python
  • Discussion on Pros & Cons of R & Python for Clinical Data Analytics
  • Summary, Scope & Opportunities in Clinical Data Analytics