This repository contains the materials, source code, and links that I explored and experimented with for the “Data Science and Machine Learning” courses taken as part of my Ph.D. coursework (Oct 2021 - Jan 2023).
IBM Watson Studio - Jupyter Notebook Assignment
IBM Watson Studio - Jupyter Notebook Code
- Python
  - Pandas : Data Structures & Tools
  - NumPy : Arrays & Matrices
  - SciPy
  - Matplotlib
  - PyTorch : Experimentation and Testing Research Ideas
  - TensorFlow : Production and Deployment
  - Keras : Deep Learning Neural Networks
  - Scikit-learn : Tools for Statistical Modeling, including Regression, Classification and Clustering
- R
- SQL
- Java
  - Weka : Data Mining
  - Java-ML : Machine Learning
  - Apache Spark MLlib : Scalable ML
  - Deeplearning4j
  - Hadoop
- Scala
- C++
- JavaScript
  - TensorFlow.js
  - R.js
  - brain.js
  - machinelearn.js
- Julia
- Data Management
  - Open Source
    - Relational DB : MySQL and PostgreSQL
    - File-based : Hadoop
    - NoSQL DB : MongoDB, CouchDB, Apache Cassandra
    - Cloud File : Ceph
  - Commercial
    - Oracle Database
    - Microsoft SQL Server
    - IBM Db2
  - Cloud
    - Amazon DynamoDB
    - Cloudant
    - IBM Db2
- Data Integration and Transformation
  - Open Source
  - Commercial
  - Cloud
    - Informatica
    - IBM Data Refinery
- Data Visualization
  - Open Source
  - Commercial
  - Cloud
- Model Building
- Model Deployment
- Model Monitoring and Assessment
  - Open Source
  - No Major Commercial Products
  - Cloud
    - Amazon SageMaker Model Monitor
- Code Asset Management
  - Open Source
    - Git : GitLab, GitHub
    - Bitbucket
- Data Asset Management
- Development Environment
- Cluster Execution Environment
- Fully Integrated Visual Tools
Dataset Sources
Course - 2 : Deploying Machine Learning Models in Production
- Model Serving
- Input Feature Lookup
- Model Deployment Servers
- Data Versioning
- Logging Metrics
- Notebooks
- nbconvert
- nbdime
- jupytext
- neptune-notebooks
- git
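
The notebook tools listed above exist largely because an `.ipynb` file is a JSON document whose embedded outputs and execution counters make `git` diffs noisy. As a rough illustration of the idea (not how any of the listed tools is implemented), the sketch below strips outputs from a notebook using only the Python standard library. The `strip_outputs` helper and the in-memory `example` notebook are hypothetical and assume the nbformat 4 layout, where each code cell carries `outputs` and `execution_count` fields.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of an nbformat-4 notebook dict with code-cell outputs removed."""
    # A JSON round trip gives a deep copy, so the caller's notebook is untouched.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop rendered results
            cell["execution_count"] = None  # drop the run counter
    return cleaned


# Hypothetical in-memory notebook, used only for illustration.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "source": ["print(1 + 1)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        },
    ],
}

cleaned = strip_outputs(example)
print(json.dumps(cleaned["cells"][1], indent=1))
```

For real notebooks, the same function could be applied to the result of `json.load` on an `.ipynb` file before committing. The dedicated tools in the list handle more cases, such as diffing, merging, and format conversion.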
Lab Exercises - Google Cloud