AI Expert Roadmap
{i.am.ai}
Matrices and Linear Algebra Fundamentals.
Database Basics.
Relational vs non-relational databases.
SQL + Joins (Inner, Outer, Cross, Theta Join)
NoSQL.
Tabular Data
Data Frames and Series.
[ETL] Extract, Transform, Load.
Reporting/ BI/ Analytics.
Data Formats (JSON, XML, CSV).
Regular Expression (RegEx).
Python.
Expressions.
Variables.
Data Structures.
Functions.
Install Packages.
Code Style (PEP 8).
Libraries.
NumPy.
Pandas.
Virtual Environments.
Jupyter Notebooks/ Lab.
Data Mining.
Web Scraping.
Awesome Public Datasets.
Kaggle.
Principal Component Analysis.
Dimensionality and Numerosity Reduction.
Normalization.
Data Scrubbing, Handling Missing Values.
Unbiased Estimators.
Binning Sparse Values.
Feature Extraction.
Denoising.
Sampling.
Probability Theory.
Randomness/ Random variable/ Random sample.
Probability Distribution.
Conditional Probability and Bayes' Theorem.
(Statistical) Independence.
(IID) Independent and Identically Distributed Random Variables.
Cumulative Distribution Function (cdf).
Probability Density Function (pdf).
Probability Mass Function (pmf).
Continuous Distributions (pdfs).
Normal/ Gaussian.
Uniform.
Beta.
Dirichlet.
Exponential.
χ² (chi-squared).
Discrete Distributions (pmfs).
Uniform.
Binomial.
Multinomial.
Hypergeometric.
Poisson.
Geometric.
Summary Statistics.
Expectation and Mean.
Variance and Standard Deviation (sd).
Covariance and Correlation.
Median and Quartile.
Interquartile Range.
Percentile/ Quantile.
Mode.
Important Laws.
Law of Large Numbers (LLN).
Central Limit Theorem (CLT)
Estimation.
Maximum Likelihood Estimation (MLE)
Kernel Density Estimation (KDE)
Hypothesis Testing.
p-value.
Chi-squared test.
F-test.
t-test.
Confidence Interval (CI)
Monte Carlo Method.
Chart Suggestion Thought Starter.
Python.
Matplotlib.
Seaborn.
Bokeh.
ipyvolume (3D data).
plotnine.
Web.
Vega-Lite.
D3.js.
Dashboards.
Dash.
BI.
Tableau.
Power BI.
Categorical/ Ordinal/ Numerical Variables.
Cost Function and Gradient Descent.
Overfitting and Underfitting.
Training, Validation and Test Data.
Precision and Recall.
Bias and Variance.
Lift.
Supervised Learning.
Regression:
Linear Regression.
Poisson Regression.
Classification:
Classification Rate.
Decision Tree.
Logistic Regression.
Naive Bayes Classifiers.
K-Nearest Neighbour.
SVM.
Gaussian Mixture Models.
Unsupervised Learning
Clustering:
Hierarchical Clustering.
K-Means Clustering.
DBSCAN.
HDBSCAN.
Fuzzy C-Means.
Mean Shift.
Agglomerative.
OPTICS.
Association Rule Learning:
Apriori Algorithm.
ECLAT Algorithm.
FP Trees.
Dimensionality Reduction:
Principal Component Analysis (PCA).
Random Projection.
NMF.
t-SNE.
UMAP.
Ensemble Learning:
Boosting.
Bagging.
Stacking.
Reinforcement Learning.
Q-Learning.
Sentiment Analysis.
Collaborative Filtering.
Tagging.
Prediction.
Scikit-Learn.
spaCy (NLP).
Neural Networks.
Loss Function.
Activation Function.
Weight Initialization.
Vanishing/ Exploding Gradient Problem.
Feedforward Neural Network.
AutoEncoder.
CNN.
Pooling
RNN.
LSTM.
GRU.
Transformer.
Encoder.
Decoder.
Attention.
Siamese Network.
Generative Adversarial Network (GAN).
Evolving Architectures/ NEAT.
Residual Connections.
Optimizers.
SGD.
Momentum.
Adam.
AdaGrad.
AdaDelta.
Nadam.
RMSProp.
Learning Rate Schedule.
Batch Normalization.
Batch Size Effects.
Regularization.
Early Stopping.
Dropout.
Parameter Penalties.
Data Augmentation.
Adversarial Training.
Multitask Learning.
Transfer Learning.
Curriculum Learning.
Awesome Deep Learning.
Hugging Face.
MLflow.
Distillation.
Quantization.
Neural Architecture Search (NAS).
Data Formats.
Data Discovery.
Data Source and Acquisition.
Data Integration.
Data Fusion.
Transformation and Enrichment.
Data Survey.
OpenRefine.
How much Data.
Using ETL.
Data Lake and Data Warehouse.
Dockerize your Python Application.
Architectural Patterns and Best Practice.
Horizontal vs Vertical Scaling.
Map Reduce.
Data Replication.
Name and Data Nodes.
Job and Task Tracker.
Big Data List.
Hadoop (large data).
HDFS.
Loading data with Sqoop and Pig.
Storm: Hadoop Realtime.
Spark (in memory).
RAPIDS (on GPU).
Flume, Scribe: For Unstructured Data.
Data Warehouse with Hive.
Elastic (ELK) Stack.
Avro.
Flink.
Dask.
Numba.
ONNX.
OpenVINO.
MLflow.
Kafka and KSQL.
Databases.
Cassandra.
MongoDB, Neo4j.
Scalability.
ZooKeeper.
Kubernetes.
Cloud Services.
AWS SageMaker.
Google ML Engine.
Microsoft Azure Machine Learning Studio.
Awesome Production ML.