NumPy Data Structure: ndarray
The ndarray (N-dimensional array) is the core object in NumPy.
It can represent:
- Scalars (0D)
- Vectors (1D)
- Matrices (2D)
- Tensors (3D or higher)
Example 1: Creating Arrays
import numpy as np
# 1D Array
arr1 = np.array([1, 2, 3, 4, 5])
print("1D Array:", arr1)
# 2D Array (Matrix)
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print("2D Array:\n", arr2)
# 3D Array
arr3 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print("3D Array:\n", arr3)
Array Attributes
Every array has useful attributes that describe it:
print("Shape:", arr2.shape) # (2, 3)
print("Dimensions:", arr2.ndim) # 2
print("Data type:", arr2.dtype) # int64
print("Size:", arr2.size) # 6 elements
print("Item size:", arr2.itemsize, "bytes")
Array Creation Functions
NumPy provides many built-in functions for creating arrays easily:
np.zeros((2,3)) # 2x3 matrix of zeros
np.ones((3,2)) # 3x2 matrix of ones
np.eye(3) # 3x3 Identity matrix
np.arange(0,10,2) # [0, 2, 4, 6, 8]
np.linspace(0, 1, 5) # [0. , 0.25, 0.5 , 0.75, 1.]
np.random.rand(2,3) # Random 2x3 array
Array Indexing and Slicing
Example:
arr = np.array([[10, 20, 30], [40, 50, 60]])
print("First row:", arr[0])
print("Element at (1,2):", arr[1, 2])
print("Slicing (rows 0–1, columns 0–1):\n", arr[0:2, 0:2])
You can also modify values directly:
arr[0, 0] = 99
print("Modified array:\n", arr)
Array Operations
NumPy performs element-wise operations efficiently.
Arithmetic Operations:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print("Addition:", a + b)
print("Subtraction:", a - b)
print("Multiplication:", a * b)
print("Division:", a / b)
Mathematical Functions:
x = np.array([1, 4, 9, 16, 25])
print("Square root:", np.sqrt(x))
print("Exponent:", np.exp(x))
print("Log:", np.log(x))
print("Sum:", np.sum(x))
print("Mean:", np.mean(x))
print("Standard Deviation:", np.std(x))
Matrix Operations
NumPy makes linear algebra easy.
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print("Matrix addition:\n", A + B)
print("Matrix multiplication:\n", np.dot(A, B)) # equivalently: A @ B
print("Transpose:\n", A.T)
print("Determinant:", np.linalg.det(A))
print("Inverse:\n", np.linalg.inv(A))
Broadcasting
Broadcasting lets NumPy perform arithmetic on arrays of different shapes.
Example:
a = np.array([1, 2, 3])
b = 2
print("Add scalar:", a + b)
# 2D array with 1D
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([10, 20, 30])
print("Broadcasting addition:\n", A + B)
Boolean Indexing and Filtering
You can filter elements that satisfy a condition.
arr = np.array([10, 20, 30, 40, 50])
print("Elements > 25:", arr[arr > 25])
Reshaping and Flattening Arrays
arr = np.arange(1, 10)
arr2 = arr.reshape(3, 3)
print("Reshaped 3x3:\n", arr2)
print("Flattened:", arr2.flatten())
Aggregation Functions
NumPy offers many functions for summarizing data:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print("Sum:", np.sum(arr))
print("Column-wise Sum:", np.sum(arr, axis=0))
print("Row-wise Mean:", np.mean(arr, axis=1))
Random Module in NumPy
NumPy’s random module is useful for generating random numbers — essential for simulations and ML model initialization.
from numpy import random
print("Random integer:", random.randint(10))
print("Random array:\n", random.randint(1, 100, size=(3,3)))
print("Random float array:\n", random.rand(2,2))
print("Random choice:", random.choice([10, 20, 30, 40]))
NumPy in Machine Learning
NumPy is used for:
- Feature scaling & normalization
- Matrix operations in neural networks
- Data preprocessing
- Implementing algorithms from scratch
Example: Simple Linear Regression (Manual using NumPy)
import numpy as np
# Training data
X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
# Mean
X_mean, y_mean = np.mean(X), np.mean(y)
# Calculate slope (m) and intercept (c)
m = np.sum((X - X_mean)*(y - y_mean)) / np.sum((X - X_mean)**2)
c = y_mean - m * X_mean
print("Equation: y =", round(m, 2), "x +", round(c, 2))
print("Prediction for x=6:", m*6 + c)
Summary
- NumPy is the backbone of data handling in Python.
- It provides speed, power, and flexibility for numerical operations.
- Used in almost every area of data science, AI, and ML.
NumPy Complete Demonstration Program
import numpy as np
print("========== NUMPY COMPLETE DEMONSTRATION ==========\n")
# 1️⃣ ARRAY CREATION METHODS
arr1 = np.array([1, 2, 3])
arr2 = np.array([[1, 2], [3, 4]])
zeros_arr = np.zeros((2, 3))
ones_arr = np.ones((3, 3))
empty_arr = np.empty((2, 2)) # uninitialized memory; values are arbitrary
arange_arr = np.arange(1, 11, 2)
lin_arr = np.linspace(0, 1, 5)
print("Array 1 =", arr1)
print("Array 2 =\n", arr2)
print("Zeros Array =\n", zeros_arr)
print("Ones Array =\n", ones_arr)
print("Empty Array =\n", empty_arr)
print("Arange =", arange_arr)
print("Linspace =", lin_arr, "\n")
# 2️⃣ ARRAY PROPERTIES
print("Shape:", arr2.shape)
print("Size:", arr2.size)
print("Data Type:", arr2.dtype)
print("Dimension:", arr2.ndim, "\n")
# 3️⃣ INDEXING & SLICING
print("arr2[0][1] =", arr2[0][1])
print("Slice arr2[0,:] =", arr2[0, :])
print("Slice arr2[:,1] =", arr2[:, 1], "\n")
# 4️⃣ MATHEMATICAL OPERATIONS
a = np.array([10, 20, 30])
b = np.array([1, 2, 3])
print("a + b =", a + b)
print("a - b =", a - b)
print("a * b =", a * b)
print("a / b =", a / b)
print("a ** 2 =", a ** 2)
print("Sin(a) =", np.sin(a), "\n")
# 5️⃣ BROADCASTING
mat = np.array([[1, 2, 3], [4, 5, 6]])
print("Matrix + 10 =\n", mat + 10, "\n")
# 6️⃣ RESHAPING
r = np.arange(1, 13)
reshaped = r.reshape(3, 4)
print("Reshaped 1-12 into 3x4:\n", reshaped, "\n")
# 7️⃣ FLATTEN & RAVEL
print("Flatten:", reshaped.flatten())
print("Ravel:", reshaped.ravel(), "\n")
# 8️⃣ STACKING ARRAYS
vstack_arr = np.vstack((arr1, b))
hstack_arr = np.hstack((arr1, b))
print("Vertical Stack:\n", vstack_arr)
print("Horizontal Stack:", hstack_arr, "\n")
# 9️⃣ SPLITTING ARRAYS
split_arr = np.array([10, 20, 30, 40, 50, 60])
print("Split:", np.split(split_arr, 3), "\n")
# 🔟 STATISTICAL FUNCTIONS
stats = np.array([10, 20, 30, 40, 50])
print("Mean =", np.mean(stats))
print("Median =", np.median(stats))
print("Standard Deviation =", np.std(stats))
print("Variance =", np.var(stats))
print("Sum =", np.sum(stats))
print("Min =", np.min(stats))
print("Max =", np.max(stats), "\n")
# 1️⃣1️⃣ RANDOM MODULE
rand_arr = np.random.rand(3, 3)
rand_int_arr = np.random.randint(1, 50, size=5)
normal_arr = np.random.normal(0, 1, 5)
print("Random (0-1):\n", rand_arr)
print("Random Integers:", rand_int_arr)
print("Normal Distribution:", normal_arr, "\n")
# 1️⃣2️⃣ SORTING
unsorted = np.array([40, 10, 50, 20, 30])
print("Sorted:", np.sort(unsorted), "\n")
# 1️⃣3️⃣ LOGICAL OPERATIONS
logic_arr = np.array([10, 20, 30, 40, 50])
print("logic_arr > 25:", logic_arr > 25)
print("Elements > 25:", logic_arr[logic_arr > 25], "\n")
# 1️⃣4️⃣ COPY vs VIEW
original = np.array([1, 2, 3, 4])
view_arr = original.view()
copy_arr = original.copy()
original[0] = 99
print("Original:", original)
print("View (changes with original):", view_arr)
print("Copy (independent):", copy_arr, "\n")
# 1️⃣5️⃣ ITERATING ARRAYS
print("Iterating:")
for i in reshaped:
    print(i)
print()
# 1️⃣6️⃣ LINEAR ALGEBRA
matrix_A = np.array([[1, 2], [3, 4]])
matrix_B = np.array([[5, 6], [7, 8]])
print("Matrix Multiplication:\n", np.dot(matrix_A, matrix_B))
print("Transpose:\n", np.transpose(matrix_A))
print("Determinant:", np.linalg.det(matrix_A))
print("Inverse:\n", np.linalg.inv(matrix_A), "\n")
# 1️⃣7️⃣ UFUNCS (Universal Functions)
arr = np.array([1, 4, 9, 16])
print("Square Root =", np.sqrt(arr))
print("Log =", np.log(arr))
print("Exp =", np.exp(arr), "\n")
# 1️⃣8️⃣ FILE OPERATIONS
np.savetxt("numbers.csv", stats, delimiter=",")
loaded = np.loadtxt("numbers.csv", delimiter=",")
print("Saved and Loaded Array:", loaded)
# --------------------------------------------------------------
print("\n========= END OF NUMPY DEMONSTRATION =========")
Pandas (Python Data Analysis Library)
1. Introduction to Pandas
Pandas is the Python Data Analysis Library (its name derives from "panel data"). It is built on top of NumPy and provides high-level data structures and data analysis tools. It helps to clean, transform, analyze, and visualize data easily, especially tabular data (rows and columns) just like in Excel or SQL.
2. Features of Pandas
- Data Structures: provides Series (1D) and DataFrame (2D) for data manipulation
- Data Handling: handles missing, duplicate, and inconsistent data efficiently
- File Operations: supports reading/writing data from CSV, Excel, JSON, SQL, etc.
- Data Alignment: automatic data alignment and indexing
- Fast Operations: built on NumPy, hence highly optimized for performance
- Data Analysis: supports grouping, merging, joining, and pivoting operations
3. Pandas Data Structures
(a) Series
A 1-dimensional labeled array (like a column in a table).
Can hold data of any type (integer, string, float, Python objects).
Example:
import pandas as pd
# Creating a Series
s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
print(s)
print("Access element with label 'c':", s['c'])
Output:
a 10
b 20
c 30
d 40
e 50
dtype: int64
Access element with label 'c': 30
(b) DataFrame
A 2-dimensional labeled data structure with rows and columns (like an Excel sheet or SQL table). Each column can be a Series with a different data type.
Example:
import pandas as pd
# Creating a DataFrame using a dictionary
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 22],
'City': ['Delhi', 'Mumbai', 'Chennai']
}
df = pd.DataFrame(data)
print(df)
Output:
Name Age City
0 Alice 25 Delhi
1 Bob 30 Mumbai
2 Charlie 22 Chennai
4. Reading and Writing Files
Pandas can read and write many file formats directly.
- CSV: read_csv(), to_csv()
- Excel: read_excel(), to_excel()
- JSON: read_json(), to_json()
- SQL: read_sql(), to_sql()
Example (Read/Write CSV):
# Read a CSV file
df = pd.read_csv('data.csv')
# Write DataFrame to CSV
df.to_csv('output.csv', index=False)
5. Common DataFrame Operations
(a) Viewing Data
print(df.head()) # First 5 rows
print(df.tail(2)) # Last 2 rows
print(df.info()) # Summary (columns, types, memory)
print(df.describe()) # Statistical summary of numeric columns
(b) Selecting Columns and Rows
print(df['Name']) # Select single column
print(df[['Name', 'City']]) # Select multiple columns
print(df.iloc[0]) # Select row by index
print(df.loc[1, 'City']) # Select specific cell (row=1, col='City')
(c) Filtering Data
# Select rows where Age > 25
print(df[df['Age'] > 25])
(d) Adding or Modifying Columns
df['Country'] = 'India' # Add a new column
df['Age+5'] = df['Age'] + 5 # Add computed column
(e) Handling Missing Data
df.dropna(inplace=True) # Remove rows with missing values
df.fillna(0, inplace=True) # Alternative: replace missing values with 0 instead of dropping
(f) Sorting Data
df.sort_values(by='Age', ascending=False, inplace=True)
(g) Grouping Data
data = {'City': ['Delhi', 'Mumbai', 'Delhi', 'Mumbai'],
'Sales': [200, 150, 400, 300]}
df = pd.DataFrame(data)
grouped = df.groupby('City')['Sales'].sum()
print(grouped)
Output:
City
Delhi 600
Mumbai 450
Name: Sales, dtype: int64
6. Combining DataFrames
Concatenation
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
result = pd.concat([df1, df2])
print(result)
Merging (like SQL Join)
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['A', 'B', 'C']})
df2 = pd.DataFrame({'ID': [1, 2, 3], 'Marks': [90, 80, 70]})
merged = pd.merge(df1, df2, on='ID')
print(merged)
7. Data Cleaning Example
data = {
'Name': ['Alice', 'Bob', None, 'David'],
'Age': [25, None, 22, 30],
'City': ['Delhi', 'Mumbai', 'Chennai', None]
}
df = pd.DataFrame(data)
print("Before Cleaning:\n", df)
df.fillna({'Name': 'Unknown', 'Age': df['Age'].mean(), 'City': 'Unknown'}, inplace=True)
print("\nAfter Cleaning:\n", df)
8. Visualization with Pandas
Pandas integrates directly with Matplotlib.
import matplotlib.pyplot as plt
df = pd.DataFrame({'Year': [2018, 2019, 2020, 2021],
'Sales': [250, 300, 350, 400]})
df.plot(x='Year', y='Sales', kind='bar', title='Yearly Sales')
plt.show()
9. Advantages of Pandas
- Easy to use and understand
- Highly efficient and fast
- Handles large datasets easily
- Excellent integration with other Python libraries (NumPy, Matplotlib, Scikit-learn)
- Built-in tools for cleaning, transforming, merging, and analyzing data
Pandas is the heart of data analysis in Python — it makes working with structured data fast, flexible, and powerful, enabling easy preparation of data for Machine Learning models.
First Install All Required Libraries
You can install all of them at once:
pip install pandas numpy matplotlib seaborn
Or install them one by one:
pip install pandas
pip install numpy
pip install matplotlib
pip install seaborn
COMPLETE PANDAS PROGRAM
# Step 1: Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# ----------------------------------------------
# Step 2: Create DataFrame
# ----------------------------------------------
data = {
"Name": ["Shree", "Satya", "Jyoti", "Ashis", "Saraswati", "Ganesh", "Kajal", "Priya"],
"Gender": ["F", "M", "F", "M", "F", "M", "F", "F"],
"Age": [25, 30, 22, 28, 26, np.nan, 24, 27],
"Department": ["CS", "IT", "CS", "EC", "IT", "CS", "EC", "IT"],
"Marks1": [85, 78, 92, 70, 88, 80, 90, 95],
"Marks2": [80, 85, 88, 75, np.nan, 70, 95, 98],
"Marks3": [82, 80, 91, 72, 90, 68, 88, 96]
}
df = pd.DataFrame(data)
print("\n===== Original Data =====")
print(df)
# ----------------------------------------------
# Step 3: Basic Information and Statistics
# ----------------------------------------------
print("\n===== Basic Information =====")
print(df.info())
print("\n===== Statistical Summary =====")
print(df.describe())
# ----------------------------------------------
# Step 4: Data Selection
# ----------------------------------------------
print("\n===== Column Selection =====")
print(df["Name"]) # Single column
print(df[["Name", "Marks1"]]) # Multiple columns
print("\n===== Row Selection =====")
print(df.iloc[0]) # By position
print(df.loc[2, "Marks1"]) # By label
# ----------------------------------------------
# Step 5: Adding and Modifying Columns
# ----------------------------------------------
df["Total"] = df["Marks1"] + df["Marks2"] + df["Marks3"]
df["Average"] = df["Total"] / 3
print("\n===== After Adding Columns =====")
print(df.head())
# Modify a column
df["Department"] = df["Department"].replace({"CS": "Computer", "IT": "InformationTech", "EC": "Electronics"})
print("\n===== After Modifying Department Names =====")
print(df)
# ----------------------------------------------
# Step 6: Filtering Data
# ----------------------------------------------
print("\n===== Students with Average > 85 =====")
print(df[df["Average"] > 85])
# ----------------------------------------------
# Step 7: Sorting Data
# ----------------------------------------------
print("\n===== Sorting by Average (Descending) =====")
print(df.sort_values(by="Average", ascending=False))
# ----------------------------------------------
# Step 8: Handling Missing Data
# ----------------------------------------------
print("\n===== Missing Values =====")
print(df.isnull().sum())
# Assign back instead of using inplace=True on a single column
# (chained-assignment fillna is deprecated in pandas 2.x)
df["Marks2"] = df["Marks2"].fillna(df["Marks2"].mean())
df["Age"] = df["Age"].fillna(df["Age"].mean())
print("\n===== After Filling Missing Values =====")
print(df)
# ----------------------------------------------
# Step 9: Grouping and Aggregation
# ----------------------------------------------
grouped = df.groupby("Department")[["Marks1", "Marks2", "Marks3", "Average"]].mean()
print("\n===== Average Marks by Department =====")
print(grouped)
# ----------------------------------------------
# Step 10: Concatenation and Merging
# ----------------------------------------------
df_extra = pd.DataFrame({
"Name": ["Shree", "Satya", "Jyoti", "Ashis", "Saraswati", "Ganesh", "Kajal", "Priya"],
"Attendance (%)": [95, 88, 92, 80, 97, 85, 90, 99]
})
merged = pd.merge(df, df_extra, on="Name")
print("\n===== After Merging Attendance Data =====")
print(merged)
# ----------------------------------------------
# Step 11: Visualization with Pandas + Matplotlib + Seaborn
# ----------------------------------------------
# Bar plot of Average Marks by Department
grouped["Average"].plot(kind="bar", color="skyblue", title="Average Marks by Department")
plt.ylabel("Average Marks")
plt.show()
# Distribution of Marks
sns.histplot(df["Average"], bins=10, kde=True)
plt.title("Distribution of Student Averages")
plt.show()
# Box plot
sns.boxplot(x="Department", y="Average", data=df)
plt.title("Average Scores by Department")
plt.show()
# ----------------------------------------------
# Step 12: Export Data
# ----------------------------------------------
df.to_csv("students_full_processed.csv", index=False)
print("\n✅ Data exported to 'students_full_processed.csv' successfully!")
COMPLETE PANDAS PROGRAM USING CSV FILE
# Step 1: Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Step 2: Load CSV file
# Make sure "students_full_processed.csv" is in the same directory as this script
df = pd.read_csv("students_full_processed.csv")
print("===== Original Data =====")
print(df)
print("\n")
# Step 3: Display basic info
print("===== Basic Information =====")
print(df.info())
print("\n")
# Step 4: Display statistics
print("===== Statistical Summary =====")
print(df.describe())
print("\n")
# Step 5: Show first and last few rows
print("===== First 5 Rows =====")
print(df.head())
print("\n")
print("===== Last 5 Rows =====")
print(df.tail())
print("\n")
# Step 6: Access specific columns and rows
print("===== Names of All Students =====")
print(df["Name"])
print("\n")
print("===== Marks of First 3 Students =====")
print(df[["Name", "Marks1", "Marks2", "Marks3"]].head(3))
print("\n")
# Step 7: Conditional filtering
print("===== Students with Average > 85 =====")
high_achievers = df[df["Average"] > 85]
print(high_achievers)
print("\n")
# Step 8: Sorting
print("===== Sorting by Average (Descending) =====")
sorted_df = df.sort_values(by="Average", ascending=False)
print(sorted_df)
print("\n")
# Step 9: Grouping and Aggregation
print("===== Average Marks by Department =====")
dept_avg = df.groupby("Department")[["Marks1", "Marks2", "Marks3", "Average"]].mean()
print(dept_avg)
print("\n")
# Step 10: Handle missing data (if any)
print("===== Check Missing Values =====")
print(df.isnull().sum())
print("\n")
# Fill missing numeric values (if found)
df.fillna(df.mean(numeric_only=True), inplace=True)
# Step 11: Add new derived column (Performance Grade)
def grade(avg):
    if avg >= 90:
        return "A+"
    elif avg >= 80:
        return "A"
    elif avg >= 70:
        return "B"
    else:
        return "C"
df["Grade"] = df["Average"].apply(grade)
print("===== After Adding Grade Column =====")
print(df[["Name", "Average", "Grade"]])
print("\n")
# Step 12: Data Visualization
print("===== Data Visualization =====")
# Bar plot for average marks by department
dept_avg["Average"].plot(kind="bar", color="skyblue", title="Average Marks by Department")
plt.ylabel("Average Marks")
plt.show()
# Distribution of average marks
sns.histplot(df["Average"], bins=8, kde=True, color="green")
plt.title("Distribution of Average Marks")
plt.show()
# Box plot of Average by Department
sns.boxplot(x="Department", y="Average", data=df, hue="Department", palette="pastel", legend=False)  # hue + legend=False avoids the seaborn 0.13 palette deprecation warning
plt.title("Average Marks by Department")
plt.show()
# Step 13: Export modified data
df.to_csv("students_analysis_output.csv", index=False)
print("✅ Processed data saved as 'students_analysis_output.csv'")
Data Visualization Libraries
1. Introduction to Matplotlib
Matplotlib is a comprehensive 2D and 3D plotting library in Python that allows users to create high-quality static, animated, and interactive visualizations. It was developed by John D. Hunter in 2003, originally designed to provide MATLAB-like plotting features in Python.
2. Why Use Matplotlib?
Matplotlib is widely used because it is:
- Flexible – You can customize every element of a plot.
- Powerful – Supports hundreds of plot types (line, bar, scatter, histogram, pie, etc.).
- Compatible – Works well with other libraries such as NumPy, Pandas, Seaborn, and Scikit-learn.
- Cross-platform – Works on Windows, macOS, Linux, and in Jupyter notebooks.
3. Installation
To install Matplotlib, open your terminal or command prompt and type:
pip install matplotlib
Complete Matplotlib Demonstration Program
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Step 1: Create Sample Dataset
data = {
"Name": ["Shree", "Satya", "Jyoti", "Ashis", "Saraswati", "Ganesh", "Kajal", "Priya"],
"Marks1": [85, 78, 90, 88, 75, 92, 81, 89],
"Marks2": [80, 74, 94, 84, 70, 95, 77, 90],
"Marks3": [82, 79, 88, 91, 73, 89, 80, 87],
"Department": ["CSE", "ECE", "CSE", "EEE", "MECH", "CSE", "ECE", "CIVIL"]
}
df = pd.DataFrame(data)
df["Total"] = df["Marks1"] + df["Marks2"] + df["Marks3"]
df["Average"] = df["Total"] / 3
print("===== STUDENTS DATA =====")
print(df, "\n")
# Step 2: Basic Line Plot
plt.figure(figsize=(7, 4))
plt.plot(df["Name"], df["Marks1"], color='blue', marker='o', linestyle='-', label='Marks1')
plt.plot(df["Name"], df["Marks2"], color='red', marker='x', linestyle='--', label='Marks2')
plt.plot(df["Name"], df["Marks3"], color='green', marker='s', linestyle='-.', label='Marks3')
plt.title("Line Plot: Students' Marks Comparison")
plt.xlabel("Student Name")
plt.ylabel("Marks")
plt.legend()
plt.grid(True)
plt.show()
# Step 3: Bar Chart
plt.figure(figsize=(7, 4))
plt.bar(df["Name"], df["Average"], color='purple', alpha=0.6)
plt.title("Bar Chart: Students' Average Marks")
plt.xlabel("Name")
plt.ylabel("Average Marks")
plt.grid(axis='y', linestyle='--')
plt.show()
# Step 4: Horizontal Bar Chart
plt.figure(figsize=(7, 4))
plt.barh(df["Name"], df["Total"], color='orange')
plt.title("Horizontal Bar Chart: Total Marks")
plt.xlabel("Total Marks")
plt.ylabel("Student Name")
plt.show()
# Step 5: Scatter Plot
plt.figure(figsize=(7, 4))
plt.scatter(df["Marks1"], df["Marks2"], color='teal', s=100, alpha=0.7)
plt.title("Scatter Plot: Marks1 vs Marks2")
plt.xlabel("Marks1")
plt.ylabel("Marks2")
plt.grid(True)
plt.show()
# Step 6: Histogram
plt.figure(figsize=(6, 4))
plt.hist(df["Average"], bins=5, color='coral', edgecolor='black')
plt.title("Histogram: Distribution of Average Marks")
plt.xlabel("Average Marks")
plt.ylabel("Number of Students")
plt.show()
# Step 7: Pie Chart
plt.figure(figsize=(6, 6))
plt.pie(df["Average"], labels=df["Name"], autopct='%1.1f%%', startangle=90, shadow=True)
plt.title("Pie Chart: Share of Average Marks")
plt.show()
# Step 8: Multiple Subplots
fig, axes = plt.subplots(2, 2, figsize=(10, 7))
# 1️⃣ Line Plot
axes[0, 0].plot(df["Name"], df["Marks1"], color='blue', marker='o')
axes[0, 0].set_title("Marks1 Line Plot")
axes[0, 0].set_xlabel("Name")
axes[0, 0].set_ylabel("Marks1")
# 2️⃣ Bar Plot
axes[0, 1].bar(df["Name"], df["Marks2"], color='red')
axes[0, 1].set_title("Marks2 Bar Plot")
# 3️⃣ Scatter Plot
axes[1, 0].scatter(df["Marks1"], df["Marks3"], color='green', s=80)
axes[1, 0].set_title("Marks1 vs Marks3 Scatter")
# 4️⃣ Histogram
axes[1, 1].hist(df["Total"], color='purple', bins=4)
axes[1, 1].set_title("Histogram of Total Marks")
plt.suptitle("Students’ Performance Subplots", fontsize=14)
plt.tight_layout()
plt.show()
# Step 9: Customization and Annotation
plt.figure(figsize=(7, 4))
plt.plot(df["Name"], df["Average"], color='brown', marker='D', label="Average")
plt.title("Customized Plot with Annotations")
plt.xlabel("Student")
plt.ylabel("Average Marks")
plt.grid(True)
plt.legend()
# Annotate highest performer
max_avg = df["Average"].max()
max_name = df.loc[df["Average"].idxmax(), "Name"]
plt.annotate(f"Topper: {max_name} ({max_avg:.2f})",
xy=(max_name, max_avg),
xytext=(max_name, max_avg + 2),
arrowprops=dict(facecolor='black', arrowstyle="->"))
plt.show()
# Step 10: Styling with Built-in Styles
plt.style.use('seaborn-v0_8-darkgrid')
plt.figure(figsize=(7, 4))
plt.plot(df["Name"], df["Total"], color='magenta', marker='o')
plt.title("Styled Plot using Seaborn Darkgrid")
plt.xlabel("Name")
plt.ylabel("Total Marks")
plt.show()
# Step 11: Object-Oriented API Example
fig, ax = plt.subplots(figsize=(7, 4))
ax.bar(df["Department"], df["Average"], color='skyblue')
ax.set_title("Department-wise Average Marks")
ax.set_xlabel("Department")
ax.set_ylabel("Average Marks")
ax.grid(True)
plt.show()
# Step 12: Save the Plot
plt.figure(figsize=(6, 4))
plt.bar(df["Name"], df["Total"], color='darkgreen')
plt.title("Saving Plot Example")
plt.xlabel("Name")
plt.ylabel("Total Marks")
plt.savefig("students_performance_plot.png")
print("Plot saved successfully as 'students_performance_plot.png' ✅")
Python Program: Mean, Median, Mode, Variance & Standard Deviation
import numpy as np
import statistics as stats
from collections import Counter
# Sample List
data = [12, 15, 12, 18, 20, 22, 12, 15, 30]
# Using statistics Library
mean_stats = stats.mean(data)
median_stats = stats.median(data)
# On Python < 3.8, statistics.mode() raises StatisticsError if there is no
# unique mode; on 3.8+ it returns the first mode encountered
try:
    mode_stats = stats.mode(data)
except stats.StatisticsError:
    mode_stats = "No unique mode"
variance_stats = stats.variance(data) # Sample variance
std_stats = stats.stdev(data) # Sample standard deviation
# Using NumPy
mean_np = np.mean(data)
median_np = np.median(data)
variance_np = np.var(data) # Population variance
std_np = np.std(data) # Population standard deviation
# Mode with Counter
counter_mode = Counter(data).most_common(1)[0][0]
# Output
print("===== Using statistics Module =====")
print("Mean:", mean_stats)
print("Median:", median_stats)
print("Mode:", mode_stats)
print("Variance:", variance_stats)
print("Standard Deviation:", std_stats)
print("\n===== Using NumPy Library =====")
print("Mean:", mean_np)
print("Median:", median_np)
print("Mode (Counter):", counter_mode)
print("Variance:", variance_np)
print("Standard Deviation:", std_np)
Overview of Scikit-Learn, TensorFlow & PyTorch
These are the core machine learning libraries used in industry, research, and academics.
Scikit-Learn (sklearn)
- Best for: Traditional Machine Learning Algorithms
- Level: Beginner → Intermediate
- Built on: NumPy, SciPy, Matplotlib
What Scikit-Learn Is
Scikit-Learn is a Python library that provides ready-made implementations of classical ML algorithms like:
- Linear Regression
- Logistic Regression
- Decision Trees
- Random Forest
- Naive Bayes
- SVM (Support Vector Machines)
- Clustering (K-Means, DBSCAN)
- Dimensionality Reduction (PCA)
It is simple, clean, and widely used in data science.
Key Features
- Easy to use (simple API)
- Fast (built on optimized C/C++ libraries)
- Includes preprocessing tools (e.g., StandardScaler, OneHotEncoder)
- Excellent for beginners and for training ML models quickly
- Great for small- and medium-size datasets
Why is Scikit-Learn Important?
It contains almost every traditional machine learning algorithm.
Provides ready-made functions for:
- Data preprocessing
- Feature selection
- Model building
- Model evaluation
- Model tuning
- Highly reliable and optimized.
Scikit-Learn is the standard library for most ML beginners and data scientists.
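What makes Scikit-Learn easy to use is its uniform estimator interface: every model is built with the same fit/predict/score calls. A minimal sketch on the built-in iris dataset (the dataset choice here is purely illustrative):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Every scikit-learn estimator follows the same pattern: fit -> predict -> score
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)      # learn from the training data
preds = model.predict(X_test)    # predict labels for unseen data
print("Test accuracy:", model.score(X_test, y_test))
```

Swapping `LogisticRegression` for any other classifier (SVC, KNeighborsClassifier, DecisionTreeClassifier) leaves the rest of the code unchanged.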
Core Areas of Scikit-Learn
Scikit-Learn provides support for:
1. Supervised Learning
Algorithms where the model learns from labeled data:
- Regression
- Linear Regression
- Polynomial Regression
- Ridge, Lasso Regression
- Classification
- Logistic Regression
- Decision Trees
- Random Forest
- Support Vector Machines
- K-Nearest Neighbors (KNN)
- Naive Bayes
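The classification algorithms above are exercised in the full program later in these notes; regression follows the same pattern. A minimal sketch, using toy data generated from y = 3x + 1 (the data and the Ridge alpha value are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Toy data following y = 3x + 1 with a little noise
rng = np.random.default_rng(0)
X = np.arange(10).reshape(-1, 1)   # features must be 2D: (n_samples, n_features)
y = 3 * X.ravel() + 1 + rng.normal(0, 0.1, 10)

lin = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)   # L2-regularized variant of the same model

print("Linear slope:", lin.coef_[0], "intercept:", lin.intercept_)
print("Ridge slope:", ridge.coef_[0])
```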
2. Unsupervised Learning
Algorithms that work with unlabeled data:
- Clustering
- K-Means
- DBSCAN
- Agglomerative Clustering
- Dimensionality Reduction
- PCA (Principal Component Analysis)
- LDA (Linear Discriminant Analysis)
- t-SNE (via other packages)
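A short sketch of the two unsupervised tasks above, clustering with K-Means and dimensionality reduction with PCA, run on synthetic data (the blob parameters are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Two well-separated blobs of 50 points each in 4 dimensions
rng = np.random.default_rng(42)
a = rng.normal(0, 0.5, (50, 4))
b = rng.normal(5, 0.5, (50, 4))
X = np.vstack([a, b])

# K-Means recovers the two groups without any labels
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster sizes:", np.bincount(km.labels_))

# PCA projects the 4D data down to 2D, e.g. for plotting
X2 = PCA(n_components=2).fit_transform(X)
print("Reduced shape:", X2.shape)
```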
3. Semi-Supervised Learning
Uses a mix of labeled and unlabeled data.
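Scikit-Learn implements this idea in SelfTrainingClassifier, where unlabeled samples are marked with -1. A minimal sketch on iris, hiding roughly 70% of the labels (the hidden fraction and the choice of base classifier are illustrative):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Pretend most labels are unknown: unlabeled samples are marked with -1
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) < 0.7] = -1

# The base classifier must support predict_proba (hence probability=True)
model = SelfTrainingClassifier(SVC(probability=True, random_state=0))
model.fit(X, y_partial)   # trains on labeled points, then pseudo-labels the rest
print("Accuracy on all samples:", model.score(X, y))
```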
4. Model Selection & Evaluation
Tools for testing model accuracy:
- train_test_split
- Cross Validation (K-Fold)
- Grid Search (GridSearchCV)
- Randomized Search
- Accuracy, Precision, Recall, F1-score
- Confusion Matrix
5. Preprocessing Tools
For preparing raw data:
- Standardization (StandardScaler)
- Normalization (MinMaxScaler)
- Encoding (OneHotEncoder, LabelEncoder)
- Imputation (SimpleImputer)
- Binarization
- Polynomial features
FULL MACHINE LEARNING PROGRAM WITH AUTO BEST MODEL SELECTION
# Demonstrates all major ML workflow steps
# Dataset: Handwritten Digits (0–9)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.metrics import (
accuracy_score, confusion_matrix, classification_report,
precision_score, recall_score, f1_score
)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
import joblib
# 1. LOAD DATASET
digits = datasets.load_digits()
X, y = digits.data, digits.target
print("Shape of X:", X.shape)
print("Shape of y:", y.shape)
# 2. TRAIN-TEST SPLIT
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# 3. DEFINE MODELS AND PIPELINES
models = {
"Logistic Regression": LogisticRegression(max_iter=2000),
"KNN": KNeighborsClassifier(),
"SVM": SVC(),
"Decision Tree": DecisionTreeClassifier()
}
pipelines = {}
for name, model in models.items():
    if name in ["Logistic Regression", "SVM"]:
        pipelines[name] = Pipeline([
            ("scaler", StandardScaler()),
            ("pca", PCA(n_components=30)),
            ("model", model)
        ])
    else:
        pipelines[name] = Pipeline([
            ("scaler", StandardScaler()),
            ("model", model)
        ])
# 4. TRAIN MODELS, EVALUATE, AND SELECT BEST
best_accuracy = 0
best_model_name = None
best_model_pipeline = None
results = []
for name, pipe in pipelines.items():
    pipe.fit(X_train, y_train)
    y_pred = pipe.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    prec = precision_score(y_test, y_pred, average="macro")
    rec = recall_score(y_test, y_pred, average="macro")
    f1 = f1_score(y_test, y_pred, average="macro")
    results.append([name, acc, prec, rec, f1])
    print(f"\n===== {name} =====")
    print("Accuracy:", acc)
    print("Precision:", prec)
    print("Recall:", rec)
    print("F1 Score:", f1)
    # Update best model
    if acc > best_accuracy:
        best_accuracy = acc
        best_model_name = name
        best_model_pipeline = pipe
    # Confusion matrix heatmap
    cm = confusion_matrix(y_test, y_pred)
    plt.figure(figsize=(6, 5))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
    plt.title(f"Confusion Matrix: {name}")
    plt.ylabel('Actual')
    plt.xlabel('Predicted')
    plt.show()
# Summary DataFrame
results_df = pd.DataFrame(results, columns=["Model", "Accuracy", "Precision", "Recall", "F1"])
print("\n===== SUMMARY OF MODEL PERFORMANCE =====")
print(results_df)
print(f"\nBest Model: {best_model_name} with Accuracy = {best_accuracy:.4f}")
# 5. CROSS-VALIDATION OF BEST MODEL
scores = cross_val_score(best_model_pipeline, X, y, cv=5)
print(f"\nCross-validation Scores for {best_model_name}: {scores}")
print(f"Mean CV Accuracy: {scores.mean():.4f}")
# 6. HYPERPARAMETER TUNING FOR BEST MODEL
if best_model_name == "Logistic Regression":
    param_grid = {
        'model__C': [0.01, 0.1, 1, 10],
        'model__solver': ['liblinear', 'lbfgs']
    }
elif best_model_name == "SVM":
    param_grid = {
        'model__C': [0.1, 1, 10],
        'model__kernel': ['linear', 'rbf']
    }
elif best_model_name == "KNN":
    param_grid = {
        'model__n_neighbors': [3, 5, 7, 9],
        'model__weights': ['uniform', 'distance']
    }
elif best_model_name == "Decision Tree":
    param_grid = {
        'model__max_depth': [None, 5, 10, 20],
        'model__min_samples_split': [2, 5, 10]
    }
grid = GridSearchCV(best_model_pipeline, param_grid, cv=3, scoring='accuracy', verbose=1)
grid.fit(X_train, y_train)
best_model_final = grid.best_estimator_
print(f"\n===== GRID SEARCH RESULTS FOR {best_model_name} =====")
print("Best Parameters:", grid.best_params_)
print("Best CV Score:", grid.best_score_)
# 7. SAVE FINAL BEST MODEL
joblib.dump(best_model_final, "best_digit_model_final.pkl")
print(f"\nFinal best model ({best_model_name}) saved as best_digit_model_final.pkl")
# 8. LOAD MODEL & PREDICT NEW SAMPLE
loaded_model = joblib.load("best_digit_model_final.pkl")
sample = X_test[0].reshape(1, -1)
predicted_digit = loaded_model.predict(sample)
print("\nSample Prediction:")
print("Actual value:", y_test[0])
print("Predicted value:", predicted_digit[0])
# END OF PROGRAM
TensorFlow
- Best for: Deep Learning, Neural Networks, Large-Scale ML
- Level: Intermediate → Advanced
- Developed by: Google
- Has Keras API (easy high-level NN building)
What TensorFlow Is
TensorFlow is a powerful open-source platform used for:
- Deep Neural Networks
- Convolutional Neural Networks (CNNs)
- Recurrent Neural Networks (RNNs)
- Natural Language Processing (NLP)
- Computer Vision
- Large-scale ML training
TensorFlow uses computational graphs and can run on:
- CPU
- GPU
- TPU (Tensor Processing Units)
Key Features
- High-performance deep learning
- Supports distributed training
- Industry standard for production models
- Keras makes model-building simple
- Used in Google products (Search, Photos, YouTube)
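As a quick taste of why Keras makes model-building simple, here is a minimal sketch of a fully connected digit classifier; the layer sizes are illustrative only and are unrelated to the CNN project below:

```python
import tensorflow as tf

# A complete classifier definition in a handful of lines
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),             # a flattened 28x28 image
    tf.keras.layers.Dense(64, activation='relu'),    # one hidden layer
    tf.keras.layers.Dense(10, activation='softmax')  # probabilities for 10 digits
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```

Everything else (training loops, gradients, device placement) is handled by `model.fit`, which is why Keras is the usual entry point into TensorFlow.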
COMPLETE TENSORFLOW PROJECT (ALL MAJOR METHODS INCLUDED)
# MNIST HANDWRITTEN DIGIT RECOGNITION
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (
Input, Conv2D, SeparableConv2D, MaxPooling2D,
GlobalAveragePooling2D, Dense, Dropout, BatchNormalization, Activation
)
from tensorflow.keras.callbacks import (
EarlyStopping, ModelCheckpoint, TensorBoard, ReduceLROnPlateau, LearningRateScheduler
)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
import datetime
# 1. CHECK GPU
print("Available GPUs:", tf.config.list_physical_devices('GPU'))
# 2. LOAD MNIST DATASET
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
print("Train shape:", x_train.shape, "Test shape:", x_test.shape)
# 3. PREPROCESSING
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
x_train = x_train.reshape(-1,28,28,1)
x_test = x_test.reshape(-1,28,28,1)
y_train_cat = to_categorical(y_train, 10)
y_test_cat = to_categorical(y_test, 10)
# 4. SPLIT TRAIN/VALIDATION MANUALLY
x_train_new, x_val, y_train_new, y_val = train_test_split(
x_train, y_train_cat, test_size=0.2, random_state=42
)
# 5. DATA AUGMENTATION
datagen = ImageDataGenerator(
    rotation_range=10,
    zoom_range=0.1,
    width_shift_range=0.1,
    height_shift_range=0.1
)
datagen.fit(x_train_new)
# 6. BUILD MODEL
inputs = Input(shape=(28,28,1))
x = Conv2D(32, (3,3), padding='same')(inputs)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling2D((2,2))(x)
x = SeparableConv2D(64, (3,3), padding='same')(x)
x = BatchNormalization()(x)
x = Activation('relu')(x)
x = MaxPooling2D((2,2))(x)
x = GlobalAveragePooling2D()(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.4)(x)
outputs = Dense(10, activation='softmax')(x)
model = Model(inputs, outputs)
model.summary()
# 7. COMPILE MODEL
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()]
)
# 8. CALLBACKS
log_dir = "logs/advanced_mnist_" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
def lr_schedule(epoch, lr):
    # Halve the learning rate every 5 epochs
    if epoch > 0 and epoch % 5 == 0:
        return lr * 0.5
    return lr
callbacks = [
    EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
    ModelCheckpoint('best_advanced_mnist.h5', save_best_only=True),
    TensorBoard(log_dir=log_dir),
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, verbose=1),
    LearningRateScheduler(lr_schedule)
]
# 9. TRAIN MODEL
history = model.fit(
    datagen.flow(x_train_new, y_train_new, batch_size=128),
    epochs=20,
    validation_data=(x_val, y_val),
    steps_per_epoch=len(x_train_new)//128,
    callbacks=callbacks
)
# 10. EVALUATE MODEL
loss, acc, precision, recall = model.evaluate(x_test, y_test_cat, verbose=0)
print(f"Test Loss: {loss:.4f}, Accuracy: {acc:.4f}, Precision: {precision:.4f}, Recall: {recall:.4f}")
# 11. PREDICTIONS
pred = model.predict(x_test)
pred_classes = np.argmax(pred, axis=1)
print("\nClassification Report:")
print(classification_report(y_test, pred_classes))
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, pred_classes))
# 12. PLOT ACCURACY, LOSS
plt.figure(figsize=(18,5))
plt.subplot(1,3,1)
plt.plot(history.history['accuracy'], label='Train Acc')
plt.plot(history.history['val_accuracy'], label='Val Acc')
plt.title('Accuracy')
plt.legend()
plt.subplot(1,3,2)
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Val Loss')
plt.title('Loss')
plt.legend()
plt.subplot(1,3,3)
if 'lr' in history.history:
    plt.plot(history.history['lr'], label='Learning Rate')
    plt.title('Learning Rate')
    plt.legend()
plt.show()
# 13. SAVE AND LOAD MODEL
model.save('final_advanced_mnist_model.keras')  # Keras 3 requires a .keras (or .h5) extension
loaded_model = tf.keras.models.load_model('final_advanced_mnist_model.keras')
# 14. SINGLE SAMPLE PREDICTION
sample_idx = 0
sample = x_test[sample_idx].reshape(1,28,28,1)
prediction = loaded_model.predict(sample)
print(f"\nActual: {y_test[sample_idx]}, Predicted: {np.argmax(prediction)}")
# 15. VISUALIZE MISCLASSIFIED IMAGES
misclassified_idx = np.where(pred_classes != y_test)[0]
plt.figure(figsize=(12,6))
for i, idx in enumerate(misclassified_idx[:9]):
    plt.subplot(3, 3, i+1)
    plt.imshow(x_test[idx].reshape(28, 28), cmap='gray')
    plt.title(f"Actual: {y_test[idx]}, Predicted: {pred_classes[idx]}")
    plt.axis('off')
plt.show()
# END OF PROGRAM
PyTorch
PyTorch is an open-source deep learning and scientific computing framework developed by Facebook AI Research (FAIR).
- Best for: Research, Deep Learning, AI Models
- Level: Intermediate → Advanced
- Developed by: Facebook (Meta)
It is known for:
- Flexibility
- Dynamic computation graphs
- Faster debugging
- Research-friendliness
It is widely used in:
- NLP (Transformers, BERT, GPT)
- Computer Vision
- Deep Learning Research
- Reinforcement Learning
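The dynamic computation graph mentioned above means PyTorch records operations as ordinary Python runs, so gradients can be inspected immediately. A minimal autograd sketch:

```python
import torch

# The graph for y = x^2 + 2x is built on the fly as this line executes
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x
y.backward()   # backpropagate through the recorded graph
print(x.grad)  # dy/dx = 2x + 2 = 8 at x = 3
```

This eager, define-by-run style is what makes debugging PyTorch models feel like debugging plain Python.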
PyTorch NLP Sentiment Analysis using LSTM
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torch.nn.utils.rnn import pad_sequence
from collections import Counter
import re
# 1. TEXT CLEANING + TOKENIZER
def clean_text(text):
    text = text.lower()
    text = re.sub(r"[^a-zA-Z0-9\s']", "", text)
    return text.split()
# 2. TRAINING DATA (40 SENTENCES)
positive_sentences = [
"I love this movie",
"This film is amazing",
"I really like this product",
"The acting was fantastic",
"This is a wonderful experience",
"I enjoyed this movie a lot",
"The product quality is very good",
"This is awesome",
"Absolutely loved it",
"What a great film",
"This is brilliant",
"The storyline is excellent",
"The actors did a great job",
"I highly recommend this movie",
"This product works very well",
"This made me very happy",
"It was a delightful experience",
"I am satisfied with this item",
"The movie was very enjoyable",
"I like this a lot"
]
negative_sentences = [
"I hate this movie",
"This film is terrible",
"I dislike this product",
"The acting was awful",
"This is a bad experience",
"I did not enjoy the movie",
"The product quality is very poor",
"This is horrible",
"Absolutely hated it",
"What a waste of time",
"This is boring",
"The storyline is terrible",
"The actors did a bad job",
"I don't recommend this movie",
"This product does not work",
"This made me very sad",
"It was a disappointing experience",
"I am frustrated with this item",
"The movie was not enjoyable",
"I don't like this at all"
]
texts = positive_sentences + negative_sentences
labels = [1]*20 + [0]*20 # 1 = Positive, 0 = Negative
# 3. BUILD VOCABULARY
tokenized = [clean_text(t) for t in texts]
word_counts = Counter(word for sent in tokenized for word in sent)
vocab = {"<PAD>": 0, "<UNK>": 1}
for word, _ in word_counts.items():
    vocab[word] = len(vocab)
vocab_size = len(vocab)
print("Vocabulary Size:", vocab_size)
# 4. ENCODING FUNCTION
def encode(sentence):
    return torch.tensor([vocab.get(word, 1) for word in clean_text(sentence)])
# 5. DATASET CLASS
class SentimentDataset(Dataset):
    def __init__(self, texts, labels):
        self.data = [encode(t) for t in texts]
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]
# Padding function
def collate_fn(batch):
    text, labels = zip(*batch)
    text = pad_sequence(text, batch_first=True, padding_value=0)
    labels = torch.stack(labels)  # stack the 0-d label tensors into one batch tensor
    return text, labels
dataset = SentimentDataset(texts, labels)
loader = DataLoader(dataset, batch_size=4, shuffle=True, collate_fn=collate_fn)
# 6. BI-DIRECTIONAL LSTM MODEL (UPGRADED)
class BiLSTMSentiment(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, 64)
        self.lstm = nn.LSTM(64, 128, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(0.4)
        self.fc = nn.Linear(128 * 2, 2)  # Bi-LSTM → *2

    def forward(self, x):
        emb = self.embedding(x)
        output, (hidden, cell) = self.lstm(emb)
        hidden = torch.cat((hidden[-2], hidden[-1]), dim=1)  # Combine directions
        hidden = self.dropout(hidden)
        return self.fc(hidden)
model = BiLSTMSentiment(vocab_size)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# 7. TRAINING LOOP
for epoch in range(15):
    total_loss = 0
    for x, y in loader:
        optimizer.zero_grad()
        preds = model(x)
        loss = criterion(preds, y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"Epoch {epoch+1}/15, Loss = {total_loss:.4f}")
# 8. PREDICTION FUNCTION
def predict(text):
    model.eval()
    with torch.no_grad():
        encoded = encode(text).unsqueeze(0)  # Add batch dimension
        output = model(encoded)
        label = torch.argmax(output, dim=1).item()
    return "Positive 😀" if label == 1 else "Negative 😡"
# 9. TESTING THE MODEL
print("\n--- SENTIMENT PREDICTIONS ---")
print("This movie is awesome! ->", predict("This movie is awesome!"))
print("I don't like this product. ->", predict("I don't like this product."))
print("The acting was fantastic! ->", predict("The acting was fantastic!"))
print("This is the worst thing ever ->", predict("This is the worst thing ever"))
CNN-Based Face Recognition: Student Manual
This project demonstrates how to perform face recognition using a Convolutional Neural Network (CNN) in Python with TensorFlow/Keras.
It supports both:
- Binary face recognition (e.g., one person vs another)
- Multi-class face recognition (e.g., multiple people)
Components:
Training Script (train.py) – Train the CNN on your dataset.
Recognition Script (recognize.py) – Real-time face recognition using webcam.
Dataset folder – Contains images of people, organized by subfolders.
Prerequisites
Software Required
Python 3.11+
Pip package manager
Required Python Libraries
Install using pip:
pip install tensorflow opencv-python numpy scikit-learn pillow
- tensorflow – For building and training CNN
- opencv-python – For webcam capture and face detection
- numpy – For array operations
- scikit-learn – For label encoding & train-test split
- pillow – For image loading & preprocessing
Folder Structure
Create the following structure:
face_recognition/
    dataset/
        person1/
            img1.jpg
            img2.jpg
        person2/
            img1.jpg
            img2.jpg
    model/
        (will store cnn_model.keras and labels.npy)
    train.py
    recognize.py
Dataset Preparation
- Each person’s images go in their own subfolder inside dataset/.
- Recommended 20–50 images per person.
- Images can be .jpg or .png.
- Keep face visible and centered.
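Before training, it is worth sanity-checking the layout. This small helper (a convenience sketch, not part of the two scripts below) counts usable images per person under the dataset folder:

```python
import os

def count_images(dataset_dir):
    """Return {person_name: number_of_images} for each subfolder."""
    counts = {}
    for person in sorted(os.listdir(dataset_dir)):
        person_dir = os.path.join(dataset_dir, person)
        if os.path.isdir(person_dir):
            images = [f for f in os.listdir(person_dir)
                      if f.lower().endswith((".jpg", ".png"))]
            counts[person] = len(images)
    return counts

# count_images("dataset/") returns a dict like {"person1": <count>, ...};
# check that every person is in the recommended 20-50 image range.
```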
train.py
import os
import numpy as np
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import load_img, img_to_array
# ------------------------
# Config
# ------------------------
DATASET_DIR = "dataset/"
IMG_SIZE = 128 # Resize images
# ------------------------
# Load dataset
# ------------------------
X, y = [], []
for person_name in os.listdir(DATASET_DIR):
    person_dir = os.path.join(DATASET_DIR, person_name)
    if not os.path.isdir(person_dir):
        continue
    for img_name in os.listdir(person_dir):
        img_path = os.path.join(person_dir, img_name)
        try:
            img = load_img(img_path, target_size=(IMG_SIZE, IMG_SIZE))
            img_array = img_to_array(img) / 255.0
            X.append(img_array)
            y.append(person_name)
        except Exception:
            # Skip files that cannot be loaded as images
            continue
X = np.array(X)
y = np.array(y)
# ------------------------
# Encode labels
# ------------------------
le = LabelEncoder()
y_encoded = le.fit_transform(y)
num_classes = len(np.unique(y_encoded))
os.makedirs("model", exist_ok=True)
np.save("model/labels.npy", le.classes_)
# ------------------------
# Prepare labels
# ------------------------
if num_classes == 2:
    y_final = y_encoded  # Binary classification
else:
    y_final = to_categorical(y_encoded, num_classes=num_classes)  # Multi-class
# ------------------------
# Train-test split
# ------------------------
X_train, X_test, y_train, y_test = train_test_split(
X, y_final, test_size=0.2, random_state=42
)
# ------------------------
# Build CNN model
# ------------------------
model = Sequential([
Input(shape=(IMG_SIZE, IMG_SIZE, 3)),
Conv2D(32, (3,3), activation="relu"),
MaxPooling2D(2,2),
Conv2D(64, (3,3), activation="relu"),
MaxPooling2D(2,2),
Conv2D(128, (3,3), activation="relu"),
MaxPooling2D(2,2),
Flatten(),
Dense(128, activation="relu"),
Dropout(0.5),
Dense(1 if num_classes==2 else num_classes,
activation="sigmoid" if num_classes==2 else "softmax")
])
# ------------------------
# Compile model
# ------------------------
model.compile(
optimizer="adam",
loss="binary_crossentropy" if num_classes==2 else "categorical_crossentropy",
metrics=["accuracy"]
)
model.summary()
# ------------------------
# Train model
# ------------------------
model.fit(X_train, y_train,
validation_data=(X_test, y_test),
batch_size=16,
epochs=15)
# ------------------------
# Save model
# ------------------------
model.save("model/cnn_model.keras")
print("✅ Model trained and saved successfully!")
recognize.py
import cv2
import numpy as np
from tensorflow.keras.models import load_model
IMG_SIZE = 128
# ------------------------
# Load model and labels
# ------------------------
model = load_model("model/cnn_model.keras")
labels = np.load("model/labels.npy")
num_classes = len(labels)
# ------------------------
# Load face detector
# ------------------------
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
# ------------------------
# Start webcam
# ------------------------
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break  # camera frame not available
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    for (x, y, w, h) in faces:
        face_img = frame[y:y+h, x:x+w]
        face_img = cv2.resize(face_img, (IMG_SIZE, IMG_SIZE))
        face_img = face_img.astype("float32") / 255.0
        face_img = np.expand_dims(face_img, axis=0)
        # Predict
        preds = model.predict(face_img)[0]
        if num_classes == 2:
            # Binary classification: single sigmoid output
            class_id = int(preds[0] > 0.5)
            confidence = float(preds[0]) if class_id == 1 else 1 - float(preds[0])
        else:
            # Multi-class: softmax outputs
            class_id = int(np.argmax(preds))
            confidence = float(preds[class_id])
        name = labels[class_id]
        # Draw rectangle and label
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
        cv2.putText(frame, f"{name} ({confidence*100:.1f}%)",
                    (x, y-10),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.8, (0, 255, 0), 2)
    cv2.imshow("CNN Face Recognition", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
Implement the linear regression algorithm.
# Linear Regression using Scikit-learn
# Step 1: Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
# Step 2: Create Dataset (Study Hours vs Marks)
data = {
"Hours": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
"Marks": [2, 4, 5, 4, 5, 7, 8, 9, 10, 12]
}
df = pd.DataFrame(data)
# Step 3: Define Features (X) and Target (y)
X = df[["Hours"]] # 2D array required
y = df["Marks"]
# Step 4: Split Dataset into Training and Testing
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Step 5: Create Linear Regression Model
model = LinearRegression()
# Step 6: Train Model
model.fit(X_train, y_train)
# Step 7: Model Parameters
print("Slope (Coefficient):", model.coef_[0])
print("Intercept:", model.intercept_)
# Step 8: Make Predictions
y_pred = model.predict(X_test)
print("\nActual Marks:", list(y_test))
print("Predicted Marks:", y_pred)
# Step 9: Model Evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("\nMean Squared Error:", mse)
print("R2 Score:", r2)
# Step 10: Plot Graph
plt.scatter(X, y, color='blue', label="Actual Data")
plt.plot(X, model.predict(X), color='red', label="Regression Line")
plt.xlabel("Study Hours")
plt.ylabel("Marks")
plt.title("Linear Regression using Scikit-learn")
plt.legend()
plt.show()
# Step 11: Predict for New Value
new_hours = np.array([[12]])
predicted_marks = model.predict(new_hours)
print("\nPredicted marks for 12 study hours:", predicted_marks[0])
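As a cross-check, the slope and intercept that `LinearRegression` estimates come from the closed-form least-squares formulas. The sketch below applies them to the full 10-point dataset, so the numbers differ slightly from the train-split fit above:

```python
import numpy as np

hours = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
marks = np.array([2, 4, 5, 4, 5, 7, 8, 9, 10, 12], dtype=float)

# Ordinary least squares: slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
x_mean, y_mean = hours.mean(), marks.mean()
slope = np.sum((hours - x_mean) * (marks - y_mean)) / np.sum((hours - x_mean) ** 2)
intercept = y_mean - slope * x_mean
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```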
Implement the logistic regression algorithm.
# MULTIPLE FEATURE LOGISTIC REGRESSION - LOAN APPROVAL
# Step 1: Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.preprocessing import StandardScaler
# Step 2: Create Realistic Dataset
data = {
"Income": [15000, 20000, 25000, 30000, 35000, 40000, 45000, 50000, 55000, 60000],
"Credit_Score": [500, 520, 580, 600, 650, 700, 720, 750, 780, 800],
"Loan_Status": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)
print("Dataset:\n")
print(df)
# Step 3: Define Features and Target
X = df[["Income", "Credit_Score"]]
y = df["Loan_Status"]
# Step 4: Split Dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Step 5: Feature Scaling (Important for Logistic Regression)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Step 6: Train Logistic Regression Model
model = LogisticRegression()
model.fit(X_train, y_train)
# Step 7: Make Predictions
y_pred = model.predict(X_test)
print("\nActual Values:", list(y_test))
print("Predicted Values:", list(y_pred))
# Step 8: Model Evaluation
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)
print("\nModel Evaluation:")
print("Accuracy:", accuracy)
print("\nConfusion Matrix:\n", cm)
print("\nClassification Report:\n", report)
# Step 9: Predict New Applicant (No Warning Version)
new_applicant = pd.DataFrame({
"Income": [42000],
"Credit_Score": [690]
})
# Apply same scaling
new_applicant_scaled = scaler.transform(new_applicant)
prediction = model.predict(new_applicant_scaled)
if prediction[0] == 1:
    print("\nLoan Approved for new applicant")
else:
    print("\nLoan Rejected for new applicant")
# Step 10: Display Model Coefficients
print("\nModel Coefficients:")
print("Income Coefficient:", model.coef_[0][0])
print("Credit Score Coefficient:", model.coef_[0][1])
print("Intercept:", model.intercept_[0])
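Under the hood, logistic regression passes the weighted sum of the (scaled) features through the sigmoid function to get a probability. A minimal sketch; the weights here are made up for illustration and are not the fitted coefficients printed above:

```python
import numpy as np

def sigmoid(z):
    """Squash a linear score into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights for the scaled [Income, Credit_Score] features
w = np.array([1.2, 0.8])
b = -0.1
x_scaled = np.array([0.5, 0.6])  # a hypothetical scaled applicant

z = w @ x_scaled + b             # linear score
prob = sigmoid(z)                # P(Loan_Status = 1)
label = int(prob >= 0.5)         # threshold at 0.5, as predict() does
print(f"P(approved) = {prob:.3f} -> label {label}")
```

A positive score (z > 0) maps to a probability above 0.5, which is why the sign of the linear score decides the predicted class.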
Implement the K-nearest neighbor algorithm.
# KNN - HEART DISEASE PREDICTION
# Step 1: Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# Step 2: Create Realistic Medical Dataset
data = {
"Age": [25, 35, 45, 50, 55, 60, 65, 70, 40, 48],
"Blood_Pressure": [120, 130, 140, 150, 160, 170, 180, 190, 135, 145],
"Cholesterol": [180, 190, 210, 220, 240, 260, 280, 300, 200, 215],
"Heart_Disease": [0, 0, 0, 1, 1, 1, 1, 1, 0, 1]
}
df = pd.DataFrame(data)
print("Dataset:\n")
print(df)
# Step 3: Define Features and Target
X = df[["Age", "Blood_Pressure", "Cholesterol"]]
y = df["Heart_Disease"]
# Step 4: Split Dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42
)
# Step 5: Feature Scaling (Important for KNN)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Step 6: Train KNN Model
k = 3
model = KNeighborsClassifier(n_neighbors=k)
model.fit(X_train, y_train)
# Step 7: Make Predictions
y_pred = model.predict(X_test)
print("\nActual Values:", list(y_test))
print("Predicted Values:", list(y_pred))
# Step 8: Model Evaluation
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)
print("\nModel Evaluation:")
print("Accuracy:", accuracy)
print("\nConfusion Matrix:\n", cm)
print("\nClassification Report:\n", report)
# Step 9: Predict New Patient
new_patient = pd.DataFrame({
"Age": [52],
"Blood_Pressure": [155],
"Cholesterol": [230]
})
# Apply scaling
new_patient_scaled = scaler.transform(new_patient)
prediction = model.predict(new_patient_scaled)
if prediction[0] == 1:
    print("\nPatient likely has Heart Disease")
else:
    print("\nPatient likely does NOT have Heart Disease")
# Step 10: 2D Visualization (Age vs Cholesterol)
plt.scatter(df["Age"], df["Cholesterol"], c=df["Heart_Disease"])
plt.xlabel("Age")
plt.ylabel("Cholesterol")
plt.title("Heart Disease Classification (KNN)")
plt.show()
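At its core, KNN is just distance computation plus a majority vote. The sketch below re-implements that idea on a tiny made-up [Age, Cholesterol] sample; note that it skips the scaling step, which is exactly why the full example above standardizes features first (otherwise the larger-valued cholesterol column dominates the distance):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                  # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: [Age, Cholesterol], label 1 = heart disease
X = np.array([[25, 180], [35, 190], [60, 260], [65, 280]], dtype=float)
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([62.0, 270.0]), k=3))  # nearest neighbors are mostly class 1
```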
Implement the decision tree algorithm.
# LOAN APPROVAL PREDICTION USING DECISION TREE
# Step 1: Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn import tree
# Step 2: Create Realistic Dataset
data = {
"Income": [15000, 20000, 25000, 30000, 35000, 40000, 45000, 50000, 55000, 60000],
"Credit_Score": [500, 520, 580, 600, 650, 700, 720, 750, 780, 800],
"Age": [22, 25, 28, 30, 35, 40, 45, 50, 55, 60],
"Loan_Status": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)
print("Dataset:\n")
print(df)
# Step 3: Define Features and Target
X = df[["Income", "Credit_Score", "Age"]]
y = df["Loan_Status"]
# Step 4: Split Dataset
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42
)
# Step 5: Train Decision Tree Model
model = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
model.fit(X_train, y_train)
# Step 6: Make Predictions
y_pred = model.predict(X_test)
print("\nActual Values:", list(y_test))
print("Predicted Values:", list(y_pred))
# Step 7: Evaluate Model
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)
print("\nModel Evaluation:")
print("Accuracy:", accuracy)
print("\nConfusion Matrix:\n", cm)
print("\nClassification Report:\n", report)
# Step 8: Predict New Applicant
new_applicant = pd.DataFrame({
"Income": [42000],
"Credit_Score": [690],
"Age": [38]
})
prediction = model.predict(new_applicant)
if prediction[0] == 1:
    print("\nLoan Approved for new applicant")
else:
    print("\nLoan Rejected for new applicant")
# Step 9: Visualize Decision Tree
plt.figure(figsize=(12, 8))
tree.plot_tree(
model,
feature_names=["Income", "Credit_Score", "Age"],
class_names=["Rejected", "Approved"],
filled=True
)
plt.title("Decision Tree - Loan Approval")
plt.show()
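The `criterion="gini"` setting above refers to Gini impurity, the score the tree minimizes when choosing splits: 1 minus the sum of squared class proportions. A quick worked sketch on the loan labels:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# The 10 loan labels above: 4 rejected (0), 6 approved (1)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
print(gini(labels))  # 1 - (0.4^2 + 0.6^2) = 0.48 before any split

# Splitting this sorted data after the 4th row yields two pure child nodes
left, right = labels[:4], labels[4:]
print(gini(left), gini(right))  # both 0.0: a perfect split
```

A pure node has impurity 0, so the tree stops splitting there; the `max_depth=3` cap above simply forces it to stop earlier.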