# Session 1 - DataFrames - Lesson 1: Creating DataFrames

## Learning Objectives
- Understand different methods to create pandas DataFrames
- Learn to create DataFrames from dictionaries, lists, and NumPy arrays
- Practice with various data types and structures

## Prerequisites
- Basic Python knowledge
- Understanding of lists and dictionaries

```python
# Import required libraries
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")
```

```text
Pandas version: 2.2.3
NumPy version: 2.2.6
```


## Method 1: Creating DataFrame from Dictionary

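This is the most common and intuitive way to create a DataFrame: each key becomes a column name, and each list supplies that column's values. All the lists must have the same length.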

```python
# Creating DataFrame from dictionary
student_data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [23, 25, 22, 24, 23],
    'Grade': ['A', 'B', 'A', 'C', 'B'],
    'Score': [95, 87, 92, 78, 89]
}

df_students = pd.DataFrame(student_data)
print("Student DataFrame:")
print(df_students)
print(f"\nShape: {df_students.shape}")
print(f"Data types:\n{df_students.dtypes}")
```

```text
Student DataFrame:
      Name  Age Grade  Score
0    Alice   23     A     95
1      Bob   25     B     87
2  Charlie   22     A     92
3    Diana   24     C     78
4      Eve   23     B     89

Shape: (5, 4)
Data types:
Name     object
Age       int64
Grade    object
Score     int64
dtype: object
```
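Two details of the dictionary constructor are worth seeing directly: the optional `columns` argument selects and orders the columns, and lists of unequal length are rejected. A minimal sketch with made-up data (the `scores` dictionary is illustrative only):

```python
import pandas as pd

scores = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Score": [95, 87, 92],
    "Age": [23, 25, 22],
}

# `columns` controls which keys become columns and in what order
df = pd.DataFrame(scores, columns=["Name", "Age", "Score"])
print(df.columns.tolist())

# Lists of different lengths cannot be lined up into rows, so pandas raises
try:
    pd.DataFrame({"Name": ["Alice", "Bob"], "Score": [95]})
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print(f"ValueError: {err}")
```

A single scalar value would be broadcast to every row, but a one-element list is treated as a column of length one and does not broadcast.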


## Method 2: Creating DataFrame from Lists

If your data is stored in separate lists, you can map each list to a column name in a dictionary and pass that dictionary to `pd.DataFrame`.

```python
# Creating DataFrame from separate lists
cities = ['New York', 'London', 'Tokyo', 'Paris', 'Sydney']
populations = [8.4, 8.9, 13.9, 2.1, 5.3]
countries = ['USA', 'UK', 'Japan', 'France', 'Australia']

df_cities = pd.DataFrame({
    'City': cities,
    'Population_Million': populations,
    'Country': countries
})

print("Cities DataFrame:")
print(df_cities)
print(f"\nIndex: {df_cities.index.tolist()}")
print(f"Columns: {df_cities.columns.tolist()}")
```

```text
Cities DataFrame:
       City  Population_Million    Country
0  New York                 8.4        USA
1    London                 8.9         UK
2     Tokyo                13.9      Japan
3     Paris                 2.1     France
4    Sydney                 5.3  Australia

Index: [0, 1, 2, 3, 4]
Columns: ['City', 'Population_Million', 'Country']
```
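Lists can also be passed row by row instead of column by column. The sketch below (a shortened, illustrative version of the cities data) builds the same table three ways: from a list of row lists with `columns=`, from `zip()` over the column lists, and from a list of dictionaries ("records") whose keys become the column labels:

```python
import pandas as pd

cities = ["New York", "London", "Tokyo"]
populations = [8.4, 8.9, 13.9]

# Row-oriented: each inner list is one row; column labels are passed separately
rows = [["New York", 8.4], ["London", 8.9], ["Tokyo", 13.9]]
df_rows = pd.DataFrame(rows, columns=["City", "Population_Million"])

# zip() turns the parallel column lists into the same row-oriented shape
df_zip = pd.DataFrame(list(zip(cities, populations)),
                      columns=["City", "Population_Million"])

# Records: each dict is one row; its keys become the column labels
records = [
    {"City": "New York", "Population_Million": 8.4},
    {"City": "London", "Population_Million": 8.9},
    {"City": "Tokyo", "Population_Million": 13.9},
]
df_records = pd.DataFrame(records)

print(df_rows)
print(df_records.dtypes)
```

Records are convenient when rows arrive one at a time (for example, parsed from JSON), since missing keys simply become `NaN`.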


## Method 3: Creating DataFrame from NumPy Array

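This method is useful when working with numerical data or when you need random data for testing. The `columns` and `index` arguments label the axes; without them, pandas uses 0-based integer labels.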

```python
# Creating DataFrame from NumPy array
np.random.seed(42)  # For reproducible results
# 5x3 array of random integers from 1 to 99 (the upper bound 100 is exclusive)
random_data = np.random.randint(1, 100, size=(5, 3))

df_random = pd.DataFrame(random_data,
                         columns=['Column_A', 'Column_B', 'Column_C'],
                         index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5'])

print("Random DataFrame:")
print(df_random)
print("\nSummary statistics:")
print(df_random.describe())
```

```text
Random DataFrame:
      Column_A  Column_B  Column_C
Row1        52        93        15
Row2        72        61        21
Row3        83        87        75
Row4        75        88        24
Row5         3        22        53

Summary statistics:
        Column_A   Column_B   Column_C
count   5.000000   5.000000   5.000000
mean   57.000000  70.200000  37.600000
std    32.272279  29.693434  25.530374
min     3.000000  22.000000  15.000000
25%    52.000000  61.000000  21.000000
50%    72.000000  87.000000  24.000000
75%    75.000000  88.000000  53.000000
max    83.000000  93.000000  75.000000
```
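Newer NumPy code often uses the Generator API (`np.random.default_rng`) instead of the legacy global seed. The sketch below swaps that in; it is not what the lesson above uses, and with the same seed it produces different numbers than `np.random.seed(42)` plus `np.random.randint`. It also shows the default integer labels pandas assigns when `columns` and `index` are omitted:

```python
import numpy as np
import pandas as pd

# Seeded Generator: repeatable, but a different stream from np.random.seed(42)
rng = np.random.default_rng(42)
values = rng.integers(1, 100, size=(5, 3))  # high bound is exclusive: 1..99

df_rng = pd.DataFrame(
    values,
    columns=["Column_A", "Column_B", "Column_C"],
    index=[f"Row{i}" for i in range(1, 6)],
)
print(df_rng)

# Without labels, both axes fall back to 0-based integer labels
df_unlabeled = pd.DataFrame(values)
print(df_unlabeled.columns.tolist())  # [0, 1, 2]
```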


## Method 4: Creating DataFrame with Custom Index

You can specify custom row labels (index) when creating DataFrames.

```python
# Creating DataFrame with custom index
product_data = {
    'Product': ['Laptop', 'Phone', 'Tablet', 'Monitor'],
    'Price': [1200, 800, 600, 300],
    'Stock': [15, 50, 30, 20]
}

# Custom index using product codes
custom_index = ['PROD001', 'PROD002', 'PROD003', 'PROD004']
df_products = pd.DataFrame(product_data, index=custom_index)

print("Products DataFrame with Custom Index:")
print(df_products)
print("\nAccessing by index label 'PROD002':")
print(df_products.loc['PROD002'])
```

```text
Products DataFrame with Custom Index:
         Product  Price  Stock
PROD001   Laptop   1200     15
PROD002    Phone    800     50
PROD003   Tablet    600     30
PROD004  Monitor    300     20

Accessing by index label 'PROD002':
Product    Phone
Price        800
Stock         50
Name: PROD002, dtype: object
```
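When the row labels already live in one of the columns, `set_index` can promote that column to the index instead of passing a separate list. A minimal sketch (the `Code` column is an illustrative assumption) that also contrasts label-based `.loc` with position-based `.iloc`:

```python
import pandas as pd

product_data = {
    "Code": ["PROD001", "PROD002", "PROD003", "PROD004"],
    "Product": ["Laptop", "Phone", "Tablet", "Monitor"],
    "Price": [1200, 800, 600, 300],
}

# Promote the existing "Code" column to the index; it is removed from the columns
df = pd.DataFrame(product_data).set_index("Code")

print(df.loc["PROD002", "Price"])  # label-based lookup
print(df.iloc[1]["Price"])         # position-based lookup (second row)
```

Both lookups reach the same cell here; they diverge as soon as the rows are reordered, because `.loc` follows the label and `.iloc` follows the position.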


## Method 5: Creating Empty DataFrame and Adding Data

Sometimes you need to start with an empty DataFrame and add data incrementally.

```python
# Creating empty DataFrame with specified columns
columns = ['Date', 'Temperature', 'Humidity', 'Pressure']
df_weather = pd.DataFrame(columns=columns)

print("Empty DataFrame:")
print(df_weather)
print(f"Shape: {df_weather.shape}")

# Adding data row by row with .loc (fine for a few rows, but slow for large
# datasets because every new row enlarges the DataFrame)
weather_data = [
    ['2024-01-01', 22.5, 65, 1013.2],
    ['2024-01-02', 24.1, 68, 1015.1],
    ['2024-01-03', 21.8, 72, 1012.8]
]

# len(df_weather) is the next unused integer label, so each row goes at the end
for row in weather_data:
    df_weather.loc[len(df_weather)] = row

print("\nDataFrame after adding data:")
print(df_weather)
```

```text
Empty DataFrame:
Empty DataFrame
Columns: [Date, Temperature, Humidity, Pressure]
Index: []
Shape: (0, 4)

DataFrame after adding data:
         Date  Temperature  Humidity  Pressure
0  2024-01-01         22.5        65    1013.2
1  2024-01-02         24.1        68    1015.1
2  2024-01-03         21.8        72    1012.8
```
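A common alternative to the row-by-row `.loc` loop is to collect the rows in a plain Python list, where appending is cheap, and call `pd.DataFrame` once at the end. The sketch below reuses the three sample readings and, as an assumption beyond the original example, stores real dates built with `datetime` and `timedelta` (the imports from the first cell) instead of date strings, so the `Date` column gets a datetime dtype:

```python
from datetime import datetime, timedelta

import pandas as pd

start = datetime(2024, 1, 1)
readings = [(22.5, 65, 1013.2), (24.1, 68, 1015.1), (21.8, 72, 1012.8)]

# Collect one dict per row in an ordinary list
rows = []
for day, (temp, humidity, pressure) in enumerate(readings):
    rows.append({
        "Date": start + timedelta(days=day),
        "Temperature": temp,
        "Humidity": humidity,
        "Pressure": pressure,
    })

# Build the DataFrame once, after all rows are collected
df_weather = pd.DataFrame(rows)
print(df_weather)
print(df_weather.dtypes)
```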


## Practice Exercises

Try these exercises to reinforce your learning:

```python
# Exercise 1: Create a DataFrame from dictionary with employee information
# Include: Employee ID, Name, Department, Salary, Years of Experience

# Your code here:
employee_data = {
    # Add your data here
}

# df_employees = pd.DataFrame(employee_data)
# print(df_employees)
```

```python
# Exercise 2: Create a DataFrame using NumPy with 6 rows and 4 columns
# Use column names: 'A', 'B', 'C', 'D'
# Use row indices: 'R1', 'R2', 'R3', 'R4', 'R5', 'R6'

# Your code here:

```

```python
# Exercise 3: Create a DataFrame with mixed data types
# Include at least one string, integer, float, and boolean column

# Your code here:

```

## Key Takeaways

1. The **dictionary method** is the most intuitive way to create DataFrames; all column lists must have the same length
2. **NumPy arrays** are useful for numerical data and testing
3. **Custom indices** provide meaningful row labels
4. **Empty DataFrames** can be useful, but avoid adding rows one at a time for large datasets; collect the rows in a list and build the DataFrame once instead
5. Always check the **shape** and **data types** of your DataFrame after creation
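Besides printing `shape` and `dtypes` separately, `DataFrame.info()` reports the row count, each column's dtype and non-null count, and approximate memory usage in one call. A small sketch with made-up data that includes one missing value:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", None], "Score": [95, 87, 92]})

print(df.shape)  # (3, 2)
print(df.dtypes)

# One-call summary: index range, dtype and non-null count per column, memory use
df.info()
```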
