# Session 1 - DataFrames - Lesson 10: Time Series Analysis

## Learning Objectives
- Master datetime indexing and time-based operations
- Learn resampling and frequency conversion techniques
- Understand rolling calculations and window functions
- Practice with seasonal analysis and trend decomposition
- Apply time series techniques to business forecasting scenarios

## Prerequisites
- Completed Lessons 1-9
- Understanding of datetime concepts
- Basic knowledge of statistics (helpful for trend analysis)

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 20)
plt.style.use('seaborn-v0_8')
%matplotlib inline

print("Libraries loaded successfully!")

## Creating Time Series Data

Let's create realistic time series datasets for analysis.

In [None]:
# Create comprehensive time series dataset
np.random.seed(42)

# Generate 2 years of daily data
start_date = '2022-01-01'
end_date = '2023-12-31'
date_range = pd.date_range(start=start_date, end=end_date, freq='D')

# Create realistic sales data with trends and seasonality
n_days = len(date_range)
base_sales = 1000

# Add trend (gradual increase over time)
trend = np.linspace(0, 300, n_days)

# Add seasonality (weekly and monthly patterns)
daily_pattern = np.sin(2 * np.pi * np.arange(n_days) / 7) * 100 # Weekly pattern
monthly_pattern = np.sin(2 * np.pi * np.arange(n_days) / 30.44) * 150 # Monthly pattern
yearly_pattern = np.sin(2 * np.pi * np.arange(n_days) / 365.25) * 200 # Yearly pattern

# Add random noise
noise = np.random.normal(0, 80, n_days)

# Combine all components
sales = base_sales + trend + daily_pattern + monthly_pattern + yearly_pattern + noise
sales = np.maximum(sales, 0) # Ensure non-negative sales

# Create DataFrame
ts_data = pd.DataFrame({
 'date': date_range,
 'sales': sales,
 'customers': np.random.poisson(50, n_days) + (sales / 50).astype(int),
 'marketing_spend': np.random.gamma(2, 20, n_days),
 'temperature': 20 + 10 * np.sin(2 * np.pi * np.arange(n_days) / 365.25) + np.random.normal(0, 5, n_days),
 'is_weekend': pd.Series(date_range).dt.dayofweek >= 5,
 'is_holiday': np.random.choice([True, False], n_days, p=[0.05, 0.95])
})

# Set date as index
ts_data.set_index('date', inplace=True)

print("Time series dataset created:")
print(f"Shape: {ts_data.shape}")
print(f"Date range: {ts_data.index.min()} to {ts_data.index.max()}")
print("\nFirst few rows:")
print(ts_data.head())
print("\nData types:")
print(ts_data.dtypes)

## 1. DateTime Indexing and Basic Operations

Working with datetime indices and time-based selection.
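
As a quick warm-up, the sketch below shows the three selection styles used in this section on a small hypothetical series (the names `s`, `jan`, `first_week`, and `feb` are illustrative and not part of the lesson dataset). It assumes a daily `DatetimeIndex`; partial date strings passed to `.loc` select every row that falls inside that period.

```python
import pandas as pd

# Hypothetical toy series: one value per day for 60 days starting 2023-01-01
idx = pd.date_range("2023-01-01", periods=60, freq="D")
s = pd.Series(range(60), index=idx)

jan = s.loc["2023-01"]                          # every row in January 2023
first_week = s.loc["2023-01-01":"2023-01-07"]   # label slices include both endpoints
feb = s[s.index.month == 2]                     # boolean mask on an index attribute

print(len(jan), len(first_week), len(feb))
```

Unlike positional slicing, the label slice keeps the end date, which is usually what you want for reporting periods.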

In [None]:
# Basic datetime index operations
print("=== DATETIME INDEX OPERATIONS ===")

# Index information
print(f"Index type: {type(ts_data.index)}")
print(f"Index frequency: {ts_data.index.freq}")
print(f"Index is monotonic increasing: {ts_data.index.is_monotonic_increasing}")
print(f"Index is monotonic decreasing: {ts_data.index.is_monotonic_decreasing}")
print(f"Index has duplicates: {ts_data.index.has_duplicates}")

# Time-based selection
print("\n--- Time-based Selection ---")

# Select specific year
sales_2022 = ts_data.loc['2022']
print(f"2022 data shape: {sales_2022.shape}")
print(f"2022 average daily sales: {sales_2022['sales'].mean():.2f}")

# Select specific month
jan_2023 = ts_data.loc['2023-01']
print(f"\nJanuary 2023 data shape: {jan_2023.shape}")
print(f"January 2023 total sales: {jan_2023['sales'].sum():.2f}")

# Select date range
q1_2023 = ts_data.loc['2023-01-01':'2023-03-31']
print(f"\nQ1 2023 data shape: {q1_2023.shape}")
print(f"Q1 2023 average sales: {q1_2023['sales'].mean():.2f}")

# Recent data (last 30 days)
recent_data = ts_data.tail(30)
print(f"\nLast 30 days average sales: {recent_data['sales'].mean():.2f}")

In [None]:
# DateTime component extraction
print("=== DATETIME COMPONENT EXTRACTION ===")

# Extract various date components
ts_enhanced = ts_data.copy()
ts_enhanced['year'] = ts_enhanced.index.year
ts_enhanced['month'] = ts_enhanced.index.month
ts_enhanced['quarter'] = ts_enhanced.index.quarter
ts_enhanced['day_of_week'] = ts_enhanced.index.dayofweek # 0=Monday, 6=Sunday
ts_enhanced['day_name'] = ts_enhanced.index.day_name()
ts_enhanced['month_name'] = ts_enhanced.index.month_name()
ts_enhanced['week_of_year'] = ts_enhanced.index.isocalendar().week
ts_enhanced['day_of_year'] = ts_enhanced.index.dayofyear
ts_enhanced['is_month_start'] = ts_enhanced.index.is_month_start
ts_enhanced['is_month_end'] = ts_enhanced.index.is_month_end
ts_enhanced['is_quarter_start'] = ts_enhanced.index.is_quarter_start
ts_enhanced['is_quarter_end'] = ts_enhanced.index.is_quarter_end

print("Enhanced dataset with datetime components:")
print(ts_enhanced[['sales', 'year', 'month', 'quarter', 'day_name', 'week_of_year']].head())

# Analyze patterns by day of week
print("\nSales patterns by day of week:")
dow_analysis = ts_enhanced.groupby('day_name')['sales'].agg(['mean', 'std', 'count'])
# Reorder by weekday
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
dow_analysis = dow_analysis.reindex(day_order)
print(dow_analysis.round(2))

# Monthly patterns
print("\nSales patterns by month:")
monthly_analysis = ts_enhanced.groupby('month_name')['sales'].agg(['mean', 'std'])
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
 'July', 'August', 'September', 'October', 'November', 'December']
monthly_analysis = monthly_analysis.reindex([m for m in month_order if m in monthly_analysis.index])
print(monthly_analysis.round(2))

In [None]:
# Time series visualization
print("=== TIME SERIES VISUALIZATION ===")

# Create comprehensive time series plots
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Plot 1: Daily sales over time
ts_data['sales'].plot(ax=axes[0, 0], title='Daily Sales Over Time', alpha=0.7)
axes[0, 0].set_ylabel('Sales ($)')
axes[0, 0].grid(True, alpha=0.3)

# Plot 2: Monthly aggregated sales
monthly_sales = ts_data['sales'].resample('M').sum()
monthly_sales.plot(ax=axes[0, 1], title='Monthly Sales', marker='o')
axes[0, 1].set_ylabel('Monthly Sales ($)')
axes[0, 1].grid(True, alpha=0.3)

# Plot 3: Sales vs customers correlation
axes[1, 0].scatter(ts_data['customers'], ts_data['sales'], alpha=0.5)
axes[1, 0].set_title('Sales vs Customers')
axes[1, 0].set_xlabel('Number of Customers')
axes[1, 0].set_ylabel('Sales ($)')
axes[1, 0].grid(True, alpha=0.3)

# Plot 4: Seasonal pattern (by month)
ts_enhanced.groupby('month')['sales'].mean().plot(ax=axes[1, 1], kind='bar', 
 title='Average Sales by Month')
axes[1, 1].set_ylabel('Average Sales ($)')
axes[1, 1].set_xlabel('Month')
axes[1, 1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

# Summary statistics
print("\nTime series summary statistics:")
print(ts_data['sales'].describe())

## 2. Resampling and Frequency Conversion

Converting between different time frequencies and aggregating data.
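
Before working with the full dataset, a minimal sketch on hypothetical data may help show how weekly bins are labeled and what upsampling leaves behind. It assumes the default `'W'` alias, which closes each bin on Sunday and labels it with that Sunday; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical toy data: 14 days of ones, starting on Monday 2023-01-02
idx = pd.date_range("2023-01-02", periods=14, freq="D")
daily = pd.Series(1.0, index=idx)

# Downsample: each weekly bin is labeled with the Sunday that ends it
weekly = daily.resample("W").sum()

# Upsample back to daily: asfreq() leaves gaps, ffill() carries values forward
gaps = weekly.resample("D").asfreq()
filled = weekly.resample("D").ffill()

print(weekly)
print(gaps.isna().sum(), "missing days before filling")
```

Upsampling cannot recover the original daily detail; it only spreads the weekly values across the new rows according to the fill rule you choose.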

In [None]:
# Basic resampling operations
print("=== BASIC RESAMPLING ===")

# Resample to different frequencies
weekly_data = ts_data.resample('W').agg({
 'sales': 'sum',
 'customers': 'sum',
 'marketing_spend': 'sum',
 'temperature': 'mean'
})

print("Weekly resampled data:")
print(weekly_data.head(10))

# Monthly resampling with multiple aggregations
monthly_data = ts_data.resample('M').agg({
 'sales': ['sum', 'mean', 'std', 'min', 'max'],
 'customers': ['sum', 'mean'],
 'marketing_spend': 'sum',
 'temperature': 'mean'
})

print("\nMonthly resampled data (first 6 months):")
print(monthly_data.head(6))

# Quarterly resampling
quarterly_data = ts_data.resample('Q').agg({
 'sales': 'sum',
 'customers': 'sum',
 'marketing_spend': 'sum'
})

print("\nQuarterly resampled data:")
print(quarterly_data)

# Year-over-year comparison
yearly_data = ts_data.resample('Y').agg({
 'sales': 'sum',
 'customers': 'sum',
 'marketing_spend': 'sum'
})

print("\nYearly resampled data:")
print(yearly_data)

# Calculate year-over-year growth
if len(yearly_data) > 1:
 yoy_growth = yearly_data.pct_change() * 100
 print("\nYear-over-year growth (%):")
 print(yoy_growth.round(2))

In [None]:
# Advanced resampling techniques
print("=== ADVANCED RESAMPLING ===")

# Custom aggregation functions
def coefficient_of_variation(series):
 """Calculate coefficient of variation"""
 return series.std() / series.mean() if series.mean() != 0 else 0

def sales_volatility(series):
    """Calculate sales volatility (standard deviation)"""
    return series.std()

# Custom resampling with multiple functions
custom_monthly = ts_data.resample('M').agg({
 'sales': ['sum', 'mean', coefficient_of_variation, sales_volatility],
 'customers': ['sum', 'mean'],
 'marketing_spend': 'sum'
})

print("Custom monthly aggregations:")
print(custom_monthly.round(3))

# Resampling with different anchor points
# Weekly bins ending on different days
weekly_sunday = ts_data.resample('W-SUN')['sales'].sum() # Week ending Sunday
weekly_monday = ts_data.resample('W-MON')['sales'].sum() # Week ending Monday

print("\nWeekly totals comparison (first 10 weeks):")
weekly_comparison = pd.DataFrame({
 'Week_End_Sunday': weekly_sunday,
 'Week_End_Monday': weekly_monday
})
print(weekly_comparison.head(10))

# Business-day resampling: 'B' bins follow the business-day calendar, so
# weekend rows are folded into a neighboring business-day bin, not dropped
business_daily = ts_data.resample('B').mean()
print(f"\nBusiness days data shape: {business_daily.shape}")
print("Business days average (first 10):")
print(business_daily[['sales', 'customers']].head(10))

In [None]:
# Upsampling and downsampling
print("=== UPSAMPLING AND DOWNSAMPLING ===")

# Downsample to weekly (mean, so upsampled values stay on the daily scale)
weekly_sales = ts_data['sales'].resample('W').mean()

# Upsample weekly back to daily (forward fill)
upsampled_ffill = weekly_sales.resample('D').ffill()

# Upsample with interpolation
upsampled_interp = weekly_sales.resample('D').interpolate()

print("Upsampling comparison (sample period):")

# Compare the three series over a sample month
start_date = '2023-01-01'
end_date = '2023-01-31'

upsample_comparison = pd.DataFrame({
 'Original_Daily': ts_data.loc[start_date:end_date, 'sales'],
 'Weekly_Upsampled_FFill': upsampled_ffill.loc[start_date:end_date],
 'Weekly_Upsampled_Interp': upsampled_interp.loc[start_date:end_date]
})

print(upsample_comparison.head(15))

# Visualize upsampling methods
plt.figure(figsize=(12, 6))
plt.plot(upsample_comparison.index, upsample_comparison['Original_Daily'], 
 label='Original Daily', alpha=0.7)
plt.plot(upsample_comparison.index, upsample_comparison['Weekly_Upsampled_FFill'], 
 label='Forward Fill', linestyle='--')
plt.plot(upsample_comparison.index, upsample_comparison['Weekly_Upsampled_Interp'], 
 label='Interpolated', linestyle='-.')
plt.title('Upsampling Methods Comparison')
plt.xlabel('Date')
plt.ylabel('Sales ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## 3. Rolling Calculations and Window Functions

Moving averages, rolling statistics, and window-based analysis.
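
One detail worth seeing in isolation is how a rolling window behaves at the start of a series. The sketch below uses a hypothetical 10-day series (names are illustrative) to contrast a fixed-size window, the same window with `min_periods=1`, and an offset-based window such as `'3D'`, which requires a datetime index.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series: the values 1..10 on consecutive days
idx = pd.date_range("2023-01-01", periods=10, freq="D")
s = pd.Series(np.arange(1.0, 11.0), index=idx)

fixed = s.rolling(window=3).mean()                    # NaN until 3 observations exist
partial = s.rolling(window=3, min_periods=1).mean()   # averages whatever is available
by_time = s.rolling("3D").mean()                      # window covers the last 3 calendar days

print(pd.DataFrame({"fixed": fixed, "partial": partial, "by_time": by_time}).head())
```

On gap-free daily data the offset window and the `min_periods=1` window agree; they differ once dates are missing, because the offset window looks at elapsed time rather than row count.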

In [None]:
# Basic rolling calculations
print("=== BASIC ROLLING CALCULATIONS ===")

# Calculate various rolling statistics
rolling_data = ts_data.copy()

# Rolling means (moving averages)
rolling_data['sales_7d_avg'] = rolling_data['sales'].rolling(window=7).mean()
rolling_data['sales_30d_avg'] = rolling_data['sales'].rolling(window=30).mean()
rolling_data['sales_90d_avg'] = rolling_data['sales'].rolling(window=90).mean()

# Rolling standard deviation (volatility)
rolling_data['sales_7d_std'] = rolling_data['sales'].rolling(window=7).std()
rolling_data['sales_30d_std'] = rolling_data['sales'].rolling(window=30).std()

# Rolling min/max
rolling_data['sales_30d_min'] = rolling_data['sales'].rolling(window=30).min()
rolling_data['sales_30d_max'] = rolling_data['sales'].rolling(window=30).max()

print("Rolling statistics (last 10 days):")
rolling_cols = ['sales', 'sales_7d_avg', 'sales_30d_avg', 'sales_7d_std', 'sales_30d_std']
print(rolling_data[rolling_cols].tail(10).round(2))

# Rolling sums (trailing 7-day and 30-day totals)
rolling_data['sales_7d_sum'] = rolling_data['sales'].rolling(window=7).sum()
rolling_data['sales_30d_sum'] = rolling_data['sales'].rolling(window=30).sum()

print("\nRolling sums (last 5 days):")
print(rolling_data[['sales', 'sales_7d_sum', 'sales_30d_sum']].tail(5).round(0))

In [None]:
# Advanced rolling calculations
print("=== ADVANCED ROLLING CALCULATIONS ===")

# Rolling correlation between variables
rolling_data['sales_customers_corr_30d'] = rolling_data['sales'].rolling(window=30).corr(rolling_data['customers'])
rolling_data['sales_marketing_corr_30d'] = rolling_data['sales'].rolling(window=30).corr(rolling_data['marketing_spend'])

print("Rolling correlations (last 10 days):")
corr_cols = ['sales_customers_corr_30d', 'sales_marketing_corr_30d']
print(rolling_data[corr_cols].tail(10).round(3))

# Rolling quantiles
rolling_data['sales_30d_q25'] = rolling_data['sales'].rolling(window=30).quantile(0.25)
rolling_data['sales_30d_q75'] = rolling_data['sales'].rolling(window=30).quantile(0.75)
rolling_data['sales_30d_median'] = rolling_data['sales'].rolling(window=30).median()

print("\nRolling quantiles (last 5 days):")
quantile_cols = ['sales', 'sales_30d_q25', 'sales_30d_median', 'sales_30d_q75']
print(rolling_data[quantile_cols].tail(5).round(2))

# Custom rolling functions
def rolling_cv(series):
 """Rolling coefficient of variation"""
 return series.std() / series.mean() if series.mean() != 0 else 0

def rolling_skewness(series):
 """Rolling skewness"""
 return series.skew()

rolling_data['sales_30d_cv'] = rolling_data['sales'].rolling(window=30).apply(rolling_cv)
rolling_data['sales_30d_skew'] = rolling_data['sales'].rolling(window=30).apply(rolling_skewness)

print("\nCustom rolling statistics (last 5 days):")
custom_cols = ['sales_30d_cv', 'sales_30d_skew']
print(rolling_data[custom_cols].tail(5).round(3))

In [None]:
# Exponentially weighted functions
print("=== EXPONENTIALLY WEIGHTED FUNCTIONS ===")

# Exponentially weighted moving average (EWMA)
rolling_data['sales_ewm_10'] = rolling_data['sales'].ewm(span=10).mean()
rolling_data['sales_ewm_30'] = rolling_data['sales'].ewm(span=30).mean()

# Exponentially weighted standard deviation
rolling_data['sales_ewm_std_10'] = rolling_data['sales'].ewm(span=10).std()

print("Exponentially weighted statistics (last 10 days):")
ewm_cols = ['sales', 'sales_7d_avg', 'sales_ewm_10', 'sales_ewm_30']
print(rolling_data[ewm_cols].tail(10).round(2))

# Visualize different smoothing methods
plt.figure(figsize=(15, 8))

# Plot last 90 days for clarity
recent_period = rolling_data.tail(90)

plt.plot(recent_period.index, recent_period['sales'], label='Original Sales', alpha=0.7)
plt.plot(recent_period.index, recent_period['sales_7d_avg'], label='7-day MA', linewidth=2)
plt.plot(recent_period.index, recent_period['sales_30d_avg'], label='30-day MA', linewidth=2)
plt.plot(recent_period.index, recent_period['sales_ewm_10'], label='EWM (span=10)', linewidth=2)

plt.title('Sales Smoothing Methods Comparison (Last 90 Days)')
plt.xlabel('Date')
plt.ylabel('Sales ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Correlation of each smoothed series with the original (a rough proxy for responsiveness)
print("\nSmoothing method responsiveness (correlation with original):")
responsiveness = {
 '7-day MA': rolling_data['sales'].corr(rolling_data['sales_7d_avg']),
 '30-day MA': rolling_data['sales'].corr(rolling_data['sales_30d_avg']),
 'EWM (span=10)': rolling_data['sales'].corr(rolling_data['sales_ewm_10']),
 'EWM (span=30)': rolling_data['sales'].corr(rolling_data['sales_ewm_30'])
}

for method, corr in responsiveness.items():
 print(f"{method}: {corr:.4f}")

## 4. Seasonal Analysis and Decomposition

Analyzing seasonal patterns and decomposing time series.
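
The additive idea behind decomposition (trend from a centered moving average, a seasonal profile from per-period averages, and a residual as the remainder) can be sketched on a small series where the answer is known. Everything below is a hypothetical illustration: the series is built from a linear trend plus a 7-day pattern that sums to zero, and the variable names are not part of the lesson dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series: 8 weeks of daily data starting on a Monday
idx = pd.date_range("2023-01-02", periods=8 * 7, freq="D")
weekly_shape = np.array([0, 10, 20, 10, 0, -20, -20], dtype=float)  # sums to zero
series = pd.Series(100 + 0.5 * np.arange(len(idx)) + np.tile(weekly_shape, 8), index=idx)

period = 7
# Trend: centered moving average over one full seasonal cycle
trend = series.rolling(window=period, center=True).mean()
detrended = series - trend

# Seasonal profile: average detrended value for each day of the week
seasonal_avg = detrended.groupby(detrended.index.dayofweek).mean()
seasonal = pd.Series(
    seasonal_avg.reindex(series.index.dayofweek).to_numpy(), index=series.index
)

# Residual: whatever trend and seasonality do not explain
residual = series - trend - seasonal

print(seasonal_avg.round(2))
```

Because the toy series contains no noise, the recovered weekly profile matches the one used to build it and the residual is essentially zero wherever the trend is defined. Real data needs several full cycles per seasonal slot before the averages mean much.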

In [None]:
# Seasonal pattern analysis
print("=== SEASONAL PATTERN ANALYSIS ===")

# Add more detailed time components
seasonal_data = ts_data.copy()
seasonal_data['month'] = seasonal_data.index.month
seasonal_data['quarter'] = seasonal_data.index.quarter
seasonal_data['day_of_week'] = seasonal_data.index.dayofweek
seasonal_data['week_of_year'] = seasonal_data.index.isocalendar().week
seasonal_data['day_of_year'] = seasonal_data.index.dayofyear

# Monthly seasonality
monthly_pattern = seasonal_data.groupby('month')['sales'].agg(['mean', 'std', 'count'])
print("Monthly sales patterns:")
print(monthly_pattern.round(2))

# Day of week patterns
dow_pattern = seasonal_data.groupby('day_of_week')['sales'].agg(['mean', 'std'])
dow_pattern.index = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
print("\nDay of week patterns:")
print(dow_pattern.round(2))

# Weekly patterns throughout the year
weekly_pattern = seasonal_data.groupby('week_of_year')['sales'].mean()
print("\nWeekly pattern statistics:")
print(f"Highest week: Week {weekly_pattern.idxmax()} (${weekly_pattern.max():.0f})")
print(f"Lowest week: Week {weekly_pattern.idxmin()} (${weekly_pattern.min():.0f})")
print(f"Weekly variation: {weekly_pattern.std():.2f}")

# Quarterly analysis
quarterly_pattern = seasonal_data.groupby('quarter')['sales'].agg(['mean', 'sum', 'std'])
print("\nQuarterly patterns:")
print(quarterly_pattern.round(2))

In [None]:
# Simple seasonal decomposition
print("=== SEASONAL DECOMPOSITION ===")

# Manual decomposition approach
def simple_decompose(series, period=365):
    """Simple seasonal decomposition"""
    # Trend (using centered moving average)
    trend = series.rolling(window=period, center=True).mean()

    # Detrended series
    detrended = series - trend

    # Seasonal component (average for each period)
    seasonal_avg = detrended.groupby(detrended.index.dayofyear).mean()
    seasonal = pd.Series(index=series.index, dtype=float)
    for idx in series.index:
        day_of_year = idx.dayofyear
        if day_of_year in seasonal_avg.index:
            seasonal.loc[idx] = seasonal_avg.loc[day_of_year]
        else:  # Handle leap year day
            seasonal.loc[idx] = 0

    # Residual (what's left after removing trend and seasonality)
    residual = series - trend - seasonal

    return trend, seasonal, residual

# Decompose sales data
trend, seasonal, residual = simple_decompose(ts_data['sales'])

# Create decomposition DataFrame
decomposition = pd.DataFrame({
 'original': ts_data['sales'],
 'trend': trend,
 'seasonal': seasonal,
 'residual': residual
})

print("Decomposition summary:")
print(decomposition.describe().round(2))

# Visualize decomposition
fig, axes = plt.subplots(4, 1, figsize=(15, 12))

# Original series
decomposition['original'].plot(ax=axes[0], title='Original Sales Data')
axes[0].set_ylabel('Sales ($)')
axes[0].grid(True, alpha=0.3)

# Trend
decomposition['trend'].plot(ax=axes[1], title='Trend Component', color='red')
axes[1].set_ylabel('Trend ($)')
axes[1].grid(True, alpha=0.3)

# Seasonal
decomposition['seasonal'].plot(ax=axes[2], title='Seasonal Component', color='green')
axes[2].set_ylabel('Seasonal ($)')
axes[2].grid(True, alpha=0.3)

# Residual
decomposition['residual'].plot(ax=axes[3], title='Residual Component', color='purple')
axes[3].set_ylabel('Residual ($)')
axes[3].set_xlabel('Date')
axes[3].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nDecomposition insights:")
print(f"Trend contribution: {trend.std():.2f} (std dev)")
print(f"Seasonal contribution: {seasonal.std():.2f} (std dev)")
print(f"Residual contribution: {residual.std():.2f} (std dev)")
print(f"Total variation: {ts_data['sales'].std():.2f} (std dev)")

In [None]:
# Advanced seasonal analysis
print("=== ADVANCED SEASONAL ANALYSIS ===")

# Year-over-year comparison
yoy_comparison = pd.DataFrame()
for year in ts_data.index.year.unique():
 year_data = ts_data[ts_data.index.year == year]['sales']
 year_data.index = year_data.index.dayofyear
 yoy_comparison[f'Year_{year}'] = year_data

print("Year-over-year comparison (sample days):")
print(yoy_comparison.head(10).round(2))

# Calculate year-over-year changes
if len(yoy_comparison.columns) > 1:
    yoy_change = yoy_comparison.pct_change(axis=1) * 100
    print("\nYear-over-year change statistics:")
    for col in yoy_change.columns[1:]:
        print(f"{col}: mean={yoy_change[col].mean():.2f}%, std={yoy_change[col].std():.2f}%")

# Seasonal strength measurement
def seasonal_strength(series, period=365):
    """Calculate seasonal strength (0 = no seasonality, 1 = pure seasonality)"""
    # Detrend the series
    trend = series.rolling(window=period, center=True).mean()
    detrended = series - trend

    # Calculate seasonal component
    seasonal_avg = detrended.groupby(detrended.index.dayofyear).mean()
    seasonal_var = seasonal_avg.var()

    # Calculate residual variance
    seasonal_full = pd.Series(index=series.index, dtype=float)
    for idx in series.index:
        day_of_year = idx.dayofyear
        if day_of_year in seasonal_avg.index:
            seasonal_full.loc[idx] = seasonal_avg.loc[day_of_year]
        else:
            seasonal_full.loc[idx] = 0

    residual = detrended - seasonal_full
    residual_var = residual.var()

    # Seasonal strength
    return seasonal_var / (seasonal_var + residual_var)

sales_seasonal_strength = seasonal_strength(ts_data['sales'])
print(f"\nSales seasonal strength: {sales_seasonal_strength:.3f}")
print("(0 = no seasonality, 1 = pure seasonality)")

# Identify the most and least variable months (by standard deviation of daily sales)
monthly_seasonal = seasonal_data.groupby('month')['sales'].std()
print(f"\nMost variable month: {monthly_seasonal.idxmax()} (std: {monthly_seasonal.max():.2f})")
print(f"Least variable month: {monthly_seasonal.idxmin()} (std: {monthly_seasonal.min():.2f})")

## 5. Business Applications and Forecasting

Real-world time series analysis for business insights.

In [None]:
# Business performance metrics
print("=== BUSINESS PERFORMANCE METRICS ===")

def calculate_business_metrics(df):
 """Calculate key business time series metrics"""
 metrics = {}
 
 # Growth metrics
 daily_sales = df['sales']
 metrics['total_sales'] = daily_sales.sum()
 metrics['avg_daily_sales'] = daily_sales.mean()
 metrics['sales_growth_rate'] = (daily_sales.iloc[-30:].mean() / daily_sales.iloc[:30].mean() - 1) * 100
 
 # Volatility metrics
 metrics['sales_volatility'] = daily_sales.std()
 metrics['coefficient_of_variation'] = daily_sales.std() / daily_sales.mean()
 
 # Trend metrics
 trend = daily_sales.rolling(window=30).mean()
 metrics['trend_direction'] = 'Increasing' if trend.iloc[-1] > trend.iloc[-30] else 'Decreasing'
 metrics['trend_strength'] = abs(trend.iloc[-1] - trend.iloc[-30]) / trend.iloc[-30] * 100
 
 # Customer metrics
 metrics['avg_customers_per_day'] = df['customers'].mean()
 metrics['sales_per_customer'] = df['sales'].sum() / df['customers'].sum()
 
 # Marketing efficiency: sales per marketing dollar (a revenue ratio, not net ROI)
 metrics['marketing_roi'] = df['sales'].sum() / df['marketing_spend'].sum()
 
 return metrics

# Calculate metrics for different periods
overall_metrics = calculate_business_metrics(ts_data)
recent_metrics = calculate_business_metrics(ts_data.tail(90)) # Last 90 days

print("Overall Performance Metrics:")
for metric, value in overall_metrics.items():
    if isinstance(value, float):
        print(f"{metric}: {value:.2f}")
    else:
        print(f"{metric}: {value}")

print("\nRecent 90-day Performance Metrics:")
for metric, value in recent_metrics.items():
    if isinstance(value, float):
        print(f"{metric}: {value:.2f}")
    else:
        print(f"{metric}: {value}")

In [None]:
# Simple forecasting using historical patterns
print("=== SIMPLE FORECASTING ===")

def simple_forecast(series, periods=30, method='seasonal_naive'):
    """Simple forecasting methods"""
    # All methods forecast the `periods` days right after the last observation
    forecast_dates = pd.date_range(series.index[-1] + pd.Timedelta(days=1), periods=periods)

    if method == 'naive':
        # Naive: repeat last value
        return pd.Series([series.iloc[-1]] * periods, index=forecast_dates)

    elif method == 'seasonal_naive':
        # Seasonal naive: repeat same day from previous year
        forecast_values = []

        for date in forecast_dates:
            # Find the same calendar date in the previous year
            previous_year_date = date - pd.DateOffset(years=1)
            if previous_year_date in series.index:
                forecast_values.append(series.loc[previous_year_date])
            else:
                # Fallback to seasonal average
                day_of_year = date.dayofyear
                same_day_values = series[series.index.dayofyear == day_of_year]
                if len(same_day_values) > 0:
                    forecast_values.append(same_day_values.mean())
                else:
                    forecast_values.append(series.mean())

        return pd.Series(forecast_values, index=forecast_dates)

    elif method == 'moving_average':
        # Moving average forecast
        ma_value = series.tail(30).mean()
        return pd.Series([ma_value] * periods, index=forecast_dates)

    elif method == 'trend':
        # Linear trend forecast
        from scipy import stats
        x = np.arange(len(series))
        slope, intercept, _, _, _ = stats.linregress(x, series.values)

        # History uses x = 0..n-1, so the first forecast step is x = n
        forecast_values = [slope * (len(series) + i) + intercept for i in range(periods)]

        return pd.Series(forecast_values, index=forecast_dates)

    raise ValueError(f"Unknown forecasting method: {method!r}")

# Generate forecasts using different methods
forecast_periods = 30
sales_series = ts_data['sales']

forecasts = {
 'Naive': simple_forecast(sales_series, forecast_periods, 'naive'),
 'Seasonal_Naive': simple_forecast(sales_series, forecast_periods, 'seasonal_naive'),
 'Moving_Average': simple_forecast(sales_series, forecast_periods, 'moving_average'),
 'Trend': simple_forecast(sales_series, forecast_periods, 'trend')
}

print(f"Forecasts for next {forecast_periods} days:")
forecast_df = pd.DataFrame(forecasts)
print(forecast_df.head(10).round(2))

print("\nForecast summary statistics:")
print(forecast_df.describe().round(2))

# Visualize forecasts
plt.figure(figsize=(15, 8))

# Plot historical data (last 90 days)
historical_period = sales_series.tail(90)
plt.plot(historical_period.index, historical_period.values, label='Historical Data', color='black', linewidth=2)

# Plot forecasts
colors = ['red', 'blue', 'green', 'orange']
for i, (method, forecast) in enumerate(forecasts.items()):
 plt.plot(forecast.index, forecast.values, label=f'{method} Forecast', 
 color=colors[i], linestyle='--', linewidth=2)

plt.title('Sales Forecasting Comparison')
plt.xlabel('Date')
plt.ylabel('Sales ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Anomaly detection in time series
print("=== ANOMALY DETECTION ===")

def detect_anomalies(series, method='zscore', threshold=3):
    """Detect anomalies in time series"""
    anomalies = pd.Series(False, index=series.index)

    if method == 'zscore':
        # Z-score method
        z_scores = np.abs((series - series.mean()) / series.std())
        anomalies = z_scores > threshold

    elif method == 'iqr':
        # Interquartile range method
        Q1 = series.quantile(0.25)
        Q3 = series.quantile(0.75)
        IQR = Q3 - Q1
        lower_bound = Q1 - 1.5 * IQR
        upper_bound = Q3 + 1.5 * IQR
        anomalies = (series < lower_bound) | (series > upper_bound)

    elif method == 'rolling':
        # Rolling window method
        rolling_mean = series.rolling(window=30).mean()
        rolling_std = series.rolling(window=30).std()
        z_scores = np.abs((series - rolling_mean) / rolling_std)
        anomalies = z_scores > threshold

    return anomalies

# Detect anomalies using different methods
anomaly_methods = ['zscore', 'iqr', 'rolling']
anomaly_results = {}

for method in anomaly_methods:
 anomalies = detect_anomalies(ts_data['sales'], method=method)
 anomaly_results[method] = anomalies
 print(f"{method.upper()} method: {anomalies.sum()} anomalies detected")

# Combine anomaly detection results
anomaly_df = pd.DataFrame(anomaly_results)
anomaly_df['any_method'] = anomaly_df.any(axis=1)
anomaly_df['all_methods'] = anomaly_df[anomaly_methods].all(axis=1)

print(f"\nAnomalies detected by any method: {anomaly_df['any_method'].sum()}")
print(f"Anomalies detected by all methods: {anomaly_df['all_methods'].sum()}")

# Show anomalous dates
severe_anomalies = ts_data[anomaly_df['all_methods']]
if len(severe_anomalies) > 0:
 print("\nSevere anomalies (detected by all methods):")
 print(severe_anomalies[['sales', 'customers', 'marketing_spend']].round(2))

# Visualize anomalies
plt.figure(figsize=(15, 8))

# Plot sales data
plt.plot(ts_data.index, ts_data['sales'], label='Sales Data', alpha=0.7)

# Highlight anomalies
for method in anomaly_methods:
 anomaly_dates = ts_data.index[anomaly_results[method]]
 anomaly_values = ts_data.loc[anomaly_dates, 'sales']
 plt.scatter(anomaly_dates, anomaly_values, label=f'{method.upper()} Anomalies', alpha=0.7, s=30)

plt.title('Sales Data with Anomaly Detection')
plt.xlabel('Date')
plt.ylabel('Sales ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Anomaly statistics
print("\nAnomaly statistics:")
for method in anomaly_methods:
    anomaly_sales = ts_data.loc[anomaly_results[method], 'sales']
    if len(anomaly_sales) > 0:
        print(f"{method.upper()}: mean=${anomaly_sales.mean():.2f}, std=${anomaly_sales.std():.2f}")
    else:
        print(f"{method.upper()}: No anomalies detected")

## Practice Exercises

Apply time series analysis to complex business scenarios:

In [18]:
# Exercise 1: Comprehensive Time Series Dashboard
# Create a complete time series analysis dashboard that includes:
# - Multiple time series metrics and KPIs
# - Seasonal analysis and trend identification
# - Anomaly detection and alerting
# - Forecasting with confidence intervals
# - Business insights and recommendations

def create_time_series_dashboard(df):
 """Create comprehensive time series analysis dashboard"""
 # Your implementation here
 pass

# dashboard = create_time_series_dashboard(ts_data)
# print("Time Series Dashboard Created")

In [19]:
# Exercise 2: Multi-variate Time Series Analysis
# Analyze relationships between multiple time series:
# - Cross-correlation analysis
# - Lead-lag relationships
# - Causality testing
# - Multi-variate forecasting

# Your code here:


In [20]:
# Exercise 3: Advanced Forecasting Challenge
# Implement more sophisticated forecasting methods:
# - Exponential smoothing with trend and seasonality
# - ARIMA modeling
# - Model evaluation and selection
# - Forecast accuracy metrics

# Your code here:


## Key Takeaways

1. **DateTime Indexing**:
 - Use `pd.DatetimeIndex` for time-based operations
 - Enable powerful time-based selection and slicing
 - Extract components (year, month, day, etc.) for analysis

2. **Resampling**:
 - **`.resample()`**: Convert between different frequencies
 - **Downsampling**: Aggregate to lower frequency (daily → monthly)
 - **Upsampling**: Convert to higher frequency (monthly → daily)
 - Use appropriate aggregation functions for your data

3. **Rolling Calculations**:
 - **`.rolling()`**: Moving window calculations
 - **`.ewm()`**: Exponentially weighted functions
 - Useful for smoothing and trend analysis
 - Rolling windows return `NaN` until the window is full; set `min_periods` or drop the leading rows as appropriate

4. **Seasonal Analysis**:
 - Identify patterns by time components
 - Decompose into trend, seasonal, and residual
 - Measure seasonal strength and variability

## Time Series Quick Reference

```python
# Create datetime index
df.set_index(pd.to_datetime(df['date']), inplace=True)

# Time-based selection
df.loc['2023'] # Select year
df.loc['2023-01'] # Select month
df.loc['2023-01-01':'2023-01-31'] # Date range (both ends inclusive)

# Resampling
df.resample('M').sum() # Monthly sum
df.resample('W').mean() # Weekly average
df.resample('Q').agg({'col': ['sum', 'mean']}) # Quarterly multi-agg

# Rolling calculations
df['col'].rolling(7).mean() # 7-period moving average
df['col'].ewm(span=10).mean() # Exponential moving average
df['col'].rolling(30).std() # 30-period rolling standard deviation
```
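
One version note on the aliases above: pandas 2.2 renamed the period-end aliases (`'M'` to `'ME'`, `'Q'` to `'QE'`, `'Y'` to `'YE'`) and deprecated the old spellings. The sketch below is one possible way to pick the alias at runtime; the constant names `MONTH_END` and `QUARTER_END` and the toy series are illustrative assumptions, not part of the lesson code.

```python
import pandas as pd

# Pick the alias that matches the installed pandas version
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
use_new_aliases = (major, minor) >= (2, 2)
MONTH_END = "ME" if use_new_aliases else "M"
QUARTER_END = "QE" if use_new_aliases else "Q"

# Hypothetical toy data: one unit per day for Q1 2023
idx = pd.date_range("2023-01-01", "2023-03-31", freq="D")
daily = pd.Series(1, index=idx)

monthly = daily.resample(MONTH_END).sum()      # labeled with each month's last day
quarterly = daily.resample(QUARTER_END).sum()  # labeled with the quarter's last day

print(monthly)
print(quarterly)
```

If the environment is pinned to a single pandas version, hard-coding the matching alias is simpler than checking at runtime.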

## Business Applications

| Use Case | Technique | Key Insights |
|----------|-----------|-------------|
| Sales forecasting | Seasonal decomposition + trends | Predict future performance |
| Anomaly detection | Rolling statistics + thresholds | Identify unusual patterns |
| Performance monitoring | Moving averages + KPIs | Track business health |
| Seasonal planning | Seasonal analysis | Optimize inventory/staffing |
| Marketing ROI | Cross-correlation analysis | Measure campaign effectiveness |
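
As an illustration of the cross-correlation row, the sketch below scans a few lags to see when spend lines up best with sales. The data is synthetic and built on a stated assumption (sales respond to spend after exactly 3 days), so it only demonstrates the mechanics; `spend`, `sales`, and `lag_corr` are hypothetical names.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=200, freq="D")

# Synthetic assumption: sales respond to marketing spend with a 3-day delay
spend = pd.Series(rng.normal(100, 20, len(idx)), index=idx)
sales = 5 * spend.shift(3) + rng.normal(0, 10, len(idx))

# Correlate today's sales with spend from `lag` days earlier
lag_corr = pd.Series({lag: sales.corr(spend.shift(lag)) for lag in range(8)})
best_lag = lag_corr.idxmax()

print(lag_corr.round(3))
print(f"Strongest relationship at lag {best_lag} days")
```

A strong lagged correlation suggests a lead-lag relationship but does not establish causation; seasonality shared by both series can produce the same picture.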

## Best Practices

1. **Data Quality**: Ensure consistent time intervals and handle missing data
2. **Frequency Choice**: Choose appropriate resampling frequency for your analysis
3. **Window Size**: Balance responsiveness vs. smoothness in rolling calculations
4. **Seasonality**: Always check for and account for seasonal patterns
5. **Validation**: Use holdout periods to validate forecasting models
6. **Business Context**: Interpret results in context of business cycles and events
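
The validation practice can be sketched as follows: hold out the last 30 days, forecast them using only the earlier data, and compare errors. This is a minimal illustration on a hypothetical synthetic series, with mean absolute error as one reasonable metric choice; the helper `mae` and all variable names are assumptions made for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
idx = pd.date_range("2023-01-01", periods=120, freq="D")

# Hypothetical series: upward trend plus noise
y = pd.Series(1000 + 2.0 * np.arange(120) + rng.normal(0, 30, 120), index=idx)

# Time-ordered split: never shuffle a time series before splitting
train, test = y.iloc[:-30], y.iloc[-30:]

# Two baseline forecasts built from the training period only
naive = pd.Series(train.iloc[-1], index=test.index)
moving_avg = pd.Series(train.tail(30).mean(), index=test.index)

def mae(actual, predicted):
    """Mean absolute error between two aligned series."""
    return float((actual - predicted).abs().mean())

scores = {"naive": mae(test, naive), "moving_average": mae(test, moving_avg)}
for name, score in scores.items():
    print(f"{name}: MAE = {score:.2f}")
```

Any more sophisticated model should beat these baselines on the holdout period before it is trusted for planning.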
