{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Session 1 - DataFrames - Lesson 10: Time Series Analysis\n",
"\n",
"## Learning Objectives\n",
"- Master datetime indexing and time-based operations\n",
"- Learn resampling and frequency conversion techniques\n",
"- Understand rolling calculations and window functions\n",
"- Practice with seasonal analysis and trend decomposition\n",
"- Apply time series techniques to business forecasting scenarios\n",
"\n",
"## Prerequisites\n",
"- Completed Lessons 1-9\n",
"- Understanding of datetime concepts\n",
"- Basic knowledge of statistics (helpful for trend analysis)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Import required libraries\n",
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from datetime import datetime, timedelta\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"\n",
"# Set display options\n",
"pd.set_option('display.max_columns', None)\n",
"pd.set_option('display.max_rows', 20)\n",
"plt.style.use('seaborn-v0_8')\n",
"%matplotlib inline\n",
"\n",
"print(\"Libraries loaded successfully!\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating Time Series Data\n",
"\n",
"Let's create realistic time series datasets for analysis."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create comprehensive time series dataset\n",
"np.random.seed(42)\n",
"\n",
"# Generate 2 years of daily data\n",
"start_date = '2022-01-01'\n",
"end_date = '2023-12-31'\n",
"date_range = pd.date_range(start=start_date, end=end_date, freq='D')\n",
"\n",
"# Create realistic sales data with trends and seasonality\n",
"n_days = len(date_range)\n",
"base_sales = 1000\n",
"\n",
"# Add trend (gradual increase over time)\n",
"trend = np.linspace(0, 300, n_days)\n",
"\n",
"# Add seasonality (weekly, monthly, and yearly patterns)\n",
"weekly_pattern = np.sin(2 * np.pi * np.arange(n_days) / 7) * 100  # Weekly cycle\n",
"monthly_pattern = np.sin(2 * np.pi * np.arange(n_days) / 30.44) * 150  # Monthly cycle\n",
"yearly_pattern = np.sin(2 * np.pi * np.arange(n_days) / 365.25) * 200  # Yearly cycle\n",
"\n",
"# Add random noise\n",
"noise = np.random.normal(0, 80, n_days)\n",
"\n",
"# Combine all components\n",
"sales = base_sales + trend + weekly_pattern + monthly_pattern + yearly_pattern + noise\n",
"sales = np.maximum(sales, 0)  # Ensure non-negative sales\n",
"\n",
"# Create DataFrame\n",
"ts_data = pd.DataFrame({\n",
"    'date': date_range,\n",
"    'sales': sales,\n",
"    'customers': np.random.poisson(50, n_days) + (sales / 50).astype(int),\n",
"    'marketing_spend': np.random.gamma(2, 20, n_days),\n",
"    'temperature': 20 + 10 * np.sin(2 * np.pi * np.arange(n_days) / 365.25) + np.random.normal(0, 5, n_days),\n",
"    'is_weekend': pd.Series(date_range).dt.dayofweek >= 5,\n",
"    'is_holiday': np.random.choice([True, False], n_days, p=[0.05, 0.95])\n",
"})\n",
"\n",
"# Set date as index\n",
"ts_data.set_index('date', inplace=True)\n",
"\n",
"print(\"Time series dataset created:\")\n",
"print(f\"Shape: {ts_data.shape}\")\n",
"print(f\"Date range: {ts_data.index.min()} to {ts_data.index.max()}\")\n",
"print(\"\\nFirst few rows:\")\n",
"print(ts_data.head())\n",
"print(\"\\nData types:\")\n",
"print(ts_data.dtypes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. DateTime Indexing and Basic Operations\n",
"\n",
"Working with datetime indices and time-based selection."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Basic datetime index operations\n",
"print(\"=== DATETIME INDEX OPERATIONS ===\")\n",
"\n",
"# Index information\n",
"print(f\"Index type: {type(ts_data.index)}\")\n",
"print(f\"Index frequency: {ts_data.index.freq}\")\n",
"print(f\"Index is monotonic increasing: {ts_data.index.is_monotonic_increasing}\")\n",
"print(f\"Index is monotonic decreasing: {ts_data.index.is_monotonic_decreasing}\")\n",
"print(f\"Index has duplicates: {ts_data.index.has_duplicates}\")\n",
"\n",
"# Time-based selection\n",
"print(\"\\n--- Time-based Selection ---\")\n",
"\n",
"# Select specific year\n",
"sales_2022 = ts_data.loc['2022']\n",
"print(f\"2022 data shape: {sales_2022.shape}\")\n",
"print(f\"2022 average daily sales: {sales_2022['sales'].mean():.2f}\")\n",
"\n",
"# Select specific month\n",
"jan_2023 = ts_data.loc['2023-01']\n",
"print(f\"\\nJanuary 2023 data shape: {jan_2023.shape}\")\n",
"print(f\"January 2023 total sales: {jan_2023['sales'].sum():.2f}\")\n",
"\n",
"# Select date range\n",
"q1_2023 = ts_data.loc['2023-01-01':'2023-03-31']\n",
"print(f\"\\nQ1 2023 data shape: {q1_2023.shape}\")\n",
"print(f\"Q1 2023 average sales: {q1_2023['sales'].mean():.2f}\")\n",
"\n",
"# Recent data (last 30 days)\n",
"recent_data = ts_data.tail(30)\n",
"print(f\"\\nLast 30 days average sales: {recent_data['sales'].mean():.2f}\")"
]
},
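{
"cell_type": "markdown",
"metadata": {},
"source": [
"**A note on selecting recent data.** `tail(30)` picks the last 30 *rows*, which matches the last 30 calendar days only when the index has no gaps. A small sketch of a date-arithmetic alternative using `pd.Timedelta` (assumes the daily `ts_data` built above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Select the last 30 calendar days by date arithmetic rather than row count\n",
"# (robust even if some days are missing from the index)\n",
"cutoff = ts_data.index.max() - pd.Timedelta(days=29)\n",
"last_30_days = ts_data.loc[cutoff:]\n",
"print(f\"Window: {last_30_days.index.min().date()} to {last_30_days.index.max().date()}\")\n",
"print(f\"Rows selected: {len(last_30_days)}\")\n",
"print(f\"Average sales: {last_30_days['sales'].mean():.2f}\")"
]
},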
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# DateTime component extraction\n",
"print(\"=== DATETIME COMPONENT EXTRACTION ===\")\n",
"\n",
"# Extract various date components\n",
"ts_enhanced = ts_data.copy()\n",
"ts_enhanced['year'] = ts_enhanced.index.year\n",
"ts_enhanced['month'] = ts_enhanced.index.month\n",
"ts_enhanced['quarter'] = ts_enhanced.index.quarter\n",
"ts_enhanced['day_of_week'] = ts_enhanced.index.dayofweek  # 0=Monday, 6=Sunday\n",
"ts_enhanced['day_name'] = ts_enhanced.index.day_name()\n",
"ts_enhanced['month_name'] = ts_enhanced.index.month_name()\n",
"ts_enhanced['week_of_year'] = ts_enhanced.index.isocalendar().week\n",
"ts_enhanced['day_of_year'] = ts_enhanced.index.dayofyear\n",
"ts_enhanced['is_month_start'] = ts_enhanced.index.is_month_start\n",
"ts_enhanced['is_month_end'] = ts_enhanced.index.is_month_end\n",
"ts_enhanced['is_quarter_start'] = ts_enhanced.index.is_quarter_start\n",
"ts_enhanced['is_quarter_end'] = ts_enhanced.index.is_quarter_end\n",
"\n",
"print(\"Enhanced dataset with datetime components:\")\n",
"print(ts_enhanced[['sales', 'year', 'month', 'quarter', 'day_name', 'week_of_year']].head())\n",
"\n",
"# Analyze patterns by day of week\n",
"print(\"\\nSales patterns by day of week:\")\n",
"dow_analysis = ts_enhanced.groupby('day_name')['sales'].agg(['mean', 'std', 'count'])\n",
"# Reorder by weekday\n",
"day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']\n",
"dow_analysis = dow_analysis.reindex(day_order)\n",
"print(dow_analysis.round(2))\n",
"\n",
"# Monthly patterns\n",
"print(\"\\nSales patterns by month:\")\n",
"monthly_analysis = ts_enhanced.groupby('month_name')['sales'].agg(['mean', 'std'])\n",
"month_order = ['January', 'February', 'March', 'April', 'May', 'June',\n",
"               'July', 'August', 'September', 'October', 'November', 'December']\n",
"monthly_analysis = monthly_analysis.reindex([m for m in month_order if m in monthly_analysis.index])\n",
"print(monthly_analysis.round(2))"
]
},
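{
"cell_type": "markdown",
"metadata": {},
"source": [
"Grouping by an extracted `month` column pools January 2022 and January 2023 into one bucket. When each calendar month should stay separate, a common alternative is grouping by a `PeriodIndex` via `to_period()`; a minimal sketch:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Group by calendar month (year + month together) using a PeriodIndex\n",
"per_month = ts_enhanced.groupby(ts_enhanced.index.to_period('M'))['sales'].agg(['mean', 'sum'])\n",
"print(\"Per-calendar-month sales (first 6 months):\")\n",
"print(per_month.head(6).round(2))"
]
},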
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Time series visualization\n",
"print(\"=== TIME SERIES VISUALIZATION ===\")\n",
"\n",
"# Create comprehensive time series plots\n",
"fig, axes = plt.subplots(2, 2, figsize=(15, 10))\n",
"\n",
"# Plot 1: Daily sales over time\n",
"ts_data['sales'].plot(ax=axes[0, 0], title='Daily Sales Over Time', alpha=0.7)\n",
"axes[0, 0].set_ylabel('Sales ($)')\n",
"axes[0, 0].grid(True, alpha=0.3)\n",
"\n",
"# Plot 2: Monthly aggregated sales ('ME' = month-end; pandas 2.2 renamed the old 'M' alias)\n",
"monthly_sales = ts_data['sales'].resample('ME').sum()\n",
"monthly_sales.plot(ax=axes[0, 1], title='Monthly Sales', marker='o')\n",
"axes[0, 1].set_ylabel('Monthly Sales ($)')\n",
"axes[0, 1].grid(True, alpha=0.3)\n",
"\n",
"# Plot 3: Sales vs customers correlation\n",
"axes[1, 0].scatter(ts_data['customers'], ts_data['sales'], alpha=0.5)\n",
"axes[1, 0].set_title('Sales vs Customers')\n",
"axes[1, 0].set_xlabel('Number of Customers')\n",
"axes[1, 0].set_ylabel('Sales ($)')\n",
"axes[1, 0].grid(True, alpha=0.3)\n",
"\n",
"# Plot 4: Seasonal pattern (by month)\n",
"ts_enhanced.groupby('month')['sales'].mean().plot(ax=axes[1, 1], kind='bar',\n",
"                                                  title='Average Sales by Month')\n",
"axes[1, 1].set_ylabel('Average Sales ($)')\n",
"axes[1, 1].set_xlabel('Month')\n",
"axes[1, 1].tick_params(axis='x', rotation=45)\n",
"\n",
"plt.tight_layout()\n",
"plt.show()\n",
"\n",
"# Summary statistics\n",
"print(\"\\nTime series summary statistics:\")\n",
"print(ts_data['sales'].describe())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Resampling and Frequency Conversion\n",
"\n",
"Converting between different time frequencies and aggregating data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Basic resampling operations\n",
"print(\"=== BASIC RESAMPLING ===\")\n",
"\n",
"# Resample to different frequencies\n",
"weekly_data = ts_data.resample('W').agg({\n",
"    'sales': 'sum',\n",
"    'customers': 'sum',\n",
"    'marketing_spend': 'sum',\n",
"    'temperature': 'mean'\n",
"})\n",
"\n",
"print(\"Weekly resampled data:\")\n",
"print(weekly_data.head(10))\n",
"\n",
"# Monthly resampling with multiple aggregations\n",
"monthly_data = ts_data.resample('ME').agg({\n",
"    'sales': ['sum', 'mean', 'std', 'min', 'max'],\n",
"    'customers': ['sum', 'mean'],\n",
"    'marketing_spend': 'sum',\n",
"    'temperature': 'mean'\n",
"})\n",
"\n",
"print(\"\\nMonthly resampled data (first 6 months):\")\n",
"print(monthly_data.head(6))\n",
"\n",
"# Quarterly resampling ('QE' = quarter-end)\n",
"quarterly_data = ts_data.resample('QE').agg({\n",
"    'sales': 'sum',\n",
"    'customers': 'sum',\n",
"    'marketing_spend': 'sum'\n",
"})\n",
"\n",
"print(\"\\nQuarterly resampled data:\")\n",
"print(quarterly_data)\n",
"\n",
"# Yearly resampling ('YE' = year-end)\n",
"yearly_data = ts_data.resample('YE').agg({\n",
"    'sales': 'sum',\n",
"    'customers': 'sum',\n",
"    'marketing_spend': 'sum'\n",
"})\n",
"\n",
"print(\"\\nYearly resampled data:\")\n",
"print(yearly_data)\n",
"\n",
"# Calculate year-over-year growth\n",
"if len(yearly_data) > 1:\n",
"    yoy_growth = yearly_data.pct_change() * 100\n",
"    print(\"\\nYear-over-year growth (%):\")\n",
"    print(yoy_growth.round(2))"
]
},
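{
"cell_type": "markdown",
"metadata": {},
"source": [
"`resample()` also accepts `label` and `closed` arguments that control which bin edge names each bin and which edge is inclusive. A quick sketch for weekly bins (note that changing `closed` shifts bin membership, so totals can change too, not just the index labels):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 'label' picks which bin edge names the bin; 'closed' picks which edge is inclusive\n",
"w_right = ts_data['sales'].resample('W', label='right', closed='right').sum()  # the 'W' default\n",
"w_left = ts_data['sales'].resample('W', label='left', closed='left').sum()\n",
"print(\"First labels with label='right':\", w_right.index[:3].tolist())\n",
"print(\"First labels with label='left': \", w_left.index[:3].tolist())"
]
},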
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Advanced resampling techniques\n",
"print(\"=== ADVANCED RESAMPLING ===\")\n",
"\n",
"# Custom aggregation functions\n",
"def coefficient_of_variation(series):\n",
"    \"\"\"Calculate coefficient of variation\"\"\"\n",
"    return series.std() / series.mean() if series.mean() != 0 else 0\n",
"\n",
"def sales_volatility(series):\n",
"    \"\"\"Calculate sales volatility (standard deviation)\"\"\"\n",
"    return series.std()\n",
"\n",
"# Custom resampling with multiple functions\n",
"custom_monthly = ts_data.resample('ME').agg({\n",
"    'sales': ['sum', 'mean', coefficient_of_variation, sales_volatility],\n",
"    'customers': ['sum', 'mean'],\n",
"    'marketing_spend': 'sum'\n",
"})\n",
"\n",
"print(\"Custom monthly aggregations:\")\n",
"print(custom_monthly.round(3))\n",
"\n",
"# Resampling with different anchor points\n",
"# Weekly bins anchored to different end days\n",
"weekly_sunday = ts_data.resample('W-SUN')['sales'].sum()  # Weeks ending Sunday\n",
"weekly_monday = ts_data.resample('W-MON')['sales'].sum()  # Weeks ending Monday\n",
"\n",
"print(\"\\nWeekly totals comparison (first 10 weeks):\")\n",
"weekly_comparison = pd.DataFrame({\n",
"    'Week_End_Sunday': weekly_sunday,\n",
"    'Week_End_Monday': weekly_monday\n",
"})\n",
"print(weekly_comparison.head(10))\n",
"\n",
"# Business-day resampling ('B' bins; weekend values fold into adjacent business-day bins)\n",
"business_daily = ts_data.resample('B').mean()\n",
"print(f\"\\nBusiness-day data shape: {business_daily.shape}\")\n",
"print(\"Business-day averages (first 10):\")\n",
"print(business_daily[['sales', 'customers']].head(10))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Upsampling and downsampling\n",
"print(\"=== UPSAMPLING AND DOWNSAMPLING ===\")\n",
"\n",
"# Downsample to weekly and visualize\n",
"weekly_sales = ts_data['sales'].resample('W').sum()\n",
"\n",
"# Upsample weekly back to daily (forward fill)\n",
"upsampled_ffill = weekly_sales.resample('D').ffill()\n",
"\n",
"# Upsample with interpolation\n",
"upsampled_interp = weekly_sales.resample('D').interpolate()\n",
"\n",
"print(\"Upsampling comparison (sample period):\")\n",
"\n",
"# Compare one month of daily values against both upsampled versions\n",
"start_date = '2023-01-01'\n",
"end_date = '2023-01-31'\n",
"\n",
"upsample_comparison = pd.DataFrame({\n",
"    'Original_Daily': ts_data.loc[start_date:end_date, 'sales'],\n",
"    'Weekly_Upsampled_FFill': upsampled_ffill.loc[start_date:end_date],\n",
"    'Weekly_Upsampled_Interp': upsampled_interp.loc[start_date:end_date]\n",
"})\n",
"\n",
"print(upsample_comparison.head(15))\n",
"\n",
"# Visualize upsampling methods\n",
"plt.figure(figsize=(12, 6))\n",
"plt.plot(upsample_comparison.index, upsample_comparison['Original_Daily'],\n",
"         label='Original Daily', alpha=0.7)\n",
"plt.plot(upsample_comparison.index, upsample_comparison['Weekly_Upsampled_FFill'],\n",
"         label='Forward Fill', linestyle='--')\n",
"plt.plot(upsample_comparison.index, upsample_comparison['Weekly_Upsampled_Interp'],\n",
"         label='Interpolated', linestyle='-.')\n",
"plt.title('Upsampling Methods Comparison')\n",
"plt.xlabel('Date')\n",
"plt.ylabel('Sales ($)')\n",
"plt.legend()\n",
"plt.grid(True, alpha=0.3)\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
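{
"cell_type": "markdown",
"metadata": {},
"source": [
"A third upsampling option is `asfreq()`, which inserts the new timestamps but leaves them as `NaN` rather than filling them. This makes the introduced gaps explicit before you commit to a fill strategy; a short sketch:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# asfreq() upsamples without filling, so the new daily rows are NaN\n",
"upsampled_asfreq = weekly_sales.resample('D').asfreq()\n",
"print(f\"NaN count after asfreq: {upsampled_asfreq.isna().sum()} of {len(upsampled_asfreq)}\")\n",
"print(upsampled_asfreq.head(10))"
]
},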
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Rolling Calculations and Window Functions\n",
"\n",
"Moving averages, rolling statistics, and window-based analysis."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Basic rolling calculations\n",
"print(\"=== BASIC ROLLING CALCULATIONS ===\")\n",
"\n",
"# Calculate various rolling statistics\n",
"rolling_data = ts_data.copy()\n",
"\n",
"# Rolling means (moving averages)\n",
"rolling_data['sales_7d_avg'] = rolling_data['sales'].rolling(window=7).mean()\n",
"rolling_data['sales_30d_avg'] = rolling_data['sales'].rolling(window=30).mean()\n",
"rolling_data['sales_90d_avg'] = rolling_data['sales'].rolling(window=90).mean()\n",
"\n",
"# Rolling standard deviation (volatility)\n",
"rolling_data['sales_7d_std'] = rolling_data['sales'].rolling(window=7).std()\n",
"rolling_data['sales_30d_std'] = rolling_data['sales'].rolling(window=30).std()\n",
"\n",
"# Rolling min/max\n",
"rolling_data['sales_30d_min'] = rolling_data['sales'].rolling(window=30).min()\n",
"rolling_data['sales_30d_max'] = rolling_data['sales'].rolling(window=30).max()\n",
"\n",
"print(\"Rolling statistics (last 10 days):\")\n",
"rolling_cols = ['sales', 'sales_7d_avg', 'sales_30d_avg', 'sales_7d_std', 'sales_30d_std']\n",
"print(rolling_data[rolling_cols].tail(10).round(2))\n",
"\n",
"# Rolling sum for cumulative analysis\n",
"rolling_data['sales_7d_sum'] = rolling_data['sales'].rolling(window=7).sum()\n",
"rolling_data['sales_30d_sum'] = rolling_data['sales'].rolling(window=30).sum()\n",
"\n",
"print(\"\\nRolling sums (last 5 days):\")\n",
"print(rolling_data[['sales', 'sales_7d_sum', 'sales_30d_sum']].tail(5).round(0))"
]
},
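{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first `window - 1` values of each rolling column above are `NaN` because the window is not yet full. The `min_periods` argument relaxes this and emits partial-window results from the start; a small sketch:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# min_periods=1 emits a value as soon as at least one observation is in the window\n",
"strict = ts_data['sales'].rolling(window=7).mean()\n",
"relaxed = ts_data['sales'].rolling(window=7, min_periods=1).mean()\n",
"print(\"First 7 days, strict vs relaxed window:\")\n",
"print(pd.DataFrame({'strict': strict, 'min_periods=1': relaxed}).head(7).round(2))"
]
},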
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Advanced rolling calculations\n",
"print(\"=== ADVANCED ROLLING CALCULATIONS ===\")\n",
"\n",
"# Rolling correlation between variables\n",
"rolling_data['sales_customers_corr_30d'] = rolling_data['sales'].rolling(window=30).corr(rolling_data['customers'])\n",
"rolling_data['sales_marketing_corr_30d'] = rolling_data['sales'].rolling(window=30).corr(rolling_data['marketing_spend'])\n",
"\n",
"print(\"Rolling correlations (last 10 days):\")\n",
"corr_cols = ['sales_customers_corr_30d', 'sales_marketing_corr_30d']\n",
"print(rolling_data[corr_cols].tail(10).round(3))\n",
"\n",
"# Rolling quantiles\n",
"rolling_data['sales_30d_q25'] = rolling_data['sales'].rolling(window=30).quantile(0.25)\n",
"rolling_data['sales_30d_q75'] = rolling_data['sales'].rolling(window=30).quantile(0.75)\n",
"rolling_data['sales_30d_median'] = rolling_data['sales'].rolling(window=30).median()\n",
"\n",
"print(\"\\nRolling quantiles (last 5 days):\")\n",
"quantile_cols = ['sales', 'sales_30d_q25', 'sales_30d_median', 'sales_30d_q75']\n",
"print(rolling_data[quantile_cols].tail(5).round(2))\n",
"\n",
"# Custom rolling functions\n",
"def rolling_cv(series):\n",
"    \"\"\"Rolling coefficient of variation\"\"\"\n",
"    return series.std() / series.mean() if series.mean() != 0 else 0\n",
"\n",
"def rolling_skewness(series):\n",
"    \"\"\"Rolling skewness\"\"\"\n",
"    return series.skew()\n",
"\n",
"rolling_data['sales_30d_cv'] = rolling_data['sales'].rolling(window=30).apply(rolling_cv)\n",
"rolling_data['sales_30d_skew'] = rolling_data['sales'].rolling(window=30).apply(rolling_skewness)\n",
"\n",
"print(\"\\nCustom rolling statistics (last 5 days):\")\n",
"custom_cols = ['sales_30d_cv', 'sales_30d_skew']\n",
"print(rolling_data[custom_cols].tail(5).round(3))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Exponentially weighted functions\n",
"print(\"=== EXPONENTIALLY WEIGHTED FUNCTIONS ===\")\n",
"\n",
"# Exponentially weighted moving average (EWMA)\n",
"rolling_data['sales_ewm_10'] = rolling_data['sales'].ewm(span=10).mean()\n",
"rolling_data['sales_ewm_30'] = rolling_data['sales'].ewm(span=30).mean()\n",
"\n",
"# Exponentially weighted standard deviation\n",
"rolling_data['sales_ewm_std_10'] = rolling_data['sales'].ewm(span=10).std()\n",
"\n",
"print(\"Exponentially weighted statistics (last 10 days):\")\n",
"ewm_cols = ['sales', 'sales_7d_avg', 'sales_ewm_10', 'sales_ewm_30']\n",
"print(rolling_data[ewm_cols].tail(10).round(2))\n",
"\n",
"# Visualize different smoothing methods\n",
"plt.figure(figsize=(15, 8))\n",
"\n",
"# Plot last 90 days for clarity\n",
"recent_period = rolling_data.tail(90)\n",
"\n",
"plt.plot(recent_period.index, recent_period['sales'], label='Original Sales', alpha=0.7)\n",
"plt.plot(recent_period.index, recent_period['sales_7d_avg'], label='7-day MA', linewidth=2)\n",
"plt.plot(recent_period.index, recent_period['sales_30d_avg'], label='30-day MA', linewidth=2)\n",
"plt.plot(recent_period.index, recent_period['sales_ewm_10'], label='EWM (span=10)', linewidth=2)\n",
"\n",
"plt.title('Sales Smoothing Methods Comparison (Last 90 Days)')\n",
"plt.xlabel('Date')\n",
"plt.ylabel('Sales ($)')\n",
"plt.legend()\n",
"plt.grid(True, alpha=0.3)\n",
"plt.tight_layout()\n",
"plt.show()\n",
"\n",
"# Compare how closely each smoothed series tracks the original\n",
"print(\"\\nSmoothing method responsiveness (correlation with original):\")\n",
"responsiveness = {\n",
"    '7-day MA': rolling_data['sales'].corr(rolling_data['sales_7d_avg']),\n",
"    '30-day MA': rolling_data['sales'].corr(rolling_data['sales_30d_avg']),\n",
"    'EWM (span=10)': rolling_data['sales'].corr(rolling_data['sales_ewm_10']),\n",
"    'EWM (span=30)': rolling_data['sales'].corr(rolling_data['sales_ewm_30'])\n",
"}\n",
"\n",
"for method, corr in responsiveness.items():\n",
"    print(f\"{method}: {corr:.4f}\")"
]
},
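{
"cell_type": "markdown",
"metadata": {},
"source": [
"`ewm()` can be parameterized by `span`, `halflife`, or `alpha`; these are interchangeable ways of expressing the same decay rate (pandas converts `span` using `alpha = 2 / (span + 1)`). A quick sketch of the equivalence:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Two ways to set the same smoothing factor: span and alpha\n",
"span = 10\n",
"alpha = 2 / (span + 1)  # the span <-> alpha conversion used by pandas\n",
"by_span = rolling_data['sales'].ewm(span=span).mean()\n",
"by_alpha = rolling_data['sales'].ewm(alpha=alpha).mean()\n",
"print(f\"span={span} and alpha={alpha:.4f} agree: \"\n",
"      f\"{np.allclose(by_span.dropna(), by_alpha.dropna())}\")"
]
},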
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Seasonal Analysis and Decomposition\n",
"\n",
"Analyzing seasonal patterns and decomposing time series."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Seasonal pattern analysis\n",
"print(\"=== SEASONAL PATTERN ANALYSIS ===\")\n",
"\n",
"# Add more detailed time components\n",
"seasonal_data = ts_data.copy()\n",
"seasonal_data['month'] = seasonal_data.index.month\n",
"seasonal_data['quarter'] = seasonal_data.index.quarter\n",
"seasonal_data['day_of_week'] = seasonal_data.index.dayofweek\n",
"seasonal_data['week_of_year'] = seasonal_data.index.isocalendar().week\n",
"seasonal_data['day_of_year'] = seasonal_data.index.dayofyear\n",
"\n",
"# Monthly seasonality\n",
"monthly_sales_pattern = seasonal_data.groupby('month')['sales'].agg(['mean', 'std', 'count'])\n",
"print(\"Monthly sales patterns:\")\n",
"print(monthly_sales_pattern.round(2))\n",
"\n",
"# Day of week patterns\n",
"dow_pattern = seasonal_data.groupby('day_of_week')['sales'].agg(['mean', 'std'])\n",
"dow_pattern.index = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']\n",
"print(\"\\nDay of week patterns:\")\n",
"print(dow_pattern.round(2))\n",
"\n",
"# Weekly patterns throughout the year\n",
"weekly_sales_pattern = seasonal_data.groupby('week_of_year')['sales'].mean()\n",
"print(\"\\nWeekly pattern statistics:\")\n",
"print(f\"Highest week: Week {weekly_sales_pattern.idxmax()} (${weekly_sales_pattern.max():.0f})\")\n",
"print(f\"Lowest week: Week {weekly_sales_pattern.idxmin()} (${weekly_sales_pattern.min():.0f})\")\n",
"print(f\"Weekly variation: {weekly_sales_pattern.std():.2f}\")\n",
"\n",
"# Quarterly analysis\n",
"quarterly_pattern = seasonal_data.groupby('quarter')['sales'].agg(['mean', 'sum', 'std'])\n",
"print(\"\\nQuarterly patterns:\")\n",
"print(quarterly_pattern.round(2))"
]
},
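{
"cell_type": "markdown",
"metadata": {},
"source": [
"Monthly and weekday effects can also be inspected jointly. A sketch of a month-by-weekday table of average sales, built with `pd.pivot_table` from the components already added to `seasonal_data`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Average sales for every (month, day-of-week) combination\n",
"month_dow = pd.pivot_table(seasonal_data, values='sales',\n",
"                           index='month', columns='day_of_week', aggfunc='mean')\n",
"month_dow.columns = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']\n",
"print(\"Average sales by month (rows) and weekday (columns):\")\n",
"print(month_dow.round(0))"
]
},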
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Simple seasonal decomposition\n",
"print(\"=== SEASONAL DECOMPOSITION ===\")\n",
"\n",
"# Manual decomposition approach\n",
"def simple_decompose(series, period=365):\n",
"    \"\"\"Simple additive seasonal decomposition\"\"\"\n",
"    # Trend (using centered moving average)\n",
"    trend = series.rolling(window=period, center=True).mean()\n",
"\n",
"    # Detrended series\n",
"    detrended = series - trend\n",
"\n",
"    # Seasonal component: average for each day of year, mapped back onto the index\n",
"    seasonal_avg = detrended.groupby(detrended.index.dayofyear).mean()\n",
"    seasonal = pd.Series(series.index.dayofyear, index=series.index).map(seasonal_avg)\n",
"    seasonal = seasonal.fillna(0)  # Days with no average (e.g., leap day) contribute nothing\n",
"\n",
"    # Residual (what's left after removing trend and seasonality)\n",
"    residual = series - trend - seasonal\n",
"\n",
"    return trend, seasonal, residual\n",
"\n",
"# Decompose sales data\n",
"trend, seasonal, residual = simple_decompose(ts_data['sales'])\n",
"\n",
"# Create decomposition DataFrame\n",
"decomposition = pd.DataFrame({\n",
"    'original': ts_data['sales'],\n",
"    'trend': trend,\n",
"    'seasonal': seasonal,\n",
"    'residual': residual\n",
"})\n",
"\n",
"print(\"Decomposition summary:\")\n",
"print(decomposition.describe().round(2))\n",
"\n",
"# Visualize decomposition\n",
"fig, axes = plt.subplots(4, 1, figsize=(15, 12))\n",
"\n",
"# Original series\n",
"decomposition['original'].plot(ax=axes[0], title='Original Sales Data')\n",
"axes[0].set_ylabel('Sales ($)')\n",
"axes[0].grid(True, alpha=0.3)\n",
"\n",
"# Trend\n",
"decomposition['trend'].plot(ax=axes[1], title='Trend Component', color='red')\n",
"axes[1].set_ylabel('Trend ($)')\n",
"axes[1].grid(True, alpha=0.3)\n",
"\n",
"# Seasonal\n",
"decomposition['seasonal'].plot(ax=axes[2], title='Seasonal Component', color='green')\n",
"axes[2].set_ylabel('Seasonal ($)')\n",
"axes[2].grid(True, alpha=0.3)\n",
"\n",
"# Residual\n",
"decomposition['residual'].plot(ax=axes[3], title='Residual Component', color='purple')\n",
"axes[3].set_ylabel('Residual ($)')\n",
"axes[3].set_xlabel('Date')\n",
"axes[3].grid(True, alpha=0.3)\n",
"\n",
"plt.tight_layout()\n",
"plt.show()\n",
"\n",
"print(\"\\nDecomposition insights:\")\n",
"print(f\"Trend contribution: {trend.std():.2f} (std dev)\")\n",
"print(f\"Seasonal contribution: {seasonal.std():.2f} (std dev)\")\n",
"print(f\"Residual contribution: {residual.std():.2f} (std dev)\")\n",
"print(f\"Total variation: {ts_data['sales'].std():.2f} (std dev)\")"
]
},
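{
"cell_type": "markdown",
"metadata": {},
"source": [
"For production work, `statsmodels` ships a ready-made decomposition that returns the same trend/seasonal/residual pieces. A hedged sketch (assumes `statsmodels` is installed in your environment; it is not imported at the top of this notebook):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Library-based alternative to the manual decomposition above\n",
"from statsmodels.tsa.seasonal import seasonal_decompose\n",
"\n",
"result = seasonal_decompose(ts_data['sales'], model='additive', period=365)\n",
"print(f\"Trend std:    {result.trend.std():.2f}\")\n",
"print(f\"Seasonal std: {result.seasonal.std():.2f}\")\n",
"print(f\"Residual std: {result.resid.std():.2f}\")\n",
"result.plot()\n",
"plt.tight_layout()\n",
"plt.show()"
]
},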
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Advanced seasonal analysis\n",
"print(\"=== ADVANCED SEASONAL ANALYSIS ===\")\n",
"\n",
"# Year-over-year comparison\n",
"yoy_comparison = pd.DataFrame()\n",
"for year in ts_data.index.year.unique():\n",
"    year_data = ts_data[ts_data.index.year == year]['sales']\n",
"    year_data.index = year_data.index.dayofyear\n",
"    yoy_comparison[f'Year_{year}'] = year_data\n",
"\n",
"print(\"Year-over-year comparison (sample days):\")\n",
"print(yoy_comparison.head(10).round(2))\n",
"\n",
"# Calculate year-over-year changes\n",
"if len(yoy_comparison.columns) > 1:\n",
"    yoy_change = yoy_comparison.pct_change(axis=1) * 100\n",
"    print(\"\\nYear-over-year change statistics:\")\n",
"    for col in yoy_change.columns[1:]:\n",
"        print(f\"{col}: mean={yoy_change[col].mean():.2f}%, std={yoy_change[col].std():.2f}%\")\n",
"\n",
"# Seasonal strength measurement\n",
"def seasonal_strength(series, period=365):\n",
"    \"\"\"Calculate seasonal strength (0 = no seasonality, 1 = pure seasonality)\"\"\"\n",
"    # Detrend the series\n",
"    trend = series.rolling(window=period, center=True).mean()\n",
"    detrended = series - trend\n",
"\n",
"    # Calculate seasonal component\n",
"    seasonal_avg = detrended.groupby(detrended.index.dayofyear).mean()\n",
"    seasonal_var = seasonal_avg.var()\n",
"\n",
"    # Map per-day averages back onto the full index and compute residual variance\n",
"    seasonal_full = pd.Series(series.index.dayofyear, index=series.index).map(seasonal_avg).fillna(0)\n",
"    residual = detrended - seasonal_full\n",
"    residual_var = residual.var()\n",
"\n",
"    # Seasonal strength\n",
"    return seasonal_var / (seasonal_var + residual_var)\n",
"\n",
"sales_seasonal_strength = seasonal_strength(ts_data['sales'])\n",
"print(f\"\\nSales seasonal strength: {sales_seasonal_strength:.3f}\")\n",
"print(\"(0 = no seasonality, 1 = pure seasonality)\")\n",
"\n",
"# Identify most/least seasonal periods\n",
"monthly_seasonal = seasonal_data.groupby('month')['sales'].std()\n",
"print(f\"\\nMost variable month: {monthly_seasonal.idxmax()} (std: {monthly_seasonal.max():.2f})\")\n",
"print(f\"Least variable month: {monthly_seasonal.idxmin()} (std: {monthly_seasonal.min():.2f})\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Business Applications and Forecasting\n",
"\n",
"Real-world time series analysis for business insights."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Business performance metrics\n",
"print(\"=== BUSINESS PERFORMANCE METRICS ===\")\n",
"\n",
"def calculate_business_metrics(df):\n",
"    \"\"\"Calculate key business time series metrics\"\"\"\n",
"    metrics = {}\n",
"\n",
"    # Growth metrics (last 30 days vs first 30 days)\n",
"    daily_sales = df['sales']\n",
"    metrics['total_sales'] = daily_sales.sum()\n",
"    metrics['avg_daily_sales'] = daily_sales.mean()\n",
"    metrics['sales_growth_rate'] = (daily_sales.iloc[-30:].mean() / daily_sales.iloc[:30].mean() - 1) * 100\n",
"\n",
"    # Volatility metrics\n",
"    metrics['sales_volatility'] = daily_sales.std()\n",
"    metrics['coefficient_of_variation'] = daily_sales.std() / daily_sales.mean()\n",
"\n",
"    # Trend metrics\n",
"    trend = daily_sales.rolling(window=30).mean()\n",
"    metrics['trend_direction'] = 'Increasing' if trend.iloc[-1] > trend.iloc[-30] else 'Decreasing'\n",
"    metrics['trend_strength'] = abs(trend.iloc[-1] - trend.iloc[-30]) / trend.iloc[-30] * 100\n",
"\n",
"    # Customer metrics\n",
"    metrics['avg_customers_per_day'] = df['customers'].mean()\n",
"    metrics['sales_per_customer'] = df['sales'].sum() / df['customers'].sum()\n",
"\n",
"    # Marketing efficiency (revenue per marketing dollar; a simple ratio, not true ROI)\n",
"    metrics['marketing_roi'] = df['sales'].sum() / df['marketing_spend'].sum()\n",
"\n",
"    return metrics\n",
"\n",
"# Calculate metrics for different periods\n",
"overall_metrics = calculate_business_metrics(ts_data)\n",
"recent_metrics = calculate_business_metrics(ts_data.tail(90))  # Last 90 days\n",
"\n",
"print(\"Overall Performance Metrics:\")\n",
"for metric, value in overall_metrics.items():\n",
"    if isinstance(value, float):\n",
"        print(f\"{metric}: {value:.2f}\")\n",
"    else:\n",
"        print(f\"{metric}: {value}\")\n",
"\n",
"print(\"\\nRecent 90-day Performance Metrics:\")\n",
"for metric, value in recent_metrics.items():\n",
"    if isinstance(value, float):\n",
"        print(f\"{metric}: {value:.2f}\")\n",
"    else:\n",
"        print(f\"{metric}: {value}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Simple forecasting using historical patterns\n",
"print(\"=== SIMPLE FORECASTING ===\")\n",
"\n",
"def simple_forecast(series, periods=30, method='seasonal_naive'):\n",
"    \"\"\"Simple forecasting methods\"\"\"\n",
"    if method == 'naive':\n",
"        # Naive: repeat last value\n",
"        return pd.Series([series.iloc[-1]] * periods,\n",
"                         index=pd.date_range(series.index[-1] + pd.Timedelta(days=1), periods=periods))\n",
"\n",
"    elif method == 'seasonal_naive':\n",
"        # Seasonal naive: repeat same day from previous year\n",
"        forecast_dates = pd.date_range(series.index[-1] + pd.Timedelta(days=1), periods=periods)\n",
"        forecast_values = []\n",
"\n",
"        for date in forecast_dates:\n",
"            # Find same day of year from previous year\n",
"            previous_year_date = date - pd.DateOffset(years=1)\n",
"            if previous_year_date in series.index:\n",
"                forecast_values.append(series.loc[previous_year_date])\n",
"            else:\n",
"                # Fallback to seasonal average\n",
"                day_of_year = date.dayofyear\n",
"                same_day_values = series[series.index.dayofyear == day_of_year]\n",
"                if len(same_day_values) > 0:\n",
"                    forecast_values.append(same_day_values.mean())\n",
"                else:\n",
"                    forecast_values.append(series.mean())\n",
"\n",
"        return pd.Series(forecast_values, index=forecast_dates)\n",
"\n",
"    elif method == 'moving_average':\n",
"        # Moving average forecast\n",
"        ma_value = series.tail(30).mean()\n",
"        return pd.Series([ma_value] * periods,\n",
"                         index=pd.date_range(series.index[-1] + pd.Timedelta(days=1), periods=periods))\n",
"\n",
"    elif method == 'trend':\n",
"        # Linear trend forecast (fitted on x = 0..n-1, so the next point is x = n)\n",
"        from scipy import stats\n",
"        x = np.arange(len(series))\n",
"        slope, intercept, _, _, _ = stats.linregress(x, series.values)\n",
"\n",
"        forecast_dates = pd.date_range(series.index[-1] + pd.Timedelta(days=1), periods=periods)\n",
"        forecast_values = [slope * (len(series) + i) + intercept for i in range(periods)]\n",
"\n",
"        return pd.Series(forecast_values, index=forecast_dates)\n",
"\n",
"    else:\n",
"        raise ValueError(f\"Unknown forecasting method: {method}\")\n",
"\n",
"# Generate forecasts using different methods\n",
"forecast_periods = 30\n",
"sales_series = ts_data['sales']\n",
"\n",
"forecasts = {\n",
"    'Naive': simple_forecast(sales_series, forecast_periods, 'naive'),\n",
"    'Seasonal_Naive': simple_forecast(sales_series, forecast_periods, 'seasonal_naive'),\n",
"    'Moving_Average': simple_forecast(sales_series, forecast_periods, 'moving_average'),\n",
"    'Trend': simple_forecast(sales_series, forecast_periods, 'trend')\n",
"}\n",
"\n",
"print(f\"Forecasts for next {forecast_periods} days:\")\n",
"forecast_df = pd.DataFrame(forecasts)\n",
"print(forecast_df.head(10).round(2))\n",
"\n",
"print(\"\\nForecast summary statistics:\")\n",
"print(forecast_df.describe().round(2))\n",
"\n",
"# Visualize forecasts\n",
"plt.figure(figsize=(15, 8))\n",
"\n",
"# Plot historical data (last 90 days)\n",
"historical_period = sales_series.tail(90)\n",
"plt.plot(historical_period.index, historical_period.values, label='Historical Data', color='black', linewidth=2)\n",
"\n",
"# Plot forecasts\n",
"colors = ['red', 'blue', 'green', 'orange']\n",
"for i, (method, forecast) in enumerate(forecasts.items()):\n",
"    plt.plot(forecast.index, forecast.values, label=f'{method} Forecast',\n",
"             color=colors[i], linestyle='--', linewidth=2)\n",
"\n",
"plt.title('Sales Forecasting Comparison')\n",
"plt.xlabel('Date')\n",
"plt.ylabel('Sales ($)')\n",
"plt.legend()\n",
"plt.grid(True, alpha=0.3)\n",
"plt.tight_layout()\n",
"plt.show()"
]
},
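{
"cell_type": "markdown",
"metadata": {},
"source": [
"None of these forecasts has been scored yet. A minimal holdout evaluation: hold back the last 30 observed days, forecast them from the earlier history only, and compare with MAE and MAPE (computed by hand here, so no extra libraries are assumed):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hold out the last 30 days and score each method against what actually happened\n",
"holdout = 30\n",
"train, test = sales_series.iloc[:-holdout], sales_series.iloc[-holdout:]\n",
"\n",
"print(f\"{'Method':<16}{'MAE':>10}{'MAPE %':>10}\")\n",
"for method in ['naive', 'seasonal_naive', 'moving_average', 'trend']:\n",
"    pred = simple_forecast(train, holdout, method)\n",
"    pred.index = test.index  # forecast dates line up with the held-out dates\n",
"    mae = (test - pred).abs().mean()\n",
"    mape = ((test - pred).abs() / test).mean() * 100\n",
"    print(f\"{method:<16}{mae:>10.2f}{mape:>10.2f}\")"
]
},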
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Anomaly detection in time series\n",
"print(\"=== ANOMALY DETECTION ===\")\n",
"\n",
"def detect_anomalies(series, method='zscore', threshold=3):\n",
"    \"\"\"Detect anomalies in time series\"\"\"\n",
"    anomalies = pd.Series(False, index=series.index)\n",
"\n",
"    if method == 'zscore':\n",
"        # Z-score method\n",
"        z_scores = np.abs((series - series.mean()) / series.std())\n",
"        anomalies = z_scores > threshold\n",
"\n",
"    elif method == 'iqr':\n",
"        # Interquartile range method\n",
"        Q1 = series.quantile(0.25)\n",
"        Q3 = series.quantile(0.75)\n",
"        IQR = Q3 - Q1\n",
"        lower_bound = Q1 - 1.5 * IQR\n",
"        upper_bound = Q3 + 1.5 * IQR\n",
"        anomalies = (series < lower_bound) | (series > upper_bound)\n",
"\n",
"    elif method == 'rolling':\n",
"        # Rolling window method\n",
"        rolling_mean = series.rolling(window=30).mean()\n",
"        rolling_std = series.rolling(window=30).std()\n",
"        z_scores = np.abs((series - rolling_mean) / rolling_std)\n",
"        anomalies = z_scores > threshold\n",
"\n",
"    return anomalies\n",
"\n",
"# Detect anomalies using different methods\n",
"anomaly_methods = ['zscore', 'iqr', 'rolling']\n",
"anomaly_results = {}\n",
"\n",
"for method in anomaly_methods:\n",
"    anomalies = detect_anomalies(ts_data['sales'], method=method)\n",
"    anomaly_results[method] = anomalies\n",
"    print(f\"{method.upper()} method: {anomalies.sum()} anomalies detected\")\n",
"\n",
"# Combine anomaly detection results\n",
"anomaly_df = pd.DataFrame(anomaly_results)\n",
"anomaly_df['any_method'] = anomaly_df.any(axis=1)\n",
"anomaly_df['all_methods'] = anomaly_df[anomaly_methods].all(axis=1)\n",
"\n",
"print(f\"\\nAnomalies detected by any method: {anomaly_df['any_method'].sum()}\")\n",
"print(f\"Anomalies detected by all methods: {anomaly_df['all_methods'].sum()}\")\n",
"\n",
"# Show anomalous dates\n",
"severe_anomalies = ts_data[anomaly_df['all_methods']]\n",
"if len(severe_anomalies) > 0:\n",
"    print(\"\\nSevere anomalies (detected by all methods):\")\n",
"    print(severe_anomalies[['sales', 'customers', 'marketing_spend']].round(2))\n",
"\n",
"# Visualize anomalies\n",
"plt.figure(figsize=(15, 8))\n",
"\n",
"# Plot sales data\n",
"plt.plot(ts_data.index, ts_data['sales'], label='Sales Data', alpha=0.7)\n",
"\n",
"# Highlight anomalies\n",
"for method in anomaly_methods:\n",
"    anomaly_dates = ts_data.index[anomaly_results[method]]\n",
"    anomaly_values = ts_data.loc[anomaly_dates, 'sales']\n",
"    plt.scatter(anomaly_dates, anomaly_values, label=f'{method.upper()} Anomalies', alpha=0.7, s=30)\n",
"\n",
"plt.title('Sales Data with Anomaly Detection')\n",
"plt.xlabel('Date')\n",
"plt.ylabel('Sales ($)')\n",
"plt.legend()\n",
"plt.grid(True, alpha=0.3)\n",
"plt.tight_layout()\n",
"plt.show()\n",
"\n",
"# Anomaly statistics\n",
"print(\"\\nAnomaly statistics:\")\n",
"for method in anomaly_methods:\n",
"    anomaly_sales = ts_data.loc[anomaly_results[method], 'sales']\n",
"    if len(anomaly_sales) > 0:\n",
"        print(f\"{method.upper()}: mean=${anomaly_sales.mean():.2f}, std=${anomaly_sales.std():.2f}\")\n",
"    else:\n",
"        print(f\"{method.upper()}: No anomalies detected\")"
]
},
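{
"cell_type": "markdown",
"metadata": {},
"source": [
"The z-score method is itself distorted by the outliers it hunts, because the mean and standard deviation are not robust statistics. A sketch of a median/MAD variant, a common robust alternative (the 0.6745 constant rescales MAD to match the standard deviation under normality):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Robust z-scores: replace mean/std with median/MAD\n",
"median = ts_data['sales'].median()\n",
"mad = (ts_data['sales'] - median).abs().median()\n",
"robust_z = 0.6745 * (ts_data['sales'] - median).abs() / mad\n",
"mad_anomalies = robust_z > 3\n",
"print(f\"MAD method: {mad_anomalies.sum()} anomalies detected\")\n",
"print(f\"Overlap with plain z-score: {(mad_anomalies & anomaly_results['zscore']).sum()}\")"
]
},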
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Practice Exercises\n",
"\n",
"Apply time series analysis to complex business scenarios:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"# Exercise 1: Comprehensive Time Series Dashboard\n",
"# Create a complete time series analysis dashboard that includes:\n",
"# - Multiple time series metrics and KPIs\n",
"# - Seasonal analysis and trend identification\n",
"# - Anomaly detection and alerting\n",
"# - Forecasting with confidence intervals\n",
"# - Business insights and recommendations\n",
"\n",
"def create_time_series_dashboard(df):\n",
"    \"\"\"Create comprehensive time series analysis dashboard\"\"\"\n",
"    # Your implementation here\n",
"    pass\n",
"\n",
"# dashboard = create_time_series_dashboard(ts_data)\n",
"# print(\"Time Series Dashboard Created\")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"# Exercise 2: Multi-variate Time Series Analysis\n",
"# Analyze relationships between multiple time series:\n",
"# - Cross-correlation analysis\n",
"# - Lead-lag relationships\n",
"# - Causality testing\n",
"# - Multi-variate forecasting\n",
"\n",
"# Your code here:\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"# Exercise 3: Advanced Forecasting Challenge\n",
"# Implement more sophisticated forecasting methods:\n",
"# - Exponential smoothing with trend and seasonality\n",
"# - ARIMA modeling\n",
"# - Model evaluation and selection\n",
"# - Forecast accuracy metrics\n",
"\n",
"# Your code here:\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Key Takeaways\n",
"\n",
"1. **DateTime Indexing**:\n",
"   - Use `pd.DatetimeIndex` for time-based operations\n",
"   - Enable powerful time-based selection and slicing\n",
"   - Extract components (year, month, day, etc.) for analysis\n",
"\n",
"2. **Resampling**:\n",
"   - **`.resample()`**: Convert between different frequencies\n",
"   - **Downsampling**: Aggregate to lower frequency (daily → monthly)\n",
"   - **Upsampling**: Convert to higher frequency (monthly → daily)\n",
"   - Use appropriate aggregation functions for your data\n",
"\n",
"3. **Rolling Calculations**:\n",
"   - **`.rolling()`**: Moving window calculations\n",
"   - **`.ewm()`**: Exponentially weighted functions\n",
"   - Useful for smoothing and trend analysis\n",
"   - Handle missing values appropriately (e.g., with `min_periods`)\n",
"\n",
"4. **Seasonal Analysis**:\n",
"   - Identify patterns by time components\n",
"   - Decompose into trend, seasonal, and residual\n",
"   - Measure seasonal strength and variability\n",
"\n",
"## Time Series Quick Reference\n",
"\n",
"```python\n",
"# Create datetime index\n",
"df.set_index(pd.to_datetime(df['date']), inplace=True)\n",
"\n",
"# Time-based selection (use .loc; df['2023'] would look up a column)\n",
"df.loc['2023'] # Select year\n",
"df.loc['2023-01'] # Select month\n",
"df.loc['2023-01-01':'2023-01-31'] # Date range\n",
"\n",
"# Resampling\n",
"df.resample('ME').sum() # Month-end sum\n",
"df.resample('W').mean() # Weekly average\n",
"df.resample('QE').agg({'col': ['sum', 'mean']}) # Quarterly multi-agg\n",
"\n",
"# Rolling calculations\n",
"df['col'].rolling(7).mean() # 7-period moving average\n",
"df['col'].ewm(span=10).mean() # Exponential moving average\n",
"df['col'].rolling(30).std() # 30-period rolling standard deviation\n",
"```\n",
"\n",
"## Business Applications\n",
"\n",
"| Use Case | Technique | Key Insights |\n",
"|----------|-----------|-------------|\n",
"| Sales forecasting | Seasonal decomposition + trends | Predict future performance |\n",
"| Anomaly detection | Rolling statistics + thresholds | Identify unusual patterns |\n",
"| Performance monitoring | Moving averages + KPIs | Track business health |\n",
"| Seasonal planning | Seasonal analysis | Optimize inventory/staffing |\n",
"| Marketing ROI | Cross-correlation analysis | Measure campaign effectiveness |\n",
"\n",
"## Best Practices\n",
"\n",
"1. **Data Quality**: Ensure consistent time intervals and handle missing data\n",
"2. **Frequency Choice**: Choose appropriate resampling frequency for your analysis\n",
"3. **Window Size**: Balance responsiveness vs. smoothness in rolling calculations\n",
"4. **Seasonality**: Always check for and account for seasonal patterns\n",
"5. **Validation**: Use holdout periods to validate forecasting models\n",
"6. **Business Context**: Interpret results in context of business cycles and events\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}