{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Session 1 - DataFrames - Lesson 9: Pivot Tables and Data Reshaping\n",
"\n",
"## Learning Objectives\n",
"- Master pivot table creation and customization\n",
"- Understand data reshaping with melt, pivot, stack, and unstack\n",
"- Learn cross-tabulation and contingency tables\n",
"- Practice with multi-level indexing and hierarchical data\n",
"- Apply reshaping techniques to real-world analysis scenarios\n",
"\n",
"## Prerequisites\n",
"- Completed Lessons 1-8\n",
"- Understanding of aggregation and groupby operations\n",
"- Familiarity with Excel pivot tables (helpful but not required)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Libraries loaded successfully!\n"
]
}
],
"source": [
"# Import required libraries\n",
"import pandas as pd\n",
"import numpy as np\n",
"from datetime import datetime, timedelta\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"\n",
"# Set display options\n",
"pd.set_option('display.max_columns', None)\n",
"pd.set_option('display.max_rows', 20)\n",
"pd.set_option('display.width', None)\n",
"\n",
"print(\"Libraries loaded successfully!\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating Sample Dataset\n",
"\n",
"Let's create a comprehensive business dataset for pivot table examples."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Business dataset created:\n",
"Shape: (1000, 20)\n",
"\n",
"First few rows:\n",
" date salesperson region product_category product customer_type \\\n",
"0 2024-01-01 Grace Lee North Clothing Shoes Returning \n",
"1 2024-01-02 Diana Prince East Clothing Magazine Returning \n",
"2 2024-01-03 Henry Davis West Books Shoes VIP \n",
"3 2024-01-04 Eve Wilson West Clothing Novel New \n",
"4 2024-01-05 Grace Lee East Clothing Laptop VIP \n",
"\n",
" sales_channel quantity unit_price discount_percent shipping_cost \\\n",
"0 Online 1 72.81 0 16.13 \n",
"1 Store 9 99.98 15 10.42 \n",
"2 Online 6 4.61 0 8.73 \n",
"3 Phone 8 31.51 0 8.27 \n",
"4 Store 3 86.28 15 11.61 \n",
"\n",
" gross_sales discount_amount net_sales total_order year month quarter \\\n",
"0 72.81 0.000 72.810 88.940 2024 1 1 \n",
"1 899.82 134.973 764.847 775.267 2024 1 1 \n",
"2 27.66 0.000 27.660 36.390 2024 1 1 \n",
"3 252.08 0.000 252.080 260.350 2024 1 1 \n",
"4 258.84 38.826 220.014 231.624 2024 1 1 \n",
"\n",
" day_of_week month_name \n",
"0 Monday January \n",
"1 Tuesday January \n",
"2 Wednesday January \n",
"3 Thursday January \n",
"4 Friday January \n",
"\n",
"Column info:\n",
"date datetime64[ns]\n",
"salesperson object\n",
"region object\n",
"product_category object\n",
"product object\n",
"customer_type object\n",
"sales_channel object\n",
"quantity int64\n",
"unit_price float64\n",
"discount_percent int64\n",
"shipping_cost float64\n",
"gross_sales float64\n",
"discount_amount float64\n",
"net_sales float64\n",
"total_order float64\n",
"year int32\n",
"month int32\n",
"quarter int32\n",
"day_of_week object\n",
"month_name object\n",
"dtype: object\n"
]
}
],
"source": [
"# Create comprehensive business dataset\n",
"np.random.seed(42)\n",
"n_records = 1000\n",
"\n",
"# Generate realistic business data\n",
"business_data = {\n",
" 'date': pd.date_range('2024-01-01', periods=n_records, freq='D'),\n",
" 'salesperson': np.random.choice([\n",
" 'Alice Johnson', 'Bob Smith', 'Charlie Brown', 'Diana Prince', 'Eve Wilson',\n",
" 'Frank Miller', 'Grace Lee', 'Henry Davis', 'Ivy Chen', 'Jack Robinson'\n",
" ], n_records),\n",
" 'region': np.random.choice(['North', 'South', 'East', 'West'], n_records),\n",
" 'product_category': np.random.choice(['Electronics', 'Clothing', 'Books', 'Home & Garden'], n_records),\n",
" 'product': np.random.choice([\n",
" 'Laptop', 'Phone', 'Tablet', 'Headphones', 'Speaker',\n",
" 'Shirt', 'Pants', 'Shoes', 'Jacket', 'Hat',\n",
" 'Novel', 'Textbook', 'Magazine', 'Comic', 'Cookbook',\n",
" 'Plant', 'Tool', 'Furniture', 'Decoration', 'Garden'\n",
" ], n_records),\n",
" 'customer_type': np.random.choice(['New', 'Returning', 'VIP'], n_records, p=[0.3, 0.5, 0.2]),\n",
" 'sales_channel': np.random.choice(['Online', 'Store', 'Phone'], n_records, p=[0.6, 0.3, 0.1]),\n",
" 'quantity': np.random.randint(1, 10, n_records),\n",
" 'unit_price': np.random.normal(50, 20, n_records),\n",
" 'discount_percent': np.random.choice([0, 5, 10, 15, 20], n_records, p=[0.5, 0.2, 0.2, 0.08, 0.02]),\n",
" 'shipping_cost': np.random.normal(8, 3, n_records)\n",
"}\n",
"\n",
"df_business = pd.DataFrame(business_data)\n",
"\n",
"# Clean and calculate derived fields\n",
"df_business['unit_price'] = np.abs(df_business['unit_price']).round(2)\n",
"df_business['shipping_cost'] = np.abs(df_business['shipping_cost']).round(2)\n",
"df_business['gross_sales'] = df_business['quantity'] * df_business['unit_price']\n",
"df_business['discount_amount'] = df_business['gross_sales'] * df_business['discount_percent'] / 100\n",
"df_business['net_sales'] = df_business['gross_sales'] - df_business['discount_amount']\n",
"df_business['total_order'] = df_business['net_sales'] + df_business['shipping_cost']\n",
"\n",
"# Add time-based columns\n",
"df_business['year'] = df_business['date'].dt.year\n",
"df_business['month'] = df_business['date'].dt.month\n",
"df_business['quarter'] = df_business['date'].dt.quarter\n",
"df_business['day_of_week'] = df_business['date'].dt.day_name()\n",
"df_business['month_name'] = df_business['date'].dt.month_name()\n",
"\n",
"print(\"Business dataset created:\")\n",
"print(f\"Shape: {df_business.shape}\")\n",
"print(\"\\nFirst few rows:\")\n",
"print(df_business.head())\n",
"print(\"\\nColumn info:\")\n",
"print(df_business.dtypes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Basic Pivot Tables\n",
"\n",
"Creating fundamental pivot tables for data summarization."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== BASIC PIVOT TABLES ===\n",
"Sales by Region and Product Category:\n",
"product_category Books Clothing Electronics Home & Garden\n",
"region \n",
"East 12377.0 12466.0 16463.0 14140.0\n",
"North 13492.0 14938.0 15229.0 17770.0\n",
"South 17590.0 16946.0 14552.0 13695.0\n",
"West 8157.0 16544.0 12681.0 15652.0\n",
"\n",
"Average Order Value by Customer Type and Sales Channel:\n",
"sales_channel Online Phone Store\n",
"customer_type \n",
"New 263.07 244.46 222.16\n",
"Returning 246.03 220.45 224.39\n",
"VIP 234.35 237.16 253.16\n",
"\n",
"Transaction Count by Region and Customer Type:\n",
"customer_type New Returning VIP\n",
"region \n",
"East 60 128 60\n",
"North 70 144 53\n",
"South 68 138 45\n",
"West 61 119 54\n"
]
}
],
"source": [
"# Simple pivot table - sales by region and product category\n",
"print(\"=== BASIC PIVOT TABLES ===\")\n",
"\n",
"# Basic pivot: sum of sales by region and product category\n",
"basic_pivot = df_business.pivot_table(\n",
" values='net_sales',\n",
" index='region',\n",
" columns='product_category',\n",
" aggfunc='sum',\n",
" fill_value=0\n",
")\n",
"\n",
"print(\"Sales by Region and Product Category:\")\n",
"print(basic_pivot.round(0))\n",
"\n",
"# Average order value by customer type and sales channel\n",
"avg_order_pivot = df_business.pivot_table(\n",
" values='total_order',\n",
" index='customer_type',\n",
" columns='sales_channel',\n",
" aggfunc='mean',\n",
" fill_value=0\n",
")\n",
"\n",
"print(\"\\nAverage Order Value by Customer Type and Sales Channel:\")\n",
"print(avg_order_pivot.round(2))\n",
"\n",
"# Count of transactions\n",
"transaction_count = df_business.pivot_table(\n",
" values='total_order',\n",
" index='region',\n",
" columns='customer_type',\n",
" aggfunc='count',\n",
" fill_value=0\n",
")\n",
"\n",
"print(\"\\nTransaction Count by Region and Customer Type:\")\n",
"print(transaction_count)"
]
},
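{
"cell_type": "markdown",
"metadata": {},
"source": [
"Under the hood, a basic `pivot_table()` is a `groupby()` followed by `unstack()`. The sketch below is not executed here; it reuses `df_business` and `basic_pivot` from the cells above and simply checks that the two approaches agree."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: rebuild the basic pivot with groupby + unstack and compare\n",
"groupby_version = (\n",
"    df_business\n",
"    .groupby(['region', 'product_category'])['net_sales']\n",
"    .sum()\n",
"    .unstack(fill_value=0)\n",
")\n",
"\n",
"print(\"groupby + unstack version:\")\n",
"print(groupby_version.round(0))\n",
"\n",
"# The two results should match (same sums, same row/column labels)\n",
"print(\"\\nMatches pivot_table result:\", groupby_version.equals(basic_pivot))"
]
},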
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== MULTIPLE AGGREGATION FUNCTIONS ===\n",
"Multiple aggregations (sum, mean, count):\n",
" sum mean \\\n",
"product_category Books Clothing Electronics Home & Garden Books \n",
"region \n",
"East 12376.85 12465.78 16462.72 14139.75 217.14 \n",
"North 13491.69 14938.35 15228.66 17769.95 214.15 \n",
"South 17589.76 16945.54 14551.58 13695.40 266.51 \n",
"West 8156.64 16544.42 12680.58 15651.63 189.69 \n",
"\n",
" count \\\n",
"product_category Clothing Electronics Home & Garden Books Clothing \n",
"region \n",
"East 197.87 238.59 239.66 57 63 \n",
"North 226.34 220.71 257.54 63 66 \n",
"South 260.70 234.70 236.13 66 65 \n",
"West 239.77 218.63 244.56 43 69 \n",
"\n",
" \n",
"product_category Electronics Home & Garden \n",
"region \n",
"East 69 59 \n",
"North 69 69 \n",
"South 62 58 \n",
"West 58 64 \n",
"\n",
"Mixed aggregations for different metrics:\n",
" net_sales quantity total_order \\\n",
"sales_channel Online Phone Store Online Phone Store Online \n",
"region \n",
"East 34662.14 4164.23 16618.73 731 90 372 246.72 \n",
"North 36698.38 6597.29 18133.00 780 141 370 235.80 \n",
"South 38569.20 5475.73 18737.36 778 128 369 263.71 \n",
"West 38003.32 3745.70 11284.26 805 83 264 245.57 \n",
"\n",
" \n",
"sales_channel Phone Store \n",
"region \n",
"East 239.96 203.20 \n",
"North 243.26 240.63 \n",
"South 227.11 258.94 \n",
"West 204.47 213.05 \n"
]
}
],
"source": [
"# Multiple aggregation functions in one pivot table\n",
"print(\"=== MULTIPLE AGGREGATION FUNCTIONS ===\")\n",
"\n",
"# Multiple aggregations for comprehensive analysis\n",
"multi_agg_pivot = df_business.pivot_table(\n",
" values='net_sales',\n",
" index='region',\n",
" columns='product_category',\n",
" aggfunc=['sum', 'mean', 'count'],\n",
" fill_value=0\n",
")\n",
"\n",
"print(\"Multiple aggregations (sum, mean, count):\")\n",
"print(multi_agg_pivot.round(2))\n",
"\n",
"# Different values with different aggregations\n",
"mixed_agg_pivot = df_business.pivot_table(\n",
" values=['net_sales', 'quantity', 'total_order'],\n",
" index='region',\n",
" columns='sales_channel',\n",
" aggfunc={\n",
" 'net_sales': 'sum',\n",
" 'quantity': 'sum',\n",
" 'total_order': 'mean'\n",
" },\n",
" fill_value=0\n",
")\n",
"\n",
"print(\"\\nMixed aggregations for different metrics:\")\n",
"print(mixed_agg_pivot.round(2))"
]
},
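{
"cell_type": "markdown",
"metadata": {},
"source": [
"With a list of aggregation functions the columns become a `MultiIndex`: plain label indexing pulls out one aggregation block, and `.xs()` slices on the inner level. A small sketch reusing `multi_agg_pivot` from the cell above (not executed here)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Select one aggregation block from the MultiIndex columns\n",
"mean_only = multi_agg_pivot['mean']\n",
"print(\"Mean net sales by region and category:\")\n",
"print(mean_only.round(2))\n",
"\n",
"# .xs() slices the inner level instead: one category across all aggregations\n",
"books_all_aggs = multi_agg_pivot.xs('Books', axis=1, level='product_category')\n",
"print(\"\\nAll aggregations for Books:\")\n",
"print(books_all_aggs.round(2))"
]
},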
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== PIVOT TABLES WITH TOTALS ===\n",
"Pivot table with row and column totals:\n",
"product_category Books Clothing Electronics Home & Garden Total\n",
"region \n",
"East 12377.0 12466.0 16463.0 14140.0 55445.0\n",
"North 13492.0 14938.0 15229.0 17770.0 61429.0\n",
"South 17590.0 16946.0 14552.0 13695.0 62782.0\n",
"West 8157.0 16544.0 12681.0 15652.0 53033.0\n",
"Total 51615.0 60894.0 58924.0 61257.0 232689.0\n",
"\n",
"Sales distribution as percentages:\n",
"product_category Books Clothing Electronics Home & Garden Total\n",
"region \n",
"East 5.32 5.36 7.07 6.08 23.83\n",
"North 5.80 6.42 6.54 7.64 26.40\n",
"South 7.56 7.28 6.25 5.89 26.98\n",
"West 3.51 7.11 5.45 6.73 22.79\n",
"Total 22.18 26.17 25.32 26.33 100.00\n"
]
}
],
"source": [
"# Pivot tables with totals and margins\n",
"print(\"=== PIVOT TABLES WITH TOTALS ===\")\n",
"\n",
"# Add margins (totals) to pivot table\n",
"pivot_with_totals = df_business.pivot_table(\n",
" values='net_sales',\n",
" index='region',\n",
" columns='product_category',\n",
" aggfunc='sum',\n",
" fill_value=0,\n",
" margins=True,\n",
" margins_name='Total'\n",
")\n",
"\n",
"print(\"Pivot table with row and column totals:\")\n",
"print(pivot_with_totals.round(0))\n",
"\n",
"# Calculate percentages of total\n",
"pivot_percentages = df_business.pivot_table(\n",
" values='net_sales',\n",
" index='region',\n",
" columns='product_category',\n",
" aggfunc='sum',\n",
" fill_value=0,\n",
" margins=True,\n",
" margins_name='Total'\n",
")\n",
"\n",
"# Convert to percentages (excluding totals row/column for calculation)\n",
"total_sales = pivot_percentages.loc['Total', 'Total']\n",
"pivot_pct = (pivot_percentages / total_sales * 100).round(2)\n",
"\n",
"print(\"\\nSales distribution as percentages:\")\n",
"print(pivot_pct)"
]
},
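{
"cell_type": "markdown",
"metadata": {},
"source": [
"The table above expresses every cell as a share of overall sales. Normalising each row instead shows the category mix within a region; a sketch (not executed here) that reuses `basic_pivot` from the first pivot example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Category mix within each region: divide each row by its own total (rows sum to 100%)\n",
"region_mix = basic_pivot.div(basic_pivot.sum(axis=1), axis=0) * 100\n",
"\n",
"print(\"Category mix within each region (%):\")\n",
"print(region_mix.round(1))"
]
},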
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Advanced Pivot Table Techniques\n",
"\n",
"Complex pivot tables with multiple indices and custom aggregations."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== MULTI-LEVEL INDEX PIVOT TABLES ===\n",
"Hierarchical pivot (Region > Salesperson vs Product Category):\n",
"product_category Books Clothing Electronics Home & Garden\n",
"region salesperson \n",
"East Alice Johnson 1537.0605 838.5400 1792.4760 1693.2745\n",
" Bob Smith 2039.4035 1156.4895 1110.9565 950.9340\n",
" Charlie Brown 435.2560 952.0800 2610.3275 1098.6095\n",
" Diana Prince 1874.3990 1845.6785 1898.2705 1610.8440\n",
" Eve Wilson 1585.4275 1534.6530 1395.2180 1179.5475\n",
" Frank Miller 1106.1950 1476.6940 945.8550 988.5600\n",
" Grace Lee 1556.9905 1298.2700 865.6170 938.2240\n",
" Henry Davis 875.1640 725.9365 2336.0330 1322.1800\n",
" Ivy Chen 797.3240 1608.6365 1110.4660 1724.6260\n",
" Jack Robinson 569.6290 1028.8020 2397.5045 2632.9480\n",
"North Alice Johnson 2314.9745 2757.4270 1911.9930 1856.1875\n",
" Bob Smith 1031.4940 489.7395 801.7940 796.1610\n",
" Charlie Brown 498.9095 1879.5460 1790.1620 1393.1530\n",
" Diana Prince 214.3950 1484.9535 1053.2905 2467.8815\n",
" Eve Wilson 2040.3085 1654.9070 1178.6505 1498.5810\n",
" Frank Miller 2509.5095 1490.6560 1264.5290 766.4540\n",
" Grace Lee 748.8710 1293.0970 833.7205 2235.9345\n",
" Henry Davis 815.0200 1089.6130 1072.8980 1280.2690\n",
" Ivy Chen 2411.9690 1079.7175 1762.1170 1244.5645\n",
" Jack Robinson 906.2425 1718.6920 3559.5100 4230.7635\n",
"\n",
"Hierarchical columns (Product Category > Customer Type):\n",
"product_category Books Clothing \\\n",
"customer_type New Returning VIP New Returning VIP \n",
"region \n",
"East 2077.0 7770.0 2530.0 2680.0 6856.0 2929.0 \n",
"North 2546.0 7153.0 3792.0 2959.0 7514.0 4465.0 \n",
"South 2831.0 10393.0 4366.0 6472.0 8685.0 1788.0 \n",
"West 3128.0 2906.0 2123.0 4974.0 8570.0 3000.0 \n",
"\n",
"product_category Electronics Home & Garden \n",
"customer_type New Returning VIP New Returning VIP \n",
"region \n",
"East 4562.0 7630.0 4271.0 3974.0 7367.0 2799.0 \n",
"North 6068.0 6693.0 2467.0 4970.0 10655.0 2145.0 \n",
"South 4919.0 6629.0 3004.0 2979.0 7527.0 3189.0 \n",
"West 3013.0 7416.0 2252.0 4220.0 7502.0 3929.0 \n",
"\n",
"Full hierarchical pivot (limited sample):\n",
"product_category Books Clothing \n",
"sales_channel Online Phone Store Online Phone Store\n",
"region quarter \n",
"East 1 2693.0 261.0 1483.0 1873.0 573.0 2045.0\n",
" 2 1735.0 233.0 1766.0 4208.0 644.0 529.0\n",
" 3 1358.0 0.0 1284.0 867.0 0.0 225.0\n",
" 4 1036.0 0.0 527.0 998.0 0.0 503.0\n",
"North 1 2012.0 0.0 1019.0 1987.0 948.0 1759.0\n",
" 2 1806.0 375.0 2181.0 1836.0 471.0 499.0\n",
" 3 2210.0 348.0 806.0 2702.0 305.0 311.0\n",
" 4 1649.0 0.0 1086.0 2437.0 0.0 1682.0\n"
]
}
],
"source": [
"# Multi-level index pivot tables\n",
"print(\"=== MULTI-LEVEL INDEX PIVOT TABLES ===\")\n",
"\n",
"# Hierarchical rows\n",
"hierarchical_pivot = df_business.pivot_table(\n",
" values='net_sales',\n",
" index=['region', 'salesperson'],\n",
" columns='product_category',\n",
" aggfunc='sum',\n",
" fill_value=0\n",
")\n",
"\n",
"print(\"Hierarchical pivot (Region > Salesperson vs Product Category):\")\n",
"print(hierarchical_pivot.head(20))\n",
"\n",
"# Hierarchical columns\n",
"hierarchical_cols_pivot = df_business.pivot_table(\n",
" values='net_sales',\n",
" index='region',\n",
" columns=['product_category', 'customer_type'],\n",
" aggfunc='sum',\n",
" fill_value=0\n",
")\n",
"\n",
"print(\"\\nHierarchical columns (Product Category > Customer Type):\")\n",
"print(hierarchical_cols_pivot.round(0))\n",
"\n",
"# Both hierarchical rows and columns\n",
"full_hierarchical = df_business.pivot_table(\n",
" values='net_sales',\n",
" index=['region', 'quarter'],\n",
" columns=['product_category', 'sales_channel'],\n",
" aggfunc='sum',\n",
" fill_value=0\n",
")\n",
"\n",
"print(\"\\nFull hierarchical pivot (limited sample):\")\n",
"print(full_hierarchical.iloc[:8, :6].round(0)) # Show subset for readability"
]
},
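{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hierarchical indexes are easiest to query with `.xs()` and `pd.IndexSlice`. The sketch below (not executed here) pulls slices out of `hierarchical_pivot` and `full_hierarchical` from the cell above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# One region's salespeople from the hierarchical row index\n",
"east_only = hierarchical_pivot.xs('East', level='region')\n",
"print(\"East salespeople by category:\")\n",
"print(east_only.round(0))\n",
"\n",
"# pd.IndexSlice for richer selections: Q1 rows, Online columns only\n",
"idx = pd.IndexSlice\n",
"q1_online = full_hierarchical.loc[idx[:, 1], idx[:, 'Online']]\n",
"print(\"\\nQ1 online sales by region and category:\")\n",
"print(q1_online.round(0))"
]
},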
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== CUSTOM AGGREGATION FUNCTIONS ===\n",
"Custom aggregations pivot table:\n",
" mean std \\\n",
"product_category Books Clothing Electronics Home & Garden Books Clothing \n",
"region \n",
"East 217.14 197.87 238.59 239.66 133.32 176.45 \n",
"North 214.15 226.34 220.71 257.54 144.81 157.82 \n",
"South 266.51 260.70 234.70 236.13 153.97 158.86 \n",
"West 189.69 239.77 218.63 244.56 125.47 163.81 \n",
"\n",
" coefficient_of_variation \\\n",
"product_category Electronics Home & Garden Books Clothing \n",
"region \n",
"East 154.31 150.89 0.61 0.89 \n",
"North 165.37 174.64 0.68 0.70 \n",
"South 175.74 165.01 0.58 0.61 \n",
"West 143.66 167.40 0.66 0.68 \n",
"\n",
" sales_range \\\n",
"product_category Electronics Home & Garden Books Clothing Electronics \n",
"region \n",
"East 0.65 0.63 648.70 758.44 697.50 \n",
"North 0.75 0.68 531.31 663.18 752.50 \n",
"South 0.75 0.70 634.55 641.48 773.92 \n",
"West 0.66 0.68 436.05 629.05 638.34 \n",
"\n",
" \n",
"product_category Home & Garden \n",
"region \n",
"East 583.40 \n",
"North 619.36 \n",
"South 659.43 \n",
"West 656.63 \n",
"\n",
"Lambda function aggregations:\n",
" <lambda_0> <lambda_1> <lambda_2> \\\n",
"sales_channel Online Phone Store Online Phone Store Online \n",
"region \n",
"East 330.66 337.74 265.63 59 9 38 109.56 \n",
"North 340.40 317.78 358.12 72 15 34 318.50 \n",
"South 375.57 293.52 359.79 66 11 35 156.12 \n",
"West 352.13 323.46 267.42 69 8 22 54.42 \n",
"\n",
" mean \n",
"sales_channel Phone Store Online Phone Store \n",
"region \n",
"East 16.17 168.84 239.05 231.35 195.51 \n",
"North 72.98 38.41 227.94 235.62 232.47 \n",
"South 42.03 14.63 255.43 219.03 249.83 \n",
"West 53.92 53.58 237.52 197.14 205.17 \n"
]
}
],
"source": [
"# Custom aggregation functions\n",
"print(\"=== CUSTOM AGGREGATION FUNCTIONS ===\")\n",
"\n",
"# Define custom aggregation functions\n",
"def coefficient_of_variation(series):\n",
" \"\"\"Calculate coefficient of variation (std/mean)\"\"\"\n",
" return series.std() / series.mean() if series.mean() != 0 else 0\n",
"\n",
"def sales_range(series):\n",
" \"\"\"Calculate range (max - min)\"\"\"\n",
" return series.max() - series.min()\n",
"\n",
"def high_value_count(series, threshold=100):\n",
" \"\"\"Count values above threshold\"\"\"\n",
" return (series > threshold).sum()\n",
"\n",
"# Apply custom aggregations\n",
"custom_agg_pivot = df_business.pivot_table(\n",
" values='net_sales',\n",
" index='region',\n",
" columns='product_category',\n",
" aggfunc=[np.mean, np.std, coefficient_of_variation, sales_range],\n",
" fill_value=0\n",
")\n",
"\n",
"print(\"Custom aggregations pivot table:\")\n",
"print(custom_agg_pivot.round(2))\n",
"\n",
"# Lambda functions for inline custom aggregations\n",
"lambda_agg_pivot = df_business.pivot_table(\n",
" values='net_sales',\n",
" index='region',\n",
" columns='sales_channel',\n",
" aggfunc={\n",
" 'net_sales': [\n",
" 'mean',\n",
" lambda x: x.quantile(0.75), # 75th percentile\n",
" lambda x: (x > x.mean()).sum(), # Count above average\n",
" lambda x: x.max() / x.min() if x.min() > 0 else 0 # Max/Min ratio\n",
" ]\n",
" },\n",
" fill_value=0\n",
")\n",
"\n",
"print(\"\\nLambda function aggregations:\")\n",
"print(lambda_agg_pivot.round(2))"
]
},
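{
"cell_type": "markdown",
"metadata": {},
"source": [
"`high_value_count()` above is defined but never wired into a pivot. One way to pass a parameterised helper to `aggfunc` is a lambda wrapper; the threshold of 200 below is an arbitrary illustrative choice. A sketch reusing `df_business` (not executed here)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Count orders above a chosen threshold per region and channel\n",
"# (threshold=200 is an arbitrary cut-off for illustration)\n",
"high_value_pivot = df_business.pivot_table(\n",
"    values='net_sales',\n",
"    index='region',\n",
"    columns='sales_channel',\n",
"    aggfunc=lambda s: high_value_count(s, threshold=200),\n",
"    fill_value=0\n",
")\n",
"\n",
"print(\"Orders with net sales above 200 by region and channel:\")\n",
"print(high_value_pivot)"
]
},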
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== TIME-BASED PIVOT TABLES ===\n",
"Monthly sales by product category:\n",
"product_category Books Clothing Electronics Home & Garden\n",
"month_name \n",
"January 4813.0 7034.0 4631.0 3365.0\n",
"February 4022.0 8492.0 3016.0 5593.0\n",
"March 6809.0 3461.0 5373.0 6898.0\n",
"April 5464.0 5000.0 5719.0 5014.0\n",
"May 5767.0 4023.0 5243.0 6586.0\n",
"June 4104.0 8360.0 7231.0 2890.0\n",
"July 5149.0 3183.0 6009.0 4328.0\n",
"August 3052.0 6048.0 6709.0 5114.0\n",
"September 4366.0 3026.0 4470.0 8162.0\n",
"October 3442.0 4719.0 2349.0 4465.0\n",
"November 2562.0 3316.0 3646.0 5111.0\n",
"December 2064.0 4231.0 4528.0 3729.0\n",
"\n",
"Day of week analysis:\n",
" net_sales quantity \n",
"sales_channel Online Phone Store Online Phone Store\n",
"day_of_week \n",
"Monday 18611.51 3866.50 10728.14 4.80 5.14 4.91\n",
"Tuesday 19606.58 3841.09 9615.84 4.90 5.75 4.68\n",
"Wednesday 22785.70 945.00 9312.78 5.14 4.60 5.00\n",
"Thursday 22420.93 5605.94 8475.84 5.35 5.75 4.89\n",
"Friday 21358.62 1274.59 11041.92 5.06 3.18 5.36\n",
"Saturday 24008.00 1851.37 6557.88 4.85 4.10 4.06\n",
"Sunday 19141.68 2598.45 9040.94 4.99 4.57 3.88\n"
]
}
],
"source": [
"# Time-based pivot tables\n",
"print(\"=== TIME-BASED PIVOT TABLES ===\")\n",
"\n",
"# Monthly sales trends by product category\n",
"monthly_sales = df_business.pivot_table(\n",
" values='net_sales',\n",
" index='month_name',\n",
" columns='product_category',\n",
" aggfunc='sum',\n",
" fill_value=0\n",
")\n",
"\n",
"# Reorder months correctly\n",
"month_order = ['January', 'February', 'March', 'April', 'May', 'June',\n",
" 'July', 'August', 'September', 'October', 'November', 'December']\n",
"monthly_sales = monthly_sales.reindex([m for m in month_order if m in monthly_sales.index])\n",
"\n",
"print(\"Monthly sales by product category:\")\n",
"print(monthly_sales.round(0))\n",
"\n",
"# Day of week analysis\n",
"dow_analysis = df_business.pivot_table(\n",
" values=['net_sales', 'quantity'],\n",
" index='day_of_week',\n",
" columns='sales_channel',\n",
" aggfunc={\n",
" 'net_sales': 'sum',\n",
" 'quantity': 'mean'\n",
" },\n",
" fill_value=0\n",
")\n",
"\n",
"# Reorder days of week\n",
"day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']\n",
"dow_analysis = dow_analysis.reindex([d for d in day_order if d in dow_analysis.index])\n",
"\n",
"print(\"\\nDay of week analysis:\")\n",
"print(dow_analysis.round(2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Cross-tabulation and Contingency Tables\n",
"\n",
"Analyzing relationships between categorical variables."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== CROSS-TABULATION ===\n",
"Transaction count by Region and Customer Type:\n",
"customer_type New Returning VIP\n",
"region \n",
"East 60 128 60\n",
"North 70 144 53\n",
"South 68 138 45\n",
"West 61 119 54\n",
"\n",
"Total sales by Region and Customer Type:\n",
"customer_type New Returning VIP\n",
"region \n",
"East 13293.0 29623.0 12529.0\n",
"North 16544.0 32016.0 12869.0\n",
"South 17201.0 33234.0 12347.0\n",
"West 15336.0 26394.0 11304.0\n",
"\n",
"Transaction count by Product Category and Sales Channel (with totals):\n",
"sales_channel Online Phone Store Total\n",
"product_category \n",
"Books 140 15 74 229\n",
"Clothing 169 26 68 263\n",
"Electronics 166 21 71 258\n",
"Home & Garden 142 28 80 250\n",
"Total 617 90 293 1000\n"
]
}
],
"source": [
"# Basic cross-tabulation\n",
"print(\"=== CROSS-TABULATION ===\")\n",
"\n",
"# Simple crosstab - count of transactions\n",
"basic_crosstab = pd.crosstab(df_business['region'], df_business['customer_type'])\n",
"print(\"Transaction count by Region and Customer Type:\")\n",
"print(basic_crosstab)\n",
"\n",
"# Crosstab with values (not just counts)\n",
"sales_crosstab = pd.crosstab(\n",
" df_business['region'],\n",
" df_business['customer_type'],\n",
" values=df_business['net_sales'],\n",
" aggfunc='sum'\n",
")\n",
"print(\"\\nTotal sales by Region and Customer Type:\")\n",
"print(sales_crosstab.round(0))\n",
"\n",
"# Crosstab with margins (totals)\n",
"crosstab_with_margins = pd.crosstab(\n",
" df_business['product_category'],\n",
" df_business['sales_channel'],\n",
" margins=True,\n",
" margins_name='Total'\n",
")\n",
"print(\"\\nTransaction count by Product Category and Sales Channel (with totals):\")\n",
"print(crosstab_with_margins)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== NORMALIZED CROSS-TABULATION ===\n",
"Row percentages (Customer type distribution within each region):\n",
"customer_type New Returning VIP\n",
"region \n",
"East 24.2 51.6 24.2\n",
"North 26.2 53.9 19.9\n",
"South 27.1 55.0 17.9\n",
"West 26.1 50.9 23.1\n",
"\n",
"Column percentages (Region distribution within each customer type):\n",
"customer_type New Returning VIP\n",
"region \n",
"East 23.2 24.2 28.3\n",
"North 27.0 27.2 25.0\n",
"South 26.3 26.1 21.2\n",
"West 23.6 22.5 25.5\n",
"\n",
"Total percentages (Percentage of overall total):\n",
"customer_type New Returning VIP\n",
"region \n",
"East 6.0 12.8 6.0\n",
"North 7.0 14.4 5.3\n",
"South 6.8 13.8 4.5\n",
"West 6.1 11.9 5.4\n"
]
}
],
"source": [
"# Normalized cross-tabulation (percentages)\n",
"print(\"=== NORMALIZED CROSS-TABULATION ===\")\n",
"\n",
"# Normalize by rows (percentage of each row)\n",
"crosstab_row_pct = pd.crosstab(\n",
" df_business['region'],\n",
" df_business['customer_type'],\n",
" normalize='index' # Normalize by rows\n",
") * 100\n",
"\n",
"print(\"Row percentages (Customer type distribution within each region):\")\n",
"print(crosstab_row_pct.round(1))\n",
"\n",
"# Normalize by columns (percentage of each column)\n",
"crosstab_col_pct = pd.crosstab(\n",
" df_business['region'],\n",
" df_business['customer_type'],\n",
" normalize='columns' # Normalize by columns\n",
") * 100\n",
"\n",
"print(\"\\nColumn percentages (Region distribution within each customer type):\")\n",
"print(crosstab_col_pct.round(1))\n",
"\n",
"# Normalize by total (percentage of grand total)\n",
"crosstab_total_pct = pd.crosstab(\n",
" df_business['region'],\n",
" df_business['customer_type'],\n",
" normalize='all' # Normalize by total\n",
") * 100\n",
"\n",
"print(\"\\nTotal percentages (Percentage of overall total):\")\n",
"print(crosstab_total_pct.round(1))"
]
},
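{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick way to judge whether region and customer type look related is to compare the observed counts with the counts expected under independence (row total × column total / grand total). The sketch below reuses `basic_crosstab` from above and is not executed here."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Expected counts under independence: outer product of the marginal totals / grand total\n",
"row_totals = basic_crosstab.sum(axis=1)\n",
"col_totals = basic_crosstab.sum(axis=0)\n",
"grand_total = basic_crosstab.values.sum()\n",
"\n",
"expected = pd.DataFrame(\n",
"    np.outer(row_totals, col_totals) / grand_total,\n",
"    index=basic_crosstab.index,\n",
"    columns=basic_crosstab.columns\n",
")\n",
"\n",
"print(\"Expected counts if region and customer type were independent:\")\n",
"print(expected.round(1))\n",
"\n",
"print(\"\\nObserved minus expected:\")\n",
"print((basic_crosstab - expected).round(1))"
]
},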
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== MULTI-DIMENSIONAL CROSS-TABULATION ===\n",
"Three-way analysis (Region & Product Category vs Customer Type):\n",
"customer_type New Returning VIP\n",
"region product_category \n",
"East Books 188.82 228.53 210.82\n",
" Clothing 206.15 221.17 154.18\n",
" Electronics 240.11 224.41 266.92\n",
" Home & Garden 233.78 254.03 215.29\n",
"North Books 254.64 183.41 270.87\n",
" Clothing 211.37 208.73 279.06\n",
" Electronics 233.40 230.79 176.23\n",
" Home & Garden 248.49 266.38 238.33\n",
"South Books 202.21 273.49 311.87\n",
" Clothing 340.66 234.73 198.67\n",
" Electronics 245.96 220.96 250.30\n",
" Home & Garden 198.59 228.10 318.92\n",
"West Books 223.46 193.70 151.62\n",
" Clothing 248.72 231.62 250.01\n",
" Electronics 251.07 211.88 204.71\n",
" Home & Garden 281.34 234.44 231.13\n",
"\n",
"Multiple aggregations crosstab:\n",
" count sum mean \\\n",
"sales_channel Online Phone Store Online Phone Store Online Phone \n",
"region \n",
"East 145 18 85 34662.14 4164.23 16618.73 239.05 231.35 \n",
"North 161 28 78 36698.38 6597.29 18133.00 227.94 235.62 \n",
"South 151 25 75 38569.20 5475.73 18737.36 255.43 219.03 \n",
"West 160 19 55 38003.32 3745.70 11284.26 237.52 197.14 \n",
"\n",
" \n",
"sales_channel Store \n",
"region \n",
"East 195.51 \n",
"North 232.47 \n",
"South 249.83 \n",
"West 205.17 \n"
]
}
],
"source": [
"# Multi-dimensional cross-tabulation\n",
"print(\"=== MULTI-DIMENSIONAL CROSS-TABULATION ===\")\n",
"\n",
"# Three-way crosstab\n",
"three_way_crosstab = pd.crosstab(\n",
" [df_business['region'], df_business['product_category']],\n",
" df_business['customer_type'],\n",
" values=df_business['net_sales'],\n",
" aggfunc='mean'\n",
")\n",
"\n",
"print(\"Three-way analysis (Region & Product Category vs Customer Type):\")\n",
"print(three_way_crosstab.round(2))\n",
"\n",
"# Analysis with multiple aggregations\n",
"multi_agg_crosstab = pd.crosstab(\n",
" df_business['region'],\n",
" df_business['sales_channel'],\n",
" values=df_business['net_sales'],\n",
" aggfunc=['count', 'sum', 'mean']\n",
")\n",
"\n",
"print(\"\\nMultiple aggregations crosstab:\")\n",
"print(multi_agg_crosstab.round(2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Data Reshaping: Melt, Pivot, Stack, Unstack\n",
"\n",
"Transforming data between wide and long formats."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== MELTING DATA (WIDE TO LONG) ===\n",
"Wide format data:\n",
"product_category salesperson Books Clothing Electronics \\\n",
"0 Alice Johnson 7260.4360 7069.2950 8358.1560 \n",
"1 Bob Smith 4511.1555 4336.2970 3679.3625 \n",
"2 Charlie Brown 3677.6895 7830.9720 10264.1605 \n",
"3 Diana Prince 4861.2975 6275.8940 3348.5570 \n",
"4 Eve Wilson 6363.1345 7208.5765 5191.2705 \n",
"\n",
"product_category Home & Garden \n",
"0 5283.4600 \n",
"1 5164.6310 \n",
"2 4281.9240 \n",
"3 6108.1495 \n",
"4 6582.1870 \n",
"\n",
"Long format data (melted):\n",
" salesperson product_category total_sales\n",
"0 Alice Johnson Books 7260.4360\n",
"1 Bob Smith Books 4511.1555\n",
"2 Charlie Brown Books 3677.6895\n",
"3 Diana Prince Books 4861.2975\n",
"4 Eve Wilson Books 6363.1345\n",
"5 Frank Miller Books 5489.7555\n",
"6 Grace Lee Books 5945.8655\n",
"7 Henry Davis Books 5273.0950\n",
"8 Ivy Chen Books 4812.9940\n",
"9 Jack Robinson Books 3419.5195\n",
"\n",
"Complex wide format:\n",
"product_category region salesperson Books Clothing Electronics \\\n",
"0 East Alice Johnson 1537.0605 838.5400 1792.4760 \n",
"1 East Bob Smith 2039.4035 1156.4895 1110.9565 \n",
"2 East Charlie Brown 435.2560 952.0800 2610.3275 \n",
"3 East Diana Prince 1874.3990 1845.6785 1898.2705 \n",
"4 East Eve Wilson 1585.4275 1534.6530 1395.2180 \n",
"\n",
"product_category Home & Garden \n",
"0 1693.2745 \n",
"1 950.9340 \n",
"2 1098.6095 \n",
"3 1610.8440 \n",
"4 1179.5475 \n",
"\n",
"Complex long format:\n",
" region salesperson product_category total_sales\n",
"0 East Alice Johnson Books 1537.0605\n",
"1 East Bob Smith Books 2039.4035\n",
"2 East Charlie Brown Books 435.2560\n",
"3 East Diana Prince Books 1874.3990\n",
"4 East Eve Wilson Books 1585.4275\n",
"5 East Frank Miller Books 1106.1950\n",
"6 East Grace Lee Books 1556.9905\n",
"7 East Henry Davis Books 875.1640\n",
"8 East Ivy Chen Books 797.3240\n",
"9 East Jack Robinson Books 569.6290\n"
]
}
],
"source": [
"# Melting data from wide to long format\n",
"print(\"=== MELTING DATA (WIDE TO LONG) ===\")\n",
"\n",
"# Create a wide format dataset first\n",
"wide_sales = df_business.pivot_table(\n",
" values='net_sales',\n",
" index='salesperson',\n",
" columns='product_category',\n",
" aggfunc='sum',\n",
" fill_value=0\n",
").reset_index()\n",
"\n",
"print(\"Wide format data:\")\n",
"print(wide_sales.head())\n",
"\n",
"# Melt to long format\n",
"long_sales = pd.melt(\n",
" wide_sales,\n",
" id_vars=['salesperson'],\n",
" var_name='product_category',\n",
" value_name='total_sales'\n",
")\n",
"\n",
"print(\"\\nLong format data (melted):\")\n",
"print(long_sales.head(10))\n",
"\n",
"# Melt with multiple ID variables\n",
"# First create a more complex wide dataset\n",
"complex_wide = df_business.groupby(['region', 'salesperson', 'product_category'])['net_sales'].sum().unstack(fill_value=0).reset_index()\n",
"\n",
"print(\"\\nComplex wide format:\")\n",
"print(complex_wide.head())\n",
"\n",
"# Melt with multiple ID vars\n",
"complex_long = pd.melt(\n",
" complex_wide,\n",
" id_vars=['region', 'salesperson'],\n",
" var_name='product_category',\n",
" value_name='total_sales'\n",
")\n",
"\n",
"print(\"\\nComplex long format:\")\n",
"print(complex_long.head(10))"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== PIVOT OPERATION (LONG TO WIDE) ===\n",
"Long format data sample:\n",
" region product_category customer_type net_sales\n",
"0 North Clothing Returning 72.810\n",
"1 East Clothing Returning 764.847\n",
"2 West Books VIP 27.660\n",
"3 West Clothing New 252.080\n",
"4 East Clothing VIP 220.014\n",
"\n",
"Pivoted to wide format:\n",
"product_category Books Clothing Electronics Home & Garden\n",
"region \n",
"East 12377.0 12466.0 16463.0 14140.0\n",
"North 13492.0 14938.0 15229.0 17770.0\n",
"South 17590.0 16946.0 14552.0 13695.0\n",
"West 8157.0 16544.0 12681.0 15652.0\n",
"\n",
"Pivoted with reset index:\n",
"product_category region Books Clothing Electronics Home & Garden\n",
"0 East 12376.8490 12465.7800 16462.7240 14139.7475\n",
"1 North 13491.6935 14938.3485 15228.6645 17769.9495\n",
"2 South 17589.7640 16945.5435 14551.5775 13695.4025\n",
"3 West 8156.6360 16544.4195 12680.5820 15651.6335\n"
]
}
],
"source": [
"# Pivot operation (long to wide)\n",
"print(\"=== PIVOT OPERATION (LONG TO WIDE) ===\")\n",
"\n",
"# Create long format data\n",
"long_data = df_business[['region', 'product_category', 'customer_type', 'net_sales']].copy()\n",
"print(\"Long format data sample:\")\n",
"print(long_data.head())\n",
"\n",
"# Simple pivot\n",
"pivoted_data = long_data.pivot_table(\n",
" values='net_sales',\n",
" index='region',\n",
" columns='product_category',\n",
" aggfunc='sum'\n",
")\n",
"\n",
"print(\"\\nPivoted to wide format:\")\n",
"print(pivoted_data.round(0))\n",
"\n",
"# Reset index to make it a regular DataFrame\n",
"pivoted_reset = pivoted_data.reset_index()\n",
"print(\"\\nPivoted with reset index:\")\n",
"print(pivoted_reset.head())"
]
},
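{
"cell_type": "markdown",
"metadata": {},
"source": [
"`pivot()` requires each index/column pair to be unique; with duplicate pairs it raises a `ValueError`, which is exactly the situation `pivot_table()` handles via aggregation. A sketch on the raw `long_data` from above (not executed here)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Raw transactions contain many rows per (region, product_category) pair,\n",
"# so pivot() cannot reshape them without aggregation\n",
"try:\n",
"    long_data.pivot(index='region', columns='product_category', values='net_sales')\n",
"except ValueError as exc:\n",
"    print(f\"pivot() failed as expected: {exc}\")\n",
"\n",
"# pivot_table() resolves the duplicates with an aggregation function\n",
"print(long_data.pivot_table(index='region', columns='product_category',\n",
"                            values='net_sales', aggfunc='sum').round(0))"
]
},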
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== STACK AND UNSTACK OPERATIONS ===\n",
"Multi-index DataFrame:\n",
"customer_type New Returning VIP\n",
"region product_category \n",
"East Books 2077.0530 7769.9840 2529.8120\n",
" Clothing 2679.9640 6856.4155 2929.4005\n",
" Electronics 4562.0515 7629.9035 4270.7690\n",
" Home & Garden 3974.1930 7366.8250 2798.7295\n",
"North Books 2546.4320 7153.0835 3792.1780\n",
" Clothing 2959.1455 7514.3230 4464.8800\n",
" Electronics 6068.4815 6692.9180 2467.2650\n",
" Home & Garden 4969.7755 10655.2095 2144.9645\n",
"South Books 2830.9150 10392.7060 4366.1430\n",
" Clothing 6472.4515 8685.0290 1788.0630\n",
"\n",
"After stacking (columns become rows):\n",
"region product_category customer_type\n",
"East Books New 2077.0530\n",
" Returning 7769.9840\n",
" VIP 2529.8120\n",
" Clothing New 2679.9640\n",
" Returning 6856.4155\n",
" VIP 2929.4005\n",
" Electronics New 4562.0515\n",
" Returning 7629.9035\n",
" VIP 4270.7690\n",
" Home & Garden New 3974.1930\n",
"dtype: float64\n",
"\n",
"After unstacking (back to original):\n",
"customer_type New Returning VIP\n",
"region product_category \n",
"East Books 2077.0530 7769.9840 2529.8120\n",
" Clothing 2679.9640 6856.4155 2929.4005\n",
" Electronics 4562.0515 7629.9035 4270.7690\n",
" Home & Garden 3974.1930 7366.8250 2798.7295\n",
"North Books 2546.4320 7153.0835 3792.1780\n",
" Clothing 2959.1455 7514.3230 4464.8800\n",
" Electronics 6068.4815 6692.9180 2467.2650\n",
" Home & Garden 4969.7755 10655.2095 2144.9645\n",
"South Books 2830.9150 10392.7060 4366.1430\n",
" Clothing 6472.4515 8685.0290 1788.0630\n",
"\n",
"Unstacking level 0 (region):\n",
"region East North South West\n",
"product_category customer_type \n",
"Books New 2077.0530 2546.4320 2830.9150 3128.4435\n",
" Returning 7769.9840 7153.0835 10392.7060 2905.5140\n",
" VIP 2529.8120 3792.1780 4366.1430 2122.6785\n",
"Clothing New 2679.9640 2959.1455 6472.4515 4974.4750\n",
" Returning 6856.4155 7514.3230 8685.0290 8569.8735\n"
]
}
],
"source": [
"# Stack and Unstack operations\n",
"print(\"=== STACK AND UNSTACK OPERATIONS ===\")\n",
"\n",
"# Create a DataFrame with MultiIndex\n",
"multi_index_df = df_business.groupby(['region', 'product_category', 'customer_type'])['net_sales'].sum().reset_index()\n",
"multi_pivot = multi_index_df.pivot_table(\n",
" values='net_sales',\n",
" index=['region', 'product_category'],\n",
" columns='customer_type',\n",
" fill_value=0\n",
")\n",
"\n",
"print(\"Multi-index DataFrame:\")\n",
"print(multi_pivot.head(10))\n",
"\n",
"# Stack operation (columns to rows)\n",
"stacked = multi_pivot.stack()\n",
"print(\"\\nAfter stacking (columns become rows):\")\n",
"print(stacked.head(10))\n",
"\n",
"# Unstack operation (rows to columns)\n",
"unstacked = stacked.unstack()\n",
"print(\"\\nAfter unstacking (back to original):\")\n",
"print(unstacked.head(10))\n",
"\n",
"# Unstack different levels\n",
"unstacked_level0 = multi_pivot.stack().unstack(level=0)\n",
"print(\"\\nUnstacking level 0 (region):\")\n",
"print(unstacked_level0.head())"
]
},
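{
"cell_type": "markdown",
"metadata": {},
"source": [
"Levels can also be addressed by name rather than position, which keeps the intent readable, and `swaplevel()` plus `sort_index()` reorders a MultiIndex without reshaping. A sketch reusing `stacked` and `multi_pivot` from above (not executed here)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Same operation as unstack(level=0), but addressed by level name\n",
"by_region_columns = stacked.unstack(level='region')\n",
"print(\"Unstacked by level name ('region'):\")\n",
"print(by_region_columns.head())\n",
"\n",
"# Reorder the row MultiIndex of the original pivot without reshaping\n",
"swapped = multi_pivot.swaplevel('region', 'product_category').sort_index()\n",
"print(\"\\nProduct category as the outer row level:\")\n",
"print(swapped.head())"
]
},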
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Advanced Reshaping Techniques\n",
"\n",
"Complex data transformations for specialized analysis."
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== MULTIPLE VALUE COLUMNS MELTING ===\n",
"Wide format with multiple metrics:\n",
" region product_category net_sales quantity total_order\n",
"0 East Books 12376.8490 277 225.021912\n",
"1 East Clothing 12465.7800 266 205.978571\n",
"2 East Electronics 16462.7240 349 245.975420\n",
"3 East Home & Garden 14139.7475 301 247.297246\n",
"4 North Books 13491.6935 296 221.647357\n",
"\n",
"Long format with multiple metrics:\n",
" region product_category metric value\n",
"0 East Books net_sales 12376.8490\n",
"1 East Clothing net_sales 12465.7800\n",
"2 East Electronics net_sales 16462.7240\n",
"3 East Home & Garden net_sales 14139.7475\n",
"4 North Books net_sales 13491.6935\n",
"5 North Clothing net_sales 14938.3485\n",
"6 North Electronics net_sales 15228.6645\n",
"7 North Home & Garden net_sales 17769.9495\n",
"8 South Books net_sales 17589.7640\n",
"9 South Clothing net_sales 16945.5435\n",
"10 South Electronics net_sales 14551.5775\n",
"11 South Home & Garden net_sales 13695.4025\n",
"12 West Books net_sales 8156.6360\n",
"13 West Clothing net_sales 16544.4195\n",
"14 West Electronics net_sales 12680.5820\n",
"\n",
"Reshaping: Region-Metric vs Product Category:\n",
"product_category Books Clothing Electronics Home & Garden\n",
"region metric \n",
"East net_sales 12376.85 12465.78 16462.72 14139.75\n",
" quantity 277.00 266.00 349.00 301.00\n",
" total_order 225.02 205.98 245.98 247.30\n",
"North net_sales 13491.69 14938.35 15228.66 17769.95\n",
" quantity 296.00 313.00 320.00 362.00\n",
" total_order 221.65 234.40 228.86 265.49\n",
"South net_sales 17589.76 16945.54 14551.58 13695.40\n",
" quantity 356.00 327.00 301.00 291.00\n",
" total_order 274.84 269.23 243.38 244.65\n",
"West net_sales 8156.64 16544.42 12680.58 15651.63\n",
" quantity 217.00 365.00 271.00 299.00\n",
" total_order 198.11 246.91 226.43 253.21\n"
]
}
],
"source": [
"# Wide to long with multiple value columns\n",
"print(\"=== MULTIPLE VALUE COLUMNS MELTING ===\")\n",
"\n",
"# Create dataset with multiple metrics\n",
"metrics_wide = df_business.groupby(['region', 'product_category']).agg({\n",
" 'net_sales': 'sum',\n",
" 'quantity': 'sum',\n",
" 'total_order': 'mean'\n",
"}).reset_index()\n",
"\n",
"print(\"Wide format with multiple metrics:\")\n",
"print(metrics_wide.head())\n",
"\n",
"# Melt multiple value columns\n",
"metrics_long = pd.melt(\n",
" metrics_wide,\n",
" id_vars=['region', 'product_category'],\n",
" value_vars=['net_sales', 'quantity', 'total_order'],\n",
" var_name='metric',\n",
" value_name='value'\n",
")\n",
"\n",
"print(\"\\nLong format with multiple metrics:\")\n",
"print(metrics_long.head(15))\n",
"\n",
"# Alternative: melt and then pivot for different structure\n",
"metrics_pivot = metrics_long.pivot_table(\n",
" values='value',\n",
" index=['region', 'metric'],\n",
" columns='product_category',\n",
" fill_value=0\n",
")\n",
"\n",
"print(\"\\nReshaping: Region-Metric vs Product Category:\")\n",
"print(metrics_pivot.round(2))"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== TIME SERIES PIVOT TABLES ===\n",
"Daily sales pivot (first 10 days):\n",
"product_category Books Clothing Electronics Home & Garden\n",
"date \n",
"2024-01-01 NaN 73.0 NaN NaN\n",
"2024-01-02 NaN 765.0 NaN NaN\n",
"2024-01-03 28.0 NaN NaN NaN\n",
"2024-01-04 NaN 252.0 NaN NaN\n",
"2024-01-05 NaN 220.0 NaN NaN\n",
"2024-01-06 112.0 NaN NaN NaN\n",
"2024-01-07 NaN NaN NaN 172.0\n",
"2024-01-08 NaN NaN 201.0 NaN\n",
"2024-01-09 NaN NaN NaN 153.0\n",
"2024-01-10 24.0 NaN NaN NaN\n",
"\n",
"7-day rolling average (sample):\n",
"product_category Books Clothing Electronics Home & Garden\n",
"date \n",
"2026-09-22 32.15 0.0 60.29 191.03\n",
"2026-09-23 32.15 0.0 60.29 184.35\n",
"2026-09-24 18.09 0.0 84.62 184.35\n",
"2026-09-25 57.20 0.0 84.62 89.29\n",
"2026-09-26 57.20 0.0 113.45 89.29\n",
"\n",
"Monthly sales:\n",
"product_category Books Clothing Electronics Home & Garden\n",
"period \n",
"2024-01 1393.0 2789.0 983.0 1657.0\n",
"2024-02 2322.0 2198.0 877.0 2445.0\n",
"2024-03 2665.0 1029.0 1393.0 1555.0\n",
"2024-04 1722.0 1728.0 2782.0 2171.0\n",
"2024-05 2191.0 1795.0 1068.0 2092.0\n",
"... ... ... ... ...\n",
"2026-05 1651.0 1084.0 2314.0 2334.0\n",
"2026-06 2077.0 2676.0 1940.0 261.0\n",
"2026-07 1445.0 811.0 2807.0 1512.0\n",
"2026-08 104.0 2701.0 2219.0 1606.0\n",
"2026-09 1570.0 285.0 1295.0 3671.0\n",
"\n",
"[33 rows x 4 columns]\n",
"\n",
"Month-over-month growth (%):\n",
"product_category Books Clothing Electronics Home & Garden\n",
"period \n",
"2024-01 NaN NaN NaN NaN\n",
"2024-02 66.7 -21.2 -10.7 47.5\n",
"2024-03 14.8 -53.2 58.8 -36.4\n",
"2024-04 -35.4 67.9 99.7 39.6\n",
"2024-05 27.2 3.9 -61.6 -3.6\n",
"... ... ... ... ...\n",
"2026-05 -49.0 -24.0 88.6 203.8\n",
"2026-06 25.8 146.9 -16.2 -88.8\n",
"2026-07 -30.4 -69.7 44.7 479.7\n",
"2026-08 -92.8 233.0 -21.0 6.2\n",
"2026-09 1403.9 -89.4 -41.6 128.5\n",
"\n",
"[33 rows x 4 columns]\n"
]
}
],
"source": [
"# Creating time series pivot tables\n",
"print(\"=== TIME SERIES PIVOT TABLES ===\")\n",
"\n",
"# Daily sales by product category\n",
"daily_sales = df_business.groupby(['date', 'product_category'])['net_sales'].sum().reset_index()\n",
"daily_pivot = daily_sales.pivot(index='date', columns='product_category', values='net_sales')\n",
"\n",
"print(\"Daily sales pivot (first 10 days):\")\n",
"print(daily_pivot.head(10).round(0))\n",
"\n",
"# Fill missing values and calculate rolling averages\n",
"daily_pivot_filled = daily_pivot.fillna(0)\n",
"rolling_avg = daily_pivot_filled.rolling(window=7).mean()\n",
"\n",
"print(\"\\n7-day rolling average (sample):\")\n",
"print(rolling_avg.tail(5).round(2))\n",
"\n",
"# Month-over-month growth\n",
"monthly_sales = df_business.groupby(['year', 'month', 'product_category'])['net_sales'].sum().reset_index()\n",
"monthly_sales['period'] = monthly_sales['year'].astype(str) + '-' + monthly_sales['month'].astype(str).str.zfill(2)\n",
"monthly_pivot = monthly_sales.pivot(index='period', columns='product_category', values='net_sales')\n",
"\n",
"print(\"\\nMonthly sales:\")\n",
"print(monthly_pivot.round(0))\n",
"\n",
"# Calculate month-over-month growth\n",
"mom_growth = monthly_pivot.pct_change() * 100\n",
"print(\"\\nMonth-over-month growth (%):\")\n",
"print(mom_growth.round(1))"
]
},
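{
"cell_type": "markdown",
"metadata": {},
"source": [
"The manual `year`/`month` string key above can also be built with `dt.to_period('M')`, which sorts chronologically by construction. A sketch (not executed here) reusing `df_business`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Monthly sales per category using a PeriodIndex instead of a string key\n",
"monthly_period = (\n",
"    df_business\n",
"    .groupby([df_business['date'].dt.to_period('M'), 'product_category'])['net_sales']\n",
"    .sum()\n",
"    .unstack(fill_value=0)\n",
")\n",
"\n",
"print(\"Monthly sales (PeriodIndex):\")\n",
"print(monthly_period.head())\n",
"\n",
"print(\"\\nMonth-over-month growth (%):\")\n",
"print((monthly_period.pct_change() * 100).round(1).head())"
]
},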
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== COMPLEX MULTI-LEVEL RESHAPING ===\n",
"Complex hierarchical data:\n",
" region salesperson quarter product_category net_sales quantity\n",
"0 East Alice Johnson 1 Books 710.9865 14\n",
"1 East Alice Johnson 1 Home & Garden 428.9600 8\n",
"2 East Alice Johnson 2 Books 240.5520 5\n",
"3 East Alice Johnson 2 Clothing 436.8720 7\n",
"4 East Alice Johnson 2 Electronics 750.9260 20\n",
"5 East Alice Johnson 2 Home & Garden 96.1745 2\n",
"6 East Alice Johnson 3 Books 223.0520 3\n",
"7 East Alice Johnson 3 Clothing 128.8600 2\n",
"8 East Alice Johnson 3 Electronics 567.8100 9\n",
"9 East Alice Johnson 3 Home & Garden 655.3200 12\n",
"\n",
"Salesperson performance by quarter and category:\n",
"product_category Books Clothing Electronics Home & Garden\n",
"region salesperson quarter \n",
"East Alice Johnson 1 710.9865 0.0000 0.0000 428.9600\n",
" 2 240.5520 436.8720 750.9260 96.1745\n",
" 3 223.0520 128.8600 567.8100 655.3200\n",
" 4 362.4700 272.8080 473.7400 512.8200\n",
" Bob Smith 1 775.7800 360.9700 0.0000 698.4970\n",
" 2 729.5005 561.8205 305.0745 10.1970\n",
" 3 534.1230 233.6990 0.0000 0.0000\n",
" 4 0.0000 0.0000 805.8820 242.2400\n",
" Charlie Brown 1 0.0000 675.4200 672.7680 220.9600\n",
" 2 141.1830 97.4520 1063.2360 689.1900\n",
" 3 233.1210 0.0000 455.7235 31.8835\n",
" 4 60.9520 179.2080 418.6000 156.5760\n",
" Diana Prince 1 294.5120 1280.4690 542.1370 58.3920\n",
" 2 1364.7370 244.2495 183.4500 1018.4200\n",
" 3 215.1500 320.9600 610.4835 266.0820\n",
"\n",
"Quarterly comparison view (sample):\n",
"product_category Books Clothing \\\n",
"quarter 1 2 3 4 1 2 3 \n",
"region salesperson \n",
"East Alice Johnson 711.0 241.0 223.0 362.0 0.0 437.0 129.0 \n",
" Bob Smith 776.0 730.0 534.0 0.0 361.0 562.0 234.0 \n",
" Charlie Brown 0.0 141.0 233.0 61.0 675.0 97.0 0.0 \n",
" Diana Prince 295.0 1365.0 215.0 0.0 1280.0 244.0 321.0 \n",
" Eve Wilson 0.0 501.0 281.0 803.0 505.0 942.0 88.0 \n",
"\n",
"product_category \n",
"quarter 4 \n",
"region salesperson \n",
"East Alice Johnson 273.0 \n",
" Bob Smith 0.0 \n",
" Charlie Brown 179.0 \n",
" Diana Prince 0.0 \n",
" Eve Wilson 0.0 \n",
"\n",
"Region summary:\n",
"product_category Books Clothing Electronics Home & Garden\n",
"region \n",
"East 12377.0 12466.0 16463.0 14140.0\n",
"North 13492.0 14938.0 15229.0 17770.0\n",
"South 17590.0 16946.0 14552.0 13695.0\n",
"West 8157.0 16544.0 12681.0 15652.0\n"
]
}
],
"source": [
"# Complex multi-level reshaping\n",
"print(\"=== COMPLEX MULTI-LEVEL RESHAPING ===\")\n",
"\n",
"# Create complex hierarchical data\n",
"complex_data = df_business.groupby(['region', 'salesperson', 'quarter', 'product_category']).agg({\n",
" 'net_sales': 'sum',\n",
" 'quantity': 'sum'\n",
"}).reset_index()\n",
"\n",
"print(\"Complex hierarchical data:\")\n",
"print(complex_data.head(10))\n",
"\n",
"# Multiple pivot operations\n",
"# First pivot: Quarter vs Product Category for each salesperson\n",
"salesperson_pivot = complex_data.pivot_table(\n",
" values='net_sales',\n",
" index=['region', 'salesperson', 'quarter'],\n",
" columns='product_category',\n",
" fill_value=0\n",
")\n",
"\n",
"print(\"\\nSalesperson performance by quarter and category:\")\n",
"print(salesperson_pivot.head(15))\n",
"\n",
"# Stack and unstack for different views\n",
"# Unstack quarter to see quarterly comparison\n",
"quarterly_comparison = salesperson_pivot.unstack(level=2)\n",
"\n",
"print(\"\\nQuarterly comparison view (sample):\")\n",
"print(quarterly_comparison.iloc[:5, :8].round(0)) # Show subset\n",
"\n",
"# Create summary by region\n",
"region_summary = salesperson_pivot.groupby('region').sum()\n",
"print(\"\\nRegion summary:\")\n",
"print(region_summary.round(0))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Business Intelligence Applications\n",
"\n",
"Real-world business analysis using pivot tables and reshaping."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== SALES PERFORMANCE DASHBOARD ===\n",
"1. Regional Performance Summary:\n",
" net_sales quantity total_order\n",
" mean sum sum count\n",
"region \n",
"East 223.57 55445.10 1193 248\n",
"North 230.07 61428.66 1291 267\n",
"South 250.13 62782.29 1275 251\n",
"West 226.64 53033.27 1152 234\n",
"\n",
"2. Category Performance by Quarter:\n",
"quarter 1 2 3 4 All\n",
"product_category \n",
"Books 15644.0 15334.0 12567.0 8069.0 51615.0\n",
"Clothing 18988.0 17384.0 12257.0 12266.0 60894.0\n",
"Electronics 13021.0 18192.0 17188.0 10523.0 58924.0\n",
"Home & Garden 15857.0 14490.0 17604.0 13305.0 61257.0\n",
"All 63509.0 65400.0 59617.0 44163.0 232689.0\n",
"\n",
"3. Sales Channel Analysis:\n",
" sum mean \n",
"customer_type New Returning VIP New Returning VIP\n",
"sales_channel \n",
"Online 39805.26 76878.78 31248.99 255.16 238.01 226.44\n",
"Phone 5438.57 10409.81 4134.56 236.46 212.45 229.70\n",
"Store 17130.46 33977.55 13665.33 214.13 216.42 244.02\n",
"\n",
"4. Top 5 Performers:\n",
" net_sales quantity total_order\n",
"salesperson \n",
"Alice Johnson 27971.35 550 118\n",
"Jack Robinson 27955.98 564 107\n",
"Charlie Brown 26054.75 529 110\n",
"Eve Wilson 25345.17 553 107\n",
"Frank Miller 24301.05 491 96\n"
]
}
],
"source": [
"# Sales performance dashboard\n",
"print(\"=== SALES PERFORMANCE DASHBOARD ===\")\n",
"\n",
"def create_sales_dashboard(df):\n",
" \"\"\"Create comprehensive sales dashboard using pivot tables\"\"\"\n",
" dashboard = {}\n",
" \n",
" # 1. Regional performance summary\n",
" dashboard['regional_summary'] = df.pivot_table(\n",
" values=['net_sales', 'quantity', 'total_order'],\n",
" index='region',\n",
" aggfunc={\n",
" 'net_sales': ['sum', 'mean'],\n",
" 'quantity': 'sum',\n",
" 'total_order': 'count'\n",
" }\n",
" ).round(2)\n",
" \n",
" # 2. Product category performance\n",
" dashboard['category_performance'] = df.pivot_table(\n",
" values='net_sales',\n",
" index='product_category',\n",
" columns='quarter',\n",
" aggfunc='sum',\n",
" margins=True,\n",
" fill_value=0\n",
" ).round(0)\n",
" \n",
" # 3. Sales channel analysis\n",
" dashboard['channel_analysis'] = df.pivot_table(\n",
" values='net_sales',\n",
" index='sales_channel',\n",
" columns='customer_type',\n",
" aggfunc=['sum', 'mean'],\n",
" fill_value=0\n",
" ).round(2)\n",
" \n",
" # 4. Top performers\n",
" salesperson_performance = df.groupby('salesperson').agg({\n",
" 'net_sales': 'sum',\n",
" 'quantity': 'sum',\n",
" 'total_order': 'count'\n",
" }).round(2)\n",
" dashboard['top_performers'] = salesperson_performance.sort_values('net_sales', ascending=False).head(5)\n",
" \n",
" # 5. Monthly trends\n",
" dashboard['monthly_trends'] = df.pivot_table(\n",
" values='net_sales',\n",
" index='month_name',\n",
" columns='product_category',\n",
" aggfunc='sum',\n",
" fill_value=0\n",
" ).round(0)\n",
" \n",
" return dashboard\n",
"\n",
"# Generate dashboard\n",
"sales_dashboard = create_sales_dashboard(df_business)\n",
"\n",
"print(\"1. Regional Performance Summary:\")\n",
"print(sales_dashboard['regional_summary'])\n",
"\n",
"print(\"\\n2. Category Performance by Quarter:\")\n",
"print(sales_dashboard['category_performance'])\n",
"\n",
"print(\"\\n3. Sales Channel Analysis:\")\n",
"print(sales_dashboard['channel_analysis'])\n",
"\n",
"print(\"\\n4. Top 5 Performers:\")\n",
"print(sales_dashboard['top_performers'])"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== CUSTOMER SEGMENTATION ANALYSIS ===\n",
"Customer behavior by type and channel:\n",
" discount_amount net_sales \\\n",
"sales_channel Online Phone Store Online Phone Store \n",
"customer_type \n",
"New 11.73 15.24 10.61 255.16 236.46 214.13 \n",
"Returning 11.45 9.43 11.81 238.01 212.45 216.42 \n",
"VIP 11.36 15.36 15.59 226.44 229.70 244.02 \n",
"\n",
" quantity \n",
"sales_channel Online Phone Store \n",
"customer_type \n",
"New 5.29 4.78 4.47 \n",
"Returning 4.88 5.14 4.62 \n",
"VIP 5.03 4.44 5.21 \n",
"\n",
"Purchase patterns (% of quarterly sales):\n",
"quarter 1 2 3 4\n",
"customer_type product_category \n",
"New Books 6.2 4.4 3.1 4.3\n",
" Clothing 6.9 7.2 5.7 10.5\n",
" Electronics 4.3 9.5 9.7 8.6\n",
" Home & Garden 8.8 3.7 4.5 12.4\n",
"Returning Books 13.2 13.8 9.8 11.3\n",
" Clothing 16.2 12.8 12.3 12.7\n",
" Electronics 11.2 13.8 11.9 11.6\n",
" Home & Garden 10.2 15.3 17.7 13.7\n",
"VIP Books 5.2 5.3 8.1 2.7\n",
" Clothing 6.8 6.5 2.6 4.6\n",
" Electronics 4.9 4.5 7.2 3.6\n",
" Home & Garden 6.1 3.2 7.3 4.1\n",
"\n",
"Customer value distribution:\n",
" count mean std min 25% 50% 75% max\n",
"customer_type \n",
"New 259.0 248.78 155.20 18.31 128.74 217.77 345.09 768.32\n",
"Returning 529.0 237.24 160.52 9.76 109.41 201.91 342.29 798.01\n",
"VIP 212.0 239.56 158.12 25.92 115.17 198.15 357.37 673.39\n"
]
}
],
"source": [
"# Customer segmentation analysis\n",
"print(\"=== CUSTOMER SEGMENTATION ANALYSIS ===\")\n",
"\n",
"# Customer behavior analysis\n",
"customer_behavior = df_business.pivot_table(\n",
" values=['net_sales', 'quantity', 'discount_amount'],\n",
" index='customer_type',\n",
" columns='sales_channel',\n",
" aggfunc={\n",
" 'net_sales': 'mean',\n",
" 'quantity': 'mean',\n",
" 'discount_amount': 'mean'\n",
" },\n",
" fill_value=0\n",
")\n",
"\n",
"print(\"Customer behavior by type and channel:\")\n",
"print(customer_behavior.round(2))\n",
"\n",
"# Purchase patterns analysis\n",
"purchase_patterns = pd.crosstab(\n",
" [df_business['customer_type'], df_business['product_category']],\n",
" df_business['quarter'],\n",
" values=df_business['net_sales'],\n",
" aggfunc='sum',\n",
" normalize='columns'\n",
") * 100\n",
"\n",
"print(\"\\nPurchase patterns (% of quarterly sales):\")\n",
"print(purchase_patterns.round(1))\n",
"\n",
"# Customer value distribution\n",
"value_distribution = df_business.groupby('customer_type')['total_order'].describe()\n",
"print(\"\\nCustomer value distribution:\")\n",
"print(value_distribution.round(2))"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"=== COHORT ANALYSIS ===\n",
"Cohort analysis (average sales by quarters since first purchase):\n",
"periods_since_first 0 1 2 3\n",
"first_quarter \n",
"1 234.0 240.0 219.0 240.0\n",
"\n",
"Cohort retention (number of active salespeople):\n",
"periods_since_first 0 1 2 3\n",
"first_quarter \n",
"1 271 273 272 184\n",
"\n",
"Retention rates (%):\n",
"periods_since_first 0 1 2 3\n",
"first_quarter \n",
"1 100.0 100.7 100.4 67.9\n"
]
}
],
"source": [
"# Cohort analysis using pivot tables\n",
"print(\"=== COHORT ANALYSIS ===\")\n",
"\n",
"# Simplified cohort analysis by quarter\n",
"# Group customers by their first purchase quarter\n",
"customer_first_purchase = df_business.groupby('salesperson')['quarter'].min().reset_index()\n",
"customer_first_purchase.columns = ['salesperson', 'first_quarter']\n",
"\n",
"# Merge back to get cohort information\n",
"cohort_data = df_business.merge(customer_first_purchase, on='salesperson')\n",
"cohort_data['periods_since_first'] = cohort_data['quarter'] - cohort_data['first_quarter']\n",
"\n",
"# Create cohort table\n",
"cohort_table = cohort_data.pivot_table(\n",
" values='net_sales',\n",
" index='first_quarter',\n",
" columns='periods_since_first',\n",
" aggfunc='mean',\n",
" fill_value=0\n",
")\n",
"\n",
"print(\"Cohort analysis (average sales by quarters since first purchase):\")\n",
"print(cohort_table.round(0))\n",
"\n",
"# Retention analysis\n",
"cohort_counts = cohort_data.pivot_table(\n",
" values='salesperson',\n",
" index='first_quarter',\n",
" columns='periods_since_first',\n",
" aggfunc='count',\n",
" fill_value=0\n",
")\n",
"\n",
"print(\"\\nCohort retention (number of active salespeople):\")\n",
"print(cohort_counts)\n",
"\n",
"# Calculate retention rates\n",
"cohort_retention = cohort_counts.divide(cohort_counts.iloc[:, 0], axis=0) * 100\n",
"print(\"\\nRetention rates (%):\")\n",
"print(cohort_retention.round(1))"
]
},
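{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the activity counts above are transactions, the percentages can exceed 100%. Counting distinct salespeople with `aggfunc='nunique'` gives a truer retention-style view; the sketch below reuses `cohort_data` and is not executed here."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Distinct salespeople active in each period since their first quarter\n",
"unique_counts = cohort_data.pivot_table(\n",
"    values='salesperson',\n",
"    index='first_quarter',\n",
"    columns='periods_since_first',\n",
"    aggfunc='nunique',\n",
"    fill_value=0\n",
")\n",
"\n",
"print(\"Distinct active salespeople per period:\")\n",
"print(unique_counts)\n",
"\n",
"print(\"\\nRetention based on distinct salespeople (%):\")\n",
"print((unique_counts.divide(unique_counts.iloc[:, 0], axis=0) * 100).round(1))"
]
},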
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Practice Exercises\n",
"\n",
"Apply pivot tables and reshaping to complex business scenarios:"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"# Exercise 1: Multi-dimensional Business Intelligence Report\n",
"# Create a comprehensive BI report that includes:\n",
"# - Sales performance across multiple dimensions\n",
"# - Trend analysis with period-over-period comparisons\n",
"# - Customer segmentation insights\n",
"# - Product performance matrix\n",
"# - Actionable recommendations based on pivot table insights\n",
"\n",
"def create_comprehensive_bi_report(df):\n",
" \"\"\"Create multi-dimensional business intelligence report\"\"\"\n",
" # Your implementation here\n",
" pass\n",
"\n",
"# bi_report = create_comprehensive_bi_report(df_business)\n",
"# print(\"Comprehensive BI Report:\")\n",
"# for section, data in bi_report.items():\n",
"# print(f\"\\n{section}:\")\n",
"# print(data)"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
"# Exercise 2: Dynamic Pivot Table Generator\n",
"# Create a flexible function that can generate pivot tables with:\n",
"# - User-specified dimensions (rows, columns, values)\n",
"# - Multiple aggregation functions\n",
"# - Automatic handling of different data types\n",
"# - Export capabilities for different formats\n",
"\n",
"# Your code here:\n"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"# Exercise 3: Advanced Reshaping Challenge\n",
"# Transform the data through multiple reshaping operations:\n",
"# - Convert to time series format\n",
"# - Create rolling calculations\n",
"# - Build comparison matrices\n",
"# - Generate variance analysis reports\n",
"\n",
"# Your code here:\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Key Takeaways\n",
"\n",
"1. **Pivot Tables**:\n",
" - **`pivot_table()`**: Most flexible, handles duplicates with aggregation\n",
" - **`pivot()`**: Simple reshaping, requires unique index-column combinations\n",
" - **`crosstab()`**: Specialized for frequency tables and cross-tabulation\n",
"\n",
"2. **Reshaping Operations**:\n",
" - **`melt()`**: Wide to long format (unpivot)\n",
" - **`pivot()`**: Long to wide format\n",
" - **`stack()`**: Column to row index\n",
" - **`unstack()`**: Row index to column\n",
"\n",
"3. **Best Practices**:\n",
" - Use `fill_value=0` to handle missing combinations\n",
" - Add `margins=True` for totals when needed\n",
" - Choose appropriate aggregation functions for your data\n",
" - Consider data types when reshaping\n",
"\n",
"4. **Business Applications**:\n",
" - Sales performance analysis\n",
" - Customer segmentation\n",
" - Trend analysis and forecasting\n",
" - Cohort and retention analysis\n",
"\n",
"## Pivot Table Quick Reference\n",
"\n",
"```python\n",
"# Basic pivot table\n",
"df.pivot_table(values='sales', index='region', columns='product', aggfunc='sum')\n",
"\n",
"# Multiple aggregations\n",
"df.pivot_table(values='sales', index='region', aggfunc=['sum', 'mean', 'count'])\n",
"\n",
"# With margins (totals)\n",
"df.pivot_table(values='sales', index='region', columns='product', \n",
" aggfunc='sum', margins=True)\n",
"\n",
"# Cross-tabulation\n",
"pd.crosstab(df['region'], df['product'], normalize='index')\n",
"\n",
"# Reshaping\n",
"pd.melt(df, id_vars=['id'], value_vars=['col1', 'col2']) # Wide to long\n",
"df.pivot(index='date', columns='category', values='value') # Long to wide\n",
"```\n",
"\n",
"## Common Use Cases\n",
"\n",
"| Scenario | Best Tool | Key Parameters |\n",
"|----------|-----------|----------------|\n",
"| Sales by region/product | `pivot_table()` | `values='sales', index='region', columns='product'` |\n",
"| Frequency analysis | `crosstab()` | `normalize='index'` for percentages |\n",
"| Time series analysis | `pivot()` + `unstack()` | Handle dates in index |\n",
"| Data normalization | `melt()` | `id_vars` for identifiers |\n",
"| Multi-level analysis | Hierarchical indexing | Multiple columns in `index`/`columns` |"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}