crypto_bot_training/Session_01/PandasDataFrame-exmples/01_creating_dataframes.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Session 1 - DataFrames - Lesson 1: Creating DataFrames\n",
    "\n",
    "## Learning Objectives\n",
    "- Understand different methods to create pandas DataFrames\n",
    "- Learn to create DataFrames from dictionaries, lists, and NumPy arrays\n",
    "- Practice with various data types and structures\n",
    "\n",
    "## Prerequisites\n",
    "- Basic Python knowledge\n",
    "- Understanding of lists and dictionaries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Pandas version: 2.2.3\n",
      "NumPy version: 2.2.6\n"
     ]
    }
   ],
   "source": [
    "# Import required libraries\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from datetime import datetime, timedelta\n",
    "\n",
    "print(f\"Pandas version: {pd.__version__}\")\n",
    "print(f\"NumPy version: {np.__version__}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Method 1: Creating DataFrame from Dictionary\n",
    "\n",
    "This is the most common and intuitive way to create a DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Student DataFrame:\n",
      "      Name  Age Grade  Score\n",
      "0    Alice   23     A     95\n",
      "1      Bob   25     B     87\n",
      "2  Charlie   22     A     92\n",
      "3    Diana   24     C     78\n",
      "4      Eve   23     B     89\n",
      "\n",
      "Shape: (5, 4)\n",
      "Data types:\n",
      "Name     object\n",
      "Age       int64\n",
      "Grade    object\n",
      "Score     int64\n",
      "dtype: object\n"
     ]
    }
   ],
   "source": [
    "# Creating DataFrame from dictionary\n",
    "student_data = {\n",
    "    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],\n",
    "    'Age': [23, 25, 22, 24, 23],\n",
    "    'Grade': ['A', 'B', 'A', 'C', 'B'],\n",
    "    'Score': [95, 87, 92, 78, 89]\n",
    "}\n",
    "\n",
    "df_students = pd.DataFrame(student_data)\n",
    "print(\"Student DataFrame:\")\n",
    "print(df_students)\n",
    "print(f\"\\nShape: {df_students.shape}\")\n",
    "print(f\"Data types:\\n{df_students.dtypes}\")"
   ]
  },
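  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dictionary form assumes every list has the same length. As a minimal sketch with made-up values (`bad_data` and `length_error` are hypothetical names, not part of the lesson data), the cell below shows that pandas raises a `ValueError` when the lengths differ rather than padding the shorter column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical example data: 'Age' is one element shorter than 'Name'\n",
    "bad_data = {\n",
    "    'Name': ['Alice', 'Bob', 'Charlie'],\n",
    "    'Age': [23, 25]\n",
    "}\n",
    "\n",
    "length_error = None\n",
    "try:\n",
    "    pd.DataFrame(bad_data)\n",
    "except ValueError as exc:\n",
    "    # pandas refuses to build a DataFrame from columns of unequal length\n",
    "    length_error = exc\n",
    "    print(f\"ValueError: {exc}\")"
   ]
  },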
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Method 2: Creating DataFrame from Lists\n",
    "\n",
    "You can create DataFrames from separate lists by combining them in a dictionary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cities DataFrame:\n",
      "       City  Population_Million    Country\n",
      "0  New York                 8.4        USA\n",
      "1    London                 8.9         UK\n",
      "2     Tokyo                13.9      Japan\n",
      "3     Paris                 2.1     France\n",
      "4    Sydney                 5.3  Australia\n",
      "\n",
      "Index: [0, 1, 2, 3, 4]\n",
      "Columns: ['City', 'Population_Million', 'Country']\n"
     ]
    }
   ],
   "source": [
    "# Creating DataFrame from separate lists\n",
    "cities = ['New York', 'London', 'Tokyo', 'Paris', 'Sydney']\n",
    "populations = [8.4, 8.9, 13.9, 2.1, 5.3]\n",
    "countries = ['USA', 'UK', 'Japan', 'France', 'Australia']\n",
    "\n",
    "df_cities = pd.DataFrame({\n",
    "    'City': cities,\n",
    "    'Population_Million': populations,\n",
    "    'Country': countries\n",
    "})\n",
    "\n",
    "print(\"Cities DataFrame:\")\n",
    "print(df_cities)\n",
    "print(f\"\\nIndex: {df_cities.index.tolist()}\")\n",
    "print(f\"Columns: {df_cities.columns.tolist()}\")"
   ]
  },
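  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The example above is column-oriented: each list holds one column. Data often arrives row by row instead. The sketch below reuses some of the city values to show two row-oriented inputs the `pd.DataFrame` constructor accepts: a list of lists (with column names passed through `columns`) and a list of dictionaries. The names `rows`, `records`, `df_from_rows`, and `df_from_records` are illustrative only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# List of lists: each inner list is one row, so column names are passed separately\n",
    "rows = [\n",
    "    ['New York', 8.4, 'USA'],\n",
    "    ['London', 8.9, 'UK'],\n",
    "    ['Tokyo', 13.9, 'Japan']\n",
    "]\n",
    "df_from_rows = pd.DataFrame(rows, columns=['City', 'Population_Million', 'Country'])\n",
    "\n",
    "# List of dictionaries: each dictionary is one row, and its keys become column names\n",
    "records = [\n",
    "    {'City': 'Paris', 'Population_Million': 2.1, 'Country': 'France'},\n",
    "    {'City': 'Sydney', 'Population_Million': 5.3, 'Country': 'Australia'}\n",
    "]\n",
    "df_from_records = pd.DataFrame(records)\n",
    "\n",
    "print(df_from_rows)\n",
    "print(df_from_records)"
   ]
  },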
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Method 3: Creating DataFrame from NumPy Array\n",
    "\n",
    "This method is useful when working with numerical data or when you need random data for testing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Random DataFrame:\n",
      "      Column_A  Column_B  Column_C\n",
      "Row1        52        93        15\n",
      "Row2        72        61        21\n",
      "Row3        83        87        75\n",
      "Row4        75        88        24\n",
      "Row5         3        22        53\n",
      "\n",
      "Summary statistics:\n",
      "        Column_A   Column_B   Column_C\n",
      "count   5.000000   5.000000   5.000000\n",
      "mean   57.000000  70.200000  37.600000\n",
      "std    32.272279  29.693434  25.530374\n",
      "min     3.000000  22.000000  15.000000\n",
      "25%    52.000000  61.000000  21.000000\n",
      "50%    72.000000  87.000000  24.000000\n",
      "75%    75.000000  88.000000  53.000000\n",
      "max    83.000000  93.000000  75.000000\n"
     ]
    }
   ],
   "source": [
    "# Creating DataFrame from NumPy array\n",
    "np.random.seed(42)  # For reproducible results\n",
    "random_data = np.random.randint(1, 100, size=(5, 3))\n",
    "\n",
    "df_random = pd.DataFrame(random_data, \n",
    "                        columns=['Column_A', 'Column_B', 'Column_C'],\n",
    "                        index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5'])\n",
    "\n",
    "print(\"Random DataFrame:\")\n",
    "print(df_random)\n",
    "print(f\"\\nSummary statistics:\")\n",
    "print(df_random.describe())"
   ]
  },
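  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`np.random.seed` configures NumPy's legacy global random state. As a hedged alternative, the sketch below uses the `np.random.default_rng` Generator API to fill a DataFrame with floats. The names `rng`, `random_floats`, and `df_floats` are illustrative, and the values will not match the integer table above because a different generator is used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# A seeded Generator keeps results reproducible without touching global state\n",
    "rng = np.random.default_rng(42)\n",
    "\n",
    "# 5 rows x 3 columns of floats drawn uniformly from [0, 1)\n",
    "random_floats = rng.random(size=(5, 3))\n",
    "\n",
    "df_floats = pd.DataFrame(random_floats, columns=['Column_A', 'Column_B', 'Column_C'])\n",
    "\n",
    "print(df_floats)\n",
    "print(df_floats.dtypes)"
   ]
  },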
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Method 4: Creating DataFrame with Custom Index\n",
    "\n",
    "You can specify custom row labels (index) when creating DataFrames."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Products DataFrame with Custom Index:\n",
      "         Product  Price  Stock\n",
      "PROD001   Laptop   1200     15\n",
      "PROD002    Phone    800     50\n",
      "PROD003   Tablet    600     30\n",
      "PROD004  Monitor    300     20\n",
      "\n",
      "Accessing by index label 'PROD002':\n",
      "Product    Phone\n",
      "Price        800\n",
      "Stock         50\n",
      "Name: PROD002, dtype: object\n"
     ]
    }
   ],
   "source": [
    "# Creating DataFrame with custom index\n",
    "product_data = {\n",
    "    'Product': ['Laptop', 'Phone', 'Tablet', 'Monitor'],\n",
    "    'Price': [1200, 800, 600, 300],\n",
    "    'Stock': [15, 50, 30, 20]\n",
    "}\n",
    "\n",
    "# Custom index using product codes\n",
    "custom_index = ['PROD001', 'PROD002', 'PROD003', 'PROD004']\n",
    "df_products = pd.DataFrame(product_data, index=custom_index)\n",
    "\n",
    "print(\"Products DataFrame with Custom Index:\")\n",
    "print(df_products)\n",
    "print(f\"\\nAccessing by index label 'PROD002':\")\n",
    "print(df_products.loc['PROD002'])"
   ]
  },
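  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When the labels are already stored as a regular column, `set_index` offers another route to the same result. This is a minimal sketch using the product values from above; the `Code` column and the `df_catalog` name are assumptions made for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Product codes start out as an ordinary column (hypothetical 'Code' column)\n",
    "df_catalog = pd.DataFrame({\n",
    "    'Code': ['PROD001', 'PROD002', 'PROD003', 'PROD004'],\n",
    "    'Product': ['Laptop', 'Phone', 'Tablet', 'Monitor'],\n",
    "    'Price': [1200, 800, 600, 300]\n",
    "})\n",
    "\n",
    "# Promote the column to the index; set_index returns a new DataFrame\n",
    "df_catalog = df_catalog.set_index('Code')\n",
    "\n",
    "# Select a single value by row label and column name\n",
    "phone_price = df_catalog.loc['PROD002', 'Price']\n",
    "print(df_catalog)\n",
    "print(f\"Price of PROD002: {phone_price}\")"
   ]
  },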
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Method 5: Creating Empty DataFrame and Adding Data\n",
    "\n",
    "Sometimes you need to start with an empty DataFrame and add data incrementally."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Empty DataFrame:\n",
      "Empty DataFrame\n",
      "Columns: [Date, Temperature, Humidity, Pressure]\n",
      "Index: []\n",
      "Shape: (0, 4)\n",
      "\n",
      "DataFrame after adding data:\n",
      "         Date  Temperature  Humidity  Pressure\n",
      "0  2024-01-01         22.5        65    1013.2\n",
      "1  2024-01-02         24.1        68    1015.1\n",
      "2  2024-01-03         21.8        72    1012.8\n"
     ]
    }
   ],
   "source": [
    "# Creating empty DataFrame with specified columns\n",
    "columns = ['Date', 'Temperature', 'Humidity', 'Pressure']\n",
    "df_weather = pd.DataFrame(columns=columns)\n",
    "\n",
    "print(\"Empty DataFrame:\")\n",
    "print(df_weather)\n",
    "print(f\"Shape: {df_weather.shape}\")\n",
    "\n",
    "# Adding data row by row (not recommended for large datasets)\n",
    "weather_data = [\n",
    "    ['2024-01-01', 22.5, 65, 1013.2],\n",
    "    ['2024-01-02', 24.1, 68, 1015.1],\n",
    "    ['2024-01-03', 21.8, 72, 1012.8]\n",
    "]\n",
    "\n",
    "for row in weather_data:\n",
    "    df_weather.loc[len(df_weather)] = row\n",
    "\n",
    "print(\"\\nDataFrame after adding data:\")\n",
    "print(df_weather)"
   ]
  },
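  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A common alternative to growing a DataFrame row by row, sketched here under the assumption that all rows can be gathered before the DataFrame is needed, is to collect them in a plain Python list and call the constructor once. The names `weather_columns`, `collected_rows`, and `df_weather_built_once` are illustrative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "weather_columns = ['Date', 'Temperature', 'Humidity', 'Pressure']\n",
    "\n",
    "# Appending to a list is cheap; no DataFrame is touched inside the loop\n",
    "collected_rows = []\n",
    "for row in [\n",
    "    ['2024-01-01', 22.5, 65, 1013.2],\n",
    "    ['2024-01-02', 24.1, 68, 1015.1],\n",
    "    ['2024-01-03', 21.8, 72, 1012.8]\n",
    "]:\n",
    "    collected_rows.append(row)\n",
    "\n",
    "# Build the DataFrame once, after all rows are available\n",
    "df_weather_built_once = pd.DataFrame(collected_rows, columns=weather_columns)\n",
    "\n",
    "print(df_weather_built_once)\n",
    "print(df_weather_built_once.dtypes)"
   ]
  },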
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Practice Exercises\n",
    "\n",
    "Try these exercises to reinforce your learning:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exercise 1: Create a DataFrame from dictionary with employee information\n",
    "# Include: Employee ID, Name, Department, Salary, Years of Experience\n",
    "\n",
    "# Your code here:\n",
    "employee_data = {\n",
    "    # Add your data here\n",
    "}\n",
    "\n",
    "# df_employees = pd.DataFrame(employee_data)\n",
    "# print(df_employees)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exercise 2: Create a DataFrame using NumPy with 6 rows and 4 columns\n",
    "# Use column names: 'A', 'B', 'C', 'D'\n",
    "# Use row indices: 'R1', 'R2', 'R3', 'R4', 'R5', 'R6'\n",
    "\n",
    "# Your code here:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exercise 3: Create a DataFrame with mixed data types\n",
    "# Include at least one string, integer, float, and boolean column\n",
    "\n",
    "# Your code here:\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Key Takeaways\n",
    "\n",
    "1. **Dictionary method** is most intuitive for creating DataFrames\n",
    "2. **NumPy arrays** are useful for numerical data and testing\n",
    "3. **Custom indices** provide meaningful row labels\n",
    "4. **Empty DataFrames** can be useful but avoid adding rows one by one for large datasets\n",
    "5. Always check the **shape** and **data types** of your DataFrame after creation\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}