In this topic, we'll explore Pandas, a powerful library for data manipulation and analysis in Python. Pandas provides data structures like DataFrames and Series, along with functions for reading data from and writing data to various file formats. We'll cover the basics of Pandas, including data ingestion, indexing and selection, aggregation, merging, and handling missing data.
Pandas is an open-source Python library built on top of NumPy, designed for data manipulation and analysis. It provides easy-to-use data structures like DataFrame and Series, along with a wide range of functions for performing operations on structured data.
import pandas as pd
# Create a Pandas Series
data = pd.Series([1, 2, 3, 4, 5])
print("Pandas Series:")
print(data)
This example imports the Pandas library under the alias pd and creates a Series by passing a list of data to the pd.Series() function.
A Pandas Series is a one-dimensional array-like object that can hold data of any type. It consists of a sequence of values and an associated array of labels called the index.
import pandas as pd
# Create a Pandas Series
data = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print("Pandas Series:")
print(data)
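As a small extension of the example above, the sketch below shows how the index labels can be used to look values up, alongside position-based access. The variable names by_label, by_position, and doubled are illustrative only.

```python
import pandas as pd

# Same Series as above: five values labelled 'a' through 'e'
data = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Look up a value by its label
by_label = data['c']

# Look up the same value by its integer position
by_position = data.iloc[2]

# Arithmetic is applied element-wise and keeps the index
doubled = data * 2

print(by_label, by_position)
print(doubled)
```

Both lookups reach the same element, because the label 'c' sits at position 2.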
The labels are supplied through the index parameter of the pd.Series() function. Without it, Pandas assigns the default integer index 0, 1, 2, and so on.
A Pandas DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or SQL table, where each column represents a different variable, and each row represents a different observation.
import pandas as pd
# Create a Pandas DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
print("Pandas DataFrame:")
print(df)
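To make the point about mixed column types concrete, this short sketch inspects the same DataFrame. The exact dtype names can vary between Pandas versions, so the comments describe them only loosely.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)

# (rows, columns)
print(df.shape)

# One dtype per column: Age is an integer column, Name and City hold strings
print(df.dtypes)

# Column labels come from the dictionary keys
print(list(df.columns))
```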
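The DataFrame is created from the dictionary data: each key becomes a column name and each list becomes that column's values.
Pandas provides functions for reading data from various file formats, such as CSV, Excel, JSON, and more.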
import pandas as pd
# Read data from a CSV file into a DataFrame
df = pd.read_csv('data.csv')
print("DataFrame from CSV:")
print(df)
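The example above needs a data.csv file on disk. For experimenting without one, pd.read_csv() also accepts a file-like object, so a sketch like the following can be run as is. The CSV text here is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content, standing in for the data.csv file
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
"""

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)  # Age is parsed as an integer column
```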
The pd.read_csv() function reads the CSV file named data.csv into a Pandas DataFrame. The file must exist at that path, otherwise the call raises FileNotFoundError.
Pandas also allows writing data from DataFrames to various file formats.
import pandas as pd
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
# Write DataFrame to a CSV file
df.to_csv('output.csv', index=False)
print("DataFrame written to CSV file.")
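To see what index=False changes, the sketch below calls to_csv() without a path, in which case Pandas returns the CSV text as a string instead of writing a file. The two-row DataFrame is a shortened stand-in for the one above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# No path given: to_csv returns the CSV content as a string
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)     # first column holds the row index 0, 1
print(without_index)  # only the Name and Age columns
```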
The df.to_csv() function writes the DataFrame df to a CSV file named output.csv. Specifying index=False excludes the DataFrame index from the output.
Pandas allows accessing DataFrame columns using their column names.
import pandas as pd
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
# Access a specific column
print("Age Column:")
print(df['Age'])
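A related sketch, using the same DataFrame: single brackets with one name return a Series, while a list of names returns a DataFrame. The names ages and subset are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['New York', 'Los Angeles', 'Chicago']})

ages = df['Age']               # one column name: a Series
subset = df[['Name', 'Age']]   # list of column names: a DataFrame

print(type(ages).__name__)
print(type(subset).__name__)
print(subset)
```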
The Age column of the DataFrame df is accessed using bracket notation (df['Age']), which returns it as a Series, and is then printed to the console.
Pandas supports various ways of selecting specific rows and columns from a DataFrame, such as the .loc[] and .iloc[] indexers.
import pandas as pd
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data, index=['A', 'B', 'C'])
# Select rows and columns using .loc[]
print("Selecting Rows and Columns using .loc[]:")
print(df.loc['B', 'Age'])
# Select rows and columns using .iloc[]
print("Selecting Rows and Columns using .iloc[]:")
print(df.iloc[1, 1])
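One difference that often surprises newcomers shows up when slicing: .loc[] includes the end label, while .iloc[] excludes the end position, like ordinary Python slicing. A minimal sketch with the same DataFrame, where by_label and by_position are illustrative names:

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data, index=['A', 'B', 'C'])

# Label slice: rows 'A' through 'B', both ends included
by_label = df.loc['A':'B', ['Name', 'Age']]

# Position slice: rows 0 and 1 (position 2 is excluded), first two columns
by_position = df.iloc[0:2, 0:2]

print(by_label)
print(by_position)
```

Both selections contain the rows for Alice and Bob with the Name and Age columns.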
The DataFrame df is created with custom row labels A, B, and C. .loc[] is used for label-based indexing, while .iloc[] is used for integer position-based indexing, so df.loc['B', 'Age'] and df.iloc[1, 1] both return 30.
Pandas can aggregate data with groupby(), which groups rows based on one or more columns so that aggregate functions can then be applied to each group.
import pandas as pd
# Create a DataFrame
data = {'City': ['New York', 'Los Angeles', 'Chicago', 'New York', 'Chicago'],
        'Population': [8.6, 3.9, 2.7, 8.6, 2.7]}
df = pd.DataFrame(data)
# Group by City and calculate mean population
mean_population = df.groupby('City')['Population'].mean()
print("Mean Population by City:")
print(mean_population)
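If more than one statistic per group is needed, agg() accepts a list of aggregation names. A sketch using the same data, with summary as an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['New York', 'Los Angeles', 'Chicago', 'New York', 'Chicago'],
    'Population': [8.6, 3.9, 2.7, 8.6, 2.7],
})

# Several aggregates per city in one call
summary = df.groupby('City')['Population'].agg(['mean', 'count'])
print(summary)
```

The result is a DataFrame indexed by city, with one column per requested aggregate.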
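The groupby() function groups the rows by city, and the mean() function then computes the average population within each group.
Pandas provides the merge() and join() functions to combine DataFrames: merge() matches rows on one or more common columns, while join() matches on the index by default.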
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Age': [25, 30, 35]})
# Merge DataFrames on 'ID' column
merged_df = pd.merge(df1, df2, on='ID', how='inner')
print("Merged DataFrame:")
print(merged_df)
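For contrast with the inner join, this sketch merges the same two DataFrames with how='outer'. Rows whose ID appears in only one DataFrame are kept, with NaN in the columns that have no match. The name outer_df is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Age': [25, 30, 35]})

# Outer join: keep every ID found in either DataFrame
outer_df = pd.merge(df1, df2, on='ID', how='outer')
print(outer_df)
```

Here ID 3 has a Name but no Age, and ID 4 has an Age but no Name.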
Two DataFrames, df1 and df2, share a common column 'ID'. The merge() function combines them based on the 'ID' column, and the how='inner' parameter specifies an inner join, so only IDs 1 and 2, which appear in both DataFrames, are kept.
Pandas provides functions like isna() and fillna() to detect and handle missing data in DataFrames.
import pandas as pd
import numpy as np
# Create a DataFrame with missing values
data = {'A': [1, np.nan, 3, 4],
        'B': [5, 6, np.nan, 8]}
df = pd.DataFrame(data)
# Check for missing values
print("Missing Values:")
print(df.isna())
# Fill missing values with 0
filled_df = df.fillna(0)
print("DataFrame after Filling Missing Values:")
print(filled_df)
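Filling with 0 is only one option. As a sketch of two common variations, the code below counts the missing values per column and fills each gap with its column mean instead. Whether a mean is a sensible replacement depends on the data; the names missing_counts and filled_mean are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 3, 4],
                   'B': [5, 6, np.nan, 8]})

# isna() gives booleans; summing them counts missing values per column
missing_counts = df.isna().sum()
print(missing_counts)

# fillna accepts a Series of per-column fill values, here the column means
filled_mean = df.fillna(df.mean())
print(filled_mean)
```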
The isna() function checks for missing values in the DataFrame, returning True wherever a value is missing. The fillna() function then fills the missing values with 0 and returns a new DataFrame, leaving df unchanged.
In this topic, we've covered the basics of Pandas, including its data structures, reading and writing data, indexing and selection, grouping, merging, and handling missing data. Pandas is a versatile library that simplifies data manipulation and analysis tasks, making it an essential tool for data scientists, analysts, and Python developers.
By understanding the fundamentals of Pandas and mastering its usage, developers can efficiently work with structured data, perform data wrangling tasks, and gain insights from their datasets. Happy Coding!❤️