This tutorial, "Pandas DataFrames and Series", introduces the fundamental concepts and practical techniques for working with Pandas, a Python library for data manipulation and analysis. Pandas provides two core data structures, DataFrame and Series, along with a rich set of functions for data cleaning, exploration, and transformation.
You can install Pandas using pip, the Python package manager:
pip install pandas
Once installed, you can import Pandas into your Python scripts or interactive sessions using:
import pandas as pd
Here, pd is a commonly used alias for Pandas, making it easier to reference Pandas functions and objects.
Pandas Series are one-dimensional labeled arrays capable of holding any data type. You can create a Series from a Python list, array, or dictionary.
import pandas as pd
# Creating a Series from a list
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
print(series)
0    1
1    2
2    3
3    4
4    5
dtype: int64
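The text also mentions building a Series from a dictionary. As a minimal sketch (the fruit prices are made-up illustration data, not from the original example), the dictionary keys become the index labels, so values can be looked up by label:

```python
import pandas as pd

# Hypothetical data: dictionary keys become the index labels
prices = {"apple": 1.2, "banana": 0.5, "cherry": 3.0}
price_series = pd.Series(prices)
print(price_series)

# Look up a value by its label rather than by its position
banana_price = price_series["banana"]
print(banana_price)
```

Label-based access like this is what distinguishes a Series from a plain Python list.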
A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or SQL table, where each column represents a different variable, and each row represents a different observation.
You can create DataFrames from various data sources such as dictionaries, lists, NumPy arrays, or external files like CSV or Excel.
import pandas as pd
# Creating a DataFrame from a dictionary
data = {'Name': ['John', 'Emma', 'Ryan', 'Emily'],
        'Age': [25, 30, 35, 28],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data)
print(df)
    Name  Age         City
0   John   25     New York
1   Emma   30  Los Angeles
2   Ryan   35      Chicago
3  Emily   28      Houston
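Dictionaries are only one of the sources mentioned above. The sketch below shows two others, a list of records and CSV data. To keep it self-contained, the CSV content is a hypothetical in-memory string read through io.StringIO; with a real file you would pass its path to pd.read_csv() instead.

```python
import io

import pandas as pd

# List of records: each dictionary becomes one row
records = [
    {"Name": "John", "Age": 25},
    {"Name": "Emma", "Age": 30},
]
df_from_records = pd.DataFrame(records)

# CSV text held in memory (hypothetical content standing in for a file)
csv_text = "Name,Age\nRyan,35\nEmily,28\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_records)
print(df_from_csv)
```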
You can perform various operations on DataFrames, such as selecting columns, filtering rows, and computing summary statistics.
# Selecting a column
print(df['Name'])
# Filtering rows based on a condition
print(df[df['Age'] > 28])
# Computing summary statistics
print(df.describe())
0     John
1     Emma
2     Ryan
3    Emily
Name: Name, dtype: object
   Name  Age         City
1  Emma   30  Los Angeles
2  Ryan   35      Chicago
             Age
count   4.000000
mean   29.500000
std     4.203173
min    25.000000
25%    27.250000
50%    29.000000
75%    31.250000
max    35.000000
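Filters can also combine several conditions, and summary statistics can be computed for a single column. The following is a small sketch using the same sample data; the variable names (people, mid_range, and so on) are illustrative only.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["John", "Emma", "Ryan", "Emily"],
    "Age": [25, 30, 35, 28],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
})

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
mid_range = people[(people["Age"] > 25) & (people["Age"] < 35)]
print(mid_range)

# Summary statistics for a single column
mean_age = people["Age"].mean()
oldest_age = people["Age"].max()
print(mean_age, oldest_age)
```

The parentheses matter because & binds more tightly than the comparison operators in Python.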
Pandas provides several ways to index and select data in a DataFrame, including using column names, row indices, and boolean conditions.
# Selecting a single column
print(df['Name'])
# Selecting multiple columns
print(df[['Name', 'Age']])
# Selecting rows by index
print(df.loc[1])
# Selecting rows and columns by index
print(df.loc[1, 'Age'])
# Selecting rows based on a condition
print(df[df['Age'] > 28])
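Use [] to select a single column or multiple columns by their names. Use the loc[] accessor to select rows by their index labels, or pass a row label and a column name together, as in df.loc[1, 'Age'], to select a single value.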
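The loc[] accessor also accepts a boolean condition for the rows and a list of names for the columns. The sketch below rebuilds the sample data under the illustrative name people so that it runs on its own.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["John", "Emma", "Ryan", "Emily"],
    "Age": [25, 30, 35, 28],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
})

# Rows matching a condition, restricted to two columns
older = people.loc[people["Age"] > 28, ["Name", "City"]]
print(older)

# A label slice with loc includes both endpoints (rows 1 through 2)
middle_rows = people.loc[1:2, "Name"]
print(middle_rows)
```

Note that slicing by label with loc[] is inclusive at both ends, unlike ordinary Python list slicing.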
You can set one of the columns as the index of the DataFrame using the set_index() method.
# Setting 'Name' column as index
df.set_index('Name', inplace=True)
print(df)
The set_index() method sets the 'Name' column as the index of the DataFrame. The inplace=True parameter modifies the DataFrame in place, without returning a new DataFrame.
If you have set an index and want to reset it back to the default integer index, you can use the reset_index() method.
# Resetting index
df.reset_index(inplace=True)
print(df)
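Both calls above use inplace=True. As a sketch of the alternative, assuming you would rather leave the original DataFrame unchanged, you can assign the DataFrame that set_index() and reset_index() return. The people DataFrame and the other names here are illustrative only.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["John", "Emma"],
    "Age": [25, 30],
})

# Without inplace=True, set_index() returns a new DataFrame
by_name = people.set_index("Name")
emma_age = by_name.loc["Emma", "Age"]  # label-based lookup on the new index
print(emma_age)

# reset_index() moves 'Name' back into a regular column
restored = by_name.reset_index()
print(restored)
```

Here people keeps its default integer index throughout, which makes it easier to reason about which object has which index.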
The reset_index() method resets the index of the DataFrame back to the default integer index. The inplace=True parameter modifies the DataFrame in place, without returning a new DataFrame.
You can add new columns to a DataFrame or remove existing ones using simple assignment or the drop() method.
# Adding a new column
df['Gender'] = ['Male', 'Female', 'Male', 'Female']
print(df)
# Removing a column
df.drop(columns=['Gender'], inplace=True)
print(df)
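A new column can also be derived from an existing one rather than typed in by hand. The sketch below uses a hypothetical AgeInMonths column for illustration, and it calls drop() without inplace=True so that the original DataFrame is left untouched.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["John", "Emma", "Ryan"],
    "Age": [25, 30, 35],
})

# Derive a new column from an existing one (element-wise arithmetic)
people["AgeInMonths"] = people["Age"] * 12

# drop() without inplace=True returns a copy and leaves 'people' unchanged
without_age = people.drop(columns=["Age"])
print(without_age)
```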
Assigning a list to df['Gender'] adds a new 'Gender' column to the DataFrame df. The drop() method removes the 'Gender' column from the DataFrame. The columns parameter specifies the columns to drop, and inplace=True modifies the DataFrame in place.
Pandas provides methods like fillna() and dropna() to handle missing values in DataFrames.
# Filling missing values with a specified value
df.fillna(0, inplace=True)
# Dropping rows with missing values
df.dropna(inplace=True)
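The sample DataFrame used so far has no missing values, so the two calls above leave it unchanged. To see their effect, here is a minimal sketch on hypothetical data that does contain a gap; the scores DataFrame and its column names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data with one missing score
scores = pd.DataFrame({
    "Name": ["John", "Emma", "Ryan"],
    "Score": [88.0, float("nan"), 92.0],
})

# Count the missing values in each column
missing_counts = scores.isna().sum()
print(missing_counts)

# Replace missing scores with 0 (returns a new DataFrame)
filled = scores.fillna({"Score": 0})
print(filled)

# Keep only the rows that have no missing values
dropped = scores.dropna()
print(dropped)
```

Checking isna().sum() first is a cheap way to decide whether filling or dropping is the more sensible choice for a given column.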
The fillna() method fills missing values in the DataFrame df with a specified value (in this case, 0). The dropna() method drops rows with missing values from the DataFrame df. The inplace=True parameter modifies the DataFrame in place.
You can group data in a DataFrame based on one or more columns and perform aggregation operations.
# Grouping data by 'City' and computing mean age
grouped_df = df.groupby('City')['Age'].mean()
print(grouped_df)
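Every city in the sample data appears only once, so each group there holds a single row. The sketch below uses hypothetical data in which cities repeat, and computes several aggregations at once with agg(); the visits DataFrame is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical data in which cities repeat
visits = pd.DataFrame({
    "City": ["Chicago", "Houston", "Chicago", "Houston", "Chicago"],
    "Age": [20, 30, 40, 52, 60],
})

# Several aggregations per group in a single call
age_stats = visits.groupby("City")["Age"].agg(["mean", "min", "max", "count"])
print(age_stats)
```

The result is a DataFrame indexed by city, with one column per aggregation.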
The groupby() method groups the DataFrame df by the 'City' column. The mean() function computes the average age within each group. grouped_df is a Series containing the mean age for each city.
In this topic, we explored Pandas DataFrames and Series, covering fundamental concepts and practical techniques for data manipulation and analysis in Python. We then moved on to indexing and selecting data, adding and removing columns, cleaning missing values, and aggregating data through grouping. Happy coding! ❤️