How to filter and slice DataFrames
Category: Data Science
Short answer
Filtering and slicing extract subsets of data for focused analysis, transformation, or visualization.
Steps
- Use boolean indexing for conditional filtering based on column values.
- Select rows and columns by labels with loc using slice notation.
- Select rows and columns by integer position with iloc.
- Use query for readable multi-condition filters on named columns.
- Combine conditions with the bitwise operators &, |, and ~, wrapping each condition in parentheses for complex logic.
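The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive recipe: the DataFrame, its column names (region, sales, units), and its index labels are hypothetical sample data chosen for the example.

```python
import pandas as pd

# Hypothetical sample data; column names and index labels are illustrative only.
df = pd.DataFrame(
    {
        "region": ["north", "south", "east", "west"],
        "sales": [100, 150, 200, 250],
        "units": [10, 12, 18, 20],
    },
    index=["a", "b", "c", "d"],
)

# Boolean indexing: keep rows where sales exceed 120.
high_sales = df[df["sales"] > 120]

# loc: label-based selection; the row slice "b":"c" includes both endpoints.
by_label = df.loc["b":"c", ["region", "sales"]]

# iloc: position-based selection; the stop position 2 is excluded.
by_position = df.iloc[0:2, 0:2]

# query: a multi-condition filter written as an expression string.
queried = df.query("sales > 120 and units < 20")

# Bitwise operators: each condition is wrapped in parentheses.
combined = df[(df["sales"] > 120) & (df["units"] < 20)]

print(combined)
```

Here queried and combined select the same rows; which form reads better is largely a matter of taste and of how many conditions are involved.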
Tips
- Use isin for membership checks instead of chaining multiple equality conditions.
- Create boolean masks separately for readability and reuse.
- Avoid chained indexing like df[a][b], which may return either a view or a copy; use a single df.loc[b, a] call instead.
- Use between for inclusive range filtering on numeric columns.
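A short sketch of these tips, assuming a hypothetical DataFrame with region and sales columns. The mask names are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame(
    {
        "region": ["north", "south", "east", "west"],
        "sales": [100, 150, 200, 250],
    }
)

# isin: one membership check instead of (region == "north") | (region == "east").
in_regions = df[df["region"].isin(["north", "east"])]

# Named masks: build each condition once, then combine and reuse them.
is_mid_range = df["sales"].between(150, 200)  # both endpoints included by default
is_not_west = df["region"] != "west"
selected = df[is_mid_range & is_not_west]

# A single loc call selects rows and a column together,
# instead of chaining two indexing operations.
south_sales = df.loc[df["region"] == "south", "sales"]

print(selected)
```

Keeping masks in named variables also makes it easier to inspect how many rows each condition matches, for example with is_mid_range.sum().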
Common issues
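- SettingWithCopyWarning when assigning to a filtered subset that may be a copy of the original; assign through loc on the original, or call copy() on the subset first.
- Operator precedence bugs when combining & and | without parentheses, because both bind more tightly than comparisons such as > and ==.
- Label-based slicing with loc being inclusive on both ends, unlike Python slicing and iloc, which exclude the stop position.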
- Performance degradation from complex boolean masks on very large DataFrames.
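The first three issues can be avoided along these lines. This is a sketch under assumed sample data (the sales and units columns and the index labels are hypothetical), and whether a warning actually appears in the unsafe variants depends on the pandas version and settings.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame(
    {"sales": [100, 150, 200, 250], "units": [10, 12, 18, 20]},
    index=["a", "b", "c", "d"],
)

# Take an explicit copy before modifying a filtered subset,
# so the change clearly applies to the subset only.
subset = df[df["sales"] > 120].copy()
subset["sales"] = subset["sales"] * 2

# To modify the original instead, assign through one loc call.
df.loc[df["units"] < 12, "units"] = 12

# Parenthesize each comparison: & binds more tightly than > and >=,
# so without parentheses 120 & df["units"] would be evaluated first.
mask = (df["sales"] > 120) & (df["units"] >= 18)

# Label slices include the stop label; positional slices exclude the stop position.
label_rows = df.loc["a":"c"]    # rows a, b, c
position_rows = df.iloc[0:2]    # rows a, b

print(df[mask])
```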
Example
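import pandas as pd
import numpy as np
# Build a small DataFrame with one missing value.
df = pd.DataFrame({'sales': [100, 150, 200, np.nan]})
# Replace the missing value with the column median (150 here).
df['sales'] = df['sales'].fillna(df['sales'].median())
# Print count, mean, standard deviation, min, quartiles, and max.
print(df.describe())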
This snippet creates a DataFrame, fills its one missing value with the column median, and prints the summary statistics commonly used in exploratory analysis.