How to filter and slice DataFrames

· Category: Data Science

Short answer

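In pandas, filtering selects the rows that satisfy a condition, and slicing selects rows or columns by label or position. Both extract a subset of a DataFrame for focused analysis, transformation, or visualization.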

Steps

  1. Use boolean indexing for conditional filtering based on column values.
  2. Select rows and columns by labels with loc using slice notation.
  3. Select rows and columns by integer position with iloc.
  4. Use query for readable multi-condition filters on named columns.
  5. Combine conditions with the bitwise operators & (and), | (or), and ~ (not), wrapping each condition in parentheses.
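
The steps above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the sample data, the column names (region, sales, units), and the index labels are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical sample data; column names and index labels are illustrative.
df = pd.DataFrame(
    {
        "region": ["north", "south", "east", "west"],
        "sales": [100, 150, 200, 250],
        "units": [10, 12, 20, 30],
    },
    index=["a", "b", "c", "d"],
)

# 1. Boolean indexing: rows where sales exceeds 120.
high_sales = df[df["sales"] > 120]

# 2. Label-based selection with loc: rows "a" through "c" (inclusive), two columns.
by_label = df.loc["a":"c", ["region", "sales"]]

# 3. Position-based selection with iloc: first two rows, first two columns.
by_position = df.iloc[0:2, 0:2]

# 4. query: a multi-condition filter written as a string expression.
queried = df.query("sales > 120 and units < 25")

# 5. Bitwise operators, with parentheses around each condition.
combined = df[(df["sales"] > 120) & ((df["units"] < 15) | (df["region"] == "west"))]

print(high_sales, by_label, by_position, queried, combined, sep="\n\n")
```

Note that query accepts the keywords and/or inside its expression string, while plain boolean indexing needs & and | on the Series themselves.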

Tips

  • Use isin for membership checks instead of chaining multiple equality conditions.
  • Create boolean masks separately for readability and reuse.
  • Avoid chained indexing like df[a][b], which may return either a view or a copy; use a single df.loc[b, a] call instead.
  • Use between for inclusive range filtering on numeric columns.
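
A short sketch of these tips, assuming a small made-up DataFrame with region and sales columns (both names are illustrative):

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame(
    {
        "region": ["north", "south", "east", "west"],
        "sales": [100, 150, 200, 250],
    }
)

# isin replaces (df["region"] == "north") | (df["region"] == "east").
in_region = df["region"].isin(["north", "east"])

# between is inclusive on both ends by default, so 150 and 250 both match.
in_range = df["sales"].between(150, 250)

# Named masks can be reused and combined without repeating the conditions.
selected = df[in_region & in_range]
outside_range = df[~in_range]

# A single loc call replaces chained indexing such as df["sales"][in_region].
sales_in_region = df.loc[in_region, "sales"]

print(selected, outside_range, sales_in_region, sep="\n\n")
```

Keeping each mask in its own variable also makes it easy to inspect how many rows a condition matches, for example with in_range.sum().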

Common issues

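  • SettingWithCopyWarning when assigning to a filtered subset that may be a view or a copy of the original; call copy() on the subset, or assign through loc on the original DataFrame.
  • Operator precedence bugs when combining conditions with & and | without parentheses: these operators bind more tightly than comparisons such as > or ==, so each condition needs its own parentheses.
  • Label-based slicing with loc includes both endpoints, unlike Python slicing and iloc, which exclude the stop position.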
  • Performance degradation from complex boolean masks on very large DataFrames.
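
The first three issues can be avoided along these lines. This is a sketch under assumed data: the sales column, the derived sales_k column, and the index labels are hypothetical.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"sales": [100, 150, 200, 250]}, index=["a", "b", "c", "d"])

# Take an explicit copy so later assignments affect only the subset
# and do not trigger SettingWithCopyWarning.
subset = df[df["sales"] > 120].copy()
subset["sales_k"] = subset["sales"] / 1000

# To modify the original instead, assign through a single loc call.
df.loc[df["sales"] > 200, "sales"] = 0

# Each condition is parenthesized because & binds more tightly than > and <.
mid_range = df[(df["sales"] > 120) & (df["sales"] < 250)]

# loc includes both endpoints; iloc excludes the stop position.
label_slice = df.loc["a":"c"]   # rows a, b, c
position_slice = df.iloc[0:2]   # rows a, b

print(subset, df, mid_range, label_slice, position_slice, sep="\n\n")
```

Because subset is an independent copy, the later assignment to df leaves its values unchanged.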

Example

import pandas as pd
import numpy as np

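df = pd.DataFrame({'sales': [100, 150, 200, np.nan]})
# Fill the missing value with the column median (150.0) before filtering.
df['sales'] = df['sales'].fillna(df['sales'].median())

# Boolean indexing: keep rows where sales is at least 150.
high_sales = df[df['sales'] >= 150]
print(high_sales)

This snippet creates a DataFrame, fills a missing value with the median, and then uses boolean indexing to keep only the rows where sales is at least 150.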