Air Quality Analysis

An exploratory analysis of Pune air-quality sensor data to identify pollution hotspots, daily patterns, and intervention opportunities.

Data Analysis · Jan 21, 2026
Pandas · NumPy · Plotly

Problem

Pune has multiple air-quality monitoring stations, but raw sensor readings are difficult to turn into decisions. This project analyzes more than 103,000 readings from 10 stations to answer three practical questions: where is air quality consistently worse, when does exposure spike, and which patterns are strong enough to act on?

Dataset Snapshot

Metric | Value
Data points | 103,000+
Monitoring stations | 10
Parameters tracked | 28
Time period | May-Aug 2019

Findings

Transport corridors dominate the ranking

[Figure: Location Ranking]

Hadapsar Gadital and Pune Railway Station stand out as the most polluted locations in the dataset. Both are traffic-heavy areas, which makes them good candidates for targeted interventions instead of city-wide blanket measures.
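
A ranking like this can be produced with a per-station aggregate. The following is a minimal sketch, not the notebook's actual code: the column names `station` and `pm25`, the station labels, and the values are all assumptions made up for illustration.

```python
import pandas as pd

# Toy readings. Station labels and values are illustrative only,
# not taken from the Pune dataset.
readings = pd.DataFrame({
    "station": ["A", "A", "B", "B", "C", "C"],
    "pm25": [80.0, 90.0, 40.0, 50.0, 20.0, 30.0],
})


def rank_stations(df, pollutant="pm25"):
    """Return stations ordered from highest to lowest mean concentration."""
    return (
        df.groupby("station")[pollutant]
        .mean()
        .sort_values(ascending=False)
    )


ranking = rank_stations(readings)
print(ranking)
```

Ranking on the mean is only one choice. A median or a high percentile would be less sensitive to a few extreme readings.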

Morning commute hours are the main risk window

[Figure: Hourly Patterns]

PM2.5, PM10, and NO2 all spike between 8 AM and 10 AM, aligning with daily commute patterns. This matters more than a daily average because it points to the hours when people are most likely to face the worst exposure.

Time | Risk level | Practical takeaway
5-7 AM | Low | Best window for outdoor activity
8-10 AM | High | Avoid outdoor exertion where possible
12-4 PM | Moderate | Conditions vary by location
10 PM+ | Low | Safer than peak commute hours
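
An hourly profile of this kind comes from grouping readings by hour of day. This is a rough sketch under assumed column names (`timestamp`, `pm25`) and with invented values, so the numbers it prints say nothing about the real data.

```python
import pandas as pd

# Toy readings at three hours on two days. Values are illustrative only.
readings = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-05-06 06:00", "2019-05-06 09:00", "2019-05-06 14:00",
        "2019-05-07 06:00", "2019-05-07 09:00", "2019-05-07 14:00",
    ]),
    "pm25": [30.0, 90.0, 55.0, 34.0, 110.0, 65.0],
})


def hourly_profile(df, pollutant="pm25"):
    """Mean concentration for each hour of the day (0-23)."""
    hours = df["timestamp"].dt.hour.rename("hour")
    return df.groupby(hours)[pollutant].mean()


profile = hourly_profile(readings)
peak_hour = profile.idxmax()  # hour with the highest mean reading
print(profile)
print("Peak hour:", peak_hour)
```

The same function can be applied per pollutant or per station to check whether the peak window holds everywhere.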

Weekends show a measurable drop

[Figure: Weekday vs Weekend]

Pollution falls by roughly 15% on weekends, which strengthens the case that traffic is one of the main drivers. That gives city planners a cleaner signal than general assumptions about “urban pollution.”
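
The weekend delta is a percentage change between two group means. Here is a small sketch of that calculation, assuming a `timestamp` column and a single pollutant column named `pm25`. The toy values are deliberately different from the real result.

```python
import pandas as pd

# Two weekdays (Mon, Tue) and one weekend (Sat, Sun). Values are illustrative.
readings = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-05-06", "2019-05-07", "2019-05-11", "2019-05-12",
    ]),
    "pm25": [90.0, 110.0, 70.0, 90.0],
})


def weekend_change_pct(df, pollutant="pm25"):
    """Percent change of the weekend mean relative to the weekday mean."""
    # pandas numbers Monday as 0, so 5 and 6 are Saturday and Sunday.
    is_weekend = df["timestamp"].dt.dayofweek >= 5
    weekday_mean = df.loc[~is_weekend, pollutant].mean()
    weekend_mean = df.loc[is_weekend, pollutant].mean()
    return 100.0 * (weekend_mean - weekday_mean) / weekday_mean


change = weekend_change_pct(readings)
print(f"Weekend change: {change:+.1f}%")
```

A negative value means weekends are cleaner. Computing it per station would show whether the drop is concentrated near transport corridors.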

Clustering separates chronic hotspots from occasional spikes

[Figure: Cluster Analysis]

K-means clustering split the stations into three practical groups:

Zone | Meaning | Suggested response
Green | Usually clean | Maintain current conditions
Caution | Spikes at certain times | Increase monitoring and alerts
Hotspot | Persistently poor air quality | Prioritize mitigation
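
To make the grouping concrete, here is a NumPy-only sketch of k-means (Lloyd's algorithm) on per-station features. Several things here are assumptions rather than the notebook's actual setup: the two features (mean PM2.5 and share of hours above a limit), the toy values, the deterministic initialization used in place of the usual random one, and the rule that names clusters by their mean PM2.5.

```python
import numpy as np


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids).

    Initial centroids are taken at evenly spaced ranks of the first
    feature, a deterministic stand-in for random initialization.
    """
    order = np.argsort(points[:, 0])
    init_idx = order[np.linspace(0, len(points) - 1, k).astype(int)]
    centroids = points[init_idx].copy()
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if empty).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical per-station features: [mean PM2.5, share of hours above a limit].
features = np.array([
    [20.0, 0.02], [25.0, 0.05],
    [45.0, 0.30], [50.0, 0.35],
    [95.0, 0.80], [100.0, 0.85],
])

# Standardize so both features contribute on the same scale.
scaled = (features - features.mean(axis=0)) / features.std(axis=0)
labels, centroids = kmeans(scaled, k=3)

# Name clusters from cleanest to dirtiest by their centroid's first feature.
by_mean = np.argsort(centroids[:, 0])
zone_for_cluster = dict(zip(by_mean.tolist(), ["Green", "Caution", "Hotspot"]))
zones = [zone_for_cluster[int(c)] for c in labels]
print(zones)
```

Including a spike-frequency feature alongside the mean is what lets a clustering separate "occasionally bad" from "persistently bad" stations. A mean alone would blur the two.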

Pollutants move together

[Figure: Correlation Matrix]

PM2.5 and PM10 are strongly correlated, which suggests they often come from the same sources, such as traffic and dust. That matters because one well-targeted intervention can improve more than one metric at once.
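
A correlation matrix like this takes one call in pandas. The sketch below uses invented values and assumed column names (`pm25`, `pm10`, `no2`), so it only shows the mechanics.

```python
import pandas as pd

# Toy pollutant readings. Values are illustrative, not from the dataset.
readings = pd.DataFrame({
    "pm25": [20.0, 35.0, 50.0, 65.0, 80.0],
    "pm10": [45.0, 70.0, 98.0, 130.0, 160.0],
    "no2": [30.0, 22.0, 41.0, 28.0, 35.0],
})

# Pairwise Pearson correlation between all numeric columns.
corr = readings.corr(method="pearson")
print(corr.round(2))
```

Pearson correlation only captures linear co-movement. Switching to `method="spearman"` is a quick check on whether the relationship is monotonic but not linear.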

What This Suggests

  • Focus enforcement and mitigation near transport-heavy locations first.
  • Treat the morning commute window as the highest-priority exposure period.
  • Use weekend deltas as evidence when evaluating traffic-related policies.
  • Group stations by behavior, not just absolute averages, when deciding where to invest.

Approach

Step | What I did
Data cleaning | Parsed timestamps, handled missing values, flagged bad readings
Exploratory analysis | Compared stations, time windows, and weekday/weekend shifts
Correlation analysis | Measured how pollutants move together
Clustering | Grouped stations by pollution behavior
Threshold comparison | Checked readings against NAAQS-style limits
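
The cleaning and threshold steps can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: the column names, the `max_valid` sanity bound, and the helper names `clean_readings` and `daily_exceedances` are hypothetical. The limit of 60 µg/m³ is India's NAAQS 24-hour standard for PM2.5. Since the write-up says only "NAAQS-style", the exact limits used in the analysis may differ.

```python
import pandas as pd

# Toy raw data with one unparseable timestamp and one impossible value.
raw = pd.DataFrame({
    "timestamp": ["2019-05-06 08:00", "2019-05-06 09:00", "not a time",
                  "2019-05-07 08:00", "2019-05-07 09:00"],
    "pm25": [70.0, 90.0, 40.0, -5.0, 30.0],
})

PM25_24H_LIMIT = 60.0  # µg/m³, India NAAQS 24-hour PM2.5 (assumed choice)


def clean_readings(df, pollutant="pm25", max_valid=1000.0):
    """Parse timestamps, drop unparseable rows, flag implausible readings."""
    out = df.copy()
    # Unparseable timestamps become NaT instead of raising.
    out["timestamp"] = pd.to_datetime(out["timestamp"], errors="coerce")
    out = out.dropna(subset=["timestamp"])
    # Negative, missing, or absurdly large values are flagged, not deleted.
    out["bad_reading"] = ~out[pollutant].between(0, max_valid)
    return out


def daily_exceedances(df, pollutant="pm25", limit=PM25_24H_LIMIT):
    """Daily means (from good readings only) that exceed the limit."""
    good = df[~df["bad_reading"]]
    daily = good.groupby(good["timestamp"].dt.date)[pollutant].mean()
    return daily[daily > limit]


cleaned = clean_readings(raw)
exceed = daily_exceedances(cleaned)
print(cleaned)
print(exceed)
```

Flagging bad readings rather than dropping them keeps the row count auditable, so it is possible to report how much of each station's data was excluded.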

Outcome

The result is a notebook-based analysis that makes the dataset easier to reason about for both residents and planners. Instead of a generic “air quality is bad” conclusion, it points to specific places, times, and policy levers that deserve attention.