About This Project
An interactive exploration of AR Rahman's complete discography —
282 albums (across multiple languages) and 2,130 tracks spanning 1992 to 2026,
visualized through data. Browse sonic fingerprints, trace collaborator networks,
and discover patterns across more than three decades of work by one of cinema's most inventive composers.
Built by Swaroop and Sharan, who also host
Brothers in Music: The AR Rahman Edition — a podcast where two brothers
explore Rahman's discography year by year.
The Podcast
- 1. 1992
- 2. 1993
- 3. 1994 (Part 1)
- 4. 1994 (Part 2)
- 5. 1995 (Part 1)
- 6. 1995 (Part 2)
- 7. 1996
- 8. 1997 (Part 1)
- 9. 1997 (Part 2)
- 10. 1997: Vande Mataram
- 11. 1998 (Part 1: Dil Se)
- 12. 1998 (Part 2: 1947 Earth)
- 13. 1999 (Part 1)
Data Sources
Track and film metadata was scraped from Wikipedia's discography pages for
AR Rahman. This includes album names, track listings, singer credits, lyricist credits,
director, cast, release year, and language.
Audio features come from two sources:
- Spotify (via ReccoBeats) —
provides danceability, energy, valence, acousticness, and tempo for tracks with a
Spotify match (~35% of the catalog).
- Librosa (direct audio analysis) — computes rhythmic complexity,
spectral features, and harmonic structure from the audio waveform (~64% of the catalog).
For tracks with Spotify data, we use Spotify's features directly. For tracks without,
we estimate equivalent scores from librosa's audio analysis, using the ~730 tracks that
have both sources as a calibration set. This gives us unified audio features for about
two-thirds of all tracks. Coverage varies by album — the
Data Quality page shows completeness for each one.
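The fallback described above can be sketched in a few lines. This is a minimal illustration, not the site's actual pipeline: it assumes a single librosa measurement (here a hypothetical loudness value called librosa_rms) predicts a single Spotify feature through a straight-line fit, and the calibration numbers are invented. The real calibration presumably uses more inputs per feature.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def unified_feature(spotify_value, librosa_value, slope, intercept):
    """Prefer Spotify; otherwise predict from librosa; otherwise report missing."""
    if spotify_value is not None:
        return spotify_value, "spotify"
    if librosa_value is not None:
        predicted = slope * librosa_value + intercept
        # Keep the estimate inside the 0-1 range used by the other features.
        return min(1.0, max(0.0, predicted)), "predicted"
    return None, "missing"

# Toy calibration set (made-up numbers): tracks that have both sources.
librosa_rms = [0.10, 0.20, 0.30, 0.40, 0.50]
spotify_energy = [0.25, 0.40, 0.50, 0.75, 0.85]

slope, intercept = fit_line(librosa_rms, spotify_energy)
fit_quality = r_squared(librosa_rms, spotify_energy, slope, intercept)
```

Returning the source label alongside the value lets a chart show predicted scores with reduced confidence, and fit_quality gives a rough sense of how far to trust them.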
Audio Features Glossary
Each feature is a number between 0 and 1; tempo is measured in BPM and then normalized to the same range.
These are the six dimensions used across the site's radar charts and scatter plots.
- Energy —
Perceptual intensity and activity. Energetic tracks feel fast, loud, and driven.
A soft ghazal scores low; a high-octane dance number scores high.
- Microtonal Variation —
How much a track uses pitches between the 12 standard Western notes.
Higher values indicate more use of gamakas, shrutis, and bent notes — common in
Carnatic- and Hindustani-inflected compositions. Derived from librosa's chroma and
pitch analysis.
- Valence —
Musical positivity. High valence = happy, cheerful, euphoric.
Low valence = sad, melancholic, intense. Note: for tracks without Spotify data,
valence is estimated and shown with reduced confidence.
- Acousticness —
How acoustic (vs. electronic) a track sounds.
1.0 = high confidence the track is purely acoustic.
- Rhythmic Complexity —
How layered and varied the rhythmic texture is. High scores indicate
complex, shifting percussion patterns; low scores indicate a simple, steady beat.
Derived from onset density and beat-tracking confidence.
- Tempo —
The speed of the track in beats per minute, normalized to a 0–1 scale.
Rahman's tracks range from ~46 to ~217 BPM.
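As an illustration of the tempo normalization, here is a minimal min-max scaling sketch. The bounds are the approximate figures quoted in the Tempo entry, and clamping out-of-range values is our assumption; the site's exact constants and edge-case handling may differ.

```python
# Approximate bounds taken from the glossary text; exact values may differ.
MIN_BPM = 46.0
MAX_BPM = 217.0

def normalize_tempo(bpm, lo=MIN_BPM, hi=MAX_BPM):
    """Min-max scale a tempo in BPM to the 0-1 range, clamping outliers."""
    scaled = (bpm - lo) / (hi - lo)
    return min(1.0, max(0.0, scaled))

slow = normalize_tempo(46)      # slowest end of the range
fast = normalize_tempo(217)     # fastest end of the range
middle = normalize_tempo(131.5) # halfway between the two bounds
```

With these bounds, a track at 131.5 BPM lands exactly in the middle of the scale, so a tempo score of 0.5 does not mean 100 BPM or 120 BPM.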
What These Features Don't Measure
These six dimensions capture useful sonic properties, but they were designed for
Western popular music and have real blind spots when applied to Rahman's work:
- Raga and tala structure — The melodic grammar of Carnatic and
Hindustani music is not captured. A track rooted in Mohanam and one in Kalyani may
score similarly on all six features despite being melodically distinct.
- Melodic contour and ornamentation — Gamakas, meends, and
other ornamental techniques that define a vocalist's style are invisible to these features.
- Orchestral layering — Rahman's signature production —
building from a single voice to a full arrangement — is not directly measured.
Energy captures some of this, but not the texture.
- Cultural mood — Valence measures "positivity" by Western standards.
A devotional track like Khwaja Mere Khwaja may score low on valence while being
deeply uplifting in its spiritual context.
Limitations & Caveats
This project is a labor of love, not a peer-reviewed dataset. We've tried to be thorough,
but there are real limitations you should know about:
- Not every track has audio features. About 28% of tracks have no
audio data from either source. Spotify matching depends on regional availability;
librosa requires finding the audio online. Coverage is weakest for early-1990s Tamil films
and obscure dubbed releases.
- Predicted features are approximate. For tracks with only librosa data,
features like valence are estimated via regression (R² ranging from 0.37 to 0.65)
and carry lower confidence than direct Spotify measurements. Valence predictions
are the least reliable.
- Wikipedia credits may be incomplete or wrong. Singer and lyricist credits
are community-maintained and may have gaps, duplicates, or errors — especially for
older or lesser-known albums. Some tracks list the composer as "singer" when the credit
really refers to the score performer.
- Language labels have noise. Our scraper sometimes picked up Wikipedia
section headers instead of actual languages. We've manually corrected hundreds of these,
but some may remain.
- Duplicate tracks across languages. Many Rahman films were released in
multiple languages (Tamil, Hindi, Telugu), so the same song may appear two or more times
with different titles and singers. We list each language version as a separate album, even
though the underlying composition is sometimes the same. Star (actor/actress) collaborations
on the Singer-Collaborators page are the exception: they are counted once per film, not once
per language version.
- Album boundaries are fuzzy. Wikipedia's definition of an "album" varies:
some pages list background score alongside songs, some split soundtracks from scores,
and some Hindi dubbed releases share the original Tamil compositions. Our 282-album
count reflects Wikipedia's structure, not a canonical discography.
- Audio features were designed for Western pop. Spotify's algorithms and
librosa's defaults were built largely with Western music in mind. Features like valence and
danceability may not translate perfectly to Carnatic-inflected, devotional, or folk compositions.
Our "Microtonal Variation" feature is an attempt to address this gap, but it's an
approximation.
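To make the idea behind "Microtonal Variation" more concrete, here is one way such a score could be approximated. This is a hypothetical sketch, not the formula used on this site: it assumes a pitch tracker has already produced a list of fundamental frequencies in Hz, assumes A4 = 440 Hz tuning, and measures how far each pitch sits from the nearest of the 12 equal-tempered notes.

```python
import math

A4_HZ = 440.0  # reference tuning; an assumption, recordings may be tuned differently

def cents_from_nearest_semitone(freq_hz):
    """Signed distance in cents from the closest 12-tone equal-tempered pitch."""
    semitones = 12 * math.log2(freq_hz / A4_HZ)
    return 100 * (semitones - round(semitones))

def off_grid_score(freqs_hz):
    """Mean absolute deviation, scaled so 0 = on the grid and 1 = a quarter tone off."""
    deviations = [abs(cents_from_nearest_semitone(f)) for f in freqs_hz]
    return sum(deviations) / len(deviations) / 50.0

# Two pitches exactly on the grid, then one pitch 25 cents sharp of A4.
on_grid = off_grid_score([440.0, 880.0])
bent = off_grid_score([440.0 * 2 ** (0.25 / 12)])
```

A score like this rewards sustained bent notes but cannot tell an intentional gamaka from a slightly detuned instrument or a recording mastered at a different reference pitch, which is one reason any such measure stays an approximation.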