About This Project

An interactive exploration of AR Rahman's complete discography: 283 albums across multiple languages and 2,110 tracks spanning 1992 to 2026, visualized through data. Browse sonic fingerprints, trace collaborator networks, and discover patterns across more than three decades of work by one of cinema's most inventive composers.

Built by Swaroop and Sharan, who also host Brothers in Music: The AR Rahman Edition — a podcast where two brothers explore Rahman's discography year by year.

The Podcast

Listen on Spotify or Apple Podcasts, or subscribe via the RSS feed.

All 13 episodes:

1. 1992
2. 1993
3. 1994 (Part 1)
4. 1994 (Part 2)
5. 1995 (Part 1)
6. 1995 (Part 2)
7. 1996
8. 1997 (Part 1)
9. 1997 (Part 2)
10. 1997: Vande Mataram
11. 1998 (Part 1: Dil Se)
12. 1998 (Part 2: 1947 Earth)
13. 1999 (Part 1)

Data Sources

Track and film metadata was scraped from Wikipedia's discography pages for AR Rahman. This includes album names, track listings, singer credits, lyricist credits, director, cast, release year, and language.

Audio features come from two sources:

Spotify (via ReccoBeats) — provides danceability, energy, valence, acousticness, and tempo for tracks with a Spotify match (~35% of the catalog).
Librosa (direct audio analysis) — computes rhythmic complexity, spectral features, and harmonic structure from the audio waveform (~64% of the catalog).

For tracks with Spotify data, we use Spotify's features directly. For tracks without, we estimate equivalent scores from librosa's audio analysis, using the ~730 tracks that have both sources as a calibration set. This gives us unified audio features for about two-thirds of all tracks. Coverage varies by album — the Data Quality page shows completeness for each one.
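The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the site's actual pipeline: it assumes a single librosa-derived predictor and a one-variable least-squares fit, and the function names (`fit_line`, `r_squared`, `unified_feature`) and the sample numbers are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs: Sequence[float], ys: Sequence[float],
              slope: float, intercept: float) -> float:
    """Share of variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def unified_feature(spotify_value: Optional[float],
                    librosa_value: Optional[float],
                    slope: float,
                    intercept: float) -> Tuple[Optional[float], str]:
    """Prefer the Spotify value; otherwise estimate it from librosa."""
    if spotify_value is not None:
        return spotify_value, "spotify"
    if librosa_value is not None:
        estimate = slope * librosa_value + intercept
        # Clamp so the estimate stays on the same 0-1 scale as Spotify's.
        return min(1.0, max(0.0, estimate)), "estimated"
    return None, "missing"


# Hypothetical calibration set: tracks that have both sources.
calib_librosa = [0.1, 0.2, 0.3, 0.4]
calib_spotify = [0.3, 0.5, 0.7, 0.9]
slope, intercept = fit_line(calib_librosa, calib_spotify)
fit_quality = r_squared(calib_librosa, calib_spotify, slope, intercept)

print(unified_feature(0.8, 0.2, slope, intercept))    # Spotify value wins
print(unified_feature(None, 0.2, slope, intercept))   # estimated from librosa
print(unified_feature(None, None, slope, intercept))  # no audio data at all
```

Returning the source label alongside the value is one way to let a chart show estimated features with reduced confidence.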

Audio Features Glossary

Each feature is a number between 0 and 1 (except tempo, which is in BPM before normalization). These are the six dimensions used across the site's radar charts and scatter plots.

Energy — Perceptual intensity and activity. Energetic tracks feel fast, loud, and driven. A soft ghazal scores low; a high-octane dance number scores high.
Microtonal Variation — How much a track uses pitches between the 12 standard Western notes. Higher values indicate more use of gamakas, shrutis, and bent notes — common in Carnatic- and Hindustani-inflected compositions. Derived from librosa's chroma and pitch analysis.
Valence — Musical positivity. High valence = happy, cheerful, euphoric. Low valence = sad, melancholic, intense. Note: for tracks without Spotify data, valence is estimated and shown with reduced confidence.
Acousticness — How acoustic (vs. electronic) a track sounds. 1.0 = high confidence the track is purely acoustic.
Rhythmic Complexity — How layered and varied the rhythmic texture is. High scores indicate complex, shifting percussion patterns; low scores indicate a simple, steady beat. Derived from onset density and beat-tracking confidence.
Tempo — The speed of the track in beats per minute, normalized to a 0–1 scale. Rahman's tracks range from ~46 to ~217 BPM.
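
As an illustration of the tempo entry, here is one plausible way to map BPM onto a 0–1 scale. It assumes simple min-max scaling between the approximate catalog extremes quoted above (46 and 217 BPM); the exact method used on the site is not stated here, and `normalize_tempo` is a hypothetical name.

```python
MIN_BPM = 46.0   # approximate slowest track in the catalog
MAX_BPM = 217.0  # approximate fastest track in the catalog


def normalize_tempo(bpm: float, lo: float = MIN_BPM, hi: float = MAX_BPM) -> float:
    """Min-max scale a tempo in BPM to the 0-1 range, clamping outliers."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scaled = (bpm - lo) / (hi - lo)
    return min(1.0, max(0.0, scaled))


print(normalize_tempo(46.0))   # slowest end of the range
print(normalize_tempo(131.5))  # midpoint of the range
print(normalize_tempo(217.0))  # fastest end of the range
```

Because the bounds are approximate, the sketch clamps values that fall slightly outside them instead of returning numbers below 0 or above 1.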

What These Features Don't Measure

These six dimensions capture useful sonic properties, but they were designed for Western popular music and have real blind spots when applied to Rahman's work:

Raga and tala structure — The melodic grammar of Carnatic and Hindustani music is not captured. A track rooted in Mohanam and one in Kalyani may score similarly on all six features despite being melodically distinct.
Melodic contour and ornamentation — Gamakas, meends, and other ornamental techniques that define a vocalist's style are invisible to these features.
Orchestral layering — Rahman's signature production — building from a single voice to a full arrangement — is not directly measured. Energy captures some of this, but not the texture.
Cultural mood — Valence measures "positivity" by Western standards. A devotional track like Khwaja Mere Khwaja may score low on valence while being deeply uplifting in its spiritual context.

Limitations & Caveats

This project is a labor of love, not a peer-reviewed dataset. We've tried to be thorough, but there are real limitations you should know about:

Not every track has audio features. Roughly a third of tracks have no audio data from either source. Spotify matching depends on regional availability; librosa requires finding the audio online. Coverage is weakest for early-1990s Tamil films and obscure dubbed releases.
Predicted features are approximate. For tracks with only librosa data, features like valence are estimated via regression (R² ranging from 0.37 to 0.65) and carry lower confidence than direct Spotify measurements. Valence predictions are the least reliable.
Wikipedia credits may be incomplete or wrong. Singer and lyricist credits are community-maintained and may have gaps, duplicates, or errors — especially for older or lesser-known albums. Some tracks list the composer as "singer" when the credit really refers to the score performer.
Language labels have noise. Our scraper sometimes picked up Wikipedia section headers instead of actual languages. We've manually corrected hundreds of these, but some may remain.
Duplicate tracks across languages. Many Rahman films were released in multiple languages (Tamil, Hindi, Telugu), so the same song may appear two or more times with different titles and singers. We separate albums by language even when the underlying composition is the same. Star (actor/actress) collaborations on the Singer-Collaborators page, however, are counted per film, not per language version.
Album boundaries are fuzzy. Wikipedia's definition of an "album" varies: some pages list background score alongside songs, some split soundtracks from scores, and some Hindi-dubbed releases share the original Tamil compositions. Our 283-album count reflects Wikipedia's structure, not a canonical discography.
Audio features were designed for Western pop. Spotify's models were built around Western popular music, and librosa's default settings are tuned to it as well. Features like valence and danceability may not translate well to Carnatic-inflected, devotional, or folk compositions. Our "Microtonal Variation" feature is an attempt to address this gap, but it is only an approximation.
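
To make the per-film counting mentioned under "Duplicate tracks across languages" concrete, here is a small sketch. The record layout (`film_id`, `language`, `cast`) and the function name are assumptions made for illustration and do not describe the site's actual schema; the sample rows are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def count_star_collaborations(albums: Iterable[Dict]) -> Counter:
    """Count each (star, film) pair once, however many language versions exist."""
    seen: Set[Tuple[str, str]] = set()
    counts: Counter = Counter()
    for album in albums:
        for star in album["cast"]:
            key = (star, album["film_id"])
            if key not in seen:
                seen.add(key)
                counts[star] += 1
    return counts


# Illustrative rows: one film listed in two languages, plus a second film.
albums: List[Dict] = [
    {"film_id": "roja-1992", "language": "Tamil",
     "cast": ["Arvind Swamy", "Madhoo"]},
    {"film_id": "roja-1992", "language": "Hindi",
     "cast": ["Arvind Swamy", "Madhoo"]},
    {"film_id": "bombay-1995", "language": "Tamil",
     "cast": ["Arvind Swamy", "Manisha Koirala"]},
]

counts = count_star_collaborations(albums)
for star, n in counts.most_common():
    print(star, n)
```

Keying on the film rather than the album keeps a dubbed release from inflating a star's collaboration count.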
