A Closer Look at the Geographical and Other Data Coming Out of SoundCloud and Other Music Services
I recently indulged my need for storage space on SoundCloud and upgraded to their “Pro-Unlimited” package. It could be argued that I’ve got a fair amount of cruft on my account, or at least recordings that could stand to be redone, but for the sake of simplicity, I went ahead and got the unlimited storage. This comes with the additional feature of more detailed reporting data, such as information about listeners at the city level. So I thought I’d take a closer look at what I’m getting.
As someone who does in fact have a degree in Geography and still has a great love for the subject, I find the city data particularly intriguing, if nothing else because I like to look at maps. My curiosity was especially piqued by the fact that, for some reason, France was showing up as one of the top countries. It’d be fun to have an excuse to visit, of course, but mostly it was just interesting. On a semi-related note, this past weekend I came across the latest edition of what is probably my favorite board game, “Ticket to Ride,” which combines France and the Old West. This of course got me dreaming of trains and riding the rails in Europe again.
I do, however, have some issues with how SoundCloud presents this data. As is often the case with these sorts of applications, the data is marred by poor presentation and made unnecessarily opaque. To make things worse, there seems to be pretty much no API for getting at reporting data (and apparently no dedicated API support), which, again, is not unusual in my own experience developing such applications. The use cases for these APIs tend to focus on things like upload and playback.
That’s not to say those functions aren’t valuable, but to an independent artist, as to any entrepreneur, data and trends about your listeners are invaluable. Location data is particularly important if you’re considering touring and want to target places that make sense to visit based on the potential audience there. One must of course be mindful that this geographic data is likely to be imperfect, because it assumes the client (i.e. the end user) is being “honest” about the location data they’re providing. My own data isn’t even terribly useful for that purpose, since it’s (sadly) a pretty small data set.
I’ll compare and contrast what SoundCloud provides with what I see on CD Baby (which I use for distribution) and Spotify, both of which likewise break the data down by listener location, albeit with a less interesting visual display. Basically, the Spotify and CD Baby data is the same set, but it’s a little more “raw” in Spotify. I’m looking at those two because I have some actual data points to compare and they seem to be the richest data-wise. I know Bandcamp offers mapping info for a premium (which I’m not paying), and I haven’t looked at other platforms like ReverbNation beyond a cursory glance suggesting it’s pretty limited (state-level stats only, with no obvious promise of more if I pony up some cash). That this data is so balkanized across platforms is itself problematic and worthy of its own discussion.
My primary complaint about SoundCloud’s presentation is that the data is artificially and arbitrarily limited to the top 50 in various categories, with no way to usefully break it down into regions: stepping from the country level to the city level goes from far too coarse to far too fine-grained. It’s not terribly useful to know the total number of listeners at the country level when dealing with the U.S. But as soon as you step down to the city level, unless you really know a particular area, it’s just a flurry of names. Furthermore, when you are looking at the country level, you can’t drill down into a country and see the cities within it. In the listing below, you should be able to click on, say, France, and see all the French cities. But you can’t. This seems like a pretty simple change request given what’s already there.
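The drill-down I’m asking for is essentially a group-by on country. Here is a minimal sketch, assuming you already have city-level rows as (city, country, plays), for example transcribed by hand, since SoundCloud offers no export or reporting API. The `CityPlays` type, the function names, and the counts are all hypothetical, for illustration only.

```python
from collections import defaultdict
from typing import NamedTuple


class CityPlays(NamedTuple):
    # Hypothetical row shape: one city, its country, and its play count.
    city: str
    country: str
    plays: int


def drill_down(rows: list[CityPlays]) -> dict[str, list[CityPlays]]:
    """Group city rows under their country, busiest city first."""
    by_country: dict[str, list[CityPlays]] = defaultdict(list)
    for row in rows:
        by_country[row.country].append(row)
    for cities in by_country.values():
        cities.sort(key=lambda r: r.plays, reverse=True)
    return dict(by_country)


def country_totals(rows: list[CityPlays]) -> dict[str, int]:
    """Roll the same city rows back up into per-country totals."""
    totals: dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row.country] += row.plays
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical counts, not real listener data.
    rows = [
        CityPlays("Paris", "France", 4),
        CityPlays("Angers", "France", 2),
        CityPlays("London", "United Kingdom", 5),
        CityPlays("Nantes", "France", 1),
    ]
    print(country_totals(rows))
    for row in drill_down(rows)["France"]:
        print(f"{row.city}: {row.plays}")
```

Because both views are built from the same rows, the country totals and the per-city breakdown always agree. That is exactly what a click-through from “France” to its cities would need.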
Since I don’t know France well enough to picture where the individual cities in the city-level data were relative to each other, I had no way of telling whether there was any pattern. Contrast this with CD Baby, which at least gives you a map of the data it gets from Spotify and Apple Music. Unfortunately, I have very few listeners in France on those platforms, so I’ll show my U.S. data instead. Since I’m from San Francisco, there’s a clear bias toward that location in my data (once again, my number of spins is embarrassingly low, but this is the curse of the indie artist).
If you look at the “raw” data on the artist’s page in Spotify, you can see a list of top cities. As near as I can tell, CD Baby aggregates at the metro level, so some of the more granular detail is lost. I’d say they are being rather aggressive, as San Jose and Fremont seem to be lumped into San Francisco; even San Mateo is kind of a stretch (anyone who knows the Bay Area knows that traffic and bridges divide the region into some pretty distinct and arguably isolated areas). Either that, or certain data points get dropped by the mapping software CD Baby uses. I reached out to their support about this, but I’ve not yet gotten a concrete answer as to how it works beyond their general disclaimer that “counts and location names may vary slightly between views due to variances in available geographic data” (that is, the number you see at the country level tends to be larger than the total of the city data plotted as above). I’d of course love to have the raw Spotify data, but from what I can tell there’s no public Spotify API for reporting (beyond downloading total listeners, streams, and followers as a CSV).
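To see how much the choice of rollup matters, here is a sketch of metro-level aggregation. Both city-to-metro tables are pure assumptions for illustration. CD Baby doesn’t document the grouping it uses, so the point is only that an “aggressive” table and a “conservative” one paint very different pictures from the same (hypothetical) city counts.

```python
from collections import Counter

# Hypothetical lookup tables; neither is CD Baby's actual grouping.
AGGRESSIVE = {
    "San Francisco": "San Francisco",
    "San Mateo": "San Francisco",
    "Fremont": "San Francisco",
    "San Jose": "San Francisco",
}
CONSERVATIVE = {
    "San Francisco": "San Francisco",
    "San Mateo": "Peninsula",
    "Fremont": "East Bay",
    "San Jose": "South Bay",
}


def roll_up(city_plays: dict[str, int], metro_of: dict[str, str]) -> Counter:
    """Sum city-level plays into metros; cities missing from the table stay as-is."""
    metros: Counter = Counter()
    for city, plays in city_plays.items():
        metros[metro_of.get(city, city)] += plays
    return metros


if __name__ == "__main__":
    # Hypothetical counts, not real listener data.
    plays = {"San Francisco": 6, "San Mateo": 1, "Fremont": 2, "San Jose": 2, "Austin": 1}
    print(roll_up(plays, AGGRESSIVE).most_common())
    print(roll_up(plays, CONSERVATIVE).most_common())
```

Grouping by itself never changes the overall total. So if the plotted city totals come in below the country total, some plays presumably lacked usable city data, rather than being hidden by the rollup.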
So I took a little time to go into Google Maps and plot out the individual cities (done by brute force). Realistically, this data set is tiny and not particularly predictive. Better than raw plays would be to look at the set of individuals who had liked or commented on tracks. In fairness, SoundCloud does give you that, but there are so few of those in my data that they’re practically nonexistent, so I’m just looking at plays. Keep in mind that on SoundCloud a play just means the play button was clicked; you don’t get any retention or drop-off data as you would with, say, Facebook videos or tracks on Bandcamp.
Either way, the fact that the list gets cut off at a given (arbitrary) number (50) and there’s no filtering at a country or regional level means data points are potentially getting lost. This wouldn’t be so bad at the country or regional level, but when we get to individual cities it’s a problem, given the sheer number of municipalities in a given area that could realistically be considered part of the same “market.” You can look at the data song by song, which can get you more detail at the country level, but it’s cumbersome to navigate that way and still likely to lead to some truncation of the data.
Ideally, SoundCloud would actually map the data as I have done in this example, so that one could get a better sense of the geographical distribution of listeners. Better still would be to convey the proportional number of listeners at each location, the way CD Baby does (I did some rough color coding in my map). Short of actually presenting a map, offering the data for export so it could be imported into Google Maps or one of the third-party tools CD Baby uses would be much more helpful. Export would also make it possible to strip out locations where you have no particular interest in touring. For instance, I had a random spike of listeners in Ho Chi Minh City and Saigon (two names for the same city), which is kind of interesting, but if I wanted to make sure I had the full set of French listeners, those data points are just noise when all I can see is the top 50.
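An export doesn’t need to be elaborate. Google My Maps can import a CSV file and drop a pin for each row based on a location column, so something like the sketch below would cover both the mapping and the filtering out of uninteresting countries. The column names, the play-count bands (standing in for my rough color coding), and the sample rows are all my own assumptions, not anything SoundCloud provides.

```python
import csv
import io
from typing import Iterable


def band(plays: int) -> str:
    """Rough color-coding bucket; the thresholds are arbitrary assumptions."""
    if plays >= 5:
        return "5+ plays"
    if plays >= 2:
        return "2-4 plays"
    return "1 play"


def to_map_csv(
    rows: Iterable[tuple[str, str, int]],
    exclude_countries: frozenset[str] = frozenset(),
) -> str:
    """Write (city, country, plays) rows as CSV text, dropping unwanted countries."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Location", "Plays", "Band"])
    for city, country, plays in rows:
        if country in exclude_countries:
            continue  # e.g. places you have no plans to tour
        # "City, Country" gives the map importer an unambiguous place to geocode.
        writer.writerow([f"{city}, {country}", plays, band(plays)])
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical rows, not real listener data.
    rows = [
        ("Angers", "France", 2),
        ("Paris", "France", 5),
        ("Ho Chi Minh City", "Vietnam", 3),
    ]
    print(to_map_csv(rows, exclude_countries=frozenset({"Vietnam"})))
```

The “Band” column is there so the map can color pins by rough audience size, which is roughly what CD Baby’s proportional display conveys.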
I ended up doing the British and other European countries as well as the U.S. and France, because I’m a sucker for these sorts of things. You can see a cluster around London and a smaller cluster around Paris, which sheer population numbers alone would explain. No big surprises, though there’s a small cluster in Pays de la Loire, with more than one play in Angers, for what it’s worth.
Predictably, there are lots of data points around the Bay Area (clearer when you zoom in) and then further south around L.A., which I’ve visited a few times (I also have “The L.A. Song,” but it was released before the period this data comes from). That I have a few more plays around Austin and Nashville (marked with ever so slightly greener pins for the multiple plays) also makes sense, given recent (and upcoming) visits to those places for music conferences and SXSW.
Honestly, this would all be a lot more interesting if the numbers were an order of magnitude bigger, but that’s also just me pining for a bigger audience. I will say this is potentially a lot more detailed than what you’ll get out of Spotify, which doesn’t even make data available beyond a limited time window.
On a completely different note, the referring-URL data suffers from having no way to trim off irrelevant query-string info so that you can look at domains in a useful way. URLs coming via Facebook tend to have a unique identifier stitched onto them (the Facebook Click ID, or fbclid, parameter) that artificially inflates the number of sources end users appear to be coming from. This makes it quite difficult to ascertain the actual breakdown of where traffic is coming from.
As you can see, the data from Americana UK and Americana Highways is getting distorted by the fbclid URL parameter. Breaking traffic down by what came directly from a given source vs. what came through Facebook is actually an interesting data point (and there’s a whole other discussion to be had on that subject). But without something to correlate fbclid with, it’s useless noise that keeps me from seeing the full story. Ideally you’d be able to strip out query parameters when it makes sense, or just get the raw data.
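Stripping the parameter is straightforward once you have the URLs in hand. Here is a sketch using Python’s standard `urllib.parse`, assuming a list of referring URLs with play counts (SoundCloud doesn’t export these, so this is hypothetical input). It also keeps the one thing fbclid does tell you: whether a visit came through Facebook or directly. `TRACKING_PARAMS`, the function names, and the example URLs are my own inventions.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameters treated as tracking noise. fbclid is the one discussed here;
# adding others (e.g. utm_source) is an assumption you can adjust.
TRACKING_PARAMS = {"fbclid"}


def strip_tracking(url: str) -> str:
    """Return the URL with tracking parameters removed (other params kept, in order)."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    # Re-encoding may normalize escaping slightly (e.g. spaces become "+").
    return urlunsplit(parts._replace(query=urlencode(kept)))


def came_via_facebook(url: str) -> bool:
    """True if the referring URL carried a Facebook click ID."""
    query = urlsplit(url).query
    return any(key == "fbclid" for key, _ in parse_qsl(query, keep_blank_values=True))


def referrer_breakdown(referrers: dict[str, int]) -> tuple[Counter, Counter, Counter]:
    """Collapse referring URLs into clean pages, bare domains, and direct/Facebook routes."""
    pages: Counter = Counter()    # clean page URL -> plays
    domains: Counter = Counter()  # domain without "www." -> plays
    routes: Counter = Counter()   # (domain, "via Facebook" or "direct") -> plays
    for url, plays in referrers.items():
        host = (urlsplit(url).hostname or "").removeprefix("www.")
        pages[strip_tracking(url)] += plays
        domains[host] += plays
        route = "via Facebook" if came_via_facebook(url) else "direct"
        routes[(host, route)] += plays
    return pages, domains, routes


if __name__ == "__main__":
    # Hypothetical referrers: one review page reached with two different fbclids.
    referrers = {
        "https://www.example.co.uk/review/?fbclid=aaa": 2,
        "https://www.example.co.uk/review/?fbclid=bbb": 1,
        "https://www.example.co.uk/review/": 3,
    }
    pages, domains, routes = referrer_breakdown(referrers)
    print(pages.most_common())
    print(routes.most_common())
```

Three “different” referrers collapse into one page, while the direct vs. via-Facebook split is kept as its own dimension instead of being thrown away.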
An entirely separate problem is data that may be junk. I’m not sure why an obscure live recording from years ago suddenly spiked one day, and I can’t seem to get any data on where those plays came from. It could be that it was featured somewhere, but I also worry that something more nefarious is happening with bots (which do tend to descend on new tracks to offer you more plays). Then there’s the possibility of people (or rather bots) using VPNs to spoof an IP address from another country.
One notable piece of missing data is any sort of demographics. SoundCloud gives you top listeners, which could be useful for finding “superfans,” but that tells you nothing about broader trends such as age or gender. This is obviously because that data isn’t even part of a SoundCloud user profile, and it would clearly not be available for anonymous users. In fairness, CD Baby only gives age (no gender info), even though Spotify clearly has gender data available.
Another useful thing to know would be where the fans who listen the most are located. Obviously we’re now getting into some potential privacy issues, but there’s quite a difference between a lot of one-off listens and individuals who listen repeatedly, and knowing where your tracks get repeated listens would indicate where the richer veins of audience lie. You could of course go off likes and reposts, being wary once again of the bots that descend on new tracks to offer promotional services. You can at least reach out to listeners who are repeatedly listening to your tracks (though in my case, it’s often other musicians learning the song).
Anyway, that’s what I’ve learned and explored to date with SoundCloud. There’s a whole other realm of geographic data to be obtained and discussed when it comes to other social media platforms, Facebook in particular, whose bread and butter comes from knowing everything about its users. YouTube would also be worth exploring, though a first glance suggests you won’t get deeper than state-level data.
If I were to offer SoundCloud some unsolicited advice, it would be that a few modest changes would, at a minimum, make the data more navigable. Allow users to drill down or filter by country, and ideally add regional filtering. In addition, or alternatively, allow the data to be exported so users can analyze it in third-party applications and make better use of it.
I’m curious about other people’s experiences looking at geographic data on different platforms.
And if you want to give me some more data to play with as I explore these things, or just listen to my music, there are some links below.
The extended musings of a songwriter.