A study (a long-ish entry)

By Ryan G

A Glimpse of the 2010 Market for Music Consumption

If you took a survey for me last fall, this paper is the result. Though somewhat basic in scope, this might give you an idea of some themes I want to explore through the industry analysis portion of this site.

For a while now I have been looking for ways to combine my inner statistician and music geek into something practical. Since the summer of 2010, I have been studying for the probability exam administered by the Society of Actuaries while nurturing an interest in all things related to the record industry, namely its rock, alternative, and Christian sectors. When the opportunity arose to conduct a semester-long regression analysis project, I figured it would be a good chance to begin combining these two areas of interest.

The music industry is one that is rapidly changing. It does not seem that long ago that an album selling millions of copies in its first week on the market was commonplace. Now, in late 2010, release week sales in the tens of thousands can land a record in the overall top ten in a given week. In light of this, musicians and record companies are beginning to look for new ways to promote music. Because the industry is changing, new niche markets are beginning to pop up and will likely continue to do so for some time. By compiling data on the current ways people consume music, I hoped to gain some insight into what these niches might be and how we can go about filling them. My findings, I hoped, would also point to a starting place for a potential future career.

Despite not really knowing what the current music consumption trends are, I had some educated guesses that I hoped to test via my survey and regression analysis. First, I hypothesized that downloading music has a negative effect on overall music-related revenue. This might seem like common sense, but for my project to be thorough I did not want to make assumptions before I had even started. Second, I wanted to test the effect of concert attendance on music purchases. I initially hypothesized that the avid concert attendee is more likely to have a large awareness of the music industry as a whole, and more likely to engage in downloading to support his or her enthusiasm. I have heard some of my peers justify their use of file-sharing sites by saying that most money spent on music does not go to the artists anyway, and that they would rather go to shows and buy merchandise, since more of that revenue directly supports the artists. I wanted to see if this principle translated into the current trends within the music industry. I also thought it would be interesting to see if there was a stronger linear fit within different subgroups, based on where people most increased their musical awareness. I hypothesized that people tend to find out about new music mostly from social networking (Facebook, Myspace Music, message boards) and radio, so that was where I expected to see the strongest R-squared figures. Finally, I guessed that those with a background in larger cities are more likely to be culturally savvy than those from rural areas, and thus might engage more in consuming music.

My research process began with a search for current data on music sales and concert attendance. While data is out there, most of it required some sort of professional designation for access, and I could not find a summary of individual habits that covered both music purchases/downloads and other music spending. Also, most of the studies I found dated back to the early part of the decade. The early 2000s marked a time period when entertainment headlines were dominated by the rise of file-sharing services like Napster, Kazaa and Limewire. Now, groups like the RIAA have taken steps against the proliferation of such sites and we have seen the advent of Amazon, iTunes, and Napster as leading legal online music stores. This paradigm shift made me wonder whether downloading habits have changed on the whole, and I thought the easiest way to come up with a data set that was easy to interpret would be to create my own via an online survey.

Despite the lack of readily available current data on music consumption, I did look to a couple of studies from the past decade relating to my topic. The first was conducted by Felix Oberholzer-Gee and Koleman Strumpf and dates from February 2007; the second was conducted by Rafael Rob and Joel Waldfogel and dates from 2006. The latter centers on the downloading habits of students at four collegiate institutions in the United States, and the former centers on a group of songs on the Kazaa servers, tracking their downloads and the circumstances surrounding those downloads.

Both studies have music purchases in some form as their dependent variable, and the study conducted by Waldfogel and Rob included ordinary least squares as its regression method. I followed these examples later when I built my regression model. Music purchases as the dependent variable serves as the starting point from which they drew their paradigm of the music industry, and the way I hope to draw mine.

The biggest problem in the two studies I examined was the potential for sample bias. Both are very limited in scope. While it is important to note the difficulty in obtaining a purely unbiased sample, given the nature of the data being collected, I believed some efforts were needed to make the surveys more random. Also, the study put out by Oberholzer-Gee and Strumpf, while examining a staggering 1.75 million downloads (though still a relatively small sample, given the scope of music downloading today), concentrated on only “popular” music. Hence, one study fails to take into account trends outside of popular music, and the other, by collecting data at only four colleges, is too narrow. I tried to remedy these issues in the survey distribution portion of my empirical strategy.

Another problem with the existing studies was the subjective nature of the variables. For example, in the Rob and Waldfogel study they use subjective value of an individual’s album purchases along with one’s personal level of interest in music as independent variables. Without a definite way to differentiate these levels, these variables can become problematic. Rob and Waldfogel attempt to differentiate, but their categories end up coming across as neither mutually exclusive nor easy to differentiate. For example, when it came to subjective value of album purchases, they used this wording:

If the current valuation exceeded the initial valuation, we allowed respondents to indicate (1) that they were pleasantly surprised, (2) that the music grew on them, or (3) both. If the current valuation equaled the initial, we allowed respondents to indicate that (1) they were familiar with the music before purchase, (2) they guessed right, or (3) both. Finally, if the current valuation fell short of the initial valuation, we allowed respondents to indicate that (1) they were disappointed from the start, (2) they guessed right, or (3) both (Rob and Waldfogel 55).

The use of subcategories, as well as the repetitive use of “guessed right,” has the potential to be confusing to the average survey taker. Also, the wording of the categories “pleasantly surprised” and “the music grew on them” is such that they are not necessarily mutually exclusive. I can be pleasantly surprised by something I purchase and still grow to appreciate it more over time.

There are a few other problems with the data the two studies collected. Despite their publishing dates being in the later part of the decade, their actual data samples were obsolete or too limited. One sample was limited to respondents at four schools of similar size, and one sample came from a different era of internet trends than today. 2002, the year of Rob and Waldfogel’s survey sample, belongs to a different era because in the past eight years social networking trends, average internet speed, and the degree to which music downloading is a controversial issue have all changed. In recent years, methods of promoting music have shifted, affecting how people become aware of what they choose to listen to. WiFi has also become more commonplace, making quick internet speeds available to those who may not have been able to afford them. While access to quick internet may have an effect on how much music one downloads, I believe the influence has decreased in recent years, so I chose to omit this variable from my survey. The movement of downloading issues out of the national spotlight has also likely affected perception of illegal downloading, and I wanted to see how this might have changed.

For the survey, I included eight questions. I hoped the succinct nature of the survey would increase the likelihood that potential survey takers would actually take it, given that, especially at Wheaton College, people tend to get the impression that they are being petitioned all the time by various organizations. There was one question corresponding to each variable, and an extra one used to divide the total results into subcategories for individual regressions. Access to the survey was via an open event on Facebook and word of mouth through email. This approach, I hoped, would overcome the problem in the Rob and Waldfogel study, where the sample was confined to four universities. Over 500 people were invited to take the survey on Facebook, many of them not by me, which helped the randomness I hoped to achieve. I also had a feeling that if the survey had been confined to Wheaton College, it would have been biased not only by campus but also by “honesty.” At least in theory, Christians should go about obtaining things, including music, in an honest manner, so the music spending habits of Wheaton College might not be an accurate reflection of society for that reason.

All of the survey questions were multiple-choice, with the exception of the age question, which was open-ended. I wanted to keep the survey quick and easy, so for most questions I gave respondents a choice between ranges. I included three demographic variables: age, sex and geographic background. Sex I made into a dummy variable, arbitrarily choosing male to have the positive outcome. For geographic background I gave the respondents four choices, but also chose to make this into a dummy variable, as that way it would be easier to test my hypothesis that those living in larger cities are more culturally savvy. Therefore, those who answered “rural” or “small city” were assigned a value of “0” and those who answered “mid-sized city” or “large city” were assigned a value of “1.”

Some of the survey questions were weighted. For example, those answering that they listened to music for pleasure less than two hours per week received a weight of “1” and those who listened to music more than two hours per day received a weight of “5”. The same concept applied to the questions about music-related purchases, the number of times one accessed the internet to download music in a given week (with no distinction between legal and illegal downloads), and the number of concerts attended in a given year (designated as any experience with live music to make it easy for the respondents to estimate). So, in essence, I was regressing one weighted variable on three weighted variables, two dummy variables, and one open-ended variable.
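The coding scheme described above can be sketched in Python as follows. This is only an illustration: the answer labels and the three middle listening brackets are assumptions, since the text gives only the endpoints of the listening scale (1 and 5) and the grouping of the four geographic choices.

```python
# Hypothetical answer labels; the survey's exact wording is not given in the text.
SEX_CODES = {"female": 0, "male": 1}  # male arbitrarily takes the positive outcome
GEO_CODES = {"rural": 0, "small city": 0, "mid-sized city": 1, "large city": 1}

# Five-step scale for listening time. Only the first and last brackets
# come from the text; the three in the middle are assumed.
LISTEN_CODES = {
    "less than 2 hours per week": 1,
    "2 to 4 hours per week": 2,   # assumed bracket
    "4 to 7 hours per week": 3,   # assumed bracket
    "1 to 2 hours per day": 4,    # assumed bracket
    "more than 2 hours per day": 5,
}


def encode_response(raw):
    """Turn one raw survey answer (a dict of strings) into numeric regressors."""
    return {
        "SEX": SEX_CODES[raw["sex"]],
        "GEO": GEO_CODES[raw["geo"]],
        "AGE": int(raw["age"]),  # the only open-ended question
        "LISTEN": LISTEN_CODES[raw["listen"]],
    }


# A made-up respondent, not a row from the actual survey.
example = {
    "sex": "male",
    "geo": "small city",
    "age": "21",
    "listen": "more than 2 hours per day",
}
print(encode_response(example))  # {'SEX': 1, 'GEO': 0, 'AGE': 21, 'LISTEN': 5}
```

The other weighted questions (purchases, download frequency, concerts) would follow the same lookup pattern with their own brackets.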

Putting the survey variables together gives an equation that reads something like this:

F(DOLLAR) = b(0) + b(1)DLFREQ + b(2)SEX + b(3)GEO + b(4)AGE + b(5)LISTEN + b(6)CONCERTS + e

In the above equation, DOLLAR represents music purchases in a year (in dollars), DLFREQ represents the number of times one accesses the internet with the goal of downloading music in a given week, GEO represents one’s geographic background, LISTEN represents the number of hours one spends listening to music for pleasure in a given week, CONCERTS represents the number of concerts one attends in a given year, and AGE and SEX are pretty self-explanatory. I hypothesized that DLFREQ would have a negative sign, because those downloading are almost always spending less money than those buying hard copies of music. I hypothesized GEO would have a positive sign because I assigned a value of 1 to mid-sized and large cities, and my hypothesis states that people from larger cities are more culturally savvy, and thus more likely to contribute revenue to the music industry because they are more aware. I hypothesized LISTEN would have a positive sign because if one spends more time listening to music, their music library is probably bigger and thus they have probably spent more money on music. An alternative hypothesis might be that those who listen to music a lot have big libraries, but resort to illegal downloads to try to save money. CONCERTS could also go either way. Someone who attends a lot of shows might be more apt to support the artists by buying their music, because seeing them live makes them seem more personal, or that person might not buy their music but simply attend shows because he knows that activity will do more to directly support the artist. I hypothesized the expected sign for AGE to be positive, because older people probably have more money to spend on music, and might be less savvy about alternate ways to obtain music on the internet. I had no idea which direction SEX would lean, so I did not make any predictions there. Overall, in the grand scheme of my analysis I paid the closest attention to the DLFREQ, CONCERTS, and GEO variables, in that order. SEX was the least important variable, and was there more as a control.
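To make the equation above concrete, here is a minimal sketch of how its coefficients could be estimated by ordinary least squares using only the Python standard library. Everything numeric in it is an assumption: the rows are randomly generated stand-ins rather than survey responses, and the "true" coefficients are made up so that the code can check it recovers them. It is not the software or the estimates used in the actual project.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Return [b0, b1, ...] for y = b0 + b1*x1 + ... + e via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


NAMES = ["const", "DLFREQ", "SEX", "GEO", "AGE", "LISTEN", "CONCERTS"]
TRUE_B = [3.0, -0.4, 0.1, 0.2, 0.01, 0.3, 0.25]  # made-up values for the demo

rng = random.Random(0)
rows = [
    (
        rng.randint(1, 5),    # DLFREQ, coded 1-5
        rng.randint(0, 1),    # SEX dummy
        rng.randint(0, 1),    # GEO dummy
        rng.randint(14, 60),  # AGE in years
        rng.randint(1, 5),    # LISTEN, coded 1-5
        rng.randint(1, 5),    # CONCERTS, coded 1-5
    )
    for _ in range(60)
]
# Noise-free outcome, so the fit should reproduce TRUE_B up to rounding error.
dollar = [TRUE_B[0] + sum(b * x for b, x in zip(TRUE_B[1:], r)) for r in rows]

estimates = ols(rows, dollar)
for name, b in zip(NAMES, estimates):
    print(f"{name:>8}: {b: .4f}")
```

With real survey data the outcome would include noise, and one would also want standard errors and R-squared, which a statistics package reports alongside the coefficients.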

The online survey was distributed via a Facebook event and email over a two-week period through the online service SurveyMonkey. 276 people responded to the eight-question survey. The age range was 14 to 100 (allegedly) and the male-female ratio was close to an even split. Only three people responded that their main medium of obtaining music knowledge was television outlets like MTV, VH1 or Fuse, so I decided to omit that subcategory from my regressions. The top mediums of obtaining music knowledge were, unsurprisingly, radio and social networks; however, a surprisingly high percentage (23%) of respondents chose “Other.” Another surprising result was that almost exactly two-thirds said they downloaded music less than once a week, with a similar skew in CONCERTS, where 61 percent of respondents said they attend between one and five concerts per year. Standard deviations were on the whole a bit on the high side, especially in AGE, where it was close to 14 years. This figure was probably due to outliers in the AGE responses, such as the individual who claimed to be 100. The weighted response variables all had standard deviations around 1, which is good. The dummy variables also had positive results, with the standard deviation being very close to 0.50 in each.
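A short aside on why a standard deviation near 0.50 is the expected result for a dummy variable: a 0/1 variable whose share of ones is p has a standard deviation of sqrt(p(1 - p)), which reaches its maximum of 0.5 when the sample splits evenly. The sketch below checks this with the standard library. The 138/138 split is a hypothetical illustration built on the 276 respondents, not the actual breakdown.

```python
import math
import statistics


def dummy_sd(p):
    """Population standard deviation of a 0/1 variable whose share of ones is p."""
    return math.sqrt(p * (1 - p))


# Hypothetical even split of 276 respondents: 138 coded 1 and 138 coded 0.
even_split = [1] * 138 + [0] * 138
print(statistics.pstdev(even_split))  # 0.5

# A 60/40 split still lands close to 0.5.
print(round(dummy_sd(0.6), 3))  # 0.49
```

So a value near 0.50 mainly confirms that each dummy divided the sample into two groups of similar size.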

Following the example of the literature I reviewed, I decided to stick with a traditional ordinary least squares regression model. The R-squared figures that resulted for each of the five regressions were good but not outstanding. This was what I wanted, since a very high R-squared paired with individually insignificant coefficients is a common warning sign of multicollinearity. The CONCERTS and LISTEN variables both yielded the signs I expected. The demographic variables SEX and GEO both yielded standard errors as large as their coefficients, which means neither was statistically distinguishable from zero. I suspected serial correlation at first, though that is mainly a time-series concern and an unlikely culprit in a one-time cross-sectional survey. Subsequent regressions on all five equations using Newey-West robust standard errors did little to remedy this, only lowering each standard error by a couple of hundredths.

During and after the regression analysis, some problems with the project presented themselves. Some had to do with the methodology of the project. First, the distribution of the survey over the internet creates an unavoidable bias. The question regarding one’s main source of music awareness included two internet sources as possible answers, so given that the survey’s primary mode of distribution was a social network, one would think the answers to that question would be skewed in that direction. Luckily, this did not happen, but because there is no efficient way to distribute such a survey outside of the internet several times over a long period of time, we may never know exactly how much bias there is. I could also have worded a few of the questions better. Offering “1-5 concerts per year” as a choice on the CONCERTS question may have been too broad. Though I added the stipulation that a concert counted as any experience with live music in the past year, it is likely that people still perceived a concert as a show you buy a ticket for, that is, seeing a “big” artist live. In the future, I need to work with the public’s preconceived notions instead of forcing respondents to work around my parameters, because my stipulations will not change their perceptions much at all. Since it can be hard to guess what a mass mentality might be, this is something that would be refined over the course of multiple surveys.

The potential for some econometric problems arose as well. The most fundamental is the potential for spurious correlation and joint causation. DOLLAR and LISTEN have a high probability of being jointly determined, since one is likely to spend money on something he listens to a lot, and will in turn spend money to fuel his listening habit. One way to probe which direction dominates is to set up two separate equations, each with the other variable as the dependent variable, and test for Granger causality, although that test needs observations over time and so would require repeating the survey. Spurious correlation is a threat in each survey because in theory the results can be drastically different each time, and thus there could be isolated incidents of variables seeming to be correlated that are not. This was not an obvious issue here, and the variance inflation factors computed from the regressions were low, implying a low likelihood of multicollinearity as well.
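For readers unfamiliar with the variance inflation factor mentioned above: for regressor j it is 1 / (1 - R_j^2), where R_j^2 comes from regressing that regressor on all the others. A value of 1 means no overlap with the other regressors, and larger values mean more. The sketch below covers only the simplest case of two regressors, where R_j^2 is just their squared correlation. The coded answers are made up for illustration and are not survey data.

```python
def pearson_r(xs, ys):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def vif_two_regressors(xs, ys):
    """VIF shared by two regressors: 1 / (1 - r^2), with r their correlation."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)


# Made-up coded answers for two regressors (hypothetical, not survey data).
listen = [1, 2, 3, 4, 5, 2, 3, 4]
concerts = [2, 1, 3, 2, 4, 3, 1, 5]
print(vif_two_regressors(listen, concerts))

# Two uncorrelated columns give the minimum possible value.
print(vif_two_regressors([1, 2, 3, 4], [1, -1, -1, 1]))  # 1.0
```

With six regressors, each VIF would instead come from an auxiliary regression of one regressor on the other five.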

The music industry is one that is rapidly changing, and current information available to the public is quickly becoming obsolete. This project was my attempt to begin to fill the gaping hole that exists in the analysis of this changing industry. I found a couple of hundred people willing to help me fill this hole, and I found that while demographics like gender, geographic background, and age are becoming less and less relevant, the internet has become the leading source for increasing one’s music awareness. As the generation that preceded the internet shrinks, the influence of demographic variables will likely shrink with it, because the availability of internet resources transcends such boundaries. Econometric problems related to multicollinearity are not a big issue in my survey, but fundamentally the equation could be reworked to avoid spurious correlation and joint causation. All in all, my general idea that downloading negatively impacts overall music revenue still stands. The result for concert attendance was inconclusive and probably needs further analysis. Geographic background can probably be dropped altogether. Most of the problems presented here can be addressed by switching around variables and changing the wording and methodology of the survey administration each time it is given. Repetition and increasing sample size will add reliability to the results.

It felt good to complete a study that felt current and relevant. Music consumption is likely a phenomenon that goes far beyond the influence of six variables on one revenue-related variable, and several more studies need to be done to see if the findings of this one are corroborated in the future. However, as the industry continues to change, studies will become obsolete as quickly as they become relevant. Once the industry settles into a pattern, it will be clear what niches need to be filled. Until then, people with a passion for the industry who are willing to take a look from an economic perspective will be needed to seek out what the current needs are as the industry changes.