David A. Windham

Data Mining Our Viewing Habits

I’ve often thought and talked about this over the years, and my better half and I were talking about it last night in bed while searching the idiot box for something to watch. In case you’re wondering, we decided to finish off Blandings [1] because we like snarky British comedy. We were debating how good the recommendations from Netflix were and, more importantly, what kind of conclusions you could draw from our television viewing habits. I’ve often said that if you give me access to your bank statements and your Netflix account, I can tell more about you than from any other source. I think I’ve got a pretty firm grasp on how much data mining for personality traits is happening online. It’s one of the primary reasons I dropped social media years ago. I try to be forthright and upfront about who I am… or do I? Does anyone? Is that just another layer of the ‘ego onion’, so to speak?

I pull in, publish, and store all of my music listening habits. I document my ideas about life on this website. I publish my work history and projects. I know most of it is relatively polished public relations in a sense. Not exactly bullshit, but close. I know that social media is full of it. I know firsthand because I had family involved in the ‘old media’ before everyone became their own PR agent online. I mean, the concept of ‘reality’ television, the hyper-personal online publishing outfits, the fear mongering of the ‘going out of business’ journalism… all of it really plays to some vulnerabilities of the psyche. Yeah yeah, so what does this have to do with ‘data mining video viewing habits’? Back when I still thought very idealistically about the future of the internet, I didn’t understand how we would turn this sort of data into weapons designed to make us feel certain ways as a call to action, where most often the action is either buy this or pay attention to that. As a kid, I vaguely understood how companies gathered data on what television viewers or radio listeners were doing. As an adult, I think I understand exactly what data is being passed around and why. I use the word ‘think’ because I thought I knew everything as a kid, and I’m expecting that as I grow older, I’ll reflect back on everything I didn’t know as a middle-aged adult.

The question is… would I publish my entire viewing habits? Every article I read, every video I watch, every image I see? I’m not talking about a curated sorta, look at my bookmarks or try to trace my interests based on my upbeat social media posts sorta data. I’m talking about all the other stuff… the odd medical searches about popping some sort of growth on your ear, the curious pornography-related search because I didn’t know what a ‘fleshlight’ [2] was, the queries into the intra-webs about the crazy belief systems of others, the cyber stalking of folks I work with. You know… that sorta stuff. Anytime someone asks me to look at their computer, I always tell them “you know, I’ll be able to see everything you do”, to which I almost always get a bit of a hushed reaction. It’s only the very defensive sorta ‘I got nothing to hide’ folks who never drop the laptop off or follow up on my fixing their phone or computer. I had one not too long ago where, while syncing a bunch of email addresses for an associate, I did one extra swipe on the phone and saw something I wish I’d never seen.

So, last night while we ran through our Netflix list, we pondered really digging into creating a system for defining video qualities as they relate to personality. My wife, a psychologist, started suggesting ideas that relate almost entirely to previous behavioral and personality studies. Meanwhile, the art school flunky in me started in on trying to categorize filmmaker styles. As we were discussing it, and as I have long known, it’s very easy to determine personality type and other personal information based on artistic preferences. So maybe companies like Arbitron [3] have been doing it all along. Maybe the rise of ‘talk’ radio was just a precursor to the modern publishing landscape of political journalism. Maybe?

Will I be publishing all of my television or web viewing habits? It’s doubtful. I pride myself on a bit of mystique. Sometimes, because I track my music listening habits, I’ll quickly flip past a song so it doesn’t log when I find myself engaged in something that might be considered unsophisticated or distasteful. At one time, I was pulling in viewing data from Netflix. They don’t offer a public-facing API anymore, so I’m sure I’d have to sign up and be vetted as some sort of content partner to even get close to pulling data [4]. I’ll stick with publishing a curated version of me. I’ll publish my list of favorite shows, I’ll give you a list of my bookmarks… the public ones. I’ll share this or that on this website. As much as the informality of my amateurish writing style seems ‘from the hip’, it’s all somewhat carefully curated, just like almost everything else. This is my point: while Netflix’s or Amazon’s ‘suggestions’ might seem to be there to help you, they’re really just a measure of how much data those companies are collecting on you. And while Facebook and others might give you an ‘export’ option, trust me… your data is not leaving that company.

So, being the opportunist that I’ve grown into, I’ve added the ‘movie personality’ project to the grab bag of fun side projects I’d like to build. The total number of major studio films ever made can’t be more than a couple hundred thousand at this point. I could probably even get that in a dataset from somewhere like IMDb [5]. And then my wife and I could assign all kinds of hypothetical taxonomies to each film based loosely on our own personal understanding of the film. Then I could make all this data publicly available, knowing how powerful a tool it would be. You could pipe in your own viewing habits to figure out your personality type, net worth, marital status, emotional stability, and intelligence quotient. Don’t worry. I’m pretty sure this project already exists. And I’m sure that when Netflix asks you tonight who’s watching… they already know. And at some point your YouTube viewing data is going to determine your health insurance rates. I’m pretty sure mine are going to remain low because the algorithm will misinterpret my interest in medicine due to that time I binge-watched a bunch of surgeries [6] intended to train medical students.
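For the curious, the ‘movie personality’ idea above boils down to two steps: hand-tag each film with traits, then average those tags over somebody’s viewing history. Here is a rough Python sketch of that, nothing more. Everything in it is an assumption for illustration: the trait names, the weights, the placeholder titles, and the `build_profile` helper are all invented, and a real version would need an actual film dataset and taxonomies somebody had thought hard about.

```python
from collections import defaultdict

# Hypothetical hand-assigned taxonomy: title -> {trait: weight between 0 and 1}.
# Titles other than Blandings are placeholders, and every weight is invented.
FILM_TRAITS = {
    "Blandings": {"dry_humor": 0.9, "anglophile": 0.8},
    "Placeholder Surgery Documentary": {"medical_curiosity": 1.0},
    "Placeholder Art Film": {"aesthetics": 0.7, "dry_humor": 0.2},
}


def build_profile(viewing_history):
    """Average the trait weights over every tagged title in the history."""
    totals = defaultdict(float)
    matched = 0
    for title in viewing_history:
        traits = FILM_TRAITS.get(title)
        if traits is None:
            continue  # skip titles nobody has tagged yet
        matched += 1
        for trait, weight in traits.items():
            totals[trait] += weight
    if matched == 0:
        return {}
    return {trait: total / matched for trait, total in totals.items()}


# A made-up viewing history; repeat viewings count more than once.
history = ["Blandings", "Blandings", "Placeholder Surgery Documentary", "Untagged Show"]
profile = build_profile(history)

# Print the strongest traits first.
for trait, score in sorted(profile.items(), key=lambda item: -item[1]):
    print(f"{trait}: {score:.2f}")
```

A plain average is about the crudest scoring you could pick, and it is only here to show the shape of the thing. Notice that the one surgery video is enough to put ‘medical_curiosity’ on the profile, which is exactly the kind of misread I’m counting on.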

1. Blandings – https://en.wikipedia.org/wiki/Blandings_(TV_series)
2. Fleshlight – https://en.wikipedia.org/wiki/Fleshlight
3. Arbitron – https://en.wikipedia.org/wiki/Nielsen_Audio
4. Netflix Backlot – https://partnerhelp.netflixstudios.com
5. MovieDB API – https://www.themoviedb.org/documentation/api
6. Mayo Clinic Surgical Technique – https://www.youtube.com/watch?v=osgndmRBjsM