I've seen things you people wouldn't believe
Yet another answer to a question on Quora.  
Today's question: "Do large tech companies collect and compile enough data from users to create a virtual profile/entity? Are we essentially tracked, profiled, and categorized?"
I was just talking to a colleague about this yesterday.
Google has this great nimbus of data that I’ve provided it over the years, if you look at it that way: every box I’ve checked, every term I’ve searched for, every YouTube video I’ve watched, all the apps I’ve downloaded, the domains I’ve registered, the photos I’ve uploaded, the docs I’ve written, the email I’ve received.
Someone who sat down with all of this data and forced themselves to read it all would get a sort of picture of what I’m like. Here’s what he wrote about his mother the day she died. Here’s a recipe for cake. Here are instructions for feeding his cats when he’s out of town. Here’s a list of board games he played at a convention. Here’s a picture of the back of his TV set. Here’s the number of stars he gave a restaurant. Here we see that some day in 2015 he searched Google for “yes close to the edge.” He played Spaceteam on his phone. Here’s a picture of cards from a board game featuring photos of people dressed like film-noir characters. Here’s his birthday. Here’s every piece of spam email he ever received.
Because I work there, and have more motivation than most people to read the articles the company’s CEO writes in the New York Times, I happen to know how much of that information is actually analyzed and used. That recipe for cake was definitely analyzed; I can search for “cake” in my Docs page and it shows up quickly, because some machine built an index of all of my docs. Something analyzed all my photos, because I can use the (pretty amazing) feature of Photos that lets me find every picture I’ve taken that contains a glacier. That’s a lot of analysis. But it’s really only used for what I just described.
Am I tracked? Sure. Every time I open a document in Docs, it tracks how long it took the server to open that document and present it to me, and the server pushes out a tracking metric that goes into the metrics database so that the SREs maintaining Docs get an alert if too many requests like mine are failing or taking too long. Every preference I set in any screen of any app gets written to a database somewhere, and gets checked whenever I’m using a program that needs to know what I’ve said I prefer. All of my search queries get written to a log, and when Google Trends calculates how many people in the US searched for Katy Perry this month, that log entry is one of the ones that get counted. (Never mind that what I was searching for was Adam Neely’s video about just how bad the plagiarism verdict against Katy Perry is; it still counts as a search for Katy Perry.)
Not only have I been tracked (my query’s associated with my user ID, because I was logged in when I made it), but I’ve been categorized (I’ve incremented the bucket of “people in the US who searched for Katy Perry in August of 2019”).
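To make that bucket concrete, here's a toy sketch in Python. It is not how Google Trends actually works; the log format, the sample entries, and the `count_searchers` function are all made up for illustration. It just shows the idea of a logged query landing in a "people in this country who searched for this term this month" count.

```python
from datetime import date

# Hypothetical log entries: (user_id, country, day, query).
# Every value here is invented for illustration.
LOG = [
    ("u1", "US", date(2019, 8, 3), "adam neely katy perry verdict"),
    ("u2", "US", date(2019, 8, 9), "katy perry tour"),
    ("u1", "US", date(2019, 8, 20), "katy perry dark horse"),
    ("u3", "DE", date(2019, 8, 11), "katy perry"),
    ("u4", "US", date(2019, 8, 12), "recipe for cake"),
]


def count_searchers(log, term, country, year, month):
    """Count distinct users in `country` whose queries in that month contain `term`."""
    users = set()
    for user_id, where, day, query in log:
        in_month = (day.year, day.month) == (year, month)
        if where == country and in_month and term in query.lower():
            # A set means a user who searched twice is still counted once.
            users.add(user_id)
    return len(users)


print(count_searchers(LOG, "katy perry", "US", 2019, 8))
```

Note that the first entry, a search for a video about the verdict, lands in the same bucket as a search for tour dates: the count only knows that the term appeared in the query.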
If people cared about me - say, if I blew up a federal building and then went on the run - they could go to Google with a court order and get the company to disgorge a lot of this information. I can imagine the FBI sifting through my email and contacts and documents trying to figure out if that recipe for cake means something else. (Could “baking powder” be code for “ammonium nitrate”?) But Google probably wouldn’t give them the ratings I’ve given songs in Google Music, or the index that Photos built of all of the pictures I’ve taken that have dogs in them, or the fact that I set Keep to dark mode.
But since I’m not actually the target of a federal investigation, nobody’s going to bother doing even that. Indeed, I’ll probably go to my grave without anyone except me looking at most of that data. But it’ll live on after me, maybe for years or even decades, before some job that’s cleaning up old inactive accounts purges the last traces of me, like Ozymandias in reverse.