How Does a Website Know My Name?
Well… it finally happened to me today. While reading the news this morning I noticed that the Bill.com[1] IPO managed to gain roughly 60%[2]. I wanted to dig a bit into their business model since I’ve been thinking about reworking the invoicing application I use. So I headed over to Bill.com, scrolled down a bit, and voilà… the inevitable new-fangled chat box appeared. However, this time the chat bot correctly identified me by my full name. I’ve known about this practice for years and I’ve seen the exponential growth in the arena, but today is the first time I’ve actually had it happen to me. I’ve previously figured out, and am pretty well versed in, all kinds of privacy-invading tracking techniques used by the likes of LinkedIn, Facebook, etc., but this one is a bit different because I was on a mobile device using a browser in incognito mode, so it wasn’t cookie or tracker based. As far as I could tell, it was done using just my IP address. I headed to my desktop, changed my IP address, and tried it again using an IP set to Central America. The site chatbot misidentified me as “Jobcluster”, which turns out to be an employment agency in Florida.
So, of course, I was intrigued and set about reverse engineering the feature. Firstly, I can see why the Bill.com IPO gained so much traction, and I’ll give the companies involved kudos for being able to do so. The chat feature on Bill.com is powered by Drift[3], which offers a ton of integrations and is a sophisticated project[4]. Drift is able to add datapoint integrations with MadKudu, Demandbase, 6Sense, and Clearbit[5]. Combining this data into or alongside another system such as Segment, Salesforce, Google Analytics, or Facebook Pixel can create a powerful dataset on any person. I say person, whereas the typical language about tracking says customer, consumer, market, etc. If you then combine that data with that of the big-name companies like IBM and Equifax, who compile and sell personal data, then you’ve got a bunch of data points originating from a single web request[6]. There are vast amounts of resources being put into closing down the ‘last mile’ of anonymity on the internet. I suspect that it won’t just be IP addresses for long, as the telecommunication providers and governments make strides in pushing to identify the devices on their networks[7] for regulation and legal purposes, even though vendors have been trying to randomize those addresses[8]. If you’re following along, you might have noticed the recent attempt by the US Government to impose restrictions on encryption[9]. I’m not going to dig into all that here, for the sake of focusing on the little feature I ran into this morning. I’ve previously written about privacy and tracking online[10], so I’ll just leave you with the fun fact that folks are now attaching beacons to identify you on everything from churches to those little roadside political signs[11], so the tracking is not just online, it’s in real space in real time[12].
In order to reverse engineer the identifying info on the Bill.com site I saw this morning, I quickly ran through each of their third-party Drift chat APIs and determined that the identifying information on my IP is most likely coming from Clearbit. I say ‘most likely’ because I’m unwilling to set up thousands of dollars in accounts to test my theory. Drift’s integrations with Clearbit indicate that the Clearbit Reveal[13] feature is what is trying to identify the user. Try it out by loading up a VPN and flipping between IP addresses while viewing this page: https://clearbit.com/reveal. As far as I can discern, it’s using a combination of IP and cookies. I set up a Clearbit account and started making some API requests out of curiosity and, sure enough, the IP address I was using identifies me by my name, my job, my address, my website, and my social media accounts. It’s still kicking up null for my annualRevenue, but I tried some other folks I know who work for larger public companies and it was able to give me the annual revenue using the combined Enrichment API request[14]. I have implemented a number of structured data[15] elements on my website, including Open Graph elements, which are also a form of structured data[16]. That explains much of the data returned by the Clearbit Enrichment API. However, it doesn’t explain how the Reveal API is able to pair the IP address assigned by my ISP to that data. My first guess was that I signed into another identifiable service using the IP and it stored and shared that data with Clearbit. My IP has been static for years, so it could be any number of services. (Please, Clearbit, don’t send me a cease and desist over this post… or do, and then help me explain to others how my IP is being matched to my identity.)
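To show how little effort it takes for a crawler to pick up the structured data mentioned above, here is a minimal sketch that reads Open Graph tags out of a page using only Python’s standard library. The sample markup, the name “Jane Example”, and the function and class names are all made up for illustration; this is not how Clearbit or anyone else actually collects the data, just a demonstration that the tags are trivially machine-readable.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..."> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags carry Open Graph properties.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property") or ""
        content = attributes.get("content")
        if name.startswith("og:") and content is not None:
            self.properties[name] = content


def extract_open_graph(html):
    """Return a dict of Open Graph property names to their content values."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


# Hypothetical page markup, invented for this example.
SAMPLE_PAGE = """
<html><head>
<meta property="og:title" content="Jane Example" />
<meta property="og:type" content="profile" />
<meta property="og:url" content="https://example.com/" />
<meta name="description" content="Not an Open Graph tag" />
</head><body></body></html>
"""

if __name__ == "__main__":
    for key, value in extract_open_graph(SAMPLE_PAGE).items():
        print(key, "=", value)
```

Anything published this way is meant to be read by machines, which is the whole point of structured data. It also means it’s freely available to whoever wants to fold it into a profile.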
I double checked my ISP policy on customer data protections which simply stated that they can ‘share’ data with their business ‘partners’ and I also tried looking at any other identifiable service I’m using that could pair that data. I tried running a bunch of other IPs through the API and can see some of the more obvious data collection points, but I’m still baffled as to which data point is exposing my real identification through my ISP’s assigned IP address.
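For anyone who wants to run the same kind of experiment, here is a rough sketch of how a batch of IP lookups could be prepared. Treat the endpoint URL and the bearer-token header as assumptions based on my reading of Clearbit’s public documentation and check them against the current docs before relying on them. The helper name, the API key, and the IP addresses (taken from the reserved documentation ranges) are hypothetical. The code only builds the requests and never sends them.

```python
import ipaddress
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the Reveal lookup endpoint as I understand it from Clearbit's
# public docs. Verify against the current documentation before use.
REVEAL_ENDPOINT = "https://reveal.clearbit.com/v1/companies/find"


def build_reveal_request(ip, api_key):
    """Build (but do not send) a lookup request for a single IP address."""
    # Raises ValueError if the string is not a valid IP address.
    ipaddress.ip_address(ip)
    url = REVEAL_ENDPOINT + "?" + urlencode({"ip": ip})
    # Assumption: the API key is accepted as a bearer token.
    return Request(url, headers={"Authorization": "Bearer " + api_key})


if __name__ == "__main__":
    # Hypothetical key and documentation-range IPs, not real values.
    for ip in ["203.0.113.7", "198.51.100.25"]:
        request = build_reveal_request(ip, "sk_hypothetical_key")
        print(request.get_method(), request.full_url)
```

Sending each request would be a matter of passing it to `urllib.request.urlopen` and parsing the JSON body, at which point you’re spending real lookups against a paid account.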
So what do I do with my newfangled discovery? I know… I can use my knowledge to make money… if you can imagine that. These types of services and APIs can provide some serious firepower to client-facing web applications and sites. It’s potent stuff for marketing. Although I have some reservations about privacy, I’d certainly recommend them to other companies trying to engage ‘customers’ online. It’s a powerful marketing tool when added to a CRM or funneling tool. The more advanced tools do come at a price though. The cost for the Clearbit Reveal Google Analytics plan is a thousand a month. The cost for the Reveal API subscriptions is two grand a month[17]. For any business of scale that’s just a drop in the bucket. I’m already trying to figure out ways to incorporate it into projects and I can see where these are going to be invaluable tools for folks trying to reach customers or improve their sales and customer retention tactics. Just thinking about it this morning has given me some fun ideas as to how to customize a bunch of different aspects of applications based on third-party data points. As of now, I enjoy messing around with the AI bots in an effort to garner humorous responses, but I suspect that these AI data-driven features in applications will inevitably be messing with me given more data.
Of course there is a drawback… I think the other message here is that as these types of services start combining data points from various sources, the majority of folks will inevitably lose a large piece of their perceived anonymity online. The type of targeted messages being published by the clients of Cambridge Analytica[18] showed some of the ways personal data can be exploited. I’m not too personally worried about my anonymity online, mostly because I’m educated about the practices and have been sharing personal information online for 15 years now without issue, mostly because I quarantine it under my own control. I also admit that I’m very much guilty in that arena by running reverse IP searches on everyone, tagging emails with trackers, tracking user locations via IP address through the headers in their emails, and just ‘Googling’ people to find out more about them. I recently had a conversation with some folks where I had done a bit of cyberstalking on the person and just outright asked them about their history, to which they said ‘how do you know this?’. I replied with ‘the internet’… like it’s what everyone does. I just assume they do and make no bones about it because I’ve been working with this stuff for years. Maybe abuse is the wrong word here… marketers might call it a strategic advantage. But let’s just say that if I can identify you before you visit a website or other application, and I know your address and can infer your wealth, I might decide to price a product or service accordingly. It’s already being done with geolocation data. Is it fair? Is it abuse or is it strategic? Doesn’t it happen in real life all the time? I sometimes have to give consent as to when my data is pulled in the fine print of whatever agreement I’m signing. In many ways I’ve understood very clearly that this is the direction the internet has been heading, and I wasn’t entirely surprised to see a website correctly identify me by name.
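As an illustration of the email header technique mentioned above, here is a minimal sketch that pulls the IP addresses out of a message’s Received headers with Python’s standard library. The sample message, hostnames, and addresses are invented (the IPs come from the reserved documentation ranges), and the function name is my own. Real-world headers vary a lot and many webmail providers hide the sender’s address, so take this as a rough outline and not a reliable tool.

```python
import ipaddress
import re
from email import message_from_string

# Matches bracketed IPv4 addresses such as "[203.0.113.7]".
IPV4_PATTERN = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

# Loopback and RFC 1918 ranges say nothing about where a sender is located.
LOCAL_NETWORKS = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def relay_ips(raw_message):
    """Return non-local IPs from Received headers, newest hop first."""
    message = message_from_string(raw_message)
    found = []
    for header in message.get_all("Received", []):
        for candidate in IPV4_PATTERN.findall(header):
            try:
                address = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # e.g. "[999.1.1.1]" looks like an IP but is not
            if any(address in network for network in LOCAL_NETWORKS):
                continue
            if candidate not in found:
                found.append(candidate)
    return found


# Hypothetical message, invented for this example.
SAMPLE_EMAIL = (
    "Received: from mx.example.net (mx.example.net [198.51.100.25])\n"
    "\tby inbox.example.org with ESMTP; Thu, 12 Dec 2019 09:15:02 -0500\n"
    "Received: from laptop.local (unknown [203.0.113.7])\n"
    "\tby mx.example.net with ESMTPSA; Thu, 12 Dec 2019 09:15:00 -0500\n"
    "Received: from localhost (localhost [127.0.0.1])\n"
    "\tby laptop.local with SMTP; Thu, 12 Dec 2019 09:14:59 -0500\n"
    "From: sender@example.com\n"
    "To: me@example.org\n"
    "Subject: Hello\n"
    "\n"
    "Body text.\n"
)

if __name__ == "__main__":
    print(relay_ips(SAMPLE_EMAIL))
```

Each mail server prepends its own Received header, so the last address in the list is usually the hop closest to the sender. That’s the one you’d feed into a geolocation or reverse IP lookup.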
My experience this morning is more evidence that it’s happening online already. The last time I had that type of eye-opening web experience was in the early days of LinkedIn’s connection recommendations, when I was baffled as to how they knew all of my email contacts. Although I’m not a fan of treating everyone like a ‘user’, it does personalize the experience and, in doing so, makes it more effective. Let’s hope the data-rich developers and companies who use these tools decide to do so wisely and not just for malign interests. I believe that the privacy laws in other countries are heading in the right direction and I’m leery of the unchecked use of this data. Just something to chew on this morning… I’ll leave you with this quote from my favorite tweeter and futurologist ~ “The Future is here – it’s just not very evenly distributed”.[19]
1. Bill.com – https://www.bill.com/
2. Bill.com, of Palo Alto, gets an early Christmas present as shares climb 61% in IPO – https://www.mercurynews.com/2019/12/12/bill-com-of-palo-alto-gets-an-early-christmas-present-as-shares-climb-61-in-ipo/
3. Drift – https://www.drift.com/
4. Drift Intel Integrations – https://gethelp.drift.com/hc/en-us/sections/360003545753-ABM-And-Intel-Integrations
5. MadKudu, Demandbase, 6Sense, Clearbit – https://www.madkudu.com/, https://www.demandbase.com/, https://6sense.com/, https://clearbit.com/
6. Here are the data brokers quietly buying and selling your personal information – https://www.fastcompany.com/90310803/here-are-the-data-brokers-quietly-buying-and-selling-your-personal-information
7. Home Affairs floats making telcos retain MAC addresses and port numbers – https://www.zdnet.com/article/home-affairs-floats-making-telcos-retain-mac-addresses-and-port-numbers/
8. Behind the One-Way Mirror: A Deep Dive Into the Technology of Corporate Surveillance – https://www.eff.org/wp/behind-the-one-way-mirror
9. Distrust of Tech Could Be Encryption’s Achilles’ Heel – https://www.axios.com/newsletters/axios-codebook-9630f129-ae8b-4813-b64d-4712b65e9835.html
10. David A. Windham – Privacy and Cookies – https://davidawindham.com/privacy-and-cookies/
11. Campaign Put Beacons on Lawn Signs to Track Phones – https://mashable.com/article/beacons-location-tracking-republican-campaign/
12. Beaconstac – https://www.beaconstac.com/
13. Clearbit Reveal – https://clearbit.com/reveal
14. Clearbit Enrichment API Docs – https://clearbit.com/docs#enrichment-api
15. Schema.org Structured Data – Wikipedia – https://en.wikipedia.org/wiki/Schema.org
16. Facebook Open Graph – https://en.wikipedia.org/wiki/Facebook_Platform#Open_Graph_protocol
17. Clearbit Reveal Subscriptions – https://help.clearbit.com/hc/en-us/categories/115001976668-Reveal#Subscriptions
18. Cambridge Analytica – https://en.wikipedia.org/wiki/Cambridge_Analytica
19. William Gibson – https://en.wikipedia.org/wiki/William_Gibson
22/04/12 – Update: Saw an episode of Last Week Tonight last night on Data Brokers that touches on some of the issues I mention in this essay. The full episode is available @ https://www.youtube.com/watch?v=wqn3gR1WTcA