Where is Security 3.0?
As people push their applications to be more ‘web 2.0’ and look towards the potential future ‘web 3.0’ (semantic web), I wonder who is considering the security aspects of such a move? Every day we pave the way to the semantic web with microformats, social tagging, collaborative content, mashups, and public APIs. Every day we also hear rallying cries for increased privacy and protection from Big Brother and the Evil Corporations. At the moment, I don’t think we can have it both ways without some serious effort in the near future.
Earlier this month I was reading two articles about privacy that had me considering the relationship between privacy and meaningful content.
The first article was from the Google blog: “Why does Google retain information about searches?” It detailed the business/legal/moral dilemma of search logs and how long to store them. Google eventually settled on anonymizing its logs after 18-24 months. From a business perspective, it is understandable why they would want to keep the logs. From a privacy perspective, the policy is absolutely useless. Google gives you the opportunity to view your search history, and I can tell you now, most people (myself included) are easily identifiable by their search history given a minimal amount of information on the target. Think of the software you log into that has an administrator login (like WordPress). That one URL alone is a dead giveaway to your identity. As we increase the meaningfulness of URLs and content, these identification points will increase in number.
The second article was about ISPs selling clickstream information to businesses. The situation is similar to Google’s, but this time any third party, regardless of reputation, could purchase your data.
I don’t know why this is shocking. This sort of snooping is going to happen all the time. We need to plan for it to occur and roll with it.
Yet again, the article goes into how the data is anonymized, but as the AOL search logs showed, that “anonymized” data is far from anonymous. You can pick out a good number of identities just by staring at the data. As we move towards more meaningful content, I would expect to see historical user data become a more valuable and dangerous item.
So here we are with the desire for more meaningful content on one end, and the need for privacy on the other. How can we address these needs simultaneously? I don’t really have the answers. I can see “disposable online identities” becoming a potentially popular item in the future. Your computer could help you create (and enforce) one unique identity per website in the same way it will help you store passwords and cookies. I know browsers already do this, but it will take some sort of policing to keep users from being lazy. At least this way, it would be more difficult for third parties to find you via online name, username, etc. The clickstream snooping is pretty much unstoppable outside of an encrypted proxy, but I can’t see that becoming the norm. You are just shifting the problem from the ISP to the encrypted proxy. Now someone different is selling your data.
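To make the disposable-identity idea a little more concrete, here is a minimal sketch of how software on your machine could derive one stable but unlinkable username per website from a single secret that never leaves your computer. This is only an illustration under my own assumptions: the function name `site_identity`, the `master_secret` parameter, and the `user_` prefix are all hypothetical, and a real tool would also need to handle secret storage, password generation, and what counts as "the same site".

```python
import hashlib
import hmac


def site_identity(master_secret: bytes, site: str, length: int = 12) -> str:
    """Derive a pseudonymous username for one site (hypothetical helper).

    The same secret and site always give the same name, but names for
    different sites cannot be linked without knowing the secret.
    """
    # Normalize the host so "Example.com" and "example.com" share an identity.
    host = site.strip().lower()
    # HMAC-SHA256 keyed with the local secret; only the hex prefix is exposed.
    digest = hmac.new(master_secret, host.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:length]


if __name__ == "__main__":
    secret = b"example-secret-kept-on-my-machine"  # assumption: stored locally
    for host in ("example.com", "example.org"):
        print(host, "->", site_identity(secret, host))
```

The point of keying the hash with a local secret is that a third party who sees your username on one site cannot compute what it would be on another. It does nothing about clickstream snooping, which happens below this layer.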
I think we are already seeing real-world implications of improved searches and more meaningful content. Without effort, I can obtain the full customer list of a competitor in my employer’s industry. This may not seem like a big deal, but I would rather have a list of customers I could target with a specific battle plan than cold call around trying to find someone who needs my services. Not to mention, the data found on job applicants is often hilarious. People have no idea that there is no longer any effort involved in profiling a person.
I think both putting the data out there AND deciding whether to use available data have strong moral and legal implications. I would strongly urge developers to start considering what data they put out there. I think the only way to address the security concerns of a more unified web is for developers to work together and be conscious of what they are “dumping” on the Internet and how different data points can be used in conjunction.
I am greatly looking forward to seeing where this issue is headed. There are so many different possibilities: Maybe the Internet loses the bulk of its anonymity to keep everyone on the level. Maybe the Internet becomes Cypherpunk heaven, and encryption becomes as fashionable as social networking. Maybe some agency, nation, governing body, etc. tries to police the hell out of it. Maybe we blunder through the issues after the fact. I have absolutely no idea, but I am definitely interested in everyone’s thoughts and ideas. It feels like we are right on the cusp of significant change. Until then, I will make a conscious effort to sanitize outgoing data, and wait for the next big idea.





