A word-for-word template to regain your privacy from the Wayback Machine
The Internet Archive Wayback Machine has been trawling the internet since 1996, caching snapshots of webpages and even entire websites, and holding them in a virtual museum at https://web.archive.org.
Here’s a snapshot it took of the Twitter homepage on 30th June 2007. The black bar along the top of the image shows the number of snapshots it’s taken of that page over the years.
It’s a wonderful piece of history and the Internet Archive is free. You can go back and see how websites have changed over time.
The Wayback Machine snapshots individual pages too. Here’s Ashton Kutcher’s Twitter page as at 3rd March 2009.
This is the part I don’t like much. Say you had a Twitter account and, a few years down the line, decided to make it private. Random people can’t see your old posts, which is what you want.
But the Wayback Machine has saved your posts from years gone by, and anyone can find them. The Internet Archive only trawls public pages, and it took its snapshot of your Twitter page before you made it private. So it hasn’t done anything wrong.
The Internet Archive doesn’t need your permission to take a snapshot of a public page, and the Twitter terms and conditions that you agreed to undoubtedly have this covered.
Basically, you use social media knowing your posts might live on forever somewhere: as a quote in a newspaper article, on a stalker’s hard drive or in a publicly accessible internet library. It’s a risk we accept when we use social media.
Taking Control of Your Personal and Business Websites and Blogs
A business site I created in 1998 lives in the Internet Archive Wayback museum. It was nostalgic but also spooky to see it sitting there fully functional because it’s been two decades since I killed it. But every now and again, I’ll look it up and remember that special era when the internet was brand new and I’ll think about all the clients I made logos and websites for.
A blog I used to write when I was an expat in the Middle East lives in the Wayback museum too. I killed it when I returned to the UK and sometimes wish I’d kept it live even though I don’t want anyone to read it anymore. So it soothes my soul that those expat memories are archived in a museum at an address only I know.
But there are many reasons you might not want the history of your website to live on forever in the digital library of the Internet Archive:
- There is private information you no longer want in the public domain
- You’re selling the website and don’t want to be associated with the new owner
- You purchased a domain and don’t want to be associated with whatever business owned it before you
- You purchased a domain and you don’t want people to know how much you bought it for (the Wayback Machine captures redirected Sedo splash pages)
- You don’t want people to see how your website changes over time
I decided I didn’t want this website, Wednesday Genius, in the Internet Archive. If you search for it, you’ll see this:
Before I tell you what I did to have my website removed, I want to point out that there are advantages to keeping your site in the Wayback Machine:
- It’s a piece of your history and from personal experience with many of my sites, it’s so much fun to go through old versions after several years
- You can use it to restore information you delete by mistake
- You can reference it as part of your portfolio once you’ve moved on to a different phase in your life
But if you’re sure you don’t want all previous versions of your site archived by the Wayback Machine, read on.
The Template to Remove Your Website From the Internet Archive
Removing your website from the Internet Archive and keeping it out is a two-step process. Here’s what you do:
Step One: Send This Email
Find your website on https://web.archive.org. You’ll be customising the following email with your details.
Once you’ve customised the email, send it to firstname.lastname@example.org with the subject “DMCA Take Down Notice”
Email template: Customise the bold text with your own details
Step Two: Amend Your robots.txt File
Now you need to tell the Wayback Machine robot that you don’t want it looking at your site in future.
The robots file is a .txt document that you add to the root of your domain.
Open Notepad or any text editor and write this:
Save the document as robots.txt and upload this file to the root directory of your site. For example, if your domain is www.mywebsite.com, you will place the file at www.mywebsite.com/robots.txt.
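If you want to double-check the file before you upload it, here is a minimal Python sketch. It assumes the Wayback Machine crawler answers to the user agent ia_archiver, which is the name the Internet Archive has historically asked site owners to use. Its handling of robots.txt has changed over the years, so treat that name as an assumption and check the Archive's current guidance. The ROBOTS_TXT constant and the is_blocked helper are names I made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the Wayback Machine crawler is addressed as "ia_archiver".
# This is the historically documented token, not a guarantee of current behaviour.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The archive crawler should be blocked; other crawlers are unaffected.
    print(is_blocked(ROBOTS_TXT, "ia_archiver", "https://www.mywebsite.com/"))
    print(is_blocked(ROBOTS_TXT, "Googlebot", "https://www.mywebsite.com/"))
```

The rule only names one crawler, so search engines and every other robot can still read your site as before.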
It took the Internet Archive a week to confirm my website was being submitted for exclusion, and when I checked two days later, it was gone.
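If you would rather not keep checking by hand, the sketch below asks the Wayback Machine's public availability endpoint whether it still has a snapshot of your site. The endpoint URL and the archived_snapshots / closest / available fields reflect my understanding of that API and should be treated as assumptions; the function names are hypothetical. An empty result only means no snapshot was reported, not proof that the exclusion has been applied.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Wayback Machine availability endpoint lives at this address.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(site: str) -> str:
    """Build the availability lookup URL for a site such as www.mywebsite.com."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': site})}"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the URL of the closest available snapshot, or None if none is reported."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check_site(site: str, timeout: float = 10.0) -> Optional[str]:
    """Query the endpoint over the network and return the snapshot URL, if any."""
    with urlopen(build_query(site), timeout=timeout) as response:
        return closest_snapshot(json.load(response))
```

Calling check_site("www.mywebsite.com") returns a snapshot address while the site is still archived and None once nothing is reported.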
The Internet Archive is a useful resource and I enjoy using it. As with all things, moderation is key. Not everything you do has to exist for public consumption at all times.