The Wayback Machine – Internet Archive Help Center

Save Pages in the Wayback Machine

Jeff Kaplan — Fri, 15 Mar 2024 20:55:12 +0000

Many people have shown interest in making sure the Wayback Machine has copies of the web pages they care about most. These saved pages can be cited, shared, linked to – and they will continue to exist even after the original page changes or is removed from the web.

There are several ways to save pages and whole sites so that they appear in the Wayback Machine. Here are 5 of them.

1. Save Page Now

Put a URL into the form, press the button, and we save the page. You will instantly have a permanent URL for your page. Please note, this method only saves a single page, not the whole site.

At the moment, there are a few exceptions to this method – some sites prohibit crawling, and a few have SSL (security) settings that cause it to fail – but it will work for most pages. The feature saves the page you enter, including its images and CSS. It does not save any of the outlinks and can’t be used to initiate a crawl of an entire website. We do not keep your IP address, so your submission is anonymous.
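For scripted use, a commonly used pattern is to request `https://web.archive.org/save/` followed by the page URL. That endpoint is not described in this article, so treat it as an assumption. The minimal sketch below separates building the request URL from sending it; the function names and the User-Agent string are hypothetical.

```python
import urllib.request

# Assumption: Save Page Now can be triggered by requesting this prefix
# followed by the full page URL. This is not documented in this article.
SAVE_PREFIX = "https://web.archive.org/save/"


def save_page_now_url(page_url: str) -> str:
    """Build a (hypothetical) Save Page Now request URL for one page."""
    if not page_url.startswith(("http://", "https://")):
        raise ValueError("page_url must be an absolute http(s) URL")
    return SAVE_PREFIX + page_url


def request_save(page_url: str, timeout: float = 60.0) -> str:
    """Send the save request and return the final URL after redirects.

    Like the form, this saves a single page only; outlinks are not
    followed. Requires network access.
    """
    req = urllib.request.Request(
        save_page_now_url(page_url),
        headers={"User-Agent": "example-archiver/0.1"},  # hypothetical UA
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.geturl()
```

For example, `save_page_now_url("https://example.com/")` yields `https://web.archive.org/save/https://example.com/`. Keeping the URL builder free of network calls makes it easy to check before sending anything.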

2. Browser extensions and add-ons

Install the Wayback Machine Chrome extension in your browser. Go to a page you want to archive, click the icon in your toolbar, and select Save Page Now. We will save the page and give you a permanent URL.

The same provisos from “Save Page Now” apply – there are some pages where it won’t work, and it only saves one page at a time. One advantage of installing the extension, though, is that as you browse, it will alert you when you run into a missing page for which we have a saved copy.

More extensions, apps, and add-ons are also available.

3. Wikipedia JavaScript Bookmarklet

Nobody loves a primary source more than a Wikipedia editor. To that end, they offer a Wayback Machine JavaScript Bookmarklet that allows you to quickly save a web page from any browser.

4. Volunteer for Archive Team

Archive Team is an entirely volunteer-driven group interested in saving Internet history. Many of the sites and pages they save end up in the Wayback Machine. Visit the Archive Team site to learn more about how to volunteer with them.

5. Sign up for an Archive-It Account

Archive-It is a subscription service provided by Internet Archive that allows you to run your own crawling projects without any technical expertise. Tell us what to crawl and how often to crawl it, and we execute the crawl and put the results in the Wayback Machine.

Archive-It is a paid subscription service with technical and web archivist support. This option is most appropriate for organizations that have a mandate to save certain types or categories of web content on a regular basis. If your institution is already an Archive-It partner, contact your institution to find out how you can contribute.

The Internet Archive has been saving web pages since 1996. This archive has been built by thousands of people, and we would like you to help. Use one of the methods above to make sure we have the pages you care about.

Archive whole web sites

Jeff Kaplan — Fri, 15 Mar 2024 20:54:24 +0000

Organizations interested in archiving entire web sites or creating large collections of content may want to explore our Archive-It service.

Archive-It is a subscription web archiving service from the Internet Archive that helps organizations harvest, build, and preserve collections of digital content. Through our user-friendly web application, Archive-It partners can collect, catalog, and manage their collections of archived content, with 24/7 access and full-text search available to them and their patrons.

Individuals who wish to archive web pages may want to refer to this article: Save Pages in the Wayback Machine.

Developers may wish to consult the Wayback Machine API documentation.
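As a small illustration of what such an API call can look like, the sketch below assumes the public availability endpoint `https://archive.org/wayback/available`, which takes `url` and an optional `timestamp` and returns JSON whose `archived_snapshots.closest` object holds the nearest capture. Check the API documentation before relying on that endpoint or response shape; the function names here are hypothetical.

```python
import json
from typing import Dict, Optional
from urllib.parse import urlencode

# Assumption: endpoint and JSON shape of the availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build a query asking for the capture closest to `timestamp`."""
    params: Dict[str, str] = {"url": url}
    if timestamp is not None:
        params["timestamp"] = timestamp  # yyyymmddhhmmss, may be truncated
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_body: str) -> Optional[str]:
    """Return the archived URL of the closest capture, or None if absent."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

Parsing is kept separate from fetching, so `closest_snapshot` can be tried on a saved response body without any network access.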

Can I rebuild my website using the Wayback Machine?

Jeff Kaplan — Fri, 15 Mar 2024 20:53:39 +0000

Several third-party services can help you rebuild a website from archives available via our Wayback Machine.

We do not have direct experience working with any of these services, so please investigate them to see whether they will meet your individual needs.

Wayback Machine General Information

Jeff Kaplan — Fri, 15 Mar 2024 20:53:04 +0000

What is the Wayback Machine?

The Internet Archive Wayback Machine is a service that allows people to visit archived versions of Web sites. Visitors to the Wayback Machine can type in a URL, select a date range, and then begin surfing on an archived version of the Web. Imagine surfing circa 1999 and looking at all the Y2K hype, or revisiting an older version of your favorite Web site. The Internet Archive Wayback Machine can make all of this possible.

What are the sources of your captures?

When you hover over an individual web capture (captures pop up when you hover over the dots on the calendar page for a URL), you may notice some text links above the calendar, along with the word “why”. Those links will take you to the collection of web captures associated with the specific web crawl the capture came from. Every day, hundreds of web crawls contribute to the web captures available via the Wayback Machine. Behind each one is a story about who ran it, why, when, and how.

Why is the Internet Archive collecting sites from the Internet? What makes the information useful?

Most societies place importance on preserving artifacts of their culture and heritage. Without such artifacts, civilization has no memory and no mechanism to learn from its successes and failures. Our culture now produces more and more artifacts in digital form. The Archive’s mission is to help preserve those artifacts and create an Internet library for researchers, historians, and scholars. The Archive collaborates with institutions including the Library of Congress and the Smithsonian.

Where does the name come from?

The Wayback Machine is named in reference to the famous Mr. Peabody’s WABAC (pronounced way-back) machine from the Rocky and Bullwinkle cartoon show.

Who was involved in the creation of the Internet Archive Wayback Machine?

“The original idea for the Internet Archive Wayback Machine began in 1996, when the Internet Archive first began archiving the web. Now, five years later, with over 100 terabytes and a dozen web crawls completed, the Internet Archive has made the Internet Archive Wayback Machine available to the public. The Internet Archive has relied on donations of web crawls, technology, and expertise from Alexa Internet and others. The Internet Archive Wayback Machine is owned and operated by the Internet Archive.”

How was the Wayback Machine made?

Alexa Internet, in cooperation with the Internet Archive, has designed a three-dimensional index that allows browsing of web documents over multiple time periods, and turned this unique feature into the Wayback Machine.

How do you archive dynamic pages?

There are many different kinds of dynamic pages, some of which are easily stored in an archive and some of which fall apart completely. When a dynamic page renders standard HTML, the archive works beautifully. When a dynamic page contains forms, JavaScript, or other elements that require interaction with the originating host, the archive will not contain the original site’s functionality.

Do you collect all the sites on the Web?

No, the Archive collects web pages that are publicly available. We do not archive pages that require a password to access, pages that are only accessible when a person types into and sends a form, or pages on secure servers. Pages may not be archived due to robots exclusions and some sites are excluded by direct site owner request.

Do you archive email? Chat?

No, we do not collect or archive chat systems or personal email messages that have not been posted to Usenet bulletin boards or publicly accessible online message boards.

Is there any personal information in these collections?

We collect Web pages that are publicly accessible. These may include pages with personal information.

Who has access to the collections? What about the public?

Anyone can access our collections through our website archive.org. The web archive can be searched using the Wayback Machine.

The Archive makes the collections available at no cost to researchers, historians, and scholars. At present, it takes someone with a certain level of technical knowledge to access collections in a way other than our website, but there is no requirement that a user be affiliated with any particular organization.

What is the Wayback Machine’s Copyright Policy?

The Internet Archive respects the intellectual property rights and other proprietary rights of others. The Internet Archive may, in appropriate circumstances and at its discretion, remove certain content or disable access to content that appears to infringe the copyright or other intellectual property rights of others. If you believe that your copyright has been violated by material available through the Internet Archive, please provide the Internet Archive Copyright Agent with the following information:

Identification of the copyrighted work that you claim has been infringed;
An exact description of where the material about which you complain is located within the Internet Archive collections;
Your address, telephone number, and email address;
A statement by you that you have a good-faith belief that the disputed use is not authorized by the copyright owner, its agent, or the law;
A statement by you, made under penalty of perjury, that the above information in your notice is accurate and that you are the owner of the copyright interest involved or are authorized to act on behalf of that owner;
Your electronic or physical signature.

The Internet Archive Copyright Agent can be reached as follows:

Internet Archive Copyright Agent
Internet Archive
300 Funston Ave.
San Francisco, CA 94118
Phone: 415-561-6767
Email: info at archive dot org

How can I help the Internet Archive and the Wayback Machine?

The Internet Archive actively seeks donations of digital materials for preservation. If you have digital materials that may be of interest to future generations, please let us know by sending an email to info at archive dot org. The Internet Archive is also seeking additional funding to continue this important mission; you can give through the donate tab on our website. Thank you for considering us in your charitable giving.

How do I contact the Internet Archive?

All questions about the Wayback Machine, or other Internet Archive projects, should be addressed to info@archive.org.

Using the Wayback Machine

Jeff Kaplan — Fri, 15 Mar 2024 20:51:36 +0000

This introduction video provides an overview of how to use the Wayback Machine, including searching by URL or keyword, understanding provenance, and saving your own pages, along with other features.

Can I link to old pages on the Wayback Machine?

Yes! The Wayback Machine is built so that it can be used and referenced. If you find an archived page that you would like to reference on your Web page or in an article, you can copy the URL. You can even use fuzzy URL matching and date specification… but that’s a bit more advanced.
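Archived links follow the pattern `http://web.archive.org/web/<timestamp>/<original URL>`, with the timestamp written as yyyymmddhhmmss. The minimal sketch below builds such a link from a full datetime or from a shortened timestamp such as a year. It assumes that the Wayback Machine resolves a partial timestamp to the nearest available capture; `archived_url` is a hypothetical helper name.

```python
from datetime import datetime
from typing import Union

WAYBACK_PREFIX = "http://web.archive.org/web/"


def archived_url(original_url: str, when: Union[datetime, str]) -> str:
    """Build a Wayback Machine link for `original_url` at a given time.

    `when` is either a datetime (formatted as yyyymmddhhmmss) or a digit
    string of 4 to 14 characters that is a prefix of that format, such as
    "2004" or "200410". Assumption: partial timestamps are resolved to the
    nearest capture by the Wayback Machine.
    """
    if isinstance(when, datetime):
        stamp = when.strftime("%Y%m%d%H%M%S")
    else:
        stamp = when
        if not (stamp.isdigit() and 4 <= len(stamp) <= 14):
            raise ValueError("timestamp must be 4 to 14 digits")
    return f"{WAYBACK_PREFIX}{stamp}/{original_url}"
```

For instance, `archived_url("http://www.robirda.com/cancare.html", datetime(2004, 10, 9, 20, 28, 20))` produces the archived URL used in the MLA citation example further down this page.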

How can I use the Wayback Machine’s Site Search to find websites?

The Site Search feature of the Wayback Machine is based on an index built by evaluating terms from hundreds of billions of links to the homepages of more than 350 million sites. Search results are ranked by the number of captures in the Wayback and the number of relevant links to the site’s homepage.

Can I search the Archive?

Using the Internet Archive Wayback Machine, it is possible to search for the names of sites contained in the Archive (URLs) and to specify date ranges for your search. We hope to implement a full text search engine at some point in the future.

Why isn’t the site I’m looking for in the archive?

Some sites may not be included because the automated crawlers were unaware of their existence at the time of the crawl. It’s also possible that some sites were not archived because they were password protected, blocked by robots.txt, or otherwise inaccessible to our automated systems. Site owners might have also requested that their sites be excluded from the Wayback Machine.

How can I exclude or remove my site’s pages from the Wayback Machine?

If you would like to submit a request for archives of your site or account to be excluded from web.archive.org, send us a request to info@archive.org and indicate:

the URL or URLs of the material
the time period that you wish to have excluded
the time period during which you had control of the site or relevant user account (if applicable) and
any other information that you think would be helpful for us to better understand your request.

This will initiate a review by our team. We do not make any guarantees beforehand about the outcome of a request.

How can I get a copy of the pages on my Web site? If my site got hacked or damaged, could I get a backup from the Archive?

Our terms of use do not cover backups for the general public. However, you may use the Internet Archive Wayback Machine to locate and access archived versions of a site to which you own the rights. We can’t guarantee that your site has been or will be archived. We can no longer offer the service to pack up sites that have been lost.

Can I add pages to the Wayback Machine?

On https://archive.org/web you can use the “Save Page Now” feature to save a specific page one time. This does not currently add the URL to any future crawls, and it saves only that one page, not multiple pages, directories, or entire sites.

Where is the rest of the archived site? Why am I getting broken or gray images on a site?

Broken images occur when the images are not available on our servers. Usually this means that we did not archive them.

You can tell if the image or link you are looking for is in the Wayback Machine by entering the image or link’s URL into the Wayback Machine search box. Whatever archives we have are viewable in the Wayback Machine.

The best way to see all the files we have archived of the site is: http://web.archive.org/*/www.yoursite.com/*

There is a 3-10 hour lag time between the time a site is crawled and when it appears in the Wayback Machine.
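As a scripted alternative to the `/*/` listing page, the Wayback CDX server is commonly queried at `http://web.archive.org/cdx/search/cdx` with `matchType=prefix`. Its default plain-text output is one capture per line with the fields urlkey, timestamp, original, mimetype, statuscode, digest, and length. That endpoint and field order are assumptions not covered by this article, and the helper names are hypothetical.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Assumption: CDX server endpoint and its default field order.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"
CDX_FIELDS = ["urlkey", "timestamp", "original", "mimetype",
              "statuscode", "digest", "length"]


def cdx_prefix_query(site: str, limit: int = 100) -> str:
    """Build a query listing captures of every URL under `site`."""
    params = {"url": site, "matchType": "prefix", "limit": str(limit)}
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx(text: str) -> List[Dict[str, str]]:
    """Parse default CDX text output into one dict per capture."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == len(CDX_FIELDS):  # skip blank or malformed lines
            rows.append(dict(zip(CDX_FIELDS, parts)))
    return rows
```

Given the lag described above, a page crawled only a few hours ago may not appear in such a listing yet.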

Why are some sites harder to archive than others?

If you look at our collection of archived sites, you will find some broken pages, missing graphics, and some sites that aren’t archived at all. Some of the things that may cause this are:

Robots.txt — A site’s robots.txt document may have prevented the crawling of a site.
JavaScript — JavaScript elements are often hard to archive, especially if they generate links without including the full URL in the page. Also, if JavaScript needs to contact the originating server in order to work, it will fail when archived.
Server side image maps — Like any functionality on the web, if it needs to contact the originating server in order to work, it will fail when archived.
Orphan pages — If there are no links to your pages, the robot won’t find them (the robots don’t enter queries in search boxes).
As a general rule of thumb, simple HTML is the easiest to archive.

Can I find sites by searching for words that are in their pages?

No, at least not yet. Site Search for the Wayback Machine will help you find the homepages of sites, based on words people have used to describe those sites, as opposed to words that appear on pages from sites.

Can I still find sites in the Wayback Machine if I just know the URL?

Yes, just enter a domain or URL the way you have in the past and press the “Browse History” button.

Why are some of the dots on the calendar page different colors?

We color the dots, and links, associated with individual web captures, or multiple web captures, for a given day. Blue means the web server result code the crawler got for the related capture was a 2nn (good); Green means the crawlers got a status code 3nn (redirect); Orange means the crawler got a status code 4nn (client error), and Red means the crawler saw a 5nn (server error). Most of the time you will probably want to select the blue dots or links.
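The color rule above amounts to a lookup on the first digit of the HTTP status code. A minimal sketch; the function names and the "unknown" fallback for codes outside the 2nn to 5nn range are illustrative, not part of the Wayback Machine.

```python
from typing import List, Tuple

# First digit of the status code -> calendar dot color, as described above.
_DOT_COLORS = {2: "blue", 3: "green", 4: "orange", 5: "red"}


def dot_color(status_code: int) -> str:
    """Map an HTTP status code to the dot color used on the calendar page."""
    return _DOT_COLORS.get(status_code // 100, "unknown")


def good_captures(captures: List[Tuple[str, int]]) -> List[str]:
    """Keep only the timestamps whose capture would show a blue (2nn) dot."""
    return [stamp for stamp, code in captures if dot_color(code) == "blue"]
```

Filtering to blue captures mirrors the advice above that those are usually the ones worth opening.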

How does the Wayback Machine behave with JavaScript turned off?

If you have JavaScript turned off, images and links will be from the live web, not from our archive of old Web files.

How did I end up on the live version of a site? Or: I clicked on date X, but now I am on date Y. How is that possible?

Not every date for every archived site is 100% complete. When you are browsing an incomplete archived site, the Wayback Machine will fetch the closest available capture for any links missing from the date you are viewing. If we have not archived a link at all, the Wayback Machine will look for it on the live web and fetch it if available. Pay attention to the date code embedded in the archived URL: the string of digits in the middle, which reads as yyyymmddhhmmss. For example, in the URL http://web.archive.org/web/20000229123340/http://www.yahoo.com/ the site was crawled on Feb 29, 2000 at 12:33:40.
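Reading the date code programmatically makes it easy to notice when you have drifted to a different capture, or off the archive entirely. A minimal sketch with hypothetical helper names: `capture_time` pulls the 14-digit timestamp out of an archived URL, and `date_shift` compares it with the timestamp you originally requested; a URL without a `/web/<timestamp>` segment is treated as not coming from the archive.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Matches the 14-digit capture timestamp (yyyymmddhhmmss) after "/web/".
_STAMP_RE = re.compile(r"/web/(\d{14})")


def capture_time(url: str) -> Optional[datetime]:
    """Extract the capture time from an archived URL, or None if absent."""
    match = _STAMP_RE.search(url)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def date_shift(requested_stamp: str, served_url: str) -> Optional[timedelta]:
    """How far the served capture is from the requested 14-digit timestamp.

    Returns None when the served URL carries no Wayback timestamp, which
    suggests the resource came from the live web.
    """
    served = capture_time(served_url)
    if served is None:
        return None
    requested = datetime.strptime(requested_stamp, "%Y%m%d%H%M%S")
    return served - requested
```

For the Yahoo example above, `capture_time` returns `datetime(2000, 2, 29, 12, 33, 40)`.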

You can see a listing of the capture dates for a specific URL by replacing the date code with an asterisk (*), for example: http://web.archive.org/*/www.yoursite.com

How do I cite Wayback Machine urls in MLA format?

This question is a newer one. We asked the MLA how to cite an archived URL in the correct format. They said that there is no established format for resources like the Wayback Machine, but that it’s best to err on the side of more information: cite the webpage as you normally would, and then give the Wayback Machine information. They provided the following example: McDonald, R. C. “Basic Canary Care.” _Robirda Online_. 12 Sept. 2004. 18 Dec. 2006 [http://www.robirda.com/cancare.html]. _Internet Archive_. [http://web.archive.org/web/20041009202820/http://www.robirda.com/cancare.html]. They added that if the date the information was updated is missing, one can use the closest date in the Wayback Machine. Then comes the date when the page was retrieved and the original URL. Neither URL should be underlined in the bibliography itself. Thanks, MLA!

How can I get pages authenticated from the Wayback Machine? How can I use the pages in court?

While the Wayback Machine was not expressly designed with legal use in mind, we receive regular requests for certified records for use in legal proceedings. We have an affidavit request procedure; please review that information, including our standard affidavit and the legal request FAQ section linked from it, before contacting us.

Some sites are not available because of robots.txt or other exclusions. What does that mean?

Such sites may have been excluded from the Wayback Machine due to a robots.txt file on the site or at a site owner’s direct request.

How can I get my site included in the Wayback Machine?

Much of our archived web data comes from our own crawls or from Alexa Internet’s crawls. Neither organization has a “crawl my site now!” submission process. Internet Archive’s crawls tend to find sites that are well linked from other sites. The best way to ensure that we find your web site is to make sure it is included in online directories and that similar/related sites link to you.

Alexa Internet uses its own methods to discover sites to crawl. It may be helpful to install the free Alexa toolbar and visit the site you want crawled to make sure they know about it.

Regardless of who is crawling the site, you should ensure that your site’s ‘robots.txt’ rules and in-page META robots directives do not tell crawlers to avoid your site.
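To check your own rules before a crawler visits, the Python standard library can evaluate robots.txt text (`urllib.robotparser`) and an HTML parser can read the in-page META robots directives. A minimal sketch; `ia_archiver` is used only as an illustrative user-agent string, and which agent names or META directives a given crawler honors is an assumption to verify separately.

```python
from html.parser import HTMLParser
from typing import Set
from urllib import robotparser


def allows_crawler(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text lets `user_agent` fetch `url`."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class _MetaRobots(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def meta_robots_directives(html: str) -> Set[str]:
    """Return the lowercased META robots directives found in `html`."""
    parser = _MetaRobots()
    parser.feed(html)
    return parser.directives
```

A result such as `{"noindex", "nofollow"}` from `meta_robots_directives`, or `False` from `allows_crawler` for a page you want archived, is a sign that your rules may be asking crawlers to stay away.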

What is the Archive-It service of the Internet Archive’s Wayback Machine?

For information on the Archive-It subscription service that allows institutions to build and preserve collections of born digital content, see https://www.archive.org/about/faqs.php#Archive-It.

How do I request to remove something from archive.org?

Jeff Kaplan — Fri, 15 Mar 2024 18:41:53 +0000

If you would like to submit a copyright claim for material found on archive.org, please refer to our Copyright Policy.

If you would like to submit a request for archives of your site or account to be excluded from web.archive.org, send us a request to info@archive.org and indicate:

the URL or URLs of the material
the time period that you wish to have excluded
the time period during which you had control of the site or relevant user account (if applicable) and
any other information that you think would be helpful for us to better understand your request.

This will initiate a review by our team. We do not make any guarantees beforehand about the outcome of a request.

Other types of removal requests may also be sent to info@archive.org. Please explain as clearly as possible what you are asking us to remove and why, so that we can better understand your request. Again, our team carefully reviews requests, and we do not make any guarantees beforehand about the outcome of a request.

Archive-It Information

Jeff Kaplan — Fri, 15 Mar 2024 18:33:08 +0000

What is Archive-It?

Archive-It is a subscription service that allows institutions to build and preserve collections of born digital content. Through the user-friendly web application, Archive-It partners can harvest, catalog, manage, and browse their archived collections. Collections are hosted at the Internet Archive data center and are accessible to the public with full-text search.

Why would I subscribe to Archive-It instead of using the Wayback Machine at Internet Archive?

Partners to this service can create distinct Web archives called “collections”, containing only the born digital content they are interested in harvesting, at whatever frequency suits their needs. All collections are full-text searchable. The collections created with Archive-It can be cataloged with metadata and managed directly by the partner. The Archive-It service maintains a minimum of two copies of each collection online, a primary and a back-up copy.

How frequently can I archive Web sites?

Archive-It is very flexible: you can harvest material from the Web using ten different frequencies, from daily to annually. Partners can select different crawl frequencies for each chosen URL. Your institution can also choose to start a crawl “on demand” in the case of an unforeseen or historic event.

Who gets access to the collections created in Archive-It?

By default, all collections are available for public access from the main page at www.archive-it.org. However, a partner can choose to have their collection(s) made private by special arrangement.

How can I search the collections?

Archive-It provides full text search capability for all public collections. You can also browse by URL from the list provided for each collection. The public can browse and search collections by partner type or collection from www.archive-it.org.

What types of institutions can subscribe to Archive-It?

Archive-It is designed to fit the needs of many types of organizations. The hundreds of partners include state archives and libraries, university libraries, federal institutions, nongovernmental nonprofits, museums, art libraries, and local city governments.

Who decides which content to archive in Archive-It?

Partners develop their own collections and have complete control over which content to archive within those collections.

Where is the data stored for Archive-It collections?

All data created using the Archive-It service is hosted and stored by the Internet Archive. We store two copies online and are working with partners to have redundant copies in other locations. Partners can also request a copy of their data for local use and preservation to be shipped either on a hard drive or over the internet.

How do I sign up?

For more information, go to archive-it.org or contact us.

Why can’t I see the Web page I archived recently?

associate-richard-greydanus@archive.org — Wed, 02 Mar 2022 20:52:27 +0000

The Wayback Machine can sometimes experience delays in registering snapshots made using the Save Page Now tool. Our website replays pages only after they’ve been saved (indexed) by our system. We use several different indexes and each one covers a different time period.

Sometimes a web page that is added quickly to the near-real-time index used by our Save Page Now service may be removed from that index before it is stored in one of our longer-term indexes. When that happens, you might be able to view the page for a short time (perhaps a few hours or days), but then it disappears for a while. It will become available again once it’s added to a longer-term index.

Rest assured that the snapshot was still captured at the time you requested it, and we appreciate your patience while you wait. If you are still unable to view a snapshot you took after some time, please reach out to info@archive.org with relevant details.
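If you check for a recent snapshot from a script, spacing out the checks avoids hammering the service while the longer-term index catches up. The sketch below is a generic exponential-backoff loop under stated assumptions: `is_visible` is any caller-supplied check (for example, a lookup you write yourself against the Wayback Machine), and the attempt count and delays are arbitrary illustrative values, not guidance from the Internet Archive.

```python
import time
from typing import Callable, Optional


def wait_for_snapshot(
    is_visible: Callable[[], bool],
    max_attempts: int = 6,
    first_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Poll `is_visible()` with exponential backoff.

    Doubles the wait after each failed check. Returns the 1-based attempt
    number that succeeded, or None if the snapshot never became visible
    within `max_attempts` checks. `sleep` is injectable for testing.
    """
    delay = first_delay
    for attempt in range(1, max_attempts + 1):
        if is_visible():
            return attempt
        if attempt < max_attempts:  # no pointless wait after the last check
            sleep(delay)
            delay *= 2
    return None
```

Because a capture can disappear temporarily and then return, a `None` result here does not mean the snapshot was lost; it only means it was not visible during the polling window.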