What is the Deep Web and How Does It Work?

Source: Thinkstock

In popular lore, the Deep Web is a den of vice and unspeakable horror. It is a place of child pornography and assassins for hire. It is also a place of fraud and criminal activities.

For example, one of the largest bitcoin-related scandals emerged from the Deep Web, when Ross William Ulbricht (otherwise known as Dread Pirate Roberts) was identified as the owner of Silk Road, a Deep Web site that sold drugs. Ulbricht was charged with selling drugs, money laundering and fraud. There is also speculation that, after a crackdown on its online activities on the visible Internet, ISIS may have moved to the Deep Web.

The Feds have also upped their game and increased monitoring of the Deep Web in response to news reports about criminal activity there. The government defense agency DARPA developed Memex – a search tool that crawls the Deep Web – and the Feds have asked Reddit to identify users on its forum for darknets.

Public interest in the Deep Web is at an all-time high. Unfortunately, most of that interest skews towards negative stereotypes defined by sensational news stories.

But the Deep Web is really much more than drugs and criminal activities. In fact, the Deep Web has philosophical underpinnings in the right to anonymity and free speech. For example, a number of sites there offer pirated material for free. In that sense, the Deep Web resembles the early days of the World Wide Web, when “information wants to be free” was a war cry among its enthusiasts. Similarly, not all sites and chat rooms on the Deep Web are devoted to criminal activities (see below).

Read the following Q&A section for a more nuanced take on the Deep Web.

What is the Deep Web?

The Deep Web is the part of the web that has not been indexed or is not accessible using traditional tools and technologies. Estimates of the size of the Internet vary, but there is broad agreement that, despite its infrastructure prowess and programming talent, Google has been able to index only a minuscule part of it. The remaining, unindexed web cannot be reached through search engines. Sites here also do not follow the conventional rules for uniform resource locators (URLs) on the web.

The Deep Web is a myriad collection of websites, academic and criminal databases, corporate intranets, dark marketplaces and IRC chat forums. There are a number of reasons why sites choose to not be indexed by search engines. For example, academic and research databases have a financial incentive to not expose their papers to search engines. Similarly, lurking away from search engines enables criminal websites to offer services that would otherwise be easily discoverable by law enforcement agencies.


How does it work?

In its structure, the Deep Web is like an early version of the Internet. The design and technologies used by websites are perfunctory. File and wiki structures, which are artifacts of early versions of the Web, are common there.

The Deep Web is also slow. Several sites on the Tor network (by far, the most popular network on the Deep Web) take a long time to load and have performance issues, such as slow or timed-out connections.

This is because it lacks the sophisticated and scalable infrastructure regularly employed by large mainstream tech companies, such as Google and Facebook, to speed algorithmic processing and return quick results. Instead, the Deep Web employs a network of relay computers that randomize IP addresses. It is also fickle: more than 30% of the Deep Web vanished in August 2013 after an Irish programmer was arrested for allegedly hosting sites that encouraged pedophilia and trafficking of children.
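The relay idea described above can be illustrated with a toy sketch. It assumes a three-relay path in which the sender wraps the message in one layer per relay, and each relay peels a single layer to learn only the next hop. The relay names, the `toy_cipher` helper and the destination string are hypothetical, and the XOR keystream is a stand-in for real encryption, not what Tor actually uses.

```python
import hashlib


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Illustration only, NOT secure.

    Applying it twice with the same key returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def build_onion(message: bytes, path: list[tuple[str, bytes]], destination: str) -> bytes:
    """Wrap the message in one layer per relay, innermost layer first.

    path lists (relay_name, relay_key) pairs in travel order.
    Each layer hides the name of the next hop plus the remaining payload.
    """
    payload = message
    next_hop = destination
    for name, key in reversed(path):
        payload = toy_cipher(key, next_hop.encode() + b"|" + payload)
        next_hop = name
    return payload


def peel_layer(key: bytes, payload: bytes) -> tuple[str, bytes]:
    """What a single relay does: remove its own layer and read the next hop."""
    plain = toy_cipher(key, payload)
    next_hop, inner = plain.split(b"|", 1)
    return next_hop.decode(), inner


if __name__ == "__main__":
    # Hypothetical relays with made-up keys.
    path = [("guard", b"key-1"), ("middle", b"key-2"), ("exit", b"key-3")]
    onion = build_onion(b"hello", path, "example-destination")
    for name, key in path:
        hop, onion = peel_layer(key, onion)
        print(f"{name} forwards to {hop}")
    print(onion)
```

Because every relay sees only the hop before it and the hop after it, no single machine on the path knows both who sent the message and where it ends up. The extra hops are also one reason the network is slow.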

Is the Deep Web a single, vast cyberspace?

Two networks dominate the Deep Web: Tor and I2P.

Like many technology innovations, Tor, or The Onion Router project, started life as a government project. It was developed at the U.S. Naval Research Laboratory to protect government communication and is a peer-to-peer (P2P) network that uses a communication system dependent on a number of interconnected computers. As mentioned earlier, the structure and design of websites on the Tor network are primitive, and website loading times are significant.

What it lacks in performance, the Tor project makes up for in security and privacy. In addition to randomizing IP addresses, the Tor project uses hashing algorithms, which produce strings of letters and numbers, to ensure that all communication and website addresses are encrypted. This means that the network’s site addresses, or URLs, are not accessible through regular browsers.
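As a rough illustration of how a hash can turn into a site address, the sketch below follows the older (version 2) onion address scheme: take the SHA-1 hash of the service's public key, keep the first 80 bits, and encode them in base32. The function name and the placeholder key bytes are assumptions made for the example; a real service would hash its actual DER-encoded RSA public key, and newer onion addresses use a different, longer format.

```python
import base64
import hashlib


def onion_v2_address(public_key_der: bytes) -> str:
    """Derive a version 2 style .onion address from public key bytes."""
    digest = hashlib.sha1(public_key_der).digest()
    # First 10 bytes (80 bits) of the hash become 16 base32 characters.
    label = base64.b32encode(digest[:10]).decode("ascii").lower()
    return label + ".onion"


if __name__ == "__main__":
    # Placeholder bytes standing in for a real public key.
    print(onion_v2_address(b"not-a-real-public-key"))
```

The result is a 16-character string of lowercase letters and the digits 2 to 7. Since it is derived from a key rather than registered with a domain registrar, an ordinary browser has no way to resolve it.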

Instead, the Tor browser is necessary to navigate the Tor network. Downloads of the browser surged by 100% after the Edward Snowden revelations. The network’s commitment to privacy and user security is comprehensive. On the download website for the browser, a number of steps are listed to maintain privacy. Even torrenting over Tor is discouraged.

In contrast, I2P is a recent phenomenon. Known as eepSites, I2P sites are designed and optimized for hidden services. The I2P network is still relatively little used because it is new. But, as federal law enforcement authorities increase surveillance of the Deep Web, which mainly comprises the Tor network, the alternative network is gaining traction.

This Deep website advertises for a U.S. Passport – at a price.

What can I find on the Deep Web?

Most of the estimates regarding the Deep Web come from a seminal 2001 paper by Michael Bergman, who founded BrightPlanet. In the paper, Bergman outlined a number of astounding statistics and facts about the Deep Web. According to the paper, Google, which was the dominant search engine at that time, crawled only 0.03% of the web. Bergman found that the 60 largest sites on the Deep Web were together about 40 times the size of the surface-level Internet. The Deep Web sites also received 50% more monthly traffic than surface sites.

Much has changed since then. Several sites and databases that were earlier part of the Deep Web have shifted to the surface web. For example, databases such as Lexis Nexis and JSTOR have moved online, with access available through subscriptions.

Parallel to the database migration is an increase in nefarious activity on the Deep Web. Part of the draw of the Deep Web is its relative anonymity. Search engines and social networks use cookies to track your movement across the web and serve up custom advertising and content. In contrast, the Deep Web offers anonymity. This is a boon for criminals, researchers, and journalists, who use the network to peddle criminal services, search for information that might not be available through subscription databases, or source contacts, respectively.

That said, anonymity is a boon for a number of unsavory elements as well. Along with ethical hackers (see below), you can also find assassins for hire as well as hacked adult videos detailing rape and forced sex.

What is the future of the Deep Web?

The Deep Web may not be so deep in the future. There has been a concerted effort to make the Deep Web accessible in recent times. For example, Facebook recently made its service available on the Tor network. Torbook, an alternative to the deeply personalized and supposedly transparent Facebook, has also emerged within its ecosystem.

George Kadianakis, a developer who is working on a revamp of the hidden services protocol, wrote a blog post outlining the potential of the Deep Web. “Anything that you can build on the Internet, you can build on hidden services. But they’re better,” he wrote. The Tor Project, the group of developers and evangelists behind the network, is also working to popularize it. The same blog post discussed crowdfunding as a way to cover upcoming development costs and to raise awareness of the project.
