Huge number of websites barely visited, report finds

The 'long tail' is cold and dark.

The Internet, famously, has a long tail, but a new analysis has revealed another characteristic of this vast slew of obscure websites. Huge numbers of them are never visited.

Analysing visits to several million websites during the last quarter of 2009 for its State of the Web report (registration required), cloud security startup Zscaler created a Hilbert curve-generated 'heatmap' of active and inactive IPv4 sites from real customer data. As expected, the grid that emerged showed clusters of active sites as white dots and a large volume of reserved or non-routed addresses in gray, but it was the sea of dark that loomed largest of all.
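
For readers unfamiliar with the technique, a Hilbert curve lays a one-dimensional sequence, such as the IPv4 address space, onto a two-dimensional grid so that numerically adjacent addresses land in neighbouring cells. The sketch below is a rough illustration only and not Zscaler's actual method: the function names `d2xy` and `ip_to_cell`, and the choice of one cell per /24 block on a 4096 x 4096 grid, are assumptions made for this example.

```python
import ipaddress


def d2xy(order, d):
    """Convert a distance d along a Hilbert curve into (x, y) coordinates.

    The grid has side length 2**order, so d must be in range(4**order).
    This is the standard iterative distance-to-coordinate conversion.
    """
    x = y = 0
    t = d
    s = 1
    side = 1 << order
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so the sub-curves join up.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def ip_to_cell(ip, order=12):
    """Map an IPv4 address to a heatmap cell (hypothetical helper).

    With order=12 the grid is 4096 x 4096 and each cell covers one /24
    block. That granularity is an assumption for this sketch.
    """
    bits_kept = 2 * order
    block = int(ipaddress.IPv4Address(ip)) >> (32 - bits_kept)
    return d2xy(order, block)


if __name__ == "__main__":
    for addr in ("0.0.0.0", "0.0.1.0", "192.0.2.1"):
        print(addr, ip_to_cell(addr))
```

Marking each cell as visited or unvisited and rendering the grid as an image would give a map of the kind described in the report, with contiguous address ranges appearing as compact patches rather than long stripes.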

In the three months of the analysis, vast numbers of sites were not visited at all. On the assumption that Zscaler's customers are typical of Internet users more generally, these are the Internet's lost continent: sites that nobody ever visits, or visits so infrequently that it doesn't register.

"It's a fascinating view which exposes just how vast the Internet truly is. Even when analyzing traffic from millions of users over the course of three months, it can be seen that much of the Internet remains untouched," say the authors.

Commentators often refer to the 'dark side of the web', meaning the criminal and unsavoury parts of the Internet that few normally look at closely, but what Zscaler has turned up on its map is dark in a more literal sense. Nobody looks at these sites or, if they do, it is incredibly hard to detect from the US cloud.

Some of this 'unlit space' could, of course, be non-English-speaking domains beyond the ken of Zscaler's customer base, which raises the possibility that there are several 'long dark tails' on the Internet, depending on the point from which you measure the phenomenon.

Part of the explanation for what does not get visited in Zscaler's report might also lie in what does.

According to the company, even half a decade ago the web was just that, a space defined by HTML files. Although many persist in seeing the web in this way, the file types moving across its servers have changed markedly. Now, more than half of such files are JPEGs or GIFs, with HTML files accounting for only 0.57 percent of files.

Popular domains also dominate the Internet, hoovering up more and more of people's attention. LivePerson, Google, DoubleClick (the web ad distribution network), Yahoo, Facebook, and a clutch of less well-known but structurally important web domains took a large percentage of all web visits, a sign that the web is becoming more concentrated on fewer locations. This is the part of the Internet that is growing.

Tellingly, a similar story of concentration is seen in terms of malware hosts, though with considerable fluctuations. Depending on the particular type of scam being looked at, huge numbers of malicious URLs emanate from a very small number of hosts. Whether botnets, phishing websites, or malware servers, there is usually a single mega-source, one or two large sources, and a large number of sources with extremely small shares.


John E. Dunn

Techworld