New software detects bots scraping Web site data

The software is aimed at protecting Web sites whose intellectual property can be easily copied

Web sites such as job boards face a persistent problem: their data is constantly pilfered by automated bots.

The data ends up on other competing job boards, which have stolen the content. It's a problem that plagues any Web site whose intellectual property must be publicly posted for free, or even those with subscription models.

But an Atlanta-based security company that specializes in detecting bots has developed software that can detect those screen-scraping and data-mining bots.

Pramana's main product, HumanPresent, detects automated bots that, for example, enter spam into Web-based forms or register for free e-mail accounts to be used for spam.

Pramana has now developed a module called "data mining and screen scraping prevention" for HumanPresent. It works on many of the same principles as its main product but has been modified for data-mining scenarios, said David Crowder, Pramana's CEO.

HumanPresent can detect bots by noticing differences in the way a human would normally interact with a Web page and contrasting that with how bots behave. It looks at more than 30 metrics, such as keyboard strokes, mouse clicks and the timing of those actions.

HumanPresent looks at single transactions, but the data-mining module has been modified to look at a timed period when either a bot or human is on the site, Crowder said.

Data-mining bots tend to entirely circumvent a browser's user interface. For example, a bot may request a Web page with lots and lots of data, but never scrolls or clicks on a page. If a series of pages are opened and viewed in that manner, it could mean a data-mining bot has arrived.

Pramana assigns a unique ID to the visitor, and after analyzing the visitor's behavior, can make a decision whether to label the visitor a bot or not. There are several different ways a Web site operator can then choose to deal with the situation.

The IP (Internet Protocol) address of the bot's computer can be block permanently. One car auction Web site that is testing Pramana's data mining module decided to move suspected bots into a "sandbox" where it is served completely false data.

"They're indeed data mining -- it's just dead wrong," Crowder said.

Other options include prompting the Web site visitor with a challenge or task, which some bots aren't capable of completing.

Data mining costs companies dearly. Companies that sell premium data will find that their competitors will buy a subscription and then use automated bots to steal the data for their own sites. In one example, a Web site that has gigabytes of data on used car prices found their data had been scraped and was for sale on eBay.

"They are actually competing with their own content," Crowder said.

Some Web sites have poor designs that make data scraping that much easier. The used car site had URLs (Uniform Resource Locators) could be sequentially modified to reveal more data, Crowder said.

The data-mining module will be wrapped into the HumanPresent product for now, but early next year Pramana plans to sell it separately, Crowder said. Pramana offers HumanPresent either as an on-premise appliance or as a software-as-as-service configuration.

For the SaaS (software as a service) offering, Pramana's technology is integrated into a Web application and session information is sent back to Pramana for analysis. Crowder said Pramana has been able to significantly cut down on the latency time in its latest version. For customers who need more speed, the appliance is available.

Join the Good Gear Guide newsletter!

Error: Please check your email address.

Tags botsPraman

Our Back to Business guide highlights the best products for you to boost your productivity at home, on the road, at the office, or in the classroom.

Keep up with the latest tech news, reviews and previews by subscribing to the Good Gear Guide newsletter.

Jeremy Kirk

IDG News Service
Show Comments

Most Popular Reviews

Latest News Articles

Resources

PCW Evaluation Team

Azadeh Williams

HP OfficeJet Pro 8730

A smarter way to print for busy small business owners, combining speedy printing with scanning and copying, making it easier to produce high quality documents and images at a touch of a button.

Andrew Grant

HP OfficeJet Pro 8730

I've had a multifunction printer in the office going on 10 years now. It was a neat bit of kit back in the day -- print, copy, scan, fax -- when printing over WiFi felt a bit like magic. It’s seen better days though and an upgrade’s well overdue. This HP OfficeJet Pro 8730 looks like it ticks all the same boxes: print, copy, scan, and fax. (Really? Does anyone fax anything any more? I guess it's good to know the facility’s there, just in case.) Printing over WiFi is more-or- less standard these days.

Ed Dawson

HP OfficeJet Pro 8730

As a freelance writer who is always on the go, I like my technology to be both efficient and effective so I can do my job well. The HP OfficeJet Pro 8730 Inkjet Printer ticks all the boxes in terms of form factor, performance and user interface.

Michael Hargreaves

Windows 10 for Business / Dell XPS 13

I’d happily recommend this touchscreen laptop and Windows 10 as a great way to get serious work done at a desk or on the road.

Aysha Strobbe

Windows 10 / HP Spectre x360

Ultimately, I think the Windows 10 environment is excellent for me as it caters for so many different uses. The inclusion of the Xbox app is also great for when you need some downtime too!

Mark Escubio

Windows 10 / Lenovo Yoga 910

For me, the Xbox Play Anywhere is a great new feature as it allows you to play your current Xbox games with higher resolutions and better graphics without forking out extra cash for another copy. Although available titles are still scarce, but I’m sure it will grow in time.

Featured Content

Latest Jobs

Don’t have an account? Sign up here

Don't have an account? Sign up now

Forgot password?