Researchers from Microsoft and the University of California, Davis teamed up to pinpoint exactly where “the bottleneck” of Web spam occurs and how legitimate advertisers inadvertently end up in bad neighborhoods. The majority of spam, they found, comes from the same few places, and the middlemen include some names you might recognize.
The study (PDF; recommended reading for the full picture) was authored by Microsoft’s Yi-Min Wang and Ming Ma and UCD’s Yuan Niu and Hao Chen, and its results are startling. Using their “Strider Search Ranger System,” an automated spam detection system, the study authors found that:
- Blogspot and AOL Hometown domains were used for the vast majority of spammy doorway pages. At least three in four (75%) Blogspot URLs appearing in the top 50 results for commercial queries were spam, totaling 22% of all spam appearances.
- Three IP blocks accounted for huge percentages of redirection spam (links that lead to made-for-ads pages) and for spam-ads clickthrough traffic.
- Three syndicators were most often found at the center of the redirection chains: LookSmart.com, FindWhat.com, and 7Search.com.
- Nearly 60% of keywords returning spam URLs were related to drugs and ringtones.
Based on that, the researchers developed what they called a five-layer double-funnel model to illustrate the sophisticated middleman/syndication circuit that matches advertisers with undesirable spammy URLs. Much like the two strands of a DNA double helix, the model has two paths running in opposite directions through the complicated systems that sit between end users (searchers) and advertisers.
While searchers are clicking in one direction, advertisements are coming the opposite way, as if passing on the road.
From the user side it goes:
Doorway — Redirection Domain — Aggregators — Syndicator — Advertiser
From the advertiser side it goes:
Advertiser — Syndicator — Aggregators — Redirection Domain — Doorway
That is the “Spam Double-Funnel.”
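As a rough illustration only, the two directions can be written as one ordered list read forwards and backwards. The layer names come from the lists above; the function names are hypothetical and are not taken from the study.

```python
# Layer names as listed in the article, ordered from the searcher's side.
LAYERS = ["Doorway", "Redirection Domain", "Aggregators", "Syndicator", "Advertiser"]


def user_path():
    """Order in which a searcher's click passes through the layers."""
    return list(LAYERS)


def advertiser_path():
    """Order in which an ad travels: the same layers, reversed."""
    return list(reversed(LAYERS))


def render(path):
    """Join the layer names into a single readable chain."""
    return " -> ".join(path)


if __name__ == "__main__":
    print("User side:      ", render(user_path()))
    print("Advertiser side:", render(advertiser_path()))
```

Both paths share the same middle layers, which is why the narrow middle of the funnel matters so much to both sides.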
The researchers note that among the top ten Live Search results for “cheap ticket,” three doorway pages appeared:
- http://-cheapticket.blogspot.com/
- http://sitegtr.com/all/cheap-ticket.html
- http://cheap-ticketv.blogspot.com/
Their ranking is related to comment spam in open forums, where the URLs are often posted. The URLs redirect to known-spammer domains like:
- vip-online-search.info
- searchadv.com
- webresourses.info
Surprisingly, ads for reputable online travel firm Orbitz showed up on all three spam domains. The researchers assume that Orbitz did not intend for its brand to appear in these bad neighborhoods. The same scenario played out for other well-known advertisers like Shopping.com, DealTime.com, BizRate.com, eBay, and Shopzilla.
The research showed that 60 percent of the redirection chains involved ads syndicated through LookSmart, FindWhat, and 7Search. But most of the Web spam pages themselves were traced to three specific IP blocks:
- 22-25% of all spam appearances originated from IP block 209.8.25.150~209.8.25.159
- IP blocks 66.230.128.0~66.230.191.255 and 64.111.192.0~64.111.223.255 were responsible for over 100,000 spam ads, occupying the bottleneck of the spam double-funnel. The researchers say this may prove to be the best layer for attacking the search spam problem.
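For readers who want to check an address against these ranges, here is a minimal sketch using Python's standard `ipaddress` module. The three ranges are copied from the list above; the function name `matching_block` and the sample addresses are made up for illustration, and this is not how the Strider Search Ranger system itself works.

```python
import ipaddress

# Ranges quoted in the article (inclusive start and end addresses).
SPAM_BLOCKS = [
    ("209.8.25.150", "209.8.25.159"),
    ("66.230.128.0", "66.230.191.255"),
    ("64.111.192.0", "64.111.223.255"),
]


def _to_int(addr):
    """Convert a dotted-quad IPv4 string to an integer for range comparison."""
    return int(ipaddress.IPv4Address(addr))


def matching_block(ip):
    """Return the (start, end) block that contains ip, or None if none does."""
    value = _to_int(ip)
    for start, end in SPAM_BLOCKS:
        if _to_int(start) <= value <= _to_int(end):
            return (start, end)
    return None


if __name__ == "__main__":
    # Sample addresses are hypothetical, chosen only to exercise the check.
    for sample in ["66.230.150.1", "209.8.25.160", "64.111.200.7"]:
        print(sample, "->", matching_block(sample))
```

Comparing integers rather than CIDR prefixes keeps the first range exact, since 209.8.25.150 to 209.8.25.159 does not line up with a single CIDR block.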
The researchers conclude their study by saying:
“By exposing the end-to-end search spamming activities, we hope to educate users not to click spam links and spam ads, and to encourage advertisers to scrutinize those syndicators and traffic affiliates who are profiting from spam traffic at the expense of the long-term health of the web.”