Tuesday, November 5, 2024

SMX Day 1: Duplicate Content Summit

Event organizer Danny Sullivan moderated and gave a brief introduction of the panelists: Vanessa Fox (Google), Amit Kumar (Yahoo! Search), Peter Linsley (Ask.com), and Eytan Seidman (Microsoft), as if the geeks in the room needed to be reminded.

First up was Eytan Seidman, Microsoft's Lead Program Manager, who stressed that duplicate content fragments your ranking, since signals get split across the duplicate URLs. He also advised keeping session parameters simple. Separate pages for different locations are okay as long as the content on each is unique. An important pointer was to always use server-side redirects.
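To make that pointer concrete, here is a minimal sketch of a permanent, server-side 301 redirect in Python's Flask framework (the framework choice, route, and target URL are my own illustration, not anything shown at the session):

    # A minimal server-side 301 redirect sketch using Flask.
    # The route and target URL are hypothetical placeholders.
    from flask import Flask, redirect

    app = Flask(__name__)

    @app.route("/old-page")
    def old_page():
        # 301 tells crawlers the move is permanent, so ranking
        # signals consolidate on the new URL instead of fragmenting.
        return redirect("/new-page", code=301)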

Someone asked, “How do you avoid having people copy your content?”

Seidman: All my experience is based on sites I helped administer. One simple method is to tell people that if they use your content, they should attribute it to you. You can also block certain types of crawlers, detect user agents, and block unknown IP addresses from crawling.
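A rough sketch of the user-agent and IP filtering Seidman describes (the agent list and IP range here are illustrative only; real verification would also do reverse-DNS checks on the crawler's IP):

    # Hypothetical crawler filter: allow only known bots from known IP ranges.
    import ipaddress

    ALLOWED_AGENTS = ("Googlebot", "Slurp", "msnbot")          # illustrative list
    ALLOWED_NETS = [ipaddress.ip_network("66.249.64.0/19")]    # illustrative range

    def allow_crawler(user_agent: str, ip: str) -> bool:
        # Reject user agents that don't match a known crawler name.
        if not any(bot in user_agent for bot in ALLOWED_AGENTS):
            return False
        # Reject requests from IP addresses outside the known ranges.
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in ALLOWED_NETS)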

Microsoft handles duplicate content by aggressively looking for session parameters and tracking parameters at crawl time.
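None of the engines shared implementation details, so purely as an illustration, detecting a session parameter at crawl time might look something like this: fetch the URL with and without the suspect parameter and compare content fingerprints (fetch() is a hypothetical page fetcher returning raw page bytes, and the function names are mine):

    # Sketch: flag a query parameter as a session parameter if removing it
    # yields identical page content.
    import hashlib
    from urllib.parse import urlencode, urlparse, parse_qsl, urlunparse

    def without_param(url: str, param: str) -> str:
        # Rebuild the URL with one query parameter removed.
        parts = urlparse(url)
        kept = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
        return urlunparse(parts._replace(query=urlencode(kept)))

    def is_session_param(url: str, param: str, fetch) -> bool:
        # Same content with and without the parameter => it doesn't matter.
        digest = lambda u: hashlib.md5(fetch(u)).hexdigest()
        return digest(url) == digest(without_param(url, param))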

Peter Linsley, Ask.com's Senior Product Manager for Search, proposed using a copyright or even a Creative Commons notice to ward off content copiers. Another of his pointers was to make content hard to repurpose so that it maintains its uniqueness. If none of these work, take legal action.

Next to speak was Yahoo! Search's Senior Engineering Manager, Amit Kumar. Yahoo! extracts links while crawling sites but maintains a policy of not taking content from pages it knows are mere duplicates (a sketch of one common detection approach follows the list below). However, Amit listed four cases Yahoo! considers legitimate duplication:

  1. Alternate document formats – PDF, printer-friendly pages
  2. Legitimate syndication (newspaper sites have wire-service stories)
  3. Different languages
  4. Partial duplicate pages: navigation, common site elements, disclaimers.
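Amit didn't say how Yahoo! decides that pages are mere duplicates, so the following is strictly an illustration rather than Yahoo!'s disclosed method: a common textbook approach is shingling, where overlapping word windows from two pages are compared as sets, which also tolerates the partial duplication in item 4 (the window size, threshold, and function names are my own choices):

    # Shingling sketch: two pages are near-duplicates if their shingle
    # sets overlap heavily. Window size and threshold are arbitrary here.
    def shingles(text: str, k: int = 5) -> set:
        words = text.split()
        return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

    def similarity(a: str, b: str) -> float:
        # Jaccard similarity of the two shingle sets.
        sa, sb = shingles(a), shingles(b)
        return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

    # e.g., pages with similarity(a, b) > 0.9 could be collapsed to one.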

Finally, it was Google's very own Vanessa Fox who talked about duplicate content in the context of episodes of “Buffy the Vampire Slayer”. In one episode there were two Xanders who had to be merged back together to fix the problem. In another episode there were two Willows, but this time the problem was different: one Willow was good and the other was evil, so the evil one had to go. The analogy: sometimes duplicate URLs should be consolidated into one canonical page, and sometimes a duplicate (say, a scraped copy) simply needs to be removed.

The session ended with a long Q&A between the panel and attendees. Some of the most important pointers:

Peter: For the most part, a meta refresh is the same thing as a 301.
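For reference, a zero-delay meta refresh of the kind Peter is equating with a 301 looks like this (the target URL is a placeholder):

    <meta http-equiv="refresh" content="0; url=http://www.example.com/new-page">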

Vanessa: Use robots.txt to block crawlers from duplicate content.
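A minimal robots.txt along those lines, blocking a hypothetical printer-friendly directory and session-ID URLs (the wildcard line is a Google/Yahoo! extension, not part of the original robots.txt standard):

    User-agent: *
    Disallow: /print/
    Disallow: /*?sessionid=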

Vanessa: For tracking URLs and parameters, link to the canonical version to prevent dilution.
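On the site side, that pointer amounts to always emitting the clean URL in your own internal links. A rough sketch in Python, with invented tracking-parameter names:

    # Sketch: drop hypothetical tracking parameters before emitting internal
    # links, so link equity accrues to one canonical URL.
    from urllib.parse import urlencode, urlparse, parse_qsl, urlunparse

    TRACKING = {"src", "campaign", "sessionid"}   # illustrative names

    def canonical_href(url: str) -> str:
        parts = urlparse(url)
        kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING]
        return urlunparse(parts._replace(query=urlencode(kept)))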

Though the search engine engineers were against the suggestion, most attendees were in favor of digital signatures to prevent content scraping.

Does someone have a picture of Matt McGee's face when Vanessa mentioned Buffy the Vampire Slayer? Another fun note: only Microsoft's Eytan Seidman's PowerPoint worked properly.
