I’m Not Actually a Geek

May 27, 2008

Tag Recommendations for Content: Ready to Filter Noise?

Filed under: geek — Tags: , , , , , , — Hutch Carpenter @ 11:34 pm

In a recent post, I suggested that the semantic web might hold a solution for managing noise in social media. The semantic web can auto-generate tags for content, and these tags can be used to filter out subjects you don’t want to see.

As a follow-up, I wanted to see how four different services perform in terms of recommending tags for different content.

I’ve looked at four services, each of which provides tag recommendations. Here they are, along with some information about how they approach their tag recommendations:

  • del.icio.us: Popular tags are what other people have tagged this page as, and recommended tags are a combination of tags you have already used and tags that other people have used.
  • Twine: Applies natural language processing and semantic indexing to just that data (via TechCrunch)
  • Diigo: We’ll automatically analyze the page content and recommend suitable tags for you
  • Faviki: Allows you to tag webpages you want to remember with Wikipedia terms.

Twine and Diigo take the initiative, and apply tags based on analyzing the content. del.icio.us and Faviki follow a crowdsourced approach, leveraging the previous tag work of members to provide recommendations.

Note that Faviki just opened its public beta. So it suffers from a lack of activity around content thus far. That will be noticed in the following analysis.

I ran six articles through the four tagging services:

  1. The Guessing Game Has Begun on the Next iPhone - New York Times
  2. TiVo: The Gossip Girl of DVRs - Robert Seidman’s ‘TV by the Numbers’ blog
  3. Twitter! - TechCrunch
  4. Injury ‘bombshell’ hits Radcliffe - BBC Sport
  5. Why FriendFeed Is Disruptive: There’s Only 24 Hours in a Day - this blog
  6. Antioxidant Users Don’t Live Longer, Analysis Of Studies Concludes - Science Daily

The tag recommendations are below. Headline on the results? Recommendations appear to be a work in progress.

First, the New York Times iPhone article. Twine wins. Handily. Diigo gave it a shot, but the nytimes tags really miss the mark. del.icio.us and Faviki weren’t even in the game.

Next, Robert Seidman’s post about TiVo. Twine comes up with several good tags. Diigo has something relevant. And again, del.icio.us and Faviki weren’t even in the game.

Now we get to the trick article, Michael Arrington’s no-text blog entry, Twitter! The tables turn here. Twine comes up empty for the post. Based on the post’s presence on Techmeme and the 400+ comments on the blog post, a lot of people apparently bookmarked this post. This gives del.icio.us and Faviki something to work with, as seen below. And Diigo offers the single tag of…twitter!

Switching gears, this is a running-related article covering one of the top athletes in the world, Paula Radcliffe. Twine comes up the best here. Diigo manages “bombshell”…nice. del.icio.us and Faviki come up empty, presumably because no users bookmarked this article. And none of them could come up with tags of “running” or “marathon”.

I figured I’d run one of my own blog posts through this test. The post has been saved to del.icio.us a few times, so I figured there’d be something to work with there. Strangely, Twine comes up empty. Faviki…nuthin’.


Finally, I threw some science at the services. This article says that antioxidants don’t actually deliver what is promised. Twine comes up with a lot of tags, but misses the word “antioxidants”. Diigo only gets antioxidant. And someone must have bookmarked the article on del.icio.us, because it has a tag. Faviki…nada.

Conclusions

Twine clearly has the most advanced tag recommendation engine. It generates a bevy of tags. One thing I noticed between Twine and Diigo:

  • Twine most often draws tags from the content
  • Diigo more often draws tags from the title

Obviously my sample size isn’t statistically significant, but I see that pattern in the above results.

The other thing to note is that these services do a really great job with auto-generating tags. For instance, the antioxidant article has 685 words. Both Twine and Diigo were able to come up with only what’s relevant out of all those words.

With del.icio.us and Faviki, if someone else hasn’t previously tagged the content, they don’t generate tags. Crowdsourced tagging - free form on del.icio.us, structured per Wikipedia on Faviki - still has a lot of value though. Nothing like human eyes assessing what an article is about. Faviki will get better with time and activity.

Note that both Twine and Diigo allow manually entered tags as well, getting the best of both auto-generated and human-generated.

When it comes to using tags as a way to filter noise in social media, both system- and human-generated tags will be needed.

  • System-generated tags ensure some level of tagging for most new content. This is important in an app like FriendFeed, where new content is constantly streaming in.
  • Human-generated tags pick up where the system leaves off. In the Paula Radcliffe example above, I’d expect people to use common sense tags like “running” and “marathon”.
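Here is a rough sketch of how the two kinds of tags could work together in a noise filter. Everything in it is an assumption for illustration: the `FeedItem` class, the `filter_noise` function and the sample feed are hypothetical, and none of this reflects how FriendFeed or any of the four services actually works. The sample tags reuse the Radcliffe and antioxidant examples from above.

```python
from dataclasses import dataclass, field


@dataclass
class FeedItem:
    """A hypothetical feed entry carrying two sets of tags."""
    title: str
    system_tags: set[str] = field(default_factory=set)  # auto-generated
    human_tags: set[str] = field(default_factory=set)   # added by people

    def all_tags(self) -> set[str]:
        # Union of both sources, lower-cased so "Running" matches "running".
        return {t.lower() for t in self.system_tags | self.human_tags}


def filter_noise(items: list[FeedItem], muted: set[str]) -> list[FeedItem]:
    """Keep only the items that carry none of the muted tags."""
    muted_lower = {t.lower() for t in muted}
    return [item for item in items if not (item.all_tags() & muted_lower)]


# Illustrative feed: the system tags are the ones Diigo produced above,
# the human tags are the ones I'd expect people to add.
feed = [
    FeedItem("Injury 'bombshell' hits Radcliffe",
             system_tags={"bombshell"},
             human_tags={"running", "marathon"}),
    FeedItem("Antioxidant Users Don't Live Longer",
             system_tags={"antioxidant"}),
]

visible = filter_noise(feed, muted={"running"})
print([item.title for item in visible])
```

Muting "running" hides the Radcliffe article only because a person tagged it that way. With the system tag "bombshell" alone, the filter would have let it through.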

The results of this simple test show the promise of tagging, and where the work lies ahead to create a robust semantic tagging system that could be used for noise control.

*****

See this item on FriendFeed: http://friendfeed.com/search?q=%22Tag+Recommendations+for+Content%3A+Ready+to+Filter+Noise%3F%22&public=1

May 19, 2008

Hey Yahoo! Forget MSFT, GOOG. Change the Search Rules.

Filed under: geek — Tags: , , , , , — Hutch Carpenter @ 12:25 am

I wish I knew the moment I was turned off on Yahoo and what the root cause may be, but I no longer use anything Yahoo (except my Flickr account if you want to count that).

Vince DeGeorge, on FriendFeed

I was doing the same thing until I started using delicious as a search tool. Finally realized how powerful it was, and have been using it since.

Shaun McLane, on FriendFeed

I have previously written that Delicious search is one of the best ways of searching for things when a standard search doesn’t pull up what you are looking for. After Google, it is my favorite “search engine.”

Michael Arrington, TechCrunch, Delicious Integrated Into Yahoo Search Results

The latest news is that Microsoft is reaching out to Yahoo again. In fact, a couple reports (here, here) say that Microsoft wants to buy Yahoo’s search business.

Before any such transaction occurs, it seems worthwhile to think about what Yahoo could do with its existing assets. The three comments above are insightful. Yahoo is slowly losing share of mind, although its existing base of users will be around for a while. At the same time, there are nuggets in the Yahoo empire.

Search via del.icio.us ranks as one of those nuggets. Another nugget? Yahoo! Buzz. According to ReadWriteWeb, Yahoo! Buzz has surpassed Digg in terms of traffic, and its demographics better reflect web users.

Yet, Yahoo struggles against Google in the highly lucrative search market. Google increased to 67.9% of searches in April 2008, compared to Yahoo’s decline to 20.3% of searches.

What should Yahoo do? Stop playing Google’s game. Rewrite the search rules by embracing the social web fully, leveraging the social media assets it has.

And in doing so, demonstrate an aggressive path to make Yahoo a social media titan.

A Proposal for “Socializing” Yahoo Search

In January 2008, TechCrunch ran a post with a preview of del.icio.us integrated with regular Yahoo search results. Included in the search result links would be stats that tell a user:

  • Number of del.icio.us users who bookmarked the page
  • The top tags they used on the page

Both of those stats appear to be clickable. By clicking on the number of users stat, I assume a user would be taken to the del.icio.us page showing the users who bookmarked the page. If one clicked a tag, you’d land on the del.icio.us page for all web pages with that tag.

That’s a good start. But Yahoo can do better. Below is a diagram that shows how Yahoo can use its existing assets, combined with a good dose of the new social media experience, to radically change search:

Here’s a breakdown of what’s going on with the proposal.

Search Rankings

From what I’ve read, Yahoo has pretty much caught up to Google in terms of search performance. That means the use of links and clicks to rank websites is pretty common across the two search engines. However, Google does have the advantage of three times the traffic, which makes its insight into what’s relevant better than Yahoo’s.

But Yahoo has its own in-house advantages: del.icio.us and Yahoo! Buzz. Both address shortcomings in the links and clicks rankings for search engines:

  • Links require a media site or blogger to take the time to link. These links are insightful, but lack the broader reach of what Web users find relevant.
  • Clicks occur before a searcher knows whether the landing site is valuable. They don’t describe its usefulness after someone has clicked onto the site.

With del.icio.us and Yahoo! Buzz, Yahoo can tap into users’ sentiments about websites in a way that Google cannot. These insights can be used to influence the ranking of search results.
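To make the idea concrete, here is a minimal sketch of blending a conventional links-and-clicks relevance score with bookmark and buzz counts. It is purely illustrative: the `social_score` function, the weights, the log damping and the sample results are all my assumptions, not anything Yahoo has described.

```python
import math


def social_score(base_relevance: float, bookmarks: int, buzzes: int,
                 bookmark_weight: float = 0.3,
                 buzz_weight: float = 0.2) -> float:
    """Blend a links-and-clicks relevance score with social signals.

    log1p damps the counts so a heavily bookmarked page gets a boost
    without completely swamping relevance. The weights are assumed.
    """
    return (base_relevance
            + bookmark_weight * math.log1p(bookmarks)
            + buzz_weight * math.log1p(buzzes))


# Hypothetical search results with made-up signal counts.
results = [
    {"url": "a.example", "relevance": 1.0, "bookmarks": 0, "buzzes": 0},
    {"url": "b.example", "relevance": 0.8, "bookmarks": 120, "buzzes": 15},
]

ranked = sorted(
    results,
    key=lambda r: social_score(r["relevance"], r["bookmarks"], r["buzzes"]),
    reverse=True,
)
print([r["url"] for r in ranked])
```

In this made-up example, the page with slightly lower base relevance moves ahead because many people saved or buzzed it, which is the kind of after-the-click signal that links and clicks alone miss.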

Search Results - Your Friends or Everyone

Here’s where it can get really interesting. Notice I keep the general search results outside the influence of what your friends think. I think that’s important. A person should see results outside their own social circle. Otherwise, it will be hard to find new content.

But there is real power in seeing what your friends find valuable (e.g. see FriendFeed). So Yahoo should let you easily subscribe to other people for content discovery. Yahoo already has a head start on letting you set up your subscriptions:

  • Yahoo Mail
  • Yahoo Instant Messenger

In addition to that, you should be able to easily subscribe to anyone who publicly shares content they find interesting. Both del.icio.us and Yahoo! Buzz have public-facing lists for every user of what they bookmark or ‘buzz’. After viewing those lists, I should be able to easily subscribe to these users.

Once your network is developed, it becomes a powerful basis for improving information discovery.

Search Results - Associated Tags

Whenever tags are available from del.icio.us, they should be visible for each web site shown in the search results. This is what TechCrunch previewed. What do tags tell a user?

  • A way to discover other sites that might be relevant
  • Context for the web site
  • That someone thought enough of the web page to actually tag it

Tags should come in two flavors: everyone and your network. Clicking on a tag should display the top 10 associated sites right on the search results page. For more sites associated to the tag, the user is taken to del.icio.us.

Keeping the top sites on the search results page is important to make people use the functionality. Leaving the search results page just to see the sites associated with a tag will cause adoption to drop significantly.
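A small sketch of the two tag flavors, assuming bookmark data is available as simple (user, url, tag) records. The record format, the `top_sites` function and the user names are hypothetical. The sites are just running sites used as sample data.

```python
from collections import Counter

# Hypothetical bookmark records: (user, url, tag).
bookmarks = [
    ("alice", "letsrun.com", "running"),
    ("bob", "letsrun.com", "running"),
    ("bob", "mapmyrun.com", "running"),
    ("carol", "mapmyrun.com", "running"),
    ("carol", "runnersworld.com", "running"),
    ("dave", "mapmyrun.com", "running"),
]


def top_sites(tag, records, network=None, limit=10):
    """Rank URLs by how many distinct users bookmarked them with `tag`.

    Pass `network` (a set of user names) to count only those users,
    or leave it as None to count everyone.
    """
    # De-duplicate so each user counts once per URL.
    seen = {(user, url) for user, url, t in records
            if t == tag and (network is None or user in network)}
    counts = Counter(url for _, url in seen)
    return [url for url, _ in counts.most_common(limit)]


print(top_sites("running", bookmarks))                      # everyone
print(top_sites("running", bookmarks, {"alice", "bob"}))    # your network
```

The same tag produces a different top list depending on whose bookmarks are counted, which is why showing both flavors side by side could be useful.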

Search Results - Associated People

Each web page in the search results will show the number of people who have (i) bookmarked the site; or (ii) Yahoo! Buzzed the site. These numbers give a direct indication of how many people, not websites, found the web page valuable.

Clicking these numbers displays a list of the people, along with their most recent activity. This gives users a sense of whether they want to subscribe to a given user or not.

Search Agent

Once users perform a search, they will be able to subscribe to new content matching their search results. These subscriptions can be based on different criteria:

  • Any new content matching the search term (Google does this via Google Alerts) or a tag
  • Any new content matching the search term/tag and bookmarked by someone to whom the user subscribes
  • Any new content matching the search term/tag and Yahoo! Buzzed by someone to whom the user subscribes
  • Any new bookmarks or Yahoo! Buzzes by someone to whom the user subscribes
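The four criteria above can be sketched as a simple matching rule. This is only an illustration under my own assumptions: the `Rule`, `Content` and `Subscription` names are hypothetical, and matching a term by substring in the title or by exact tag is a stand-in for real search matching.

```python
from dataclasses import dataclass, field
from enum import Enum


class Rule(Enum):
    ANY_MATCH = 1             # new content matching the term or tag
    MATCH_AND_BOOKMARKED = 2  # match + bookmarked by someone you follow
    MATCH_AND_BUZZED = 3      # match + buzzed by someone you follow
    ANY_FRIEND_ACTIVITY = 4   # any bookmark or buzz by someone you follow


@dataclass
class Content:
    title: str
    tags: set[str] = field(default_factory=set)
    bookmarked_by: set[str] = field(default_factory=set)
    buzzed_by: set[str] = field(default_factory=set)


@dataclass
class Subscription:
    rule: Rule
    following: set[str] = field(default_factory=set)
    term: str = ""

    def matches(self, item: Content) -> bool:
        term = self.term.lower()
        # Naive matching: substring of the title, or an exact tag.
        text_hit = bool(term) and (
            term in item.title.lower()
            or term in {t.lower() for t in item.tags})
        friend_bookmark = bool(self.following & item.bookmarked_by)
        friend_buzz = bool(self.following & item.buzzed_by)

        if self.rule is Rule.ANY_MATCH:
            return text_hit
        if self.rule is Rule.MATCH_AND_BOOKMARKED:
            return text_hit and friend_bookmark
        if self.rule is Rule.MATCH_AND_BUZZED:
            return text_hit and friend_buzz
        return friend_bookmark or friend_buzz  # ANY_FRIEND_ACTIVITY


post = Content("Learn How To Do A Perfect Tempo Run",
               tags={"running"}, bookmarked_by={"alice"})
sub = Subscription(Rule.MATCH_AND_BOOKMARKED,
                   following={"alice"}, term="tempo run")
print(sub.matches(post))
```

Each item that passes `matches` would then go out through whatever channel the user picked, such as email or an RSS feed.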

New content notifications occur via email or RSS. RSS can be anywhere, including on the user’s My Yahoo page. Again, FriendFeed has shown the power of these content streams.

Final Thoughts

My little post here isn’t the only idea someone could float. But it does at least address taking Yahoo much more deeply into the social media world, where users drive the value.

Yahoo revealed details of a proposed del.icio.us integration back in mid-January. And then nothing. Yahoo previewed Yahoo Mash, a new social network, in September 2007. And then…nothing. The last post on the Yahoo Mash blog was January 11, 2008.

Yahoo has so many amazing assets. Search, email, portal home page. Several beloved social media apps (Flickr, del.icio.us, Upcoming). Yet they have not put them together into a cohesive strategy and experience.

And now, talk of selling the search business? C’mon Yahoo. You’ve got too much going on to give up yet. Stop playing by others’ rules. Make your own rules with the amazing assets you have.

*****

See this item on FriendFeed: http://friendfeed.com/e/1b07226a-b51b-f386-fbb8-bdaece83e9fe

April 10, 2008

Becoming a Web 2.0 Jedi

Filed under: geek — Tags: , , , , , , , , , , — Hutch Carpenter @ 10:29 pm

Thinking about the ever deeper levels of involvement one can have with Web 2.0 apps and the Web 2.0 ethos. Came up with this chart.

Thoughts?

*****

See this item on FriendFeed: http://friendfeed.com/search?q=%22Becoming+a+Web+2.0+Jedi%22&public=1

March 10, 2008

Search Smackdown: Mahalo - del.icio.us - Google

Filed under: geek — Tags: , , , , , , , , , — Hutch Carpenter @ 10:49 pm

I was reading the Crowdsourcing vs. Expertsourcing: A Misleading Comparison post over at Mashable. In it, Paul Glazowski analyzes a Newsweek article that suggests the bloom is off the Web 2.0 rose. Too much junk is enabled via everyday people logging on, and there’s a movement for more professional, expert information sourcing.

One example of expertsourcing is Mahalo. Mahalo was started to be a guide to Web content. Paid professionals own a topic, they research a number of sites related to that topic, and post the links that provide the best information. In their opinion, that is.

I’ll admit to some skepticism here. Google has been so good at revealing information and letting me see what’s out there. The idea of limiting my results to what someone deems worthy seems so incomplete. I’m afraid I’d be missing something that’d be really important to me.

But Mahalo has gotten some traction, so there’s something there.

I decided to run my own simple test of Mahalo, pitting it against two other ways to find relevant web content: del.icio.us and Google search. Quick backgrounder on those. del.icio.us is a bookmarking/tagging app that lets you save websites you like, and give them terms that have meaning to you. You can also find content on a given subject by searching tags, and seeing what others have bookmarked. Google is, of course, the preeminent Web search engine.

I tested three separate search terms, going from broad to specific:

  • Running
  • Marathon training
  • Tempo run

My scoring system is simple. For each search term, gold, silver or bronze will be assigned based on my own subjective view.

SEARCH TERM #1: RUNNING

‘Running’ is a fairly broad topic. There are a lot of areas that may apply, making it a challenge to return results that are relevant. With that in mind, let’s see what the three search apps returned.

Mahalo: SILVER

The foundation of Mahalo’s search results is “The Mahalo Top 7”. These are the seven best links for a given topic. It is the Top 7 where expertsourcing proves its value.

The ‘Running’ Top 7 provides links to two running publications and Wikipedia’s entry for running. Another link is to About.com’s page for running, itself a form of expertsourcing. A little uninspired, but a serviceable offering.

Mahalo also has several other sections in its running page. These include health-related topics, oddball sites, web tools and user recommendations. The web tools include MapMyRun.com, which lets you map a run or view others’ running routes. A user recommendation includes LetsRun.com, which is the best site for the competitive runner.

One other thing that’s good. All the links relate to the physical exercise running.

del.icio.us: BRONZE

This search shows both the power and the weakness of bookmark/tagging sites. On the plus side, I love the running results that are returned. Very interesting variety. The downside? A lot of sites that aren’t exercise running-related. Things like “Running a Windows Partition in VMware” and “Internet Explorer 7 running side by side with IE6”. In fact, 26 of the first 50 results were not related to exercise running.

There are interesting sites that del.icio.us users have posted related to running. MapMyRun.com is here. How to Select a Running Shoe by eHow.

Several, but not all, of the Mahalo Top 7 appear in the first 50 del.icio.us results.

Google: GOLD

You can see how Mahalo picked its Top 7 websites…they’re all the top results in Google search! Google also returns the fun stuff in del.icio.us.

Then Google offers a plethora of other sites, and only 6 of the first 50 are not related to exercise running. Pretty much everything on Mahalo is there, plus other interesting sites. A site listing running movies. A company that sells the running skirt! Ultrarunning.

SEARCH TERM #2: MARATHON TRAINING

‘Marathon training’ is not nearly as wide open as ‘running’. This search is for someone who has a goal in mind.

Mahalo: BRONZE

First, let me say that the bronze here is a very strong showing. If there were a photo finish, you’d have a hard time telling Mahalo hadn’t won this test. The presented sites are all good and worthy of consideration for anyone contemplating a marathon.

There are a variety of programs available here: Runners World, Running Times, marathontraining.com, etc. And to Mahalo’s credit, there’s no listing for Galloway’s training program! Editor bias there, I’ll admit.

I was disappointed that Pete Pfitzinger’s program isn’t shown. It’s my own favorite. But I liked the CrunchGear site, listing stuff marathoners would want.

del.icio.us: GOLD

One thing that immediately struck me this time is that all 50 of the del.icio.us results were related to marathon training. The greater specificity helped del.icio.us here. Also, “running” has several meanings, but “marathon” has few.

Several of the Mahalo Top 7 are in the first 50 results. Missing are the Running Times program, the AIDS national training program and the Boston Athletic Association program. But Team in Training is included (if you’re offsetting charity-related programs).

Several other valuable sites are here. For example, there’s McMillan Running, which includes running pace calculators and marathon time prediction workouts.

Unfortunately, Jeff Galloway’s site is bookmarked here. But…Pete Pfitzinger is included as well. Bonus points for that.

Google: SILVER

Google does its usual excellent job in its results. 6 of the Mahalo Top 7 are here; Running Times is missing from the first 50 results. Surprisingly, Team in Training is not in the top 50 results.

Google gets dinged for no race calculator in the first 50 results. No Pete Pfitzinger. But Jeff Galloway is there! Noooo…

SEARCH TERM #3: TEMPO RUN

A tempo run is a specific training technique in which you hold a fast pace over several miles. It’s a tough workout, but it can advance your performance dramatically. Obviously, we’re now in the technical weeds of running.

Mahalo: DISQUALIFIED

Mahalo has no entry for tempo running. We’ve gone too detailed for Mahalo here. DQ’d.

del.icio.us: SILVER

Use of the term “run” again confuses poor del.icio.us here. 34 of the first 50 results are not related to exercise running. But there are several good sites related to the tempo run. Runner’s World has Learn How To Do A Perfect Tempo Run. Running Times has A Tempo Run by Many Other Names.

And this is one of my favorites…a LetsRun.com post/discussion about Tempo run length vs. speed from 2003. One would have to go pretty deep into the LetsRun site to unearth that one. A true credit to the power of social bookmarks & tagging.

Google: GOLD

Incredibly, all of the first 50 results were related to exercise tempo runs. Very impressive. Lots of good info about the tempo run. A LetsRun post/discussion, but different than the one on del.icio.us. Bloggers describing their tempo runs. Formal programs that advise on the pace of the tempo run. Just really good stuff.

Recap: Broad, Narrow, Technical

Broad search: Google, Mahalo, del.icio.us
Narrow search: del.icio.us, Google, Mahalo
Technical search: Google, del.icio.us, (Mahalo DQ’d)
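For anyone who wants numbers, here is a quick tally of the test. The off-topic counts and medals are the ones I reported above. The point values (gold 3, silver 2, bronze 1, DQ 0) are an assumption added just for this sketch, since my scoring was only ever about the ranking, and the names `precision_at_50` and `MEDAL_POINTS` are made up for the example.

```python
# Off-topic counts among the first 50 results, as reported above.
# Combinations with no reported count are simply left out.
off_topic = {
    ("running", "del.icio.us"): 26,
    ("running", "Google"): 6,
    ("marathon training", "del.icio.us"): 0,
    ("tempo run", "del.icio.us"): 34,
    ("tempo run", "Google"): 0,
}


def precision_at_50(off: int, n: int = 50) -> float:
    """Share of the first n results that were on topic."""
    return (n - off) / n


precision = {key: precision_at_50(off) for key, off in off_topic.items()}

# Assumed point values; the test itself only assigned medals.
MEDAL_POINTS = {"gold": 3, "silver": 2, "bronze": 1, "dq": 0}

medals = {
    "running": {"Google": "gold", "Mahalo": "silver", "del.icio.us": "bronze"},
    "marathon training": {"del.icio.us": "gold", "Google": "silver", "Mahalo": "bronze"},
    "tempo run": {"Google": "gold", "del.icio.us": "silver", "Mahalo": "dq"},
}

totals = {}
for placings in medals.values():
    for engine, medal in placings.items():
        totals[engine] = totals.get(engine, 0) + MEDAL_POINTS[medal]

for (term, engine), p in precision.items():
    print(f"{term:18} {engine:12} {p:.0%} on topic")
print(totals)
```

Under those assumed points, Google totals 8, del.icio.us 6 and Mahalo 3. On precision, del.icio.us swings from 48% on ‘running’ to 100% on ‘marathon training’ and down to 32% on ‘tempo run’, while Google sits at 88% and 100% for the two terms where I counted.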

Conclusions that I draw from this admittedly small, subjective test:

  • Mahalo is a good starting point for finding information on something that’s not familiar to you. It only covers broader, more popular categories. It does appear that the Mahalo expert just skims the top results from Google. But the clean interface and human filtering make it a decent place to start your search.
  • del.icio.us is challenged by results that are not related to the search topic, which is consistent with its user-generated chaotic nature. It’s also a really good place to find hidden nuggets of valuable information not easily found elsewhere. And for a narrow topic with words that do not have multiple meanings, del.icio.us really shines.
  • Google still makes sense as the first place to look. Breadth and depth of results, and it takes on all comers. It also does an exceedingly good job of figuring out what sites relate to a search topic.

One final note in favor of Mahalo. There is research that shows consumers are actually better off with fewer choices than more. Give me 7 good choices, and I’ll be able to begin my journey to learn more about a topic. Give me 50 choices, some great, some terrible, and I’ll be flummoxed as I try to read them all.

Mahalo does have the advantage of providing a simple, limited set of good results to get beginners going. There is value to that.
