On this site I write about my various projects and share my thoughts on areas that interest me including, but not limited to, vertical search engines, cloud computing, social networking, Twitter, space exploration and the semantic web.
“Because the platform is open it gives all Web site owners — big or small — an opportunity to present more useful information on the Yahoo! Search page as compared to what is presented on other search engines. Site owners will be able to provide all types of additional information about their site directly to Yahoo! Search. So instead of a simple title, abstract and URL, for the first time users will see rich results that incorporate the massive amount of data buried in websites — ratings and reviews, images, deep links, and all kinds of other useful data — directly on the Yahoo! Search results page.”
Some exciting news today from Eric Baldeschwieler, Senior Director, Grid Computing, on the Yahoo Developer Network: “Yahoo! Launches World’s Largest Hadoop Production Application.” I’ll note that my company, Hyperix, is using Hadoop for our vertical search platform.
Here are some of the stats:
Some Webmap size data:
* Number of links between pages in the index: roughly 1 trillion links
* Size of output: over 300 TB, compressed!
* Number of cores used to run a single Map-Reduce job: over 10,000
* Raw disk used in the production cluster: over 5 Petabytes
Over at the Yahoo! Search blog, Sharad Verma recaps WebmasterWorld’s Pubcon. I could not attend, but it sounds like I missed a good conference and a keynote relevant to Hyperix. The keynote’s message is nothing new to me, but it’s nice to see other people talking about it.
Noteworthy Keynote
I thought that Richard Rosenblatt’s keynote on Wednesday delivered sound insight. According to Richard, most content online today is about sports, politics, news, and other common topics, leaving long tail topics underserved. He emphasized that there is significant demand for quality content in the long tail and therefore an unaddressed opportunity to create content and capitalize on the monetization opportunities. Susan Esparza at Bruce Clay, Inc. live blogged from the address, where Richard revealed, “The old model was about owning a generic domain name (pets.com). The new is that the search engines don’t care where you are. Get a one or two word domain on a nontraditional domain. Target the wide body and the long tail.”