Archive for the “Search Engines” Category


A lot has been written about cloud computing in the last year, and each day seems to bring news of a new player in the cloud arena. So what does the cloud have to offer search engine companies like Hyperix? Well, that depends on how deep our pockets are. After all, we need a lot of bandwidth, processing power and data storage to run any real search engine. And as we don’t have deep pockets, nor an angel or venture firm backing us, we’ve had to find creative solutions and innovate where possible.

Up to this point we’ve been focusing solely on the technology that will differentiate us from other vertical search platforms out there. We have our own small web crawling cluster set up, which we’ve used for some time to test different web crawlers, collect and parse data, and measure a variety of crawler metrics that determine how many CPU cycles and how much RAM, bandwidth, and storage are necessary to create the vertical search indexes we want. We’ve also been focusing on the quality of the data we’re crawling, the algorithm that ranks the crawled pages, the parsing engines, and the results pages.
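
To give a sense of how crawl measurements like these turn into hardware requirements, here is a back-of-the-envelope sketch in Python. The function name, the fields, and every default number are illustrative assumptions, not figures from our cluster; real values would come from the crawler measurements themselves.

```python
from dataclasses import dataclass


@dataclass
class CrawlEstimate:
    bandwidth_gb: float  # data downloaded during the crawl
    storage_gb: float    # space needed to keep the compressed pages
    crawl_days: float    # wall-clock time at the given fetch rate


def estimate_crawl(pages: int,
                   avg_page_kb: float = 100.0,
                   compression_ratio: float = 0.25,
                   pages_per_second: float = 50.0) -> CrawlEstimate:
    """Rough resource estimate; all defaults are hypothetical placeholders."""
    kb_per_gb = 1024 * 1024
    bandwidth_gb = pages * avg_page_kb / kb_per_gb
    storage_gb = bandwidth_gb * compression_ratio
    crawl_days = pages / pages_per_second / 86_400  # seconds per day
    return CrawlEstimate(bandwidth_gb, storage_gb, crawl_days)


if __name__ == "__main__":
    est = estimate_crawl(pages=10_000_000)
    print(f"bandwidth: {est.bandwidth_gb:.1f} GB")
    print(f"storage:   {est.storage_gb:.1f} GB")
    print(f"duration:  {est.crawl_days:.1f} days")
```

The sketch leaves out CPU and RAM on purpose, since those depend heavily on the parser and the index format rather than on page counts alone.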


Cuil Home Page Screen Shot

It’s cool to be Cuil today. Cuil Inc. launched its new search alternative to Google today. Cuil (pronounced “cool”) has received lots of press, and it helps when it’s in the right places. And if it were not for the fact that the principals have a history of adding value to existing search products like Google search, this rollout would hardly be noticed.

But the fact that they have a track record, worked at Google, and are boasting an index bigger than Google’s is newsworthy. Cuil is led by Anna Patterson, a former engineer at Google. Together with her husband Tom Costello, a search expert in his own right, she aims to take on Google. No small feat.

But having a bigger index doesn’t mean you’re better. And only time will tell if they have what it takes to carve out a piece of the big search pie. They claim to be able to search across 120 billion web pages, compared to an estimated 40 billion for Google. Google does not officially reveal how many pages it indexes, but other sources suggest that it keeps an index of around 60 billion pages. Google also says that not all of the pages it crawls are indexed because many are duplicates. Working in this industry, I can confirm that there is a lot of duplicate content out there.
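
To illustrate why crawled page counts and indexed page counts differ, here is a minimal Python sketch of exact-duplicate filtering. It is only an assumption-laden illustration, not a description of how Google or Cuil handle duplicates: it hashes each page’s text after normalizing case and whitespace and keeps the first URL seen for each hash. The function names are hypothetical, and near-duplicates (pages that differ by a few words) would need a different technique.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """SHA-256 of the text after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def drop_duplicates(pages: dict[str, str]) -> dict[str, str]:
    """Keep the first URL seen for each distinct fingerprint."""
    seen: set[str] = set()
    unique: dict[str, str] = {}
    for url, text in pages.items():
        digest = fingerprint(text)
        if digest not in seen:
            seen.add(digest)
            unique[url] = text
    return unique


if __name__ == "__main__":
    crawled = {
        "http://example.com/a": "Hello   World",
        "http://example.com/b": "hello world\n",  # same content, different formatting
        "http://example.com/c": "Something else",
    }
    print(list(drop_duplicates(crawled)))
```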

For Cuil to take some market share away from Google, it will take more than boasting of a bigger index. The reality is that with enough hardware and money, a startup can build a big index, even a huge one as Cuil has. The test of whether Cuil can succeed will be whether the public and business users find more relevant search results through Cuil. Being as big or as fast as Google is not enough. You have to be able to change people’s search preferences. And that’s not easy.

What is noteworthy is that Cuil says they’ve developed a faster, better way to index pages that, just as important, uses less hardware. Less hardware matters because the cost to index, store and serve up results can be prohibitive. The falling cost of hard drives, CPUs, etc. helps. However, even though RAM prices have come down, RAM is still one of the most expensive aspects of creating a searchable index.

In my initial tests of Cuil I was both pleased and disappointed with the results. Some common searches returned no results. I’ll attribute that to first day bugs. But I also found that sources like Wikipedia were heavily weighted, sometimes ranking ahead of the actual site that I was looking for.

It’s public day 1 for Cuil and they have people’s attention. Let’s see if they can keep it and build some momentum. In the meantime I’ll give them a try and report back with my thoughts in the near future.


People are lazy. They don’t bookmark sites they’re interested in, so they continually type the site’s name into a Google search field and search for the address. And voila, Google serves up the address. But what they’re really interested in is some information from that site. Google calls this phenomenon “teleporting”.

Based on this phenomenon, Google has introduced a “search within a site” feature to its search engine results. So for certain queries you’ll be presented with a second search box that searches just that site. This is pretty cool, but there’s more to it than that. Below is an example. Say for some reason you wanted to search the New York Times but didn’t have the address. Fire up Google and it’s the first result. But also notice the search box offered.

New York Times sample search

Now do your search within the “search nytimes.com” field and get results only from the New York Times site. Experienced users have known you could do this for some time. Two things are new: first, the search box now shows on the Google search results page, and second, targeted sponsored links show up on the right of this second search, creating what John Battelle calls the “second click”. The second click offers publishers highly targeted ad space and ultimately gives Google more revenue.
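
For readers wondering how experienced users restrict a search to one site by hand, the usual way is Google’s `site:` operator typed into the ordinary search box. The Python sketch below builds such a query URL. The helper name is hypothetical, and the URL layout (`https://www.google.com/search?q=...`) is an assumption about the public search page, not a documented API.

```python
from urllib.parse import urlencode


def site_search_url(site: str, terms: str) -> str:
    """Build a Google search URL limited to one site via the site: operator."""
    query = f"site:{site} {terms}"
    # urlencode percent-encodes the colon and turns spaces into '+'
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(site_search_url("nytimes.com", "search engines"))
```

Typing `site:nytimes.com search engines` into Google directly is the manual equivalent; the new second search box simply saves that step.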

New York Times search sample 2

What are your thoughts on this new feature?


Over at the Yahoo! Search blog, Sharad Verma recaps WebmasterWorld’s Pubcon. I could not attend, but it sounds like I missed a good conference and a keynote relevant to Hyperix. It’s nothing new to me, but it’s nice to see other people talking about it.

Noteworthy Keynote

I thought that Richard Rosenblatt’s keynote on Wednesday delivered sound insight. According to Richard, most content online today is about sports, politics, news, and other common topics, leaving long tail topics underserved. He emphasized that there is significant demand for quality content in the long tail and therefore an unaddressed opportunity to create content and capitalize on the monetization opportunities. Susan Esparza at Bruce Clay, Inc. live blogged from the address, where Richard revealed, “The old model was about owning a generic domain name (pets.com). The new is that the search engines don’t care where you are. Get a one or two word domain on a nontraditional domain. Target the wide body and the long tail.”


As I’ve said before, vertical search is where it’s at these days, and while I work on a couple of products of my own, I’ll take the time to recommend someone else’s product.

Recently I’ve had to do some research in the health area for a particular rare condition and I’ve been trying various search engines. So when I came across Kosmix’s Beta search for health I was pleasantly surprised at the results. So if you need health info give Kosmix a try.
