Riya - Phase 2

About 8 weeks ago Riya launched its open beta. By any usage metric this has been a success. We enjoyed the TechCrunch Effect with 1M image uploads in the first 24 hours. Thanks, Mike. Once past the pent-up demand, we have continued to grow nicely, passing 7M images last week. Yesterday we had a board meeting to officially chronicle the progress and approve Riya Phase 2.

When we first conceived of Riya (by the way, we officially changed the company name from Ojos to Riya today) we were torn between the twin objectives of sharing photos and finding photos. Since then, it has become clear that there are literally hundreds of ways to share photos, but precious few methods of searching for them. Text search engines like G***** and Y****! (I don’t want to give them more brand awareness, lest you have not heard of them ;-) really aren’t very effective for finding photos.

As a board member, I get an automated report every morning from Riya that shows the number of daily users, images, faces recognized, etc. Each morning has been a reaffirmation that we are on to something very big. We now have a very powerful team. Over 30% of the Company is in Research, as distinct from Engineering. (I have never had a software company with this level of investment in basic research from the beginning.) We may already have the largest Machine Vision team in the world. We have over a dozen PhDs in Computer Science. As a consequence, the Company is now entering into Phase 2 – Rapid Innovation. Phase 1 was about building the platform and the user base. With 7M+ images and 5M+ faces detected, and a full infrastructure for rapid development, deployment, and iteration, Riya is now about doing what consumer Internet companies have to do to succeed – experiment – testing new innovations with real user behavioral feedback. One consequence is that our IP attorneys are getting a big payday as we patent the innovations in image search that this team is inventing.

Phase 1 exceeded my initial expectations, and it has raised my expectations for Phase 2 and beyond. Munjal and the team are clearly veering into Photo Search to complement and partner with the hundreds of photo sharing sites out there. The PhotoWeb is very real and there is no effective way to navigate through it. If we are successful, all the sites with photos (stock, personal, amateur photography, etc.) should benefit by being easier to search. If navigation is easier, monetization will naturally follow for them and for us. A photo is worth 1,000 words. So a photo site is worth …

A Vast Improvement for Classifieds, Mashups, and the Web

A few months ago I noted the release of Google Base as a pivotal moment in the maturation of the Web.  I said the move from unstructured to structured data was an important improvement in making the Web usable and re-usable. 

I have also discussed mashups as an important, lightweight development paradigm for making this reusability possible. Thus far the mashups I have seen have been features, mostly visualizations of someone else’s data on Google or Yahoo maps. The owners of the primary data have Web 1.0 business models, including Google Base. The Web 1.0 model is “all your data are belong to us. So come to [oursite].com if you want to see it.” Consequently, mashup innovation stops at the edge of what’s really interesting, because mashup developers can’t rely on re-mixing those data and building really great destination sites. Doing so would conflict with the walled garden business model of the data aggregator. Hence, restrictive licensing models.

Well, now there is about to be a Vast improvement. 

Vast has been in development for about a year, creating what I think of as the first inverted portal.  Now the preview release is up, showing the first three "applications" of the technology. Vast crawls the web for data and extracts that data in vertical categories.  The URL Vast.com is a demonstration site, not really a destination site.  The real destination site is everyplace else on the web – blogs, discussion boards, microsites, social networking sites, etc.  Anyplace where content creates context.

Vast.com looks like a classic classifieds site, but it is not.  It is not a data aggregator.  It is a data disseminator.  It is a hub targeted to the developer community to enable the mashup of structured data in a reusable form.   And it’s not just about classifieds.  It’s about adding structure to content. But the first application of structuring the Web is to take free-form descriptions on the Web and make them structured, searchable listings, i.e. classifieds.

What’s different from the aggregators besides the business model?  It is the data itself.  The data do not come from feeds.  They come from crawls of primary data sources – cars listed on dealer sites, jobs posted on companies’ web sites,  personal profiles listed on blogs, personal web sites, and dating sites.  As a result, the data are Vast – millions of cars, millions of jobs, and millions of profiles, with more categories of objects to come.
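To make the idea of turning crawled, free-form descriptions into structured listings a little more concrete, here is a minimal Python sketch. It is not Vast's technology, which is not described here; the sample text, the field names, and the regular expression are all assumptions made up for illustration, and real extraction at web scale would need far more robust parsing.

```python
import re

# Hypothetical free-form text, as a crawler might find it on a dealer page.
RAW = "2003 Mercedes-Benz C230, 41,000 miles, silver, $18,500 - Miami, FL"

# Illustrative pattern only: it handles this one phrasing of a car listing.
PATTERN = re.compile(
    r"(?P<year>\d{4})\s+(?P<make>[A-Za-z-]+)\s+(?P<model>\w+),\s+"
    r"(?P<miles>[\d,]+)\s+miles,.*?\$(?P<price>[\d,]+)\s+-\s+"
    r"(?P<city>[^,]+),\s+(?P<state>[A-Z]{2})"
)


def extract_listing(text):
    """Turn one free-form description into a structured record, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    record = match.groupdict()
    # Convert numeric fields so they can be sorted and filtered later.
    record["year"] = int(record["year"])
    record["miles"] = int(record["miles"].replace(",", ""))
    record["price"] = int(record["price"].replace(",", ""))
    return record


listing = extract_listing(RAW)
print(listing)
```

Once the text is a record with typed fields, it can be searched by price, mileage, or city rather than by keyword alone, which is the point of adding structure.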

This is the true long tail of listings. The user benefit is obvious -- find the exceptional value.   Like the old joke -- why is it always that you find something in the last place you look? -- the exceptional value is always in the long tail. 

You don’t see a lot of end user features on Vast.com – no AJAX widgets, no mapping, no integration with reviews and ratings. What you will see is a Vast amount of data. If you have a web site about Miami, create a Miami-only view of the data. If you have a Mercedes-Benz discussion site, create an M-B specialty classifieds component. If you think you can do the next HotorNot.com, build it. You can even build the next Vast.com on the API. All the features you see are available in the API for free.
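As a rough illustration of what a "Miami-only view" or a Mercedes-Benz specialty component amounts to once listings are structured, here is a small Python sketch. It does not use the Vast API; the records, field names, and the `view` helper are hypothetical stand-ins.

```python
# Hypothetical structured listings; a real site would fetch these from a service.
LISTINGS = [
    {"category": "cars", "make": "Mercedes-Benz", "city": "Miami", "price": 18500},
    {"category": "cars", "make": "Honda", "city": "Austin", "price": 9200},
    {"category": "jobs", "title": "Web developer", "city": "Miami"},
    {"category": "cars", "make": "Mercedes-Benz", "city": "Seattle", "price": 22000},
]


def view(listings, **criteria):
    """Return only the listings whose fields match every given criterion."""
    return [
        record
        for record in listings
        if all(record.get(field) == value for field, value in criteria.items())
    ]


# A city site keeps everything in Miami, whatever the category.
miami_view = view(LISTINGS, city="Miami")

# A marque discussion site keeps only cars of one make.
mercedes_view = view(LISTINGS, category="cars", make="Mercedes-Benz")

print(len(miami_view), len(mercedes_view))
```

The same data serves both sites; only the filter differs, which is why context-specific destinations become cheap to build.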

I don’t think of Vast (or Riya) as just Web 2.0. Web 2.0 is largely about tagging, social annotation, and sharing. I think of Vast and Riya as bricks in the road toward a Structured Web, beyond Web 2.0. Vast does for free-form text what Riya does for images – extracts structure for reusability. There is some additional discussion of Vast here, here, and here.

Vast is not a walled garden.  It is  “All your data are belong to you.”  Have at it. 

Portals are the New Newspapers

My last post was about newspapers and how they have been deconstructed.   I made the analogy of newspapers as information mainframes – legacy feature bundles based on pre-aggregation of information services.  To paraphrase the point, the Internet crushes attempts to build proprietary aggregations.

Now examine the current feature-war strategies of GYM.  They are building the same generic stack. But the web doesn’t stack. It topples.  Or should I say it tuples?

The core premise of a value-added stack is the friction associated with disaggregation. I could buy my french fries at McDonald’s and go to Wendy’s for the cheeseburgers, but the incremental value of McDonald’s fries over Wendy’s fries isn’t great enough for the effort. The cheeseburger is the bigger part of the value chain. So it drags the inferior good along with it by aggregation, unless and until Wendy’s fries really suck.

The classic strategy for a David fighting Goliath is to go deep. Be best of breed on one thing.  It doesn’t always work, but it works best when the friction against switching is lowest.  Where else is it lower than on the Internet?

If GYM are building Web mainframes, how do they get deconstructed?  Web services deconstruct them.  The first salvo was fired today by Alexa.  As I said a few months ago, business models are going to be the bottleneck to web services adoption, not technology.   The one advantage that GYM have is they can make the end user experience free with advertising.  Of course what they are really doing is allocating the value received (the cost per click) across an internal stack of services.   The David fighting a Goliath must live or die on his service.  His value can’t be buried in the stack. 

Alexa has created a pricing algorithm for their API.  This may be the first explicit price-per-use model for a public internet web service API.  I cannot think of any others, though there may well be some.   Alexa now has put anyone and everyone in the portal business. I could host Alexa and use Google’s Adsense and, Bingo! I have my own monetizing portal on a $10/month web site.   Add APIs for calendars, instant messaging, and other web services, and the GYM stack is entirely blown apart.   Revenge of the long tail.  Let a million portlets flourish... Vertical value stacks are replaced by horizontal services.

The only thing standing in the way is the evolution of market-based prices for other APIs. What’s a calendar worth? $1.00/1000 uses? What’s web email worth? $1.00/1000 messages? What’s Yahoo’s VOIP worth – oh, we know, it’s $1.00/minute. And so forth….
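The arithmetic behind a price-per-use portal can be sketched in a few lines of Python. The $10/month hosting figure and the $1.00 per 1,000 uses price are the numbers floated above; the click-through rate and revenue per click are pure assumptions chosen for illustration, not real Alexa or AdSense figures.

```python
def monthly_margin(queries, api_price_per_1000, hosting, ctr, revenue_per_click):
    """Ad revenue minus API fees and hosting for one month of traffic."""
    api_cost = queries / 1000 * api_price_per_1000
    ad_revenue = queries * ctr * revenue_per_click
    return ad_revenue - api_cost - hosting


def break_even_queries(api_price_per_1000, hosting, ctr, revenue_per_click):
    """Queries per month needed to cover hosting, or None if each query loses money."""
    margin_per_query = ctr * revenue_per_click - api_price_per_1000 / 1000
    if margin_per_query <= 0:
        return None
    return hosting / margin_per_query


# Assumed inputs: 2% of queries produce an ad click worth $0.10.
needed = break_even_queries(
    api_price_per_1000=1.00, hosting=10.00, ctr=0.02, revenue_per_click=0.10
)
print(round(needed))
```

Under these assumed inputs each query nets about a tenth of a cent, so roughly 10,000 queries a month cover the hosting bill. If the API price per query exceeds the ad revenue per query, no volume makes the portlet profitable, which is why market-based prices for APIs matter so much.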

This will be the new service economy – the Web Services Economy.

I may be the only one, but I see an elegant irony. Newspapers are being decimated by unbundlers. These are the first-generation Internet services (Yahoo, eBay, Google) who unbundled content from advertising, and unbundled them both from physical delivery. Search has been the killer application – the cheeseburger.

However, the Web’s pressure for unbundling is relentless. Witness skyrocketing tiny companies like Goowy that do nothing more than allow you to use best-of-breed web portal services without being tied to any one portal. If I can now re-package search by using an API from the open market, portal-based search is no longer the cheeseburger. It is just dead meat.

Portal strategies are antithetical to a Web Services Economy. The pressure to deconstruct is relentless. The center cannot hold. Just as newspapers are the new mainframes, Web portals are the new newspapers.

Organizing Chaos

Today Dan Farber paraphrased some comments by Munjal Shah about the Riya.com business model. The comments refer to extracting value from user-generated content and rewarding the 'long tail.' Those of you following Riya know they are in the process of launching a site to automatically tag photos with faces and text. These metadata make it easier to organize and share photos. They also make it easier to search for photos that have value to you.

The principle behind this is as old as the commercial Web -- all of nine years. Brooks Fisher (now a VP at Intuit) was the first person to invent and implement the concept of keyword advertising back at Infoseek in 1996. (We were doing all of 2M page views/day!) That was when it became clear that as the audience grows, the value of the ad inventory expands exponentially. Targeting equates to value. Targeting specificity increases as volume increases, lifting the value of the entire inventory. It is a virtuous cycle. (This is also part of why search has a winner-take-all dynamic, but that's for another time.) Adding the structure unlocks the value.

This is the power behind the Riya business (and another soon-to-launch business in which we have invested). Adding structure unlocks the value. It attracts users by improving the experience. More users generate and attract more content. Content expansion increases the value of targeting. Value is extracted by making the content more searchable, and ultimately, reusable. Scalability and consistency are key to unlocking the value. Scale and consistency limitations are also why I think manual tagging regimes have a limited role in the Web platform.

If this sounds like a machine-learning bias, guilty. If it sounds like the core of how search engines monetize, that's no accident. Search engines are the most mature. I prefer to think of them as targeting engines.

The generic rule - add structure and mine for money by organizing the chaos. We'll see it played out many, many times over the next several years. Because it works.

Search Inside(tm?)

The past twelve months have been search-intensive. I mean I have seen a lot of "next generation search" deals. And I know I am far from alone. Socially-enhanced search, tag-based search, machine intelligence search, multimedia search, etc. And I have done a couple of search deals, too.

Nearly every search startup I see is attacking the Googleplex head on with a "better experience" behind the search box. This is flying directly into the buzzsaw, at least for now. Google had a market opening because the first generation guys (including us at Infoseek) took their eye off the ball and became "portals". Maybe this will happen at Google, but I'm not betting on it.

The attraction to search is magnetic. Search monetizes because of the intention behind the act of searching. Search is a great business because Intention is Attention.

So how do I think of search as an investment opportunity in light of the dominance of G, Y, and even M? More and more I am convinced that search is going to become a service, not an experience. For the moment I am heavily influenced by Jeff Hawkins' book On Intelligence. A Great Read. (Thank you, Naval.)

Hawkins' main thesis is that thinking is pattern matching. Intelligence is better, faster, more fine-grained pattern matching. I am not going to try to restate the whole point. Read the book.

But search is pattern matching, too. Search == pattern matching == thinking. Thinking is the process embedded between behaviors. Search can and probably will be, too. Search is not the end user experience any more than typing ">c: dir" or ">ls" was. Search is a service, not an application. By analogy, consider the SQL query in a software application. It is an embedded search process in service of a larger experience. It is an intermediate step to match a well-formed need (request) to a well-formed fulfillment (response). The application logic and user interface are major value-added steps on top of the basic request/response mechanism. Of course request/response infrastructure is a big business. Oracle and Google can both attest, being the current big dogs in structured and unstructured search respectively with radically different business models and business practices.
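The SQL analogy is easy to show in miniature. The Python sketch below uses the standard library's sqlite3 module with an in-memory database; the `flights` table, its rows, and the `find_flights` helper are invented for illustration. The user of such an application never sees the query; they only see the suggestions built on top of it.

```python
import sqlite3


def find_flights(conn, origin, max_price):
    """Embedded search: a well-formed request in, well-formed rows out."""
    rows = conn.execute(
        "SELECT destination, price FROM flights "
        "WHERE origin = ? AND price <= ? ORDER BY price",
        (origin, max_price),
    ).fetchall()
    # The value-added step: turn raw rows into something a person can act on.
    return [f"{destination} for ${price}" for destination, price in rows]


# Hypothetical data set up in memory so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (origin TEXT, destination TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [("SFO", "HNL", 420), ("SFO", "JFK", 310), ("LAX", "HNL", 380), ("SFO", "CDG", 890)],
)

suggestions = find_flights(conn, "SFO", 500)
print(suggestions)
```

The query is the intermediate request/response step; the formatting and any further application logic sit on top of it.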

I think "Search Inside" is one direction where search is going. Web applications that embed search to produce a richer experience are going to be really interesting. A better retrieval or ranking algorithm feels like a marginal change not justifying a change in consumer behavior. Machine intelligence with embedded search is going to feel like real intelligence. Are they going to be built on a G, Y, or M search platform or have their own? Don't know.

This is pretty abstract stuff. If you don't really know what I mean, you're not alone. Neither do I, yet. But I'll know it when I see it. And this post is an unabashed troll to find entrepreneurs/technologists who have interesting ideas that relate to this thought.

My only example to motivate the point is this, and the example is pretty trite. There are lots of travel search sites. But I have yet to see a real web travel agent, one that embeds search to fulfill my real goals when I travel. My real vacation goals are about relaxation, interesting things to do, and budget and time constraints. The end result is the constellation of transactions and preparatory content that promise to create that experience. That's what a real travel agent used to do before Travelocity and Expedia disintermediated their commissions. Travel agents embedded search to create a higher value result for the client.

This is the kind of "software agent" example that has been around since the invention of object-oriented programming. It won't happen any time soon, if ever, but interesting enablers will be built along the way.

As I said, we have made two investments so far in search: Riya and Vast [no public site, yet]. In retrospect, both are enablers of forms of embedded search. Search can't be a foundation service until it reliably returns results that are both precise and well-formed to enable some value-added transformations/processing. Riya is about automatically associating a data type -- images -- with metadata. The association of metadata (identities, time/place, objects, text, permissions, etc.) makes the image retrieval both precise and reusable in upstream applications. Vast is building a WWW crawling, parsing, and extraction technology for mining the deep web for unstructured content. Its technology is a 'content factory', transforming unstructured content into a structured reusable form -- crudely put, it turns the Google world into the Oracle world. Both companies are enablers. They enable different forms of content reuse for the Search Inside vision. Along the way, they will look like interesting end user applications, but they really are being built with the aim of being components of a Search Inside vision for web applications.

I know this begs a lot of questions. And the holy grails (some might say delusions) of AI and the semantic web are lurking behind the curtain. I don't pretend to have the answers. I can only say I believe this notion of search-qua-thinking is an important fragment of an idea. I don't want to invest in better search services. I want to find companies that are going to enable or use Search Inside(TM) (With apologies to Intel Marketing).

PS.
Soon after my initial post, I received several thoughtful replies about the failures of AI, mostly from AI experts. I guess this reads like a call for AI -- not meant to. That's why I consider this a fragment. It is admittedly some fuzzy thinking about fuzzy thinking. But it is also a way to reach out to people with innovative ideas at the edge, because the center is so dense with replicas of what's successful today.

Omni-Explorer->?

I had lunch with another newly Leapfrog-funded CEO today -- Naval Ravikant. Naval is CEO of a company we are (for now) calling Omni-Explorer. Matt Marshall semi-outed them a few weeks ago. All he'll let me say is that Omni is a Web 2.0 search deal. Yeah, search. Am I a lemming? I don’t think so. When I saw the tech, I got it immediately, because I have a fair amount of the search business in my background, too. But, without Naval, I wouldn’t have done this deal. Naval is brilliant – he is the synthesis of strategy and tech. Omni isn’t a bet on an algorithm. That would be VC lemming behavior. It is a bet on Naval’s ability to build a great business leveraging great (and special) Web 2.0 search technologies. Naval, too, is a young man with a lot to prove (his fight with a couple of venture firms about Shopping.com was big headlines in the venture community when we did this deal), and it's good he has something to prove. Because when he wins, I’ll win.