Friday 13 July 2012

The Search API - Part 1

The search API 

The Search API gives your application the capability to perform Google-like searches over structured data within your application. You can perform full-text searches across several different types of text (plain text, HTML, Atom, and others). Searches return a ranked list of matching text, and you can customize the ranking of results.
https://developers.google.com/appengine/docs/python/search/

Sound familiar? That "Google-like" search is really the main Google experience - ranked search results from a set corpus of data.

As usual, Google provide excellent documentation, which I won't attempt to replicate here. Rather, I'll talk about some of the problems I encountered when attempting an implementation.

The development environment

Running the local instance of the development environment is very similar to running in the cloud. Similar but not identical. 

There are no resource limits locally, nor limits on the number of operations per second. This can be misleading, because the cloud enforces very strict limits on some operations that simply do not apply when running locally.

My learning project was to take this book:


And create a front end for it to allow users to search the content and return relevant results.

You can see the front end here:


Yes, it's a "Dream interpreter". 

Luckily for me, the layout of the book in its electronic form was very amenable to processing into a set of separate documents. Each "dream" was formatted almost with this project in mind.

_Thorns_.
To dream of thorns, is an omen of dissatisfaction, and evil will surround
every effort to advancement.
If the thorns are hidden beneath green foliage, you prosperity
will be interfered with by secret enemies.

Splitting the dreams up into separate list items was achieved with this regex:

 list_of_dreams = re.split(r'(_\w+\s*?.*?\w+_([\[][\d][\]])?)\.', data)

Each dream interpretation was then further split into paragraphs with this regex:  '.\r\n'.

So for each heading (Thorns) we end up with one or more interpretations. Some dreams have dozens of interpretations, so splitting them up allows a better search result to be returned.
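
To make the two splits above concrete, here is a minimal sketch of how they might fit together. It is not the code I actually ran. The heading regex is the one shown above; for the paragraph split the sketch assumes the dot is meant as a literal full stop and escapes it. The names split_dreams, HEADING_RE and PARAGRAPH_RE are made up for illustration, and the second sample entry is a placeholder rather than text from the book.

```python
import re

# Heading pattern from the post, written as a raw string.
HEADING_RE = re.compile(r'(_\w+\s*?.*?\w+_([\[][\d][\]])?)\.')

# Assumption: the dot in the paragraph pattern is a literal full stop,
# so it is escaped here.
PARAGRAPH_RE = re.compile(r'\.\r\n')


def split_dreams(data):
    """Return a list of (heading, [interpretations]) tuples."""
    parts = HEADING_RE.split(data)
    # The pattern has two capture groups, so re.split returns
    # [preamble, heading, footnote, body, heading, footnote, body, ...].
    headings = parts[1::3]
    bodies = parts[3::3]
    dreams = []
    for heading, body in zip(headings, bodies):
        paragraphs = []
        for chunk in PARAGRAPH_RE.split(body):
            # Rejoin hard-wrapped lines and drop empty chunks.
            text = ' '.join(chunk.split())
            if text:
                paragraphs.append(text)
        dreams.append((heading.strip('_'), paragraphs))
    return dreams


sample = (
    "_Thorns_.\r\n"
    "To dream of thorns, is an omen of dissatisfaction, and evil will surround\r\n"
    "every effort to advancement.\r\n"
    "If the thorns are hidden beneath green foliage, you prosperity\r\n"
    "will be interfered with by secret enemies.\r\n"
    # Made-up second entry so the split has something to separate.
    "_Example_.\r\n"
    "Placeholder interpretation text.\r\n"
)

dreams = split_dreams(sample)
for heading, paragraphs in dreams:
    print(heading, len(paragraphs))
```

Because the heading regex has two capture groups, re.split hands back the heading and the optional footnote marker along with each body, which is why the sketch steps through the result in threes. The paragraph split consumes the closing full stop of each interpretation.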

So the ground is prepared - we now have a Python list (in fact it's a little more complex than that) containing the target strings we want to search. Now what?

That's part two!




