
Hype, the Python Indexer

Yesterday I tried hype, a Python indexer wrapped around hyperestraier. The API is foolproof, you can’t do anything wrong :-P The code I used to test it is essentially:

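[code lang="python"]
import hype

# DB_PATH and all_files() are defined elsewhere in the test script.

def index(root="."):
    # Open the index database at DB_PATH.
    db = hype.Database(DB_PATH)

    for name in all_files(root):
        # One document per file: the path is both the URI and the indexed text.
        doc = hype.Document()
        doc["@uri"] = name
        doc.add_text(name)
        db.put_doc(doc)

    db.close()


def locate(search_string):
    # Open the database in reader mode.
    db = hype.Database(DB_PATH, hype.ESTDBREADER)

    searcher = db.search(search_string).order("@uri STRD")
    # Walk the lazy searcher once; this is the step timed further down.
    list(searcher)

    items = [item["@uri"] for item in searcher]

    db.close()

    return items
[/code]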

index() iterates through the list of files, creates a Document object for each one, and adds it to the database. locate() uses a lazy searcher to query the database and returns the URIs of the matching documents.
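The all_files() helper is not shown above. Here is a minimal sketch of what such a helper could look like, assuming it only has to yield the path of every file below a root directory. The body is my guess, not the original test code, and it uses os.walk() from the standard library.

```python
import os


def all_files(root="."):
    """Yield the path of every file below root (hypothetical helper)."""
    # os.walk() visits each directory once and lists the files it contains.
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            yield os.path.join(dirpath, filename)


if __name__ == "__main__":
    # Print the first few paths found under the current directory.
    for count, path in enumerate(all_files(".")):
        if count >= 5:
            break
        print(path)
```

Because it is a generator, index() can start adding documents before the whole tree has been walked.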

Let me give you some raw numbers:

459917 documents (my whole file system, which holds two operating systems and ~130 GB of files)

~195 documents per second

39m16.949s to index the file system; keep in mind that most of that time is spent in os.path.walk() walking the tree

288M of database. Maybe I didn’t need to re-add the name as content with add_text(); storing only @uri would probably make the database a lot smaller

0.000755071640015 seconds to search, measured with time.time() around:

[code lang="python"]
searcher = db.search(search_string).order("@uri STRD")
list(searcher)
[/code]
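For anyone who wants to reproduce this kind of measurement, here is a small sketch of the two calculations behind the numbers above: timing a single call with time.time(), and working out documents per second from a document count and a wall-clock time. The names timed(), docs_per_second(), and fake_search() are made up for illustration, and fake_search() is only a stand-in so the snippet runs without hype installed.

```python
import time


def timed(func, *args):
    """Call func(*args) once and return (result, elapsed_seconds)."""
    start = time.time()
    result = func(*args)
    return result, time.time() - start


def docs_per_second(documents, minutes, seconds):
    """Average throughput for `documents` items indexed in minutes + seconds."""
    return documents / (minutes * 60 + seconds)


def fake_search(query):
    """Stand-in for the real search call (hypothetical, not the hype API)."""
    return [name for name in ("/etc/passwd", "/etc/hosts") if query in name]


if __name__ == "__main__":
    results, elapsed = timed(fake_search, "hosts")
    print(results, elapsed)
    # 459917 documents in 39m16.949s, the figures quoted above.
    print(round(docs_per_second(459917, 39, 16.949)))  # 195
```

A single time.time() sample of a sub-millisecond call is noisy, so repeating the call and averaging would give a steadier figure.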

It’s fast, really fast. And the test machine was far from ideal: I ran the test while using the computer for my usual tasks, and my two IDE disks are slow.

What can I say? Hype is the tool you want if you need indexing in Python.

The hype source tree contains some benchmarks against Divmod’s Xapwrap, and hype seems faster. So download it, try it, and use it :)
