Sunday, July 06, 2008

GSoC Hoogle: Week 6

This week I've been tackling type searching. I have just (in the last few minutes) got my first type search to work. At the moment type search is very limited, but all the ideas and scafolding are in place, so should now proceed relatively quickly.

In all previous versions on Hoogle, type searching was O(n), where n is the number of functions in the database. Hoogle compared the type search to each possible answer, computed a closeness score, then at the end wrote out the closest matches. This meant that before the first answer could be given, all functions had to be checked, i.e. the time for the first answer was O(n). As the Hoogle database is about to get massively bigger, this approach is insufficient.

The new version of Hoogle is much cleverer. It works by exploring a graph, following similar ideas to Dijkstra's algorithm, to reach more suitable results first. Typically, the best answers will be given without any search of the graph, and then as the graph is searched more results will appear with lower closeness. With the new scheme the complexity is O(m), where m is the number of results you want. I hope at some point after the SoC is finished to describe the algorithm properly, so others can understand it, and hopefully improve upon it.

Next week: Finishing off type searching, so it supports all the features planned. Build system work, and potentially a cabal pre-release.

User visible changes: Type search works to some degree, but not perfectly. Database debugging options (conversion and dumping to a text file) have been added.

No comments: