Thursday, August 28, 2008

Running your own Hoogle on a Web Server

As promised, here is a guide on deploying Hoogle on a web server. Before doing so, you need to generate the necessary Hoogle databases, as described yesterday, and place them in the datadir configured with Cabal. Then:


  • Move the hoogle binary to a location where the web server will run it as a CGI program, renaming it to index.cgi if necessary. Then configure the server to execute it, which may mean marking the binary as executable or adding a setting to the server configuration.

  • Copy the files from src/res in the darcs repo into a res directory located beside the binary.

  • Create a file log.txt and give it global write permissions.
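The three steps above can be sketched as a small Haskell script. This is only an illustration under assumptions: the function name deploy and all three paths are hypothetical, the res directory is assumed to contain only plain files, and setFileMode comes from the unix package, so the sketch will not work on Windows. Configuring the web server to run CGI programs is not covered.

```haskell
import System.Directory (copyFile, createDirectoryIfMissing, listDirectory)
import System.FilePath ((</>))
import System.Posix.Files (setFileMode)

-- | Hypothetical helper: install a built hoogle binary as a CGI program.
--   hoogleBin: path to the compiled hoogle binary (assumption)
--   resSrc:    path to src/res in a checkout of the darcs repo (assumption)
--   cgiDir:    directory the web server runs CGI programs from (assumption)
deploy :: FilePath -> FilePath -> FilePath -> IO ()
deploy hoogleBin resSrc cgiDir = do
  let resDst  = cgiDir </> "res"
      logFile = cgiDir </> "log.txt"
  createDirectoryIfMissing True resDst
  -- Step 1: copyFile also copies the source file's permissions where possible.
  copyFile hoogleBin (cgiDir </> "index.cgi")
  -- Step 2: copy every file from src/res into res, beside the binary.
  files <- listDirectory resSrc
  mapM_ (\f -> copyFile (resSrc </> f) (resDst </> f)) files
  -- Step 3: create log.txt if it is missing and make it writable by everyone.
  appendFile logFile ""
  setFileMode logFile 0o666
```

A call such as deploy "dist/build/hoogle/hoogle" "src/res" "/var/www/hoogle" would perform all three steps; those paths are examples only.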



Now you should have Hoogle running on a web server! Some features, such as OpenSearch integration, won't work, but Hoogle should be usable. If anyone does get Hoogle running on a web server, I'd love to hear about it; any feedback is appreciated. In particular, if any tweaks are required, please let me know.

Wednesday, August 27, 2008

Hoogle Database Generation

Brief Announcement: A new release of the Hoogle command line is out, including bug fixes and additional features. Upgrading is recommended.

Two interesting features of Hoogle 4 are working with multiple function databases (from multiple packages), and running your own web server. Neither feature is fully developed yet, and both may change in use, but they can be used with care. This post covers how to generate your own databases, and how the web version databases are generated. Tomorrow I'm going to post on how to run your own Hoogle web server, but you'll need to generate your databases first! I'm going to walk through all the steps to create a database from the filepath library, as an example.

Hoogle Databases

A Hoogle database is a set of searchable items, supporting both text and type searching, stored in a file with a ".hoo" extension. A database may include the definitions from one package, or from multiple packages. Typically the installed Hoogle databases include one database for each package (e.g. base.hoo, filepath.hoo), a default database (default.hoo) comprising all the standard search items, and any number of custom databases (e.g. all.hoo) comprising different combinations of the other databases.

When using Hoogle, adding +name will include the given database in the search list, and -name will exclude the given package from the search. By default, Hoogle will use default.hoo, but if any +name commands are given then those databases will be used instead.
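The selection rule described above can be written down as a small pure function. This is a sketch of the rule as stated in this post, not Hoogle's actual argument parser; the name databaseScope is hypothetical, and arguments starting with "--" are assumed to be ordinary flags rather than exclusions.

```haskell
-- | Hypothetical helper: split command line arguments into
--   (databases to search, packages to exclude).
databaseScope :: [String] -> ([String], [String])
databaseScope args = (searched, excluded)
  where
    -- Every +name argument names a database to search.
    included = [n | '+' : n <- args, not (null n)]
    -- Every -name argument names a package to exclude; --flag is skipped.
    excluded = [n | '-' : n@(c : _) <- args, c /= '-']
    -- default.hoo is used only when no +name argument was given.
    searched = if null included then ["default"] else included
```

For example, databaseScope ["map", "+filepath", "+base"] selects filepath and base instead of the default database, while databaseScope ["map", "-containers"] keeps the default database and excludes containers.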

Hoogle looks for databases in the current directory, in the data directory specified by Cabal, and in any --include directories passed at the command line.
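As a rough sketch of that lookup, the function below lists the places a database called name could live and returns the first one that exists. The names candidatePaths and findDatabase are hypothetical, the data directory is passed in explicitly rather than obtained from Cabal, and the order in which the three locations are tried is an assumption, since it is not specified here.

```haskell
import Control.Monad (filterM)
import Data.Maybe (listToMaybe)
import System.Directory (doesFileExist)
import System.FilePath ((<.>), (</>))

-- | Every path at which the database 'name' might be found.
--   Assumed order: current directory, Cabal data directory, --include dirs.
candidatePaths :: FilePath -> [FilePath] -> String -> [FilePath]
candidatePaths dataDir includeDirs name =
  [dir </> name <.> "hoo" | dir <- "." : dataDir : includeDirs]

-- | The first candidate path that exists on disk, if any.
findDatabase :: FilePath -> [FilePath] -> String -> IO (Maybe FilePath)
findDatabase dataDir includeDirs name =
  listToMaybe <$> filterM doesFileExist (candidatePaths dataDir includeDirs name)
```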

Step 1: Creating a Textbase

A Textbase is a textual representation of a function database. To generate a textbase you need to install the darcs version of Haddock, then use runhaskell Setup haddock --hoogle on your package. For filepath, this will create the file dist/doc/html/filepath/filepath.txt, which is a textbase.

Step 2: Converting a Textbase to a Database

To convert a textbase to a database use the command hoogle --convert=filepath.txt in the appropriate folder. If a package depends on any other packages, then adding +package will allow Hoogle to use the dependencies to generate a more accurate database. In the case of filepath, which depends on base, we use hoogle --convert=filepath.txt +base. This command requires base.hoo to be present.

Adding the dependencies is not strictly necessary, but will allow Hoogle to generate a more accurate database. For example, the base package defines type String = [Char]; without the +base flag, this type synonym would not be known to Hoogle.
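To illustrate why knowing the synonyms of a dependency matters, here is a minimal sketch of synonym expansion over a toy type representation. The Type data type and the expand function are hypothetical and much simpler than anything Hoogle uses; the point is only that a signature mentioning FilePath or String can be compared against one mentioning [Char] once the synonyms from base are known.

```haskell
import qualified Data.Map as Map

-- | A toy representation of types, for illustration only.
data Type
  = TCon String     -- a named type such as Char or String
  | TList Type      -- a list type [a]
  | TFun Type Type  -- a function type a -> b
  deriving (Eq, Show)

type Synonyms = Map.Map String Type

-- | Two synonyms from base: type String = [Char] and type FilePath = String.
baseSynonyms :: Synonyms
baseSynonyms = Map.fromList
  [ ("String", TList (TCon "Char"))
  , ("FilePath", TCon "String")
  ]

-- | Replace every known synonym by its definition, repeatedly.
--   Assumes the synonyms are not cyclic, which Haskell itself requires.
expand :: Synonyms -> Type -> Type
expand syns t = case t of
  TCon c   -> maybe t (expand syns) (Map.lookup c syns)
  TList a  -> TList (expand syns a)
  TFun a b -> TFun (expand syns a) (expand syns b)
```

With an empty synonym table, FilePath -> Bool and [Char] -> Bool remain distinct; with baseSynonyms, both expand to the same type.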

We now have filepath.hoo, which can be used as a search database.

Step 3: Combining Databases

To generate a database comprising both filepath and base, type hoogle --output=default.hoo --combine=filepath.hoo --combine=base.hoo. By combining databases you allow easy access to common groups of packages, and searching all these packages at once becomes faster than listing each database separately.
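Steps 2 and 3 can be scripted. The sketch below builds the argument lists for the two hoogle invocations given in this post and runs them with callProcess from the process package. The helper names are hypothetical, the flags are exactly those quoted in the steps above (they may differ in other Hoogle releases), and the script assumes hoogle is on the PATH with filepath.txt and base.hoo in the current directory.

```haskell
import System.Process (callProcess)

-- | Arguments for Step 2: convert a textbase, given the packages it depends on.
convertArgs :: FilePath -> [String] -> [String]
convertArgs textbase deps = ("--convert=" ++ textbase) : map ('+' :) deps

-- | Arguments for Step 3: combine several databases into one output file.
combineArgs :: FilePath -> [FilePath] -> [String]
combineArgs output inputs = ("--output=" ++ output) : map ("--combine=" ++) inputs

-- | Rebuild default.hoo from filepath and base, as in the walkthrough.
buildDefault :: IO ()
buildDefault = do
  callProcess "hoogle" (convertArgs "filepath.txt" ["base"])
  callProcess "hoogle" (combineArgs "default.hoo" ["filepath.hoo", "base.hoo"])
```

Keeping the argument construction separate from the process calls makes it easy to print the commands first and check them against the ones listed in the steps.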

Web Version Databases

The web version uses the Hackage tarballs to generate documentation for most of its databases, but also has three custom databases:


  • base - the base package is just too weird, and isn't even on Hackage. A darcs version and some tweaking is required.

  • keyword - the keyword database is a list of the keywords in Haskell, and is taken from the web page on the wiki.

  • hackage - the hackage database is a list of all the packages on Hackage, indexed only by the package name.



All the code for generating the web version databases is found in data/generate in the Hoogle darcs repo at http://code.haskell.org/hoogle.

Future Improvements

There are two database related tasks that still need to be done: Cabal integration and indexing all of Hackage.

Bug 80: In the future I would like Hoogle databases to be generated by Cabal automatically on installing a package. Unfortunately, I don't have the time to implement such a feature currently, and even if I did implement it, I'm unlikely to ever use it. If anyone wants to work on this, please get in contact. This is mainly a project working with Cabal.

Bug 79: The other work is to index all the packages on Hackage. The problem here is generating the textbases; once they have been created, the rest is fairly simple. However, running Haddock 2 over a package requires that the package builds, and that all its dependencies are present. Unfortunately, my machine is not powerful enough to cope with the number of packages on Hackage. Hopefully at some point the machinery that builds Haddock documentation for Hackage will also generate textbases. In the meantime, if someone wants to take on the task of generating textbases for Hackage, please get in contact.

Bug Tracker

I'm not working on Hoogle full-time anymore, so am using my bug tracker to keep track of outstanding issues. In order to interact more effectively with my bug tracker, you might want to read this guide. It describes how to vote for bugs etc.

Wednesday, August 20, 2008

Hoogle New Features

I've now finished my Hoogle Summer of Code work, though I still intend to continue working on Hoogle when I get the chance. Before the coding period expired, I was able to add a number of new features to Hoogle. These features are all available at Hoogle, under http://haskell.org/hoogle/.

More Compact Text Searching

The old text search feature was very fast, using an on-disk trie to navigate around the possible matches. The downside of this trie was the space it consumed: about half the database was devoted to it. Fortunately, I came up with an alternative way to get fast text searching (albeit slightly slower) in a much more compact form.

Much smaller database files also mean much faster database generation, as the time spent in the IO routines is the main bottleneck.

Faster IO routines

I rewrote the underlying binary layer in Hoogle, to make it faster. It's not as fast as I would like, and I think that moving to memory-mapped files is probably a good idea. With these improvements, along with the compact text searching, I am able to generate databases in about 2 seconds (compared to about 20 seconds before).

Database Restricted Searches

Hoogle has been able to run database restricted searches for some time, but now the databases contain enough information to make it practical. By adding +package or -package to the search you can include or exclude certain packages. For example, to find out which map functions are in the containers package try map +containers. To find out which map functions are not in the containers or bytestring packages try map -containers -bytestring. I have also split out the GHC.* modules from base, so if you want to find some unboxed types in GHC's libraries try # +ghc. Note that not all the documentation links from the GHC modules work yet; I am still trying to fix this.

By default Hoogle searches the following packages: array, base, bytestring, cabal, containers, directory, filepath, haskell-src, hunit, keyword, mtl, parallel, parsec, pretty, process, quickcheck, random, stm, template-haskell, time, xhtml

The "ghc" package is also available if specified with +ghc and includes the GHC.* modules of base only.

Hoogle 3

I have now replaced the default Hoogle with Hoogle 4, but have copied Hoogle 3 to http://haskell.org/hoogle/3. Unfortunately, it doesn't yet work, as I need some admin help. But it will in the next few days, I hope. The only reason I can think of for using Hoogle 3 is Gtk2hs library searching, which I do want to add to Hoogle 4 when possible.

Give Me Feedback

There are quite a lot of enhancements to Hoogle that I still want to make. I have tried to list all these improvements in my bug tracker. If you find a bug, or want some feature, open an issue. If you have a particular interest in a bug, you can star it, to be informed on its progress and to indicate to me that you care.

I'm particularly interested in two pieces of feedback:

I don't use Hoogle 4 because ...

Do you use any type/name search engine? Do you want to still use Hoogle 3? Do you use Hayoo? If you use something else, what feature draws you to it? What do you dislike about Hoogle 4?

I use Hoogle 4, but my life would be nicer if ...

There are many things which affect Hoogle 4 users that I'm not aware of. If you open a bug saying what annoys you (or leave a comment and I'll do it for you) then I can keep track of this information. Even if you don't necessarily see any way to fix the problems, I'd still like to know about them.

Thanks to everyone who has given feedback on Hoogle so far; it has been very useful.

Friday, August 15, 2008

GSoC Hoogle: Week 12

This week I've been trying to get Hoogle 4 to the point where it can replace Hoogle 3. This is the final official week of Google Summer of Code, but I'm planning to continue hacking Hoogle next week, and then as time allows after that.

The priority this week was getting the documentation links working. The problem was not with Hoogle - displaying the links is trivial - but ensuring that Cabal + Haddock + Hoogle + random build scripts combine to generate the correct databases. This work involved lots of little changes in lots of places, but is now working properly. Included in this work is dependency tracking of packages (so that all packages using base know that String = [Char] etc), and merging multiple databases to create a single one.

After the Hoogle database was generated correctly, I started looking at using some of the additional information present. I have now added Haddock documentation inline in the search results. If the documentation is too long to fit comfortably, Hoogle uses AJAX wizzy-ness (or more accurately, DHTML) to allow the user to expand and show all the documentation. I suspect that this will eliminate many cases where the user would otherwise follow the link to the Haddock webpages. This feature is fairly new, and I have pushed it out because it's useful, although there are still many small improvements that need to be made.

This week I also spent some time attempting to generate documentation for all the Hackage libraries. I had some success, but the computer I am currently using is years old and lacks the necessary processing power. I will tackle this at some point in the future, once I have purchased a new machine (which should be quite soon).

With all these changes, I find Hoogle 4 to be significantly more usable than Hoogle 3. Please give it a try, and give feedback. At this point I'm particularly interested in any issues that would cause you to use Hoogle 3 instead of Hoogle 4.

Hoogle 3: http://haskell.org/hoogle

Hoogle 4: http://haskell.org/hoogle/beta

If there are no major issues, I will be replacing Hoogle 3 with Hoogle 4 as the standard Hoogle sometime next week.

Next week: I will no longer be doing Google Summer of Code :-) I plan to refine some of the existing bits of Hoogle, and ensure that anything I haven't done is in a bug tracker for later.

User visible changes: The web search engine now gives Haddock links and displays Haddock documentation inline.

Monday, August 11, 2008

GSoC Hoogle: Week 11

This week I've been releasing lots. Hoogle 4 is finally starting to come together, and should be a worthy replacement for Hoogle 3 very shortly. Rather than go into detail about the past week, I'm just going to give some of the bullet points:


  • I have released 4 versions of the command line version of Hoogle, available on Hackage. Many bugs have been spotted by some very useful testers, and improvements have been made.

  • I have released a web version of Hoogle 4, and encourage feedback.

  • I have started to update the wiki Manual, which now contains some details of Hoogle's query syntax.

  • I gave a talk at AngloHaskell 2008, which is available online, as slides and an audio stream. All of the other talks were excellent and are well worth listening to.

  • I have started to build Hoogle documentation for all of Hackage. The machine I'm doing this on is very slow, so it's not a quick process!



Next week: I'm hoping to work on generating better Hoogle databases, including a Hoogle database for the whole of Hackage. I also have a number of bugs to fix.

User visible changes: Users can download and use Hoogle, and the web interface is online.

Tuesday, August 05, 2008

Hoogle 4.0 web client preview

Since releasing a command line version of Hoogle 4 yesterday, I've had some useful feedback from a number of people. As a result, I have added a few bugs to the bug tracker, and fixed a few mistakes in the searching and ranking. The Hoogle on Hackage is currently 4.0.0.3 and is a recommended upgrade to all early testers.

I've now written a web interface to Hoogle 4, which has been uploaded to http://haskell.org/hoogle/beta/. This web interface is primarily so people can test searching/ranking without installing anything. There are a number of limitations:


  • The links to documentation do not work - this is the most severe problem, and probably stops people permanently changing to the new version.

  • The Haddock documentation is not present.

  • Some database entries are duplicates.

  • The Lambdabot says feature is missing.

  • The Suggestion feature is incomplete.

  • The AJAX style client features are not present.



The first three issues are fixed in Hoogle, but need various support through Haddock and Cabal to work. Other than these limitations, I am very interested in hearing what people think. As before, I am particularly interested in regressions from Hoogle 3 and in poor results or ranking.

Monday, August 04, 2008

Hoogle 4.0 release (beta, command line)

I am pleased to announce Hoogle 4.0, available on Hackage. A couple of things to note:


  • This is a release of the command-line version only. It will have identical searching abilities to the web-based version, which I'm about to write.

  • It currently only searches the same packages as Hoogle 3 (the final release will search more).

  • It currently doesn't support the --info flag as previously described (problems with Haddock, not with Hoogle).



Walkthrough: Installation

If you have cabal-install available, it should be as simple as:


$ cabal update && cabal install hoogle


Otherwise, follow the standard Cabal/Hackage guidelines. Hoogle depends on about 4 packages on Hackage which are not available with a standard GHC install, so these will need to be built.

Walkthrough: A few searches

Here are some example searches. I have used --count=5 to limit the number of results displayed. If you are using a terminal with ANSI escape codes I recommend also passing --color to enable colored output.


$ hoogle map --count=5
Prelude map :: (a -> b) -> [a] -> [b]
Data.ByteString map :: (Word8 -> Word8) -> ByteString -> ByteString
Data.IntMap map :: (a -> b) -> IntMap a -> IntMap b
Data.IntSet map :: (Int -> Int) -> IntSet -> IntSet
Data.List map :: (a -> b) -> [a] -> [b]

$ hoogle "(a -> b) -> [a] -> [b]" --count=5
Prelude map :: (a -> b) -> [a] -> [b]
Data.List map :: (a -> b) -> [a] -> [b]
Control.Parallel.Strategies parMap :: Strategy b -> (a -> b) -> [a] -> [b]
Prelude fmap :: Functor f => (a -> b) -> f a -> f b
Control.Applicative <$> :: Functor f => (a -> b) -> f a -> f b

$ hoogle Data.Map.map --count=5
Data.Map map :: (a -> b) -> Map k a -> Map k b
Data.Map data Map k a
module Data.Map
Data.Map mapAccum :: (a -> b -> (a, c)) -> a -> Map k b -> (a, Map k c)
Data.Map mapAccumWithKey :: (a -> k -> b -> (a, c)) -> a -> Map k b -> (a, Map k c)

$ hoogle "Functor f => (a -> b) -> f a -> f b" --count=5
Prelude fmap :: Functor f => (a -> b) -> f a -> f b
Control.Applicative <$> :: Functor f => (a -> b) -> f a -> f b
Control.Monad fmap :: Functor f => (a -> b) -> f a -> f b
Control.Monad.Instances fmap :: Functor f => (a -> b) -> f a -> f b
Data.Traversable fmapDefault :: Traversable t => (a -> b) -> t a -> t b


How you can help

I've released a command line version of the search to solicit feedback. I'm interested in all comments, but especially ones of the form:


  • I prefer the command line version of Hoogle 3 because ...

  • When I search for ... I would expect result ... to appear, or to appear above result ...

  • I was hoping for the feature ...

  • It takes too long when I ...



I'm going to be accumulating Hoogle 4 bugs in my bug tracker, so please report them there or by email (http://www-users.cs.york.ac.uk/~ndm/contact/), whichever you find more convenient.

Now I'm going to start work on the Web search :-)

Sunday, August 03, 2008

GSoC Hoogle: Week 10

This week I've been in Bristol, and am just about to head off to the Harbour Festival. Next week I'm heading off to AngloHaskell 2008, and will be talking about Hoogle type searching on the Saturday.

This week has been type search, yet again. There were issues with algorithmic complexity, combinatorial explosions and other fun stuff. However, it's now finished. The type search is now fast enough (you can run Hoogle in Hugs against the core libraries) and gives good results. Rather than describe type searching, it's easier to give an example. Searching for (a -> b) -> [a] -> [b] in Hoogle 3 gives:


Prelude.map :: (a -> b) -> [a] -> [b]
Data.List.map :: (a -> b) -> [a] -> [b]
Control.Parallel.S... parMap :: Strategy b -> (a -> b) -> [a] -> [b]
Prelude.scanr :: (a -> b -> b) -> b -> [a] -> [b]
Data.List.scanr :: (a -> b -> b) -> b -> [a] -> [b]
Prelude.scanl :: (a -> b -> a) -> a -> [b] -> [a]
Data.List.scanl :: (a -> b -> a) -> a -> [b] -> [a]
Prelude.concatMap :: (a -> [b]) -> [a] -> [b]


But in Hoogle 4 gives:


Prelude map :: (a -> b) -> [a] -> [b]
Data.List map :: (a -> b) -> [a] -> [b]
Prelude fmap :: Functor f => (a -> b) -> f a -> f b
Control.Applicative <$> :: Functor f => (a -> b) -> f a -> f b
Control.Monad fmap :: Functor f => (a -> b) -> f a -> f b
Control.Monad.Instances fmap :: Functor f => (a -> b) -> f a -> f b
Control.Applicative liftA :: Applicative f => (a -> b) -> f a -> f b
Data.Traversable fmapDefault :: Traversable t => (a -> b) -> t a -> t b
Control.Monad liftM :: Monad m => (a1 -> r) -> m a1 -> m r
Control.Parallel.Strategies parMap :: Strategy b -> (a -> b) -> [a] -> [b]


I think the new results are better. For more details, come to the AngloHaskell talk.
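One way to see why fmap is a closer answer to this query than scanr is to ask whether a candidate signature can be turned into the query purely by instantiating the candidate's type variables. The sketch below implements that one-way matching over a toy type representation. It is an illustration only, not Hoogle 4's algorithm: class constraints such as Functor f are ignored, there is no notion of approximate matching or ranking, and all names (Type, match, fmapSig and so on) are hypothetical.

```haskell
import qualified Data.Map as Map

-- | A toy representation of types, for illustration only.
data Type
  = TVar String     -- type variable, e.g. a or f
  | TCon String     -- type constructor, e.g. [] or Int
  | TApp Type Type  -- type application, e.g. f a
  | TFun Type Type  -- function type, e.g. a -> b
  deriving (Eq, Show)

type Subst = Map.Map String Type

-- | One-way matching: bind variables of the candidate (first argument) so that
--   it becomes equal to the query (second argument). Query variables are
--   treated as fixed and are never bound.
match :: Type -> Type -> Subst -> Maybe Subst
match (TVar v) q s = case Map.lookup v s of
  Nothing -> Just (Map.insert v q s)
  Just t
    | t == q    -> Just s
    | otherwise -> Nothing
match (TCon c) (TCon d) s | c == d = Just s
match (TApp f x) (TApp g y) s = match f g s >>= match x y
match (TFun a b) (TFun c d) s = match a c s >>= match b d
match _ _ _ = Nothing

list :: Type -> Type
list = TApp (TCon "[]")

a, b :: Type
a = TVar "a"
b = TVar "b"

-- | The query: (a -> b) -> [a] -> [b]
query :: Type
query = TFun (TFun a b) (TFun (list a) (list b))

-- | fmap without its constraint: (a -> b) -> f a -> f b
fmapSig :: Type
fmapSig = TFun (TFun a b) (TFun (TApp (TVar "f") a) (TApp (TVar "f") b))

-- | scanr: (a -> b -> b) -> b -> [a] -> [b]
scanrSig :: Type
scanrSig = TFun (TFun a (TFun b b)) (TFun b (TFun (list a) (list b)))
```

Here match fmapSig query Map.empty succeeds by binding f to the list constructor, whereas match scanrSig query Map.empty fails because scanr takes an extra argument.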

Next Week: I want to release a public beta of Hoogle 4 in command line form. I want to start on the web search engine and tweak the ranking algorithm. I'll also be writing up type search in the form of a presentation.

User Visible Changes: Type search works well and fast.