Monday, August 16, 2010

CmdArgs 0.2 - command line argument processing

I've just released CmdArgs 0.2 to Hackage. CmdArgs is a library for defining and parsing command lines. The focus of CmdArgs is allowing the concise definition of fully-featured command line argument processors, in a mainly declarative manner (i.e. little coding needed). CmdArgs also supports multiple mode programs, as seen in darcs and Cabal. For some examples of CmdArgs, please see the manual or the Haddock documentation.

For the last month I've been working on a complete rewrite of CmdArgs. The original version of CmdArgs did what I was hoping it would - it was concise and easy to use. However, CmdArgs 0.1 had lots of rough edges - some features didn't work together, there were lots of restrictions on which types of fields appear where, and it was hard to add new features. CmdArgs 0.2 is a ground up rewrite which is designed to make it easy to maintain and improve in the future.

The CmdArgs 0.2 API is incompatible with CmdArgs 0.1, for which I apologise. Some of the changes you will notice:


  • Several of the annotations have changed name (text has become help, empty has become opt).

  • Instead of writing annotations value &= a1 & a2, you write value &= a1 &= a2.

  • Instead of cmdArgs "Summary" [mode1], you write cmdArgs (mode1 &= summary "Summary"). If you need a multi mode program use the modes function.



All the basic principles have remained the same, all the old features have been retained, and translating parsers should be simple local tweaks. If you need any help please contact me.

Explicit Command Line Processor

The biggest change is the introduction of an explicit command line framework in System.Console.CmdArgs.Explicit. CmdArgs 0.1 is an implicit command line parser - you define a value with annotations from which a command line parser is inferred. Unfortunately there was not complete separation between determining what parser the user was hoping to define, and then executing it. The result was that even at the flag processing stage there were still complex decisions being made based on type. CmdArgs 0.2 has a fully featured explicit parser that can be used separately, which can process command line flags and display help messages. Now the implicit parser first translates to the explicit parser (capturing the users intentions), then executes it. The advantages of having an explicit parser are substantial.


  • It is possible to translate the implicit parser to an explicit parser (cmdArgsMode), which makes testing substantially easier. As a result CmdArgs 0.2 has far more tests.

  • I was able to write the CmdArgs command line itself in CmdArgs. This command line is a multiple mode explicit parser, which has many sub modes defined by implicit parsers.

  • The use of explicit parsers also alleviates one of the biggest worries of users, the impurity. Only the implicit parser relies on impure functions to extract annotations. In particular, you can create the explicit parser once, then rerun it multiple times, without worrying that GHC will optimise away the annotations.

  • Once you have an explicit parser, you can modify it afterwards - for example programmatically adding some flags.

  • The explicit parser better separates the internals of the program, making each stage simpler.



The introduction of the explicit parser was a great idea, and has dramatically improved CmdArgs. However, there are still some loose ends. The translation from implicit to explicit is a multi-stage translation, where each stage infers some additional information. While this process works, it is not very direct - the semantics of an annotation are hard to determine, and there are ordering constrains on these stages. It would be much better if I could concisely and directly express the semantics of the annotations, which is something I will be thinking about for CmdArgs 0.3.

GetOpt Compatibility

While the intention is that CmdArgs users write implicit parsers, it is not required. It is perfectly possible to write explicit parsers directly, and benefit from the argument processing and help text generation. Alternatively, it is possible to define your own command line framework, which is then translated in to an explicit parser. CmdArgs 0.2 now includes a GetOpt translator, which presents an API compatible with GetOpt, but operates by translating to an explicit parser. I hope that other people writing command line frameworks will consider reusing the explicit parser.

Capturing Annotations

As part of the enhanced separation of CmdArgs, I have extracted all the code that captures annotations, and made it more robust. The essential operation is now &= which takes a value and attaches an annotation. (x &= a) &= b can then be used to attach the two annotations a and b (the brackets are optional). Previously the capturing of annotations and processing of flags was interspersed, meaning the entire library was impure - now all impure operations are performed inside capture. The capturing framework is not fully generic (that would require module functors, to parameterise over the type of annotation), but otherwise is entirely separate.

Consistentency

One of the biggest changes for users should be the increase in consistency. In CmdArgs 0.1 certain annotations were bound to certain types - the args annotation had to be on a [String] field, the argPos annotation had to be on String. In the new version any field can be a list or a maybe, of any atomic type - including Bool, Int/Integer, Double/Float and tuples of atomic types. With this support it's trivial to write:


data Sample = Sample {size :: Maybe (Int,Int)}


Now you can run:


$ sample
Sample {size = Nothing}

$ sample --size=1,2
Sample {size = Just (1,2)}

$ sample --size=1,2 --size=3,4
Sample {size = Just (3,4)}


Hopefully this increase in consistency will make CmdArgs much more predictable for users.

Help Output

The one thing I'm still wrestling with is the help output. While the help output is reasonably straightforward for single mode programs, I am still undecided how multiple mode programs should be displayed. Currently CmdArgs displays something I think is acceptable, but not optimal. I am happy to take suggestions for how to improve the help output.

Conclusion

CmdArgs 0.1 was an experiment - is it possible to define concise command line argument processors? The result was a working library, but whose potential for enhancement was limited by a lack of internal separation. CmdArgs 0.2 is a complete rewrite, designed to put the ideas from CmdArgs 0.2 into practical use, and allow lots of scope for further improvements. I hope CmdArgs will become the standard choice for most command line processing tasks.

1 comment:

aleator said...

Kudos!

Very nice library! I use this for some ten or so small programs.