
Boost Command Line Argument Processing

Several months ago I was approached by some co-workers who thought it would be useful to develop a platform for writing scientific applications. As we talked, a general plan took shape: we would take our best solutions and refactor them as generalized solutions to the common problems we encountered.

As a starting point, we decided that we would address the problem of command line argument processing. We knew that several solutions already existed. For instance, we could use getopt(), but a C-based solution seemed archaic and inflexible. Finally, we decided that we would dust off an in-house command line processor written by one of our researchers and make it suitable for a more general audience.

As I began the process of generalizing the component, I wondered if there was something in the Boost libraries that could address the problem. It did not take long before I stumbled across the Boost program options library. The Boost program options library is a general purpose command line processor with an impressive set of capabilities. The discussion and example below should give you a general idea of how to use the Boost program options library in your own applications.


How to use Boost for Command Line Argument Processing

Here is the general workflow you will want to follow when using the Boost program options library:

  • Include the headers relevant to the boost::program_options library. I have included a list in the example source code below for your reference. Do not forget to include <exception> so you can handle any argument errors emitted by the command line parser. It is good practice to print the exception message followed by the standard help text so the user can understand what went wrong.
  • Create an options description object and add descriptive text for display when the help argument is supplied by the user.
  • Add the definition of each command line argument to your options description object using one of two formats:

("long-name,short-name", "Description of argument") for flag values, or

("long-name,short-name", <data-type>, "Description of argument") for arguments with values

  • Remember that arguments with values may have multiple values (i.e. there may be multiple occurrences of an argument specification on a single command line), so in the example below every valued argument is declared as a vector of its base type.
  • If your command has positional arguments (i.e. arguments without a preceding -tag), map them to their tag-valued counterparts using a positional options description object. In the example below I will do this with the --input-file parameter.
  • Create a container (variables map) to be used by the command line parser to store any parsed arguments and their values.
  • Bind the options description and positional options description objects to the command line parser and run the parser. Do not forget to catch and display any exceptions emitted by the parser.
  • As another good practice, add a help argument handler to display help text when requested by the user.
  • Finally, your program is ready to consume the processed command line arguments. When accessing these arguments use one of these two patterns:

The presence of a command line flag can be detected by checking the count of that argument in the variables map. Here is an example:

if (variable_map.count("restart"))
{
    cout << "--restart specified" << endl;
}

Valued arguments may be retrieved using a casting method provided by the variables map. Note that the casting method is a template function and must be supplied with the same vector value type that was given to the options description object.

if (variable_map.count("output-file"))
{
    vector<string> outputFilename =
        variable_map["output-file"].as< vector<string> >();
    cout << "--output-file specified with value = "
        << outputFilename[0] << endl;
}
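
Because valued arguments are declared as vectors, an argument that appears more than once on the command line yields every supplied value. Here is a minimal sketch (reusing the variable and option names above) of walking through all of them:

if (variable_map.count("input-file"))
{
    // Every occurrence of --input-file ends up in this vector,
    // in the order the values appeared on the command line
    vector<string> inputFilenames =
        variable_map["input-file"].as< vector<string> >();
    for (size_t i = 0; i < inputFilenames.size(); ++i)
    {
        cout << "--input-file value " << i << " = "
             << inputFilenames[i] << endl;
    }
}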


A Concrete Example

Assume we want to write an application with the following command line arguments:

Command [-h|--help] [-m|--memory-report] [-r|--restart] [-t|--template] [-v|--validate] [-o|--output-file <file name>] -i|--input-file <file name>



Also assume, for legacy reasons, we want to support the ability to specify the input file positionally as well, like this:

Command [-h|--help] [-m|--memory-report] [-r|--restart] [-t|--template] [-v|--validate] [-o|--output-file <file name>] <file name>


From the above we know that -h, -m, -r, -t and -v are command line flags. We can also see that both the -o and -i arguments take a file name as an argument value. Except for the positional input file handling, our application command line is fairly straightforward. The source code below illustrates how we would use the Boost program options library to implement our application's command line parsing:

[CommandLine.cpp]
// Include the headers relevant to the boost::program_options
// library
#include <boost/program_options/options_description.hpp>
#include <boost/program_options/parsers.hpp>
#include <boost/program_options/variables_map.hpp>
#include <boost/tokenizer.hpp>
#include <boost/token_functions.hpp>

using namespace boost;
using namespace boost::program_options;

#include <iostream>
#include <fstream>

// Include <exception> so we can handle any argument errors
// emitted by the command line parser
#include <exception>

using namespace std;

int main(int argc, char **argv)
{

// Add descriptive text for display when help argument is
// supplied
options_description desc(
    "\nAn example command using Boost command line "
    "arguments.\n\nAllowed arguments");

// Define command line arguments using either of two formats:
//
//     ("long-name,short-name", "Description of argument")
//     for flag values, or
//
//     ("long-name,short-name", <data-type>,
//     "Description of argument") for arguments with values
//
// Remember that arguments with values may be multi-valued
// and must be declared as vectors
desc.add_options()
    ("help,h", "Produce this help message.")
    ("memory-report,m", "Print a memory usage report to "
     "the log at termination.")
    ("restart,r", "Restart the application.")
    ("template,t", "Creates an input file template of "
     "the specified name and then exits.")
    ("validate,v", "Validate an input file for correctness "
     "and then exits.")
    ("output-file,o", value< vector<string> >(),
     "Specifies output file.")
    ("input-file,i", value< vector<string> >(),
     "Specifies input file.");

// Map positional parameters to their tag-valued counterparts
// (e.g. the --input-file parameter)
positional_options_description p;
p.add("input-file", -1);

// Parse the command line catching and displaying any 
// parser errors
variables_map vm;
try
{
    store(command_line_parser(argc, argv)
        .options(desc).positional(p).run(), vm);
    notify(vm);
} catch (std::exception &e)
{
    cout << endl << e.what() << endl;
    cout << desc << endl;
}

// Display help text when requested
if (vm.count("help"))
{
    cout << "--help specified" << endl;
    cout << desc << endl;
}

// Display the state of the arguments supplied
if (vm.count("memory-report"))
{
    cout << "--memory-report specified" << endl;
}

if (vm.count("restart"))
{
    cout << "--restart specified" << endl;
}

if (vm.count("template"))
{
    cout << "--template specified" << endl;
}

if (vm.count("validate"))
{
    cout << "--validate specified" << endl;
}

if (vm.count("output-file"))
{
    vector<string> outputFilename =
        vm["output-file"].as< vector<string> >();
    cout << "--output-file specified with value = "
        << outputFilename[0] << endl;
}

if (vm.count("input-file"))
{
    vector<string> inputFilename =
        vm["input-file"].as< vector<string> >();
    cout << "--input-file specified with value = "
        << inputFilename[0] << endl;
}

return EXIT_SUCCESS;

}


Compiling and linking your application using the Boost program options library is relatively straightforward. During compilation, remember to add the Boost include directory using the -I compiler option. When linking, specify the location of the Boost libraries using the -L option and include the program options library using the -l option. Here is a copy of the Makefile I used to compile the example code above:

[Makefile]
CXX = clang++
CXXFLAGS = -O2 -g -Wall -std=c++11 -fmessage-length=0
INCLUDES := -I ~/boost
LD := clang++
LDFLAGS := -L ~/boost/lib -lboost_program_options
SOURCES := $(shell find . -depth 1 -name ‘*.cpp’ -print | sort)
OBJECTS := $(SOURCES:.cpp=.o)
TARGETS = CommandLine

%.o:%.cpp
    $(CXX) $(CXXFLAGS) $(INCLUDES) -c $< -o $@

all: $(TARGETS)

CommandLine: CommandLine.o
    $(LD) $(LDFLAGS) -o $@ $<
    chmod 755 $@

clean:
    rm -f $(TARGETS) $(OBJECTS)
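
If you just want to build the example once without make, the equivalent direct commands (assuming the same Boost install locations used in the Makefile above) are:

clang++ -O2 -g -Wall -std=c++11 -I ~/boost -c CommandLine.cpp -o CommandLine.o
clang++ -L ~/boost/lib -o CommandLine CommandLine.o -lboost_program_options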



Now that we have a running application, we will try invoking it with several different combinations of arguments. First use the help argument. Here is a listing of the results:

Examples$ ./CommandLine -h
--help specified
An example command using Boost command line arguments.
Allowed arguments:
-h [ --help ] Produce this help message.
-m [ --memory-report ] Print a memory usage report to the log at termination.
-r [ --restart ] Restart the application.
-t [ --template ] Creates an input file template of the specified name and then exits.
-v [ --validate ] Validate an input file for correctness and then exits.
-o [ --output-file ] arg Specifies output file.
-i [ --input-file ] arg Specifies input file.



Next we can check to see if our other arguments are processed correctly:

Examples$ ./CommandLine -m -r -t -v -o outfile -i infile
--memory-report specified
--restart specified
--template specified
--validate specified
--output-file specified with value = outfile
--input-file specified with value = infile



Since the input file can be supplied both as a tagged (-i) argument and positionally, we will check to see if our application can recognize it as a positional argument:

Examples$ ./CommandLine --memory-report --restart --template --validate --output-file outfile infile
--memory-report specified
--restart specified
--template specified
--validate specified
--output-file specified with value = outfile
--input-file specified with value = infile



Finally, we can try an example command invocation with an invalid argument (e.g. -bad-arg):

Examples$ ./CommandLine --memory-report --restart --template --validate --output-file outfile infile -bad-arg
unrecognised option '-bad-arg'
An example command using Boost command line arguments.
Allowed arguments:
-h [ --help ] Produce this help message.
-m [ --memory-report ] Print a memory usage report to the log at termination.
-r [ --restart ] Restart the application.
-t [ --template ] Creates an input file template of the specified name and then exits.
-v [ --validate ] Validate an input file for correctness and then exits.
-o [ --output-file ] arg Specifies output file.
-i [ --input-file ] arg Specifies input file.


Conclusion

In my previous blog post, I discussed how expensive it is to write software. In this case, even a simple custom command line processor could easily cost several thousand dollars to write yourself. Put simply, Boost can save you money. Next time you write an application with command line arguments, I hope you consider using Boost rather than writing your own command line processor.

Intrinsic Type Conversion Using Template Specialization

I was recently approached by one of my colleagues with a question about encapsulation. Basically, he wanted to create a templated C++ wrapper around the MPI library but couldn't decide how to handle mapping intrinsic types from the template to the MPI-defined enumerations. What follows is a step-by-step guide to using template specialization to convert an intrinsic type into the MPI_Datatype needed by the MPI API.

 

The Problem

For the sake of illustration, let’s say that we want to write an abstraction that hides the complexity of the MPI library. It’s easy to see how we could take an MPI method like:

int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)

and wrap it with a method signature like this:

template<class T>
void Send(std::vector<T> &vec, int dest, int tag = 0)

[Note: Let's assume that the internals of the method will handle any issues such as detecting errors or choosing the appropriate communicator.]

One problem you will quickly run into is: how do you map the template's intrinsic type to an MPI_Datatype? Eventually, Send<int>() must make a call to MPI_Send() using an MPI_INT value for the MPI_Datatype parameter. In short, how do we take a call like this:

Send<int>(vec, 1)

and turn it into this:

MPI_Send(vec.data(), vec.size(), MPI_INT, dest, tag, MPI_COMM_WORLD)

The second problem you run into is that if you write an abstraction around something, you typically don’t want the caller to have to see the thing you’re abstracting. In this case, that means we don’t want the user of our abstraction to have to include the MPI headers in his project. In fact, we don’t want the user to actually know that we’re using MPI at all.

The Solution

To achieve this level of encapsulation we’ll have to solve both problems one and two above. To simplify things let’s solve the first problem before we look at the second.

Solution to Problem One: Mapping Intrinsic Types to an Enumeration

First, let’s define the types that our abstraction will work with. There should be a one-to-one mapping between these enumeration values and the set of intrinsic data types our abstraction will support. Here is a code snippet of what it should look like:

namespace Abstraction
{
    typedef enum
    {
        type_unknown = 0,
        type_char,
        type_unsigned_char,
        type_short,
        type_unsigned_short,
        type_int,
        type_unsigned_int,
        type_long,
        type_unsigned_long,
        type_float,
        type_double
    } DataType;
}

Next, define a template function to convert an intrinsic data type into one of our abstraction's enumerated values. Notice that the default implementation always fails. This is because any intrinsic data type that does not have its own specialized version of this template represents an unknown type, and unknown types have no meaning here. Note that you might want this default implementation to throw or assert so that it is never silently used. For example:

template <class T>
Abstraction::DataType getAbstractionDataType()
{ throw std::runtime_error("Intrinsic type not supported by the abstraction."); }

 

Finally, define a template specialization for each of the different intrinsic types our abstraction supports. Note that the templating and inlining basically make each call to the template function a compile-time numeric substitution. We have:

template <>
inline Abstraction::DataType getAbstractionDataType<char>()
         { return Abstraction::type_char; }

template <>
inline Abstraction::DataType getAbstractionDataType<unsigned char>()
         { return Abstraction::type_unsigned_char; }

template <>
inline Abstraction::DataType getAbstractionDataType<short>()
         { return Abstraction::type_short; }

template <>
inline Abstraction::DataType getAbstractionDataType<unsigned short>()
         { return Abstraction::type_unsigned_short; }

template <>
inline Abstraction::DataType getAbstractionDataType<int>()
         { return Abstraction::type_int; }

template <>
inline Abstraction::DataType getAbstractionDataType<unsigned int>()
         { return Abstraction::type_unsigned_int; }

template <>
inline Abstraction::DataType getAbstractionDataType<long>()
         { return Abstraction::type_long; }

template <>
inline Abstraction::DataType getAbstractionDataType<unsigned long>()
         { return Abstraction::type_unsigned_long; }

template <>
inline Abstraction::DataType getAbstractionDataType<float>()
         { return Abstraction::type_float; }

template <>
inline Abstraction::DataType getAbstractionDataType<double>()
         { return Abstraction::type_double; }

This completes the solution to problem one. We can easily use the getAbstractionDataType<>() template function to determine the appropriate enumeration value for the given intrinsic type. Here is an example of its use:

template<class T>
void PrintTypeEnumerationValue()
{
    printf("Type enumeration value is %d\n", getAbstractionDataType<T>());
}
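
As a quick sanity check, a hypothetical main() exercising this helper prints the indices defined in the enumeration above (printf itself requires <cstdio>):

int main()
{
    PrintTypeEnumerationValue<int>();    // prints "Type enumeration value is 5"
    PrintTypeEnumerationValue<double>(); // prints "Type enumeration value is 10"
    return 0;
}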

Solution to Problem Two: Hiding API Types and Enumeration Values

In order to hide the references to MPI, we’ll need a set of functions that have their implementations in a source file (i.e. .cpp file). Later, when the user references our abstraction via a set of headers and a library, they will never see the references to MPI.

void SendImpl(void *data, int count, Abstraction::DataType type, int dest, int tag);
void ReceiveImpl(void *data, int count, Abstraction::DataType type, int src, int tag);

Add two templated functions for doing sends and receives. The sole purpose of these functions is to act as adapters, converting the template's intrinsic type into an enumerated value using getAbstractionDataType<>().

template<class T>
void Send(std::vector<T> &vec, int dest, int tag)
{
          // Call our hidden implementation of Send using:
          // The vector data pointer and element count,
          // The Abstraction::DataType provided by our template function, and
          // The originally provided destination and tag values.
          SendImpl(vec.data(), vec.size(), getAbstractionDataType<T>(), dest, tag);
}

template<class T>
void Receive(std::vector<T> &vec, int src, int tag)
{
          // Call our hidden implementation of Receive using:
          // The vector data pointer and element count,
          // The Abstraction::DataType provided by our template function, and
          // The originally provided source and tag values.
          ReceiveImpl(vec.data(), vec.size(), getAbstractionDataType<T>(), src, tag);
}

Within the .cpp file defining the above declarations, we can provide a function to map our abstraction's enumeration value to the proper MPI_Datatype value. Here is a simple function that does the job:

static MPI_Datatype ConvertType(Abstraction::DataType type)
{
    switch (type)
    {
        case Abstraction::type_char:           return MPI_CHAR;
        case Abstraction::type_unsigned_char:  return MPI_UNSIGNED_CHAR;
        case Abstraction::type_short:          return MPI_SHORT;
        case Abstraction::type_unsigned_short: return MPI_UNSIGNED_SHORT;
        case Abstraction::type_int:            return MPI_INT;
        case Abstraction::type_unsigned_int:   return MPI_UNSIGNED;
        case Abstraction::type_long:           return MPI_LONG;
        case Abstraction::type_unsigned_long:  return MPI_UNSIGNED_LONG;
        case Abstraction::type_float:          return MPI_FLOAT;
        case Abstraction::type_double:         return MPI_DOUBLE;
        default: break;
    }
    throw std::runtime_error("MPI_Datatype ConvertType(Abstraction::DataType) failed");
}

Finally, within the implementations of SendImpl() and ReceiveImpl(), use the above ConvertType() function to convert our abstraction data type enumeration value to an MPI_Datatype value.

void SendImpl(void *data, int count, Abstraction::DataType type, int dest, int tag)
{
    if (MPI_Send(data, count, ConvertType(type), dest, tag, MPI_COMM_WORLD) != MPI_SUCCESS)
    {
        throw std::runtime_error("MPI_Send failed");
    }
}

void ReceiveImpl(void *data, int count, Abstraction::DataType type, int src, int tag)
{
    MPI_Status status;
    if (MPI_Recv(data, count, ConvertType(type), src, tag, MPI_COMM_WORLD, &status) != MPI_SUCCESS)
    {
        throw std::runtime_error("MPI_Recv failed");
    }
}
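
With all of the pieces in place, the caller sees only the abstraction. A minimal usage sketch (the header name Abstraction.h and the rank/tag values are assumptions for illustration) might look like this:

// Caller's view: no MPI headers, no MPI types
#include "Abstraction.h"   // hypothetical header declaring Send() and Receive()
#include <vector>

void Exchange()
{
    std::vector<double> outbound(100, 3.14);
    std::vector<double> inbound(100);

    // The template argument (double) is mapped internally to
    // Abstraction::type_double and, inside SendImpl()/ReceiveImpl(), to MPI_DOUBLE
    Send(outbound, 1, 0);
    Receive(inbound, 1, 0);
}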

How Much Does Software Cost: The Impact of Minor Development Decisions on Project Budget

In 1981 I joined MITRE as a member of their Technical Staff. During my 7-year tenure there, I spent over 5 years estimating the cost of complex computer systems for both the Space Station Control Center and the Space Station Training Facility. These estimates usually involved both hardware and software components.

An Estimating Model

Much of our cost estimation work relied on research done by Barry Boehm, as published in his book Software Engineering Economics [1981]. His cost model, called the Constructive Cost Model (COCOMO), provided a framework for estimating the cost and uncertainty of software projects. Boehm's model came in three levels of complexity (Basic, Intermediate, and Detailed) based on the amount of information available to the estimator. For example, Intermediate COCOMO estimates may be made using the following formula:

E = a_i * (KLoC)^b_i * EAF

Where:

E represents estimated effort in person-months,

a_i and b_i are constants defined by the type of project,

KLoC represents the lines-of-code in thousands, and

EAF is formed from the product of 15 project attribute ratings.

Application to Today’s Projects

Let’s apply this simple formula to a real life project whose effort and source base is known to see how this methodology stacks up with today’s levels of software productivity.

I’ve been working on a High Performance Computing project that contains approximately 60K lines of code. Over the last three years there have been approximately 6 person-equivalents working on the project each year for a total of 18 person-years. This results in a productivity of:

Productivity = 60,000 lines-of-code / 216 person-months ≈ 277.7 lines-of-code / person-month

With respect to the Intermediate COCOMO model, let's assume that the project is a normal (Organic, a_i = 3.2 and b_i = 1.05) project with attributes that are all Nominal (1.0) and therefore have a product of 1.0. Given these factors, Boehm's model becomes:

E = 3.2 * (KLoC)^1.05  or

E = 3.2 * (60)^1.05  or

E = 235.6 person-months

E = 19.6 person-years

Productivity = 255.1 lines-of-code / person-month
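
For the curious, here is a small C++ sketch of the Intermediate COCOMO arithmetic above, using the Organic-mode constants and the all-Nominal EAF of 1.0 already assumed:

#include <cmath>
#include <cstdio>

int main()
{
    const double a_i  = 3.2;   // Organic-mode coefficient
    const double b_i  = 1.05;  // Organic-mode exponent
    const double eaf  = 1.0;   // product of 15 Nominal attribute ratings
    const double kloc = 60.0;  // project size in thousands of lines of code

    const double personMonths = a_i * std::pow(kloc, b_i) * eaf;
    std::printf("E = %.1f person-months (%.1f person-years)\n",
                personMonths, personMonths / 12.0);
    return 0;
}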

Observations

Interestingly, it has been over 30 years and estimates based on pre-1981 productivity data still seem to be relevant (at least in this case). This suggests that, to a large extent, productivity levels for C/C++ development have changed very little.

More disturbing is that both these numbers point to the fact that software is extremely expensive.  The following scenario highlights just how expensive a seemingly small development decision can be.

Suppose that you’re working on an application and need to process command line arguments. You have several options, but let’s look at two:

  • You could write your own command line processor. Let’s assume that if you do, the component would be about 165 lines of code in C++ (a historical figure), or
  • You could use the boost::program_options library and avoid writing a component. Since there is no component to be written, the library cost is 0 lines of code.

Either approach requires roughly the same amount of calling code. The question is, how much extra did it cost to write your own command line processor rather than using the Boost library?

Using the historical project productivity rate of 277.7 lines of code/person-month we have:

165/277.7 person-months or

0.59 person-months

Assuming that the current billing rate for a developer is $150K/year or $12.5K/month, this means that our component cost is:

component cost = $7.4K

Would it really be worth the money to write your own component? Certainly any such effort would have to be weighed against the benefit of meeting unique project requirements.

Next time you need a new component or feature look for existing components in standard libraries such as Boost. Using these components can save a significant amount of project development time and money.

Xcode Source Code Management with Git

Here is a quick introduction to using Git from within Xcode. The tutorial provides a very brief visual example of where the Git commands are located within the Xcode IDE.

Students Develop Device That Mimics Brain

Fok and Alex Neckar, both doctoral candidates in electrical engineering at Stanford, have developed the Neurogrid, a device that emulates 1 million neurons and 6 billion synapses in real time.

It's always important to understand how your OS organizes your applications and data. Nixie Pixel provides a beginner's explanation of the Linux file system.

Creating a Bootable Linux Image for your Pandaboard ES

In order to bring Linux up on a Pandaboard ES you will need to create a bootable image of the Linux operating system. I've chosen Ubuntu 12.10 (Quantal Quetzal) as my distribution. Here are the steps necessary:

  • From your workstation (in my case, OSX Mountain Lion) download a bootable image of your Linux distribution. The Pandaboard ES supports OMAP4 so make sure you download the appropriate Linux image.
  • Insert the SD card you wish to boot from into a USB carrier and insert the carrier into a USB port on your workstation.

  • Create a terminal window to enter the remaining commands.

  • Use the "mount" command to see which devices are mounted. This is important because you want to find out which device the workstation is using to mount your SD card. In my case the drive mount is:

/dev/disk1s1 as /Volumes/No Name

  • On OSX an inserted USB stick is automatically mounted. In order to write the bootable Linux image to the SD card, you must forcibly remove the mount point. To do this, use the command:

sudo diskutil unmountDisk /dev/disk1

  • Finally, copy the bootable Linux image to the SD card using the dd command. This command may take a few minutes to complete. To speed up copying, use a larger block size with the command. In my case I chose a block size of 1 MB (bs=1m).

sudo dd if=<img file name> of=/dev/disk1 bs=1m

Once the command completes you will have a bootable Linux image on your SD card and you are ready to fire up your Pandaboard ES.

Wget: The command line Internet tool

I saw this really good video by Nixie Pixel discussing the command line tool wget. Wget is a command line tool that allows you to download Internet content using HTTP, HTTPS and FTP protocols. Since the tool is non-interactive, it is easily added to shell scripts, allowing the user to automate complex data retrieval tasks.
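
For example, a one-line fetch buried in a script might look like this (the URL is just a placeholder):

wget -O data.csv https://example.com/path/to/data.csv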

The Biggest Changes in C++11 (and Why You Should Care)

Nice article summarizing the changes in C++ due to the new C++11 standard. One thing that struck me was that the author counted lambda expressions as a positive with respect to both productivity and security. Not having all those extra public entry points lying around can help tighten the security of a library API.
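
For example, a comparison that would previously have required a separate, externally visible functor or free function can now live inline as a lambda. Here is a minimal C++11 sketch:

#include <algorithm>
#include <string>
#include <vector>

void SortByLength(std::vector<std::string> &names)
{
    // The comparison logic stays inside the function that needs it;
    // no extra named entry point is exposed to callers
    std::sort(names.begin(), names.end(),
              [](const std::string &a, const std::string &b)
              { return a.size() < b.size(); });
}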