Pierre Lemaire

Assistant Professor at Grenoble INP (School of Industrial Engineering); member of the G-SCOP lab

ladoscope: general documentation

Note: this documentation is incomplete; always use the -help option to see all available options.

The different programs that are part of ladoscope share a common general behavior. The options on a command line can be given in any order; however, if there are incompatibilities between different options, the last one prevails. E.g., ladoscope -patterns -d 3 -d 1 will produce patterns of degree 1 only.
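The "last one prevails" rule can be pictured with a small Python sketch. This is only an illustration of the behavior described above, not ladoscope's actual parser; the function name resolve_options is hypothetical, and the only assumption made about the options is that -d takes one value.

```python
def resolve_options(argv):
    """Return a dict option -> value in which the last occurrence wins.

    Illustration only: "-d" is assumed to take one value; every other
    token is stored as a simple flag (or positional word).
    """
    resolved = {}
    i = 0
    while i < len(argv):
        token = argv[i]
        if token == "-d" and i + 1 < len(argv):
            # A later "-d" silently overwrites an earlier one.
            resolved["-d"] = argv[i + 1]
            i += 2
        else:
            resolved[token] = True
            i += 1
    return resolved


opts = resolve_options(["-patterns", "-d", "3", "-d", "1"])
print(opts["-d"])  # prints 1
```

Because the values are stored in a dictionary, the order of unrelated options does not matter, and only the final value of a repeated option is kept.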

Some options exist for all programs:

For specific documentation, have a look at each program of ladoscope: classifier, ladoscope, ladoscript, matrices, sampler, stat_instance, stat_model.

classifier

classifier is a convenient tool to make predictions based on existing models. Several models can be used and a vote performed to predict the class of some observations. Different weights can be given for the vote of each model, and the value of an "undefined" vote can be set too.

The models are given on the command line, each with an optional weight. The weight is introduced by an = sign, with no spaces around it (e.g. classifier m1=5 m2=7). The instance is read from the standard input or provided by the -inst option.
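As a rough picture of what a weighted vote can look like, here is a minimal Python sketch. It is not classifier's implementation: the exact vote rule is not specified in this documentation. The sketch assumes that each model votes +1 (positive), -1 (negative) or None (undefined), that a model given without a weight gets weight 1, and that the sign of the weighted sum decides the class. The names parse_model_spec and weighted_vote are hypothetical.

```python
def parse_model_spec(spec, default_weight=1.0):
    """Split 'm1=5' into ('m1', 5.0); a bare 'm1' gets the default weight.

    The default weight of 1 is an assumption made for this sketch.
    """
    name, sep, weight = spec.partition("=")
    return name, (float(weight) if sep else default_weight)


def weighted_vote(votes, weights, undefined_value=0.0):
    """Combine per-model votes (+1, -1 or None) into a single prediction.

    An undefined vote (None) counts as `undefined_value`.
    Returns +1, -1, or None when the weighted sum is exactly zero.
    """
    total = 0.0
    for model, vote in votes.items():
        value = undefined_value if vote is None else vote
        total += weights[model] * value
    if total > 0:
        return 1
    if total < 0:
        return -1
    return None


weights = dict(parse_model_spec(s) for s in ["m1=5", "m2=7"])
votes = {"m1": +1, "m2": -1}          # the two models disagree
print(weighted_vote(votes, weights))  # prints -1: m2 has the larger weight
```

Setting undefined_value to a non-zero number lets an "undefined" vote lean towards one class instead of being neutral.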

ladoscope

ladoscope (the program) is the first-born and the core of ladoscope (the software). It provides the essential LAD components: cut-point production, pattern generation, model selection, ...

ladoscope is run through command-line calls only. There are lots of different options, which can be split into two main categories: actions and parameters. An action is what you ask ladoscope to do whereas a parameter is your way of telling how you want it to do it. For instance, in ladoscope -patterns -d 3 inst, the action is -patterns and the parameters are -d 3 and inst: ladoscope will produce patterns of degree 3 for the dataset inst.

ladoscope cannot perform two different actions in the same run. Hence, if you run ladoscope inst -patterns -cleancov, the first action (-patterns) is simply ignored and ladoscope will wait for you to give a model to clean. This is easily overcome by using ladoscope's ability to read from the standard input everything that it needs and that is not provided by a parameter. For the example above, you just have to run the command ladoscope inst -patterns | ladoscope inst -cleancov to get what you want (don't forget to provide inst twice!).

ladoscope has one tricky behavior that you must be aware of: by default, it displays everything it reads in reverse order (in the case of patterns, it displays the negative patterns first, in reverse order, and then the positive patterns, also in reverse order). This is the default behavior for compatibility and efficiency reasons; however, the -sort option takes care of that.

Besides, for everyone who wonders where the name comes from: the "lad" part is obvious and I stole the "scope" from my predecessor's "datascope" software; as for the remaining letter, well... it just sounds better than the other possibilities.

The specific options of ladoscope are:

ladoscript

ladoscript is a gathering of all the other programs... and more. It provides a basic scripting language to automate LAD computations.

ladoscript reads and executes scripts. In a script, every line is a command. Among them are ladoscope, classifier, matrices, stat_instance, stat_model and sampler, which you can use exactly like the stand-alone programs. Simple for loops are provided, and variables can be defined and used in a similar way to shell variables (with a somewhat make-like syntax). The help command in a script displays all the known commands.

The syntax is somewhat primitive. Every line is a command, always of the form cmd-name cmd-parameters. White spaces are used to separate parameters; several white-spaces are merged and the ones at the beginning or the end of a line are ignored. A command is completely read; if the last character of a line is \ then the following line is considered as part of the same command. Once read, if the command does not start with \, then the substitution of all the variables is performed. Then the command line is split at every white-space character: the first word is assumed to be the command name and the other words its parameters.
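The reading steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not ladoscript's real code: the variable syntax $(NAME) is only a guess based on the "make-like" remark, continued lines are assumed to be joined with a single space, and a leading \ is assumed to be dropped once it has disabled the substitution. The names read_commands and parse_command are hypothetical.

```python
import re

# Assumed make-like variable reference: $(NAME). The real syntax may differ.
VAR = re.compile(r"\$\((\w+)\)")


def read_commands(lines):
    """Join continued lines (ending with a backslash) into complete commands."""
    commands, pending = [], ""
    for raw in lines:
        line = raw.strip()                 # leading/trailing spaces are ignored
        if line.endswith("\\"):
            pending += line[:-1] + " "     # the command continues on the next line
            continue
        pending += line
        if pending.strip():
            commands.append(pending)
        pending = ""
    if pending.strip():                    # dangling continuation at end of script
        commands.append(pending)
    return commands


def parse_command(command, variables):
    """Return (name, parameters) for one complete command."""
    if command.startswith("\\"):
        command = command[1:]              # assumed: "\" only disables substitution
    else:
        command = VAR.sub(lambda m: variables.get(m.group(1), ""), command)
    words = command.split()                # runs of white space are merged
    if not words:
        return None, []
    return words[0], words[1:]


script = ["ladoscope -patterns \\", "    -d $(DEG)   inst", "", "\\stat_model $(DEG)"]
for cmd in read_commands(script):
    print(parse_command(cmd, {"DEG": "3"}))
# ('ladoscope', ['-patterns', '-d', '3', 'inst'])
# ('stat_model', ['$(DEG)'])
```

The second command starts with \, so $(DEG) is left untouched, whereas the first one has the variable replaced before being split into words.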

Note: ladoscript is a very primitive language and is very unlikely to become much more powerful than it is today. I provide it as a convenience but urge any serious person to learn and use real scripting languages (I personally use Perl and bash): ladoscript will never be as powerful as they are.

Some commented examples are available in a zip file.

The commands of ladoscript are:

matrices

matrices produces different LAD-related matrices, such as the pattern-observation incidence matrix or the variable-pattern incidence matrix; the output format can be tuned.

Given an instance and/or a model, matrices displays the required matrix. The output can be altered by several options to fit the format you want. Unlike ladoscope, matrices prints its output in reading order by default (which is what you expect!).

sampler

sampler is a tool to help with validation and cross-validation; it provides several standard methods to split a data set (r-sampling, k-folding, leave-one-out).

sampler is run through command-line calls only. Its normal usage is, given an instance and a method, to display either the training set (TRA) or the testing set (TES). sampler cannot output both TRA and TES in a single run, so you need to make two calls to get them both: be sure to use the exact same parameters (except for -tra/-tes, of course). Do not forget to provide the same seed: if you do not provide a seed, the random-number generator initialises itself, and (TRA, TES) is very unlikely to be a partition of the whole dataset.
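Why the seed matters can be seen in a small Python sketch. It does not reproduce sampler's algorithm; it only mimics the situation described above, where each call rebuilds the random split from scratch and prints one half of it. The function sample and its parameters are hypothetical.

```python
import random


def sample(n_observations, n_train, part, seed=None):
    """Return the sorted indices of the requested part ('tra' or 'tes').

    Each call re-creates the split, as two separate program runs would.
    With seed=None the generator initialises itself, so two calls will
    generally shuffle the observations differently.
    """
    rng = random.Random(seed)
    indices = list(range(n_observations))
    rng.shuffle(indices)
    chosen = indices[:n_train] if part == "tra" else indices[n_train:]
    return sorted(chosen)


# Same seed in both calls: TRA and TES come from the same shuffle,
# so together they form a partition of the 10 observations.
tra = sample(10, 7, "tra", seed=42)
tes = sample(10, 7, "tes", seed=42)
print(sorted(tra + tes) == list(range(10)))  # prints True

# Without a seed, the two calls use unrelated shuffles: some observations
# may then appear in both sets while others appear in neither.
tra_unseeded = sample(10, 7, "tra")
tes_unseeded = sample(10, 7, "tes")
```

With a fixed seed the two calls are deterministic and complementary; without one, nothing ties the second call to the first.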

stat_instance

stat_instance is a simple, convenient tool that computes several LAD statistics for an instance, such as the number of positive observations or the ratio of missing values.

Given an instance, stat_instance displays the required characteristics. The output can be altered by several options to fit what you want.

The specific options of stat_instance are:

stat_model

stat_model is a simple, convenient tool that computes several LAD statistics for a model, such as the number of positive patterns or the accuracies on several given instances.

Given a model (and optionally instances), stat_model displays the required characteristics. The output can be altered by several options to fit what you want.