Dynamically produced field guides
Many readers will be familiar with the “mail merge” feature of word processors, in which a form or template document is arranged to contain some static and some variable data. When the mail merge is run, the variable data areas are filled in from a separate data table.
This is in fact a basic feature of database packages (e.g. Microsoft Access or FoxPro), in which reports can be generated from the data in various formats. Botanists may be familiar with using one set of data to generate herbarium labels, checklists and so on. Today, images can also be added to reports, so a field guide of user-definable format and content can be generated from a field guide database. This is obviously efficient if you want to make field guides customised for particular areas or forests, because it is a trivial matter in any database to filter the records, leaving only the species of a particular area, or perhaps of a particular family or habit. On the other hand, a common problem with dynamic formats is that they leave no scope for the personal touch, such as improving the layout by varying it according to the context.
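A minimal sketch of the mail-merge idea in Python, assuming the species records have already been exported from a database into a list of dictionaries. The field names, the page template and the `make_guide` function are hypothetical; the point is only that filtering the records and filling a fixed template are two separate, trivial steps.

```python
from string import Template

# Hypothetical records, standing in for rows exported from a field guide database.
SPECIES = [
    {"name": "Species A", "family": "Clusiaceae", "habit": "tree",
     "area": "Forest X", "description": "Yellow latex present.", "image": "a.jpg"},
    {"name": "Species B", "family": "Fabaceae", "habit": "climber",
     "area": "Forest X", "description": "Leaves pinnate.", "image": "b.jpg"},
    {"name": "Species C", "family": "Clusiaceae", "habit": "shrub",
     "area": "Forest Y", "description": "Yellow latex present.", "image": "c.jpg"},
]

# The static part of each page; the $fields are the variable data from each record.
PAGE = Template(
    "$name ($family)\n"
    "Habit: $habit\n"
    "$description\n"
    "[image: $image]\n"
)


def make_guide(records, **filters):
    """Keep records whose fields equal every filter value, then fill the template."""
    selected = [
        r for r in records
        if all(r.get(field) == value for field, value in filters.items())
    ]
    # Order pages by family, then by name, as a printed guide might be.
    selected.sort(key=lambda r: (r["family"], r["name"]))
    return "\n".join(PAGE.substitute(r) for r in selected)


if __name__ == "__main__":
    print(make_guide(SPECIES, area="Forest X"))
```

Changing the filter (for example `family="Clusiaceae", habit="shrub"`) produces a different guide from the same data without touching the template.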
It is again a small step from this type of computer-assembled field guide to one where the interface is via the internet, and where the outputs can be displayed either in a web browser or as a printable document. This approach is bound to grow over the coming years, which is a good reason to prepare for your guide by typing information into database fields. The World Wide Web is ideally suited to separating elements of style from data content, using XML (Extensible Markup Language; World Wide Web Consortium, 2002). Approaches range from the simple production of sheets with pictures and text to full use of the Web, with cross-linked databases and shared resources.
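To illustrate separating content from style, the sketch below keeps species data in a small XML document and renders it in two ways with Python's standard `xml.etree.ElementTree` module. The element names (`guide`, `species`, `name` and so on) are hypothetical rather than any published schema, and on a real site the styling would more usually be done with XSLT or CSS; the two Python functions simply stand in for two stylesheets applied to the same data.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical field guide content in XML: data only, no presentation.
DATA = """<guide>
  <species id="sp1">
    <name>Species A</name>
    <family>Clusiaceae</family>
    <description>Yellow latex present.</description>
    <image src="a.jpg"/>
  </species>
  <species id="sp2">
    <name>Species B</name>
    <family>Fabaceae</family>
    <description>Leaves pinnate.</description>
    <image src="b.jpg"/>
  </species>
</guide>"""


def to_html(root):
    """One 'style': a web page for the browser."""
    parts = ["<html><body>"]
    for sp in root.iter("species"):
        parts.append(
            f'<div class="species" id="{escape(sp.get("id"))}">'
            f"<h2>{escape(sp.findtext('name'))}</h2>"
            f"<p><em>{escape(sp.findtext('family'))}</em></p>"
            f"<p>{escape(sp.findtext('description'))}</p>"
            f'<img src="{escape(sp.find("image").get("src"))}"/>'
            "</div>"
        )
    parts.append("</body></html>")
    return "\n".join(parts)


def to_text(root):
    """Another 'style': a plain, printable list built from the same data."""
    return "\n".join(
        f"{sp.findtext('name')} ({sp.findtext('family')}): {sp.findtext('description')}"
        for sp in root.iter("species")
    )


root = ET.fromstring(DATA)
print(to_text(root))
print(to_html(root))
```

Adding a species means editing only the XML; both outputs pick it up unchanged.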
Interactive computer identification
There are various classes of computer-based identification systems (e.g. expert systems, ‘statistical classifiers’ and neural networks), but for botanists multi-access key types have predominated; they are a natural progression from paper-based multi-access keys, especially punched-card keys.
Numerous online interactive databases exist, for example:
- e-Floras
- Neotropikey
- Lucid keys
- ETI projects
- e-chalk
The various software packages used to author e-keys such as these all work in a similar way, based on a matrix of data (characters or character states by taxa) that can be sorted or queried in any order. The field in general has been summarised by Pankhurst (1991), and the main advantages of computer keys by Dallwitz (2002).
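The taxa-by-character-states matrix described above can be sketched in Python as a mapping from each taxon to the set of states recorded for each character. The taxa, characters and the `matching_taxa` function are invented for illustration. Because each observation simply removes incompatible taxa, the characters can be applied in any order and the result is the same.

```python
# Hypothetical matrix: taxon -> {character: set of recorded states}.
MATRIX = {
    "Taxon 1": {"latex": {"yellow"}, "leaf arrangement": {"opposite"}, "petals": {"4"}},
    "Taxon 2": {"latex": {"absent"}, "leaf arrangement": {"alternate"}, "petals": {"5"}},
    "Taxon 3": {"latex": {"white"}, "leaf arrangement": {"opposite"}, "petals": {"4", "5"}},
    "Taxon 4": {"latex": {"absent"}, "leaf arrangement": {"opposite"}, "petals": {"5"}},
}


def matching_taxa(matrix, observations):
    """Return the taxa whose recorded states include every observed state.

    observations is a sequence of (character, state) pairs, in any order.
    A taxon with no score for a character is excluded here; real programs
    may treat missing data differently.
    """
    remaining = set(matrix)
    for character, state in observations:
        remaining = {
            taxon for taxon in remaining
            if state in matrix[taxon].get(character, set())
        }
    return remaining


obs = [("petals", "5"), ("leaf arrangement", "opposite")]
print(sorted(matching_taxa(MATRIX, obs)))                  # ['Taxon 3', 'Taxon 4']
print(sorted(matching_taxa(MATRIX, list(reversed(obs)))))  # same result
```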
Advantages of computer keys, modified from Dallwitz (2002)
(Points that are served equally well or better by static keys have been removed.)
- Any characters can be used, and their values changed, in any order.
- A correct identification can be made in spite of errors by the user or in the data.
- Numeric characters can be used directly, without being divided into ranges.
- The user can express uncertainty by entering more than one state value, or a range of numerical values.
- Advice on the most suitable characters to use at any stage of an identification.
- Locating errors which were circumvented by the error-tolerance mechanism.
- Use of probabilities.
- Provision for restricting any operation to subsets of the characters and taxa.
- Finding the differences and similarities between taxa.
- Finding diagnostic descriptions.
- The ability to handle large data sets efficiently.
- Data sharing with other description-based applications: description writing, generation of conventional keys, and phenetic and cladistic analysis.
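As an illustration of the point about advice on the most suitable characters, here is one simple heuristic in Python: suggest the unused character whose worst-case answer leaves the fewest taxa. This minimax rule, the `best_character` function and the example matrix are assumptions for illustration; Intkey, Lucid and other programs use their own measures, which are not reproduced here.

```python
# Hypothetical matrix: taxon -> {character: set of recorded states}.
MATRIX = {
    "Taxon 1": {"latex": {"yellow"}, "leaf arrangement": {"opposite"}, "petals": {"4"}},
    "Taxon 2": {"latex": {"absent"}, "leaf arrangement": {"alternate"}, "petals": {"5"}},
    "Taxon 3": {"latex": {"white"}, "leaf arrangement": {"opposite"}, "petals": {"4", "5"}},
    "Taxon 4": {"latex": {"absent"}, "leaf arrangement": {"opposite"}, "petals": {"5"}},
}


def best_character(matrix, remaining, used=()):
    """Suggest the unused character whose worst-case outcome leaves the fewest taxa.

    A simple minimax heuristic for illustration only.
    """
    candidates = {c for t in remaining for c in matrix[t]} - set(used)
    best, best_worst = None, None
    for character in sorted(candidates):  # sorted, so ties go alphabetically
        states = {s for t in remaining for s in matrix[t].get(character, set())}
        # For each state the user might enter, count the taxa that would survive.
        # Taxa with no score for this character are conservatively kept.
        worst = max(
            (sum(1 for t in remaining
                 if character not in matrix[t] or s in matrix[t][character])
             for s in states),
            default=len(remaining),
        )
        if best_worst is None or worst < best_worst:
            best, best_worst = character, worst
    return best


print(best_character(MATRIX, set(MATRIX)))                  # latex
print(best_character(MATRIX, set(MATRIX), used=["latex"]))  # leaf arrangement
```

A real program would probably also weight this by how easy each character is to observe, as some of the packages described below allow.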
Main computer identification authoring software
There are at least 15 published programs currently available for interactive identification. However, in reality the practical choice for most field guide creators is more limited. Some programs are only available as published keys with data. Other programs, such as Linnaeus II, are linked to publishers and may not be suitable for small research projects. The older programs, such as Pankey, were designed for the DOS operating system and have been largely superseded by others that suit the latest generations of Windows. Other programs are no longer supported or were never designed for the distribution of interactive keys. For more background information, reviews and lists of interactive key software, see Pankhurst (1991 and 1999) and Dallwitz et al. (2000 and onwards).
Intkey. Intkey is part of the DELTA package. The package includes the programs needed to create Intkey datasets: the DELTA Editor, Confor for translating DELTA data into the Intkey format, and Intimate for manipulating images.
Linnaeus II from ETI. Linnaeus II supports the creation of taxonomic databases, optimizes the construction of easy-to-use identification keys, expedites the display and comparison of distribution patterns, and promotes the use of taxonomic data for biodiversity studies.
Lucid (now in version 3, and with a Lucid Phoenix version) is probably the most widely used authoring package. It comes in two parts: the builder for making keys and the player for running them. Both can be downloaded from the website.
XID allows users to rank characters on usability, so that conspicuous and easily interpreted characters can be scored ahead of difficult characters.
Characters and e-keys
The underlying data in these programs is in the form of discrete variables (e.g. red, blue, green; 4 petals, 5 petals) or continuous variables (e.g. petiole 2.2-59 cm long). Most identification software is based on a matrix or table of taxa versus either characters or character states. The matrix may consist of presence or absence scores (1 or 0), but some programs add refinements. In some programs, variable characters such as ‘length of leaf’ are scored as present or absent within particular ranges of values. The terminology in the literature is not standardised, but characters, such as petal colour, are often called “features”, while character states, e.g. red, white, yellow, may be called “attributes”. A data standard called DELTA (DEscription Language for TAxonomy) has been widely adopted for coding taxonomic descriptions as ASCII text for identification systems. However, plants are variable by nature, herbarium collections and our knowledge are not always complete, characters can be difficult to interpret and the specimen to be identified is very often incomplete, all of which add a degree of imprecision.
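A minimal Python sketch of how a continuous character might be handled, assuming each taxon is scored with a (minimum, maximum) range. `match_numeric` compares an observation directly with the ranges, accepting either a single value or a range that expresses uncertainty; `to_binned` shows the alternative of scoring the range as present or absent within fixed classes. The taxa, ranges and class limits are hypothetical.

```python
# Hypothetical petiole lengths (cm), scored as (minimum, maximum) for each taxon.
PETIOLE_CM = {
    "Taxon 1": (2.2, 5.0),
    "Taxon 2": (4.0, 12.0),
    "Taxon 3": (15.0, 59.0),
}


def overlaps(a, b):
    """True if closed intervals a and b share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


def match_numeric(scores, observed):
    """Taxa whose recorded range is compatible with the observation.

    observed may be a single measurement or a (low, high) range that
    expresses the user's uncertainty.
    """
    if isinstance(observed, (int, float)):
        observed = (observed, observed)
    return {taxon for taxon, rng in scores.items() if overlaps(rng, observed)}


def to_binned(value_range, classes):
    """Score a range as present (1) or absent (0) in each fixed class.

    Classes are closed intervals, so a value on a shared boundary counts in both.
    """
    return [int(overlaps(value_range, c)) for c in classes]


print(sorted(match_numeric(PETIOLE_CM, 4.5)))       # ['Taxon 1', 'Taxon 2']
print(sorted(match_numeric(PETIOLE_CM, (10, 20))))  # ['Taxon 2', 'Taxon 3']
print(to_binned(PETIOLE_CM["Taxon 1"], [(0, 2), (2, 10), (10, 60)]))  # [0, 1, 0]
```

The direct comparison keeps the full precision of the measurement, whereas binning loses information at the class limits.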
Identification proceeds by the software querying the database to select all the taxa exhibiting particular character states. The user selects further characters and character states until only a few taxa, or a single taxon, remain. Once the selection has been narrowed down, the identification is confirmed by comparing the specimen with on-screen images and descriptions, and with herbarium specimens if available.
Some programs allow the key builder to score characters for reliability, ease of use or interpretation, rarity and so on, so that the easiest characters can be considered before the more difficult ones. Lucid adds a refinement in that the underlying data matrix is of taxa versus character states (rather than characters), and the individual character states can be scored for certainty, e.g. present, absent, rarely present, uncertain, or present by misinterpretation.
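The sketch below shows, in Python, how certainty-scored character states might be used during a search. The five categories are those listed above, but the `Score` enum, the `select` function and the rule that ‘uncertain’ and ‘present by misinterpretation’ records are kept only in a lenient search are assumptions for illustration, not Lucid's documented behaviour.

```python
from enum import Enum


class Score(Enum):
    """Certainty categories for a taxon/character-state pair."""
    PRESENT = "present"
    ABSENT = "absent"
    RARE = "rarely present"
    UNCERTAIN = "uncertain"
    MISINTERPRETED = "present by misinterpretation"


# Hypothetical taxa-by-character-states matrix: (character, state) -> Score.
SCORES = {
    "Taxon 1": {("latex", "yellow"): Score.PRESENT, ("petals", "5"): Score.RARE},
    "Taxon 2": {("latex", "yellow"): Score.UNCERTAIN, ("petals", "5"): Score.PRESENT},
    "Taxon 3": {("latex", "yellow"): Score.ABSENT, ("petals", "5"): Score.MISINTERPRETED},
}


def keep(score, strict):
    """Decide whether a taxon survives when the user selects a state."""
    if score in (Score.PRESENT, Score.RARE):
        return True
    if score in (Score.UNCERTAIN, Score.MISINTERPRETED):
        return not strict  # doubtful records survive only a lenient search
    return False  # Score.ABSENT


def select(scores, chosen_states, strict=False):
    """Taxa kept for every chosen (character, state); unscored pairs count as uncertain."""
    return {
        taxon for taxon, row in scores.items()
        if all(keep(row.get(state, Score.UNCERTAIN), strict) for state in chosen_states)
    }


print(sorted(select(SCORES, [("latex", "yellow")])))               # ['Taxon 1', 'Taxon 2']
print(sorted(select(SCORES, [("latex", "yellow")], strict=True)))  # ['Taxon 1']
```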
An error-tolerance facility in some interactive software is particularly useful for inexperienced botanists or when specimens are difficult to interpret, incomplete or poorly collected. Programs such as Lucid and Intkey return all the taxa that match the chosen characters and discard the rest. However, users can set an error tolerance so that the programs also return taxa that fail to match up to that number of the chosen characters. For example, if six characters are chosen with an error tolerance of two, the programs will return all taxa matching six, five or four of the specified characters. A related approach would be to rank taxa by the number of matching characters rather than discarding any. This has not been implemented in any of the programs reviewed, but Lucid does have the facility to list taxa by percentage similarity (i.e. the percentage of matching characters).
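Both approaches in this paragraph can be sketched in a few lines of Python: `tolerant_match` keeps taxa that fail at most a given number of the chosen characters, and `rank_by_similarity` keeps every taxon and orders them by the percentage of chosen characters matched. The function names and the coded example taxa are hypothetical, and the details of Lucid's and Intkey's own calculations may differ.

```python
# Hypothetical taxa coded for six characters c1..c6, one state letter each.
CODES = {"Taxon 1": "aaaaaa", "Taxon 2": "aaaaab", "Taxon 3": "aaaabb", "Taxon 4": "aaabbb"}
MATRIX = {
    taxon: {f"c{i}": {state} for i, state in enumerate(code, start=1)}
    for taxon, code in CODES.items()
}


def mismatches(row, observations):
    """Number of chosen (character, state) pairs that this taxon does not show."""
    return sum(1 for c, s in observations if s not in row.get(c, set()))


def tolerant_match(matrix, observations, tolerance=0):
    """Taxa that fail at most `tolerance` of the chosen characters."""
    return {t for t, row in matrix.items() if mismatches(row, observations) <= tolerance}


def rank_by_similarity(matrix, observations):
    """Every taxon with its percentage of chosen characters matched, best first."""
    if not observations:
        raise ValueError("choose at least one character")
    n = len(observations)
    ranked = [(t, 100.0 * (n - mismatches(row, observations)) / n)
              for t, row in matrix.items()]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# The specimen shows state "a" for all six characters.
observed = [(f"c{i}", "a") for i in range(1, 7)]
print(sorted(tolerant_match(MATRIX, observed)))               # ['Taxon 1']
print(sorted(tolerant_match(MATRIX, observed, tolerance=2)))  # ['Taxon 1', 'Taxon 2', 'Taxon 3']
for taxon, pct in rank_by_similarity(MATRIX, observed):
    print(f"{taxon}: {pct:.0f}%")
```

With six characters and a tolerance of two, taxa matching six, five or four characters are returned, as in the example above; the ranking keeps Taxon 4 as well, at the bottom of the list.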
Botanists often use ‘spot’ characters that, if present, immediately identify a particular taxon. For example, yellow latex is a good spot character for many species of Clusiaceae. Obviously, this depends on context (latex characters are of little use within the Clusiaceae), and characters found in only one taxon in the current context are the least efficient characters for constructing short keys. However, with small numbers of taxa they can be a quick shortcut to an identification. Lucid will list what it calls ‘bingo!’ characters for the remaining taxa. Other useful facilities for selecting characters, comparing taxa or confirming identifications are the listing of diagnostic characters and of the differences between the remaining taxa.
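One plausible reading of ‘bingo!’ characters, assumed here rather than taken from Lucid's documentation, is a character state recorded for only one of the taxa still remaining. The Python sketch below lists such states and, for comparing two candidates, the characters whose recorded states do not overlap. The matrix and the `bingo_states` and `differences` functions are hypothetical. Note how the result depends on which taxa remain, just as a spot character depends on context.

```python
# Hypothetical matrix: taxon -> {character: set of recorded states}.
MATRIX = {
    "Taxon 1": {"latex": {"yellow"}, "leaf arrangement": {"opposite"}, "petals": {"4"}},
    "Taxon 2": {"latex": {"absent"}, "leaf arrangement": {"alternate"}, "petals": {"5"}},
    "Taxon 3": {"latex": {"white"}, "leaf arrangement": {"opposite"}, "petals": {"4", "5"}},
    "Taxon 4": {"latex": {"absent"}, "leaf arrangement": {"opposite"}, "petals": {"5"}},
}


def bingo_states(matrix, remaining):
    """Map each remaining taxon to the (character, state) pairs found in it alone."""
    owners = {}
    for taxon in remaining:
        for character, states in matrix[taxon].items():
            for state in states:
                owners.setdefault((character, state), set()).add(taxon)
    result = {}
    for pair, taxa in owners.items():
        if len(taxa) == 1:
            (only,) = taxa
            result.setdefault(only, []).append(pair)
    return {taxon: sorted(pairs) for taxon, pairs in result.items()}


def differences(matrix, a, b):
    """Characters scored for both taxa whose recorded states do not overlap."""
    shared = set(matrix[a]) & set(matrix[b])
    return sorted(c for c in shared if not (matrix[a][c] & matrix[b][c]))


print(bingo_states(MATRIX, set(MATRIX)))
print(bingo_states(MATRIX, {"Taxon 2", "Taxon 4"}))
print(differences(MATRIX, "Taxon 1", "Taxon 4"))  # ['latex', 'petals']
```

Among all four taxa, Taxon 4 has no unique state, but once only Taxon 2 and Taxon 4 remain, leaf arrangement separates them.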
Imagery
Identification with an interactive key should lead to a single taxon or a small group of taxa. The identification is confirmed by comparing the unknown specimen with images on screen and possibly with reference specimens. To be confident of the identification, it is often useful to compare the specimen with images of the nearest matches, not just a single taxon. Interactive keys should therefore be able to display images of several taxa on screen at the same time. Lucid and Intkey allow simultaneous display of images in separate windows, but this can be slow if the windows have to be opened individually. Unfortunately, most interactive key programs do not yet have the zoom-in functionality taken for granted in even the most basic image manipulation programs. XID displays images much like a typical web page, with rows of images that can be scrolled up and down. Of the programs reviewed, this is the quickest and most effective way of displaying multiple images. Unfortunately, XID will only handle one image per taxon, so composite images have to be built if images of different aspects of the plant are needed. Linnaeus II integrates maps and allows 3-D visualisations.
Other links
Neural network for leaf recognition