Appendix III
Notes on the philosophy of the FOTOFIND Program

An experienced archivist or collector can weigh many observational and subconscious details at a glance and come to a conclusion that has a certain probability of being right. Such a judgment will always be subjective: it may be biased for or against rarities, or a decision that should be deferred for more detailed analysis may be rendered in haste. There is a need for decisions based on better-quantified data. Because microcomputers for data retrieval and keyword sorting are widely available, it was decided to explore computer programs for sorting photographic identification data.

Obviously, a computer is not needed to tell the difference between glass and paper photographs, but the problem is more complicated than that. Section 2 lists sixty-nine types and fifty-nine synonyms or closely related processes, including rarities and non-commercial processes. Previously published flow charts have been forced to disregard some of the rarities and ignore synonyms as a concession to convenience. A printed flow chart has room only for short queries and abbreviated conclusions; there is little room for text unless the chart assumes the dimensions of wallpaper.

A general-purpose commercial database program was tested, but its built-in sorting procedure turned out to be unsuitable for a variety of reasons. The need for a special program was evident, and an exhaustive search of available sources of descriptive data on old photographs was undertaken.

An interactive computer program can be designed to formalize decision-making in a linear progression: a new decision is not considered until the current one is resolved, encouraging a certain amount of mental discipline. Flow charts are used in the same way, but when the whole chart is visible, our eyes tend to wander along several paths, and linear progression breaks down when indecision causes vacillation.

Three different computer algorithms and numerous revisions were tried in attempting to develop workable logic. The first approach was basically a computerization of the type of flow charts found in Coe & Haworth-Booth [32], Gill [67], and Rempel [124]. Twenty-five questions were formulated for yes/no answers; after each answer the program branched to another question that depended on the previous answer. Usually a conclusion could be reached after about half the questions, so the operator did not have to go through all twenty-five. If the operator was uncertain which answer to give to a particular question, it was suggested that two runs be made with that question answered both ways and the results compared. This approach simulated the use of a flow chart, with the advantage that the computer could present more detailed questions and answers. It also provided a printout of the questions and answers to be filed as a permanent record with each picture.
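To make the first approach concrete, here is a minimal Python sketch of a yes/no branching tree of that kind. It is not the original program: the three questions, the node names (`q1`, `q2`, `q3`), and the candidate groups are hypothetical and far shorter than the original twenty-five questions. The helper `run_both_ways` imitates the suggestion of making two runs when one answer is uncertain.

```python
# Hypothetical yes/no tree in the spirit of the first (flow-chart) approach.
# Each question node maps to (question text, node if "yes", node if "no");
# any node name not in QUESTIONS is a leaf naming a group of candidates.
QUESTIONS: dict[str, tuple[str, str, str]] = {
    "q1": ("Is the image on glass?", "glass", "q2"),
    "q2": ("Is the support attracted by a magnet?", "magnetic", "q3"),
    "q3": ("Is the image on a mirror-like metal plate?", "mirror", "paper"),
}

# Illustrative leaf groups: a fixed conclusion can only suggest a group.
GROUPS: dict[str, list[str]] = {
    "glass": ["ambrotype", "glass negative", "lantern slide"],
    "magnetic": ["tintype", "transferotype"],
    "mirror": ["daguerreotype"],
    "paper": ["salt print", "albumen print", "cyanotype"],
}


def run(answers: dict[str, bool]) -> list[str]:
    """Follow the tree from q1, asking only the questions on the path taken."""
    node = "q1"
    while node in QUESTIONS:
        _text, if_yes, if_no = QUESTIONS[node]
        node = if_yes if answers[node] else if_no
    return GROUPS[node]


def run_both_ways(answers: dict[str, bool], uncertain: str) -> tuple[list[str], list[str]]:
    """Two runs with the uncertain question answered 'yes', then 'no'."""
    return run({**answers, uncertain: True}), run({**answers, uncertain: False})
```

Note that a transferotype which happens not to be magnetic falls through to the paper group, where it is not listed at all; fixed branches of this kind illustrate the rigidity described in the next paragraph.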

This program worked fairly well, but an awkward flaw became apparent during trials. The rigidity of yes/no answers caused confusion when descriptors were imprecise, and the preprogrammed conclusions could only suggest groups of many possible identifications. Some descriptors are easy to apply (paper versus glass), but color can be both a misleading indicator and a useful clue. Of course, this is a fundamental problem in identifying photographs, and a computer cannot be expected to be smarter than the data it contains.

The final FOTOFIND program is based on matching key words and is more user-friendly. It also uses a fundamentally different approach to the problem of uncertainty that makes it a useful learning tool. The program operates as follows:

The user's answers are read into a temporary memory array, together with the same number of corresponding descriptors for the first identification candidate stored in memory. The answers are then compared one by one with the candidate's descriptors in a series of tests, each of which decides whether to reject the candidate. If there is a definite mismatch in any one of the tests, the candidate is rejected and the program moves on to the next candidate in memory. If the candidate survives every test, it is printed as a definite "ID".

If the user answered "u" (for "uncertain") to any question, the program treats this as a conditional acceptance rather than a rejection. If the remaining answers do not cause definite rejection of that candidate, it is printed as a "possible ID". It is then up to the user to decide whether to seek further information to clarify the uncertainty and narrow the possibilities.
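The candidate-by-candidate sieve described in the last two paragraphs can be sketched in Python as follows. This is not the original QuickBASIC code: the descriptor table, the question keys `support` and `magnetic`, and the function name `sieve` are hypothetical, and, following the text, any "u" answer is assumed to downgrade every surviving candidate to a possible ID.

```python
# Hypothetical, simplified descriptor table (not FOTOFIND's DATA array):
# for each candidate, the set of answers consistent with that type.
CANDIDATES: dict[str, dict[str, set[str]]] = {
    "tintype":       {"support": {"metal"}, "magnetic": {"y"}},
    "daguerreotype": {"support": {"metal"}, "magnetic": {"n"}},
    "ambrotype":     {"support": {"glass"}, "magnetic": {"n"}},
    "cyanotype":     {"support": {"paper"}, "magnetic": {"n"}},
}


def sieve(answers: dict[str, str]) -> list[tuple[str, str]]:
    """Return (candidate, verdict) for every candidate that is not rejected."""
    results = []
    for name, descriptors in CANDIDATES.items():
        uncertain = False
        rejected = False
        for question, answer in answers.items():
            if answer == "u":
                uncertain = True   # conditional acceptance, not rejection
                continue
            if answer not in descriptors[question]:
                rejected = True    # definite mismatch: drop this candidate
                break              # and go on to the next one
        if not rejected:
            results.append((name, "possible ID" if uncertain else "ID"))
    return results
```

With `{"support": "metal", "magnetic": "u"}` both metal types survive as possible IDs, while the inconsistent answers `{"support": "glass", "magnetic": "y"}` leave an empty list.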

The program can print a report of both answers and results, together with the photo inventory number, so that it can be filed with the photo. Archival paper is recommended for such reports.

The program makes several thousand decisions in a few seconds for a single unknown paper photograph. Because paper types outnumber glass and other types, searches for paper photographs take a little longer, although the difference is almost imperceptible on modern personal computers.

The number of possible identifications depends on the information available. For example, tintypes are always magnetic, and transferotypes might be. If "y" is given in answer to the magnetic question, the identifications "tintype" and "transferotype" will be returned even if all the other answers are "u". Answering "u" to every question returns all the types in memory, which is a convenient way to list the candidates.
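One way to encode "always" versus "might be" is to store, for each type, the set of answers consistent with it. In this hypothetical sketch (the table, the `glass` and `magnetic` keys, and the `survivors` function are illustrative assumptions, not FOTOFIND's DATA array), a tintype accepts only "y" to the magnetic question while a transferotype accepts either answer, and a "u" answer never rejects anything.

```python
# Illustrative data: each descriptor is the set of answers consistent with
# the type. "Always magnetic" -> {"y"}; "might be magnetic" -> {"y", "n"}.
DATA: dict[str, dict[str, set[str]]] = {
    "ambrotype":     {"glass": {"y"},      "magnetic": {"n"}},
    "tintype":       {"glass": {"n"},      "magnetic": {"y"}},       # always
    "transferotype": {"glass": {"y", "n"}, "magnetic": {"y", "n"}},  # might be
    "cyanotype":     {"glass": {"n"},      "magnetic": {"n"}},
}


def survivors(answers: dict[str, str]) -> list[str]:
    """Types not definitely rejected; a 'u' answer passes every test."""
    return [
        name
        for name, desc in DATA.items()
        if all(ans == "u" or ans in desc[q] for q, ans in answers.items())
    ]
```

Here `survivors({"glass": "u", "magnetic": "y"})` keeps only the tintype and the transferotype, and answering "u" to both questions returns every type in the table.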

When more than one identification is returned, the detailed descriptions elsewhere in the book should be consulted. If the user gives incorrect or inconsistent answers, the program returns no identification.

Returning more than one ID or possible ID is not an ideal outcome, since computers, like experts, are expected to give unqualified answers. Achieving this will require better questions and more definitive stored descriptors for the types that closely resemble each other.

Not all descriptors are definitive; indeed, this is a fundamental problem of judgment in all identification processes. An example of ambiguity is the color of old photographs. Many paper prints show shades of brown, whether from fading, toning, or process characteristics, so color may be only a secondary clue. In other cases, such as blue cyanotypes or black printers' ink, color is a useful descriptor. In designing the DATA array, certain descriptors were censored so that they remain inactive even when the user enters what is thought to be a definite answer.

Another example of the difficulty of using color as an identifier is the case of calotypes, or salt prints. Variations in chemical processing and light exposure could produce colors ranging from dark brown to light green, as discussed by DuBose [45]. If FOTOFIND were programmed to recognize all possible hues, chroma, and luminance, a large number of other processes would also become candidates. To prevent confusion, the comparison data in FOTOFIND were coded to ignore certain keyboard answers to the color question for calotypes and a few other processes.
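A censored descriptor can be modeled as a wildcard that skips the test for that type. In the following hypothetical sketch, `None` marks the calotype's color as censored, so a calotype survives whatever color is entered, while the blue cyanotype and a black-ink photomechanical print are still tested on color. The type list, color words, and function names are illustrative assumptions.

```python
from typing import Optional

# Illustrative table; None marks a censored (inactive) descriptor.
DATA: dict[str, dict[str, Optional[set[str]]]] = {
    "cyanotype":             {"color": {"blue"}},
    "photomechanical print": {"color": {"black"}},
    "calotype":              {"color": None},  # hue varies too widely to test
}


def is_rejected(descriptors: dict[str, Optional[set[str]]],
                answers: dict[str, str]) -> bool:
    """True if any active descriptor definitely contradicts an answer."""
    for question, answer in answers.items():
        allowed = descriptors[question]
        if answer == "u" or allowed is None:
            continue  # uncertain answer or censored descriptor: no test
        if answer not in allowed:
            return True
    return False


def candidates(answers: dict[str, str]) -> list[str]:
    """Names of all types that survive the (partly censored) tests."""
    return [name for name, desc in DATA.items() if not is_rejected(desc, answers)]
```

Entering "green" leaves only the calotype, while entering "blue" returns both the cyanotype and the calotype; that extra candidate is the price of censoring.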

FOTOFIND attempts to distinguish between some sixty identities on the basis of only ten questions, and compromises are inevitable. The questions chosen are, of course, not the only possible ones, and could probably be improved. Dealing with observational uncertainty is a basic problem in identification. In mathematics there is a field of investigation known as "fuzzy logic", which endeavors to extract meaningful conclusions from real-world data that are full of uncertainty. It is a difficult problem that can be computationally demanding. The FOTOFIND program, however, is only a type of interactive 'expert system': an adjustable sieve that rejects the clear-cut misfits and labels the remainder as definite or possible identifications. The program is useful in narrowing the list of candidates and in providing a structure for future improvement. It will usually yield greater clarity than eyeball judgment, which all too often is really 'fuzzy' logic.

The program was written and compiled in Microsoft QuickBASIC, a fairly old language (the only one the author knew). The algorithm for handling uncertain data entry appears to be original with this author; it was not borrowed from any other application. The source code contains nearly two thousand lines, and the compiled EXE program requires about 180 kilobytes of memory on a personal computer. The running time for a worst-case search is about a second. MS-DOS, under which QuickBASIC runs, limits file names to eight characters, which accounts for the spelling of FOTOFIND.