Appendix III
Notes on the philosophy of the FOTOFIND Program
An experienced archivist or collector can weigh at a glance
many observational and subconscious details and come to a
conclusion that has a certain probability of being right. Such
a judgment will always be subjective; it may be biased for or
against rarities, or a decision may be rendered in haste that
should be deferred for more detailed analysis. There is a need
for improved decisions based on better quantified data. Because
of the widespread availability of microcomputers for data
retrieval and keyword sorting, it was decided to explore
computer programs for sorting photographic identification data.
Obviously it is not necessary to use a computer to tell the
difference between glass and paper photographs, but the problem
is more complicated than that. Section 2 lists sixty-nine types
and fifty-nine synonyms or closely related processes, including
rarities and non-commercial processes.
Previously published flow charts have been forced to disregard
some of the rarities and ignore synonyms as a concession to
convenience. A printed flow chart has room for only short
queries and abbreviated conclusions: there is little room for
text unless the chart assumes the dimensions of wallpaper.
A general purpose commercial database program was tested, but
the built-in sorting procedure turned out to be completely
unsuitable for a variety of reasons. The need for a special
program was evident, and an exhaustive search of available
sources of descriptive data on old photographs was undertaken.
An interactive computer program can be designed to formalize
decision-making in a linear progression: a new decision is not
considered until the current one is resolved, encouraging a
certain amount of mental discipline. Flow charts are used in
the same way, but when the whole chart is visible, our eyes
tend to wander along several paths, and linear progression
breaks down when indecision causes vacillation.
Three different computer algorithms and numerous revisions were
tried in attempting to develop workable logic. The first
approach was basically a computerization of the type of flow
charts found in Coe & Haworth-Booth [32], Gill [67], and
Rempel [124]. Twenty-five questions were formulated for yes/no
answers; after each answer the program branched to another
question that depended on the previous answer. Usually a
conclusion could be reached after about half the questions, so
the operator did not have to go through all twenty-five.
If the operator was uncertain which answer to give to a
particular question, it was suggested that two runs be made
with that question answered both ways and the results compared.
This approach simulated the use of a flow chart, with the
advantage that the computer could present more detailed
questions and answers. It also provided a printout of the
questions and answers for filing a permanent record with each
picture.
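The branching scheme of this first approach can be sketched in a
few lines. The Python fragment below is only an illustration, not
the original BASIC code: the three questions, the group
conclusions, and the names TREE, run_chart, and compare_runs are
hypothetical stand-ins for the twenty-five questions described
above. It keeps a transcript of the questions asked and shows the
suggested practice of answering a doubtful question both ways.

```python
# Hypothetical miniature of a yes/no flow chart (not FOTOFIND's data).
# Each node holds a question and the next step for "y" and for "n".
# A next step that is not a key of TREE is a conclusion.
TREE = {
    "start": {"ask": "Is the support glass?",
              "y": "glass", "n": "paper"},
    "glass": {"ask": "Is the image a positive?",
              "y": "Group A: glass positives",
              "n": "Group B: glass negatives"},
    "paper": {"ask": "Is the surface glossy?",
              "y": "Group C: glossy paper prints",
              "n": "Group D: matte paper prints"},
}


def run_chart(answers, tree=TREE, node="start"):
    """Follow the chart using a dict of {question: "y" or "n"}.

    Returns (conclusion, transcript). The transcript lists only the
    questions actually asked, in order, with the answers given.
    """
    transcript = []
    while node in tree:                 # stop when a conclusion is reached
        question = tree[node]["ask"]
        answer = answers[question]
        transcript.append((question, answer))
        node = tree[node][answer]       # branch on the answer
    return node, transcript


def compare_runs(answers, doubtful_question):
    """Run the chart twice, answering one doubtful question both ways."""
    results = {}
    for choice in ("y", "n"):
        trial = {**answers, doubtful_question: choice}
        results[choice], _ = run_chart(trial)
    return results
```

Because only the questions on the chosen path are asked, the
transcript is normally shorter than the full list of questions,
and it can be printed and filed with the picture.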
This program worked fairly well, but an awkward flaw became
apparent during trials. The rigidity of yes/no answers caused
confusion because of imprecise descriptors, and the
preprogrammed conclusions could only suggest groups of many
possible identifications. Some descriptors are easy (paper
versus glass), but color can be both a misleading indicator and
a useful clue. Of course this is a fundamental problem in
identifying photographs, and a computer cannot be expected to
be smarter than the data it contains.
The final FOTOFIND program is based on matching keywords and
is more user-friendly. It also uses a fundamentally different
approach to the problem of uncertainty that makes it a useful
learning tool. The program operates as follows:
The user's answers are read into a temporary memory array along
with the same number of corresponding descriptors for the first
identification candidate stored in memory. The answers are then
sequentially compared to the candidate descriptors in a series
of tests. Each test decides whether to reject the candidate. If
there is a definite mismatch in any one of the tests, the
candidate is rejected and the program moves on to the next
candidate in memory. If no test rejects the candidate and none
of the answers is uncertain, that candidate is printed as a
definite "ID".
If the user answered "u" for "uncertain" in any question, the
program treats this as a conditional acceptance rather than
rejection. If further answers do not cause definite rejection
of that candidate, it will be printed as a "possible ID". It is
then up to the user to decide whether to seek further
information to clarify the uncertainty and narrow the
possibilities.
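The matching procedure just described can be sketched as follows.
This is a minimal Python illustration written for this appendix,
not a translation of the BASIC source: the three questions, the
descriptor rows, and the names QUESTIONS, CANDIDATES, and classify
are assumptions chosen to keep the example short.

```python
# Hypothetical descriptor table: one row of keywords per candidate,
# in the same order as the questions. Illustrative entries only.
QUESTIONS = ("support", "magnetic", "blue")
CANDIDATES = {
    "tintype":   ("metal", "y", "n"),
    "ambrotype": ("glass", "n", "n"),
    "cyanotype": ("paper", "n", "y"),
}


def classify(answers, candidates=CANDIDATES):
    """Return (definite, possible) lists of candidate names.

    answers holds one keyword per question, or "u" for uncertain.
    """
    definite, possible = [], []
    for name, descriptors in candidates.items():
        uncertain = False
        rejected = False
        for answer, descriptor in zip(answers, descriptors):
            if answer == "u":
                uncertain = True    # conditional acceptance, keep testing
            elif answer != descriptor:
                rejected = True     # definite mismatch, drop the candidate
                break
        if rejected:
            continue
        if uncertain:
            possible.append(name)   # survived, but with an open question
        else:
            definite.append(name)   # every answer matched definitely
    return definite, possible
```

With this table, classify(("paper", "n", "y")) gives cyanotype as
a definite ID, while classify(("u", "n", "u")) gives ambrotype and
cyanotype as possible IDs. Contradictory answers such as
("paper", "y", "u") reject every candidate.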
The program has provision for printing a report, including both
answers and results, with the photo inventory number, so that
it can be filed with the photo. It is suggested that archival
paper be used for such reports.
The program makes several thousand decisions in a few seconds
for a single unknown paper photograph. Since the paper types in
memory outnumber glass or other types, paper searches take a
little longer. The difference is almost imperceptible on modern
personal computers.
The number of possible identifications depends on the
information available. For example, tintypes are always
magnetic, and transferotypes might be. If "y" is given in
answer to the magnetic question, the identifications "tintype"
and "transferotype" will be returned even if all the other
answers are "u". Answering "u" to all questions returns every
type in memory, which is a convenient way to list all the
candidates.
When more than one identification is returned, the detailed
descriptions elsewhere in the book should be consulted. If
incorrect or inconsistent answers are given by the user, then
no identification will be returned by the program.
The return of more than one ID or possible ID is not an ideal
outcome; computers, like experts, are expected to give
unqualified answers. To accomplish this, it will be necessary
to ask better questions and to store more definitive
descriptors for the types that closely resemble each other.
Not all descriptors are definitive; indeed, this is a
fundamental problem of judgment in all identification
processes. An example of ambiguity is the color of old
photographs. Many paper prints show shades of brown, whether
from fading, toning, or process characteristics, and the color
may be only a secondary clue. In other cases, such as blue
cyanotypes or black printers' ink, the color is a useful
descriptor. In designing the DATA array, certain descriptors in
memory were censored so that they are inactive even if the
user enters what is thought to be a definite answer.
Another example of the difficulty of using color as an
identifier is the case of calotypes, or salt prints. Variations
in chemical processing and light exposure could produce colors
ranging from dark brown to light green, as discussed by DuBose
[45]. If FOTOFIND were programmed to recognize every possible
hue, chroma, and luminance, a large number of other processes
would also be candidates. To prevent confusion, the comparison
data in FOTOFIND were coded to ignore certain keyboard answers
to the color question as applied to calotypes and a few other
processes.
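One way to picture a censored descriptor is as a wildcard stored
in the comparison data. The Python sketch below is an assumption
about how such a scheme might look, not the layout of the actual
DATA array: the marker ANY, the two-question rows, and the names
mismatch and survivors are all hypothetical. It also reproduces
the magnetic example given earlier, in which a transferotype
"might be" magnetic. Whether a candidate that survives through a
censored descriptor counts as a definite or a possible ID is not
specified here, so the sketch reports survivors only.

```python
ANY = "*"   # hypothetical marker for a censored (inactive) descriptor

# Illustrative rows of (magnetic, color). Not the FOTOFIND DATA array.
CANDIDATES = {
    "tintype":       ("y", ANY),
    "transferotype": (ANY, ANY),     # might or might not be magnetic
    "cyanotype":     ("n", "blue"),  # color is a useful descriptor here
    "calotype":      ("n", ANY),     # color answers are ignored
}


def mismatch(answer, descriptor):
    """True only when a definite answer contradicts an active descriptor."""
    return answer != "u" and descriptor != ANY and answer != descriptor


def survivors(answers, candidates=CANDIDATES):
    """Names of the candidates that no answer definitely rejects."""
    return [
        name
        for name, row in candidates.items()
        if not any(mismatch(a, d) for a, d in zip(answers, row))
    ]
```

Here survivors(("y", "u")) keeps the tintype and the
transferotype, and survivors(("n", "brown")) keeps the calotype
even though a definite color was entered, because its color
descriptor is inactive.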
FOTOFIND attempts to distinguish between some sixty identities
on the basis of only ten questions, and compromises are
inevitable. The questions chosen are, of course, not the only
possible ones, and could probably be improved. Dealing with
observational uncertainty is a basic problem in identification.
In mathematics there is a field of investigation known as
"fuzzy logic", which endeavors to extract meaningful
conclusions from real world data that are full of uncertainty.
It is a difficult problem that often requires the largest and
fastest computers. However, the FOTOFIND program is only a type
of interactive "expert system"; it is an adjustable sieve that
rejects the clear-cut misfits and labels the remainder as
definite or possible identifications. The program is useful in
narrowing the list of candidates and in providing a structure
for future improvement. It will usually yield greater clarity
than eyeball judgment, which all too often is really "fuzzy"
logic.
The program was written and compiled in Microsoft QuickBASIC,
which is a fairly old language (the only one the author knew).
The algorithm treating the problem of uncertain data entry
seems to be original with this author: it was not borrowed from
any other application. The source code contains nearly two
thousand lines; the compiled EXE program requires about 180
kilobytes of memory on a personal computer. The running time
for a worst-case search is about a second. MS-DOS, under which
the compiled program runs, limits file names to eight
characters, which accounts for the spelling of FOTOFIND.