Full-text indexing of documents containing mathematics cannot be
considered a complete success unless the mathematical symbolism is
extracted and represented in a standardized form that permits both
searching for formulas and further use of this information in (for
example) computer algebra systems. Most documents produced in the
past and subsequently digitally encoded, and even most of those
potentially ``born digital'' in current journal production, are, at
best, encoded in a printer format such as Adobe PostScript
\cite{Postscript}, in which mathematics is not explicitly marked or
easily identifiable. While one might look forward to
other document encodings such as MathML, the current journal or
textbook product is essentially without semantic content: a jumble of
odd characters. We demonstrate an approach to decoding a PostScript
document: recognizing and extracting the mathematical expressions
it contains.