ISSAC 2004

Extracting Mathematical Expressions From Postscript Documents

M. Yang, R. Fateman

Full-text indexing of documents containing mathematics cannot be considered a complete success unless the mathematical symbolism is extracted and represented in a standardized form that permits both searching for formulas and further use of this information in, for example, computer algebra systems. Most documents produced in the past and subsequently digitally encoded, and even most of those potentially "born digital" in current journal production, are at best encoded in a printer form such as Adobe Postscript \cite{Postscript}, in which mathematics is not explicitly marked or easily identifiable. While one might look forward to other document encodings such as MathML, the current journal or textbook product is essentially without semantic content: a jumble of odd characters. We demonstrate an approach to decoding a Postscript document and to recognizing and extracting the mathematical expressions it contains.
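The sketch below illustrates why a printer-form file carries no explicit mathematical structure. It is a minimal Python sketch and not the authors' method. It assumes a toy PostScript fragment in which every glyph is placed by an explicit `x y moveto (text) show` sequence. Real output from TeX drivers redefines operators and uses custom font encodings, so the regex, the helper names `extract_shows` and `to_linear`, and the 2-point raise threshold are all hypothetical illustrations. A superscript survives only as a glyph drawn slightly higher on the page, so its role has to be inferred from geometry.

```python
import re
from typing import List, Tuple

# (text, x, y) for one "show" operation on the page.
Glyph = Tuple[str, float, float]

# Assumed toy pattern: "<x> <y> moveto (<text>) show".
# Backslash escapes inside the string are kept as-is, not decoded.
SHOW_RE = re.compile(
    r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+moveto\s+"
    r"\(((?:[^()\\]|\\.)*)\)\s+show"
)

# Hypothetical fragment: "x^2 + 1" as a printer would place it.
SAMPLE_PS = """
100 700 moveto (x) show
106 704 moveto (2) show
112 700 moveto (+) show
120 700 moveto (1) show
"""


def extract_shows(ps: str) -> List[Glyph]:
    """Collect (text, x, y) for each moveto/show pair in the toy input."""
    return [
        (m.group(3), float(m.group(1)), float(m.group(2)))
        for m in SHOW_RE.finditer(ps)
    ]


def to_linear(glyphs: List[Glyph], raise_threshold: float = 2.0) -> str:
    """Linearize glyphs left to right, marking raised runs as superscripts.

    The baseline is taken from the leftmost glyph; any glyph more than
    raise_threshold units above it is treated as part of a superscript.
    """
    if not glyphs:
        return ""
    ordered = sorted(glyphs, key=lambda g: g[1])  # sort by x position
    baseline = ordered[0][2]
    out: List[str] = []
    in_sup = False
    for text, _x, y in ordered:
        raised = y - baseline > raise_threshold
        if raised and not in_sup:
            out.append("^{")  # entering a superscript run
        elif not raised and in_sup:
            out.append("}")  # back on the baseline
        in_sup = raised
        out.append(text)
    if in_sup:
        out.append("}")
    return "".join(out)


if __name__ == "__main__":
    print(to_linear(extract_shows(SAMPLE_PS)))  # x^{2}+1
```

A fixed vertical threshold is the simplest choice that makes the idea visible. A practical recognizer would also need font-size cues, subscripts, fractions, and the driver-specific encodings this toy input ignores.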

issac2004 @ risc.uni-linz.ac.at