


Extracting Mathematical Expressions From Postscript Documents

M. Yang, R. Fateman


Full-text indexing of documents containing mathematics cannot be considered a complete success unless the mathematical symbolism is extracted and represented in a standardized form that permits both searching for formulas and further use of this information in (for example) computer algebra systems. Most documents produced in the past and subsequently digitally encoded, and even most of those potentially ``born digital'' in current journal production, are at best encoded in a printer form such as Adobe Postscript \cite{Postscript}, in which mathematics is not explicitly marked or easily identifiable. While one might look forward to other document encodings such as MathML, the current journal or textbook product is essentially without semantic content: a jumble of odd characters. We demonstrate an approach to decoding a Postscript document, that is, to recognizing and extracting the mathematical expressions it contains.

  issac2004 @