Extracting Mathematical Expressions From Postscript Documents

M. Yang, R. Fateman

 

Full-text indexing of documents containing mathematics cannot be considered a complete success unless the mathematical symbolism is extracted and represented in a standardized form that permits both searching for formulas and further use of this information in, for example, computer algebra systems. Most documents produced in the past and subsequently digitally encoded, and even most of those potentially ``born digital'' in current journal production, are at best encoded in a printer form such as Adobe Postscript \cite{Postscript}, in which mathematics is not explicitly marked or easily identifiable. While one might look forward to other document encodings such as MathML in the future, the current journal or textbook product is essentially without semantic content: a jumble of odd characters. We demonstrate an approach to decoding a Postscript document: recognizing and extracting the mathematical expressions it contains.
