Information processing

“Computer, summarize this text!”

The human brain is no longer needed to summarize scholarly articles. A computer program can handle the job. Before our journalist’s eyes, a 15-page scientific document was abridged in a fraction of a second. The result (coherent, precise and grammatically correct) took up 12 lines.

The software, still in its experimental stage, is a group project by students in the Applied Computer Linguistics Research Laboratory at the University of Montréal, under the supervision of Guy Lapalme, a professor in the Department of Information Processing and Operational Research. Following the SumUM package, which produced 10- to 15-line summaries of scientific articles, Atefeh Farzindar started looking at jurisprudence texts, which is not as simple a job but one that yields astonishing results. “Currently,” she says, “we are only working on documents in English, but nothing prevents us from branching into other languages.”

Of course, computers don't understand the meaning of the words. As a result, the researchers must adopt various strategies to “teach” them to write summaries. One option is to analyze the work of real flesh-and-blood editors. Where do they pick up information when they summarize a text? In general, they look at the introduction, the conclusion, the titles, the captions and the beginnings of paragraphs. The computer has to take basically the same approach.
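
The editor-inspired strategy described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the laboratory's software: the function names are hypothetical, the document is assumed to arrive as a title plus a list of paragraphs, the first and last paragraphs stand in for the introduction and conclusion, and sentences are split naively on end punctuation.

```python
import re


def split_sentences(paragraph):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph)
    return [s.strip() for s in parts if s.strip()]


def candidate_sentences(title, paragraphs):
    """Collect text from the places a human summarizer tends to look.

    Keeps the title, every sentence of the first and last paragraphs
    (assumed here to be the introduction and the conclusion), and the
    opening sentence of each paragraph in between.
    """
    candidates = [title]
    last = len(paragraphs) - 1
    for i, paragraph in enumerate(paragraphs):
        sentences = split_sentences(paragraph)
        if not sentences:
            continue
        if i == 0 or i == last:
            candidates.extend(sentences)
        else:
            candidates.append(sentences[0])
    return candidates
```

Everything outside these positions is set aside, so later steps have far less text to examine.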

Applied to computers, this method reduces the quantity of text to analyze. The computer performs statistical calculations that identify words occurring with abnormal frequency, determine whether particular words are always associated with other words, and flag words that appear to be keywords. The software saves these significant expressions and then casts them in correct language, reproducing them in a predetermined format.
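
One way to picture the "abnormal frequency" calculation is to compare how often a word appears in the document with how often it appears in ordinary text. The Python sketch below does this with a smoothed frequency ratio. The article does not say which statistic the researchers use, so the ratio, the background corpus, the `smoothing` parameter and all function names are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_scores(document, background, smoothing=1.0):
    """Score each document word by how unusual its frequency is.

    The score is the word's relative frequency in the document divided
    by its (add-one smoothed) relative frequency in a background text.
    Scores well above 1 suggest the word is unusually frequent here.
    """
    doc_counts = Counter(tokenize(document))
    bg_counts = Counter(tokenize(background))
    doc_total = sum(doc_counts.values())
    bg_total = sum(bg_counts.values())
    vocab = len(set(doc_counts) | set(bg_counts))
    scores = {}
    for word, count in doc_counts.items():
        doc_rate = count / doc_total
        bg_rate = (bg_counts[word] + smoothing) / (bg_total + smoothing * vocab)
        scores[word] = doc_rate / bg_rate
    return scores


def top_keywords(document, background, n=5):
    """Return the n highest-scoring words, ties broken alphabetically."""
    scores = keyword_scores(document, background)
    return sorted(scores, key=lambda w: (-scores[w], w))[:n]
```

With a legal text as `document` and general-language text as `background`, common words such as "the" score low even when they are frequent, while domain terms rise to the top.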

Atefeh Farzindar is working in conjunction with the Centre for Research into Public Law in the Faculty of Law, which is providing her with a large quantity of digital documents. While the software is already partly functional, it remains to be determined what information absolutely has to appear in the summary. Algorithms also have to be developed that will enable the computer to differentiate between expressions like “telephone call” and “call to the bar.”

 

Researcher: Atefeh Farzindar
Supervisor: Guy Lapalme, lapalme@iro.umontreal.ca
Email: Atefeh.farzindar@umontreal.ca
Telephone: (514) 343-2145
 


Université de Montréal, Direction des communications et du recrutement