HLT 2002 Conference Program

The following tentative schedule gives a sense of how the conference will proceed. Times are subject to change, though every effort will be made to keep the opening and closing times of the conference fixed so that people can make travel arrangements. Paper presentation times may shift slightly, but no presentation is expected to move to a different day, or even between morning and afternoon.

Sunday, March 24, 2002

11:00am Registration Opens
1:30pm Conference Convenes
1:30 Special Focus Tutorials: Language Processing of Biological Data
1:30 Special Focus Tutorial 1

NLP Techniques for Information Extraction from Biological Documents
– Resource building and our experience
Professor J. Tsujii, University of Tokyo (Japan)

Demand for Information Extraction (IE) and text mining has been increasing rapidly in the biological and medical sciences. Most ongoing projects work with Medline abstracts, since Medline is the largest collection of papers in these fields. Compared with newspaper articles, reports, etc., the abstracts in these fields have peculiar characteristics that make the IE task harder than the tasks we have treated so far. In particular, complex term formations, systematic metaphor, numerous semantic classes, and various types of coordination and parenthetical expressions pose serious challenges for existing NLP techniques. In this tutorial, I will talk about our experience with IE in these fields, together with our resource-building efforts.

2:50 Break
3:10 Special Focus Tutorial 2

Profile HMMs and other grammatical models of sequences
Professor Richard Hughey, University of California at Santa Cruz (USA)

Since their introduction to biological sequence analysis a decade ago, hidden Markov models (HMMs) have become a standard tool for sequence alignment and remote homology detection. This tutorial examines the effective use of profile HMMs and provides a taste of other modeling techniques such as generalized HMMs and stochastic context-free grammars.

4:30 Break
4:50 Welcome
5:00 Keynote, “Technology Meets the Entertainment Industry: Building Virtual Humans for Immersive Training”
William Swartout, Institute for Creative Technologies, USC

6:00 PAPERS: Across Human Language Technologies

・ “Asynchronous Modeling for Audio-Visual Speech Recognition”
Guillaume Gravier, Gerasimos Potamianos, Chalapathy Neti (IBM Thomas J. Watson Research Center)

・ “Arabic Speech and Text in TIDES OnTAP”
Jayadev Billa, Mohamed Noamany, Amit Srivastava, John Makhoul, Francis Kubala (BBN Technologies)

・ “Automatic learning of dialogue strategy using dialogue simulation and reinforcement learning”
Konrad Scheffler, Steve Young (Cambridge University)

7:20 Reception (“light” hors d’oeuvres)
8:45 Adjourn

Monday, March 25, 2002

7:30am Breakfast (provided)

8:30am PAPERS: Speech Recognition

・ “Selective Sampling of Training Data for Speech Recognition”
Teresa Kamm, Gerard G.L. Meyer (Johns Hopkins University)

・ “DynaSpeak: SRI’s Scalable Speech Recognizer for Embedded and Mobile Systems”
Horacio Franco, Jing Zheng, John Butzberger, Federico Cesari, Michael Frandsen, Jim Arnold, Ramana Rao, Andreas Stolcke, Victor Abrash (SRI International)

・ “Word and Sub-word Indexing Approaches for Reducing the Effects of OOV Queries on Spoken Audio”
Beth Logan, Pedro J. Moreno, Om Deshmukh (Compaq Computer Corporation)

・ “Beyond the Phoneme: A Juncture-Accent Model of Spoken Language”
Steven Greenberg, Hannah Carvey, Leah Hitchcock, Shuangyu Chang (International Computer Science Institute)

10:15 Break
10:45 PAPERS: Summarization

・ “The DUC Summarization Evaluations”
Donna Harman, Paul Over (National Institute of Standards and Technology)

・ “Experiments in Multi-Document Summarization”
Barry Schiffman, Ani Nenkova, Kathleen McKeown (Columbia University)

・ “Automated Multi-document Summarization in NeATS”
Chin-Yew Lin, Eduard Hovy (USC/ISI)

12:05pm Lunch (provided)

1:30pm SPECIAL SESSION ON LANGUAGE PROCESSING OF BIOLOGICAL DATA
1:30 Intro to special session
1:45 Invited Talk:
Statistical NLP approaches for annotating genes and gene clusters
Dr. Russ Altman,
President, International Society for Computational Biology,
Director, Biomedical Informatics Training Program
Stanford University Medical Center, Stanford University (USA)

Bioinformatics has been driven by a series of data explosions: sequence data, structure data, and functional data (most recently from microarray expression experiments). Another data explosion is the availability of text describing major biological results. Most important biomedical literature since 1966 has been indexed in Medline and is available on the web at PubMed (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed). The literature is an important source of information to help make sense of the other data explosions. In this talk, I will review some of the challenges for natural language processing in biology, and discuss statistical techniques that my laboratory has used for adding knowledge derived from text to the tasks of 1) improving sequence homology searches, 2) assigning controlled terminologies to free text discussions, 3) evaluating the biological coherence of a group of genes, and 4) creating a lexicon of abbreviations.

2:50 PAPER: Biology and Natural Language Processing

・ “Formal grammars for estimating conformation counts of double-stranded chain molecules”
David Chiang, Aravind Joshi (University of Pennsylvania)

3:15 Break (poster setup during this break)
3:45 PAPERS: Biology and Natural Language Processing

・ “Massive Bio-Ontology Engineering for NLP”
Udo Hahn, Stefan Schulz (Freiburg University)

・ “Comparative n-gram analysis of whole-genome protein sequences”
M. Ganapathiraju, J. Klein-Seetharaman, R. Rosenfeld, J. Carbonell, R. Reddy (Carnegie Mellon University)

・ “The GENIA Corpus: An Annotated Research Abstract Corpus in Molecular Biology Domain”
Yuka Tateisi, Tomoko Ohta, Jin-Dong Kim, Hideki Mima, Jun’ichi Tsujii (CREST, JST)

3:45 Discussion

5:30 Boaster session for poster session
6:15 Poster session with reception (“heavy” hors d’oeuvres)
9:30 Adjourn

Tuesday, March 26, 2002

7:30am Breakfast (provided)
8:50am PAPERS: Text Understanding

・ “A Knowledge-Rich Approach to Understanding Text about Aircraft Systems”
Peter Clark, Lisabeth Duncan, Heather Holmback, Tom Jenkins, John Thompson (Boeing)

・ “Can We Derive General World Knowledge from Texts?”
Lenhart Schubert (University of Rochester)

9:45 Boaster session for demonstrations
10:30 Demonstrations (“Science Fair”)

12:30pm Lunch (provided)

2:00 PAPERS: Information Retrieval & Text Tracking and Detection

・ “A Formal Approach to Score Normalization for Metasearch”
R. Manmatha, H. Sever (University of Massachusetts, Amherst)

・ “Quantifying Query Ambiguity”
Steve Cronen-Townsend, W. Bruce Croft (University of Massachusetts)

・ “An Algorithm for Unsupervised Topic Discovery from Broadcast News Stories”
Sreenivasa Sista, Richard Schwartz, Timothy R. Leek, John Makhoul (BBN Technologies)

・ “Relevance Models for Topic Detection and Tracking”
Victor Lavrenko, James Allan, Edward DeGuzman, Daniel La Flamme, Veera Pollard, Steven Thomas (Center for Intelligent Information Retrieval)

3:45 Break
4:15 PAPERS: Machine Translation and Multilingual Systems

・ “Named Entity Translation”
Yaser Al-Onaizan, Kevin Knight (USC/ISI)

・ “Speaker, Accent, and Language Identification using Multilingual Phone Strings”
Tanja Schultz, Qin Jin, Kornel Laskowski, Alicia Tribble, Alex Waibel (Carnegie Mellon University)

・ “Corpus-based Comprehensive and Diagnostic MT Evaluation: Initial Arabic, Chinese, French, and Spanish Results”
Kishore Papineni, Salim Roukos, Todd Ward, John Henderson, Florence Reeder (IBM & MITRE)

・ “Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics”
George Doddington (NIST)

6:00 Banquet (provided)
8:00 Plenary Demonstration Session
9:30 Adjourn

Wednesday, March 27, 2002

7:30am Breakfast (provided)
8:30 Panel of Government Sponsors
Gary Strong, National Science Foundation
Charles Wayne, DARPA
John Prange, ARDA
James Bass, DARPA
(other speakers to be arranged)

9:45 Break
10:15 PAPERS: Across Human Language Technologies

・ “Does Confidence Annotation Meet the Dialog Goal?: A Quantitative Analysis”
Kadri Hacioglu, Wayne Ward (The Center for Spoken Language Research)

・ “Statistical Answer-Type Identification in Open-Domain Question Answering”
John Prager, Jennifer Chu-Carroll, Krzysztof Czuba (IBM T.J. Watson Research Ctr.)

・ “Japanese Spoken Document Retrieval Considering OOV Keywords Using OOV Detection Processing and Word Spotting”
Hiromitsu Nishizaki, Seiichi Nakagawa (Toyohashi University of Technology)

・ “An Adaptive Approach of Name Entity Extraction for Meeting Application”
Fei Huang, Alex Waibel (Language Technology Institute, Carnegie Mellon University)

12:00 noon Wrap-up session
12:30pm Conference Ends
