Christopher Ross

Project #: DIMACS2003-07

Mentor: S. Muthu Muthukrishnan, Department of Computer Science

Project Title: Monitoring methods for topic drift in message streams

   Consider the series of emails one receives. Hot topics appear and topics that were once hot disappear over time. How to develop mathematics needed to track the topics of current interest, design algorithms for analyzing this process and build systems to monitor the email stream? These issues are relevant to many applications including in homeland security.

   All aspects of this problem need to be understood: Markov Modelling methods in Mathematics, Streaming algorithms for tracking topics, Usable software for this problem, are some of the examples.

References:

   S. Muthukrishnan. Data Streams: Algorithms and Applications
http://athos.rutgers.edu/~muthu/stream-1-1.ps

   J. Kleinberg. Bursty and Hierarchial Structure in Streams
http://www.cs.cornell.edu/home/kleinber/bhs.ps

   Y. Zhu, D. Shasha. Efficient Elastic Burst Detection in Data Streams
http://www.cs.nyu.edu/cs/faculty/shasha/papers/burst.d/burst.pdf

Completed Steps:
   (1) Write JAVA code to implement Kleinberg's interval-processing algorithm.
   (2) Write JAVA code to implement Kleinberg's periodic algorithm.
   (3) Write JAVA code to implement the Zhu/Shasha algorithm.
   (4) Compare the efficiency and precision of the algorithms in (2) and (3).

Current Steps:
   (5) Formalize an exact definition of the "distance" between two burst-result strings.
   (6) Research applicability of Haar wavelets.

Final Project:
   PowerPoint Presentation.
   Please note that this is only a rough outline - I become rapidly annoyed by those that think a presentation is reading their slides verbatim.