Student: Phyo Thiha

Swarthmore College, PA

Mentor: William M. Pottenger, PhD

Associate Research Professor

Computer Science and DIMACS

Rutgers University

DIMACS Summer 2008 REU

Project Description

Coming soon…

If you can’t wait, please check the links to the presentation slides provided below ;-)

Log

 

Action

Start

End

1.

Reading for the Literature Search

• “The Power of Word Clusters for Text Classification” (Slonim & Tishby)

• “A Novel Bayesian Classifier for Sparse Data” (Ganiz & Pottenger)

• “Mining Higher-Order Association Rules from Distributed Named Entity Databases” (Li, Janneck, et al.)

• Ian H. Witten and Eibe Frank, “Data Mining: Practical Machine Learning Tools and Techniques”, 2nd Edition, Morgan Kaufmann, San Francisco, 2005.

 

Week 1

Week 1

Week 1

 

 

Week 1

 

Week 1

Week 1

Week 2

 

 

Present

2.

Reading and working to understand the code “IBA_1.0: Information Bottleneck Clustering (2003)”, provided at

URL: http://www.princeton.edu/~nslonim/

Week 2

Week 2

3.

Wrote summaries of each paper listed above and of Chapters 1, 2, 3, and 4 of “Data Mining: Practical Machine Learning Tools and Techniques”

Weeks 1, 2, 3, 4

Weeks 1, 2, 3, 4, respectively

4.

• Presentation 1: Introduction to my project and plans

• Contacted Slonim; trying to tackle his paper about word clusters in more detail

Tuesday, June 17

 

5.

• Changed focus to HONB modification; read the code Murat wrote; re-read the paper about HONB mentioned above

• Assigned to run experiments to find the best results of SMO in Weka 3

• Also assigned to skim documentation on software design and learn about the HONB software architecture; asked to stop after we switched focus to running SMO tests

Week 4

 

Week 4

Week 4

 

 

Week 5

Week 4

6.

• Wrote a summary of the findings on the SMO results and handed it to Professor Pottenger

• Re-read Chapter 5 and part of Chapter 6 of “Data Mining: Practical Machine Learning Tools and Techniques”

• Started studying the code that filters higher-order paths in HONB

Early Week 5

 

Week 5

 

End of Week 5

 

 

Early Week 6

 

 

7.

• Got a reply from Murat and used the code snippet he provided to modify HONB to obtain pure higher-order paths; decided to give up and change course

• Ran experiments for pure/filtered HONB

• Wrote a summary of the findings and handed in a report to the professor

Early Week 6

 

 

Week 6

End of Week 6

 

 

 

End of Week 6

8.

• Re-read Chapters 5 and 6 of the textbook; started brainstorming potential project ideas

• Prepared for the REU final presentations

• Assigned the task of correcting the frequency of occurrences in the HONB code

• Final presentation done; updated the webpage

End of Week 6

 

Early Week 7

Mid Week 7

Thursday, July 17

 

Week 7

 

 

 

 

Resources & Links

1. URL: http://www.princeton.edu/~nslonim/

⇒ Noam Slonim’s webpage. A good place to see his work (related to IB and word clustering) and to retrieve papers written by him.

2. URL: http://citeseer.ist.psu.edu/

⇒ CiteSeer.IST. An awesome digital library for downloading scientific research papers.

3. First Presentation Slide

4. Final Presentation Slide
