REU 2009 -- DIMACS Graph Mining
Contact information
Ye Zhu Stephanie
Email: zzyye@eden.rutgers.edu
Email: stephanie8722@hotmail.com
Website: http://dimax.rutgers.edu/~zzyye/
Project Description
Mentor: James Abello, abello@dimacs.rutgers.edu, DIMACS
Project Title: DIMACS Graph Mining
In this project we will build a variety of multi-graphs that can be
extracted from this data set. We will then use our current set of graph
analysis tools to parse, navigate, visualize and synthesize the
findings. One central challenge is to devise methods that are privacy
preserving.
Progress Report
- Week 1 (June 2nd - June 5th)
- Navigated the Netherlands data set
- Built my own graph. Produced a wadj file and a labeled file, and visualized them using the GraphView software
- Reviewed Gaussian Distribution
- Week 2 (June 8th - June 12th)
- Week 3 (June 15th - June 21st)
- Processed the REU survey
- Mapped the similarity measure from the paper to the REU data set
- Calculated Jaccard coefficients for the unweighted sets (Majors, Languages) and the weighted sets (Non-Academic Interest, Computer Programming Skills)
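
To illustrate the Jaccard coefficients mentioned in Week 3, here is a minimal Java sketch. The unweighted form is the standard |A ∩ B| / |A ∪ B|. The weighted form shown (sum of minima over sum of maxima) is one common generalization and is only an assumption here, since the measure actually taken from the paper is not spelled out on this page. The class name, method names, and sample values are all hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class JaccardSketch {
    // Unweighted Jaccard coefficient: |A intersect B| / |A union B|.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // convention assumed for this sketch
        }
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        int unionSize = a.size() + b.size() - inter.size();
        return (double) inter.size() / unionSize;
    }

    // Weighted Jaccard (assumed form): sum of min weights / sum of max weights.
    // Missing keys count as weight 0; weights are assumed non-negative.
    static double weightedJaccard(Map<String, Double> a, Map<String, Double> b) {
        Set<String> keys = new HashSet<>(a.keySet());
        keys.addAll(b.keySet());
        double minSum = 0.0;
        double maxSum = 0.0;
        for (String k : keys) {
            double x = a.getOrDefault(k, 0.0);
            double y = b.getOrDefault(k, 0.0);
            minSum += Math.min(x, y);
            maxSum += Math.max(x, y);
        }
        return maxSum == 0.0 ? 0.0 : minSum / maxSum;
    }

    public static void main(String[] args) {
        // Hypothetical survey answers for two participants.
        Set<String> majors1 = Set.of("Math", "Computer Science");
        Set<String> majors2 = Set.of("Math", "Physics");
        System.out.println(jaccard(majors1, majors2)); // 1 shared major out of 3 distinct

        Map<String, Double> skills1 = Map.of("Java", 3.0, "Perl", 1.0);
        Map<String, Double> skills2 = Map.of("Java", 2.0, "C", 2.0);
        System.out.println(weightedJaccard(skills1, skills2)); // min sum 2, max sum 6
    }
}
```

Both functions return a value between 0 and 1, so the results could be used directly as edge weights between participants.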
- Week 4 (June 22nd - June 28th)
- Continued working with similarity measures of REU participants
- Developed and debugged the program
- Produced a wadj file and a labeled file for the REU participants and visualized them using the GraphView software
- Week 5 (June 29th - July 5th)
- Checked the clusters in the REU participant graphs and found, by observing the clusters, that the similarity measure made very good sense
- Added vertex weights to the wadj.label file so that the size of the nodes in the graph is now meaningful
- Checked interesting clusters again in the following manner:
  - Choose degree and filter the graph from low to high
  - Choose degree and filter the graph from high to low
  - Choose peel and filter the graph by EdgeWeight from 0 to 1 and then from 1 to 0
- Started working with examples of workshop abstract data records
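
The degree and edge-weight filtering from Week 5 was done interactively in the graph viewing software. As a rough illustration of what such filters compute, here is a minimal Java sketch, not a description of how that software works internally. The `Edge` class, the method names, and the thresholds are all hypothetical, and the degree filter is a single pass rather than an iterative peel.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GraphFilterSketch {
    // A weighted undirected edge (hypothetical helper type).
    static final class Edge {
        final String u;
        final String v;
        final double weight;

        Edge(String u, String v, double weight) {
            this.u = u;
            this.v = v;
            this.weight = weight;
        }
    }

    // Keep only edges whose weight is at least minWeight.
    static List<Edge> filterByWeight(List<Edge> edges, double minWeight) {
        List<Edge> kept = new ArrayList<>();
        for (Edge e : edges) {
            if (e.weight >= minWeight) {
                kept.add(e);
            }
        }
        return kept;
    }

    // Count how many edges touch each vertex.
    static Map<String, Integer> degrees(List<Edge> edges) {
        Map<String, Integer> deg = new HashMap<>();
        for (Edge e : edges) {
            deg.merge(e.u, 1, Integer::sum);
            deg.merge(e.v, 1, Integer::sum);
        }
        return deg;
    }

    // Keep only edges whose two endpoints both have degree >= minDegree
    // in the input graph (one pass, degrees are not recomputed).
    static List<Edge> filterByDegree(List<Edge> edges, int minDegree) {
        Map<String, Integer> deg = degrees(edges);
        List<Edge> kept = new ArrayList<>();
        for (Edge e : edges) {
            if (deg.get(e.u) >= minDegree && deg.get(e.v) >= minDegree) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Edge> edges = List.of(
            new Edge("A", "B", 0.9),
            new Edge("B", "C", 0.4),
            new Edge("A", "C", 0.7),
            new Edge("C", "D", 0.1));
        // Sweep the weight threshold from 0 to 1 and watch the graph shrink.
        for (double t = 0.0; t <= 1.0; t += 0.25) {
            System.out.println(t + " -> " + filterByWeight(edges, t).size() + " edges");
        }
        System.out.println(filterByDegree(edges, 2).size() + " edges with both endpoints of degree >= 2");
    }
}
```

Sweeping the threshold in the opposite direction (from 1 down to 0) shows the strongest connections first and then lets weaker ones reappear.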
- Week 6 (July 6th - July 12th)
- Converted all DIMACS workshop abstracts from their different formats to a uniform format
- Read in the files and first deleted the stop words on the silly lists, for both the workshop titles and the abstracts
- Then also deleted all the unimportant irregular and regular verbs, including their present/past tense forms, etc., from the workshop abstracts
- Made sure that nothing in the titles was deleted
- Ran the program and waited... Since there are more than 2000 workshop abstracts, my Java program ran really slowly, so I repeated the above steps in Perl. It was clearly faster, but the bad thing was that I had to learn Perl from the very beginning (I had never heard of Perl before I came to REU >.<)
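
The Week 6 cleaning steps can be sketched in Java roughly as follows. This is only an illustration under assumptions, not the program used in the project: the tokenizer, the word lists, and all names are hypothetical, and "nothing in the titles is deleted" is read here as "a verb that also appears in the title is kept in the abstract".

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StopWordSketch {
    // Lowercase the text and split it on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Drop every token found in toRemove, unless it is also in keep.
    static List<String> removeWords(List<String> tokens, Set<String> toRemove, Set<String> keep) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (keep.contains(t) || !toRemove.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Step 1: remove stop words from the title and the abstract.
    // Step 2: remove listed verb forms from the abstract, protecting title words.
    static List<String> cleanAbstract(String title, String abstractText,
                                      Set<String> stopWords, Set<String> verbForms) {
        Set<String> none = new HashSet<>();
        List<String> titleTokens = removeWords(tokenize(title), stopWords, none);
        List<String> abstractTokens = removeWords(tokenize(abstractText), stopWords, none);
        return removeWords(abstractTokens, verbForms, new HashSet<>(titleTokens));
    }

    public static void main(String[] args) {
        // Tiny hypothetical word lists; real lists would be read from files.
        Set<String> stopWords = Set.of("we", "are", "that", "were", "in", "the");
        Set<String> verbForms = Set.of("find", "found", "mining");
        System.out.println(cleanAbstract(
            "Mining Large Graphs",
            "We are mining graphs that were found in large networks.",
            stopWords, verbForms));
    }
}
```

Keeping the word lists in hash sets makes each lookup take roughly constant time, which matters when the same lists are checked against every token of a couple of thousand abstracts.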
- Week 7 (July 13th - July 19th)
- After getting the filtered workshop abstract files produced by the Perl script, I continued calculating the similarity measure in Java, which is much more familiar =)
- Generated the wadj file, wrote the wadj label file, and visualized the graph using the GraphView software
- Presentation II
- Week 8 (July 20th - July 24th)
- I wrote up a final report on my summer research.
- Name the Cluster evaluation
- Updated the final webpage