XIFENG YAN



 

Research Staff Member
IBM T. J. Watson Research Center

Ph.D. (2006)
Computer Science
University of Illinois at Urbana-Champaign

Curriculum Vitae [PDF]

IBM T. J. Watson Research Center
19 Skyline Drive
Hawthorne, NY 10532

xifeng [at] gmail dot com
xifengyan [at] us dot ibm dot com

 

RESEARCH INTERESTS

  Data mining, data management and machine learning, with emphasis on modeling, managing, and mining large-scale graphs and networks in bioinformatics, social networks, the Web, and computer systems. I am also working on social network analysis for enterprise search and text mining.
 
SELECTED PUBLICATIONS

complete list | dblp


 
  1. Efficient Ticket Routing by Resolution Sequence Mining,
    by Q. Shao, Y. Chen, S. Tao, X. Yan, N. Anerousis,
    SIGKDD'08
    (Proc. of 2008 Int. Conf. on Knowledge Discovery and Data Mining), 2008. [pdf]
  2. Direct Mining of Discriminative and Essential Graphical and Itemset Features via Model-based Search Tree,
    by W. Fan, K. Zhang, H. Cheng, J. Gao, X. Yan, J. Han, P. S. Yu, O. Verscheure,
    SIGKDD'08
    (Proc. of 2008 Int. Conf. on Knowledge Discovery and Data Mining), 2008. [pdf]
  3. Mining Significant Graph Patterns by Scalable Leap Search,
    by X. Yan, H. Cheng, J. Han, and P. S. Yu,
    SIGMOD'08
    (Proc. 2008 ACM SIGMOD Int. Conf. on Management of Data), Jun. 2008 [pdf] [ppt]
  4. Direct Discriminative Pattern Mining for Effective Classification,
    by H. Cheng, X. Yan, J. Han, and P. S. Yu,
    ICDE'08
    (Proc. of 2008 Int. Conf. on Data Engineering), Apr. 2008 [pdf]
  5. Towards Graph Containment Search and Indexing,
    by C. Chen, X. Yan, P. S. Yu, J. Han, D. Zhang and X. Gu,
    VLDB'07a
    (the 33rd Very Large Data Bases Conf.), Sept. 2007 [pdf]
  6. Entity Search: Search Directly and Holistically,
    by T. Cheng, X. Yan and K. Chang,
    VLDB'07b
    (the 33rd Very Large Data Bases Conf.), Sept. 2007 [pdf]
  7. A Graph-Based Approach to Systematically Reconstruct Human Transcriptional Regulatory Modules,
    by X. Yan, M. Mehan, Y. Huang, M. S. Waterman, P. S. Yu, and X. Zhou,
    ISMB'07a
    (the 15th Annual Int. Conf. on Intelligent Systems for Molecular Biology), Jul. 2007 [pdf]
  8. Systematic Discovery of Functional Modules and Context-Specific Functional Annotation of Human Genome,
    by Y. Huang, H. Li, H. Hu, X. Yan, M. S. Waterman, H. Huang, and X. Zhou,
    ISMB'07b
    (the 15th Annual Int. Conf. on Intelligent Systems for Molecular Biology), Jul. 2007 [pdf]
  9. Mining, Indexing and Similarity Search in Large Graph Data Sets,
    by X. Yan,
    Ph.D. Dissertation, 2006 SIGMOD Dissertation Award Runner-Up. Advisor: Prof. Jiawei Han.
  10. gPrune: A Constraint Pushing Framework for Graph Pattern Mining,
    by F. Zhu, X. Yan, J. Han, and P. S. Yu.
    PAKDD'07 (Proc. of 2007 Pacific-Asia Conference on Knowledge Discovery and Data Mining), May 2007. Best Student Paper. [pdf]
  11. Mining Colossal Frequent Patterns by Core Pattern Fusion,
    by F. Zhu, X. Yan, J. Han, P. S. Yu, and H. Cheng.
    ICDE'07a (Proc. of 2007 Int. Conf. on Data Engineering), Apr. 2007. Best Student Paper. [pdf]
  12. Integrative Array Analyzer: A Software Package for Analysis of Cross-platform and Cross-species Microarray Data,
    by F. Pan, K. Kamath, K. Zhang, S. Pulapura, A. Achar, J. Nunez-Iglesias, Y. Huang, X. Yan, J. Han, H. Hu, M. Xu, J. Hu, and X. Jasmine Zhou,
    BIOINFORMATICS, Vol.22 no.13: 1665–1667, 2006. [pdf]
  13. Feature-based Substructure Similarity Search,
    by X. Yan, F. Zhu, P. S. Yu, and J. Han,
    ACM-TODS (ACM Transactions on Database Systems), Dec. 2006. [pdf]
  14. Extracting Redundancy-aware Top-k Patterns,
    by D. Xin, H. Cheng, X. Yan, J. Han, 
    SIGKDD'06 (Proc. of 2006 Int. Conf. on Knowledge Discovery and Data Mining), 2006. [pdf]
  15. Summarizing Itemset Patterns: A Profile-Based Approach, 
    by X. Yan, H. Cheng, J. Han, and D. Xin,
    SIGKDD'05 (Proc. of 2005 Int. Conf. on Knowledge Discovery and Data Mining), 2005. Best Student Paper Runner-Up. [pdf]
  16. Graph Indexing Based on Discriminative Frequent Structure Analysis, 
    by X. Yan, P. S. Yu, and J. Han,
    ACM-TODS (ACM Transactions on Database Systems), Dec. 2005. [pdf]
  17. Mining Compressed Frequent-Pattern Sets, 
    by D. Xin, J. Han, X. Yan and H. Cheng,
    VLDB'05 (Proc. of 2005 Int. Conf. on Very Large Data Bases), 2005. [pdf]
  18. Mining Closed Relational Graphs with Connectivity Constraints, 
    by X. Yan, X. Jasmine Zhou, and J. Han,
    SIGKDD'05 (Proc. of 2005 Int. Conf. on Knowledge Discovery and Data Mining), 2005. [pdf]
  19. Mining Coherent Dense Subgraphs Across Massive Biological Networks for Functional Discovery, 
    by H. Hu, X. Yan, Y. Huang, J. Han, X. Jasmine Zhou,
    ISMB'05 (also Bioinformatics). [pdf] [website]
  20. Substructure Similarity Search in Graph Databases, 
    by X. Yan, P. S. Yu, and J. Han,
    SIGMOD'05 (Proc. of 2005 Int. Conf. on Management of Data), 2005. [pdf]
    Among the top-ranked papers in SIGMOD'05; invited to ACM Transactions on Database Systems (ACM-TODS).
  21. Graph Indexing: A Frequent Structure-based Approach, 
    by X. Yan, P. S. Yu, and J. Han,
    SIGMOD'04 (Proc. of 2004 Int. Conf. on Management of Data), 2004. [pdf][dataset]
    Among the top-ranked papers in SIGMOD'04; invited to ACM Transactions on Database Systems (ACM-TODS).
  22. CloseGraph: Mining Closed Frequent Graph Patterns, 
    by X. Yan and J. Han,
    SIGKDD'03 (Proc. of 2003 Int. Conf. on Knowledge Discovery and Data Mining), 2003. [pdf]
    Google Scholar ranks CloseGraph as #3 for "graph pattern mining", with 154 citations. (as of Mar 19, 2008)
  23. gSpan: Graph-Based Substructure Pattern Mining,
    by X. Yan and J. Han,
    ICDM'02 (Proc. of 2002 Int. Conf. on Data Mining) (short paper), 2002.  [pdf][demo][download]
    Expanded Version, UIUC Technical Report, UIUCDCS-R-2002-2296 [pdf]
    Google Scholar ranks gSpan as #1 for "graph pattern mining", with 315 citations. (as of Mar 19, 2008)

HONORS & AWARDS

  1. ACM-SIGMOD Dissertation Award Honorable Mention, 2007
  2. Best Student Paper, Proc. of 2007 Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2007 
  3. Best Student Paper, Proc. of 2007 IEEE International Conference on Data Engineering (ICDE), 2007
  4. Among the Department's Nominees (two) for ACM Doctoral Dissertation Award, CS Dept. UIUC, 2006
  5. Best Student Paper Runner-Up, SIGKDD 2005
  6. Excellent Teaching Assistant Award, CS Dept., UIUC, 2003, 2004
  7. Outstanding Teaching Assistant Award, CS Dept., SUNY at Stony Brook, 2001

WORKSHOPS


  1. IEEE ICDM Workshop on Mining Graphs and Complex Structures (MGCS'07)

PATENTS

  1. SYSTEM AND METHOD FOR GRAPH CLASSIFICATION WITH SKEWED CLASS DISTRIBUTION,
    by Hong Cheng, Xifeng Yan, Wei Fan and Philip S. Yu.
    US patent filed as Docket YOR8-2007-0684-US1 by IBM (Dec., 2007).
  2. SYSTEM AND METHOD FOR EFFICIENTLY PERFORMING SIMILARITY SEARCHES OF STRUCTURAL DATA,
    by Xifeng Yan and Philip S. Yu,
    US patent filed as Docket YOR9-2005-0047-US1 by IBM (April, 2005).
  3. SYSTEM AND METHOD FOR GRAPH INDEXING,
    by Xifeng Yan and Philip S. Yu,
    US patent filed as Docket YOR9-2004-0013-US1 by IBM (April, 2004).

Last Modified: July 31st, 2006