Erik Linstead

  • Assistant Professor – Faculty of Computer Science, School of Computational Sciences
  • Area: Artificial Intelligence
  • Email: linstead at chapman dot edu
  • Office Hours (Fall 2010): Thursday 7-10pm LL B13 (additional hours TBD)


  • University of California, Irvine (2009)
    • Ph.D. – Information and Computer Science
    • Advisor: Pierre Baldi
    • Thesis – “Statistical Machine Learning for Internet-Scale Software Repositories”
    • Institute for Genomics and BioInformatics
    • Center for Machine Learning and Intelligent Systems
  • Stanford University (2003)
    • M.Sc. – Computer Science
  • Chapman University (2001)
    • B.Sc. – Computer Science
    • B.Sc. – Computer Information Systems

Research Interests
My research, in one way or another, is centered on machine learning and information retrieval.  Most of my work has involved adapting and applying these techniques to the software engineering domain and bio/chemical informatics.  I collaborate closely with the Baldi lab at UCI.


Research Group

I enjoy collaborating with ambitious and enthusiastic undergraduates who are eager to perform original research.  I’m a firm believer in allowing undergrads to co-author research papers with me, and have had several papers published with my advisees.  My students include:

  • Lindsey Hughes (graduated 2010, now a grad student at UCI)
  • Sarah Maurer
  • Joseph Smith (graduated 2010, now at Google)
  • Elizabeth Stevens
  • Matthew Strand (graduated 2010, now a grad student at UCI)
  • Robert Duncan (UCI; graduated 2010, now a grad student at CMU)


Publications (* denotes equal contributors)



E. Linstead*, S. Bajracharya*, T. Ngo*, P. Rigor, C. Lopes, P. Baldi.  Sourcerer: Mining and Searching Internet-Scale Software Repositories.  Data Mining and Knowledge Discovery.  Volume 18, Number 2.  April 2009. (online)


J. Chen, E. Linstead, S. Swamidass, D. Wang, P. Baldi.  ChemDB Update: Full-text Search and Virtual Chemical Space.  Bioinformatics.  Volume 23, Number 17.  September 2007.  (online)



E. Linstead, L. Hughes, C. Lopes, P. Baldi.  Information-Theoretic Metrics for Project-Level Scattering and Tangling.  International Conference on Software Engineering and Knowledge Engineering (SEKE).  Redwood City, CA.  July, 2010.


E. Linstead, P. Baldi.  Mining the Coherence of GNOME Bug Reports with Statistical Topic Models.  MSR 2009: Proceedings of the Sixth Working Conference on Mining Software Repositories.  Vancouver, BC.  May 2009.  (online)


J. Ossher, S. Bajracharya, E. Linstead, P. Baldi, C. Lopes.  SourcererDB: An Aggregated Repository of Statically Analyzed and Cross-Linked Open Source Java Projects.  Proceedings of the Sixth Working Conference on Mining Software Repositories.  Vancouver, BC.  May 2009.  (online)


E. Linstead, C. Lopes, P. Baldi.  An Application of Latent Dirichlet Allocation to Analyzing Software Evolution.  Proceedings of ICMLA 2008:  International Conference on Machine Learning and Applications.  San Diego, CA.  December 2008.  (online)


P. Baldi*, C. Lopes*, E. Linstead*, S. Bajracharya.  A Theory of Aspects as Latent Topics.  OOPSLA 2008.  Nashville, TN.  October 2008. (online)


E. Linstead, P. Rigor, S. Bajracharya, C. Lopes, P. Baldi.  Mining Internet-Scale Software Repositories.  Advances in Neural Information Processing Systems (NIPS*2007).  March 2008.  (online)


E. Linstead, P. Rigor, S. Bajracharya, C. Lopes, P. Baldi.  Mining Concepts from Code with Probabilistic Topic Models.  Proceedings of ASE 2007: International Conference on Automated Software Engineering.  Atlanta, GA.  November 2007.  (online)



E. Linstead, L. Hughes, C. Lopes, P. Baldi.  Software Analysis with Unsupervised Topic Models.  NIPS Workshop on Application of Topic Models: Text and Beyond.  Neural Information Processing Systems (NIPS 2009).  Whistler, B.C.  December 2009.  (online)


E. Linstead, L. Hughes, C. Lopes, P. Baldi.  Exploring Java Software Vocabulary: A Search and Mining Perspective.  Proceedings of SUITE 2009:  First International Conference on Search-Driven Development – Users, Interfaces, Tools, and Environments.  Vancouver, BC.  May 2009. (online)


E. Linstead, P. Rigor, S. Bajracharya, C. Lopes, P. Baldi.  Mining Eclipse Developer Contributions via Author-Topic Models.  Fourth International Workshop on Mining Software Repositories.  Minneapolis, MN.  May 2007.  (Voted best paper, MSR “Scale” Challenge).  (online)



M. Strand, L. Hughes, R. Duncan, J. Smith, E. Linstead.  An Eclipse Plug-in For Enforcing Java Naming Conventions.  41st ACM Technical Symposium on Computer Science Education, SIGCSE 2010.  Milwaukee, WI.  March, 2010. (online)


L. Hughes, P. Baldi, E. Linstead.  The Evolution of Concerns, Scattering, and Tangling in Eclipse and ArgoUML.  Third International Symposium on Empirical Software Engineering and Measurement.  Lake Buena Vista, FL.  October, 2009.


E. Linstead, L. Hughes, C. Lopes, P. Baldi.  Capturing Java Naming Conventions with First-Order Markov Models.  ICPC 2009: Proceedings of the Seventeenth International Conference on Program Comprehension.  Vancouver, BC.  May 2009.  (online)

S. Bajracharya, T. Ngo, E. Linstead, Y. Dou, P. Rigor, P. Baldi & C. Lopes.  Sourcerer: A Search Engine for Open Source Code Supporting Structure-Based Search.  OOPSLA ’06 Poster Session.  Portland, OR.  October 2006.  (online)


Technical Report:

S. Bajracharya, T. Ngo, E. Linstead, P. Rigor, Y. Dou, P. Baldi & C. Lopes.  A Study of Ranking Schemes in Internet-Scale Code Search.  UCI ISR Technical Report # UCI-ISR-07-8.  Nov. 2007. (online)


Recent Invited Talks:

Searching and Mining Internet-Scale Software Repositories.

            AI and Machine Learning Seminar.  Dept. of Computer Science.  UCI.  November 10, 2008.

            Google Tech Talk.  Irvine, CA.  May 9th, 2008.

            Chapman University Computer Science Forum.  Orange, CA.  November 15, 2007.



CPSC 229: C/C++ Programming (Fall 2004)

CPSC 229: Intermediate OO Programming (Interterm 2010)

CPSC 230: Computer Science I (Fall 2010)

CPSC 231: Computer Science II (Spring 2010, Fall 2010)

CPSC 252: Computer Architecture I (Spring 2004)

CPSC 285: Social Issues in Computing (Spring 2005, Fall 2009)

CPSC 350: Data Structures (Fall 2003, Fall 2008, Fall 2010)

CPSC 360: Computer Graphics (Interterm 2005, Interterm 2007, Interterm 2009)

CPSC 370: Data Mining (Spring 2008)

CPSC 370: Advanced OO Programming (Interterm 2010)

CPSC 390: Artificial Intelligence (Fall 2003, Spring 2010)

CPSC 406: Algorithm Analysis (Spring 2009)

CPSC 408: Database Systems (Fall 2010)

CPSC 499: Individual Research (Interterm 2009, Spring 2009, Summer 2009, Fall 2009)

Professional Memberships and Activities

Association for Computing Machinery (Senior Member)

    Special Interest Groups on Artificial Intelligence, Computer Science Education, and Knowledge Discovery in Data

Association for the Advancement of Artificial Intelligence (AAAI)

Reviewer: Data Mining and Knowledge Discovery, Journal of Chemical Information and Modeling

PC Member: SUITE 2010


Last Updated: August 9th, 2010