Hi, I'm Maxine Lim. I just graduated from Stanford University with a B.S. in Computer Science, and I'm currently the Chief Design Officer at Apropose, Inc. I was previously a research assistant with Scott Klemmer in the Stanford Human-Computer Interaction Group, investigating how data-driven techniques can be applied to Web design. This year I was accepted to MIT's CS Ph.D. program.


OCT 2013
This summer I decided to defer my admission to MIT to work at Apropose, Inc. I'm looking forward to what lies ahead!
JUN 2013
This month I finally graduated and got my Stanford diploma! Looking ahead at the summer, I will be continuing research building on top of our Webzeitgeist project. We are interested in exploring different representations of Web pages to use in data-driven applications.
APR 2013
In the past few weeks our team has been busy preparing a demo for Webzeitgeist, which will be presented in Paris this month at CHI 2013! We are really excited to share some of our hard work with everyone.

This year I was also fortunate enough to be accepted to Ph.D. programs in Computer Science. This quarter is bittersweet: it will be my last at Stanford, but I'm looking forward to starting at MIT this fall.

DEC 2012
Our most recent paper, "Webzeitgeist," provides a platform for machine learning on Web design. Traditionally, data mining focuses on textual content, link graphs, etc., but largely ignores the presentation of Web content. But Web pages are a rich source of design data--how can we use this data to help designers find inspiration and inform their design decisions? Webzeitgeist makes it possible to query a huge repository (over 100,000 pages!) of Web pages to identify design patterns like commonly used fonts, colors, or layouts. It also provides a corpus of Web pages and their more than 1,700 associated features in a form that can easily be used for machine learning applications. I'm excited to see the rich queries that Webzeitgeist can allow for and the range of data-driven applications that it can enable.
NOV 2012
We recently collected thousands of crowdsourced style labels for a machine learning project that aims to train classifiers for design descriptors, such as labels that capture a page's design style, e.g., "minimal" or "elegant." Our data contained many labels that seemed to have similar meanings, making it difficult for a learning algorithm to distinguish between them. We wanted to see if there was a principled way to merge similar labels. From the definitions alone, it may be hard to argue that words like "clean" and "minimal" are essentially the same thing. However, if the data shows that whenever people apply the label "clean" to a page, they also apply "minimal," then perhaps it makes sense to combine the two labels into one. We therefore decided to construct a co-occurrence matrix and re-order the rows to form groups of commonly co-occurring labels. We built Seri (short for "seriate") to more easily inspect these matrices and identify which Web pages the labels described.
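To make the idea concrete, here is a minimal sketch of building a label co-occurrence matrix and re-ordering it so that labels that tend to appear together end up next to each other. It is only an illustration: the function names, the `page_labels` input format, and the greedy nearest-neighbor ordering are all assumptions made for this example, not necessarily what Seri actually does.

```python
from itertools import combinations


def cooccurrence(page_labels):
    """Count how many pages received each pair of labels.

    page_labels is a hypothetical input format: a dict mapping a page
    identifier to the list of style labels that workers applied to it.
    The diagonal holds how many pages received each label at all.
    """
    labels = sorted({label for ls in page_labels.values() for label in ls})
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    matrix = [[0] * n for _ in range(n)]
    for ls in page_labels.values():
        unique = set(ls)  # count each label at most once per page
        for label in unique:
            matrix[index[label]][index[label]] += 1
        for a, b in combinations(sorted(unique), 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return labels, matrix


def seriate(labels, matrix):
    """Greedy ordering (an assumed, simple stand-in for seriation).

    Start from the most frequent label, then repeatedly append the
    unplaced label that co-occurs most often with the last placed one.
    """
    n = len(labels)
    if n == 0:
        return []
    start = max(range(n), key=lambda i: matrix[i][i])
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        last = order[-1]
        # sorted() makes tie-breaking deterministic
        nxt = max(sorted(remaining), key=lambda i: matrix[last][i])
        order.append(nxt)
        remaining.remove(nxt)
    return [labels[i] for i in order]


if __name__ == "__main__":
    # Made-up example data, not from the real label set.
    pages = {
        "page1": ["clean", "minimal"],
        "page2": ["clean", "minimal", "elegant"],
        "page3": ["bold", "colorful"],
        "page4": ["bold", "colorful"],
        "page5": ["clean", "minimal"],
    }
    labels, matrix = cooccurrence(pages)
    print(seriate(labels, matrix))
```

In the re-ordered matrix, labels like "clean" and "minimal" that are almost always applied together sit in adjacent rows, which makes candidate groups for merging easier to spot by eye.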
SEPT 2012
In my most recent project, I looked into the automatic structural semantification of Web pages. Getting machines to interpret the ever-increasing amount of unstructured data on the Web is hard: whether we know it or not, we've all likely experienced the consequences of the lack of structural semantics on the Web (if you don't know what I'm talking about, ask yourself why Googling something like "news web site with grey logo" returns as a top hit the official site for the Canadian Football League). People use machine learning to extract textual content from the Web, but they mostly ignore the page design and structure. Instead, structural information is left to be manually embedded in Web pages by Web developers. But this method hasn't been working too well since developers aren't properly incentivized to apply rich semantics: as one developer puts it, "There are two types of developers: those who argue about div's not being semantic and those who create epic shit." Taking a different approach, I'm trying to use machine learning to allow machines to automatically extract structural semantics from existing, unstructured data.

Papers and Posters

Webzeitgeist: Design Mining the Web | See video
Ranjitha Kumar, Arvind Satyanarayan, Cesar Torres, Maxine Lim, Salman Ahmad, Scott R. Klemmer, Jerry O. Talton
CHI: ACM Conference on Human Factors in Computing Systems 2013, Best Paper Award
Advances in data mining and knowledge discovery have transformed the way Web sites are designed. However, while visual presentation is an intrinsic part of the Web, traditional data mining techniques ignore render-time page structures and their attributes. This paper introduces design mining for the Web: using knowledge discovery techniques [...]
Learning Structural Semantics for the Web
Maxine Lim, Ranjitha Kumar, Arvind Satyanarayan, Cesar Torres, Jerry O. Talton, Scott R. Klemmer
Technical Report, Stanford University 2012
Researchers have long envisioned a Semantic Web, where unstructured Web content is replaced by documents with rich semantic annotations. Unfortunately, this vision has been hampered by the difficulty of acquiring semantic metadata for Web pages [...]
Learning Design Patterns with Bayesian Grammar Induction
Jerry O. Talton, Lingfeng Yang, Ranjitha Kumar, Maxine Lim, Noah D. Goodman, Radomir Mech
UIST: ACM Symposium on User Interface Software and Technology 2012, Best Paper Nominee
Design patterns have proven useful in many creative fields, providing content creators with archetypal, reusable guidelines to leverage in projects. Creating such patterns, however, is a time-consuming, manual process, typically relegated to a few experts in any given domain [...]
A Platform for Large-Scale Machine Learning on Web Design
Arvind Satyanarayan, Maxine Lim, Scott R. Klemmer
Work-In-Progress, CHI: ACM Conference on Human Factors in Computing Systems 2012
The Web is an enormous and diverse repository of design examples. Although people often draw from extant designs to create new ones, existing Web design tools do not facilitate example reuse in a way that captures the scale and diversity of the Web. To do so requires using machine learning techniques to train computational models [...]
Post Hoc Semantics for the Web
Maxine Lim
Stanford Undergraduate Honors Thesis, 2012

Other Projects

Structural Prediction for Web Design
Maxine Lim, Arvind Satyanarayan, Cesar Torres
CS 229 Final Project, Fall 2012
Recursive neural networks (RNNs) have been successful for structured prediction in domains such as language and image processing. These techniques imposed structure onto sentences or images for more effective learning. However, in domains such as Web design, structure is explicitly embedded in the Document Object Model, so structured prediction can be done using the natural hierarchy of Web pages [...]
Charlotte: Visualizing Web Design
Victoria Flores, Maxine Lim, Cesar Torres
CS 448B Project, Fall 2012
Given design data describing thousands of Web pages, we sought to find ways of exploring the design space for Web design in an aggregate fashion. What patterns and trends can be observed among large quantities of Web designs?
Seri: Similarity Matrix Viewer
Maxine Lim, Cesar Torres
CS 448B Project, Fall 2012
Seri (short for seriation) allows viewers to interactively explore seriated co-occurrence matrices of crowdsourced style labels applied to Web pages.
MPTCP Wireless Performance
CS 244 Project, Spring 2012
In an attempt to reproduce results in networking systems papers, we re-demonstrated MPTCP's increased performance over multiple wireless interfaces as compared to TCP and its ability to perform seamless wireless handoff.
Enabling Crowd-Sourced Social Queries
CS 294S Project, Spring 2012
Motivated by the massive amount of information locked in social networking sites, in this project we built a tool that allows Facebook users to run queries over their social networks.
Muscle Visualizer
Personal Project, January 2012
The importance of strength training for health and fitness, including maintaining muscle mass and bone density, is pretty widely recognized. I built this Web app to help people get a sense of which major muscle groups their current weight routines are targeting.


Major: Computer Science (Systems)
"Systems is the study of the design and implementation of computer systems such as compilers, databases, networks, and operating systems. Topics include the hardware/software interface, the networking stack, digital architecture, memory models, optimization, concurrency, privacy, security, distributed and large-scale systems, reliability and fault tolerance, and related algorithms and theoretical topics."
Selected Courses:
CS 140 Operating Systems John Ousterhout
CS 149 Parallel Computing Alex Aiken, Kunle Olukotun
CS 229 Machine Learning Andrew Ng
CS 244 Advanced Topics in Networking Nick McKeown
CS 244B Distributed Systems David Cheriton
CS 261 Advanced Algorithms Serge Plotkin
CS 448B Data Visualization Jeffrey Heer
Minor: Biology
Selected Courses:
BIO 41-43 Biology Core Multiple Instructors
BIO 112 Human Physiology Daniel Garza
BIO 188 Biochemistry Lynette Cegelski
HUMBIO 130 Human Nutrition Christopher Gardner


Section Leading (CS 198):
Spring 2011 CS 106A Programming Methodology Jerry Cain
Summer 2011 CS 106A Programming Methodology Brandon Burr, Osvaldo Jimenez
Fall 2011 CS 106A Programming Methodology Mehran Sahami