Hot property: How Zillow became the real estate data hub

If you’re buying a home or looking for an apartment, Zillow will most likely come to mind first, which is a branding triumph for a website that launched just 10 years ago. Today the Zillow Group is a public company with $645 million in revenue. It also operates websites for mortgage and real estate professionals, and last year it completed the acquisition of its nearest competitor, Trulia.

From the start, Zillow offered the “Zestimate,” its value-forecasting feature for homes in locations across the United States. Currently, Zillow claims to have Zestimates for more than 100 million homes, with over a hundred attributes tracked for each property. The technology powering Zestimates and other features has advanced steadily over the years, with open source and cloud computing playing increasingly important roles.

Last week I interviewed Stan Humphries, chief analytics officer at Zillow, along with Jasjeet Thind, senior director of data science and engineering. With diverse data sources, a research group staffed by a dozen economists, and predictive modeling enhanced by a large helping of machine learning, Zillow has made major investments in big data analytics, as well as in the talent needed to ensure visitors get what they want. Together, Humphries and Thind preside over a staff of between 80 and 90 data scientists and engineers.

An analytics platform grows up

Humphries says that Zillow’s technology has evolved in three phases, the common thread being the R language, which the company’s data scientists have used for predictive modeling from the beginning. At first, R was used for prototyping to “figure out what we wanted to do and what the analytic solution looked like.” Data scientists would write up specifications that described the algorithm, which programmers would then implement in Java and C++ code.