Machine learning is one of the most exciting topics in the computer science world these days, which means that O’Reilly has you covered with a first-class conference. JupyterCon is coming this August, so I wanted to speak with O’Reilly’s Director of Learning Group, Paco Nathan, about what machine learning and Jupyter are, why they’re important, and what you can expect at the conference!
Here’s a really good recent article, “What Is Jupyter?” by Mike Loukides (https://www.oreilly.com/ideas/what-is-jupyter), which explains the essential story about Jupyter. In a nutshell, Jupyter provides a way for people to run code remotely in a particular environment.
One of the more popular ways this gets used is in Jupyter notebooks. Wolfram Research has used notebooks as a UI metaphor since the 1980s for their popular Mathematica product. In many ways, Jupyter notebooks draw inspiration from those.
If you think of how an Excel spreadsheet is organized, let’s simplify that and stretch it out: instead we have cells arranged vertically down a web page. Each of the cells may have rich text (HTML), some image or video, source code, results from running the source code, etc. You run each, step by step, and the data flows from one to the next. Meanwhile, the code is being executed elsewhere: locally on your laptop, or in the cloud, or on some supercomputer at a large university. You control everything through the one web page, and only need to use your browser.
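To make the cell metaphor concrete, here’s a minimal sketch in plain Python of what three notebook cells might contain. The temperature values are invented for illustration; the point is that all cells share one session, so data flows from one cell to the next:

```python
# Cell 1: load a small data set (hypothetical values, for illustration)
temperatures = [18.2, 19.5, 21.1, 20.4, 17.9]

# Cell 2: compute a summary; the variable from Cell 1 is still in scope,
# because every cell runs in the same kernel session
mean_temp = sum(temperatures) / len(temperatures)

# Cell 3: display the result, much as a notebook would render it
print(f"mean temperature: {mean_temp:.2f}")
```

In a real notebook, Cell 3 might instead render a chart, and the kernel executing these cells could just as easily be running in the cloud as on your laptop.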
As an example, you may be a grad student working on a scientific computing problem, which you run from your laptop to get started, using a small data set. Eventually you run it on a large supercomputer to work with much larger data sets, but still using the same code, the same notebook.
As another example, say you are a data scientist working within a team to surface insights about a line of business. You can use the notebook to encapsulate your notes and observations, quite literally the code used to pull the data, then the analysis along with the results it produces. Anyone else in your company could re-run the same work, as long as they have a URL for your notebook.
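As a sketch of that idea, the following self-contained snippet mimics a tiny two-cell notebook: one cell that would normally pull data from a warehouse (replaced here by an inline CSV with hypothetical revenue numbers), and one analysis cell that any colleague could re-run to reproduce the same result:

```python
import csv
import io

# "Data pull" cell: in a real notebook this might query a database;
# here we use an inline CSV (hypothetical numbers) so the example is self-contained
raw = """region,revenue
north,1200
south,950
north,800
south,1050
"""

# "Analysis" cell: total revenue per region
totals = {}
for row in csv.DictReader(io.StringIO(raw)):
    totals[row["region"]] = totals.get(row["region"], 0) + int(row["revenue"])

# Anyone re-running the notebook reproduces exactly this result
print(totals)
```

Because the notes, the data-pull step, and the analysis live in one document, the notebook itself becomes the record of how the insight was produced.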
This notion of “repeatable science” has gained enormous traction among scientists. The two leads on Project Jupyter are both physicists, and recently the discovery of gravitational waves (which may well lead to a Nobel Prize) was published as a Jupyter notebook: https://losc.ligo.org/s/events/GW150914/GW150914_tutorial.html
There’s a declining trend for books in general, and textbooks in particular. Professors who would’ve written textbooks in an earlier era today tend to use open source tools such as Jupyter to publish learning materials for their courses, which can be readily shared online.
O’Reilly Media, as a publisher, watches this trend carefully. We’ve leveraged Jupyter notebooks for what is called “computable content”. In other words, people can visit a web page and learn to do some complex computation through hands-on coding synchronized with a video of the author explaining the concepts, all based on Jupyter. No software installation is required, just a browser. That’s a huge boost for people taking courses, for example in tutorials at our conferences. Try out our “Oriole” tutorials showcasing computable content:
We see approaches such as Jupyter as the future of publishing and learning materials. Here’s a talk that I did recently, describing more about Jupyter and some of our use cases at O’Reilly Media: https://dominodatalab.wistia.com/medias/ydax0dpjug
Machine learning is a different thing. That field traces back to early “cybernetics” and control theory research by Norbert Wiener in the 1940s, followed by a project Wiener sponsored at MIT during WWII that produced the first “artificial neural networks”, by McCulloch and Pitts.
Advances throughout the 1970s-1990s set the stage for a turning point in the late 1990s: given the success of e-commerce, by late 1997 companies such as Amazon and eBay had “Big Data” available, along with the beginnings of what we now call cloud computing. With those two vital elements together (“big data” and “big compute”), companies such as Amazon were able to productize machine learning into consumer services at mass scale, such as “People who bought this book also bought…”, with its patent filed in 1998. Another company, Google, was still a research project at Stanford at the time, and famously used machine learning algorithms to help people search the web more effectively.
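Amazon’s actual recommender (item-to-item collaborative filtering) is far more sophisticated, but a toy co-occurrence count gives the flavor of “people who bought this also bought…”. The order data below is invented purely for illustration:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase histories: one set of book IDs per customer order
orders = [
    {"A", "B"},
    {"A", "B", "C"},
    {"B", "C"},
    {"A", "C"},
]

# Count how often each pair of items appears in the same order
co_counts = defaultdict(int)
for order in orders:
    for x, y in combinations(sorted(order), 2):
        co_counts[(x, y)] += 1
        co_counts[(y, x)] += 1

def also_bought(item):
    """Items most often purchased together with `item`, most frequent first."""
    pairs = [(other, n) for (i, other), n in co_counts.items() if i == item]
    return [other for other, n in sorted(pairs, key=lambda p: -p[1])]

print(also_bought("A"))
```

Real systems replace the raw counts with similarity scores and handle millions of items, but the core idea of mining co-purchase patterns is the same.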
A few years later, Jonathan Goldman applied similar approaches at LinkedIn to create “People You May Know”, one of the early large-scale machine learning applications in social networks. Facebook, Twitter, Spotify, etc., followed in their path.
For many years, microprocessors doubled in speed every 18 months or so, due to what’s called “Moore’s Law”, although that trend began to run out by the 2010s. Instead, researchers revived advanced math techniques and moved computation onto GPUs, which had previously been popular for video games. One area where this was applied was neural networks — a subset of machine learning — and specifically stacked layers of neural networks, in what’s called Deep Learning. By 2012, Google, Facebook, and Microsoft had each funded research teams working on Deep Learning. That was followed by fantastic results circa 2015-2016 in Artificial Intelligence applications, such as speech recognition and translation. We say that three factors (“big data”, “big compute”, and “big models”) allowed AI to become a commercial success.
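To give a sense of what “stacked layers” means, here is a minimal sketch of a two-layer forward pass in plain Python. The weights and inputs are made up for illustration; in practice they would be learned from data by gradient descent, typically on GPUs:

```python
import math

def dense(inputs, weights, biases):
    """One fully connected layer: weighted sums followed by a nonlinearity."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical weights for a tiny two-layer network; "deep" learning simply
# stacks many more such layers, each feeding its output to the next
x = [1.0, 0.5]
h = dense(x, [[0.4, -0.2], [0.3, 0.8]], [0.0, 0.1])   # hidden layer
y = dense(h, [[0.7, -0.5]], [0.2])                     # output layer

print(y)
```

The key structural point is that each layer’s output becomes the next layer’s input, which is why deeper stacks can represent progressively more abstract features.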
Jupyter notebooks are inherently well-suited for teaching how to work with machine learning, and AI in particular. Some of the better examples which O’Reilly has published come from Jake VanderPlas at U Washington: https://www.oreilly.com/people/89c9c-jake-vanderplas
Another excellent example is the aforementioned AI tutorial by Jon Bruner: https://www.oreilly.com/learning/generative-adversarial-networks-for-beginners
I did a video called Just Enough Math — which shows more details about the history of machine learning, in use cases suited for business executives. That, oddly enough, also uses Jupyter notebooks for the coding exercises 🙂
In terms of other introductory materials for getting started with machine learning, here are several on our Safari learning platform — that requires a login, although people can sign up for free to get a trial membership:
You asked whether machine learning will replace developers. We’ve had a lot of related material at our recent conferences. The answers range from “No” to “Yes” to “AI certainly augments people”:
It’s also interesting to note that when DeepDive, a very popular AI project at Stanford University, recently needed to build a custom UI for people to use their work, they chose Jupyter notebooks: http://dawn.cs.stanford.edu/2017/05/08/snorkel/
I’ll be giving a related talk at JupyterCon on Thu, Aug 24, which describes how we use Jupyter notebooks at O’Reilly Media for our AI work on my team, to help people and machines collaborate together:
Humans in the loop: Jupyter notebooks as a frontend for AI pipelines at scale
I’m super thrilled about JupyterCon. It’ll be *so great* to get these speakers, thinkers, innovators, all together in one place — finally!!