Production readiness: TensorFlow notes
TensorFlow: a look back at a maturing ecosystem
TensorFlow was open sourced in November 2015 as the second, public generation of DistBelief, Google's internal machine learning system. In February 2016, TensorFlow Serving was released. In November 2016, one year after being open sourced, TensorFlow was “the most popular machine learning project on GitHub”. In February 2017, version 1.0 of TensorFlow was released, with an experimental Java API and the promise of Python API stability. In March 2017, Google ML Engine reached general availability, and in May 2017, GPU-enabled machines became available worldwide. At the same time, the second generation of TPUs (renamed Cloud TPUs) became available to researchers at no cost through the TensorFlow Research Cloud, with an alpha program started for everybody else.
The progress has been fast. And even if more can be expected (org-scale TensorBoard, Java API stability, general availability of TPUs, etc.), TensorFlow has, step by step, become an attractive solution to consider.
Here are a few notes and videos for those who want to catch up.
TensorFlow: an optimiser of numerical computation
Machine learning, and even more so deep learning, requires lots of numerical computation. As a consequence, practitioners know the importance of efficient matrix computation libraries such as LAPACK, ATLAS or any other BLAS-related library. TensorFlow was historically an internal Google project focused on this recurring problem: how to let end users formulate their numerical computations concisely while at the same time offering great performance?
The data flow graphs define the interface between the end users and the core of TensorFlow. In a sense, they are the equivalent of the abstract syntax tree between a developer and the compiler of a programming language.
Defining a good interface, or API, is always tricky. As much as possible, technical details of the implementation shouldn't be exposed to the end user. But sometimes those details do matter to the end user, so it is not possible to hide them. This is especially true for performance. Leakiness is one important aspect when evaluating an API. But the second, and more important, aspect is whether the abstractions it creates are at the right level.
TensorFlow is fundamentally a library for numerical computation. It is not a machine learning nor a deep learning library. The delimitation is, however, murky and will probably stay that way. TensorFlow does contain support for machine learning and deep learning, but it is possible to use TensorFlow for any kind of numerical computation, without limitation of purpose. It is also possible to do deep learning with TensorFlow without directly manipulating its API: enter Keras.
Keras: simple but deep learning
Keras defines itself as the Python Deep Learning library. One of its core guiding principles is user friendliness.
At a high level, deep learning with Keras can be seen as building a Lego house with Lego bricks. The types of bricks, their number and how they fit together are the responsibility of the builder, but lots of details are abstracted away in order to create reusable bricks.
Creating a friendly deep learning library is the challenge addressed by Keras, and addressed rather well. So well, in fact, that its TensorFlow implementation has been selected to be embedded directly into the TensorFlow library. It doesn't help with the definition of what TensorFlow is, but it is great news for both Keras and TensorFlow users.
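To make the Lego metaphor concrete, here is a hypothetical binary classifier in Keras (the layer sizes and input dimension are arbitrary, for illustration only):

```python
from keras.models import Sequential
from keras.layers import Dense

# Stack the "bricks": each layer only needs to know its own shape;
# Keras infers how they connect.
model = Sequential([
    Dense(64, activation='relu', input_dim=20),  # hidden layer
    Dense(1, activation='sigmoid'),              # binary output
])

# The training loop itself is abstracted behind compile()/fit().
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
```

Training then is a single `model.fit(x, y)` call — the builder picks the bricks, Keras handles the mortar.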
Valerio Maggio presented Keras during PyData London 2017. Previous knowledge of deep learning is assumed.
TensorBoard: machine learning with humans
With all the buzzwords around deep learning, machine learning and AI, it's easy to lose track. How far away the creation of a strong AI is remains an open question, but it is certain that today humans still play a critical role in machine learning.
In almost every machine learning project, visualizing metrics such as the loss or the fit of the model helps to check the viability of a model and to find the causes of errors. While building such a graphical interface in an ad hoc fashion is not impossible, it is still a big time sink when the real goal is to understand the behavior of the learning in a specific context. The good news? TensorFlow runs can be analyzed thanks to TensorBoard.
Dandelion Mané presented TensorBoard during TensorFlow Dev Summit 2017.
TensorBoard is still a work in progress but the future looks promising. The roadmap includes an org-scale TensorBoard which would allow multiple users to share their results and keep a history.
Google ML Engine: fast exploration of hyperparameters
Not all companies require distributed learning. That's the truth. But on the technical side, there is one step in almost any machine learning project that can require massive computing power: hyperparameter tuning. The good news is that it is not something that needs to be performed regularly, and that it is embarrassingly parallel and, as a consequence, relatively trivial to distribute.
Google ML Engine can be seen as a cloud version of TensorFlow. Arguably, the whole infrastructure of any application could be hosted by Google Cloud and its ML Engine. In practice, how much the infrastructure should depend on an external service is something that needs to be answered within the constraints of each context.
That being said, Google ML Engine is very well positioned for hyperparameter tuning. Assume that one hour is required to train a model for a specific configuration of hyperparameters. Assume additionally that there are two hyperparameters with 10 values each to explore. A naive full exploration would then require (10 × 10 × 1 =) 100 hours. But Google ML Engine could perform it in one hour with 100 times the hardware used for the initial training. It is doubtful that many companies can handle such a spike in hardware demand. Without Google ML Engine, a company would face a far longer feedback loop.
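The arithmetic of that speed-up can be sketched without any cloud service at all. The search space below is hypothetical (two hyperparameters, ten values each), matching the figures in the example above:

```python
from itertools import product

# Hypothetical search space: two hyperparameters, ten values each.
learning_rates = [10 ** -i for i in range(1, 11)]
batch_sizes = [2 ** i for i in range(1, 11)]

grid = list(product(learning_rates, batch_sizes))
hours_per_training = 1

# Sequential exploration on one machine vs. one worker per
# configuration (embarrassingly parallel, no coordination needed).
sequential_hours = len(grid) * hours_per_training
workers = len(grid)
parallel_hours = sequential_hours / workers

print(len(grid), sequential_hours, parallel_hours)  # 100 100 1.0
```

Each grid point is an independent training run, which is exactly why hyperparameter tuning distributes so well.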
TensorFlow Serving: production is easy
Once upon a time, a businessman asked a data scientist colleague if it would be possible to predict X. After a few exchanges on the definition of the subject, the data scientist was able to frame the problem as a relatively classic machine learning problem, to retrieve the data and to select a model, with the conclusion: yes, predictions could be made with enough quality to help the business. The reaction was: “Great! Now it should be integrated with my application. I need to be able to ask for predictions in almost real time.” The data scientist said: “No problem. I will send my results to another team. They will inform you about what can be done.”
This is indeed a stereotypical story. The critical point is that building a predictive model does not stop at validating that the available data are sufficient to compute good enough predictions. In the end, the objective is often the integration of the model in production. A few use cases are easy in the sense that only cold predictions are required: they can be computed every night, for example. However, more and more projects require hot data (such as a user's browsing data) and, as a consequence, hot predictions. In that case, the road to production, if not prepared beforehand, is not easy.
Can the technology used for the model be installed in production? Is there any equivalent tech? Is a complete rewrite required? How do other parts of the application ask for the prediction? Will it support the load? Should a new web service be implemented and deployed? How can high availability be guaranteed? What happens if the model needs to be fixed? Is it possible to perform a hot swap? If there is a model update, is it possible to A/B test it against the old one in production before a complete swap? And so on…
TensorFlow Serving is there to answer these questions. There is no need to reinvent the wheel in order to drive a TensorFlow project to production.
Noah Fiedel presented TensorFlow Serving during Google I/O 2017.
TensorFlow in production can be easy even on premises, when Google ML Engine is not an option.
TensorFlow: on mobile
Machine learning implies data. That thought often leads to Big Data and data centers. But with the prevalence of smartphones and the Internet of Things (Raspberry Pi!), isn't there another way? If a mobile application needs to detect specific objects in its video, does the full video need to be streamed to the data center in order to get an answer? Wouldn't that result in horrible latency and in the need for a fast and reliable internet connection?
Even though training on mobile is not a solved problem, doing predictions on mobile is a reality and can lead to novel kinds of applications. A typical example is real-time translation of text, sound or video, even in airplane mode.
Yufeng Guo presented how machine learning on mobile is possible right now during Google Cloud Next 2017.
Deep learning provides one advantage for fast training: in order to understand a specific domain, a generic model can be fine-tuned without a full training.
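A common recipe for this in Keras is to freeze the generic layers and train only a small domain-specific head. The base model below is a stand-in (in practice the weights would be loaded from a model pretrained on a large, generic dataset), and all sizes are arbitrary:

```python
from keras.models import Sequential
from keras.layers import Dense

# Stand-in for a pretrained generic model.
model = Sequential([
    Dense(64, activation='relu', input_dim=20),
    Dense(64, activation='relu'),
])
for layer in model.layers:
    layer.trainable = False  # keep the generic features as-is

# Only this new, domain-specific head will be trained.
model.add(Dense(3, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
```

Since only the last layer's weights are updated, a `fit()` on the domain data converges far faster than a full training from scratch.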