Differences between PyTorch and TensorFlow
When comparing the experience of programming for PyTorch versus TensorFlow with the Keras API, our opinion is that the differences fall into one major and one minor category. The major difference is that some things that are handled by the Keras API need to be explicitly handled in PyTorch. This makes it slightly harder to get started for a beginner but pays off in the long run in terms of providing flexibility when you want to do something slightly off the beaten path. The minor differences consist of a number of small design and API choices that differ between the two frameworks.
Both frameworks are rapidly evolving. Therefore, this section is likely to get outdated over time. We recommend that you consult the most up-to-date documentation for the framework you use.
Need to Write Our Own Fit/Training Function
One of the bigger obstacles in PyTorch compared to TensorFlow is the need to write your own function to train your model. In TensorFlow, once you have defined a model, you simply call the function fit() with a set of suitable parameters, and the framework handles a lot of the details, including running the forward pass, running the backward pass, and adjusting the weights. In addition, it computes and prints out a number of useful metrics, such as loss and accuracy, for both the training set and the test set. In PyTorch, you must handle these mechanics yourself.
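As an illustration, this is roughly what that looks like with the Keras API. This is a minimal sketch; the model architecture is arbitrary, and the dataset variables train_images, train_labels, test_images, and test_labels are hypothetical placeholders.

# Minimal sketch of training with the Keras API in TensorFlow.
# The dataset variables are hypothetical placeholders.
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(10, activation='softmax')])
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])

# A single call to fit() runs the forward pass, the backward pass,
# and the weight updates, and prints loss and accuracy for both
# the training data and the held-out data.
history = model.fit(train_images, train_labels,
                    validation_data=(test_images, test_labels),
                    epochs=5, batch_size=32)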
Although having to handle these mechanics yourself might seem cumbersome, in reality, it is not that much code to write. In addition, as we show in our code examples, it is simple to write your own library function that can be reused across many models. This is a prime example of where we think it is a little harder to get started with PyTorch than with TensorFlow. On the other hand, it is very powerful to be able to easily modify this piece of code.
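A minimal sketch of such a reusable training function might look like the following. The function name train_model and its argument list are our own choices rather than a standard PyTorch API.

# Minimal sketch of a reusable training function in PyTorch.
import torch

def train_model(model, device, epochs, train_loader,
                optimizer, loss_function):
    model.train()
    for epoch in range(epochs):
        total_loss = 0.0
        correct = 0
        count = 0
        for inputs, targets in train_loader:
            inputs = inputs.to(device)
            targets = targets.to(device)
            optimizer.zero_grad()        # clear old gradients
            outputs = model(inputs)      # forward pass
            loss = loss_function(outputs, targets)
            loss.backward()              # backward pass
            optimizer.step()             # adjust the weights
            total_loss += loss.item() * inputs.size(0)
            _, predicted = torch.max(outputs, dim=1)
            correct += (predicted == targets).sum().item()
            count += inputs.size(0)
        print('Epoch %d: loss=%.4f, accuracy=%.4f'
              % (epoch + 1, total_loss / count, correct / count))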
Explicit Moves of Data Between NumPy and PyTorch
The Keras API in TensorFlow uses NumPy arrays as its representation of tensors. For example, when passing a tensor to a model, it is expected to be a multidimensional NumPy array. In contrast, in PyTorch you need to explicitly convert data between NumPy arrays and PyTorch tensors.
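A round trip between the two representations might look like the following minimal sketch:

# Minimal sketch of moving data between NumPy and PyTorch.
import numpy as np
import torch

a = np.array([[1.0, 2.0], [3.0, 4.0]])
t = torch.from_numpy(a)   # NumPy array to PyTorch tensor
b = t.numpy()             # PyTorch tensor back to a NumPy array
# Note: both from_numpy() and numpy() share the underlying storage,
# so a modification through one object is visible through the others.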
PyTorch keeps track of information to be able to do automatic differentiation (using backpropagation) on PyTorch tensors. That is, as long as you work on PyTorch tensors, you can use any computation supported by the tensor data type when defining a function, and you will later be able to automatically compute partial derivatives of that function. The explicit move to and from a tensor enables PyTorch to track what variables to provide this functionality for.
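A minimal sketch of this functionality:

# Minimal sketch of automatic differentiation on PyTorch tensors.
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x * x).sum()   # y = x0^2 + x1^2
y.backward()        # automatically compute partial derivatives of y
print(x.grad)       # tensor([4., 6.]); that is, dy/dxi = 2*xi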
There are a few different functions and constructs related to this, demonstrated in the sketch after this list:
from_numpy() converts from a NumPy array to a PyTorch tensor.
detach() creates a PyTorch tensor that shares storage with the original PyTorch tensor but for which automatic differentiation is not supported.
clone() creates a PyTorch tensor from a PyTorch tensor but where storage is not shared between the two tensors.
item() converts a single element in a PyTorch tensor into a standard Python number.
with torch.no_grad() turns off support for automatic differentiation within the scope of this construct.
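The following minimal sketch exercises each of these on a simple tensor (the values are arbitrary):

# Minimal sketch of the functions and constructs listed above.
import numpy as np
import torch

a = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(a)      # NumPy array to PyTorch tensor
t.requires_grad_(True)       # track computations on t for autograd

d = t.detach()               # shares storage with t; not tracked
c = t.clone()                # separate storage from t

s = t.sum()
print(s.item())              # single-element tensor to a Python number

with torch.no_grad():
    u = t * 2                # computed without autograd tracking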
For a beginner, it can be challenging to understand how these functions and constructs all relate, especially when encountering a combined expression such as detach().clone().numpy(). As with anything else, it takes some time to get used to, but once you understand it, it is not that complicated.
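As an example of where such a combined expression might show up, consider converting a tensor that is tracked by automatic differentiation into a NumPy array (a minimal sketch):

# Minimal sketch of converting a tracked tensor to a NumPy array.
import torch

t = torch.tensor([1.0, 2.0], requires_grad=True)
out = t * 3   # out is tracked by automatic differentiation

# numpy() cannot be called directly on a tensor that requires
# gradients, so detach() first drops the tracking. clone() then
# makes an independent copy, so the resulting NumPy array does not
# share storage with the original tensor.
a = out.detach().clone().numpy()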