I recently went through the latest fastai course (v4!), which uses version 2 of their library. My very first deep learning project was an image classifier that works on photos of furniture, so when we covered image classification, it seemed like the perfect time to revisit this project and finally deploy it in a way that could actually be useful. The original inspiration for this was a local metalworking business. Taking pictures of projects at various stages of completion is an important part of their normal workflow, so their image collection is large, and – importantly! – its organization could be improved. Enter the image classifier! The goal of the project was to create a program that identifies what type of item is in an image and copies the image to an appropriate folder.

Getting the data

Since the motivation for this project was an abundance of pictures, getting the data was, in one sense, straightforward. Fastai has built-in functionality to gracefully handle images in an ImageNet-style structure (i.e. images separated into folders, one folder per category), and this is also a straightforward way to ask someone to organize pictures, so this was the organizational scheme I chose. Of course, it still required a set of human eyes to go through and do some labeling. While it only took a couple of hours, sorting pictures for even a few hours is perhaps not the greatest of life’s pleasures. Adding to the hassle, the images themselves were very large, and moving them around ended up taking a non-trivial amount of time.
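For reference, the folder-based loading in fastai v2 looks roughly like this (a minimal sketch – the folder name and image size here are placeholders, not my actual setup):

```python
from fastai.vision.all import *

# Each class lives in its own subfolder: photos/table, photos/cabinet, ...
dls = ImageDataLoaders.from_folder(
    Path('photos'),         # hypothetical root folder
    valid_pct=0.2,          # hold out 20% of images for validation
    seed=42,                # make the "random" split reproducible
    item_tfms=Resize(224),  # shrink the very large originals on load
)
```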

Deep learning is famously data-hungry, so to augment my dataset I turned to the web. The latest fastai includes some helper functions to use Bing Image Search to download images; you just have to sign up for the Bing Image Search API. Apparently there’s a free 7-day trial of the service, and even after the trial period you can use it for free indefinitely but with a (generous) limited number of queries per day. Sounds great. Unfortunately, when I tried to sign up for the trial, I got an error that it couldn’t be added to my Microsoft subscription. Umm, what? I tried several times but to no avail. After posting my difficulties on the fastai forums, one other person reported getting the same error, and no solutions were (or, to my knowledge, have been since) suggested. Solution: get web images another way.
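For completeness, the helper flow from the course looks roughly like this (the search_images_bing helper ships with the fastbook package) – though since I never got past the sign-up error, this sketch is untested by me, and the key and search term are placeholders:

```python
from fastbook import search_images_bing
from fastai.vision.all import download_images, Path

key = 'XXX'  # Azure Bing Image Search API key (placeholder)
results = search_images_bing(key, 'table')
download_images(Path('photos/table'), urls=results.attrgot('contentUrl'))
```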

Along the way, I encountered another minor complication. One of my categories, for example, is “table”, but most images of tables on the web are pretty different from the types of tables the classifier would see during inference. So I had to do some futzing around with search terms. It was easy, but if I hadn’t thought to do it, my augmented training set wouldn’t have been nearly as helpful.

Finally, I had to split things into training and validation sets. In theory, or with a nice, ready-made dataset like one would use for a tutorial, this would be a piece of cake – current practice is to use an 80/20 (or similar) train/valid split, and fastai provides this functionality built-in. You can even set a random seed so a seemingly random split is reproducible. Well… real life was a little more complicated. I had a lot more web data than I had ‘real’ data (where by ‘real’ data I mean images from my user – sampled from the actual distribution that I was modeling), but in terms of performance metrics, I was more interested in how the model was working on the ‘real’ data. I decided to use an 80/20 split for the web images, but to put a larger portion of my user’s images in the validation set. My goal was 50/50, but then another complication arose: in many cases, I had several images of the same item taken from multiple angles, or at multiple stages of completion. This is no problem for training the model, but I didn’t want different images of the same item to appear in both the training and validation sets. So I had to comb through my user-provided data, carefully making sure that multiple images of the same object all ended up together in either the training or the validation set.
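One way to express that grouping constraint in code looks something like this (a sketch only – the item_id column is hypothetical, and in reality the grouping was done by eye):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical frame: one row per image; item_id groups shots of the same object
df = pd.DataFrame({
    'fname':   ['t1_front.jpg', 't1_side.jpg', 'c7.jpg', 't2.jpg'],
    'label':   ['table', 'table', 'cabinet', 'table'],
    'item_id': ['t1', 't1', 'c7', 't2'],
})

# Assign whole items – not individual images – to the validation set,
# so the same object never appears on both sides of the split
items = df['item_id'].unique()
valid_items = set(rng.choice(items, size=len(items) // 2, replace=False))
df['is_valid'] = df['item_id'].isin(valid_items)
```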

In the simplest possible case, I would have replicated an ImageNet-like directory structure to take advantage of the fastai functionality that automatically gets train/valid assignments and labels from the folders where items are located. However, I wanted to keep my web data and my user data separate, so I ended up organizing everything in a pandas DataFrame. It was nothing heroic, although it was an additional step (and while fastai does support DataFrame organization, it was undocumented at the time I was doing this project, so it required a bit of digging).
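With a frame like the one sketched above, the loading looks something like this (again a sketch; the column names are my own, not fastai defaults):

```python
from fastai.vision.all import *

dls = ImageDataLoaders.from_df(
    df,                     # the DataFrame built above
    path=Path('photos'),    # filenames in fn_col are relative to this
    fn_col='fname',
    label_col='label',
    valid_col='is_valid',   # use the hand-built split instead of a random one
    item_tfms=Resize(224),
)
```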

Ethics

You might wonder what ethics has to do with classifying pictures of furniture; I wondered this too. I was doing this project at the same time as the course, which devoted an entire lecture to deep learning’s ethical considerations (yes – out of 7 planned lectures, a whole class session was devoted to ethics – thank you fastai for continually emphasizing the importance of this topic). So when I read their book’s chapter on ethics, I couldn’t help but do it with my project in mind.

One thing I realized is that, while I didn’t foresee any ethical dilemmas related to my project, several of the biases mentioned were still applicable. For example, it was actually beneficial for my classifier to learn to recognize not tables in general, according to their actual distribution in the world, but instead the types of tables it would be classifying during inference. While this is sort of an inversion of the more common, ethically problematic mistake (a classic, egregious example: facial recognition systems that don’t work as well on faces with darker skin tones), it still presents a useful lens for framing problems. Thinking about my dataset, it was helpful to recognize that an intentionally biased training set would likely produce a more performant model without being ethically problematic.

Another thing that came up during my reading of this chapter was having a process in place for dealing with mistakes. In practice, it’s easy of course to move a misclassified image from the ‘table’ folder to the ‘cabinet’ folder, but it would be even better to use these examples to improve the model’s performance in the future. With this in mind, I started planning a strategy to retrain the model using images that it had previously misclassified. (This is an improvement for a future iteration, however.)

Training

As suggested by the title, this was not the source of any serious difficulties. With image classification now being a standard first project in deep learning, the actual model training was a very smooth process. I experimented with a few different architectures and saw the best results with an (ImageNet-pretrained) ResNet-50. I used the default augmentations recommended by fastai, and a one-cycle-policy learning rate schedule with a maximum learning rate suggested by the built-in learning rate finder. I also included MixUp data augmentation, but after some additional (albeit preliminary) experiments, I’m thinking about redoing this later with different augmentation techniques. (More on this at a later date…)
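In fastai v2 terms, the training setup was essentially the following (a sketch – the epoch count and learning rate are illustrative, not my exact values):

```python
from fastai.vision.all import *

dls = ImageDataLoaders.from_df(
    df, path=Path('photos'),
    fn_col='fname', label_col='label', valid_col='is_valid',
    item_tfms=Resize(224),
    batch_tfms=aug_transforms(),   # fastai's default augmentation recipe
)

# ImageNet-pretrained ResNet-50 with the MixUp callback
learn = cnn_learner(dls, resnet50, metrics=accuracy, cbs=MixUp())

learn.lr_find()                    # pick a max learning rate from the plot
learn.fine_tune(10, base_lr=3e-3)  # fine_tune uses the one-cycle policy under the hood
```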

At this point, I had a reasonable image classifier1 that worked nicely in a Jupyter notebook. This was satisfying for me, but not exactly a useful application. Cue a lot of dead ends in the service of….

Productionizing

The fastai team is adamant that it’s important to have working demos of your projects so you can share them, show them off, and get some experience if deploying things to production isn’t your strong suit. They include instructions on how to create a demo web-app of an image classifier directly from a Jupyter notebook, which is an extremely comfortable environment for many data-sciencey people, myself included. While this is great for showing off your hot-dog-or-not-type app, it wasn’t going to work for my use case. I didn’t want to just show off my classifier; I wanted to put it to work. For other people. Who use Windows, and probably don’t even have Python installed, and moreover shouldn’t need to have Python installed. OK…. I had a bit of a brainstorm about how I wanted this thing to actually work, and what I wanted the user experience to be.

I decided that the program as a whole should receive a folder location, make predictions for all the images in that folder, and copy the images to different folders based on their predicted categories. (This was based on my user’s specification – sometimes there would be a project-based hierarchy already in place, and he didn’t want the pictures necessarily moved around.) Writing a Python script to get the images, call my model’s predict method on them, and copy the images was also pretty straightforward. But this was on my training machine, on Linux, and I wanted this to work on a Windows computer. Moving the classifier to Windows was the next step.
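The core of that script looked something like this (the paths are placeholders; copy2 copies rather than moves, per the spec above):

```python
import shutil
from pathlib import Path
from fastai.vision.all import load_learner, get_image_files

learn = load_learner('export.pkl')      # the trained, exported model
src = Path('unsorted')                  # folder supplied by the user
dest_root = Path('sorted')

for img in get_image_files(src):
    pred_class, pred_idx, probs = learn.predict(img)
    dest = dest_root / str(pred_class)  # one output folder per category
    dest.mkdir(parents=True, exist_ok=True)
    shutil.copy2(img, dest / img.name)  # copy, don't move, the original
```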

This turned out to be a hurdle of its own. Fastai actually works out of the box on Windows, and in hindsight, simply restarting the whole training process from scratch would have taken less time than moving my model weights. At least one other person on the fastai forum ran into the same troubles I did, so for the sake of anyone who hits this in the future, I’m recounting the saga here:

Basically, fastai has two ways of saving models. The recommended way is to run model.export, which saves not only the model and its weights, but also the labels and the vocabulary connecting numerical labels to human-readable word labels, in a .pkl file. (In version 1 of the library, the order of the classes was random, so if you weren’t careful about setting random seeds during dataset creation, two models created the same way could have different class orders, which would mess things up when getting predicted labels. It seems that in version 2, the classes are alphabetized, which is definitely an improvement!)

Unfortunately, copying the resulting .pkl file over to Windows and attempting to load it resulted in an error – “cannot instantiate ‘PosixPath’ on your system” – because information about the path was serialized along with the model. After some digging, I found that the solution is to use model.save instead. This saves only the weights, so I would have to worry later about recreating my model with the appropriate classes. Additionally, it was necessary to manually set pickle_protocol=4 when saving the model.

Then, in Windows, I had to recreate a fastai DataLoaders object that would be aware of which classes I was using. It turns out I could accomplish this using an ImageNet-like folder structure again – since the classes are alphabetized, my DataLoaders were the same on both platforms. I could then load my model weights (with the caveat that PyTorch, multiprocessing, and Windows don’t get along well, so it’s necessary to set num_workers=0), again using pickle_protocol=4. Finally, I had a loaded model in Windows, and I could run model.export on this new object so that it would be easy to load and work with in the future.
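Putting the whole workaround together, the sequence looked roughly like this (a sketch under the assumptions above – the file names and folder layout are placeholders):

```python
# On the Linux training machine: save the weights only, with a
# pickle protocol the Windows side can read
learn.save('furniture_weights', pickle_protocol=4)

# On the Windows machine: rebuild the same DataLoaders from an
# ImageNet-like folder tree (classes come out alphabetized, so they
# match the training setup), then load the weights and re-export
from fastai.vision.all import *

dls = ImageDataLoaders.from_folder(
    Path('photos'),
    valid_pct=0.2, seed=42,
    item_tfms=Resize(224),
    num_workers=0,          # PyTorch multiprocessing and Windows don't mix
)
learn = cnn_learner(dls, resnet50, metrics=accuracy)
learn.load('furniture_weights')
learn.export('export.pkl')  # now easy to load with load_learner on Windows
```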

At this point, I could make predictions and move files around in Windows. A minor victory! But of course, I still had to get the thing working on someone else’s computer. This was a whole additional struggle, and will have to remain for part 2….

1 It was actually only around 80% accurate, which was a little disappointing, but I attribute much of this to the difficulty in distinguishing consoles from tables. Human-level performance is not great at this either.