We’re excited to announce Hive’s new AutoML tool that provides customers with everything they need to train, evaluate, and deploy customized machine learning models.
Our pre-trained models solve a wide range of use cases, but we will always be bounded by the number of models we can build. Now customers who find that their unique needs and moderation guidelines don’t quite match with any of our existing solutions can create their own, custom-built for their platform and easily accessible via API.
AutoML can be used to augment our current offerings or to create new models entirely. Want to flag a particular subject that doesn’t exist as a head in our Text Moderation API, or a certain symbol or action that isn’t part of our Visual Moderation? With AutoML, you can quickly build solutions for these problems that are already integrated with your Hive workflow.
Let’s walk through our AutoML process to illustrate how it works. In this example, we’ll build a text classification model that can determine whether or not a given news headline is satirical.
First, we need to get our data in the proper format. For text classification models, all dataset files must be in CSV format. One column should contain the text data (titled text_data) and all other columns represent model heads (classification categories). The values within each row of any given column represent the classes (possible classifications) within that head. An example of this formatting for our satire model is shown below:
The first page you’ll see on Hive’s AutoML platform is a dashboard with all of your organization’s training projects. In the image below, you’ll see how the training and deployment status of old projects are displayed. To create our satire classifier, we’re going to make a new project by hitting the “Create New Project” button in the top right corner.
We’ll then be prompted to provide a name and description for the project as well as training data in the form of a CSV file. For test data, you can either upload a separate CSV file or choose to randomly split your training data into two files, one to be used for training and the other for testing. If you decide to split your data, you will be able to choose the percentage that you would like to split off.
After all of that is entered, we are ready to train! Beginning model training is as easy as hitting a single button. While your model trains, you can easily view its training status on the Training Projects page.
Once training is completed, your project page will show an analysis of the model’s performance. The boxes at the top allow you to decide if you want to look at this analysis for a particular class or overall. If you’re building a multi-headed model, you can choose which head you’d like to evaluate as well. We provide precision, recall, and balanced accuracy for all confidence thresholds as well as a PR curve. We also display a confusion matrix to show how many predictions were correct and incorrect per class.
Once you’re satisfied with your model’s performance, select the “Create Deployment” to launch the model. Similarly to model training, deployment will take a few moments. After model deployment is complete, you can view the deployment in your Hive customer dashboard, where you can access your API key, view current tasks, as well as access other information as you would with our pre-trained models.
We’re very excited to be adding AutoML to our offerings. The platform currently supports both text and image classification, and we’re working to add support for large language models next. If you’d like to learn more about our AutoML platform and other solutions we’re building, please feel free to reach out to sales@thehive.ai or contact us here.