By Gaurav Verma, Cloud Infrastructure Architect – AWS
Organizations leverage enterprise-class Box cloud storage to store data for different applications, and they may need to process this data with solutions like data lakes, data warehouses, or machine learning.
There are multiple use cases where applications running on Amazon Web Services (AWS) may require an AWS Lambda function to process a file stored in Box before it’s sent to the next stage for further processing. For example, Amazon Textract supports only certain document formats, so files in other formats may need to be converted to PDF first.
A Lambda function can download a file stored in Box and convert it using a utility like Gotenberg or a Python library like FPDF. Amazon Textract can then process the converted files.
This post explains how a Lambda function can be invoked on the upload of a file in a Box folder using a webhook. Box is an AWS Partner that provides webhooks which can be triggered on different file operations like upload, modify, or delete.
With a Box webhook and Amazon API Gateway, you can invoke a Lambda function when a file is uploaded and download the file to Amazon Simple Storage Service (Amazon S3). Machine learning services can then use the uploaded file for processing, but this post focuses primarily on Box webhooks.
The sample code in this post uses Python, but if you want to write code in a different language, learn which languages the Box SDK supports.
Before you start creating a webhook, make sure you have already created an API in Amazon API Gateway that is integrated with the Lambda function that runs your code. Learn about API Gateway REST APIs with Lambda integration in the AWS documentation.
Set Up Box and Create Webhook
The Box developer console provides functionality to create a webhook which can work with different events like create, delete, and others. These steps explain how to create a webhook for file upload and integrate that with an API.
- Log in to your Box account and go to the developer console.
- If you don’t already have an application, create one and select an authentication method. In this example, I use an application with OAuth 2.0 with JSON Web Tokens (JWT). Learn more about how to set up application authentication on Box.
- Next, grant the application permission to manage webhooks. Click the Configuration tab, select Manage Webhooks in the Application Scopes section, and then click Save Changes.
- Your application must be authorized before it can create folders and webhooks. To submit an authorization request, go to the Authorization tab in your application and click Review and Submit. This sends an email to the Box admin asking them to authorize the application.
Figure 1 – Application authorization.
- To access your Box environment from a command line interface (CLI) or a script, you need a credential config file, which you download from the Configuration tab of your Box application.
To get it, click the Generate a Public/Private Key Pair button in the Manage Public Keys section of the Configuration tab. This generates the key pair and downloads a JSON file containing all of the secret credentials needed to access the Box account.
Figure 2 – Download key as JSON.
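Before writing any scripts, it can help to confirm that the downloaded JSON file contains the fields the SDK needs. The helper below is hypothetical and not part of the original walkthrough; it assumes the standard Box JWT config layout (boxAppSettings with clientID, clientSecret, and an appAuth object holding publicKeyID, privateKey, and passphrase, plus a top-level enterpriseID), so compare it against your own file. It reports only missing field names and never prints secret values.

```python
import json

# Assumed layout of the Box JWT config file; verify against your downloaded config.json.
REQUIRED_FIELDS = (
    ("boxAppSettings", "clientID"),
    ("boxAppSettings", "clientSecret"),
    ("boxAppSettings", "appAuth", "publicKeyID"),
    ("boxAppSettings", "appAuth", "privateKey"),
    ("boxAppSettings", "appAuth", "passphrase"),
    ("enterpriseID",),
)


def missing_fields(config):
    """Return dotted names of required fields that are absent or empty in a parsed config."""
    missing = []
    for path in REQUIRED_FIELDS:
        node = config
        for key in path:
            # Walk one level down; hitting a non-dict means the field cannot exist.
            node = node.get(key) if isinstance(node, dict) else None
        if not node:
            missing.append(".".join(path))
    return missing


def check_config_file(path="config.json"):
    """Load the JWT config file and list missing fields (hypothetical helper; no secrets printed)."""
    with open(path, encoding="utf-8") as f:
        return missing_fields(json.load(f))
```

An empty list from check_config_file() means every assumed field is present and non-empty; it does not prove the credentials are valid.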
- Now, let’s create a folder in which you’ll set up the webhook. Open the URL https://app.box.com/master and click Content in the admin console’s left pane. If you can see your app there, your application is authorized.
Right-click on your application name and select Login to user’s account. This takes you to the user console where you can create a new folder. In my case, the folder name is aiml_upload.
Figure 3 – Log in to an application.
- In this post, I create the webhook using a Python script, but you can also create it from the Box CLI.
Before you run the script, set up the required library in a Python virtual environment. Only one Box library is needed: boxsdk, the Box Python SDK, which is explained in detail in its documentation. Because this example uses JWT authentication, install it with the JWT extras (pip install "boxsdk[jwt]").
Note that if you run into issues with the latest version of boxsdk, try installing an older release such as 2.6.1, which works with older Python versions.
- Once the virtual environment is ready, get the ID of the aiml_upload folder created earlier. Run this code to get the folder ID:
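A minimal sketch of such a script follows; it is not necessarily the author’s exact code. It assumes the boxsdk 2.x classes JWTAuth and Client, the config.json file downloaded from the Configuration tab, and that aiml_upload sits directly under the root folder. The function names find_folder_id, make_client, and main are illustrative.

```python
def find_folder_id(client, folder_name, parent_folder_id="0"):
    """Return the ID of the child folder named folder_name, or None if it is not found.

    In Box, the root ("All Files") folder always has the ID "0".
    """
    # get_items() iterates over the folder's children (files, folders, web links).
    for item in client.folder(folder_id=parent_folder_id).get_items():
        if item.type == "folder" and item.name == folder_name:
            return item.id
    return None


def make_client(config_path="config.json"):
    """Authenticate with the JWT config file downloaded from the Box developer console."""
    # Imported here so find_folder_id can be reused without boxsdk installed.
    # JWTAuth needs the JWT extras: pip install "boxsdk[jwt]"
    from boxsdk import Client, JWTAuth

    return Client(JWTAuth.from_settings_file(config_path))


def main():
    """Print the ID of the aiml_upload folder in the root folder."""
    client = make_client()
    print("Folder ID is", find_folder_id(client, "aiml_upload"))
```

Call main() (for example under an if __name__ == "__main__": guard) to run it against your Box account. Passing the client into find_folder_id keeps the lookup logic separate from authentication, so the same client can be reused in the webhook script.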
In the above code, I import the boxsdk library installed in the virtual environment and authenticate with the secure credentials in the config.json file downloaded from the Configuration tab.
Next, I call the folder API to get the folder ID for aiml_upload in the root directory. In Box, the root folder ID is 0. In my case, the folder ID for aiml_upload is 131206258526.
- Now we’re all set to create the webhook. Run this code to create the webhook:
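A minimal sketch of the webhook script, again not necessarily the author’s exact code. It assumes boxsdk 2.x, where Client.create_webhook takes a target folder or file object, a list of trigger names, and an HTTPS address, and returns a Webhook object. API_URL is a hypothetical placeholder for your own API Gateway invoke URL, and FOLDER_ID is the aiml_upload ID from the previous step.

```python
# Box trigger that fires when a file is uploaded into the target folder.
FILE_UPLOADED = "FILE.UPLOADED"

# Hypothetical placeholders: use your own API Gateway invoke URL and folder ID.
API_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/box-webhook"
FOLDER_ID = "131206258526"


def create_webhook(client, folder_id, address, triggers=(FILE_UPLOADED,)):
    """Register a webhook on folder_id that POSTs events to address; return the webhook ID."""
    folder = client.folder(folder_id=folder_id)
    # boxsdk: Client.create_webhook(target, triggers, address) returns a Webhook object.
    webhook = client.create_webhook(folder, list(triggers), address)
    return webhook.id


def main():
    """Authenticate with config.json and create the upload webhook."""
    from boxsdk import Client, JWTAuth  # requires boxsdk[jwt]

    client = Client(JWTAuth.from_settings_file("config.json"))
    print("Webhook ID is", create_webhook(client, FOLDER_ID, API_URL))
```

To react to more events, pass extra trigger names in triggers; the default keeps the webhook limited to uploads.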
The import and authentication code is the same as in the folder ID script. In the create_webhook function, I use the create_webhook API to create the webhook.
I pass the folder ID, the target API URL, and the file upload event (FILE.UPLOADED). You can find the other supported events in the Box developer documentation under webhook triggers.
In my case, the output of this script is: Webhook ID is 523777628
Test the Webhook
Now it’s time to test the webhook. The API used by the webhook has a Lambda function as its backend. The function reads the file content, which can then be used for further processing, such as converting it to PDF so Amazon Textract can process it.
The Lambda function uses the boxsdk library to authenticate against Box, and then uses APIs like get_items and content to read the file content.
In this example, the Lambda function code is:
This code imports the boxsdk library and authenticates against Box using the config.json file downloaded earlier from the Configuration tab.
Because the webhook triggers this Lambda function, the code reads the file ID and file name from the JSON event information that the webhook sends.
The code reads the ID and name keys, which hold the file ID and file name, and passes the file ID to the get_file function to read the file content.
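The introduction mentions downloading the uploaded file to Amazon S3, while the Lambda function described here only reads and logs the content. A minimal sketch of that extra step follows, assuming the function’s execution role can call s3:PutObject on a bucket you own. The bucket name, the box-uploads/ prefix, and the helper names are hypothetical; the S3 client is passed in (for example boto3.client("s3")).

```python
def s3_key_for(file_id, file_name, prefix="box-uploads/"):
    """Build an object key; the Box file ID keeps same-named uploads from overwriting each other."""
    return f"{prefix}{file_id}/{file_name}"


def save_to_s3(s3_client, bucket, file_id, file_name, content):
    """Write the downloaded Box file content to S3 and return the object key."""
    key = s3_key_for(file_id, file_name)
    # put_object is the boto3 S3 client call for a single in-memory upload.
    s3_client.put_object(Bucket=bucket, Key=key, Body=content)
    return key
```

Inside the handler, after reading the content, a call such as save_to_s3(boto3.client("s3"), "my-box-landing-bucket", file_id, file_name, content) would store the file. Because the whole file is held in memory, this approach suits small and medium files.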
To test the functionality, go to the user console of the application and upload a file to the folder where you set up the webhook. In my case, I uploaded the file Sample.txt.
Figure 4 – Uploaded file properties.
In the Lambda function logs, you can see that the function reads the file ID and file name and prints the file content, “Hello World.”
Figure 5 – Lambda function logs.
Learn more about the Box API in the Box API reference.
The Box developer environment provides APIs and webhooks that help users integrate with AWS, exchange data between clouds, and use AWS services on data stored in Box.
The same approach can also support data transformation for machine learning, data lake, or big data use cases.
Box – AWS Partner Spotlight
Box is an AWS Partner and cloud content management company that empowers enterprises to securely connect their people, information, and applications.