In this tutorial, we’ll learn how to use OpenAI’s APIs to connect DALL-E with Node.js. We’ll build a complete Node.js application from scratch that does the following:
- Generate images from scratch based on a text prompt
- Edit an existing image based on a new text prompt and an image mask
- Create variations of an uploaded image
If you want to try the application right away, download the source code from the link below and run the following commands:
cd path/to/application
npm install
node index.js
At this point, open a browser and navigate to http://localhost:3000. You will need an API key from OpenAI, which can be obtained here. Replace the key in the index.js file in the section shown below before running the code:
const configuration = new Configuration({
apiKey: "sk-xxxxxOJ0zXYxjFKzxxxxlbkFJbzP64MAQbGchxxxxx",
});
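Hard-coding the key is fine for a quick test, but a safer pattern is to read it from an environment variable so it never lands in version control. A minimal sketch (the OPENAI_API_KEY variable name is a common convention, not a requirement):
// Read the API key from an environment variable instead of hard-coding it.
// Start the app with: OPENAI_API_KEY=sk-... node index.js
const configuration = new Configuration({
  apiKey: process.env.OPENAI_API_KEY,
});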
Introduction
OpenAI DALL-E is a deep-learning AI model that generates images from natural-language descriptions. DALL-E uses a modified version of OpenAI’s GPT-3 model to read text prompts and create pictures. DALL-E 2, the latest version, generates more realistic images at higher resolutions. Microsoft has made a significant investment in OpenAI to improve tools like ChatGPT and DALL-E, and both are already part of the Bing search engine and the Edge browser. Check it out here: Image Creator. OpenAI provides an API that allows developers to create their own applications using the model.
Fun Fact: The name "DALL-E" was coined by combining the names of Pixar’s animated robot character WALL-E and the Spanish surrealist artist Salvador Dalí.
OpenAI DALL-E 2 cost
To be clear, the pricing for the DALL-E preview app is different from that of the API. The preview app charges 1 token per image generation, while the API is priced by the size of the generated image. Here are the differences:
|  | DALL-E preview app | API calls |
| --- | --- | --- |
| Charges | One successful request is equal to 1 token consumed | Depends on the size of the output image |
| Cost factor | Per request | Per generated image |
| Freebies | 50 free tokens for the first month; 12 free tokens every month thereafter | $18 in free credit that can be used during your first 3 months |
| Where to buy | Buy using the integrated payment gateway; 115 extra tokens cost $15 | Need to contact sales |
The charges for calling the APIs are described in the table below (as provided here):

| Resolution | Price |
| --- | --- |
| 1024×1024 | $0.020 / image |
| 512×512 | $0.018 / image |
| 256×256 | $0.016 / image |
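As a quick worked example: generating 100 images at 256×256 through the API would cost 100 × $0.016 = $1.60, while the same batch at 1024×1024 would cost 100 × $0.020 = $2.00.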
How to use DALL-E 2
OpenAI’s DALL-E is a powerful tool for creating and manipulating images using AI. As of writing, the model generates only square images. Supported dimensions are 256×256, 512×512, and 1024×1024 pixels, and smaller images are generated faster.
You can access DALL-E 2 via the DALL-E Preview App or the OpenAI APIs. We will cover the latter in detail in this section.
DALL-E Preview App
Link: https://labs.openai.com
The DALL-E preview app is a web interface from OpenAI for generating images based on text descriptions. The interface lets you generate both imaginary and realistic-looking photos, and you can also upload your own image for AI-based edits. The preview app covers most use cases; the APIs are needed only when you want to build a custom application.
API-based access
Link: https://platform.openai.com/docs/guides/images/introduction
To create a custom application using OpenAI DALL-E, we use the APIs provided for the model. This approach gives better control and flexibility over the DALL-E model, such as choosing the number of generated outputs, their dimensions, and so on. It can also be used from any programming language.
In this tutorial, we will use this API-based access to create a Node.js application that interacts with the DALL-E model and retrieves an image.
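Under the hood, the client library simply calls OpenAI’s REST endpoint. For a sense of what that looks like, here is a minimal sketch that calls the image generation endpoint directly using Node 18’s built-in fetch (the prompt and key are placeholders):
// Minimal sketch: call the image generation endpoint directly (Node 18+).
(async () => {
  const response = await fetch("https://api.openai.com/v1/images/generations", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ prompt: "a white siamese cat", n: 1, size: "256x256" }),
  });
  const result = await response.json();
  console.log(result.data[0].url); // URL of the generated image
})();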
Here are other detailed tutorials on GPT-3 integration with Node.js, OpenAI Codex with Node.js, and OpenAI Whisper integration with Node.js, if you are interested in reading and implementing them in your application.
Integrating OpenAI DALL-E with Node.js
Here are the screenshots of the desired application.
Before starting, you will need an OpenAI API key, which can be obtained here.
Let’s dive into the details of integrating OpenAI DALL-E with Node.js. The first step is creating a Node.js project and installing the required dependencies. Run the following commands:
npm init
npm install openai multer hbs express-handlebars express cors body-parser
In the step above, we installed the OpenAI library along with a few others. For the layouts, we will use the Handlebars (HBS) templating engine with Bootstrap.
Creating folders
At this point, create the following folders:
- uploads: to store uploaded images
- views/layouts: to store the master template for the app
- views/partials: to store the individual screens of the app
Ultimately, the code folder structure should look like the following screenshot.
Creating the template files
1. master.handlebars (views/layouts/master.handlebars)
This file holds the master template for our application, including the Bootstrap CDN link and the header buttons. It also contains a {{{body}}} tag, where each individual template is rendered when a particular URL is requested from the browser.
Here is the code for the file:
<html lang="en">
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<title>OpenAI DALL-E 2 Node.js Tutorial</title>
<meta
name="viewport"
content="width=device-width, initial-scale=1.0, viewport-fit=cover"
/>
<link
rel="stylesheet"
href="https://cdn.jsdelivr.net/npm/bootstrap@5.2.3/dist/css/bootstrap.min.css"
integrity="sha384-rbsA2VBKQhggwzxH7pPCaAqO46MgnOM80zW1RWuH61DGLwZJEdK2Kadq2F9CUG65"
crossorigin="anonymous"
/>
<link rel="stylesheet" type="text/css" href="./style.css" />
</head>
<body>
<div>
<a href="/" class="btn btn-sm btn-primary">Generate</a>
<a href="/get-upload" class="btn btn-sm btn-success">Image Edit</a>
<a href="/get-variation" class="btn btn-sm btn-warning">Variation</a>
</div>
{{{body}}}
</body>
</html>
2. index.handlebars (views/partials/index.handlebars)
This template handles image generation from a text query. After the Submit button is clicked, the form sends a POST request to the Node.js server with the user’s input text from <input type="text">. The Node.js server then calls the OpenAI API, fetches the image, and displays it below the Submit button. Here is the code for the file:
<h1 class="mt-2">Generate Image</h1>
<code>Will generate a square image of 256 x 256</code>
<div class="mt-2 text-center">
<div class="col col-lg-6 p-auto m-auto">
<form method="post" name="openai-codex-with-nodejs" action="/">
<div class="mt-2">
<input
type="text"
placeholder="Describe your image here"
name="queryPrompt"
class="form-control"
/>
</div>
<div class="mt-2">
<input type="submit" name="submit" class="btn btn-sm btn-danger" />
</div>
</form>
</div>
</div>
<div class="mt-2">
{{#if data}}
<img src="{{data}}" />
{{/if}}
</div>
3. getupload.handlebars (views/partials/getupload.handlebars)
This template handles editing an image using a mask and a prompt. It contains two file-upload fields and a text input. The first file input accepts the original square PNG image, while the second accepts a masked version of the same image; the transparent areas of the mask indicate where the image should be edited. The text input accepts the prompt describing what should replace the masked region. Here are examples of original and masked PNG images.
Here is the code for the layout file:
<h1 class="mt-2">Edit uploaded image</h1>
<code>Upload a square image only.</code><br />
<code>Both the mask and the original file should have the same dimensions</code><br />
<code><b>*Image format supported: PNG only</b></code>
<div class="mt-2 text-center">
<div class="col col-lg-12 p-auto m-auto">
<form
method="post"
name="openai-codex-with-nodejs"
action="get-upload"
enctype="multipart/form-data"
class="btn btn-light"
>
<div class="row">
<div class="col-md-3">Original file</div>
<div class="col-md-3"><input type="file" name="upload_image" /></div>
</div>
<div class="row mt-2">
<div class="col-md-3">Masked file</div>
<div class="col-md-3"><input
type="file"
name="upload_masked_image"
/></div>
</div>
<div class="row mt-2">
<div class="col-md-3">Description</div>
<div class="col-md-3"><input
type="text"
name="queryPrompt"
class="font"
placeholder="Explain needed changes"
/></div>
</div>
<div class="mt-2">
<input type="submit" name="submit" class="btn btn-sm btn-danger" />
</div>
</form>
</div>
</div>
<div class="mt-2">
{{#if data}}
<b>Output</b><br />
<img src="{{data}}" />
{{/if}}
</div>
4. getvariation.handlebars (views/partials/getvariation.handlebars)
This template handles generating a variation of an uploaded image. It accepts a single square PNG image. Here is the code for the template file:
<h1 class="mt-2">Get Image Variation</h1>
<code>Upload a square image only.</code><br />
<code>File size should be less than 4 MB</code><br />
<code><b>*Image format supported: PNG only</b></code>
<div class="mt-2 text-center">
<div class="col col-lg-6 p-auto m-auto">
<form
method="post"
name="openai-codex-with-nodejs"
action="get-variation"
enctype="multipart/form-data"
class="btn btn-light"
>
<input type="file" name="upload_image" />
<div class="mt-2">
<input type="submit" name="submit" class="btn btn-sm btn-danger" />
</div>
</form>
</div>
</div>
<div class="mt-2">
{{#if data}}
<b>Output</b><br />
<img src="{{data}}" />
{{/if}}
</div>
Writing the Node.js server
The complete code of the application lives in the index.js file. This file contains the configuration as well as the routes for the layouts. I will explain the code components, but first, let’s look at the OpenAI DALL-E parameters, which dictate the output of the API:
OpenAI DALL-E parameters
- prompt (string, Required): A text description of the desired image(s). The maximum length is 1000 characters.
- n (integer, Optional): Defaults to 1. The number of images to generate. Must be between 1 and 10.
- size (string, Optional): Defaults to 1024x1024. The size of the generated images. Must be one of 256x256, 512x512, or 1024x1024.
- response_format (string, Optional): Defaults to url. The format in which the generated images are returned. Must be one of url or b64_json.
- user (string, Optional): A unique identifier representing your end-user, which can help OpenAI monitor and detect abuse.
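Put together, a typical parameter set for a generation call looks like this (the values shown are illustrative):
// Example parameter set for an image generation request
const params = {
  prompt: "an astronaut riding a horse in a photorealistic style", // required, up to 1000 characters
  n: 1, // number of images to generate (1-10)
  size: "256x256", // one of 256x256, 512x512, 1024x1024
  response_format: "url", // "url" or "b64_json"
  user: "user-1234", // optional end-user identifier for abuse monitoring
};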
Now that the parameters are explained, let’s dive into the actual code.
Configuration
This section defines the common OpenAI configuration, the Handlebars configuration, the application port, and other settings. Each part is explained in the code comments below:
// Required modules
const express = require("express");
const cors = require("cors");
const bodyParser = require("body-parser");
const multer = require("multer");
const handlebars = require("express-handlebars");
const fs = require("fs");

const app = express();
app.use(cors());
app.use(bodyParser.urlencoded({ extended: false }));
app.use(bodyParser.json());
app.use(express.static("public"));
app.use(express.static("uploads"));
app.set("view engine", "handlebars");

// Multer stores uploaded files in the uploads/ folder
const upload = multer({ dest: "uploads/" });
const port = 3000;

// Initiate the OpenAI configuration
const { Configuration, OpenAIApi } = require("openai");
const configuration = new Configuration({
  apiKey: "sk-xxxxxOJ0zXYxjFKzxxxxlbkFJbzP64MAQbGchxxxxx",
});
const openai = new OpenAIApi(configuration);

// Set Handlebars configuration for layouts
app.engine(
  "handlebars",
  handlebars.engine({
    layoutsDir: __dirname + "/views/layouts",
  })
);
Routes for layouts
There are six routes in total: for each layout, a GET route and a POST route. The GET routes render the views, while the POST routes handle form processing, call the OpenAI API, and send back the response. The POST routes are where the OpenAI parameters explained above are used to generate the required images.
In our example app, the value of n is always 1, so a single image is generated. The size parameter is set to 256×256 px, which produces a small image. Lastly, response_format and user are left at their defaults, so the response contains an image URL rather than b64_json data.
Uploaded images are saved to the uploads folder while the OpenAI APIs are called, and deleted once a response is received.
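One caveat: the templates promise PNG-only uploads under 4 MB, but multer does not enforce that by itself. If you want the server to enforce it too, the multer instance from the configuration section could be hardened with its standard limits and fileFilter options, roughly like this:
// Optional hardening: accept only PNG files under 4 MB.
const upload = multer({
  dest: "uploads/",
  limits: { fileSize: 4 * 1024 * 1024 }, // matches the 4 MB API limit
  fileFilter: (req, file, cb) => {
    // Reject anything that is not a PNG
    cb(null, file.mimetype === "image/png");
  },
});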
Here is the code. Check the comments to see the demarcation of routes:
// Generate image routes
app.get("/", (req, res) => {
res.render("./partials/index", { layout: "master" });
});
app.post("/", async (req, res) => {
const data = req.body;
const response = await openai.createImage({
prompt: data.queryPrompt,
n: 1,
size: "256x256",
});
  const image_url = response.data.data[0].url;
res.render("./partials/index", {
layout: "master",
data: image_url,
});
});
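As a side note, if you would rather not depend on the temporary URL that OpenAI returns, the same call can request base64 data and save the image locally. A sketch of that variant (the /generate-b64 route name and output filename are my own choices, not part of the app above):
// Variant sketch: request base64 data instead of a URL and save it locally.
app.post("/generate-b64", async (req, res) => {
  const response = await openai.createImage({
    prompt: req.body.queryPrompt,
    n: 1,
    size: "256x256",
    response_format: "b64_json",
  });
  const b64 = response.data.data[0].b64_json;
  // uploads/ is served statically, so the saved file is directly viewable
  fs.writeFileSync("uploads/generated.png", Buffer.from(b64, "base64"));
  res.render("./partials/index", { layout: "master", data: "generated.png" });
});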
// Edit uploaded image routes
app.get("/get-upload", (req, res) => {
res.render("./partials/getupload", { layout: "master" });
});
const cpUpload = upload.fields([
{ name: "upload_image", maxCount: 1 },
{ name: "upload_masked_image", maxCount: 1 },
]);
app.post("/get-upload", cpUpload, async function (req, res, next) {
const response = await openai.createImageEdit(
fs.createReadStream(req.files.upload_image[0].path),
fs.createReadStream(req.files.upload_masked_image[0].path),
req.body.queryPrompt,
1,
"256x256"
);
fs.unlink(req.files.upload_image[0].path, (err) => {
if (err) {
console.log(err);
}
});
fs.unlink(req.files.upload_masked_image[0].path, (err) => {
if (err) {
console.log(err);
}
});
  const image_url = response.data.data[0].url;
res.render("./partials/getupload", {
layout: "master",
data: image_url,
original: req.files.upload_image[0].filename,
mask: req.files.upload_masked_image[0].filename,
});
});
// Image variation routes
app.get("/get-variation", (req, res) => {
res.render("./partials/getvariation", { layout: "master" });
});
app.post(
"/get-variation",
upload.single("upload_image"),
async function (req, res, next) {
const response = await openai.createImageVariation(
fs.createReadStream(req.file.path),
1,
"256x256"
);
fs.unlink(req.file.path, (err) => {
if (err) {
console.log(err);
}
});
    const image_url = response.data.data[0].url;
res.render("./partials/getvariation", {
layout: "master",
data: image_url,
});
}
);
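Note that none of the routes above guard against API failures (rate limits, rejected images, network errors), so a failed call surfaces as an unhandled rejection. One common way to hedge against that, sketched below, is to wrap the async handlers and add an Express error handler:
// Sketch: wrap async route handlers so rejected promises reach Express
// instead of crashing the process.
const asyncHandler = (fn) => (req, res, next) =>
  Promise.resolve(fn(req, res, next)).catch(next);
// Usage: app.post("/", asyncHandler(async (req, res) => { ... }));

// A minimal error handler that reports the failure to the user.
app.use((err, req, res, next) => {
  console.error(err);
  res.status(500).send("Something went wrong while calling the OpenAI API.");
});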
Finally, we start the server; the application listens on port 3000:
app.listen(port, () => console.log(`App listening on port ${port}`));
Conclusion
In this tutorial, we learned how to create an application using OpenAI DALL-E with Node.js. We also learned how to set the various parameters of the DALL-E API. Finally, we covered all the methods DALL-E provides for manipulating images.
I hope you liked the tutorial and that it was helpful in understanding the OpenAI DALL-E APIs. Let me know in the comments section below.