
Connecting with AI: Creating Images with OpenAI
This blog post is part of an independent series called Connecting with AI, authored by Felipe Mantilla, an AI engineering expert at Gorilla Logic. The series' goal is to make the world of AI more accessible, support those who want to learn more about the field, and lay the foundations for understanding the most interesting advancements in AI. You can find the original version of this post in Spanish on his Medium blog.
Artificial intelligence has recently captured the attention of people across various fields, including designers and social media content creators. These professionals are eager to tap into new ways to boost their productivity and, in the best-case scenarios, achieve outstanding results.
To explore these possibilities, we'll dive into some of the other tools available through OpenAI's API—because it's not just about text.
Images
When it comes to working with images, OpenAI offers two main capabilities:
- Uploading and processing images to understand their content
- Generating new images
Let’s take a closer look at both options.
Vision
To process images, we’ll need to use the Vision API.
The image for this test is hosted in the cloud. You can review it at the link provided in the prompt.
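Here’s a minimal sketch using the official openai Node.js package. The URL below is a placeholder standing in for the hosted test image, and I’m using the gpt-4o-mini model:

```ts
import OpenAI from "openai";

// Assumes OPENAI_API_KEY is set in the environment.
const openai = new OpenAI();

async function describeImage() {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "What is in this image?" },
          {
            type: "image_url",
            // Placeholder: replace with the URL of your own hosted image.
            image_url: { url: "https://example.com/my-image.jpg" },
          },
        ],
      },
    ],
  });

  console.log(response.choices[0].message.content);
}

describeImage();
```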
This is the same example you’ll find in the official documentation, except there it’s written in Python. We can also send images as Base64-encoded data.
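If the image is stored locally instead, we can embed it as a Base64 data URL. A minimal sketch, assuming a local file named image.jpg:

```ts
import fs from "node:fs";

// Read the local file and encode it as a Base64 data URL.
const base64Image = fs.readFileSync("image.jpg", "base64");
const dataUrl = `data:image/jpeg;base64,${base64Image}`;

// The request is the same as before; only the image_url changes:
// { type: "image_url", image_url: { url: dataUrl } }
```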
When we make the request using the image saved in our local directory, we get a response similar to the following. (I downloaded the cover image from this article beforehand and used it as the example; I recommend using your own image instead.)
The full implementation of the basic image analysis can be found in the following commit.
There are also some additional options worth knowing about, including:
- Sending multiple images
- Understanding low- or high-fidelity images (see the sketch after this list)
- Some specific limitations
- Cost calculation
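For instance, fidelity maps to the detail parameter on each image entry. Here’s a sketch of the relevant fragment (the URL is a placeholder):

```ts
// "low" sends a scaled-down version of the image (fewer tokens, lower cost);
// "high" lets the model inspect a higher-resolution version.
const content = [
  { type: "text", text: "Describe this image in detail." },
  {
    type: "image_url",
    image_url: {
      url: "https://example.com/my-image.jpg", // placeholder
      detail: "low", // "low" | "high" | "auto" (the default)
    },
  },
];
```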
Image Generation
Image understanding is handled through language models, such as the GPT-4o mini model we used earlier. Image generation, however, requires a dedicated model: DALL·E.
The image API offers three options:
- Create images from scratch based on a text prompt (DALL·E 3 and DALL·E 2)
- Edit existing images by having the model replace certain areas of the image using a new prompt (DALL·E 2 only)
- Create variations of an existing image (DALL·E 2 only)
To generate an image, we can run code like this:
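The following is a minimal sketch with the Node.js SDK; the prompt is a placeholder to replace with your own:

```ts
import OpenAI from "openai";

const openai = new OpenAI();

async function generateImage() {
  const response = await openai.images.generate({
    model: "dall-e-3",
    prompt: "A watercolor landscape of mountains at sunrise", // placeholder prompt
    n: 1,                   // DALL·E 3 only supports n = 1
    size: "1024x1024",
    response_format: "url", // the default; "b64_json" is the alternative
  });

  // By default the API returns a temporary URL to the hosted image.
  console.log(response.data[0].url);
}

generateImage();
```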
This will give us a result similar to the following:
Each image can be returned either as a URL or as Base64-encoded data, using the response_format parameter. URLs will expire after one hour.
By default, the image is stored in the cloud and its URL is returned to us. When inspecting the URL, you'll see something like this:
Great — we’ve now seen how easy it is to generate images. But what if we want to modify something specific or create variations? For these cases, we can use DALL·E 2.
You can view the implementation in this commit.
Image Editing
To edit images, three parameters are required:
- The image to be edited
- The image mask (identical to the original, but with the section to be modified made transparent)
- A prompt describing the desired changes in the cropped area
So, we’ll modify the code to store the image locally on our machine instead of relying on the temporary cloud URL.
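A simple way to do this, sketched below, is to request the image as Base64 with response_format: "b64_json" and write it to disk (the file name generated-image.png is arbitrary):

```ts
import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI();

async function generateAndSave() {
  const response = await openai.images.generate({
    model: "dall-e-3",
    prompt: "A watercolor landscape of mountains at sunrise", // placeholder prompt
    response_format: "b64_json", // return the image data directly
  });

  // Decode the Base64 payload and store it locally.
  const imageData = Buffer.from(response.data[0].b64_json!, "base64");
  fs.writeFileSync("generated-image.png", imageData);
}

generateAndSave();
```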
Now that we've downloaded the image locally, we can modify it. For example, this is the image that I've received:
To modify the image, let’s add a balloon to the sky. To do this, we’ll create the image mask using a free online tool.
In this example, I’ll use Pixlr.
This could result in:
To modify the image, we need to use additional properties when calling the API. Let’s adjust the code accordingly.
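Here’s a sketch of the edit call. The file names carry over from the previous steps (generated-image.png, plus the mask.png we exported from Pixlr), and the prompt is a placeholder; note that DALL·E 2 expects square PNG files under 4 MB:

```ts
import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI();

async function editImage() {
  const response = await openai.images.edit({
    model: "dall-e-2", // editing is only supported by DALL·E 2
    image: fs.createReadStream("generated-image.png"),
    mask: fs.createReadStream("mask.png"), // transparent areas mark the editable region
    // Describe the whole resulting image, not just the masked area:
    prompt:
      "A watercolor landscape of mountains at sunrise with a hot air balloon in the sky",
    n: 1,
    size: "1024x1024",
  });

  console.log(response.data[0].url);
}

editImage();
```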
Heads up! Be careful with the prompt. The transparent areas of the mask indicate where the image should be edited, and the prompt must describe the entire new image — not just the erased section.
Finally, the edited image should turn out something like this:
You can see the implementation in this commit.
Variations
In addition to editing images, we can request variations, which will produce more significant changes. Let’s take a look at the results:
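Here’s a sketch of the variation call, reusing the locally stored image:

```ts
import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI();

async function createVariation() {
  const response = await openai.images.createVariation({
    model: "dall-e-2", // variations are only supported by DALL·E 2
    image: fs.createReadStream("generated-image.png"),
    n: 1,
    size: "1024x1024",
  });

  console.log(response.data[0].url);
}

createVariation();
```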
The result is the following:
You can view the implementation in this commit.
Conclusion
OpenAI’s API offers a robust set of tools for image processing and generation that can significantly enhance our projects. We’ve explored its three main features: image analysis using Vision, new image generation with DALL·E 3, and editing and variation capabilities with DALL·E 2. Each of these tools has its own strengths and limitations, but they all share the same ease of implementation thanks to OpenAI’s well-structured API.
For developers just getting started with AI-powered image processing, this toolkit provides an excellent starting point—enabling everything from basic analysis to complex image manipulation. The flexibility to work with both URLs and Base64-encoded images, along with the option to store results locally, gives us the freedom to integrate these capabilities into any type of application.
Always remember to consider the associated costs and each model’s limitations when planning your implementation, and make sure to carefully craft your prompts to get the best results—especially when performing image editing tasks.