
Connecting with AI: Creating Images with OpenAI

This blog post is part of an independent series called Connecting with AI, authored by Felipe Mantilla, an AI engineering expert at Gorilla Logic. The series' goal is to make the world of AI more accessible, support those who want to learn more about the field, and establish foundations on the most interesting advancements in AI. You can find the original version of this post in Spanish on his Medium blog.

Artificial intelligence has recently captured the attention of people across various fields, including designers and social media content creators. These professionals are eager to tap into new ways to boost their productivity and, in the best-case scenarios, achieve outstanding results.

To explore these possibilities, we'll dive into some of the other tools available through OpenAI's API—because it's not just about text.

Images

When it comes to working with images, OpenAI offers two main capabilities:

  1. Uploading and processing images to understand their content
  2. Generating new images

Let’s take a closer look at both options.

Vision

To process images, we'll need to use the Vision API.


The image for this test is hosted in the cloud. You can review it at the link provided in the prompt.
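Since the original code screenshot isn't reproduced here, the following is a minimal sketch using the OpenAI Node.js SDK; the question and image URL are illustrative placeholders, and the demo call is gated behind an environment flag.

```javascript
// Minimal sketch of image understanding with the OpenAI Node.js SDK.
// The question and image URL below are illustrative placeholders.

// Build the multimodal message payload: a text question plus an image URL.
function buildVisionMessages(question, imageUrl) {
  return [
    {
      role: "user",
      content: [
        { type: "text", text: question },
        { type: "image_url", image_url: { url: imageUrl } },
      ],
    },
  ];
}

async function describeImage(imageUrl) {
  // Dynamic import so this file loads even if the SDK isn't installed.
  const { default: OpenAI } = await import("openai");
  const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: buildVisionMessages("What's in this image?", imageUrl),
  });
  return response.choices[0].message.content;
}

// Set RUN_OPENAI_DEMO=1 (plus OPENAI_API_KEY) to actually call the API.
if (process.env.RUN_OPENAI_DEMO) {
  describeImage("https://example.com/sample.jpg").then(console.log);
}
```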


This is the same example you'll find in the official documentation, though there it's written in Python. We can also send images as base64.
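A sketch of the base64 approach, again assuming the Node.js SDK; the file path and MIME type are placeholders you'd adapt to your own image.

```javascript
// Minimal sketch: send a local image as a base64 data URL instead of a hosted URL.
import fs from "node:fs";

// Encode a local image file as a data URL the Vision API accepts.
function toDataUrl(filePath, mimeType = "image/jpeg") {
  const base64 = fs.readFileSync(filePath).toString("base64");
  return `data:${mimeType};base64,${base64}`;
}

async function describeLocalImage(filePath) {
  const { default: OpenAI } = await import("openai");
  const openai = new OpenAI();

  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "What's in this image?" },
          { type: "image_url", image_url: { url: toDataUrl(filePath) } },
        ],
      },
    ],
  });
  return response.choices[0].message.content;
}

// Set RUN_OPENAI_DEMO=1 (plus OPENAI_API_KEY) to actually call the API.
if (process.env.RUN_OPENAI_DEMO) {
  describeLocalImage("./cover.jpg").then(console.log);
}
```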


When making the request using the image saved in our directory, we get a response similar to the following. (I downloaded and used this article's cover image as an example; I recommend using your own image instead.)

[Screenshot: the model's response describing the image]

The full implementation of the basic image analysis can be found in the following commit.

The Vision API also offers additional options, such as controlling the level of detail used to process the image.

Image Generation

Image understanding is handled through language models, such as the GPT-4o-mini model we used earlier. For image generation, however, we need to use the DALL·E models.

The image API offers three options:

  • Create images from scratch based on a text prompt (DALL·E 3 and DALL·E 2)
  • Edit existing images by having the model replace certain areas of the image using a new prompt (DALL·E 2 only)
  • Create variations of an existing image (DALL·E 2 only)

To generate an image, we send a text prompt to the images endpoint.
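A minimal sketch with the Node.js SDK follows; the prompt and size are illustrative placeholders.

```javascript
// Minimal sketch of image generation with DALL·E 3.
// The prompt and size below are illustrative placeholders.

// Assemble the request body; by default the API returns a temporary URL.
function buildGenerationRequest(prompt) {
  return {
    model: "dall-e-3",
    prompt,
    n: 1,
    size: "1024x1024",
  };
}

async function generateImage(prompt) {
  const { default: OpenAI } = await import("openai");
  const openai = new OpenAI();

  const result = await openai.images.generate(buildGenerationRequest(prompt));
  return result.data[0].url; // temporary URL; it expires after one hour
}

// Set RUN_OPENAI_DEMO=1 (plus OPENAI_API_KEY) to actually call the API.
if (process.env.RUN_OPENAI_DEMO) {
  generateImage("A watercolor landscape of mountains at sunrise").then(console.log);
}
```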

This will give us a result similar to the following:

[Image: the generated result]

Each image can be returned either as a URL or as Base64-encoded data, using the response_format parameter. URLs will expire after one hour.

By default, the image is stored in the cloud and its URL is returned to us. When inspecting the URL, you'll see something like this:

[Screenshot: the generated image at its temporary URL]

Great — we’ve now seen how easy it is to generate images. But what if we want to modify something specific or create variations? For these cases, we can use DALL·E 2.

You can view the implementation in this commit.

Image Editing

To edit images, three parameters are required:

  • The image to be edited
  • The image mask (identical to the original, but with the section to be modified erased to transparency)
  • A prompt describing the desired changes in the cropped area

So, we’ll modify the code to store the image locally on our machine instead of saving it to the cloud.

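One way to sketch this, assuming the Node.js SDK: request the image as base64 via the response_format parameter and write the bytes to disk. The prompt and file names are placeholders.

```javascript
// Minimal sketch: request base64 data instead of a URL and save it locally.
// The prompt and output file name are illustrative placeholders.
import fs from "node:fs";

// Decode a base64 payload and save it as a local file.
function saveBase64Image(b64, filePath) {
  fs.writeFileSync(filePath, Buffer.from(b64, "base64"));
  return filePath;
}

async function generateAndSave(prompt, filePath) {
  const { default: OpenAI } = await import("openai");
  const openai = new OpenAI();

  const result = await openai.images.generate({
    model: "dall-e-2",
    prompt,
    n: 1,
    size: "1024x1024",
    response_format: "b64_json", // get the image bytes directly
  });
  return saveBase64Image(result.data[0].b64_json, filePath);
}

// Set RUN_OPENAI_DEMO=1 (plus OPENAI_API_KEY) to actually call the API.
if (process.env.RUN_OPENAI_DEMO) {
  generateAndSave("A quiet beach at sunset", "./original.png").then(console.log);
}
```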

Now that we've downloaded the image locally, we can modify it. For example, this is the image that I've received:

[Image: the generated image downloaded locally]

To modify the image, let’s add a balloon to the sky. To do this, we’ll create the image mask using a free online tool.

In this example, I’ll use Pixlr.

[Animation: erasing the mask area in Pixlr]

This could result in:

[Image: the resulting mask with the sky area erased]

To modify the image, we need to use additional properties when calling the API. Let’s adjust the code accordingly.

 

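A sketch of the edit call with the Node.js SDK; the file names and prompt are placeholders, and note that the prompt describes the entire resulting image.

```javascript
// Minimal sketch of editing an image with DALL·E 2 and a mask.
// File names and the prompt are placeholders; the prompt must describe
// the ENTIRE resulting image, not just the erased region.
import fs from "node:fs";

// The edit endpoint requires these three inputs.
function hasRequiredEditParams(params) {
  return ["image", "mask", "prompt"].every((key) => params[key] != null);
}

async function editImage() {
  const { default: OpenAI } = await import("openai");
  const openai = new OpenAI();

  const params = {
    model: "dall-e-2",
    image: fs.createReadStream("./original.png"), // square PNG, under 4 MB
    mask: fs.createReadStream("./mask.png"),      // transparent where edits go
    prompt: "A landscape with a hot air balloon floating in the sky",
    n: 1,
    size: "1024x1024",
  };
  if (!hasRequiredEditParams(params)) throw new Error("missing edit inputs");

  const result = await openai.images.edit(params);
  return result.data[0].url;
}

// Set RUN_OPENAI_DEMO=1 (plus OPENAI_API_KEY) to actually call the API.
if (process.env.RUN_OPENAI_DEMO) {
  editImage().then(console.log);
}
```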

Heads up! Be careful with the prompt. The transparent areas of the mask indicate where the image should be edited, and the prompt must describe the entire new image — not just the erased section.

Finally, the edited image should turn out something like this:

[Image: the edited result with a balloon in the sky]

You can see the implementation in this commit.

Variations

In addition to editing images, we can request variations, which produce more significant changes than targeted edits.
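A sketch of the variations call with the Node.js SDK; the input file name is a placeholder, and the input must be a square PNG under 4 MB.

```javascript
// Minimal sketch of requesting a variation of an existing image with DALL·E 2.
// The input file name is a placeholder.
import fs from "node:fs";

// Quick local check before uploading: the variations endpoint takes PNG input.
function isPng(filePath) {
  return filePath.toLowerCase().endsWith(".png");
}

async function createVariation(filePath) {
  if (!isPng(filePath)) throw new Error("variations require a PNG file");

  const { default: OpenAI } = await import("openai");
  const openai = new OpenAI();

  const result = await openai.images.createVariation({
    model: "dall-e-2",
    image: fs.createReadStream(filePath),
    n: 1,
    size: "1024x1024",
  });
  return result.data[0].url;
}

// Set RUN_OPENAI_DEMO=1 (plus OPENAI_API_KEY) to actually call the API.
if (process.env.RUN_OPENAI_DEMO) {
  createVariation("./original.png").then(console.log);
}
```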

The result is the following:

[Image: the generated variation]

You can view the implementation in this commit.

Conclusion

OpenAI’s API offers a robust set of tools for image processing and generation that can significantly enhance our projects. We’ve explored its three main features: image analysis using Vision, new image generation with DALL·E 3, and editing and variation capabilities with DALL·E 2. Each of these tools has its own strengths and limitations, but they all share the same ease of implementation thanks to OpenAI’s well-structured API.

For developers just getting started with AI-powered image processing, this toolkit provides an excellent starting point—enabling everything from basic analysis to complex image manipulation. The flexibility to work with both URLs and Base64-encoded images, along with the option to store results locally, gives us the freedom to integrate these capabilities into any type of application.

Always remember to consider the associated costs and each model’s limitations when planning your implementation, and make sure to carefully craft your prompts to get the best results—especially when performing image editing tasks.

Ready to be Unstoppable? Partner with Gorilla Logic, and you can be.

TALK TO OUR SALES TEAM