Apple released open source AI image editing model

Apple is moving into AI image editing with an open source multimodal AI model.

Earlier this week, researchers from Apple and the University of California, Santa Barbara released MLLM-Guided Image Editing, or “MGIE”; a multimodal AI model capable of editing images like Photoshop, based on simple text commands.

On the AI development front, Apple has been notably cautious about its plans. It was one of the few major companies that didn't announce any significant AI projects amid last year's ChatGPT hype. However, Apple is reported to have an internal ChatGPT-like chatbot dubbed "Apple GPT," and Tim Cook has said that Apple will make major AI announcements later this year.

SEE ALSO:

Tim Cook says Apple's big AI announcement will happen later this year

Whether that announcement will include an AI image editing tool remains to be seen, but this model shows that Apple is certainly conducting research and development in the area.

Although AI image editing tools already exist, "human instructions are sometimes too brief for current methods to capture and follow," the research paper states, which often leads to poor or ineffective results. MGIE takes a different approach, using multimodal large language models (MLLMs) to interpret text prompts and derive "expressive instructions" alongside the image data. Learning from MLLMs helps MGIE understand natural-language commands without requiring a detailed description from the user.

In examples from the research, MGIE can take an input image of a pepperoni pizza and, given the prompt "make this healthier," infer that "this" refers to the pepperoni pizza and that "healthier" can be interpreted as adding vegetables. The output image is a pepperoni pizza with greens scattered on top.

In another example comparing MGIE to other models, the input image shows a forested shoreline and a still body of water. Given the prompt "add lightning and make the water reflect the lightning," other models omit the reflection, but MGIE manages to capture it.

MGIE is available as an open source model on GitHub, with a demo hosted on Hugging Face.
