Japan Technology

Google releases “PaliGemma 2”, a vision-language model that is easy to fine-tune – GIGAZINE


On December 5, 2024, Google announced “PaliGemma 2”, a vision-language model that adds vision capabilities to the open, lightweight language model “Gemma 2”.

Introducing PaliGemma 2: Powerful Vision-Language Models, Simple Fine-Tuning – Google Developers Blog
https://developers.googleblog.com/en/introducing-paligemma-2-powerful-vision-language-models-simple-fine-tuning/

Welcome PaliGemma 2 – New vision language models by Google
https://huggingface.co/blog/paligemma2

PaliGemma is the first vision-language model in the Gemma family. It is available on GitHub and Hugging Face, and it can recognize images, describe their content in words, and read the text within an image.

The article below shows what it is like to actually use PaliGemma.

Google releases open source visual language model “PaliGemma” & announces large-scale language model “Gemma 2” with performance equivalent to Llama 3 – GIGAZINE


Its newly released successor, PaliGemma 2, is available in multiple model sizes (3B, 10B, 28B) and input resolutions (224×224, 448×448, 896×896 pixels), so performance can be optimized for a given task.
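As a rough illustration of that lineup, the sketch below enumerates the nine size and resolution combinations listed above. The `google/paligemma2-<size>-pt-<resolution>` naming pattern and the helper names `checkpoint_name` and `all_checkpoints` are assumptions made for illustration, so check the actual model IDs on Hugging Face before using them.

```python
from itertools import product

# Model sizes and input resolutions as listed in the article.
SIZES = ("3b", "10b", "28b")
RESOLUTIONS = (224, 448, 896)


def checkpoint_name(size: str, resolution: int) -> str:
    """Build a Hub-style model ID.

    The 'google/paligemma2-<size>-pt-<resolution>' pattern is an assumption
    for illustration; confirm the real IDs on Hugging Face.
    """
    if size not in SIZES:
        raise ValueError(f"unknown size: {size!r}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unknown resolution: {resolution!r}")
    return f"google/paligemma2-{size}-pt-{resolution}"


def all_checkpoints() -> list[str]:
    """Return every size/resolution combination (3 sizes x 3 resolutions)."""
    return [checkpoint_name(s, r) for s, r in product(SIZES, RESOLUTIONS)]


if __name__ == "__main__":
    for name in all_checkpoints():
        print(name)
```

In general, a smaller model at a lower resolution is cheaper to fine-tune, while a larger model or a higher resolution suits tasks that depend on fine visual detail, such as reading small text.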

Another selling point is long-form captioning: rather than simply recognizing objects, the model generates detailed, contextual captions that describe actions, emotions, and the overall scene. It is also said to perform well at recognizing chemical formulas and musical scores, spatial reasoning, and generating chest X-ray reports.

A demo site is also available.

Paligemma2 Vqav2 – a Hugging Face Space by merve
https://huggingface.co/spaces/merve/paligemma2-vqav2


As a test, let’s click the sample that asks what type of graph is shown.


The model then answered, “Accuracy after fine tuning.”


Google said, “We can’t wait to see what you create with PaliGemma 2. Join the vibrant Gemma community, share your projects on Gemmaverse, and let’s continue exploring the endless possibilities of AI together.”


Vasundhara Mali
