File:Gemini multimodal reasoning based on visual cues.png
Size of this preview: 800 × 564 pixels. Other resolutions: 320 × 226 pixels | 640 × 451 pixels | 1,024 × 722 pixels | 1,280 × 902 pixels | 1,495 × 1,054 pixels.
Original file (1,495 × 1,054 pixels, file size: 988 KB, MIME type: image/png)
File information
Structured data
Captions
Summary
editDescriptionGemini multimodal reasoning based on visual cues.png |
English: "Identifying the objects in the image (the Empire State Building) and recognizing what those are even with small levels of visual distortion in the image. Based on the image, the model is
also able to correctly identify the precise location of the person taking the photo. Source: photo taken by an author from the Gemini team." |
Date | |
Source | https://arxiv.org/abs/2312.11805 |
Author | Authors of the preprint: Gemini Team Google: Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, et al. |
Licensing
editThis file is licensed under the Creative Commons Attribution 4.0 International license.
- You are free:
- to share – to copy, distribute and transmit the work
- to remix – to adapt the work
- Under the following conditions:
- attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
File history
Click on a date/time to view the file as it appeared at that time.
Date/Time | Thumbnail | Dimensions | User | Comment | |
---|---|---|---|---|---|
current | 22:14, 4 March 2024 | 1,495 × 1,054 (988 KB) | Prototyperspective (talk | contribs) | Uploaded a work by Authors of the preprint: Gemini Team Google: Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, et al. from https://arxiv.org/abs/2312.11805 with UploadWizard |
You cannot overwrite this file.
File usage on Commons
There are no pages that use this file.