Google AI announces WIT: a dataset linking images, text, and context from Wikipedia

The dataset, which includes 11 million images and their contexts in 108 languages, is intended for training artificial intelligence models. It is available to the public, and Google will also hold a WIT-based competition together with the Wikimedia Foundation and the Kaggle website.

An example of an image and its context from Wikipedia, from Google AI's WIT project. Press photo

Google is celebrating its 23rd anniversary today. Google AI, one of the company's divisions, announced WIT: a dataset linking images, text, and Wikipedia context, open to the general public for training artificial intelligence models.

Google Research has published the details of WIT: a huge collection of images from Wikipedia matched with text in many languages, intended for training artificial intelligence models.

In a post on the Google AI blog, the researchers write:

“Traditionally, these datasets were created either by manually adding captions to images, or by crawling the web and extracting the alt text as image captions. While the former approach yields higher-quality data, the intensive manual annotation process limits the amount of data that can be generated. Another shortcoming of existing datasets is the lack of coverage in non-English languages. This naturally led us to ask: Is it possible to overcome these limitations and create a high-quality, large, multilingual dataset with a variety of content?”

“Today we introduce the Wikipedia-Based Image Text (WIT) dataset, created by extracting multiple text selections associated with an image from Wikipedia articles and Wikipedia image links. We applied rigorous filtering to retain only high-quality image-text sets. As detailed in “WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning”, presented at SIGIR ’21, the result is a repository of 37.5 million rich image-text examples, including 11.5 million unique images and their descriptions in 108 languages. The WIT dataset is available for download and use under a Creative Commons license.”

The unique advantages of the WIT dataset are:

  • Size: WIT is the largest publicly available multimodal dataset of image-text examples.

  • Multilingual: With 108 languages, WIT has ten times or more languages than any other dataset.

  • Contextual information: Unlike typical multimodal datasets, which have only one caption per image, WIT includes page-level and section-level contextual information.

  • Real-world entities: Wikipedia, being a broad knowledge base, is rich in real-world entities, which are represented in WIT.

  • Challenging test set: In our recent work accepted at EMNLP, all of the latest models demonstrated significantly lower performance on WIT than on traditional evaluation sets (e.g., a drop of about 30 points in recall).

A high-quality training set and a challenging evaluation benchmark

The extensive coverage of diverse concepts in Wikipedia means that WIT's evaluation sets serve as a challenging benchmark, even for modern models. We found that the average retrieval scores ...
