On the afternoon of May 14th local time, Google held its annual I/O developer conference in Mountain View, USA.
Over the course of the 110-minute keynote, "artificial intelligence (AI)" was mentioned 121 times, underscoring Google's all-in stance and evident ambition in the field.
Gemini, Google's flagship model, and its various iterations took center stage and captured most of the attention. Google is integrating it into almost all of its own products, including the Android operating system, Search, the Chrome browser, and Gmail, with a series of demonstrations that dazzled the audience.
Previously, Google Gemini came in three versions: Ultra, Pro, and Nano, each differing in size, performance, and the scenarios it is designed for.
Now, at the conference, Google has launched a new version, Gemini 1.5 Flash. Google said the new multimodal model is as capable as Gemini 1.5 Pro but has been optimized for "high-frequency, low-latency tasks," allowing it to respond more quickly. Google has also upgraded Gemini 1.5 Pro, which it says improves the model's translation, reasoning, and coding capabilities. In addition, Google said it has doubled Gemini 1.5 Pro's context window (the amount of information the model can take in at once) from 1 million tokens to 2 million tokens.
Currently, both Gemini 1.5 Pro and 1.5 Flash are open for public preview. Google also disclosed that the number of Gemini developers has now exceeded 1.5 million, and more than 2 billion users have experienced the power of Gemini.
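For developers trying the public preview, a request to either model takes only a few lines through Google's Gemini API. The sketch below is a minimal illustration in Python, assuming the google-generativeai SDK and a placeholder API key; exact model identifiers and quotas may differ from what the preview exposes.

```python
# Minimal sketch: calling the Gemini 1.5 preview models via the
# google-generativeai Python SDK. The API key is a placeholder.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; use a real key from Google AI Studio

# "gemini-1.5-flash" targets high-frequency, low-latency tasks;
# "gemini-1.5-pro" trades some speed for the larger context window.
model = genai.GenerativeModel("gemini-1.5-flash")

response = model.generate_content(
    "Summarize the main announcements from a developer keynote in three bullet points."
)
print(response.text)
```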
With Gemini's support, several Google products are gaining new features. For example, Google Photos will add an Ask Photos feature later this year that can search photos more intelligently, recognize different photo backgrounds, find photos by license plate number, and answer other questions about what the photos contain.
Google CEO Sundar Pichai said on stage that Gemini can "transform any input into any output": it can extract information from text, photos, audio, online and social videos, and real-time video from a phone camera, integrate that information, and then summarize the content and answer questions.
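As a rough illustration of what "any input" can mean for developers, the sketch below passes an image together with a text question, again assuming the google-generativeai SDK, the Pillow imaging library, and a hypothetical local photo file.

```python
# Sketch of a multimodal request: an image plus a text question.
# "bookshelf.jpg" is a hypothetical local photo; the model name is a preview placeholder.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-pro")

photo = Image.open("bookshelf.jpg")
response = model.generate_content(
    [photo, "List the book titles visible in this photo, one per line."]
)
print(response.text)
```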
Google demonstrated a video in which a person scanned the books on a shelf with a camera, and the book titles were recorded in a database so they could be looked up later.

Another major announcement at the conference was Astra, a new system launching later this year, which Google promises will be the most powerful and advanced artificial intelligence assistant it has ever released.
The current generation of AI assistants, such as ChatGPT, can retrieve information and provide answers, but that is largely where their capabilities end. This year, however, Google has rebranded its virtual assistants as more advanced "agents," which are said to possess reasoning, planning, and memory skills and to be capable of taking multiple steps to complete a task.
Oriol Vinyals, Vice President of Research at Google DeepMind, told MIT Technology Review that people will be able to use Astra through smartphones or even desktop computers, but the company is also exploring other options, such as embedding it in smart glasses or other devices.
Notably, sharp-eyed viewers of the demonstration video played at I/O spotted a device that appeared to be a prototype of Google Glass, suggesting Google may be reviving the smart-glasses project it abandoned years ago.
"We are at the early stages of developing artificial intelligence agents," said Google CEO Pichai in a conference call prior to the I/O conference."We have always hoped to build a general intelligence that is useful in everyday life," said Demis Hassabis, CEO and co-founder of Google DeepMind.
"Imagine these agents can see and hear what we do, better understand the environment we are in, and respond quickly in conversations, making the speed and quality of interaction more natural," he added, "This is what Astra will look like in the future."
The day before Google's I/O conference, its competitor OpenAI unveiled its own advanced AI assistant, GPT-4o. Google DeepMind's Astra responds to audio and video input in a way very similar to GPT-4o.
In Google's demonstration video, a user points a smartphone camera and smart glasses at objects and asks Astra to explain what they are. When the user points the device out the window and asks what neighborhood they are in, the AI system is able to recognize King's Cross in London, where Google DeepMind is headquartered.
It can also remind the user that their glasses are on the table, because it recorded that detail in an earlier interaction.

Vinyals said the demonstration showcased Google DeepMind's vision for real-time multimodal artificial intelligence that can handle many types of input, including speech, video, and text.
"We are very excited about the future, where we can truly get close to users and provide them with any help they want," he said. Google has also upgraded its artificial intelligence model Gemini to handle a larger amount of data, which helps it process larger documents and videos, and engage in longer conversations.
Technology companies are racing for dominance in artificial intelligence, and the largest of them have made AI agents their showcase to demonstrate that they are pushing the frontier of the technology.
Many technology companies, including OpenAI and Google DeepMind, have built artificial intelligence agents into their narratives. The stated goal of these companies is artificial general intelligence (AGI), a hypothetical super-capable AI system that remains largely conceptual.
Professor Chirag Shah, who specializes in online search at the University of Washington, said: "Ultimately, you will have an agent that truly understands you, can do many things for you, and can work across multiple tasks and domains."

That vision is appealing, but Google's announcements are also its latest effort to keep pace with rivals. Shah said that by launching these products, Google can collect more data from over a billion users and learn how they use its models and which models work well.
In addition to AI agents, Google also launched a number of other new AI features at I/O.
It is integrating AI more deeply into its search engine through a new feature called AI Overviews, which gathers information from across the web and distills it into brief summaries displayed as part of the search results. The feature has launched in the United States and will roll out to more countries and regions later.
Felix Simon, a researcher on artificial intelligence and digital news at the Reuters Institute for the Study of Journalism, said this will help speed up searching and give users more specific answers to more complex, niche questions.
"I think this is where search has always struggled to do well," he said.Another new feature of Google's artificial intelligence search is better planning. For example, people will soon be able to ask the search for dining and travel recommendations, just as they would ask a travel agency for restaurant and hotel recommendations.
Given a recipe, Gemini will be able to help users plan what to do or what to buy. Users can also converse with the AI system, asking it to handle many tasks, from simple ones such as reporting the weather to complex ones such as helping them prepare for an interview or an important speech.
People can also interrupt Gemini mid-response and ask clarifying questions, just as they would in a conversation with a human. The GPT-4o demonstrated by OpenAI the day before has the same capability.
To further compete with OpenAI, Google also launched Veo, a new video-generation AI system. Veo can generate short videos and understands prompts such as "time-lapse" or "aerial shot of a landscape," giving users finer control over the style of the clips.
Google has a significant advantage in training video-generation models because it owns YouTube. The company has already announced collaborations with artists such as Donald Glover and Wyclef Jean, who are using its technology to create their own work.

Earlier this year, when asked whether OpenAI's models had been trained on YouTube data, OpenAI Chief Technology Officer Mira Murati declined to give a clear answer.
Douglas Eck, Senior Research Director at Google DeepMind, was also vague when asked by MIT Technology Review about the training data used to create Veo, saying only that it "may be trained on some YouTube content, in accordance with our agreements with YouTube creators."
Shah noted that, on the one hand, Google promotes its generative AI as a tool artists can use to create, while on the other hand these tools most likely learn to create new things from the work of existing artists.
Artificial intelligence companies such as Google and OpenAI are facing a series of lawsuits from writers and artists who claim that their intellectual property is being used without consent or payment.
"It's a double-edged sword for artists," said Shaah.Finally, to better distinguish between AI-generated content and real content, Google has also expanded its SynthID watermarking tool. It is designed to detect misinformation generated by artificial intelligence, deepfakes, or phishing spam.
SynthID embeds imperceptible watermarks in generated content: humans cannot see them, but software that analyzes the pixel data can detect them. The tool can now scan content in the Gemini app and on the web, as well as content generated by Veo. Google says it plans to release SynthID as an open-source tool later this summer.
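Google has not published SynthID's actual watermarking scheme, but the general idea of a pattern hidden in pixel data (invisible to people, recoverable by software) can be illustrated with a toy least-significant-bit watermark. The sketch below is a simplified stand-in, not SynthID's algorithm, and such a naive mark would not survive the compression and editing a production watermark must tolerate.

```python
# Toy illustration of an invisible pixel watermark: hide bits in the
# least-significant bit of the red channel, then read them back with software.
# This is NOT SynthID's algorithm, only a sketch of the general concept.
import numpy as np

def embed_watermark(image: np.ndarray, bits: np.ndarray) -> np.ndarray:
    """Overwrite the red-channel LSB of the first len(bits) pixels."""
    marked = image.copy()
    flat = marked.reshape(-1, 3)                              # view over (pixels, RGB)
    flat[: bits.size, 0] = (flat[: bits.size, 0] & 0xFE) | bits
    return marked

def extract_watermark(image: np.ndarray, n_bits: int) -> np.ndarray:
    """Recover the hidden bits from the red-channel LSBs."""
    return image.reshape(-1, 3)[:n_bits, 0] & 1

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # stand-in "generated" image
payload = rng.integers(0, 2, size=128, dtype=np.uint8)        # watermark bits

marked = embed_watermark(img, payload)
assert np.array_equal(extract_watermark(marked, payload.size), payload)
assert np.max(np.abs(marked.astype(int) - img.astype(int))) <= 1   # change is imperceptible
```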