What are dolphins saying to each other? What are they trying to tell us? Recent advancements in machine learning and large language models (LLMs) could be moving us closer to achieving the long-elusive goal of interspecies communication.
Google on Monday, April 14, announced that its foundational AI model called DolphinGemma will be made accessible to other researchers in the field this summer. The tech giant claimed that this open AI model has been trained to generate “novel dolphin-like sound sequences” and will one day facilitate interactive communication between humans and dolphins.
“By identifying recurring sound patterns, clusters and reliable sequences, the model can help researchers uncover hidden structures and potential meanings within the dolphins’ natural communication — a task previously requiring immense human effort,” Google said in a blog post.
“Eventually, these patterns, augmented with synthetic sounds created by the researchers to refer to objects with which the dolphins like to play, may establish a shared vocabulary with the dolphins for interactive communication,” it added.
The AI model has been developed by Google in collaboration with AI researchers at Georgia Tech. It has been trained on datasets collected by field researchers working with the Wild Dolphin Project (WDP), a non-profit research organisation.
What is DolphinGemma?
DolphinGemma is a lightweight, small language model with about 400 million parameters, compact enough to run on the Pixel phones that WDP researchers use underwater, Google claimed.
Its underlying technology uses Google’s SoundStream tokenizer to convert dolphin sounds into a string of discrete, manageable units called tokens. The model architecture borrows from Google’s Gemma series of lightweight, open AI models.
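To give a rough sense of what that tokenisation step means, here is a minimal, purely illustrative sketch in Python: short frames of a waveform are mapped to the nearest entry in a fixed codebook, producing a sequence of integer tokens. The class and method names are hypothetical stand-ins, not SoundStream’s or DolphinGemma’s actual API.

```python
# Illustrative sketch only: turning an audio clip into discrete tokens by
# vector quantisation. A hypothetical stand-in, not Google's SoundStream API.
import numpy as np

class SoundStreamLikeTokenizer:
    """Toy quantiser: maps short audio frames to IDs from a fixed codebook."""

    def __init__(self, codebook_size: int = 1024, frame_len: int = 320):
        rng = np.random.default_rng(seed=0)
        self.codebook = rng.normal(size=(codebook_size, frame_len))
        self.frame_len = frame_len

    def encode(self, waveform: np.ndarray) -> list[int]:
        tokens = []
        for start in range(0, len(waveform) - self.frame_len + 1, self.frame_len):
            frame = waveform[start:start + self.frame_len]
            # Pick the codebook entry closest to this frame of audio.
            distances = np.linalg.norm(self.codebook - frame, axis=1)
            tokens.append(int(np.argmin(distances)))
        return tokens

# A one-second clip sampled at 16 kHz becomes a short sequence of token IDs.
clip = np.random.randn(16_000)
print(SoundStreamLikeTokenizer().encode(clip)[:10])
```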
DolphinGemma is an audio-in, audio-out model, meaning that it takes sound as input and produces sound as output rather than text, and likely cannot respond to written prompts.
Similar to how LLMs predict the next word or token in a sentence in human language, DolphinGemma analyses “sequences of natural dolphin sounds to identify patterns, structure and ultimately predict the likely subsequent sounds in a sequence,” Google said.
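In other words, this is next-word prediction applied to sounds rather than text. The toy sketch below uses simple bigram counts instead of a neural network, with made-up token IDs, purely to illustrate the idea of predicting the likely next token in a sequence.

```python
# Conceptual sketch of "predict the next sound token" using bigram counts
# instead of a neural network; the token IDs are made up for illustration.
from collections import Counter, defaultdict

def train_bigram(sequences):
    """Count which token tends to follow which across the training sequences."""
    following = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            following[current][nxt] += 1
    return following

def predict_next(following, context):
    """Return the most likely next token given the last token of the context."""
    candidates = following.get(context[-1])
    return candidates.most_common(1)[0][0] if candidates else None

# Tokenised sound sequences (hypothetical IDs from an audio tokenizer).
training = [[17, 42, 42, 8], [17, 42, 8, 8], [3, 17, 42, 8]]
model = train_bigram(training)
print(predict_next(model, [3, 17, 42]))  # prints 8: it most often follows 42
```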
The company revealed that it plans to release DolphinGemma as an open model so that other researchers can fine-tune it on the sounds of other cetacean species, such as bottlenose and spinner dolphins.
What data was used to train DolphinGemma?
According to Google, the AI model was trained on WDP’s dataset of sounds made by wild Atlantic spotted dolphins. This specific community of dolphins is found in the waters of the Bahamas, an island country in the Atlantic southeast of Florida.
The dataset used to train the AI model comes from underwater video footage and audio recordings collected over decades. This data was labelled by WDP researchers to identify individual dolphins as well as their life histories and observed behaviours.
Rather than observing from the surface, WDP researchers gathered the data underwater, which they found helped them directly link the sounds made by the dolphins to specific behaviours.
The DolphinGemma training dataset comprises unique dolphin sounds such as signature whistles (used by mothers to call their calves), burst-pulse squawks (usually heard when two dolphins are fighting), and click buzzes (often heard during courtships or chasing sharks).
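To picture how such labelled field data might be organised, here is a hypothetical record structure written in Python; the field names and values are illustrative only and do not reflect WDP’s actual schema.

```python
# Illustrative only: one way a labelled field recording might be represented.
# Field names and values are hypothetical, not WDP's real data format.
from dataclasses import dataclass

@dataclass
class LabelledDolphinClip:
    audio_path: str   # underwater recording of the vocalisation
    video_path: str   # synchronised footage of the behaviour
    dolphin_id: str   # which individual produced the sound
    sound_type: str   # e.g. "signature_whistle", "burst_pulse_squawk", "click_buzz"
    behaviour: str    # what was observed at the time

example = LabelledDolphinClip(
    audio_path="clips/dive_042.wav",
    video_path="clips/dive_042.mp4",
    dolphin_id="SPOT-117",
    sound_type="signature_whistle",
    behaviour="mother calling calf",
)
print(example.sound_type)
```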
How will the DolphinGemma AI model be used?
In order to establish a shared vocabulary of dolphin sounds, Google said it teamed up with Georgia Tech researchers to develop the CHAT system.
CHAT is short for Cetacean Hearing Augmentation Telemetry. It is an underwater computer system designed to link AI-generated dolphin sounds with specific objects that dolphins enjoy, such as seagrass or the scarves researchers use during play.
Google said the CHAT tool enables two-way interaction between humans and dolphins by accurately detecting a dolphin’s whistle underwater, identifying the matching whistle sequence in its training dataset, and informing the human researcher (via underwater headphones) which object the dolphin has asked for.
This would enable the researcher to respond quickly by offering the correct object to the dolphin, reinforcing the association between the whistle and the object, Google said.
“By demonstrating the system between humans, researchers hope the naturally curious dolphins will learn to mimic the whistles to request these items. Eventually, as more of the dolphins’ natural sounds are understood, they can also be added to the system,” the company added.
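Put very roughly in code, the interaction loop described above might look like the sketch below: an incoming whistle is compared against a small library of synthetic whistles tied to play objects, and the researcher is told of the best match. The matching logic and names are hypothetical simplifications, not the actual CHAT software.

```python
# Highly simplified sketch of the CHAT interaction loop described above.
# The matching logic and names are hypothetical, not the real CHAT system.
import numpy as np

# Synthetic whistles CHAT associates with play objects (toy feature vectors).
WHISTLE_LIBRARY = {
    "seagrass": np.array([0.9, 0.1, 0.0, 0.2]),
    "scarf":    np.array([0.1, 0.8, 0.3, 0.0]),
}

def match_object(heard_whistle: np.ndarray, threshold: float = 0.8):
    """Compare an incoming whistle to the library and return the closest object."""
    best_object, best_score = None, threshold
    for obj, template in WHISTLE_LIBRARY.items():
        # Cosine similarity as a stand-in for real acoustic matching.
        score = float(np.dot(heard_whistle, template)
                      / (np.linalg.norm(heard_whistle) * np.linalg.norm(template)))
        if score > best_score:
            best_object, best_score = obj, score
    return best_object

def notify_researcher(obj: str) -> None:
    # In the real system this cue reaches the researcher's underwater headphones.
    print(f"Dolphin appears to be requesting: {obj}")

heard = np.array([0.85, 0.15, 0.05, 0.18])  # pretend this came from a hydrophone
matched = match_object(heard)
if matched:
    notify_researcher(matched)  # the researcher then offers the matched object
```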
Google said its Pixel 6 series had shown it was capable of processing dolphin sounds in real time. It said the upcoming Pixel 9 generation would integrate specific speaker and microphone functions and be upgraded with advanced processing “to run both deep learning models and template matching algorithms simultaneously.”
“Using Pixel smartphones dramatically reduces the need for custom hardware, improves system maintainability, lowers power consumption and shrinks the device’s cost and size — crucial advantages for field research in the open ocean,” the tech giant said.
Can AI chatbots help us talk to dolphins?
For several years now, researchers have been studying ways to leverage AI and machine learning algorithms to make sense of animal sounds.
They have had success applying automatic detection algorithms based on convolutional neural networks to pick out animal sounds from recordings and categorise them by their acoustic characteristics.
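As a generic illustration of that approach, a small convolutional classifier over spectrograms might look like the sketch below, written with PyTorch; it is not any specific published detector.

```python
# Generic sketch of a CNN that classifies spectrograms of animal sounds into
# a handful of categories. Not any specific published detection system.
import torch
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    def __init__(self, num_classes: int = 3):  # e.g. whistle / squawk / buzz
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes)
        )

    def forward(self, spectrogram: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(spectrogram))

# One fake spectrogram: (batch, channels, mel bins, time frames).
logits = SpectrogramCNN()(torch.randn(1, 1, 64, 128))
print(logits.shape)  # torch.Size([1, 3]): a score for each sound category
```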
Deep neural networks have also made it possible to find hidden structure in sequences of animal vocalisations. As a result, AI models trained on examples of animal sounds can generate novel, synthetic versions of those sounds.
While these supervised learning models are able to generate animal sounds based on human-labelled examples, what about animal sounds that are not part of the training dataset or haven’t been labelled? This is where self-supervised learning models like ChatGPT come in.
These self-supervised models are trained on vast amounts of data pulled from every corner of the internet. Researchers expect these datasets may contain animal sounds that have previously been inaccessible.
Yet, there are several major challenges in developing an AI chatbot that lets humans talk to animals. For instance, researchers have pointed out that animals likely communicate using more than just sound, incorporating other senses such as touch and smell.