Animal Crossing’s fake language sounds different in Japanese

marcan_42 · on Dec 25, 2020

It was never supposed to be nonsense. To anyone playing in Japanese, or presumably in other phonetically consistent languages, it's obviously a sped-up, slurred, somewhat mangled version of what the text is saying. I guarantee there isn't a single Japanese player who failed to notice this or who thinks it's pure gibberish. Japanese is not my native language and I noticed immediately; any native speaker would have too.

The problem is that English is a phonetically inconsistent language, with a massive number of rules required to even begin to approximate the mapping from text to phonemes (and zillions of exceptions). So this kind of really dumb TTS not intended to be actually intelligible doesn't work at all in English. And so it sounds like actual nonsense.
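To make the contrast concrete, here is a minimal Python sketch. It is only an illustration of the argument, not anything taken from the game: the table `KANA_TO_SOUND`, the helper `kana_to_sounds`, and the rough sound labels in `OUGH_SOUNDS` are all made up for this example. Kana map to sounds with a plain table lookup, while the same English letter sequence needs a per-word exception.

```python
# Hypothetical, tiny kana-to-sound table (a real one would cover all kana).
KANA_TO_SOUND = {
    "た": "ta", "ぬ": "nu", "き": "ki",
    "も": "mo", "り": "ri",
}

def kana_to_sounds(text):
    """Dumb TTS front end: one table lookup per kana, unknown characters skipped."""
    return [KANA_TO_SOUND[ch] for ch in text if ch in KANA_TO_SOUND]

# English: the same spelling "ough" is pronounced differently in each word,
# so a lookup keyed on letters alone cannot work. Sound labels are rough
# approximations chosen for this example.
OUGH_SOUNDS = {
    "though": "oh",
    "through": "oo",
    "tough": "uff",
    "cough": "off",
    "bough": "ow",
}

if __name__ == "__main__":
    print(kana_to_sounds("たぬき"))
    print(sorted(set(OUGH_SOUNDS.values())))
```

Five words, one spelling, five different sounds: that is the kind of exception a really dumb TTS has no way to handle.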

jrowen · on Dec 25, 2020

It did work in English though. Maybe not as well, but I do remember noticing in the original Animal Crossing that there was some correlation between the sounds and the text. Even if it was just the timing and intonation, it could definitely be understood that the lines were actually being spoken in some form. I don't think it was TTS because it was too accurate in a really subtle way.

Or maybe it was and they tweaked it well enough to work. It's been a while, my memory could be off. I don't know if it would have been prohibitive on the GameCube to have audio of every line (there were a lot), but I wouldn't put it past them to have done so.

marcan_42 · on Dec 25, 2020

It's TTS. They absolutely don't go around dubbing every line for this; that'd be insane. That's the whole point of this system: to provide some fun-sounding audio for the lines without having to actually dub them. You can tell because they speak your island name and your own name exactly the same way as the rest of the text, accurately. And because it's pretty monotone and consistent.

Besides, can you imagine voice actors dubbing this stuff in this kind of voice line by line? They'd go insane.

jrowen · on Dec 25, 2020

Yeah that makes sense. I guess I figured they could have processed the spoken audio, not that the actors would have actually talked like that. I was just fooled by the fact that their TTS sounds more passably human than any other I've heard (in an abstract way).

echelon · on Dec 25, 2020

This technique predated Animal Crossing, though.

Banjo-Kazooie [1] was the first game on the N64 to use what Animal Crossing terms "Animalese/Bebebese" [2], and their intention was never to build a TTS engine. The first Banjo was released in 1998, while Dobutsu no Mori (Animal Crossing) for N64 didn't come out until 2001.

Nintendo was definitely talking with Rareware at the time and they exchanged ideas and techniques on game engine design, platformer mechanics, etc. Interviews from the Rare side admitted this (I'll need to dig up some sources to include here).

I'm curious if Nintendo picked up the Animal Crossing Bebebese voices directly from Banjo-Kazooie.

[1] https://www.youtube.com/watch?v=9ZE5A3DbHDk (Actually a video of the year 2000 sequel, Banjo-Tooie, but this is a better example of the same voice engine using various voices.)

[2] https://animalcrossing.fandom.com/wiki/Language#Bebebese

dada78641 · on Dec 25, 2020

I've played it in both Japanese and in English, and while Japanese is more phonetically consistent I don't understand how people could miss it in English either. You can tell right at the beginning when Rover says your name back to you. Even if it's usually difficult to make out, occasionally you should notice it's not totally random, especially when you start playing around with giving the villagers catchphrases.

ubercow13 · on Dec 25, 2020

This reminds me of the game killer7, where some NPCs would speak in a partially distorted voice. However, in that game it was always quite interesting listening to the TTS, because the script an NPC is reading seems to be paraphrased from the subtitle (and maybe partially complete gibberish?), and it feels like you’re fading in and out of understanding the language as they speak [1], which is in keeping with the surreal style of the game and presumably intentional.

[1] https://youtu.be/B-Pw72j6xDg

TeaDude · on Dec 25, 2020

Fun fact: In the Japanese version it's literally just Mac TTS voices with machine-translated English.

The distortion effects were presumably added so people overseas wouldn't notice how jarring it is. It's such a cool effect it feels like something that should have been in the original release.

ubercow13 · on Dec 25, 2020

Oh wow that’s so interesting! Especially how incomprehensible most of the dialog still is. I just realised this account posted the same video with the non-distorted speech too [1]

[1] https://youtu.be/Jz7FUYRQqfY

CorrectHorseBat · on Dec 25, 2020

That is assuming they tried to write their own TTS instead of just taking an existing, working English TTS and speeding it up and distorting it. Why would they do that?

To me it does sound exactly like that too in English.

I'd wager it's more obvious in the Japanese version because Japanese is the exceptional language. There are only 44 syllables in Japanese (English has about 16,000) and one would probably still notice this in otherwise unintelligible distorted speech.

marcan_42 · on Dec 25, 2020

Writing their own TTS sounds like what Nintendo would do, to be honest.

Spanish is about as phonetically consistent as Japanese. Most languages have simpler phonology than English. It's not about the number of possible syllables, it's about the rules to go from text to phonemes. The rules in English are immensely complicated and inconsistent. Meanwhile, I can accurately describe Spanish phonology, such that you'd be able to pronounce ~any Spanish word (English loanwords excluded) accurately including stress, in about one page. Written Japanese lacks pitch accent information, but otherwise works similarly.

(By the way, your stats are off; modern Japanese has roughly 106 possible syllables (morae) by rough count.)

lloda · on Dec 25, 2020

English spelling is inconsistent. English phonology is complex. These are different things.

CorrectHorseBat · on Dec 25, 2020

True, but if they didn't write their own TTS nothing of that matters.

Most TTS software out there will be better at English than any other language despite the more complex phonology. Then it's solely about distinguishing language characteristics, and I think the number of syllables would have an effect on that. Japanese has a lower information density than English (at the same speed), so with the same amount of distortion Japanese should be more recognizable.

hahamrfunnyguy · on Dec 25, 2020

Luigi's Mansion 3 does something similar; I wouldn't be surprised if it's using the same or similar technology under the hood. For one of the characters, bits of the dialog sounded like incoherent Japanese and I could make out some words. I haven't played in a while, but I recall different characters had different-sounding accents too. I like this approach for in-game dialog!

javchz · on Dec 25, 2020

Undertale for some reason manages to do that effect well in English. I don't know how.

vimy · on Dec 25, 2020

Reminds me of a song from 1972 by an Italian comedian. The song is called "Prisencolinensinainciusol" which means...well, nothing. It's gibberish. In fact, the entire song is nonsense lyrics made to sound like English.

https://www.youtube.com/watch?v=-VsmF9m_Nt8&feature=emb_titl...

oooooooooooow · on Dec 25, 2020

Adriano Celentano is a singer-songwriter, actor, director, screenwriter, composer, film editor and TV author[1]. He definitely has humor and used irony throughout his career, but I wouldn't call him a comedian by any stretch.

[1] Taken from wikipedia

Waterluvian · on Dec 25, 2020

It's also a pretty catchy song. But yeah it messes with my brain that is working hard to parse English. It's interesting. If a language is clearly not English I don't have the same issue.

wincy · on Dec 25, 2020

Is English your native language? Because for me at least as a native English speaker I immediately recognize it as gibberish and have no such problem. I’m curious if it’s a native/non-native issue with parsing or not.

Waterluvian · on Dec 25, 2020

Yes it is. I'm a very strong English speaker/listener.

I get this in a lot of things. Human brains love finding patterns and everyone does this to an extent. I think mine does it so much more. For example if I ever see numbers anywhere I'm compulsively adding them to see if any nicer looking numbers result.

I wouldn't be surprised if this inclination to hear gibberish and try to parse it into language is a me thing.

juancn · on Dec 25, 2020

Animalese is a simple phonetic translation of the text. It’s super noticeable in Spanish which is fairly straightforward. In English it’s a simplistic translation to phonemes as far as I can tell.

ziml77 · on Dec 25, 2020

I first realized what they were doing when talking to Blathers. The hoots in his speech were easy to pick out in the audio which then led to hearing that the rest of the text was being spoken. After noticing it, it's been much easier to hear it happening for all characters.

BoorishBears · on Dec 25, 2020

You can notice it at the screen where you enter your name at the start of the game.

As you move the cursor over each letter, the character speaks out a phonetic sound similar to the letter you're on.
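A letter-by-letter scheme like that could be sketched roughly as follows. This is a guess at the general idea, not the game's actual method: the function name `animalese_clips`, the `<letter>.wav` clip naming, and the timing values are all assumptions made up for the example. It only plans which clip to play for each character and for how long; actual audio playback is left out.

```python
import string

def animalese_clips(text, speedup=2.0, base_ms=150):
    """Plan one short clip per character.

    Returns a list of (clip_name, duration_ms) pairs. Letters map to a
    hypothetical per-letter recording, whitespace maps to silence, and
    everything else (digits, punctuation) is skipped.
    """
    duration = base_ms / speedup  # speeding up shortens every clip equally
    clips = []
    for ch in text.lower():
        if ch in string.ascii_lowercase:
            clips.append((f"{ch}.wav", duration))
        elif ch.isspace():
            clips.append(("silence", duration))
    return clips

if __name__ == "__main__":
    for clip in animalese_clips("Hi Tom!"):
        print(clip)
```

Because the player's name goes through the same mapping as every other line, it would come out "spoken" just like the rest of the text, which matches what people report hearing.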

Galanwe · on Dec 25, 2020

Interestingly, we have a word in French to call these gibberish languages that sound like real ones but have no meaning.

It's called speaking "yogurt".

It was originally used because young French people wanted to sing the English songs that they heard, but didn't know the language, so they would make up sounds that sounded like English.

proper_black · on Dec 25, 2020

This exact thing is called "washawasheo" or "washawashear" in (Mexican) Spanish. It would be cool to know what languages have this, as I'm pretty sure not all of them do (Russian doesn't, for example).

ryanschneider · on Dec 25, 2020

Reminds me of the (supposed?) etymology of “barbarian”:

https://en.wikipedia.org/wiki/Barbarian

> The Greeks used the term barbarian for all non-Greek-speaking peoples, including the Egyptians, Persians, Medes and Phoenicians, emphasizing their otherness. According to Greek writers, this was because the language they spoke sounded to Greeks like gibberish represented by the sounds "bar..bar..;"

ta1234567890 · on Dec 25, 2020

And a band made a hit song in Spanish about it in the early 2000s, Asereje: https://youtu.be/V0PisGe66mY

grey_earthling · on Dec 25, 2020

This song is loosely a cover of Rapper's Delight.

johncoltrane · on Dec 25, 2020

Interesting, I have never heard the expression "chanter en yaourt" used for anything other than fake English.

33degrees · on Dec 25, 2020

Interesting! I imagine this is specific to France as I’ve never heard this in Québec...

treeman79 · on Dec 25, 2020

How English sounds to foreigners.

https://youtu.be/Vt4Dfa4fOEY

lovemenot · on Dec 25, 2020

Yeah, that's good.

As a native English speaker (UK), I'll be pedantic. That's also how Americans sound to me.

wincy · on Dec 25, 2020

My wife is from New York and has a very difficult time understanding non-US accents. She has to watch British period dramas with subtitles.

garmaine · on Dec 25, 2020

I thought that link was going to be Prisencolinensinainciusol:

https://youtu.be/_g6YxkSqL20

Razengan · on Dec 25, 2020

Instead of the constant focus on graphics graphics graphics, why don’t companies improve other tech like speech synthesis for a while?

That would bring about a new boom (pun) of creativity by allowing indie devs to write complex stories with spoken dialogue without having to worry about hiring actors and immutable recording sessions.

cbhl · on Dec 25, 2020

There aren't super technical burdens to speech synthesis in games; for example, Jackbox Party Pack 7 runs just fine on the Switch and contains a speech synthesis engine for "Blather Round".

I imagine it's mostly a licensing/cost thing (since a "voice" for a speech synthesis engine still requires hiring an actor and doing a recording session).

Razengan · on Dec 25, 2020

> since a "voice" for a speech synthesis engine still requires hiring an actor

That's what I'm talking about. Surely we can do away with that if we try, just as we don't need real people to build 3D models -from- if we don't want them.

klodolph · on Dec 25, 2020

I’m not sure what you mean when you say we don’t need real people to build 3D models. If we want 3D models, someone has to make them. That person might be a traditional modeler working in Blender, or a sculptor in ZBrush, or might be someone with a 3D scanner doing photogrammetry. The tools are changing, but it is still people using the tools.

Just like how we invented an “automatic programming” system where the computer will do programming for you, and it turns out that once we’ve made automated programming we have more programmers and not fewer. The tools for making 3D models are getting better and easier to use, and as photogrammetry is being used more and more, we see larger teams of modelers, not smaller.

zerocrates · on Dec 25, 2020

They're comparing making a bespoke 3D model of a character as opposed to scanning an actual person, saying similarly you could just create a voice from scratch rather than record a real one.

klodolph · on Dec 25, 2020

I thought that was obvious, so I guess my comment wasn’t clear.

A 3D scanner is an artist’s tool. By using a 3D scanner, you aren’t getting rid of artists, you are just changing how artists do their jobs. Vocal synthesizers and vocal transformers, similarly, aren’t making it possible to press a button and get reasonable sounding voice in your games if you don’t have a voice actor making it possible. If you aren’t convinced, then just look at soundtracks. You can press a button and your iMac will spit out the sounds of the BBC Symphony Orchestra violin section. In spite of this, getting a symphonic score is still expensive. It’s expensive enough that TV shows (with sizable budgets) often skip out on the symphonic score and do something cheaper. We haven’t gotten rid of musicians, it’s just that musicians are much more likely to have computers.

There is, in theory, nothing stopping you from buying like $200 in software and making a symphony orchestra right now. The problem is that you have no idea how to write for a symphony orchestra. For the same reason, if you have an iPhone 12 or something similar then you can start using the LIDAR features and making a 3D model using photogrammetry in moments, except for the fact that you have no idea how to make a 3D model.

I think there’s a trap that people fall into, thinking that technology is just around the corner that will get rid of job X, Y, or Z. Often what you end up with is MORE people doing job X, Y, and Z, it’s just that they use computers to do it, and have a different skill set.

imtringued · on Dec 25, 2020

What a lot of people fail to grasp is that there is a fixed cost to staying alive. The big problem for a lot of skills is that the production capacity of a human is not high enough to pay for that fixed cost or just barely profitable enough for the top 10000 humans to have a career in this skill. If you somehow increase the productivity of humans with that skill you are massively reducing the barrier to entry which means more people can make a living out of it.

erikpukinskis · on Dec 25, 2020

Most companies are too dysfunctional to employ any strategy other than “do an okay job at all the same things everyone else in the industry is doing”.

A company that can do its own thing and survive at it is actually pretty rare.

underwater · on Dec 25, 2020

There are companies like replicastudios.com tackling this. I imagine it's not likely that a games company can build a massive lead in what is a very academic field.

jonathankoren · on Dec 25, 2020

Reminds me of glossolalia. The speaker believes they’re speaking a foreign language, but when you examine the “language” it’s just random phonemes from the speaker’s native language.

bryanh · on Dec 25, 2020

Someone on Reddit recreated the audio effect in Python, pretty fun and well done!

Github: https://github.com/equalo-official/animalese-generator

Video Explainer: https://www.youtube.com/watch?v=RYnI_ZLj5ys

coding123 · on Dec 25, 2020

I'm curious if that's the same for other games that do similar "gibberish" sounds like Mario Odyssey and Zelda games - I do think the gibberish "sounds" englishy. I can see them making it sound more like Japanese gibberish in Japan.

Shared404 · on Dec 25, 2020

The clothes shop attendant out front in Kakariko Village actually yells something in Japanese iirc.

Edit: In the English version of Breath of the Wild.

nubb · on Dec 25, 2020

A thread on gibberish video game languages isn’t complete without a mention of Simlish.

https://www.google.com/amp/s/www.theverge.com/platform/amp/2...

etiam · on Dec 25, 2020

On not all gibberish being created equal: it's been long enough since "What Languages Sound Like To Foreigners" that some people here may have missed this.

https://www.youtube.com/watch?v=ybcvlxivscw

lhr0909 · on Dec 25, 2020

I believe they run the dialogs through a program and generate the sounds from it. I play AC in Chinese and the words do sound like Chinese. It is just crazy to wrap my head around the amount of effort they put into this game.

bschwindHN · on Dec 25, 2020

I play it in Japanese, I thought it was always just a high-pitched, sped up version of the sounds the language is composed of.

komali2 · on Dec 25, 2020

When I was a kid the only way I could play video games was in one-week bursts in the form of Hollywood Video rentals. When we picked up Animal Crossing for GameCube and I heard Animalese, I thought my disc was messed up and went back to swap it out. I was too young to really know how to google something like that.

lloda · on Dec 25, 2020

I haven't heard AC's speech, but the description sounds a bit like the way Donald Duck talks.