Robber language generator

September 27, 2020 · 11 min read

Software Engineer at Frilans Finans

Robber language (Swedish: rövarspråket) is said to have been invented by Sture Lindgren — husband of the author Astrid Lindgren — when he played with his friends as a child. Astrid Lindgren's books about Kalle Blomkvist made the language popular in Sweden.

Robber language is a spoken code where every consonant is replaced by the consonant + "o" + the consonant again. From that simple rule, building a robber language generator should be dead easy. When it comes to written robber language, however, there are a few more aspects to consider. So this post is really about a written robber language generator.

note

Robber language was designed for Swedish and doesn't translate cleanly to English. Swedish has a clean consonant/vowel split — the letter "y" is a vowel, and "x" is always pronounced "ks" — which is what makes the rule "consonant + o + consonant" work consistently. English is messier: "y" flips between consonant and vowel ("yes" vs "happy"), "x" can sound like "ks" ("ax") or "z" ("xylophone"), and digraphs like "sh", "ch" and "th" represent single sounds that the letter-by-letter rule splits apart.

Spoken English actually works fine if you treat a consonant as a consonant sound rather than a letter: "ship" is then one consonant sound + vowel + consonant sound, not s-h-i-p. By sound, "ship" translates to "shoshipop"; a letter-by-letter generator would instead spit out "soshohipop", which doesn't match how the word is pronounced. Doing it by sound puts the burden on the speaker to do the phonetic analysis on the fly, and a generator for it would need a pronunciation dictionary or a grapheme-to-phoneme model rather than the simple letter-by-letter substitution this post is about. So the Swedish examples are the ones that actually sound right when spoken. Even Swedish isn't perfectly clean, though: letter combinations like "sj", "sch" and "tj" are single sounds that the letter-by-letter rule splits apart. By convention that split is simply accepted rather than worked around.
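The two readings of "ship" can be sketched in a few lines. This is only an illustration under stated assumptions: the `digraphs` list and both function names are hypothetical, the code handles lowercase input only, and a fixed digraph list is a crude stand-in for real phonetic analysis.

```javascript
// Hypothetical helper names; the digraph list is an assumption for illustration only.
const consonants = "bcdfghjklmnpqrstvwxz"
const digraphs = ["sh", "ch", "th"]

// Letter-by-letter: every consonant letter is expanded on its own.
const byLetter = (text) =>
	[...text]
		.map((ch) => (consonants.includes(ch) ? ch + "o" + ch : ch))
		.join("")

// By (approximate) sound: a listed digraph is expanded as one unit,
// everything else falls back to the letter-by-letter rule.
const byDigraph = (text) => {
	let result = ""
	for (let i = 0; i < text.length; i++) {
		const pair = text.slice(i, i + 2)
		if (digraphs.includes(pair)) {
			result += pair + "o" + pair
			i++ // skip the second letter of the digraph
		} else {
			const ch = text[i]
			result += consonants.includes(ch) ? ch + "o" + ch : ch
		}
	}
	return result
}

console.log(byLetter("ship")) // "soshohipop"
console.log(byDigraph("ship")) // "shoshipop"
```

Even this small version needs language-specific knowledge that the plain rule doesn't, which is why the rest of the post sticks to letters.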

TL;DR

The simple spoken rule (consonant + "o" + consonant) needs a few additions to work in writing — mainly handling "x" and uppercase.
This post walks through two strategies for building a generator in JavaScript: looping over the consonants, and looping over the text to translate.
A regex with a callback can collapse the whole thing into two lines, shown as the final alternative.


The robber language model

The most basic description of robber language could be:

After every consonant, the letter "o" and the same consonant are added.

So "hej" (Swedish for "hi") becomes "hohejoj" and "rövarspråket" becomes "rorövovarorsospoproråkoketot". This simple rule is enough for spoken language. For written language, however, some additional rules are needed. How does it work with uppercase and lowercase in writing? Based on the description above, the same consonant should be written again. Does that mean "Hej" becomes "HoHejoj"? I assume uppercase and lowercase rules should work the same as in regular writing — no uppercase in the middle of words. So "Hej" should become "Hohejoj".

Spoken language is the base: written robber language should sound right when read aloud. The letter "x" is a consonant, so the basic rule would turn it into "xox". But since "x" is pronounced "ks" in Swedish, it matches the spoken language better to treat "x" as "ks", which translates to "koksos". So "yxa" (axe) becomes "ykoksosa" rather than "yxoxa".

The rules can be summarized as follows:

1. The letter "x" is replaced with "ks".
2. After every consonant, the letter "o" and the same consonant are added.
3. For uppercase consonants, only the consonant before "o" is uppercase; the one after it is lowercase.

The ordering matters: running rule 1 first lets the consonant rule treat every consonant uniformly afterward, instead of carrying a special case through the main loop. This preprocessing approach is what most of the function-based examples below default to — example 10 shows the inline alternative.
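As a rough illustration of why the order matters, the sketch below applies the "x" rule and the consonant rule as two separate passes over a single word, in both orders. The function names `replaceX` and `expandConsonants` are made up for this illustration, and the sketch only handles lowercase input.

```javascript
// Hypothetical helpers for illustration; lowercase input only.
const consonants = "bcdfghjklmnpqrstvwxz"

// The "x" rule: "x" becomes "ks".
const replaceX = (text) => text.replaceAll("x", "ks")

// The consonant rule: after every consonant, add "o" and the same consonant.
const expandConsonants = (text) =>
	[...text]
		.map((ch) => (consonants.includes(ch) ? ch + "o" + ch : ch))
		.join("")

// "x" rule first: "yxa" -> "yksa" -> "ykoksosa"
console.log(expandConsonants(replaceX("yxa"))) // "ykoksosa"

// Consonant rule first: "yxa" -> "yxoxa" -> "yksoksa"
console.log(replaceX(expandConsonants("yxa"))) // "yksoksa"
```

Only the first order produces the spoken form described above.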

Building a robber language generator

Let's think about how a robber language generator could be built in JavaScript. First, a reminder of the simple rule for the spoken language:

After every consonant, the letter "o" and the same consonant are added.

Starting from the consonants

The simplest approach I can think of is to loop over a string of all the consonants and, in each iteration, replace the current consonant in the sentence with the consonant + "o" + the same consonant again. We'll use "Min mening" (Swedish for "My sentence") as our test input. Note that "y" isn't in the consonants list because it's a vowel in Swedish:

Example 1

const consonants = "bcdfghjklmnpqrstvwxz"

let sentence = "Min mening"

consonants.split("").forEach((consonant) => {
	sentence = sentence.replaceAll(consonant, consonant + "o" + consonant)
})

// sentence = "Minon momenoninongog"

Try a few other inputs and you'll see that the generator works, but only sometimes. The consonants in the variable are lowercase and JavaScript string matching is case-sensitive, so the generator leaves uppercase consonants untouched: the "M" in "Min" is still a bare "M" in the result above. Why not add all consonants as uppercase to the variable too?

Example 2

const consonants = "bcdfghjklmnpqrstvwxzBCDFGHJKLMNPQRSTVWXZ"

let sentence = "Min mening"

consonants.split("").forEach((consonant) => {
	sentence = sentence.replaceAll(consonant, consonant + "o" + consonant)
})

// sentence = "MoMinon momenoninongog"


OK, that handles uppercase too. But it doesn't feel particularly nice to have to specify every consonant twice. The fact that "Min" becomes "MoMinon" also violates rule 3, which says that only the consonant before the "o" may be uppercase in a translation. How can we improve the generator's code? JavaScript strings have the built-in methods toUpperCase() and toLowerCase() that we can use:

Example 3

const consonants = "bcdfghjklmnpqrstvwxz"

let sentence = "Min mening"

consonants.split("").forEach((consonant) => {
	sentence = sentence.replaceAll(consonant, consonant + "o" + consonant)
	sentence = sentence.replaceAll(
		consonant.toUpperCase(),
		consonant.toUpperCase() + "o" + consonant,
	)
})

// sentence = "Mominon momenoninongog"


Or like this:

Example 4

const consonants = "bcdfghjklmnpqrstvwxz"
const allConsonants = consonants + consonants.toUpperCase()

let sentence = "Min mening"

allConsonants.split("").forEach((consonant) => {
	sentence = sentence.replaceAll(
		consonant,
		consonant + "o" + consonant.toLowerCase(),
	)
})

// sentence = "Mominon momenoninongog"


Between examples 3 and 4, I think 3 is nicest because it's easier to understand at a glance. Example 4 is in practice the same solution as example 2, just packaged more cleverly — and the extra cleverness doesn't really buy us anything. So I'll continue from example 3.
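One detail both examples quietly rely on is that the lowercase replacement runs before the uppercase one. Here is a hedged sketch of what happens if the order in example 4 is flipped. The `translate` function and its `allConsonants` parameter are illustrative names, not part of the examples above.

```javascript
const consonants = "bcdfghjklmnpqrstvwxz"

// Illustrative helper: runs the loop from example 4 over a given consonant list.
const translate = (text, allConsonants) => {
	allConsonants.split("").forEach((consonant) => {
		text = text.replaceAll(
			consonant,
			consonant + "o" + consonant.toLowerCase(),
		)
	})
	return text
}

// Lowercase first, as in example 4.
console.log(translate("Min mening", consonants + consonants.toUpperCase()))
// "Mominon momenoninongog"

// Uppercase first: the "m" added after "Mo" is expanded again by the lowercase pass.
console.log(translate("Min mening", consonants.toUpperCase() + consonants))
// "Momominon momenoninongog"
```

Example 3 has the same dependency: within each iteration, the lowercase replacement has to come before the uppercase one.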

But how do we handle rule 1? The rule says "x" should be replaced with "ks". Easiest is to do that replacement before the rest of the translation. We'll switch to "Xylofoner är fina" ("Xylophones are nice") as input to exercise that rule:

Example 5

const consonants = "bcdfghjklmnpqrstvwxz"

let sentence = "Xylofoner är fina"

sentence = sentence.replaceAll("x", "ks").replaceAll("X", "Ks")
consonants.split("").forEach((consonant) => {
	sentence = sentence.replaceAll(consonant, consonant + "o" + consonant)
	sentence = sentence.replaceAll(
		consonant.toUpperCase(),
		consonant.toUpperCase() + "o" + consonant,
	)
})

// sentence = "Koksosylolofofononeror äror fofinona"
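
To check example 5 against the words used earlier in the post, the same steps can be wrapped in a function. This is just a sketch: the name `toRobberLanguage` is an assumption, and the body is example 5 unchanged apart from taking the text as a parameter and returning the result.

```javascript
const consonants = "bcdfghjklmnpqrstvwxz"

// Hypothetical wrapper around the steps from example 5.
const toRobberLanguage = (text) => {
	// Rule 1: replace "x" with "ks" before anything else.
	let sentence = text.replaceAll("x", "ks").replaceAll("X", "Ks")
	consonants.split("").forEach((consonant) => {
		// Rule 2: lowercase consonants get "o" + the same consonant appended.
		sentence = sentence.replaceAll(consonant, consonant + "o" + consonant)
		// Rule 3: uppercase consonants keep the capital only before the "o".
		sentence = sentence.replaceAll(
			consonant.toUpperCase(),
			consonant.toUpperCase() + "o" + consonant,
		)
	})
	return sentence
}

console.log(toRobberLanguage("Hej")) // "Hohejoj"
console.log(toRobberLanguage("yxa")) // "ykoksosa"
console.log(toRobberLanguage("rövarspråket")) // "rorövovarorsospoproråkoketot"
```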

Text to translate