Arctic Sentences

Last week I posted a piece about the Harvard Sentences, intended for use in synthesising speech for IT purposes, and how they verge on the poetic in their random absurdity.  Today I’d like to add another set of similar sentences, also selected for their balanced use of phonemes for speech synthesis research.  These are the Arctic Sentences.

CMU Pittsburgh Hamerschlag Hall

These were put together by the Language Technologies Institute at the School of Computer Science at Carnegie Mellon University, Pittsburgh:

These single speaker speech databases have been carefully recorded under studio conditions and consist of nearly 1150 phonetically balanced English utterances. They are distributed as free software, without restriction on commercial or non-commercial use.  The Arctic corpus consists of four primary sets of recordings (3 male, 1 female), plus several ancillary databases. Each database is distributed with automatically segmented phonetic labels. (From ‘CMU ARCTIC databases for speech synthesis’ by John Kominek and Alan W Black, 2003: online report)

So that’s clear, then.  The sample sentences for the Arctic databases were selected from out-of-copyright texts on the Project Gutenberg website.  Most of the source texts were stories by Jack London (1876-1916).  He is best known for his Klondike gold-rush novels The Call of the Wild and White Fang, but he was also an early exponent of the dystopian-socialist novel (The Iron Heel) and of science fiction; The People of the Abyss (1903), another Orwellian work, is a non-fiction account of the months he spent among the urban poor in the slums of London’s East End.

Because the sentences in the Arctic databases were chosen from stories and other texts set in the far north, mostly the Yukon, they were collectively entitled ‘Arctic sentences’.  A few sentences were taken from authors named in the report’s Appendix as Curwood, Conner and Hakluyt; poems by Robert Service also seem to feature in the databases.

Jack London pre-1916

Unlike the Harvard Sentences, which were artificially constructed, these sentences were originally composed as part of prose fiction (and some other) texts.  Despite the claim by the database compilers that they stripped out archaic or difficult expressions or vocabulary, I find them charmingly dated and clunky.  Like the Harvard Sentences, they begin to take on an almost poetic resonance when strung together or mashed up.  Here are some of my favourites:

“For the twentieth time that evening the two men shook hands.”

“There’s Fort Churchill, a rifle-shot beyond the ridge, asleep.”  (How far is a rifle-shot?)

“From that moment his friendship for Belize turns to hatred and jealousy.”

“I followed the line of the proposed railroad, looking for chances.”

“Clubs and balls and cities grew to be only memories.”

London’s ‘The Son of the Wolf: Tales of the Far North’ from the Oxford World’s Classics website

“It fairly clubbed me into recognizing it.”

“He had a big chimpanzee that was a winner.”

“The Russian music player, the Count, was her obedient slave.”

“Eggshell is not good to eat.”

“Then came my boy code.”

“And wherever I ranged, the way lay along alcohol-drenched roads.”

“You yellow giant thing of the frost.”

Illustration to a Jack London story: man, dog and fire: ‘Building a Fire’

“It is dog eat dog, and you ate them up.”

“It was my reports from the north which chiefly induced people to buy.”

“This time he did not yap for mercy.”

“I was brought up the way most girls in Hawaii are brought up.”

“I saw it when she rolled.”

“I’ll be out of my head in fifteen minutes.”

“Those are my oysters, he said at last.”

“Massage under tension, was the cryptic reply.”

“Besides, had he not whipped the big owl in the forest.”

“Her achievements with cocoanuts (sic) were a revelation.”