Category: Cryptography
Flag: txsaw{georges_perec}
Challenge Description
This chall’s got a bit of history to it.
First, crack this initial cryptogram. Now, apply OSINT tools to find who authors that original script.
flag format: txsaw{first_last} such as: txsaw{john_scalzi}
Analysis
The provided file was an ASCII text file containing a single long line of ciphertext, which strongly suggested a classical pen-and-paper style cipher rather than anything modern. Because the text had normal word spacing and punctuation positions but unreadable letters, the useful model was a monoalphabetic substitution. Instead of hand-solving it, the solve used a hillclimbing script that scored candidate plaintexts with a mix of quadgram frequencies and wordfreq Zipf scores so that readable English would rise to the top.
Solution
# solve_subst.py
from pathlib import Path
import random
import math
from collections import Counter
from wordfreq import zipf_frequency
def load_ciphertext(path):
return Path(path).read_text().splitlines()
def build_quadgram_model(corpus_paths):
corpus = ""
for path in corpus_paths:
try:
corpus += Path(path).read_text().lower()
except Exception:
pass
corpus = "".join(ch for ch in corpus if ch.isalpha() or ch == " ")
quad = Counter()
for i in range(len(corpus) - 3):
q = corpus[i : i + 4]
if " " in q:
continue
quad[q] += 1
total = sum(quad.values()) or 1
quad_log = {k: math.log10(v / total) for k, v in quad.items()}
floor = math.log10(0.01 / total)
return quad_log, floor
def score_text(pt, quad_log, floor):
score = 0.0
words = pt.split()
for w in words:
if len(w) >= 2:
score += zipf_frequency(w, "en")
for i in range(len(pt) - 3):
q = pt[i : i + 4]
if " " in q:
continue
score += quad_log.get(q, floor)
return score
def decrypt(text, keymap):
alphabet = "abcdefghijklmnopqrstuvwxyz"
out = []
for ch in text:
if ch in alphabet:
out.append(keymap[ch])
else:
out.append(ch)
return "".join(out)
def make_initial_key(cipher_letters):
alphabet = "abcdefghijklmnopqrstuvwxyz"
english_freq_order = "etaoinshrdlucmfwypvbgkjqxz"
freq = Counter(ch for ch in cipher_letters if ch.isalpha())
cipher_order = "".join([c for c, _ in freq.most_common()])
for c in alphabet:
if c not in cipher_order:
cipher_order += c
key = {c: english_freq_order[i] for i, c in enumerate(cipher_order)}
return key
def hillclimb(cipher_text, quad_log, floor, restarts=6, iterations=12000, seed=0):
random.seed(seed)
alphabet = "abcdefghijklmnopqrstuvwxyz"
best_key = None
best_score = -1e18
best_plain = None
cipher_letters = "".join(ch for ch in cipher_text if ch.isalpha() or ch == " ")
for r in range(restarts):
key = make_initial_key(cipher_letters)
letters = list(alphabet)
for _ in range(60):
a, b = random.sample(letters, 2)
key[a], key[b] = key[b], key[a]
cur_key = key.copy()
cur_plain = decrypt(cipher_letters, cur_key)
cur_score = score_text(cur_plain, quad_log, floor)
for T in [20.0, 10.0, 5.0, 2.0, 1.0, 0.5]:
for _ in range(iterations // 8):
c1, c2 = random.sample(letters, 2)
cur_key[c1], cur_key[c2] = cur_key[c2], cur_key[c1]
new_plain = decrypt(cipher_letters, cur_key)
new_score = score_text(new_plain, quad_log, floor)
if new_score > cur_score or random.random() < math.exp(
(new_score - cur_score) / T
):
cur_score = new_score
else:
cur_key[c1], cur_key[c2] = cur_key[c2], cur_key[c1]
if cur_score > best_score:
best_score = cur_score
best_key = cur_key.copy()
best_plain = decrypt(cipher_letters, best_key)
print(
f"restart {r + 1}/{restarts}: score {cur_score:.2f} best {best_score:.2f}"
)
return best_key, best_score, best_plain
def decrypt_lines(lines, keymap):
alphabet = "abcdefghijklmnopqrstuvwxyz"
out_lines = []
for line in lines:
out = []
for ch in line:
low = ch.lower()
if low in alphabet:
dec = keymap[low]
if ch.isupper():
dec = dec.upper()
out.append(dec)
else:
out.append(ch)
out_lines.append("".join(out))
return out_lines
def main():
lines = load_ciphertext("/home/rei/Downloads/texsaw_idiosyncratic_french/ciphertext.txt")
cipher_text = " ".join(lines).lower()
cipher_text = "".join(ch for ch in cipher_text if ch.isalpha() or ch == " ")
quad_log, floor = build_quadgram_model(
[
"/usr/share/dict/words",
"/home/rei/Downloads/Adventure.txt",
]
)
key, score, plain = hillclimb(
cipher_text, quad_log, floor, restarts=6, iterations=12000, seed=3
)
print("best score", score)
print("best plain snippet:")
print(plain[:600])
print("key:")
print(" ".join(f"{c}->{key[c]}" for c in "abcdefghijklmnopqrstuvwxyz"))
dec_lines = decrypt_lines(lines, key)
print("--- decrypted full file ---")
for line in dec_lines:
print(line)
if __name__ == "__main__":
main()python solve_subst.pyrestart 1/6: score -528.20 best -528.20
restart 2/6: score -528.20 best -528.20
restart 3/6: score -528.20 best -528.20
restart 4/6: score -528.20 best -528.20
restart 5/6: score -528.20 best -528.20
restart 6/6: score -528.20 best -528.20
best score -528.198841989987
best plain snippet:
noon rings out a wasp making an ominous sound a sound akin to a klaxon or a tocsin flits about augustus who has had a bad night sits up blinking and purblind oh what was that word is his thought that ran through my brain all night that idiotic word that hard as id try to pun it down was always just an inch or two out of my grasp fowl or foul or vow or voyal a word which by association brought into play an incongruous mass and magma of nouns idioms slogans and sayings a confusing amorphous outpouring which i sought in vain to control or turn off but which wound around my mind a whirlwind of a
key:
a->n b->m c->l d->k e->j f->i g->h h->g i->f j->q k->d l->c m->b n->a o->e p->y q->x r->w s->v t->u u->t v->s w->r x->z y->p z->o
--- decrypted full file ---
Noon rings out. A wasp, making an ominous sound, a sound akin to a klaxon or a tocsin, flits about. Augustus, who has had a bad night, sits up blinking and purblind. Oh what was that word (is his thought) that ran through my brain all night, that idiotic word that, hard as I'd try to pun it down, was always just an inch or two out of my grasp - fowl or foul or Vow or Voyal? - a word which, by association, brought into play an incongruous mass and magma of nouns, idioms, slogans and sayings, a confusing, amorphous outpouring which I sought in vain to control or turn off but which wound around my mind a whirlwind of a cord, a whiplash of a cord, a cord that would split again and again, would knit again and again, of words without communication or any possibility of combination, words without pronunciation, signification or transcription but out of which, notwithstanding, was brought forth a flux, a continuous, compact and lucid flow: an intuition, a vacillating frisson of illumination as if caught in a flash of lightning or in a mist abruptly rising to unshroud an obvious sign - but a sign, alas, that would last an instant only to vanish for good.That plaintext gave a perfect OSINT pivot because its opening sentence is distinctive enough to search verbatim. The quote search immediately returned a University of Cambridge page titled Geroges Perec Excerpt, and the matching passage on that page identified the source as A Void. The final confirmation came from the page itself, which names Georges Perec and even notes the lipogrammatic gimmick of avoiding the letter e, exactly matching the challenge title’s censoring of French.
# search_quote.py
from duckduckgo_search import DDGS
def main():
quote = (
"Noon rings out. A wasp, making an ominous sound, a sound akin to a klaxon "
"or a tocsin, flits about. Augustus, who has had a bad night, sits up blinking "
"and purblind."
)
with DDGS() as ddgs:
results = ddgs.text(f'"{quote}"', max_results=12)
for i, r in enumerate(results, start=1):
print(f"{i}. {r.get('title')}")
print(r.get("href"))
print(r.get("body"))
print("-")
if __name__ == "__main__":
main()python search_quote.py/home/rei/Downloads/texsaw_idiosyncratic_french/search_quote.py:11: RuntimeWarning: This package (`duckduckgo_search`) has been renamed to `ddgs`! Use `pip install ddgs` instead.
with DDGS() as ddgs:
1. Geroges Perec Excerpt - University of Cambridge
http://www-control.eng.cam.ac.uk/hu/Perec.html
Noon rings out. ... Augustus, who has had a bad night, sits up blinking and purblind.
-# fetch_source.py
import urllib.request
def main():
url = "http://www-control.eng.cam.ac.uk/hu/Perec.html"
with urllib.request.urlopen(url, timeout=20) as resp:
data = resp.read().decode("utf-8", errors="replace")
print(data)
if __name__ == "__main__":
main()python fetch_source.py<TITLE>Geroges Perec Excerpt</TITLE>
<H1>Georges Perec Excerpt</H1>
<p>
<H2>From "A Void"</H2>
<p>
<H2>(one of Haig's favourite passages)</H2>
<p>
Noon rings out. A wasp, making an ominous sound, a sound akin to a klaxon
or a tocsin, flits about. Augustus, who has had a bad night, sits up
blinking and purblind. Oh what was that word (is his thought) that ran through
my brain all night, that idiotic word that, hard as I'd try to pun it down,
was always just an inch or two out of my grasp - fowl or foul or Vow or
Voyal? - a word which, by association, brought into play an incongruous mass
and magma of nouns, idioms, slogans and sayings, a confusing, amorphous
outpouring which I sought in vain to control or turn off but which wound
around my mind a whirlwind of a cord, a whiplash of a cord, a cord that
would split again and again, would knit again and again, of words without
communication or any possibility of combination, words without pronunciation,
signification or transcription but out of which, notwithstanding, was brought
forth a flux, a continuous, compact and lucid flow: an intuition, a vacillating
frisson of illumination as if caught in a flash of lightning or in a mist
abruptly rising to unshroud an obvious sign - but a sign, alas, that would last
an instant only to vanish for good.
<p>
<i>(transl. by Gilbert Adair, who succeeded in preserving the idiosyncracy
of the original French... the letter "e" doesnot appear even once in the
book!)</i>At that point the flag format was straightforward: convert the author’s name to first_last, giving txsaw{georges_perec}.