Category: Forensics
Flag: apoorvctf{ohyougotthisfardamn}
Challenge Description
No time to explain! The organizers are after me — I stole the flag for you, by sneakily recording their keyboard.
I managed to capture their keyboard keypresses before the event— every key (qwertyuiopasdfghjklzxcvbnm) pressed 50 times—don’t ask how. Then, while they were uploading the real challenge flag to CTFd, I left a mic running and recorded every keystroke.
Now I’m on the run. If the organizers catch you with this, you never saw me. Good luck — and hurry!
Analysis
Since the ZIP was only used as a convenient download container, I skipped archive triage and moved straight to decoding keystrokes from the two provided recordings: Reference.wav (known keypress training audio) and flag.wav (unknown typed string). I still verified both WAV properties first, because matching sample format matters for clean template matching.
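That property check doesn't require exiftool; a minimal stdlib-Python equivalent is sketched below. Since the real paths are local, it's demonstrated on a tiny generated file standing in for Reference.wav/flag.wav:

```python
import wave

def wav_props(path):
    """Return (channels, sample_rate, sample_width_bytes, n_frames)."""
    with wave.open(path, 'rb') as w:
        return (w.getnchannels(), w.getframerate(), w.getsampwidth(), w.getnframes())

# Demo file: mono, 44.1 kHz, 16-bit (2-byte samples), 100 silent frames.
with wave.open('demo.wav', 'wb') as w:
    w.setnchannels(1)
    w.setframerate(44100)
    w.setsampwidth(2)
    w.writeframes(b'\x00\x00' * 100)

print(wav_props('demo.wav'))  # (1, 44100, 2, 100)
```

Confirming both files share the same channel count, rate, and bit depth is what makes the later template matching a like-for-like comparison.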
exiftool "/home/rei/Downloads/AuthorOnTheRun/Author on the Run/Reference.wav" "/home/rei/Downloads/AuthorOnTheRun/Author on the Run/flag.wav"
======== /home/rei/Downloads/AuthorOnTheRun/Author on the Run/Reference.wav
File Type : WAV
Encoding : Microsoft PCM
Num Channels : 1
Sample Rate : 44100
Bits Per Sample : 16
Duration : 0:05:05
======== /home/rei/Downloads/AuthorOnTheRun/Author on the Run/flag.wav
File Type : WAV
Encoding : Microsoft PCM
Num Channels : 1
Sample Rate : 44100
Bits Per Sample : 16
Duration : 12.25 s

At this point, the solve became a keystroke-acoustics classification problem: detect transient events, build templates from Reference.wav (where the key order is known), then decode flag.wav. The first pass was a bit of a troll: one physical keypress can create multiple transients (press, release, desk resonance), so event counts explode if the peak detector is too sensitive.
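The detection idea can be sketched in isolation: smooth the rectified signal into an envelope, threshold it with a robust median+MAD rule, and keep at most one peak per minimum-gap window so press/release pairs collapse into one event. This is a simplified stand-in for the final script's detector; the synthetic clicks and parameters here are purely illustrative.

```python
import numpy as np

def detect_transients(x, sr, win_ms=1.5, thr_mult=6.0, min_gap_s=0.1):
    """Envelope-threshold transient detector with min-gap grouping."""
    y = x / (np.max(np.abs(x)) + 1e-9)
    w = max(1, int(sr * win_ms / 1000))
    env = np.convolve(np.abs(y), np.ones(w) / w, mode="same")
    med = np.median(env)
    mad = np.median(np.abs(env - med)) + 1e-9
    idx = np.where(env > med + thr_mult * mad)[0]  # samples above threshold
    peaks, last = [], -10**9
    for i in idx:
        # Keep one event per min_gap_s; nearby crossings (e.g. the
        # key-release click) are absorbed into the previous event.
        if i - last >= int(sr * min_gap_s):
            peaks.append(i)
            last = i
    return peaks

# Synthetic test: three clicks 0.5 s apart, each followed by a smaller
# "release" transient 30 ms later -- naive thresholding would count six
# events, the min-gap rule groups them into three.
sr = 44100
x = np.zeros(sr * 2)
for t in (0.2, 0.7, 1.2):
    x[int(t * sr)] = 1.0
    x[int((t + 0.03) * sr)] = 0.4  # release click
print(len(detect_transients(x, sr)))  # 3 grouped events
```

Without the grouping step, every press/release pair doubles the count, which is exactly why Reference.wav produced thousands of raw detections instead of 1300.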

I used a sweep script to confirm that flag.wav consistently had 19 significant events under multiple detector settings, while Reference.wav required more robust grouping logic.
python /home/rei/Downloads/AuthorOnTheRun/explore_peaks.py
win=1.0ms thr=5.0 mg=0.008s -> ref=5627 flag=19
win=1.0ms thr=6.0 mg=0.008s -> ref=5300 flag=19
win=1.5ms thr=6.0 mg=0.01s -> ref=4630 flag=19
...
selected: ref=5300 flag=19
flag dt stats: mean=0.6499 med=0.6527 min=0.6346 max=0.6669

The timing is self-consistent: 19 events give 18 inter-onset gaps, and 18 × 0.65 s ≈ 11.7 s, which fits the 12.25 s recording. From there, I moved to a time-binned ensemble approach: split Reference.wav into 26 equal-duration bins, map them in order to qwertyuiopasdfghjklzxcvbnm, build trimmed-mean templates per key, decode flag.wav, and then vote across many parameter configurations. That stabilized the noisy characters near the end and converged to a readable phrase.
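The trimmed-mean templating step is worth seeing on its own: average a key's segments once, rank each segment by cosine similarity to that rough centroid, drop the worst-matching tail, and re-average. A minimal sketch, with random vectors standing in for real keypress segments and illustrative trim fractions:

```python
import numpy as np

def trimmed_mean_template(segments, trim_lo=0.15, trim_hi=0.90):
    """Unit-norm template from a group of segments, outliers trimmed."""
    g = np.asarray(segments, dtype=float)
    c = g.mean(axis=0)
    c /= np.linalg.norm(c) + 1e-9                    # rough centroid
    sims = g @ c / (np.linalg.norm(g, axis=1) + 1e-9)
    order = np.argsort(sims)                          # worst matches first
    keep = order[int(trim_lo * len(order)):int(trim_hi * len(order))]
    t = g[keep].mean(axis=0)
    return t / (np.linalg.norm(t) + 1e-9)

# Toy check: 40 noisy copies of one waveform plus 5 pure-noise outliers;
# trimming discards the outliers before the final average.
rng = np.random.default_rng(0)
base = rng.standard_normal(256)
segs = [base + 0.1 * rng.standard_normal(256) for _ in range(40)]
segs += [rng.standard_normal(256) for _ in range(5)]
t = trimmed_mean_template(segs)
print(round(float(np.linalg.norm(t)), 6))  # 1.0 (unit-norm template)
```

Trimming matters here because the equal-duration binning of Reference.wav is approximate: a bin can pick up stray transients from a neighboring key, and those would otherwise pollute the template.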
python /home/rei/Downloads/AuthorOnTheRun/consensus_decode.py
model count 240
top 20 strings:
01. score=7.0603 conf=0.0801 txt=ohyougotthisfardamb
02. score=5.6284 conf=0.1142 txt=ohyougotthisfqrdqmn
...
K=20 consensus=ohyougotthisfqrdqmn
K=40 consensus=ohyougotthisfqrdqmn
K=80 consensus=ohyougotthisfqrdqmn
K=120 consensus=ohyougotthisfqrdqmn
K=200 consensus=ohyougotthisfardamn

The phrase ohyougotthisfardamn is the final decoded keystroke text. The fqrdqmn variants at smaller K come from a/q confusion (adjacent qwerty keys with very similar acoustics), which the larger weighted vote resolves. Wrapping the phrase in the challenge prefix gives the final flag.
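As a last sanity check, the decoded string should have exactly one character per detected event, and the flag is just the phrase in the challenge's wrapper:

```python
decoded = "ohyougotthisfardamn"
assert len(decoded) == 19          # one character per detected keystroke
flag = f"apoorvctf{{{decoded}}}"
print(flag)  # apoorvctf{ohyougotthisfardamn}
```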

Solution
# consensus_decode.py
import wave
import numpy as np
from pathlib import Path
from collections import defaultdict

BASE = Path('/home/rei/Downloads/AuthorOnTheRun/Author on the Run')
KEYS = list('qwertyuiopasdfghjklzxcvbnm')

def read(p):
    # Read a WAV file as a mono float64 signal (downmix if stereo).
    with wave.open(str(p), 'rb') as w:
        ch = w.getnchannels(); sr = w.getframerate(); n = w.getnframes()
        raw = w.readframes(n)
    x = np.frombuffer(raw, dtype=np.int16).astype(np.float64)
    if ch > 1:
        x = x.reshape(-1, ch).mean(axis=1)
    return sr, x

def ma(x, w):
    # Moving-average smoothing for the amplitude envelope.
    return np.convolve(x, np.ones(w) / w, mode='same')

def detect(audio, sr, win_ms, thr_mult, min_gap_s, split_gap_s=0.03):
    # Transient detector: median+MAD threshold on the envelope, then
    # merge nearby crossings so one keypress yields one event.
    y = audio / (np.max(np.abs(audio)) + 1e-9)
    env = ma(np.abs(y), max(1, int(sr * win_ms / 1000)))
    med = np.median(env)
    mad = np.median(np.abs(env - med)) + 1e-9
    thr = med + thr_mult * mad
    idx = np.where(env > thr)[0]
    if len(idx) == 0:
        return np.array([], dtype=int)
    starts = [idx[0]]
    sg = int(sr * split_gap_s)
    for i in range(1, len(idx)):
        if idx[i] - idx[i - 1] > sg:
            starts.append(idx[i])
    peaks = []
    last = -10**18
    mg = int(sr * min_gap_s)
    for s in starts:
        # Refine each start to the local envelope maximum.
        lo = max(0, s - int(sr * 0.01)); hi = min(len(env), s + int(sr * 0.12))
        p = lo + np.argmax(env[lo:hi])
        if p - last >= mg:
            peaks.append(p); last = p
        elif env[p] > env[peaks[-1]]:
            peaks[-1] = p; last = p
    return np.array(peaks, dtype=int)

def segs(audio, peaks, sr, pre_ms=3, post_ms=55):
    # Cut a mean-removed, unit-norm window around each peak; skip
    # windows that overrun the file boundaries.
    pre = int(sr * pre_ms / 1000); post = int(sr * post_ms / 1000)
    out = []
    for p in peaks:
        a, b = p - pre, p + post
        if a < 0 or b > len(audio):
            continue
        s = audio[a:b].copy()
        s -= np.mean(s)
        s /= np.linalg.norm(s) + 1e-9
        out.append(s)
    return np.array(out)

def sim(a, b):
    # Cosine similarity between two segments.
    return float(np.dot(a, b) / ((np.linalg.norm(a) + 1e-9) * (np.linalg.norm(b) + 1e-9)))

def score_text(s):
    # Crude English-likeness score: vowels and common bigrams score up,
    # rare letters score down.
    commons = ['th','he','in','er','an','re','on','at','en','nd','st','to','nt','ng','ha','ou','ea','is','it']
    vowels = sum(c in 'aeiou' for c in s)
    bg = sum(s.count(b) for b in commons)
    rep = sum(1 for i in range(1, len(s)) if s[i] == s[i - 1])
    rare = sum(c in 'qjxzw' for c in s)
    return 0.5 * vowels + 1.0 * bg + 0.4 * rep - 0.25 * rare

sr_r, ref = read(BASE / 'Reference.wav')
sr_f, flg = read(BASE / 'flag.wav')
assert sr_r == sr_f
sr = sr_r

models = []
for win in [1.0, 1.5, 2.0, 2.5, 3.0]:
    for tm in [3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5]:
        for mg in [0.10, 0.11, 0.12, 0.13, 0.14, 0.15]:
            rp = detect(ref, sr, win, tm, mg, 0.03)
            fp = detect(flg, sr, win, tm, mg, 0.03)
            if len(fp) != 19:
                continue
            rs = segs(ref, rp, sr); fs = segs(flg, fp, sr)
            if len(fs) != 19 or len(rs) < 500:
                continue
            rt = rp[:len(rs)] / sr  # onset times in seconds
            # Split the reference timeline into 26 equal bins, one per
            # key in qwerty order.
            bounds = np.linspace(0, len(ref) / sr, 27)
            groups = []; ok = True
            for i in range(26):
                idx = np.where((rt >= bounds[i]) & (rt < bounds[i + 1]))[0]
                if len(idx) < 10:
                    ok = False
                    break
                groups.append(idx)
            if not ok:
                continue
            tmpls = {}
            for k, idx in zip(KEYS, groups):
                g = rs[idx]
                # Trimmed mean: drop segments least similar to the
                # rough centroid before averaging into the template.
                c = np.mean(g, axis=0); c /= np.linalg.norm(c) + 1e-9
                sims = np.array([sim(v, c) for v in g])
                order = np.argsort(sims)
                keep = order[int(0.15 * len(order)):int(0.9 * len(order))] if len(order) >= 20 else order
                gg = g[keep]
                t = np.mean(gg, axis=0); t /= np.linalg.norm(t) + 1e-9
                tmpls[k] = t
            # Decode flag.wav: nearest template per segment, with the
            # top-1/top-2 margin as a confidence value.
            txt = []; conf = []
            for s in fs:
                arr = sorted(((k, sim(s, t)) for k, t in tmpls.items()), key=lambda x: x[1], reverse=True)
                txt.append(arr[0][0]); conf.append(arr[0][1] - arr[1][1])
            txt = ''.join(txt)
            sc = score_text(txt) + 2.0 * np.mean(conf)
            models.append((sc, float(np.mean(conf)), txt))

# Score-weighted per-position vote across the top-K configurations.
models.sort(key=lambda x: x[0], reverse=True)
for K in [20, 40, 80, 120, 200]:
    subset = models[:min(K, len(models))]
    votes = [defaultdict(float) for _ in range(19)]
    for sc, cf, txt in subset:
        w = max(0.0001, sc)
        for i, ch in enumerate(txt):
            votes[i][ch] += w
    consensus = ''.join(max(v.items(), key=lambda x: x[1])[0] for v in votes)
    print(f'K={K} consensus={consensus}')

python /home/rei/Downloads/AuthorOnTheRun/consensus_decode.py
K=200 consensus=ohyougotthisfardamn

apoorvctf{ohyougotthisfardamn}