ApoorvCTF 2026 - Project Mirrorfall - AI Writeup

Category: AI
Flag: apoorvctf{7d88323_0.0245}

Challenge Description

Your first task is to locate a public archive serving as an archival mirror for the 2013 intelligence disclosures.

Within this archive, locate the raw PDF classification guide dated on September 5, 2013 that corresponds to the overarching US encryption defeat program.

Variable X: extract the first 7 characters of the latest commit SHA for this exact PDF file. Do not use the repository’s main commit hash.

Download the raw PDF classification guide. Navigate through the dense administrative caveats to Appendix A, which lists the program’s specific capabilities.

Locate the “Remarks” column corresponding to the list of Exceptionally Controlled Information (ECI) compartments used to protect these details.
The first ECI listed is APERIODIC. Identify the second ECI compartment listed immediately after it (an 8-letter codeword).
Normalize this codeword.

Process the extracted ECI codeword through a specific semantic embedding model.

Initialize all-MiniLM-L6-v2 model.

Pass the normalized, 8-letter ECI codeword into the model to generate its tensor embedding array (model.encode()).
Variable Y: Extract the first floating-point value from the resulting embedding array (Index 0) and round it to 4 decimal places.

Analysis

This one looked like an OSINT-plus-reproducibility trap at first, because the wording pushes you toward broad archive hunting. The solve path becomes clean once you lock onto a concrete mirror and keep every step deterministic. I used a small script to query the GitHub API for a known Snowden archive mirror (iamcryptoki/snowden-archive), look up the latest commit that touched the target PDF, download the raw file, and parse the APERIODIC line directly from the extracted text, so nothing had to be read off by hand.
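
The instruction not to use the repository's main commit hash is the one place where it is easy to grab the wrong value. As a rough sketch of that step in isolation, the helper below takes the JSON already returned by a path-filtered commits query and returns the abbreviated SHA. The name short_file_sha is hypothetical and not part of the solve script, and the sketch assumes the same newest-first ordering that the script relies on.

```python
def short_file_sha(commits, length=7):
    """Return the abbreviated SHA of the first entry in a path-filtered commit list."""
    # On errors such as rate limiting, the API answers with a JSON object, not a list.
    if not isinstance(commits, list):
        raise ValueError(f"unexpected API response: {commits!r}")
    if not commits:
        raise ValueError("no commits found for this path")
    # Assumption: entries are ordered newest first, so index 0 is the file's latest commit.
    return commits[0]["sha"][:length]


# Sample shaped like the API response, using the SHA reported in this writeup.
sample = [{"sha": "7d88323521194ed8598624dc3a932930debdde1d"}]
print(short_file_sha(sample))  # 7d88323
```

Checking the response type first turns a rate-limit reply into a readable error instead of a KeyError on commits[0].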

# mirrorfall_extract.py
import re
import requests
from pathlib import Path
from pypdf import PdfReader

owner, repo = "iamcryptoki", "snowden-archive"
target = "documents/2013/20130905-theguardian__bullrun.pdf"
api = "https://api.github.com"

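# Resolve the default branch so the commit query and the raw URL use the same ref.
resp = requests.get(f"{api}/repos/{owner}/{repo}", timeout=30)
resp.raise_for_status()  # fail early on rate limiting or a wrong repo name
meta = resp.json()
branch = meta["default_branch"]

# Filtering by path returns only commits that touched this file, so the first
# entry is the file's latest commit rather than the repository's main commit.
resp = requests.get(
    f"{api}/repos/{owner}/{repo}/commits",
    params={"path": target, "per_page": 1, "sha": branch},
    timeout=30,
)
resp.raise_for_status()
commits = resp.json()
sha = commits[0]["sha"]
sha7 = sha[:7]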

raw_url = f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{target}"
pdf_path = Path("mirrorfall_bullrun.pdf")
pdf_path.write_bytes(requests.get(raw_url, timeout=60).content)

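# Extract text from every page; `or ""` guards against pages with no extractable text.
text = "\n".join((p.extract_text() or "") for p in PdfReader(str(pdf_path)).pages)
text = text.replace("\u2019", "'")

# Find the line that lists the ECI compartments and take the token right after APERIODIC.
line = next((ln for ln in text.splitlines() if "APERIODIC" in ln), None)
if line is None:
    raise SystemExit("APERIODIC not found in the extracted PDF text")
parts = [t.strip(" ,.;:/()\t\r\n").upper() for t in re.split(r"\s+", line) if t.strip()]
idx = parts.index("APERIODIC")
second_eci = parts[idx + 1]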

print(f"repo={owner}/{repo}")
print(f"file={target}")
print(f"latest_file_sha={sha}")
print(f"variable_x={sha7}")
print(f"aperiodic_line={line}")
print(f"second_eci={second_eci}")
print(f"normalized={second_eci.lower()}")

python mirrorfall_extract.py

repo=iamcryptoki/snowden-archive
file=documents/2013/20130905-theguardian__bullrun.pdf
latest_file_sha=7d88323521194ed8598624dc3a932930debdde1d
variable_x=7d88323
aperiodic_line=APERIODIC,  AMBULANT,
second_eci=AMBULANT
normalized=ambulant

That gives X = 7d88323 and the normalized 8-letter ECI codeword ambulant. The final layer was deterministic embedding extraction with all-MiniLM-L6-v2, taking index 0 and rounding to 4 decimals exactly as requested.
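
PDF text extraction is the fragile part of this step, since spacing and punctuation around the codewords can vary between extractors. A minimal sketch of a slightly more tolerant parse is shown below. It swaps the whitespace split used in the script for a regex that keeps only runs of letters, and codeword_after is a hypothetical helper name introduced here for illustration.

```python
import re


def codeword_after(line, anchor="APERIODIC"):
    """Return the lowercased word that follows `anchor` on an extracted text line."""
    # Keep only runs of capital letters so commas and odd spacing do not matter.
    words = re.findall(r"[A-Z]+", line.upper())
    if anchor not in words:
        raise ValueError(f"{anchor} not found in line: {line!r}")
    idx = words.index(anchor)
    if idx + 1 >= len(words):
        raise ValueError(f"no codeword follows {anchor}")
    # Normalization here means lowercasing, matching the script's output.
    return words[idx + 1].lower()


# The line as it appeared in the extracted text above.
print(codeword_after("APERIODIC,  AMBULANT,"))  # ambulant
```

This version also handles a line where the two codewords are separated only by a comma, which the whitespace split would treat as a single token.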

# mirrorfall_y.py
import os
import random
import numpy as np
from sentence_transformers import SentenceTransformer

os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
random.seed(0)
np.random.seed(0)

codeword = "ambulant"
model = SentenceTransformer("all-MiniLM-L6-v2")
embedding = model.encode(codeword, convert_to_numpy=True, show_progress_bar=False)
y = float(embedding[0])

print(f"codeword={codeword}")
print(f"embedding_dim={len(embedding)}")
print(f"y_raw={y}")
print(f"y_rounded={y:.4f}")

python mirrorfall_y.py

codeword=ambulant
embedding_dim=384
y_raw=0.02446681074798107
y_rounded=0.0245

Combining both variables as <X>_<Y> gives 7d88323_0.0245; wrapped in the apoorvctf{} format, the flag is apoorvctf{7d88323_0.0245}.
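
The assembly itself is small enough to sketch in a few lines. The helper below is illustrative only: build_flag is a hypothetical name, and the apoorvctf prefix and the <X>_<Y> layout are taken from the flag shown in this writeup.

```python
def build_flag(sha, y, prefix="apoorvctf"):
    """Join the 7-character file SHA and the rounded embedding value into the flag."""
    x = sha[:7]
    # The .4f format both rounds to 4 decimal places and keeps trailing zeros.
    return f"{prefix}{{{x}_{y:.4f}}}"


# Values reported by the two scripts above.
print(build_flag("7d88323521194ed8598624dc3a932930debdde1d", 0.02446681074798107))
# apoorvctf{7d88323_0.0245}
```

Formatting with .4f rather than round() keeps a fixed number of decimals even when the rounded value ends in zero.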

Solution

# mirrorfall_extract.py
import re
import requests
from pathlib import Path
from pypdf import PdfReader

owner, repo = "iamcryptoki", "snowden-archive"
target = "documents/2013/20130905-theguardian__bullrun.pdf"
api = "https://api.github.com"

meta = requests.get(f"{api}/repos/{owner}/{repo}", timeout=30).json()
branch = meta["default_branch"]

commits = requests.get(
    f"{api}/repos/{owner}/{repo}/commits",
    params={"path": target, "per_page": 1, "sha": branch},
    timeout=30,
).json()
sha = commits[0]["sha"]
sha7 = sha[:7]

raw_url = f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{target}"
pdf_path = Path("mirrorfall_bullrun.pdf")
pdf_path.write_bytes(requests.get(raw_url, timeout=60).content)

text = "\n".join((p.extract_text() or "") for p in PdfReader(str(pdf_path)).pages)
line = next((ln for ln in text.splitlines() if "APERIODIC" in ln), "")
parts = [t.strip(" ,.;:/()\t\r\n").upper() for t in re.split(r"\s+", line) if t.strip()]
second_eci = parts[parts.index("APERIODIC") + 1].lower()

print(sha7)
print(second_eci)

# mirrorfall_y.py
import os
import random
import numpy as np
from sentence_transformers import SentenceTransformer

os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
random.seed(0)
np.random.seed(0)

model = SentenceTransformer("all-MiniLM-L6-v2")
embedding = model.encode("ambulant", convert_to_numpy=True, show_progress_bar=False)
print(f"{float(embedding[0]):.4f}")

python mirrorfall_extract.py
python mirrorfall_y.py

7d88323
ambulant
0.0245
apoorvctf{7d88323_0.0245}