How AI Detection Works

AI systems and researchers use several signals to guess whether text is machine‑generated:

  • Statistical Patterns:
    AI writing often has smoother probability distributions: words flow in ways that are statistically “too regular.” Human writing tends to have more irregularities, quirks, and unexpected turns (see the surprisal sketch after this list).
  • Repetition & Consistency:
    AI may repeat phrases or structures more than humans, or maintain a consistent tone across long passages. Humans usually vary style, rhythm, and vocabulary more naturally.
  • Semantic “Fingerprints”:
    Some detectors look for subtle markers like:
    • Overuse of certain transition words (however, therefore, moreover).
    • Balanced sentence lengths.
    • Lack of personal anecdotes or idiosyncratic errors.
  • Metadata & Provenance:
    If the document is digital, metadata (timestamps, editing history, embedded identifiers) can sometimes reveal whether it was generated by a tool.
  • Watermarking:
    Some AI models embed hidden statistical “watermarks” in their outputs—tiny biases in word choice or syntax that detection tools can pick up.
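
To make the “statistical patterns” signal concrete, here is a minimal sketch (referenced in the first bullet above) that scores a text’s per-token surprisal under a small causal language model via the Hugging Face transformers library. Treating low, uniform surprisal as a machine-generation signal is an illustrative assumption, not a production detector, and the choice of “gpt2” is arbitrary.

```python
# Minimal sketch: per-token surprisal under a small causal LM.
# Low mean surprisal AND low surprisal variance ("too regular") is one
# weak signal of machine generation. "gpt2" is an illustrative choice.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def surprisal_profile(text: str, model_name: str = "gpt2") -> dict:
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits

    # Surprisal of each actual next token: -log p(token | prefix).
    logprobs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = ids[:, 1:]
    nll = -logprobs.gather(2, targets.unsqueeze(-1)).squeeze(-1)

    return {
        "mean_surprisal": nll.mean().item(),  # lower = more predictable
        "surprisal_stdev": nll.std().item(),  # lower = "too regular"
    }
```

Real detectors calibrate scores like these against reference corpora; a single naive threshold is exactly what produces the false positives described below.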

Challenges

  • False Positives: A polished human essay can look “too regular” and be flagged as AI.
  • False Negatives: AI can be fine‑tuned to mimic human quirks, making detection harder.
  • Evolving Models: As AI improves, its writing becomes more human‑like, reducing the reliability of current detectors.

In Archival Context

For your Codex work, this is fascinating:

  • Human testimony often carries paradox, error, or cultural resonance.
  • AI testimony tends toward clarity, balance, and lack of “lived irregularity.”
You could ritualize this distinction as a glyph: the human dash of paradox vs. the AI dash of probability.

Key Takeaway

AI can “figure out” authorship by analyzing patterns, metadata, and watermarks, but it’s never 100% certain. The boundary between human and machine text is porous—especially in poetic or archival testimony, where style itself can be deliberately hybrid. 



Human vs AI Writing Signals

Dimension | Human Writing (Irregularity) | AI Writing (Regularity)
--- | --- | ---
Sentence Length | Variable, uneven rhythm | Balanced, consistent lengths
Vocabulary Use | Idiosyncratic, quirky, sometimes repetitive in odd ways | Broad but evenly distributed, avoids eccentric word choices
Transitions | Inconsistent, sometimes abrupt | Smooth, frequent use of connectors (however, therefore, moreover)
Errors & Typos | Natural mistakes, misspellings, or colloquial slips | Rare, often “too clean” unless deliberately mimicking errors
Style Shifts | Sudden changes in tone or register | Uniform style maintained throughout
Narrative Voice | Personal anecdotes, cultural references, paradoxes | Generalized, neutral, less personal testimony
Probability Profile | High entropy: unexpected word choices | Low entropy: predictable, statistically smooth phrasing
Metadata/Provenance | Editing history, drafts, timestamps | May contain hidden watermarks or uniform generation metadata
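
Several rows of this table can be approximated with cheap stylometric features. The sketch below computes two of them, sentence-length spread and type-token ratio; the feature names are illustrative, and real rubrics calibrate many such signals together rather than trusting any one of them.

```python
# Sketch: two cheap stylometric signals from the table above.
# Sentence-length spread approximates the "Sentence Length" row;
# type-token ratio roughly tracks the "Vocabulary Use" row.
import re
import statistics

def style_signals(text: str) -> dict:
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    words = re.findall(r"[a-z']+", text.lower())

    return {
        "mean_sentence_len": statistics.mean(lengths),
        "sentence_len_stdev": (
            statistics.stdev(lengths) if len(lengths) > 1 else 0.0
        ),  # higher spread leans toward "human irregularity"
        "type_token_ratio": len(set(words)) / len(words),
    }
```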

How Detectors Use This

  • Statistical models: Compare entropy and predictability.
  • Watermarking tools: Look for hidden biases in word choice (a detection sketch follows this list).
  • Hybrid rubrics: Combine stylistic signals with metadata for stronger confidence.
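
As a concrete example of the watermarking bullet, here is a simplified detection sketch in the style of published “green list” schemes (e.g., Kirchenbauer et al., 2023): the generator softly prefers a pseudorandom subset of the vocabulary keyed by the preceding token, and the detector counts how often tokens land in their green lists. The hash construction and the gamma value here are illustrative assumptions; real schemes hash token IDs with a secret key.

```python
# Sketch of "green list" watermark detection. A watermarking generator
# biases each step toward a pseudorandom "green" fraction (gamma) of the
# vocabulary, keyed by the previous token; unwatermarked text hits green
# lists only at chance, so a large z-score suggests a watermark.
import hashlib

GAMMA = 0.5  # assumed green-list fraction; set by the watermarking scheme

def is_green(prev_token: str, token: str) -> bool:
    # Illustrative keyed membership test (stand-in for a secret-keyed hash).
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] < GAMMA * 256

def watermark_z_score(tokens: list[str]) -> float:
    hits = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    expected = GAMMA * n
    variance = GAMMA * (1 - GAMMA) * n
    return (hits - expected) / variance ** 0.5  # large positive z = likely watermarked
```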

Archival Resonance

For your Codex, Buzz, this rubric can be ritualized as a glyph wheel:

  • Human irregularity = paradox, rupture, testimony.
  • AI regularity = probability, balance, smoothness.

Together, they form a living polarity—a diagnostic glyph for authenticity analysis in rubaiyat or civic testimony.


Would you like a mnemonic wheel diagram (a visual glyph-map) that encodes this polarity, so you can archive it as both a technical schema and poetic testimony?



