AI systems and
researchers use several signals to guess whether text is machine‑generated:
- Statistical Patterns: AI writing often has smoother probability distributions—words flow in ways that are statistically “too regular.” Human writing tends to have more irregularities, quirks, and unexpected turns.
- Repetition & Consistency: AI may repeat phrases or structures more than humans, or maintain a consistent tone across long passages. Humans usually vary style, rhythm, and vocabulary more naturally.
- Semantic “Fingerprints”: Some detectors look for subtle markers such as:
  - Overuse of certain transition words (however, therefore, moreover).
  - Balanced sentence lengths.
  - Lack of personal anecdotes or idiosyncratic errors.
- Metadata & Provenance: If the document is digital, metadata (timestamps, editing history, embedded identifiers) can sometimes reveal whether it was generated by a tool.
- Watermarking: Some AI models embed hidden statistical “watermarks” in their outputs—tiny biases in word choice or syntax that detection tools can pick up.
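The statistical-pattern signal above can be sketched with a toy entropy check. This is a deliberately crude proxy: it measures the Shannon entropy of raw word frequencies with a whitespace tokenizer, whereas real detectors score text with a language model's perplexity. The two sample strings are illustrative only.

```python
import math
from collections import Counter

def unigram_entropy(text: str) -> float:
    """Shannon entropy (bits per word) of the word-frequency distribution.

    Lower entropy = more repetitive, 'regular' text; higher entropy =
    more varied, 'irregular' text. A stand-in for model-based perplexity.
    """
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

quirky = "the gull laughed, salt-mad, at my uncle's crooked hat"
flat = "the system is good the system is fast the system is safe"

# The varied sentence scores higher than the repetitive one.
print(unigram_entropy(quirky) > unigram_entropy(flat))  # True
```

Real detectors apply the same intuition at a much finer grain, asking how surprising each token is to a reference language model rather than counting repeated words.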
Challenges
- False Positives: A polished human essay can look “too
regular” and be flagged as AI.
- False Negatives: AI can be fine‑tuned to mimic human quirks,
making detection harder.
- Evolving Models: As AI improves, its writing becomes more
human‑like, reducing the reliability of current detectors.
In Archival Context
For your Codex work, this is fascinating:
- Human testimony often carries paradox, error, or cultural
resonance.
- AI testimony tends toward clarity, balance, and lack of
“lived irregularity.”
You could ritualize this distinction as a glyph: the human dash of paradox vs. the AI dash of probability.
Key Takeaway
AI can “figure out” authorship by analyzing patterns, metadata, and watermarks, but it’s never 100% certain. The boundary between human and machine text is porous—especially in poetic or archival testimony, where style itself can be deliberately hybrid.
Human vs AI Writing Signals
| Dimension | Human Writing (Irregularity) | AI Writing (Regularity) |
|---|---|---|
| Sentence Length | Variable, uneven rhythm | Balanced, consistent lengths |
| Vocabulary Use | Idiosyncratic, quirky, sometimes repetitive in odd ways | Broad but evenly distributed, avoids eccentric word choices |
| Transitions | Inconsistent, sometimes abrupt | Smooth, frequent use of connectors (however, therefore, moreover) |
| Errors & Typos | Natural mistakes, misspellings, or colloquial slips | Rare, often “too clean” unless deliberately mimicking errors |
| Style Shifts | Sudden changes in tone or register | Uniform style maintained throughout |
| Narrative Voice | Personal anecdotes, cultural references, paradoxes | Generalized, neutral, less personal testimony |
| Probability Profile | High entropy—unexpected word choices | Low entropy—predictable, statistically smooth phrasing |
| Metadata/Provenance | Editing history, drafts, timestamps | May contain hidden watermarks or uniform generation metadata |
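The Sentence Length row of the table can be made concrete with a small "burstiness" metric: the coefficient of variation of sentence lengths. This is a hedged sketch, not a production detector; the regex sentence splitter is deliberately naive, and the two sample paragraphs are invented for illustration.

```python
import re
import statistics

def length_burstiness(text: str) -> float:
    """Coefficient of variation (stdev / mean) of sentence lengths in words.

    Higher values suggest uneven, 'human' rhythm; values near 0 suggest
    uniform, machine-smooth sentence lengths. A toy proxy only.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths) / statistics.mean(lengths)

uneven = ("Rain. It fell for nine days straight, and nobody on the "
          "street believed it. Then silence.")
even = ("The rain fell hard. The street was very wet. "
        "The people stayed inside. The sky was dark.")

print(length_burstiness(uneven) > length_burstiness(even))  # True
```

In practice this one number is weak on its own; it becomes useful when combined with the other table dimensions.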
How Detectors Use This
- Statistical models: Compare entropy and predictability.
- Watermarking tools: Look for hidden biases in word choice.
- Hybrid rubrics: Combine stylistic signals with metadata for stronger confidence.
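The watermarking bullet can be illustrated with a toy "green-list" check, loosely modeled on published watermark schemes: a keyed hash secretly splits the vocabulary roughly in half, a watermarking generator favors the "green" half, and a detector measures how far the green-word fraction sits above the ~50% expected by chance. Everything here (the key, the stand-in vocabulary, the 50/50 split) is illustrative, not a real watermark protocol.

```python
import hashlib

def is_green(word: str, key: str = "demo-key") -> bool:
    """Keyed hash assigns each word to the 'green' or 'red' list (~50/50)."""
    digest = hashlib.sha256(f"{key}:{word.lower()}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(words: list[str], key: str = "demo-key") -> float:
    """Fraction of green words: ~0.5 for unbiased text, higher if watermarked."""
    return sum(is_green(w, key) for w in words) / len(words) if words else 0.0

vocab = [f"word{i}" for i in range(200)]           # stand-in vocabulary
plain = vocab                                       # unbiased 'human' text
watermarked = [w for w in vocab if is_green(w)]     # generator emits only green words

print(round(green_fraction(plain), 2))   # ~0.5: no watermark evidence
print(green_fraction(watermarked))       # 1.0: strong watermark evidence
```

A real detector turns the green fraction into a significance test (how unlikely is this bias under chance?), which is why watermark detection degrades gracefully on short or lightly edited text.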
Archival Resonance
For your Codex, Buzz, this rubric can be ritualized as a glyph wheel:
- Human irregularity = paradox, rupture, testimony.
- AI regularity = probability, balance, smoothness.
Together, they form a living polarity—a diagnostic glyph for authenticity analysis in rubaiyat or civic testimony.
A mnemonic wheel diagram (a visual glyph-map) could encode this polarity, letting you archive it as both a technical schema and poetic testimony.