Deep Dive

The Most Protected Comment Box on the Internet

DEV
Karma Builder Engineering

Reddit's comment box looks simple. A white rectangle where you type words. But underneath that innocent rectangle sits one of the most aggressively protected input fields on the internet — and we had to figure out how to type into it without Reddit knowing it wasn't a human.

This is the story of how we built Karma Builder's auto-typing system. It took us through computer vision, physics simulations, and a crash course in how your hand actually moves a mouse — all so our extension could click a box and type a sentence.

01 The Problem: You Can't Just “Click” a Comment Box

When you click Reddit's comment box, something interesting happens. The placeholder text disappears, a rich text editor slides in, and a “Comment” button appears. Sounds straightforward — but the way Reddit builds this UI makes it nearly impossible for automation tools to interact with.

Reddit wraps everything in a Shadow DOM: think of it as a locked room inside the webpage. Ordinary DOM queries such as document.querySelector stop at its boundary, so most automation tools can't see inside. And inside that locked room is another locked room. And inside that is the actual text editor, built on a framework called Lexical that ignores anything that doesn't look like a real keyboard event.

Standard automation tools? They can't even find the comment box, let alone type into it.

“We spent two weeks just trying to reliably locate where the comment box appears on the screen. The traditional approach — reading the page's code — was a dead end.”

02 Our Solution: We Taught a Computer to See

Since we couldn't find the comment box by reading code, we decided to find it the way a human would — by looking at the screen.

We trained an AI model (specifically, a YOLO object detection model) on thousands of Reddit screenshots. We labeled every comment box and every “Comment” button by hand. After training, the AI can look at any Reddit page and instantly spot exactly where the comment box is — just like your eyes do when you scan the page.

The extension takes a screenshot (completely invisible — no flash, no notification, nothing the page can detect), sends it to our detection server, and gets back the exact pixel coordinates of the comment box. No Shadow DOM. No framework gymnastics. Just “the box is at position X, Y on the screen.”
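As a rough illustration of that last step, here is a minimal sketch of turning a detection into viewport coordinates. It assumes the server returns a YOLO-style normalized box (center x, center y, width, height, each between 0 and 1) and that the screenshot is scaled by the device pixel ratio. The `Detection` type and the `to_css_pixels` name are hypothetical, chosen for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """A YOLO-style box, normalized to the screenshot size (assumed format)."""
    label: str
    cx: float  # box center x, 0..1
    cy: float  # box center y, 0..1
    w: float   # box width, 0..1
    h: float   # box height, 0..1


def to_css_pixels(det, shot_width, shot_height, device_pixel_ratio=1.0):
    """Return (left, top, right, bottom) in CSS (viewport) pixels.

    On high-DPI displays a screenshot is larger than the viewport by the
    device pixel ratio, so we divide to get back to viewport coordinates.
    """
    left = (det.cx - det.w / 2) * shot_width / device_pixel_ratio
    top = (det.cy - det.h / 2) * shot_height / device_pixel_ratio
    right = (det.cx + det.w / 2) * shot_width / device_pixel_ratio
    bottom = (det.cy + det.h / 2) * shot_height / device_pixel_ratio
    return (left, top, right, bottom)


# Hypothetical detection on a 2560x1440 screenshot taken at 2x pixel ratio.
box = to_css_pixels(Detection("comment_box", 0.5, 0.25, 0.5, 0.125), 2560, 1440, 2.0)
print(box)  # (320.0, 135.0, 960.0, 225.0)
```

Returning the whole box rather than a single point leaves the choice of where to click inside it to a later step.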

This was breakthrough #1. Instead of fighting Reddit's code, we sidestepped it entirely.

03 The Mouse Problem: Bots Don't Move Like Humans

Great, we know where the box is. Now we need to click on it. Easy — just move the cursor there and click, right?

Not even close. Modern websites track how your mouse moves. They know that a real human hand doesn't move in a straight line. It doesn't move at constant speed. It overshoots the target slightly, then corrects. It wobbles. It pauses. And when your hand is just resting on the mouse, it still produces tiny micro-movements from your pulse and breathing.

A bot that teleports the cursor from point A to point B — or even one that moves in a smooth, even line — gets flagged instantly.

So we had to build a physics engine for mouse movement. Here's what our virtual cursor actually does:

  • Curved paths: Instead of straight lines, the cursor follows multi-segment mathematical curves (Bézier curves, if you want to get technical). Each path is broken into 2-3 segments with slight corrections at each join — like how your wrist naturally pivots mid-motion.
  • Realistic speed: Real hand movements follow something called Fitts' Law: the time to reach a target grows with the logarithm of the distance-to-width ratio, not linearly, so big, nearby targets are reached faster than small, distant ones. On top of that, our cursor accelerates sharply at the start and decelerates gradually at the end, the way your arm muscles work.
  • Overshoot and correct: About 60% of the time, the cursor deliberately goes slightly past the target and then snaps back — always along the same direction it was moving, not randomly. Because that's what a real hand does: momentum carries it forward, then you pull back.
  • Constant micro-movement: Even while typing, our system produces tiny 1-5 pixel movements every few hundred milliseconds. A real human's cursor is never perfectly still — your palm rests on the mouse, you shift in your chair, you breathe. A perfectly frozen cursor during 30 seconds of active typing is a dead giveaway.
  • Gentle drift during pauses: When the typing pauses (to “think”), the cursor lazily drifts in a natural arc, as if the user is idly reading the screen while deciding what to type next.
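Two of the ideas in this list can be sketched in a few lines: Fitts' Law for how long a movement should take, and a cubic Bézier curve for its shape. This is a simplified, single-segment illustration, not the multi-segment engine described above. The constants `a`, `b`, and `spread` are made-up values for the example, and the function names are hypothetical.

```python
import math
import random


def fitts_duration_ms(distance, target_width, a=100.0, b=150.0):
    """Fitts' Law, Shannon form: MT = a + b * log2(D / W + 1).

    a and b are illustrative constants, not measured values.
    """
    return a + b * math.log2(distance / target_width + 1.0)


def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)


def curved_path(start, end, steps=20, spread=0.2, rng=None):
    """Sample steps + 1 points along a curved path from start to end.

    The two control points sit one third and two thirds of the way along
    the straight line, pushed sideways by a random fraction of its length.
    """
    rng = rng or random.Random()
    dx, dy = end[0] - start[0], end[1] - start[1]
    # Perpendicular to the direction of travel. It is left unnormalized so
    # the sideways offset scales with the distance travelled.
    nx, ny = -dy, dx
    k1 = rng.uniform(-spread, spread)
    k2 = rng.uniform(-spread, spread)
    c1 = (start[0] + dx / 3 + nx * k1, start[1] + dy / 3 + ny * k1)
    c2 = (start[0] + 2 * dx / 3 + nx * k2, start[1] + 2 * dy / 3 + ny * k2)
    return [cubic_bezier(start, c1, c2, end, i / steps) for i in range(steps + 1)]


path = curved_path((100.0, 200.0), (400.0, 260.0), rng=random.Random(0))
print(len(path), path[0], path[-1])
print(fitts_duration_ms(300, 100))  # 400.0 with the illustrative constants
```

A Bézier curve always starts and ends exactly on its first and last control points, which is why the path lands on the target no matter how the middle is bent.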

The result? A mouse movement that is statistically indistinguishable from a real human, even under analysis.

04 The Click: More Complicated Than You Think

Even a mouse click has personality. We discovered that anti-bot systems look at three things about your click:

  • Where you press vs. where you release: Your hand trembles slightly between pressing and releasing the button. We simulate a 1-3 pixel drift.
  • How long you hold the button: Humans hold a click for about 80-120 milliseconds on average, but the distribution isn't bell-shaped. It follows a “log-normal” curve: most clicks cluster toward the short end, with a long tail of slower ones. We sample hold times from that kind of distribution.
  • Where in the target you click: Humans don't click dead center. They click in a slightly random spot, but biased toward the center — a Gaussian (bell curve) distribution. We replicate this.
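The three click properties above map onto three small sampling functions. The sketch below uses only Python's standard `random` module. The median hold time, the spread parameters, and all function names are assumptions made for illustration, not the values our system uses.

```python
import math
import random


def sample_hold_ms(rng, median_ms=95.0, sigma=0.25):
    """Draw a hold time from a log-normal distribution.

    For a log-normal, exp(mu) is the median, so mu = ln(median_ms).
    """
    return rng.lognormvariate(math.log(median_ms), sigma)


def sample_click_point(rng, box, sigma_frac=0.15):
    """Draw a point biased toward the box center, clamped to stay inside it."""
    left, top, right, bottom = box
    cx, cy = (left + right) / 2, (top + bottom) / 2
    x = rng.gauss(cx, (right - left) * sigma_frac)
    y = rng.gauss(cy, (bottom - top) * sigma_frac)
    return (min(max(x, left), right), min(max(y, top), bottom))


def sample_release_offset(rng, max_px=3.0):
    """Drift between press and release: 1 to max_px pixels, random direction."""
    r = rng.uniform(1.0, max_px)
    angle = rng.uniform(0.0, 2 * math.pi)
    return (r * math.cos(angle), r * math.sin(angle))


rng = random.Random(42)
box = (320.0, 135.0, 960.0, 225.0)
print(sample_hold_ms(rng), sample_click_point(rng, box), sample_release_offset(rng))
```

Clamping the Gaussian sample is a simple way to guarantee the click stays on the target; resampling until the point falls inside the box would avoid a small pile-up of points on the edges.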

05 Typing That Feels Human

Once the editor is open, we need to type the reply. But even typing has patterns that give away robots.

A human types “the” faster than “xq” because their fingers are already in position. Switching keyboard rows slows you down. Using the same hand for two consecutive keys is slower than alternating. Capital letters require reaching for Shift, which adds time.

We modeled the entire QWERTY keyboard layout. Our typing simulator knows:

  • Which pairs of letters are commonly typed together (fast) vs. awkward combinations (slow)
  • Whether consecutive keys are on the same hand or different hands
  • Whether you need to jump between keyboard rows
  • The timing overhead of punctuation, spaces, and line breaks
  • The gradual fatigue effect — people type slightly slower toward the end of longer messages
  • Random bursts of speed (when you're “in the zone”) and random pauses (when you lose focus for a moment)
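A few of the factors in this list (hand alternation, row jumps, non-letter keys, Shift, fatigue, and random variation) can be combined into a simple delay function. This is a toy model: every millisecond value is invented for the example, letter-pair frequency is left out, and the names are hypothetical.

```python
import random

ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")
LEFT_HAND = set("qwertasdfgzxcvb")


def key_info(ch):
    """Return (row index, hand) for a single letter, or None for other keys."""
    ch = ch.lower()
    for row, keys in enumerate(ROWS):
        if ch in keys:
            return row, ("L" if ch in LEFT_HAND else "R")
    return None


def base_delay_ms(prev, cur):
    """Deterministic part of the delay between pressing prev and cur."""
    delay = 110.0
    a, b = key_info(prev), key_info(cur)
    if a is not None and b is not None:
        if a[1] == b[1]:
            delay += 30.0                      # same hand is slower than alternating
        delay += 20.0 * abs(a[0] - b[0])       # jumping between keyboard rows
    else:
        delay += 40.0                          # space, punctuation, line break
    if cur.isupper():
        delay += 50.0                          # reaching for Shift
    return delay


def delay_ms(prev, cur, progress, rng):
    """Add fatigue (progress runs 0..1 through the message) and random jitter."""
    fatigue = 1.0 + 0.15 * progress
    jitter = rng.uniform(0.8, 1.25)
    return base_delay_ms(prev, cur) * fatigue * jitter


print(base_delay_ms("t", "h"), base_delay_ms("x", "q"))  # 130.0 180.0
```

With these numbers, "th" (alternating hands, one row apart) comes out faster than "xq" (same hand, two rows apart), which matches the intuition in the paragraph above.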

We even simulate typos. About 3% of the time, the system “accidentally” presses a key adjacent to the correct one (like hitting “r” instead of “e”), pauses for a moment to “notice” the mistake, backspaces, and retypes the correct character. Because a perfect typist is suspicious.
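The typo behavior can be illustrated by expanding a string into a list of key presses. In this sketch "adjacent" means the left or right neighbor on the same QWERTY row, and `<pause>` and `<backspace>` are placeholder tokens; both are simplifying assumptions, as are the function names.

```python
import random

ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")


def adjacent_keys(ch):
    """Left and right neighbours of a lowercase letter on its QWERTY row."""
    for keys in ROWS:
        i = keys.find(ch)
        if i != -1:
            return [keys[j] for j in (i - 1, i + 1) if 0 <= j < len(keys)]
    return []


def keystrokes_with_typos(text, rng, typo_rate=0.03):
    """Expand text into key presses, sometimes hitting a neighbour key first,
    then pausing, backspacing, and typing the intended character."""
    out = []
    for ch in text:
        neighbours = adjacent_keys(ch)
        if neighbours and rng.random() < typo_rate:
            out.append(rng.choice(neighbours))
            out.append("<pause>")
            out.append("<backspace>")
        out.append(ch)
    return out


def apply_keystrokes(keys):
    """Replay key presses to confirm the final text is what was intended."""
    buf = []
    for k in keys:
        if k == "<backspace>":
            buf.pop()
        elif k != "<pause>":
            buf.append(k)
    return "".join(buf)


keys = keystrokes_with_typos("hello there", random.Random(1), typo_rate=0.5)
print(keys)
print(apply_keystrokes(keys))  # hello there
```

Replaying the key list is a cheap sanity check: whatever typos were injected, the text left in the editor must equal the intended reply.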

06 The “Thinking” Pause

Real people don't type at a constant rate. They pause every 8-15 words to re-read what they wrote or think about the next part. These pauses last 2-8 seconds and happen randomly — you can't predict when someone will hesitate.

During these pauses, the cursor gently drifts across the screen (because the person is reading), and then typing resumes. Without these pauses, the typing would look machine-like even if the speed varied realistically.
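The pause schedule described here is easy to express directly. The sketch below plans pauses up front from the word count, using the 8-15 word and 2-8 second ranges from the text. Drawing both from uniform distributions is an assumption, and `plan_pauses` is a hypothetical name.

```python
import random


def plan_pauses(text, rng, min_words=8, max_words=15, min_s=2.0, max_s=8.0):
    """Return a list of (word_index, pause_seconds) pairs.

    A pause is scheduled after every min_words..max_words words and lasts
    between min_s and max_s seconds.
    """
    n_words = len(text.split())
    pauses = []
    i = rng.randint(min_words, max_words)
    while i < n_words:
        pauses.append((i, rng.uniform(min_s, max_s)))
        i += rng.randint(min_words, max_words)
    return pauses


reply = " ".join(["word"] * 40)
for index, seconds in plan_pauses(reply, random.Random(3)):
    print(f"pause after word {index} for {seconds:.1f}s")
```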

07 The Final Click: Submitting the Comment

After the reply is typed, one job remains: clicking the “Comment” button. We don't hardcode button positions, because Reddit changes its layout constantly. So we use the AI detection model again.

We take a fresh screenshot of the page (now showing the filled-in editor), send it to our model, and it finds the Comment button. Then the cursor moves to it with the same human-like Bézier path, hovers for a natural moment (people don't click instantly after reaching a button), and clicks with realistic tremble.

The whole process — from clicking the comment box to submitting the reply — looks and feels exactly like a person reading a post, thinking about a response, and typing it out.

08 The Ghost Layer: Completely Invisible

Here's what makes this especially powerful: none of this is visible to the webpage.

We don't inject any code into Reddit's page. We don't modify the DOM. We don't add event listeners. The extension communicates entirely through Chrome's low-level debugging protocol, which operates below the JavaScript layer. Reddit's anti-bot scripts can't detect our extension because there's literally nothing on the page to find.

The screenshots we take fire no events — no screenshot API call, no canvas rendering, nothing. The mouse and keyboard events we dispatch are flagged as “trusted” by the browser, identical to real hardware input. Even the most paranoid JavaScript-based detection has no way to tell the difference.

“The best automation isn't the kind that tries to hide from detection. It's the kind that has nothing to hide in the first place.”

09 What Took Us the Longest

If we're being honest, the hardest part wasn't any single technical challenge. It was the combination. Each piece — the computer vision, the mouse physics, the typing model, the stealth layer — was a significant project on its own. Making them all work together seamlessly, in real-time, inside a Chrome extension with strict resource limits? That was the real puzzle.

Some specific headaches:

  • Training the AI to find the comment box: We had to collect and label thousands of Reddit screenshots across different themes, screen sizes, and states (collapsed comments, nested replies, award overlays). The model needed to work whether you're on a subreddit for cats or cryptocurrency.
  • Mouse velocity profiling: Our first attempt used a simple speed curve (slow-fast-slow). It looked fine to the human eye, but statistical analysis revealed it was too symmetrical — real movements accelerate fast and decelerate slowly. We had to switch to an asymmetric profile to pass detection.
  • The “frozen cursor” problem: Our early version didn't move the mouse during typing. Turns out, a cursor that stays perfectly still for 30+ seconds while a user is supposedly typing is one of the strongest bot signals there is. Adding continuous micro-noise solved this immediately.
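The difference between a symmetric and an asymmetric speed curve, from the velocity profiling item above, can be shown with two progress functions that map elapsed time (0 to 1) to distance covered (0 to 1). The symmetric one is the standard smoothstep polynomial. The asymmetric one is a curve picked for this example, whose speed is proportional to t(1 - t)^3 and therefore peaks a quarter of the way through; it is not the profile our system uses.

```python
def symmetric_progress(t):
    """Smoothstep: speed peaks exactly halfway through the movement."""
    return 3 * t**2 - 2 * t**3


def asymmetric_progress(t):
    """Fast acceleration, slow deceleration.

    This is the normalized integral of t * (1 - t)**3, so speed peaks at
    t = 0.25 and tails off gradually afterwards.
    """
    u = 1.0 - t
    return 1.0 - 5.0 * u**4 + 4.0 * u**5


def peak_speed_time(progress, samples=1000):
    """Numerically find the time fraction at which speed is highest."""
    best_t, best_v = 0.0, 0.0
    for i in range(samples):
        t0, t1 = i / samples, (i + 1) / samples
        v = progress(t1) - progress(t0)
        if v > best_v:
            best_t, best_v = (t0 + t1) / 2, v
    return best_t


print(round(peak_speed_time(symmetric_progress), 2))   # 0.5
print(round(peak_speed_time(asymmetric_progress), 2))  # 0.25
```

Both curves start at 0 and end at 1, so either can be used to pick the parameter at which a path is sampled at each time step; only the timing of the peak differs.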

The Result

Today, Karma Builder can generate an AI reply tailored to a Reddit post's context, tone, and community culture — then type it into the comment box and submit it in a way that is statistically indistinguishable from a real human.

We're proud of this, not because we're trying to “trick” anyone, but because the technical challenge was genuinely interesting. Reddit's defenses are sophisticated and constantly evolving, and building something that works reliably within those constraints required us to deeply understand how human behavior translates into digital signals.

Every cursor movement, every keystroke, every pause is the result of studying real human-computer interaction and reproducing it faithfully. We didn't take shortcuts; we built the real thing.

— Karma Builder Engineering Team