
IDEA #20 – Finding the Employee Leak

I guess Yahoo just assumes that a staff-wide email will get leaked, so they’re not going to put any juicy strategy into that type of email. But it got me thinking: wouldn’t they love to know who actually leaked this one? It’d be simple, really. Well, until employees learned the secret.

I’d take the email and send it through a bulk email program that alters some of the words or characters in each copy. Record which version went to which employee, and then if you ever see a leaked version, you can match it back to its recipient. That should work especially well for an email of that size, although maybe there couldn’t possibly be 9,800+ grammatical changes :)
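
Here is a rough sketch of how that matching could work in Python. Everything in it is made up for illustration: the `TEMPLATE` sentence, the `SLOTS` of interchangeable wordings, and the helper names `render`, `build_lookup` and `find_leaker` are assumptions, not part of any real bulk email program. Each employee's position in the list picks one wording per slot, so every copy is unique as long as there are at least as many combinations as employees.

```python
from typing import Dict, List, Optional

# Hypothetical substitution points: each slot lists interchangeable wordings.
SLOTS: List[List[str]] = [
    ["do not", "don't"],
    ["big", "large"],
    ["begin", "start"],
]
# Hypothetical memo text with one placeholder per slot.
TEMPLATE = "We {0} plan a {1} reorg, and changes {2} Monday."


def capacity(slots: List[List[str]]) -> int:
    """Number of distinct versions the slots can produce."""
    total = 1
    for options in slots:
        total *= len(options)
    return total


def render(employee_index: int) -> str:
    """Build the version for one employee by reading the index as mixed-radix digits."""
    if not 0 <= employee_index < capacity(SLOTS):
        raise ValueError("not enough slot combinations for this many employees")
    remaining = employee_index
    choices = []
    for options in SLOTS:
        remaining, pick = divmod(remaining, len(options))
        choices.append(options[pick])
    return TEMPLATE.format(*choices)


def build_lookup(employees: List[str]) -> Dict[str, str]:
    """Map every rendered version back to the employee who received it."""
    return {render(i): name for i, name in enumerate(employees)}


def find_leaker(leaked_text: str, lookup: Dict[str, str]) -> Optional[str]:
    """Return the recipient of this exact version, or None if it matches nobody."""
    return lookup.get(leaked_text)


if __name__ == "__main__":
    staff = ["alice", "bob", "carol"]
    lookup = build_lookup(staff)
    print(find_leaker(render(1), lookup))
```

Three two-way slots only give 8 versions. Covering 9,800+ employees would take at least 14 two-way slots, and exact-match lookup only works if the leaked copy is posted unedited.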

But what about every email you send to more than one person? Wouldn’t it be great if they all passed through a filter like this, just in case they were confidential and got leaked?

  • Matt

    Funny, I had this idea when f* was at its height back in the first boom/crash. The English language is by far robust enough to “watermark” a document this size: contractions, synonyms, etc. However, the problem with adoption was simply that a lot of time is invested in the language of these memos, and having that modified ad hoc by a filter was simply unacceptable to most people I talked to. Nonetheless, a good and interesting idea.

    On a side note, awesome blog, great ideas that provide a catalyst to others!

  • Eric Nagel

    For a brief while, I was working on something similar… spiders (Google) love unique content, but (gray area) affiliates don’t like writing it. If you could scrape content (from Wikipedia) and alter it just enough so it still made sense, but was “unique”, that’d be a gold mine.

    Lost is an Emmy and Golden Globe award-winning serial drama television series…

    Lost, a Golden Globe and Emmy award-winning serial drama,…

    However, I gave up. There’s no easy way to do this… even with a synonyms database.

  • Eric Nagel

    9,800 text changes isn’t that hard.

    You can even start with the number of spaces after a period. I learned to use 2 in high school, so that’s what I still use, but many people use 1 or don’t care. So, if there are 4 sentences (3 breaks between them), that gives the following possibilities:

    (# of spaces)
    1 1 1
    1 1 2
    1 2 1
    1 2 2
    2 1 1
    2 1 2
    2 2 1
    2 2 2

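    That’s 8 combinations, or 2^3. So for 9,800 combinations you need 2^x >= 9800. Since 2^13 = 8,192 falls short and 2^14 = 16,384 clears it, x is at least 14.

    So using spaces between sentences alone, you’d need 15 sentences (14 breaks) to do this.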

    BUT, you’re right… once they caught on, you have to think of another method.
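
The spacing scheme above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function names `breaks_needed`, `encode` and `decode` are invented here, every sentence is assumed to end in `.`, `!` or `?`, and one space stands for a 0 bit while two spaces stand for a 1 bit, read left to right in the same order as the table.

```python
import re
from typing import List


def breaks_needed(recipients: int) -> int:
    """Smallest x such that 2**x >= recipients."""
    return (recipients - 1).bit_length()


def encode(sentences: List[str], recipient_id: int) -> str:
    """Join sentences, using 1 or 2 spaces at each break to spell out recipient_id in binary."""
    breaks = len(sentences) - 1
    if not 0 <= recipient_id < 2 ** breaks:
        raise ValueError("not enough sentence breaks to encode this id")
    text = sentences[0]
    for i, sentence in enumerate(sentences[1:]):
        # Most significant bit goes in the first break.
        bit = (recipient_id >> (breaks - 1 - i)) & 1
        text += " " * (1 + bit) + sentence
    return text


def decode(text: str) -> int:
    """Read the id back from the run of spaces after each sentence-ending mark."""
    gaps = re.findall(r"(?<=[.!?])( +)(?=\S)", text)
    value = 0
    for gap in gaps:
        value = (value << 1) | (1 if len(gap) >= 2 else 0)
    return value


if __name__ == "__main__":
    memo = ["One.", "Two.", "Three.", "Four."]
    marked = encode(memo, 5)      # breaks carry 2, 1, 2 spaces
    print(repr(marked), decode(marked))
    print(breaks_needed(9800))    # breaks required for 9,800 recipients
```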

  • AndrewFromFly

    Yes, but then TechCrunch could just alter the number of spaces after each period again before they post it. Then it’ll either correspond to the version of some innocent guy who didn’t leak it, or it’ll correspond to none of them.
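
To illustrate how little effort that takes, here is a small Python sketch. The function name `normalize_spacing` and the two sample strings are invented for the example. Collapsing every run of spaces after a sentence-ending mark to a single space makes any two spacing-marked copies identical, so the mark no longer points at anyone in particular.

```python
import re


def normalize_spacing(text: str) -> str:
    """Collapse any run of spaces after '.', '!' or '?' to a single space."""
    return re.sub(r"(?<=[.!?]) +", " ", text)


if __name__ == "__main__":
    # Two hypothetical copies of the same memo, marked with different spacing.
    copy_a = "One.  Two. Three.  Four."
    copy_b = "One. Two.  Three. Four."
    print(normalize_spacing(copy_a) == normalize_spacing(copy_b))
```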

  • Eric Nagel


    Yes, you’re right… it’ll work once. But if you could manage to watermark the text somehow… this technology could not only be used in corporate press releases, but also in government and even file sharing.

    If you could watermark an mp3 file, then the user that has rights to it could use it wherever he likes – but if it’s watermarked and ends up shared on the web, you can link it back to him.

  • Steve Poland
  • Phil McCarty

    We’re currently doing a version of this with MP3s. If you send songs to 20 different people, each copy gets embedded with a unique, (largely) inaudible and indestructible signal, which can then be traced back to the recipient in the event that they leak it.