
IDEA #95 – URL Filter for Adult Websites Using Google SafeSearch

It seems to me that there should be a web app with an API: I send it a URL, and it tells me whether that URL is an adult website, whether because it has been scanned and contains adult keywords or because it has been officially designated as a site with adult content.

I realize no filter will be perfect, but I’d like to have an initial filter. Google has ‘SafeSearch’, which will eliminate adult results if you have it turned on.

Does anyone know any actual web services that do what I want?

Otherwise, here’s an initial solution (which I’m sure someone could create quickly, and which I think would be useful to some people; ahem):

Do a Google search with the parameter ‘safe=off’ or ‘safe=active’ [these are hidden vars, but they also appear to work in the query string].

What you could do is a ‘site:(url)’ search with SafeSearch turned off, then with it turned on. If Google shows anything for that URL with SafeSearch off, then you know Google has the domain in its index. If it then shows nothing with SafeSearch turned on, you know Google has blocked the site for adult content.

Here’s an example with SafeSearch on, and subsequently off.

I guess the easy way to parse Google’s results for whether this works is to look for the keyword phrase “did not match any documents.”, along with “Make sure all words are spelled correctly.” [just to make sure the first phrase wasn’t in the actual results].
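
A minimal sketch of that check in Python (assuming Google’s results page still contains those phrases; as the comments below note, Google may ban an IP that scrapes it heavily, so treat this as an illustration of the logic rather than a production approach):

```python
import urllib.parse
import urllib.request

NO_MATCH = "did not match any documents"

def google_has_results(domain, safe):
    """Run a site: query with the given SafeSearch setting
    ('off' or 'active') and report whether Google returned results."""
    qs = urllib.parse.urlencode({"q": "site:" + domain, "safe": safe})
    req = urllib.request.Request(
        "https://www.google.com/search?" + qs,
        headers={"User-Agent": "Mozilla/5.0"},  # the default UA gets rejected
    )
    with urllib.request.urlopen(req) as resp:
        page = resp.read().decode("utf-8", errors="replace")
    return NO_MATCH not in page

def classify(domain):
    """Apply the two-query trick described above."""
    if not google_has_results(domain, safe="off"):
        return "unknown"  # Google has no record of the domain at all
    if not google_has_results(domain, safe="active"):
        return "adult"    # indexed, but SafeSearch filters everything out
    return "clean"        # indexed and survives SafeSearch

print(classify("example.com"))
```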

I’m likely going to get this programmed as I need it for my startup. I have a great domain for this idea (IMO), FiltURL.

Update 1: Sam built this quickly and it works great! We’re going to put it up at FiltURL — anyone want to make the page look semi-decent graphically? I’m picturing the homepage having a URL box with a submit/go button, plus info on the app and how to use it [like TinyURL]. We’ll also throw some AdSense code on there. There doesn’t even have to be a second page; we could just show the result via Ajax on the same page when a user inputs a URL manually.

Update 2: Nate comments below that AWIS can provide this info at $0.15 per 1,000 requests, but I wonder how limiting this is, because Alexa’s index doesn’t contain every single URL out there (and does it tell whether a site is adult or not — if so, wouldn’t it be listed under their ‘adult’ category?). It would need to be tested. The Google route is good, but as mentioned in the comments, Google may block the IP; then again, we’re not profiting from Google.

  • http://www.improvingtheweb.com Wesley

Great idea, and pretty easy to program. The obstacle, however, would be Google banning the IP that does the scraping. That’s difficult to get around unless you have an entire server farm.

  • http://www.natefanaro.com Nate Fanaro

    You should check out the Alexa Web Information Service from Amazon’s Web Services API: http://aws.amazon.com/awis/

When you send their API a domain name (just the domain name, not an individual URL), they return a lot of useful information about the website. This includes a field named ‘AdultContent’.

I use this on some web proxy sites that I run and find it to be fairly accurate. It’s very cheap at $0.15 per 1,000 requests, but you will want to cache the results that AWIS sends back to help keep costs low. The AdultContent flag doesn’t change often, so you’d be safe clearing out data that’s older than one month.
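
    A rough sketch of this approach in Python, with the month-long cache suggested above. Action=UrlInfo and ResponseGroup=AdultContent are documented AWIS parameters, but the legacy request signing shown here (HMAC-SHA1 over Action + Timestamp) and the aws:AdultContent element name are assumptions to verify against the current AWIS docs; the key values are placeholders:

    ```python
    import base64, hashlib, hmac, time
    import urllib.parse, urllib.request

    AWIS_ENDPOINT = "http://awis.amazonaws.com/"
    ACCESS_KEY = "YOUR-AWS-ACCESS-KEY"   # placeholder
    SECRET_KEY = b"YOUR-AWS-SECRET-KEY"  # placeholder

    _cache = {}                 # domain -> (fetched_at, flag)
    CACHE_TTL = 30 * 24 * 3600  # roughly one month, per the advice above

    def adult_content_flag(domain):
        """Ask AWIS for its AdultContent flag, caching answers for a month."""
        now = time.time()
        if domain in _cache and now - _cache[domain][0] < CACHE_TTL:
            return _cache[domain][1]

        timestamp = time.strftime("%Y-%m-%dT%H:%M:%S.000Z", time.gmtime(now))
        # Assumed legacy AWIS signing: HMAC-SHA1 over Action + Timestamp
        signature = base64.b64encode(
            hmac.new(SECRET_KEY, b"UrlInfo" + timestamp.encode(),
                     hashlib.sha1).digest()
        ).decode()
        qs = urllib.parse.urlencode({
            "Action": "UrlInfo",
            "AWSAccessKeyId": ACCESS_KEY,
            "Timestamp": timestamp,
            "Signature": signature,
            "ResponseGroup": "AdultContent",
            "Url": domain,
        })
        with urllib.request.urlopen(AWIS_ENDPOINT + "?" + qs) as resp:
            xml = resp.read().decode()

        # Crude string check; the element name is an assumption
        flag = "yes" if "<aws:AdultContent>yes<" in xml else "no"
        _cache[domain] = (now, flag)
        return flag
    ```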

  • http://hotchkissconsulting.net/ Sam

    This would be super easy to implement through the Google API. If I get a chance, I’ll throw something together this afternoon. Shouldn’t take but a minute.

  • http://hotchkissconsulting.net/ Sam

    http://zlit.net/isitporn.php?url=google.com

    The responses are:

    yes – it is porn (fewer than 4 pages on this domain pass through strict SafeSearch)
    no – it’s not porn (more than 4 pages on this domain pass through strict SafeSearch)
    invalid – there’s no record of the site in Google
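
    A minimal client for this endpoint, assuming it’s still live and returns the bare yes/no/invalid string as the response body:

    ```python
    import urllib.parse
    import urllib.request

    def is_it_porn(domain):
        """Query Sam's isitporn.php endpoint: 'yes', 'no', or 'invalid'."""
        qs = urllib.parse.urlencode({"url": domain})
        url = "http://zlit.net/isitporn.php?" + qs
        with urllib.request.urlopen(url) as resp:
            return resp.read().decode().strip()

    print(is_it_porn("google.com"))  # 'no', per the thresholds above
    ```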

  • http://www.quixoticquisling.com Carl Morris

I’m pretty sure you can do this with the Google Search API – without having to scrape.

  • http://www.re-searchr.com James

    RE: Alexa

I think that Alexa used to limit the amount of time a user of their services could cache data to 24 hours (via their TOS). This was ostensibly for “freshness”, but also, I’d guess, to boost profits.

I haven’t used Alexa for a while, but their services are easy to use, and the people there are great if you need any special consulting or want to do more serious deals with them.

  • mattmcb

    Sam, is the isitporn.php script something that you are willing to share and make downloadable?

  • santila

How about a Firefox add-on?


  • http://www.sitefile.org/ Website Value

I was thinking the same exact thing… a simple true or false (1 or 0) would do. This would clean up the web, and a lot of spam too!!

I decided to search and see if anything existed, and saw your post. I think this would be a major help to everyone who owns a blog or website.

They do it with badware.org, so why couldn’t they do it with adult sites!!

  • Martin

    Dear Steve,

I have the same problem and use the Google Web Search API, but the API’s usage limits make this function difficult to use. Have you found a solution in the meantime?

  • DON

Meanwhile, use cURL with SafeSearch, get the links… and index them out manually. I’m assuming this solution may have already occurred to you, but I think it would be reliable and unstoppable, even if not the fastest.

    Thanks :)

  • http://www.kevingulling.com/games Kevin Gulling

I simply wrote a script that can block URLs containing keywords from the list at http://adultkeywordfinder.com/
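
    The core of such a script is tiny. A sketch, assuming the list has been saved locally with one keyword per line (the actual format at adultkeywordfinder.com may differ):

    ```python
    def load_keywords(path):
        """Load one lowercased keyword per line, skipping blanks."""
        with open(path) as f:
            return [line.strip().lower() for line in f if line.strip()]

    def is_blocked(url, keywords):
        """Block the URL if any adult keyword appears anywhere in it."""
        url = url.lower()
        return any(kw in url for kw in keywords)

    # 'adult_keywords.txt' is a hypothetical local copy of the list
    keywords = load_keywords("adult_keywords.txt")
    print(is_blocked("http://example.com/some-page", keywords))
    ```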