IDEA #95 – URL Filter for filtering Adult websites using Google SafeSearch
It seems to me that there should be a web app with an API that I can easily send a URL to, and it’ll toss back whether that URL is an adult website, has been scanned and found to have adult keywords on it, or has been officially designated by the company as a website with adult content.
I realize no filter will be perfect, but I’d like to have an initial filter. Google has ‘SafeSearch’, which eliminates adult URLs from the results if you have it turned on.
Does anyone know any actual web services that do what I want?
Otherwise, here’s an initial solution (which someone could create quickly I’m sure and I think it’d be useful to some people; ah-hem):
Do a Google search with the parameter of either ‘safe=off’ or ‘safe=active’ [these are hidden form variables, but they also appear to work in the query string].
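Here is a minimal sketch of building those two search URLs. The endpoint `https://www.google.com/search`, the `q` parameter, and the helper name `build_search_url` are my assumptions for illustration; only the `safe=off` / `safe=active` values come from the idea above.

```python
from urllib.parse import urlencode

# Assumed search endpoint; not confirmed anywhere in this post.
GOOGLE_SEARCH = "https://www.google.com/search"


def build_search_url(query: str, safe: bool) -> str:
    """Return a search URL with SafeSearch forced on (safe=active) or off (safe=off)."""
    params = {"q": query, "safe": "active" if safe else "off"}
    return GOOGLE_SEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("site:example.com", safe=False))
    print(build_search_url("site:example.com", safe=True))
```

Actually fetching these pages is left out on purpose, since automated queries may get the IP blocked.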
What you could do is a ‘site:(url)’ search with SafeSearch turned off, then with it turned on. If Google shows anything with SafeSearch off for that URL, then you know Google has the domain in its index. If it then shows nothing with SafeSearch turned on, you know Google has filtered the site for adult content. If it shows nothing even with SafeSearch off, the site isn’t in the index and you can’t tell either way.
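The decision itself boils down to two yes/no answers. A rough sketch, where the function name `classify` and the three labels are made up for illustration:

```python
def classify(hits_safe_off: bool, hits_safe_on: bool) -> str:
    """Combine the two 'site:' search outcomes into a verdict.

    hits_safe_off: did the search return results with SafeSearch off?
    hits_safe_on:  did the search return results with SafeSearch on?
    """
    if not hits_safe_off:
        # Nothing even without filtering: the domain is not in the index,
        # so this method cannot say anything about it.
        return "unknown"
    if not hits_safe_on:
        # Indexed, but every result disappears once SafeSearch is on.
        return "adult"
    return "clean"


if __name__ == "__main__":
    print(classify(hits_safe_off=True, hits_safe_on=False))
```

Keeping "unknown" as its own result matters, since a site Google has never indexed would otherwise look the same as a filtered one.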
I guess the easy way to parse Google’s results for whether the search came back empty is to look for the keyword phrase “did not match any documents.”, along with “Make sure all words are spelled correctly.” [just to make sure the first phrase wasn’t in the actual results].
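That check is just two substring tests. A sketch, assuming the page has already been fetched and reduced to plain text (the function name `has_no_results` is hypothetical, and Google may change this wording at any time):

```python
# Phrases quoted in the post; their presence on a live results page is not guaranteed.
NO_MATCH = "did not match any documents."
SPELLING_HINT = "Make sure all words are spelled correctly."


def has_no_results(page_text: str) -> bool:
    """True only if both 'no results' phrases appear in the page text.

    Requiring both lowers the chance that a real result snippet which
    happens to contain the first phrase is mistaken for an empty page.
    """
    return NO_MATCH in page_text and SPELLING_HINT in page_text


if __name__ == "__main__":
    empty_page = (
        "Your search - site:example.com - did not match any documents. "
        "Suggestions: Make sure all words are spelled correctly."
    )
    print(has_no_results(empty_page))
```

If the phrases are split up by HTML tags in the raw page, the markup would need to be stripped first.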
Update 1: Sam built this quickly and it works great! We’re going to put it up at FiltURL. Anyone want to make this page look semi-decent graphically? I’m picturing the homepage of this site having a URL box with a submit/go button. The homepage will have info on the app, as well as how to use it [like tinyurl]. We’ll also throw some AdSense code on there. There doesn’t even have to be a 2nd page; we could just show the result via AJAX on that page if the user inputs a URL on our page manually.
Update 2: Nate comments below that AWIS can provide this info at $0.15 per 1000, but I wonder how limiting this is, because Alexa’s site doesn’t contain every single URL out there (and does it tell if a site is adult or not — if so, wouldn’t it be listed under their ‘adult’ category?). It would need to be tested. The Google route is good, but as mentioned in the comments, Google may block the IP; but we’re not profiting from Google.