Sunday, 28 June 2015

How to censor website links?

I've been working on a regex censor for quite some time and can't seem to find a decent way of censoring website links (and attempts to circumvent the censor).

Here's what I have so far, ignoring escape sequences:

([a-zA-Z0-9_-]+[\\W[_]]*)+(\\.|[\\W]?|dot|\\(\\.\\)|[\\(]?dot[\\)]?)+([\\w]{2,6})((\\.|[\\W]?|dot|\\(\\.\\)|[\\(]?dot[\\)]?)([\\w]{1,4}))*

I'm not sure what is causing the problem, but it censors the words "com" and "come" and pretty much any word that is 3 or more letters long.
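One possible cause, offered as a guess rather than a confirmed diagnosis: the separator group contains `[\\W]?`, which can match the empty string, so the "dot" between the name and the suffix is effectively optional. The sketch below uses Python's `re` module and a simplified stand-in for the expression (the names `LOOSE` and `STRICT` are made up for illustration) to show how an optional separator lets a plain word match.

```python
import re

# Simplified reduction of the pattern above, NOT the original expression.
# The separator alternatives include \W?, which can match nothing at all.
LOOSE = re.compile(r"[a-z]+(?:\.|\W?|dot)+\w{2,6}", re.IGNORECASE)

# Same shape, but every separator alternative must consume a character.
STRICT = re.compile(r"[a-z]+(?:\.|\W|dot)+\w{2,6}", re.IGNORECASE)

for word in ("com", "come", "Google.com"):
    loose_hit = LOOSE.search(word) is not None
    strict_hit = STRICT.search(word) is not None
    print(f"{word!r}: loose={loose_hit} strict={strict_hit}")
```

With the empty separator allowed, "come" can split into a name ("co") and a suffix ("me") with nothing in between. Making the separator mandatory stops single words from matching, but it is only a partial fix: a space still counts as `\W`, so two ordinary words in a row would still match.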

Problem: I want to censor website links, including malformed ones written to get around the censor. Examples:

Google.com

goo gle .com

g o o g l e . c o m

go o gl e % com

go og le (.) c om
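For illustration, the examples above could be caught by looking only for the tail of a link: a letter or digit, then something standing in for the dot, then a known suffix whose letters may be spaced out. This is a minimal sketch under assumptions, not a complete censor: it uses Python's `re` module, the suffix list `TLDS` is a deliberately tiny placeholder, and the names `DOT`, `SPACED_TLDS`, `LINK_TAIL` and `looks_like_link` are hypothetical.

```python
import re

# Hypothetical, deliberately tiny suffix list for illustration only.
TLDS = ("com", "net", "org")

# Anything that may stand in for the dot: "(.)", "(dot)", the word "dot",
# or a single punctuation character such as "." or "%".
DOT = r"(?:\(\s*(?:\.|dot)\s*\)|\bdot\b|[^\w\s])"

# Let the letters of each suffix be spaced out: "com" also covers "c o m" and "c om".
SPACED_TLDS = "|".join(r"\s*".join(tld) for tld in TLDS)

# A letter or digit, a mandatory dot stand-in, then a suffix that ends at a word boundary.
LINK_TAIL = re.compile(rf"[a-z0-9]\s*{DOT}\s*(?:{SPACED_TLDS})\b")


def looks_like_link(text: str) -> bool:
    """Return True if the text contains something shaped like 'name <dot> suffix'."""
    return LINK_TAIL.search(text.lower()) is not None


for sample in ("Google.com", "go o gl e % com", "go og le (.) c om", "come"):
    print(f"{sample!r}: {looks_like_link(sample)}")
```

Because the dot stand-in is mandatory, words such as "com" and "come" are left alone. The trade-off is that any punctuation mark counts as a dot, so ordinary text like "profit, net of tax" would also be flagged; a real filter would need a stricter idea of what a disguised dot looks like.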

One more thing: is there a way to add a whitelist of allowed links to this? Thank you.
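A whitelist might be sketched as a second step that runs on text the censor has already flagged: normalize the flagged text to a plain host name, then look it up in a set of allowed hosts. Everything here is an assumption made for illustration, including the `ALLOWED_HOSTS` entries and the function names `normalize_host` and `is_whitelisted`.

```python
import re

# Hypothetical allow list; the entries are placeholders.
ALLOWED_HOSTS = {"example.com", "example.org"}


def normalize_host(candidate: str) -> str:
    """Lower-case, turn "(.)", "(dot)" and stray punctuation into ".", and drop spaces."""
    text = candidate.lower()
    # Hyphens are kept because they are legal inside host names.
    text = re.sub(r"\(\s*(?:\.|dot)\s*\)|[^\w\s-]", ".", text)
    text = re.sub(r"\s+", "", text)
    # Collapse runs of dots and trim any left at the edges.
    return re.sub(r"\.{2,}", ".", text).strip(".")


def is_whitelisted(candidate: str) -> bool:
    """Return True if the flagged text normalizes to an allowed host."""
    return normalize_host(candidate) in ALLOWED_HOSTS


print(is_whitelisted("Ex ample (.) c om"))
print(is_whitelisted("go og le (.) c om"))
```

Normalizing before the lookup means a spaced-out or disguised spelling of an allowed host is treated the same as the plain spelling, so the whitelist only needs one entry per site.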
