Change of anti-spam triggers and functionality (forum feature request)


#1

Yesterday I arrived here as a new user and had some questions about things like how to change my nick on the forums.
Anyway, I'm one of those people who love talking with people (including small talk), so I ended up triggering the anti-spam feature (new users are blocked from posting for 24 hours after x number of posts). I then moved to private messages with someone, and it turns out there's an anti-spam blocker there too.
Neither of them gave me any warning up front...

Now I can think of a few reasons why this feature was added.
To avoid spambots from joining the forum and posting a ton of nonsense.
To avoid trolls from joining and souring the community.
To avoid load on the servers.
And to avoid having to store too many messages.
Also it might reduce the workload on the moderators.

The positive effect of the current implementation is presumably that it stops all of the above problems in the majority of cases.
The negative effect is that people with genuine issues are blocked from talking about their topic and might be put off from using the forums on this site (or even the site as a whole) permanently because of its discouraging effect.

I'd therefore propose an alternative.

On the load issue I would suggest a more gradual approach, applied to all non-moderators.
Basically, use some base delay between posts, say 30 seconds. Each time someone posts within the base delay plus some other variable, say another 32 seconds (62 seconds in total, so roughly 1 minute), you'd add a third variable (say 5 seconds) on top of the base delay. (So those 30 seconds become 35 seconds, and if someone then posts within 67 seconds of that it turns into 40 seconds, and so on and so forth.)
Warn people about the limit and inform them about the exact rules.
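
To make the escalating delay concrete, here's a rough Python sketch of how I picture it. The class and constant names are made up for illustration, and I'm assuming the delay never shrinks again, which the idea above doesn't settle either way.

```python
from dataclasses import dataclass
from typing import Optional

BASE_DELAY = 30.0  # seconds a user must wait between posts (example value)
GRACE = 32.0       # extra window after the delay in which a post counts as "rapid"
STEP = 5.0         # seconds added to the delay after each rapid post


@dataclass
class PostThrottle:
    """Per-user state for the escalating delay (hypothetical helper)."""
    delay: float = BASE_DELAY
    last_post: Optional[float] = None

    def seconds_until_allowed(self, now: float) -> float:
        """How long the user still has to wait; useful for the warning message."""
        if self.last_post is None:
            return 0.0
        return max(0.0, self.delay - (now - self.last_post))

    def try_post(self, now: float) -> bool:
        """Return True if a post at time `now` (in seconds) is accepted."""
        if self.last_post is None:
            self.last_post = now
            return True
        elapsed = now - self.last_post
        if elapsed < self.delay:
            return False  # still inside the mandatory wait
        if elapsed < self.delay + GRACE:
            self.delay += STEP  # posted soon after the wait ended: lengthen the next wait
        self.last_post = now
        return True
```

With these numbers, a post 40 seconds after the previous one is accepted but bumps the delay from 30 to 35 seconds, while a post two minutes later leaves the delay alone.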

You can also add another variable: a quota of x posts that a poster is permitted to make outside this system before it's triggered.
This could be different for each poster on the forum and could, say, be based on an "up vote" system available only to mods (or to anyone else you'd like to have it).
Having such a system should reduce your load while still allowing people who actually have something to say the option of saying it.
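
The quota part could be as small as a counter per user. This is only a sketch; `PostQuota`, the default of 10 free posts and the `grant` hook for moderator up votes are all assumptions of mine.

```python
from dataclasses import dataclass


@dataclass
class PostQuota:
    """Free posts a user may make before the delay system applies (hypothetical)."""
    free_posts: int = 10  # example starting quota

    def consume(self) -> bool:
        """Use one free post if any remain; True means the delay check is skipped."""
        if self.free_posts > 0:
            self.free_posts -= 1
            return True
        return False

    def grant(self, extra: int) -> None:
        """Called when a moderator up-votes the user, raising their quota."""
        self.free_posts += extra
```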

Now, for those pesky bots and trolls...
For these I'd suggest a machine learning filter with a hybrid of assisted (supervised) learning and unassisted (unsupervised) learning.
(Yes I know it's a ton of work, but it would be a lot more fair and you'd probably be able to sell or license the software afterwards and earn money on it or create your own forum service that you provide people at a cost)

Anyway, the idea is that you can add sets of rules that the machine can use to recognize problematic posts, and these rules need to be flexible enough that they're hard to avoid.
For instance, say some spambot is advertising the website www.elite.com (I don't even know if that is a real website, but let's use it since it's the first thing that comes to mind that is flexible enough).
The thing is, an "e" can be written as an "e", as an "E", and as a "3".
Also, while "elite" is sometimes written as "3117", it has also turned up as "LEET" in certain situations...
(Yes, I know it's silly to write that way, but some do. If someone sets up a bot to create DNS addresses following rules like the above, so that they are memorable enough to humans while still variable enough to avoid regular single-word filters, then a more complex filter like this can pick them up.)
So, to cover all such possible variations you'd want to be able to write input rules that ban words with alternatives for every single character, but a rule should only cover the characters mentioned in it.
You'll probably want to allow the people managing the filter to add rules that involve certain word combinations too, without making it too cumbersome (adding alternatives for every single word shouldn't be enforced, just an option, for instance). That way, words appearing in a certain order in a sentence (even with a variable number of other words between them) could trigger the filter.
There might also be other types of rules that I haven't thought of, like, say, a negative search term (something that would indicate that the rule does not apply after all).
Such rules should be easy to add after implementing the filter at any time while it's running.
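
As a rough illustration of the per-character rules, here's a Python sketch that turns a word plus a table of alternatives into a regular expression. The table and the `build_rule` name are assumptions for the example; a real filter would let moderators edit the table while it's running.

```python
import re
from typing import Dict, Pattern

# Hypothetical substitution table: each letter maps to the characters
# a moderator has decided should count as "the same" letter.
ALTERNATIVES: Dict[str, str] = {
    "e": "eE3",
    "l": "lL1",
    "i": "iI1",
    "t": "tT7",
}


def build_rule(word: str, alternatives: Dict[str, str] = ALTERNATIVES) -> Pattern[str]:
    """Compile `word` into a pattern with one character class per position.

    Letters without an entry in the table only match themselves, so the
    rule covers exactly the characters mentioned in it and nothing more.
    """
    parts = []
    for ch in word:
        options = alternatives.get(ch, ch)
        parts.append("[" + "".join(re.escape(c) for c in options) + "]")
    return re.compile("".join(parts))


rule = build_rule("elite")
print(bool(rule.search("visit www.3L1t3.com today")))  # True
print(bool(rule.search("an elaborate reply")))         # False
```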
At the same time, you could allow the filter to compare posts with the rules to see if it can find other possible candidates that belong to the same problem category, say spam bots, trolls, or people violating the site's terms of use.
You can use a flag system with things like "thumbs up", "thumbs down" and "flag as inappropriate" to help the machine learn what should or should not be moderated or possibly be treated in some other way.
Different users can be given different (unseen) tags that allow their "votes" with the flags, "thumbs up" and so on to be given different weight in the code.
One user can be given several tags like that, and the tags can potentially have different effects, like increased weight for up votes and decreased weight for "flag as inappropriate" votes from said user (if someone is too sensitive and reports pretty much everything, for instance).
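
Here's a small Python sketch of how the hidden tags might change the weight of a vote. The tag names, multipliers and vote values are invented for the example, not anything the forum software has.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical hidden tags: tag -> multiplier per kind of vote.
TAG_WEIGHTS: Dict[str, Dict[str, float]] = {
    "trusted": {"up": 2.0, "down": 2.0, "flag": 2.0},
    "over_reporter": {"flag": 0.25},  # reports nearly everything, so flags count less
}

# Base value of each kind of vote before any tag is applied (assumed numbers).
VOTE_VALUES: Dict[str, float] = {"up": 1.0, "down": -1.0, "flag": -3.0}


def vote_weight(tags: Set[str], kind: str) -> float:
    """Multiply together the effect of every tag the voter carries."""
    weight = 1.0
    for tag in tags:
        weight *= TAG_WEIGHTS.get(tag, {}).get(kind, 1.0)
    return weight


def post_score(votes: Iterable[Tuple[Set[str], str]]) -> float:
    """Sum weighted votes; each vote is (voter's tags, kind of vote)."""
    return sum(VOTE_VALUES[kind] * vote_weight(tags, kind) for tags, kind in votes)
```

A score like this could feed the learning part of the filter as a training signal, or simply decide which posts reach the moderators first.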

The behaviour of the filter when it finds something worth moderating could differ depending on the tags given to the issue at hand and to the rule used to find it.

So, say rule "A" finds issue "1" (a very good post/thread that should be a sticky, or a racist comment that should be removed); then action "+" can be carried out. For example: make the post temporarily invisible to users, move it to a list of all rule "A" issue "1" posts found by the filter pending moderation, and show the poster a small notice, in the place where the post was located, saying that the post is awaiting moderation.

Once you mods feel comfortable that a certain rule is capable of catching a certain issue 100% of the time with 0% false positives, the rule can be changed to do things like auto-delete posts, or warn the user that the post violates the TOS and will be deleted and then auto-delete it if the user doesn't click a button to petition that decision (if he or she does, the post could then be moved to a moderation queue), or auto-sticky the post, or whatever you'd like it to do.
It should be possible to add new behaviours just as easily as new rules.
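
One way to keep behaviours as easy to add as rules might be a simple registry, sketched below in Python. Every name here (`ACTIONS`, `BEHAVIOURS`, `hold_for_moderation`, `tag_only`) is hypothetical, and posts are plain dictionaries just to keep the example short.

```python
from typing import Any, Callable, Dict, Tuple

Post = Dict[str, Any]
Action = Callable[[Post], None]

ACTIONS: Dict[str, Action] = {}               # action name -> function
BEHAVIOURS: Dict[Tuple[str, str], str] = {}   # (rule, issue) -> action name


def action(name: str) -> Callable[[Action], Action]:
    """Decorator that registers a new behaviour under `name`."""
    def register(fn: Action) -> Action:
        ACTIONS[name] = fn
        return fn
    return register


@action("hold_for_moderation")
def hold_for_moderation(post: Post) -> None:
    post["visible"] = False
    post["notice"] = "This post is awaiting moderation."


@action("tag_only")
def tag_only(post: Post) -> None:
    # Default: leave the post alone apart from an invisible tag.
    post.setdefault("tags", []).append("candidate")


# Moderators map a rule/issue pair to a behaviour; unmapped pairs fall back to tag_only.
BEHAVIOURS[("A", "1")] = "hold_for_moderation"


def handle(rule: str, issue: str, post: Post) -> str:
    """Run the behaviour configured for (rule, issue) and return its name."""
    name = BEHAVIOURS.get((rule, issue), "tag_only")
    ACTIONS[name](post)
    return name
```

Changing what rule "A" does for issue "1" is then a one-line edit to `BEHAVIOURS`, with no change to the rule itself.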

If you like, and think you have space to store the data for it, you could also make such a filter available to users and allow them to use it to manipulate the data on the forums: find threads of interest, remove posts that offend them or that are irrelevant to the issue they're interested in, and so on and so forth.

New rules suggested by the computer, where it does not know whether they make sense to a human, could get a custom filter tag with a default behavioural rule of choice. Say, do nothing with the post other than add an invisible tag that puts it in a list of potential rules that moderators can choose to look through.
Different moderators could then potentially find their own favourite rules to follow.
They could add subscriptions to them/mark them as favourites to get alerts when new posts belonging to it are made for instance.

Anyway, you get the gist of what I have in mind I think...
I know this is too big a project to expect anytime soon, and it might be something you want to pass on to someone else for them to make.
But something like this would make the forum more interesting and unlike the stuff you're currently using to get rid of spam bots and trolls it wouldn't drive chat addicts like me insane...

Anyway, please, for the love of all that's good, do allow people to talk in private messages without blocking them after x number of posts, unless someone on the other end objects to their posts...

Anyway, what do you guys think about my idea?
Hate it?
Like it?
Got more complicated feelings about it?
Hate certain parts but love others?
Please do reply with any input you might have. =)


#2

Hey @domaldel,

I've moved your topic to the Meta category, since Feature and Course Requests is generally meant for stuff on the main site :)

I think that you're in a small minority in wanting to talk to people and not just get help with some problem in your code. I think that most of the time, people who want to talk will come back, even after a bad experience with not being allowed to post.
That's not a good excuse for giving people a bad experience though :)

Rate limiting: it's generally a very effective method of reducing spam. I think that's what the forums are already doing, just in a way you're finding a bit restrictive :)
There was a string of three posts I noticed that got posted yesterday, all spam. They were each two minutes apart, meaning that a thirty second delay wouldn't have caught it. So the rate limit would have to be increased. The problem there is that, sometimes (this isn't too uncommon) a user will join, ask for help on the one exercise, then get past it and need help on the next. So the rate limit would have to be fine-tuned to be just in between spammers and most legitimate posts. That could probably be doable, but then you'd also have to have moderators watching over the rate-limit blocked posts, to make sure it's not blocking anything that's good. So just add more moderators. Not super easy, but doable, so that suggestion might work.

That way spammers know exactly how often to post to get around it :wink:... the warning would have to be somewhat vague to not make the rate limiter totally useless.

NO. NEVER DO THAT.

Automatically deleting posts without any human supervision just isn't a good idea. People aren't perfect, and machines are even less so. They're going to make more mistakes than a person, making people with legitimate posts that get deleted very unhappy.

An alternative suggestion would be to check if a post has the same sentence repeated, then flag it (checking for repeated words wouldn't work because I've already used, for example, “the” several times in this post, and maintaining a word whitelist would be too difficult I think). Not deletion, just bring it to the moderators' attention, and they can check if it's good or not.
I think that would be the best solution, at least for now, because most of the spam I've seen on here is the same sentence duplicated about five to ten times, generally with a bunch of special characters and a domain name.
Only problem is, how long is a sentence? Do you stop at a period (.) to end the sentence? What if the spammer doesn't care about punctuation such as that? So that might take some thinking about.
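
One way to sidestep the "how long is a sentence?" question might be to ignore punctuation entirely and compare overlapping runs of words (word n-grams, sometimes called shingles) instead. To be clear, that's a different technique from real sentence matching, and the run length of 5 words and the threshold of 3 repeats below are guesses that would need tuning. A minimal Python sketch:

```python
import re
from collections import Counter
from typing import List


def shingles(text: str, size: int = 5) -> List[str]:
    """Split text into overlapping runs of `size` words, ignoring case and punctuation."""
    words = re.findall(r"\w+", text.lower())
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def should_flag(text: str, size: int = 5, min_repeats: int = 3) -> bool:
    """True if any run of `size` words appears at least `min_repeats` times.

    This only flags the post for a moderator to look at; it never deletes anything.
    """
    counts = Counter(shingles(text, size))
    return any(n >= min_repeats for n in counts.values())
```

Common words like "the" don't trip this, because a single repeated word never makes a repeated five-word run on its own.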


Having real users disliking the onboarding process of these new forums is probably a sign of trouble and lost users, though. So at least trying to change some of the spam-block settings and seeing what happens would probably be a good idea :)


#3

The way I see it "rate limiting" wouldn't be about "spam" at all, but about bandwidth conservation.
I do not think that "rate limiting" is appropriate as an anti-spam feature at all, and I think other tools should be used to deal with such an issue.


#4

Again, I'm suggesting the "rate limit" as a way to conserve hardware resources and distribute them more fairly across the user base.
An alternative option could be some kind of commodity market of resources with each person getting a budget with a regular income that can be spent on the forum.
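
The budget idea is close to what's usually called a token bucket. Here's a rough Python sketch under my own assumptions: one credit per post, one credit of income per hour, and a cap so credits can't pile up forever. None of these numbers come from the forum.

```python
from dataclasses import dataclass


@dataclass
class PostingBudget:
    """Per-user budget with a regular income (hypothetical helper)."""
    balance: float = 5.0          # credits available right now
    income_per_hour: float = 1.0  # credits earned per hour
    cap: float = 10.0             # maximum credits a user can hold
    updated_at: float = 0.0       # time of the last update, in seconds

    def _accrue(self, now: float) -> None:
        """Add the income earned since the last update, up to the cap."""
        hours = max(0.0, now - self.updated_at) / 3600.0
        self.balance = min(self.cap, self.balance + hours * self.income_per_hour)
        self.updated_at = max(self.updated_at, now)

    def spend(self, now: float, cost: float = 1.0) -> bool:
        """Try to pay `cost` credits for an action at time `now`."""
        self._accrue(now)
        if self.balance < cost:
            return False
        self.balance -= cost
        return True
```

Different actions could be given different costs, for example a long post with attachments costing more than a short reply.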


#5

The point I had here was that it would be supervised by the person making the post themselves.
Basically, the bot would flag the post and give the user a warning (at the time of posting). Then, if they agree with the bot's judgement or don't do anything, it would delete the post (or at the very least compress it and move it to cold storage on an HDD meant for infrequent use, low performance and high storage capacity, and leave it there).

The goal is to not reward the spammers by giving them exposure, but simply to stop the post before it ever gets seen, making such spam posts pointless.

As for the word bit, it's basically a string of characters with multiple possible characters for each position (like E, e, ê and 3 for instance) including the space between words if the moderator would like to do so.
So, for instance, for "Hello, my name is Tom" you could add one string for "lo, my nam" and another for "is Tom" if you so wished, with each letter having the option of multiple alternative characters. You would also allow a variable number of characters between these two sets of characters, so that "Hello, my name, as given by God, is Tom" (for religious people) could be tracked as well: one setting tells the filter how much space is allowed between the character sets, and another tells it how much that distance can vary within a range.
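
Sketched in Python, the "fragments with a bounded gap" part might look like this. `fragment_rule` is a made-up name, and for brevity the sketch matches each fragment literally instead of allowing alternatives per character.

```python
import re
from typing import Pattern, Sequence


def fragment_rule(fragments: Sequence[str], min_gap: int, max_gap: int) -> Pattern[str]:
    """Match the fragments in order, with min_gap..max_gap characters between them."""
    gap = ".{%d,%d}" % (min_gap, max_gap)
    # re.DOTALL lets the gap span line breaks as well.
    return re.compile(gap.join(re.escape(f) for f in fragments), re.DOTALL)


rule = fragment_rule(["lo, my nam", "is Tom"], min_gap=2, max_gap=20)
print(bool(rule.search("Hello, my name is Tom")))                    # True
print(bool(rule.search("Hello, my name, as given by God, is Tom")))  # True
```

In the first sentence the gap is 2 characters ("e ") and in the second it is 20, so a rule with a maximum gap of 10 would catch only the first.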

You should not set the filter to actively search for sentences at all. It's searching for patterns in the text; it doesn't need to understand that the text holds meaning, only that certain patterns are desirable and others are not. What's needed is a way for the system and the moderators to convey those patterns to the filter.

As for the spam-block filter, yeah sure that would probably help. =)
But honestly what I'm looking for is development of a totally different system.
And after reading up a little on the software used for this forum I'm thinking that perhaps some kind of API and a bot on a secondary server permitted to manipulate the data on the forum would be a better solution than trying to integrate the features I have in mind 100% into the forum software directly.

I wish I knew more about programming because I can see a lot of the program structure in my head but I just don't have the words to properly express them nor the means to implement them yet...