Case-insensitivity not always respected in regexes

BYOND Forums


ID:2472413

May 31 2019, 12:09 pm

Altoids0

Resolved

Case insensitivity was not properly handled in some complex regular expressions.

BYOND Version:	512
Operating System:	Windows 10 Pro 64-bit
Web Browser:	Chrome 74.0.3729.169
Applies to:	Dream Daemon

Status:

Resolved (512.1472)

This issue has been resolved.

Descriptive Problem Summary:
On my Space Station 13 server, we run a series of regexes to filter out bigoted speech.

Recently, I noticed that my most complicated regex, used to detect the n-word, failed to match upper-case text and only matched lowercase text, which is the case the regex was written in. Another regex, used for the same purpose, did not have this problem and correctly honored the case-insensitive flag.

Numbered Steps to Reproduce Problem:
1. Run the code below with your bigoted expletives of choice.
2. Be sad.

Code Snippet (if applicable) to Reproduce Problem:

/proc/isnotpretty(var/text)
    var/list/pretty_filter_items = list(
        @"\b[nl]+[\W_]{0,4}[!i\/?1\\]+[\W_]{0,4}[qgb]+[\W_]{0,4}[qgb]?[\W_]{0,4}(?:[e3][\W_]{0,4}r|a)(?!ia|al)s*\b",
        "nigg+"
        )

    for(var/pattern in pretty_filter_items)
        var/regex/R = new(pattern, "ig") // "i" = case-insensitive, "g" = global
        if(R.Find(text)) // Find() returns the match position, or 0 if nothing matched
            return TRUE // Yes, it isn't pretty.
    return FALSE // No, it is pretty.

// world/New() runs at startup; a proc named main() is never called automatically in DM.
/world/New()
    ..()
    var/list/expletives = list() // Fill this yourself!
    for(var/word in expletives)
        world.log << isnotpretty(word)

Expected Results: The more complicated regex obeys its "i" flag.

Actual Results: The more complicated regex does not match upper-case input, i.e. it behaves as if the "i" flag were not set.

Does the problem occur:
Every time? Or how often? Every time.
On other computers? Tried MoMMIv2 and the same bug occurred, yep.

When does the problem NOT occur? With the second, simpler regex given above.

Did the problem NOT occur in any earlier versions? If so, what was the last version that worked? (Visit http://www.byond.com/download/build to download old versions for testing.) This happened on my server running 512.1464 and was then confirmed by MoMMIv2 running 512.1454, so the bug dates back at least to 512.1454.

Workarounds:
It is possible to manually add the capital version of every letter in the regex, although this is rather tiresome.
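For illustration, here is a sketch of that workaround applied to the complex regex from the snippet above. Every letter is listed in both cases, including the `r`, `a`, and `s` outside the character classes and the letters in the `(?!ia|al)` lookahead. The proc name `isnotpretty_workaround` is hypothetical, and this hand expansion has not been run against the report's test strings.

```dm
// Hypothetical workaround proc: each letter appears in both cases, so a match
// should not depend on the "i" flag being honored. Flags are kept as in the original.
/proc/isnotpretty_workaround(var/text)
    var/regex/R = new(@"\b[nNlL]+[\W_]{0,4}[!iI\/?1\\]+[\W_]{0,4}[qQgGbB]+[\W_]{0,4}[qQgGbB]?[\W_]{0,4}(?:[eE3][\W_]{0,4}[rR]|[aA])(?![iI][aA]|[aA][lL])[sS]*\b", "ig")
    return R.Find(text) ? TRUE : FALSE
```

Escapes such as `\b`, `\W`, and `\\` are left alone, because changing their case would change their meaning (`\B` and `\w` are different patterns).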

May 31 2019, 12:12 pm (Edited on Jun 1 2019, 6:49 pm)
Altoids0	Here's a Regex101 link demonstrating this regex, to show that it's supposed to match regardless of case (contains offensive speech as examples of what this regex captures, cover your eyes, children): https://regex101.com/r/XxwMID/13

May 31 2019, 8:13 pm
Lummox JR	Thanks. This should be helpful for finding the problem. I'll test in 512 and also retest in 513's updated engine.

Jun 3 2019, 10:07 am
Lummox JR	Okay, the link you provided isn't working for me. Can you just point me to a pastebin or something with the words I should filter against? Or a test project that includes the words? I don't know what I'm supposed to catch or not catch in this regex otherwise.

Jun 6 2019, 3:50 am
Altoids0	Yeah, I think I forgot the href parameter to my link. It should work now.

Jun 6 2019, 11:20 am
Lummox JR	It's not just the physical link. I mean I can't use that site. Please post a pastebin.

Jun 6 2019, 4:19 pm

In response to Lummox JR

Hiead

Lummox JR wrote:

It's not just the physical link. I mean I can't use that site. Please post a pastebin.

Does regexr work for you? https://regexr.com/4fd3j

It's apparent that the OP wants to filter against a slew of variations of the anti-black racial slur, including variants with, say, five "i"s or interspersed punctuation ("N.i."-). That said, it also doesn't catch variants that repeat the "R" at the end of the word.

OP is saying that the regex isn't being treated as case-insensitive: e.g. "n.i."- matches fine, but "N.I."- does not.

Jun 6 2019, 9:08 pm
Lummox JR	Also not working. Just a pastebin of examples, please.

Jun 12 2019, 10:06 am
Lummox JR	Still waiting on an example I can work with. Heck, just throw together a test project or something even. That would be best.

Jun 17 2019, 1:05 am
Optimumtact	Here you go: https://file.house/eTxu.txt. I just pulled their regex plus test strings (both matching and non-matching).

Jun 17 2019, 1:31 am
Lummox JR	Thanks. I'll look into this.

Jun 17 2019, 11:24 am
Lummox JR	Lummox JR resolved issue with message: Case insensitivity was not properly handled in some complex regular expressions.

Jun 17 2019, 11:25 am
Lummox JR	Interesting bug. It turned out not only was the problem also present in 513, it was worse there. One of the regular expressions in the test set that was supposed to match did not match in 512, but two didn't match in 513, and it was because of two different bugs present in both versions. Apparently a slight difference in behavior caused one of the bugs to manifest in 513 but not 512.