Convert a string into a list, but with a filter (Python)

TaokyleYT · November 6, 2022, 12:58am

I am so sorry to hurt your brain reading the title, as I am bad at English

So I was making a code converter, and there is a code corresponding to each letter, number or symbol. However, I also found that some special combinations of codes have a completely different conversion

I have already made a dictionary for it, but the problem is:
how do I convert a string into a list of single characters while keeping every substring of the format
f ‘<{str}>’ together as one item?
example:
User input: ‘<{KA}>AbcHH13<{HH}>@<{SK}>’
(ignore the {} cuz without them it won’t show)

Converted list: [‘<{KA}>’, ‘a’, ‘b’, ‘c’, ‘h’, ‘h’, ‘1’, ‘3’, ‘<{HH}>’, ‘@’, ‘<{SK}>’]
(Also, ignore the {} they are there for showing the entire thing with that format, for some reason it disappears)

Plz help

MattDESTROYER · November 6, 2022, 1:12am

Here’s just a quick, simple attempt at what you’re asking for:

function filterString(str) {
	const res = [];
	for (let i = 0; i < str.length; i++) {
		if (str[i] === "<") {
			const idx = str.indexOf(">", i + 1);
			if (idx === -1) {
				res.push(...str.substr(i).split(""));
				return res;
			} else {
				res.push(str.substr(i, idx - i + 1));
				i = idx; // the loop's i++ then steps past ">"
			}
		} else {
			res.push(str[i]);
			// if your example output being set to lowercase was not a typo:
			// res.push(str[i].toLowerCase());
		}
	}
	return res;
}

filterString("<KA>Abc<LONGER>HH13<HH>@<SK>"); // -> ["<KA>", "A", "b", "c", "<LONGER>", "H", "H", "1", "3", "<HH>", "@", "<SK>"]

pro0grammer · November 6, 2022, 7:19am

I tried it on my end and here’s an alternative to @MattDESTROYER’s code.

This also detects tags where the string inside < and > has more than two letters, and it removes any stray < or > characters elsewhere in the string.
Example:

<Bk>som<<<>>>ethin<g>>< will give same result as <Bk>somethin<g>.

It also requires fewer lines:

function filterString(str) {
    var result = [];
    result = str.split("<");
    result = result.map(a => {
        if (a.includes(">")) a = "<" + a;
        return a;
    });

    result = result.join(">").split(">");
    result = result.map(a => {
        if (a.charAt(0) == "<") a += ">";
        else a = a.toLowerCase().split("").join("<>");
        return a;
    });

    result = result.join("<>").split("<>").filter(a => a != "");
    return result;
}

Here’s how you call it:

//you call it like this:
var res = filterString("<KA>AbcHH13<HH>@<SK>");

Thanks to MattDESTROYER for making it a one-liner; here is the shortest version of the function:

const filterString = (str) => str.split("<").map((a) => a.includes(">") ? "<" + a : a).join(">").split(">").map((a) => a.charAt(0) === "<" ? a + ">" : a.toLowerCase().split("").join("<>")).join("<>").split("<>").filter((a) => a !== "");

Though as this function uses map, it is a bit slower than the one by MattDESTROYER.
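
For readers following along in Python, here is a rough sketch of the behaviour described above: keep `<...>` tags whole, drop stray `<` and `>`, and lowercase everything else. It is not a line-by-line port of the JavaScript; the name `filter_string` and the use of a regular expression are assumptions made for illustration.

```python
import re

# A tag is "<", one or more characters that are not angle brackets, then ">".
# Any other character that is not an angle bracket is a token on its own.
# Stray "<" and ">" match neither alternative, so findall skips them.
TOKEN = re.compile(r"<[^<>]+>|[^<>]")

def filter_string(string):  # hypothetical name, not from the thread
    tokens = TOKEN.findall(string)
    # Keep tags as written, lowercase the single characters.
    return [t if t.startswith("<") else t.lower() for t in tokens]

print(filter_string("<Bk>som<<<>>>ethin<g>><"))
print(filter_string("<Bk>somethin<g>"))
# both print ['<Bk>', 's', 'o', 'm', 'e', 't', 'h', 'i', 'n', '<g>']
```

One difference worth knowing: this sketch always drops a lone `>`, which may not match the JavaScript version on every odd input.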

MattDESTROYER · November 6, 2022, 8:21am

Didn’t notice that the curly brackets were not supposed to be part of the input, updated my version. I was interested to see the performance difference between the two versions so I did a basic benchmark. Mine seems to be around 40-60% faster just based off that test (obviously results vary). Your version can actually be chained and shortened into a one-liner:

const filterString = (str) => str.split("<").map((a) => a.includes(">") ? "<" + a : a).join(">").split(">").map((a) => a.charAt(0) === "<" ? a + ">" : a.toLowerCase().split("").join("<>")).join("<>").split("<>").filter((a) => a !== "");

Also, it’s not good practice to use result without declaring it properly, or to leave out semicolons.

Nice idea cleaning out unnecessary ‘tags’ (dunno what to call them); it gave me the idea of adding escapability to the greater-than and less-than characters (although I’ll probably do that later).

pro0grammer · November 6, 2022, 8:41am

I actually forgot to declare result. As for the missing semicolons, I am on mobile so I just forgot them due to the lack of formatting tools😅

I first wanted to make it faster than your version, but as soon as I wrote .map() I knew my function was slower than yours. I also wanted to make it a one-liner, but I was lazy and had to leave for some work, so I just left it as it was. Thanks for making it a one-liner though, I’ll update my answer with that.

TaokyleYT · November 6, 2022, 12:24pm

I think I forgot to mention one thing: it is Python
Sorry

pro0grammer · November 6, 2022, 12:58pm

Umm… well I have too little Python knowledge to answer the question, so sorry. But this kind of feels funny

MattDESTROYER · November 6, 2022, 7:44pm

Rewritten in Python (changed str to string because str is a built-in type in Python):

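def filterString(string):
	res = []
	i = 0
	while i < len(string):
		if string[i] == "<":
			# str.find returns -1 when there is no ">" (str.index would raise ValueError)
			idx = string.find(">", i + 1)
			if idx == -1:
				# unclosed "<": add the rest as single characters
				res += list(string[i:])
				return res
			else:
				# keep the whole <...> tag as one item
				res.append(string[i:idx + 1])
				i = idx
		else:
			res.append(string[i].lower())
		i += 1
	return res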

filterString("<KA>Abc<LONGER>HH13<HH>@<SK>") # -> ["<KA>", "a", "b", "c", "<LONGER>", "h", "h", "1", "3", "<HH>", "@", "<SK>"]

TaokyleYT · November 6, 2022, 11:21pm

Output being lowercase isn’t a typo, so Ima use that code
And I found a small problem, not a big deal; I fixed it for ya and for those who need this code later

Into

res.append(str(string[i].lower()))

As str statement is a bit weird in Python

MattDESTROYER · November 6, 2022, 11:24pm

Whoops, was meant to be this:

res.append(string[i].lower())

UMARismyname · November 8, 2022, 10:34pm

ignore the {}

in future just type \> to get >, or in this case you should have used code blocks:
```py
# code here
```

This is better solved using regex. Especially basic regex like this is easy (once you learn it, as with many things), and usable in many programming languages

from re import findall
print(findall("<.*?>|.", "<KA>AbcHH13<HH>@<SK>"))

MattDESTROYER · November 9, 2022, 3:09am

Regex is definitely simpler, but in some cases, more expensive (just by itself). In this case, if converting to lowercase is a requirement, Regex creates the need to iterate through the string twice.
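
To check a cost claim like this on your own machine, a small `timeit` comparison can help. The sketch below is only an illustration: the function names, the sample input and the repeat count are assumptions, and which version comes out ahead may depend on the input and the Python implementation.

```python
import re
import timeit

TOKEN = re.compile(r"<.*?>|.")

def loop_version(string):
    """Single pass: scan characters, copy tags whole, lowercase the rest."""
    res = []
    i = 0
    while i < len(string):
        if string[i] == "<":
            idx = string.find(">", i + 1)
            if idx == -1:
                # No closing ">": add the rest as plain characters and stop.
                res += list(string[i:])
                break
            res.append(string[i:idx + 1])
            i = idx
        else:
            res.append(string[i].lower())
        i += 1
    return res

def regex_version(string):
    """Two passes: findall builds the tokens, then a second loop lowercases."""
    tokens = TOKEN.findall(string)
    # Tokens longer than one character are <...> tags and stay as written.
    return [t if len(t) > 1 else t.lower() for t in tokens]

sample = "<KA>AbcHH13<HH>@<SK>" * 20  # arbitrary test input
assert loop_version(sample) == regex_version(sample)

for fn in (loop_version, regex_version):
    seconds = timeit.timeit(lambda: fn(sample), number=1000)
    print(f"{fn.__name__}: {seconds:.3f} s for 1000 calls")
```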

UMARismyname · November 9, 2022, 7:51pm

well, they didn’t say so, but if you mean make the whole string lowercase:

from re import finditer

print([x.group().lower() for x in finditer("<.*?>|.", "<KA>AbcHH13<HH>@<SK>")])
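
Note that the line above lowercases the tags too (`<ka>` instead of `<KA>`), while the example in the question keeps them as typed. A possible variant, assuming tags should stay untouched, uses two capture groups so only the single characters are lowercased. The name `tokenize` is made up for this sketch.

```python
import re

# Group 1 captures a whole <...> tag, group 2 any other single character.
TOKEN = re.compile(r"(<.*?>)|(.)")

def tokenize(string):  # hypothetical name, not from the thread
    # For each match exactly one group is set: keep a tag as written,
    # otherwise lowercase the single character.
    return [m.group(1) or m.group(2).lower() for m in TOKEN.finditer(string)]

print(tokenize("<KA>AbcHH13<HH>@<SK>"))
# ['<KA>', 'a', 'b', 'c', 'h', 'h', '1', '3', '<HH>', '@', '<SK>']
```

Since `.` does not match a newline by default, newlines in the input are silently skipped; passing `re.DOTALL` to `re.compile` would keep them.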
