Question about converting string

Hey, I have a question regarding Python 3. I'm following this thread because I'm learning about data types from a book for a 1010 course at my university. I want to create a very simple encoder in Python 3. Since I don't know how to manipulate bits and compute a real compression ratio, I just want to show the difference between the len() of the original text and the len() of the compressed one (with some expressions replaced by symbols).

Before showing you the code, I want to say that a while loop would be awesome for this; the problem is all the values the iteration has to consider before it can break. That's why I'm using a for loop. I've thought about a more efficient way, using a kind of dictionary and looking up each word as a key to get its symbol as the value, but I don't know how. So here's my first try. It's basically just chained conditionals; I know this is quite bad:

# For a text inside a string change different words for symbols and print the result
text = input("Enter a text to encode in a simple way: ")

# Iterating over a string yields single characters, so split it into words first
for i in text.split():
    if i == 'and':
        print('&')
    elif i == 'to':
        print('>')
    elif i == 'the':
        print('~')
    elif i == 'an':
        print('!')
    elif i == 'is':
        print('=')
    elif i == 'character':
        print('#')
    elif i == 'ASCII':
        print('%')
    elif i == 'that':
        print('$')
    elif i == 'represented':
        print('@')
    else:  # every other word continues like normal, i.e. is printed unchanged
        print(i)

# Here I would print the len() of the normal text and then the len() of the compressed one. 
# Showing the difference between those two; not actually doing a ratio

I've learned something about bitwise operators and such, but as you can see this only replaces some expressions in the string with symbols. We are not really encoding anything. How can I do this? Thank you in advance.

Well, this leaves you with two options:

Keep a counter of how many characters you save with each substitution.
Make a new string with the substituted characters and then use len() on it.
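
Both options can be sketched roughly like this. It is only a minimal illustration, not a finished encoder: the `SUBSTITUTIONS` table and the `encode_words` helper are made-up names, and because the text is split on whitespace, a word with punctuation attached (such as "the,") will not be matched. The two results only agree when the words are separated by single spaces.

```python
# Hypothetical substitution table, using some of the symbols from the question.
SUBSTITUTIONS = {"and": "&", "to": ">", "the": "~", "an": "!", "is": "="}


def encode_words(text):
    """Return (encoded_text, characters_saved) for a whitespace-separated text."""
    encoded_words = []
    saved = 0  # option 1: running counter of characters saved
    for word in text.split():
        symbol = SUBSTITUTIONS.get(word, word)  # unknown words pass through unchanged
        saved += len(word) - len(symbol)
        encoded_words.append(symbol)
    # option 2: build the new string so it can be measured with len()
    return " ".join(encoded_words), saved


original = "the cat and the dog"
encoded, saved = encode_words(original)
print(encoded)                      # ~ cat & ~ dog
print(len(original), len(encoded))  # 19 13
print(saved)                        # 6
```

Using `dict.get(word, word)` replaces the whole chain of if/elif branches: the second argument is the default, so any word that is not in the table is kept as it is.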


I still have to review Huffman encoding. With the help of a pro friend of mine, we managed to at least show the len() of the encoded text and of the original one. (He did all the hard work.)

from string import whitespace, punctuation
from sys import stderr as sys_stderr

STR_HASH = {
    "and": "&",
    "to": ">",
    "the": "~",
    "an": "!",
    "character": "#",
    "ASCII": "%",
    "that": "$",
    "represented": "0",
}


ENC_AVOID = set(whitespace + punctuation)


def encode(line):
    output_stack = []
    current = ""
    for c in line:
        if c in ENC_AVOID:
            output_stack.append(STR_HASH.get(current, current))
            output_stack.append(c)
            current = ""
            continue
        current += c

    output_stack.append(STR_HASH.get(current, current))

    return "".join(output_stack)


filename = input("What file do you want me to encode? : ")

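try:
    with open(filename) as fh:
        # Read every line up front so the originals can be measured later.
        lines = [line.strip() for line in fh]

except (FileNotFoundError, PermissionError):
    print(f"File {filename!r} could not be opened!", file=sys_stderr)
    raise SystemExit(1)


encoded_lines = [encode(line) for line in lines]

for line in encoded_lines:
    print(line)

# Compare the total number of characters before and after the substitution
# (measuring the file contents, not the file name or only the last line).
original_length = sum(len(line) for line in lines)
encoded_length = sum(len(line) for line in encoded_lines)

print('\nThe length of the input text is: ' + str(original_length))
print('The length of the encoded text is: ' + str(encoded_length))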

Next time I have to study Windows-1252 encoding in more depth and review how Python works with character encodings, plus some maths, to actually do real encoding, I guess.
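
As a possible starting point for that, here is a small sketch of how Python separates text from bytes. `len()` on a `str` counts characters, while `len()` on the result of `str.encode()` counts bytes, and that number depends on the encoding chosen. The sample string is just an illustration; `cp1252` is Python's codec name for Windows-1252.

```python
text = "café €5"

as_cp1252 = text.encode("cp1252")  # Windows-1252 uses one byte per character
as_utf8 = text.encode("utf-8")     # UTF-8 uses several bytes for non-ASCII characters

print(len(text))       # number of characters: 7
print(len(as_cp1252))  # number of bytes in Windows-1252: 7
print(len(as_utf8))    # number of bytes in UTF-8: 10

# Decoding with the matching codec gives the original text back.
round_trip = as_cp1252.decode("cp1252")
print(round_trip == text)  # True
```

So the len() comparison in the program above measures characters, not the bytes that would end up in a file.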