blog/_posts/2018-03-02-algorithm.md

3.1 KiB

title tags updated description
An O(n) Python 3 algorithm that halves the number of characters to be removed
algorithm
string
python
2018-03-02 17:00 A Python algorithm that halves the number of characters to be removed

It's been a while.

Some days ago I thought to use this algorithm to remove duplicate escape characters in my md-toc program. I then realized I didn't need it and also noticed that dealing correctly with escape characters seems very hard. So I generalized the algorithm to get any number of a specified charatcter in a string, except \ !!, to be halvened.

Algorithm

Here's the algorithm along with some unit tests:

import math
import unittest


def halve_characters(s, remove_char):
    assert isinstance(s, str)
    assert isinstance(remove_char, str)
    assert len(remove_char) == 1
    assert remove_char != '\\'

    i = 0
    final_string = str()
    while i < len(s):
        if s[i] == remove_char:
            j = i
            count = 1
            match = True
            while j < len(s) - 1 and match:
                if s[j] == s[j + 1]:
                    count += 1
                else:
                    match = False
                j += 1
            remove_char_count = math.floor(count / 2)
            final_string += remove_char_count * remove_char
            i += count
        else:
            final_string += s[i]
            i += 1

    return final_string



class Test(unittest.TestCase):
    def test_halve_characters(self):
        self.assertEqual(halve_characters('thiis is a stringiioo. Hiiii', 'i'), 'this s a strngioo. Hii')
        self.assertEqual(halve_characters('****\n', '*'), '**\n')
        self.assertEqual(halve_characters('\\\n\\\\\n', '\n'), '\\\\\\')
        self.assertEqual(halve_characters('', 'a'), '')
        self.assertEqual(halve_characters('abcdefffghifffff', 'f'), 'abcdefghiff')


if __name__ == '__main__':
    unittest.main()

Explanation

The idea behind it is that we iterate character by character through the string until we find the an element that needs to be removed: if s[i] == remove_char

If this condition is true, we set a counter count = 1 which is a counter for the number of remove_char characters.

We assume that the next character in the string will also be remove_char. If it is it then we continue to count the occurrencies of remove_char.

Once we reach the end of the string or if there are no more consecutive remove_chars we compute the halved number of remove characters with math.floor(count / 2) and concatenate remove_char * math.floor(count / 2) to the final string. Using the floor function implies that the count is approximated to the nearest smaller integer number.

If the current character is not if s[i] != remove_char we simply add it to the final string.

Complexity

The complexity is O(n) because each character is inspected only once:

while i < len(s):
    if s[i] == remove_char:
        [...]
        # Remember what count is.
        i += count
    else:
        i += 1

Enjoy!