Added new post.

Franco Masotti 2018-03-03 17:13:24 +01:00
parent d9d979db1c
commit beeddfc597
1 changed files with 105 additions and 0 deletions

---
title: An O(n) Python 3 algorithm that halves the number of characters to be removed.
tags: [algorithm, string, python]
updated: 2018-03-02 17:00
description: A Python algorithm that halves the number of characters to be removed
---
It's been a while.
A few days ago I thought of using this algorithm to remove duplicate escape
characters in my [md-toc](http://github.com/frnmst/md-toc) program.
I then realized I didn't need it, and I also noticed that handling escape
characters correctly is quite hard. So I generalized the algorithm: every run
of consecutive occurrences of a given character in a string (any character
except `\`!) is halved, rounding down.

## Algorithm

Here's the algorithm along with some unit tests:
```python
import math
import unittest


def halve_characters(s, remove_char):
    """Halve every run of consecutive remove_char characters in s.

    A run of length count is replaced by floor(count / 2) copies of
    remove_char; every other character is copied unchanged.
    A backslash is not accepted as remove_char.
    """
    assert isinstance(s, str)
    assert isinstance(remove_char, str)
    assert len(remove_char) == 1
    assert remove_char != '\\'

    i = 0
    final_string = str()
    while i < len(s):
        if s[i] == remove_char:
            # Count how many consecutive remove_char characters start at i.
            j = i
            count = 1
            match = True
            while j < len(s) - 1 and match:
                if s[j] == s[j + 1]:
                    count += 1
                else:
                    match = False
                j += 1
            # Keep half of the run, rounding down.
            remove_char_count = math.floor(count / 2)
            final_string += remove_char_count * remove_char
            # Skip past the whole run.
            i += count
        else:
            # Any other character is copied as-is.
            final_string += s[i]
            i += 1
    return final_string


class Test(unittest.TestCase):
    def test_halve_characters(self):
        self.assertEqual(halve_characters('thiis is a stringiioo. Hiiii', 'i'), 'this s a strngioo. Hii')
        self.assertEqual(halve_characters('****\n', '*'), '**\n')
        self.assertEqual(halve_characters('\\\n\\\\\n', '\n'), '\\\\\\')
        self.assertEqual(halve_characters('', 'a'), '')
        self.assertEqual(halve_characters('abcdefffghifffff', 'f'), 'abcdefghiff')


if __name__ == '__main__':
    unittest.main()
```
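As a cross-check, the same behaviour can be expressed with `itertools.groupby`, which splits a string into maximal runs of equal characters. The following is a sketch of an alternative technique, not the algorithm above. The name `halve_characters_groupby` is hypothetical.

```python
from itertools import groupby


def halve_characters_groupby(s: str, remove_char: str) -> str:
    """Hypothetical groupby-based alternative to halve_characters.

    Every maximal run of remove_char of length k becomes k // 2 copies;
    all other runs are copied through unchanged.
    """
    assert len(remove_char) == 1 and remove_char != '\\'
    parts = []
    # groupby yields (char, iterator) for each maximal run of equal characters.
    for char, run in groupby(s):
        length = sum(1 for _ in run)
        if char == remove_char:
            # Keep floor(length / 2) copies of the run.
            parts.append(char * (length // 2))
        else:
            # Runs of any other character are kept whole.
            parts.append(char * length)
    return ''.join(parts)


if __name__ == '__main__':
    print(halve_characters_groupby('abcdefffghifffff', 'f'))
```

For the inputs in the unit tests it should return the same strings as `halve_characters`. For example, `'abcdefffghifffff'` becomes `'abcdefghiff'`.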
## Explanation

The idea is to iterate through the string character by character until we
find an element that needs to be removed: `if s[i] == remove_char`.

If this condition is true, we set `count = 1`, a counter for the number of
consecutive `remove_char` characters. We start by assuming (`match = True`)
that the next character in the string is also `remove_char`; as long as it
is, we keep counting occurrences of `remove_char`.

Once we reach the end of the string, or there are no more consecutive
`remove_char` characters, we compute the halved number with
`math.floor(count / 2)` (equivalently, `count // 2`), append
`remove_char * math.floor(count / 2)` to the final string, and skip the whole
run with `i += count`. Using the floor function means that odd counts are
rounded down. A run of 3 becomes 1, a run of 5 becomes 2, and a single
occurrence is removed entirely.
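The short sketch below checks the rounding behaviour. It compares `math.floor(count / 2)` with integer floor division `count // 2` for a few run lengths, and prints how many characters survive each run.

```python
import math

# For each run length, compare math.floor(count / 2) with integer floor
# division and show how many remove_char characters are kept.
for count in range(1, 7):
    kept = math.floor(count / 2)
    assert kept == count // 2  # identical for non-negative integers
    print(f'{count} -> {kept}')
```

Run lengths 1, 3 and 5 show the rounding down: 0, 1 and 2 characters are kept.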
If the current character is not `remove_char` (`s[i] != remove_char`), we
simply append it to the final string and move on to the next one.
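The counting step can be pulled out on its own to watch it work. The helper `count_run` is hypothetical, since the post inlines this loop. It mirrors the inner `while` loop of `halve_characters` and returns the length of the run that starts at index `i`.

```python
def count_run(s: str, i: int) -> int:
    """Hypothetical helper: length of the run of s[i] starting at index i.

    Mirrors the inner while loop of halve_characters.
    """
    count = 1
    j = i
    # Extend the run while the next character equals the current one.
    while j < len(s) - 1 and s[j] == s[j + 1]:
        count += 1
        j += 1
    return count


if __name__ == '__main__':
    s = 'abcdefffghifffff'
    i = 0
    while i < len(s):
        if s[i] == 'f':
            count = count_run(s, i)
            print(f'run of {count} at index {i} -> keep {count // 2}')
            i += count  # jump past the whole run, as the post does
        else:
            i += 1
```

For `'abcdefffghifffff'` this reports a run of 3 at index 5 (keep 1) and a run of 5 at index 11 (keep 2). That agrees with the `'abcdefghiff'` result in the unit tests.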
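## Complexity

The time complexity is O(n), where n is the length of `s`. The inner loop
scans each run of `remove_char` once, and `i += count` then jumps past the
whole run, so every character is examined only a constant number of times: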
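```python
while i < len(s):
    if s[i] == remove_char:
        # [...] count the run of remove_char starting at s[i]
        # count is the length of the whole run, so the next
        # iteration starts right after it.
        i += count
    else:
        i += 1
```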
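The O(n) bound also assumes that `final_string += ...` costs amortized constant time per character. CPython often performs this concatenation in place, but PEP 8 advises against relying on that and recommends `''.join()` instead. Below is a sketch of the same left-to-right scan that collects pieces in a list. The name `halve_characters_join` is hypothetical, and the inner loop compares directly against `remove_char` instead of using a `match` flag.

```python
def halve_characters_join(s: str, remove_char: str) -> str:
    """Hypothetical variant of halve_characters that builds the result
    with a list and ''.join(), so building the output is linear on any
    Python implementation.
    """
    assert len(remove_char) == 1 and remove_char != '\\'
    pieces = []
    i = 0
    n = len(s)
    while i < n:
        if s[i] == remove_char:
            # Find where the run of remove_char starting at i ends.
            j = i
            while j < n and s[j] == remove_char:
                j += 1
            count = j - i
            # Keep half of the run, rounding down.
            pieces.append(remove_char * (count // 2))
            # Jump past the whole run.
            i = j
        else:
            pieces.append(s[i])
            i += 1
    return ''.join(pieces)


if __name__ == '__main__':
    print(halve_characters_join('abcdefffghifffff', 'f'))
```

It is meant to return the same strings as `halve_characters` for all the unit-test inputs.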
Enjoy!