⬆️ Propose v/1.4.0 | Few Improvements | Drop dependencies to reduce the footprint (#41)

* 🔧 Using pytest instead of the deprecated nose

* 🔧 Disallow py 3.10 to fail tests

* 🔧 Enable coverage gen using pytest

* 🐛 Fix cov-report missing

* 🔧 Coverage report is not read correctly under pytest; needs investigation.

* 🔥 Drop support for UTF-7 detection

* 🔥 Drop extra feature characters freq file live build

* Stop using loguru: output is not pretty anymore, but the footprint shrinks by a lot + ASCII detection review

ASCII until proven otherwise

* 🎨 Dropping useless class UnicodeRangeIdentify, bad code pattern, using functions to be imported as is

* 🔥 Drop dragonmapper; use more elaborate character frequencies to unravel suspicious CJK usage

* 📝 Fix small typo

* 📝 Minor revision on the "Why" section

* 🎨 Safely drop cached_property except for Python 3.5

* ✔️ Adjust test suites to recent changes

* 🔖 Bump to 1.4.0

* Remove dragonmapper from setup.py

* 🐛 Wrong import for backport cached_property

* 🔧 Fixup due to dropping dragonmapper

* ❇️ Add setter for BOM

* 🔧 Ensure ASCII&UTF-8 are returned if detected

* 🔧 Coverage detect/parse fix

* 🔧 pytest conf (*2)

* Create __init__.py

* Revert maxsize params for cache in unicode.py

* Reduce Chinese character freq array to 300

* 🎨 Remove useless calls to dict::keys()

* 🔧 Reducing Chinese char freq array again

* ❇️ Quick improvement over CJK mess detector due to regression seen after dragonmapper drop

* First attempt at addressing the large-input performance issue

* 🎨 Minor code revision

* 🐛 Catching ImportError too

* 🔧 add '%' in is_punc for mess detect (especially in ascii)

* 🔧 ASCII detection moderation

* 🐛 Fix drop utf-7 detection

* ❇️ prep CJK mess detect improvement

* 🔧 mv utf_7 drop notice

* Revert CJK manual mess detect from 87c580517b

* ✔️ Test inherent sign fn in given sequence
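
Several entries above (dropping `cached_property` where possible, fixing the backport import, catching `ImportError`) revolve around one pattern: prefer the standard-library decorator and fall back to a local implementation when it is missing. The following is a minimal sketch of that general pattern only, not the code vendored in this commit; the fallback class and the `Match` example are hypothetical.

```python
try:
    # functools.cached_property exists in the standard library since Python 3.8.
    from functools import cached_property
except ImportError:
    # Hypothetical minimal fallback for older interpreters (not the vendored backport).
    class cached_property:
        def __init__(self, func):
            self.func = func
            self.__doc__ = func.__doc__

        def __get__(self, instance, owner=None):
            if instance is None:
                return self
            value = self.func(instance)
            # Store the result on the instance so later lookups skip this descriptor.
            instance.__dict__[self.func.__name__] = value
            return value


class Match:
    """Hypothetical holder for a decoded payload; illustrative only."""

    def __init__(self, raw):
        self.raw = raw
        self.calls = 0

    @cached_property
    def alphabet_size(self):
        # Counts how often the expensive part actually runs.
        self.calls += 1
        return len(set(self.raw))
```

Accessing `Match(b"hello").alphabet_size` twice runs the method body once; the second access reads the cached value from the instance.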
TAHRI Ahmed R 1 year ago committed by GitHub
commit 98d12fa1ab
  1. .coveragerc (2 changes)
  2. .travis.yml (11 changes)
  3. CONTRIBUTING.md (2 changes)
  4. README.md (6 changes)
  5. charset_normalizer/__init__.py (2 changes)
  6. charset_normalizer/assets/frequencies.json (2780 changes)
  7. charset_normalizer/cached_property/__init__.py (91 changes)
  8. charset_normalizer/normalizer.py (164 changes)
  9. charset_normalizer/probe_chaos.py (66 changes)
  10. charset_normalizer/probe_coherence.py (74 changes)
  11. charset_normalizer/probe_inherent_sign.py (2 changes)
  12. charset_normalizer/probe_words.py (12 changes)
  13. charset_normalizer/unicode.py (524 changes)
  14. charset_normalizer/version.py (2 changes)
  15. docs/support.rst (2 changes)
  16. requirements.txt (3 changes)
  17. setup.cfg (2 changes)
  18. setup.py (5 changes)
  19. test/__init__.py (1 change)
  20. test/test_detect_legacy.py (6 changes)
  21. test/test_inherent_sign.py (36 changes)
  22. test/test_on_byte.py (26 changes)
  23. test/test_probe_chaos.py (5 changes)
  24. test/test_unicode_helper.py (24 changes)
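
By far the largest change is to `frequencies.json`, which maps each language to a list of its characters ordered from most to least frequent. As a rough illustration of how such a ranking can be put to use, the sketch below measures how many of a text's most common letters also appear near the top of a language's list. It is an assumption-laden toy, not the scoring used by `probe_coherence.py`: the `rank_overlap` function and the `top` parameter are hypothetical, and only the first ten English and German entries are copied from the data file.

```python
from collections import Counter

# First ten entries per language, copied from frequencies.json.
FREQUENCIES = {
    "English": ["e", "a", "t", "i", "o", "n", "s", "r", "h", "l"],
    "German": ["e", "n", "i", "r", "s", "t", "a", "d", "h", "u"],
}


def rank_overlap(text, language, top=10):
    """Share of the text's most common letters found in the language's top list.

    Hypothetical helper: returns a value between 0.0 and 1.0.
    """
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return 0.0
    observed = [c for c, _ in Counter(letters).most_common(top)]
    expected = set(FREQUENCIES[language][:top])
    return sum(1 for c in observed if c in expected) / len(observed)
```

A text made only of letters that are rare in the target language scores 0.0, while one built from its most frequent letters scores 1.0. A real detector needs far more signal than this single ratio.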

2
.coveragerc

@@ -0,0 +1,2 @@
+[run]
+source=charset_normalizer

11
.travis.yml

@@ -9,15 +9,18 @@ python:
 matrix:
 allow_failures:
-- python: "3.5" # TODO: Investigate why language detection act differently
-- python: "3.10-dev"
+- python: "3.5"
+before_install:
+- "pip install -U pip setuptools"
+- "pip install -r requirements.txt"
 install:
-- pip install nose codecov
+- pip install -r requirements.txt
 - python setup.py install
 script:
-- "nosetests --with-coverage --cover-package=charset_normalizer test/*.py"
+- pytest
 after_success:
 - codecov

2
CONTRIBUTING.md

@@ -44,5 +44,5 @@ Please be aware of the following things when filing bug reports:
 differently and have different bugs.
 If you do not provide all of these things, it will take us much longer to
-fix your problem. If we ask you to clarify these and you never respond, we
+fix your problem. If we ask you to clarify these, and you never respond, we
 will close your issue without fixing it.

6
README.md

@@ -42,7 +42,7 @@ This project offers you an alternative to **Universal Charset Encoding Detector*
 | `License` | LGPL-2.1 | MIT | MPL-1.1
 | `Native Python` | :heavy_check_mark: | :heavy_check_mark: | ❌ |
 | `Detect spoken language` | ❌ | :heavy_check_mark: | N/A |
-| `Supported Encoding` | 30 | :tada: [90](https://charset-normalizer.readthedocs.io/en/latest/support.html) | 40
+| `Supported Encoding` | 30 | :tada: [92](https://charset-normalizer.readthedocs.io/en/latest/support.html) | 40
 | Package | Accuracy | Mean per file (ns) | File per sec (est) |
 | ------------- | :-------------: | :------------------: | :------------------: |
@@ -115,8 +115,8 @@ See the docs for advanced usage : [readthedocs.io](https://charset-normalizer.re
 ## 😇 Why
-When I started using Chardet, I noticed that it was unreliable nowadays and also
-it's unmaintained, and most likely will never be.
+When I started using Chardet, I noticed that it was not suited to my expectations, and I wanted to propose a
+reliable alternative using a completely different method. Also! I never back down on a good challenge !
 I **don't care** about the **originating charset** encoding, because **two different tables** can
 produce **two identical files.**

2
charset_normalizer/__init__.py

@@ -1,7 +1,7 @@
 # coding: utf-8
 from charset_normalizer.normalizer import CharsetNormalizerMatches, CharsetNormalizerMatch, \
 CharsetDetector, CharsetDoctor, EncodingDetector # Aliases
-from charset_normalizer.unicode import UnicodeRangeIdentify
+import charset_normalizer.unicode as unicode_utils
 from charset_normalizer.probe_chaos import ProbeChaos
 from charset_normalizer.probe_coherence import ProbeCoherence
 from charset_normalizer.probe_words import ProbeWords
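
The import change above matches the commit note about dropping the `UnicodeRangeIdentify` class in favour of functions imported as is. A minimal sketch of that refactor shape follows, assuming nothing about the real contents of `unicode.py`: the `RangeHelpers` class and the `is_latin_letter` helper are hypothetical names chosen for illustration.

```python
import unicodedata


def is_latin_letter(letter: str) -> bool:
    """'After' shape: a plain module-level function (hypothetical helper)."""
    try:
        # unicodedata.name raises ValueError for characters that have no name.
        return "LATIN" in unicodedata.name(letter)
    except ValueError:
        return False


class RangeHelpers:
    """'Before' shape: a class used only as a namespace (hypothetical name)."""

    @staticmethod
    def is_latin_letter(letter: str) -> bool:
        # Same logic, reachable only through the class.
        return is_latin_letter(letter)
```

With module-level functions, callers can import the module under an alias, as the diff does with `unicode_utils`, and call the helpers directly without a class that holds no state.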

2780
charset_normalizer/assets/frequencies.json

@@ -1,1384 +1,1398 @@
{
"English": [
"e",
"a",
"t",
"i",
"o",
"n",
"s",
"r",
"h",
"l",
"d",
"c",
"u",
"m",
"f",
"p",
"g",
"w",
"y",
"b",
"v",
"k",
"x",
"j",
"z",
"q"
],
"German": [
"e",
"n",
"i",
"r",
"s",
"t",
"a",
"d",
"h",
"u",
"l",
"g",
"o",
"c",
"m",
"b",
"f",
"k",
"w",
"z",
"p",
"v",
"\u00fc",
"\u00e4",
"\u00f6",
"j"
],
"French": [
"e",
"a",
"s",
"n",
"i",
"t",
"r",
"l",
"u",
"o",
"d",
"c",
"p",
"m",
"\u00e9",
"v",
"g",
"f",
"b",
"h",
"q",
"\u00e0",
"x",
"\u00e8",
"y",
"j"
],
"Dutch": [
"e",
"n",
"a",
"i",
"r",
"t",
"o",
"d",
"s",
"l",
"g",
"h",
"v",
"m",
"u",
"k",
"c",
"p",
"b",
"w",
"j",
"z",
"f",
"y",
"x",
"\u00eb"
],
"Italian": [
"e",
"i",
"a",
"o",
"n",
"l",
"t",
"r",
"s",
"c",
"d",
"u",
"p",
"m",
"g",
"v",
"f",
"b",
"z",
"h",
"q",
"\u00e8",
"\u00e0",
"k",
"y",
"\u00f2"
],
"Polish": [
"a",
"i",
"o",
"e",
"n",
"r",
"z",
"w",
"s",
"c",
"t",
"k",
"y",
"d",
"p",
"m",
"u",
"l",
"j",
"\u0142",
"g",
"b",
"h",
"\u0105",
"\u0119",
"\u00f3"
],
"Spanish": [
"e",
"a",
"o",
"n",
"s",
"r",
"i",
"l",
"d",
"t",
"c",
"u",
"m",
"p",
"b",
"g",
"v",
"f",
"y",
"\u00f3",
"h",
"q",
"\u00ed",
"j",
"z",
"\u00e1"
],
"Russian": [
"\u043e",
"\u0430",
"\u0435",
"\u0438",
"\u043d",
"\u0441",
"\u0442",
"\u0440",
"\u0432",
"\u043b",
"\u043a",
"\u043c",
"\u0434",
"\u043f",
"\u0443",
"\u0433",
"\u044f",
"\u044b",
"\u0437",
"\u0431",
"\u0439",
"\u044c",
"\u0447",
"\u0445",
"\u0436",
"\u0446"
],
"Japanese": [
"\u306e",
"\u306b",
"\u308b",
"\u305f",
"\u306f",
"\u30fc",
"\u3068",
"\u3057",
"\u3092",
"\u3067",
"\u3066",
"\u304c",
"\u3044",
"\u30f3",
"\u308c",
"\u306a",
"\u5e74",
"\u30b9",
"\u3063",
"\u30eb",
"\u304b",
"\u3089",
"\u3042",
"\u3055",
"\u3082",
"\u308a"
],
"Portuguese": [
"a",
"e",
"o",
"s",
"i",
"r",
"d",
"n",
"t",
"m",
"u",
"c",
"l",
"p",
"g",
"v",
"b",
"f",
"h",
"\u00e3",
"q",
"\u00e9",
"\u00e7",
"\u00e1",
"z",
"\u00ed"
],
"Swedish": [
"e",
"a",
"n",
"r",
"t",
"s",
"i",
"l",
"d",
"o",
"m",
"k",
"g",
"v",
"h",
"f",
"u",
"p",
"\u00e4",
"c",
"b",
"\u00f6",
"\u00e5",
"y",
"j",
"x"
],
"Chinese": [
"\u7684",
"\u5e74",
"\u4e00",
"\u5728",
"\u662f",
"\u4e2d",
"\u4eba",
"\u5927",
"\u6709",
"\u70ba",
"\u548c",
"\u4ee5",
"\u65e5",
"\u4e86",
"\u6708"
],
"Catalan": [
"e",
"a",
"s",
"i",
"r",
"l",
"n",
"t",
"o",
"d",
"c",
"u",
"m",
"p",
"v",
"b",
"g",
"f",
"h",
"q",
"\u00f3",
"\u00e9",
"x",
"\u00e0",
"y",
"\u00ed"
],
"Ukrainian": [
"\u043e",
"\u0430",
"\u043d",
"\u0456",
"\u0438",
"\u0440",
"\u0432",
"\u0442",
"\u0435",
"\u0441",
"\u043a",
"\u043b",
"\u0443",
"\u0434",
"\u043c",
"\u043f",
"\u0437",
"\u044f",
"\u044c",
"\u0431",
"\u0433",
"\u0439",
"\u0447",
"\u0445",
"\u0446",
"\u0457"
],
"Norwegian": [
"e",
"r",
"n",
"t",
"a",
"s",
"i",
"o",
"l",
"d",
"g",
"k",
"m",
"v",
"f",
"p",
"u",
"b",
"h",
"\u00e5",
"y",
"j",
"\u00f8",
"c",
"\u00e6",
"w"
],
"Finnish": [
"a",
"i",
"n",
"t",
"e",
"s",
"l",
"o",
"u",
"k",
"\u00e4",
"m",
"r",
"v",
"j",
"h",
"p",
"y",
"d",
"\u00f6",
"g",
"c",
"b",
"f",
"w",
"z"
],
"Vietnamese": [
"n",
"h",
"t",
"i",
"c",
"g",
"a",
"o",
"u",
"m",
"l",
"r",
"\u00e0",
"\u0111",
"s",
"e",
"v",
"p",
"b",
"y",
"\u01b0",
"d",
"\u00e1",
"k",
"\u1ed9",
"\u1ebf"
],
"Czech": [
"o",
"e",
"a",
"n",
"t",
"s",
"i",
"l",
"v",
"r",
"k",
"d",
"u",
"m",
"p",
"\u00ed",
"c",
"h",
"z",
"\u00e1",
"y",
"j",
"b",
"\u011b",
"\u00e9",
"\u0159"
],
"Hungarian": [
"e",
"a",
"t",
"l",
"s",
"n",
"k",
"r",
"i",
"o",
"z",
"\u00e1",
"\u00e9",
"g",
"m",
"b",
"y",
"v",
"d",
"h",
"u",
"p",
"j",
"\u00f6",
"f",
"c"
],
"Korean": [
"\uc774",
"\ub2e4",
"\uc5d0",
"\uc758",
"\ub294",
"\ub85c",
"\ud558",
"\uc744",
"\uac00",
"\uace0",
"\uc9c0",
"\uc11c",
"\ud55c",
"\uc740",
"\uae30",
"\uc73c",
"\ub144",
"\ub300",
"\uc0ac",
"\uc2dc",
"\ub97c",
"\ub9ac",
"\ub3c4",
"\uc778",
"\uc2a4",
"\uc77c"
],
"Indonesian": [
"a",
"n",
"e",
"i",
"r",
"t",
"u",
"s",
"d",
"k",
"m",
"l",
"g",
"p",
"b",
"o",
"h",
"y",
"j",
"c",
"w",
"f",
"v",
"z",
"x",
"q"
],
"Turkish": [
"a",
"e",
"i",
"n",
"r",
"l",
"\u0131",
"k",
"d",
"t",
"s",
"m",
"y",
"u",
"o",
"b",
"\u00fc",
"\u015f",
"v",
"g",
"z",
"h",
"c",
"p",
"\u00e7",
"\u011f"
],
"Romanian": [
"e",
"i",
"a",
"r",
"n",
"t",
"u",
"l",
"o",
"c",
"s",
"d",
"p",
"m",
"\u0103",
"f",
"v",
"\u00ee",
"g",
"b",
"\u0219",
"\u021b",
"z",
"h",
"\u00e2",
"j"
],
"Farsi": [
"\u0627",
"\u06cc",
"\u0631",
"\u062f",
"\u0646",
"\u0647",
"\u0648",
"\u0645",
"\u062a",
"\u0628",
"\u0633",
"\u0644",
"\u06a9",
"\u0634",
"\u0632",
"\u0641",
"\u06af",
"\u0639",
"\u062e",
"\u0642",
"\u062c",
"\u0622",
"\u067e",
"\u062d",
"\u0637",
"\u0635"
],
"Arabic": [
"\u0627",
"\u0644",
"\u064a",
"\u0645",
"\u0648",
"\u0646",
"\u0631",
"\u062a",
"\u0628",
"\u0629",
"\u0639",
"\u062f",
"\u0633",
"\u0641",
"\u0647",
"\u0643",
"\u0642",
"\u0623",
"\u062d",
"\u062c",
"\u0634",
"\u0637",
"\u0635",
"\u0649",
"\u062e",
"\u0625"
],
"Danish": [
"e",
"r",
"n",
"t",
"a",
"i",
"s",
"d",
"l",
"o",
"g",
"m",
"k",
"f",
"v",
"u",
"b",
"h",
"p",
"\u00e5",
"y",
"\u00f8",
"\u00e6",
"c",
"j",
"w"
],
"Esperanto": [
"a",
"o",
"e",
"i",
"n",
"r",
"l",
"s",
"t",
"k",
"d",
"j",
"u",
"m",
"p",
"v",
"g",
"c",
"b",
"f",
"\u011d",
"h",
"z",
"\u016d",
"\u0109",
"\u015d"
],
"Serbian": [
"\u0430",
"\u0438",
"\u043e",
"\u0435",
"\u043d",
"\u0440",
"\u0441",
"\u0443",
"\u0442",
"\u043a",
"\u0458",
"\u0432",
"\u0434",
"\u043c",
"\u043f",
"\u043b",
"\u0433",
"\u0437",
"\u0431",
"a",
"i",
"e",
"o",
"n",
"\u0446",
"\u0448"
],
"Lithuanian": [
"i",
"a",
"s",
"o",
"r",
"e",
"t",
"n",
"u",
"k",
"m",
"l",
"p",
"v",
"d",
"j",
"g",
"\u0117",
"b",
"y",
"\u0173",
"\u0161",
"\u017e",
"c",
"\u0105",
"\u012f"
],
"Slovene": [
"e",
"a",
"i",
"o",
"n",
"r",
"s",
"l",
"t",
"j",
"v",
"k",
"d",
"p",
"m",
"u",
"z",
"b",
"g",
"h",
"\u010d",
"c",
"\u0161",
"\u017e",
"f",
"y"
],
"Slovak": [
"o",
"a",
"e",
"n",
"i",
"r",
"v",
"t",
"s",
"l",
"k",
"d",
"m",
"p",
"u",
"c",
"h",
"j",
"b",
"z",
"\u00e1",
"y",
"\u00fd",
"\u00ed",
"\u010d",
"\u00e9"
],
"Malay": [
"a",
"n",
"e",
"i",
"r",
"t",
"u",
"k",
"s",
"d",
"m",
"l",
"g",
"p",
"b",
"h",
"o",
"y",
"j",
"c",
"w",
"f",
"v",
"z",
"x",
"q"
],
"Hebrew": [
"\u05d9",
"\u05d5",
"\u05d4",
"\u05dc",
"\u05e8",
"\u05d1",
"\u05ea",
"\u05de",
"\u05d0",
"\u05e9",
"\u05e0",
"\u05e2",
"\u05dd",
"\u05d3",
"\u05e7",
"\u05d7",
"\u05e4",
"\u05e1",
"\u05db",
"\u05d2",
"\u05d8",
"\u05e6",
"\u05df",
"\u05d6",
"\u05da"
],
"Bulgarian": [
"\u0430",
"\u0438",
"\u043e",
"\u0435",
"\u043d",
"\u0442",
"\u0440",
"\u0441",
"\u0432",
"\u043b",
"\u043a",
"\u0434",
"\u043f",
"\u043c",
"\u0437",
"\u0433",
"\u044f",
"\u044a",
"\u0443",
"\u0431",
"\u0447",
"\u0446",
"\u0439",
"\u0436",
"\u0449",
"\u0445"
],
"Kazakh": [
"\u0430",
"\u044b",
"\u0435",
"\u043d",
"\u0442",
"\u0440",
"\u043b",
"\u0456",
"\u0434",
"\u0441",
"\u043c",
"\u049b",
"\u043a",
"\u043e",
"\u0431",
"\u0438",
"\u0443",
"\u0493",
"\u0436",
"\u04a3",
"\u0437",
"\u0448",
"\u0439",
"\u043f",
"\u0433",
"\u04e9"
],
"Baque": [
"a",
"e",
"i",
"n",
"r",
"t",
"k",
"o",
"z",
"u",
"l",
"d",
"b",
"s",
"g",
"m",
"p",
"h",
"x",
"f",
"j",
"c",
"v",
"y",
"w",
"\u00e9"
],
"Volap\u00fck": [
"n",
"a",
"l",
"i",
"e",
"s",
"o",
"d",
"m",
"t",
"\u00e4",
"b",
"\u00f6",
"f",
"u",
"p",
"\u00fc",
"k",
"r",
"v",
"y",
"c",
"g",
"z",
"h",
"j"
],
"Croatian": [
"a",
"i",
"o",
"e",
"n",
"r",
"j",
"s",
"t",
"u",
"k",
"l",
"v",
"d",
"m",
"p",
"g",
"z",
"b",
"c",
"\u010d",
"h",
"\u0161",
"\u017e",
"\u0107",
"f"
],
"Hindi": [
"\u0915",
"\u0930",
"\u0938",
"\u0928",
"\u0924",
"\u092e",
"\u0939",
"\u092a",
"\u092f",
"\u0932",
"\u0935",
"\u091c",
"\u0926",
"\u0917",
"\u092c",
"\u0936",
"\u091f",
"\u0905",
"\u090f",
"\u0925",
"\u092d",
"\u0921",
"\u091a",
"\u0927",
"\u0937",
"\u0907"
],
"Estonian": [
"a",
"i",
"e",
"s",
"t",
"l",
"u",
"n",
"o",
"k",
"r",
"d",
"m",
"v",
"g",
"p",
"j",
"h",
"\u00e4",
"b",
"\u00f5",
"\u00fc",
"f",
"c",
"\u00f6",
"y"
],
"Azeri": [
"a",
"i",
"\u0259",
"n",
"r",
"l",
"d",
"s",
"m",
"\u0131",
"t",
"y",
"u",
"b",
"e",
"k",
"o",
"\u00fc",
"\u015f",
"q",
"v",
"z",
"h",
"c",
"f",
"x"
],
"Galician": [
"a",
"e",
"o",
"n",
"s",
"i",
"r",
"d",
"t",
"c",
"u",
"l",
"m",
"p",
"b",
"g",
"f",
"v",
"x",
"\u00f3",
"h",
"q",
"\u00ed",
"\u00e1",
"\u00e9",
"z"
],
"Simple English": [
"e",
"a",
"t",
"i",
"o",
"n",
"s",
"r",
"h",
"l",
"d",
"c",
"m",
"u",
"f",
"p",
"g",
"w",
"b",
"y",
"v",
"k",
"j",
"x",
"z",
"q"
],
"Nynorsk": [
"e",
"a",
"r",
"n",
"t",
"i",
"s",
"o",
"l",
"d",
"g",
"k",
"m",
"v",
"u",
"f",
"p",
"h",
"b",
"\u00e5",
"j",
"y",
"\u00f8",
"c",
"w",
"\u00e6"
],
"Thai": [
"\u0e32",
"\u0e19",
"\u0e23",
"\u0e2d",
"\u0e01",
"\u0e40",
"\u0e07",
"\u0e21",
"\u0e22",
"\u0e25",
"\u0e27",
"\u0e14",
"\u0e17",
"\u0e2a",
"\u0e15",
"\u0e30",
"\u0e1b",
"\u0e1a",
"\u0e04",
"\u0e2b",
"\u0e41",
"\u0e08",
"\u0e1e",
"\u0e0a",
"\u0e02",
"\u0e43"
],
"Greek": [
"\u03b1",
"\u03c4",
"\u03bf",
"\u03b9",
"\u03b5",
"\u03bd",
"\u03c1",
"\u03c3",
"\u03ba",
"\u03b7",
"\u03c0",
"\u03c2",
"\u03c5",
"\u03bc",
"\u03bb",
"\u03af",
"\u03cc",
"\u03ac",
"\u03b3",
"\u03ad",
"\u03b4",
"\u03ae",
"\u03c9",
"\u03c7",
"\u03b8",
"\u03cd"
],
"Macedonian": [
"\u0430",
"\u043e",
"\u0438",
"\u0435",
"\u043d",
"\u0442",
"\u0440",
"\u0441",
"\u0432",
"\u043a",
"\u0434",
"\u043b",
"\u043f",
"\u043c",
"\u0443",
"\u0458",
"\u0433",
"\u0437",
"\u0431",
"\u0447",
"\u0448",
"\u0446",
"\u0436",
"\u0444",
"\u045a"
],
"Serbocroatian": [
"a",
"i",
"o",
"e",
"n",
"r",
"j",
"s",
"t",
"u",
"k",
"l",
"d",
"v",
"m",
"p",
"g",
"z",
"b",
"c",
"\u010d",
"h",
"\u0161",
"\u017e",
"\u0107",
"f"
],
"Tamil": [
"\u0b95",
"\u0ba4",
"\u0baa",
"\u0b9f",
"\u0bb0",
"\u0bae",
"\u0bb2",
"\u0ba9",
"\u0bb5",
"\u0bb1",
"\u0baf",
"\u0bb3",
"\u0b9a",
"\u0ba8",
"\u0b87",
"\u0ba3",
"\u0b85",
"\u0b86",
"\u0bb4",
"\u0b99",
"\u0b8e",
"\u0b89",
"\u0b92",
"\u0bb8"
],
"Classical Chinese": [
"\u4e4b",
"\u5e74",
"\u70ba",
"\u4e5f",
"\u4ee5",
"\u4e00",
"\u4eba",
"\u5176",
"\u8005",
"\u570b",
"\u6709",
"\u4e8c",
"\u5341",
"\u65bc",
"\u66f0",
"\u4e09",
"\u4e0d",
"\u5927",
"\u800c",
"\u5b50",
"\u4e2d",
"\u4e94",
"\u56db"
]
{
"English": [
"e",
"a",
"t",
"i",
"o",
"n",
"s",
"r",
"h",
"l",
"d",
"c",
"u",
"m",
"f",
"p",
"g",
"w",
"y",
"b",
"v",
"k",
"x",
"j",
"z",
"q"
],
"German": [
"e",
"n",
"i",
"r",
"s",
"t",
"a",
"d",
"h",
"u",
"l",
"g",
"o",
"c",
"m",
"b",
"f",
"k",
"w",
"z",
"p",
"v",
"\u00fc",
"\u00e4",
"\u00f6",
"j"
],
"French": [
"e",
"a",
"s",
"n",
"i",
"t",
"r",
"l",
"u",
"o",
"d",
"c",
"p",
"m",
"\u00e9",
"v",
"g",
"f",
"b",
"h",