
📝 Docs improvements (#158)


add CLI + theme change to furo
pull/159/head
TAHRI Ahmed R 6 months ago committed by GitHub
commit 3bcec1500f
1. .readthedocs.yaml (19 lines changed)
2. docs/conf.py (13 lines changed)
3. docs/index.rst (1 line changed)
4. docs/requirements.txt (3 lines changed)
5. docs/user/cli.rst (108 lines changed)

.readthedocs.yaml

@@ -0,0 +1,19 @@

version: 2

build:
  os: ubuntu-20.04
  tools:
    python: "3.9"

# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: docs/conf.py

# If using Sphinx, optionally build your docs in additional formats such as PDF
# formats:
#   - pdf

# Optionally declare the Python requirements required to build your docs
python:
  install:
    - requirements: docs/requirements.txt

docs/conf.py

@@ -21,9 +21,6 @@
 import sys
 import os
-
-from recommonmark.parser import CommonMarkParser
-import sphinx_rtd_theme

 sys.path.insert(0, os.path.abspath(".."))

 import charset_normalizer
@@ -58,11 +55,9 @@ templates_path = ['_templates']

 # source_suffix = ['.rst', '.md']
 # source_suffix = '.rst'
-source_parsers = {
-    '.md': CommonMarkParser,
-}
+source_parsers = {}

-source_suffix = ['.rst', '.md']
+source_suffix = ['.rst',]

 # The master toctree document.
 master_doc = 'index'
@@ -105,9 +100,9 @@ todo_include_todos = False

 # The theme to use for HTML and HTML Help pages. See the documentation for
 # a list of builtin themes.
 #
-html_theme = 'sphinx_rtd_theme'
+html_theme = 'furo'

-html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
+html_theme_path = []

 # Theme options are theme-specific and customize the look and feel of a theme
 # further. For a list of options available for each theme, see the

docs/index.rst

@@ -64,6 +64,7 @@ Start Guide
    user/advanced_search
    user/handling_result
    user/miscellaneous
+   user/cli

 Community Guide
 ---------------

docs/requirements.txt

@@ -1,3 +1,2 @@
 Sphinx
-sphinx_rtd_theme
-recommonmark
+furo

docs/user/cli.rst

@@ -0,0 +1,108 @@

Command Line Interface
======================

charset-normalizer ships with a CLI that should be available as `normalizer`.
This is a great tool to fully exploit the detector capabilities without having to write Python code.

Possible use cases:

#. Quickly discover the probable originating charset of a file.
#. Quickly convert a non-Unicode file to Unicode.
#. Debug the charset detector.

Below, we will guide you through some basic examples.

Arguments
---------

You may simply invoke `normalizer -h` (with the h(elp) flag) to understand the basics.
::

    usage: normalizer [-h] [-v] [-a] [-n] [-m] [-r] [-f] [-t THRESHOLD]
                      file [file ...]

    The Real First Universal Charset Detector. Discover originating encoding used
    on text file. Normalize text to unicode.

    positional arguments:
      files                 File(s) to be analysed

    optional arguments:
      -h, --help            show this help message and exit
      -v, --verbose         Display complementary information about file if any.
                            Stdout will contain logs about the detection process.
      -a, --with-alternative
                            Output complementary possibilities if any. Top-level
                            JSON WILL be a list.
      -n, --normalize       Permit to normalize input file. If not set, program
                            does not write anything.
      -m, --minimal         Only output the charset detected to STDOUT. Disabling
                            JSON output.
      -r, --replace         Replace file when trying to normalize it instead of
                            creating a new one.
      -f, --force           Replace file without asking if you are sure, use this
                            flag with caution.
      -t THRESHOLD, --threshold THRESHOLD
                            Define a custom maximum amount of chaos allowed in
                            decoded content. 0. <= chaos <= 1.
      --version             Show version information and exit.

A simple invocation looks like this:

.. code:: bash

    normalizer ./data/sample.1.fr.srt
Main JSON Output
----------------

🎉 Since version 1.4.0 the CLI produces an easily usable stdout result in
JSON format.
.. code:: json

    {
        "path": "/home/default/projects/charset_normalizer/data/sample.1.fr.srt",
        "encoding": "cp1252",
        "encoding_aliases": [
            "1252",
            "windows_1252"
        ],
        "alternative_encodings": [
            "cp1254",
            "cp1256",
            "cp1258",
            "iso8859_14",
            "iso8859_15",
            "iso8859_16",
            "iso8859_3",
            "iso8859_9",
            "latin_1",
            "mbcs"
        ],
        "language": "French",
        "alphabets": [
            "Basic Latin",
            "Latin-1 Supplement"
        ],
        "has_sig_or_bom": false,
        "chaos": 0.149,
        "coherence": 97.152,
        "unicode_path": null,
        "is_preferred": true
    }

I recommend the `jq` command line tool to easily parse and exploit specific data from the produced JSON.
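As a sketch, the fields shown in the JSON report above can be extracted with `jq` like this (the file path is the sample used earlier; any text file works):

```shell
# Print only the detected encoding from the JSON report.
normalizer ./data/sample.1.fr.srt | jq -r '.encoding'

# List every alternative encoding, one per line.
normalizer ./data/sample.1.fr.srt | jq -r '.alternative_encodings[]'
```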
Multiple File Input
-------------------

It is possible to give multiple files to the CLI. It will produce a list instead of an object at the top level.

When using the `-m` (minimal output) flag, it will instead print one result (encoding) per line.
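For example, assuming two hypothetical input files, the two output modes compare as follows:

```shell
# Default mode: top-level JSON output is a list, one object per file.
normalizer ./a.txt ./b.srt

# Minimal mode: one detected encoding per line, in input order.
normalizer -m ./a.txt ./b.srt
```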
Unicode Conversion
------------------

If you want to convert any file to Unicode you will need to append the flag `-n`. It will produce another file;
it won't replace the original by default.

The newly created file path will be declared in `unicode_path` (JSON output).
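Combining the flags documented above, a conversion session could be sketched as follows (the file path is hypothetical; flag behaviour is as described in the help output):

```shell
# Write a normalized Unicode copy alongside the original;
# its location is reported as "unicode_path" in the JSON output.
normalizer -n ./data/sample.1.fr.srt

# Normalize in place: -r replaces the original file and
# -f skips the confirmation prompt. Use with caution.
normalizer -n -r -f ./data/sample.1.fr.srt
```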