unpaper

Originally written by Jens Gulden — see AUTHORS for more information. The entire unpaper project is licensed under GNU GPL v2. Some of the individual files are licensed under the MIT or Apache 2.0 licenses. Each file contains an SPDX license header specifying its license. The text of all three licenses is available under LICENSES.

Overview

unpaper is a post-processing tool for scanned sheets of paper, especially for book pages that have been scanned from previously created photocopies. Its main purpose is to make scanned book pages more readable on screen after conversion to PDF. Additionally, unpaper may be useful for enhancing the quality of scanned pages before performing optical character recognition (OCR).

unpaper tries to clean scanned images by removing dark edges that scanning or copying leaves in areas outside the actual page content (e.g. dark areas between the left-hand and right-hand pages of a double-sided book-page scan).

The program also tries to detect misaligned centering and rotation of pages and will automatically straighten each page by rotating it to the correct angle. This process is called "deskewing".

Note that the automatic processing will sometimes fail. It is always a good idea to manually check the results of unpaper and adjust the parameter settings to the requirements of the input. Each processing step can also be disabled individually for each sheet.

See the further documentation for notes on the supported file formats.

Dependencies

The only hard dependency of unpaper is ffmpeg, which is used for file input and output.

Building instructions

unpaper uses the Meson build system, which can be installed using Python's package manager (pip3 or pip):

unpaper$ pip3 install --user 'meson >= 0.57' 'sphinx >= 3.4'
unpaper$ CFLAGS="-march=native" meson setup --buildtype=debugoptimized builddir
unpaper$ meson compile -C builddir

You can pass any optimization flags you need through the CFLAGS environment variable when creating the Meson build directory. Using link-time optimization (Meson option -Db_lto=true) is recommended where available.

Further optimizations, such as -ftracer and -ftree-vectorize, are thought to work, but their effect has not been evaluated, so your mileage may vary.

Tests depend on pytest and pillow, which will be auto-detected by Meson.

Further Information

You can find more information on the basic concepts and the image processing in the available documentation.