Updated readme.

dev
Franco Masotti 4 years ago
parent
commit
e14073fd6a
  1. README.md (162 lines changed)
  2. spectrscan (2 lines changed)

162
README.md

@ -25,67 +25,76 @@ paper to pdf converter suitable for texts.
[](TOC)
## Examples
## Notes
Remember that the output file supplied in the command line can either be an
empty file or an existing PDF file. In the latter case, newly scanned documents
will be automatically appended to the tail of the file.
### Default options
Also note that the mode, resolution and source options must be supported by
your scanner.
- Scan in lineart mode, from the automatic document feeder,
with a resolution of 600 DPI, using Unpaper and with image enhancing options
on the output file `out.pdf`
Default options are defined in the `spectrscan.conf` file.
## Examples
- Scan double sided paper using the ADF
./spectrscan -t out.pdf
Take 3 papers as an example. Each paper has 2 sides, so there are
6 total sides. Mark each side with a progressive number:
1,2,3,4,5,6
Put the papers in the ADF so that sides 1,3,5 (in this order) will be scanned.
Side 1 is the one facing towards you.
Once these 3 sides are scanned you should see sides 6,4,2 (in this order) out
of the ADF. Side 6 is now facing towards you. This means that the ADF
reverses the order of the papers (if you have a scanner with this feature).
./spectrscan out.pdf
You can now put sides 6,4,2 (in this order) in the ADF. spectrscan will sort
the sides so that the final result will be
### Two-sided
1,2,3,4,5,6
- Same as before but for double sided paper
In case your scanner does not reverse the order you can use the following
option
./spectrscan -o out.pdf
./spectrscan --two-sided-reverse out.pdf
### Misc
which is comparable to a result of
1,3,5,6,4,2
for the previous example
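The reordering described in the two-sided example can be sketched in Bash as follows. This is only an illustration of the idea, not the code spectrscan actually runs: the function name `interleave_sides` and its arguments are hypothetical, and it assumes an ADF that reverses the stack, so the second pass arrives in the order 6,4,2.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of spectrscan.
# $1: number of sheets; remaining arguments: the sides in the order they
# were scanned (first pass: fronts, second pass: backs in reverse order).
interleave_sides()
{
    local sheets="$1"
    shift
    local -a scanned=("$@")
    local -a ordered=()
    local i

    for ((i = 0; i < sheets; i++)); do
        # The front of sheet i is the i-th side of the first pass.
        ordered+=("${scanned[i]}")
        # The second pass is reversed, so the matching back side is
        # found by counting from the end of the list.
        ordered+=("${scanned[2 * sheets - 1 - i]}")
    done

    printf '%s\n' "${ordered[*]}"
}

# Scan order from the example: 1,3,5 then 6,4,2.
interleave_sides 3 1 3 5 6 4 2
# prints: 1 2 3 4 5 6
```

Pairing each front with the back counted from the end of the list is what turns the scan order 1,3,5,6,4,2 into 1,2,3,4,5,6.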
- Scan in colour, with a resolution of 300 DPI, using the flatbed,
on the output file `out.pdf`
./spectrscan -m Color -r 300 -s Flatbed out.pdf
- Disable unpaper (same procedure for imagemagick):
- Enable unpaper (same procedure for imagemagick):
./spectrscan -ufalse out.pdf
./spectrscan --unpaper_options=false out.pdf
./spectrscan -u out.pdf
./spectrscan --unpaper_options='<unpaper options>' out.pdf
If the scanned text turns out unreadable, try using the `Gray` mode instead
of the default `Lineart`.
Currently, passing options to unpaper and imagemagick is not working. You
should edit the options directly in the script. By default, contrast is set at
a very high level. You can edit
imagemagick_options="-normalize -level 70%,100%,1.0"
By default contrast is set at a very high level. You can modify the parameters
with something like
with something like:
imagemagick_options="-normalize -level 20%,100%,1.0"
./spectrscan --imagemagick_options='-normalize -level 20%,100%,1.0'
and see what happens.
## Path
You can call spectrscan from any directory by modifying the shell's path.
You can then call spectrscan:
cd <spectrscan repo directory>
PATH="$PATH:$(pwd)"
cd <document destination directory>
spectrscan <whatever>
## Help
```
Usage: spectrscan [OPTIONS] OUTFILE
Usage: spectrscan [OPTION] OUTFILE
An unintrusive frontend of scanimage which acts as a
paper to pdf converter suitable for texts.
@ -94,53 +103,48 @@ as the tail of the existing one.
The default system scanner is used.
Mandatory arguments to long options are mandatory for short options too.
Options:
-h, --help print this help
-i, --imagemagick-options pass options to ImageMagick
to post-process the documents
-m, --mode scan in Color, Lineart, Gray or whatever
supported method
--list-modes list all possible scan modes
-o, --odd-even toggle preserve the order in double sided paper:
scan a batch of papers one side, then the other
-r, --resolution page resolution in DPI
--list-resolutions list all possible resolutions
-s, --source scan from the ADF, Flatbed or whatever
supported method
--list-sources list all possible sources
-u, --unpaper-options pass options to unpaper
to post-process the documents
Current enabled options:
--imagemagick-options="-contrast-stretch 0.5%x10% -compress lzw"
--mode "Lineart" --odd-even="false" --resolution "600"
--source "ADF" --unpaper-options="true"
The magic values of "true" and "false" can be used
to enable or disable:
--odd-even=<value>
--unpaper-options=<value> (if "false", unpaper is disabled)
Dependencies: GNU Bash; GNU Core Utilities; Gawk; SANE; ImageMagick
unpaper; PDFtk; GNU Parallel; Netpbm
-h, --help print this help
-i, --imagemagick=OPTIONS pass options to ImageMagick
to post-process the documents
--list-modes list all possible scan modes
--list-resolutions list all possible resolutions
--list-sources list all possible sources
-m, --mode=MODE scan in Color, Lineart, Gray or whatever
supported method
--print-flags print the enabled options. This can also
be used to print the default options
-r, --resolution=RESOLUTION page resolution in DPI
-s, --source=SOURCE scan from the ADF, Flatbed or whatever
supported method
-t, --two-sided toggle preserve the order in
double sided paper: scan a batch of
papers one side, then the other
--two-sided-reverse same as '--two-sided' but with the need of
reversing every single paper. This
option conflicts with '--two-sided'
-u, --unpaper[=OPTIONS] enable unpaper. You may pass
options to unpaper
Exit status:
0 if OK,
1 if an error occurred.
Copyright © 2017 Franco Masotti. License GPLv3+: GNU GPL version 3 or
later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it. There
is NO WARRANTY, to the extent permitted by law.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
Copyright © 2017 Franco Masotti.
```
## Relevant features
- Parallel immage processing (based on the number of processor cores)
- Parallel image processing (based on the number of processor cores)
cuts the time by a factor of `#cores` after page scanning has taken place.
This is very effective for a large number of pages.
- Odd-even page numbers scanning.
- Two-sided page scanning.
- Basic unpaper and ImageMagick post-processing.
@ -148,8 +152,6 @@ is NO WARRANTY, to the extent permitted by law.
https://bugs.launchpad.net/simple-scan/+bug/983441
http://netpbm.sourceforge.net/doc/pamfix.html
http://www.jduck.net/blog/2008/01/05/ocr-scanning/
https://www.ubuntu-user.com/Magazine/Archive/2013/18/Scanning-and-editing-text-with-OCR
@ -171,7 +173,7 @@ http://www.jpeek.com/articles/linuxmag/2006-08/
- Scanner software
- [ImageMagick](http://www.imagemagick.org/)
- PNM to PDF converter and image processing tool
- TIFF to PDF converter and image processing tool
- [unpaper](https://github.com/Flameeyes/unpaper)
- Remove issues with scanned images (paper margins, etc...)
@ -186,10 +188,10 @@ http://www.jpeek.com/articles/linuxmag/2006-08/
- Execute image post processing jobs in parallel
- [Netpbm](http://netpbm.sourceforge.net/)
- Fix the newly scanned images by truncating the unnecessary parts. This
- Fix the newly scanned images by truncating the unnecessary parts. This
is mostly useful when scanning from the ADF
- In my case I have an HP Officejet 2620 connected via USB to a server as a
network printer/scanner. Scanning using the "Flatbed" option poses no
- In my case I have an HP Officejet 2620 connected via USB to a server as a
network printer/scanner. Scanning using the "Flatbed" option poses no
problems, while using the "ADF", the image is somehow corrupted
and it contains a black box adjacent to the scanned document.
If I try to post process the image with unpaper and/or ImageMagick
@ -199,19 +201,19 @@ http://www.jpeek.com/articles/linuxmag/2006-08/
described like this:
> A toolkit for manipulation of graphic images, without nonfree parts and
> patent issues
> patent issues
## Coming soon
- Options to add
- Compression
- Number of pages to scan
- Option for prompting for each page to scan
- Better default options to pass to unpaper
- OCR (training (GOCR)? + text file outputs)
- options: keep PDF, keep txt, keep both
- Better parallel processing
- Watch inotifies for a new out*.pnm
- show preview
- compression
- number of pages to scan
- option for prompting for each page to scan
- better default options to pass to unpaper
- OCR (training (GOCR)? + text file outputs)
- options: keep PDF, keep txt, keep both
- better processing
- watch inotifies for a new out*.tiff
would be faster than post-processing in parallel
## Origin of the name

2
spectrscan

@ -22,8 +22,6 @@
# along with spectrscan. If not, see <http://www.gnu.org/licenses/>.
#
set -x
. ./spectrscan.conf
check_software()