scandor
scandor is a tool that makes scanning, identifying and archiving of documents as easy as possible. The graphical interface has only the minimum of buttons needed for this workflow. Scanning can be as easy as:
- Click Scan
- Document will be scanned and identified
- Approve Save operation
The configuration allows you to define document types. Each document type can be stored in its own folder.
The program updates itself automatically.
scandor is free for non-commercial use.
Configuration
The configuration file docTypes.xml specifies how documents are identified and where they are stored. Parts of a document's content can be used as part of the filename. This setup is needed only once per document type.
NEW: The configuration can be changed inside the UI.
Example:
<Type name="Rechnung Allgemein" fileTargetName="C:/dokumente/Rechnungen/${CURRENT_YEAR}/${NAME}_${DATUM}.pdf">
    <select pattern="Rechnung"/>
    <select pattern="Betrag"/>
    <select pattern="Auftrag"/>
    <select pattern="(?i)MWSt"/>
    <select pattern=" +(\d\d\.\d\d\.\d\d\d\d)" parameter="DATUM"/>
</Type>

The entries:
- name: Name of the document type that is displayed when a document is identified.
- fileTargetName: Path and filename of the saved file.
- pattern: A regular expression that matches a part of the document. You can also use readable regular expressions, which make rules easier to read and write.
- parameter: Use the first capturing group of the regular expression as value for the named placeholder.
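To make the roles of pattern and parameter concrete, here is a minimal Python sketch that applies the select entries from the example above to a piece of text. It only illustrates the regular expression semantics and is not scandor's implementation; the function name apply_selects and the sample text are assumptions made for this illustration.

```python
import re

# The <select> entries from the example type, as (pattern, parameter) pairs.
SELECTS = [
    ("Rechnung", None),
    ("Betrag", None),
    ("Auftrag", None),
    ("(?i)MWSt", None),
    (r" +(\d\d\.\d\d\.\d\d\d\d)", "DATUM"),
]


def apply_selects(text, selects):
    """Return (number of matching patterns, captured placeholder values)."""
    matched = 0
    values = {}
    for pattern, parameter in selects:
        found = re.search(pattern, text)
        if found is None:
            continue
        matched += 1
        if parameter is not None:
            # The first capturing group supplies the placeholder value.
            values[parameter] = found.group(1)
    return matched, values


# Hypothetical sample text, invented for this sketch.
sample = "Rechnung Nr. 17\nAuftrag 4711\nDatum: 01.01.2000\nBetrag inkl. MwSt: 119,00"
matched, values = apply_selects(sample, SELECTS)
```

Here all five patterns match, and the value captured for DATUM is the date that follows the spaces.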
Placeholder
Directory and filenames can contain placeholders that will be replaced either automatically or by the user. A placeholder looks like this:
${XXX}
XXX is the name of the placeholder. The program already knows these predefined placeholders:
- ${CURRENT_YEAR}: The current year
- ${CURRENT_MONTH}: The current month
- ${NUM}: An automatically increased number
If scandor cannot find a value for a placeholder, the user is asked to enter one manually.
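The replacement rules can be sketched in a few lines of Python. This is an illustration under assumptions, not scandor's code: the function resolve, the ask callback standing in for the user prompt, and the zero-padded month are all hypothetical, and ${NUM} is left out because the source does not say how the number is chosen.

```python
import re
from datetime import date

PLACEHOLDER = re.compile(r"\$\{([A-Za-z_]+)\}")


def resolve(template, values, today, ask):
    """Replace every ${NAME} in template.

    values: placeholders captured from the document (e.g. DATUM)
    today:  date used for CURRENT_YEAR and CURRENT_MONTH
    ask:    callback used when no value is known (stands in for the user prompt)
    """
    predefined = {
        "CURRENT_YEAR": str(today.year),
        "CURRENT_MONTH": f"{today.month:02d}",  # zero padding is an assumption
    }

    def lookup(match):
        name = match.group(1)
        if name in values:
            return values[name]
        if name in predefined:
            return predefined[name]
        return ask(name)  # no value found: fall back to asking

    return PLACEHOLDER.sub(lookup, template)


template = "C:/dokumente/Rechnungen/${CURRENT_YEAR}/${NAME}_${DATUM}.pdf"
path = resolve(
    template,
    values={"DATUM": "01.01.2000"},
    today=date(2024, 5, 17),
    ask=lambda name: "Mustermann",  # a real UI would prompt for the value
)
```

In this run ${DATUM} comes from the document, ${CURRENT_YEAR} from the given date, and ${NAME} from the fallback callback.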
Pattern
A pattern describes something that should be present in documents of a given type. The simplest pattern is just text.
The document type with the highest matching rate will be selected.
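The source does not define how the matching rate is computed. As a rough sketch, assuming the rate is the fraction of a type's patterns found in the text, the selection could look like this in Python. The names matching_rate and select_type and the two document types are hypothetical.

```python
import re


def matching_rate(text, patterns):
    """Fraction of patterns found in text (assumed definition of the rate)."""
    if not patterns:
        return 0.0
    hits = sum(1 for p in patterns if re.search(p, text))
    return hits / len(patterns)


def select_type(text, doc_types):
    """Return the name of the type with the highest rate, or None if nothing matches."""
    best_name, best_rate = None, 0.0
    for name, patterns in doc_types.items():
        rate = matching_rate(text, patterns)
        if rate > best_rate:
            best_name, best_rate = name, rate
    return best_name


# Hypothetical document types, invented for this sketch.
doc_types = {
    "Rechnung Allgemein": ["Rechnung", "Betrag", "Auftrag"],
    "Vertrag": ["Vertrag", "Laufzeit", "Unterschrift"],
}
chosen = select_type("Rechnung zum Auftrag 4711, Betrag 100 EUR", doc_types)
```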
Simple text can be used directly as a pattern, but it should not contain special characters such as . , ( ) or similar.
<select pattern="Rechnung"/>

In addition to simple text patterns, patterns can contain readable regular expressions. These are divided into four types of commands:
- Content
- Amount
- Grouping
- Special Commands
<select pattern="add('Rechnung').anyCharacter().count(4)"/>
Content commands
The simplest command is add. It works just like the simple text in the example above.
<select pattern="add('Rechnung')"/>

More content commands are:
- alpha() A letter or digit
- notAlpha() Not a letter or digit
- anyCharacter() Any character
- digit() A digit
- notDigit() Not a digit
- dot() A dot
- tab() A tabulator character
- whitespace() A space, tabulator or linebreak
- oneOf('AA','BB','CC') One of the given Elements
- range('c','f') A range of characters. Here c, d, e or f
- addGroup('ABC') Adds the text as a group. This combines add with the group command
- date() A date
- year() The year of a date
- month() The month of a date
- betrag('TEXT') The monetary amount that follows the given text
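For readers who know classic regular expressions, the following Python sketch lists plausible equivalents for some of the content commands. The mapping is an assumption and scandor's actual translation may differ; alpha(), date(), year(), month() and betrag() are omitted because their exact definitions are not given here.

```python
import re

# Assumed regex equivalents; scandor's real translation may differ.
CONTENT = {
    "anyCharacter()": r".",
    "digit()": r"\d",
    "notDigit()": r"\D",
    "dot()": r"\.",
    "tab()": r"\t",
    "whitespace()": r"\s",
    "oneOf('AA','BB','CC')": r"(?:AA|BB|CC)",
    "range('c','f')": r"[c-f]",
}

# Chaining commands corresponds to concatenating their fragments:
# range('c','f').digit().dot()
pattern = CONTENT["range('c','f')"] + CONTENT["digit()"] + CONTENT["dot()"]
```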
Amount commands
These commands always refer to the preceding content command. They define how often that element should appear.
- count(7) The element must appear exactly 7 times
- count(3,5) The element must appear at least 3 and at most 5 times
- oneOrMore() The element must appear at least 1 time, but may repeat.
- zeroOrMore() The element need not appear, but if present, it may occur any number of times
- zeroOrOne() The element need not appear, but if present, then only once
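The amount commands correspond closely to regular expression quantifiers. The Python sketch below shows that correspondence; the helper count and the constant names are hypothetical and only mirror the commands listed above.

```python
import re


def count(minimum, maximum=None):
    """count(7) -> {7}, count(3,5) -> {3,5} (assumed translation)."""
    if maximum is None:
        return "{%d}" % minimum
    return "{%d,%d}" % (minimum, maximum)


ONE_OR_MORE = "+"   # oneOrMore()
ZERO_OR_MORE = "*"  # zeroOrMore()
ZERO_OR_ONE = "?"   # zeroOrOne()

digit = r"\d"

# A quantifier always applies to the element written directly before it.
exactly_seven = digit + count(7)
three_to_five = digit + count(3, 5)
optional_sign = "-" + ZERO_OR_ONE + digit + ONE_OR_MORE
```

The last pattern accepts numbers with or without a leading minus sign, which shows that zeroOrOne() makes the element optional rather than forbidden.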
Grouping commands
Groupings always have a beginning and an end. There are two types of grouping commands:
- group().xxx.groupEnd() xxx stands for any content commands
- capture().xxx.captureEnd() xxx stands for any content commands
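The difference between the two grouping commands can be illustrated with plain regular expressions in Python. This sketch assumes that group() bundles elements without remembering the matched text (a non-capturing group), while capture() makes the matched text available as a value; the first assumption is not confirmed by the source.

```python
import re

# group().add('ab').groupEnd().count(2): the amount applies to the whole group.
grouped = r"(?:ab){2}"

# add('Jahr ').capture().digit().count(4).captureEnd(): the digits are remembered.
captured = r"Jahr (\d{4})"

m1 = re.search(grouped, "xxababxx")
m2 = re.search(captured, "Jahr 2000")

whole_match = m1.group(0)     # text matched by the grouped pattern
remembered = m1.groups()      # empty: nothing was captured
captured_year = m2.group(1)   # value usable for a placeholder
```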
Example
Here is an example document:
Rechnung
Max Mustermann AG
Sonnenweg 947
99111 Musterstadt
Rechnungsdatum: 01.01.2000
Rechnung über ein Rauschiff der Klasse G.
Hyperantrieb 47.000.000 ED
Traktorstrahl 499.000 ED
Rumpf 8.000.000 ED
Gesamtbetrag: 55.499.000 ED
Bitte überweisen Sie den Betrag bis zum 01.01.3748 an die imperiale Sternenflotte.
Max Mustermann AG, IBAN ZZ11 1111 2222 3333 4444 5555, Bank des Universums, Musterstadt

Here are some patterns used to identify the document:
- add('Rechnung')
  - Identifies the document as a bill.
- add('Mustermann AG')
  - Identifies the company "Mustermann AG".
- add('Rechnungsdatum').anyCharacter().oneOrMore().capture().digit().count(2).dot().digit().count(2).dot().digit().count(4).captureEnd()
  - Identifies the date of the bill. The capturing group allows the date to be assigned to a placeholder.
- date()
  - Identifies the first date on the page. In this example it is identical to the previous one.
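To see how such a command chain might translate into a regular expression, here is a small Python stand-in for a handful of the commands, applied to an excerpt of the example document. The Builder class is hypothetical and not scandor's implementation, and the dd.mm.yyyy expression used for date() is an assumption.

```python
import re


class Builder:
    """Tiny stand-in for a few of the readable commands (not scandor's code)."""

    def __init__(self):
        self.parts = []

    def _append(self, fragment):
        self.parts.append(fragment)
        return self

    def add(self, text):
        # Wrapped in a group so that a following amount applies to the whole text.
        return self._append("(?:" + re.escape(text) + ")")

    def anyCharacter(self):
        return self._append(".")

    def digit(self):
        return self._append(r"\d")

    def dot(self):
        return self._append(r"\.")

    def oneOrMore(self):
        return self._append("+")

    def count(self, n):
        return self._append("{%d}" % n)

    def capture(self):
        return self._append("(")

    def captureEnd(self):
        return self._append(")")

    def regex(self):
        return "".join(self.parts)


pattern = (
    Builder().add("Rechnungsdatum").anyCharacter().oneOrMore()
    .capture().digit().count(2).dot().digit().count(2).dot().digit().count(4)
    .captureEnd().regex()
)

# Excerpt of the example document, with line breaks.
document = (
    "Rechnung\n"
    "Max Mustermann AG\n"
    "Rechnungsdatum: 01.01.2000\n"
    "Gesamtbetrag: 55.499.000 ED\n"
    "Bitte überweisen Sie den Betrag bis zum 01.01.3748 an die imperiale Sternenflotte.\n"
)

invoice_date = re.search(pattern, document).group(1)
# Assumed equivalent of date(): the first dd.mm.yyyy date in the text.
first_date = re.search(r"\d{2}\.\d{2}\.\d{4}", document).group(0)
```

Both lookups yield 01.01.2000 for this excerpt. Note that in Python the dot does not match a line break by default, so the greedy anyCharacter().oneOrMore() stays on the Rechnungsdatum line; on text without line breaks it could run on to a later date.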