scandor

scandor is a tool that makes scanning, identifying, and archiving documents as easy as possible. The graphical interface offers only the buttons this workflow needs. Scanning can be as easy as:

If needed, you can enable the option to sort and rotate documents.

The configuration lets you define document types. Each document type can be stored in its own folder.

The program updates itself automatically.

scandor is free for non-commercial use.

Configuration

The configuration file docTypes.xml specifies how documents are identified and where they are stored. Parts of a document's content can be used in its filename. This only has to be configured once per document type.

NEW: The configuration can be changed inside the UI.

Example:

	<Type name="Rechnung Allgemein" fileTargetName="C:/dokumente/Rechnungen/${CURRENT_YEAR}/${NAME}_${DATUM}.pdf">
		<select pattern="Rechnung"/>
		<select pattern="Betrag"/>
		<select pattern="Auftrag"/>
		<select pattern="(?i)MWSt"/>
		<select pattern=" +(\d\d\.\d\d\.\d\d\d\d)" parameter="DATUM"/>
	</Type>
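As a rough illustration of how such a file could be read, the following Python sketch parses the example with the standard `xml.etree.ElementTree` module. The wrapping root element `<docTypes>` is an assumption (the real root element of docTypes.xml is not shown here), and `DocType` and `load_doc_types` are hypothetical helpers, not part of scandor.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Assumption: the <Type> entries live under a root element; "docTypes" is a made-up name.
CONFIG = r"""<docTypes>
  <Type name="Rechnung Allgemein" fileTargetName="C:/dokumente/Rechnungen/${CURRENT_YEAR}/${NAME}_${DATUM}.pdf">
    <select pattern="Rechnung"/>
    <select pattern="Betrag"/>
    <select pattern="Auftrag"/>
    <select pattern="(?i)MWSt"/>
    <select pattern=" +(\d\d\.\d\d\.\d\d\d\d)" parameter="DATUM"/>
  </Type>
</docTypes>"""


@dataclass
class DocType:
    """Hypothetical in-memory view of one <Type> entry."""
    name: str
    file_target_name: str
    # (pattern, parameter) pairs; parameter is None when the select fills no placeholder
    selects: list = field(default_factory=list)


def load_doc_types(xml_text):
    """Collect every <Type> element together with its <select> children."""
    root = ET.fromstring(xml_text)
    types = []
    for t in root.iter("Type"):
        doc_type = DocType(t.get("name"), t.get("fileTargetName"))
        for sel in t.findall("select"):
            doc_type.selects.append((sel.get("pattern"), sel.get("parameter")))
        types.append(doc_type)
    return types


for dt in load_doc_types(CONFIG):
    print(dt.name, "->", dt.file_target_name)
    for pattern, parameter in dt.selects:
        print("   ", repr(pattern), "fills", parameter)
```

Note that only the last `select` carries a `parameter`; the other patterns contribute to identifying the document type but do not fill a placeholder.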
	
The entries:

Placeholder

Directory names and filenames can contain placeholders, which are replaced either automatically or with a value entered by the user. A placeholder looks like this:

	${XXX}
	
XXX is the name of the placeholder. The program already knows these predefined placeholders:
Additional placeholders can be filled with values taken from the document. Use the parameter attribute of select to name the placeholder that receives the value.
If scandor cannot find a value for a placeholder, the user is asked to enter one manually.
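The replacement rule can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not scandor's code: `fill_placeholders` is a hypothetical helper, the callback `ask_user` stands in for the manual-entry dialog, and the assumption that CURRENT_YEAR is filled automatically is inferred from the configuration example (the value 2024 is an arbitrary example).

```python
import re

# Matches ${NAME}; allowing only letters, digits and "_" in names is an assumption.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def fill_placeholders(template, values, ask_user):
    """Replace every ${XXX} in template.

    values:   placeholder name -> value (predefined ones plus values found in the document)
    ask_user: called with the placeholder name when no value is known
    """
    def replace(match):
        name = match.group(1)
        if name not in values:
            values[name] = ask_user(name)  # remember the answer for repeated placeholders
        return values[name]

    return PLACEHOLDER.sub(replace, template)


target = "C:/dokumente/Rechnungen/${CURRENT_YEAR}/${NAME}_${DATUM}.pdf"
known = {"CURRENT_YEAR": "2024", "DATUM": "01.01.2000"}  # example values
# A fixed answer stands in for the dialog in which the user would type NAME.
print(fill_placeholders(target, known, lambda name: "Mustermann"))
# C:/dokumente/Rechnungen/2024/Mustermann_01.01.2000.pdf
```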

Pattern

A pattern describes something that should be present in documents of a given type. The simplest pattern is just text.
The document type with the highest matching rate will be selected.
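How the matching rate is computed is not specified here. The Python sketch below assumes one plausible definition, the share of a type's patterns that occur somewhere in the recognized text, and picks the type with the highest share. `matching_rate`, `select_doc_type` and the second type "Kontoauszug" are hypothetical.

```python
import re


def matching_rate(text, patterns):
    """Assumed definition: fraction of a type's patterns found in the text."""
    hits = sum(1 for p in patterns if re.search(p, text))
    return hits / len(patterns) if patterns else 0.0


def select_doc_type(text, doc_types):
    """Return the name of the type with the highest rate (the first one wins on ties)."""
    return max(doc_types, key=lambda name: matching_rate(text, doc_types[name]))


doc_types = {
    "Rechnung Allgemein": ["Rechnung", "Betrag", "Auftrag", "(?i)MWSt"],
    "Kontoauszug": ["Kontoauszug", "Saldo"],  # hypothetical bank-statement type
}
ocr_text = (
    "Rechnung\nMax Mustermann AG\n"
    "Bitte überweisen Sie den Betrag bis zum 01.01.3748 an die imperiale Sternenflotte."
)
print(select_doc_type(ocr_text, doc_types))  # Rechnung Allgemein (2 of 4 patterns found)
```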
Simple text can be used directly as a pattern, but it should not contain special characters like . ( ) or similar, because these have a special meaning in regular expressions.

	<select pattern="Rechnung"/>
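The following Python snippet shows why such characters cause trouble when text is treated as a regular expression. The text "Gesamtbetrag (netto)" is an invented example, and Python's `re` module is used only to demonstrate standard regular-expression behaviour; whether scandor accepts backslash escapes in simple patterns is not stated here.

```python
import re

line = "Gesamtbetrag (netto): 100"

# "(" and ")" form a group, so this pattern looks for "Gesamtbetrag netto"
# without parentheses and misses the line above.
print(bool(re.search("Gesamtbetrag (netto)", line)))             # False

# "." matches any character, so "01.01" also matches "01a01".
print(bool(re.search("01.01", "01a01")))                         # True

# In Python, re.escape() turns the special characters back into literal text.
print(bool(re.search(re.escape("Gesamtbetrag (netto)"), line)))  # True
```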
	
In addition to simple text, patterns can contain readable regular expressions. These are built from four types of commands. Commands are chained with dots, and a command may take parameters.

	<select pattern="add('Rechnung').anyCharacter().count(4)"/>
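One way to picture a command chain is as a builder that appends one regular-expression fragment per command. The Python class below is a hypothetical sketch of that idea: only the command names add, anyCharacter and count come from this document; their translation into regex syntax, and the assumption that add() matches its text literally, are guesses for illustration. The method names are camelCase only to mirror the commands.

```python
import re


class Pattern:
    """Hypothetical builder: each command appends a regex fragment and returns self,
    so calls can be chained with dots as in docTypes.xml."""

    def __init__(self):
        self.parts = []

    # content commands
    def add(self, text):
        self.parts.append(re.escape(text))  # assumption: add() matches text literally
        return self

    def anyCharacter(self):
        self.parts.append(".")
        return self

    # amount command: applies to the element added just before
    def count(self, n):
        self.parts.append("{%d}" % n)
        return self

    def regex(self):
        return "".join(self.parts)


p = Pattern().add("Rechnung").anyCharacter().count(4)
print(p.regex())                                     # Rechnung.{4}
print(bool(re.search(p.regex(), "Rechnung 2024")))   # True
```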
	

Content commands

The simplest command is add. It works just like the simple text pattern in the example above.

	<select pattern="add('Rechnung')"/>
	
More content commands are:

Amount commands

These commands always refer to the preceding content command. They define how often the preceding element has to appear.
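In plain regular expressions, a quantifier repeats exactly one preceding element, which the Python snippet below demonstrates with the `re` module. The mapping count(n) to `{n}` and oneOrMore() to `+` is an assumption based on the command names, and whether scandor groups a multi-character add('...') before repeating it is not documented here.

```python
import re

# A quantifier repeats only the single element right before it:
# r"\d{2}" means two digits, but "ab{2}" means "a" followed by two "b".
print(re.fullmatch(r"ab{2}", "abb") is not None)       # True
print(re.fullmatch(r"ab{2}", "abab") is not None)      # False

# To repeat a longer piece of text, it has to be grouped first.
print(re.fullmatch(r"(?:ab){2}", "abab") is not None)  # True

# "+" means one or more repetitions of the previous element.
print(re.fullmatch(r"\d+", "499") is not None)         # True
```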

Grouping commands

Groupings always have a beginning and an end. There are two types of grouping commands.

The difference is that the content of a capturing group can be assigned to a placeholder.
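Assuming the two grouping types correspond to the capturing and non-capturing groups of regular expressions (the capture()/captureEnd() commands in the example below suggest the former), the difference looks like this in Python's `re` module. The placeholder name DATUM is taken from the configuration example; storing it in a dict is only an illustration.

```python
import re

text = "Rechnungsdatum: 01.01.2000"

# Capturing group "( ... )": the matched text is kept and can fill a placeholder.
m = re.search(r"Rechnungsdatum: (\d{2}\.\d{2}\.\d{4})", text)
placeholders = {"DATUM": m.group(1)}
print(placeholders)  # {'DATUM': '01.01.2000'}

# Non-capturing group "(?: ... )": only groups elements, e.g. for a quantifier;
# nothing is stored.
m = re.search(r"Rechnungsdatum: (?:\d{2}\.){2}\d{4}", text)
print(m.groups())    # ()
```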

Example

Here is an example document:

	Rechnung
	Max Mustermann AG
	Sonnenweg 947
	99111 Musterstadt                                           Rechnungsdatum: 01.01.2000
	
	Rechnung über ein Raumschiff der Klasse G.
	Hyperantrieb                    47.000.000 ED
	Traktorstrahl                      499.000 ED
	Rumpf                            8.000.000 ED
	
	Gesamtbetrag:                   55.499.000 ED
	
	Bitte überweisen Sie den Betrag bis zum 01.01.3748 an die imperiale Sternenflotte.
	
	Max Mustermann AG, IBAN ZZ11 1111 2222 3333 4444 5555, Bank des Universums, Musterstadt
	
Here are some patterns used to identify the document:
	add('Rechnung')
Identifies the document as a bill.

	add('Mustermann AG')
Identifies the company "Mustermann AG".

	add('Rechnungsdatum').anyCharacter().oneOrMore().capture().digit().count(2).dot().digit().count(2).dot().digit().count(4).captureEnd()
Identifies the date of the bill. The capturing group makes it possible to assign the date to a placeholder.

	date()
Identifies the first date on the page. In this example, it is the same date as the one captured above (01.01.2000).
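To check the patterns above against the example, they can be hand-translated into ordinary regular expressions and run with Python's `re` module. The translation of the command chain, and the reading of date() as "first dd.mm.yyyy date", are assumptions for illustration; only an excerpt of the example document is used.

```python
import re

# Excerpt of the example document above.
document = """Rechnung
Max Mustermann AG
Sonnenweg 947
99111 Musterstadt                                           Rechnungsdatum: 01.01.2000

Bitte überweisen Sie den Betrag bis zum 01.01.3748 an die imperiale Sternenflotte.
"""

# Assumed translation of
# add('Rechnungsdatum').anyCharacter().oneOrMore().capture()
#   .digit().count(2).dot().digit().count(2).dot().digit().count(4).captureEnd()
invoice_date = r"Rechnungsdatum.+(\d{2}\.\d{2}\.\d{4})"

# Assumed meaning of date(): the first dd.mm.yyyy date on the page.
any_date = r"\d{2}\.\d{2}\.\d{4}"

print(bool(re.search("Rechnung", document)))       # True
print(bool(re.search("Mustermann AG", document)))  # True
# "." does not match line breaks here, so the later date 01.01.3748 is not captured.
print(re.search(invoice_date, document).group(1))  # 01.01.2000
print(re.search(any_date, document).group(0))      # 01.01.2000
```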