filecabinet is a minimal document management system for your computer. It has metadata per document and supports fulltext search in various document types.
This readme explains the simple, single-user, local deployment scenario. If
you want to use this for multiple users, please read
Here’s what the web interface looks like:
To install filecabinet, you should clone the repository, install all requirements, and then filecabinet itself:
$ git clone https://git.spacepanda.se/bold-kitty/filecabinet.git
$ cd filecabinet
$ pip install --user -r requirements.txt
$ pip install --user .
The following additional requirements are optional:
In case you prefer the web interface over the command line shell, you will need this:
$ pip install --user -r requirements-web.txt
If you want to use PDF metadata extraction, you should also install these requirements:
$ pip install --user -r requirements-pdf.txt
For office file metadata (and fulltext search), these requirements are also necessary (and it’s a good idea to have OpenOffice installed):
$ pip install --user -r requirements-doc.txt
If you have scanned documents and want optical character recognition, you will need to install tesseract and this:
$ pip install --user -r requirements-ocr.txt
After the installation, you can copy
example.conf to the user
configuration directory (usually that’s
~/.config/) and name it filecabinet.conf.
Then you should edit the file to create a cabinet folder, where your documents
will be stored, for example in
~/Documents/cabinet like this:
# ~/.config/filecabinet.conf
[cabinet1]
name = My File Cabinet
path = ~/Documents/cabinet
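To double-check the file before running filecabinet, the configuration can be read with Python's standard configparser module. This is only a sketch: the helper name read_cabinets is made up for illustration, and it is an assumption that every section containing a path option describes a cabinet. filecabinet's own parsing may differ.

```python
import configparser
from pathlib import Path

# The example configuration from above, inlined so the sketch is self-contained.
EXAMPLE = """
[cabinet1]
name = My File Cabinet
path = ~/Documents/cabinet
"""


def read_cabinets(text):
    """Return {section: (name, expanded path)} for every section with a path.

    Hypothetical helper; treating "has a path option" as "is a cabinet"
    is an assumption, not documented filecabinet behaviour.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    cabinets = {}
    for section in parser.sections():
        if parser.has_option(section, "path"):
            name = parser.get(section, "name", fallback=section)
            # Expand "~" the same way a shell would.
            path = Path(parser.get(section, "path")).expanduser()
            cabinets[section] = (name, path)
    return cabinets


if __name__ == "__main__":
    for section, (name, path) in read_cabinets(EXAMPLE).items():
        print(section, name, path)
```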
Now it’s time to add some documents to the cabinet:
$ filecabinet add Document/my-document.pdf
To inspect all files, you can either start the web interface:
$ filecabinet web --browser
The --browser option will make sure that the website is immediately opened
in your web browser.
Or you can use the shell, if you so prefer:
$ filecabinet shell
Enter help to see the available commands, and see the Shell section below
for more help.
In order to add documents to the filecabinet, you have to copy them into your
Once they are in that folder, you have to tell filecabinet that there are new
files to pick up. You can do that either in the shell with the
command or in your commandline with
$ filecabinet pickup
If configured, filecabinet will run optical character recognition (OCR) on
pictures and PDFs. It will also use other tricks to extract as much metadata
(and full text) as it can.
Then the document is copied into the cabinet folder and marked as new.
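The copy-then-pickup workflow described above can be scripted. The sketch below assumes the pickup folder is the incoming directory inside the cabinet folder (as in the directory layout shown under Assuming a cabinet is set up); the function names stage_for_pickup and run_pickup are hypothetical. Only the filecabinet pickup command itself is taken from this readme.

```python
import shutil
import subprocess
from pathlib import Path


def stage_for_pickup(files, cabinet_path):
    """Copy files into <cabinet>/incoming and return the copied paths.

    Assumption: the pickup folder is "<cabinet>/incoming".
    """
    incoming = Path(cabinet_path).expanduser() / "incoming"
    incoming.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 returns the destination path of each copied file.
    return [Path(shutil.copy2(f, incoming)) for f in files]


def run_pickup():
    """Tell filecabinet to process the staged files; raises if the command fails."""
    subprocess.run(["filecabinet", "pickup"], check=True)
```

A typical call would be stage_for_pickup(["scan.pdf"], "~/Documents/cabinet") followed by run_pickup().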
An alternative way is to add the document through the
$ filecabinet add that-file.pdf the-other-file.doc
To indicate that all files belong to the same document, the
parameter (or short
-s) can be used:
$ filecabinet add -s page1.pdf page2.txt
Both the web interface and the shell support the search terms and mechanisms listed here.
Searching for tags is done case-insensitive and is done using
For example if you're looking for a document that's tagged with banana, you
can search for it by
Searching for new documents is accomplished by searching for
:new:y. If you only want to find documents that are not new, you can also
search for :new:no. Unless specified, a search will ignore whether or not a
document is new.
You can search for documents by date range using
with dates in the form
yyyy-mm-dd. These dates are exclusive.
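What "exclusive" means for the boundaries can be sketched in a few lines of Python. The function name in_exclusive_range is made up for illustration and this is not filecabinet's code; it only shows that a document dated exactly on a boundary date is not matched.

```python
from datetime import date


def in_exclusive_range(day, after, before):
    """True if day lies strictly between the two boundary dates."""
    return after < day < before


# Boundary dates in the yyyy-mm-dd form described above.
after = date.fromisoformat("2018-02-14")
before = date.fromisoformat("2018-02-21")

if __name__ == "__main__":
    # A document dated on the boundary itself is excluded ...
    print(in_exclusive_range(date(2018, 2, 14), after, before))
    # ... while one dated a day later is included.
    print(in_exclusive_range(date(2018, 2, 15), after, before))
```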
If you are looking for a document with a date between February 14 and 21 in the
year 2018, you can search like this:
By default documents that are deleted are ignored in searches or listings. You
can search through deleted documents by searching for
You can search for any metadata value, like title, author, or language,
by searching with the metadata name and a colon like
Everything else that does not match the special search terms will be used in the fulltext search.
Every search term is a case-insensitive regular expression. So you can search
If you want to search for terms with whitespaces, you can use quotes:
The title contains "brain", the author is "Gumby", and the date is some time
before August 2015:
title:brain author:gumby :before:2015-08-01
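To illustrate how a query like the one above breaks down, here is a rough Python sketch that separates special terms (leading colon), metadata terms (name and colon), and fulltext terms, and matches values as case-insensitive regular expressions. The names parse_query and matches are hypothetical, and the use of shell-style quoting via shlex is an assumption. filecabinet's real parser may behave differently.

```python
import re
import shlex


def parse_query(query):
    """Split a query into special, metadata, and fulltext terms.

    Hypothetical sketch, not filecabinet's actual parser.
    """
    special, metadata, fulltext = {}, {}, []
    # shlex.split keeps quoted phrases with whitespace together.
    for term in shlex.split(query):
        if term.startswith(":"):
            # Special terms look like ":before:2015-08-01".
            name, _, value = term[1:].partition(":")
            special[name] = value
        elif ":" in term:
            # Metadata terms look like "title:brain".
            name, _, value = term.partition(":")
            metadata[name] = value
        else:
            fulltext.append(term)
    return special, metadata, fulltext


def matches(pattern, value):
    """Case-insensitive regular expression match anywhere in value."""
    return re.search(pattern, value, re.IGNORECASE) is not None


if __name__ == "__main__":
    print(parse_query("title:brain author:gumby :before:2015-08-01"))
```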
Looking for a newly added document with the title "The Larch":
This is what the shell looks like:
The shell has only a minimal built-in help. Try entering
To open documents from the shell with the
open command, you have to
configure a program to open files with. A strong recommendation is
rifle, the file opener of the ranger file manager:
# ~/.config/filecabinet.conf
[Shell]
document_opener = rifle
From within the shell you can edit the metadata of documents with the
command. This will, unless configured otherwise, try to use your configured
text editors (see environment variables
You can override that behaviour by specifying your own editor in the configuration file:
# ~/.config/filecabinet.conf
[Shell]
document_editor = nano
If you decide to use a graphical editor, make sure it does not return until
you are done editing.
gedit should do that by default, but for example
Sublime Text must be set up with the
--wait flag and
kate must receive the --block flag:
[Shell]
document_editor = kate --block
# document_editor = subl3 --wait
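Why the editor must block can be seen from how a program typically launches one. The sketch below is not filecabinet's code and the function names are made up. It assumes the document_editor value is split like a shell command line and that the file to edit is appended as the last argument.

```python
import shlex
import subprocess


def editor_argv(document_editor, path):
    """Build the argument list for the configured editor (hypothetical helper)."""
    return shlex.split(document_editor) + [str(path)]


def edit_and_wait(document_editor, path):
    """Launch the editor and wait for it to exit.

    subprocess.run only returns once the editor process ends. An editor that
    hands the file to an already running instance and exits right away would
    make the caller read the file before any edits are saved.
    """
    subprocess.run(editor_argv(document_editor, path), check=True)
```

With document_editor set to kate --block, editor_argv produces the list kate, --block, and the file path, so the call only returns after the editor window is closed.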
The shell allows searching with the
find command and some search
> list author:gumby
filecabinet can use Tesseract OCR to do character recognition on pictures and scanned PDFs, so you can search the text of images.
In order for that to work, you have to install Tesseract and some language packages, depending on the languages of the documents you wish to scan.
As the last step you should enable OCR in your configuration file:
# ~/.config/filecabinet.conf
[OCR]
enabled = yes
languages = eng, fra
Make sure you have the corresponding language data packages installed! Otherwise filecabinet will just die.
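One way to check for missing language data up front is to compare the configured languages against the output of tesseract --list-langs. The helper names below are hypothetical and this check is not part of filecabinet. It assumes the listing consists of a header line followed by one language code per line, and that the languages option is a comma-separated list as in the example above.

```python
import subprocess


def configured_languages(value):
    """Split a value like "eng, fra" into ["eng", "fra"]."""
    return [lang.strip() for lang in value.split(",") if lang.strip()]


def installed_languages(listing):
    """Parse `tesseract --list-langs` output: header line, then one code per line."""
    lines = [line.strip() for line in listing.splitlines() if line.strip()]
    return set(lines[1:])


def missing_languages(value, listing):
    """Return configured languages that are absent from the listing."""
    have = installed_languages(listing)
    return [lang for lang in configured_languages(value) if lang not in have]


def list_langs():
    """Run tesseract and return its language listing as text."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=True,
    )
    # Some Tesseract versions print the listing to stderr instead of stdout.
    return result.stdout or result.stderr
```

Calling missing_languages("eng, fra", list_langs()) returns the language codes that still need a data package.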
Assuming a cabinet is set up at
~/cabinet, the directory structure is:
~/cabinet
|
+-- incoming
|
+-- documents
    |
    +-- <partial document id>
        |
        +-- <full document id>
            |
            +-- document.yaml
            |
            +-- <version number>
                |
                +-- version.yaml
                |
                +-- <part id>.<ext>
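Given this layout, the documents in a cabinet can be listed by walking the folder tree. The sketch below assumes the nesting documents/<partial document id>/<full document id>/document.yaml, with one sub-folder per version that contains a version.yaml. The function name iter_documents is made up for illustration and nothing here reads the YAML files themselves.

```python
from pathlib import Path


def iter_documents(cabinet_path):
    """Yield (full document id, [version folder names]) for each document.

    Assumes the nesting described in the lead-in above.
    """
    root = Path(cabinet_path).expanduser() / "documents"
    for meta in sorted(root.glob("*/*/document.yaml")):
        doc_dir = meta.parent
        # A version folder is any sub-folder that holds a version.yaml.
        versions = sorted(
            p.name
            for p in doc_dir.iterdir()
            if p.is_dir() and (p / "version.yaml").is_file()
        )
        yield doc_dir.name, versions


if __name__ == "__main__":
    for doc_id, versions in iter_documents("~/cabinet"):
        print(doc_id, versions)
```

Version folder names are sorted as text here, so "10" sorts before "2"; a real tool would likely compare them as numbers.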