
Cropping a PDF (or adding margins)

PDFjam is a command-line wrapper around the LaTeX pdfpages package (so you also need pdfpages and an installed TeX distribution, e.g. TeX Live).


To crop a PDF, the command you need looks something like this:

pdfjam --keepinfo --trim "10mm 15mm 10mm 15mm" --clip true --suffix "clip" input.pdf

This will produce a file named input-clip.pdf (the --suffix option sets the suffix appended to the output name). The trim dimensions are in the order: left, bottom, right, top.

What makes PDFjam better than PDFcrop, which does the same job, is that the result is more compact; moreover, with the --keepinfo option you can preserve at least the document properties, if not the hyperlinks and bookmarks.

PDFjam can be used for many different operations. I find it useful for adding a generous outer margin to books so I can take notes on them (in this case the margin goes on the right): use a negative value in the trim command above and, to make everything look nicer, shift the whole page toward the center. The command for the offset is:

pdfjam --twoside --offset '2cm 0cm' file.pdf

In the end, the command line I'm using is this:

pdfjam --keepinfo --trim "0mm 15mm -40mm 0mm" --offset "1cm 0cm" --clip true --suffix "clip" test.pdf

How to “split” PDF photocopies with two pages per side

Sometimes you end up with PDF photocopies that have two pages per side, and you would like to get back a document with a single page per side.

A convenient way to do it is to use the mutool command from the MuPDF package with the line:

mutool poster -x 2 input.pdf output.pdf

Mac CLI: Simple creation and management of disk images

From the Mac command line you can create a disk image with:

hdiutil create $FOLDER.dmg -ov -volname "$FOLDER" -fs HFS+ -srcfolder "$FOLDER"

If you want to convert the Read-Only DMG image to a writable sparsebundle you can use:

hdiutil convert $FOLDER.dmg -format UDSB -o $FOLDER.sparsebundle

And you can resize it with:

hdiutil resize -size 4g $FOLDER.sparsebundle

Mac sparsebundles can be cloned with rsync as:

rsync -aNHAXEx --delete --protect-args --fileflags --force-change $FOLDER.sparsebundle /path/to/destination


Python course at Codecademy

Today they wrote to me:

Today, we’re pleased to announce the arrival of a new programming language: Python!

Python is a great language with applications in many different fields. Its clean, readable syntax makes it a favorite for beginning programmers – say goodbye to all of those braces and semicolons.

Python is currently in use at places like Google, NASA, and Disney Animation. Also, it has an active community of developers and offers great module support – this means you can easily use code that others have written to accomplish all kinds of tasks!


Loading a shape file into MySQL with ogr2ogr

With ogr2ogr you can load a shapefile directly into an existing database. The syntax is quite simple:

ogr2ogr -f "MySQL" MySQL:"ogr,user=root,host=localhost,password=root" -lco engine=MYISAM Comuni_01.shp

Patstat import scripts for MySQL (201204 version)

I finally released the import scripts for MySQL. You can find them on GitHub, along with some short documentation, on the (beautiful) GitHub page here.

Please note these are for the April 2012 version. If you're importing another release of Patstat you have to adapt them accordingly.

coreutils are your friends

Once upon a time there was textutils, a small set of classic Unix utility programs to play with text files.

If you had to print, reverse, or select lines in a file, edit a text stream, paginate it, wrap it, take the first or last lines, and so on, you could use those small commands from the Unix tradition, lasting from the good old days of Real Programmers.

Nowadays the textutils are incorporated, together with a lot of other stuff, into a larger set called coreutils.

They are powerful tools worth knowing, even for your data homework, so please make yourself acquainted with them. You'll be grateful forever.

We've already seen a simple example, and many others will come. But remember: coreutils are your friends.
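As a quick taste, here is a minimal sketch (with made-up sample data, not from any real dataset) of a few of these commands at work:

```shell
# Made-up three-line sample file, just to exercise a few coreutils commands
printf 'alpha\nbeta\ngamma\n' > demo.txt

head -n 2 demo.txt   # first two lines: alpha, beta
tac demo.txt         # lines in reverse order: gamma, beta, alpha
cut -c 1-3 demo.txt  # first three characters of each line: alp, bet, gam
wc -l < demo.txt     # number of lines: 3

rm demo.txt
```

Each command reads lines, transforms them, and writes lines, which is exactly what makes them so easy to chain together with pipes.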

MAC OS X environment variables

I've tried hard to understand the environment variable mechanism in Mac OS X. It seems a little messy to me.

The way Mac OS X applications are launched in Aqua is different from the way they are launched in other UNIX windowing environments.

In standard UNIX, all applications inherit their environment variables from the login shell.

In Mac OS X it is different. GUI applications, even when ported from Unix/Linux, do not run in the same process environment as an application launched from Terminal. This is an inheritance from NeXTStep. To bridge this difference there is a ‘strange’ file named ~/.MacOSX/environment.plist.

In a freshly installed system, neither the ~/.MacOSX directory nor the environment.plist file inside it exists. You can create them with:

defaults write ${HOME}/.MacOSX/environment PATH "${HOME}/bin:/usr/bin:/bin:/usr/local/bin"

The same mechanism can be used to make MANPATH, INFOPATH, LC_CTYPE, and other environment variables available.

But this can interfere with settings from the usual UNIX files, such as the system-wide /etc/profile and /etc/csh.login, or the user's ~/.profile, ~/.login, and the like. You have to decide which of the two systems your settings will come from. I strongly recommend ~/.MacOSX/environment.plist, because its key/value pairs, saved in XML, are so easy to edit and use: on the command line with the defaults and plutil commands or with PLTools from http://www.macorchard.com/PLTools, or “Mac like” with /Developer/Applications/Utilities/Property List Editor.app from Apple's Mac OS X Developer Tools.

Even applications run in X11 are directly affected by this mechanism because the X server itself now runs in Aqua, too.

From XLS to CSV

The XLS format is often used to transport data. That's boorish behavior: XLS should never be used to exchange data.


Most of the time a simple CSV file suffices. A friend gave me a two-thousand-line dataset like this:

GIS006003 ARI00601P usr9 4
GIS00300G ATD00302V usr8 l
GIS006003 ATD006019 usr10 6
GIS00700V APC007016 usr11 2

In XLS the file was over 20 MB. Even counting the half byte lost per character in the ASCII data representation, it was a 1-to-250 waste.

Sadly, the XLS format doesn't just waste space: it seriously compromises the meaning of the data. It's not uncommon for numeric fields to be misinterpreted as text and thus be excluded from numeric operations like means or sums. You can't know whether such an error occurred unless you inspect the file cell by cell, and even then there are nasty inner problems that cannot be spotted visually.

So please DON’T use XLS format to exchange data. This is the first good advice from your data char.

When you ask for data, please don't accept XLS files unless you know how much care your counterpart puts into making them.

The files you receive can be dramatically wrong; their state can range from ‘difficult to work with’ to ‘completely useless’. And you can bet you'll waste time just getting at the data you need, well before you can actually use it.

And, worst of all, if you skip that work, you will never be sure of the quality of the data you're working on.

Hence, less is better.

The CSV and TXT file formats don't have these problems. (They surely have other problems, but ones much more compatible with your work, I mean.)

When the data is translated to text, it's much simpler to find possible errors in the fields, and, even better, to be SURE you're working on a mistake-free file (at least free from representation errors).

For instance, to be SURE that the fourth column of the previous example doesn't contain letters instead of digits, you can simply use the command

cut -c 33- example1.txt | xargs | sed 's/[ 0-9]\+//g'

That's all. Now you know whether a non-number exists in the last field. (You should obviously adapt the command case by case.)

A little explanation for the brave. The command is a filter that operates line by line.

The first part of the command line (before the first vertical bar, commonly known as a pipe) ‘cuts’ each line, keeping everything from the 33rd position onward (I had to count the columns by hand). The content of the last column is then gathered onto a single line by the xargs command (a sort of side effect of this command, which is much more useful for other things too); finally the sed part deletes spaces and digits from the string, leaving in the result only what should not be there (letters or other characters).

Hence if the response is an empty string, I'm sure I have only digits, as is correct; but if I get a non-empty string, I have to search for the intruder line by line.

That isn't difficult either. It's just one more command. I can get the offending lines with

egrep -n '^.{32}[^ 0-9]' example1.txt

where the command simply finds a non-digit at the 33rd column of the file. The result could be something like:
2:GIS00300G ATD00302V usr8 l

where the line number is reported together with the full line content containing the found character.
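To make the two checks above reproducible, here is a sketch that recreates the sample data (with the fields padded so that the fourth column starts at character 33, an assumption based on the layout shown earlier) and runs both commands:

```shell
# Recreate the sample data; the %-10s / %-12s widths pad the fields so
# the fourth column starts exactly at character 33
printf '%-10s%-10s%-12s%s\n' \
    GIS006003 ARI00601P usr9  4 \
    GIS00300G ATD00302V usr8  l \
    GIS006003 ATD006019 usr10 6 > example1.txt

# Non-digits left over in the last field (empty output would mean all clear)
cut -c 33- example1.txt | xargs | sed 's/[ 0-9]\+//g'    # prints: l

# Line number and full content of the offending lines
egrep -n '^.{32}[^ 0-9]' example1.txt

rm example1.txt
```

Note that `\+` in the sed expression is a GNU sed extension; on other seds you may need `sed -E 's/[ 0-9]+//g'` instead.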

Maybe, even if you use the unmentionable data software™ to do your data work (which your data char heartily doesn't recommend), you should always use CSV or TXT files to exchange data with your peers, and to effectively use commands like the ones we have seen.

Dragging off your data from XLS

Now that you are concerned about leaving your data in XLS files, it’s time to automate the extraction from that cage.

Do not use ‘Save As…’ in the unmentionable data software™, which is not, by any means, good at this. And it's a drag to do, file by file, sheet by sheet.

So, please move your XLS files to a platform where you can use Perl (most likely Unix, Linux or Mac OS X, but even the unmentionable operating system™) and use the xls2csv program by Ken Prows.

The use of the command is very simple. The options are:
-x : filename of the source spreadsheet
-b : the character set the source spreadsheet is in (before)
-c : the filename to save the generated csv file as
-a : the character set the csv file should be converted to (after)
-q : quiet mode
-s : print a list of supported character sets
-h : print help message
-v : get version information
-W : list worksheets in the spreadsheet specified by -x
-w : specify the worksheet name to convert (defaults to the first worksheet)

The following example will convert a spreadsheet that is in the WINDOWS-1252 character set (WinLatin1) and save it as a csv file in the UTF-8 character set.

xls2csv -x "1252spreadsheet.xls" -b WINDOWS-1252 -c "ut8csvfile.csv" -a UTF-8

This example will convert the worksheet named “Users” in the given spreadsheet.

xls2csv -x "multi_worksheet_spreadsheet.xls" -w "Users" -c "users.csv"

The spreadsheet’s charset (-b) will default to UTF-8 if not set.

If the csv's charset (-a) is not set, the CSV file will be created using the same charset as the spreadsheet (which is not the best option, so always try to use UTF-8).

Some known problems of the program are:

  • It probably will not work with spreadsheets that use formulas. You should first create a sheet with the static content of the formula fields copied as values, and then extract that sheet.
  • A line in the spreadsheet is assumed to be blank if there is nothing in the first column.
  • Some users have reported problems trying to convert a spreadsheet while it was opened in a different application. You should probably make sure that no other programs are working with the spreadsheet while you are converting it.

The script is free software and you can redistribute it and/or modify it under the same terms as the Perl interpreter itself.