looktxt(1) | USER COMMANDS | looktxt(1) |
looktxt - Search and export numerics from any text/ascii file
looktxt [-b][-c][-f FORMAT][-H][-s SEC ...][-m META ...] file1 file2 ...
Extracting data from a text file is a never-ending story. Usually, one writes a short script or program/function to analyse each specific input data format. The purpose of the looktxt command is to read any text data file containing numerical blocks just as a human would read it. Specifically, it looks for contiguous numerical blocks, which are stored as matrices; the other parts of the input file are classified as headers, which are optionally exported. Each numerical block is labelled with the last word of the preceding header block.
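The block detection described above can be illustrated with a short Python sketch. It is not the looktxt implementation: the function name split_blocks, the number pattern, and the default label "Data" are assumptions made for illustration, and the sketch does not split a block when the number of columns changes.

```python
import re

# Assumed token pattern: integers, decimals, and exponent notation.
_NUMBER = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?$")


def split_blocks(text):
    """Return (headers, blocks); blocks is a list of (label, rows) pairs."""
    headers, blocks = [], []
    current = []      # rows of the numeric block being collected
    label = "Data"    # hypothetical label used before any header is seen
    for line in text.splitlines():
        tokens = line.replace(",", " ").split()
        if tokens and all(_NUMBER.match(t) for t in tokens):
            # A purely numeric line extends the current block.
            current.append([float(t) for t in tokens])
            continue
        if current:
            # A non-numeric line closes the open numeric block.
            blocks.append((label, current))
            current = []
        if tokens:
            headers.append(line)
            # The next block is labelled with the last word of this header.
            words = re.findall(r"[A-Za-z_]\w*", line)
            if words:
                label = words[-1]
    if current:
        blocks.append((label, current))
    return headers, blocks
```

For instance, a file holding the header line "# Temperature scan" followed by two rows of three numbers would yield one 2x3 block labelled "scan".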
Blocks read from the data file can be sorted into sections. Each section SEC starts where its name appears in a header, and contains all following fields until a new section is found or the end of the file is reached. Additionally, one may search for specific metadata keywords chosen by the user. Each data field whose header matches the metadata keyword META creates a new entry in the MetaData section.
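As a rough illustration of this grouping, the following Python sketch sorts (header, value) pairs into sections and a MetaData entry. The function name group_fields, the "Root" bucket for fields found before any section, and the choice to copy (rather than move) matching fields into MetaData are assumptions, not documented looktxt behaviour.

```python
def group_fields(fields, sections, meta_keywords):
    """Sort (header, value) pairs, given in file order, into sections."""
    tree = {"MetaData": {}}
    current = "Root"  # hypothetical bucket for fields before any section
    for header, value in fields:
        # A section starts when its name appears in a header.
        for sec in sections:
            if sec in header:
                current = sec
        # Fields are named after the last word of their header.
        words = header.split()
        name = words[-1] if words else "Data"
        tree.setdefault(current, {})[name] = value
        # Fields whose header matches a keyword also go to MetaData.
        if any(key in header for key in meta_keywords):
            tree["MetaData"][name] = value
    return tree
```

The resulting nested dictionary mirrors the structure-like hierarchy of the exported files: one entry per section, plus MetaData.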
The output data files may be generated in "Matlab", "Scilab", "IDL", "Octave", "XML", "HTML", and "Raw" formats (selected with the -f FORMAT option), and are organized as a structure-like hierarchy. This hierarchy contains all sections, metadata and, optionally, headers that have been found while parsing the input data file.
After running looktxt foo, the data is simply loaded into memory using e.g. 'matlab> ans=foo;' or directly with "matlab> looktxt('foo')". The exact method to import the data is indicated at the beginning of the output data file, and depends on the format.
The command can handle large files (hundreds of megabytes) within a few seconds, with minimal memory requirements.
The command supports other options, which are listed using looktxt -h.
Among these are
will result in the following Matlab structure:
Creator: 'Looktxt 1.0.8 24 Sept 2009 Farhi E. [farhi at ill.fr]'
User: 'farhi on localhost'
Source: 'foo'
Date: 'Fri Dec 12 11:35:20 CET 2008'
Format: 'Matlab'
Command: [1x195 char]
Filename: 'foo.m'
Headers: struct SEC1, struct SEC2, struct MetaData (headers)
Data: struct SEC1, struct SEC2, struct MetaData (numerics)
The LOOKTXT_FORMAT environment variable may be set to define the default export format. When not defined, the Matlab format is used as default.
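A calling script can apply the same default when it builds the command line. The Python sketch below is only an illustration: the helper names default_format and build_command are hypothetical, and only the -f FORMAT option and the LOOKTXT_FORMAT variable come from this page.

```python
import os


def default_format(environ=None):
    """Return the export format, falling back to Matlab when unset."""
    env = os.environ if environ is None else environ
    return env.get("LOOKTXT_FORMAT", "Matlab")


def build_command(files, fmt=None, environ=None):
    """Build an argument list such as ['looktxt', '-f', 'Octave', 'foo']."""
    chosen = fmt if fmt is not None else default_format(environ)
    return ["looktxt", "-f", chosen, *files]
```

Passing the environment as an argument keeps the helpers easy to test without modifying the real process environment.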
The command by itself should work properly. In case of trouble, you may get more information with the --verbose or --debug options. Most problems arise when importing data after running looktxt. These typically come from idl(1) and scilab(1) limitations (lines too long, too many structure elements, ...). The --binary option may solve some of these import issues.
In case of memory allocation problems, you may try the --fast option.
looktxt returns -1 in case of error, 0 when no file was processed, or the number of processed files.
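A wrapper script might interpret this return value as sketched below in Python. The helper name describe_status is hypothetical. The sketch assumes a POSIX system, where an exit status is reported modulo 256, so the documented -1 is typically seen as 255 by the caller; under that assumption a run that processed exactly 255 files cannot be told apart from an error.

```python
def describe_status(status):
    """Translate a looktxt exit status into a short message."""
    # -1 is the documented value; 255 is how POSIX callers usually see it.
    if status in (-1, 255):
        return "error"
    if status == 0:
        return "no file processed"
    return f"{status} file(s) processed"
```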
Usual procedure: ./configure; make; make install. In principle, the only required file is the looktxt executable, to be copied into a system executable location, e.g. '/usr/local/bin', '/usr/bin', or 'c:\windows\system32'.
Pre-compiled binaries for common systems are provided with the package. An installer is available for Matlab usage with: matlab> looktxt which may be used on both Linux/Unix and Windows systems, as it uses the MEX executable.
Emmanuel FARHI (farhi (at) ill.eu) and the Institut Laue Langevin at http://www.ill.eu
This program is licensed under the GNU General Public License, version 2.
matlab(1), idl(1), scilab(1), octave(1), xmlcatalog(1), html2text(1)
February 10, 2014 | version 1.4 |