tcat is a tool for concatenating any number of similar
tables one after the other. The tables must be of similar form to each other
(same number and types of columns). Preprocessing of the tables may be done
using the icmd parameter, which will operate in the same way on all
the input tables. Table parameters of the output table will be taken from
the first of the input tables.
Subject to some constraints on the details of the input and output
formats and processing, tcat is capable of joining an unlimited
number of tables together to produce an output table of unlimited length,
without large memory requirements. If there are very many input files, it
may be necessary to set the lazy parameter so that they are not all
kept open at once.
If you have heterogeneous tables, in different formats or
requiring different preprocessing steps from each other before they can be
concatenated, use tcatn instead.
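As a rough illustration of how the parameters documented below fit together, the following Python sketch assembles a tcat command line without running it. It assumes the tool is launched as "stilts tcat"; the file names, the column names passed to the keepcols filter, and the build_tcat_args helper are all hypothetical.

```python
import shlex


def build_tcat_args(inputs, out, icmd=None, lazy=False):
    """Assemble an argument list for a tcat run (hypothetical helper)."""
    args = ["stilts", "tcat"]  # assumed launcher name and task name
    # The in parameter may be repeated, once per input table.
    args += [f"in={loc}" for loc in inputs]
    if icmd:
        # The same preprocessing is applied to every input table.
        args.append(f"icmd={icmd}")
    if lazy:
        # Avoid keeping very many input files open at once.
        args.append("lazy=true")
    args.append(f"out={out}")
    return args


args = build_tcat_args(
    ["obs1.fits", "obs2.fits"],   # hypothetical input files
    "all_obs.fits",               # hypothetical output file
    icmd='keepcols "RA DEC"',     # hypothetical column names
)
# Print a shell-quoted form of the command for inspection.
print(shlex.join(args))
```

Building the arguments as a list rather than a single string keeps each parameter intact if it is later handed to something like subprocess.run, so no extra shell quoting is needed around filter commands that contain spaces.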
- in=<table> [<table> ...]
Locations of the input tables. Either specify the
parameter multiple times, or supply the input tables as a space-separated list
within a single use.
The following table location forms are allowed:
- A filename.
- A URL.
- The special value "-", meaning standard input. In this
case the input format must be given explicitly using the ifmt
parameter. Note that not all formats can be streamed in this way.
- A scheme specification of the form
:<scheme-name>:<scheme-args>.
- A system command line with either a "<" character at
the start, or a "|" character at the end
("<syscmd" or "syscmd|"). This
executes the given pipeline and reads from its standard output. This will
probably only work on unix-like systems.
Compression in any of the supported compression formats (Unix compress, gzip or
bzip2) is expanded automatically.
A list of input table locations may be given in an external file
by using the indirection character '@'. Thus "in=@filename"
causes the file filename to be read for a list of input table
locations. The locations in the file should each be on a separate line.
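One way to prepare such a list file is sketched below in Python. The input locations, the file name inputs.lis and the write_location_list helper are hypothetical; the only behaviour taken from the description above is that each location sits on its own line and that the file is referred to as in=@filename.

```python
import os
import tempfile


def write_location_list(locations, path):
    """Write one table location per line and return the matching in= value."""
    with open(path, "w", encoding="utf-8") as f:
        for loc in locations:
            f.write(loc + "\n")
    return f"in=@{path}"


locations = ["/data/cat_a1.fits", "/data/cat_b2.fits"]  # hypothetical files
list_path = os.path.join(tempfile.mkdtemp(), "inputs.lis")
in_arg = write_location_list(locations, list_path)
print(in_arg)
```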
- ifmt=<in-format>
Specifies the format of the input table as specified by parameter in.
The known formats are listed in SUN/256. This flag can be
used if you know what format your table is in. If it has the special value
(auto) (the default), then an attempt will be made to detect the format
of the table automatically. This cannot always be done correctly however, in
which case the program will exit with an error explaining which formats were
attempted. This parameter is ignored for scheme-specified tables.
The same format parameter applies to all the tables specified by in.
- multi=true|false
Determines whether all tables, or just the first one, from input table
files will be used. If set false, then just the first table from each file
named by in will be used. If true, then all tables present in those input
files will be used. This only has an effect for
file formats which are capable of containing more than one table, which
effectively means FITS and VOTable and their variants.
- istream=true|false
If set true, the input table specified by the in parameter will be read
as a stream. It is necessary to give the ifmt parameter in this case.
Depending on the required operations and processing
mode, this may cause the read to fail (sometimes it is necessary to read the
table more than once). It is not normally necessary to set this flag; in most
cases the data will be streamed automatically if that is the best thing to do.
However it can sometimes result in less resource usage when processing large
files in certain formats (such as VOTable). This parameter is ignored for
scheme-specified tables.
The same streaming flag applies to all the tables specified by in.
- icmd=<cmds>
Specifies processing to be performed on each input table as specified by
parameter in, before any other processing has taken
place. The value of this parameter is one or more of the filter commands
described in SUN/256. If more than one is given, they must be separated by
semicolon characters (";"). This parameter can be repeated multiple
times on the same command line to build up a list of processing steps. The
sequence of commands given in this way defines the processing pipeline which
is performed on the table.
Commands may alternatively be supplied in an external file, by
using the indirection character '@'. Thus a value of
"@filename" causes the file filename to be read for
a list of filter commands to execute. The commands in the file may be
separated by newline characters and/or semicolons, and lines which are blank
or which start with a '#' character are ignored.
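The command-file rules above can be pictured with a small Python sketch. This is an illustration of the stated rules only, not the parser the tool itself uses: in particular it makes no attempt to protect semicolons inside quoted arguments. The read_filter_commands helper and the column names in the sample text are hypothetical.

```python
def read_filter_commands(text):
    """Split command-file text into individual filter commands.

    Illustrative only: semicolons inside quoted arguments are not protected.
    """
    commands = []
    for line in text.splitlines():
        # Blank lines and lines starting with '#' are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        # Several commands may share a line, separated by semicolons.
        for part in line.split(";"):
            part = part.strip()
            if part:
                commands.append(part)
    return commands


example = """\
# keep only the columns of interest
keepcols "RA DEC MAG"; select "MAG < 20"

sort MAG
"""
commands = read_filter_commands(example)
print(commands)
```

The resulting list corresponds to the processing pipeline that would otherwise be written on the command line as a single semicolon-separated icmd value.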
- ocmd=<cmds>
Specifies processing to be performed on the output table,
after all other processing has taken place. The value of this parameter is one
or more of the filter commands described in SUN/256. If more than one is
given, they must be separated by semicolon characters (";"). This
parameter can be repeated multiple times on the same command line to build up
a list of processing steps. The sequence of commands given in this way defines
the processing pipeline which is performed on the table.
Commands may alternatively be supplied in an external file, by
using the indirection character '@'. Thus a value of
"@filename" causes the file filename to be read for
a list of filter commands to execute. The commands in the file may be
separated by newline characters and/or semicolons, and lines which are blank
or which start with a '#' character are ignored.
- omode=out|meta|stats|count|checksum|cgi|discard|topcat|samp|tosql|gui
The mode in which the result table will be output. The default mode is
out, which means that the result will be written as a new table to disk or
elsewhere, as determined by the out and ofmt parameters. However, there are
other possibilities, which correspond to uses to which a table can be put
other than outputting it, such as displaying metadata, calculating
statistics, or populating a table in an SQL database. For some values of
this parameter, additional parameters (<mode-args>) are required to
determine the exact behaviour.
Possible values are:
- out
- meta
- stats
- count
- checksum
- cgi
- discard
- topcat
- samp
- tosql
- gui
Use the help=omode flag or see SUN/256 for more information.
- out=<out-table>
The location of the output table. This is usually a
filename to write to. If it is equal to the special value "-" (the
default) the output table will be written to standard output.
This parameter must only be given if omode has its default
value of "out".
- ofmt=<out-format>
Specifies the format in which the output table will be
written (one of the ones in SUN/256 - matching is case-insensitive and you can
use just the first few letters). If it has the special value "(auto)" (the
default), then the output filename will be examined to try to guess what
sort of file is required, usually by looking at the extension. If it's not
obvious from the filename what output format is
intended, an error will result.
This parameter must only be given if omode has its default
value of "out".
- seqcol=<colname>
Name of a column to be added to the output table which
will contain the sequence number of the input table from which each row
originated. This column will contain 1 for the rows from the first
concatenated table, 2 for the second, and so on.
- loccol=<colname>
Name of a column to be added to the output table which
will contain the location (as specified in the input parameter(s)) of the
input table from which each row originated.
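The effect of the seqcol and loccol columns described above can be pictured with the following Python sketch, which models each table as a location paired with a list of row dictionaries. It is a conceptual model only, not the tool's implementation; the concat_with_provenance helper, the column names iseq and loc, and the sample data are hypothetical.

```python
def concat_with_provenance(tables, seqcol="iseq", loccol="loc"):
    """Yield rows from each table in turn, tagged with their origin."""
    # Sequence numbers start at 1 for the first concatenated table.
    for seq, (location, rows) in enumerate(tables, start=1):
        for row in rows:
            tagged = dict(row)          # copy so the input row is untouched
            tagged[seqcol] = seq        # which input table, by position
            tagged[loccol] = location   # which input table, by location
            yield tagged


tables = [
    ("a.csv", [{"x": 1}, {"x": 2}]),   # hypothetical first input
    ("b.csv", [{"x": 3}]),             # hypothetical second input
]
result = list(concat_with_provenance(tables))
for row in result:
    print(row)
```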
- uloccol=<colname>
Name of a column to be added to the output table which
will contain the unique part of the location (as specified in the input
parameter(s)) of the input table from which each row originated. If not null,
parameters will also be added to the output table giving the pre- and post-fix
string common to all the locations. For example, if the input tables are
"/data/cat_a1.fits" and "/data/cat_b2.fits" then the
output table will contain a new column <colname> which takes the value
"a1" for rows from the first table and "b2" for rows from
the second, and new parameters "<colname>_prefix" and
"<colname>_postfix" with the values "/data/cat_" and
".fits" respectively.
- lazy=true|false
Whether to perform table resolution lazily. If true, each
table is only accessed when the time comes to add its rows to the output; if
false, then all the tables are accessed up front. This is mostly a tuning
parameter, and on the whole it doesn't matter much how it is set, but for
joining an enormous number of tables setting it true may avoid running out of
resources.
- countrows=true|false
Whether to count the rows in the table before starting
the output. This is essentially a tuning parameter - if writing to an output
format which requires the number of rows up front (such as normal FITS) it may
reduce the number of passes through the input files required for
processing. Unless you have a good understanding of the internals of the
software, your best bet for working out whether to set this true or false is
to try it both ways.
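Returning to the uloccol parameter described above, the split of each location into a common prefix, a unique part and a common postfix can be sketched in Python as follows. This reproduces the worked example from the uloccol description, but it is only one plausible way of computing the split and is not necessarily how tcat does it; the split_locations helper is hypothetical.

```python
import os


def split_locations(locations):
    """Return (prefix, postfix, unique_parts) for a list of locations."""
    # Longest leading string shared by every location.
    prefix = os.path.commonprefix(locations)
    # Strip the prefix first so that prefix and postfix cannot overlap.
    rests = [loc[len(prefix):] for loc in locations]
    # Longest trailing string shared by what remains, found by reversing.
    postfix = os.path.commonprefix([r[::-1] for r in rests])[::-1]
    unique = [r[:len(r) - len(postfix)] for r in rests]
    return prefix, postfix, unique


prefix, postfix, unique = split_locations(
    ["/data/cat_a1.fits", "/data/cat_b2.fits"]
)
print(prefix, postfix, unique)
```

Here unique holds the values that the new column would take for each input table, while prefix and postfix correspond to the "<colname>_prefix" and "<colname>_postfix" table parameters.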