FBB::CSV4180 - Converter for comma separated values
#include <bobcat/csv4180>
Linking option: -lbobcat
Objects of the class CSV4180 can be used to convert series
of comma separated values to the individual separated values (also called
'fields' below). The class implements RFC 4180 (cf.
https://www.ietf.org/rfc/rfc4180.txt, section 2).
According to RFC 4180 lines contain comma separated values: comma
separated values on one line are processed together, as a series of values.
The final comma separated value on a line is not ended by a comma.
Comma separated values may be surrounded by double quotes.
However, they must be surrounded by double quotes in these cases:
- if the values contain commas;
- if the values contain double quotes (in which case the double quote is
'escaped' by doubling it, e.g., "a "" double quote");
- if the values extend over multiple lines. E.g.,
"First line
second line"
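The quoting rules above can be illustrated by a small sketch that writes one value in a form that is safe to use as a comma separated value. The function quoteField is hypothetical: it is not part of Bobcat (CSV4180 converts in the other direction), and it only assumes the three cases listed above.

```cpp
#include <string>

// Hypothetical helper (not part of Bobcat): returns `value` in a form that
// can be written as one separated value.
std::string quoteField(std::string const &value, char fieldSep = ',')
{
    // Quotes are only required for separators, double quotes and line breaks.
    bool mustQuote =
        value.find_first_of(std::string{fieldSep} + "\"\r\n") != std::string::npos;

    if (not mustQuote)
        return value;

    std::string quoted = "\"";
    for (char ch: value)
    {
        if (ch == '"')
            quoted += '"';          // escape a double quote by doubling it
        quoted += ch;
    }
    quoted += '"';
    return quoted;
}
```

Values not containing any of these characters are returned unaltered, since surrounding quotes are optional in that case.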
Comma separated values may be empty: the following line defines
three empty comma separated values:
,,
The first empty value starts at the beginning of the line, and continues up to
the first comma; the second empty value starts beyond the first comma and
continues up to the second comma; the third empty value starts beyond the
second comma, and continues up to the end of the line. If the line ends in
blank space characters then the third value isn’t empty, but contains
those blank space characters.
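How a line is divided into (possibly empty) values can be sketched as follows. The function splitFields is a hypothetical helper and not the library's implementation; it assumes that the line holds one complete record, so values extending over multiple lines are not handled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Bobcat): splits one line into its fields.
std::vector<std::string> splitFields(std::string const &line, char fieldSep = ',')
{
    std::vector<std::string> fields(1);     // a line has at least one field
    bool quoted = false;

    for (std::size_t idx = 0; idx != line.size(); ++idx)
    {
        char ch = line[idx];

        if (quoted)
        {
            if (ch != '"')
                fields.back() += ch;
            else if (idx + 1 != line.size() and line[idx + 1] == '"')
            {
                fields.back() += '"';       // doubled quote: one literal quote
                ++idx;
            }
            else
                quoted = false;             // closing quote
        }
        else if (ch == '"')
            quoted = true;                  // opening quote
        else if (ch == fieldSep)
            fields.emplace_back();          // separator: start the next field
        else
            fields.back() += ch;            // blanks are kept as well
    }
    return fields;
}
```

With this sketch the line `,,` results in three empty strings, and trailing blanks end up in the last field.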
By default, values are interpreted as strings. The CSV4180
class also offers facilities to ignore specific fields, or to ensure that
they can be converted to integral or floating point values. The second
constructor (below) expects a std::string argument defining how to
interpret fields. Options are:
- I: the field must be convertible to an integral value;
- D: the field must be convertible to a floating point value;
- S: the field is a string: it is used as-is;
- X: the field is omitted from the final set of comma separated
values. I.e., if a line contains three comma separated values, and the
specification "SXS" is used then this results in two
comma separated values: the first and the third of the three values
encountered on each line;
- -: synonym of X.
In addition, field specifications may contain blank spaces, which
are ignored.
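The effect of a field specification can be sketched with standard C++ facilities only. The functions convertible and applySpec are hypothetical and are not Bobcat's implementation: the sketch assumes that a field is convertible if std::stoll (for I) or std::stod (for D) consumes all of its characters, which may differ from the rules used by CSV4180.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// true if all of `text` converts to an integral or floating point value
bool convertible(std::string const &text, bool integral)
{
    try
    {
        std::size_t used = 0;
        if (integral)
            std::stoll(text, &used);
        else
            std::stod(text, &used);
        return used == text.size();         // trailing characters: no match
    }
    catch (std::exception const &)          // invalid_argument, out_of_range
    {
        return false;
    }
}

// Hypothetical helper (not part of Bobcat): applies a specification like
// "I D S X" to the fields of one line. The retained fields are stored in
// `kept`; false is returned if the fields do not match the specification.
bool applySpec(std::string const &spec, std::vector<std::string> const &fields,
               std::vector<std::string> &kept)
{
    kept.clear();
    std::size_t idx = 0;

    for (char type: spec)
    {
        if (std::isspace(static_cast<unsigned char>(type)))
            continue;                       // blanks are ignored
        if (std::string{"IDSX-"}.find(type) == std::string::npos)
            throw std::invalid_argument{"invalid field specification"};
        if (idx == fields.size())
            return false;                   // too few fields

        std::string const &field = fields[idx++];
        if (type == 'X' or type == '-')
            continue;                       // the field is omitted
        if (type != 'S' and not convertible(field, type == 'I'))
            return false;                   // not a valid I or D field
        kept.push_back(field);
    }
    return idx == fields.size();            // false: too many fields
}
```

Using the specification "SXS" on the fields a, b and c stores a and c in kept.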
When processing comma separated values the first line may be
considered a header line. X specifications also apply to
header lines, but otherwise all header fields are handled as S-type fields. In
addition, when processing multiple input lines all non-header lines are made
available in a vector of vectors of fields, whereas the header line itself
can be accessed via a dedicated member (header()).
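The separation of the header line from the other lines can be sketched as follows. All names (Fields, Table, naiveSplit, dropIgnored, loadTable) are hypothetical and not part of Bobcat. To keep the sketch short it splits lines at plain commas (quoted values are not handled), assumes a specification without blanks, and performs no I or D checks.

```cpp
#include <cstddef>
#include <sstream>          // also provides std::istream
#include <string>
#include <vector>

using Fields = std::vector<std::string>;

// Hypothetical container: the header is kept apart from the data lines.
struct Table
{
    Fields header;
    std::vector<Fields> rows;
};

// Splits at plain commas only; quoted values are deliberately not handled.
Fields naiveSplit(std::string const &line)
{
    Fields fields;
    std::size_t begin = 0;
    while (true)
    {
        std::size_t end = line.find(',', begin);
        fields.push_back(line.substr(begin, end - begin));
        if (end == std::string::npos)
            return fields;
        begin = end + 1;
    }
}

// Removes the fields whose specification is X or -.
Fields dropIgnored(std::string const &spec, Fields const &fields)
{
    Fields kept;
    for (std::size_t idx = 0; idx != fields.size(); ++idx)
    {
        bool ignored = idx < spec.size() and
                       (spec[idx] == 'X' or spec[idx] == '-');
        if (not ignored)
            kept.push_back(fields[idx]);
    }
    return kept;
}

// The first line becomes the header (if requested), all other lines are rows;
// X specifications are applied to the header as well as to the rows.
Table loadTable(std::istream &in, std::string const &spec, bool hasHeader)
{
    Table table;
    std::string line;

    if (hasHeader and std::getline(in, line))
        table.header = dropIgnored(spec, naiveSplit(line));
    while (std::getline(in, line))
        table.rows.push_back(dropIgnored(spec, naiveSplit(line)));
    return table;
}
```

When called with the specification "SXI" the second field is removed from the header and from every row.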
FBB
All constructors, members, operators and manipulators, mentioned in this
man-page, are defined in the namespace FBB.
- explicit CSV4180(size_t nFields = 0, bool header = false, char
fieldSep = ','):
The first parameter specifies the number of fields that must be present on
input lines. When using the default value the number of fields encountered
on the first line determines the number of fields that must be present on
subsequent lines. If the second parameter is true then the first
line is interpreted as the header line. The third parameter specifies the
character separating the fields. By default it’s a comma, but
sometimes (not part of the RFC) a semicolon is used. By specifying
fieldSep any character other than a comma can be used as field
separator.
- explicit CSV4180(std::string const &specs, bool header = false,
char fieldSep = ','):
The first parameter defines the number and types of the comma separated
values on input lines. Specifications can be
- D: the field must be convertible to a floating point value;
- I: the field must be convertible to an integral value;
- S: the field is left as-is, and can be retrieved as a
std::string;
- X or -: the field is ignored and is not stored inside the
CSV4180 object;
- blank space characters are ignored.
An exception is thrown when characters other than the ones listed above
are encountered.
- If I or D fields cannot be properly converted, or if a line
contains too few or too many comma separated values the input
stream’s fail status is set.
- The last two parameters are interpreted as the last two parameters of the
previous constructor.
- Copy and move constructors (and assignment operators) are available.
- std::istream &operator>>(std::istream &in, CSV4180 &csv):
One line of text is extracted from in and processed by the csv
object. The csv object may or may not already contain converted
comma separated values. When empty, the first line is processed according
to the specifications provided to the csv object at construction
time. Otherwise, the comma separated values on extracted lines must match
the number and types of the fields, as specified by the csv object.
When input lines do not match these specifications in’s fail
status is set.
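Reporting a mismatch through the stream's fail status can be sketched with a hypothetical type. LineFields and its extraction operator are not part of Bobcat: the sketch only assumes that the first extracted line determines the required number of fields, and it splits lines at plain commas.

```cpp
#include <cstddef>
#include <ios>
#include <sstream>          // also provides std::istream
#include <string>
#include <utility>
#include <vector>

// Hypothetical type (not part of Bobcat): the fields of the line that was
// extracted last, and the number of fields each line must have.
struct LineFields
{
    std::size_t required;               // 0: not yet determined
    std::vector<std::string> fields;
};

std::istream &operator>>(std::istream &in, LineFields &dest)
{
    std::string line;
    if (not std::getline(in, line))
        return in;                      // no line available

    std::vector<std::string> fields;    // split at plain commas
    std::size_t begin = 0;
    while (true)
    {
        std::size_t end = line.find(',', begin);
        fields.push_back(line.substr(begin, end - begin));
        if (end == std::string::npos)
            break;
        begin = end + 1;
    }

    if (dest.required == 0)
        dest.required = fields.size();  // the first line fixes the count

    if (fields.size() != dest.required)
        in.setstate(std::ios::failbit); // mismatch: reported via the stream
    else
        dest.fields = std::move(fields);

    return in;
}
```

Callers can therefore use the common `while (in >> dest)` idiom: the loop ends at the first line that doesn't match, and dest still holds the fields of the last matching line.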
- void clear(size_t nFields = 0):
The internally stored data (referred to by the data, header, and
lastLine members) are erased. By default, the required number of
CSV fields is reset to 0, but can be set to a specific value by specifying
a value for its nFields parameter.
- std::vector<std::vector<std::string>> const &data() const:
A reference to the vector of vectors of fields stored inside the
CSV4180 object is returned. The vector returned by data does
not contain the header line. If a header line was requested it can be
retrieved from the header() member.
- std::vector<std::string> const &header() const:
If the constructor’s header parameter was specified as
true then this member returns the fields encountered on the first
line that was processed by the read1 member. Otherwise,
header returns a reference to an empty vector.
- std::string const &lastLine() const:
A reference to the last line that was successfully extracted from the input
stream by the read1 member is returned. So once the lines
containing the comma separated values have been processed, the next line
on the input stream can be obtained from this member.
- size_t nValues() const:
After successfully calling read1 for the first time this member
returns the required number of comma separated values that must be
encountered on subsequent input lines.
- size_t read(std::istream &in, size_t nLines = 0):
By default, all lines of in are read and are processed by the
read1 member. By specifying a non-zero value for the nLines
parameter the specified number of lines is read from in. Reading
stops once in’s status is not good. When
nLines is specified as zero, then in’s status flags
are cleared. The number of successfully processed lines is returned.
- std::istream &read1(std::istream &in):
One line is read from in and is parsed for its comma separated
values. If parsing fails, in’s fail status is set. After
successfully calling read1 for the first time all subsequent lines
read by read1 must have the same number of comma separated values
as encountered when calling read1 for the first time. The parsed
fields are stored in a vector of std::string objects, and that
vector is added to the vector of vectors of strings that is returned by
the data member.
- std::vector<std::vector<std::string>> release():
The vector of vectors of fields stored inside the CSV4180 object is
returned. After calling release the internally stored vector of
fields is empty. The returned vector does not contain the
header line. If a header line was requested it can be retrieved from the
header() member. Note that this member does not reset the number of
expected fields for subsequently processed CSV-lines. If that’s
what you want, call clear after calling release.
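The difference between data, release and clear can be sketched with a hypothetical class that only models the storage contract described above. RowStore and its add member are not part of Bobcat; add merely stands in for a successfully parsed line.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical class (not part of Bobcat) modelling data(), release(),
// clear() and nValues() as described above.
class RowStore
{
    std::vector<Row> d_rows;
    std::size_t d_nValues = 0;          // 0: taken from the first row added

    public:
        // Stores a row; false if its size differs from the required size.
        bool add(Row row)
        {
            if (d_nValues == 0)
                d_nValues = row.size();
            if (row.size() != d_nValues)
                return false;
            d_rows.push_back(std::move(row));
            return true;
        }

        std::vector<Row> const &data() const
        {
            return d_rows;
        }

        std::size_t nValues() const
        {
            return d_nValues;
        }

        // Hands the rows to the caller; the required row size is kept.
        std::vector<Row> release()
        {
            std::vector<Row> out;
            out.swap(d_rows);           // d_rows is empty hereafter
            return out;
        }

        // Erases the rows and resets the required row size.
        void clear(std::size_t nFields = 0)
        {
            d_rows.clear();
            d_nValues = nFields;
        }
};
```

After release the store is empty but still requires rows of the original size; only clear makes it accept rows of a different size.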
#include <iostream>
#include <bobcat/csv4180>

using namespace std;
using namespace FBB;

int main()
{
    CSV4180 csv;                        // defaults: no header line, ',' separator

    size_t nLines = csv.read(cin);      // process all lines read from cin
    cerr << nLines << " lines were read\n";

    if (not csv.header().empty())       // only filled if a header was requested
    {
        cerr << "header: " << '\n';
        for (auto const &field: csv.header())
            cerr << "   `" << field << "'\n";
    }

    cerr << "# CSV values: " << csv.nValues() << '\n';

    for (auto const &line: csv.data())  // all non-header lines
    {
        cerr << "Line:\n";
        for (auto const &entry: line)
            cerr << "   `" << entry << "'\n";
    }
}
bobcat/csv4180 - defines the class interface
- https://fbb-git.gitlab.io/bobcat/: gitlab project page;
- bobcat_5.07.00-x.dsc: detached signature;
- bobcat_5.07.00-x.tar.gz: source archive;
- bobcat_5.07.00-x_i386.changes: change log;
- libbobcat1_5.07.00-x_*.deb: debian package containing the
libraries;
- libbobcat1-dev_5.07.00-x_*.deb: debian package containing the
libraries, headers and manual pages.
Bobcat is an acronym of 'Brokken’s Own Base Classes And
Templates'.
This is free software, distributed under the terms of the GNU
General Public License (GPL).
Frank B. Brokken (f.b.brokken@rug.nl).