Catmandu::Exporter::Stat(3pm) | User Contributed Perl Documentation | Catmandu::Exporter::Stat(3pm) |
Catmandu::Exporter::Stat - a statistical export
    # Calculate statistics on the availability of the ISBN fields in the dataset
    cat data.json | catmandu convert -v JSON to Stat --fields isbn

    # Export the statistics as YAML
    cat data.json | catmandu convert -v JSON to Stat --fields isbn --as YAML
The Catmandu::Stat package can be used to calculate statistics on the availability of fields in a data file. Use this exporter to count the availability of fields or count the number of duplicate values. For each field the exporter calculates the following statistics:
  * name     : the name of a field
  * count    : the number of occurrences of a field in all records
  * zeros    : the number of records without a field
  * zeros%   : the percentage of records without a field
  * min      : the minimum number of occurrences of a field in any record
  * max      : the maximum number of occurrences of a field in any record
  * mean     : the mean number of occurrences of a field in all records
  * variance : the variance of the number of occurrences of a field per record
  * stdev    : the standard deviation of the number of occurrences of a field per record
  * uniq~    : the estimated number of unique values
  * uniq%    : the estimated percentage of unique values
  * entropy  : the minimum and maximum entropy in the field values (estimated value)
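The counting statistics in this list can be illustrated with a minimal Python sketch. It is not the exporter's implementation: the helper names `occurrences` and `field_stats` are hypothetical, the records are made-up sample data, and the sketch assumes that a list-valued field counts once per element and that the variance is the sample variance (n - 1 denominator).

```python
import math
import statistics


def occurrences(record, field):
    """Number of values a top-level field holds in one record (0 if absent)."""
    if field not in record:
        return 0
    value = record[field]
    # Assumption: a list-valued field counts once per element.
    return len(value) if isinstance(value, list) else 1


def field_stats(records, field):
    """Compute the counting statistics for one field over all records."""
    per_record = [occurrences(r, field) for r in records]
    n = len(per_record)
    zeros = sum(1 for c in per_record if c == 0)
    # Assumption: sample variance (n - 1 denominator); undefined for n < 2.
    variance = statistics.variance(per_record) if n > 1 else 0.0
    return {
        "name": field,
        "count": sum(per_record),
        "zeros": zeros,
        "zeros%": 100.0 * zeros / n if n else 0.0,
        "min": min(per_record, default=0),
        "max": max(per_record, default=0),
        "mean": statistics.mean(per_record) if n else 0.0,
        "variance": variance,
        "stdev": math.sqrt(variance),
    }


# Made-up sample data: three records, 'author' occurs 3, 1 and 0 times.
records = [
    {"title": "ABCDEF",
     "author": ["Davis, Miles", "Parker, Charly", "Mingus, Charles"],
     "year": 1950},
    {"title": "GHIJKL", "author": "Coltrane, John"},
    {"title": "MNOPQR"},
]
stats = field_stats(records, "author")
print(stats)
```

In this sketch the per-record occurrence list is the only state needed for every statistic except the uniqueness and entropy figures, which depend on the field values themselves.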
Details:
  * entropy is an indication of the variation in field values (are some values more unique than others)
  * entropy values are displayed as : minimum/maximum entropy
  * when the minimum entropy = 0, then all the field values are equal
  * when the minimum and maximum entropy are equal, then all the field values are different
  * the 'uniq%' and 'entropy' fields are estimated and are normally within 1% of the correct value (this is done to keep the memory requirements of this module low)
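The two boundary cases above can be reproduced with a small Python sketch. It rests on an assumption about what the two numbers mean: the first is taken to be the Shannon entropy of the observed value distribution and the second the largest entropy possible for that many values, log2(n). The sketch computes both exactly, whereas the exporter reports estimates; `entropy_bounds` is a hypothetical helper name.

```python
import math
from collections import Counter


def entropy_bounds(values):
    """Return (entropy, max_entropy) in bits for a list of field values."""
    n = len(values)
    if n == 0:
        return 0.0, 0.0
    counts = Counter(values)
    # Shannon entropy: sum of p * log2(1 / p) over the distinct values.
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    # The entropy is largest when all n values are different.
    return entropy, math.log2(n)


all_equal = entropy_bounds(["a", "a", "a", "a"])      # entropy is 0
all_different = entropy_bounds(["a", "b", "c", "d"])  # entropy equals the maximum
mixed = entropy_bounds(["a", "a", "b", "c"])          # somewhere in between

for label, (low, high) in [("equal", all_equal),
                           ("different", all_different),
                           ("mixed", mixed)]:
    print(f"{label}: {low:.2f}/{high:.2f}")
```

Under this reading, a first value close to the second means the field behaves almost like an identifier, while a value near 0 means one value dominates.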
Each statistical report contains one row named '#' (hash), which holds the total number of records.
    ---
    title: ABCDEF
    author:
      - Davis, Miles
      - Parker, Charly
      - Mingus, Charles
    year: 1950
Examples of operation:
    # Calculate statistics on the number of records that contain a 'title'
    cat data.json | catmandu convert JSON to Stat --fields title

    # Calculate statistics on the number of records that contain a 'title', 'isbn' or 'subject' field
    cat data.json | catmandu convert JSON to Stat --fields title,isbn,subject

    # The next example will not work: deeply nested fields are not allowed
    cat data.json | catmandu convert JSON to Stat --fields foo.bar.x.y
When no fields parameter is given, all fields are read from the first input record.
Catmandu::Exporter, Statistics::Descriptive, Statistics::TopK, Algorithm::HyperLogLog
2023-02-06 | perl v5.36.0 |