file_sorter(3erl) | Erlang Module Definition | file_sorter(3erl) |
file_sorter - File sorter.
This module contains functions for sorting terms on files, merging already sorted files, and checking files for sortedness. Chunks containing binary terms are read from a sequence of files, sorted internally in memory, and written to temporary files, which are then merged to produce one sorted output file. Merging is provided as an optimization: it is faster when the files are already sorted, but sorting always works in place of merging.
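A minimal sketch of sorting one file into another with the default options ({header, 4} and {format, binary_term}). The module name fs_sort_demo, its helper functions, and the file names are hypothetical. The helpers simply write and read the default on-file layout: a 4-byte big-endian length header followed by the term_to_binary/1 encoding of each term.

```erlang
%% Hypothetical demo module; file names are placeholders in the current directory.
-module(fs_sort_demo).
-export([run/0]).

%% Encode each term in the default layout: a 4-byte big-endian length
%% header ({header, 4}) followed by term_to_binary/1 output ({format, binary_term}).
write_terms(File, Terms) ->
    Records = [begin
                   Bin = term_to_binary(T),
                   <<(byte_size(Bin)):32, Bin/binary>>
               end || T <- Terms],
    ok = file:write_file(File, Records).

%% Decode a file written in the same layout back into a list of terms.
read_terms(File) ->
    {ok, Data} = file:read_file(File),
    decode(Data).

decode(<<>>) ->
    [];
decode(<<Size:32, Bin:Size/binary, Rest/binary>>) ->
    [binary_to_term(Bin) | decode(Rest)].

run() ->
    In = "fs_sort_demo_in.bin",
    Out = "fs_sort_demo_out.bin",
    ok = write_terms(In, [{c, 3}, 42, "abc", {a, 1}, atom]),
    %% Input is a list of file names; Output is a single file name.
    ok = file_sorter:sort([In], Out),
    Sorted = read_terms(Out),
    ok = file:delete(In),
    ok = file:delete(Out),
    Sorted.
```

The result follows Erlang term order (numbers, then atoms, then tuples, then lists), so fs_sort_demo:run() returns [42, atom, {a, 1}, {c, 3}, "abc"].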
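On a file, a term is represented by a header and a binary. Two options define the format of terms on files:
{header, HeaderLength}
HeaderLength is the number of bytes preceding each binary and containing the length of the binary in bytes. Defaults to 4. If B is a binary containing a header only, the size Size of the binary is calculated as <<Size:HeaderLength/unit:8>> = B.
{format, Format}
Format determines the function that is applied to binaries to create the terms to be sorted. Defaults to binary_term, which is equivalent to fun binary_to_term/1. Value binary is equivalent to fun(X) -> X end, which means that the binaries are sorted as they are; this is the fastest format. If Format is term, io:read/2 is called to read terms; in that case, only the default value of option header is allowed.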
Option format also determines what is written to the sorted output file: if Format is term, then io:format/3 is called to write each term, otherwise the binary prefixed by a header is written. Notice that the binary written is the same binary that was read; the results of applying function Format are thrown away when the terms have been sorted. Reading and writing terms using the io module is much slower than reading and writing binaries.
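The sketch below illustrates both format options together with the point that the binary written is the one that was read. It assumes a hypothetical module fs_format_demo, placeholder file names, and records that are raw byte strings with a 2-byte header. The format function fun erlang:byte_size/1 is only used as the sort key; the output file still contains the original binaries with their 2-byte headers.

```erlang
%% Hypothetical demo module; file names are placeholders.
-module(fs_format_demo).
-export([run/0]).

%% Each record: a 2-byte big-endian length header ({header, 2}), then the raw bytes.
write_records(File, Bins) ->
    ok = file:write_file(File, [<<(byte_size(B)):16, B/binary>> || B <- Bins]).

read_records(File) ->
    {ok, Data} = file:read_file(File),
    split(Data).

split(<<>>) ->
    [];
split(<<Size:16, B:Size/binary, Rest/binary>>) ->
    [B | split(Rest)].

run() ->
    In = "fs_format_demo_in.bin",
    Out = "fs_format_demo_out.bin",
    ok = write_records(In, [<<"ccc">>, <<"a">>, <<"bb">>]),
    %% Terms are ordered by the value the format function returns (the size),
    %% but the binaries written to Out are the unchanged input binaries.
    Options = [{header, 2}, {format, fun erlang:byte_size/1}],
    ok = file_sorter:sort([In], Out, Options),
    Result = read_records(Out),
    ok = file:delete(In),
    ok = file:delete(Out),
    Result.
```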
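Other options are:
{order, Order}
The default is to sort terms in ascending order. This can be changed with value descending or by specifying an ordering function Fun. Fun(A, B) is to return true if A comes before B in the ordering, otherwise false. An ordering function must be antisymmetric, transitive, and total. Using an ordering function slows down the sort considerably. Functions keysort, keymerge, and keycheck do not accept ordering functions.
{unique, boolean()}
When sorting or merging files, only the first of a sequence of terms that compare equal (==) is output if this option is set to true. Defaults to false, which implies that all terms that compare equal are output. When checking files for sortedness, setting this option to true also checks that no pair of consecutive terms compares equal.
{tmpdir, TempDirectory}
The directory where temporary files are put can be chosen explicitly. The default, implied by value "", is to put temporary files in the same directory as the sorted output file. If output is a function, the directory returned by file:get_cwd() is used instead.
{compressed, boolean()}
Temporary files and the output file can be compressed. Defaults to false. Compressed files can always be read, regardless of this option, but reading and writing compressed files is significantly slower than reading and writing uncompressed files.
{size, Size}
By default, about 512*1024 bytes read from files are sorted internally. This option is rarely needed.
{no_files, NoFiles}
By default, 16 files are merged at a time. This option is rarely needed.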
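A minimal sketch of how order and unique change the result, assuming a hypothetical module fs_order_demo. To avoid files entirely it uses an input function and an output function (described below) with {format, term}, so plain terms go in and come out; the Options argument is added after {format, term}.

```erlang
%% Hypothetical demo module.
-module(fs_order_demo).
-export([run/1]).

%% Single-chunk input function: all terms on the first read, then end_of_input.
input(Terms) ->
    fun(close) -> ok;
       (read) -> {Terms, fun(close) -> ok; (read) -> end_of_input end}
    end.

%% Output function that collects the sorted terms into one list.
output(Acc) ->
    fun(close) -> lists:append(lists:reverse(Acc));
       (Terms) -> output([Terms | Acc])
    end.

run(Options) ->
    file_sorter:sort(input([3, 1, 2, 3, 1]), output([]),
                     [{format, term} | Options]).
```

For example, fs_order_demo:run([{order, descending}, {unique, true}]) should return [3, 2, 1], while an ordering function such as fun(A, B) -> A >= B end keeps the duplicates and sorts in descending order.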
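As an alternative to sorting files, a function of one argument can be specified as input. When called with argument read, the function is assumed to return either of the following:
end_of_input or {end_of_input, Value} when there is no more input (Value is explained below).
{Objects, Fun}, where Objects is a list of binaries or terms depending on the format, and Fun is a new input function.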
Any other value is immediately returned as the value of the current call to sort or keysort. Each input function is called exactly once. If an error occurs, the last function is called with argument close, and its reply is ignored.
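A sketch of an input function over in-memory chunks, assuming a hypothetical module fs_input_demo and a made-up {abort, Why} marker in the chunk list. Each read hands over one chunk and the next input function. When the marker is reached, the function returns {error, Why}, which is not one of the recognized replies, so by the rule above sort/3 is expected to return it unchanged (the disk log example further down relies on the same behavior).

```erlang
%% Hypothetical demo module.
-module(fs_input_demo).
-export([run/1]).

%% Input function over a list of chunks (lists of terms). The
%% {abort, Why} marker is a hypothetical way to simulate a read failure.
input(Chunks) ->
    fun(close) ->
            ok;
       (read) ->
            case Chunks of
                [] -> end_of_input;
                [{abort, Why} | _] -> {error, Why};   % not a recognized reply
                [Chunk | Rest] -> {Chunk, input(Rest)}
            end
    end.

%% Collects the lists of sorted terms handed to the output function.
output(Acc) ->
    fun(close) -> lists:append(lists:reverse(Acc));
       (Terms) -> output([Terms | Acc])
    end.

run(Chunks) ->
    file_sorter:sort(input(Chunks), output([]), {format, term}).
```

With chunks [[3, 1], [4, 2]] the sorted list is [1, 2, 3, 4]; with [[3, 1], {abort, disk_full}] the call is expected to return {error, disk_full}.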
A function of one argument can be specified as output. The results of sorting or merging the input are collected in a non-empty sequence of variable-length lists of binaries or terms, depending on the format. The output function is called with one list at a time and is assumed to return a new output function. Any other return value is immediately returned as the value of the current call to the sort or merge function. Each output function is called exactly once. When some output function has been applied to all of the results or an error occurs, the last function is called with argument close, and the reply is returned as the value of the current call to the sort or merge function.
If a function is specified as input and the last input function returns {end_of_input, Value}, the function specified as output is called with argument {value, Value}. This makes it easy to initiate the sequence of output functions with a value calculated by the input functions.
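A sketch of passing a value from the input side to the output side, assuming a hypothetical module fs_value_demo. The last input function returns {end_of_input, Count}, where Count is the number of terms read. The output function given to sort/3 therefore receives {value, Count} first, and the functions it returns collect the sorted terms.

```erlang
%% Hypothetical demo module.
-module(fs_value_demo).
-export([run/0]).

%% One chunk of terms, then {end_of_input, Count}.
input(Terms) ->
    fun(close) -> ok;
       (read) -> {Terms, last_input(length(Terms))}
    end.

last_input(Count) ->
    fun(close) -> ok;
       (read) -> {end_of_input, Count}
    end.

%% Initial output function: receives {value, Count} before any sorted data.
output_start() ->
    fun({value, Count}) -> output(Count, []);
       (close) -> ok
    end.

%% Subsequent output functions accumulate lists; close returns {Count, Sorted}.
output(Count, Acc) ->
    fun(close) -> {Count, lists:append(lists:reverse(Acc))};
       (Terms) -> output(Count, [Terms | Acc])
    end.

run() ->
    file_sorter:sort(input([c, a, b]), output_start(), {format, term}).
```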
As an example, consider sorting the terms on a disk log file. A function that reads chunks from the disk log and returns a list of binaries is used as input. The results are collected in a list of terms.
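sort(Log) ->
    {ok, _} = disk_log:open([{name,Log}, {mode,read_only}]),
    Input = input(Log, start),
    Output = output([]),
    %% With {format,term}, the input and output functions handle terms, not binaries.
    Reply = file_sorter:sort(Input, Output, {format,term}),
    ok = disk_log:close(Log),
    Reply.

%% Each read returns the next chunk of terms from the log together with
%% a new input function; {error, Reason} is returned as the value of sort.
input(Log, Cont) ->
    fun(close) ->
            ok;
       (read) ->
            case disk_log:chunk(Log, Cont) of
                {error, Reason} ->
                    {error, Reason};
                {Cont2, Terms} ->
                    {Terms, input(Log, Cont2)};
                {Cont2, Terms, _Badbytes} ->
                    {Terms, input(Log, Cont2)};
                eof ->
                    end_of_input
            end
    end.

%% Accumulates the sorted lists; on close, returns them as one list of terms.
output(L) ->
    fun(close) ->
            lists:append(lists:reverse(L));
       (Terms) ->
            output([Terms | L])
    end.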
For more examples of functions as input and output, see the end of the file_sorter module; the term format is implemented with functions.
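The possible values of Reason returned when an error occurs are:
bad_object, {bad_object, FileName}
Applying the format function failed for some binary, or the key(s) could not be extracted from some term.
{bad_term, FileName}
io:read/2 failed to read some term.
{file_error, FileName, Error}
A file operation failed; Error is a file:posix() error code, badarg, or system_limit.
{premature_eof, FileName}
End-of-file was encountered inside some binary term.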
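A small sketch of how an error reason surfaces, assuming a hypothetical module fs_error_demo and a placeholder input file name that is deleted first so that it does not exist. The reply is expected to be an {error, {file_error, FileName, Error}} tuple, with Error being a file:posix() code.

```erlang
%% Hypothetical demo module; file names are placeholders.
-module(fs_error_demo).
-export([run/0]).

run() ->
    Missing = "fs_error_demo_missing.bin",
    Out = "fs_error_demo_out.bin",
    _ = file:delete(Missing),   % make sure the input really does not exist
    Reply = file_sorter:sort([Missing], Out),
    _ = file:delete(Out),       % the output file may or may not have been created
    Reply.
```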
file_name() = file:name()
file_names() = [file:name()]
i_command() = read | close
i_reply() =
end_of_input |
{end_of_input, value()} |
{[object()], infun()} |
input_reply()
infun() = fun((i_command()) -> i_reply())
input() = file_names() | infun()
input_reply() = term()
o_command() = {value, value()} | [object()] | close
o_reply() = outfun() | output_reply()
object() = term() | binary()
outfun() = fun((o_command()) -> o_reply())
output() = file_name() | outfun()
output_reply() = term()
value() = term()
options() = [option()] | option()
option() =
{compressed, boolean()} |
{header, header_length()} |
{format, format()} |
{no_files, no_files()} |
{order, order()} |
{size, size()} |
{tmpdir, tmp_directory()} |
{unique, boolean()}
format() = binary_term | term | binary | format_fun()
format_fun() = fun((binary()) -> term())
header_length() = integer() >= 1
key_pos() = integer() >= 1 | [integer() >= 1]
no_files() = integer() >= 1
order() = ascending | descending | order_fun()
order_fun() = fun((term(), term()) -> boolean())
size() = integer() >= 0
tmp_directory() = [] | file:name()
reason() =
bad_object |
{bad_object, file_name()} |
{bad_term, file_name()} |
{file_error,
file_name(),
file:posix() | badarg | system_limit} |
{premature_eof, file_name()}
check(FileName) -> Reply
check(FileNames, Options) -> Reply
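Types:
FileName = file_name()
FileNames = file_names()
Options = options()
Reply = {ok, [Result]} | {error, reason()}
Result = {FileName, TermPosition, term()}
TermPosition = integer() >= 1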
Checks files for sortedness. If a file is not sorted, the first out-of-order element is returned. The first term on a file has position 1.
check(FileName) is equivalent to check([FileName], []).
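A sketch of check/2 on one sorted and one unsorted file, assuming a hypothetical module fs_check_demo and placeholder file names. The files are written as text terms and checked with {format, term}. Only the unsorted file should appear in the result, with the first out-of-order term (2) at position 3.

```erlang
%% Hypothetical demo module; file names are placeholders.
-module(fs_check_demo).
-export([run/0]).

%% One term per line, terminated by a full stop, as io:read/2 expects.
write_terms(File, Terms) ->
    ok = file:write_file(File, [io_lib:format("~p.~n", [T]) || T <- Terms]).

run() ->
    Sorted = "fs_check_sorted.txt",
    Unsorted = "fs_check_unsorted.txt",
    ok = write_terms(Sorted, [1, 2, 3]),
    ok = write_terms(Unsorted, [1, 3, 2, 4]),
    Reply = file_sorter:check([Sorted, Unsorted], {format, term}),
    ok = file:delete(Sorted),
    ok = file:delete(Unsorted),
    Reply.
```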
keycheck(KeyPos, FileName) -> Reply
keycheck(KeyPos, FileNames, Options) -> Reply
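Types:
KeyPos = key_pos()
FileName = file_name()
FileNames = file_names()
Options = options()
Reply = {ok, [Result]} | {error, reason()}
Result = {FileName, TermPosition, term()}
TermPosition = integer() >= 1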
Checks files for sortedness. If a file is not sorted, the first out-of-order element is returned. The first term on a file has position 1.
keycheck(KeyPos, FileName) is equivalent to keycheck(KeyPos, [FileName], []).
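A sketch showing that keycheck/3 looks only at the key, assuming a hypothetical module fs_keycheck_demo and placeholder file names. The first file is sorted on element 1 even though {a, 9} is greater than {a, 1} as a whole tuple. The second file is out of order at position 3, where key b follows key c.

```erlang
%% Hypothetical demo module; file names are placeholders.
-module(fs_keycheck_demo).
-export([run/0]).

write_terms(File, Terms) ->
    ok = file:write_file(File, [io_lib:format("~p.~n", [T]) || T <- Terms]).

run() ->
    ByKey = "fs_keycheck_bykey.txt",
    Unsorted = "fs_keycheck_unsorted.txt",
    ok = write_terms(ByKey, [{a, 9}, {a, 1}, {b, 0}]),
    ok = write_terms(Unsorted, [{a, 1}, {c, 2}, {b, 3}]),
    Reply = file_sorter:keycheck(1, [ByKey, Unsorted], {format, term}),
    ok = file:delete(ByKey),
    ok = file:delete(Unsorted),
    Reply.
```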
keymerge(KeyPos, FileNames, Output) -> Reply
keymerge(KeyPos, FileNames, Output, Options) -> Reply
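Types:
KeyPos = key_pos()
FileNames = file_names()
Output = output()
Options = options()
Reply = ok | {error, reason()} | output_reply()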
Merges tuples on files. Each input file is assumed to be sorted on key(s).
keymerge(KeyPos, FileNames, Output) is equivalent to keymerge(KeyPos, FileNames, Output, []).
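A sketch of keymerge/4 on two files already sorted on element 1, assuming a hypothetical module fs_keymerge_demo and placeholder file names. The output is an output function, so the merged tuples come back as the value of the call instead of being written to a file.

```erlang
%% Hypothetical demo module; file names are placeholders.
-module(fs_keymerge_demo).
-export([run/0]).

write_terms(File, Terms) ->
    ok = file:write_file(File, [io_lib:format("~p.~n", [T]) || T <- Terms]).

%% Output function that collects the merged tuples into one list.
collect(Acc) ->
    fun(close) -> lists:append(lists:reverse(Acc));
       (Tuples) -> collect([Tuples | Acc])
    end.

run() ->
    A = "fs_keymerge_a.txt",
    B = "fs_keymerge_b.txt",
    %% Both inputs are sorted on element 1, the merge key.
    ok = write_terms(A, [{a, 1}, {c, 3}]),
    ok = write_terms(B, [{b, 2}, {d, 4}]),
    Merged = file_sorter:keymerge(1, [A, B], collect([]), {format, term}),
    ok = file:delete(A),
    ok = file:delete(B),
    Merged.
```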
keysort(KeyPos, FileName) -> Reply
Types:
KeyPos = key_pos()
FileName = file_name()
Reply = ok | {error, reason()}
Sorts tuples on files.
keysort(KeyPos, FileName) is equivalent to keysort(KeyPos, [FileName], FileName).
keysort(KeyPos, Input, Output) -> Reply
keysort(KeyPos, Input, Output, Options) -> Reply
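Types:
KeyPos = key_pos()
Input = input()
Output = output()
Options = options()
Reply = ok | {error, reason()} | input_reply() | output_reply()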
Sorts tuples on files. The sort is performed on the element(s) mentioned in KeyPos. If two tuples compare equal (==) on one element, the next element according to KeyPos is compared. The sort is stable.
keysort(KeyPos, Input, Output) is equivalent to keysort(KeyPos, Input, Output, []).
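A sketch of keysort/4 with a single key and with a list of keys, assuming a hypothetical module fs_keysort_demo that uses an input function and an output function with {format, term}. With KeyPos = 1 the sort is stable, so {b, 1} stays ahead of {b, 0}. With KeyPos = [1, 2], ties on element 1 are broken by element 2.

```erlang
%% Hypothetical demo module.
-module(fs_keysort_demo).
-export([run/1]).

%% Output function that collects the sorted tuples into one list.
collect(Acc) ->
    fun(close) -> lists:append(lists:reverse(Acc));
       (Tuples) -> collect([Tuples | Acc])
    end.

%% KeyPos is an integer or a list of integers, as in key_pos().
run(KeyPos) ->
    Tuples = [{b, 1}, {a, 2}, {b, 0}, {a, 3}],
    %% Single-chunk input function: all tuples on the first read, then end_of_input.
    Input = fun(close) -> ok;
               (read) -> {Tuples, fun(close) -> ok; (read) -> end_of_input end}
            end,
    file_sorter:keysort(KeyPos, Input, collect([]), {format, term}).
```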
merge(FileNames, Output) -> Reply
merge(FileNames, Output, Options) -> Reply
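Types:
FileNames = file_names()
Output = output()
Options = options()
Reply = ok | {error, reason()} | output_reply()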
Merges terms on files. Each input file is assumed to be sorted.
merge(FileNames, Output) is equivalent to merge(FileNames, Output, []).
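A sketch of merge/3 on two sorted text-term files into an output file, assuming a hypothetical module fs_merge_demo and placeholder file names. With {format, term}, the output is written with io:format/3. The sketch assumes the result can be read back with file:consult/1.

```erlang
%% Hypothetical demo module; file names are placeholders.
-module(fs_merge_demo).
-export([run/0]).

write_terms(File, Terms) ->
    ok = file:write_file(File, [io_lib:format("~p.~n", [T]) || T <- Terms]).

run() ->
    A = "fs_merge_a.txt",
    B = "fs_merge_b.txt",
    Out = "fs_merge_out.txt",
    %% Each input file must already be sorted.
    ok = write_terms(A, [1, 4, 5]),
    ok = write_terms(B, [2, 3, 6]),
    ok = file_sorter:merge([A, B], Out, {format, term}),
    %% Assumption: the term-format output is readable by file:consult/1.
    {ok, Merged} = file:consult(Out),
    lists:foreach(fun(F) -> ok = file:delete(F) end, [A, B, Out]),
    Merged.
```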
sort(FileName) -> Reply
Types:
FileName = file_name()
Reply = ok | {error, reason()}
Sorts terms on files.
sort(FileName) is equivalent to sort([FileName], FileName).
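A sketch of sorting a file in place with sort/1, assuming a hypothetical module fs_sort1_demo and a placeholder file name. Because sort/1 uses the default options, the file has to be in the default layout, which is a 4-byte length header followed by term_to_binary/1 output. The file is overwritten with the sorted terms.

```erlang
%% Hypothetical demo module; the file name is a placeholder.
-module(fs_sort1_demo).
-export([run/0]).

%% Default layout: 4-byte big-endian length header + term_to_binary/1 output.
write_terms(File, Terms) ->
    ok = file:write_file(File, [begin
                                    Bin = term_to_binary(T),
                                    <<(byte_size(Bin)):32, Bin/binary>>
                                end || T <- Terms]).

read_terms(File) ->
    {ok, Data} = file:read_file(File),
    decode(Data).

decode(<<>>) ->
    [];
decode(<<Size:32, Bin:Size/binary, Rest/binary>>) ->
    [binary_to_term(Bin) | decode(Rest)].

run() ->
    File = "fs_sort1_demo.bin",
    ok = write_terms(File, [b, 3, a, 1]),
    %% Same as file_sorter:sort([File], File): input and output are one file.
    ok = file_sorter:sort(File),
    Sorted = read_terms(File),
    ok = file:delete(File),
    Sorted.
```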
sort(Input, Output) -> Reply
sort(Input, Output, Options) -> Reply
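Types:
Input = input()
Output = output()
Options = options()
Reply = ok | {error, reason()} | input_reply() | output_reply()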
Sorts terms on files.
sort(Input, Output) is equivalent to sort(Input, Output, []).
stdlib 3.14 | Ericsson AB |