simrisc - This program performs simulations in the context of
breast and lung cancer
simrisc [options] analyses
The analyses argument is the name of the file specifying
the analyses to perform. See section ANALYSES for details.
Simrisc was originally designed around 2010 by Marcel
Greuter at the University Medical Center Groningen, and thereafter modified
in 2015 by Chris de Jonge.
- o
- In addition to breast cancer simulations simrisc can also perform
lung cancer simulations for either men or women. By default breast cancer
simulations are performed.
- o
- The configuration file was reorganized and extended with parameters used
when performing lung cancer simulations: specifications for CT
scans; additional Tumor Beir7, Growth, Incidence, and
Survival parameters, and a new section S3 containing the
age-range specific probabilities of encountering metastases.
- o
- Simrisc offers the new option --cancer specifying the type
of simulation.
When performing lung cancer simulations only the CT
modality can be specified. When performing breast cancer simulations the
CT modality cannot be specified, but as with previous versions: any
combination of the Mammo, Tomo and MRI modalities can be
specified.
Short options are provided between parentheses, immediately
following their long option equivalents. Several parameters specify the
path-names of files produced by simrisc. If a path-name starts with a
tilde character (~) then the tilde is replaced by the user’s home
directory. An initial + is replaced by the program’s base directory
(see option base). When an analysis uses multiple iterations,
’$’ characters in filename specifications are replaced by the
analysis’ iteration index.
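The substitution rules above can be sketched in a few lines of Python. This is an illustration only, not simrisc's own code: the function name expand_filename, its parameters, and the handling of a slash directly after the + are assumptions made for this example.

```python
import os


def expand_filename(spec: str, base: str = "./", iteration: int = 0) -> str:
    """Expand a filename specification following the rules described above.

    Hypothetical helper: name, signature and edge-case handling are assumed.
    """
    if spec.startswith("~"):
        # A leading tilde stands for the user's home directory.
        spec = os.path.expanduser("~") + spec[1:]
    elif spec.startswith("+"):
        # A leading plus stands for the program's base directory.
        # Assumption: a slash directly after the plus is not significant.
        spec = os.path.join(base, spec[1:].lstrip("/"))
    # Every '$' is replaced by the iteration index of the analysis.
    return spec.replace("$", str(iteration))
```

For example, with base directory /tmp/ and iteration index 2 the specification +data-$.txt would expand to /tmp/data-2.txt. Whether simrisc counts iterations from 0 or from 1 is not stated here, so the index is passed in by the caller.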
All single-letter options referring to filesystem entries
(directories, filenames) are capitalized, all other single-letter options
are lowercase.
- o
- --base=basedir (-B)
the base directory in which the output files are written (default:
./). If basedir doesn’t exist it is created by the program. If the
directory cannot be created an exception is thrown, terminating the
program. basedir may be specified as a relative or as an absolute
directory location;
- o
- --cancer=type (-c)
The simulation type (type) can be specified as breast to
perform breast cancer simulations. Breast cancer simulations are performed
by default when the --cancer option is not specified.
Alternatively, to perform lung cancer simulations type must be
specified as either male or female to perform simulations
for, respectively, male or female cases.
- Be advised that the default configuration file specifies Screening
Mammo rounds, which must either be changed to CT in locally
used configuration files or in analysis: sections (see section
ANALYSES below);
- o
- --config=path (-C)
the location of the configuration file. By default
~/.config/simrisc is used;
- o
- --data=path (-D)
path name of the file to contain the data of the cases generated by the
simulation (default: ’<base>/data-$.txt’). If a data
file should not be written specify ! (mnemonic: the logical not
operator, i.e., --data !). See section OUTPUT for a
description of the generated data;
- o
- --death-age=age (-a)
run one simulation using a specific natural death-age. This option also
requires the specification of tumor-age, and is mutually exclusive
with the case option;
- o
- --help (-h)
shows help information and terminates;
- o
- --last-case=nCases (-l)
perform simulations until nCases cases have been analyzed and only
write the data for the final case to the data file. The rounds and
sensitivity files contain the summarized results of all
nCases analyzed cases;
- o
- --one-analysis (-o)
the program’s arguments specify the parameters of a single analysis,
rather than the name of an analyses-specification file. The
program’s arguments are optional and are used to alter the
parameter values as defined in the config file or to define label
specifications. See section ANALYSES for details;
- o
- --parameters=path (-P)
path name of the file showing the actually used parameter specifications. By
default no parameter file is written. If the --base (-B) option was
specified then path is written in the base directory if path
does not contain a slash (/) (use ./path to write the parameters
file in the current directory if --base was specified);
- o
- --rounds=path (-R)
path name of the file containing the summary info of the simulation
rounds (default: ’<base>/rounds-$.txt’). If a rounds
file should not be written specify ! (i.e., --rounds !). See
section OUTPUT for a description of the generated summary
info;
- o
- --spread=path (-s)
path name of the file containing the configured and actually used parameter
values when spread: true is specified (default:
’<base>/spread-$.txt’). If this file should not be
written specify ! (mnemonic: the logical not operator, i.e.,
--spread !). If a parameter doesn’t use spreading then the
’using’ part is omitted. See section OUTPUT for a
sample of its content;
- o
- --sensitivity=path (-S)
path name of the file containing the summary info of the
simulation’s sensitivity data (default:
’<base>/sensitivity.txt’). If a sensitivity file
should not be written specify ! (i.e., --sensitivity !). See
section OUTPUT for a description of the produced sensitivity
summary;
- o
- --tumor-age=age (-t)
run one simulation using a specific tumor self-detect age. This option also
requires the specification of death-age, and is mutually exclusive
with the case option;
- o
- --verbose (-V)
provides additional information while running;
- o
- --version (-v)
shows simrisc’s version information and terminates;
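Two of the path conventions in the option list (the ! specification that suppresses an output file, and the placement of the --parameters file) can be summarized as follows. This is a sketch of the rules as worded above; the function names are hypothetical and the code is not taken from simrisc.

```python
import os
from typing import Optional


def output_enabled(spec: str) -> bool:
    """A specification of '!' means that the file is not written."""
    return spec != "!"


def parameters_path(path: str, base: Optional[str] = None) -> str:
    """Location of the parameters file according to the --parameters text.

    base is None when --base was not specified on the command line.
    """
    if base is not None and "/" not in path:
        # No slash in the path: the file ends up in the base directory.
        return os.path.join(base, path)
    # Paths containing a slash (e.g. ./params.txt) are used as given.
    return path
```

So, assuming --base=/tmp/out, the specification params.txt would be written as /tmp/out/params.txt, while ./params.txt stays in the current directory.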
Unless the --one-analysis option is used the
program’s first and only required argument is the name of a file
providing the details of the analyses to perform. These files are called
analysis files. Analysis files must be standard ASCII text files,
i.e., they can only contain 7-bit ASCII printable and white-space
characters. Identifiers used in analysis files and in configuration files
are interpreted case sensitively.
Configuration specifications starting with uppercase letters (like
Scenario: and Costs:) specify (sub)sections and don’t
contain additional specifications. Specifications starting with lowercase
letters (like ageGroup:) are followed by actual parameter values. For
a complete overview refer to the simriscparams(7) man-page.
Analysis files may define multiple analyses. Each analysis
specification must begin with a line containing
Analysis:
At each Analysis: specification the program’s initial
configuration is reset.
Options specified on the command-line cannot be specified in
Analysis: sections and remain active while simrisc is running.
The default option values are reset at each separate Analysis: unless
an option has been specified on the command-line, in which case those option
values are used throughout the simrisc run.
The lines following an Analysis: line specify the characteristics
of that analysis. Each Analysis: section may contain the following
specifications, in this order:
- o
- a label: line: label: lines, when used, must immediately
follow Analysis: lines. The text following label: is written
at the top of the output files;
- o
- option lines: specifying simrisc options (not specified on the
command line) which are then used for that analysis. When program options
are specified their long option names must be used. E.g.:
base: /tmp/
last-case: 20
- o
- parameter specifications: modify (some) parameter specifications defined
in configuration files. When parameters of configuration file sections
(cf. simriscparams(7)) are not specified then the parameters
specified in the configuration file are used.
All specifications in Analysis: sections are optional. An
Analysis: section merely containing the line Analysis: defines
an analysis using the explicitly specified command-line options or the
default program options and using the parameter specifications provided in
the configuration file.
Empty lines, initial and trailing white-space, and all characters
on lines starting at the hash-mark (#) are ignored and may be used
anywhere in analysis files.
Lines not conforming to the above description result in error
messages, causing simrisc to end.
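As an illustration of the line handling described above (comments, blank lines and surrounding white space are ignored, and each Analysis: line starts a new analysis), here is a minimal Python sketch. It does not validate the individual specifications the way simrisc does, and treating text before the first Analysis: line as an error is an assumption of this example.

```python
def split_analyses(text: str) -> list[list[str]]:
    """Split the content of an analysis file into one list of lines per analysis.

    Hypothetical helper, not simrisc's parser.
    """
    analyses: list[list[str]] = []
    for raw in text.splitlines():
        # Drop everything from the hash-mark on, then surrounding white space.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # empty lines are ignored
        if line == "Analysis:":
            analyses.append([])  # a new analysis starts here
        elif not analyses:
            # Assumption: specifications must belong to an Analysis: section.
            raise ValueError(f"specification before first Analysis: {line!r}")
        else:
            analyses[-1].append(line)
    return analyses
```

An Analysis: line without any following specifications yields an empty list, matching the description of an analysis that only uses command-line options and configuration file parameters.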
Filenames specified in Analysis: sections may start with a
tilde character (~), which is replaced by the user’s home directory,
or with a + character, which is replaced by the
program’s base directory (see option base). When an analysis
performs multiple iterations, ’$’ characters in filename
specifications are replaced by the analysis’ iteration index.
Multiple analysis sections should not specify identically named
output files, as the output files are (re)written for each separate
analysis.
Analysis sections are commonly used to alter the default
specifications of the configuration file. E.g., the default number of
iterations equals 1. By specifying
Scenario:
iterations: 3
the analysis performs 3 iterations.
Parameters are either read from the configuration file or they are
redefined in Analysis: sections. E.g., in the provided configuration
file screening rounds use two-year intervals between the ages of 50 and 74.
To use 5-year intervals between the ages of 50 and 65 instead,
an Analysis: section could specify, e.g.,
Screening:
round: 50 Mammo MRI
round: 55 Mammo MRI
round: 60 Mammo MRI
round: 65 Mammo MRI
When the --one-analysis option is used parameters are
modified by providing comma-separated parameter specifications as program
command-line arguments. E.g., to perform one analysis, writing the data file
to /tmp/data, simulating 1000 cases, and using 20 as seed for the
random number generator the command
simrisc -D /tmp/data -o Scenario:, cases: 1000, seed: 20
can be used. Note that when using the one-analysis option parameter
section names must precede parameter specifications. E.g., since the
parameters cases and seed are defined in the Scenario:
section (cf. simriscparams(7)) they must be preceded by the
Scenario: specification.
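When such commands are generated from a script, the argument list can be assembled as sketched below. The helper name one_analysis_argv is hypothetical; it only builds the arguments for a single parameter section in the form shown in the example command and does not run simrisc.

```python
import shlex


def one_analysis_argv(data_path: str, section: str, params: dict) -> list[str]:
    """Build the argument list of a 'simrisc -o' call for one parameter section."""
    # The section name comes first, followed by comma-separated parameters.
    spec = ", ".join([f"{section}:"] + [f"{key}: {value}" for key, value in params.items()])
    # Split on white space, as the shell would do for the unquoted example command.
    return ["simrisc", "-D", data_path, "-o", *spec.split()]


argv = one_analysis_argv("/tmp/data", "Scenario", {"cases": 1000, "seed": 20})
print(shlex.join(argv))
```

The resulting list could then be passed to subprocess.run, assuming simrisc is installed and found via PATH.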
When an Analysis: specification modifies parameters, then
subsequent Analysis: sections start from the unmodified option and
parameter specifications.
Here is an example of an analysis file specifying two
analyses:
Analysis:
base: 1
cancer: male
parameters: +params.txt
Scenario:
cases: 10
Screening:
round: 50 CT
round: 55 CT
Analysis:
base: 2
config: ~/src/simrisc/stdconfig/lung
parameters: +params.txt
cancer: breast
Scenario:
cases: 20
spread: true
Screening:
round: 50 Mammo MRI
round: 55 Mammo MRI
round: 60 Mammo MRI
round: 65 Mammo MRI
The first lines of the generated files contain time stamps showing
the date and time when the files were written and the used SimRisc
version. Here is an example, following the RFC 2822 format for the
timestamp:
Mon, 14 Nov 2022 15:30:26 +0100 (SimRisc V. 15.00.00)
If label: lines are used then the time stamp is followed by
the label specifications, which is then followed by an empty line. After
this header the file’s specific data are shown.
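The header line can be taken apart with Python's standard library, as sketched below. The function name parse_header is hypothetical, and the sketch assumes that the version is always written as "(SimRisc V. x.y.z)" as in the example line.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def parse_header(line: str) -> tuple[datetime, str]:
    """Split a header line into its RFC 2822 time stamp and the version string."""
    stamp, _, rest = line.strip().partition(" (")
    # rest looks like 'SimRisc V. 15.00.00)'; keep only the version number.
    version = rest.rstrip(")").removeprefix("SimRisc V. ")
    return parsedate_to_datetime(stamp), version


when, version = parse_header("Mon, 14 Nov 2022 15:30:26 +0100 (SimRisc V. 15.00.00)")
```

Here when is a timezone-aware datetime (2022-11-14 15:30:26 at UTC+01:00) and version is the string 15.00.00.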
The data in all files (except for the file listing the actually
used parameters, cf. option --parameters (-P)) are written using
the standard comma-separated format (cf. RFC 4180). The initial lines
contain table headings and column labels documenting the meanings of the
various columns. Likewise, a final line ends each table.
Data of simulated cases
For each simulated case the values of the following variables are
written to file (one line of comma-separated values per simulated case):
- o
- case: the (0-based) case-index;
- o
- cause of death: either Natural or Tumor;
- o
- death age: the case’s age of death;
- o
- natural death age: the case’s natural age of death (if no
tumor occurs);
- o
- death status: a numeric index specifying how and at what stage the
case died:
1: natural death in the pre-screening phase,
2: natural death in the screening phase,
3: natural death in the post-screening phase,
4: tumor caused death in the pre-screening phase,
5: tumor caused death in the screening phase,
6: tumor caused death in the post-screening phase;
- o
- tumor present: Yes if the simulation resulted in a tumor,
No if no tumor occurred;
- o
- tumor detected: Yes if the tumor was detected, No if
not;
- o
- interval tumor: Yes if the tumor was an interval tumor,
No if not;
- o
- tumor diameter: the tumor’s diameter in mm when it was
detected. 0.00 is shown if no tumor occurred. In the exceptional case
where the simulation produced a tumor whose diameter exceeded 1000 mm the
value 1001 is shown.
- o
- tumor doubling days: the time (in days) it takes for the tumor to
double its size;
- o
- tumor preclinical period: the age at which the tumor is potentially
detectible by screening;
- o
- tumor onset age: the age at which the tumor first occurred;
- o
- tumor self-detect age: the age at which the tumor was
self-detected. This age is the result of the simulation, and may exceed
the case’s actual death age (if so, the case’s data report
that no tumor is present);
- o
- tumor death age: the age at which the tumor caused or would have
caused the case’s death. The simulation process uses ages ranging
from 0 through 100. If the age at which the tumor causes the case’s
death exceeds 100, then 100.00 is reported;
- o
- costs: the case’s screening and (if applicable) treatment
costs;
- o
- self-detection indicator: 1 if the tumor was self-detected, 0 if
not (also if there’s no tumor);
- o
- detection round: 0-based round index at which the tumor was
detected (or 1) if the tumor was self-detected, 0 if not (also if
there’s no tumor).
- o
- screening rounds: this column shows which screening rounds
were attended by the simulated case, and if so whether false negative or
false positive diagnoses were made. The following digits are used:
- o
- 0: the case did not attend this screening round;
- o
- 1: the case did attend this screening round;
- o
- 2: the case did attend this screening round, resulting in a false negative
diagnosis;
- o
- 3: the case did attend this screening round, resulting in a false positive
diagnosis.
There are as many digits as screening rounds. The leftmost
digit refers to the first screening round, the rightmost digit to the last
screening round. E.g., using 13 screening rounds the following indicators
could be obtained:
0011311110000
Using screening round indices (which are also used to refer to rounds in the
rounds-$.txt files), this case did not attend screening rounds 0,
1, 9, 10, 11 and 12, and at round 4 a false positive diagnosis was obtained.
Note that the screening round indices start at 0: the first screening
round is indicated by index 0.
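The numeric codes in the case data can be decoded as sketched below. The helper names are hypothetical; the mappings follow the death status and screening rounds descriptions above.

```python
PHASES = ("pre-screening", "screening", "post-screening")

ROUND_CODES = {
    "0": "not attended",
    "1": "attended",
    "2": "attended, false negative",
    "3": "attended, false positive",
}


def death_status(code: int) -> tuple[str, str]:
    """Translate death status 1..6 into (cause of death, phase)."""
    if not 1 <= code <= 6:
        raise ValueError(f"death status out of range: {code}")
    cause = "Natural" if code <= 3 else "Tumor"
    return cause, PHASES[(code - 1) % 3]


def rounds_with(indicators: str, digit: str) -> list[int]:
    """Return the 0-based indices of the screening rounds showing the given digit."""
    return [index for index, value in enumerate(indicators) if value == digit]


indicators = "0011311110000"
skipped = rounds_with(indicators, "0")          # rounds that were not attended
false_positives = rounds_with(indicators, "3")  # rounds with a false positive
labels = [ROUND_CODES[value] for value in indicators]
```

For the example indicators, skipped holds the indices 0, 1, 9, 10, 11 and 12, and false_positives holds index 4.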
Actually used spread-values
When spread: true is specified then by default the actually
used and original parameter values are written to the file
spread-$.txt, where $ is replaced by the loop’s
iteration index. Here is a sample from the content of such a file, showing
the values of the Tumor: DoublingTime: ageGroup parameters:
Tumor:
DoublingTime
ageGroup: 1 - 50 configured: 4.38, using: 3.41972
ageGroup: 50 - 70 configured: 5.06, using: 4.83591
ageGroup: 70 - * configured: 5.24, using: 5.30492
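Lines like the ageGroup lines in this sample can be picked apart with a regular expression. The sketch below is based only on the sample shown here, so the pattern and the function name parse_spread_line are assumptions; other parameters in the spread file may be formatted differently.

```python
import re
from typing import Optional

# begin - end (where end may be '*'), the configured value, and optionally
# the value that was actually used.
SPREAD_LINE = re.compile(
    r"ageGroup:\s*(\S+)\s*-\s*(\S+)\s+configured:\s*([^,\s]+)"
    r"(?:,\s*using:\s*(\S+))?"
)


def parse_spread_line(line: str) -> Optional[tuple]:
    """Return (begin, end, configured, using) or None if the line doesn't match."""
    match = SPREAD_LINE.search(line)
    if match is None:
        return None
    begin, end, configured, using = match.groups()
    # The 'using' part is absent when the parameter doesn't use spreading.
    return begin, end, float(configured), None if using is None else float(using)
```

The age range boundaries are returned as strings because the upper boundary of the last group is written as an asterisk.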
- o
- ~/.config/simrisc: the default location of the program’s
configuration file;
- o
- the simrisc distribution archive contains the default configuration
file as simrisc-VERSION/stdconfig/simrisc, where VERSION is
replaced by simrisc’s actual release version;
- o
- when installing simrisc using Linux distribution archives (e.g.,
.deb files) the default configuration file is commonly available as
/usr/share/doc/simrisc/simrisc.gz
This is free software, distributed under the terms of the GNU
General Public License (GPL).
Frank B. Brokken (f.b.brokken@rug.nl).