stalin - A global optimizing compiler for Scheme
stalin
[-version]
[-I include-directory]*
[[-s|-x|-q|-t]]
[[-treat-all-symbols-as-external|
-do-not-treat-all-symbols-as-external]]
[[-index-allocated-string-types-by-expression|
-do-not-index-allocated-string-types-by-expression]]
[[-index-constant-structure-types-by-slot-types|
-do-not-index-constant-structure-types-by-slot-types]]
[[-index-constant-structure-types-by-expression|
-do-not-index-constant-structure-types-by-expression]]
[[-index-allocated-structure-types-by-slot-types|
-do-not-index-allocated-structure-types-by-slot-types]]
[[-index-allocated-structure-types-by-expression|
-do-not-index-allocated-structure-types-by-expression]]
[[-index-constant-headed-vector-types-by-element-type|
-do-not-index-constant-headed-vector-types-by-element-type]]
[[-index-constant-headed-vector-types-by-expression|
-do-not-index-constant-headed-vector-types-by-expression]]
[[-index-allocated-headed-vector-types-by-element-type|
-do-not-index-allocated-headed-vector-types-by-element-type]]
[[-index-allocated-headed-vector-types-by-expression|
-do-not-index-allocated-headed-vector-types-by-expression]]
[[-index-constant-nonheaded-vector-types-by-element-type|
-do-not-index-constant-nonheaded-vector-types-by-element-type]]
[[-index-constant-nonheaded-vector-types-by-expression|
-do-not-index-constant-nonheaded-vector-types-by-expression]]
[[-index-allocated-nonheaded-vector-types-by-element-type|
-do-not-index-allocated-nonheaded-vector-types-by-element-type]]
[[-index-allocated-nonheaded-vector-types-by-expression|
-do-not-index-allocated-nonheaded-vector-types-by-expression]]
[[-no-clone-size-limit|
-clone-size-limit number-of-expressions]]
[-split-even-if-no-widening]
[[-fully-convert-to-CPS|
-no-escaping-continuations]]
[-du]
[-Ob] [-Om] [-On] [-Or]
[-Ot]
[-d0] [-d1] [-d2] [-d3]
[-d4] [-d5] [-d6] [-d7]
[-closure-conversion-statistics]
[-dc] [-dC] [-dH] [-dg]
[-dh]
[-d]
[-architecture name]
[[-baseline|
-conventional|
-lightweight]]
[[-immediate-flat|
-indirect-flat|
-immediate-display|
-indirect-display|
-linked]]
[[-align-strings|-do-not-align-strings]]
[-de] [-df] [-dG] [-di] [-dI]
[-dp] [-dP]
[-ds] [-dS] [-Tmk]
[-no-tail-call-optimization]
[-db] [-c] [-k]
[-cc C-compiler]
[-copt C-compiler-option]*
[pathname]
Compiles the Scheme source file pathname.sc first into a C
file pathname.c and then into an executable image pathname.
Also produces a database file pathname.db. The pathname
argument is required unless -version is specified.
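As a minimal sketch of the workflow just described, the following shell
fragment writes a small Scheme source file and compiles it. The program name
hello and its contents are hypothetical. The -On option is passed because, as
noted under that option, this release cannot generate overflow-checking code.
The compile step runs only if a stalin binary is found on the PATH.

```shell
#!/bin/sh
# Hypothetical example program; the name "hello" is an assumption.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > hello.sc <<'EOF'
(display "Hello, world")
(newline)
EOF

# Stalin takes the pathname without the .sc extension.
if command -v stalin >/dev/null 2>&1; then
    # Should leave hello and hello.db; hello.c is deleted unless -k is given.
    stalin -On hello
    ./hello
else
    echo "stalin not found; skipping compilation" >&2
fi
```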
Stalin is an extremely efficient compiler for Scheme. It is
designed to be used not as a development tool but rather as a means to
generate efficient executable images either for application delivery or for
production research runs. In contrast to traditional Scheme implementations,
Stalin is a batch-mode compiler. There is no interactive READ-EVAL-PRINT
loop. Stalin compiles a single Scheme source file into an executable image
(indirectly via C). Running that image has equivalent semantics to loading
the Scheme source file into a virgin Scheme interpreter and then terminating
its execution. The chief limitation is that it is not possible to LOAD or
EVAL new expressions or procedure definitions into a running program after
compilation. In return for this limitation, Stalin does substantial global
compile-time analysis of the source program under this closed-world
assumption and produces executable images that are small, stand-alone, and
fast.
Stalin incorporates numerous strategies for generating efficient
code. Among them, Stalin does global static type analysis using a soft type
system that supports recursive union types. Stalin can determine a narrow or
even monomorphic type for each source code expression in arbitrary Scheme
programs with no type declarations. This allows Stalin to reduce, or often
eliminate, run-time type checking and dispatching. Stalin also does
low-level representation selection on a per-expression basis. This allows
the use of unboxed base machine data representations for all monomorphic
types resulting in extremely high-performance numeric code. Stalin also does
global static life-time analysis for all allocated data. This allows much
temporary allocated storage to be reclaimed without garbage collection.
Finally, Stalin has very efficient strategies for compiling closures.
Together, these compilation techniques synergistically yield efficient
object code. Furthermore, the executable images created by Stalin do not
contain (user-defined or library) procedures that aren't called, variables
and parameters that aren't used, and expressions that cannot be reached.
This encourages a programming style whereby one creates and uses very
general library procedures without fear that executable images will suffer
from code bloat.
- -version
- Prints the version of Stalin and exits immediately.
The following options control preprocessing:
- -I
- Specifies the directories to search for Scheme include files. This option
can be repeated to specify multiple directories. Stalin first searches for
include files in the current directory, then each of the directories
specified in the command line, and finally in the default installation
include directory.
- -s
- Includes the macros from the Scheme->C compatibility library.
Currently, this defines the WHEN and UNLESS syntax.
- -x
- Includes the macros from the Xlib and GL library. Currently, this defines
the FOREIGN-FUNCTION and FOREIGN-DEFINE syntax. This implies
-s.
- -q
- Includes the macros from the QobiScheme library. Currently, this defines
the DEFINE-STRUCTURE syntax, among other things. This implies
-x.
- -t
- Includes the macros needed to compile Stalin with itself. This implies
-q.
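The search order given for -I can be sketched as a small shell function. This
only illustrates the documented order and is not Stalin's own lookup code.
The function name find_include and its argument layout are hypothetical; the
default directory in the usage line is the one listed among the files at the
end of this page.

```shell
#!/bin/sh
# find_include NAME DEFAULT_DIR [DIR...]
# Hypothetical helper mimicking the documented order: the current directory,
# then each -I directory in command-line order, then the default directory.
find_include() {
    name=$1
    default_dir=$2
    shift 2
    for dir in . "$@" "$default_dir"; do
        if [ -f "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}

# Usage: two -I style directories are searched before the default directory.
find_include QobiScheme.sc /usr/local/stalin/include ./lib ./vendor ||
    echo "QobiScheme.sc not found" >&2
```

The first match wins, so a file in the current directory shadows one of the
same name in any -I directory or in the installation directory.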
The following options control the precision of flow analysis:
- -treat-all-symbols-as-external
- During flow analysis, generate a single abstract external symbol that is
shared among all symbols.
- -do-not-treat-all-symbols-as-external
- During flow analysis, when processing constant expressions that contain
symbols, generate a new abstract internal symbol for each distinct symbol
constant in the program. This is the default.
- -index-allocated-string-types-by-expression
- During flow analysis, when processing procedure-call expressions that can
allocate strings, generate a new abstract string for each such expression.
This is the default.
- -do-not-index-allocated-string-types-by-expression
- During flow analysis, when processing procedure-call expressions that can
allocate strings, generate a single abstract string that is shared among
all such expressions.
Note that there are no versions of the above options for element
type because the element type of a string is always char. Furthermore, there
are no versions of the above options for constant expressions because there
is always only a single abstract constant string.
- -index-constant-structure-types-by-slot-types
- During flow analysis, when processing constant expressions that contain
structures, generate a new abstract structure for each set of potential
slot types for that structure.
- -do-not-index-constant-structure-types-by-slot-types
- During flow analysis, when processing constant expressions that contain
structures, generate a single abstract structure that is shared among all
sets of potential slot types for that structure. This is the default.
- -index-constant-structure-types-by-expression
- During flow analysis, when processing constant expressions that contain
structures, generate a new abstract structure for each such expression.
This is the default.
- -do-not-index-constant-structure-types-by-expression
- During flow analysis, when processing constant expressions that contain
structures, generate a single abstract structure that is shared among all
such expressions.
- -index-allocated-structure-types-by-slot-types
- During flow analysis, when processing procedure-call expressions that can
allocate structures, generate a new abstract structure for each set of
potential slot types for that structure.
- -do-not-index-allocated-structure-types-by-slot-types
- During flow analysis, when processing procedure-call expressions that can
allocate structures, generate a single abstract structure that is shared
among all sets of potential slot types for that structure. This is the
default.
- -index-allocated-structure-types-by-expression
- During flow analysis, when processing procedure-call expressions that can
allocate structures, generate a new abstract structure for each such
expression. This is the default.
- -do-not-index-allocated-structure-types-by-expression
- During flow analysis, when processing procedure-call expressions that can
allocate structures, generate a single abstract structure that is shared
among all such expressions.
Note that, currently, pairs are the only kind of structure that
can appear in constant expressions. This may change in the future, if the
reader is extended to support other kinds of structures.
- -index-constant-headed-vector-types-by-element-type
- During flow analysis, when processing constant expressions that contain
headed vectors, generate a new abstract headed vector for each potential
element type for that headed vector.
- -do-not-index-constant-headed-vector-types-by-element-type
- During flow analysis, when processing constant expressions that contain
headed vectors, generate a single abstract headed vector that is shared
among all potential element types for that headed vector. This is the
default.
- -index-constant-headed-vector-types-by-expression
- During flow analysis, when processing constant expressions that contain
headed vectors, generate a new abstract headed vector for each such
expression. This is the default.
- -do-not-index-constant-headed-vector-types-by-expression
- During flow analysis, when processing constant expressions that contain
headed vectors, generate a single abstract headed vector that is shared
among all such expressions.
- -index-allocated-headed-vector-types-by-element-type
- During flow analysis, when processing procedure-call expressions that can
allocate headed vectors, generate a new abstract headed vector for each
potential element type for that headed vector.
- -do-not-index-allocated-headed-vector-types-by-element-type
- During flow analysis, when processing procedure-call expressions that can
allocate headed vectors, generate a single abstract headed vector that is
shared among all potential element types for that headed vector. This is
the default.
- -index-allocated-headed-vector-types-by-expression
- During flow analysis, when processing procedure-call expressions that can
allocate headed vectors, generate a new abstract headed vector for each
such expression. This is the default.
- -do-not-index-allocated-headed-vector-types-by-expression
- During flow analysis, when processing procedure-call expressions that can
allocate headed vectors, generate a single abstract headed vector that is
shared among all such expressions.
- -index-constant-nonheaded-vector-types-by-element-type
- During flow analysis, when processing constant expressions that contain
nonheaded vectors, generate a new abstract nonheaded vector for each
potential element type for that nonheaded vector.
- -do-not-index-constant-nonheaded-vector-types-by-element-type
- During flow analysis, when processing constant expressions that contain
nonheaded vectors, generate a single abstract nonheaded vector that is
shared among all potential element types for that nonheaded vector. This
is the default.
- -index-constant-nonheaded-vector-types-by-expression
- During flow analysis, when processing constant expressions that contain
nonheaded vectors, generate a new abstract nonheaded vector for each such
expression. This is the default.
- -do-not-index-constant-nonheaded-vector-types-by-expression
- During flow analysis, when processing constant expressions that contain
nonheaded vectors, generate a single abstract nonheaded vector that is
shared among all such expressions.
- -index-allocated-nonheaded-vector-types-by-element-type
- During flow analysis, when processing procedure-call expressions that can
allocate nonheaded vectors, generate a new abstract nonheaded vector for
each potential element type for that nonheaded vector.
- -do-not-index-allocated-nonheaded-vector-types-by-element-type
- During flow analysis, when processing procedure-call expressions that can
allocate nonheaded vectors, generate a single abstract nonheaded vector
that is shared among all potential element types for that nonheaded
vector. This is the default.
- -index-allocated-nonheaded-vector-types-by-expression
- During flow analysis, when processing procedure-call expressions that can
allocate nonheaded vectors, generate a new abstract nonheaded vector for
each such expression. This is the default.
- -do-not-index-allocated-nonheaded-vector-types-by-expression
- During flow analysis, when processing procedure-call expressions that can
allocate nonheaded vectors, generate a single abstract nonheaded vector
that is shared among all such expressions.
Note that, currently, constant expressions cannot contain
nonheaded vectors and nonheaded vectors are never allocated by any
procedure-call expression. ARGV is the only nonheaded vector. These options
are included only for completeness and in case future extensions to the
language allow nonheaded vector constants and procedures that allocate
nonheaded vectors.
- -no-clone-size-limit
- Allow unlimited polyvariance, i.e. make copies of procedures of any
size.
- -clone-size-limit
- Specify the polyvariance limit, i.e. make copies of procedures that have
fewer than this many expressions. Must be a nonnegative integer. Defaults
to 80. Specify 0 to disable polyvariance.
- -split-even-if-no-widening
- Normally, polyvariance will make a copy of a procedure only if it is
called with arguments of different types. Specify this option to make
copies of procedures even when they are called with arguments of the same
type. This will allow them to be in-lined.
- -fully-convert-to-CPS
- Normally, lightweight CPS conversion is applied, converting only those
expressions and procedures needed to support escaping continuations. When
this option is specified, the program is fully converted to CPS.
- -no-escaping-continuations
- Normally, full continuations are supported. When this option is specified,
the only continuations that are supported are those that cannot be called
after the procedure that created the continuation has returned.
- -du
- Normally, after flow analysis, Stalin forces each type set to have at most
one structure-type member of a given name, at most one headed-vector-type
member, and at most one nonheaded-vector-type member. This option disables
this, allowing type sets to have multiple structure-type members of a
given name, multiple headed-vector-type members, and multiple
nonheaded-vector-type members. Sometimes yields more efficient code and
sometimes yields less efficient code.
The following options control the amount of run-time
error-checking code generated. Note that, independent of the settings of
these options, Stalin will always generate code that obeys the semantics of
the Scheme language for correct programs. These options only control the
level of safety, that is the degree of run-time error checking for incorrect
programs.
- -Ob
- Specifies that code to check for out-of-bound vector or string subscripts
is to be suppressed. If not specified, a run-time error will be issued if
a vector or string subscript is out of bounds. If specified, the behavior
of programs that have an out-of-bound vector or string subscript is
undefined.
- -Om
- Specifies that code to check for out-of-memory errors is to be suppressed.
If not specified, a run-time error will be issued if sufficient memory
cannot be allocated. If specified, the behavior of programs that run out
of memory is undefined.
- -On
- Specifies that code to check for exact integer overflow is to be
suppressed. If not specified, a run-time error will be issued on exact
integer overflow. If specified, the behavior of programs that cause exact
integer overflow is undefined. Currently, Stalin does not know how to
generate overflow checking code so this option must be specified.
- -Or
- Specifies that code to check for various run-time file-system errors is to
be suppressed. If not specified, a run-time error will be issued when an
unsuccessful attempt is made to open or close a file. If specified, the
behavior of programs that make such unsuccessful file-access attempts is
undefined.
- -Ot
- Specifies that code to check that primitive procedures are passed
arguments of the correct type is to be suppressed. If not specified, a run-time
error will be issued if a primitive procedure is called with arguments of
the wrong type. If specified, the behavior of programs that call a
primitive procedure with data of the wrong type is undefined.
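One way to use these options is to keep run-time checks while debugging and
suppress them for a production build. The following is a sketch only: the
function name safety_flags, the build names debug and release, and the
program name myprog are hypothetical. -On appears in both cases because the
text above says it must currently be specified.

```shell
#!/bin/sh
# safety_flags MODE
# Hypothetical helper: prints the safety options for a given build mode.
safety_flags() {
    case $1 in
        release) echo "-Ob -Om -On -Or -Ot" ;;  # suppress all run-time checks
        *)       echo "-On" ;;                  # keep every check Stalin can generate
    esac
}

# The command substitution is left unquoted so it splits into separate options.
# The command is printed rather than run, since myprog is hypothetical.
echo stalin $(safety_flags release) myprog
```

Since suppressing a check makes the corresponding error undefined behavior,
it seems prudent to test a program with the checks enabled first.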
The following options control the verbosity of the compiler:
- -d0
- Produces a compile-time backtrace upon a compiler error.
- -d1
- Produces commentary during compilation describing what the compiler is
doing.
- -d2
- Produces a decorated listing of the source program after flow
analysis.
- -d3
- Produces a decorated listing of the source program after equivalent types
have been merged.
- -d4
- Produces a call graph of the source program.
- -d5
- Produces a description of all nontrivial native procedures generated.
- -d6
- Produces a list of all expressions and closures that allocate storage
along with a description of where that storage is allocated.
- -d7
- Produces a trace of the lightweight closure-conversion process.
- -closure-conversion-statistics
- Produces a summary of the closure-conversion statistics. These are
automatically processed by the program bcl-to-latex.sc which is run
by the bcl-benchmark script (both in the
/usr/local/stalin/benchmarks directory) to produce tables II, III,
and IV of the paper Flow-Directed Lightweight Closure
Conversion.
The following options control the storage management strategy used
by compiled code:
- -dc
- Disables the use of alloca(3). Normally, the compiler will use
alloca(3) to allocate on the call stack when possible.
- -dC
- Disables the use of the Boehm conservative garbage collector. Normally,
the compiler will use the Boehm collector to allocate data whose lifetime
is not known to be short. Note that the compiler will still use the Boehm
collector for some data if it cannot allocate that data on the stack or on
a region.
- -dH
- Disables the use of regions for allocating data.
- -dg
- Generate code to produce diagnostic messages when region segments are
allocated and freed.
- -dh
- Disables the use of expandable regions and uses fixed-size regions
instead.
The following options control code generation:
- -d
- Specifies that inexact reals are represented as C doubles. Normally,
inexact reals are represented as C floats.
- -architecture
- Specify the architecture for which to generate code. The default is to
generate code for whatever architecture the compiler is run on. Currently,
the known architectures are IA32, IA32-align-double, SPARC, SPARCv9,
SPARC64, MIPS, Alpha, ARM, M68K, PowerPC, and S390.
- -baseline
- Do not perform lightweight closure conversion. Closures are created for
all procedures. The user would not normally specify this option. It is
only intended to measure the effectiveness of lightweight closure
conversion. It is used by the bcl-benchmark script (in the
/usr/local/stalin/benchmarks directory) to produce tables II, III,
and IV of the paper Flow-Directed Lightweight Closure
Conversion.
- -conventional
- Perform a simplified version of lightweight closure conversion that does
not rely on interprocedural analysis. Attempts to mimic what
'conventional' compilers do (whatever that is). The user would not
normally specify this option. It is only intended to measure the
effectiveness of lightweight closure conversion. It is used by the
bcl-benchmark script (in the /usr/local/stalin/benchmarks
directory) to produce tables II, III, and IV of the paper Flow-Directed
Lightweight Closure Conversion.
- -lightweight
- Perform lightweight closure conversion. This is the default.
- -immediate-flat
- Generate code using immediate flat closures. This is not (yet)
implemented.
- -indirect-flat
- Generate code using indirect flat closures. This is not (yet)
implemented.
- -immediate-display
- Generate code using immediate display closures.
- -indirect-display
- Generate code using indirect display closures. This is not (yet)
implemented.
- -linked
- Generate code using linked closures. This is the default.
- -align-strings
- Align all strings to fixnum alignment. This will not work when foreign
procedures return strings that are not aligned to fixnum alignment. It will
also not work when ARGV is used, since those strings are also not aligned
to fixnum alignment. This is the default.
- -do-not-align-strings
- Do not align strings to fixnum alignment. This must be specified when
strings returned by foreign procedures are not aligned to fixnum
alignment.
- -de
- Enables the compiler optimization known as EQ? forgery. Sometimes yields
more efficient code and sometimes yields less efficient code.
- -df
- Disables the compiler optimization known as forgery.
- -dG
- Pass arguments using global variables instead of parameters whenever
possible.
- -di
- Generate if statements instead of switch statements for dispatching.
- -dI
- Enables the use of immediate structures.
- -dp
- Enables representation promotion. Promotes some type sets from squeezed to
squished or squished to general if this will decrease the amount of
run-time branching or dispatching representation coercions. Sometimes
yields more efficient code and sometimes yields less efficient code.
- -dP
- Enables copy propagation. Sometimes yields more efficient code and
sometimes yields less efficient code.
- -ds
- Disables the compiler optimization known as squeezing.
- -dS
- Disables the compiler optimization known as squishing.
- -Tmk
- Enables generation of code that works with the Treadmarks
distributed-shared-memory package. Currently this option is not fully
implemented and is not known to work.
- -no-tail-call-optimization
- By default, Stalin generates code that is properly tail recursive in all
but the rarest of circumstances, and it can be coerced into generating
properly tail-recursive code in all circumstances by appropriate options.
Some tail-recursive calls, those where the call site is in-lined in the
target, are translated as C goto statements and always result in properly
tail-recursive code. The rest are translated as C function calls in tail
position. This relies on the C compiler to perform tail-call optimization.
gcc(1) versions 2.96 and 3.0.2 (and perhaps other versions) perform
tail-call optimization on IA32 (and perhaps other architectures) when
-foptimize-sibling-calls is specified. (-O2 implies
-foptimize-sibling-calls.) gcc(1) only performs tail-call
optimization on IA32 in certain circumstances. First, the target and the
call site must have compatible signatures. To guarantee compatible
signatures, Stalin passes parameters to C functions that are part of
tail-recursive loops in global variables. Second, the target must not be
declared __attribute__ ((noreturn)). Thus Stalin will not generate
a __attribute__ ((noreturn)) declaration for a function that is
part of a tail-recursive loop even if Stalin knows that it never returns.
Third, the function containing the call site cannot call alloca(3).
gcc(1) does no flow analysis. Any call to alloca(3) in the
function containing the call site, no matter whether the allocated data
escapes, will disable tail-call optimization. Thus Stalin disables stack
allocation of data in any procedure in-lined in a procedure that is part
of a tail-recursive loop. Finally, the call site cannot contain a
reentrant region because reentrant regions are freed upon procedure exit
and a tail call would require an intervening region reclamation. Thus
Stalin disables allocation of data on a reentrant region in any procedure
that is part of a tail-recursive loop. Disabling these optimizations
incurs a cost for the benefit of achieving tail-call optimization. If your
C compiler does not perform tail-call optimization then you may wish not
to pay the cost. The -no-tail-call-optimization option causes
Stalin not to take the four measures above to generate code on which
gcc(1) would perform tail-call optimization. Even when specifying
this option, Stalin still translates calls, where the call site is
in-lined in the target, as C goto statements. There are three rare
occasions that can still foil proper tail recursion. First, if you specify
-dC you may force Stalin to use stack or region allocation even in
a tail-call cycle. You can avoid this by not specifying -dC.
Second, gcc(1) will not perform tail-call optimization when the
function containing the call site applies unary & to a local variable.
gcc(1) does no flow analysis. Any application of unary & to a
local variable in the function containing the call site, no matter whether
the pointer escapes, will disable tail-call optimization. Stalin can
generate such uses of unary & when you specify -de or don't
specify -df. You can avoid such cases by specifying -df and
not specifying -de. Finally, gcc(1) will not perform
tail-call optimization when the function containing the call site calls
setjmp(3). gcc(1) does no flow analysis. Any call to
setjmp(3) in the function containing the call site, no matter
whether the jmp_buf escapes, will disable tail-call optimization.
Stalin translates certain calls to call-with-current-continuation
as calls to setjmp(3). You can force Stalin not to do so by
specifying -fully-convert-to-CPS. Stalin will generate a warning in
the first and third cases, namely, when tail-call optimization is foiled
by reentrant-region allocation or calls to alloca(3) or
setjmp(3). So you can hold off specifying
-fully-convert-to-CPS or refraining from specifying -dC
until you see such warnings. No such warning is generated, however, when
uses of unary & foil tail-call optimization. So you might want to
always specify -df and refrain from specifying -de if you
desire your programs to be properly tail recursive.
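Putting the advice above together, a build intended to be properly tail
recursive with gcc(1) might be invoked as sketched below. The program name
loop is hypothetical, and whether gcc(1) actually optimizes the calls depends
on its version and the target architecture, as the text notes. The command is
run only if stalin and the source file are present; otherwise it is printed.

```shell
#!/bin/sh
# "loop" is a hypothetical program name (source expected in loop.sc).
prog=loop

# -On        required in this release (no overflow-checking code)
# -df        with no -de, avoids the unary & uses that silently foil tail calls
# -copt -O2  asks gcc for sibling-call optimization; per the text above,
#            -O2 implies -foptimize-sibling-calls
set -- -On -df -copt -O2 "$prog"

if command -v stalin >/dev/null 2>&1 && [ -f "$prog.sc" ]; then
    stalin "$@"
else
    echo "would run: stalin $*"
fi
```

Following the text, -fully-convert-to-CPS could be added later if Stalin
warns that a call to setjmp(3) has foiled tail-call optimization.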
The following options control the C-compilation phase:
- -db
- Disables the production of a database file.
- -c
- Specifies that the C compiler is not to be called after generating the C
code. Normally, the C compiler is called after generating the C code to
produce an executable image. This implies -k.
- -k
- Specifies that the generated C file is not to be deleted. Normally, the
generated C file is deleted after it is compiled.
- -cc
- Specifies the C compiler to use. Defaults to gcc(1).
- -copt
- Specifies the options that the C compiler is to be called with. Normally
the C compiler is called without any options. This option can be repeated
to allow passing multiple options to the C compiler.
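Because -copt takes a single C-compiler option and is repeated for each one,
a small wrapper can expand a list of options into repeated -copt pairs. This
is a sketch under stated assumptions: the function name copt_args and the
program name myprog are hypothetical, and the two gcc(1) options shown are
the ones mentioned earlier on this page.

```shell
#!/bin/sh
# copt_args OPTION...
# Hypothetical helper: prints "-copt OPTION" once for each argument.
copt_args() {
    out=""
    for opt in "$@"; do
        out="$out -copt $opt"
    done
    printf '%s\n' "${out# }"   # drop the leading space
}

# -k keeps the generated C file; the command is printed rather than run.
echo stalin -On -k -cc gcc $(copt_args -O2 -foptimize-sibling-calls) myprog
```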
- /usr/local/stalin/include/
- default directory for Scheme include files and library archive files
- /usr/local/stalin/include/Scheme-to-C-compatibility.sc
- include file for Scheme->C compatibility
- /usr/local/stalin/include/QobiScheme.sc
- include file for QobiScheme
- /usr/local/stalin/include/xlib.sc
- include file for Xlib FPI
- /usr/local/stalin/include/xlib-original.sc
- include file for Xlib FPI
- /usr/local/stalin/include/libstalin.a
- library archive for Xlib FPI
- /usr/local/stalin/include/gc.h
- include file for the Boehm conservative garbage collector
- /usr/local/stalin/include/libgc.a
- library archive for the Boehm conservative garbage collector
- /usr/local/stalin/include/stalin.architectures
- the known architectures and their code-generation parameters
- /usr/local/stalin/include/stalin-architecture-name
- shell script that determines the architecture on which Stalin is running
- /usr/local/stalin/stalin-architecture.c
- program to construct a new entry for stalin.architectures with the
code-generation parameters for the machine on which it is run
- /usr/local/stalin/benchmarks
- directory containing benchmarks from the paper Flow-Directed Lightweight
Closure Conversion
- /usr/local/stalin/benchmarks/bcl-benchmark
- script for producing tables II, III, and IV from the paper Flow-Directed
Lightweight Closure Conversion
- /usr/local/stalin/benchmarks/bcl-to-latex.sc
- Scheme program for producing tables II, III, and IV from the paper
Flow-Directed Lightweight Closure Conversion
Version 0.11 is an alpha release and contains many known bugs. Not
everything is fully implemented. Bug mail should be addressed to
Bug-Stalin@AI.MIT.EDU and not to the author. Please include the
version number (0.11) in the message. Periodic announcements of bug fixes,
enhancements, and new releases will be made to
Info-Stalin@AI.MIT.EDU. Send mail to
Info-Stalin-Request@AI.MIT.EDU to be added to the
Info-Stalin@AI.MIT.EDU mailing list.
Rob Browning packaged version 0.11 for Debian Linux.