Whether you are doing a local site check or an HTTP site check,
you specify which directories (presumably containing HTML files) to check
with one or more linksets. A linkset uses two wildcard characters, @ and #.
A linkset specifies one or more directories in much the same way that the
standard * and ? wildcard characters specify the names of files in a
single directory.
The @ character matches any string of characters (much like "*"), and the
# character (somewhat like "?") matches any string of characters except
"/". The best way to understand how @ and # work is to look at a few
examples:
    the entire site                         /@
    the homepage only (default)             /
    files in the root directory only        /#
    . . . and one directory down            /#/#
    files in the sub directory only         /sub/#
    files in the sub directory and below    /sub/@
    specific files                          /file1 /file2 ...
    specific subdirectories                 /sub1/@ /sub2/@ ...
If you specify more than one linkset, files matching any of the
linksets will be checked. HTML files that don't match any of the linksets
will be skipped. Linklint will see if they exist but won't check any of
their links.
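The wildcard rules above can be sketched in a few lines of Python. This only illustrates the matching rules as described here and is not Linklint's own code. The names linkset_to_regex and matches_any are made up for the example, and it assumes that a linkset has to match the whole path.

```python
import re

def linkset_to_regex(linkset):
    """Translate a linkset into a compiled regular expression (illustrative)."""
    parts = []
    for ch in linkset:
        if ch == "@":
            parts.append(".*")       # @ matches any string of characters
        elif ch == "#":
            parts.append("[^/]*")    # # matches any string without "/"
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def matches_any(path, linksets):
    """True if path matches at least one linkset (assumes whole-path match)."""
    return any(linkset_to_regex(ls).fullmatch(path) for ls in linksets)

# Only the file directly inside /sub matches the linkset /sub/#.
for path in ["/", "/index.html", "/sub/page.html", "/sub/deep/page.html"]:
    print(path, matches_any(path, ["/sub/#"]))
```

Passing several linksets to matches_any mirrors the rule that a file is checked if it matches any of them.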
- -skip
skipset
- Skips HTML files that match skipset. Linklint will make sure these files
exist but won't add any of their links to the list of files to check.
Multiple skipsets are allowed, but each must be preceded with
-skip on the command line. Skipsets use the same wildcard
characters as linksets.
- -ignore
ignoreset
- Ignores files matching ignoreset. Linklint doesn't even check to see if
these files exist. Multiple ignoresets are allowed, but each must
be preceded with -ignore on the command line. Ignoresets use the
same wildcard characters as linksets.
- -limit
n
- Limits checking to n HTML files (default 500). All HTML files after
the first n are skipped.
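How linksets, skipsets, ignoresets and -limit interact can be pictured with a small Python model. It is a sketch under assumptions, not Linklint's implementation: the names classify and plan are hypothetical, ignoresets are assumed to be consulted before skipsets, and -limit is assumed to count only files that are actually checked.

```python
import re

def _matches(path, patterns):
    """Whole-path match where @ is any string and # is any string without "/"."""
    for pat in patterns:
        regex = "".join(
            ".*" if c == "@" else "[^/]*" if c == "#" else re.escape(c)
            for c in pat
        )
        if re.fullmatch(regex, path):
            return True
    return False

def classify(path, linksets, skipsets=(), ignoresets=()):
    """Return "ignore", "skip" or "check" for one HTML file.

    Assumption: ignoresets are consulted before skipsets.
    """
    if _matches(path, ignoresets):
        return "ignore"   # not even tested for existence
    if _matches(path, skipsets) or not _matches(path, linksets):
        return "skip"     # existence is verified, links are not followed
    return "check"        # links in this file are checked

def plan(paths, linksets, skipsets=(), ignoresets=(), limit=500):
    """Classify files in order; once `limit` files are checked, the rest are skipped."""
    result, checked = {}, 0
    for path in paths:
        action = classify(path, linksets, skipsets, ignoresets)
        if action == "check":
            if checked >= limit:
                action = "skip"
            else:
                checked += 1
        result[path] = action
    return result

print(plan(["/a.html", "/old/b.html", "/tmp/c.html", "/d.html"],
           linksets=["/@"], skipsets=["/old/@"], ignoresets=["/tmp/@"], limit=1))
```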
If you are developing HTML pages on a computer that does not have
an http server, or if you are developing a simple site that does not use
Server Redirection or extensive CGI, you should use local site checking.
linklint /@
Checks all HTML files in the current directory and below. Assumes
that the current directory is the server root directory so links starting
with "/" default to this directory. You must specify /@ to
check the entire site. See Which Files to Check for details.
linklint -root dir /@
Checks all HTML files in dir and below. This is useful if you want
to check several sites on the same machine or if you don't want to run
Linklint in your public HTML directory.
- -host
hostname
- By default Linklint assumes all links on your site that start with
"http://" are remote links to other sites. If you have absolute links to
your own site, give Linklint your hostname and links starting with
"http://hostname" will be treated as local files. If you specify
-host hostname:port, only http links to this hostname and port will be
treated as local files.
- -case
- Makes sure that the (upper/lower) case of filenames used in links inside
HTML tags matches the case used by the file system. This is for Windows only
and is very handy if you are porting a site to a Unix host.
- -orphan
- Checks all directories that contain files used on the site for unused
(orphan) files.
- -index
file
- Uses file as the default index file instead of the default list
used by Linklint. You can specify more than one file, but each one must be
preceded by -index on the command line. If a default index file is not
found, Linklint uses a listing of the entire directory. See the Default
File section for details.
- -map
/a=[/b]
- Substitutes leading /a with /b. For server-side image maps
or to simulate Server Redirection.
- -no_warn_index
- Turns off the "index file not found" warning. Applies to local
site checking only.
- -no_anchors
- Tells Linklint to ignore named anchors. This could ease memory problems
for people with large sites who are primarily interested in missing pages
and not missing named anchors. This option works for both HTTP and local
site checks.
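The prefix substitution performed by the -map option listed above can be sketched in Python as follows. This is an illustration only: parse_map and apply_map are hypothetical names, and the sketch assumes a plain string-prefix comparison, which may differ in detail from what Linklint does.

```python
def parse_map(spec):
    """Split a "/a=/b" map specification into (old_prefix, new_prefix).

    The replacement may be empty ("/a="), since /b is optional in -map /a=[/b].
    """
    old, sep, new = spec.partition("=")
    if not sep or not old:
        raise ValueError(f"expected /a=[/b], got {spec!r}")
    return old, new

def apply_map(link, spec):
    """Replace a leading old prefix of link with the new prefix, if present."""
    old, new = parse_map(spec)
    if link.startswith(old):      # assumption: plain string-prefix test
        return new + link[len(old):]
    return link                   # links without the prefix are left alone

print(apply_map("/cgi-bin/maps/nav.map", "/cgi-bin/maps=/images"))
```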
HTTP Site Checking
If you have a complicated site that uses lots of CGI or Server
Redirection, you should use HTTP site checking. Even though an HTTP site
check reads pages via your HTTP server, you will get the best performance if
you do your checking on a machine that has a high speed connection to your
server.
linklint -http -host www.site.com /@
The -http flag tells Linklint to check HTML files on the site
www.site.com via a remote http connection. You must specify a -host whenever
you do an HTTP site check (otherwise Linklint won't know where to get your
pages). You can specify /@ to check the entire site. See Which Files
to Check for details.
HTTP Site Check Options
- -http
- This flag tells Linklint to perform an HTTP site check instead of a local
site check. All files (except server side image maps) will be read via the
HTTP protocol from your web server.
- -host
hostname:port
- If you include :port at the end of your hostname, Linklint uses
this port for the HTTP site check.
- -password
realm user:password
- Uses user and password as authorization to enter password
protected realm. Realms are named areas of a site that share a
common set of usernames and passwords. If passwords are needed to check
your site, Linklint will tell you which realms need passwords in warning
messages. Enclose the realm in double quotes if it contains spaces. If no
password is given for a specific realm, Linklint will try using the
password for the "DEFAULT" realm if it was provided.
- -timeout
t
- Times out after t seconds (default 15) when getting files via http.
Once data is received, an additional t seconds is allowed. The
timeout is disabled on Windows machines since the Windows port of Perl
does not support the "alarm()"
function.
- -delay
d
- Delays d seconds between requests to the same host (default 0).
This is a friendly thing to do especially if you are checking many links
on the same host.
- -local
linkset
- Gets files that match linkset locally. The default -local
linkset is @.map (which matches any link ending in
.map). This allows Linklint to follow links through server-side
image maps. The default is ignored if you specify your own -local
expressions. You need to specify the -root directory for this
option to work properly.
- -map
/a=[/b]
- Substitutes leading /a with /b. For server-side image maps
or to simulate Server Redirection.
- -no_anchors
- Tells Linklint to ignore named anchors.
- -no_query_string
- Up until version 2.3.4, Linklint did not use query strings while doing
HTTP site checks. Query strings were removed before making HTTP requests.
As of 2.3.4 query strings in links are used in the requests. Use the
-no_query_string flag to get back the "old"
behavior.
- -http_header
Name:value
- Adds the HTTP header Name: value to all HTTP requests generated by
Linklint. You will need to use quotation marks to hide spaces in the
header line from the command line interpreter. Linklint will automatically
add a space after the first colon if there is not one there already.
Multiple (unique) header lines are allowed.
- -language
zz
- This option is only useful if you are checking a site that uses content
negotiation to present the same URL in different languages.
Creates an HTTP Request header of the form Accept-Language:
zz that is included as part of all HTTP requests generated by
Linklint. Multiple -language specifications are allowed. This
will result in a single Accept-Language: header that lists all of
the languages you have specified in alphabetical order. Some web sites
can use this information to return pages to you in a specific
language.
If you need to get more complicated than this, use the more
general purpose -http_header to create your own header. There is
a partial list of language abbreviations (taken from Debian) included as
part of the Linklint documentation.
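The two header rules described above (a space is added after the first colon of a custom header, and all -language values end up in one alphabetically ordered Accept-Language header) can be sketched like this. The function names are hypothetical, and the ", " separator and the removal of duplicates are assumptions, since the text does not spell them out.

```python
def normalize_header(line):
    """Make sure a space follows the first colon of a "Name:value" header."""
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError(f"not a header line: {line!r}")
    if value.startswith(" "):
        return line                  # a space is already there
    return f"{name}: {value}"

def accept_language(languages):
    """Combine -language values into one header, in alphabetical order."""
    # Assumption: entries are separated by ", " and duplicates are dropped.
    return "Accept-Language: " + ", ".join(sorted(set(languages)))

print(normalize_header("X-Checker:linklint"))
print(accept_language(["fr", "de", "en"]))
```

Here X-Checker is just a placeholder header name used for the example.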
A remote URL check is used to see if a remote URL exists (or has
been recently modified). Links in the remote pages are not checked, nor does
Linklint look for named anchors in remote URLs.
Remote URL checking can be used to check all of the
"remote" links on your site (those that link to pages on other
sites) or it can check a list of URLs. There are several ways to specify
which remote URLs to check:
linklint http://somehost/file.html
Checks to see if /file.html exists on somehost. Multiple
URLs can be entered on the command line, in an
@commandfile, or in an
@@httpfile. Every URL to be checked must begin with
"http://". This will disable site
checking.
linklint @@httpfile
Checks all the remote http URLs found in httpfile. Anything in the
file starting with "http://" is considered
to be a URL. If the file looks like a remoteX.txt file generated by
Linklint then all failed URLs will be cross referenced.
linklint @@ -doc linkdoc
Assuming you have already done a site check and used -doc
linkdoc to put all of your output files in the linkdoc directory,
Linklint will check all the remote links that were found on your site and
cross reference all failed URLs without doing a site check. You can use the
-netmod or -netset flags to enable the status-cache.
linklint -net [site check options]
The -net flag tells Linklint to check all remote links
after doing either a local or HTTP site check. If you are having memory
problems, don't use the -net option; instead use one of the @@
options above.
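The rule for the @@httpfile form described above, where anything in the file starting with "http://" counts as a URL, can be approximated with a short Python sketch. The name urls_in_text is hypothetical, and the sketch assumes that a URL ends at the next whitespace character and that repeated URLs are only needed once. Linklint's own parsing may differ.

```python
import re

# Assumption: a URL runs from "http://" up to the next whitespace character.
URL_RE = re.compile(r"http://\S+")

def urls_in_text(text):
    """Return the unique http:// URLs found in text, in order of appearance."""
    seen, urls = set(), []
    for url in URL_RE.findall(text):
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls

sample = """# links collected by hand
http://www.example.com/index.html
see also http://www.example.org/docs/ and ftp://ignored.example.net/
"""
print(urls_in_text(sample))
```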
- -timeout
t
- Times out after t seconds (default 15) when getting files via http.
Once data is received, an additional t seconds is allowed. The
timeout is disabled on Windows machines since the Windows port of Perl
does not support the "alarm()"
function.
- -delay
d
- Delays d seconds between requests to the same host (default 0).
This is a friendly thing to do especially if you are checking many links
on the same host.
- -redirect
- Checks for <meta> redirects in the headers of remote URLs that are
html files. If a redirect is found it is followed. This feature is
disabled if the status cache is used.
- -proxy
hostname[:port]
- Sends all remote HTTP requests through the proxy server hostname
and the optional port. This allows you to check remote URLs or (new
with version 2.3.1) your entire site from within a firewall that has an
http proxy server. Some error messages (relating to host errors) may not
be available through a proxy server.
- -concise_url
- Turns off printing successful URLs to STDOUT during remote link
checking.
The Status Cache is a very powerful feature. It allows you to keep
track of recent changes in all of the remote (off-site) pages you link to.
You can then use the Linklint output files to quickly check changed pages to
see if they still meet your needs.
The flags below make use of the status cache file linklint.url
(kept in your HOME or LINKLINT directory). This file keeps track of the
modification dates of all the remote URLs that you check.
- -netmod
- Operates just like -net but makes use of the status cache. Newly
checked URLs will be entered in the cache. Linklint will tell you which
(previously cached) URLs have been modified since the last
-netset.
- -netset
- Like -netmod but also resets the last modified status in the cache
for all URLs that checked ok. If you always use -netset, modified
URLs will be reported just once.
- -retry
- Only checks URLs that have a host fail status in the cache. Sometimes a
URL fails because its host is temporarily down. This flag enables you to
recheck just those links. An easy way to recheck all the cached URLs with
host failures is "linklint @@ -retry".
Use "linklint @@linkdoc/remoteX.txt
-retry" if you want failed URLs to be cross referenced.
- -flush
- Removes all URLs from the cache that are not currently being checked. The
-retry flag has no effect on which URLs are flushed.
- -checksum
- Ensures that every URL that has been modified is reported as such. This
flag can make the remote checking take longer. Many of the pages that
require a checksum are dynamically generated and will always be reported
as modified.
- -cache
directory
- Reads and writes the linklint.url cache file in this directory. The
default directory is set by your LINKLINT or HOME environment
variables.
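The difference between -netmod and -netset can be illustrated with a toy model of the status cache in Python. This is a sketch under stated assumptions: the cache is modelled as a plain dictionary from URL to a last-modified value, check_remote is a hypothetical name, and the real linklint.url file format and bookkeeping are not reproduced.

```python
def check_remote(cache, current, reset=False):
    """Compare current last-modified values with a cache and list changed URLs.

    cache   -- dict url -> value remembered from the last reset (mutated)
    current -- dict url -> value seen now, for URLs that checked ok
    reset   -- False models -netmod, True models -netset
    """
    modified = []
    for url, stamp in current.items():
        if url not in cache:
            cache[url] = stamp          # newly checked URLs enter the cache
        elif cache[url] != stamp:
            modified.append(url)        # changed since the last reset
            if reset:
                cache[url] = stamp      # -netset: report this change only once
    return sorted(modified)

cache = {}
print(check_remote(cache, {"http://a/": "Mon"}))              # first sighting
print(check_remote(cache, {"http://a/": "Tue"}))              # like -netmod
print(check_remote(cache, {"http://a/": "Tue"}, reset=True))  # like -netset
print(check_remote(cache, {"http://a/": "Tue"}, reset=True))  # nothing new
```

In this model a change keeps being reported by the -netmod style call until a -netset style call records it, which matches the statement that always using -netset reports each modification just once.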
By default no output files are generated; only progress and a
brief summary of the results are printed to the screen. You can produce
complete documentation (split up into separate files) in a -doc
directory, or capture selected output in a single file, either with
-out file or by redirecting the standard output. See the Output File
Specification section for a detailed description of all output files.
- -doc
linkdoc
- Sends all output to the linkdoc directory. The output is divided
into separate .txt and .html files. Complete documentation
is always produced regardless of the single file flags.
The file index.txt contains an index to all the other
files; index.html is an HTML version of the index. The index
files for remote URL checking are url_index.txt and
url_index.html.
- -textonly
- Prevents any HTML files from being created in the -doc
directory.
- -htmlonly
- Erases redundant text files in the -doc directory after they have
been used to create the HTML output files. The files remote.txt and
remoteX.txt are not erased since they can be used by Linklint to
recheck remote URLs.
- -docbase
base
- Overrides the default base expression used for directing a browser
to the resources listed in the output HTML files. The base is prepended to
local links in the output HTML files. This only affects the links in HTML
output files, it has no effect on what is displayed in these files.
Ordinarily this flag would only be used during a local site check to set
the base to "http://host".
- -output_frames
- All HTML output data files are linked to from index.html. If you
use this flag, the data files will be opened in a new frame
(window), which can be handy since it always leaves the
index.html file open in its own window.
- -output_index
filename
- The output index files were previously named linklint.txt and
linklint.html. These have now been changed to index.txt and
index.html. You can use the -output_index option to change
this name back to "linklint" or to
something else.
- -url_doc_prefix
url/
- By default, the output files associated with remote URL checking all start
with "url". You can change this with the -url_doc_prefix
option. If the url_doc_prefix contains a "/" character then the
appropriate directory will be created (as a subdirectory of the -doc
directory).
- -dont_output
xxxx
- Don't create output files that contain "xxxx". Can be repeated.
Example:
-dont_output "X$"
will suppress the output of all cross reference files.
- -error
- Lists missing files and other errors.
- -out
file
- Sends list output and summary information to file.
- -list
- Lists all found files, links, directories etc.
- -warn
- Lists all warnings.
- -xref
- Adds cross references to the lists.
- -forward
- Sorts lists by referring file.
- -db1
- Debugs command line input and linkset expressions.
- -db2
- Prints the name of every file that gets checked (not just HTML
files).
- -db3
- Debugs HTML parser, prints out tags and resulting links.
- -db4
- Debugs socket connection (kind of).
- -db5
- Not used.
- -db6
- Details last-modified status for remote URLs (requires -netset or
-netmod).
- -db7
- Prints brief debug information while checking remote URLs.
- -db8
- Prints all http headers while checking remote URLs.
- -db9
- Generates random http errors.
- -version
- Gives version information.
- -help
- Lists a few simple examples of how to use Linklint.
- -help_all
- Lists all help (contained in program) including every input option.
- -quiet
- Disables printing progress to the screen.
- -silent
- Disables printing summaries to the screen.
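The -dont_output "X$" example given earlier suggests that each expression is treated as a regular expression on output file names. A possible reading is sketched below in Python. It is an illustration under assumptions: files_to_write is a hypothetical name, and the expression is assumed to be tested against the file name without its extension, which is what would make "X$" match remoteX.txt.

```python
import re

def files_to_write(candidates, dont_output):
    """Drop output names matched by any -dont_output expression."""
    # Assumption: each expression is a regular expression tested against
    # the file name without its extension, as the "X$" example suggests.
    patterns = [re.compile(p) for p in dont_output]
    keep = []
    for name in candidates:
        stem = name.rsplit(".", 1)[0]
        if not any(p.search(stem) for p in patterns):
            keep.append(name)
    return keep

print(files_to_write(["remote.txt", "remoteX.txt", "index.txt"], ["X$"]))
```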