DAFILESERVER(8) | AFS Command Reference | DAFILESERVER(8) |
dafileserver - Initializes the File Server component of the dafs process
dafileserver
[-auditlog <path to log file>]
[-audit-interface (file | sysvmq)]
[-d <debug level>]
[-p <number of processes>]
[-spare <number of spare blocks>]
[-pctspare <percentage spare>]
[-b <buffers>]
[-l <large vnodes>]
[-s <small vnodes>]
[-vc <volume cachesize>]
[-w <call back wait interval>]
[-cb <number of call backs>]
[-banner]
[-novbc]
[-implicit <admin mode bits: rlidwka>]
[-readonly]
[-admin-write]
[-hr <number of hours between refreshing the host cps>]
[-busyat <redirect clients when queue > n>]
[-nobusy]
[-rxpck <number of rx extra packets>]
[-rxdbg]
[-rxdbge]
[-rxmaxmtu <bytes>]
[-nojumbo]
[-jumbo]
[-rxbind]
[-allow-dotted-principals]
[-L]
[-S]
[-k <stack size>]
[-realm <Kerberos realm name>]
[-udpsize <size of socket buffer in bytes>]
[-sendsize <size of send buffer in bytes>]
[-abortthreshold <abort threshold>]
[-enable_peer_stats]
[-enable_process_stats]
[-syslog [< loglevel >]]
[-mrafslogs]
[-transarc-logs]
[-saneacls]
[-help]
[-vhandle-setaside <fds reserved for non-cache io>]
[-vhandle-max-cachesize <max open files>]
[-vhandle-initial-cachesize <fds reserved for non-cache io>]
[-vattachpar <number of volume attach threads>]
[-m <min percentage spare in partition>]
[-lock]
[-fs-state-dont-save]
[-fs-state-dont-restore]
[-fs-state-verify (none | save | restore | both)]
[-vhashsize <log(2) of number of volume hash buckets>]
[-vlrudisable]
[-vlruthresh <minutes before eligibility for soft detach>]
[-vlruinterval <seconds between VLRU scans>]
[-vlrumax <max volumes to soft detach in one VLRU scan>]
[-unsafe-nosalvage]
[-offline-timeout <timeout in seconds>]
[-offline-shutdown-timeout <timeout in seconds>]
[-sync <sync behavior>]
[-logfile <log file>]
[-config <configuration path>]
The dafileserver command initializes the File Server component of the "dafs" process. In the conventional configuration, its binary file is located in the /usr/lib/openafs directory on a file server machine.
The dafileserver command is not normally issued at the command shell prompt, but rather placed into a file server machine's /etc/openafs/BosConfig file with the bos create command. If it is ever issued at the command shell prompt, the issuer must be logged on to a file server machine as the local superuser "root".
The File Server creates the /var/log/openafs/FileLog log file as it initializes, if the file does not already exist. It does not write a detailed trace by default, but the -d option may be used to increase the amount of detail. Use the bos getlog command to display the contents of the log file.
The command's arguments enable the administrator to control many aspects of the File Server's performance, as detailed in "OPTIONS". By default the File Server sets values for many arguments that are suitable for a medium-sized file server machine. To set values suitable for a small or large file server machine, use the -S or -L flag respectively. The table below lists the parameters for which the File Server sets default values, the corresponding arguments, and the setting for each of the three machine sizes.
The default values are:
Parameter (Argument)                Small (-S)   Medium   Large (-L)
---------------------------------------------------------------------
Number of threads (-p)                       6        9          128
Number of cached dir blocks (-b)            70       90          120
Number of cached large vnodes (-l)         200      400          600
Number of cached small vnodes (-s)         200      400          600
Maximum volume cache size (-vc)            200      400          600
Number of callbacks (-cb)               20,000   60,000       64,000
Number of Rx packets (-rxpck)              100      150          200
To override any of the values, provide the indicated argument (which can be combined with the -S or -L flag).
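The interaction between the size flags and explicit arguments can be modeled in a few lines. The following Python sketch is illustrative only: the function name effective_settings is hypothetical, it handles only the seven arguments in the table above and ignores everything else, and it lets the last of -S or -L win if both appear, which is an assumption rather than documented File Server behavior.

```python
# Default File Server parameters by machine size, copied from the table above.
# -S selects the small column, -L the large column; no flag means medium.
DEFAULTS = {
    "small":  {"-p": 6,   "-b": 70,  "-l": 200, "-s": 200, "-vc": 200, "-cb": 20000, "-rxpck": 100},
    "medium": {"-p": 9,   "-b": 90,  "-l": 400, "-s": 400, "-vc": 400, "-cb": 60000, "-rxpck": 150},
    "large":  {"-p": 128, "-b": 120, "-l": 600, "-s": 600, "-vc": 600, "-cb": 64000, "-rxpck": 200},
}


def effective_settings(args):
    """Return the tabulated parameter values selected by a command line.

    `args` is a list such as ["-L", "-cb", "100000"]. Explicit arguments
    override the size defaults regardless of their position.
    """
    size = "medium"
    overrides = {}
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "-S":
            size = "small"
        elif arg == "-L":
            size = "large"
        elif arg in DEFAULTS["medium"]:
            overrides[arg] = int(args[i + 1])
            i += 1  # skip the value that belongs to this argument
        i += 1
    settings = dict(DEFAULTS[size])
    settings.update(overrides)
    return settings


if __name__ == "__main__":
    print(effective_settings(["-L", "-cb", "100000"]))
```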
The amount of memory required for the File Server varies. The approximate default memory usage is 751 KB when the -S flag is used (small configuration), 1.1 MB when all defaults are used (medium configuration), and 1.4 MB when the -L flag is used (large configuration). If additional memory is available, increasing the value of the -cb and -vc arguments can improve File Server performance most directly.
By default, the File Server allows a volume to exceed its quota by 1 MB when an application is writing data to an existing file in a volume that is full. The File Server still does not allow users to create new files in a full volume. To change the default, use one of the following arguments: -spare, which sets the amount of space in kilobytes, or -pctspare, which sets the amount of space as a percentage of the volume's quota.
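As a rough model of the quota allowance (not the File Server's actual code), the sketch below computes how far a volume may grow past its quota when an existing file is extended. It assumes that -spare counts 1 KB blocks (so the 1 MB default equals 1024 blocks) and that a quota of 0 means the volume has no quota; both helper names are hypothetical, and creating new files in a full volume is not modeled.

```python
DEFAULT_SPARE_KB = 1024  # 1 MB, the documented default allowance (assumed 1 KB blocks)


def overrun_allowance_kb(quota_kb, spare_kb=None, pct_spare=None):
    """Kilobytes a volume may exceed its quota by when extending an existing file.

    -spare gives an absolute amount, -pctspare a percentage of the quota;
    the two must not be combined.
    """
    if spare_kb is not None and pct_spare is not None:
        raise ValueError("-spare and -pctspare are mutually exclusive")
    if pct_spare is not None:
        return quota_kb * pct_spare // 100
    if spare_kb is not None:
        return spare_kb
    return DEFAULT_SPARE_KB


def may_extend_file(used_kb, quota_kb, extra_kb, **spare_opts):
    """True if writing `extra_kb` more to an existing file stays within quota + allowance."""
    if quota_kb == 0:  # assumption: a quota of 0 means "no quota"
        return True
    limit = quota_kb + overrun_allowance_kb(quota_kb, **spare_opts)
    return used_kb + extra_kb <= limit
```

With -pctspare 10, as in the bos create example later in this page, a volume with a 3,000,000 KB quota could be extended by up to 300,000 KB beyond its quota under this model.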
By default, the File Server implicitly grants the "a" (administer) and "l" (lookup) permissions to system:administrators on the access control list (ACL) of every directory in the volumes stored on its file server machine. In other words, the group's members can exercise those two permissions even when an entry for the group does not appear on an ACL. To change the set of default permissions, use the -implicit argument.
The File Server maintains a host current protection subgroup (host CPS) for each client machine from which it has received a data access request. Like the CPS for a user, a host CPS lists all of the Protection Database groups to which the machine belongs, and the File Server compares the host CPS to a directory's ACL to determine in what manner users on the machine are authorized to access the directory's contents. When the pts adduser or pts removeuser command is used to change the groups to which a machine belongs, the File Server must recompute the machine's host CPS in order to notice the change. By default, the File Server contacts the Protection Server every two hours to recompute host CPSs, implying that it can take that long for changed group memberships to become effective. To change this frequency, use the -hr argument.
The File Server stores volumes in partitions. A partition is a filesystem or directory on the server machine that is named "/vicepX" or "/vicepXX", where X or XX is "a" through "z" or "aa" through "iv". Up to 255 partitions are allowed. The File Server expects that the /vicepXX directories are each on a dedicated filesystem. The File Server will only use a /vicepXX if it is a mount point for another filesystem, unless the file "/vicepXX/AlwaysAttach" exists. A partition will not be attached if the file "/vicepXX/NeverAttach" exists. If both "/vicepXX/AlwaysAttach" and "/vicepXX/NeverAttach" are present, then "/vicepXX/AlwaysAttach" wins. The data in the partition is in a special format that can only be accessed using OpenAFS commands or an OpenAFS client.
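The naming and attach rules above can be expressed as a small decision function. This Python sketch is hypothetical: the helper names are invented, the regular expression encodes only the documented suffix ranges, and check_partition merely inspects the directory with os.path.ismount and os.path.exists; it does not reproduce the File Server's own checks.

```python
import os
import re

# Documented suffixes: "a"-"z", or "aa" through "iv".
_PARTITION_RE = re.compile(r"/vicep([a-z]|[a-h][a-z]|i[a-v])")


def is_valid_partition_name(path):
    """True if `path` is /vicepX or /vicepXX with a suffix in the documented range."""
    return _PARTITION_RE.fullmatch(path) is not None


def should_attach(is_mountpoint, always_attach, never_attach):
    """Apply the documented attach rules to one /vicepXX directory.

    AlwaysAttach wins over NeverAttach; otherwise NeverAttach blocks the
    partition, and without either marker only a mount point is used.
    """
    if always_attach:
        return True
    if never_attach:
        return False
    return is_mountpoint


def check_partition(path):
    """Inspect a real directory (read-only) and report whether it would be attached."""
    if not is_valid_partition_name(path):
        return False
    return should_attach(
        os.path.ismount(path),
        os.path.exists(os.path.join(path, "AlwaysAttach")),
        os.path.exists(os.path.join(path, "NeverAttach")),
    )
```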
The File Server generates the following message when a partition is nearly full:
No space left on device
This command does not use the syntax conventions of the AFS command suites. Provide the command name and all option names in full.
There are two strategies the File Server can use for attaching AFS volumes at startup and handling volume salvages. The traditional method assumes all volumes are salvaged before the File Server starts and attaches all volumes at startup before serving files. The newer demand-attach method attaches volumes only on demand, salvaging them at that time as needed, and detaches volumes that are not in use. A demand-attach File Server can also save state to disk for faster restarts. The dafileserver command implements the demand-attach method, while the fileserver command uses the traditional method.
The choice of traditional or demand-attach File Server changes the required setup in BosConfig. When changing from a traditional File Server to demand-attach or vice versa, you will need to stop and remove the "fs" or "dafs" node in BosConfig and create a new node of the appropriate type. See bos_create(8) for more information.
Do not use the -w argument, which is intended for use by the OpenAFS developers only. Changing it from its default value can result in unpredictable File Server behavior.
Do not specify both the -spare and -pctspare arguments. Doing so causes the File Server to exit, leaving an error message in the /var/log/openafs/FileLog file.
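Because a conflicting -spare/-pctspare pair makes the File Server exit at startup, it can be worth checking a command line before passing it to bos create -cmd. The helper below is a hypothetical pre-flight check, not an OpenAFS tool; it only looks for the two pitfalls described in the preceding paragraphs.

```python
import shlex


def check_fileserver_cmd(cmd):
    """Return a list of problems found in a dafileserver command line string."""
    args = shlex.split(cmd)  # honours shell-style quoting
    problems = []
    if "-spare" in args and "-pctspare" in args:
        problems.append("-spare and -pctspare are both given; the File Server will exit")
    if "-w" in args:
        problems.append("-w is intended for OpenAFS developers only")
    return problems


if __name__ == "__main__":
    print(check_fileserver_cmd("/usr/lib/openafs/dafileserver -spare 2048 -pctspare 5 -L"))
```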
Options that are available only on some system types, such as the -m and -lock options, appear in the output generated by the -help option only on the relevant system type.
Currently, the maximum size of a volume quota is 2 terabytes (2^41 bytes) and the maximum size of a /vicepX partition on a fileserver is 2^64 kilobytes. The maximum partition size in releases 1.4.7 and earlier is 2 terabytes (2^41 bytes). The maximum partition size for 1.5.x releases 1.5.34 and earlier is also 2 terabytes.
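The units in these limits are easy to mix up. Treating 1 terabyte as 2^40 bytes and 1 kilobyte as 2^10 bytes (an assumption consistent with the 2^41-byte figure), the arithmetic works out as follows:

```python
KIB = 2 ** 10  # bytes per kilobyte
TIB = 2 ** 40  # bytes per terabyte

# Volume quotas are expressed in kilobytes.
MAX_QUOTA_BYTES = 2 ** 41
MAX_QUOTA_KB = MAX_QUOTA_BYTES // KIB          # 2**31 kilobytes

# Current maximum partition size, stated in kilobytes.
MAX_PARTITION_KB = 2 ** 64
MAX_PARTITION_BYTES = MAX_PARTITION_KB * KIB   # 2**74 bytes

# Older releases (1.4.7 and earlier, 1.5.34 and earlier): 2 terabytes.
OLD_MAX_PARTITION_BYTES = 2 * TIB

if __name__ == "__main__":
    print(MAX_QUOTA_KB, MAX_QUOTA_BYTES // TIB, OLD_MAX_PARTITION_BYTES == MAX_QUOTA_BYTES)
```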
The maximum number of directory entries is 64,000 if all of the entries have names that are 15 octets or less in length. A name that is 15 octets long requires the use of only one block in the directory. Additional sequential blocks are required to store entries with names that are longer than 15 octets. Each additional block provides an additional length of 32 octets for the name of the entry. Note that if file names use an encoding like UTF-8, a single character may be encoded into multiple octets.
In real-world use, the maximum number of objects in an AFS directory is usually between 16,000 and 25,000, depending on the average name length.
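A back-of-the-envelope sketch of the block arithmetic: it counts a name's length in UTF-8 octets and applies the 15-octet/32-octet rule. The capacity estimate assumes that the 64,000-entry figure corresponds to roughly 64,000 one-block slots and ignores any other directory overhead, so treat its result as optimistic. The function names are hypothetical.

```python
import math

FIRST_BLOCK_OCTETS = 15   # names up to 15 octets fit in a single block
EXTRA_BLOCK_OCTETS = 32   # each additional block holds 32 more octets of name
ONE_BLOCK_MAX = 64000     # documented maximum when every name fits in one block


def blocks_for_name(name):
    """Directory blocks needed for one entry, measuring the name in UTF-8 octets."""
    octets = len(name.encode("utf-8"))
    if octets <= FIRST_BLOCK_OCTETS:
        return 1
    return 1 + math.ceil((octets - FIRST_BLOCK_OCTETS) / EXTRA_BLOCK_OCTETS)


def estimate_max_entries(names):
    """Optimistic capacity estimate for a directory whose names look like `names`."""
    avg_blocks = sum(blocks_for_name(n) for n in names) / len(names)
    return int(ONE_BLOCK_MAX / avg_blocks)


if __name__ == "__main__":
    # "é" is two octets in UTF-8, so eight of them need a second block.
    print(blocks_for_name("README"), blocks_for_name("a" * 40), blocks_for_name("é" * 8))
```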
The -audit-interface argument defaults to "file".
The maximum number of threads that can be set with the -p argument can differ in each release of OpenAFS. Consult the OpenAFS Release Notes for the current release.
When the -banner flag is used, the File Server periodically prints a banner of the form "File Server is running at <time>".
Normally, non-DAFS fileservers start accepting requests immediately on startup, but attachment of volumes can take a while. So if a client tries to access a volume that is not attached simply because the fileserver hasn't attached it yet, that client will get an error. With the -nobusy option present, the fileserver will immediately respond with an error code that indicates the server is starting up. However, some older clients (before OpenAFS 1.0) don't understand this error code, and may not function optimally. So the default behavior, without the -nobusy option, is to at first respond with a different error code that is understood by more clients, but is indistinguishable from other scenarios where the volume is busy and not attached for other reasons.
There is usually no reason to use this option under normal operation.
The throttling behavior controlled by the -abortthreshold argument can cause issues, especially for some versions of the Windows OpenAFS client. When using Windows Explorer to navigate the AFS directory tree, directories with only "lookup" access for the current user may load more slowly because of the throttling. This is because the Windows OpenAFS client sends FetchStatus calls one at a time instead of in bulk like the Unix OpenAFS client.
Setting the threshold to 0 disables the throttling behavior. This option is available in OpenAFS versions 1.4.1 and later.
The -vattachpar argument is only meaningful for a file server built with pthreads support.
If a client is interrupted, from the client's point of view it will appear as if it had accessed the volume after the volume went offline. For RO volumes, this means the client should fail over to other valid RO sites for that volume. This option may speed up volume releases if volumes are being accessed by clients that have slow or unreliable network connections.
Setting this option to 0 means to interrupt clients immediately if a volume is waiting to go offline. Setting this option to "-1" means to wait forever for client requests to finish. The default value is "-1".
Setting this option to 0 means to interrupt all clients reading from volumes immediately during the shutdown process. Setting this option to "-1" means to wait forever for client requests to finish during the shutdown process.
If -offline-timeout is specified, the default value of -offline-shutdown-timeout is the value specified for -offline-timeout. Otherwise, the default value is "-1".
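The default rules for the two timeout options can be summarized as a small function. This is a hypothetical model of the documented defaults, not File Server code; None stands for an option that was not given on the command line.

```python
WAIT_FOREVER = -1  # "-1" means wait forever; 0 means interrupt clients immediately


def offline_timeouts(offline_timeout=None, offline_shutdown_timeout=None):
    """Return (offline, shutdown) timeouts in seconds after applying the documented defaults."""
    offline = WAIT_FOREVER if offline_timeout is None else offline_timeout
    if offline_shutdown_timeout is not None:
        shutdown = offline_shutdown_timeout
    elif offline_timeout is not None:
        shutdown = offline_timeout  # inherits -offline-timeout when only that is given
    else:
        shutdown = WAIT_FOREVER
    return offline, shutdown
```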
Normally, when the fileserver writes to disk, the underlying filesystem or operating system may delay writes from actually going to disk, and may reorder which writes reach the disk first. So, during an unclean shutdown of the machine (if the power goes out, the machine crashes, and so on), file data that the server previously told clients had been successfully written may be lost.
To try to mitigate this, the fileserver will try to "sync" file data to the physical disk at numerous points during various I/O. However, this can result in significantly reduced performance. Depending on the usage patterns, this may or may not be acceptable. This option dictates specifically what the fileserver does when it wants to perform a "sync".
There are several options; pass one of these as the argument to -sync. The default is "onclose".
Note that this is still not a 100% guarantee that data will not be lost or corrupted during a crash. The underlying filesystem itself may lose or corrupt data in such a situation. And OpenAFS itself does not (yet) guarantee that all data is consistent at any point in time; so even if the filesystem and OS do not buffer or reorder any writes, you are not guaranteed that all data will be intact after a crash.
This was the only behavior allowed in OpenAFS releases prior to 1.4.5.
Effectively this option is the same as "never" while a volume is attached and actively being used, but if a volume is detached, there is an additional guarantee for the data's consistency.
After the removal of the "delayed" option after the OpenAFS 1.6 series, this option became the default.
Depending on the underlying filesystem and operating system, there may be guarantees that any data written to disk will reach the physical media within a certain amount of time. For example, Linux's pdflush process usually makes this guarantee, and ext3 can make various consistency guarantees depending on its mount options. ZFS on Solaris can also provide similar guarantees, as can various other platforms and filesystems. Consult the documentation for your platform if you are unsure.
This was the only behavior allowed in OpenAFS releases starting from 1.4.5 up to and including 1.6.2. It was also the default for the 1.6 series starting in OpenAFS 1.6.3.
Which option to choose is not an easy decision. Various developers and experts sometimes disagree on which option is the most reasonable, and it may depend on the specific scenario and workload involved. Some argue that the "always" option does not provide significantly greater guarantees than any other option, whereas others argue that choosing anything besides the "always" option allows for an unacceptable risk of data loss. This may depend on your usage patterns, your platform and filesystem, and who you talk to about this topic.
The default for -fs-state-verify is "both".
Due to the increased risk of data corruption, the use of the -unsafe-nosalvage flag is strongly discouraged. Only use it if you really know what you are doing.
The following bos create command creates a dafs process on the file server machine "fs2.example.com" that uses the large configuration size and allows volumes to exceed their quota by 10%. Type the command on a single line, or continue it across lines with backslashes as shown:
% bos create -server fs2.example.com -instance dafs -type dafs \
    -cmd "/usr/lib/openafs/dafileserver -pctspare 10 -L" \
    /usr/lib/openafs/davolserver \
    /usr/lib/openafs/salvageserver \
    /usr/lib/openafs/dasalvager
Sending process signals to the File Server process can change its behavior in the following ways:
Process        Signal  OS       Result
---------------------------------------------------------------------
File Server    XCPU    Unix     Prints a list of client IP addresses.
File Server    USR2    Windows  Prints a list of client IP addresses.
File Server    POLL    HPUX     Prints a list of client IP addresses.
Any server     TSTP    Any      Increases the debug level by a power of 5
                                (1, 5, 25, 125, etc.). This has the same
                                effect as the -d XXX command-line option.
Any server     HUP     Any      Resets the debug level to 0.
File Server    TERM    Any      Runs minor instrumentation over the list
                                of descriptors.
Other servers  TERM    Any      Causes the process to quit.
File Server    QUIT    Any      Causes the File Server to quit. The BOS
                                Server knows this.
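The TSTP and HUP rows describe a simple state machine for the debug level. The sketch below models only what the table states (TSTP moves 0 to 1 and then multiplies by 5; HUP resets to 0); the function names are hypothetical.

```python
def after_tstp(level):
    """Debug level after one TSTP signal: 0 becomes 1, anything else is multiplied by 5."""
    return 1 if level == 0 else level * 5


def levels_after(signals, level=0):
    """Apply a sequence of "TSTP"/"HUP" signals and return each resulting level."""
    visited = []
    for sig in signals:
        if sig == "TSTP":
            level = after_tstp(level)
        elif sig == "HUP":
            level = 0  # HUP resets the debug level; it does not stop the server
        else:
            raise ValueError("unsupported signal: " + sig)
        visited.append(level)
    return visited


if __name__ == "__main__":
    print(levels_after(["TSTP"] * 4))
```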
The basic metric of whether an AFS file server is doing well is the number of connections waiting for a thread, which can be found by running the following command:
% rxdebug <server> | grep waiting_for | wc -l
Each line returned by "rxdebug" that contains the text "waiting_for" represents a connection that's waiting for a file server thread.
If the blocked connection count is ever above 0, the server is having problems replying to clients in a timely fashion. If it gets above roughly 10, users will notice slowness. The total number of connections is a mostly irrelevant number that increases essentially monotonically for as long as the server has been running and drops back to zero when the server is restarted.
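The same check can be scripted. In this sketch, count_waiting mirrors the grep/wc pipeline above and assess applies the rough thresholds from the previous paragraph; the helper names are hypothetical, and check_server assumes an rxdebug binary is on the PATH (it may live elsewhere, such as /usr/afsws/etc).

```python
import subprocess


def count_waiting(rxdebug_output):
    """Count connections waiting for a thread: lines containing "waiting_for"."""
    return sum(1 for line in rxdebug_output.splitlines() if "waiting_for" in line)


def assess(waiting):
    """Rough interpretation of the blocked connection count."""
    if waiting == 0:
        return "ok"
    if waiting <= 10:
        return "slow to reply"
    return "users will notice"


def check_server(server):
    """Run rxdebug against `server` and assess the result (needs rxdebug on PATH)."""
    result = subprocess.run(["rxdebug", server], capture_output=True, text=True, check=True)
    return assess(count_waiting(result.stdout))
```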
The most common cause of blocked connections rising on a server is some process somewhere performing an abnormal number of accesses to that server and its volumes. If multiple servers have a blocked connection count, the most likely explanation is that there is a volume replicated between those servers that is absorbing an abnormally high access rate.
To get an access count on all the volumes on a server, run:
% vos listvol <server> -long
and save the output in a file. The results will look like a bunch of vos examine output for each volume on the server. Look for lines like:
40065 accesses in the past day (i.e., vnode references)
and look for volumes with an abnormally high number of accesses. Anything over 10,000 is fairly high, but some volumes like root.cell and other volumes close to the root of the cell will have that many hits routinely. Anything over 100,000 is generally abnormally high. The count resets about once a day.
Another approach that can be used to narrow the possibilities for a replicated volume, when multiple servers are having trouble, is to find all replicated volumes for that server. Run:
% vos listvldb -server <server>
where <server> is one of the servers having problems; this refreshes the VLDB cache. Then run:
% vos listvldb -server <server> -part <partition>
to get a list of all volumes on that server and partition, including every other server with replicas.
Once the volume causing the problem has been identified, the best way to deal with the problem is to move that volume to another server with a low load or to stop any runaway programs that are accessing that volume unnecessarily. Often the identity of the volume alone is enough to tell what is going on.
If you still need additional information about who's hitting that server, sometimes you can guess at that information from the failed callbacks in the /var/log/openafs/FileLog file on the server, or from the output of:
% /usr/afsws/etc/rxdebug <server> -rxstats
but the best way is to turn on debugging output from the file server. (Warning: This generates a lot of output into FileLog on the AFS server.) To do this, log on to the AFS server, find the PID of the fileserver process, and do:
kill -TSTP <pid>
where <pid> is the PID of the file server process. This raises the debugging level so that you start seeing what people are actually doing on the server. You can do this up to three more times to get even more output if needed. To reset the debugging level to normal, use the following command, which does NOT terminate the file server:
kill -HUP <pid>
The debugging setting on the File Server should be reset back to normal when debugging is no longer needed. Otherwise, the AFS server may well fill its disks with debugging output.
The lines of the debugging output that are most useful for debugging load problems are:
SAFS_FetchStatus, Fid = 2003828163.77154.82248, Host 171.64.15.76
SRXAFS_FetchData, Fid = 2003828163.77154.82248
(The example above is partly truncated to highlight the interesting information.) The Fid identifies the volume and the vnode within the volume; the volume ID is the first long number. For example, examining that volume shows:
% vos examine 2003828163
pubsw.matlab61                    2003828163 RW    1040060 K  On-line
    afssvr5.Stanford.EDU /vicepa
    RWrite 2003828163 ROnly 2003828164 Backup 2003828165
    MaxQuota    3000000 K
    Creation    Mon Aug  6 16:40:55 2001
    Last Update Tue Jul 30 19:00:25 2002
    86181 accesses in the past day (i.e., vnode references)
    RWrite: 2003828163    ROnly: 2003828164    Backup: 2003828165
    number of sites -> 3
       server afssvr5.Stanford.EDU partition /vicepa RW Site
       server afssvr11.Stanford.EDU partition /vicepd RO Site
       server afssvr5.Stanford.EDU partition /vicepa RO Site
and from the Host information one can tell what system is accessing that volume.
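When the debug output is large, tallying it is easier than reading it. The sketch below assumes lines shaped like the (truncated) SAFS_FetchStatus and SRXAFS_FetchData examples above, takes the volume ID as the first number of the Fid, and counts calls per volume and per (volume, host); the regular expression and function name are hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "SAFS_FetchStatus, Fid = 2003828163.77154.82248, Host 171.64.15.76"
_CALL = re.compile(
    r"(S\w+), Fid = (\d+)\.(\d+)\.(\d+)(?:, Host (\d{1,3}(?:\.\d{1,3}){3}))?"
)


def tally(filelog_text):
    """Count calls per volume ID and per (volume ID, host) in FileLog debug text.

    Lines without a Host field still count toward the per-volume total.
    """
    per_volume = Counter()
    per_host = Counter()
    for line in filelog_text.splitlines():
        m = _CALL.search(line)
        if not m:
            continue
        volume, host = int(m.group(2)), m.group(5)
        per_volume[volume] += 1
        if host:
            per_host[(volume, host)] += 1
    return per_volume, per_host
```

The busiest volume IDs from per_volume.most_common() can then be passed to vos examine, and per_host shows which client machines are responsible.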
Note that the output of vos_examine(1) also includes the access count, so once the problem has been identified, vos examine can be used to see if the access count is still increasing. Also remember that you can run vos examine on the read-only replica (e.g., pubsw.matlab61.readonly) to see the access counts on the read-only replica on all of the servers that it's located on.
The issuer must be logged in as the superuser "root" on a file server machine to issue the command at a command shell prompt. It is conventional instead to create and start the process by issuing the bos create command.
BosConfig(5), FileLog(5), bos_create(8), bos_getlog(8), fs_setacl(1), msgget(2), msgrcv(2), salvager(8), volserver(8), vos_examine(1)
IBM Corporation 2000. <http://www.ibm.com/> All Rights Reserved.
This documentation is covered by the IBM Public License Version 1.0. It was converted from HTML to POD by software written by Chas Williams and Russ Allbery, based on work by Alf Wachsmann and Elizabeth Cassell.
2021-01-14 | OpenAFS |