gfs2 - GFS2 reference guide
Overview of the GFS2 filesystem
GFS2 is a clustered filesystem, designed for sharing data between
multiple nodes connected to a common shared storage device. It can also be
used as a local filesystem on a single node, however since the design is
aimed at clusters, that will usually result in lower performance than using
a filesystem designed specifically for single node use.
GFS2 is a journaling filesystem and one journal is required for
each node that will mount the filesystem. The one exception to that is
spectator mounts which are equivalent to mounting a read-only block device
and as such can neither recover a journal or write to the filesystem, so do
not require a journal assigned to them.
The GFS2 documentation has been split into a number of
sections:
mkfs.gfs2(8) Create a GFS2 filesystem
fsck.gfs2(8) The GFS2 filesystem checker
gfs2_grow(8) Growing a GFS2 filesystem
gfs2_jadd(8) Adding a journal to a GFS2 filesystem
tunegfs2(8) Tool to manipulate GFS2 superblocks
gfs2_edit(8) A GFS2 debug tool (use with caution)
- lockproto=LockProtoName
- This specifies which inter-node lock protocol is used by the GFS2
filesystem for this mount, overriding the default lock protocol name
stored in the filesystem's on-disk superblock.
The LockProtoName must be one of the supported locking
protocols, currently these are lock_nolock and
lock_dlm.
The default lock protocol name is written to disk initially
when creating the filesystem with mkfs.gfs2(8), -p option. It can
be changed on-disk by using the tunegfs2(8) command.
The lockproto mount option should be used only under
special circumstances in which you want to temporarily use a different
lock protocol without changing the on-disk default. Using the incorrect
lock protocol on a cluster filesystem mounted from more than one node
will almost certainly result in filesystem corruption.
- locktable=LockTableName
- This specifies the identity of the cluster and of the filesystem for this
mount, overriding the default cluster/filesystem identify stored in the
filesystem's on-disk superblock. The cluster/filesystem name is recognized
globally throughout the cluster, and establishes a unique namespace for
the inter-node locking system, enabling the mounting of multiple GFS2
filesystems.
The format of LockTableName is lock-module-specific.
For lock_dlm, the format is clustername:fsname. For
lock_nolock, the field is ignored.
The default cluster/filesystem name is written to disk
initially when creating the filesystem with mkfs.gfs2(8), -t
option. It can be changed on-disk by using the tunegfs2(8)
command.
The locktable mount option should be used only under
special circumstances in which you want to mount the filesystem in a
different cluster, or mount it as a different filesystem name, without
changing the on-disk default.
- localflocks
- This flag tells GFS2 that it is running as a local (not clustered)
filesystem, so it can allow the kernel VFS layer to do all flock and fcntl
file locking. When running in cluster mode, these file locks require
inter-node locks, and require the support of GFS2. When running locally,
better performance is achieved by letting VFS handle the whole job.
This is turned on automatically by the lock_nolock module.
- errors=[panic|withdraw]
- Setting errors=panic causes GFS2 to oops when encountering an error that
would otherwise cause the mount to withdraw or print an assertion warning.
The default setting is errors=withdraw. This option should not be used in
a production system. It replaces the earlier debug option on kernel
versions 2.6.31 and above.
- acl
- Enables POSIX Access Control List acl(5) support within GFS2.
- spectator
- Mount this filesystem using a special form of read-only mount. The mount
does not use one of the filesystem's journals. The node is unable to
recover journals for other nodes.
- norecovery
- A synonym for spectator
- suiddir
- Sets owner of any newly created file or directory to be that of parent
directory, if parent directory has S_ISUID permission attribute bit set.
Sets S_ISUID in any new directory, if its parent directory's S_ISUID is
set. Strips all execution bits on a new file, if parent directory owner is
different from owner of process creating the file. Set this option only if
you know why you are setting it.
- quota=[off/account/on]
- Turns quotas on or off for a filesystem. Setting the quotas to be in the
"account" state causes the per UID/GID usage statistics to be
correctly maintained by the filesystem, limit and warn values are ignored.
The default value is "off".
- discard
- Causes GFS2 to generate "discard" I/O requests for blocks which
have been freed. These can be used by suitable hardware to implement
thin-provisioning and similar schemes. This feature is supported in kernel
version 2.6.30 and above.
- barrier
- This option, which defaults to on, causes GFS2 to send I/O barriers when
flushing the journal. The option is automatically turned off if the
underlying device does not support I/O barriers. We highly recommend the
use of I/O barriers with GFS2 at all times unless the block device is
designed so that it cannot lose its write cache content (e.g. its on a
UPS, or it doesn't have a write cache)
- commit=secs
- This is similar to the ext3 commit= option in that it sets the
maximum number of seconds between journal commits if there is dirty data
in the journal. The default is 60 seconds. This option is only provided in
kernel versions 2.6.31 and above.
- data=[ordered|writeback]
- When data=ordered is set, the user data modified by a transaction is
flushed to the disk before the transaction is committed to disk. This
should prevent the user from seeing uninitialized blocks in a file after a
crash. Data=writeback mode writes the user data to the disk at any time
after it's dirtied. This doesn't provide the same consistency guarantee as
ordered mode, but it should be slightly faster for some workloads. The
default is ordered mode.
- meta
- This option results in selecting the meta filesystem root rather than the
normal filesystem root. This option is normally only used by the GFS2
utility functions. Altering any file on the GFS2 meta filesystem may
render the filesystem unusable, so only experts in the GFS2 on-disk layout
should use this option.
- quota_quantum=secs
- This sets the number of seconds for which a change in the quota
information may sit on one node before being written to the quota file.
This is the preferred way to set this parameter. The value is an integer
number of seconds greater than zero. The default is 60 seconds. Shorter
settings result in faster updates of the lazy quota information and less
likelihood of someone exceeding their quota. Longer settings make
filesystem operations involving quotas faster and more efficient.
- statfs_quantum=secs
- Setting statfs_quantum to 0 is the preferred way to set the slow version
of statfs. The default value is 30 secs which sets the maximum time period
before statfs changes will be syned to the master statfs file. This can be
adjusted to allow for faster, less accurate statfs values or slower more
accurate values. When set to 0, statfs will always report the true
values.
- statfs_percent=value
- This setting provides a bound on the maximum percentage change in the
statfs information on a local basis before it is synced back to the master
statfs file, even if the time period has not expired. If the setting of
statfs_quantum is 0, then this setting is ignored.
- rgrplvb
- This flag tells gfs2 to look for information about a resource group's free
space and unlinked inodes in its glock lock value block. This keeps gfs2
from having to read in the resource group data from disk, speeding up
allocations in some cases. This option was added in the 3.6 Linux kernel.
Prior to this kernel, no information was saved to the resource group lvb.
Note: To safely turn on this option, all nodes mounting the
filesystem must be running at least a 3.6 Linux kernel. If any nodes had
previously mounted the filesystem using older kernels, the filesystem must
be unmounted on all nodes before it can be mounted with this option
enabled. This option does not need to be enabled on all nodes using a
filesystem.
- loccookie
- This flag tells gfs2 to use location based readdir cookies, instead of its
usual filename hash readdir cookies. The filename hash cookies are not
guaranteed to be unique, and as the number of files in a directory
increases, so does the likelihood of a collision. NFS requires readdir
cookies to be unique, which can cause problems with very large directories
(over 100,000 files). With this flag set, gfs2 will try to give out
location based cookies. Since the cookie is 31 bits, gfs2 will eventually
run out of unique cookies, and will fail back to using hash cookies. The
maximum number of files that could have unique location cookies assuming
perfectly even hashing and names of 8 or fewer characters is
1,073,741,824. An average directory should be able to give out well over
half a billion location based cookies. This option was added in the 4.5
Linux kernel. Prior to this kernel, gfs2 did not add directory entries in
a way that allowed it to use location based readdir cookies. Note:
To safely turn on this option, all nodes mounting the filesystem must be
running at least a 4.5 Linux kernel. If this option is only enabled on
some of the nodes mounting a filesystem, the cookies returned by nodes
using this option will not be valid on nodes that are not using this
option, and vice versa. Finally, when first enabling this option on a
filesystem that had been previously mounted without it, you must make sure
that there are no outstanding cookies being cached by other software, such
as NFS.
GFS2 clustering is driven by the dlm, which depends on
dlm_controld to provide clustering from userspace. dlm_controld clustering
is built on corosync cluster/group membership and messaging. GFS2 also
requires clustered lvm which is provided by lvmlockd or, previously, clvmd.
Refer to the documentation for each of these components and ensure that they
are configured before setting up a GFS2 filesystem. Also refer to your
distribution's documentation for any specific support requirements.
Ensure that gfs2-utils is installed on all nodes which mount the
filesystem as it provides scripts required for correct withdraw event
response.
1. Create the gfs2 filesystem
mkfs.gfs2 -p lock_dlm -t cluster_name:fs_name -j num
/path/to/storage
The cluster_name must match the name configured in corosync (and
thus dlm). The fs_name must be a unique name for the filesystem in the
cluster. The -j option is the number of journals to create; there must be
one for each node that will mount the filesystem.
2. Mount the gfs2 filesystem
If you are using a clustered resource manager, see its
documentation for enabling a gfs2 filesystem resource. Otherwise, run:
mount /path/to/storage /mountpoint
Run "dlm_tool ls" to verify the nodes that have each fs
mounted.
3. Shut down
If you are using a clustered resource manager, see its
documentation for disabling a gfs2 filesystem resource. Otherwise, run:
umount -a -t gfs2