VIRTIOFSD(1) | QEMU | VIRTIOFSD(1) |
virtiofsd - QEMU virtio-fs shared file system daemon
virtiofsd [OPTIONS]
Share a host directory tree with a guest through a virtio-fs device. This program is a vhost-user backend that implements the virtio-fs device. Each virtio-fs device instance requires its own virtiofsd process.
This program is designed to work with QEMU's --device vhost-user-fs-pci but should work with any virtual machine monitor (VMM) that supports vhost-user. See the Examples section below.
This program must be run as the root user. The program drops privileges where possible during startup although it must be able to create and access files with any uid/gid:
In "namespace" sandbox mode the program switches into a new file system namespace and invokes pivot_root(2) to make the shared directory tree its root. New pid and net namespaces are also created to isolate the process.
In "chroot" sandbox mode the program invokes chroot(2) to make the shared directory tree its root. This mode is intended for container environments where the container runtime has already set up the namespaces and the program does not have permission to create namespaces itself.
Both sandbox modes prevent "file system escapes" due to symlinks and other file system objects that might lead to files outside the shared directory.
By default the names of xattrs used by the client are passed through to the server file system. This can be a problem either where those xattr names are used by something on the server (e.g. SELinux client/server confusion) or where virtiofsd is running in a container with restricted privileges and cannot access some attributes.
A mapping of xattr names can be made using -o xattrmap=mapping where the mapping string consists of a series of rules.
The first matching rule terminates the mapping. The set of rules must include a terminating rule to match any remaining attributes at the end.
Each rule consists of a number of fields separated with a separator that is the first non-white space character in the rule. This separator must then be used for the whole rule. White space may be added before and after each rule.
Using ':' as the separator a rule is of the form:
:type:scope:key:prepend:
scope is:
type is one of:
key is a string tested as a prefix on an attribute name originating on the client. It may be empty, in which case a 'client' rule will always match on client names.
prepend is a string tested as a prefix on an attribute name originating on the server, and used as a new prefix. It may be empty in which case a 'server' rule will always match on all names from the server.
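The rule syntax described above can be illustrated with a small parser. This is a minimal sketch, not virtiofsd's own parsing code: the Rule tuple and the parse_rules function are hypothetical names introduced here, and only the four-field form is handled (the shorter 'map' form is not).

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    type: str
    scope: str
    key: str
    prepend: str


def parse_rules(mapping: str) -> List[Rule]:
    """Split a mapping string into four-field rules (hypothetical helper)."""
    rules: List[Rule] = []
    pos = 0
    while True:
        # White space may appear before and after each rule.
        while pos < len(mapping) and mapping[pos].isspace():
            pos += 1
        if pos >= len(mapping):
            return rules
        # The first non-white-space character is this rule's separator.
        sep = mapping[pos]
        pos += 1
        fields = []
        for _ in range(4):  # type, scope, key, prepend
            end = mapping.find(sep, pos)
            if end < 0:
                raise ValueError(f"rule is missing a closing {sep!r}")
            fields.append(mapping[pos:end])
            pos = end + 1
        rules.append(Rule(*fields))


for rule in parse_rules(":prefix:all::user.virtiofs.::bad:all:::"):
    print(rule)
```

Because the separator is read afresh at the start of every rule, the sketch accepts rules written back to back as well as rules separated by new lines.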
e.g.:
:prefix:client:trusted.:user.virtiofs.:
will match 'trusted.' attributes in client calls and prefix them before passing them to the server.
:prefix:server::user.virtiofs.:
will strip 'user.virtiofs.' from all server replies.
:prefix:all:trusted.:user.virtiofs.:
combines the previous two cases into a single rule.
:ok:client:user.::
will allow get/set xattr for 'user.' xattrs and ignore following rules.
:ok:server::security.:
will pass 'security.' xattrs in listxattr from the server and ignore following rules.
:ok:all:::
will terminate the rule search passing any remaining attributes in both directions.
:bad:server::security.:
would hide 'security.' xattrs in listxattr from the server.
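The first-match behaviour of the rules above can be sketched as follows. This models only what this page describes and is not the daemon's implementation. The names map_client and map_server are hypothetical. Reporting a denied or unmatched client name as PermissionError, and hiding an unmatched server name, are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (type, scope, key, prepend), in the order the rules were given.
Rule = Tuple[str, str, str, str]


def map_client(rules: List[Rule], name: str) -> str:
    """Return the name passed to the server for a client request."""
    for rtype, scope, key, prepend in rules:
        # Client names are tested against 'key'.
        if scope in ("client", "all") and name.startswith(key):
            if rtype == "prefix":
                return prepend + name
            if rtype == "ok":
                return name
            if rtype == "bad":
                # Assumption: a blocked name surfaces as a permission error.
                raise PermissionError(name)
    # Assumption: a rule set without a terminating rule denies the name.
    raise PermissionError(name)


def map_server(rules: List[Rule], name: str) -> Optional[str]:
    """Return the name shown to the client in listxattr, or None if hidden."""
    for rtype, scope, key, prepend in rules:
        # Server names are tested against 'prepend'.
        if scope in ("server", "all") and name.startswith(prepend):
            if rtype == "prefix":
                return name[len(prepend):]
            if rtype == "ok":
                return name
            if rtype == "bad":
                return None
    # Assumption: an unmatched server name is hidden.
    return None


rules = [
    ("prefix", "all", "trusted.", "user.virtiofs."),
    ("bad", "server", "", "security."),
    ("ok", "all", "", ""),
]
print(map_client(rules, "trusted.foo"))
print(map_server(rules, "user.virtiofs.trusted.foo"))
print(map_server(rules, "security.selinux"))
```

The order of the rules matters: moving the final 'ok' rule to the front would let every name through unchanged, because the search stops at the first match.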
A simpler 'map' type provides a shorter syntax for the common case:
:map:key:prepend:
The 'map' type adds a number of separate rules to add prepend as a prefix to the matched key (or all attributes if key is empty). There may be at most one 'map' rule and it must be the last rule in the set.
Note: When the 'security.capability' xattr is remapped, the daemon has to do extra work to remove it during many operations, which the host kernel normally does itself.
Operating systems typically partition the xattr namespace using well-defined name prefixes. Each partition may have different access controls applied. For example, Linux has multiple partitions (security.*, system.*, trusted.* and user.*), while other operating systems such as FreeBSD have different name prefixes and access control rules.
When remapping attributes on the host, it is important to ensure that the remapping does not allow a guest user to evade the guest access control rules.
Consider what happens if trusted.* from the guest is remapped to user.virtiofs.trusted.* on the host. An unprivileged user in a Linux guest has the ability to write to xattrs under user.*. Thus the user can evade the access control restriction on trusted.* by instead writing to user.virtiofs.trusted.*.
As noted above, the partitions used and the access controls applied vary across guest operating systems, so it is not wise to try to predict what the guest OS will use.
The simplest way to avoid an insecure configuration is to remap all xattrs at once, to a given fixed prefix. This is shown in example (1) below.
If selectively mapping only a subset of xattr prefixes, then rules must be added to explicitly block direct access to the target of the remapping. This is shown in example (2) below.
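The bypass described above can be made concrete with a toy model. Both functions below are hypothetical stand-ins for a rule set, not virtiofsd code, and 'trusted.example' is an invented attribute name. The first remaps 'trusted.' only. The second also blocks direct access to the remap target, with the block shown as a PermissionError by assumption.

```python
def naive_host_name(guest_name: str) -> str:
    """Insecure: remap 'trusted.' and pass everything else through."""
    if guest_name.startswith("trusted."):
        return "user.virtiofs." + guest_name
    return guest_name


def guarded_host_name(guest_name: str) -> str:
    """Same remap, but the guest may not name the remap target directly."""
    if guest_name.startswith("trusted."):
        return "user.virtiofs." + guest_name
    if guest_name.startswith("user.virtiofs."):
        # Assumption: blocked names are reported as a permission error.
        raise PermissionError(guest_name)
    return guest_name


privileged = "trusted.example"                 # needs privilege in the guest
unprivileged = "user.virtiofs.trusted.example"  # writable by any guest user

# Under the naive mapping both guest names reach the same host attribute.
print(naive_host_name(privileged) == naive_host_name(unprivileged))
```

With the guarded variant the second name is rejected, so the only route to the host attribute is through the guest's own 'trusted.' access checks.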
-o xattrmap=":prefix:all::user.virtiofs.::bad:all:::"
This uses two rules, using : as the field separator; the first rule prefixes and strips 'user.virtiofs.', the second rule hides any non-prefixed attributes that the host set.
This is equivalent to the 'map' rule:
-o xattrmap=":map::user.virtiofs.:"
"/prefix/all/trusted./user.virtiofs./
/bad/server//trusted./
/bad/client/user.virtiofs.//
/ok/all///"
Here there are four rules, using / as the field separator, and also demonstrating that new lines can be included between rules. The first rule is the prefixing of 'trusted.' and stripping of 'user.virtiofs.'. The second rule hides unprefixed 'trusted.' attributes on the host. The third rule stops a guest from explicitly setting the 'user.virtiofs.' path directly to prevent access control bypass on the target of the earlier prefix remapping. Finally, the fourth rule lets all remaining attributes through.
This is equivalent to the 'map' rule:
-o xattrmap="/map/trusted./user.virtiofs./"
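The two equivalences stated above suggest how a 'map' rule expands into ordinary rules. The sketch below reproduces exactly those two expansions. The function name expand_map and the tuple layout are hypothetical, and the daemon's actual expansion is only assumed to match what this page documents.

```python
from typing import List, Tuple

Rule = Tuple[str, str, str, str]  # (type, scope, key, prepend)


def expand_map(key: str, prepend: str) -> List[Rule]:
    """Expand ':map:key:prepend:' into the equivalent rule list."""
    # Prefix matching names from the client, strip the prefix from the server.
    rules: List[Rule] = [("prefix", "all", key, prepend)]
    if key == "":
        # Everything is prefixed: hide any non-prefixed host attributes.
        rules.append(("bad", "all", "", ""))
    else:
        # Hide unprefixed attributes matching 'key' on the host.
        rules.append(("bad", "server", "", key))
        # Stop the guest from naming the remap target directly.
        rules.append(("bad", "client", prepend, ""))
        # Let all remaining attributes through.
        rules.append(("ok", "all", "", ""))
    return rules


for rule in expand_map("trusted.", "user.virtiofs."):
    print(rule)
```

Keeping the 'bad' rules inside the expansion is what makes 'map' the safer choice: the block on the remap target cannot be forgotten.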
"/bad/all/security./security./
/ok/all///"
The first rule combines what could be separate client and server rules into a single 'all' rule, matching 'security.' in either client arguments or lists returned from the host. This stops the client seeing any 'security.' attributes on the server and stops it setting any.
SELinux support can be enabled by running virtiofsd with the option "-o security_label". However, this will try to save the guest's security context in the xattr security.selinux on the host, which might fail if the host's SELinux policy does not permit virtiofsd to do this.
Hence, it is preferable to remap the guest's "security.selinux" xattr to, say, "trusted.virtiofs.security.selinux" on the host.
"-o xattrmap=:map:security.selinux:trusted.virtiofs.:"
This ensures that the guest's and the host's SELinux xattrs on the same file remain separate and do not interfere with each other. It also allows the host and the guest to implement their own separate SELinux policies.
Setting trusted xattrs on the host requires CAP_SYS_ADMIN, so this capability needs to be added to the daemon.
"-o modcaps=+sys_admin"
Granting CAP_SYS_ADMIN increases the risk to the system: virtiofsd becomes more powerful and, if it is compromised, can do a lot of damage to the host. Keep this trade-off in mind when making a decision.
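The options discussed above could be combined into one invocation roughly as follows. This is only a sketch: passing all three options together on a single command line is an assumption, and the socket path and shared directory are placeholders borrowed from the example in this page.

```python
import shlex

socket_path = "/var/run/vm001-vhost-fs.sock"  # placeholder path
shared_dir = "/var/lib/fs/vm001"              # placeholder directory

argv = [
    "virtiofsd",
    f"--socket-path={socket_path}",
    "-o", f"source={shared_dir}",
    # Enable SELinux label support for the guest.
    "-o", "security_label",
    # Keep guest labels apart from the host's own security.selinux xattr.
    "-o", "xattrmap=:map:security.selinux:trusted.virtiofs.:",
    # trusted.* xattrs on the host need CAP_SYS_ADMIN.
    "-o", "modcaps=+sys_admin",
]

# Print the command for review instead of running it; it must run as root.
print(shlex.join(argv))
```

Building the argument list rather than a single string avoids shell quoting problems if a different separator character is chosen for the mapping.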
Export /var/lib/fs/vm001/ on vhost-user UNIX domain socket /var/run/vm001-vhost-fs.sock:
host# virtiofsd --socket-path=/var/run/vm001-vhost-fs.sock -o source=/var/lib/fs/vm001
host# qemu-system-x86_64 \
      -chardev socket,id=char0,path=/var/run/vm001-vhost-fs.sock \
      -device vhost-user-fs-pci,chardev=char0,tag=myfs \
      -object memory-backend-memfd,id=mem,size=4G,share=on \
      -numa node,memdev=mem \
      ...
guest# mount -t virtiofs myfs /mnt
Stefan Hajnoczi <stefanha@redhat.com>, Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
2024, The QEMU Project Developers
February 6, 2024 | 7.2.9 |