xl.cfg(5) | Xen | xl.cfg(5)
xl.cfg - xl domain configuration file syntax
/etc/xen/xldomain
Creating a VM (a domain in Xen terminology, sometimes called a guest) with xl requires the provision of a domain configuration file. Typically, these live in /etc/xen/DOMAIN.cfg, where DOMAIN is the name of the domain.
A domain configuration file consists of a series of options, specified by using "KEY=VALUE" pairs.
Some "KEY"s are mandatory, some are general options which apply to any guest type, while others relate only to specific guest types (e.g. PV or HVM guests).
A "VALUE" can be one of:
The semantics of each "KEY" defines which type of "VALUE" is required.
Pairs may be separated either by a newline or a semicolon. Both of the following are valid:
name="h0" type="hvm" name="h0"; type="hvm"
The following key is mandatory for any guest type.
Deprecated guest type selection
Note that the builder option is being deprecated in favor of the type option.
The following options apply to guests of any type.
CPU Allocation
Combining this notation with the one above is possible. For instance, "1,node:1,^6" means all the vCPUs of the guest will run on CPU 1 and on all the CPUs of NUMA node 1, but not on CPU 6. Following the same example as above, that would be CPUs 1,4,5,7.
Combining this with "all" is also possible, meaning "all,^node:1" results in all the vCPUs of the guest running on all the CPUs on the host, except for the CPUs belonging to the host NUMA node 1.
More complex notation can be also used, exactly as described above. So "all,^5-8", or just "all", or "node:0,node:2,^9-11,18-20" are all legal, for each element of the list.
If this option is not specified, no vCPU to CPU pinning is established, and the vCPUs of the guest can run on all the CPUs of the host. If this option is specified, the intersection of the vCPU pinning mask, provided here, and the soft affinity mask, if provided via cpus_soft=, is utilized to compute the domain node-affinity for driving memory allocations.
A "CPULIST" is specified exactly as for cpus=, detailed earlier in the manual.
If this option is not specified, the vCPUs of the guest will not have any preference regarding host CPUs. If this option is specified, the intersection of the soft affinity mask, provided here, and the vCPU pinning, if provided via cpus=, is utilized to compute the domain node-affinity for driving memory allocations.
If this option is not specified (and cpus= is not specified either), libxl automatically tries to place the guest on the least possible number of nodes. A heuristic approach is used for choosing the best node (or set of nodes), with the goal of maximizing performance for the guest and, at the same time, achieving efficient utilization of host CPUs and memory. In that case, the soft affinity of all the vCPUs of the domain will be set to host CPUs belonging to NUMA nodes chosen during placement.
For more details, see xl-numa-placement(7).
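For example, a fragment like the following (CPU numbers illustrative) pins the guest's vCPUs to CPUs 0-3 while expressing a soft preference for the CPUs of NUMA node 0:

    cpus = "0-3"
    cpus_soft = "node:0"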
CPU Scheduling
NOTE: Many systems have features that will scale down the computing power of a CPU that is not 100% utilized. This can be done in the operating system, but can also sometimes be done below the operating system, in the BIOS. If you set a cap such that individual cores are running at less than 100%, this may have an impact on the performance of your workload over and above the impact of the cap. For example, if your processor runs at 2GHz, and you cap a VM at 50%, the power management system may also reduce the clock speed to 1GHz; the effect will be that your VM gets 25% of the available power (50% of 1GHz) rather than 50% (50% of 2GHz). If you are not getting the performance you expect, look at performance and CPU frequency options in your operating system and your BIOS.
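As an illustration, the credit scheduler's weight and cap might be configured as follows (values illustrative):

    cpu_weight = 512
    cap = 50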
Memory Allocation
In combination with memory= it will start the guest "pre-ballooned" if the values of memory= and maxmem= differ. A "pre-ballooned" HVM guest needs a balloon driver; without a balloon driver it will crash.
NOTE: Because of the way ballooning works, the guest has to allocate memory to keep track of maxmem pages, regardless of how much memory it actually has available to it. A guest with maxmem=262144 and memory=8096 will report significantly less memory available for use than a system with maxmem=8096 and memory=8096, due to the memory overhead of having to track the unused pages.
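For example, the following fragment (sizes illustrative) starts a guest pre-ballooned to 1024 MB, able to balloon up to 2048 MB:

    memory = 1024
    maxmem = 2048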
Guest Virtual NUMA Configuration
Note that virtual NUMA is not supported for PV guests yet, because there is an issue with the CPUID instruction handling that affects PV virtual NUMA. Furthermore, guests with virtual NUMA cannot be saved or migrated because the migration stream does not preserve node information.
Each VNODE_SPEC is a list which has the form "[VNODE_CONFIG_OPTION, VNODE_CONFIG_OPTION, ... ]" (without the quotes).
For example, vnuma = [ ["pnode=0","size=512","vcpus=0-4","vdistances=10,20"] ] means vnode 0 is mapped to pnode 0, has 512MB of RAM and has vcpus 0 to 4; the distance to itself is 10 and the distance to vnode 1 is 20.
Each VNODE_CONFIG_OPTION is a quoted "KEY=VALUE" pair. Supported VNODE_CONFIG_OPTIONs are (they are all mandatory at the moment):
Normally you can use the values from xl info -n or numactl --hardware to fill the vdistances list.
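As a fuller illustration, a two-vnode configuration might look like this (sizes, vcpus and distances illustrative):

    vnuma = [ ["pnode=0","size=512","vcpus=0-1","vdistances=10,20"],
              ["pnode=1","size=512","vcpus=2-3","vdistances=20,10"] ]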
Event Actions
The default for on_poweroff is destroy.
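For example (action choices illustrative):

    on_poweroff = "destroy"
    on_reboot   = "restart"
    on_crash    = "coredump-restart"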
Direct Kernel Boot
Direct kernel boot allows booting guests with a kernel and an initrd stored on a filesystem available to the host physical machine, allowing command line arguments to be passed directly. PV guest direct kernel boot is supported. HVM guest direct kernel boot is supported with some limitations: it is supported when using qemu-xen and the default BIOS 'seabios', but not when using stubdom-dm and the old 'rombios'.
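A typical direct kernel boot fragment might look like this (paths and command line illustrative):

    kernel  = "/boot/vmlinuz"
    ramdisk = "/boot/initrd.img"
    extra   = "root=/dev/xvda1 console=hvc0"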
Non direct Kernel Boot
Non direct kernel boot allows booting guests with a firmware. This can be used by all types of guests, although the selection of options is different depending on the guest type.
This option provides the flexibility of letting the guest decide which kernel it wants to boot, while avoiding having to poke at the guest file system from the toolstack domain.
PV guest options
Note that xl expects to find the pvgrub32.bin and pvgrub64.bin binaries in /usr/lib/xen-4.11/boot.
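For example, a PV guest might be booted via pvgrub like this:

    type = "pv"
    firmware = "pvgrub64"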
HVM guest options
PVH guest options
Currently there is no firmware available for PVH guests; they should be booted using the Direct Kernel Boot method or the bootloader option.
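For example, a PVH guest using direct kernel boot (kernel path illustrative):

    type = "pvh"
    kernel = "/boot/vmlinuz"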
Default is false.
Other Options
Given the complexity of verifying the validity of a device tree, this option should only be used with a trusted device tree.
Note that the partial device tree should avoid using the phandle 65000, which is reserved by the toolstack.
The following options define the paravirtual, emulated and physical devices which the guest will contain.
Each VTPM_SPEC_STRING is a comma-separated list of "KEY=VALUE" settings from the following list:
Each 9PFS_SPEC_STRING is a comma-separated list of "KEY=VALUE" settings, from the following list:
This option does not control the emulated graphics card presented to an HVM guest. See Emulated VGA Graphics Device below for how to configure the emulated device. If Emulated VGA Graphics Device options are used in a PV guest configuration, xl will pick up vnc, vnclisten, vncpasswd, vncdisplay, vncunused, sdl, opengl and keymap to construct the paravirtual framebuffer device for the guest.
Each VFB_SPEC_STRING is a comma-separated list of "KEY=VALUE" settings, from the following list:
Note: if you specify the display number here, you should not use the vncdisplay option.
Note: you should not use this option if you set the DISPLAYNUM in the vnclisten option.
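For example, a single VNC framebuffer listening on all host addresses (address illustrative):

    vfb = [ "vnc=1,vnclisten=0.0.0.0" ]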
Each CHANNEL_SPEC_STRING is a comma-separated list of "KEY=VALUE" settings. Leading and trailing whitespace is ignored in both KEY and VALUE. Neither KEY nor VALUE may contain ',', '=' or '"'. Defined values are:
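A complete channel fragment might look like this (path and channel name illustrative):

    channel = [ "connection=socket,path=/tmp/mychannel,name=org.example.mychannel" ]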
RDM_RESERVATION_STRING is a comma-separated list of "KEY=VALUE" settings, from the following list:
By default this isn't set, so we don't check all RDMs. Instead, we just check the RDM specific to a given device when assigning that kind of device.
Note: this option is not recommended unless you can make sure that no conflicts exist.
For example, suppose you set "memory = 2800" to allocate memory to one given VM, but the platform has two RDM regions like:
Device A [sbdf_A]: RMRR region_A: base_addr ac6d3000 end_address ac6e6fff
Device B [sbdf_B]: RMRR region_B: base_addr ad800000 end_address afffffff
In this conflict case,
#1. If strategy is set to "host", for example:
rdm = "strategy=host,policy=strict" or rdm = "strategy=host,policy=relaxed"
it means all conflicts will be handled according to the policy introduced by policy as described below.
#2. If strategy is not set at all, but
pci = [ 'sbdf_A, rdm_policy=xxxxx' ]
it means only one conflict of region_A will be handled according to the policy introduced by rdm_policy=STRING as described inside pci options.
Note: this may be overridden by the rdm_policy option in the pci device configuration.
Each USBCTRL_SPEC_STRING is a comma-separated list of "KEY=VALUE" settings, from the following list:
This option is the default.
USB controller ids start from 0. In line with the USB specification, however, ports on a controller start from 1.
EXAMPLE
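    usbctrl = [ "version=1,ports=4", "version=2,ports=8" ]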
The first controller is USB1.1 and has:
controller id = 0, and ports 1,2,3,4.
The second controller is USB2.0 and has:
controller id = 1, and ports 1,2,3,4,5,6,7,8.
Each USBDEV_SPEC_STRING is a comma-separated list of "KEY=VALUE" settings, from the following list:
If no controller is specified, an available controller:port combination will be used. If there are no available controller:port combinations, a new controller will be created.
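For example, to pass through the host USB device at bus 2, device 3 (addresses illustrative):

    usbdev = [ "hostbus=2,hostaddr=3" ]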
Note: by default lspci(1) will omit the domain (DDDD) if it is zero and it is optional here also. You may specify the function (F) as * to indicate all functions.
This option should be enabled with caution: it gives the guest much more control over the device, which may have security or stability implications. It is recommended to only enable this option for trusted VMs under administrator's control.
WARNING: If you set this option, xl will gladly re-assign a critical system device, such as a network or disk controller being used by dom0, without confirmation. Please use with care.
Note: this would override global rdm option.
The graphics card PCI device to pass through is chosen with the pci option, in exactly the same way a normal Xen PCI device passthrough/assignment is done. Note that gfx_passthru does not do any kind of sharing of the GPU, so you can assign the GPU to only one single VM at a time.
gfx_passthru also enables various legacy VGA memory ranges, BARs, MMIOs, and ioports to be passed through to the VM, since those are required for correct operation of things like VGA BIOS, text mode, VBE, etc.
Enabling the gfx_passthru option also copies the physical graphics card video BIOS to the guest memory, and executes the VBIOS in the guest to initialize the graphics card.
Most graphics adapters require vendor-specific tweaks for properly working graphics passthrough. See the XenVGAPassthroughTestedAdapters <http://wiki.xen.org/wiki/XenVGAPassthroughTestedAdapters> wiki page for graphics cards currently supported by gfx_passthru.
gfx_passthru is currently supported both with the qemu-xen-traditional device-model and upstream qemu-xen device-model.
When given as a boolean the gfx_passthru option either disables graphics card passthrough or enables autodetection.
When given as a string the gfx_passthru option describes the type of device to enable. Note that this behavior is only supported with the upstream qemu-xen device-model. With qemu-xen-traditional IGD (Intel Graphics Device) is always assumed and options other than autodetect or explicit IGD will result in an error.
Currently, valid values for the option are: 0 (disable graphics device PCI passthrough), 1 or "default" (enable passthrough and autodetect the device type) and "igd" (force the device type to Intel Graphics Device).
Note that some graphics cards (AMD/ATI cards, for example) do not necessarily require the gfx_passthru option, so you can use the normal Xen PCI passthrough to assign the graphics card as a secondary graphics card to the VM. The QEMU-emulated graphics card remains the primary graphics card, and VNC output is available from the QEMU-emulated primary adapter.
More information about the Xen gfx_passthru feature is available on the XenVGAPassthrough <http://wiki.xen.org/wiki/XenVGAPassthrough> wiki page.
When RDM conflicts with RAM, RDM is probably scattered over the whole RAM space. Having multiple RDM entries would worsen this and lead to a complicated memory layout. Here we're trying to figure out a simple solution to avoid breaking the existing layout. When a conflict occurs,
#1. Above a predefined boundary:

    move lowmem_end below the reserved region to solve the conflict.

#2. Below a predefined boundary:

    check whether the policy is strict or relaxed. A "strict" policy leads to a failure in libxl. Note that when both policies are specified on a given region, "strict" is always preferred. The "relaxed" policy issues a warning message and also masks this entry INVALID to indicate we shouldn't expose this entry to hvmloader.
The default value is 2048.
It is recommended to only use this option for trusted VMs under administrator's control.
IOMEM_START is a physical page number. NUM_PAGES is the number of pages, beginning with START_PAGE, to allow access to. GFN specifies the guest frame number where the mapping will start in the guest's address space. If GFN is not specified, the mapping will be performed using IOMEM_START as a start in the guest's address space, therefore performing a 1:1 mapping by default. All of these values must be given in hexadecimal format.
Note that the IOMMU won't be updated with the mappings specified with this option. This option therefore should not be used to pass through any IOMMU-protected devices.
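For example, the following fragment (addresses illustrative) grants access to 4 pages starting at machine page 0xfe000, mapped 1:1 into the guest:

    iomem = [ "fe000,4" ]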
It is recommended to only use this option for trusted VMs under administrator's control.
It is recommended to only use this option for trusted VMs under administrator's control.
If the vuart console is enabled then irq 32 is reserved for it. See vuart="uart" for how to enable the vuart console.
The default of 1023 should be sufficient for typical guests. The maximum value depends on what the guest supports. Guests supporting the FIFO-based event channel ABI support up to 131,071 event channels. Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit x86).
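For example (limit illustrative):

    event_channels = 2047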
Each VDISPL_SPEC_STRING is a comma-separated list of "KEY=VALUE" settings, from the following list:
EXAMPLE
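An illustrative fragment (backend domain, connector ids and resolutions are examples only):

    vdispl = [ "backend=0,be-alloc=1,connectors=id0:1920x1080;id1:800x600" ]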
With this feature enabled, a compromise of the device model, via such a vulnerability, will not provide a privilege escalation attack on the whole system.
This feature is a technology preview. There are some significant limitations:
Ideally, set aside a range of 32752 uids (from N to N+32751) and create a user whose name is xen-qemuuser-range-base and whose uid is N and whose gid is a plain unprivileged gid. libxl will use one such user for each domid.
Alternatively, either create xen-qemuuser-domid$domid for every $domid from 1 to 32751 inclusive, or xen-qemuuser-shared (in which case different guests will not be protected against each other).
The following options apply only to Paravirtual (PV) guests.
Exposing the host e820 to the guest gives the guest kernel the opportunity to set aside the required part of its pseudo-physical address space in order to provide address space in which to map passed-through PCI devices. It is guest Operating System dependent whether this option is required; specifically, it is required when using a mainline Linux ("pvops") kernel. This option defaults to true (1) if any PCI passthrough devices are configured and false (0) otherwise. If you do not configure any passthrough devices at domain creation time but expect to hotplug devices later then you should set this option. Conversely, if your particular guest kernel does not require this behaviour then it is safe to allow this to be enabled, but you may wish to disable it anyway.
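For example, to enable it explicitly in anticipation of hotplugging passthrough devices later:

    e820_host = 1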
The following options apply only to Fully-virtualised (HVM) guests.
Boot Device
Possible values are: "c" (hard disk), "d" (CD-ROM) and "n" (network/PXE).
Note: multiple options can be given and will be attempted in the order they are given, e.g. to boot from CD-ROM but fall back to the hard disk you can specify it as dc.
The default is cd, meaning try booting from the hard disk first, but fall back to the CD-ROM.
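In configuration file syntax, the CD-ROM-first example above reads:

    boot = "dc"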
Emulated disk controller type
Possible values are: "ide" and "ahci".
The default is ide.
Paging
The following options control the mechanisms used to virtualise guest memory. The defaults are selected to give the best results for the common cases so you should normally leave these options unspecified.
Processor and Platform Features
The following options allow various processor and platform level features to be hidden or exposed from the guest's point of view. This can be useful when running older guest Operating Systems which may misbehave when faced with more modern features. In general, you should accept the defaults for these options wherever possible.
This option does not have any effect if using bios="rombios" or device_model_version="qemu-xen-traditional".
The valid values are as follows: "disabled", "mixed", "external" and "limited".
Note: While the option "altp2mhvm" is deprecated, legacy applications for x86 systems will continue to work using it.
The libxl syntax is a comma separated list of key=value pairs, preceded by the word "host". A few keys take a numerical value, all others take a single character which describes what to do with the feature bit.
Possible values for a single feature bit:
'1' -> force the corresponding bit to 1
'0' -> force to 0
'x' -> get a safe value (pass through and mask with the default policy)
'k' -> pass through the host bit value
's' -> as 'k' but preserve across save/restore and migration (not implemented)
Note: when specifying cpuid for hypervisor leaves (0x4000xxxx major group) only the lowest 8 bits of leaf's 0x4000xx00 EAX register are processed, the rest are ignored (these 8 bits signify maximum number of hypervisor leaves).
List of keys taking a value: apicidsize brandid clflush family localapicid maxleaf maxhvleaf model nc proccount procpkg stepping
List of keys taking a character: 3dnow 3dnowext 3dnowprefetch abm acpi adx aes altmovcr8 apic arat avx avx2 avx512-4fmaps avx512-4vnniw avx512bw avx512cd avx512dq avx512er avx512f avx512ifma avx512pf avx512vbmi avx512vl bmi1 bmi2 clflushopt clfsh clwb cmov cmplegacy cmpxchg16 cmpxchg8 cmt cntxid dca de ds dscpl dtes64 erms est extapic f16c ffxsr fma fma4 fpu fsgsbase fxsr hle htt hypervisor ia64 ibs invpcid invtsc lahfsahf lm lwp mca mce misalignsse mmx mmxext monitor movbe mpx msr mtrr nodeid nx ospke osvw osxsave pae page1gb pat pbe pcid pclmulqdq pdcm perfctr_core perfctr_nb pge pku popcnt pse pse36 psn rdrand rdseed rdtscp rtm sha skinit smap smep smx ss sse sse2 sse3 sse4.1 sse4.2 sse4_1 sse4_2 sse4a ssse3 svm svm_decode svm_lbrv svm_npt svm_nrips svm_pausefilt svm_tscrate svm_vmcbclean syscall sysenter tbm tm tm2 topoext tsc tsc-deadline tsc_adjust umip vme vmx wdt x2apic xop xsave xtpr
The xend syntax is a list of values in the form of

    'leafnum:register=bitstring,register=bitstring'

where "leafnum" is the requested function, "register" is the response register to modify and "bitstring" represents all bits in the register; its length must be 32 chars. Each successive character represents a lesser-significant bit; possible values are listed above in the libxl section.
Example to hide two features from the guest: 'tm', which is bit #29 in EDX, and 'pni' (SSE3), which is bit #0 in ECX:
xend: [ "1:ecx=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx0,edx=xx0xxxxxxxxxxxxxxxxxxxxxxxxxxxxx" ]
libxl: "host,tm=0,sse3=0"
More info about the CPUID instruction can be found in the processor manuals, and on Wikipedia: <http://en.wikipedia.org/wiki/CPUID>
The VM generation ID is a 128-bit random number that a guest may use to determine if the guest has been restored from an earlier snapshot or cloned.
This is required for Microsoft Windows Server 2012 (and later) domain controllers.
Valid options are:
See also "Virtual Machine Generation ID" by Microsoft: <http://www.microsoft.com/en-us/download/details.aspx?id=30707>
Guest Virtual Time Controls
Options are: "default", "always_emulate", "native" and "native_paravirt".
If an HVM container in default TSC mode is created on a host that provides constant host TSC, its guest TSC frequency will be the same as the host's. If it is later migrated to another host that provides constant host TSC and supports Intel VMX TSC scaling/AMD SVM TSC ratio, its guest TSC frequency will be the same before and after migration, and guest rdtsc/p will be executed natively after migration as well.
If an HVM container in native_paravirt TSC mode can execute both guest rdtsc and guest rdtscp natively, then the guest TSC frequency will be determined in a similar way to that of default TSC mode.
Please see xen-tscmode(7) for more information on this option.
Memory layout
Cannot be smaller than 256. Cannot be larger than 3840.
A known good large value is 3072.
Support for Paravirtualisation of HVM Guests
The following options allow Paravirtualised features (such as devices) to be exposed to the guest Operating System in an HVM guest. Utilising these features requires specific guest support but when available they will result in improved performance.
Setting xen_platform_pci=0 with the default device_model "qemu-xen" requires at least QEMU 1.6.
Groups can be disabled by prefixing the name with '!'. So, for example, to enable all groups except freq, specify:
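    viridian = [ "all", "!freq" ]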
For details of the enlightenments see the latest version of Microsoft's Hypervisor Top-Level Functional Specification.
The enlightenments should be harmless for other versions of Windows (although they will not give any benefit) and the majority of other non-Windows OSes. However, it is known that they are incompatible with some other Operating Systems and in some circumstances can prevent Xen's own paravirtualisation interfaces for HVM guests from being used.
The viridian option can be specified as a boolean. A value of true (1) is equivalent to the list [ "defaults" ], and a value of false (0) is equivalent to an empty list.
Emulated VGA Graphics Device
The following options control the features of the emulated graphics device. Many of these options behave similarly to the equivalent key in the VFB_SPEC_STRING for configuring virtual frame buffer devices (see above).
When using the qemu-xen-traditional device-model, the default as well as minimum amount of video RAM for stdvga is 8 MB, which is sufficient for e.g. 1600x1200 at 32bpp. For the upstream qemu-xen device-model, the default and minimum is 16 MB.
When using the emulated Cirrus graphics card (vga="cirrus") and the qemu-xen-traditional device-model, the amount of video RAM is fixed at 4 MB, which is sufficient for 1024x768 at 32 bpp. For the upstream qemu-xen device-model, the default and minimum is 8 MB.
For QXL vga, both the default and minimum are 128 MB. If videoram is set to less than 128 MB, an error will be triggered.
In general, QXL should work with the Spice remote display protocol for acceleration, and a QXL driver is necessary in the guest in that case. QXL can also work with the VNC protocol, but it will be like a standard VGA card without acceleration.
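For example, to select the standard VGA card with extra video memory (amount illustrative):

    vga = "stdvga"
    videoram = 32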
Spice Graphics Support
The following options control the features of SPICE.
Note: the options depending on spicetls_port are not currently supported.
Available options are: auto_glz, auto_lz, quic, glz, lz, off.
Available options are: filter, all, off.
Miscellaneous Emulated Hardware
The form serial=DEVICE is also accepted for backwards compatibility.
Host devices can also be passed through in this way, by specifying host:USBID, where USBID is of the form xxxx:yyyy. The USBID can typically be found by using lsusb(1) or usb-devices(1).
If you wish to use the "host:bus.addr" format, remove any leading '0' from the bus and addr. For example, for the USB device on bus 008 dev 002, you should write "host:8.2".
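For example, to add an emulated tablet and the host device above to the emulated USB bus:

    usbdevice = [ "tablet", "host:8.2" ]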
The form usbdevice=DEVICE is also accepted for backwards compatibility.
More valid options can be found in the "usbdevice" section of the QEMU documentation.
This parameter only takes effect when device_model_version=qemu-xen. See xen-pci-device-reservations(7) for more information.
This option is disabled by default.
The following options control the selection of the device-model. This is the component which provides emulation of the virtual devices to an HVM guest. For a PV guest a device-model is sometimes used to provide backends for certain PV devices (most usually a virtual framebuffer device).
Valid values are: "qemu-xen" (the default) and "qemu-xen-traditional".
It is recommended to accept the default value for new guests. If you have existing guests then, depending on the nature of the guest Operating System, you may wish to force them to use the device model which they were installed with.
The keymaps available are defined by the device-model which you are using. Commonly this includes:
ar da de de-ch en-gb en-us es et fi fo fr fr-be fr-ca fr-ch hr hu is it ja lt lv mk nl nl-be no pl pt pt-br ru sl sv th tr
The default is en-us.
See qemu(1) for more information.
ARM
Currently, the following versions are supported:
This requires hardware compatibility with the requested version, either natively or via hardware backwards compatibility support.
vuart = "sbsa_uart"
Currently, only the "sbsa_uart" model is supported for ARM.
x86
/etc/xen/NAME.cfg

/var/lib/xen/dump/NAME
This document may contain items which require further documentation. Patches to improve incomplete items (or any other item) are gratefully received on the xen-devel@lists.xen.org mailing list. Please see <http://wiki.xen.org/wiki/SubmittingXenPatches> for information on how to submit a patch to Xen.
2021-06-14 | 4.11.4