| IOCTL-XFS-COMMIT-RANGE(2) | System Calls Manual | IOCTL-XFS-COMMIT-RANGE(2) |
ioctl_xfs_commit_range - conditionally exchange the contents of parts of two files
ioctl_xfs_start_commit - prepare to exchange the contents of two files
#include <sys/ioctl.h>
#include <xfs/xfs_fs.h>
int ioctl(int file2_fd, XFS_IOC_START_COMMIT, struct xfs_commit_range *arg);
int ioctl(int file2_fd, XFS_IOC_COMMIT_RANGE, struct xfs_commit_range *arg);
Given a range of bytes in a first file file1_fd and a second range of bytes in a second file file2_fd, this ioctl(2) exchanges the contents of the two ranges if file2_fd passes certain freshness criteria.
Before exchanging the contents, the program must call the XFS_IOC_START_COMMIT ioctl to sample freshness data for file2_fd. If the sampled metadata does not match the file metadata at commit time, XFS_IOC_COMMIT_RANGE will return EBUSY.
Exchanges are atomic with regard to concurrent file operations. Implementations must guarantee that readers see either the old contents or the new contents in their entirety, even if the system fails.
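The sample-then-commit protocol described above lends itself to a retry loop: sample freshness, stage the update, attempt the commit, and start over if the commit reports EBUSY. The following is a minimal sketch of that control flow only. The names commit_ops, commit_with_retry, and the fake_* stand-ins are hypothetical and not part of the XFS API; in a real program the start and commit callbacks would wrap the XFS_IOC_START_COMMIT and XFS_IOC_COMMIT_RANGE ioctls.

```c
#include <errno.h>

/*
 * Hypothetical callback bundle (not part of the XFS API).
 * start()  would wrap ioctl(fd, XFS_IOC_START_COMMIT, &args),
 * stage()  would rebuild the staging file,
 * commit() would wrap ioctl(fd, XFS_IOC_COMMIT_RANGE, &args).
 * Each returns 0 on success, or -1 with errno set.
 */
struct commit_ops {
    int (*start)(void *ctx);
    int (*stage)(void *ctx);
    int (*commit)(void *ctx);
};

/*
 * Sample, stage, and commit, starting over whenever commit fails with EBUSY.
 * Returns 0 on success; returns -1 with errno set on any other failure or
 * once max_attempts is exhausted (errno is EBUSY in that case).
 */
static int commit_with_retry(const struct commit_ops *ops, void *ctx,
                             int max_attempts)
{
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (ops->start(ctx) != 0)
            return -1;
        if (ops->stage(ctx) != 0)
            return -1;
        if (ops->commit(ctx) == 0)
            return 0;
        if (errno != EBUSY)
            return -1;
        /* EBUSY: file2 changed after sampling, so resample and restage. */
    }
    errno = EBUSY;
    return -1;
}

/* Stand-in state used only to exercise the loop without a real XFS file. */
struct fake_file {
    int busy_left;  /* number of commits that should fail with EBUSY */
    int samples;    /* number of times freshness was sampled */
};

static int fake_start(void *ctx)
{
    ((struct fake_file *)ctx)->samples++;
    return 0;
}

static int fake_stage(void *ctx)
{
    (void)ctx;
    return 0;
}

static int fake_commit(void *ctx)
{
    struct fake_file *f = ctx;

    if (f->busy_left > 0) {
        f->busy_left--;
        errno = EBUSY;
        return -1;
    }
    return 0;
}

static const struct commit_ops fake_ops = {
    fake_start, fake_stage, fake_commit
};
```

Restaging on every attempt matters because the staged contents were derived from a version of the file that no longer exists once EBUSY has been returned.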
The system call parameters are conveyed in structures of the following form:
struct xfs_commit_range {
    __s32 file1_fd;
    __u32 pad;
    __u64 file1_offset;
    __u64 file2_offset;
    __u64 length;
    __u64 flags;
    __u64 file2_freshness[5];
};
The field pad must be zero.
The fields file1_fd, file1_offset, and length define the first range of bytes to be exchanged.
The fields file2_fd, file2_offset, and length define the second range of bytes to be exchanged.
The field file2_freshness is an opaque field whose contents are determined by the kernel. These file attributes are used to confirm that file2_fd has not been changed by another thread since the current thread began staging its own update.
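To make the layout and the zero-padding requirement concrete, the sketch below mirrors the structure using fixed-width standard types and checks the offsets implied by the field list. The names commit_range_mirror and commit_range_mirror_init are hypothetical and exist for illustration only; real programs should use struct xfs_commit_range from <xfs/xfs_fs.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror for illustration only; not the kernel's definition. */
struct commit_range_mirror {
    int32_t  file1_fd;
    uint32_t pad;                 /* must be zero */
    uint64_t file1_offset;
    uint64_t file2_offset;
    uint64_t length;
    uint64_t flags;
    uint64_t file2_freshness[5];  /* opaque, filled in by the kernel */
};

/* The fields pack without implicit padding: 4 + 4 + 4*8 + 5*8 = 80 bytes. */
static_assert(offsetof(struct commit_range_mirror, file1_offset) == 8,
              "file1_offset follows file1_fd and pad");
static_assert(offsetof(struct commit_range_mirror, file2_freshness) == 40,
              "freshness data follows the five scalar fields");
static_assert(sizeof(struct commit_range_mirror) == 80,
              "structure is 80 bytes");

/*
 * Zero-initialize everything, then set only the caller-supplied fields.
 * This leaves pad at zero and the opaque freshness array cleared.
 */
static struct commit_range_mirror commit_range_mirror_init(int32_t file1_fd,
                                                           uint64_t flags)
{
    struct commit_range_mirror args = { 0 };

    args.file1_fd = file1_fd;
    args.flags = flags;
    return args;
}
```

Starting from a zeroed structure, as the designated initializers in the examples also do, is the simplest way to satisfy the requirement that pad be zero.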
Both files must be from the same filesystem mount. If the two file descriptors represent the same file, the byte ranges must not overlap. Most disk-based filesystems require that the starts of both ranges be aligned to the file block size. If this is the case, the ends of the ranges must also be so aligned unless the XFS_EXCHANGE_RANGE_TO_EOF flag is set.
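The alignment and overlap rules above can be checked in user space before issuing the ioctl. The sketch below encodes only those stated rules. The function name ranges_look_valid is hypothetical, the block size is supplied by the caller, and the sketch does not model how XFS_EXCHANGE_RANGE_TO_EOF affects the effective range length. The kernel's own validation remains authoritative.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical pre-flight check of the rules stated above:
 *  - both range starts are aligned to the file block size;
 *  - the range ends are aligned too, unless to_eof is set;
 *  - ranges within the same file do not overlap.
 */
static bool ranges_look_valid(uint64_t blocksize, bool same_file,
                              uint64_t off1, uint64_t off2, uint64_t len,
                              bool to_eof)
{
    if (blocksize == 0)
        return false;

    /* Starts of both ranges must be block-aligned. */
    if (off1 % blocksize != 0 || off2 % blocksize != 0)
        return false;

    /* With aligned starts, the ends are aligned exactly when len is. */
    if (!to_eof && len % blocksize != 0)
        return false;

    /* Two equal-length ranges overlap when their starts are closer than len. */
    if (same_file) {
        uint64_t lo = off1 < off2 ? off1 : off2;
        uint64_t hi = off1 < off2 ? off2 : off1;

        if (hi - lo < len)
            return false;
    }
    return true;
}
```

For example, with a 4096-byte block size, exchanging 8192 bytes at offsets 0 and 4096 of the same file fails the overlap rule, while offsets 0 and 8192 pass.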
The field flags controls the behavior of the exchange operation.
On error, -1 is returned, and errno is set to indicate the error.
Error codes can be one of, but are not limited to, the following:
This API is XFS-specific.
Several use cases are imagined for this system call. Coordination between multiple threads is performed by the kernel.
The first is a filesystem defragmenter, which copies the contents of a file into another file and wishes to exchange the space mappings of the two files, provided that the original file has not changed.
An example program might look like this:
int fd = open("/some/file", O_RDWR);
/* O_TMPFILE requires a mode argument */
int temp_fd = open("/some", O_TMPFILE | O_RDWR, 0600);
struct stat sb;
struct xfs_commit_range args = {
    .flags = XFS_EXCHANGE_RANGE_TO_EOF,
};
/* gather file2's freshness information */
ioctl(fd, XFS_IOC_START_COMMIT, &args);
fstat(fd, &sb);
/* make a fresh copy of the file with terrible alignment to avoid reflink */
copy_file_range(fd, NULL, temp_fd, NULL, 1, 0);
copy_file_range(fd, NULL, temp_fd, NULL, sb.st_size - 1, 0);
/* commit the entire update */
args.file1_fd = temp_fd;
int ret = ioctl(fd, XFS_IOC_COMMIT_RANGE, &args);
if (ret && errno == EBUSY)
    printf("file changed while defrag was underway\n");
The second is a data storage program that wants to commit non-contiguous updates to a file atomically. This program cannot coordinate updates to the file and therefore relies on the kernel to reject the COMMIT_RANGE command if the file has been updated by someone else. This can be done by creating a temporary file, calling FICLONE(2) to share the contents, and staging the updates into the temporary file. The FULL_FILES flag is recommended for this purpose. The temporary file can be deleted or punched out afterwards.
An example program might look like this:
int fd = open("/some/file", O_RDWR);
/* O_TMPFILE requires a mode argument */
int temp_fd = open("/some", O_TMPFILE | O_RDWR, 0600);
struct xfs_commit_range args = {
    .flags = XFS_EXCHANGE_RANGE_TO_EOF,
};
/* gather file2's freshness information */
ioctl(fd, XFS_IOC_START_COMMIT, &args);
/* share the current contents of the file with the temporary file */
ioctl(temp_fd, FICLONE, fd);
/* append 1MB of records */
lseek(temp_fd, 0, SEEK_END);
write(temp_fd, data1, 1000000);
/* update record index */
pwrite(temp_fd, data1, 600, 98765);
pwrite(temp_fd, data2, 320, 54321);
pwrite(temp_fd, data2, 15, 0);
/* commit the entire update */
args.file1_fd = temp_fd;
int ret = ioctl(fd, XFS_IOC_COMMIT_RANGE, &args);
if (ret && errno == EBUSY)
    printf("file changed before commit; will roll back\n");
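The example above ignores the return values of write(2) and pwrite(2) for brevity. Because the commit publishes whatever is in the temporary file, a staging program would normally want to confirm that every update was written in full before committing. The helper below is one possible sketch; the name pwrite_all is hypothetical and is not provided by any library.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write exactly len bytes from buf at the given file
 * offset, continuing after short writes and retrying on EINTR.
 * Returns 0 on success, or -1 with errno set.
 */
static int pwrite_all(int fd, const void *buf, size_t len, off_t offset)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, offset);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0) {
            /* No progress; report an error instead of looping forever. */
            errno = EIO;
            return -1;
        }
        p += n;
        len -= (size_t)n;
        offset += n;
    }
    return 0;
}
```

A caller would replace each pwrite(temp_fd, ...) with pwrite_all(temp_fd, ...) and skip the commit if any call returns -1.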
Some filesystems may limit the amount of data or the number of extents that can be exchanged in a single call.
| 2024-02-18 | XFS |