* Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written
@ 2024-12-23 16:42 Sai Chaitanya Mitta
2024-12-23 21:53 ` Darrick J. Wong
2024-12-24 3:42 ` Dave Chinner
0 siblings, 2 replies; 10+ messages in thread
From: Sai Chaitanya Mitta @ 2024-12-23 16:42 UTC
To: linux-xfs
Hi Team,
Is there any method or tool available to explicitly mark XFS file
extents as written? One approach I am aware of is explicitly zeroing
the entire file (which may be hundreds of GB in size) through a
synchronous or asynchronous (aio/io_uring) mechanism, but that is a
time-consuming process for large files. Is there any optimization or
alternative approach for explicitly zeroing a file or marking its
extents as written?
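Synchronous Approach:
    // `img_file` is a std::fs::File and `write_at` comes from
    // std::os::unix::fs::FileExt; `buf` is a zero-filled buffer.
    // `error!` is the logging macro from the `log` crate.
    while offset < size {
        // Never write past the intended end of the file.
        let len = std::cmp::min(buf.len() as u64, size - offset) as usize;
        let bytes_written = img_file
            .write_at(&buf[..len], offset)
            .map_err(|e| {
                error!("Failed to zero out file: {} error: {:?}", vol_name, e);
                e // hand the error back so `?` can propagate it
            })?;
        // Advance by what was actually written (short writes are possible).
        offset += bytes_written as u64;
    }
    // Flush data and metadata; the returned Result must not be ignored.
    img_file.sync_all()?;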
Asynchronous approach:
    I currently use fio with libaio as the ioengine, but the results
    are almost the same.
--
Thanks & Regards,
M.Sai Chaithanya.

* Re: Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written
From: Darrick J. Wong @ 2024-12-23 21:53 UTC
To: Sai Chaitanya Mitta; +Cc: linux-xfs

On Mon, Dec 23, 2024 at 10:12:32PM +0530, Sai Chaitanya Mitta wrote:
> Hi Team,
> Is there any method/tool available to explicitly mark XFS
> file extents as written? One approach I
> am aware is explicitly zeroing the entire file (this file may be even
> in hundreds of GB in size) through
> synchronous/asynchronous(aio/io_uring) mechanism but it is time taking
> process for large files,
> is there any optimization/approach we can do to explicitly zeroing
> file/mark extents as written?

Why do you need to mark them written?

--D

* Re: Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written
From: Sai Chaitanya Mitta @ 2024-12-24 5:47 UTC
To: Darrick J. Wong; +Cc: linux-xfs

Hi Darrick,

Thanks for the quick response. We are exposing an XFS file (created
through fallocate -l <size> <path>) as a block device through
SPDK bdev (https://github.com/spdk/spdk) over NVMe-oF. The initiator
connects to the target and provides a block device to database
applications.

What I have observed is that the database applications issue a flush
after every write or every few writes. At the backend this flush
translates to an fsync (through aio/io_uring) on the FD, which is
time-consuming. If we treat the flush as a no-op, performance is 5x
better than when we serve the flush. However, with the flush as a
no-op, we observe data loss if the system shuts down abruptly (since
the metadata for new extents is not yet persistent). To overcome this
data loss and still get the better performance, we used these steps:

1. Created the file through fallocate using the FALLOC_FL_ZERO_RANGE
   option.
2. Explicitly zeroed the file as shown in the code (this marks all
   extents as written, and from what I observed there are no further
   metadata changes related to data, though there are still atime and
   mtime updates to the file).
3. Exposed the zeroed file to the user as a block device (as mentioned
   above).

With this approach I am not able to reproduce the data loss when the
system shuts down abruptly. So we are planning to use this method to
ensure both data integrity and better performance.

On Tue, Dec 24, 2024 at 3:23 AM Darrick J. Wong <djwong@kernel.org> wrote:
> Why do you need to mark them written?
>
> --D

--
Thanks & Regards,
M.Sai Chaithanya.
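
The three steps described above can be sketched in userspace roughly
as follows. This is a minimal sketch under stated assumptions, not the
poster's actual code: the function name create_zeroed_image, the
temporary file name and the chunk size are made up for illustration,
and File::set_len stands in for fallocate because the Rust standard
library has no fallocate wrapper (so the range starts out sparse
rather than preallocated). Every byte is then overwritten with zeros
and the file is fsynced before it would be exposed.

```rust
use std::fs::OpenOptions;
use std::io;
use std::os::unix::fs::FileExt;
use std::path::Path;

/// Hypothetical helper: size the file, overwrite every byte with zeros,
/// then flush data and metadata to stable storage.
fn create_zeroed_image(path: &Path, size: u64, chunk: usize) -> io::Result<()> {
    if chunk == 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "chunk must be non-zero",
        ));
    }
    let file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;

    // Assumption: stand-in for `fallocate -l <size> <path>`. std has no
    // fallocate wrapper, so this only sets the length (sparse file).
    file.set_len(size)?;

    let buf = vec![0u8; chunk];
    let mut offset = 0u64;
    while offset < size {
        // Clamp the last chunk so we never write past `size`.
        let len = std::cmp::min(chunk as u64, size - offset) as usize;
        // write_all_at retries short writes until the slice is written.
        file.write_all_at(&buf[..len], offset)?;
        offset += len as u64;
    }

    // fsync: make data and metadata durable before exposing the file.
    file.sync_all()
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("zeroed_image_sketch.img");
    create_zeroed_image(&path, 8 * 1024 * 1024, 1024 * 1024)?;
    println!("zeroed {} bytes", std::fs::metadata(&path)?.len());
    std::fs::remove_file(&path)
}
```

Note that this only describes the state of the file at the time of
the fsync; it says nothing about what happens if the file mapping
changes later.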

* Re: Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written
From: Darrick J. Wong @ 2025-01-06 19:46 UTC
To: Sai Chaitanya Mitta; +Cc: linux-xfs

On Tue, Dec 24, 2024 at 11:17:08AM +0530, Sai Chaitanya Mitta wrote:
> 1. Created file through fallocate using FALLOC_FL_ZERO_RANGE option
> 2. Explicitly zeroed file as mentioned in code (this marks all extents as
> written and there are no metadata changes related to data [what I observed],
> but there are atime and mtime updates of file).
> 3. Expose zeroed file to user as block device (as mentioned above).
>
> Using above approach if system shutdown abruptly then I am not able
> to reproduce data loss issue. So, planning to use above method to ensure
> both data integrity and better performance

That sounds brittle -- even if someday a FALLOC_FL_WRITE_ZEROES gets
merged into the kernel, if anything perturbs the file mapping (e.g.
background backup process reflinks the file) then you immediately become
vulnerable to these crash integrity problems without notice.

(Unless you're actually getting leases on the file ranges and reacting
appropriately when the leases break...)

--D

* Re: Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written
From: Christoph Hellwig @ 2025-01-07 6:14 UTC
To: Darrick J. Wong; +Cc: Sai Chaitanya Mitta, linux-xfs

On Mon, Jan 06, 2025 at 11:46:39AM -0800, Darrick J. Wong wrote:
> That sounds brittle -- even if someday a FALLOC_FL_WRITE_ZEROES gets
> merged into the kernel, if anything perturbs the file mapping (e.g.
> background backup process reflinks the file) then you immediately become
> vulnerable to these crash integrity problems without notice.
>
> (Unless you're actually getting leases on the file ranges and reacting
> appropriately when the leases break...)

The way I understood the description, they have a user space program
exposing the XFS file over the network. So if a change to the mapping
happens (e.g. due to defragmentation) they would in the worst case pay
the cost of an allocation transaction.

That is if they are really going through the normal kernel file
abstraction and don't try to bypass it by, say, abusing FIEMAP
information, in which case all hope is lost and the scheme has no chance
of reliably working, unless we add ioctls to expose the pNFS layouts
to userspace and they use that instead of FIEMAP.

* Re: Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written
From: Darrick J. Wong @ 2025-01-07 7:04 UTC
To: Christoph Hellwig; +Cc: Sai Chaitanya Mitta, linux-xfs

On Mon, Jan 06, 2025 at 10:14:35PM -0800, Christoph Hellwig wrote:
> That is if they are really going through the normal kernel file
> abstraction and don't try to bypass it by say abusing FIEMAP
> information, in which case all hope is lost and the scheme has no chance
> of reliably working, unless we add ioctls to expose the pNFS layouts
> to userspace and they use that instead of FIEMAP.

I get this funny feeling that a lot of programs might like to lease
space and get told by the kernel when someone wants/took it back.
Swapfiles and lilo ftw.

--D

* Re: Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written
From: Christoph Hellwig @ 2025-01-07 8:37 UTC
To: Darrick J. Wong; +Cc: Christoph Hellwig, Sai Chaitanya Mitta, linux-xfs

On Mon, Jan 06, 2025 at 11:04:59PM -0800, Darrick J. Wong wrote:
> I get this funny feeling that a lot of programs might like to lease
> space and get told by the kernel when someone wants/took it back.
> Swapfiles and lilo ftw.

Well, for swapfiles we can't really take them back. Similarly, the lilo
model is just broken, as any change of the mapping would actually require
re-installing the boot loader in the boot block pointing to the blocks.

* Re: Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written
From: Darrick J. Wong @ 2025-01-07 22:11 UTC
To: Christoph Hellwig; +Cc: Sai Chaitanya Mitta, linux-xfs

On Tue, Jan 07, 2025 at 12:37:20AM -0800, Christoph Hellwig wrote:
> Well, for swapfiles we can't really take them back. Similarly, the lilo
> model is just broken, as any change of the mapping would actually require
> re-installing the boot loader in the boot block pointing to the blocks.

I bet the rdma users might like a lease that can't be taken back. We
/did/ talk a hojillion years ago at LSFMM about having a type of lease
that expires when someone tries to change the file; and a different type
of lease that can't expire but causes file operations to error out.

OTOH it's not like I've ever tried to solve this problem. :P

--D

* Re: Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written
From: Dave Chinner @ 2024-12-24 3:42 UTC
To: Sai Chaitanya Mitta; +Cc: linux-xfs

On Mon, Dec 23, 2024 at 10:12:32PM +0530, Sai Chaitanya Mitta wrote:
> Hi Team,
> Is there any method/tool available to explicitly mark XFS
> file extents as written? One approach I

Writing data to the unwritten extent is the only way to do this.
Allowing uninitialised data extents to be converted to a written
state opens a massive hole in system security.

Go search for the discussions around FALLOC_FL_NO_HIDE_STALE from
well over a decade ago.....

-Dave
--
Dave Chinner
david@fromorbit.com
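
The security concern can be seen from the read side: a range that was
sized but never written has to read back as zeros, not as whatever the
underlying blocks last held. Below is a minimal sketch of that
guarantee, with some assumptions: the helper name
never_written_reads_zero and the temporary file name are made up for
illustration, and File::set_len is used because the Rust standard
library has no fallocate wrapper. set_len creates a hole rather than
an unwritten extent, but both read back as zeros.

```rust
use std::fs::OpenOptions;
use std::io;
use std::os::unix::fs::FileExt;
use std::path::Path;

/// Hypothetical helper: create a file of `size` bytes without writing any
/// data, then report whether every byte reads back as zero.
fn never_written_reads_zero(path: &Path, size: u64) -> io::Result<bool> {
    let file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;

    // Assumption: set_len stands in for fallocate; no data is written.
    file.set_len(size)?;

    // Pre-fill the buffer with a non-zero pattern so that zeros in the
    // result can only have come from the read itself.
    let mut buf = vec![0xAAu8; size as usize];
    file.read_exact_at(&mut buf, 0)?;

    Ok(buf.iter().all(|&b| b == 0))
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("never_written_sketch.img");
    let all_zero = never_written_reads_zero(&path, 1024 * 1024)?;
    println!("never-written range reads as zeros: {}", all_zero);
    std::fs::remove_file(&path)
}
```

Marking extents as written without writing to them would give up
exactly this behaviour, since reads would then return the old contents
of the blocks.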

* Re: Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written
From: Christoph Hellwig @ 2025-01-06 11:15 UTC
To: Dave Chinner; +Cc: Sai Chaitanya Mitta, linux-xfs

On Tue, Dec 24, 2024 at 02:42:08PM +1100, Dave Chinner wrote:
> Writing data to the unwritten extent is the only way to do this.
> Allowing uninitialised data extents to be converted to a written
> state opens a massive hole in system security.

Yes.

> Go search for the discussions around FALLOC_FL_NO_HIDE_STALE from
> well over a decade ago.....

Or look for the old XFS_IOC_ALLOCSP ioctl which did allocation and
freeing in one syscall, which we removed quite a while ago.