Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written

public inbox for linux-xfs@vger.kernel.org

* Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written
@ 2024-12-23 16:42 Sai Chaitanya Mitta
  2024-12-23 21:53 ` Darrick J. Wong
  2024-12-24  3:42 ` Dave Chinner
  0 siblings, 2 replies; 10+ messages in thread
From: Sai Chaitanya Mitta @ 2024-12-23 16:42 UTC (permalink / raw)
  To: linux-xfs

Hi Team,
Is there any method or tool available to explicitly mark XFS file
extents as written? The one approach I am aware of is explicitly
zeroing the entire file (which may be hundreds of GB in size) through
a synchronous or asynchronous (aio/io_uring) mechanism, but that is a
time-consuming process for large files. Is there any optimization or
alternative approach for explicitly zeroing a file or marking its
extents as written?


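Synchronous approach:
    use std::os::unix::fs::FileExt; // provides write_at

    while offset < size {
        // Clamp the last write so the file never grows past `size`.
        let len = (size - offset).min(buf.len() as u64) as usize;
        let bytes_written = img_file
            .write_at(&buf[..len], offset)
            .map_err(|e| {
                error!("Failed to zero out file: {} error: {:?}", vol_name, e);
            })?;
        offset += bytes_written as u64;
    }
    // Report a failed sync instead of discarding the result.
    img_file.sync_all().map_err(|e| {
        error!("Failed to sync file: {} error: {:?}", vol_name, e);
    })?;

Asynchronous approach:
    Currently I use fio with libaio as the ioengine, but the results are
    almost the same.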
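
One variation on the synchronous loop is to split the file into disjoint
ranges and zero them from several threads. The following is a minimal
sketch only, assuming the standard library's scoped threads and positional
writes; `zero_parallel`, `workers`, and `chunk` are hypothetical names, and
whether this is any faster depends on the storage device (the fio/libaio
result above suggests the device may already be the bottleneck). It still
writes every byte; it does not change extent state any other way.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;
use std::thread;

/// Zero `size` bytes of `file` using `workers` threads, each writing its
/// own contiguous range in pieces of at most `chunk` bytes.
pub fn zero_parallel(file: &File, size: u64, workers: u64, chunk: usize) -> io::Result<()> {
    assert!(workers > 0 && chunk > 0);
    let buf = vec![0u8; chunk];
    let per_worker = (size + workers - 1) / workers; // ceiling division

    thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                let buf = &buf;
                s.spawn(move || -> io::Result<()> {
                    let mut offset = (w * per_worker).min(size);
                    let end = (offset + per_worker).min(size);
                    while offset < end {
                        // Clamp the final piece so we never write past `end`.
                        let len = (end - offset).min(buf.len() as u64) as usize;
                        file.write_all_at(&buf[..len], offset)?;
                        offset += len as u64;
                    }
                    Ok(())
                })
            })
            .collect();
        // Surface the first I/O error from any worker.
        for h in handles {
            h.join().expect("zeroing thread panicked")?;
        }
        Ok::<(), io::Error>(())
    })?;

    // Make the zeroed data (and resulting extent state) durable.
    file.sync_all()
}
```

A call such as `zero_parallel(&img_file, size, 4, 1 << 20)` would use four
threads with 1 MiB writes; both numbers are placeholders to be tuned.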

-- 
Thanks & Regards,
M.Sai Chaithanya.


* Re: Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written
  2024-12-23 16:42 Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written Sai Chaitanya Mitta
@ 2024-12-23 21:53 ` Darrick J. Wong
  2024-12-24  5:47   ` Sai Chaitanya Mitta
  2024-12-24  3:42 ` Dave Chinner
  1 sibling, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2024-12-23 21:53 UTC (permalink / raw)
  To: Sai Chaitanya Mitta; +Cc: linux-xfs

On Mon, Dec 23, 2024 at 10:12:32PM +0530, Sai Chaitanya Mitta wrote:
> Hi Team,
>            Is there any method/tool available to explicitly mark XFS
> file extents as written? One approach I
> am aware is explicitly zeroing the entire file (this file may be even
> in hundreds of GB in size) through
> synchronous/asynchronous(aio/io_uring) mechanism but it is time taking
> process for large files,
> is there any optimization/approach we can do to explicitly zeroing
> file/mark extents as written?

Why do you need to mark them written?

--D


* Re: Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written
  2024-12-23 21:53 ` Darrick J. Wong
@ 2024-12-24  5:47   ` Sai Chaitanya Mitta
  2025-01-06 19:46     ` Darrick J. Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Sai Chaitanya Mitta @ 2024-12-24  5:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

Hi Darrick,
Thanks for the quick response. We are exposing an XFS file (created
through fallocate -l <size> <path>) as a block device through an
SPDK bdev (https://github.com/spdk/spdk) over NVMe-oF. The initiator
connects to the target and provides a block device to database
applications. What I have observed is that database applications issue a
flush I/O after every write or every few writes, and this flush translates
at the backend into an fsync operation (through aio/io_uring) on the FD,
which is time-consuming. If we treat the flush I/O as a no-op, performance
is 5x better than when serving the flush operation. However, with flush as
a no-op, we observe data loss if the system shuts down abruptly (since the
metadata for new extents is not yet persistent). To overcome this data
loss while keeping the better performance, I used the steps below:
1. Created the file through fallocate using the FALLOC_FL_ZERO_RANGE option.
2. Explicitly zeroed the file as shown in the code from my earlier mail
   (this marks all extents as written, and from what I observed there are
   no further data-related metadata changes, only atime and mtime updates
   on the file).
3. Exposed the zeroed file to the user as a block device (as mentioned above).

With the above approach, I am not able to reproduce the data loss issue
when the system shuts down abruptly. So I am planning to use this method
to ensure both data integrity and better performance.
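
The flush handling described above can be sketched in Rust as follows.
This is only an illustration under stated assumptions: `FlushPolicy` and
`write_then_flush` are hypothetical names, not SPDK APIs. In the Rust
standard library, `File::sync_all` corresponds to fsync(2) and
`File::sync_data` to fdatasync(2), which may skip metadata (such as
timestamps) that is not needed to read the data back. Whether
`sync_data` is measurably cheaper on this setup has not been tested here.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;

/// How the backend serves a flush request (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum FlushPolicy {
    /// fsync(2): data plus all file metadata.
    SyncAll,
    /// fdatasync(2): data plus only the metadata needed to read it back.
    SyncData,
    /// Acknowledge without syncing; not safe across an abrupt shutdown.
    NoOp,
}

/// Write `data` at `offset`, then serve a flush according to `policy`.
pub fn write_then_flush(
    file: &File,
    data: &[u8],
    offset: u64,
    policy: FlushPolicy,
) -> io::Result<()> {
    file.write_all_at(data, offset)?;
    match policy {
        FlushPolicy::SyncAll => file.sync_all(),
        FlushPolicy::SyncData => file.sync_data(),
        FlushPolicy::NoOp => Ok(()),
    }
}
```

The `NoOp` variant is included only to name the configuration that showed
data loss; it gives no durability guarantee on its own.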




-- 
Thanks & Regards,
M.Sai Chaithanya.


* Re: Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written
  2024-12-24  5:47   ` Sai Chaitanya Mitta
@ 2025-01-06 19:46     ` Darrick J. Wong
  2025-01-07  6:14       ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2025-01-06 19:46 UTC (permalink / raw)
  To: Sai Chaitanya Mitta; +Cc: linux-xfs

On Tue, Dec 24, 2024 at 11:17:08AM +0530, Sai Chaitanya Mitta wrote:
> Hi Darrick,
>             Thanks for the quick response, we are exposing XFS file (created
> through fallocate -l <size> <path>) as block device through
> SPDK bdev (https://github.com/spdk/spdk) over NVMe-oF, Now initiator will
> connect to the target and provide a block device to database applications.
> What I have observed is databases' applications are issuing flush IO post
> each/couple of writes, this flush at backend at backend translates to
> fsync (through aio/io_uring) operation on FD (which is time taking process),
> if we are doing no-op for flush IO then performance is 5x better compared to
> serving flush operation. Doing no-op for flush and if system shutdown abruptly
> then we are observing data loss (since metadata for new extents are not yet
> persistent) to overcome this data loss issue and having better performance
> below are the steps used:
> 1. Created file through fallocate using FALLOC_FL_ZERO_RANGE option
> 2. Explicitly zeroed file as mentioned in code (this marks all extents as
>    written and there are no metadata changes related to data [what I observed],
>    but there are atime and mtime updates of file).
> 3. Expose zeroed file to user as block device (as mentioned above).
> 
> Using above approach if system shutdown abruptly then I am not able
> to reproduce data loss issue. So, planning to use above method to ensure
> both data integrity and better performance

That sounds brittle -- even if someday a FALLOC_FL_WRITE_ZEROES gets
merged into the kernel, if anything perturbs the file mapping (e.g.
background backup process reflinks the file) then you immediately become
vulnerable to these crash integrity problems without notice.

(Unless you're actually getting leases on the file ranges and reacting
appropriately when the leases break...)

--D


* Re: Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written
  2025-01-06 19:46     ` Darrick J. Wong
@ 2025-01-07  6:14       ` Christoph Hellwig
  2025-01-07  7:04         ` Darrick J. Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2025-01-07  6:14 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Sai Chaitanya Mitta, linux-xfs

On Mon, Jan 06, 2025 at 11:46:39AM -0800, Darrick J. Wong wrote:
> That sounds brittle -- even if someday a FALLOC_FL_WRITE_ZEROES gets
> merged into the kernel, if anything perturbs the file mapping (e.g.
> background backup process reflinks the file) then you immediately become
> vulnerable to these crash integrity problems without notice.
> 
> (Unless you're actually getting leases on the file ranges and reacting
> appropriately when the leases break...)

The way I understood the description, they have a user space program
exposing the XFS file over the network.  So if a change to the mapping
happens (e.g. due to defragmentation) they would in the worst case pay
the cost of an allocation transaction.

That is if they are really going through the normal kernel file
abstraction and don't try to bypass it by say abusing FIEMAP
information, in which case all hope is lost and the scheme has no chance
of reliably working, unless we add ioctls to expose the pNFS layouts
to userspace and they use that instead of FIEMAP.


* Re: Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written
  2025-01-07  6:14       ` Christoph Hellwig
@ 2025-01-07  7:04         ` Darrick J. Wong
  2025-01-07  8:37           ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2025-01-07  7:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Sai Chaitanya Mitta, linux-xfs

On Mon, Jan 06, 2025 at 10:14:35PM -0800, Christoph Hellwig wrote:
> On Mon, Jan 06, 2025 at 11:46:39AM -0800, Darrick J. Wong wrote:
> > That sounds brittle -- even if someday a FALLOC_FL_WRITE_ZEROES gets
> > merged into the kernel, if anything perturbs the file mapping (e.g.
> > background backup process reflinks the file) then you immediately become
> > vulnerable to these crash integrity problems without notice.
> > 
> > (Unless you're actually getting leases on the file ranges and reacting
> > appropriately when the leases break...)
> 
> They way I understood the description they have a user space program
> exposing the XFS file over the network.  So if a change to the mapping
> happens (e.g. due to defragmentation) they would in the worst case pay
> the cost of an allocation transaction.
> 
> That is if they are really going through the normal kernel file
> abstraction and don't try to bypass it by say abusing FIEMAP
> information, in which case all hope is lost and the scheme has no chance
> of reliably working, unless we add ioctls to expose the pNFS layouts
> to userspace and they use that instead of FIEMAP.

I get this funny feeling that a lot of programs might like to lease
space and get told by the kernel when someone wants/took it back.
Swapfiles and lilo ftw.

--D


* Re: Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written
  2025-01-07  7:04         ` Darrick J. Wong
@ 2025-01-07  8:37           ` Christoph Hellwig
  2025-01-07 22:11             ` Darrick J. Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2025-01-07  8:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, Sai Chaitanya Mitta, linux-xfs

On Mon, Jan 06, 2025 at 11:04:59PM -0800, Darrick J. Wong wrote:
> I get this funny feeling that a lot of programs might like to lease
> space and get told by the kernel when someone wants/took it back.
> Swapfiles and lilo ftw.

Well, for swapfiles we can't really take them back.  Similarly, the lilo
model is just broken, as any change of the mapping would actually require
re-installing the boot loader in the boot block pointing to the blocks.



* Re: Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written
  2025-01-07  8:37           ` Christoph Hellwig
@ 2025-01-07 22:11             ` Darrick J. Wong
  0 siblings, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2025-01-07 22:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Sai Chaitanya Mitta, linux-xfs

On Tue, Jan 07, 2025 at 12:37:20AM -0800, Christoph Hellwig wrote:
> On Mon, Jan 06, 2025 at 11:04:59PM -0800, Darrick J. Wong wrote:
> > I get this funny feeling that a lot of programs might like to lease
> > space and get told by the kernel when someone wants/took it back.
> > Swapfiles and lilo ftw.
> 
> Well, for swapfiles we can't really take them back.  Similarly the lilo
> model is just broken as any chance of the mapping would actually require
> re-installing the boot load in the boot block pointing to the blocks.

I bet the rdma users might like a lease that can't be taken back.  We
/did/ talk a hojillion years ago at LSFMM about having a type of lease
that expires when someone tries to change the file; and a different type
of lease that can't expire but causes file operations to error out.

OTOH it's not like I've ever tried to solve this problem. :P

--D


* Re: Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written
  2024-12-23 16:42 Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written Sai Chaitanya Mitta
  2024-12-23 21:53 ` Darrick J. Wong
@ 2024-12-24  3:42 ` Dave Chinner
  2025-01-06 11:15   ` Christoph Hellwig
  1 sibling, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2024-12-24  3:42 UTC (permalink / raw)
  To: Sai Chaitanya Mitta; +Cc: linux-xfs

On Mon, Dec 23, 2024 at 10:12:32PM +0530, Sai Chaitanya Mitta wrote:
> Hi Team,
>            Is there any method/tool available to explicitly mark XFS
> file extents as written? One approach I

Writing data to the unwritten extent is the only way to do this.
Allowing uninitialised data extents to be converted to a written
state opens a massive hole in system security.

Go search for the discussions around FALLOC_FL_NO_HIDE_STALE from
well over a decade ago.....
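
To illustrate the guarantee being protected here: a range that has been
allocated or extended but never written must read back as zeros, not as
whatever the underlying blocks last held. A minimal Rust sketch, where
`range_reads_as_zero` is a hypothetical helper, and where a test range
could be created with `File::set_len` (which leaves a hole rather than a
preallocated unwritten extent; both read as zeros):

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;

/// Returns true if every byte in [offset, offset + len) reads back as zero.
pub fn range_reads_as_zero(file: &File, offset: u64, len: usize) -> io::Result<bool> {
    // Pre-fill with a non-zero pattern so stale buffer contents cannot
    // be mistaken for data returned by the read.
    let mut buf = vec![0xFFu8; len];
    file.read_exact_at(&mut buf, offset)?;
    Ok(buf.iter().all(|&b| b == 0))
}
```

Marking such a range as written without writing to it would make this
check fail with stale on-disk contents, which is the security problem.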

-Dave
-- 
Dave Chinner
david@fromorbit.com


* Re: Approach to quickly zeroing large XFS file (or) tool to mark XFS file extents as written
  2024-12-24  3:42 ` Dave Chinner
@ 2025-01-06 11:15   ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2025-01-06 11:15 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Sai Chaitanya Mitta, linux-xfs

On Tue, Dec 24, 2024 at 02:42:08PM +1100, Dave Chinner wrote:
> On Mon, Dec 23, 2024 at 10:12:32PM +0530, Sai Chaitanya Mitta wrote:
> > Hi Team,
> >            Is there any method/tool available to explicitly mark XFS
> > file extents as written? One approach I
> 
> Writing data to the unwritten extent is the only way to do this.
> Allowing uninitialised data extents to be converted to a written
> state opens a massive hole in system security.

Yes.

> Go search for the discussions around FALLOC_FL_NO_HIDE_STALE from
> well over a decade ago.....

Or look for the old XFS_IOC_ALLOCSP ioctl which did allocation and
freeing in one syscall, which we removed quite a while ago.


