* [PATCH] xfs: flush the range before zero range conversion
@ 2014-09-24 19:06 Brian Foster
2014-09-25 12:01 ` Dave Chinner
0 siblings, 1 reply; 3+ messages in thread
From: Brian Foster @ 2014-09-24 19:06 UTC (permalink / raw)
To: xfs
XFS currently discards delalloc blocks within the target range of a zero
range request. Unaligned start and end offsets are zeroed through the
page cache and the internal, aligned blocks are converted to unwritten
extents.
If EOF is page aligned and covered by a delayed allocation extent, the
inode size is not updated until I/O completion. If a zero range request
discards a delalloc range that covers page aligned EOF as such, the
inode size update never occurs. For example:
$ rm -f /mnt/file
$ xfs_io -fc "pwrite 0 64k" -c "zero 60k 4k" /mnt/file
$ stat -c "%s" /mnt/file
65536
$ umount /mnt
$ mount <dev> /mnt
$ stat -c "%s" /mnt/file
61440
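For reference, the sizes in the example work out as follows (a quick arithmetic check, assuming 4k pages): the zeroed range 60k-64k reaches the page-aligned EOF, so when the delalloc extent over it is discarded, the pending size update to 64k is lost and only writeback of the remaining 0-60k data sets the on-disk size.

```shell
# Quick arithmetic check of the sizes above (4k pages assumed):
in_memory=$((64 * 1024))   # size reported while the inode is cached
on_disk=$((60 * 1024))     # size after remount: the 60k-64k delalloc
                           # extent was discarded, so only writeback of
                           # the remaining 0-60k set the on-disk size
echo "$in_memory $on_disk"
```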
Update xfs_zero_file_space() to flush the range rather than discard
delalloc blocks to ensure that inode size updates occur appropriately.
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
I suppose we could be more clever here and only flush the range in this
particular scenario, but I'm not sure if there's a major benefit there.
FWIW, this implicitly addresses the indlen==0 assert failures described
in the xfs_bmap_del_extent() rfc, but doesn't necessarily mean we
shouldn't fix that code IMO.
Brian
fs/xfs/xfs_bmap_util.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index d8b77b5..24d634d 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1394,14 +1394,14 @@ xfs_zero_file_space(
if (start_boundary < end_boundary - 1) {
/*
- * punch out delayed allocation blocks and the page cache over
- * the conversion range
+ * Writeback the range to ensure any inode size updates due to
+ * appending writes make it to disk (otherwise we could just
+ * punch out the delalloc blocks).
*/
- xfs_ilock(ip, XFS_ILOCK_EXCL);
- error = xfs_bmap_punch_delalloc_range(ip,
- XFS_B_TO_FSBT(mp, start_boundary),
- XFS_B_TO_FSB(mp, end_boundary - start_boundary));
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
+ start_boundary, end_boundary - 1);
+ if (error)
+ goto out;
truncate_pagecache_range(VFS_I(ip), start_boundary,
end_boundary - 1);
--
1.8.3.1
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: [PATCH] xfs: flush the range before zero range conversion
2014-09-24 19:06 [PATCH] xfs: flush the range before zero range conversion Brian Foster
@ 2014-09-25 12:01 ` Dave Chinner
2014-09-25 15:21 ` Brian Foster
0 siblings, 1 reply; 3+ messages in thread
From: Dave Chinner @ 2014-09-25 12:01 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Wed, Sep 24, 2014 at 03:06:31PM -0400, Brian Foster wrote:
> XFS currently discards delalloc blocks within the target range of a zero
> range request. Unaligned start and end offsets are zeroed through the
> page cache and the internal, aligned blocks are converted to unwritten
> extents.
>
> If EOF is page aligned and covered by a delayed allocation extent, the
> inode size is not updated until I/O completion. If a zero range request
> discards a delalloc range that covers page aligned EOF as such, the
> inode size update never occurs. For example:
>
> $ rm -f /mnt/file
> $ xfs_io -fc "pwrite 0 64k" -c "zero 60k 4k" /mnt/file
> $ stat -c "%s" /mnt/file
> 65536
> $ umount /mnt
> $ mount <dev> /mnt
> $ stat -c "%s" /mnt/file
> 61440
>
> Update xfs_zero_file_space() to flush the range rather than discard
> delalloc blocks to ensure that inode size updates occur appropriately.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>
> I suppose we could be more clever here and only flush the range in this
> particular scenario, but I'm not sure if there's a major benefit there.
Punching the delalloc range rather than flushing the file
was done intentionally - this was added primarily for speeding up
the zeroing of large VM image files. i.e. it's an extent
manipulation operation rather than a data IO operation. Flushing the
file defeats the primary reason for the operation existing.
We can easily detect this situation and just zero the last block in
the file directly after punching out all the delalloc state. This
should happen anyway when the region to be zeroed is not page
aligned....
> FWIW, this implicitly addresses the indlen==0 assert failures described
> in the xfs_bmap_del_extent() rfc, but doesn't necessarily mean we
> shouldn't fix that code IMO.
We punch delalloc extents elsewhere, so that still needs fixing.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH] xfs: flush the range before zero range conversion
2014-09-25 12:01 ` Dave Chinner
@ 2014-09-25 15:21 ` Brian Foster
0 siblings, 0 replies; 3+ messages in thread
From: Brian Foster @ 2014-09-25 15:21 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Thu, Sep 25, 2014 at 10:01:55PM +1000, Dave Chinner wrote:
> On Wed, Sep 24, 2014 at 03:06:31PM -0400, Brian Foster wrote:
> > XFS currently discards delalloc blocks within the target range of a zero
> > range request. Unaligned start and end offsets are zeroed through the
> > page cache and the internal, aligned blocks are converted to unwritten
> > extents.
> >
> > If EOF is page aligned and covered by a delayed allocation extent, the
> > inode size is not updated until I/O completion. If a zero range request
> > discards a delalloc range that covers page aligned EOF as such, the
> > inode size update never occurs. For example:
> >
> > $ rm -f /mnt/file
> > $ xfs_io -fc "pwrite 0 64k" -c "zero 60k 4k" /mnt/file
> > $ stat -c "%s" /mnt/file
> > 65536
> > $ umount /mnt
> > $ mount <dev> /mnt
> > $ stat -c "%s" /mnt/file
> > 61440
> >
> > Update xfs_zero_file_space() to flush the range rather than discard
> > delalloc blocks to ensure that inode size updates occur appropriately.
> >
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >
> > I suppose we could be more clever here and only flush the range in this
> > particular scenario, but I'm not sure if there's a major benefit there.
>
> Punching the delalloc range rather than flushing the file
> was done intentionally - this was added primarily for speeding up
> the zeroing of large VM image files. i.e. it's an extent
> manipulation operation rather than a data IO operation. Flushing the
> file defeats the primary reason for the operation existing.
>
> We can easily detect this situation and just zero the last block in
> the file directly after punching out all the delalloc state. This
> should happen anyway when the region to be zeroed is not page
> aligned....
>
Hmm, good point. xfs_iozero() goes through the page cache. It seems like
that should work and it's something we already have to handle in this
path. I'll give it a shot.
Brian
> > FWIW, this implicitly addresses the indlen==0 assert failures described
> > in the xfs_bmap_del_extent() rfc, but doesn't necessarily mean we
> > shouldn't fix that code IMO.
>
> We punch delalloc extents elsewhere, so that still needs fixing.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com