From: David Howells <dhowells@redhat.com>
To: "Pankaj Raghav (Samsung)" <kernel@pankajraghav.com>
Cc: dhowells@redhat.com, brauner@kernel.org,
akpm@linux-foundation.org, chandan.babu@oracle.com,
linux-fsdevel@vger.kernel.org, djwong@kernel.org, hare@suse.de,
gost.dev@samsung.com, linux-xfs@vger.kernel.org, hch@lst.de,
david@fromorbit.com, Zi Yan <ziy@nvidia.com>,
yang@os.amperecomputing.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, willy@infradead.org, john.g.garry@oracle.com,
cl@os.amperecomputing.com, p.raghav@samsung.com,
mcgrof@kernel.org, ryan.roberts@arm.com
Subject: Re: [PATCH v12 00/10] enable bs > ps in XFS
Date: Mon, 19 Aug 2024 12:46:55 +0100
Message-ID: <3402933.1724068015@warthog.procyon.org.uk>
In-Reply-To: <20240818165124.7jrop5sgtv5pjd3g@quentin>
Hi Pankaj,
I can reproduce the problem with:
xfs_io -t -f -c "pwrite -S 0x58 0 40" -c "fsync" -c "truncate 4" -c "truncate 4096" /xfstest.test/wubble; od -x /xfstest.test/wubble
borrowed from generic/393. I've distilled it down to the attached C program.
Turning on tracing and adding a bit more, I can see the problem happening.
Here's an excerpt of the tracing (I've added some non-upstream tracepoints).
Firstly, you can see the second pwrite at fpos 0, 40 bytes (ie. 0x28):
pankaj-5833: netfs_write_iter: WRITE-ITER i=9e s=0 l=28 f=0
pankaj-5833: netfs_folio: pfn=116fec i=0009e ix=00000-00001 mod-streamw
Then the first ftruncate() is called to reduce the file size to 4:
pankaj-5833: netfs_truncate: ni=9e isz=2028 rsz=2028 zp=4000 to=4
pankaj-5833: netfs_inval_folio: pfn=116fec i=0009e ix=00000-00001 o=4 l=1ffc d=78787878
pankaj-5833: netfs_folio: pfn=116fec i=0009e ix=00000-00001 inval-part
pankaj-5833: netfs_set_size: ni=9e resize-file isz=4 rsz=4 zp=4
You can see the invalidate_folio call, with the offset at 0x4 and the length
as 0x1ffc. The data at the beginning of the page is 0x78787878 (ie. "xxxx").
This looks correct.
Then the second ftruncate() is called to increase the file size to 4096
(ie. 0x1000):
pankaj-5833: netfs_truncate: ni=9e isz=4 rsz=4 zp=4 to=1000
pankaj-5833: netfs_inval_folio: pfn=116fec i=0009e ix=00000-00001 o=1000 l=1000 d=78787878
pankaj-5833: netfs_folio: pfn=116fec i=0009e ix=00000-00001 inval-part
pankaj-5833: netfs_set_size: ni=9e resize-file isz=1000 rsz=1000 zp=4
And here's the problem: in the invalidate_folio() call, the offset is 0x1000
and the length is 0x1000 (o= and l=). But that's the wrong half of the folio!
I'm guessing that the caller thereafter clears the other half of the folio -
the bit that should be kept.
David
---
/* Distillation of the generic/393 xfstest */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

#define ERR(x, y) do { if ((long)(x) == -1) { perror(y); exit(1); } } while(0)

static const char xxx[40] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
static const char yyy[40] = "yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy";
static const char dropfile[] = "/proc/sys/vm/drop_caches";
static const char droptype[] = "3";
static const char file[] = "/xfstest.test/wubble";

int main(int argc, char *argv[])
{
	int fd, drop;

	/* Fill in the second 8K block of the file... */
	fd = open(file, O_CREAT|O_TRUNC|O_WRONLY, 0666);
	ERR(fd, "open");
	ERR(ftruncate(fd, 0), "pre-trunc $file");
	ERR(pwrite(fd, yyy, sizeof(yyy), 0x2000), "write-2000");
	ERR(close(fd), "close");

	/* ... and drop the pagecache so that we get a streaming
	 * write, attaching some private data to the folio.
	 */
	drop = open(dropfile, O_WRONLY);
	ERR(drop, dropfile);
	ERR(write(drop, droptype, sizeof(droptype) - 1), "write-drop");
	ERR(close(drop), "close-drop");

	fd = open(file, O_WRONLY, 0666);
	ERR(fd, "reopen");

	/* Make a streaming write on the first 8K block (needs O_WRONLY). */
	ERR(pwrite(fd, xxx, sizeof(xxx), 0), "write-0");

	/* Now use truncate to shrink and reexpand. */
	ERR(ftruncate(fd, 4), "trunc-4");
	ERR(ftruncate(fd, 4096), "trunc-4096");
	ERR(close(fd), "close-2");
	exit(0);
}