Re: Regression in generic/749 with 8k fsblock size on 6.18-rc1

From: "Darrick J. Wong" <djwong@kernel.org>
To: Kiryl Shutsemau <kirill@shutemov.name>
Cc: akpm@linux-foundation.org, linux-mm <linux-mm@kvack.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	xfs <linux-xfs@vger.kernel.org>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: Regression in generic/749 with 8k fsblock size on 6.18-rc1
Date: Wed, 15 Oct 2025 10:57:26 -0700
Message-ID: <20251015175726.GC6188@frogsfrogsfrogs>
In-Reply-To: <rymlydtl4fo4k4okciiifsl52vnd7pqs65me6grweotgsxagln@zebgjfr3tuep>

On Wed, Oct 15, 2025 at 04:59:03PM +0100, Kiryl Shutsemau wrote:
> On Tue, Oct 14, 2025 at 10:52:14AM -0700, Darrick J. Wong wrote:
> > Hi there,
> > 
> > On 6.18-rc1, generic/749[1] running on XFS with an 8k fsblock size fails
> > with the following:
> > 
> > --- /run/fstests/bin/tests/generic/749.out	2025-07-15 14:45:15.170416031 -0700
> > +++ /var/tmp/fstests/generic/749.out.bad	2025-10-13 17:48:53.079872054 -0700
> > @@ -1,2 +1,10 @@
> >  QA output created by 749
> > +Expected SIGBUS when mmap() reading beyond page boundary
> > +Expected SIGBUS when mmap() writing beyond page boundary
> > +Expected SIGBUS when mmap() reading beyond page boundary
> > +Expected SIGBUS when mmap() writing beyond page boundary
> > +Expected SIGBUS when mmap() reading beyond page boundary
> > +Expected SIGBUS when mmap() writing beyond page boundary
> > +Expected SIGBUS when mmap() reading beyond page boundary
> > +Expected SIGBUS when mmap() writing beyond page boundary
> >  Silence is golden
> > 
> > This test creates small files of various sizes, maps the EOF block, and
> > checks that you can read and write to the mmap'd page up to (but not
> > beyond) the next page boundary.
> > 
> > For 8k fsblock filesystems on x86, the pagecache creates a single 8k
> > folio to cache the entire fsblock containing EOF.  If EOF is in the
> > first 4096 bytes of that 8k fsblock, then it should be possible to do a
> > mmap read/write of the first 4k, but not the second 4k.  Memory accesses
> > to the second 4096 bytes should produce a SIGBUS.
> 
> Does anybody actually rely on this behaviour (beyond xfstests)?

Beats me, but the mmap manpage says:

       SIGBUS Attempted access to a page of the buffer that lies beyond
              the end of the mapped file.  For an explanation of the
              treatment of the bytes in the page that corresponds to the
              end of a mapped file that is not a multiple of the page
              size, see NOTES.

POSIX 2024 says:

The system shall always zero-fill any partial page at the end of an
object. Further, the system shall never write out any modified portions
of the last page of an object which are beyond its end. References
within the address range starting at pa and continuing for len bytes to
whole pages following the end of an object shall result in delivery of a
SIGBUS signal.

https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/functions/mmap.html#tag_17_345

From both I would surmise that it's a reasonable expectation that you
can't map basepages beyond EOF and have page faults on those pages
succeed.
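
To make that expectation concrete, here is a minimal userspace sketch. It
is not the generic/749 test itself: probe_mmap_read() is a hypothetical
helper, and the /tmp path and the two-page mapping length are assumptions
chosen for illustration. It maps two base pages of a small file and reads
one byte in a child process, so the parent can see whether the access
raised SIGBUS.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of fstests): create a temporary file of
 * file_size bytes, map two base pages of it, and read one byte at byte
 * offset off from a child process.
 *
 * Returns 1 if the child was killed by SIGBUS, 0 if the read succeeded,
 * and -1 on a setup error.
 */
static int probe_mmap_read(size_t file_size, size_t off)
{
    char path[] = "/tmp/eof-probe-XXXXXX";  /* assumed scratch location */
    long psz = sysconf(_SC_PAGESIZE);
    size_t map_len;
    unsigned char *p;
    int fd, status, ret = -1;
    pid_t pid;

    assert(psz > 0);
    map_len = 2 * (size_t)psz;
    if (off >= map_len)
        return -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                           /* file lives until close() */

    if (ftruncate(fd, (off_t)file_size) < 0)
        goto out;

    /* Mapping past EOF is allowed; only the access may fault. */
    p = mmap(NULL, map_len, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out;

    pid = fork();
    if (pid == 0) {
        volatile unsigned char c = p[off];  /* may raise SIGBUS */
        (void)c;
        _exit(0);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid) {
        if (WIFSIGNALED(status) && WTERMSIG(status) == SIGBUS)
            ret = 1;
        else if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ret = 0;
    }
    munmap(p, map_len);
out:
    close(fd);
    return ret;
}
```

With a 100-byte file, probe_mmap_read(100, 0) should return 0, and
probe_mmap_read(100, sysconf(_SC_PAGESIZE)) should return 1 on a kernel
that behaves as the manpage and POSIX describe.  The second result
depends on the kernel and filesystem configuration, which is the whole
point of this thread.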

> I think this behaviour existed before the recent changes, but it was
> less prominent.
> 
> Like, tmpfs with huge=always would fault in a PMD if there's an order-9
> folio in the page cache, regardless of i_size.
> 
> See filemap_map_pages->filemap_map_pmd() path.
> 
> I believe the same happens for large folios in other filesystems.

<shrug> The kernel SIGBUS'd as expected in 6.17.  For the 8k fsblock
case there indeed was a large folio caching the EOF, but then we were
also installing 4k PTE mappings.

(I'm not sure what happens if you actually have a PMD-sized page since
those are a little hard to force.)
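
For reference, the arithmetic behind "4k PTE mappings inside an 8k folio"
can be sketched as below.  This is only an illustration of the expected
outcome, not kernel code: both function names are hypothetical, and the
rule encoded here (a base page is mappable only if it starts before EOF)
is my reading of the manpage and POSIX text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of base pages that contain at least one
 * byte of file data, i.e. file_size rounded up to a page count.
 */
static size_t mappable_base_pages(size_t file_size, size_t page_size)
{
    assert(page_size != 0);
    return (file_size + page_size - 1) / page_size;
}

/*
 * Hypothetical helper: true if an access at byte offset off lands in a
 * base page that lies wholly beyond EOF, where SIGBUS is expected no
 * matter how large the folio caching that range is.
 */
static bool expect_sigbus(size_t file_size, size_t page_size, size_t off)
{
    return off / page_size >= mappable_base_pages(file_size, page_size);
}
```

For a 100-byte file with 4096-byte base pages, expect_sigbus(100, 4096, 0)
is false and expect_sigbus(100, 4096, 4096) is true, even though both
offsets fall inside the same 8k folio.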

> Some of this behaviour is hidden by truncate path trying to split large
> folios, split PMD and unmap a range of PTEs. But split can fail, so we
> cannot rely on this for correctness.
> 
> I would like to understand more about expectations in real workloads
> before committing to a fix.

Yeah, I dislike the incongruities between byte-stream files vs mmapping
pages.  All the post-EOF zeroing logic is constantly getting broken in
subtle weird ways.

willy? :D

--D

> -- 
>   Kiryl Shutsemau / Kirill A. Shutemov
>
