From: Jan Stancek <jstancek@redhat.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
darrick wong <darrick.wong@oracle.com>,
linuxppc-dev@lists.ozlabs.org,
Memory Management <mm-qe@redhat.com>,
LTP Mailing List <ltp@lists.linux.it>,
Linux Stable maillist <stable@vger.kernel.org>,
CKI Project <cki-project@redhat.com>,
Michael Ellerman <mpe@ellerman.id.au>
Subject: Re: [bug] userspace hitting sporadic SIGBUS on xfs (Power9, ppc64le), v4.19 and later
Date: Tue, 3 Dec 2019 09:35:28 -0500 (EST)
Message-ID: <433638211.14837331.1575383728189.JavaMail.zimbra@redhat.com>
In-Reply-To: <20191203130757.GA2267@infradead.org>
----- Original Message -----
> On Tue, Dec 03, 2019 at 07:50:39AM -0500, Jan Stancek wrote:
> > My theory is that there's a race in iomap. There appear to be
> > interleaved calls to iomap_set_range_uptodate() for same page
> > with varying offset and length. Each call sees bitmap as _not_
> > entirely "uptodate" and hence doesn't call SetPageUptodate().
> > Even though each bit in bitmap ends up uptodate by the time
> > all calls finish.
>
> Weird. That should be prevented by the page lock that all callers
> of iomap_set_range_uptodate hold. But in case I'm missing something,
> does the patch below trigger? If not, it is not just a race, but might
> be some weird ordering problem with the bitops, especially if it
> only triggers on ppc, which is very weakly ordered.
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index d33c7bc5ee92..25e942c71590 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -148,6 +148,8 @@ iomap_set_range_uptodate(struct page *page, unsigned off, unsigned len)
>  	unsigned int i;
>  	bool uptodate = true;
>
> +	WARN_ON_ONCE(!PageLocked(page));
> +
>  	if (iop) {
>  		for (i = 0; i < PAGE_SIZE / i_blocksize(inode); i++) {
>  			if (i >= first && i <= last)
>
Hit the SIGBUS pretty quickly this time:
# uptime
09:27:42 up 22 min, 2 users, load average: 0.09, 13.38, 26.18
# /mnt/testarea/ltp/testcases/bin/genbessel
Bus error (core dumped)
# dmesg | grep -i -e warn -e call
[ 0.000000] dt-cpu-ftrs: not enabling: system-call-vectored (disabled or unsupported by kernel)
[ 0.000000] random: get_random_u64 called from cache_random_seq_create+0x98/0x1e0 with crng_init=0
[ 0.000000] rcu: Offload RCU callbacks from CPUs: (none).
[ 5.312075] megaraid_sas 0031:01:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
[ 5.357307] megaraid_sas 0031:01:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
[ 5.485126] megaraid_sas 0031:01:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
So the extra WARN_ON_ONCE, applied on top of v5.4-8836-g81b6b96475ac,
did not trigger.
Is it possible for iomap code to submit multiple bios for the same
locked page and then receive the completion callbacks in parallel?