From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: "Jürgen Groß" <jgross@suse.com>
Cc: Sarah Newman <srn@prgmr.com>, Jan Beulich <jbeulich@suse.com>,
xen-devel@lists.xenproject.org,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
Stefano Stabellini <sstabellini@kernel.org>,
stable@vger.kernel.org, Chris Brannon <cmb@prgmr.com>
Subject: Re: [PATCH] xen: don't reschedule in preemption off sections
Date: Thu, 20 Aug 2020 07:44:19 +0200 [thread overview]
Message-ID: <20200820054419.GA1874724@kroah.com> (raw)
In-Reply-To: <4f33a18e-3491-d6ff-63bc-9bc619bd88cd@suse.com>
On Thu, Aug 20, 2020 at 07:13:00AM +0200, Jürgen Groß wrote:
> On 19.08.20 18:57, Sarah Newman wrote:
> > On 7/10/20 5:01 AM, Jürgen Groß wrote:
> > > On 10.07.20 13:55, Jan Beulich wrote:
> > > > On 10.07.2020 12:50, Jürgen Groß wrote:
> > > > > On 10.07.20 11:49, Jan Beulich wrote:
> > > > > > On 10.07.2020 09:50, Juergen Gross wrote:
> > > > > > > For support of long running hypercalls xen_maybe_preempt_hcall() is
> > > > > > > calling cond_resched() in case a hypercall marked as preemptible has
> > > > > > > been interrupted.
> > > > > > >
> > > > > > > Normally this is no problem, as only hypercalls done
> > > > > > > via some ioctl()s
> > > > > > > are marked to be preemptible. In rare cases when during such a
> > > > > > > preemptible hypercall an interrupt occurs and any softirq action is
> > > > > > > started from irq_exit(), a further hypercall issued by the softirq
> > > > > > > handler will be regarded to be preemptible, too. This might lead to
> > > > > > > rescheduling in spite of the softirq handler potentially having set
> > > > > > > preempt_disable(), leading to splats like:
> > > > > > >
> > > > > > > BUG: sleeping function called from invalid context
> > > > > > > at drivers/xen/preempt.c:37
> > > > > > > in_atomic(): 1, irqs_disabled(): 0, non_block: 0,
> > > > > > > pid: 20775, name: xl
> > > > > > > INFO: lockdep is turned off.
> > > > > > > CPU: 1 PID: 20775 Comm: xl Tainted: G D W
> > > > > > > 5.4.46-1_prgmr_debug.el7.x86_64 #1
> > > > > > > Call Trace:
> > > > > > > <IRQ>
> > > > > > > dump_stack+0x8f/0xd0
> > > > > > > ___might_sleep.cold.76+0xb2/0x103
> > > > > > > xen_maybe_preempt_hcall+0x48/0x70
> > > > > > > xen_do_hypervisor_callback+0x37/0x40
> > > > > > > RIP: e030:xen_hypercall_xen_version+0xa/0x20
> > > > > > > Code: ...
> > > > > > > RSP: e02b:ffffc900400dcc30 EFLAGS: 00000246
> > > > > > > RAX: 000000000004000d RBX: 0000000000000200 RCX: ffffffff8100122a
> > > > > > > RDX: ffff88812e788000 RSI: 0000000000000000 RDI: 0000000000000000
> > > > > > > RBP: ffffffff83ee3ad0 R08: 0000000000000001 R09: 0000000000000001
> > > > > > > R10: 0000000000000000 R11: 0000000000000246 R12: ffff8881824aa0b0
> > > > > > > R13: 0000000865496000 R14: 0000000865496000 R15: ffff88815d040000
> > > > > > > ? xen_hypercall_xen_version+0xa/0x20
> > > > > > > ? xen_force_evtchn_callback+0x9/0x10
> > > > > > > ? check_events+0x12/0x20
> > > > > > > ? xen_restore_fl_direct+0x1f/0x20
> > > > > > > ? _raw_spin_unlock_irqrestore+0x53/0x60
> > > > > > > ? debug_dma_sync_single_for_cpu+0x91/0xc0
> > > > > > > ? _raw_spin_unlock_irqrestore+0x53/0x60
> > > > > > > ? xen_swiotlb_sync_single_for_cpu+0x3d/0x140
> > > > > > > ? mlx4_en_process_rx_cq+0x6b6/0x1110 [mlx4_en]
> > > > > > > ? mlx4_en_poll_rx_cq+0x64/0x100 [mlx4_en]
> > > > > > > ? net_rx_action+0x151/0x4a0
> > > > > > > ? __do_softirq+0xed/0x55b
> > > > > > > ? irq_exit+0xea/0x100
> > > > > > > ? xen_evtchn_do_upcall+0x2c/0x40
> > > > > > > ? xen_do_hypervisor_callback+0x29/0x40
> > > > > > > </IRQ>
> > > > > > > ? xen_hypercall_domctl+0xa/0x20
> > > > > > > ? xen_hypercall_domctl+0x8/0x20
> > > > > > > ? privcmd_ioctl+0x221/0x990 [xen_privcmd]
> > > > > > > ? do_vfs_ioctl+0xa5/0x6f0
> > > > > > > ? ksys_ioctl+0x60/0x90
> > > > > > > ? trace_hardirqs_off_thunk+0x1a/0x20
> > > > > > > ? __x64_sys_ioctl+0x16/0x20
> > > > > > > ? do_syscall_64+0x62/0x250
> > > > > > > ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > > > > >
> > > > > > > Fix that by testing preempt_count() before calling cond_resched().
> > > > > > >
> > > > > > > In kernel 5.8 this can't happen any more due to the
> > > > > > > entry code rework.
> > > > > > >
> > > > > > > Reported-by: Sarah Newman <srn@prgmr.com>
> > > > > > > Fixes: 0fa2f5cb2b0ecd8 ("sched/preempt, xen: Use
> > > > > > > need_resched() instead of should_resched()")
> > > > > > > Cc: Sarah Newman <srn@prgmr.com>
> > > > > > > Signed-off-by: Juergen Gross <jgross@suse.com>
> > > > > > > ---
> > > > > > > drivers/xen/preempt.c | 2 +-
> > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/drivers/xen/preempt.c b/drivers/xen/preempt.c
> > > > > > > index 17240c5325a3..6ad87b5c95ed 100644
> > > > > > > --- a/drivers/xen/preempt.c
> > > > > > > +++ b/drivers/xen/preempt.c
> > > > > > > @@ -27,7 +27,7 @@ EXPORT_SYMBOL_GPL(xen_in_preemptible_hcall);
> > > > > > > asmlinkage __visible void xen_maybe_preempt_hcall(void)
> > > > > > > {
> > > > > > > if (unlikely(__this_cpu_read(xen_in_preemptible_hcall)
> > > > > > > - && need_resched())) {
> > > > > > > + && need_resched() && !preempt_count())) {
> > > > > >
> > > > > > Doesn't this have the at least latent risk of hiding issues in
> > > > > > other call trees (by no longer triggering logging like the one
> > > > > > that has propmted this change)? Wouldn't it be better to save,
> > > > > > clear, and restore the flag in one of xen_evtchn_do_upcall() or
> > > > > > xen_do_hypervisor_callback()?
> > > > >
> > > > > First regarding "risk of hiding issues": it seems as if lots of kernels
> > > > > aren't even configured to trigger this logging. It would need
> > > > > CONFIG_DEBUG_ATOMIC_SLEEP to be enabled and at least SUSE kernels don't
> > > > > seem to have it on. I suspect the occasional xen_mc_flush() failures we
> > > > > have seen are related to this problem.
> > > > >
> > > > > And in theory saving, clearing and restoring the flag would
> > > > > be fine, but
> > > > > it can't be done in a single function with the code flow as up to 5.7.
> > > > > What would need to be done is to save and clear the flag in e.g.
> > > > > __xen_evtchn_do_upcall() and to pass it to xen_maybe_preempt_hcall() as
> > > > > a parameter. In xen_maybe_preempt_hcall() the passed flag value would
> > > > > need to be used for the decision whether to call
> > > > > cond_resched() and then
> > > > > the flag could be restored (after the cond_resched() call).
> > > >
> > > > I'm afraid I don't follow: If __xen_evtchn_do_upcall() cleared the flag,
> > > > xen_maybe_preempt_hcall() would amount to a no-op (up and until the
> > > > flag's prior value would get restored), wouldn't it? No need to pass
> > > > anything into there.
> > >
> > > The problem is after __xen_evtchn_do_upcall() restoring the flag.
> > > As soon as irq_exit() is being called (either by xen_evtchn_do_upcall()
> > > or by the caller of xen_hvm_evtchn_do_upcall()) softirq handling might
> > > be executed resulting in another hypercall, which might be preempted
> > > afterwards. And this is the case which happened in the original
> > > report by Sarah.
> >
> > Tested-by: Chris Brannon <cmb@prgmr.com>
> >
> > We're no longer observing this bug. We've been running this on a test
> > system for about 3 weeks. Is it possible to merge?
>
> Greg, could you please take the patch for stable?
What is the git commit id of the patch in Linus's tree?
And what stable tree(s) should it go to?
thanks,
greg k-h
next prev parent reply other threads:[~2020-08-20 5:44 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-10 7:50 [PATCH] xen: don't reschedule in preemption off sections Juergen Gross
2020-07-10 9:49 ` Jan Beulich
2020-07-10 9:49 ` Jan Beulich
2020-07-10 10:50 ` Jürgen Groß
2020-07-10 10:50 ` Jürgen Groß
2020-07-10 11:55 ` Jan Beulich
2020-07-10 11:55 ` Jan Beulich
2020-07-10 12:01 ` Jürgen Groß
2020-07-10 12:01 ` Jürgen Groß
2020-08-19 16:57 ` Sarah Newman
2020-08-20 5:13 ` Jürgen Groß
2020-08-20 5:44 ` Greg Kroah-Hartman [this message]
2020-08-20 5:58 ` Jürgen Groß
2020-08-20 6:20 ` Greg Kroah-Hartman
-- strict thread matches above, loose matches on Subject: below --
2020-08-20 6:59 Juergen Gross
2020-08-24 8:13 ` Greg KH
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200820054419.GA1874724@kroah.com \
--to=gregkh@linuxfoundation.org \
--cc=boris.ostrovsky@oracle.com \
--cc=cmb@prgmr.com \
--cc=jbeulich@suse.com \
--cc=jgross@suse.com \
--cc=srn@prgmr.com \
--cc=sstabellini@kernel.org \
--cc=stable@vger.kernel.org \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.