All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Bader <stefan.bader@canonical.com>
To: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	Ian Campbell <ian.campbell@citrix.com>,
	Jan Beulich <jbeulich@suse.com>
Subject: Strange PVM spinlock case revisited (nailed down)
Date: Thu, 14 Feb 2013 12:04:56 +0100	[thread overview]
Message-ID: <511CC4D8.3060203@canonical.com> (raw)
In-Reply-To: <51151ABF.1070007@canonical.com>


[-- Attachment #1.1: Type: text/plain, Size: 4235 bytes --]

I think I finally can proof why allowing interrupts for the duration of poll_irq
in xen_spin_lock_slow is bad! \o/

So with another run/dump:

VCPUs 1,2,5 and 6 are spinning on the runq lock of VCPU 1. The number of
spinners is 4 (ok) and the lock released.
According to dom0 debug info VCPUs 2,5 and 6 are polling, VCPU 1 is not
but seems running (has=T). The event channel for spinlock1 is pending.

Checking the interrupt stack of VCPU 1 in the guest dump shows:

hypercall_page+938
xen_poll_irq+16
xen_spin_lock_slow+151
xen_spin_lock_flags+99
_raw_spin_lock_irqsave+46
update_shares+156
rebalance_domains+72
run_rebalance_domains+72
__do_softirq+168
call_softirq+99
do_softirq+101
irq_exit+142
xen_evtchn_do_upcall+53
xen_do_hypervisor_callback+101

So right before finishing a previous callback, irq_exit triggers softirq
processing (interrupts enabled) and while updating the shares this tries to grab
the runq lock which we see in lock_spinners.

Since irq_count is 2 there is one more interruption executing right now (though
irq_regs wont show this). So I just proceeded upwards in the interrupt stack and
found:
try_to_wake_up+57
default_wake_function+18
autoremove_wake_function+22
wake_bit_function+59
__wake_up_common+88
__wake_up+72
__wake_up_bit+49
end_page_writeback+64
put_io_page+36
ext4_end_bio+338
bio_endio+29
req_bio_endio.isra.45+163
blk_update_request+245
blk_update_bidi_request+49
__blk_end_bidi_request+32
__blk_end_request_all+31
blkif_interrupt+460
handle_irq_event_percpu+85
handle_irq_event+78
handle_edge_irq+132
__xen_evtchn_do_upcall+409
xen_evtchn_do_upcall+47
xen_do_hypervisor_callback+101

There was a bit more on the stack but try_to_wake_up seemed a believable current
path. Even more so after looking into the function:

#ifdef CONFIG_SMP
        /*
         * If the owning (remote) cpu is still in the middle of schedule() with
         * this task as prev, wait until its done referencing the task.
         */
        while (p->on_cpu) {
#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
                /*
                 * In case the architecture enables interrupts in
                 * context_switch(), we cannot busy wait, since that
                 * would lead to deadlocks when an interrupt hits and
                 * tries to wake up @prev. So bail and do a complete
                 * remote wakeup.
                 */
                if (ttwu_activate_remote(p, wake_flags))
                        goto stat;
#else
                cpu_relax();
#endif

And prying out the task in question from the stack, it was one currently
being accounted for VCPU 6 and on_cpu. VCPU 6 is one of the other waiters for
the runq lock of VCPU 1. Which would get picked up as soon as this callback is
done. Unfortunately we "stay awhile... stay forever!".

Ok, as an experiment I defined  __ARCH_WANT_INTERRUPTS_ON_CTXSW for x86 and
running that kernel in the PVM guest no more locks up.
Problem there is that this define is gone since v3.7 (f3e9478674). And the
same testcase is not able to cause the issue since v3.6 (611ae8e3f5). The
change of TLB flushes unlikely is directly related (rather changing the number
of IPIs and by that the likelihood to observe this).

So for v3.6+ kernels, the question is whether it can be ruled out that during
softirq handling rebalance_domains->update_blocked_averages (was update_shares)
is called which then can get into spin_lock_slow and enable interrupts.
I have not seen it or am aware of reports about it but its likely one of those
rare occurrences.

As for a solution, the simplest one would be never to re-enable interrupts
before poll_irq... Anything else feels atm right complicated (like trying to
make the scheduler use disable interrupts and spin_lock variants instead of
spin_lock_irq ones... weird hidden implications in common code "just" for PVM).

-Stefan

Note: comparing the number of db records created on a 15min run of the testcase
seems to yield similar results for interrupts enabled (with that arch define in
v3.2) or disabled. Though I do not have a huge number base and that PVM is the
only one running on the host.




[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 899 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  parent reply	other threads:[~2013-02-14 11:04 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-08 15:33 Strange PVM spinlock case revisited Stefan Bader
2013-02-11 17:29 ` Ian Campbell
2013-02-12 13:42   ` Stefan Bader
2013-02-12 14:07     ` Ian Campbell
2013-02-12 14:50       ` Stefan Bader
2013-02-12 15:07         ` Ian Campbell
2013-02-12 15:53           ` Stefan Bader
2013-02-13 11:31   ` Stefan Bader
2013-02-13 12:16     ` Stefan Bader
2013-02-14 11:04 ` Stefan Bader [this message]
2013-02-14 11:43   ` Strange PVM spinlock case revisited (nailed down) Jan Beulich
2013-02-14 14:21     ` Stefan Bader
2013-02-14 14:48       ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=511CC4D8.3060203@canonical.com \
    --to=stefan.bader@canonical.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=ian.campbell@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.