From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: "imammedo [via Xen]"
<ml-node+4704111-2053006313-93434@n5.nabble.com>,
xen-devel@lists.xensource.com,
Jeremy Fitzhardinge <jeremy@goop.org>
Subject: Re: [PATCH] blkfront: Move blkif_interrupt into a tasklet.
Date: Tue, 16 Aug 2011 10:57:55 -0400
Message-ID: <20110816145754.GA31431@dumpdata.com>
In-Reply-To: <1313494014833-4704111.post@n5.nabble.com>
On Tue, Aug 16, 2011 at 04:26:54AM -0700, imammedo [via Xen] wrote:
>
> Jeremy Fitzhardinge wrote:
> >
> > Have you tried bisecting to see when this particular problem appeared?
> > It looks to me like something is accidentally re-enabling interrupts -
> > perhaps a stack overrun is corrupting the "flags" argument between a
> > spin_lock_irqsave()/restore pair.
> >
> > Is it only on 32-bit kernels?
> >
Any specific reason you did not include xen-devel in this email? I am
CC-ing it here.
> ------------[ cut here ]------------
> [604001.659925] WARNING: at block/blk-core.c:239 blk_start_queue+0x70/0x80()
> [604001.659964] Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl
> sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4
> nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables xen_netfront
> pcspkr [last unloaded: scsi_wait_scan]
> [604001.660147] Pid: 336, comm: udevd Tainted: G W 3.0.0+ #50
> [604001.660181] Call Trace:
> [604001.660209] [<c045c512>] warn_slowpath_common+0x72/0xa0
> [604001.660243] [<c06643a0>] ? blk_start_queue+0x70/0x80
> [604001.660275] [<c06643a0>] ? blk_start_queue+0x70/0x80
> [604001.660310] [<c045c562>] warn_slowpath_null+0x22/0x30
> [604001.660343] [<c06643a0>] blk_start_queue+0x70/0x80
> [604001.660379] [<c075e231>] kick_pending_request_queues+0x21/0x30
> [604001.660417] [<c075e42f>] blkif_interrupt+0x19f/0x2b0
> ...
> ------------[ cut here ]------------
>
> I've debugged the blk-core warning a bit and can say:
> - Yes, it is a 32-bit PAE kernel, and so far it happens only there.
> - It affects PV Xen guests; bare-metal and KVM configs are not affected.
> - The upstream kernel is affected as well.
> - It reproduces on Xen 4.1.1 and 3.1.2 hosts.
>
> The IF flag is always restored in drivers/md/dm.c, at
> static void clone_endio(struct bio *bio, int error)
> ...
> dm_endio_fn endio = tio->ti->type->end_io;
> ...
> when a page fault happens while accessing the tio->ti->type field.
>
> After a successful resync with the kernel's page table in
> do_page_fault->vmalloc_fault, I/O continues happily on, but with the IF flag
> re-enabled even though the faulting context's eflags register had IF clear.
> It happens with a random task every time.
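
For readers following the eflags reasoning: on x86, bit 9 (0x200) of EFLAGS
is the interrupt-enable flag (IF). The trace in this report logs
"eflags: 10002" for the faulting context; assuming that value is printed in
hexadecimal, bit 9 is clear, which agrees with the "irq: off" note next to
it. Below is a minimal C sketch of that check. The function names are
hypothetical and the code only models the reasoning; it is not the kernel's
implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 9 of the x86 EFLAGS register is the interrupt-enable flag (IF). */
#define EFLAGS_IF_BIT (UINT32_C(1) << 9)

/* True when a saved EFLAGS image says interrupts were disabled. */
bool eflags_irqs_disabled(uint32_t eflags)
{
    return (eflags & EFLAGS_IF_BIT) == 0;
}

/*
 * Hypothetical invariant check: after returning from an exception, the
 * interrupt state should match what the saved EFLAGS describes.
 * irqs_enabled_after is the state observed once the handler has returned.
 */
bool irq_state_preserved(uint32_t saved_eflags, bool irqs_enabled_after)
{
    return irqs_enabled_after == !eflags_irqs_disabled(saved_eflags);
}
```

With the traced value, eflags_irqs_disabled(0x10002) is true, so
irq_state_preserved(0x10002, true) is false: interrupts being enabled after
the fault is exactly the mismatch described above.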
>
> Here is ftrace call graph showing problematic place:
> ========================================================
> # tracer: function_graph
> #
> # function_graph latency trace v1.1.5 on 3.0.0+
> # --------------------------------------------------------------------
> # latency: 0 us, #42330/242738181, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:1)
> # -----------------
> # | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
> # -----------------
> #
> # _-----=> irqs-off
> # / _----=> need-resched
> # | / _---=> hardirq/softirq
> # || / _--=> preempt-depth
> # ||| /
> # CPU|||| DURATION FUNCTION CALLS
> # | |||| | | | | | |
> 0) d... | xen_evtchn_do_upcall() {
> 0) d... | irq_enter() {
> 0) d.h. 2.880 us | }
> 0) d.h. | __xen_evtchn_do_upcall() {
> 0) d.h. 0.099 us | irq_to_desc();
> 0) d.h. | handle_edge_irq() {
> 0) d.h. 0.107 us | _raw_spin_lock();
> 0) d.h. | ack_dynirq() {
> 0) d.h. 3.153 us | }
> 0) d.h. | handle_irq_event() {
> 0) d.h. | handle_irq_event_percpu() {
> 0) d.h. | blkif_interrupt() {
> 0) d.h. 0.110 us | _raw_spin_lock_irqsave();
> 0) d.h. | __blk_end_request_all() {
> 0) d.h. | blk_update_bidi_request() {
> 0) d.h. | blk_update_request() {
> 0) d.h. | req_bio_endio() {
> 0) d.h. | bio_endio() {
> 0) d.h. | endio() {
> 0) d.h. | bio_put() {
> 0) d.h. 4.149 us | }
> 0) d.h. | dec_count() {
> 0) d.h. | mempool_free() {
> 0) d.h. 1.395 us | }
> 0) d.h. | read_callback() {
> 0) d.h. | bio_endio() {
> 0) d.h. | clone_endio() {
> 0) d.h. | /* ==> enter clone_endio: tio: c1e14c70 */
> 0) d.h. 0.104 us | arch_irqs_disabled_flags();
> 0) d.h. | /* ==> clone_endio: endio = tio->ti->type->end_io: tio->ti c918c040 */
> 0) d.h. 0.100 us | arch_irqs_disabled_flags();
> 0) d.h. 0.117 us | mirror_end_io();
> 0) d.h. | free_tio() {
> 0) d.h. 2.269 us | }
> 0) d.h. | bio_put() {
> 0) d.h. 3.933 us | }
> 0) d.h. | dec_pending() {
> 0) d.h. 0.100 us | atomic_dec_and_test();
> 0) d.h. | end_io_acct() {
> 0) d.h. 5.655 us | }
> 0) d.h. | free_io() {
> 0) d.h. 1.992 us | }
> 0) d.h. 0.098 us | trace_block_bio_complete();
> 0) d.h. | bio_endio() {
> 0) d.h. | clone_endio() {
> 0) d.h. | /* ==> enter clone_endio: tio: c1e14ee0 */
> 0) d.h. 0.098 us | arch_irqs_disabled_flags();
> 0) d.h. | do_page_fault() {
> 0) d.h. 0.103 us | xen_read_cr2();
> 0) d.h. | /* dpf: tsk: c785a6a0 mm: 0 comm: kworker/0:0 */
> 0) d.h. | /* before vmalloc_fault (c9552044) regs: c786db1c ip: c082bb20 eflags: 10002 err: 0 irq: off */
> ^^^ - fault error code
> 0) d.h. | vmalloc_fault() {
> 0) d.h. 0.104 us | xen_read_cr3();
> 0) d.h. | xen_pgd_val();
> 0) d.h. | xen_pgd_val();
> 0) d.h. | xen_set_pmd();
> 0) d.h. | xen_pmd_val();
> 0) d.h.+ 14.599 us | }
> 0) d.h.+ 18.019 us | }
>    v -- irq enabled
> 0) ..h. | /* ==> clone_endio: endio = tio->ti->type->end_io: tio->ti c9552040 */
> 0) ..h. 0.102 us | arch_irqs_disabled_flags();
> 0) ..h. | /* <7>clone_endio BUG DETECTED irq */
> ========================================
>
> So the IF flag is restored right after exiting from do_page_fault().
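
The point where the trace shows this can also be located mechanically: the
first flag character in each function_graph line is the irqs-off column, and
the third is the hardirq/softirq column. The following is a rough C sketch
that scans trace lines for the first 'd' -> '.' change in the irqs-off
column while the 'h' marker stays set. The helper names
(parse_latency_flags, find_irq_reenable) are made up for illustration, and
the parsing assumes lines shaped like the ones quoted above, with an
optional "> " quote prefix.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Extract the four latency-format flag characters ("d.h.", "..h.", ...)
 * from one function_graph line such as "> 0) d.h. | clone_endio() {".
 * Returns 1 on success and stores the flags plus a terminating NUL;
 * returns 0 for header, annotation or otherwise unparsable lines.
 */
int parse_latency_flags(const char *line, char flags[5])
{
    const char *p = line;

    while (*p == '>' || *p == ' ')      /* skip mail quote prefix */
        p++;
    if (!isdigit((unsigned char)*p))    /* expect the CPU number */
        return 0;
    while (isdigit((unsigned char)*p))
        p++;
    if (*p != ')')
        return 0;
    p++;
    while (*p == ' ')
        p++;
    if (strlen(p) < 4)
        return 0;
    memcpy(flags, p, 4);
    flags[4] = '\0';
    return 1;
}

/*
 * Return the index of the first line whose irqs-off column is '.' while
 * the previous parsed line had 'd', with the hardirq marker 'h' set on
 * both. Returns -1 if no such transition is found.
 */
int find_irq_reenable(const char *const lines[], size_t n)
{
    char prev[5] = "";
    char cur[5];
    size_t i;

    for (i = 0; i < n; i++) {
        if (!parse_latency_flags(lines[i], cur))
            continue;
        if (prev[0] == 'd' && prev[2] == 'h' &&
            cur[0] == '.' && cur[2] == 'h')
            return (int)i;
        memcpy(prev, cur, sizeof(prev));
    }
    return -1;
}
```

Fed the lines of the trace above, find_irq_reenable() would be expected to
stop at the first "..h." line that follows the closing brace of
do_page_fault(). Lines that do not start with a CPU number, such as the
"irq enabled" annotation, are skipped.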
>
> Any thoughts on why this might happen?
>
> PS:
> Full logs, the additional trace patch, the kernel config and a way to
> reproduce the bug can be found at https://bugzilla.redhat.com/show_bug.cgi?id=707552
>