From: Marc Zyngier <marc.zyngier@arm.com>
To: Laszlo Ersek <lersek@redhat.com>
Cc: "kwolf@redhat.com" <kwolf@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Richard W.M. Jones" <rjones@redhat.com>,
	"stefanha@redhat.com" <stefanha@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH v2 0/3] AioContext: ctx->dispatching is dead, all hail ctx->notify_me
Date: Fri, 17 Jul 2015 14:48:40 +0100
Message-ID: <20150717144840.6dfafc3f@arm.com>
In-Reply-To: <55A905AB.4000603@redhat.com>

On Fri, 17 Jul 2015 14:39:55 +0100
Laszlo Ersek <lersek@redhat.com> wrote:

> On 07/17/15 15:28, Marc Zyngier wrote:
> > On Fri, 17 Jul 2015 10:30:38 +0100
> > Paolo Bonzini <pbonzini@redhat.com> wrote:
> > 
> >>
> >>
> >> On 17/07/2015 06:44, Paolo Bonzini wrote:
> >>>
> >>>
> >>> On 16/07/2015 21:05, Richard W.M. Jones wrote:
> >>>>
> >>>> Sorry to spoil things, but I'm still seeing this bug, although it is
> >>>> now a lot less frequent with your patch.  I would estimate it happens
> >>>> more often than 1 in 5 runs with qemu.git, and probably 1 in 200 runs
> >>>> with qemu.git + the v2 patch series.
> >>>>
> >>>> It's the exact same hang in both cases.
> >>>>
> >>>> Is it possible that this patch doesn't completely close the race?
> >>>>
> >>>> Still, it is an improvement, so there is that.
> >>>
> >>> Would seem at first glance like a different bug.
> >>>
> >>> Interestingly, adding some "tracing" (qemu_clock_get_ns) makes the bug
> >>> more likely: now it reproduces in about 10 tries.  Of course :) adding
> >>> other kinds of tracing instead makes it go away again (>50 tries).
> >>>
> >>> Perhaps this:
> >>>
> >>>    i/o thread         vcpu thread                   worker thread
> >>>    ---------------------------------------------------------------------
> >>>    lock_iothread
> >>>    notify_me = 1
> >>>    ...
> >>>    unlock_iothread
> >>>                       lock_iothread
> >>>                       notify_me = 3
> >>>                       ppoll
> >>>                       notify_me = 1
> >>>                                                      bh->scheduled = 1
> >>>                                                      event_notifier_set
> >>>                       event_notifier_test_and_clear
> >>>    ppoll
> >>>     ^^ hang
> >>>
> >>> In the exact shape above, it doesn't seem too likely to happen, but
> >>> perhaps there's another simpler case.  Still, the bug exists.
> >>>
> >>> The above is not really related to notify_me.  Here the notification is
> >>> not being optimized away!  So I wonder if this one has been there forever.
> >>>
> >>> Fam suggested putting the event_notifier_test_and_clear before
> >>> aio_bh_poll(), but it does not work.  I'll look more closely.
> >>>
> >>> However, an unconditional event_notifier_test_and_clear is pretty
> >>> expensive.  On one hand, obviously correctness comes first.  On the
> >>> other hand, an expensive operation at the wrong place can mask the race
> >>> very easily; I'll let the fix run for a while, but I'm not sure if a
> >>> successful test really says anything useful.
> >>
> >> So it may not be useful, but still, a successful test is successful. :)
> >> The following patch, which also includes the delta between v2 and v3
> >> of this series, survived 674 reboots before hitting a definitely
> >> unrelated problem:
> >>
> >> error: kvm run failed Function not implemented
> >> PC=00000000bf671210  SP=00000000c00001f0
> >> X00=000000000a003e70 X01=0000000000000000 X02=00000000bf680548 X03=0000000000000030
> >> X04=00000000bbb5fc18 X05=00000000004b7770 X06=00000000bf721930 X07=000000000000009a
> >> X08=00000000bf716858 X09=0000000000000090 X10=0000000000000000 X11=0000000000000046
> >> X12=00000000a007e03a X13=0000000000000000 X14=0000000000000000 X15=0000000000000000
> >> X16=00000000bf716df0 X17=0000000000000000 X18=0000000000000000 X19=00000000bf6f5f18
> >> X20=0000000000000000 X21=0000000000000000 X22=0000000000000000 X23=0000000000000000
> >> X24=0000000000000000 X25=0000000000000000 X26=0000000000000000 X27=0000000000000000
> >> X28=0000000000000000 X29=0000000000000000 X30=0000000000000000 PSTATE=60000305 (flags -ZC-)
> >>
> >> For the record, this is the kvm_run struct:
> >>
> >> $6 = {request_interrupt_window = 0 '\000', padding1 = "\000\000\000\000\000\000", exit_reason = 0, 
> >>   ready_for_interrupt_injection = 0 '\000', if_flag = 0 '\000', flags = 0, cr8 = 0, apic_base = 0, {hw = {
> >>       hardware_exit_reason = 150994968}, fail_entry = {hardware_entry_failure_reason = 150994968}, ex = {
> >>       exception = 150994968, error_code = 0}, io = {direction = 24 '\030', size = 0 '\000', port = 2304, 
> >>       count = 0, data_offset = 144}, debug = {arch = {<No data fields>}}, mmio = {phys_addr = 150994968, 
> >>       data = "\220\000\000\000\000\000\000", len = 4, is_write = 0 '\000'}, hypercall = {nr = 150994968, 
> >>       args = {144, 4, 0, 0, 0, 0}, ret = 0, longmode = 0, pad = 0}, tpr_access = {rip = 150994968, 
> >>       is_write = 144, pad = 0}, s390_sieic = {icptcode = 24 '\030', ipa = 2304, ipb = 0}, 
> >>     s390_reset_flags = 150994968, s390_ucontrol = {trans_exc_code = 150994968, pgm_code = 144}, dcr = {
> >>       dcrn = 150994968, data = 0, is_write = 144 '\220'}, internal = {suberror = 150994968, ndata = 0, 
> >>       data = {144, 4, 0 <repeats 14 times>}}, osi = {gprs = {150994968, 144, 4, 0 <repeats 29 times>}}, 
> >>     papr_hcall = {nr = 150994968, ret = 144, args = {4, 0, 0, 0, 0, 0, 0, 0, 0}}, s390_tsch = {
> >>       subchannel_id = 24, subchannel_nr = 2304, io_int_parm = 0, io_int_word = 144, ipb = 0, 
> >>       dequeued = 4 '\004'}, epr = {epr = 150994968}, system_event = {type = 150994968, flags = 144}, 
> >>     s390_stsi = {addr = 150994968, ar = 144 '\220', reserved = 0 '\000', fc = 0 '\000', sel1 = 0 '\000', 
> >>       sel2 = 0}, 
> >>     padding = "\030\000\000\t\000\000\000\000\220\000\000\000\000\000\000\000\004", '\000' <repeats 238 times>}, kvm_valid_regs = 0, kvm_dirty_regs = 0, s = {regs = {<No data fields>}, 
> >>     padding = '\000' <repeats 2047 times>}}
> >>
> >> Marc, does it ring any bell?
> > 
> > Well, this is an example of a guest accessing non-memory using an
> > instruction that we cannot safely emulate, i.e. one that is not a
> > plain IO accessor (load multiple, for example).
> > 
> > In this case, we kill the guest (we could as well inject a fault).
> > 
> > This vcpu seems to be accessing 0x9000018 (the mmio structure points
> > there), but I can't immediately relate it to the content of the
> > registers.
> 
>     [VIRT_UART] =               { 0x09000000, 0x00001000 },
> 

Still: there is nothing in the registers that remotely points to that
area. X0 is the closest, but it'd take a big negative offset to get
there.
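
To spell out the arithmetic behind the two observations above, here is a
minimal sketch. The struct and helper names are hypothetical (this is not
QEMU or KVM code); the only inputs taken from this thread are the VIRT_UART
base/size, the mmio phys_addr (150994968, i.e. 0x09000018) and X00 from the
register dump. It checks that the faulting address lies inside the UART
window, at offset 0x18, and that reaching it from X00 would need an offset
of about -16 MiB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: one { base, size } entry of a memory map. */
struct mem_region {
    uint64_t base;
    uint64_t size;
};

/* Values copied from the quoted VIRT_UART entry. */
static const struct mem_region virt_uart = { 0x09000000, 0x00001000 };

/* True if addr falls inside [base, base + size). */
static bool region_contains(const struct mem_region *r, uint64_t addr)
{
    return addr >= r->base && addr - r->base < r->size;
}

/* Offset of addr from the start of the region (caller checks containment). */
static uint64_t region_offset(const struct mem_region *r, uint64_t addr)
{
    return addr - r->base;
}

/* Signed displacement needed to reach addr starting from a register value. */
static int64_t displacement_from(uint64_t reg, uint64_t addr)
{
    return (int64_t)(addr - reg);
}
```

With phys_addr = 150994968 this gives an offset of 0x18 into the UART
window, and displacement_from(0x0a003e70, 0x09000018) is -0x01003e58. If
the UART is a PL011 (an assumption here), offset 0x18 would be its flag
register.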

Is that a Linux kernel, or something else?

	M.
-- 
Jazz is not dead. It just smells funny.
