qemu-devel.nongnu.org archive mirror
From: Marc Zyngier <marc.zyngier@arm.com>
To: Laszlo Ersek <lersek@redhat.com>
Cc: "kwolf@redhat.com" <kwolf@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Richard W.M. Jones" <rjones@redhat.com>,
	"stefanha@redhat.com" <stefanha@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH v2 0/3] AioContext: ctx->dispatching is dead, all hail ctx->notify_me
Date: Fri, 17 Jul 2015 14:48:40 +0100	[thread overview]
Message-ID: <20150717144840.6dfafc3f@arm.com> (raw)
In-Reply-To: <55A905AB.4000603@redhat.com>

On Fri, 17 Jul 2015 14:39:55 +0100
Laszlo Ersek <lersek@redhat.com> wrote:

> On 07/17/15 15:28, Marc Zyngier wrote:
> > On Fri, 17 Jul 2015 10:30:38 +0100
> > Paolo Bonzini <pbonzini@redhat.com> wrote:
> > 
> >>
> >>
> >> On 17/07/2015 06:44, Paolo Bonzini wrote:
> >>>
> >>>
> >>> On 16/07/2015 21:05, Richard W.M. Jones wrote:
> >>>>
> >>>> Sorry to spoil things, but I'm still seeing this bug, although it is
> >>>> now a lot less frequent with your patch.  I would estimate it happens
> >>>> more often than 1 in 5 runs with qemu.git, and probably 1 in 200 runs
> >>>> with qemu.git + the v2 patch series.
> >>>>
> >>>> It's the exact same hang in both cases.
> >>>>
> >>>> Is it possible that this patch doesn't completely close any race?
> >>>>
> >>>> Still, it is an improvement, so there is that.
> >>>
> >>> Would seem at first glance like a different bug.
> >>>
> >>> Interestingly, adding some "tracing" (qemu_clock_get_ns) makes the bug
> >>> more likely: now it reproduces in about 10 tries.  Of course :) adding
> >>> other kinds of tracing instead makes it go away again (>50 tries).
> >>>
> >>> Perhaps this:
> >>>
> >>>    i/o thread         vcpu thread                   worker thread
> >>>    ---------------------------------------------------------------------
> >>>    lock_iothread
> >>>    notify_me = 1
> >>>    ...
> >>>    unlock_iothread
> >>>                       lock_iothread
> >>>                       notify_me = 3
> >>>                       ppoll
> >>>                       notify_me = 1
> >>>                                                      bh->scheduled = 1
> >>>                                                      event_notifier_set
> >>>                       event_notifier_test_and_clear
> >>>    ppoll
> >>>     ^^ hang
> >>> 	
> >>> In the exact shape above, it doesn't seem too likely to happen, but
> >>> perhaps there's another simpler case.  Still, the bug exists.
> >>>
> >>> The above is not really related to notify_me.  Here the notification is
> >>> not being optimized away!  So I wonder if this one has been there forever.
> >>>
> >>> Fam suggested putting the event_notifier_test_and_clear before
> >>> aio_bh_poll(), but it does not work.  I'll look more closely.
> >>>
> >>> However, an unconditional event_notifier_test_and_clear is pretty
> >>> expensive.  On one hand, obviously correctness comes first.  On the
> >>> other hand, an expensive operation at the wrong place can mask the race
> >>> very easily; I'll let the fix run for a while, but I'm not sure if a
> >>> successful test really says anything useful.
> >>
> >> So it may not be useful, but still, a successful test is successful. :)
> >> The following patch, which also includes the delta between v2 and v3
> >> of this series, survived 674 reboots before hitting a definitely
> >> unrelated problem:
> >>
> >> error: kvm run failed Function not implemented
> >> PC=00000000bf671210  SP=00000000c00001f0
> >> X00=000000000a003e70 X01=0000000000000000 X02=00000000bf680548 X03=0000000000000030
> >> X04=00000000bbb5fc18 X05=00000000004b7770 X06=00000000bf721930 X07=000000000000009a
> >> X08=00000000bf716858 X09=0000000000000090 X10=0000000000000000 X11=0000000000000046
> >> X12=00000000a007e03a X13=0000000000000000 X14=0000000000000000 X15=0000000000000000
> >> X16=00000000bf716df0 X17=0000000000000000 X18=0000000000000000 X19=00000000bf6f5f18
> >> X20=0000000000000000 X21=0000000000000000 X22=0000000000000000 X23=0000000000000000
> >> X24=0000000000000000 X25=0000000000000000 X26=0000000000000000 X27=0000000000000000
> >> X28=0000000000000000 X29=0000000000000000 X30=0000000000000000 PSTATE=60000305 (flags -ZC-)
> >>
> >> For the record, this is the kvm_run struct:
> >>
> >> $6 = {request_interrupt_window = 0 '\000', padding1 = "\000\000\000\000\000\000", exit_reason = 0, 
> >>   ready_for_interrupt_injection = 0 '\000', if_flag = 0 '\000', flags = 0, cr8 = 0, apic_base = 0, {hw = {
> >>       hardware_exit_reason = 150994968}, fail_entry = {hardware_entry_failure_reason = 150994968}, ex = {
> >>       exception = 150994968, error_code = 0}, io = {direction = 24 '\030', size = 0 '\000', port = 2304, 
> >>       count = 0, data_offset = 144}, debug = {arch = {<No data fields>}}, mmio = {phys_addr = 150994968, 
> >>       data = "\220\000\000\000\000\000\000", len = 4, is_write = 0 '\000'}, hypercall = {nr = 150994968, 
> >>       args = {144, 4, 0, 0, 0, 0}, ret = 0, longmode = 0, pad = 0}, tpr_access = {rip = 150994968, 
> >>       is_write = 144, pad = 0}, s390_sieic = {icptcode = 24 '\030', ipa = 2304, ipb = 0}, 
> >>     s390_reset_flags = 150994968, s390_ucontrol = {trans_exc_code = 150994968, pgm_code = 144}, dcr = {
> >>       dcrn = 150994968, data = 0, is_write = 144 '\220'}, internal = {suberror = 150994968, ndata = 0, 
> >>       data = {144, 4, 0 <repeats 14 times>}}, osi = {gprs = {150994968, 144, 4, 0 <repeats 29 times>}}, 
> >>     papr_hcall = {nr = 150994968, ret = 144, args = {4, 0, 0, 0, 0, 0, 0, 0, 0}}, s390_tsch = {
> >>       subchannel_id = 24, subchannel_nr = 2304, io_int_parm = 0, io_int_word = 144, ipb = 0, 
> >>       dequeued = 4 '\004'}, epr = {epr = 150994968}, system_event = {type = 150994968, flags = 144}, 
> >>     s390_stsi = {addr = 150994968, ar = 144 '\220', reserved = 0 '\000', fc = 0 '\000', sel1 = 0 '\000', 
> >>       sel2 = 0}, 
> >>     padding = "\030\000\000\t\000\000\000\000\220\000\000\000\000\000\000\000\004", '\000' <repeats 238 times>}, kvm_valid_regs = 0, kvm_dirty_regs = 0, s = {regs = {<No data fields>}, 
> >>     padding = '\000' <repeats 2047 times>}}
> >>
> >> Marc, does it ring any bell?
> > 
> > Well, this is an example of a guest accessing non-memory using an
> > instruction that we cannot safely emulate - not an IO accessor (load
> > multiple, for example).
> > 
> > In this case, we kill the guest (we could as well inject a fault).
> > 
> > This vcpu seems to be accessing 0x9000018 (the mmio structure points
> > there), but I can't immediately relate it to the content of the
> > registers.
> 
>     [VIRT_UART] =               { 0x09000000, 0x00001000 },
> 

Still: there is nothing in the registers that remotely points to that
area. X0 is the closest, but it'd take a big negative offset to get
there.

Is that a Linux kernel, or something else?

	M.
-- 
Jazz is not dead. It just smells funny.


Thread overview: 28+ messages
2015-07-16  9:56 [Qemu-devel] [PATCH v2 0/3] AioContext: ctx->dispatching is dead, all hail ctx->notify_me Paolo Bonzini
2015-07-16  9:56 ` [Qemu-devel] [PATCH v2 1/3] tests: remove irrelevant assertions from test-aio Paolo Bonzini
2015-07-16  9:56 ` [Qemu-devel] [PATCH v2 2/3] aio-win32: reorganize polling loop Paolo Bonzini
2015-07-16  9:56 ` [Qemu-devel] [PATCH v2 3/3] AioContext: fix broken ctx->dispatching optimization Paolo Bonzini
2015-07-17  2:25   ` Fam Zheng
2015-07-17  2:27     ` Paolo Bonzini
2015-07-17  4:17   ` Paolo Bonzini
2015-07-17  8:39     ` Stefan Hajnoczi
2015-07-16 11:07 ` [Qemu-devel] [PATCH v2 0/3] AioContext: ctx->dispatching is dead, all hail ctx->notify_me Kevin Wolf
2015-07-16 12:44 ` Richard W.M. Jones
2015-07-16 19:05 ` Richard W.M. Jones
2015-07-16 22:06   ` Paolo Bonzini
2015-07-17  0:17     ` Paolo Bonzini
2015-07-17  4:44   ` Paolo Bonzini
2015-07-17  9:30     ` Paolo Bonzini
2015-07-17 12:58       ` Richard W.M. Jones
2015-07-17 13:02         ` Paolo Bonzini
2015-07-17 13:28       ` Marc Zyngier
2015-07-17 13:39         ` Laszlo Ersek
2015-07-17 13:48           ` Marc Zyngier [this message]
2015-07-17 13:53             ` Richard W.M. Jones
2015-07-17 14:03               ` Marc Zyngier
2015-07-17 13:57             ` Laszlo Ersek
2015-07-17 14:04         ` Paolo Bonzini
2015-07-17 14:18           ` Marc Zyngier
  -- strict thread matches above, loose matches on Subject: below --
2015-07-18 20:21 Paolo Bonzini
2015-07-19 10:08 ` Richard W.M. Jones
2015-07-20 16:17 ` Stefan Hajnoczi
