All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paul Durrant <xadimgnik@gmail.com>
To: "'Jan Beulich'" <jbeulich@suse.com>
Cc: "'Andrew Cooper'" <andrew.cooper3@citrix.com>,
	"'Marek Marczykowski-Górecki'" <marmarek@invisiblethingslab.com>,
	'xen-devel' <xen-devel@lists.xenproject.org>
Subject: RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
Date: Fri, 5 Jun 2020 14:37:39 +0100	[thread overview]
Message-ID: <002001d63b3e$7c268a40$74739ec0$@xen.org> (raw)
In-Reply-To: <fe275c12-9bea-8733-dbdc-b225bf15fea3@suse.com>

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 05 June 2020 14:32
> To: paul@xen.org
> Cc: 'Marek Marczykowski-Górecki' <marmarek@invisiblethingslab.com>; 'Andrew Cooper'
> <andrew.cooper3@citrix.com>; 'xen-devel' <xen-devel@lists.xenproject.org>
> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> 
> On 05.06.2020 13:05, Paul Durrant wrote:
> > Sorry, only just catching up with this...
> >
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 05 June 2020 10:09
> >> To: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> >> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; xen-devel <xen-devel@lists.xenproject.org>; Paul
> >> Durrant <paul@xen.org>
> >> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> >>
> >> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
> >>> On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
> >>>> On 04.06.2020 13:13, Andrew Cooper wrote:
> >>>>> On 04/06/2020 08:08, Jan Beulich wrote:
> >>>>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
> >>>>>>> Then, we get the main issue:
> >>>>>>>
> >>>>>>>     (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>>>>>>     (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
> >>>>>>>     (XEN) domain_crash called from io.c:178
> >>>>>>>
> >>>>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
> >>>>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
> >>>>>> I'd guess an issue with the shutdown deferral logic. Did you / can
> >>>>>> you check whether XEN_DMOP_remote_shutdown managed to pause all
> >>>>>> CPUs (I assume it didn't, since once they're paused there shouldn't
> >>>>>> be any I/O there anymore, and hence no I/O emulation)?
> >>>>>
> >>>>> The vcpu in question is talking to Qemu, so will have v->defer_shutdown
> >>>>> intermittently set, and skip the pause in domain_shutdown()
> >>>>>
> >>>>> I presume this lack of pause is to allow the vcpu in question to still
> >>>>> be scheduled to consume the IOREQ reply?  (Its fairly opaque logic with
> >>>>> 0 clarifying details).
> >>>>>
> >>>>> What *should* happen is that, after consuming the reply, the vcpu should
> >>>>> notice and pause itself, at which point it would yield to the
> >>>>> scheduler.  This is the purpose of vcpu_{start,end}_shutdown_deferral().
> >>>>>
> >>>>> Evidentially, this is not happening.
> >>>>
> >>>> We can't tell yet, until ...
> >>>>
> >>>>> Marek: can you add a BUG() after the weird PIO printing?  That should
> >>>>> confirm whether we're getting into handle_pio() via the
> >>>>> handle_hvm_io_completion() path, or via the vmexit path (at which case,
> >>>>> we're fully re-entering the guest).
> >>>>
> >>>> ... we know this. handle_pio() gets called from handle_hvm_io_completion()
> >>>> after having called hvm_wait_for_io() -> hvm_io_assist() ->
> >>>> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call
> >>>> handle_pio() (etc) at all anymore in this state. IOW perhaps
> >>>> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
> >>>> instead of plain "true"?
> >>>>
> >>>> Adding Paul to Cc, as being the maintainer here.
> >>>
> >>> Got it, by sticking BUG() just before that domain_crash() in
> >>> handle_pio(). And also vcpu 0 of both HVM domains do have
> >>> v->defer_shutdown.
> >>
> >> As per the log they did get it set. I'd be curious of the flag's
> >> value (as well as v->paused_for_shutdown's) at the point of the
> >> problematic handle_pio() invocation (see below). It may be
> >> worthwhile to instrument vcpu_check_shutdown() (inside its if())
> >> - before exiting to guest context (in order to then come back
> >> and call handle_pio()) the vCPU ought to be getting through
> >> there. No indication of it doing so would be a sign that there's
> >> a code path bypassing the call to vcpu_end_shutdown_deferral().
> >>
> >>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> >>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
> >>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
> >>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> >>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> >>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> >>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> >>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> >>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> >>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
> >>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
> >>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> >>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> >>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> >>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
> >>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
> >>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> >>
> >> Perhaps in this message could you also log
> >> v->domain->is_shutting_down, v->defer_shutdown, and
> >> v->paused_for_shutdown? (Would be nice if, after having made
> >> changes to your debugging patch, you could point again at the
> >> precise version you've used for the log provided.)
> >>
> >>> (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff
> >>> (XEN) Xen BUG at io.c:178
> >>
> >> Btw, instead of BUG(), WARN() or dump_execution_state() would
> >> likely also do, keeping Xen alive.
> >>
> >
> > A shutdown deferral problem would result in X86EMUL_RETRY wouldn't it?
> 
> Where would this originate?

I was referring to the 'if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )' at the top of hvm_send_ioreq().

> 
> > That would mean we wouldn't be seeing the "Unexpected PIO" message. From that message this clearly
> X86EMUL_UNHANDLEABLE which suggests a race with ioreq server teardown, possibly due to selecting a
> server but then not finding a vcpu match in ioreq_vcpu_list.
> 
> I was suspecting such, but at least the tearing down of all servers
> happens only from relinquish-resources, which gets started only
> after ->is_shut_down got set (unless the tool stack invoked
> XEN_DOMCTL_destroydomain without having observed XEN_DOMINF_shutdown
> set for the domain).
> 
> For individually unregistered servers - yes, if qemu did so, this
> would be a problem. They need to remain registered until all vCPU-s
> in the domain got paused.

It shouldn't be a problem should it? Destroying an individual server is only done with the domain paused, so no vcpus can be running at the time.

  Paul

> 
> Jan



  reply	other threads:[~2020-06-05 13:37 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-04  1:46 handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom Marek Marczykowski-Górecki
2020-06-04  7:08 ` Jan Beulich
2020-06-04  7:17   ` Jan Beulich
2020-06-04 11:13   ` Andrew Cooper
2020-06-04 12:36     ` Jan Beulich
2020-06-04 14:25       ` Marek Marczykowski-Górecki
2020-06-05  9:09         ` Jan Beulich
2020-06-05  9:22           ` Jan Beulich
2020-06-05 12:01             ` Marek Marczykowski-Górecki
2020-06-05 12:39               ` Paul Durrant
2020-06-05 13:04                 ` 'Marek Marczykowski-Górecki'
2020-06-05 13:24                   ` Paul Durrant
2020-06-05 12:44               ` Andrew Cooper
2020-06-05 14:13               ` Jan Beulich
2020-06-05 11:05           ` Paul Durrant
2020-06-05 11:25             ` Paul Durrant
2020-06-05 13:36               ` Jan Beulich
2020-06-05 13:43                 ` Paul Durrant
2020-06-05 13:46                   ` Jan Beulich
2020-06-05 13:49                     ` Paul Durrant
2020-06-05 15:48                       ` Paul Durrant
2020-06-05 17:13                         ` 'Marek Marczykowski-Górecki'
2020-06-05 17:24                           ` Paul Durrant
2020-06-05 20:43                             ` 'Marek Marczykowski-Górecki'
2020-06-07 13:32                               ` Paul Durrant
2020-06-05 13:32             ` Jan Beulich
2020-06-05 13:37               ` Paul Durrant [this message]
2020-06-05 13:57                 ` Jan Beulich
2020-06-05 15:39                   ` Paul Durrant
2020-06-05 16:18                     ` 'Marek Marczykowski-Górecki'
2020-06-08  8:13                       ` Jan Beulich
2020-06-08  9:15                         ` Paul Durrant
2020-06-08  9:24                           ` Jürgen Groß
2020-06-08 12:38                             ` Paul Durrant
2020-06-08 12:46                               ` Jan Beulich
2020-06-08 12:54                                 ` Paul Durrant
2020-06-05  9:38 ` Jan Beulich
2020-06-05 11:18   ` Marek Marczykowski-Górecki
2020-06-05 13:59     ` Jan Beulich
2020-06-05 15:10       ` Paul Durrant

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='002001d63b3e$7c268a40$74739ec0$@xen.org' \
    --to=xadimgnik@gmail.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=marmarek@invisiblethingslab.com \
    --cc=paul@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.