From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: David Vrabel <david.vrabel@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>,
Jan Beulich <jbeulich@suse.com>,
xen-devel@lists.xen.org
Subject: Re: [PATCH 13/13] xen/events: use the FIFO-based ABI if available
Date: Tue, 24 Sep 2013 10:04:18 -0700 (PDT) [thread overview]
Message-ID: <20130924170418.GA14139@phenom.dumpdata.com> (raw)
In-Reply-To: <5241C27A.2050900@cantab.net>
On Tue, Sep 24, 2013 at 05:48:58PM +0100, David Vrabel wrote:
> On 24/09/2013 16:50, Konrad Rzeszutek Wilk wrote:
> > On Fri, Sep 13, 2013 at 06:00:01PM +0100, David Vrabel wrote:
> >> +
> >> + error:
> >> + if (event_array_pages == 0)
> >> + panic("xen: unable to expand event array with initial page (%d)\n", ret);
> >
> > Shouldn't we just printk and return a negative value so we can use
> > the old style event channels?
>
> No, once you've called EVTCHNOP_init_control for at least one VCPU the
> hypervisor has switched ABIs and there is (deliberatly[*]) no mechanism
> to switch back.
>
> [*] it greatly simplifies the hypervisor ABI and implementation if we
> don't have to unwind a half initialized switch over. This failure case
> will never happen in practise as the hypervisor will run out of domheap
> pages for the guest memory before it runs out of global mapping space.
OK. And that is nicely mentioned in the design doc which I failed to spot
until now.
>
> >> + else
> >> + printk(KERN_ERR "xen: unable to expand event array (%d)\n", ret);
> >> + free_unused_array_pages();
> >> + return ret;
> >> +}
> >> +
> >> +static void fifo_bind_to_cpu(struct irq_info *info, int cpu)
> >> +{
> >> + /* no-op */
> >
> > Perhaps you can also say: "The event array is shared between all of the
> > guests VCPUs." (from design-E).
>
> That's not relevant here. This is a no-op because there is no guest
> side state necessary for tracking which VCPU an event channel is bound
> to -- this is all done by the per-VCPU queues.
Ah, gotcha.
>
> >> +static void fifo_unmask(int port)
> >> +{
> >> + unsigned int cpu = get_cpu();
> >> + bool do_hypercall = false;
> >> + bool evtchn_pending = false;
> >> +
> >> + BUG_ON(!irqs_disabled());
> >> +
> >> + if (unlikely((cpu != cpu_from_evtchn(port))))
> >> + do_hypercall = true;
> >
> > Since the event array is shared across all of the vCPUs is this still
> > needed?
>
> The shared event array isn't relevant (the 2-level ABI also has shared
> pending and mask arrays).
>
> But, since a hypercall is always required if an event is pending then we
> can omit the above check -- it's not possible (as in the 2-level case)
> to optimize out the hypercall if we're already on the VCPU.
>
> >> +static uint32_t clear_linked(volatile event_word_t *word)
> >> +{
> >> + event_word_t n, o, w;
> >
> > How about just 'new', 'old', 'temp' ?
>
> The letters match the design doc.
Since you have to modify the design document anyway would it make
sense to alter it as well to have less cryptic variables?
>
> >> +
> >> +static void fifo_handle_events(int cpu)
> >> +{
> >> + struct evtchn_fifo_control_block *control_block;
> >> + uint32_t ready;
> >> + int q;
> >> +
> >> + control_block = per_cpu(cpu_control_block, cpu);
> >> +
> >> + ready = xchg(&control_block->ready, 0);
> >> +
> >> + while (ready & EVTCHN_FIFO_READY_MASK) {
> >> + q = find_first_bit(BM(&ready), EVTCHN_FIFO_MAX_QUEUES);
> >> + consume_one_event(cpu, control_block, q, &ready);
> >> + ready |= xchg(&control_block->ready, 0);
> >
> > Say priority 0 is where VIRQ_TIMER is set and the TIMER is set to
> > be incredibly short.
> >
> > Couldn't this cause a scenario where the guest clears the ready in
> > consume_one_event (clear_bit(0, BM(ready)) and the hypervisor
> > at the same time can set the bit. We then process the IRQ (virq_timer).
> >
> > Then we get to the xchg here and end up with the ready having the
> > priority 0 bit set again.
> >
> > And then loop again and again..
>
> Don't do that then. This is no different to the behaviour of hardware
> interrupts.
I was thinking there might be some mitigation techinque - and there is.
The Linux kernel would disable the IRQ, which hopefully means that this
defective port ends up being masked.
>
> >> + ret = HYPERVISOR_event_channel_op(EVTCHNOP_init_control,
> >> + &init_control);
> >> + if (ret < 0)
> >> + BUG();
> >
> > So if this is run after migration to an older hypevisor won't we
> > blow up here? Shouldn't we just exit with an return and use the
> > old style classis events?
>
> Migrating to older hypervisors has never been supported. You can only
> go forwards.
OK.
>
> >> +static int __cpuinit fifo_cpu_notification(struct notifier_block *self,
> >> + unsigned long action, void *hcpu)
> >> +{
> >> + int cpu = (long)hcpu;
> >> + int ret = 0;
> >> +
> >> + switch (action) {
> >> + case CPU_UP_PREPARE:
> >> + if (!per_cpu(cpu_control_block, cpu))
> >> + ret = fifo_init_control_block(cpu);
> >> + break;
> >> + default:
> >> + break;
> >> + }
> >> + return ret < 0 ? NOTIFY_BAD : NOTIFY_OK;
> >
> > I think you should return NOTIFY_OK.
> >
> > Say you do this:
> >
> > 1) Launch guest on a new hypervisor (which has FIFO ABI)
> > 2) Bring down all CPUs but one.
> > 3) Migrate to an older hypervisor (no FIFI ABI)
> > 4) Bring up all of the CPUs.
> >
> > We shouldn't return NOTIFY_BAD b/c the hypervisor we are resuming
> > on is too old. We should just continue on without the FIFO mechanism
> > in place.
>
> Again, migrating to an older hypervisor has never been supported. Also,
> once we have switched to the FIFO-based ABI for one VCPU we cannot go
> back and it's better to fail to bring up the VCPU than have one that
> cannot receive events.
OK.
>
> David
next prev parent reply other threads:[~2013-09-24 17:04 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-13 16:59 (no subject) David Vrabel
2013-09-13 16:59 ` [PATCH 01/13] xen/events: refactor retrigger_dynirq() and resend_irq_on_evtchn() David Vrabel
2013-09-24 14:37 ` Konrad Rzeszutek Wilk
2013-09-24 16:04 ` David Vrabel
2013-09-13 16:59 ` [PATCH 02/13] xen/events: remove unnecessary init_evtchn_cpu_bindings() David Vrabel
2013-09-24 14:40 ` Konrad Rzeszutek Wilk
2013-09-24 16:06 ` David Vrabel
2013-09-24 16:49 ` Konrad Rzeszutek Wilk
2013-09-13 16:59 ` [PATCH 03/13] xen/events: introduce test_and_set_mask David Vrabel
2013-09-24 14:40 ` Konrad Rzeszutek Wilk
2013-09-13 16:59 ` [PATCH 04/13] xen/events: replace raw bit ops with functions David Vrabel
2013-09-24 14:41 ` Konrad Rzeszutek Wilk
2013-09-13 16:59 ` [PATCH 05/13] xen/events: move drivers/xen/events.c into drivers/xen/events/ David Vrabel
2013-09-13 16:59 ` [PATCH 06/13] xen/events: move 2-level specific code into its own file David Vrabel
2013-09-13 16:59 ` [PATCH 07/13] xen/events: add struct evtchn_ops for the low-level port operations David Vrabel
2013-09-24 14:44 ` Konrad Rzeszutek Wilk
2013-09-13 16:59 ` [PATCH 08/13] xen/events: allow setup of irq_info to fail David Vrabel
2013-09-24 14:47 ` Konrad Rzeszutek Wilk
2013-09-13 16:59 ` [PATCH 09/13] xen/events: add a evtchn_op for port setup David Vrabel
2013-09-24 14:47 ` Konrad Rzeszutek Wilk
2013-09-13 16:59 ` [PATCH 10/13] xen/events: Refactor evtchn_to_irq array to be dynamically allocated David Vrabel
2013-09-24 14:58 ` Konrad Rzeszutek Wilk
2013-09-26 10:53 ` David Vrabel
2013-09-26 12:51 ` Konrad Rzeszutek Wilk
2013-09-13 16:59 ` [PATCH 11/13] xen/events: add xen_evtchn_mask_all() David Vrabel
2013-09-24 14:58 ` Konrad Rzeszutek Wilk
2013-09-13 17:00 ` [PATCH 12/13] xen/events: Add the hypervisor interface for the FIFO-based event channels David Vrabel
2013-09-24 15:06 ` Konrad Rzeszutek Wilk
2013-09-26 11:06 ` David Vrabel
2013-09-24 15:08 ` Konrad Rzeszutek Wilk
2013-09-24 16:11 ` David Vrabel
2013-09-24 16:51 ` Konrad Rzeszutek Wilk
2013-09-13 17:00 ` [PATCH 13/13] xen/events: use the FIFO-based ABI if available David Vrabel
2013-09-24 15:50 ` Konrad Rzeszutek Wilk
2013-09-24 16:48 ` David Vrabel
2013-09-24 17:04 ` Konrad Rzeszutek Wilk [this message]
2013-09-13 17:03 ` [RFC PATCHv3 00/12] Linux: FIFO-based event channel ABI David Vrabel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130924170418.GA14139@phenom.dumpdata.com \
--to=konrad.wilk@oracle.com \
--cc=boris.ostrovsky@oracle.com \
--cc=david.vrabel@citrix.com \
--cc=jbeulich@suse.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).