Re: [PATCH 13/13] xen/events: use the FIFO-based ABI if available

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: David Vrabel <david.vrabel@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Jan Beulich <jbeulich@suse.com>,
	xen-devel@lists.xen.org
Subject: Re: [PATCH 13/13] xen/events: use the FIFO-based ABI if available
Date: Tue, 24 Sep 2013 10:04:18 -0700 (PDT)	[thread overview]
Message-ID: <20130924170418.GA14139@phenom.dumpdata.com> (raw)
In-Reply-To: <5241C27A.2050900@cantab.net>

On Tue, Sep 24, 2013 at 05:48:58PM +0100, David Vrabel wrote:
> On 24/09/2013 16:50, Konrad Rzeszutek Wilk wrote:
> > On Fri, Sep 13, 2013 at 06:00:01PM +0100, David Vrabel wrote:
> >> +
> >> +  error:
> >> +	if (event_array_pages == 0)
> >> +		panic("xen: unable to expand event array with initial page (%d)\n", ret);
> > 
> > Shouldn't we just printk and return a negative value so we can use
> > the old style event channels?
> 
> No, once you've called EVTCHNOP_init_control for at least one VCPU the
> hypervisor has switched ABIs and there is (deliberatly[*]) no mechanism
> to switch back.
> 
> [*] it greatly simplifies the hypervisor ABI and implementation if we
> don't have to unwind a half initialized switch over.  This failure case
> will never happen in practise as the hypervisor will run out of domheap
> pages for the guest memory before it runs out of global mapping space.

OK. And that is nicely mentioned in the design doc which I failed to spot
until now.
> 
> >> +	else
> >> +		printk(KERN_ERR "xen: unable to expand event array (%d)\n", ret);
> >> +	free_unused_array_pages();
> >> +	return ret;
> >> +}
> >> +
> >> +static void fifo_bind_to_cpu(struct irq_info *info, int cpu)
> >> +{
> >> +	/* no-op */
> > 
> > Perhaps you can also say: "The event array is shared between all of the
> > guests VCPUs." (from design-E).
> 
> That's not relevant here.  This is a no-op because there is no guest
> side state necessary for tracking which VCPU an event channel is bound
> to -- this is all done by the per-VCPU queues.

Ah, gotcha.
> 
> >> +static void fifo_unmask(int port)
> >> +{
> >> +	unsigned int cpu = get_cpu();
> >> +	bool do_hypercall = false;
> >> +	bool evtchn_pending = false;
> >> +
> >> +	BUG_ON(!irqs_disabled());
> >> +
> >> +	if (unlikely((cpu != cpu_from_evtchn(port))))
> >> +		do_hypercall = true;
> > 
> > Since the event array is shared across all of the vCPUs is this still
> > needed?
> 
> The shared event array isn't relevant (the 2-level ABI also has shared
> pending and mask arrays).
> 
> But, since a hypercall is always required if an event is pending then we
> can omit the above check -- it's not possible (as in the 2-level case)
> to optimize out the hypercall if we're already on the VCPU.
> 
> >> +static uint32_t clear_linked(volatile event_word_t *word)
> >> +{
> >> +    event_word_t n, o, w;
> > 
> > How about just 'new', 'old', 'temp' ?
> 
> The letters match the design doc.

Since you have to modify the design document anyway would it make
sense to alter it as well to have less cryptic variables?

> 
> >> +
> >> +static void fifo_handle_events(int cpu)
> >> +{
> >> +	struct evtchn_fifo_control_block *control_block;
> >> +	uint32_t ready;
> >> +	int q;
> >> +
> >> +	control_block = per_cpu(cpu_control_block, cpu);
> >> +
> >> +	ready = xchg(&control_block->ready, 0);
> >> +
> >> +	while (ready & EVTCHN_FIFO_READY_MASK) {
> >> +		q = find_first_bit(BM(&ready), EVTCHN_FIFO_MAX_QUEUES);
> >> +		consume_one_event(cpu, control_block, q, &ready);
> >> +		ready |= xchg(&control_block->ready, 0);
> > 
> > Say priority 0 is where VIRQ_TIMER is set and the TIMER is set to
> > be incredibly short.
> > 
> > Couldn't this cause a scenario where the guest clears the ready in
> > consume_one_event (clear_bit(0, BM(ready)) and the hypervisor
> > at the same time can set the bit. We then process the IRQ (virq_timer).
> > 
> > Then we get to the xchg here and end up with the ready having the
> > priority 0 bit set again. 
> > 
> > And then loop again and again..
> 
> Don't do that then.  This is no different to the behaviour of hardware
> interrupts.

I was thinking there might be some mitigation techinque - and there is.

The Linux kernel would disable the IRQ, which hopefully means that this
defective port ends up being masked.


> 
> >> +		ret = HYPERVISOR_event_channel_op(EVTCHNOP_init_control,
> >> +						  &init_control);
> >> +		if (ret < 0)
> >> +			BUG();
> > 
> > So if this is run after migration to an older hypevisor won't we
> > blow up here? Shouldn't we just exit with an return and use the
> > old style classis events?
> 
> Migrating to older hypervisors has never been supported.  You can only
> go forwards.

OK.
> 
> >> +static int __cpuinit fifo_cpu_notification(struct notifier_block *self,
> >> +					   unsigned long action, void *hcpu)
> >> +{
> >> +	int cpu = (long)hcpu;
> >> +	int ret = 0;
> >> +
> >> +	switch (action) {
> >> +	case CPU_UP_PREPARE:
> >> +		if (!per_cpu(cpu_control_block, cpu))
> >> +			ret = fifo_init_control_block(cpu);
> >> +		break;
> >> +	default:
> >> +		break;
> >> +	}
> >> +	return ret < 0 ? NOTIFY_BAD : NOTIFY_OK;
> > 
> > I think you should return NOTIFY_OK.
> > 
> > Say you do this:
> > 
> > 1) Launch guest on a new hypervisor (which has FIFO ABI)
> > 2) Bring down all CPUs but one.
> > 3) Migrate to an older hypervisor (no FIFI ABI)
> > 4) Bring up all of the CPUs.
> > 
> > We shouldn't return NOTIFY_BAD b/c the hypervisor we are resuming
> > on is too old. We should just continue on without the FIFO mechanism
> > in place.
> 
> Again, migrating to an older hypervisor has never been supported.  Also,
> once we have switched to the FIFO-based ABI for one VCPU we cannot go
> back and it's better to fail to bring up the VCPU than have one that
> cannot receive events.

OK.
> 
> David

next prev parent reply	other threads:[~2013-09-24 17:04 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-09-13 16:59 (no subject) David Vrabel
2013-09-13 16:59 ` [PATCH 01/13] xen/events: refactor retrigger_dynirq() and resend_irq_on_evtchn() David Vrabel
2013-09-24 14:37   ` Konrad Rzeszutek Wilk
2013-09-24 16:04     ` David Vrabel
2013-09-13 16:59 ` [PATCH 02/13] xen/events: remove unnecessary init_evtchn_cpu_bindings() David Vrabel
2013-09-24 14:40   ` Konrad Rzeszutek Wilk
2013-09-24 16:06     ` David Vrabel
2013-09-24 16:49       ` Konrad Rzeszutek Wilk
2013-09-13 16:59 ` [PATCH 03/13] xen/events: introduce test_and_set_mask David Vrabel
2013-09-24 14:40   ` Konrad Rzeszutek Wilk
2013-09-13 16:59 ` [PATCH 04/13] xen/events: replace raw bit ops with functions David Vrabel
2013-09-24 14:41   ` Konrad Rzeszutek Wilk
2013-09-13 16:59 ` [PATCH 05/13] xen/events: move drivers/xen/events.c into drivers/xen/events/ David Vrabel
2013-09-13 16:59 ` [PATCH 06/13] xen/events: move 2-level specific code into its own file David Vrabel
2013-09-13 16:59 ` [PATCH 07/13] xen/events: add struct evtchn_ops for the low-level port operations David Vrabel
2013-09-24 14:44   ` Konrad Rzeszutek Wilk
2013-09-13 16:59 ` [PATCH 08/13] xen/events: allow setup of irq_info to fail David Vrabel
2013-09-24 14:47   ` Konrad Rzeszutek Wilk
2013-09-13 16:59 ` [PATCH 09/13] xen/events: add a evtchn_op for port setup David Vrabel
2013-09-24 14:47   ` Konrad Rzeszutek Wilk
2013-09-13 16:59 ` [PATCH 10/13] xen/events: Refactor evtchn_to_irq array to be dynamically allocated David Vrabel
2013-09-24 14:58   ` Konrad Rzeszutek Wilk
2013-09-26 10:53     ` David Vrabel
2013-09-26 12:51       ` Konrad Rzeszutek Wilk
2013-09-13 16:59 ` [PATCH 11/13] xen/events: add xen_evtchn_mask_all() David Vrabel
2013-09-24 14:58   ` Konrad Rzeszutek Wilk
2013-09-13 17:00 ` [PATCH 12/13] xen/events: Add the hypervisor interface for the FIFO-based event channels David Vrabel
2013-09-24 15:06   ` Konrad Rzeszutek Wilk
2013-09-26 11:06     ` David Vrabel
2013-09-24 15:08   ` Konrad Rzeszutek Wilk
2013-09-24 16:11     ` David Vrabel
2013-09-24 16:51       ` Konrad Rzeszutek Wilk
2013-09-13 17:00 ` [PATCH 13/13] xen/events: use the FIFO-based ABI if available David Vrabel
2013-09-24 15:50   ` Konrad Rzeszutek Wilk
2013-09-24 16:48     ` David Vrabel
2013-09-24 17:04       ` Konrad Rzeszutek Wilk [this message]
2013-09-13 17:03 ` [RFC PATCHv3 00/12] Linux: FIFO-based event channel ABI David Vrabel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130924170418.GA14139@phenom.dumpdata.com \
    --to=konrad.wilk@oracle.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=david.vrabel@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).