All of lore.kernel.org
 help / color / mirror / Atom feed
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: andrew.cooper3@citrix.com, malcolm.crossley@citrix.com,
	linux@eikelenboom.it, xen-devel@lists.xenproject.org
Subject: Re: [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.
Date: Mon, 16 Mar 2015 13:59:24 -0400	[thread overview]
Message-ID: <20150316175924.GA26409@l.oracle.com> (raw)
In-Reply-To: <54D09BE2020000780005C407@mail.emea.novell.com>

> > If we do explicitly we run risk of dead-lock. See below of an draft
> > (not even compiled tested) of what I think you mean.
> 
> That's no different from the code you proposed before, just that
> the live-lock is better hidden there: By re-raising a softirq from a
> softirq handler, you arrange for yourself to be called again right
> away.

Interestingly enough this issue of live-lock has been there since
the start (when this code was written).
The tasklet (so Xen 4.5 and earlier) based implementation allows this:


CPU0                            CPU1                     CPU2

hvm_do_IRQ_dpci()
 - tasklet_schedule()
   [t->scheduled_on = 0]
   if (!t->is_running)
	put tasklet on per-cpu list. raise softirq


 . do_softirq
    [do_tasklet_work]
    t->scheduled_on = -1
    t->is_running = 1
    unlock the lock
     -> call hvm_dirq_assist
                                do_IRQ
                                 hvm_do_IRQ_dpci
                                t->scheduled_on = 1;
				if (!t->is_running).. [it is 1, so skip]

                                			do_IRQ
                                 			 hvm_do_IRQ_dpci
                                			 t->scheduled_on = 2;
							 if (!t->is_running).. [it is 1, so skip]

   takes the lock
    if (t->scheduled_on >= 0)
         tasklet_enqueue:
          put tasklet on per-cpu list, raise softirq.

   [do_tasklet_work]
   .. repeat the story. Inject interrupts right as 'hvm_dirq_assist' it executing.

With interrupts happening right as the dpci_work is being called we can
cause an exact live-lock.

It also suffers from "skipping" interrupts if they span more
than on CPU. The implementation I had does not address this issue
either (it only picks up the 'second' interrupt while the older
picks the 'last' one).

Anyhow the patch below makes the code from a behavior standpoint
the same as what we have in Xen 4.5 and earlier (with warts).
The difference is that the warts are so much faster so you don't
seem them :-)
> 
> > --- a/xen/drivers/passthrough/io.c
> > +++ b/xen/drivers/passthrough/io.c
> > @@ -63,10 +63,35 @@ enum {
> >  static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
> >  {
> >      unsigned long flags;
> > +    unsigned long old, new, val = pirq_dpci->state;
> >  
> > -    if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) )
> > -        return;
> > -
> > +    for ( ;; )
> > +    {
> > +        old = cmpxchg(&pirq_dpci->state, 0, 1 << STATE_SCHED);
> > +        switch ( old )
> > +        {
> > +        case (1 << STATE_SCHED):
> > +            /*
> > +             * Whenever STATE_SCHED is set we MUST not schedule it.
> > +             */
> > +            return;
> > +        case (1 << STATE_RUN) | (1 << STATE_SCHED):
> > +        case (1 << STATE_RUN):
> > +            /* Getting close to finish. Spin. */
> > +            continue;
> > +        }
> > +        /*
> > +         * If the 'state' is 0 we can schedule it.
> > +         */
> > +        if ( old == 0 )
> > +            break;
> > +    }
> >      get_knownalive_domain(pirq_dpci->dom);
> >  
> >      local_irq_save(flags);
> 
> Yes, this looks better. And I again wonder whether STATE_*
> couldn't simply become a tristate - dpci_softirq(), rather than setting
> STATE_RUN and then clearing STATE_SCHED, could simply
> cmpxchg(, STATE_SCHED, STATE_RUN), acting as necessary when
> that operation fails.

One issue that needs to be figured out is what to do when
we have an interrupt mini-storm and need to queue it up.

One option could be to convert the ->mapping to be an
simple atomic counter - incremented every time
hvm_do_IRQ_dpci is called (and hvm_dirq_assist would
have an while loop decrementing this).

However as previous reviews demonstrated there are a lot
of subtle things that need to be taken care of:
 - Legacy interrupts and its 'lets-reset mapping to zero' when
   the interrupt timer has elapsed.
 - Deal with error states when assigning an PIRQ, or during
   the window time between creating/destroying loop and having
   to make sure that the state in the dpci_softirq is sane.

That took me two months to get right and only with other
folks testing and finding this. While I am totally up for
restarting this to make it awesome - I am also aghast at my
progress on that and fear that this might take _right_ past
the feature freeze window.

Hence was wondering if it would just be easier to put
this patch in (see above) - with the benfit that folks have
an faster interrupt passthrough experience and then I work on another
variant of this with tristate cmpxchg and ->mapping atomic counter.

Thoughts?
> 
> Jan
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

  reply	other threads:[~2015-03-16 17:59 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-12 16:45 [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU Konrad Rzeszutek Wilk
2015-01-12 17:27 ` Sander Eikelenboom
2015-01-12 17:35 ` Konrad Rzeszutek Wilk
2015-01-13 10:26   ` Jan Beulich
2015-01-13 10:20 ` Jan Beulich
2015-01-23  1:44   ` Konrad Rzeszutek Wilk
2015-01-23  9:37     ` Jan Beulich
2015-01-23 14:54       ` Konrad Rzeszutek Wilk
2015-03-17 17:44       ` Konrad Rzeszutek Wilk
2015-03-17 22:16         ` Sander Eikelenboom
2015-03-18  7:41         ` Jan Beulich
2015-03-18 14:06           ` Konrad Rzeszutek Wilk
2015-03-18 16:43             ` Jan Beulich
2015-03-18 17:00               ` Konrad Rzeszutek Wilk
2015-03-19  7:15                 ` Jan Beulich
2015-02-02 14:29   ` Konrad Rzeszutek Wilk
2015-02-02 15:19     ` Jan Beulich
2015-02-02 15:31       ` Konrad Rzeszutek Wilk
2015-02-02 15:48         ` Jan Beulich
2015-02-02 17:44           ` Konrad Rzeszutek Wilk
2015-02-03  8:58             ` Jan Beulich
2015-03-16 17:59               ` Konrad Rzeszutek Wilk [this message]
2015-03-17  8:18                 ` Jan Beulich
2015-03-17  8:42                   ` Sander Eikelenboom
2015-03-17 14:54                     ` Konrad Rzeszutek Wilk
2015-03-17 16:01                       ` Jan Beulich
2015-03-17 16:09                         ` Konrad Rzeszutek Wilk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150316175924.GA26409@l.oracle.com \
    --to=konrad.wilk@oracle.com \
    --cc=JBeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=linux@eikelenboom.it \
    --cc=malcolm.crossley@citrix.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.