All of lore.kernel.org
 help / color / mirror / Atom feed
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel@lists.xenproject.org
Subject: Re: [PATCH] dpci: Put the dpci back on the list if scheduled from another CPU.
Date: Thu, 19 Mar 2015 15:18:50 -0400	[thread overview]
Message-ID: <20150319191850.GF21217@x230.dumpdata.com> (raw)
In-Reply-To: <550A8E8F020000780006B71B@mail.emea.novell.com>

On Thu, Mar 19, 2015 at 07:53:35AM +0000, Jan Beulich wrote:
> >>> On 17.03.15 at 16:38, <konrad.wilk@oracle.com> wrote:
> > There is race when we clear the STATE_SCHED in the softirq
> > - which allows the 'raise_softirq_for' (on another CPU or
> > on the one running the softirq) to schedule the dpci.
> > 
> > Specifically this can happen when the other CPU receives
> > an interrupt, calls 'raise_softirq_for', and puts the dpci
> > on its per-cpu list (same dpci structure). Note that
> > this could also happen on the same physical CPU, however
> > the explanation for simplicity will assume two CPUs actors.
> > 
> > There would be two 'dpci_softirq' running at the same time
> > (on different CPUs) where on one CPU it would be executing
> > hvm_dirq_assist (so had cleared STATE_SCHED and set STATE_RUN)
> > and on the other CPU it is trying to call:
> > 
> >    if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) )
> >     	BUG();
> > 
> > Since STATE_RUN is already set it would end badly.
> > 
> > The reason we can get his with this is when an interrupt
> > affinity is set over multiple CPUs.
> > 
> > Potential solutions:
> > 
> > a) Instead of the BUG() we can put the dpci back on the per-cpu
> > list to deal with later (when the softirq are activated again).
> > This putting the 'dpci' back on the per-cpu list is an spin
> > until the bad condition clears.
> > 
> > b) We could also expand the test-and-set(STATE_SCHED) in raise_softirq_for
> > to detect for 'STATE_RUN' bit being set and schedule the dpci.
> > The BUG() check in dpci_softirq would be replace with a spin until
> > 'STATE_RUN' has been cleared. The dpci would still not
> > be scheduled when STATE_SCHED bit was set.
> > 
> > c) Only schedule the dpci when the state is cleared
> > (no STATE_SCHED and no STATE_RUN).  It would spin if STATE_RUN is set
> > (as it is in progress and will finish). If the STATE_SCHED is set
> > (so hasn't run yet) we won't try to spin and just exit.
> > 
> > Down-sides of the solutions:
> > 
> > a). Live-lock of the CPU. We could be finishing an dpci, then adding
> > the dpci, exiting, and the processing the dpci once more. And so on.
> > We would eventually stop as the TIMER_SOFTIRQ would be set, which will
> > cause SCHEDULER_SOFTIRQ to be set as well and we would exit this loop.
> > 
> > Interestingly the old ('tasklet') code used this mechanism.
> > If the function assigned to the tasklet was running  - the softirq
> > that ran said function (hvm_dirq_assist) would be responsible for
> > putting the tasklet back on the per-cpu list. This would allow
> > to have an running tasklet and an 'to-be-scheduled' tasklet
> > at the same time.
> > 
> > b). is similar to a) - instead of re-entering the dpci_softirq
> > we are looping in the softirq waiting for the correct condition to
> > arrive. As it does not allow unwedging ourselves because the other
> > softirqs are not called - it is less preferable.
> > 
> > c) can cause an dead-lock if the interrupt comes in when we are
> > processing the dpci in the softirq - iff this happens on the same CPU.
> > We would be looping in on raise_softirq waiting for STATE_RUN
> > to be cleared, while the softirq that was to clear it - is preempted
> > by our interrupt handler.
> > 
> > As such, this patch - which implements a) is the best candidate
> > for this quagmire.
> > 
> > Reported-and-Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
> > Reported-and-Tested-by: Malcolm Crossley <malcolm.crossley@citrix.com>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> So I now agree that in the state we're in this is the most reasonable
> fix. My reservations against the extra logic introduced earlier (and

Thank you.

Are you OK with me checking it in?
> being fixed here) stand though: From an abstract perspective the
> IRQ and softirq logic alone should be sufficient to deal with the
> needs we have. The complications really result from the desire to
> use a per-CPU list of hvm_pirq_dpci-s, which I still think a simpler
> alternative should be found for (after all real hardware doesn't use
> such lists).
> 
> A first thought would be to put them all on a per-domain list and
> have a cpumask tracking which CPUs they need servicing on. The
> downside of this would be (apart from again not being a proper
> equivalent of how actual hardware handles this) that - the softirq
> handler not having any other context - domains needing servicing
> would also need to be tracked in some form (in order to avoid
> having to iterate over all of them), and a per-CPU list would be
> undesirable for the exact same reasons. Yet a per-CPU
> domain-needs-service bitmap doesn't seem very attractive either,
> i.e. this would need further thought (also to make sure such an
> alternative model doesn't become even more involved than what
> we have now).

HA! (yes, I completly agree on - "complex" == "unpleasant")

Perhaps we can brainstorm some of this at XenHackathon in Shanghai?
> 
> Jan

      reply	other threads:[~2015-03-19 19:18 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-17 15:38 [PATCH] dpci: Put the dpci back on the list if scheduled from another CPU Konrad Rzeszutek Wilk
2015-03-17 16:06 ` Jan Beulich
2015-03-17 17:15   ` Konrad Rzeszutek Wilk
2015-03-18  7:38     ` Jan Beulich
2015-03-18 13:58       ` Konrad Rzeszutek Wilk
2015-03-18 16:06         ` Jan Beulich
2015-03-18 16:14           ` Konrad Rzeszutek Wilk
2015-03-19  7:53 ` Jan Beulich
2015-03-19 19:18   ` Konrad Rzeszutek Wilk [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150319191850.GF21217@x230.dumpdata.com \
    --to=konrad.wilk@oracle.com \
    --cc=JBeulich@suse.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.