From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel@lists.xenproject.org, keir@xen.org,
ian.jackson@eu.citrix.com, ian.campbell@citrix.com, tim@xen.org
Subject: Re: [PATCH v6 for-xen-4.5 1/3] dpci: Move from domain centric model to hvm_dirq_dpci model.
Date: Tue, 7 Oct 2014 11:40:43 -0400 [thread overview]
Message-ID: <20141007154043.GB32648@laptop.dumpdata.com> (raw)
In-Reply-To: <542924B1020000780003A403@mail.emea.novell.com>
On Mon, Sep 29, 2014 at 08:21:53AM +0100, Jan Beulich wrote:
> >>> On 27.09.14 at 03:32, <konrad.wilk@oracle.com> wrote:
> > On Thu, Sep 25, 2014 at 04:04:46PM +0100, Jan Beulich wrote:
> >> >>> On 25.09.14 at 16:48, <konrad.wilk@oracle.com> wrote:
> >> > On Thu, Sep 25, 2014 at 03:24:53PM +0100, Jan Beulich wrote:
> >> >> >>> On 23.09.14 at 04:10, <konrad.wilk@oracle.com> wrote:
> >> >> > @@ -130,6 +146,7 @@ int pt_irq_create_bind(
> >> >> > return -ENOMEM;
> >> >> > }
> >> >> > pirq_dpci = pirq_dpci(info);
> >> >> > + pt_pirq_reset(d, pirq_dpci);
> >> >> >
> >> >> > switch ( pt_irq_bind->irq_type )
> >> >> > {
> >> >> > @@ -232,7 +249,6 @@ int pt_irq_create_bind(
> >> >> > {
> >> >> > unsigned int share;
> >> >> >
> >> >> > - pirq_dpci->dom = d;
> >> >> > if ( pt_irq_bind->irq_type == PT_IRQ_TYPE_MSI_TRANSLATE )
> >> >> > {
> >> >> > pirq_dpci->flags = HVM_IRQ_DPCI_MAPPED |
> >> >> > @@ -258,7 +274,6 @@ int pt_irq_create_bind(
> >> >> > {
> >> >> > if ( pt_irq_need_timer(pirq_dpci->flags) )
> >> >> > kill_timer(&pirq_dpci->timer);
> >> >> > - pirq_dpci->dom = NULL;
> >> >> > list_del(&girq->list);
> >> >> > list_del(&digl->list);
> >> >> > hvm_irq_dpci->link_cnt[link]--;
> >> >> > @@ -391,7 +406,6 @@ int pt_irq_destroy_bind(
> >> >> > msixtbl_pt_unregister(d, pirq);
> >> >> > if ( pt_irq_need_timer(pirq_dpci->flags) )
> >> >> > kill_timer(&pirq_dpci->timer);
> >> >> > - pirq_dpci->dom = NULL;
> >> >> > pirq_dpci->flags = 0;
> >> >> > pirq_cleanup_check(pirq, d);
> >> >> > }
> >> >>
> >> >> Is all of the above really necessary? I.e. I can neither see why setting
> >> >> ->dom earlier is needed, nor why clearing it on the error paths should
> >> >> be dropped.
> >> >
> >> > Yes. We need the ->dom so that the hvm_dirq_assist can run without
> >> > hitting an NULL pointer exception. Please keep in mind that the moment
> >> > we setup the IRQ action handler, we are "live" - [...]
> >>
> >> But all you need is that this happens before respective
> >> pirq_guest_bind() calls. I.e. in the PT_IRQ_TYPE_PCI and
> >> PT_IRQ_TYPE_MSI_TRANSLATE cases it was already done early enough
> >> (avoiding it remaining set on error paths), so all you'd need is adding
> >> it for the PT_IRQ_TYPE_MSI path too.
> >
> > .. and with the below statement ["pt_pirq_reset() is just replacing
> > that assigment"] I believe having just
> > pirq->dom = d;
> >
> > before the switch would be correct.
> >
> >>
> >> I agree that the clearing of the field in error paths might need a
> >> little care, but otoh you could equally well have hvm_dirq_assist()
> >> bail when it finds it to be NULL?
> >
> > Can't do it as is. I added in the 'get_knowalive_domain()' and
> > 'put_domain()'
> > in the 'schedule_softirq_for' and 'dpci_softirq' respectively.
> >
> > Which means that we would have an outstanding refcount as 'dpci_softirq'
> > could not access the '->dom' to get the domain and do proper refcounting.
> >
> > Unless I rip out the refcounting there which I believe is OK.
>
> No - instead the site clearing the field should then drop the
> reference (if so needed as indicated by other state). I.e.
> either there is a softirq being processed (in which case it's
> there where the reference gets dropped) or you can stop it
> from getting processed subsequently and drop the reference
> right away. Takes maybe another cmpxchg() afaict (on the
> ->dom field) or in the worst case another flag to signal this.
I've been looking at this and I've come up with something that does
address this - but instead of using 'cmpxchg' it ends up using
set_bit/clear_bit with two different states. Using 'cmpxchg' was
getting a bit too complex - or more likely I was getting too
fatigued of the code.
Anyhow here is an RFC on top of this patchset that I believe does
in spirit what you described. Naturally the whole big piece
of code that does the ref_counting on failure paths needs to be done
in an function. Please advise if this is acceptable for you
and if so I can properly flesh it out.
commit e0647b03be8319907d4f2dc424c1fcaa4895b31a
Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Sun Oct 5 09:44:14 2014 -0400
Two bits using clear/set
diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 47c8ed5..cae61f1 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -29,6 +29,10 @@
static DEFINE_PER_CPU(struct list_head, dpci_list);
+enum {
+ STATE_SCHED, /* Bit 0 */
+ STATE_RUN,
+};
/*
* Should only be called from hvm_do_IRQ_dpci. We use the
* 'masked' as an gate to thwart multiple interrupts.
@@ -37,14 +41,14 @@ static DEFINE_PER_CPU(struct list_head, dpci_list);
* completed executing 'hvm_dirq_assist'.
*
*/
-static void schedule_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
+static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
{
unsigned long flags;
- if ( pirq_dpci->masked )
+ if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->masked) )
return;
- pirq_dpci->masked = 1;
+ get_knownalive_domain(pirq_dpci->dom);
local_irq_save(flags);
list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list));
@@ -55,9 +59,9 @@ static void schedule_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
/*
* If we are racing with softirq_dpci (masked is still set) we return
- * -EAGAIN. Otherwise we return 0.
+ * -ERESTART. Otherwise we return 0.
*
- * If it is -EAGAIN, it is the callers responsibility to make sure
+ * If it is -ERESTART, it is the callers responsibility to make sure
* that the softirq (with the event_lock dropped) has ran. We need
* to flush out the outstanding 'dpci_softirq' (no more of them
* will be added for this pirq as the IRQ action handler has been
@@ -65,10 +69,13 @@ static void schedule_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
*/
int pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
{
- if ( pirq_dpci->masked )
- return -EAGAIN;
+ if ( test_bit(STATE_RUN, &pirq_dpci->masked) )
+ return -ERESTART;
+
+ if ( test_bit(STATE_SCHED, &pirq_dpci->masked) )
+ return -ERESTART;
/*
- * If in the future we would call 'schedule_softirq_for' right away
+ * If in the future we would call 'raise_softirq_for' right away
* after 'pt_pirq_softirq_active' we MUST reset the list (otherwise it
* might have stale data).
*/
@@ -143,7 +150,6 @@ int pt_irq_create_bind(
struct hvm_pirq_dpci *pirq_dpci;
struct pirq *info;
int rc, pirq = pt_irq_bind->machine_irq;
- s_time_t start = NOW();
if ( pirq < 0 || pirq >= d->nr_pirqs )
return -EINVAL;
@@ -188,8 +194,6 @@ int pt_irq_create_bind(
{
spin_unlock(&d->event_lock);
process_pending_softirqs();
- if ( ( NOW() - start ) > SECONDS(1) )
- return -EAGAIN;
goto restart;
}
/*
@@ -230,7 +234,35 @@ int pt_irq_create_bind(
{
pirq_dpci->gmsi.gflags = 0;
pirq_dpci->gmsi.gvec = 0;
- pirq_dpci->dom = NULL;
+
+ /* The softirq has saved it so we are safe to reset it. */
+ if ( test_bit(STATE_RUN, &pirq_dpci->masked) )
+ {
+ pirq_dpci->dom = NULL;
+ }
+ else if ( test_and_clear_bit(STATE_SCHED, &pirq_dpci->masked) )
+ {
+ /* pirq_guest_unbind has made sure we won't be getting
+ * any more interrupts (no raise_softirq_for), so the only
+ * ones that can be are the ones that got scheduled _right_
+ * before the pirq_guest_unbind. If we can de-schedule them
+ * that is good. The problem #1 is that the softirq might be
+ * running and it has just set STATE_RUN while we cleared
+ * STATE_SCHED. That is OK, as right after the STATE_RUN it
+ * will check the STATE_SCHED and if cleared it will unclear
+ * STATE_RUN and ignore this pirq. We MUST put the refcount
+ * down. */
+ put_domain(pirq_dpci->dom);
+ pirq_dpci->dom = NULL;
+ }
+ else
+ {
+ /* !STATE_RUN (stale) and !STATE_SCHED.
+ * softirq will ignore this pirq, but we MUST put the refcount
+ * down. */
+ put_domain(pirq_dpci->dom);
+ pirq_dpci->dom = NULL;
+ }
pirq_dpci->flags = 0;
pirq_cleanup_check(info, d);
spin_unlock(&d->event_lock);
@@ -332,7 +364,7 @@ int pt_irq_create_bind(
{
if ( pt_irq_need_timer(pirq_dpci->flags) )
kill_timer(&pirq_dpci->timer);
- pirq_dpci->dom = NULL;
+ /* TODO - function. pirq_dpci->dom = NULL; */
list_del(&girq->list);
list_del(&digl->list);
hvm_irq_dpci->link_cnt[link]--;
@@ -538,7 +570,7 @@ int hvm_do_IRQ_dpci(struct domain *d, struct pirq *pirq)
!(pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) )
return 0;
- schedule_softirq_for(pirq_dpci);
+ raise_softirq_for(pirq_dpci);
return 1;
}
@@ -592,11 +624,10 @@ void hvm_dpci_msi_eoi(struct domain *d, int vector)
spin_unlock(&d->event_lock);
}
-static void hvm_dirq_assist(struct hvm_pirq_dpci *pirq_dpci)
+static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
{
- struct domain *d = pirq_dpci->dom;
-
/*
+ * TODO: Remove
* We can be racing with 'pt_irq_destroy_bind' - with us being scheduled
* right before 'pirq_guest_unbind' gets called - but us not yet executed.
*
@@ -604,8 +635,6 @@ static void hvm_dirq_assist(struct hvm_pirq_dpci *pirq_dpci)
* 'mapping' - which is OK as later in this code we would
* do nothing except clear the ->masked field anyhow.
*/
- if ( !d )
- return;
ASSERT(d->arch.hvm_domain.irq.dpci);
@@ -725,11 +754,24 @@ static void dpci_softirq(void)
while ( !list_empty(&our_list) )
{
+ struct domain *d;
+
pirq_dpci = list_entry(our_list.next, struct hvm_pirq_dpci, softirq_list);
list_del(&pirq_dpci->softirq_list);
- hvm_dirq_assist(pirq_dpci);
- pirq_dpci->masked = 0;
+ d = pirq_dpci->dom;
+ barrier(); /* 'd' MUST be saved before we set/clear the bits. */
+ if ( test_and_set_bit(STATE_RUN, &pirq_dpci->masked) )
+ BUG();
+ /* Asked to be descheduled. */
+ if ( !test_and_clear_bit(STATE_SCHED, &pirq_dpci->masked) )
+ {
+ clear_bit(STATE_RUN, &pirq_dpci->masked);
+ continue;
+ }
+ hvm_dirq_assist(d, pirq_dpci);
+ put_domain(d);
+ clear_bit(STATE_RUN, &pirq_dpci->masked);
}
}
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index d147189..1660750 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -775,7 +775,7 @@ static int pci_clean_dpci_irqs(struct domain *d)
struct hvm_irq_dpci *hvm_irq_dpci = NULL;
if ( !iommu_enabled )
- return -ESRCH;
+ return -ENODEV;
if ( !is_hvm_domain(d) )
return -EINVAL;
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index 4fc6dcf..4a9324e 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -93,7 +93,7 @@ struct hvm_irq_dpci {
/* Machine IRQ to guest device/intx mapping. */
struct hvm_pirq_dpci {
uint32_t flags;
- bool_t masked;
+ unsigned long masked;
uint16_t pending;
struct list_head digl_list;
struct domain *dom;
of 'latch'
>
> Jan
>
next prev parent reply other threads:[~2014-10-07 15:40 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-09-23 2:10 [PATCH v6] Fix interrupt latency of HVM PCI passthrough devices Konrad Rzeszutek Wilk
2014-09-23 2:10 ` [PATCH v6 for-xen-4.5 1/3] dpci: Move from domain centric model to hvm_dirq_dpci model Konrad Rzeszutek Wilk
2014-09-25 14:24 ` Jan Beulich
2014-09-25 14:48 ` Konrad Rzeszutek Wilk
2014-09-25 15:04 ` Jan Beulich
2014-09-27 1:32 ` Konrad Rzeszutek Wilk
2014-09-29 7:21 ` Jan Beulich
2014-10-07 15:40 ` Konrad Rzeszutek Wilk [this message]
2014-10-07 16:10 ` Jan Beulich
2014-09-23 2:10 ` [PATCH v6 for-xen-4.5 2/3] dpci: In hvm_dirq_assist stop using pt_pirq_iterate Konrad Rzeszutek Wilk
2014-09-25 14:29 ` Jan Beulich
2014-09-23 2:10 ` [PATCH v6 for-xen-4.5 3/3] dpci: Replace tasklet with an softirq (v6) Konrad Rzeszutek Wilk
2014-09-25 14:55 ` Jan Beulich
2014-09-25 15:27 ` Konrad Rzeszutek Wilk
2014-09-25 15:45 ` Jan Beulich
2014-09-25 16:05 ` Konrad Rzeszutek Wilk
2014-09-27 1:32 ` Konrad Rzeszutek Wilk
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141007154043.GB32648@laptop.dumpdata.com \
--to=konrad.wilk@oracle.com \
--cc=JBeulich@suse.com \
--cc=ian.campbell@citrix.com \
--cc=ian.jackson@eu.citrix.com \
--cc=keir@xen.org \
--cc=tim@xen.org \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.