* [PATCH] xen: credit2: fix spinlock irq-safety violation
@ 2017-09-19 2:11 Dario Faggioli
2017-09-19 8:23 ` Wei Liu
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Dario Faggioli @ 2017-09-19 2:11 UTC (permalink / raw)
To: xen-devel; +Cc: George Dunlap, Wei Liu, osstest service owner
In commit ad4b3e1e9df34 ("xen: credit2: implement
utilization cap") xfree() was being called (for
deallocating the budget replenishment timer, during
domain destruction) inside an IRQ disabled critical
section.
That must not happen, as it uses the mem-pool's lock,
which needs to be taken with IRQ enabled. And, in fact,
we crash (in debug builds):
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at spinlock.c:47
(XEN) ****************************************
Let's, therefore, kill and deallocate the timer outside of
the critical sections, when IRQs are enabled.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: osstest service owner <osstest-admin@xenproject.org>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
This was spotted by OSSTest's flight 113562:
http://logs.test-lab.xenproject.org/osstest/logs/113562/
http://logs.test-lab.xenproject.org/osstest/logs/113562/test-amd64-amd64-xl-credit2/serial-godello0.log
---
xen/common/sched_credit2.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 32234ac..7a550db 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2923,13 +2923,13 @@ csched2_free_domdata(const struct scheduler *ops, void *data)
write_lock_irqsave(&prv->lock, flags);
- kill_timer(sdom->repl_timer);
- xfree(sdom->repl_timer);
-
list_del_init(&sdom->sdom_elem);
write_unlock_irqrestore(&prv->lock, flags);
+ kill_timer(sdom->repl_timer);
+ xfree(sdom->repl_timer);
+
xfree(data);
}
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH] xen: credit2: fix spinlock irq-safety violation 2017-09-19 2:11 [PATCH] xen: credit2: fix spinlock irq-safety violation Dario Faggioli @ 2017-09-19 8:23 ` Wei Liu 2017-09-20 9:16 ` Roger Pau Monné 2017-09-20 9:19 ` George Dunlap 2 siblings, 0 replies; 9+ messages in thread From: Wei Liu @ 2017-09-19 8:23 UTC (permalink / raw) To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Wei Liu, osstest service owner On Tue, Sep 19, 2017 at 04:11:28AM +0200, Dario Faggioli wrote: > In commit ad4b3e1e9df34 ("xen: credit2: implement > utilization cap") xfree() was being called (for > deallocating the budget replenishment timer, during > domain destruction) inside an IRQ disabled critical > section. > > That must not happen, as it uses the mem-pool's lock, > which needs to be taken with IRQ enabled. And, in fact, > we crash (in debug builds): > > (XEN) **************************************** > (XEN) Panic on CPU 0: > (XEN) Xen BUG at spinlock.c:47 > (XEN) **************************************** > > Let's, therefore, kill and deallocate the timer outside of > the critical sections, when IRQs are enabled. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] xen: credit2: fix spinlock irq-safety violation 2017-09-19 2:11 [PATCH] xen: credit2: fix spinlock irq-safety violation Dario Faggioli 2017-09-19 8:23 ` Wei Liu @ 2017-09-20 9:16 ` Roger Pau Monné 2017-09-20 9:19 ` George Dunlap 2 siblings, 0 replies; 9+ messages in thread From: Roger Pau Monné @ 2017-09-20 9:16 UTC (permalink / raw) To: Dario Faggioli; +Cc: George Dunlap, xen-devel, Wei Liu, osstest service owner On Tue, Sep 19, 2017 at 04:11:28AM +0200, Dario Faggioli wrote: > In commit ad4b3e1e9df34 ("xen: credit2: implement > utilization cap") xfree() was being called (for > deallocating the budget replenishment timer, during > domain destruction) inside an IRQ disabled critical > section. > > That must not happen, as it uses the mem-pool's lock, > which needs to be taken with IRQ enabled. And, in fact, > we crash (in debug builds): > > (XEN) **************************************** > (XEN) Panic on CPU 0: > (XEN) Xen BUG at spinlock.c:47 > (XEN) **************************************** > > Let's, therefore, kill and deallocate the timer outside of > the critical sections, when IRQs are enabled. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> It would be good to get this committed sooner rather than later, so the push gate can be unblocked. Thanks, Roger. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] xen: credit2: fix spinlock irq-safety violation 2017-09-19 2:11 [PATCH] xen: credit2: fix spinlock irq-safety violation Dario Faggioli 2017-09-19 8:23 ` Wei Liu 2017-09-20 9:16 ` Roger Pau Monné @ 2017-09-20 9:19 ` George Dunlap 2017-09-20 10:04 ` George Dunlap 2 siblings, 1 reply; 9+ messages in thread From: George Dunlap @ 2017-09-20 9:19 UTC (permalink / raw) To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Wei Liu, osstest service owner On 09/19/2017 03:11 AM, Dario Faggioli wrote: > In commit ad4b3e1e9df34 ("xen: credit2: implement > utilization cap") xfree() was being called (for > deallocating the budget replenishment timer, during > domain destruction) inside an IRQ disabled critical > section. > > That must not happen, as it uses the mem-pool's lock, > which needs to be taken with IRQ enabled. And, in fact, > we crash (in debug builds): > > (XEN) **************************************** > (XEN) Panic on CPU 0: > (XEN) Xen BUG at spinlock.c:47 > (XEN) **************************************** > > Let's, therefore, kill and deallocate the timer outside of > the critical sections, when IRQs are enabled. > > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> > --- > Cc: osstest service owner <osstest-admin@xenproject.org> > Cc: George Dunlap <george.dunlap@eu.citrix.com> > Cc: Wei Liu <wei.liu2@citrix.com> > --- > This was spotted by OSSTest's flight 113562: > > http://logs.test-lab.xenproject.org/osstest/logs/113562/ > http://logs.test-lab.xenproject.org/osstest/logs/113562/test-amd64-amd64-xl-credit2/serial-godello0.log > --- > xen/common/sched_credit2.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c > index 32234ac..7a550db 100644 > --- a/xen/common/sched_credit2.c > +++ b/xen/common/sched_credit2.c > @@ -2923,13 +2923,13 @@ csched2_free_domdata(const struct scheduler *ops, void *data) > > write_lock_irqsave(&prv->lock, flags); > > - kill_timer(sdom->repl_timer); > - xfree(sdom->repl_timer); > - > list_del_init(&sdom->sdom_elem); > > write_unlock_irqrestore(&prv->lock, flags); > > + kill_timer(sdom->repl_timer); > + xfree(sdom->repl_timer); Any particular reason for moving the kill_timer() as well as the xfree outside the lock? What happens if the timer goes off after the irqrestore but before the kill_timer? -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] xen: credit2: fix spinlock irq-safety violation 2017-09-20 9:19 ` George Dunlap @ 2017-09-20 10:04 ` George Dunlap 2017-09-20 11:44 ` Dario Faggioli 0 siblings, 1 reply; 9+ messages in thread From: George Dunlap @ 2017-09-20 10:04 UTC (permalink / raw) To: Dario Faggioli, xen-devel; +Cc: George Dunlap, Wei Liu, osstest service owner On 09/20/2017 10:19 AM, George Dunlap wrote: > On 09/19/2017 03:11 AM, Dario Faggioli wrote: >> In commit ad4b3e1e9df34 ("xen: credit2: implement >> utilization cap") xfree() was being called (for >> deallocating the budget replenishment timer, during >> domain destruction) inside an IRQ disabled critical >> section. >> >> That must not happen, as it uses the mem-pool's lock, >> which needs to be taken with IRQ enabled. And, in fact, >> we crash (in debug builds): >> >> (XEN) **************************************** >> (XEN) Panic on CPU 0: >> (XEN) Xen BUG at spinlock.c:47 >> (XEN) **************************************** >> >> Let's, therefore, kill and deallocate the timer outside of >> the critical sections, when IRQs are enabled. >> >> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> >> --- >> Cc: osstest service owner <osstest-admin@xenproject.org> >> Cc: George Dunlap <george.dunlap@eu.citrix.com> >> Cc: Wei Liu <wei.liu2@citrix.com> >> --- >> This was spotted by OSSTest's flight 113562: >> >> http://logs.test-lab.xenproject.org/osstest/logs/113562/ >> http://logs.test-lab.xenproject.org/osstest/logs/113562/test-amd64-amd64-xl-credit2/serial-godello0.log >> --- >> xen/common/sched_credit2.c | 6 +++--- >> 1 file changed, 3 insertions(+), 3 deletions(-) >> >> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c >> index 32234ac..7a550db 100644 >> --- a/xen/common/sched_credit2.c >> +++ b/xen/common/sched_credit2.c >> @@ -2923,13 +2923,13 @@ csched2_free_domdata(const struct scheduler *ops, void *data) >> >> write_lock_irqsave(&prv->lock, flags); >> >> - kill_timer(sdom->repl_timer); >> - xfree(sdom->repl_timer); >> - >> list_del_init(&sdom->sdom_elem); >> >> write_unlock_irqrestore(&prv->lock, flags); >> >> + kill_timer(sdom->repl_timer); >> + xfree(sdom->repl_timer); > > Any particular reason for moving the kill_timer() as well as the xfree > outside the lock? What happens if the timer goes off after the > irqrestore but before the kill_timer? Looks like if the timer fires, nothing terribly bad will happen; it will just do a useless budget replenishment. It looks like kill_timer() disables irqs, so it could be moved inside the critical section. I'm inclined to say we should do so. I don't anticipate the budget replenishment ever to need to walk the domain list, but should that change, this would be a bear of a bug to find. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] xen: credit2: fix spinlock irq-safety violation 2017-09-20 10:04 ` George Dunlap @ 2017-09-20 11:44 ` Dario Faggioli 2017-09-20 11:59 ` Jan Beulich 0 siblings, 1 reply; 9+ messages in thread From: Dario Faggioli @ 2017-09-20 11:44 UTC (permalink / raw) To: George Dunlap, xen-devel; +Cc: George Dunlap, Wei Liu, osstest service owner [-- Attachment #1.1.1: Type: text/plain, Size: 2297 bytes --] On Wed, 2017-09-20 at 11:04 +0100, George Dunlap wrote: > On 09/20/2017 10:19 AM, George Dunlap wrote: > > > diff --git a/xen/common/sched_credit2.c > > > b/xen/common/sched_credit2.c > > > index 32234ac..7a550db 100644 > > > --- a/xen/common/sched_credit2.c > > > +++ b/xen/common/sched_credit2.c > > > @@ -2923,13 +2923,13 @@ csched2_free_domdata(const struct > > > scheduler *ops, void *data) > > > > > > write_lock_irqsave(&prv->lock, flags); > > > > > > - kill_timer(sdom->repl_timer); > > > - xfree(sdom->repl_timer); > > > - > > > list_del_init(&sdom->sdom_elem); > > > > > > write_unlock_irqrestore(&prv->lock, flags); > > > > > > + kill_timer(sdom->repl_timer); > > > + xfree(sdom->repl_timer); > > > > Any particular reason for moving the kill_timer() as well as the > > xfree > > outside the lock? What happens if the timer goes off after the > > irqrestore but before the kill_timer? > > Looks like if the timer fires, nothing terribly bad will happen; it > will > just do a useless budget replenishment. > It's just that it has not reason to be there, as nothing that it does is serialized by prv->lock, so it only makes the critical section (with IRQ disabled) longer, for no reason, which is bad, as this being a write_lock(), it'll stop readers too. I think it was (my) mistake to put it there in the first place. > It looks like kill_timer() disables irqs, so it could be moved inside > the critical section. I'm inclined to say we should do so. I don't > anticipate the budget replenishment ever to need to walk the domain > list, but should that change, this would be a bear of a bug to find. > Indeed it's rather unlikely for the replenishment handler to have to use sdom_elem to go through the list of domains. IAC, if you're concerned about that, I'd much rather put both kill_timer() and xfree() before the critical section, rather than after, like in the attached patch. Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.1.2: xen-credit2-fix-spinlock --] [-- Type: message/rfc822, Size: 2007 bytes --] From: Dario Faggioli <dario.faggioli@citrix.com> Subject: No Subject Date: Wed, 20 Sep 2017 13:42:51 +0200 Message-ID: <1505907771.3483.11.camel@citrix.com> [PATCH v2] xen: credit2: fix spinlock irq-safety violation In commit ad4b3e1e9df34 ("xen: credit2: implement utilization cap") xfree() was being called (for deallocating the budget replenishment timer, during domain destruction) inside an IRQ disabled critical section. That must not happen, as it uses the mem-pool's lock, which needs to be taken with IRQ enabled. And, in fact, we crash (in debug builds): (XEN) **************************************** (XEN) Panic on CPU 0: (XEN) Xen BUG at spinlock.c:47 (XEN) **************************************** Let's, therefore, kill and deallocate the timer outside of the critical sections, when IRQs are enabled. Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- Cc: osstest service owner <osstest-admin@xenproject.org> Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> --- This was spotted by OSSTest's flight 113562: http://logs.test-lab.xenproject.org/osstest/logs/113562/ http://logs.test-lab.xenproject.org/osstest/logs/113562/test-amd64-amd64-xl-credit2/serial-godello0.log --- Changes from v1: - kill_timer() and xfree() moved above critical section, instead than below. diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c index 32234ac..32b0363 100644 --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -2921,11 +2921,11 @@ csched2_free_domdata(const struct scheduler *ops, void *data) struct csched2_dom *sdom = data; struct csched2_private *prv = csched2_priv(ops); - write_lock_irqsave(&prv->lock, flags); - kill_timer(sdom->repl_timer); xfree(sdom->repl_timer); + write_lock_irqsave(&prv->lock, flags); + list_del_init(&sdom->sdom_elem); write_unlock_irqrestore(&prv->lock, flags); [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] [-- Attachment #2: Type: text/plain, Size: 127 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] xen: credit2: fix spinlock irq-safety violation 2017-09-20 11:44 ` Dario Faggioli @ 2017-09-20 11:59 ` Jan Beulich 2017-09-20 13:49 ` George Dunlap 0 siblings, 1 reply; 9+ messages in thread From: Jan Beulich @ 2017-09-20 11:59 UTC (permalink / raw) To: Dario Faggioli Cc: George Dunlap, xen-devel, Wei Liu, George Dunlap, osstest service owner >>> On 20.09.17 at 13:44, <dario.faggioli@citrix.com> wrote: > IAC, if you're concerned about that, I'd much rather put both > kill_timer() and xfree() before the critical section, rather than > after, like in the attached patch. Hmm, killing the timer upfront is certainly fine, but is freeing the data before removing the element from the list safe not only currently, but also going forward? Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] xen: credit2: fix spinlock irq-safety violation 2017-09-20 11:59 ` Jan Beulich @ 2017-09-20 13:49 ` George Dunlap 2017-09-21 0:29 ` Dario Faggioli 0 siblings, 1 reply; 9+ messages in thread From: George Dunlap @ 2017-09-20 13:49 UTC (permalink / raw) To: Jan Beulich, Dario Faggioli Cc: George Dunlap, xen-devel, Wei Liu, osstest service owner On 09/20/2017 12:59 PM, Jan Beulich wrote: >>>> On 20.09.17 at 13:44, <dario.faggioli@citrix.com> wrote: >> IAC, if you're concerned about that, I'd much rather put both >> kill_timer() and xfree() before the critical section, rather than >> after, like in the attached patch. > > Hmm, killing the timer upfront is certainly fine, but is freeing the > data before removing the element from the list safe not only > currently, but also going forward? I agree with Jan -- if you don't want to put the kill_timer() in the critical section, put it beforehand; but don't free the structure until after the sdom struct has been removed from the list. Sorry to be picky, but I'm positive I'm not going to remember this in six months' time, and I'd just rather be safe. -George _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] xen: credit2: fix spinlock irq-safety violation 2017-09-20 13:49 ` George Dunlap @ 2017-09-21 0:29 ` Dario Faggioli 0 siblings, 0 replies; 9+ messages in thread From: Dario Faggioli @ 2017-09-21 0:29 UTC (permalink / raw) To: George Dunlap, Jan Beulich Cc: George Dunlap, xen-devel, Wei Liu, osstest service owner [-- Attachment #1.1: Type: text/plain, Size: 1771 bytes --] On Wed, 2017-09-20 at 14:49 +0100, George Dunlap wrote: > On 09/20/2017 12:59 PM, Jan Beulich wrote: > > Hmm, killing the timer upfront is certainly fine, but is freeing > > the > > data before removing the element from the list safe not only > > currently, but also going forward? > > I agree with Jan -- if you don't want to put the kill_timer() in the > critical section, put it beforehand; but don't free the structure > until > after the sdom struct has been removed from the list. > In general, I agree, and so I'll do this. In this case, considering what list we are talking about, and what it is used for right now (i.e., only for debug dump!), there are no dependencies between these two operations. And in any scenario that I can anticipate, where such a dependency would come into being, the level of restructuring of the code that is necessary to use the list in any really useful way, would be significant, and there may be a few other cases where a similar dependency would also surface and become an issue, and that will probably lead us to consider/remember this code here as well. So, IOW, I wouldn't consider this a problem, in the specific case. But since the net effect of this patch and what George suggests is the same, and since I appreciate the value that it has, in principle, doing changes like these with this approach, I'm fine with killing before and freeing after the critical sections, and I'll send a patch for that Thanks and Regards, Dario -- <<This happens because I choose it to happen!>> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) [-- Attachment #1.2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 819 bytes --] [-- Attachment #2: Type: text/plain, Size: 127 bytes --] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-09-21 0:29 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-09-19 2:11 [PATCH] xen: credit2: fix spinlock irq-safety violation Dario Faggioli 2017-09-19 8:23 ` Wei Liu 2017-09-20 9:16 ` Roger Pau Monné 2017-09-20 9:19 ` George Dunlap 2017-09-20 10:04 ` George Dunlap 2017-09-20 11:44 ` Dario Faggioli 2017-09-20 11:59 ` Jan Beulich 2017-09-20 13:49 ` George Dunlap 2017-09-21 0:29 ` Dario Faggioli
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).