* [PATCH v1] xen/sched/null: avoid crash after failed domU creation
@ 2023-05-01 20:30 Stewart Hildebrand
  2023-05-05  5:59 ` Juergen Gross
  0 siblings, 1 reply; 5+ messages in thread
From: Stewart Hildebrand @ 2023-05-01 20:30 UTC
  To: xen-devel
  Cc: Stewart Hildebrand, George Dunlap, Dario Faggioli, Juergen Gross

When domU creation fails, there is a corner case that may lead to a crash in
the null scheduler when running a debug build of Xen:

(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'npc->unit == unit' failed at common/sched/null.c:379
(XEN) ****************************************

The events leading to the crash are:

* null_unit_insert() was invoked with the unit offline. Since the unit was
  offline, unit_assign() was not called, and null_unit_insert() returned.
* Later during domain creation, the unit was onlined.
* Eventually, domain creation failed due to bad configuration.
* null_unit_remove() was invoked with the unit still online. Since the unit was
  online, it called unit_deassign() and triggered an ASSERT.

To fix this, only call unit_deassign() in null_unit_remove() when npc->unit is
non-NULL.
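
The sequence above can be replayed in a small standalone model. This is only an
illustrative sketch, not Xen code: the model_* names, the reduced structs and
the "guarded" flag are hypothetical, and the Xen ASSERTs are represented by a
boolean return value instead of a panic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the scheduler state. */
struct model_unit { bool online; };
struct model_pcpu { struct model_unit *unit; };

/* Models unit_deassign(): returns false where 'npc->unit == unit' would fail. */
static bool model_deassign(struct model_pcpu *pc, struct model_unit *u)
{
    assert(pc != NULL && u != NULL);
    if (pc->unit != u)
        return false;
    pc->unit = NULL;
    return true;
}

/* Models the insert path: an offline unit is never assigned to a pCPU. */
static void model_insert(struct model_pcpu *pc, struct model_unit *u)
{
    assert(pc != NULL && u != NULL);
    if (!u->online)
        return;
    if (pc->unit == NULL)
        pc->unit = u;
}

/*
 * Models the remove path. With guarded == true, deassignment is skipped
 * when nothing is assigned to the pCPU, as in this patch.
 * Returns true if no modelled assertion would have fired.
 */
static bool model_remove(struct model_pcpu *pc, struct model_unit *u,
                         bool guarded)
{
    assert(pc != NULL && u != NULL);
    if (!u->online)
        return pc->unit != u;      /* offline path: must not be assigned */
    if (guarded && pc->unit == NULL)
        return true;               /* nothing assigned, nothing to undo */
    return model_deassign(pc, u);
}

/* Replays: insert while offline, online the unit, then remove it. */
static bool replay_failed_creation(bool guarded)
{
    struct model_pcpu pc = { .unit = NULL };
    struct model_unit u = { .online = false };

    model_insert(&pc, &u);   /* offline: no assignment happens */
    u.online = true;         /* onlined later during domain creation */
    return model_remove(&pc, &u, guarded);
}
```

In this model, replay_failed_creation(false) reports the assertion failure and
replay_failed_creation(true) completes cleanly.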

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
RFC->v1
* Follow Juergen's suggested fix

Link to RFC [1]

[1] https://lists.xenproject.org/archives/html/xen-devel/2023-04/msg01387.html
---
 xen/common/sched/null.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/xen/common/sched/null.c b/xen/common/sched/null.c
index 65a0a6c5312d..2091337fcd06 100644
--- a/xen/common/sched/null.c
+++ b/xen/common/sched/null.c
@@ -522,6 +522,8 @@ static void cf_check null_unit_remove(
 {
     struct null_private *prv = null_priv(ops);
     struct null_unit *nvc = null_unit(unit);
+    struct null_pcpu *npc;
+    unsigned int cpu;
     spinlock_t *lock;
 
     ASSERT(!is_idle_unit(unit));
@@ -531,8 +533,6 @@ static void cf_check null_unit_remove(
     /* If offline, the unit shouldn't be assigned, nor in the waitqueue */
     if ( unlikely(!is_unit_online(unit)) )
     {
-        struct null_pcpu *npc;
-
         npc = unit->res->sched_priv;
         ASSERT(npc->unit != unit);
         ASSERT(list_empty(&nvc->waitq_elem));
@@ -549,7 +549,10 @@ static void cf_check null_unit_remove(
         goto out;
     }
 
-    unit_deassign(prv, unit);
+    cpu = sched_unit_master(unit);
+    npc = get_sched_res(cpu)->sched_priv;
+    if ( npc->unit )
+        unit_deassign(prv, unit);
 
  out:
     unit_schedule_unlock_irq(lock, unit);
-- 
2.40.1




* Re: [PATCH v1] xen/sched/null: avoid crash after failed domU creation
  2023-05-01 20:30 [PATCH v1] xen/sched/null: avoid crash after failed domU creation Stewart Hildebrand
@ 2023-05-05  5:59 ` Juergen Gross
  2023-05-18 21:27   ` Stewart Hildebrand
  0 siblings, 1 reply; 5+ messages in thread
From: Juergen Gross @ 2023-05-05  5:59 UTC
  To: Stewart Hildebrand, xen-devel; +Cc: George Dunlap, Dario Faggioli



On 01.05.23 22:30, Stewart Hildebrand wrote:
> When creating a domU, but the creation fails, there is a corner case that may
> lead to a crash in the null scheduler when running a debug build of Xen.
> 
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Assertion 'npc->unit == unit' failed at common/sched/null.c:379
> (XEN) ****************************************
> 
> The events leading to the crash are:
> 
> * null_unit_insert() was invoked with the unit offline. Since the unit was
>    offline, unit_assign() was not called, and null_unit_insert() returned.
> * Later during domain creation, the unit was onlined
> * Eventually, domain creation failed due to bad configuration
> * null_unit_remove() was invoked with the unit still online. Since the unit was
>    online, it called unit_deassign() and triggered an ASSERT.
> 
> To fix this, only call unit_deassign() when npc->unit is non-NULL in
> null_unit_remove.
> 
> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen




* Re: [PATCH v1] xen/sched/null: avoid crash after failed domU creation
  2023-05-05  5:59 ` Juergen Gross
@ 2023-05-18 21:27   ` Stewart Hildebrand
  2023-05-19  8:33     ` Jan Beulich
  2023-05-22  8:49     ` Dario Faggioli
  0 siblings, 2 replies; 5+ messages in thread
From: Stewart Hildebrand @ 2023-05-18 21:27 UTC
  To: Juergen Gross, xen-devel; +Cc: George Dunlap, Dario Faggioli

On 5/5/23 01:59, Juergen Gross wrote:
> On 01.05.23 22:30, Stewart Hildebrand wrote:
>> When creating a domU, but the creation fails, there is a corner case that may
>> lead to a crash in the null scheduler when running a debug build of Xen.
>>
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) Assertion 'npc->unit == unit' failed at common/sched/null.c:379
>> (XEN) ****************************************
>>
>> The events leading to the crash are:
>>
>> * null_unit_insert() was invoked with the unit offline. Since the unit was
>>    offline, unit_assign() was not called, and null_unit_insert() returned.
>> * Later during domain creation, the unit was onlined
>> * Eventually, domain creation failed due to bad configuration
>> * null_unit_remove() was invoked with the unit still online. Since the unit was
>>    online, it called unit_deassign() and triggered an ASSERT.
>>
>> To fix this, only call unit_deassign() when npc->unit is non-NULL in
>> null_unit_remove.
>>
>> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> 
> Reviewed-by: Juergen Gross <jgross@suse.com>

Thanks for the review. Does this still need a maintainer ack?



* Re: [PATCH v1] xen/sched/null: avoid crash after failed domU creation
  2023-05-18 21:27   ` Stewart Hildebrand
@ 2023-05-19  8:33     ` Jan Beulich
  2023-05-22  8:49     ` Dario Faggioli
  1 sibling, 0 replies; 5+ messages in thread
From: Jan Beulich @ 2023-05-19  8:33 UTC
  To: Stewart Hildebrand
  Cc: George Dunlap, Dario Faggioli, Juergen Gross, xen-devel

On 18.05.2023 23:27, Stewart Hildebrand wrote:
> On 5/5/23 01:59, Juergen Gross wrote:
>> On 01.05.23 22:30, Stewart Hildebrand wrote:
>>> When creating a domU, but the creation fails, there is a corner case that may
>>> lead to a crash in the null scheduler when running a debug build of Xen.
>>>
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 0:
>>> (XEN) Assertion 'npc->unit == unit' failed at common/sched/null.c:379
>>> (XEN) ****************************************
>>>
>>> The events leading to the crash are:
>>>
>>> * null_unit_insert() was invoked with the unit offline. Since the unit was
>>>    offline, unit_assign() was not called, and null_unit_insert() returned.
>>> * Later during domain creation, the unit was onlined
>>> * Eventually, domain creation failed due to bad configuration
>>> * null_unit_remove() was invoked with the unit still online. Since the unit was
>>>    online, it called unit_deassign() and triggered an ASSERT.
>>>
>>> To fix this, only call unit_deassign() when npc->unit is non-NULL in
>>> null_unit_remove.
>>>
>>> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
>>
>> Reviewed-by: Juergen Gross <jgross@suse.com>
> 
> Thanks for the review. Does this still need a maintainer ack?

In principle yes. I might be willing to time out at some point, but
not before at least one ping was sent (and some more time has passed
afterwards).

Jan



* Re: [PATCH v1] xen/sched/null: avoid crash after failed domU creation
  2023-05-18 21:27   ` Stewart Hildebrand
  2023-05-19  8:33     ` Jan Beulich
@ 2023-05-22  8:49     ` Dario Faggioli
  1 sibling, 0 replies; 5+ messages in thread
From: Dario Faggioli @ 2023-05-22  8:49 UTC
  To: Stewart Hildebrand, Juergen Gross, xen-devel; +Cc: George Dunlap

On Thu, 2023-05-18 at 17:27 -0400, Stewart Hildebrand wrote:
> On 5/5/23 01:59, Juergen Gross wrote:
> > > 
> > > Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> > 
> > Reviewed-by: Juergen Gross <jgross@suse.com>
> 
> Thanks for the review. Does this still need a maintainer ack?
>
Acked-by: Dario Faggioli <dfaggioli@suse.com>

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)


