xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [for-4.8 PATCH] xen: credit2: fix wrong assert in runq_tickle().
@ 2016-11-22 10:43 Dario Faggioli
  2016-11-22 10:55 ` Wei Liu
  2016-11-22 11:21 ` Jan Beulich
  0 siblings, 2 replies; 6+ messages in thread
From: Dario Faggioli @ 2016-11-22 10:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Wei Liu, George Dunlap

Since b047f888d489 ("xen: sched: leave CPUs doing tasklet
work alone") a cpu executing a tasklet, is not marked as
idle.

Therefore:
 - avoid asserting that we can't find the idle vcpu running
   on one of them, which is not true,
 - avoid triggering a preemption on them (and add an assert
   checking that).

This fixes a bug identified by OSSTest, in flight 102372
(on ARM, but it's not at all ARM specific), where the
ASSERT() was triggering like this:

(XEN) Xen call trace:
(XEN)    [<0022af78>] sched_credit2.c#runq_tickle+0x3e8/0x61c (PC)
(XEN)    [<0022aedc>] sched_credit2.c#runq_tickle+0x34c/0x61c (LR)
(XEN)    [<0022b644>] sched_credit2.c#csched2_context_saved+0x128/0x1a4
(XEN)    [<0023303c>] context_saved+0x7c/0xa4
(XEN)    [<0024f660>] domain.c#schedule_tail+0x2b4/0x308
(XEN)    [<0024faac>] context_switch+0x80/0x94
(XEN)    [<0022ff48>] schedule.c#schedule+0x76c/0x7ec
(XEN)    [<002338d4>] softirq.c#__do_softirq+0xcc/0xec
(XEN)    [<00233968>] do_softirq+0x18/0x28
(XEN)    [<00261084>] leave_hypervisor_tail+0x58/0x88
(XEN)    [<002649d0>] entry.o#return_to_guest+0xc/0xb8
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion '!is_idle_vcpu(cur->vcpu)' failed at sched_credit2.c:1009
(XEN) ****************************************

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/sched_credit2.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 1f26553..ac5ef7d 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1006,7 +1006,12 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
 
         cur = CSCHED2_VCPU(curr_on_cpu(i));
 
-        ASSERT(!is_idle_vcpu(cur->vcpu));
+        /*
+         * Even if the cpu is not in rqd->idle, it may be running the
+         * idle vcpu, if it's doing tasklet work. Just skip it.
+         */
+        if ( is_idle_vcpu(cur->vcpu) )
+            continue;
 
         /* Update credits for current to see if we want to preempt. */
         burn_credits(rqd, cur, now);
@@ -1032,6 +1037,8 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
         }
     }
 
+    ASSERT(ipid == -1 || !is_idle_vcpu(curr_on_cpu(ipid)));
+
     /*
      * Only switch to another processor if the credit difference is
      * greater than the migrate resistance.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [for-4.8 PATCH] xen: credit2: fix wrong assert in runq_tickle().
  2016-11-22 10:43 [for-4.8 PATCH] xen: credit2: fix wrong assert in runq_tickle() Dario Faggioli
@ 2016-11-22 10:55 ` Wei Liu
  2016-11-22 11:21 ` Jan Beulich
  1 sibling, 0 replies; 6+ messages in thread
From: Wei Liu @ 2016-11-22 10:55 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: xen-devel, Wei Liu, George Dunlap

On Tue, Nov 22, 2016 at 11:43:35AM +0100, Dario Faggioli wrote:
> Since b047f888d489 ("xen: sched: leave CPUs doing tasklet
> work alone") a cpu executing a tasklet, is not marked as
> idle.
> 
> Therefore:
>  - avoid asserting that we can't find the idle vcpu running
>    on one of them, which is not true,
>  - avoid triggering a preemption on them (and add an assert
>    checking that).
> 
> This fixes a bug identified by OSSTest, in flight 102372
> (on ARM, but it's not at all ARM specific), where the
> ASSERT() was triggering like this:
> 
> (XEN) Xen call trace:
> (XEN)    [<0022af78>] sched_credit2.c#runq_tickle+0x3e8/0x61c (PC)
> (XEN)    [<0022aedc>] sched_credit2.c#runq_tickle+0x34c/0x61c (LR)
> (XEN)    [<0022b644>] sched_credit2.c#csched2_context_saved+0x128/0x1a4
> (XEN)    [<0023303c>] context_saved+0x7c/0xa4
> (XEN)    [<0024f660>] domain.c#schedule_tail+0x2b4/0x308
> (XEN)    [<0024faac>] context_switch+0x80/0x94
> (XEN)    [<0022ff48>] schedule.c#schedule+0x76c/0x7ec
> (XEN)    [<002338d4>] softirq.c#__do_softirq+0xcc/0xec
> (XEN)    [<00233968>] do_softirq+0x18/0x28
> (XEN)    [<00261084>] leave_hypervisor_tail+0x58/0x88
> (XEN)    [<002649d0>] entry.o#return_to_guest+0xc/0xb8
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion '!is_idle_vcpu(cur->vcpu)' failed at sched_credit2.c:1009
> (XEN) ****************************************
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>

I would like to have this fix, so in principle:

Release-acked-by: Wei Liu <wei.liu2@citrix.com>

I haven't read the content of this patch though.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [for-4.8 PATCH] xen: credit2: fix wrong assert in runq_tickle().
  2016-11-22 10:43 [for-4.8 PATCH] xen: credit2: fix wrong assert in runq_tickle() Dario Faggioli
  2016-11-22 10:55 ` Wei Liu
@ 2016-11-22 11:21 ` Jan Beulich
  2016-11-22 11:38   ` Juergen Gross
  2016-11-22 11:52   ` Dario Faggioli
  1 sibling, 2 replies; 6+ messages in thread
From: Jan Beulich @ 2016-11-22 11:21 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: xen-devel, Wei Liu, George Dunlap

>>> On 22.11.16 at 11:43, <dario.faggioli@citrix.com> wrote:
> Since b047f888d489 ("xen: sched: leave CPUs doing tasklet
> work alone") a cpu executing a tasklet, is not marked as
> idle.
> 
> Therefore:
>  - avoid asserting that we can't find the idle vcpu running
>    on one of them, which is not true,
>  - avoid triggering a preemption on them (and add an assert
>    checking that).
> 
> This fixes a bug identified by OSSTest, in flight 102372
> (on ARM, but it's not at all ARM specific), where the
> ASSERT() was triggering like this:
> 
> (XEN) Xen call trace:
> (XEN)    [<0022af78>] sched_credit2.c#runq_tickle+0x3e8/0x61c (PC)
> (XEN)    [<0022aedc>] sched_credit2.c#runq_tickle+0x34c/0x61c (LR)
> (XEN)    [<0022b644>] sched_credit2.c#csched2_context_saved+0x128/0x1a4
> (XEN)    [<0023303c>] context_saved+0x7c/0xa4
> (XEN)    [<0024f660>] domain.c#schedule_tail+0x2b4/0x308
> (XEN)    [<0024faac>] context_switch+0x80/0x94
> (XEN)    [<0022ff48>] schedule.c#schedule+0x76c/0x7ec
> (XEN)    [<002338d4>] softirq.c#__do_softirq+0xcc/0xec
> (XEN)    [<00233968>] do_softirq+0x18/0x28
> (XEN)    [<00261084>] leave_hypervisor_tail+0x58/0x88
> (XEN)    [<002649d0>] entry.o#return_to_guest+0xc/0xb8
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion '!is_idle_vcpu(cur->vcpu)' failed at sched_credit2.c:1009
> (XEN) ****************************************
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with two remarks:

> @@ -1032,6 +1037,8 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>          }
>      }
>  
> +    ASSERT(ipid == -1 || !is_idle_vcpu(curr_on_cpu(ipid)));
> +
>      /*
>       * Only switch to another processor if the credit difference is
>       * greater than the migrate resistance.

If you moved this past the if() following this comment, the ipid == -1
case would already be taken care of, simplifying the code.

And then, having looked back at the commit mentioned in the
description, that one resulted in two constructs like (taking the
code as it looks now)

            if ( cpumask_test_cpu(cpu, &rqd->idle) )
            {
                __cpumask_clear_cpu(cpu, &rqd->idle);
                ...

Is there a reason this can't or shouldn't be

            if ( __cpumask_test_and_clear_cpu(cpu, &rqd->idle) )
            {
                ...
?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [for-4.8 PATCH] xen: credit2: fix wrong assert in runq_tickle().
  2016-11-22 11:21 ` Jan Beulich
@ 2016-11-22 11:38   ` Juergen Gross
  2016-11-22 11:52   ` Dario Faggioli
  1 sibling, 0 replies; 6+ messages in thread
From: Juergen Gross @ 2016-11-22 11:38 UTC (permalink / raw)
  To: Jan Beulich, Dario Faggioli; +Cc: xen-devel, Wei Liu, George Dunlap

On 22/11/16 12:21, Jan Beulich wrote:
>>>> On 22.11.16 at 11:43, <dario.faggioli@citrix.com> wrote:
>> Since b047f888d489 ("xen: sched: leave CPUs doing tasklet
>> work alone") a cpu executing a tasklet, is not marked as
>> idle.
>>
>> Therefore:
>>  - avoid asserting that we can't find the idle vcpu running
>>    on one of them, which is not true,
>>  - avoid triggering a preemption on them (and add an assert
>>    checking that).
>>
>> This fixes a bug identified by OSSTest, in flight 102372
>> (on ARM, but it's not at all ARM specific), where the
>> ASSERT() was triggering like this:
>>
>> (XEN) Xen call trace:
>> (XEN)    [<0022af78>] sched_credit2.c#runq_tickle+0x3e8/0x61c (PC)
>> (XEN)    [<0022aedc>] sched_credit2.c#runq_tickle+0x34c/0x61c (LR)
>> (XEN)    [<0022b644>] sched_credit2.c#csched2_context_saved+0x128/0x1a4
>> (XEN)    [<0023303c>] context_saved+0x7c/0xa4
>> (XEN)    [<0024f660>] domain.c#schedule_tail+0x2b4/0x308
>> (XEN)    [<0024faac>] context_switch+0x80/0x94
>> (XEN)    [<0022ff48>] schedule.c#schedule+0x76c/0x7ec
>> (XEN)    [<002338d4>] softirq.c#__do_softirq+0xcc/0xec
>> (XEN)    [<00233968>] do_softirq+0x18/0x28
>> (XEN)    [<00261084>] leave_hypervisor_tail+0x58/0x88
>> (XEN)    [<002649d0>] entry.o#return_to_guest+0xc/0xb8
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 1:
>> (XEN) Assertion '!is_idle_vcpu(cur->vcpu)' failed at sched_credit2.c:1009
>> (XEN) ****************************************
>>
>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> with two remarks:
> 
>> @@ -1032,6 +1037,8 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>>          }
>>      }
>>  
>> +    ASSERT(ipid == -1 || !is_idle_vcpu(curr_on_cpu(ipid)));
>> +
>>      /*
>>       * Only switch to another processor if the credit difference is
>>       * greater than the migrate resistance.
> 
> If you moved this past the if() following this comment, the ipid == -1
> case would already be taken care of, simplifying the code.
> 
> And then, having looked back at the commit mentioned in the
> description, that one resulted in two constructs like (taking the
> code as it looks now)
> 
>             if ( cpumask_test_cpu(cpu, &rqd->idle) )
>             {
>                 __cpumask_clear_cpu(cpu, &rqd->idle);
>                 ...
> 
> Is there a reason this can't or shouldn't be
> 
>             if ( __cpumask_test_and_clear_cpu(cpu, &rqd->idle) )

Avoid cache line trashing?


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [for-4.8 PATCH] xen: credit2: fix wrong assert in runq_tickle().
  2016-11-22 11:21 ` Jan Beulich
  2016-11-22 11:38   ` Juergen Gross
@ 2016-11-22 11:52   ` Dario Faggioli
  2016-11-22 13:00     ` Jan Beulich
  1 sibling, 1 reply; 6+ messages in thread
From: Dario Faggioli @ 2016-11-22 11:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Wei Liu, George Dunlap


[-- Attachment #1.1: Type: text/plain, Size: 2315 bytes --]

On Tue, 2016-11-22 at 04:21 -0700, Jan Beulich wrote:
> > > > On 22.11.16 at 11:43, <dario.faggioli@citrix.com> wrote:
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
Thanks a lot for looking at the patch.

> with two remarks:
>
> [...]
> 
> If you moved this past the if() following this comment, the ipid ==
> -1
> case would already be taken care of, simplifying the code.
> 
True. Though about it, but was two minded.

I think the ASSERT is more 'descriptive' if it's kept as close as
possible to the for (and in fact, I now even regret having put a blank
line in between). But I see your point and agree that it's looking
awkward when you see it together with the following if.

So, yes, all in all, I think I like your idea better.

> And then, having looked back at the commit mentioned in the
> description, that one resulted in two constructs like (taking the
> code as it looks now)
> 
>             if ( cpumask_test_cpu(cpu, &rqd->idle) )
>             {
>                 __cpumask_clear_cpu(cpu, &rqd->idle);
>                 ...
> 
> Is there a reason this can't or shouldn't be
> 
>             if ( __cpumask_test_and_clear_cpu(cpu, &rqd->idle) )
>             {
>                 ...
> ?
> 
It can indeed. You can find it done as it is right now in other places,
in scheduling code, though, I guess for cache betterness, as Juergen is
saying. I remember not being sure which way to go, and eventually
leaning toward this one.

I'm still not sure what's best, but it'd be a cleanup/optimization, and
would IMO require, if done, more than just that (e.g., comments should
be improved). So I'd be inclined to consider this 4.9 material... But
thanks for bringing it up. :-)

So, I'm resending this patch with the ASSERT moved below the if, and
I'm keeping your Reviewed-by. Hope that's ok.

Thanks again and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [for-4.8 PATCH] xen: credit2: fix wrong assert in runq_tickle().
  2016-11-22 11:52   ` Dario Faggioli
@ 2016-11-22 13:00     ` Jan Beulich
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Beulich @ 2016-11-22 13:00 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: xen-devel, Wei Liu, George Dunlap

>>> On 22.11.16 at 12:52, <dario.faggioli@citrix.com> wrote:
> On Tue, 2016-11-22 at 04:21 -0700, Jan Beulich wrote:
>> > > > On 22.11.16 at 11:43, <dario.faggioli@citrix.com> wrote:
>> And then, having looked back at the commit mentioned in the
>> description, that one resulted in two constructs like (taking the
>> code as it looks now)
>> 
>>             if ( cpumask_test_cpu(cpu, &rqd->idle) )
>>             {
>>                 __cpumask_clear_cpu(cpu, &rqd->idle);
>>                 ...
>> 
>> Is there a reason this can't or shouldn't be
>> 
>>             if ( __cpumask_test_and_clear_cpu(cpu, &rqd->idle) )
>>             {
>>                 ...
>> ?
>> 
> It can indeed. You can find it done as it is right now in other places,
> in scheduling code, though, I guess for cache betterness, as Juergen is
> saying. I remember not being sure which way to go, and eventually
> leaning toward this one.

The question really is whether this is in fact very frequently
executed (and hence bouncing cache lines). I can't easily
tell, but I'd guess the actual schedule functions shouldn't
typically run more than once every few milliseconds.

> I'm still not sure what's best, but it'd be a cleanup/optimization, and
> would IMO require, if done, more than just that (e.g., comments should
> be improved). So I'd be inclined to consider this 4.9 material...

Of course, it was just something I've noticed while looking at
that code.

> So, I'm resending this patch with the ASSERT moved below the if, and
> I'm keeping your Reviewed-by. Hope that's ok.

And again of course.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-11-22 13:00 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-22 10:43 [for-4.8 PATCH] xen: credit2: fix wrong assert in runq_tickle() Dario Faggioli
2016-11-22 10:55 ` Wei Liu
2016-11-22 11:21 ` Jan Beulich
2016-11-22 11:38   ` Juergen Gross
2016-11-22 11:52   ` Dario Faggioli
2016-11-22 13:00     ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).