All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	George Dunlap <George.Dunlap@citrix.com>,
	"JBeulich@suse.com" <JBeulich@suse.com>,
	"uma.sharma523@gmail.com" <uma.sharma523@gmail.com>
Subject: Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
Date: Thu, 19 Mar 2015 10:03:57 +0000	[thread overview]
Message-ID: <1426759435.2560.104.camel@citrix.com> (raw)
In-Reply-To: <5509B066.5050302@eu.citrix.com>


[-- Attachment #1.1.1: Type: text/plain, Size: 9070 bytes --]

On Wed, 2015-03-18 at 17:05 +0000, George Dunlap wrote:
> On 03/18/2015 04:49 PM, Dario Faggioli wrote:

> >> If this would work, I think it would be a lot better.  It would remove
> >> the credit2-specific callback, and it would mean we wouldn't need an
> >> additional test to filter out
> >>
> > I see. Ok, I guess I can give this a try. If it does not explode,
> > because some weird dependency we can't see right now, we can either try
> > harder to track it, or use the other solution outlined above as a
> > fallback.
> 
> If you could look into this, that would be awesome. :-)
> 
So, I did look into this. In summary, I think it could work, but some
rework is required. Here's my findings in some more details.

So, with the quick-&-dirty(TM) patch attached to this email, Credit2
works-a-charm:

root@Zhaman:~# xl  dmesg |grep runqueue
(XEN) Adding cpu 0 to runqueue 0
(XEN)  First cpu on runqueue, activating
(XEN) Adding cpu 1 to runqueue 1
(XEN)  First cpu on runqueue, activating
(XEN) Adding cpu 2 to runqueue 1
(XEN) Adding cpu 3 to runqueue 1
(XEN) Adding cpu 4 to runqueue 1
(XEN) Adding cpu 5 to runqueue 1
(XEN) Adding cpu 6 to runqueue 1
(XEN) Adding cpu 7 to runqueue 1
(XEN) Adding cpu 8 to runqueue 0
(XEN) Adding cpu 9 to runqueue 0
(XEN) Adding cpu 10 to runqueue 0
(XEN) Adding cpu 11 to runqueue 0
(XEN) Adding cpu 12 to runqueue 0
(XEN) Adding cpu 13 to runqueue 0
(XEN) Adding cpu 14 to runqueue 0
(XEN) Adding cpu 15 to runqueue 0

(Yes, cpu0 is assigned to the wrong runqueue, but that is because I'm
ignoring the system_state <==> boot_cpu_data thing for now, so let's
also ignore this, it'll be fixed).

So, things are working, less complexity (i.e., no ad-hoc callback for
credit2), less code (i.e., lots of '-' in the patch), outside is warm
and sunny, spring is coming, birds are singing, etc... :-D :-D

Unfortunately, the patch also makes Credit1 go splat like this:

(XEN) Xen call trace:
(XEN)    [<ffff82d08012d14c>] check_lock+0x37/0x3b
(XEN)    [<ffff82d08012d17b>] _spin_lock+0x11/0x54
(XEN)    [<ffff82d08013498f>] xmem_pool_alloc+0x11c/0x46c
(XEN)    [<ffff82d0801350a6>] _xmalloc+0x106/0x316
(XEN)    [<ffff82d0801352c7>] _xzalloc+0x11/0x32
(XEN)    [<ffff82d08011f5ae>] csched_alloc_pdata+0x47/0x1c6
(XEN)    [<ffff82d0801293d6>] cpu_schedule_callback+0x5a/0xcc
(XEN)    [<ffff82d08011ab8a>] notifier_call_chain+0x67/0x87
(XEN)    [<ffff82d08010169a>] notify_cpu_starting+0x1c/0x24
(XEN)    [<ffff82d080189f5b>] start_secondary+0x203/0x256
(XEN) 
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Xen BUG at spinlock.c:48
(XEN) ****************************************

(...and, suddenly, birds shut up :-( )

That is because pool->lock in xmem_pool_alloc() wants to find IRQs
enabled when acquired, as it's been acquired with IRQs enabled before,
and we don't mix IRQ enabled and disabled spinlocks (that's what the BUG
at spinlock:48 is for).

The difference between before and after the patch is that, bebore,
csched_alloc_pdata() is called during CPU_UP_PREPARE, after, during
CPU_STARTING. More specifically, CPU_UP_PREPARE callbacks are invoked as
a consequence of cpu_up() being called in a loop in __start_xen(), and
it itself calls notifier_call_chain(..CPU_UP_PREPARE..). This means they
run:
 - on the boot cpu;
 - with interrupt enabled (see how the call to local_irq_enable() in
   __start_xen() is *before* the for_each_present_cpu() { cpu_up(i); }
   loop).

OTOH, CPU_STARTING callbacks run:
 - on the cpu being brought up;
 - with interrupt disabled (see how the call to local_irq_enable(), in
   start_secondary(), is *after* the invocation of
   notify_cpu_starting()).

Here we are. And the reason why things works ok in Credit2, is that
csched2_alloc_pdata() doesn't really allocate anything! In fact, in
general, handling alloc_pdata() during CPU_STARTING would mean that we
can't allocate any memory which, given the name of the function, would
look rather odd. :-)

Nevertheless I see the value of doing so, and hence I think what we
could do would be to introduce a new hook in the scheduler interface,
called .init_pdata or .init_pcpu, and, in sched_*.c, split the
allocation and the initialization parts. The former will be handled
during CPU_UP_PREPARE, when allocation is possible, the latter during
CPU_STARTING, when we have more info available to perform actual
initializations.

For Credit2, alloc_pdata() wouldn't even need to exist, and all the work
will be done in init_pcpu(), so less complexity than now, and no need
for the ad-hoc starting callback inside sched_credit2.c. For Credit1,
both will be required, but the added complexity in sched_credit.c
doesn't look too bad at all to me.

Of couse, I still have to code it up, and see whether this plays well
with cpupools, cpu on/offlining, etc., but I'd like to hear whether you
like the idea, before getting to do that. :-)

Let me know what you think.

Regards,
Dario
---
 sched_credit2.c |   63 ++------------------------------------------------------
 schedule.c      |   10 +++++---
 2 files changed, 9 insertions(+), 64 deletions(-)
---
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ad0a5d4..ac7a7f2 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1918,7 +1918,8 @@ static void deactivate_runqueue(struct csched2_private *prv, int rqi)
     cpumask_clear_cpu(rqi, &prv->active_queues);
 }
 
-static void init_pcpu(const struct scheduler *ops, int cpu)
+static void *
+csched2_alloc_pdata(const struct scheduler *ops, int cpu)
 {
     int rqi;
     unsigned long flags;
@@ -1932,7 +1933,7 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
     {
         printk("%s: Strange, cpu %d already initialized!\n", __func__, cpu);
         spin_unlock_irqrestore(&prv->lock, flags);
-        return;
+        return NULL;
     }
 
     /* Figure out which runqueue to put it in */
@@ -1980,20 +1981,6 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
 
     spin_unlock_irqrestore(&prv->lock, flags);
 
-    return;
-}
-
-static void *
-csched2_alloc_pdata(const struct scheduler *ops, int cpu)
-{
-    /* Check to see if the cpu is online yet */
-    /* Note: cpu 0 doesn't get a STARTING callback */
-    if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
-        init_pcpu(ops, cpu);
-    else
-        printk("%s: cpu %d not online yet, deferring initializatgion\n",
-               __func__, cpu);
-
     return (void *)1;
 }
 
@@ -2046,49 +2033,6 @@ csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 }
 
 static int
-csched2_cpu_starting(int cpu)
-{
-    struct scheduler *ops;
-
-    /* Hope this is safe from cpupools switching things around. :-) */
-    ops = per_cpu(scheduler, cpu);
-
-    if ( ops->alloc_pdata == csched2_alloc_pdata )
-        init_pcpu(ops, cpu);
-
-    return NOTIFY_DONE;
-}
-
-static int cpu_credit2_callback(
-    struct notifier_block *nfb, unsigned long action, void *hcpu)
-{
-    unsigned int cpu = (unsigned long)hcpu;
-    int rc = 0;
-
-    switch ( action )
-    {
-    case CPU_STARTING:
-        csched2_cpu_starting(cpu);
-        break;
-    default:
-        break;
-    }
-
-    return !rc ? NOTIFY_DONE : notifier_from_errno(rc);
-}
-
-static struct notifier_block cpu_credit2_nfb = {
-    .notifier_call = cpu_credit2_callback
-};
-
-static int
-csched2_global_init(void)
-{
-    register_cpu_notifier(&cpu_credit2_nfb);
-    return 0;
-}
-
-static int
 csched2_init(struct scheduler *ops)
 {
     int i;
@@ -2168,7 +2112,6 @@ const struct scheduler sched_credit2_def = {
 
     .dump_cpu_state = csched2_dump_pcpu,
     .dump_settings  = csched2_dump,
-    .global_init    = csched2_global_init,
     .init           = csched2_init,
     .deinit         = csched2_deinit,
     .alloc_vdata    = csched2_alloc_vdata,
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index ef79847..5e7cdc9 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1327,10 +1327,6 @@ static int cpu_schedule_up(unsigned int cpu)
     if ( idle_vcpu[cpu] == NULL )
         return -ENOMEM;
 
-    if ( (ops.alloc_pdata != NULL) &&
-         ((sd->sched_priv = ops.alloc_pdata(&ops, cpu)) == NULL) )
-        return -ENOMEM;
-
     return 0;
 }
 
@@ -1348,10 +1344,16 @@ static int cpu_schedule_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
     unsigned int cpu = (unsigned long)hcpu;
+    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     int rc = 0;
 
     switch ( action )
     {
+    case CPU_STARTING:
+        if ( (ops.alloc_pdata != NULL) &&
+             ((sd->sched_priv = ops.alloc_pdata(&ops, cpu)) == NULL) )
+            return -ENOMEM;
+        break;
     case CPU_UP_PREPARE:
         rc = cpu_schedule_up(cpu);
         break;


[-- Attachment #1.1.2: xen-sched-cpustarting.patch --]
[-- Type: text/x-patch, Size: 3695 bytes --]

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ad0a5d4..ac7a7f2 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1918,7 +1918,8 @@ static void deactivate_runqueue(struct csched2_private *prv, int rqi)
     cpumask_clear_cpu(rqi, &prv->active_queues);
 }
 
-static void init_pcpu(const struct scheduler *ops, int cpu)
+static void *
+csched2_alloc_pdata(const struct scheduler *ops, int cpu)
 {
     int rqi;
     unsigned long flags;
@@ -1932,7 +1933,7 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
     {
         printk("%s: Strange, cpu %d already initialized!\n", __func__, cpu);
         spin_unlock_irqrestore(&prv->lock, flags);
-        return;
+        return NULL;
     }
 
     /* Figure out which runqueue to put it in */
@@ -1980,20 +1981,6 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
 
     spin_unlock_irqrestore(&prv->lock, flags);
 
-    return;
-}
-
-static void *
-csched2_alloc_pdata(const struct scheduler *ops, int cpu)
-{
-    /* Check to see if the cpu is online yet */
-    /* Note: cpu 0 doesn't get a STARTING callback */
-    if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
-        init_pcpu(ops, cpu);
-    else
-        printk("%s: cpu %d not online yet, deferring initializatgion\n",
-               __func__, cpu);
-
     return (void *)1;
 }
 
@@ -2046,49 +2033,6 @@ csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 }
 
 static int
-csched2_cpu_starting(int cpu)
-{
-    struct scheduler *ops;
-
-    /* Hope this is safe from cpupools switching things around. :-) */
-    ops = per_cpu(scheduler, cpu);
-
-    if ( ops->alloc_pdata == csched2_alloc_pdata )
-        init_pcpu(ops, cpu);
-
-    return NOTIFY_DONE;
-}
-
-static int cpu_credit2_callback(
-    struct notifier_block *nfb, unsigned long action, void *hcpu)
-{
-    unsigned int cpu = (unsigned long)hcpu;
-    int rc = 0;
-
-    switch ( action )
-    {
-    case CPU_STARTING:
-        csched2_cpu_starting(cpu);
-        break;
-    default:
-        break;
-    }
-
-    return !rc ? NOTIFY_DONE : notifier_from_errno(rc);
-}
-
-static struct notifier_block cpu_credit2_nfb = {
-    .notifier_call = cpu_credit2_callback
-};
-
-static int
-csched2_global_init(void)
-{
-    register_cpu_notifier(&cpu_credit2_nfb);
-    return 0;
-}
-
-static int
 csched2_init(struct scheduler *ops)
 {
     int i;
@@ -2168,7 +2112,6 @@ const struct scheduler sched_credit2_def = {
 
     .dump_cpu_state = csched2_dump_pcpu,
     .dump_settings  = csched2_dump,
-    .global_init    = csched2_global_init,
     .init           = csched2_init,
     .deinit         = csched2_deinit,
     .alloc_vdata    = csched2_alloc_vdata,
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index ef79847..5e7cdc9 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1327,10 +1327,6 @@ static int cpu_schedule_up(unsigned int cpu)
     if ( idle_vcpu[cpu] == NULL )
         return -ENOMEM;
 
-    if ( (ops.alloc_pdata != NULL) &&
-         ((sd->sched_priv = ops.alloc_pdata(&ops, cpu)) == NULL) )
-        return -ENOMEM;
-
     return 0;
 }
 
@@ -1348,10 +1344,16 @@ static int cpu_schedule_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
     unsigned int cpu = (unsigned long)hcpu;
+    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     int rc = 0;
 
     switch ( action )
     {
+    case CPU_STARTING:
+        if ( (ops.alloc_pdata != NULL) &&
+             ((sd->sched_priv = ops.alloc_pdata(&ops, cpu)) == NULL) )
+            return -ENOMEM;
+        break;
     case CPU_UP_PREPARE:
         rc = cpu_schedule_up(cpu);
         break;

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  reply	other threads:[~2015-03-19 10:03 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-13 18:11 [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code Uma Sharma
2015-03-13 18:29 ` Andrew Cooper
2015-03-13 19:13   ` George Dunlap
2015-03-16 12:48     ` Jan Beulich
2015-03-16 12:51       ` George Dunlap
2015-03-16 12:56         ` Jan Beulich
2015-03-16 13:26           ` Dario Faggioli
2015-03-17 18:18           ` Dario Faggioli
2015-03-18  7:56             ` Jan Beulich
2015-03-18  8:53               ` Dario Faggioli
2015-03-18 15:26                 ` George Dunlap
2015-03-18 15:59                   ` Jan Beulich
2015-03-18 16:08                     ` George Dunlap
2015-03-18 16:30                       ` Dario Faggioli
2015-03-18 16:49                   ` Dario Faggioli
2015-03-18 17:05                     ` George Dunlap
2015-03-19 10:03                       ` Dario Faggioli [this message]
2015-03-19 10:50                         ` Jan Beulich
2015-03-19 11:23                           ` Dario Faggioli
2015-03-19 11:40                           ` George Dunlap
2015-03-19 12:29                             ` Dario Faggioli
2015-03-19 12:35                               ` George Dunlap
2015-03-19 13:00                                 ` Dario Faggioli
2015-03-16 12:45 ` Jan Beulich
2015-03-16 12:49 ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1426759435.2560.104.camel@citrix.com \
    --to=dario.faggioli@citrix.com \
    --cc=Andrew.Cooper3@citrix.com \
    --cc=George.Dunlap@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=uma.sharma523@gmail.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.