public inbox for linux-kernel@vger.kernel.org
From: Max Krasnyansky <maxk@qualcomm.com>
To: Paul Jackson <pj@sgi.com>
Cc: mingo@elte.hu, linux-kernel@vger.kernel.org, menage@google.com,
	a.p.zijlstra@chello.nl, vegard.nossum@gmail.com,
	lizf@cn.fujitsu.com
Subject: Re: [PATCH] cpuset: Rework sched domains and CPU hotplug handling (2.6.27-rc1)
Date: Mon, 04 Aug 2008 15:11:13 -0700	[thread overview]
Message-ID: <48977E81.4040207@qualcomm.com> (raw)
In-Reply-To: <20080804010033.0d1b0549.pj@sgi.com>

Paul Jackson wrote:
> So far as I can tell, the logic and design is ok.
> 
> I found some of the comments, function names and code factoring
> to be confusing or incomplete.  And I suspect that the rebuilding
> of sched domains in the case that sched_power_savings_store()
> is called in kernel/sched.c, on systems using cpusets, is not
> needed or desirable (I could easily be wrong here - beware!).
Actually we do need it. See below.

> I'm attaching a patch that has the changes that I would like
> to suggest for your consideration.  I have only recompiled this
> patch, for one configuration - x86_64 with CPUSETS.  I am hoping,
> Max, that you can complete the testing.
> 
> The patch below applies to 2.6.27-rc1, plus the first patch,
> "origin.patch" in Andrew's 2.6.27-rc1-mm1 quilt patch stack,
> plus your (Max's) latest:
>     [PATCH] cpuset: Rework sched domains and CPU hotplug handling (2.6.27-rc1)
> 
> Here's a description of most of what I noticed:
> 
>  1) Unrelated spacing tweak, changing:
>         LIST_HEAD(q);           /* queue of cpusets to be scanned*/
>     to:
>         LIST_HEAD(q);           /* queue of cpusets to be scanned */
Check.

>  2) The phrasing:
> 	Must be called with cgroup_lock held.
>     seems imprecise to me; "cgroup_lock" is the name of a wrapper
>     function, not of a lock.  The underlying lock is cgroup_mutex,
>     which is still mentioned by name in various kernel/cpuset.c
>     comments, even though cgroup_mutex is static in kernel/cgroup.c
>     So I fiddled with the wording of this phrasing.
Check.

>  3) You removed the paragraph:
> 	  * ...  May take callback_mutex during
> 	  * call due to the kfifo_alloc() and kmalloc() calls.  May nest
> 	  * a call to the get_online_cpus()/put_online_cpus() pair.
> 	  * Must not be called holding callback_mutex, because we must not
> 	  * call get_online_cpus() while holding callback_mutex.  Elsewhere
> 	  * the kernel nests callback_mutex inside get_online_cpus() calls.
> 	  * So the reverse nesting would risk an ABBA deadlock.
>    
>    But you didn't replace it with an updated locking description.
>    I expanded and tweaked various locking comments.
I think it's covered by the top-level comment that starts with
/*
 * There are two global mutexes guarding cpuset structures.  The first
 * is the main control groups cgroup_mutex, accessed via
 * cgroup_lock()/cgroup_unlock().
 ....

But I'm ok with your suggested version.

>  4) The alignment of the right side of consecutive assignment statements,
>     as in:
>         ndoms = 0;
>         doms  = NULL;
>         dattr = NULL;
>         csa   = NULL;
>     or
>         *domains    = doms;
>         *attributes = dattr;
>     is not usually done in kernel code.  I suggest obeying convention,
>     and not aligning these here either.
Check.

>  5) The rebuilding of sched domains in the sched_power_savings_store()
>     routine in kernel/sched.c struck me as inappropriate for systems
>     that were managing sched domains using cpusets.  So I made that
>     sched.c rebuild only apply if CONFIG_CPUSETS was not configured,
>     which in turn enabled me to remove rebuild_sched_domains() from
>     being externally visible outside kernel/cpuset.c
>     
>     I don't know if this change is correct, however.
Actually it is appropriate, and there is one more user of
arch_reinit_sched_domains(): the S390 topology updates.
Those things (mc_power and topology updates) have to update domain flags based
on the mc/smt power and current topology settings.
This is done in the
  __rebuild_sched_domains()
       ...
       SD_INIT(sd, ALLNODES);
       ...
       SD_INIT(sd, MC);
       ...

SD_INIT(sd, X) uses one of the SD initializers defined in include/linux/topology.h.
For example, SD_CPU_INIT() includes BALANCE_FOR_PKG_POWER, which expands to

#define BALANCE_FOR_PKG_POWER   \
        ((sched_mc_power_savings || sched_smt_power_savings) ?  \
         SD_POWERSAVINGS_BALANCE : 0)

Yes, it's kind of convoluted :). Anyway, the point is that we need to rebuild the
domains when those settings change. We could probably write a simpler version
that just iterates over the existing domains and updates the flags. Maybe some other day :)

>  6) The renaming of rebuild_sched_domains() to generate_sched_domains()
>     was only partial, and along with the added similarly named routines
>     seemed confusing to me.  Also, that rename of rebuild_sched_domains()
>     was only partially accomplished, not being done in some comments
>     and in one printk kernel warning.
> 
>     I reverted that rename.
Actually it was not much of a rename. The only rename was
rebuild_sched_domains() -> async_rebuild_sched_domains(), and yes, it looks like
I missed one or two comments.
The other stuff was basically factoring out the function that generates the
domain masks/attrs from the function that actually tells the scheduler to
rebuild them.
I'd be ok with some other name for generate_sched_domains() if you think it's
confusing.

btw With your current proposal we have
 rebuild_sched_domains()       - just generates domain masks/attrs
 async_rebuild_sched_domains() - generates domain masks/attrs and actually
                                 rebuilds the domains

That I think is more confusing. So I guess my preference would be to have the
generation part separate. And like I explained above we do need to be able to
call rebuild_sched_domains() from the scheduler code (at least at this point).

>     I also reduced by one the number of functions needed to handle
>     the asynchronous invocation of this rebuild, essentially collapsing
>     your routines rebuild_sched_domains() and rebuild_domains_callback()
>     into a single routine, named rebuild_sched_domains_thread().
> 
>     Thanks to the above change (5), the naming and structure of these
>     routines was no longer visible outside kernel/cpuset.c, making
>     this collapsing of two functions into one easier.
Yes if we did not have to call rebuild_sched_domains() from
arch_reinit_sched_domains() I would've done something similar.

Sounds like you're ok with the general approach, and as I mentioned we do need
an externally usable rebuild_sched_domains(). So how about this: I'll update the
style/comments based on your suggestions but keep the generate_sched_domains()
and partition_sched_domains() split. Or if you have a better name for
generate_sched_domains() we'll use that instead.

Max


Thread overview: 21+ messages
2008-08-01 22:59 [PATCH] cpuset: Rework sched domains and CPU hotplug handling (2.6.27-rc1) Max Krasnyansky
2008-08-02 11:39 ` Paul Jackson
2008-08-02 16:32   ` Max Krasnyansky
2008-08-03  3:51     ` Paul Jackson
2008-08-03 18:07       ` Max Krasnyansky
2008-08-04  6:00         ` Paul Jackson
2008-08-04 22:11           ` Max Krasnyansky [this message]
2008-08-05  3:56             ` Paul Jackson
2008-08-05 20:30               ` Max Krasnyansky
2008-08-05 23:05                 ` Paul Jackson
2008-08-06  3:24                   ` Max Krasnyansky
2008-08-06  3:29                     ` Paul Jackson
2008-08-06  3:53                       ` Max Krasnyansky
2008-08-06  4:28                         ` Paul Jackson
2008-08-06  5:03                           ` Max Krasnyansky
2008-08-06  5:46                             ` Paul Jackson
2008-08-06 20:20                               ` Max Krasnyansky
2008-08-06 20:29                                 ` Paul Jackson
2008-08-06 20:30                                   ` Paul Menage
2008-08-06 20:56                                     ` Paul Jackson
2008-08-06 20:36                                   ` Max Krasnyansky
