From: Max Krasnyansky <maxk@qualcomm.com>
To: Paul Jackson <pj@sgi.com>
Cc: mingo@elte.hu, linux-kernel@vger.kernel.org, menage@google.com,
a.p.zijlstra@chello.nl, vegard.nossum@gmail.com,
lizf@cn.fujitsu.com
Subject: Re: [PATCH] cpuset: Rework sched domains and CPU hotplug handling (2.6.27-rc1)
Date: Mon, 04 Aug 2008 15:11:13 -0700 [thread overview]
Message-ID: <48977E81.4040207@qualcomm.com> (raw)
In-Reply-To: <20080804010033.0d1b0549.pj@sgi.com>
Paul Jackson wrote:
> So far as I can tell, the logic and design is ok.
>
> I found some of the comments, function names and code factoring
> to be confusing or incomplete. And I suspect that the rebuilding
> of sched domains in the case that sched_power_savings_store()
> is called in kernel/sched.c, on systems using cpusets, is not
> needed or desirable (I could easily be wrong here - beware!).
Actually we do need it. See below.
> I'm attaching a patch that has the changes that I would like
> to suggest for your consideration. I have only recompiled this
> patch, for one configuration - x86_64 with CPUSETS. I am hoping,
> Max, that you can complete the testing.
>
> The patch below applies to 2.6.27-rc1, plus the first patch,
> "origin.patch" in Andrew's 2.6.27-rc1-mm1 quilt patch stack,
> plus your (Max's) latest:
> [PATCH] cpuset: Rework sched domains and CPU hotplug handling (2.6.27-rc1)
>
> Here's a description of most of what I noticed:
>
> 1) Unrelated spacing tweak, changing:
> LIST_HEAD(q); /* queue of cpusets to be scanned*/
> to:
> LIST_HEAD(q); /* queue of cpusets to be scanned */
Check.
> 2) The phrasing:
> Must be called with cgroup_lock held.
> seems imprecise to me; "cgroup_lock" is the name of a wrapper
> function, not of a lock. The underlying lock is cgroup_mutex,
> which is still mentioned by name in various kernel/cpuset.c
> comments, even though cgroup_mutex is static in kernel/cgroup.c
> So I fiddled with the wording of this phrasing.
Check.
> 3) You removed the paragraph:
> * ... May take callback_mutex during
> * call due to the kfifo_alloc() and kmalloc() calls. May nest
> * a call to the get_online_cpus()/put_online_cpus() pair.
> * Must not be called holding callback_mutex, because we must not
> * call get_online_cpus() while holding callback_mutex. Elsewhere
> * the kernel nests callback_mutex inside get_online_cpus() calls.
> * So the reverse nesting would risk an ABBA deadlock.
>
> But you didn't replace it with an updated locking description.
> I expanded and tweaked various locking comments.
I think it it's covered by the top level comment that starts with
/*
* There are two global mutexes guarding cpuset structures. The first
* is the main control groups cgroup_mutex, accessed via
* cgroup_lock()/cgroup_unlock().
....
But I'm ok with your suggested version.
> 4) The alignment of the right side of consecutive assignment statements,
> as in:
> ndoms = 0;
> doms = NULL;
> dattr = NULL;
> csa = NULL;
> or
> *domains = doms;
> *attributes = dattr;
> is not usually done in kernel code. I suggest obeying convention,
> and not aligning these here either.
Check.
> 5) The rebuilding of sched domains in the sched_power_savings_store()
> routine in kernel/sched.c struck me as inappropriate for systems
> that were managing sched domains using cpusets. So I made that
> sched.c rebuild only apply if CONFIG_CPUSETS was not configured,
> which in turn enabled me to remove rebuild_sched_domains() from
> being externally visible outside kernel/cpuset.c
>
> I don't know if this change is correct, however.
Actually it is appropriate, and there is one more user of the
arch_reinit_sched_domains() which is S390 topology updates.
Those things (mc_power and topology updates) have to update domain flags based
on the mc/smt power and current topology settings.
This is done in the
__rebuild_sched_domains()
...
SD_INIT(sd, ALLNODES);
...
SD_INIT(sd, MC);
...
SD_INIT(sd,X) uses one of SD initializers defined in the include/linux/topology.h
For example SD_CPU_INIT() includes BALANCE_FOR_PKG_POWER which expands to
#define BALANCE_FOR_PKG_POWER \
((sched_mc_power_savings || sched_smt_power_savings) ? \
SD_POWERSAVINGS_BALANCE : 0)
Yes it's kind convoluted :). Anyway, the point is that we need to rebuild the
domains when those settings change. We could probably write a simpler version
that just iterates existing domains and updates the flags. Maybe some other dat :)
> 6) The renaming of rebuild_sched_domains() to generate_sched_domains()
> was only partial, and along with the added similarly named routines
> seemed confusing to me. Also, that rename of rebuild_sched_domains()
> was only partially accomplished, not being done in some comments
> and in one printk kernel warning.
>
> I reverted that rename.
Actually it was not much of a rename. The only rename was
rebuild_sched_domains() -> async_rebuild_sched_domains() and yes looks like I
missed one or two comment.
The other stuff was basically factoring out the function that generates the
domain masks/attrs from the function that actually tells the scheduler to
rebuild them.
I'd be ok with some other name for generate_sched_domains() if you think it's
confusing.
btw With your current proposal we have
rebuild_sched_domains() - just generates domain masks/attrs
async_rebuild_sched_domains() - generates domain masks/attrs and actually
rebuilds the domains
That I think is more confusing. So I guess my preference would be to have the
generation part separate. And like I explained above we do need to be able to
call rebuild_sched_domains() from the scheduler code (at least at this point).
> I also reduced by one the number of functions needed to handle
> the asynchronous invocation of this rebuild, essentially collapsing
> your routines rebuild_sched_domains() and rebuild_domains_callback()
> into a single routine, named rebuild_sched_domains_thread().
>
> Thanks to the above change (5), the naming and structure of these
> routines was no longer visible outside kernel/cpuset.c, making
> this collapsing of two functions into one easier.
Yes if we did not have to call rebuild_sched_domains() from
arch_reinit_sched_domains() I would've done something similar.
Sounds like you're ok with the general approach and as I mentioned we do need
externally usable rebuild_sched_domains(). So how about this. I'll update
style/comments based on your suggestions but keep the generate_sched_domains()
and partition_sched_domains() split. Or if you have a better name for
generate_sched_domains() we'll use that instead.
Max
next prev parent reply other threads:[~2008-08-04 22:11 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-08-01 22:59 [PATCH] cpuset: Rework sched domains and CPU hotplug handling (2.6.27-rc1) Max Krasnyansky
2008-08-02 11:39 ` Paul Jackson
2008-08-02 16:32 ` Max Krasnyansky
2008-08-03 3:51 ` Paul Jackson
2008-08-03 18:07 ` Max Krasnyansky
2008-08-04 6:00 ` Paul Jackson
2008-08-04 22:11 ` Max Krasnyansky [this message]
2008-08-05 3:56 ` Paul Jackson
2008-08-05 20:30 ` Max Krasnyansky
2008-08-05 23:05 ` Paul Jackson
2008-08-06 3:24 ` Max Krasnyansky
2008-08-06 3:29 ` Paul Jackson
2008-08-06 3:53 ` Max Krasnyansky
2008-08-06 4:28 ` Paul Jackson
2008-08-06 5:03 ` Max Krasnyansky
2008-08-06 5:46 ` Paul Jackson
2008-08-06 20:20 ` Max Krasnyansky
2008-08-06 20:29 ` Paul Jackson
2008-08-06 20:30 ` Paul Menage
2008-08-06 20:56 ` Paul Jackson
2008-08-06 20:36 ` Max Krasnyansky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=48977E81.4040207@qualcomm.com \
--to=maxk@qualcomm.com \
--cc=a.p.zijlstra@chello.nl \
--cc=linux-kernel@vger.kernel.org \
--cc=lizf@cn.fujitsu.com \
--cc=menage@google.com \
--cc=mingo@elte.hu \
--cc=pj@sgi.com \
--cc=vegard.nossum@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.