From: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
To: Paul Jackson <pj@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>,
	mingo@elte.hu, nickpiggin@yahoo.com.au, vatsa@in.ibm.com,
	Simon.Derr@bull.net, steiner@sgi.com,
	linux-kernel@vger.kernel.org, akpm@osdl.org
Subject: Re: [BUG] sched: big numa dynamic sched domain memory corruption
Date: Tue, 1 Aug 2006 12:00:02 -0700
Message-ID: <20060801120002.C9822@unix-os.sc.intel.com>
In-Reply-To: <20060801012533.4192c5b4.pj@sgi.com>; from pj@sgi.com on Tue, Aug 01, 2006 at 01:25:33AM -0700
On Tue, Aug 01, 2006 at 01:25:33AM -0700, Paul Jackson wrote:
> I wish you well on any further code improvements you have planned for
> this code. It's tough to understand, with such issues as many #ifdef's,
> an interesting memory layout of the key sched domain arrays that I
> didn't see described much in the comments, and a variety of memory
> allocation calls that are tough to unravel on error. Portions of
> the code could use some more comments, explaining what is going on.
> For example, I still haven't figured exactly what 'cpu_power' means.
I will add some info to Documentation/sched-domains.txt as well as some
comments to the code where appropriate. I did some cleanup of the code,
but unfortunately that got dropped because of some issues. I will repost
that cleanup patch as well.
>
> The allocations of sched_group_allnodes, sched_group_phys and
> sched_group_core are -big- on our ia64 SN2 systems (1024 CPUs),
> and could fail once a system has been up for a while and is
> getting memory tight and fragmented.
I have to agree with you. I have an idea (basically passing cpu_map info
to the functions which determine the group) to solve this issue. Let me
work on it and post a fix.
> It is not obvious to me from the code or comments just how sched
> domains are arranged on various large systems with hyper-threading
> (SMT) and/or multiple cores (MC) and/or multiple processor packages
> per node, and how scheduling is affected by all this.
Enabling SCHED_DOMAIN_DEBUG should at least show how sched domains
and groups are arranged. Adding an example in Documentation might
be a good idea.
>
> This was about the third bug that has come by in it -- which I
> in particular notice when it is someone playing with cpu_exclusive
> cpusets who hits the bug. Any kernel backtrace with 'cpuset' buried in
> it tends to migrate to my inbox. This latest bug was particularly
> nasty, as is usually the case with random memory corruption bugs,
> costing us a bunch of hours.
>
> Good luck.
>
> If you are aware of any other fixes/patches besides the above that us
> big honkin numa iron SLES10 users need for reliable operation, let me
> know.
Will keep you in the loop.
thanks,
suresh