public inbox for linux-kernel@vger.kernel.org
From: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
To: Paul Jackson <pj@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>,
	mingo@elte.hu, nickpiggin@yahoo.com.au, vatsa@in.ibm.com,
	Simon.Derr@bull.net, steiner@sgi.com,
	linux-kernel@vger.kernel.org, akpm@osdl.org
Subject: Re: [BUG] sched: big numa dynamic sched domain memory corruption
Date: Tue, 1 Aug 2006 12:00:02 -0700	[thread overview]
Message-ID: <20060801120002.C9822@unix-os.sc.intel.com> (raw)
In-Reply-To: <20060801012533.4192c5b4.pj@sgi.com>; from pj@sgi.com on Tue, Aug 01, 2006 at 01:25:33AM -0700

On Tue, Aug 01, 2006 at 01:25:33AM -0700, Paul Jackson wrote:
> I wish you well on any further code improvements you have planned for
> this code.  It's tough to understand, with such issues as many #ifdef's,
> an interesting memory layout of the key sched domain arrays that I
> didn't see described much in the comments, and a variety of memory
> allocation calls that are tough to unravel on error.  Portions of
> the code could use some more comments, explaining what is going on.
> For example, I still haven't figured exactly what 'cpu_power' means.

I will add some information to Documentation/sched-domains.txt as well as
some comments to the code where appropriate. I did some cleanup of the
code, but unfortunately that was dropped because of some issues. I will
repost that cleanup patch as well.

> 
> The allocations of sched_group_allnodes, sched_group_phys and
> sched_group_core are -big- on our ia64 SN2 systems (1024 CPUs),
> and could fail once a system has been up for a while and is
> getting memory tight and fragmented.

I have to agree with you. I have an idea (basically, passing the cpu_map
info to the functions which determine the groups) to solve this issue.
Let me work on it and post a fix.

> It is not obvious to me from the code or comments just how sched
> domains are arranged on various large systems with hyper-threading
> (SMT) and/or multiple cores (MC) and/or multiple processor packages
> per node, and how scheduling is affected by all this.

Enabling SCHED_DOMAIN_DEBUG should at least show how the sched domains
and groups are arranged. Adding an example to Documentation might
be a good idea.
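If memory serves, in kernels of this era SCHED_DOMAIN_DEBUG was not a
Kconfig option but a compile-time switch near the top of kernel/sched.c,
enabled by changing the #undef to a #define; the exact comment wording
below is approximate:

```c
/* kernel/sched.c (circa 2.6.17) -- flip this to "#define" to get the
 * sched-domain/group layout dumped to the console at boot and whenever
 * the domains are rebuilt (e.g. on cpu_exclusive cpuset changes): */
#undef SCHED_DOMAIN_DEBUG
```

On a large box the resulting dump is long, but it is the quickest way to
see how the SMT/MC/phys/node levels were actually stitched together.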

> 
> This was about the third bug that has come by in it -- which I
> in particular notice when it is someone playing with cpu_exclusive
> cpusets who hits the bug.  Any kernel backtrace with 'cpuset' buried in
> it tends to migrate to my inbox.  This latest bug was particularly
> nasty, as is usually the case with random memory corruption bugs,
> costing us a bunch of hours.
> 
> Good luck.
> 
> If you are aware of any other fixes/patches besides the above that us
> big honkin numa iron SLES10 users need for reliable operation, let me
> know.

Will keep you in the loop.

thanks,
suresh


Thread overview: 13+ messages
2006-07-31  7:07 [BUG] sched: big numa dynamic sched domain memory corruption Paul Jackson
2006-07-31  7:12 ` Ingo Molnar
2006-07-31 16:04   ` Siddha, Suresh B
2006-07-31 16:54     ` Paul Jackson
2006-07-31 17:15       ` Siddha, Suresh B
2006-08-02  6:57         ` Paul Jackson
2006-08-02 21:36           ` Siddha, Suresh B
2006-08-02 21:58             ` Paul Jackson
2006-08-06  1:38             ` Paul Jackson
2006-07-31 17:04     ` Paul Jackson
2006-08-01  8:25     ` Paul Jackson
2006-08-01 19:00       ` Siddha, Suresh B [this message]
2006-08-01 19:16         ` Paul Jackson
