All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paul Jackson <pj@sgi.com>
To: Paul Menage <menage@google.com>
Cc: rientjes@google.com, nickpiggin@yahoo.com.au,
	a.p.zijlstra@chello.nl, balbir@linux.vnet.ibm.com,
	linux-kernel@vger.kernel.org, clg@fr.ibm.com,
	ebiederm@xmission.com, containers@lists.osdl.org,
	serue@us.ibm.com, svaidy@linux.vnet.ibm.com,
	akpm@linux-foundation.org, xemul@openvz.org
Subject: Re: [RFC] cpuset update_cgroup_cpus_allowed
Date: Mon, 15 Oct 2007 19:34:39 -0700	[thread overview]
Message-ID: <20071015193439.fe67bc4d.pj@sgi.com> (raw)
In-Reply-To: <471403BF.4090203@google.com>

> currently against an older kernel 

ah .. which older kernel?

I tried it against the broken out 2.6.23-rc8-mm2 patch set,
inserting it before the task-containersv11-* patches, but
that blew up on me - three rejected hunks.

Any chance of getting this against a current cgroup (aka
container) kernel?

Could you use the diff --show-c-function option when composing
patches - they're easier to read that way - thanks.

+	if (!retval) {
+		cpus_allowed = cpuset_cpus_allowed(p);
+		if (!cpus_subset(new_mask, cpus_allowed)) {
+			/*
+			 * We must have raced with a concurrent cpuset
+			 * update. Just reset the cpus_allowed to the
+			 * cpuset's cpus_allowed
+			 */
+			new_mask = cpus_allowed;

This narrows the race, perhaps sufficiently, but I don't see that it
guarantees closure.  Memory accesses to two different locations are not
guaranteed to be ordered across nodes, as best I recall.  The second
line above, that rereads the cpuset cpus_allowed, could get an old
value, in essence.

	cpuset update task		sched_setaffinity task
	------------------		----------------------

	A. write cpuset	[Q]		V. read cpuset [Q]
	B. read task [P]		W. check ok
	C. write task [P]		X. write task [P]
					Y. reread cpuset [Q]
					Z. check ok again

Two memory locations:
	[P] the cpus_allowed mask in the task_struct of the
		task doing the sched_setaffinity call.
	[Q] the cpus_allowed mask in the cpuset of the cpuset
		to which the sched_setaffinity task is attached.

Even though, from the perspective of location [P], both B. and C.
happened before X., still from the perspective of location [Q] the
rereading in Y. could return the value the cpuset cpus_allowed had
before the write in A.  This could result in a task running with
a cpus_allowed that was totally outside its cpusets cpus_allowed.

I will grant that this is a narrow window.  I won't loose much sleep
over it.

>  - uses a priority heap to pick the processes to act on, based on start time

This adds a fair bit of code and complexity, relative to my patch.
This I do loose more sleep over.  There has to be a compelling
reason for doing this.

The point that David raises, regarding the interaction of this with
hotplug, seems to be a compelling reason for doing -something-
different than my patch proposal.

I don't know yet if it compels us to this much code, however.

Any chance you could provide a patch that works against cgroups?

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

  reply	other threads:[~2007-10-16  2:34 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-10-15  7:11 [RFC] cpuset update_cgroup_cpus_allowed Paul Jackson
2007-10-15 18:49 ` David Rientjes
2007-10-16  2:32   ` Paul Jackson
2007-10-16  6:07   ` Paul Jackson
2007-10-16  6:21     ` David Rientjes
2007-10-16  9:16       ` Paul Jackson
2007-10-16 18:27         ` David Rientjes
2007-10-16 23:14           ` Paul Jackson
2007-10-15 21:24 ` Paul Menage
2007-10-16  0:16   ` Paul Jackson
     [not found]     ` <20071015171636.6213bf43.pj-sJ/iWh9BUns@public.gmane.org>
2007-10-16  0:20       ` Paul Menage
2007-10-16  0:20         ` Paul Menage
2007-10-16  2:34         ` Paul Jackson [this message]
2007-10-16  5:12           ` Paul Menage
2007-10-16  5:20             ` Paul Jackson
2007-10-16 10:07           ` Paul Menage

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20071015193439.fe67bc4d.pj@sgi.com \
    --to=pj@sgi.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=clg@fr.ibm.com \
    --cc=containers@lists.osdl.org \
    --cc=ebiederm@xmission.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=menage@google.com \
    --cc=nickpiggin@yahoo.com.au \
    --cc=rientjes@google.com \
    --cc=serue@us.ibm.com \
    --cc=svaidy@linux.vnet.ibm.com \
    --cc=xemul@openvz.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.