All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jack Steiner <steiner@sgi.com>
To: mingo@redhat.com, hpa@zytor.com, linux-kernel@vger.kernel.org,
	torvalds@linux-foundation.org, travis@sgi.com,
	peterz@infradead.org, drepper@redhat.com, rja@sgi.com,
	sharyath@in.ibm.com, akpm@linux-foundation.org,
	tglx@linutronix.de, kosaki.motohiro@jp.fujitsu.com,
	mingo@elte.hu
Cc: linux-tip-commits@vger.kernel.org
Subject: Re: [tip:sched/urgent] sched: sched_getaffinity(): Allow less than NR_CPUS length
Date: Mon, 15 Mar 2010 11:04:54 -0500	[thread overview]
Message-ID: <20100315160454.GA8211@sgi.com> (raw)
In-Reply-To: <tip-cd3d8031eb4311e516329aee03c79a08333141f1@git.kernel.org>

On Mon, Mar 15, 2010 at 07:43:02AM +0000, tip-bot for KOSAKI Motohiro wrote:
> Commit-ID:  cd3d8031eb4311e516329aee03c79a08333141f1
> Gitweb:     http://git.kernel.org/tip/cd3d8031eb4311e516329aee03c79a08333141f1
> Author:     KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> AuthorDate: Fri, 12 Mar 2010 16:15:36 +0900
> Committer:  Ingo Molnar <mingo@elte.hu>
> CommitDate: Mon, 15 Mar 2010 08:28:44 +0100

...
> IOW we hope they are not annoyed by this issue ...

The change looks ok but I can't reproduce the problem.

I'm running on a distro kernel that has NR_CPUS=4096.  Glibc has also has a
definition of __CPU_SETSIZE (I assume this change was made by the distro but
am not certain):

    sched.c
    	...
	#if defined _SCHED_H && !defined __cpu_set_t_defined
	# define __cpu_set_t_defined
	/* Size definition for CPU sets.  */
	# define __CPU_SETSIZE  4096
	# define __NCPUBITS     (8 * sizeof (__cpu_mask))

Your test program runs ok:

	% strace t
	...
	sched_getaffinity(0, 512,  { ffff, 0, 0, 0, 0, 0, 0, 0 }) = 64



Also note that we've run on IA64 systems with NR_CPUS=4096 for several years w/o
hitting any problems.

Bottom line. I don't think the change will affect us.


> 
> sched: sched_getaffinity(): Allow less than NR_CPUS length
> 
> [ Note, this commit changes the syscall ABI for > 1024 CPUs systems. ]
> 
> Recently, some distro decided to use NR_CPUS=4096 for mysterious reasons.
> Unfortunately, glibc sched interface has the following definition:
> 
> 	# define __CPU_SETSIZE  1024
> 	# define __NCPUBITS     (8 * sizeof (__cpu_mask))
> 	typedef unsigned long int __cpu_mask;
> 	typedef struct
> 	{
> 	  __cpu_mask __bits[__CPU_SETSIZE / __NCPUBITS];
> 	} cpu_set_t;
> 
> It mean, if NR_CPUS is bigger than 1024, cpu_set_t makes an
> ABI issue ...
> 
> More recently, Sharyathi Nagesh reported following test program makes
> misterious syscall failure:
> 
>  -----------------------------------------------------------------------
>  #define _GNU_SOURCE
>  #include<stdio.h>
>  #include<errno.h>
>  #include<sched.h>
> 
>  int main()
>  {
>      cpu_set_t set;
>      if (sched_getaffinity(0, sizeof(cpu_set_t), &set) < 0)
>          printf("\n Call is failing with:%d", errno);
>  }
>  -----------------------------------------------------------------------
> 
> Because the kernel assumes len argument of sched_getaffinity() is bigger
> than NR_CPUS. But now it is not correct.
> 
> Now we are faced with the following annoying dilemma, due to
> the limitations of the glibc interface built in years ago:
> 
>  (1) if we change glibc's __CPU_SETSIZE definition, we lost
>      binary compatibility of _all_ application.
> 
>  (2) if we don't change it, we also lost binary compatibility of
>      Sharyathi's use case.
> 
> Then, I would propse to change the rule of the len argument of
> sched_getaffinity().
> 
> Old:
> 	len should be bigger than NR_CPUS
> New:
> 	len should be bigger than maximum possible cpu id
> 
> This creates the following behavior:
> 
>  (A) In the real 4096 cpus machine, the above test program still
>      return -EINVAL.
> 
>  (B) NR_CPUS=4096 but the machine have less than 1024 cpus (almost
>      all machines in the world), the above can run successfully.
> 
> Fortunatelly, BIG SGI machine is mainly used for HPC use case. It means
> they can rebuild their programs.
> 
> IOW we hope they are not annoyed by this issue ...
> 
> Reported-by: Sharyathi Nagesh <sharyath@in.ibm.com>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Acked-by: Ulrich Drepper <drepper@redhat.com>
> Acked-by: Peter Zijlstra <peterz@infradead.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Jack Steiner <steiner@sgi.com>
> Cc: Russ Anderson <rja@sgi.com>
> Cc: Mike Travis <travis@sgi.com>
> LKML-Reference: <20100312161316.9520.A69D9226@jp.fujitsu.com>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  kernel/sched.c |   10 +++++++---
>  1 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 9ab3cd7..6eaef3d 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -4902,7 +4902,9 @@ SYSCALL_DEFINE3(sched_getaffinity, pid_t, pid, unsigned int, len,
>  	int ret;
>  	cpumask_var_t mask;
>  
> -	if (len < cpumask_size())
> +	if (len < nr_cpu_ids)
> +		return -EINVAL;
> +	if (len & (sizeof(unsigned long)-1))
>  		return -EINVAL;
>  
>  	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
> @@ -4910,10 +4912,12 @@ SYSCALL_DEFINE3(sched_getaffinity, pid_t, pid, unsigned int, len,
>  
>  	ret = sched_getaffinity(pid, mask);
>  	if (ret == 0) {
> -		if (copy_to_user(user_mask_ptr, mask, cpumask_size()))
> +		int retlen = min(len, cpumask_size());
> +
> +		if (copy_to_user(user_mask_ptr, mask, retlen))
>  			ret = -EFAULT;
>  		else
> -			ret = cpumask_size();
> +			ret = retlen;
>  	}
>  	free_cpumask_var(mask);
>  

  reply	other threads:[~2010-03-15 16:05 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-12  7:15 [PATCH] sched: sched_getaffinity() allow less than NR_CPUS length KOSAKI Motohiro
2010-03-12 16:08 ` Ulrich Drepper
2010-03-15  7:43 ` [tip:sched/urgent] sched: sched_getaffinity(): Allow " tip-bot for KOSAKI Motohiro
2010-03-15 16:04   ` Jack Steiner [this message]
2010-03-15 16:17     ` Ulrich Drepper
2010-03-16 17:10   ` Ingo Molnar
2010-03-17  0:36     ` KOSAKI Motohiro
2010-03-17 10:03       ` [tip:sched/urgent] sched: Use proper type in sched_getaffinity() tip-bot for KOSAKI Motohiro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100315160454.GA8211@sgi.com \
    --to=steiner@sgi.com \
    --cc=akpm@linux-foundation.org \
    --cc=drepper@redhat.com \
    --cc=hpa@zytor.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tip-commits@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rja@sgi.com \
    --cc=sharyath@in.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=travis@sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.