On 10/19/2015 07:23 PM, Carlos O'Donell wrote:

>> The current situation, briefly stated, is this: glibc tries to guess the
>> kernel CPU set size and rejects attempts to specify an affinity mask
>> which is larger than that, but it does not work, and glibc and the
>> kernel still silently accept CPU affinity masks with invalid bits,
>> without returning an error.  The glibc check does not provide any value
>> to applications, it just adds pointless complexity to the library.
>> Therefore, I want to remove it from glibc.
> 
> The point of the code you want to remove was to detect the case where
> the user set CPU bits outside of the maximum possible number of supported
> CPUs. AFAIK today that value is CONFIG_NR_CPUS and the various variables
> that derive from that value.

CONFIG_NR_CPUS is the absolute maximum for a specific kernel (available
at run time in /sys/devices/system/cpu/kernel_max).  The kernel lowers
the observable value if it detects the system cannot support more than a
specific number of CPUs.  This is the kernel-internal nr_cpu_ids
variable.  I don't think its value is directly exported, but it can
currently be derived from /sys/devices/system/cpu/possible.

(The difficulty of obtaining this value, and the tendency of the kernel
to replace hard compile-time limits with run-time configuration options,
makes me think it is unwise to expose these values through sysconf.)

> Can you elaborate some more on any of the false negative cases you think
> might impact user applications? For example what happens if you use the
> stock cpu_set_t size? It would seem to me that such a use keeps working
> and there is no change.

Yes, applications which worked before continue to work.  But you can now
specify an all-ones mask you have allocated, and glibc will not reject
it because it has set bits beyond the value it guessed for nr_cpu_ids.

>> Remove CPU set size checking from affinity functions [BZ #19143]
>>
>> With current kernel versions, the check does not reliably detect that
>> unavailable CPUs are requested, for these reasons:
>>
>> (1) The kernel will silently ignore non-allowed CPUs.
> 
> You mean to say that if sched_setaffinity is called with a CPU mask 
> bit set to enabled, but that cpu is not allowed for the process, then
> it will ignore the setting?

Yes, that is what the kernel does.

> This requires you run sched_getaffinity to
> verify what CPUs you're actually set to run on?

Yes, if you care about this detail.

> Is this because the
> cpuset mechanism is merged with sched_setaffinity and overrides it?

As far as I can tell, yes.  I do not know where the kernel gets the
other mask from, but it seems cgroups-related (hence the Cc:), and it
does and AND on those two masks.

>> (3) The existing probing code assumes that the CPU mask size is a
>>     power of two and at least 1024.  Neither it has to be a power
>>     of two, nor is the minimum possible value 1024, so the value
>>     determined is often too large, resulting in incorrect false
>>     negatives.
> 
> Could you explain those "false negative" cases again?

I tried to make this clearer in the revised commit message of the
attached patch.

> The goal is to keep nptl/ free from linux-isms, and AFAICT your test is
> indeed free of any linux-specific features since your code for finding
> the size of the cpu mask is generic. Is there anything I might have missed
> that would make your test linux-specific? Keep in mind that we share nptl/
> with the nacl port.

Good point.  sched_getcpu is not universally available, so I had to move
the tests to sysdeps/unix/sysv/linux.

I noticed that tst-getcpu was failing (bug 19164), so I added another
sched_setaffinity test variant that supersedes it.

>> +* sched_setaffinity, pthread_setaffinity_np no longer attempt to guess the
>> +  kernel-internal CPU set size.  This means that requests that change the
>> +  CPU affinity which failed before will now succeed.  Applications that need
> 
> Please provide at least one example of a failure which now succeeds.

I've updated the NEWS entry.

>> +/* We wave two loops running for two seconds each.  */
>> +#define TIMEOUT 8
> 
> Why two seconds?
> 
> Why two threads?
> 
> If the value of 2 seconds is arbitrary please state so in 
> a comment such that future reviewers can adjust it as they
> see fit without having to review the history of the value.

Should be clearer in the new version.

>> +++ b/posix/tst-affinity.c
>> @@ -0,0 +1,254 @@
> 
> Needs a one line test description with BZ#.

I did not include the bug number because it is a generic, non-regression
test.

>> +static int
>> +find_set_size (void)
>> +{
>> +  /* We need to use multiples of 64 because otherwise, CPU_ALLOC
>> +     over-allocates, and and we do not see all bits returned by the
>> +     kernel.  */
> 
> How does CPU_ALLOC over-allocating result in not seeing bits from 
> the kernel? Is this because you get over-allocation in CPU_ALLOC,
> but your own external count of num_cpus would be lower and it's
> num_cpus you return? Can't you rely on the result of CPU_ALLOC_SIZE
> and return that?

See find_last_cpu.  It is tricky to determine the actual size of a CPU
set just based on the macros.  I think I got it right.

It turns out the comment was incorrect, I changed the code.

> Why do we do this instead of calling sysconf to get the number
> of CPUs?

sysconf currently does not give us the proper number, and I really don't
think applications should rely on it.  This is a separate conversation,
IMHO.  I added a comment.

>> +
>> +static bool
>> +test_size (const struct conf *conf, size_t size)
>> +{
> 
> Should print PASS:/FAIL: prefix to make grepping easier.

I added info/warning/error prefixes instead.

> Should do what other tests do e.g. err |= and run each
> test on a distinct line. Makes it easier to add tests and
> disable tests while debugging.

Ah, right.

I have added tests which cover additional scenarios (cross-process
sched_* calls and cross-thread pthread_sched* calls).

Florian