On 10/19/2015 07:23 PM, Carlos O'Donell wrote: >> The current situation, briefly stated, is this: glibc tries to guess the >> kernel CPU set size and rejects attempts to specify an affinity mask >> which is larger than that, but it does not work, and glibc and the >> kernel still silently accept CPU affinity masks with invalid bits, >> without returning an error. The glibc check does not provide any value >> to applications, it just adds pointless complexity to the library. >> Therefore, I want to remove it from glibc. > > The point of the code you want to remove was to detect the case where > the user set CPU bits outside of the maximum possible number of supported > CPUs. AFAIK today that value is CONFIG_NR_CPUS and the various variables > that derive from that value. CONFIG_NR_CPUS is the absolute maximum for a specific kernel (available at run time in /sys/devices/system/cpu/kernel_max). The kernel lowers the observable value if it detects the system cannot support more than a specific number of CPUs. This is the kernel-internal nr_cpu_ids variable. I don't think its value is directly exported, but it can currently be derived from /sys/devices/system/cpu/possible. (The difficulty of obtaining this value, and the tendency of the kernel to replace hard compile-time limits with run-time configuration options, makes me think it is unwise to expose these values through sysconf.) > Can you elaborate some more on any of the false negative cases you think > might impact user applications? For example what happens if you use the > stock cpu_set_t size? It would seem to me that such a use keeps working > and there is no change. Yes, applications which worked before continue to work. But you can now specify an all-ones mask you have allocated, and glibc will not reject it because it has set bits beyond the value it guessed for nr_cpu_ids. >> Remove CPU set size checking from affinity functions [BZ #19143] >> >> With current kernel versions, the check does not reliably detect that >> unavailable CPUs are requested, for these reasons: >> >> (1) The kernel will silently ignore non-allowed CPUs. > > You mean to say that if sched_setaffinity is called with a CPU mask > bit set to enabled, but that cpu is not allowed for the process, then > it will ignore the setting? Yes, that is what the kernel does. > This requires you run sched_getaffinity to > verify what CPUs you're actually set to run on? Yes, if you care about this detail. > Is this because the > cpuset mechanism is merged with sched_setaffinity and overrides it? As far as I can tell, yes. I do not know where the kernel gets the other mask from, but it seems cgroups-related (hence the Cc:), and it does and AND on those two masks. >> (3) The existing probing code assumes that the CPU mask size is a >> power of two and at least 1024. Neither it has to be a power >> of two, nor is the minimum possible value 1024, so the value >> determined is often too large, resulting in incorrect false >> negatives. > > Could you explain those "false negative" cases again? I tried to make this clearer in the revised commit message of the attached patch. > The goal is to keep nptl/ free from linux-isms, and AFAICT your test is > indeed free of any linux-specific features since your code for finding > the size of the cpu mask is generic. Is there anything I might have missed > that would make your test linux-specific? Keep in mind that we share nptl/ > with the nacl port. Good point. sched_getcpu is not universally available, so I had to move the tests to sysdeps/unix/sysv/linux. I noticed that tst-getcpu was failing (bug 19164), so I added another sched_setaffinity test variant that supersedes it. >> +* sched_setaffinity, pthread_setaffinity_np no longer attempt to guess the >> + kernel-internal CPU set size. This means that requests that change the >> + CPU affinity which failed before will now succeed. Applications that need > > Please provide at least one example of a failure which now succeeds. I've updated the NEWS entry. >> +/* We wave two loops running for two seconds each. */ >> +#define TIMEOUT 8 > > Why two seconds? > > Why two threads? > > If the value of 2 seconds is arbitrary please state so in > a comment such that future reviewers can adjust it as they > see fit without having to review the history of the value. Should be clearer in the new version. >> +++ b/posix/tst-affinity.c >> @@ -0,0 +1,254 @@ > > Needs a one line test description with BZ#. I did not include the bug number because it is a generic, non-regression test. >> +static int >> +find_set_size (void) >> +{ >> + /* We need to use multiples of 64 because otherwise, CPU_ALLOC >> + over-allocates, and and we do not see all bits returned by the >> + kernel. */ > > How does CPU_ALLOC over-allocating result in not seeing bits from > the kernel? Is this because you get over-allocation in CPU_ALLOC, > but your own external count of num_cpus would be lower and it's > num_cpus you return? Can't you rely on the result of CPU_ALLOC_SIZE > and return that? See find_last_cpu. It is tricky to determine the actual size of a CPU set just based on the macros. I think I got it right. It turns out the comment was incorrect, I changed the code. > Why do we do this instead of calling sysconf to get the number > of CPUs? sysconf currently does not give us the proper number, and I really don't think applications should rely on it. This is a separate conversation, IMHO. I added a comment. >> + >> +static bool >> +test_size (const struct conf *conf, size_t size) >> +{ > > Should print PASS:/FAIL: prefix to make grepping easier. I added info/warning/error prefixes instead. > Should do what other tests do e.g. err |= and run each > test on a distinct line. Makes it easier to add tests and > disable tests while debugging. Ah, right. I have added tests which cover additional scenarios (cross-process sched_* calls and cross-thread pthread_sched* calls). Florian