* getting processor numbers
@ 2007-04-03 16:54 Ulrich Drepper
2007-04-03 17:30 ` linux-os (Dick Johnson)
` (4 more replies)
0 siblings, 5 replies; 56+ messages in thread
From: Ulrich Drepper @ 2007-04-03 16:54 UTC (permalink / raw)
To: Linux Kernel, Andrew Morton
[-- Attachment #1: Type: text/plain, Size: 1710 bytes --]
More and more code depends on knowing the number of processors in the
system to efficiently scale the code. E.g., in OpenMP it is used by
default to determine how many threads to create. Creating more threads
than there are processors/cores doesn't make sense.
glibc has for a long time provided functionality to retrieve the number
through sysconf(), and this is fortunately what most programs use. The
problem is that we are currently using /proc/cpuinfo, since this is all
that was available at the time. Creating /proc/cpuinfo unfortunately
takes the kernel quite a long time (I think Jakub said it is mainly
the interrupt information).
The alternative today is to use /sys/devices/system/cpu and count the
number of cpu* directories in it. This is somewhat faster. But there
would be another possibility: simply stat /sys/devices/system/cpu and
use st_nlink - 2.
This last step is unfortunately made impossible by recent changes:

http://article.gmane.org/gmane.linux.kernel/413178
I would like to propose changing that patch, moving the sched_*
pseudo-files into some other directory, and permanently banning any new
files in /sys/devices/system/cpu.
To get some numbers, you can try
http://people.redhat.com/drepper/nproc-timing.c
The numbers I see on x86-64:

  cpuinfo       10145810 cycles for 100 accesses
  readdir /sys   3113870 cycles for 100 accesses
  stat /sys       741070 cycles for 100 accesses

Note that for the first two methods I skipped the actual parsing part.
This means that in a real solution the gap between those two and the
simple stat() call is even bigger.
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 251 bytes --]
^ permalink raw reply [flat|nested] 56+ messages in thread

* Re: getting processor numbers
  2007-04-03 16:54 getting processor numbers Ulrich Drepper
@ 2007-04-03 17:30 ` linux-os (Dick Johnson)
  2007-04-03 17:37   ` Ulrich Drepper
  2007-04-03 17:56 ` Dr. David Alan Gilbert
  ` (3 subsequent siblings)
  4 siblings, 1 reply; 56+ messages in thread
From: linux-os (Dick Johnson) @ 2007-04-03 17:30 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Linux Kernel, Andrew Morton

On Tue, 3 Apr 2007, Ulrich Drepper wrote:

> More and more code depends on knowing the number of processors in the
> system to efficiently scale the code.
[...]
> I would like to propose changing that patch, moving the sched_*
> pseudo-files into some other directory, and permanently banning any new
> files in /sys/devices/system/cpu.
[...]
> --
> ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

Shouldn't it just be another system call? 223 is currently unused. You
could fill that up with __NR_nr_cpus. The value already exists in
the kernel.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 on an i686 machine (5592.65 BogoMips).
New book: http://www.AbominableFirebug.com/

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers
  2007-04-03 17:30 ` linux-os (Dick Johnson)
@ 2007-04-03 17:37   ` Ulrich Drepper
  0 siblings, 0 replies; 56+ messages in thread
From: Ulrich Drepper @ 2007-04-03 17:37 UTC (permalink / raw)
  To: linux-os (Dick Johnson); +Cc: Linux Kernel, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 540 bytes --]

linux-os (Dick Johnson) wrote:
> Shouldn't it just be another system call? 223 is currently unused. You
> could fill that up with __NR_nr_cpus. The value already exists in
> the kernel.

You forget about Linus' credo "there shall be no sysconf-like syscall".
I'd be all for sys_sysconf, or even the limited sys_nr_cpus, although
ideally we'd then have two syscalls (probed CPUs, active CPUs), in which
case sys_sysconf is the better choice.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers
  2007-04-03 16:54 getting processor numbers Ulrich Drepper
  2007-04-03 17:30 ` linux-os (Dick Johnson)
@ 2007-04-03 17:56 ` Dr. David Alan Gilbert
  2007-04-03 18:11 ` Andi Kleen
  ` (2 subsequent siblings)
  4 siblings, 0 replies; 56+ messages in thread
From: Dr. David Alan Gilbert @ 2007-04-03 17:56 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Linux Kernel, Andrew Morton

* Ulrich Drepper (drepper@redhat.com) wrote:
> glibc for a long time provides functionality to retrieve the number
> through sysconf() and this is what fortunately most programs use. The
> problem is that we are currently using /proc/cpuinfo since this is all
> there was available at that time. Creating /proc/cpuinfo takes the
> kernel quite a long time, unfortunately (I think Jakub said it is mainly
> the interrupt information).

It's not only expensive to create, it's expensive and annoying to parse;
I don't think it is even vaguely consistent across different
architectures, not to mention kernel versions.

Dave
--
 -----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert  | Running GNU/Linux on Alpha,68K | Happy  \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
 \ _________________________|_____ http://www.treblig.org  |_______/

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers
  2007-04-03 16:54 getting processor numbers Ulrich Drepper
  2007-04-03 17:30 ` linux-os (Dick Johnson)
  2007-04-03 17:56 ` Dr. David Alan Gilbert
@ 2007-04-03 18:11 ` Andi Kleen
  2007-04-03 17:17   ` Ulrich Drepper
  2007-04-03 19:15 ` Davide Libenzi
  2007-04-03 20:16 ` Andrew Morton
  4 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2007-04-03 18:11 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Linux Kernel, Andrew Morton

Ulrich Drepper <drepper@redhat.com> writes:

> More and more code depends on knowing the number of processors in the
> system to efficiently scale the code. E.g., in OpenMP it is used by
> default to determine how many threads to create.

There are more uses for it.

> Creating more threads
> than there are processors/cores doesn't make sense.

There was a proposal some time ago to put that into the ELF aux vector.
Unfortunately there was disagreement on what information to put there
exactly (full topology, only limited numbers, etc.). My proposal was
number of CPUs, number of cores, number of nodes as three 16 bit
numbers.

-Andi

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers
  2007-04-03 18:11 ` Andi Kleen
@ 2007-04-03 17:17   ` Ulrich Drepper
  2007-04-03 17:22     ` Alan Cox
  2007-04-03 17:27     ` Andi Kleen
  0 siblings, 2 replies; 56+ messages in thread
From: Ulrich Drepper @ 2007-04-03 17:17 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Linux Kernel, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 661 bytes --]

Andi Kleen wrote:
> There was a proposal some time ago to put that into the ELF aux vector
> Unfortunately there was disagreement on what information to put
> there exactly (full topology, only limited numbers etc.)

Topology, yes, I'm likely in favor of it.

Processor number: no. Unless you want to rip out hotplugging. I'm
certainly in favor of that; it creates huge problems for no real
benefit for the common use cases. But as it is, the number of
processors is not necessarily constant over the lifetime of a process.
The machine architecture is.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers
  2007-04-03 17:17 ` Ulrich Drepper
@ 2007-04-03 17:22   ` Alan Cox
  2007-04-03 17:30     ` Andi Kleen
  2007-04-03 17:27 ` Andi Kleen
  1 sibling, 1 reply; 56+ messages in thread
From: Alan Cox @ 2007-04-03 17:22 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Andi Kleen, Linux Kernel, Andrew Morton

> benefit for the common use cases. But as it is, the number of
> processors is not necessarily constant over the lifetime of a process.
> The machine architecture is.

Not once you have migration-capable virtualisation it isn't.

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers
  2007-04-03 17:22 ` Alan Cox
@ 2007-04-03 17:30   ` Andi Kleen
  2007-04-03 20:24     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2007-04-03 17:30 UTC (permalink / raw)
  To: Alan Cox; +Cc: Ulrich Drepper, Andi Kleen, Linux Kernel, Andrew Morton

On Tue, Apr 03, 2007 at 06:22:41PM +0100, Alan Cox wrote:
> > benefit for the common use cases. But as it is, the number of
> > processors is not necessarily constant over the lifetime of a process.
> > The machine architecture is.
>
> Not once you have migration capable virtualisation it isnt.

Migration is fundamentally incompatible with many CPU optimizations.
But that's not a reason to not optimize anymore.

But I guess luckily most migration users will be able to live
with a little decreased performance after it.

-Andi

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers
  2007-04-03 17:30 ` Andi Kleen
@ 2007-04-03 20:24   ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 56+ messages in thread
From: Jeremy Fitzhardinge @ 2007-04-03 20:24 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Alan Cox, Ulrich Drepper, Linux Kernel, Andrew Morton

Andi Kleen wrote:
> Migration is fundamentally incompatible with many CPU optimizations.
> But that's not a reason to not optimize anymore.

I've been thinking about ways in which Xen could provide the current
vcpu->cpu map to guest domains. Obviously this would change over time,
but it could remain current enough to be useful.

> But I guess luckily most migration users will be able to live
> with a little decreased performance after it.

At least in the Xen case, the source and target machines need to be
fairly similar in architecture (it can't deal with vastly different
CPU types, for example).

    J

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers
  2007-04-03 17:17 ` Ulrich Drepper
  2007-04-03 17:22   ` Alan Cox
@ 2007-04-03 17:27 ` Andi Kleen
  2007-04-03 17:30   ` Ulrich Drepper
  1 sibling, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2007-04-03 17:27 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Andi Kleen, Linux Kernel, Andrew Morton

On Tue, Apr 03, 2007 at 10:17:15AM -0700, Ulrich Drepper wrote:
> Andi Kleen wrote:
> > There was a proposal some time ago to put that into the ELF aux vector
> > Unfortunately there was disagreement on what information to put
> > there exactly (full topology, only limited numbers etc.)
>
> Topology, yes, I'm likely in favor of it.

What topology, and what use case?

> Processor number: no. Unless you want to rip out hotpluging. I'm

Topology is dependent on the number of CPUs.

Hot plugging is a completely orthogonal problem. Even your original
proposal wouldn't address it. Mine doesn't either, because I suspect
most programs won't care. If it's addressed it could work on top of
it: the aux vector to get the information quickly at program startup,
and later updates can get it from /sys.

If some program starts caring we would need to implement some
notification mechanism (that would be possible), but it might be hard
to fit into glibc because you don't have an event loop.

-Andi

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers
  2007-04-03 17:27 ` Andi Kleen
@ 2007-04-03 17:30   ` Ulrich Drepper
  2007-04-03 17:35     ` Andi Kleen
  2007-04-03 17:44     ` Siddha, Suresh B
  0 siblings, 2 replies; 56+ messages in thread
From: Ulrich Drepper @ 2007-04-03 17:30 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Linux Kernel, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 544 bytes --]

Andi Kleen wrote:
> Topology is dependent on the number of CPUs.

Not all of it.

> Hot plugging is a completely orthogonal problem. Even your original
> proposal wouldn't address it.

Nonsense. Reading /proc/cpuinfo or /sys/devices/system/cpu reflects
the current CPU count. The information read is not (and would not be)
cached; it's re-read every time. We might add very limited caching
(for a few seconds) but that's as far as we can go.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers
  2007-04-03 17:30 ` Ulrich Drepper
@ 2007-04-03 17:35   ` Andi Kleen
  2007-04-03 17:45     ` Ulrich Drepper
  2007-04-03 17:44 ` Siddha, Suresh B
  1 sibling, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2007-04-03 17:35 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Andi Kleen, Linux Kernel, Andrew Morton

On Tue, Apr 03, 2007 at 10:30:47AM -0700, Ulrich Drepper wrote:
> Andi Kleen wrote:
> > Topology is dependent on the number of CPUs.
>
> Not all of it.

What is not?

> We might add very limited caching (for a few
> seconds) but that's as much as we can go.

Hmm, e.g. in OpenMP you would have another thread that just reads
/proc/cpuinfo in a loop and starts new threads on new CPUs?

That sounds ...... "expensive"

The other use case in glibc I know of is the Opteron-optimized memcpy,
which can use different functions depending on the number of cores.
But having a separate thread regularly rereading cpuinfo for memcpy
also sounds quite crazy.

-Andi

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers
  2007-04-03 17:35 ` Andi Kleen
@ 2007-04-03 17:45   ` Ulrich Drepper
  2007-04-03 17:58     ` Andi Kleen
  0 siblings, 1 reply; 56+ messages in thread
From: Ulrich Drepper @ 2007-04-03 17:45 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Linux Kernel, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1198 bytes --]

Andi Kleen wrote:
>>> Topology is dependent on the number of CPUs.
>> Not all of it.
>
> What is not?

Memory banks can exist without a CPU present. The places where you can
plug in memory don't change, and so the memory hierarchy can be
described.

> Hmm, e.g. in OpenMP you would have another thread that just reads
> /proc/cpuinfo in a loop and starts new threads on new CPUs?
>
> That sounds ...... "expensive"

That's the cost of doing business.

There is an inexpensive solution: finally make the vdso concept a bit
more flexible. You could add a vdso call to get the processor count.
The vdso code itself can use a data page mapped in from the kernel.
This page (read-only at userlevel) would contain global information
such as processor count and topology.

But we're getting IMO off topic here. That's a separate and far more
complicated issue. Here we now have the concrete issue that
determining the CPU count is terribly expensive, and there is a simple
proposal to make it faster by keeping /sys/devices/system/cpu/ free
from anything but cpu* directories.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers
  2007-04-03 17:45 ` Ulrich Drepper
@ 2007-04-03 17:58   ` Andi Kleen
  2007-04-03 18:05     ` Ulrich Drepper
  0 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2007-04-03 17:58 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Andi Kleen, Linux Kernel, Andrew Morton

On Tue, Apr 03, 2007 at 10:45:35AM -0700, Ulrich Drepper wrote:
> Andi Kleen wrote:
> >>> Topology is dependent on the number of CPUs.
> >> Not all of it.
> >
> > What is not?
>
> Memory banks can exist without a CPU present. The places where you can
> plug in memory don't change and so the memory hierarchy can be described.

There are systems that support node hotplug, like Altix or the larger
IBM or Unisys x86 systems. Basically they are a bunch of smaller
systems connected with cables running a cache-coherent network
protocol. With that, memory can appear (and disappear, but we don't
handle that yet) unexpectedly.

That said, we might have some idea of that in advance -- it can be
described in ACPI SRAT -- but the trouble is that many quite ordinary
x86 server systems always describe hotplug zones even though it is
unlikely they will ever get any. Trusting that can be quite
inefficient.

> There is an inexpensive solution: finally make the vdso concept a bit
> more flexible. You could add a vdso call to get the processor count.
> The vdso code itself can use a data page mapped in from the kernel.

The ELF aux vector is exactly that already.

> This page (read-only at userlevel) would contain global information such
> as processor count and topology.

You would still need an event notification mechanism, won't you?

> But we're getting IMO off topic here. That's a separate and far more
> complicated issue.

Yes, I agree. Hotplug is best ignored for now.

> Here we now have the concrete issue that determining the CPU count is
> terribly expensive and there is a simple proposal to make it faster by
> keeping /sys/devices/system/cpu/ free from anything but cpu* directories.

The cost will still be large. Accessing sysfs will never be cheap.
For one, anything going through the VFS tends to take a two-,
sometimes three-digit number of locks. If you want it cheap, look for
some other way.

-Andi

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers
  2007-04-03 17:58 ` Andi Kleen
@ 2007-04-03 18:05   ` Ulrich Drepper
  2007-04-03 18:11     ` Andi Kleen
  0 siblings, 1 reply; 56+ messages in thread
From: Ulrich Drepper @ 2007-04-03 18:05 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Linux Kernel, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1405 bytes --]

Andi Kleen wrote:
>> There is an inexpensive solution: finally make the vdso concept a bit
>> more flexible. You could add a vdso call to get the processor count.
>> The vdso code itself can use a data page mapped in from the kernel.
>
> The ELF aux vector is exactly that already.

No. The aux vector cannot be changed after the process is started.
The memory belongs to the process and not the kernel. It must be
possible at any time to get the correct information even if the system
changed.

>> This page (read-only at userlevel) would contain global information such
>> as processor count and topology.
>
> You would still need an event notification mechanism, won't you?

No, why? The vdso call would be so inexpensive (just a simple function
call) that it can be done whenever a topology-based decision has to be
made. Use cookies to determine whether anything has changed since the
last call, etc.

> The cost will be still large. Accessing sysfs will be never cheap.
> For once anything going through the VFS tens to take a two sometimes
> three digit number of locks.

That stat solution actually ain't that bad. It takes ~7400 cycles on
my machine.

> If you want it cheap look for some other way.

Well, who's brave enough to submit sys_sysconf() again?

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers
  2007-04-03 18:05 ` Ulrich Drepper
@ 2007-04-03 18:11   ` Andi Kleen
  2007-04-03 18:21     ` Ulrich Drepper
  0 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2007-04-03 18:11 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Andi Kleen, Linux Kernel, Andrew Morton

On Tue, Apr 03, 2007 at 11:05:49AM -0700, Ulrich Drepper wrote:
> Andi Kleen wrote:
> >> There is an inexpensive solution: finally make the vdso concept a bit
> >> more flexible. You could add a vdso call to get the processor count.
> >> The vdso code itself can use a data page mapped in from the kernel.
> >
> > The ELF aux vector is exactly that already.
>
> No. The aux vector cannot be changed after the process is started. The
> memory belongs to the process and not the kernel. It must be possible at
> any time to get the correct information even if the system changed.

That's probably debatable, but ok. I would be opposed to adding
another page per process, at least because the per-process memory
footprint in Linux is imho already too large.

> No, why? The vdso call would be so inexpensive (just a simple function
> call) that it can be done whenever a topology-based decision has to be
> made. Use cookies to determine whether anything has changed since
> the last call etc.

But how would that mix with the OpenMP use case where you have thread
pools that normally don't make decisions after startup, but just stay
around? I think for those you would need events of some sort to start
or remove threads as needed. Asking the kernel every time you submit
some work to the threads would probably not fly.

> > If you want it cheap look for some other way.
>
> Well, who's brave enough to submit sys_sysconf() again?

If there's a good use case, fine by me. However, I suspect it's either
"slow is ok" or "want it very fast", where even a syscall would hurt.

-Andi

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers
  2007-04-03 18:11 ` Andi Kleen
@ 2007-04-03 18:21   ` Ulrich Drepper
  0 siblings, 0 replies; 56+ messages in thread
From: Ulrich Drepper @ 2007-04-03 18:21 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Linux Kernel, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 2049 bytes --]

Andi Kleen wrote:
> I would be opposed for adding another page per process at least
> because the per process memory footprint in Linux is imho already
> too large.

That's a single page shared by all threads on the system. Or make
this a page per NUMA node, and if the number of threads is larger than
the share counter for the page, create a few more. But in general the
extra overhead would be minimal from the memory-consumption POV.

> But how would that mix with the OpenMP use case where you have
> thread pools that normally don't make decisions afer startup, but
> just stay around?

There is a difference between having threads in the thread pool and
actually using them. For every #omp loop the number of processors is
checked again, and this is the number of threads from the pool which
is used.

> I think for those you would need events of some sort
> to start or remove threads as needed.

We need no events if determining the number of processors is cheap.
There is really no reason why it shouldn't be. Restarting all the
threads is not the cheapest operation, so a single syscall
(sys_sysconf, etc.) does not increase the cost a lot.

> If there's a good use case fine for me. However I suspect it's
> either "slow is ok" or "want it very fast" where even a syscall
> would hurt.

Ideally, as I said, an optimized vdso call is best. But a syscall is
OK. The nice thing about the vdso is that for now one could simply
implement it using a syscall and in the future add optimizations to
avoid the kernel entry if possible.

A single syscall is two orders of magnitude better than the best
solution available today. This is my main concern right now.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers
  2007-04-03 17:30 ` Ulrich Drepper
  2007-04-03 17:35   ` Andi Kleen
@ 2007-04-03 17:44 ` Siddha, Suresh B
  2007-04-03 17:59   ` Ulrich Drepper
  2007-04-03 19:55   ` Ulrich Drepper
  1 sibling, 2 replies; 56+ messages in thread
From: Siddha, Suresh B @ 2007-04-03 17:44 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Andi Kleen, Linux Kernel, Andrew Morton

On Tue, Apr 03, 2007 at 10:30:47AM -0700, Ulrich Drepper wrote:
> Reading /proc/cpuinfo or /sys/devices/system/cpu reflects the
> current CPU count.

Not all of the cpu* directories in /sys/devices/system/cpu may be
online.

thanks,
suresh

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers
  2007-04-03 17:44 ` Siddha, Suresh B
@ 2007-04-03 17:59   ` Ulrich Drepper
  2007-04-03 19:40     ` Jakub Jelinek
  2007-04-03 20:13     ` Ingo Oeser
  1 sibling, 2 replies; 56+ messages in thread
From: Ulrich Drepper @ 2007-04-03 17:59 UTC (permalink / raw)
  To: Siddha, Suresh B; +Cc: Andi Kleen, Linux Kernel, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 778 bytes --]

Siddha, Suresh B wrote:
> Not all of the cpu* directories in /sys/devices/system/cpu may be
> online.

Brilliant. You people really know how to create user interfaces. So
now in any case per CPU another stat/access syscall is needed to check
the 'online' pseudo-file? With this the readdir solution is only
marginally faster than parsing /proc/cpuinfo, which means it's
unacceptably slow.

I cannot believe all these big system people are allowed to screw
everybody else up with their nonsense.

So, anybody else have a proposal? This is a pressing issue and cannot
wait until someday in the distant future when NUMA topology
information is easily and speedily accessible.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 251 bytes --]

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers
  2007-04-03 17:59 ` Ulrich Drepper
@ 2007-04-03 19:40   ` Jakub Jelinek
  2007-04-03 20:13 ` Ingo Oeser
  1 sibling, 0 replies; 56+ messages in thread
From: Jakub Jelinek @ 2007-04-03 19:40 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Siddha, Suresh B, Andi Kleen, Linux Kernel, Andrew Morton

On Tue, Apr 03, 2007 at 10:59:53AM -0700, Ulrich Drepper wrote:
> Siddha, Suresh B wrote:
> > Not all of the cpu* directories in /sys/devices/system/cpu may be
> > online.
>
> Brilliant. You people really know how to create user interfaces. So
> now in any case per CPU another stat/access syscall is needed to check
> the 'online' pseudo-file? With this the readdir solution is only
> marginally faster than parsing /proc/cpuinfo which means, it's
> unacceptably slow.

Note that glibc actually parses /proc/stat in preference to
/proc/cpuinfo ATM, because /proc/stat is at least uniform, while
parsing /proc/cpuinfo needs a special parser for each architecture.
And /proc/stat reading is even slower than /proc/cpuinfo: on x86_64,
reading/parsing /proc/stat takes about 450usec, while e.g. stat64 on
/sys/devices/system/cpu is just 2.5usec.

But if that can't be trusted as the number of online CPUs, can
somebody please add a short file to proc or sysfs which will contain
the number of online and the number of configured CPUs?

See e.g. http://openmp.org/pipermail/omp/2007/000714.html
where the first time after the second g++ invocation is with
omp_set_dynamic (1) and ought to be about as fast as the
omp_set_dynamic (0) case with the same number of threads, but it is
far slower due to slow sysconf (_SC_NPROCESSORS_ONLN).

Jakub

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers
  2007-04-03 17:59 ` Ulrich Drepper
  2007-04-03 19:40   ` Jakub Jelinek
@ 2007-04-03 20:13 ` Ingo Oeser
  2007-04-03 23:38   ` J.A. Magallón
  1 sibling, 1 reply; 56+ messages in thread
From: Ingo Oeser @ 2007-04-03 20:13 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Siddha, Suresh B, Andi Kleen, Linux Kernel, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1370 bytes --]

Hi Ulrich,

On Tuesday 03 April 2007, Ulrich Drepper wrote:
> So, anybody else has a proposal? This is a pressing issue and cannot
> wait until someday in the distant future NUMA topology information is
> easily and speedily accessible.

Since for now you just need a fast and dirty hack, which will be
replaced with better interfaces, I suggest creating a directory with
some files in it. These should just contain what you need to handle
your most pressing cases.

I propose /sys/devices/system/topology_counters/ for that. It can
contain "online_cpu", "probed_cpu", "max_cpu" and maybe the same for
nodes -- each a simple file with an integer value.

Since sysfs-attribute files are pollable (if the owner notifies sysfs
on changes), you also have the notification system you need (select,
poll, epoll etc.).

If you promise to just keep the slow code around, then one day when
the shiny NUMA topology stuff is ready, this directory can be
completely removed and glibc (plus all its users) keeps working. It
will then even work better with a new glibc version which supports
the shiny new NUMA topology stuff.

The kernel can create these counters quite easily, since most of them
are the hamming weight (or population count) of some bitmaps.

Does this sound like a proper hacky solution? :-)

Regards

Ingo Oeser

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-03 20:13 ` Ingo Oeser @ 2007-04-03 23:38 ` J.A. Magallón 0 siblings, 0 replies; 56+ messages in thread From: J.A. Magallón @ 2007-04-03 23:38 UTC (permalink / raw) To: Ingo Oeser Cc: Ulrich Drepper, Siddha, Suresh B, Andi Kleen, Linux Kernel, Andrew Morton On Tue, 3 Apr 2007 22:13:07 +0200, Ingo Oeser <ioe-lkml@rameria.de> wrote: > Hi Ulrich, > > On Tuesday 03 April 2007, Ulrich Drepper wrote: > > So, anybody else has a proposal? This is a pressing issue and cannot > > wait until someday in the distant future NUMA topology information is > > easily and speedily accessible. > > Since for now you just need a fast and dirty hack, which will be replaced > with better interfaces, I suggest creating a directory with some files in it. > These should just contain, what you need to handle your most pressing cases. > > I propose /sys/devices/system/topology_counters/ for that. > These can contain "online_cpu", "proped_cpu", "max_cpu" > and maybe the same for nodes. All that as a simple file with an integer > value. > > Since sysfs-attribute files are pollable (if the owners notifies sysfs > on changes), you also have the notification system you need > (select, poll, epoll etc.). > > If you promise to just keep the slow code around, than one day when the shiny > NUMA topology stuff is ready, this directory can be completely removed and > glibc (plus all their users) keeps working. It will then even work better with a > new glibc version, which supports the shiny new NUMA topology stuff. > > The kernel can create these counters quiete easy, since most of them are > the hamming weight (or population count) of some bitmaps. > > Does this sound like a proper hacky solution? :-) > Just a point of view from someone who has to parse /proc/cpuinfo. That sort of file tree thing is useful to work from the command line but its a kick in the a** to use from a program. 
This forces you to re-parse the tree each time you have to get the info (open, read, atoi, close...) to fill your internal variables. I don't know if it's possible, but I would like something like: __packt_it_tight_please struct cpumap_t { u16 ncpus; u16 ncpus_onln; u16 ncpus_inmyset; // for procsets // Here possibly more info about topology, pack-core-thread structure... // in simple arrays... }; struct cpumap_t *cpumap = mmap("/proc/sys/hw/cpumap",sizeof(struct cpumap_t)); for (...cpumap->ncpus_inmyset ....) // As I said, I don't know if it's possible. I vaguely remember some comments against binary info in /proc... It could even be simplified if you realize some things: - Usually people don't worry about whether the cpus are all the online ones or whether they are running in a proc set. They just want to know how many they can use. - They don't care if they are hyper-threaded, cores or independent processors. To adjust processing for hyper-threaded cpus, one needs to tie processes to processors, and you need to be root for that. Really, anything dependent on topology is not usable for normal programs, because you need to be root to control that. So topology is not so important. Some (probably stupid) ideas... -- J.A. Magallon <jamagallon()ono!com> \ Software is like sex: \ It's better when it's free Mandriva Linux release 2007.1 (Cooker) for i586 Linux 2.6.20-jam08 (gcc 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)) #1 SMP PREEMPT ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-03 17:44 ` Siddha, Suresh B 2007-04-03 17:59 ` Ulrich Drepper @ 2007-04-03 19:55 ` Ulrich Drepper 2007-04-03 20:13 ` Siddha, Suresh B 2007-04-03 20:20 ` Nathan Lynch 1 sibling, 2 replies; 56+ messages in thread From: Ulrich Drepper @ 2007-04-03 19:55 UTC (permalink / raw) To: Siddha, Suresh B; +Cc: Linux Kernel, Andrew Morton [-- Attachment #1: Type: text/plain, Size: 652 bytes --] Siddha, Suresh B wrote: > Not all of the cpu* directories in /sys/devices/system/cpu may be > online. Apparently this information isn't needed. It's very easy to verify: $ ls /sys/devices/system/cpu/*/online /sys/devices/system/cpu/cpu1/online /sys/devices/system/cpu/cpu2/online /sys/devices/system/cpu/cpu3/online This is a quad core machine and cpu0 doesn't have the 'online' file (2.6.19 kernel). So, if nobody noticed this, it's not needed and we can just remove CPUs from /sys/devices/system/cpu when they are brought offline, right? -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 251 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-03 19:55 ` Ulrich Drepper @ 2007-04-03 20:13 ` Siddha, Suresh B 2007-04-03 20:19 ` Ulrich Drepper 2007-04-03 20:20 ` Nathan Lynch 1 sibling, 1 reply; 56+ messages in thread From: Siddha, Suresh B @ 2007-04-03 20:13 UTC (permalink / raw) To: Ulrich Drepper; +Cc: Siddha, Suresh B, Linux Kernel, Andrew Morton On Tue, Apr 03, 2007 at 12:55:22PM -0700, Ulrich Drepper wrote: > Siddha, Suresh B wrote: > > Not all of the cpu* directories in /sys/devices/system/cpu may be > > online. > > Apparently this information isn't needed. It's very easy to verify: > > $ ls /sys/devices/system/cpu/*/online > /sys/devices/system/cpu/cpu1/online /sys/devices/system/cpu/cpu2/online > /sys/devices/system/cpu/cpu3/online > > This is a quad core machine and cpu0 doesn't have the 'online' file > (2.6.19 kernel). I think that is expected and intentional, as the current cpu hotplug code doesn't support offlining cpu0. > So, if nobody noticed this, it's not needed and we can > just remove CPUs from /sys/devices/system/cpu when they are brought > offline, right? No. Logical cpu hotplug uses these interfaces to make a cpu go offline and online. thanks, suresh ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-03 20:13 ` Siddha, Suresh B @ 2007-04-03 20:19 ` Ulrich Drepper 2007-04-03 20:32 ` Eric Dumazet 0 siblings, 1 reply; 56+ messages in thread From: Ulrich Drepper @ 2007-04-03 20:19 UTC (permalink / raw) To: Siddha, Suresh B; +Cc: Linux Kernel, Andrew Morton [-- Attachment #1: Type: text/plain, Size: 426 bytes --] Siddha, Suresh B wrote: > No. Logical cpu hotplug uses these interfaces to make a cpu go offline > and online. You missed my sarcasm; email is bad at conveying it. The point is that nobody cares enough about that hotplug nonsense to have noticed the bug. And still this nonsense prevents real problems from being addressed. -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 251 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-03 20:19 ` Ulrich Drepper @ 2007-04-03 20:32 ` Eric Dumazet 0 siblings, 0 replies; 56+ messages in thread From: Eric Dumazet @ 2007-04-03 20:32 UTC (permalink / raw) To: Ulrich Drepper; +Cc: Siddha, Suresh B, Linux Kernel, Andrew Morton Ulrich Drepper a écrit : > Siddha, Suresh B wrote: >> No. Logical cpu hotplug uses these interfaces to make a cpu go offline >> and online. > > You missed my sarcasm; email is bad at conveying it. The point is that > nobody cares enough about that hotplug nonsense to have noticed the bug. > And still this nonsense prevents real problems from being addressed. > Please don't focus on /sys as your holy grail. 1) AFAIK /sys/devices/system/cpu was not designed to meet glibc needs. 2) Many production machines don't mount /sys at all $ uname -r 2.6.20 $ ls -al /sys/devices/system/cpu ls: /sys/devices/system/cpu: No such file or directory $ grep processor /proc/cpuinfo processor : 0 processor : 1 ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-03 19:55 ` Ulrich Drepper 2007-04-03 20:13 ` Siddha, Suresh B @ 2007-04-03 20:20 ` Nathan Lynch 1 sibling, 0 replies; 56+ messages in thread From: Nathan Lynch @ 2007-04-03 20:20 UTC (permalink / raw) To: Ulrich Drepper; +Cc: Siddha, Suresh B, Linux Kernel, Andrew Morton Ulrich Drepper wrote: > Siddha, Suresh B wrote: > > Not all of the cpu* directories in /sys/devices/system/cpu may be > > online. > > Apparently this information isn't needed. It's very easy to verify: > > $ ls /sys/devices/system/cpu/*/online > /sys/devices/system/cpu/cpu1/online /sys/devices/system/cpu/cpu2/online > /sys/devices/system/cpu/cpu3/online > > This is a quad core machine and cpu0 doesn't have the 'online' file > (2.6.19 kernel). So, if nobody noticed this, it's not needed and we can > just remove CPUs from /sys/devices/system/cpu when they are brought > offline, right? No... the online sysfs files are used to show and change cpus' online/offline state. You wouldn't be able to bring an offlined cpu back online again. cpu0 doesn't have an online file on machines which don't support offlining of the boot cpu. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-03 16:54 getting processor numbers Ulrich Drepper ` (2 preceding siblings ...) 2007-04-03 18:11 ` Andi Kleen @ 2007-04-03 19:15 ` Davide Libenzi 2007-04-03 19:32 ` Ulrich Drepper 2007-04-03 20:16 ` Andrew Morton 4 siblings, 1 reply; 56+ messages in thread From: Davide Libenzi @ 2007-04-03 19:15 UTC (permalink / raw) To: Ulrich Drepper; +Cc: Linux Kernel, Andrew Morton On Tue, 3 Apr 2007, Ulrich Drepper wrote: > More and more code depends on knowing the number of processors in the > system to efficiently scale the code. E.g., in OpenMP it is used by > default to determine how many threads to create. Creating more threads > than there are processors/cores doesn't make sense. > > glibc for a long time provides functionality to retrieve the number > through sysconf() and this is what fortunately most programs use. The > problem is that we are currently using /proc/cpuinfo since this is all > there was available at that time. Creating /proc/cpuinfo takes the > kernel quite a long time, unfortunately (I think Jakub said it is mainly > the interrupt information). > > The alternative today is to use /sys/devices/system/cpu and count the > number of cpu* directories in it. This is somewhat faster. But there > would be another possibility: simply stat /sys/devices/system/cpu and > use st_nlink - 2. > > This last step unfortunately it made impossible by recent changes: > > http://article.gmane.org/gmane.linux.kernel/413178 > > I would like to propose changing that patch, move the sched_* > pseudo-files in some other directly and permanently ban putting any new > file into /sys/devices/system/cpu. > > To get some numbers, you can try > > http://people.redhat.com/drepper/nproc-timing.c > > The numbers I see on x86-64: > > cpuinfo 10145810 cycles for 100 accesses > readdir /sys 3113870 cycles for 100 accesses > stat /sys 741070 cycles for 100 accesses It sucks when seen from a micro-bench POV, but does it really matter overall? 
The vast majority of software usually calls sysconf(_SC_NPROCESSORS_*) with very little frequency (mostly once at initialization time) anyway. That's what 50us / call? - Davide ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-03 19:15 ` Davide Libenzi @ 2007-04-03 19:32 ` Ulrich Drepper 2007-04-04 0:31 ` H. Peter Anvin 0 siblings, 1 reply; 56+ messages in thread From: Ulrich Drepper @ 2007-04-03 19:32 UTC (permalink / raw) To: Davide Libenzi; +Cc: Linux Kernel, Andrew Morton [-- Attachment #1: Type: text/plain, Size: 962 bytes --] Davide Libenzi wrote: > It sucks when seen from a micro-bench POV, but does it really matter > overall? The vast majority of software usually calls > sysconf(_SC_NPROCESSORS_*) with very little frequency (mostly once at > initialization time) anyway. That's what 50us / call? This is not today's situation. Yes, 10 years ago when I added the support to glibc it wasn't much of a problem. But times change. As I said before in this thread, OpenMP by default scales the number of threads used for parallel loops depending on the number of available processors/cores and therefore the number must be retrieved every time (with perhaps minimal caching of a few secs, but this requires gettimeofday calls...). All of a sudden this is not a micro benchmark anymore. It's a real issue which we only became aware of because it is noticeable in real life. -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 251 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-03 19:32 ` Ulrich Drepper @ 2007-04-04 0:31 ` H. Peter Anvin 2007-04-04 0:35 ` Jeremy Fitzhardinge 0 siblings, 1 reply; 56+ messages in thread From: H. Peter Anvin @ 2007-04-04 0:31 UTC (permalink / raw) To: Ulrich Drepper; +Cc: Davide Libenzi, Linux Kernel, Andrew Morton Ulrich Drepper wrote: > Davide Libenzi wrote: >> It sucks when seen from a micro-bench POV, but does it really matter >> overall? The vast majority of software usually calls >> sysconf(_SC_NPROCESSORS_*) with very little frequency (mostly once at >> initialization time) anyway. That's what 50us / call? > > This is not today's situation. Yes, 10 years ago when I added the > support to glibc it wasn't much of a problem. But times change. As I > said before in this thread, OpenMP by default scales the number of > threads used for parallel loops depending on the number of available > processors/cores and therefore the number must be retrieved every time > (with perhaps minimal caching of a few secs, but this requires > gettimeofday calls...). All of a sudden this is not a micro benchmark > anymore. It's a real issue which we only became aware of because it is > noticeable in real life. Sounds like it would need a device which can be waited upon for changes. -hpa ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-04 0:31 ` H. Peter Anvin @ 2007-04-04 0:35 ` Jeremy Fitzhardinge 2007-04-04 0:38 ` H. Peter Anvin 0 siblings, 1 reply; 56+ messages in thread From: Jeremy Fitzhardinge @ 2007-04-04 0:35 UTC (permalink / raw) To: H. Peter Anvin Cc: Ulrich Drepper, Davide Libenzi, Linux Kernel, Andrew Morton H. Peter Anvin wrote: > Sounds like it would need a device which can be waited upon for changes. A vdso-like shared page could have a futex in it. J ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-04 0:35 ` Jeremy Fitzhardinge @ 2007-04-04 0:38 ` H. Peter Anvin 2007-04-04 5:09 ` Eric Dumazet 0 siblings, 1 reply; 56+ messages in thread From: H. Peter Anvin @ 2007-04-04 0:38 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Ulrich Drepper, Davide Libenzi, Linux Kernel, Andrew Morton Jeremy Fitzhardinge wrote: > H. Peter Anvin wrote: >> Sounds like it would need a device which can be waited upon for changes. > > A vdso-like shared page could have a futex in it. Yes, but a futex couldn't be waited upon with a bunch of other things as part of a poll or a select. The cost of reading the information is minimal. -hpa ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-04 0:38 ` H. Peter Anvin @ 2007-04-04 5:09 ` Eric Dumazet 2007-04-04 5:16 ` H. Peter Anvin 0 siblings, 1 reply; 56+ messages in thread From: Eric Dumazet @ 2007-04-04 5:09 UTC (permalink / raw) To: H. Peter Anvin Cc: Jeremy Fitzhardinge, Ulrich Drepper, Davide Libenzi, Linux Kernel, Andrew Morton, Andi Kleen H. Peter Anvin a écrit : > Jeremy Fitzhardinge wrote: >> H. Peter Anvin wrote: >>> Sounds like it would need a device which can be waited upon for changes. >> >> A vdso-like shared page could have a futex in it. > > Yes, but a futex couldn't be waited upon with a bunch of other things as > part of a poll or a select. The cost of reading the information is > minimal. > There is one thing that always worried me. Intel & AMD manuals make it clear that mixing data and program in the same page is bad for performance. In particular, x86_64 vsyscall puts jiffies and other vsyscall_gtod_data_t right in the middle of the code. That is certainly not wise. A sane implementation should probably use two pages, one for code, one for data. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-04 5:09 ` Eric Dumazet @ 2007-04-04 5:16 ` H. Peter Anvin 2007-04-04 5:22 ` Jeremy Fitzhardinge 2007-04-04 5:29 ` Eric Dumazet 1 sibling, 2 replies; 56+ messages in thread From: H. Peter Anvin @ 2007-04-04 5:16 UTC (permalink / raw) To: Eric Dumazet Cc: Jeremy Fitzhardinge, Ulrich Drepper, Davide Libenzi, Linux Kernel, Andrew Morton, Andi Kleen Eric Dumazet wrote: > > There is one thing that always worried me. > > Intel & AMD manuals make it clear that mixing data and program in the same > page is bad for performance. > > In particular, x86_64 vsyscall puts jiffies and other > vsyscall_gtod_data_t right in the middle of the code. That is certainly not > wise. > > A sane implementation should probably use two pages, one for code, one > for data. > Mutable data should be separated from code. I think any current CPU will do fine as long as they are in separate 128-byte chunks, but they need at least that much separation. Readonly data does not need to be separated from code. -hpa ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-04 5:16 ` H. Peter Anvin @ 2007-04-04 5:22 ` Jeremy Fitzhardinge 2007-04-04 5:40 ` H. Peter Anvin 2007-04-04 5:29 ` Eric Dumazet 1 sibling, 1 reply; 56+ messages in thread From: Jeremy Fitzhardinge @ 2007-04-04 5:22 UTC (permalink / raw) To: H. Peter Anvin Cc: Eric Dumazet, Ulrich Drepper, Davide Libenzi, Linux Kernel, Andrew Morton, Andi Kleen H. Peter Anvin wrote: > Mutable data should be separated from code. I think any current CPU > will do fine as long as they are in separate 128-byte chunks, but they > need at least that much separation. P4 manual says that if one processor modifies data within 2k of another processor executing code, it will trash the entire trace cache. J ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-04 5:22 ` Jeremy Fitzhardinge @ 2007-04-04 5:40 ` H. Peter Anvin 2007-04-04 5:46 ` Eric Dumazet 0 siblings, 1 reply; 56+ messages in thread From: H. Peter Anvin @ 2007-04-04 5:40 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Eric Dumazet, Ulrich Drepper, Davide Libenzi, Linux Kernel, Andrew Morton, Andi Kleen Jeremy Fitzhardinge wrote: > H. Peter Anvin wrote: >> Mutable data should be separated from code. I think any current CPU >> will do fine as long as they are in separate 128-byte chunks, but they >> need at least that much separation. > P4 manual says that if one processor modifies data within 2k of another > processor executing code, it will trash the entire trace cache. Yuck. Didn't realize the P4 was that sensitive. OK, so at the least we need a half-page of separation. -hpa ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-04 5:40 ` H. Peter Anvin @ 2007-04-04 5:46 ` Eric Dumazet 0 siblings, 0 replies; 56+ messages in thread From: Eric Dumazet @ 2007-04-04 5:46 UTC (permalink / raw) To: H. Peter Anvin Cc: Jeremy Fitzhardinge, Ulrich Drepper, Davide Libenzi, Linux Kernel, Andrew Morton, Andi Kleen H. Peter Anvin a écrit : > Jeremy Fitzhardinge wrote: >> H. Peter Anvin wrote: >>> Mutable data should be separated from code. I think any current CPU >>> will do fine as long as they are in separate 128-byte chunks, but they >>> need at least that much separation. >> P4 manual says that if one processor modifies data within 2k of another >> processor executing code, it will trash the entire trace cache. > > Yuck. Didn't realize the P4 was that sensitive. OK, so at the least we > need a half-page of separation. Yes but vsyscall API currently defines 4 entry points : vsyscall0 (vgettimeofday) = ADDR vsyscall1 (vtime) = ADDR+1024 vsyscall2 (vgetcpu) = ADDR+2048 vsyscall3 (vxxxxx) = ADDR+3072 ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-04 5:16 ` H. Peter Anvin 2007-04-04 5:22 ` Jeremy Fitzhardinge @ 2007-04-04 5:29 ` Eric Dumazet 1 sibling, 0 replies; 56+ messages in thread From: Eric Dumazet @ 2007-04-04 5:29 UTC (permalink / raw) To: H. Peter Anvin Cc: Jeremy Fitzhardinge, Ulrich Drepper, Davide Libenzi, Linux Kernel, Andrew Morton, Andi Kleen H. Peter Anvin a écrit : > Eric Dumazet wrote: >> >> There is one thing that always worried me. >> >> Intel & AMD manuals make it clear that mixing data and program in the >> same page is bad for performance. >> >> In particular, x86_64 vsyscall puts jiffies and other >> vsyscall_gtod_data_t right in the middle of the code. That is certainly not >> wise. >> >> A sane implementation should probably use two pages, one for code, one >> for data. >> > > Mutable data should be separated from code. I think any current CPU > will do fine as long as they are in separate 128-byte chunks, but they > need at least that much separation. > > Readonly data does not need to be separated from code. > Yes... jiffies & vsyscall_gtod_data_t are written HZ times per second. Not really readonly, I'm afraid. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-03 16:54 getting processor numbers Ulrich Drepper ` (3 preceding siblings ...) 2007-04-03 19:15 ` Davide Libenzi @ 2007-04-03 20:16 ` Andrew Morton [not found] ` <4612BB89.8040102@redhat.com> 2007-04-04 2:04 ` Paul Jackson 4 siblings, 2 replies; 56+ messages in thread From: Andrew Morton @ 2007-04-03 20:16 UTC (permalink / raw) To: Ulrich Drepper; +Cc: Linux Kernel On Tue, 03 Apr 2007 09:54:46 -0700 Ulrich Drepper <drepper@redhat.com> wrote: > More and more code depends on knowing the number of processors in the > system to efficiently scale the code. E.g., in OpenMP it is used by > default to determine how many threads to create. Creating more threads > than there are processors/cores doesn't make sense. but... It would be a mistake for an application to assume that it is allowed to _use_ all the present CPUs. People can and do run applications within cpusets, and under sched_setaffinity(). So I'd have thought that in general an application should be querying its present affinity mask - something like sched_getaffinity()? That fixes the CPU hotplug issues too, of course. But we discussed this all a couple years back and it was decided that sched_getaffinity() was unsuitable. I remember at the time not really understanding why? ^ permalink raw reply [flat|nested] 56+ messages in thread
[parent not found: <4612BB89.8040102@redhat.com>]
[parent not found: <20070403141348.9bcdb13e.akpm@linux-foundation.org>]
* Re: getting processor numbers [not found] ` <20070403141348.9bcdb13e.akpm@linux-foundation.org> @ 2007-04-03 22:13 ` Ulrich Drepper 2007-04-03 22:48 ` Andrew Morton 0 siblings, 1 reply; 56+ messages in thread From: Ulrich Drepper @ 2007-04-03 22:13 UTC (permalink / raw) To: Andrew Morton, Linux Kernel [-- Attachment #1: Type: text/plain, Size: 1905 bytes --] Andrew Morton wrote: > Did we mean to go off-list? Oops, no, pressed the wrong button. >> Andrew Morton wrote: >>> So I'd have thought that in general an application should be querying its >>> present affinity mask - something like sched_getaffinity()? That fixes the >>> CPU hotplug issues too, of course. >> Does it really? >> >> My recollection is that the affinity masks of running processes are not >> updated on hotplugging. Is this addressed? > > ah, yes, you're correct. > > Inside a cpuset: > > sched_setaffinity() is constrained to those CPUs which are in the > cpuset. > > If a cpu is on/offlined we update each cpuset's cpu mask appropriately > but we do not update all the tasks presently running in the cpuset. > > Outside a cpuset: > > sched_setaffinity() is constrained to all possible cpus > > We don't update each task's cpus_allowed when a CPU is removed. > > > I think we trivially _could_ update each task's cpus_allowed mask when a > CPU is removed, actually. I think it has to be done. But that's not so trivial. What happens if all the CPUs a process was supposed to be runnable on vanish? Shouldn't, if no affinity mask is defined, new processors be added? I agree that if the process has a defined affinity mask no new processors should be added _automatically_. >> If yes, sched_getaffinity is a solution until the NUMA topology >> framework can provide something better. Even without a popcnt >> instruction in the CPU (64-bit albeit) it's twice as fast as the >> stat() method proposed. > > I'm surprised - I'd have expected sched_getaffinity() to be vastly quicker > than doing filesystem operations. 
You mean because it's only a factor of two? Well, it's not once you count the whole overhead. -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 251 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-03 22:13 ` Ulrich Drepper @ 2007-04-03 22:48 ` Andrew Morton 2007-04-03 23:00 ` Ulrich Drepper 2007-04-04 2:52 ` Paul Jackson 0 siblings, 2 replies; 56+ messages in thread From: Andrew Morton @ 2007-04-03 22:48 UTC (permalink / raw) To: Ulrich Drepper Cc: Linux Kernel, Gautham R Shenoy, Dipankar Sarma, Paul Jackson On Tue, 03 Apr 2007 15:13:09 -0700 Ulrich Drepper <drepper@redhat.com> wrote: > Andrew Morton wrote: > > Did we mean to go off-list? > > Oops, no, pressed the wrong button. > > >> Andrew Morton wrote: > >>> So I'd have thought that in general an application should be querying its > >>> present affinity mask - something like sched_getaffinity()? That fixes the > >>> CPU hotplug issues too, of course. > >> Does it really? > >> > >> My recollection is that the affinity masks of running processes are not > >> updated on hotplugging. Is this addressed? > > > > ah, yes, you're correct. > > > > Inside a cpuset: > > > > sched_setaffinity() is constrained to those CPUs which are in the > > cpuset. > > > > If a cpu is on/offlined we update each cpuset's cpu mask appropriately > > but we do not update all the tasks presently running in the cpuset. > > > > Outside a cpuset: > > > > sched_setaffinity() is constrained to all possible cpus > > > > We don't update each task's cpus_allowed when a CPU is removed. > > > > > > I think we trivially _could_ update each task's cpus_allowed mask when a > > CPU is removed, actually. > > I think it has to be done. But that's not so trivial. What happens if > all the CPUs a process was supposed to be runnable on vanish? > Shouldn't, if no affinity mask is defined, new processors be added? I > agree that if the process has a defined affinity mask no new processors > should be added _automatically_. > Yes, some policy decision needs to be made there. 
But whatever we decide to do, the implementation will be relatively straightforward, because hot-unplug uses stop_machine_run() and later, we hope, will use the process freezer. This setting of the whole machine into a known state means (I think) that we can avoid a whole lot of fuss which happens when affinity is altered. Anyway. It's not really clear who maintains CPU hotplug nowadays. <adds a few cc's>. But yes, I do think we should do <something sane> with process affinity when CPU hot[un]plug happens. Now it could be argued that the current behaviour is that sane thing: we allow the process to "pin" itself to not-present CPUs and just handle it in the CPU scheduler. Paul, could you please describe what cpusets' policy is in the presence of CPU addition and removal? > > >> If yes, sched_getaffinity is a solution until the NUMA topology > >> framework can provide something better. Even without a popcnt > >> instruction in the CPU (64-bit albeit) it's twice as fast as the > >> stat() method proposed. > > > > I'm surprised - I'd have expected sched_getaffinity() to be vastly quicker > > than doing filesystem operations. > > You mean because it's only a factor of two? Well, it's not once you > count the whole overhead. Is it kernel overhead, or userspace? The overhead of counting the bits? Because sched_getaffinity() could be easily sped up in the case where it is operating on the current process. Anyway, where do we stand? Assuming we can address the CPU hotplug issues, does sched_getaffinity() look like it will be suitable? ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-03 22:48 ` Andrew Morton @ 2007-04-03 23:00 ` Ulrich Drepper 2007-04-03 23:23 ` Andrew Morton ` (2 more replies) 2007-04-04 2:52 ` Paul Jackson 1 sibling, 3 replies; 56+ messages in thread From: Ulrich Drepper @ 2007-04-03 23:00 UTC (permalink / raw) To: Andrew Morton Cc: Linux Kernel, Gautham R Shenoy, Dipankar Sarma, Paul Jackson [-- Attachment #1: Type: text/plain, Size: 1453 bytes --] Andrew Morton wrote: > Now it could be argued that the current behaviour is that sane thing: we > allow the process to "pin" itself to not-present CPUs and just handle it in > the CPU scheduler. As a stop-gap solution Jakub will likely implement the sched_getaffinity hack. So, it would really be best to get the masks updated. But all this of course does not solve the issue sysconf() has. In sysconf we cannot use sched_getaffinity since all the system's CPUs must be reported. > Is it kernel overhead, or userspace? The overhead of counting the bits? The overhead I meant is userland. > Because sched_getaffinity() could be easily sped up in the case where > it is operating on the current process. If there is a possibility to treat this case specially and make it faster, please do so. It would be best to allow pid==0 as a special case so that callers don't have to find out the TID (which they shouldn't have to know). > Anyway, where do we stand? Assuming we can address the CPU hotplug issues, > does sched_getaffinity() look like it will be suitable? It's only usable for the special case in the OpenMP code where the number of threads is used to determine the number of worker threads. For sysconf() we still need better support. Maybe now somebody will step up and say they need faster sysconf as well. -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 251 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers 2007-04-03 23:00 ` Ulrich Drepper @ 2007-04-03 23:23 ` Andrew Morton 2007-04-03 23:54 ` Ulrich Drepper ` (2 more replies) 2007-04-04 2:58 ` Paul Jackson 2007-04-04 3:04 ` Paul Jackson 2 siblings, 3 replies; 56+ messages in thread From: Andrew Morton @ 2007-04-03 23:23 UTC (permalink / raw) To: Ulrich Drepper Cc: Linux Kernel, Gautham R Shenoy, Dipankar Sarma, Paul Jackson, Ingo Molnar, Oleg Nesterov On Tue, 03 Apr 2007 16:00:50 -0700 Ulrich Drepper <drepper@redhat.com> wrote: > Andrew Morton wrote: > > Now it could be argued that the current behaviour is that sane thing: we > > allow the process to "pin" itself to not-present CPUs and just handle it in > > the CPU scheduler. > > As a stop-gap solution Jakub will likely implement the sched_getaffinity > hack. So, it would really be best to get the masks updated. > > > But all this of course does not solve the issue sysconf() has. In > sysconf we cannot use sched_getaffinity since all the system's CPUs must > be reported. OK. This is exceptionally gruesome, but one could run sched_getaffinity() against pid 1 (init). Which will break nicely in the OS-virtualised future when the system has multiple pid-1-inits running in containers... > > > Is it kernel overhead, or userspace? The overhead of counting the bits? > > The overhead I meant is userland. > OK. Your cost of counting those bits is proportional to CONFIG_NR_CPUS. It's a bit sad that sys_sched_getaffinity() returns sizeof(cpumask_t), because that means that userspace must handle 256 or whatever CPUs on a machine which only has two CPUs. Does anyone see a reason why sys_sched_getaffinity() cannot be altered to return maximum-possible-cpu-id-on-this-machine? That way, your hweight operation will be much faster on sane-sized machines. 
> > If there is possibility to treat this case special and make it faster, > please do so. It would be best to allow pid==0 as a special case so > that callers don't have to find out the TID (which they shouldn't have > to know). > OK. Does anyone see a reason why we cannot do this? --- a/kernel/sched.c~sched_getaffinity-speedup +++ a/kernel/sched.c @@ -4381,8 +4381,12 @@ long sched_getaffinity(pid_t pid, cpumas struct task_struct *p; int retval; - lock_cpu_hotplug(); - read_lock(&tasklist_lock); + if (pid) { + lock_cpu_hotplug(); + read_lock(&tasklist_lock); + } else { + preempt_disable(); /* Prevent CPU hotplugging */ + } retval = -ESRCH; p = find_process_by_pid(pid); @@ -4396,12 +4400,13 @@ long sched_getaffinity(pid_t pid, cpumas cpus_and(*mask, p->cpus_allowed, cpu_online_map); out_unlock: - read_unlock(&tasklist_lock); - unlock_cpu_hotplug(); - if (retval) - return retval; - - return 0; + if (pid) { + read_unlock(&tasklist_lock); + unlock_cpu_hotplug(); + } else { + preempt_enable(); + } + return retval; } /** _ > > > Anyway, where do we stand? Assuming we can address the CPU hotplug issues, > > does sched_getaffinity() look like it will be suitable? > > It's only usable for the special case on the OpenMP code where the > number of threads is used to determine the number of worker threads. > For sysconf() we still need better support. Maybe now somebody will > step up and say they need faster sysconf as well. I guess we could add a simple sys_get_nr_cpus(). If we want more than that (ie: topology, SMT/MC/NUMA/numa-distance etc) then it gets much more complex and sysfs is more appropriate for that. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: getting processor numbers
From: Ulrich Drepper @ 2007-04-03 23:54 UTC (permalink / raw)
To: Andrew Morton
Cc: Linux Kernel, Gautham R Shenoy, Dipankar Sarma, Paul Jackson,
    Ingo Molnar, Oleg Nesterov

Andrew Morton wrote:
> Does anyone see a reason why we cannot do this?

Shouldn't sched_setaffinity get the same treatment for symmetry reasons?

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
* Re: getting processor numbers
From: Paul Jackson @ 2007-04-04  2:55 UTC (permalink / raw)
To: Andrew Morton; +Cc: drepper, linux-kernel, ego, dipankar, mingo, oleg

Andrew wrote:
> > But all this of course does not solve the issue sysconf() has.  In
> > sysconf we cannot use sched_getaffinity since all the system's CPUs
> > must be reported.
>
> OK.
>
> This is exceptionally gruesome, but one could run sched_getaffinity()
> against pid 1 (init).  Which will break nicely in the OS-virtualised
> future when the system has multiple pid-1-inits running in containers...

That nicely breaks on typical cpuset-managed systems as well, which
frequently put init into a small cpuset (with the classic Unix daemon and
login load), leaving the bulk of the system to be managed by a batch
scheduler.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
* Re: getting processor numbers
From: Oleg Nesterov @ 2007-04-04  8:39 UTC (permalink / raw)
To: Andrew Morton
Cc: Ulrich Drepper, Linux Kernel, Gautham R Shenoy, Dipankar Sarma,
    Paul Jackson, Ingo Molnar

On 04/03, Andrew Morton wrote:
>
> On Tue, 03 Apr 2007 16:00:50 -0700
> Ulrich Drepper <drepper@redhat.com> wrote:
>
> > If there is possibility to treat this case special and make it faster,
> > please do so.  It would be best to allow pid==0 as a special case so
> > that callers don't have to find out the TID (which they shouldn't have
> > to know).
>
> OK.
>
> Does anyone see a reason why we cannot do this?
>
> --- a/kernel/sched.c~sched_getaffinity-speedup
> +++ a/kernel/sched.c
> @@ -4381,8 +4381,12 @@ long sched_getaffinity(pid_t pid, cpumas
>  	struct task_struct *p;
>  	int retval;
>
> -	lock_cpu_hotplug();
> -	read_lock(&tasklist_lock);
> +	if (pid) {
> +		lock_cpu_hotplug();
> +		read_lock(&tasklist_lock);
> +	} else {
> +		preempt_disable();	/* Prevent CPU hotplugging */
> +	}

But we don't need tasklist_lock at all, we can use rcu_read_lock/unlock.

Q: don't we need task_rq_lock() to read ->cpus_allowed "atomically"?

UNTESTED.

--- OLD/kernel/sched.c~	2007-04-03 13:05:02.000000000 +0400
+++ OLD/kernel/sched.c	2007-04-04 12:29:04.000000000 +0400
@@ -4433,22 +4433,17 @@ long sched_setaffinity(pid_t pid, cpumas
 	int retval;
 
 	mutex_lock(&sched_hotcpu_mutex);
-	read_lock(&tasklist_lock);
+	rcu_read_lock();
 
 	p = find_process_by_pid(pid);
 	if (!p) {
-		read_unlock(&tasklist_lock);
+		rcu_read_unlock();
 		mutex_unlock(&sched_hotcpu_mutex);
 		return -ESRCH;
 	}
 
-	/*
-	 * It is not safe to call set_cpus_allowed with the
-	 * tasklist_lock held. We will bump the task_struct's
-	 * usage count and then drop tasklist_lock.
-	 */
 	get_task_struct(p);
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
 
 	retval = -EPERM;
 	if ((current->euid != p->euid) && (current->euid != p->uid) &&
@@ -4523,7 +4518,7 @@ long sched_getaffinity(pid_t pid, cpumas
 	int retval;
 
 	mutex_lock(&sched_hotcpu_mutex);
-	read_lock(&tasklist_lock);
+	rcu_read_lock();
 
 	retval = -ESRCH;
 	p = find_process_by_pid(pid);
@@ -4537,7 +4532,7 @@ long sched_getaffinity(pid_t pid, cpumas
 	cpus_and(*mask, p->cpus_allowed, cpu_online_map);
 
 out_unlock:
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
 	mutex_unlock(&sched_hotcpu_mutex);
 	if (retval)
 		return retval;
* Re: getting processor numbers
From: Ingo Molnar @ 2007-04-04  9:39 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Ulrich Drepper, Linux Kernel, Gautham R Shenoy,
    Dipankar Sarma, Paul Jackson

* Oleg Nesterov <oleg@tv-sign.ru> wrote:

> But we don't need tasklist_lock at all, we can use
> rcu_read_lock/unlock.  Q: don't we need task_rq_lock() to read
> ->cpus_allowed "atomically"?

right now ->cpus_allowed is protected by tasklist_lock.  We cannot do
RCU here because ->cpus_allowed modifications are not RCUified.

	Ingo
* Re: getting processor numbers
From: Oleg Nesterov @ 2007-04-04  8:57 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andrew Morton, Ulrich Drepper, Linux Kernel, Gautham R Shenoy,
    Dipankar Sarma, Paul Jackson

On 04/04, Ingo Molnar wrote:
>
> * Oleg Nesterov <oleg@tv-sign.ru> wrote:
>
> > But we don't need tasklist_lock at all, we can use
> > rcu_read_lock/unlock.  Q: don't we need task_rq_lock() to read
> > ->cpus_allowed "atomically"?
>
> right now ->cpus_allowed is protected by tasklist_lock.  We cannot do
> RCU here because ->cpus_allowed modifications are not RCUified.

Is it so?  That was my question.  Afaics, set_cpus_allowed() does
p->cpus_allowed = new_mask under rq->lock, so I don't understand how
tasklist_lock can help.  Could you clarify?

Oleg.
* Re: getting processor numbers
From: Ingo Molnar @ 2007-04-04 10:01 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrew Morton, Ulrich Drepper, Linux Kernel, Gautham R Shenoy,
    Dipankar Sarma, Paul Jackson

* Oleg Nesterov <oleg@tv-sign.ru> wrote:

> On 04/04, Ingo Molnar wrote:
> >
> > * Oleg Nesterov <oleg@tv-sign.ru> wrote:
> >
> > > But we don't need tasklist_lock at all, we can use
> > > rcu_read_lock/unlock.  Q: don't we need task_rq_lock() to read
> > > ->cpus_allowed "atomically"?
> >
> > right now ->cpus_allowed is protected by tasklist_lock.  We cannot do
> > RCU here because ->cpus_allowed modifications are not RCUified.
>
> Is it so?  That was my question.  Afaics, set_cpus_allowed() does
> p->cpus_allowed = new_mask under rq->lock, so I don't understand how
> tasklist_lock can help.

you are right, we could (and should) make this depend on rq_lock only -
i.e. just take away the tasklist_lock like your patch does.  It's not
like the user could expect to observe any ordering between PID lookup
and affinity-mask changes.  And my RCU comment is bogus: it's not like
we allocate ->cpus_allowed :-/

	Ingo
* Re: getting processor numbers
From: Paul Jackson @ 2007-04-04  2:58 UTC (permalink / raw)
To: Ulrich Drepper; +Cc: akpm, linux-kernel, ego, dipankar

Ulrich wrote:
> But all this of course does not solve the issue sysconf() has.  In
> sysconf we cannot use sched_getaffinity since all the system's CPUs must
> be reported.

Good point.  Yes, sysconf(_SC_NPROCESSORS_CONF) really needs to continue
returning the number of CPUs online, to maintain compatibility with the
current implementation, and because with a name like that, just about
anything else would be "surprising".

And OpenMP shouldn't be calling sysconf(_SC_NPROCESSORS_CONF), as it does
not want to know how much hardware is there, but rather how much hardware
it is allowed to use.  Something based on sched_getaffinity() would seem
to be ideal for its purposes.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
* Re: getting processor numbers
From: Paul Jackson @ 2007-04-04  3:04 UTC (permalink / raw)
To: Ulrich Drepper; +Cc: akpm, linux-kernel, ego, dipankar

Ulrich wrote:
> For sysconf() we still need better support.  Maybe now somebody will
> step up and say they need faster sysconf as well.

That won't be me ;).

For any kernel compiled with CONFIG_CPUSETS (which includes the major
distros I am aware of), one can just count the bits in the top cpuset's
'cpus' value, which is always the online CPU mask.  Even if your user
level software is making no conscious use of cpusets, this works, for
kernels so configured.

If there is someone who needs both a faster sysconf(_SC_NPROCESSORS_CONF)
and a kernel without CPUSETS configured, then they will have to speak up
for themselves.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
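[Editorial note: an illustrative helper, not from the thread, for the bit-counting Paul describes. The cpuset 'cpus' file holds a cpulist string such as "0-3,8"; where it lives depends on where the cpuset filesystem is mounted (commonly /dev/cpuset/cpus in that era), so only the string parsing is sketched here.]

```c
/* Count the CPUs named by a cpulist string like "0-3,8,10-11",
 * the format used by the cpuset 'cpus' file. */
#include <stdlib.h>

int count_cpulist(const char *s)
{
	int count = 0;

	while (*s) {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi = lo;

		if (end == s)
			break;		/* malformed input, stop parsing */
		if (*end == '-') {	/* a "lo-hi" range */
			s = end + 1;
			hi = strtol(s, &end, 10);
		}
		count += hi - lo + 1;
		s = (*end == ',') ? end + 1 : end;
	}
	return count;
}
```

So reading the top cpuset's 'cpus' file and feeding it through this gives the online-CPU count Paul mentions, with no /proc/cpuinfo scan.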
* Re: getting processor numbers
From: Paul Jackson @ 2007-04-04  2:52 UTC (permalink / raw)
To: Andrew Morton; +Cc: drepper, linux-kernel, ego, dipankar, cpw

Andrew wrote:
> Paul, could you please describe what cpusets' policy is in the presence
> of CPU addition and removal?

Currently, if we remove the last CPU in a cpuset, we give that cpuset the
CPUs of its parent cpuset, in order to ensure that every cpuset with
tasks attached actually has some CPUs they can run on.  See the routines
kernel/cpuset.c:guarantee_online_cpus_mems_in_subtree() and
kernel/cpuset.c:guarantee_online_cpus(), and close your eyes to the
recursion ... yeah, that has to get fixed someday ;).

But Cliff Wickman <cpw@sgi.com> (added to CC) figured out that this was
broken, as it could easily violate the cpu_exclusive property of cpusets.
He is working on a patch that will move the tasks in the CPU-deficient
cpuset up to their parent cpuset.

In general, the idea is to ensure that every task has at least one CPU on
which it can run.  If the sysadmin intends to unplug all the CPUs
required by some task, he "should" first move those tasks somewhere else
where they can continue to run.  If he doesn't, then the kernel "makes
do", by either (currently) adding CPUs to the CPU-deficient cpuset, or
(future) moving the CPU-deprived tasks somewhere else.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
* Re: getting processor numbers
From: Paul Jackson @ 2007-04-04  2:04 UTC (permalink / raw)
To: Andrew Morton; +Cc: drepper, linux-kernel

Andrew wrote:
> I'd have thought that in general an application should be querying its
> present affinity mask - something like sched_getaffinity()?  That fixes
> the CPU hotplug issues too, of course.

The sched_getaffinity call is quick, too, and it nicely reflects any
cpuset constraints, while still working on kernels which don't have
CPUSETs configured.

There are really at least four "number of CPUs" answers here, and we
should be aware of which we are providing.  There are, in order of
decreasing size:
 1) the size of the kernel's cpumask_t (NR_CPUS),
 2) the maximum number of CPUs that might ever be hotplugged into a
    booted system,
 3) the current number of CPUs online in that system, and
 4) the number of CPUs that the current task is allowed to use.

I would suggest that (4) is what we should typically return.  Certainly
it would seem that the use that Ulrich is concerned with, by OpenMP,
wants (4).

Currently, sysconf(_SC_NPROCESSORS_CONF) returns (3), by counting the
CPUs in /proc/stat, which is rather bogus on cpuset- or even
sched_setaffinity-constrained systems.

> But we discussed this all a couple years back and it was decided that
> sched_getaffinity() was unsuitable.  I remember at the time not really
> understanding why?

Perhaps it was because a robust invocation of sched_getaffinity takes a
page of code to write, as Andi Kleen noticed in his libnuma coding of
"static int number_of_cpus(void)".  One has to size the mask passed in,
in case the kernel was compiled with a larger cpumask_t size than you
guessed up front.

In other words, to get (4) using sched_getaffinity, one first needs an
upper bound on (1), above, the kernel's configured NR_CPUS.  One can
either size it by repeatedly invoking sched_getaffinity with larger
masks until it stops failing EINVAL, or one can examine the length of
the "Cpus_allowed" mask displayed in /proc/self/status (it takes 9 ascii
chars, if you include the commas and trailing newline, to display each
32 bits of cpumask_t).  There may be other ways as well; those seem to
be the most common.

At least the kernel cpumask_t size can be cached for the life of the
process and any descendents, so the cost of obtaining it should be less
critical.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
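[Editorial note: a sketch, not from the thread, of the "grow the mask until it stops failing EINVAL" probing Paul describes. It uses glibc's dynamically sized CPU-set macros (CPU_ALLOC and friends), which are a later glibc addition than the code discussed here; the starting size and upper bound are arbitrary choices.]

```c
/* Robustly count the CPUs the calling task may use, without knowing
 * the kernel's NR_CPUS up front: retry with a doubled mask whenever
 * the kernel rejects our buffer as too small. */
#define _GNU_SOURCE
#include <sched.h>

int robust_usable_cpus(void)
{
	/* start at a plausible size and double on failure */
	for (int ncpus = 1024; ncpus <= (1 << 20); ncpus *= 2) {
		cpu_set_t *set = CPU_ALLOC(ncpus);
		size_t size = CPU_ALLOC_SIZE(ncpus);

		if (!set)
			return -1;
		if (sched_getaffinity(0, size, set) == 0) {
			int n = CPU_COUNT_S(size, set);
			CPU_FREE(set);
			return n;	/* answer (4) from the list above */
		}
		CPU_FREE(set);		/* too small (EINVAL); grow and retry */
	}
	return -1;
}
```

The result (and the mask size that finally worked) can be cached for the life of the process, as Paul notes.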
* Re: getting processor numbers
From: Jakub Jelinek @ 2007-04-04  6:47 UTC (permalink / raw)
To: Paul Jackson; +Cc: Andrew Morton, drepper, linux-kernel

On Tue, Apr 03, 2007 at 07:04:58PM -0700, Paul Jackson wrote:
> Andrew wrote:
> > I'd have thought that in general an application should be querying its
> > present affinity mask - something like sched_getaffinity()?  That
> > fixes the CPU hotplug issues too, of course.
>
> The sched_getaffinity call is quick, too, and it nicely reflects any
> cpuset constraints, while still working on kernels which don't have
> CPUSETs configured.
>
> There are really at least four "number of CPUs" answers here, and we
> should be aware of which we are providing.  There are, in order of
> decreasing size:
>  1) the size of the kernel's cpumask_t (NR_CPUS),
>  2) the maximum number of CPUs that might ever be hotplugged into a
>     booted system,
>  3) the current number of CPUs online in that system, and
>  4) the number of CPUs that the current task is allowed to use.
>
> I would suggest that (4) is what we should typically return.  Certainly
> it would seem that the use that Ulrich is concerned with, by OpenMP,
> wants (4).
>
> Currently, sysconf(_SC_NPROCESSORS_CONF) returns (3), by counting the
> CPUs in /proc/stat, which is rather bogus on cpuset, or even
> sched_setaffinity, constrained systems.

OpenMP wants (4) and I'll change it that way.

sysconf(_SC_NPROCESSORS_ONLN) must return (3) (this currently scans
/proc/stat) and sysconf(_SC_NPROCESSORS_CONF) should IMHO return (2)
(this currently scans /proc/cpuinfo on alpha and sparc{,64} for
((ncpus|CPUs) probed|cpus detected) and for the rest just returns
sysconf(_SC_NPROCESSORS_ONLN)).

Neither of the sysconf returned values should be affected by affinity.

	Jakub
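[Editorial note: a quick check, not from the thread, of the two sysconf values Jakub distinguishes. Under his proposed semantics, _CONF (configured CPUs) can only exceed or equal _ONLN (currently online), and neither reflects affinity.]

```c
/* The two CPU counts glibc's sysconf() exposes, per the discussion:
 * _SC_NPROCESSORS_CONF -> answer (2), _SC_NPROCESSORS_ONLN -> (3). */
#include <unistd.h>

long conf_cpus(void)
{
	return sysconf(_SC_NPROCESSORS_CONF);
}

long online_cpus(void)
{
	return sysconf(_SC_NPROCESSORS_ONLN);
}
```

On a system with no CPUs hot-unplugged the two values coincide; after an unplug only _ONLN should drop.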
* Re: getting processor numbers
From: Paul Jackson @ 2007-04-04  7:02 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: akpm, drepper, linux-kernel

Jakub wrote:
> OpenMP wants (4) and I'll change it that way.

Good.  Are you referring to a change in glibc or in OpenMP?

> sysconf(_SC_NPROCESSORS_ONLN) must return (3) (this currently scans
> /proc/stat)

Ok

> sysconf(_SC_NPROCESSORS_CONF) should IMHO return (2) (this currently
> scans /proc/cpuinfo on alpha and sparc{,64} for ((ncpus|CPUs)
> probed|cpus detected) and for the rest just returns
> sysconf(_SC_NPROCESSORS_ONLN)).

Not quite what I would have guessed, but seems ok.  Thanks for spelling
this out.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
* Re: getting processor numbers
From: Cliff Wickman @ 2007-04-04 14:51 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: linux-kernel, pj

On Wed, Apr 04, 2007 at 02:47:32AM -0400, Jakub Jelinek wrote:
> On Tue, Apr 03, 2007 at 07:04:58PM -0700, Paul Jackson wrote:
> > There are really at least four "number of CPUs" answers here, and we
> > should be aware of which we are providing.  There are, in order of
> > decreasing size:
> >  1) the size of the kernel's cpumask_t (NR_CPUS),
> >  2) the maximum number of CPUs that might ever be hotplugged into a
> >     booted system,
> >  3) the current number of CPUs online in that system, and
> >  4) the number of CPUs that the current task is allowed to use.
>
> sysconf(_SC_NPROCESSORS_CONF) should IMHO return (2) (this currently
> scans /proc/cpuinfo on alpha and sparc{,64} for ((ncpus|CPUs)
> probed|cpus detected) and for the rest just returns
> sysconf(_SC_NPROCESSORS_ONLN)).
> Neither of the sysconf returned values should be affected by affinity.

I'm looking at an ia64 system, and when a cpu is hot-unplugged it is
removed from /proc/cpuinfo.  Wouldn't /sys/devices/system/cpu/ be a
better source for (2)?

--
Cliff Wickman
Silicon Graphics, Inc.
cpw@sgi.com  (651) 683-3824
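[Editorial note: a sketch, not from the thread, of the sysfs-based counting Cliff suggests, the same readdir approach Ulrich benchmarked in the opening message. It counts entries of the form "cpuN" under /sys/devices/system/cpu; the digit check skips non-CPU entries such as "cpufreq" or "online".]

```c
/* Count CPUs by enumerating cpuN directories in sysfs. */
#include <ctype.h>
#include <dirent.h>
#include <string.h>

int sysfs_cpu_count(void)
{
	DIR *dir = opendir("/sys/devices/system/cpu");
	struct dirent *d;
	int count = 0;

	if (!dir)
		return -1;	/* sysfs not mounted */
	while ((d = readdir(dir)) != NULL) {
		/* match "cpu" followed by a digit: cpu0, cpu1, ... */
		if (strncmp(d->d_name, "cpu", 3) == 0 &&
		    isdigit((unsigned char)d->d_name[3]))
			count++;
	}
	closedir(dir);
	return count;
}
```

As the opening message's timings show, this readdir walk is faster than parsing /proc/cpuinfo but still slower than a single stat() of the directory, which is what motivated the st_nlink proposal.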
Thread overview: 56+ messages
2007-04-03 16:54 getting processor numbers Ulrich Drepper
2007-04-03 17:30 ` linux-os (Dick Johnson)
2007-04-03 17:37 ` Ulrich Drepper
2007-04-03 17:56 ` Dr. David Alan Gilbert
2007-04-03 18:11 ` Andi Kleen
2007-04-03 17:17 ` Ulrich Drepper
2007-04-03 17:22 ` Alan Cox
2007-04-03 17:30 ` Andi Kleen
2007-04-03 20:24 ` Jeremy Fitzhardinge
2007-04-03 17:27 ` Andi Kleen
2007-04-03 17:30 ` Ulrich Drepper
2007-04-03 17:35 ` Andi Kleen
2007-04-03 17:45 ` Ulrich Drepper
2007-04-03 17:58 ` Andi Kleen
2007-04-03 18:05 ` Ulrich Drepper
2007-04-03 18:11 ` Andi Kleen
2007-04-03 18:21 ` Ulrich Drepper
2007-04-03 17:44 ` Siddha, Suresh B
2007-04-03 17:59 ` Ulrich Drepper
2007-04-03 19:40 ` Jakub Jelinek
2007-04-03 20:13 ` Ingo Oeser
2007-04-03 23:38 ` J.A. Magallón
2007-04-03 19:55 ` Ulrich Drepper
2007-04-03 20:13 ` Siddha, Suresh B
2007-04-03 20:19 ` Ulrich Drepper
2007-04-03 20:32 ` Eric Dumazet
2007-04-03 20:20 ` Nathan Lynch
2007-04-03 19:15 ` Davide Libenzi
2007-04-03 19:32 ` Ulrich Drepper
2007-04-04 0:31 ` H. Peter Anvin
2007-04-04 0:35 ` Jeremy Fitzhardinge
2007-04-04 0:38 ` H. Peter Anvin
2007-04-04 5:09 ` Eric Dumazet
2007-04-04 5:16 ` H. Peter Anvin
2007-04-04 5:22 ` Jeremy Fitzhardinge
2007-04-04 5:40 ` H. Peter Anvin
2007-04-04 5:46 ` Eric Dumazet
2007-04-04 5:29 ` Eric Dumazet
2007-04-03 20:16 ` Andrew Morton
[not found] ` <4612BB89.8040102@redhat.com>
[not found] ` <20070403141348.9bcdb13e.akpm@linux-foundation.org>
2007-04-03 22:13 ` Ulrich Drepper
2007-04-03 22:48 ` Andrew Morton
2007-04-03 23:00 ` Ulrich Drepper
2007-04-03 23:23 ` Andrew Morton
2007-04-03 23:54 ` Ulrich Drepper
2007-04-04 2:55 ` Paul Jackson
2007-04-04 8:39 ` Oleg Nesterov
2007-04-04 9:39 ` Ingo Molnar
2007-04-04 8:57 ` Oleg Nesterov
2007-04-04 10:01 ` Ingo Molnar
2007-04-04 2:58 ` Paul Jackson
2007-04-04 3:04 ` Paul Jackson
2007-04-04 2:52 ` Paul Jackson
2007-04-04 2:04 ` Paul Jackson
2007-04-04 6:47 ` Jakub Jelinek
2007-04-04 7:02 ` Paul Jackson
2007-04-04 14:51 ` Cliff Wickman