* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu()
       [not found] ` <200606160822.23898.ak@suse.de>
@ 2006-06-16  9:48   ` Jes Sorensen
  2006-06-16 10:09     ` Andi Kleen
  0 siblings, 1 reply; 27+ messages in thread
From: Jes Sorensen @ 2006-06-16 9:48 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Tony Luck, discuss, linux-kernel, libc-alpha, vojtech, linux-ia64

>>>>> "Andi" == Andi Kleen <ak@suse.de> writes:

Andi> On Thursday 15 June 2006 20:44, Tony Luck wrote:
>> Another alternative would be to provide a mechanism for a process
>> to bind to the current cpu (whatever cpu that happens to be). Then
>> the kernel gets to make the smart placement decisions, and
>> processes that want to be bound somewhere (but don't really care
>> exactly where) have a way to meet their need. Perhaps a cpumask of
>> all zeroes to a sched_setaffinity call could be overloaded for
>> this?

Andi> I tried something like this a few years ago and it just didn't
Andi> work (or rather ran usually slower). The scheduler would select
Andi> a home node at startup and then try to move the process there.
Andi> The problem is that not using a CPU costs you much more than
Andi> whatever overhead you get from using non-local memory.

It all depends on your application and the type of system you are
running on. What you say applies to smaller CPU counts. However, once
the upcoming larger-count multi-core CPUs become commonly available,
this is likely to change and become more like what is seen today on
larger NUMA systems.

In the scientific application space, there are two very common
groupings of jobs. One is simply a large threaded application with a
lot of intercommunication, often via MPI. In many cases one ends up
running a job on just a subset of the system, in which case you want
to see threads placed on the same node(s) to minimize internode
communication.
It is also desirable to force the other tasks on the system (system
daemons etc.) onto other node(s) to reduce noise, and there could also
be space to run another parallel job on the remaining node(s).

The other common case is to have jobs which spawn off a number of
threads that work together in groups (via OpenMP). In this case you
would like to have all your OpenMP threads placed on the same node for
similar reasons. Not getting this right can result in significant loss
of performance for jobs which are highly memory bound or rely heavily
on intercommunication and synchronization.

Andi> So by default filling the CPUs must be the highest priority and
Andi> memory policy cannot interfere with that.

I really don't think this approach is going to solve the problem. As
Tony also points out, tasks will eventually migrate. The user needs to
tell the kernel where it wants to run the tasks, rather than the
kernel telling the task where it is located. Only the application (or
developer/user) knows how the threads are expected to behave; doing
this automatically is almost never going to be optimal. Obviously the
user needs visibility of the topology of the machine to do so, but
that should be available on any NUMA system through /proc or /sys.

In the scientific space the jobs are often run repeatedly with new
data sets every time, so it is worthwhile to spend the effort up front
to get the placement right. One-off runs are obviously something else,
and there your method is going to be more beneficial.

IMHO, what we really need is a more advanced way for user applications
to hint to the kernel how to place their threads.

Cheers,
Jes

^ permalink raw reply	[flat|nested] 27+ messages in thread
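[Editor's note: Tony's "bind to the current cpu" suggestion quoted above can be sketched with the affinity interfaces glibc later shipped; sched_getcpu() is essentially the wrapper the getcpu/vgetcpu work eventually turned into. This is an illustrative sketch, not code from the thread, and bind_to_current_cpu() is an invented name.]

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling thread to whichever CPU it happens to run on right
 * now.  Note the inherent race: the thread may migrate between
 * sched_getcpu() and sched_setaffinity(); either way, after the call
 * it is firmly bound to the CPU named in the mask.
 * Returns the CPU we ended up bound to, or -1 on error. */
int bind_to_current_cpu(void)
{
	cpu_set_t set;
	int cpu = sched_getcpu();

	if (cpu < 0)
		return -1;
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	if (sched_setaffinity(0, sizeof(set), &set) != 0)
		return -1;
	return cpu;
}
```

This is what Tony's overloaded all-zeroes cpumask would amount to, done from user space in two calls instead of one.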
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu()
  2006-06-16  9:48   ` FOR REVIEW: New x86-64 vsyscall vgetcpu() Jes Sorensen
@ 2006-06-16 10:09     ` Andi Kleen
  2006-06-16 11:02       ` Jes Sorensen
  0 siblings, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2006-06-16 10:09 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Tony Luck, discuss, linux-kernel, libc-alpha, vojtech, linux-ia64

> It all depends on your application and the type of system you are
> running on. What you say applies to smaller CPU counts. However, once
> the upcoming larger-count multi-core CPUs become commonly available,
> this is likely to change and become more like what is seen today on
> larger NUMA systems.

Maybe. Maybe not.

> In the scientific application space, there are two very common
> groupings of jobs.

The scientific users just use pinned CPUs and seem to be happy with
that. They also have cheap slav^wgrade students to spend lots of time
on manual tuning. I'm not concerned about them.

If you already use CPU affinity you should already know where you are
and don't need this call at all. So this clearly isn't targeted at
them.

Interesting is getting the best performance from general-purpose
applications without any special tuning. For them I'm trying to
improve things. Number one applications currently are databases and
JVMs. I hope with Wolfram's malloc work it will be useful for more
applications too.

> Andi> So by default filling the CPUs must be the highest priority and
> Andi> memory policy cannot interfere with that.
>
> I really don't think this approach is going to solve the problem. As
> Tony also points out, tasks will eventually migrate.

Currently we don't solve this problem with the standard heuristics. It
can be solved with manual tuning (mempolicy, explicit CPU affinity),
but if you're doing that you're already outside the primary use case
of vgetcpu().

vgetcpu() is only trying to be an incremental improvement of the
current simple default local policy.
> The user needs to

Scientific users do that, but other users normally don't. I doubt that
is going to change.

-Andi
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu()
  2006-06-16 10:09     ` Andi Kleen
@ 2006-06-16 11:02       ` Jes Sorensen
  2006-06-16 11:17         ` Andi Kleen
  0 siblings, 1 reply; 27+ messages in thread
From: Jes Sorensen @ 2006-06-16 11:02 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Tony Luck, discuss, linux-kernel, libc-alpha, vojtech, linux-ia64

Andi Kleen wrote:
>> In the scientific application space, there are two very common
>> groupings of jobs.
>
> The scientific users just use pinned CPUs and seem to be happy with
> that. They also have cheap slav^wgrade students to spend lots of time
> on manual tuning. I'm not concerned about them.

Do they? There are a lot of scientific sites out there which are not
universities or research organizations. They do not have free slave
labour at hand. A lot of users fall into this category, especially the
users with larger systems or large clusters (be it ia64, x86_64 or
PPC).

> If you already use CPU affinity you should already know where you are
> and don't need this call at all.

Except that what's currently available isn't sufficient to do what is
needed.

> So this clearly isn't targeted at them.
>
> Interesting is getting the best performance from general-purpose
> applications without any special tuning. For them I'm trying to
> improve things.

Well, I am interested in getting the best performance for some of the
same applications, without having to modify them. The current affinity
support simply isn't sufficient for that. Placement has to be targeted
at launch time, since thread implementations can change the layout
etc.

> Number one applications currently are databases and JVMs. I hope with
> Wolfram's malloc work it will be useful for more applications too.

If you want this to work for general-purpose applications, then how is
this new syscall going to help? If you expect application vendors to
code for it, that means few users will benefit.

>> I really don't think this approach is going to solve the problem.
>> As Tony also points out, tasks will eventually migrate.
>
> Currently we don't solve this problem with the standard heuristics.
> It can be solved with manual tuning (mempolicy, explicit CPU
> affinity), but if you're doing that you're already outside the
> primary use case of vgetcpu().

This is another area where the kernel could do better, by possibly
using the cpumask to determine where it will allocate memory.

> vgetcpu() is only trying to be an incremental improvement of the
> current simple default local policy.

As Tony rightfully pointed out, tasks do migrate. By making this guess
initially and then expecting the application to run for a long time,
you will end up with it having zero or possibly a negative effect.

>> The user needs to
>
> Scientific users do that, but other users normally don't. I doubt
> that is going to change.

I just use scientific users since that's where I have the most recent
detailed data from. Databases could well benefit from what I
mentioned, though the serious ones would want to look into using
affinity support explicitly in their code.

Cheers,
Jes
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu()
  2006-06-16 11:02       ` Jes Sorensen
@ 2006-06-16 11:17         ` Andi Kleen
  2006-06-16 11:58           ` Jes Sorensen
  0 siblings, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2006-06-16 11:17 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Tony Luck, discuss, linux-kernel, libc-alpha, vojtech, linux-ia64

> The current affinity support simply isn't sufficient for that.
> Placement has to be targeted at launch time, since thread
> implementations can change the layout etc.

I'm not sure how that's related to vgetcpu(), but ok ...

In general, if you want to affect placement below the process / shared
memory segment level, you should change the application. Anything else
just results in a big, messy, unreliable and fragile user command-line
interface - a quick look at the respective Irix manpage should make
that clear.

> > Number one applications currently are databases and JVMs. I hope
> > with Wolfram's malloc work it will be useful for more applications
> > too.
>
> If you want this to work for general-purpose applications, then how
> is this new syscall going to help?

It will improve their malloc(). They don't know anything about NUMA,
but getting local memory will help them. They already get local memory
now from the kernel when they use big allocations, but for smaller
allocations it doesn't work, because the kernel can't give out
anything smaller than a page. This would be solved by a NUMA-aware
malloc, but it needs vgetcpu() for this if it should work without
fixed CPU affinity.

Basically it is just extending the existing, already used and proven
default local policy to sub-page granularity. There might be other
uses for it too (like per-CPU data), although I expect most use of
that in user space can already be done using TLS.

JVMs and databases will use it too, but since they often use their own
allocators they will need to be modified.

> If you expect application vendors to code for it, that means few
> users will benefit.
Most applications use malloc().

> >> I really don't think this approach is going to solve the problem.
> >> As Tony also points out, tasks will eventually migrate.
> >
> > Currently we don't solve this problem with the standard heuristics.
> > It can be solved with manual tuning (mempolicy, explicit CPU
> > affinity), but if you're doing that you're already outside the
> > primary use case of vgetcpu().
>
> This is another area where the kernel could do better, by possibly
> using the cpumask to determine where it will allocate memory.

Modify fallback lists based on CPU affinity? That would get messy in
the code because you couldn't easily precompute them anymore. But
cpusets already does this, kind of, even though it has a quite bad
impact on fast paths. Also, what happens if the affinity mask is
modified later? From the semantics point of view it is also a little
dubious to mesh them together. My feeling is that as a heuristic it is
probably dubious. Also, when you set CPU affinity you can just as well
set memory policy with it.

> > vgetcpu() is only trying to be an incremental improvement of the
> > current simple default local policy.
>
> As Tony rightfully pointed out, tasks do migrate. By making this
> guess initially

The gamble is already there in the local policy. No change at all.
When you already got local memory you can use it better with
vgetcpu() though.

From our experience it works out in most cases - in general, most
benchmarks show better performance with a simple local NUMA policy
than SMP mode or no policy. In the cases where it doesn't, you have to
either eat the slowdown or use manual tuning.

> I just use scientific users since that's where I have the most
> recent detailed data from. Databases could well benefit from what I
> mentioned, though the serious ones would want to look into using
> affinity support explicitly in their code.
No, exactly not - I got requests from "serious" databases to offer
vgetcpu() because affinity is too complicated to configure and manage.

It sounds like you want to solve NUMA world hunger here, rather than
concentrate on the specific small incremental improvement vgetcpu() is
trying to offer.

I'm sure there is much research that could be done in the general NUMA
tuning area, but I would suggest making it research with numbers
first, before trying to hack anything like this into the kernel
without a clear understanding.

-Andi
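[Editor's note: the NUMA-aware malloc Andi describes above - per-CPU free lists indexed by a cheap "which CPU am I on?" call - can be sketched as below. sched_getcpu() stands in for the proposed vgetcpu(); the function names are invented, a single size class is assumed for brevity, and the lists are unlocked, so this is an illustration of the idea, not an allocator.]

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>

#define MAX_CPUS   1024
#define BLOCK_SIZE 256		/* one size class, for brevity */

/* One free-list head per CPU: blocks freed on a CPU are handed out
 * again on that CPU, so sub-page allocations tend to stay local. */
static void *cpu_pool[MAX_CPUS];

static int cur_cpu(void)
{
	int cpu = sched_getcpu();	/* the role vgetcpu() plays: cheap,
					 * no kernel entry, may be stale */
	return (cpu < 0 || cpu >= MAX_CPUS) ? 0 : cpu;
}

void *numa_alloc_block(void)
{
	void **head = &cpu_pool[cur_cpu()];

	if (*head) {			/* reuse a block freed on this CPU */
		void *block = *head;
		*head = *(void **)block;
		return block;
	}
	return malloc(BLOCK_SIZE);	/* refill from the global allocator */
}

void numa_free_block(void *block)
{
	void **head = &cpu_pool[cur_cpu()];

	*(void **)block = *head;	/* push onto this CPU's free list */
	*head = block;
}
```

A stale CPU number here only costs locality, never correctness - which is exactly why an unsynchronized, possibly-outdated vgetcpu() answer is good enough for this use.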
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu()
  2006-06-16 11:17         ` Andi Kleen
@ 2006-06-16 11:58           ` Jes Sorensen
  2006-06-16 12:36             ` Zoltan Menyhart
  2006-06-16 14:54             ` Andi Kleen
  0 siblings, 2 replies; 27+ messages in thread
From: Jes Sorensen @ 2006-06-16 11:58 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Tony Luck, discuss, linux-kernel, libc-alpha, vojtech, linux-ia64

Andi Kleen wrote:
>> The current affinity support simply isn't sufficient for that.
>> Placement has to be targeted at launch time, since thread
>> implementations can change the layout etc.
>
> I'm not sure how that's related to vgetcpu(), but ok ...
>
> In general, if you want to affect placement below the process /
> shared memory segment level, you should change the application.

That would be great, except that a lot of these applications are
'standard' applications which people don't write themselves. Sometimes
the source code is no longer available. We could argue that people
should just rewrite their applications, but in reality this isn't
what's happening.

> It will improve their malloc(). They don't know anything about NUMA,
> but getting local memory will help them. They already get local
> memory now from the kernel when they use big allocations, but for
> smaller allocations it doesn't work, because the kernel can't give
> out anything smaller than a page. This would be solved by a
> NUMA-aware malloc, but it needs vgetcpu() for this if it should work
> without fixed CPU affinity.

I really don't see the benefit here. malloc already gets pages handed
down from the kernel which are node-local, due to them being assigned
on a first-touch basis. I am not sure about glibc's malloc internals,
but rather than rely on a vgetcpu() call, all it really needs to do is
keep a thread-local pool which will automatically end up local through
first-touch usage. I don't see how a new syscall is going to provide
anything to malloc that it doesn't already have. What am I missing?
> Basically it is just extending the existing, already used and proven
> default local policy to sub-page granularity. There might be other
> uses for it too (like per-CPU data), although I expect most use of
> that in user space can already be done using TLS.

The thread libraries already have their own thread-local area, which
should be allocated on the thread's own node if done right - which I
assume it is.

> JVMs and databases will use it too, but since they often use their
> own allocators they will need to be modified.

I would assume the real databases to be smart enough to benefit from
first touch already. JVMs... well, who knows; can't say I have a lot
of faith in anything running in a JVM :)

>> If you expect application vendors to code for it, that means few
>> users will benefit.
>
> Most applications use malloc()

Which doesn't need the vgetcpu() call, as far as I can see.

>> This is another area where the kernel could do better, by possibly
>> using the cpumask to determine where it will allocate memory.
>
> Modify fallback lists based on CPU affinity?

It's a hint, not guaranteed placement. You have the same problem if
you try to allocate memory on a node and there's nothing left there.

> But cpusets already does this, kind of, even though it has a quite
> bad impact on fast paths. Also, what happens if the affinity mask is
> modified later? From the semantics point of view it is also a little
> dubious to mesh them together. My feeling is that as a heuristic it
> is probably dubious.

If you migrate your app elsewhere, you should migrate the pages with
it, or not expect things to run with the local effect.

> The gamble is already there in the local policy. No change at all.
> When you already got local memory you can use it better with
> vgetcpu() though.
>
> From our experience it works out in most cases - in general, most
> benchmarks show better performance with a simple local NUMA policy
> than SMP mode or no policy.
Could you share some information about the type of benchmarks?

>> I just use scientific users since that's where I have the most
>> recent detailed data from. Databases could well benefit from what I
>> mentioned, though the serious ones would want to look into using
>> affinity support explicitly in their code.
>
> No, exactly not - I got requests from "serious" databases to offer
> vgetcpu() because affinity is too complicated to configure and
> manage.
>
> It sounds like you want to solve NUMA world hunger here, rather than
> concentrate on the specific small incremental improvement vgetcpu()
> is trying to offer.

I don't really see the point in solving something halfway when it can
be done better. Maybe the "serious" databases should open up and let
us know what the problem is they are hitting.

> I'm sure there is much research that could be done in the general
> NUMA tuning area, but I would suggest making it research with
> numbers first, before trying to hack anything like this into the
> kernel without a clear understanding.

Well, I did spend a good chunk of time looking at some of this a while
ago, and I did speak a lot to one of my colleagues who actually runs
benchmarks using some of these tools to understand the impact. If
anything, it seems that vgetcpu is the one still in the research
stage.

Cheers,
Jes
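[Editor's note: the first-touch behavior Jes relies on above can be illustrated as follows. alloc_local() is a made-up name, and whether the pages actually land on the local node depends on the default local policy being in effect; this is a hedged sketch, not anything from the thread.]

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>

/* mmap() hands back anonymous pages that have no node yet; under the
 * default local policy the kernel picks a node for each page when it
 * is first written.  Faulting the pages in from the thread that will
 * use them therefore makes them node-local - no vgetcpu() needed. */
void *alloc_local(size_t bytes)
{
	void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	memset(p, 0, bytes);	/* the "first touch": pages get a node here */
	return p;
}
```

This is exactly the mechanism a thread-local malloc pool exploits: as long as each thread first touches its own pool, the pool is local without the allocator ever asking which CPU it is on.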
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu()
  2006-06-16 11:58           ` Jes Sorensen
@ 2006-06-16 12:36             ` Zoltan Menyhart
  2006-06-16 12:41               ` Jes Sorensen
  2006-06-16 14:56               ` Andi Kleen
  2006-06-16 14:54             ` Andi Kleen
  1 sibling, 2 replies; 27+ messages in thread
From: Zoltan Menyhart @ 2006-06-16 12:36 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Andi Kleen, Tony Luck, discuss, linux-kernel, libc-alpha,
	vojtech, linux-ia64

Just to make sure I understand it correctly...

Assuming I have allocated per-CPU data (NUMA control, etc.) pointed at
by:

	void *per_cpu[MAXCPUS];

and assuming a per-CPU variable has got an "offset" in each per-CPU
data area, accessing this variable can be done as follows:

	err = vgetcpu(&my_cpu, ...);
	if (err)
		goto ...
	pointer = (typeof(pointer)) (per_cpu[my_cpu] + offset);
	/* use "pointer"... */

It is a hundred times longer than "__get_per_cpu(var)++". As we do not
know when we can be moved to another CPU, vgetcpu() has to be called
again after a "reasonably short" time.

My idea is to map the current task structure at an arch-dependent
virtual address into the user space (obviously RO):

	#define current ((struct task_struct *) 0x...)

No more need for vgetcpu() at all. The example above becomes:

	pointer = (typeof(pointer))
		(per_cpu[current->thread_info.cpu] + offset);
	/* use "pointer"... */

As obtaining "pointer" does not cost much, it can be re-calculated at
each usage => no problem knowing when to recheck it; there is less
chance of using the data of a neighbor.

Regards,

Zoltan Menyhart
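[Editor's note: Zoltan's first fragment, made runnable with sched_getcpu() standing in for the proposed vgetcpu(). The lazy calloc() of the per-CPU areas and the per_cpu_var() name are illustrative additions, not part of his sketch.]

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>

#define MAXCPUS 1024

static void *per_cpu[MAXCPUS];	/* one data area per CPU */

/* Return a pointer to the per-CPU variable living at "offset" in the
 * current CPU's data area, or NULL on error.  As in Zoltan's example,
 * the result can be stale the moment we return, if we migrate. */
void *per_cpu_var(size_t offset)
{
	int my_cpu = sched_getcpu();	/* stand-in for vgetcpu() */

	if (my_cpu < 0 || my_cpu >= MAXCPUS)
		return NULL;
	if (!per_cpu[my_cpu])
		per_cpu[my_cpu] = calloc(1, 4096);	/* lazy setup */
	if (!per_cpu[my_cpu])
		return NULL;
	return (char *)per_cpu[my_cpu] + offset;
}
```

The length of this helper, next to the kernel's one-line __get_per_cpu(var)++, is precisely the cost comparison Zoltan is drawing.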
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu()
  2006-06-16 12:36             ` Zoltan Menyhart
@ 2006-06-16 12:41               ` Jes Sorensen
  2006-06-16 12:48                 ` Zoltan Menyhart
  0 siblings, 1 reply; 27+ messages in thread
From: Jes Sorensen @ 2006-06-16 12:41 UTC (permalink / raw)
  To: Zoltan Menyhart
  Cc: Andi Kleen, Tony Luck, discuss, linux-kernel, libc-alpha,
	vojtech, linux-ia64

Zoltan Menyhart wrote:
> Just to make sure I understand it correctly...
> Assuming I have allocated per-CPU data (NUMA control, etc.) pointed
> at by:

I think you misunderstood - vgetcpu is for userland usage, not within
the kernel.

Cheers,
Jes
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu()
  2006-06-16 12:41               ` Jes Sorensen
@ 2006-06-16 12:48                 ` Zoltan Menyhart
  2006-06-16 21:04                   ` Chase Venters
  0 siblings, 1 reply; 27+ messages in thread
From: Zoltan Menyhart @ 2006-06-16 12:48 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Andi Kleen, Tony Luck, discuss, linux-kernel, libc-alpha,
	vojtech, linux-ia64

Jes Sorensen wrote:
> Zoltan Menyhart wrote:
>
>> Just to make sure I understand it correctly...
>> Assuming I have allocated per-CPU data (NUMA control, etc.) pointed
>> at by:
>
> I think you misunderstood - vgetcpu is for userland usage, not within
> the kernel.

I did understand it as user-land stuff. This is why I want to map the
current task structure into the user space. In user code, we could see
the actual value of "current->thread_info.cpu". My "#define current
((struct task_struct *) 0x...)" is not the same as the kernel's one.

Thanks,

Zoltan
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu()
  2006-06-16 12:48                 ` Zoltan Menyhart
@ 2006-06-16 21:04                   ` Chase Venters
  0 siblings, 0 replies; 27+ messages in thread
From: Chase Venters @ 2006-06-16 21:04 UTC (permalink / raw)
  To: Zoltan Menyhart
  Cc: Jes Sorensen, Andi Kleen, Tony Luck, discuss, linux-kernel,
	libc-alpha, vojtech, linux-ia64

On Fri, 16 Jun 2006, Zoltan Menyhart wrote:
> I did understand it as user-land stuff. This is why I want to map the
> current task structure into the user space. In user code, we could
> see the actual value of "current->thread_info.cpu". My "#define
> current ((struct task_struct *) 0x...)" is not the same as the
> kernel's one.

I think it's probably best to leave most of the stuff in task_struct
private (i.e., mapped in kernel only).

Thanks,
Chase
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu()
  2006-06-16 12:36             ` Zoltan Menyhart
  2006-06-16 12:41               ` Jes Sorensen
@ 2006-06-16 14:56               ` Andi Kleen
  2006-06-16 15:31                 ` Zoltan Menyhart
  2006-06-16 15:36                 ` Brent Casavant
  1 sibling, 2 replies; 27+ messages in thread
From: Andi Kleen @ 2006-06-16 14:56 UTC (permalink / raw)
  To: Zoltan Menyhart
  Cc: Jes Sorensen, Tony Luck, discuss, linux-kernel, libc-alpha,
	vojtech, linux-ia64

On Friday 16 June 2006 14:36, Zoltan Menyhart wrote:
> Just to make sure I understand it correctly...
> Assuming I have allocated per-CPU data (NUMA control, etc.) pointed
> at by:
>
>	void *per_cpu[MAXCPUS];

That is not how user-space TLS works. It usually has a base register.

> Assuming a per-CPU variable has got an "offset" in each per-CPU data
> area, accessing this variable can be done as follows:
>
>	err = vgetcpu(&my_cpu, ...);
>	if (err)
>		goto ...
>	pointer = (typeof(pointer)) (per_cpu[my_cpu] + offset);
>	/* use "pointer"... */
>
> It is a hundred times longer than "__get_per_cpu(var)++".

14 cycles is not 100 times longer.

> My idea is to map the current task structure at an arch-dependent
> virtual address into the user space (obviously RO):
>
>	#define current ((struct task_struct *) 0x...)

This means it cannot be cache colored (because you would need a static
offset) and you couldn't share task_structs on a page.

Also, you would make task_struct part of the userland ABI, which seems
like a very very bad idea to me. It means we couldn't change it
anymore.

-Andi
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu()
  2006-06-16 14:56               ` Andi Kleen
@ 2006-06-16 15:31                 ` Zoltan Menyhart
  2006-06-16 15:37                   ` Andi Kleen
  2006-06-16 21:12                   ` Chase Venters
  1 sibling, 2 replies; 27+ messages in thread
From: Zoltan Menyhart @ 2006-06-16 15:31 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jes Sorensen, Tony Luck, discuss, linux-kernel, libc-alpha,
	vojtech, linux-ia64

Andi Kleen wrote:

> That is not how user-space TLS works. It usually has a base register.

Can you please give me a real-life (simplified) example?

> This means it cannot be cache colored (because you would need a
> static offset) and you couldn't share task_structs on a page.

I do not see the problem. Can you explain please? E.g. the scheduler
pulls in a task other than the current one. The CPU will see the
"current->thread_info.cpu" of every task at the same offset anyway.

> Also, you would make task_struct part of the userland ABI, which
> seems like a very very bad idea to me. It means we couldn't change
> it anymore.

We can make some wrapper, e.g.:

	user_per_cpu_var(name, offset)

"vgetcpu()" would also be added to the ABI, which we couldn't change
easily either.

Thanks,

Zoltan
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu()
  2006-06-16 15:31                 ` Zoltan Menyhart
@ 2006-06-16 15:37                   ` Andi Kleen
  2006-06-16 15:58                     ` Jakub Jelinek
  0 siblings, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2006-06-16 15:37 UTC (permalink / raw)
  To: Zoltan Menyhart
  Cc: Jes Sorensen, Tony Luck, discuss, linux-kernel, libc-alpha,
	vojtech, linux-ia64

On Friday 16 June 2006 17:31, Zoltan Menyhart wrote:
> Andi Kleen wrote:
>
>> That is not how user-space TLS works. It usually has a base
>> register.
>
> Can you please give me a real-life (simplified) example?

On x86-64 it's just %fs:offset. gcc is a bit dumb about this and
usually loads the base address from %fs:0 first.

>> This means it cannot be cache colored (because you would need a
>> static offset) and you couldn't share task_structs on a page.
>
> I do not see the problem.

Your scheme relies on task_struct fields being at a known offset in
the page. But slab cache coloring varies the offset to make the data
spread out better in the caches.

> Can you explain please? E.g. the scheduler pulls in a task other
> than the current one. The CPU will see the
> "current->thread_info.cpu" of every task at the same offset anyway.

It varies relative to the start of the page. That was one of the
bigger wins relative to the task-struct-in-the-stack-page scheme 2.4
had.

>> Also, you would make task_struct part of the userland ABI, which
>> seems like a very very bad idea to me. It means we couldn't change
>> it anymore.
>
> We can make some wrapper, e.g.:
>
>	user_per_cpu_var(name, offset)

You would need to wrap everything, and likely users would like
task_struct so much that they accessed it anyway without your
wrappers.

> "vgetcpu()" would also be added to the ABI, which we couldn't change
> easily either.

Yes, but it's a defined function. No different from a system call.

-Andi
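[Editor's note: what the %fs-relative TLS access Andi mentions looks like from C. A minimal sketch; on x86-64 with direct segment references the increment below compiles to a single %fs-relative instruction, with no call to load the base first.]

```c
#include <pthread.h>

/* Each thread gets its own copy; the compiler addresses it relative
 * to the thread register (%fs on x86-64), so no table lookup and no
 * vgetcpu()-style query is needed to find the right copy. */
static __thread long counter;

static void *worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < 1000; i++)
		counter++;		/* touches only this thread's copy */
	return (void *)counter;		/* hand our private total back */
}
```

This is the "most per-CPU use in user space can already be done using TLS" point from earlier in the thread: per-thread data needs no CPU number at all.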
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu()
  2006-06-16 15:37                   ` Andi Kleen
@ 2006-06-16 15:58                     ` Jakub Jelinek
  2006-06-16 16:24                       ` Andi Kleen
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Jelinek @ 2006-06-16 15:58 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Zoltan Menyhart, Jes Sorensen, Tony Luck, discuss, linux-kernel,
	libc-alpha, vojtech, linux-ia64

On Fri, Jun 16, 2006 at 05:37:06PM +0200, Andi Kleen wrote:
> On Friday 16 June 2006 17:31, Zoltan Menyhart wrote:
> > Andi Kleen wrote:
> >
> > > That is not how user-space TLS works. It usually has a base
> > > register.
> >
> > Can you please give me a real-life (simplified) example?
>
> On x86-64 it's just %fs:offset. gcc is a bit dumb about this and
> usually loads the base address from %fs:0 first.

GCC is not dumb, unless you force it with -mno-tls-direct-seg-refs.
I guess you are bitten by the SUSE GCC hack which makes
-mno-tls-direct-seg-refs the default (especially on x86-64 it is a
really bad idea).

	Jakub
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu()
  2006-06-16 15:58                     ` Jakub Jelinek
@ 2006-06-16 16:24                       ` Andi Kleen
  2006-06-16 16:33                         ` Jakub Jelinek
  0 siblings, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2006-06-16 16:24 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Zoltan Menyhart, Jes Sorensen, Tony Luck, discuss, linux-kernel,
	libc-alpha, vojtech, linux-ia64

On Friday 16 June 2006 17:58, Jakub Jelinek wrote:
> GCC is not dumb, unless you force it with -mno-tls-direct-seg-refs.
> I guess you are bitten by the SUSE GCC hack which makes
> -mno-tls-direct-seg-refs the default (especially on x86-64 it is a
> really bad idea).

Apparently I did indeed. I wonder why it happened on x86-64 though - I
thought there were no negative offsets in x86-64 TLS.

-Andi
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu()
  2006-06-16 16:24                       ` Andi Kleen
@ 2006-06-16 16:33                         ` Jakub Jelinek
  0 siblings, 0 replies; 27+ messages in thread
From: Jakub Jelinek @ 2006-06-16 16:33 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Zoltan Menyhart, Jes Sorensen, Tony Luck, discuss, linux-kernel,
	libc-alpha, vojtech, linux-ia64

On Fri, Jun 16, 2006 at 06:24:52PM +0200, Andi Kleen wrote:
> I wonder why it happened on x86-64 though - I thought there were no
> negative offsets in x86-64 TLS.

It uses negative offsets for __thread vars, and positive ones are
reserved for the implementation (i.e. glibc). But as %fs in 64-bit
programs is just an MSR 0xc0000100 base addition, with no segment
limit, neither Xen nor VMware can play limit tricks with it.

	Jakub
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu()
  2006-06-16 15:31                 ` Zoltan Menyhart
  2006-06-16 15:37                   ` Andi Kleen
@ 2006-06-16 21:12                   ` Chase Venters
  1 sibling, 0 replies; 27+ messages in thread
From: Chase Venters @ 2006-06-16 21:12 UTC (permalink / raw)
  To: Zoltan Menyhart
  Cc: Andi Kleen, Jes Sorensen, Tony Luck, discuss, linux-kernel,
	libc-alpha, vojtech, linux-ia64

On Fri, 16 Jun 2006, Zoltan Menyhart wrote:
> Andi Kleen wrote:
>
>> This means it cannot be cache colored (because you would need a
>> static offset) and you couldn't share task_structs on a page.
>
> I do not see the problem. Can you explain please? E.g. the scheduler
> pulls in a task other than the current one. The CPU will see the
> "current->thread_info.cpu" of every task at the same offset anyway.

Memory maps have to fall on page boundaries for lots of various
reasons. Assuming a 16-word cache line, you've got plenty of spots you
could align task_struct to within a page. (That number of spots is
actually constrained by either sizeof(task_struct) or the number of
colors.) The bottom line is that task_struct won't always be on a page
boundary. If it's not on a page boundary in the physical page frames,
it's not going to be on a page boundary in virtual memory either.

(Note also that if two task_structs shared a page, you'd have an
information leak. I'm not sure, given sizeof(task_struct) and cache
alignment, whether task_structs are small enough for sharing, though.
Definitely on hugepages.)

Thanks,
Chase
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu() 2006-06-16 14:56 ` Andi Kleen 2006-06-16 15:31 ` Zoltan Menyhart @ 2006-06-16 15:36 ` Brent Casavant 2006-06-16 15:40 ` Andi Kleen 1 sibling, 1 reply; 27+ messages in thread From: Brent Casavant @ 2006-06-16 15:36 UTC (permalink / raw) To: Andi Kleen Cc: Zoltan Menyhart, Jes Sorensen, Tony Luck, discuss, linux-kernel, libc-alpha, vojtech, linux-ia64 On Fri, 16 Jun 2006, Andi Kleen wrote: > On Friday 16 June 2006 14:36, Zoltan Menyhart wrote: > > My idea is to map the current task structure at an arch. dependent > > virtual address into the user space (obviously in RO). > > > > #define current ((struct task_struct *) 0x...) > > This means it cannot be cache colored (because you would need a static > offset) and you couldn't share task_structs on a page. > > Also you would make task_struct part of the userland ABI which > seems like a very very bad idea to me. It means we couldn't change > it anymore. To this last point, it might be more reasonable to map in a page that contained a new structure with a stable ABI, which mirrored some of the task_struct information, and likely other useful information as needs are identified in the future. In any case, it would be hard to beat a single memory read for performance. Cache-coloring and kernel bookkeeping effects could be minimized if this was provided as an mmaped page from a device driver, used only by applications which care. This does work somewhat contrary to the idea of getting support into glibc, unless glibc only used this capability when asked to through some sort of environment variable or other run-time configuration. Brent -- Brent Casavant All music is folk music. I ain't bcasavan@sgi.com never heard a horse sing a song. Silicon Graphics, Inc. -- Louis Armstrong ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu() 2006-06-16 15:36 ` Brent Casavant @ 2006-06-16 15:40 ` Andi Kleen 2006-06-16 21:15 ` Chase Venters 2006-06-16 21:19 ` Chase Venters 0 siblings, 2 replies; 27+ messages in thread From: Andi Kleen @ 2006-06-16 15:40 UTC (permalink / raw) To: Brent Casavant Cc: Zoltan Menyhart, Jes Sorensen, Tony Luck, discuss, linux-kernel, libc-alpha, vojtech, linux-ia64 > To this last point, it might be more reasonable to map in a page that > contained a new structure with a stable ABI, which mirrored some of > the task_struct information, and likely other useful information as > needs are identified in the future. In any case, it would be hard > to beat a single memory read for performance. That would mean making the context switch and possibly other things slower. In general you would need to make a very good case first that all this complexity is worth it. > Cache-coloring and kernel bookkeeping effects could be minimized if this > was provided as an mmaped page from a device driver, used only by > applications which care. I don't see what difference that would make. You would still have the fixed offset problem and doing things on demand often tends to be even more complex. -Andi (who thinks these proposals all sound very messy) ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu() 2006-06-16 15:40 ` Andi Kleen @ 2006-06-16 21:15 ` Chase Venters 2006-06-16 21:19 ` Chase Venters 1 sibling, 0 replies; 27+ messages in thread From: Chase Venters @ 2006-06-16 21:15 UTC (permalink / raw) To: Andi Kleen Cc: Brent Casavant, Zoltan Menyhart, Jes Sorensen, Tony Luck, discuss, linux-kernel, libc-alpha, vojtech, linux-ia64 On Fri, 16 Jun 2006, Andi Kleen wrote: > >> To this last point, it might be more reasonable to map in a page that >> contained a new structure with a stable ABI, which mirrored some of >> the task_struct information, and likely other useful information as >> needs are identified in the future. In any case, it would be hard >> to beat a single memory read for performance. > > That would mean making the context switch and possibly other > things slower. > > In general you would need to make a very good case first that all this > complexity is worth it. > >> Cache-coloring and kernel bookkeeping effects could be minimized if this >> was provided as an mmaped page from a device driver, used only by >> applications which care. > > I don't see what difference that would make. You would still > have the fixed offset problem and doing things on demand often tends > to be even more complex. > > > -Andi (who thinks these proposals all sound very messy) > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu() 2006-06-16 15:40 ` Andi Kleen 2006-06-16 21:15 ` Chase Venters @ 2006-06-16 21:19 ` Chase Venters 2006-06-16 23:40 ` Brent Casavant 2006-06-17 6:55 ` [discuss] " Andi Kleen 1 sibling, 2 replies; 27+ messages in thread From: Chase Venters @ 2006-06-16 21:19 UTC (permalink / raw) To: Andi Kleen Cc: Brent Casavant, Zoltan Menyhart, Jes Sorensen, Tony Luck, discuss, linux-kernel, libc-alpha, vojtech, linux-ia64 (Sorry for the empty reply! Pine over a laggy SSH connection is annoying sometimes) On Fri, 16 Jun 2006, Andi Kleen wrote: > >> To this last point, it might be more reasonable to map in a page that >> contained a new structure with a stable ABI, which mirrored some of >> the task_struct information, and likely other useful information as >> needs are identified in the future. In any case, it would be hard >> to beat a single memory read for performance. > > That would mean making the context switch and possibly other > things slower. Well, if every process had a page of its own, what would the context switch overhead be? But, I'm not advocating exporting anything. Though I sort of like the vgetcpu() idea because I was working on a user-space slab allocator recently and magazines could use vgetcpu() instead of pthread keys. (Also means if threads > cpus I'd get better results). Thanks, Chase ^ permalink raw reply [flat|nested] 27+ messages in thread
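The magazine idea Chase mentions can be sketched against the interface this discussion eventually produced: glibc exposes the vsyscall as `sched_getcpu()`. The returned CPU number is only a placement hint (the thread may migrate right after the call), so a real allocator would still need a per-magazine lock or restartable sequences; the sizes and structure below are invented for illustration:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdlib.h>

#define MAX_CPUS  256
#define MAG_SLOTS 16

/* One magazine (a small cache of free objects) per CPU. */
struct magazine { void *objs[MAG_SLOTS]; int count; };
static struct magazine mags[MAX_CPUS];

static struct magazine *my_magazine(void) {
    int cpu = sched_getcpu();         /* cheap: vDSO, no real syscall */
    if (cpu < 0 || cpu >= MAX_CPUS)   /* old kernel or odd topology:  */
        cpu = 0;                      /* fall back to magazine 0      */
    return &mags[cpu];
}

void *mag_alloc(size_t size) {
    struct magazine *m = my_magazine();
    if (m->count > 0)
        return m->objs[--m->count];   /* fast path: likely CPU-local  */
    return malloc(size);              /* slow path: global allocator  */
}

void mag_free(void *p) {
    struct magazine *m = my_magazine();
    if (m->count < MAG_SLOTS)
        m->objs[m->count++] = p;      /* keep it cached for this CPU  */
    else
        free(p);
}
```

With more threads than CPUs this wins over a pthread-key-per-thread cache exactly as described above, since threads sharing a CPU also share a magazine.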
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu() 2006-06-16 21:19 ` Chase Venters @ 2006-06-16 23:40 ` Brent Casavant 2006-06-17 6:58 ` Andi Kleen 0 siblings, 1 reply; 27+ messages in thread From: Brent Casavant @ 2006-06-16 23:40 UTC (permalink / raw) To: Chase Venters Cc: Andi Kleen, Zoltan Menyhart, Jes Sorensen, Tony Luck, discuss, linux-kernel, libc-alpha, vojtech, linux-ia64 On Fri, 16 Jun 2006, Chase Venters wrote: > On Fri, 16 Jun 2006, Andi Kleen wrote: > > > > > > To this last point, it might be more reasonable to map in a page that > > > contained a new structure with a stable ABI, which mirrored some of > > > the task_struct information, and likely other useful information as > > > needs are identified in the future. In any case, it would be hard > > > to beat a single memory read for performance. > > > > That would mean making the context switch and possibly other > > things slower. > > Well, if every process had a page of its own, what would the context switch > overhead be? Mostly copying the useful information into the read-only mapped page. However, this doesn't have to be all that expensive. The particular information we care about in this case only needs to be copied when a task begins running on a CPU different from the one it last ran on. In fact, on ia64 we already have something very similar to handle certain I/O peculiarities on SN2. http://marc.theaimsgroup.com/?l=linux-ia64&m=113831137712197&w=2 That work could form the basis for a low-impact method of exporting the current CPU to user space via a read-only mapped page. I'll admit to having zero knowledge of whether this would be workable on anything other than ia64. Thanks, Brent -- Brent Casavant All music is folk music. I ain't bcasavan@sgi.com never heard a horse sing a song. Silicon Graphics, Inc. -- Louis Armstrong ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: FOR REVIEW: New x86-64 vsyscall vgetcpu() 2006-06-16 23:40 ` Brent Casavant @ 2006-06-17 6:58 ` Andi Kleen 0 siblings, 0 replies; 27+ messages in thread From: Andi Kleen @ 2006-06-17 6:58 UTC (permalink / raw) To: Brent Casavant Cc: Chase Venters, Zoltan Menyhart, Jes Sorensen, Tony Luck, discuss, linux-kernel, libc-alpha, vojtech, linux-ia64 > That work could form the basis for a low-impact method of exporting > the current CPU to user space via a read-only mapped page. I'll admit > to having zero knowledge of whether this would be workable on anything > other than ia64. On x86 per CPU mappings are not really feasible. That is because the CPU uses the Linux page tables directly and to change them per CPU you would need to fork them per CPU. That would add so much complications that I don't even want to think them all through ... -andi ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [discuss] Re: FOR REVIEW: New x86-64 vsyscall vgetcpu() 2006-06-16 21:19 ` Chase Venters 2006-06-16 23:40 ` Brent Casavant @ 2006-06-17 6:55 ` Andi Kleen 2006-06-19 8:42 ` Zoltan Menyhart 1 sibling, 1 reply; 27+ messages in thread From: Andi Kleen @ 2006-06-17 6:55 UTC (permalink / raw) To: discuss Cc: Chase Venters, Brent Casavant, Zoltan Menyhart, Jes Sorensen, Tony Luck, linux-kernel, libc-alpha, vojtech, linux-ia64 On Friday 16 June 2006 23:19, Chase Venters wrote: > On Fri, 16 Jun 2006, Andi Kleen wrote: > >> To this last point, it might be more reasonable to map in a page that > >> contained a new structure with a stable ABI, which mirrored some of > >> the task_struct information, and likely other useful information as > >> needs are identified in the future. In any case, it would be hard > >> to beat a single memory read for performance. > > > > That would mean making the context switch and possibly other > > things slower. > > Well, if every process had a page of its own, what would the context > switch overhead be? For a process, zero; for a thread, quite high on x86 because you would need per CPU page tables. Doing that would be extremely nasty because you would potentially need to allocate a new set of page tables every time the process is scheduled to a new CPU it hasn't run on before. If you limit it to a process then you can't get the current CPU from such a mapping because a process can run threaded on multiple CPUs. My reference was more to his suggestion of keeping a second version of task_struct for export. That would require changing everything in task struct that is changed on switch_to and should be exported in the other function too. -Andi ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [discuss] Re: FOR REVIEW: New x86-64 vsyscall vgetcpu() 2006-06-17 6:55 ` [discuss] " Andi Kleen @ 2006-06-19 8:42 ` Zoltan Menyhart 2006-06-19 8:54 ` Andi Kleen 0 siblings, 1 reply; 27+ messages in thread From: Zoltan Menyhart @ 2006-06-19 8:42 UTC (permalink / raw) To: Andi Kleen Cc: discuss, Chase Venters, Brent Casavant, Jes Sorensen, Tony Luck, linux-kernel, libc-alpha, vojtech, linux-ia64 Brent Casavant wrote: > To this last point, it might be more reasonable to map in a page that > contained a new structure with a stable ABI, which mirrored some of > the task_struct information, and likely other useful information as > needs are identified in the future. In any case, it would be hard > to beat a single memory read for performance. > > Cache-coloring and kernel bookkeeping effects could be minimized if this > was provided as an mmaped page from a device driver, used only by > applications which care. This does work somewhat contrary to the idea of > getting support into glibc, unless glibc only used this capability when > asked to through some sort of environment variable or other run-time > configuration. Quite O.K. for me. Andi Kleen wrote: >>Well, if every process had a page of its own, what would the context >>switch overhead be? > For process zero, for thread quite high on x86 because you > would need per CPU page tables. Doing that would be extremly > nasty because you would potentially need to allocate a new > set of page tables every time the process is scheduled to a new > CPU it hasn't run on before. Probably I have not explained it correctly: - The "information page" (that includes the current CPU no.) 
is not a per CPU page - This page is just another page that is mapped at a "well known" user virtual address (for those who are interested in) - As you do not do any special action for each user page on context switch, there is nothing to do to this page either - The scheduler sometimes migrates a task, then it updates the current CPU number on the "information page" > My reference was more to his suggestion of keeping a second version > of task_struct for export. That would require changing everything > in task struct that is changed on switch_to and should be exported > in the other function too. It depends on what else can be in this "information page". As for the current CPU no., you need a single store on each task migration. Thanks, Zoltan ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [discuss] Re: FOR REVIEW: New x86-64 vsyscall vgetcpu() 2006-06-19 8:42 ` Zoltan Menyhart @ 2006-06-19 8:54 ` Andi Kleen 0 siblings, 0 replies; 27+ messages in thread From: Andi Kleen @ 2006-06-19 8:54 UTC (permalink / raw) To: Zoltan Menyhart Cc: discuss, Chase Venters, Brent Casavant, Jes Sorensen, Tony Luck, linux-kernel, libc-alpha, vojtech, linux-ia64 > Probably I have not explained it correctly: > - The "information page" (that includes the current CPU no.) is not a > per CPU page If it isn't then you can't figure out the current CPU/node for a thread. Anyways I think we're talking past each other. Your approach might even work on ia64 (at least if you're willing to add a lot of cost to the context switch). You presumably could implement vgetcpu() internally with an approach like this (although with IA64's fast EPC calls it seems a bit pointless) It just won't work on x86. -Andi ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [discuss] Re: FOR REVIEW: New x86-64 vsyscall vgetcpu() 2006-06-16 11:58 ` Jes Sorensen 2006-06-16 12:36 ` Zoltan Menyhart @ 2006-06-16 14:54 ` Andi Kleen 2006-06-20 8:28 ` Jes Sorensen 1 sibling, 1 reply; 27+ messages in thread From: Andi Kleen @ 2006-06-16 14:54 UTC (permalink / raw) To: discuss Cc: Jes Sorensen, Tony Luck, linux-kernel, libc-alpha, vojtech, linux-ia64 > I really don't see the benefit here. malloc already gets pages handed > down from the kernel which are node local due to them being assigned at > a first touch basis. I am not sure about glibc's malloc internals, but > rather rely on a vgetcpu() call, all it really needs to do is to keep > a thread local pool which will automatically get it's thing locally > through first touch usage. That would add too much overhead on small systems. It's better to be able to share the pools. vgetcpu allows that. > > Basically it is just for extending the existing already used proven etc. > > default local policy to sub pages. Also there might be other uses > > of it too (like per CPU data), although I expect most use of that > > in user space can be already done using TLS. > > The thread libraries already have their own thread local area which > should be allocated on the thread's own node if done right, which I > assume it is. - The heap for small allocations is shared (although this can be tuned) - When another thread does free() you need special handling to keep the item in the correct free lists This is one of the tricky bits in the new kernel NUMA slab allocator too. > > But cpusets already does this kind of, even though it has a quite > > bad impact on fast paths. > > Also what happens if the affinity mask is modified later? > > From the high semantics point it is also a little dubious to mesh > > them together. My feeling is that as a heuristic it is probably > > dubious. 
> > If you migrate your app elsewhere, you should migrate the pages with it, > or not expect things to run with the local effect. That's too costly to do by default and you have no guarantee that it will amortize. > I don't really see the point in solving something half way when it can > be done better. Maybe the "serious" databases should open up and let us > know what the problem is they are hitting. I see no indication of anything better so far from you. You only offered static configuration instead which while in some cases is better doesn't work in the general case. -Andi ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [discuss] Re: FOR REVIEW: New x86-64 vsyscall vgetcpu() 2006-06-16 14:54 ` Andi Kleen @ 2006-06-20 8:28 ` Jes Sorensen 0 siblings, 0 replies; 27+ messages in thread From: Jes Sorensen @ 2006-06-20 8:28 UTC (permalink / raw) To: Andi Kleen Cc: discuss, Tony Luck, linux-kernel, libc-alpha, vojtech, linux-ia64 Andi Kleen wrote: >> I really don't see the benefit here. malloc already gets pages handed >> down from the kernel which are node local due to them being assigned at >> a first touch basis. I am not sure about glibc's malloc internals, but >> rather rely on a vgetcpu() call, all it really needs to do is to keep >> a thread local pool which will automatically get it's thing locally >> through first touch usage. > > That would add too much overhead on small systems. It's better to be > able to share the pools. vgetcpu allows that. How do you expect to be able to share the pools? Or are you saying you just want one page per numa node? Having a page per thread is not noticeable and for databases, which was your primary target usergroup, I think it's fair to say it won't even be visible as noise. >>> Basically it is just for extending the existing already used proven etc. >>> default local policy to sub pages. Also there might be other uses >>> of it too (like per CPU data), although I expect most use of that >>> in user space can be already done using TLS. >> The thread libraries already have their own thread local area which >> should be allocated on the thread's own node if done right, which I >> assume it is. > > - The heap for small allocations is shared (although this can be tuned) > - When another thread does free() you need special handling to keep > the item in the correct free lists > This is one of the tricky bits in the new kernel NUMA slab allocator > too. It should be pretty easy to make the allocator aware of the per thread regions based on the address.
>> If you migrate your app elsewhere, you should migrate the pages with it, >> or not expect things to run with the local effect. > > That's too costly to do by default and you have no guarantee that it will amortize. But if you don't migrate the pages with it, the numa aware allocation is wasted anyway, whether you do it on a first-touch basis or using vgetcpu. >> I don't really see the point in solving something half way when it can >> be done better. Maybe the "serious" databases should open up and let us >> know what the problem is they are hitting. > > I see no indication of anything better so far from you. You only offered > static configuration instead which while in some cases is better > doesn't work in the general case. Static configuration? I never said anything about that, I said that libc should offer a memory pool per thread and have it created when it's first touched by the thread. That solves exactly what you have described so far unless there is something else you also expect to benefit from vgetcpu(). Jes ^ permalink raw reply [flat|nested] 27+ messages in thread
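Jes's per-thread first-touch pool is straightforward to sketch. The sizes and names below are invented, and a real allocator would also need the cross-thread free() handling Andi raises (mapping an address back to its owning pool); the point here is only that the pool's pages are faulted in by the owning thread, so the kernel's default local policy places them on that thread's node.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define POOL_SIZE (1u << 20)   /* 1 MB per thread, invented number */

static __thread char  *pool;       /* lazily created, so the first  */
static __thread size_t pool_off;   /* touch is by the owning thread */

void *pool_alloc(size_t n) {
    n = (n + 15) & ~(size_t)15;            /* keep 16-byte alignment */
    if (!pool) {
        pool = malloc(POOL_SIZE);
        if (pool)
            memset(pool, 0, POOL_SIZE);    /* the actual first touch */
        pool_off = 0;
    }
    if (!pool || pool_off + n > POOL_SIZE)
        return malloc(n);                  /* pool exhausted: fall back */
    void *p = pool + pool_off;
    pool_off += n;
    return p;
}
```

The bump-pointer body is deliberately simplistic; what matters for the argument is that none of it consults the current CPU at all, which is exactly Jes's objection to needing vgetcpu() for locality.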
end of thread, other threads: [~2006-06-20 8:28 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <200606140942.31150.ak@suse.de>
[not found] ` <12c511ca0606151144i140c21e5w90dd948af9b536a4@mail.gmail.com>
[not found] ` <200606160822.23898.ak@suse.de>
2006-06-16 9:48 ` FOR REVIEW: New x86-64 vsyscall vgetcpu() Jes Sorensen
2006-06-16 10:09 ` Andi Kleen
2006-06-16 11:02 ` Jes Sorensen
2006-06-16 11:17 ` Andi Kleen
2006-06-16 11:58 ` Jes Sorensen
2006-06-16 12:36 ` Zoltan Menyhart
2006-06-16 12:41 ` Jes Sorensen
2006-06-16 12:48 ` Zoltan Menyhart
2006-06-16 21:04 ` Chase Venters
2006-06-16 14:56 ` Andi Kleen
2006-06-16 15:31 ` Zoltan Menyhart
2006-06-16 15:37 ` Andi Kleen
2006-06-16 15:58 ` Jakub Jelinek
2006-06-16 16:24 ` Andi Kleen
2006-06-16 16:33 ` Jakub Jelinek
2006-06-16 21:12 ` Chase Venters
2006-06-16 15:36 ` Brent Casavant
2006-06-16 15:40 ` Andi Kleen
2006-06-16 21:15 ` Chase Venters
2006-06-16 21:19 ` Chase Venters
2006-06-16 23:40 ` Brent Casavant
2006-06-17 6:58 ` Andi Kleen
2006-06-17 6:55 ` [discuss] " Andi Kleen
2006-06-19 8:42 ` Zoltan Menyhart
2006-06-19 8:54 ` Andi Kleen
2006-06-16 14:54 ` Andi Kleen
2006-06-20 8:28 ` Jes Sorensen