* Re: [PATCH] Hook up getcpu system call for IA64
2007-02-06 0:07 [PATCH] Hook up getcpu system call for IA64 Fenghua Yu
@ 2007-02-06 21:21 ` Ken Chen
2007-02-06 21:32 ` Yu, Fenghua
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Ken Chen @ 2007-02-06 21:21 UTC (permalink / raw)
To: linux-ia64
Fenghua Yu wrote on Feb 5, 2007 4:07 PM
> getcpu system call returns cpu# and node# on which this system call
> and its caller are running. This patch hooks up its implementation on
> IA64.
I know it's trivial to wire up sys_getcpu, but please take this seriously,
it should be implemented as assembly fsys call.
- Ken
* RE: [PATCH] Hook up getcpu system call for IA64
2007-02-06 0:07 [PATCH] Hook up getcpu system call for IA64 Fenghua Yu
2007-02-06 21:21 ` Ken Chen
@ 2007-02-06 21:32 ` Yu, Fenghua
2007-02-06 21:32 ` Luck, Tony
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Yu, Fenghua @ 2007-02-06 21:32 UTC (permalink / raw)
To: linux-ia64
> I know it's trivial to wire up sys_getcpu, but please take this seriously,
> it should be implemented as assembly fsys call.
Ken,
Could you please explain why an fsys call? sys_getcpu takes about 600 cycles
on a 1.6GHz Montecito. Do we need to save these cycles?
Thanks.
-Fenghua
* RE: [PATCH] Hook up getcpu system call for IA64
2007-02-06 0:07 [PATCH] Hook up getcpu system call for IA64 Fenghua Yu
2007-02-06 21:21 ` Ken Chen
2007-02-06 21:32 ` Yu, Fenghua
@ 2007-02-06 21:32 ` Luck, Tony
2007-02-06 22:02 ` Ken Chen
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Luck, Tony @ 2007-02-06 21:32 UTC (permalink / raw)
To: linux-ia64
> > getcpu system call returns cpu# and node# on which this system call
> > and its caller are running. This patch hooks up its implementation on
> > IA64.
>
> I know it's trivial to wire up sys_getcpu, but please take this seriously,
> it should be implemented as assembly fsys call.
We did look at whether it was worth doing as a fast syscall.
How often will applications call getcpu() ... I hope that they aren't doing
it hundreds of thousands of times per second (scheduler won't re-assign to
a different cpu more often than every few milliseconds, so it is pointless
to call that often).
"Slow" syscall is ~1.5us ... which should be in the noise for what I think
is the fastest rate at which applications may call it.
Do you have some example application scenarios where you believe the
rate will be high enough to make a difference?
-Tony
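For scale, the two cost figures quoted in this thread (about 600 cycles on a 1.6GHz part, and roughly 1.5us per "slow" syscall) can be turned into a share of CPU time at a given call rate. The following is only a back-of-the-envelope sketch; the helper names cycles_to_ns() and overhead_fraction() are made up for illustration and are not part of any kernel or libc interface.

```c
#include <assert.h>

/* Convert a cycle count to nanoseconds at a clock rate given in GHz. */
double cycles_to_ns(double cycles, double ghz)
{
	assert(ghz > 0.0);
	return cycles / ghz;	/* 1 GHz is one cycle per nanosecond */
}

/*
 * Fraction of one CPU's time spent in a call that costs cost_us
 * microseconds when it is made calls_per_sec times per second.
 */
double overhead_fraction(double cost_us, double calls_per_sec)
{
	assert(cost_us >= 0.0 && calls_per_sec >= 0.0);
	return cost_us * 1e-6 * calls_per_sec;
}
```

With these, cycles_to_ns(600, 1.6) gives 375 ns. At 1.5us per call, one call per millisecond costs overhead_fraction(1.5, 1000) = 0.0015, or 0.15% of a CPU, while 100,000 calls per second would cost 15%.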
* Re: [PATCH] Hook up getcpu system call for IA64
2007-02-06 0:07 [PATCH] Hook up getcpu system call for IA64 Fenghua Yu
` (2 preceding siblings ...)
2007-02-06 21:32 ` Luck, Tony
@ 2007-02-06 22:02 ` Ken Chen
2007-02-07 18:00 ` Luck, Tony
2007-02-08 1:25 ` Zou, Nanhai
5 siblings, 0 replies; 7+ messages in thread
From: Ken Chen @ 2007-02-06 22:02 UTC (permalink / raw)
To: linux-ia64
> Could you please explain why an fsys call? sys_getcpu takes about 600 cycles
> on a 1.6GHz Montecito. Do we need to save these cycles?
Andi Kleen used to lecture me every so often why it is important to
have fast vgetcpu on x86-64 (I don't think ia64 is excluded from
that). His lecture is also all over the lkml mailing list.
* RE: [PATCH] Hook up getcpu system call for IA64
2007-02-06 0:07 [PATCH] Hook up getcpu system call for IA64 Fenghua Yu
` (3 preceding siblings ...)
2007-02-06 22:02 ` Ken Chen
@ 2007-02-07 18:00 ` Luck, Tony
2007-02-08 1:25 ` Zou, Nanhai
5 siblings, 0 replies; 7+ messages in thread
From: Luck, Tony @ 2007-02-07 18:00 UTC (permalink / raw)
To: linux-ia64
> Andi Kleen used to lecture me every so often why it is important to
> have fast vgetcpu on x86-64 (I don't think ia64 is excluded from
> that). His lecture is also all over the lkml mailing list.
I dug through that thread ... it isn't ever explicitly stated, but
it does appear that the intent is that a NUMA aware application
would use a malloc() library that called getcpu() on every memory
allocation. If that is the usage model, then I'll agree that
getcpu() does need to be fast.
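To make that usage model concrete, here is a rough sketch of an allocator front end that looks up the current node on every allocation. Everything in it is hypothetical: numa_malloc_sketch(), the getcpu_fn hook, fixed_lookup() and MAX_NODES are illustration-only names, and plain malloc() stands in for what would really be a node-local memory pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_NODES 4	/* hypothetical upper bound on NUMA nodes */

/* Hypothetical hook with the shape of getcpu(): 0 on success, -1 on error. */
typedef int (*getcpu_fn)(unsigned *cpu, unsigned *node);

/* Per-node bookkeeping; a real allocator would own node-local memory here. */
struct node_arena {
	size_t allocations;
	size_t bytes;
};

static struct node_arena arenas[MAX_NODES];

/* Stand-in lookup for experiments: always reports cpu 5 on node 1. */
int fixed_lookup(unsigned *cpu, unsigned *node)
{
	*cpu = 5;
	*node = 1;
	return 0;
}

/*
 * Allocate size bytes and charge them to the arena of the node the caller
 * is running on. The lookup runs on every single allocation, which is why
 * its cost matters in this model.
 */
void *numa_malloc_sketch(size_t size, getcpu_fn lookup, unsigned *node_out)
{
	unsigned cpu = 0, node = 0;
	void *p;

	assert(lookup != NULL);
	if (lookup(&cpu, &node) != 0 || node >= MAX_NODES)
		node = 0;	/* fall back to node 0 if the lookup fails */

	p = malloc(size);	/* placeholder for a node-local pool */
	if (p != NULL) {
		arenas[node].allocations++;
		arenas[node].bytes += size;
	}
	if (node_out != NULL)
		*node_out = node;
	return p;
}
```

Since the lookup sits on the allocation fast path, its cost is multiplied by the allocation rate rather than by the much lower rate at which the scheduler migrates tasks.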
One possible way to achieve that on ia64 would be to make use of the
fact that the ar.k* registers are readable by applications, and
specifically ar.k3 contains a per-cpu value (physical address of the
kernel percpu area). *IF* (and that is a BIG IF) we were to guarantee
to maintain ar.k3 as a per-cpu unique value, then we could implement
a very fast (even faster than fsys.S) getcpu() as:
#include <unistd.h>	/* syscall() */

#define __NR_getcpu 1304

int
getcpu(unsigned *cpup, unsigned *nodep, void *cachep)
{
	static unsigned cpu = ~0, node = ~0;
	static unsigned long save_ar_k3;
	unsigned long ar_k3;

	/* ar.k3 holds a per-cpu value, so a change means we have migrated */
	asm volatile ("mov %0=ar.k3" : "=r" (ar_k3));
	if (cpu == ~0 || ar_k3 != save_ar_k3) {
		/* first call, or running on a different cpu: ask the kernel */
		if (syscall(__NR_getcpu, &cpu, &node, 0) == -1)
			return -1;
		save_ar_k3 = ar_k3;
	}
	*cpup = cpu;
	*nodep = node;

	return 0;
}
Too ugly for words? Or worth serious consideration?
-Tony
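The caching idea in the function above can be tried out on any machine by separating it from the ia64 specifics. This is a sketch under stated assumptions, not the proposed implementation: the per-cpu token (ar.k3 in the proposal) is passed in as a plain argument, the slow path is a function pointer standing in for syscall(__NR_getcpu, ...), and the names struct getcpu_cache, cached_getcpu(), counting_slow_path() and slow_calls are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slow path: stands in for syscall(__NR_getcpu, ...). */
typedef int (*slow_getcpu_fn)(unsigned *cpu, unsigned *node);

/* Cached answer plus the per-cpu token it was obtained under. */
struct getcpu_cache {
	int valid;		/* 0 until the first successful lookup */
	unsigned long token;	/* per-cpu unique value (ar.k3 in the proposal) */
	unsigned cpu, node;
};

/* Counts slow-path invocations so the caching effect can be observed. */
int slow_calls;

/* Stand-in slow path: always reports cpu 3 on node 1. */
int counting_slow_path(unsigned *cpu, unsigned *node)
{
	slow_calls++;
	*cpu = 3;
	*node = 1;
	return 0;
}

/* Take the slow path only on first use or when the token has changed. */
int cached_getcpu(struct getcpu_cache *c, unsigned long token,
		  slow_getcpu_fn slow, unsigned *cpup, unsigned *nodep)
{
	assert(c != NULL && slow != NULL);
	if (!c->valid || c->token != token) {
		unsigned cpu, node;

		if (slow(&cpu, &node) == -1)
			return -1;
		c->cpu = cpu;
		c->node = node;
		c->token = token;
		c->valid = 1;
	}
	if (cpup != NULL)
		*cpup = c->cpu;
	if (nodep != NULL)
		*nodep = c->node;
	return 0;
}
```

Two calls with the same token take the slow path once; a call with a different token takes it again. The state lives in a caller-supplied struct here so that each thread can keep its own copy, since a single static cache shared by threads running on different cpus would be refreshed on nearly every call.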
* RE: [PATCH] Hook up getcpu system call for IA64
2007-02-06 0:07 [PATCH] Hook up getcpu system call for IA64 Fenghua Yu
` (4 preceding siblings ...)
2007-02-07 18:00 ` Luck, Tony
@ 2007-02-08 1:25 ` Zou, Nanhai
5 siblings, 0 replies; 7+ messages in thread
From: Zou, Nanhai @ 2007-02-08 1:25 UTC (permalink / raw)
To: linux-ia64
> -----Original Message-----
> From: linux-ia64-owner@vger.kernel.org
> [mailto:linux-ia64-owner@vger.kernel.org] On Behalf Of Luck, Tony
> Sent: February 8, 2007 2:00
> To: Ken Chen; Yu, Fenghua
> Cc: linux-ia64@vger.kernel.org
> Subject: RE: [PATCH] Hook up getcpu system call for IA64
>
> > Andi Kleen used to lecture me every so often why it is important to
> > have fast vgetcpu on x86-64 (I don't think ia64 is excluded from
> > that). His lecture is also all over the lkml mailing list.
>
> I dug through that thread ... it isn't ever explicitly stated, but
> it does appear that the intent is that a NUMA aware application
> would use a malloc() library that called getcpu() on every memory
> allocation. If that is the usage model, then I'll agree that
> getcpu() does need to be fast.
>
> One possible way to achieve that on ia64 would be to make use of the
> fact that the ar.k* registers are readable by applications, and
> specifically ar.k3 contains a per-cpu value (physical address of the
> kernel percpu area). *IF* (and that is a BIG IF) we were to guarantee
> to maintain ar.k3 as a per-cpu unique value, then we could implement
> a very fast (even faster than fsys.S) getcpu() as:
>
> #include <unistd.h>	/* syscall() */
>
> #define __NR_getcpu 1304
>
> int
> getcpu(unsigned *cpup, unsigned *nodep, void *cachep)
> {
> 	static unsigned cpu = ~0, node = ~0;
> 	static unsigned long save_ar_k3;
> 	unsigned long ar_k3;
>
> 	/* ar.k3 holds a per-cpu value, so a change means we have migrated */
> 	asm volatile ("mov %0=ar.k3" : "=r" (ar_k3));
> 	if (cpu == ~0 || ar_k3 != save_ar_k3) {
> 		/* first call, or running on a different cpu: ask the kernel */
> 		if (syscall(__NR_getcpu, &cpu, &node, 0) == -1)
> 			return -1;
> 		save_ar_k3 = ar_k3;
> 	}
> 	*cpup = cpu;
> 	*nodep = node;
>
> 	return 0;
> }
>
>
> Too ugly for words? Or worth serious consideration?
>
> -Tony
I think it is possible to pin ar.k3 to the processor id and then change the raw_smp_processor_id() implementation accordingly.
This should be better than the current raw_smp_processor_id() implementation.
Then we could implement sys_getcpu in fsys.S.
That should not be very complex; the only thing we need to take care of is exception handling in the fsys code.
Thanks
Zou Nan hai