public inbox for linux-ia64@vger.kernel.org
 help / color / mirror / Atom feed
* ia64 printk_clock()
@ 2006-02-02 20:44 Jack Steiner
  2006-02-02 21:22 ` Dean Roe
                   ` (21 more replies)
  0 siblings, 22 replies; 23+ messages in thread
From: Jack Steiner @ 2006-02-02 20:44 UTC (permalink / raw)
  To: linux-ia64

We are seeing kernel hangs in early boot if CONFIG_PRINTK_TIME is
enabled. The hang is caused by a reference in sched_clock() to 
the percpu data area before it is initialized. 

I know there was discussion about this late last summer & I thought
Tony posted a patch. Does anyone recall the status.

AFAICT, we see failures on SN systems but other IA64 systems work ok.

---
Jack

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
@ 2006-02-02 21:22 ` Dean Roe
  2006-02-02 21:46 ` Luck, Tony
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Dean Roe @ 2006-02-02 21:22 UTC (permalink / raw)
  To: linux-ia64

On Thu, Feb 02, 2006 at 02:44:22PM -0600, Jack Steiner wrote:
> We are seeing kernel hangs in early boot if CONFIG_PRINTK_TIME is
> enabled. The hang is caused by a reference in sched_clock() to 
> the percpu data area before it is initialized. 
> 
> I know there was discussion about this late last summer & I thought
> Tony posted a patch. Does anyone recall the status.
> 
> AFAICT, we see failures on SN systems but other IA64 systems work ok.
> 
> ---
> Jack

Jack, I think this is the thread you are looking for:
    http://marc.theaimsgroup.com/?l=linux-ia64&m\x112422250517703&w=2

Dean

-- 
Dean Roe
Silicon Graphics, Inc.
roe@sgi.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
  2006-02-02 21:22 ` Dean Roe
@ 2006-02-02 21:46 ` Luck, Tony
  2006-02-02 21:50 ` Jack Steiner
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Luck, Tony @ 2006-02-02 21:46 UTC (permalink / raw)
  To: linux-ia64

> Jack, I think this is the thread you are looking for:
>    http://marc.theaimsgroup.com/?l=linux-ia64&m\x112422250517703&w=2

The current state of the tree is that Andrew added the patch
to make the definition of printk_clock() in kernel/printk.c
"__attribute__((weak))" so that architectures could override
with their own version.  Default version uses sched_clock().

So far we haven't made use of this ... but we need to do it (partly
because it crashes early on Altix, but also because it returns
meaningless values on any system where the clocks are not synched
on different cpus (perhaps just Altix).

-Tony

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
  2006-02-02 21:22 ` Dean Roe
  2006-02-02 21:46 ` Luck, Tony
@ 2006-02-02 21:50 ` Jack Steiner
  2006-02-02 22:29 ` Luck, Tony
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jack Steiner @ 2006-02-02 21:50 UTC (permalink / raw)
  To: linux-ia64

On Thu, Feb 02, 2006 at 01:46:05PM -0800, Luck, Tony wrote:
> > Jack, I think this is the thread you are looking for:
> >    http://marc.theaimsgroup.com/?l=linux-ia64&m\x112422250517703&w=2
> 
> The current state of the tree is that Andrew added the patch
> to make the definition of printk_clock() in kernel/printk.c
> "__attribute__((weak))" so that architectures could override
> with their own version.  Default version uses sched_clock().
> 
> So far we haven't made use of this ... but we need to do it (partly
> because it crashes early on Altix, but also because it returns
> meaningless values on any system where the clocks are not synched
> on different cpus (perhaps just Altix).
> 
> -Tony

Do you understand why it crashes only on Altix?? It looked to
me that all IA64 platforms would crash but obviously I'm wrong.
What did I miss...

-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
                   ` (2 preceding siblings ...)
  2006-02-02 21:50 ` Jack Steiner
@ 2006-02-02 22:29 ` Luck, Tony
  2006-02-02 22:52 ` Chen, Kenneth W
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Luck, Tony @ 2006-02-02 22:29 UTC (permalink / raw)
  To: linux-ia64

> Do you understand why it crashes only on Altix?? It looked to
> me that all IA64 platforms would crash but obviously I'm wrong.
> What did I miss...

Not sure ... generic code in init/main.c definitely calls printk()
before it calls setup_arch(), so we should all be doomed.

On Tiger I get the following garbage, but no crash:

[417590805.377580] Linux version 2.6.16-rc1-tiger-smp (aegl@linux-t10) (gcc version 3.4.3 20050227 (Red Hat 3.4.3-22.1)) #2 SMP Thu Feb 2 14:18:45 PST 2006
[418206892.666388] EFI v1.10 by INTEL: SALsystab=0x7fe54980 ACPI=0x7ff84000 ACPI 2.0=0x7ff83000 MPS=0x7ff82000 SMBIOS=0xf0000
[662789972.910145] Initial ramdisk at: 0xe0000001fedff000 (1303562 bytes)
[662917103.942106] SAL 3.20: Intel Corp                       SR870BN4                         version 3.0
[663063150.010040] SAL Platform features: BusLock IRQ_Redirection
[663154083.057631] SAL: AP wakeup using external interrupt vector 0xf0
[1026364795.672067] No logical to physical processor mapping available
[    0.000000] iosapic_system_init: Disabling PC-AT compatible 8259 interrupts

Maybe PAL/SAL/EFI leaves some mapping in the TC for 0xffffffffffff0000 so I
don't die when reading per-cpu area?

-Tony



^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
                   ` (3 preceding siblings ...)
  2006-02-02 22:29 ` Luck, Tony
@ 2006-02-02 22:52 ` Chen, Kenneth W
  2006-02-03 17:43 ` Chen, Kenneth W
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Chen, Kenneth W @ 2006-02-02 22:52 UTC (permalink / raw)
  To: linux-ia64

Luck, Tony wrote on Thursday, February 02, 2006 2:29 PM
> Maybe PAL/SAL/EFI leaves some mapping in the TC for 0xffffffffffff0000
> so I don't die when reading per-cpu area?

I took a look via hardware debugger, the alt-dtlb was taken and kernel
inserts an identity mapping.  It appears that tiger platform has un-
advertised memory at the top of physical address space.

- Ken


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
                   ` (4 preceding siblings ...)
  2006-02-02 22:52 ` Chen, Kenneth W
@ 2006-02-03 17:43 ` Chen, Kenneth W
  2006-02-03 18:04 ` Luck, Tony
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Chen, Kenneth W @ 2006-02-03 17:43 UTC (permalink / raw)
  To: linux-ia64

Luck, Tony wrote on Thursday, February 02, 2006 1:46 PM
> The current state of the tree is that Andrew added the patch
> to make the definition of printk_clock() in kernel/printk.c
> "__attribute__((weak))" so that architectures could override
> with their own version.  Default version uses sched_clock().

Why don't we look at ar.k3, it contains per_cpu base address.
If it has a none zero value, then it is initialized by
per_cpu_init and is safe to call sched_clock.

- Ken


--- ./arch/ia64/kernel/time.c.orig	2006-02-03 10:10:10.291334310 -0800
+++ ./arch/ia64/kernel/time.c	2006-02-03 10:23:35.944644753 -0800
@@ -278,3 +278,10 @@ udelay (unsigned long usecs)
 	}
 }
 EXPORT_SYMBOL(udelay);
+
+unsigned long long printk_clock(void)
+{
+	if (ia64_get_kr(IA64_KR_PER_CPU_DATA))
+		return sched_clock();
+	return 0;
+}
--- ./arch/ia64/kernel/head.S.orig	2006-02-03 10:24:57.431948442 -0800
+++ ./arch/ia64/kernel/head.S	2006-02-03 10:30:54.306944070 -0800
@@ -352,6 +352,7 @@ start_ap:
 	mov ar.rsc=0		// place RSE in enforced lazy mode
 	;;
 	loadrs			// clear the dirty partition
+	mov ar.k3=r0		// clear physical per-CPU base
 	;;
 	mov ar.bspstore=r2	// establish the new RSE stack
 	;;



^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
                   ` (5 preceding siblings ...)
  2006-02-03 17:43 ` Chen, Kenneth W
@ 2006-02-03 18:04 ` Luck, Tony
  2006-02-03 18:30 ` Chen, Kenneth W
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Luck, Tony @ 2006-02-03 18:04 UTC (permalink / raw)
  To: linux-ia64

> Why don't we look at ar.k3, it contains per_cpu base address.
> If it has a none zero value, then it is initialized by
> per_cpu_init and is safe to call sched_clock.

Looks good to me[1].

-Tony

[1] http://tinyurl.com/ah642

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
                   ` (6 preceding siblings ...)
  2006-02-03 18:04 ` Luck, Tony
@ 2006-02-03 18:30 ` Chen, Kenneth W
  2006-02-03 18:43 ` Luck, Tony
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Chen, Kenneth W @ 2006-02-03 18:30 UTC (permalink / raw)
  To: linux-ia64

Luck, Tony wrote on Friday, February 03, 2006 10:04 AM
> [1] http://tinyurl.com/ah642

Ha, should've check the archive :-)

Now that I read the whole thread, how about remove pining DTR 
requirement for per-cpu data?  In alt_dtlb_miss, we can just
detect whether ifa is per_cpu address and read ar.k3 to get the
physical address, then insert a TC.  That will free up a tlb
entry for application to use.

- Ken


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
                   ` (7 preceding siblings ...)
  2006-02-03 18:30 ` Chen, Kenneth W
@ 2006-02-03 18:43 ` Luck, Tony
  2006-02-03 18:55 ` Chen, Kenneth W
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Luck, Tony @ 2006-02-03 18:43 UTC (permalink / raw)
  To: linux-ia64

> Now that I read the whole thread, how about remove pining DTR 
> requirement for per-cpu data?  In alt_dtlb_miss, we can just
> detect whether ifa is per_cpu address and read ar.k3 to get the
> physical address, then insert a TC.  That will free up a tlb
> entry for application to use.

What will you do when someone accesses per-cpu area before ar.k3
has been initialized (the bug that we are trying to fix here)?

This would add some extra code to the Alt-DTLB handler.  Could
you hide all/most of the extra latency?

Are there any correctness issues?  Any place where we access a
per-cpu variable where we absolutely cannot take an Alt-DTLB fault?

-Tony

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
                   ` (8 preceding siblings ...)
  2006-02-03 18:43 ` Luck, Tony
@ 2006-02-03 18:55 ` Chen, Kenneth W
  2006-02-04  0:16 ` Keith Owens
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Chen, Kenneth W @ 2006-02-03 18:55 UTC (permalink / raw)
  To: linux-ia64

Luck, Tony wrote on Friday, February 03, 2006 10:43 AM
> > Now that I read the whole thread, how about remove pining DTR 
> > requirement for per-cpu data?  In alt_dtlb_miss, we can just
> > detect whether ifa is per_cpu address and read ar.k3 to get the
> > physical address, then insert a TC.  That will free up a tlb
> > entry for application to use.
> 
> What will you do when someone accesses per-cpu area before ar.k3
> has been initialized (the bug that we are trying to fix here)?

As far as I can tell, so far printk_clock is the only one that
needs to reference per_cpu area data before it is setup. Everywhere
else in ia64 arch code, it is specially crafted to avoid it.

Alternatively, we can do the same trick as boot strapping init_task,
having per cpu area allocated for BP in the kernel image and have
BP to setup per cpu area for AP before bringing them up.  Is that
worth it?


> This would add some extra code to the Alt-DTLB handler.  Could
> you hide all/most of the extra latency?

I will try ....


> Are there any correctness issues?  Any place where we access a
> per-cpu variable where we absolutely cannot take an Alt-DTLB fault?

Good point, I will look (don't think there is any place we reference
per cpu data with psr.ic off).

- Ken


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
                   ` (9 preceding siblings ...)
  2006-02-03 18:55 ` Chen, Kenneth W
@ 2006-02-04  0:16 ` Keith Owens
  2006-02-04  0:28 ` Chen, Kenneth W
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Keith Owens @ 2006-02-04  0:16 UTC (permalink / raw)
  To: linux-ia64

"Luck, Tony" (on Fri, 3 Feb 2006 10:43:08 -0800) wrote:
>> Now that I read the whole thread, how about remove pining DTR 
>> requirement for per-cpu data?  In alt_dtlb_miss, we can just
>> detect whether ifa is per_cpu address and read ar.k3 to get the
>> physical address, then insert a TC.  That will free up a tlb
>> entry for application to use.
>
>What will you do when someone accesses per-cpu area before ar.k3
>has been initialized (the bug that we are trying to fix here)?
>
>This would add some extra code to the Alt-DTLB handler.  Could
>you hide all/most of the extra latency?
>
>Are there any correctness issues?  Any place where we access a
>per-cpu variable where we absolutely cannot take an Alt-DTLB fault?

mca_asm.S, GET_IA64_MCA_DATA().  ia64_mca_data is per cpu and is
accessed right from the start of the MCA/INIT handlers, with psr.i,
psr.ic both 0.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
                   ` (10 preceding siblings ...)
  2006-02-04  0:16 ` Keith Owens
@ 2006-02-04  0:28 ` Chen, Kenneth W
  2006-02-04  0:32 ` Luck, Tony
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Chen, Kenneth W @ 2006-02-04  0:28 UTC (permalink / raw)
  To: linux-ia64

Keith Owens wrote on Friday, February 03, 2006 4:16 PM
> >Are there any correctness issues?  Any place where we access a
> >per-cpu variable where we absolutely cannot take an Alt-DTLB fault?
> 
> mca_asm.S, GET_IA64_MCA_DATA().  ia64_mca_data is per cpu and is
> accessed right from the start of the MCA/INIT handlers, with psr.i,
> psr.ic both 0.

Ouch, can I turn off psr.dt, then access ia64_mca_data in physical
mode, and switch psr.dt back on?  Are there any ill effect this
could cause?

- Ken


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
                   ` (11 preceding siblings ...)
  2006-02-04  0:28 ` Chen, Kenneth W
@ 2006-02-04  0:32 ` Luck, Tony
  2006-02-04  0:36 ` Keith Owens
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Luck, Tony @ 2006-02-04  0:32 UTC (permalink / raw)
  To: linux-ia64

> mca_asm.S, GET_IA64_MCA_DATA().  ia64_mca_data is per cpu and is
> accessed right from the start of the MCA/INIT handlers, with psr.i,
> psr.ic both 0.

But PSR.dt is also zero, and we are accessing via the physical address
(using ar.k3).  Right?  At the start of the mca/init handlers we don't
trust that the TLB is working.

-Tony

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
                   ` (12 preceding siblings ...)
  2006-02-04  0:32 ` Luck, Tony
@ 2006-02-04  0:36 ` Keith Owens
  2006-02-06 18:14 ` Luck, Tony
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Keith Owens @ 2006-02-04  0:36 UTC (permalink / raw)
  To: linux-ia64

Keith Owens (on Sat, 04 Feb 2006 11:16:09 +1100) wrote:
>"Luck, Tony" (on Fri, 3 Feb 2006 10:43:08 -0800) wrote:
>>> Now that I read the whole thread, how about remove pining DTR 
>>> requirement for per-cpu data?  In alt_dtlb_miss, we can just
>>> detect whether ifa is per_cpu address and read ar.k3 to get the
>>> physical address, then insert a TC.  That will free up a tlb
>>> entry for application to use.
>>
>>What will you do when someone accesses per-cpu area before ar.k3
>>has been initialized (the bug that we are trying to fix here)?
>>
>>This would add some extra code to the Alt-DTLB handler.  Could
>>you hide all/most of the extra latency?
>>
>>Are there any correctness issues?  Any place where we access a
>>per-cpu variable where we absolutely cannot take an Alt-DTLB fault?
>
>mca_asm.S, GET_IA64_MCA_DATA().  ia64_mca_data is per cpu and is
>accessed right from the start of the MCA/INIT handlers, with psr.i,
>psr.ic both 0.

Ignore that, ia64_mca_data and the data that it points to uses physical
addresses, not virtual.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
                   ` (13 preceding siblings ...)
  2006-02-04  0:36 ` Keith Owens
@ 2006-02-06 18:14 ` Luck, Tony
  2006-02-06 18:34 ` Chen, Kenneth W
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Luck, Tony @ 2006-02-06 18:14 UTC (permalink / raw)
  To: linux-ia64

On Fri, Feb 03, 2006 at 09:43:29AM -0800, Chen, Kenneth W wrote:
> Why don't we look at ar.k3, it contains per_cpu base address.
> If it has a none zero value, then it is initialized by
> per_cpu_init and is safe to call sched_clock.

We also need to check whether ar.itc is synchronized on different
cpus ... if it isn't, then sched_clock() is useless.  Here's a
simple extension of your patch that does this, falling back to using
a jiffie based clock (which means you'd only have HZ resolution on
systems where there is drift).

If that isn't good enough, then I'd be happy to see a patch that
does a lockless interpolation.

-Tony


diff --git a/arch/ia64/kernel/head.S b/arch/ia64/kernel/head.S
index fbc7ea3..99cfef8 100644
--- a/arch/ia64/kernel/head.S
+++ b/arch/ia64/kernel/head.S
@@ -352,6 +352,7 @@ start_ap:
 	mov ar.rsc=0		// place RSE in enforced lazy mode
 	;;
 	loadrs			// clear the dirty partition
+	mov ar.k3=r0		// clear physical per-CPU base
 	;;
 	mov ar.bspstore=r2	// establish the new RSE stack
 	;;
diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 028a2b9..648c116 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -278,3 +278,13 @@ udelay (unsigned long usecs)
 	}
 }
 EXPORT_SYMBOL(udelay);
+
+unsigned long long printk_clock(void)
+{
+	if (sal_platform_features & IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT) {
+		return (unsigned long long)(jiffies_64 - INITIAL_JIFFIES) *
+			(1000000000/HZ);
+	} else if (ia64_get_kr(IA64_KR_PER_CPU_DATA))
+		return sched_clock();
+	return 0;
+}

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* RE: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
                   ` (14 preceding siblings ...)
  2006-02-06 18:14 ` Luck, Tony
@ 2006-02-06 18:34 ` Chen, Kenneth W
  2006-02-06 20:40 ` Magenheimer, Dan (HP Labs Fort Collins)
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Chen, Kenneth W @ 2006-02-06 18:34 UTC (permalink / raw)
  To: linux-ia64

Luck, Tony wrote on Monday, February 06, 2006 10:14 AM
> On Fri, Feb 03, 2006 at 09:43:29AM -0800, Chen, Kenneth W wrote:
> > Why don't we look at ar.k3, it contains per_cpu base address.
> > If it has a none zero value, then it is initialized by
> > per_cpu_init and is safe to call sched_clock.
> 
> We also need to check whether ar.itc is synchronized on different
> cpus ... if it isn't, then sched_clock() is useless.  Here's a
> simple extension of your patch that does this, falling back to using
> a jiffie based clock (which means you'd only have HZ resolution on
> systems where there is drift).
> 
> If that isn't good enough, then I'd be happy to see a patch that
> does a lockless interpolation.
> 
> diff --git a/arch/ia64/kernel/head.S b/arch/ia64/kernel/head.S
> index fbc7ea3..99cfef8 100644
> --- a/arch/ia64/kernel/head.S
> +++ b/arch/ia64/kernel/head.S
> @@ -352,6 +352,7 @@ start_ap:
>  	mov ar.rsc=0		// place RSE in enforced lazy mode
>  	;;
>  	loadrs			// clear the dirty partition
> +	mov ar.k3=r0		// clear physical per-CPU base
>  	;;
>  	mov ar.bspstore=r2	// establish the new RSE stack
>  	;;

Sorry, I'm the one who is spreading ... can we change this hunk
to use defined kr constant?

diff --git a/arch/ia64/kernel/head.S b/arch/ia64/kernel/head.S
index fbc7ea3..99cfef8 100644
--- a/arch/ia64/kernel/head.S
+++ b/arch/ia64/kernel/head.S
@@ -352,6 +352,7 @@ start_ap:
 	mov ar.rsc=0		// place RSE in enforced lazy mode
 	;;
 	loadrs			// clear the dirty partition
+	mov IA64_KR(PER_CPU_DATA)=r0
 	;;
 	mov ar.bspstore=r2	// establish the new RSE stack
 	;;



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* RE: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
                   ` (15 preceding siblings ...)
  2006-02-06 18:34 ` Chen, Kenneth W
@ 2006-02-06 20:40 ` Magenheimer, Dan (HP Labs Fort Collins)
  2006-02-06 21:37 ` Chen, Kenneth W
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Magenheimer, Dan (HP Labs Fort Collins) @ 2006-02-06 20:40 UTC (permalink / raw)
  To: linux-ia64

One headsup before unpinning per-cpu...

In playing with paravirtualization for Xen/ia64, I saw
some evidence that replacing all uses of ar.kr's in Linux/ia64
with direct memory accesses to (pinned) per-cpu data may speed
up the kernel somewhat (~0.5%?).  (Reading/writing of kr's
on Mckinley is pretty slow... don't know about other processors.)

I talked to Peter Chubb about this at the October Gelato meeting
and we thought it might make a good undergraduate/master's
project/paper so I never reported it here.

However, if the TR used for pinning per-cpu gets used
for something else, this experiment will get much harder
to run so I thought I'd better post this headsup now.

Dan Magenheimer
HP Labs Fort Collins

> -----Original Message-----
> From: linux-ia64-owner@vger.kernel.org 
> [mailto:linux-ia64-owner@vger.kernel.org] On Behalf Of Chen, Kenneth W
> Sent: Friday, February 03, 2006 11:31 AM
> To: Luck, Tony; 'Dean Roe'; 'Jack Steiner'
> Cc: linux-ia64@vger.kernel.org
> Subject: RE: ia64 printk_clock()
> 
> Luck, Tony wrote on Friday, February 03, 2006 10:04 AM
> > [1] http://tinyurl.com/ah642
> 
> Ha, should've check the archive :-)
> 
> Now that I read the whole thread, how about remove pining DTR 
> requirement for per-cpu data?  In alt_dtlb_miss, we can just
> detect whether ifa is per_cpu address and read ar.k3 to get the
> physical address, then insert a TC.  That will free up a tlb
> entry for application to use.
> 
> - Ken
> 
> -
> To unsubscribe from this list: send the line "unsubscribe 
> linux-ia64" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
                   ` (16 preceding siblings ...)
  2006-02-06 20:40 ` Magenheimer, Dan (HP Labs Fort Collins)
@ 2006-02-06 21:37 ` Chen, Kenneth W
  2006-02-07 11:11 ` Jes Sorensen
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Chen, Kenneth W @ 2006-02-06 21:37 UTC (permalink / raw)
  To: linux-ia64

Magenheimer, Dan wrote on Monday, February 06, 2006 12:41 PM
> One headsup before unpinning per-cpu...
> 
> In playing with paravirtualization for Xen/ia64, I saw
> some evidence that replacing all uses of ar.kr's in Linux/ia64
> with direct memory accesses to (pinned) per-cpu data may speed
> up the kernel somewhat (~0.5%?).  (Reading/writing of kr's
> on Mckinley is pretty slow... don't know about other processors.)


On madison, reading KR is pretty fast.  I have the following experimental
patch[*] and when I measured alt_dtlb_miss latency, with or without the
patch, both yielded 23 cycles.  TLB miss in the kernel definitely went up,
by about 3% with kernel build bench and 7% with an OLTP db workload. The
trade off is to have smaller TLB miss for user app.  I'm doing a few more
experiments to see whether the trade off is worthwhile or not.

- Ken

[*] prerequisite of a patch similar to
http://www.gelato.unsw.edu.au/archives/linux-ia64/0601/16836.html to avoid
referencing per cpu variable ia64_phys_stacked_size_p8 in the kernel exit
path, as the per cpu area is accessed with psr.ic=0.


--- ./arch/ia64/kernel/ivt.S.orig	2006-01-02 19:21:10.000000000 -0800
+++ ./arch/ia64/kernel/ivt.S	2006-02-05 13:30:57.782233990 -0800
@@ -375,6 +375,7 @@ ENTRY(alt_dtlb_miss)
 	movl r19=(((1 << IA64_MAX_PHYS_BITS) - 1) & ~0xfff)
 	mov r21=cr.ipsr
 	mov r31=pr
+	mov r24=PERCPU_ADDR
 	;;
 #ifdef CONFIG_DISABLE_VHPT
 	shr.u r22=r16,61			// get the region number into r21
@@ -387,22 +388,30 @@ ENTRY(alt_dtlb_miss)
 (p8)	mov r29°				// save b0
 (p8)	br.cond.dptk dtlb_fault
 #endif
+	cmp.ge p10,p11=r16,r24			// access to per_cpu_data?
+	tbit.z p12,p0=r16,61			// access to region 6?
+	mov r25=PERCPU_PAGE_SHIFT << 2
+	mov r26=PERCPU_PAGE_SIZE
+	nop.m 0
+	nop.b 0
+	;;
+(p10)	mov r19=IA64_KR(PER_CPU_DATA)
+(p11)	and r19=r19,r16				// clear non-ppn fields
 	extr.u r23=r21,IA64_PSR_CPL0_BIT,2	// extract psr.cpl
 	and r22=IA64_ISR_CODE_MASK,r20		// get the isr.code field
 	tbit.nz p6,p7=r20,IA64_ISR_SP_BIT	// is speculation bit on?
-	shr.u r18=r16,57			// move address bit 61 to bit 4
-	and r19=r19,r16				// clear ed, reserved bits, and PTE control bits
 	tbit.nz p9,p0=r20,IA64_ISR_NA_BIT	// is non-access bit on?
 	;;
-	andcm r18=0x10,r18	// bit 4=~address-bit(61)
+(p10)	sub r19=r19,r26
+(p10)	mov cr.itir=r25
 	cmp.ne p8,p0=r0,r23
 (p9)	cmp.eq.or.andcm p6,p7=IA64_ISR_CODE_LFETCH,r22	// check isr.code field
+(p12)	dep r17=-1,r17,4,1			// set ma=UC for region 6 addr
 (p8)	br.cond.spnt page_fault
 
 	dep r21=-1,r21,IA64_PSR_ED_BIT,1
-	or r19=r19,r17		// insert PTE control bits into r19
 	;;
-	or r19=r19,r18		// set bit 4 (uncached) if the access was to region 6
+	or r19=r19,r17		// insert PTE control bits into r19
 (p6)	mov cr.ipsr=r21
 	;;
 (p7)	itc.d r19		// insert the TLB entry
--- ./arch/ia64/kernel/mca_asm.S.orig	2006-01-02 19:21:10.000000000 -0800
+++ ./arch/ia64/kernel/mca_asm.S	2006-02-05 11:15:55.620223867 -0800
@@ -102,14 +102,6 @@ ia64_do_tlb_purge:
 	;;
 	srlz.d
 	;;
-	// 2. Purge DTR for PERCPU data.
-	movl r16=PERCPU_ADDR
-	mov r18=PERCPU_PAGE_SHIFT<<2
-	;;
-	ptr.d r16,r18
-	;;
-	srlz.d
-	;;
 	// 3. Purge ITR for PAL code.
 	GET_THIS_PADDR(r2, ia64_mca_pal_base)
 	;;
@@ -197,22 +189,6 @@ ia64_reload_tr:
 	srlz.i
 	srlz.d
 	;;
-	// 2. Reload DTR register for PERCPU data.
-	GET_THIS_PADDR(r2, ia64_mca_per_cpu_pte)
-	;;
-	movl r16=PERCPU_ADDR		// vaddr
-	movl r18=PERCPU_PAGE_SHIFT<<2
-	;;
-	mov cr.itir=r18
-	mov cr.ifa=r16
-	;;
-	ld8 r18=[r2]			// load per-CPU PTE
-	mov r16=IA64_TR_PERCPU_DATA;
-	;;
-	itr.d dtr[r16]=r18
-	;;
-	srlz.d
-	;;
 	// 3. Reload ITR for PAL code.
 	GET_THIS_PADDR(r2, ia64_mca_pal_pte)
 	;;
--- ./arch/ia64/mm/init.c.orig	2006-01-02 19:21:10.000000000 -0800
+++ ./arch/ia64/mm/init.c	2006-02-05 11:16:57.578230920 -0800
@@ -332,7 +332,7 @@ setup_gate (void)
 void __devinit
 ia64_mmu_init (void *my_cpu_data)
 {
-	unsigned long psr, pta, impl_va_bits;
+	unsigned long pta, impl_va_bits;
 	extern void __devinit tlb_init (void);
 
 #ifdef CONFIG_DISABLE_VHPT
@@ -341,15 +341,6 @@ ia64_mmu_init (void *my_cpu_data)
 #	define VHPT_ENABLE_BIT	1
 #endif
 
-	/* Pin mapping for percpu area into TLB */
-	psr = ia64_clear_ic();
-	ia64_itr(0x2, IA64_TR_PERCPU_DATA, PERCPU_ADDR,
-		 pte_val(pfn_pte(__pa(my_cpu_data) >> PAGE_SHIFT, PAGE_KERNEL)),
-		 PERCPU_PAGE_SHIFT);
-
-	ia64_set_psr(psr);
-	ia64_srlz_i();
-
 	/*
 	 * Check if the virtually mapped linear page table (VMLPT) overlaps with a mapped
 	 * address space.  The IA-64 architecture guarantees that at least 50 bits of
--- ./include/asm-ia64/kregs.h.orig	2006-01-02 19:21:10.000000000 -0800
+++ ./include/asm-ia64/kregs.h	2006-02-05 11:15:55.621200429 -0800
@@ -29,8 +29,7 @@
  */
 #define IA64_TR_KERNEL		0	/* itr0, dtr0: maps kernel image (code & data) */
 #define IA64_TR_PALCODE		1	/* itr1: maps PALcode as required by EFI */
-#define IA64_TR_PERCPU_DATA	1	/* dtr1: percpu data */
-#define IA64_TR_CURRENT_STACK	2	/* dtr2: maps kernel's memory- & register-stacks */
+#define IA64_TR_CURRENT_STACK	1	/* dtr1: maps kernel's memory- & register-stacks */
 
 /* Processor status register bits: */
 #define IA64_PSR_BE_BIT		1



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
                   ` (17 preceding siblings ...)
  2006-02-06 21:37 ` Chen, Kenneth W
@ 2006-02-07 11:11 ` Jes Sorensen
  2006-02-08  0:05 ` Luck, Tony
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Jes Sorensen @ 2006-02-07 11:11 UTC (permalink / raw)
  To: linux-ia64

>>>>> "tony" = Luck, Tony <tony.luck@intel.com> writes:

tony> We also need to check whether ar.itc is synchronized on
tony> different cpus ... if it isn't, then sched_clock() is useless.
tony> Here's a simple extension of your patch that does this, falling
tony> back to using a jiffie based clock (which means you'd only have
tony> HZ resolution on systems where there is drift).

tony> If that isn't good enough, then I'd be happy to see a patch that
tony> does a lockless interpolation.

Me too! me too! ;-)

I've never really thought of this feature as being important, but it
might be good for debugging so getting the extra resolution where
possible is probably a good thing<tm>.

What about this, it uses a function pointer that can be updated by the
machine specific code so architectures which have an alternative
reliable clock source can use that instead of falling back to the
jiffies based implementation?

Cheers,
Jes

diff --git a/arch/ia64/kernel/head.S b/arch/ia64/kernel/head.S
index fbc7ea3..99cfef8 100644
 arch/ia64/kernel/head.S     |    1 +
 arch/ia64/kernel/setup.c    |    4 ++++
 arch/ia64/kernel/time.c     |   27 +++++++++++++++++++++++++++
 arch/ia64/sn/kernel/setup.c |   40 +++++++++++++++++++++++++++-------------
 4 files changed, 59 insertions(+), 13 deletions(-)

Index: linux-2.6/arch/ia64/kernel/head.S
=================================--- linux-2.6.orig/arch/ia64/kernel/head.S
+++ linux-2.6/arch/ia64/kernel/head.S
@@ -352,6 +352,7 @@
 	mov ar.rsc=0		// place RSE in enforced lazy mode
 	;;
 	loadrs			// clear the dirty partition
+	mov ar.k3=r0		// clear physical per-CPU base
 	;;
 	mov ar.bspstore=r2	// establish the new RSE stack
 	;;
Index: linux-2.6/arch/ia64/kernel/setup.c
=================================--- linux-2.6.orig/arch/ia64/kernel/setup.c
+++ linux-2.6/arch/ia64/kernel/setup.c
@@ -71,6 +71,8 @@
 EXPORT_SYMBOL(__per_cpu_offset);
 #endif
 
+extern void ia64_setup_printk_clock(void);
+
 DEFINE_PER_CPU(struct cpuinfo_ia64, cpu_info);
 DEFINE_PER_CPU(unsigned long, local_per_cpu_offset);
 DEFINE_PER_CPU(unsigned long, ia64_phys_stacked_size_p8);
@@ -445,6 +447,8 @@
 	/* process SAL system table: */
 	ia64_sal_init(efi.sal_systab);
 
+	ia64_setup_printk_clock();
+
 #ifdef CONFIG_SMP
 	cpu_physical_id(0) = hard_smp_processor_id();
 
Index: linux-2.6/arch/ia64/kernel/time.c
=================================--- linux-2.6.orig/arch/ia64/kernel/time.c
+++ linux-2.6/arch/ia64/kernel/time.c
@@ -278,3 +278,30 @@
 	}
 }
 EXPORT_SYMBOL(udelay);
+
+static unsigned long long ia64_itc_printk_clock(void)
+{
+	if (ia64_get_kr(IA64_KR_PER_CPU_DATA))
+		return sched_clock();
+	return 0;
+}
+
+static unsigned long long ia64_default_printk_clock(void)
+{
+	return (unsigned long long)(jiffies_64 - INITIAL_JIFFIES) *
+		(1000000000/HZ);
+}
+
+unsigned long long (*ia64_printk_clock)(void) = &ia64_default_printk_clock;
+
+unsigned long long printk_clock(void)
+{
+	return ia64_printk_clock();
+}
+
+void __init
+ia64_setup_printk_clock(void)
+{
+	if (!(sal_platform_features & IA64_SAL_PLATFORM_FEATURE_ITC_DRIFT))
+		ia64_printk_clock = ia64_itc_printk_clock;
+}
Index: linux-2.6/arch/ia64/sn/kernel/setup.c
=================================--- linux-2.6.orig/arch/ia64/sn/kernel/setup.c
+++ linux-2.6/arch/ia64/sn/kernel/setup.c
@@ -67,6 +67,7 @@
 extern void (*ia64_mark_idle) (int);
 extern void snidle(int);
 extern unsigned char acpi_kbd_controller_present;
+extern unsigned long long (*ia64_printk_clock)(void);
 
 unsigned long sn_rtc_cycles_per_second;
 EXPORT_SYMBOL(sn_rtc_cycles_per_second);
@@ -372,6 +373,16 @@
 	}
 }
 
+static unsigned long sn2_rtc_initial;
+
+static unsigned long long ia64_sn2_printk_clock(void)
+{
+	unsigned long rtc_now = rtc_time();
+
+	return (rtc_now - sn2_rtc_initial) *
+		(1000000000 / sn_rtc_cycles_per_second);
+}
+
 /**
  * sn_setup - SN platform setup routine
  * @cmdline_p: kernel command line
@@ -386,6 +397,7 @@
 	u32 version = sn_sal_rev();
 	extern void sn_cpu_init(void);
 
+	sn2_rtc_initial = rtc_time();
 	ia64_sn_plat_set_error_handling_features();	// obsolete
 	ia64_sn_set_os_feature(OSF_MCA_SLV_TO_OS_INIT_SLV);
 	ia64_sn_set_os_feature(OSF_FEAT_LOG_SBES);
@@ -437,19 +449,6 @@
 	 */
 	build_cnode_tables();
 
-	/*
-	 * Old PROMs do not provide an ACPI FADT. Disable legacy keyboard
-	 * support here so we don't have to listen to failed keyboard probe
-	 * messages.
-	 */
-	if (version <= 0x0209 && acpi_kbd_controller_present) {
-		printk(KERN_INFO "Disabling legacy keyboard support as prom "
-		       "is too old and doesn't provide FADT\n");
-		acpi_kbd_controller_present = 0;
-	}
-
-	printk("SGI SAL version %x.%02x\n", version >> 8, version & 0x00FF);
-
 	status  	    ia64_sal_freq_base(SAL_FREQ_BASE_REALTIME_CLOCK, &ticks_per_sec,
 			       &drift);
@@ -463,6 +462,21 @@
 
 	platform_intr_list[ACPI_INTERRUPT_CPEI] = IA64_CPE_VECTOR;
 
+	ia64_printk_clock = ia64_sn2_printk_clock;
+
+	/*
+	 * Old PROMs do not provide an ACPI FADT. Disable legacy keyboard
+	 * support here so we don't have to listen to failed keyboard probe
+	 * messages.
+	 */
+	if (version <= 0x0209 && acpi_kbd_controller_present) {
+		printk(KERN_INFO "Disabling legacy keyboard support as prom "
+		       "is too old and doesn't provide FADT\n");
+		acpi_kbd_controller_present = 0;
+	}
+
+	printk("SGI SAL version %x.%02x\n", version >> 8, version & 0x00FF);
+
 	/*
 	 * we set the default root device to /dev/hda
 	 * to make simulation easy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
                   ` (18 preceding siblings ...)
  2006-02-07 11:11 ` Jes Sorensen
@ 2006-02-08  0:05 ` Luck, Tony
  2006-02-08  6:29 ` Jes Sorensen
  2006-02-08 18:18 ` Luck, Tony
  21 siblings, 0 replies; 23+ messages in thread
From: Luck, Tony @ 2006-02-08  0:05 UTC (permalink / raw)
  To: linux-ia64

> Me too! me too! ;-)
>
> I've never really thought of this feature as being important, but it
> might be good for debugging so getting the extra resolution where
> possible is probably a good thing<tm>.
> 
> What about this, it uses a function pointer that can be updated by the
> machine specific code so architectures which have an alternative
> reliable clock source can use that instead of falling back to the
> jiffies based implementation?

That looks pretty good.  It's a little interesting watching the
boot with CONFIG_PRINTK_TIME=y ... at first lines are printed with
a 0.000000 timestamp because we are using ia64_default_printk_clock()
and jiffies aren't moving yet.  Then I think we switch over to the itc
version, but we still see 0.00000 because the per-cpu nsec_per_cyc
hasn't been initialized yet (multiply by 0 and you get 0).

Once we've read the PAL to get the frequency, we start seeing good
timestamps: until we start bringing up other cpus, when we see a
couple more 0.000000 stamps per cpu.  Not sure if these are because
we printk() before the new cpu has had a chance to set ar.k3, or if
the printk() is before nsec_per_cpu is initialized.

Once all the cpus are up, timestamps proceed in an orderly monotonic
fashion.

-Tony

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
                   ` (19 preceding siblings ...)
  2006-02-08  0:05 ` Luck, Tony
@ 2006-02-08  6:29 ` Jes Sorensen
  2006-02-08 18:18 ` Luck, Tony
  21 siblings, 0 replies; 23+ messages in thread
From: Jes Sorensen @ 2006-02-08  6:29 UTC (permalink / raw)
  To: linux-ia64

Luck, Tony wrote:
> That looks pretty good.  It's a little interesting watching the
> boot with CONFIG_PRINTK_TIME=y ... at first lines are printed with
> a 0.000000 timestamp because we are using ia64_default_printk_clock()
> and jiffies aren't moving yet.  Then I think we switch over to the itc
> version, but we still see 0.00000 because the per-cpu nsec_per_cyc
> hasn't been initialized yet (multiply by 0 and you get 0).
> 
> Once we've read the PAL to get the frequency, we start seeing good
> timestamps: until we start bringing up other cpus, when we see a
> couple more 0.000000 stamps per cpu.  Not sure if these are because
> we printk() before the new cpu has had a chance to set ar.k3, or if
> the printk() is before nsec_per_cpu is initialized.

Hi Tony,

Hadn't seen this effect since the SN2 use the global clock source which
is in sync for all CPUs. Would it make sense to use the HPET on systems
that got it?

Cheers,
Jes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: ia64 printk_clock()
  2006-02-02 20:44 ia64 printk_clock() Jack Steiner
                   ` (20 preceding siblings ...)
  2006-02-08  6:29 ` Jes Sorensen
@ 2006-02-08 18:18 ` Luck, Tony
  21 siblings, 0 replies; 23+ messages in thread
From: Luck, Tony @ 2006-02-08 18:18 UTC (permalink / raw)
  To: linux-ia64

> Hadn't seen this effect since the SN2 use the global clock source which
> is in sync for all CPUs. Would it make sense to use the HPET on systems
> that got it?

It might be useful for a system with non-synchronized ar.itc to stop them
falling back to jiffie resolution.  But I think sn2 is the only one of those
right now, and you've already got this covered.

The odd 0.000000 values during boot can easily be worked around by anyone
looking to use this to analyse and improve boot time.  I was just amused
that there were three different reasons why a 0.000000 value would be
printed.

-Tony

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2006-02-08 18:18 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-02-02 20:44 ia64 printk_clock() Jack Steiner
2006-02-02 21:22 ` Dean Roe
2006-02-02 21:46 ` Luck, Tony
2006-02-02 21:50 ` Jack Steiner
2006-02-02 22:29 ` Luck, Tony
2006-02-02 22:52 ` Chen, Kenneth W
2006-02-03 17:43 ` Chen, Kenneth W
2006-02-03 18:04 ` Luck, Tony
2006-02-03 18:30 ` Chen, Kenneth W
2006-02-03 18:43 ` Luck, Tony
2006-02-03 18:55 ` Chen, Kenneth W
2006-02-04  0:16 ` Keith Owens
2006-02-04  0:28 ` Chen, Kenneth W
2006-02-04  0:32 ` Luck, Tony
2006-02-04  0:36 ` Keith Owens
2006-02-06 18:14 ` Luck, Tony
2006-02-06 18:34 ` Chen, Kenneth W
2006-02-06 20:40 ` Magenheimer, Dan (HP Labs Fort Collins)
2006-02-06 21:37 ` Chen, Kenneth W
2006-02-07 11:11 ` Jes Sorensen
2006-02-08  0:05 ` Luck, Tony
2006-02-08  6:29 ` Jes Sorensen
2006-02-08 18:18 ` Luck, Tony

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox