* [Linux-ia64] Re: Lockups on 2.4.1
@ 2001-02-21 16:05 Bill Nottingham
2001-02-21 17:16 ` Gerrit Huizenga
` (17 more replies)
0 siblings, 18 replies; 19+ messages in thread
From: Bill Nottingham @ 2001-02-21 16:05 UTC (permalink / raw)
To: linux-ia64
Michael Madore (mmadore@turbolinux.com) said:
> Has anyone else seen lockups under the 2.4.1 kernel? I saw two machines
> (one Lion, one Big Sur) hang over the weekend. Both machines had black
> screens and wouldn't respond over the network.
>
> I had several other boxes running over the weekend with no problems. Sorry
> I don't have any more details at the moment.
I've definitely seen some completely random deaths here.
Bill
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Linux-ia64] Re: Lockups on 2.4.1
2001-02-21 16:05 [Linux-ia64] Re: Lockups on 2.4.1 Bill Nottingham
@ 2001-02-21 17:16 ` Gerrit Huizenga
2001-02-21 17:57 ` David Mosberger
` (16 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Gerrit Huizenga @ 2001-02-21 17:16 UTC (permalink / raw)
To: linux-ia64
I've been seeing these even under 2.4.0 on a Lion. And my current
BIOS/EFI is making it difficult to get a serial console working to see
what happens at the time of machine death.
gerrit
> Michael Madore (mmadore@turbolinux.com) said:
> > Has anyone else seen lockups under the 2.4.1 kernel? I saw two machines
> > (one Lion, one Big Sur) hang over the weekend. Both machines had black
> > screens and wouldn't respond over the network.
> >
> > I had several other boxes running over the weekend with no problems. Sorry
> > I don't have any more details at the moment.
>
> I've definitely seen some completely random deaths here.
>
> Bill
>
> _______________________________________________
> Linux-IA64 mailing list
> Linux-IA64@linuxia64.org
> http://lists.linuxia64.org/lists/listinfo/linux-ia64
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Linux-ia64] Re: Lockups on 2.4.1
2001-02-21 16:05 [Linux-ia64] Re: Lockups on 2.4.1 Bill Nottingham
2001-02-21 17:16 ` Gerrit Huizenga
@ 2001-02-21 17:57 ` David Mosberger
2001-02-21 18:58 ` Chris McDermott
` (15 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: David Mosberger @ 2001-02-21 17:57 UTC (permalink / raw)
To: linux-ia64
>>>>> On Wed, 21 Feb 2001 11:05:12 -0500, Bill Nottingham <notting@redhat.com> said:
Bill> Michael Madore (mmadore@turbolinux.com) said:
>> Has anyone else seen lockups under the 2.4.1 kernel? I saw two
>> machines (one Lion, one Big Sur) hang over the weekend. Both
>> machines had black screens and wouldn't respond over the network.
>>
>> I had several other boxes running over the weekend with no
>> problems. Sorry I don't have any more details at the moment.
Bill> I've definitely seen some completely random deaths here.
Please be more specific when reporting bugs. At the least, include
(a) what type of machine and (b) what kernel patch you were running at
the time. Ideally, also describe what you were doing at the time and
try to get a backtrace with kdb, if possible.
That way, we should be able to at least get an idea of what the
pattern of the failures is.
Having said that, except for the one-time "rpm" hang and the autofs4
instability, my Big Sur has been rock solid.
>>>>> On Wed, 21 Feb 2001 09:16:51 -0800, Gerrit Huizenga <gerrit@us.ibm.com> said:
Gerrit> I've been seeing these even under 2.4.0 on a Lion. And my
Gerrit> current BIOS/EFI is making it difficult to get a serial
Gerrit> console working to see what happens at the time of machine
Gerrit> death.
Please don't bother with bug reports against the 2.4.0 patch---it is
known to hang in certain cases. The latest patch is relative to
2.4.1.
Thanks,
--david
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Linux-ia64] Re: Lockups on 2.4.1
2001-02-21 16:05 [Linux-ia64] Re: Lockups on 2.4.1 Bill Nottingham
2001-02-21 17:16 ` Gerrit Huizenga
2001-02-21 17:57 ` David Mosberger
@ 2001-02-21 18:58 ` Chris McDermott
2001-02-21 21:02 ` David Mosberger
` (14 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Chris McDermott @ 2001-02-21 18:58 UTC (permalink / raw)
To: linux-ia64
>>>>> On Wed, 21 Feb 2001 11:05:12 -0500, Bill Nottingham <notting@redhat.com> said:
Bill> Michael Madore (mmadore@turbolinux.com) said:
>> Has anyone else seen lockups under the 2.4.1 kernel? I saw two
>> machines (one Lion, one Big Sur) hang over the weekend. Both
>> machines had black screens and wouldn't respond over the network.
>>
>> I had several other boxes running over the weekend with no
>> problems. Sorry I don't have any more details at the moment.
Bill> I've definitely seen some completely random deaths here.
David> Please be more specific when reporting bugs. At the least, include
David> (a) what type of machine and (b) what kernel patch you were running at
David> the time. Ideally, also describe what you were doing at the time and
David> try to get a backtrace with kdb, if possible.
David> That way, we should be able to at least get an idea of what the
David> pattern of the failures is.
David> Having said that, except for the one-time "rpm" hang and the autofs4
David> instability, my Big Sur has been rock solid.
David,
I have seen similar symptoms on our IBM IA64 NUMA hardware. We are
running an in-house memory diagnostics test and a CPU benchmark
concurrently (strictly to keep the CPUs busy and to generate some remote
I/O). I have been assuming that this was a hardware problem (of course I
would, I'm a software guy). When I saw reports that other people were
seeing similar behavior on SDVs, I decided to try to reproduce this on a
4x Lion (B3's with BIOS 71, 2.4.1 kernel with your 0131 IA64 patch). Using
the same tests, I was able to reproduce a "lockup" problem on the Lion
(system dead, no video). Not sure if it's the same problem yet, still need
to do some more investigation.
Anyway, I have ITPs connected to the IBM hardware and have noticed that
when the lockup occurs, and we lose video, at least one of the CPUs is
executing in flush_tlb_no_ptcg() or handle_IPI(), in the 'do' loop where
TLB entries are being purged. What I have observed is that the end address
and the start address are in completely different regions. Usually, the
start address is in region register 1 (address of 0x2000XXXXXXXXXXXX) and
the end address is in region register 3 (address of 0x6000XXXXXXXXXXXX).
I don't know if this is the same problem I am seeing on the Lion, but I
plan to connect an ITP and a serial console (although we haven't been able
to get one to work yet on the Lion with BIOS 71) to see if the symptoms
are the same.
Chris
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Linux-ia64] Re: Lockups on 2.4.1
2001-02-21 16:05 [Linux-ia64] Re: Lockups on 2.4.1 Bill Nottingham
` (2 preceding siblings ...)
2001-02-21 18:58 ` Chris McDermott
@ 2001-02-21 21:02 ` David Mosberger
2001-02-23 15:19 ` Jun Nakajima
` (13 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: David Mosberger @ 2001-02-21 21:02 UTC (permalink / raw)
To: linux-ia64
>>>>> On Wed, 21 Feb 2001 10:58:22 -0800, "Chris McDermott" <mcdermoc@us.ibm.com> said:
Chris> Anyway, I have ITPs connected to the IBM hardware and have
Chris> noticed that when the lockup occurs, and we lose video, at
Chris> least one of the CPUs is executing in flush_tlb_no_ptcg() or
Chris> handle_IPI(), in the 'do' loop where TLB entries are being
Chris> purged. What I have observed is that the end address and the
Chris> start address are in completely different regions. Usually,
Chris> the start address is in region register 1 (address of
Chris> 0x2000XXXXXXXXXXXX) and the end address is in region register
Chris> 3 (address of 0x6000XXXXXXXXXXXX). I don't know if this is
Chris> the same problem I am seeing on the Lion, but I plan to
Chris> connect an ITP and a serial console (although we haven't
Chris> been able to get one to work yet on the Lion with BIOS 71) to
Chris> see if the symptoms are the same.
This is good info, thanks a lot! flush_tlb_range() should *never* be
called with an address range that spans entire regions. So this is
clearly the immediate problem. The question is of course how it got
there.
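
To make the "spans entire regions" condition concrete: IA-64 picks one of the eight region registers from the top three bits (63:61) of a virtual address, so 0x2000XXXXXXXXXXXX falls in region 1 and 0x6000XXXXXXXXXXXX in region 3. The sketch below is only an illustration of a sanity check one could apply to a flush range. The helper names are hypothetical and not taken from the kernel, and treating `end` as exclusive is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* The region number lives in bits 63:61 of a 64-bit virtual address. */
#define REGION_SHIFT 61

/* Region number (0-7) that a virtual address falls in. */
static unsigned region_of(uint64_t vaddr)
{
    return (unsigned)(vaddr >> REGION_SHIFT);
}

/*
 * Hypothetical helper: returns 1 if the half-open range [start, end)
 * stays inside a single region, 0 if it crosses a region boundary.
 */
static int range_in_one_region(uint64_t start, uint64_t end)
{
    assert(start < end);
    return region_of(start) == region_of(end - 1);
}
```

A check like this at the top of the purge loop would turn the silent hang into an immediate, reportable failure while the root cause is being tracked down.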
I'm just a couple of hours away from hopping on a plane to
Switzerland, and won't be back till Tuesday and am currently busy
tracking down two Heisenbugs which are keeping me from releasing an
updated kernel diff. If someone investigates this some more in
the meantime, please keep us all posted.
--david
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Linux-ia64] Re: Lockups on 2.4.1
2001-02-21 16:05 [Linux-ia64] Re: Lockups on 2.4.1 Bill Nottingham
` (3 preceding siblings ...)
2001-02-21 21:02 ` David Mosberger
@ 2001-02-23 15:19 ` Jun Nakajima
2001-02-23 19:06 ` Seth, Rohit
` (12 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Jun Nakajima @ 2001-02-23 15:19 UTC (permalink / raw)
To: linux-ia64
This might explain why I was _not_ able to reproduce such a hang after
overnight stress tests on a 4-way Lion... My configuration has
CONFIG_ITANIUM_PTCG=y, and flush_tlb_no_ptcg() is never called by
flush_tlb_range() if it's defined.
Jack Steiner wrote:
>
> > > Anyway, I have ITPs connected to the IBM hardware and have noticed that
> > > when the lockup occurs, and we lose video, at least one of the CPUs is
> > > executing in flush_tlb_no_ptcg() or handle_IPI(), in the 'do' loop where
> > > TLB entries are being purged. What I have observed is that the end address
> > > and the start address are in completely different regions. Usually, the
> > > start address is in region register 1 (address of 0x2000XXXXXXXXXXXX) and
> > > the end address is in region register 3 (address of 0x6000XXXXXXXXXXXX).
> > > I don't know if this is the same problem I am seeing on the Lion, but I
> > > plan to connect an ITP and a serial console (although we haven't been able
> > > to get one to work yet on the Lion with BIOS 71) to see if the symptoms
> > > are the same.
> >
> > FWIW, we have seen EXACTLY the same hang running here on our system.
> > The start/end addresses for the purge cross region boundaries.
> >
> >
> > We are running a 2.4.0 kernel.
>
> I found a problem that was causing the lockup described above & I suspect this
> may be responsible for some of the other hangs various folks have seen.
>
> There is code in flush_tlb_no_ptcg() that resends the IPI if other
> cpus have not responded within a short time. If this code gets invoked, then
> it is possible for flush_cpu_count to get corrupted. When that happens, a cpu
> can be executing in handle_IPI() while flush_start/flush_end are changing.
> A cpu can pick up a non-matching flush_start/flush_end. This leads to hangs or
> lost TLB flushes.
>
> To verify that this could cause the hang, I changed the timeout in
> flush_tlb_no_ptcg() from 40000UL to 400UL. I hung before getting to multiuser mode
> with flush_start/flush_end in different regions.
>
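
The counter corruption described in the quoted text can be illustrated with a toy, single-threaded model. This is a sketch, not kernel code: the struct, the helper names, the 4-CPU setup and the particular interleaving are all assumptions chosen for illustration, since the mail does not spell out the exact sequence. It shows how resetting flush_cpu_count on timeout lets late and duplicate acknowledgements drive the counter to zero before every CPU has purged, while leaving the counter alone on a resend does not.

```c
#include <assert.h>

#define MAX_CPUS 4

/* Toy model of the flush handshake; none of this is kernel code. */
struct flush_model {
    int flush_cpu_count;      /* acknowledgements the initiator still expects */
    int purged[MAX_CPUS];     /* purged[i] is 1 once CPU i has purged its TLB */
};

/* CPU 0 starts a flush: it purges locally and expects MAX_CPUS - 1 acks. */
static void start_flush(struct flush_model *m)
{
    int i;
    for (i = 0; i < MAX_CPUS; i++)
        m->purged[i] = 0;
    m->purged[0] = 1;
    m->flush_cpu_count = MAX_CPUS - 1;
}

/* A remote CPU purges its TLB in the IPI handler... */
static void remote_purge(struct flush_model *m, int cpu)
{
    assert(cpu > 0 && cpu < MAX_CPUS);
    m->purged[cpu] = 1;
}

/* ...and then acknowledges by decrementing the shared counter. */
static void remote_ack(struct flush_model *m)
{
    m->flush_cpu_count--;
}

/* 1 if the initiator would stop waiting while some CPU has not purged. */
static int released_early(const struct flush_model *m)
{
    int i;
    if (m->flush_cpu_count > 0)
        return 0;
    for (i = 0; i < MAX_CPUS; i++)
        if (!m->purged[i])
            return 1;
    return 0;
}

/* Old timeout path: the counter is reset and the flush IPI is sent again. */
static int old_resend_interleaving(void)
{
    struct flush_model m;
    start_flush(&m);                      /* count = 3 */
    remote_purge(&m, 1);                  /* CPU 1 purges, its ack is delayed */
    m.flush_cpu_count = MAX_CPUS - 1;     /* timeout: reset to 3 and resend */
    remote_ack(&m);                       /* CPU 1's late ack: count = 2 */
    remote_purge(&m, 1); remote_ack(&m);  /* CPU 1, second IPI: count = 1 */
    remote_purge(&m, 2); remote_ack(&m);  /* CPU 2: count = 0, CPU 3 not done */
    return released_early(&m);
}

/* Patched timeout path: only a wake-up IPI is sent, the counter is untouched. */
static int patched_resend_interleaving(void)
{
    struct flush_model m;
    start_flush(&m);                      /* count = 3 */
    remote_purge(&m, 1);                  /* CPU 1 purges, its ack is delayed */
    /* timeout: wake-up IPI only, the counter stays at 3 */
    remote_ack(&m);                       /* CPU 1's late ack: count = 2 */
    remote_purge(&m, 2); remote_ack(&m);  /* CPU 2: count = 1 */
    return released_early(&m);            /* initiator still waits for CPU 3 */
}
```

In the first interleaving the initiator stops waiting although CPU 3 has not purged, which is the window in which flush_start/flush_end can change under a CPU still in handle_IPI().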
> Here is the patch I used. Note: this is against 2.4.0,
>
> --- linux-trillian/arch/ia64/kernel/smp.c Thu Feb 22 14:35:28 2001
> +++ linux/arch/ia64/kernel/smp.c Thu Feb 22 14:19:46 2001
> @@ -321,6 +321,16 @@
> {
> send_IPI_allbutself(IPI_FLUSH_TLB);
> }
> +
> +void
> +smp_resend_flush_tlb(void)
> +{
> + /*
> + * Really need a null IPI but since this rarely should happen &
> + * since this code will go away, lets not add one.
> + */
> + send_IPI_allbutself(IPI_RESCHEDULE);
> +}
> #endif /* !CONFIG_ITANIUM_PTCG */
>
> /*
> --- linux-trillian/arch/ia64/mm/tlb.c Thu Feb 22 14:35:28 2001
> +++ linux/arch/ia64/mm/tlb.c Thu Feb 22 14:19:50 2001
> @@ -59,6 +59,7 @@
> flush_tlb_no_ptcg (unsigned long start, unsigned long end, unsigned long nbits)
> {
> extern void smp_send_flush_tlb (void);
> + extern void smp_resend_flush_tlb (void);
> unsigned long saved_tpr = 0;
> unsigned long flags;
>
> @@ -101,9 +102,8 @@
> {
> unsigned long start = ia64_get_itc();
> while (atomic_read(&flush_cpu_count) > 0) {
> - if ((ia64_get_itc() - start) > 40000UL) {
> - atomic_set(&flush_cpu_count, smp_num_cpus - 1);
> - smp_send_flush_tlb();
> + if ((ia64_get_itc() - start) > 400UL) {
> + smp_resend_flush_tlb();
> start = ia64_get_itc();
> }
> }
>
> --
> Thanks
>
> Jack Steiner (651-683-5302) (vnet 233-5302) steiner@sgi.com
>
--
Jun U Nakajima
Core OS Development
SCO/Murray Hill, NJ
Email: jun@sco.com, Phone: 908-790-2352 Fax: 908-790-2426
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [Linux-ia64] Re: Lockups on 2.4.1
2001-02-21 16:05 [Linux-ia64] Re: Lockups on 2.4.1 Bill Nottingham
` (4 preceding siblings ...)
2001-02-23 15:19 ` Jun Nakajima
@ 2001-02-23 19:06 ` Seth, Rohit
2001-02-23 19:20 ` Michael Madore
` (11 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Seth, Rohit @ 2001-02-23 19:06 UTC (permalink / raw)
To: linux-ia64
Just want to add that if you are running on B3 then you should have
CONFIG_ITANIUM_PTCG=y and CONFIG_DISABLE_VHPT not set.
-----Original Message-----
From: Jun Nakajima [mailto:jun@sco.com]
Sent: Friday, February 23, 2001 7:20 AM
To: Jack Steiner
Cc: linux-ia64@linuxia64.org
Subject: Re: [Linux-ia64] Re: Lockups on 2.4.1
This might explain why I was _not_ able to reproduce such a hang after
overnight stress tests on a 4-way Lion... My configuration has
CONFIG_ITANIUM_PTCG=y, and flush_tlb_no_ptcg() is never called by
flush_tlb_range() if it's defined.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Linux-ia64] Re: Lockups on 2.4.1
2001-02-21 16:05 [Linux-ia64] Re: Lockups on 2.4.1 Bill Nottingham
` (5 preceding siblings ...)
2001-02-23 19:06 ` Seth, Rohit
@ 2001-02-23 19:20 ` Michael Madore
2001-02-23 19:48 ` Seth, Rohit
` (10 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Michael Madore @ 2001-02-23 19:20 UTC (permalink / raw)
To: linux-ia64
What is the recommendation for earlier B steppings?
On Fri, Feb 23, 2001 at 11:06:53AM -0800, Seth, Rohit wrote:
> Just want to add that if you are running on B3 then you should have
> CONFIG_ITANIUM_PTCG=y and CONFIG_DISABLE_VHPT not set.
>
> -----Original Message-----
> From: Jun Nakajima [mailto:jun@sco.com]
> Sent: Friday, February 23, 2001 7:20 AM
> To: Jack Steiner
> Cc: linux-ia64@linuxia64.org
> Subject: Re: [Linux-ia64] Re: Lockups on 2.4.1
>
>
> This might explain why I was _not_ able to reproduce such a hang after
> overnight stress tests on a 4-way Lion... My configuration has
> CONFIG_ITANIUM_PTCG=y, and flush_tlb_no_ptcg() is never called by
> flush_tlb_range() if it's defined.
>
>
--
Mike Madore
Software Engineer
TurboLinux, Inc.
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [Linux-ia64] Re: Lockups on 2.4.1
2001-02-21 16:05 [Linux-ia64] Re: Lockups on 2.4.1 Bill Nottingham
` (6 preceding siblings ...)
2001-02-23 19:20 ` Michael Madore
@ 2001-02-23 19:48 ` Seth, Rohit
2001-02-23 20:00 ` Jesse Barnes
` (9 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Seth, Rohit @ 2001-02-23 19:48 UTC (permalink / raw)
To: linux-ia64
For stability on earlier steppings, CONFIG_ITANIUM_PTCG should not be set, and
CONFIG_DISABLE_VHPT=y, CONFIG_ITANIUM_BSTEP_SPECIFIC=y, and
CONFIG_ITANIUM_B[012]_SPECIFIC=y should be set.
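
The recommendations in this thread can be summarized as a small lookup. This is only a sketch of the advice given here (B3: PTCG on, VHPT not disabled; B0 to B2: PTCG off, VHPT disabled). The enum, struct and function names are made up for illustration, and nothing is implied about steppings the thread does not mention.

```c
#include <assert.h>
#include <stdbool.h>

/* Steppings discussed in the thread; names are illustrative only. */
enum itanium_stepping { STEP_B0, STEP_B1, STEP_B2, STEP_B3 };

/* The two options the thread gives advice for on every stepping. */
struct ia64_tlb_config {
    bool itanium_ptcg;   /* CONFIG_ITANIUM_PTCG */
    bool disable_vhpt;   /* CONFIG_DISABLE_VHPT */
};

/*
 * Hypothetical helper encoding the advice above. For B0-B2 the thread
 * additionally asks for CONFIG_ITANIUM_BSTEP_SPECIFIC=y and
 * CONFIG_ITANIUM_B[012]_SPECIFIC=y; that is not modelled here.
 */
static struct ia64_tlb_config recommended_config(enum itanium_stepping s)
{
    struct ia64_tlb_config c;
    bool early;

    assert((int)s >= STEP_B0 && (int)s <= STEP_B3);
    early = (s != STEP_B3);
    c.itanium_ptcg = !early;   /* ptc.g only on B3 */
    c.disable_vhpt = early;    /* VHPT disabled on B0-B2 */
    return c;
}
```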
-----Original Message-----
From: Michael Madore [mailto:mmadore@turbolinux.com]
Sent: Friday, February 23, 2001 11:20 AM
To: Seth, Rohit
Cc: linux-ia64@linuxia64.org
Subject: Re: [Linux-ia64] Re: Lockups on 2.4.1
What is the recommendation for earlier B steppings?
On Fri, Feb 23, 2001 at 11:06:53AM -0800, Seth, Rohit wrote:
> Just want to add that if you are running on B3 then you should have
> CONFIG_ITANIUM_PTCG=y and CONFIG_DISABLE_VHPT not set.
>
> -----Original Message-----
> From: Jun Nakajima [mailto:jun@sco.com]
> Sent: Friday, February 23, 2001 7:20 AM
> To: Jack Steiner
> Cc: linux-ia64@linuxia64.org
> Subject: Re: [Linux-ia64] Re: Lockups on 2.4.1
>
>
> This might explain why I was _not_ able to reproduce such a hang after
> overnight stress tests on a 4-way Lion... My configuration has
> CONFIG_ITANIUM_PTCG=y, and flush_tlb_no_ptcg() is never called by
> flush_tlb_range() if it's defined.
>
>
--
Mike Madore
Software Engineer
TurboLinux, Inc.
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [Linux-ia64] Re: Lockups on 2.4.1
2001-02-21 16:05 [Linux-ia64] Re: Lockups on 2.4.1 Bill Nottingham
` (7 preceding siblings ...)
2001-02-23 19:48 ` Seth, Rohit
@ 2001-02-23 20:00 ` Jesse Barnes
2001-02-24 13:39 ` Francis Galiegue
` (8 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Jesse Barnes @ 2001-02-23 20:00 UTC (permalink / raw)
To: linux-ia64
This seems like the sort of info that should be in Configure.help. Anyone
interested in posting a patch to add it?
Jesse
For stability on earlier steppings, CONFIG_ITANIUM_PTCG should not be set, and
CONFIG_DISABLE_VHPT=y, CONFIG_ITANIUM_BSTEP_SPECIFIC=y, and
CONFIG_ITANIUM_B[012]_SPECIFIC=y should be set.
-----Original Message-----
From: Michael Madore [mailto:mmadore@turbolinux.com]
Sent: Friday, February 23, 2001 11:20 AM
To: Seth, Rohit
Cc: linux-ia64@linuxia64.org
Subject: Re: [Linux-ia64] Re: Lockups on 2.4.1
What is the recommendation for earlier B steppings?
On Fri, Feb 23, 2001 at 11:06:53AM -0800, Seth, Rohit wrote:
> Just want to add that if you are running on B3 then you should have
> CONFIG_ITANIUM_PTCG=y and CONFIG_DISABLE_VHPT not set.
>
> -----Original Message-----
> From: Jun Nakajima [mailto:jun@sco.com]
> Sent: Friday, February 23, 2001 7:20 AM
> To: Jack Steiner
> Cc: linux-ia64@linuxia64.org
> Subject: Re: [Linux-ia64] Re: Lockups on 2.4.1
>
>
> This might explain why I was _not_ able to reproduce such a hang after
> overnight stress tests on a 4-way Lion... My configuration has
> CONFIG_ITANIUM_PTCG=y, and flush_tlb_no_ptcg() is never called by
> flush_tlb_range() if it's defined.
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [Linux-ia64] Re: Lockups on 2.4.1
2001-02-21 16:05 [Linux-ia64] Re: Lockups on 2.4.1 Bill Nottingham
` (8 preceding siblings ...)
2001-02-23 20:00 ` Jesse Barnes
@ 2001-02-24 13:39 ` Francis Galiegue
2001-02-24 14:44 ` Francis Galiegue
` (7 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Francis Galiegue @ 2001-02-24 13:39 UTC (permalink / raw)
To: linux-ia64
[-- Attachment #1: Type: TEXT/PLAIN, Size: 597 bytes --]
On Fri, 23 Feb 2001, Jesse Barnes wrote:
>
> This seems like the sort of info that should be in Configure.help. Anyone
> interested in posting a patch to add it?
>
As attachment. This is over 2.4.1 + 010131 ia64 patch. I also added help for
B1/B2 specific stuff. Please verify that it is correct...
--
Francis Galiegue, fg@mandrakesoft.com - Normand et fier de l'être
"Programming is a race between programmers, who try and make more and more
idiot-proof software, and universe, which produces more and more remarkable
idiots. Until now, universe leads the race" -- R. Cook
[-- Attachment #2: Type: TEXT/PLAIN, Size: 3499 bytes --]
diff -urN linux-old/Documentation/Configure.help linux/Documentation/Configure.help
--- linux-old/Documentation/Configure.help Sat Feb 24 08:29:45 2001
+++ linux/Documentation/Configure.help Sat Feb 24 09:24:15 2001
@@ -17119,17 +17119,36 @@
with a B0-step CPU. You have a B0-step CPU if the "revision" field in
/proc/cpuinfo is 1.
+Enable Itanium B1-step specific code
+CONFIG_ITANIUM_B1_SPECIFIC
+ Select this option to bild a kernel for an Itanium prototype system
+ with a B1-step CPU. You have a B0-step CPU if the "revision" field in
+ /proc/cpuinfo is 2.
+
+Enable Itanium B2-step specific code
+CONFIG_ITANIUM_B2_SPECIFIC
+ Select this option to bild a kernel for an Itanium prototype system
+ with a B2-step CPU. You have a B0-step CPU if the "revision" field in
+ /proc/cpuinfo is 3.
+
Force interrupt redirection
CONFIG_IA64_HAVE_IRQREDIR
Select this option if you know that your system has the ability to
redirect interrupts to different CPUs. Select N here if you're
unsure.
-Enable use of global TLB purge instruction (ptc.g)
+Enable use of global TLB purge instruction (ptc.g) (READ HELP!)
CONFIG_ITANIUM_PTCG
- Say Y here if you want the kernel to use the IA-64 "ptc.g"
- instruction to flush the TLB on all CPUs. Select N here if
- you're unsure.
+ Saying Y here will allow the kernel to use the IA-64 "ptc.g"
+ instruction to flush the TLB on all CPUs.
+
+ Say N here if the kernel will run on early B step CPUs (B0, B1 and B2). You
+ have such a CPU (or CPUs) if the revision field(s) in /proc/cpuinfo range(s)
+ from 1 to 3. In this case, you should also say Y to "Disable VHPT" (this is
+ in "Kernel hacking" section).
+
+ If you have a more recent CPU, however, you will have to do the opposite: say
+ Y here but N to "Disable VHPT".
Enable SoftSDV hacks
CONFIG_IA64_SOFTSDV_HACKS
@@ -17168,6 +17187,16 @@
To use this option, you have to check that the "/proc file system
support" (CONFIG_PROC_FS) is enabled, too.
+
+Disable VHPT (READ HELP!)
+CONFIG_DISABLE_VHPT
+ Say Y here if the kernel will run on early B step CPUs (B0, B1 and B2). You
+ have such a CPU (or CPUs) if the revision field(s) in /proc/cpuinfo range(s)
+ from 1 to 3. In this case, you should also say N to "Enable use of global TLB
+ purge instruction" (CONFIG_ITANIUM_PTCG) in the "General Setup" section.
+
+ If your CPU(s) is/are more recent, however, you will preferably do the
+ opposite: say N here, but say Y to CONFIG_ITANIUM_PTCG.
#
# A couple of things I keep forgetting:
diff -urN linux-old/arch/ia64/config.in linux/arch/ia64/config.in
--- linux-old/arch/ia64/config.in Sat Feb 24 08:29:45 2001
+++ linux/arch/ia64/config.in Sat Feb 24 09:26:27 2001
@@ -58,7 +58,7 @@
if [ "$CONFIG_ITANIUM_CSTEP_SPECIFIC" = "y" ]; then
bool ' Enable Itanium C0-step specific code' CONFIG_ITANIUM_C0_SPECIFIC
fi
- bool ' Enable use of global TLB purge instruction (ptc.g)' CONFIG_ITANIUM_PTCG
+ bool ' Enable use of global TLB purge instruction (ptc.g) (READ HELP!)' CONFIG_ITANIUM_PTCG
fi
if [ "$CONFIG_IA64_DIG" = "y" ]; then
@@ -259,6 +259,6 @@
bool 'Turn on irq debug checks (slow!)' CONFIG_IA64_DEBUG_IRQ
bool 'Print possible IA64 hazards to console' CONFIG_IA64_PRINT_HAZARDS
bool 'Enable new unwind support' CONFIG_IA64_NEW_UNWIND
-bool 'Disable VHPT' CONFIG_DISABLE_VHPT
+bool 'Disable VHPT (READ HELP!)' CONFIG_DISABLE_VHPT
endmenu
^ permalink raw reply [flat|nested] 19+ messages in thread
* RE: [Linux-ia64] Re: Lockups on 2.4.1
2001-02-21 16:05 [Linux-ia64] Re: Lockups on 2.4.1 Bill Nottingham
` (9 preceding siblings ...)
2001-02-24 13:39 ` Francis Galiegue
@ 2001-02-24 14:44 ` Francis Galiegue
2001-02-24 18:45 ` Michael Madore
` (6 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Francis Galiegue @ 2001-02-24 14:44 UTC (permalink / raw)
To: linux-ia64
On Sat, 24 Feb 2001, Francis Galiegue wrote:
>
> As attachment. This is over 2.4.1 + 010131 ia64 patch. I also added help for
> B1/B2 specific stuff. Please verify that it is correct...
>
Speaking of help, BTW...
1. what does VHPT mean anyway?
2. What is this new unwind stuff?
--
Francis Galiegue, fg@mandrakesoft.com - Normand et fier de l'être
"Programming is a race between programmers, who try and make more and more
idiot-proof software, and universe, which produces more and more remarkable
idiots. Until now, universe leads the race" -- R. Cook
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Linux-ia64] Re: Lockups on 2.4.1
2001-02-21 16:05 [Linux-ia64] Re: Lockups on 2.4.1 Bill Nottingham
` (10 preceding siblings ...)
2001-02-24 14:44 ` Francis Galiegue
@ 2001-02-24 18:45 ` Michael Madore
2001-02-24 23:18 ` Joseph V Moss
` (5 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Michael Madore @ 2001-02-24 18:45 UTC (permalink / raw)
To: linux-ia64
Hi,
On Sat, Feb 24, 2001 at 03:44:07PM +0100, Francis Galiegue wrote:
> Speaking of help, BTW...
>
> 1. what does VHPT mean anyway?
VHPT is an acronym for Virtual Hash Page Table. It is an in memory
extension to the processor's translation lookaside buffer. On a TLB miss,
the CPU can optionally search the VHPT for a translation. I believe in
Linux, the short-format is used. For details, refer to:
Intel IA-64 Architecture
Software Developers Manual
Volume 2: IA-64 System Architecture
Section 4.1.5: Virtual Hash Page Table (VHPT)
--
Mike Madore
Software Engineer
TurboLinux, Inc.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Linux-ia64] Re: Lockups on 2.4.1
2001-02-21 16:05 [Linux-ia64] Re: Lockups on 2.4.1 Bill Nottingham
` (11 preceding siblings ...)
2001-02-24 18:45 ` Michael Madore
@ 2001-02-24 23:18 ` Joseph V Moss
2001-02-25 2:43 ` Francis Galiegue
` (4 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Joseph V Moss @ 2001-02-24 23:18 UTC (permalink / raw)
To: linux-ia64
> On Fri, 23 Feb 2001, Jesse Barnes wrote:
>
> >
> > This seems like the sort of info that should be in Configure.help. Anyone
> > interested in posting a patch to add it?
> >
>
> As attachment. This is over 2.4.1 + 010131 ia64 patch. I also added help for
> B1/B2 specific stuff. Please verify that it is correct...
You've got a small cut'n'paste error there:
> +Enable Itanium B1-step specific code
> +CONFIG_ITANIUM_B1_SPECIFIC
> + Select this option to bild a kernel for an Itanium prototype system
> + with a B1-step CPU. You have a B0-step CPU if the "revision" field in
^^
> + /proc/cpuinfo is 2.
> +
> +Enable Itanium B2-step specific code
> +CONFIG_ITANIUM_B2_SPECIFIC
> + Select this option to bild a kernel for an Itanium prototype system
> + with a B2-step CPU. You have a B0-step CPU if the "revision" field in
^^
> + /proc/cpuinfo is 3.
> +
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Linux-ia64] Re: Lockups on 2.4.1
2001-02-21 16:05 [Linux-ia64] Re: Lockups on 2.4.1 Bill Nottingham
` (12 preceding siblings ...)
2001-02-24 23:18 ` Joseph V Moss
@ 2001-02-25 2:43 ` Francis Galiegue
2001-02-26 20:52 ` Jim Wilson
` (3 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Francis Galiegue @ 2001-02-25 2:43 UTC (permalink / raw)
To: linux-ia64
[-- Attachment #1: Type: TEXT/PLAIN, Size: 574 bytes --]
On Sat, 24 Feb 2001, Joseph V Moss wrote:
>
> You've got a small cut'n'paste error there:
>
[...]
> > + with a B1-step CPU. You have a B0-step CPU if the "revision" field in
> ^^
Oops! Corrected version attached. Thanks for noticing.
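
For anyone who wants to script the check the help text describes, here is a minimal sketch that pulls the "revision" field out of /proc/cpuinfo-style text and maps it to an early B stepping (1 is B0, 2 is B1, 3 is B2, as in the help text). The function names are hypothetical, and the exact "revision : N" line layout is an assumption about the ia64 cpuinfo format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns the value of the first "revision" field found at the start of
 * a line in cpuinfo-style text, or -1 if there is none.
 */
static int cpuinfo_revision(const char *text)
{
    const char *line = text;

    assert(text != NULL);
    while (line && *line) {
        int rev;
        /* Whitespace in the format matches any run of blanks or tabs. */
        if (sscanf(line, "revision : %d", &rev) == 1)
            return rev;
        line = strchr(line, '\n');
        if (line)
            line++;            /* step past the newline to the next line */
    }
    return -1;
}

/* Maps a revision to the stepping named in the help text, else NULL. */
static const char *early_b_stepping(int revision)
{
    switch (revision) {
    case 1: return "B0";
    case 2: return "B1";
    case 3: return "B2";
    default: return NULL;      /* not an early B stepping per the help text */
    }
}
```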
--
Francis Galiegue, fg@mandrakesoft.com - Normand et fier de l'être
"Programming is a race between programmers, who try and make more and more
idiot-proof software, and universe, which produces more and more remarkable
idiots. Until now, universe leads the race" -- R. Cook
[-- Attachment #2: Type: TEXT/PLAIN, Size: 3499 bytes --]
diff -urN linux-old/Documentation/Configure.help linux/Documentation/Configure.help
--- linux-old/Documentation/Configure.help Sat Feb 24 08:29:45 2001
+++ linux/Documentation/Configure.help Sat Feb 24 09:24:15 2001
@@ -17119,17 +17119,36 @@
with a B0-step CPU. You have a B0-step CPU if the "revision" field in
/proc/cpuinfo is 1.
+Enable Itanium B1-step specific code
+CONFIG_ITANIUM_B1_SPECIFIC
+ Select this option to build a kernel for an Itanium prototype system
+ with a B1-step CPU. You have a B1-step CPU if the "revision" field in
+ /proc/cpuinfo is 2.
+
+Enable Itanium B2-step specific code
+CONFIG_ITANIUM_B2_SPECIFIC
+ Select this option to build a kernel for an Itanium prototype system
+ with a B2-step CPU. You have a B2-step CPU if the "revision" field in
+ /proc/cpuinfo is 3.
+
Force interrupt redirection
CONFIG_IA64_HAVE_IRQREDIR
Select this option if you know that your system has the ability to
redirect interrupts to different CPUs. Select N here if you're
unsure.
-Enable use of global TLB purge instruction (ptc.g)
+Enable use of global TLB purge instruction (ptc.g) (READ HELP!)
CONFIG_ITANIUM_PTCG
- Say Y here if you want the kernel to use the IA-64 "ptc.g"
- instruction to flush the TLB on all CPUs. Select N here if
- you're unsure.
+ Saying Y here will allow the kernel to use the IA-64 "ptc.g"
+ instruction to flush the TLB on all CPUs.
+
+ Say N here if the kernel will run on early B step CPUs (B0, B1 and B2). You
+ have such a CPU (or CPUs) if the revision field(s) in /proc/cpuinfo range(s)
+ from 1 to 3. In this case, you should also say Y to "Disable VHPT" (this is
+ in "Kernel hacking" section).
+
+ If you have a more recent CPU, however, you will have to do the opposite: say
+ Y here but N to "Disable VHPT".
Enable SoftSDV hacks
CONFIG_IA64_SOFTSDV_HACKS
@@ -17168,6 +17187,16 @@
To use this option, you have to check that the "/proc file system
support" (CONFIG_PROC_FS) is enabled, too.
+
+Disable VHPT (READ HELP!)
+CONFIG_DISABLE_VHPT
+ Say Y here if the kernel will run on early B step CPUs (B0, B1 and B2). You
+ have such a CPU (or CPUs) if the revision field(s) in /proc/cpuinfo range(s)
+ from 1 to 3. In this case, you should also say N to "Enable use of global TLB
+ purge instruction" (CONFIG_ITANIUM_PTCG) in the "General Setup" section.
+
+ If your CPU(s) is/are more recent, however, you will preferably do the
+ opposite: say N here, but say Y to CONFIG_ITANIUM_PTCG.
#
# A couple of things I keep forgetting:
diff -urN linux-old/arch/ia64/config.in linux/arch/ia64/config.in
--- linux-old/arch/ia64/config.in Sat Feb 24 08:29:45 2001
+++ linux/arch/ia64/config.in Sat Feb 24 09:26:27 2001
@@ -58,7 +58,7 @@
if [ "$CONFIG_ITANIUM_CSTEP_SPECIFIC" = "y" ]; then
bool ' Enable Itanium C0-step specific code' CONFIG_ITANIUM_C0_SPECIFIC
fi
- bool ' Enable use of global TLB purge instruction (ptc.g)' CONFIG_ITANIUM_PTCG
+ bool ' Enable use of global TLB purge instruction (ptc.g) (READ HELP!)' CONFIG_ITANIUM_PTCG
fi
if [ "$CONFIG_IA64_DIG" = "y" ]; then
@@ -259,6 +259,6 @@
bool 'Turn on irq debug checks (slow!)' CONFIG_IA64_DEBUG_IRQ
bool 'Print possible IA64 hazards to console' CONFIG_IA64_PRINT_HAZARDS
bool 'Enable new unwind support' CONFIG_IA64_NEW_UNWIND
-bool 'Disable VHPT' CONFIG_DISABLE_VHPT
+bool 'Disable VHPT (READ HELP!)' CONFIG_DISABLE_VHPT
endmenu
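
The rule in the help text above can be sketched in a few lines of C++. This is only an illustration of the stated pairing (revision 1 to 3 means B0 to B2: say N to CONFIG_ITANIUM_PTCG and Y to CONFIG_DISABLE_VHPT; otherwise the opposite). The names TlbConfig, is_early_b_step, recommend and any_early_b_step are hypothetical, the "revision : N" line layout is an assumption about /proc/cpuinfo, and treating every revision outside 1..3 as "more recent" is also an assumption.

```cpp
#include <sstream>
#include <string>

// Hypothetical pair of settings discussed in the help text.
struct TlbConfig {
    bool itanium_ptcg;  // CONFIG_ITANIUM_PTCG
    bool disable_vhpt;  // CONFIG_DISABLE_VHPT
};

// Per the help text, revisions 1, 2 and 3 are the B0, B1 and B2 steppings.
inline bool is_early_b_step(int revision) {
    return revision >= 1 && revision <= 3;
}

// Early B step: PTCG off, VHPT disabled. Anything else is assumed
// to be a more recent CPU and gets the opposite settings.
inline TlbConfig recommend(int revision) {
    const bool early = is_early_b_step(revision);
    return TlbConfig{!early, early};
}

// Scan cpuinfo-style text (assumed "revision : N" lines) and report
// whether any listed CPU is an early B step.
inline bool any_early_b_step(const std::string& cpuinfo) {
    std::istringstream in(cpuinfo);
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, 8, "revision") != 0) continue;
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) continue;
        std::istringstream value(line.substr(colon + 1));
        int revision = 0;
        if (value >> revision && is_early_b_step(revision)) return true;
    }
    return false;
}
```

On a mixed system the conservative choice would be to apply the early B step settings as soon as any_early_b_step() reports true, since the help text speaks of the kernel running on such CPUs at all.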
* Re: [Linux-ia64] Re: Lockups on 2.4.1
2001-02-21 16:05 [Linux-ia64] Re: Lockups on 2.4.1 Bill Nottingham
` (13 preceding siblings ...)
2001-02-25 2:43 ` Francis Galiegue
@ 2001-02-26 20:52 ` Jim Wilson
2001-03-07 3:51 ` [linux-ia64] " Tom King
` (2 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: Jim Wilson @ 2001-02-26 20:52 UTC (permalink / raw)
To: linux-ia64
>2. What is this new unwind stuff?
You probably want to know in the context of the kernel. I can't answer that.
I can answer in the context of C++. The unwind stuff is used for exception
handling. When a program throws, we use the unwind stuff to unwind stack
frames back to where the exception handler (catch clause) is. The unwind
data is similar in spirit to the DWARF2 frame info that is currently used by
most all other gcc targets. It is basically the same data just encoded
differently.
The format of the data is documented in the Software Conventions and Runtime
Architecture manual, section 11. The library interface for C++ EH is
documented in the psABI, section 6. This hasn't been implemented yet, we are
working on it right now.
Jim
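
To make the description above concrete, here is a minimal sketch of what unwinding looks like from the C++ side. It says nothing about the IA-64 unwind tables themselves; it only shows the observable effect, namely that each stack frame between the throw and the catch clause is torn down, innermost first. FrameMarker, Failure, descend and run_unwind_demo are hypothetical names used for illustration only.

```cpp
#include <string>
#include <vector>

// Records the order in which frames are torn down during unwinding.
static std::vector<std::string> unwind_log;

// Hypothetical marker object: its destructor runs when its frame is unwound.
struct FrameMarker {
    std::string name;
    explicit FrameMarker(const std::string& n) : name(n) {}
    ~FrameMarker() { unwind_log.push_back(name); }
};

// Hypothetical exception type carrying the depth at which it was thrown.
struct Failure {
    int depth;
};

// Recurse down to depth 0, then throw; every frame holds one marker.
void descend(int depth) {
    FrameMarker marker("frame " + std::to_string(depth));
    if (depth == 0) throw Failure{depth};
    descend(depth - 1);
}

// Returns true if the exception reached the catch clause.
bool run_unwind_demo(int depth) {
    unwind_log.clear();
    try {
        descend(depth);
    } catch (const Failure&) {
        return true;
    }
    return false;
}
```

After run_unwind_demo(2), unwind_log holds "frame 0", "frame 1", "frame 2" in that order: the runtime used the unwind data to walk back through each frame, running its cleanup, until it found the handler.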
* Re: [linux-ia64] re: Lockups on 2.4.1
2001-02-21 16:05 [Linux-ia64] Re: Lockups on 2.4.1 Bill Nottingham
` (14 preceding siblings ...)
2001-02-26 20:52 ` Jim Wilson
@ 2001-03-07 3:51 ` Tom King
2001-03-07 20:34 ` Jim Wilson
2001-03-08 3:45 ` Tom King
17 siblings, 0 replies; 19+ messages in thread
From: Tom King @ 2001-03-07 3:51 UTC (permalink / raw)
To: linux-ia64
>>2. What is this new unwind stuff?
> You probably want to know in the context of the kernel. I can't answer that.
> I can answer in the context of C++. The unwind stuff is used for exception
> handling. When a program throws, we use the unwind stuff to unwind stack
> frames back to where the exception handler (catch clause) is. The unwind
> data is similar in spirit to the DWARF2 frame info that is currently used by
> most all other gcc targets. It is basically the same data just encoded
> differently.
> The format of the data is documented in the Software Conventions and Runtime
> Architecture manual, section 11. The library interface for C++ EH is
> documented in the psABI, section 6. This hasn't been implemented yet, we are
> working on it right now.
> Jim
This is really a question about throw/catch on g++. I wonder if your
description of NYI is the reason for the failure?
I am running gnupro-1117-6 (gcc version 2.96-ia64-000717 snap 001117)
on Redhat 7.0 fisher (kernel 2.4.0-0.99.11smp) on a lion processor
the following program runs o.k. with ./a.out 5 - but fails with ./a.out 6
// throws.h
#include <iostream>
#include <stdlib.h>
class exc {
private :
int level;
public :
void setLevel(int il) { level = il; cout << "set exc " << level << "\n"; }
int getLevel() { return level; }
void here(){ cout << "exception" ; }
};
void doit(int, int);
void doit2(int, int);
//--------------------------------------------------------------------------
//throws.cpp
#include "throws.h"
int main(int argc, char ** argv, char **envp)
{
int count = atoi(*(argv+1));
try {
doit(count, count);
}
catch(exc e) {
cout << "doit at top level " << e.getLevel() << "\n";
}
}
void doit(int start, int level) {
if (level == 0) {
cout << "o.k.\n" ;
exc e;
e.setLevel(start);
throw(e);
}
else {
doit(start, level - 1);
}
}
//--------------------------------------------------------------------------
bash% g++ -Wall -ggdb -fPIC -fexceptions -mb-step throws.cpp
bash% ./a.out 5
o.k.
set exc 5
doit at top level 5
bash% ./a.out 6
o.k.
set exc 6
Illegal instruction (core dumped)
Tom King
Kernel Engineer
Bullant Technology
DID +61-2-8925-1618
http://www.bullant.com
* Re: [linux-ia64] re: Lockups on 2.4.1
2001-02-21 16:05 [Linux-ia64] Re: Lockups on 2.4.1 Bill Nottingham
` (15 preceding siblings ...)
2001-03-07 3:51 ` [linux-ia64] " Tom King
@ 2001-03-07 20:34 ` Jim Wilson
2001-03-08 3:45 ` Tom King
17 siblings, 0 replies; 19+ messages in thread
From: Jim Wilson @ 2001-03-07 20:34 UTC (permalink / raw)
To: linux-ia64
>This is really a question about throw/catch on g++.
I am not a g++ expert. The best place to find g++ experts is on the FSF
gcc mailing lists. For instance, gcc-bugs@gcc.gnu.org.
>I wonder if your description of NYI is the reason for the failure?
I don't know what "NYI" means.
>I am running gnupro-1117-6 (gcc version 2.96-ia64-000717 snap 001117)
>on Redhat 7.0 fisher (kernel 2.4.0-0.99.11smp) on a lion processor
>the following program runs o.k. with ./a.out 5 - but fails with ./a.out 6
I reproduced the problem easily enough with the old compiler release.
Taking a quick stab with the debugger, I see that the gp value in r1 is wrong,
which is curious. I will have to look into this some more. I would guess
that there might be a problem with the backing store/stacked registers in the
gcc unwinder. That is one way to explain why it works with 5 nesting levels,
but not at 6 nesting levels.
A more interesting question is whether or not this testcase works with the
gcc-3 pre-release branch. Unfortunately, your testcase will not compile with
gcc-3. I get an error complaining that cout is undeclared. I don't know
enough about C++ to know how to fix this. This means I can't check to see
if the problem has already been fixed. Can you provide me with a testcase
that works with gcc-3?
Jim
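
For reference, a version of the testcase that should be acceptable to a compiler that enforces namespace std could look like the sketch below. The "cout is undeclared" error is most likely because the standard library names live in namespace std, so the sketch qualifies them explicitly. It is an assumption that this is the only change gcc-3 needs, and it has not been tried on IA-64. The run() wrapper is a hypothetical addition so the nesting depth can be varied from code instead of from the command line.

```cpp
#include <iostream>

// Same exception class as in the posted testcase, with std:: qualification.
class exc {
    int level;
public:
    exc() : level(0) {}
    void setLevel(int il) { level = il; std::cout << "set exc " << level << "\n"; }
    int getLevel() const { return level; }
};

// Recurse `level` frames deep, then throw an exc carrying `start`.
void doit(int start, int level) {
    if (level == 0) {
        std::cout << "o.k.\n";
        exc e;
        e.setLevel(start);
        throw e;
    }
    doit(start, level - 1);
}

// Hypothetical wrapper: returns the level carried by the caught
// exception, or -1 if no exception arrived.
int run(int count) {
    try {
        doit(count, count);
    } catch (const exc& e) {
        std::cout << "doit at top level " << e.getLevel() << "\n";
        return e.getLevel();
    }
    return -1;
}
```

Calling run(5) and run(6) from main corresponds to the two invocations reported in the thread; with a working unwinder both should return the depth they were given.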
* RE: [linux-ia64] re: Lockups on 2.4.1
2001-02-21 16:05 [Linux-ia64] Re: Lockups on 2.4.1 Bill Nottingham
` (16 preceding siblings ...)
2001-03-07 20:34 ` Jim Wilson
@ 2001-03-08 3:45 ` Tom King
17 siblings, 0 replies; 19+ messages in thread
From: Tom King @ 2001-03-08 3:45 UTC (permalink / raw)
To: linux-ia64
Just remove all the couts and see what happens ("cout is like printf").
The problem is that the exception handling does not work.
I copied the gcc 3 prerelease and will build it (I don't have all the source
yet).
>I don't know what "NYI" means.
>> The format of the data is documented in the Software Conventions and Runtime
>> Architecture manual, section 11. The library interface for C++ EH is
>> documented in the psABI, section 6. This hasn't been implemented yet, we are
>> working on it right now.
NYI means "not yet implemented". I assumed that this might be the reason
why it wasn't working.
// throws.h
#include <stdlib.h>
class exc {
private :
int level;
public :
void setLevel(int il) { level = il; }
int getLevel() { return level; }
};
void doit(int, int);
void doit2(int, int);
//--------------------------------------------------------------------------
//throws.cpp
#include "throws.h"
int main(int argc, char ** argv, char **envp)
{
int count = atoi(*(argv+1));
try {
doit(count, count);
}
catch(exc e) {
}
}
void doit(int start, int level) {
if (level == 0) {
exc e;
e.setLevel(start);
throw(e);
}
else {
doit(start, level - 1);
}
}
Thread overview: 19+ messages
2001-02-21 16:05 [Linux-ia64] Re: Lockups on 2.4.1 Bill Nottingham
2001-02-21 17:16 ` Gerrit Huizenga
2001-02-21 17:57 ` David Mosberger
2001-02-21 18:58 ` Chris McDermott
2001-02-21 21:02 ` David Mosberger
2001-02-23 15:19 ` Jun Nakajima
2001-02-23 19:06 ` Seth, Rohit
2001-02-23 19:20 ` Michael Madore
2001-02-23 19:48 ` Seth, Rohit
2001-02-23 20:00 ` Jesse Barnes
2001-02-24 13:39 ` Francis Galiegue
2001-02-24 14:44 ` Francis Galiegue
2001-02-24 18:45 ` Michael Madore
2001-02-24 23:18 ` Joseph V Moss
2001-02-25 2:43 ` Francis Galiegue
2001-02-26 20:52 ` Jim Wilson
2001-03-07 3:51 ` [linux-ia64] " Tom King
2001-03-07 20:34 ` Jim Wilson
2001-03-08 3:45 ` Tom King