* Re: [Linux-ia64] Hard "hangs" with 2.4.16
2001-12-10 20:46 [Linux-ia64] Hard "hangs" with 2.4.16 Chris McDermott
@ 2001-12-10 21:06 ` Jack Steiner
2001-12-10 21:19 ` Andreas Schwab
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Jack Steiner @ 2001-12-10 21:06 UTC (permalink / raw)
To: linux-ia64
>
> Anybody else been seeing hangs with 2.4.16+ia64-1128.diff?
>
> I'm running on 4x Lions, some with B3's and some with C0's.
> The BIOS revision is 83B. I have seen the hang on at least
> 4 of the systems. This is a hard hang, no kb interrupts, so
> kdb doesn't help. I don't have much to go on yet. The hangs
> occur in random places, sometimes at various places during
> boot and sometimes after the system has been up for a few
> minutes. I put an ITP (American Arium) on one of the failing
> systems and it didn't fail. Stayed up all night. I moved the
> ITP to another failing system and was able to catch 1
> occurrence of a "hang". I'm still trying to characterize this
> hang to determine whether this is the same thing I am seeing
> on the other systems or just a hardware (configuration) problem.
> What I saw from the ITP was that all of the processors had
> apparently reset: all of their IPs had returned to the reset
> vector, 0x80000000FFFFFF90 (I did have the ITP set to break on
> reset, but it didn't trigger). Not much else to go on, since I
> couldn't even dump any useful processor context.
We saw the same symptom and tracked it down to a bug in the
write_unlock() macro. The bug was a memory-ordering problem:
the "unlock" of the tasklist_lock could become globally visible
before the stores to the task-list link fields were visible.
The result was a closed loop in the tasklist links.
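To see why a closed loop shows up as a hang rather than a crash: as far as we recall, 2.4 walks over the task list (for example the for_each_task() macro) stop only when they arrive back at init_task, so a corrupted link that closes a loop excluding the head never lets the walk finish. The sketch below is a hypothetical user-space illustration, not kernel code. struct toy_task and walk_returns_to_head() are made-up names, the list is assumed to be circular with no NULL links, and the walk is bounded with Floyd's two-pointer cycle check so it can report the bad case instead of spinning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task structure: only the forward
 * link matters here. This is not the kernel's task_struct. */
struct toy_task {
    struct toy_task *next_task;
};

/* Returns true if following next_task from head gets back to head,
 * false if the links close a loop that never reaches head (the case
 * in which an unbounded "walk until we see head again" spins forever).
 * Assumes every next_task pointer is non-NULL, as in a circular list. */
static bool walk_returns_to_head(const struct toy_task *head)
{
    assert(head != NULL);

    const struct toy_task *slow = head->next_task;
    const struct toy_task *fast = head->next_task;

    /* Advance fast two steps per iteration, but never step past head. */
    while (fast != head && fast->next_task != head) {
        slow = slow->next_task;
        fast = fast->next_task->next_task;
        if (slow == fast)
            return false; /* the pointers met inside a loop without head */
    }
    return true;
}
```

For a ring head -> a -> b -> head it returns true; if b's link is corrupted to point back at a, it returns false.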
Try adding a barrier operation BEFORE the clear_bit.
OLD
#define write_unlock(x) ({clear_bit(31, (x)); mb();})
NEW
#define write_unlock(x) ({smp_mb__before_clear_bit(); clear_bit(31, (x));})
This has been fixed in a later version of the IA64 patch.
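The ordering requirement behind the fix can be sketched in user space with C11 atomics. This is a hypothetical toy lock, not the ia64 kernel code: toy_rwlock_t, toy_write_lock(), toy_write_unlock() and toy_insert_after() are made-up names, atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb__before_clear_bit(), and a relaxed atomic_fetch_and_explicit() stands in for clear_bit(31, ...). Only the writer side is modeled. The point is the placement: the full fence comes before the bit is cleared, so the link-field stores made under the lock are ordered ahead of the unlock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Bit 31 of the lock word marks a writer, as in the macro above. */
#define TOY_WRITER_BIT (1u << 31)

typedef struct {
    _Atomic unsigned int word;
} toy_rwlock_t;

struct toy_node {
    struct toy_node *next;
};

static void toy_write_lock(toy_rwlock_t *l)
{
    unsigned int expected = 0;
    /* Acquire ordering: accesses in the critical section cannot be
     * performed before the lock is taken. */
    while (!atomic_compare_exchange_weak_explicit(&l->word, &expected,
                                                  TOY_WRITER_BIT,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = 0; /* CAS failed (lock held or spurious); retry */
}

static void toy_write_unlock(toy_rwlock_t *l)
{
    /* The caller must hold the writer bit. */
    assert(atomic_load_explicit(&l->word, memory_order_relaxed) & TOY_WRITER_BIT);

    /* NEW ordering: full barrier first, then clear the writer bit.
     * The OLD macro cleared the bit and only then issued mb(), which
     * does nothing to keep earlier stores ahead of the clear. */
    atomic_thread_fence(memory_order_seq_cst);
    atomic_fetch_and_explicit(&l->word, ~TOY_WRITER_BIT, memory_order_relaxed);
}

/* Link n in after pos while holding the lock. The two plain stores to
 * the link fields must be visible before the unlock is. */
static void toy_insert_after(toy_rwlock_t *l, struct toy_node *pos,
                             struct toy_node *n)
{
    toy_write_lock(l);
    n->next = pos->next;
    pos->next = n;
    toy_write_unlock(l);
}
```

In the C11 model, a reader that takes the lock with acquire ordering and observes the cleared bit is then guaranteed to see both link stores; with the fence after the clear instead, that guarantee is lost.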
>
> Anyone else seeing this sort of odd behavior with 2.4.16 on
> Intel Lion SDVs?
>
>
> Thanks,
>
> Chris McDermott
>
>
> _______________________________________________
> Linux-IA64 mailing list
> Linux-IA64@linuxia64.org
> http://lists.linuxia64.org/lists/listinfo/linux-ia64
>
--
Thanks
Jack Steiner (651-683-5302) (vnet 233-5302) steiner@sgi.com
* Re: [Linux-ia64] Hard "hangs" with 2.4.16
From: Andreas Schwab @ 2001-12-10 21:19 UTC (permalink / raw)
To: linux-ia64
Jack Steiner <steiner@sgi.com> writes:
|> >
|> > Anybody else been seeing hangs with 2.4.16+ia64-1128.diff?
|> >
[...]
|> This has been fixed in a later version of the IA64 patch.
Which later version? 1128 is still the latest.
Andreas.
--
Andreas Schwab "And now for something
Andreas.Schwab@suse.de completely different."
SuSE Labs, SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
* Re: [Linux-ia64] Hard "hangs" with 2.4.16
From: Chris McDermott @ 2001-12-10 21:21 UTC (permalink / raw)
To: linux-ia64
One minor correction: 0x80000000FFFFFF90 is PALE_CHECK;
0x80000000FFFFFFB0 is PALE_RESET.
On Mon, Dec 10, 2001 at 12:46 -0800, Chris McDermott - Linux wrote:
> Anybody else been seeing hangs with 2.4.16+ia64-1128.diff?
>
> I'm running on 4x Lions, some with B3's and some with C0's.
> The BIOS revision is 83B. I have seen the hang on at least
> 4 of the systems. This is a hard hang, no kb interrupts, so
> kdb doesn't help. I don't have much to go on yet. The hangs
> occur in random places, sometimes at various places during
> boot and sometimes after the system has been up for a few
> minutes. I put an ITP (American Arium) on one of the failing
> systems and it didn't fail. Stayed up all night. I moved the
> ITP to another failing system and was able to catch 1
> occurrence of a "hang". I'm still trying to characterize this
> hang to determine whether this is the same thing I am seeing
> on the other systems or just a hardware (configuration) problem.
> What I saw from the ITP was that all of the processors had
> apparently reset: all of their IPs had returned to the reset
> vector, 0x80000000FFFFFF90 (I did have the ITP set to break on
> reset, but it didn't trigger). Not much else to go on, since I
> couldn't even dump any useful processor context.
>
> Anyone else seeing this sort of odd behavior with 2.4.16 on
> Intel Lion SDVs?
>
>
> Thanks,
>
> Chris McDermott
* Re: [Linux-ia64] Hard "hangs" with 2.4.16
From: Chris McDermott @ 2001-12-10 22:49 UTC (permalink / raw)
To: linux-ia64
Jack,
Looks like my problem is something else. This patch didn't
correct the problem.
Thanks,
Chris
On Mon, Dec 10, 2001 at 15:06 -0600, Jack Steiner wrote:
> >
> > Anybody else been seeing hangs with 2.4.16+ia64-1128.diff?
> >
> > I'm running on 4x Lions, some with B3's and some with C0's.
> > The BIOS revision is 83B. I have seen the hang on at least
> > 4 of the systems. This is a hard hang, no kb interrupts, so
> > kdb doesn't help. I don't have much to go on yet. The hangs
> > occur in random places, sometimes at various places during
> > boot and sometimes after the system has been up for a few
> > minutes. I put an ITP (American Arium) on one of the failing
> > systems and it didn't fail. Stayed up all night. I moved the
> > ITP to another failing system and was able to catch 1
> > occurrence of a "hang". I'm still trying to characterize this
> > hang to determine whether this is the same thing I am seeing
> > on the other systems or just a hardware (configuration) problem.
> > What I saw from the ITP was that all of the processors had
> > apparently reset: all of their IPs had returned to the reset
> > vector, 0x80000000FFFFFF90 (I did have the ITP set to break on
> > reset, but it didn't trigger). Not much else to go on, since I
> > couldn't even dump any useful processor context.
>
> We saw the same symptom & tracked it down to a bug in the
> write_unlock() macro. The result was a memory-ordering problem
> that caused the "unlock" of the tasklist_lock to be made
> globally visible before the "store's" to the link fields were
> visible. The result was a closed loop in the tasklist links.
>
>
> Try adding a barrier operation BEFORE the clear_bit.
>
> OLD
> #define write_unlock(x) ({clear_bit(31, (x)); mb();})
>
> NEW
> #define write_unlock(x) ({smp_mb__before_clear_bit(); clear_bit(31, (x));})
>
>
> This has been fixed in a later version of the IA64 patch.
>
>
>
> >
> > Anyone else seeing this sort of odd behavior with 2.4.16 on
> > Intel Lion SDVs?
> >
> >
> > Thanks,
> >
> > Chris McDermott
> >
>
>
> --
> Thanks
>
> Jack Steiner (651-683-5302) (vnet 233-5302) steiner@sgi.com
* Re: [Linux-ia64] Hard "hangs" with 2.4.16
From: David Mosberger @ 2001-12-11 0:00 UTC (permalink / raw)
To: linux-ia64
>>>>> On 10 Dec 2001 22:19:09 +0100, Andreas Schwab <schwab@suse.de> said:
Andreas> Jack Steiner <steiner@sgi.com> writes:
Andreas> |> >
Andreas> |> > Anybody else been seeing hangs with 2.4.16+ia64-1128.diff?
Andreas> |> >
Andreas> [...]
Andreas> |> This has been fixed in a later version of the IA64 patch.
Andreas> Which later version? 1128 is still the latest.
There is no later version. I'll see what happens this week. I hope
that either Linus or Marcelo will release a new version; if so, I'll
sync up to it. In any case, I will put out a new ia64 patch this
week because I'll be on vacation the week after.
--david
* Re: [Linux-ia64] Hard "hangs" with 2.4.16
From: David Mosberger @ 2001-12-11 0:01 UTC (permalink / raw)
To: linux-ia64
>>>>> On Mon, 10 Dec 2001 14:49:45 -0800, Chris McDermott <lcm@us.ibm.com> said:
Chris> Jack,
Chris> Looks like my problem is something else. This patch didn't
Chris> correct the problem.
My guess is you need newer firmware. Try 99 or whatever else is the
latest. (This firmware may still be hard to get; I didn't check
recently.)
--david
* Re: [Linux-ia64] Hard "hangs" with 2.4.16
From: Chris McDermott @ 2001-12-11 19:15 UTC (permalink / raw)
To: linux-ia64
David,
Thanks for the suggestion. The latest F/W I have is 97B. I upgraded to
this revision and it fixed the problem. It appears that anyone wishing
to run Linux kernels > 2.4.10 on Intel Lion SDVs must use an unsupported
BIOS. Intel, through their Merced SDV program, will apparently not
support a BIOS later than 83B. The later BIOS builds are reserved for
use on OEM platforms only. So, I would assume that if you do install
the unsupported BIOS and your SDV dies, you would not be able to get
support from Intel. Would anyone from Intel care to confirm this?
Thanks,
Chris
On Mon, Dec 10, 2001 at 16:01 -0800, David Mosberger wrote:
> >>>>> On Mon, 10 Dec 2001 14:49:45 -0800, Chris McDermott <lcm@us.ibm.com> said:
>
> Chris> Jack,
> Chris> Looks like my problem is something else. This patch didn't
> Chris> correct the problem.
>
> My guess is you need newer firmware. Try 99 or whatever else is the
> latest. (This firmware may still be hard to get; I didn't check
> recently.)
>
> --david