From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Seth, Rohit"
Date: Fri, 23 Feb 2001 19:06:53 +0000
Subject: RE: [Linux-ia64] Re: Lockups on 2.4.1
Message-Id:
List-Id:
References:
In-Reply-To:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-ia64@vger.kernel.org

Just want to add that if you are running on B3 then you should have
CONFIG_ITANIUM_PTCG=y and CONFIG_DISABLE_VHPT not set.

-----Original Message-----
From: Jun Nakajima [mailto:jun@sco.com]
Sent: Friday, February 23, 2001 7:20 AM
To: Jack Steiner
Cc: linux-ia64@linuxia64.org
Subject: Re: [Linux-ia64] Re: Lockups on 2.4.1

This might explain why I was _not_ able to reproduce such a hang after
overnight stress tests on a 4-way Lion... My configuration has
CONFIG_ITANIUM_PTCG=y, and flush_tlb_no_ptcg() is never called by
flush_tlb_range() if it's defined.

Jack Steiner wrote:
>
> > > Anyway, I have ITPs connected to the IBM hardware and have noticed that
> > > when the lockup occurs, and we lose video, at least one of the CPUs is
> > > executing in flush_tlb_no_ptcg() or handle_IPI(), in the 'do' loop where
> > > TLB entries are being purged. What I have observed is that the end
> > > address and the start address are in completely different regions.
> > > Usually, the start address is in region register 1 (address of
> > > 0x2000XXXXXXXXXXXX) and the end address is in region register 3
> > > (address of 0x6000XXXXXXXXXXXX). I don't know if this is the same
> > > problem I am seeing on the Lion, but I plan to connect an ITP and a
> > > serial console (although we haven't been able to get one to work yet
> > > on the Lion with BIOS 71) to see if the symptoms are the same.
> >
> > FWIW, we have seen EXACTLY the same hang running here on our system.
> > The start/end addresses for the purge cross region boundaries.
> >
> >
> > We are running a 2.4.0 kernel.
>
> I found a problem that was causing the lockup described above & I suspect this
> may be responsible for some of the other hangs various folks have seen.
>
> There is code in flush_tlb_no_ptcg() that resends the IPI if other
> cpus have not responded within a short time. If this code gets invoked, then
> it is possible for flush_cpu_count to get corrupted. When that happens, a cpu
> can be executing in handle_IPI() while flush_start/flush_end are changing.
> A cpu can pick up a non-matching flush_start/flush_end. This leads to hangs or
> lost TLB flushes.
>
> To verify that this could cause the hang, I changed the timeout in
> flush_tlb_no_ptcg() from 40000UL to 400UL. I hung before getting to multiuser mode
> with flush_start/flush_end in different regions.
>
> Here is the patch I used. Note: this is against 2.4.0,
>
> --- linux-trillian/arch/ia64/kernel/smp.c	Thu Feb 22 14:35:28 2001
> +++ linux/arch/ia64/kernel/smp.c	Thu Feb 22 14:19:46 2001
> @@ -321,6 +321,16 @@
>  {
>  	send_IPI_allbutself(IPI_FLUSH_TLB);
>  }
> +
> +void
> +smp_resend_flush_tlb(void)
> +{
> +	/*
> +	 * Really need a null IPI but since this rarely should happen &
> +	 * since this code will go away, lets not add one.
> +	 */
> +	send_IPI_allbutself(IPI_RESCHEDULE);
> +}
>  #endif /* !CONFIG_ITANIUM_PTCG */
>
>  /*
> --- linux-trillian/arch/ia64/mm/tlb.c	Thu Feb 22 14:35:28 2001
> +++ linux/arch/ia64/mm/tlb.c	Thu Feb 22 14:19:50 2001
> @@ -59,6 +59,7 @@
>  flush_tlb_no_ptcg (unsigned long start, unsigned long end, unsigned long nbits)
>  {
>  	extern void smp_send_flush_tlb (void);
> +	extern void smp_resend_flush_tlb (void);
>  	unsigned long saved_tpr = 0;
>  	unsigned long flags;
>
> @@ -101,9 +102,8 @@
>  	{
>  		unsigned long start = ia64_get_itc();
>  		while (atomic_read(&flush_cpu_count) > 0) {
> -			if ((ia64_get_itc() - start) > 40000UL) {
> -				atomic_set(&flush_cpu_count, smp_num_cpus - 1);
> -				smp_send_flush_tlb();
> +			if ((ia64_get_itc() - start) > 400UL) {
> +				smp_resend_flush_tlb();
>  				start = ia64_get_itc();
>  			}
>  		}
>
> --
> Thanks
>
> Jack Steiner (651-683-5302) (vnet 233-5302) steiner@sgi.com
>
> _______________________________________________
> Linux-IA64 mailing list
> Linux-IA64@linuxia64.org
> http://lists.linuxia64.org/lists/listinfo/linux-ia64

--
Jun U Nakajima
Core OS Development
SCO/Murray Hill, NJ
Email: jun@sco.com, Phone: 908-790-2352 Fax: 908-790-2426
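
For illustration only, here is a toy model of the counter problem Jack
describes. It is not kernel code: struct flush_model, ack(), final_count()
and the chosen interleaving (one slow cpu still purging when the timeout
fires, and every remote cpu answering the resent flush IPI) are all
assumptions made for this sketch. It only tries to show that resetting the
counter while an acknowledgement is still in flight lets more decrements
arrive than the counter expects, whereas a resend that leaves the counter
alone does not.

```c
#include <assert.h>

/* Hypothetical single-threaded model; these names do not come from the kernel. */
struct flush_model {
	int flush_cpu_count;	/* acknowledgements the initiator still waits for */
};

/* A remote cpu finishes its purge and acknowledges it. */
static void ack(struct flush_model *m)
{
	m->flush_cpu_count--;
}

/*
 * Replay one assumed interleaving with a single slow remote cpu.
 *
 * reset_on_resend != 0: pre-patch behaviour (counter reset, flush IPI resent).
 * reset_on_resend == 0: patched behaviour (counter untouched, wake-up IPI only).
 *
 * Returns the final counter value: 0 is consistent, a negative value
 * means the counter was acknowledged more often than expected.
 */
static int final_count(int remote_cpus, int reset_on_resend)
{
	struct flush_model m;
	int i;

	assert(remote_cpus >= 1);
	m.flush_cpu_count = remote_cpus;

	/* Every remote cpu but one acknowledges promptly. */
	for (i = 0; i < remote_cpus - 1; i++)
		ack(&m);

	/* The timeout fires while the slow cpu is still purging. */
	if (reset_on_resend)
		m.flush_cpu_count = remote_cpus;

	/* The slow cpu now acknowledges the original request. */
	ack(&m);

	/* Pre-patch (assumed): every remote cpu also answers the resent flush IPI. */
	if (reset_on_resend)
		for (i = 0; i < remote_cpus; i++)
			ack(&m);

	return m.flush_cpu_count;
}
```

With three remote cpus (a 4-way box), final_count(3, 1) returns -1 and
final_count(3, 0) returns 0. In the first case the counter also passes
through zero before the last acknowledgement arrives, which in this model
is the window where a cpu could still be in handle_IPI() after the
initiator has stopped waiting.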