From: David Mosberger <davidm@hpl.hp.com>
To: linux-ia64@vger.kernel.org
Subject: Re: Re: Re: [Linux-ia64] Re: Lockups on 2.4.1
Date: Wed, 28 Feb 2001 06:09:48 +0000
Message-ID: <marc-linux-ia64-105590693005219@msgid-missing>
In-Reply-To: <marc-linux-ia64-105590693005190@msgid-missing>
OK, this makes sense: our systems have ptc.g enabled, which explains
why we haven't seen this problem. I made the change of using
smp_resend_flush_tlb() but also increased the timeout by a factor of
10.
Thanks,
--david
>>>>> On Thu, 22 Feb 2001 14:48:03 -0600 (CST), Jack Steiner <steiner@sgi.com> said:
>> > Anyway, I have ITPs connected to the IBM hardware and have noticed that
>> > when the lockup occurs, and we lose video, at least one of the CPUs is
>> > executing in flush_tlb_no_ptcg() or handle_IPI(), in the 'do' loop where
>> > TLB entries are being purged. What I have observed is that the end
>> > address and the start address are in completely different regions.
>> > Usually, the start address is in region register 1 (address of
>> > 0x2000XXXXXXXXXXXX) and the end address is in region register 3
>> > (address of 0x6000XXXXXXXXXXXX). I don't know if this is the same
>> > problem I am seeing on the Lion, but I plan to connect an ITP and a
>> > serial console (although we haven't been able to get one to work yet
>> > on the Lion with BIOS 71) to see if the symptoms are the same.
>>
>> FWIW, we have seen EXACTLY the same hang running here on our
>> system. The start/end addresses for the purge cross region
>> boundaries.
>>
>>
>> We are running a 2.4.0 kernel.
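On IA-64 the region is selected by the top three bits of the 64-bit virtual address, which is why 0x2000XXXXXXXXXXXX and 0x6000XXXXXXXXXXXX land in regions 1 and 3. A minimal sketch of a check for the symptom described above; region_of() and range_is_sane() are hypothetical helper names for illustration, not functions from the kernel source.

```c
#include <stdint.h>

/* The top three bits of an IA-64 virtual address give the region number. */
static unsigned region_of(uint64_t vaddr)
{
	return (unsigned)(vaddr >> 61);
}

/*
 * Hypothetical sanity check: a purge range is plausible only if it is
 * non-empty and its first and last byte fall in the same region.
 */
static int range_is_sane(uint64_t start, uint64_t end)
{
	return start < end && region_of(start) == region_of(end - 1);
}
```

A start address in region 1 paired with an end address in region 3 fails this check, which matches the mismatched flush_start/flush_end pairs reported in the hang.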
Jack> I found a problem that was causing the lockup described above
Jack> & I suspect this may be responsible for some of the other hangs
Jack> various folks have seen.
Jack> There is code in flush_tlb_no_ptcg() that resends the IPI if
Jack> other cpus have not responded within a short time. If this
Jack> code gets invoked, then it is possible for flush_cpu_count to
Jack> get corrupted. When that happens, a cpu can be executing in
Jack> handle_IPI() while flush_start/flush_end are changing. A cpu
Jack> can pick up a non-matching flush_start/flush_end. This leads
Jack> to hangs or lost TLB flushes.
Jack> To verify that this could cause the hang, I changed the
Jack> timeout in flush_tlb_no_ptcg() from 40000UL to 400UL. I hung
Jack> before getting to multiuser mode with flush_start/flush_end in
Jack> different regions.
Jack> Here is the patch I used. Note: this is against 2.4.0,
Jack> --- linux-trillian/arch/ia64/kernel/smp.c	Thu Feb 22 14:35:28 2001
Jack> +++ linux/arch/ia64/kernel/smp.c	Thu Feb 22 14:19:46 2001
Jack> @@ -321,6 +321,16 @@
Jack>  {
Jack>  	send_IPI_allbutself(IPI_FLUSH_TLB);
Jack>  }
Jack> +
Jack> +void
Jack> +smp_resend_flush_tlb(void)
Jack> +{
Jack> +	/*
Jack> +	 * Really need a null IPI but since this rarely should happen &
Jack> +	 * since this code will go away, lets not add one.
Jack> +	 */
Jack> +	send_IPI_allbutself(IPI_RESCHEDULE);
Jack> +}
Jack>  #endif /* !CONFIG_ITANIUM_PTCG */
Jack>
Jack>  /*
Jack> --- linux-trillian/arch/ia64/mm/tlb.c	Thu Feb 22 14:35:28 2001
Jack> +++ linux/arch/ia64/mm/tlb.c	Thu Feb 22 14:19:50 2001
Jack> @@ -59,6 +59,7 @@
Jack>  flush_tlb_no_ptcg (unsigned long start, unsigned long end, unsigned long nbits)
Jack>  {
Jack>  	extern void smp_send_flush_tlb (void);
Jack> +	extern void smp_resend_flush_tlb (void);
Jack>  	unsigned long saved_tpr = 0;
Jack>  	unsigned long flags;
Jack>
Jack> @@ -101,9 +102,8 @@
Jack>  	{
Jack>  		unsigned long start = ia64_get_itc();
Jack>  		while (atomic_read(&flush_cpu_count) > 0) {
Jack> -			if ((ia64_get_itc() - start) > 40000UL) {
Jack> -				atomic_set(&flush_cpu_count, smp_num_cpus - 1);
Jack> -				smp_send_flush_tlb();
Jack> +			if ((ia64_get_itc() - start) > 400UL) {
Jack> +				smp_resend_flush_tlb();
Jack>  				start = ia64_get_itc();
Jack>  			}
Jack>  		}
Jack> -- Thanks
Jack> Jack Steiner (651-683-5302) (vnet 233-5302) steiner@sgi.com
Jack> _______________________________________________ Linux-IA64
Jack> mailing list Linux-IA64@linuxia64.org
Jack> http://lists.linuxia64.org/lists/listinfo/linux-ia64
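The counter corruption Jack describes can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions: it replays one fixed ordering of events for two remote CPUs, it queues every flush IPI individually (real hardware may coalesce pending interrupts), and the names flush_state, send_flush(), handle_one(), and simulate() are invented for illustration and do not appear in the kernel.

```c
#define NR_REMOTE 2	/* assumed number of remote CPUs in this model */

struct flush_state {
	int cpu_count;			/* stands in for flush_cpu_count */
	int pending[NR_REMOTE];		/* flush IPIs queued per remote CPU */
};

/* Queue one flush IPI on every remote CPU. */
static void send_flush(struct flush_state *s)
{
	for (int i = 0; i < NR_REMOTE; i++)
		s->pending[i]++;
}

/* A remote CPU services one queued flush IPI and acknowledges it. */
static void handle_one(struct flush_state *s, int cpu)
{
	if (s->pending[cpu] > 0) {
		s->pending[cpu]--;
		s->cpu_count--;
	}
}

/*
 * Replay: CPU 0 answers promptly, the initiator times out waiting for
 * CPU 1, then both CPUs service one more IPI each. Returns the number of
 * flush IPIs still queued at the end; *final_count receives the counter.
 */
static int simulate(int reset_on_timeout, int *final_count)
{
	struct flush_state s = { NR_REMOTE, { 0 } };

	send_flush(&s);
	handle_one(&s, 0);
	if (reset_on_timeout) {
		/* pre-patch timeout path: reset the counter, send a new flush IPI */
		s.cpu_count = NR_REMOTE;
		send_flush(&s);
	}
	/* patched path: the nudge is not a flush IPI, so nothing is queued */
	handle_one(&s, 1);
	handle_one(&s, 0);

	*final_count = s.cpu_count;
	return s.pending[0] + s.pending[1];
}
```

In this model, simulate(1, &count) finishes with the counter at zero while one flush IPI is still queued, so the initiator would move on and a later handle_IPI() could run against the next flush's flush_start/flush_end. With simulate(0, &count) the counter reaches zero with nothing left queued.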