From: Jun Nakajima <jun@sco.com>
To: linux-ia64@vger.kernel.org
Subject: Re: [Linux-ia64] Re: Lockups on 2.4.1
Date: Fri, 23 Feb 2001 15:19:34 +0000
Message-ID: <marc-linux-ia64-105590693005191@msgid-missing>
In-Reply-To: <marc-linux-ia64-105590693005175@msgid-missing>

This might explain why I was _not_ able to reproduce such a hang, even after
overnight stress tests on a 4-way Lion. My configuration has
CONFIG_ITANIUM_PTCG=y, and flush_tlb_range() never calls flush_tlb_no_ptcg()
when that option is defined.
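
To make the configuration point concrete, here is a minimal sketch of
compile-time path selection. It is not the kernel source: every name ending
in _sketch is a hypothetical stand-in, the #define only mimics
CONFIG_ITANIUM_PTCG=y, and the counters exist purely so the chosen path can
be observed.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel option; mimics CONFIG_ITANIUM_PTCG=y. */
#define CONFIG_ITANIUM_PTCG 1

int no_ptcg_calls;	/* times the IPI-based path ran */
int ptcg_calls;		/* times the global-purge path ran */

/* Hypothetical stand-in for flush_tlb_no_ptcg(): the IPI-based flush. */
void flush_tlb_no_ptcg_sketch(unsigned long start, unsigned long end)
{
	assert(start <= end);
	no_ptcg_calls++;
}

/* Hypothetical stand-in for the path that relies on a hardware global purge. */
void flush_tlb_ptcg_sketch(unsigned long start, unsigned long end)
{
	assert(start <= end);
	ptcg_calls++;
}

/* Hypothetical stand-in for flush_tlb_range(): the path is fixed at build time. */
void flush_tlb_range_sketch(unsigned long start, unsigned long end)
{
#ifdef CONFIG_ITANIUM_PTCG
	flush_tlb_ptcg_sketch(start, end);
#else
	flush_tlb_no_ptcg_sketch(start, end);
#endif
}
```

With the option defined, no call to flush_tlb_range_sketch() can reach the
IPI-based function, so a bug confined to that function would never be
exercised on such a build.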


Jack Steiner wrote:
> 
> > > Anyway, I have ITPs connected to the IBM hardware and have noticed that
> > > when the lockup occurs, and we lose video, at least one of the CPUs is
> > > executing in flush_tlb_no_ptcg() or handle_IPI(), in the 'do' loop where
> > > TLB entries are being purged. What I have observed is that the end
> > > address and the start address are in completely different regions.
> > > Usually, the start address is in region register 1 (address of
> > > 0x2000XXXXXXXXXXXX) and the end address is in region register 3 (address
> > > of 0x6000XXXXXXXXXXXX). I don't know if this is the same problem I am
> > > seeing on the Lion, but I plan to connect an ITP and a serial console
> > > (although we haven't been able to get one to work yet on the Lion with
> > > BIOS 71) to see if the symptoms are the same.
> >
> > FWIW, we have seen EXACTLY the same hang running here on our system.
> > The start/end addresses for the purge cross region boundaries.
> >
> >
> > We are running a 2.4.0 kernel.
> 
> I found a problem that was causing the lockup described above & I suspect this
> may be responsible for some of the other hangs various folks have seen.
> 
> There is code in flush_tlb_no_ptcg() that resends the IPI if other
> cpus have not responded within a short time. If this code gets invoked, then
> it is possible for flush_cpu_count to get corrupted. When that happens, a cpu
> can be executing in handle_IPI() while flush_start/flush_end are changing.
> A cpu can pick up a non-matching flush_start/flush_end. This leads to hangs or
> lost TLB flushes.
> 
> To verify that this could cause the hang, I changed the timeout in
> flush_tlb_no_ptcg() from 40000UL to 400UL. I hung before getting to multiuser mode
> with flush_start/flush_end in different regions.
> 
> Here is the patch I used. Note: this is against 2.4.0.
> 
> --- linux-trillian/arch/ia64/kernel/smp.c       Thu Feb 22 14:35:28 2001
> +++ linux/arch/ia64/kernel/smp.c        Thu Feb 22 14:19:46 2001
> @@ -321,6 +321,16 @@
>  {
>         send_IPI_allbutself(IPI_FLUSH_TLB);
>  }
> +
> +void
> +smp_resend_flush_tlb(void)
> +{
> +       /*
> +        * Really need a null IPI but since this rarely should happen &
> +        * since this code will go away, lets not add one.
> +        */
> +       send_IPI_allbutself(IPI_RESCHEDULE);
> +}
>  #endif /* !CONFIG_ITANIUM_PTCG */
> 
>  /*
> --- linux-trillian/arch/ia64/mm/tlb.c   Thu Feb 22 14:35:28 2001
> +++ linux/arch/ia64/mm/tlb.c    Thu Feb 22 14:19:50 2001
> @@ -59,6 +59,7 @@
>  flush_tlb_no_ptcg (unsigned long start, unsigned long end, unsigned long nbits)
>  {
>         extern void smp_send_flush_tlb (void);
> +       extern void smp_resend_flush_tlb (void);
>         unsigned long saved_tpr = 0;
>         unsigned long flags;
> 
> @@ -101,9 +102,8 @@
>         {
>                 unsigned long start = ia64_get_itc();
>                 while (atomic_read(&flush_cpu_count) > 0) {
> -                       if ((ia64_get_itc() - start) > 40000UL) {
> -                               atomic_set(&flush_cpu_count, smp_num_cpus - 1);
> -                               smp_send_flush_tlb();
> +                       if ((ia64_get_itc() - start) > 400UL) {
> +                               smp_resend_flush_tlb();
>                                 start = ia64_get_itc();
>                         }
>                 }
> 
> --
> Thanks
> 
> Jack Steiner    (651-683-5302)   (vnet 233-5302)      steiner@sgi.com
> 
> _______________________________________________
> Linux-IA64 mailing list
> Linux-IA64@linuxia64.org
> http://lists.linuxia64.org/lists/listinfo/linux-ia64
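
One way to picture the counter corruption described in the quoted message is
the toy model below. It is a sketch under stated assumptions, not kernel
code: simulate_flush(), NUM_CPUS and the bookkeeping are hypothetical, and
the interleaving is assumed rather than taken from the message (a CPU that
has already taken the first IPI but not yet decremented the counter when the
timeout fires will also service the resent flush IPI, so it decrements
twice).

```c
#include <assert.h>

#define NUM_CPUS 4	/* hypothetical 4-way system */

/*
 * Model one flush. `late` is the number of other CPUs that have taken the
 * flush IPI but not yet decremented the counter when the timeout fires.
 * reset_on_timeout = 1 models the original resend (reset counter, resend
 * the flush IPI); 0 models the patched resend (send an IPI that does not
 * touch the counter).
 *
 * Returns the counter value once every outstanding decrement has landed.
 * 0 means the counter is consistent; anything else means stale decrements
 * leaked past the end of this flush.
 */
int simulate_flush(int late, int reset_on_timeout)
{
	int others = NUM_CPUS - 1;
	int count = others;		/* analogue of flush_cpu_count */
	int pending;

	assert(late >= 0 && late <= others);

	count -= others - late;		/* prompt CPUs decrement in time */
	pending = late;			/* decrements still in flight */

	if (late > 0 && reset_on_timeout) {
		count = others;		/* counter reset to smp_num_cpus - 1 */
		pending += others;	/* every other CPU handles the resend */
	}
	/* Patched resend: counter untouched, no extra decrements queued. */

	return count - pending;
}
```

In this model the original resend with one late CPU ends at -1, while the
patched resend ends at 0. A leftover decrement is what would let a later
flush appear complete too early, which fits the report of a cpu still in
handle_IPI() while flush_start/flush_end change underneath it.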

-- 
Jun U Nakajima
Core OS Development
SCO/Murray Hill, NJ
Email: jun@sco.com, Phone: 908-790-2352 Fax: 908-790-2426

