From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Henry Bausley <hbausley@deltatau.com>
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: Critical Interrupt Input
Date: Wed, 28 Aug 2013 08:14:25 +1000
Message-ID: <1377641665.3819.138.camel@pasglop>
In-Reply-To: <1377641516.4691.11.camel@lx-henry>
On Tue, 2013-08-27 at 15:11 -0700, Henry Bausley wrote:
> Both methods you described seem to work. We are currently using the
> method of clearing the partially written TLB. Seems to be working but
> we're still testing. Thanks.
Feel free to send us a patch for review :-)
Cheers,
Ben.
> .
> .
> .
> mfspr r5,SPRN_CSRR0;
> lis r12,finish_tlb_load_44x@h
> ori r12,r12,finish_tlb_load_44x@l;
> addi r11,r12,finish_tlb_load_44x_end-finish_tlb_load_44x;
> cmplw cr0,r5,r12;
> cmplw cr1,r5,r11;
> ble cr0,3f;
> bge cr1,3f;
> li r12,0;
> mr r5,r11
> tlbwe r12,r13,PPC44x_TLB_XLAT;
> tlbwe r12,r13,PPC44x_TLB_PAGEID; /* Clear PAGEID */
> tlbwe r12,r13,PPC44x_TLB_ATTRIB; /* Clear ATTRIB */
> isync
> .
> .
> .
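
For anyone following along, the bounds check in the snippet above can be
modelled in plain C roughly as below. This is only a sketch: the struct
and function names (tlb_entry_model, crit_fixup_model) are made up for
illustration and are not kernel symbols, and the three struct fields
just stand in for the PAGEID/XLAT/ATTRIB words that tlbwe writes. It
mirrors the ble/bge pair, so the fixup applies only when
start < CSRR0 < end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the three words of a 44x TLB entry. */
struct tlb_entry_model {
    uint32_t pageid;
    uint32_t xlat;
    uint32_t attrib;
};

/*
 * Model of the check on the saved critical return address (CSRR0).
 * start/end play the role of finish_tlb_load_44x and
 * finish_tlb_load_44x_end. Returns the address to resume at and
 * reports through *fixed whether the entry was cleared.
 */
static uint32_t crit_fixup_model(uint32_t csrr0, uint32_t start, uint32_t end,
                                 struct tlb_entry_model *entry, bool *fixed)
{
    assert(start < end);

    /* Same unsigned comparisons as cmplw + ble/bge in the quoted code. */
    *fixed = (csrr0 > start) && (csrr0 < end);
    if (!*fixed)
        return csrr0;           /* outside the window: resume as usual */

    /* Clear the partially written entry (all three words). */
    entry->xlat = 0;
    entry->pageid = 0;
    entry->attrib = 0;

    return end;                 /* skip right past the last tlbwe */
}
```

With start = 0x1000 and end = 0x1020, an interrupted address of 0x1004
gets the entry cleared and the return address moved to 0x1020, while
0x1000 and 0x1020 themselves are left alone.
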
>
>
> On Wed, 2013-08-21 at 09:08 +1000, Benjamin Herrenschmidt wrote:
> > On Tue, 2013-08-20 at 15:48 -0700, Henry Bausley wrote:
> > > Ben,
> > >
> > >
> > > After your hints I suspected the read of a real world i/o variable *piom
> > > which came from ioremap_nocache in the 3 line critical interrupt handler
> > >
> > > void critintr_handler(void *dev)
> > > {
> > >         critintrcount++;           // increment a variable
> > >         iodata = *piom;            // read an I/O location
> > >         mtdcr(0x0c0, 0x00002000);  // clear critical interrupt
> > > }
> > >
> > > is what caused the problem. Commenting it out seems to make the system stable.
> >
> > Right, definitely would do that. BTW. You may want to use proper IO
> > accessors while at it, to get the right memory barriers etc...
> >
> > > This led us to disable the critical interrupt when in the
> > > DataTLBError44x and InstructionTLBError44x exceptions. Now the critical
> > > interrupt handler seems to make things more stable when reading real
> > > world i/o for our application.
> > >
> > >
> > > /* Data TLB Error Interrupt */
> > > START_EXCEPTION(DataTLBError44x)
> > > mtspr SPRN_SPRG_WSCRATCH0, r10 /* Save some working */
> > > + mfmsr r10 /* Disable the */
> > > + rlwinm r10,r10,0,15,13 /* MSR's CE bit */
> > > + mtmsr r10
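
The rlwinm operands can look odd at first sight, since MB (15) is larger
than ME (13). In that case the mask wraps around and covers every bit
except those strictly between ME and MB, here bit 14 in IBM numbering,
which is MSR[CE] (0x00020000). A small C model of the mask arithmetic,
as a sketch only; ppc_mask, rotl32 and rlwinm_model are illustrative
names, not existing helpers:

```c
#include <assert.h>
#include <stdint.h>

/* Build the PowerPC MASK(mb, me); bit 0 is the most significant bit. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    assert(mb < 32 && me < 32);

    uint32_t from_mb = 0xFFFFFFFFu >> mb;         /* bits mb..31 set */
    uint32_t up_to_me = 0xFFFFFFFFu << (31 - me); /* bits 0..me set  */

    /* mb <= me: contiguous run; mb > me: the mask wraps around. */
    return (mb <= me) ? (from_mb & up_to_me) : (from_mb | up_to_me);
}

/* Rotate left, guarding the n == 0 case to avoid a shift by 32. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Model of "rlwinm rA,rS,sh,mb,me": rotate left, then AND with the mask. */
static uint32_t rlwinm_model(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}
```

Here ppc_mask(15, 13) evaluates to 0xFFFDFFFF, so
rlwinm_model(msr, 0, 15, 13) clears only the CE bit and leaves the rest
of the MSR value untouched.
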
> > >
> > >
> > > Do you see any potential problems with this approach?
> > >
> > > If so can you advise us on how to better take care of this.
> >
> > - You potentially still have an exposure ... between the mtspr to
> > scratch and the mfmsr, a crit can occur, causing a re-entrancy which
> > would then clobber the scratch register. That can be handled by saving
> > that scratch SPRG into the stack frame on entry/exit from the crit
> > interrupt. Look at crit_transfer_to_handler, how it already handles
> > MMUCR:
> >
> > mfspr r0,SPRN_MMUCR
> > stw r0,MMUCR(r11)
> >
> > Probably add saving of the SPRG_WSCRATCH0 in there (need to add a frame
> > slot for it) and do the restore in RESTORE_MMU_REGS
> >
> > - You need to handle Instruction TLB misses as well
> >
> > - You add overhead to the TLB miss handlers which are fairly
> > performance critical pieces of code. You might be able to alleviate
> > that by making the whole thing support re-entrancy properly but that's
> > harder. To do that you would have to:
> >
> > * Save *all* the SPRGs used by the TLB miss during crit entry/exit
> >
> > * Detect in crit_transfer_to_handler (check the CSRR0 bounds) that
> > the crit code interrupted finish_tlb_load_44x before or at the
> > last tlbwe instruction. In that case, immediately clear the
> > partially written TLB entry (index in r13) and change the
> > return address to skip right past the last tlbwe.
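
To make the scratch-register exposure from the first point concrete,
here is a toy C model of it. Everything in it is hypothetical
(cpu_model, the function names, the constants); it only simulates the
ordering of events and is not kernel code. The TLB miss handler parks
r10 in the scratch SPRG, a critical interrupt arrives and itself takes
a TLB miss, and the outer handler then reloads r10 from the scratch
register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy CPU state: one GPR and the scratch SPRG used by the TLB miss code. */
struct cpu_model {
    uint32_t r10;
    uint32_t sprg_scratch;
};

/* TLB miss prologue/epilogue: park r10 in the scratch SPRG and reload it. */
static void tlb_miss_entry(struct cpu_model *cpu) { cpu->sprg_scratch = cpu->r10; }
static void tlb_miss_exit(struct cpu_model *cpu)  { cpu->r10 = cpu->sprg_scratch; }

/* A critical interrupt whose handler itself takes a (nested) TLB miss. */
static void crit_interrupt(struct cpu_model *cpu, bool save_scratch)
{
    uint32_t saved_r10 = cpu->r10;               /* GPRs go into the frame  */
    uint32_t saved_scratch = cpu->sprg_scratch;  /* the proposed extra slot */

    cpu->r10 = 0xDEADu;                          /* handler's own r10 value */
    tlb_miss_entry(cpu);                         /* nested miss overwrites  */
    tlb_miss_exit(cpu);                          /* the scratch SPRG        */

    cpu->r10 = saved_r10;
    if (save_scratch)
        cpu->sprg_scratch = saved_scratch;       /* restore on crit exit    */
}

/* Outer TLB miss interrupted right after its mtspr; returns the final r10. */
static uint32_t run_interrupted_tlb_miss(bool save_scratch)
{
    struct cpu_model cpu = { .r10 = 0x1234u, .sprg_scratch = 0 };

    tlb_miss_entry(&cpu);
    cpu.r10 = 0;                                 /* r10 reused as scratch   */
    crit_interrupt(&cpu, save_scratch);
    tlb_miss_exit(&cpu);
    return cpu.r10;
}
```

Without the extra frame slot the outer handler ends up with the nested
handler's r10 (0xDEAD in the model); with it, the original value
(0x1234) survives.
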
> >
> > Cheers,
> > Ben.
> >
> >
> > >
> > > On Tue, 2013-08-20 at 06:56 +1000, Benjamin Herrenschmidt wrote:
> > > > On Mon, 2013-08-19 at 12:00 -0700, Henry Bausley wrote:
> > > > >
> > > > > Support does appear to be present but there is a problem returning
> > > > > back to user space I suspect.
> > > >
> > > > Probably a problem with TLB misses vs. crit interrupts.
> > > >
> > > > A critical interrupt can re-enter a TLB miss.
> > > >
> > > > I can see two potential issues there:
> > > >
> > > > - A bug where we don't properly restore "something" (I thought we did
> > > > save and restore MMUCR though, but that's worth double-checking if it works
> > > > properly) across the crit entry/exit
> > > >
> > > > - Something in your crit code causing a TLB miss (the
> > > > kernel .text/.data/.bss should be bolted but anything else can). We
> > > > don't currently support re-entering the TLB miss that way.
> > > >
> > > > If we were to support the latter, we'd need to detect on entering a crit
> > > > that the PC is within the TLB miss handler, and set up a return context
> > > > to the original instruction (replay the miss) rather than trying to
> > > > resume it.
> > > >
> > > > Cheers,
> > > > Ben.
> > > >
> > > > > What fails is that it causes Linux user space programs to get
> > > > > segmentation faults. Issuing a simple ls sometimes causes a
> > > > > segmentation fault. The shell gets terminated and you cannot log
> > > > > back in. The message
> > > > > INIT: Id "T0" respawning too fast: disabled for 5 minutes
> > > > > pops up.
> > > > >
> > > > > However, the critical interrupt handler keeps running. I know this
> > > > > because I added a read of a physical I/O location to the handler and
> > > > > can see it being read on the scope.
> > > > >
> > > > >
> > > > > The only code in the handler is below.
> > > > >
> > > > > void critintr_handler(void *dev)
> > > > > {
> > > > >         critintrcount++;           // increment a variable
> > > > >         iodata = *piom;            // read an I/O location
> > > > >         mtdcr(0x0c0, 0x00002000);  // clear critical interrupt
> > > > > }
> > > > >
> > > > >
> > > > > Below is a log of the type of crashes that occur:
> > > > >
> > > > > root@10.34.9.213:/opt/ppmac/ktest# ls
> > > > > Segmentation fault
> > > > > root@10.34.9.213:/opt/ppmac/ktest# ls
> > > > > Segmentation fault
> > > > > root@10.34.9.213:/opt/ppmac/ktest# ls
> > > > > Makefile ktest.c ktest.ko ktest.mod.o modules.order
> > > > > Module.symvers ktest.cbp ktest.mod.c ktest.o
> > > > > root@10.34.9.213:/opt/ppmac/ktest# ls
> > > > >
> > > > > Debian GNU/Linux 7 powerpmac ttyS0
> > > > >
> > > > > powerpmac login: root
> > > > >
> > > > > Debian GNU/Linux 7 powerpmac ttyS0
> > > > >
> > > > > powerpmac login: root
> > > > >
> > > > > Debian GNU/Linux 7 powerpmac ttyS0
> > > > >
> > > > > powerpmac login: root
> > > > >
> > > > > Debian GNU/Linux 7 powerpmac ttyS0
> > > > >
> > > > > powerpmac login: root
> > > > > Password:
> > > > > Last login: Thu Nov 30 20:42:16 UTC 1933 on ttyS0
> > > > > Linux powerpmac 3.2.21-aspen_2.01.09 #10 Mon Aug 19 08:49:12 PDT 2013
> > > > > ppc
> > > > >
> > > > > The programs included with the Debian GNU/Linux system are free
> > > > > software;
> > > > > the exact distribution terms for each program are described in the
> > > > > individual files in /usr/share/doc/*/copyright.
> > > > >
> > > > > Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
> > > > > permitted by applicable law.
> > > > > INIT: Id "T0" respawning too fast: disabled for 5 minutes
> > > > >
> > > > >
> > > > > ______________________________________________________________________
> > > > > From: "Benjamin Herrenschmidt" <benh@kernel.crashing.org>
> > > > > Sent: Saturday, August 17, 2013 3:05 PM
> > > > > To: "Kumar Gala" <galak@kernel.crashing.org>
> > > > > Cc: linuxppc-dev@lists.ozlabs.org, hbausley@deltatau.com
> > > > > Subject: Re: Critical Interrupt Input
> > > > >
> > > > > On Fri, 2013-08-16 at 06:04 -0500, Kumar Gala wrote:
> > > > > > The 44x low level code needs to handle exception stacks properly for
> > > > > > this to work. Since it's possible to have a critical exception occur
> > > > > > while in a normal exception level, you have to have proper saving of
> > > > > > additional register state and a stack frame for the critical
> > > > > > exception, etc. I'm not sure if that was ever done for 44x.
> > > > >
> > > > > Don't 44x and FSL BookE share the same macros ? I would think 44x does
> > > > > indeed implement the same crit support as e500...
> > > > >
> > > > > What does the crash look like ?
> > > > >
> > > > > Ben.
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > Linuxppc-dev mailing list
> > > > > Linuxppc-dev@lists.ozlabs.org
> > > > > https://lists.ozlabs.org/listinfo/linuxppc-dev
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> >
>
>
Thread overview: 10+ messages
2013-08-19 19:00 Critical Interrupt Input Henry Bausley
2013-08-19 20:56 ` Benjamin Herrenschmidt
2013-08-19 21:04 ` Denis Kirjanov
2013-08-20 22:48 ` Henry Bausley
2013-08-20 23:08 ` Benjamin Herrenschmidt
2013-08-27 22:11 ` Henry Bausley
2013-08-27 22:14 ` Benjamin Herrenschmidt [this message]
-- strict thread matches above, loose matches on Subject: below --
2013-08-16 4:57 Henry Bausley
2013-08-16 11:04 ` Kumar Gala
2013-08-17 22:05 ` Benjamin Herrenschmidt