From: Scott Wood <oss@buserror.net>
To: Christophe Leroy <christophe.leroy@c-s.fr>,
	Alessio Igor Bogani <alessio.bogani@elettra.eu>
Cc: linuxppc-dev@lists.ozlabs.org, Michael Ellerman <mpe@ellerman.id.au>
Subject: Re: Suspected regression?
Date: Thu, 25 Aug 2016 21:32:16 -0500
Message-ID: <1472178736.25630.288.camel@buserror.net>
In-Reply-To: <b843cd6d-b9df-a5b2-4015-042c7894313f@c-s.fr>

On Tue, 2016-08-23 at 13:34 +0200, Christophe Leroy wrote:
> 
> On 23/08/2016 at 11:20, Alessio Igor Bogani wrote:
> > 
> > Hi Christophe,
> > 
> > Sorry for the delay in replying; I was on vacation.
> > 
> > On 6 August 2016 at 11:29, christophe leroy <christophe.leroy@c-s.fr>
> > wrote:
> > > 
> > > Alessio,
> > > 
> > > 
> > > On 05/08/2016 at 09:51, Christophe Leroy wrote:
> > > > 
> > > > 
> > > > 
> > > > 
> > > > On 19/07/2016 at 23:52, Scott Wood wrote:
> > > > > 
> > > > > 
> > > > > On Tue, 2016-07-19 at 12:00 +0200, Alessio Igor Bogani wrote:
> > > > > > 
> > > > > > 
> > > > > > Hi all,
> > > > > > 
> > > > > > > I have two boards, an MVME5100 (MPC7410 CPU) and an MVME7100
> > > > > > > (MPC8641D CPU), for which I use the same cross-compiler
> > > > > > > (ppc7400).
> > > > > > 
> > > > > > > I tested these against kernel HEAD and found that they no
> > > > > > > longer boot (PID 1 crashes).
> > > > > > 
> > > > > > > Bisecting points to this first offending commit:
> > > > > > 7aef4136566b0539a1a98391181e188905e33401
> > > > > > 
> > > > > > > Removing it from HEAD makes the boards boot properly again.
> > > > > > 
> > > > > > A third system based on P2010 isn't affected at all.
> > > > > > 
> > > > > > > Is it a regression, or have I done something wrong?
> > > > > 
> > > > > > I booted both my next branch, and Linus's master on MPC8641HPCN
> > > > > > and didn't see this -- though possibly your RFS is doing
> > > > > > something different.  Maybe that's the difference with P2010 as
> > > > > > well.
> > > > > 
> > > > > > Is there any way you can debug the cause of the crash?  Or send
> > > > > > me a minimal RFS that demonstrates the problem (ideally with
> > > > > > debug symbols on the userspace binaries)?
> > > > > 
> > > > > I got the information below from Alessio:
> > > > 
> > > > systemd[1]: Caught <BUS>, core dump failed (child 137, code=killed,
> > > > status=7/BUS).
> > > > systemd[1]: Freezing execution.
> > > > 
> > > > 
> > > > > What can generate SIGBUS?
> > > > > And shouldn't we also get some KERN_ERR trace, something like
> > > > > "unhandled signal 7 at ....."?
> > > > 
> > > > As far as I can see, SIGBUS is mainly generated by alignment
> > > > exceptions. According to the 7410 Reference Manual, an alignment
> > > > exception can happen in the following cases:
> > > > * An operand of a dcbz instruction is on a page that is write-through
> > > >   or cache-inhibited for a virtual mode access.
> > > > * An attempt to execute a dcbz instruction occurs when the cache is
> > > >   disabled or locked.
> > > 
> > > > Could you try the patch below to check whether the dcbz insn is
> > > > causing the SIGBUS?
> > Unfortunately that patch doesn't solve the problem.
> > 
> > Is there a chance that the cache behavior could be set up by the board
> > firmware (PPCBug on the MPC7410 board and MotLoad on the MPC8641D one)?
> > In that case, what do you suggest I look for?
> If the removal of dcbz doesn't solve the issue, I don't think it is a
> cache-related issue.
> As far as I understand, your init gets a SIGBUS signal, right? Then we
> must identify the reason for that SIGBUS.

My guess would be errors demand-loading a page via NFS.

One approach might be to hack up the code so that both versions of
csum_partial_copy_generic() are present, and call both each time.  If the
results differ or the copied bytes are wrong, then spit out a dump of the
details.
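
For illustration, here is a minimal userspace sketch of that cross-check.
It is not the kernel code: csum_copy_ref() and csum_copy_fast() are
hypothetical stand-ins for the two versions of csum_partial_copy_generic(),
csum_copy_buggy() is a deliberately broken variant included only to show a
mismatch being caught, and the argument order is an assumption.  The folded
16-bit sums are compared rather than the raw 32-bit partial sums, since two
correct implementations can return different unfolded values.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Signature shared by the implementations under comparison (hypothetical). */
typedef uint32_t (*csum_copy_fn)(const uint8_t *src, uint8_t *dst,
                                 size_t len, uint32_t sum);

/* Fold a 32-bit partial sum to 16 bits with end-around carry. */
uint16_t fold16(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Fold a 64-bit accumulator to 32 bits with end-around carry. */
uint32_t fold32(uint64_t acc)
{
    acc = (acc & 0xffffffffu) + (acc >> 32);
    acc = (acc & 0xffffffffu) + (acc >> 32);
    return (uint32_t)acc;
}

/* Reference version: copy and sum one byte at a time (big-endian words). */
uint32_t csum_copy_ref(const uint8_t *src, uint8_t *dst,
                       size_t len, uint32_t sum)
{
    uint64_t acc = sum;
    size_t i;

    for (i = 0; i < len; i++) {
        dst[i] = src[i];
        /* Even offsets hold the high byte of a 16-bit word. */
        acc += (i & 1) ? (uint32_t)src[i] : ((uint32_t)src[i] << 8);
    }
    return fold32(acc);
}

/* Second version: bulk copy, then sum 32 bits at a time. */
uint32_t csum_copy_fast(const uint8_t *src, uint8_t *dst,
                        size_t len, uint32_t sum)
{
    uint64_t acc = sum;
    size_t i;

    memcpy(dst, src, len);
    for (i = 0; i + 4 <= len; i += 4)
        acc += ((uint32_t)src[i] << 24) | ((uint32_t)src[i + 1] << 16) |
               ((uint32_t)src[i + 2] << 8) | (uint32_t)src[i + 3];
    for (; i < len; i++)                    /* 1 to 3 trailing bytes */
        acc += (i & 1) ? (uint32_t)src[i] : ((uint32_t)src[i] << 8);
    return fold32(acc);
}

/* Deliberately broken (illustration only): drops an odd trailing byte. */
uint32_t csum_copy_buggy(const uint8_t *src, uint8_t *dst,
                         size_t len, uint32_t sum)
{
    return csum_copy_fast(src, dst, len & ~(size_t)1, sum);
}

/*
 * Run both versions on the same input and compare folded sums and copied
 * bytes.  Returns 0 if they agree, 1 on a mismatch (details go to stderr),
 * -1 if the scratch buffer cannot be allocated.
 */
int csum_copy_checked(csum_copy_fn ref, csum_copy_fn alt,
                      const uint8_t *src, uint8_t *dst,
                      size_t len, uint32_t sum, uint32_t *out)
{
    uint8_t *scratch;
    uint32_t a, b;
    size_t i;
    int bad;

    assert(ref && alt && out);
    scratch = malloc(len ? len : 1);
    if (!scratch)
        return -1;
    memset(scratch, 0xAA, len ? len : 1);   /* poison: missed bytes show up */

    a = ref(src, dst, len, sum);
    b = alt(src, scratch, len, sum);
    *out = a;

    bad = fold16(a) != fold16(b) ||
          memcmp(dst, src, len) != 0 || memcmp(scratch, src, len) != 0;
    if (bad) {
        fprintf(stderr, "csum mismatch: len=%zu ref=%04x alt=%04x\n",
                len, (unsigned)fold16(a), (unsigned)fold16(b));
        for (i = 0; i < len; i++)
            if (dst[i] != src[i] || scratch[i] != src[i])
                fprintf(stderr, "  [%zu] src=%02x ref=%02x alt=%02x\n", i,
                        (unsigned)src[i], (unsigned)dst[i],
                        (unsigned)scratch[i]);
    }
    free(scratch);
    return bad;
}
```

In the kernel the same shape would presumably apply with the old and new
assembly routines built under two different names and the dump going to
printk() instead of stderr.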

-Scott
