From: Ralf Baechle <ralf@linux-mips.org>
To: Matt Turner <mattst88@gmail.com>
Cc: Manuel Lauss <manuel.lauss@gmail.com>,
James Hogan <james.hogan@imgtec.com>,
"linux-mips@linux-mips.org" <linux-mips@linux-mips.org>
Subject: Re: NFS corruption, fixed by echo 1 > /proc/sys/vm/drop_caches -- next debugging steps?
Date: Wed, 15 Mar 2017 16:52:16 +0100
Message-ID: <20170315155216.GA10914@linux-mips.org>
In-Reply-To: <CAEdQ38FU6H7ThmP2MgUY-uLhf9feZ6US2JwhEQsCuPw9AeV3nQ@mail.gmail.com>

On Wed, Mar 15, 2017 at 08:31:19AM -0700, Matt Turner wrote:
> On Wed, Mar 15, 2017 at 7:00 AM, Manuel Lauss <manuel.lauss@gmail.com> wrote:
> >
> > On Wed, Mar 15, 2017 at 10:25 AM, Ralf Baechle <ralf@linux-mips.org> wrote:
> >>
> >> On Mon, Mar 13, 2017 at 09:47:57AM +0000, James Hogan wrote:
> >>
> >> > >
> >> > > Note that the corruption is different across reboots, both in the size
> >> > > of the corruption and the location. I saw ~1900 and ~1400 byte
> >> > > sequences corrupted on separate occasions, which don't correspond to
> >> > > the system's 16kB page size.
> >> > >
> >> > > I've tested kernels from v3.19 to 4.11-rc1+ (master branch from
> >> > > today). All exhibit this behavior with differing frequencies. Earlier
> >> > > kernels seem to reproduce the issue less often, while more recent
> >> > > kernels reliably exhibit the problem every boot.
> >> > >
> >> > > How can I further debug this?
> >> >
> >> > It smells a bit like a DMA / caching issue.
> >> >
> >> > Can you provide a full kernel log. That might provide some information
> >> > about caching that might be relevant (e.g. does dcache have aliases?).
> >>
> >> The architecture of the BCM1250 SOC used for the BCM91250 boards is
> >> fully coherent; S-cache and D-cache are physically indexed and tagged.
> >> Only the VIVT (plus the usual ASID tagging) I-cache leaves space for
> >> software to screw up cache management, but that shouldn't matter for
> >> this case, so I suggest looking into this from the NFS side first.
> >
> >
> > I did Matt's tests on Alchemy (VIPT caches) with kernels 3.18 to
> > 4.11-rc against an x86 4.9.15 host, and did not see any problems.
> > Given Ralf's comment about the BCM1250 caches, maybe you have bad
> > hardware (BCM board or network)?
>
> I certainly cannot rule that possibility out. If that is the case, I
> would like to be sure of it -- see a failure in memtester or something
> for instance. Any suggestions? (I have run memtester and never found
> anything)
>
> For what it's worth, did you determine the cause of the NFS corruption
> you reported [1]?
>
> [1] https://www.spinics.net/lists/mips/msg44006.html

I've chased my fair share of kernel bugs on Sibyte systems that turned
out to be caused by faulty or unsuitable memory modules, or even by the
BGA solder points of the BCM1250 SOC coming off. If you have memory
modules in both banks, you may want to check whether you can reproduce
the corruption with only one bank populated, and whether it makes a
difference if only bank one or only bank two is populated. Firmware
updates have fixed various issues with memory controller initialization
over the years, so if you haven't updated to the latest and greatest CFE
for the board, you may want to try that.

  Ralf
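
When repeating the test with different memory bank configurations, it may help to record exactly where each corrupted run sits, so results can be compared across boots. The following is only a minimal sketch and not something posted in this thread: the names `diff_runs` and `compare_files` are hypothetical, and the 16 kB default page size is taken from Matt's report. It compares a known-good copy of a file with the copy read over NFS and lists each run of differing bytes together with its offset inside a page.

```python
def diff_runs(good: bytes, bad: bytes, page_size: int = 16384):
    """Return a list of (offset, length, offset_in_page) tuples, one per
    contiguous run of bytes that differ between `good` and `bad`.

    Only the common prefix of the two buffers is compared; a length
    mismatch has to be checked separately by the caller.
    """
    runs = []
    start = None  # start offset of the run currently being scanned
    n = min(len(good), len(bad))
    for i in range(n):
        if good[i] != bad[i]:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start, start % page_size))
            start = None
    if start is not None:
        # The run extends to the end of the compared region.
        runs.append((start, n - start, start % page_size))
    return runs


def compare_files(good_path: str, nfs_path: str, page_size: int = 16384):
    """Read both files fully and report the differing runs.

    `good_path` is assumed to be a trusted local copy and `nfs_path`
    the same file as seen through the NFS mount (hypothetical paths).
    """
    with open(good_path, "rb") as f:
        good = f.read()
    with open(nfs_path, "rb") as f:
        bad = f.read()
    return diff_runs(good, bad, page_size)
```

If the runs keep landing at the same physical location or only appear with one particular bank populated, that would point towards memory; runs whose length looks like a network payload rather than a page fraction would point back at the network or NFS path. This is an interpretation aid only and does not prove either cause.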