From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v3] arm64: enable EDAC on arm64
Date: Tue, 22 Apr 2014 17:01:00 +0100
Message-ID: <20140422160100.GH9820@arm.com>
In-Reply-To: <CAL_JsqJEwUu0ooEcL+hniD8vxjBW5571ea3u3JmPN3x1MfXW3w@mail.gmail.com>

On Tue, Apr 22, 2014 at 04:23:20PM +0100, Rob Herring wrote:
> On Tue, Apr 22, 2014 at 8:26 AM, Will Deacon <will.deacon@arm.com> wrote:
> > On Tue, Apr 22, 2014 at 01:54:12PM +0100, Rob Herring wrote:
> >> On Tue, Apr 22, 2014 at 5:24 AM, Will Deacon <will.deacon@arm.com> wrote:
> >> > On Mon, Apr 21, 2014 at 05:09:16PM +0100, Rob Herring wrote:
> >> >> +#ifndef ASM_EDAC_H
> >> >> +#define ASM_EDAC_H
> >> >> +/*
> >> >> + * ECC atomic, DMA, SMP and interrupt safe scrub function.
> >> >
> >> > What do you mean by `DMA safe'? For coherent (cacheable) DMA buffers, this
> >> > should work fine, but for non-coherent (and potentially non-cacheable)
> >> > buffers, I think we'll have problems both due to the lack of guaranteed
> >> > exclusive monitor support and also eviction of dirty lines.
> >>
> >> That's just copied from other implementations. I agree you could have
> >> a problem here although I don't see why dirty line eviction would be.
> >
> > I was thinking of the case where you have an ongoing, non-coherent DMA
> > transfer from a device and then the atomic_scrub routine runs in parallel
> > on the CPU, targeting the same buffer. In this case, the stxr could store
> > stale data back to the buffer, leading to corruption (since the monitor
> > won't help). This differs from the case where the monitor could always
> > report failure for non-cacheable regions, causing atomic_scrub to livelock.
> 
> It is only reads that will trigger an error and scrubbing. If the DMA
> is continuously reading (such as a framebuffer), then there would not
> be an issue. What would be the usecase where a DMA continuously writes
> to the same location without any synchronization with the cpu? I
> suppose one core could re-trigger a DMA while another core is doing
> the scrubbing. You would have to read the DMA data and be finished
> with it quicker than the scrubbing could get handled. I just wonder
> whether this is really only a theoretical problem, but not one in
> practice.

I don't think it's all that complicated if you consider speculative reads
from the CPU triggering the error. However, discussion with Catalin raised
another question (see below).

> >> There's not really a solution other than not doing s/w scrubbing or
> >> doing it in h/w. So it is up to individual drivers to decide what to
> >> do, but we have to provide this function just to enable EDAC.
> >
> > I think we need to avoid s/w scrubbing of non-cacheable memory altogether.
> 
> There's not really a way to determine the memory attributes easily
> though. Whether it works depends on the h/w. Calxeda's memory
> controller did have an exclusive monitor so I think this would have
> worked even in the non-coherent case.
> 
> What exactly is your proposal to do here? I think we should assume the
> h/w is designed correctly until we have a case that it is not.

Looking at the edac_mc_scrub_block code, atomic_scrub is always called with
a normal, cacheable mapping (kmap_atomic) so that doesn't help us (although
it means the exclusives will at least succeed).
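
To make the read/write-back idea concrete, here is a minimal userspace
sketch of a word-by-word scrub. It is not the code from the patch under
review: the name scrub_words_sketch and the use of C11 atomics are
assumptions made for illustration. Adding zero with an atomic
read-modify-write leaves the value unchanged but still requests a store
of each word; on arm64 without LSE atomics a compiler would typically
lower this to a load-exclusive/store-exclusive loop. The sketch also
assumes the buffer is suitably aligned, that its words may be accessed
through an _Atomic pointer, and that the compiler does not optimise the
write-back away.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical illustration only, not the kernel's atomic_scrub().
 * Rewrites every 32-bit word in [va, va + size) with its current value
 * using an atomic read-modify-write, so a concurrent CPU writer cannot
 * be overwritten with stale data. Trailing bytes beyond a multiple of
 * four are left untouched.
 */
static void scrub_words_sketch(void *va, size_t size)
{
	_Atomic uint32_t *word = va;
	size_t i;

	for (i = 0; i < size / sizeof(uint32_t); i++)
		atomic_fetch_add_explicit(&word[i], 0, memory_order_relaxed);
}
```

Note that the atomicity here only covers observers that take part in the
coherency protocol, which is why the non-coherent DMA case above is not
addressed by a sketch like this.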

The problem of speculative reads by the CPU could be solved by unmapping the
DMA buffer when we transfer the ownership over to the device (instead of
invalidating it after the transfer). However, I'm now slightly confused as
to how atomic_scrub fixes errors reported at any cache level higher than
L1. Do we need cache-flushing to ensure that the exclusive-store propagates
to the point of failure?

Will

Thread overview: 10+ messages
2014-04-21 16:09 [PATCH v3] arm64: enable EDAC on arm64 Rob Herring
2014-04-22 10:24 ` Will Deacon
2014-04-22 12:54   ` Rob Herring
2014-04-22 13:26     ` Will Deacon
2014-04-22 15:23       ` Rob Herring
2014-04-22 16:01         ` Will Deacon [this message]
2014-04-22 16:29           ` Rob Herring
2014-04-23 17:04             ` Will Deacon
2014-05-09 17:33               ` Catalin Marinas
2014-05-09 17:55                 ` Will Deacon
