From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Jamie Lokier <jamie@shareable.org>
Cc: Saeed Bishara <saeed@marvell.com>,
	Nicolas Pitre <nico@marvell.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"James E.J. Bottomley" <jejb@parisc-linux.org>,
	FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>,
	"Shilimkar, Santosh" <santosh.shilimkar@ti.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
Date: Thu, 13 May 2010 08:47:11 +1000
Message-ID: <1273704431.21352.136.camel@pasglop>
In-Reply-To: <20100512222154.GA6841@shareable.org>

On Wed, 2010-05-12 at 23:21 +0100, Jamie Lokier wrote:
> Shilimkar, Santosh wrote:
> > There was a memory write barrier missing before the DMA descriptors 
> > are handed over to DMA controller.
> 
> On that note, are the cache flush functions implicit memory barriers?

(Adding Fujita on CC)

That's a very good question. The generic inline implementation of
dma_sync_* is:

static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
					   size_t size,
					   enum dma_data_direction dir)
{
	struct dma_map_ops *ops = get_dma_ops(dev);

	BUG_ON(!valid_dma_direction(dir));
	if (ops->sync_single_for_cpu)
		ops->sync_single_for_cpu(dev, addr, size, dir);
	debug_dma_sync_single_for_cpu(dev, addr, size, dir);
}

This means that on coherent architectures that do not implement
the ops->sync_* hooks, we are probably missing a barrier here.

Thus, if the above is expected to be a memory barrier, it's broken on
cache-coherent powerpc, for example. On non-coherent powerpc, we do cache
flushes, and those are implicit barriers.
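
A minimal user-space sketch of the concern above, not kernel code. All
names here (sketch_dma_ops, sketch_barrier, the two sync_for_cpu_*
helpers) are hypothetical stand-ins, and a C11 fence stands in for
whatever barrier the architecture would need. It assumes, as stated
above for non-coherent powerpc, that the cache maintenance hook implies
a barrier, and shows that the hook-only shape issues no barrier at all
when the hook is absent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum dma_dir { DMA_BIDIR, DMA_TO_DEV, DMA_FROM_DEV };

struct sketch_dma_ops {
	/* NULL models a coherent platform with no cache maintenance to do. */
	void (*sync_single_for_cpu)(void *buf, size_t size, enum dma_dir dir);
};

static int barriers_issued;	/* counter for illustration only */

static int valid_dir(enum dma_dir dir)
{
	return dir == DMA_BIDIR || dir == DMA_TO_DEV || dir == DMA_FROM_DEV;
}

/* Stand-in barrier: a full C11 fence, counted so callers can observe it. */
static void sketch_barrier(void)
{
	atomic_thread_fence(memory_order_seq_cst);
	barriers_issued++;
}

/* Models a non-coherent hook; cache maintenance is ASSUMED to imply a barrier. */
static void noncoherent_sync(void *buf, size_t size, enum dma_dir dir)
{
	(void)buf; (void)size; (void)dir;
	sketch_barrier();
}

/* Same shape as the generic helper quoted above: no hook, no barrier. */
static void sync_for_cpu_hook_only(const struct sketch_dma_ops *ops,
				   void *buf, size_t size, enum dma_dir dir)
{
	assert(valid_dir(dir));
	if (ops->sync_single_for_cpu)
		ops->sync_single_for_cpu(buf, size, dir);
}

/* One possible alternative: fall back to an explicit barrier without a hook. */
static void sync_for_cpu_with_barrier(const struct sketch_dma_ops *ops,
				      void *buf, size_t size, enum dma_dir dir)
{
	assert(valid_dir(dir));
	if (ops->sync_single_for_cpu)
		ops->sync_single_for_cpu(buf, size, dir);
	else
		sketch_barrier();
}
```

Whether the real dma_sync_* calls are supposed to provide such a barrier
is exactly the open question here; the second helper only illustrates
what "expected to be a memory barrier" would require on a platform
without sync hooks.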

Now, in the case at hand, which is my ARM-based NAS, I believe the
platform is not cache-coherent and thus uses cache flush ops. I don't
know ARM well enough, but I would expect these to be implicit barriers.
Russell? Nico?

I.e., you may have found a bug here, though I don't know whether it's
the bug we are hitting right now :-)

Cheers,
Ben.
