linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
@ 2010-05-11  9:23 Benjamin Herrenschmidt
  2010-05-11 10:16 ` Jamie Lokier
  2010-05-11 10:57 ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 22+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-11  9:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arm-kernel, Saeed Bishara, Nicolas Pitre, linux-ext4,
	Andrew Morton, James E.J. Bottomley

Hi folks !

So I've been adapting upstream to my little D-Link DNS323 NAS box, which
is based on a Marvell 88f5182 chipset, ie, orion5 family) and started
hitting a problem with both ext3 and ext4 (on top of md/raid1). 

First, the setup: This is a VIVT cache ARM CPU (so it -could- be some
cache issues though it's a bit weird as I would expect metadata to be
normal kernel pages and thus not hit cache aliases but then I'm no
specialist of how we deal with those critters).

Linux version 2.6.34-rc7-00019-g6c4f192-dirty (benh@pasglop) (gcc version 4.4.0 (GCC) ) #6 PREEMPT Tue May 11 17:02:04 EST 2010
CPU: Feroceon [41069260] revision 0 (ARMv5TEJ), cr=a0053177
CPU: VIVT data cache, VIVT instruction cache

There's two 1T disk using the built-in SATA chipset. There's an almost-1T
partition on each (sda1 and sdb1) which are setup as a raid1 md.

Then, I create a filesystem (I started with ext4 and mkfs'ed it back to ext3
after I started having problems but things persist).

I'm booted off an nfs root and basically the problem happens when I rsync over
a pre-made root to the disks.

On a freshly mkfs'ed ext3 or 4, mounted in /mnt/raid, I rsync the thing over,
used it a bit and generally observe the first symptoms in the form of a few

EXT3-fs (md0): warning: ext3_rename: Deleting old file (34103297), 2, error=-2
EXT3-fs (md0): warning: ext3_rename: Deleting old file (34113314), 2, error=-2
EXT3-fs (md0): warning: ext3_rename: Deleting old file (34127945), 2, error=-2

in my kernel log. I had a very similar message with ext4 iirc.

I unmount the filesystem and fsck it (it takes almost 1h) and I then get
a bunch of:

Problem in HTREE directory inode 5005385: node (12) has bad max hash
Problem in HTREE directory inode 5005385: node (13) has bad max hash
Problem in HTREE directory inode 5005385: node (14) has bad max hash
Invalid HTREE directory inode 5005385 (/raid-foo/var/lib/dpkg/info).  Clear HTree index<y>? yes

followed by a bunch of

Inode 4981405 ref count is 1, should be 2.  Fix<y>? yes                        

This is reasonably reproducable, as I have re-done it twice and I get similar
errors.

Tomorrow, if time permits, I'll see if I can reproduce on a smaller partition
without md, and eventually narrow it to a smaller and more predictible set of
operations.

Since I doubt ext3 is busted so dramatically in mainline for "normal" machines,
I tend to suspect things could be related to the infamous vivt caches. On the
other hand, it's pretty clearly metadata or journal corruption and I'm not
sure we ever do things that could cause aliases (such as vmap etc..) on
these things, and they shouldn't be mapped into userspace... unless it's fsck
itself that causes aliases to occur at the block device level ? (I do unmount
though before I run fsck).

On the other hand, it could also be a busticated marvell SATA driver :-)

I have no problem with the vendor kernel, but it's ancient (2.6.12) and based
on an out of tree variant of a Marvell originated BSP, so everything is
completely different, especially in the area of drivers for the chipset.

Anyways, I'll see if I can gather more data tomorrow as time, viruses and sick
kids permits.

In the meantime, any hint appreciated.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-11  9:23 Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182) Benjamin Herrenschmidt
@ 2010-05-11 10:16 ` Jamie Lokier
  2010-05-11 10:47   ` Benjamin Herrenschmidt
  2010-05-11 10:57 ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 22+ messages in thread
From: Jamie Lokier @ 2010-05-11 10:16 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linux-kernel, Nicolas Pitre, Saeed Bishara, James E.J. Bottomley,
	Andrew Morton, linux-ext4, linux-arm-kernel

Benjamin Herrenschmidt wrote:
> On the other hand, it could also be a busticated marvell SATA driver :-)

I've seen similar ext3 corruption errors with an IDE on a Sigma
Designs EM8620 with ARM core.  It's completely unrelated :-) But the
fault was in the IDE driver doing cache flushes too late after
starting to set up DMA.  To diagnose it, we luckily found a
reproducible test case, then sprinkled full cache flushes at various
places to narrow it down, and eventually found that one full cache
flush in the driver itself sorted it out, which told us where the
fault was.  From there it was easy.

-- Jamie

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-11 10:16 ` Jamie Lokier
@ 2010-05-11 10:47   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 22+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-11 10:47 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Saeed Bishara, Nicolas Pitre, linux-kernel, James E.J. Bottomley,
	Andrew Morton, linux-ext4, linux-arm-kernel

On Tue, 2010-05-11 at 11:16 +0100, Jamie Lokier wrote:
> 
> I've seen similar ext3 corruption errors with an IDE on a Sigma
> Designs EM8620 with ARM core.  It's completely unrelated :-) But the
> fault was in the IDE driver doing cache flushes too late after
> starting to set up DMA.  To diagnose it, we luckily found a
> reproducible test case, then sprinkled full cache flushes at various
> places to narrow it down, and eventually found that one full cache
> flush in the driver itself sorted it out, which told us where the
> fault was.  From there it was easy. 

Ok, well, that's -a- possibility... we'll see. I'm doing some more tests
to try to characterize the problem a bit better, then we can play with
the driver. I think it's also worth trying with earlier kernels in case
it's some kind of regression, though that means I'll have to backport my
board support for that little box :-)

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-11  9:23 Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182) Benjamin Herrenschmidt
  2010-05-11 10:16 ` Jamie Lokier
@ 2010-05-11 10:57 ` Benjamin Herrenschmidt
  2010-05-11 11:14   ` Shilimkar, Santosh
  2010-05-12 15:00   ` Jan Kara
  1 sibling, 2 replies; 22+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-11 10:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arm-kernel, Saeed Bishara, Nicolas Pitre, linux-ext4,
	Andrew Morton, James E.J. Bottomley

On Tue, 2010-05-11 at 19:23 +1000, Benjamin Herrenschmidt wrote:

> Since I doubt ext3 is busted so dramatically in mainline for "normal" machines,
> I tend to suspect things could be related to the infamous vivt caches. On the
> other hand, it's pretty clearly metadata or journal corruption and I'm not
> sure we ever do things that could cause aliases (such as vmap etc..) on
> these things, and they shouldn't be mapped into userspace... unless it's fsck
> itself that causes aliases to occur at the block device level ? (I do unmount
> though before I run fsck).
> 
> On the other hand, it could also be a busticated marvell SATA driver :-)
> 
> I have no problem with the vendor kernel, but it's ancient (2.6.12) and based
> on an out of tree variant of a Marvell originated BSP, so everything is
> completely different, especially in the area of drivers for the chipset.
> 
> Anyways, I'll see if I can gather more data tomorrow as time, viruses and sick
> kids permits.
> 
> In the meantime, any hint appreciated.

A quick other test which brings more infos, using a smaller (about 5GB)
partition and no md or raid involved:

 - Boot with NFS root
 - mkfs /dev/sdb2 (no md or raid involved)
 - mount /dev/sdb2 /mnt/test
 - rsync -avx /test-stuff /mnt/test
 - cd /mnt/test
 - md5sum -c ~/test-stuff-sums.txt

That gives me a whole bunch of:

md5sum: ./usr/bin/debconf-escape: No such file or directory
./usr/bin/debconf-escape: FAILED open or read
./usr/bin/stat: OK
md5sum: ./usr/bin/chrt: No such file or directory
./usr/bin/chrt: FAILED open or read

In fact, if I do ls /mnt/test/usr/bin/ I see debconf but if I do
ls /mnt/test/usr/bin/chrt then I get No such file or directory.

So something is badly wrong :-)

Now, trying without the dir_index feature (mkfs.ext3 -O ^dir_index)
and it works fine. All my md5sum's are correct and fsck passes.

So there's what looks like a problem specific to htree's. I don't think
it's a SATA driver problem (doesn't smell like it but we can't
completely dismiss the possibility yet). Could be a VIVT issue but then
why ? I don't see ext3 playing with virtual mappings and none of that
should alias with userspace...

Or is it incorrectly accessing pages while they are DMA'ed to or from ?
IE. Accessing with the CPU pages between dma_map_* and dma_unmap_* ?
That will break on a number of setups including swiotlb on x86 so I tend
to doubt it but who knows...

Anyways, enough for tonight.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-11 10:57 ` Benjamin Herrenschmidt
@ 2010-05-11 11:14   ` Shilimkar, Santosh
  2010-05-12 22:21     ` Jamie Lokier
  2010-05-12 15:00   ` Jan Kara
  1 sibling, 1 reply; 22+ messages in thread
From: Shilimkar, Santosh @ 2010-05-11 11:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linux-kernel@vger.kernel.org
  Cc: Nicolas Pitre, Saeed Bishara, James E.J. Bottomley, Andrew Morton,
	linux-ext4@vger.kernel.org, linux-arm-kernel@lists.infradead.org

> -----Original Message-----
> From: linux-arm-kernel-bounces@lists.infradead.org [mailto:linux-arm-kernel-
> bounces@lists.infradead.org] On Behalf Of Benjamin Herrenschmidt
> Sent: Tuesday, May 11, 2010 4:28 PM
> To: linux-kernel@vger.kernel.org
> Cc: Nicolas Pitre; Saeed Bishara; James E.J. Bottomley; Andrew Morton; linux-ext4@vger.kernel.org;
> linux-arm-kernel@lists.infradead.org
> Subject: Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
> 
> On Tue, 2010-05-11 at 19:23 +1000, Benjamin Herrenschmidt wrote:
> 
> > Since I doubt ext3 is busted so dramatically in mainline for "normal" machines,
> > I tend to suspect things could be related to the infamous vivt caches. On the
> > other hand, it's pretty clearly metadata or journal corruption and I'm not
> > sure we ever do things that could cause aliases (such as vmap etc..) on
> > these things, and they shouldn't be mapped into userspace... unless it's fsck
> > itself that causes aliases to occur at the block device level ? (I do unmount
> > though before I run fsck).
> >
> > On the other hand, it could also be a busticated marvell SATA driver :-)
> >
> > I have no problem with the vendor kernel, but it's ancient (2.6.12) and based
> > on an out of tree variant of a Marvell originated BSP, so everything is
> > completely different, especially in the area of drivers for the chipset.
> >
> > Anyways, I'll see if I can gather more data tomorrow as time, viruses and sick
> > kids permits.
> >
> > In the meantime, any hint appreciated.
> 
> A quick other test which brings more infos, using a smaller (about 5GB)
> partition and no md or raid involved:
> 
>  - Boot with NFS root
>  - mkfs /dev/sdb2 (no md or raid involved)
>  - mount /dev/sdb2 /mnt/test
>  - rsync -avx /test-stuff /mnt/test
>  - cd /mnt/test
>  - md5sum -c ~/test-stuff-sums.txt
> 
> That gives me a whole bunch of:
> 
> md5sum: ./usr/bin/debconf-escape: No such file or directory
> ./usr/bin/debconf-escape: FAILED open or read
> ./usr/bin/stat: OK
> md5sum: ./usr/bin/chrt: No such file or directory
> ./usr/bin/chrt: FAILED open or read
> 
> In fact, if I do ls /mnt/test/usr/bin/ I see debconf but if I do
> ls /mnt/test/usr/bin/chrt then I get No such file or directory.
> 
> So something is badly wrong :-)
> 
> Now, trying without the dir_index feature (mkfs.ext3 -O ^dir_index)
> and it works fine. All my md5sum's are correct and fsck passes.
> 
> So there's what looks like a problem specific to htree's. I don't think
> it's a SATA driver problem (doesn't smell like it but we can't
> completely dismiss the possibility yet). Could be a VIVT issue but then
> why ? I don't see ext3 playing with virtual mappings and none of that
> should alias with userspace...
> 
> Or is it incorrectly accessing pages while they are DMA'ed to or from ?
> IE. Accessing with the CPU pages between dma_map_* and dma_unmap_* ?
> That will break on a number of setups including swiotlb on x86 so I tend
> to doubt it but who knows...

Not sure if it is related but we saw similar issue with MMC driver on
Cortex-A9 MP core. I was seeing similar weird issues with EXT3 and 
everything use to be fine with EXT2.

All those errors you mentioned use to come and at times whole FS
Use to get corrupted.

It was root-caused to MMC driver which uses DMA engine for TX/RX. The
issue was mainly with writes and hence EXT3 was having issue which does
journaling and hence more writes.
There was a memory write barrier missing before the DMA descriptors 
are handed over to DMA controller.

Regards,
Santosh


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-11 10:57 ` Benjamin Herrenschmidt
  2010-05-11 11:14   ` Shilimkar, Santosh
@ 2010-05-12 15:00   ` Jan Kara
  2010-05-12 22:13     ` Benjamin Herrenschmidt
  2010-05-13  0:15     ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 22+ messages in thread
From: Jan Kara @ 2010-05-12 15:00 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linux-kernel, linux-arm-kernel, Saeed Bishara, Nicolas Pitre,
	linux-ext4, Andrew Morton, James E.J. Bottomley

> On Tue, 2010-05-11 at 19:23 +1000, Benjamin Herrenschmidt wrote:
> 
> > Since I doubt ext3 is busted so dramatically in mainline for "normal" machines,
> > I tend to suspect things could be related to the infamous vivt caches. On the
> > other hand, it's pretty clearly metadata or journal corruption and I'm not
> > sure we ever do things that could cause aliases (such as vmap etc..) on
> > these things, and they shouldn't be mapped into userspace... unless it's fsck
> > itself that causes aliases to occur at the block device level ? (I do unmount
> > though before I run fsck).
> > 
> > On the other hand, it could also be a busticated marvell SATA driver :-)
> > 
> > I have no problem with the vendor kernel, but it's ancient (2.6.12) and based
> > on an out of tree variant of a Marvell originated BSP, so everything is
> > completely different, especially in the area of drivers for the chipset.
> > 
> > Anyways, I'll see if I can gather more data tomorrow as time, viruses and sick
> > kids permits.
> > 
> > In the meantime, any hint appreciated.
> 
> A quick other test which brings more infos, using a smaller (about 5GB)
> partition and no md or raid involved:
> 
>  - Boot with NFS root
>  - mkfs /dev/sdb2 (no md or raid involved)
>  - mount /dev/sdb2 /mnt/test
>  - rsync -avx /test-stuff /mnt/test
>  - cd /mnt/test
>  - md5sum -c ~/test-stuff-sums.txt
> 
> That gives me a whole bunch of:
> 
> md5sum: ./usr/bin/debconf-escape: No such file or directory
> ./usr/bin/debconf-escape: FAILED open or read
> ./usr/bin/stat: OK
> md5sum: ./usr/bin/chrt: No such file or directory
> ./usr/bin/chrt: FAILED open or read
  Could you get the filesystem image with: e2image -r /dev/sdb2 buggy-image
bzip2 it and make it available somewhere? Maybe I could guess something
from the way the filesystem gets corrupted.
  Oh, and also overwrite the partition with zeros before calling mkfs to make
the analysis simpler.

> In fact, if I do ls /mnt/test/usr/bin/ I see debconf but if I do
> ls /mnt/test/usr/bin/chrt then I get No such file or directory.
> 
> So something is badly wrong :-)
> 
> Now, trying without the dir_index feature (mkfs.ext3 -O ^dir_index)
> and it works fine. All my md5sum's are correct and fsck passes.
  Funny. Not sure how that could happen...

								Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-12 15:00   ` Jan Kara
@ 2010-05-12 22:13     ` Benjamin Herrenschmidt
  2010-05-13  0:15     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 22+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-12 22:13 UTC (permalink / raw)
  To: Jan Kara
  Cc: Nicolas Pitre, linux-kernel, Saeed Bishara, Andrew Morton,
	linux-ext4, James E.J. Bottomley, linux-arm-kernel


>   Could you get the filesystem image with: e2image -r /dev/sdb2 buggy-image
> bzip2 it and make it available somewhere? Maybe I could guess something
> from the way the filesystem gets corrupted.
>   Oh, and also overwrite the partition with zeros before calling mkfs to make
> the analysis simpler.

Will do asap.

> > In fact, if I do ls /mnt/test/usr/bin/ I see debconf but if I do
> > ls /mnt/test/usr/bin/chrt then I get No such file or directory.
> > 
> > So something is badly wrong :-)
> > 
> > Now, trying without the dir_index feature (mkfs.ext3 -O ^dir_index)
> > and it works fine. All my md5sum's are correct and fsck passes.
>   Funny. Not sure how that could happen...

Yeah, strange. I looked at the code a bit and I don't see anything
obvious, ext3 seems to be using the same standard buffer head access
methods for the htree as for the rest of the metadata, and I see no
fancy playing with virtual addresses that could explain a VIVT problem.

I could be an issue with the SATA controller that gets more easily
triggered by the access patterns caused by htree, though that's a bit
strange.

I'll let you know when I have something to look at.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-11 11:14   ` Shilimkar, Santosh
@ 2010-05-12 22:21     ` Jamie Lokier
  2010-05-12 22:47       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 22+ messages in thread
From: Jamie Lokier @ 2010-05-12 22:21 UTC (permalink / raw)
  To: Shilimkar, Santosh
  Cc: Benjamin Herrenschmidt, linux-kernel@vger.kernel.org,
	Nicolas Pitre, Saeed Bishara, James E.J. Bottomley, Andrew Morton,
	linux-ext4@vger.kernel.org, linux-arm-kernel@lists.infradead.org

Shilimkar, Santosh wrote:
> There was a memory write barrier missing before the DMA descriptors 
> are handed over to DMA controller.

On that note, are the cache flush functions implicit memory barriers?

-- Jamie

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-12 22:21     ` Jamie Lokier
@ 2010-05-12 22:47       ` Benjamin Herrenschmidt
  2010-05-12 23:08         ` Russell King - ARM Linux
                           ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-12 22:47 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Saeed Bishara, Nicolas Pitre, linux-kernel@vger.kernel.org,
	James E.J. Bottomley, FUJITA Tomonori, Shilimkar, Santosh,
	Andrew Morton, linux-ext4@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org

On Wed, 2010-05-12 at 23:21 +0100, Jamie Lokier wrote:
> Shilimkar, Santosh wrote:
> > There was a memory write barrier missing before the DMA descriptors 
> > are handed over to DMA controller.
> 
> On that note, are the cache flush functions implicit memory barriers?

(Adding Fujita on CC)

That's a very good question. The generic inline implementation of
dma_sync_* is:

static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
					   size_t size,
					   enum dma_data_direction dir)
{
	struct dma_map_ops *ops = get_dma_ops(dev);

	BUG_ON(!valid_dma_direction(dir));
	if (ops->sync_single_for_cpu)
		ops->sync_single_for_cpu(dev, addr, size, dir);
	debug_dma_sync_single_for_cpu(dev, addr, size, dir);
}

Which means that for coherent architectures that do not implement
the ops->sync_* hooks, we are probably missing a barrier here... 

Thus if the above is expected to be a memory barrier, it's broken on
cache coherent powerpc for example. On non-coherent powerpc, we do cache
flushes and those are implicit barriers.

Now, in the case at hand, which is my ARM based NAS, I believe this
is non cache-coherent and thus uses cache flush ops. I don't know ARM
well enough but I would expect these to be implicit barriers. Russell ?
Nico ?

IE. You may have found a bug here though I don't know whether it's the
bug we are hitting right now :-)

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-12 22:47       ` Benjamin Herrenschmidt
@ 2010-05-12 23:08         ` Russell King - ARM Linux
  2010-05-14 17:41           ` Jamie Lokier
  2010-05-12 23:41         ` James Bottomley
  2010-05-13  3:12         ` FUJITA Tomonori
  2 siblings, 1 reply; 22+ messages in thread
From: Russell King - ARM Linux @ 2010-05-12 23:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jamie Lokier, Saeed Bishara, Nicolas Pitre,
	linux-kernel@vger.kernel.org, James E.J. Bottomley,
	FUJITA Tomonori, Shilimkar, Santosh, Andrew Morton,
	linux-ext4@vger.kernel.org, linux-arm-kernel@lists.infradead.org

On Thu, May 13, 2010 at 08:47:11AM +1000, Benjamin Herrenschmidt wrote:
> Now, in the case at hand, which is my ARM based NAS, I believe this
> is non cache-coherent and thus uses cache flush ops. I don't know ARM
> well enough but I would expect these to be implicit barriers. Russell ?
> Nico ?

ARMv5 doesn't have a weak memory ordering model, and doesn't have any
memory barrier instructions.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-12 22:47       ` Benjamin Herrenschmidt
  2010-05-12 23:08         ` Russell King - ARM Linux
@ 2010-05-12 23:41         ` James Bottomley
  2010-05-13  0:18           ` Benjamin Herrenschmidt
  2010-05-13  3:12         ` FUJITA Tomonori
  2 siblings, 1 reply; 22+ messages in thread
From: James Bottomley @ 2010-05-12 23:41 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jamie Lokier, Shilimkar, Santosh, linux-ext4@vger.kernel.org,
	Nicolas Pitre, linux-kernel@vger.kernel.org, James E.J. Bottomley,
	Andrew Morton, Saeed Bishara,
	linux-arm-kernel@lists.infradead.org, FUJITA Tomonori

On Thu, 2010-05-13 at 08:47 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2010-05-12 at 23:21 +0100, Jamie Lokier wrote:
> > Shilimkar, Santosh wrote:
> > > There was a memory write barrier missing before the DMA descriptors 
> > > are handed over to DMA controller.
> > 
> > On that note, are the cache flush functions implicit memory barriers?

Not exactly ... they *should* be stream ordered with respect to accesses
to the memory they're flushing (which isn't the same thing, and no-one
ever went broke overestimating the stupidity of chip designers, but if a
flush instruction needs explicit ordering, I'd expect that to be built
into the arch layer).

> (Adding Fujita on CC)
> 
> That's a very good question. The generic inline implementation of
> dma_sync_* is:
> 
> static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
> 					   size_t size,
> 					   enum dma_data_direction dir)
> {
> 	struct dma_map_ops *ops = get_dma_ops(dev);
> 
> 	BUG_ON(!valid_dma_direction(dir));
> 	if (ops->sync_single_for_cpu)
> 		ops->sync_single_for_cpu(dev, addr, size, dir);
> 	debug_dma_sync_single_for_cpu(dev, addr, size, dir);
> }
> 
> Which means that for coherent architectures that do not implement
> the ops->sync_* hooks, we are probably missing a barrier here... 
> 
> Thus if the above is expected to be a memory barrier, it's broken on
> cache coherent powerpc for example. On non-coherent powerpc, we do cache
> flushes and those are implicit barriers.

Can you explain this a little more.  On a cache coherent machine, the
sync is a nop ... why would you want a nop to be any type of barrier?

James



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-12 15:00   ` Jan Kara
  2010-05-12 22:13     ` Benjamin Herrenschmidt
@ 2010-05-13  0:15     ` Benjamin Herrenschmidt
  2010-05-13 15:12       ` Jan Kara
  1 sibling, 1 reply; 22+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-13  0:15 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-kernel, linux-arm-kernel, Saeed Bishara, Nicolas Pitre,
	linux-ext4, Andrew Morton, James E.J. Bottomley

On Wed, 2010-05-12 at 17:00 +0200, Jan Kara wrote:

>  Could you get the filesystem image with: e2image -r /dev/sdb2 buggy-image
> bzip2 it and make it available somewhere? Maybe I could guess something
> from the way the filesystem gets corrupted.
>  Oh, and also overwrite the partition with zeros before calling mkfs to make
> the analysis simpler.

Here we go.

Created a smaller partition (100MB). dd'ed /dev/zero to it, dd'ed it off
to NFS host to locally cmp it with /dev/zero to make sure it's nice and
clean. It is. Then mkfs.ext3, mount, rsync over /usr/lib of my test
root, unmount, and dd it off to the NFS host again.

The result shows the same errors with fsck.ext3 -n -f ./test.img

I uploaded it at:

http://gate.crashing.org/~benh/test.img.bz2

Cheers,
Ben.

> > In fact, if I do ls /mnt/test/usr/bin/ I see debconf but if I do
> > ls /mnt/test/usr/bin/chrt then I get No such file or directory.
> > 
> > So something is badly wrong :-)
> > 
> > Now, trying without the dir_index feature (mkfs.ext3 -O ^dir_index)
> > and it works fine. All my md5sum's are correct and fsck passes.
>   Funny. Not sure how that could happen...
> 
> 								Honza



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-12 23:41         ` James Bottomley
@ 2010-05-13  0:18           ` Benjamin Herrenschmidt
  2010-05-13 15:39             ` James Bottomley
  0 siblings, 1 reply; 22+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-13  0:18 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-kernel@vger.kernel.org, Nicolas Pitre, Jamie Lokier,
	Saeed Bishara, James E.J. Bottomley, FUJITA Tomonori,
	Shilimkar, Santosh, Andrew Morton, linux-ext4@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org

On Wed, 2010-05-12 at 18:41 -0500, James Bottomley wrote:
> > Which means that for coherent architectures that do not implement
> > the ops->sync_* hooks, we are probably missing a barrier here... 
> > 
> > Thus if the above is expected to be a memory barrier, it's broken on
> > cache coherent powerpc for example. On non-coherent powerpc, we do
> cache
> > flushes and those are implicit barriers.
> 
> Can you explain this a little more.  On a cache coherent machine, the
> sync is a nop ... why would you want a nop to be any type of barrier?

Well if the driver can peek at the data after the sync, and have any
kind of ordering guarantee that it doesn't get stale data (the load
isn't prefetched or speculated early), that would require an mb() or at
least rmb().

It would seem sensible for drivers to assume that something like
dma_cache_sync_for_cpu() thus has the semantics of an rmb() at least,
no ?

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-12 22:47       ` Benjamin Herrenschmidt
  2010-05-12 23:08         ` Russell King - ARM Linux
  2010-05-12 23:41         ` James Bottomley
@ 2010-05-13  3:12         ` FUJITA Tomonori
  2010-05-13  4:42           ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 22+ messages in thread
From: FUJITA Tomonori @ 2010-05-13  3:12 UTC (permalink / raw)
  To: benh
  Cc: jamie, santosh.shilimkar, linux-ext4, nico, linux-kernel, jejb,
	akpm, saeed, linux-arm-kernel, fujita.tomonori

On Thu, 13 May 2010 08:47:11 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Wed, 2010-05-12 at 23:21 +0100, Jamie Lokier wrote:
> > Shilimkar, Santosh wrote:
> > > There was a memory write barrier missing before the DMA descriptors 
> > > are handed over to DMA controller.
> > 
> > On that note, are the cache flush functions implicit memory barriers?
> 
> (Adding Fujita on CC)
> 
> That's a very good question. The generic inline implementation of
> dma_sync_* is:
> 
> static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
> 					   size_t size,
> 					   enum dma_data_direction dir)
> {
> 	struct dma_map_ops *ops = get_dma_ops(dev);
> 
> 	BUG_ON(!valid_dma_direction(dir));
> 	if (ops->sync_single_for_cpu)
> 		ops->sync_single_for_cpu(dev, addr, size, dir);
> 	debug_dma_sync_single_for_cpu(dev, addr, size, dir);
> }
> 
> Which means that for coherent architectures that do not implement
> the ops->sync_* hooks, we are probably missing a barrier here... 
> 
> Thus if the above is expected to be a memory barrier, it's broken on
> cache coherent powerpc for example. On non-coherent powerpc, we do cache
> flushes and those are implicit barriers.

X86 OOSTORE uses a memory barrier dma_sync_single_for_device (seems
that some mips archs also use it and do cache operations).

I think that the DMA-API says that

- dma_sync_single_for_device() makes sure the data ready for DMA.

- dma_sync_single_for_cpu() makes sure that drivers doesn't get the
  stale data after DMA.

I guess, it means if an architecture need a memory barrier (not only
cache operations) to guarantee the above, the architecture needs to
take care of it.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-13  3:12         ` FUJITA Tomonori
@ 2010-05-13  4:42           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 22+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-13  4:42 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: linux-kernel, nico, jamie, saeed, jejb, santosh.shilimkar, akpm,
	linux-ext4, linux-arm-kernel

On Thu, 2010-05-13 at 12:12 +0900, FUJITA Tomonori wrote:
> X86 OOSTORE uses a memory barrier dma_sync_single_for_device (seems
> that some mips archs also use it and do cache operations).
> 
> I think that the DMA-API says that
> 
> - dma_sync_single_for_device() makes sure the data ready for DMA.
> 
> - dma_sync_single_for_cpu() makes sure that drivers doesn't get the
>   stale data after DMA.
> 
> I guess, it means if an architecture need a memory barrier (not only
> cache operations) to guarantee the above, the architecture needs to
> take care of it. 

Right, but doing an indirect function pointer call for a memory barrier
in what can be a pretty hot code path doesn't sound like a great idea
if it can be avoided. I'd rather we define some inline that the arch
can stick in there if it wishes so.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-13  0:15     ` Benjamin Herrenschmidt
@ 2010-05-13 15:12       ` Jan Kara
  2010-05-13 21:33         ` Benjamin Herrenschmidt
  2010-05-13 23:51         ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 22+ messages in thread
From: Jan Kara @ 2010-05-13 15:12 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jan Kara, linux-kernel, linux-arm-kernel, Saeed Bishara,
	Nicolas Pitre, linux-ext4, Andrew Morton, James E.J. Bottomley

On Thu 13-05-10 10:15:14, Benjamin Herrenschmidt wrote:
> On Wed, 2010-05-12 at 17:00 +0200, Jan Kara wrote:
> 
> >  Could you get the filesystem image with: e2image -r /dev/sdb2 buggy-image
> > bzip2 it and make it available somewhere? Maybe I could guess something
> > from the way the filesystem gets corrupted.
> >  Oh, and also overwrite the partition with zeros before calling mkfs to make
> > the analysis simpler.
> 
> Here we go.
> 
> Created a smaller partition (100MB). dd'ed /dev/zero to it, dd'ed it off
> to NFS host to locally cmp it with /dev/zero to make sure it's nice and
> clean. It is. Then mkfs.ext3, mount, rsync over /usr/lib of my test
> root, unmount, and dd it off to the NFS host again.
> 
> The result shows the same errors with fsck.ext3 -n -f ./test.img
> 
> I uploaded it at:
> 
> http://gate.crashing.org/~benh/test.img.bz2
  Thanks. I had a look at it. It seems that it's really the HTREE code
that's causing problems. When a directory is using HTREEs, it should contain
directory entries in some hash range in each block. Say entries with hash
0-0xabcd1234 in block 0, entries with hash 0xabcd1235-0xffffffff in block
1. But the corrupted directories do not obey this rule. In particular e.g.
block 1 should contain entries with hashes 0-0x2fac6780 but it contains
lots of entries with hash larger than 0x2fac6780. I've spent quite some
time looking into the code and I'm not yet sure how that could happen.
  My first suspect would be code splitting blocks when a block gets full.
The code allocates new block for a directory, then inside the buffer
creates an array of

struct dx_map_entry
{
        u32 hash;
        u16 offs;
        u16 size;
};

entries describing entries in the block to split. Then this array is
sorted, we find a place to split and copy appropriate entries to the new
block and compact the old block.
  In the binary dump of one of corrupted directories, I've found the built
array:
000e70 00 00 00 00 00 00 00 00 16 bd f8 01 f0 03 10 00
000e80 7a b1 0d 05 e0 03 10 00 2e 19 bb 08 cc 03 14 00
000e90 60 bd bf 09 e0 03 10 00 a6 79 5d 0a b0 03 0c 00
000ea0 08 5a 1b 0b a0 03 10 00 ac b6 15 0d a0 03 10 00
000eb0 d8 3e cc 0f 84 03 10 00 8c 0f f1 27 74 03 10 00
000ec0 24 09 85 2f 60 03 14 00 b8 61 c3 35 60 03 14 00
000ed0 c4 27 62 37 60 03 14 00 7c d9 07 39 60 03 14 00
000ee0 80 d6 70 3a 60 03 14 00 28 49 0e 3b 94 03 0c 00
000ef0 fe c6 81 3f 94 03 0c 00 78 b8 1e 40 b0 03 0c 00
000f00 2a 3e 04 42 b0 03 0c 00 82 0a 81 46 b0 03 0c 00
000f10 bc 2a 87 47 b0 03 0c 00 ea 9e f3 47 b0 03 0c 00
000f20 20 ac 28 48 bc 02 18 00 ac 20 f2 4c 5c 02 18 00
000f30 62 10 1d 54 bc 02 18 00 5e f9 f6 57 b0 03 0c 00
000f40 78 d4 58 5e 5c 02 18 00 da ab 8f 62 5c 02 18 00
000f50 b0 15 98 68 e4 01 18 00 2e a4 15 79 5c 02 18 00
000f60 b2 d9 16 79 bc 02 18 00 3e eb fe 7a a0 01 18 00
000f70 40 fa c2 81 a0 01 18 00 a0 c3 79 84 a0 01 18 00
000f80 3a cc a8 85 a0 01 18 00 aa 10 53 87 44 01 18 00
000f90 78 33 b0 8a 44 01 18 00 f4 da a1 8b 8c 02 18 00
000fa0 3e 6a d6 8c 10 02 1c 00 2a 27 17 8e 44 01 18 00
000fb0 c0 4a 91 90 10 02 1c 00 b2 61 6c 9a 70 01 14 00
000fc0 e0 ef eb a3 8c 02 18 00 c2 e3 00 a5 8c 02 18 00
000fd0 c0 5d 6e b6 5c 02 18 00 a8 1c 6e b7 5c 02 18 00
000fe0 fc 45 ea ba 40 00 14 00 52 d5 21 be a0 01 18 00
000ff0 4e 66 55 d1 a0 01 18 00 b0 35 b4 d3 8c 02 18 00
<next directory block starts>

If you look at the array more in detail, you'll notice that 'offs' part of
structure is sometimes identical. That should never happen because 'offs'
contains offset of the corresponding directory entry in a block. So when
offsets are identical in this array, subsequent move will copy some entries
several times and leave entries that should be moved in the old block,
resulting in a corruption we see.
  The question is, how could offsets be the same? dx_make_map seems to get
it right and dx_sort_map as well. Maybe I'd peek into disassembly of
dx_sort_map to see whether swap() macro does what it should... If that
looks OK, you could try adding some debug checks into dx_sort_map and try
to catch the moment when duplicate offsets are created...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-13  0:18           ` Benjamin Herrenschmidt
@ 2010-05-13 15:39             ` James Bottomley
  2010-05-13 23:53               ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 22+ messages in thread
From: James Bottomley @ 2010-05-13 15:39 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Saeed Bishara, Nicolas Pitre, Jamie Lokier,
	linux-kernel@vger.kernel.org, James E.J. Bottomley,
	FUJITA Tomonori, Shilimkar, Santosh, Andrew Morton,
	linux-ext4@vger.kernel.org, linux-arm-kernel@lists.infradead.org

On Thu, 2010-05-13 at 10:18 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2010-05-12 at 18:41 -0500, James Bottomley wrote:
> > > Which means that for coherent architectures that do not implement
> > > the ops->sync_* hooks, we are probably missing a barrier here... 
> > > 
> > > Thus if the above is expected to be a memory barrier, it's broken on
> > > cache coherent powerpc for example. On non-coherent powerpc, we do
> > cache
> > > flushes and those are implicit barriers.
> > 
> > Can you explain this a little more.  On a cache coherent machine, the
> > sync is a nop ... why would you want a nop to be any type of barrier?
> 
> Well if the driver can peek at the data after the sync, and have any
> kind of ordering guarantee that it doesn't get stale data (the load
> isn't prefetched or speculated early), that would require an mb() or at
> least rmb().

So the guarantee that it doesn't look at stale data after the sync on a
cache coherent machine means ordering the dma write to physical memory
with the subsequent cpu read ... no memory barrier can actually do that.
Usually this is done externally, by making sure the memory change is
visible before sending the irq that tells the driver it is there ... on
some numa systems, this can be a problem (hence the mmiowb/relaxed read
thing).

> It would seem sensible for drivers to assume that something like
> dma_cache_sync_for_cpu() thus has the semantics of an rmb() at least,
> no ?

I still don't see why ... I don't see how you'd ever get a read of the
area speculated before the event that tells the driver its OK to read
the memory.  In theory, I agree that it looks logical to require the
read never be speculated before the sync ... but in practice, I don't
see there ever being a problem with this since the sync isn't the event
that says the memory is safe to read.

James



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-13 15:12       ` Jan Kara
@ 2010-05-13 21:33         ` Benjamin Herrenschmidt
  2010-05-13 23:51         ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 22+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-13 21:33 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-kernel, linux-arm-kernel, Saeed Bishara, Nicolas Pitre,
	linux-ext4, Andrew Morton, James E.J. Bottomley

On Thu, 2010-05-13 at 17:12 +0200, Jan Kara wrote:

> If you look at the array more in detail, you'll notice that 'offs' part of
> structure is sometimes identical. That should never happen because 'offs'
> contains offset of the corresponding directory entry in a block. So when
> offsets are identical in this array, subsequent move will copy some entries
> several times and leave entries that should be moved in the old block,
> resulting in a corruption we see.
>   The question is, how could offsets be the same? dx_make_map seems to get
> it right and dx_sort_map as well. Maybe I'd peek into disassembly of
> dx_sort_map to see whether swap() macro does what it should... If that
> looks OK, you could try adding some debug checks into dx_sort_map and try
> to catch the moment when duplicate offsets are created...

Thanks. I'll have a look. It could be a miscompile indeed. My cross-gcc
is a 4.4.0, maybe that's the root of the problem.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-13 15:12       ` Jan Kara
  2010-05-13 21:33         ` Benjamin Herrenschmidt
@ 2010-05-13 23:51         ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 22+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-13 23:51 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-kernel, linux-arm-kernel, Saeed Bishara, Nicolas Pitre,
	linux-ext4, Andrew Morton, James E.J. Bottomley

On Thu, 2010-05-13 at 17:12 +0200, Jan Kara wrote:

> If you look at the array more in detail, you'll notice that 'offs' part of
> structure is sometimes identical. That should never happen because 'offs'
> contains offset of the corresponding directory entry in a block. So when
> offsets are identical in this array, subsequent move will copy some entries
> several times and leave entries that should be moved in the old block,
> resulting in a corruption we see.
>   The question is, how could offsets be the same? dx_make_map seems to get
> it right and dx_sort_map as well. Maybe I'd peek into disassembly of
> dx_sort_map to see whether swap() macro does what it should... If that
> looks OK, you could try adding some debug checks into dx_sort_map and try
> to catch the moment when duplicate offsets are created...

Ok so a very quick test with another compiler and the problem -appears-
gone. I'll investigate more in depth later tonight or this week-end.

Looks like it may all have been a false alarm, just don't use
gcc-4.4.0 :-)

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-13 15:39             ` James Bottomley
@ 2010-05-13 23:53               ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 22+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-13 23:53 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-ext4@vger.kernel.org, Nicolas Pitre, Jamie Lokier,
	linux-kernel@vger.kernel.org, James E.J. Bottomley,
	FUJITA Tomonori, Shilimkar, Santosh, Andrew Morton, Saeed Bishara,
	linux-arm-kernel@lists.infradead.org


> > Well if the driver can peek at the data after the sync, and have any
> > kind of ordering guarantee that it doesn't get stale data (the load
> > isn't prefetched or speculated early), that would require an mb() or at
> > least rmb().
> 
> So the guarantee that it doesn't look at stale data after the sync on a
> cache coherent machine means ordering the dma write to physical memory
> with the subsequent cpu read ... no memory barrier can actually do that.
> Usually this is done externally, by making sure the memory change is
> visible before sending the irq that tells the driver it is there ... on
> some numa systems, this can be a problem (hence the mmiowb/relaxed read
> thing).

Right. I was more thinking about something along the lines of the device
writes to some descriptor and then queues it up in a list. So we have
two different DMA data structures with a data dependency.

The question is whether the driver can expect to just do the sync, read
the queue, then sync again, then read the descriptor, or does it also
need an explicit rmb in between ?

Let's assume that the ordering is maintained at the DMA -> coherency
domain level, the question is will the sync op be guaranteed to toss
prefetch / speculative loads ? (IE. Guarantee the order of the two loads
done by the CPU) or do we need an explicit rmb.

> > It would seem sensible for drivers to assume that something like
> > dma_cache_sync_for_cpu() thus has the semantics of an rmb() at least,
> > no ?
> 
> I still don't see why ... I don't see how you'd ever get a read of the
> area speculated before the event that tells the driver its OK to read
> the memory.  In theory, I agree that it looks logical to require the
> read never be speculated before the sync ... but in practice, I don't
> see there ever being a problem with this since the sync isn't the event
> that says the memory is safe to read.

Cheers,
Ben.

> James
> 
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-12 23:08         ` Russell King - ARM Linux
@ 2010-05-14 17:41           ` Jamie Lokier
  2010-05-14 17:59             ` Russell King - ARM Linux
  0 siblings, 1 reply; 22+ messages in thread
From: Jamie Lokier @ 2010-05-14 17:41 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Benjamin Herrenschmidt, Saeed Bishara, Nicolas Pitre,
	linux-kernel@vger.kernel.org, James E.J. Bottomley,
	FUJITA Tomonori, Shilimkar, Santosh, Andrew Morton,
	linux-ext4@vger.kernel.org, linux-arm-kernel@lists.infradead.org

Russell King - ARM Linux wrote:
> On Thu, May 13, 2010 at 08:47:11AM +1000, Benjamin Herrenschmidt wrote:
> > Now, in the case at hand, which is my ARM based NAS, I believe this
> > is non cache-coherent and thus uses cache flush ops. I don't know ARM
> > well enough but I would expect these to be implicit barriers. Russell ?
> > Nico ?
> 
> ARMv5 doesn't have a weak memory ordering model, and doesn't have any
> memory barrier instructions.

It does have buffered writes, doesn't it?  Are they always flushed by
the cache flush ops?

-- Jamie

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182)
  2010-05-14 17:41           ` Jamie Lokier
@ 2010-05-14 17:59             ` Russell King - ARM Linux
  0 siblings, 0 replies; 22+ messages in thread
From: Russell King - ARM Linux @ 2010-05-14 17:59 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Benjamin Herrenschmidt, Saeed Bishara, Nicolas Pitre,
	linux-kernel@vger.kernel.org, James E.J. Bottomley,
	FUJITA Tomonori, Shilimkar, Santosh, Andrew Morton,
	linux-ext4@vger.kernel.org, linux-arm-kernel@lists.infradead.org

On Fri, May 14, 2010 at 06:41:53PM +0100, Jamie Lokier wrote:
> Russell King - ARM Linux wrote:
> > On Thu, May 13, 2010 at 08:47:11AM +1000, Benjamin Herrenschmidt wrote:
> > > Now, in the case at hand, which is my ARM based NAS, I believe this
> > > is non cache-coherent and thus uses cache flush ops. I don't know ARM
> > > well enough but I would expect these to be implicit barriers. Russell ?
> > > Nico ?
> > 
> > ARMv5 doesn't have a weak memory ordering model, and doesn't have any
> > memory barrier instructions.
> 
> It does have buffered writes, doesn't it?  Are they always flushed by
> the cache flush ops?

Surprisingly, for DMA it's something we've always done.  Odd that.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2010-05-14 17:59 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-11  9:23 Rampant ext3/4 corruption on 2.6.34-rc7 with VIVT ARM (Marvell 88f5182) Benjamin Herrenschmidt
2010-05-11 10:16 ` Jamie Lokier
2010-05-11 10:47   ` Benjamin Herrenschmidt
2010-05-11 10:57 ` Benjamin Herrenschmidt
2010-05-11 11:14   ` Shilimkar, Santosh
2010-05-12 22:21     ` Jamie Lokier
2010-05-12 22:47       ` Benjamin Herrenschmidt
2010-05-12 23:08         ` Russell King - ARM Linux
2010-05-14 17:41           ` Jamie Lokier
2010-05-14 17:59             ` Russell King - ARM Linux
2010-05-12 23:41         ` James Bottomley
2010-05-13  0:18           ` Benjamin Herrenschmidt
2010-05-13 15:39             ` James Bottomley
2010-05-13 23:53               ` Benjamin Herrenschmidt
2010-05-13  3:12         ` FUJITA Tomonori
2010-05-13  4:42           ` Benjamin Herrenschmidt
2010-05-12 15:00   ` Jan Kara
2010-05-12 22:13     ` Benjamin Herrenschmidt
2010-05-13  0:15     ` Benjamin Herrenschmidt
2010-05-13 15:12       ` Jan Kara
2010-05-13 21:33         ` Benjamin Herrenschmidt
2010-05-13 23:51         ` Benjamin Herrenschmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).