Re: minix/ext2 + rd problem

linux-fsdevel.vger.kernel.org archive mirror

* Re: minix/ext2 + rd problem
From: Nick Piggin @ 2008-10-15 14:05 UTC (permalink / raw)
  To: Richard Kojedzinszky; +Cc: linux-ext4, linux-fsdevel

On Wed, Oct 15, 2008 at 10:19:44AM +0200, Richard Kojedzinszky wrote:
> Dear Nick,
> 
> I have tried a sync after the remount, but that did not help. What helped
> is dropping the cache by echoing 3 to /proc/sys/vm/drop_caches, but this
> still didn't solve the problem in 100% of the cases, only in 95%.
> 
> But when I read the device with
> # dd if=/dev/ram0 iflag=direct ...
> then it worked. I think this bypassed some caches, and thus read the
> actual data.
> 
> Unfortunately, I found that it only works as expected with the ramdisk;
> with a logical volume, for example, it behaves the wrong way.

I've reproduced this problem (e2fsck flags corruption in the ext2 image
even though the filesystem was remounted ro and marked clean in the superblock).

Issuing a sync, then drop_caches, seems to fix it here for me.
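
A minimal sketch of the sync-then-drop_caches workaround as a POSIX sh
function, meant to be sourced and run as root. The DROP_CACHES override
is a hypothetical hook added only so the function can be exercised
without root; the real control file is /proc/sys/vm/drop_caches. Note
that Richard reported dropping caches helped in only about 95% of his runs.

```sh
# Sketch of the "sync, then drop_caches" workaround (run as root).
# DROP_CACHES is a hypothetical override for testing only; the real
# control file is /proc/sys/vm/drop_caches.
DROP_CACHES=${DROP_CACHES:-/proc/sys/vm/drop_caches}

sync_and_drop_caches() {
    # Write back dirty data first; drop_caches only frees clean objects.
    sync || return 1
    # 3 = free the page cache plus reclaimable dentries and inodes.
    echo 3 > "$DROP_CACHES"
}
```

For example, after `mount -o remount,ro /mnt` (mount point is a
placeholder), call `sync_and_drop_caches` before reading the device.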

On the other hand, I also see inconsistencies even after unmounting if
something else holds /dev/ram0 open (which prevents the buffer cache from
being invalidated on unmount).

I think what is happening is that the block device is being modified
without going through the buffer cache (i.e. via the pagecache or direct
writes), but the buffer cache doesn't get invalidated, so you get stale
data when reading from /dev/ram0.

I don't think we're going to want the overhead in the kernel to detect
these kinds of aliases. It might be reasonable to flush the blockdev
on unmount and remount,ro after syncing the filesystem.

The old rd driver's backing store was actually its buffer cache, so that
particular issue couldn't cause aliasing there.

Thanks,
Nick


* Re: minix/ext2 + rd problem
From: Richard Kojedzinszky @ 2008-10-15 14:10 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-ext4, linux-fsdevel

Dear Nick,

Sorry for the stupid question, but how can I flush a block device? If I can
do it without unmounting the filesystem, I will be happy.

Thanks in advance,


Kojedzinszky Richard
TvNetWork Nyrt.
E-mail: krichy (at) tvnetwork [dot] hu
PGP: 0x24E79141
   Fingerprint = 6847 ECFF EF58 0C09 18A5  16CF 270F 0C6F 24E7 9141


* Re: minix/ext2 + rd problem
From: Nick Piggin @ 2008-10-15 14:34 UTC (permalink / raw)
  To: Richard Kojedzinszky; +Cc: linux-ext4, linux-fsdevel

On Wed, Oct 15, 2008 at 04:10:08PM +0200, Richard Kojedzinszky wrote:
> Dear Nick,
> 
> Sorry for the stupid question, but how can I flush a block device? If I can
> do it without unmounting the filesystem, I will be happy.
> 
> Thanks in advance,

Not a stupid question. Actually I meant that maybe the kernel should do
the flush so people don't get surprised like this.

You can flush and invalidate the blockdev with the --flushbufs argument
to the blockdev command. However, you can't use this with ramdisk devices:
someone thought it would be a good idea to save on precious ioctl space
and implemented totally different semantics on that device with the
same ioctl (it throws away the underlying data as well as the cache).
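
blockdev --flushbufs (util-linux blockdev(8)) issues the BLKFLSBUF ioctl.
A hedged sketch of wrapping it so it is not accidentally run against a
ramdisk, where, per the above, the driver reuses that ioctl to discard
the device's contents. The flush_blockdev name and the /dev/ram* name
check are illustrative assumptions; actually flushing requires root.

```sh
# Sketch: flush and invalidate a block device's cache via blockdev(8),
# which issues the BLKFLSBUF ioctl. Needs root.
# flush_blockdev is a hypothetical helper name.
flush_blockdev() {
    dev=$1
    case "$dev" in
        /dev/ram*)
            # Per the thread, the ramdisk driver treats BLKFLSBUF as
            # "discard the contents", so refuse rather than lose data.
            echo "refusing to flush $dev: data would be discarded" >&2
            return 1 ;;
    esac
    if [ ! -b "$dev" ]; then
        echo "not a block device: $dev" >&2
        return 1
    fi
    # Sync filesystems first, then flush/invalidate the device cache.
    sync || return 1
    blockdev --flushbufs "$dev"
}
```

For example, `flush_blockdev /dev/vg0/lv0` as root (the LV path is a
placeholder).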

Your direct I/O reads essentially do the same thing (the kernel flushes
the cache in that case to avoid a similar aliasing problem). Actually, I
would say direct I/O to the block device is the safest option when you
are working on the block device like this. (Does the direct I/O read
work for logical volumes?)
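
A sketch of the direct I/O approach: copy the device with GNU dd's
iflag=direct (O_DIRECT) so the read bypasses the kernel's cached view of
the device, then check the copy offline. snapshot_direct is a
hypothetical helper name and bs=1M is an arbitrary choice; whether this
behaves the same on a logical volume is the open question above.

```sh
# Sketch: image a block device using O_DIRECT reads (GNU dd iflag=direct)
# so the copy bypasses the kernel's cached view of the device.
# snapshot_direct is a hypothetical helper; bs=1M is an arbitrary size
# that is a multiple of the device's logical block size, as O_DIRECT needs.
snapshot_direct() {
    src=$1
    dst=$2
    if [ ! -b "$src" ]; then
        echo "not a block device: $src" >&2
        return 1
    fi
    dd if="$src" of="$dst" bs=1M iflag=direct
}
```

For example, as root: `snapshot_direct /dev/ram0 /tmp/ram0.img && e2fsck -fn /tmp/ram0.img`
(-f forces a check, -n leaves the image unmodified; paths are placeholders).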

Thanks,
Nick


* Re: minix/ext2 + rd problem
From: Matthew Wilcox @ 2008-10-15 19:22 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Richard Kojedzinszky, linux-ext4, linux-fsdevel

On Wed, Oct 15, 2008 at 04:34:25PM +0200, Nick Piggin wrote:
> You can flush and invalidate the blockdev with the --flushbufs argument
> to the blockdev command. However, you can't use this with ramdisk devices:
> someone thought it would be a good idea to save on precious ioctl space
> and implemented totally different semantics on that device with the
> same ioctl (it throws away the underlying data as well as the cache).

What happens if we declare that a bug and fix it (and add a new ioctl to
actually throw away the data ... oh, wait, we have one, it's BLKDISCARD)?

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."


* Re: minix/ext2 + rd problem
From: Nick Piggin @ 2008-10-16  3:48 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Richard Kojedzinszky, linux-ext4, linux-fsdevel

On Wed, Oct 15, 2008 at 01:22:54PM -0600, Matthew Wilcox wrote:
> On Wed, Oct 15, 2008 at 04:34:25PM +0200, Nick Piggin wrote:
> > You can flush and invalidate the blockdev with the --flushbufs argument
> > to the blockdev command. However, you can't use this with ramdisk devices:
> > someone thought it would be a good idea to save on precious ioctl space
> > and implemented totally different semantics on that device with the
> > same ioctl (it throws away the underlying data as well as the cache).
> 
> What happens if we declare that a bug and fix it (and add a new ioctl to
> actually throw away the data ... oh, wait, we have one, it's BLKDISCARD)?

Well... that's a good point. We probably could, because the worst someone
will see is that their backing store memory does not get freed. It won't
munch anyone's data.

I'd love to do this; on the other hand, we've had the old behaviour for a
long time, and it was apparently documented and used by someone at some
point :(
