* Re: minix/ext2 + rd problem
From: Nick Piggin @ 2008-10-15 14:05 UTC
To: Richard Kojedzinszky; +Cc: linux-ext4, linux-fsdevel

On Wed, Oct 15, 2008 at 10:19:44AM +0200, Richard Kojedzinszky wrote:
> dear nick,
>
> i have tried a sync after the remount, but that did not help. what helped
> is dropping the cache by echoing 3 to /proc/sys/vm/drop_caches, but this
> still didnt solve the problem in 100%, only in 95% of the cases.
>
> But when i read the device with
> # dd if=/dev/ram0 iflag=direct ...
> then it worked. I think this bypassed some caches, and thus read the
> actual data.
>
> But a sad result is that I experienced with it, and only with ramdisk does
> it work as expected. for example with a logical volume it behaves in the
> wrong way.

I've reproduced this problem (ext2 image corruption flagged in e2fsck
even though it was remounted ro and marked clean in the sb).

Issuing a sync, then drop_caches, seems to fix it here for me.

On the other hand, I also see problems with inconsistencies even after
unmounting if I hold the /dev/ram0 device open with something else (which
causes the buffer cache not to be invalidated on unmount).

I think what is happening is that the block device is being modified
without going through the buffer cache (ie. via pagecache or direct
writes), but the buffer cache doesn't get invalidated. So you get stale
data when reading from /dev/ram0.

I don't think we're going to want the overhead in the kernel to detect
these kinds of aliases. It might be reasonable to flush the blockdev
on unmount and remount,ro after syncing the filesystem.

The old rd driver's backing store was actually its buffercache, so that
particular issue wouldn't cause aliasing.

Thanks,
Nick
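A minimal sketch of the workaround mentioned above (a sync, then
drop_caches), written in Python purely for illustration. The function name
and its parameters are assumptions of this example and do not come from the
thread. Writing to /proc/sys/vm/drop_caches needs root, and as reported
above this did not cure the problem in every case.

```python
import os


def sync_and_drop_caches(drop_caches_path="/proc/sys/vm/drop_caches", mode=3):
    """Write back dirty data, then ask the kernel to drop clean caches.

    mode follows the drop_caches convention: 1 = pagecache,
    2 = dentries and inodes, 3 = both.
    The path is a parameter only so the sketch can be tried on a scratch file.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    # drop_caches only discards clean pages, so flush dirty data first.
    os.sync()
    with open(drop_caches_path, "w") as f:
        f.write(f"{mode}\n")
```

Run as root, `sync_and_drop_caches()` would be called after the
remount,ro and before re-reading /dev/ram0.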
* Re: minix/ext2 + rd problem
From: Richard Kojedzinszky @ 2008-10-15 14:10 UTC
To: Nick Piggin; +Cc: linux-ext4, linux-fsdevel

Dear Nick,

Sorry for my stupid question, but how can i flush a blockdev? If i can
do it without unmounting the fs i will be happy.

Thanks in advance,

Kojedzinszky Richard
TvNetWork Nyrt.
E-mail: krichy (at) tvnetwork [dot] hu
PGP: 0x24E79141
Fingerprint = 6847 ECFF EF58 0C09 18A5 16CF 270F 0C6F 24E7 9141
* Re: minix/ext2 + rd problem
From: Nick Piggin @ 2008-10-15 14:34 UTC
To: Richard Kojedzinszky; +Cc: linux-ext4, linux-fsdevel

On Wed, Oct 15, 2008 at 04:10:08PM +0200, Richard Kojedzinszky wrote:
> Dear Nick,
>
> Sorry for my stupid question, but how can i flush a blockdev? If i can
> do it without unmounting the fs i will be happy.
>
> Thanks in advance,

Not a stupid question. Actually I meant that maybe the kernel should do
the flush so people don't get surprised like this.

You can flush and invalidate the blockdev with the --flushbufs argument
to blockdev command. However you can't use this with ramdisk devices:
someone thought it would be a good idea to save on precious ioctl space
and implemented totally different semantics on that device with the
same ioctl (it throws away the underlying data as well as the cache).

Your direct io reads essentially do the same thing (and the kernel flushes
the cache in that case to avoid a similar aliasing problem). Actually I
would say direct IO to the block device is the safest option when you
are working on the block device like this (does the direct IO read work
for logical volumes?)

Thanks,
Nick
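The direct IO read suggested above can be sketched in Python as follows.
This is an illustration only: the function names, the 4096-byte alignment
default and the `direct` switch are assumptions of this sketch, not part of
the thread. O_DIRECT generally requires the buffer address and the transfer
length to be aligned to the device's logical block size, so the sketch reads
into an anonymous mmap, which is page-aligned.

```python
import mmap
import os


def aligned_length(length, align=4096):
    """Round length up to the next multiple of align."""
    return -(-length // align) * align


def read_bypassing_cache(path, length, align=4096, direct=True):
    """Read the first `length` bytes of `path`, optionally with O_DIRECT.

    direct=False performs an ordinary cached read with the same code path,
    which is handy on filesystems that reject O_DIRECT.
    """
    flags = os.O_RDONLY
    if direct:
        flags |= os.O_DIRECT  # Linux-specific flag
    # Anonymous mappings are page-aligned, which satisfies O_DIRECT
    # as long as align does not exceed the page size.
    buf = mmap.mmap(-1, aligned_length(length, align))
    try:
        fd = os.open(path, flags)
        try:
            n = os.readv(fd, [buf])  # a single read; may return less
        finally:
            os.close(fd)
        return buf[:min(n, length)]
    finally:
        buf.close()
```

For example, `read_bypassing_cache("/dev/ram0", 1024)` would fetch the
first 1024 bytes of the ramdisk without going through the buffer cache,
much like the `dd ... iflag=direct` command quoted earlier.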
* Re: minix/ext2 + rd problem
From: Matthew Wilcox @ 2008-10-15 19:22 UTC
To: Nick Piggin; +Cc: Richard Kojedzinszky, linux-ext4, linux-fsdevel

On Wed, Oct 15, 2008 at 04:34:25PM +0200, Nick Piggin wrote:
> You can flush and invalidate the blockdev with the --flushbufs argument
> to blockdev command. However you can't use this with ramdisk devices:
> someone thought it would be a good idea to save on precious ioctl space
> and implemented totally different semantics on that device with the
> same ioctl (it throws away the underlying data as well as the cache).

What happens if we declare that a bug and fix it (and add a new ioctl to
actually throw away the data ... oh, wait, we have one, it's BLKDISCARD)?

-- 
Matthew Wilcox
Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."
* Re: minix/ext2 + rd problem
From: Nick Piggin @ 2008-10-16 3:48 UTC
To: Matthew Wilcox; +Cc: Richard Kojedzinszky, linux-ext4, linux-fsdevel

On Wed, Oct 15, 2008 at 01:22:54PM -0600, Matthew Wilcox wrote:
> On Wed, Oct 15, 2008 at 04:34:25PM +0200, Nick Piggin wrote:
> > You can flush and invalidate the blockdev with the --flushbufs argument
> > to blockdev command. However you can't use this with ramdisk devices:
> > someone thought it would be a good idea to save on precious ioctl space
> > and implemented totally different semantics on that device with the
> > same ioctl (it throws away the underlying data as well as the cache).
>
> What happens if we declare that a bug and fix it (and add a new ioctl to
> actually throw away the data ... oh, wait, we have one, it's BLKDISCARD)?

Well... that's a good point. We probably could, because the worst someone
will see is their backing store memory does not get freed. It won't munch
someone's data.

I'd love to do this, OTOH we've had the old behaviour, apparently documented
and used by someone at some point, for a long time :(
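For reference, the ioctl behind `blockdev --flushbufs` is BLKFLSBUF. A
hedged Python sketch of issuing it directly is given below. The function
name and the ramdisk guard are assumptions of this example. The request
value 0x1261 is _IO(0x12, 97) from <linux/fs.h> as encoded on x86 and may
differ on architectures that encode _IO differently. The guard reflects the
behaviour described in this thread, where the same ioctl on a ramdisk throws
away the underlying data as well as the cache.

```python
import fcntl
import os
import stat

BLKFLSBUF = 0x1261   # _IO(0x12, 97); x86 encoding, an assumption elsewhere
RAMDISK_MAJOR = 1    # major number of the /dev/ram* block devices


def flush_blockdev(path):
    """Flush and invalidate the buffer cache of a block device.

    Refuses ramdisks, because per the discussion above BLKFLSBUF on
    those devices also discards the stored data.
    """
    st = os.stat(path)
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError(f"{path} is not a block device")
    if os.major(st.st_rdev) == RAMDISK_MAJOR:
        raise ValueError(f"{path} is a ramdisk; BLKFLSBUF would discard its data")
    fd = os.open(path, os.O_RDONLY)
    try:
        # Needs sufficient privilege (typically root).
        fcntl.ioctl(fd, BLKFLSBUF)
    finally:
        os.close(fd)
```

The device-type checks run before anything is opened, so calling this on
/dev/ram0 fails early instead of emptying the ramdisk.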