* [linux-lvm] LVM 2.2 snapshot bug
@ 2000-11-07 10:55 Rik van Riel
2000-11-07 12:16 ` Heinz J. Mauelshagen
2000-11-07 13:21 ` [linux-lvm] " Andrea Arcangeli
0 siblings, 2 replies; 20+ messages in thread
From: Rik van Riel @ 2000-11-07 10:55 UTC (permalink / raw)
To: linux-lvm; +Cc: marcelo, andrea
Hi,
I think I found an {easy to fix, very annoying} bug in the
2.2.<ver>aa derived kernel LVM drivers (which, most likely,
is also in Heinz' drivers).
On snapshot creation, the snapshot block device (/dev/vg0/snap1)
is NOT made a read-only device, so ext3 tries to do journal
recovery when the snapshot device is mounted...
(leading to all kinds of nasty oopses)
It should be easy enough to do a set_device_ro() on the LVM
snapshot, shouldn't it?
That would fix the oopses I've been seeing and would make
the snapshot "more useful" ... I hope a fix will be available
soon since I want to use this feature for NL.linux.org :)
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
http://www.conectiva.com/ http://www.surriel.com/
* Re: [linux-lvm] LVM 2.2 snapshot bug
2000-11-07 10:55 [linux-lvm] LVM 2.2 snapshot bug Rik van Riel
@ 2000-11-07 12:16 ` Heinz J. Mauelshagen
2000-11-07 14:42 ` Rik van Riel
2000-11-07 13:21 ` [linux-lvm] " Andrea Arcangeli
1 sibling, 1 reply; 20+ messages in thread
From: Heinz J. Mauelshagen @ 2000-11-07 12:16 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-lvm
Hi Rik,
this is already fixed in the LVM 0.9 source.
Thanks,
Heinz -- The LVM guy --
On Tue, Nov 07, 2000 at 11:55:42AM +0100, Rik van Riel wrote:
> Hi,
>
> I think I found an {easy to fix, very annoying} bug in the
> 2.2.<ver>aa derived kernel LVM drivers (which, most likely,
> is also in Heinz' drivers).
>
> On snapshot creation, the snapshot block device (/dev/vg0/snap1)
> is NOT made a read-only device, so ext3 tries to do journal
> recovery when the snapshot device is mounted...
>
> (leading to all kinds of nasty oopses)
>
> It should be easy enough to do a set_device_ro() on the LVM
> snapshot, shouldn't it?
Yes, that's what I do in 0.9.
>
> That would fix the oopses I've been seeing and would make
> the snapshot "more useful" ... I hope a fix will be available
> soon since I want to use this feature for NL.linux.org :)
I'll take care of that.
>
> regards,
>
> Rik
> --
> The Internet is not a network of computers. It is a network
> of people. That is its real strength.
>
> http://www.conectiva.com/ http://www.surriel.com/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Heinz Mauelshagen Sistina Software Inc.
Senior Consultant/Developer Bartningstr. 12
64289 Darmstadt
Germany
Mauelshagen@Sistina.com +49 6151 7103 86
FAX 7103 96
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
* [linux-lvm] Re: LVM 2.2 snapshot bug
2000-11-07 10:55 [linux-lvm] LVM 2.2 snapshot bug Rik van Riel
2000-11-07 12:16 ` Heinz J. Mauelshagen
@ 2000-11-07 13:21 ` Andrea Arcangeli
2000-11-07 14:56 ` Rik van Riel
1 sibling, 1 reply; 20+ messages in thread
From: Andrea Arcangeli @ 2000-11-07 13:21 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-lvm, marcelo
On Tue, Nov 07, 2000 at 11:55:42AM +0100, Rik van Riel wrote:
> On snapshot creation, the snapshot block device (/dev/vg0/snap1)
> is NOT made a read-only device, so ext3 tries to do journal
It is made read only.
> (leading to all kinds of nasty oopses)
Could you show me the Oopses?
> It should be easy enough to do a set_device_ro() on the LVM
> snapshot, shouldn't it?
That shouldn't be necessary. The way LVM handles this looks correct
to me, but maybe it's never been tested in the ll_rw_block layer because the
open(O_RDWR) check always handled it with ext2. Maybe ext3 forces writes via
ll_rw_block even when the device is mounted read-only (probably when doing log
replay?) and maybe it hits the ll_rw_block check for the first time.
Currently we choose whether a device is writeable or not using the
LV_WRITE bitflag in the lv->lv_access field. If the bitflag is set
the device is writeable. You'll find that the snapshots have that bitflag
unset (please verify via /proc that they don't have the W capability set).
If the snapshot is not writeable as expected, the lvm hook in ll_rw_block
should return -1 and ll_rw_block should goto sorry, just as if we were
using the ro_bits via set_device_ro. So it should not be necessary
to use set_device_ro.
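A minimal userspace sketch of the two independent checks described above, under stated assumptions: the struct, the helper names and the flag values are hypothetical stand-ins, not the driver's code. It models the per-LV write-access bit in lv_access (referred to as LV_WRITE later in this thread) next to the block-layer read-only bit that set_device_ro() would set and is_read_only() would report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only; flag values are illustrative. */
#define LV_READ  0x1
#define LV_WRITE 0x2

struct model_lv {
    unsigned lv_access;  /* driver-private access flags */
    bool     ro_bit;     /* block-layer bit, as set_device_ro() would set it */
};

/* What a filesystem learns when it asks the block layer. */
static bool model_is_read_only(const struct model_lv *lv)
{
    return lv->ro_bit;
}

/* What the driver hook decides for a write: 0 to proceed, -1 to refuse. */
static int model_lvm_map_write(const struct model_lv *lv)
{
    return (lv->lv_access & LV_WRITE) ? 0 : -1;
}

/* A snapshot as described here: write bit unset, ro bit never set. */
static void model_snapshot_example(void)
{
    struct model_lv snap = { .lv_access = LV_READ, .ro_bit = false };

    assert(!model_is_read_only(&snap));        /* reported as writable */
    assert(model_lvm_map_write(&snap) == -1);  /* yet every write is refused */

    snap.ro_bit = true;                        /* the set_device_ro() style change */
    assert(model_is_read_only(&snap));         /* now both layers agree */
}
```

In this model the write is refused either way; the only difference is whether a caller can find that out before issuing it.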
Or maybe the bug is in ext3 that doesn't handle real read only blockdevices?
> That would fix the oopses I've been seeing and would make
Hopefully the Oopses will tell us more about this ext3/snapshot collision.
Also please ensure you can reproduce with 2.2.18pre17aa1 or 2.4.0-test10 to
make sure we're looking at the same sources.
Andrea
* Re: [linux-lvm] LVM 2.2 snapshot bug
2000-11-07 12:16 ` Heinz J. Mauelshagen
@ 2000-11-07 14:42 ` Rik van Riel
0 siblings, 0 replies; 20+ messages in thread
From: Rik van Riel @ 2000-11-07 14:42 UTC (permalink / raw)
To: Mauelshagen; +Cc: linux-lvm
On Tue, 7 Nov 2000, Heinz J. Mauelshagen wrote:
> this is already fixed in the LVM 0.9 source.
Do you have a patch available for LVM 0.8 too?
I'd really like to use this feature for NL.linux.org,
but I don't think I want to use LVM 0.9 from the start :)
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
http://www.conectiva.com/ http://www.surriel.com/
* [linux-lvm] Re: LVM 2.2 snapshot bug
2000-11-07 13:21 ` [linux-lvm] " Andrea Arcangeli
@ 2000-11-07 14:56 ` Rik van Riel
2000-11-07 16:45 ` Andrea Arcangeli
0 siblings, 1 reply; 20+ messages in thread
From: Rik van Riel @ 2000-11-07 14:56 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-lvm, marcelo
On Tue, 7 Nov 2000, Andrea Arcangeli wrote:
> On Tue, Nov 07, 2000 at 11:55:42AM +0100, Rik van Riel wrote:
> > On snapshot creation, the snapshot block device (/dev/vg0/snap1)
> > is NOT made a read-only device, so ext3 tries to do journal
>
> It is made read only.
No it isn't. It is made into a read-write block device
where all writes fail :)
kernel: Bad lvm_map in ll_rw_block
kernel: lvm - lvm_map: ll_rw_blk write for readonly LV /dev/vg0/snap1
> > (leading to all kinds of nasty oopses)
>
> Could you show me the Oopses?
That's a bit much to type in by hand ... and it's basically
kjournald being confused by all its writes failing on a RW
block device.
> > It should be easy enough to do a set_device_ro() on the LVM
> > snapshot, shouldn't it?
>
> That shouldn't be necessary. The way LVM handles this looks correct to
> me,
Please look again...
> but maybe it's never been tested in the ll_rw_block layer because the
> open(O_RDWR) check always handled it with ext2. Maybe ext3 forces
> writes via ll_rw_block even when the device is mounted read-only
> (probably when doing log replay?) and maybe it hits the ll_rw_block
> check for the first time.
Indeed this is the case. When the block device is read-write
(the is_read_only(blk_dev) is non-true) it tries to replay
the log, even for a read-only mounted FS.
> Currently we choose whether a device is writeable or not using the LV_WRITE
> bitflag in the lv->lv_access field. If the bitflag is set the device
> is writeable. You'll find that the snapshots have that bitflag unset
> (please verify via /proc that they don't have the W capability set).
That bitflag is indeed not set, the snapshot is read-only,
but that status isn't propagated up to the block device
layer.
> Or maybe the bug is in ext3 that doesn't handle real read only blockdevices?
Real read only block devices show their status at the block
device layer :)
> Also please ensure you can reproduce with 2.2.18pre17aa1 or
> 2.4.0-test10 to make sure we're looking at the same sources.
It's with the LVM from the Conectiva kernel RPM, which
uses the source code from your 2.2 LVM driver.
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
http://www.conectiva.com/ http://www.surriel.com/
* [linux-lvm] Re: LVM 2.2 snapshot bug
2000-11-07 14:56 ` Rik van Riel
@ 2000-11-07 16:45 ` Andrea Arcangeli
2000-11-07 17:04 ` Stephen C. Tweedie
2000-11-07 23:04 ` Rik van Riel
0 siblings, 2 replies; 20+ messages in thread
From: Andrea Arcangeli @ 2000-11-07 16:45 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-lvm, marcelo, Stephen C. Tweedie
On Tue, Nov 07, 2000 at 03:56:59PM +0100, Rik van Riel wrote:
> That's a bit much to type in by hand ... and it's basically
> kjournald being confused by all its writes failing on a RW
> block device.
So ext3 will also crash if I/O errors happen during the log replay. The Oopses
seem to be due to an _ext3_ bug (not to the missing ro_bits in the LVM snapshot)
as far as I can tell.
> Indeed this is the case. When the block device is read-write
> (the is_read_only(blk_dev) is non-true) it tries to replay
> the log, even for a read-only mounted FS.
Ok, I agree it's a minor LVM bug, but again I can't see how that minor bug can
cause oopses and I think setting ro_bits won't fix the real bug but it will
only hide it.
BTW, LVM also internally checks for the LV_WRITE bitflag during open(2) so any
attempt to open the snapshot RW will fail and return -EACCES as expected.
> It's with the LVM from the Conectiva kernel RPM, which
> uses the source code from your 2.2 LVM driver.
OK.
I will fix the is_read_only thing for the snapshot but you should make sure the
bug that is oopsing your machine gets fixed too :).
Andrea
* [linux-lvm] Re: LVM 2.2 snapshot bug
2000-11-07 16:45 ` Andrea Arcangeli
@ 2000-11-07 17:04 ` Stephen C. Tweedie
2000-11-07 19:51 ` Andrea Arcangeli
2000-11-07 23:04 ` Rik van Riel
1 sibling, 1 reply; 20+ messages in thread
From: Stephen C. Tweedie @ 2000-11-07 17:04 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Rik van Riel, linux-lvm, marcelo, Stephen C. Tweedie
Hi,
On Tue, Nov 07, 2000 at 05:45:04PM +0100, Andrea Arcangeli wrote:
> On Tue, Nov 07, 2000 at 03:56:59PM +0100, Rik van Riel wrote:
> > That's a bit much to type in by hand ... and it's basically
> > kjournald being confused by all its writes failing on a RW
> > block device.
>
> So ext3 will also crash if I/O errors happen during the log replay. The Oopses
> seem to be due to an _ext3_ bug (not to the missing ro_bits in the LVM snapshot)
> as far as I can tell.
The current ext3 includes debugging code to trap invariants which the
filesystem expects to be guaranteed, and it's entirely possible that
their over-cautious checking is trapping on such failed writes. Send
me an oops and I can deal with it.
> > Indeed this is the case. When the block device is read-write
> > (the is_read_only(blk_dev) is non-true) it tries to replay
> > the log, even for a read-only mounted FS.
>
> Ok, I agree it's a minor LVM bug, but again I can't see how that minor bug can
> cause oopses and I think setting ro_bits won't fix the real bug but it will
> only hide it.
It's a major bug as far as ext3 is concerned, because filesystem
recovery is a critical prerequisite for mounting a filesystem, and
that requires write access. ext3 has to be able to trust the ro bits
in order to know whether it is safe to perform recovery writes for a
mount, or whether the mount must be rejected because recovery cannot
take place.
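The decision described above can be sketched as a small pure function. This is a hypothetical model written for illustration, not the ext3 source; the enum and function names are assumptions. It shows why a device that fails writes while reporting itself read-write sends the mount down the recovery path instead of getting a clean refusal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the mount-time decision; not the ext3 source. */
enum mount_action {
    MOUNT_DIRECTLY,        /* journal is clean, nothing to replay */
    MOUNT_AFTER_RECOVERY,  /* replay the journal (needs writes), then mount */
    MOUNT_REJECT           /* recovery needed but the device cannot be written */
};

static enum mount_action decide_mount(bool needs_recovery, bool device_reports_ro)
{
    if (!needs_recovery)
        return MOUNT_DIRECTLY;
    if (device_reports_ro)
        return MOUNT_REJECT;
    return MOUNT_AFTER_RECOVERY;
}

static void decide_mount_example(void)
{
    /* Honest read-only device with a dirty journal: refused up front. */
    assert(decide_mount(true, true) == MOUNT_REJECT);
    /* Device that refuses writes but reports read-write: recovery is
     * attempted, and every recovery write then fails. */
    assert(decide_mount(true, false) == MOUNT_AFTER_RECOVERY);
    /* Clean journal: the read-only bit does not matter for this decision. */
    assert(decide_mount(false, true) == MOUNT_DIRECTLY);
}
```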
Cheers,
Stephen
* [linux-lvm] Re: LVM 2.2 snapshot bug
2000-11-07 17:04 ` Stephen C. Tweedie
@ 2000-11-07 19:51 ` Andrea Arcangeli
2000-11-07 20:36 ` Andreas Dilger
2000-11-08 11:10 ` Stephen C. Tweedie
0 siblings, 2 replies; 20+ messages in thread
From: Andrea Arcangeli @ 2000-11-07 19:51 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: Rik van Riel, linux-lvm, marcelo
On Tue, Nov 07, 2000 at 05:04:20PM +0000, Stephen C. Tweedie wrote:
> The current ext3 includes debugging code to trap invariants which the
> filesystem expects to be guaranteed, and it's entirely possible that
> their over-cautious checking is trapping on such failed writes. Send
> me an oops and I can deal with it.
I don't have the Oops (I asked for it too) but Rik should have it.
> > > Indeed this is the case. When the block device is read-write
> > > (the is_read_only(blk_dev) is non-true) it tries to replay
> > > the log, even for a read-only mounted FS.
> >
> > Ok, I agree it's a minor LVM bug, but again I can't see how that minor bug can
> > cause oopses and I think setting ro_bits won't fix the real bug but it will
> > only hide it.
>
> It's a major bug as far as ext3 is concerned, because filesystem
> recovery is a critical prerequisite for mounting a filesystem, and
> that requires write access. ext3 has to be able to trust the ro bits
> in order to know whether it is safe to perform recovery writes for a
> mount, or whether the mount must be rejected because recovery cannot
> take place.
Stephen, the floppy device is doing exactly the same thing as LVM.
I don't think it's a major bug. The _only_ downside of the bug is that
it will generate I/O errors when you try to write to the device via
ll_rw_block (no oopses, no corruption), and those I/O errors will happen
anyway in the real world with real hard disks, so we must be able to cope with
them regardless.
The bug that hurt Rik is that ext3 is not able to deal with I/O errors
properly during recovery, and if you fix that, then the LVM snapshot not setting
ro_bits will be a minor problem IMHO.
In fact I'm not even sure it is worth having the LVM snapshot set ro_bits,
given it will soon become a writeable snapshot (so that we can do recovery
on it too :).
Andrea
* Re: [linux-lvm] Re: LVM 2.2 snapshot bug
2000-11-07 19:51 ` Andrea Arcangeli
@ 2000-11-07 20:36 ` Andreas Dilger
2000-11-08 11:11 ` Stephen C. Tweedie
2000-11-08 16:16 ` Andrea Arcangeli
2000-11-08 11:10 ` Stephen C. Tweedie
1 sibling, 2 replies; 20+ messages in thread
From: Andreas Dilger @ 2000-11-07 20:36 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Stephen C. Tweedie, Rik van Riel, linux-lvm, marcelo
Andrea writes:
> The bug that hurt Rik is that ext3 is not able to deal with I/O errors
> properly during recovery, and if you fix that, then the LVM snapshot not setting
> ro_bits will be a minor problem IMHO.
The problem is that I/O errors during journal recovery would mean a
corrupt filesystem. The ext3 recovery code (now) does the correct thing
and aborts the journal recovery and refuses to mount the filesystem in
this case. It is up to e2fsck to fix problems inside the journal.
It should be possible to use an ext3 LVM snapshot for backup purposes
by using the ext2 dump program (which does raw partition access), but
currently not for simply mounting the filesystem.
It may be possible to change the ext3 code to continue to mount a
filesystem without journal recovery if it is really on a R/O device.
The real problem with this is that you aren't sure that what is on the
filesystem is consistent, because it may be half in the journal - the
fsync_dev() call should mostly handle this.
It would probably be a lot better to simply change ext3 to handle the
"proper snapshot" stuff that is in ReiserFS, so that at the time of the
snapshot, the journal is flushed and the RECOVER flag is removed from
the superblock (and returned afterwards). However, this is also a LVM
0.9 fix.
> In fact I'm not even sure it is worth having the LVM snapshot set ro_bits,
> given it will soon become a writeable snapshot (so that we can do recovery
> on it too :).
It would still be good to have a workable solution for the 0.8final code,
because it may be that LVM 0.9 will not make it into the 2.4 kernel,
and even if it does, people may still want to use existing 0.8 LVM.
Cheers, Andreas
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert
* [linux-lvm] Re: LVM 2.2 snapshot bug
2000-11-07 16:45 ` Andrea Arcangeli
2000-11-07 17:04 ` Stephen C. Tweedie
@ 2000-11-07 23:04 ` Rik van Riel
2000-11-08 7:55 ` Heinz Mauelshagen
2000-11-08 16:31 ` Andrea Arcangeli
1 sibling, 2 replies; 20+ messages in thread
From: Rik van Riel @ 2000-11-07 23:04 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-lvm, marcelo, Stephen C. Tweedie
On Tue, 7 Nov 2000, Andrea Arcangeli wrote:
> On Tue, Nov 07, 2000 at 03:56:59PM +0100, Rik van Riel wrote:
> > That's a bit much to type in by hand ... and it's basically
> > kjournald being confused by all its writes failing on a RW
> > block device.
>
> So ext3 will also crash if I/O errors happen during the log replay.
> The Oopses seem to be due to an _ext3_ bug (not to the missing ro_bits in the
> LVM snapshot) as far as I can tell.
I haven't checked yet if LVM actually returns an error
to ext3 or if it just silently (well, except for the
syslog noise) discards the data :)
> > Indeed this is the case. When the block device is read-write
> > (the is_read_only(blk_dev) is non-true) it tries to replay
> > the log, even for a read-only mounted FS.
>
> Ok, I agree it's a minor LVM bug, but again I can't see how that minor
> bug can cause oopses and I think setting ro_bits won't fix the real
> bug but it will only hide it.
Exposing a read-only device as read-write to the users
will cause a bit of confusion, yes :)
> BTW, LVM also internally checks for the LV_WRITE bitflag during
> open(2) so any attempt to open the snapshot RW will fail and return
> -EACCES as expected.
Indeed, I saw this in my syslog...
> > It's with the LVM from the Conectiva kernel RPM, which
> > uses the source code from your 2.2 LVM driver.
>
> OK.
>
> I will fix the is_read_only thing for the snapshot but you should make
> sure the bug that is oopsing your machine gets fixed too :).
*nod*
Though I guess Stephen's decision to do log replay on
read-only mounted filesystems on read-write block
devices is certainly a defendable decision. Btw, don't
the reiserfs people do the same?
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
http://www.conectiva.com/ http://www.surriel.com/
* Re: [linux-lvm] Re: LVM 2.2 snapshot bug
2000-11-07 23:04 ` Rik van Riel
@ 2000-11-08 7:55 ` Heinz Mauelshagen
2000-11-08 13:44 ` Rik van Riel
2000-11-08 16:31 ` Andrea Arcangeli
1 sibling, 1 reply; 20+ messages in thread
From: Heinz Mauelshagen @ 2000-11-08 7:55 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-lvm, sct
On Wed, Nov 08, 2000 at 12:04:50AM +0100, Rik van Riel wrote:
> On Tue, 7 Nov 2000, Andrea Arcangeli wrote:
> > On Tue, Nov 07, 2000 at 03:56:59PM +0100, Rik van Riel wrote:
> > > That's a bit much to type in by hand ... and it's basically
> > > kjournald being confused by all its writes failing on a RW
> > > block device.
> >
> > So ext3 will also crash if I/O errors happen during the log replay.
> > The Oopses seem to be due to an _ext3_ bug (not to the missing ro_bits in the
> > LVM snapshot) as far as I can tell.
>
> I haven't checked yet if LVM actually returns an error
> to ext3 or if it just silently (well, except for the
> syslog noise) discards the data :)
>
> > > Indeed this is the case. When the block device is read-write
> > > (the is_read_only(blk_dev) is non-true) it tries to replay
> > > the log, even for a read-only mounted FS.
> >
> > Ok, I agree it's a minor LVM bug, but again I can't see how that minor
> > bug can cause oopses and I think setting ro_bits won't fix the real
> > bug but it will only hide it.
>
> Exposing a read-only device as read-write to the users
> will cause a bit of confusion, yes :)
>
> > BTW, LVM also internally checks for the LV_WRITE bitflag during
> > open(2) so any attempt to open the snapshot RW will fail and return
> > -EACCES as expected.
>
> Indeed, I saw this in my syslog...
>
> > > It's with the LVM from the Conectiva kernel RPM, which
> > > uses the source code from your 2.2 LVM driver.
> >
> > OK.
> >
> > I will fix the is_read_only thing for the snapshot but you should make
> > sure the bug that is oopsing your machine gets fixed too :).
>
> *nod*
>
> Though I guess Stephen's decision to do log replay on
> read-only mounted filesystems on read-write block
> devices is certainly a defendable decision. Btw, don't
> the reiserfs people do the same?
Yes.
Chris Mason and I came up with a VFS extension which flushes the Journal,
sets the FS to a clean state and locks it to enable LVM to activate the
snapshot and to unlock the filesystem again.
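The sequence described above (flush the journal, mark the filesystem clean, lock it, take the snapshot, unlock) can be sketched roughly as follows. This is a userspace model under stated assumptions: the structs and function names are hypothetical and do not reproduce the actual VFS extension, whose API is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a journaled filesystem's on-disk state. */
struct model_fs {
    int  dirty_journal_blocks;  /* journal entries not yet written back */
    bool needs_recovery;        /* the "recover" flag in the superblock */
    bool locked;                /* new writes are held off while set */
};

/* State captured by the snapshot at the moment it is activated. */
struct model_snapshot {
    bool needs_recovery;
};

static void model_lock_fs(struct model_fs *fs)
{
    fs->dirty_journal_blocks = 0;  /* flush the journal */
    fs->needs_recovery = false;    /* on-disk state is now clean */
    fs->locked = true;             /* hold off new writes */
}

static void model_unlock_fs(struct model_fs *fs)
{
    fs->locked = false;
    fs->needs_recovery = true;     /* live filesystem: flag comes back */
}

static struct model_snapshot model_take_snapshot(struct model_fs *fs)
{
    struct model_snapshot snap;

    model_lock_fs(fs);
    snap.needs_recovery = fs->needs_recovery;  /* frozen copy is clean */
    model_unlock_fs(fs);
    return snap;
}
```

With this ordering the snapshot never carries a journal that needs replaying, so mounting it does not depend on being able to write to it.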
BTW: Stephen was involved with the design/implementation and we presented
it at the Linux Storage Management Workshop in Miami, Rik.
--
Regards,
Heinz -- The LVM guy --
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Heinz Mauelshagen Sistina Software Inc.
Senior Consultant/Developer Bartningstr. 12
64289 Darmstadt
Germany
Mauelshagen@Sistina.com +49 6151 7103 86
FAX 7103 96
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
* [linux-lvm] Re: LVM 2.2 snapshot bug
2000-11-07 19:51 ` Andrea Arcangeli
2000-11-07 20:36 ` Andreas Dilger
@ 2000-11-08 11:10 ` Stephen C. Tweedie
2000-11-08 16:53 ` Andrea Arcangeli
1 sibling, 1 reply; 20+ messages in thread
From: Stephen C. Tweedie @ 2000-11-08 11:10 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Stephen C. Tweedie, Rik van Riel, linux-lvm, marcelo
Hi,
On Tue, Nov 07, 2000 at 08:51:37PM +0100, Andrea Arcangeli wrote:
> On Tue, Nov 07, 2000 at 05:04:20PM +0000, Stephen C. Tweedie wrote:
> > It's a major bug as far as ext3 is concerned, because filesystem
> > recovery is a critical prerequisite for mounting a filesystem, and
> > that requires write access. ext3 has to be able to trust the ro bits
> > in order to know whether it is safe to perform recovery writes for a
> > mount, or whether the mount must be rejected because recovery cannot
> > take place.
>
> Stephen, the floppy device is doing exactly the same thing as LVM.
"floppy.c has this bug so it's OK for a major storage infrastructure
device to have the same bug." Hmm!
> I don't think it's a major bug. The _only_ downside of the bug is that
> it will generate I/O errors when you try to write to the device via
> ll_rw_block (no oopses, no corruption)
Worse --- it's the difference between having the kernel tell the user
"sorry, this operation is not possible" and saying "sure, fine"
followed by a thousand IO errors in the syslog when the user tries to
mount the ext3 filesystem. Journal recovery can generate several
megabytes of write requests, and each request is going to fail
noisily (failing writes silently during recovery is not an option).
Cheers,
Stephen
* Re: [linux-lvm] Re: LVM 2.2 snapshot bug
2000-11-07 20:36 ` Andreas Dilger
@ 2000-11-08 11:11 ` Stephen C. Tweedie
2000-11-08 16:16 ` Andrea Arcangeli
1 sibling, 0 replies; 20+ messages in thread
From: Stephen C. Tweedie @ 2000-11-08 11:11 UTC (permalink / raw)
To: Andreas Dilger
Cc: Andrea Arcangeli, Stephen C. Tweedie, Rik van Riel, linux-lvm,
marcelo
Hi,
On Tue, Nov 07, 2000 at 01:36:06PM -0700, Andreas Dilger wrote:
> Andreas writes:
> It would probably be a lot better to simply change ext3 to handle the
> "proper snapshot" stuff that is in ReiserFS, so that at the time of the
> snapshot, the journal is flushed and the RECOVER flag is removed from
> the superblock (and returned afterwards).
Right --- that's easy to do, but it doesn't mean that it's OK to lie
about the ro_bits!
Cheers,
Stephen
* Re: [linux-lvm] Re: LVM 2.2 snapshot bug
2000-11-08 7:55 ` Heinz Mauelshagen
@ 2000-11-08 13:44 ` Rik van Riel
0 siblings, 0 replies; 20+ messages in thread
From: Rik van Riel @ 2000-11-08 13:44 UTC (permalink / raw)
To: Heinz Mauelshagen; +Cc: linux-lvm, sct
On Wed, 8 Nov 2000, Heinz Mauelshagen wrote:
> Chris Mason and I came up with a VFS extension which flushes the Journal,
> sets the FS to a clean state and locks it to enable LVM to activate the
> snapshot and to unlock the filesystem again.
Indeed, I know about this idea and like it a lot.
However, I intend to install a 2.2 kernel with
LVM 0.8 and ext3 for NL.linux.org this week, so
I'm mainly looking for fixes that make stuff
work _now_ :)
regards,
Rik
--
The Internet is not a network of computers. It is a network
of people. That is its real strength.
http://www.conectiva.com/ http://www.surriel.com/
* Re: [linux-lvm] Re: LVM 2.2 snapshot bug
2000-11-07 20:36 ` Andreas Dilger
2000-11-08 11:11 ` Stephen C. Tweedie
@ 2000-11-08 16:16 ` Andrea Arcangeli
2000-11-08 19:08 ` Andreas Dilger
1 sibling, 1 reply; 20+ messages in thread
From: Andrea Arcangeli @ 2000-11-08 16:16 UTC (permalink / raw)
To: Andreas Dilger; +Cc: Stephen C. Tweedie, Rik van Riel, linux-lvm, marcelo
On Tue, Nov 07, 2000 at 01:36:06PM -0700, Andreas Dilger wrote:
> The problem is that I/O errors during journal recovery would mean a
> corrupt filesystem. The ext3 recovery code (now) does the correct thing
> and aborts the journal recovery [..]
ext3 oopses the kernel after getting I/O errors during the writes. I don't
consider that the correct thing.
Andrea
* [linux-lvm] Re: LVM 2.2 snapshot bug
2000-11-07 23:04 ` Rik van Riel
2000-11-08 7:55 ` Heinz Mauelshagen
@ 2000-11-08 16:31 ` Andrea Arcangeli
[not found] ` <898720000.973701988@coffee>
1 sibling, 1 reply; 20+ messages in thread
From: Andrea Arcangeli @ 2000-11-08 16:31 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-lvm, marcelo, Stephen C. Tweedie, Chris Mason
On Wed, Nov 08, 2000 at 12:04:50AM +0100, Rik van Riel wrote:
> I haven't checked yet if LVM actually returns an error
> to ext3 or if it just silently (well, except for the
After ll_rw_block(WRITE) returns the buffer will be clean and _not_ uptodate.
That's the only way a blockdevice reports I/O errors to highlevel layers.
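A minimal sketch of that error-reporting convention, as a synchronous userspace model. The struct and function names are hypothetical; a real buffer head carries much more state and completes asynchronously. The point is only that the caller detects a failed write from the buffer's flags afterwards, not from a return value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for a buffer head. */
struct model_bh {
    bool dirty;     /* contents still need writing */
    bool uptodate;  /* contents are known to match the device */
};

/* Model of a write submission: returns nothing, like the call it imitates.
 * The buffer is cleaned either way; uptodate survives only on success. */
static void model_ll_rw_block_write(struct model_bh *bh, bool device_accepts_writes)
{
    bh->dirty = false;
    bh->uptodate = device_accepts_writes;
}

/* How a caller recognises the failure afterwards. */
static bool model_write_failed(const struct model_bh *bh)
{
    return !bh->dirty && !bh->uptodate;
}

static void model_write_example(void)
{
    struct model_bh ok  = { .dirty = true, .uptodate = true };
    struct model_bh bad = { .dirty = true, .uptodate = true };

    model_ll_rw_block_write(&ok, true);    /* ordinary writable device */
    model_ll_rw_block_write(&bad, false);  /* snapshot refusing the write */

    assert(!model_write_failed(&ok));
    assert(model_write_failed(&bad));
}
```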
> Exposing a read-only device as read-write to the users
> will cause a bit of confusion, yes :)
I agree.
> Though I guess Stephen's decision to do log replay on
> read-only mounted filesystems on read-write block
> devices is certainly a defendable decision. Btw, don't
Yes, I'm not complaining about that decision.
What I am saying is that the major bug here is that ext3 Oopses the kernel when
it gets I/O errors because of a faulty hard disk during recovery, not that LVM
forgot to set the ro_bits, just like the floppy driver does when the disk is
read-only (and the floppy case is likely to be unfixable, in fact). I can't see any
bug in LVM that could explain an Oops.
> the reiserfs people do the same?
About reiserfs I don't know what it does but I guess it does the same.
Chris, could you try to mount reiserfs in readonly mode on a snapshot
blockdevice and see what happens?
Andrea
* [linux-lvm] Re: LVM 2.2 snapshot bug
2000-11-08 11:10 ` Stephen C. Tweedie
@ 2000-11-08 16:53 ` Andrea Arcangeli
0 siblings, 0 replies; 20+ messages in thread
From: Andrea Arcangeli @ 2000-11-08 16:53 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: Rik van Riel, linux-lvm, marcelo
On Wed, Nov 08, 2000 at 11:10:28AM +0000, Stephen C. Tweedie wrote:
> "sorry, this operation is not possible" and saying "sure, fine"
> followed by a thousand IO errors in the syslog when the user tries to
> mount the ext3 filesystem. Journal recovery can generate several
I agree the bug is annoying but OTOH it's also _harmless_ as far as I can see and
I _much_ prefer to fix it by making the snapshot writeable (side note: writeable
and persistent are completely orthogonal features). Note that you will get I/O
errors if you run out of snapshot space during the writes (this is possible if
you gave less space to the snapshot than to the real blockdevice), so expect
I/O errors in that case too. Really we could allow the creation of a writeable
snapshot only if the user made the snapshot as large as the real logical
volume, but I think that shouldn't be a requirement and a warning during
snapshot creation should be enough IMHO. (If the user wants to make
sure not to run out of snapshot space, he only needs to give
the snapshot the same space he gave to the snapshotted blockdevice.)
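The space argument can be illustrated with a small model. It assumes, purely for illustration, that the snapshot needs one slot for each origin chunk that has diverged; the struct, the chunk count and the function name are hypothetical and do not reflect the driver's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ORIGIN_CHUNKS 8  /* size of the snapshotted device, in chunks */

/* Hypothetical copy-on-write bookkeeping for one snapshot. */
struct model_cow {
    bool   copied[ORIGIN_CHUNKS];  /* chunk already has a slot */
    size_t capacity;               /* slots the snapshot was given */
    size_t used;                   /* slots consumed so far */
};

/* Reserve a slot for a chunk about to diverge.
 * Returns 0 on success, -1 when the snapshot is out of space. */
static int model_preserve_chunk(struct model_cow *s, size_t chunk)
{
    if (chunk >= ORIGIN_CHUNKS)
        return -1;
    if (s->copied[chunk])
        return 0;                  /* already preserved, no new slot needed */
    if (s->used == s->capacity)
        return -1;                 /* full: this is where I/O errors start */
    s->copied[chunk] = true;
    s->used++;
    return 0;
}
```

In this model a snapshot with capacity equal to ORIGIN_CHUNKS can never return -1 for a valid chunk, which is the sizing rule suggested above; a smaller one fails as soon as more distinct chunks diverge than it has slots.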
Another possible feature then is to be able to snapshot a snapshot
(however, to make that work we would have to break some more assumptions of the
userspace tools).
Andrea
* [linux-lvm] Re: LVM 2.2 snapshot bug
[not found] ` <898720000.973701988@coffee>
@ 2000-11-08 17:04 ` Andrea Arcangeli
2000-11-08 19:11 ` Andreas Dilger
0 siblings, 1 reply; 20+ messages in thread
From: Andrea Arcangeli @ 2000-11-08 17:04 UTC (permalink / raw)
To: Chris Mason; +Cc: Rik van Riel, linux-lvm, marcelo, Stephen C. Tweedie
On Wed, Nov 08, 2000 at 11:46:28AM -0500, Chris Mason wrote:
> But, if is_read_only(dev) returns 0,[..]
That is the case for a read-only floppy disk and for an LVM snapshot, but
in reality both generate I/O errors after calling ll_rw_block(WRITE)
because in reality they are both read-only.
> [..] reiserfs tries to update the journal
> header block on mount. [..]
So it does the same thing as ext3. I remember we also discussed that design
decision in the past and I remember there are good reasons for it.
> Heinz and found the bug where is_read_only returns 0 for the snapshot
Oh, I didn't know you were already aware of it.
> volume while we were working on the snapshot api for journaled filesystems,
> so this code got a lot of testing.
Ok.
Andrea
* Re: [linux-lvm] Re: LVM 2.2 snapshot bug
2000-11-08 16:16 ` Andrea Arcangeli
@ 2000-11-08 19:08 ` Andreas Dilger
0 siblings, 0 replies; 20+ messages in thread
From: Andreas Dilger @ 2000-11-08 19:08 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Andreas Dilger, Stephen C. Tweedie, Rik van Riel, linux-lvm,
marcelo
You write:
> On Tue, Nov 07, 2000 at 01:36:06PM -0700, Andreas Dilger wrote:
> > The problem is that I/O errors during journal recovery would mean a
> > corrupt filesystem. The ext3 recovery code (now) does the correct thing
> > and aborts the journal recovery [..]
>
> ext3 oopses the kernel after getting I/O errors during the writes. I don't
> consider that the correct thing.
That was in the older versions. That's why I put "(now)" in there - the
ext3-0.0.3 and later code handles I/O errors and OOM in a reasonable way.
Cheers, Andreas
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert
* Re: [linux-lvm] Re: LVM 2.2 snapshot bug
2000-11-08 17:04 ` Andrea Arcangeli
@ 2000-11-08 19:11 ` Andreas Dilger
0 siblings, 0 replies; 20+ messages in thread
From: Andreas Dilger @ 2000-11-08 19:11 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Chris Mason, Rik van Riel, linux-lvm, marcelo, Stephen C. Tweedie
Andrea writes:
> On Wed, Nov 08, 2000 at 11:46:28AM -0500, Chris Mason wrote:
> > [..] reiserfs tries to update the journal header block on mount. [..]
>
> So it does the same thing as ext3. I remember we also discussed that design
> decision in the past and I remember there are good reasons for it.
>
> > Heinz and found the bug where is_read_only returns 0 for the snapshot
>
> Oh, I didn't know you were already aware of it.
>
> > volume while we were working on the snapshot api for journaled filesystems,
> > so this code got a lot of testing.
Chris, can you send me the API that you settled on, and I will add it into
ext3 and LVM 0.8final so that Rik can have a working system.
Cheers, Andreas
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert