* trouble with generic/081
From: Christoph Hellwig @ 2016-12-14 16:43 UTC
To: eguan; +Cc: fstests

Hi Eryu,

I'm running into a fairly reproducible issue with generic/081 (about
every other run): for some reason the umount call in _cleanup doesn't do
anything because it thinks the file system isn't mounted, but then
vgremove complains that there is a mounted file system. This leads to
the scratch device not being released and all subsequent tests failing.

Here is the output if I let the commands in _cleanup print to stdout:

    QA output created by 081
    Silence is golden
    umount: /mnt/test/mnt_081: not mounted
      Logical volume vg_081/snap_081 contains a filesystem in use.
      PV /dev/sdc belongs to Volume Group vg_081 so please use vgreduce first.

You added a comment in _cleanup that says:

    # lvm may have umounted it on I/O error, but in case it does not

Does LVM really unmount filesystems on its own? Could we be racing with
it?

With a "sleep 1" added before the umount call the test passes reliably
for me, but that seems like papering over the issue.

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: trouble with generic/081
From: Eryu Guan @ 2016-12-15  6:29 UTC
To: Christoph Hellwig; +Cc: fstests

On Wed, Dec 14, 2016 at 08:43:14AM -0800, Christoph Hellwig wrote:
> Hi Eryu,
>
> I'm running into a fairly reproducible issue with generic/081 (about
> every other run): for some reason the umount call in _cleanup doesn't
> do anything because it thinks the file system isn't mounted, but then
> vgremove complains that there is a mounted file system. This leads to
> the scratch device not being released and all subsequent tests failing.
>
> Here is the output if I let the commands in _cleanup print to stdout:
>
>     QA output created by 081
>     Silence is golden
>     umount: /mnt/test/mnt_081: not mounted
>       Logical volume vg_081/snap_081 contains a filesystem in use.
>       PV /dev/sdc belongs to Volume Group vg_081 so please use vgreduce first.

Yes, I have this problem too. My original patch didn't have "-c fsync"
in the last xfs_io pwrite command,

    $XFS_IO_PROG -fc "pwrite 0 5m" -c fsync $mnt/testfile >>$seqres.full 2>&1

and Brian suggested that an explicit fsync would make the test clearer,
so Dave added it and committed the patch.

https://www.spinics.net/lists/fstests/msg01265.html

This cleanup failure was the exact reason why I didn't include fsync at
first.

https://www.spinics.net/lists/fstests/msg01269.html

Then I sent a follow-up patch to work around this issue, but Dave
suggested that we should triage and fix the underlying bug first (if
there is one).

https://www.spinics.net/lists/fstests/msg01406.html

I tried to follow up and dig into it but went nowhere; I don't know that
part of the code well enough.

> You added a comment in _cleanup that says:
>
>     # lvm may have umounted it on I/O error, but in case it does not
>
> Does LVM really unmount filesystems on its own? Could we be racing
> with it?

IIRC, there's some kind of hook in LVM that unmounts the filesystems,
but I can't recall the details now. Judging from the end result, the
filesystems are unmounted, and perhaps that's why you see
"/mnt/test/mnt_081: not mounted" (this error message is redirected to
/dev/null in the test).

> With a "sleep 1" added before the umount call the test passes reliably
> for me, but that seems like papering over the issue.

Do you have any preference on this?

Thanks,
Eryu
* Re: trouble with generic/081
From: Dave Chinner @ 2016-12-15  6:36 UTC
To: Christoph Hellwig; +Cc: eguan, fstests

On Wed, Dec 14, 2016 at 08:43:14AM -0800, Christoph Hellwig wrote:
> Hi Eryu,
>
> I'm running into a fairly reproducible issue with generic/081 (about
> every other run): for some reason the umount call in _cleanup doesn't
> do anything because it thinks the file system isn't mounted, but then
> vgremove complains that there is a mounted file system. This leads to
> the scratch device not being released and all subsequent tests failing.

Yup, been seeing that on my pmem test setup for months. Reported along
with the subsequent LVM configuration fuckup it resulted in:

https://www.redhat.com/archives/dm-devel/2016-July/msg00405.html

> Here is the output if I let the commands in _cleanup print to stdout:
>
>     QA output created by 081
>     Silence is golden
>     umount: /mnt/test/mnt_081: not mounted
>       Logical volume vg_081/snap_081 contains a filesystem in use.
>       PV /dev/sdc belongs to Volume Group vg_081 so please use vgreduce first.
>
> You added a comment in _cleanup that says:
>
>     # lvm may have umounted it on I/O error, but in case it does not
>
> Does LVM really unmount filesystems on its own? Could we be racing
> with it?

Nope, I'm pretty sure it's a snapshot lifecycle issue - the snapshot is
still busy doing something (probably IO) for a short while after we
unmount, so LVM can't tear it down immediately like we ask. Wait a few
seconds, the snapshot work finishes, goes idle, and then it can be torn
down.

But if you consider the fuckup that occurs if generic/085 starts up and
tries to reconfigure LVM while the snapshot from generic/081 is still in
this whacky window (as reported in the above link), this is really quite
a nasty bug.

> With a "sleep 1" added before the umount call the test passes reliably
> for me, but that seems like papering over the issue.

Yup, same here. My local patch is this:

---
 tests/generic/081 | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tests/generic/081 b/tests/generic/081
index 11755d4d89ff..ff33ffaa4fb8 100755
--- a/tests/generic/081
+++ b/tests/generic/081
@@ -36,6 +36,11 @@ _cleanup()
 	rm -f $tmp.*
 	# lvm may have umounted it on I/O error, but in case it does not
 	$UMOUNT_PROG $mnt >/dev/null 2>&1
+
+	# on a pmem device, the vgremove/pvremove commands fail immediately
+	# after unmount. Wait a bit before removing them in the hope it
+	# succeeds.
+	sleep 5
 	$LVM_PROG vgremove -f $vgname >>$seqres.full 2>&1
 	$LVM_PROG pvremove -f $SCRATCH_DEV >>$seqres.full 2>&1
 }

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
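[A bounded retry is one way to avoid both a fixed sleep and an unbounded
wait: keep retrying vgremove until the snapshot goes idle, giving up after
a few attempts. This is only a sketch against the harness variables from
the patch; the defaults and the 3-attempt bound are placeholders and have
not been tested against the real fstests harness:]

```shell
#!/bin/sh
# Sketch: retry vgremove until the snapshot has gone idle, instead of
# sleeping a fixed 5 seconds. LVM_PROG/vgname mirror the fstests harness
# variables; the defaults below exist only so the sketch is self-contained.
LVM_PROG=${LVM_PROG:-lvm}
vgname=${vgname:-vg_081}

tries=0
until $LVM_PROG vgremove -f "$vgname" >/dev/null 2>&1; do
    tries=$((tries + 1))
    if [ "$tries" -ge 3 ]; then
        break   # give up after a few seconds rather than hang forever
    fi
    sleep 1
done
```

[On a healthy run this returns as soon as the teardown succeeds, so the
common case pays no penalty, unlike an unconditional sleep.]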
* Re: trouble with generic/081
From: Christoph Hellwig @ 2016-12-15  8:42 UTC
To: Dave Chinner; +Cc: eguan, fstests, dm-devel

On Thu, Dec 15, 2016 at 05:36:50PM +1100, Dave Chinner wrote:
> Yup, same here. My local patch is this:

I have a sleep 1 before the unmount. To be honest, this LVM behavior of
auto-unmounting on error seems like a huge mess; I wonder if there is a
way to disable it?

Even on a production system I'd much rather have a shut-down XFS fs than
LVM trying to unmount, probably hanging because there are busy fds on
the fs - and if not, the application might now write to another fs
because of this. It's just an amazingly stupid idea.
* Re: trouble with generic/081
From: Zdenek Kabelac @ 2016-12-15  9:16 UTC
To: Christoph Hellwig, Dave Chinner; +Cc: eguan, fstests, dm-devel

On 15.12.2016 09:42, Christoph Hellwig wrote:
> I have a sleep 1 before the unmount. To be honest, this LVM behavior of
> auto-unmounting on error seems like a huge mess; I wonder if there is a
> way to disable it?
>
> Even on a production system I'd much rather have a shut-down XFS fs than
> LVM trying to unmount, probably hanging because there are busy fds on
> the fs - and if not, the application might now write to another fs
> because of this. It's just an amazingly stupid idea.

Hi

So let me explain the logic behind this 'amazingly stupid' idea.

The full pool is by far the worst case, and ATM lvm2 has only a single
policy (though it has had it for many, many years): prevent a bigger
disaster by intentionally causing a smaller one - and sometimes it
works, since typically, if apps have open descriptors and can't open new
ones, the apps die, descriptors are closed, the FS is unmounted...

(Originally we wanted to be even more 'brutal' - and replace such thin
devices with error targets...)

Unfortunately remounting the filesystem read-only will not ensure there
are no further writes (journal replays, endless retries by XFS to store
unwritten blocks...), but it could possibly be the better choice.

What is worth emphasizing here: 95% fullness of the thin-pool (which is
the moment where the unmount will be tried) is simply a failure of the
admin - it should never get to that fullness in the first place, plain
and simple.

Thin-pool is NOT designed to be commonly operated at the corner cases -
if that's your common use scenario, you've picked the wrong technology;
it doesn't fit. We have already seen use scenarios where a 100% full
pool was meant to be part of the regular workflow...

If you do monitor with a 70% threshold and you already have a pool over
95% full, there is nearly zero chance it will get fixed by some miracle.

Filesystems are not yet ready to deal well with a full thin-pool - that
is a plain real-world fact. They will destroy themselves totally when
they suddenly face massive numbers of random sector errors. It may cause
serious, possibly even irreparable, damage in the case of XFS. ext4 is
ahead here, with years of proven remount-read-only-on-error behavior,
and only very recently has XFS gained something close to it.

But of course we are open to proposals - what better should we do so as
not to be 'amazingly stupid' (just note that trying to use a full
thin-pool is not 'amazingly smart') - this is not comparable to the case
of a single full filesystem.

We do already have some plans to call 'fstrim' at certain points - but
we still need something as an emergency stop for the filesystem before
it hits the full-pool wall, because that is the case which always means
data loss and is non-trivial to recover from.

For each such proposal it's worth opening an upstream Bugzilla.

Regards

Zdenek
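[For reference, the monitoring thresholds Zdenek describes live in
lvm.conf and drive dmeventd's thin plugin; a minimal sketch of the
relevant knobs (the values shown are illustrative, not recommendations -
the 95% unmount behavior itself is built into the plugin, not set here):]

```
# /etc/lvm/lvm.conf (fragment) -- dmeventd thin-pool monitoring knobs
activation {
    # Start auto-extending the thin pool once it crosses 70% full...
    thin_pool_autoextend_threshold = 70
    # ...growing it by 20% of its current size each time.
    thin_pool_autoextend_percent = 20
}
```

[With the threshold at its default of 100, auto-extension is disabled and
the pool can run all the way into the 95% emergency path discussed above.]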
* Re: trouble with generic/081
From: Christoph Hellwig @ 2016-12-16  8:15 UTC
To: Zdenek Kabelac; +Cc: Dave Chinner, eguan, fstests, dm-devel

On Thu, Dec 15, 2016 at 10:16:23AM +0100, Zdenek Kabelac wrote:
> So let me explain the logic behind this 'amazingly stupid' idea.

And that logic doesn't make any sense at all. Invisibly unmounting a
file system behind the user's back is actively harmful, as it
contradicts the principle of least surprise, and the xfstests mess is
one simple example of it. Add an in-kernel callback to tell the fs to
shut down, but do NOT unmount and expose the namespace below it, which
the administrator has probably intentionally hidden.

Even worse, unmount may trigger further writes, and with fses not
handling them the fs might now be stuck after being detached from the
namespace, without a way for the admin to detect or recover from this.

What XFS did on IRIX was to let the volume manager call into the fs and
shut it down. At that point no further writes are possible, but we do
not expose the namespace under the mount point, and the admin can fix
the situation with all the normal tools.
* Re: trouble with generic/081
From: Eric Sandeen @ 2017-01-04 23:03 UTC
To: Christoph Hellwig, Zdenek Kabelac; +Cc: Dave Chinner, eguan, fstests, dm-devel

On 12/16/16 2:15 AM, Christoph Hellwig wrote:
> What XFS did on IRIX was to let the volume manager call into the fs
> and shut it down. At that point no further writes are possible, but we
> do not expose the namespace under the mount point, and the admin can
> fix the situation with all the normal tools.

<late to the party>

Is there a need for this kind of call-up now that xfs has configurable
error handling, so that it will shut down after X retries or Y seconds
of a persistent error?

-Eric
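[The configurable error handling Eric refers to is exposed through sysfs
on 4.7+ kernels and documented in Documentation/filesystems/xfs.txt under
"Error Handling". A sketch of how it might be tuned; the device name
"dm-3" and the values are hypothetical, and the sketch only prints the
intended writes rather than touching sysfs, so it is safe to run anywhere:]

```shell
#!/bin/sh
# Sketch: tune XFS per-error-class retry behaviour for a thin LV.
# "dm-3" is a placeholder - substitute the dm node backing your filesystem.
dev=dm-3
base=/sys/fs/xfs/$dev/error

cmds=""
# Give up on persistent ENOSPC metadata errors after 5 retries instead of
# retrying indefinitely, and fail remaining I/O at unmount instead of
# hanging there.
for knob in "metadata/ENOSPC/max_retries 5" "fail_at_unmount 1"; do
    set -- $knob
    cmds="$cmds
echo $2 > $base/$1"
done
printf '%s\n' "$cmds"
```

[A per-class retry_timeout_seconds knob exists alongside max_retries, for
bounding the retries by time rather than count.]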
* Re: trouble with generic/081
From: Zdenek Kabelac @ 2017-01-05 10:35 UTC
To: Eric Sandeen, Christoph Hellwig; +Cc: Dave Chinner, eguan, fstests, dm-devel

On 5.1.2017 00:03, Eric Sandeen wrote:
> <late to the party>
>
> Is there a need for this kind of call-up now that xfs has configurable
> error handling, so that it will shut down after X retries or Y seconds
> of a persistent error?

We likely need to open an RFE Bugzilla here - and specify how it should
work when certain conditions are met.

The current 'best effort' tries to minimize damage by attempting a full
stop when the pool approaches 95% fullness. That is relatively low/small
for a small thin-pool, but for a commonly sized thin-pool it leaves
reasonably big free space to flush most of the page cache to disk before
things go crazy.

Now - we could probably detect the kernel version and the xfs/ext4
features present, and change the reaction accordingly.

What I expect from this BZ is: how to detect things, and what the 'best'
thing to do is.

I'm clearly not an expert on all filesystems and all their features, but
lvm2 needs to work reasonably well across all variants of kernels and
filesystems - so we cannot say to the user: we now require you to use
the latest 4.10 kernel with these features enabled, otherwise all your
data could be lost.

We need to know what to do with 3.x kernels, 4.x kernels, and the
features present in the kernel, and how we can detect them at runtime.

Regards

Zdenek
* Re: trouble with generic/081
From: Mike Snitzer @ 2017-01-05 16:26 UTC
To: Zdenek Kabelac; +Cc: Eric Sandeen, Christoph Hellwig, dm-devel, Dave Chinner, eguan, fstests

On Thu, Jan 05 2017 at  5:35am -0500, Zdenek Kabelac <zkabelac@redhat.com> wrote:
> I'm clearly not an expert on all filesystems and all their features, but
> lvm2 needs to work reasonably well across all variants of kernels and
> filesystems - so we cannot say to the user: we now require you to use
> the latest 4.10 kernel with these features enabled, otherwise all your
> data could be lost.
>
> We need to know what to do with 3.x kernels, 4.x kernels, and the
> features present in the kernel, and how we can detect them at runtime.

No, we need to fix upstream. It is the job of distros to sort out other
solutions. And yeah, I appreciate that you need to worry about distro X,
Y, Z from Red Hat, but lvm2 needs to not be so cute about things.

And there has been significant progress on XFS's error handling, so that
it no longer hangs in the face of ENOSPC.

Eric says the basics are documented in Documentation/filesystems/xfs.txt
under "Error Handling".

The bottom line is that lvm2 really has no business unmounting the
filesystem. That lvm2 "feature" should be reverted. It isn't what anyone
would expect, and it only serves to create problems. Nice intent, but it
was a misfire to implement it. At most, a discard should be issued once
you cross a threshold.
* Re: trouble with generic/081
From: Zdenek Kabelac @ 2017-01-05 17:42 UTC
To: Mike Snitzer; +Cc: Eric Sandeen, Christoph Hellwig, dm-devel, Dave Chinner, eguan, fstests

On 5.1.2017 17:26, Mike Snitzer wrote:
> The bottom line is that lvm2 really has no business unmounting the
> filesystem. That lvm2 "feature" should be reverted. It isn't what
> anyone would expect, and it only serves to create problems. Nice
> intent, but it was a misfire to implement it. At most, a discard should
> be issued once you cross a threshold.

Users are mostly using thin LVs with ext4 and XFS filesystems.
Lots of users use quite ancient kernels (<4.x).

When lvm2 decides to UNMOUNT volumes, it's the moment where everything
else HAS failed (mainly, the admin has failed to provide the promised
space), and it should be seen as a mild OOPS replacement.

Un/fortunately lvm2 does care about older distributions and kernels - so
unlike much other 'modern' software, you can take a recent lvm2, compile
& run it on a several-years-old system, and it does its best - so far
I'd call that a feature.

I'm not really sure what you mean by leaving this to 'distro'
maintainers, since these are typically only able to run 'configure &&
make', without any deep idea about configurable lvm2 internals.

--

Now, there is no objection to adding configurable behavior for such
cases (policies) - here you are 'breaking into open doors', since we
have planned to provide these for some time (I think there are a couple
of BZs already).

So far I understand that the admin HAS to configure the 'better XFS
logic' on the filesystem himself - it's not a granted default so far,
even on a 4.10 kernel?

On the lvm2 side we first need to convert plugin code executions to
lvconvert policies (I will make a BZ for this one).

Regards

Zdenek
* Re: trouble with generic/081
From: Mike Snitzer @ 2017-01-05 18:07 UTC
To: Zdenek Kabelac; +Cc: Eric Sandeen, Christoph Hellwig, dm-devel, Dave Chinner, eguan, fstests

On Thu, Jan 05 2017 at 12:42pm -0500, Zdenek Kabelac <zkabelac@redhat.com> wrote:
> So far I understand that the admin HAS to configure the 'better XFS
> logic' on the filesystem himself - it's not a granted default so far,
> even on a 4.10 kernel?
>
> On the lvm2 side we first need to convert plugin code executions to
> lvconvert policies (I will make a BZ for this one).

lvm2 isn't the one that needs to fix this! But it certainly should not
be doing an unmount. A discard at most. Why is discard kernel-version
dependent?

You're making an historically annoying problem even more so. Please
stop this.
* Re: trouble with generic/081 2017-01-05 17:42 ` Zdenek Kabelac 2017-01-05 18:07 ` Mike Snitzer @ 2017-01-05 18:40 ` Eric Sandeen 1 sibling, 0 replies; 12+ messages in thread From: Eric Sandeen @ 2017-01-05 18:40 UTC (permalink / raw) To: Zdenek Kabelac, Mike Snitzer Cc: Christoph Hellwig, dm-devel, Dave Chinner, eguan, fstests On 1/5/17 11:42 AM, Zdenek Kabelac wrote: > Dne 5.1.2017 v 17:26 Mike Snitzer napsal(a): ... >> Bottomline is lvm2 really has no business unmounting the filesystem. >> That lvm2 "feature" should be reverted. It isn't what anyone would >> expect. And it only serves to create problems. Nice intent but it was >> a misfire to implement it. At most a discard should be issued once you >> cross a threshold. Agreed. > > > Users are mostly using thin LVs with ext4 and XFS filesytems. > Lots of users do use quite ancient kernels (<4.X). Yes, old kernels have bugs. Enterprise distros should fix those bugs, configure && make distros and roll-your-own users get what they get, IMHO. > When lvm2 decides to UNMOUNT volumes - it's the moment where > everything else HAS failed (mainly Admin has failed to provide > promised space) > And it should be seen as mild OOPS replacement. unmounting is a bad idea. It's not even guaranteed to complete, because unmount may well require IO. It has undesirable and unpredictable impacts on the system. At a bare minimum, it should not be the default behavior. > > Un/fortunately lvm2 does care about older distributions and kernels - > so unlike many other 'modern' software you can take recent lvm2 - > compile & run on several years system and it does its best - so far > I'd call it a feature. It's great that lvm2 stays compatible - xfsprogs does too, FWIW - but trying to work around every old bad behavior is likely to be a fool's errand, IMHO. 
> Not really sure what do you mean by - leaving this on 'distro'
> maintainers since these are typically able to run 'configure &
> make', without no big idea about configurable lvm2 internals.

But unmounting /does not/ solve that for them.  It's bad behavior.
It's far better IMHO to let xfs spew errors to the log for a day than
to unmount a volume and then fill the root fs with ongoing IO, for
example.

> --
>
> Now there is no objection about adding configurable behavior on such
> cases (policies) here you are 'breaking to the open doors' since we
> do plan to provide these for some time (I think there are couple BZs
> already)
>
> So far I understand that admin HAS to configure 'better XFS logic'
> himself on filesystem - it's not granted default so far even on 4.10
> kernel ?

Ok, back to concrete things:

Right, XFS behavior is tunable, but largely unchanged other than a
better behavior at unmount.  Default behavior in the face of IO errors
is otherwise unchanged.

Not all the world is thinp; there are other error conditions and cases
which may warrant different responses.  As such, the default remains to
keep trying IO unless something critical fails, which will then shut
down the filesystem.

That said, lvm2 offering to configure such behavior for xfs on a thin
volume might be quite reasonable.

I really think we need a robust test framework for all of this so we
can move beyond anecdotes and assumptions, and get some repeatable
regression tests going.

-Eric

> On lvm2 side we need to first convert plugin code executions to
> lvconvert policies (will make BZ for this one).
>
> Regards
>
> Zdenek

^ permalink raw reply	[flat|nested] 12+ messages in thread
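[The tunable XFS error behavior Eric describes lives in per-device sysfs knobs that landed around v4.7. A hypothetical sketch - the device name "dm-3" is a placeholder, and the exact knob set should be checked against the XFS documentation for your kernel:]

```conf
# /sys/fs/xfs/<device>/error/ -- illustrative, run as root on a mounted fs.
#
# Default: retry failed metadata writeback forever (-1), i.e. "keep
# trying IO" as described above:
#   cat /sys/fs/xfs/dm-3/error/metadata/ENOSPC/max_retries
#
# The "better behavior at unmount": cancel pending retries at unmount
# time so an unmount over a full thin pool cannot hang indefinitely:
#   echo 1 > /sys/fs/xfs/dm-3/error/fail_at_unmount
#
# Opt in to giving up (and shutting down the fs) after bounded retries
# rather than retrying forever:
#   echo 5   > /sys/fs/xfs/dm-3/error/metadata/ENOSPC/max_retries
#   echo 300 > /sys/fs/xfs/dm-3/error/metadata/ENOSPC/retry_timeout_seconds
```

[This is the configuration an lvm2 policy could plausibly apply on behalf of the admin for XFS on a thin volume, as suggested above.]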
end of thread, other threads:[~2017-01-05 18:40 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz  follow: Atom feed
-- links below jump to the message on this page --
2016-12-14 16:43 trouble with generic/081 Christoph Hellwig
2016-12-15  6:29 ` Eryu Guan
2016-12-15  6:36 ` Dave Chinner
2016-12-15  8:42   ` Christoph Hellwig
2016-12-15  9:16     ` Zdenek Kabelac
2016-12-16  8:15       ` Christoph Hellwig
2017-01-04 23:03         ` Eric Sandeen
2017-01-05 10:35           ` Zdenek Kabelac
2017-01-05 16:26             ` Mike Snitzer
2017-01-05 17:42               ` Zdenek Kabelac
2017-01-05 18:07                 ` Mike Snitzer
2017-01-05 18:40                 ` Eric Sandeen