Message-ID: <5204A0EF.7010803@pse-consulting.de>
Date: Fri, 09 Aug 2013 09:57:35 +0200
From: Andreas Pflug
To: linux-lvm@redhat.com
Subject: Re: [linux-lvm] Missing error handling in lv_snapshot_remove
In-Reply-To: <52036C86.3040702@redhat.com>
References: <20130806173719.GB15184@mail.waldi.eu.org> <520211BB.2040301@pse-consulting.de> <5202164B.5010302@redhat.com> <52028170.1010000@pse-consulting.de> <52036C86.3040702@redhat.com>
List-Id: LVM general discussion and development

On 08.08.13 12:01, Zdenek Kabelac wrote:
> On 7.8.2013 19:18, Andreas Pflug wrote:
>> On 08/07/13 11:41, Zdenek Kabelac wrote:
>>> On 7.8.2013 11:22, Andreas Pflug wrote:
>>>> On 06.08.13 19:37, Bastian Blank wrote:
>>>>> Hi
>>>>>
>>>>> I tried to tackle a particular bug that has been showing up in Debian
>>>>> for some time now. Some blamed the udev rules and I still can't
>>>>> completely rule them out. But this triggers a much worse bug in the
>>>>> error cleanup of the snapshot remove. I reproduced this with
>>>>> Debian/Linux 3.2.46/LVM 2.02.99 without udevd running and
>>>>> Fedora 19/LVM 2.02.98-10.fc19.
>>>>>
>>>>> On snapshot removal, LVM first converts the device into a regular LV
>>>>> (lv_remove_snapshot) and in a second step removes this LV
>>>>> (lv_remove_single). Is there a reason for this two-step removal? An
>>>>> error during removal leaves a non-snapshot LV behind.
>>>> Ah, this explains why my backup sometimes stops: I take a snapshot,
>>>> rsync the stuff and remove the snapshot with a daily cron job, but I
>>>> observed twice that a non-snapshot volume named like a backup snapshot
>>>> was lingering around, preventing the script from working. So this is no
>>>> exotic corner case, but happens in real life.
>>>>
>>>> I have been observing this since I dist-upgraded to wheezy.
>>>>
>>>
>>> Because Debian is using non-upstream udev rules.
>>>
>>> With upstream udev rules and standard real-life use, this situation
>>> cannot happen, since these rules are constructed to play better with
>>> the udev WATCH rule.
>>
>> Hm, does udev play a role in this at all? Without having dived into the
>> code, I'd assume udev only has to do with creation and deletion of
>> /dev/mapper/... and/or /dev/vgname/... devices (upon lvchange -aX), but
>> not with LVM metadata manipulation.
>
>
> Udev attempts to update its device database after any change event
> (you can observe its work with udevadm monitor).
>
> So in your case: you unmount the filesystem -> close the device -> this
> fires a WATCH event with some randomly delayed (systemd)udevd scan
> mechanism - so at an unpredictable moment blkid opens the device and scans
> its sectors (keeping the device open and interfering with the deactivate
> operation). For these short-lived opens there is now a built-in retry which
> tries to deactivate the device several times when it is known that the
> device is not mounted.

So in order to harden my script against this problem, I should
deactivate the volume explicitly, wait a while and then remove it?

Regards,
Andreas
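
To make the "wait a while and retry" idea concrete, here is a minimal sketch of how such a step could look in a backup script. It is only an illustration under assumptions, not something confirmed in this thread: the helper name run_with_retry, the retry count and the delay are all made up, and using "udevadm settle" to wait for pending udev events is my own choice rather than a recommendation from the discussion above.

```python
import subprocess
import time


def run_with_retry(cmd, retries=5, delay=2.0, run=subprocess.call, sleep=time.sleep):
    """Run cmd until it exits with status 0, giving udev time between attempts.

    Hypothetical helper: the name, retry count and delay are illustrative.
    Returns True on success, False once all attempts have failed.
    """
    for attempt in range(retries):
        # Wait for udev to finish processing queued events (for example the
        # blkid scan triggered by the WATCH rule) before touching the device.
        run(["udevadm", "settle"])
        # A non-zero exit status is treated as "device still busy, try again".
        if run(cmd) == 0:
            return True
        if attempt < retries - 1:
            sleep(delay)
    return False
```

With a hypothetical snapshot vg0/backup_snap, the removal step would be run_with_retry(["lvremove", "-f", "vg0/backup_snap"]). The same helper could wrap an explicit "lvchange -an" step first, provided deactivating the snapshot on its own is appropriate for the LVM version in use, which this thread does not settle.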