* semop failed for cookie?
From: Douglas McClendon @ 2010-04-27 20:56 UTC
To: dm-devel; +Cc: Sebastian Dziallas, satellit

Hi,

I have a user of an installation tool of mine that is hitting this
message, with a very recent pre-fedora-13 kernel.

  zyx-liveinstaller-cli: creating temporary rootfs virtual duplicate
  device-mapper: resume ioctl failed: Invalid argument
  semid 65536: semop failed for cookie 0xd4d423c: incorrect semaphore state
  Failed to set a proper state for notification semaphore identified by
  cookie value 223167036 (0xd4d423c) to initialize waiting for incoming
  notifications.
  Command failed
  zyx-liveinstaller-cli: error: failed to create temporary rootfs virtual duplicate

The error is happening at line 742 in this monster bash script (which
I'm pretty sure does work on f11 and f12)

http://filteredperception.org/dawg/projects/zyx-liveinstaller/src/zyx-liveinstaller-latest/rli/zyx-liveinstaller-cli.html

http://filteredperception.org/dawg/projects/zyx-liveinstaller

Now, I've already urged the sugar on a stick project to primarily
pursue the path of the standard fedora livecd/usb installer, for many
additional reasons.

But, I guess I would still like my rebootless installer to actually
work. And this may be one of those great corner case issues that
developers love. I.e. a very obscure tickling of some code path.

Or maybe I'm just being too lazy to read the accompanying dmsetup man
page for some clue as to what I should be doing when I see such a
bizarre error message. Or, some other bug in the script is causing the
arguments to become malformed. Still, the crypticness of the message
from devicemapper suggests something you folks might want to improve.

In any event, FWIW, there it is...

-dmc

* Re: semop failed for cookie?
From: Alasdair G Kergon @ 2010-04-27 22:33 UTC
To: device-mapper development; +Cc: Sebastian Dziallas, satellit

On Tue, Apr 27, 2010 at 03:56:57PM -0500, Douglas McClendon wrote:
> I have a user of an installation tool of mine that is hitting this
> message, with a very recent pre-fedora-13 kernel.

udev is now involved in this process.
Check they have up-to-date lvm2 and udev packages and that they've not
tried to customise their udev rules - if they have, you'll need to
check their changes didn't break things.

Big script.

Debug it by adding lines to dump the state immediately before the
problem command, then immediately after it.

Dump state by running 'dmsetup info -c', 'dmsetup table',
'dmsetup status' and 'dmsetup udevcookies'.

If that still doesn't help, break the 'dmsetup create' command down
into its three constituent commands (dmsetup create --notable,
dmsetup load, dmsetup resume) and dump the state between each of them
and confirm which is failing.

Alasdair
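
A minimal sketch of the debugging procedure suggested above, not taken from the installer script itself. The function names `dump_dm_state` and `create_in_steps` and the `DMSETUP` override variable are hypothetical; only the dmsetup subcommands named in the message are used. Setting `DMSETUP=echo` gives a dry run that prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: DMSETUP can be overridden (e.g. DMSETUP=echo)
# to print the commands rather than execute them.
DMSETUP="${DMSETUP:-dmsetup}"

# Print the four state dumps named in the message above.
dump_dm_state() {
    label="$1"
    echo "=== dm state: $label ==="
    $DMSETUP info -c
    $DMSETUP table
    $DMSETUP status
    $DMSETUP udevcookies
}

# Split 'dmsetup create NAME --table TABLE' into its three constituent
# commands, dumping state around each one so the failing step shows up.
create_in_steps() {
    name="$1"
    table="$2"
    dump_dm_state "before create --notable"
    $DMSETUP create "$name" --notable || { echo "FAILED: create --notable" >&2; return 1; }
    dump_dm_state "after create --notable"
    $DMSETUP load "$name" --table "$table" || { echo "FAILED: load" >&2; return 1; }
    dump_dm_state "after load"
    $DMSETUP resume "$name" || { echo "FAILED: resume" >&2; return 1; }
    dump_dm_state "after resume"
}
```

Called as `create_in_steps rootfs-copy "$table"`, where `$table` is whatever table string the script was passing to the failing 'dmsetup create', this should show which of the three steps reports the error.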

* Re: semop failed for cookie?
From: Douglas McClendon @ 2010-04-28 3:52 UTC
To: dm-devel

On 04/27/2010 05:33 PM, Alasdair G Kergon wrote:
> On Tue, Apr 27, 2010 at 03:56:57PM -0500, Douglas McClendon wrote:
>> I have a user of an installation tool of mine that is hitting this
>> message, with a very recent pre-fedora-13 kernel.
>
> udev is now involved in this process.
> Check they have up-to-date lvm2 and udev packages and that they've not
> tried to customise their udev rules - if they have, you'll need to
> check their changes didn't break things.

Thanks for the reply and the advice. I'm not so interested in the
issue that I'll necessarily get to it very soon, but what you said will
no doubt help.

One thing though, which may not have been obvious, and almost sounds
dubious, is what I'm actually doing there. I'll try to describe it in
words here, to see if this shouts out as a situation that may no longer
be expected to work (because honestly I was probably pretty pleased and
half-surprised to discover that it did work 3 years ago)-

Basically the livecd mode you should be familiar with. ext3 image on a
loop device. cow file in tmpfs on a loop device. Combined with
dm-snapshot, resulting in what is used as the rootfs device. Simple
enough.

So what I do (and this is dusty code I haven't paid attention to in a
long time, so maybe I'm misunderstanding my own code, but probably not)
is this-

1) with that dmsetup create that is now failing, I first create a
   duplicate device (different name, same table) of the one that is
   the rootfs. I.e. another snapshot device with the same
   components/table.

2) I use a reload --table on the device that is the rootfs, to replace
   it with a new table, that is a mirror of the device created in (1)
   and the target normal hard disk partition that the script is
   installing the OS to.

3) I do a resume on the rootfs device such that the new table with the
   mirror activates, and the migration starts to occur.

4) when the mirror completes, I do another reload then resume with a
   new linear table pointing to the newly installed fs on the normal
   disk partition. Then I tear down all the unused original devices.

So, if something about this description screams out- the new udev
semantics will prevent (1) from working, let me know.

> Big script.
>
> Debug it by adding lines to dump the state immediately before the
> problem command, then immediately after it.
>
> Dump state by running 'dmsetup info -c', 'dmsetup table',
> 'dmsetup status' and 'dmsetup udevcookies'.
>
> If that still doesn't help, break the 'dmsetup create' command down
> into its three constituent commands (dmsetup create --notable,
> dmsetup load, dmsetup resume) and dump the state between each of them
> and confirm which is failing.

Sounds good. Again, what I'm doing with two devices with the same
table smells like something that might have been inadvertently allowed
before and now not. Or maybe other people do it all the time for other
reasons I'm not considering right now.

-dmc

* Re: semop failed for cookie?
From: Douglas McClendon @ 2010-04-28 23:11 UTC
To: dm-devel

On 04/27/2010 05:33 PM, Alasdair G Kergon wrote:
> On Tue, Apr 27, 2010 at 03:56:57PM -0500, Douglas McClendon wrote:
>> I have a user of an installation tool of mine that is hitting this
>> message, with a very recent pre-fedora-13 kernel.
>
> udev is now involved in this process.
> Check they have up-to-date lvm2 and udev packages and that they've not
> tried to customise their udev rules - if they have, you'll need to
> check their changes didn't break things.
>
> Big script.
>
> Debug it by adding lines to dump the state immediately before the
> problem command, then immediately after it.

Actually, I just grabbed the latest soas nightly livecd build, which
for these purposes should presumably be considered the same as rawhide.

I tried manually to do what I described. I.e. make a duplicate (same
table, different name) snapshot device.

Interestingly, I'm not seeing the semop cookie thing, but now after the
'resume ioctl failed' message, I checked dmesg, and I'm seeing-

device-mapper: snaphots: Unable to perform snapshot handover until
source is suspended.

Also, this is under virtualization, which, as with other fedora dev
builds I've seen, runs bizarrely slowly. I.e. I had a couple of text
root logins time out because it didn't finish whatever it needed to
finish in 60 seconds. And while booting I saw dozens of weird udev
failure messages.

But I'm thinking that may have nothing to do with the issue, and hoping
the above message elicits an explanation. I.e. is what I'm doing
somehow inadvertently utilizing the new snapshot merging semantics even
though it wasn't before, and for my purposes shouldn't?

-dmc

> Dump state by running 'dmsetup info -c', 'dmsetup table',
> 'dmsetup status' and 'dmsetup udevcookies'.
>
> If that still doesn't help, break the 'dmsetup create' command down
> into its three constituent commands (dmsetup create --notable,
> dmsetup load, dmsetup resume) and dump the state between each of them
> and confirm which is failing.
>
> Alasdair

* Re: semop failed for cookie?
From: Alasdair G Kergon @ 2010-04-29 0:00 UTC
To: Douglas McClendon; +Cc: dm-devel

On Wed, Apr 28, 2010 at 06:11:46PM -0500, Douglas McClendon wrote:
> device-mapper: snaphots: Unable to perform snapshot handover until
> source is suspended.

It has never been OK to have the same snapshot metadata in use
simultaneously in two targets at once (because of caching in memory).
It's the responsibility of userspace to adhere to the correct semantics
or live with the potential data corruption if they are violated. It
sounds like your process may fall into that second category.

Part of the process of adding snapshot merging support involved
providing a controlled method for handing over the snapshot metadata
from one target instance to another.

If you are trying to move a snapshot from one target to another, then
you must either deactivate the snapshot first (older kernels) or (newer
kernels) make use of the 'snapshot handover' mechanism as the message
suggests.

Alasdair

* Re: semop failed for cookie?
From: Douglas McClendon @ 2010-04-29 3:32 UTC
To: dm-devel

On 04/28/2010 07:00 PM, Alasdair G Kergon wrote:
> On Wed, Apr 28, 2010 at 06:11:46PM -0500, Douglas McClendon wrote:
>> device-mapper: snaphots: Unable to perform snapshot handover until
>> source is suspended.
>
> It has never been OK to have the same snapshot metadata in use
> simultaneously in two targets at once (because of caching in memory).
> It's the responsibility of userspace to adhere to the correct semantics
> or live with the potential data corruption if they are violated. It
> sounds like your process may fall into that second category.

Yeah, I had a hunch I was 'getting away with something'. I.e. I wasn't
acutely aware of the 3 distinguishable phases of the create. In my
case, literally nothing would happen while both instances were 'live',
except the handoff. And it seemed to work very reliably. I.e. I would
do

  dmsetup create --table="same as rootfs's table (a snapshot)" rootfs-copy
  dmsetup reload --table="mirror, goodside rootfs-copy, badside harddisk" rootfs
  dmsetup resume rootfs

I guess I'll have to learn the new snapshot handover stuff. Just let
me know if you suspect some impossibility here. Or if there is a magic
--live-dangerously flag I can use :)

-dmc

> Part of the process of adding snapshot merging support involved
> providing a controlled method for handing over the snapshot metadata
> from one target instance to another.
>
> If you are trying to move a snapshot from one target to another, then
> you must either deactivate the snapshot first (older kernels) or (newer
> kernels) make use of the 'snapshot handover' mechanism as the message
> suggests.
>
> Alasdair

* Re: semop failed for cookie?
From: Alasdair G Kergon @ 2010-04-29 16:23 UTC
To: Douglas McClendon; +Cc: dm-devel

On Wed, Apr 28, 2010 at 10:32:54PM -0500, Douglas McClendon wrote:
> case, literally nothing would happen while both instances were 'live',

If you had no data written to the snapshot or origin while both
were loaded, you might have got away with it. But I think the new
handover code gives you a safe and supported mechanism now.

Alasdair

* Re: semop failed for cookie?
From: Douglas McClendon @ 2010-05-03 1:36 UTC
To: dm-devel

On 04/29/2010 11:23 AM, Alasdair G Kergon wrote:
> On Wed, Apr 28, 2010 at 10:32:54PM -0500, Douglas McClendon wrote:
>> case, literally nothing would happen while both instances were 'live',
>
> If you had no data written to the snapshot or origin while both
> were loaded, you might have got away with it. But I think the new
> handover code give you a safe and supported mechanism now.
>
> Alasdair

I've got a "BUG" for you-

So, I tried on a nightly soas livecd iso build, booted under qemu
(should be the same for a rawhide or fedora13 beta i386 livecd iso,
booted to runlevel 1 (for simplicity's sake)).

I tried to suspend the snapshot device holding the rootfs, thinking
that I might be able to do that, then resume the newly created copy of
that device (same table, new name), and then resume the rootfs device
after loading a new table that mirrors the copy device and the
destination partition.

But the instant I suspend the snapshot device containing the rootfs
(dmsetup suspend live-rw), I get-

BUG: lock held when returning to user space!
dmsetup/865 is leaving the kernel with locks still held!
1 lock held by dmsetup/865:
 #0: (&journal->j_barrier){+.+...+}, at: [<c056b84d>] jbd2_journal_lock_updates+0xbd/0xc5

--------------- (manually transcribed) ------------

Now, my first guess as to how to proceed would be to try to get a
statically linked dmsetup copied into a tmpfs. Which, given the
particular target and my lack of enthusiasm for all of this, may take
me some time to try.

Any other advice?

Note, I could craft some manual dmsetup commands to reproduce what I'm
trying to do, that would apply to any fedora-13/soas livecd iso. But
for the sake of argument, let's pretend that all I want to do is to run

(dmsetup suspend live-rw ; dmsetup resume live-rw)

and have the system not fall over dead.

Also, to reiterate, I got that message in dmesg about a problem with
the snapshot handover while trying to use my previously working (but
not guaranteed to work 100%) method. But note that method does not
involve snapshot merging at all, which from the documentation I found
(perhaps I didn't look in all the right places) is the only context in
which snapshot handover is discussed.

Note, I'm not complaining, as this is very low priority for me, but
rather just doing my best to explain the issue.

-dmc

* Snapshot handover working, yippee!, was Re: semop failed for cookie?
From: Douglas McClendon @ 2010-05-05 5:18 UTC
To: dm-devel

Ok, so my complaining about code-progress should now be complete. I
think I'm back in action. But the "BUG: lock held when returning to
user space" still seems like a message you folks wanted yourselves
to hear.

After grumbling to myself that fedora doesn't provide a package with
dmsetup.static, I went ahead and copied all relocatable deps
(/lib/ld-linux.so.2??) to tmpfs along with dmsetup, using
LD_LIBRARY_PATH. I also used --noudevrules and --noudevsync. Then
I followed the documentation I found which, for use cases as
esoteric as mine, might be desirable in snapshot.txt

https://patchwork.kernel.org/patch/59806/

I.e. rules for snapshot handover in the general case. (who knows,
maybe I'll be the only user of non-snapshot-merge snapshot handover
for all time) (though technically I guess you can call what I've
been doing for the last several years to be an alternate form of
snapshot merging. I.e. where your snapshot base is readonly and you
are merging both the readonly base and cow to a third writable
device)

Anyway, I finally got my stuffs working again, or at least, I have a
virtual snapshot-as-rootfs being dm-mirror migrated right now. I
assume that my loading of a new table, is sufficiently equivalent to
the dmsetup remove of the old snapshot as described in the above
link.

But finally, and the real reason for this message- during this I
noticed that the aforementioned "BUG:" was not in fact what was
killing my system. I.e. I still get that message, though everything
else appears to work and it appears to be harmless (I hope).

BUG: lock held when returning to user space!
dmsetup/865 is leaving the kernel with locks still held!
1 lock held by dmsetup/865:
 #0: (&journal->j_barrier){+.+...+}, at: [<c056b84d>] jbd2_journal_lock_updates+0xbd/0xc5

Thanks again for putting up with my corner case complaints.

Cheers,

-dmc
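
The handover sequence described in this thread could be sketched roughly as follows. This is an illustration under stated assumptions, not the installer's actual code: the function name `handover_then_mirror` and the `DMSETUP` and `DM_OPTS` variables are hypothetical, and the snapshot and mirror table strings are left as caller-supplied arguments because the thread never spells them out. The ordering follows the kernel message quoted earlier: the source snapshot is suspended before the copy is resumed.

```shell
#!/bin/sh
# Hypothetical sketch. DMSETUP=echo gives a dry run. The thread used
# --noudevrules and --noudevsync; pass them via DM_OPTS if wanted.
DMSETUP="${DMSETUP:-dmsetup}"
DM_OPTS="${DM_OPTS:-}"

# src:          existing snapshot device holding the rootfs (e.g. live-rw)
# copy:         new device name that takes over the snapshot metadata
# snap_table:   table identical to the one currently loaded in src
# mirror_table: table mirroring copy onto the destination partition
handover_then_mirror() {
    src="$1"; copy="$2"; snap_table="$3"; mirror_table="$4"

    # Create the copy without activating it, then load the same table.
    $DMSETUP create "$copy" --notable $DM_OPTS || return 1
    $DMSETUP load "$copy" --table "$snap_table" || return 1

    # The source must be suspended before the copy is resumed, or the
    # kernel refuses the handover. If src is the rootfs, dmsetup and
    # its libraries must not live on src. A failure after this point
    # leaves src suspended, so real code needs a recovery path.
    $DMSETUP suspend "$src" $DM_OPTS || return 1
    $DMSETUP resume "$copy" $DM_OPTS || return 1

    # Replace the old snapshot table in src with the mirror and resume.
    $DMSETUP load "$src" --table "$mirror_table" || return 1
    $DMSETUP resume "$src" $DM_OPTS || return 1
}
```

Running it first with `DMSETUP=echo` prints the six commands in order, which is a cheap way to review the sequence before trying it on a live system.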

* Re: Snapshot handover working, yippee!, was Re: semop failed for cookie?
From: Mike Snitzer @ 2010-05-05 15:22 UTC
To: Douglas McClendon; +Cc: dm-devel

On Wed, May 05 2010 at 1:18am -0400,
Douglas McClendon <dmc.fedora@filteredperception.org> wrote:

> Ok, so my complaining about code-progress should now be complete. I
> think I'm back in action. But the "BUG: lock held when returning to
> user space" still seems like a message you folks wanted yourselves
> to hear.
>
> After grumbling to myself that fedora doesn't provide a package with
> dmsetup.static, I went ahead and copied all relocatable deps
> (/lib/ld-linux.so.2??) to tmpfs along with dmsetup, using
> LD_LIBRARY_PATH. I also used --noudevrules and --noudevsync. Then
> I followed the documentation I found which, for use cases as
> esoteric as mine, might be desirable in snapshot.txt
>
> https://patchwork.kernel.org/patch/59806/
>
> I.e. rules for snapshot handover in the general case. (who knows,
> maybe I'll be the only user of non-snapshot-merge snapshot handover
> for all time) (though technically I guess you can call what I've
> been doing for the last several years to be an alternate form of
> snapshot merging. I.e. where your snapshot base is readonly and you
> are merging both the readonly base and cow to a third writable
> device)
>
> Anyway, I finally got my stuffs working again, or at least, I have a
> virtual snapshot-as-rootfs being dm-mirror migrated right now. I
> assume that my loading of a new table, is sufficiently equivalent to
> the dmsetup remove of the old snapshot as described in the above
> link.

Glad to hear snapshot cow handover is working well for you.

> But finally, and the real reason for this message- during this I
> noticed that the aforementioned "BUG:" was not in fact what was
> killing my system. I.e. I still get that message, though everything
> else appears to work and it appears to be harmless (I hope).
>
> BUG: lock held when returning to user space!
>
> dmsetup/865 is leaving the kernel with locks still held!
> 1 lock held by dmsetup/865:
>  #0: (&journal->j_barrier){+.+...+}, at: [<c056b84d>] jbd2_journal_lock_updates+0xbd/0xc5

This is an ext4 issue that has been fixed upstream, see:
https://bugzilla.redhat.com/show_bug.cgi?id=568503

and it is staged for inclusion in the next kernel (2.6.35):
http://git.kernel.org/?p=linux/kernel/git/tytso/ext4.git;a=commit;h=74a8f0090293696a716c728f9cd484b45083937f

* Re: semop failed for cookie?
From: Peter Rajnoha @ 2010-04-28 9:38 UTC
To: device-mapper development; +Cc: Sebastian Dziallas, satellit

On 04/27/2010 10:56 PM, Douglas McClendon wrote:
> zyx-liveinstaller-cli: creating temporary rootfs virtual duplicate
> device-mapper: resume ioctl failed: Invalid argument
> semid 65536: semop failed for cookie 0xd4d423c: incorrect semaphore state
> Failed to set a proper state for notification semaphore identified by
> cookie value 223167036 (0xd4d423c) to initialize waiting for incoming
> notifications.

Well, the primary cause is that "resume" ioctl that is failing (can you
trace the exact parameters that are substituted in the script for that
failing dmsetup call?). I think the errors printed afterwards are just
an outcome of this failure.

Anyway, it seems that our internal "_udev_complete" fn is called more
than once on some error path. This call is exactly the same as calling
"dmsetup udevcomplete", but we have to call one internally if any error
occurs while processing a device-mapper task (that generates udev
events). That's because we can't await any notification for failed
ioctls since no udev events will be generated. We need to do that to
prevent infinite waiting for notifications that will never come.

If that internal _udev_complete is called more than necessary, we'll
get into an improper state with the semaphore so that needs to be
fixed! What's the exact version of dmsetup/lvm2 used?

Also, in addition to Alasdair's hints in the other post, could you
please run the failing dmsetup with verbose output "dmsetup -vv ...".
This way we should see how the semaphore is handled throughout
processing.

Peter
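
As a small aid for following the semaphore discussion above, here is a sketch that converts a cookie between the hex and decimal forms that appear in the error message, and captures the output of a verbose dmsetup run for later inspection. The helper names `cookie_to_dec`, `cookie_to_hex` and `trace_dmsetup` and the `DMSETUP` variable are hypothetical; the only dmsetup option used is the `-vv` requested in the message.

```shell
#!/bin/sh
# Hypothetical helpers. DMSETUP can be overridden for a dry run.
DMSETUP="${DMSETUP:-dmsetup}"

# The error message prints the same cookie in hex and in decimal;
# these convert one form to the other so it can be matched against
# the output of 'dmsetup udevcookies'.
cookie_to_dec() { printf '%d\n' "$1"; }
cookie_to_hex() { printf '0x%x\n' "$1"; }

# Run a dmsetup command with -vv, saving stdout and stderr to a log
# file together with the exit status of the command.
trace_dmsetup() {
    log="$1"; shift
    $DMSETUP -vv "$@" >"$log" 2>&1
    status=$?
    echo "exit status: $status" >>"$log"
    return $status
}

cookie_to_dec 0xd4d423c    # prints 223167036
cookie_to_hex 223167036    # prints 0xd4d423c
```

For example, `trace_dmsetup /tmp/resume.log resume rootfs-copy` would rerun only the failing step and keep the verbose trace in one file; the device name here is just the one used elsewhere in the thread.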