* semop failed for cookie?
@ 2010-04-27 20:56 Douglas McClendon
  2010-04-27 22:33 ` Alasdair G Kergon
  2010-04-28  9:38 ` Peter Rajnoha
  0 siblings, 2 replies; 11+ messages in thread
From: Douglas McClendon @ 2010-04-27 20:56 UTC (permalink / raw)
  To: dm-devel; +Cc: Sebastian Dziallas, satellit

Hi,

I have a user of an installation tool of mine that is hitting this 
message, with a very recent pre-fedora-13 kernel.

zyx-liveinstaller-cli: creating temporary rootfs virtual duplicate
device-mapper: resume ioctl failed: Invalid argument
semid 65536: semop failed for cookie 0xd4d423c: incorrect semaphore state
Failed to set a proper state for notification semaphore identified by 
cookie value 223167036 (0xd4d423c) to initialize waiting for incoming 
notifications.
Command failed
zyx-liveinstaller-cli: error: failed to create temporary rootfs virtual 
duplicate

The error is happening at line 742 in this monster bash script (which 
I'm pretty sure does work on f11 and f12)

http://filteredperception.org/dawg/projects/zyx-liveinstaller/src/zyx-liveinstaller-latest/rli/zyx-liveinstaller-cli.html

http://filteredperception.org/dawg/projects/zyx-liveinstaller

Now, I've already urged the sugar on a stick project to primarily pursue 
the path of the standard fedora livecd/usb installer, for many 
additional reasons.  But, I guess I would still like my rebootless 
installer to actually work.  And this may be one of those great corner 
case issues that developers love.  I.e. a very obscure tickling of some 
code path.  Or maybe I'm just being too lazy to read the accompanying 
dmsetup man page for some clue as to what I should be doing when I see 
such a bizarre error message.  Or, some other bug in the script is 
causing the arguments to become malformed.  Still, the crypticness of 
the message from devicemapper suggests something you folks might want to 
improve.  In any event, FWIW, there it is...

-dmc


* Re: semop failed for cookie?
  2010-04-27 20:56 semop failed for cookie? Douglas McClendon
@ 2010-04-27 22:33 ` Alasdair G Kergon
  2010-04-28  3:52   ` Douglas McClendon
  2010-04-28 23:11   ` Douglas McClendon
  2010-04-28  9:38 ` Peter Rajnoha
  1 sibling, 2 replies; 11+ messages in thread
From: Alasdair G Kergon @ 2010-04-27 22:33 UTC (permalink / raw)
  To: device-mapper development; +Cc: Sebastian Dziallas, satellit

On Tue, Apr 27, 2010 at 03:56:57PM -0500, Douglas McClendon wrote:
> I have a user of an installation tool of mine that is hitting this  
> message, with a very recent pre-fedora-13 kernel.

udev is now involved in this process.
Check they have up-to-date lvm2 and udev packages and that they've not
tried to customise their udev rules - if they have, you'll need to
check their changes didn't break things.

Big script.

Debug it by adding lines to dump the state immediately before the problem
command, then immediately after it.

Dump state by running 'dmsetup info -c', 'dmsetup table', 'dmsetup status'
and 'dmsetup udevcookies'.

If that still doesn't help, break the 'dmsetup create' command down into
its three constituent commands (dmsetup create --notable, dmsetup load,
dmsetup resume) and dump the state between each of them and confirm
which is failing.

Alasdair
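
A minimal sketch of the debugging steps above, assuming a POSIX shell.
The names DMSETUP, dump_dm_state and create_in_steps are illustrative
only; they are not part of dmsetup or of the installer script.  The
table string would be whatever the script currently passes to
'dmsetup create'.

```shell
#!/bin/sh
# Sketch only: DMSETUP, dump_dm_state and create_in_steps are made-up names.
# DMSETUP can be pointed at a stub for a dry run.
DMSETUP=${DMSETUP:-dmsetup}

# Print the four state views suggested above, under a label.
dump_dm_state() {
    echo "=== dm state: $1 ==="
    $DMSETUP info -c
    $DMSETUP table
    $DMSETUP status
    $DMSETUP udevcookies
}

# Run 'dmsetup create NAME --table TABLE' as its three constituent steps,
# dumping state around each one and stopping at the first failure.
create_in_steps() {
    name=$1
    table=$2
    dump_dm_state "before create --notable"
    $DMSETUP create "$name" --notable || { echo "FAILED: create --notable" >&2; return 1; }
    dump_dm_state "after create --notable"
    $DMSETUP load "$name" --table "$table" || { echo "FAILED: load" >&2; return 1; }
    dump_dm_state "after load"
    $DMSETUP resume "$name" || { echo "FAILED: resume" >&2; return 1; }
    dump_dm_state "after resume"
}
```

Calling create_in_steps with the device name and table from the failing
line should show which of the three steps returns the error.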


* Re: semop failed for cookie?
  2010-04-27 22:33 ` Alasdair G Kergon
@ 2010-04-28  3:52   ` Douglas McClendon
  2010-04-28 23:11   ` Douglas McClendon
  1 sibling, 0 replies; 11+ messages in thread
From: Douglas McClendon @ 2010-04-28  3:52 UTC (permalink / raw)
  To: dm-devel

On 04/27/2010 05:33 PM, Alasdair G Kergon wrote:
> On Tue, Apr 27, 2010 at 03:56:57PM -0500, Douglas McClendon wrote:
>> I have a user of an installation tool of mine that is hitting this
>> message, with a very recent pre-fedora-13 kernel.
>
> udev is now involved in this process.
> Check they have up-to-date lvm2 and udev packages and that they've not
> tried to customise their udev rules - if they have, you'll need to
> check their changes didn't break things.

Thanks for the reply and the advice.  I'm not so interested in the issue 
that I'll necessarily get to it very soon, but what you said will no 
doubt help.

One thing though, which may not have been obvious, and almost sounds 
dubious, is what I'm actually doing there.  I'll try to describe it in 
words here, to see if this shouts out as a situation that may no longer 
be expected to work (because honestly I was probably pretty pleased and 
half-surprised to discover that it did work 3 years ago)-

Basically the livecd mode you should be familiar with.  ext3 image on a 
loop device.  cow file in tmpfs on a loop device.  Combined with 
dm-snapshot, resulting in what is used as the rootfs device.  Simple enough.

So what I do (and this is dusty code I haven't paid attention to in a 
long time, so maybe I'm misunderstanding my own code, but probably not) 
is this-

1) with that dmsetup create that is now failing, I first create a 
duplicate device (different name, same table) of the one that holds the 
rootfs.  I.e. another snapshot device with the same components/table.

2) I use a reload --table on the device that is the rootfs, to replace 
it with a new table, that is a mirror of the device created in (1) and 
the target normal hard disk partition that the script is installing the 
OS to.

3) I do a resume on the rootfs device such that the new table with the 
mirror activates, and the migration starts to occur

4) when the mirror completes, I do another reload then resume with a new 
linear table pointing to the newly installed fs on normal disk 
partition.  Then I tear down all the unused original devices.

So, if something about this description screams out- the new udev 
semantics will prevent (1) from working, let me know.
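
Steps 2 to 4 above could be sketched roughly as follows.  The helper
names, all device names, and the 1024-sector mirror region size are
assumptions made for illustration; the tables in the real script may
differ.

```shell
#!/bin/sh
# Hypothetical helpers that build device-mapper table lines.
# The first argument is the device size in 512-byte sectors
# (for example from 'blockdev --getsz DEVICE').

# Two-leg mirror with an in-memory ("core") log and an assumed
# region size of 1024 sectors: source leg first, destination second.
mirror_table() {
    printf '0 %s mirror core 1 1024 2 %s 0 %s 0\n' "$1" "$2" "$3"
}

# Plain linear mapping onto the start of a single device.
linear_table() {
    printf '0 %s linear %s 0\n' "$1" "$2"
}

# Outline of steps 2 to 4 with placeholder names (not executed here):
#   dmsetup reload ROOTFS --table "$(mirror_table "$SECTORS" /dev/mapper/rootfs-copy /dev/sdXN)"
#   dmsetup resume ROOTFS
#   ... poll 'dmsetup status ROOTFS' until the mirror reports it is in sync ...
#   dmsetup reload ROOTFS --table "$(linear_table "$SECTORS" /dev/sdXN)"
#   dmsetup resume ROOTFS
```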


>
> Big script.
>
> Debug it by adding lines to dump the state immediately before the problem
> command, then immediately after it.
>
> Dump state by running 'dmsetup info -c', 'dmsetup table', 'dmsetup status'
> and 'dmsetup udevcookies'.
>
> If that still doesn't help, break the 'dmsetup create' command down into
> its three constituent commands (dmsetup create --notable, dmsetup load,
> dmsetup resume) and dump the state between each of them and confirm
> which is failing.

Sounds good.  Again, what I'm doing with two devices with the same table 
smells like something that might have been inadvertently allowed before 
and now not.  Or maybe other people do it all the time for other reasons 
I'm not considering right now.

-dmc


>
> Alasdair
>


* Re: semop failed for cookie?
  2010-04-27 20:56 semop failed for cookie? Douglas McClendon
  2010-04-27 22:33 ` Alasdair G Kergon
@ 2010-04-28  9:38 ` Peter Rajnoha
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Rajnoha @ 2010-04-28  9:38 UTC (permalink / raw)
  To: device-mapper development; +Cc: Sebastian Dziallas, satellit

On 04/27/2010 10:56 PM, Douglas McClendon wrote:
> zyx-liveinstaller-cli: creating temporary rootfs virtual duplicate
> device-mapper: resume ioctl failed: Invalid argument
> semid 65536: semop failed for cookie 0xd4d423c: incorrect semaphore state
> Failed to set a proper state for notification semaphore identified by
> cookie value 223167036 (0xd4d423c) to initialize waiting for incoming
> notifications.

Well, the primary cause is that "resume" ioctl that is failing
(can you trace the exact parameters that are substituted in the
script for that failing dmsetup call?). I think the errors printed
afterwards are just an outcome of this failure.

Anyway, it seems that our internal "_udev_complete" fn is called more
than once on some error path. This call is exactly the same as calling
"dmsetup udevcomplete", but we have to call one internally if any error
occurs while processing a device-mapper task (that generates udev events).
That's because we can't await any notification for failed ioctls since
no udev events will be generated. We need to do that to prevent
infinite waiting for notifications that will never come.

If that internal _udev_complete is called more than necessary, we'll
get into an improper state with the semaphore so that needs to be
fixed!

What's the exact version of dmsetup/lvm2 used? Also, in addition to
Alasdair's hints in the other post, could you please run the failing
dmsetup with verbose output "dmsetup -vv ...". This way we should
see how the semaphore is handled throughout processing..

Peter
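
A small sketch of how the requested verbose trace might be captured,
assuming a POSIX shell.  The function names and the DMSETUP variable are
illustrative, not part of dmsetup.  The cookie helper only shows that
the hex and decimal values in the error message are the same number.

```shell
#!/bin/sh
# Sketch only: DMSETUP, cookie_to_dec and run_verbose are made-up names.
DMSETUP=${DMSETUP:-dmsetup}

# Convert a hex cookie as printed in the error (e.g. 0xd4d423c) to decimal.
cookie_to_dec() {
    printf '%d\n' "$1"
}

# Re-run a dmsetup call with -vv, writing stdout and stderr to LOGFILE
# and appending the exit status so the failing step is easy to spot.
# Usage: run_verbose LOGFILE <dmsetup arguments...>
run_verbose() {
    log=$1
    shift
    $DMSETUP -vv "$@" >"$log" 2>&1
    status=$?
    echo "exit status: $status" >>"$log"
    return "$status"
}
```

For the cookie in the report, cookie_to_dec 0xd4d423c prints 223167036,
matching the decimal value in the error text.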


* Re: semop failed for cookie?
  2010-04-27 22:33 ` Alasdair G Kergon
  2010-04-28  3:52   ` Douglas McClendon
@ 2010-04-28 23:11   ` Douglas McClendon
  2010-04-29  0:00     ` Alasdair G Kergon
  1 sibling, 1 reply; 11+ messages in thread
From: Douglas McClendon @ 2010-04-28 23:11 UTC (permalink / raw)
  To: dm-devel

On 04/27/2010 05:33 PM, Alasdair G Kergon wrote:
> On Tue, Apr 27, 2010 at 03:56:57PM -0500, Douglas McClendon wrote:
>> I have a user of an installation tool of mine that is hitting this
>> message, with a very recent pre-fedora-13 kernel.
>
> udev is now involved in this process.
> Check they have up-to-date lvm2 and udev packages and that they've not
> tried to customise their udev rules - if they have, you'll need to
> check their changes didn't break things.
>
> Big script.
>
> Debug it by adding lines to dump the state immediately before the problem
> command, then immediately after it.

Actually, I just grabbed the latest soas nightly livecd build, which for 
these purposes should presumably be considered the same as rawhide.

I tried manually to do what I described.  I.e. make a duplicate (same 
table, different name) snapshot device.

Interestingly, I'm not seeing the semop cookie thing, but now after the 
'resume ioctl failed' message, I checked dmesg, and I'm seeing-

device-mapper: snapshots: Unable to perform snapshot handover until 
source is suspended.

Also, this is under virtualization, which, as with other fedora dev 
builds I've seen, runs bizarrely slowly.  I.e. I had a couple text root 
logins timeout because it didn't finish whatever it needed to finish in 
60 seconds.  And while booting I saw dozens of weird udev failure 
messages.  But I'm thinking that may have nothing to do with the issue, 
and hoping the above message elicits an explanation.  I.e. is what I'm 
doing somehow inadvertently utilizing the new snapshot merging semantics 
even though it wasn't before, and for my purposes shouldn't?

-dmc


>
> Dump state by running 'dmsetup info -c', 'dmsetup table', 'dmsetup status'
> and 'dmsetup udevcookies'.
>
> If that still doesn't help, break the 'dmsetup create' command down into
> its three constituent commands (dmsetup create --notable, dmsetup load,
> dmsetup resume) and dump the state between each of them and confirm
> which is failing.
>
> Alasdair
>


* Re: semop failed for cookie?
  2010-04-28 23:11   ` Douglas McClendon
@ 2010-04-29  0:00     ` Alasdair G Kergon
  2010-04-29  3:32       ` Douglas McClendon
  0 siblings, 1 reply; 11+ messages in thread
From: Alasdair G Kergon @ 2010-04-29  0:00 UTC (permalink / raw)
  To: Douglas McClendon; +Cc: dm-devel

On Wed, Apr 28, 2010 at 06:11:46PM -0500, Douglas McClendon wrote:
> device-mapper: snapshots: Unable to perform snapshot handover until
> source is suspended.

It has never been OK to have the same snapshot metadata in use
simultaneously in two targets at once (because of caching in memory).
It's the responsibility of userspace to adhere to the correct semantics
or live with the potential data corruption if they are violated.  It
sounds like your process may fall into that second category.

Part of the process of adding snapshot merging support involved
providing a controlled method for handing over the snapshot metadata
from one target instance to another.

If you are trying to move a snapshot from one target to another, then
you must either deactivate the snapshot first (older kernels) or (newer
kernels) make use of the 'snapshot handover' mechanism as the message
suggests.

Alasdair


* Re: semop failed for cookie?
  2010-04-29  0:00     ` Alasdair G Kergon
@ 2010-04-29  3:32       ` Douglas McClendon
  2010-04-29 16:23         ` Alasdair G Kergon
  0 siblings, 1 reply; 11+ messages in thread
From: Douglas McClendon @ 2010-04-29  3:32 UTC (permalink / raw)
  To: dm-devel

On 04/28/2010 07:00 PM, Alasdair G Kergon wrote:
> On Wed, Apr 28, 2010 at 06:11:46PM -0500, Douglas McClendon wrote:
>> device-mapper: snapshots: Unable to perform snapshot handover until
>> source is suspended.
>
> It has never been OK to have the same snapshot metadata in use
> simultaneously in two targets at once (because of caching in memory).
> It's the responsibility of userspace to adhere to the correct semantics
> or live with the potential data corruption if they are violated.  It
> sounds like your process may fall into that second category.

Yeah, I had a hunch I was 'getting away with something'.  I.e. I wasn't 
acutely aware of the 3 distinct phases of the create.  In my 
case, literally nothing would happen while both instances were 'live', 
except the handoff.  And it seemed to work very reliably.  I.e. I would do

dmsetup create --table="same as rootfs's table (a snapshot)" rootfs-copy
dmsetup reload --table="mirror, goodside rootfs-copy, badside harddisk" rootfs
dmsetup resume rootfs

I guess I'll have to learn the new snapshot handover stuff.  Just let me 
know if you suspect some impossibility here.  Or if there is a magic 
--live-dangerously flag I can use :)

-dmc


>
> Part of the process of adding snapshot merging support involved
> providing a controlled method for handing over the snapshot metadata
> from one target instance to another.
>
> If you are trying to move a snapshot from one target to another, then
> you must either deactivate the snapshot first (older kernels) or (newer
> kernels) make use of the 'snapshot handover' mechanism as the message
> suggests.
>
> Alasdair


* Re: semop failed for cookie?
  2010-04-29  3:32       ` Douglas McClendon
@ 2010-04-29 16:23         ` Alasdair G Kergon
  2010-05-03  1:36           ` Douglas McClendon
  2010-05-05  5:18           ` Snapshot handover working, yippee!, was " Douglas McClendon
  0 siblings, 2 replies; 11+ messages in thread
From: Alasdair G Kergon @ 2010-04-29 16:23 UTC (permalink / raw)
  To: Douglas McClendon; +Cc: dm-devel

On Wed, Apr 28, 2010 at 10:32:54PM -0500, Douglas McClendon wrote:
> case, literally nothing would happen while both instances were 'live',  

If you had no data written to the snapshot or origin while both
were loaded, you might have got away with it.  But I think the new handover
code gives you a safe and supported mechanism now.

Alasdair


* Re: semop failed for cookie?
  2010-04-29 16:23         ` Alasdair G Kergon
@ 2010-05-03  1:36           ` Douglas McClendon
  2010-05-05  5:18           ` Snapshot handover working, yippee!, was " Douglas McClendon
  1 sibling, 0 replies; 11+ messages in thread
From: Douglas McClendon @ 2010-05-03  1:36 UTC (permalink / raw)
  To: dm-devel

On 04/29/2010 11:23 AM, Alasdair G Kergon wrote:
> On Wed, Apr 28, 2010 at 10:32:54PM -0500, Douglas McClendon wrote:
>> case, literally nothing would happen while both instances were 'live',
>
> If you had no data written to the snapshot or origin while both
> were loaded, you might have got away with it.  But I think the new handover
> code gives you a safe and supported mechanism now.
>
> Alasdair

I've got a "BUG" for you-

So, I tried on a nightly soas livecd iso build, booted under qemu 
(should be the same for a rawhide or fedora13 beta i386 livecd iso, 
booted to runlevel 1 (for simplicity's sake))

I tried to suspend the snapshot device holding the rootfs, thinking that 
I might then be able to resume the newly created copy of that device 
(same table, new name), and finally resume the rootfs device after 
loading a new table that mirrors the copy device onto the destination 
partition.

But the instant I suspend the snapshot device containing the rootfs 
(dmsetup suspend live-rw), I get-

BUG: lock held when returning to user space!

dmsetup/865 is leaving the kernel with locks still held!
1 lock held by dmsetup/865:
  #0:   (&journal->j_barrier){+.+...+}, at: [<c056b84d>] jbd2_journal_lock_updates+0xbd/0xc5

--------------- (manually transcribed) ------------

Now, my first guess as to how to proceed would be to try to get a 
statically linked dmsetup copied into a tmpfs.  Which, given the 
particular target and my lack of enthusiasm for all of this, may take me 
some time to try.  Any other advice?  Note, I could craft some manual 
dmsetup commands to reproduce what I'm trying to do, that would apply to 
any fedora-13/soas livecd iso.  But for the sake of argument, let's 
pretend that all I want to do is to run (dmsetup suspend live-rw ; 
dmsetup resume live-rw) and have the system not fall over dead.

Also, to reiterate, I got that message in dmesg about a problem 
with the snapshot handover while using my previously working (but 
not guaranteed to work 100%) method.  Note that this method does not 
involve snapshot merging at all, and in the documentation I found 
(perhaps I didn't look in all the right places) merging is the only 
context in which snapshot handover comes up.

Note, I'm not complaining, as this is very low priority for me, but 
rather just doing my best to explain the issue.

-dmc


* Snapshot handover working, yippee!, was Re:  semop failed for cookie?
  2010-04-29 16:23         ` Alasdair G Kergon
  2010-05-03  1:36           ` Douglas McClendon
@ 2010-05-05  5:18           ` Douglas McClendon
  2010-05-05 15:22             ` Mike Snitzer
  1 sibling, 1 reply; 11+ messages in thread
From: Douglas McClendon @ 2010-05-05  5:18 UTC (permalink / raw)
  To: dm-devel

Ok, so my complaining about code-progress should now be complete.  I 
think I'm back in action.  But the "BUG: lock held when returning to 
user space" still seems like a message you folks wanted yourselves to hear.

After grumbling to myself that fedora doesn't provide a package with 
dmsetup.static, I went ahead and copied all relocatable deps 
(/lib/ld-linux.so.2??) to tmpfs along with dmsetup, using 
LD_LIBRARY_PATH.  I also used --noudevrules and --noudevsync.  Then I 
followed the documentation I found which, for use cases as esoteric as 
mine, might be desirable in snapshot.txt
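
The tmpfs staging described above might look roughly like this, assuming
a POSIX shell with ldd and awk available.  The function names and the
/dev/shm/dm path are hypothetical.  One caveat: the dynamic loader named
in the binary's ELF header is still opened from its original path unless
the copied loader is invoked explicitly.

```shell
#!/bin/sh
# Sketch only: ldd_paths and stage_binary are made-up helper names.

# Read 'ldd' output on stdin and print the absolute library paths it lists.
# Entries without a file path (such as linux-gate.so.1) are skipped.
ldd_paths() {
    awk '{
        if ($2 == "=>") { if ($3 ~ /^\//) print $3 }
        else if ($1 ~ /^\//) print $1
    }'
}

# Copy BIN and the shared libraries it links against into DEST,
# for example a directory on a tmpfs mount.
stage_binary() {
    bin=$1
    dest=$2
    mkdir -p "$dest" || return 1
    cp "$bin" "$dest/" || return 1
    ldd "$bin" | ldd_paths | while read -r lib; do
        cp "$lib" "$dest/" || exit 1
    done
}

# Hypothetical usage (not executed here):
#   stage_binary /sbin/dmsetup /dev/shm/dm
#   LD_LIBRARY_PATH=/dev/shm/dm /dev/shm/dm/dmsetup --noudevrules --noudevsync suspend live-rw
```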

https://patchwork.kernel.org/patch/59806/

I.e. rules for snapshot handover in the general case.  (who knows, maybe 
I'll be the only user of non-snapshot-merge snapshot handover for all 
time) (though technically I guess you can call what I've been doing for 
the last several years to be an alternate form of snapshot merging. 
I.e. where your snapshot base is readonly and you are merging both the 
readonly base and cow to a third writable device)

Anyway, I finally got my stuffs working again, or at least, I have a 
virtual snapshot-as-rootfs being dm-mirror migrated right now.  I assume 
that my loading of a new table, is sufficiently equivalent to the 
dmsetup remove of the old snapshot as described in the above link.
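
As far as this thread describes it, the working sequence could be
sketched as below.  This is a reading of the messages above, not a
verified recipe: the function name, DM_FLAGS variable and argument names
are invented for illustration, and the ordering (suspend the source,
activate the duplicate so the handover happens, then load and resume the
new table on the source) is an assumption drawn from the description.

```shell
#!/bin/sh
# Sketch only: handover, DMSETUP and DM_FLAGS are made-up names.
DMSETUP=${DMSETUP:-dmsetup}
DM_FLAGS="--noudevrules --noudevsync"

# SRC:           live snapshot device holding the rootfs (e.g. live-rw)
# DST:           new name for the duplicate snapshot device
# SNAP_TABLE:    the snapshot table SRC is currently using
# NEW_SRC_TABLE: table SRC should switch to afterwards (e.g. a mirror)
# If any step fails the function returns early and SRC may be left
# suspended; recovering from that is up to the caller.
handover() {
    src=$1
    dst=$2
    snap_table=$3
    new_src_table=$4
    # The source must be suspended before the duplicate is activated.
    $DMSETUP $DM_FLAGS suspend "$src" || return 1
    # Creating with a table also resumes DST, which is when the handover occurs.
    $DMSETUP $DM_FLAGS create "$dst" --table "$snap_table" || return 1
    # Replace the source's table so it no longer uses the snapshot store.
    $DMSETUP $DM_FLAGS load "$src" --table "$new_src_table" || return 1
    $DMSETUP $DM_FLAGS resume "$src" || return 1
}
```

Because the rootfs is suspended while this runs, the dmsetup binary and
its libraries would need to live outside it, as in the tmpfs approach
described earlier in this message.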

But finally, and the real reason for this message- during this I noticed 
that the aforementioned "BUG:" was not in fact what was killing my 
system.  I.e. I still get that message, though everything else appears 
to work and it appears to be harmless (I hope).

BUG: lock held when returning to user space!

dmsetup/865 is leaving the kernel with locks still held!
1 lock held by dmsetup/865:
  #0:   (&journal->j_barrier){+.+...+}, at: [<c056b84d>] jbd2_journal_lock_updates+0xbd/0xc5

Thanks again for putting up with my corner case complaints.

Cheers,

-dmc


* Re: Snapshot handover working, yippee!, was Re:  semop failed for cookie?
  2010-05-05  5:18           ` Snapshot handover working, yippee!, was " Douglas McClendon
@ 2010-05-05 15:22             ` Mike Snitzer
  0 siblings, 0 replies; 11+ messages in thread
From: Mike Snitzer @ 2010-05-05 15:22 UTC (permalink / raw)
  To: Douglas McClendon; +Cc: dm-devel

On Wed, May 05 2010 at  1:18am -0400,
Douglas McClendon <dmc.fedora@filteredperception.org> wrote:

> Ok, so my complaining about code-progress should now be complete.  I
> think I'm back in action.  But the "BUG: lock held when returning to
> user space" still seems like a message you folks wanted yourselves
> to hear.
> 
> After grumbling to myself that fedora doesn't provide a package with
> dmsetup.static, I went ahead and copied all relocatable deps
> (/lib/ld-linux.so.2??) to tmpfs along with dmsetup, using
> LD_LIBRARY_PATH.  I also used --noudevrules and --noudevsync.  Then
> I followed the documentation I found which, for use cases as
> esoteric as mine, might be desirable in snapshot.txt
> 
> https://patchwork.kernel.org/patch/59806/
> 
> I.e. rules for snapshot handover in the general case.  (who knows,
> maybe I'll be the only user of non-snapshot-merge snapshot handover
> for all time) (though technically I guess you can call what I've
> been doing for the last several years to be an alternate form of
> snapshot merging. I.e. where your snapshot base is readonly and you
> are merging both the readonly base and cow to a third writable
> device)
> 
> Anyway, I finally got my stuffs working again, or at least, I have a
> virtual snapshot-as-rootfs being dm-mirror migrated right now.  I
> assume that my loading of a new table, is sufficiently equivalent to
> the dmsetup remove of the old snapshot as described in the above
> link.

Glad to hear snapshot cow handover is working well for you.

> But finally, and the real reason for this message- during this I
> noticed that the aforementioned "BUG:" was not in fact what was
> killing my system.  I.e. I still get that message, though everything
> else appears to work and it appears to be harmless (I hope).
> 
> BUG: lock held when returning to user space!
> 
> dmsetup/865 is leaving the kernel with locks still held!
> 1 lock held by dmsetup/865:
>  #0:   (&journal->j_barrier){+.+...+}, at: [<c056b84d>] jbd2_journal_lock_updates+0xbd/0xc5

This is an ext4 issue that has been fixed upstream, see:
https://bugzilla.redhat.com/show_bug.cgi?id=568503

and it is staged for inclusion in the next kernel (2.6.35):
http://git.kernel.org/?p=linux/kernel/git/tytso/ext4.git;a=commit;h=74a8f0090293696a716c728f9cd484b45083937f


Thread overview: 11+ messages
2010-04-27 20:56 semop failed for cookie? Douglas McClendon
2010-04-27 22:33 ` Alasdair G Kergon
2010-04-28  3:52   ` Douglas McClendon
2010-04-28 23:11   ` Douglas McClendon
2010-04-29  0:00     ` Alasdair G Kergon
2010-04-29  3:32       ` Douglas McClendon
2010-04-29 16:23         ` Alasdair G Kergon
2010-05-03  1:36           ` Douglas McClendon
2010-05-05  5:18           ` Snapshot handover working, yippee!, was " Douglas McClendon
2010-05-05 15:22             ` Mike Snitzer
2010-04-28  9:38 ` Peter Rajnoha
