From: Milan Broz <mbroz@redhat.com>
To: David Disseldorp <ddiss@suse.de>, Sage Weil <sage@newdream.net>
Cc: Wyllys Ingersoll <wyllys.ingersoll@keepertech.com>,
ceph-devel@vger.kernel.org, Lars Marowsky-Bree <lmb@suse.com>
Subject: Re: dmcrypt with luks keys in hammer
Date: Tue, 21 Jul 2015 16:25:40 +0200
Message-ID: <55AE5664.9070809@redhat.com>
In-Reply-To: <20150721131429.472492bf@g21.suse.de>
On 07/21/2015 01:14 PM, David Disseldorp wrote:
>>> A race condition (or other issue) with udev seems likely, given that
>>> it's rather random which ones come up and which ones don't.
>>
>> A race condition during creation or activation? If it's activation I
>> would expect ceph-disk activate ... to work reasonably reliably when
>> called manually (on a single device at a time).
I still do not completely understand how dmcrypt activation in Ceph is
designed, but there are clear problems in the current design.
Activating another device-mapper device from inside udev rules (here a LUKS
or plain dmcrypt device) is broken by design; it can work only with
ugly workarounds.
The first reason is correctly noted in the wip branch you mentioned:
udev RUN is intended for short-running commands. For example, I think that
if you increase the iteration count of a LUKS device, the whole Ceph udev
rule set fails completely, because udev kills the worker processing the
event on timeout. (Unlocking can take even minutes when you move an
encrypted disk to a very slow machine.)
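A back-of-the-envelope model of the timeout problem, as a sketch under
stated assumptions: cryptsetup calibrates the PBKDF2 iteration count by
benchmarking the machine that formats the disk (the --iter-time option), so
the same number of iterations takes proportionally longer on a slower CPU.
The rates and the 180 s event timeout below are hypothetical numbers; the
real udev timeout depends on the version and its configuration.

```python
def unlock_seconds(iter_time_ms: float, format_rate: float, unlock_rate: float) -> float:
    """Estimate how long one LUKS keyslot unlock takes on the unlocking machine.

    iter_time_ms: unlock time the iteration count was calibrated for at format time.
    format_rate:  PBKDF2 iterations/second on the machine that formatted the disk.
    unlock_rate:  PBKDF2 iterations/second on the machine opening the disk.
    All three are hypothetical inputs, not values read from a real LUKS header.
    """
    iterations = iter_time_ms / 1000.0 * format_rate  # fixed in the header at format time
    return iterations / unlock_rate  # same work on a slower CPU => longer unlock


def survives_udev(unlock_s: float, event_timeout_s: float) -> bool:
    """True if a RUN+= helper doing the unlock would finish before udev gives up."""
    return unlock_s < event_timeout_s


# Raised iteration count (10 s calibration), disk moved to a machine
# 20x slower than the one that formatted it (hypothetical numbers):
slow = unlock_seconds(10_000, 2_000_000, 100_000)
print(slow)                      # 200.0
print(survives_udev(slow, 180))  # False
```

The point is only the scaling: nothing in the udev path knows how long the
key derivation will take, so any fixed event timeout can be exceeded.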
The second reason is even more serious: cryptsetup itself uses udev
(through libdevmapper) to create device nodes and must synchronize with
other device-mapper udev rules. So this is a race by design: udev ends up
waiting for another udev process. The same applies to the /dev/disk/by-*
links, which are created by udev rules as well.
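A toy model of "udev waits for another udev process", not udev's actual
implementation: a bounded thread pool stands in for udev's limited set of
workers (children-max), and each outer task stands in for a RUN+= helper
that creates a new dm device and then blocks until that device's own event
has been handled, roughly as libdevmapper's udev cookie synchronization
does. The function and task names are hypothetical.

```python
import concurrent.futures as cf


def nested_event_completes(workers: int, outer_events: int, wait_s: float) -> bool:
    """Return True only if every nested event finished before its waiter gave up."""
    with cf.ThreadPoolExecutor(max_workers=workers) as pool:

        def new_dm_device_event() -> str:
            return "node created"  # stands in for the dm device's own udev rules

        def run_helper() -> bool:
            child = pool.submit(new_dm_device_event)  # emits a nested event
            try:
                child.result(timeout=wait_s)  # keep this worker busy while waiting
                return True
            except cf.TimeoutError:
                return False  # no free worker could process the nested event

        helpers = [pool.submit(run_helper) for _ in range(outer_events)]
        return all(h.result() for h in helpers)


print(nested_event_completes(workers=1, outer_events=1, wait_s=0.2))  # False
print(nested_event_completes(workers=8, outer_events=2, wait_s=5.0))  # True
```

Whether it works depends on how many workers happen to be free, which is
exactly why failures look random from the outside.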
(And add to the mix the OPTIONS+="watch" rules, which react to
close-after-write on every device node by triggering another udev event
with a blkid scan. If you see leftover temporary-cryptsetup* devices,
something is really wrong. These devices are internal to libcryptsetup and
map keyslots only; they are never kept open in correct operation.)
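A small diagnostic sketch for the leftover-device symptom, assuming the
standard sysfs layout where /sys/block/dm-N/dm/name holds each
device-mapper device's name (dmsetup ls shows the same names). The helper
name is hypothetical; the sysfs root is a parameter so it can be pointed at
a test directory.

```python
from pathlib import Path
from typing import List


def leftover_temporary_devices(sysfs_root: str = "/sys") -> List[str]:
    """List dm names that look like libcryptsetup's temporary keyslot mappings.

    In correct operation these mappings never outlive a cryptsetup call,
    so this list should be empty.
    """
    block = Path(sysfs_root, "block")
    if not block.is_dir():
        return []
    names = []
    for name_file in sorted(block.glob("dm-*/dm/name")):
        name = name_file.read_text().strip()
        if name.startswith("temporary-cryptsetup"):
            names.append(name)
    return names


if __name__ == "__main__":
    leftovers = leftover_temporary_devices()
    if leftovers:
        print("something is really wrong; leftover mappings:", ", ".join(leftovers))
```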
So moving activation out of the udev rules is the correct solution here:
only device node processing should remain in the rules, and the rest should
be deferred until after the udev rules have run.
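One way to keep the udev side down to device-node processing, sketched
under assumptions: the rule matches the partition by its GPT type GUID (the
ID_PART_ENTRY_TYPE property exported by udev's blkid builtin, so the rule
must run after the rules that import it), adds the systemd tag, and uses
ENV{SYSTEMD_WANTS} so systemd starts an activation unit for the device.
The unit name dmcrypt-activate@.service and the GUID placeholder are
hypothetical, not Ceph's actual rules.

```python
def deferred_activation_rule(part_type_guid: str, unit_template: str) -> str:
    """Render a udev rule that only tags the partition and hands it to systemd.

    There is deliberately no RUN+= here: unlocking, mounting and starting the
    OSD happen in the systemd unit, after udev has finished with the event.
    """
    if not unit_template.endswith("@.service"):
        raise ValueError("expected a systemd template unit such as 'foo@.service'")
    instance = unit_template.replace("@.service", "@%k.service")  # %k: kernel name
    return ", ".join([
        'ACTION=="add"',
        'SUBSYSTEM=="block"',
        f'ENV{{ID_PART_ENTRY_TYPE}}=="{part_type_guid}"',
        'TAG+="systemd"',
        f'ENV{{SYSTEMD_WANTS}}+="{instance}"',
    ])


print(deferred_activation_rule("<dmcrypt-partition-guid>", "dmcrypt-activate@.service"))
```

The udev rule stays trivially fast; how long the unlock takes then only
matters to the systemd unit, which has its own (configurable) timeouts.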
> We encountered similar issues on a non-dmcrypt firefly deployment with
> 10 OSDs per node.
>
> I've been working on a patch set to defer device activation to systemd
> services. ceph-disk activate is extended to support mapping of dmcrypt
> devices prior to OSD startup.
Well, using a systemd service is one option. But then it should handle
all cryptsetup device activations.
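A minimal sketch of "one service handles all activations", assuming LUKS
devices unlocked with key files (plain dmcrypt would also need the cipher
parameters used at creation, so it is left out). PendingDevice, the mapping
names and the key-file paths are hypothetical; the only real interface used
is the cryptsetup command line (--key-file, luksOpen). It runs in dry-run
mode by default and only prints the commands.

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PendingDevice:
    device: str    # raw partition, e.g. "/dev/sdb1" (hypothetical)
    mapping: str   # dm name to create (hypothetical naming scheme)
    key_file: str  # where the LUKS key is stored (hypothetical layout)


def open_command(dev: PendingDevice) -> List[str]:
    """cryptsetup invocation that unlocks one LUKS device with a key file."""
    return ["cryptsetup", "--key-file", dev.key_file, "luksOpen", dev.device, dev.mapping]


def activate_all(devices: Sequence[PendingDevice], dry_run: bool = True) -> List[str]:
    """Unlock every pending device from one service, outside udev, one at a time.

    No udev worker is held while PBKDF2 runs, so a slow unlock only delays
    this service instead of tripping udev's event timeout.
    """
    opened = []
    for dev in devices:
        cmd = open_command(dev)
        if dry_run:
            print(" ".join(shlex.quote(part) for part in cmd))  # show, don't run
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
        opened.append(dev.mapping)
    return opened


if __name__ == "__main__":
    activate_all([PendingDevice("/dev/sdb1", "osd-data-1", "/path/to/keys/osd-data-1.key")])
```

Doing the devices sequentially keeps the sketch simple; the important part
is that every cryptsetup call goes through the same place rather than some
from udev and some from the service.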
Milan