* partprobe or partx or ... ?
@ 2015-09-19 15:23 Loic Dachary
  2015-09-19 20:08 ` Loic Dachary
  0 siblings, 1 reply; 4+ messages in thread
From: Loic Dachary @ 2015-09-19 15:23 UTC
  To: Ilya Dryomov; +Cc: Ceph Development

Hi Ilya,

At present ceph-disk uses partprobe to make sure the kernel is aware of the latest partition changes after a new partition is created or after the partition table is zapped. Although it works reliably (in the sense that the kernel does end up aware of the desired partition layout), it goes as far as removing all partition devices of the current kernel table, only to re-add them from the new partition table. The delay this implies is not an issue because ceph-disk is rarely called. However, it generates many udev events (dozens of remove/change/add events for a two-partition disk) and almost always creates corner cases that are difficult to figure out and debug. While it is a good way to ensure that ceph-disk is idempotent and immune to race conditions, maybe it is needlessly hard.

Do you know of a lightweight alternative to partprobe? In the past we've used partx, but I remember it failed to address some corner cases in non-intuitive ways. Do you know of another, simpler approach?

Thanks in advance for your help :-)

-- 
Loïc Dachary, Artisan Logiciel Libre



* Re: partprobe or partx or ... ?
  2015-09-19 15:23 partprobe or partx or ... ? Loic Dachary
@ 2015-09-19 20:08 ` Loic Dachary
  2015-09-21 10:23   ` Ilya Dryomov
  0 siblings, 1 reply; 4+ messages in thread
From: Loic Dachary @ 2015-09-19 20:08 UTC
  To: Ilya Dryomov; +Cc: Ceph Development



For the record, using /sys/block/sdX/device/rescan sounds good, but that file does not exist for devices created via device-mapper (used for dm-crypt and multipath).
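
A minimal sketch of how one could check whether a block device exposes that sysfs file at all. The helper names and the sysfs_root parameter are assumptions made for illustration (the parameter only exists so the check can be exercised against a fake tree); the sketch only tests for the file's presence and says nothing about what writing to it does.

```python
import os


def rescan_path(device_name, sysfs_root="/sys"):
    """Build the sysfs path of the rescan file for a device such as 'sdb'."""
    return os.path.join(sysfs_root, "block", device_name, "device", "rescan")


def has_rescan(device_name, sysfs_root="/sys"):
    """Return True if the device exposes a rescan file under sysfs."""
    return os.path.exists(rescan_path(device_name, sysfs_root))
```

For example, has_rescan("sdb") can be compared with has_rescan("dm-0") on a machine that has both a SCSI disk and a device-mapper device.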

-- 
Loïc Dachary, Artisan Logiciel Libre



* Re: partprobe or partx or ... ?
  2015-09-19 20:08 ` Loic Dachary
@ 2015-09-21 10:23   ` Ilya Dryomov
  2015-09-21 10:40     ` Loic Dachary
  0 siblings, 1 reply; 4+ messages in thread
From: Ilya Dryomov @ 2015-09-21 10:23 UTC
  To: Loic Dachary; +Cc: Ceph Development

Hi Loic,

Yeah, partprobe loops through the entire partition table, trying to
delete/add every slot.  As an aside, the in-kernel way to do this
(blockdev --rereadpt) is similar in that it also drops all partitions
and re-adds them later, but it's faster and probably generates fewer
change events.  The downside is that it won't work on a busy device.
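
A minimal sketch of how a caller might combine the two approaches mentioned above: try blockdev --rereadpt first and fall back to partprobe when it fails, for instance because the device is busy. The function names and the injectable run parameter are assumptions made for illustration; this is not how ceph-disk is implemented.

```python
import subprocess


def reread_commands(device):
    """Return the commands to try, in order, to refresh the kernel's view."""
    return [
        # In-kernel re-read of the whole table; expected to fail on a busy device.
        ["blockdev", "--rereadpt", device],
        # Slot-by-slot delete/add; slower and generates more udev events.
        ["partprobe", device],
    ]


def reread_partition_table(device, run=subprocess.call):
    """Try each command in turn; return the one that succeeded, or None."""
    for command in reread_commands(device):
        if run(command) == 0:
            return command
    return None
```

Calling reread_partition_table("/dev/sdX") requires root privileges; passing a fake run callable makes it possible to exercise the fallback order without touching a real disk.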

I don't think there is any alternative, except for using partx --add
with --nr, that is, targeting specific slots in the partition table.  If
all you are doing is adding partitions and zapping entire partition
tables, that may work well enough.
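
As a rough illustration of targeting a single slot, the sketch below builds the partx invocation for one partition number. The helper names are hypothetical and the device path is a placeholder; only the partx --add and --nr options come from the discussion above.

```python
import subprocess


def partx_add_command(device, partition_number):
    """Build a partx command that adds exactly one partition slot."""
    if partition_number < 1:
        raise ValueError("partition numbers start at 1")
    return ["partx", "--add", "--nr", str(partition_number), device]


def add_partition(device, partition_number):
    """Tell the kernel about one new partition; raises if partx fails."""
    subprocess.check_call(partx_add_command(device, partition_number))
```

For instance, partx_add_command("/dev/sdX", 3) yields the argument list for "partx --add --nr 3 /dev/sdX", which leaves the other slots and their device nodes alone.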

That said, given that the resulting delay (which can be in the seconds
range, especially if your disk happens to have a busy partition) isn't
a problem, what difference does it make?  What are you listening to
those events for?

/sys/block/sdX/device/rescan is sd only, and AFAIK it doesn't generally
trigger a re-read of a partition table.

Thanks,

                Ilya


* Re: partprobe or partx or ... ?
  2015-09-21 10:23   ` Ilya Dryomov
@ 2015-09-21 10:40     ` Loic Dachary
  0 siblings, 0 replies; 4+ messages in thread
From: Loic Dachary @ 2015-09-21 10:40 UTC
  To: Ilya Dryomov; +Cc: Ceph Development

Hi Ilya,

On 21/09/2015 12:23, Ilya Dryomov wrote:
> Hi Loic,
> 
> Yeah, partprobe loops through the entire partition table, trying to
> delete/add every slot.  As an aside, the in-kernel way to do this
> (blockdev --rereadpt) is similar in that it also drops all partitions
> and re-adds them later, but it's faster and probably generates fewer
> change events.  The downside is that it won't work on a busy device.
> 
> I don't think there is any alternative, except for using partx --add
> with --nr, that is, targeting specific slots in the partition table.  If
> all you are doing is adding partitions and zapping entire partition
> tables, that may work well enough.
> 
> That said, given that the resulting delay (which can be in the seconds
> range, especially if your disk happens to have a busy partition) isn't
> a problem, what difference does it make?  What are you listening to
> those events for?

This is part of the ceph-disk prepare / activate workflow:

 ceph-disk prepare creates partitions, mounts them, populates them and exits.
 The ceph udev rules (95-ceph-osd.rules) react to udev events once the partition type is known and run ceph-disk activate in the background.

When a machine boots or a disk is hot-swapped, the udev rules do the same and activate: we have only one code path for all cases. The problem is ensuring that all race conditions are addressed. What used to work in hammer has to be revisited because the code path changed in infernalis. udev actions no longer call ceph-disk activate directly, because it can take a long time and that's not what udev is good at. Instead, udev actions run ceph-disk activate in the background, using systemd/upstart when available (falling back to the legacy synchronous behavior when they are not).

I think I managed to address all race conditions with the patch series at https://github.com/ceph/ceph/pull/5999.

We should be good with partprobe :-)

> 
> /sys/block/sdX/device/rescan is sd only, and AFAIK it doesn't generally
> trigger a re-read of a partition table.

Thanks a lot for your insights!

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre


