Adding a delay when restarting all OSDs on a host


* Adding a delay when restarting all OSDs on a host
@ 2014-07-22 13:19 Wido den Hollander
  2014-07-22 13:48 ` Andrey Korolyov
  2014-07-23  0:08 ` Gregory Farnum
  0 siblings, 2 replies; 7+ messages in thread
From: Wido den Hollander @ 2014-07-22 13:19 UTC
  To: ceph-devel@vger.kernel.org

Hi,

Currently on Ubuntu with Upstart when you invoke a restart like this:

$ sudo restart ceph-osd-all

It will restart all OSDs at once, which can increase the load on the
system quite a bit.

It's better to restart the OSDs one by one:

$ sudo restart ceph-osd id=X

But you then have to figure out all the IDs by doing a find in 
/var/lib/ceph/osd and that's more manual work.

I'm thinking of patching the init scripts to allow something like this:

$ sudo restart ceph-osd-all delay=180

It then waits 180 seconds between each OSD restart, making the process
even smoother.

I know there are currently sysvinit, upstart and systemd scripts, so it
has to be implemented in several places, but how does the general idea
sound?
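
Until the init scripts support this, the idea could be approximated with
a small wrapper. What follows is only a sketch under stated assumptions,
not a tested patch: it assumes every OSD has a data directory named
<cluster>-<id> under /var/lib/ceph/osd, and the names OSD_DIR, DELAY,
RESTART_CMD, list_osd_ids and restart_osds_with_delay are hypothetical
and not part of Ceph or Upstart.

```shell
#!/bin/sh
# Hypothetical helper: restart the OSDs on this host one at a time.
# All variable and function names here are illustrative assumptions.
OSD_DIR="${OSD_DIR:-/var/lib/ceph/osd}"          # where the OSD data dirs live
DELAY="${DELAY:-180}"                            # seconds between restarts
RESTART_CMD="${RESTART_CMD:-restart ceph-osd}"   # Upstart job command

# Print one OSD id per line, derived from directory names such as "ceph-12".
list_osd_ids() {
    for dir in "$OSD_DIR"/*-*; do
        [ -d "$dir" ] || continue
        name=${dir##*/}                # strip the leading path
        printf '%s\n' "${name##*-}"    # keep what follows the last dash
    done
}

# Restart each OSD in turn, sleeping DELAY seconds between two restarts.
restart_osds_with_delay() {
    first=1
    for id in $(list_osd_ids); do
        [ "$first" -eq 1 ] || sleep "$DELAY"
        first=0
        # RESTART_CMD is left unquoted on purpose so it splits into words.
        $RESTART_CMD id="$id"
    done
}
```

Sourced into a root shell, calling restart_osds_with_delay would restart
one OSD, wait DELAY seconds and then move on to the next, so the other
daemons keep running until it is their turn.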

-- 
Wido den Hollander
Ceph consultant and trainer
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on

* Re: Adding a delay when restarting all OSDs on a host
  2014-07-22 13:19 Adding a delay when restarting all OSDs on a host Wido den Hollander
@ 2014-07-22 13:48 ` Andrey Korolyov
  2014-07-22 14:28   ` Wido den Hollander
  2014-07-23  0:08 ` Gregory Farnum
  1 sibling, 1 reply; 7+ messages in thread
From: Andrey Korolyov @ 2014-07-22 13:48 UTC
  To: Wido den Hollander; +Cc: ceph-devel@vger.kernel.org

Hi,

This behaviour obviously has the downside of increasing the overall
peering time and the cumulative duration of out-of-SLA delays. I'd vote
for warming up the necessary files, most likely the collections, just
before the restart. If there is not enough room to hold all of them at
once, we can probably combine both methods to lower the impact of a
restart, although adding a simple delay sounds much more straightforward
than loading the file cache into RAM.

* Re: Adding a delay when restarting all OSDs on a host
  2014-07-22 13:48 ` Andrey Korolyov
@ 2014-07-22 14:28   ` Wido den Hollander
  2014-07-22 14:58     ` Andrey Korolyov
  0 siblings, 1 reply; 7+ messages in thread
From: Wido den Hollander @ 2014-07-22 14:28 UTC
  To: Andrey Korolyov; +Cc: ceph-devel@vger.kernel.org

In the case I'm talking about there are 23 OSDs running on a single 
machine and restarting all the OSDs causes a lot of peering and reading 
PG logs.

A warm-up mechanism might work, but that would be a lot of work.

When upgrading your cluster you simply want to do this:

$ dsh -g ceph-osd "sudo restart ceph-osd-all delay=180"

That might take hours to complete, but if it's just an upgrade that
doesn't matter. You want as little impact on service as possible.

-- 
Wido den Hollander
Ceph consultant and trainer
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on

* Re: Adding a delay when restarting all OSDs on a host
  2014-07-22 14:28   ` Wido den Hollander
@ 2014-07-22 14:58     ` Andrey Korolyov
  0 siblings, 0 replies; 7+ messages in thread
From: Andrey Korolyov @ 2014-07-22 14:58 UTC
  To: Wido den Hollander; +Cc: ceph-devel@vger.kernel.org

I would suggest measuring the impact with vmtouch[0]. It greatly
decreased OSD startup time in my tests, but I still hit the same
resource exhaustion as before once the OSD had marked itself up
(primarily the IOPS ceiling).


0. http://hoytech.com/vmtouch/
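
A warm-up step of this kind could look roughly like the sketch below.
It is an illustration under assumptions rather than the procedure used
in the tests above: it assumes the files worth warming live in a
"current" directory under /var/lib/ceph/osd/ceph-<id>, and the names
OSD_DIR, VMTOUCH, RESTART_CMD and warm_and_restart are hypothetical.
The only vmtouch option used is -t, which touches the pages of the
given files so that they are pulled into the page cache.

```shell
#!/bin/sh
# Hypothetical helper: warm the page cache for one OSD, then restart it.
# Paths and names are illustrative assumptions, not taken from Ceph.
OSD_DIR="${OSD_DIR:-/var/lib/ceph/osd}"
VMTOUCH="${VMTOUCH:-vmtouch}"
RESTART_CMD="${RESTART_CMD:-restart ceph-osd}"

warm_and_restart() {
    id=$1
    # vmtouch walks directories recursively; -t loads the pages into memory.
    $VMTOUCH -t "$OSD_DIR/ceph-$id/current" || return 1
    # Restart only after the warm-up succeeded.
    $RESTART_CMD id="$id"
}
```

Calling warm_and_restart 3 as root would warm the files of osd.3 and
then restart it. As noted above, this may shorten startup without
removing the load that appears once the OSD is marked up.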

* Re: Adding a delay when restarting all OSDs on a host
  2014-07-22 13:19 Adding a delay when restarting all OSDs on a host Wido den Hollander
  2014-07-22 13:48 ` Andrey Korolyov
@ 2014-07-23  0:08 ` Gregory Farnum
  2014-07-23 19:22   ` Vit Yenukas
  1 sibling, 1 reply; 7+ messages in thread
From: Gregory Farnum @ 2014-07-23  0:08 UTC
  To: Wido den Hollander; +Cc: ceph-devel@vger.kernel.org

That sounds like a good idea to me. I presume you mean to actually
delay the restarts, not just the start-ups, so that the daemons all
stay alive until their turn (that's what it sounds like to me here, I
just wanted to clarify).
-Greg

* Re: Adding a delay when restarting all OSDs on a host
  2014-07-23  0:08 ` Gregory Farnum
@ 2014-07-23 19:22   ` Vit Yenukas
  2014-07-23 20:57     ` Milosz Tanski
  0 siblings, 1 reply; 7+ messages in thread
From: Vit Yenukas @ 2014-07-23 19:22 UTC
  To: Gregory Farnum; +Cc: Wido den Hollander, ceph-devel@vger.kernel.org

Just a fun fact about resource consumption during the startup sequence:
we ran out of memory on a 72-disk server with 256GB RAM during startup.
ceph-osd died with 'can not fork' and dumped core. There were in excess
of 40 thousand threads when this began to happen. With the default
thread stack size being 8MB, no wonder :)
Note that this was an experimental setup with just one node, so all
OSD peering happened on the same host.
Just for the heck of it, I halved the number of OSDs (to 36) by setting
up a soft RAID-0 for each disk pair. This worked after some tweaking of
the udev rules (which ignore 'md' block devices). I'm not sure whether
we're going to see the same problem with the real cluster (18 such
72-disk nodes) with EC 9-3.
Also, I'm not sure whether reducing the user process stack to 4MB would
be a good idea.


* Re: Adding a delay when restarting all OSDs on a host
  2014-07-23 19:22   ` Vit Yenukas
@ 2014-07-23 20:57     ` Milosz Tanski
  0 siblings, 0 replies; 7+ messages in thread
From: Milosz Tanski @ 2014-07-23 20:57 UTC
  To: Vit Yenukas, Gregory Farnum
  Cc: Wido den Hollander, ceph-devel@vger.kernel.org

The default stack size shouldn't matter. At least it's not an issue on
a kernel with over-commit turned on (the default). Most threads / apps
never use that much stack (in fact they use a fraction of it), so the
kernel doesn't bother allocating the pages for it. My bet is on some
other resource.
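
The arithmetic behind both positions can be sketched as follows. With
40 thousand threads and an 8MB stack each (treated here as MiB), the
stacks reserve more address space than the machine has RAM, but with
over-commit that reservation is not memory actually in use. The
function name stack_reservation_gib is hypothetical, and the list of
limits is only a set of candidates one might inspect, not a diagnosis
of this particular failure.

```shell
#!/bin/sh
# Worst-case address space reserved for thread stacks, in whole GiB.
# Arguments: thread count, stack size per thread in MiB.
stack_reservation_gib() {
    echo $(( $1 * $2 / 1024 ))
}

stack_reservation_gib 40000 8   # prints 312, more than the 256GB of RAM

# Limits that might be worth inspecting when fork or thread creation fails.
echo "stack size limit (KiB): $(ulimit -s)"
for f in /proc/sys/vm/overcommit_memory \
         /proc/sys/kernel/threads-max \
         /proc/sys/kernel/pid_max \
         /proc/sys/vm/max_map_count; do
    if [ -r "$f" ]; then
        echo "$f = $(cat "$f")"
    fi
done
```

Every thread consumes a task ID and every thread stack is a separate
memory mapping, which is why pid_max, threads-max and max_map_count are
listed next to the over-commit setting.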
