scoping daemon-helper replacement effort

* scoping daemon-helper replacement effort
@ 2016-07-29 16:40 Ken Dreyer
  2016-07-29 16:55 ` Josh Durgin
  0 siblings, 1 reply; 7+ messages in thread
From: Ken Dreyer @ 2016-07-29 16:40 UTC
  To: ceph-devel

daemon-helper predates a lot of things in Ceph, and the further we go
into systemd-land with things like unprivileged daemons, SELinux, and
cgroups, the further Teuthology diverges from what our users do. To
remedy this, I want to retire daemon-helper and have Teuthology tests
use the normal init system, particularly now that our main supported
distros are unified around systemd.

From what I understand, we use daemon-helper in Teuthology to:

  1) start a daemon and eventually stop it with either SIGTERM or
SIGKILL, depending on whether the Teuthology task has enabled the
coverage or valgrind options,

  2) send data via STDIN

  3) print some messages when the child crashes

I think we could run the services using the systemd unit files and
still accomplish #1 and #3.

For #2 (communicating to the daemons via STDIN), how could we
accomplish this? What sort of things are we writing to the daemons'
STDIN? I'm having trouble finding examples in ceph-qa-suite.git.

- Ken


* Re: scoping daemon-helper replacement effort
  2016-07-29 16:40 scoping daemon-helper replacement effort Ken Dreyer
@ 2016-07-29 16:55 ` Josh Durgin
  2016-07-29 16:57   ` Ken Dreyer
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Josh Durgin @ 2016-07-29 16:55 UTC
  To: Ken Dreyer, ceph-devel

On 07/29/2016 09:40 AM, Ken Dreyer wrote:
> daemon-helper predates a lot of things in Ceph, and the further we go
> into systemd-land with things like unprivileged daemons, SELinux, and
> cgroups, the further Teuthology diverges from what our users do. To
> remedy this, I want to retire daemon-helper and have Teuthology tests
> use the normal init system, particularly now that our main supported
> distros are unified around systemd.
>
>  From what I understand, we use daemon-helper in Teuthology to:
>
>    1) start a daemon and eventually stop it with either SIGTERM or
> SIGKILL, depending on whether the Teuthology task has enabled the
> coverage or valgrind options,
>
>    2) send data via STDIN
>
>    3) print some messages when the child crashes
>
> I think we could run the services using the systemd unit files and
> still accomplish #1 and #3.
>
> For #2 (communicating to the daemons via STDIN), how could we
> accomplish this? What sort of things are we writing to the daemons'
> STDIN? I'm having trouble finding examples in ceph-qa-suite.git.

We're not using it to write data to the daemons, but as a way to kill 
them automatically if our ssh connection dies.

With fast reimaging in the works, this will be irrelevant. Even now,
it's not really useful for the usual scheduled jobs where the nodes are
rebooted on failure. So I wouldn't worry about (2).
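
As a rough illustration of the mechanism described above (a minimal
sketch under assumptions, not the actual daemon-helper code): a wrapper
can hold a pipe open as a liveness signal and signal its child as soon
as that pipe reaches EOF, which is what happens when the ssh connection
carrying stdin goes away. The function name `supervise` and its
parameters are hypothetical.

```python
import os
import signal
import subprocess
import sys


def supervise(argv, sig=signal.SIGTERM, control=None):
    """Run argv; once `control` reaches EOF, send `sig` to the child.

    `control` stands in for the wrapper's stdin. Nothing meaningful is
    read from it; EOF on it is the only event that matters.
    """
    control = control if control is not None else sys.stdin.buffer
    # The child gets no stdin of its own; only the wrapper watches the pipe.
    child = subprocess.Popen(argv, stdin=subprocess.DEVNULL)
    # Block until the other end of the pipe is closed (or its owner dies).
    while control.read(4096):
        pass
    if child.poll() is None:
        child.send_signal(sig)
    # Negative return values mean the child was terminated by that signal.
    return child.wait()
```

This sketch only reacts to EOF on the control pipe; it does not notice
or report a child that crashes on its own while the pipe is still open.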

Josh


* Re: scoping daemon-helper replacement effort
  2016-07-29 16:55 ` Josh Durgin
@ 2016-07-29 16:57   ` Ken Dreyer
  2016-07-29 17:22   ` Vasu Kulkarni
  2016-07-29 18:40   ` Gregory Farnum
  2 siblings, 0 replies; 7+ messages in thread
From: Ken Dreyer @ 2016-07-29 16:57 UTC
  To: Josh Durgin; +Cc: ceph-devel

On Fri, Jul 29, 2016 at 10:55 AM, Josh Durgin <jdurgin@redhat.com> wrote:
> We're not using it to write data to the daemons, but as a way to kill them
> automatically if our ssh connection dies.
>
> With fast reimaging in the works, this will be irrelevant. Even now,
> it's not really useful for the usual scheduled jobs where the nodes are
> rebooted on failure. So I wouldn't worry about (2).

Oh! That's a lot simpler then, thanks Josh.

Fast re-imaging would be fantastic.

- Ken


* Re: scoping daemon-helper replacement effort
  2016-07-29 16:55 ` Josh Durgin
  2016-07-29 16:57   ` Ken Dreyer
@ 2016-07-29 17:22   ` Vasu Kulkarni
  2016-07-29 17:25     ` Vasu Kulkarni
  2016-07-29 18:39     ` Ken Dreyer
  2016-07-29 18:40   ` Gregory Farnum
  2 siblings, 2 replies; 7+ messages in thread
From: Vasu Kulkarni @ 2016-07-29 17:22 UTC
  To: Josh Durgin; +Cc: Ken Dreyer, ceph-devel

On Fri, Jul 29, 2016 at 9:55 AM, Josh Durgin <jdurgin@redhat.com> wrote:
> On 07/29/2016 09:40 AM, Ken Dreyer wrote:
>>
>> daemon-helper predates a lot of things in Ceph, and the further we go
>> into systemd-land with things like unprivileged daemons, SELinux, and
>> cgroups, the further Teuthology diverges from what our users do. To
>> remedy this, I want to retire daemon-helper and have Teuthology tests
>> use the normal init system, particularly now that our main supported
>> distros are unified around systemd.

If we only have to worry about systemd, it would be much simpler than
supporting the various older init systems (14.04/7.0, etc.). The changes
need only affect the jewel+ branches.

>>
>>  From what I understand, we use daemon-helper in Teuthology to:
>>
>>    1) start a daemon and eventually stop it with either SIGTERM or
>> SIGKILL, depending on whether the Teuthology task has enabled the
>> coverage or valgrind options,

I still see some challenges here: some of the code in ceph_manager.py
kills and revives OSDs pretty quickly. I think that would be possible
with systemd. Also, a plain KILL of the process generally won't work,
since systemd will end up restarting the process based on its config.
So those cases have to be rewritten.

>>
>>    2) send data via STDIN
>>
>>    3) print some messages when the child crashes
>>
>> I think we could run the services using the systemd unit files and
>> still accomplish #1 and #3.
>>
>> For #2 (communicating to the daemons via STDIN), how could we
>> accomplish this? What sort of things are we writing to the daemons'
>> STDIN? I'm having trouble finding examples in ceph-qa-suite.git.
>
>
> We're not using it to write data to the daemons, but as a way to kill them
> automatically if our ssh connection dies.
>
> With fast reimaging in the works, this will be irrelevant. Even now,
> it's not really useful for the usual scheduled jobs where the nodes are
> rebooted on failure. So I wouldn't worry about (2).
>
> Josh


* Re: scoping daemon-helper replacement effort
  2016-07-29 17:22   ` Vasu Kulkarni
@ 2016-07-29 17:25     ` Vasu Kulkarni
  2016-07-29 18:39     ` Ken Dreyer
  1 sibling, 0 replies; 7+ messages in thread
From: Vasu Kulkarni @ 2016-07-29 17:25 UTC
  To: Josh Durgin; +Cc: Ken Dreyer, ceph-devel

On Fri, Jul 29, 2016 at 10:22 AM, Vasu Kulkarni <vakulkar@redhat.com> wrote:
> On Fri, Jul 29, 2016 at 9:55 AM, Josh Durgin <jdurgin@redhat.com> wrote:
>> On 07/29/2016 09:40 AM, Ken Dreyer wrote:
>>>
>>> daemon-helper predates a lot of things in Ceph, and the further we go
>>> into systemd-land with things like unprivileged daemons, SELinux, and
>>> cgroups, the further Teuthology diverges from what our users do. To
>>> remedy this, I want to retire daemon-helper and have Teuthology tests
>>> use the normal init system, particularly now that our main supported
>>> distros are unified around systemd.
> If we only have to worry about systemd, it would be much simpler than
> supporting the various older init systems (14.04/7.0, etc.). The changes
> need only affect the jewel+ branches.
>
>>>
>>>  From what I understand, we use daemon-helper in Teuthology to:
>>>
>>>    1) start a daemon and eventually stop it with either SIGTERM or
>>> SIGKILL, depending on whether the Teuthology task has enabled the
>>> coverage or valgrind options,
> I still see some challenges here: some of the code in ceph_manager.py
> kills and revives OSDs pretty quickly. I think that would be possible
  Correction: will not be possible

> with systemd. Also, a plain KILL of the process generally won't work,
> since systemd will end up restarting the process based on its config.
> So those cases have to be rewritten.
>
>>>
>>>    2) send data via STDIN
>>>
>>>    3) print some messages when the child crashes
>>>
>>> I think we could run the services using the systemd unit files and
>>> still accomplish #1 and #3.
>>>
>>> For #2 (communicating to the daemons via STDIN), how could we
>>> accomplish this? What sort of things are we writing to the daemons'
>>> STDIN? I'm having trouble finding examples in ceph-qa-suite.git.
>>
>>
>> We're not using it to write data to the daemons, but as a way to kill them
>> automatically if our ssh connection dies.
>>
>> With fast reimaging in the works, this will be irrelevant. Even now,
>> it's not really useful for the usual scheduled jobs where the nodes are
>> rebooted on failure. So I wouldn't worry about (2).
>>
>> Josh


* Re: scoping daemon-helper replacement effort
  2016-07-29 17:22   ` Vasu Kulkarni
  2016-07-29 17:25     ` Vasu Kulkarni
@ 2016-07-29 18:39     ` Ken Dreyer
  1 sibling, 0 replies; 7+ messages in thread
From: Ken Dreyer @ 2016-07-29 18:39 UTC
  To: Vasu Kulkarni; +Cc: Josh Durgin, ceph-devel

On Fri, Jul 29, 2016 at 11:22 AM, Vasu Kulkarni <vakulkar@redhat.com> wrote:
> If we only have to worry about systemd, it would be much simpler than
> supporting the various older init systems (14.04/7.0, etc.). The changes
> need only affect the jewel+ branches.

Right, the Upstart stuff for Trusty would be a sticking point, but I'm
not too worried about that for now, since this is a ways off.

Everybody should be moving to Jessie and Xenial, right? :)

> I still see some challenges here: some of the code in ceph_manager.py
> kills and revives OSDs pretty quickly. I think that would be possible
> with systemd. Also, a plain KILL of the process generally won't work,
> since systemd will end up restarting the process based on its config.
> So those cases have to be rewritten.

That's a good point about systemd automatically restarting the
daemons; that will need more scrutiny.

I see ceph_manager.py doing stuff like:

  self.ctx.daemons.get_daemon('osd', osd, self.cluster).stop()

and in teuthology/orchestra/daemon.py, the "stop()" method there
just calls close() on the daemon's stdin. From what Josh said, stdin
is simply there to keep the daemon alive for the duration of the test,
so this ceph_manager.py code is effectively like sending a TERM or
KILL signal.

In other words, I think there's no relevant difference between what
ceph_manager.py does to kill daemons today, and what "systemctl stop
ceph-osd@..." would do? That would solve the automatic restart
concern.
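
To make that comparison concrete, here is a minimal sketch, assuming a
hypothetical `SystemdDaemon` class that keeps a stop()/start() interface
but issues systemctl commands instead of closing stdin. The class name,
the injected `run` callable, and the use of sudo are all assumptions for
illustration, not existing Teuthology code.

```python
import shlex


class SystemdDaemon:
    """Hypothetical daemon handle backed by systemctl instead of stdin."""

    def __init__(self, type_, id_, run=print):
        # `run` is an injected callable that would execute a shell command
        # on the remote node; it defaults to print so the sketch is inert.
        self.unit = "ceph-{0}@{1}".format(type_, id_)
        self.run = run

    def _systemctl(self, verb):
        self.run("sudo systemctl {0} {1}".format(verb, shlex.quote(self.unit)))

    def stop(self):
        # An explicit stop goes through systemd itself, so the unit's
        # automatic restart policy is not triggered.
        self._systemctl("stop")

    def start(self):
        self._systemctl("start")
```

Usage would look like `SystemdDaemon('osd', 3).stop()`, which with the
default `run` simply prints the command it would issue.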

- Ken


* Re: scoping daemon-helper replacement effort
  2016-07-29 16:55 ` Josh Durgin
  2016-07-29 16:57   ` Ken Dreyer
  2016-07-29 17:22   ` Vasu Kulkarni
@ 2016-07-29 18:40   ` Gregory Farnum
  2 siblings, 0 replies; 7+ messages in thread
From: Gregory Farnum @ 2016-07-29 18:40 UTC
  To: Josh Durgin; +Cc: Ken Dreyer, ceph-devel

On Fri, Jul 29, 2016 at 9:55 AM, Josh Durgin <jdurgin@redhat.com> wrote:
> On 07/29/2016 09:40 AM, Ken Dreyer wrote:
>>
>> daemon-helper predates a lot of things in Ceph, and the further we go
>> into systemd-land with things like unprivileged daemons, SELinux, and
>> cgroups, the further Teuthology diverges from what our users do. To
>> remedy this, I want to retire daemon-helper and have Teuthology tests
>> use the normal init system, particularly now that our main supported
>> distros are unified around systemd.
>>
>>  From what I understand, we use daemon-helper in Teuthology to:
>>
>>    1) start a daemon and eventually stop it with either SIGTERM or
>> SIGKILL, depending on whether the Teuthology task has enabled the
>> coverage or valgrind options,
>>
>>    2) send data via STDIN
>>
>>    3) print some messages when the child crashes
>>
>> I think we could run the services using the systemd unit files and
>> still accomplish #1 and #3.
>>
>> For #2 (communicating to the daemons via STDIN), how could we
>> accomplish this? What sort of things are we writing to the daemons'
>> STDIN? I'm having trouble finding examples in ceph-qa-suite.git.
>
>
> We're not using it to write data to the daemons, but as a way to kill them
> automatically if our ssh connection dies.
>
> With fast reimaging in the works, this will be irrelevant. Even now,
> it's not really useful for the usual scheduled jobs where the nodes are
> rebooted on failure. So I wouldn't worry about (2).

I think we also use this to kill processes by closing stdin on the
teuthology process side. Some quick grepping suggests we have ~21 call
sites and they're largely not in the tracker; I don't imagine they'll
be too hard to replace but it might be a little awkward in some
places.
-Greg


Thread overview: 7+ messages
2016-07-29 16:40 scoping daemon-helper replacement effort Ken Dreyer
2016-07-29 16:55 ` Josh Durgin
2016-07-29 16:57   ` Ken Dreyer
2016-07-29 17:22   ` Vasu Kulkarni
2016-07-29 17:25     ` Vasu Kulkarni
2016-07-29 18:39     ` Ken Dreyer
2016-07-29 18:40   ` Gregory Farnum
