From: Petr Vorel <pvorel@suse.cz>
To: ltp@lists.linux.it
Subject: [LTP] [PATCH v2] tst_test: using SIGTERM to terminate process
Date: Fri, 17 Sep 2021 13:03:00 +0200
Message-ID: <YUR15AMqM4qEYXpV@pevik>
In-Reply-To: <YUR1K7XE3QmTFxT7@yuki>

> Hi!
> > > > I managed to reproduce this in dash. I bet that this is a bug where
> > > > the signal handler inside dash is temporarily disabled when we install
> > > > the trap, and if we manage to hit that window the signal is discarded.
> > > > At least that is my working theory. After I installed debug prints, I
> > > > saw that in the cases where it hangs the signal was sent just before we
> > > > installed the trap. And in some cases when the signal arrives the timer
> > > > process is killed but the trap is not invoked. So it really looks like
> > > > signal handling in dash is simply broken. Not sure what we can do about
> > > > bugs like this apart from switching to a real programming language.
> > Which version of bash and dash are you testing on?

> > > Thanks for the debugging. *bash* is also affected, at least some releases.
> > > I reproduced it also on some older SLES, with bash 4.4.
> > dash 0.5.11.4 and bash 5.1.8 on my Tumbleweed laptop are OK.

> > I tested it on several of my VMs:

> > dash *failing*: 0.5.8 (SLES), 0.5.11.3 (Tumbleweed), 0.5.11+git20200708+dd9ef66-5 (Debian), 0.5.7-4+b1 (Debian)
> > dash *OK*: 0.5.11.2 (SLES 15), 0.5.10.2 (CentOS)

> > bash *failing*: 5.1.4-1.4 (Tumbleweed), 4.4-9.7.1 (SLES 15)
> > bash *OK*: 4.4-17 (SLES 15), 4.3-83 (SLES 12), 4.3-11+deb8u (Debian), 5.1-2+b3
> > (Debian), 4.2.46-34 (CentOS)

> > I have no idea what causes it, whether some bash and dash versions are really
> > buggy or whether it's reproducible only in certain environments.

> bash 5.1.8 seems to work okay, dash 0.5.11.3-r2 seems to fail here.

> > Any tip what to search for?

> Not really, apart from reading source code and figuring out exactly what
> happens here.

> I guess that we can make things more predictable and easier to read by
> shifting parts of the shell library to a C process.

> For instance, if we wrote a utility that would implement all of the
> tst_kill_test and timeout processing in C, we would simplify things greatly.

> Something like:

> tst_timeout_kill

> that would be used as:


> tst_timeout_kill 300 12342
>                   ^    ^
>                   |    process group leader pid
>                   timeout in seconds


> That would implement both the loop for killing tests and timeout
> processing as well. That way we would get rid of the trap in the
> subshell and we would end up with a single pid for the whole timeout
> process and avoid the recursive SIGKILL to begin with.

+1 for this idea instead of the never-ending story of fixing various shells.
If you have time, please write that.

Kind regards,
Petr

-- 
Mailing list info: https://lists.linux.it/listinfo/ltp

