From: Cyril Hrubis <chrubis@suse.cz>
To: ltp@lists.linux.it
Subject: [LTP] [PATCH v2] tst_test: using SIGTERM to terminate process
Date: Fri, 17 Sep 2021 12:59:55 +0200
Message-ID: <YUR1K7XE3QmTFxT7@yuki>
In-Reply-To: <YURrcrX9RQYIMt4V@pevik>

Hi!
> > > I managed to reproduce this in dash. I bet this is a bug where the
> > > signal handler inside dash is temporarily disabled while we install the
> > > trap, and if we hit that window the signal is discarded. At least that
> > > is my working theory. After I added debug prints, I saw that in the
> > > cases where it hangs, the signal was sent just before we installed the
> > > trap. And in some cases, when the signal arrives, the timer process is
> > > killed but the trap is not invoked. So it really looks like signal
> > > handling in dash is simply broken. Not sure what we can do about bugs
> > > like this apart from switching to a real programming language.
> Which version of bash and dash are you testing on?
> 
> > Thanks for the debugging. *bash* is also affected, at least some releases.
> > I also reproduced it on some older SLES, with bash 4.4.
> dash 0.5.11.4 and bash 5.1.8 on my Tumbleweed laptop are OK.
> 
> I tested it on various VMs of mine:
> 
> dash *failing*: 0.5.8 (SLES), 0.5.11.3 (Tumbleweed), 0.5.11+git20200708+dd9ef66-5 (Debian), 0.5.7-4+b1 (Debian)
> dash *OK*: 0.5.11.2 (SLES 15), 0.5.10.2 (CentOS)
> 
> bash *failing*: 5.1.4-1.4 (Tumbleweed), 4.4-9.7.1 (SLES 15)
> bash *ok*: 4.4-17 (SLES 15), 4.3-83 (SLES 12), 4.3-11+deb8u (Debian), 5.1-2+b3
> (Debian), 4.2.46-34 (CentOS)
> 
> I have no idea what causes it, whether some bash and dash versions really are
> buggy or it's reproducible only in certain environments.

bash 5.1.8 seems to work okay, dash 0.5.11.3-r2 seems to fail here.

> Any tip what to search for?

Not really, apart from reading the source code and figuring out exactly
what happens there.

I guess we could make things more predictable, and easier to read, by
moving parts of the shell library into a C program.
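To illustrate what "more predictable" could mean here: in C a process can close the window before its handler is installed on its own, by keeping the signal blocked until the handler is in place. A signal that arrives in that window then stays pending and is delivered afterwards instead of being lost. This is a minimal sketch of the POSIX mechanism with hypothetical function names; it makes no claim about what dash does internally:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_term;

static void on_term(int sig)
{
	(void)sig;
	got_term = 1;
}

/*
 * Hypothetical demo: returns 1 if a SIGTERM raised before the handler
 * exists is still delivered to the handler once it is installed.
 */
static int pending_term_is_delivered(void)
{
	struct sigaction sa;
	sigset_t term;

	got_term = 0;

	/* 1. Block SIGTERM: from now on it can only become pending. */
	sigemptyset(&term);
	sigaddset(&term, SIGTERM);
	if (sigprocmask(SIG_BLOCK, &term, NULL) != 0)
		return -1;

	/* 2. Simulate the race: the signal arrives before any handler exists.
	 *    Unblocked, the default action would terminate the process. */
	if (raise(SIGTERM) != 0)
		return -1;

	/* 3. Install the handler; the signal is still pending, not lost. */
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_term;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGTERM, &sa, NULL) != 0)
		return -1;

	/* 4. Unblock: POSIX requires a pending unblocked signal to be
	 *    delivered before sigprocmask() returns, so on_term() runs here. */
	if (sigprocmask(SIG_UNBLOCK, &term, NULL) != 0)
		return -1;

	return got_term;
}
```

In a shell script there is no equivalent of step 1, so whether a signal sent while the trap is being set up gets lost depends entirely on how the shell implements trap.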

For instance, if we wrote a utility that implemented both the
tst_kill_test and the timeout process in C, we would simplify things
greatly.

Something like tst_timeout_kill, which would be used as:

tst_timeout_kill 300 12342
                 ^   ^
                 |   process group leader pid
                 timeout in seconds

It would implement both the loop that kills the test and the timeout
processing. That way we would get rid of the trap in the subshell, end
up with a single pid for the whole timeout process, and avoid the
recursive SIGKILL to begin with.
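Here is a minimal sketch of what such a utility might do. It assumes the utility sends SIGTERM to the whole process group once the timeout expires, as this patch proposes, and falls back to SIGKILL after a grace period. The grace period, the 100 ms polling interval and all function names are hypothetical and are not existing LTP code:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns 1 while any process in group pgid still exists (unreaped
 * zombies may still count as members). */
static int pgrp_alive(pid_t pgid)
{
	return kill(-pgid, 0) == 0 || errno == EPERM;
}

/*
 * Sleep for timeout seconds, then SIGTERM the whole process group pgid.
 * If anything is still alive after grace seconds, SIGKILL it.
 * Returns 0 if the group went away in time, 1 if SIGKILL was sent,
 * -1 on error.
 */
static int timeout_kill(unsigned int timeout, pid_t pgid, unsigned int grace)
{
	const struct timespec tick = { .tv_sec = 0, .tv_nsec = 100000000L };
	unsigned int left = timeout, i;

	/* pgid 0 would hit our own group and pgid 1 would become kill(-1). */
	if (pgid <= 1) {
		errno = EINVAL;
		return -1;
	}

	/* sleep() returns the unslept time if a signal interrupts it. */
	while (left)
		left = sleep(left);

	if (kill(-pgid, SIGTERM) != 0)
		return errno == ESRCH ? 0 : -1;

	/* Poll every 100 ms for up to grace seconds. */
	for (i = 0; i < grace * 10; i++) {
		if (!pgrp_alive(pgid))
			return 0;
		nanosleep(&tick, NULL);
	}

	kill(-pgid, SIGKILL);
	return 1;
}

/* Hypothetical usage demo: run a hung dummy "test" in its own process
 * group, let timeout_kill() expire it, return the signal that killed it. */
static int demo_timeout_kill(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		setpgid(0, 0);            /* become process group leader */
		signal(SIGTERM, SIG_DFL);
		pause();                  /* stand-in for a hung test */
		_exit(0);
	}
	setpgid(pid, pid);                /* avoid racing the child's setpgid() */
	if (timeout_kill(1, pid, 1) < 0)
		return -1;
	if (waitpid(pid, &status, 0) != pid || !WIFSIGNALED(status))
		return -1;
	return WTERMSIG(status);
}
```

A main() would only need to parse the two arguments (e.g. with strtoul()) and call timeout_kill(timeout, pgid, grace). Because there is a single process doing both the waiting and the killing, there is no trap to install and no window in which the signal could be lost.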

-- 
Cyril Hrubis
chrubis@suse.cz
