All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Rachel Menge <rachelmenge@linux.microsoft.com>,
	linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
	Wei Fu <fuweid89@gmail.com>,
	apais@linux.microsoft.com,
	Sudhanva Huruli <Sudhanva.Huruli@microsoft.com>,
	Jens Axboe <axboe@kernel.dk>,
	Christian Brauner <brauner@kernel.org>,
	Mike Christie <michael.christie@oracle.com>,
	Joel Granados <j.granados@samsung.com>,
	Mateusz Guzik <mjguzik@gmail.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Frederic Weisbecker <frederic@kernel.org>,
	Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Josh Triplett <josh@joshtriplett.org>,
	Boqun Feng <boqun.feng@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Zqiang <qiang.zhang1211@gmail.com>
Subject: Re: [PATCH] zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
Date: Thu, 13 Jun 2024 17:30:22 +0200	[thread overview]
Message-ID: <20240613153021.GC18218@redhat.com> (raw)
In-Reply-To: <87a5jpqamx.fsf@email.froward.int.ebiederm.org>

On 06/13, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@redhat.com> writes:
>
> > kernel_wait4() doesn't sleep and returns -EINTR if there is no
> > eligible child and signal_pending() is true.
> >
> > That is why zap_pid_ns_processes() clears TIF_SIGPENDING but this is not
> > enough, it should also clear TIF_NOTIFY_SIGNAL to make signal_pending()
> > return false and avoid a busy-wait loop.
>
> I took a look through the code.  It used to be that TIF_NOTIFY_SIGNAL
> was all about waking up a task so that task_work_run can be used.
> io_uring still mostly uses it that way.  There is also a use in
> kthread_stop that just uses it as a TIF_SIGPENDING without having a
> pending signal.
>
> At the point in do_exit where exit_notify and thus zap_pid_ns_processes
> is called I can't possibly see a use for TIF_NOTIFY_SIGNAL.
> exit_task_work, exit_signals, and io_uring_cancel have all been called.
>
> So TIF_NOTIFY_SIGNAL should be spurious at this point and safe to clear.
> Why it remains set is a mystery to me.

because exit_task_work() -> task_work_run() doesn't clear TIF_NOTIFY_SIGNAL.

So yes, it is spurious, but to me a possible TIF_SIGPENDING is even more
"spurious". See my reply to Wei.

We don't need to clear TIF_NOTIFY_SIGNAL inside the loop, task_work_addd()
can't succeed after exit_task_work() sets ->task_works =&work_exited, but
this is another story and this doesn't (well, shouldn't) differ from
TIF_SIGPENDING.

> If I had infinite time and energy the ideal is to rework the pid
> namespace exit logic

Perhaps  in this case you could take a look at the next loop waiting for
pid_ns->pid_allocated == init_pids ;)

I always hated the fact that the the exiting sub-namespace init can
"hang forever" if this namespace has the tasks injected from the parent
namespace. And I had enough hard-to-debug internal bug reports which
blamed the kernel.

> This active waiting is weird and non-standard in the kernel and winds up
> causeing a problem every couple of years because of that.

Agreed.

Oleg.


  parent reply	other threads:[~2024-06-13 15:32 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-05 23:42 [RCU] zombie task hung in synchronize_rcu_expedited Rachel Menge
2024-06-06 11:10 ` Oleg Nesterov
2024-06-06 15:45   ` Wei Fu
2024-06-06 17:28     ` Oleg Nesterov
2024-06-07  3:02       ` Wei Fu
2024-06-07  6:25         ` Oleg Nesterov
2024-06-07 15:04           ` Wei Fu
2024-06-07 21:22             ` Oleg Nesterov
2024-06-08 12:42               ` Oleg Nesterov
2024-06-10  0:07                 ` Wei Fu
2024-06-08 12:06 ` [PATCH] zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING Oleg Nesterov
2024-06-08 17:00   ` Boqun Feng
2024-06-09 14:12   ` Wei Fu
2024-06-12 16:57   ` Jens Axboe
2024-06-13 12:40   ` Eric W. Biederman
2024-06-13 14:02     ` Wei Fu
2024-06-13 14:49       ` Oleg Nesterov
2024-06-13 15:30     ` Oleg Nesterov [this message]
2024-06-08 15:48 ` [PATCH] zap_pid_ns_processes: don't send SIGKILL to sub-threads Oleg Nesterov
2024-06-13 13:01   ` Eric W. Biederman
2024-06-13 15:00     ` Oleg Nesterov
2024-06-13 16:23       ` Eric W. Biederman
2024-07-05 16:08       ` Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240613153021.GC18218@redhat.com \
    --to=oleg@redhat.com \
    --cc=Sudhanva.Huruli@microsoft.com \
    --cc=akpm@linux-foundation.org \
    --cc=apais@linux.microsoft.com \
    --cc=axboe@kernel.dk \
    --cc=boqun.feng@gmail.com \
    --cc=brauner@kernel.org \
    --cc=ebiederm@xmission.com \
    --cc=frederic@kernel.org \
    --cc=fuweid89@gmail.com \
    --cc=j.granados@samsung.com \
    --cc=jiangshanlai@gmail.com \
    --cc=joel@joelfernandes.org \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=michael.christie@oracle.com \
    --cc=mjguzik@gmail.com \
    --cc=neeraj.upadhyay@kernel.org \
    --cc=paulmck@kernel.org \
    --cc=qiang.zhang1211@gmail.com \
    --cc=rachelmenge@linux.microsoft.com \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.