All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Kirill Tkhai <ktkhai@parallels.com>
Cc: linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Michal Hocko <mhocko@suse.cz>, Rik van Riel <riel@redhat.com>,
	Ionut Alexa <ionut.m.alexa@gmail.com>,
	Peter Hurley <peter@hurleysoftware.com>,
	Kirill Tkhai <tkhai@yandex.ru>
Subject: Re: [PATCH] exit: Use read lock for do_notify_parent() instead of write lock
Date: Fri, 10 Apr 2015 19:50:48 +0200	[thread overview]
Message-ID: <20150410175048.GA23971@redhat.com> (raw)
In-Reply-To: <1428602348.12166.29.camel@parallels.com>

Kirill,

I'll try to read this patch tomorrow, currently I am hopelessly buried
in user-space programming :/

But I have to admit that so far I dislike this patch very much. It adds
a lot of complications and for what?

Yes, yes, yes. tasklist_lock is another BKL and must die. We need the
per-process lock. Until then I do not think the hacks like this make
any sense, unless you have the "real" workload with before/after
performance numbers.

On 04/09, Kirill Tkhai wrote:
>
> I suggest to execute do_notify_parent() under read_lock(). It allows more tasks
> to use it in parallel. Read lock gives enough guarantees for us: child's parent
> won't change during the notification.

Well, write_unlock() + read_lock() is not nice too...

> include/asm-generic/qrwlock.h:
> static inline void queue_reduce_locked_write_to_read(struct qrwlock *lock)
> {
> 	smp_mb__before_atomic();
> 	atomic_add(_QR_BIAS - _QW_LOCKED, &lock->cnts);
> }

Yes, downgrade() will be better.

Still, this only removes do_notify_parent() from the write_lock'ed section.

(lets ignore kill_orphaned_pgrp(), we want to make will_become_orphaned_pgrp
 lockless. Look at get_signal).

And this changes the rules: currently ->exit_state is stable under read_lock,
except -> EXIT_DEAD transition. OK, this is probably fine, but we need to
recheck. At least this was certainly wrong some time before iirc.

> @@ -594,7 +597,10 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
>
>  	write_lock_irq(&tasklist_lock);
>  	forget_original_parent(tsk, &dead);
> +	tsk->exit_state = EXIT_NOTIFY;
> +	write_unlock_irq(&tasklist_lock);

And unless I missed something this EXIT_NOTIFY turns the concurrent
do_wait() into the busy-wait loop.

Now suppose that CONFIG_SMP=n and the rt parent preempts the exiting
child right after it drops tasklist: deadlock?

> +	read_lock(&tasklist_lock);
>  	if (group_dead)
>  		kill_orphaned_pgrp(tsk->group_leader, NULL);
>
> @@ -612,13 +618,14 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
>  	}
>
>  	tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;

This needs WRITE_ONCE(). Otherwise gcc can do, say,

	tsk->exit_state = EXIT_ZOMBIE;
	if (autoreap)
		tsk->exit_state = EXIT_DEAD;

which will lead to kernel crash (both parent and child can release this
task).


> -	if (tsk->exit_state == EXIT_DEAD)
> +	smp_wmb(); /* Pairs with read_lock() in do_wait() */

Why? this barries looks unnecessary.

OTOH. We need to set EXIT_XXX before __wake_up_parent(). OK, OK, we do not
because of the busy-wait loop, but busy-wait is not an option.

> @@ -1317,6 +1324,13 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
>  		return 0;
>  	}
>
> +	if (unlikely(exit_state == EXIT_NOTIFY)) {
> +		if (wo->wo_flags & WNOHANG)
> +			return 0;
> +		read_unlock(&tasklist_lock);
> +		return -REPEAT_DOWAIT;
> +	}

No, no, no. If you do something like this, please (ab)use wo->notask_error.
And wait_consider_task() should continue after that,

Oleg.


  parent reply	other threads:[~2015-04-10 17:51 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-09 17:59 [PATCH] exit: Use read lock for do_notify_parent() instead of write lock Kirill Tkhai
2015-04-09 18:28 ` Peter Zijlstra
2015-04-10  8:22   ` Ingo Molnar
2015-04-10 17:50 ` Oleg Nesterov [this message]
2015-04-13 16:34   ` Kirill Tkhai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150410175048.GA23971@redhat.com \
    --to=oleg@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=ionut.m.alexa@gmail.com \
    --cc=ktkhai@parallels.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.cz \
    --cc=mingo@redhat.com \
    --cc=peter@hurleysoftware.com \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=tkhai@yandex.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.