From: Kirill Tkhai <ktkhai@parallels.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Michal Hocko <mhocko@suse.cz>, Rik van Riel <riel@redhat.com>,
Ionut Alexa <ionut.m.alexa@gmail.com>,
Peter Hurley <peter@hurleysoftware.com>,
"Kirill Tkhai" <tkhai@yandex.ru>
Subject: Re: [PATCH] exit: Use read lock for do_notify_parent() instead of write lock
Date: Mon, 13 Apr 2015 19:34:12 +0300 [thread overview]
Message-ID: <1428942852.1467.30.camel@parallels.com> (raw)
In-Reply-To: <20150410175048.GA23971@redhat.com>
Hi, Oleg,
thanks for your review.
В Пт, 10/04/2015 в 19:50 +0200, Oleg Nesterov пишет:
> Kirill,
>
> I'll try to read this patch tomorrow, currently I am hopelessly buried
> in user-space programming :/
>
> But I have to admit that so far I dislike this patch very much. It adds
> a lot of complications and for what?
>
> Yes, yes, yes. tasklist_lock is another BKL and must die. We need the
> per-process lock. Until then I do not think the hacks like this make
> any sense, unless you have the "real" workload with before/after
> performance numbers.
I don't think the complication is very huge. We add one rule about
exit_state. Yes, the state becomes unstable under read_lock(), but
only wait_consider_task() is affected by this.
Ok, what do you mean when you're speaking about killing tasklist_lock?
Can't we leave it for fork() and __unhash_process() only, but change
other places which lock it for write? Every of the places will get rid
of it by its own way. EXIT_NOTIFY is for do_exit().
Or you want to kill it completelly?
I didn't test the patch on special workload or large SMP systems.
The results for 4 CPU box (kernel compilation):
[origin]
1)534.37user 32.15system 2:29.32elapsed 379%CPU (0avgtext+0avgdata 142488maxresident)k
0inputs+724264outputs (0major+23852891minor)pagefaults 0swaps
2)534.85user 32.81system 2:28.67elapsed 381%CPU (0avgtext+0avgdata 142476maxresident)k
0inputs+724264outputs (0major+23853531minor)pagefaults 0swaps
[patched]
1)531.65user 32.69system 2:27.41elapsed 382%CPU (0avgtext+0avgdata 142580maxresident)k
0inputs+724256outputs (0major+23854620minor)pagefaults 0swaps
2)530.92user 32.51system 2:28.18elapsed 380%CPU (0avgtext+0avgdata 142544maxresident)k
0inputs+724256outputs (0major+23852925minor)pagefaults 0swaps
My test machine has HDD, so it's not the best test for the patch. I'll try something
else later. But I don't expect exciting results on workloads like this.
>
> On 04/09, Kirill Tkhai wrote:
> >
> > I suggest to execute do_notify_parent() under read_lock(). It allows more tasks
> > to use it in parallel. Read lock gives enough guarantees for us: child's parent
> > won't change during the notification.
>
> Well, write_unlock() + read_lock() is not nice too...
>
> > include/asm-generic/qrwlock.h:
> > static inline void queue_reduce_locked_write_to_read(struct qrwlock *lock)
> > {
> > smp_mb__before_atomic();
> > atomic_add(_QR_BIAS - _QW_LOCKED, &lock->cnts);
> > }
>
> Yes, downgrade() will be better.
>
> Still, this only removes do_notify_parent() from the write_lock'ed section.
Yeah, but the plan is to go successively to removing write lock from every
place it's used, except of hashing in fork() and unhashing in __unhash_process().
> (lets ignore kill_orphaned_pgrp(), we want to make will_become_orphaned_pgrp
> lockless. Look at get_signal).
>
> And this changes the rules: currently ->exit_state is stable under read_lock,
> except -> EXIT_DEAD transition. OK, this is probably fine, but we need to
> recheck. At least this was certainly wrong some time before iirc.
>
> > @@ -594,7 +597,10 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
> >
> > write_lock_irq(&tasklist_lock);
> > forget_original_parent(tsk, &dead);
> > + tsk->exit_state = EXIT_NOTIFY;
> > + write_unlock_irq(&tasklist_lock);
>
> And unless I missed something this EXIT_NOTIFY turns the concurrent
> do_wait() into the busy-wait loop.
>
> Now suppose that CONFIG_SMP=n and the rt parent preempts the exiting
> child right after it drops tasklist: deadlock?
You sure, thank. We need to disable preemption there.
> > + read_lock(&tasklist_lock);
> > if (group_dead)
> > kill_orphaned_pgrp(tsk->group_leader, NULL);
> >
> > @@ -612,13 +618,14 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
> > }
> >
> > tsk->exit_state = autoreap ? EXIT_DEAD : EXIT_ZOMBIE;
>
> This needs WRITE_ONCE(). Otherwise gcc can do, say,
>
> tsk->exit_state = EXIT_ZOMBIE;
> if (autoreap)
> tsk->exit_state = EXIT_DEAD;
>
> which will lead to kernel crash (both parent and child can release this
> task).
Ah, thanks.
>
>
> > - if (tsk->exit_state == EXIT_DEAD)
> > + smp_wmb(); /* Pairs with read_lock() in do_wait() */
>
> Why? this barries looks unnecessary.
Sure, it's unnecessary for do_wait().
> OTOH. We need to set EXIT_XXX before __wake_up_parent(). OK, OK, we do not
> because of the busy-wait loop, but busy-wait is not an option.
>
> > @@ -1317,6 +1324,13 @@ static int wait_consider_task(struct wait_opts *wo, int ptrace,
> > return 0;
> > }
> >
> > + if (unlikely(exit_state == EXIT_NOTIFY)) {
> > + if (wo->wo_flags & WNOHANG)
> > + return 0;
> > + read_unlock(&tasklist_lock);
> > + return -REPEAT_DOWAIT;
> > + }
>
> No, no, no. If you do something like this, please (ab)use wo->notask_error.
> And wait_consider_task() should continue after that,
Kirill
prev parent reply other threads:[~2015-04-13 16:34 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-09 17:59 [PATCH] exit: Use read lock for do_notify_parent() instead of write lock Kirill Tkhai
2015-04-09 18:28 ` Peter Zijlstra
2015-04-10 8:22 ` Ingo Molnar
2015-04-10 17:50 ` Oleg Nesterov
2015-04-13 16:34 ` Kirill Tkhai [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1428942852.1467.30.camel@parallels.com \
--to=ktkhai@parallels.com \
--cc=akpm@linux-foundation.org \
--cc=ionut.m.alexa@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mhocko@suse.cz \
--cc=mingo@redhat.com \
--cc=oleg@redhat.com \
--cc=peter@hurleysoftware.com \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=tkhai@yandex.ru \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.