From: Oleg Nesterov <oleg@redhat.com>
To: Michel Lespinasse <walken@google.com>
Cc: David Howells <dhowells@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/5] tasklist_lock fairness issues
Date: Sat, 9 Mar 2013 19:26:13 +0100 [thread overview]
Message-ID: <20130309182613.GA32343@redhat.com> (raw)
In-Reply-To: <1362717437-1729-1-git-send-email-walken@google.com>
Hi Michel,
Well. I can't say I really like this. 4/5 itself looks fine, but other
complications do not look nice, at least in the long term. Imho, imho,
I can be wrong.
Everyone seem to agree that tasklist should die, this series doesn't
even try to solve the fundamental problems with this global lock.
However, I agree. tasklist rework can take ages, so probably we need
to solve at least the write-starvation problem right now. Probably this
series is correct (I didn't read it except 4/5), but too complex.
Can't we do some simple hack to avoid the user-triggered livelocks?
Say, can't we add
// Just for illustration, at least we need CONTENDED/etc
void read_lock_fair(rwlock_t *lock)
{
while (arch_writer_active(lock))
cpu_relax();
read_lock(lock);
}
and then turn some of read_lock's (do_wait, setpriority, more) into
read_lock_fair?
I have to admit, I do not know how rwlock_t actually works, and it seems
that arch_writer_active() is not trivial even on x86. __write_lock_failed()
changes the counter back, and afaics you can't detect the writer in the
window between addl and subl. Perhaps we can change __write_lock_failed()
to use RW_LOCK_BIAS + HUGE_NUMBER while it spins, but I am not sure...
Or perhaps we can turn write_lock(tasklist) into
void tasklist_write_lock(void)
{
if (write_try_lock(taslist_lock))
return;
// checked by writer_active() above
taslist_lock_writer_active = true;
write_lock(taslist_lock);
taslist_lock_writer_active = false;
}
In any case I am not proud of this idea, and in any case I would like
to see more comments.
Oleg.
On 03/07, Michel Lespinasse wrote:
>
> I'd like to gather comments on these patches, which apply over v3.9-rc1.
>
> I have been looking at rwlock_t fairness issues, and in particular at
> tasklist_lock as this is the one rwlock_t user that seems to be making
> it difficult to switch rwlock_t to a fair lock. This is because the
> tasklist_lock read side is taken both in process and irq contexts,
> and we don't want to have to disable irqs when tasklist_lock is taken
> in process context, so we need a rwlock_t that supports a recursive
> read side.
>
> Patches 1-3 convert the existing tasklist_lock users to use helper
> functions. For the tasklist_lock read side, we have different helpers
> for call sites that are known to run only in process context, vs call
> sites that may run in any context.
>
> - Patch 1 introduces the tasklist_lock helper functions;
>
> - Patch 2 identifies all tasklist_lock call sites that may run in
> (soft or hard) irq context and converts them to use the
> tasklist_read_[un]lock_any helper functions;
>
> - Patch 3 converts the remaining call sites (known to run only in
> process context) to use the tasklist_{read,write}_[un]lock helpers.
>
> Together, these 3 patches provide some kind of documentation about which
> tasklist users may run in irq contexts, and are thus currently making it
> difficult to do a straight conversion of rwlock_t to a fair rw spinlock.
>
> Patch 4 introduces a generic implementation of a fair (ticket based)
> rw spinlock. I did not try optimizing it; it would be possible to do
> better using arch specific code. Seems too early for it though :)
>
> Patch 5 makes tasklist_lock fair when acquired from process context.
> This is done using a double locking scheme; there is a fair rwlock
> that is used when tasklist_lock is acquired from process context and
> an unfair rwlock_t that is used when tasklist_lock read side is acquired
> from irq context. When tasklist_lock is acquired for write both locks
> are taken. Clearly, such double locking is suboptimal; one would hope
> that the irq context read side users could be eliminated over time.
> Converting the process context users to a fair rwlock also has the
> additional benefit that it lets lockdep verify that we are not trying
> to do recursive acquires of the tasklist_lock read side in process context.
> (I haven't been able to break this check so far).
>
> Does this look like a reasonable approach to get traction on the
> tasklist_lock issues ?
>
> Michel Lespinasse (5):
> kernel: add tasklist_{read,write}_lock{,_any} helper functions
> kernel: use tasklist_read_lock_any() when locking tasklist in irq context
> kernel: use tasklist_{read,write}_lock() to lock tasklist in process context
> kernel: add ticket based fair rwlock
> kernel: make tasklist_lock fair for process context call sites
>
> arch/blackfin/kernel/trace.c | 4 +--
> arch/frv/mm/mmu-context.c | 4 +--
> arch/ia64/kernel/mca.c | 13 ++++----
> arch/ia64/kernel/perfmon.c | 8 ++---
> arch/ia64/kernel/ptrace.c | 8 ++---
> arch/metag/kernel/smp.c | 4 +--
> arch/mips/kernel/mips-mt-fpaff.c | 4 +--
> arch/sh/mm/asids-debugfs.c | 4 +--
> arch/um/kernel/reboot.c | 4 +--
> drivers/tty/sysrq.c | 4 +--
> drivers/tty/tty_io.c | 20 +++++------
> fs/exec.c | 14 ++++----
> fs/fcntl.c | 8 ++---
> fs/fs_struct.c | 4 +--
> fs/proc/array.c | 4 +--
> fs/proc/base.c | 4 +--
> include/linux/fair_rwlock.h | 72 ++++++++++++++++++++++++++++++++++++++++
> include/linux/lockdep.h | 15 +++++++++
> include/linux/sched.h | 46 ++++++++++++++++++++++++-
> kernel/cgroup.c | 4 +--
> kernel/cpu.c | 4 +--
> kernel/exit.c | 42 +++++++++++------------
> kernel/fork.c | 14 +++++---
> kernel/lockdep.c | 6 ++--
> kernel/pid_namespace.c | 8 ++---
> kernel/posix-cpu-timers.c | 32 +++++++++---------
> kernel/power/process.c | 16 ++++-----
> kernel/ptrace.c | 20 +++++------
> kernel/sched/core.c | 13 ++++----
> kernel/sched/debug.c | 5 ++-
> kernel/signal.c | 26 +++++++--------
> kernel/sys.c | 22 ++++++------
> kernel/trace/ftrace.c | 5 ++-
> kernel/tracepoint.c | 10 +++---
> mm/kmemleak.c | 4 +--
> mm/memory-failure.c | 8 ++---
> mm/oom_kill.c | 4 +--
> security/keys/keyctl.c | 4 +--
> security/selinux/hooks.c | 4 +--
> 39 files changed, 312 insertions(+), 183 deletions(-)
> create mode 100644 include/linux/fair_rwlock.h
>
> --
> 1.8.1.3
next prev parent reply other threads:[~2013-03-09 18:28 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-03-08 4:37 [RFC PATCH 0/5] tasklist_lock fairness issues Michel Lespinasse
2013-03-08 4:37 ` [RFC PATCH 1/5] kernel: add tasklist_{read,write}_lock{,_any} helper functions Michel Lespinasse
2013-03-08 4:37 ` [RFC PATCH 2/5] kernel: use tasklist_read_lock_any() when locking tasklist in irq context Michel Lespinasse
2013-03-08 4:37 ` [RFC PATCH 3/5] kernel: use tasklist_{read,write}_lock() to lock tasklist in process context Michel Lespinasse
2013-03-08 4:37 ` [RFC PATCH 4/5] kernel: add ticket based fair rwlock Michel Lespinasse
2013-03-08 4:37 ` [RFC PATCH 5/5] kernel: make tasklist_lock fair for process context call sites Michel Lespinasse
2013-03-09 18:26 ` Oleg Nesterov [this message]
2013-03-10 2:37 ` [RFC PATCH 0/5] tasklist_lock fairness issues Michel Lespinasse
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130309182613.GA32343@redhat.com \
--to=oleg@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=dhowells@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=walken@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.