* rwlock_t unfairness and tasklist_lock
@ 2013-01-09  4:03 Michel Lespinasse
  2013-01-09 17:49 ` Oleg Nesterov
  2013-01-11 14:34 ` Thomas Gleixner
  0 siblings, 2 replies; 8+ messages in thread
From: Michel Lespinasse @ 2013-01-09  4:03 UTC (permalink / raw)
  To: David Howells, Thomas Gleixner, Salman Qazi, Oleg Nesterov, LKML

Like others before me, I have discovered how easy it is to DOS a system by
abusing the rwlock_t unfairness and causing the tasklist_lock read side to
be continuously held (my abuse code makes use of the getpriority syscall,
but there are plenty of other ways anyway).

My understanding is that the issue of rwlock_t fairness has come up several
times over the last 10 years (I first saw a fair rwlock_t proposal by David
Howells 10 years ago, https://lkml.org/lkml/2002/11/8/102), and every time
the answer has been that we can't easily change this because tasklist_lock
makes use of the read-side reentrancy and interruptibility properties of
rwlock_t, and that we should really find something smart to do about
tasklist_lock. Yet that last part never gets done, and the problem is still
with us.

I am wondering:

- Does anyone know of any current work towards removing the tasklist_lock
  use of rwlock_t ? Thomas Gleixner mentioned 3 years ago that he'd give it
  a shot (https://lwn.net/Articles/364601/), did he encounter some
  unforeseen difficulty that we should learn from ?

- Would there be any fundamental objection to implementing a fair rwlock_t
  and dealing with the reentrancy issues in tasklist_lock ? My proposal
  there would be along the lines of:

  1- implement a fair rwlock_t - the ticket based idea from David Howells
     seems quite appropriate to me

  2- if any places use reader side reentrancy within the same context,
     adjust the code as needed to get rid of that reentrancy

  3- a simple way to deal with reentrancy between contexts (as in, we take
     the tasklist_lock read side in process context, get interrupted, and we
     now need to take it again in interrupt or softirq context) would be to
     have different locks depending on context. tasklist_lock read side in
     process context would work as usual, but in irq or softirq contexts
     we'd take tasklist_irq_lock instead (and, if there are any irq handlers
     taking the tasklist_lock read side, we'd have to disable interrupt
     handling when tasklist_irq_lock is held to avoid further nesting).
     tasklist_lock write side - that is, mainly fork() and exec() - would
     have to take both tasklist_lock and tasklist_irq_lock, in that order.

While it might seem to be a downside that the tasklist_lock write side would
now have to take both tasklist_lock and tasklist_irq_lock, I must note that
this wouldn't increase the number of atomic operations: the current rwlock_t
implementation uses atomics on both lock and unlock, while the ticket based
one would only need atomics on the lock side (unlock is just a regular mov
instruction), so the total cost should be comparable to what we have now.

Any comments about this proposal ? (I should note that I haven't given much
thought to tasklist_lock before, and I'm not quite sure just from code
inspection which read locks are run in which context...)

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 8+ messages in thread
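For illustration, here is a minimal user-space sketch of the kind of ticket
based fair rwlock referred to above, using C11 atomics as stand-ins for the
kernel's primitives. The rwticket name and layout are made up for this
sketch and are not David Howells' actual proposal; the point is only that
readers and writers draw tickets from one counter, so a steady stream of
readers can no longer starve a waiting writer:

#include <stdatomic.h>

struct rwticket {
	atomic_uint ticket;	/* next ticket number to hand out */
	atomic_uint read;	/* readers admitted so far */
	atomic_uint write;	/* writer "now serving" count */
};

static void rwticket_write_lock(struct rwticket *l)
{
	unsigned int me = atomic_fetch_add(&l->ticket, 1);

	while (atomic_load(&l->write) != me)
		;	/* spin until every earlier ticket holder is done */
}

static void rwticket_write_unlock(struct rwticket *l)
{
	/* allow both the next queued reader and the next queued writer in */
	atomic_fetch_add(&l->read, 1);
	atomic_fetch_add(&l->write, 1);
}

static void rwticket_read_lock(struct rwticket *l)
{
	unsigned int me = atomic_fetch_add(&l->ticket, 1);

	while (atomic_load(&l->read) != me)
		;	/* spin until every earlier writer has unlocked */

	/* readers still share: admit the next queued reader immediately */
	atomic_fetch_add(&l->read, 1);
}

static void rwticket_read_unlock(struct rwticket *l)
{
	/* a waiting writer may proceed once all admitted readers are gone */
	atomic_fetch_add(&l->write, 1);
}

Note that in this naive formulation the reader-side unlock still needs an
atomic increment (several readers may unlock concurrently); only the write
side unlock could be turned into plain stores as Michel describes.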
* Re: rwlock_t unfairness and tasklist_lock
  2013-01-09  4:03 rwlock_t unfairness and tasklist_lock Michel Lespinasse
@ 2013-01-09 17:49 ` Oleg Nesterov
  2013-01-09 23:20   ` Michel Lespinasse
  2013-01-11 14:34 ` Thomas Gleixner
  1 sibling, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2013-01-09 17:49 UTC (permalink / raw)
  To: Michel Lespinasse; +Cc: David Howells, Thomas Gleixner, Salman Qazi, LKML

On 01/08, Michel Lespinasse wrote:
>
> Like others before me, I have discovered how easy it is to DOS a
> system by abusing the rwlock_t unfairness and causing the
> tasklist_lock read side to be continuously held

Yes. Plus it has performance problems.

It should die. We still need the global lock to protect, say,
init_task.tasks list, but otherwise we need the per-process locking.

> - Would there be any fundamental objection to implementing a fair
> rwlock_t and dealing with the reentrancy issues in tasklist_lock ? My
> proposal there would be along the lines of:

I don't really understand your proposal in details, but until we kill
tasklist_lock, perhaps it makes sense to implement something simple, say,
write-biased rwlock and add "int task_struct->tasklist_read_lock_counter"
to avoid the read-write-read deadlock.

Oleg.

^ permalink raw reply	[flat|nested] 8+ messages in thread
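As a rough sketch of what Oleg describes: the helpers and the field
task_struct->tasklist_read_lock_counter below are hypothetical (nothing like
this exists in the tree), and the write-biased lock itself is assumed rather
than shown, with read_lock()/read_unlock() standing in for its read side.
Only the outermost acquisition touches the lock, so a nested read lock never
queues up behind a pending writer, which is what would otherwise turn the
read -> (writer arrives) -> read pattern into a deadlock:

#include <linux/sched.h>

static inline void tasklist_read_lock(void)
{
	if (current->tasklist_read_lock_counter == 0)
		read_lock(&tasklist_lock);	/* outermost acquisition */
	current->tasklist_read_lock_counter++;	/* nested ones only count */
}

static inline void tasklist_read_unlock(void)
{
	if (--current->tasklist_read_lock_counter == 0)
		read_unlock(&tasklist_lock);	/* last unlock releases it */
}

The ordering here - take the lock, then bump the counter - is exactly what
the next message pokes a hole in: an interrupt landing between the two steps
still sees a counter of zero.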
* Re: rwlock_t unfairness and tasklist_lock
  2013-01-09 17:49 ` Oleg Nesterov
@ 2013-01-09 23:20   ` Michel Lespinasse
  2013-01-12 17:31     ` Oleg Nesterov
  0 siblings, 1 reply; 8+ messages in thread
From: Michel Lespinasse @ 2013-01-09 23:20 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: David Howells, Thomas Gleixner, Salman Qazi, LKML

On Wed, Jan 9, 2013 at 9:49 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 01/08, Michel Lespinasse wrote:
>> Like others before me, I have discovered how easy it is to DOS a
>> system by abusing the rwlock_t unfairness and causing the
>> tasklist_lock read side to be continuously held
>
> Yes. Plus it has performance problems.
>
> It should die. We still need the global lock to protect, say,
> init_task.tasks list, but otherwise we need the per-process locking.

To be clear: I'm not trying to defend tasklist_lock here. However,
given how long this has been a known issue, I think we should consider
attacking the problem from the lock fairness perspective first and
stop waiting for an eventual tasklist_lock death.

>> - Would there be any fundamental objection to implementing a fair
>> rwlock_t and dealing with the reentrancy issues in tasklist_lock ? My
>> proposal there would be along the lines of:
>
> I don't really understand your proposal in details, but until we kill
> tasklist_lock, perhaps it makes sense to implement something simple, say,
> write-biased rwlock and add "int task_struct->tasklist_read_lock_counter"
> to avoid the read-write-read deadlock.

Right. But one complexity that has to be dealt with, is how to handle
reentrant uses of the tasklist_lock read side, when such uses come from
a different context (say, the lock was first taken in process context
and the reentrant use is in irq or softirq context).

If in process context we take the tasklist_lock read side, and *then*
increment the tasklist_read_lock_counter, there is still the possibility
of an irq coming in before the counter is incremented.

So to deal with that, I think we have to explicitly detect the
tasklist_lock uses that are in irq/softirq context and deal with these
differently from those in process context - we would have to either
ignore the tasklist_lock write bias when in irq/softirq context, or we
could deal with it by taking a separate lock then (as in my proposal).

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: rwlock_t unfairness and tasklist_lock
  2013-01-09 23:20   ` Michel Lespinasse
@ 2013-01-12 17:31     ` Oleg Nesterov
  2013-01-25  0:33       ` Michel Lespinasse
  0 siblings, 1 reply; 8+ messages in thread
From: Oleg Nesterov @ 2013-01-12 17:31 UTC (permalink / raw)
  To: Michel Lespinasse; +Cc: David Howells, Thomas Gleixner, Salman Qazi, LKML

On 01/09, Michel Lespinasse wrote:
>
> On Wed, Jan 9, 2013 at 9:49 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> > On 01/08, Michel Lespinasse wrote:
> >> Like others before me, I have discovered how easy it is to DOS a
> >> system by abusing the rwlock_t unfairness and causing the
> >> tasklist_lock read side to be continuously held
> >
> > Yes. Plus it has performance problems.
> >
> > It should die. We still need the global lock to protect, say,
> > init_task.tasks list, but otherwise we need the per-process locking.
>
> To be clear: I'm not trying to defend tasklist_lock here.

I understand,

> However,
> given how long this has been a known issue, I think we should consider
> attacking the problem from the lock fairness perspective first and
> stop waiting for an eventual tasklist_lock death.

And probably you are right,

> >> - Would there be any fundamental objection to implementing a fair
> >> rwlock_t and dealing with the reentrancy issues in tasklist_lock ? My
> >> proposal there would be along the lines of:
> >
> > I don't really understand your proposal in details, but until we kill
> > tasklist_lock, perhaps it makes sense to implement something simple, say,
> > write-biased rwlock and add "int task_struct->tasklist_read_lock_counter"
> > to avoid the read-write-read deadlock.
>
> Right. But one complexity that has to be dealt with, is how to handle
> reentrant uses of the tasklist_lock read side,
> ...
>
> there is still the possibility
> of an irq coming in before the counter is incremented.

Sure, I didn't try to say that it is trivial to implement
read_lock_tasklist(), we should prevent this race.

> So to deal with that, I think we have to explicitly detect the
> tasklist_lock uses that are in irq/softirq context and deal with these
> differently from those in process context

I disagree. In the long term, I think that tasklist (or whatever we use
instead) should never be used in irq/atomic context. And probably the
per-process lock should be rw_semaphore (although it is not recursive).

But until then, if we try to improve things somehow, we should not
complicate the code, we need something simple.

But actually I am not sure, you can be right.

Oleg.

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: rwlock_t unfairness and tasklist_lock
  2013-01-12 17:31     ` Oleg Nesterov
@ 2013-01-25  0:33       ` Michel Lespinasse
  0 siblings, 0 replies; 8+ messages in thread
From: Michel Lespinasse @ 2013-01-25  0:33 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: David Howells, Thomas Gleixner, Salman Qazi, LKML

On Sat, Jan 12, 2013 at 9:31 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 01/09, Michel Lespinasse wrote:
>> >> - Would there be any fundamental objection to implementing a fair
>> >> rwlock_t and dealing with the reentrancy issues in tasklist_lock ? My
>> >> proposal there would be along the lines of:
>> >
>> > I don't really understand your proposal in details, but until we kill
>> > tasklist_lock, perhaps it makes sense to implement something simple, say,
>> > write-biased rwlock and add "int task_struct->tasklist_read_lock_counter"
>> > to avoid the read-write-read deadlock.
>>
>> Right. But one complexity that has to be dealt with, is how to handle
>> reentrant uses of the tasklist_lock read side,
>> ...
>>
>> there is still the possibility
>> of an irq coming in before the counter is incremented.
>
> Sure, I didn't try to say that it is trivial to implement
> read_lock_tasklist(), we should prevent this race.
>
>> So to deal with that, I think we have to explicitly detect the
>> tasklist_lock uses that are in irq/softirq context and deal with these
>> differently from those in process context
>
> I disagree. In the long term, I think that tasklist (or whatever we use
> instead) should never be used in irq/atomic context. And probably the
> per-process lock should be rw_semaphore (although it is not recursive).

All right. So I went through all tasklist_lock call sites, and converted
them to use helper functions.

The first 4 are for the sites I know are only used in process context:
tasklist_write_lock() / tasklist_write_unlock() /
tasklist_read_lock() / tasklist_read_unlock()

The remaining ones are for the sites that can be called from irq/softirq
context as well:
tasklist_read_lock_any() / tasklist_read_trylock_any() /
tasklist_read_unlock_any()

I'm not sure if upstream would take these ? IMO, it helps to identify the
irq context call sites. If I didn't get it wrong, they are:

- send_sigio(): may be called in any context from kill_fasync()
  (examples: process context from pipe_read(), softirq context from
  n_tty_receive_buf(), hardirq context from rtc_interrupt())

- send_sigurg(): may be called from process or softirq context (network
  receive is typically processed in softirq, but packets can be queued up
  while sockets are being locked and the backlog processed from process
  context as the socket gets unlocked)

- kill_pgrp(): called in process context from job_control(), pty_resize()
  or in softirq context through n_tty_receive_buf() or even in hardirq
  context from arch/um/drivers/line.c winch_interrupt()

- posix_cpu_timer_schedule(): called through cpu_timer_fire(), in process
  context (from posix_cpu_timer_set()) or in hardirq context (from
  run_posix_cpu_timers())

- sysrq debugging features: handle_sysrq() runs in multiple contexts and
  calls into send_sig_all(), debug_show_all_locks(), normalize_rt_tasks(),
  print_rq().

- arch/blackfin/kernel/trace.c decode_address(): another debugging
  feature, may be called from any context.

I suppose we could do away with the sysrq stuff (or run it in a work queue
or whatever). This leaves us with signal delivery and posix cpu timers.
To be honest, I looked at signal delivery and couldn't tell why it needs to
take the tasklist_lock, but I also couldn't remove it in any way that I
would feel safe about :)

> But until then, if we try to improve things somehow, we should not
> complicate the code, we need something simple.
>
> But actually I am not sure, you can be right.

I actually also have an implementation that makes tasklist_lock fair for
process context call sites. It's easy enough if you can use a split lock
for the process vs irq context uses. But I agree it'd be much nicer / more
upstreamable if we could just get rid of the irq context uses first, at
which point there is no remaining obstacle to replacing rwlock_t with a
fair lock.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

^ permalink raw reply	[flat|nested] 8+ messages in thread
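For reference, here is a sketch of what the helper split described in the
message above might look like as a pure annotation layer over today's
rwlock_t. The wrapper names follow the message, but the actual patch is not
shown in the thread, so the bodies below are guesses. The value is in the
naming: the _any() variants mark exactly the call sites that may run in
irq/softirq context, which are the ones standing in the way of a fair (or
split) lock:

#include <linux/sched.h>

/* Process-context-only call sites. */
static inline void tasklist_write_lock(void)
{
	write_lock_irq(&tasklist_lock);
}
static inline void tasklist_write_unlock(void)
{
	write_unlock_irq(&tasklist_lock);
}
static inline void tasklist_read_lock(void)
{
	read_lock(&tasklist_lock);
}
static inline void tasklist_read_unlock(void)
{
	read_unlock(&tasklist_lock);
}

/* Call sites that may also run in softirq or hardirq context. */
static inline void tasklist_read_lock_any(void)
{
	read_lock(&tasklist_lock);
}
static inline int tasklist_read_trylock_any(void)
{
	return read_trylock(&tasklist_lock);
}
static inline void tasklist_read_unlock_any(void)
{
	read_unlock(&tasklist_lock);
}

With such annotations in place, a later patch could give the process-context
wrappers a fair lock, or route the _any() variants to something like the
tasklist_irq_lock proposed at the start of the thread, without touching
every call site again.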
* Re: rwlock_t unfairness and tasklist_lock
  2013-01-09  4:03 rwlock_t unfairness and tasklist_lock Michel Lespinasse
  2013-01-09 17:49 ` Oleg Nesterov
@ 2013-01-11 14:34 ` Thomas Gleixner
  2013-01-12  3:33   ` [PATCH] " Michel Lespinasse
  1 sibling, 1 reply; 8+ messages in thread
From: Thomas Gleixner @ 2013-01-11 14:34 UTC (permalink / raw)
  To: Michel Lespinasse; +Cc: David Howells, Salman Qazi, Oleg Nesterov, LKML

On Tue, 8 Jan 2013, Michel Lespinasse wrote:
> - Does anyone know of any current work towards removing the
> tasklist_lock use of rwlock_t ? Thomas Gleixner mentioned 3 years ago
> that he'd give it a shot (https://lwn.net/Articles/364601/), did he
> encounter some unforeseen difficulty that we should learn from ?

I converted quite a bunch of the read side instances to rcu
protection, but got distracted. There was no fundamental difficulty,
just lack of time.

> - Would there be any fundamental objection to implementing a fair
> rwlock_t and dealing with the reentrancy issues in tasklist_lock ? My
> proposal there would be along the lines of:
>
> 1- implement a fair rwlock_t - the ticket based idea from David
> Howells seems quite appropriate to me

Nah. Let's get it killed. Most of the stuff can be converted to RCU and
the remaining bits and pieces are the write lock sides which then can
be converted to a big standard spinlock. There might be a few more
complex ones, but Oleg said back then that those should be solved by
locking the process instead of locking the whole tasklist.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 8+ messages in thread
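To make the kind of conversion Thomas describes concrete, here is a hedged
before/after sketch for a generic read-side user. The functions are invented
for illustration and are not any of the call sites he actually converted:

#include <linux/sched.h>
#include <linux/rcupdate.h>

/* Before: walk the task list under the tasklist_lock read side. */
static int count_tasks_locked(void)
{
	struct task_struct *p;
	int n = 0;

	read_lock(&tasklist_lock);
	for_each_process(p)
		n++;
	read_unlock(&tasklist_lock);

	return n;
}

/*
 * After: the task list is already maintained with RCU-aware list
 * primitives, so a reader that only inspects tasks - and can tolerate
 * racing with fork()/exit() - can walk it under rcu_read_lock() instead.
 * Whether that is sufficient depends on what the caller does with each
 * task, which is why every call site needs individual review.
 */
static int count_tasks_rcu(void)
{
	struct task_struct *p;
	int n = 0;

	rcu_read_lock();
	for_each_process(p)
		n++;
	rcu_read_unlock();

	return n;
}

Once the readers are gone, the write side (fork, exit, exec rearranging the
thread links) can be serialized by a plain spinlock, as Thomas suggests,
since readers no longer need to exclude it globally.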
* Re: [PATCH] rwlock_t unfairness and tasklist_lock
  2013-01-11 14:34 ` Thomas Gleixner
@ 2013-01-12  3:33   ` Michel Lespinasse
  2013-01-12 17:46     ` Oleg Nesterov
  0 siblings, 1 reply; 8+ messages in thread
From: Michel Lespinasse @ 2013-01-12  3:33 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: David Howells, Salman Qazi, Oleg Nesterov, LKML

On Fri, Jan 11, 2013 at 03:34:41PM +0100, Thomas Gleixner wrote:
> On Tue, 8 Jan 2013, Michel Lespinasse wrote:
> > - Does anyone know of any current work towards removing the
> > tasklist_lock use of rwlock_t ? Thomas Gleixner mentioned 3 years ago
> > that he'd give it a shot (https://lwn.net/Articles/364601/), did he
> > encounter some unforeseen difficulty that we should learn from ?
>
> I converted quite a bunch of the read side instances to rcu
> protection, but got distracted. There was no fundamental difficulty,
> just lack of time.

All right. Thanks for explaining here and offline; it looks like the
problem is not as intractable as I had thought initially.

> > - Would there be any fundamental objection to implementing a fair
> > rwlock_t and dealing with the reentrancy issues in tasklist_lock ? My
> > proposal there would be along the lines of:
> >
> > 1- implement a fair rwlock_t - the ticket based idea from David
> > Howells seems quite appropriate to me
>
> Nah. Let's get it killed. Most of the stuff can be converted to RCU and
> the remaining bits and pieces are the write lock sides which then can
> be converted to a big standard spinlock. There might be a few more
> complex ones, but Oleg said back then that those should be solved by
> locking the process instead of locking the whole tasklist.

So I looked again at getpriority() since that's what I had used for my
DOS test code, and it looks like everything there is already protected
by RCU or smaller granularity locks and refcounts. Patch attached to
remove this tasklist_lock usage.

Since I'm new to this, I would like someone to double check me. Also,
what is the proper tree to send such patches to so they'll get some
testing before making it into Linus's tree ?

--------------------------------8<-----------------------------

remove use of tasklist_lock in getpriority / setpriority syscalls

I can't see anything in these syscalls that isn't already protected by
RCU (for the task/thread iterations and for mapping pids to tasks) or
by smaller granularity locks (for set_one_prio()) or refcounts (for
find_user()). So, it looks like we can just remove the use of
tasklist_lock...

Signed-off-by: Michel Lespinasse <walken@google.com>
---
 kernel/sys.c | 4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 265b37690421..5df66d4b118f 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -189,7 +189,6 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
 		niceval = 19;
 
 	rcu_read_lock();
-	read_lock(&tasklist_lock);
 	switch (which) {
 	case PRIO_PROCESS:
 		if (who)
@@ -226,7 +225,6 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
 		break;
 	}
 out_unlock:
-	read_unlock(&tasklist_lock);
 	rcu_read_unlock();
 out:
 	return error;
@@ -251,7 +249,6 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 		return -EINVAL;
 
 	rcu_read_lock();
-	read_lock(&tasklist_lock);
 	switch (which) {
 	case PRIO_PROCESS:
 		if (who)
@@ -296,7 +293,6 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
 		break;
 	}
 out_unlock:
-	read_unlock(&tasklist_lock);
 	rcu_read_unlock();
 
 	return retval;

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] rwlock_t unfairness and tasklist_lock
  2013-01-12  3:33   ` [PATCH] " Michel Lespinasse
@ 2013-01-12 17:46     ` Oleg Nesterov
  0 siblings, 0 replies; 8+ messages in thread
From: Oleg Nesterov @ 2013-01-12 17:46 UTC (permalink / raw)
  To: Michel Lespinasse; +Cc: Thomas Gleixner, David Howells, Salman Qazi, LKML

On 01/11, Michel Lespinasse wrote:
>
> So I looked again at getpriority() since that's what I had used for my
> DOS test code, and it looks like everything there is already protected
> by RCU or smaller granularity locks and refcounts. Patch attached to
> remove this tasklist_lock usage.

And probably the change in getpriority() is fine, but ...

> @@ -189,7 +189,6 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
>  		niceval = 19;
>
>  	rcu_read_lock();
> -	read_lock(&tasklist_lock);
>  	switch (which) {
>  	case PRIO_PROCESS:
>  		if (who)
> @@ -226,7 +225,6 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
>  		break;
>  	}
>  out_unlock:
> -	read_unlock(&tasklist_lock);

you also changed setpriority(), this should be documented at least ;)

OK. Even without this change, say, sys_setpriority(PRIO_PGRP) can
obviously race with fork(), so this change probably is not bad.

Oleg.

^ permalink raw reply	[flat|nested] 8+ messages in thread
end of thread, other threads:[~2013-01-25  0:33 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-09  4:03 rwlock_t unfairness and tasklist_lock Michel Lespinasse
2013-01-09 17:49 ` Oleg Nesterov
2013-01-09 23:20   ` Michel Lespinasse
2013-01-12 17:31     ` Oleg Nesterov
2013-01-25  0:33       ` Michel Lespinasse
2013-01-11 14:34 ` Thomas Gleixner
2013-01-12  3:33   ` [PATCH] " Michel Lespinasse
2013-01-12 17:46     ` Oleg Nesterov