From: Thomas Gleixner <tglx@linutronix.de>
To: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>,
LKML <linux-kernel@vger.kernel.org>,
linux-fsdevel@vger.kernel.org, Tejun Heo <tj@kernel.org>,
Jens Axboe <jaxboe@fusionio.com>,
mgalbraith@suse.com
Subject: Re: Deadlocks due to per-process plugging
Date: Fri, 13 Jul 2012 16:25:05 +0200 (CEST)
Message-ID: <alpine.LFD.2.02.1207131444490.32033@ionos>
In-Reply-To: <20120713123318.GB20361@quack.suse.cz>
On Fri, 13 Jul 2012, Jan Kara wrote:
> On Thu 12-07-12 16:15:29, Thomas Gleixner wrote:
> > > Ah, I didn't know this. Thanks for the hint. So in the kdump I have I can
> > > see requests queued in tsk->plug despite the process sleeping in
> > > TASK_UNINTERRUPTIBLE state. So the only way the unplug could have been
> > > omitted is if tsk_is_pi_blocked() was true. Rummaging through the dump...
> > > indeed the task has pi_blocked_on = 0xffff8802717d79c8. The dump is from an -rt
> > > kernel (I just didn't originally think that made any difference), so
> > > actually any mutex is an rtmutex and thus tsk_is_pi_blocked() is true whenever
> > > we are sleeping on a mutex. So this seems like a bug in rtmutex code.
> >
> > Well, the reason why this check is there is that the task which is
> > blocked on a lock can hold another lock which might cause a deadlock
> > in the flush path.
> OK. Let me understand the details. The block layer needs just queue_lock for
> the unplug to succeed. That is a spinlock, but in the RT kernel even a process
> holding a spinlock can be preempted, if I remember correctly. So that
> condition is effectively there to avoid unplugging when a task is being
> scheduled away while holding queue_lock? Did I get it right?

blk_flush_plug_list() takes more than just queue_lock. Other locks can be
taken in the callbacks, the elevator, ...

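A user-space sketch of the deadlock described above: if schedule() flushed the plug of a task that already holds a lock the flush path needs, the flush would try to take that lock a second time in the context of the sleeping task. This is a hypothetical model, not kernel code. A POSIX error-checking mutex named `flush_path_lock` (an invented name) stands in for whatever lock queue_lock, a callback or the elevator would take, so the self-deadlock surfaces as `EDEADLK` instead of a hang. The real -rt locks are rt_mutex based and behave differently in detail.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical stand-in for a lock the flush path needs
 * (queue_lock, or a lock taken in a callback or the elevator).
 */
static pthread_mutex_t flush_path_lock;

static void init_flush_path_lock(void)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	/* An owner relocking an ERRORCHECK mutex gets EDEADLK
	 * instead of hanging, which makes the deadlock observable. */
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&flush_path_lock, &attr);
	pthread_mutexattr_destroy(&attr);
}

/* Model of flushing the plug: it has to take flush_path_lock. */
static int flush_plug_model(void)
{
	int err = pthread_mutex_lock(&flush_path_lock);

	if (err)
		return err;	/* a non-checking lock would hang here */
	/* ... hand the plugged requests over for dispatch ... */
	pthread_mutex_unlock(&flush_path_lock);
	return 0;
}

/*
 * Model of a task that holds flush_path_lock and then blocks on some
 * other lock: flushing its plug in its own context would try to take
 * flush_path_lock again.
 */
static int flush_while_holding_lock(void)
{
	int ret;

	pthread_mutex_lock(&flush_path_lock);
	ret = flush_plug_model();
	pthread_mutex_unlock(&flush_path_lock);
	return ret;
}
```

After `init_flush_path_lock()`, a plain `flush_plug_model()` returns 0, while `flush_while_holding_lock()` returns `EDEADLK`: the flush is only safe when the sleeping task does not hold a lock the flush path needs.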
> > > Thomas, you seemed to have added that condition... Any idea how to avoid
> > > the deadlock?
> >
> > Good question. We could do the flush when the blocked task does not
> > hold a lock itself. Might be worth a try.
> Yeah, that should work for avoiding the deadlock as well.

Though we don't have a held-lock count unless lockdep is enabled, which
you probably don't want on a production system.

But we only care about a task being scheduled out while it is blocked
on a "sleeping spinlock", i.e. a spinlock or rwlock.

So the patch below should allow the unplug to take place when a task is
blocked on mutexes etc.

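The decision the patch changes can be modelled in user space. This is a hedged sketch, not the kernel code: `struct task_model`, `would_flush_plug()` and the `locks_held` counter are invented for illustration, and `would_flush_plug()` only loosely follows the behaviour described in the thread (a task going to sleep with requests in its plug gets them flushed unless tsk_is_pi_blocked() says otherwise). It compares the old predicate, the patched one keyed on the waiter's `savestate` flag, and the held-lock-count alternative discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal model of the -rt structures; only the fields used here. */
struct rt_mutex_waiter {
	bool savestate;	/* set when waiting on a sleeping spinlock/rwlock */
};

struct task_model {
	bool sleeping;				/* state != TASK_RUNNING */
	struct rt_mutex_waiter *pi_blocked_on;	/* NULL when not blocked */
	unsigned int locks_held;		/* hypothetical held-lock count */
	unsigned int plugged;			/* requests sitting in tsk->plug */
};

typedef bool (*pi_blocked_fn)(const struct task_model *);

/* Before the patch: blocked on any rtmutex (on -rt, also any mutex). */
static bool pi_blocked_old(const struct task_model *tsk)
{
	return tsk->pi_blocked_on != NULL;
}

/* The patch: only blocking on a "sleeping spinlock" counts. */
static bool pi_blocked_patched(const struct task_model *tsk)
{
	return tsk->pi_blocked_on != NULL && tsk->pi_blocked_on->savestate;
}

/* Alternative from the thread: blocked while holding some lock.
 * Needs locks_held, which the mail notes only lockdep provides. */
static bool pi_blocked_holding_lock(const struct task_model *tsk)
{
	return tsk->pi_blocked_on != NULL && tsk->locks_held > 0;
}

/* Would the schedule()-time check submit the plugged I/O? */
static bool would_flush_plug(const struct task_model *tsk,
			     pi_blocked_fn pi_blocked)
{
	if (!tsk->sleeping || pi_blocked(tsk))
		return false;
	return tsk->plugged > 0;
}
```

With `pi_blocked_old`, a task asleep on a mutex with requests in its plug (the situation in the kdump) never has them flushed; with `pi_blocked_patched` it does, while a task blocked on a sleeping spinlock still skips the flush. The held-lock-count variant gives a similar answer here but would need a counter updated on every lock and unlock.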
Thanks,
tglx
----
Index: linux-stable-rt/include/linux/sched.h
===================================================================
--- linux-stable-rt.orig/include/linux/sched.h
+++ linux-stable-rt/include/linux/sched.h
@@ -2145,9 +2145,10 @@ extern unsigned int sysctl_sched_cfs_ban
 extern int rt_mutex_getprio(struct task_struct *p);
 extern void rt_mutex_setprio(struct task_struct *p, int prio);
 extern void rt_mutex_adjust_pi(struct task_struct *p);
+extern bool pi_blocked_on_rt_lock(struct task_struct *tsk);
 static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
 {
-	return tsk->pi_blocked_on != NULL;
+	return tsk->pi_blocked_on != NULL && pi_blocked_on_rt_lock(tsk);
 }
 #else
 static inline int rt_mutex_getprio(struct task_struct *p)
Index: linux-stable-rt/kernel/rtmutex.c
===================================================================
--- linux-stable-rt.orig/kernel/rtmutex.c
+++ linux-stable-rt/kernel/rtmutex.c
@@ -699,6 +699,11 @@ static int adaptive_wait(struct rt_mutex
 # define pi_lock(lock) raw_spin_lock_irq(lock)
 # define pi_unlock(lock) raw_spin_unlock_irq(lock)
 
+bool pi_blocked_on_rt_lock(struct task_struct *tsk)
+{
+	return tsk->pi_blocked_on && tsk->pi_blocked_on->savestate;
+}
+
 /*
  * Slow path lock function spin_lock style: this variant is very
  * careful not to miss any non-lock wakeups.