* [PATCH RT] fs: jbd2: pull your plug when waiting for space @ 2014-02-21 12:32 Sebastian Andrzej Siewior 2014-02-21 13:54 ` Mike Galbraith 2014-03-10 17:26 ` Steven Rostedt 0 siblings, 2 replies; 6+ messages in thread From: Sebastian Andrzej Siewior @ 2014-02-21 12:32 UTC (permalink / raw) To: Mike Galbraith, Steven Rostedt Cc: linux-kernel@vger.kernel.org, linux-rt-users, tglx Two cps in parallel managed to stall the the ext4 fs. It seems that journal code is either waiting for locks or sleeping waiting for something to happen. This seems similar to what Mike observed on ext3, here is his description: |With an -rt kernel, and a heavy sync IO load, tasks can jam |up on journal locks without unplugging, which can lead to |terminal IO starvation. Unplug and schedule when waiting |for space. This is on v3.2-RT. This cp testcase triggers about once in four runs. It did not trigger once in 20 runs on v3.12-RT. This brings me to the question: could it been fixed in the meantime and we not need the jbd patches in latest -RT is there a better testcase? Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- fs/jbd2/checkpoint.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c index 16a698b..c6443c3 100644 --- a/fs/jbd2/checkpoint.c +++ b/fs/jbd2/checkpoint.c @@ -129,6 +129,8 @@ void __jbd2_log_wait_for_space(journal_t *journal) if (journal->j_flags & JBD2_ABORT) return; write_unlock(&journal->j_state_lock); + if (current->plug) + io_schedule(); mutex_lock(&journal->j_checkpoint_mutex); /* -- 1.9.0.rc3 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH RT] fs: jbd2: pull your plug when waiting for space 2014-02-21 12:32 [PATCH RT] fs: jbd2: pull your plug when waiting for space Sebastian Andrzej Siewior @ 2014-02-21 13:54 ` Mike Galbraith 2014-03-10 17:47 ` Theodore Ts'o 2014-03-10 17:26 ` Steven Rostedt 1 sibling, 1 reply; 6+ messages in thread From: Mike Galbraith @ 2014-02-21 13:54 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Steven Rostedt, linux-kernel@vger.kernel.org, linux-rt-users, tglx On Fri, 2014-02-21 at 13:32 +0100, Sebastian Andrzej Siewior wrote: > Two cps in parallel managed to stall the the ext4 fs. It seems that > journal code is either waiting for locks or sleeping waiting for > something to happen. This seems similar to what Mike observed on ext3, > here is his description: > > |With an -rt kernel, and a heavy sync IO load, tasks can jam > |up on journal locks without unplugging, which can lead to > |terminal IO starvation. Unplug and schedule when waiting > |for space. > > This is on v3.2-RT. This cp testcase triggers about once in four runs. > It did not trigger once in 20 runs on v3.12-RT. In 3.0-rt, it could take ages to hit an IO deadlock. > This brings me to the question: could it been fixed in the meantime and > we not need the jbd patches in latest -RT is there a better testcase? Dunno, suse QA does a simple but heavy dbench async then sync stress test, which would eventually lead to IO deadlock in 3.0-rt. I dumped the pull your plug for jbd only patch in favor of the (stunningly beautiful) patch below, because XFS and others eventually deadlocked with crossed IO [ABBAXYZ] dependencies as well. I haven't had time to do massive IO pounding in 3.12-rt yet, but the below got 3.0-rt over the IO hurdle, along with the one below that for btrfs, which lasted for about, oh, 2us without it. Subject: rt: pull your plug before blocking Queued IO can lead to IO deadlock should a task require wakeup from as task which is blocked on that queued IO. ext3: dbench1 queues a buffer, blocks on journal mutex, it's plug is not pulled. dbench2 mutex owner is waiting for kjournald, who is waiting for the buffer queued by dbench1. Game over. Signed-off-by: Mike Galbraith <efault@gmx.de> --- kernel/rtmutex.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) --- a/kernel/rtmutex.c +++ b/kernel/rtmutex.c @@ -22,6 +22,7 @@ #include <linux/sched/rt.h> #include <linux/timer.h> #include <linux/ww_mutex.h> +#include <linux/blkdev.h> #include "rtmutex_common.h" @@ -674,8 +675,18 @@ static inline void rt_spin_lock_fastlock if (likely(rt_mutex_cmpxchg(lock, NULL, current))) rt_mutex_deadlock_account_lock(lock, current); - else + else { + /* + * We can't pull the plug if we're already holding a lock + * else we can deadlock. eg, if we're holding slab_lock, + * ksoftirqd can block while processing BLOCK_SOFTIRQ after + * having acquired q->queue_lock. If _we_ then block on + * that q->queue_lock while flushing our plug, deadlock. + */ + if (__migrate_disabled(current) < 2 && blk_needs_flush_plug(current)) + blk_schedule_flush_plug(current); slowfn(lock); + } } static inline void rt_spin_lock_fastunlock(struct rt_mutex *lock, @@ -1275,8 +1286,11 @@ rt_mutex_fastlock(struct rt_mutex *lock, if (!detect_deadlock && likely(rt_mutex_cmpxchg(lock, NULL, current))) { rt_mutex_deadlock_account_lock(lock, current); return 0; - } else + } else { + if (blk_needs_flush_plug(current)) + blk_schedule_flush_plug(current); return slowfn(lock, state, NULL, detect_deadlock, ww_ctx); + } } static inline int Subject: rt,fs,btrfs: fix rt deadlock on extent_buffer->lock Trivially repeatable deadlock is cured by enabling lockdep code in btrfs_clear_path_blocking() as suggested by Chris Mason. He also suggested restricting blocking reader count to one, and not allowing a spinning reader while blocking reader exists. This has proven to be unnecessary, the strict lock order enforcement is enough.. or rather that's my box's opinion after long hours of hard pounding. Note: extent-tree.c bit is additional recommendation from Chris Mason, split into a separate patch after discussion. Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Chris Mason <chris.mason@fusionio.com> --- fs/btrfs/ctree.c | 4 ++-- fs/btrfs/extent-tree.c | 8 -------- 2 files changed, 2 insertions(+), 10 deletions(-) --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -81,7 +81,7 @@ noinline void btrfs_clear_path_blocking( { int i; -#ifdef CONFIG_DEBUG_LOCK_ALLOC +#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE)) /* lockdep really cares that we take all of these spinlocks * in the right order. If any of the locks in the path are not * currently blocking, it is going to complain. So, make really @@ -108,7 +108,7 @@ noinline void btrfs_clear_path_blocking( } } -#ifdef CONFIG_DEBUG_LOCK_ALLOC +#if (defined(CONFIG_DEBUG_LOCK_ALLOC) || defined(CONFIG_PREEMPT_RT_BASE)) if (held) btrfs_clear_lock_blocking_rw(held, held_rw); #endif --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -6899,14 +6899,6 @@ use_block_rsv(struct btrfs_trans_handle goto again; } - if (btrfs_test_opt(root, ENOSPC_DEBUG)) { - static DEFINE_RATELIMIT_STATE(_rs, - DEFAULT_RATELIMIT_INTERVAL * 10, - /*DEFAULT_RATELIMIT_BURST*/ 1); - if (__ratelimit(&_rs)) - WARN(1, KERN_DEBUG - "btrfs: block rsv returned %d\n", ret); - } try_reserve: ret = reserve_metadata_bytes(root, block_rsv, blocksize, BTRFS_RESERVE_NO_FLUSH); ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH RT] fs: jbd2: pull your plug when waiting for space 2014-02-21 13:54 ` Mike Galbraith @ 2014-03-10 17:47 ` Theodore Ts'o 2014-03-11 4:13 ` Mike Galbraith 0 siblings, 1 reply; 6+ messages in thread From: Theodore Ts'o @ 2014-03-10 17:47 UTC (permalink / raw) To: Mike Galbraith Cc: Sebastian Andrzej Siewior, Steven Rostedt, linux-kernel@vger.kernel.org, linux-rt-users, tglx On Fri, Feb 21, 2014 at 02:54:12PM +0100, Mike Galbraith wrote: > > ext3: dbench1 queues a buffer, blocks on journal mutex, it's plug is not > pulled. dbench2 mutex owner is waiting for kjournald, who is waiting for > the buffer queued by dbench1. Game over. Where is in ext3/4 are we calling some function which could end up blocking on kjournald while we have the I/O queue plugged? That sounds suspicious and potentially wrong. - Ted ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH RT] fs: jbd2: pull your plug when waiting for space 2014-03-10 17:47 ` Theodore Ts'o @ 2014-03-11 4:13 ` Mike Galbraith 0 siblings, 0 replies; 6+ messages in thread From: Mike Galbraith @ 2014-03-11 4:13 UTC (permalink / raw) To: Theodore Ts'o Cc: Sebastian Andrzej Siewior, Steven Rostedt, linux-kernel@vger.kernel.org, linux-rt-users, tglx On Mon, 2014-03-10 at 13:47 -0400, Theodore Ts'o wrote: > On Fri, Feb 21, 2014 at 02:54:12PM +0100, Mike Galbraith wrote: > > > > ext3: dbench1 queues a buffer, blocks on journal mutex, it's plug is not > > pulled. dbench2 mutex owner is waiting for kjournald, who is waiting for > > the buffer queued by dbench1. Game over. > > Where is in ext3/4 are we calling some function which could end up > blocking on kjournald while we have the I/O queue plugged? That > sounds suspicious and potentially wrong. I don't have the crash dumps and analysis handy, this was quite some time ago. Problem is that.. static inline void sched_submit_work(struct task_struct *tsk) { if (!tsk->state || tsk_is_pi_blocked(tsk)) return; /* * If we are going to sleep and we have plugged IO queued, * make sure to submit it to avoid deadlocks. */ if (blk_needs_flush_plug(tsk)) blk_schedule_flush_plug(tsk); } ..tsk_is_pi_blocked(tsk) leaves us with IO queued, dependency on which can (_did_ for ext[34] and xfs that I recall) end up with our waker waiting on our IO. There were other deadlock scenarios, not only the one in the quoted text. -Mike ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH RT] fs: jbd2: pull your plug when waiting for space 2014-02-21 12:32 [PATCH RT] fs: jbd2: pull your plug when waiting for space Sebastian Andrzej Siewior 2014-02-21 13:54 ` Mike Galbraith @ 2014-03-10 17:26 ` Steven Rostedt 2014-03-10 17:41 ` Sebastian Andrzej Siewior 1 sibling, 1 reply; 6+ messages in thread From: Steven Rostedt @ 2014-03-10 17:26 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Mike Galbraith, linux-kernel@vger.kernel.org, linux-rt-users, tglx On Fri, 21 Feb 2014 13:32:53 +0100 Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote: > Two cps in parallel managed to stall the the ext4 fs. It seems that > journal code is either waiting for locks or sleeping waiting for > something to happen. This seems similar to what Mike observed on ext3, > here is his description: > > |With an -rt kernel, and a heavy sync IO load, tasks can jam > |up on journal locks without unplugging, which can lead to > |terminal IO starvation. Unplug and schedule when waiting > |for space. > > This is on v3.2-RT. This cp testcase triggers about once in four runs. > It did not trigger once in 20 runs on v3.12-RT. > This brings me to the question: could it been fixed in the meantime and > we not need the jbd patches in latest -RT is there a better testcase? I'm a little confused. Is this only needed for 3.12-rt? I see that you did not mark it for stable. -- Steve > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > --- > fs/jbd2/checkpoint.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c > index 16a698b..c6443c3 100644 > --- a/fs/jbd2/checkpoint.c > +++ b/fs/jbd2/checkpoint.c > @@ -129,6 +129,8 @@ void __jbd2_log_wait_for_space(journal_t *journal) > if (journal->j_flags & JBD2_ABORT) > return; > write_unlock(&journal->j_state_lock); > + if (current->plug) > + io_schedule(); > mutex_lock(&journal->j_checkpoint_mutex); > > /* ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH RT] fs: jbd2: pull your plug when waiting for space 2014-03-10 17:26 ` Steven Rostedt @ 2014-03-10 17:41 ` Sebastian Andrzej Siewior 0 siblings, 0 replies; 6+ messages in thread From: Sebastian Andrzej Siewior @ 2014-03-10 17:41 UTC (permalink / raw) To: Steven Rostedt Cc: Mike Galbraith, linux-kernel@vger.kernel.org, linux-rt-users, tglx On 03/10/2014 06:26 PM, Steven Rostedt wrote: > I'm a little confused. Is this only needed for 3.12-rt? I see that you > did not mark it for stable. it supposed to go stable. Sorry about the missing tag. > -- Steve Sebastian ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2014-03-11 4:13 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-02-21 12:32 [PATCH RT] fs: jbd2: pull your plug when waiting for space Sebastian Andrzej Siewior 2014-02-21 13:54 ` Mike Galbraith 2014-03-10 17:47 ` Theodore Ts'o 2014-03-11 4:13 ` Mike Galbraith 2014-03-10 17:26 ` Steven Rostedt 2014-03-10 17:41 ` Sebastian Andrzej Siewior
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).