From mboxrd@z Thu Jan 1 00:00:00 1970
From: Artem Bityutskiy
Subject: Re: [PATCHv5-1 10/15] writeback: move bdi threads exiting logic to the forker thread
Date: Sun, 25 Jul 2010 12:26:49 +0300
Message-ID: <1280050009.9990.6.camel@localhost>
References: <1280046581-23623-1-git-send-email-dedekind1@gmail.com>
	<1280046581-23623-11-git-send-email-dedekind1@gmail.com>
Reply-To: dedekind1@gmail.com
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
To: Jens Axboe
Return-path:
Received: from smtp.nokia.com ([192.100.122.233]:35235 "EHLO mgw-mx06.nokia.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752100Ab0GYJeQ
	(ORCPT ); Sun, 25 Jul 2010 05:34:16 -0400
In-Reply-To: <1280046581-23623-11-git-send-email-dedekind1@gmail.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

Sorry, this patch has a "space before tab" problem at one line, spotted
by checkpatch.pl. So, FWIW, I am re-sending it. It is the same patch, but
with the checkpatch.pl warning fixed.

From: Artem Bityutskiy
Subject: [PATCHv5 10/15] writeback: move bdi threads exiting logic to the forker thread

Currently, bdi threads can decide to exit if there were no useful activities
for 5 minutes. However, this causes nasty races: we can easily oops in
'bdi_queue_work()' if the bdi thread decides to exit while we are waking it
up. And even if we do not oops, but the bdi thread exits immediately after we
wake it up, we'd lose the wake-up event and have an unnecessary delay (up to
5 secs) in the bdi work processing.

This patch makes the forker thread the central place which not only creates
bdi threads, but also kills them if they have been inactive long enough.
This is better design-wise.

Another reason for this change is to prepare for further changes which will
prevent the bdi threads from waking up every 5 sec and wasting power.
Indeed, when the task does not wake up periodically anymore, it won't be able
to exit either.

This patch also moves the 'wake_up_bit()' call from the bdi thread to the
forker thread. So now the forker thread sets the BDI_pending bit, then forks
the task or kills it, then clears the bit and wakes up the waiting process.

The only process which may wait on the bit is 'bdi_wb_shutdown()'. This
function was changed as well - now it first removes the bdi from the
'bdi_list', then waits on the 'BDI_pending' bit. Once it wakes up, it is
guaranteed that the forker thread won't race with it, because the bdi is not
visible.

Note, the forker thread sets the 'BDI_pending' bit under the 'bdi->wb_lock',
which is essential for proper serialization.

And additionally, when we change 'bdi->wb.task', we now take the
'bdi->work_lock', to make sure that we do not lose wake-ups which we
otherwise would when racing with, say, 'bdi_queue_work()'.

Signed-off-by: Artem Bityutskiy
Reviewed-by: Christoph Hellwig
---
 fs/fs-writeback.c |   54 +++++++++--------------------
 mm/backing-dev.c  |   69 ++++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 70 insertions(+), 53 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 53e1028..b9e5ba0 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -78,21 +78,17 @@ static void bdi_queue_work(struct backing_dev_info *bdi,
 
 	spin_lock(&bdi->wb_lock);
 	list_add_tail(&work->list, &bdi->work_list);
-	spin_unlock(&bdi->wb_lock);
-
-	/*
-	 * If the default thread isn't there, make sure we add it. When
-	 * it gets created and wakes up, we'll run this work.
-	 */
-	if (unlikely(!bdi->wb.task)) {
+	if (bdi->wb.task) {
+		wake_up_process(bdi->wb.task);
+	} else {
+		/*
+		 * The bdi thread isn't there, wake up the forker thread which
+		 * will create and run it.
+		 */
 		trace_writeback_nothread(bdi, work);
 		wake_up_process(default_backing_dev_info.wb.task);
-	} else {
-		struct bdi_writeback *wb = &bdi->wb;
-
-		if (wb->task)
-			wake_up_process(wb->task);
 	}
+	spin_unlock(&bdi->wb_lock);
 }
 
 static void
@@ -800,7 +796,6 @@ int bdi_writeback_thread(void *data)
 {
 	struct bdi_writeback *wb = data;
 	struct backing_dev_info *bdi = wb->bdi;
-	unsigned long wait_jiffies = -1UL;
 	long pages_written;
 
 	current->flags |= PF_FLUSHER | PF_SWAPWRITE;
@@ -812,13 +807,6 @@ int bdi_writeback_thread(void *data)
 	 */
 	set_user_nice(current, 0);
 
-	/*
-	 * Clear pending bit and wakeup anybody waiting to tear us down
-	 */
-	clear_bit(BDI_pending, &bdi->state);
-	smp_mb__after_clear_bit();
-	wake_up_bit(&bdi->state, BDI_pending);
-
 	trace_writeback_thread_start(bdi);
 
 	while (!kthread_should_stop()) {
@@ -828,18 +816,6 @@ int bdi_writeback_thread(void *data)
 
 		if (pages_written)
 			wb->last_active = jiffies;
-		else if (wait_jiffies != -1UL) {
-			unsigned long max_idle;
-
-			/*
-			 * Longest period of inactivity that we tolerate. If we
-			 * see dirty data again later, the thread will get
-			 * recreated automatically.
-			 */
-			max_idle = max(5UL * 60 * HZ, wait_jiffies);
-			if (time_after(jiffies, max_idle + wb->last_active))
-				break;
-		}
 
 		set_current_state(TASK_INTERRUPTIBLE);
 		if (!list_empty(&bdi->work_list)) {
@@ -847,21 +823,15 @@ int bdi_writeback_thread(void *data)
 			continue;
 		}
 
-		if (dirty_writeback_interval) {
-			wait_jiffies = msecs_to_jiffies(dirty_writeback_interval * 10);
-			schedule_timeout(wait_jiffies);
-		} else
+		if (dirty_writeback_interval)
+			schedule_timeout(msecs_to_jiffies(dirty_writeback_interval * 10));
+		else
 			schedule();
 
 		try_to_freeze();
 	}
 
-	wb->task = NULL;
-
-	/*
-	 * Flush any work that raced with us exiting. No new work
-	 * will be added, since this bdi isn't discoverable anymore.
-	 */
+	/* Flush any work that raced with us exiting */
 	if (!list_empty(&bdi->work_list))
 		wb_do_writeback(wb, 1);
 
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index e104e32..9c1c199 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -316,6 +316,18 @@ static void sync_supers_timer_fn(unsigned long unused)
 	bdi_arm_supers_timer();
 }
 
+/*
+ * Calculate the longest interval (jiffies) bdi threads are allowed to be
+ * inactive.
+ */
+static unsigned long bdi_longest_inactive(void)
+{
+	unsigned long interval;
+
+	interval = msecs_to_jiffies(dirty_writeback_interval * 10);
+	return max(5UL * 60 * HZ, interval);
+}
+
 static int bdi_forker_thread(void *ptr)
 {
 	struct bdi_writeback *me = ptr;
@@ -329,11 +341,12 @@ static int bdi_forker_thread(void *ptr)
 	set_user_nice(current, 0);
 
 	for (;;) {
-		struct task_struct *task;
+		struct task_struct *task = NULL;
 		struct backing_dev_info *bdi;
 		enum {
 			NO_ACTION,   /* Nothing to do */
 			FORK_THREAD, /* Fork bdi thread */
+			KILL_THREAD, /* Kill inactive bdi thread */
 		} action = NO_ACTION;
 
 		/*
@@ -346,10 +359,6 @@ static int bdi_forker_thread(void *ptr)
 		spin_lock_bh(&bdi_lock);
 		set_current_state(TASK_INTERRUPTIBLE);
 
-		/*
-		 * Check if any existing bdi's have dirty data without
-		 * a thread registered. If so, set that up.
-		 */
 		list_for_each_entry(bdi, &bdi_list, bdi_list) {
 			bool have_dirty_io;
 
@@ -376,6 +385,25 @@ static int bdi_forker_thread(void *ptr)
 				action = FORK_THREAD;
 				break;
 			}
+
+			spin_lock(&bdi->wb_lock);
+			/*
+			 * If there is no work to do and the bdi thread was
+			 * inactive long enough - kill it. The wb_lock is taken
+			 * to make sure no-one adds more work to this bdi and
+			 * wakes the bdi thread up.
+			 */
+			if (bdi->wb.task && !have_dirty_io &&
+			    time_after(jiffies, bdi->wb.last_active +
+						bdi_longest_inactive())) {
+				task = bdi->wb.task;
+				bdi->wb.task = NULL;
+				spin_unlock(&bdi->wb_lock);
+				set_bit(BDI_pending, &bdi->state);
+				action = KILL_THREAD;
+				break;
+			}
+			spin_unlock(&bdi->wb_lock);
 		}
 		spin_unlock_bh(&bdi_lock);
 
@@ -394,8 +422,20 @@ static int bdi_forker_thread(void *ptr)
 				 * the bdi from the thread.
 				 */
 				bdi_flush_io(bdi);
-			} else
+			} else {
+				/*
+				 * The spinlock makes sure we do not lose
+				 * wake-ups when racing with 'bdi_queue_work()'.
+				 */
+				spin_lock(&bdi->wb_lock);
 				bdi->wb.task = task;
+				spin_unlock(&bdi->wb_lock);
+			}
+			break;
+
+		case KILL_THREAD:
+			__set_current_state(TASK_RUNNING);
+			kthread_stop(task);
 			break;
 
 		case NO_ACTION:
@@ -407,6 +447,13 @@ static int bdi_forker_thread(void *ptr)
 			/* Back to the main loop */
 			continue;
 		}
+
+		/*
+		 * Clear pending bit and wakeup anybody waiting to tear us down.
+		 */
+		clear_bit(BDI_pending, &bdi->state);
+		smp_mb__after_clear_bit();
+		wake_up_bit(&bdi->state, BDI_pending);
 	}
 
 	return 0;
@@ -490,15 +537,15 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)
 		return;
 
 	/*
-	 * If setup is pending, wait for that to complete first
+	 * Make sure nobody finds us on the bdi_list anymore
 	 */
-	wait_on_bit(&bdi->state, BDI_pending, bdi_sched_wait,
-			TASK_UNINTERRUPTIBLE);
+	bdi_remove_from_list(bdi);
 
 	/*
-	 * Make sure nobody finds us on the bdi_list anymore
+	 * If setup is pending, wait for that to complete first
 	 */
-	bdi_remove_from_list(bdi);
+	wait_on_bit(&bdi->state, BDI_pending, bdi_sched_wait,
+			TASK_UNINTERRUPTIBLE);
 
 	/*
 	 * Finally, kill the kernel thread. We don't need to be RCU
-- 
1.7.1.1

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel"
in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html