* [PATCH 0/3] writeback: more clean-ups and fixes
@ 2010-07-12 15:22 Artem Bityutskiy
2010-07-12 15:22 ` [PATCH 1/3] writeback: remove redundant list initialization Artem Bityutskiy
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Artem Bityutskiy @ 2010-07-12 15:22 UTC (permalink / raw)
To: Jens Axboe; +Cc: Christoph Hellwig, linux-fsdevel
[Sorry for the duplicate mail; I tried to make this series an
In-Reply-To to the previous clean-up series, but failed, so I am
just re-sending it with git send-email.]
On top of those 4 clean-up patches I've implemented the following
3 patches, which do further clean-up and fix one of the races I see.
There is more work to do, but I am sending these early to get early feedback.
Note, I have only lightly tested these patches: I booted Fedora, made
sure bdi writeback threads are created/deleted/doing writeback,
hot-plugged an external USB drive, saw its writeback thread created,
unplugged the drive and saw the thread removed. I also did some work
while running a kernel with these patches.
The patches are against your for-2.6.36 branch.
My long-term plan is to get rid of unneeded wake-ups, but I'm still
reading your code and learning, so for now I clean up / fix the things
I spot.
Artem.
* [PATCH 1/3] writeback: remove redundant list initialization
From: Artem Bityutskiy @ 2010-07-12 15:22 UTC (permalink / raw)
To: Jens Axboe; +Cc: Christoph Hellwig, linux-fsdevel
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
We do not have to call 'INIT_LIST_HEAD()' for list elements
('bdi->bdi_list') before inserting them into the 'bdi_pending_list',
because 'list_add_tail()' overwrites the element's link pointers
anyway. Thus, kill the redundant initialization.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
mm/backing-dev.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 15378db..2df2af6 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -416,7 +416,6 @@ static void bdi_add_to_pending(struct rcu_head *head)
struct backing_dev_info *bdi;
bdi = container_of(head, struct backing_dev_info, rcu_head);
- INIT_LIST_HEAD(&bdi->bdi_list);
spin_lock(&bdi_lock);
list_add_tail(&bdi->bdi_list, &bdi_pending_list);
--
1.7.1.1
* [PATCH 2/3] writeback: simplify bdi code a little
From: Artem Bityutskiy @ 2010-07-12 15:22 UTC (permalink / raw)
To: Jens Axboe; +Cc: Christoph Hellwig, linux-fsdevel
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch simplifies the bdi code a little by removing the redundant
'pending_list'. Indeed, currently the forker thread
('bdi_forker_thread()') works like this:
1. In a loop, fetch all bdi's which have work but no writeback
thread and move them to the 'pending_list'.
2. If the list is empty, sleep for 5 s.
3. Otherwise, take one bdi from the list, fork the writeback thread for
this bdi, and repeat the loop.
IOW, it first moves everything to the 'pending_list', then processes
one element at a time, and so on. With this patch the algorithm is:
1. Find the first bdi which has work and remove it from the global
list of bdi's ('bdi_list').
2. If there is no such bdi, sleep for 5 s.
3. Fork the writeback thread for this bdi and repeat the loop.
IOW, now we find the first bdi to process, process it, and so on. This
is simpler and involves fewer lists.
This patch also removes the now-unneeded complications involving
'call_rcu()' and 'bdi->rcu_head': we use a simple 'synchronize_rcu()'
instead, and remove the 'rcu_head' field from 'struct backing_dev_info'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
include/linux/backing-dev.h | 1 -
mm/backing-dev.c | 85 ++++++++++++-------------------------------
2 files changed, 23 insertions(+), 63 deletions(-)
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 14648b9..a9f35c6 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -58,7 +58,6 @@ struct bdi_writeback {
struct backing_dev_info {
struct list_head bdi_list;
- struct rcu_head rcu_head;
unsigned long ra_pages; /* max readahead in PAGE_CACHE_SIZE units */
unsigned long state; /* Always use atomic bitops on this */
unsigned int capabilities; /* Device capabilities */
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 2df2af6..0a69066 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -50,8 +50,6 @@ static struct timer_list sync_supers_timer;
static int bdi_sync_supers(void *);
static void sync_supers_timer_fn(unsigned long);
-static void bdi_add_default_flusher_thread(struct backing_dev_info *bdi);
-
#ifdef CONFIG_DEBUG_FS
#include <linux/debugfs.h>
#include <linux/seq_file.h>
@@ -331,6 +329,7 @@ static int bdi_forker_thread(void *ptr)
set_user_nice(current, 0);
for (;;) {
+ bool no_work = 1;
struct backing_dev_info *bdi, *tmp;
struct bdi_writeback *wb;
@@ -353,13 +352,26 @@ static int bdi_forker_thread(void *ptr)
if (list_empty(&bdi->work_list) &&
!bdi_has_dirty_io(bdi))
continue;
+ if (!bdi_cap_writeback_dirty(bdi))
+ continue;
+
+ WARN(!test_bit(BDI_registered, &bdi->state),
+ "bdi %p/%s is not registered!\n", bdi, bdi->name);
+
+ list_del_rcu(&bdi->bdi_list);
- bdi_add_default_flusher_thread(bdi);
+ /*
+ * Set the pending bit - if someone will try to
+ * unregister this bdi - it'll wait on this bit.
+ */
+ set_bit(BDI_pending, &bdi->state);
+ no_work = 0;
+ break;
}
set_current_state(TASK_INTERRUPTIBLE);
- if (list_empty(&bdi_pending_list)) {
+ if (no_work) {
unsigned long wait;
spin_unlock_bh(&bdi_lock);
@@ -373,24 +385,18 @@ static int bdi_forker_thread(void *ptr)
}
__set_current_state(TASK_RUNNING);
-
- /*
- * This is our real job - check for pending entries in
- * bdi_pending_list, and create the threads that got added
- */
- bdi = list_entry(bdi_pending_list.next, struct backing_dev_info,
- bdi_list);
- list_del_init(&bdi->bdi_list);
spin_unlock_bh(&bdi_lock);
+ /* Make sure no one uses the picked bdi */
+ synchronize_rcu();
+
wb = &bdi->wb;
wb->thread = kthread_run(bdi_writeback_thread, wb, "flush-%s",
dev_name(bdi->dev));
/*
- * If thread creation fails, then readd the bdi to
- * the pending list and force writeout of the bdi
- * from this forker thread. That will free some memory
- * and we can try again.
+ * If thread creation fails, then readd the bdi back to the
+ * list and force writeout of the bdi from this forker thread.
+ * That will free some memory and we can try again.
*/
if (IS_ERR(wb->thread)) {
wb->thread = NULL;
@@ -401,7 +407,7 @@ static int bdi_forker_thread(void *ptr)
* memory.
*/
spin_lock_bh(&bdi_lock);
- list_add_tail(&bdi->bdi_list, &bdi_pending_list);
+ list_add(&bdi->bdi_list, &bdi_list);
spin_unlock_bh(&bdi_lock);
bdi_flush_io(bdi);
@@ -411,50 +417,6 @@ static int bdi_forker_thread(void *ptr)
return 0;
}
-static void bdi_add_to_pending(struct rcu_head *head)
-{
- struct backing_dev_info *bdi;
-
- bdi = container_of(head, struct backing_dev_info, rcu_head);
-
- spin_lock(&bdi_lock);
- list_add_tail(&bdi->bdi_list, &bdi_pending_list);
- spin_unlock(&bdi_lock);
-}
-
-/*
- * Add the default flusher thread that gets created for any bdi
- * that has dirty data pending writeout
- */
-void static bdi_add_default_flusher_thread(struct backing_dev_info *bdi)
-{
- if (!bdi_cap_writeback_dirty(bdi))
- return;
-
- if (WARN_ON(!test_bit(BDI_registered, &bdi->state))) {
- printk(KERN_ERR "bdi %p/%s is not registered!\n",
- bdi, bdi->name);
- return;
- }
-
- /*
- * Check with the helper whether to proceed adding a thread. Will only
- * abort if we two or more simultanous calls to
- * bdi_add_default_flusher_thread() occured, further additions will
- * block waiting for previous additions to finish.
- */
- if (!test_and_set_bit(BDI_pending, &bdi->state)) {
- list_del_rcu(&bdi->bdi_list);
-
- /*
- * We must wait for the current RCU period to end before
- * moving to the pending list. So schedule that operation
- * from an RCU callback.
- */
- call_rcu(&bdi->rcu_head, bdi_add_to_pending);
- }
-}
-
/*
* Remove bdi from bdi_list, and ensure that it is no longer visible
*/
@@ -595,7 +557,6 @@ int bdi_init(struct backing_dev_info *bdi)
bdi->max_ratio = 100;
bdi->max_prop_frac = PROP_FRAC_BASE;
spin_lock_init(&bdi->wb_lock);
- INIT_RCU_HEAD(&bdi->rcu_head);
INIT_LIST_HEAD(&bdi->bdi_list);
INIT_LIST_HEAD(&bdi->work_list);
--
1.7.1.1
* [PATCH 3/3] writeback: fix possible race when shutting down bdi
From: Artem Bityutskiy @ 2010-07-12 15:22 UTC (permalink / raw)
To: Jens Axboe; +Cc: Christoph Hellwig, linux-fsdevel
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Current bdi code has the following race between 'bdi_wb_shutdown()'
and 'bdi_forker_thread()'.
Initial conditions for the bdi we consider: the BDI_pending bit is
clear and the bdi has no writeback thread (it was inactive and
exited); 'bdi_wb_shutdown()' and 'bdi_forker_thread()' run
concurrently. Then:
1. 'bdi_wb_shutdown()' executes wait_on_bit(), finds the BDI_pending
bit clear, and so does not wait for anything.
2. 'bdi_forker_thread()' takes 'bdi_lock', finds out that the bdi has
work to do, takes it off the 'bdi_list', sets the BDI_pending flag,
and unlocks 'bdi_lock'.
3. 'bdi_wb_shutdown()' takes the lock, and nasty things start happening:
a) it tries to remove the bdi from the 'bdi_list', but the bdi is
not on any list;
b) it starts deleting the bdi, while 'bdi_forker_thread()' is still
working with it.
Note, this race is very difficult to hit, and I have never observed
it, so it is quite theoretical; but it is still a race. Also note,
this race exists without my previous clean-ups as well.
This patch fixes the race by making 'bdi_wb_shutdown()' first search
for the bdi in the 'bdi_list': only if it is there does it remove the
bdi from 'bdi_list' and destroy it. If it is not there, it assumes the
bdi is in transit and re-tries waiting on the BDI_pending bit.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
mm/backing-dev.c | 41 ++++++++++++++++++++++++++++-------------
1 files changed, 28 insertions(+), 13 deletions(-)
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 0a69066..6862cb5 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -418,15 +418,26 @@ static int bdi_forker_thread(void *ptr)
}
/*
- * Remove bdi from bdi_list, and ensure that it is no longer visible
+ * Look up for bdi in the bdi_list. If found, remove it, ensure that it is
+ * no longer visible, and return 0. If not found, return 1.
*/
-static void bdi_remove_from_list(struct backing_dev_info *bdi)
+static int bdi_remove_from_list(struct backing_dev_info *me)
{
+ struct backing_dev_info *bdi;
+
spin_lock_bh(&bdi_lock);
- list_del_rcu(&bdi->bdi_list);
+ list_for_each_entry(bdi, &bdi_list, bdi_list) {
+ if (bdi == me) {
+ list_del_rcu(&me->bdi_list);
+ spin_unlock_bh(&bdi_lock);
+ synchronize_rcu();
+ return 0;
+ }
+
+ }
spin_unlock_bh(&bdi_lock);
+ return 1;
- synchronize_rcu();
}
int bdi_register(struct backing_dev_info *bdi, struct device *parent,
@@ -494,16 +505,20 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)
if (!bdi_cap_writeback_dirty(bdi))
return;
- /*
- * If setup is pending, wait for that to complete first
- */
- wait_on_bit(&bdi->state, BDI_pending, bdi_sched_wait,
- TASK_UNINTERRUPTIBLE);
+ do {
+ /*
+ * If setup is pending, wait for that to complete first
+ */
+ wait_on_bit(&bdi->state, BDI_pending, bdi_sched_wait,
+ TASK_UNINTERRUPTIBLE);
- /*
- * Make sure nobody finds us on the bdi_list anymore
- */
- bdi_remove_from_list(bdi);
+ /*
+ * Make sure nobody finds us on the bdi_list anymore. However,
+ * bdi may be temporary be not in the bdi_list but be in transit
+ * in bdi_forker_thread. Namely, this may happen if we race
+ * with the forker thread.
+ */
+ } while (bdi_remove_from_list(bdi));
/*
* Finally, kill the kernel thread. We don't need to be RCU
--
1.7.1.1