* [patch 1/5] fs: make use of rcu helpers
2009-09-15 19:19 [patch 0/5] bdi writeback fixes npiggin
@ 2009-09-15 19:19 ` npiggin
2009-09-15 19:31 ` Jens Axboe
2009-09-15 19:19 ` [patch 2/5] fs: improve scalability of bdi writeback work queues npiggin
` (3 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: npiggin @ 2009-09-15 19:19 UTC (permalink / raw)
To: Jens Axboe; +Cc: Linus Torvalds, Andrew Morton, linux-kernel
[-- Attachment #1: fs-bdiwb-rcu-fix.patch --]
[-- Type: text/plain, Size: 670 bytes --]
list_add_tail_rcu contains required barriers.
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
fs/fs-writeback.c | 5 -----
1 file changed, 5 deletions(-)
Index: linux-2.6/fs/fs-writeback.c
===================================================================
--- linux-2.6.orig/fs/fs-writeback.c
+++ linux-2.6/fs/fs-writeback.c
@@ -152,11 +152,6 @@ static void bdi_queue_work(struct backin
atomic_set(&work->pending, bdi->wb_cnt);
BUG_ON(!bdi->wb_cnt);
- /*
- * Make sure stores are seen before it appears on the list
- */
- smp_mb();
-
spin_lock(&bdi->wb_lock);
list_add_tail_rcu(&work->list, &bdi->work_list);
spin_unlock(&bdi->wb_lock);
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [patch 1/5] fs: make use of rcu helpers
2009-09-15 19:19 ` [patch 1/5] fs: make use of rcu helpers npiggin
@ 2009-09-15 19:31 ` Jens Axboe
0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2009-09-15 19:31 UTC (permalink / raw)
To: npiggin; +Cc: Linus Torvalds, Andrew Morton, linux-kernel
On Wed, Sep 16 2009, npiggin@suse.de wrote:
> list_add_tail_rcu contains required barriers.
Thanks, applied as well.
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
> ---
> fs/fs-writeback.c | 5 -----
> 1 file changed, 5 deletions(-)
>
> Index: linux-2.6/fs/fs-writeback.c
> ===================================================================
> --- linux-2.6.orig/fs/fs-writeback.c
> +++ linux-2.6/fs/fs-writeback.c
> @@ -152,11 +152,6 @@ static void bdi_queue_work(struct backin
> atomic_set(&work->pending, bdi->wb_cnt);
> BUG_ON(!bdi->wb_cnt);
>
> - /*
> - * Make sure stores are seen before it appears on the list
> - */
> - smp_mb();
> -
> spin_lock(&bdi->wb_lock);
> list_add_tail_rcu(&work->list, &bdi->work_list);
> spin_unlock(&bdi->wb_lock);
>
>
--
Jens Axboe
^ permalink raw reply [flat|nested] 11+ messages in thread
* [patch 2/5] fs: improve scalability of bdi writeback work queues
2009-09-15 19:19 [patch 0/5] bdi writeback fixes npiggin
2009-09-15 19:19 ` [patch 1/5] fs: make use of rcu helpers npiggin
@ 2009-09-15 19:19 ` npiggin
2009-09-15 19:31 ` Jens Axboe
2009-09-15 19:19 ` [patch 3/5] fs: fix bdi writeback use after free 1 npiggin
` (2 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: npiggin @ 2009-09-15 19:19 UTC (permalink / raw)
To: Jens Axboe; +Cc: Linus Torvalds, Andrew Morton, linux-kernel
[-- Attachment #1: fs-bdiwb-scale.patch --]
[-- Type: text/plain, Size: 823 bytes --]
If you're going to do an atomic RMW on each list entry, there's not much
point in all the RCU complexities of the list walking. This is only going
to help the multi-thread case I guess, but it doesn't hurt to do now.
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
fs/fs-writeback.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Index: linux-2.6/fs/fs-writeback.c
===================================================================
--- linux-2.6.orig/fs/fs-writeback.c
+++ linux-2.6/fs/fs-writeback.c
@@ -736,8 +736,9 @@ static struct bdi_work *get_next_work_it
rcu_read_lock();
list_for_each_entry_rcu(work, &bdi->work_list, list) {
- if (!test_and_clear_bit(wb->nr, &work->seen))
+ if (!test_bit(wb->nr, &work->seen))
continue;
+ clear_bit(wb->nr, &work->seen);
ret = work;
break;
^ permalink raw reply [flat|nested] 11+ messages in thread* [patch 3/5] fs: fix bdi writeback use after free 1
2009-09-15 19:19 [patch 0/5] bdi writeback fixes npiggin
2009-09-15 19:19 ` [patch 1/5] fs: make use of rcu helpers npiggin
2009-09-15 19:19 ` [patch 2/5] fs: improve scalability of bdi writeback work queues npiggin
@ 2009-09-15 19:19 ` npiggin
2009-09-15 19:32 ` Jens Axboe
2009-09-15 19:19 ` [patch 4/5] fs: fix possible bdi writeback refcounting problem npiggin
2009-09-15 19:19 ` [patch 5/5] fs: fix bdi writeback use after free 2 npiggin
4 siblings, 1 reply; 11+ messages in thread
From: npiggin @ 2009-09-15 19:19 UTC (permalink / raw)
To: Jens Axboe; +Cc: Linus Torvalds, Andrew Morton, linux-kernel
[-- Attachment #1: fs-bdiwb-rcu-fix-2.patch --]
[-- Type: text/plain, Size: 1241 bytes --]
By the time bdi_work_on_stack gets evaluated again in bdi_work_free, it
cna already have been deallocated and used for something else in the
!on stack case, giving a false positive in this test and causing corruption.
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
fs/fs-writeback.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
Index: linux-2.6/fs/fs-writeback.c
===================================================================
--- linux-2.6.orig/fs/fs-writeback.c
+++ linux-2.6/fs/fs-writeback.c
@@ -114,6 +114,7 @@ static void bdi_work_free(struct rcu_hea
static void wb_work_complete(struct bdi_work *work)
{
const enum writeback_sync_modes sync_mode = work->sync_mode;
+ int onstack = bdi_work_on_stack(work);
/*
* For allocated work, we can clear the done/seen bit right here.
@@ -121,9 +122,9 @@ static void wb_work_complete(struct bdi_
* to after the RCU grace period, since the stack could be invalidated
* as soon as bdi_work_clear() has done the wakeup.
*/
- if (!bdi_work_on_stack(work))
+ if (!onstack)
bdi_work_clear(work);
- if (sync_mode == WB_SYNC_NONE || bdi_work_on_stack(work))
+ if (sync_mode == WB_SYNC_NONE || onstack)
call_rcu(&work->rcu_head, bdi_work_free);
}
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [patch 3/5] fs: fix bdi writeback use after free 1
2009-09-15 19:19 ` [patch 3/5] fs: fix bdi writeback use after free 1 npiggin
@ 2009-09-15 19:32 ` Jens Axboe
0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2009-09-15 19:32 UTC (permalink / raw)
To: npiggin; +Cc: Linus Torvalds, Andrew Morton, linux-kernel
On Wed, Sep 16 2009, npiggin@suse.de wrote:
> By the time bdi_work_on_stack gets evaluated again in bdi_work_free, it
> cna already have been deallocated and used for something else in the
> !on stack case, giving a false positive in this test and causing corruption.
You are right, that is also buggy... Applied.
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
> ---
> fs/fs-writeback.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/fs/fs-writeback.c
> ===================================================================
> --- linux-2.6.orig/fs/fs-writeback.c
> +++ linux-2.6/fs/fs-writeback.c
> @@ -114,6 +114,7 @@ static void bdi_work_free(struct rcu_hea
> static void wb_work_complete(struct bdi_work *work)
> {
> const enum writeback_sync_modes sync_mode = work->sync_mode;
> + int onstack = bdi_work_on_stack(work);
>
> /*
> * For allocated work, we can clear the done/seen bit right here.
> @@ -121,9 +122,9 @@ static void wb_work_complete(struct bdi_
> * to after the RCU grace period, since the stack could be invalidated
> * as soon as bdi_work_clear() has done the wakeup.
> */
> - if (!bdi_work_on_stack(work))
> + if (!onstack)
> bdi_work_clear(work);
> - if (sync_mode == WB_SYNC_NONE || bdi_work_on_stack(work))
> + if (sync_mode == WB_SYNC_NONE || onstack)
> call_rcu(&work->rcu_head, bdi_work_free);
> }
>
>
>
--
Jens Axboe
^ permalink raw reply [flat|nested] 11+ messages in thread
* [patch 4/5] fs: fix possible bdi writeback refcounting problem
2009-09-15 19:19 [patch 0/5] bdi writeback fixes npiggin
` (2 preceding siblings ...)
2009-09-15 19:19 ` [patch 3/5] fs: fix bdi writeback use after free 1 npiggin
@ 2009-09-15 19:19 ` npiggin
2009-09-15 19:30 ` Jens Axboe
2009-09-15 19:19 ` [patch 5/5] fs: fix bdi writeback use after free 2 npiggin
4 siblings, 1 reply; 11+ messages in thread
From: npiggin @ 2009-09-15 19:19 UTC (permalink / raw)
To: Jens Axboe; +Cc: Linus Torvalds, Andrew Morton, linux-kernel
[-- Attachment #1: fs-bdiwb-refcount-fix.patch --]
[-- Type: text/plain, Size: 1352 bytes --]
wb_clear_pending AFAIKS should not be called after the item has been
put on the list, except by the worker threads. It could lead to the
situation where the refcount is decremented below 0 and cause lots of
problems.
Presumably the !wb_has_dirty_io case is not a common one, so it can
be discovered when the thread wakes up to check?
Also add a comment in bdi_work_clear.
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
fs/fs-writeback.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
Index: linux-2.6/fs/fs-writeback.c
===================================================================
--- linux-2.6.orig/fs/fs-writeback.c
+++ linux-2.6/fs/fs-writeback.c
@@ -98,6 +98,11 @@ static void bdi_work_clear(struct bdi_wo
{
clear_bit(WS_USED_B, &work->state);
smp_mb__after_clear_bit();
+ /*
+ * work can have disappeared at this point. bit waitq functions
+ * should be able to tolerate this, provided bdi_sched_wait does
+ * not dereference it's pointer argument.
+ */
wake_up_bit(&work->state, WS_USED_B);
}
@@ -172,10 +177,7 @@ static void bdi_queue_work(struct backin
* thread always. As a safety precaution, it'll flush out
* everything
*/
- if (!wb_has_dirty_io(wb)) {
- if (work)
- wb_clear_pending(wb, work);
- } else if (wb->task)
+ if (wb->task)
wake_up_process(wb->task);
}
}
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [patch 4/5] fs: fix possible bdi writeback refcounting problem
2009-09-15 19:19 ` [patch 4/5] fs: fix possible bdi writeback refcounting problem npiggin
@ 2009-09-15 19:30 ` Jens Axboe
0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2009-09-15 19:30 UTC (permalink / raw)
To: npiggin; +Cc: Linus Torvalds, Andrew Morton, linux-kernel
On Wed, Sep 16 2009, npiggin@suse.de wrote:
> wb_clear_pending AFAIKS should not be called after the item has been
> put on the list, except by the worker threads. It could lead to the
> situation where the refcount is decremented below 0 and cause lots of
> problems.
Good point!
> Presumably the !wb_has_dirty_io case is not a common one, so it can
> be discovered when the thread wakes up to check?
It's checked earlier as well, so I see no problem in killing the check
there.
>
> Also add a comment in bdi_work_clear.
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
> ---
> fs/fs-writeback.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> Index: linux-2.6/fs/fs-writeback.c
> ===================================================================
> --- linux-2.6.orig/fs/fs-writeback.c
> +++ linux-2.6/fs/fs-writeback.c
> @@ -98,6 +98,11 @@ static void bdi_work_clear(struct bdi_wo
> {
> clear_bit(WS_USED_B, &work->state);
> smp_mb__after_clear_bit();
> + /*
> + * work can have disappeared at this point. bit waitq functions
> + * should be able to tolerate this, provided bdi_sched_wait does
> + * not dereference it's pointer argument.
> + */
> wake_up_bit(&work->state, WS_USED_B);
> }
>
> @@ -172,10 +177,7 @@ static void bdi_queue_work(struct backin
> * thread always. As a safety precaution, it'll flush out
> * everything
> */
> - if (!wb_has_dirty_io(wb)) {
> - if (work)
> - wb_clear_pending(wb, work);
> - } else if (wb->task)
> + if (wb->task)
> wake_up_process(wb->task);
> }
> }
>
>
--
Jens Axboe
^ permalink raw reply [flat|nested] 11+ messages in thread
* [patch 5/5] fs: fix bdi writeback use after free 2
2009-09-15 19:19 [patch 0/5] bdi writeback fixes npiggin
` (3 preceding siblings ...)
2009-09-15 19:19 ` [patch 4/5] fs: fix possible bdi writeback refcounting problem npiggin
@ 2009-09-15 19:19 ` npiggin
2009-09-15 19:29 ` Jens Axboe
4 siblings, 1 reply; 11+ messages in thread
From: npiggin @ 2009-09-15 19:19 UTC (permalink / raw)
To: Jens Axboe; +Cc: Linus Torvalds, Andrew Morton, linux-kernel
[-- Attachment #1: fs-bdiwb-rcu-fix-3.patch --]
[-- Type: text/plain, Size: 1165 bytes --]
work can be gone after wb_clear_pending, so load the sb from it first.
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
fs/fs-writeback.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
Index: linux-2.6/fs/fs-writeback.c
===================================================================
--- linux-2.6.orig/fs/fs-writeback.c
+++ linux-2.6/fs/fs-writeback.c
@@ -783,6 +783,7 @@ long wb_do_writeback(struct bdi_writebac
while ((work = get_next_work_item(bdi, wb)) != NULL) {
enum writeback_sync_modes sync_mode;
+ struct super_block *sb;
nr_pages = work->nr_pages;
@@ -794,6 +795,8 @@ long wb_do_writeback(struct bdi_writebac
else
sync_mode = work->sync_mode;
+ sb = work->sb;
+
/*
* If this isn't a data integrity operation, just notify
* that we have seen this work and we are now starting it.
@@ -801,7 +804,7 @@ long wb_do_writeback(struct bdi_writebac
if (sync_mode == WB_SYNC_NONE)
wb_clear_pending(wb, work);
- wrote += wb_writeback(wb, nr_pages, work->sb, sync_mode, 0);
+ wrote += wb_writeback(wb, nr_pages, sb, sync_mode, 0);
/*
* This is a data integrity writeback, so only do the
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [patch 5/5] fs: fix bdi writeback use after free 2
2009-09-15 19:19 ` [patch 5/5] fs: fix bdi writeback use after free 2 npiggin
@ 2009-09-15 19:29 ` Jens Axboe
0 siblings, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2009-09-15 19:29 UTC (permalink / raw)
To: npiggin; +Cc: Linus Torvalds, Andrew Morton, linux-kernel
On Wed, Sep 16 2009, npiggin@suse.de wrote:
> work can be gone after wb_clear_pending, so load the sb from it first.
This one is already moot, I fixed that in an earlier patch.
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
> ---
> fs/fs-writeback.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/fs/fs-writeback.c
> ===================================================================
> --- linux-2.6.orig/fs/fs-writeback.c
> +++ linux-2.6/fs/fs-writeback.c
> @@ -783,6 +783,7 @@ long wb_do_writeback(struct bdi_writebac
>
> while ((work = get_next_work_item(bdi, wb)) != NULL) {
> enum writeback_sync_modes sync_mode;
> + struct super_block *sb;
>
> nr_pages = work->nr_pages;
>
> @@ -794,6 +795,8 @@ long wb_do_writeback(struct bdi_writebac
> else
> sync_mode = work->sync_mode;
>
> + sb = work->sb;
> +
> /*
> * If this isn't a data integrity operation, just notify
> * that we have seen this work and we are now starting it.
> @@ -801,7 +804,7 @@ long wb_do_writeback(struct bdi_writebac
> if (sync_mode == WB_SYNC_NONE)
> wb_clear_pending(wb, work);
>
> - wrote += wb_writeback(wb, nr_pages, work->sb, sync_mode, 0);
> + wrote += wb_writeback(wb, nr_pages, sb, sync_mode, 0);
>
> /*
> * This is a data integrity writeback, so only do the
>
>
--
Jens Axboe
^ permalink raw reply [flat|nested] 11+ messages in thread