From: Jens Axboe <jens.axboe@oracle.com>
To: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
chris.mason@oracle.com, david@fromorbit.com, hch@infradead.org,
akpm@linux-foundation.org, jack@suse.cz
Subject: Re: [PATCH 0/11] Per-bdi writeback flusher threads #4
Date: Wed, 20 May 2009 10:54:47 +0200
Message-ID: <20090520085446.GN11363@kernel.dk>
In-Reply-To: <20090520080938.GM11363@kernel.dk>
[-- Attachment #1: Type: text/plain, Size: 5097 bytes --]
On Wed, May 20 2009, Jens Axboe wrote:
> On Wed, May 20 2009, Zhang, Yanmin wrote:
> > On Tue, 2009-05-19 at 08:20 +0200, Jens Axboe wrote:
> > > On Tue, May 19 2009, Zhang, Yanmin wrote:
> > > > On Mon, 2009-05-18 at 14:19 +0200, Jens Axboe wrote:
> > > > > Hi,
> > > > >
> > > > > This is the fourth version of this patchset. Changes since v3:
> > > > >
> > > > > - Dropped a prep patch, it has been included in mainline since.
> > > > >
> > > > > - Add a work-to-do list to the bdi. This is struct bdi_work. Each
> > > > > wb thread will notice and execute work on bdi->work_list. The arguments
> > > > > are which sb (or NULL for all) to flush and how many pages to flush.
> > > > >
> > > > > - Fix a bug where not all bdi's would end up on the bdi_list, so potentially
> > > > > some data would not be flushed.
> > > > >
> > > > > - Make wb_kupdated() pass on wbc->older_than_this so we maintain the same
> > > > > behaviour for kupdated flushes.
> > > > >
> > > > > - Have the wb thread flush first before sleeping, to avoid losing the
> > > > > first flush on lazy register.
> > > > >
> > > > > - Rebase to newer kernels.
> >
> > > I'm attaching two patches - apply #1 to -rc6, and then #2 is a roll-up
> > > of the patch series that you can apply next.
> > Jens,
> >
> > I ran into 2 issues with kernel 2.6.30-rc6+BDI_Flusher_V4. Below is one.
> >
> > Tue May 19 00:00:00 CST 2009
> > BUG: unable to handle kernel NULL pointer dereference at 00000000000001d8
> > IP: [<ffffffff803f3c4c>] generic_make_request+0x10a/0x384
> > PGD 0
> > Oops: 0000 [#1] SMP
> > last sysfs file: /sys/block/sdb/stat
> > CPU 0
> > Modules linked in: igb
> > Pid: 1445, comm: bdi-8:16 Not tainted 2.6.30-rc6-bdiflusherv4 #1 X8DTN
> > RIP: 0010:[<ffffffff803f3c4c>] [<ffffffff803f3c4c>] generic_make_request+0x10a/0x384
> > RSP: 0018:ffff8800bd04da60 EFLAGS: 00010206
> > RAX: 0000000000000000 RBX: ffff8801be45d500 RCX: 00000000038a0df8
> > RDX: 0000000000000008 RSI: 0000000000000576 RDI: ffff8801bf408680
> > RBP: ffff8801be45d500 R08: ffffe20001ee8140 R09: ffff8800bd04da98
> > R10: 0000000000000000 R11: ffff8800bd72eb40 R12: ffff8801be45d500
> > R13: ffff88005f51f310 R14: 0000000000000008 R15: ffff8800b15a5458
> > FS: 0000000000000000(0000) GS:ffffc20000000000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> > CR2: 00000000000001d8 CR3: 0000000000201000 CR4: 00000000000006e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process bdi-8:16 (pid: 1445, threadinfo ffff8800bd04c000, task ffff8800bd1b75f0)
> > Stack:
> > 0000000000000008 ffffffff8027a613 00000000848dc000 ffffffffffffffff
> > ffff8800a8190f50 ffffffff00000012 ffff8800a81938e0 ffffc2000000001b
> > 0000000000000000 0000000000000000 ffffe200026f9c30 0000000000000000
> > Call Trace:
> > [<ffffffff8027a613>] ? mempool_alloc+0x59/0x10f
> > [<ffffffff803f3f70>] ? submit_bio+0xaa/0xb1
> > [<ffffffff802c6a3f>] ? submit_bh+0xe3/0x103
> > [<ffffffff802c92ea>] ? __block_write_full_page+0x1fb/0x2f2
> > [<ffffffff802c7d6a>] ? end_buffer_async_write+0x0/0xfb
> > [<ffffffff8027e8d2>] ? __writepage+0xa/0x25
> > [<ffffffff8027f036>] ? write_cache_pages+0x21c/0x338
> > [<ffffffff8027e8c8>] ? __writepage+0x0/0x25
> > [<ffffffff8027f195>] ? do_writepages+0x27/0x2d
> > [<ffffffff802c22c1>] ? __writeback_single_inode+0x159/0x2b3
> > [<ffffffff8071e52a>] ? thread_return+0x3e/0xaa
> > [<ffffffff8027f267>] ? determine_dirtyable_memory+0xd/0x1d
> > [<ffffffff8027f2dd>] ? get_dirty_limits+0x1d/0x255
> > [<ffffffff802c27bc>] ? generic_sync_wb_inodes+0x1b4/0x220
> > [<ffffffff802c3130>] ? wb_do_writeback+0x16c/0x215
> > [<ffffffff802c323e>] ? bdi_writeback_task+0x65/0x10d
> > [<ffffffff8024cc06>] ? autoremove_wake_function+0x0/0x2e
> > [<ffffffff8024cb27>] ? bit_waitqueue+0x10/0xa0
> > [<ffffffff80289257>] ? bdi_start_fn+0x0/0xba
> > [<ffffffff802892c6>] ? bdi_start_fn+0x6f/0xba
> > [<ffffffff8024c860>] ? kthread+0x54/0x80
> > [<ffffffff8020c97a>] ? child_rip+0xa/0x20
> > [<ffffffff8024c80c>] ? kthread+0x0/0x80
> > [<ffffffff8020c970>] ? child_rip+0x0/0x20
> >
> > The panic happened at the beginning of a mmap randrw after a mmap randwrite.
> >
> > It's triggered in __generic_make_request => bdev_get_queue(bio->bi_bdev),
> > because bio->bi_bdev->bd_disk is equal to NULL.
> >
> > The callchain is:
> > bdi_writeback_task =>
> > wb_do_writeback =>
> > generic_sync_wb_inodes =>
> > __writeback_single_inode =>
> > ...
> > __block_write_full_page =>
> > submit_bh =>
> > submit_bio =>
> > generic_make_request
>
> Wow, that is really odd. Can you pass the details of the test you ran?
I found one issue yesterday and one today that could cause problems,
though I'm not sure either would explain this one. It's still worth a
try if the oops is reproducible. I'm attaching the three patches I have
against the posted series. The one in the middle is just an optimization;
the first and third are the bug fixes.
--
Jens Axboe
[-- Attachment #2: 0001-writeback-add-memory-barrier-before-wake_up_bit-in-b.patch --]
[-- Type: text/x-diff, Size: 857 bytes --]
From 9025f9ffc675c3d8bf6c25fdebe30ca98082bab6 Mon Sep 17 00:00:00 2001
From: Jens Axboe <jens.axboe@oracle.com>
Date: Tue, 19 May 2009 09:47:02 +0200
Subject: [PATCH 1/3] writeback: add memory barrier before wake_up_bit() in bdi_work_free()
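The wake_up_bit() documentation requires a memory barrier between clearing
the bit and calling wake_up_bit(). The missing barrier was also triggered
in the wild: a process got stuck forever waiting for a bit clear that had
already happened.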
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
fs/fs-writeback.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index a287c09..6052701 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -102,6 +102,7 @@ static void bdi_work_free(struct rcu_head *head)
 		kfree(work);
 	else {
 		clear_bit(0, &work->state);
+		smp_mb__after_clear_bit();
 		wake_up_bit(&work->state, 0);
 	}
 }
--
1.6.3.9.g6345
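To illustrate why the barrier in patch 1 matters, here is a minimal
userspace sketch. It is not the kernel code. It assumes C11 atomics as a
stand-in for clear_bit(), smp_mb__after_clear_bit() and the unlocked
waitqueue_active() check that wake_up_bit() performs. struct work_model
and all function names below are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a work item; not the kernel struct. */
struct work_model {
	atomic_ulong state;      /* bit 0 is set while the work is pending */
	atomic_int   nr_waiters; /* models a non-empty wait queue */
	atomic_int   wakeups;    /* number of wakeups the waker issued */
};

static void work_model_init(struct work_model *w)
{
	assert(w != NULL);
	atomic_init(&w->state, 1UL);
	atomic_init(&w->nr_waiters, 0);
	atomic_init(&w->wakeups, 0);
}

/*
 * Waker side: clear the bit, issue a full barrier, then look for
 * sleepers. Without the fence, the load of nr_waiters may be ordered
 * before the store that clears the bit, so the waker can miss a waiter
 * that still saw the bit set.
 */
static bool work_clear_and_wake(struct work_model *w)
{
	atomic_fetch_and_explicit(&w->state, ~1UL, memory_order_relaxed);
	/* Plays the role of smp_mb__after_clear_bit(). */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&w->nr_waiters, memory_order_relaxed) > 0) {
		atomic_fetch_add_explicit(&w->wakeups, 1, memory_order_relaxed);
		return true;
	}
	return false;
}

/*
 * Waiter side: register as a waiter, issue a full barrier, then
 * re-check the bit. Returns true if the caller has to sleep.
 */
static bool work_prepare_to_wait(struct work_model *w)
{
	atomic_fetch_add_explicit(&w->nr_waiters, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&w->state, memory_order_relaxed) & 1UL)
		return true;
	/* The bit is already clear, so there is nothing to wait for. */
	atomic_fetch_sub_explicit(&w->nr_waiters, 1, memory_order_relaxed);
	return false;
}
```

With a full barrier on both sides, at least one side observes the other's
store: either the waiter sees the cleared bit, or the waker sees the
waiter. Dropping the waker's fence reopens the lost-wakeup window.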
[-- Attachment #3: 0002-writeback-attempt-to-allocate-work-struct-in-bdi_sta.patch --]
[-- Type: text/x-diff, Size: 1601 bytes --]
From b4c4af0be4ff04648d2033dc3ac4dd4d50d5864d Mon Sep 17 00:00:00 2001
From: Jens Axboe <jens.axboe@oracle.com>
Date: Tue, 19 May 2009 11:26:58 +0200
Subject: [PATCH 2/3] writeback: attempt to allocate work struct in bdi_start_writeback()
If the allocation works, then we don't have to wait for the threads
to wake up and notice the work. So it would potentially cause less
lag in bdi_start_writeback(). If it fails, just fall back to an on-stack
work struct again.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
fs/fs-writeback.c | 19 +++++++++++++++----
1 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 6052701..f80afaa 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -191,14 +191,25 @@ static void bdi_wait_on_work_start(struct bdi_work *work)
 int bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
 			long nr_pages)
 {
-	struct bdi_work work;
+	struct bdi_work work_stack, *work;
 	int ret;

-	bdi_work_init_on_stack(&work, sb, nr_pages);
+	work = kmalloc(sizeof(*work), GFP_ATOMIC);
+	if (work)
+		bdi_work_init(work, sb, nr_pages);
+	else {
+		work = &work_stack;
+		bdi_work_init_on_stack(work, sb, nr_pages);
+	}

-	ret = bdi_queue_writeback(bdi, &work);
+	ret = bdi_queue_writeback(bdi, work);

-	bdi_wait_on_work_start(&work);
+	/*
+	 * If this came from our stack, we need to wait until the wb threads
+	 * have noticed this work before we return (and invalidate the stack)
+	 */
+	if (work == &work_stack)
+		bdi_wait_on_work_start(work);

 	return ret;
 }
--
1.6.3.9.g6345
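The allocate-or-fall-back-to-stack idea in patch 2 can be sketched in
userspace roughly as follows. This is only an illustration under stated
assumptions: struct work_item, consume(), start_work() and failing_alloc()
are hypothetical names, the allocator is passed in so that the failure
path can be exercised, and the consumer runs synchronously instead of in
a separate flusher thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical work item; not the kernel's struct bdi_work. */
struct work_item {
	long nr_pages;
	bool on_stack; /* the consumer must not free on-stack items */
	bool seen;     /* set once the consumer has read the arguments */
};

/* Stand-in for the flusher thread; here it runs synchronously. */
static long consume(struct work_item *w)
{
	long n = w->nr_pages;

	w->seen = true;
	if (!w->on_stack)
		free(w); /* heap items are owned by the consumer */
	return n;
}

/* The allocator is injectable so a test can force the fallback path. */
static long start_work(long nr_pages, void *(*alloc)(size_t))
{
	struct work_item stack_item, *w;
	bool on_stack;
	long ret;

	w = alloc(sizeof(*w));
	on_stack = (w == NULL);
	if (on_stack)
		w = &stack_item;
	w->nr_pages = nr_pages;
	w->on_stack = on_stack;
	w->seen = false;

	ret = consume(w);

	/*
	 * Only an on-stack item has to be "seen" before we return, since
	 * returning invalidates it. A heap item may already be freed here,
	 * so it is not touched again.
	 */
	if (on_stack)
		assert(stack_item.seen);
	return ret;
}

/* Allocator that always fails, to exercise the on-stack fallback. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```

Calling start_work(1024, malloc) takes the heap path, while
start_work(16, failing_alloc) takes the on-stack path. The sketch records
which path was taken before handing the item over, so it never compares a
pointer that the consumer may already have freed.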
[-- Attachment #4: 0003-writeback-mm-backing-dev.c-bdi_start_fn-should-use-b.patch --]
[-- Type: text/x-diff, Size: 992 bytes --]
From 81eabcf5ca618e2453d97a8822bc6b00fdad81c2 Mon Sep 17 00:00:00 2001
From: Jens Axboe <jens.axboe@oracle.com>
Date: Wed, 20 May 2009 10:53:44 +0200
Subject: [PATCH 3/3] writeback: mm/backing-dev.c:bdi_start_fn() should use bh disabling locks
bdi_lock is grabbed from softirq context, so we need to always use
bh disabling spinlocks. All the other callsites are OK, but this one
missed the _bh() postfix.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
mm/backing-dev.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index d45251f..60578bc 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -365,9 +365,9 @@ static int bdi_start_fn(void *ptr)
 	/*
 	 * Make us discoverable on the bdi_list again
 	 */
-	spin_lock(&bdi_lock);
+	spin_lock_bh(&bdi_lock);
 	list_add_tail_rcu(&bdi->bdi_list, &bdi_list);
-	spin_unlock(&bdi_lock);
+	spin_unlock_bh(&bdi_lock);

 	ret = bdi_writeback_task(wb);
--
1.6.3.9.g6345
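As a loose userspace analogy for patch 3 (an assumption for illustration,
not kernel code), a signal handler can play the softirq and blocking the
signal can play the bottom-half disabling done by spin_lock_bh(). All
model_* names are hypothetical. The handler wants the same lock as the
interrupted code, so the lock holder keeps the signal blocked for the
whole critical section.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_flag model_lock = ATOMIC_FLAG_INIT; /* stands in for bdi_lock */
static volatile sig_atomic_t handler_runs;
static volatile sig_atomic_t handler_found_lock_held;
static int model_list_len;

/* Plays the softirq: it takes the same lock as the process-context code. */
static void model_softirq(int sig)
{
	(void)sig;
	handler_runs++;
	if (atomic_flag_test_and_set(&model_lock)) {
		/* A real spinlock would spin here forever: self-deadlock. */
		handler_found_lock_held = 1;
		return;
	}
	atomic_flag_clear(&model_lock);
}

static int model_install_handler(void)
{
	struct sigaction sa;

	sa.sa_handler = model_softirq;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGUSR1, &sa, NULL);
}

/* Process context: keep the "softirq" blocked while the lock is held. */
static int model_list_add_bh(void)
{
	sigset_t block, old;

	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	sigprocmask(SIG_BLOCK, &block, &old);  /* roughly spin_lock_bh() */
	while (atomic_flag_test_and_set(&model_lock))
		;
	raise(SIGUSR1);                        /* arrives mid-section, stays pending */
	model_list_len++;
	atomic_flag_clear(&model_lock);
	sigprocmask(SIG_SETMASK, &old, NULL);  /* roughly spin_unlock_bh() */
	return handler_found_lock_held;
}
```

After model_install_handler(), a call to model_list_add_bh() returns 0
because the handler only runs once the lock has been dropped. Without the
two sigprocmask() calls, the handler would run inside the critical section
and find the lock held, which is the situation the _bh() variants avoid.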