* Re: [PATCH] aio: fix uncorrent dirty pages accouting when truncating AIO ring buffer [not found] <1414750042-20832-1-git-send-email-guz.fnst@cn.fujitsu.com> @ 2014-11-05 9:53 ` Gu Zheng 2014-11-05 14:56 ` Benjamin LaHaise 0 siblings, 1 reply; 4+ messages in thread From: Gu Zheng @ 2014-11-05 9:53 UTC (permalink / raw) To: bcrl, akpm; +Cc: Gu Zheng, linux-aio, m.koenigshaus, linux-kernel, stable ping... On 10/31/2014 06:07 PM, Gu Zheng wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=86831 > > Markus reported that when shutting down mysqld (with AIO support, > on a ext3 formatted Harddrive) leads to a negative number of dirty pages > (underrun to the counter). The negative number results in a drastic reduction > of the write performance because the page cache is not used, because the kernel > thinks it is still 2 ^ 32 dirty pages open. > > Add a warn trace in __dec_zone_state will catch this easily: > > static inline void __dec_zone_state(struct zone *zone, enum > zone_stat_item item) > { > atomic_long_dec(&zone->vm_stat[item]); > + WARN_ON_ONCE(item == NR_FILE_DIRTY && > atomic_long_read(&zone->vm_stat[item]) < 0); > atomic_long_dec(&vm_stat[item]); > } > > [ 21.341632] ------------[ cut here ]------------ > [ 21.346294] WARNING: CPU: 0 PID: 309 at include/linux/vmstat.h:242 > cancel_dirty_page+0x164/0x224() > [ 21.355296] Modules linked in: wutbox_cp sata_mv > [ 21.359968] CPU: 0 PID: 309 Comm: kworker/0:1 Not tainted 3.14.21-WuT #80 > [ 21.366793] Workqueue: events free_ioctx > [ 21.370760] [<c0016a64>] (unwind_backtrace) from [<c0012f88>] > (show_stack+0x20/0x24) > [ 21.378562] [<c0012f88>] (show_stack) from [<c03f8ccc>] > (dump_stack+0x24/0x28) > [ 21.385840] [<c03f8ccc>] (dump_stack) from [<c0023ae4>] > (warn_slowpath_common+0x84/0x9c) > [ 21.393976] [<c0023ae4>] (warn_slowpath_common) from [<c0023bb8>] > (warn_slowpath_null+0x2c/0x34) > [ 21.402800] [<c0023bb8>] (warn_slowpath_null) from [<c00c0688>] > (cancel_dirty_page+0x164/0x224) > [ 21.411524] [<c00c0688>] (cancel_dirty_page) from [<c00c080c>] > (truncate_inode_page+0x8c/0x158) > [ 21.420272] [<c00c080c>] (truncate_inode_page) from [<c00c0a94>] > (truncate_inode_pages_range+0x11c/0x53c) > [ 21.429890] [<c00c0a94>] (truncate_inode_pages_range) from > [<c00c0f6c>] (truncate_pagecache+0x88/0xac) > [ 21.439252] [<c00c0f6c>] (truncate_pagecache) from [<c00c0fec>] > (truncate_setsize+0x5c/0x74) > [ 21.447731] [<c00c0fec>] (truncate_setsize) from [<c013b3a8>] > (put_aio_ring_file.isra.14+0x34/0x90) > [ 21.456826] [<c013b3a8>] (put_aio_ring_file.isra.14) from > [<c013b424>] (aio_free_ring+0x20/0xcc) > [ 21.465660] [<c013b424>] (aio_free_ring) from [<c013b4f4>] > (free_ioctx+0x24/0x44) > [ 21.473190] [<c013b4f4>] (free_ioctx) from [<c003d8d8>] > (process_one_work+0x134/0x47c) > [ 21.481132] [<c003d8d8>] (process_one_work) from [<c003e988>] > (worker_thread+0x130/0x414) > [ 21.489350] [<c003e988>] (worker_thread) from [<c00448ac>] > (kthread+0xd4/0xec) > [ 21.496621] [<c00448ac>] (kthread) from [<c000ec18>] > (ret_from_fork+0x14/0x20) > [ 21.503884] ---[ end trace 79c4bf42c038c9a1 ]--- > > The cause is that we set the aio ring file pages as *DIRTY* via SetPageDirty > (bypasses the VFS dirty pages increment) when init, and aio fs uses > *default_backing_dev_info* as the backing dev, which does not disable > the dirty pages accounting capability. > So truncating aio ring file contributes to accounting dirty pages, > error occurs. > The original goal is keeping these pages in memory (can not be reclaimed > or swapped) in life-time via marking it dirty. But thinking more, > pinning pages can already achieve the goal, so the SetPageDirty seems > unnecessary. > In order to fix the issue, using the __set_page_dirty_no_writeback instead > of the nop .set_page_dirty, and dropped the SetPageDirty (don't manually > set the dirty flags, don't disable set_page_dirty(), rely on default behaviour). > And an aio private backing dev info (disabled the ACCT_DIRTY/WRITEBACK/ACCT_WB > capabilities) is introduced to replace the default one to avoid accounting > dirty pages. > > Reported-by: Markus Königshaus <m.koenigshaus@wut.de> > Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> > Cc: stable <stable@vger.kernel.org> > --- > fs/aio.c | 18 +++++++++++------- > 1 files changed, 11 insertions(+), 7 deletions(-) > > diff --git a/fs/aio.c b/fs/aio.c > index 84a7510..b23dc31 100644 > --- a/fs/aio.c > +++ b/fs/aio.c > @@ -165,6 +165,12 @@ static struct vfsmount *aio_mnt; > static const struct file_operations aio_ring_fops; > static const struct address_space_operations aio_ctx_aops; > > +static struct backing_dev_info aio_fs_backing_dev_info = { > + .name = "aiofs", > + .state = 0, > + .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK | BDI_CAP_MAP_COPY, > +}; > + > static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages) > { > struct qstr this = QSTR_INIT("[aio]", 5); > @@ -176,6 +182,7 @@ static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages) > > inode->i_mapping->a_ops = &aio_ctx_aops; > inode->i_mapping->private_data = ctx; > + inode->i_mapping->backing_dev_info = &aio_fs_backing_dev_info; > inode->i_size = PAGE_SIZE * nr_pages; > > path.dentry = d_alloc_pseudo(aio_mnt->mnt_sb, &this); > @@ -220,6 +227,9 @@ static int __init aio_setup(void) > if (IS_ERR(aio_mnt)) > panic("Failed to create aio fs mount."); > > + if (bdi_init(&aio_fs_backing_dev_info)) > + panic("Failed to init aio fs backing dev info."); > + > kiocb_cachep = KMEM_CACHE(kiocb, SLAB_HWCACHE_ALIGN|SLAB_PANIC); > kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC); > > @@ -281,11 +291,6 @@ static const struct file_operations aio_ring_fops = { > .mmap = aio_ring_mmap, > }; > > -static int aio_set_page_dirty(struct page *page) > -{ > - return 0; > -} > - > #if IS_ENABLED(CONFIG_MIGRATION) > static int aio_migratepage(struct address_space *mapping, struct page *new, > struct page *old, enum migrate_mode mode) > @@ -357,7 +362,7 @@ out: > #endif > > static const struct address_space_operations aio_ctx_aops = { > - .set_page_dirty = aio_set_page_dirty, > + .set_page_dirty = __set_page_dirty_no_writeback, > #if IS_ENABLED(CONFIG_MIGRATION) > .migratepage = aio_migratepage, > #endif > @@ -412,7 +417,6 @@ static int aio_setup_ring(struct kioctx *ctx) > pr_debug("pid(%d) page[%d]->count=%d\n", > current->pid, i, page_count(page)); > SetPageUptodate(page); > - SetPageDirty(page); > unlock_page(page); > > ctx->ring_pages[i] = page; ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] aio: fix uncorrent dirty pages accouting when truncating AIO ring buffer 2014-11-05 9:53 ` [PATCH] aio: fix uncorrent dirty pages accouting when truncating AIO ring buffer Gu Zheng @ 2014-11-05 14:56 ` Benjamin LaHaise 2014-11-05 21:00 ` Andrew Morton 0 siblings, 1 reply; 4+ messages in thread From: Benjamin LaHaise @ 2014-11-05 14:56 UTC (permalink / raw) To: Gu Zheng; +Cc: akpm, linux-aio, m.koenigshaus, linux-kernel, stable On Wed, Nov 05, 2014 at 05:53:11PM +0800, Gu Zheng wrote: > ping... I need someone a bit more familiar with this area of code to chime in on reviewing this. Andrew, can you provide any feedback on this fix? -ben > On 10/31/2014 06:07 PM, Gu Zheng wrote: > > > https://bugzilla.kernel.org/show_bug.cgi?id=86831 > > > > Markus reported that when shutting down mysqld (with AIO support, > > on a ext3 formatted Harddrive) leads to a negative number of dirty pages > > (underrun to the counter). The negative number results in a drastic reduction > > of the write performance because the page cache is not used, because the kernel > > thinks it is still 2 ^ 32 dirty pages open. > > > > Add a warn trace in __dec_zone_state will catch this easily: > > > > static inline void __dec_zone_state(struct zone *zone, enum > > zone_stat_item item) > > { > > atomic_long_dec(&zone->vm_stat[item]); > > + WARN_ON_ONCE(item == NR_FILE_DIRTY && > > atomic_long_read(&zone->vm_stat[item]) < 0); > > atomic_long_dec(&vm_stat[item]); > > } > > > > [ 21.341632] ------------[ cut here ]------------ > > [ 21.346294] WARNING: CPU: 0 PID: 309 at include/linux/vmstat.h:242 > > cancel_dirty_page+0x164/0x224() > > [ 21.355296] Modules linked in: wutbox_cp sata_mv > > [ 21.359968] CPU: 0 PID: 309 Comm: kworker/0:1 Not tainted 3.14.21-WuT #80 > > [ 21.366793] Workqueue: events free_ioctx > > [ 21.370760] [<c0016a64>] (unwind_backtrace) from [<c0012f88>] > > (show_stack+0x20/0x24) > > [ 21.378562] [<c0012f88>] (show_stack) from [<c03f8ccc>] > > (dump_stack+0x24/0x28) > > [ 21.385840] [<c03f8ccc>] (dump_stack) from [<c0023ae4>] > > (warn_slowpath_common+0x84/0x9c) > > [ 21.393976] [<c0023ae4>] (warn_slowpath_common) from [<c0023bb8>] > > (warn_slowpath_null+0x2c/0x34) > > [ 21.402800] [<c0023bb8>] (warn_slowpath_null) from [<c00c0688>] > > (cancel_dirty_page+0x164/0x224) > > [ 21.411524] [<c00c0688>] (cancel_dirty_page) from [<c00c080c>] > > (truncate_inode_page+0x8c/0x158) > > [ 21.420272] [<c00c080c>] (truncate_inode_page) from [<c00c0a94>] > > (truncate_inode_pages_range+0x11c/0x53c) > > [ 21.429890] [<c00c0a94>] (truncate_inode_pages_range) from > > [<c00c0f6c>] (truncate_pagecache+0x88/0xac) > > [ 21.439252] [<c00c0f6c>] (truncate_pagecache) from [<c00c0fec>] > > (truncate_setsize+0x5c/0x74) > > [ 21.447731] [<c00c0fec>] (truncate_setsize) from [<c013b3a8>] > > (put_aio_ring_file.isra.14+0x34/0x90) > > [ 21.456826] [<c013b3a8>] (put_aio_ring_file.isra.14) from > > [<c013b424>] (aio_free_ring+0x20/0xcc) > > [ 21.465660] [<c013b424>] (aio_free_ring) from [<c013b4f4>] > > (free_ioctx+0x24/0x44) > > [ 21.473190] [<c013b4f4>] (free_ioctx) from [<c003d8d8>] > > (process_one_work+0x134/0x47c) > > [ 21.481132] [<c003d8d8>] (process_one_work) from [<c003e988>] > > (worker_thread+0x130/0x414) > > [ 21.489350] [<c003e988>] (worker_thread) from [<c00448ac>] > > (kthread+0xd4/0xec) > > [ 21.496621] [<c00448ac>] (kthread) from [<c000ec18>] > > (ret_from_fork+0x14/0x20) > > [ 21.503884] ---[ end trace 79c4bf42c038c9a1 ]--- > > > > The cause is that we set the aio ring file pages as *DIRTY* via SetPageDirty > > (bypasses the VFS dirty pages increment) when init, and aio fs uses > > *default_backing_dev_info* as the backing dev, which does not disable > > the dirty pages accounting capability. > > So truncating aio ring file contributes to accounting dirty pages, > > error occurs. > > The original goal is keeping these pages in memory (can not be reclaimed > > or swapped) in life-time via marking it dirty. But thinking more, > > pinning pages can already achieve the goal, so the SetPageDirty seems > > unnecessary. > > In order to fix the issue, using the __set_page_dirty_no_writeback instead > > of the nop .set_page_dirty, and dropped the SetPageDirty (don't manually > > set the dirty flags, don't disable set_page_dirty(), rely on default behaviour). > > And an aio private backing dev info (disabled the ACCT_DIRTY/WRITEBACK/ACCT_WB > > capabilities) is introduced to replace the default one to avoid accounting > > dirty pages. > > > > Reported-by: Markus K�nigshaus <m.koenigshaus@wut.de> > > Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> > > Cc: stable <stable@vger.kernel.org> > > --- > > fs/aio.c | 18 +++++++++++------- > > 1 files changed, 11 insertions(+), 7 deletions(-) > > > > diff --git a/fs/aio.c b/fs/aio.c > > index 84a7510..b23dc31 100644 > > --- a/fs/aio.c > > +++ b/fs/aio.c > > @@ -165,6 +165,12 @@ static struct vfsmount *aio_mnt; > > static const struct file_operations aio_ring_fops; > > static const struct address_space_operations aio_ctx_aops; > > > > +static struct backing_dev_info aio_fs_backing_dev_info = { > > + .name = "aiofs", > > + .state = 0, > > + .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK | BDI_CAP_MAP_COPY, > > +}; > > + > > static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages) > > { > > struct qstr this = QSTR_INIT("[aio]", 5); > > @@ -176,6 +182,7 @@ static struct file *aio_private_file(struct kioctx *ctx, loff_t nr_pages) > > > > inode->i_mapping->a_ops = &aio_ctx_aops; > > inode->i_mapping->private_data = ctx; > > + inode->i_mapping->backing_dev_info = &aio_fs_backing_dev_info; > > inode->i_size = PAGE_SIZE * nr_pages; > > > > path.dentry = d_alloc_pseudo(aio_mnt->mnt_sb, &this); > > @@ -220,6 +227,9 @@ static int __init aio_setup(void) > > if (IS_ERR(aio_mnt)) > > panic("Failed to create aio fs mount."); > > > > + if (bdi_init(&aio_fs_backing_dev_info)) > > + panic("Failed to init aio fs backing dev info."); > > + > > kiocb_cachep = KMEM_CACHE(kiocb, SLAB_HWCACHE_ALIGN|SLAB_PANIC); > > kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC); > > > > @@ -281,11 +291,6 @@ static const struct file_operations aio_ring_fops = { > > .mmap = aio_ring_mmap, > > }; > > > > -static int aio_set_page_dirty(struct page *page) > > -{ > > - return 0; > > -} > > - > > #if IS_ENABLED(CONFIG_MIGRATION) > > static int aio_migratepage(struct address_space *mapping, struct page *new, > > struct page *old, enum migrate_mode mode) > > @@ -357,7 +362,7 @@ out: > > #endif > > > > static const struct address_space_operations aio_ctx_aops = { > > - .set_page_dirty = aio_set_page_dirty, > > + .set_page_dirty = __set_page_dirty_no_writeback, > > #if IS_ENABLED(CONFIG_MIGRATION) > > .migratepage = aio_migratepage, > > #endif > > @@ -412,7 +417,6 @@ static int aio_setup_ring(struct kioctx *ctx) > > pr_debug("pid(%d) page[%d]->count=%d\n", > > current->pid, i, page_count(page)); > > SetPageUptodate(page); > > - SetPageDirty(page); > > unlock_page(page); > > > > ctx->ring_pages[i] = page; > -- "Thought is the essence of where you are now." ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] aio: fix uncorrent dirty pages accouting when truncating AIO ring buffer 2014-11-05 14:56 ` Benjamin LaHaise @ 2014-11-05 21:00 ` Andrew Morton 2014-11-06 1:14 ` Gu Zheng 0 siblings, 1 reply; 4+ messages in thread From: Andrew Morton @ 2014-11-05 21:00 UTC (permalink / raw) To: Benjamin LaHaise; +Cc: Gu Zheng, linux-aio, m.koenigshaus, linux-kernel, stable On Wed, 5 Nov 2014 09:56:20 -0500 Benjamin LaHaise <bcrl@kvack.org> wrote: > On Wed, Nov 05, 2014 at 05:53:11PM +0800, Gu Zheng wrote: > > ping... > > I need someone a bit more familiar with this area of code to chime in on > reviewing this. Andrew, can you provide any feedback on this fix? That looks like a sensible solution and the changelog is good. A couple of suggested modifications: - Don't use the term "pinned" in the changelog without explaining it - it's a terribly ambiguous term. I assume here you mean "pinned by elevating the page's refcount"? - Could we have just one wee little code comment, explaining why aio_fs_backing_dev_info exists, what function it is serving? Put yourself in the position of a reader wondering "why is this here". ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] aio: fix uncorrent dirty pages accouting when truncating AIO ring buffer 2014-11-05 21:00 ` Andrew Morton @ 2014-11-06 1:14 ` Gu Zheng 0 siblings, 0 replies; 4+ messages in thread From: Gu Zheng @ 2014-11-06 1:14 UTC (permalink / raw) To: Andrew Morton Cc: Benjamin LaHaise, linux-aio, m.koenigshaus, linux-kernel, stable Hi Andrew, On 11/06/2014 05:00 AM, Andrew Morton wrote: > On Wed, 5 Nov 2014 09:56:20 -0500 Benjamin LaHaise <bcrl@kvack.org> wrote: > >> On Wed, Nov 05, 2014 at 05:53:11PM +0800, Gu Zheng wrote: >>> ping... >> >> I need someone a bit more familiar with this area of code to chime in on >> reviewing this. Andrew, can you provide any feedback on this fix? > > That looks like a sensible solution and the changelog is good. A > couple of suggested modifications: > > - Don't use the term "pinned" in the changelog without explaining it > - it's a terribly ambiguous term. I assume here you mean "pinned by > elevating the page's refcount"? Yes. > > - Could we have just one wee little code comment, explaining why > aio_fs_backing_dev_info exists, what function it is serving? Put > yourself in the position of a reader wondering "why is this here". > . Got it. Thanks for your suggestion. Will update it soon. Regards, Gu > ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2014-11-06 1:14 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1414750042-20832-1-git-send-email-guz.fnst@cn.fujitsu.com>
2014-11-05 9:53 ` [PATCH] aio: fix uncorrent dirty pages accouting when truncating AIO ring buffer Gu Zheng
2014-11-05 14:56 ` Benjamin LaHaise
2014-11-05 21:00 ` Andrew Morton
2014-11-06 1:14 ` Gu Zheng
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).