From: Wu Fengguang
Subject: Re: [PATCH 30/45] vmscan: lumpy pageout
Date: Thu, 8 Oct 2009 10:37:01 +0800
Message-ID: <20091008023701.GA20021@localhost>
In-Reply-To: <8acda98c0910070850x14614e0fh832f5cd29b1588f0@mail.gmail.com>
To: Nikita Danilov
Cc: Hugh Dickins, Andrew Morton, Theodore Tso, Christoph Hellwig,
    Dave Chinner, Chris Mason, Peter Zijlstra, "Li, Shaohua",
    Myklebust Trond, "jens.axboe@oracle.com", Jan Kara, Nick Piggin,
    "linux-fsdevel@vger.kernel.org"

On Wed, Oct 07, 2009 at 11:50:58PM +0800, Nikita Danilov wrote:
> 2009/10/7 Wu Fengguang:
>
> [...]
>
> > +                */
> > +               if (current_is_kswapd() &&
> > +                   bdi_cap_writeback_dirty(mapping->backing_dev_info) &&
> > +                   !mapping->a_ops->writepages) {
> > +                       wbc.range_start = (page->index + 1) << PAGE_CACHE_SHIFT;
> > +                       wbc.nr_to_write = LUMPY_PAGEOUT_PAGES - 1;
> > +                       generic_writepages(mapping, &wbc);
> > +                       iput(inode);
> > +               }
> > +
> > +               /*
>
> One potential problem with this is that generic_writepages() waits on
> page locks and this can stall kswapd (always bad). This can be worked
> around by replacing lock_page() with trylock_page() conditionally on
> wbc->for_reclaim (or wbc->nonblocking?), but then, this almost look
> like a separate function would be better.

IMHO trylock_page() is not necessary. Locked pages are rare under normal
conditions. kswapd already does lock_page() on every page whose state it
examines for reclaim, so it makes sense for lumpy pageout to follow the
same (simple) convention.

> On a good side, it seems I was wrong and pageout calls iput() already:
> shrink_slab()->prune_icache()->iput().

Not totally wrong ;) iput() will be called if __GFP_FS is on. However,
pageout may be called with either __GFP_FS or (__GFP_IO && PageSwapCache).
So I updated the patch to do lumpy pageout for __GFP_FS.

In the long term, it would be good to remove AOP_WRITEPAGE_ACTIVATE and
->writepage() totally, and to support shmem as well :)

Thanks,
Fengguang
---
vmscan: lumpy pageout

When paging out a dirty page, try to piggyback more consecutive dirty
pages (up to 512KB) to improve IO efficiency.

Only ext3/reiserfs, which don't have their own aops->writepages, are
supported in this initial version.

CC: Hugh Dickins
CC: Dave Chinner
CC: Nikita Danilov
Signed-off-by: Wu Fengguang
---
 mm/page-writeback.c |   12 ++++++++++++
 mm/vmscan.c         |   23 ++++++++++++++++++++++-
 2 files changed, 34 insertions(+), 1 deletion(-)

--- linux.orig/mm/vmscan.c	2009-10-08 07:35:06.000000000 +0800
+++ linux/mm/vmscan.c	2009-10-08 10:20:51.000000000 +0800
@@ -344,11 +344,14 @@ typedef enum {
 	PAGE_CLEAN,
 } pageout_t;
 
+#define LUMPY_PAGEOUT_PAGES	(512 * 1024 / PAGE_CACHE_SIZE)
+
 /*
  * pageout is called by shrink_page_list() for each dirty page.
  * Calls ->writepage().
  */
 static pageout_t pageout(struct page *page, struct address_space *mapping,
+						struct scan_control *sc,
 						enum pageout_io sync_writeback)
 {
 	/*
@@ -398,6 +401,10 @@ static pageout_t pageout(struct page *pa
 			.nonblocking = 1,
 			.for_reclaim = 1,
 		};
+		struct inode *inode = NULL;
+
+		if (sc->gfp_mask & __GFP_FS)
+			inode = igrab(mapping->host);
 
 		SetPageReclaim(page);
 		res = mapping->a_ops->writepage(page, &wbc);
@@ -405,10 +412,24 @@ static pageout_t pageout(struct page *pa
 			handle_write_error(mapping, page, res);
 		if (res == AOP_WRITEPAGE_ACTIVATE) {
 			ClearPageReclaim(page);
+			iput(inode);
 			return PAGE_ACTIVATE;
 		}
 
 		/*
+		 * only write_cache_pages() supports for_reclaim for now
+		 * ignore shmem for now, thanks to Nikita.
+		 */
+		if (current_is_kswapd() &&
+		    bdi_cap_writeback_dirty(mapping->backing_dev_info) &&
+		    !mapping->a_ops->writepages) {
+			wbc.range_start = (page->index + 1) << PAGE_CACHE_SHIFT;
+			wbc.nr_to_write = LUMPY_PAGEOUT_PAGES - 1;
+			generic_writepages(mapping, &wbc);
+			iput(inode);
+		}
+
+		/*
 		 * Wait on writeback if requested to. This happens when
 		 * direct reclaiming a large contiguous area and the
 		 * first attempt to free a range of pages fails.
@@ -684,7 +705,7 @@ static unsigned long shrink_page_list(st
 				goto keep_locked;
 
 			/* Page is dirty, try to write it out here */
-			switch (pageout(page, mapping, sync_writeback)) {
+			switch (pageout(page, mapping, sc, sync_writeback)) {
 			case PAGE_KEEP:
 				goto keep_locked;
 			case PAGE_ACTIVATE:
--- linux.orig/mm/page-writeback.c	2009-10-08 07:38:57.000000000 +0800
+++ linux/mm/page-writeback.c	2009-10-08 10:06:33.000000000 +0800
@@ -805,6 +805,11 @@ int write_cache_pages(struct address_spa
 				break;
 			}
 
+			if (wbc->for_reclaim && done_index != page->index) {
+				done = 1;
+				break;
+			}
+
 			if (nr_to_write != wbc->nr_to_write &&
 			    done_index + WB_SEGMENT_DIST < page->index &&
 			    --wbc->nr_segments <= 0) {
@@ -846,6 +851,13 @@ continue_unlock:
 			if (!clear_page_dirty_for_io(page))
 				goto continue_unlock;
 
+			/*
+			 * active and unevictable pages will be checked at
+			 * rotate time
+			 */
+			if (wbc->for_reclaim)
+				SetPageReclaim(page);
+
 			ret = (*writepage)(page, wbc, data);
 			if (unlikely(ret)) {
 				if (ret == AOP_WRITEPAGE_ACTIVATE) {
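
For readers following the thread, here is a rough userspace model of what the
two hunks are meant to do together; it is not kernel code. pageout() starts the
extra writeback at the page after the one just written and asks for at most
LUMPY_PAGEOUT_PAGES - 1 more pages, and the for_reclaim check in
write_cache_pages() ends the run at the first hole in the dirty range. The
4KB page size, the sorted array of dirty indices, and the names
PAGE_SHIFT_ASSUMED, lumpy_range_start() and lumpy_run() are assumptions made
for illustration only. The 64-bit cast in lumpy_range_start() is also a choice
of this sketch and does not appear in the patch.

```c
#include <stddef.h>

/* Assumption: 4KB pages; PAGE_CACHE_SIZE depends on the architecture. */
#define PAGE_SHIFT_ASSUMED	12
#define PAGE_SIZE_ASSUMED	(1UL << PAGE_SHIFT_ASSUMED)

/* Same arithmetic as the patch: 512KB worth of pages (128 with 4KB pages). */
#define LUMPY_PAGEOUT_PAGES	(512 * 1024 / PAGE_SIZE_ASSUMED)

/*
 * Byte offset at which the extra writeback starts: the page right after
 * the one pageout() has just written. The cast to a 64-bit type before
 * shifting is an addition of this sketch.
 */
static long long lumpy_range_start(unsigned long index)
{
	return (long long)(index + 1) << PAGE_SHIFT_ASSUMED;
}

/*
 * Hypothetical model of the for_reclaim rule: dirty[] holds the sorted
 * indices of dirty pages in one mapping. Count how many pages following
 * `first` would be written, stopping at the first gap or after
 * nr_to_write pages, whichever comes first.
 */
static size_t lumpy_run(const unsigned long *dirty, size_t n,
			unsigned long first, size_t nr_to_write)
{
	unsigned long expect = first + 1;	/* plays the role of done_index */
	size_t written = 0;
	size_t i;

	for (i = 0; i < n && written < nr_to_write; i++) {
		if (dirty[i] < expect)
			continue;		/* before range_start */
		if (dirty[i] != expect)
			break;			/* hole: end the lumpy run */
		written++;
		expect++;
	}
	return written;
}
```

With dirty pages at indices 10, 11, 12, 14 and 15, a pageout of page 10 would
pick up pages 11 and 12 and stop at the hole before 14. A caller would pass
LUMPY_PAGEOUT_PAGES - 1 as nr_to_write, mirroring wbc.nr_to_write in the patch.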