From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wu Fengguang
Subject: Re: [Lsf] IO less throttling and cgroup aware writeback (Was: Re: Preliminary Agenda and Activities for LSF)
Date: Thu, 21 Apr 2011 23:10:45 +0800
Message-ID: <20110421151044.GA24463@localhost>
References: <20110419143423.GC31712@redhat.com>
 <20110419144832.GB9556@quack.suse.cz>
 <20110419151111.GE31712@redhat.com>
 <20110419152239.GA30715@localhost>
 <20110419153106.GF31712@redhat.com>
 <20110419165838.GA2134@localhost>
 <20110419170543.GK31712@redhat.com>
 <20110420011638.GA4421@localhost>
 <20110420184433.GH29872@redhat.com>
 <20110421150618.GA22436@localhost>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="liOOAslEiF7prFVr"
Cc: Jan Kara, James Bottomley, "lsf@lists.linux-foundation.org", "linux-fsdevel@vger.kernel.org", Dave Chinner
To: Vivek Goyal
Return-path:
Received: from mga02.intel.com ([134.134.136.20]:4176 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754456Ab1DUPKx (ORCPT ); Thu, 21 Apr 2011 11:10:53 -0400
Content-Disposition: inline
In-Reply-To: <20110421150618.GA22436@localhost>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

--liOOAslEiF7prFVr
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Sorry, attached is the "separate ACCOUNTING from THROTTLING" patch.

> It's very possible to throttle meta data READS/WRITES, as long as they
> can be attributed to the original task (assuming task oriented throttling
> instead of bio/request oriented).
>
> The trick is to separate the concepts of THROTTLING and ACCOUNTING.
> You can ACCOUNT data and meta data reads/writes to the right task, and
> only THROTTLE the task when it's doing data reads/writes.
>
> FYI I played the same trick for balance_dirty_pages_ratelimited() for
> another reason: _accurate_ accounting of dirtied pages.
>
> That trick should play well with most applications that do interleaved
> data and meta data reads/writes.
> For the special case of "find", which
> does pure meta data reads, we can still throttle it by playing another
> trick: THROTTLE meta data reads/writes with a much higher threshold
> than that of data. So normal applications will almost always be
> throttled at data accesses while "find" will be throttled at meta data
> accesses.
>
> For a real example of how it works, you can check this patch (plus the
> attached one)
>
> writeback: IO-less balance_dirty_pages()
> http://git.kernel.org/?p=linux/kernel/git/wfg/writeback.git;a=commitdiff;h=e0de5e9961eeb992f305e877c5ef944fcd7a4269;hp=992851d56d79d227beaba1e4dcc657cbcf815556
>
> Where tsk->nr_dirtied does dirty ACCOUNTING and tsk->nr_dirtied_pause
> is the threshold for THROTTLING. When
>
>         tsk->nr_dirtied > tsk->nr_dirtied_pause
>
> the task will voluntarily enter balance_dirty_pages() to take a
> nap (pause time will be proportional to tsk->nr_dirtied), and when
> finished, start a new account-and-throttle period by resetting
> tsk->nr_dirtied and possibly adjusting tsk->nr_dirtied_pause for a more
> reasonable pause time at the next sleep.
>
> BTW, I'd like to advocate balance_dirty_pages() based IO controller :)
>
> As you may have noticed, it's not all that hard: the main functions
> blkcg_update_bandwidth()/blkcg_update_dirty_ratelimit() can fit nicely
> in one screen!
>
> writeback: async write IO controllers
> http://git.kernel.org/?p=linux/kernel/git/wfg/writeback.git;a=commitdiff;h=1a58ad99ce1f6a9df6618a4b92fa4859cc3e7e90;hp=5b6fcb3125ea52ff04a2fad27a51307842deb1a0
>
> Thanks,
> Fengguang

--liOOAslEiF7prFVr
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="writeback-accurate-task-dirtied.patch"

Subject: writeback: accurately account dirtied pages
Date: Thu Apr 14 07:52:37 CST 2011

Signed-off-by: Wu Fengguang
---
 mm/page-writeback.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- linux-next.orig/mm/page-writeback.c	2011-04-16 11:28:41.000000000 +0800
+++ linux-next/mm/page-writeback.c	2011-04-16 11:28:41.000000000 +0800
@@ -1352,8 +1352,6 @@ void balance_dirty_pages_ratelimited_nr(
 	if (!bdi_cap_account_dirty(bdi))
 		return;
 
-	current->nr_dirtied += nr_pages_dirtied;
-
 	if (dirty_exceeded_recently(bdi, MAX_PAUSE)) {
 		unsigned long max = current->nr_dirtied +
 						(128 >> (PAGE_SHIFT - 10));
@@ -1819,6 +1817,7 @@ void account_page_dirtied(struct page *p
 		__inc_bdi_stat(mapping->backing_dev_info, BDI_DIRTIED);
 		task_dirty_inc(current);
 		task_io_account_write(PAGE_CACHE_SIZE);
+		current->nr_dirtied++;
 	}
 }
 EXPORT_SYMBOL(account_page_dirtied);

--liOOAslEiF7prFVr--
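
The account-then-throttle scheme described in the mail can be sketched in
userspace C. This is only a rough illustration under stated assumptions, not
the kernel code: struct task_sim, account_page_dirtied_sim(),
balance_dirty_pages_sim() and simulate() are hypothetical names, and the
"every Nth dirtied page is a data page" access pattern is made up for the
example. Only the field names nr_dirtied and nr_dirtied_pause come from the
mail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two per-task counters named in the mail. */
struct task_sim {
	unsigned long nr_dirtied;       /* ACCOUNTING: pages dirtied this period */
	unsigned long nr_dirtied_pause; /* THROTTLING: threshold for taking a nap */
	unsigned long pauses;           /* number of naps taken so far */
};

struct sim_result {
	unsigned long pauses;   /* naps taken */
	unsigned long charged;  /* pages that the naps were charged for */
	unsigned long leftover; /* pages accounted but not yet charged */
};

/* Account one dirtied page; used for data and meta data alike. */
static void account_page_dirtied_sim(struct task_sim *t)
{
	assert(t != NULL);
	t->nr_dirtied++;
}

/*
 * Throttle check, reached only from the data path in this sketch.
 * Returns the page count the pause is charged for, or 0 if no pause.
 */
static unsigned long balance_dirty_pages_sim(struct task_sim *t)
{
	unsigned long charged;

	assert(t != NULL);
	if (t->nr_dirtied <= t->nr_dirtied_pause)
		return 0;

	charged = t->nr_dirtied; /* a real pause would scale with this */
	t->pauses++;
	t->nr_dirtied = 0;       /* start a new account-and-throttle period */
	return charged;
}

/*
 * Dirty 'pages' pages; assume every 'data_every'-th one is a data page,
 * which is the only place where the task may be throttled.
 */
static struct sim_result simulate(unsigned long threshold, int pages,
				  int data_every)
{
	struct task_sim t = { 0, threshold, 0 };
	struct sim_result r = { 0, 0, 0 };
	int i;

	for (i = 0; i < pages; i++) {
		account_page_dirtied_sim(&t);
		if (i % data_every == 0)
			r.charged += balance_dirty_pages_sim(&t);
	}
	r.pauses = t.pauses;
	r.leftover = t.nr_dirtied;
	return r;
}
```

With a threshold of 8 and a data access on every third page,
simulate(8, 30, 3) naps three times and charges 28 of the 30 accounted pages
to those naps; the remaining 2 carry over into the next period. Meta data
pages are never throttled directly, yet none of them escape the accounting.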