From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>,
linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Nick Piggin <nickpiggin@yahoo.com.au>
Subject: Re: [PATCH] fix task dirty balancing
Date: Sat, 05 Jul 2008 11:30:02 +0200 [thread overview]
Message-ID: <1215250202.6320.10.camel@lappy.programming.kicks-ass.net> (raw)
In-Reply-To: <20080705150401.8bd28b71.kamezawa.hiroyu@jp.fujitsu.com>
On Sat, 2008-07-05 at 15:04 +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 02 Jul 2008 22:27:18 +0200
> Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> > On Wed, 2008-07-02 at 17:26 +0900, YAMAMOTO Takashi wrote:
> > > hi,
> > >
> > > task_dirty_inc doesn't seem to be called properly for
> > > filesystems which don't use set_page_dirty for write(2).
> > > eg. ext2 w/o nobh option.
> >
> > I'm thinking this is an ext2 bug. So I'd rather it'd just call
> > set_page_dirty() like a proper filesystem instead of doing things like
> > this.
> >
> > And I certainly don't like exporting task_dirty_inc() - filesystems and
> > the like should not have to know about things like that.
> >
> Hmm, a bit complicated for me.
>
> At first, there are 2 __set_page_dirty() in the kernel.
> - mm/page-writeback.c: __set_page_dirty()
> .... set_page_dirty() calls this.
> - fs/buffer.c : __set_page_dirty()
> .... __set_page_dirty_buffers() and mark_buffer_dirty() calls this.
>
> Why per-task dirty acconitng is done in mm/page-writeback.c::set_page_dirty() ?
>
> It seems other accounting is done in the fs/buffer.c: __set_page_dirty()
>
> The purpose of task-dirty accounting is different from others ?
>
> = fs/buffer.c
> 697 static int __set_page_dirty(struct page *page,
> 698 struct address_space *mapping, int warn)
> 699 {
> 700 if (unlikely(!mapping))
> 701 return !TestSetPageDirty(page);
> 702
> 703 if (TestSetPageDirty(page))
> 704 return 0;
> 705
> 706 write_lock_irq(&mapping->tree_lock);
> 707 if (page->mapping) { /* Race with truncate? */
> 708 WARN_ON_ONCE(warn && !PageUptodate(page));
> 709
> 710 if (mapping_cap_account_dirty(mapping)) {
> 711 __inc_zone_page_state(page, NR_FILE_DIRTY);
> 712 __inc_bdi_stat(mapping->backing_dev_info,
> 713 BDI_RECLAIMABLE);
> 714 task_io_account_write(PAGE_CACHE_SIZE);
> 715 }
> 716 radix_tree_tag_set(&mapping->page_tree,
> 717 page_index(page), PAGECACHE_TAG_DIRTY);
> 718 }
> 719 write_unlock_irq(&mapping->tree_lock);
> 720 __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
> 721
> 722 return 1;
> ==
>
> And task-dirty-limit don't have to take care of following 2 case ?
> - __set_page_dirty_nobuffers(struct page *page) (increment BDI_RECRAIMABLE)
> - test_set_page_writeback() (increment BDI_RECLAIMABLE)
Gah - what a mess...
It's in set_page_dirty() so it wouldn't have to be in all the
a_ops->set_page_dirty() functions...
But now it turns out people don't use set_page_dirty() to dirty
pages :-(
For the purpose of task_dirty_inc() I guess we might as well pair it
with task_io_account_write() for each PAGE_CACHE_SIZE (and ignore the
DIO bit, since that doesn't care about the dirty limit anyway).
Might be my ignorance, but _why_ do we have __set_page_dirty_nobuffers()
reimplemented in fs/buffers.c:__set_page_dirty() ? - those two functions
look suspiciously similar.
Also, why was the EXPORT added anyway - fs/buffers.o never ends up in
modules?
Please beat me to cleaning up this stuff - otherwise I'll have to look
at it when I get back from holidays.
Peter
next prev parent reply other threads:[~2008-07-05 9:45 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-07-02 8:26 [PATCH] fix task dirty balancing YAMAMOTO Takashi
2008-07-02 20:27 ` Peter Zijlstra
2008-07-03 7:33 ` YAMAMOTO Takashi
2008-07-05 6:04 ` KAMEZAWA Hiroyuki
2008-07-05 9:30 ` Peter Zijlstra [this message]
2008-07-07 6:46 ` YAMAMOTO Takashi
2008-07-07 10:36 ` Nick Piggin
2008-07-08 23:38 ` YAMAMOTO Takashi
2008-07-09 7:41 ` Nick Piggin
2008-07-10 3:10 ` YAMAMOTO Takashi
2008-07-10 7:23 ` Nick Piggin
2008-07-24 0:27 ` YAMAMOTO Takashi
2008-07-24 15:08 ` Nick Piggin
2008-07-25 8:04 ` YAMAMOTO Takashi
2008-07-25 9:57 ` Peter Zijlstra
2008-07-28 5:50 ` YAMAMOTO Takashi
2008-07-28 7:24 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1215250202.6320.10.camel@lappy.programming.kicks-ass.net \
--to=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=nickpiggin@yahoo.com.au \
--cc=yamamoto@valinux.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.