From: Vivek Goyal <vgoyal@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Chris Mason <chris.mason@oracle.com>,
Andreas Dilger <adilger@dilger.ca>,
Justin TerAvest <teravest@google.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
m-ikeda <m-ikeda@ds.jp.nec.com>, jaxboe <jaxboe@fusionio.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
ryov <ryov@valinux.co.jp>, taka <taka@valinux.co.jp>,
"righi.andrea" <righi.andrea@gmail.com>,
guijianfeng <guijianfeng@cn.fujitsu.com>,
balbir <balbir@linux.vnet.ibm.com>,
ctalbott <ctalbott@google.com>, nauman <nauman@google.com>,
mrubin <mrubin@google.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [RFC] Storing cgroup id in page->private (Was: Re: [RFC] [PATCH 0/6] Provide cgroup isolation for buffered writes.)
Date: Thu, 10 Mar 2011 21:15:31 -0500 [thread overview]
Message-ID: <20110311021531.GA11710@redhat.com> (raw)
In-Reply-To: <20110311014618.GC15097@dastard>
On Fri, Mar 11, 2011 at 12:46:18PM +1100, Dave Chinner wrote:
> On Thu, Mar 10, 2011 at 04:43:31PM -0500, Chris Mason wrote:
> > Excerpts from Vivek Goyal's message of 2011-03-10 16:38:32 -0500:
> > > On Thu, Mar 10, 2011 at 02:24:07PM -0700, Andreas Dilger wrote:
> > > > On 2011-03-10, at 2:15 PM, Chris Mason wrote:
> > > > > Excerpts from Vivek Goyal's message of 2011-03-10 14:41:06 -0500:
> > > > >> On Thu, Mar 10, 2011 at 02:11:15PM -0500, Vivek Goyal wrote:
> > > > >>>>> I think the person who dirtied the page can store the information in
> > > > >>>>> page->private (assuming buffer heads were not generated) and if flusher
> > > > >>>>> thread later ends up generating buffer heads and ends up modifying
> > > > >>>>> page->private, this can be copied in buffer heads?
> > > > >>>>
> > > > >>>> This scares me a bit.
> > > > >>>>
> > > > >>>> As I understand it, fs/ code expects total ownership of page->private.
> > > > >>>> This adds a responsibility for every user to copy the data through and
> > > > >>>> store it in the buffer head (or anything else). btrfs seems to do
> > > > >>>> something entirely different in some cases and store a different kind
> > > > >>>> of value.
> > > > >>>
> > > > >>> If filesystems are using page->private for some other purpose also, then
> > > > >>> I guess we have issues.
> > > > >>>
> > > > >>> I am ccing linux-fsdevel to have some feedback on the idea of trying
> > > > >>> to store cgroup id of page dirtying thread in page->private and/or buffer
> > > > >>> head for tracking which group originally dirtied the page in IO controller
> > > > >>> during writeback.
> > > > >>
> > > > >> A quick "grep" showed that btrfs, ceph and logfs are using page->private
> > > > >> for other purposes also.
> > > > >>
> > > > >> I was under the impression that either page->private is null or it
> > > > >> points to buffer heads for the writeback case. So storing the info
> > > > >> directly in either buffer head directly or first in page->private and
> > > > >> then transferring it to buffer heads would have helped.
> > > > >
> > > > > Right, btrfs has its own uses for page->private, and we expect to own
> > > > > it. With a proper callback, the FS could store the extra information you
> > > > > need in out own structs.
> > > >
> > > > There is no requirement that page->private ever points to a
> > > > buffer_head, and Lustre clients use it for its own tracking
> > > > structure (never touching buffer_heads at all). Any
> > > > assumption about what a filesystem is storing in page->private
> > > > in other parts of the code is just broken.
> > >
> > > Andreas,
> > >
> > > As Chris mentioned, will providing callbacks so that filesystem
> > > can save/restore page->private be reasonable?
> >
> > Just to clarify, I think saving/restoring page->private is going
> > to be hard. I'd rather just have a call back that says here's a
> > page, storage this for the block io controller please, and another
> > one that returns any previously stored info.
>
> Agreed - there is absolutely no guarantee that some other thread
> doesn't grab the page while it is under writeback and dereference
> page->private expecting there to be buffer heads or some filesystem
> specific structure to be there. Hence swapping out the expected
> structure with something different is problematic.
>
> However, I think there's bigger issues. e.g. page->private might
> point to multiple bufferheads that map to non-contiguous disk blocks
> that were written by different threads - what happens if we get two
> concurrent IOs to the one page, perhaps with different cgroup IDs?
I guess in such cases we can afford to lose some accuracy and a
simple approximation can be the last writer's cgroup id is used
for whole page.
>
> Further, page->private might not even point to a per-page specific
> structure - it might point to a structure shared by multiple pages
> (e.g. an extent map). Adding a callback like this requires
> filesystems to be able to store per-page or per-block information
> for external users. Indeed, one of the areas of development in XFS
> right now is to move away from storing internal per-block/per-page
> information because of the memory overhead it causes.
Ok, if filesystem is trying to move away from per page information then
these kind of callbacks become a burden.
>
> IMO, if you really need some per-page information, then just put it
> in the struct page - you can't hide the memory overhead just by
> having the filesystem to store it for you. That just adds
> unnecessary complexity...
Ok. I guess adding anything to struct page is going to be hard and
we might have to fall back to looking into using page_cgroup for
tracking page state. I was trying to explore the ways so that we don't
have to instantiate whole page_cgroup structure just for trying
to figure out who dirtied the page.
Thanks
Vivek
next prev parent reply other threads:[~2011-03-11 2:15 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-03-08 21:20 [RFC] [PATCH 0/6] Provide cgroup isolation for buffered writes Justin TerAvest
2011-03-08 21:20 ` [PATCH 1/6] Add IO cgroup tracking " Justin TerAvest
2011-03-08 21:20 ` [PATCH 2/6] Make async queues per cgroup Justin TerAvest
2011-03-08 21:20 ` [PATCH 3/6] Modify CFQ to use IO tracking information Justin TerAvest
2011-03-08 21:20 ` [PATCH 4/6] With per-cgroup async, don't special case queues Justin TerAvest
2011-03-08 21:20 ` [PATCH 5/6] Add stat for per cgroup writeout done by flusher Justin TerAvest
2011-03-08 21:20 ` [PATCH 6/6] Per cgroup request descriptor counts Justin TerAvest
2011-03-08 22:43 ` [RFC] [PATCH 0/6] Provide cgroup isolation for buffered writes Vivek Goyal
2011-03-08 22:50 ` Vivek Goyal
2011-03-09 18:04 ` Justin TerAvest
2011-03-11 2:47 ` Vivek Goyal
2011-03-11 16:07 ` Justin TerAvest
2011-03-11 16:39 ` Vivek Goyal
2011-03-15 16:41 ` Justin TerAvest
2011-03-15 18:31 ` Vivek Goyal
2011-03-09 5:22 ` KAMEZAWA Hiroyuki
2011-03-09 15:19 ` Vivek Goyal
2011-03-09 18:05 ` Justin TerAvest
2011-03-10 18:08 ` Justin TerAvest
2011-03-10 18:15 ` Vivek Goyal
2011-03-10 18:57 ` Justin TerAvest
2011-03-10 19:11 ` [RFC] Storing cgroup id in page->private (Was: Re: [RFC] [PATCH 0/6] Provide cgroup isolation for buffered writes.) Vivek Goyal
2011-03-10 19:41 ` Vivek Goyal
2011-03-10 21:15 ` Chris Mason
2011-03-10 21:15 ` Chris Mason
2011-03-10 21:24 ` Andreas Dilger
2011-03-10 21:38 ` Vivek Goyal
2011-03-10 21:43 ` Chris Mason
2011-03-11 1:20 ` KAMEZAWA Hiroyuki
2011-03-11 1:46 ` Dave Chinner
2011-03-11 2:15 ` Vivek Goyal [this message]
2011-03-11 2:52 ` KAMEZAWA Hiroyuki
2011-03-11 3:15 ` Vivek Goyal
2011-03-11 3:13 ` KAMEZAWA Hiroyuki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110311021531.GA11710@redhat.com \
--to=vgoyal@redhat.com \
--cc=adilger@dilger.ca \
--cc=balbir@linux.vnet.ibm.com \
--cc=chris.mason@oracle.com \
--cc=ctalbott@google.com \
--cc=david@fromorbit.com \
--cc=guijianfeng@cn.fujitsu.com \
--cc=jaxboe@fusionio.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=m-ikeda@ds.jp.nec.com \
--cc=mrubin@google.com \
--cc=nauman@google.com \
--cc=righi.andrea@gmail.com \
--cc=ryov@valinux.co.jp \
--cc=taka@valinux.co.jp \
--cc=teravest@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.