All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Chris Mason <chris.mason@oracle.com>
Cc: Vivek Goyal <vgoyal@redhat.com>,
	Andreas Dilger <adilger@dilger.ca>,
	Justin TerAvest <teravest@google.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	m-ikeda <m-ikeda@ds.jp.nec.com>, jaxboe <jaxboe@fusionio.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	ryov <ryov@valinux.co.jp>, taka <taka@valinux.co.jp>,
	"righi.andrea" <righi.andrea@gmail.com>,
	guijianfeng <guijianfeng@cn.fujitsu.com>,
	balbir <balbir@linux.vnet.ibm.com>,
	ctalbott <ctalbott@google.com>, nauman <nauman@google.com>,
	mrubin <mrubin@google.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [RFC] Storing cgroup id in page->private (Was: Re: [RFC] [PATCH 0/6] Provide cgroup isolation for buffered writes.)
Date: Fri, 11 Mar 2011 12:46:18 +1100	[thread overview]
Message-ID: <20110311014618.GC15097@dastard> (raw)
In-Reply-To: <1299793340-sup-9066@think>

On Thu, Mar 10, 2011 at 04:43:31PM -0500, Chris Mason wrote:
> Excerpts from Vivek Goyal's message of 2011-03-10 16:38:32 -0500:
> > On Thu, Mar 10, 2011 at 02:24:07PM -0700, Andreas Dilger wrote:
> > > On 2011-03-10, at 2:15 PM, Chris Mason wrote:
> > > > Excerpts from Vivek Goyal's message of 2011-03-10 14:41:06 -0500:
> > > >> On Thu, Mar 10, 2011 at 02:11:15PM -0500, Vivek Goyal wrote:
> > > >>>>> I think the person who dirtied the page can store the information in
> > > >>>>> page->private (assuming buffer heads were not generated) and if flusher
> > > >>>>> thread later ends up generating buffer heads and ends up modifying
> > > >>>>> page->private, this can be copied in buffer heads?
> > > >>>> 
> > > >>>> This scares me a bit.
> > > >>>> 
> > > >>>> As I understand it, fs/ code expects total ownership of page->private.
> > > >>>> This adds a responsibility for every user to copy the data through and
> > > >>>> store it in the buffer head (or anything else). btrfs seems to do
> > > >>>> something entirely different in some cases and store a different kind
> > > >>>> of value.
> > > >>> 
> > > >>> If filesystems are using page->private for some other purpose also, then
> > > >>> I guess we have issues. 
> > > >>> 
> > > >>> I am ccing linux-fsdevel to have some feedback on the idea of trying
> > > >>> to store cgroup id of page dirtying thread in page->private and/or buffer
> > > >>> head for tracking which group originally dirtied the page in IO controller
> > > >>> during writeback.
> > > >> 
> > > >> A quick "grep" showed that btrfs, ceph and logfs are using page->private
> > > >> for other purposes also.
> > > >> 
> > > >> I was under the impression that either page->private is null or it 
> > > >> points to buffer heads for the writeback case. So storing the info
> > > >> directly in either buffer head directly or first in page->private and
> > > >> then transferring it to buffer heads would have helped. 
> > > > 
> > > > Right, btrfs has its own uses for page->private, and we expect to own
> > > > it.  With a proper callback, the FS could store the extra information you
> > > > need in out own structs.
> > > 
> > > There is no requirement that page->private ever points to a
> > > buffer_head, and Lustre clients use it for its own tracking
> > > structure (never touching buffer_heads at all).  Any
> > > assumption about what a filesystem is storing in page->private
> > > in other parts of the code is just broken.
> > 
> > Andreas,
> > 
> > As Chris mentioned, will providing callbacks so that filesystem
> > can save/restore page->private be reasonable?
> 
> Just to clarify, I think saving/restoring page->private is going
> to be hard.  I'd rather just have a call back that says here's a
> page, storage this for the block io controller please, and another
> one that returns any previously stored info.

Agreed - there is absolutely no guarantee that some other thread
doesn't grab the page while it is under writeback and dereference
page->private expecting there to be buffer heads or some filesystem
specific structure to be there. Hence swapping out the expected
structure with something different is problematic.

However, I think there's bigger issues. e.g.  page->private might
point to multiple bufferheads that map to non-contiguous disk blocks
that were written by different threads - what happens if we get two
concurrent IOs to the one page, perhaps with different cgroup IDs?

Further, page->private might not even point to a per-page specific
structure - it might point to a structure shared by multiple pages
(e.g. an extent map). Adding a callback like this requires
filesystems to be able to store per-page or per-block information
for external users. Indeed, one of the areas of development in XFS
right now is to move away from storing internal per-block/per-page
information because of the memory overhead it causes.

IMO, if you really need some per-page information, then just put it
in the struct page - you can't hide the memory overhead just by
having the filesystem to store it for you. That just adds
unnecessary complexity...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  parent reply	other threads:[~2011-03-11  1:46 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-03-08 21:20 [RFC] [PATCH 0/6] Provide cgroup isolation for buffered writes Justin TerAvest
2011-03-08 21:20 ` [PATCH 1/6] Add IO cgroup tracking " Justin TerAvest
2011-03-08 21:20 ` [PATCH 2/6] Make async queues per cgroup Justin TerAvest
2011-03-08 21:20 ` [PATCH 3/6] Modify CFQ to use IO tracking information Justin TerAvest
2011-03-08 21:20 ` [PATCH 4/6] With per-cgroup async, don't special case queues Justin TerAvest
2011-03-08 21:20 ` [PATCH 5/6] Add stat for per cgroup writeout done by flusher Justin TerAvest
2011-03-08 21:20 ` [PATCH 6/6] Per cgroup request descriptor counts Justin TerAvest
2011-03-08 22:43 ` [RFC] [PATCH 0/6] Provide cgroup isolation for buffered writes Vivek Goyal
2011-03-08 22:50   ` Vivek Goyal
2011-03-09 18:04     ` Justin TerAvest
2011-03-11  2:47       ` Vivek Goyal
2011-03-11 16:07         ` Justin TerAvest
2011-03-11 16:39           ` Vivek Goyal
2011-03-15 16:41             ` Justin TerAvest
2011-03-15 18:31               ` Vivek Goyal
2011-03-09  5:22 ` KAMEZAWA Hiroyuki
2011-03-09 15:19   ` Vivek Goyal
2011-03-09 18:05   ` Justin TerAvest
2011-03-10 18:08   ` Justin TerAvest
2011-03-10 18:15     ` Vivek Goyal
2011-03-10 18:57       ` Justin TerAvest
2011-03-10 19:11         ` [RFC] Storing cgroup id in page->private (Was: Re: [RFC] [PATCH 0/6] Provide cgroup isolation for buffered writes.) Vivek Goyal
2011-03-10 19:41           ` Vivek Goyal
2011-03-10 21:15             ` Chris Mason
2011-03-10 21:15               ` Chris Mason
2011-03-10 21:24               ` Andreas Dilger
2011-03-10 21:38                 ` Vivek Goyal
2011-03-10 21:43                   ` Chris Mason
2011-03-11  1:20                     ` KAMEZAWA Hiroyuki
2011-03-11  1:46                     ` Dave Chinner [this message]
2011-03-11  2:15                       ` Vivek Goyal
2011-03-11  2:52                         ` KAMEZAWA Hiroyuki
2011-03-11  3:15                           ` Vivek Goyal
2011-03-11  3:13                             ` KAMEZAWA Hiroyuki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110311014618.GC15097@dastard \
    --to=david@fromorbit.com \
    --cc=adilger@dilger.ca \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=chris.mason@oracle.com \
    --cc=ctalbott@google.com \
    --cc=guijianfeng@cn.fujitsu.com \
    --cc=jaxboe@fusionio.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=m-ikeda@ds.jp.nec.com \
    --cc=mrubin@google.com \
    --cc=nauman@google.com \
    --cc=righi.andrea@gmail.com \
    --cc=ryov@valinux.co.jp \
    --cc=taka@valinux.co.jp \
    --cc=teravest@google.com \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.