From: Josef Bacik <jbacik@fb.com>
To: Jan Kara <jack@suse.cz>
Cc: <linux-btrfs@vger.kernel.org>, <linux-fsdevel@vger.kernel.org>,
<kernel-team@fb.com>, <jack@suse.com>, <viro@zeniv.linux.org.uk>,
<dchinner@redhat.com>, <hch@lst.de>
Subject: Re: [PATCH 2/2] writeback: allow for dirty metadata accounting
Date: Wed, 10 Aug 2016 10:05:58 -0400 [thread overview]
Message-ID: <8d745233-4cfc-f0c2-9ba4-fee74eb1940e@fb.com> (raw)
In-Reply-To: <20160810100957.GC12157@quack2.suse.cz>
On 08/10/2016 06:09 AM, Jan Kara wrote:
> On Tue 09-08-16 15:08:27, Josef Bacik wrote:
>> Provide a mechanism for file systems to indicate how much dirty metadata they
>> are holding. This introduces a few things
>>
>> 1) Zone stats for dirty metadata, which is the same as the NR_FILE_DIRTY.
>> 2) WB stat for dirty metadata. This way we know if we need to try and call into
>> the file system to write out metadata. This could potentially be used in the
>> future to make balancing of dirty pages smarter.
>> 3) A super callback to handle writing back dirty metadata.
>>
>> A future patch will take advantage of this work in btrfs. Thanks,
>
> Hum, I once had a patch to allow filesystems to hook more into writeback
> where a filesystem was just asked to do writeback and it could decide what
> to do with it (it could use generic helpers to essentially replicate what
> current writeback code does) but it could also choose some smarter strategy
> of picking inodes to write. This scheme could easily accommodate your
> metadata writeback as well and there are also other uses for it. But that
> patch got broken by Tejun's cgroup aware writeback so one would have to
> start from scratch.
>
> We certainly have to think how to integrate this with cgroup aware
> writeback. I guess your ->writeback_metadata() just does not bother and would
> write anything in the root cgroup, right? After all you don't even pass the
> information for which memcg the metadata writeback should be performed down
> to the fs callback (that is encoded in the bdi_writeback structure). And
> for now I think we could get away with that although it would need to be
> handled properly in future I think.
>
I thought about this some but I'm not sure how to work it out so it's sane.
Currently no other file system's metadata is covered by the writeback cgroup.
Btrfs is simply by accident, we have an inode where all of our metadata is
attached. This doesn't make a whole lot of sense as the inode is tied to
whichever task dirited it last, so you are going to end up with weird writeback
behavior on btrfs metadata if you are using writeback cgroups. I think removing
this capability for now is actually better overall so we can come up with a
different solution.
> If we created a generic filesystem writeback callback as I suggest, proper
> integration with memcg writeback in unavoidable. But I have to think how to
> do that best.
So the reason I'm doing this is because the last time I tried to kill our btree
inode I got bogged down trying to reproduce our own special writeback logic for
metadata. I basically constantly oom'ed the box because we'd fill up memory
with dirty metadata, and then I started just wholesale copying
mm/page-writeback.c and mm/fs-writeback.c to try and stop the madness and gave
up because that was just as crazy.
I think that having writeback a little more modularized so file systems can be
smarter about picking inodes if they want is a good long term goal, but for now
I'd like to get this work in so I can go about killing our fs wide inode. Thanks,
Josef
next prev parent reply other threads:[~2016-08-10 18:03 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-08-09 19:08 [PATCH 0/2][RFC] Provide accounting for dirty metadata Josef Bacik
2016-08-09 19:08 ` [PATCH 1/2] remove mapping from balance_dirty_pages*() Josef Bacik
2016-08-09 19:30 ` kbuild test robot
2016-08-09 19:32 ` kbuild test robot
2016-08-09 20:12 ` kbuild test robot
2016-08-09 20:50 ` kbuild test robot
2016-08-10 8:27 ` Jan Kara
2016-08-10 8:29 ` Jan Kara
2016-08-10 19:56 ` Tejun Heo
2016-08-09 19:08 ` [PATCH 2/2] writeback: allow for dirty metadata accounting Josef Bacik
2016-08-10 10:09 ` Jan Kara
2016-08-10 14:05 ` Josef Bacik [this message]
2016-08-10 20:12 ` Tejun Heo
2016-08-10 21:16 ` Josef Bacik
2016-08-10 21:39 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8d745233-4cfc-f0c2-9ba4-fee74eb1940e@fb.com \
--to=jbacik@fb.com \
--cc=dchinner@redhat.com \
--cc=hch@lst.de \
--cc=jack@suse.com \
--cc=jack@suse.cz \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).