From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: chris.mason@oracle.com, xfs@oss.sgi.com
Subject: Re: [PATCH 5/5] xfs: kick inode writeback when low on memory
Date: Wed, 9 Mar 2011 16:46:57 +1100 [thread overview]
Message-ID: <20110309054657.GR15097@dastard> (raw)
In-Reply-To: <20110303161929.GA5795@infradead.org>
On Thu, Mar 03, 2011 at 11:19:29AM -0500, Christoph Hellwig wrote:
> On Thu, Mar 03, 2011 at 10:48:19AM -0500, Christoph Hellwig wrote:
> > I don't think we'll be able to get around chaning the dirty_inode
> > callback. We need a way to prevent the VFS from marking the inode
> > dirty, otherwise we have no chance of reclaiming it.
> >
> > Except for that I think it's really simple:
> >
> > 1) we need to reintroduce the i_update_size flag or an equivalent to
> > distinguish unlogged timestamp from unlogged size updates for fsync
> > vs fdatasync. At that point we can stop looking at the VFS dirty
> > bits in fsync.
> > 2) ->dirty_inode needs to tag the inode as dirty in the inode radix
> > tree
> >
> > With those minimal changes we should be set - we already
> > callxfs_sync_attr from the sync_fs path, and xfs_sync_inode_attr
> > properly picks up inodes with unlogged changes.
>
> Actually xfs_sync_attr does not get called from the sync path right now,
> which is a bit odd.
Right, and that is the root cause of the "filesystem doesn't idle"
problems that have been reported lately. As it is, I've taken the
approach of pushing the AIL every 30s rather than calling
xfs_sync_attr() as the method of avoiding this problem...
> But once we add it, possibly with an earlier
> trylock pass and/or an inode cluster read-ahead the above plan still
> stands.
I don't think that matters very much to the problem at hand.
> What's also rather odd is how much we use xfs_sync_data - unlike the
> inodes where our own code doing writeback based on disk order makes
> a lot of sense data is actually handled very well by the core writeback
> code. The two remaining callers of xfs_sync_data are
> xfs_flush_inodes_work and xfs_quiesce_data. The former area really
> belongs into this patchset - can you try what only calling
> writeback_inodes* from the ENOSPC handler instead of doing our own stuff
> does? It should give you the avoidance of double writeout for free, and
> get rid of one of xfs_sync_data callers.
Not odd at all - both are doing something the linux VFS has not been
able to do until recently. However, where-ever I've tried to use
writeback_inodes_sb_if_idle() in XFS has resulted in lockdep
complaints because it takes s_umount....
> After that we just need to look into xfs_quiesce_data. The core
> writeback code now does reliably writeback before calling into
> ->sync_fs, so the actual writeback should be superflous. We will still
> need a log force after it, and we might need an iteration through all
> inodes to do an xfs_ioend_wait, but this are can be simplified a lot.
I still don't fully trust the VFS data writeback to write all data
out in the case of freezing the filesystem, so I'm extremely wary of
dropping the data flushing that XFS is doing there.
And if we still have to do a xfs_ioend_wait() pass (which we do to
wait for direct io to complete), then all we are doing
is dropping 2 or 3 lines of code in xfs_sync_inode_data().
Hence I'm not really inclined to change either of these calls right
now as neither are critical to fixing the OOM problems.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
prev parent reply other threads:[~2011-03-09 5:44 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-02-22 22:16 [RFC, PATCH 0/5] xfs: Reduce OOM kill problems under heavy load Dave Chinner
2011-02-22 22:16 ` [PATCH 1/5] xfs: introduce inode cluster buffer trylocks for xfs_iflush Dave Chinner
2011-03-03 15:55 ` Christoph Hellwig
2011-03-03 22:04 ` Dave Chinner
2011-02-22 22:16 ` [PATCH 2/5] xfs: introduce a xfssyncd workqueue Dave Chinner
2011-02-22 22:16 ` [PATCH 3/5] xfs: convert ENOSPC inode flushing to use new syncd workqueue Dave Chinner
2011-03-03 15:34 ` Christoph Hellwig
2011-03-03 22:41 ` Dave Chinner
2011-03-04 12:40 ` Christoph Hellwig
2011-02-22 22:16 ` [PATCH 4/5] xfs: introduce background inode reclaim work Dave Chinner
2011-03-03 15:36 ` Christoph Hellwig
2011-03-03 22:43 ` Dave Chinner
2011-02-22 22:16 ` [PATCH 5/5] xfs: kick inode writeback when low on memory Dave Chinner
2011-03-02 3:06 ` Dave Chinner
2011-03-02 14:12 ` Christoph Hellwig
2011-03-03 2:42 ` Dave Chinner
2011-03-03 15:48 ` Christoph Hellwig
2011-03-03 16:19 ` Christoph Hellwig
2011-03-09 5:46 ` Dave Chinner [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110309054657.GR15097@dastard \
--to=david@fromorbit.com \
--cc=chris.mason@oracle.com \
--cc=hch@infradead.org \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox