Re: Write-back from inside FS - need suggestions

From: Andrew Morton <akpm@linux-foundation.org>
To: Artem Bityutskiy <dedekind@yandex.ru>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Write-back from inside FS - need suggestions
Date: Sat, 29 Sep 2007 13:00:11 -0700
Message-ID: <20070929130011.c3a11139.akpm@linux-foundation.org>
In-Reply-To: <46FEA332.9090904@yandex.ru>

On Sat, 29 Sep 2007 22:10:42 +0300 Artem Bityutskiy <dedekind@yandex.ru> wrote:

> Andrew Morton wrote:
> > I'd have thought that a suitable wrapper around a suitably-modified
> > sync_sb_inodes() would be appropriate for both filesystems?
> 
> Ok, I've modified sync_inodes_sb() so that I can pass it my own wbc,
> where I set wbc->nr_to_write = 20. It gives me _exactly_ what I want.
> It just flushes a bit more than 20 pages and returns. I use
> WB_SYNC_ALL. Great!

ok..

> Now I would like to understand why it works :-) To my surprise, it
> does not deadlock! I call it from ->prepare_write where I'm holding
> i_mutex, and it works just fine. It calls ->writepage() without trying
> to lock i_mutex! This looks like some witchcraft for me.

writepage under i_mutex is commonly done on the
sys_write->alloc_pages->direct-reclaim path.  It absolutely has to work,
and you'll be fine relying upon that.

However ->prepare_write() is called with the page locked, so you are
vulnerable to deadlocks there.  I suspect you got lucky because the page
which you're holding the lock on is not dirty in your testing.  But in
other applications (eg: 1k blocksize ext2/3/4) the page _can_ be dirty
while we're trying to allocate more blocks for it, in which case the
lock_page() deadlock can happen.

One approach might be to add another flag to writeback_control telling
write_cache_pages() to skip locked pages.  Or even put a page* into
writeback_control and change it to skip *this* page.
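
To make the second idea concrete, here is a minimal userspace sketch, not
kernel code.  Everything prefixed toy_ is a hypothetical stand-in invented
for illustration, including the skip_page field, which does not exist in
the real writeback_control.  The loop models a write_cache_pages()-style
walk: it writes dirty pages until the nr_to_write budget runs out, and
steps over the one page the caller already holds locked.  Because this
single-threaded model cannot actually block, hitting any other locked
page is reported as -1, standing in for the lock_page() deadlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel structures; NOT the real definitions. */
struct toy_page {
    bool dirty;
    bool locked;
};

struct toy_wbc {
    long nr_to_write;                  /* budget of pages left to write */
    const struct toy_page *skip_page;  /* hypothetical: page the caller holds locked */
};

/*
 * Walk the pages and "write" the dirty ones.
 * Returns the number of pages written, or -1 if the walk reached a
 * locked dirty page that it was not told to skip (the point where a
 * real lock_page() would sleep, possibly forever).
 */
static int toy_write_cache_pages(struct toy_page *pages, size_t n,
                                 struct toy_wbc *wbc)
{
    int written = 0;

    assert(pages != NULL && wbc != NULL);

    for (size_t i = 0; i < n && wbc->nr_to_write > 0; i++) {
        struct toy_page *p = &pages[i];

        if (!p->dirty)
            continue;
        if (p == wbc->skip_page)
            continue;           /* caller owns this lock: leave it dirty */
        if (p->locked)
            return -1;          /* would block in lock_page() */

        p->locked = true;       /* models lock_page() */
        p->dirty = false;       /* models ->writepage() */
        p->locked = false;      /* models unlock_page() */
        written++;
        wbc->nr_to_write--;
    }
    return written;
}
```

In this model the skipped page simply stays dirty and is picked up by a
later writeback pass.  The other variant, a flag that skips every locked
page, would replace the "return -1" with a "continue".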

> This means that if I'm in the middle of an operation on ino #X, I own
> its i_mutex, but not I_LOCK, I can be preempted and ->writepage can
> be called for a dirty page belonging to this inode #X?

yup.  Or another CPU can do the same.

> I haven't seen
> this in practice and I do not believe this may happen. Why?

Perhaps a heavier workload is needed.

There is code in the VFS which tries to prevent lots of CPUs from getting
in and fighting with each other (see writeback_acquire()) which will have
the effect of serialising things to some extent.  But writeback_acquire()
is causing scalability problems on monster IO systems and might be removed,
and it is only a partial thing - there are other ways in which concurrent
writeout can occur (fsync, sync, page reclaim, ...)

> Could you or someone please give me a hint what exactly
> inode->i_flags & I_LOCK protects?

err, it's basically an open-coded mutex via which one thread can get
exclusive access to some parts of an inode's internals.  Perhaps it could
literally be replaced with a mutex.  Exactly what I_LOCK protects has not
been documented afaik.  That would need to be reverse engineered :(

> What is its relationship to i_mutex?

On a regular file i_mutex is used mainly for protection of the data part of
the file, although it gets borrowed for other things, like protecting f_pos
of all the inode's file*'s.  I_LOCK is used to serialise access to a few
parts of the inode itself.
