public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Artem Bityutskiy <dedekind@yandex.ru>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Write-back from inside FS - need suggestions
Date: Sat, 29 Sep 2007 13:00:11 -0700	[thread overview]
Message-ID: <20070929130011.c3a11139.akpm@linux-foundation.org> (raw)
In-Reply-To: <46FEA332.9090904@yandex.ru>

On Sat, 29 Sep 2007 22:10:42 +0300 Artem Bityutskiy <dedekind@yandex.ru> wrote:

> Andrew Morton wrote:
> > I'd have thought that a suitable wrapper around a suitably-modified
> > sync_sb_inodes() would be appropriate for both filesystems?
> 
> Ok, I've modified sync_inodes_sb() so that I can pass it my own wbc,
> where I set wcb->nr_to_write = 20. It gives me _exactly_ what I want.
> It just flushes a bit more then 20 pages and returns. I use
> WB_SYNC_ALL. Great!

ok..

> Now I would like to understand why it works :-) To my surprise, it
> does not deadlock! I call it from ->prepare_write where I'm holding
> i_mutex, and it works just fine. It calls ->writepage() without trying
> to lock i_mutex! This looks like some witchcraft for me.

writepage under i_mutex is commonly done on the
sys_write->alloc_pages->direct-reclaim path.  It absolutely has to work,
and you'll be fine relying upon that.

However ->prepare_write() is called with the page locked, so you are
vulnerable to deadlocks there.  I suspect you got lucky because the page
which you're holding the lock on is not dirty in your testing.  But in
other applications (eg: 1k blocksize ext2/3/4) the page _can_ be dirty
while we're trying to allocate more blocks for it, in which case the
lock_page() deadlock can happen.

One approach might be to add another flag to writeback_control telling
write_cache_pages() to skip locked pages.  Or even put a page* into
wrietback_control and change it to skip *this* page.

> This means that if I'm in the middle of an operation or ino #X, I own
> its i_mutex, but not I_LOCK, I can be preempted and ->writepage can
> be called for a dirty page belonging to this inode #X?

yup.  Or another CPU can do the same.

> I haven't seen
> this in practice and I do not believe this may happen. Why?

Perhaps a heavier workload is needed.

There is code in the VFS which tries to prevent lots of CPUs from getting
in and fighting with each other (see writeback_acquire()) which will have
the effect of serialising things for some extent.  But writeback_acquire()
is causing scalability problems on monster IO systems and might be removed,
and it is only a partial thing - there are other ways in which concurrent
writeout can occur (fsync, sync, page reclaim, ...)

> Could you or someone please give me a hint what exactly
> inode->i_flags & I_LOCK protects?

err, it's basically an open-coded mutex via which one thread can get
exclusive access to some parts of an inode's internals.  Perhaps it could
literally be replaced with a mutex.  Exactly what I_LOCK protects has not
been documented afaik.  That would need to be reverse engineered :(

> What is its relationship to i_mutex?

On a regular file i_mutex is used mainly for protection of the data part of
the file, although it gets borrowed for other things, like protecting f_pos
of all the inode's file*'s.  I_LOCK is used to serialise access to a few
parts of the inode itself.


  reply	other threads:[~2007-09-29 20:00 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-09-28  9:16 Write-back from inside FS - need suggestions Artem Bityutskiy
2007-09-28 10:29 ` Andrew Morton
2007-09-29  9:56 ` Artem Bityutskiy
2007-09-29 10:39   ` Andrew Morton
2007-09-29 10:44     ` Artem Bityutskiy
2007-09-29 19:10     ` Artem Bityutskiy
2007-09-29 20:00       ` Andrew Morton [this message]
2007-09-30  8:40         ` Artem Bityutskiy
2007-09-30 20:24         ` Jörn Engel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070929130011.c3a11139.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=dedekind@yandex.ru \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox