From: "Stephen C. Tweedie" <sct@redhat.com>
To: Marcelo Tosatti <marcelo@conectiva.com.br>
Cc: "Stephen C. Tweedie" <sct@redhat.com>,
Linus Torvalds <torvalds@transmeta.com>,
Alexander Viro <viro@math.psu.edu>,
lkml <linux-kernel@vger.kernel.org>
Subject: Re: generic_osync_inode/ext2_fsync_inode still not safe
Date: Fri, 20 Apr 2001 21:27:27 +0100 [thread overview]
Message-ID: <20010420212727.B1042@redhat.com> (raw)
In-Reply-To: <20010417140928.C2505@redhat.com> <Pine.LNX.4.21.0104180632490.4382-100000@freak.distro.conectiva>
In-Reply-To: <Pine.LNX.4.21.0104180632490.4382-100000@freak.distro.conectiva>; from marcelo@conectiva.com.br on Wed, Apr 18, 2001 at 06:45:40AM -0300
Hi,
On Wed, Apr 18, 2001 at 06:45:40AM -0300, Marcelo Tosatti wrote:
> As far as I can see, you cannot guarantee that an inode which is unlocked
> _and_ clean (accordingly to the inode->i_state) is safely on disk.
>
> The reason for that are calls to sync_one() which write the inode
> asynchronously:
>
> sync_one(struct inode *inode, int sync) {
> ...
> /* Don't write the inode if only I_DIRTY_PAGES was set */
> if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC))
> write_inode(inode, sync); <-------------
...
> inode->i_state &= ~I_LOCK;
Right --- nasty. But not _too_ nasty, there's a moderately easy way
of dealing with this.
Basically we can't trust the i_state flag for this purpose if we are
allowing async IO to happen on inodes without having proper IO
completion callbacks marking the inodes as unlocked once they are firm
on disk. However, in this case the filesystem itself will know which
underlying buffer_head contains the inode data, and can check to see
if that buffer is locked and perform a wait if necessary.
This is somewhat unpleasant in that it may sometimes cause unnecessary
false sharing, given that we have multiple inodes in an inode block.
However, I can't see any simple way around that.
Linus, do you have any preference for how to deal with this? We can
either do it by adding a s_op->wait_inode() function to complement
write_inode(), and have a wait_inode() implementation in block-device
filesystems which does the buffer lookup and wait; or we can push that
whole logic into the filesystems, so that the I_DIRTY check is removed
from the VFS mid-layer altogether and the filesystem is responsible
for testing both the inode and buffer locked state when we try to wait
for outstanding inode IO to complete.
The second method is a bit cleaner conceptually but it results in more
code duplication.
Cheers,
Stephen
prev parent reply other threads:[~2001-04-20 20:33 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2001-04-14 10:24 generic_osync_inode/ext2_fsync_inode still not safe Marcelo Tosatti
2001-04-17 13:09 ` Stephen C. Tweedie
2001-04-18 9:45 ` Marcelo Tosatti
2001-04-20 20:27 ` Stephen C. Tweedie [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20010420212727.B1042@redhat.com \
--to=sct@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=marcelo@conectiva.com.br \
--cc=torvalds@transmeta.com \
--cc=viro@math.psu.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox