From: Jan Kara <jack@suse.cz>
To: Lukas Czerner <lczerner@redhat.com>
Cc: Jan Kara <jack@suse.cz>,
linux-ext4@vger.kernel.org, jlayton@kernel.org, tytso@mit.edu,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 2/2] fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE
Date: Fri, 29 Jul 2022 13:18:40 +0200
Message-ID: <20220729111840.a7qmh3vjtr662tvx@quack3>
In-Reply-To: <20220729085219.3mbn7vrrdsxvdcyf@fedora>
On Fri 29-07-22 10:52:19, Lukas Czerner wrote:
> On Thu, Jul 28, 2022 at 06:53:32PM +0200, Jan Kara wrote:
> > On Thu 28-07-22 15:39:14, Lukas Czerner wrote:
> > > Currently the I_DIRTY_TIME will never get set if the inode already has
> > > I_DIRTY_INODE with assumption that it supersedes I_DIRTY_TIME. That's
> > > true, however ext4 will only update the on-disk inode in
> > > ->dirty_inode(), not on actual writeback. As a result if the inode
> > > already has I_DIRTY_INODE state by the time we get to
> > > __mark_inode_dirty() only with I_DIRTY_TIME, the time was already filled
> > > into on-disk inode and will not get updated until the next I_DIRTY_INODE
> > > update, which might never come if we crash or get a power failure.
> > >
> > > The problem can be reproduced on ext4 by running xfstest generic/622
> > > with -o iversion mount option. Fix it by setting I_DIRTY_TIME even if
> > > the inode already has I_DIRTY_INODE.
>
> Hi Jan,
>
> thanks for the review.
>
> >
> > As a datapoint I've checked and XFS has the very same problem as ext4.
>
> Very interesting, I did look at xfs when I was debugging this problem
> and wasn't able to tell whether they have the same problem or not, but
> it certainly can't be reproduced by generic/622. Or at least I can't
> reproduce it on XFS.
>
> So I wonder what is XFS doing differently in that case.

OK, that's a bit curious, but xfs has xfs_fs_dirty_inode() that's there
exactly to update timestamps when the lazytime period expires. So in theory
it seems possible we lose the timestamp update.

> > > Also clear the I_DIRTY_TIME after ->dirty_inode() otherwise it may never
> > > get cleared.
> > >
> > > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > > ---
> > > fs/fs-writeback.c | 18 +++++++++++++++---
> > > 1 file changed, 15 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > > index 05221366a16d..174f01e6b912 100644
> > > --- a/fs/fs-writeback.c
> > > +++ b/fs/fs-writeback.c
> > > @@ -2383,6 +2383,11 @@ void __mark_inode_dirty(struct inode *inode, int flags)
> > >
> > > >  		/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
> > > >  		flags &= ~I_DIRTY_TIME;
> > > > +		if (inode->i_state & I_DIRTY_TIME) {
> > > > +			spin_lock(&inode->i_lock);
> > > > +			inode->i_state &= ~I_DIRTY_TIME;
> > > > +			spin_unlock(&inode->i_lock);
> > > > +		}
> >
> > Hum, so this is a bit dangerous because inode->i_state may be inconsistent
> > with the writeback list inode is queued in (wb->b_dirty_time) and these two
> > are supposed to be in sync. So I rather think we need to make sure we go
> > through the full round of 'update flags and writeback list' below in case
> > we need to clear I_DIRTY_TIME from inode->i_state.
>
> Ok, so we're clearing I_DIRTY_TIME in __ext4_update_other_inode_time()
> which will opportunistically update the time fields for inodes in the
> same block as the inode we're doing an update for via
> ext4_do_update_inode(). Don't we also need to rewire that differently?
>
> XFS is also clearing it on its own in log code, but I can't tell if it
> has the same problem as you describe here.

Yes, we'll possibly have clean inodes still on wb->b_dirty_time list.
Checking the code, this should be safe in the end.

But thinking more about the possible races, these two places clearing
I_DIRTY_TIME are safe because we copy timestamps to on-disk inode after
clearing I_DIRTY_TIME. But your clearing of I_DIRTY_TIME in
__mark_inode_dirty() could result in losing a timestamp update if it races
in the wrong way (basically the bug you're trying to fix would remain
unfixed).

Hum, thinking about it, even clearing of I_DIRTY_TIME later in
__mark_inode_dirty is problematic. There is still a race like:

CPU1                                    CPU2
__mark_inode_dirty(inode, I_DIRTY_TIME)
  sets I_DIRTY_TIME in inode->i_state
                                        __mark_inode_dirty(inode, I_DIRTY_SYNC)
                                          ->dirty_inode() - copies timestamps
__mark_inode_dirty(inode, I_DIRTY_TIME)
  I_DIRTY_TIME already set -> bail
                                        ...
                                        if (flags & I_DIRTY_INODE)
                                          inode->i_state &= ~I_DIRTY_TIME;

and we have just lost the second timestamp update.
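
The interleaving above can be replayed step by step in a small user-space
model. This is only a sketch under loud assumptions: struct toy_inode, the
toy_* helpers and the flag values are hypothetical stand-ins, not the kernel's
definitions, and the two CPUs are simulated by calling the steps in the racy
order from a single thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag value only; the real one lives in include/linux/fs.h. */
#define I_DIRTY_TIME (1u << 1)

/* Hypothetical toy inode, not the kernel's struct inode. */
struct toy_inode {
	unsigned int i_state;	/* dirty flags */
	long mem_time;		/* timestamp held in memory */
	long disk_time;		/* timestamp copied to the "on-disk" inode */
};

/*
 * Models the CPU1 side: update the in-memory timestamp, then mark the
 * inode with I_DIRTY_TIME. Returns false when the flag was already set
 * and the call bails out early.
 */
static bool toy_mark_time_dirty(struct toy_inode *inode, long now)
{
	inode->mem_time = now;
	if (inode->i_state & I_DIRTY_TIME)
		return false;
	inode->i_state |= I_DIRTY_TIME;
	return true;
}

/* Models ->dirty_inode() on the CPU2 side: copy timestamps "to disk". */
static void toy_dirty_inode(struct toy_inode *inode)
{
	inode->disk_time = inode->mem_time;
}

/* Replays the racy order; returns true when a timestamp update is lost. */
static bool toy_replay_lost_update(void)
{
	struct toy_inode inode = { 0 };
	bool queued;

	queued = toy_mark_time_dirty(&inode, 100);	/* CPU1: sets flag */
	assert(queued);
	toy_dirty_inode(&inode);			/* CPU2: copies 100 */
	queued = toy_mark_time_dirty(&inode, 200);	/* CPU1: bails */
	assert(!queued);
	inode.i_state &= ~I_DIRTY_TIME;			/* CPU2: clears flag */

	/* Memory is newer than disk and no flag remembers that. */
	return inode.disk_time != inode.mem_time &&
	       !(inode.i_state & I_DIRTY_TIME);
}
```

In this model toy_replay_lost_update() returns true: the on-disk copy holds
the first timestamp, memory holds the second, and I_DIRTY_TIME is clear.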
To fix this we'd need to clear I_DIRTY_TIME in inode->i_state before
calling ->dirty_inode() but that clashes with XFS' usage of ->dirty_inode
which uses I_DIRTY_TIME in inode->i_state to detect that timestamp update
is requested. I think we could do something like:

	if (flags & I_DIRTY_INODE) {
		/* Inode timestamp update will piggyback on this dirtying */
		if (inode->i_state & I_DIRTY_TIME) {
			spin_lock(&inode->i_lock);
			if (inode->i_state & I_DIRTY_TIME) {
				inode->i_state &= ~I_DIRTY_TIME;
				flags |= I_DIRTY_TIME;
			}
			spin_unlock(&inode->i_lock);
		}
		...
		if (sb->s_op->dirty_inode)
			sb->s_op->dirty_inode(inode,
				flags & (I_DIRTY_INODE | I_DIRTY_TIME));
		...
	}

And then XFS could check for I_DIRTY_TIME in flags to detect what it needs
to do.
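
To make the intended ordering concrete, here is a user-space sketch of the
proposal: the flag is moved from i_state into flags before the filesystem
callback runs, and the callback looks at flags instead of i_state. Everything
named toy_* is hypothetical, the flag values are illustrative, and the kernel's
i_lock is left out because the model is single threaded; it is not the actual
fs/fs-writeback.c or XFS code.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values only; the real ones live in include/linux/fs.h. */
#define I_DIRTY_SYNC (1u << 0)
#define I_DIRTY_TIME (1u << 1)

/* Hypothetical toy inode, not the kernel's struct inode. */
struct toy_inode {
	unsigned int i_state;
	long mem_time;			/* timestamp held in memory */
	long disk_time;			/* timestamp copied "to disk" */
	bool fs_saw_time_request;	/* callback saw I_DIRTY_TIME in flags */
};

/* CPU1 side: update the in-memory timestamp and set I_DIRTY_TIME. */
static void toy_mark_time_dirty(struct toy_inode *inode, long now)
{
	inode->mem_time = now;
	inode->i_state |= I_DIRTY_TIME;	/* no-op when already set */
}

/* CPU2 step 1: move I_DIRTY_TIME from i_state into flags (under i_lock
 * in the kernel; no lock is needed in this single-threaded model). */
static unsigned int toy_claim_time_flag(struct toy_inode *inode,
					unsigned int flags)
{
	if (inode->i_state & I_DIRTY_TIME) {
		inode->i_state &= ~I_DIRTY_TIME;
		flags |= I_DIRTY_TIME;
	}
	return flags;
}

/* CPU2 step 2: toy ->dirty_inode() that checks flags, not i_state. */
static void toy_dirty_inode(struct toy_inode *inode, unsigned int flags)
{
	if (flags & I_DIRTY_TIME)
		inode->fs_saw_time_request = true;
	inode->disk_time = inode->mem_time;
}

/*
 * Replays CPU2's I_DIRTY_SYNC dirtying with CPU1's second timestamp update
 * landing at position pos: 0 = before the claim, 1 = between claim and
 * callback, 2 = after the callback. Returns true when the callback saw the
 * timestamp request and the newest timestamp is either on disk or still
 * flagged with I_DIRTY_TIME.
 */
static bool toy_replay_fixed(int pos)
{
	struct toy_inode inode = { 0 };
	unsigned int flags = I_DIRTY_SYNC;

	toy_mark_time_dirty(&inode, 100);
	if (pos == 0)
		toy_mark_time_dirty(&inode, 200);
	flags = toy_claim_time_flag(&inode, flags);
	if (pos == 1)
		toy_mark_time_dirty(&inode, 200);
	toy_dirty_inode(&inode, flags);
	if (pos == 2)
		toy_mark_time_dirty(&inode, 200);

	return inode.fs_saw_time_request &&
	       (inode.disk_time == inode.mem_time ||
		(inode.i_state & I_DIRTY_TIME) != 0);
}
```

In this model all three positions keep the second update tracked: either it
is copied by the callback, or I_DIRTY_TIME is set again afterwards so a later
writeback still sees it.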
Hopefully now things are correct ;). Famous last words...
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR