From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: Jan Kara <jack@suse.cz>
Cc: Jan Kara <jack@suse.cz>, Ted Tso <tytso@mit.edu>,
linux-ext4@vger.kernel.org,
syzbot+47479b71cdfc78f56d30@syzkaller.appspotmail.com
Subject: Re: [PATCH] ext4: Fix warning in ext4_dio_write_end_io()
Date: Thu, 23 Nov 2023 15:17:28 +0530
Message-ID: <87ttpcrebz.fsf@doe.com>
In-Reply-To: <20231123084954.oegpgspqm37nnz2a@quack3>
Jan Kara <jack@suse.cz> writes:
> On Thu 23-11-23 12:37:03, Ritesh Harjani wrote:
>> Jan Kara <jack@suse.cz> writes:
>>
>> > The syzbot has reported that it can hit the warning in
>> > ext4_dio_write_end_io() because i_size < i_disksize. Indeed the
>> > reproducer creates a race between DIO IO completion and truncate
>> > expanding the file and thus ext4_dio_write_end_io() sees an inconsistent
>> > inode state where i_disksize is already updated but i_size is not
>> > updated yet. Since we are careful when setting up DIO write and consider
>> > it extending (and thus performing the IO synchronously with i_rwsem held
>> > exclusively) whenever it goes past either of i_size or i_disksize, we
>> > can use the same test during IO completion without risking entering
>> > ext4_handle_inode_extension() without i_rwsem held. This way we make it
>> > obvious both i_size and i_disksize are large enough when we report DIO
>> > completion without relying on unreliable WARN_ON.
>>
>> Does it make sense to add this in ext4_handle_inode_extension()?
>> WARN_ON_ONCE(!inode_is_locked(inode));
>> Ohk, we already have "lockdep_assert_held_write(&inode->i_rwsem)" so
>> hopefully it can catch via lockdep.
>
> Exactly.
>
>> So, IIUC, the WARN happened when we were doing a non-extending
>> AIO-DIO write which was racing with truncate trying to expand the file
>> size. Because only then the DIO completion will not have i_rwsem held
>> which can race with truncate. Truncate since it is expanding the file
>> size, will not use inode_dio_wait() (since no block allocations).
>>
>> Is this understanding correct?
>
> Yes, correct.
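
To make the race above concrete, here is a minimal userspace sketch of the two checks being discussed. It is not kernel code: struct size_snapshot and both function names are hypothetical stand-ins for the inode state that ext4_dio_write_end_io() reads, and the model assumes that an expanding truncate only ever grows the two sizes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sizes ext4 tracks per inode. */
struct size_snapshot {
	int64_t i_size;		/* in-memory file size */
	int64_t i_disksize;	/* file size as recorded on disk */
};

/* The condition tested by the WARN_ON_ONCE() that the patch removes. */
bool old_warn_would_fire(const struct size_snapshot *s)
{
	return s->i_size < s->i_disksize;
}

/*
 * The completion-time test from the patch, inverted: extension handling
 * is needed only when the write ends past either i_disksize or i_size.
 */
bool write_needs_extension(const struct size_snapshot *s,
			   int64_t pos, int64_t size)
{
	return !(pos + size <= s->i_disksize && pos + size <= s->i_size);
}
```

With a snapshot taken in the middle of an expanding truncate (say i_size = 8192 and i_disksize = 16384), old_warn_would_fire() is true even though nothing is wrong, while write_needs_extension() is still false for a 4096-byte write at pos 0, so the completion returns without entering the extension path.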
Thanks Jan,
Also, ext4_inode_extension_cleanup() can take care of deleting the inode
from the orphan list in case there is a race with a truncate which
extended both i_disksize and inode->i_size, so that the DIO completion
couldn't call ext4_handle_inode_extension(), right?
In that case, does it make sense to update the comment here too?
@@ -350,7 +350,10 @@ static void ext4_inode_extension_cleanup(struct inode *inode, ssize_t count)
 	}
 	/*
 	 * If i_disksize got extended due to writeback of delalloc blocks while
-	 * the DIO was running we could fail to cleanup the orphan list in
+	 * the DIO was running, or
+	 * if i_disksize and inode->i_size both got extended during truncate
+	 * which raced with DIO completion,
+	 * in both such cases, we could fail to cleanup the orphan list in
 	 * ext4_handle_inode_extension(). Do it now.
 	 */
 	if (!list_empty(&EXT4_I(inode)->i_orphan) && inode->i_nlink) {
-ritesh
>
> Honza
>
>>
>> >
>> > Reported-by: syzbot+47479b71cdfc78f56d30@syzkaller.appspotmail.com
>> > Fixes: 91562895f803 ("ext4: properly sync file size update after O_SYNC direct IO")
>> > Signed-off-by: Jan Kara <jack@suse.cz>
>> > ---
>> > fs/ext4/file.c | 7 ++++---
>> > 1 file changed, 4 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/fs/ext4/file.c b/fs/ext4/file.c
>> > index 0166bb9ca160..ba497aabdd1e 100644
>> > --- a/fs/ext4/file.c
>> > +++ b/fs/ext4/file.c
>> > @@ -386,10 +386,11 @@ static int ext4_dio_write_end_io(struct kiocb *iocb, ssize_t size,
>> >  	 * blocks. But the code in ext4_iomap_alloc() is careful to use
>> >  	 * zeroed/unwritten extents if this is possible; thus we won't leave
>> >  	 * uninitialized blocks in a file even if we didn't succeed in writing
>> > -	 * as much as we intended.
>> > +	 * as much as we intended. Also we can race with truncate or write
>> > +	 * expanding the file so we have to be a bit careful here.
>> >  	 */
>> > -	WARN_ON_ONCE(i_size_read(inode) < READ_ONCE(EXT4_I(inode)->i_disksize));
>> > -	if (pos + size <= READ_ONCE(EXT4_I(inode)->i_disksize))
>> > +	if (pos + size <= READ_ONCE(EXT4_I(inode)->i_disksize) &&
>> > +	    pos + size <= i_size_read(inode))
>> >  		return size;
>> >  	return ext4_handle_inode_extension(inode, pos, size);
>> >  }
>> > --
>> > 2.35.3
>>
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR