public inbox for linux-ext4@vger.kernel.org
From: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
To: Jan Kara <jack@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>, Jan Kara <jack@suse.com>,
	Tao Ma <boyu.mt@taobao.com>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Eric Biggers <ebiggers@google.com>,
	linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-dev@igalia.com,
	syzbot+0c89d865531d053abb2d@syzkaller.appspotmail.com,
	stable@vger.kernel.org
Subject: Re: [PATCH] ext4: inline: do not convert when writing to memory map
Date: Mon, 26 May 2025 11:10:42 -0300
Message-ID: <aDR2Yvy39Q-XgeAB@quatroqueijos.cascardo.eti.br>
In-Reply-To: <ixlyfqaobk4whctod5wwhusqeeduqxamni6zkxl2wdlbtcyms2@intsywwjfv25>

On Mon, May 26, 2025 at 03:43:31PM +0200, Jan Kara wrote:
> On Wed 21-05-25 18:52:03, Thadeu Lima de Souza Cascardo wrote:
> > On Tue, May 20, 2025 at 10:57:08AM -0400, Theodore Ts'o wrote:
> > > On Mon, May 19, 2025 at 07:42:46AM -0300, Thadeu Lima de Souza Cascardo wrote:
> > > > Inline data handling has a race between a regular write and a write
> > > > through a memory map.
> > > > 
> > > > When ext4_page_mkwrite is called, it calls ext4_convert_inline_data, which
> > > > destroys the inline data, but if block allocation fails, restores the
> > > > inline data. In that process, we could have:
> > > > 
> > > > CPU1					CPU2
> > > > destroy_inline_data
> > > > 					write_begin (does not see inline data)
> > > > restore_inline_data
> > > > 					write_end (sees inline data)
> > > > 
> > > > The conversion inside ext4_page_mkwrite was introduced in commit
> > > > 7b4cc9787fe3 ("ext4: evict inline data when writing to memory map"). This
> > > > fixes a documented bug in the commit message, which suggests some
> > > > alternative fixes.
> > > 
> > > Your fix just reverts commit 7b4cc9787fe3, and removes the BUG_ON.
> > > While this is great for shutting up the syzbot report, it causes
> > > file writes to an inline data file via mmap to never get written
> > > back to the storage device.  So you are replacing a BUG_ON that can get
> > > triggered on a race condition in case of a failed block allocation
> > > with silent data corruption.  This is not an improvement.
> > > 
> > > Thanks for trying to address this, but I'm not going to accept your
> > > proposed fix.
> > > 
> > >      	    	 	       	       - Ted
> > 
> > Hi, Ted.
> > 
> > I am trying to better understand the circumstances in which data loss
> > might occur with the fix but not without it. Even if it can occur either
> > way, understanding them would let me work on a better/proper fix.
> > 
> > Right now, if ext4_convert_inline_data (called from ext4_page_mkwrite)
> > fails with ENOSPC, the memory access will lead to a SIGBUS. The same will
> > happen without the fix, if there are no blocks available.
> > 
> > Now, without ext4_convert_inline_data, blocks will be allocated by
> > ext4_page_mkwrite and written by ext4_do_writepages. Are you concerned
> > about a failure between the clearing of the inode data and the writing of
> > the block in ext4_do_writepages?
> > 
> > Or are you concerned about a potential race condition when allocating
> > blocks?
> > 
> > Which of these cannot happen today with the code as is? If I understand
> > correctly, the inline conversion code also calls ext4_destroy_inline_data
> > before allocating and writing to blocks.
> > 
> > Thanks a lot for the review and guidance.
> 
> So I'm not sure what Ted was exactly worried about because writeback code
> should normally allocate underlying blocks for writeout of the mmaped page
> AFAICT. But the problem I can see is that clearing
> EXT4_STATE_MAY_INLINE_DATA requires i_rwsem held as otherwise we may be
> racing with e.g. write(2) and switching EXT4_STATE_MAY_INLINE_DATA in the
> middle of the write will cause bad things (inconsistency between how
> write_begin() and write_end() callbacks behave).
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

Thanks, Jan.

I later noticed as well that writepages is not holding the inode lock
either, so there would be a potential for a race condition there as well.

I have sent a v2 that I believe does not have this problem. But we should
probably clean up the handling of inline data in writepages as a follow-up.

Cascardo.
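
The inconsistency Jan describes, together with the interleaving from the
original patch description, can be illustrated with a small user-space C
sketch. Everything in it is a hypothetical toy model: struct toy_inode, its
may_inline field (standing in for EXT4_STATE_MAY_INLINE_DATA), and the toy_*
functions are invented for illustration and are not kernel code. The two
orderings are replayed sequentially rather than with real threads, so the
sketch only shows why write_begin and write_end can disagree when the flag
flips between them, and why serializing the conversion against the write
(as a common lock such as i_rwsem would) avoids that.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for inode state; NOT the kernel's struct inode. */
struct toy_inode {
	bool may_inline;	/* models EXT4_STATE_MAY_INLINE_DATA */
};

/* Conversion step: drop the inline data and clear the flag. */
static void toy_destroy_inline(struct toy_inode *ino)
{
	ino->may_inline = false;
}

/* Conversion rollback: block allocation failed, so set the flag again. */
static void toy_restore_inline(struct toy_inode *ino)
{
	assert(!ino->may_inline);	/* only meaningful after a destroy */
	ino->may_inline = true;
}

/* write_begin samples the flag to choose the inline or the block path. */
static bool toy_write_begin(const struct toy_inode *ino)
{
	return ino->may_inline;
}

/* write_end is consistent only if it sees the flag write_begin saw. */
static bool toy_write_end_consistent(const struct toy_inode *ino,
				     bool began_inline)
{
	return ino->may_inline == began_inline;
}

/* Ordering from the patch description: the failed conversion straddles
 * the write because nothing serializes the two paths. */
static bool toy_unlocked_interleaving(void)
{
	struct toy_inode ino = { .may_inline = true };
	bool began_inline;

	toy_destroy_inline(&ino);		/* CPU1 */
	began_inline = toy_write_begin(&ino);	/* CPU2: no inline data */
	toy_restore_inline(&ino);		/* CPU1: allocation failed */
	return toy_write_end_consistent(&ino, began_inline);	/* CPU2 */
}

/* Same steps, but the failed conversion finishes before the write
 * starts, which is the ordering a shared lock would enforce. */
static bool toy_serialized_interleaving(void)
{
	struct toy_inode ino = { .may_inline = true };
	bool began_inline;

	toy_destroy_inline(&ino);
	toy_restore_inline(&ino);
	began_inline = toy_write_begin(&ino);
	return toy_write_end_consistent(&ino, began_inline);
}
```

Under these assumptions, toy_unlocked_interleaving() reports an inconsistent
write and toy_serialized_interleaving() reports a consistent one. The sketch
does not model what the v2 patch actually does.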
