Re: [PATCH 3/5] ext4: abort journal on data writeback failure if in data_err=abort mode

From: Baokun Li <libaokun1@huawei.com>
To: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>, <tytso@mit.edu>,
	<adilger.kernel@dilger.ca>, <linux-kernel@vger.kernel.org>,
	<yi.zhang@huawei.com>, <yangerkun@huawei.com>,
	<libaokun@huaweicloud.com>
Subject: Re: [PATCH 3/5] ext4: abort journal on data writeback failure if in data_err=abort mode
Date: Fri, 20 Dec 2024 21:39:39 +0800
Message-ID: <47a46888-064f-4c7d-a554-30ba49c45bab@huawei.com>
In-Reply-To: <20241220103617.xkqmwkmk5inlq3dz@quack3>

On 2024/12/20 18:36, Jan Kara wrote:
> On Fri 20-12-24 14:07:55, libaokun@huaweicloud.com wrote:
>> From: Baokun Li <libaokun1@huawei.com>
>>
>> If we mount an ext4 fs with data_err=abort option, it should abort on
>> file data write error. But if the extent is unwritten, we won't add a
>> JI_WAIT_DATA bit to the inode, so jbd2 won't wait for the inode's data
>> to be written back and check the inode mapping for errors. The data
>> writeback failures are not sensed unless the log is watched or fsync
>> is called.
>>
>> Therefore, when data_err=abort is enabled, the journal is aborted when
>> an I/O error is detected in ext4_end_io_end() to make users who are
>> concerned about the contents of the file happy.
>>
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
Hi Honza,

Thank you for your review and feedback!
> I'm not opposed to this change but I think we should better define the
> expectations around data_err=abort.
Totally agree, the definition of this option is a bit vague right now.
Its semantics have changed implicitly from one kernel version to the next.

Originally, v2.6.28-rc1 commit 5bf5683a33f3 (“ext4: add an option to
control error handling on file data”) introduced “data_err=abort”. The
implementation of this mount option relies on JBD2_ABORT_ON_SYNCDATA_ERR,
and this flag takes effect when journal_finish_inode_data_buffers()
returns an error. At that point, ext4_write_end() in ordered mode added
the inode to the ordered data list whether the write was an append or an
overwrite. Therefore all write failures in ordered mode aborted the
journal. This is also the semantics in the documentation: “Abort the
journal if an error occurs in a file data buffer in ordered mode.”

That held until commit 06bd3c36a733 (“ext4: fix data exposure after a
crash”) in v4.7-rc1. Since that commit, in order to avoid stale data, we
only add inodes to the ordered data list when attaching freshly allocated
blocks to an inode using a written extent. As a result, only failed writes
that use written extents (aka dioread_lock) abort the journal in ordered
mode, and “data_err=abort” no longer takes effect in unwritten mode.
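
To make the two behaviours above concrete, here is a rough Python sketch.
It is a toy model, not kernel code. The names Era, Write, on_ordered_list
and aborts_journal are made up for illustration. Only the commit IDs and
the rules themselves come from the description above, and the model assumes
that a failure is noticed only when the inode is on the ordered data list.

```python
from enum import Enum


class Era(Enum):
    # Labels are shorthand for this sketch, not kernel identifiers.
    SINCE_5BF5683A33F3 = "v2.6.28-rc1"
    SINCE_06BD3C36A733 = "v4.7-rc1"


class Write(Enum):
    ALLOC_WRITTEN = "new blocks attached via a written extent"
    ALLOC_UNWRITTEN = "new blocks attached via an unwritten extent"
    OVERWRITE = "overwrite of already allocated blocks"


def on_ordered_list(era: Era, write: Write) -> bool:
    """Is the inode put on the ordered data list for this write?"""
    if era is Era.SINCE_5BF5683A33F3:
        # Appends and overwrites were both added in ordered mode.
        return True
    # After 06bd3c36a733: only freshly allocated blocks in a written extent.
    return write is Write.ALLOC_WRITTEN


def aborts_journal(era: Era, write: Write,
                   ordered: bool = True,
                   data_err_abort: bool = True) -> bool:
    """Would a failed data writeback of this kind abort the journal?"""
    return ordered and data_err_abort and on_ordered_list(era, write)


if __name__ == "__main__":
    for era in Era:
        for write in Write:
            print(era.value, "|", write.value, "|", aborts_journal(era, write))
```

Running it prints one line per (era, write kind) pair, which shows how the
set of failures that abort the journal shrank after v4.7-rc1.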

There are more historical changes to the relevant logic, so please
correct me if I'm missing something.
> For example the dependency on
> data=ordered is kind of strange and the current semantics of data_err=abort
> are hard to understand for admins (since they are mostly implementation
> defined). For example if IO error happens on data overwrites, the
> filesystem will not be aborted because we don't bother tracking such data
> as ordered (for performance reasons). Since you've apparently talked to people
> using this option: What are their expectations about the option?
>
> 								Honza
As was the original intent of introducing "data_err=abort", users who
use this option are concerned about corruption of critical data spreading
silently, that is, they are concerned that the data actually read does
not match the data written.

But as you said, we don't track overwrites for performance reasons.
Still, compared to the poor performance of journal_data and the risk of
drop_cache exposing stale data, not being able to sense data errors on
overwrites is acceptable.

With ‘data_err=abort’ enabled in dioread_nolock mode, after drop_cache
or remount the user will not see unexpected all-zero data in the
unwritten area, but rather the earlier consistent data. The data in
the file is trustworthy, at the cost of some trailing data.

On the other hand, adding a new written extent and converting an
unwritten extent to written both expose the data to the user, so the user
is concerned about whether the data is correct at that point.

In general, I think we can update the semantics of “data_err=abort” to
“Abort the journal if the file fails to write back data on extended writes
in ORDERED mode”. Do you have any thoughts on this?
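
As a rough illustration of that wording, here is a small Python predicate.
It is only a sketch of one reading of the proposal, not the patch itself.
Writeback and should_abort_journal are hypothetical names, and "extended
write" is assumed to mean a write that adds a new written extent or
converts an unwritten extent to written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Writeback:
    # True if the write adds a new written extent or converts an
    # unwritten extent to written, i.e. it exposes new data to readers.
    exposes_new_data: bool
    # True if the data writeback hit an I/O error.
    failed: bool


def should_abort_journal(data_mode: str, data_err: str, wb: Writeback) -> bool:
    """Proposed rule: abort only for failed extending writes in ordered mode."""
    return (
        data_mode == "ordered"
        and data_err == "abort"
        and wb.exposes_new_data
        and wb.failed
    )


if __name__ == "__main__":
    failed_extend = Writeback(exposes_new_data=True, failed=True)
    failed_overwrite = Writeback(exposes_new_data=False, failed=True)
    print(should_abort_journal("ordered", "abort", failed_extend))
    print(should_abort_journal("ordered", "abort", failed_overwrite))
```

Under this reading a failed overwrite still does not abort the journal,
which matches the performance trade-off mentioned earlier.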


Thanks,
Baokun
