From: Baokun Li <libaokun1@huawei.com>
To: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>, <tytso@mit.edu>,
<adilger.kernel@dilger.ca>, <linux-kernel@vger.kernel.org>,
<yi.zhang@huawei.com>, <yangerkun@huawei.com>,
<libaokun@huaweicloud.com>
Subject: Re: [PATCH 3/5] ext4: abort journal on data writeback failure if in data_err=abort mode
Date: Wed, 8 Jan 2025 22:44:42 +0800
Message-ID: <0820379d-7ed2-4aff-a243-0c92957331a6@huawei.com>
In-Reply-To: <dfoxg4aaolu6wknvh4644acbo3pvbtacwiztianjaol7zuf7vb@hbb7x2zitvwf>
Hello!
On 2025/1/8 21:43, Jan Kara wrote:
> On Wed 08-01-25 11:43:08, Baokun Li wrote:
>> On 2025/1/6 22:32, Jan Kara wrote:
>>>> But as you said, we don't track overwrite writes for performance reasons.
>>>> But compared to the poor performance of journal_data and the risk of the
>>>> drop cache exposing stale data, not being able to sense data errors on
>>>> overwrite
>>>> writes is acceptable.
>>>>
>>>> After enabling ‘data_err=abort’ in dioread_nolock mode, after drop_cache
>>>> or remount, the user will not see the unexpected all-zero data in the
>>>> unwritten area, but rather the earlier consistent data, and the data in
>>>> the file is trustworthy, at the cost of some trailing data.
>>>>
>>>> On the other hand, adding a new written extent and converting an
>>>> unwritten extent to written both expose the data to the user, so the user
>>>> is concerned about whether the data is correct at that point.
>>>>
>>>> In general, I think we can update the semantics of “data_err=abort” to,
>>>> “Abort the journal if the file fails to write back data on extended writes
>>>> in ORDERED mode”. Do you have any thoughts on this?
>>> I agree it makes sense to make the semantics of data_err=abort more
>>> obvious. Based on the usecase you've described - i.e., rather take the
>>> filesystem down on write IO error than risk returning old data later - it
>>> would make sense to me to also do this on direct IO writes.
>> Okay, I will update the semantics of data_err=abort in the next version.
>> For direct I/O writes, I think we don't need it because users can
>> perceive errors in time.
> So I agree that direct IO users will generally notice the IO error so the
> chances for bugs due to missing the IO error is low. But I think the
> question is really the other way around: Is there a good reason to make
> direct IO writes different? Because if I as a sysadmin want to secure a
> system from IO error handling bugs, then having to think whether some
> application uses direct IO or not is another nuisance. Why should I be
> bothered?
This is not quite right. Whether a write is buffered I/O (BIO) or direct
I/O (DIO), users will check the return value of the write operation,
because errors are not limited to the point where data reaches the disk.
It's just that when a DIO write returns successfully, users can be sure
that the data has been written to the disk.
However, when a BIO write returns successfully, it only means that the
data has been copied into the buffer. Whether it has been successfully
written back to the disk is unknown to the user.
That's why we need data_err=abort to ensure that users are aware when the
page writeback fails and to prevent data corruption from spreading.
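To illustrate the difference from the application side, here is a minimal
userspace sketch. The helper name write_durably() is made up for this
example and is not part of the patch set. It assumes a POSIX system and
only shows that a successful buffered write() needs a later fsync() before
the caller can learn about a writeback failure.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a buffer with buffered I/O, then ask the
 * kernel to write it back. Returns 0 on success or a negative errno.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	size_t left = len;
	int fd, err;

	assert(buf != NULL || len == 0);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -errno;

	while (left > 0) {
		ssize_t n = write(fd, p, left);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			err = errno;
			close(fd);
			return -err;
		}
		p += n;
		left -= (size_t)n;
	}

	/*
	 * At this point write() has only copied the data into the page
	 * cache. A writeback error, if any, is reported by fsync().
	 */
	if (fsync(fd) < 0) {
		err = errno;
		close(fd);
		return -err;
	}

	if (close(fd) < 0)
		return -errno;
	return 0;
}
```

An application that skips the fsync() step sees write() succeed and has no
further chance to notice that the page was never written back.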
>>> Also I would do
>>> this regardless of data=writeback/ordered/journalled mode because although
>>> users wanting data_err=abort behavior will also likely want the guarantees
>>> of data=ordered mode, these are two different things
>> For data=journal mode, the journal itself will abort when data is abnormal.
>> However, as you pointed out, the above bug may cause errors to be missed.
>> Therefore, we can perform this check by default for journaled files.
>>> and I can imagine use
>>> cases for setups with data=writeback and data_err=abort as well (e.g. for
>>> scratch filesystems which get recreated on each system startup).
>> Users using data=writeback often do not care about data consistency.
>> I did not understand your example. Could you please explain it in detail?
> Well, they don't care about data consistency after a crash. But they
> usually do care about data consistency while the system is running. And
> unhandled IO errors can lead to data consistency problems without crashing
> the system (for example if writeback fails and page gets evicted from
> memory later, you have lost the new data and may see old version of it).
I see your point. I concur that it is indeed meaningful for
data_err=abort to be supported in data=writeback mode.
Thank you for your explanation!
> And I see data_err=abort as a way to say: "I don't trust my applications to
> handle IO errors well. Rather take the filesystem down in that case than
> risk data consistency issues".
>
> Honza
I still prefer to think of this as a supplement for the case where users
cannot perceive page writeback failures in a timely manner. Relying on
fsync for that is complex, requires frequent waiting, and can still miss
errors.
In addition, because ext4_end_bio() runs in interrupt context, we can't
abort the journal directly there due to potential locking issues.
Instead, we now add write-back error checks and journal abortion logic
to ext4_end_io_end(), which is called by a kworker during unwritten
extent conversion.
Consequently, for modes that don't support unwritten extents (e.g.,
nodelalloc, journal_data, see ext4_should_dioread_nolock()), only the
check in journal_submit_data_buffers() will be effective. Should we
call the kworker for all files in ext4_end_bio()?
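As a rough userspace model of the split described above (this is not
kernel code, and every toy_* name is hypothetical), the completion
handler only records the error, and the abort decision is made later by
the worker:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for the journal: only tracks whether it was aborted. */
struct toy_journal {
	atomic_bool aborted;
	int abort_errno;
};

/* Toy stand-in for an io_end: carries the first writeback error. */
struct toy_io_end {
	atomic_int error;
	bool data_err_abort;	/* models the data_err=abort option */
	struct toy_journal *journal;
};

/*
 * Models the completion handler that runs in interrupt context:
 * it must not block, so it only records the first error seen.
 */
void toy_end_bio(struct toy_io_end *io, int bio_error)
{
	int expected = 0;

	if (bio_error)
		atomic_compare_exchange_strong(&io->error, &expected,
					       bio_error);
}

/*
 * Models the worker that runs in process context: here it is safe
 * to take locks, so the error check and the abort happen here.
 */
int toy_end_io_end(struct toy_io_end *io)
{
	int err = atomic_load(&io->error);

	if (err && io->data_err_abort) {
		/* Only the first caller records the abort reason. */
		if (!atomic_exchange(&io->journal->aborted, true))
			io->journal->abort_errno = err;
	}
	return err;
}
```

In this model, an I/O that never gets handed to the worker is never
checked, which is the gap for the modes without unwritten extents.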
Thanks again!
Regards,
Baokun