From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Jan Kara <jack@suse.cz>,
Zhang Yi <yi.zhang@huawei.com>,
Zhihao Cheng <chengzhihao1@huawei.com>,
Theodore Tso <tytso@mit.edu>
Subject: [PATCH 5.15 76/78] jbd2: recheck chechpointing non-dirty buffer
Date: Tue, 25 Jul 2023 12:47:07 +0200
Message-ID: <20230725104454.221311267@linuxfoundation.org>
In-Reply-To: <20230725104451.275227789@linuxfoundation.org>

From: Zhang Yi <yi.zhang@huawei.com>

commit c2d6fd9d6f35079f1669f0100f05b46708c74b7f upstream.

There is a long-standing metadata corruption issue that happens from
time to time, but it is very difficult to reproduce and analyse. With
the help of the JBD2_CYCLE_RECORD option, we found that the
checkpointing process fails to write out some buffers that are raced by
another do_get_write_access(). See below for details.

jbd2_log_do_checkpoint() //transaction X
 //buffer A is dirty and does not belong to any transaction
 __buffer_relink_io() //move it to the IO list
 __flush_batch()
  write_dirty_buffer()
                             do_get_write_access()
                             clear_buffer_dirty
                             __jbd2_journal_file_buffer()
                             //add buffer A to a new transaction Y
   lock_buffer(bh)
   //doesn't write out
 __jbd2_journal_remove_checkpoint()
 //finish checkpoint except buffer A
 //filesystem corrupt if the new transaction Y isn't fully written out.

The t_checkpoint_list walking loop in jbd2_log_do_checkpoint() already
handles waiting for buffers under IO and for buffers re-added to a new
transaction to complete commit, and it also removes cleaned buffers, so
the list will eventually become empty. It is therefore fine to leave
buffers on the t_checkpoint_list while flushing them out, and to stop
using the t_checkpoint_io_list completely.

Cc: stable@vger.kernel.org
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Tested-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
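The race in the commit message, and the effect of rechecking the buffer
instead of dropping it, can be modelled with a small userspace sketch.
This is not kernel code: struct toy_buf and every toy_* function are
hypothetical stand-ins invented for illustration, and the "wait for
transaction Y" step is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a journalled buffer; NOT the kernel's buffer_head. */
struct toy_buf {
	bool dirty;        /* still needs writeback to its final location */
	bool written;      /* checkpoint writeback reached the disk */
	bool in_new_tx;    /* refiled into a newer transaction Y */
	bool new_tx_done;  /* transaction Y has fully committed */
};

/* Models the racing do_get_write_access(): the dirty bit is cleared and
 * the buffer now belongs to transaction Y. */
static void toy_relog(struct toy_buf *b)
{
	b->dirty = false;
	b->in_new_tx = true;
}

/* Models write_dirty_buffer(): only buffers that are dirty get written. */
static void toy_flush(struct toy_buf *b)
{
	if (b->dirty) {
		b->written = true;
		b->dirty = false;
	}
}

/* True when transaction X may safely forget about the buffer. */
static bool toy_durable(const struct toy_buf *b)
{
	return b->in_new_tx ? b->new_tx_done : b->written;
}

/* Old scheme: after the flush the buffer is dropped from the IO list
 * without looking at its state again. Returns whether that was safe. */
bool toy_checkpoint_old(bool race)
{
	struct toy_buf b = { .dirty = true };

	if (race)
		toy_relog(&b);
	toy_flush(&b);
	return toy_durable(&b);
}

/* New scheme: the buffer stays on the checkpoint list and is rechecked
 * until it is either written or its new transaction has committed. */
bool toy_checkpoint_new(bool race)
{
	struct toy_buf b = { .dirty = true };

	if (race)
		toy_relog(&b);
	toy_flush(&b);
	while (!toy_durable(&b)) {
		if (b.in_new_tx)
			b.new_tx_done = true;	/* wait for Y to commit */
		else
			toy_flush(&b);
	}
	assert(toy_durable(&b));
	return true;
}
```

In this model only toy_checkpoint_old(true) reports an unsafe removal,
which corresponds to the "finish checkpoint except buffer A" step in the
diagram above.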
fs/jbd2/checkpoint.c | 102 ++++++++++++++-------------------------------------
1 file changed, 29 insertions(+), 73 deletions(-)
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -58,28 +58,6 @@ static inline void __buffer_unlink(struc
}
/*
- * Move a buffer from the checkpoint list to the checkpoint io list
- *
- * Called with j_list_lock held
- */
-static inline void __buffer_relink_io(struct journal_head *jh)
-{
- transaction_t *transaction = jh->b_cp_transaction;
-
- __buffer_unlink_first(jh);
-
- if (!transaction->t_checkpoint_io_list) {
- jh->b_cpnext = jh->b_cpprev = jh;
- } else {
- jh->b_cpnext = transaction->t_checkpoint_io_list;
- jh->b_cpprev = transaction->t_checkpoint_io_list->b_cpprev;
- jh->b_cpprev->b_cpnext = jh;
- jh->b_cpnext->b_cpprev = jh;
- }
- transaction->t_checkpoint_io_list = jh;
-}
-
-/*
* Check a checkpoint buffer could be release or not.
*
* Requires j_list_lock
@@ -183,6 +161,7 @@ __flush_batch(journal_t *journal, int *b
struct buffer_head *bh = journal->j_chkpt_bhs[i];
BUFFER_TRACE(bh, "brelse");
__brelse(bh);
+ journal->j_chkpt_bhs[i] = NULL;
}
*batch_count = 0;
}
@@ -242,6 +221,11 @@ restart:
jh = transaction->t_checkpoint_list;
bh = jh2bh(jh);
+ /*
+ * The buffer may be writing back, or flushing out in the
+ * last couple of cycles, or re-adding into a new transaction,
+ * need to check it again until it's unlocked.
+ */
if (buffer_locked(bh)) {
get_bh(bh);
spin_unlock(&journal->j_list_lock);
@@ -287,28 +271,32 @@ restart:
}
if (!buffer_dirty(bh)) {
BUFFER_TRACE(bh, "remove from checkpoint");
- if (__jbd2_journal_remove_checkpoint(jh))
- /* The transaction was released; we're done */
+ /*
+ * If the transaction was released or the checkpoint
+ * list was empty, we're done.
+ */
+ if (__jbd2_journal_remove_checkpoint(jh) ||
+ !transaction->t_checkpoint_list)
goto out;
- continue;
+ } else {
+ /*
+ * We are about to write the buffer, it could be
+ * raced by some other transaction shrink or buffer
+ * re-log logic once we release the j_list_lock,
+ * leave it on the checkpoint list and check status
+ * again to make sure it's clean.
+ */
+ BUFFER_TRACE(bh, "queue");
+ get_bh(bh);
+ J_ASSERT_BH(bh, !buffer_jwrite(bh));
+ journal->j_chkpt_bhs[batch_count++] = bh;
+ transaction->t_chp_stats.cs_written++;
+ transaction->t_checkpoint_list = jh->b_cpnext;
}
- /*
- * Important: we are about to write the buffer, and
- * possibly block, while still holding the journal
- * lock. We cannot afford to let the transaction
- * logic start messing around with this buffer before
- * we write it to disk, as that would break
- * recoverability.
- */
- BUFFER_TRACE(bh, "queue");
- get_bh(bh);
- J_ASSERT_BH(bh, !buffer_jwrite(bh));
- journal->j_chkpt_bhs[batch_count++] = bh;
- __buffer_relink_io(jh);
- transaction->t_chp_stats.cs_written++;
+
if ((batch_count == JBD2_NR_BATCH) ||
- need_resched() ||
- spin_needbreak(&journal->j_list_lock))
+ need_resched() || spin_needbreak(&journal->j_list_lock) ||
+ jh2bh(transaction->t_checkpoint_list) == journal->j_chkpt_bhs[0])
goto unlock_and_flush;
}
@@ -322,38 +310,6 @@ restart:
goto restart;
}
- /*
- * Now we issued all of the transaction's buffers, let's deal
- * with the buffers that are out for I/O.
- */
-restart2:
- /* Did somebody clean up the transaction in the meanwhile? */
- if (journal->j_checkpoint_transactions != transaction ||
- transaction->t_tid != this_tid)
- goto out;
-
- while (transaction->t_checkpoint_io_list) {
- jh = transaction->t_checkpoint_io_list;
- bh = jh2bh(jh);
- if (buffer_locked(bh)) {
- get_bh(bh);
- spin_unlock(&journal->j_list_lock);
- wait_on_buffer(bh);
- /* the journal_head may have gone by now */
- BUFFER_TRACE(bh, "brelse");
- __brelse(bh);
- spin_lock(&journal->j_list_lock);
- goto restart2;
- }
-
- /*
- * Now in whatever state the buffer currently is, we
- * know that it has been written out and so we can
- * drop it from the list
- */
- if (__jbd2_journal_remove_checkpoint(jh))
- break;
- }
out:
spin_unlock(&journal->j_list_lock);
result = jbd2_cleanup_journal_tail(journal);