public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Ming-Hung Tsai <mtsai@redhat.com>,
	Mikulas Patocka <mpatocka@redhat.com>,
	Sasha Levin <sashal@kernel.org>,
	agk@redhat.com, snitzer@kernel.org, dm-devel@lists.linux.dev
Subject: [PATCH AUTOSEL 5.4 32/79] dm cache: prevent BUG_ON by blocking retries on failed device resumes
Date: Mon,  5 May 2025 19:21:04 -0400	[thread overview]
Message-ID: <20250505232151.2698893-32-sashal@kernel.org> (raw)
In-Reply-To: <20250505232151.2698893-1-sashal@kernel.org>

From: Ming-Hung Tsai <mtsai@redhat.com>

[ Upstream commit 5da692e2262b8f81993baa9592f57d12c2703dea ]

A cache device failing to resume due to mapping errors should not be
retried, as the failure leaves a partially initialized policy object.
Repeating the resume operation risks triggering BUG_ON when reloading
cache mappings into the incomplete policy object.

Reproduce steps:

1. create a cache metadata consisting of 512 or more cache blocks,
   with some mappings stored in the first array block of the mapping
   array. Here we use cache_restore v1.0 to build the metadata.

cat <<EOF >> cmeta.xml
<superblock uuid="" block_size="128" nr_cache_blocks="512" \
policy="smq" hint_width="4">
  <mappings>
    <mapping cache_block="0" origin_block="0" dirty="false"/>
  </mappings>
</superblock>
EOF
dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
cache_restore -i cmeta.xml -o /dev/mapper/cmeta --metadata-version=2
dmsetup remove cmeta

2. wipe the second array block of the mapping array to simulate
   data degradations.

mapping_root=$(dd if=/dev/sdc bs=1c count=8 skip=192 \
2>/dev/null | hexdump -e '1/8 "%u\n"')
ablock=$(dd if=/dev/sdc bs=1c count=8 skip=$((4096*mapping_root+2056)) \
2>/dev/null | hexdump -e '1/8 "%u\n"')
dd if=/dev/zero of=/dev/sdc bs=4k count=1 seek=$ablock

3. try bringing up the cache device. The resume is expected to fail
   due to the broken array block.

dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
dmsetup create cdata --table "0 65536 linear /dev/sdc 8192"
dmsetup create corig --table "0 524288 linear /dev/sdc 262144"
dmsetup create cache --notable
dmsetup load cache --table "0 524288 cache /dev/mapper/cmeta \
/dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0"
dmsetup resume cache

4. try resuming the cache again. An unexpected BUG_ON is triggered
   while loading cache mappings.

dmsetup resume cache

Kernel logs:

(snip)
------------[ cut here ]------------
kernel BUG at drivers/md/dm-cache-policy-smq.c:752!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 332 Comm: dmsetup Not tainted 6.13.4 #3
RIP: 0010:smq_load_mapping+0x3e5/0x570

Fix by disallowing resume operations for devices that failed the
initial attempt.

Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/md/dm-cache-target.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index c1d2e3376afcd..0aa22a994c86a 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -2996,6 +2996,27 @@ static dm_cblock_t get_cache_dev_size(struct cache *cache)
 	return to_cblock(size);
 }
 
+static bool can_resume(struct cache *cache)
+{
+	/*
+	 * Disallow retrying the resume operation for devices that failed the
+	 * first resume attempt, as the failure leaves the policy object partially
+	 * initialized. Retrying could trigger BUG_ON when loading cache mappings
+	 * into the incomplete policy object.
+	 */
+	if (cache->sized && !cache->loaded_mappings) {
+		if (get_cache_mode(cache) != CM_WRITE)
+			DMERR("%s: unable to resume a failed-loaded cache, please check metadata.",
+			      cache_device_name(cache));
+		else
+			DMERR("%s: unable to resume cache due to missing proper cache table reload",
+			      cache_device_name(cache));
+		return false;
+	}
+
+	return true;
+}
+
 static bool can_resize(struct cache *cache, dm_cblock_t new_size)
 {
 	if (from_cblock(new_size) > from_cblock(cache->cache_size)) {
@@ -3044,6 +3065,9 @@ static int cache_preresume(struct dm_target *ti)
 	struct cache *cache = ti->private;
 	dm_cblock_t csize = get_cache_dev_size(cache);
 
+	if (!can_resume(cache))
+		return -EINVAL;
+
 	/*
 	 * Check to see if the cache has resized.
 	 */
-- 
2.39.5


  parent reply	other threads:[~2025-05-05 23:22 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-05 23:20 [PATCH AUTOSEL 5.4 01/79] kconfig: merge_config: use an empty file as initfile Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 02/79] mailbox: use error ret code of of_parse_phandle_with_args() Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 03/79] fbdev: fsl-diu-fb: add missing device_remove_file() Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 04/79] fbdev: core: tileblit: Implement missing margin clearing for tileblit Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 05/79] NFSv4: Treat ENETUNREACH errors as fatal for state recovery Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 06/79] SUNRPC: rpc_clnt_set_transport() must not change the autobind setting Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 07/79] exit: fix the usage of delay_group_leader->exit_code in do_notify_parent() and pidfs_exit() Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 08/79] dql: Fix dql->limit value when reset Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 09/79] tools/build: Don't pass test log files to linker Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 10/79] pNFS/flexfiles: Report ENETDOWN as a connection error Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 11/79] libnvdimm/labels: Fix divide error in nd_label_data_init() Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 12/79] mmc: host: Wait for Vdd to settle on card power off Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 13/79] i2c: pxa: fix call balance of i2c->clk handling routines Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 14/79] btrfs: avoid linker error in btrfs_find_create_tree_block() Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 15/79] btrfs: send: return -ENAMETOOLONG when attempting a path that is too long Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 16/79] um: Store full CSGSFS and SS register from mcontext Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 17/79] um: Update min_low_pfn to match changes in uml_reserved Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 18/79] ext4: reorder capability check last Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 19/79] scsi: st: Tighten the page format heuristics with MODE SELECT Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 20/79] scsi: st: ERASE does not change tape location Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 21/79] kbuild: fix argument parsing in scripts/config Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 22/79] dm: restrict dm device size to 2^63-512 bytes Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 23/79] xen: Add support for XenServer 6.1 platform device Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 24/79] posix-timers: Add cond_resched() to posix_timer_add() search loop Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 25/79] netfilter: conntrack: Bound nf_conntrack sysctl writes Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 26/79] mmc: sdhci: Disable SD card clock before changing parameters Sasha Levin
2025-05-05 23:20 ` [PATCH AUTOSEL 5.4 27/79] powerpc/prom_init: Fixup missing #size-cells on PowerBook6,7 Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 28/79] rtc: ds1307: stop disabling alarms on probe Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 29/79] ieee802154: ca8210: Use proper setters and getters for bitwise types Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 30/79] ARM: tegra: Switch DSI-B clock parent to PLLD on Tegra114 Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 31/79] media: c8sectpfe: Call of_node_put(i2c_bus) only once in c8sectpfe_probe() Sasha Levin
2025-05-05 23:21 ` Sasha Levin [this message]
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 33/79] orangefs: Do not truncate file size Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 34/79] media: cx231xx: set device_caps for 417 Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 35/79] pinctrl: bcm281xx: Use "unsigned int" instead of bare "unsigned" Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 36/79] net: pktgen: fix mpls maximum labels list parsing Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 37/79] x86/bugs: Make spectre user default depend on MITIGATION_SPECTRE_V2 Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 38/79] hwmon: (gpio-fan) Add missing mutex locks Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 39/79] drm/mediatek: mtk_dpi: Add checks for reg_h_fre_con existence Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 40/79] fpga: altera-cvp: Increase credit timeout Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 41/79] net/mlx5: Avoid report two health errors on same syndrome Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 42/79] drm/amdkfd: KFD release_work possible circular locking Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 43/79] net: xgene-v2: remove incorrect ACPI_PTR annotation Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 44/79] bonding: report duplicate MAC address in all situations Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 45/79] x86/nmi: Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus() Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 46/79] cpuidle: menu: Avoid discarding useful information Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 47/79] MIPS: Use arch specific syscall name match function Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 48/79] MIPS: pm-cps: Use per-CPU variables as per-CPU, not per-core Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 49/79] scsi: mpt3sas: Send a diag reset if target reset fails Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 50/79] wifi: rtw88: Fix rtw_init_ht_cap() for RTL8814AU Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 51/79] net: pktgen: fix access outside of user given buffer in pktgen_thread_write() Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 52/79] EDAC/ie31200: work around false positive build warning Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 53/79] PCI: Fix old_size lower bound in calculate_iosize() too Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 54/79] ACPI: HED: Always initialize before evged Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 55/79] net/mlx5: Modify LSB bitmask in temperature event to include only the first bit Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 56/79] net/mlx5: Apply rate-limiting to high temperature warning Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 57/79] ASoC: ops: Enforce platform maximum on initial value Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 58/79] pinctrl: devicetree: do not goto err when probing hogs in pinctrl_dt_to_map Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 59/79] smack: recognize ipv4 CIPSO w/o categories Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 60/79] net/mlx4_core: Avoid impossible mlx4_db_alloc() order value Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 61/79] phy: core: don't require set_mode() callback for phy_get_mode() to work Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 62/79] net/mlx5: Extend Ethtool loopback selftest to support non-linear SKB Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 63/79] net/mlx5e: set the tx_queue_len for pfifo_fast Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 64/79] net/mlx5e: reduce rep rxq depth to 256 for ECPF Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 65/79] ip: fib_rules: Fetch net from fib_rule in fib[46]_rule_configure() Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 66/79] exit: change the release_task() paths to call flush_sigqueue() lockless Sasha Levin
2025-05-06 11:21   ` Oleg Nesterov
2025-05-20 14:05     ` Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 67/79] hwmon: (xgene-hwmon) use appropriate type for the latency value Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 68/79] vxlan: Annotate FDB data races Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 69/79] net-sysfs: prevent uncleared queues from being re-added Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 70/79] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 71/79] rcu: fix header guard for rcu_all_qs() Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 72/79] scsi: lpfc: Handle duplicate D_IDs in ndlp search-by D_ID routine Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 73/79] scsi: st: Restore some drive settings after reset Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 74/79] HID: usbkbd: Fix the bit shift number for LED_KANA Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 75/79] bpftool: Fix readlink usage in get_fd_type Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 76/79] wifi: rtw88: Don't use static local variable in rtw8822b_set_tx_power_index_by_rate Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 77/79] regulator: ad5398: Add device tree support Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 78/79] drm/atomic: clarify the rules around drm_atomic_state->allow_modeset Sasha Levin
2025-05-05 23:21 ` [PATCH AUTOSEL 5.4 79/79] drm: Add valid clones check Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250505232151.2698893-32-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=agk@redhat.com \
    --cc=dm-devel@lists.linux.dev \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpatocka@redhat.com \
    --cc=mtsai@redhat.com \
    --cc=snitzer@kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox