From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0A88C35ABD6;
	Mon, 5 May 2025 22:45:55 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1746485157; cv=none;
	b=B06wRgtHKtivb1ma+uhnI8E4ruRwyhF6Db3rILgRKmpHu0mS0OPlvNmgGqKRMwikNc64Nk6S3DJQ3siljfPpJu5MI9fR3Gi50j0Q78s+urbaRd+0Vdu5ShHEIssuRhblr4/Tfukweev8lKAZci5rR+HHywXmIGD7q27Q9tej0u8=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1746485157; c=relaxed/simple;
	bh=m9tcIJUZZZ6qL3yvqw7//qLFD07A26cZti3O/peXdOI=;
	h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References:
	 MIME-Version;
	b=GvB8HMPWu7rQy9HuZL+oFWl66K5FzLIZfWI/RwFCJ3fTKk5PF/gWlhG/GeJV3yKaiCz3Ug8d+ODtNSj0WU+1+WwlACsslpkM+9Bfm0zcdMJ04i1ZueTzhkVgS9HleSsQLHQLpLmVHKLBNlHYl+qdQyMDpY/eZBnEQOzr7zNDfjY=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=iMl/Mz/6;
	arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="iMl/Mz/6"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 978A3C4CEEF;
	Mon, 5 May 2025 22:45:54 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1746485155;
	bh=m9tcIJUZZZ6qL3yvqw7//qLFD07A26cZti3O/peXdOI=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=iMl/Mz/6bPBhs6AFnEeqd+QGd4tLfPtWXupdu+fTDbl3RhCSZO24pS8w4zc3rxIIa
	 a/E2MfOH/0i9DIEN4hyBzfKla1h3/mHeVWgxrDTalr8dlrSyerJSMEiL4Fwux+OcsE
	 gNYiVAksam/Rk330PnFzNP+vqXf7Zk5CWnTth1CJU/hWPFCt4VTL0iXJJg/ygAZxJS
	 FT+lyG3ub1zrPAaHRzw+qKmIRt8txlBLI24DDJQ+R7LA5bWDZbkTU3lVwzKoMqFiEK
	 j9pLUW+AkerRrQRV7ESunHWTFF0I2E28/HpsHP2FCt/CazKZ0kinj2xfXHPh0uOzqu
	 k2NsmMjZjzCMQ==
From: Sasha Levin
To: linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Cc: Ming-Hung Tsai,
	Mikulas Patocka,
	Sasha Levin,
	agk@redhat.com,
	snitzer@kernel.org,
	dm-devel@lists.linux.dev
Subject: [PATCH AUTOSEL 6.12 189/486] dm cache: prevent BUG_ON by blocking retries on failed device resumes
Date: Mon,  5 May 2025 18:34:25 -0400
Message-Id: <20250505223922.2682012-189-sashal@kernel.org>
X-Mailer: git-send-email 2.39.5
In-Reply-To: <20250505223922.2682012-1-sashal@kernel.org>
References: <20250505223922.2682012-1-sashal@kernel.org>
Precedence: bulk
X-Mailing-List: stable@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-stable: review
X-Patchwork-Hint: Ignore
X-stable-base: Linux 6.12.26
Content-Transfer-Encoding: 8bit

From: Ming-Hung Tsai

[ Upstream commit 5da692e2262b8f81993baa9592f57d12c2703dea ]

A cache device failing to resume due to mapping errors should not be
retried, as the failure leaves a partially initialized policy object.
Repeating the resume operation risks triggering BUG_ON when reloading
cache mappings into the incomplete policy object.

Reproduce steps:

1. create a cache metadata consisting of 512 or more cache blocks,
   with some mappings stored in the first array block of the mapping
   array. Here we use cache_restore v1.0 to build the metadata.

cat <<EOF >> cmeta.xml
EOF
dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
cache_restore -i cmeta.xml -o /dev/mapper/cmeta --metadata-version=2
dmsetup remove cmeta

2. wipe the second array block of the mapping array to simulate
   data degradations.

mapping_root=$(dd if=/dev/sdc bs=1c count=8 skip=192 \
2>/dev/null | hexdump -e '1/8 "%u\n"')
ablock=$(dd if=/dev/sdc bs=1c count=8 skip=$((4096*mapping_root+2056)) \
2>/dev/null | hexdump -e '1/8 "%u\n"')
dd if=/dev/zero of=/dev/sdc bs=4k count=1 seek=$ablock

3. try bringing up the cache device. The resume is expected to fail
   due to the broken array block.

dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
dmsetup create cdata --table "0 65536 linear /dev/sdc 8192"
dmsetup create corig --table "0 524288 linear /dev/sdc 262144"
dmsetup create cache --notable
dmsetup load cache --table "0 524288 cache /dev/mapper/cmeta \
/dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0"
dmsetup resume cache

4. try resuming the cache again. An unexpected BUG_ON is triggered
   while loading cache mappings.

dmsetup resume cache

Kernel logs:

(snip)
------------[ cut here ]------------
kernel BUG at drivers/md/dm-cache-policy-smq.c:752!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 332 Comm: dmsetup Not tainted 6.13.4 #3
RIP: 0010:smq_load_mapping+0x3e5/0x570

Fix by disallowing resume operations for devices that failed the
initial attempt.

Signed-off-by: Ming-Hung Tsai
Signed-off-by: Mikulas Patocka
Signed-off-by: Sasha Levin
---
 drivers/md/dm-cache-target.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 849eb6333e980..6aa4095dc5876 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -2899,6 +2899,27 @@ static dm_cblock_t get_cache_dev_size(struct cache *cache)
 	return to_cblock(size);
 }
 
+static bool can_resume(struct cache *cache)
+{
+	/*
+	 * Disallow retrying the resume operation for devices that failed the
+	 * first resume attempt, as the failure leaves the policy object partially
+	 * initialized. Retrying could trigger BUG_ON when loading cache mappings
+	 * into the incomplete policy object.
+	 */
+	if (cache->sized && !cache->loaded_mappings) {
+		if (get_cache_mode(cache) != CM_WRITE)
+			DMERR("%s: unable to resume a failed-loaded cache, please check metadata.",
+			      cache_device_name(cache));
+		else
+			DMERR("%s: unable to resume cache due to missing proper cache table reload",
+			      cache_device_name(cache));
+		return false;
+	}
+
+	return true;
+}
+
 static bool can_resize(struct cache *cache, dm_cblock_t new_size)
 {
 	if (from_cblock(new_size) > from_cblock(cache->cache_size)) {
@@ -2947,6 +2968,9 @@ static int cache_preresume(struct dm_target *ti)
 	struct cache *cache = ti->private;
 	dm_cblock_t csize = get_cache_dev_size(cache);
 
+	if (!can_resume(cache))
+		return -EINVAL;
+
 	/*
 	 * Check to see if the cache has resized.
 	 */
-- 
2.39.5
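
For readers who want to see why the `sized && !loaded_mappings` test identifies a failed first resume, here is a minimal user-space sketch of the flag logic. It is not kernel code. The `model_*` names, the `load_ok` parameter and the use of -EIO for a failed load are all hypothetical. It also assumes that the preresume path sets `sized` before it attempts to load mappings, which the check in can_resume() implies but this patch does not show.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two struct cache flags the patch inspects. */
struct model_cache {
	bool sized;            /* set once the cache device has been sized */
	bool loaded_mappings;  /* set only after the mappings load succeeds */
};

/* Mirrors the condition in can_resume(): sized but never loaded => refuse. */
static bool model_can_resume(const struct model_cache *cache)
{
	assert(cache != NULL);
	return !(cache->sized && !cache->loaded_mappings);
}

/*
 * Simplified preresume. load_ok simulates whether loading the mappings
 * succeeds; returning -EIO on failure is an arbitrary choice for the model.
 */
static int model_preresume(struct model_cache *cache, bool load_ok)
{
	if (!model_can_resume(cache))
		return -EINVAL;

	if (!cache->sized)
		cache->sized = true;  /* assumed to happen before the load */

	if (!cache->loaded_mappings) {
		if (!load_ok)
			return -EIO;
		cache->loaded_mappings = true;
	}

	return 0;
}
```

In this model, calling `model_preresume(&c, false)` on a fresh cache fails with the load error and leaves `sized` set. A second call then returns -EINVAL whatever the value of `load_ok`, which is the retry the patch blocks. A cache whose first resume succeeded keeps both flags set and can still be resumed again.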