From mboxrd@z Thu Jan  1 00:00:00 1970
From: Greg Kroah-Hartman
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman, patches@lists.linux.dev, Yang Yang, Ming Lei,
 Bart Van Assche, Jens Axboe, Sasha Levin
Subject: [PATCH 6.1 269/440] sbitmap: fix io hung due to race on sbitmap_word::cleared
Date: Tue, 30 Jul 2024 17:48:22 +0200
Message-ID: <20240730151626.344689780@linuxfoundation.org>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20240730151615.753688326@linuxfoundation.org>
References: <20240730151615.753688326@linuxfoundation.org>
User-Agent: quilt/0.67
X-stable: review
X-Patchwork-Hint: ignore
Precedence: bulk
X-Mailing-List: patches@lists.linux.dev
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

6.1-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Yang Yang

[ Upstream commit 72d04bdcf3f7d7e07d82f9757946f68802a7270a ]

Configuration for sbq: depth=64, wake_batch=6, shift=6, map_nr=1

1. There are 64 requests in progress:
   map->word = 0xFFFFFFFFFFFFFFFF
2. After all the 64 requests complete, and no more requests come:
   map->word = 0xFFFFFFFFFFFFFFFF, map->cleared = 0xFFFFFFFFFFFFFFFF
3. Now two tasks try to allocate requests:
   T1:                                       T2:
   __blk_mq_get_tag                          .
   __sbitmap_queue_get                       .
   sbitmap_get                               .
   sbitmap_find_bit                          .
   sbitmap_find_bit_in_word                  .
   __sbitmap_get_word -> nr=-1               __blk_mq_get_tag
   sbitmap_deferred_clear                    __sbitmap_queue_get
   /* map->cleared=0xFFFFFFFFFFFFFFFF */     sbitmap_find_bit
     if (!READ_ONCE(map->cleared))           sbitmap_find_bit_in_word
       return false;                         __sbitmap_get_word -> nr=-1
     mask = xchg(&map->cleared, 0)           sbitmap_deferred_clear
     atomic_long_andnot()                    /* map->cleared=0 */
                                               if (!(map->cleared))
                                                 return false;
                                               /*
                                                * map->cleared is cleared by T1
                                                * T2 fail to acquire the tag
                                                */
4. T2 is the sole tag waiter. When T1 puts the tag, T2 cannot be woken
   up due to the wake_batch being set at 6. If no more requests come,
   T2 will wait here indefinitely.

This patch achieves two purposes:
1. The check of ->cleared and the updates of both ->cleared and ->word
   need to be done atomically; using a spinlock is the simplest solution.
2. Add an extra check in sbitmap_deferred_clear() to identify whether
   ->word has free bits.

Fixes: ea86ea2cdced ("sbitmap: ammortize cost of clearing bits")
Signed-off-by: Yang Yang
Reviewed-by: Ming Lei
Reviewed-by: Bart Van Assche
Link: https://lore.kernel.org/r/20240716082644.659566-1-yang.yang@vivo.com
Signed-off-by: Jens Axboe
Signed-off-by: Sasha Levin
---
 include/linux/sbitmap.h |  5 +++++
 lib/sbitmap.c           | 36 +++++++++++++++++++++++++++++-------
 2 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index d662cf136021d..c09cdcc99471e 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -36,6 +36,11 @@ struct sbitmap_word {
 	 * @cleared: word holding cleared bits
 	 */
 	unsigned long cleared ____cacheline_aligned_in_smp;
+
+	/**
+	 * @swap_lock: serializes simultaneous updates of ->word and ->cleared
+	 */
+	spinlock_t swap_lock;
 } ____cacheline_aligned_in_smp;
 
 /**
diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index a727d0b12763a..61075535a8073 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -60,12 +60,30 @@ static inline void update_alloc_hint_after_get(struct sbitmap *sb,
 /*
  * See if we have deferred clears that we can batch move
  */
-static inline bool sbitmap_deferred_clear(struct sbitmap_word *map)
+static inline bool sbitmap_deferred_clear(struct sbitmap_word *map,
+		unsigned int depth, unsigned int alloc_hint, bool wrap)
 {
-	unsigned long mask;
+	unsigned long mask, word_mask;
 
-	if (!READ_ONCE(map->cleared))
-		return false;
+	guard(spinlock_irqsave)(&map->swap_lock);
+
+	if (!map->cleared) {
+		if (depth == 0)
+			return false;
+
+		word_mask = (~0UL) >> (BITS_PER_LONG - depth);
+		/*
+		 * The current behavior is to always retry after moving
+		 * ->cleared to word, and we change it to retry in case
+		 * of any free bits. To avoid an infinite loop, we need
+		 * to take wrap & alloc_hint into account, otherwise a
+		 * soft lockup may occur.
+		 */
+		if (!wrap && alloc_hint)
+			word_mask &= ~((1UL << alloc_hint) - 1);
+
+		return (READ_ONCE(map->word) & word_mask) != word_mask;
+	}
 
 	/*
 	 * First get a stable cleared mask, setting the old mask to 0.
@@ -85,6 +103,7 @@ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift,
 		      bool alloc_hint)
 {
 	unsigned int bits_per_word;
+	int i;
 
 	if (shift < 0)
 		shift = sbitmap_calculate_shift(depth);
@@ -116,6 +135,9 @@ int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift,
 		return -ENOMEM;
 	}
 
+	for (i = 0; i < sb->map_nr; i++)
+		spin_lock_init(&sb->map[i].swap_lock);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(sbitmap_init_node);
@@ -126,7 +148,7 @@ void sbitmap_resize(struct sbitmap *sb, unsigned int depth)
 	unsigned int i;
 
 	for (i = 0; i < sb->map_nr; i++)
-		sbitmap_deferred_clear(&sb->map[i]);
+		sbitmap_deferred_clear(&sb->map[i], 0, 0, 0);
 
 	sb->depth = depth;
 	sb->map_nr = DIV_ROUND_UP(sb->depth, bits_per_word);
@@ -179,7 +201,7 @@ static int sbitmap_find_bit_in_word(struct sbitmap_word *map,
 					alloc_hint, wrap);
 		if (nr != -1)
 			break;
-		if (!sbitmap_deferred_clear(map))
+		if (!sbitmap_deferred_clear(map, depth, alloc_hint, wrap))
 			break;
 	} while (1);
 
@@ -505,7 +527,7 @@ unsigned long __sbitmap_queue_get_batch(struct sbitmap_queue *sbq, int nr_tags,
 		unsigned int map_depth = __map_depth(sb, index);
 		unsigned long val;
 
-		sbitmap_deferred_clear(map);
+		sbitmap_deferred_clear(map, 0, 0, 0);
 		val = READ_ONCE(map->word);
 		if (val == (1UL << (map_depth - 1)) - 1)
 			goto next;
-- 
2.43.0
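
For illustration only, and not part of the patch itself, below is a minimal
userspace model of the fixed sbitmap_deferred_clear() logic. The names
(demo_word, deferred_clear) are made up for the example; a pthread mutex
stands in for the kernel's swap_lock guard, and plain stores stand in for
xchg()/atomic_long_andnot(). It is a sketch of the approach described above,
not the kernel implementation. Compile with: cc -pthread demo.c

/*
 * Userspace sketch of the serialized deferred-clear path.
 * All names here are hypothetical and exist only for this example.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define DEMO_BITS_PER_LONG (8 * sizeof(unsigned long))

struct demo_word {
	unsigned long word;		/* allocated bits */
	unsigned long cleared;		/* freed bits not yet folded back */
	pthread_mutex_t swap_lock;	/* stands in for ->swap_lock */
};

/*
 * Fold ->cleared back into ->word; returning true tells the caller to retry
 * the allocation. With the lock held, the check of ->cleared and the update
 * of ->word/->cleared can no longer interleave the way T1/T2 do above.
 */
static bool deferred_clear(struct demo_word *map, unsigned int depth,
			   unsigned int alloc_hint, bool wrap)
{
	bool ret = true;

	pthread_mutex_lock(&map->swap_lock);

	if (!map->cleared) {
		unsigned long word_mask;

		if (depth == 0) {
			ret = false;
			goto out;
		}

		/* Bits an allocation of this depth may use. */
		word_mask = (~0UL) >> (DEMO_BITS_PER_LONG - depth);
		/*
		 * Without wrap, only bits at or above alloc_hint are
		 * candidates, so ignore the lower ones to avoid retrying
		 * forever on a word that cannot satisfy this caller.
		 */
		if (!wrap && alloc_hint)
			word_mask &= ~((1UL << alloc_hint) - 1);

		/* Retry only if at least one usable bit is free. */
		ret = (map->word & word_mask) != word_mask;
		goto out;
	}

	/* Move the deferred clears into the allocation word. */
	map->word &= ~map->cleared;
	map->cleared = 0;
out:
	pthread_mutex_unlock(&map->swap_lock);
	return ret;
}

int main(void)
{
	struct demo_word map = {
		.word = ~0UL,		/* all 64 tags allocated ...       */
		.cleared = ~0UL,	/* ... and all 64 freed, deferred  */
		.swap_lock = PTHREAD_MUTEX_INITIALIZER,
	};
	bool retry;

	/* First caller folds ->cleared back into ->word and retries. */
	retry = deferred_clear(&map, 64, 0, true);
	printf("first caller:  retry=%d word=%#lx cleared=%#lx\n",
	       retry, map.word, map.cleared);

	/*
	 * A second caller now finds ->cleared already drained, but instead
	 * of failing like T2 in the commit message it checks ->word for
	 * free bits and is told to retry.
	 */
	retry = deferred_clear(&map, 64, 0, true);
	printf("second caller: retry=%d word=%#lx cleared=%#lx\n",
	       retry, map.word, map.cleared);
	return 0;
}

The point of the extra free-bit check is visible in the second call: a task
that loses the race on ->cleared no longer gives up and sleeps on a wakeup
that may never reach wake_batch; it retries as long as ->word still has bits
usable for its depth/alloc_hint/wrap combination.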