public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev,
	Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>,
	Coly Li <colyli@suse.de>,
	Eric Wheeler <bcache@lists.ewheeler.net>,
	Jens Axboe <axboe@kernel.dk>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.19 19/36] bcache: avoid oversize memory allocation by small stripe_size
Date: Mon, 18 Dec 2023 14:51:29 +0100	[thread overview]
Message-ID: <20231218135042.529383681@linuxfoundation.org> (raw)
In-Reply-To: <20231218135041.876499958@linuxfoundation.org>

4.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Coly Li <colyli@suse.de>

[ Upstream commit baf8fb7e0e5ec54ea0839f0c534f2cdcd79bea9c ]

Arraies bcache->stripe_sectors_dirty and bcache->full_dirty_stripes are
used for dirty data writeback, their sizes are decided by backing device
capacity and stripe size. Larger backing device capacity or smaller
stripe size make these two arraies occupies more dynamic memory space.

Currently bcache->stripe_size is directly inherited from
queue->limits.io_opt of underlying storage device. For normal hard
drives, its limits.io_opt is 0, and bcache sets the corresponding
stripe_size to 1TB (1<<31 sectors), it works fine 10+ years. But for
devices do declare value for queue->limits.io_opt, small stripe_size
(comparing to 1TB) becomes an issue for oversize memory allocations of
bcache->stripe_sectors_dirty and bcache->full_dirty_stripes, while the
capacity of hard drives gets much larger in recent decade.

For example a raid5 array assembled by three 20TB hardrives, the raid
device capacity is 40TB with typical 512KB limits.io_opt. After the math
calculation in bcache code, these two arraies will occupy 400MB dynamic
memory. Even worse Andrea Tomassetti reports that a 4KB limits.io_opt is
declared on a new 2TB hard drive, then these two arraies request 2GB and
512MB dynamic memory from kzalloc(). The result is that bcache device
always fails to initialize on his system.

To avoid the oversize memory allocation, bcache->stripe_size should not
directly inherited by queue->limits.io_opt from the underlying device.
This patch defines BCH_MIN_STRIPE_SZ (4MB) as minimal bcache stripe size
and set bcache device's stripe size against the declared limits.io_opt
value from the underlying storage device,
- If the declared limits.io_opt > BCH_MIN_STRIPE_SZ, bcache device will
  set its stripe size directly by this limits.io_opt value.
- If the declared limits.io_opt < BCH_MIN_STRIPE_SZ, bcache device will
  set its stripe size by a value multiplying limits.io_opt and euqal or
  large than BCH_MIN_STRIPE_SZ.

Then the minimal stripe size of a bcache device will always be >= 4MB.
For a 40TB raid5 device with 512KB limits.io_opt, memory occupied by
bcache->stripe_sectors_dirty and bcache->full_dirty_stripes will be 50MB
in total. For a 2TB hard drive with 4KB limits.io_opt, memory occupied
by these two arraies will be 2.5MB in total.

Such mount of memory allocated for bcache->stripe_sectors_dirty and
bcache->full_dirty_stripes is reasonable for most of storage devices.

Reported-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Eric Wheeler <bcache@lists.ewheeler.net>
Link: https://lore.kernel.org/r/20231120052503.6122-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/md/bcache/bcache.h | 1 +
 drivers/md/bcache/super.c  | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index e81d783109847..d0311e3065caf 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -265,6 +265,7 @@ struct bcache_device {
 #define BCACHE_DEV_WB_RUNNING		3
 #define BCACHE_DEV_RATE_DW_RUNNING	4
 	int			nr_stripes;
+#define BCH_MIN_STRIPE_SZ		((4 << 20) >> SECTOR_SHIFT)
 	unsigned int		stripe_size;
 	atomic_t		*stripe_sectors_dirty;
 	unsigned long		*full_dirty_stripes;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 4b076f7f184be..8b4a2bd6b57c9 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -807,6 +807,8 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
 
 	if (!d->stripe_size)
 		d->stripe_size = 1 << 31;
+	else if (d->stripe_size < BCH_MIN_STRIPE_SZ)
+		d->stripe_size = roundup(BCH_MIN_STRIPE_SZ, d->stripe_size);
 
 	d->nr_stripes = DIV_ROUND_UP_ULL(sectors, d->stripe_size);
 
-- 
2.43.0




  parent reply	other threads:[~2023-12-18 13:53 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-18 13:51 [PATCH 4.19 00/36] 4.19.303-rc1 review Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 01/36] qca_debug: Prevent crash on TX ring changes Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 02/36] qca_debug: Fix ethtool -G iface tx behavior Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 03/36] qca_spi: Fix reset behavior Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 04/36] atm: solos-pci: Fix potential deadlock on &cli_queue_lock Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 05/36] atm: solos-pci: Fix potential deadlock on &tx_queue_lock Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 06/36] atm: Fix Use-After-Free in do_vcc_ioctl Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 07/36] net/rose: Fix Use-After-Free in rose_ioctl Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 08/36] qed: Fix a potential use-after-free in qed_cxt_tables_alloc Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 09/36] net: Remove acked SYN flag from packet in the transmit queue correctly Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 10/36] sign-file: Fix incorrect return values check Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 11/36] vsock/virtio: Fix unsigned integer wrap around in virtio_transport_has_space() Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 12/36] net: prevent mss overflow in skb_segment() Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 13/36] net: stmmac: use dev_err_probe() for reporting mdio bus registration failure Greg Kroah-Hartman
2023-12-18 18:45   ` Daniel Díaz
2023-12-18 20:25     ` Greg Kroah-Hartman
2023-12-19  7:23       ` Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 14/36] net: stmmac: Handle disabled MDIO busses from devicetree Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 15/36] appletalk: Fix Use-After-Free in atalk_ioctl Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 16/36] Revert "PCI: acpiphp: Reassign resources on bridge if necessary" Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 17/36] cred: switch to using atomic_long_t Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 18/36] blk-throttle: fix lockdep warning of "cgroup_mutex or RCU read lock required!" Greg Kroah-Hartman
2023-12-18 13:51 ` Greg Kroah-Hartman [this message]
2023-12-18 13:51 ` [PATCH 4.19 20/36] bcache: add code comments for bch_btree_node_get() and __bch_btree_node_alloc() Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 21/36] bcache: avoid NULL checking to c->root in run_cache_set() Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 22/36] platform/x86: intel_telemetry: Fix kernel doc descriptions Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 23/36] HID: add ALWAYS_POLL quirk for Apple kb Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 24/36] HID: hid-asus: reset the backlight brightness level on resume Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 25/36] HID: multitouch: Add quirk for HONOR GLO-GXXX touchpad Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 26/36] asm-generic: qspinlock: fix queued_spin_value_unlocked() implementation Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 27/36] net: usb: qmi_wwan: claim interface 4 for ZTE MF290 Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 28/36] HID: hid-asus: add const to read-only outgoing usb buffer Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 29/36] perf: Fix perf_event_validate_size() lockdep splat Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 30/36] ext4: prevent the normalized size from exceeding EXT_MAX_BLOCKS Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 31/36] arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 32/36] team: Fix use-after-free when an option instance allocation fails Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 33/36] ring-buffer: Fix memory leak of free page Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 34/36] mmc: block: Be sure to wait while busy in CQE error recovery Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 35/36] powerpc/ftrace: Create a dummy stackframe to fix stack unwind Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 4.19 36/36] powerpc/ftrace: Fix stack teardown in ftrace_no_trace Greg Kroah-Hartman
2023-12-18 17:49 ` [PATCH 4.19 00/36] 4.19.303-rc1 review Pavel Machek
2023-12-19  0:05 ` Shuah Khan
2023-12-19  5:27 ` Daniel Díaz
2023-12-19  5:53   ` Harshit Mogalapalli
2023-12-19  7:22     ` Greg Kroah-Hartman
2023-12-19 11:30 ` Jon Hunter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231218135042.529383681@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=andrea.tomassetti-opensource@devo.com \
    --cc=axboe@kernel.dk \
    --cc=bcache@lists.ewheeler.net \
    --cc=colyli@suse.de \
    --cc=patches@lists.linux.dev \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox