From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev,
Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>,
Coly Li <colyli@suse.de>,
Eric Wheeler <bcache@lists.ewheeler.net>,
Jens Axboe <axboe@kernel.dk>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.10 35/62] bcache: avoid oversize memory allocation by small stripe_size
Date: Mon, 18 Dec 2023 14:51:59 +0100 [thread overview]
Message-ID: <20231218135047.829998376@linuxfoundation.org> (raw)
In-Reply-To: <20231218135046.178317233@linuxfoundation.org>
5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Coly Li <colyli@suse.de>
[ Upstream commit baf8fb7e0e5ec54ea0839f0c534f2cdcd79bea9c ]
Arraies bcache->stripe_sectors_dirty and bcache->full_dirty_stripes are
used for dirty data writeback, their sizes are decided by backing device
capacity and stripe size. Larger backing device capacity or smaller
stripe size make these two arraies occupies more dynamic memory space.
Currently bcache->stripe_size is directly inherited from
queue->limits.io_opt of underlying storage device. For normal hard
drives, its limits.io_opt is 0, and bcache sets the corresponding
stripe_size to 1TB (1<<31 sectors), it works fine 10+ years. But for
devices do declare value for queue->limits.io_opt, small stripe_size
(comparing to 1TB) becomes an issue for oversize memory allocations of
bcache->stripe_sectors_dirty and bcache->full_dirty_stripes, while the
capacity of hard drives gets much larger in recent decade.
For example a raid5 array assembled by three 20TB hardrives, the raid
device capacity is 40TB with typical 512KB limits.io_opt. After the math
calculation in bcache code, these two arraies will occupy 400MB dynamic
memory. Even worse Andrea Tomassetti reports that a 4KB limits.io_opt is
declared on a new 2TB hard drive, then these two arraies request 2GB and
512MB dynamic memory from kzalloc(). The result is that bcache device
always fails to initialize on his system.
To avoid the oversize memory allocation, bcache->stripe_size should not
directly inherited by queue->limits.io_opt from the underlying device.
This patch defines BCH_MIN_STRIPE_SZ (4MB) as minimal bcache stripe size
and set bcache device's stripe size against the declared limits.io_opt
value from the underlying storage device,
- If the declared limits.io_opt > BCH_MIN_STRIPE_SZ, bcache device will
set its stripe size directly by this limits.io_opt value.
- If the declared limits.io_opt < BCH_MIN_STRIPE_SZ, bcache device will
set its stripe size by a value multiplying limits.io_opt and euqal or
large than BCH_MIN_STRIPE_SZ.
Then the minimal stripe size of a bcache device will always be >= 4MB.
For a 40TB raid5 device with 512KB limits.io_opt, memory occupied by
bcache->stripe_sectors_dirty and bcache->full_dirty_stripes will be 50MB
in total. For a 2TB hard drive with 4KB limits.io_opt, memory occupied
by these two arraies will be 2.5MB in total.
Such mount of memory allocated for bcache->stripe_sectors_dirty and
bcache->full_dirty_stripes is reasonable for most of storage devices.
Reported-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Eric Wheeler <bcache@lists.ewheeler.net>
Link: https://lore.kernel.org/r/20231120052503.6122-2-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/md/bcache/bcache.h | 1 +
drivers/md/bcache/super.c | 2 ++
2 files changed, 3 insertions(+)
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index e8bf4f752e8be..5804a8e6e215a 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -265,6 +265,7 @@ struct bcache_device {
#define BCACHE_DEV_WB_RUNNING 3
#define BCACHE_DEV_RATE_DW_RUNNING 4
int nr_stripes;
+#define BCH_MIN_STRIPE_SZ ((4 << 20) >> SECTOR_SHIFT)
unsigned int stripe_size;
atomic_t *stripe_sectors_dirty;
unsigned long *full_dirty_stripes;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index e8c8077667f9e..9c3e1632568c3 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -917,6 +917,8 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
if (!d->stripe_size)
d->stripe_size = 1 << 31;
+ else if (d->stripe_size < BCH_MIN_STRIPE_SZ)
+ d->stripe_size = roundup(BCH_MIN_STRIPE_SZ, d->stripe_size);
n = DIV_ROUND_UP_ULL(sectors, d->stripe_size);
if (!n || n > max_stripes) {
--
2.43.0
next prev parent reply other threads:[~2023-12-18 14:08 UTC|newest]
Thread overview: 67+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-18 13:51 [PATCH 5.10 00/62] 5.10.205-rc1 review Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 01/62] netfilter: nf_tables: fix exist matching on bigendian arches Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 02/62] afs: Fix refcount underflow from error handling race Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 03/62] HID: lenovo: Restrict detection of patched firmware only to USB cptkbd Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 04/62] net: ipv6: support reporting otherwise unknown prefix flags in RTM_NEWPREFIX Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 05/62] qca_debug: Prevent crash on TX ring changes Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 06/62] qca_debug: Fix ethtool -G iface tx behavior Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 07/62] qca_spi: Fix reset behavior Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 08/62] atm: solos-pci: Fix potential deadlock on &cli_queue_lock Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 09/62] atm: solos-pci: Fix potential deadlock on &tx_queue_lock Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 10/62] net: vlan: introduce skb_vlan_eth_hdr() Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 11/62] net: fec: correct queue selection Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 12/62] atm: Fix Use-After-Free in do_vcc_ioctl Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 13/62] net/rose: Fix Use-After-Free in rose_ioctl Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 14/62] qed: Fix a potential use-after-free in qed_cxt_tables_alloc Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 15/62] net: Remove acked SYN flag from packet in the transmit queue correctly Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 16/62] net: ena: Destroy correct number of xdp queues upon failure Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 17/62] net: ena: Fix XDP redirection error Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 18/62] sign-file: Fix incorrect return values check Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 19/62] vsock/virtio: Fix unsigned integer wrap around in virtio_transport_has_space() Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 20/62] net: stmmac: use dev_err_probe() for reporting mdio bus registration failure Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 21/62] net: stmmac: Handle disabled MDIO busses from devicetree Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 22/62] appletalk: Fix Use-After-Free in atalk_ioctl Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 23/62] net: atlantic: fix double free in ring reinit logic Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 24/62] cred: switch to using atomic_long_t Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 25/62] fuse: dax: set fc->dax to NULL in fuse_dax_conn_free() Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 26/62] ALSA: hda/hdmi: add force-connect quirks for ASUSTeK Z170 variants Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 27/62] ALSA: hda/realtek: Apply mute LED quirk for HP15-db Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 28/62] Revert "PCI: acpiphp: Reassign resources on bridge if necessary" Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 29/62] PCI: loongson: Limit MRRS to 256 Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 30/62] drm/atomic: Pass the full state to CRTC atomic begin and flush Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 31/62] drm: Use state helper instead of CRTC state pointer Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 32/62] drm/mediatek: Add spinlock for setting vblank event in atomic_begin Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 33/62] usb: aqc111: check packet for fixup for true limit Greg Kroah-Hartman
2023-12-18 13:51 ` [PATCH 5.10 34/62] blk-throttle: fix lockdep warning of "cgroup_mutex or RCU read lock required!" Greg Kroah-Hartman
2023-12-18 13:51 ` Greg Kroah-Hartman [this message]
2023-12-18 13:52 ` [PATCH 5.10 36/62] bcache: remove redundant assignment to variable cur_idx Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 37/62] bcache: add code comments for bch_btree_node_get() and __bch_btree_node_alloc() Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 38/62] bcache: avoid NULL checking to c->root in run_cache_set() Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 39/62] platform/x86: intel_telemetry: Fix kernel doc descriptions Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 40/62] HID: glorious: fix Glorious Model I HID report Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 41/62] HID: add ALWAYS_POLL quirk for Apple kb Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 42/62] HID: hid-asus: reset the backlight brightness level on resume Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 43/62] HID: multitouch: Add quirk for HONOR GLO-GXXX touchpad Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 44/62] asm-generic: qspinlock: fix queued_spin_value_unlocked() implementation Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 45/62] net: usb: qmi_wwan: claim interface 4 for ZTE MF290 Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 46/62] HID: hid-asus: add const to read-only outgoing usb buffer Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 47/62] perf: Fix perf_event_validate_size() lockdep splat Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 48/62] soundwire: stream: fix NULL pointer dereference for multi_link Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 49/62] ext4: prevent the normalized size from exceeding EXT_MAX_BLOCKS Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 50/62] arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 51/62] team: Fix use-after-free when an option instance allocation fails Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 52/62] ring-buffer: Fix memory leak of free page Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 53/62] tracing: Update snapshot buffer on resize if it is allocated Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 54/62] ring-buffer: Have saved event hold the entire event Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 55/62] ring-buffer: Fix writing to the buffer with max_data_size Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 56/62] ring-buffer: Fix a race in rb_time_cmpxchg() for 32 bit archs Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 57/62] USB: gadget: core: adjust uevent timing on gadget unbind Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 58/62] tty: n_gsm: fix tty registration before control channel open Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 59/62] tty: n_gsm, remove duplicates of parameters Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 60/62] tty: n_gsm: add sanity check for gsm->receive in gsm_receive_buf() Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 61/62] powerpc/ftrace: Create a dummy stackframe to fix stack unwind Greg Kroah-Hartman
2023-12-18 13:52 ` [PATCH 5.10 62/62] powerpc/ftrace: Fix stack teardown in ftrace_no_trace Greg Kroah-Hartman
2023-12-18 15:24 ` [PATCH 5.10 00/62] 5.10.205-rc1 review Naresh Kamboju
2023-12-19 0:54 ` Dominique Martinet
2023-12-19 7:29 ` Greg Kroah-Hartman
2023-12-21 15:50 ` Shuah Khan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231218135047.829998376@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=andrea.tomassetti-opensource@devo.com \
--cc=axboe@kernel.dk \
--cc=bcache@lists.ewheeler.net \
--cc=colyli@suse.de \
--cc=patches@lists.linux.dev \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox