All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Mingzhe Zou <mingzhe.zou@easystack.cn>,
	Coly Li <colyli@suse.de>, Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 6.1 71/82] bcache: fixup lock c->root error
Date: Thu, 30 Nov 2023 16:22:42 +0000	[thread overview]
Message-ID: <20231130162138.232571063@linuxfoundation.org> (raw)
In-Reply-To: <20231130162135.977485944@linuxfoundation.org>

6.1-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mingzhe Zou <mingzhe.zou@easystack.cn>

commit e34820f984512b433ee1fc291417e60c47d56727 upstream.

We had a problem with io hung because it was waiting for c->root to
release the lock.

crash> cache_set.root -l cache_set.list ffffa03fde4c0050
  root = 0xffff802ef454c800
crash> btree -o 0xffff802ef454c800 | grep rw_semaphore
  [ffff802ef454c858] struct rw_semaphore lock;
crash> struct rw_semaphore ffff802ef454c858
struct rw_semaphore {
  count = {
    counter = -4294967297
  },
  wait_list = {
    next = 0xffff00006786fc28,
    prev = 0xffff00005d0efac8
  },
  wait_lock = {
    raw_lock = {
      {
        val = {
          counter = 0
        },
        {
          locked = 0 '\000',
          pending = 0 '\000'
        },
        {
          locked_pending = 0,
          tail = 0
        }
      }
    }
  },
  osq = {
    tail = {
      counter = 0
    }
  },
  owner = 0xffffa03fdc586603
}

The "counter = -4294967297" means that lock count is -1 and a write lock
is being attempted. Then, we found that there is a btree with a counter
of 1 in btree_cache_freeable.

crash> cache_set -l cache_set.list ffffa03fde4c0050 -o|grep btree_cache
  [ffffa03fde4c1140] struct list_head btree_cache;
  [ffffa03fde4c1150] struct list_head btree_cache_freeable;
  [ffffa03fde4c1160] struct list_head btree_cache_freed;
  [ffffa03fde4c1170] unsigned int btree_cache_used;
  [ffffa03fde4c1178] wait_queue_head_t btree_cache_wait;
  [ffffa03fde4c1190] struct task_struct *btree_cache_alloc_lock;
crash> list -H ffffa03fde4c1140|wc -l
973
crash> list -H ffffa03fde4c1150|wc -l
1123
crash> cache_set.btree_cache_used -l cache_set.list ffffa03fde4c0050
  btree_cache_used = 2097
crash> list -s btree -l btree.list -H ffffa03fde4c1140|grep -E -A2 "^  lock = {" > btree_cache.txt
crash> list -s btree -l btree.list -H ffffa03fde4c1150|grep -E -A2 "^  lock = {" > btree_cache_freeable.txt
[root@node-3 127.0.0.1-2023-08-04-16:40:28]# pwd
/var/crash/127.0.0.1-2023-08-04-16:40:28
[root@node-3 127.0.0.1-2023-08-04-16:40:28]# cat btree_cache.txt|grep counter|grep -v "counter = 0"
[root@node-3 127.0.0.1-2023-08-04-16:40:28]# cat btree_cache_freeable.txt|grep counter|grep -v "counter = 0"
      counter = 1

We found that this is a bug in bch_sectors_dirty_init() when locking c->root:
    (1). Thread X has locked c->root(A) write.
    (2). Thread Y failed to lock c->root(A), waiting for the lock(c->root A).
    (3). Thread X bch_btree_set_root() changes c->root from A to B.
    (4). Thread X releases the lock(c->root A).
    (5). Thread Y successfully locks c->root(A).
    (6). Thread Y releases the lock(c->root B).

        down_write locked ---(1)----------------------┐
                |                                     |
                |   down_read waiting ---(2)----┐     |
                |           |               ┌-------------┐ ┌-------------┐
        bch_btree_set_root ===(3)========>> | c->root   A | | c->root   B |
                |           |               └-------------┘ └-------------┘
            up_write ---(4)---------------------┘     |            |
                            |                         |            |
                    down_read locked ---(5)-----------┘            |
                            |                                      |
                        up_read ---(6)-----------------------------┘

Since c->root may change, the correct steps to lock c->root should be
the same as bch_root_usage(), compare after locking.

static unsigned int bch_root_usage(struct cache_set *c)
{
        unsigned int bytes = 0;
        struct bkey *k;
        struct btree *b;
        struct btree_iter iter;

        goto lock_root;

        do {
                rw_unlock(false, b);
lock_root:
                b = c->root;
                rw_lock(false, b, b->level);
        } while (b != c->root);

        for_each_key_filter(&b->keys, k, &iter, bch_ptr_bad)
                bytes += bkey_bytes(k);

        rw_unlock(false, b);

        return (bytes * 100) / btree_bytes(c);
}

Fixes: b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be multithreaded")
Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Cc:  <stable@vger.kernel.org>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20231120052503.6122-7-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/md/bcache/writeback.c |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -977,14 +977,22 @@ static int bch_btre_dirty_init_thread_nr
 void bch_sectors_dirty_init(struct bcache_device *d)
 {
 	int i;
+	struct btree *b = NULL;
 	struct bkey *k = NULL;
 	struct btree_iter iter;
 	struct sectors_dirty_init op;
 	struct cache_set *c = d->c;
 	struct bch_dirty_init_state state;
 
+retry_lock:
+	b = c->root;
+	rw_lock(0, b, b->level);
+	if (b != c->root) {
+		rw_unlock(0, b);
+		goto retry_lock;
+	}
+
 	/* Just count root keys if no leaf node */
-	rw_lock(0, c->root, c->root->level);
 	if (c->root->level == 0) {
 		bch_btree_op_init(&op.op, -1);
 		op.inode = d->id;
@@ -997,7 +1005,7 @@ void bch_sectors_dirty_init(struct bcach
 			sectors_dirty_init_fn(&op.op, c->root, k);
 		}
 
-		rw_unlock(0, c->root);
+		rw_unlock(0, b);
 		return;
 	}
 
@@ -1034,7 +1042,7 @@ void bch_sectors_dirty_init(struct bcach
 out:
 	/* Must wait for all threads to stop. */
 	wait_event(state.wait, atomic_read(&state.started) == 0);
-	rw_unlock(0, c->root);
+	rw_unlock(0, b);
 }
 
 void bch_cached_dev_writeback_init(struct cached_dev *dc)



  parent reply	other threads:[~2023-11-30 16:30 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-30 16:21 [PATCH 6.1 00/82] 6.1.65-rc1 review Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 01/82] afs: Fix afs_server_list to be cleaned up with RCU Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 02/82] afs: Make error on cell lookup failure consistent with OpenAFS Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 03/82] drm/panel: boe-tv101wum-nl6: Fine tune the panel power sequence Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 04/82] drm/panel: auo,b101uan08.3: " Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 05/82] drm/panel: simple: Fix Innolux G101ICE-L01 bus flags Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 06/82] drm/panel: simple: Fix Innolux G101ICE-L01 timings Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 07/82] wireguard: use DEV_STATS_INC() Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 08/82] octeontx2-pf: Fix memory leak during interface down Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 09/82] ata: pata_isapnp: Add missing error check for devm_ioport_map() Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 10/82] drm/i915: do not clean GT table on error path Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 11/82] drm/rockchip: vop: Fix color for RGB888/BGR888 format on VOP full Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 12/82] HID: fix HID device resource race between HID core and debugging support Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 13/82] ipv4: Correct/silence an endian warning in __ip_do_redirect Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 14/82] net: usb: ax88179_178a: fix failed operations during ax88179_reset Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 15/82] net/smc: avoid data corruption caused by decline Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 16/82] arm/xen: fix xen_vcpu_info allocation alignment Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 17/82] octeontx2-pf: Fix ntuple rule creation to direct packet to VF with higher Rx queue than its PF Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 18/82] amd-xgbe: handle corner-case during sfp hotplug Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 19/82] amd-xgbe: handle the corner-case during tx completion Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 20/82] amd-xgbe: propagate the correct speed and duplex status Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 21/82] net: axienet: Fix check for partial TX checksum Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 22/82] afs: Return ENOENT if no cell DNS record can be found Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 23/82] afs: Fix file locking on R/O volumes to operate in local mode Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 24/82] mm,kfence: decouple kfence from page granularity mapping judgement Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 25/82] arm64: mm: Fix "rodata=on" when CONFIG_RODATA_FULL_DEFAULT_ENABLED=y Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 26/82] i40e: use ERR_PTR error print in i40e messages Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 27/82] i40e: Fix adding unsupported cloud filters Greg Kroah-Hartman
2023-11-30 16:21 ` [PATCH 6.1 28/82] nvmet: nul-terminate the NQNs passed in the connect command Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 29/82] USB: dwc3: qcom: fix resource leaks on probe deferral Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 30/82] USB: dwc3: qcom: fix ACPI platform device leak Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 31/82] lockdep: Fix block chain corruption Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 32/82] cifs: minor cleanup of some headers Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 33/82] smb3: allow dumping session and tcon id to improve stats analysis and debugging Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 34/82] cifs: print last update time for interface list Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 35/82] cifs: distribute channels across interfaces based on speed Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 36/82] cifs: account for primary channel in the interface list Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 37/82] cifs: fix leak of iface for primary channel Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 38/82] MIPS: KVM: Fix a build warning about variable set but not used Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 39/82] media: camss: Split power domain management Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 40/82] media: camss: Convert to platform remove callback returning void Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 41/82] media: qcom: Initialise V4L2 async notifier later Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 42/82] media: qcom: camss: Fix V4L2 async notifier error path Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 43/82] media: qcom: camss: Fix genpd cleanup Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 44/82] ext4: add a new helper to check if es must be kept Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 45/82] ext4: factor out __es_alloc_extent() and __es_free_extent() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 46/82] ext4: use pre-allocated es in __es_insert_extent() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 47/82] ext4: use pre-allocated es in __es_remove_extent() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 48/82] ext4: using nofail preallocation in ext4_es_remove_extent() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 49/82] ext4: using nofail preallocation in ext4_es_insert_delayed_block() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 50/82] ext4: using nofail preallocation in ext4_es_insert_extent() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 51/82] ext4: fix slab-use-after-free " Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 52/82] ext4: make sure allocate pending entry not fail Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 53/82] NFSD: Fix "start of NFS reply" pointer passed to nfsd_cache_update() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 54/82] NFSD: Fix checksum mismatches in the duplicate reply cache Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 55/82] arm64: dts: imx8mn-var-som: add 20ms delay to ethernet regulator enable Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 56/82] ACPI: resource: Skip IRQ override on ASUS ExpertBook B1402CVA Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 57/82] swiotlb-xen: provide the "max_mapping_size" method Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 58/82] bcache: replace a mistaken IS_ERR() by IS_ERR_OR_NULL() in btree_gc_coalesce() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 59/82] md: fix bi_status reporting in md_end_clone_io Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 60/82] bcache: fixup multi-threaded bch_sectors_dirty_init() wake-up race Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 61/82] io_uring/fs: consider link->flags when getting path for LINKAT Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 62/82] s390/dasd: protect device queue against concurrent access Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 63/82] USB: serial: option: add Luat Air72*U series products Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 64/82] hv_netvsc: fix race of netvsc and VF register_netdevice Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 65/82] hv_netvsc: Fix race of register_netdevice_notifier and VF register Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 66/82] hv_netvsc: Mark VF as slave before exposing it to user-mode Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 67/82] dm-delay: fix a race between delay_presuspend and delay_bio Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 68/82] bcache: check return value from btree_node_alloc_replacement() Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 69/82] bcache: prevent potential division by zero error Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 70/82] bcache: fixup init dirty data errors Greg Kroah-Hartman
2023-11-30 16:22 ` Greg Kroah-Hartman [this message]
2023-11-30 16:22 ` [PATCH 6.1 72/82] usb: cdnsp: Fix deadlock issue during using NCM gadget Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 73/82] USB: serial: option: add Fibocom L7xx modules Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 74/82] USB: serial: option: fix FM101R-GL defines Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 75/82] USB: serial: option: dont claim interface 4 for ZTE MF290 Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 76/82] usb: typec: tcpm: Skip hard reset when in error recovery Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 77/82] USB: dwc2: write HCINT with INTMASK applied Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 78/82] usb: dwc3: Fix default mode initialization Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 79/82] usb: dwc3: set the dma max_seg_size Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 80/82] USB: dwc3: qcom: fix software node leak on probe errors Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 81/82] USB: dwc3: qcom: fix wakeup after probe deferral Greg Kroah-Hartman
2023-11-30 16:22 ` [PATCH 6.1 82/82] io_uring: fix off-by one bvec index Greg Kroah-Hartman
2023-11-30 19:10 ` [PATCH 6.1 00/82] 6.1.65-rc1 review Florian Fainelli
2023-12-01  0:09 ` Shuah Khan
2023-12-01 10:54 ` Jon Hunter
2023-12-01 11:01 ` Conor Dooley
2023-12-01 13:41 ` Naresh Kamboju
2023-12-01 20:30 ` Guenter Roeck
2023-12-02  0:40 ` SeongJae Park
2023-12-02  2:40 ` Ron Economos

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231130162138.232571063@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=axboe@kernel.dk \
    --cc=colyli@suse.de \
    --cc=mingzhe.zou@easystack.cn \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.