From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Mingzhe Zou <mingzhe.zou@easystack.cn>,
	Coly Li <colyli@suse.de>, Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 5.15 58/69] bcache: fixup lock c->root error
Date: Thu, 30 Nov 2023 16:22:55 +0000
Message-ID: <20231130162134.962040542@linuxfoundation.org>
In-Reply-To: <20231130162133.035359406@linuxfoundation.org>

5.15-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mingzhe Zou <mingzhe.zou@easystack.cn>

commit e34820f984512b433ee1fc291417e60c47d56727 upstream.

We hit an I/O hang caused by a thread waiting for the lock on c->root
to be released.

crash> cache_set.root -l cache_set.list ffffa03fde4c0050
  root = 0xffff802ef454c800
crash> btree -o 0xffff802ef454c800 | grep rw_semaphore
  [ffff802ef454c858] struct rw_semaphore lock;
crash> struct rw_semaphore ffff802ef454c858
struct rw_semaphore {
  count = {
    counter = -4294967297
  },
  wait_list = {
    next = 0xffff00006786fc28,
    prev = 0xffff00005d0efac8
  },
  wait_lock = {
    raw_lock = {
      {
        val = {
          counter = 0
        },
        {
          locked = 0 '\000',
          pending = 0 '\000'
        },
        {
          locked_pending = 0,
          tail = 0
        }
      }
    }
  },
  osq = {
    tail = {
      counter = 0
    }
  },
  owner = 0xffffa03fdc586603
}

The value "counter = -4294967297" means that the lock count is -1 and a
write lock is being attempted. We then found a btree node with a counter
of 1 in btree_cache_freeable.
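
As a rough check of that reading, the counter can be split into two parts.
The sketch below assumes the older 64-bit rwsem layout, in which the low
32 bits hold the active count and each waiter adds a bias of -2^32. That
layout and the helper names are assumptions made for illustration; they are
not taken from this patch or from the crashed kernel's source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: low 32 bits of the count, read as a signed value. */
static int32_t rwsem_active_part(int64_t count)
{
	uint32_t low = (uint32_t)count;	/* modular conversion, well defined */

	if (low >= 0x80000000u)
		return (int32_t)((int64_t)low - 4294967296LL);
	return (int32_t)low;
}

/* Hypothetical helper: what remains once the active part is removed. */
static int64_t rwsem_waiting_part(int64_t count)
{
	return count - rwsem_active_part(count);
}

/* Usage: the value seen in the crash dump. */
static void check_crash_value(void)
{
	const int64_t count = -4294967297LL;

	assert(rwsem_active_part(count) == -1);
	assert(rwsem_waiting_part(count) == -4294967296LL);	/* one -2^32 bias */
}
```

Under that assumed layout, -4294967297 is -2^32 - 1: an active count of -1
plus one waiter bias.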

crash> cache_set -l cache_set.list ffffa03fde4c0050 -o|grep btree_cache
  [ffffa03fde4c1140] struct list_head btree_cache;
  [ffffa03fde4c1150] struct list_head btree_cache_freeable;
  [ffffa03fde4c1160] struct list_head btree_cache_freed;
  [ffffa03fde4c1170] unsigned int btree_cache_used;
  [ffffa03fde4c1178] wait_queue_head_t btree_cache_wait;
  [ffffa03fde4c1190] struct task_struct *btree_cache_alloc_lock;
crash> list -H ffffa03fde4c1140|wc -l
973
crash> list -H ffffa03fde4c1150|wc -l
1123
crash> cache_set.btree_cache_used -l cache_set.list ffffa03fde4c0050
  btree_cache_used = 2097
crash> list -s btree -l btree.list -H ffffa03fde4c1140|grep -E -A2 "^  lock = {" > btree_cache.txt
crash> list -s btree -l btree.list -H ffffa03fde4c1150|grep -E -A2 "^  lock = {" > btree_cache_freeable.txt
[root@node-3 127.0.0.1-2023-08-04-16:40:28]# pwd
/var/crash/127.0.0.1-2023-08-04-16:40:28
[root@node-3 127.0.0.1-2023-08-04-16:40:28]# cat btree_cache.txt|grep counter|grep -v "counter = 0"
[root@node-3 127.0.0.1-2023-08-04-16:40:28]# cat btree_cache_freeable.txt|grep counter|grep -v "counter = 0"
      counter = 1

We found that this is a bug in bch_sectors_dirty_init() when locking c->root:
    (1). Thread X holds the write lock on c->root (A).
    (2). Thread Y fails to lock c->root (A) and waits for that lock.
    (3). Thread X calls bch_btree_set_root(), changing c->root from A to B.
    (4). Thread X releases the lock on A.
    (5). Thread Y acquires the lock on A.
    (6). Thread Y releases the lock on c->root, which is now B, so A is
         never unlocked.

        down_write locked ---(1)----------------------┐
                |                                     |
                |   down_read waiting ---(2)----┐     |
                |           |               ┌-------------┐ ┌-------------┐
        bch_btree_set_root ===(3)========>> | c->root   A | | c->root   B |
                |           |               └-------------┘ └-------------┘
            up_write ---(4)---------------------┘     |            |
                            |                         |            |
                    down_read locked ---(5)-----------┘            |
                            |                                      |
                        up_read ---(6)-----------------------------┘
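
The six steps can be replayed in a single thread with a toy lock that only
counts holders. This is a minimal sketch for illustration: the toy_* types
and replay_buggy_sequence() are hypothetical and do not model the kernel
rw_semaphore beyond a reader count and a writer flag.

```c
#include <assert.h>

/* Toy lock state; readers goes negative if unlocked without being locked. */
struct toy_node {
	int readers;
	int writer;
};

struct toy_set {
	struct toy_node *root;
};

struct replay_result {
	int a_readers;
	int b_readers;
};

static struct replay_result replay_buggy_sequence(void)
{
	struct toy_node a = { 0, 0 }, b = { 0, 0 };
	struct toy_set c = { &a };
	struct toy_node *y_waiting_on;

	c.root->writer = 1;		/* (1) X write-locks c->root (A) */
	y_waiting_on = c.root;		/* (2) Y evaluates c->root and blocks on A */
	assert(y_waiting_on == &a);
	c.root = &b;			/* (3) X switches the root from A to B */
	a.writer = 0;			/* (4) X releases A */
	y_waiting_on->readers++;	/* (5) Y now holds a read lock on A */
	c.root->readers--;		/* (6) Y unlocks c->root, which is B */

	return (struct replay_result){ a.readers, b.readers };
}
```

In this model A ends with one holder that never goes away and B ends with a
count of -1, which appears to be in line with the counter of 1 and the lock
count of -1 reported above.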

Since c->root may change while a thread waits for the lock, c->root must
be locked the same way bch_root_usage() does it: take the lock, then check
that the locked node is still c->root, and retry if it is not.

static unsigned int bch_root_usage(struct cache_set *c)
{
        unsigned int bytes = 0;
        struct bkey *k;
        struct btree *b;
        struct btree_iter iter;

        goto lock_root;

        do {
                rw_unlock(false, b);
lock_root:
                b = c->root;
                rw_lock(false, b, b->level);
        } while (b != c->root);

        for_each_key_filter(&b->keys, k, &iter, bch_ptr_bad)
                bytes += bkey_bytes(k);

        rw_unlock(false, b);

        return (bytes * 100) / btree_bytes(c);
}
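
The effect of that compare-after-lock loop can be sketched with a toy model
in which the root is swapped once while the reader is waiting. The toy_*
names, the swap_to parameter and replay_fixed_sequence() are hypothetical
and exist only for this illustration; the lock is reduced to a holder count.

```c
#include <assert.h>
#include <stddef.h>

struct toy_node { int readers; };
struct toy_set  { struct toy_node *root; };

struct fixed_result {
	int a_readers;
	int b_readers;
	int retries;
};

/* Lock the current root, retrying if the root changed while "waiting". */
static struct toy_node *toy_lock_root(struct toy_set *c,
				      struct toy_node *swap_to, int *retries)
{
	struct toy_node *b;

	assert(c && c->root && retries);
	for (;;) {
		b = c->root;		/* snapshot the pointer first */
		if (swap_to) {		/* simulate one root change while waiting */
			c->root = swap_to;
			swap_to = NULL;
		}
		b->readers++;		/* lock acquired on the snapshot */
		if (b == c->root)	/* still the root: done */
			return b;
		b->readers--;		/* stale node: unlock it and retry */
		(*retries)++;
	}
}

static struct fixed_result replay_fixed_sequence(void)
{
	struct toy_node a = { 0 }, b = { 0 };
	struct toy_set c = { &a };
	int retries = 0;
	struct toy_node *locked = toy_lock_root(&c, &b, &retries);

	locked->readers--;	/* unlock the node that was locked, not c->root */
	return (struct fixed_result){ a.readers, b.readers, retries };
}
```

Here both counts return to zero after one retry, because the unlock is
always applied to the node that was actually locked.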

Fixes: b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be multithreaded")
Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Cc:  <stable@vger.kernel.org>
Signed-off-by: Coly Li <colyli@suse.de>
Link: https://lore.kernel.org/r/20231120052503.6122-7-colyli@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/md/bcache/writeback.c |   14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -967,14 +967,22 @@ static int bch_btre_dirty_init_thread_nr
 void bch_sectors_dirty_init(struct bcache_device *d)
 {
 	int i;
+	struct btree *b = NULL;
 	struct bkey *k = NULL;
 	struct btree_iter iter;
 	struct sectors_dirty_init op;
 	struct cache_set *c = d->c;
 	struct bch_dirty_init_state state;
 
+retry_lock:
+	b = c->root;
+	rw_lock(0, b, b->level);
+	if (b != c->root) {
+		rw_unlock(0, b);
+		goto retry_lock;
+	}
+
 	/* Just count root keys if no leaf node */
-	rw_lock(0, c->root, c->root->level);
 	if (c->root->level == 0) {
 		bch_btree_op_init(&op.op, -1);
 		op.inode = d->id;
@@ -987,7 +995,7 @@ void bch_sectors_dirty_init(struct bcach
 			sectors_dirty_init_fn(&op.op, c->root, k);
 		}
 
-		rw_unlock(0, c->root);
+		rw_unlock(0, b);
 		return;
 	}
 
@@ -1024,7 +1032,7 @@ void bch_sectors_dirty_init(struct bcach
 out:
 	/* Must wait for all threads to stop. */
 	wait_event(state.wait, atomic_read(&state.started) == 0);
-	rw_unlock(0, c->root);
+	rw_unlock(0, b);
 }
 
 void bch_cached_dev_writeback_init(struct cached_dev *dc)


