public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Guenter Roeck <linux@roeck-us.net>
To: Breno Leitao <leitao@debian.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Thomas Graf <tgraf@suug.ch>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Tejun Heo <tj@kernel.org>, Hao Luo <haoluo@google.com>,
	Josh Don <joshdon@google.com>, Barret Rhoden <brho@google.com>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock
Date: Mon, 13 Jan 2025 11:50:36 -0800	[thread overview]
Message-ID: <6a0b432d-284a-45eb-991a-c7bba2c93e0b@roeck-us.net> (raw)
In-Reply-To: <20241128-scx_lockdep-v1-1-2315b813b36b@debian.org>

Hi,

On Thu, Nov 28, 2024 at 04:16:25AM -0800, Breno Leitao wrote:
> Move the hash table growth check and work scheduling outside the
> rht lock to prevent a possible circular locking dependency.
> 
> The original implementation could trigger a lockdep warning due to
> a potential deadlock scenario involving nested locks between
> rhashtable bucket, rq lock, and dsq lock. By relocating the
> growth check and work scheduling after releasing the rth lock, we break
> this potential deadlock chain.
> 
> This change expands the flexibility of rhashtable by removing
> restrictive locking that previously limited its use in scheduler
> and workqueue contexts.
> 
> Import to say that this calls rht_grow_above_75(), which reads from
> struct rhashtable without holding the lock, if this is a problem, we can
> move the check to the lock, and schedule the workqueue after the lock.
> 
> Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
> Suggested-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Breno Leitao <leitao@debian.org>

With this patch in linux-next, I get some unit test errors.

[    3.800185]     # Subtest: hw_breakpoint
[    3.800469]     # module: hw_breakpoint_test
[    3.800718]     1..9
[    3.810825]     # test_one_cpu: pass:1 fail:0 skip:0 total:1
[    3.810950]     ok 1 test_one_cpu
[    3.814941]     # test_many_cpus: pass:1 fail:0 skip:0 total:1
[    3.815092]     ok 2 test_many_cpus
[    3.822977]     # test_one_task_on_all_cpus: pass:1 fail:0 skip:0 total:1
[    3.823100]     ok 3 test_one_task_on_all_cpus
[    3.829071]     # test_two_tasks_on_all_cpus: pass:1 fail:0 skip:0 total:1
[    3.829199]     ok 4 test_two_tasks_on_all_cpus
[    3.830914]     # test_one_task_on_one_cpu: ASSERTION FAILED at kernel/events/hw_breakpoint_test.c:70
[    3.830914]     Expected IS_ERR(bp) to be false, but is true
[    3.832572]     # test_one_task_on_one_cpu: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320
[    3.832572]     Expected hw_breakpoint_is_used() to be false, but is true
[    3.833002]     # test_one_task_on_one_cpu: pass:0 fail:1 skip:0 total:1
[    3.833071]     not ok 5 test_one_task_on_one_cpu
[    3.834994]     # test_one_task_mixed: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320
[    3.834994]     Expected hw_breakpoint_is_used() to be false, but is true
[    3.835583]     # test_one_task_mixed: pass:0 fail:1 skip:0 total:1
[    3.835638]     not ok 6 test_one_task_mixed
[    3.837131]     # test_two_tasks_on_one_cpu: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320
[    3.837131]     Expected hw_breakpoint_is_used() to be false, but is true
[    3.837827]     # test_two_tasks_on_one_cpu: pass:0 fail:1 skip:0 total:1
[    3.837882]     not ok 7 test_two_tasks_on_one_cpu
[    3.839868]     # test_two_tasks_on_one_all_cpus: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320
[    3.839868]     Expected hw_breakpoint_is_used() to be false, but is true
[    3.840294]     # test_two_tasks_on_one_all_cpus: pass:0 fail:1 skip:0 total:1
[    3.840538]     not ok 8 test_two_tasks_on_one_all_cpus
[    3.843599]     # test_task_on_all_and_one_cpu: EXPECTATION FAILED at kernel/events/hw_breakpoint_test.c:320
[    3.843599]     Expected hw_breakpoint_is_used() to be false, but is true
[    3.844163]     # test_task_on_all_and_one_cpu: pass:0 fail:1 skip:0 total:1
[    3.844215]     not ok 9 test_task_on_all_and_one_cpu
[    3.844453] # hw_breakpoint: pass:4 fail:5 skip:0 total:9
[    3.844610] # Totals: pass:4 fail:5 skip:0 total:9
[    3.844797] not ok 1 hw_breakpoint

Sometimes I also see:

[   12.579842]     # Subtest: Handshake API tests
[   12.579971]     1..11
[   12.580052]         KTAP version 1
[   12.580410]         # Subtest: req_alloc API fuzzing
[   12.582206]         ok 1 handshake_req_alloc NULL proto
[   12.583541]         ok 2 handshake_req_alloc CLASS_NONE
[   12.585419]         ok 3 handshake_req_alloc CLASS_MAX
[   12.587291]         ok 4 handshake_req_alloc no callbacks
[   12.589239]         ok 5 handshake_req_alloc no done callback
[   12.590758]         ok 6 handshake_req_alloc excessive privsize
[   12.592642]         ok 7 handshake_req_alloc all good
[   12.592802]     # req_alloc API fuzzing: pass:7 fail:0 skip:0 total:7
[   12.593185]     ok 1 req_alloc API fuzzing
[   12.597371]     # req_submit NULL req arg: pass:1 fail:0 skip:0 total:1
[   12.597501]     ok 2 req_submit NULL req arg
[   12.599208]     # req_submit NULL sock arg: pass:1 fail:0 skip:0 total:1
[   12.599338]     ok 3 req_submit NULL sock arg
[   12.601549]     # req_submit NULL sock->file: pass:1 fail:0 skip:0 total:1
[   12.601680]     ok 4 req_submit NULL sock->file
[   12.605334]     # req_lookup works: pass:1 fail:0 skip:0 total:1
[   12.605469]     ok 5 req_lookup works
[   12.609596]     # req_submit max pending: pass:1 fail:0 skip:0 total:1
[   12.609730]     ok 6 req_submit max pending
[   12.613796]     # req_submit multiple: pass:1 fail:0 skip:0 total:1
[   12.614250]     ok 7 req_submit multiple
[   12.616395]     # req_cancel before accept: ASSERTION FAILED at net/handshake/handshake-test.c:333
[   12.616395]     Expected err == 0, but
[   12.616395]         err == -16 (0xfffffffffffffff0)
[   12.618061]     # req_cancel before accept: pass:0 fail:1 skip:0 total:1
[   12.618135]     not ok 8 req_cancel before accept
[   12.619437]     # req_cancel after accept: ASSERTION FAILED at net/handshake/handshake-test.c:369
[   12.619437]     Expected err == 0, but
[   12.619437]         err == -16 (0xfffffffffffffff0)
[   12.621055]     # req_cancel after accept: pass:0 fail:1 skip:0 total:1
[   12.621119]     not ok 9 req_cancel after accept
[   12.622342]     # req_cancel after done: ASSERTION FAILED at net/handshake/handshake-test.c:411
[   12.622342]     Expected err == 0, but
[   12.622342]         err == -16 (0xfffffffffffffff0)
[   12.623547]     # req_cancel after done: pass:0 fail:1 skip:0 total:1
[   12.623608]     not ok 10 req_cancel after done
[   12.625297]     # req_destroy works: ASSERTION FAILED at net/handshake/handshake-test.c:469
[   12.625297]     Expected err == 0, but
[   12.625297]         err == -16 (0xfffffffffffffff0)
[   12.626633]     # req_destroy works: pass:0 fail:1 skip:0 total:1
[   12.626696]     not ok 11 req_destroy works
[   12.626837] # Handshake API tests: pass:7 fail:4 skip:0 total:11
[   12.627298] # Totals: pass:13 fail:4 skip:0 total:17
[   12.627446] not ok 90 Handshake API tests

The log is from a test with x86_64, but other architectures are affected
as well. Reverting this patch fixes the problem.

Bisect log is attached for reference.

Guenter

---
# bad: [37136bf5c3a6f6b686d74f41837a6406bec6b7bc] Add linux-next specific files for 20250113
# good: [9d89551994a430b50c4fffcb1e617a057fa76e20] Linux 6.13-rc6
git bisect start 'HEAD' 'v6.13-rc6'
# good: [25dcaaf9b3bdaa117b8eb722ebde76ec9ed30038] Merge branch 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
git bisect good 25dcaaf9b3bdaa117b8eb722ebde76ec9ed30038
# bad: [c6ab5ee56509953c3ee6647ac9f266a7c628f082] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git
git bisect bad c6ab5ee56509953c3ee6647ac9f266a7c628f082
# good: [39388d53c57be95eafb0ce1d81d0ec6bd2f6f42d] Merge tag 'cgroup-dmem-drm-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into drm-next
git bisect good 39388d53c57be95eafb0ce1d81d0ec6bd2f6f42d
# bad: [0f8b2b2250abe043cef890caa378bebe5c4f5d88] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394.git
git bisect bad 0f8b2b2250abe043cef890caa378bebe5c4f5d88
# good: [67fcb0469b17071890761d437bdf83d2e2d14575] Merge branch 'spi-nor/next' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git
git bisect good 67fcb0469b17071890761d437bdf83d2e2d14575
# bad: [7fd4a4e4c397fc315f8ebf0fb22e50526f66580a] Merge branch 'drm-next' of https://gitlab.freedesktop.org/agd5f/linux
git bisect bad 7fd4a4e4c397fc315f8ebf0fb22e50526f66580a
# good: [e2c4c6c10542ccfe4a0830bb6c9fd5b177b7bbb7] drm/amd/display: Initialize denominator defaults to 1
git bisect good e2c4c6c10542ccfe4a0830bb6c9fd5b177b7bbb7
# bad: [5b7981c1ca61ca7ad7162cfe95bf271d001d29ac] crypto: x86/aes-xts - use .irp when useful
git bisect bad 5b7981c1ca61ca7ad7162cfe95bf271d001d29ac
# good: [ce8fd0500b741b3669c246cc604f1f2343cdd6fd] crypto: qce - use __free() for a buffer that's always freed
git bisect good ce8fd0500b741b3669c246cc604f1f2343cdd6fd
# good: [5e252f490c1c2c989cdc2ca50744f30fbca356b4] crypto: tea - stop using cra_alignmask
git bisect good 5e252f490c1c2c989cdc2ca50744f30fbca356b4
# good: [f916e44487f56df4827069ff3a2070c0746dc511] crypto: keywrap - remove assignment of 0 to cra_alignmask
git bisect good f916e44487f56df4827069ff3a2070c0746dc511
# bad: [b9b894642fede191d50230d08608bd4f4f49f73d] crypto: lib/gf128mul - Remove some bbe deadcode
git bisect bad b9b894642fede191d50230d08608bd4f4f49f73d
# bad: [e1d3422c95f003eba241c176adfe593c33e8a8f6] rhashtable: Fix potential deadlock by moving schedule_work outside lock
git bisect bad e1d3422c95f003eba241c176adfe593c33e8a8f6
# first bad commit: [e1d3422c95f003eba241c176adfe593c33e8a8f6] rhashtable: Fix potential deadlock by moving schedule_work outside lock

  parent reply	other threads:[~2025-01-13 19:50 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-28 12:16 [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock Breno Leitao
2024-11-29  5:47 ` Herbert Xu
2024-12-02 20:16   ` Breno Leitao
2024-12-03 20:16 ` Tejun Heo
2024-12-04  1:34   ` Herbert Xu
2024-12-12 12:33 ` Herbert Xu
2024-12-21  9:06   ` Herbert Xu
2025-01-02 10:15     ` Breno Leitao
2025-01-09  3:16       ` Michael Kelley
2025-01-09 10:15         ` Breno Leitao
2025-01-10  9:27           ` Herbert Xu
2025-01-10  9:49             ` Breno Leitao
2025-01-10 10:07               ` Herbert Xu
2025-01-10 14:46             ` Zaslonko Mikhail
2025-01-10 16:59             ` Michael Kelley
2025-01-10 17:24               ` [v2 PATCH] rhashtable: Fix rhashtable_try_insert test Herbert Xu
2025-01-10 18:22                 ` Michael Kelley
2025-01-14  3:15                   ` [v3 " Herbert Xu
2025-01-14 11:58                     ` Michael Kelley
2025-01-15 15:15                     ` Breno Leitao
2025-01-16  9:10                       ` Herbert Xu
2025-01-16 11:48                     ` Alexander Gordeev
2025-01-17 13:20                     ` Zaslonko Mikhail
2025-01-13 19:50 ` Guenter Roeck [this message]
2025-01-14  3:23   ` [PATCH] rhashtable: Fix potential deadlock by moving schedule_work outside lock Herbert Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6a0b432d-284a-45eb-991a-c7bba2c93e0b@roeck-us.net \
    --to=linux@roeck-us.net \
    --cc=akpm@linux-foundation.org \
    --cc=brho@google.com \
    --cc=haoluo@google.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=joshdon@google.com \
    --cc=leitao@debian.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=tgraf@suug.ch \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox