From: bugzilla@dpdk.org
To: dev@dpdk.org
Subject: [Bug 1277] memory_hotplug_lock deadlock during initialization
Date: Wed, 23 Aug 2023 14:02:23 +0000
Message-ID: <bug-1277-3@http.bugs.dpdk.org/>
https://bugs.dpdk.org/show_bug.cgi?id=1277
Bug ID: 1277
Summary: memory_hotplug_lock deadlock during initialization
Product: DPDK
Version: unspecified
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: Normal
Component: core
Assignee: dev@dpdk.org
Reporter: artemyko@nvidia.com
Target Milestone: ---
The issue appears to stem from changes to the DPDK read-write lock
implementation. Since those changes, the RW-lock no longer supports
recursion: a thread must not take a read lock that it already holds. The
problem arises during initialization: rte_eal_memory_init() acquires
memory_hotplug_lock, and the subsequent call chain
eal_memalloc_init() -> rte_memseg_list_walk() acquires it again before the
first hold is released. If another process requests the write lock on the same
memory_hotplug_lock between those two acquisitions, a deadlock can result:
presumably the writer waits for the first read lock to be released while the
nested read lock waits behind the pending writer. As a local workaround, we
replaced rte_memseg_list_walk() with rte_memseg_list_walk_thread_unsafe().
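The interleaving described above can be sketched with a toy, single-threaded
model of a writer-preferring read-write lock. This is not DPDK's rte_rwlock
code: every model_* name is hypothetical, the lock calls are non-blocking
"try" variants so the example terminates, and it assumes that the lock refuses
new readers once a writer is waiting, which is what the report implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model, NOT the DPDK implementation. */
#define MODEL_WAIT  0x1u /* a writer is waiting for the lock */
#define MODEL_WRITE 0x2u /* a writer holds the lock */
#define MODEL_READ  0x4u /* one reader; readers are counted above bit 1 */

struct model_rwlock {
	uint32_t cnt;
};

/* Read attempt: refused while a writer holds or waits for the lock. */
static bool
model_read_trylock(struct model_rwlock *l)
{
	if (l->cnt & (MODEL_WAIT | MODEL_WRITE))
		return false;
	l->cnt += MODEL_READ;
	return true;
}

static void
model_read_unlock(struct model_rwlock *l)
{
	l->cnt -= MODEL_READ;
}

/* Write attempt: succeeds only with no readers and no writer;
 * otherwise the caller is recorded as a waiting writer. */
static bool
model_write_trylock(struct model_rwlock *l)
{
	if (l->cnt < MODEL_WRITE) {
		l->cnt = MODEL_WRITE;
		return true;
	}
	l->cnt |= MODEL_WAIT;
	return false;
}

/* Nested read lock: returns true when both parties end up stuck. */
static bool
nested_read_deadlocks(void)
{
	struct model_rwlock lock = { 0 };
	bool outer = model_read_trylock(&lock);   /* init path, outer hold */
	bool writer = model_write_trylock(&lock); /* other process wants write */
	bool inner = model_read_trylock(&lock);   /* init path, nested hold */

	/* Writer waits for the outer reader; inner reader waits for writer. */
	return outer && !writer && !inner;
}

/* Walk without re-locking: returns true when the writer gets its turn. */
static bool
unlocked_walk_completes(void)
{
	struct model_rwlock lock = { 0 };
	bool outer = model_read_trylock(&lock);
	bool writer = model_write_trylock(&lock); /* must wait for the reader */
	bool writer_retry;

	/* The walk runs here under the outer hold, with no nested lock. */
	model_read_unlock(&lock);
	writer_retry = model_write_trylock(&lock);

	return outer && !writer && writer_retry;
}
```

Calling nested_read_deadlocks() reproduces the stuck state in the model, while
unlocked_walk_completes() mirrors the local workaround: the walk relies on the
lock the caller already holds, so the waiting writer can proceed once that
single hold is released.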
Reproduction:
Create an mp_deadlock directory under dpdk/examples/, then add the following
main.c:
/* SPDX-License-Identifier: BSD-3-Clause
 * Copyright(c) 2010-2014 Intel Corporation
 */

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <errno.h>
#include <sys/queue.h>

#include <rte_memory.h>
#include <rte_malloc.h>
#include <rte_launch.h>
#include <rte_eal.h>
#include <rte_per_lcore.h>
#include <rte_lcore.h>
#include <rte_debug.h>

/* Initialization of Environment Abstraction Layer (EAL). 8< */
int
main(int argc, char **argv)
{
	int ret;

	ret = rte_eal_init(argc, argv);
	if (ret < 0)
		rte_panic("Cannot init EAL\n");
	/* >8 End of initialization of Environment Abstraction Layer */

	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
		/* Primary: stay alive until a key is pressed. */
		getchar();
	else {
		/* Secondary on lcore 0 or 1: allocate and free in a loop.
		 * Secondaries on other lcores only run EAL init and cleanup.
		 */
		if (rte_lcore_id() <= 1) {
			int i = 0;
			void *p;

			while (1) {
				/* 16 MiB, 4 KiB-aligned, any socket. */
				p = rte_malloc_socket(NULL, 0x1000000, 0x1000,
						-1);
				rte_free(p);
				printf("malloc %d times\n", i++);
			}
		}
	}

	/* clean up the EAL */
	rte_eal_cleanup();

	return 0;
}
Compile: I followed https://doc.dpdk.org/guides/prog_guide/build_app.html and
some tips from related web pages.
Run primary:
  ./examples/mp_deadlock/build/mp_deadlock -l 0 --file-prefix=dpdk1 --proc-type=primary
Run secondary 1:
  ./examples/mp_deadlock/build/mp_deadlock -l 1 --file-prefix=dpdk1 --proc-type=secondary
Run secondary 2:
  while true
  do
    ./examples/mp_deadlock/build/mp_deadlock -l 2 --file-prefix=dpdk1 --proc-type=secondary
  done
Stack trace: the deadlock appears to involve the following two frames.
Secondary 1 (-l 1) is blocked in:
#0 0x00007f850e97a3f2 in rte_mcfg_mem_write_lock () from /usr/local/lib64/librte_eal.so.23
and secondary 2 (-l 2) is blocked in:
#0 0x00007f3f591b5362 in rte_mcfg_mem_read_lock () from /usr/local/lib64/librte_eal.so.23
[root@fedora dpdk]# ps -ef | grep deadlock
root 7328 1004 0 20:47 pts/0 00:00:00 bash ./mp_deadlock1.sh
root 7329 7328 4 20:47 pts/0 00:00:01 ./examples/mp_deadlock/build/mp_deadlock -l 0 --file-prefix=dpdk1 --proc-type=primary
root 7333 5693 0 20:47 pts/4 00:00:00 bash ./mp_deadlock2.sh
root 7334 7333 94 20:47 pts/4 00:00:31 ./examples/mp_deadlock/build/mp_deadlock -l 1 --file-prefix=dpdk1 --proc-type=secondary
root 7337 5267 0 20:47 pts/1 00:00:00 bash ./mp_deadlock.sh
root 7338 7337 98 20:47 pts/1 00:00:29 ./examples/mp_deadlock/build/mp_deadlock -l 2 --file-prefix=dpdk1 --proc-type=secondary
root 7342 5480 0 20:47 pts/2 00:00:00 grep --color=auto deadlock
[root@fedora dpdk]# pstack 7329
Thread 4 (Thread 0x7f20ae487640 (LWP 7332) "telemetry-v2"):
#0 0x00007f20b200ae6f in accept () from /lib64/libc.so.6
#1 0x00007f20b1e004a3 in socket_listener () from /usr/local/lib64/librte_telemetry.so.23
#2 0x00007f20b1f85b17 in start_thread () from /lib64/libc.so.6
#3 0x00007f20b200a6a0 in clone3 () from /lib64/libc.so.6
Thread 3 (Thread 0x7f20aec88640 (LWP 7331) "rte_mp_handle"):
#0 0x00007f20b200b23d in recvmsg () from /lib64/libc.so.6
#1 0x00007f20b2137ecf in mp_handle () from /usr/local/lib64/librte_eal.so.23
#2 0x00007f20b1f85b17 in start_thread () from /lib64/libc.so.6
#3 0x00007f20b200a6a0 in clone3 () from /lib64/libc.so.6
Thread 2 (Thread 0x7f20af489640 (LWP 7330) "eal-intr-thread"):
#0 0x00007f20b2009c7e in epoll_wait () from /lib64/libc.so.6
#1 0x00007f20b2141c54 in eal_intr_thread_main () from /usr/local/lib64/librte_eal.so.23
#2 0x00007f20b1f85b17 in start_thread () from /lib64/libc.so.6
#3 0x00007f20b200a6a0 in clone3 () from /lib64/libc.so.6
Thread 1 (Thread 0x7f20b1df9900 (LWP 7329) "mp_deadlock"):
#0 0x00007f20b1ff984c in read () from /lib64/libc.so.6
#1 0x00007f20b1f7e914 in __GI__IO_file_underflow () from /lib64/libc.so.6
#2 0x00007f20b1f7f946 in _IO_default_uflow () from /lib64/libc.so.6
#3 0x00007f20b1f7a328 in getc () from /lib64/libc.so.6
#4 0x000000000040113e in main ()
[root@fedora dpdk]# pstack 7334
Thread 3 (Thread 0x7f850b4da640 (LWP 7336) "rte_mp_handle"):
#0 0x00007f850e85d23d in recvmsg () from /lib64/libc.so.6
#1 0x00007f850e989ecf in mp_handle () from /usr/local/lib64/librte_eal.so.23
#2 0x00007f850e7d7b17 in start_thread () from /lib64/libc.so.6
#3 0x00007f850e85c6a0 in clone3 () from /lib64/libc.so.6
Thread 2 (Thread 0x7f850bcdb640 (LWP 7335) "eal-intr-thread"):
#0 0x00007f850e85bc7e in epoll_wait () from /lib64/libc.so.6
#1 0x00007f850e993c54 in eal_intr_thread_main () from /usr/local/lib64/librte_eal.so.23
#2 0x00007f850e7d7b17 in start_thread () from /lib64/libc.so.6
#3 0x00007f850e85c6a0 in clone3 () from /lib64/libc.so.6
Thread 1 (Thread 0x7f850e64b900 (LWP 7334) "mp_deadlock"):
#0 0x00007f850e97a3f2 in rte_mcfg_mem_write_lock () from /usr/local/lib64/librte_eal.so.23
#1 0x00007f850e984509 in malloc_heap_free () from /usr/local/lib64/librte_eal.so.23
#2 0x00007f850e98508f in rte_free () from /usr/local/lib64/librte_eal.so.23
#3 0x0000000000401126 in main ()
[root@fedora dpdk]# pstack 7338
Thread 3 (Thread 0x7f3f55d15640 (LWP 7340) "rte_mp_handle"):
#0 0x00007f3f5909823d in recvmsg () from /lib64/libc.so.6
#1 0x00007f3f591c4ecf in mp_handle () from /usr/local/lib64/librte_eal.so.23
#2 0x00007f3f59012b17 in start_thread () from /lib64/libc.so.6
#3 0x00007f3f590976a0 in clone3 () from /lib64/libc.so.6
Thread 2 (Thread 0x7f3f56516640 (LWP 7339) "eal-intr-thread"):
#0 0x00007f3f59096c7e in epoll_wait () from /lib64/libc.so.6
#1 0x00007f3f591cec54 in eal_intr_thread_main () from /usr/local/lib64/librte_eal.so.23
#2 0x00007f3f59012b17 in start_thread () from /lib64/libc.so.6
#3 0x00007f3f590976a0 in clone3 () from /lib64/libc.so.6
Thread 1 (Thread 0x7f3f58e86900 (LWP 7338) "mp_deadlock"):
#0 0x00007f3f591b5362 in rte_mcfg_mem_read_lock () from /usr/local/lib64/librte_eal.so.23
#1 0x00007f3f591b6bf2 in rte_memseg_list_walk () from /usr/local/lib64/librte_eal.so.23
#2 0x00007f3f591d2f65 in eal_memalloc_init () from /usr/local/lib64/librte_eal.so.23
#3 0x00007f3f591b741b in rte_eal_memory_init () from /usr/local/lib64/librte_eal.so.23
#4 0x00007f3f591aab64 in rte_eal_init.cold () from /usr/local/lib64/librte_eal.so.23
#5 0x00000000004010d9 in main ()