All of lore.kernel.org
 help / color / mirror / Atom feed
From: Minlan Wang via lttng-dev <lttng-dev@lists.lttng.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: lttng-dev <lttng-dev@lists.lttng.org>
Subject: Re: [lttng-dev] urcu workqueue thread uses 99% of cpu while workqueue is empty
Date: Thu, 16 Jun 2022 04:09:12 -0400	[thread overview]
Message-ID: <20220616080912.GA246781@localhost.localdomain> (raw)
In-Reply-To: <396090936.61897.1655302558168.JavaMail.zimbra@efficios.com>

[-- Attachment #1: Type: text/plain, Size: 4422 bytes --]

Hi, Mathieu,
I tried to write a simple program to reproduce this issue, but failed.
The environment we used to produce this issue is a storage cluster.
It has 3 nodes, each with an NVME and several SATA disks, urcu is part of the
storage cluster software.
The storage software is pretty big, 3+ GB in size. I guess it's hard for you to
build a environment like ours.

Besides, I made some change into workqueue.c to debug this issue today, here's
the patch i used for debuging:

From 18f3fea1436e0c999b346b998b7153150b92e62f Mon Sep 17 00:00:00 2001
From: wangminlan <wangminlan@szsandstone.com>
Date: Wed, 15 Jun 2022 14:28:53 +0800
Subject: [PATCH] <urcu> debug workqueue->futex

---
 .../userspace-rcu-0.12.1/src/workqueue.c           | 49 ++++++++++++++++++----
 1 file changed, 42 insertions(+), 7 deletions(-)

diff --git a/ceph/thirdpart/ascache/thirdparty/userspace-rcu-0.12.1/src/workqueue.c b/ceph/thirdpart/ascache/thirdparty/userspace-rcu-0.12.1/src/workqueue.c
index 59eb21d..eed1e21 100644
--- a/ceph/thirdpart/ascache/thirdparty/userspace-rcu-0.12.1/src/workqueue.c
+++ b/ceph/thirdpart/ascache/thirdparty/userspace-rcu-0.12.1/src/workqueue.c
@@ -132,12 +132,21 @@ static int set_thread_cpu_affinity(struct urcu_workqueue *workqueue)
 
 static void futex_wait(int32_t *futex)
 {
+	int ret;
+	int err;
 	/* Read condition before read futex */
 	cmm_smp_mb();
-	if (uatomic_read(futex) != -1)
+	if (uatomic_read(futex) != -1) {
+		fprintf(stderr, "%lu: wq %p: %s futex != -1, don't wait\n",
+			pthread_self(), caa_container_of(futex, struct urcu_workqueue, futex), __func__);
+
 		return;
-	while (futex_async(futex, FUTEX_WAIT, -1, NULL, NULL, 0)) {
-		switch (errno) {
+	}
+	while ((ret = futex_async(futex, FUTEX_WAIT, -1, NULL, NULL, 0))) {
+		err = errno;
+		fprintf(stderr, "%lu: wq %p: %s failed, errno %d\n",
+			pthread_self(), caa_container_of(futex, struct urcu_workqueue, futex), __func__, err);
+		switch (err) {
 		case EWOULDBLOCK:
 			/* Value already changed. */
 			return;
@@ -146,17 +155,35 @@ static void futex_wait(int32_t *futex)
 			break;	/* Get out of switch. */
 		default:
 			/* Unexpected error. */
-			urcu_die(errno);
+			urcu_die(err);
 		}
 	}
+
+	fprintf(stderr, "%lu: wq %p: %s wait return %d\n",
+		pthread_self(), caa_container_of(futex, struct urcu_workqueue, futex), __func__, ret);
+
 }
 
 static void futex_wake_up(int32_t *futex)
 {
 	/* Write to condition before reading/writing futex */
+	int32_t old;
+
 	cmm_smp_mb();
-	if (caa_unlikely(uatomic_read(futex) == -1)) {
-		uatomic_set(futex, 0);
+	old = uatomic_read(futex);
+	if (caa_unlikely(old == -1)) {
+		old = uatomic_xchg(futex, 0);
+		if (old == -1) {
+			fprintf(stderr, "%lu: wq %p, wakeup succeed: old %d\n",
+				pthread_self(),
+				caa_container_of(futex, struct urcu_workqueue, futex),
+				old);
+		} else {
+			fprintf(stderr, "%lu: wq %p, wakeup failed: old %d\n",
+				pthread_self(),
+				caa_container_of(futex, struct urcu_workqueue, futex),
+				old);
+		}
 		if (futex_async(futex, FUTEX_WAKE, 1,
 				NULL, NULL, 0) < 0)
 			urcu_die(errno);
@@ -237,8 +264,16 @@ static void *workqueue_thread(void *arg)
 		if (!rt) {
 			if (cds_wfcq_empty(&workqueue->cbs_head,
 					&workqueue->cbs_tail)) {
+				int32_t new;
 				futex_wait(&workqueue->futex);
-				uatomic_dec(&workqueue->futex);
+				new = uatomic_add_return(&workqueue->futex, -1);
+				if (new == -1) {
+					fprintf(stderr, "%lu: wq %p dec succeed: old %d, new %d\n",
+						pthread_self(), workqueue, new + 1, new);
+				} else {
+					fprintf(stderr, "%lu: wq %p dec failed: old %d\n",
+						pthread_self(), workqueue, new + 1);
+				}
 				/*
 				 * Decrement futex before reading
 				 * urcu_work list.
-- 
1.8.3.1

And the instresting log began from here:
---
...
   2097 140196498532096: wq 0x55f69a99ea00, wakeup succeed: old -1	/* enqueue op set futex: -1 -> 0 */
   2098 140196272453376: wq 0x55f69a99ea00: futex_wait wait return 0	/* workqueue_thread wait success */
   2099 140196272453376: wq 0x55f69a99ea00 dec failed: old -1		/* workqueue_thread got futex == -1 ? */
   2100 140196272453376: wq 0x55f69a99ea00: futex_wait futex != -1, don't wait
   2101 140196272453376: wq 0x55f69a99ea00 dec failed: old -2
...
---
In line 2099, how did workqueue_thread got futex as -1? This is wierd.

I put the patch i used to debug, and the complete log of the reproduce into the
attachment, may this helps.

Thanks,
Minlan

[-- Attachment #2: debug.tar.gz --]
[-- Type: application/gzip, Size: 5023 bytes --]

[-- Attachment #3: Type: text/plain, Size: 156 bytes --]

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

  parent reply	other threads:[~2022-06-16  8:09 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-14  3:55 [lttng-dev] urcu workqueue thread uses 99% of cpu while workqueue is empty Minlan Wang via lttng-dev
2022-06-14 13:39 ` Mathieu Desnoyers via lttng-dev
2022-06-14 13:40   ` Mathieu Desnoyers via lttng-dev
2022-06-14 14:19 ` Mathieu Desnoyers via lttng-dev
2022-06-14 15:53 ` Mathieu Desnoyers via lttng-dev
2022-06-15  3:49   ` Minlan Wang via lttng-dev
2022-06-15 13:35     ` Mathieu Desnoyers via lttng-dev
2022-06-15 14:15     ` Mathieu Desnoyers via lttng-dev
2022-06-16  7:09       ` Minlan Wang via lttng-dev
2022-06-16  8:09       ` Minlan Wang via lttng-dev [this message]
2022-06-17 13:37         ` Mathieu Desnoyers via lttng-dev
2022-06-21  3:52           ` Minlan Wang via lttng-dev
2022-06-21 13:12             ` [lttng-dev] ***UNCHECKED*** " Mathieu Desnoyers via lttng-dev
2022-06-22  7:45               ` Minlan Wang via lttng-dev
2022-06-22 13:19                 ` [lttng-dev] ***UNCHECKED*** " Mathieu Desnoyers via lttng-dev
2022-06-22 20:28                   ` Mathieu Desnoyers via lttng-dev
2022-06-22 20:52                     ` Mathieu Desnoyers via lttng-dev
2022-06-23  9:08                       ` Minlan Wang via lttng-dev
     [not found]                   ` <20220623034528.GA271179@localhost.localdomain>
2022-06-23  3:57                     ` Minlan Wang via lttng-dev
2022-06-23 14:09                       ` Mathieu Desnoyers via lttng-dev
2022-06-24  6:21                         ` Minlan Wang via lttng-dev
2022-06-23  8:36                   ` [lttng-dev] [PART 4/4] " Minlan Wang via lttng-dev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220616080912.GA246781@localhost.localdomain \
    --to=lttng-dev@lists.lttng.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=wangminlan@szsandstone.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.