public inbox for linux-kernel@vger.kernel.org
From: Barry Song <21cnbao@gmail.com>
To: davem@davemloft.net, kuba@kernel.org, edumazet@google.com,
	pabeni@redhat.com, fw@strlen.de, mingo@redhat.com,
	peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	bristot@redhat.com, tglx@linutronix.de, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Cc: linuxarm@huawei.com, guodong.xu@linaro.org,
	yangyicong@huawei.com, shenyang39@huawei.com,
	tangchengchang@huawei.com,
	Barry Song <song.bao.hua@hisilicon.com>,
	Libo Chen <libo.chen@oracle.com>,
	Tim Chen <tim.c.chen@linux.intel.com>
Subject: [RFC PATCH] sched&net: avoid over-pulling tasks due to network interrupts
Date: Fri,  5 Nov 2021 18:51:36 +0800	[thread overview]
Message-ID: <20211105105136.12137-1-21cnbao@gmail.com> (raw)

From: Barry Song <song.bao.hua@hisilicon.com>

At LPC2021, both Libo Chen and Tim Chen reported over-pulling of tasks
caused by network interrupts[1]. For example, while running a database
workload with the ethernet device on NUMA node 0, node 1 can end up
almost idle because interrupt-driven sync wakeups keep pulling tasks
to node 0 via wake-affine. I have seen the same problem. One way to
address it is to use a normal wakeup in the network code rather than a
sync wakeup, which the scheduler core treats as a hint to pull the
wakee towards the waker's CPU much more aggressively.
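
For reference, the behavioural difference between the two wakeup
helpers is only the WF_SYNC hint passed down to the scheduler. Roughly
(a sketch paraphrased from include/linux/wait.h, worth double-checking
against the exact tree):

```c
/* Sketch, not a literal quote of the headers. */

/* Sync variant: ends up passing WF_SYNC to try_to_wake_up(), so
 * wake_affine() strongly prefers placing the wakee near the waker's
 * CPU -- which, for softirq-driven socket wakeups, is the NIC's node. */
#define wake_up_interruptible_sync_poll(x, m) \
	__wake_up_sync_key((x), TASK_INTERRUPTIBLE, poll_to_key(m))

/* Plain variant: no WF_SYNC, so the wakee's previous CPU/node is
 * weighted normally by select_task_rq_fair(). */
#define wake_up_interruptible_poll(x, m) \
	__wake_up(x, TASK_INTERRUPTIBLE, 1, poll_to_key(m))
```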

On a kunpeng920 with 4 NUMA nodes, the ethernet device is on node 0
and the storage disk on node 2. While using sysbench to drive a MySQL
server on this machine, I see node 1 sitting idle even though nodes 0,
2 and 3 are quite busy.

The benchmark command:

 sysbench --db-driver=mysql --mysql-user=sbtest_user \
 --mysql_password=password --mysql-db=sbtest \
 --mysql-host=192.168.101.3 --mysql-port=3306 \
 --point-selects=10 --simple-ranges=1 \
 --sum-ranges=1 --order-ranges=1 --distinct-ranges=1 \
 --index-updates=1 --non-index-updates=1 \
 --delete-inserts=1 --range-size=100 \
 --time=600 --events=0 --report-interval=60 \
 --tables=64 --table-size=2000000 --threads=128 \
  /usr/share/sysbench/oltp_read_only.lua run

The benchmark result is as below:
                 tps        qps
w/o patch     31748.22     507971.56
w/  patch     35075.20     561203.13
              +10.5%       +10.5%

With the patch, NUMA node 1 becomes busy as well, which gives the
10%+ performance improvement.

I am not claiming this patch is exactly the right approach, but I'd
like to use this RFC to connect the net and scheduler people and start
a discussion across both areas.

Testing was done on the latest Linus tree, commit d4439a1189, with
the .config in [2].

[1] https://linuxplumbersconf.org/event/11/contributions/1044/attachments/801/1508/lpc21_wakeup_pulling_libochen.pdf
[2] http://www.linuxep.com/patches/config

Cc: Libo Chen <libo.chen@oracle.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
---
 net/core/sock.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 9862eef..a346359 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3133,7 +3133,7 @@ void sock_def_readable(struct sock *sk)
 	rcu_read_lock();
 	wq = rcu_dereference(sk->sk_wq);
 	if (skwq_has_sleeper(wq))
-		wake_up_interruptible_sync_poll(&wq->wait, EPOLLIN | EPOLLPRI |
+		wake_up_interruptible_poll(&wq->wait, EPOLLIN | EPOLLPRI |
 						EPOLLRDNORM | EPOLLRDBAND);
 	sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
 	rcu_read_unlock();
@@ -3151,7 +3151,7 @@ static void sock_def_write_space(struct sock *sk)
 	if ((refcount_read(&sk->sk_wmem_alloc) << 1) <= READ_ONCE(sk->sk_sndbuf)) {
 		wq = rcu_dereference(sk->sk_wq);
 		if (skwq_has_sleeper(wq))
-			wake_up_interruptible_sync_poll(&wq->wait, EPOLLOUT |
+			wake_up_interruptible_poll(&wq->wait, EPOLLOUT |
 						EPOLLWRNORM | EPOLLWRBAND);
 
 		/* Should agree with poll, otherwise some programs break */
-- 
1.8.3.1



Thread overview: 5+ messages
2021-11-05 10:51 Barry Song [this message]
2021-11-05 12:24 ` [RFC PATCH] sched&net: avoid over-pulling tasks due to network interrupts Peter Zijlstra
2021-11-07 18:08   ` Barry Song
2021-11-08  9:27     ` Peter Zijlstra
2021-11-08 16:27       ` Eric Dumazet
