From: Jason Xing <kerneljasonxing@gmail.com>
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
kerneljasonxing@gmail.com, Jason Xing <kernelxing@tencent.com>
Subject: [PATCH v2 net] net: rps: avoid raising a softirq on the current cpu when scheduling napi
Date: Tue, 28 Mar 2023 22:21:12 +0800
Message-ID: <20230328142112.12493-1-kerneljasonxing@gmail.com>
From: Jason Xing <kernelxing@tencent.com>
When we are scheduling napi and RPS decides to put the skb into the
backlog queue of another cpu, we shouldn't raise the softirq on the
current cpu. Whether to raise a softirq depends on whether there is
more data left to process later. For the current cpu, nothing has
been enqueued, so there is no need to raise it. After enqueuing to
another cpu, net_rx_action() or process_backlog() will send an IPI
and the other cpu will then raise the softirq as expected.
Also, raising extra softirqs, which sets the corresponding pending
bit, can make the softirq code think it needs to wake up ksoftirqd
on the current cpu. That shouldn't happen.
Here is a call sketch to clarify how it can trigger ksoftirqd:
__do_softirq()
[1] net_rx_action() -> enqueue_to_backlog() -> raise a softirq
[2] check if pending is set again -> wakeup_softirqd
Comments on the above:
[1] when RPS chooses another cpu to enqueue the skb
[2] __do_softirq() keeps restarting for a short time (around 2 jiffies)
while softirqs are still pending, then wakes up ksoftirqd
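The interaction above can be sketched as a small user-space model. This is
only an illustration, not kernel code: struct model_cpu, MODEL_NET_RX,
MODEL_MAX_RESTART, model_net_rx_action() and model_do_softirq() are all
hypothetical names, the real loop is also bounded by time, and the model
assumes every pass steers at least one skb to another cpu.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
#define MODEL_NET_RX       (1u << 3)  /* stand-in for the NET_RX pending bit */
#define MODEL_MAX_RESTART  10         /* assumed cap on restart passes */

struct model_cpu {
	unsigned int pending;         /* pending softirq bits on this cpu */
	int ksoftirqd_wakeups;        /* how often the loop gave up */
};

/* Models a NET_RX pass in which RPS steers the skb to another cpu.
 * raise_locally selects the old behaviour (true) or the patched
 * napi behaviour (false). */
static void model_net_rx_action(struct model_cpu *cpu, bool raise_locally)
{
	if (raise_locally)
		cpu->pending |= MODEL_NET_RX;
}

/* Models the restart loop: rerun handlers while bits are pending and
 * hand over to ksoftirqd once the restart budget is used up. */
static void model_do_softirq(struct model_cpu *cpu, bool raise_locally)
{
	int max_restart = MODEL_MAX_RESTART;

	do {
		unsigned int pending = cpu->pending;

		cpu->pending = 0;
		if (pending & MODEL_NET_RX)
			model_net_rx_action(cpu, raise_locally);
		if (!cpu->pending)
			return;       /* nothing was re-raised: done */
	} while (--max_restart);

	cpu->ksoftirqd_wakeups++;     /* still pending: defer to ksoftirqd */
}
```

In this model, a cpu that starts with MODEL_NET_RX pending ends up waking
ksoftirqd only when the handler keeps re-raising the bit locally.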
With this patch, in the napi case the softirq is no longer raised on
the current cpu when RPS enqueues the skb into the backlog queue of
another cpu rather than the current one.
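The decision this patch changes can be sketched in isolation as follows.
This is a simplified user-space model, not the code in net/core/dev.c:
struct model_result and model_napi_schedule_rps() are hypothetical names,
cpus are plain integers, and the local-cpu branch (which the diff does not
touch) is assumed to still raise the softirq.

```c
#include <stdbool.h>

/* Hypothetical result type for the model below. */
struct model_result {
	bool ipi_queued;       /* target cpu was added to the IPI list */
	bool softirq_raised;   /* NET_RX softirq raised on the current cpu */
};

/* Models the patched decision: for a remote target the softirq is
 * raised locally only in the non-napi case. */
static struct model_result model_napi_schedule_rps(int target_cpu,
						   int this_cpu, bool napi)
{
	struct model_result r = { false, false };

	if (target_cpu != this_cpu) {
		r.ipi_queued = true;
		r.softirq_raised = !napi;  /* napi caller sends the IPI later */
		return r;
	}

	/* Assumption: scheduling the local backlog still raises NET_RX. */
	r.softirq_raised = true;
	return r;
}
```

For example, model_napi_schedule_rps(1, 0, true) queues the IPI without
raising the softirq, while the same call with napi set to false raises it,
which matches the behaviour before this patch.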
I captured some data while running one iperf3 process and found that
this patch avoids at least around 1500 calls/sec to
__raise_softirq_irqoff().
Fixes: 0a9627f2649a ("rps: Receive Packet Steering")
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
v2:
1) change the title and add more details.
2) add one parameter to distinguish the napi case from the non-napi
case, as suggested by Eric.
Link: https://lore.kernel.org/lkml/20230325152417.5403-1-kerneljasonxing@gmail.com/
---
net/core/dev.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 1518a366783b..504dc3fc09b1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4586,7 +4586,7 @@ static void trigger_rx_softirq(void *data)
* If yes, queue it to our IPI list and return 1
* If no, return 0
*/
-static int napi_schedule_rps(struct softnet_data *sd)
+static int napi_schedule_rps(struct softnet_data *sd, bool napi)
{
struct softnet_data *mysd = this_cpu_ptr(&softnet_data);
@@ -4594,8 +4594,9 @@ static int napi_schedule_rps(struct softnet_data *sd)
if (sd != mysd) {
sd->rps_ipi_next = mysd->rps_ipi_list;
mysd->rps_ipi_list = sd;
+ if (!napi)
+ __raise_softirq_irqoff(NET_RX_SOFTIRQ);
- __raise_softirq_irqoff(NET_RX_SOFTIRQ);
return 1;
}
#endif /* CONFIG_RPS */
@@ -4648,7 +4649,7 @@ static bool skb_flow_limit(struct sk_buff *skb, unsigned int qlen)
* queue (may be a remote CPU queue).
*/
static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
- unsigned int *qtail)
+ unsigned int *qtail, bool napi)
{
enum skb_drop_reason reason;
struct softnet_data *sd;
@@ -4675,7 +4676,7 @@ static int enqueue_to_backlog(struct sk_buff *skb, int cpu,
* We can use non atomic operation since we own the queue lock
*/
if (!__test_and_set_bit(NAPI_STATE_SCHED, &sd->backlog.state))
- napi_schedule_rps(sd);
+ napi_schedule_rps(sd, napi);
goto enqueue;
}
reason = SKB_DROP_REASON_CPU_BACKLOG;
@@ -4933,7 +4934,7 @@ static int netif_rx_internal(struct sk_buff *skb)
if (cpu < 0)
cpu = smp_processor_id();
- ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
+ ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail, false);
rcu_read_unlock();
} else
@@ -4941,7 +4942,7 @@ static int netif_rx_internal(struct sk_buff *skb)
{
unsigned int qtail;
- ret = enqueue_to_backlog(skb, smp_processor_id(), &qtail);
+ ret = enqueue_to_backlog(skb, smp_processor_id(), &qtail, false);
}
return ret;
}
@@ -5670,7 +5671,7 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
int cpu = get_rps_cpu(skb->dev, skb, &rflow);
if (cpu >= 0) {
- ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
+ ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail, false);
rcu_read_unlock();
return ret;
}
@@ -5705,7 +5706,7 @@ void netif_receive_skb_list_internal(struct list_head *head)
if (cpu >= 0) {
/* Will be handled, remove from list */
skb_list_del_init(skb);
- enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
+ enqueue_to_backlog(skb, cpu, &rflow->last_qtail, true);
}
}
}
--
2.37.3