* [RFC PATCH] net, rps: bypass enqueue_to_backlog()
@ 2013-12-18 13:03 Zhi Yong Wu
2013-12-18 13:56 ` Eric Dumazet
0 siblings, 1 reply; 6+ messages in thread
From: Zhi Yong Wu @ 2013-12-18 13:03 UTC (permalink / raw)
To: therbert; +Cc: netdev, Zhi Yong Wu
From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
When the local cpu is already the target cpu that will handle the network
soft irq, the packet should be injected directly into the network stack;
bypassing enqueue_to_backlog() can speed up packet processing.

Hi, guys,

I checked the first several versions of the RPS patch, which seemed to
have this condition check, but why was it removed later? Am I missing
anything? If yes, please correct me, thanks.
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
---
net/core/dev.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index c482fe8..d29d61f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3693,7 +3693,7 @@ int netif_receive_skb(struct sk_buff *skb)
cpu = get_rps_cpu(skb->dev, skb, &rflow);
- if (cpu >= 0) {
+ if ((cpu >= 0) && (cpu != raw_smp_processor_id())) {
ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
rcu_read_unlock();
return ret;
--
1.7.6.5
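
For context, a rough sketch of the surrounding RPS block in
netif_receive_skb() from that era, with the proposed check applied.
This is a simplified excerpt for illustration only; the exact context
lines in net/core/dev.c may differ slightly.

#ifdef CONFIG_RPS
	if (static_key_false(&rps_needed)) {
		struct rps_dev_flow voidflow, *rflow = &voidflow;
		int cpu, ret;

		rcu_read_lock();

		cpu = get_rps_cpu(skb->dev, skb, &rflow);

		/* Hand off to a backlog only when the chosen cpu is remote;
		 * for the local cpu, fall through and process inline.
		 */
		if ((cpu >= 0) && (cpu != raw_smp_processor_id())) {
			ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
			rcu_read_unlock();
			return ret;
		}
		rcu_read_unlock();
	}
#endif
	return __netif_receive_skb(skb);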
^ permalink raw reply related [flat|nested] 6+ messages in thread

* Re: [RFC PATCH] net, rps: bypass enqueue_to_backlog()
2013-12-18 13:03 [RFC PATCH] net, rps: bypass enqueue_to_backlog() Zhi Yong Wu
@ 2013-12-18 13:56 ` Eric Dumazet
2013-12-18 14:04 ` Eric Dumazet
2013-12-18 14:20 ` Zhi Yong Wu
0 siblings, 2 replies; 6+ messages in thread
From: Eric Dumazet @ 2013-12-18 13:56 UTC (permalink / raw)
To: Zhi Yong Wu; +Cc: therbert, netdev, Zhi Yong Wu
On Wed, 2013-12-18 at 21:03 +0800, Zhi Yong Wu wrote:
> From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>
> When the local cpu is already the target cpu that will handle the network
> soft irq, the packet should be injected directly into the network stack;
> bypassing enqueue_to_backlog() can speed up packet processing.
>
> Hi, guys,
>
> I checked the first several versions of the RPS patch, which seemed to
> have this condition check, but why was it removed later? Am I missing
> anything? If yes, please correct me, thanks.
>
> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
> ---
Hmm... Could you elaborate ?
At which point do you think this condition was tested or removed ?
I think the idea was to drain NIC RX queues as fast as possible, then :
- Send the IPI to remote cpus
- process our queue in parallel with other cpus processing their own
queue.
If we process our packets through the whole stack, packets for other cpus
will have a fair amount of extra latency.

That's a tradeoff, I suppose.
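
To make the ordering concrete, here is a hypothetical sketch of the two
strategies; drain_rx_queue(), kick_remote_cpus() and process_local_backlog()
are invented names for illustration, not kernel APIs. With the backlog
approach, the NIC queue is drained and the IPIs are sent before any local
packet climbs the stack:

	/* Backlog approach: cheap O(1) enqueue for every packet, then notify. */
	while ((skb = drain_rx_queue(dev)) != NULL) {
		cpu = get_rps_cpu(dev, skb, &rflow);
		enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
	}
	kick_remote_cpus();		/* IPIs go out immediately */
	process_local_backlog();	/* local stack work runs afterwards,
					 * in parallel with the remote cpus */

	/* Bypass approach (this patch): a local packet runs the whole stack
	 * inline, so packets still sitting in the NIC queue for other cpus
	 * wait behind that work before their IPI can even be sent.
	 */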
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH] net, rps: bypass enqueue_to_backlog()
2013-12-18 13:56 ` Eric Dumazet
@ 2013-12-18 14:04 ` Eric Dumazet
2013-12-18 14:11 ` Zhi Yong Wu
2013-12-18 14:20 ` Zhi Yong Wu
1 sibling, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2013-12-18 14:04 UTC (permalink / raw)
To: Zhi Yong Wu; +Cc: therbert, netdev, Zhi Yong Wu
On Wed, 2013-12-18 at 05:56 -0800, Eric Dumazet wrote:
> On Wed, 2013-12-18 at 21:03 +0800, Zhi Yong Wu wrote:
> > From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
> >
> > When the local cpu is already the target cpu that will handle the network
> > soft irq, the packet should be injected directly into the network stack;
> > bypassing enqueue_to_backlog() can speed up packet processing.
> >
> > Hi, guys,
> >
> > I checked the first several versions of the RPS patch, which seemed to
> > have this condition check, but why was it removed later? Am I missing
> > anything? If yes, please correct me, thanks.
> >
> > Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
> > ---
>
> Hmm... Could you elaborate ?
>
> At which point do you think this condition was tested or removed ?
>
> I think the idea was to drain NIC RX queues as fast as possible, then :
>
> - Send the IPI to remote cpus
> - process our queue in parallel with other cpus processing their own
> queue.
>
> If we process our packets through the whole stack, packets for other cpus
> will have a fair amount of extra latency.
>
> That's a tradeoff, I suppose.
>
Also note that going through the backlog permits things like

99bbc70741903c0 ("rps: selective flow shedding during softnet
overflow")
^ permalink raw reply [flat|nested] 6+ messages in thread

* Re: [RFC PATCH] net, rps: bypass enqueue_to_backlog()
2013-12-18 14:04 ` Eric Dumazet
@ 2013-12-18 14:11 ` Zhi Yong Wu
2013-12-18 14:26 ` Eric Dumazet
0 siblings, 1 reply; 6+ messages in thread
From: Zhi Yong Wu @ 2013-12-18 14:11 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Tom Herbert, Linux Netdev List, Zhi Yong Wu
On Wed, Dec 18, 2013 at 10:04 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2013-12-18 at 05:56 -0800, Eric Dumazet wrote:
>> On Wed, 2013-12-18 at 21:03 +0800, Zhi Yong Wu wrote:
>> > From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>> >
>> > When the local cpu is already the target cpu that will handle the network
>> > soft irq, the packet should be injected directly into the network stack;
>> > bypassing enqueue_to_backlog() can speed up packet processing.
>> >
>> > Hi, guys,
>> >
>> > I checked the first several versions of the RPS patch, which seemed to
>> > have this condition check, but why was it removed later? Am I missing
>> > anything? If yes, please correct me, thanks.
>> >
>> > Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>> > ---
>>
>> Hmm... Could you elaborate ?
>>
>> At which point do you think this condition was tested or removed ?
>>
>> I think the idea was to drain NIC RX queues as fast as possible, then :
>>
>> - Send the IPI to remote cpus
>> - process our queue in parallel with other cpus processing their own
>> queue.
>>
>> If we process our packets through the whole stack, packets for other cpus
>> will have a fair amount of extra latency.
>>
>> That's a tradeoff, I suppose.
>>
>
> Also note that going through the backlog permits things like
>
> 99bbc70741903c0 ("rps: selective flow shedding during softnet
> overflow")
It makes sense, but if cpu < 0, some packets seem to bypass the flow
shedding stuff. That is, the flow limit will not be so accurate,
right?
>
>
>
--
Regards,
Zhi Yong Wu
^ permalink raw reply [flat|nested] 6+ messages in thread

* Re: [RFC PATCH] net, rps: bypass enqueue_to_backlog()
2013-12-18 14:11 ` Zhi Yong Wu
@ 2013-12-18 14:26 ` Eric Dumazet
0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2013-12-18 14:26 UTC (permalink / raw)
To: Zhi Yong Wu; +Cc: Tom Herbert, Linux Netdev List, Zhi Yong Wu
On Wed, 2013-12-18 at 22:11 +0800, Zhi Yong Wu wrote:
> It makes sense, but if cpu < 0, some packets seem to bypass the flow
> shedding stuff. That is, the flow limit will not be so accurate,
> right?
Not sure what you mean.

cpu = -1 happens only on very rare occasions, like cpu hot unplug.

In netif_receive_skb(), we just process the packet normally.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC PATCH] net, rps: bypass enqueue_to_backlog()
2013-12-18 13:56 ` Eric Dumazet
2013-12-18 14:04 ` Eric Dumazet
@ 2013-12-18 14:20 ` Zhi Yong Wu
1 sibling, 0 replies; 6+ messages in thread
From: Zhi Yong Wu @ 2013-12-18 14:20 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Tom Herbert, Linux Netdev List, Zhi Yong Wu
Thanks for your explanation.

By the way, please ignore this patch, thanks.
On Wed, Dec 18, 2013 at 9:56 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2013-12-18 at 21:03 +0800, Zhi Yong Wu wrote:
>> From: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>>
>> When the local cpu is already the target cpu that will handle the network
>> soft irq, the packet should be injected directly into the network stack;
>> bypassing enqueue_to_backlog() can speed up packet processing.
>>
>> Hi, guys,
>>
>> I checked the first several versions of the RPS patch, which seemed to
>> have this condition check, but why was it removed later? Am I missing
>> anything? If yes, please correct me, thanks.
>>
>> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
>> ---
>
> Hmm... Could you elaborate ?
>
> At which point do you think this condition was tested or removed ?
>
> I think the idea was to drain NIC RX queues as fast as possible, then :
>
> - Send the IPI to remote cpus
> - process our queue in parallel with other cpus processing their own
> queue.
>
> If we process our packets through the whole stack, packets for other cpus
> will have a fair amount of extra latency.
>
> That's a tradeoff, I suppose.
>
>
--
Regards,
Zhi Yong Wu
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2013-12-18 14:26 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-18 13:03 [RFC PATCH] net, rps: bypass enqueue_to_backlog() Zhi Yong Wu
2013-12-18 13:56 ` Eric Dumazet
2013-12-18 14:04 ` Eric Dumazet
2013-12-18 14:11 ` Zhi Yong Wu
2013-12-18 14:26 ` Eric Dumazet
2013-12-18 14:20 ` Zhi Yong Wu