Netdev List
 help / color / mirror / Atom feed
From: "Gao Feng" <gfree.wind@vip.163.com>
To: "Willem de Bruijn" <willemdebruijn.kernel@gmail.com>
Cc: "davem@davemloft.net" <davem@davemloft.net>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"jakub.kicinski@netronome.com" <jakub.kicinski@netronome.com>,
	"David Ahern" <dsahern@gmail.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re:Re: Re: [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS
Date: Fri, 11 May 2018 22:44:23 +0800 (CST)	[thread overview]
Message-ID: <ac2b5b7.97a1.1634fa8f60d.Coremail.gfree.wind@vip.163.com> (raw)
In-Reply-To: <CAF=yD-JuAivOyrimVSxB_TtaiyiT498Z6fpe+XpF_oNqCMdUPA@mail.gmail.com>

At 2018-05-11 21:23:55, "Willem de Bruijn" <willemdebruijn.kernel@gmail.com> wrote:
>On Fri, May 11, 2018 at 2:20 AM, Gao Feng <gfree.wind@vip.163.com> wrote:
>> At 2018-05-11 11:54:55, "Willem de Bruijn" <willemdebruijn.kernel@gmail.com> wrote:
>>>On Thu, May 10, 2018 at 4:28 AM,  <gfree.wind@vip.163.com> wrote:
>>>> From: Gao Feng <gfree.wind@vip.163.com>
>>>>
>>>> The skb flow limit is implemented for each CPU independently. In the
>>>> current codes, the function skb_flow_limit gets the softnet_data by
>>>> this_cpu_ptr. But the target cpu of enqueue_to_backlog would be not
>>>> the current cpu when enable RPS. As the result, the skb_flow_limit checks
>>>> the stats of current CPU, while the skb is going to append the queue of
>>>> another CPU. It isn't the expected behavior.
>>>>
>>>> Now pass the softnet_data as a param to softnet_data to make consistent.
>>>
>>>The local cpu softnet_data is used on purpose. The operations in
>>>skb_flow_limit() on sd fields could race if not executed on the local cpu.
>>
>> I think the race doesn't exist because of the rps_lock.
>> The enqueue_to_backlog has hold the rps_lock before skb_flow_limit.
>
>Indeed, I overlooked that. There still is the matter of cache contention.

The cache contention is really important in this case?
I don't think so, because the enqueue_to_backlog have touched and modified the softnet_stat
of target cpu.

>
>>>Flow limit tries to detect large ("elephant") DoS flows with a fixed four-tuple.
>>>These would always hit the same RPS cpu, so that cpu being backlogged
>>
>> They may hit the different target CPU when enable RFS. Because the app could be scheduled
>> to another CPU, then RFS tries to deliver the skb to latest core which has hot cache.
>
>This even more suggest using the initial (or IRQ) cpu to track state, instead
>of the destination (RPS/RFS) cpu.

I couldn't understand why it is better to track state on initial cpu, not the target cpu.
The latter one could get more accurate result.

>
>>>may be an indication that such a flow is active. But the flow will also always
>>>arrive on the same initial cpu courtesy of RSS. So storing the lookup table
>>
>> The RSS couldn't make sure the irq is handled by same cpu. It would be balanced between
>> the cpus.
>
>IRQs are usually pinned to cores. Unless using something like irqbalance,
>but that operates at too coarse a timescale to do anything useful at Mpps
>packet rates.

There are some motherboard which couldn't make sure the irq is pinned.
The flow_limit wouldn't work as well as expected.

>
>>>on the initial CPU is also fine. There may be false positives on other CPUs
>>>with the same RPS destination, but that is unlikely with a highly concurrent
>>>traffic server mix ("mice").
>>
>> If my comment is right, the flow couldn't always arrive one the same initial cpu,  although
>> it may be sent to one same target cpu.
>>
>>>
>>>Note that the sysctl net.core.flow_limit_cpu_bitmap enables the feature
>>>for the cpus on which traffic initially lands, not the RPS destination cpus.
>>>See also Documentation/networking/scaling.txt
>>>
>>>That said, I had to reread the code, as it does seem sensible that the
>>>same softnet_data is intended to be used both when testing qlen and
>>>flow_limit.
>>
>> In most cases, user configures the same RPS map with flow_limit like 0xff.
>> Because user couldn't predict which core the evil flow would arrive on.
>>
>> Take an example, there are 2 cores, cpu0 and cpu1.
>> One flow is the an evil flow, but the irq is sent to cpu0. After RPS/RFS, the target cpu is cpu1.
>> Now cpu0 invokes enqueue_to_backlog, then the skb_flow_limit checkes the queue length
>> of cpu0. Certainly it could pass the check of skb_flow_limit because there is no any evil flow on cpu0.
>
>No, enqueue_to_backlog passes qlen to skb_flow_limit, so that does
>check the queue length of the RPS cpu.

Sorry, I overlooked the qlen is the length of the rps cpu.
Then it's ok unless the stats may be not accurate when irq isn't pinned.

But I still doubt that is it really important to track state on initial cpu, not target cpu?
Because the enqueue_to_backlog have touched the softnet_data of target cpu.

Best Regards
Feng

>
>> Then the cpu0 inserts the skb into the queue of cpu1.
>> As a result, the skb_flow_limit doesn't work as expected.
>>
>> BTW, I have already sent the v2 patch which only adds the "Fixes: tag".
>
>The change also makes the code inconsistent with
>Documentation/networking/scaling.txt
>
>"In such environments, enable the feature on all CPUs that handle
>network rx interrupts (as set in /proc/irq/N/smp_affinity)."

  reply	other threads:[~2018-05-11 14:51 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-10  8:28 [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS gfree.wind
2018-05-10 13:02 ` Eric Dumazet
2018-05-11  0:18   ` Gao Feng
2018-05-11  0:55     ` Eric Dumazet
2018-05-11  1:29       ` Gao Feng
2018-05-11  3:54 ` Willem de Bruijn
2018-05-11  6:20   ` Gao Feng
2018-05-11 13:23     ` Willem de Bruijn
2018-05-11 14:44       ` Gao Feng [this message]
2018-05-11 14:56         ` Willem de Bruijn
2018-05-14  1:26           ` Gao Feng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ac2b5b7.97a1.1634fa8f60d.Coremail.gfree.wind@vip.163.com \
    --to=gfree.wind@vip.163.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dsahern@gmail.com \
    --cc=jakub.kicinski@netronome.com \
    --cc=netdev@vger.kernel.org \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox