netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Deng-Cheng Zhu <dczhu@mips.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tom Herbert <therbert@google.com>, <davem@davemloft.net>,
	<netdev@vger.kernel.org>
Subject: Re: [PATCH v2] RPS: Sparse connection optimizations - v2
Date: Tue, 8 May 2012 14:43:45 +0800	[thread overview]
Message-ID: <4FA8C0A1.1060607@mips.com> (raw)
In-Reply-To: <1336379153.3752.2273.camel@edumazet-glaptop>

On 05/07/2012 04:25 PM, Eric Dumazet wrote:
> On Mon, 2012-05-07 at 16:01 +0800, Deng-Cheng Zhu wrote:
>
>> Did you really read my patch and understand what I commented? When I was
>> talking about using rps_sparse_flow (initially cpu_flow), neither
>> rps_sock_flow_table nor rps_dev_flow_table is activated (number of
>> entries: 0).
>
> I read your patch and am concerned of performance issues when handling
> typical workload. Say between 0.1 and 20 Mpps on current hardware.

Thanks for reading. For the performance concern, that's why I posted
the test data in the patch description.

>
> The argument "oh its only selected when
> CONFIG_RPS_SPARSE_FLOW_OPTIMIZATION is set" is wrong.

Certainly, as stated before, I admit (in fact, it's obvious) that this
patch is nothing more than a tradeoff between the throughput of sparse
flows and that of many. In the tests, less than 4% performance loss was
observed as the number of connections went higher. If people (in most
cases I understand) don't care about the sparse flow case, then leave
this feature not selected (the default). In the real world, there are
many tradeoffs, right?

>
> CONFIG_NR_RPS_MAP_LOOPS is wrong.

To balance the overhead and the sparse flow throughput.

>
> Your HZ timeout is yet another dark side of your patch.

Certainly it can be changed to something more reasonable.

>
> Your (flow->dev == skb->dev) test is wrong.

Please let me know why, thanks. The tests showed it's possible that 2
correlated flows could have the same hash value but from different
devices.

>
> Your : flow->ts = now; is wrong (dirtying memory for each packet)

It doesn't touch the packet, does it? It only records the last time when
the flow is active. And the flow entries are only managed by this
feature.

>
> Really I dont like your patch.

Regretfully.

>
> You are kindly asked to find another way to solve your problem, a
> generic mechanism that can help others, not only you.

This is the meat of the problem. And it's why I'm still saying something
about this patch. We do have the platform where SMP irq affinity is not
available to NICs and, according to tests, RPS does take effect with a
bit imperfection: relatively low and inconsistent throughput in the case
of sparse connections. What you are saying is not the linux way, IMHO:
Provided the incoming code doesn't do harm to the kernel, we should
offer options to users. My case can be a rare one, but it would be good
to have the kernel support.

>
> We do think activating RFS is the way to go. Its the standard layer we
> added below RPS, its configurable and scales. It can be expanded at will
> with configurable plugins.
>
> For example, using single queue NICS, it makes sense to select cpu on
> the output device only, not on the rxhash by itself (a modulo or
> something), to reduce false sharing and qdisc/device lock on tx path.
>
> If your machine has 4 cpus, and 4 nics, you can instruct RFS table to
> prefer cpu on the NIC that packet will use for output.

I understand what you say above, but it does not apply in my case.


Thanks,

Deng-Cheng

      reply	other threads:[~2012-05-08  6:43 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-03  8:56 [PATCH v2] RPS: Sparse connection optimizations - v2 Deng-Cheng Zhu
2012-05-04  3:22 ` Tom Herbert
2012-05-04  3:39   ` Deng-Cheng Zhu
2012-05-04  4:25   ` Deng-Cheng Zhu
2012-05-04  7:47     ` Eric Dumazet
2012-05-07  6:51       ` Deng-Cheng Zhu
2012-05-07  7:22         ` Eric Dumazet
2012-05-04 15:31     ` Tom Herbert
2012-05-04 15:47       ` Eric Dumazet
2012-05-04 20:53         ` Eric Dumazet
2012-05-07  6:48       ` Deng-Cheng Zhu
2012-05-07  7:38         ` Eric Dumazet
2012-05-07  8:01           ` Deng-Cheng Zhu
2012-05-07  8:25             ` Eric Dumazet
2012-05-08  6:43               ` Deng-Cheng Zhu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FA8C0A1.1060607@mips.com \
    --to=dczhu@mips.com \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=therbert@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).