From: Thomas Graf <tgraf@suug.ch>
To: Tom Herbert <tom@herbertland.com>
Cc: Peter Nørlund <pch@ordbogen.com>,
David Miller <davem@davemloft.net>,
Linux Kernel Network Developers <netdev@vger.kernel.org>,
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
James Morris <jmorris@namei.org>,
Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
Patrick McHardy <kaber@trash.net>,
linux-api@vger.kernel.org,
Roopa Prabhu <roopa@cumulusnetworks.com>,
sfeldma <sfeldma@gmail.com>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Nicolas Dichtel <nicolas.dichtel@6wind.com>,
Jiri Benc <jbenc@redhat.com>
Subject: Re: [PATCH v2 net-next 0/3] ipv4: Hash-based multipath routing
Date: Mon, 31 Aug 2015 11:02:11 +0200
Message-ID: <20150831090211.GA12707@pox.localdomain>
In-Reply-To: <CALx6S36wZ36jS9fmrFJZ3BddnpGpBJXph5HtOBYQ=ekatSKDNw@mail.gmail.com>
On 08/30/15 at 03:29pm, Tom Herbert wrote:
> On Sun, Aug 30, 2015 at 2:28 PM, Peter Nørlund <pch@ordbogen.com> wrote:
> > It would definitely be simpler, and it would be nice to just fetch the
> > hash directly from the NIC - and for link aggregation it would probably
> > be fine. But with L4, we always need to consider fragmented packets,
> > which might cause some packets of a flow to be routed differently - and
> > with ECMP, the ramifications of suddenly choosing another path for a
> > flow are worse than for link aggregation. The latency through the
> > different paths may differ enough to cause out-of-order packets and bad
> > TCP performance as a consequence. Both Cisco and Juniper routers
> > default to L3 for ECMP - exactly for that reason, I believe. RFC 2991
> > also points out that ports probably shouldn't be used as part of the
> > flow key with ECMP.
> >
> That's more reason why we need vendors to use IPv6 flow label instead
> of ports to do ECMP :-). In any case, if we're fragmenting TCP packets
> then we're already in a bad place performance-wise-- we really don't
> need to optimize for that case. Albeit, it would be nice if fragments
> of a packet followed the same path, but that would require devices to not do
> L4 hash over ports when MF is set-- I don't know if anyone does that
> (I have been meaning to add that to stack).
+1 for solving this at the hash level. Being able to rely on the L4 HW
hash for multipath routing is very desirable. A simple MF bit ||
FO > 0 check, falling back to the flow dissector to generate an L3 hash
when the HW provided an L4 hash, should be sufficient to address the
fragmentation concern.

Since performance is gone anyway for fragmented traffic, I'm not sure
it's worth offloading this behaviour to the HW.
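
The check described above can be sketched in a few lines. This is only an illustration in Python, not kernel code: the function names, the CRC32-based L3 hash, and the way the HW hash is passed in are all assumptions made for the example. The only parts taken from the IPv4 header format are the MF flag (0x2000) and the 13-bit fragment offset mask (0x1FFF) in the 16-bit flags/offset field at byte offset 6.

```python
import struct
import zlib

IP_MF = 0x2000      # "more fragments" flag in the IPv4 flags/offset field
IP_OFFSET = 0x1FFF  # mask for the 13-bit fragment offset


def make_ipv4_header(src: bytes, dst: bytes, frag_field: int) -> bytes:
    """Build a minimal 20-byte IPv4 header (illustration only, checksum left 0)."""
    return struct.pack("!BBHHHBBH4s4s",
                       0x45, 0, 20, 0, frag_field, 64, 6, 0, src, dst)


def is_fragment(ip_header: bytes) -> bool:
    """True if MF is set or the fragment offset is non-zero (MF || FO > 0)."""
    (frag_field,) = struct.unpack_from("!H", ip_header, 6)
    return bool(frag_field & (IP_MF | IP_OFFSET))


def software_l3_hash(ip_header: bytes) -> int:
    """Hypothetical L3 hash: CRC32 over source and destination address only."""
    return zlib.crc32(ip_header[12:20])


def select_multipath_hash(ip_header: bytes, hw_hash, hw_hash_is_l4: bool) -> int:
    """Use the HW hash unless it is an L4 hash and the packet is a fragment.

    hw_hash is None when the NIC supplied no hash (an assumption of this
    sketch); in that case the software L3 hash is used as well.
    """
    if hw_hash is None:
        return software_l3_hash(ip_header)
    if hw_hash_is_l4 and is_fragment(ip_header):
        # Ports are not reliable across fragments, so ignore the L4 HW hash.
        return software_l3_hash(ip_header)
    return hw_hash
```

With this shape, every fragment of a datagram (the first one with MF set, and later ones with a non-zero offset) gets the same L3 hash, while unfragmented packets keep using the L4 HW hash.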