netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kuniyuki Iwashima <kuniyu@amazon.com>
To: <aksecurity@gmail.com>
Cc: <benh@amazon.com>, <davem@davemloft.net>, <dsahern@kernel.org>,
	<edumazet@google.com>, <kuba@kernel.org>, <kuni1840@gmail.com>,
	<kuniyu@amazon.com>, <netdev@vger.kernel.org>,
	<pabeni@redhat.com>, <samjonas@amazon.com>, <trawets@amazon.com>
Subject: Re: [PATCH v2 net] tcp: Reduce chance of collisions in inet6_hashfn().
Date: Fri, 21 Jul 2023 18:39:48 -0700	[thread overview]
Message-ID: <20230722013948.42820-1-kuniyu@amazon.com> (raw)
In-Reply-To: <CANEQ_+KSPSy3cQmVf_WdkdHMaNdCh-Qyo_7M8p+qFXXqbeZWgw@mail.gmail.com>

From: Amit Klein <aksecurity@gmail.com>
Date: Sat, 22 Jul 2023 03:07:49 +0300
> Resending because some recipients require plaintext email. Sorry.
> Original message:
> 
> This is certainly better than the original code.

Thanks for reviewing!


> 
> Two remarks:
> 
> 1. In general, using SipHash is more secure, even if only for its
> longer key (128 bits, cf. jhash's 32 bits), which renders simple
> enumeration attacks infeasible. I understand that in a different
> context, switching from jhash to siphash incurred some overhead, but
> maybe here it won't.

I see.  Stewart tested hsiphash and observed more overhead as
noted in the changelog, but let me give another shot to SipHash
and HSiphash.

I'll report back here next week.


> 
> 2. Taking a more holistic approach to __ipv6_addr_jhash(), I surveyed
> where and how it's used. In most cases, it is used for hashing
> together the IPv6 local address, foreign address and optionally some
> more data (e.g. layer 4 protocol number, layer 4 ports).
> Security-wise, it makes more sense to hash all data together once, and
> not piecewise as it's done today (i.e. today it's
> jhash(....,jhash(faddr),...), which cases the faddr into 32 bits,
> whereas the recommended way is to hash all items in their entirety,
> i.e. jhash(...,faddr,...)).

Agree.


> This requires scrutinizing all 6
> invocations of __ipv6_addr_jhash() one by one and modifying the
> calling code accordingly.

At a glance, only rds_conn_bucket() seems a little bit tricky
as it uses v4 hash function later.

But I'll take a deeper look.

Thanks!


> 
> Thanks,
> -Amit
> 
> On Sat, Jul 22, 2023 at 1:24 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > From: Stewart Smith <trawets@amazon.com>
> >
> > For both IPv4 and IPv6 incoming TCP connections are tracked in a hash
> > table with a hash over the source & destination addresses and ports.
> > However, the IPv6 hash is insufficient and can lead to a high rate of
> > collisions.
> >
> > The IPv6 hash used an XOR to fit everything into the 96 bits for the
> > fast jenkins hash, meaning it is possible for an external entity to
> > ensure the hash collides, thus falling back to a linear search in the
> > bucket, which is slow.
> >
> > We take the approach of hash the full length of IPv6 address in
> > __ipv6_addr_jhash() so that all users can benefit from a more secure
> > version.
> >
> > While this may look like it adds overhead, the reality of modern CPUs
> > means that this is unmeasurable in real world scenarios.
> >
> > In simulating with llvm-mca, the increase in cycles for the hashing
> > code was ~16 cycles on Skylake (from a base of ~155), and an extra ~9
> > on Nehalem (base of ~173).
> >
> > In commit dd6d2910c5e0 ("netfilter: conntrack: switch to siphash")
> > netfilter switched from a jenkins hash to a siphash, but even the faster
> > hsiphash is a more significant overhead (~20-30%) in some preliminary
> > testing.  So, in this patch, we keep to the more conservative approach to
> > ensure we don't add much overhead per SYN.
> >
> > In testing, this results in a consistently even spread across the
> > connection buckets.  In both testing and real-world scenarios, we have
> > not found any measurable performance impact.
> >
> > Fixes: 08dcdbf6a7b9 ("ipv6: use a stronger hash for tcp")
> > Signed-off-by: Stewart Smith <trawets@amazon.com>
> > Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
> > Suggested-by: Eric Dumazet <edumazet@google.com>
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > ---
> > v2:
> >   * Hash all IPv6 bytes once in __ipv6_addr_jhash() instead of reusing
> >     some bytes twice.
> >
> > v1: https://lore.kernel.org/netdev/20230629015844.800280-1-samjonas@amazon.com/
> > ---
> >  include/net/ipv6.h | 8 ++------
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> > index 7332296eca44..2acc4c808d45 100644
> > --- a/include/net/ipv6.h
> > +++ b/include/net/ipv6.h
> > @@ -752,12 +752,8 @@ static inline u32 ipv6_addr_hash(const struct in6_addr *a)
> >  /* more secured version of ipv6_addr_hash() */
> >  static inline u32 __ipv6_addr_jhash(const struct in6_addr *a, const u32 initval)
> >  {
> > -       u32 v = (__force u32)a->s6_addr32[0] ^ (__force u32)a->s6_addr32[1];
> > -
> > -       return jhash_3words(v,
> > -                           (__force u32)a->s6_addr32[2],
> > -                           (__force u32)a->s6_addr32[3],
> > -                           initval);
> > +       return jhash2((__force const u32 *)a->s6_addr32,
> > +                     ARRAY_SIZE(a->s6_addr32), initval);
> >  }
> >
> >  static inline bool ipv6_addr_loopback(const struct in6_addr *a)
> > --
> > 2.30.2


  reply	other threads:[~2023-07-22  1:40 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-21 22:24 [PATCH v2 net] tcp: Reduce chance of collisions in inet6_hashfn() Kuniyuki Iwashima
2023-07-22  0:07 ` Amit Klein
2023-07-22  1:39   ` Kuniyuki Iwashima [this message]
2023-07-22  3:17     ` Smith, Stewart
2023-07-24 19:58       ` Kuniyuki Iwashima
2023-07-24  9:41 ` Eric Dumazet
2023-07-25  0:20 ` patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230722013948.42820-1-kuniyu@amazon.com \
    --to=kuniyu@amazon.com \
    --cc=aksecurity@gmail.com \
    --cc=benh@amazon.com \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=kuni1840@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=samjonas@amazon.com \
    --cc=trawets@amazon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).