Netdev List


* Re: [PATCH 1/2] net: Toeplitz library functions
From: Hannes Frederic Sowa @ 2013-09-24  3:35 UTC
  To: Eric Dumazet, Tom Herbert, davem, netdev, jesse.brandeburg
In-Reply-To: <20130924023038.GA22393@order.stressinduktion.org>

On Tue, Sep 24, 2013 at 04:30:38AM +0200, Hannes Frederic Sowa wrote:
> On Mon, Sep 23, 2013 at 05:03:11PM -0700, Eric Dumazet wrote:
> > On Mon, 2013-09-23 at 15:41 -0700, Tom Herbert wrote:
> > 
> > > +#ifdef CONFIG_NET_TOEPLITZ
> > > +	toeplitz_net = toeplitz_alloc();
> > > +	if (!toeplitz_net)
> > > +		goto out;
> > > +
> > > +	toeplitz_init(toeplitz_net, NULL);
> > > +#endif
> > > +
> > 
> > Hmm
> > 
> > 1) Security alert here.
> > 
> > Many devices (let's say Android phones) have no entropy at this point,
> > all devices will have same toeplitz key.
> > 
> > Check build_ehash_secret() for a possible point for the feeding of the
> > key. (and commit 08dcdbf6a7b9d14c2302c5bd0c5390ddf122f664 )
> > 
> > If hardware toeplitz is ever used, we want to make sure every host uses
> > a private and hidden Toeplitz key.
> build_ehash_secret builds up the data which seeds fragmentation ids, ephemeral
> port randomization etc. Could we drop the check of sock->type? I guess the
> idea was that in-kernel sockets of type raw/udp do not seed the keys when no
> entropy is available?

Would this be better (I checked that inet_ehash_secret, ipv6_hash_secret
and net_secret actually get initialized)?

[PATCH] inet: initialize hash secret values on first non-kernel socket creation

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv4/af_inet.c  | 5 ++---
 net/ipv6/af_inet6.c | 4 +---
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 7a1874b..489834a 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -286,9 +286,8 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
 	int try_loading_module = 0;
 	int err;
 
-	if (unlikely(!inet_ehash_secret))
-		if (sock->type != SOCK_RAW && sock->type != SOCK_DGRAM)
-			build_ehash_secret();
+	if (unlikely(!inet_ehash_secret && !kern))
+		build_ehash_secret();
 
 	sock->state = SS_UNCONNECTED;
 
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 7c96100..dbf8c35 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -110,9 +110,7 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
 	int try_loading_module = 0;
 	int err;
 
-	if (sock->type != SOCK_RAW &&
-	    sock->type != SOCK_DGRAM &&
-	    !inet_ehash_secret)
+	if (unlikely(!inet_ehash_secret && !kern))
 		build_ehash_secret();
 
 	/* Look for the requested type/protocol pair. */
-- 
1.8.3.1
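
The check proposed in the patch above can be illustrated with a minimal
userspace sketch. Everything prefixed demo_ is a hypothetical stand-in
and not kernel code: demo_hash_secret plays the role of
inet_ehash_secret, the kern flag mirrors the argument of inet_create(),
and the entropy source is passed in as a callback so the behaviour is
deterministic. The retry loop assumes that zero is reserved as the
"not seeded yet" marker.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for inet_ehash_secret: 0 means "not seeded yet". */
static uint32_t demo_hash_secret;

/* Hypothetical entropy callback standing in for the kernel's RNG. */
typedef uint32_t (*demo_rng_fn)(void);

/* Deterministic demo source: yields 0, 1, 2, ... on successive calls. */
static uint32_t demo_counter_rng(void)
{
	static uint32_t counter;

	return counter++;
}

/* Seed the secret once. Zero is rejected so that it can keep serving
 * as the "unseeded" marker. */
static void demo_build_hash_secret(demo_rng_fn rng)
{
	uint32_t rnd;

	do {
		rnd = rng();
	} while (rnd == 0);

	if (demo_hash_secret == 0)
		demo_hash_secret = rnd;
}

/* Mirrors the proposed condition: any first non-kernel socket seeds
 * the secret, regardless of the socket type. */
static void demo_socket_create(bool kern, demo_rng_fn rng)
{
	if (demo_hash_secret == 0 && !kern)
		demo_build_hash_secret(rng);
}
```

In this sketch a kernel-internal socket never triggers seeding, while
the first userspace socket of any type does, which is the behavioural
change relative to the old sock->type test.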

* Re: [PATCH v2.39 7/7] datapath: Add basic MPLS support to kernel
From: Simon Horman @ 2013-09-24  2:49 UTC
  To: Jesse Gross
  Cc: Ben Pfaff, Pravin Shelar, dev@openvswitch.org, netdev, Ravi K,
	Isaku Yamahata, Joe Stringer
In-Reply-To: <CAEP_g=8i9djcOWhNEKweg8Qx8LhQ4jD142af2PgfR59yo=SsAA@mail.gmail.com>

On Mon, Sep 23, 2013 at 06:38:23PM -0700, Jesse Gross wrote:
> On Mon, Sep 23, 2013 at 6:32 PM, Simon Horman <horms@verge.net.au> wrote:
> > On Mon, Sep 23, 2013 at 02:17:50PM -0700, Jesse Gross wrote:
> >> On Sat, Sep 21, 2013 at 10:34 PM, Simon Horman <horms@verge.net.au> wrote:
> >> > On Thu, Sep 19, 2013 at 12:21:33PM -0500, Jesse Gross wrote:
> >> >> On Thu, Sep 19, 2013 at 10:57 AM, Simon Horman <horms@verge.net.au> wrote:
> >> >> > On Mon, Sep 16, 2013 at 03:38:21PM -0500, Jesse Gross wrote:
> >> >> >> On Mon, Sep 9, 2013 at 12:20 AM, Simon Horman <horms@verge.net.au> wrote:
> >> >> One other consideration in the OVS case - with recirculation we may
> >> >> hit this code multiple times and the difference in behavior could be
> >> >> surprising. However, on the other hand, we need to be careful because
> >> >> skb->cb is not guaranteed to be initialized to zero.
> >> >
> >> > Thanks, that is also not something that I had considered.
> >> >
> >> > I'm not sure, but I think that we can rely on skb->cb
> >> > not being clobbered between rounds of recirculation.
> >> > Or at the very least I think we could save and restore it
> >> > as necessary.
> >>
> >> Yes, it should be safe to assume this.
> >>
> >> > So I think if we could be careful to make sure that inner_protocol
> >> > is in a sane state the first time we see the skb but not
> >> > each time it is recirculated then I think things should work out.
> >> >
> >> > In my current implementation of recirculation the datapath
> >> > side is driven by ovs_dp_process_received_packet(). So by my reasoning
> >> > above I think it would make sense to reset the inner_protocol there
> >> > and in ovs_packet_cmd_execute() rather than in ovs_execute_actions()
> >> > which each of those functions call.
> >>
> >> I think that would work, however, I wonder if it's the right place in
> >> general, independent of this compatibility issue. I guess it still
> >> seems like the ideal thing to do is to move this as close to where it
> >> is necessary as possible, specifically in mpls_push(). Is there a
> >> reason to not put it there (again, other than the out-of-tree
> >> compatibility issues)?
> >
> > I agree that should work, out-of-tree compatibility issues aside.
> >
> > Perhaps a solution is to have a conditional set_inner_protocol call inside
> > push_mpls, where the condition is that inner_protocol is zero.
> > And a reset_inner_protocol call earlier on, a call that sets inner_protocol
> > to zero only if the compatibility code is in use and thus it resides in
> > struct ovs_gso_cb. This call could be removed once the compatibility
> > code is no longer needed, that is once kernels older than 3.11 are no
> > longer supported.
> 
> I agree that's probably the right solution.

Thanks, I will see about making it so.

* Re: [RESEND PATCH iproute2] vxlan: add ipv6 support
From: Cong Wang @ 2013-09-24  2:44 UTC
  To: Stephen Hemminger; +Cc: netdev, David S. Miller
In-Reply-To: <20130923132049.7fa5673e@nehalam.linuxnetplumber.net>

On Mon, 2013-09-23 at 13:20 -0700, Stephen Hemminger wrote:
> On Sat, 21 Sep 2013 10:35:24 +0800
> Cong Wang <amwang@redhat.com> wrote:
> 
> > +			if (!inet_pton(AF_INET, *argv, &gaddr)) {
> > +				if (!inet_pton(AF_INET6, *argv, &gaddr6)) {
> > +					fprintf(stderr, "Invalid address \"%s\"\n", *argv);
> > +					return -1;
> > +				} else if (!IN6_IS_ADDR_MULTICAST(&gaddr6))
> > +					invarg("invalid group address", *argv);
> > +			} else if (!IN_MULTICAST(ntohl(gaddr)))
> > +					invarg("invalid group address", *argv);
> 
> This is ugly, can't it be done more generically by checking for ':' in address.
> Or even use getaddrinfo.
> 
> Hate to have lots of special code to handle both address types.
> 

Alright, I will introduce a helper function for it.

Thanks!
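
One possible shape for such a helper, following the suggestion to pick
the family by looking for ':' in the string, is sketched below. The name
demo_parse_mcast_group and its return convention are assumptions made
for illustration; this is not the helper that was later posted for
iproute2.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: parse str as an IPv4 or IPv6 multicast group.
 * The family is chosen by checking for ':' in the string.
 * Returns AF_INET or AF_INET6 on success (the matching output argument
 * is filled in), -1 if the string is not a valid address, and -2 if it
 * is a valid address that is not multicast. */
static int demo_parse_mcast_group(const char *str, struct in_addr *v4,
				  struct in6_addr *v6)
{
	if (strchr(str, ':')) {
		if (inet_pton(AF_INET6, str, v6) != 1)
			return -1;
		return IN6_IS_ADDR_MULTICAST(v6) ? AF_INET6 : -2;
	}

	if (inet_pton(AF_INET, str, v4) != 1)
		return -1;
	return IN_MULTICAST(ntohl(v4->s_addr)) ? AF_INET : -2;
}
```

A caller would map -1 to the "Invalid address" message and -2 to the
group address error, so both address types share one code path.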

* Re: [PATCH 1/2] net: Toeplitz library functions
From: Hannes Frederic Sowa @ 2013-09-24  2:30 UTC
  To: Eric Dumazet; +Cc: Tom Herbert, davem, netdev, jesse.brandeburg
In-Reply-To: <1379980991.3165.37.camel@edumazet-glaptop>

On Mon, Sep 23, 2013 at 05:03:11PM -0700, Eric Dumazet wrote:
> On Mon, 2013-09-23 at 15:41 -0700, Tom Herbert wrote:
> 
> > +#ifdef CONFIG_NET_TOEPLITZ
> > +	toeplitz_net = toeplitz_alloc();
> > +	if (!toeplitz_net)
> > +		goto out;
> > +
> > +	toeplitz_init(toeplitz_net, NULL);
> > +#endif
> > +
> 
> Hmm
> 
> 1) Security alert here.
> 
> Many devices (let's say Android phones) have no entropy at this point,
> all devices will have same toeplitz key.
> 
> Check build_ehash_secret() for a possible point for the feeding of the
> key. (and commit 08dcdbf6a7b9d14c2302c5bd0c5390ddf122f664 )
> 
> If hardware toeplitz is ever used, we want to make sure every host uses
> a private and hidden Toeplitz key.

I just had a look at it myself and have one question:

ipv6/af_inet6.c:
        if (sock->type != SOCK_RAW &&
            sock->type != SOCK_DGRAM &&
            !inet_ehash_secret)
                build_ehash_secret();

ipv4/af_inet.c:
        if (unlikely(!inet_ehash_secret))
                if (sock->type != SOCK_RAW && sock->type != SOCK_DGRAM)
                        build_ehash_secret();


Why do we care about the sock->type?

build_ehash_secret builds up the data which seeds fragmentation ids, ephemeral
port randomization etc. Could we drop the check of sock->type? I guess the
idea was that in-kernel sockets of type raw/udp do not seed the keys when no
entropy is available?

Thanks,

  Hannes

* linux-next: manual merge of the ipsec-next tree with the net-next tree
From: Stephen Rothwell @ 2013-09-24  2:16 UTC
  To: Steffen Klassert
  Cc: linux-next, linux-kernel, Fan Du, Joe Perches, David Miller,
	netdev

Hi Steffen,

Today's linux-next merge of the ipsec-next tree got a conflict in
include/net/xfrm.h between commit d511337a1eda ("xfrm.h: Remove extern
from function prototypes") from the net-next tree and commit aba826958830
("{ipv4,xfrm}: Introduce xfrm_tunnel_notifier for xfrm tunnel mode
callback") from the ipsec-next tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc include/net/xfrm.h
index 7657461,c7afa6e..0000000
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@@ -1493,39 -1495,35 +1499,39 @@@ static inline int xfrm4_rcv_spi(struct 
  	return xfrm4_rcv_encap(skb, nexthdr, spi, 0);
  }
  
 -extern int xfrm4_extract_output(struct xfrm_state *x, struct sk_buff *skb);
 -extern int xfrm4_prepare_output(struct xfrm_state *x, struct sk_buff *skb);
 -extern int xfrm4_output(struct sk_buff *skb);
 -extern int xfrm4_output_finish(struct sk_buff *skb);
 -extern int xfrm4_tunnel_register(struct xfrm_tunnel *handler, unsigned short family);
 -extern int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler, unsigned short family);
 -extern int xfrm4_mode_tunnel_input_register(struct xfrm_tunnel_notifier *handler);
 -extern int xfrm4_mode_tunnel_input_deregister(struct xfrm_tunnel_notifier *handler);
 -extern int xfrm6_extract_header(struct sk_buff *skb);
 -extern int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb);
 -extern int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi);
 -extern int xfrm6_transport_finish(struct sk_buff *skb, int async);
 -extern int xfrm6_rcv(struct sk_buff *skb);
 -extern int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
 -			    xfrm_address_t *saddr, u8 proto);
 -extern int xfrm6_tunnel_register(struct xfrm6_tunnel *handler, unsigned short family);
 -extern int xfrm6_tunnel_deregister(struct xfrm6_tunnel *handler, unsigned short family);
 -extern __be32 xfrm6_tunnel_alloc_spi(struct net *net, xfrm_address_t *saddr);
 -extern __be32 xfrm6_tunnel_spi_lookup(struct net *net, const xfrm_address_t *saddr);
 -extern int xfrm6_extract_output(struct xfrm_state *x, struct sk_buff *skb);
 -extern int xfrm6_prepare_output(struct xfrm_state *x, struct sk_buff *skb);
 -extern int xfrm6_output(struct sk_buff *skb);
 -extern int xfrm6_output_finish(struct sk_buff *skb);
 -extern int xfrm6_find_1stfragopt(struct xfrm_state *x, struct sk_buff *skb,
 -				 u8 **prevhdr);
 +int xfrm4_extract_output(struct xfrm_state *x, struct sk_buff *skb);
 +int xfrm4_prepare_output(struct xfrm_state *x, struct sk_buff *skb);
 +int xfrm4_output(struct sk_buff *skb);
 +int xfrm4_output_finish(struct sk_buff *skb);
 +int xfrm4_tunnel_register(struct xfrm_tunnel *handler, unsigned short family);
 +int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler, unsigned short family);
- int xfrm4_mode_tunnel_input_register(struct xfrm_tunnel *handler);
- int xfrm4_mode_tunnel_input_deregister(struct xfrm_tunnel *handler);
++int xfrm4_mode_tunnel_input_register(struct xfrm_tunnel_notifier *handler);
++int xfrm4_mode_tunnel_input_deregister(struct xfrm_tunnel_notifier *handler);
 +void xfrm4_local_error(struct sk_buff *skb, u32 mtu);
 +int xfrm6_extract_header(struct sk_buff *skb);
 +int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb);
 +int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi);
 +int xfrm6_transport_finish(struct sk_buff *skb, int async);
 +int xfrm6_rcv(struct sk_buff *skb);
 +int xfrm6_input_addr(struct sk_buff *skb, xfrm_address_t *daddr,
 +		     xfrm_address_t *saddr, u8 proto);
 +int xfrm6_tunnel_register(struct xfrm6_tunnel *handler, unsigned short family);
 +int xfrm6_tunnel_deregister(struct xfrm6_tunnel *handler,
 +			    unsigned short family);
 +__be32 xfrm6_tunnel_alloc_spi(struct net *net, xfrm_address_t *saddr);
 +__be32 xfrm6_tunnel_spi_lookup(struct net *net, const xfrm_address_t *saddr);
 +int xfrm6_extract_output(struct xfrm_state *x, struct sk_buff *skb);
 +int xfrm6_prepare_output(struct xfrm_state *x, struct sk_buff *skb);
 +int xfrm6_output(struct sk_buff *skb);
 +int xfrm6_output_finish(struct sk_buff *skb);
 +int xfrm6_find_1stfragopt(struct xfrm_state *x, struct sk_buff *skb,
 +			  u8 **prevhdr);
 +void xfrm6_local_error(struct sk_buff *skb, u32 mtu);
  
  #ifdef CONFIG_XFRM
 -extern int xfrm4_udp_encap_rcv(struct sock *sk, struct sk_buff *skb);
 -extern int xfrm_user_policy(struct sock *sk, int optname, u8 __user *optval, int optlen);
 +int xfrm4_udp_encap_rcv(struct sock *sk, struct sk_buff *skb);
 +int xfrm_user_policy(struct sock *sk, int optname,
 +		     u8 __user *optval, int optlen);
  #else
  static inline int xfrm_user_policy(struct sock *sk, int optname, u8 __user *optval, int optlen)
  {

* Re: [PATCH 1/2] net: Toeplitz library functions
From: David Miller @ 2013-09-24  1:39 UTC
  To: eric.dumazet; +Cc: therbert, netdev, jesse.brandeburg
In-Reply-To: <1379980991.3165.37.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 23 Sep 2013 17:03:11 -0700

> 1) Security alert here.
> 
> Many devices (let's say Android phones) have no entropy at this point,
> all devices will have same toeplitz key.
> 
> Check build_ehash_secret() for a possible point for the feeding of the
> key. (and commit 08dcdbf6a7b9d14c2302c5bd0c5390ddf122f664 )
> 
> If hardware toeplitz is ever used, we want to make sure every host uses
> a private and hidden Toeplitz key.
> 
> 2) Also it seems a given tuple would hash the same on different
> namespaces. Could be a problem if one particular TCP hash bucket is
> holding thousands of sockets.
> 
> 3) jhash() is fast, there are no possible cache line misses

4) Random input to the hash is now not used at all, instant exploit
   because now any attacker can open up connections over and over that
   will all hash to the same hash bucket making our lookups linear.

* Re: [PATCH v2.39 7/7] datapath: Add basic MPLS support to kernel
From: Jesse Gross @ 2013-09-24  1:38 UTC
  To: Simon Horman
  Cc: Ben Pfaff, Pravin Shelar, dev@openvswitch.org, netdev, Ravi K,
	Isaku Yamahata, Joe Stringer
In-Reply-To: <20130924013222.GL25601@verge.net.au>

On Mon, Sep 23, 2013 at 6:32 PM, Simon Horman <horms@verge.net.au> wrote:
> On Mon, Sep 23, 2013 at 02:17:50PM -0700, Jesse Gross wrote:
>> On Sat, Sep 21, 2013 at 10:34 PM, Simon Horman <horms@verge.net.au> wrote:
>> > On Thu, Sep 19, 2013 at 12:21:33PM -0500, Jesse Gross wrote:
>> >> On Thu, Sep 19, 2013 at 10:57 AM, Simon Horman <horms@verge.net.au> wrote:
>> >> > On Mon, Sep 16, 2013 at 03:38:21PM -0500, Jesse Gross wrote:
>> >> >> On Mon, Sep 9, 2013 at 12:20 AM, Simon Horman <horms@verge.net.au> wrote:
>> >> One other consideration in the OVS case - with recirculation we may
>> >> hit this code multiple times and the difference in behavior could be
>> >> surprising. However, on the other hand, we need to be careful because
>> >> skb->cb is not guaranteed to be initialized to zero.
>> >
>> > Thanks, that is also not something that I had considered.
>> >
>> > I'm not sure, but I think that we can rely on skb->cb
>> > not being clobbered between rounds of recirculation.
>> > Or at the very least I think we could save and restore it
>> > as necessary.
>>
>> Yes, it should be safe to assume this.
>>
>> > So I think if we could be careful to make sure that inner_protocol
>> > is in a sane state the first time we see the skb but not
>> > each time it is recirculated then I think things should work out.
>> >
>> > In my current implementation of recirculation the datapath
>> > side is driven by ovs_dp_process_received_packet(). So by my reasoning
>> > above I think it would make sense to reset the inner_protocol there
>> > and in ovs_packet_cmd_execute() rather than in ovs_execute_actions()
>> > which each of those functions call.
>>
>> I think that would work, however, I wonder if it's the right place in
>> general, independent of this compatibility issue. I guess it still
>> seems like the ideal thing to do is to move this as close to where it
>> is necessary as possible, specifically in mpls_push(). Is there a
>> reason to not put it there (again, other than the out-of-tree
>> compatibility issues)?
>
> I agree that should work, out-of-tree compatibility issues aside.
>
> Perhaps a solution is to have a conditional set_inner_protocol call inside
> push_mpls, where the condition is that inner_protocol is zero.
> And a reset_inner_protocol call earlier on, a call that sets inner_protocol
> to zero only if the compatibility code is in use and thus it resides in
> struct ovs_gso_cb. This call could be removed once the compatibility
> code is no longer needed, that is once kernels older than 3.11 are no
> longer supported.

I agree that's probably the right solution.

* Re: [PATCH v2.39 7/7] datapath: Add basic MPLS support to kernel
From: Simon Horman @ 2013-09-24  1:33 UTC
  To: Jesse Gross
  Cc: Pravin Shelar, dev@openvswitch.org, netdev, Ravi K,
	Isaku Yamahata, Joe Stringer
In-Reply-To: <CAEP_g=_Di4yUzR0ka_Ma73DPKRoeCYW-ZZvQ=_n7OVDmpRKbTQ@mail.gmail.com>

On Mon, Sep 23, 2013 at 02:24:31PM -0700, Jesse Gross wrote:
> On Mon, Sep 23, 2013 at 12:47 PM, Pravin Shelar <pshelar@nicira.com> wrote:
> > This patch does not work since vport-netdev does not include compat
> > gso header. after including gso.h it gives me compiler error.
> > Can you post combined patch with fixes?
> 
> I think it's probably because of this:
> 
> +#if 1 // LINUX_VERSION_CODE < KERNEL_VERSION(2,6,37)
> +#define dev_queue_xmit rpl_dev_queue_xmit
> +#endif
> 
> But otherwise, I agree the approach is much nicer than what is currently there.

Sorry for letting that slip through.

I'll post a more complete patch after doing some more testing.

* Re: [PATCH v2.39 7/7] datapath: Add basic MPLS support to kernel
From: Simon Horman @ 2013-09-24  1:32 UTC
  To: Jesse Gross
  Cc: Ben Pfaff, Pravin Shelar, dev@openvswitch.org, netdev, Ravi K,
	Isaku Yamahata, Joe Stringer
In-Reply-To: <CAEP_g=969qq6EAmo3mkaJP+tzqHQ9AUHje9j31CpzfD3RoRPaA@mail.gmail.com>

On Mon, Sep 23, 2013 at 02:17:50PM -0700, Jesse Gross wrote:
> On Sat, Sep 21, 2013 at 10:34 PM, Simon Horman <horms@verge.net.au> wrote:
> > On Thu, Sep 19, 2013 at 12:21:33PM -0500, Jesse Gross wrote:
> >> On Thu, Sep 19, 2013 at 10:57 AM, Simon Horman <horms@verge.net.au> wrote:
> >> > On Mon, Sep 16, 2013 at 03:38:21PM -0500, Jesse Gross wrote:
> >> >> On Mon, Sep 9, 2013 at 12:20 AM, Simon Horman <horms@verge.net.au> wrote:
> >> >> > @@ -616,6 +736,13 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb)
> >> >> >                 goto out_loop;
> >> >> >         }
> >> >> >
> >> >> > +       /* Needed to initialise inner protocol on kernels older
> >> >> > +        * than v3.11 where skb->inner_protocol is not present
> >> >> > +        * and compatibility code uses the OVS_CB(skb) to store
> >> >> > +        * the inner protocol.
> >> >> > +        */
> >> >> > +       ovs_skb_set_inner_protocol(skb, skb->protocol);
> >> >>
> >> >> The comment makes it sound like this code should just be deleted when
> >> >> upstreaming. However, I believe that we still need to initialize this
> >> >> field, right? Is this the best place to do it or should it be conditional
> >> >> on adding a first MPLS header? (i.e. what happens if inner_protocol is
> >> >> already set and the packet simply passes through OVS?)
> >> >
> >> > I believe there are several problems here.
> >> >
> >> > The first one, which my comment was written around is that I think that if
> >> > inner_protocol is a field of struct sk_buff then we can rely on it already
> >> > being initialised.  However, if we are using compatibility code, where
> >> > inner_protocol is called in the callback field of struct sk_buff then I
> >> > think that OVS needs to initialise it.
> >>
> >> I'm not sure that it's true that inner_protocol is already initialized
> >> - I grepped the tree and the only assignment that I found is in
> >> skbuff.c in __copy_skb_header().
> >
> > My assumption was that it would be initialised to zero,
> > primarily due to the behaviour of __alloc_skb_head().
> > Perhaps the core code should be fixed to make my assumption true?
> 
> I misunderstood then - I think you can assume that it is initialized
> to zero, I thought you meant that it was initialized to a protocol
> value. However, I then still have my original question - don't we need
> to set it here in both cases since we're not just 'initializing' it
> but actually setting a protocol?

I believe that you are correct and it needs to be set in both cases.

> >> One other consideration in the OVS case - with recirculation we may
> >> hit this code multiple times and the difference in behavior could be
> >> surprising. However, on the other hand, we need to be careful because
> >> skb->cb is not guaranteed to be initialized to zero.
> >
> > Thanks, that is also not something that I had considered.
> >
> > I'm not sure, but I think that we can rely on skb->cb
> > not being clobbered between rounds of recirculation.
> > Or at the very least I think we could save and restore it
> > as necessary.
> 
> Yes, it should be safe to assume this.
> 
> > So I think if we could be careful to make sure that inner_protocol
> > is in a sane state the first time we see the skb but not
> > each time it is recirculated then I think things should work out.
> >
> > In my current implementation of recirculation the datapath
> > side is driven by ovs_dp_process_received_packet(). So by my reasoning
> > above I think it would make sense to reset the inner_protocol there
> > and in ovs_packet_cmd_execute() rather than in ovs_execute_actions()
> > which each of those functions call.
> 
> I think that would work, however, I wonder if it's the right place in
> general, independent of this compatibility issue. I guess it still
> seems like the ideal thing to do is to move this as close to where it
> is necessary as possible, specifically in mpls_push(). Is there a
> reason to not put it there (again, other than the out-of-tree
> compatibility issues)?

I agree that should work, out-of-tree compatibility issues aside.

Perhaps a solution is to have a conditional set_inner_protocol call inside
push_mpls, where the condition is that inner_protocol is zero.
And a reset_inner_protocol call earlier on, a call that sets inner_protocol
to zero only if the compatibility code is in use and thus it resides in
struct ovs_gso_cb. This call could be removed once the compatibility
code is no longer needed, that is once kernels older than 3.11 are no
longer supported.
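
The idea can be sketched as follows. This is a simplified userspace
model, not the OVS datapath code: struct demo_skb, the demo_ functions
and the DEMO_ constants are hypothetical stand-ins for struct sk_buff
(or struct ovs_gso_cb in the compatibility case) and the real helpers.

```c
#include <stdint.h>

/* Hypothetical, reduced model of the packet metadata involved. */
struct demo_skb {
	uint16_t protocol;       /* current outer ethertype */
	uint16_t inner_protocol; /* 0 means "not set yet" */
};

#define DEMO_ETH_P_IP      0x0800
#define DEMO_ETH_P_MPLS_UC 0x8847

/* Called once when the datapath first sees the packet. Only needed
 * while inner_protocol lives in skb->cb, which is not guaranteed to
 * be zeroed. */
static void demo_reset_inner_protocol(struct demo_skb *skb)
{
	skb->inner_protocol = 0;
}

/* Record the inner protocol only when pushing the first MPLS header,
 * so further pushes and recirculation do not overwrite it. */
static void demo_push_mpls(struct demo_skb *skb)
{
	if (skb->inner_protocol == 0)
		skb->inner_protocol = skb->protocol;
	skb->protocol = DEMO_ETH_P_MPLS_UC;
}
```

With this arrangement a second demo_push_mpls() call leaves
inner_protocol at the original ethertype, and the reset call is the
only part that would go away together with the compatibility code.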

* Re: Question on Netlink IPv6 routing table lookup
From: Hannes Frederic Sowa @ 2013-09-24  0:04 UTC
  To: Fernando Gont; +Cc: netdev
In-Reply-To: <52409953.8040208@gont.com.ar>

On Mon, Sep 23, 2013 at 04:41:07PM -0300, Fernando Gont wrote:
> If that's not (currently) possible, should I expect RTA_SRC to work as
> described above at some point in the future?

The RTA_SRC attribute matches on subtrees in the IPv6 routing table:

ip -6 r a default via fe80::1 dev eth0 from 2000::/64
ip -6 r a default via fe80::2 dev eth0 from 2000:1::/64

ip -6 r g :: from 2000::
ip -6 r g :: from 2000:1::

...should return different routes. The from parameter is the RTA_SRC
attribute.

Greetings,

  Hannes

* Re: [PATCH 1/2] net: Toeplitz library functions
From: Eric Dumazet @ 2013-09-24  0:03 UTC
  To: Tom Herbert; +Cc: davem, netdev, jesse.brandeburg
In-Reply-To: <alpine.DEB.2.02.1309231535030.23896@tomh.mtv.corp.google.com>

On Mon, 2013-09-23 at 15:41 -0700, Tom Herbert wrote:

> +#ifdef CONFIG_NET_TOEPLITZ
> +	toeplitz_net = toeplitz_alloc();
> +	if (!toeplitz_net)
> +		goto out;
> +
> +	toeplitz_init(toeplitz_net, NULL);
> +#endif
> +

Hmm

1) Security alert here.

Many devices (let's say Android phones) have no entropy at this point,
all devices will have same toeplitz key.

Check build_ehash_secret() for a possible point for the feeding of the
key. (and commit 08dcdbf6a7b9d14c2302c5bd0c5390ddf122f664 )

If hardware toeplitz is ever used, we want to make sure every host uses
a private and hidden Toeplitz key.

2) Also it seems a given tuple would hash the same on different
namespaces. Could be a problem if one particular TCP hash bucket is
holding thousands of sockets.

3) jhash() is fast, there are no possible cache line misses

  With your implementation, toeplitz hashing 36 bytes could have a cost
of 36 additional cache line misses.

You do not see that in a TCP_RR test because CPU caches are preloaded, but
it will show on latency-sensitive workloads.
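
For reference, the Toeplitz function itself can be sketched bit by bit
as below. This is a plain, table-free illustration and not the code
from the patch under review; the name demo_toeplitz_hash and the
choice to treat missing key bytes as zero are assumptions. Each set
input bit XORs the current 32-bit window of the key into the result,
and the window slides one bit to the right along the key per input
bit. Implementations that replace the inner loop with precomputed
per-byte tables trade these shifts for memory lookups, which is
presumably where the cache line concern above comes from.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise Toeplitz hash: for every set bit of the input (most
 * significant bit first), XOR in the 32 key bits that start at that
 * bit position. Key bytes beyond key_len are treated as zero. */
static uint32_t demo_toeplitz_hash(const uint8_t *key, size_t key_len,
				   const uint8_t *data, size_t data_len)
{
	uint32_t window = 0, hash = 0;
	size_t i;
	int bit;

	/* Load the first 32 key bits into the window. */
	for (i = 0; i < 4; i++)
		window = (window << 8) | (i < key_len ? key[i] : 0);

	for (i = 0; i < data_len; i++) {
		/* Key byte whose bits get shifted in during this input byte. */
		uint8_t next = (i + 4 < key_len) ? key[i + 4] : 0;

		for (bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				hash ^= window;
			window = (window << 1) | ((next >> bit) & 1u);
		}
	}
	return hash;
}
```

Because the window sequence depends only on the key, the function is
linear over XOR: hashing a XOR b gives the XOR of the two hashes. That
also means the key has to stay secret and random per host, as argued
above.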

* [PATCH ethtool 2/2] Hide state of VLAN tag offload and LRO if the kernel is too old
From: Ben Hutchings @ 2013-09-23 23:26 UTC
  To: netdev; +Cc: linux-net-drivers

Starting with Linux 2.6.37 and ethtool 2.6.36 it was possible to show
the state of VLAN tag offload (using ETHTOOL_GFLAGS).  But the state
would always be shown as 'off' for older kernel versions, even though
VLAN tag offload had been implemented long before this.

In ethtool 3.4.2 I attempted to fix this by also reading the state of
VLAN tag offload from the 'features' attribute in sysfs.  But this had
to be reverted because it causes 'ethtool -K' to pass the flags back
into ETHTOOL_SFLAGS.

Instead, hide the VLAN tag offload flags if the kernel is older than
2.6.37.

Similarly, LRO was implemented some time before it was exposed through
ETHTOOL_GFLAGS in Linux 2.6.24.  So hide the LRO flag if the kernel
is older than that.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 ethtool.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 46 insertions(+), 12 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index 2dc07d3..b06dfa3 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -113,6 +113,8 @@ enum {
 };
 #endif
 
+#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))
+
 static void exit_bad_args(void) __attribute__((noreturn));
 
 static void exit_bad_args(void)
@@ -183,32 +185,43 @@ struct off_flag_def {
 	const char *kernel_name;
 	u32 get_cmd, set_cmd;
 	u32 value;
+	/* For features exposed through ETHTOOL_GFLAGS, the oldest
+	 * kernel version for which we can trust the result.  Where
+	 * the flag was added at the same time the kernel started
+	 * supporting the feature, this is 0 (to allow for backports).
+	 * Where the feature was supported before the flag was added,
+	 * it is the version that introduced the flag.
+	 */
+	u32 min_kernel_ver;
 };
 static const struct off_flag_def off_flag_def[] = {
 	{ "rx",     "rx-checksumming",		    "rx-checksum",
-	  ETHTOOL_GRXCSUM, ETHTOOL_SRXCSUM, ETH_FLAG_RXCSUM },
+	  ETHTOOL_GRXCSUM, ETHTOOL_SRXCSUM, ETH_FLAG_RXCSUM,	0 },
 	{ "tx",     "tx-checksumming",		    "tx-checksum-*",
-	  ETHTOOL_GTXCSUM, ETHTOOL_STXCSUM, ETH_FLAG_TXCSUM },
+	  ETHTOOL_GTXCSUM, ETHTOOL_STXCSUM, ETH_FLAG_TXCSUM,	0 },
 	{ "sg",     "scatter-gather",		    "tx-scatter-gather*",
-	  ETHTOOL_GSG,	   ETHTOOL_SSG,     ETH_FLAG_SG },
+	  ETHTOOL_GSG,	   ETHTOOL_SSG,     ETH_FLAG_SG,	0 },
 	{ "tso",    "tcp-segmentation-offload",	    "tx-tcp*-segmentation",
-	  ETHTOOL_GTSO,	   ETHTOOL_STSO,    ETH_FLAG_TSO },
+	  ETHTOOL_GTSO,	   ETHTOOL_STSO,    ETH_FLAG_TSO,	0 },
 	{ "ufo",    "udp-fragmentation-offload",    "tx-udp-fragmentation",
-	  ETHTOOL_GUFO,	   ETHTOOL_SUFO,    ETH_FLAG_UFO },
+	  ETHTOOL_GUFO,	   ETHTOOL_SUFO,    ETH_FLAG_UFO,	0 },
 	{ "gso",    "generic-segmentation-offload", "tx-generic-segmentation",
-	  ETHTOOL_GGSO,	   ETHTOOL_SGSO,    ETH_FLAG_GSO },
+	  ETHTOOL_GGSO,	   ETHTOOL_SGSO,    ETH_FLAG_GSO,	0 },
 	{ "gro",    "generic-receive-offload",	    "rx-gro",
-	  ETHTOOL_GGRO,	   ETHTOOL_SGRO,    ETH_FLAG_GRO },
+	  ETHTOOL_GGRO,	   ETHTOOL_SGRO,    ETH_FLAG_GRO,	0 },
 	{ "lro",    "large-receive-offload",	    "rx-lro",
-	  0,		   0,		    ETH_FLAG_LRO },
+	  0,		   0,		    ETH_FLAG_LRO,
+	  KERNEL_VERSION(2,6,24) },
 	{ "rxvlan", "rx-vlan-offload",		    "rx-vlan-hw-parse",
-	  0,		   0,		    ETH_FLAG_RXVLAN },
+	  0,		   0,		    ETH_FLAG_RXVLAN,
+	  KERNEL_VERSION(2,6,37) },
 	{ "txvlan", "tx-vlan-offload",		    "tx-vlan-hw-insert",
-	  0,		   0,		    ETH_FLAG_TXVLAN },
+	  0,		   0,		    ETH_FLAG_TXVLAN,
+	  KERNEL_VERSION(2,6,37) },
 	{ "ntuple", "ntuple-filters",		    "rx-ntuple-filter",
-	  0,		   0,		    ETH_FLAG_NTUPLE },
+	  0,		   0,		    ETH_FLAG_NTUPLE,	0 },
 	{ "rxhash", "receive-hashing",		    "rx-hashing",
-	  0,		   0,		    ETH_FLAG_RXHASH },
+	  0,		   0,		    ETH_FLAG_RXHASH,	0 },
 };
 
 struct feature_def {
@@ -1179,15 +1192,36 @@ static void dump_one_feature(const char *indent, const char *name,
 	       : "");
 }
 
+static int linux_version_code(void)
+{
+	struct utsname utsname;
+	unsigned version, patchlevel, sublevel = 0;
+
+	if (uname(&utsname))
+		return -1;
+	if (sscanf(utsname.release, "%u.%u.%u", &version, &patchlevel, &sublevel) < 2)
+		return -1;
+	return KERNEL_VERSION(version, patchlevel, sublevel);
+}
+
 static void dump_features(const struct feature_defs *defs,
 			  const struct feature_state *state,
 			  const struct feature_state *ref_state)
 {
+	int kernel_ver = linux_version_code();
 	u32 value;
 	int indent;
 	int i, j;
 
 	for (i = 0; i < ARRAY_SIZE(off_flag_def); i++) {
+		/* Don't show features whose state is unknown on this
+		 * kernel version
+		 */
+		if (defs->off_flag_matched[i] == 0 &&
+		    off_flag_def[i].get_cmd == 0 &&
+		    kernel_ver < off_flag_def[i].min_kernel_ver)
+			continue;
+
 		value = off_flag_def[i].value;
 
 		/* If this offload flag matches exactly one generic
-- 
1.8.1.4


-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

* Re: [PATCH 2/2] net: Use Toeplitz for IPv4 and IPv6 connection hashing
From: Tom Herbert @ 2013-09-23 23:26 UTC
  To: Stephen Hemminger; +Cc: David Miller, Linux Netdev List, Brandeburg, Jesse
In-Reply-To: <20130923161115.5d756838@nehalam.linuxnetplumber.net>

On Mon, Sep 23, 2013 at 4:11 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Mon, 23 Sep 2013 15:44:51 -0700 (PDT)
> Tom Herbert <therbert@google.com> wrote:
>
>> Toeplitz
>>   IPv4
>>     58.72% CPU utilization
>>     110/146/198 90/95/99% latencies
>>     1.72549e+06 tps
>>   IPv6
>>     72.38% CPU utilization
>>     117/168/255 90/95/99% latencies
>>     1.58545e+06 tps
>>
>> Jhash
>>   IPv4
>>     57.67% CPU utilization
>>     111/146/196 90/95/99% latencies
>>     1.71574e+06 tps
>>   IPv6
>>     71.84% CPU utilization
>>     117/166/248 90/95/99% latencies
>>     1.59359e+06 tps
>
> It looks slower and more complex than Jhash, what is the benefit?
> Have you investigated using Murmur instead?

The benefit would be to leverage, and be compatible with, HW hash computation...
perhaps this is just an intellectual curiosity :-)


* [PATCH ethtool 1/2] Revert "Fix reporting of VLAN tag offload flags on Linux < 2.6.37"
From: Ben Hutchings @ 2013-09-23 23:24 UTC (permalink / raw)
  To: netdev; +Cc: linux-net-drivers

This reverts commit 1cddbe64cfc66b58988c85086d6df3a77c0c61a5.
It causes 'ethtool -K' to pass the VLAN tag offload flags to
ETHTOOL_SFLAGS when they are not supported and will be rejected.
---
 ethtool.c | 41 -----------------------------------------
 1 file changed, 41 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index dcdc0a9..2dc07d3 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -35,8 +35,6 @@
 #include <sys/utsname.h>
 #include <limits.h>
 #include <ctype.h>
-#include <assert.h>
-#include <sys/fcntl.h>
 
 #include <sys/socket.h>
 #include <netinet/in.h>
@@ -1506,31 +1504,6 @@ static struct feature_defs *get_feature_defs(struct cmd_context *ctx)
 	return defs;
 }
 
-static int get_netdev_attr(struct cmd_context *ctx, const char *name,
-		    char *buf, size_t buf_len)
-{
-#ifdef TEST_ETHTOOL
-	errno = ENOENT;
-	return -1;
-#else
-	char path[40 + IFNAMSIZ];
-	ssize_t len;
-	int fd;
-
-	len = snprintf(path, sizeof(path), "/sys/class/net/%s/%s",
-		       ctx->devname, name);
-	assert(len < sizeof(path));
-	fd = open(path, O_RDONLY);
-	if (fd < 0)
-		return fd;
-	len = read(fd, buf, buf_len - 1);
-	if (len >= 0)
-		buf[len] = 0;
-	close(fd);
-	return len;
-#endif
-}
-
 static int do_gdrv(struct cmd_context *ctx)
 {
 	int err;
@@ -1972,20 +1945,6 @@ get_features(struct cmd_context *ctx, const struct feature_defs *defs)
 			perror("Cannot get device generic features");
 		else
 			allfail = 0;
-	} else {
-		/* We should have got VLAN tag offload flags through
-		 * ETHTOOL_GFLAGS.  However, prior to Linux 2.6.37
-		 * they were not exposed in this way - and since VLAN
-		 * tag offload was defined and implemented by many
-		 * drivers, we shouldn't assume they are off.
-		 * Instead, since these feature flag values were
-		 * stable, read them from sysfs.
-		 */
-		char buf[20];
-		if (get_netdev_attr(ctx, "features", buf, sizeof(buf)) > 0)
-			state->off_flags |=
-				strtoul(buf, NULL, 0) &
-				(ETH_FLAG_RXVLAN | ETH_FLAG_TXVLAN);
 	}
 
 	if (allfail) {
-- 
1.8.1.4



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH 2/2] net: Use Toeplitz for IPv4 and IPv6 connection hashing
From: Hannes Frederic Sowa @ 2013-09-23 23:17 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, jesse.brandeburg
In-Reply-To: <alpine.DEB.2.02.1309231543330.23714@tomh.mtv.corp.google.com>

On Mon, Sep 23, 2013 at 03:44:51PM -0700, Tom Herbert wrote:
> diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
> index f52fa88..492a45b 100644
> --- a/include/net/inet6_hashtables.h
> +++ b/include/net/inet6_hashtables.h
> @@ -32,12 +32,28 @@ static inline unsigned int inet6_ehashfn(struct net *net,
>  				const struct in6_addr *laddr, const u16 lport,
>  				const struct in6_addr *faddr, const __be16 fport)
>  {
> +#if IS_ENABLED(CONFIG_IP_HASH_TOEPLITZ)
> +	struct {
> +		struct in6_addr saddr;
> +		struct in6_addr daddr;
> +		u16 sport;
> +		u16 dport;
> +	} input;
> +
> +        input.daddr = *laddr;
> +        input.saddr = *faddr;
> +        input.sport = htons(lport);
> +        input.dport = fport;
> +
> +        return toeplitz_hash((u8 *)&input, toeplitz_net, sizeof(input));
> +#else
>  	u32 ports = (((u32)lport) << 16) | (__force u32)fport;
>  
>  	return jhash_3words((__force u32)laddr->s6_addr32[3],
>  			    ipv6_addr_jhash(faddr),
>  			    ports,
>  			    inet_ehash_secret + net_hash_mix(net));
> +#endif

You seem to discard the secret inputs. This should make the hashing
considerably more insecure.

I always believed the reason for choosing linear feedback shift register
based hash functions was because of the parallelism a pure hardware
based implementation could exploit. This does not matter for the kernel.
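
One property that is relevant to the point above: for a fixed key, Toeplitz is linear over GF(2), so the hash of the XOR of two equal-length inputs equals the XOR of their hashes. The following is only a userspace sketch to show that property, not code from the patch; the function names and the 36-byte input cap are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KEY_LEN   40		/* same size as TOEPLITZ_KEY_LEN in the patch */
#define MAX_INPUT (KEY_LEN - 4)	/* keeps every 32-bit key window inside the key */

/* Textbook Toeplitz: slide a 32-bit window over the key, one bit per input bit. */
static uint32_t toeplitz(const uint8_t key[KEY_LEN],
			 const uint8_t *data, size_t len)
{
	uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
			  ((uint32_t)key[2] << 8) | (uint32_t)key[3];
	uint32_t result = 0;
	size_t i;
	unsigned int b;

	assert(len <= MAX_INPUT);
	for (i = 0; i < len; i++) {
		for (b = 0; b < 8; b++) {
			/* input bits are consumed most significant bit first */
			if (data[i] & (0x80u >> b))
				result ^= window;
			/* shift left and pull in the next key bit */
			window = (window << 1) |
				 ((uint32_t)(key[i + 4] >> (7 - b)) & 1u);
		}
	}
	return result;
}

/* Returns 1 if hash(x ^ y) == hash(x) ^ hash(y) for the given inputs. */
static int toeplitz_xor_linear(const uint8_t key[KEY_LEN],
			       const uint8_t *x, const uint8_t *y, size_t len)
{
	uint8_t tmp[MAX_INPUT];
	size_t i;

	assert(len <= MAX_INPUT);
	for (i = 0; i < len; i++)
		tmp[i] = x[i] ^ y[i];
	return toeplitz(key, tmp, len) ==
	       (toeplitz(key, x, len) ^ toeplitz(key, y, len));
}
```

Calling toeplitz_xor_linear() with any key and any pair of equal-length inputs should return 1. jhash mixes its inputs non-linearly, so it has no such relation.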

IMHO jhash should be considered more secure just because of its wider usage.
;)

Greetings,

  Hannes


* Re: [PATCH 0/2] Toeplitz hashing
From: Rick Jones @ 2013-09-23 23:16 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, jesse.brandeburg
In-Reply-To: <alpine.DEB.2.02.1309231534010.23714@tomh.mtv.corp.google.com>

On 09/23/2013 03:41 PM, Tom Herbert wrote:
> There's also the possibility of calculating the Toeplitz hash for a
> connection in order to deduce which RSS queue the connection maps
> to, and thus allow a poor man's aRFS (idea from Jesse Brandeburg).

Perhaps the answer is in the meaning of the leading 'a' in aRFS but 
doesn't one still depend on the thread of execution handling only one 
flow for that to work effectively?  Or would flows be assigned to 
threads of execution based on the Toeplitz hash of the connections?

rick jones


* Re: [PATCH 2/2] net: Use Toeplitz for IPv4 and IPv6 connection hashing
From: Stephen Hemminger @ 2013-09-23 23:11 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, jesse.brandeburg
In-Reply-To: <alpine.DEB.2.02.1309231543330.23714@tomh.mtv.corp.google.com>

On Mon, 23 Sep 2013 15:44:51 -0700 (PDT)
Tom Herbert <therbert@google.com> wrote:

> Toeplitz
>   IPv4
>     58.72% CPU utilization
>     110/146/198 90/95/99% latencies
>     1.72549e+06 tps
>   IPv6
>     72.38% CPU utilization
>     117/168/255 90/95/99% latencies
>     1.58545e+06 tps
> 
> Jhash
>   IPv4
>     57.67% CPU utilization
>     111/146/196 90/95/99% latencies
>     1.71574e+06 tps
>   IPv6
>     71.84% CPU utilization
>     117/166/248 90/95/99% latencies
>     1.59359e+06 tps

It looks slower and more complex than Jhash, what is the benefit?
Have you investigated using Murmur instead?


* Re: BUG: 32 bit net stats
From: Jamal Hadi Salim @ 2013-09-23 22:42 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Eric Dumazet
In-Reply-To: <5240BD64.1020601@mojatatu.com>

On 13-09-23 06:15 PM, Jamal Hadi Salim wrote:

> Sorry - let me take a closer look and get back
> to you. I think the bug may be in user space.

Sigh - a long-standing bug on my (user space) side
got aggravated by the last changes.
Ignore the noise. Kernel works fine.

cheers,
jamal


* Re:
From: Tom Herbert @ 2013-09-23 22:45 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, Brandeburg, Jesse
In-Reply-To: <alpine.DEB.2.02.1309231535460.23966@tomh.mtv.corp.google.com>

Disregard...

On Mon, Sep 23, 2013 at 3:41 PM, Tom Herbert <therbert@google.com> wrote:
> From cf54b0651b7ea35fab4c398f1732e800550732ef Mon Sep 17 00:00:00 2001
> From: Tom Herbert <therbert@google.com>
> Date: Mon, 23 Sep 2013 12:27:17 -0700
> Subject: [PATCH 2/2] net: Use Toeplitz for IPv4 and IPv6 connection hashing
>
> Add a config option to specify which hash to use for IPv4 and IPv6
> established connection hashing. The alternative option is the original
> jhash method (this patch makes Toeplitz the default).
>
> Toeplitz is a little more heavyweight than the jhash method.  For IPv4
> the difference seems to be negligible; for IPv6 there is some
> performance regression, due mostly to the fact that Toeplitz hashes
> over all the bits in the IPv6 address whereas jhash doesn't (this
> implies that Toeplitz might be more secure).
>
> Some performance numbers using 200 netperf TCP_RR clients:
>
> Toeplitz
>   IPv4
>     58.72% CPU utilization
>     110/146/198 90/95/99% latencies
>     1.72549e+06 tps
>   IPv6
>     72.38% CPU utilization
>     117/168/255 90/95/99% latencies
>     1.58545e+06 tps
>
> Jhash
>   IPv4
>     57.67% CPU utilization
>     111/146/196 90/95/99% latencies
>     1.71574e+06 tps
>   IPv6
>     71.84% CPU utilization
>     117/166/248 90/95/99% latencies
>     1.59359e+06 tps
>
> Standalone performance measurement:
>
> Toeplitz
>   IPv4
>     40 nsecs/hash
>   IPv6
>     105 nsecs/hash
> Jhash
>   IPv4
>     39 nsecs/hash
>   IPv6
>     77 nsecs/hash
>
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
>  include/net/inet6_hashtables.h | 16 ++++++++++++++++
>  include/net/inet_sock.h        | 16 ++++++++++++++++
>  net/ipv4/Kconfig               | 14 ++++++++++++++
>  3 files changed, 46 insertions(+)
>
> diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
> index f52fa88..492a45b 100644
> --- a/include/net/inet6_hashtables.h
> +++ b/include/net/inet6_hashtables.h
> @@ -32,12 +32,28 @@ static inline unsigned int inet6_ehashfn(struct net *net,
>                                 const struct in6_addr *laddr, const u16 lport,
>                                 const struct in6_addr *faddr, const __be16 fport)
>  {
> +#if IS_ENABLED(CONFIG_IP_HASH_TOEPLITZ)
> +       struct {
> +               struct in6_addr saddr;
> +               struct in6_addr daddr;
> +               u16 sport;
> +               u16 dport;
> +       } input;
> +
> +        input.daddr = *laddr;
> +        input.saddr = *faddr;
> +        input.sport = htons(lport);
> +        input.dport = fport;
> +
> +        return toeplitz_hash((u8 *)&input, toeplitz_net, sizeof(input));
> +#else
>         u32 ports = (((u32)lport) << 16) | (__force u32)fport;
>
>         return jhash_3words((__force u32)laddr->s6_addr32[3],
>                             ipv6_addr_jhash(faddr),
>                             ports,
>                             inet_ehash_secret + net_hash_mix(net));
> +#endif
>  }
>
>  static inline int inet6_sk_ehashfn(const struct sock *sk)
> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> index 636d203..02e2ee2 100644
> --- a/include/net/inet_sock.h
> +++ b/include/net/inet_sock.h
> @@ -209,10 +209,26 @@ static inline unsigned int inet_ehashfn(struct net *net,
>                                         const __be32 laddr, const __u16 lport,
>                                         const __be32 faddr, const __be16 fport)
>  {
> +#if IS_ENABLED(CONFIG_IP_HASH_TOEPLITZ)
> +       struct {
> +               u32 saddr;
> +               u32 daddr;
> +               u16 sport;
> +               u16 dport;
> +       } input;
> +
> +       input.saddr = faddr;
> +       input.daddr = laddr;
> +       input.sport = fport;
> +       input.dport = htons(lport);
> +
> +       return toeplitz_hash((u8 *)&input, toeplitz_net, sizeof(input));
> +#else
>         return jhash_3words((__force __u32) laddr,
>                             (__force __u32) faddr,
>                             ((__u32) lport) << 16 | (__force __u32)fport,
>                             inet_ehash_secret + net_hash_mix(net));
> +#endif
>  }
>
>  static inline int inet_sk_ehashfn(const struct sock *sk)
> diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
> index 05c57f0..c9a533f 100644
> --- a/net/ipv4/Kconfig
> +++ b/net/ipv4/Kconfig
> @@ -104,6 +104,20 @@ config IP_ROUTE_VERBOSE
>  config IP_ROUTE_CLASSID
>         bool
>
> +choice
> +       prompt "IP: connection hashing algorithm"
> +       default IP_HASH_TOEPLITZ
> +       help
> +         Select the default hashing algorithm for IP connections
> +
> +       config IP_HASH_JHASH
> +               bool "Jhash"
> +
> +       config IP_HASH_TOEPLITZ
> +               bool "Toeplitz"
> +               select NET_TOEPLITZ
> +endchoice
> +
>  config IP_PNP
>         bool "IP: kernel level autoconfiguration"
>         help
> --
> 1.8.4
>


* [PATCH 2/2] net: Use Toeplitz for IPv4 and IPv6 connection hashing
From: Tom Herbert @ 2013-09-23 22:44 UTC (permalink / raw)
  To: davem; +Cc: netdev, jesse.brandeburg

Add a config option to specify which hash to use for IPv4 and IPv6
established connection hashing. The alternative option is the original
jhash method (this patch makes Toeplitz the default).

Toeplitz is a little more heavyweight than the jhash method.  For IPv4
the difference seems to be negligible; for IPv6 there is some
performance regression, due mostly to the fact that Toeplitz hashes
over all the bits in the IPv6 address whereas jhash doesn't (this
implies that Toeplitz might be more secure).

Some performance numbers using 200 netperf TCP_RR clients:

Toeplitz
  IPv4
    58.72% CPU utilization
    110/146/198 90/95/99% latencies
    1.72549e+06 tps
  IPv6
    72.38% CPU utilization
    117/168/255 90/95/99% latencies
    1.58545e+06 tps

Jhash
  IPv4
    57.67% CPU utilization
    111/146/196 90/95/99% latencies
    1.71574e+06 tps
  IPv6
    71.84% CPU utilization
    117/166/248 90/95/99% latencies
    1.59359e+06 tps

Standalone performance measurement:

Toeplitz
  IPv4
    40 nsecs/hash
  IPv6
    105 nsecs/hash
Jhash
  IPv4
    39 nsecs/hash
  IPv6
    77 nsecs/hash

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/net/inet6_hashtables.h | 16 ++++++++++++++++
 include/net/inet_sock.h        | 16 ++++++++++++++++
 net/ipv4/Kconfig               | 14 ++++++++++++++
 3 files changed, 46 insertions(+)

diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index f52fa88..492a45b 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -32,12 +32,28 @@ static inline unsigned int inet6_ehashfn(struct net *net,
 				const struct in6_addr *laddr, const u16 lport,
 				const struct in6_addr *faddr, const __be16 fport)
 {
+#if IS_ENABLED(CONFIG_IP_HASH_TOEPLITZ)
+	struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		u16 sport;
+		u16 dport;
+	} input;
+
+        input.daddr = *laddr;
+        input.saddr = *faddr;
+        input.sport = htons(lport);
+        input.dport = fport;
+
+        return toeplitz_hash((u8 *)&input, toeplitz_net, sizeof(input));
+#else
 	u32 ports = (((u32)lport) << 16) | (__force u32)fport;
 
 	return jhash_3words((__force u32)laddr->s6_addr32[3],
 			    ipv6_addr_jhash(faddr),
 			    ports,
 			    inet_ehash_secret + net_hash_mix(net));
+#endif
 }
 
 static inline int inet6_sk_ehashfn(const struct sock *sk)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 636d203..02e2ee2 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -209,10 +209,26 @@ static inline unsigned int inet_ehashfn(struct net *net,
 					const __be32 laddr, const __u16 lport,
 					const __be32 faddr, const __be16 fport)
 {
+#if IS_ENABLED(CONFIG_IP_HASH_TOEPLITZ)
+	struct {
+		u32 saddr;
+		u32 daddr;
+		u16 sport;
+		u16 dport;
+	} input;
+
+	input.saddr = faddr;
+	input.daddr = laddr;
+	input.sport = fport;
+	input.dport = htons(lport);
+
+	return toeplitz_hash((u8 *)&input, toeplitz_net, sizeof(input));
+#else
 	return jhash_3words((__force __u32) laddr,
 			    (__force __u32) faddr,
 			    ((__u32) lport) << 16 | (__force __u32)fport,
 			    inet_ehash_secret + net_hash_mix(net));
+#endif
 }
 
 static inline int inet_sk_ehashfn(const struct sock *sk)
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 05c57f0..c9a533f 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -104,6 +104,20 @@ config IP_ROUTE_VERBOSE
 config IP_ROUTE_CLASSID
 	bool
 
+choice
+	prompt "IP: connection hashing algorithm"
+	default IP_HASH_TOEPLITZ
+	help
+	  Select the default hashing algorithm for IP connections
+
+	config IP_HASH_JHASH
+		bool "Jhash"
+
+	config IP_HASH_TOEPLITZ
+		bool "Toeplitz"
+		select NET_TOEPLITZ
+endchoice
+
 config IP_PNP
 	bool "IP: kernel level autoconfiguration"
 	help
-- 
1.8.4


* [PATCH 1/2] net: Toeplitz library functions
From: Tom Herbert @ 2013-09-23 22:41 UTC (permalink / raw)
  To: davem; +Cc: netdev, jesse.brandeburg

Introduce Toeplitz hash functions. Toeplitz is a hash used primarily in
NICs to perform RSS flow steering.  This is a software implementation
of that. In order to make the hash calculation efficient, we precompute
the possible hash values for each individual byte of input. The input
length is up to 40 bytes, so we make an array of cache[40][256].

The implementation was verified against MSDN "Verify RSS hash" sample
values.
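
To illustrate the per-byte precomputation described above, here is a minimal userspace sketch. It is not the code from this patch: the names key_window, toeplitz_bitwise, toeplitz_table_init and toeplitz_lookup are made up for the example, and the sketch assumes the input is capped at 36 bytes so that every 32-bit key window stays inside the 40-byte key. The table lookup should give the same result as the bit-by-bit definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KEY_LEN   40		/* same size as TOEPLITZ_KEY_LEN in the patch */
#define MAX_INPUT (KEY_LEN - 4)	/* assumption: cap input at 36 bytes */

struct toeplitz_table {
	uint32_t cache[MAX_INPUT][256];	/* one entry per (byte position, byte value) */
};

/* 32 key bits starting at bit offset `bit`, counting from the MSB of key[0]. */
static uint32_t key_window(const uint8_t key[KEY_LEN], unsigned int bit)
{
	unsigned int off = bit / 8, rem = bit % 8;
	uint32_t v;

	assert(bit < MAX_INPUT * 8);
	v = ((uint32_t)key[off] << 24) | ((uint32_t)key[off + 1] << 16) |
	    ((uint32_t)key[off + 2] << 8) | (uint32_t)key[off + 3];
	if (rem)
		v = (v << rem) | (uint32_t)(key[off + 4] >> (8 - rem));
	return v;
}

/* Reference definition: XOR the key window for every set input bit. */
static uint32_t toeplitz_bitwise(const uint8_t key[KEY_LEN],
				 const uint8_t *data, size_t len)
{
	uint32_t result = 0;
	size_t i;
	unsigned int b;

	assert(len <= MAX_INPUT);
	for (i = 0; i < len; i++)
		for (b = 0; b < 8; b++)
			if (data[i] & (0x80u >> b))
				result ^= key_window(key, (unsigned int)i * 8 + b);
	return result;
}

/* Precompute the contribution of every byte value at every position. */
static void toeplitz_table_init(struct toeplitz_table *t,
				const uint8_t key[KEY_LEN])
{
	unsigned int pos, val, b;

	for (pos = 0; pos < MAX_INPUT; pos++) {
		for (val = 0; val < 256; val++) {
			uint32_t r = 0;

			for (b = 0; b < 8; b++)
				if (val & (0x80u >> b))
					r ^= key_window(key, pos * 8 + b);
			t->cache[pos][val] = r;
		}
	}
}

/* Hashing is then one table lookup and one XOR per input byte. */
static uint32_t toeplitz_lookup(const struct toeplitz_table *t,
				const uint8_t *data, size_t len)
{
	uint32_t result = 0;
	size_t i;

	assert(len <= MAX_INPUT);
	for (i = 0; i < len; i++)
		result ^= t->cache[i][data[i]];
	return result;
}
```

With these dimensions the table takes 36 KB (36 * 256 * 4 bytes). An IPv4 4-tuple uses the first 12 positions and an IPv6 4-tuple uses all 36.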

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/linux/netdevice.h |  3 +++
 include/linux/toeplitz.h  | 27 +++++++++++++++++++
 lib/Kconfig               |  3 +++
 lib/Makefile              |  2 ++
 lib/toeplitz.c            | 66 +++++++++++++++++++++++++++++++++++++++++++++++
 net/Kconfig               |  5 ++++
 net/core/dev.c            | 11 ++++++++
 7 files changed, 117 insertions(+)
 create mode 100644 include/linux/toeplitz.h
 create mode 100644 lib/toeplitz.c

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3de49ac..546caf2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -38,6 +38,7 @@
 #include <linux/dmaengine.h>
 #include <linux/workqueue.h>
 #include <linux/dynamic_queue_limits.h>
+#include <linux/toeplitz.h>
 
 #include <linux/ethtool.h>
 #include <net/net_namespace.h>
@@ -195,6 +196,8 @@ struct net_device_stats {
 extern struct static_key rps_needed;
 #endif
 
+extern struct toeplitz *toeplitz_net;
+
 struct neighbour;
 struct neigh_parms;
 struct sk_buff;
diff --git a/include/linux/toeplitz.h b/include/linux/toeplitz.h
new file mode 100644
index 0000000..bc0b8e8
--- /dev/null
+++ b/include/linux/toeplitz.h
@@ -0,0 +1,27 @@
+#ifndef __LINUX_TOEPLITZ_H
+#define __LINUX_TOEPLITZ_H
+
+#define TOEPLITZ_KEY_LEN 40
+
+struct toeplitz {
+	u8 key_vals[TOEPLITZ_KEY_LEN];
+	u32 key_cache[TOEPLITZ_KEY_LEN][256];
+};
+
+static inline unsigned int
+toeplitz_hash(const unsigned char *bytes,
+	      struct toeplitz *toeplitz, int n)
+{
+	int i;
+	unsigned int result = 0;
+
+	for (i = 0; i < n; i++)
+		result ^= toeplitz->key_cache[i][bytes[i]];
+
+        return result;
+}
+
+extern struct toeplitz *toeplitz_alloc(void);
+extern void toeplitz_init(struct toeplitz *toeplitz, u8 *key_vals);
+
+#endif /* __LINUX_TOEPLITZ_H */
diff --git a/lib/Kconfig b/lib/Kconfig
index b3c8be0..463b2b1 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -359,6 +359,9 @@ config CPU_RMAP
 config DQL
 	bool
 
+config TOEPLITZ
+	bool
+
 #
 # Netlink attribute parsing support is select'ed if needed
 #
diff --git a/lib/Makefile b/lib/Makefile
index f3bb2cb..a28349b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -133,6 +133,8 @@ obj-$(CONFIG_CORDIC) += cordic.o
 
 obj-$(CONFIG_DQL) += dynamic_queue_limits.o
 
+obj-$(CONFIG_TOEPLITZ) += toeplitz.o
+
 obj-$(CONFIG_MPILIB) += mpi/
 obj-$(CONFIG_SIGNATURE) += digsig.o
 
diff --git a/lib/toeplitz.c b/lib/toeplitz.c
new file mode 100644
index 0000000..0951dd9
--- /dev/null
+++ b/lib/toeplitz.c
@@ -0,0 +1,66 @@
+/*
+ * Toeplitz hash implementation. See include/linux/toeplitz.h
+ *
+ * Copyright (c) 2011, Tom Herbert <therbert@google.com>
+ */
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/random.h>
+#include <linux/toeplitz.h>
+
+struct toeplitz *toeplitz_alloc(void)
+{
+	return kmalloc(sizeof(struct toeplitz), GFP_KERNEL);
+}
+
+static u32 toeplitz_get_kval(unsigned char *key, int idx)
+{
+	u32 v, r;
+	int off, rem;
+
+	off = idx / 8;
+	rem = idx % 8;
+
+	v = (((unsigned int)key[off]) << 24) +
+	    (((unsigned int)key[off + 1]) << 16) +
+	    (((unsigned int)key[off + 2]) << 8) +
+	    (((unsigned int)key[off + 3]));
+
+	r = v << rem | (unsigned int)key[off + 4] >> (8 - rem);
+	return r;
+}
+
+static inline int idx8(int idx)
+{
+#ifdef __LITTLE_ENDIAN
+        idx = (idx / 8) * 8 + (8 - (idx % 8 + 1));
+#endif
+        return idx;
+}
+
+void toeplitz_init(struct toeplitz *toeplitz, u8 *key_vals)
+{
+	int i;
+	unsigned long a, j;
+	unsigned int result = 0;
+
+	/* Set up key val table */
+	if (key_vals)
+		for (i = 0; i < TOEPLITZ_KEY_LEN; i++)
+			toeplitz->key_vals[i] = key_vals[i];
+	else
+		prandom_bytes(toeplitz->key_vals, TOEPLITZ_KEY_LEN);
+
+	/* Set up key cache table */
+	for (i = 0; i < TOEPLITZ_KEY_LEN; i++) {
+		for (j = 0; j < 256; j++) {
+			result = 0;
+			for (a = find_first_bit(&j, 8); a < 8;
+			    a = find_next_bit(&j, 8, a + 1))
+				result ^= toeplitz_get_kval(
+				   toeplitz->key_vals, idx8(a + (i * 8)));
+			toeplitz->key_cache[i][j] = result;
+		}
+	}
+}
diff --git a/net/Kconfig b/net/Kconfig
index b50dacc..860c9fa 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -254,6 +254,11 @@ config BQL
 	select DQL
 	default y
 
+config NET_TOEPLITZ
+	boolean
+	select TOEPLITZ
+	default n
+
 config BPF_JIT
 	bool "enable BPF Just In Time compiler"
 	depends on HAVE_BPF_JIT
diff --git a/net/core/dev.c b/net/core/dev.c
index 5c713f2..074f530 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6633,6 +6633,9 @@ static struct pernet_operations __net_initdata default_device_ops = {
 	.exit_batch = default_device_exit_batch,
 };
 
+struct toeplitz *toeplitz_net;
+EXPORT_SYMBOL(toeplitz_net);
+
 /*
  *	Initialize the DEV module. At boot time this walks the device list and
  *	unhooks any devices that fail to initialise (normally hardware not
@@ -6656,6 +6659,14 @@ static int __init net_dev_init(void)
 	if (netdev_kobject_init())
 		goto out;
 
+#ifdef CONFIG_NET_TOEPLITZ
+	toeplitz_net = toeplitz_alloc();
+	if (!toeplitz_net)
+		goto out;
+
+	toeplitz_init(toeplitz_net, NULL);
+#endif
+
 	INIT_LIST_HEAD(&ptype_all);
 	for (i = 0; i < PTYPE_HASH_SIZE; i++)
 		INIT_LIST_HEAD(&ptype_base[i]);
-- 
1.8.4


* (unknown)
From: Tom Herbert @ 2013-09-23 22:41 UTC (permalink / raw)
  To: davem; +Cc: netdev, jesse.brandeburg

From cf54b0651b7ea35fab4c398f1732e800550732ef Mon Sep 17 00:00:00 2001
From: Tom Herbert <therbert@google.com>
Date: Mon, 23 Sep 2013 12:27:17 -0700
Subject: [PATCH 2/2] net: Use Toeplitz for IPv4 and IPv6 connection hashing

Add a config option to specify which hash to use for IPv4 and IPv6
established connection hashing. The alternative option is the original
jhash method (this patch makes Toeplitz the default).

Toeplitz is a little more heavyweight than the jhash method.  For IPv4
the difference seems to be negligible; for IPv6 there is some
performance regression, due mostly to the fact that Toeplitz hashes
over all the bits in the IPv6 address whereas jhash doesn't (this
implies that Toeplitz might be more secure).

Some performance numbers using 200 netperf TCP_RR clients:

Toeplitz
  IPv4
    58.72% CPU utilization
    110/146/198 90/95/99% latencies
    1.72549e+06 tps
  IPv6
    72.38% CPU utilization
    117/168/255 90/95/99% latencies
    1.58545e+06 tps

Jhash
  IPv4
    57.67% CPU utilization
    111/146/196 90/95/99% latencies
    1.71574e+06 tps
  IPv6
    71.84% CPU utilization
    117/166/248 90/95/99% latencies
    1.59359e+06 tps

Standalone performance measurement:

Toeplitz
  IPv4
    40 nsecs/hash
  IPv6
    105 nsecs/hash
Jhash
  IPv4
    39 nsecs/hash
  IPv6
    77 nsecs/hash

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/net/inet6_hashtables.h | 16 ++++++++++++++++
 include/net/inet_sock.h        | 16 ++++++++++++++++
 net/ipv4/Kconfig               | 14 ++++++++++++++
 3 files changed, 46 insertions(+)

diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index f52fa88..492a45b 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -32,12 +32,28 @@ static inline unsigned int inet6_ehashfn(struct net *net,
 				const struct in6_addr *laddr, const u16 lport,
 				const struct in6_addr *faddr, const __be16 fport)
 {
+#if IS_ENABLED(CONFIG_IP_HASH_TOEPLITZ)
+	struct {
+		struct in6_addr saddr;
+		struct in6_addr daddr;
+		u16 sport;
+		u16 dport;
+	} input;
+
+        input.daddr = *laddr;
+        input.saddr = *faddr;
+        input.sport = htons(lport);
+        input.dport = fport;
+
+        return toeplitz_hash((u8 *)&input, toeplitz_net, sizeof(input));
+#else
 	u32 ports = (((u32)lport) << 16) | (__force u32)fport;
 
 	return jhash_3words((__force u32)laddr->s6_addr32[3],
 			    ipv6_addr_jhash(faddr),
 			    ports,
 			    inet_ehash_secret + net_hash_mix(net));
+#endif
 }
 
 static inline int inet6_sk_ehashfn(const struct sock *sk)
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 636d203..02e2ee2 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -209,10 +209,26 @@ static inline unsigned int inet_ehashfn(struct net *net,
 					const __be32 laddr, const __u16 lport,
 					const __be32 faddr, const __be16 fport)
 {
+#if IS_ENABLED(CONFIG_IP_HASH_TOEPLITZ)
+	struct {
+		u32 saddr;
+		u32 daddr;
+		u16 sport;
+		u16 dport;
+	} input;
+
+	input.saddr = faddr;
+	input.daddr = laddr;
+	input.sport = fport;
+	input.dport = htons(lport);
+
+	return toeplitz_hash((u8 *)&input, toeplitz_net, sizeof(input));
+#else
 	return jhash_3words((__force __u32) laddr,
 			    (__force __u32) faddr,
 			    ((__u32) lport) << 16 | (__force __u32)fport,
 			    inet_ehash_secret + net_hash_mix(net));
+#endif
 }
 
 static inline int inet_sk_ehashfn(const struct sock *sk)
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 05c57f0..c9a533f 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -104,6 +104,20 @@ config IP_ROUTE_VERBOSE
 config IP_ROUTE_CLASSID
 	bool
 
+choice
+	prompt "IP: connection hashing algorithm"
+	default IP_HASH_TOEPLITZ
+	help
+	  Select the default hashing algorithm for IP connections
+
+	config IP_HASH_JHASH
+		bool "Jhash"
+
+	config IP_HASH_TOEPLITZ
+		bool "Toeplitz"
+		select NET_TOEPLITZ
+endchoice
+
 config IP_PNP
 	bool "IP: kernel level autoconfiguration"
 	help
-- 
1.8.4


* [PATCH 0/2] Toeplitz hashing
From: Tom Herbert @ 2013-09-23 22:41 UTC (permalink / raw)
  To: davem; +Cc: netdev, jesse.brandeburg

These patches introduce a software implementation of Toeplitz hashing.
The first use case is in inet_ehashfn and inet6_ehashfn, using Toeplitz
for just IPv4 and IPv6 TCP connection lookup.

In some follow-up patches we can move to using the HW hash itself to
do connection lookup (possibly eliminating SW hash computation on RX).
These will:
  1) Convert skb_get_rxhash (used for RPS steering) to use Toeplitz.
  2) Allow TCP connection lookup to use rxhash if the hash is
     marked as being a proper Toeplitz hash.
  3) Allow drivers to mark a hash value as being a proper Toeplitz
     hash of a TCP packet.

There's also the possibility of calculating the Toeplitz hash for a
connection in order to deduce which RSS queue the connection maps
to, and thus allow a poor man's aRFS (idea from Jesse Brandeburg).
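
As a rough illustration of the queue deduction idea above: RSS devices typically use the low-order bits of the hash to index an indirection table whose entries name RX queues. The sketch below is hypothetical and not part of these patches. The table size of 128, the round-robin fill and the names rss_indir, rss_indir_fill and rss_queue_for_hash are all assumptions, and the result only matches the hardware if software uses the same key, input layout and table as the NIC.

```c
#include <assert.h>
#include <stdint.h>

#define RSS_INDIR_SIZE 128	/* assumed size; real devices vary */

struct rss_indir {
	uint8_t queue[RSS_INDIR_SIZE];	/* RX queue number for each bucket */
};

/* Spread nqueues RX queues round-robin over the table. */
static void rss_indir_fill(struct rss_indir *t, unsigned int nqueues)
{
	unsigned int i;

	assert(nqueues > 0 && nqueues <= 256);
	for (i = 0; i < RSS_INDIR_SIZE; i++)
		t->queue[i] = (uint8_t)(i % nqueues);
}

/* Map a Toeplitz hash value to the RX queue the device would pick. */
static unsigned int rss_queue_for_hash(const struct rss_indir *t, uint32_t hash)
{
	/* only the low-order bits of the hash select the bucket */
	return t->queue[hash % RSS_INDIR_SIZE];
}
```

Given the software Toeplitz hash of a connection's 4-tuple, rss_queue_for_hash() would then predict which RX queue, and hence which CPU, sees that connection's packets.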


* Re: BUG: 32 bit net stats
From: Jamal Hadi Salim @ 2013-09-23 22:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Eric Dumazet
In-Reply-To: <1379973869.3165.27.camel@edumazet-glaptop>

On 13-09-23 06:04 PM, Eric Dumazet wrote:

>
> Sorry, this doesn't make sense to me.
>
> Existing code seems to be fine.
>
> Which driver do you use ?

Sorry - let me take a closer look and get back
to you. I think the bug may be in user space.

cheers,
jamal


* [PATCH 09/10] emulex: Remove extern from function prototypes
From: Joe Perches @ 2013-09-23 22:11 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, linux-kernel, Sathya Perla, Subbu Seetharaman,
	Ajit Khaparde
In-Reply-To: <5570169a078375fa8662adeb2a7f24c1ae718bfb.1379974101.git.joe@perches.com>

There is a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
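
A small standalone illustration of the point above, using made-up names (demo_sum, demo_calls, demo_selfcheck) that have nothing to do with the benet driver: the two function declarations are equivalent, whereas for an object the keyword does change the meaning.

```c
#include <assert.h>

/* Equivalent: a function declaration has external linkage by default,
 * with or without the extern keyword.
 */
extern int demo_sum(int a, int b);
int demo_sum(int a, int b);

/* For objects the keyword matters: this line only declares demo_calls ... */
extern int demo_calls;
/* ... while this one defines it. */
int demo_calls;

void demo_selfcheck(void);

int demo_sum(int a, int b)
{
	demo_calls++;
	return a + b;
}

/* Both prototypes above refer to the single definition of demo_sum. */
void demo_selfcheck(void)
{
	int before = demo_calls;

	assert(demo_sum(2, 3) == 5);
	assert(demo_calls == before + 1);
}
```

This should compile without complaint about the repeated prototypes, since they declare the same function with the same linkage.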

Signed-off-by: Joe Perches <joe@perches.com>
---
 drivers/net/ethernet/emulex/benet/be.h      |  24 +--
 drivers/net/ethernet/emulex/benet/be_cmds.h | 238 +++++++++++++---------------
 2 files changed, 123 insertions(+), 139 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index ace5050..4a0d3b7 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -694,27 +694,27 @@ static inline int qnq_async_evt_rcvd(struct be_adapter *adapter)
 	return adapter->flags & BE_FLAGS_QNQ_ASYNC_EVT_RCVD;
 }
 
-extern void be_cq_notify(struct be_adapter *adapter, u16 qid, bool arm,
-		u16 num_popped);
-extern void be_link_status_update(struct be_adapter *adapter, u8 link_status);
-extern void be_parse_stats(struct be_adapter *adapter);
-extern int be_load_fw(struct be_adapter *adapter, u8 *func);
-extern bool be_is_wol_supported(struct be_adapter *adapter);
-extern bool be_pause_supported(struct be_adapter *adapter);
-extern u32 be_get_fw_log_level(struct be_adapter *adapter);
+void be_cq_notify(struct be_adapter *adapter, u16 qid, bool arm,
+		  u16 num_popped);
+void be_link_status_update(struct be_adapter *adapter, u8 link_status);
+void be_parse_stats(struct be_adapter *adapter);
+int be_load_fw(struct be_adapter *adapter, u8 *func);
+bool be_is_wol_supported(struct be_adapter *adapter);
+bool be_pause_supported(struct be_adapter *adapter);
+u32 be_get_fw_log_level(struct be_adapter *adapter);
 int be_update_queues(struct be_adapter *adapter);
 int be_poll(struct napi_struct *napi, int budget);
 
 /*
  * internal function to initialize-cleanup roce device.
  */
-extern void be_roce_dev_add(struct be_adapter *);
-extern void be_roce_dev_remove(struct be_adapter *);
+void be_roce_dev_add(struct be_adapter *);
+void be_roce_dev_remove(struct be_adapter *);
 
 /*
  * internal function to open-close roce device during ifup-ifdown.
  */
-extern void be_roce_dev_open(struct be_adapter *);
-extern void be_roce_dev_close(struct be_adapter *);
+void be_roce_dev_open(struct be_adapter *);
+void be_roce_dev_close(struct be_adapter *);
 
 #endif				/* BE_H */
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.h b/drivers/net/ethernet/emulex/benet/be_cmds.h
index d026226..84f8c52 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.h
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.h
@@ -1863,137 +1863,121 @@ struct be_cmd_resp_get_iface_list {
 	struct be_if_desc if_desc;
 };
 
-extern int be_pci_fnum_get(struct be_adapter *adapter);
-extern int be_fw_wait_ready(struct be_adapter *adapter);
-extern int be_cmd_mac_addr_query(struct be_adapter *adapter, u8 *mac_addr,
-				 bool permanent, u32 if_handle, u32 pmac_id);
-extern int be_cmd_pmac_add(struct be_adapter *adapter, u8 *mac_addr,
-			u32 if_id, u32 *pmac_id, u32 domain);
-extern int be_cmd_pmac_del(struct be_adapter *adapter, u32 if_id,
-			int pmac_id, u32 domain);
-extern int be_cmd_if_create(struct be_adapter *adapter, u32 cap_flags,
-			    u32 en_flags, u32 *if_handle, u32 domain);
-extern int be_cmd_if_destroy(struct be_adapter *adapter, int if_handle,
-			u32 domain);
-extern int be_cmd_eq_create(struct be_adapter *adapter, struct be_eq_obj *eqo);
-extern int be_cmd_cq_create(struct be_adapter *adapter,
-			struct be_queue_info *cq, struct be_queue_info *eq,
-			bool no_delay, int num_cqe_dma_coalesce);
-extern int be_cmd_mccq_create(struct be_adapter *adapter,
-			struct be_queue_info *mccq,
-			struct be_queue_info *cq);
-extern int be_cmd_txq_create(struct be_adapter *adapter,
-			struct be_tx_obj *txo);
-extern int be_cmd_rxq_create(struct be_adapter *adapter,
-			struct be_queue_info *rxq, u16 cq_id,
-			u16 frag_size, u32 if_id, u32 rss, u8 *rss_id);
-extern int be_cmd_q_destroy(struct be_adapter *adapter, struct be_queue_info *q,
-			int type);
-extern int be_cmd_rxq_destroy(struct be_adapter *adapter,
-			struct be_queue_info *q);
-extern int be_cmd_link_status_query(struct be_adapter *adapter, u16 *link_speed,
-				    u8 *link_status, u32 dom);
-extern int be_cmd_reset(struct be_adapter *adapter);
-extern int be_cmd_get_stats(struct be_adapter *adapter,
-			struct be_dma_mem *nonemb_cmd);
-extern int lancer_cmd_get_pport_stats(struct be_adapter *adapter,
-			struct be_dma_mem *nonemb_cmd);
-extern int be_cmd_get_fw_ver(struct be_adapter *adapter, char *fw_ver,
-		char *fw_on_flash);
-
-extern int be_cmd_modify_eqd(struct be_adapter *adapter, u32 eq_id, u32 eqd);
-extern int be_cmd_vlan_config(struct be_adapter *adapter, u32 if_id,
-			u16 *vtag_array, u32 num, bool untagged,
-			bool promiscuous);
-extern int be_cmd_rx_filter(struct be_adapter *adapter, u32 flags, u32 status);
-extern int be_cmd_set_flow_control(struct be_adapter *adapter,
-			u32 tx_fc, u32 rx_fc);
-extern int be_cmd_get_flow_control(struct be_adapter *adapter,
-			u32 *tx_fc, u32 *rx_fc);
-extern int be_cmd_query_fw_cfg(struct be_adapter *adapter, u32 *port_num,
+int be_pci_fnum_get(struct be_adapter *adapter);
+int be_fw_wait_ready(struct be_adapter *adapter);
+int be_cmd_mac_addr_query(struct be_adapter *adapter, u8 *mac_addr,
+			  bool permanent, u32 if_handle, u32 pmac_id);
+int be_cmd_pmac_add(struct be_adapter *adapter, u8 *mac_addr, u32 if_id,
+		    u32 *pmac_id, u32 domain);
+int be_cmd_pmac_del(struct be_adapter *adapter, u32 if_id, int pmac_id,
+		    u32 domain);
+int be_cmd_if_create(struct be_adapter *adapter, u32 cap_flags, u32 en_flags,
+		     u32 *if_handle, u32 domain);
+int be_cmd_if_destroy(struct be_adapter *adapter, int if_handle, u32 domain);
+int be_cmd_eq_create(struct be_adapter *adapter, struct be_eq_obj *eqo);
+int be_cmd_cq_create(struct be_adapter *adapter, struct be_queue_info *cq,
+		     struct be_queue_info *eq, bool no_delay,
+		     int num_cqe_dma_coalesce);
+int be_cmd_mccq_create(struct be_adapter *adapter, struct be_queue_info *mccq,
+		       struct be_queue_info *cq);
+int be_cmd_txq_create(struct be_adapter *adapter, struct be_tx_obj *txo);
+int be_cmd_rxq_create(struct be_adapter *adapter, struct be_queue_info *rxq,
+		      u16 cq_id, u16 frag_size, u32 if_id, u32 rss, u8 *rss_id);
+int be_cmd_q_destroy(struct be_adapter *adapter, struct be_queue_info *q,
+		     int type);
+int be_cmd_rxq_destroy(struct be_adapter *adapter, struct be_queue_info *q);
+int be_cmd_link_status_query(struct be_adapter *adapter, u16 *link_speed,
+			     u8 *link_status, u32 dom);
+int be_cmd_reset(struct be_adapter *adapter);
+int be_cmd_get_stats(struct be_adapter *adapter, struct be_dma_mem *nonemb_cmd);
+int lancer_cmd_get_pport_stats(struct be_adapter *adapter,
+			       struct be_dma_mem *nonemb_cmd);
+int be_cmd_get_fw_ver(struct be_adapter *adapter, char *fw_ver,
+		      char *fw_on_flash);
+
+int be_cmd_modify_eqd(struct be_adapter *adapter, u32 eq_id, u32 eqd);
+int be_cmd_vlan_config(struct be_adapter *adapter, u32 if_id, u16 *vtag_array,
+		       u32 num, bool untagged, bool promiscuous);
+int be_cmd_rx_filter(struct be_adapter *adapter, u32 flags, u32 status);
+int be_cmd_set_flow_control(struct be_adapter *adapter, u32 tx_fc, u32 rx_fc);
+int be_cmd_get_flow_control(struct be_adapter *adapter, u32 *tx_fc, u32 *rx_fc);
+int be_cmd_query_fw_cfg(struct be_adapter *adapter, u32 *port_num,
 			u32 *function_mode, u32 *function_caps, u16 *asic_rev);
-extern int be_cmd_reset_function(struct be_adapter *adapter);
-extern int be_cmd_rss_config(struct be_adapter *adapter, u8 *rsstable,
-			     u32 rss_hash_opts, u16 table_size);
-extern int be_process_mcc(struct be_adapter *adapter);
-extern int be_cmd_set_beacon_state(struct be_adapter *adapter,
-			u8 port_num, u8 beacon, u8 status, u8 state);
-extern int be_cmd_get_beacon_state(struct be_adapter *adapter,
-			u8 port_num, u32 *state);
-extern int be_cmd_write_flashrom(struct be_adapter *adapter,
-			struct be_dma_mem *cmd, u32 flash_oper,
-			u32 flash_opcode, u32 buf_size);
-extern int lancer_cmd_write_object(struct be_adapter *adapter,
-				   struct be_dma_mem *cmd,
-				   u32 data_size, u32 data_offset,
-				   const char *obj_name,
-				   u32 *data_written, u8 *change_status,
-				   u8 *addn_status);
+int be_cmd_reset_function(struct be_adapter *adapter);
+int be_cmd_rss_config(struct be_adapter *adapter, u8 *rsstable,
+		      u32 rss_hash_opts, u16 table_size);
+int be_process_mcc(struct be_adapter *adapter);
+int be_cmd_set_beacon_state(struct be_adapter *adapter, u8 port_num, u8 beacon,
+			    u8 status, u8 state);
+int be_cmd_get_beacon_state(struct be_adapter *adapter, u8 port_num,
+			    u32 *state);
+int be_cmd_write_flashrom(struct be_adapter *adapter, struct be_dma_mem *cmd,
+			  u32 flash_oper, u32 flash_opcode, u32 buf_size);
+int lancer_cmd_write_object(struct be_adapter *adapter, struct be_dma_mem *cmd,
+			    u32 data_size, u32 data_offset,
+			    const char *obj_name, u32 *data_written,
+			    u8 *change_status, u8 *addn_status);
 int lancer_cmd_read_object(struct be_adapter *adapter, struct be_dma_mem *cmd,
-		u32 data_size, u32 data_offset, const char *obj_name,
-		u32 *data_read, u32 *eof, u8 *addn_status);
+			   u32 data_size, u32 data_offset, const char *obj_name,
+			   u32 *data_read, u32 *eof, u8 *addn_status);
 int be_cmd_get_flash_crc(struct be_adapter *adapter, u8 *flashed_crc,
-				int offset);
-extern int be_cmd_enable_magic_wol(struct be_adapter *adapter, u8 *mac,
-				struct be_dma_mem *nonemb_cmd);
-extern int be_cmd_fw_init(struct be_adapter *adapter);
-extern int be_cmd_fw_clean(struct be_adapter *adapter);
-extern void be_async_mcc_enable(struct be_adapter *adapter);
-extern void be_async_mcc_disable(struct be_adapter *adapter);
-extern int be_cmd_loopback_test(struct be_adapter *adapter, u32 port_num,
-				u32 loopback_type, u32 pkt_size,
-				u32 num_pkts, u64 pattern);
-extern int be_cmd_ddr_dma_test(struct be_adapter *adapter, u64 pattern,
-			u32 byte_cnt, struct be_dma_mem *cmd);
-extern int be_cmd_get_seeprom_data(struct be_adapter *adapter,
-				struct be_dma_mem *nonemb_cmd);
-extern int be_cmd_set_loopback(struct be_adapter *adapter, u8 port_num,
-				u8 loopback_type, u8 enable);
-extern int be_cmd_get_phy_info(struct be_adapter *adapter);
-extern int be_cmd_set_qos(struct be_adapter *adapter, u32 bps, u32 domain);
-extern void be_detect_error(struct be_adapter *adapter);
-extern int be_cmd_get_die_temperature(struct be_adapter *adapter);
-extern int be_cmd_get_cntl_attributes(struct be_adapter *adapter);
-extern int be_cmd_req_native_mode(struct be_adapter *adapter);
-extern int be_cmd_get_reg_len(struct be_adapter *adapter, u32 *log_size);
-extern void be_cmd_get_regs(struct be_adapter *adapter, u32 buf_len, void *buf);
-extern int be_cmd_get_fn_privileges(struct be_adapter *adapter,
-				    u32 *privilege, u32 domain);
-extern int be_cmd_set_fn_privileges(struct be_adapter *adapter,
-				    u32 privileges, u32 vf_num);
-extern int be_cmd_get_mac_from_list(struct be_adapter *adapter, u8 *mac,
-				    bool *pmac_id_active, u32 *pmac_id,
-				    u8 domain);
-extern int be_cmd_get_active_mac(struct be_adapter *adapter, u32 pmac_id,
-				 u8 *mac);
-extern int be_cmd_get_perm_mac(struct be_adapter *adapter, u8 *mac);
-extern int be_cmd_set_mac_list(struct be_adapter *adapter, u8 *mac_array,
-						u8 mac_count, u32 domain);
-extern int be_cmd_set_mac(struct be_adapter *adapter, u8 *mac, int if_id,
-			  u32 dom);
-extern int be_cmd_set_hsw_config(struct be_adapter *adapter, u16 pvid,
-				 u32 domain, u16 intf_id, u16 hsw_mode);
-extern int be_cmd_get_hsw_config(struct be_adapter *adapter, u16 *pvid,
-				 u32 domain, u16 intf_id, u8 *mode);
-extern int be_cmd_get_acpi_wol_cap(struct be_adapter *adapter);
-extern int be_cmd_get_ext_fat_capabilites(struct be_adapter *adapter,
-					  struct be_dma_mem *cmd);
-extern int be_cmd_set_ext_fat_capabilites(struct be_adapter *adapter,
-					  struct be_dma_mem *cmd,
-					  struct be_fat_conf_params *cfgs);
-extern int lancer_wait_ready(struct be_adapter *adapter);
-extern int lancer_physdev_ctrl(struct be_adapter *adapter, u32 mask);
-extern int lancer_initiate_dump(struct be_adapter *adapter);
-extern bool dump_present(struct be_adapter *adapter);
-extern int lancer_test_and_set_rdy_state(struct be_adapter *adapter);
-extern int be_cmd_query_port_name(struct be_adapter *adapter, u8 *port_name);
+			 int offset);
+int be_cmd_enable_magic_wol(struct be_adapter *adapter, u8 *mac,
+			    struct be_dma_mem *nonemb_cmd);
+int be_cmd_fw_init(struct be_adapter *adapter);
+int be_cmd_fw_clean(struct be_adapter *adapter);
+void be_async_mcc_enable(struct be_adapter *adapter);
+void be_async_mcc_disable(struct be_adapter *adapter);
+int be_cmd_loopback_test(struct be_adapter *adapter, u32 port_num,
+			 u32 loopback_type, u32 pkt_size, u32 num_pkts,
+			 u64 pattern);
+int be_cmd_ddr_dma_test(struct be_adapter *adapter, u64 pattern, u32 byte_cnt,
+			struct be_dma_mem *cmd);
+int be_cmd_get_seeprom_data(struct be_adapter *adapter,
+			    struct be_dma_mem *nonemb_cmd);
+int be_cmd_set_loopback(struct be_adapter *adapter, u8 port_num,
+			u8 loopback_type, u8 enable);
+int be_cmd_get_phy_info(struct be_adapter *adapter);
+int be_cmd_set_qos(struct be_adapter *adapter, u32 bps, u32 domain);
+void be_detect_error(struct be_adapter *adapter);
+int be_cmd_get_die_temperature(struct be_adapter *adapter);
+int be_cmd_get_cntl_attributes(struct be_adapter *adapter);
+int be_cmd_req_native_mode(struct be_adapter *adapter);
+int be_cmd_get_reg_len(struct be_adapter *adapter, u32 *log_size);
+void be_cmd_get_regs(struct be_adapter *adapter, u32 buf_len, void *buf);
+int be_cmd_get_fn_privileges(struct be_adapter *adapter, u32 *privilege,
+			     u32 domain);
+int be_cmd_set_fn_privileges(struct be_adapter *adapter, u32 privileges,
+			     u32 vf_num);
+int be_cmd_get_mac_from_list(struct be_adapter *adapter, u8 *mac,
+			     bool *pmac_id_active, u32 *pmac_id, u8 domain);
+int be_cmd_get_active_mac(struct be_adapter *adapter, u32 pmac_id, u8 *mac);
+int be_cmd_get_perm_mac(struct be_adapter *adapter, u8 *mac);
+int be_cmd_set_mac_list(struct be_adapter *adapter, u8 *mac_array, u8 mac_count,
+			u32 domain);
+int be_cmd_set_mac(struct be_adapter *adapter, u8 *mac, int if_id, u32 dom);
+int be_cmd_set_hsw_config(struct be_adapter *adapter, u16 pvid, u32 domain,
+			  u16 intf_id, u16 hsw_mode);
+int be_cmd_get_hsw_config(struct be_adapter *adapter, u16 *pvid, u32 domain,
+			  u16 intf_id, u8 *mode);
+int be_cmd_get_acpi_wol_cap(struct be_adapter *adapter);
+int be_cmd_get_ext_fat_capabilites(struct be_adapter *adapter,
+				   struct be_dma_mem *cmd);
+int be_cmd_set_ext_fat_capabilites(struct be_adapter *adapter,
+				   struct be_dma_mem *cmd,
+				   struct be_fat_conf_params *cfgs);
+int lancer_wait_ready(struct be_adapter *adapter);
+int lancer_physdev_ctrl(struct be_adapter *adapter, u32 mask);
+int lancer_initiate_dump(struct be_adapter *adapter);
+bool dump_present(struct be_adapter *adapter);
+int lancer_test_and_set_rdy_state(struct be_adapter *adapter);
+int be_cmd_query_port_name(struct be_adapter *adapter, u8 *port_name);
 int be_cmd_get_func_config(struct be_adapter *adapter,
 			   struct be_resources *res);
 int be_cmd_get_profile_config(struct be_adapter *adapter,
 			      struct be_resources *res, u8 domain);
-extern int be_cmd_set_profile_config(struct be_adapter *adapter, u32 bps,
-				     u8 domain);
-extern int be_cmd_get_if_id(struct be_adapter *adapter,
-			    struct be_vf_cfg *vf_cfg, int vf_num);
-extern int be_cmd_enable_vf(struct be_adapter *adapter, u8 domain);
-extern int be_cmd_intr_set(struct be_adapter *adapter, bool intr_enable);
+int be_cmd_set_profile_config(struct be_adapter *adapter, u32 bps, u8 domain);
+int be_cmd_get_if_id(struct be_adapter *adapter, struct be_vf_cfg *vf_cfg,
+		     int vf_num);
+int be_cmd_enable_vf(struct be_adapter *adapter, u8 domain);
+int be_cmd_intr_set(struct be_adapter *adapter, bool intr_enable);
-- 
1.8.1.2.459.gbcd45b4.dirty
