Netdev List
 help / color / mirror / Atom feed
* The check of upper MTU limit when changing it in ip6 gre tunnel seems incorrect.
From: Oussama Ghorbel @ 2013-10-02 16:44 UTC (permalink / raw)
  To: netdev

The check of upper MTU limit when changing it in ip6 gre tunnel seems incorrect.
The function in question is:

static int ip6gre_tunnel_change_mtu(struct net_device *dev, int new_mtu)
{
    struct ip6_tnl *tunnel = netdev_priv(dev);

    if (new_mtu < 68 ||
        new_mtu > 0xFFF8 - dev->hard_header_len - tunnel->hlen)
        return -EINVAL;
    dev->mtu = new_mtu;
    return 0;
}

However the dev->hard_header_len and tunnel->hlen are initialized in
the following way in ip6gre_tnl_link_config():

int addend = sizeof(struct ipv6hdr) + 4;
...
dev->hard_header_len = rt->dst.dev->hard_header_len + addend;
...
t->hlen = addend; // t is ip6_tnl pointer

As you see the information t->hlen is already included in
dev->hard_header_len, so why calculate it twice?

Thanks

^ permalink raw reply

* Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO]
From: Hannes Frederic Sowa @ 2013-10-02 16:27 UTC (permalink / raw)
  To: jdmason
  Cc: Jiri Pirko, netdev, yoshfuji, davem, kuznet, jmorris, kaber,
	herbert, eric.dumazet
In-Reply-To: <1380726867.19002.104.camel@edumazet-glaptop.roam.corp.google.com>

Hi!

I have a question regarding UFO and the neterion driver, which as the only one
advertises hardware UFO support:

The patch discusses in this thread
http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
some semantics how packets are constructed before submitted to the driver.

We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
payload is attached in the skb's frags. With the changes discussed in this
thread it is possible that we also append to skb->data some amount of data
which is not targeted for the header. From reading the driver sources it seems
the hardware interprets the skb->data to skb_headlen as the header, so we
could include some data in the fragments more than once.

Do you think this change is safe? Otherwise I would suggest that the UFO
capability is switched off until the driver signals the hardware the start and
end of the headers correctly?

I left the mail below intact which points to the specific place in s2io.c
where I think the problem is.

On Wed, Oct 02, 2013 at 08:14:27AM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-02 at 15:03 +0200, Hannes Frederic Sowa wrote:
> > On Wed, Oct 02, 2013 at 02:12:07PM +0200, Hannes Frederic Sowa wrote:
> > > Hi Eric!
> > > 
> > > On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
> > > > On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> > > > > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> > > > > >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> > > > > >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> > > > 
> > > > > 
> > > > > This seems correct to me. sk_is_gso would work as well is you apply my
> > > > > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> > > > > well" which does the setting of gso_size.
> > > > 
> > > > Well, skb having frags or not should not be a concern :
> > > > Thats an allocation choice (lets say to avoid high order allocations). 
> > > > 
> > > > Setting gso_size is probably better.
> > > 
> > > e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
> > > approach") states:
> > > 
> > > "
> > > skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
> > > contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
> > > indicating that hardware has to do checksum calculation. Hardware should
> > > compute the UDP checksum of complete datagram and also ip header checksum of
> > > each fragmented IP packet.
> > > "
> > > 
> > > This is the reason why I tried not to update the gso_size. If it is ok, I am
> > > fine with that.
> > 
> > Especially, drivers/net/ethernet/neterion/s2io.c states that the first dma
> > mapping (skb->data with skb_headlen, which is fine) is used as the inband
> > header:
> > 
> >         if (offload_type == SKB_GSO_UDP)
> >                 frg_cnt++; /* as Txd0 was used for inband header */
> > 
> > That is my only other hint that we maybe should not update gso_size and
> > gso_type. I guess software fallback does not have this problem, but I won't
> > have time to check until this evening.
> > 
> > I am really not sure if just setting gso_size does not break neterion UFO
> > offloading. :/
> 
> Well, just ask Jon Mason to double check ;)
> 
> I think the commit intent was to set gso_size :
> 
>    skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP
>     fragment going out of the adapter after IP fragmentation by hardware.
> 
> The fact that it states "skb->data will contain MAC/IP/UDP header and
> skb_shinfo(skb)->frags[] contains the data payload." seems irrelevant.
> 
> If Neterion driver mandates that skb->head *only* contains the
> MAC/IP/UDP header, that should be handled in the driver itself.

Thanks Eric for clearing this up.

I really thought it would be the common pattern for UFO to have only headers
in skb->data, so I didn't bother to ask in the first place.

Thanks,

  Hannes

^ permalink raw reply

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
From: Eric Dumazet @ 2013-10-02 15:14 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Jiri Pirko, netdev, yoshfuji, davem, kuznet, jmorris, kaber,
	herbert
In-Reply-To: <20131002130329.GP10771@order.stressinduktion.org>

On Wed, 2013-10-02 at 15:03 +0200, Hannes Frederic Sowa wrote:
> On Wed, Oct 02, 2013 at 02:12:07PM +0200, Hannes Frederic Sowa wrote:
> > Hi Eric!
> > 
> > On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
> > > On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> > > > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> > > > >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> > > > >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> > > 
> > > > 
> > > > This seems correct to me. sk_is_gso would work as well is you apply my
> > > > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> > > > well" which does the setting of gso_size.
> > > 
> > > Well, skb having frags or not should not be a concern :
> > > Thats an allocation choice (lets say to avoid high order allocations). 
> > > 
> > > Setting gso_size is probably better.
> > 
> > e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
> > approach") states:
> > 
> > "
> > skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
> > contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
> > indicating that hardware has to do checksum calculation. Hardware should
> > compute the UDP checksum of complete datagram and also ip header checksum of
> > each fragmented IP packet.
> > "
> > 
> > This is the reason why I tried not to update the gso_size. If it is ok, I am
> > fine with that.
> 
> Especially, drivers/net/ethernet/neterion/s2io.c states that the first dma
> mapping (skb->data with skb_headlen, which is fine) is used as the inband
> header:
> 
>         if (offload_type == SKB_GSO_UDP)
>                 frg_cnt++; /* as Txd0 was used for inband header */
> 
> That is my only other hint that we maybe should not update gso_size and
> gso_type. I guess software fallback does not have this problem, but I won't
> have time to check until this evening.
> 
> I am really not sure if just setting gso_size does not break neterion UFO
> offloading. :/

Well, just ask Jon Mason to double check ;)

I think the commit intent was to set gso_size :

   skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP
    fragment going out of the adapter after IP fragmentation by hardware.

The fact that it states "skb->data will contain MAC/IP/UDP header and
skb_shinfo(skb)->frags[] contains the data payload." seems irrelevant.

If Neterion driver mandates that skb->head *only* contains the
MAC/IP/UDP header, that should be handled in the driver itself.

^ permalink raw reply

* Re: [PATCH RFC] random: introduce get_random_bytes_busy_wait_initialized
From: Theodore Ts'o @ 2013-10-02 15:10 UTC (permalink / raw)
  To: Eric Dumazet, Tom Herbert, davem, netdev, jesse.brandeburg,
	linux-kernel
In-Reply-To: <20130925090034.GC4904@order.stressinduktion.org>

On Wed, Sep 25, 2013 at 11:00:34AM +0200, Hannes Frederic Sowa wrote:
> [PATCH RFC] random: introduce get_random_bytes_busy_wait_initialized
> 
> We want to use good entropy for initializing the secret keys used for
> hashing in the core network stack. So busy wait before extracting random
> data until the nonblocking_pool is initialized.
> 
> Further entropy is also gathered by interrupts, so we are guaranteed to
> make progress here.

One thing that makes me a bit worried is that on certain
architectures, it may take quite a while before we get enough entropy
such that the non-blocking pool gets initialized.

Speaking more generally, there are many different reasons why
different parts of the kernel needs randomness.  I've found a number
of places (mostly in various file systems so far because I know that
subsystem better) because we are trying to use a random number
generator with a higher level of guarantees than what was really
required.

What's not completely clear to me is what's the potential danger if
build_ehash_secret() is initialized with a value that might be known
to an adversary.  I'll note that inet_ehash_secret() is a 32-bit uint.
A 32-bit number isn't all that hard for an adversary to brute force.
If the answer is there's now oracle that can be used so an adversary
can tell whether or not they have correctly figured out the ehash
secret, then it's not that clear that it's worth blocking until the
urandom pool has 128 bits of entropy, when ehash_secret is only a
32-bit value.

Speaking even more generally, any time you have subsystems grabbing
random entropy very shortly after boot, especially if it's less than
64 bits, it's really good idea of the secret gets periodically
rekeyed.  I understand why that may be hard in this case, since it
would require rehashing all of the currently open sockets, and maybe
in this case the security requirements are such that it's not really
necessary.  But it's something we should definitely keep in mind for
situations where we are generating random session keys for CIFS,
ipsec, etc.

Regards,

						- Ted

^ permalink raw reply

* [PATCH net-next] 3com: Fix drivers/net/ethernet/3com/Kconfig references to PCMCIA and 3c515
From: Matthew Whitehead @ 2013-10-02 15:08 UTC (permalink / raw)
  To: netdev; +Cc: Matthew Whitehead

The Vortex driver works with PCI and Cardbus devices, not PCMCIA.

There never was an EISA 3c515 card, only ISA, so remove that option.

Signed-off-by: Matthew Whitehead <tedheadster@gmail.com>
---
 drivers/net/ethernet/3com/Kconfig |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/3com/Kconfig b/drivers/net/ethernet/3com/Kconfig
index f00c763..65b735d 100644
--- a/drivers/net/ethernet/3com/Kconfig
+++ b/drivers/net/ethernet/3com/Kconfig
@@ -35,7 +35,7 @@ config EL3
 
 config 3C515
 	tristate "3c515 ISA \"Fast EtherLink\""
-	depends on (ISA || EISA) && ISA_DMA_API
+	depends on ISA && ISA_DMA_API
 	---help---
 	  If you have a 3Com ISA EtherLink XL "Corkscrew" 3c515 Fast Ethernet
 	  network card, say Y and read the Ethernet-HOWTO, available from
@@ -70,7 +70,7 @@ config VORTEX
 	select MII
 	---help---
 	  This option enables driver support for a large number of 10Mbps and
-	  10/100Mbps EISA, PCI and PCMCIA 3Com network cards:
+	  10/100Mbps EISA, PCI and Cardbus 3Com network cards:
 
 	  "Vortex"    (Fast EtherLink 3c590/3c592/3c595/3c597) EISA and PCI
 	  "Boomerang" (EtherLink XL 3c900 or 3c905)            PCI
-- 
1.7.2.5

^ permalink raw reply related

* RE: [PATCH net-next] tcp: shrink tcp6_timewait_sock by one cache line
From: Eric Dumazet @ 2013-10-02 14:57 UTC (permalink / raw)
  To: David Laight; +Cc: David Miller, netdev
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B736F@saturn3.aculab.com>

On Wed, 2013-10-02 at 13:08 +0100, David Laight wrote:
> > -	tmo = tw->tw_ttd - jiffies;
> > +	tmo = tw->tw_ttd - (u32)jiffies;
> 
> Do you need any of these (u32) casts?
> The compiler will almost certainly use 32bit arithmetic (on 32bit systems at least)
> because the 'as if' rule lets if use the smaller type.

I wanted to clearly show the intent of the code.

Some compilers might warnings here.

> 
> > +		tw->tw_ttd = (u32)(jiffies + (slot << INET_TWDR_RECYCLE_TICK));
> 
> If that (u32) cast is needed in order to avoid 64bit maths, it is in the wrong place.

I want to make sure no compiler will complain for potential losses of
precision.

Note we have :

include/net/tcp.h:691:#define tcp_time_stamp            ((__u32)(jiffies))

But this could change if we want 100us resolution for TCP timestamps at
one point.

In this code, we want HZ units, but truncated to 32bits.

Adding a 

#define jiffies32   ((u32)jiffies)

might do the trick, but lot of pain for such a trivial patch.

^ permalink raw reply

* Re: [RFC PATCH 0/2] net: alternate proposal for using macvlans with forwarding acceleration
From: Neil Horman @ 2013-10-02 13:28 UTC (permalink / raw)
  To: John Fastabend; +Cc: netdev
In-Reply-To: <524BBDC1.6040000@gmail.com>

On Tue, Oct 01, 2013 at 11:31:29PM -0700, John Fastabend wrote:
> On 09/25/2013 01:16 PM, Neil Horman wrote:
> >John, et al. -
> >      As promised, heres my (very rough) first pass at an alternate propsal for
> >what you're trying to do with virtual station interfaces here.  Its completely
> >untested, but it builds, and I'll be trying to run it over the next few days
> >(though I'm sure I got part of the hardware manipulation wrong).  I wanted to
> >post it early though so you could get a look at it to see what you did and
> >didn't like about it.  Some notes:
> 
> Sorry for the delay. I like the idea one nice win here is my macvlan
> kvm setup would use the offloads without having to reconfigure.
> 
Thats ok, I've been swamped, so this has gone slower than I had hoped for me.

> >
> >1) As discussed, the major effort here is to tie in macvlans with l2 forwarding
> >acceleration, rather than creating a new vsi link type.  That should make
> >management easier for admins (be it via ovs or some other mechanism).  It
> >basically exposes a bit less to the user, which I think is good.
> >
> 
> The trick here is the offload path may be functionally different from
> the non-offload path. The user needs some visibility into this. For
> example any qdiscs running on the lowerdev will not be visible from the
> accelerated path.
> 
Actually, given your suggestion in your other note (using dev_queue_xmit), I
think this will be handled, in that we will now pass accelerated traffic through
any lowerdev attached qdiscs.

> When a new link type was being used I was able to convince myself that
> it was a property of the link type. But if we reuse macvlan I think we
> need some way to either automatically disable the offload path when this
> occurs or provide the user visibility. Maybe a feature flag and a
> netif_can_hw_offload() routine is needed?
> 
As noted above, I think we can make macvlans work identially with and without
acceleration by reusing the xmit path, and keying off data (what I've done is
attach a pointer to acceleration data in the skb so the common ndo_start_xmit
path in the driver can differentiate accelerated from non-accelerated skbs).
That will allow everything to function the same way.  I do think though that a
feature flag just to provide some visibility to the admin is warranted.

> >2) I've separated out the l2 forwarding acceleration operations from the
> >net_device_operations structure.  I'm not sure I like that yet, but I'm kind on
> >leaning that way.  Since a limited set of hardare supports forwarding
> >acceleration, it makes for a nice easy way to group functionality without
> >polluting the net_device_operations structure.  It also lets us group simmilar
> >functions together nicely (I can see a future l3_accel_ops structure if we can
> >do l3 flows in hardware).  Anywho, its a divergence from what we've been doing
> >so I thought I would call attention to it.
> >
> >3) I've included a l2_accel_xmit method in the accel_ops structure for fast path
> >forwarding, but I'm not sure I like that.  It seems we should be able to use
> >ndo_start_xmit and key off some data to recognize that we should be doing
> >hardware forwarding.  I'm not quite sure how to do that yet though.  Something
> >to think about.
> 
> Without a specific xmit routine though you will be adding operations in
> the common case for a special case. Having a new op fixes this.
> 
Agreed, and thats something to consider.  I'm not sure if separating the special
case to its own operation is a net win however, given the behavioral differences
that arise (the qdisc issue above).  My next version will use a common xmit
path.  We can look it over then for comparison and weight the pros and cons.

> >
> >4) I've borrowed heavily from your vsi work of course just to get this building.
> >I think theres probbaly alot of consolidation that can be done in the code that
> >I added to ixgbe_main.c to make it smaller.  Again, I just wanted to post this
> >so you could speak up if you though this was all crap before I wen't too far
> >down the rabbit hole.
> 
> There was some consolidation needed in my original RFC as well. I can
> help clean some of this stuff up if you want.
> 
That would be greatly appreciated.  Thanks.  Next version should be available
in the next few days.  I'll get it to you asap.

Best
Neil

> .John
> 
> -- 
> John Fastabend         Intel Corporation
> 

^ permalink raw reply

* Confirm your webmail Identity
From: Webmail Helpdesk Support @ 2013-10-02 13:16 UTC (permalink / raw)




Confirm your webmail Identity
You have reached the quota limit of email. This will not be able to  
send or receive incomming mails,increases mailbox size until a new  
message. Click the below link to update your account.
itservice-upgrade.phpforms.net/f/webadminn
Regards
Webmail HelpDesk Support
Copyright © 2013,Webmail, Technical Support Team, All Rights Reserved

^ permalink raw reply

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
From: Hannes Frederic Sowa @ 2013-10-02 13:03 UTC (permalink / raw)
  To: Eric Dumazet, Jiri Pirko, netdev, yoshfuji, davem, kuznet,
	jmorris, kaber, herbert
In-Reply-To: <20131002121207.GO10771@order.stressinduktion.org>

On Wed, Oct 02, 2013 at 02:12:07PM +0200, Hannes Frederic Sowa wrote:
> Hi Eric!
> 
> On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
> > On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> > > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> > > >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> > > >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> > 
> > > 
> > > This seems correct to me. sk_is_gso would work as well is you apply my
> > > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> > > well" which does the setting of gso_size.
> > 
> > Well, skb having frags or not should not be a concern :
> > Thats an allocation choice (lets say to avoid high order allocations). 
> > 
> > Setting gso_size is probably better.
> 
> e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
> approach") states:
> 
> "
> skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
> contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
> indicating that hardware has to do checksum calculation. Hardware should
> compute the UDP checksum of complete datagram and also ip header checksum of
> each fragmented IP packet.
> "
> 
> This is the reason why I tried not to update the gso_size. If it is ok, I am
> fine with that.

Especially, drivers/net/ethernet/neterion/s2io.c states that the first dma
mapping (skb->data with skb_headlen, which is fine) is used as the inband
header:

        if (offload_type == SKB_GSO_UDP)
                frg_cnt++; /* as Txd0 was used for inband header */

That is my only other hint that we maybe should not update gso_size and
gso_type. I guess software fallback does not have this problem, but I won't
have time to check until this evening.

I am really not sure if just setting gso_size does not break neterion UFO
offloading. :/

Greetings,

  Hannes

^ permalink raw reply

* Re: [PATCH net-next v5 2/3] bonding: modify the old and add new xmit hash policies
From: Veaceslav Falico @ 2013-10-02 12:56 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, davem, andy, fubar, eric.dumazet
In-Reply-To: <1380713966-3891-3-git-send-email-nikolay@redhat.com>

On Wed, Oct 02, 2013 at 01:39:25PM +0200, Nikolay Aleksandrov wrote:
>This patch adds two new hash policy modes which use skb_flow_dissect:
>3 - Encapsulated layer 2+3
>4 - Encapsulated layer 3+4
>There should be a good improvement for tunnel users in those modes.
>It also changes the old hash functions to:
>hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
>hash ^= (hash >> 16);
>hash ^= (hash >> 8);
>
>Where hash will be initialized either to L2 hash, that is
>SRCMAC[5] XOR DSTMAC[5], or to flow->ports which should be extracted
>from the upper layer. Flow's dst and src are also extracted based on the
>xmit policy either directly from the buffer or by using skb_flow_dissect,
>but in both cases if the protocol is IPv6 then dst and src are obtained by
>ipv6_addr_hash() on the real addresses. In case of a non-dissectable
>packet, the algorithms fall back to L2 hashing.
>The bond_set_mode_ops() function is now obsolete and thus deleted
>because it was used only to set the proper hash policy. Also we trim a
>pointer from struct bonding because we no longer need to keep the hash
>function, now there's only a single hash function - bond_xmit_hash that
>works based on bond->params.xmit_policy.
>
>The hash function and skb_flow_dissect were suggested by Eric Dumazet.
>The layer names were suggested by Andy Gospodarek, because I suck at
>semantics.
>
>Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>

Acked-by: Veaceslav Falico <vfalico@redhat.com>

>---
>v2: fix a bug in bond_flow_dissect which might've caused the use of
>    uninitalized flow_keys and make use of skb_flow_get_ports
>v3, v4: no change
>v5: re-base
>One line is intentionally left at 82 chars since it's the whole function
>and IMO looks better that way.
>
> drivers/net/bonding/bond_3ad.c   |   2 +-
> drivers/net/bonding/bond_main.c  | 197 ++++++++++++++-------------------------
> drivers/net/bonding/bond_sysfs.c |   2 -
> drivers/net/bonding/bonding.h    |   3 +-
> include/uapi/linux/if_bonding.h  |   2 +
> 5 files changed, 72 insertions(+), 134 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index c62606a..ea3e64e 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -2403,7 +2403,7 @@ int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev)
> 		goto out;
> 	}
>
>-	slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
>+	slave_agg_no = bond_xmit_hash(bond, skb, slaves_in_agg);
> 	first_ok_slave = NULL;
>
> 	bond_for_each_slave(bond, slave, iter) {
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index fe8a94f..dfb4f6d 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -78,6 +78,7 @@
> #include <net/netns/generic.h>
> #include <net/pkt_sched.h>
> #include <linux/rculist.h>
>+#include <net/flow_keys.h>
> #include "bonding.h"
> #include "bond_3ad.h"
> #include "bond_alb.h"
>@@ -159,7 +160,8 @@ MODULE_PARM_DESC(min_links, "Minimum number of available links before turning on
> module_param(xmit_hash_policy, charp, 0);
> MODULE_PARM_DESC(xmit_hash_policy, "balance-xor and 802.3ad hashing method; "
> 				   "0 for layer 2 (default), 1 for layer 3+4, "
>-				   "2 for layer 2+3");
>+				   "2 for layer 2+3, 3 for encap layer 2+3, "
>+				   "4 for encap layer 3+4");
> module_param(arp_interval, int, 0);
> MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
> module_param_array(arp_ip_target, charp, NULL, 0);
>@@ -217,6 +219,8 @@ const struct bond_parm_tbl xmit_hashtype_tbl[] = {
> {	"layer2",		BOND_XMIT_POLICY_LAYER2},
> {	"layer3+4",		BOND_XMIT_POLICY_LAYER34},
> {	"layer2+3",		BOND_XMIT_POLICY_LAYER23},
>+{	"encap2+3",		BOND_XMIT_POLICY_ENCAP23},
>+{	"encap3+4",		BOND_XMIT_POLICY_ENCAP34},
> {	NULL,			-1},
> };
>
>@@ -3035,99 +3039,85 @@ static struct notifier_block bond_netdev_notifier = {
>
> /*---------------------------- Hashing Policies -----------------------------*/
>
>-/*
>- * Hash for the output device based upon layer 2 data
>- */
>-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
>+/* L2 hash helper */
>+static inline u32 bond_eth_hash(struct sk_buff *skb)
> {
> 	struct ethhdr *data = (struct ethhdr *)skb->data;
>
> 	if (skb_headlen(skb) >= offsetof(struct ethhdr, h_proto))
>-		return (data->h_dest[5] ^ data->h_source[5]) % count;
>+		return data->h_dest[5] ^ data->h_source[5];
>
> 	return 0;
> }
>
>-/*
>- * Hash for the output device based upon layer 2 and layer 3 data. If
>- * the packet is not IP, fall back on bond_xmit_hash_policy_l2()
>- */
>-static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
>+/* Extract the appropriate headers based on bond's xmit policy */
>+static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
>+			      struct flow_keys *fk)
> {
>-	const struct ethhdr *data;
>+	const struct ipv6hdr *iph6;
> 	const struct iphdr *iph;
>-	const struct ipv6hdr *ipv6h;
>-	u32 v6hash;
>-	const __be32 *s, *d;
>+	int noff, proto = -1;
>
>-	if (skb->protocol == htons(ETH_P_IP) &&
>-	    pskb_network_may_pull(skb, sizeof(*iph))) {
>+	if (bond->params.xmit_policy > BOND_XMIT_POLICY_LAYER23)
>+		return skb_flow_dissect(skb, fk);
>+
>+	fk->ports = 0;
>+	noff = skb_network_offset(skb);
>+	if (skb->protocol == htons(ETH_P_IP)) {
>+		if (!pskb_may_pull(skb, noff + sizeof(*iph)))
>+			return false;
> 		iph = ip_hdr(skb);
>-		data = (struct ethhdr *)skb->data;
>-		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
>-			(data->h_dest[5] ^ data->h_source[5])) % count;
>-	} else if (skb->protocol == htons(ETH_P_IPV6) &&
>-		   pskb_network_may_pull(skb, sizeof(*ipv6h))) {
>-		ipv6h = ipv6_hdr(skb);
>-		data = (struct ethhdr *)skb->data;
>-		s = &ipv6h->saddr.s6_addr32[0];
>-		d = &ipv6h->daddr.s6_addr32[0];
>-		v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
>-		v6hash ^= (v6hash >> 24) ^ (v6hash >> 16) ^ (v6hash >> 8);
>-		return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
>-	}
>-
>-	return bond_xmit_hash_policy_l2(skb, count);
>+		fk->src = iph->saddr;
>+		fk->dst = iph->daddr;
>+		noff += iph->ihl << 2;
>+		if (!ip_is_fragment(iph))
>+			proto = iph->protocol;
>+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
>+		if (!pskb_may_pull(skb, noff + sizeof(*iph6)))
>+			return false;
>+		iph6 = ipv6_hdr(skb);
>+		fk->src = (__force __be32)ipv6_addr_hash(&iph6->saddr);
>+		fk->dst = (__force __be32)ipv6_addr_hash(&iph6->daddr);
>+		noff += sizeof(*iph6);
>+		proto = iph6->nexthdr;
>+	} else {
>+		return false;
>+	}
>+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34 && proto >= 0)
>+		fk->ports = skb_flow_get_ports(skb, noff, proto);
>+
>+	return true;
> }
>
>-/*
>- * Hash for the output device based upon layer 3 and layer 4 data. If
>- * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
>- * altogether not IP, fall back on bond_xmit_hash_policy_l2()
>+/**
>+ * bond_xmit_hash - generate a hash value based on the xmit policy
>+ * @bond: bonding device
>+ * @skb: buffer to use for headers
>+ * @count: modulo value
>+ *
>+ * This function will extract the necessary headers from the skb buffer and use
>+ * them to generate a hash based on the xmit_policy set in the bonding device
>+ * which will be reduced modulo count before returning.
>  */
>-static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
>+int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count)
> {
>-	u32 layer4_xor = 0;
>-	const struct iphdr *iph;
>-	const struct ipv6hdr *ipv6h;
>-	const __be32 *s, *d;
>-	const __be16 *l4 = NULL;
>-	__be16 _l4[2];
>-	int noff = skb_network_offset(skb);
>-	int poff;
>-
>-	if (skb->protocol == htons(ETH_P_IP) &&
>-	    pskb_may_pull(skb, noff + sizeof(*iph))) {
>-		iph = ip_hdr(skb);
>-		poff = proto_ports_offset(iph->protocol);
>+	struct flow_keys flow;
>+	u32 hash;
>
>-		if (!ip_is_fragment(iph) && poff >= 0) {
>-			l4 = skb_header_pointer(skb, noff + (iph->ihl << 2) + poff,
>-						sizeof(_l4), &_l4);
>-			if (l4)
>-				layer4_xor = ntohs(l4[0] ^ l4[1]);
>-		}
>-		return (layer4_xor ^
>-			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
>-	} else if (skb->protocol == htons(ETH_P_IPV6) &&
>-		   pskb_may_pull(skb, noff + sizeof(*ipv6h))) {
>-		ipv6h = ipv6_hdr(skb);
>-		poff = proto_ports_offset(ipv6h->nexthdr);
>-		if (poff >= 0) {
>-			l4 = skb_header_pointer(skb, noff + sizeof(*ipv6h) + poff,
>-						sizeof(_l4), &_l4);
>-			if (l4)
>-				layer4_xor = ntohs(l4[0] ^ l4[1]);
>-		}
>-		s = &ipv6h->saddr.s6_addr32[0];
>-		d = &ipv6h->daddr.s6_addr32[0];
>-		layer4_xor ^= (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
>-		layer4_xor ^= (layer4_xor >> 24) ^ (layer4_xor >> 16) ^
>-			       (layer4_xor >> 8);
>-		return layer4_xor % count;
>-	}
>+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
>+	    !bond_flow_dissect(bond, skb, &flow))
>+		return bond_eth_hash(skb) % count;
>+
>+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
>+	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23)
>+		hash = bond_eth_hash(skb);
>+	else
>+		hash = (__force u32)flow.ports;
>+	hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
>+	hash ^= (hash >> 16);
>+	hash ^= (hash >> 8);
>
>-	return bond_xmit_hash_policy_l2(skb, count);
>+	return hash % count;
> }
>
> /*-------------------------- Device entry points ----------------------------*/
>@@ -3721,8 +3711,7 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d
> 	return NETDEV_TX_OK;
> }
>
>-/*
>- * In bond_xmit_xor() , we determine the output device by using a pre-
>+/* In bond_xmit_xor() , we determine the output device by using a pre-
>  * determined xmit_hash_policy(), If the selected device is not enabled,
>  * find the next active slave.
>  */
>@@ -3730,8 +3719,7 @@ static int bond_xmit_xor(struct sk_buff *skb, struct net_device *bond_dev)
> {
> 	struct bonding *bond = netdev_priv(bond_dev);
>
>-	bond_xmit_slave_id(bond, skb,
>-			   bond->xmit_hash_policy(skb, bond->slave_cnt));
>+	bond_xmit_slave_id(bond, skb, bond_xmit_hash(bond, skb, bond->slave_cnt));
>
> 	return NETDEV_TX_OK;
> }
>@@ -3768,22 +3756,6 @@ static int bond_xmit_broadcast(struct sk_buff *skb, struct net_device *bond_dev)
>
> /*------------------------- Device initialization ---------------------------*/
>
>-static void bond_set_xmit_hash_policy(struct bonding *bond)
>-{
>-	switch (bond->params.xmit_policy) {
>-	case BOND_XMIT_POLICY_LAYER23:
>-		bond->xmit_hash_policy = bond_xmit_hash_policy_l23;
>-		break;
>-	case BOND_XMIT_POLICY_LAYER34:
>-		bond->xmit_hash_policy = bond_xmit_hash_policy_l34;
>-		break;
>-	case BOND_XMIT_POLICY_LAYER2:
>-	default:
>-		bond->xmit_hash_policy = bond_xmit_hash_policy_l2;
>-		break;
>-	}
>-}
>-
> /*
>  * Lookup the slave that corresponds to a qid
>  */
>@@ -3894,38 +3866,6 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
> 	return ret;
> }
>
>-/*
>- * set bond mode specific net device operations
>- */
>-void bond_set_mode_ops(struct bonding *bond, int mode)
>-{
>-	struct net_device *bond_dev = bond->dev;
>-
>-	switch (mode) {
>-	case BOND_MODE_ROUNDROBIN:
>-		break;
>-	case BOND_MODE_ACTIVEBACKUP:
>-		break;
>-	case BOND_MODE_XOR:
>-		bond_set_xmit_hash_policy(bond);
>-		break;
>-	case BOND_MODE_BROADCAST:
>-		break;
>-	case BOND_MODE_8023AD:
>-		bond_set_xmit_hash_policy(bond);
>-		break;
>-	case BOND_MODE_ALB:
>-		/* FALLTHRU */
>-	case BOND_MODE_TLB:
>-		break;
>-	default:
>-		/* Should never happen, mode already checked */
>-		pr_err("%s: Error: Unknown bonding mode %d\n",
>-		       bond_dev->name, mode);
>-		break;
>-	}
>-}
>-
> static int bond_ethtool_get_settings(struct net_device *bond_dev,
> 				     struct ethtool_cmd *ecmd)
> {
>@@ -4027,7 +3967,6 @@ static void bond_setup(struct net_device *bond_dev)
> 	ether_setup(bond_dev);
> 	bond_dev->netdev_ops = &bond_netdev_ops;
> 	bond_dev->ethtool_ops = &bond_ethtool_ops;
>-	bond_set_mode_ops(bond, bond->params.mode);
>
> 	bond_dev->destructor = bond_destructor;
>
>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index e06c644..e924952 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -318,7 +318,6 @@ static ssize_t bonding_store_mode(struct device *d,
> 	/* don't cache arp_validate between modes */
> 	bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;
> 	bond->params.mode = new_value;
>-	bond_set_mode_ops(bond, bond->params.mode);
> 	pr_info("%s: setting mode to %s (%d).\n",
> 		bond->dev->name, bond_mode_tbl[new_value].modename,
> 		new_value);
>@@ -358,7 +357,6 @@ static ssize_t bonding_store_xmit_hash(struct device *d,
> 		ret = -EINVAL;
> 	} else {
> 		bond->params.xmit_policy = new_value;
>-		bond_set_mode_ops(bond, bond->params.mode);
> 		pr_info("%s: setting xmit hash policy to %s (%d).\n",
> 			bond->dev->name,
> 			xmit_hashtype_tbl[new_value].modename, new_value);
>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>index 9a26fbd..0bd04fb 100644
>--- a/drivers/net/bonding/bonding.h
>+++ b/drivers/net/bonding/bonding.h
>@@ -217,7 +217,6 @@ struct bonding {
> 	char     proc_file_name[IFNAMSIZ];
> #endif /* CONFIG_PROC_FS */
> 	struct   list_head bond_list;
>-	int      (*xmit_hash_policy)(struct sk_buff *, int);
> 	u16      rr_tx_counter;
> 	struct   ad_bond_info ad_info;
> 	struct   alb_bond_info alb_info;
>@@ -409,7 +408,7 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev);
> void bond_mii_monitor(struct work_struct *);
> void bond_loadbalance_arp_mon(struct work_struct *);
> void bond_activebackup_arp_mon(struct work_struct *);
>-void bond_set_mode_ops(struct bonding *bond, int mode);
>+int bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, int count);
> int bond_parse_parm(const char *mode_arg, const struct bond_parm_tbl *tbl);
> void bond_select_active_slave(struct bonding *bond);
> void bond_change_active_slave(struct bonding *bond, struct slave *new_active);
>diff --git a/include/uapi/linux/if_bonding.h b/include/uapi/linux/if_bonding.h
>index a17edda..9635a62 100644
>--- a/include/uapi/linux/if_bonding.h
>+++ b/include/uapi/linux/if_bonding.h
>@@ -91,6 +91,8 @@
> #define BOND_XMIT_POLICY_LAYER2		0 /* layer 2 (MAC only), default */
> #define BOND_XMIT_POLICY_LAYER34	1 /* layer 3+4 (IP ^ (TCP || UDP)) */
> #define BOND_XMIT_POLICY_LAYER23	2 /* layer 2+3 (IP ^ MAC) */
>+#define BOND_XMIT_POLICY_ENCAP23	3 /* encapsulated layer 2+3 */
>+#define BOND_XMIT_POLICY_ENCAP34	4 /* encapsulated layer 3+4 */
>
> typedef struct ifbond {
> 	__s32 bond_mode;
>-- 
>1.8.1.4
>

^ permalink raw reply

* Re: [PATCH net-next v5 1/3] flow_dissector: factor out the ports extraction in skb_flow_get_ports
From: Veaceslav Falico @ 2013-10-02 12:55 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, davem, andy, fubar, eric.dumazet
In-Reply-To: <1380713966-3891-2-git-send-email-nikolay@redhat.com>

On Wed, Oct 02, 2013 at 01:39:24PM +0200, Nikolay Aleksandrov wrote:
>Factor out the code that extracts the ports from skb_flow_dissect and
>add a new function skb_flow_get_ports which can be re-used.
>
>Suggested-by: Veaceslav Falico <vfalico@redhat.com>
>Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>

Reviewed-by: Veaceslav Falico <vfalico@redhat.com>

>---
>v2: new patch
>v3: fix a bug in skb_flow_dissect where thoff didn't have poff added by
>    modifying thoff directly in skb_flow_get_ports as it's done anyway.
>    Also add the necessary export symbol for skb_flow_get_ports.
>v4: integrate the thoff fix in skb_flow_get_ports
>v5: disintegrate the thoff fix, and re-base on Eric's fix
>This seems like a good idea because there're other users that can re-use
>it later as well.
>
> include/net/flow_keys.h   |  1 +
> net/core/flow_dissector.c | 39 ++++++++++++++++++++++++++++-----------
> 2 files changed, 29 insertions(+), 11 deletions(-)
>
>diff --git a/include/net/flow_keys.h b/include/net/flow_keys.h
>index ac2439d..7e64bd8 100644
>--- a/include/net/flow_keys.h
>+++ b/include/net/flow_keys.h
>@@ -14,4 +14,5 @@ struct flow_keys {
> };
>
> bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow);
>+__be32 skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto);
> #endif
>diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>index 8d7d0dd..f8e25ac 100644
>--- a/net/core/flow_dissector.c
>+++ b/net/core/flow_dissector.c
>@@ -25,9 +25,35 @@ static void iph_to_flow_copy_addrs(struct flow_keys *flow, const struct iphdr *i
> 	memcpy(&flow->src, &iph->saddr, sizeof(flow->src) + sizeof(flow->dst));
> }
>
>+/**
>+ * skb_flow_get_ports - extract the upper layer ports and return them
>+ * @skb: buffer to extract the ports from
>+ * @thoff: transport header offset
>+ * @ip_proto: protocol for which to get port offset
>+ *
>+ * The function will try to retrieve the ports at offset thoff + poff where poff
>+ * is the protocol port offset returned from proto_ports_offset
>+ */
>+__be32 skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto)
>+{
>+	int poff = proto_ports_offset(ip_proto);
>+
>+	if (poff >= 0) {
>+		__be32 *ports, _ports;
>+
>+		ports = skb_header_pointer(skb, thoff + poff,
>+					   sizeof(_ports), &_ports);
>+		if (ports)
>+			return *ports;
>+	}
>+
>+	return 0;
>+}
>+EXPORT_SYMBOL(skb_flow_get_ports);
>+
> bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow)
> {
>-	int poff, nhoff = skb_network_offset(skb);
>+	int nhoff = skb_network_offset(skb);
> 	u8 ip_proto;
> 	__be16 proto = skb->protocol;
>
>@@ -150,16 +176,7 @@ ipv6:
> 	}
>
> 	flow->ip_proto = ip_proto;
>-	poff = proto_ports_offset(ip_proto);
>-	if (poff >= 0) {
>-		__be32 *ports, _ports;
>-
>-		ports = skb_header_pointer(skb, nhoff + poff,
>-					   sizeof(_ports), &_ports);
>-		if (ports)
>-			flow->ports = *ports;
>-	}
>-
>+	flow->ports = skb_flow_get_ports(skb, nhoff, ip_proto);
> 	flow->thoff = (u16) nhoff;
>
> 	return true;
>-- 
>1.8.1.4
>

^ permalink raw reply

* Re: [RFC PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices
From: Neil Horman @ 2013-10-02 12:53 UTC (permalink / raw)
  To: John Fastabend; +Cc: netdev, john.fastabend, David S. Miller
In-Reply-To: <524BC65B.4080803@gmail.com>

On Wed, Oct 02, 2013 at 12:08:11AM -0700, John Fastabend wrote:
> On 09/25/2013 01:16 PM, Neil Horman wrote:
> >Add a operations structure that allows a network interface to export the fact
> >that it supports package forwarding in hardware between physical interfaces and
> >other mac layer devices assigned to it (such as macvlans).  this operaions
> >structure can be used by virtual mac devices to bypass software switching so
> >that forwarding can be done in hardware more efficiently.
> 
> Some additional nits below which maybe you have already thought of.
> 
Its ok, you can say it, theyre more than nits :), gaping holes is more like it.
This pass was completely untested, I'm still ironing out bits, but I think for
the most part we're in agreement on the items below.  Comments inline.

> >
> >Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> >CC: john.fastabend@intel.com
> >CC: "David S. Miller" <davem@davemloft.net>
> >---
> >  drivers/net/macvlan.c      | 37 +++++++++++++++++++++++++++++++++++++
> >  include/linux/if_macvlan.h |  1 +
> >  include/linux/netdevice.h  | 10 ++++++++++
> >  net/core/dev.c             |  3 +++
> >  4 files changed, 51 insertions(+)
> >
> >diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
> >index 9bf46bd..0c37b30 100644
> >--- a/drivers/net/macvlan.c
> >+++ b/drivers/net/macvlan.c
> >@@ -296,8 +296,16 @@ netdev_tx_t macvlan_start_xmit(struct sk_buff *skb,
> >  	unsigned int len = skb->len;
> >  	int ret;
> >  	const struct macvlan_dev *vlan = netdev_priv(dev);
> >+	const struct l2_forwarding_accel_ops *l2a_ops = vlan->lowerdev->l2a_ops;
> >+
> >+	if (l2a_ops->l2_accel_xmit) {
> >+		ret = l2a_ops->l2_accel_xmit(skb, vlan->l2a_priv);
> >+		if (likely(ret == NETDEV_TX_OK))
> 
> maybe dev_xmit_complete() would be more appropriate?
> 
Yup, I've currently modified this to dev_queue_xmit, but _complete might be a
better option.

> >+			goto update_stats;
> >+	}
> >
> >  	ret = macvlan_queue_xmit(skb, dev);
> >+update_stats:
> >  	if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
> >  		struct macvlan_pcpu_stats *pcpu_stats;
> >
> >@@ -336,6 +344,7 @@ static int macvlan_open(struct net_device *dev)
> >  {
> >  	struct macvlan_dev *vlan = netdev_priv(dev);
> >  	struct net_device *lowerdev = vlan->lowerdev;
> >+	const struct l2_forwarding_accel_ops *l2a_ops = lowerdev->l2a_ops;
> >  	int err;
> >
> >  	if (vlan->port->passthru) {
> >@@ -347,6 +356,19 @@ static int macvlan_open(struct net_device *dev)
> >  		goto hash_add;
> 
> Looks like this might break in the passthru case? If you don't call
> l2_accel_add_dev here but still use the l2_accel_xmit.
> 
Yeah, I need to clean this up.

> >  	}
> >
> >+	if (l2a_ops->l2_accel_add_dev) {
> 
> In the error cases it might be preferred to fallback to the
> non-offloaded software path. For example hardware may have a limit
> to the number of VSIs that can be created but we wouldn't want to
> push that up the stack.
> 
Hadn't thought of that, but yes, I agree, falling back to software switching is
definately desireable here

> >+		/* The lowerdev supports l2 switching
> >+		 * try to add this macvlan to it
> >+		 */
> >+		vlan->l2a_priv = kzalloc(l2a_ops->priv_size, GFP_KERNEL);
> >+		if (!vlan->l2a_priv)
> >+			return -ENOMEM;
> >+		err = l2a_ops->l2_accel_add_dev(vlan->lowerdev,
> >+						dev, vlan->l2a_priv);
> >+		if (err < 0)
> >+			return err;
> >+	}
> >+
> >  	err = -EBUSY;
> >  	if (macvlan_addr_busy(vlan->port, dev->dev_addr))
> >  		goto out;
> >@@ -367,6 +389,13 @@ hash_add:
> >  del_unicast:
> >  	dev_uc_del(lowerdev, dev->dev_addr);
> >  out:
> >+	if (vlan->l2a_priv) {
> 
> Add a feature flag here so it can be disabled.
> 
Makes sense.

I'm still working out kinks, but I hope to have a next version later this
week/early next.  I'll make sure to incorporate these changes. 

Thanks!
Neil

> >+		if (l2a_ops->l2_accel_del_dev)
> >+			l2a_ops->l2_accel_del_dev(vlan->lowerdev,
> >+						  vlan->l2a_priv);
> >+		kfree(vlan->l2a_priv);
> >+		vlan->l2a_priv = NULL;
> >+	}
> >  	return err;
> >  }
> 
> [...]
> 
> Thanks,
> John
> 
> 
> -- 
> John Fastabend         Intel Corporation
> 

^ permalink raw reply

* See the attached file
From: Microsoft Promotion @ 2013-10-02 12:23 UTC (permalink / raw)

In-Reply-To: <1380716610.99461.YahooMailNeo@web5703.biz.mail.ne1.yahoo.com>

[-- Attachment #1: Type: text/plain, Size: 21 bytes --]

See the attached file

[-- Attachment #2: MICROSOFT_AWARD_PROMOTION_2013.doc --]
[-- Type: application/msword, Size: 124416 bytes --]

^ permalink raw reply

* Re: [PATCH 3/3] net: mv643xx_eth: fix missing device_node for port devices
From: Jason Cooper @ 2013-10-02 12:22 UTC (permalink / raw)
  To: Sebastian Hesselbarth
  Cc: David Miller, Lennert Buytenhek, netdev, linux-arm-kernel,
	linux-kernel
In-Reply-To: <1380711442-24735-4-git-send-email-sebastian.hesselbarth@gmail.com>

On Wed, Oct 02, 2013 at 12:57:22PM +0200, Sebastian Hesselbarth wrote:
> DT-based mv643xx_eth probes and creates platform_devices for the
> port devices on its own. To allow fixups for ports based on the
> device_node, we need to set .of_node of the corresponding device
> with the correct node.
> 
> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> ---
> Cc: David Miller <davem@davemloft.net>
> Cc: Lennert Buytenhek <buytenh@wantstofly.org>
> Cc: Jason Cooper <jason@lakedaemon.net>
> Cc: netdev@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/net/ethernet/marvell/mv643xx_eth.c |    1 +
>  1 file changed, 1 insertion(+)

Acked-by: Jason Cooper <jason@lakedaemon.net>

thx,

Jason.

^ permalink raw reply

* Re: [PATCH 2/3] net: mv643xx_eth: fix orphaned statistics timer crash
From: Jason Cooper @ 2013-10-02 12:20 UTC (permalink / raw)
  To: Sebastian Hesselbarth
  Cc: David Miller, Lennert Buytenhek, netdev, linux-arm-kernel,
	linux-kernel
In-Reply-To: <1380711442-24735-3-git-send-email-sebastian.hesselbarth@gmail.com>

On Wed, Oct 02, 2013 at 12:57:21PM +0200, Sebastian Hesselbarth wrote:
> The periodic statistics timer gets started at port _probe() time, but
> is stopped on _stop() only. In a modular environment, this can cause
> the timer to access already deallocated memory, if the module is unloaded
> without starting the eth device. To fix this, we add the timer right
> before the port is started, instead of at _probe() time.
> 
> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> ---
> Cc: David Miller <davem@davemloft.net>
> Cc: Lennert Buytenhek <buytenh@wantstofly.org>
> Cc: Jason Cooper <jason@lakedaemon.net>
> Cc: netdev@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/net/ethernet/marvell/mv643xx_eth.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Acked-by: Jason Cooper <jason@lakedaemon.net>

Introduced by:

  4ff3495a mv643xx_eth: enforce frequent hardware statistics polling

which also goes all the way back to v2.6.28

thx,

Jason.

^ permalink raw reply

* [PATCH 09/19 v2] net: bnx2x: Change variable type to bool
From: Peter Senna Tschudin @ 2013-10-02 12:19 UTC (permalink / raw)
  To: eilong; +Cc: netdev, linux-kernel, kernel-janitors, Peter Senna Tschudin
In-Reply-To: <1380716391-20214-1-git-send-email-peter.senna@gmail.com>

The variable rc is only assigned the values true and false.
The function bnx2x_prev_is_path_marked already returns bool.
Change rc type to bool.

The simplified semantic patch that find this problem is as
follows (http://coccinelle.lip6.fr/):

@exists@
type T;
identifier b;
@@
- T
+ bool
  b = ...;
  ... when any
  b = \(true\|false\)

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
---
Changes from v1:
 - Added subsystem prefix to shortlog

 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index fccfc1d..105cc80 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -9874,7 +9874,7 @@ static int bnx2x_prev_path_mark_eeh(struct bnx2x *bp)
 static bool bnx2x_prev_is_path_marked(struct bnx2x *bp)
 {
 	struct bnx2x_prev_path_list *tmp_list;
-	int rc = false;
+	bool rc = false;
 
 	if (down_trylock(&bnx2x_prev_sem))
 		return false;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 08/19 v2] net: atl1c: Change variable type to bool
From: Peter Senna Tschudin @ 2013-10-02 12:19 UTC (permalink / raw)
  To: jcliburn
  Cc: chris.snook, jkosina, rdunlap, standby24x7, peter.senna, netdev,
	linux-kernel, kernel-janitors

The variable ret is only assigned the values true and false.
The function atl1c_read_eeprom already returns bool. Change
ret type to bool.

The simplified semantic patch that find this problem is as
follows (http://coccinelle.lip6.fr/):

@exists@
type T;
identifier b;
@@
- T
+ bool
  b = ...;
  ... when any
  b = \(true\|false\)

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
---
Changes from v1:
 - Added subsystem prefix to shortlog

 drivers/net/ethernet/atheros/atl1c/atl1c_hw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_hw.c b/drivers/net/ethernet/atheros/atl1c/atl1c_hw.c
index 3ef7092..1cda49a 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_hw.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_hw.c
@@ -153,7 +153,7 @@ static int atl1c_get_permanent_address(struct atl1c_hw *hw)
 bool atl1c_read_eeprom(struct atl1c_hw *hw, u32 offset, u32 *p_value)
 {
 	int i;
-	int ret = false;
+	bool ret = false;
 	u32 otp_ctrl_data;
 	u32 control;
 	u32 data;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 19/19 v2] net: ipv4: Change variable type to bool
From: Peter Senna Tschudin @ 2013-10-02 12:19 UTC (permalink / raw)
  To: davem
  Cc: kuznet, jmorris, kaber, netdev, linux-kernel, kernel-janitors,
	Peter Senna Tschudin
In-Reply-To: <1380716391-20214-1-git-send-email-peter.senna@gmail.com>

The variable fully_acked is only assigned the values true and false.
Change its type to bool.

The simplified semantic patch that find this problem is as
follows (http://coccinelle.lip6.fr/):

@exists@
type T;
identifier b;
@@
- T
+ bool
  b = ...;
  ... when any
  b = \(true\|false\)

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
---
Changes from v1:
 - Added subsystem prefix to shortlog

 net/ipv4/tcp_input.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 25a89ea..fa17dce 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2970,7 +2970,7 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets,
 	const struct inet_connection_sock *icsk = inet_csk(sk);
 	struct sk_buff *skb;
 	u32 now = tcp_time_stamp;
-	int fully_acked = true;
+	bool fully_acked = true;
 	int flag = 0;
 	u32 pkts_acked = 0;
 	u32 reord = tp->packets_out;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH 10/19 v2] net: myri10ge: Change variable type to bool
From: Peter Senna Tschudin @ 2013-10-02 12:19 UTC (permalink / raw)
  To: hykim; +Cc: netdev, linux-kernel, kernel-janitors, Peter Senna Tschudin
In-Reply-To: <1380716391-20214-1-git-send-email-peter.senna@gmail.com>

There is the rc variable on both myri10ge_ss_lock_napi and
myri10ge_ss_lock_poll functions. In both cases rc is only assigned the
values true and false. Both functions already return bool. Change rc
type to bool.

The simplified semantic patch that find this problem is as
follows (http://coccinelle.lip6.fr/):

@exists@
type T;
identifier b;
@@
- T
+ bool
  b = ...;
  ... when any
  b = \(true\|false\)

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
---
Changes from v1:
 - Added subsystem prefix to shortlog

 drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
index 149355b..7792264 100644
--- a/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
+++ b/drivers/net/ethernet/myricom/myri10ge/myri10ge.c
@@ -934,7 +934,7 @@ static inline void myri10ge_ss_init_lock(struct myri10ge_slice_state *ss)
 
 static inline bool myri10ge_ss_lock_napi(struct myri10ge_slice_state *ss)
 {
-	int rc = true;
+	bool rc = true;
 	spin_lock(&ss->lock);
 	if ((ss->state & SLICE_LOCKED)) {
 		WARN_ON((ss->state & SLICE_STATE_NAPI));
@@ -957,7 +957,7 @@ static inline void myri10ge_ss_unlock_napi(struct myri10ge_slice_state *ss)
 
 static inline bool myri10ge_ss_lock_poll(struct myri10ge_slice_state *ss)
 {
-	int rc = true;
+	bool rc = true;
 	spin_lock_bh(&ss->lock);
 	if ((ss->state & SLICE_LOCKED)) {
 		ss->state |= SLICE_STATE_POLL_YIELD;
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH 1/3] net: mv643xx_eth: update statistics timer from timer context only
From: Jason Cooper @ 2013-10-02 12:17 UTC (permalink / raw)
  To: Sebastian Hesselbarth
  Cc: David Miller, Lennert Buytenhek, netdev, linux-arm-kernel,
	linux-kernel
In-Reply-To: <1380711442-24735-2-git-send-email-sebastian.hesselbarth@gmail.com>

On Wed, Oct 02, 2013 at 12:57:20PM +0200, Sebastian Hesselbarth wrote:
> Each port driver installs a periodic timer to update port statistics
> by calling mib_counters_update. As mib_counters_update is also called
> from non-timer context, we should not reschedule the timer there but
> rather move it to timer-only context.
> 
> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> ---
> Cc: David Miller <davem@davemloft.net>
> Cc: Lennert Buytenhek <buytenh@wantstofly.org>
> Cc: Jason Cooper <jason@lakedaemon.net>
> Cc: netdev@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  drivers/net/ethernet/marvell/mv643xx_eth.c |    4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)

Acked-by: Jason Cooper <jason@lakedaemon.net>

Introduced by:

  4ff3495a mv643xx_eth: enforce frequent hardware statistics polling

which goes all the way back to v2.6.28

thx,

Jason.

^ permalink raw reply

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
From: Hannes Frederic Sowa @ 2013-10-02 12:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jiri Pirko, netdev, yoshfuji, davem, kuznet, jmorris, kaber,
	herbert
In-Reply-To: <1380710488.19002.67.camel@edumazet-glaptop.roam.corp.google.com>

Hi Eric!

On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> > >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> > >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> 
> > 
> > This seems correct to me. sk_is_gso would work as well is you apply my
> > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> > well" which does the setting of gso_size.
> 
> Well, skb having frags or not should not be a concern :
> Thats an allocation choice (lets say to avoid high order allocations). 
> 
> Setting gso_size is probably better.

e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
approach") states:

"
skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
indicating that hardware has to do checksum calculation. Hardware should
compute the UDP checksum of complete datagram and also ip header checksum of
each fragmented IP packet.
"

This is the reason why I tried not to update the gso_size. If it is ok, I am
fine with that.

Greetings,

  Hannes

^ permalink raw reply

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
From: Jiri Pirko @ 2013-10-02 12:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, yoshfuji, davem, kuznet, jmorris, kaber, herbert, hannes
In-Reply-To: <1380714835.19002.92.camel@edumazet-glaptop.roam.corp.google.com>

Wed, Oct 02, 2013 at 01:53:55PM CEST, eric.dumazet@gmail.com wrote:
>
>> This patch should fix this on ipv4 as well:
>> 
>> Subject: ip_output: do skb ufo init for peeked non ufo skb as well
>> 
>
>Is it an official patch ? s/Subject:/[PATCH]/ 

Yes. I thought that to state "Subject:" is enough for patchwork to parse
it. Apparently not :/

>
>> Now, if user application does:
>
>Any idea when the bug was added (commit id + title) ?

This is hard to say. This code is more or less broken from the very
beginning, which is:

commit e89e9cf539a28df7d0eb1d0a545368e9920b34ac "[IPv4/IPv6]: UFO Scatter-gather approach"

>
>> sendto len<mtu flag MSG_MORE
>> sendto len>mtu flag 0
>> The skb is not treated as fragmented one because it is not initialized
>> that way. So move the initialization to fix this.
>> 
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>>  net/ipv4/ip_output.c | 14 +++++++++-----
>>  1 file changed, 9 insertions(+), 5 deletions(-)
>> 
>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>> index a04d872..bd21c5d 100644
>> --- a/net/ipv4/ip_output.c
>> +++ b/net/ipv4/ip_output.c
>> @@ -772,15 +772,19 @@ static inline int ip_ufo_append_data(struct sock *sk,
>>  		/* initialize protocol header pointer */
>>  		skb->transport_header = skb->network_header + fragheaderlen;
>>  
>> -		skb->ip_summed = CHECKSUM_PARTIAL;
>>  		skb->csum = 0;
>
>Any idea why we have skb->csum = 0 here ?

If I understand this correctly, that can be removed from here.


>
>Thanks !
>
>

^ permalink raw reply

* RE: [PATCH net-next] tcp: shrink tcp6_timewait_sock by one cache line
From: David Laight @ 2013-10-02 12:08 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: netdev
In-Reply-To: <1380711604.19002.78.camel@edumazet-glaptop.roam.corp.google.com>

> -	tmo = tw->tw_ttd - jiffies;
> +	tmo = tw->tw_ttd - (u32)jiffies;

Do you need any of these (u32) casts?
The compiler will almost certainly use 32bit arithmetic (on 32bit systems at least)
because the 'as if' rule lets if use the smaller type.

> +		tw->tw_ttd = (u32)(jiffies + (slot << INET_TWDR_RECYCLE_TICK));

If that (u32) cast is needed in order to avoid 64bit maths, it is in the wrong place.

	David


^ permalink raw reply

* Re: [PATCH net-next v5 2/3] bonding: modify the old and add new xmit hash policies
From: Eric Dumazet @ 2013-10-02 12:03 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, davem, andy, fubar, vfalico
In-Reply-To: <1380713966-3891-3-git-send-email-nikolay@redhat.com>

On Wed, 2013-10-02 at 13:39 +0200, Nikolay Aleksandrov wrote:
> This patch adds two new hash policy modes which use skb_flow_dissect:
> 3 - Encapsulated layer 2+3
> 4 - Encapsulated layer 3+4
> There should be a good improvement for tunnel users in those modes.
> It also changes the old hash functions to:
> hash ^= (__force u32)flow.dst ^ (__force u32)flow.src;
> hash ^= (hash >> 16);
> hash ^= (hash >> 8);
> 
> Where hash will be initialized either to L2 hash, that is
> SRCMAC[5] XOR DSTMAC[5], or to flow->ports which should be extracted
> from the upper layer. Flow's dst and src are also extracted based on the
> xmit policy either directly from the buffer or by using skb_flow_dissect,
> but in both cases if the protocol is IPv6 then dst and src are obtained by
> ipv6_addr_hash() on the real addresses. In case of a non-dissectable
> packet, the algorithms fall back to L2 hashing.
> The bond_set_mode_ops() function is now obsolete and thus deleted
> because it was used only to set the proper hash policy. Also we trim a
> pointer from struct bonding because we no longer need to keep the hash
> function, now there's only a single hash function - bond_xmit_hash that
> works based on bond->params.xmit_policy.
> 
> The hash function and skb_flow_dissect were suggested by Eric Dumazet.
> The layer names were suggested by Andy Gospodarek, because I suck at
> semantics.
> 
> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
> ---

Very nice, thanks !

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
From: Hannes Frederic Sowa @ 2013-10-02 12:01 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, yoshfuji, davem, kuznet, jmorris, kaber, herbert,
	eric.dumazet
In-Reply-To: <20131002103333.GB1528@minipsycho.brq.redhat.com>

On Wed, Oct 02, 2013 at 12:33:33PM +0200, Jiri Pirko wrote:
> Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> >On Tue, Oct 01, 2013 at 11:47:21PM +0200, Hannes Frederic Sowa wrote:
> >> The strange thing is that if I don't do the IPV6_MTU setsockopt I don't
> >> get an oops.
> >
> >This is incorrect, it just depends on the size of the writes and on the
> >interface mtu.
> >
> >> IPv4 seems to work without problems, too.
> >
> >I also get kernel oopses from IPv4 now, too.
> 
> I'm not able to trigger this with ipv4. Can you please send strace
> output for this as well?

I used this snippet on loopback with UFO enabled and lo mtu 1280.

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <linux/udp.h>
#include <stdio.h>

int test(int mtu)
{
        int fd;
        const int one = 1;
        const int off = 0;
        struct sockaddr_in addr = {.sin_family = AF_INET, .sin_port = htons(53) };
        unsigned char buffer[3701];

        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

        fd = socket(AF_INET, SOCK_DGRAM, 0);
        connect(fd, (struct sockaddr *) &addr, sizeof(addr));

        setsockopt(fd, IPPROTO_UDP, UDP_CORK, &one, sizeof(one));

        write(fd, " ", 1);
        write(fd, buffer, sizeof(buffer));
        write(fd, " ", 1);

        setsockopt(fd, IPPROTO_UDP, UDP_CORK, &off, sizeof(off));

        close(fd);
}

int main() {
        test(1280);
}

Greetings,

  Hannes

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox