Netdev List
* Re: Poorer networking performance in later kernels?
From: Eric Dumazet @ 2016-04-15 22:33 UTC (permalink / raw)
  To: Butler, Peter; +Cc: netdev@vger.kernel.org
In-Reply-To: <SN1PR0301MB19983946D7A38001E4B54D20D6680@SN1PR0301MB1998.namprd03.prod.outlook.com>

On Fri, 2016-04-15 at 21:02 +0000, Butler, Peter wrote:
> (Please keep me CC'd to all comments/responses)
> 
> I've tried a kernel upgrade from 3.4.2 to 4.4.0 and see a marked drop in networking performance.  Nothing was changed on the test systems, other than the kernel itself (and kernel modules).  The identical .config used to build the 3.4.2 kernel was brought over into the 4.4.0 kernel source tree, and any configuration differences (e.g. new parameters, etc.) were taken as default values.
> 
> The testing was performed on the same actual hardware for both kernel versions (i.e. take the existing 3.4.2 physical setup, simply boot into the (new) kernel and run the same test).  The netperf utility was used for benchmarking and the testing was always performed on idle systems.
> 
> TCP testing yielded the following results, where the 4.4.0 kernel only got about 1/2 of the throughput:
> 
>       Recv     Send       Send                          Utilization       Service Demand
>       Socket   Socket     Message Elapsed               Send     Recv     Send    Recv
>       Size     Size       Size    Time       Throughput local    remote   local   remote
>       bytes    bytes      bytes   secs.      10^6bits/s % S      % S      us/KB   us/KB
> 
> 3.4.2 13631488 13631488   8952    30.01      9370.29    10.14    6.50     0.709   0.454
> 4.4.0 13631488 13631488   8952    30.02      5314.03    9.14     14.31    1.127   1.765
> 
> SCTP testing yielded the following results, where the 4.4.0 kernel only got about 1/3 of the throughput:
> 
>       Recv     Send       Send                          Utilization       Service Demand
>       Socket   Socket     Message Elapsed               Send     Recv     Send    Recv
>       Size     Size       Size    Time       Throughput local    remote   local   remote
>       bytes    bytes      bytes   secs.      10^6bits/s  % S     % S      us/KB   us/KB
> 
> 3.4.2 13631488 13631488   8952    30.00      2306.22    13.87    13.19    3.941   3.747
> 4.4.0 13631488 13631488   8952    30.01       882.74    16.86    19.14    12.516  14.210
> 
> The same tests were performed many times, and are always consistent (within a few percent).  I've also tried playing with various run-time kernel parameters (/proc/sys/net/...) on the 4.4.0 kernel to alleviate the issue but have had no success at all.
> 
> I'm at a loss as to what could possibly account for such a discrepancy...

Maybe the new kernel is faster and you have drops somewhere?

nstat >/dev/null
netperf -H ...
nstat

That would help.
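To make the suggestion concrete: nstat prints how much the kernel's SNMP counters changed since its previous run, so discarding the first run's output and running it again after netperf shows only what happened during the test. The helper below is a hypothetical sketch of that before/after idea (it is not nstat's code). It reads text laid out like /proc/net/snmp, where each protocol has a header line of counter names followed by a line of values with the same prefix. The names snmp_lookup and snmp_delta are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy one line (without its '\n') into dst; fail if it does not fit. */
static int copy_line(char *dst, size_t cap, const char *src)
{
    size_t len = strcspn(src, "\n");

    if (len >= cap)
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}

/*
 * Find counter 'field' of protocol 'proto' (e.g. "Tcp", "RetransSegs") in
 * /proc/net/snmp-style text. Returns 0 and stores the value, or -1.
 */
static int snmp_lookup(const char *text, const char *proto,
                       const char *field, long long *out)
{
    char prefix[64], names[4096], values[4096], name[64];
    int n = snprintf(prefix, sizeof(prefix), "%s:", proto);

    assert(n > 0 && (size_t)n < sizeof(prefix));
    for (const char *line = text; *line != '\0'; ) {
        const char *eol = strchr(line, '\n');

        if (eol == NULL)
            break;                  /* a header line needs a value line */
        const char *next = eol + 1;
        if (strncmp(line, prefix, (size_t)n) == 0 &&
            strncmp(next, prefix, (size_t)n) == 0) {
            const char *h = names, *v = values;
            long long value;
            int hn, vn;

            if (copy_line(names, sizeof(names), line + n) != 0 ||
                copy_line(values, sizeof(values), next + n) != 0)
                return -1;
            /* Walk the counter names and their values in lockstep. */
            while (sscanf(h, "%63s%n", name, &hn) == 1 &&
                   sscanf(v, "%lld%n", &value, &vn) == 1) {
                if (strcmp(name, field) == 0) {
                    *out = value;
                    return 0;
                }
                h += hn;
                v += vn;
            }
            return -1;              /* protocol found, counter missing */
        }
        line = next;
    }
    return -1;
}

/* Change of one counter between two snapshots (after - before). */
static int snmp_delta(const char *before, const char *after,
                      const char *proto, const char *field, long long *delta)
{
    long long b, a;

    if (snmp_lookup(before, proto, field, &b) != 0 ||
        snmp_lookup(after, proto, field, &a) != 0)
        return -1;
    *delta = a - b;
    return 0;
}
```

Fed with copies of /proc/net/snmp taken before and after a netperf run, a jump in counters such as RetransSegs would be one hint of the drops suspected above.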

* Re: [PATCH] net: phy: Ensure the state machine is called when phy is UP
From: Andrew Lunn @ 2016-04-15 22:30 UTC (permalink / raw)
  To: Alexandre Belloni
  Cc: Florian Fainelli, David S . Miller, Nicolas Ferre, netdev,
	linux-kernel
In-Reply-To: <20160415221711.GG25196@piout.net>

On Sat, Apr 16, 2016 at 12:17:11AM +0200, Alexandre Belloni wrote:
> On 16/04/2016 at 00:05:08 +0200, Andrew Lunn wrote :
> > > Trace without my patch:
> > > libphy: MACB_mii_bus: probed
> > > macb f8020000.ethernet eth0: Cadence GEM rev 0x00020120 at 0xf8020000 irq 27 (fc:c2:3d:0c:6e:05)
> > > Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: attached PHY driver [Micrel KSZ8081 or KSZ8091] (mii_bus:phy_addr=f8020000.etherne:01, irq=171)
> > > Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
> > > [...]
> > > Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
> > 
> > Are there some state changes before this? How is it getting to state
> > READY? I would expect it to start in DOWN, from when the phy device
> > was created in phy_device_create().
> > 
> 
> No other changes. I forgot to mention that this is when booting with a
> cable plugged in. Unplugging and replugging the cable makes the link
> detection work fine even without the patch.

Are you TFTP booting? I.e., has the boot loader already done an
autonegotiation?

I've looked at the code and I still don't see how it gets to READY.
What I do see is that when you connect the PHY to the MAC, the
interrupt handler is installed. So maybe there are some PHY interrupts
before the interface is opened? Could you put a print in
phy_interrupt()?

	Andrew

* Re: [PATCH] net: phy: Ensure the state machine is called when phy is UP
From: Florian Fainelli @ 2016-04-15 22:23 UTC (permalink / raw)
  To: Alexandre Belloni, Andrew Lunn
  Cc: David S . Miller, Nicolas Ferre, netdev, linux-kernel
In-Reply-To: <20160415221711.GG25196@piout.net>

On 15/04/16 15:17, Alexandre Belloni wrote:
> On 16/04/2016 at 00:05:08 +0200, Andrew Lunn wrote :
>>> Trace without my patch:
>>> libphy: MACB_mii_bus: probed
>>> macb f8020000.ethernet eth0: Cadence GEM rev 0x00020120 at 0xf8020000 irq 27 (fc:c2:3d:0c:6e:05)
>>> Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: attached PHY driver [Micrel KSZ8081 or KSZ8091] (mii_bus:phy_addr=f8020000.etherne:01, irq=171)
>>> Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
>>> [...]
>>> Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
>>
>> Are there some state changes before this? How is it getting to state
>> READY? I would expect it to start in DOWN, from when the phy device
>> was created in phy_device_create().
>>
> 
> No other changes. I forgot to mention that this is when booting with a
> cable plugged in. Unplugging and replugging the cable makes the link
> detection work fine even without the patch.

OK, so the last hunk of the change in d5c3d84657db ("net: phy: Avoid
polling PHY with PHY_IGNORE_INTERRUPTS"):

-       queue_delayed_work(system_power_efficient_wq, &phydev->state_queue,
-                          PHY_STATE_TIME * HZ);
+       /* Only re-schedule a PHY state machine change if we are polling the
+        * PHY, if PHY_IGNORE_INTERRUPT is set, then we will be moving
+        * between states from phy_mac_interrupt()
+        */
+       if (phydev->irq == PHY_POLL)
+               queue_delayed_work(system_power_efficient_wq, &phydev->state_queue,
+                                  PHY_STATE_TIME * HZ);


is presumably what broke for you, right?

Could you also give this patch a spin and see if it works better with
it? The macb driver does something racy in how the MDIO bus and PHY are
probed with respect to registering the netdev, and that needs fixing too.

diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
index eec3200ade4a..98b99149ce0b 100644
--- a/drivers/net/ethernet/cadence/macb.c
+++ b/drivers/net/ethernet/cadence/macb.c
@@ -3005,28 +3005,36 @@ static int macb_probe(struct platform_device *pdev)
        if (err)
                goto err_out_free_netdev;

+       err = macb_mii_init(bp);
+       if (err)
+               goto err_out_free_netdev;
+
+       phydev = bp->phy_dev;
+       phy_attached_info(phydev);
+
+       netif_carrier_off(dev);
+
        err = register_netdev(dev);
        if (err) {
                dev_err(&pdev->dev, "Cannot register net device, aborting.\n");
                goto err_out_unregister_netdev;
        }

-       err = macb_mii_init(bp);
-       if (err)
-               goto err_out_unregister_netdev;
-
-       netif_carrier_off(dev);
-
        netdev_info(dev, "Cadence %s rev 0x%08x at 0x%08lx irq %d (%pM)\n",
                    macb_is_gem(bp) ? "GEM" : "MACB", macb_readl(bp, MID),
                    dev->base_addr, dev->irq, dev->dev_addr);

-       phydev = bp->phy_dev;
-       phy_attached_info(phydev);
-
        return 0;

 err_out_unregister_netdev:
+       phy_disconnect(bp->phy_dev);
+       mdiobus_unregister(bp->mii_bus);
+       mdiobus_free(bp->mii_bus);
+
+       /* Shutdown the PHY if there is a GPIO reset */
+       if (bp->reset_gpio)
+               gpiod_set_value(bp->reset_gpio, 0);
+
        unregister_netdev(dev);

 err_out_free_netdev:
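The reordering in this diff follows the usual probe pattern: set up everything the device depends on (here the MDIO bus and PHY) before register_netdev() makes it visible, and unwind in reverse order on failure. The following is a user-space sketch of that goto-unwind pattern only; the step names and the fail_at knob are hypothetical stand-ins, not macb code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char trace[128];   /* records which steps ran, in order */
static int fail_at;       /* hypothetical knob: step number that fails, 0 = none */

static void note(const char *what)
{
    size_t used = strlen(trace);

    snprintf(trace + used, sizeof(trace) - used, "%s;", what);
}

/* Pretend to perform probe step 'id'; it fails if id == fail_at. */
static int step(int id, const char *what)
{
    note(what);
    return id == fail_at ? -1 : 0;
}

static int probe_sketch(void)
{
    int err;

    trace[0] = '\0';
    err = step(1, "alloc_netdev");
    if (err)
        return err;
    err = step(2, "mii_init");          /* PHY/MDIO set up first ... */
    if (err)
        goto err_free_netdev;
    err = step(3, "register_netdev");   /* ... netdev becomes visible last */
    if (err)
        goto err_mii_teardown;
    return 0;

err_mii_teardown:                        /* undo step 2 */
    note("mii_teardown");
err_free_netdev:                         /* undo step 1 */
    note("free_netdev");
    return err;
}
```

In this sketch, a failure at any step only undoes the steps that actually succeeded, in reverse order.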



-- 
Florian

* Re: [PATCH] net: phy: Ensure the state machine is called when phy is UP
From: Alexandre Belloni @ 2016-04-15 22:17 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Florian Fainelli, David S . Miller, Nicolas Ferre, netdev,
	linux-kernel
In-Reply-To: <20160415220508.GC26665@lunn.ch>

On 16/04/2016 at 00:05:08 +0200, Andrew Lunn wrote :
> > Trace without my patch:
> > libphy: MACB_mii_bus: probed
> > macb f8020000.ethernet eth0: Cadence GEM rev 0x00020120 at 0xf8020000 irq 27 (fc:c2:3d:0c:6e:05)
> > Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: attached PHY driver [Micrel KSZ8081 or KSZ8091] (mii_bus:phy_addr=f8020000.etherne:01, irq=171)
> > Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
> > [...]
> > Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
> 
> Are there some state changes before this? How is it getting to state
> READY? I would expect it to start in DOWN, from when the phy device
> was created in phy_device_create().
> 

No other changes. I forgot to mention that this is when booting with a
cable plugged in. Unplugging and replugging the cable makes the link
detection work fine even without the patch.

-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

* Poorer networking performance in later kernels?
From: Butler, Peter @ 2016-04-15 21:02 UTC (permalink / raw)
  To: netdev@vger.kernel.org
In-Reply-To: <SN1PR0301MB19987565B71A4688593A2CDBD6680@SN1PR0301MB1998.namprd03.prod.outlook.com>

(Please keep me CC'd to all comments/responses)

I've tried a kernel upgrade from 3.4.2 to 4.4.0 and see a marked drop in networking performance.  Nothing was changed on the test systems, other than the kernel itself (and kernel modules).  The identical .config used to build the 3.4.2 kernel was brought over into the 4.4.0 kernel source tree, and any configuration differences (e.g. new parameters, etc.) were taken as default values.

The testing was performed on the same actual hardware for both kernel versions (i.e. take the existing 3.4.2 physical setup, simply boot into the (new) kernel and run the same test).  The netperf utility was used for benchmarking and the testing was always performed on idle systems.

TCP testing yielded the following results, where the 4.4.0 kernel only got about 1/2 of the throughput:

      Recv     Send       Send                          Utilization       Service Demand
      Socket   Socket     Message Elapsed               Send     Recv     Send    Recv
      Size     Size       Size    Time       Throughput local    remote   local   remote
      bytes    bytes      bytes   secs.      10^6bits/s % S      % S      us/KB   us/KB

3.4.2 13631488 13631488   8952    30.01      9370.29    10.14    6.50     0.709   0.454
4.4.0 13631488 13631488   8952    30.02      5314.03    9.14     14.31    1.127   1.765

SCTP testing yielded the following results, where the 4.4.0 kernel only got about 1/3 of the throughput:

      Recv     Send       Send                          Utilization       Service Demand
      Socket   Socket     Message Elapsed               Send     Recv     Send    Recv
      Size     Size       Size    Time       Throughput local    remote   local   remote
      bytes    bytes      bytes   secs.      10^6bits/s  % S     % S      us/KB   us/KB

3.4.2 13631488 13631488   8952    30.00      2306.22    13.87    13.19    3.941   3.747
4.4.0 13631488 13631488   8952    30.01       882.74    16.86    19.14    12.516  14.210

The same tests were performed many times, and are always consistent (within a few percent).  I've also tried playing with various run-time kernel parameters (/proc/sys/net/...) on the 4.4.0 kernel to alleviate the issue but have had no success at all.

I'm at a loss as to what could possibly account for such a discrepancy...
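The "about 1/2" and "about 1/3" figures can be checked directly against the table rows. This small sketch just divides the 4.4.0 figures by the 3.4.2 figures; the numbers are copied from the results above, and the struct and helper names are arbitrary.

```c
#include <assert.h>

/* netperf figures copied from the tables above (3.4.2 vs 4.4.0). */
struct result_pair {
    const char *test;
    double old_tput, new_tput;       /* throughput, 10^6 bits/s */
    double old_sd_recv, new_sd_recv; /* remote (receive) service demand, us/KB */
};

static const struct result_pair results[] = {
    { "TCP",  9370.29, 5314.03, 0.454,  1.765 },
    { "SCTP", 2306.22,  882.74, 3.747, 14.210 },
};

/* new/old: below 1 means 4.4.0 does less, above 1 means it costs more. */
static double ratio(double old_value, double new_value)
{
    assert(old_value > 0.0);
    return new_value / old_value;
}
```

With these inputs, TCP throughput comes out at roughly 0.57 of the 3.4.2 figure and SCTP at roughly 0.38, while the receive-side service demand grows by a factor of about 3.9 for TCP and 3.8 for SCTP, i.e. the receiver spends several times more CPU time per KB.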

* Re: [PATCH] net: phy: Ensure the state machine is called when phy is UP
From: Andrew Lunn @ 2016-04-15 22:05 UTC (permalink / raw)
  To: Alexandre Belloni
  Cc: Florian Fainelli, David S . Miller, Nicolas Ferre, netdev,
	linux-kernel
In-Reply-To: <20160415205613.GE25196@piout.net>

> Trace without my patch:
> libphy: MACB_mii_bus: probed
> macb f8020000.ethernet eth0: Cadence GEM rev 0x00020120 at 0xf8020000 irq 27 (fc:c2:3d:0c:6e:05)
> Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: attached PHY driver [Micrel KSZ8081 or KSZ8091] (mii_bus:phy_addr=f8020000.etherne:01, irq=171)
> Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
> [...]
> Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY

Are there some state changes before this? How is it getting to state
READY? I would expect it to start in DOWN, from when the phy device
was created in phy_device_create().

       Andrew

* Re: [PATCH net-next 7/7] net: dsa: mv88e6xxx: drop switch id
From: Andrew Lunn @ 2016-04-15 21:51 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <87k2jya8ot.fsf@ketchup.mtl.sfl>

On Fri, Apr 15, 2016 at 05:00:50PM -0400, Vivien Didelot wrote:
> Hi Andrew,
> 
> Andrew Lunn <andrew@lunn.ch> writes:
> 
> <snip>
> 
> >> -#define PORT_SWITCH_ID_6350	0x3710
> >> -#define PORT_SWITCH_ID_6351	0x3750
> >> -#define PORT_SWITCH_ID_6352	0x3520
> >
> > NACK
> >
> > These numbers are not obvious. PORT_SWITCH_ID_6320 I can
> > understand. 0x1150, I have no idea what it is.
> 
> 0x1150 is not even correct. That's the product number (bits 4:15) masked
> with an assumed revision 0 (bits 0:3).
> 
> That leads to confusion and error, as seen in the patch 2/7.
> 
> These values are now only used in a device description table, where they
> seem pretty understandable to me.

      { MV88E6XXX_INFO(6320, 0x115, "Marvell 88E6320") },
      { MV88E6XXX_INFO(6320, 0x310, "Marvell 88E6321") },

What does 0x115 have to do with 6320?
What does 0x310 have to do with 6321?

Most do have a pattern, but not all. For a few devices, Marvell has
used /dev/random to pick the ID. Using the macro PORT_SWITCH_ID_6320
documents where these numbers come from, and how to figure out the
correct number for a new device, etc.

> But OK if we really want them defined, I'll introduce 12-bit
> PORT_SWITCH_ID_PROD_NUM_* before dropping the 16-bit
> PORT_SWITCH_ID_*.

I'm O.K. with that.

Thanks
	Andrew

* Re: [PATCH net-next v2] vxlan: synchronously and race-free destruction of vxlan sockets
From: Hannes Frederic Sowa @ 2016-04-15 21:50 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, eric.dumazet, jbenc, marcelo.leitner
In-Reply-To: <20160415.163644.1883719564658558438.davem@davemloft.net>

On 15.04.2016 22:36, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Fri,  8 Apr 2016 22:55:01 +0200
>
>> @@ -1053,7 +1052,9 @@ static void __vxlan_sock_release(struct vxlan_sock *vs)
>>   	vxlan_notify_del_rx_port(vs);
>>   	spin_unlock(&vn->sock_lock);
>>
>> -	queue_work(vxlan_wq, &vs->del_work);
>> +	synchronize_net();
>> +	udp_tunnel_sock_release(vs->sock);
>> +	kfree(vs);
>>   }
>>
>>   static void vxlan_sock_release(struct vxlan_dev *vxlan)
>
> I just want to make sure you saw this change in net-next:
>
> ====================
> commit ca065d0cf80fa547724440a8bf37f1e674d917c0
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Fri Apr 1 08:52:13 2016 -0700
>
>      udp: no longer use SLAB_DESTROY_BY_RCU
> ====================
>
> Does that affect your change?

I have seen that patch, and it does not affect this one:

The socket is looked up from net->vxlan_net->hlist in the fast path. I
don't want to destroy (kernel_sock_shutdown) the socket while packets
could still be in flight through the vxlan stack. Socket memory is
cleaned up after another RCU grace period anyway, but since we only do
this on ifdown of a vxlan interface, I don't see a need to optimize it
further.

All other tunneling protocols don't look up sockets in the fast path, so
they don't need to protect against this.
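The ordering described here (take the socket off the lookup list, wait until no fast-path reader can still be using it, and only then release it) can be sketched in user space as below. This is a deliberately crude stand-in: a single C11 atomic reader counter plays the role of RCU read-side sections and the wait loop plays the role of synchronize_net(). It is not how the kernel implements either, and all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct fake_sock { int port; };

/* Single-slot stand-in for the socket hash list searched in the fast path. */
static _Atomic(struct fake_sock *) sock_slot;
/* Crude stand-in for RCU read-side sections: number of active readers. */
static atomic_int readers;

/* Fast path: find the socket and use it while counted as a reader. */
static int rx_lookup_port(void)
{
    int port = -1;
    struct fake_sock *s;

    atomic_fetch_add(&readers, 1);
    s = atomic_load(&sock_slot);
    if (s != NULL)
        port = s->port;            /* s cannot be freed while we are counted */
    atomic_fetch_sub(&readers, 1);
    return port;
}

/* Publish a new socket; assumes the slot is currently empty. */
static void sock_publish(int port)
{
    struct fake_sock *s = malloc(sizeof(*s));

    assert(s != NULL);
    s->port = port;
    atomic_store(&sock_slot, s);
}

/* Slow path (e.g. ifdown): unlink, wait for readers, then free. */
static void sock_release(void)
{
    /* 1. Unlink: lookups that start from now on find nothing. */
    struct fake_sock *s = atomic_exchange(&sock_slot, NULL);

    /* 2. Wait until no reader can still hold the old pointer. Unlike a real
     *    grace period, this also waits for readers that arrived after the
     *    unlink, so it could starve under constant load. */
    while (atomic_load(&readers) != 0)
        ;

    /* 3. Only now is it safe to free (free(NULL) is a no-op). */
    free(s);
}
```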

Bye,
Hannes

* Re: [PATCH net-next v2] vxlan: synchronously and race-free destruction of vxlan sockets
From: Marcelo Ricardo Leitner @ 2016-04-15 21:47 UTC (permalink / raw)
  To: Stephen Hemminger, Hannes Frederic Sowa
  Cc: Cong Wang, Linux Kernel Network Developers, Eric Dumazet,
	Jiri Benc
In-Reply-To: <20160415135832.773707e3@xeon-e3>

On 15-04-2016 17:58, Stephen Hemminger wrote:
> On Sat, 09 Apr 2016 01:55:06 +0200
> Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
>
>>
>>
>> On Sat, Apr 9, 2016, at 01:24, Cong Wang wrote:
>>> On Fri, Apr 8, 2016 at 1:55 PM, Hannes Frederic Sowa
>>> <hannes@stressinduktion.org> wrote:
>>>> Because the UDP socket is destroyed asynchronously in a work queue,
>>>> we have some nondeterministic behavior when shutting down vxlan
>>>> tunnels and creating new ones. Fix this by keeping the destruction
>>>> process synchronous with respect to the user space process so IFF_UP
>>>> can be reliably set.
>>>>
>>>> udp_tunnel_sock_release destroys vs->sock->sk if the reference count
>>>> indicates so. We expect vxlan_sock and vxlan_sock->sock->sk to have
>>>> the same lifetime, even in fast paths with only RCU locks held. So
>>>> only destroy the whole socket once we can be sure it cannot be found
>>>> by searching vxlan_net->sock_list.
>>>>
>>>
>>> I am wondering: why did we use a work queue from
>>> the beginning?
>>
>> I actually don't know. It was like that from the beginning. I cc'ed
>> Stephen, maybe he remembers?
>>
>> Bye,
>> Hannes
>
> The problem was that VXLAN needs to update multicast settings and that
> can't be done under RTNL.

If the socket destruction was deferred only for this reason, it should be
fine now, because I took care of this multicast issue in commits

56ef9c909b40 ("vxlan: Move socket initialization to within rtnl scope")
54ff9ef36bdf ("ipv4, ipv6: kill ip_mc_{join, leave}_group and ipv6_sock_mc_{join, drop}")

   Marcelo

* Re: [PATCHv3 net-next 0/6] sctp: support sctp_diag in kernel
From: David Miller @ 2016-04-15 21:30 UTC (permalink / raw)
  To: lucien.xin
  Cc: netdev, linux-sctp, marcelo.leitner, vyasevich, daniel,
	eric.dumazet
In-Reply-To: <cover.1460618169.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Thu, 14 Apr 2016 15:35:29 +0800

> This patchset adds the sctp_diag module to implement the diag interface
> for SCTP in the kernel.
 ...

Looks good to me, series applied, thanks.

Please follow up on the suggestion to use jiffies_to_msecs(), thanks.
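For context, jiffies are ticks of the kernel's HZ timer, so a raw jiffies value means different durations on kernels built with different HZ; converting to milliseconds presumably keeps the values dumped through sctp_diag independent of that setting. A simplified sketch, assuming HZ divides 1000 evenly (SKETCH_HZ and the function name are assumptions; the kernel's jiffies_to_msecs() also covers other HZ values):

```c
#include <assert.h>

#define SKETCH_HZ 250   /* assumed tick rate; real kernels use CONFIG_HZ */

/*
 * Simplified jiffies -> milliseconds conversion. Only valid when SKETCH_HZ
 * divides 1000 evenly (e.g. 100, 250, 1000); each tick is then exactly
 * 1000 / SKETCH_HZ ms.
 */
static unsigned int sketch_jiffies_to_msecs(unsigned long j)
{
    _Static_assert(1000 % SKETCH_HZ == 0, "sketch assumes HZ divides 1000");
    return (unsigned int)(j * (1000 / SKETCH_HZ));
}
```

With HZ = 250, each tick is 4 ms, so 250 jiffies correspond to one second.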

* Re: [PATCHv3 net-next 1/6] sctp: add sctp_info dump api for sctp_diag
From: David Miller @ 2016-04-15 21:28 UTC (permalink / raw)
  To: lucien.xin
  Cc: netdev, linux-sctp, marcelo.leitner, vyasevich, daniel,
	eric.dumazet
In-Reply-To: <cd6bf9c2696f125a74491e1f82b77a30bb1005dd.1460618169.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Thu, 14 Apr 2016 15:35:30 +0800

> sctp_diag will dump some important details of an SCTP assoc or ep. We
> use sctp_info to describe them and sctp_get_sctp_info to get them, and
> export that function for sctp_diag.ko.
> 
> v2->v3:
> - don't use list_for_each_safe in sctp_get_sctp_info, because
>   all of its callers hold the socket lock (lock_sock).
> 
> - fill the holes in struct sctp_info with __reserved* fields,
>   because sctp_diag is a new feature and sctp_info is only a first
>   version; it may change in the future.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
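Why fill struct holes: the compiler inserts unnamed padding to align members, how much depends on the ABI, and padding bytes are easy to leave uninitialized when a struct is copied to user space. A minimal sketch with hypothetical structs (not the real struct sctp_info):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Implicit hole: on x86-64, 'b' needs 8-byte alignment, so the compiler
 * inserts 4 unnamed padding bytes after 'a'. On 32-bit x86, uint64_t
 * members are only 4-byte aligned, so there is no hole and 'b' is at 4. */
struct dump_with_hole {
    uint32_t a;
    uint64_t b;
};

/* Explicit filler: the same bytes get a name (the kernel convention is
 * __reserved*), so 'b' sits at offset 8 on both ABIs and the filler can be
 * zeroed now and given a meaning later. */
struct dump_explicit {
    uint32_t a;
    uint32_t reserved1;
    uint64_t b;
};

_Static_assert(offsetof(struct dump_explicit, b) == 8, "fixed offset for b");
_Static_assert(sizeof(struct dump_explicit) == 16, "no hidden padding");
```

The explicit layout is the same on 32-bit and 64-bit builds, which matters for a structure that user space will parse.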

Feedback was given here not to mix the changelog and the commit message.

And I want to explicitly state that I totally and _COMPLETELY_ disagree
with this.

It is absolutely essential information and belongs in the commit message.

Adding more information never hurts, so don't do this crap of putting
things that might be useful to know after the "---", ever.

Someone in the future might ask "why didn't he implement it like XXX"
and the changelog can tell him that originally that is what was done
and feedback was given to do it differently.

So, Xin, thanks for correctly putting the changelog inside of the commit
message, so that future developers can benefit from this knowledge.

* Re: [PATCH for-next V1 0/2] mlx5_core: mlx5_ifc updates
From: David Miller @ 2016-04-15 21:22 UTC (permalink / raw)
  To: saeedm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb
  Cc: saeedm-VPRAkNaXOzVWk0Htik3J/w, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ogerlitz-VPRAkNaXOzVWk0Htik3J/w, matanb-VPRAkNaXOzVWk0Htik3J/w,
	leonro-VPRAkNaXOzVWk0Htik3J/w, talal-VPRAkNaXOzVWk0Htik3J/w
In-Reply-To: <CALzJLG__o56edSf__To-BM6jakuwP7zAAEEQH8LQNTKp-2LRDQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

From: Saeed Mahameed <saeedm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
Date: Fri, 15 Apr 2016 20:10:07 +0300

> On Wed, Apr 13, 2016 at 7:11 PM, Saeed Mahameed <saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>> Hi Dave and Doug
>>
>> Changes from V0:
>>         - 2nd patch commit message fixes.
>>
>> This series includes mlx5_core updates for both the net-next and rdma
>> trees for the 4.7 kernel cycle. This is the only shared code planned
>> for 4.7 between the rdma and net trees. Hopefully, this will prevent
>> future conflicts when merging between ib-next and net-next once the
>> 4.7 cycle is over and the merge window opens.
>>
>> Both Mellanox rdma and net submissions will proceed once this series
>> is applied into both trees.
>>
>> Future shared code will be sent to both maintainers as pull requests
>> from Mellanox's kernel.org tree.
>>
>> We have included all the maintainers of the respective drivers.
>> Kindly review the change and let us know in case of any review comments.
>>
>> Saeed Mahameed (1):
>>   net/mlx5: Update mlx5_ifc hardware features
>>
>> Tariq Toukan (1):
>>   net/mlx5: Fix mlx5 ifc cmd_hca_cap bad offsets
>>
>>  include/linux/mlx5/mlx5_ifc.h |  253 +++++++++++++++++++++++++++++------------
>>  1 files changed, 179 insertions(+), 74 deletions(-)
>>
>> --
> 
> Hi Dave,
> 
> This series is still in "Changes Requested" state in patchwork, but
> there is nothing to change here.
> I will be glad if you give it a shot, it is blocking all of our mlx5
> activities for both net and rdma trees.

Series applied, thanks.

* Re: [PATCH] sctp: simplify sk_receive_queue locking
From: David Miller @ 2016-04-15 21:22 UTC (permalink / raw)
  To: marcelo.leitner; +Cc: netdev, linux-sctp, vyasevich, nhorman
In-Reply-To: <6c4b2f1fab1e792537cc1661b130724d1ea26279.1460583258.git.marcelo.leitner@gmail.com>

From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Wed, 13 Apr 2016 19:12:29 -0300

> SCTP already serializes access to rcvbuf through its sock lock:
> sctp_recvmsg takes it right at the start and releases it at the end,
> while the rx path will also take the lock before doing any socket
> processing. In sctp_rcv() it will check if there is a user using the
> socket and, if there is, it will queue incoming packets to the backlog.
> The backlog processing will do the same. Even timers will do such a
> check and re-schedule if a user is using the socket.
>
> Simplifying this will allow us to remove sctp_skb_list_tail and get rid
> of some expensive locking.  The lists that it is used on are also
> manipulated with functions like __skb_queue_tail and __skb_unlink in the
> same context, like in sctp_ulpq_tail_event() and sctp_clear_pd().
> sctp_close() will also purge those while holding only the sock lock.
>
> Therefore the locking performed by sctp_skb_list_tail() is not
> necessary. This patch removes this function and replaces its calls with
> just skb_queue_splice_tail_init() instead.
>
> The biggest gain is in sctp_ulpq_tail_event(), because the events always
> contain a list, even if it's queueing a single skb, and this was
> triggering expensive calls to spin_lock_irqsave/_irqrestore for every
> data chunk received.
>
> As SCTP will deliver each data chunk on a corresponding recvmsg, the
> smaller the chunks, the more effective the change will be.
> Before this patch, with chunks with 30 bytes:
> netperf -t SCTP_STREAM -H 192.168.1.2 -cC -l 60 -- -m 30 -S 400000 400000 -s 400000 400000
> on a 10Gbit link with 1500 MTU:
 ...
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

Applied, thanks.
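The replacement relies on the queues already being serialized by the socket lock, so moving a whole list onto the receive queue can be a plain O(1) pointer splice with no extra spinlock. The sketch below shows that splice on a minimal circular list with a sentinel head, loosely modelled on struct sk_buff_head; the types and function names are hypothetical, and this is not the kernel's skb_queue_splice_tail_init().

```c
#include <assert.h>

/* Hypothetical minimal queue: circular doubly linked list with a sentinel
 * head and a length, loosely modelled on struct sk_buff_head. */
struct node {
    struct node *next, *prev;
    int val;
};

struct queue {
    struct node head;     /* sentinel; head.next is the first element */
    unsigned int qlen;
};

static void queue_init(struct queue *q)
{
    q->head.next = q->head.prev = &q->head;
    q->qlen = 0;
}

static void queue_tail(struct queue *q, struct node *n)
{
    n->prev = q->head.prev;
    n->next = &q->head;
    q->head.prev->next = n;
    q->head.prev = n;
    q->qlen++;
}

/* Move all nodes of 'from' to the tail of 'to' in O(1), leaving 'from'
 * empty. Takes no lock: the caller must already serialise access to both
 * queues (in the patch, the socket lock plays that role). */
static void queue_splice_tail_init(struct queue *from, struct queue *to)
{
    struct node *first, *last, *tail;

    if (from->qlen == 0)
        return;
    first = from->head.next;
    last = from->head.prev;
    tail = to->head.prev;
    tail->next = first;
    first->prev = tail;
    last->next = &to->head;
    to->head.prev = last;
    to->qlen += from->qlen;
    queue_init(from);
}
```

The cost is a handful of pointer updates regardless of how many chunks the event list carries.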

* Re: [PATCH v2 net-next 0/5] qed/qede: Add tunneling support
From: David Miller @ 2016-04-15 21:08 UTC (permalink / raw)
  To: manish.chopra; +Cc: netdev, Ariel.Elior, Yuval.Mintz
In-Reply-To: <1460612313-20323-1-git-send-email-manish.chopra@qlogic.com>

From: Manish Chopra <manish.chopra@qlogic.com>
Date: Thu, 14 Apr 2016 01:38:28 -0400

> This patch series adds support for VXLAN, GRE and GENEVE tunnels
> to be used over this driver. With this support, the adapter can perform
> TSO, and inner/outer checksum offloads on TX and RX, for
> encapsulated packets.
> 
> V1->V2 [ Comments from Jesse Gross incorporated ]
> * Drop general infrastructure change patch.
>   "net: Make vxlan/geneve default udp ports public"
> * Remove the driver's built-in configuration of the default Linux UDP ports.
>   Instead, use the general registration APIs for UDP port configuration.
> * Remove .ndo_features_check; we will add it later with a proper change.
> 
> Please consider applying this series to net-next.

Series applied, thanks.

* Re: [PATCH net-next] net/hsr: Added support for HSR v1
From: David Miller @ 2016-04-15 21:08 UTC (permalink / raw)
  To: mail
  Cc: arvid.brodin, hannes, sd, henrik, nikolay, tgraf, linville, gospo,
	dsa, eranbe, ast, netdev, peter.heise
In-Reply-To: <20160413115222.GA42572@aircraft-controller>

From: Peter Heise <mail@pheise.de>
Date: Wed, 13 Apr 2016 13:52:22 +0200

> This patch adds support for the newer version 1 of the HSR
> networking standard. Version 0 is still default and the new
> version has to be selected via iproute2.
> 
> Main changes are in the supervision frame handling and its
> ethertype field.
> 
> Signed-off-by: Peter Heise <peter.heise@airbus.com>

Applied.

* Re: [PATCH net-next 4/7] net: dsa: mv88e6xxx: add family to info
From: Vivien Didelot @ 2016-04-15 21:06 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <notmuch-sha1-5b8efc36765bacf253ff62234056042d5c4c5c37>

Vivien Didelot <vivien.didelot@savoirfairelinux.com> writes:

>>> +	{ MV88E6XXX_INFO(6165, 0x165, "Marvell 88E6165") },
>>
>> I think 
>>
>>> +	{ MV88E6XXX_INFO(MV88E6XXX_FAMILY_6165, 0x165, "Marvell 88E6165") },
>>
>> is clearer. It is hard to know what these values mean unless you go
>> look at the macro.
>
> Same goes for the MV88E6XXX_INFO macro... I wanted to avoid long lines
> while keeping the info table clear enough.
>
>     MV88E6XXX_INFO(0x121, "Marvell 88E6123",
>                    MV88E6XXX_FAMILY_6165,) },
>    /*             Family   Prod   Name             */
>    { MV88E6XXX_INFO(6165, 0x121, "Marvell 88E6123") },
>    { MV88E6XXX_INFO(6165, 0x161, "Marvell 88E6161") },
>    { MV88E6XXX_INFO(6165, 0x165, "Marvell 88E6165") },
>    { MV88E6XXX_INFO(6165, 0x165, "Marvell 88E6165") },
>
> But I don't really mind in fact, we'll do as you guys wish.

Oops, sent too fast. Thinking about it, I'll just keep plain struct
mv88e6xxx_info entries in the tables, and we may introduce such a macro
when merging everything together.

Thanks,
Vivien

* Re: [PATCH net-next 7/7] net: dsa: mv88e6xxx: drop switch id
From: Vivien Didelot @ 2016-04-15 21:00 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli
In-Reply-To: <20160415193818.GE18523@lunn.ch>

Hi Andrew,

Andrew Lunn <andrew@lunn.ch> writes:

<snip>

>> -#define PORT_SWITCH_ID_6350	0x3710
>> -#define PORT_SWITCH_ID_6351	0x3750
>> -#define PORT_SWITCH_ID_6352	0x3520
>
> NACK
>
> These numbers are not obvious. PORT_SWITCH_ID_6320 I can
> understand. 0x1150, I have no idea what it is.

0x1150 is not even correct. That's the product number (bits 4:15) masked
with an assumed revision 0 (bits 0:3).

That leads to confusion and error, as seen in the patch 2/7.

These values are now only used in a device description table, where they
seem pretty understandable to me.

This header file is full of inconsistencies. We have masks, offsets,
shifts, shifted and unshifted values, just for the sake of hiding said
magic numbers, while an explicit comment in a function could do the job.

But OK if we really want them defined, I'll introduce 12-bit
PORT_SWITCH_ID_PROD_NUM_* before dropping the 16-bit PORT_SWITCH_ID_*.
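As described above, the 16-bit register value packs the product number into bits 4 to 15 and the revision into bits 0 to 3, so a 12-bit PORT_SWITCH_ID_PROD_NUM_* value is just the old 16-bit constant shifted right by four. A minimal sketch of the split; the helper names are hypothetical, not the driver's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: split a port switch ID register value. */
static inline uint16_t switch_id_prod_num(uint16_t id)
{
    return id >> 4;          /* bits 15:4, 12-bit product number */
}

static inline uint8_t switch_id_rev(uint16_t id)
{
    return id & 0x000f;      /* bits 3:0, revision */
}
```

For example, the 16-bit 0x3520 from the removed PORT_SWITCH_ID_6352 define corresponds to product number 0x352, and whatever revision sits in the low nibble no longer affects the match.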

Thanks,
Vivien

* Re: [PATCH net-next v2] vxlan: synchronously and race-free destruction of vxlan sockets
From: Stephen Hemminger @ 2016-04-15 20:58 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Cong Wang, Linux Kernel Network Developers, Eric Dumazet,
	Jiri Benc, Marcelo Ricardo Leitner
In-Reply-To: <1460159706.2880965.573380353.39845928@webmail.messagingengine.com>

On Sat, 09 Apr 2016 01:55:06 +0200
Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:

> 
> 
> On Sat, Apr 9, 2016, at 01:24, Cong Wang wrote:
> > On Fri, Apr 8, 2016 at 1:55 PM, Hannes Frederic Sowa
> > <hannes@stressinduktion.org> wrote:
> > > Because the UDP socket is destroyed asynchronously in a work queue,
> > > we have some nondeterministic behavior when shutting down vxlan
> > > tunnels and creating new ones. Fix this by keeping the destruction
> > > process synchronous with respect to the user space process so IFF_UP
> > > can be reliably set.
> > >
> > > udp_tunnel_sock_release destroys vs->sock->sk if the reference count
> > > indicates so. We expect vxlan_sock and vxlan_sock->sock->sk to have
> > > the same lifetime, even in fast paths with only RCU locks held. So
> > > only destroy the whole socket once we can be sure it cannot be found
> > > by searching vxlan_net->sock_list.
> > >
> > 
> > I am wondering: why did we use a work queue from
> > the beginning?
> 
> I actually don't know. It was like that from the beginning. I cc'ed
> Stephen, maybe he remembers?
> 
> Bye,
> Hannes

The problem was that VXLAN needs to update multicast settings and that
can't be done under RTNL.

* Re: [PATCH] net: phy: Ensure the state machine is called when phy is UP
From: Alexandre Belloni @ 2016-04-15 20:56 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: David S . Miller, Nicolas Ferre, netdev, linux-kernel,
	Andrew Lunn
In-Reply-To: <57114AA4.5080803@gmail.com>

On 15/04/2016 at 13:10:12 -0700, Florian Fainelli wrote :
> On 15/04/16 12:56, Alexandre Belloni wrote:
> > Commit d5c3d84657db ("net: phy: Avoid polling PHY with
> > PHY_IGNORE_INTERRUPTS") removed the last polling done on the phy. Since
> > then, the last actual poll done on the phy happens PHY_STATE_TIME seconds
> > (that is actually one second) after registering the phy. If the interface
> > is not UP by that time, any previous IRQ indicating the link is up is
> > ignored. Moreover, nothing will start the autonegotiation so the phy will
> > simply change from READY to UP and never actually go to RUNNING.
> 
> What do you mean by that? phy_start() will start auto-negotiation.
> 

In my case, it doesn't because it switches the state from PHY_READY to
PHY_UP but phy_state_machine() is never called afterwards.
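A toy model may help to see why nothing happens: after d5c3d84657db the state machine work is only re-armed when the PHY is polled, so in interrupt mode the READY -> UP transition made by phy_start() just sits there until something runs the state machine again. The sketch is a hypothetical simplification (a few states, a boolean instead of the delayed work, and a flag for the proposed "run it on READY -> UP" fix), not phylib code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: state names borrowed from phylib, logic simplified. */
enum phy_state_sketch { S_READY, S_UP, S_AN, S_RUNNING };

struct phy_sketch {
    enum phy_state_sketch state;
    bool polling;        /* like phydev->irq == PHY_POLL */
    bool work_queued;    /* stands in for the delayed state_queue work */
    bool kick_on_start;  /* models "run the state machine on READY -> UP" */
};

/* One pass of the state machine. */
static void state_machine(struct phy_sketch *p)
{
    p->work_queued = false;
    switch (p->state) {
    case S_UP:
        p->state = S_AN;          /* start autonegotiation */
        break;
    case S_AN:
        p->state = S_RUNNING;     /* pretend autonegotiation completed */
        break;
    default:
        break;                    /* READY and RUNNING: nothing to do */
    }
    if (p->polling)               /* since d5c3d84657db: re-arm only when polling */
        p->work_queued = true;
}

static void phy_start_sketch(struct phy_sketch *p)
{
    if (p->state == S_READY)
        p->state = S_UP;
    if (p->kick_on_start)
        p->work_queued = true;    /* proposed fix: run it right away */
}

/* A PHY interrupt (e.g. link up) schedules the state machine. */
static void phy_irq_sketch(struct phy_sketch *p)
{
    p->work_queued = true;
}

/* Run pending work, bounded so a polled PHY does not loop forever. */
static void run_pending(struct phy_sketch *p, int max_runs)
{
    while (p->work_queued && max_runs-- > 0)
        state_machine(p);
}
```

In interrupt mode without the kick, the model stays in UP, which matches the trace without the patch; with the kick, it moves on to AN and a later link interrupt completes the transition. A polled PHY gets there on its own.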

> > The one second delay explains why the issue is not seen when booting from
> > NFS or when the interface is configured at boot time.
> > 
> > To solve that, ensure the state machine is called as soon as the state
> > changes from READY to UP.
> 
> The fix may be good, but I would like to know which driver you are
> observing this with. Also, having a capture of the PHY state machine
> with debug prints enabled could help us figure out the sequence of
> events leading to what you observed.
> 

I'm using a macb with a Micrel KSZ8081.

Trace without my patch:
libphy: MACB_mii_bus: probed
macb f8020000.ethernet eth0: Cadence GEM rev 0x00020120 at 0xf8020000 irq 27 (fc:c2:3d:0c:6e:05)
Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: attached PHY driver [Micrel KSZ8081 or KSZ8091] (mii_bus:phy_addr=f8020000.etherne:01, irq=171)
Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
[...]
Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
[...]
# ifconfig eth0 up
IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready


With my patch:
libphy: MACB_mii_bus: probed
macb f8020000.ethernet eth0: Cadence GEM rev 0x00020120 at 0xf8020000 irq 27 (fc:c2:3d:0c:6e:05)
Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: attached PHY driver [Micrel KSZ8081 or KSZ8091] (mii_bus:phy_addr=f8020000.etherne:01, irq=171)
Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
[...]
Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change READY -> READY
[...]
# ifconfig eth0 up
IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change UP -> AN
Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change AN -> NOLINK
macb f8020000.ethernet eth0: link up (100/Full)
Micrel KSZ8081 or KSZ8091 f8020000.etherne:01: PHY state change CHANGELINK -> RUNNING
IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready



> Assuming you might be using the macb driver, I see a race condition in
> how macb_probe() registers for its MDIO bus and probes for the PHY,
> after calling register_netdev(), which is something that is not good,
> because as soon as register_netdev() is called, an in-kernel notifier
> can start opening the device for use before you have returned...
> 

Well, I'm not sure I'm running into that, because phy_start() is only called
once I open the interface from userspace.


-- 
Alexandre Belloni, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
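A toy model of the failure mode described in this thread, assuming (as the commit message says) that nothing polls the PHY any more, so the state machine only advances when something explicitly runs it. The state names follow the trace, but struct toy_phy, the single-step toy_state_machine() and the kick flag are hypothetical simplifications, not the phylib API. The real state machine also passes through CHANGELINK and reschedules itself; in the toy, the caller stands in for that rescheduling.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of states, named after the ones in the trace above. */
enum toy_phy_state { PHY_READY, PHY_UP, PHY_AN, PHY_NOLINK, PHY_RUNNING };

struct toy_phy {
	enum toy_phy_state state;
	bool link;    /* link status the hardware would report */
	int runs;     /* how many times the state machine has run */
};

/* One step of the state machine. UP kicks off autonegotiation; AN and
 * NOLINK resolve according to the reported link status. */
static void toy_state_machine(struct toy_phy *phy)
{
	phy->runs++;
	switch (phy->state) {
	case PHY_UP:
		phy->state = PHY_AN;
		break;
	case PHY_AN:
	case PHY_NOLINK:
		phy->state = phy->link ? PHY_RUNNING : PHY_NOLINK;
		break;
	default:
		break;
	}
}

/* Opening the interface moves READY to UP. Without polling, nothing runs
 * the state machine afterwards unless 'kick' is set, which models the
 * proposed fix of calling it as soon as the state changes to UP. */
static void toy_phy_start(struct toy_phy *phy, bool kick)
{
	if (phy->state != PHY_READY)
		return;
	phy->state = PHY_UP;
	if (kick)
		toy_state_machine(phy);
}
```

Without the kick the PHY sits in UP forever even though the link is up, which matches the "link is not ready" trace; with it, autonegotiation starts immediately.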


* Re: [PATCH net-next 0/2] tcp: final work on SYNFLOOD behavior
From: David Miller @ 2016-04-15 20:46 UTC
  To: edumazet; +Cc: netdev, eric.dumazet
In-Reply-To: <1460610340-22163-1-git-send-email-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Wed, 13 Apr 2016 22:05:38 -0700

> In the first patch, I remove the costly association of SYNACK+COOKIES
> to a listener. I believe other parts of the stack should be ready.
> 
> The second patch removes a useless write into listener socket
> in tcp_rcv_state_process(), incurring false sharing in
> tcp_conn_request()
> 
> Performance under SYNFLOOD goes from 3.2 Mpps to 6 Mpps.

Geese, you almost make it look too easy....

> Test was using a single TCP listener, on a host with 8 RX queues
> on the NIC, and 24 cores (48 ht)

Looks good, series applied, thanks Eric.


* Re: [PATCH] qlge: Replace create_singlethread_workqueue with alloc_ordered_workqueue
From: David Miller @ 2016-04-15 20:42 UTC
  To: amitoj1606
  Cc: harish.patil, sudarsana.kalluru, Dept-GELinuxNICDev, linux-driver,
	netdev, linux-kernel, tj
In-Reply-To: <20160409115744.GA30104@amitoj-Inspiron-3542>

From: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Date: Sat, 9 Apr 2016 17:27:45 +0530

> Replace deprecated create_singlethread_workqueue with
> alloc_ordered_workqueue.
> 
> Work items include getting tx/rx frame sizes, resetting MPI processor,
> setting asic recovery bit so ordering seems necessary as only one work
> item should be in queue/executing at any given time, hence the use of
> alloc_ordered_workqueue.
> 
> The WQ_MEM_RECLAIM flag has been set since ethernet devices seem to sit in
> the memory reclaim path, so as to guarantee forward progress regardless of
> memory pressure.
> 
> Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
> Acked-by: Tejun Heo <tj@kernel.org>

I'll apply this to net-next, thanks.
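The ordering property the commit message relies on (at most one work item executing at any time, in queueing order) can be modelled in a few lines. This is a single-threaded toy with hypothetical names (struct owq, owq_queue(), owq_drain()), not the kernel workqueue API. It does not model WQ_MEM_RECLAIM, whose rescuer thread is what guarantees forward progress under memory pressure.

```c
#include <assert.h>
#include <stddef.h>

#define OWQ_MAX 16

/* Toy ordered queue: a FIFO drained by a single worker. */
struct owq {
	void (*fn[OWQ_MAX])(int);
	int arg[OWQ_MAX];
	size_t head, tail;   /* free-running indices, reduced modulo OWQ_MAX */
	int running;         /* items currently executing */
};

static int owq_queue(struct owq *q, void (*fn)(int), int arg)
{
	if (q->tail - q->head == OWQ_MAX)
		return -1;   /* full */
	q->fn[q->tail % OWQ_MAX] = fn;
	q->arg[q->tail % OWQ_MAX] = arg;
	q->tail++;
	return 0;
}

/* The single worker: take the oldest item, run it to completion, repeat.
 * Because nothing else pops items, no two items ever overlap. */
static void owq_drain(struct owq *q)
{
	while (q->head != q->tail) {
		size_t i = q->head++ % OWQ_MAX;

		assert(q->running == 0);
		q->running++;
		q->fn[i](q->arg[i]);
		q->running--;
	}
}

/* Example work function that records the order it was called in. */
static int seen[OWQ_MAX];
static size_t nseen;

static void record(int v)
{
	seen[nseen++] = v;
}
```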


* Re: [PATCH net-next v2] vxlan: synchronously and race-free destruction of vxlan sockets
From: David Miller @ 2016-04-15 20:36 UTC
  To: hannes; +Cc: netdev, eric.dumazet, jbenc, marcelo.leitner
In-Reply-To: <1460148901-23740-1-git-send-email-hannes@stressinduktion.org>

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Fri,  8 Apr 2016 22:55:01 +0200

> @@ -1053,7 +1052,9 @@ static void __vxlan_sock_release(struct vxlan_sock *vs)
>  	vxlan_notify_del_rx_port(vs);
>  	spin_unlock(&vn->sock_lock);
>  
> -	queue_work(vxlan_wq, &vs->del_work);
> +	synchronize_net();
> +	udp_tunnel_sock_release(vs->sock);
> +	kfree(vs);
>  }
>  
>  static void vxlan_sock_release(struct vxlan_dev *vxlan)

I just want to make sure you saw this change in net-next:

====================
commit ca065d0cf80fa547724440a8bf37f1e674d917c0
Author: Eric Dumazet <edumazet@google.com>
Date:   Fri Apr 1 08:52:13 2016 -0700

    udp: no longer use SLAB_DESTROY_BY_RCU
====================

Does that affect your change?
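For reference, the teardown order in the quoted hunk (remove the socket from vxlan_net->sock_list, synchronize_net(), then udp_tunnel_sock_release() and kfree()) can be sketched as a single-threaded toy. All names are hypothetical, and the grace period is reduced to an assertion that no reader is active; a real RCU grace period blocks until every pre-existing reader has finished, which is what makes the final free safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct vsock {
	int port;
	struct vsock *next;
};

struct vsock_list {
	struct vsock *head;
	int readers;  /* stand-in for CPUs inside an RCU read-side section */
};

static bool vsock_add(struct vsock_list *l, int port)
{
	struct vsock *vs = malloc(sizeof(*vs));

	if (!vs)
		return false;
	vs->port = port;
	vs->next = l->head;
	l->head = vs;
	return true;
}

/* Reader side: real callers would be inside a read-side section. */
static struct vsock *vsock_find(struct vsock_list *l, int port)
{
	for (struct vsock *vs = l->head; vs; vs = vs->next)
		if (vs->port == port)
			return vs;
	return NULL;
}

/* Stand-in for synchronize_net(): this single-threaded model can only
 * assert that no reader is active, rather than wait for them. */
static void wait_for_readers(const struct vsock_list *l)
{
	assert(l->readers == 0);
}

static bool vsock_release(struct vsock_list *l, int port)
{
	struct vsock **pp = &l->head;

	while (*pp && (*pp)->port != port)
		pp = &(*pp)->next;
	if (!*pp)
		return false;

	struct vsock *vs = *pp;
	*pp = vs->next;        /* 1. unlink: new lookups can no longer find it */
	wait_for_readers(l);   /* 2. let readers that may still hold it finish */
	free(vs);              /* 3. only now destroy it, in the caller's context */
	return true;
}
```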


* [PATCH net-next 3/3] ila: add checksum neutral ILA translations
From: Tom Herbert @ 2016-04-15 20:34 UTC
  To: davem, netdev; +Cc: kernel-team
In-Reply-To: <1460752452-3328784-1-git-send-email-tom@herbertland.com>

Support checksum neutral ILA as described in the ILA draft. The low
order 16 bits of the identifier are used to contain the checksum
adjustment value.

The csum-mode parameter is added to describe checksum processing. There
are three values:
 - adjust transport checksum (previous behavior)
 - do checksum neutral mapping
 - do nothing

On output the csum-mode in the ila_params is checked and acted on. If
mode is checksum neutral mapping then do the mapping and set the C-bit.

On input, the C-bit is checked. If it is set, checksum-neutral mapping is
done (regardless of csum-mode in ila params) and C-bit will be cleared.
If it is not set then action in csum-mode is taken.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/uapi/linux/ila.h  |  7 +++++
 net/ipv6/ila/ila.h        | 16 ++++++++--
 net/ipv6/ila/ila_common.c | 74 +++++++++++++++++++++++++++++++++++++++++++++--
 net/ipv6/ila/ila_lwt.c    | 14 +++++++--
 net/ipv6/ila/ila_xlat.c   | 16 +++++-----
 5 files changed, 112 insertions(+), 15 deletions(-)

diff --git a/include/uapi/linux/ila.h b/include/uapi/linux/ila.h
index abde7bb..8ac61b8 100644
--- a/include/uapi/linux/ila.h
+++ b/include/uapi/linux/ila.h
@@ -14,6 +14,7 @@ enum {
 	ILA_ATTR_LOCATOR_MATCH,			/* u64 */
 	ILA_ATTR_IFINDEX,			/* s32 */
 	ILA_ATTR_DIR,				/* u32 */
+	ILA_ATTR_CSUM_MODE,			/* u8 */
 
 	__ILA_ATTR_MAX,
 };
@@ -34,4 +35,10 @@ enum {
 #define ILA_DIR_IN	(1 << 0)
 #define ILA_DIR_OUT	(1 << 1)
 
+enum {
+	ILA_CSUM_ADJUST_TRANSPORT,
+	ILA_CSUM_NEUTRAL_MAP,
+	ILA_CSUM_NO_ACTION,
+};
+
 #endif /* _UAPI_LINUX_ILA_H */
diff --git a/net/ipv6/ila/ila.h b/net/ipv6/ila/ila.h
index f532967..d08fd2d 100644
--- a/net/ipv6/ila/ila.h
+++ b/net/ipv6/ila/ila.h
@@ -36,11 +36,13 @@ struct ila_identifier {
 	union {
 		struct {
 #if defined(__LITTLE_ENDIAN_BITFIELD)
-			u8 __space:5;
+			u8 __space:4;
+			u8 csum_neutral:1;
 			u8 type:3;
 #elif defined(__BIG_ENDIAN_BITFIELD)
 			u8 type:3;
-			u8 __space:5;
+			u8 csum_neutral:1;
+			u8 __space:4;
 #else
 #error  "Adjust your <asm/byteorder.h> defines"
 #endif
@@ -64,6 +66,8 @@ enum {
 	ILA_ATYPE_RSVD_3,
 };
 
+#define CSUM_NEUTRAL_FLAG	htonl(0x10000000)
+
 struct ila_addr {
 	union {
 		struct in6_addr addr;
@@ -88,6 +92,7 @@ struct ila_params {
 	struct ila_locator locator;
 	struct ila_locator locator_match;
 	__wsum csum_diff;
+	u8 csum_mode;
 };
 
 static inline __wsum compute_csum_diff8(const __be32 *from, const __be32 *to)
@@ -99,8 +104,15 @@ static inline __wsum compute_csum_diff8(const __be32 *from, const __be32 *to)
 	return csum_partial(diff, sizeof(diff), 0);
 }
 
+static inline bool ila_csum_neutral_set(struct ila_identifier ident)
+{
+	return !!(ident.csum_neutral);
+}
+
 void ila_update_ipv6_locator(struct sk_buff *skb, struct ila_params *p);
 
+void ila_init_saved_csum(struct ila_params *p);
+
 int ila_lwt_init(void);
 void ila_lwt_fini(void);
 int ila_xlat_init(void);
diff --git a/net/ipv6/ila/ila_common.c b/net/ipv6/ila/ila_common.c
index c3078d0..0e94042 100644
--- a/net/ipv6/ila/ila_common.c
+++ b/net/ipv6/ila/ila_common.c
@@ -17,21 +17,50 @@ static __wsum get_csum_diff(struct ipv6hdr *ip6h, struct ila_params *p)
 {
 	struct ila_addr *iaddr = ila_a2i(&ip6h->daddr);
 
-	if (iaddr->loc.v64 == p->locator_match.v64)
+	if (p->locator_match.v64)
 		return p->csum_diff;
 	else
 		return compute_csum_diff8((__be32 *)&iaddr->loc,
 					  (__be32 *)&p->locator);
 }
 
-void ila_update_ipv6_locator(struct sk_buff *skb, struct ila_params *p)
+static void ila_csum_do_neutral(struct ila_addr *iaddr,
+				struct ila_params *p)
+{
+	__sum16 *adjust = (__force __sum16 *)&iaddr->ident.v16[3];
+	__wsum diff, fval;
+
+	/* Check if checksum adjust value has been cached */
+	if (p->locator_match.v64) {
+		diff = p->csum_diff;
+	} else {
+		diff = compute_csum_diff8((__be32 *)iaddr,
+					  (__be32 *)&p->locator);
+	}
+
+	fval = (__force __wsum)(ila_csum_neutral_set(iaddr->ident) ?
+			~CSUM_NEUTRAL_FLAG : CSUM_NEUTRAL_FLAG);
+
+	diff = csum_add(diff, fval);
+
+	*adjust = ~csum_fold(csum_add(diff, csum_unfold(*adjust)));
+
+	/* Flip the csum-neutral bit. Either we are doing a SIR->ILA
+	 * translation with ILA_CSUM_NEUTRAL_MAP as the csum_method
+	 * and the C-bit is not set, or we are doing an ILA-SIR
+	 * tranlsation and the C-bit is set.
+	 */
+	iaddr->ident.csum_neutral ^= 1;
+}
+
+static void ila_csum_adjust_transport(struct sk_buff *skb,
+				      struct ila_params *p)
 {
 	__wsum diff;
 	struct ipv6hdr *ip6h = ipv6_hdr(skb);
 	struct ila_addr *iaddr = ila_a2i(&ip6h->daddr);
 	size_t nhoff = sizeof(struct ipv6hdr);
 
-	/* First update checksum */
 	switch (ip6h->nexthdr) {
 	case NEXTHDR_TCP:
 		if (likely(pskb_may_pull(skb, nhoff + sizeof(struct tcphdr)))) {
@@ -74,6 +103,45 @@ void ila_update_ipv6_locator(struct sk_buff *skb, struct ila_params *p)
 	iaddr->loc = p->locator;
 }
 
+void ila_update_ipv6_locator(struct sk_buff *skb, struct ila_params *p)
+{
+	struct ipv6hdr *ip6h = ipv6_hdr(skb);
+	struct ila_addr *iaddr = ila_a2i(&ip6h->daddr);
+
+	/* First deal with the transport checksum */
+	if (ila_csum_neutral_set(iaddr->ident)) {
+		/* C-bit is set in the locator indicating that this
+		 * is a locator being translated to a SIR address.
+		 * Perform (receiver) checksum-neutral translation.
+		 */
+		ila_csum_do_neutral(iaddr, p);
+	} else {
+		switch (p->csum_mode) {
+		case ILA_CSUM_ADJUST_TRANSPORT:
+			ila_csum_adjust_transport(skb, p);
+			break;
+		case ILA_CSUM_NEUTRAL_MAP:
+			ila_csum_do_neutral(iaddr, p);
+			break;
+		case ILA_CSUM_NO_ACTION:
+			break;
+		}
+	}
+
+	/* Now change destination address */
+	iaddr->loc = p->locator;
+}
+
+void ila_init_saved_csum(struct ila_params *p)
+{
+	if (!p->locator_match.v64)
+		return;
+
+	p->csum_diff = compute_csum_diff8(
+				(__be32 *)&p->locator_match,
+				(__be32 *)&p->locator);
+}
+
 static int __init ila_init(void)
 {
 	int ret;
diff --git a/net/ipv6/ila/ila_lwt.c b/net/ipv6/ila/ila_lwt.c
index 27e68de..e81e39a 100644
--- a/net/ipv6/ila/ila_lwt.c
+++ b/net/ipv6/ila/ila_lwt.c
@@ -53,6 +53,7 @@ drop:
 
 static struct nla_policy ila_nl_policy[ILA_ATTR_MAX + 1] = {
 	[ILA_ATTR_LOCATOR] = { .type = NLA_U64, },
+	[ILA_ATTR_CSUM_MODE] = { .type = NLA_U8, },
 };
 
 static int ila_build_state(struct net_device *dev, struct nlattr *nla,
@@ -79,8 +80,10 @@ static int ila_build_state(struct net_device *dev, struct nlattr *nla,
 
 	iaddr = (struct ila_addr *)&cfg6->fc_dst;
 
-	if (!ila_is_ila_addr(iaddr)) {
-		/* Don't allow setting a translation for a non-ILA address */
+	if (!ila_addr_is_ila(iaddr) || ila_csum_neutral_set(iaddr->ident)) {
+		/* Don't allow translation for a non-ILA address or checksum
+		 * neutral flag to be set.
+		 */
 		return -EINVAL;
 	}
 
@@ -108,6 +111,11 @@ static int ila_build_state(struct net_device *dev, struct nlattr *nla,
 	p->csum_diff = compute_csum_diff8(
 		(__be32 *)&p->locator_match, (__be32 *)&p->locator);
 
+	if (tb[ILA_ATTR_CSUM_MODE])
+		p->csum_mode = nla_get_u8(tb[ILA_ATTR_CSUM_MODE]);
+
+	ila_init_saved_csum(p);
+
 	newts->type = LWTUNNEL_ENCAP_ILA;
 	newts->flags |= LWTUNNEL_STATE_OUTPUT_REDIRECT |
 			LWTUNNEL_STATE_INPUT_REDIRECT;
@@ -124,6 +132,8 @@ static int ila_fill_encap_info(struct sk_buff *skb,
 
 	if (nla_put_u64(skb, ILA_ATTR_LOCATOR, (__force u64)p->locator.v64))
 		goto nla_put_failure;
+	if (nla_put_u64(skb, ILA_ATTR_CSUM_MODE, (__force u8)p->csum_mode))
+		goto nla_put_failure;
 
 	return 0;
 
diff --git a/net/ipv6/ila/ila_xlat.c b/net/ipv6/ila/ila_xlat.c
index d17d429..c0323a2 100644
--- a/net/ipv6/ila/ila_xlat.c
+++ b/net/ipv6/ila/ila_xlat.c
@@ -132,6 +132,7 @@ static struct nla_policy ila_nl_policy[ILA_ATTR_MAX + 1] = {
 	[ILA_ATTR_LOCATOR] = { .type = NLA_U64, },
 	[ILA_ATTR_LOCATOR_MATCH] = { .type = NLA_U64, },
 	[ILA_ATTR_IFINDEX] = { .type = NLA_U32, },
+	[ILA_ATTR_CSUM_MODE] = { .type = NLA_U8, },
 };
 
 static int parse_nl_config(struct genl_info *info,
@@ -147,6 +148,9 @@ static int parse_nl_config(struct genl_info *info,
 		xp->ip.locator_match.v64 = (__force __be64)nla_get_u64(
 			info->attrs[ILA_ATTR_LOCATOR_MATCH]);
 
+	if (info->attrs[ILA_ATTR_CSUM_MODE])
+		xp->ip.csum_mode = nla_get_u8(info->attrs[ILA_ATTR_CSUM_MODE]);
+
 	if (info->attrs[ILA_ATTR_IFINDEX])
 		xp->ifindex = nla_get_s32(info->attrs[ILA_ATTR_IFINDEX]);
 
@@ -249,14 +253,9 @@ static int ila_add_mapping(struct net *net, struct ila_xlat_params *xp)
 	if (!ila)
 		return -ENOMEM;
 
-	ila->xp = *xp;
+	ila_init_saved_csum(&xp->ip);
 
-	/* Precompute checksum difference for translation since we
-	 * know both the old identifier and the new one.
-	 */
-	ila->xp.ip.csum_diff = compute_csum_diff8(
-		(__be32 *)&xp->ip.locator_match,
-		(__be32 *)&xp->ip.locator);
+	ila->xp = *xp;
 
 	order = ila_order(ila);
 
@@ -406,7 +405,8 @@ static int ila_fill_info(struct ila_map *ila, struct sk_buff *msg)
 			(__force u64)ila->xp.ip.locator.v64) ||
 	    nla_put_u64(msg, ILA_ATTR_LOCATOR_MATCH,
 			(__force u64)ila->xp.ip.locator_match.v64) ||
-	    nla_put_s32(msg, ILA_ATTR_IFINDEX, ila->xp.ifindex))
+	    nla_put_s32(msg, ILA_ATTR_IFINDEX, ila->xp.ifindex) ||
+	    nla_put_u32(msg, ILA_ATTR_CSUM_MODE, ila->xp.ip.csum_mode))
 		return -1;
 
 	return 0;
-- 
2.8.0.rc2
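The checksum-neutral idea can be checked with ordinary ones' complement arithmetic. The sketch below is a simplified model, not the patch's code: it treats the IPv6 address as eight 16-bit words held in host order (words 0-3 the locator, 4-7 the identifier), uses 0x1000 in word 4 for the C-bit (the top half of htonl(0x10000000) in CSUM_NEUTRAL_FLAG), and recomputes the sums directly instead of using the cached csum_diff and flag adjustment that ila_csum_do_neutral() uses. The function names are hypothetical. Because the ones' complement sum over the address is unchanged, so is any transport checksum whose pseudo-header covers it.

```c
#include <assert.h>
#include <stdint.h>

#define C_BIT 0x1000  /* csum-neutral flag: bit 4 of the identifier's first byte */

/* Add two 16-bit values in ones' complement arithmetic (end-around carry). */
static uint16_t oc_add(uint16_t a, uint16_t b)
{
	uint32_t s = (uint32_t)a + b;

	return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* Ones' complement sum of n 16-bit words. */
static uint16_t oc_sum(const uint16_t *w, int n)
{
	uint16_t s = 0;

	for (int i = 0; i < n; i++)
		s = oc_add(s, w[i]);
	return s;
}

/* Replace the locator (words 0-3) and flip the C-bit in word 4, then fix
 * up word 7 (the identifier's low-order 16 bits) so that the ones'
 * complement sum over the whole address is unchanged. */
static void neutral_map(uint16_t addr[8], const uint16_t new_loc[4])
{
	uint16_t old_sum = oc_sum(addr, 7);  /* words 0..6 before the change */

	for (int i = 0; i < 4; i++)
		addr[i] = new_loc[i];
	addr[4] ^= C_BIT;

	uint16_t new_sum = oc_sum(addr, 7);  /* words 0..6 after the change */

	/* word7' = word7 + old_sum - new_sum; subtracting in ones'
	 * complement means adding the bitwise complement. */
	addr[7] = oc_add(oc_add(addr[7], old_sum), (uint16_t)~new_sum);
}
```

Applying the mapping a second time with the original locator flips the C-bit back and restores the identifier, which is the receiver-side (ILA to SIR) direction described in the commit message.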


* [PATCH net-next 2/3] ila: xlat changes
From: Tom Herbert @ 2016-04-15 20:34 UTC
  To: davem, netdev; +Cc: kernel-team
In-Reply-To: <1460752452-3328784-1-git-send-email-tom@herbertland.com>

Change the model of xlat so that it is used only for input, where lookup is done on
the locator part of an address (comparing to locator_match as key
in rhashtable). This is needed for checksum neutral translation
which obfuscates the low order 16 bits of the identifier. It also
permits hosts to be in multiple ILA domains (each locator can map
to a different SIR address). A check is also added to disallow
translating non-ILA addresses (check of type in identifier).

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 net/ipv6/ila/ila_xlat.c | 102 ++++++++++++++++--------------------------------
 1 file changed, 34 insertions(+), 68 deletions(-)

diff --git a/net/ipv6/ila/ila_xlat.c b/net/ipv6/ila/ila_xlat.c
index 075c782..d17d429 100644
--- a/net/ipv6/ila/ila_xlat.c
+++ b/net/ipv6/ila/ila_xlat.c
@@ -11,9 +11,7 @@
 
 struct ila_xlat_params {
 	struct ila_params ip;
-	struct ila_identifier identifier;
 	int ifindex;
-	unsigned int dir;
 };
 
 struct ila_map {
@@ -66,35 +64,29 @@ static __always_inline void __ila_hash_secret_init(void)
 	net_get_random_once(&hashrnd, sizeof(hashrnd));
 }
 
-static inline u32 ila_identifier_hash(struct ila_identifier ident)
+static inline u32 ila_locator_hash(struct ila_locator loc)
 {
-	u32 *v = (u32 *)ident.v32;
+	u32 *v = (u32 *)loc.v32;
 
 	return jhash_2words(v[0], v[1], hashrnd);
 }
 
 static inline spinlock_t *ila_get_lock(struct ila_net *ilan,
-				       struct ila_identifier ident)
+				       struct ila_locator loc)
 {
-	return &ilan->locks[ila_identifier_hash(ident) & ilan->locks_mask];
+	return &ilan->locks[ila_locator_hash(loc) & ilan->locks_mask];
 }
 
 static inline int ila_cmp_wildcards(struct ila_map *ila,
-				    struct ila_addr *iaddr, int ifindex,
-				    unsigned int dir)
+				    struct ila_addr *iaddr, int ifindex)
 {
-	return (ila->xp.ip.locator_match.v64 &&
-		ila->xp.ip.locator_match.v64 != iaddr->loc.v64) ||
-	       (ila->xp.ifindex && ila->xp.ifindex != ifindex) ||
-	       !(ila->xp.dir & dir);
+	return (ila->xp.ifindex && ila->xp.ifindex != ifindex);
 }
 
 static inline int ila_cmp_params(struct ila_map *ila,
 				 struct ila_xlat_params *xp)
 {
-	return (ila->xp.ip.locator_match.v64 != xp->ip.locator_match.v64) ||
-	       (ila->xp.ifindex != xp->ifindex) ||
-	       (ila->xp.dir != xp->dir);
+	return (ila->xp.ifindex != xp->ifindex);
 }
 
 static int ila_cmpfn(struct rhashtable_compare_arg *arg,
@@ -102,16 +94,13 @@ static int ila_cmpfn(struct rhashtable_compare_arg *arg,
 {
 	const struct ila_map *ila = obj;
 
-	return (ila->xp.identifier.v64 != *(__be64 *)arg->key);
+	return (ila->xp.ip.locator_match.v64 != *(__be64 *)arg->key);
 }
 
 static inline int ila_order(struct ila_map *ila)
 {
 	int score = 0;
 
-	if (ila->xp.ip.locator_match.v64)
-		score += 1 << 0;
-
 	if (ila->xp.ifindex)
 		score += 1 << 1;
 
@@ -121,7 +110,7 @@ static inline int ila_order(struct ila_map *ila)
 static const struct rhashtable_params rht_params = {
 	.nelem_hint = 1024,
 	.head_offset = offsetof(struct ila_map, node),
-	.key_offset = offsetof(struct ila_map, xp.identifier),
+	.key_offset = offsetof(struct ila_map, xp.ip.locator_match),
 	.key_len = sizeof(u64), /* identifier */
 	.max_size = 1048576,
 	.min_size = 256,
@@ -140,11 +129,9 @@ static struct genl_family ila_nl_family = {
 };
 
 static struct nla_policy ila_nl_policy[ILA_ATTR_MAX + 1] = {
-	[ILA_ATTR_IDENTIFIER] = { .type = NLA_U64, },
 	[ILA_ATTR_LOCATOR] = { .type = NLA_U64, },
 	[ILA_ATTR_LOCATOR_MATCH] = { .type = NLA_U64, },
 	[ILA_ATTR_IFINDEX] = { .type = NLA_U32, },
-	[ILA_ATTR_DIR] = { .type = NLA_U32, },
 };
 
 static int parse_nl_config(struct genl_info *info,
@@ -152,10 +139,6 @@ static int parse_nl_config(struct genl_info *info,
 {
 	memset(xp, 0, sizeof(*xp));
 
-	if (info->attrs[ILA_ATTR_IDENTIFIER])
-		xp->identifier.v64 = (__force __be64)nla_get_u64(
-			info->attrs[ILA_ATTR_IDENTIFIER]);
-
 	if (info->attrs[ILA_ATTR_LOCATOR])
 		xp->ip.locator.v64 = (__force __be64)nla_get_u64(
 			info->attrs[ILA_ATTR_LOCATOR]);
@@ -167,24 +150,20 @@ static int parse_nl_config(struct genl_info *info,
 	if (info->attrs[ILA_ATTR_IFINDEX])
 		xp->ifindex = nla_get_s32(info->attrs[ILA_ATTR_IFINDEX]);
 
-	if (info->attrs[ILA_ATTR_DIR])
-		xp->dir = nla_get_u32(info->attrs[ILA_ATTR_DIR]);
-
 	return 0;
 }
 
 /* Must be called with rcu readlock */
 static inline struct ila_map *ila_lookup_wildcards(struct ila_addr *iaddr,
 						   int ifindex,
-						   unsigned int dir,
 						   struct ila_net *ilan)
 {
 	struct ila_map *ila;
 
-	ila = rhashtable_lookup_fast(&ilan->rhash_table, &iaddr->ident,
+	ila = rhashtable_lookup_fast(&ilan->rhash_table, &iaddr->loc,
 				     rht_params);
 	while (ila) {
-		if (!ila_cmp_wildcards(ila, iaddr, ifindex, dir))
+		if (!ila_cmp_wildcards(ila, iaddr, ifindex))
 			return ila;
 		ila = rcu_access_pointer(ila->next);
 	}
@@ -198,7 +177,8 @@ static inline struct ila_map *ila_lookup_by_params(struct ila_xlat_params *xp,
 {
 	struct ila_map *ila;
 
-	ila = rhashtable_lookup_fast(&ilan->rhash_table, &xp->identifier,
+	ila = rhashtable_lookup_fast(&ilan->rhash_table,
+				     &xp->ip.locator_match,
 				     rht_params);
 	while (ila) {
 		if (!ila_cmp_params(ila, xp))
@@ -226,14 +206,14 @@ static void ila_free_cb(void *ptr, void *arg)
 	}
 }
 
-static int ila_xlat_addr(struct sk_buff *skb, int dir);
+static int ila_xlat_addr(struct sk_buff *skb);
 
 static unsigned int
 ila_nf_input(void *priv,
 	     struct sk_buff *skb,
 	     const struct nf_hook_state *state)
 {
-	ila_xlat_addr(skb, ILA_DIR_IN);
+	ila_xlat_addr(skb);
 	return NF_ACCEPT;
 }
 
@@ -250,7 +230,7 @@ static int ila_add_mapping(struct net *net, struct ila_xlat_params *xp)
 {
 	struct ila_net *ilan = net_generic(net, ila_net_id);
 	struct ila_map *ila, *head;
-	spinlock_t *lock = ila_get_lock(ilan, xp->identifier);
+	spinlock_t *lock = ila_get_lock(ilan, xp->ip.locator_match);
 	int err = 0, order;
 
 	if (!ilan->hooks_registered) {
@@ -271,20 +251,19 @@ static int ila_add_mapping(struct net *net, struct ila_xlat_params *xp)
 
 	ila->xp = *xp;
 
-	if (xp->ip.locator_match.v64) {
-		/* Precompute checksum difference for translation since we
-		 * know both the old identifier and the new one.
-		 */
-		ila->xp.ip.csum_diff = compute_csum_diff8(
-			(__be32 *)&xp->ip.locator_match,
-			(__be32 *)&xp->ip.locator);
-	}
+	/* Precompute checksum difference for translation since we
+	 * know both the old identifier and the new one.
+	 */
+	ila->xp.ip.csum_diff = compute_csum_diff8(
+		(__be32 *)&xp->ip.locator_match,
+		(__be32 *)&xp->ip.locator);
 
 	order = ila_order(ila);
 
 	spin_lock(lock);
 
-	head = rhashtable_lookup_fast(&ilan->rhash_table, &xp->identifier,
+	head = rhashtable_lookup_fast(&ilan->rhash_table,
+				      &xp->ip.locator_match,
 				      rht_params);
 	if (!head) {
 		/* New entry for the rhash_table */
@@ -335,13 +314,13 @@ static int ila_del_mapping(struct net *net, struct ila_xlat_params *xp)
 {
 	struct ila_net *ilan = net_generic(net, ila_net_id);
 	struct ila_map *ila, *head, *prev;
-	spinlock_t *lock = ila_get_lock(ilan, xp->identifier);
+	spinlock_t *lock = ila_get_lock(ilan, xp->ip.locator_match);
 	int err = -ENOENT;
 
 	spin_lock(lock);
 
 	head = rhashtable_lookup_fast(&ilan->rhash_table,
-				      &xp->identifier, rht_params);
+				      &xp->ip.locator_match, rht_params);
 	ila = head;
 
 	prev = NULL;
@@ -423,14 +402,11 @@ static int ila_nl_cmd_del_mapping(struct sk_buff *skb, struct genl_info *info)
 
 static int ila_fill_info(struct ila_map *ila, struct sk_buff *msg)
 {
-	if (nla_put_u64(msg, ILA_ATTR_IDENTIFIER,
-			(__force u64)ila->xp.identifier.v64) ||
-	    nla_put_u64(msg, ILA_ATTR_LOCATOR,
+	if (nla_put_u64(msg, ILA_ATTR_LOCATOR,
 			(__force u64)ila->xp.ip.locator.v64) ||
 	    nla_put_u64(msg, ILA_ATTR_LOCATOR_MATCH,
 			(__force u64)ila->xp.ip.locator_match.v64) ||
-	    nla_put_s32(msg, ILA_ATTR_IFINDEX, ila->xp.ifindex) ||
-	    nla_put_u32(msg, ILA_ATTR_DIR, ila->xp.dir))
+	    nla_put_s32(msg, ILA_ATTR_IFINDEX, ila->xp.ifindex))
 		return -1;
 
 	return 0;
@@ -619,22 +595,24 @@ static struct pernet_operations ila_net_ops = {
 	.size = sizeof(struct ila_net),
 };
 
-static int ila_xlat_addr(struct sk_buff *skb, int dir)
+static int ila_xlat_addr(struct sk_buff *skb)
 {
 	struct ila_map *ila;
 	struct ipv6hdr *ip6h = ipv6_hdr(skb);
 	struct net *net = dev_net(skb->dev);
 	struct ila_net *ilan = net_generic(net, ila_net_id);
 	struct ila_addr *iaddr = ila_a2i(&ip6h->daddr);
-	size_t nhoff;
 
 	/* Assumes skb contains a valid IPv6 header that is pulled */
 
-	nhoff = sizeof(struct ipv6hdr);
+	if (!ila_addr_is_ila(iaddr)) {
+		/* Type indicates this is not an ILA address */
+		return 0;
+	}
 
 	rcu_read_lock();
 
-	ila = ila_lookup_wildcards(iaddr, skb->dev->ifindex, dir, ilan);
+	ila = ila_lookup_wildcards(iaddr, skb->dev->ifindex, ilan);
 	if (ila)
 		ila_update_ipv6_locator(skb, &ila->xp.ip);
 
@@ -643,18 +621,6 @@ static int ila_xlat_addr(struct sk_buff *skb, int dir)
 	return 0;
 }
 
-int ila_xlat_incoming(struct sk_buff *skb)
-{
-	return ila_xlat_addr(skb, ILA_DIR_IN);
-}
-EXPORT_SYMBOL(ila_xlat_incoming);
-
-int ila_xlat_outgoing(struct sk_buff *skb)
-{
-	return ila_xlat_addr(skb, ILA_DIR_OUT);
-}
-EXPORT_SYMBOL(ila_xlat_outgoing);
-
 int ila_xlat_init(void)
 {
 	int ret;
-- 
2.8.0.rc2
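After this patch the translation table is keyed on the locator being matched rather than on the identifier, and ifindex is the only remaining wildcard. A minimal sketch of that lookup, assuming a flat array in place of the rhashtable and its per-key chain; struct xlat_entry, xlat_lookup() and the host-order 64-bit fields are hypothetical. Mirroring ila_order(), entries with a specific ifindex are assumed to come before wildcard entries for the same key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct xlat_entry {
	uint64_t locator_match;  /* lookup key */
	int ifindex;             /* 0 matches any interface */
	uint64_t locator;        /* locator to write into the address */
};

/* Type is the top 3 bits of the identifier; 0 (IID) means the address
 * is not an ILA address and must not be translated. */
static int ident_is_ila(uint64_t ident)
{
	return (ident >> 61) != 0;
}

/* Same test as ila_cmp_wildcards() after this patch: only ifindex. */
static int wildcard_mismatch(const struct xlat_entry *e, int ifindex)
{
	return e->ifindex && e->ifindex != ifindex;
}

static const struct xlat_entry *xlat_lookup(const struct xlat_entry *tbl,
					    size_t n, uint64_t loc,
					    uint64_t ident, int ifindex)
{
	if (!ident_is_ila(ident))
		return NULL;
	for (size_t i = 0; i < n; i++)
		if (tbl[i].locator_match == loc &&
		    !wildcard_mismatch(&tbl[i], ifindex))
			return &tbl[i];
	return NULL;
}
```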


* [PATCH net-next 1/3] ila: Add struct definitions and helpers
From: Tom Herbert @ 2016-04-15 20:34 UTC
  To: davem, netdev; +Cc: kernel-team
In-Reply-To: <1460752452-3328784-1-git-send-email-tom@herbertland.com>

Add structures for identifiers, locators, and an ILA address, which
is composed of a locator and an identifier and to which an in6_addr
can be cast. This includes a three-bit type field and enums for the
types defined in the ILA I-D.

In ILA lwt don't allow user to set a translation for a non-ILA
address (type of identifier is zero meaning it is an IID). This also
requires that the destination prefix is at least 65 bytes (64
bit locator and first byte of identifier).
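The layout these structures describe can be checked without bitfields: in the 16-byte address, bytes 0-7 are the locator and byte 8 is the first byte of the identifier, whose top 3 bits hold the type. A minimal sketch assuming that layout (the function names are hypothetical); using a shift sidesteps the __LITTLE_ENDIAN_BITFIELD / __BIG_ENDIAN_BITFIELD split in struct ila_identifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* First two values of the type enum added by this patch. */
enum { ATYPE_IID = 0, ATYPE_LUID = 1 };

/* Type field: top 3 bits of byte 8 (first identifier byte). */
static unsigned ila_type_of(const uint8_t addr[16])
{
	return addr[8] >> 5;
}

/* Mirrors ila_addr_is_ila(): any type other than IID is ILA. */
static bool ila_is_ila(const uint8_t addr[16])
{
	return ila_type_of(addr) != ATYPE_IID;
}

/* The locator is the first 8 bytes, read in network byte order. */
static uint64_t ila_locator_of(const uint8_t addr[16])
{
	uint64_t loc = 0;

	for (int i = 0; i < 8; i++)
		loc = (loc << 8) | addr[i];
	return loc;
}
```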

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 net/ipv6/ila/ila.h        |  67 ++++++++++++++++++++++--
 net/ipv6/ila/ila_common.c |  11 ++--
 net/ipv6/ila/ila_lwt.c    |  39 +++++++++-----
 net/ipv6/ila/ila_xlat.c   | 126 +++++++++++++++++++++++-----------------------
 4 files changed, 161 insertions(+), 82 deletions(-)

diff --git a/net/ipv6/ila/ila.h b/net/ipv6/ila/ila.h
index 28542cb..f532967 100644
--- a/net/ipv6/ila/ila.h
+++ b/net/ipv6/ila/ila.h
@@ -23,9 +23,70 @@
 #include <net/protocol.h>
 #include <uapi/linux/ila.h>
 
+struct ila_locator {
+	union {
+		__u8            v8[8];
+		__be16          v16[4];
+		__be32          v32[2];
+		__be64		v64;
+	};
+};
+
+struct ila_identifier {
+	union {
+		struct {
+#if defined(__LITTLE_ENDIAN_BITFIELD)
+			u8 __space:5;
+			u8 type:3;
+#elif defined(__BIG_ENDIAN_BITFIELD)
+			u8 type:3;
+			u8 __space:5;
+#else
+#error  "Adjust your <asm/byteorder.h> defines"
+#endif
+			u8 __space2[7];
+		};
+		__u8            v8[8];
+		__be16          v16[4];
+		__be32          v32[2];
+		__be64		v64;
+	};
+};
+
+enum {
+	ILA_ATYPE_IID = 0,
+	ILA_ATYPE_LUID,
+	ILA_ATYPE_VIRT_V4,
+	ILA_ATYPE_VIRT_UNI_V6,
+	ILA_ATYPE_VIRT_MULTI_V6,
+	ILA_ATYPE_RSVD_1,
+	ILA_ATYPE_RSVD_2,
+	ILA_ATYPE_RSVD_3,
+};
+
+struct ila_addr {
+	union {
+		struct in6_addr addr;
+		struct {
+			struct ila_locator loc;
+			struct ila_identifier ident;
+		};
+	};
+};
+
+static inline struct ila_addr *ila_a2i(struct in6_addr *addr)
+{
+	return (struct ila_addr *)addr;
+}
+
+static inline bool ila_addr_is_ila(struct ila_addr *iaddr)
+{
+	return (iaddr->ident.type != ILA_ATYPE_IID);
+}
+
 struct ila_params {
-	__be64 locator;
-	__be64 locator_match;
+	struct ila_locator locator;
+	struct ila_locator locator_match;
 	__wsum csum_diff;
 };
 
@@ -38,7 +99,7 @@ static inline __wsum compute_csum_diff8(const __be32 *from, const __be32 *to)
 	return csum_partial(diff, sizeof(diff), 0);
 }
 
-void update_ipv6_locator(struct sk_buff *skb, struct ila_params *p);
+void ila_update_ipv6_locator(struct sk_buff *skb, struct ila_params *p);
 
 int ila_lwt_init(void);
 void ila_lwt_fini(void);
diff --git a/net/ipv6/ila/ila_common.c b/net/ipv6/ila/ila_common.c
index 3061305..c3078d0 100644
--- a/net/ipv6/ila/ila_common.c
+++ b/net/ipv6/ila/ila_common.c
@@ -15,17 +15,20 @@
 
 static __wsum get_csum_diff(struct ipv6hdr *ip6h, struct ila_params *p)
 {
-	if (*(__be64 *)&ip6h->daddr == p->locator_match)
+	struct ila_addr *iaddr = ila_a2i(&ip6h->daddr);
+
+	if (iaddr->loc.v64 == p->locator_match.v64)
 		return p->csum_diff;
 	else
-		return compute_csum_diff8((__be32 *)&ip6h->daddr,
+		return compute_csum_diff8((__be32 *)&iaddr->loc,
 					  (__be32 *)&p->locator);
 }
 
-void update_ipv6_locator(struct sk_buff *skb, struct ila_params *p)
+void ila_update_ipv6_locator(struct sk_buff *skb, struct ila_params *p)
 {
 	__wsum diff;
 	struct ipv6hdr *ip6h = ipv6_hdr(skb);
+	struct ila_addr *iaddr = ila_a2i(&ip6h->daddr);
 	size_t nhoff = sizeof(struct ipv6hdr);
 
 	/* First update checksum */
@@ -68,7 +71,7 @@ void update_ipv6_locator(struct sk_buff *skb, struct ila_params *p)
 	}
 
 	/* Now change destination address */
-	*(__be64 *)&ip6h->daddr = p->locator;
+	iaddr->loc = p->locator;
 }
 
 static int __init ila_init(void)
diff --git a/net/ipv6/ila/ila_lwt.c b/net/ipv6/ila/ila_lwt.c
index 2ae3c4f..27e68de 100644
--- a/net/ipv6/ila/ila_lwt.c
+++ b/net/ipv6/ila/ila_lwt.c
@@ -26,7 +26,7 @@ static int ila_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 	if (skb->protocol != htons(ETH_P_IPV6))
 		goto drop;
 
-	update_ipv6_locator(skb, ila_params_lwtunnel(dst->lwtstate));
+	ila_update_ipv6_locator(skb, ila_params_lwtunnel(dst->lwtstate));
 
 	return dst->lwtstate->orig_output(net, sk, skb);
 
@@ -42,7 +42,7 @@ static int ila_input(struct sk_buff *skb)
 	if (skb->protocol != htons(ETH_P_IPV6))
 		goto drop;
 
-	update_ipv6_locator(skb, ila_params_lwtunnel(dst->lwtstate));
+	ila_update_ipv6_locator(skb, ila_params_lwtunnel(dst->lwtstate));
 
 	return dst->lwtstate->orig_input(skb);
 
@@ -64,11 +64,26 @@ static int ila_build_state(struct net_device *dev, struct nlattr *nla,
 	size_t encap_len = sizeof(*p);
 	struct lwtunnel_state *newts;
 	const struct fib6_config *cfg6 = cfg;
+	struct ila_addr *iaddr;
 	int ret;
 
 	if (family != AF_INET6)
 		return -EINVAL;
 
+	if (cfg6->fc_dst_len < sizeof(struct ila_locator) + 1) {
+		/* Need to have full locator and at least type field
+		 * included in destination
+		 */
+		return -EINVAL;
+	}
+
+	iaddr = (struct ila_addr *)&cfg6->fc_dst;
+
+	if (!ila_is_ila_addr(iaddr)) {
+		/* Don't allow setting a translation for a non-ILA address */
+		return -EINVAL;
+	}
+
 	ret = nla_parse_nested(tb, ILA_ATTR_MAX, nla,
 			       ila_nl_policy);
 	if (ret < 0)
@@ -84,16 +99,14 @@ static int ila_build_state(struct net_device *dev, struct nlattr *nla,
 	newts->len = encap_len;
 	p = ila_params_lwtunnel(newts);
 
-	p->locator = (__force __be64)nla_get_u64(tb[ILA_ATTR_LOCATOR]);
+	p->locator.v64 = (__force __be64)nla_get_u64(tb[ILA_ATTR_LOCATOR]);
 
-	if (cfg6->fc_dst_len > sizeof(__be64)) {
-		/* Precompute checksum difference for translation since we
-		 * know both the old locator and the new one.
-		 */
-		p->locator_match = *(__be64 *)&cfg6->fc_dst;
-		p->csum_diff = compute_csum_diff8(
-			(__be32 *)&p->locator_match, (__be32 *)&p->locator);
-	}
+	/* Precompute checksum difference for translation since we
+	 * know both the old locator and the new one.
+	 */
+	p->locator_match = iaddr->loc;
+	p->csum_diff = compute_csum_diff8(
+		(__be32 *)&p->locator_match, (__be32 *)&p->locator);
 
 	newts->type = LWTUNNEL_ENCAP_ILA;
 	newts->flags |= LWTUNNEL_STATE_OUTPUT_REDIRECT |
@@ -109,7 +122,7 @@ static int ila_fill_encap_info(struct sk_buff *skb,
 {
 	struct ila_params *p = ila_params_lwtunnel(lwtstate);
 
-	if (nla_put_u64(skb, ILA_ATTR_LOCATOR, (__force u64)p->locator))
+	if (nla_put_u64(skb, ILA_ATTR_LOCATOR, (__force u64)p->locator.v64))
 		goto nla_put_failure;
 
 	return 0;
@@ -129,7 +142,7 @@ static int ila_encap_cmp(struct lwtunnel_state *a, struct lwtunnel_state *b)
 	struct ila_params *a_p = ila_params_lwtunnel(a);
 	struct ila_params *b_p = ila_params_lwtunnel(b);
 
-	return (a_p->locator != b_p->locator);
+	return (a_p->locator.v64 != b_p->locator.v64);
 }
 
 static const struct lwtunnel_encap_ops ila_encap_ops = {
diff --git a/net/ipv6/ila/ila_xlat.c b/net/ipv6/ila/ila_xlat.c
index 0b03533..075c782 100644
--- a/net/ipv6/ila/ila_xlat.c
+++ b/net/ipv6/ila/ila_xlat.c
@@ -11,13 +11,13 @@
 
 struct ila_xlat_params {
 	struct ila_params ip;
-	__be64 identifier;
+	struct ila_identifier identifier;
 	int ifindex;
 	unsigned int dir;
 };
 
 struct ila_map {
-	struct ila_xlat_params p;
+	struct ila_xlat_params xp;
 	struct rhash_head node;
 	struct ila_map __rcu *next;
 	struct rcu_head rcu;
@@ -66,31 +66,35 @@ static __always_inline void __ila_hash_secret_init(void)
 	net_get_random_once(&hashrnd, sizeof(hashrnd));
 }
 
-static inline u32 ila_identifier_hash(__be64 identifier)
+static inline u32 ila_identifier_hash(struct ila_identifier ident)
 {
-	u32 *v = (u32 *)&identifier;
+	u32 *v = (u32 *)ident.v32;
 
 	return jhash_2words(v[0], v[1], hashrnd);
 }
 
-static inline spinlock_t *ila_get_lock(struct ila_net *ilan, __be64 identifier)
+static inline spinlock_t *ila_get_lock(struct ila_net *ilan,
+				       struct ila_identifier ident)
 {
-	return &ilan->locks[ila_identifier_hash(identifier) & ilan->locks_mask];
+	return &ilan->locks[ila_identifier_hash(ident) & ilan->locks_mask];
 }
 
-static inline int ila_cmp_wildcards(struct ila_map *ila, __be64 loc,
-				    int ifindex, unsigned int dir)
+static inline int ila_cmp_wildcards(struct ila_map *ila,
+				    struct ila_addr *iaddr, int ifindex,
+				    unsigned int dir)
 {
-	return (ila->p.ip.locator_match && ila->p.ip.locator_match != loc) ||
-	       (ila->p.ifindex && ila->p.ifindex != ifindex) ||
-	       !(ila->p.dir & dir);
+	return (ila->xp.ip.locator_match.v64 &&
+		ila->xp.ip.locator_match.v64 != iaddr->loc.v64) ||
+	       (ila->xp.ifindex && ila->xp.ifindex != ifindex) ||
+	       !(ila->xp.dir & dir);
 }
 
-static inline int ila_cmp_params(struct ila_map *ila, struct ila_xlat_params *p)
+static inline int ila_cmp_params(struct ila_map *ila,
+				 struct ila_xlat_params *xp)
 {
-	return (ila->p.ip.locator_match != p->ip.locator_match) ||
-	       (ila->p.ifindex != p->ifindex) ||
-	       (ila->p.dir != p->dir);
+	return (ila->xp.ip.locator_match.v64 != xp->ip.locator_match.v64) ||
+	       (ila->xp.ifindex != xp->ifindex) ||
+	       (ila->xp.dir != xp->dir);
 }
 
 static int ila_cmpfn(struct rhashtable_compare_arg *arg,
@@ -98,17 +102,17 @@ static int ila_cmpfn(struct rhashtable_compare_arg *arg,
 {
 	const struct ila_map *ila = obj;
 
-	return (ila->p.identifier != *(__be64 *)arg->key);
+	return (ila->xp.identifier.v64 != *(__be64 *)arg->key);
 }
 
 static inline int ila_order(struct ila_map *ila)
 {
 	int score = 0;
 
-	if (ila->p.ip.locator_match)
+	if (ila->xp.ip.locator_match.v64)
 		score += 1 << 0;
 
-	if (ila->p.ifindex)
+	if (ila->xp.ifindex)
 		score += 1 << 1;
 
 	return score;
@@ -117,7 +121,7 @@ static inline int ila_order(struct ila_map *ila)
 static const struct rhashtable_params rht_params = {
 	.nelem_hint = 1024,
 	.head_offset = offsetof(struct ila_map, node),
-	.key_offset = offsetof(struct ila_map, p.identifier),
+	.key_offset = offsetof(struct ila_map, xp.identifier),
 	.key_len = sizeof(u64), /* identifier */
 	.max_size = 1048576,
 	.min_size = 256,
@@ -144,42 +148,43 @@ static struct nla_policy ila_nl_policy[ILA_ATTR_MAX + 1] = {
 };
 
 static int parse_nl_config(struct genl_info *info,
-			   struct ila_xlat_params *p)
+			   struct ila_xlat_params *xp)
 {
-	memset(p, 0, sizeof(*p));
+	memset(xp, 0, sizeof(*xp));
 
 	if (info->attrs[ILA_ATTR_IDENTIFIER])
-		p->identifier = (__force __be64)nla_get_u64(
+		xp->identifier.v64 = (__force __be64)nla_get_u64(
 			info->attrs[ILA_ATTR_IDENTIFIER]);
 
 	if (info->attrs[ILA_ATTR_LOCATOR])
-		p->ip.locator = (__force __be64)nla_get_u64(
+		xp->ip.locator.v64 = (__force __be64)nla_get_u64(
 			info->attrs[ILA_ATTR_LOCATOR]);
 
 	if (info->attrs[ILA_ATTR_LOCATOR_MATCH])
-		p->ip.locator_match = (__force __be64)nla_get_u64(
+		xp->ip.locator_match.v64 = (__force __be64)nla_get_u64(
 			info->attrs[ILA_ATTR_LOCATOR_MATCH]);
 
 	if (info->attrs[ILA_ATTR_IFINDEX])
-		p->ifindex = nla_get_s32(info->attrs[ILA_ATTR_IFINDEX]);
+		xp->ifindex = nla_get_s32(info->attrs[ILA_ATTR_IFINDEX]);
 
 	if (info->attrs[ILA_ATTR_DIR])
-		p->dir = nla_get_u32(info->attrs[ILA_ATTR_DIR]);
+		xp->dir = nla_get_u32(info->attrs[ILA_ATTR_DIR]);
 
 	return 0;
 }
 
 /* Must be called with rcu readlock */
-static inline struct ila_map *ila_lookup_wildcards(__be64 id, __be64 loc,
+static inline struct ila_map *ila_lookup_wildcards(struct ila_addr *iaddr,
 						   int ifindex,
 						   unsigned int dir,
 						   struct ila_net *ilan)
 {
 	struct ila_map *ila;
 
-	ila = rhashtable_lookup_fast(&ilan->rhash_table, &id, rht_params);
+	ila = rhashtable_lookup_fast(&ilan->rhash_table, &iaddr->ident,
+				     rht_params);
 	while (ila) {
-		if (!ila_cmp_wildcards(ila, loc, ifindex, dir))
+		if (!ila_cmp_wildcards(ila, iaddr, ifindex, dir))
 			return ila;
 		ila = rcu_access_pointer(ila->next);
 	}
@@ -188,15 +193,15 @@ static inline struct ila_map *ila_lookup_wildcards(__be64 id, __be64 loc,
 }
 
 /* Must be called with rcu readlock */
-static inline struct ila_map *ila_lookup_by_params(struct ila_xlat_params *p,
+static inline struct ila_map *ila_lookup_by_params(struct ila_xlat_params *xp,
 						   struct ila_net *ilan)
 {
 	struct ila_map *ila;
 
-	ila = rhashtable_lookup_fast(&ilan->rhash_table, &p->identifier,
+	ila = rhashtable_lookup_fast(&ilan->rhash_table, &xp->identifier,
 				     rht_params);
 	while (ila) {
-		if (!ila_cmp_params(ila, p))
+		if (!ila_cmp_params(ila, xp))
 			return ila;
 		ila = rcu_access_pointer(ila->next);
 	}
@@ -241,11 +246,11 @@ static struct nf_hook_ops ila_nf_hook_ops[] __read_mostly = {
 	},
 };
 
-static int ila_add_mapping(struct net *net, struct ila_xlat_params *p)
+static int ila_add_mapping(struct net *net, struct ila_xlat_params *xp)
 {
 	struct ila_net *ilan = net_generic(net, ila_net_id);
 	struct ila_map *ila, *head;
-	spinlock_t *lock = ila_get_lock(ilan, p->identifier);
+	spinlock_t *lock = ila_get_lock(ilan, xp->identifier);
 	int err = 0, order;
 
 	if (!ilan->hooks_registered) {
@@ -264,22 +269,22 @@ static int ila_add_mapping(struct net *net, struct ila_xlat_params *p)
 	if (!ila)
 		return -ENOMEM;
 
-	ila->p = *p;
+	ila->xp = *xp;
 
-	if (p->ip.locator_match) {
+	if (xp->ip.locator_match.v64) {
 		/* Precompute checksum difference for translation since we
 		 * know both the old identifier and the new one.
 		 */
-		ila->p.ip.csum_diff = compute_csum_diff8(
-			(__be32 *)&p->ip.locator_match,
-			(__be32 *)&p->ip.locator);
+		ila->xp.ip.csum_diff = compute_csum_diff8(
+			(__be32 *)&xp->ip.locator_match,
+			(__be32 *)&xp->ip.locator);
 	}
 
 	order = ila_order(ila);
 
 	spin_lock(lock);
 
-	head = rhashtable_lookup_fast(&ilan->rhash_table, &p->identifier,
+	head = rhashtable_lookup_fast(&ilan->rhash_table, &xp->identifier,
 				      rht_params);
 	if (!head) {
 		/* New entry for the rhash_table */
@@ -289,7 +294,7 @@ static int ila_add_mapping(struct net *net, struct ila_xlat_params *p)
 		struct ila_map *tila = head, *prev = NULL;
 
 		do {
-			if (!ila_cmp_params(tila, p)) {
+			if (!ila_cmp_params(tila, xp)) {
 				err = -EEXIST;
 				goto out;
 			}
@@ -326,23 +331,23 @@ out:
 	return err;
 }
 
-static int ila_del_mapping(struct net *net, struct ila_xlat_params *p)
+static int ila_del_mapping(struct net *net, struct ila_xlat_params *xp)
 {
 	struct ila_net *ilan = net_generic(net, ila_net_id);
 	struct ila_map *ila, *head, *prev;
-	spinlock_t *lock = ila_get_lock(ilan, p->identifier);
+	spinlock_t *lock = ila_get_lock(ilan, xp->identifier);
 	int err = -ENOENT;
 
 	spin_lock(lock);
 
 	head = rhashtable_lookup_fast(&ilan->rhash_table,
-				      &p->identifier, rht_params);
+				      &xp->identifier, rht_params);
 	ila = head;
 
 	prev = NULL;
 
 	while (ila) {
-		if (ila_cmp_params(ila, p)) {
+		if (ila_cmp_params(ila, xp)) {
 			prev = ila;
 			ila = rcu_dereference_protected(ila->next,
 							lockdep_is_held(lock));
@@ -404,14 +409,14 @@ static int ila_nl_cmd_add_mapping(struct sk_buff *skb, struct genl_info *info)
 static int ila_nl_cmd_del_mapping(struct sk_buff *skb, struct genl_info *info)
 {
 	struct net *net = genl_info_net(info);
-	struct ila_xlat_params p;
+	struct ila_xlat_params xp;
 	int err;
 
-	err = parse_nl_config(info, &p);
+	err = parse_nl_config(info, &xp);
 	if (err)
 		return err;
 
-	ila_del_mapping(net, &p);
+	ila_del_mapping(net, &xp);
 
 	return 0;
 }
@@ -419,13 +424,13 @@ static int ila_nl_cmd_del_mapping(struct sk_buff *skb, struct genl_info *info)
 static int ila_fill_info(struct ila_map *ila, struct sk_buff *msg)
 {
 	if (nla_put_u64(msg, ILA_ATTR_IDENTIFIER,
-			(__force u64)ila->p.identifier) ||
+			(__force u64)ila->xp.identifier.v64) ||
 	    nla_put_u64(msg, ILA_ATTR_LOCATOR,
-			(__force u64)ila->p.ip.locator) ||
+			(__force u64)ila->xp.ip.locator.v64) ||
 	    nla_put_u64(msg, ILA_ATTR_LOCATOR_MATCH,
-			(__force u64)ila->p.ip.locator_match) ||
-	    nla_put_s32(msg, ILA_ATTR_IFINDEX, ila->p.ifindex) ||
-	    nla_put_u32(msg, ILA_ATTR_DIR, ila->p.dir))
+			(__force u64)ila->xp.ip.locator_match.v64) ||
+	    nla_put_s32(msg, ILA_ATTR_IFINDEX, ila->xp.ifindex) ||
+	    nla_put_u32(msg, ILA_ATTR_DIR, ila->xp.dir))
 		return -1;
 
 	return 0;
@@ -457,11 +462,11 @@ static int ila_nl_cmd_get_mapping(struct sk_buff *skb, struct genl_info *info)
 	struct net *net = genl_info_net(info);
 	struct ila_net *ilan = net_generic(net, ila_net_id);
 	struct sk_buff *msg;
-	struct ila_xlat_params p;
+	struct ila_xlat_params xp;
 	struct ila_map *ila;
 	int ret;
 
-	ret = parse_nl_config(info, &p);
+	ret = parse_nl_config(info, &xp);
 	if (ret)
 		return ret;
 
@@ -471,7 +476,7 @@ static int ila_nl_cmd_get_mapping(struct sk_buff *skb, struct genl_info *info)
 
 	rcu_read_lock();
 
-	ila = ila_lookup_by_params(&p, ilan);
+	ila = ila_lookup_by_params(&xp, ilan);
 	if (ila) {
 		ret = ila_dump_info(ila,
 				    info->snd_portid,
@@ -620,21 +625,18 @@ static int ila_xlat_addr(struct sk_buff *skb, int dir)
 	struct ipv6hdr *ip6h = ipv6_hdr(skb);
 	struct net *net = dev_net(skb->dev);
 	struct ila_net *ilan = net_generic(net, ila_net_id);
-	__be64 identifier, locator_match;
+	struct ila_addr *iaddr = ila_a2i(&ip6h->daddr);
 	size_t nhoff;
 
 	/* Assumes skb contains a valid IPv6 header that is pulled */
 
-	identifier = *(__be64 *)&ip6h->daddr.in6_u.u6_addr8[8];
-	locator_match = *(__be64 *)&ip6h->daddr.in6_u.u6_addr8[0];
 	nhoff = sizeof(struct ipv6hdr);
 
 	rcu_read_lock();
 
-	ila = ila_lookup_wildcards(identifier, locator_match,
-				   skb->dev->ifindex, dir, ilan);
+	ila = ila_lookup_wildcards(iaddr, skb->dev->ifindex, dir, ilan);
 	if (ila)
-		update_ipv6_locator(skb, &ila->p.ip);
+		ila_update_ipv6_locator(skb, &ila->xp.ip);
 
 	rcu_read_unlock();
 
-- 
2.8.0.rc2

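The patch replaces the bare __be64 locator and identifier fields with wrapper types accessed through .v64/.v32, and reads the destination address through ila_a2i(). It also precomputes csum_diff, so that rewriting the locator can patch the transport checksum instead of recomputing it. What follows is a minimal user-space sketch of that idea under simplified assumptions. demo_locator, demo_identifier, demo_addr and every helper below are hypothetical names, not the kernel's ila.h definitions. The sketch also uses plain 16-bit one's-complement arithmetic (RFC 1624 style) rather than the kernel's csum_partial()/__wsum.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the union wrappers the patch accesses through
 * .v64/.v32; the field set is an assumption, not the kernel's ila.h. */
struct demo_locator {
	union { uint8_t v8[8]; uint32_t v32[2]; uint64_t v64; };
};

struct demo_identifier {
	union { uint8_t v8[8]; uint32_t v32[2]; uint64_t v64; };
};

/* A 128-bit IPv6 address split into locator (high 64) and identifier (low 64). */
struct demo_addr {
	struct demo_locator loc;
	struct demo_identifier ident;
};

/* Add len bytes as big-endian 16-bit words to a 32-bit accumulator. */
static uint32_t sum16(const uint8_t *p, size_t len, uint32_t acc)
{
	assert(len % 2 == 0);
	for (size_t i = 0; i < len; i += 2)
		acc += ((uint32_t)p[i] << 8) | p[i + 1];
	return acc;
}

/* Fold carries back in (end-around carry) to get a 16-bit one's-complement sum. */
static uint16_t fold(uint32_t acc)
{
	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)acc;
}

/* Difference for replacing 8 bytes `from` with `to`: sum(~from) + sum(to).
 * Same idea as compute_csum_diff8() in the patch, in 16-bit arithmetic. */
static uint16_t csum_diff8(const uint8_t from[8], const uint8_t to[8])
{
	uint8_t neg[8];

	for (int i = 0; i < 8; i++)
		neg[i] = (uint8_t)~from[i];
	return fold(sum16(to, 8, sum16(neg, 8, 0)));
}

/* Full Internet checksum over address + payload (stand-in for the L4 csum). */
static uint16_t full_csum(const struct demo_addr *a, const uint8_t *payload,
			  size_t len)
{
	uint32_t acc = sum16(a->loc.v8, 8, 0);

	acc = sum16(a->ident.v8, 8, acc);
	return (uint16_t)~fold(sum16(payload, len, acc));
}

/* Incremental update (RFC 1624, eqn. 3): HC' = ~(~HC + diff). */
static uint16_t apply_diff(uint16_t csum, uint16_t diff)
{
	return (uint16_t)~fold((uint32_t)(uint16_t)~csum + diff);
}

/* Hypothetical demo: rewrite an example locator, return both checksums. */
static void demo_rewrite_locator(uint16_t *recomputed, uint16_t *incremental)
{
	static const uint8_t payload[8] = { 0xde, 0xad, 0xbe, 0xef, 1, 2, 3, 4 };
	static const uint8_t old_loc[8] = { 0x20, 0x01, 0x0d, 0xb8, 0, 0, 0, 1 };
	static const uint8_t new_loc[8] = { 0x20, 0x01, 0x0d, 0xb8,
					    0x12, 0x34, 0x56, 0x78 };
	static const uint8_t ident[8] = { 0, 0, 0, 0, 0, 0, 0, 0x2a };
	struct demo_addr a;
	uint16_t before, diff;

	memcpy(a.loc.v8, old_loc, sizeof(old_loc));
	memcpy(a.ident.v8, ident, sizeof(ident));
	before = full_csum(&a, payload, sizeof(payload));

	/* Computed once per mapping, like csum_diff in the patch. */
	diff = csum_diff8(old_loc, new_loc);

	/* Rewrite only the locator half; the identifier is left untouched. */
	memcpy(a.loc.v8, new_loc, sizeof(new_loc));
	*recomputed = full_csum(&a, payload, sizeof(payload));
	*incremental = apply_diff(before, diff);
}
```

The difference depends only on the old and new locators. It can therefore be computed once, when a mapping or route is added, and reused for every packet. demo_rewrite_locator() should report identical recomputed and incremental checksums.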