* Re: [PATCH v3 09/15] net: dsa: Add support for switch EEPROM access
From: Guenter Roeck @ 2014-10-31 2:53 UTC (permalink / raw)
To: Andrew Lunn; +Cc: netdev, David S. Miller, Florian Fainelli, linux-kernel
In-Reply-To: <20141031024044.GA4082@lunn.ch>
On 10/30/2014 07:40 PM, Andrew Lunn wrote:
>> As suspected, ethtool will attempt to read a zero-length eeprom.
>>
>> The following patch should solve the problem. Not sure if it is worth it,
>> though, since this will change behavior for existing drivers.
>
> Yes, it changes behaviour, but it does make it more consistent.
>
> Probably it should be up to core network people to decide if this is
> the write fix or leave it as is.
>
s/write/right/.
The patch shows up in the netdev patchwork. David marked it as RFC,
so we'll see where it goes.
Thanks,
Guenter
^ permalink raw reply
* Re: [PATCH] VNIC: Adding support for Cavium ThunderX network controller
From: Stephen Hemminger @ 2014-10-31 2:45 UTC (permalink / raw)
To: Robert Richter
Cc: David S. Miller, Sunil Goutham, Robert Richter, Stefan Assmann,
linux-kernel, linux-arm-kernel, netdev
In-Reply-To: <20141030165434.GW20170@rric.localhost>
On Thu, 30 Oct 2014 17:54:34 +0100
Robert Richter <rric@kernel.org> wrote:
> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> index 1fa99a301817..80bd3336691e 100644
> --- a/include/linux/pci_ids.h
> +++ b/include/linux/pci_ids.h
> @@ -2324,6 +2324,8 @@
> #define PCI_DEVICE_ID_ALTIMA_AC9100 0x03ea
> #define PCI_DEVICE_ID_ALTIMA_AC1003 0x03eb
>
> +#define PCI_VENDOR_ID_CAVIUM 0x177d
I don't think PCI folks want this updated with every id anymore.
^ permalink raw reply
* Re: [PATCH v3 09/15] net: dsa: Add support for switch EEPROM access
From: Andrew Lunn @ 2014-10-31 2:40 UTC (permalink / raw)
To: Guenter Roeck
Cc: Andrew Lunn, netdev, David S. Miller, Florian Fainelli,
linux-kernel
In-Reply-To: <20141031010039.GA29492@roeck-us.net>
> As suspected, ethtool will attempt to read a zero-length eeprom.
>
> The following patch should solve the problem. Not sure if it is worth it,
> though, since this will change behavior for existing drivers.
Yes, it changes behaviour, but it does make it more consistent.
Probably it should be up to core network people to decide if this is
the write fix or leave it as is.
Andrew
^ permalink raw reply
* [PATCH net 2/2] mpls: Allow mpls_gso to be built as module
From: Pravin B Shelar @ 2014-10-30 7:50 UTC (permalink / raw)
To: davem; +Cc: netdev, Pravin B Shelar, Simon Horman
Kconfig already allows mpls to be built as a module. The following patch
fixes the Makefile to do the same.
CC: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
net/mpls/Makefile | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/net/mpls/Makefile b/net/mpls/Makefile
index 0a3c171..6dec088 100644
--- a/net/mpls/Makefile
+++ b/net/mpls/Makefile
@@ -1,4 +1,4 @@
#
# Makefile for MPLS.
#
-obj-y += mpls_gso.o
+obj-$(CONFIG_NET_MPLS_GSO) += mpls_gso.o
--
1.7.1
^ permalink raw reply related
* [PATCH net 1/2] mpls: Fix mpls_gso handler.
From: Pravin B Shelar @ 2014-10-30 7:49 UTC (permalink / raw)
To: davem; +Cc: netdev, Pravin B Shelar, Simon Horman
mpls gso handler needs to pull skb after segmenting skb.
CC: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
net/mpls/mpls_gso.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
index f0f5309..e3545f2 100644
--- a/net/mpls/mpls_gso.c
+++ b/net/mpls/mpls_gso.c
@@ -59,8 +59,7 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
* above pulled. It will be re-pushed after returning
* skb_mac_gso_segment(), an indirect caller of this function.
*/
- __skb_push(skb, skb->data - skb_mac_header(skb));
-
+ __skb_pull(skb, skb->data - skb_mac_header(skb));
out:
return segs;
}
--
1.7.1
^ permalink raw reply related
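The one-line change above is easier to follow with the pointer arithmetic spelled out. What follows is a minimal userspace sketch, not kernel code: `struct ex_buf`, `ex_push` and `ex_pull` are hypothetical stand-ins that model only the `data` and `len` bookkeeping that `__skb_push()` and `__skb_pull()` perform on an sk_buff.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the few sk_buff fields that matter here
 * (hypothetical, not the kernel structure). */
struct ex_buf {
    unsigned char *head;  /* start of the allocated area */
    unsigned char *data;  /* first currently visible byte */
    size_t len;           /* number of visible bytes */
};

/* Expose n more bytes in front of data: data moves back, len grows. */
static unsigned char *ex_push(struct ex_buf *b, size_t n)
{
    assert((size_t)(b->data - b->head) >= n);  /* needs enough headroom */
    b->data -= n;
    b->len += n;
    return b->data;
}

/* Hide the first n visible bytes: data moves forward, len shrinks. */
static unsigned char *ex_pull(struct ex_buf *b, size_t n)
{
    assert(b->len >= n);
    b->len -= n;
    b->data += n;
    return b->data;
}
```

In this model a pull followed by a push of the same size restores the original view, which is the pairing the code comment in the patch context describes (headers pulled inside the handler, re-pushed by the caller).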
* Re: [PATCH v3 09/15] net: dsa: Add support for switch EEPROM access
From: Guenter Roeck @ 2014-10-31 1:00 UTC (permalink / raw)
To: Andrew Lunn; +Cc: netdev, David S. Miller, Florian Fainelli, linux-kernel
In-Reply-To: <20141030223951.GA19489@roeck-us.net>
On Thu, Oct 30, 2014 at 03:39:51PM -0700, Guenter Roeck wrote:
> On Thu, Oct 30, 2014 at 10:11:31PM +0100, Andrew Lunn wrote:
> > > +static int dsa_slave_get_eeprom_len(struct net_device *dev)
> > > +{
> > > + struct dsa_slave_priv *p = netdev_priv(dev);
> > > + struct dsa_switch *ds = p->parent;
> > > +
> > > + if (ds->pd->eeprom_len)
> > > + return ds->pd->eeprom_len;
> > > +
> > > + if (ds->drv->get_eeprom_len)
> > > + return ds->drv->get_eeprom_len(ds);
> > > +
> > > + return 0;
> > > +}
> > > +
> >
> > Hi Guenter
> >
> > I just started doing some testing with this patchset. A bit late since
> > David just accepted it, but...
> >
> > root@dir665:~# ethtool -e lan4
> > Cannot get EEPROM data: Invalid argument
> > root@dir665:~# ethtool -e eth0
> > Cannot get EEPROM data: Operation not supported
> >
> > There is no eeprom for the hardware I'm testing. Operation not
> > supported seems like a better error code than Invalid argument, and is
> > what the other network drivers I tried returned.
> >
> Hi Andrew,
>
> I think the problem is that the infrastructure code (net/core/ethtool.c)
> does not accept an error from the get_eeprom_len function, but instead
> assumes that reporting eeprom data is supported if a driver provides
> the access functions. The get_eeprom_len function returns 0 in your case,
> which in ethtool_get_any_eeprom() translates to -EINVAL because user space
> either requests no data or more data than available. I wonder why user
> space requests anything in the first place; I would have assumed that it
> reads the driver information first and is told that the eeprom length is 0,
> but I guess that is a different question.
>
> I quickly browsed through a couple of other drivers supporting get_eeprom_len,
> and they all return 0 if there is no eeprom. Doesn't that mean that they all
> end up reporting -EINVAL if an attempt is made to read the eeprom ?
>
> The only solution that comes to my mind would be to have the infrastructure
> code check the return value from get_eeprom_len and return -EOPNOTSUPP
> if the reported eeprom length is 0. That would be an infrastructure change,
> though. Does that sound reasonable, or do you have a better idea ?
>
> In parallel, I'll have a look into the ethtool command to see why it
> requests eeprom data even though the reported eeprom length is 0.
>
As suspected, ethtool will attempt to read a zero-length eeprom.
The following patch should solve the problem. Not sure if it is worth it,
though, since this will change behavior for existing drivers.
Thanks,
Guenter
---
From: Guenter Roeck <linux@roeck-us.net>
Date: Thu, 30 Oct 2014 17:51:34 -0700
Subject: [RFC PATCH] net: ethtool: Return -EOPNOTSUPP if user space tries to read
EEPROM with length 0
If a driver supports reading EEPROM but no EEPROM is installed in the system,
the driver's get_eeprom_len function will return 0. ethtool will subsequently
try to read that zero-length EEPROM anyway. If the driver does not support
EEPROM access at all, this operation will return -EOPNOTSUPP. If the driver
does support EEPROM access but no EEPROM is installed, the operation will
return -EINVAL. Return -EOPNOTSUPP in both cases for consistency.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
net/core/ethtool.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 1600aa2..06dfb29 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1036,7 +1036,8 @@ static int ethtool_get_eeprom(struct net_device *dev, void __user *useraddr)
{
const struct ethtool_ops *ops = dev->ethtool_ops;
- if (!ops->get_eeprom || !ops->get_eeprom_len)
+ if (!ops->get_eeprom || !ops->get_eeprom_len ||
+ !ops->get_eeprom_len(dev))
return -EOPNOTSUPP;
return ethtool_get_any_eeprom(dev, useraddr, ops->get_eeprom,
@@ -1052,7 +1053,8 @@ static int ethtool_set_eeprom(struct net_device *dev, void __user *useraddr)
u8 *data;
int ret = 0;
- if (!ops->set_eeprom || !ops->get_eeprom_len)
+ if (!ops->set_eeprom || !ops->get_eeprom_len ||
+ !ops->get_eeprom_len(dev))
return -EOPNOTSUPP;
if (copy_from_user(&eeprom, useraddr, sizeof(eeprom)))
--
1.9.1
^ permalink raw reply related
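The behaviour change discussed in this message can be modelled outside the kernel. The sketch below is an illustration under stated assumptions, not the ethtool core: `struct ex_ops`, `struct ex_dev` and every `ex_*` function are hypothetical, and the range check is a simplified version of what the thread says ethtool_get_any_eeprom() does (reject a request for no data or for more data than is available).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct ex_dev;

/* Hypothetical subset of driver callbacks, loosely modelled on ethtool_ops. */
struct ex_ops {
    int (*get_eeprom)(struct ex_dev *dev, unsigned int offset, unsigned int len);
    int (*get_eeprom_len)(struct ex_dev *dev);
};

struct ex_dev {
    const struct ex_ops *ops;
    int eeprom_len;  /* 0 means: driver supports EEPROM, none fitted */
};

static int ex_drv_len(struct ex_dev *dev)
{
    return dev->eeprom_len;
}

static int ex_drv_read(struct ex_dev *dev, unsigned int offset, unsigned int len)
{
    (void)dev; (void)offset; (void)len;
    return 0;  /* pretend the read succeeded */
}

static const struct ex_ops ex_ops_with_eeprom = { ex_drv_read, ex_drv_len };
static const struct ex_ops ex_ops_without = { NULL, NULL };

/* Simplified range check: empty or out-of-range requests give -EINVAL. */
static int ex_read_any(struct ex_dev *dev, unsigned int offset, unsigned int len)
{
    unsigned int total = (unsigned int)dev->ops->get_eeprom_len(dev);

    if (len == 0 || offset + len < offset || offset + len > total)
        return -EINVAL;
    return dev->ops->get_eeprom(dev, offset, len);
}

/* Entry check before the RFC patch: only the callbacks are tested. */
static int ex_get_eeprom_before(struct ex_dev *dev, unsigned int offset,
                                unsigned int len)
{
    assert(dev != NULL && dev->ops != NULL);
    if (!dev->ops->get_eeprom || !dev->ops->get_eeprom_len)
        return -EOPNOTSUPP;
    return ex_read_any(dev, offset, len);
}

/* Entry check with the RFC patch: a reported length of 0 also means
 * "not supported". */
static int ex_get_eeprom_after(struct ex_dev *dev, unsigned int offset,
                               unsigned int len)
{
    assert(dev != NULL && dev->ops != NULL);
    if (!dev->ops->get_eeprom || !dev->ops->get_eeprom_len ||
        !dev->ops->get_eeprom_len(dev))
        return -EOPNOTSUPP;
    return ex_read_any(dev, offset, len);
}
```

With a device whose `eeprom_len` is 0, the "before" variant ends in the range check and the "after" variant stops at the entry check, which is the difference in error codes seen in the `ethtool -e` output quoted above.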
* Re: [PATCH net-next 8/8] net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE
From: Yann Ylavic @ 2014-10-31 0:38 UTC (permalink / raw)
To: Or Gerlitz
Cc: David S. Miller, netdev, Matan Barak, Amir Vadai, Saeed Mahameed,
Shani Michaeli, Jerry Chu
In-Reply-To: <1414685216-28907-9-git-send-email-ogerlitz@mellanox.com>
Hi,
On Thu, Oct 30, 2014 at 5:06 PM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
[...]
> +static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, int hwtstamp_rx_filter)
> +{
> + __wsum hw_checksum = 0;
> +
> + void *hdr = (u8 *)skb->data + sizeof(struct ethhdr);
> +
> + hw_checksum = csum_unfold((__force __sum16)cqe->checksum);
> +
> + if (((struct ethhdr *)skb->data)->h_proto == htons(ETH_P_8021Q) &&
> + hwtstamp_rx_filter != HWTSTAMP_FILTER_NONE) {
> + /* next protocol non IPv4 or IPv6 */
> + if (((struct vlan_hdr *)hdr)->h_vlan_encapsulated_proto
> + != htons(ETH_P_IP) ||
Shouldn't this be an AND (&&)?
> + ((struct vlan_hdr *)hdr)->h_vlan_encapsulated_proto
> + != htons(ETH_P_IPV6))
> + return -1;
Regards,
Yann.
^ permalink raw reply
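The question about || versus && can be checked in isolation. This is a small sketch with hypothetical helper names (`ex_not_ip_as_posted`, `ex_not_ip_with_and`); it compares host-order values and leaves out the htons() conversion used in the patch, which does not affect the logic.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_ETH_P_IP   0x0800  /* IPv4 EtherType */
#define EX_ETH_P_IPV6 0x86DD  /* IPv6 EtherType */

/* Condition as posted: proto cannot equal both constants at once, so at
 * least one of the two inequalities always holds and the result is
 * true for every input. */
static bool ex_not_ip_as_posted(uint16_t proto)
{
    return proto != EX_ETH_P_IP || proto != EX_ETH_P_IPV6;
}

/* Condition with &&: true only when proto is neither IPv4 nor IPv6. */
static bool ex_not_ip_with_and(uint16_t proto)
{
    return proto != EX_ETH_P_IP && proto != EX_ETH_P_IPV6;
}
```

If the intent is "bail out when the encapsulated protocol is neither IPv4 nor IPv6", only the second form expresses it; with the first form the `return -1` branch would be taken for every VLAN-tagged frame that reaches it.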
* Re: Sporadic ESP payload corruption when using IPSec in NAT-T Transport Mode
From: Evan Gilman @ 2014-10-31 0:05 UTC (permalink / raw)
To: Herbert Xu; +Cc: Steffen Klassert, linux-kernel, netdev
In-Reply-To: <20140630132119.GA19500@gondor.apana.org.au>
Indeed, I am using aesni-intel. I have again been bitten by this
problem, but do not have the cycles to pinpoint the kernel version in
which the trouble was introduced. I have done a bit more research, and
have found that hosts running under Xen 4.4.2 are not affected
(regardless of kernel version), while hosts under Xen 4.1.6 and Xen
3.4.3 are affected. The latter is the version we are observing in AWS,
and ami-6d6b6028 (official Ubuntu Trusty image) is affected
out-of-the-box, with the latest kernel available for Trusty (linux
3.13.0). I can also confirm that the corruption ceases to occur after
unloading the aesni-intel kernel module.
I have been using the following test to identify hosts which are
affected, where hostA is known to be unaffected:
-- evan@hostA:~ $ dd if=/dev/zero | nc hostB 8080
2530292+0 records in
2530291+0 records out
1295508992 bytes (1.3 GB) copied, 413.288 s, 3.1 MB/s
^C-- evan@hostA:~ $
...
-- evan@hostB:~ $ nc -l 8080 | xxd -a
0000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
*
189edea0:0000 1e30 e75c a3ef ab8b 8723 781c a4eb ...0.\.....#x...
189edeb0:6527 1e30 e75c a3ef ab8b 8723 781c a4eb e'.0.\.....#x...
189edec0:6527 1e30 e75c a3ef ab8b 8723 781c a4eb e'.0.\.....#x...
189eded0:6527 1e30 e75c a3ef ab8b 8723 781c a4eb e'.0.\.....#x...
189edee0:6527 9d05 f655 6228 1366 5365 a932 2841 e'...Ub(.fSe.2(A
189edef0:2663 0000 0000 0000 0000 0000 0000 0000 &c..............
189edf00:0000 0000 0000 0000 0000 0000 0000 0000 ................
*
4927d4e0:5762 b190 5b5d db75 cb39 accd 5b73 982b Wb..[].u.9..[s.+
4927d4f0:5762 b190 5b5d db75 cb39 accd 5b73 982b Wb..[].u.9..[s.+
4927d500:5762 b190 5b5d db75 cb39 accd 5b73 982b Wb..[].u.9..[s.+
4927d510:5762 b190 5b5d db75 cb39 accd 5b73 982b Wb..[].u.9..[s.+
4927d520:01db 332d cf4b 3804 6f9c a5ad b9c8 0932 ..3-.K8.o......2
4927d530:0000 0000 0000 0000 0000 0000 0000 0000 ................
*
4bb51110:0000 54f8 a1cb 8f0d e916 80a2 0768 3bd3 ..T..........h;.
4bb51120:3794 54f8 a1cb 8f0d e916 80a2 0768 3bd3 7.T..........h;.
4bb51130:3794 54f8 a1cb 8f0d e916 80a2 0768 3bd3 7.T..........h;.
4bb51140:3794 54f8 a1cb 8f0d e916 80a2 0768 3bd3 7.T..........h;.
4bb51150:3794 20a0 1e44 ae70 25b7 7768 7d1d 38b1 7. ..D.p%.wh}.8.
4bb51160:8191 0000 0000 0000 0000 0000 0000 0000 ................
4bb51170:0000 0000 0000 0000 0000 0000 0000 0000 ................
*
4de3d390:0000 0000 0000 ......
-- evan@hostB:~ $
I hope that this simple test will aid others in reproducing the issue
and/or identifying if they are also affected.
It is possible that the issue has gone unnoticed by many as lots of
applications will gracefully handle the case. We just happened to hit
a bug in our application which failed to check the bound of a
particular value in its protocol, causing the thread to OOM when it
tried to allocate memory for the bogus value.
Since the corruption can be cured by changing either Xen version or
Linux kernel version, could this be a bug in the interaction between
aesni-intel and Xen itself? If so, it might stand to reason that a fix
could be shipped with a future kernel update, which would be great for
people like us who can neither control nor convince our providers to
upgrade Xen (i.e. AWS).
I tried to find a reference to the previous report of aesni-intel
causing IPSec corruption under Xen - I'd be interested to read it if
anyone here has it on hand. For now, we are looking to blacklist
aesni-intel, as we have no other suitable solution and the corruption,
combined with our other bug, has a detrimental effect on our infrastructure.
On Mon, Jun 30, 2014 at 6:21 AM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Mon, Jun 30, 2014 at 01:33:24PM +0200, Steffen Klassert wrote:
>> Ccing netdev.
>>
>> On Thu, Jun 26, 2014 at 02:12:30PM -0700, Evan Gilman wrote:
>> > Hi all
>> > We have a couple Ubuntu 10.04 hosts with kernel version 3.14.5 which are
>> > experiencing TCP payload corruption when using IPSec in NAT-T transport
>> > mode. All are running under Xen at third party providers. When
>> > communicating with other hosts using IPSec, we see that these corrupt TCP
>> > PDUs are still being received by the remote listener, even though the TCP
>> > checksum is invalid.
>> > All other checksums (IPSec authentication header and IP checksum) are
>> > good. So, we are thinking that corruption is happening during the ESP
>> > encapsulation and decapsulation phase (IPSec required for reproduction).
>> > The corruption occurs sporadically, and we have not found any one
>> > payload/packet combination that will reliably trigger it, though we can
>> > typically reproduce it in less than 30 minutes. We can do it very simply
>> > by reading from /dev/zero with dd and piping through netcat. It occurs
>> > whenever a 3.14.5 kernel is involved at either end of the conversation. I
>> > can send captures to those who are interested. Does any of this sound
>> > familiar?
>>
>> I can't remember anyone reporting such problems, but maybe someone
>> else does.
>
> I have seen one report where a Xen guest experienced IPsec corruption
> when using aesni-intel. However, in that case the corruption was at
> the authentication level. Are you using aesni-intel by any chance?
>
> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
evan
^ permalink raw reply
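The dd/nc test in this message relies on reading the xxd output by eye. The check it performs (an all-zero stream should arrive as all zeros) could also be automated. The following is only a sketch of that idea; `ex_first_nonzero` and `ex_count_nonzero` are hypothetical helpers, not part of any tool mentioned in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Return the offset of the first non-zero byte in buf, or len if the
 * whole buffer is zero. */
static size_t ex_first_nonzero(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0)
            return i;
    }
    return len;
}

/* Count the non-zero bytes in buf; any result above 0 means the
 * received data differs from the /dev/zero source. */
static size_t ex_count_nonzero(const unsigned char *buf, size_t len)
{
    size_t count = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0)
            count++;
    }
    return count;
}
```

A receiver could call these on each chunk read from the socket and log the absolute stream offset of the first hit, which would give the same information as the offsets in the hexdump.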
* Re: [PATCH net-next] hyperv: Add IPv6 into the hash computation for vRSS
From: David Miller @ 2014-10-31 0:02 UTC (permalink / raw)
To: haiyangz; +Cc: olaf, netdev, jasowang, driverdev-devel, linux-kernel
In-Reply-To: <1414703237-11510-1-git-send-email-haiyangz@microsoft.com>
From: Haiyang Zhang <haiyangz@microsoft.com>
Date: Thu, 30 Oct 2014 14:07:17 -0700
> This will allow the workload spreading via vRSS for IPv6.
>
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Applied.
^ permalink raw reply
* Re: [PATCH v2 net 0/2] drivers/net,ipv6: Fix IPv6 fragment ID selection for virtio
From: David Miller @ 2014-10-31 0:02 UTC (permalink / raw)
To: ben; +Cc: netdev, hannes, virtualization
In-Reply-To: <1414693592.16849.61.camel@decadent.org.uk>
From: Ben Hutchings <ben@decadent.org.uk>
Date: Thu, 30 Oct 2014 18:26:32 +0000
> The virtio net protocol supports UFO but does not provide for passing a
> fragment ID for fragmentation of IPv6 packets. We used to generate a
> fragment ID wherever such a packet was fragmented, but currently we
> always use ID=0!
>
> v2: Add blank lines after declarations
Series applied and queued up for -stable, thanks Ben.
^ permalink raw reply
* Re: [PATCH] net: skb_fclone_busy() needs to detect orphaned skb
From: David Miller @ 2014-10-30 23:59 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev, ncardwell
In-Reply-To: <1414690354.9028.9.camel@edumazet-glaptop2.roam.corp.google.com>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 30 Oct 2014 10:32:34 -0700
> From: Eric Dumazet <edumazet@google.com>
>
> Some drivers are unable to perform TX completions in a bound time.
> They instead call skb_orphan()
>
> Problem is skb_fclone_busy() has to detect this case, otherwise
> we block TCP retransmits and can freeze unlucky tcp sessions on
> mostly idle hosts.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Fixes: 1f3279ae0c13 ("tcp: avoid retransmits of TCP packets hanging in host queues")
Applied, and queued up for -stable, thanks Eric.
> This problem is known to hurt users of linux-3.16 kernels used by guests kernels.
> David, I can provide backports if you want.
Since 3.16 is no longer active, I'll only need to put this into 3.17-stable
which I should be able to handle on my own.
But thanks for offering, sometimes difficult backports take up a lot
of time.
^ permalink raw reply
* Re: [PATCH net-next 8/8] net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE
From: Tom Herbert @ 2014-10-30 23:59 UTC (permalink / raw)
To: Or Gerlitz
Cc: David S. Miller, Linux Netdev List, Matan Barak, Amir Vadai,
Saeed Mahameed, Shani Michaeli, Jerry Chu
In-Reply-To: <1414685216-28907-9-git-send-email-ogerlitz@mellanox.com>
On Thu, Oct 30, 2014 at 9:06 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> From: Shani Michaeli <shanim@mellanox.com>
>
> When processing received traffic, pass CHECKSUM_COMPLETE status to the
> stack, with calculated checksum for non TCP/UDP packets (such
> as GRE or ICMP).
>
Hi Or,
This is very exciting work! One question though, what would mlx4
return in the case of a zero UDP checksum? (I assume this patch won't
affect this case but would like to make sure).
Thanks,
Tom
> Although the stack expects checksum which doesn't include the pseudo
> header, the HW adds it. To address that, we are subtracting the pseudo
> header checksum from the checksum value provided by the HW.
>
> In the IPv6 case, we also compute/add the IP header checksum which
> is not added by the HW for such packets.
>
> Cc: Jerry Chu <hkchu@google.com>
> Signed-off-by: Shani Michaeli <shanim@mellanox.com>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
> ---
> drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 2 +-
> drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 5 +
> drivers/net/ethernet/mellanox/mlx4/en_port.c | 2 +
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 116 +++++++++++++++++++++-
> drivers/net/ethernet/mellanox/mlx4/main.c | 9 ++
> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 5 +-
> include/linux/mlx4/device.h | 1 +
> 7 files changed, 132 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> index 8ea4d5b..6c64323 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
> @@ -115,7 +115,7 @@ static const char main_strings[][ETH_GSTRING_LEN] = {
> "tso_packets",
> "xmit_more",
> "queue_stopped", "wake_queue", "tx_timeout", "rx_alloc_failed",
> - "rx_csum_good", "rx_csum_none", "tx_chksum_offload",
> + "rx_csum_good", "rx_csum_none", "rx_csum_complete", "tx_chksum_offload",
>
> /* packet statistics */
> "broadcast", "rx_prio_0", "rx_prio_1", "rx_prio_2", "rx_prio_3",
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 0efbae9..d1eb25d 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -1893,6 +1893,7 @@ static void mlx4_en_clear_stats(struct net_device *dev)
> priv->rx_ring[i]->packets = 0;
> priv->rx_ring[i]->csum_ok = 0;
> priv->rx_ring[i]->csum_none = 0;
> + priv->rx_ring[i]->csum_complete = 0;
> }
> }
>
> @@ -2503,6 +2504,10 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
> /* Query for default mac and max mtu */
> priv->max_mtu = mdev->dev->caps.eth_mtu_cap[priv->port];
>
> + if (mdev->dev->caps.rx_checksum_flags_port[priv->port] &
> + MLX4_RX_CSUM_MODE_VAL_NON_TCP_UDP)
> + priv->flags |= MLX4_EN_FLAG_RX_CSUM_NON_TCP_UDP;
> +
> /* Set default MAC */
> dev->addr_len = ETH_ALEN;
> mlx4_en_u64_to_mac(dev->dev_addr, mdev->dev->caps.def_mac[priv->port]);
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c b/drivers/net/ethernet/mellanox/mlx4/en_port.c
> index 134b12e..6cb8007 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_port.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c
> @@ -155,11 +155,13 @@ int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset)
> stats->rx_bytes = 0;
> priv->port_stats.rx_chksum_good = 0;
> priv->port_stats.rx_chksum_none = 0;
> + priv->port_stats.rx_chksum_complete = 0;
> for (i = 0; i < priv->rx_ring_num; i++) {
> stats->rx_packets += priv->rx_ring[i]->packets;
> stats->rx_bytes += priv->rx_ring[i]->bytes;
> priv->port_stats.rx_chksum_good += priv->rx_ring[i]->csum_ok;
> priv->port_stats.rx_chksum_none += priv->rx_ring[i]->csum_none;
> + priv->port_stats.rx_chksum_complete += priv->rx_ring[i]->csum_complete;
> }
> stats->tx_packets = 0;
> stats->tx_bytes = 0;
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 2a29a1a..f8a0449 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -42,6 +42,10 @@
> #include <linux/vmalloc.h>
> #include <linux/irq.h>
>
> +#if IS_ENABLED(CONFIG_IPV6)
> +#include <net/ip6_checksum.h>
> +#endif
> +
> #include "mlx4_en.h"
>
> static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
> @@ -642,6 +646,86 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
> }
> }
>
> +/* When hardware doesn't strip the vlan, we need to calculate the checksum
> + * over it and add it to the hardware's checksum calculation
> + */
> +static inline __wsum get_fixed_vlan_csum(__wsum hw_checksum,
> + struct vlan_hdr *vlanh)
> +{
> + return csum_add(hw_checksum, *(__wsum *)vlanh);
> +}
> +
> +/* Although the stack expects checksum which doesn't include the pseudo
> + * header, the HW adds it. To address that, we are subtracting the pseudo
> + * header checksum from the checksum value provided by the HW.
> + */
> +static void get_fixed_ipv4_csum(__wsum hw_checksum, struct sk_buff *skb,
> + struct iphdr *iph)
> +{
> + __u16 length_for_csum = 0;
> + __wsum csum_pseudo_header = 0;
> +
> + length_for_csum = (be16_to_cpu(iph->tot_len) - (iph->ihl << 2));
> + csum_pseudo_header = csum_tcpudp_nofold(iph->saddr, iph->daddr,
> + length_for_csum, iph->protocol, 0);
> + skb->csum = csum_sub(hw_checksum, csum_pseudo_header);
> +}
> +
> +#if IS_ENABLED(CONFIG_IPV6)
> +/* In IPv6 packets, besides subtracting the pseudo header checksum,
> + * we also compute/add the IP header checksum which
> + * is not added by the HW.
> + */
> +static int get_fixed_ipv6_csum(__wsum hw_checksum, struct sk_buff *skb,
> + struct ipv6hdr *ipv6h)
> +{
> + __wsum csum_pseudo_header = 0;
> +
> + if (ipv6h->nexthdr == IPPROTO_FRAGMENT || ipv6h->nexthdr == IPPROTO_HOPOPTS)
> + return -1;
> + hw_checksum = csum_add(hw_checksum, (__force __wsum)(ipv6h->nexthdr << 8));
> +
> + csum_pseudo_header = csum_ipv6_magic_nofold(&ipv6h->saddr,
> + &ipv6h->daddr,
> + ntohs(ipv6h->payload_len),
> + ipv6h->nexthdr,
> + 0);
> + skb->csum = csum_sub(hw_checksum, csum_pseudo_header);
> + skb->csum = csum_add(skb->csum, csum_partial(ipv6h, sizeof(struct ipv6hdr), 0));
> + return 0;
> +}
> +#endif
> +
> +static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, int hwtstamp_rx_filter)
> +{
> + __wsum hw_checksum = 0;
> +
> + void *hdr = (u8 *)skb->data + sizeof(struct ethhdr);
> +
> + hw_checksum = csum_unfold((__force __sum16)cqe->checksum);
> +
> + if (((struct ethhdr *)skb->data)->h_proto == htons(ETH_P_8021Q) &&
> + hwtstamp_rx_filter != HWTSTAMP_FILTER_NONE) {
> + /* next protocol non IPv4 or IPv6 */
> + if (((struct vlan_hdr *)hdr)->h_vlan_encapsulated_proto
> + != htons(ETH_P_IP) ||
> + ((struct vlan_hdr *)hdr)->h_vlan_encapsulated_proto
> + != htons(ETH_P_IPV6))
> + return -1;
> + hw_checksum = get_fixed_vlan_csum(hw_checksum, hdr);
> + hdr += sizeof(struct vlan_hdr);
> + }
> +
> + if (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPV4))
> + get_fixed_ipv4_csum(hw_checksum, skb, hdr);
> +#if IS_ENABLED(CONFIG_IPV6)
> + else if (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPV6))
> + if (get_fixed_ipv6_csum(hw_checksum, skb, hdr))
> + return -1;
> +#endif
> + return 0;
> +}
> +
> int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
> {
> struct mlx4_en_priv *priv = netdev_priv(dev);
> @@ -743,13 +827,26 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
> (cqe->vlan_my_qpn & cpu_to_be32(MLX4_CQE_L2_TUNNEL));
>
> if (likely(dev->features & NETIF_F_RXCSUM)) {
> - if ((cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) &&
> - (cqe->checksum == cpu_to_be16(0xffff))) {
> - ring->csum_ok++;
> - ip_summed = CHECKSUM_UNNECESSARY;
> + if (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_TCP |
> + MLX4_CQE_STATUS_UDP)) {
> + if ((cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) &&
> + cqe->checksum == cpu_to_be16(0xffff)) {
> + ip_summed = CHECKSUM_UNNECESSARY;
> + ring->csum_ok++;
> + } else {
> + ip_summed = CHECKSUM_NONE;
> + ring->csum_none++;
> + }
> } else {
> - ip_summed = CHECKSUM_NONE;
> - ring->csum_none++;
> + if (priv->flags & MLX4_EN_FLAG_RX_CSUM_NON_TCP_UDP &&
> + (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPV4 |
> + MLX4_CQE_STATUS_IPV6))) {
> + ip_summed = CHECKSUM_COMPLETE;
> + ring->csum_complete++;
> + } else {
> + ip_summed = CHECKSUM_NONE;
> + ring->csum_none++;
> + }
> }
> } else {
> ip_summed = CHECKSUM_NONE;
> @@ -767,6 +864,13 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
> goto next;
> }
>
> + if (ip_summed == CHECKSUM_COMPLETE) {
> + if (check_csum(cqe, skb, ring->hwtstamp_rx_filter)) {
> + ip_summed = CHECKSUM_NONE;
> + ring->csum_none++;
> + }
> + }
> +
> skb->ip_summed = ip_summed;
> skb->protocol = eth_type_trans(skb, dev);
> skb_record_rx_queue(skb, cq->ring);
> diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
> index 9f82196..2f6ba42 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/main.c
> @@ -1629,6 +1629,7 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
> struct mlx4_init_hca_param init_hca;
> u64 icm_size;
> int err;
> + struct mlx4_config_dev_params params;
>
> if (!mlx4_is_slave(dev)) {
> err = mlx4_QUERY_FW(dev);
> @@ -1762,6 +1763,14 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
> goto unmap_bf;
> }
>
> + /* Query CONFIG_DEV parameters */
> + err = mlx4_config_dev_retrieval(dev, &params);
> + if (err && err != -ENOTSUPP) {
> + mlx4_err(dev, "Failed to query CONFIG_DEV parameters\n");
> + } else if (!err) {
> + dev->caps.rx_checksum_flags_port[1] = params.rx_csum_flags_port_1;
> + dev->caps.rx_checksum_flags_port[2] = params.rx_csum_flags_port_2;
> + }
> priv->eq_table.inta_pin = adapter.inta_pin;
> memcpy(dev->board_id, adapter.board_id, sizeof dev->board_id);
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index ef83d12..de45674 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> @@ -326,6 +326,7 @@ struct mlx4_en_rx_ring {
> #endif
> unsigned long csum_ok;
> unsigned long csum_none;
> + unsigned long csum_complete;
> int hwtstamp_rx_filter;
> cpumask_var_t affinity_mask;
> };
> @@ -449,6 +450,7 @@ struct mlx4_en_port_stats {
> unsigned long rx_alloc_failed;
> unsigned long rx_chksum_good;
> unsigned long rx_chksum_none;
> + unsigned long rx_chksum_complete;
> unsigned long tx_chksum_offload;
> #define NUM_PORT_STATS 9
> };
> @@ -507,7 +509,8 @@ enum {
> MLX4_EN_FLAG_ENABLE_HW_LOOPBACK = (1 << 2),
> /* whether we need to drop packets that hardware loopback-ed */
> MLX4_EN_FLAG_RX_FILTER_NEEDED = (1 << 3),
> - MLX4_EN_FLAG_FORCE_PROMISC = (1 << 4)
> + MLX4_EN_FLAG_FORCE_PROMISC = (1 << 4),
> + MLX4_EN_FLAG_RX_CSUM_NON_TCP_UDP = (1 << 5),
> };
>
> #define MLX4_EN_MAC_HASH_SIZE (1 << BITS_PER_BYTE)
> diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
> index 5cc5eac..3d9bff0 100644
> --- a/include/linux/mlx4/device.h
> +++ b/include/linux/mlx4/device.h
> @@ -497,6 +497,7 @@ struct mlx4_caps {
> u16 hca_core_clock;
> u64 phys_port_id[MLX4_MAX_PORTS + 1];
> int tunnel_offload_mode;
> + u8 rx_checksum_flags_port[MLX4_MAX_PORTS + 1];
> };
>
> struct mlx4_buf_list {
> --
> 1.7.1
>
^ permalink raw reply
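The quoted patch subtracts the pseudo-header checksum from the value reported by the hardware. The arithmetic behind that is ones'-complement addition, where subtracting a value is the same as adding its bitwise complement. Below is a userspace sketch of that arithmetic only; the `ex_*` names are hypothetical, and the functions are assumed to mirror the generic behaviour of the kernel's csum_add()/csum_sub() helpers rather than reproduce them.

```c
#include <stdint.h>

/* 32-bit ones'-complement addition: a carry out of bit 31 is added
 * back in at bit 0 (end-around carry). */
static uint32_t ex_csum_add(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;

    return r + (r < b);  /* r < b exactly when the addition wrapped */
}

/* Ones'-complement subtraction: add the bitwise complement. */
static uint32_t ex_csum_sub(uint32_t a, uint32_t b)
{
    return ex_csum_add(a, ~b);
}

/* Fold a 32-bit partial sum into 16 bits. The final inversion used for
 * an on-the-wire checksum is deliberately left out. */
static uint16_t ex_fold16(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

For example, if the payload sums to 0x1234 and the pseudo header to 0x00ab, a device that includes both reports 0x12df, and `ex_csum_sub(0x12df, 0x00ab)` recovers 0x1234, the value over the payload alone.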
* Re: [PATCHv2 net-next 0/2] sunvnet: Use multiple Tx queues.
From: David Miller @ 2014-10-30 23:57 UTC (permalink / raw)
To: sowmini.varadhan; +Cc: netdev
In-Reply-To: <20141030164547.GE650@oracle.com>
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Thu, 30 Oct 2014 12:45:47 -0400
>
> v2: moved tcp fix out of this series per David Miller feedback
>
> The primary objective of this patch-set is to address the suggestion from
> http://marc.info/?l=linux-netdev&m=140790778931563&w=2
> With the changes in Patch 2, every vnet_port will get packets from
> a single tx-queue, and flow-control/head-of-line-blocking is
> confined to the vnet_ports that share that tx queue (as opposed to
> flow-controlling *all* peers).
>
> Patch 1 is an optimization that resets the DATA_READY bit when
> we re-enable Rx interrupts. This optimization lets us exit quickly
> from vnet_event_napi() when new data has not triggered an interrupt.
This looks great, series applied to net-next, thanks.
^ permalink raw reply
* Re: [PATCH net-next] tcp: Correction to RFC number in comment
From: David Miller @ 2014-10-30 23:54 UTC (permalink / raw)
To: sowmini.varadhan; +Cc: netdev
In-Reply-To: <20141030164808.GH650@oracle.com>
From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Thu, 30 Oct 2014 12:48:08 -0400
> Challenge ACK is described in RFC 5961, fix typo.
>
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH net-next 8/8] net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE
From: Jerry Chu @ 2014-10-30 23:53 UTC (permalink / raw)
To: Or Gerlitz
Cc: Or Gerlitz, David S. Miller, netdev@vger.kernel.org, Matan Barak,
Amir Vadai, Saeed Mahameed, Shani Michaeli
In-Reply-To: <CAJ3xEMgC75anCmeKie8NZBZHfR8OW67pBiPQhyLUPK9cNJZHMg@mail.gmail.com>
On Thu, Oct 30, 2014 at 4:28 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>
> On Thu, Oct 30, 2014 at 11:20 PM, Jerry Chu <hkchu@google.com> wrote:
> > Acked-by: H.K. Jerry Chu <hkchu@google.com>
> >
> > BTW, will the patch work for all versions of the chip?
>
> If you'll look carefully, you'll see we go that path only when
> priv->flags & MLX4_EN_FLAG_RX_CSUM_NON_TCP_UDP is true. This currently
> holds only for ConnectX3 and not ConnectX3-pro. Down the road, the
> feature will also be available for the -pro too.
It didn't dawn on me that the flag would be tied to CX3, but this makes sense.
Thanks,
Jerry
^ permalink raw reply
* Re: [PATCH net] gre: Use inner mac length when computing tunnel length
From: David Miller @ 2014-10-30 23:52 UTC (permalink / raw)
To: therbert; +Cc: alexander.duyck, netdev
In-Reply-To: <1414683656-26493-1-git-send-email-therbert@google.com>
From: Tom Herbert <therbert@google.com>
Date: Thu, 30 Oct 2014 08:40:56 -0700
> Currently, skb_inner_network_header is used but this does not account
> for Ethernet header for ETH_P_TEB. Use skb_inner_mac_header which
> handles TEB and also should work with IP encapsulation in which case
> inner mac and inner network headers are the same.
>
> Tested: Ran TCP_STREAM over GRE, worked as expected.
>
> Signed-off-by: Tom Herbert <therbert@google.com>
Applied and queued up for -stable.
Thanks everyone.
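
To make the difference concrete, here is a small user-space sketch of the two length computations from the commit message. The `sketch_offsets` struct and its byte offsets are hypothetical stand-ins for the sk_buff header pointers; this is not the kernel code itself.

```c
#include <stddef.h>

#define SKETCH_ETH_HLEN 14  /* Ethernet header length in bytes */

/* Hypothetical byte offsets from the start of the packet. */
struct sketch_offsets {
	size_t transport;      /* start of the GRE header */
	size_t inner_mac;      /* inner Ethernet header (TEB), or inner IP header */
	size_t inner_network;  /* inner IP header */
};

/* Old computation: for ETH_P_TEB this wrongly counts the inner
 * Ethernet header as part of the tunnel header. */
static size_t tnl_len_from_inner_network(const struct sketch_offsets *o)
{
	return o->inner_network - o->transport;
}

/* New computation: stops at the inner MAC header, so it is correct for
 * TEB and also for IP encapsulation, where both offsets coincide. */
static size_t tnl_len_from_inner_mac(const struct sketch_offsets *o)
{
	return o->inner_mac - o->transport;
}
```

For a 4-byte GRE header carrying an Ethernet frame, the old computation yields 18 bytes (4 plus the 14-byte inner Ethernet header) while the new one yields 4; for plain IP-in-GRE both yield 4.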
^ permalink raw reply
* Re: [PATCH net 0/2] mlx4 driver encapsulation/steering fixes
From: David Miller @ 2014-10-30 23:49 UTC (permalink / raw)
To: ogerlitz; +Cc: netdev, matanb, amirv, saeedm
In-Reply-To: <1414677568-28409-1-git-send-email-ogerlitz@mellanox.com>
From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Thu, 30 Oct 2014 15:59:26 +0200
> The 1st patch fixes a bug in the TX path that supports offloading the
> TX checksum of (VXLAN) encapsulated TCP packets. It turns out that the
> bug is revealed only when the receiver runs in non-offloaded mode, so
> we somehow missed it so far... please queue it for -stable >= 3.14
>
> The 2nd patch makes sure not to leak steering entry on error flow,
> please queue it to 3.17-stable
Applied and queued up for -stable.
^ permalink raw reply
* Re: [net 0/4][pull request] Intel Wired LAN Driver Updates 2014-10-30
From: David Miller @ 2014-10-30 23:47 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <1414672436-20616-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 30 Oct 2014 05:33:52 -0700
> This series contains updates to e1000, igb and ixgbe.
>
> Francesco Ruggeri fixes an issue with e1000 where in a VM the driver did
> not support unicast filtering.
>
> Roman Gushchin fixes an issue with igb where the driver was re-using
> mapped pages so that packets were still getting dropped even if all
> the memory issues are gone and there is free memory.
>
> Junwei Zhang found where in the ixgbe_clean_rx_ring() we were repeating
> the assignment of NULL to the receive buffer skb and fixes it.
>
> Emil fixes a race condition between setup_link and SFP detection routine
> in the watchdog when setting the advertised speed.
Pulled, thanks Jeff.
^ permalink raw reply
* Re: [PATCH] stmmac: pci: set default of the filter bins
From: David Miller @ 2014-10-30 23:44 UTC (permalink / raw)
To: andriy.shevchenko; +Cc: peppe.cavallaro, netdev, hock.leong.kweh, vbridgers2013
In-Reply-To: <1414661965-1140-1-git-send-email-andriy.shevchenko@linux.intel.com>
From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date: Thu, 30 Oct 2014 11:39:25 +0200
> @@ -32,7 +32,10 @@ static struct stmmac_dma_cfg dma_cfg;
>
> static void stmmac_default_data(void)
> {
> + struct plat_stmmacenet_data *plat = &plat_dat;
> +
> memset(&plat_dat, 0, sizeof(struct plat_stmmacenet_data));
> +
> plat_dat.bus_id = 1;
> plat_dat.phy_addr = 0;
> plat_dat.interface = PHY_INTERFACE_MODE_GMII;
> @@ -47,6 +50,12 @@ static void stmmac_default_data(void)
> dma_cfg.pbl = 32;
> dma_cfg.burst_len = DMA_AXI_BLEN_256;
> plat_dat.dma_cfg = &dma_cfg;
> +
> + /* Set default value for multicast hash bins */
> + plat->multicast_filter_bins = HASH_TABLE_SIZE;
> +
> + /* Set default value for unicast filter entries */
> + plat->unicast_filter_entries = 1;
Don't do this.
The rest of the function goes "plat_dat.foo" so it looks terribly
inconsistent when you add the local variable to dereference it like
this.
So just do "plat_dat.multicast_filter_bins = x;" etc.
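
A minimal sketch of what the consistent form might look like, using a cut-down stand-in struct so it compiles in user space. The `sketch_` names are hypothetical, and the value 64 for the hash table size is an assumption, not taken from the patch.

```c
#include <string.h>

#define SKETCH_HASH_TABLE_SIZE 64  /* stand-in for HASH_TABLE_SIZE; value assumed */

/* Cut-down, hypothetical stand-in for struct plat_stmmacenet_data. */
struct sketch_plat_data {
	int bus_id;
	int phy_addr;
	int multicast_filter_bins;
	int unicast_filter_entries;
};

static struct sketch_plat_data plat_dat;

static void sketch_default_data(void)
{
	memset(&plat_dat, 0, sizeof(plat_dat));

	plat_dat.bus_id = 1;
	plat_dat.phy_addr = 0;

	/* Set default value for multicast hash bins */
	plat_dat.multicast_filter_bins = SKETCH_HASH_TABLE_SIZE;

	/* Set default value for unicast filter entries */
	plat_dat.unicast_filter_entries = 1;
}
```

Every assignment goes through `plat_dat.` directly, so the new defaults read the same way as the existing ones and no local pointer is needed.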
^ permalink raw reply
* Re: [net-next 2/2] sctp: replace seq_printf with seq_puts
From: David Miller @ 2014-10-30 23:40 UTC (permalink / raw)
To: michele; +Cc: vyasevich, nhorman, linux-sctp, netdev
In-Reply-To: <1414661356-17255-2-git-send-email-michele@acksyn.org>
From: Michele Baldessari <michele@acksyn.org>
Date: Thu, 30 Oct 2014 10:29:16 +0100
> Fixes checkpatch warning:
> "WARNING: Prefer seq_puts to seq_printf"
>
> Signed-off-by: Michele Baldessari <michele@acksyn.org>
Applied.
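
For context on the checkpatch warning, the distinction can be illustrated with a user-space analogue. The `sketch_seq` buffer and both helpers below are hypothetical stand-ins, not the kernel's seq_file API; they only mirror the idea that a constant string needs no format parsing.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a seq_file: a fixed buffer plus a length. */
struct sketch_seq {
	char buf[128];
	size_t len;
};

/* Analogue of seq_puts: copy the string verbatim, no format parsing. */
static void sketch_seq_puts(struct sketch_seq *m, const char *s)
{
	size_t n = strlen(s);

	if (m->len + n >= sizeof(m->buf))
		return;  /* would overflow: drop the string */
	memcpy(m->buf + m->len, s, n + 1);
	m->len += n;
}

/* Analogue of seq_printf: only needed when there are format arguments. */
static void sketch_seq_printf(struct sketch_seq *m, const char *fmt, ...)
{
	size_t room = sizeof(m->buf) - m->len;
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(m->buf + m->len, room, fmt, ap);
	va_end(ap);

	if (n > 0 && (size_t)n < room)
		m->len += (size_t)n;
	else
		m->buf[m->len] = '\0';  /* discard truncated or failed output */
}
```

The puts-style helper skips format-string processing entirely, which is both cheaper and safe for literals that happen to contain a `%` character.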
^ permalink raw reply
* Re: [net-next 1/2] sctp: add transport state in /proc/net/sctp/remaddr
From: David Miller @ 2014-10-30 23:40 UTC (permalink / raw)
To: michele; +Cc: vyasevich, nhorman, linux-sctp, netdev
In-Reply-To: <1414661356-17255-1-git-send-email-michele@acksyn.org>
From: Michele Baldessari <michele@acksyn.org>
Date: Thu, 30 Oct 2014 10:29:15 +0100
> It is often quite helpful to be able to know the state of a transport
> outside of the application itself (for troubleshooting purposes or for
> monitoring purposes). Add it under /proc/net/sctp/remaddr.
>
> Signed-off-by: Michele Baldessari <michele@acksyn.org>
Applied.
^ permalink raw reply