Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [flamebait] xdp, well meaning but pointless
From: Thomas Graf @ 2016-12-01 16:28 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Florian Westphal, netdev
In-Reply-To: <7e2be2fc-7c04-b333-59c7-43d4fcfcb451@stressinduktion.org>

On 12/01/16 at 04:52pm, Hannes Frederic Sowa wrote:
> First of all, this is a rant targeted at XDP and not at eBPF as a whole.
> XDP manipulates packets at free will and thus all security guarantees
> are off as well as in any user space solution.
> 
> Secondly user space provides policy, acl, more controlled memory
> protection, restartability and better debugability. If I had multi
> tenant workloads I would definitely put more complex "business/acl"
> logic into user space, so I can make use of LSM and other features to
> especially prevent a network facing service to attack the tenants. If
> stuff gets put into the kernel you run user controlled code in the
> kernel exposing a much bigger attack vector.
> 
> What use case do you see in XDP specifically e.g. for container networking?

DDOS mitigation to protect distributed applications in large clusters.
Relying on CDN works to protect API gateways and frontends (as long as
they don't throw you out of their network) but offers no protection
beyond that, e.g. a noisy/hostile neighbour. Doing this at the server
level and allowing the mitigation capability to scale up with the number
of servers is natural and cheap.

> > I agree with you if the LB is a software based appliance in either a
> > dedicated VM or on dedicated baremetal.
> > 
> > The reality is turning out to be different in many cases though, LB
> > needs to be performed not only for north south but east west as well.
> > So even if I would handle LB for traffic entering my datacenter in user
> > space, I will need the same LB for packets from my applications and
> > I definitely don't want to move all of that into user space.
> 
> The open question to me is why is programmability needed here.
> 
> Look at the discussion about ECMP and consistent hashing. It is not very
> easy to actually write this code correctly. Why can't we just put C code
> into the kernel that implements this once and for all and let user space
> update the policies?

Whatever LB logic is put in place with native C code now is unlikely the
logic we need in two years. We can't really predict the future. If it
was the case, networking would have been done long ago and we would all
be working on self eating ice cream now.

> Load balancers have to deal correctly with ICMP packets, e.g. they even
> have to be duplicated to every ECMP route. This seems to be problematic
> to do in eBPF programs due to looping constructs so you end up with
> complicated user space anyway.

Feel free to implement such complex LBs in user space or natively. It is
not required for the majority of use cases. The most popular LBs for
application load balancing have no idea of ECMP and require ECMP aware
routers to be made redundant itself.

^ permalink raw reply

* Re: [PATCH v2 net-next 0/4] Adding PHY MDI(X) support
From: David Miller @ 2016-12-01 16:27 UTC (permalink / raw)
  To: Raju.Lakkaraju; +Cc: netdev, f.fainelli, Allan.Nielsen, andrew
In-Reply-To: <1480412809-6122-1-git-send-email-Raju.Lakkaraju@microsemi.com>

From: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
Date: Tue, 29 Nov 2016 15:16:45 +0530

> I updated all review comments which were given by Andrew and Florian.
> 
> This series add support for PHY MDI(X), and implement it for MSCC phys.
> 
> Tested on Beaglebone Black with VSC 8531 PHY.

Series applied, thanks.

^ permalink raw reply

* Re: [flamebait] xdp, well meaning but pointless
From: David Miller @ 2016-12-01 16:19 UTC (permalink / raw)
  To: tgraf; +Cc: fw, netdev
In-Reply-To: <20161201145834.GA569@pox.localdomain>

From: Thomas Graf <tgraf@suug.ch>
Date: Thu, 1 Dec 2016 15:58:34 +0100

> The benefits of XDP for this use case are extremely obvious in combination
> with local applications which need to be protected. ntuple filters won't
> cut it. They are limited and subject to a certain rate at which they
> can be configured. Any serious mitigation will require stateful filtering
> with at least minimal L7 matching abilities and this is exactly where XDP
> will excel.

+1

Saying that ntuple filters can handle the early drop use case doesn't
take into consideration the nature of the tables (hundreds of
thousands of "evil" IP addresses), whether hardware can actually
handle that (it can't), and whether simple IP address matching is the
full extent of it (it isn't).

Most of the time when I hear anti-XDP rhetoric, it's usually comes
from a crowd who for some reason feels threatened by the technology
and what it might replace and make useless.

That to me says that we are _exactly_ going down the right path.

^ permalink raw reply

* Re: [PATCH 00/11] Netfilter fixes for net
From: David Miller @ 2016-12-01 16:13 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <1480543045-3389-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Wed, 30 Nov 2016 22:57:14 +0100

> This is a large batch of Netfilter fixes for net, they are:
 ...
> I know is late but I think these are important, specifically the NAT
> bits, as they are mostly addressing fallout from recent changes. I also
> read there are chances to have -rc8, if that is the case, that would
> also give us a bit more time to test this.
> 
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Pulled, thanks Pablo.

^ permalink raw reply

* Re: [PATCH net v2] tipc: check minimum bearer MTU
From: Ben Hutchings @ 2016-12-01 16:11 UTC (permalink / raw)
  To: Michal Kubecek, Jon Maloy, Ying Xue
  Cc: David S. Miller, tipc-discussion, netdev, linux-kernel,
	Qian Zhang
In-Reply-To: <20161201110205.10749A0F33@unicorn.suse.cz>

[-- Attachment #1: Type: text/plain, Size: 716 bytes --]

On Thu, 2016-12-01 at 12:02 +0100, Michal Kubecek wrote:
[...] 
> +/* check if device MTU is sufficient for tipc headers */
> +static inline bool tipc_check_mtu(struct net_device *dev, unsigned int reserve)
> +{
> +	if (dev->mtu >= TIPC_MIN_BEARER_MTU + reserve)
> +		return false;
> +	netdev_warn(dev, "MTU too low for tipc bearer\n");
> +	return true;
> +}
[...]

The comment says "check if ... sufficient" but the return value
indicates the opposite.  Could you make these consistent?

Other than that, this looks OK to me.  I haven't tested any version as
I don't know how to use TIPC.

Ben.

-- 
Ben Hutchings
A free society is one where it is safe to be unpopular. - Adlai
Stevenson


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters
From: Eric Dumazet @ 2016-12-01 16:08 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: Jesper Dangaard Brouer, David Miller, netdev, Tariq Toukan
In-Reply-To: <1480607729.18162.311.camel@edumazet-glaptop3.roam.corp.google.com>

On Thu, 2016-12-01 at 07:55 -0800, Eric Dumazet wrote:

> So removing the spinlock is doable, but needs to add a new parameter
> to mlx4_en_fold_software_stats() and call netdev_stats_to_stats64()
> before mlx4_en_fold_software_stats(dev)

Untested patch would be :

 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |    2 -
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |   10 +----
 drivers/net/ethernet/mellanox/mlx4/en_port.c    |   24 +++++++++-----
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    |    3 +
 4 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index d9c9f86a30df953fa555934c5406057dcaf28960..676050e352703cebe7fcaa5202a06496f7a5a0df 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -367,7 +367,7 @@ static void mlx4_en_get_ethtool_stats(struct net_device *dev,
 
 	spin_lock_bh(&priv->stats_lock);
 
-	mlx4_en_fold_software_stats(dev);
+	mlx4_en_fold_software_stats(dev, NULL);
 
 	for (i = 0; i < NUM_MAIN_STATS; i++, bitmap_iterator_inc(&it))
 		if (bitmap_iterator_test(&it))
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 091b904262bc7932d3edf99cf850affb23b9ce6e..6ee9e31e59c392cb88faedf9c541b3bc6d195228 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1321,13 +1321,9 @@ static void mlx4_en_tx_timeout(struct net_device *dev)
 static struct rtnl_link_stats64 *
 mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 {
-	struct mlx4_en_priv *priv = netdev_priv(dev);
-
-	spin_lock_bh(&priv->stats_lock);
-	mlx4_en_fold_software_stats(dev);
 	netdev_stats_to_stats64(stats, &dev->stats);
-	spin_unlock_bh(&priv->stats_lock);
-
+	/* Must be called after netdev_stats_to_stats64() */
+	mlx4_en_fold_software_stats(dev, stats);
 	return stats;
 }
 
@@ -1810,7 +1806,7 @@ void mlx4_en_stop_port(struct net_device *dev, int detach)
 	netif_tx_disable(dev);
 
 	spin_lock_bh(&priv->stats_lock);
-	mlx4_en_fold_software_stats(dev);
+	mlx4_en_fold_software_stats(dev, NULL);
 	/* Set port as not active */
 	priv->port_up = false;
 	spin_unlock_bh(&priv->stats_lock);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c b/drivers/net/ethernet/mellanox/mlx4/en_port.c
index 9166d90e732858610b1407fe85cbf6cbe27f5e0b..eea042a18e3cfba62745ece4ca673c2db967b9aa 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c
@@ -147,7 +147,8 @@ static unsigned long en_stats_adder(__be64 *start, __be64 *next, int num)
 	return ret;
 }
 
-void mlx4_en_fold_software_stats(struct net_device *dev)
+void mlx4_en_fold_software_stats(struct net_device *dev,
+				 struct rtnl_link_stats64 *stats)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	struct mlx4_en_dev *mdev = priv->mdev;
@@ -165,9 +166,13 @@ void mlx4_en_fold_software_stats(struct net_device *dev)
 		packets += READ_ONCE(ring->packets);
 		bytes   += READ_ONCE(ring->bytes);
 	}
-	dev->stats.rx_packets = packets;
-	dev->stats.rx_bytes = bytes;
-
+	if (stats) {
+		stats->rx_packets = packets;
+		stats->rx_bytes = bytes;
+	} else {
+		dev->stats.rx_packets = packets;
+		dev->stats.rx_bytes = bytes;
+	}
 	packets = 0;
 	bytes = 0;
 	for (i = 0; i < priv->tx_ring_num[TX]; i++) {
@@ -176,8 +181,13 @@ void mlx4_en_fold_software_stats(struct net_device *dev)
 		packets += READ_ONCE(ring->packets);
 		bytes   += READ_ONCE(ring->bytes);
 	}
-	dev->stats.tx_packets = packets;
-	dev->stats.tx_bytes = bytes;
+	if (stats) {
+		stats->tx_packets = packets;
+		stats->tx_bytes = bytes;
+	} else {
+		dev->stats.tx_packets = packets;
+		dev->stats.tx_bytes = bytes;
+	}
 }
 
 int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset)
@@ -208,7 +218,7 @@ int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset)
 
 	spin_lock_bh(&priv->stats_lock);
 
-	mlx4_en_fold_software_stats(dev);
+	mlx4_en_fold_software_stats(dev, NULL);
 
 	priv->port_stats.rx_chksum_good = 0;
 	priv->port_stats.rx_chksum_none = 0;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 20a936428f4a44c8ca0a7161855da310f9166b50..92dbb41f425b282e9ab7c8d534f091da0ba661c3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -755,7 +755,8 @@ void mlx4_en_rx_irq(struct mlx4_cq *mcq);
 int mlx4_SET_MCAST_FLTR(struct mlx4_dev *dev, u8 port, u64 mac, u64 clear, u8 mode);
 int mlx4_SET_VLAN_FLTR(struct mlx4_dev *dev, struct mlx4_en_priv *priv);
 
-void mlx4_en_fold_software_stats(struct net_device *dev);
+void mlx4_en_fold_software_stats(struct net_device *dev,
+				 struct rtnl_link_stats64 *stats);
 int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset);
 int mlx4_en_QUERY_PORT(struct mlx4_en_dev *mdev, u8 port);
 

^ permalink raw reply related

* Re: [PATCH v5 net-next 0/7] Support Armada 37xx SoC (ARMv8 64-bits) in mvneta driver
From: Gregory CLEMENT @ 2016-12-01 16:09 UTC (permalink / raw)
  To: Marcin Wojtas
  Cc: David S. Miller, linux-kernel, netdev, Jisheng Zhang,
	Arnd Bergmann, Jason Cooper, Andrew Lunn, Sebastian Hesselbarth,
	Thomas Petazzoni, linux-arm-kernel@lists.infradead.org,
	Nadav Haklai, Dmitri Epshtein, Yelena Krivosheev
In-Reply-To: <CAPv3WKdWyS0wCVsRKR86qpMx4r8NN9=RmjM9oHFy2mmvaR5PAA@mail.gmail.com>

Hi Marcin,
 
 On jeu., déc. 01 2016, Marcin Wojtas <mw@semihalf.com> wrote:

> Hi Gregory,
>
> Checked on a388-gp with and without HWBM, also both ports work on
> a3700 (second one after changing to sgmii).
>
> Tested-by: Marcin Wojtas <mw@semihalf.com>

Thanks, I am going to send a new version with tour tested-by and the dts
fix for the second port.

Gregory

>
> Best regards,
> Marcin
>
> 2016-11-30 22:42 GMT+01:00 Gregory CLEMENT <gregory.clement@free-electrons.com>:
>> Hi,
>>
>> The Armada 37xx is a new ARMv8 SoC from Marvell using same network
>> controller as the older Armada 370/38x/XP SoCs. This series adapts the
>> driver in order to be able to use it on this new SoC. The main changes
>> are:
>>
>> - 64-bits support: the first patches allow using the driver on a 64-bit
>>   architecture.
>>
>> - MBUS support: the mbus configuration is different on Armada 37xx
>>   from the older SoCs.
>>
>> - per cpu interrupt: Armada 37xx do not support per cpu interrupt for
>>   the NETA IP, the non-per-CPU behavior was added back.
>>
>> The first patch is an optimization in the rx path in swbm mode.
>> The second patch remove unnecessary allocation for HWBM.
>> The first item is solved by patches 4 and 5.
>> The 2 last items are solved by patch 6.
>> In patch 7 the dt support is added.
>>
>> Beside Armada 37xx, this series have been again tested on Armada XP
>> and Armada 38x (with Hardware Buffer Management and with Software
>> Buffer Management).
>>
>> This is the 5th version of the series:
>> - 1st version:
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/469588.html
>>
>> - 2nd version:
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/470476.html
>>
>> - 3rd version:
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/470901.html
>>
>> - 4th version:
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/471039.html
>>
>> Changelog:
>> v4 -> v5:
>>  - remove unnecessary cast in patch 3
>>
>> v3 -> v4:
>>  - Adding new patch: "net: mvneta: do not allocate buffer in rxq init
>>    with HWBM"
>>
>>  - Simplify the HWBM case in patch 3 as suggested by Marcin
>>
>> v2 -> v3:
>>  - Adding patch 1 "Optimize rx path for small frame"
>>
>>  - Fix the kbuild error by moving the "phys_addr += pp->rx_offset_correction;"
>>   line from patch 2 to patch 3 where rx_offset_correction is introduced.
>>
>>  - Move the memory allocation of the buf_virt_addr of the rxq to be
>>    called by the probe function in order to avoid a memory leak.
>>
>> Thanks,
>>
>> Gregory
>>
>> Gregory CLEMENT (5):
>>   net: mvneta: Optimize rx path for small frame
>>   net: mvneta: Do not allocate buffer in rxq init with HWBM
>>   net: mvneta: Use cacheable memory to store the rx buffer virtual address
>>   net: mvneta: Only disable mvneta_bm for 64-bits
>>   ARM64: dts: marvell: Add network support for Armada 3700
>>
>> Marcin Wojtas (2):
>>   net: mvneta: Convert to be 64 bits compatible
>>   net: mvneta: Add network support for Armada 3700 SoC
>>
>>  Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt |   7 +-
>>  arch/arm64/boot/dts/marvell/armada-3720-db.dts                    |  23 +++++-
>>  arch/arm64/boot/dts/marvell/armada-37xx.dtsi                      |  23 +++++-
>>  drivers/net/ethernet/marvell/Kconfig                              |  10 +-
>>  drivers/net/ethernet/marvell/mvneta.c                             | 344 +++++++++++++++++++++++++++++++++++++++++++++++++++---------------------
>>  5 files changed, 305 insertions(+), 102 deletions(-)
>>
>> base-commit: 436accebb53021ef7c63535f60bda410aa87c136
>> --
>> git-series 0.8.10

-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

^ permalink raw reply

* Re: [flamebait] xdp, well meaning but pointless
From: Florian Westphal @ 2016-12-01 16:06 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Florian Westphal, netdev
In-Reply-To: <20161201145834.GA569@pox.localdomain>

Thomas Graf <tgraf@suug.ch> wrote:
> On 12/01/16 at 10:11am, Florian Westphal wrote:
> > Aside from this, XDP, like DPDK, is a kernel bypass.
> > You might say 'Its just stack bypass, not a kernel bypass!'.
> > But what does that mean exactly?  That packets can still be passed
> > onward to normal stack?
> > Bypass solutions like netmap can also inject packets back to
> > kernel stack again.
> 
> I have a fundamental issue with the approach of exporting packets into
> user space and reinjecting them: Once the packet leaves the kernel,
> any security guarantees are off. I have no control over what is
> running in user space and whether whatever listener up there has been
> compromised or not. To me, that's a no go, in particular for servers
> hosting multi tenant workloads. This is one of the main reasons why
> XDP, in particular in combination with BPF, is very interesting to me.

Funny, I see it exactly the other way around :)

To me packet coming from this "userspace injection" is no different than
a tun/tap, or any other packet coming from network.

I see no change or increase in attack surface.

^ permalink raw reply

* Re: [PATCH v5 net-next 0/7] Support Armada 37xx SoC (ARMv8 64-bits) in mvneta driver
From: Marcin Wojtas @ 2016-12-01 16:07 UTC (permalink / raw)
  To: Gregory CLEMENT
  Cc: David S. Miller, linux-kernel, netdev, Jisheng Zhang,
	Arnd Bergmann, Jason Cooper, Andrew Lunn, Sebastian Hesselbarth,
	Thomas Petazzoni, linux-arm-kernel@lists.infradead.org,
	Nadav Haklai, Dmitri Epshtein, Yelena Krivosheev
In-Reply-To: <cover.0270f6d2413a709521fe2c8c17fbebea6f2e78d1.1480542157.git-series.gregory.clement@free-electrons.com>

Hi Gregory,

Checked on a388-gp with and without HWBM, also both ports work on
a3700 (second one after changing to sgmii).

Tested-by: Marcin Wojtas <mw@semihalf.com>

Best regards,
Marcin

2016-11-30 22:42 GMT+01:00 Gregory CLEMENT <gregory.clement@free-electrons.com>:
> Hi,
>
> The Armada 37xx is a new ARMv8 SoC from Marvell using same network
> controller as the older Armada 370/38x/XP SoCs. This series adapts the
> driver in order to be able to use it on this new SoC. The main changes
> are:
>
> - 64-bits support: the first patches allow using the driver on a 64-bit
>   architecture.
>
> - MBUS support: the mbus configuration is different on Armada 37xx
>   from the older SoCs.
>
> - per cpu interrupt: Armada 37xx do not support per cpu interrupt for
>   the NETA IP, the non-per-CPU behavior was added back.
>
> The first patch is an optimization in the rx path in swbm mode.
> The second patch remove unnecessary allocation for HWBM.
> The first item is solved by patches 4 and 5.
> The 2 last items are solved by patch 6.
> In patch 7 the dt support is added.
>
> Beside Armada 37xx, this series have been again tested on Armada XP
> and Armada 38x (with Hardware Buffer Management and with Software
> Buffer Management).
>
> This is the 5th version of the series:
> - 1st version:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/469588.html
>
> - 2nd version:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/470476.html
>
> - 3rd version:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/470901.html
>
> - 4th version:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/471039.html
>
> Changelog:
> v4 -> v5:
>  - remove unnecessary cast in patch 3
>
> v3 -> v4:
>  - Adding new patch: "net: mvneta: do not allocate buffer in rxq init
>    with HWBM"
>
>  - Simplify the HWBM case in patch 3 as suggested by Marcin
>
> v2 -> v3:
>  - Adding patch 1 "Optimize rx path for small frame"
>
>  - Fix the kbuild error by moving the "phys_addr += pp->rx_offset_correction;"
>   line from patch 2 to patch 3 where rx_offset_correction is introduced.
>
>  - Move the memory allocation of the buf_virt_addr of the rxq to be
>    called by the probe function in order to avoid a memory leak.
>
> Thanks,
>
> Gregory
>
> Gregory CLEMENT (5):
>   net: mvneta: Optimize rx path for small frame
>   net: mvneta: Do not allocate buffer in rxq init with HWBM
>   net: mvneta: Use cacheable memory to store the rx buffer virtual address
>   net: mvneta: Only disable mvneta_bm for 64-bits
>   ARM64: dts: marvell: Add network support for Armada 3700
>
> Marcin Wojtas (2):
>   net: mvneta: Convert to be 64 bits compatible
>   net: mvneta: Add network support for Armada 3700 SoC
>
>  Documentation/devicetree/bindings/net/marvell-armada-370-neta.txt |   7 +-
>  arch/arm64/boot/dts/marvell/armada-3720-db.dts                    |  23 +++++-
>  arch/arm64/boot/dts/marvell/armada-37xx.dtsi                      |  23 +++++-
>  drivers/net/ethernet/marvell/Kconfig                              |  10 +-
>  drivers/net/ethernet/marvell/mvneta.c                             | 344 +++++++++++++++++++++++++++++++++++++++++++++++++++---------------------
>  5 files changed, 305 insertions(+), 102 deletions(-)
>
> base-commit: 436accebb53021ef7c63535f60bda410aa87c136
> --
> git-series 0.8.10

^ permalink raw reply

* Re: [PATCH v5 net-next 7/7] ARM64: dts: marvell: Add network support for Armada 3700
From: Marcin Wojtas @ 2016-12-01 16:05 UTC (permalink / raw)
  To: Gregory CLEMENT
  Cc: David S. Miller, linux-kernel, netdev, Jisheng Zhang,
	Arnd Bergmann, Jason Cooper, Andrew Lunn, Sebastian Hesselbarth,
	Thomas Petazzoni, linux-arm-kernel@lists.infradead.org,
	Nadav Haklai, Dmitri Epshtein, Yelena Krivosheev
In-Reply-To: <f205231e640a2324b8f007073cf54b166a263ed4.1480542157.git-series.gregory.clement@free-electrons.com>

Hi Gregory,

2016-11-30 22:42 GMT+01:00 Gregory CLEMENT <gregory.clement@free-electrons.com>:
> Add neta nodes for network support both in device tree for the SoC and
> the board.
>
> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
> ---
>  arch/arm64/boot/dts/marvell/armada-3720-db.dts | 23 +++++++++++++++++++-
>  arch/arm64/boot/dts/marvell/armada-37xx.dtsi   | 23 +++++++++++++++++++-
>  2 files changed, 46 insertions(+), 0 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/marvell/armada-3720-db.dts b/arch/arm64/boot/dts/marvell/armada-3720-db.dts
> index 1372e9a6aaa4..c8b82e4145de 100644
> --- a/arch/arm64/boot/dts/marvell/armada-3720-db.dts
> +++ b/arch/arm64/boot/dts/marvell/armada-3720-db.dts
> @@ -81,3 +81,26 @@
>  &pcie0 {
>         status = "okay";
>  };
> +
> +&mdio {
> +       status = "okay";
> +       phy0: ethernet-phy@0 {
> +               reg = <0>;
> +       };
> +
> +       phy1: ethernet-phy@1 {
> +               reg = <1>;
> +       };
> +};
> +
> +&eth0 {
> +       phy-mode = "rgmii-id";
> +       phy = <&phy0>;
> +       status = "okay";
> +};
> +
> +&eth1 {
> +       phy-mode = "rgmii-id";

Should be "sgmii".

Best regards,
Marcin

^ permalink raw reply

* Re: [WIP] net+mlx4: auto doorbell
From: Jesper Dangaard Brouer @ 2016-12-01 16:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Saeed Mahameed, Rick Jones, Linux Netdev List, Saeed Mahameed,
	Tariq Toukan, brouer
In-Reply-To: <1480602274.18162.285.camel@edumazet-glaptop3.roam.corp.google.com>

On Thu, 01 Dec 2016 06:24:34 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Thu, 2016-12-01 at 13:05 +0100, Jesper Dangaard Brouer wrote:
> > On Wed, 30 Nov 2016 18:27:45 +0200
> > Saeed Mahameed <saeedm@dev.mellanox.co.il> wrote:
> >   
> > > >> All in all, this is risky business :),  the right way to go is to
> > > >> force the upper layer to use xmit-more and delay doorbells/use bulking
> > > >> but from the same context (xmit routine).  For example see
> > > >> Achiad's suggestion (attached in Jesper's response), he used stop
> > > >> queue to force the stack to queue up packets (TX bulking)
> > > >> which would set xmit-more and will use the next completion to
> > > >> release the "stopped" ring TXQ rather than hit the doorbell on
> > > >> behalf of it.    
> > > >
> > > > Well, you depend on having a higher level queue like a qdisc.
> > > >
> > > > Some users do not use a qdisc.
> > > > If you stop the queue, they no longer can send anything -> drops.
> > > >  
> > 
> > You do have a point that stopping the device might not be the best way
> > to create a push-back (to allow stack queue packets).
> > 
> >  netif_tx_stop_queue() / __QUEUE_STATE_DRV_XOFF
> > 
> >   
> > > In this case, i think they should implement their own bulking (pktgen
> > > is not a good example) but XDP can predict if it has more packets to
> > > xmit  as long as all of them fall in the same NAPI cycle.
> > > Others should try and do the same.  
> > 
> > I actually agree with Saeed here.
> > 
> > Maybe we can come up with another __QUEUE_STATE_xxx that informs the
> > upper layer what the driver is doing.  Then users not using a qdisc can
> > use this indication (like the qdisc could).  (qdisc-bypass users already
> > check the QUEUE_STATE flags e.g. via netif_xmit_frozen_or_drv_stopped).  
> 
> Can you explain how this is going to help trafgen using AF_PACKET with
> Qdisc bypass ?
> 
> Say trafgen wants to send 10 or 1000 packets back to back (as fast as
> possible)
>
> With my proposal, only the first is triggering a doorbell from
> ndo_start_xmit(). Following ones are driven by TX completion logic, or
> BQL if we can push packets faster than TX interrupt can be
> delivered/handled.
> 
> If you stop the queue (with yet another atomic operations to stop/unstop
> btw), packet_direct_xmit() will have to drop trafgen packets on the
> floor.

I think you misunderstood my concept[1].  I don't want to stop the
queue. The new __QUEUE_STATE_FLUSH_NEEDED does not stop the queue, is
it just indicating that someone need to flush/ring-doorbell.  Maybe it
need another name, because it also indicate that the driver can see
that its TX queue is so busy that we don't need to call it immediately.
The qdisc layer can then choose to enqueue instead if doing direct xmit.

When qdisc layer or trafgen/af_packet see this indication it knows it
should/must flush the queue when it don't have more work left.  Perhaps
through net_tx_action(), by registering itself and e.g. if qdisc_run()
is called and queue is empty then check if queue needs a flush. I would
also allow driver to flush and clear this bit.

I just see it as an extension of your solution, as we still need the
driver to figure out then the doorbell/flush can be delayed.
p.s. don't be discouraged by this feedback, I'm just very excited and
happy that your are working on a solution in this area. As this is a
problem area that I've not been able to solve myself for the last
approx 2 years. Keep up the good work!

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

[1] http://lkml.kernel.org/r/20161130233015.3de95356@redhat.com

^ permalink raw reply

* Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters
From: Eric Dumazet @ 2016-12-01 15:55 UTC (permalink / raw)
  To: Saeed Mahameed; +Cc: Jesper Dangaard Brouer, David Miller, netdev, Tariq Toukan
In-Reply-To: <CALzJLG8O28ZBS3sdqLpohPZtpLbLs228K4aO6hYd4URsA3yefA@mail.gmail.com>

On Thu, 2016-12-01 at 17:38 +0200, Saeed Mahameed wrote:

> 
> Hi Eric, Thanks for the patch, I already acked it.

Thanks !

> 
> I have one educational question (not related to this patch, but
> related to stats reading in general).
> I was wondering why do we need to disable bh every time we read stats
> "spin_lock_bh" ? is it essential ?
> 
> I checked and in mlx4 we don't hold stats_lock in softirq
> (en_rx.c/en_tx.c), so I don't see any deadlock risk in here..

Excellent question, and I chose to keep the spinlock.

That would be doable, only if we do not overwrite dev->stats.

Current code is :

static struct rtnl_link_stats64 *
mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
{
        struct mlx4_en_priv *priv = netdev_priv(dev);

        spin_lock_bh(&priv->stats_lock);
        mlx4_en_fold_software_stats(dev);
        netdev_stats_to_stats64(stats, &dev->stats);
        spin_unlock_bh(&priv->stats_lock);

        return stats;
}

If you remove the spin_lock_bh() :

static struct rtnl_link_stats64 *
mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
{
        struct mlx4_en_priv *priv = netdev_priv(dev);

        mlx4_en_fold_software_stats(dev); // possible races

        netdev_stats_to_stats64(stats, &dev->stats);

        return stats;
}

1) one mlx4_en_fold_software_stats(dev) could be preempted
on a CONFIG_PREEMPT kernel, or interrupted by long irqs.

2) Another cpu would also call mlx4_en_fold_software_stats(dev) while
   first cpu is busy.

3) Then when resuming first cpu/thread, part of the dev->stats fieds 
would be updated with 'old counters',
while another thread might have updated them with newer values.

4) A SNMP reader could then get counters that are not monotonically
increasing,
which would be confusing/buggy.

So removing the spinlock is doable, but needs to add a new parameter
to mlx4_en_fold_software_stats() and call netdev_stats_to_stats64()
before mlx4_en_fold_software_stats(dev)

static struct rtnl_link_stats64 *
mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
{
        struct mlx4_en_priv *priv = netdev_priv(dev);

        netdev_stats_to_stats64(stats, &dev->stats);

	// Passing a non NULL stats asks mlx4_en_fold_software_stats()
	// to not update dev->stats, but stats directly.

        mlx4_en_fold_software_stats(dev, stats)


        return stats;
}

^ permalink raw reply

* Re: [flamebait] xdp, well meaning but pointless
From: Hannes Frederic Sowa @ 2016-12-01 15:52 UTC (permalink / raw)
  To: Thomas Graf, Florian Westphal; +Cc: netdev
In-Reply-To: <20161201145834.GA569@pox.localdomain>

Hi,

On 01.12.2016 15:58, Thomas Graf wrote:
> On 12/01/16 at 10:11am, Florian Westphal wrote:
>> Aside from this, XDP, like DPDK, is a kernel bypass.
>> You might say 'Its just stack bypass, not a kernel bypass!'.
>> But what does that mean exactly?  That packets can still be passed
>> onward to normal stack?
>> Bypass solutions like netmap can also inject packets back to
>> kernel stack again.
> 
> I have a fundamental issue with the approach of exporting packets into
> user space and reinjecting them: Once the packet leaves the kernel,
> any security guarantees are off. I have no control over what is
> running in user space and whether whatever listener up there has been
> compromised or not. To me, that's a no go, in particular for servers
> hosting multi tenant workloads. This is one of the main reasons why
> XDP, in particular in combination with BPF, is very interesting to me.

First of all, this is a rant targeted at XDP and not at eBPF as a whole.
XDP manipulates packets at free will and thus all security guarantees
are off as well as in any user space solution.

Secondly user space provides policy, acl, more controlled memory
protection, restartability and better debugability. If I had multi
tenant workloads I would definitely put more complex "business/acl"
logic into user space, so I can make use of LSM and other features to
especially prevent a network facing service to attack the tenants. If
stuff gets put into the kernel you run user controlled code in the
kernel exposing a much bigger attack vector.

What use case do you see in XDP specifically e.g. for container networking?

>> b). with regards to a programmable data path: IFF one wants to do this
>> in kernel (and thats a big if), it seems much more preferrable to provide
>> a config/data-based approach rather than a programmable one.  If you want
>> full freedom DPDK is architecturally just too powerful to compete with.
> 
> I must have missed the legal disclaimer that is usually put in front
> of the DPDK marketing show :-)
>
> I don't want full freedom. I want programmability with stack integration
> at sufficient speed and the ability to benefit from the hardware
> abstractions that the kernel provides.
> 
>> Proponents of XDP sometimes provide usage examples.
>> Lets look at some of these.
> 
> [ I won't comment on any of the other use cases because they are of no
>   interest to me ]
> 
>> * Load balancer
>> State holding algorithm need sorting and searching, so also no fit for
>> eBPF (could be exposed by function exports, but then can we do DoS by
>> finding worst case scenarios?).
>>
>> Also again needs way to forward frame out via another interface.
>>
>> For cases where packet gets sent out via same interface it would appear
>> to be easier to use port mirroring in a switch and use stochastic filtering
>> on end nodes to determine which host should take responsibility.
>>
>> XDP plus: central authority over how distribution will work in case
>> nodes are added/removed from pool.
>> But then again, it will be easier to hande this with netmap/dpdk where
>> more complicated scheduling algorithms can be used.
> 
> I agree with you if the LB is a software based appliance in either a
> dedicated VM or on dedicated baremetal.
> 
> The reality is turning out to be different in many cases though, LB
> needs to be performed not only for north south but east west as well.
> So even if I would handle LB for traffic entering my datacenter in user
> space, I will need the same LB for packets from my applications and
> I definitely don't want to move all of that into user space.

The open question to me is why is programmability needed here.

Look at the discussion about ECMP and consistent hashing. It is not very
easy to actually write this code correctly. Why can't we just put C code
into the kernel that implements this once and for all and let user space
update the policies?

Load balancers have to deal correctly with ICMP packets, e.g. they even
have to be duplicated to every ECMP route. This seems to be problematic
to do in eBPF programs due to looping constructs so you end up with
complicated user space anyway.

>> * early drop/filtering.
>> While its possible to do "u32" like filters with ebpf, all modern nics
>> support ntuple filtering in hardware, which is going to be faster because
>> such packet will never even be signalled to the operating system.
>> For more complicated cases (e.g. doing socket lookup to check if particular
>> packet does match bound socket (and expected sequence numbers etc) I don't
>> see easy ways to do that with XDP (and without sk_buff context).
>> Providing it via function exports is possible of course, but that will only
>> result in an "arms race" where we will see special-sauce functions
>> all over the place -- DoS will always attempt to go for something
>> that is difficult to filter against, cf. all the recent volume-based
>> floodings.
> 
> You probably put this last because this was the most difficult to
> shoot down ;-)
> 
> The benefits of XDP for this use case are extremely obvious in combination
> with local applications which need to be protected. ntuple filters won't
> cut it. They are limited and subject to a certain rate at which they
> can be configured. Any serious mitigation will require stateful filtering
> with at least minimal L7 matching abilities and this is exactly where XDP
> will excel.

In my experience and research of DoS attacks you certainly want to put a
bit more logic into a filter than to look up something from hash tables
and drop it then. You certainly also want to have some more logic than
32 * 4096 instructions to execute, e.g. parsing and matching of DNS/NTP
packets with certain conditions and side look-ups. If you seriously do
that stuff you end up with a highly optimized programs containing
stochastic filters and also complex database logic.

If I want to drop based on hash table lookups, as Florian wrote, I would
let the hardware do that and assemble the tables in user space.

Bye,
Hannes

^ permalink raw reply

* [PATCH 1/1] NET: usb: qmi_wwan: add support for Telit LE922A PID 0x1040
From: Daniele Palmas @ 2016-12-01 15:52 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: netdev, Daniele Palmas
In-Reply-To: <1480607525-23044-1-git-send-email-dnlplm@gmail.com>

This patch adds support for PID 0x1040 of Telit LE922A.

The qmi adapter requires to have DTR set for proper working,
so QMI_WWAN_QUIRK_DTR has been enabled.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
---
 drivers/net/usb/qmi_wwan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 3ff76c6..6fe1cdb 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -894,6 +894,7 @@ static const struct usb_device_id products[] = {
 	{QMI_FIXED_INTF(0x1bbb, 0x0203, 2)},	/* Alcatel L800MA */
 	{QMI_FIXED_INTF(0x2357, 0x0201, 4)},	/* TP-LINK HSUPA Modem MA180 */
 	{QMI_FIXED_INTF(0x2357, 0x9000, 4)},	/* TP-LINK MA260 */
+	{QMI_QUIRK_SET_DTR(0x1bc7, 0x1040, 2)},	/* Telit LE922A */
 	{QMI_FIXED_INTF(0x1bc7, 0x1200, 5)},	/* Telit LE920 */
 	{QMI_FIXED_INTF(0x1bc7, 0x1201, 2)},	/* Telit LE920 */
 	{QMI_FIXED_INTF(0x1c9e, 0x9b01, 3)},	/* XS Stick W100-2 from 4G Systems */
-- 
2.7.4

^ permalink raw reply related

* [PATCH 0/1] NET: usb: qmi_wwan: add support for Telit LE922A PID 0x1040
From: Daniele Palmas @ 2016-12-01 15:52 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: netdev, Daniele Palmas

This patch adds support for PID 0x1040 of Telit LE922A.

The qmi adapter requires to have DTR set for proper working,
so QMI_WWAN_QUIRK_DTR has been enabled.

Following verbose lsusb output of the composition:

Bus 003 Device 006: ID 1bc7:1040 Telit Wireless Solutions 
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0        64
  idVendor           0x1bc7 Telit Wireless Solutions
  idProduct          0x1040 
  bcdDevice            3.10
  iManufacturer           1 Android
  iProduct                2 Android
  iSerial                 3 359fb2
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength          281
    bNumInterfaces          7
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0x80
      (Bus Powered)
    MaxPower              500mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass    255 Vendor Specific Subclass
      bInterfaceProtocol    255 Vendor Specific Protocol
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x01  EP 1 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        1
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass     66 
      bInterfaceProtocol      1 
      iInterface              4 ADB Interface
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        2
      bAlternateSetting       0
      bNumEndpoints           3
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass    255 Vendor Specific Subclass
      bInterfaceProtocol    255 Vendor Specific Protocol
      iInterface              0 
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x84  EP 4 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval               9
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x83  EP 3 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x03  EP 3 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        3
      bAlternateSetting       0
      bNumEndpoints           3
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass      0 
      bInterfaceProtocol      0 
      iInterface              0 
      ** UNRECOGNIZED:  05 24 00 10 01
      ** UNRECOGNIZED:  05 24 01 00 00
      ** UNRECOGNIZED:  04 24 02 02
      ** UNRECOGNIZED:  05 24 06 00 00
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x86  EP 6 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x000a  1x 10 bytes
        bInterval               9
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x85  EP 5 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x04  EP 4 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        4
      bAlternateSetting       0
      bNumEndpoints           3
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass      0 
      bInterfaceProtocol      0 
      iInterface              0 
      ** UNRECOGNIZED:  05 24 00 10 01
      ** UNRECOGNIZED:  05 24 01 00 00
      ** UNRECOGNIZED:  04 24 02 02
      ** UNRECOGNIZED:  05 24 06 00 00
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x88  EP 8 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x000a  1x 10 bytes
        bInterval               9
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x87  EP 7 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x05  EP 5 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        5
      bAlternateSetting       0
      bNumEndpoints           3
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass      0 
      bInterfaceProtocol      0 
      iInterface              0 
      ** UNRECOGNIZED:  05 24 00 10 01
      ** UNRECOGNIZED:  05 24 01 00 00
      ** UNRECOGNIZED:  04 24 02 02
      ** UNRECOGNIZED:  05 24 06 00 00
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x8a  EP 10 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x000a  1x 10 bytes
        bInterval               9
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x89  EP 9 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x06  EP 6 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        6
      bAlternateSetting       0
      bNumEndpoints           3
      bInterfaceClass       255 Vendor Specific Class
      bInterfaceSubClass      0 
      bInterfaceProtocol      0 
      iInterface              0 
      ** UNRECOGNIZED:  05 24 00 10 01
      ** UNRECOGNIZED:  05 24 01 00 00
      ** UNRECOGNIZED:  04 24 02 02
      ** UNRECOGNIZED:  05 24 06 00 00
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x8c  EP 12 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x000a  1x 10 bytes
        bInterval               9
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x8b  EP 11 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x07  EP 7 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
Device Qualifier (for other device speed):
  bLength                10
  bDescriptorType         6
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0        64
  bNumConfigurations      1
Device Status:     0x0000
  (Bus Powered)


Daniele Palmas (1):
  NET: usb: qmi_wwan: add support for Telit LE922A PID 0x1040

 drivers/net/usb/qmi_wwan.c | 1 +
 1 file changed, 1 insertion(+)

-- 
2.7.4

^ permalink raw reply

* Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters
From: Saeed Mahameed @ 2016-12-01 15:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jesper Dangaard Brouer, David Miller, netdev, Tariq Toukan
In-Reply-To: <1480539652.18162.205.camel@edumazet-glaptop3.roam.corp.google.com>

On Wed, Nov 30, 2016 at 11:00 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2016-11-30 at 22:42 +0200, Saeed Mahameed wrote:
>> On Wed, Nov 30, 2016 at 7:35 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Wed, 2016-11-30 at 18:46 +0200, Saeed Mahameed wrote:
>> >
>> >> we had/still have the proper stats they are the ones that
>> >> mlx4_en_fold_software_stats is trying to cache into  (they always
>> >> exist),
>> >> but the ones that you are trying to read from (the mlx4 rings) are gone !
>> >>
>> >> This bug is totally new and as i warned, this is another symptom of
>> >> the real root cause (can't sleep while reading stats).
>> >>
>> >> Eric what do you suggest ? Keep pre-allocated MAX_RINGS stats  and
>> >> always iterate over all of them to query stats ?
>> >> what if you have one ring/none/1K ? how would you know how many to query ?
>> >
>> > I am suggesting I will fix the bug I introduced.
>> >
>> > Do not panic.
>> >
>> >
>>
>> Not at all, I trust you are the only one who is capable of providing
>> the best solution.
>> I am just trying to read your mind :-).
>>
>> As i said i like the solution and i want to adapt it to mlx5, so I am
>> a little bit enthusiastic :)
>
> What about the following fix guys ?
>
> As a bonus we update the stats right before they are sent to monitors
> via rtnetlink ;)

Hi Eric, Thanks for the patch, I already acked it.

I have one educational question (not related to this patch, but
related to stats reading in general).
I was wondering why do we need to disable bh every time we read stats
"spin_lock_bh" ? is it essential ?

I checked and in mlx4 we don't hold stats_lock in softirq
(en_rx.c/en_tx.c), so I don't see any deadlock risk in here..

 Thanks
Saeed.

^ permalink raw reply

* Re: [PATCH] ip6_offload: check segs for NULL in ipv6_gso_segment.
From: Eric Dumazet @ 2016-12-01 15:27 UTC (permalink / raw)
  To: Artem Savkov
  Cc: davem, netdev, linux-kernel, jstancek, steffen.klassert,
	alexander.h.duyck,
	YOSHIFUJI Hideaki / 吉藤英明
In-Reply-To: <20161201150703.dv463hxgqajqo6pm@shodan.usersys.redhat.com>

On Thu, 2016-12-01 at 16:07 +0100, Artem Savkov wrote:

> I am not, but this would have the same behavior as pre-07b26c9 code and
> IS_ERR_OR_NULL is used in ipv4's inet_gso_segment().

My concern might have been that IS_ERR_OR_NULL() considers the !ptr to
be unlikely.

But in this code path, we really can not tell.

segs == NULL can be quite likely in TUN case, because of DODGY bit

Commit 50c3a487d50756 replaced the perfectly fine :

if (!segs || IS_ERR(segs))

into dubious

if (IS_ERR_OR_NULL(segs))

segs = NULL is not an error, but use of IS_ERR_OR_NULL() might mislead
programmers trying to understand this code.

^ permalink raw reply

* Re: [PATCH net-next] mlx4: fix use-after-free in mlx4_en_fold_software_stats()
From: Saeed Mahameed @ 2016-12-01 15:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jesper Dangaard Brouer, David Miller, netdev, Tariq Toukan
In-Reply-To: <1480597326.18162.276.camel@edumazet-glaptop3.roam.corp.google.com>

On Thu, Dec 1, 2016 at 3:02 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> My recent commit to get more precise rx/tx counters in ndo_get_stats64()
> can lead to crashes at device dismantle, as Jesper found out.
>
> We must prevent mlx4_en_fold_software_stats() trying to access
> tx/rx rings if they are deleted.
>
> Fix this by adding a test against priv->port_up in
> mlx4_en_fold_software_stats()
>
> Calling mlx4_en_fold_software_stats() from mlx4_en_stop_port()
> allows us to eventually broadcast the latest/current counters to
> rtnetlink monitors.
>
> Fixes: 40931b85113d ("mlx4: give precise rx/tx bytes/packets counters")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-and-bisected-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> Cc: Saeed Mahameed <saeedm@dev.mellanox.co.il>

Acked-by: Saeed Mahameed <saeedm@mellanox.com>

^ permalink raw reply

* [PATCH 2/2] net: stmmac: unify mdio functions
From: Corentin Labbe @ 2016-12-01 15:19 UTC (permalink / raw)
  To: peppe.cavallaro, alexandre.torgue; +Cc: netdev, linux-kernel, Corentin Labbe
In-Reply-To: <1480605581-13350-1-git-send-email-clabbe.montjoie@gmail.com>

stmmac_mdio_{read|write} and stmmac_mdio_{read|write}_gmac4 are not
enought different for being split.
The only differences between thoses two functions are shift/mask for
addr/reg/clk_csr.

This patch introduce a per platform set of variable for setting thoses
shift/mask and unify mdio read and write functions.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
---
 drivers/net/ethernet/stmicro/stmmac/common.h       |   6 +
 .../net/ethernet/stmicro/stmmac/dwmac1000_core.c   |   6 +
 .../net/ethernet/stmicro/stmmac/dwmac100_core.c    |   7 ++
 drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c  |   6 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c  | 121 ++++-----------------
 5 files changed, 48 insertions(+), 98 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 5bd4c05..3ced2e1 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -507,6 +507,12 @@ struct mac_link {
 struct mii_regs {
 	unsigned int addr;	/* MII Address */
 	unsigned int data;	/* MII Data */
+	unsigned int addr_shift;	/* MII address shift */
+	unsigned int reg_shift;		/* MII reg shift */
+	unsigned int addr_mask;		/* MII address mask */
+	unsigned int reg_mask;		/* MII reg mask */
+	unsigned int clk_csr_shift;
+	unsigned int clk_csr_mask;
 };
 
 /* Helpers to manage the descriptors for chain and ring modes */
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
index 7df4ff1..b21d03f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c
@@ -534,6 +534,12 @@ struct mac_device_info *dwmac1000_setup(void __iomem *ioaddr, int mcbins,
 	mac->link.speed = GMAC_CONTROL_FES;
 	mac->mii.addr = GMAC_MII_ADDR;
 	mac->mii.data = GMAC_MII_DATA;
+	mac->mii.addr_shift = 11;
+	mac->mii.addr_mask = 0x0000F800;
+	mac->mii.reg_shift = 6;
+	mac->mii.reg_mask = 0x000007C0;
+	mac->mii.clk_csr_shift = 2;
+	mac->mii.clk_csr_mask = 0xF;
 
 	/* Get and dump the chip ID */
 	*synopsys_id = stmmac_get_synopsys_id(hwid);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
index 6418b2e..a1d582f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c
@@ -192,6 +192,13 @@ struct mac_device_info *dwmac100_setup(void __iomem *ioaddr, int *synopsys_id)
 	mac->link.speed = 0;
 	mac->mii.addr = MAC_MII_ADDR;
 	mac->mii.data = MAC_MII_DATA;
+	mac->mii.addr_shift = 11;
+	mac->mii.addr_mask = 0x0000F800;
+	mac->mii.reg_shift = 6;
+	mac->mii.reg_mask = 0x000007C0;
+	mac->mii.clk_csr_shift = 2;
+	mac->mii.clk_csr_mask = 0xF;
+
 	/* Synopsys Id is not available on old chips */
 	*synopsys_id = 0;
 
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index 51019b7..eaed7cb 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -430,6 +430,12 @@ struct mac_device_info *dwmac4_setup(void __iomem *ioaddr, int mcbins,
 	mac->link.speed = GMAC_CONFIG_FES;
 	mac->mii.addr = GMAC_MDIO_ADDR;
 	mac->mii.data = GMAC_MDIO_DATA;
+	mac->mii.addr_shift = 21;
+	mac->mii.addr_mask = GENMASK(25, 21);
+	mac->mii.reg_shift = 16;
+	mac->mii.reg_mask = GENMASK(20, 16);
+	mac->mii.clk_csr_shift = 8;
+	mac->mii.clk_csr_mask = GENMASK(11, 8);
 
 	/* Get and dump the chip ID */
 	*synopsys_id = stmmac_get_synopsys_id(hwid);
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index 6796c28..23322fd 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -42,13 +42,6 @@
 #define MII_GMAC4_WRITE			(1 << MII_GMAC4_GOC_SHIFT)
 #define MII_GMAC4_READ			(3 << MII_GMAC4_GOC_SHIFT)
 
-#define MII_PHY_ADDR_GMAC4_SHIFT	21
-#define MII_PHY_ADDR_GMAC4_MASK		GENMASK(25, 21)
-#define MII_PHY_REG_GMAC4_SHIFT		16
-#define MII_PHY_REG_GMAC4_MASK		GENMASK(20, 16)
-#define MII_CSR_CLK_GMAC4_SHIFT		8
-#define MII_CSR_CLK_GMAC4_MASK		GENMASK(11, 8)
-
 static int stmmac_mdio_busy_wait(void __iomem *ioaddr, unsigned int mii_addr)
 {
 	unsigned long curr;
@@ -68,8 +61,8 @@ static int stmmac_mdio_busy_wait(void __iomem *ioaddr, unsigned int mii_addr)
 /**
  * stmmac_mdio_read
  * @bus: points to the mii_bus structure
- * @phyaddr: MII addr reg bits 15-11
- * @phyreg: MII addr reg bits 10-6
+ * @phyaddr: MII addr
+ * @phyreg: MII reg
  * Description: it reads data from the MII register from within the phy device.
  * For the 7111 GMAC, we must set the bit 0 in the MII address register while
  * accessing the PHY registers.
@@ -83,9 +76,15 @@ static int stmmac_mdio_read(struct mii_bus *bus, int phyaddr, int phyreg)
 	unsigned int mii_data = priv->hw->mii.data;
 
 	int data;
-	u16 value = (((phyaddr << 11) & (0x0000F800)) |
-			((phyreg << 6) & (0x000007C0)));
-	value |= MII_BUSY | ((priv->clk_csr & 0xF) << 2);
+	u32 value = MII_BUSY;
+
+	value |= (phyaddr << priv->hw->mii.addr_shift)
+		& priv->hw->mii.addr_mask;
+	value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask;
+	value |= (priv->clk_csr & priv->hw->mii.clk_csr_mask)
+		<< priv->hw->mii.clk_csr_shift;
+	if (priv->plat->has_gmac4)
+		value |= MII_GMAC4_READ;
 
 	if (stmmac_mdio_busy_wait(priv->ioaddr, mii_address))
 		return -EBUSY;
@@ -104,8 +103,8 @@ static int stmmac_mdio_read(struct mii_bus *bus, int phyaddr, int phyreg)
 /**
  * stmmac_mdio_write
  * @bus: points to the mii_bus structure
- * @phyaddr: MII addr reg bits 15-11
- * @phyreg: MII addr reg bits 10-6
+ * @phyaddr: MII addr
+ * @phyreg: MII reg
  * @phydata: phy data
  * Description: it writes the data into the MII register from within the device.
  */
@@ -117,85 +116,16 @@ static int stmmac_mdio_write(struct mii_bus *bus, int phyaddr, int phyreg,
 	unsigned int mii_address = priv->hw->mii.addr;
 	unsigned int mii_data = priv->hw->mii.data;
 
-	u16 value =
-	    (((phyaddr << 11) & (0x0000F800)) | ((phyreg << 6) & (0x000007C0)))
-	    | MII_WRITE;
+	u32 value = MII_WRITE | MII_BUSY;
 
-	value |= MII_BUSY | ((priv->clk_csr & 0xF) << 2);
+	value |= (phyaddr << priv->hw->mii.addr_shift)
+		& priv->hw->mii.addr_mask;
+	value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask;
 
-	/* Wait until any existing MII operation is complete */
-	if (stmmac_mdio_busy_wait(priv->ioaddr, mii_address))
-		return -EBUSY;
-
-	/* Set the MII address register to write */
-	writel(phydata, priv->ioaddr + mii_data);
-	writel(value, priv->ioaddr + mii_address);
-
-	/* Wait until any existing MII operation is complete */
-	return stmmac_mdio_busy_wait(priv->ioaddr, mii_address);
-}
-
-/**
- * stmmac_mdio_read_gmac4
- * @bus: points to the mii_bus structure
- * @phyaddr: MII addr reg bits 25-21
- * @phyreg: MII addr reg bits 20-16
- * Description: it reads data from the MII register of GMAC4 from within
- * the phy device.
- */
-static int stmmac_mdio_read_gmac4(struct mii_bus *bus, int phyaddr, int phyreg)
-{
-	struct net_device *ndev = bus->priv;
-	struct stmmac_priv *priv = netdev_priv(ndev);
-	unsigned int mii_address = priv->hw->mii.addr;
-	unsigned int mii_data = priv->hw->mii.data;
-	int data;
-	u32 value = (((phyaddr << MII_PHY_ADDR_GMAC4_SHIFT) &
-		     (MII_PHY_ADDR_GMAC4_MASK)) |
-		     ((phyreg << MII_PHY_REG_GMAC4_SHIFT) &
-		     (MII_PHY_REG_GMAC4_MASK))) | MII_GMAC4_READ;
-
-	value |= MII_BUSY | ((priv->clk_csr & MII_CSR_CLK_GMAC4_MASK)
-		 << MII_CSR_CLK_GMAC4_SHIFT);
-
-	if (stmmac_mdio_busy_wait(priv->ioaddr, mii_address))
-		return -EBUSY;
-
-	writel(value, priv->ioaddr + mii_address);
-
-	if (stmmac_mdio_busy_wait(priv->ioaddr, mii_address))
-		return -EBUSY;
-
-	/* Read the data from the MII data register */
-	data = (int)readl(priv->ioaddr + mii_data);
-
-	return data;
-}
-
-/**
- * stmmac_mdio_write_gmac4
- * @bus: points to the mii_bus structure
- * @phyaddr: MII addr reg bits 25-21
- * @phyreg: MII addr reg bits 20-16
- * @phydata: phy data
- * Description: it writes the data into the MII register of GMAC4 from within
- * the device.
- */
-static int stmmac_mdio_write_gmac4(struct mii_bus *bus, int phyaddr, int phyreg,
-				   u16 phydata)
-{
-	struct net_device *ndev = bus->priv;
-	struct stmmac_priv *priv = netdev_priv(ndev);
-	unsigned int mii_address = priv->hw->mii.addr;
-	unsigned int mii_data = priv->hw->mii.data;
-
-	u32 value = (((phyaddr << MII_PHY_ADDR_GMAC4_SHIFT) &
-		     (MII_PHY_ADDR_GMAC4_MASK)) |
-		     ((phyreg << MII_PHY_REG_GMAC4_SHIFT) &
-		     (MII_PHY_REG_GMAC4_MASK))) | MII_GMAC4_WRITE;
-
-	value |= MII_BUSY | ((priv->clk_csr & MII_CSR_CLK_GMAC4_MASK)
-		 << MII_CSR_CLK_GMAC4_SHIFT);
+	value |= ((priv->clk_csr & priv->hw->mii.clk_csr_mask)
+		<< priv->hw->mii.clk_csr_shift);
+	if (priv->plat->has_gmac4)
+		value |= MII_GMAC4_WRITE;
 
 	/* Wait until any existing MII operation is complete */
 	if (stmmac_mdio_busy_wait(priv->ioaddr, mii_address))
@@ -305,13 +235,8 @@ int stmmac_mdio_register(struct net_device *ndev)
 #endif
 
 	new_bus->name = "stmmac";
-	if (priv->plat->has_gmac4) {
-		new_bus->read = &stmmac_mdio_read_gmac4;
-		new_bus->write = &stmmac_mdio_write_gmac4;
-	} else {
-		new_bus->read = &stmmac_mdio_read;
-		new_bus->write = &stmmac_mdio_write;
-	}
+	new_bus->read = &stmmac_mdio_read;
+	new_bus->write = &stmmac_mdio_write;
 
 	new_bus->reset = &stmmac_mdio_reset;
 	snprintf(new_bus->id, MII_BUS_ID_SIZE, "%s-%x",
-- 
2.7.3

^ permalink raw reply related

* [PATCH 1/2] net: stmmac: avoid Camelcase naming
From: Corentin Labbe @ 2016-12-01 15:19 UTC (permalink / raw)
  To: peppe.cavallaro, alexandre.torgue; +Cc: netdev, linux-kernel, Corentin Labbe

This patch simply rename regValue to value, like it was named in other
mdio functions.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index e3216e5..6796c28 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -83,14 +83,14 @@ static int stmmac_mdio_read(struct mii_bus *bus, int phyaddr, int phyreg)
 	unsigned int mii_data = priv->hw->mii.data;
 
 	int data;
-	u16 regValue = (((phyaddr << 11) & (0x0000F800)) |
+	u16 value = (((phyaddr << 11) & (0x0000F800)) |
 			((phyreg << 6) & (0x000007C0)));
-	regValue |= MII_BUSY | ((priv->clk_csr & 0xF) << 2);
+	value |= MII_BUSY | ((priv->clk_csr & 0xF) << 2);
 
 	if (stmmac_mdio_busy_wait(priv->ioaddr, mii_address))
 		return -EBUSY;
 
-	writel(regValue, priv->ioaddr + mii_address);
+	writel(value, priv->ioaddr + mii_address);
 
 	if (stmmac_mdio_busy_wait(priv->ioaddr, mii_address))
 		return -EBUSY;
-- 
2.7.3

^ permalink raw reply related

* Re: [PATCH] ip6_offload: check segs for NULL in ipv6_gso_segment.
From: Eric Dumazet @ 2016-12-01 15:19 UTC (permalink / raw)
  To: Artem Savkov
  Cc: davem, netdev, linux-kernel, jstancek, steffen.klassert,
	alexander.h.duyck
In-Reply-To: <1480602847.18162.288.camel@edumazet-glaptop3.roam.corp.google.com>

On Thu, 2016-12-01 at 06:34 -0800, Eric Dumazet wrote:
> On Thu, 2016-12-01 at 14:06 +0100, Artem Savkov wrote:
> > segs needs to be checked for being NULL in ipv6_gso_segment() before calling
> > skb_shinfo(segs), otherwise kernel can run into a NULL-pointer dereference:
> 
> 
> > Signed-off-by: Artem Savkov <asavkov@redhat.com>
> > ---
> >  
> 
> > diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
> > index 1fcf61f..89c59e6 100644
> > --- a/net/ipv6/ip6_offload.c
> > +++ b/net/ipv6/ip6_offload.c
> > @@ -99,7 +99,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
> >  		segs = ops->callbacks.gso_segment(skb, features);
> >  	}
> >  
> > -	if (IS_ERR(segs))
> > +	if (IS_ERR_OR_NULL(segs))
> >  		goto out;
> >  
> >  	gso_partial = !!(skb_shinfo(segs)->gso_type & SKB_GSO_PARTIAL);
> 
> Do you know when was this bug added ?
> 
> Are you sure this is the right fix ?
> 
> Which gso_segment() is returning NULL exactly ?

Oh never mind.

This is the same fix than 576a30eb64534 but applied to IPv6.

Thanks !

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH net-next v6 6/6] samples/bpf: add userspace example for prohibiting sockets
From: David Ahern @ 2016-12-01 15:13 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: netdev, daniel, ast, daniel, maheshb, tgraf
In-Reply-To: <20161201055953.GB42375@ast-mbp.thefacebook.com>

On 11/30/16 10:59 PM, Alexei Starovoitov wrote:
> On Wed, Nov 30, 2016 at 10:16:50AM -0800, David Ahern wrote:
>> Add examples preventing a process in a cgroup from opening a socket
>> based family, protocol and type.
>>
>> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ...
>> +++ b/samples/bpf/sock_flags_kern.c
>> @@ -0,0 +1,37 @@
>> +#include <uapi/linux/bpf.h>
>> +#include <linux/socket.h>
>> +#include "bpf_helpers.h"
>> +
>> +SEC("cgroup/sock1")
>> +int bpf_prog1(struct bpf_sock *sk)
>> +{
>> +	char fmt[] = "socket: family %d type %d protocol %d\n";
>> +
>> +	bpf_trace_printk(fmt, sizeof(fmt), sk->family, sk->type, sk->protocol);
>> +
>> +	/* block PF_INET6, SOCK_RAW, IPPROTO_ICMPV6 sockets
>> +	 * ie., make ping6 fail
>> +	 */
>> +	if (sk->family == PF_INET6 && sk->type == 3 && sk->protocol == 58)
>> +		return 0;
> 
> why not to use SOCK_RAW and IPPROTO_ICMPV6 instead of constants?

header file hell.

^ permalink raw reply

* Re: [PATCH] ip6_offload: check segs for NULL in ipv6_gso_segment.
From: Artem Savkov @ 2016-12-01 15:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, netdev, linux-kernel, jstancek, steffen.klassert,
	alexander.h.duyck
In-Reply-To: <1480602847.18162.288.camel@edumazet-glaptop3.roam.corp.google.com>

On Thu, Dec 01, 2016 at 06:34:07AM -0800, Eric Dumazet wrote:
> On Thu, 2016-12-01 at 14:06 +0100, Artem Savkov wrote:
> > segs needs to be checked for being NULL in ipv6_gso_segment() before calling
> > skb_shinfo(segs), otherwise kernel can run into a NULL-pointer dereference:
> 
> 
> > Signed-off-by: Artem Savkov <asavkov@redhat.com>
> > ---
> >  
> 
> > diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
> > index 1fcf61f..89c59e6 100644
> > --- a/net/ipv6/ip6_offload.c
> > +++ b/net/ipv6/ip6_offload.c
> > @@ -99,7 +99,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
> >  		segs = ops->callbacks.gso_segment(skb, features);
> >  	}
> >  
> > -	if (IS_ERR(segs))
> > +	if (IS_ERR_OR_NULL(segs))
> >  		goto out;
> >  
> >  	gso_partial = !!(skb_shinfo(segs)->gso_type & SKB_GSO_PARTIAL);
> 
> Do you know when was this bug added ?

It started to show up with 4.9-rc4, from what I see the culprit is
07b26c9 gso: Support partial splitting at the frag_list pointer

> Are you sure this is the right fix ?

I am not, but this would have the same behavior as pre-07b26c9 code and
IS_ERR_OR_NULL is used in ipv4's inet_gso_segment().

> Which gso_segment() is returning NULL exactly ?

Unfortunatelly I don't know that and I don't have a good reproducer, the
only way to reproduce this that I currently have is calling
virt-install.

-- 
Regards,
  Artem

^ permalink raw reply

* [PATCH scsi 2/3] scsi: cxgb4i,libcxgbi: add active open cmd for T6 adapters
From: Varun Prakash @ 2016-12-01 14:58 UTC (permalink / raw)
  To: martin.petersen, jejb; +Cc: linux-scsi, netdev, davem, indranil, varun
In-Reply-To: <cover.1480603332.git.varun@chelsio.com>

Add T6 active open cmd to open active connections
on T6 adapters.

Signed-off-by: Varun Prakash <varun@chelsio.com>
---
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c | 86 ++++++++++++++++++++++++++++++++------
 drivers/scsi/cxgbi/libcxgbi.c      |  6 +--
 2 files changed, 77 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
index 90522d4..a6fc990 100644
--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
+++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
@@ -188,7 +188,6 @@ static void send_act_open_req(struct cxgbi_sock *csk, struct sk_buff *skb,
 				struct l2t_entry *e)
 {
 	struct cxgb4_lld_info *lldi = cxgbi_cdev_priv(csk->cdev);
-	int t4 = is_t4(lldi->adapter_type);
 	int wscale = cxgbi_sock_compute_wscale(csk->mss_idx);
 	unsigned long long opt0;
 	unsigned int opt2;
@@ -231,7 +230,7 @@ static void send_act_open_req(struct cxgbi_sock *csk, struct sk_buff *skb,
 			csk, &req->local_ip, ntohs(req->local_port),
 			&req->peer_ip, ntohs(req->peer_port),
 			csk->atid, csk->rss_qid);
-	} else {
+	} else if (is_t5(lldi->adapter_type)) {
 		struct cpl_t5_act_open_req *req =
 				(struct cpl_t5_act_open_req *)skb->head;
 		u32 isn = (prandom_u32() & ~7UL) - 1;
@@ -259,12 +258,45 @@ static void send_act_open_req(struct cxgbi_sock *csk, struct sk_buff *skb,
 			csk, &req->local_ip, ntohs(req->local_port),
 			&req->peer_ip, ntohs(req->peer_port),
 			csk->atid, csk->rss_qid);
+	} else {
+		struct cpl_t6_act_open_req *req =
+				(struct cpl_t6_act_open_req *)skb->head;
+		u32 isn = (prandom_u32() & ~7UL) - 1;
+
+		INIT_TP_WR(req, 0);
+		OPCODE_TID(req) = cpu_to_be32(MK_OPCODE_TID(CPL_ACT_OPEN_REQ,
+							    qid_atid));
+		req->local_port = csk->saddr.sin_port;
+		req->peer_port = csk->daddr.sin_port;
+		req->local_ip = csk->saddr.sin_addr.s_addr;
+		req->peer_ip = csk->daddr.sin_addr.s_addr;
+		req->opt0 = cpu_to_be64(opt0);
+		req->params = cpu_to_be64(FILTER_TUPLE_V(
+				cxgb4_select_ntuple(
+					csk->cdev->ports[csk->port_id],
+					csk->l2t)));
+		req->rsvd = cpu_to_be32(isn);
+
+		opt2 |= T5_ISS_VALID;
+		opt2 |= RX_FC_DISABLE_F;
+		opt2 |= T5_OPT_2_VALID_F;
+
+		req->opt2 = cpu_to_be32(opt2);
+		req->rsvd2 = cpu_to_be32(0);
+		req->opt3 = cpu_to_be32(0);
+
+		log_debug(1 << CXGBI_DBG_TOE | 1 << CXGBI_DBG_SOCK,
+			  "csk t6 0x%p, %pI4:%u-%pI4:%u, atid %d, qid %u.\n",
+			  csk, &req->local_ip, ntohs(req->local_port),
+			  &req->peer_ip, ntohs(req->peer_port),
+			  csk->atid, csk->rss_qid);
 	}
 
 	set_wr_txq(skb, CPL_PRIORITY_SETUP, csk->port_id);
 
 	pr_info_ipaddr("t%d csk 0x%p,%u,0x%lx,%u, rss_qid %u.\n",
-		       (&csk->saddr), (&csk->daddr), t4 ? 4 : 5, csk,
+		       (&csk->saddr), (&csk->daddr),
+		       CHELSIO_CHIP_VERSION(lldi->adapter_type), csk,
 		       csk->state, csk->flags, csk->atid, csk->rss_qid);
 
 	cxgb4_l2t_send(csk->cdev->ports[csk->port_id], skb, csk->l2t);
@@ -275,7 +307,6 @@ static void send_act_open_req6(struct cxgbi_sock *csk, struct sk_buff *skb,
 			       struct l2t_entry *e)
 {
 	struct cxgb4_lld_info *lldi = cxgbi_cdev_priv(csk->cdev);
-	int t4 = is_t4(lldi->adapter_type);
 	int wscale = cxgbi_sock_compute_wscale(csk->mss_idx);
 	unsigned long long opt0;
 	unsigned int opt2;
@@ -293,10 +324,9 @@ static void send_act_open_req6(struct cxgbi_sock *csk, struct sk_buff *skb,
 
 	opt2 = RX_CHANNEL_V(0) |
 		RSS_QUEUE_VALID_F |
-		RX_FC_DISABLE_F |
 		RSS_QUEUE_V(csk->rss_qid);
 
-	if (t4) {
+	if (is_t4(lldi->adapter_type)) {
 		struct cpl_act_open_req6 *req =
 			    (struct cpl_act_open_req6 *)skb->head;
 
@@ -321,7 +351,7 @@ static void send_act_open_req6(struct cxgbi_sock *csk, struct sk_buff *skb,
 		req->params = cpu_to_be32(cxgb4_select_ntuple(
 					  csk->cdev->ports[csk->port_id],
 					  csk->l2t));
-	} else {
+	} else if (is_t5(lldi->adapter_type)) {
 		struct cpl_t5_act_open_req6 *req =
 				(struct cpl_t5_act_open_req6 *)skb->head;
 
@@ -344,12 +374,41 @@ static void send_act_open_req6(struct cxgbi_sock *csk, struct sk_buff *skb,
 		req->params = cpu_to_be64(FILTER_TUPLE_V(cxgb4_select_ntuple(
 					  csk->cdev->ports[csk->port_id],
 					  csk->l2t)));
+	} else {
+		struct cpl_t6_act_open_req6 *req =
+				(struct cpl_t6_act_open_req6 *)skb->head;
+
+		INIT_TP_WR(req, 0);
+		OPCODE_TID(req) = cpu_to_be32(MK_OPCODE_TID(CPL_ACT_OPEN_REQ6,
+							    qid_atid));
+		req->local_port = csk->saddr6.sin6_port;
+		req->peer_port = csk->daddr6.sin6_port;
+		req->local_ip_hi = *(__be64 *)(csk->saddr6.sin6_addr.s6_addr);
+		req->local_ip_lo = *(__be64 *)(csk->saddr6.sin6_addr.s6_addr +
+									8);
+		req->peer_ip_hi = *(__be64 *)(csk->daddr6.sin6_addr.s6_addr);
+		req->peer_ip_lo = *(__be64 *)(csk->daddr6.sin6_addr.s6_addr +
+									8);
+		req->opt0 = cpu_to_be64(opt0);
+
+		opt2 |= RX_FC_DISABLE_F;
+		opt2 |= T5_OPT_2_VALID_F;
+
+		req->opt2 = cpu_to_be32(opt2);
+
+		req->params = cpu_to_be64(FILTER_TUPLE_V(cxgb4_select_ntuple(
+					  csk->cdev->ports[csk->port_id],
+					  csk->l2t)));
+
+		req->rsvd2 = cpu_to_be32(0);
+		req->opt3 = cpu_to_be32(0);
 	}
 
 	set_wr_txq(skb, CPL_PRIORITY_SETUP, csk->port_id);
 
 	pr_info("t%d csk 0x%p,%u,0x%lx,%u, [%pI6]:%u-[%pI6]:%u, rss_qid %u.\n",
-		t4 ? 4 : 5, csk, csk->state, csk->flags, csk->atid,
+		CHELSIO_CHIP_VERSION(lldi->adapter_type), csk, csk->state,
+		csk->flags, csk->atid,
 		&csk->saddr6.sin6_addr, ntohs(csk->saddr.sin_port),
 		&csk->daddr6.sin6_addr, ntohs(csk->daddr.sin_port),
 		csk->rss_qid);
@@ -1381,7 +1440,6 @@ static int init_act_open(struct cxgbi_sock *csk)
 	void *daddr;
 	unsigned int step;
 	unsigned int size, size6;
-	int t4 = is_t4(lldi->adapter_type);
 	unsigned int linkspeed;
 	unsigned int rcv_winf, snd_winf;
 
@@ -1427,12 +1485,15 @@ static int init_act_open(struct cxgbi_sock *csk)
 		cxgb4_clip_get(ndev, (const u32 *)&csk->saddr6.sin6_addr, 1);
 #endif
 
-	if (t4) {
+	if (is_t4(lldi->adapter_type)) {
 		size = sizeof(struct cpl_act_open_req);
 		size6 = sizeof(struct cpl_act_open_req6);
-	} else {
+	} else if (is_t5(lldi->adapter_type)) {
 		size = sizeof(struct cpl_t5_act_open_req);
 		size6 = sizeof(struct cpl_t5_act_open_req6);
+	} else {
+		size = sizeof(struct cpl_t6_act_open_req);
+		size6 = sizeof(struct cpl_t6_act_open_req6);
 	}
 
 	if (csk->csk_family == AF_INET)
@@ -1793,7 +1854,8 @@ static void *t4_uld_add(const struct cxgb4_lld_info *lldi)
 	cdev->nports = lldi->nports;
 	cdev->mtus = lldi->mtus;
 	cdev->nmtus = NMTUS;
-	cdev->rx_credit_thres = cxgb4i_rx_credit_thres;
+	cdev->rx_credit_thres = (CHELSIO_CHIP_VERSION(lldi->adapter_type) <=
+				 CHELSIO_T5) ? cxgb4i_rx_credit_thres : 0;
 	cdev->skb_tx_rsvd = CXGB4I_TX_HEADER_LEN;
 	cdev->skb_rx_extra = sizeof(struct cpl_iscsi_hdr);
 	cdev->itp = &cxgb4i_iscsi_transport;
diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
index 2ffe029..5423378 100644
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -1627,15 +1627,15 @@ static void csk_return_rx_credits(struct cxgbi_sock *csk, int copied)
 		csk->rcv_wup, cdev->rx_credit_thres,
 		csk->rcv_win);
 
+	if (!cdev->rx_credit_thres)
+		return;
+
 	if (csk->state != CTP_ESTABLISHED)
 		return;
 
 	credits = csk->copied_seq - csk->rcv_wup;
 	if (unlikely(!credits))
 		return;
-	if (unlikely(cdev->rx_credit_thres == 0))
-		return;
-
 	must_send = credits + 16384 >= csk->rcv_win;
 	if (must_send || credits >= cdev->rx_credit_thres)
 		csk->rcv_wup += cdev->csk_send_rx_credits(csk, credits);
-- 
2.0.2

^ permalink raw reply related

* [PATCH scsi 1/3] scsi: cxgb4i: use cxgb4_tp_smt_idx() to get smt_idx
From: Varun Prakash @ 2016-12-01 14:58 UTC (permalink / raw)
  To: martin.petersen, jejb; +Cc: linux-scsi, netdev, davem, indranil, varun
In-Reply-To: <cover.1480603332.git.varun@chelsio.com>

cxgb4_tp_smt_idx() is defined in cxgb4 driver, it returns
smt_idx for T4,T5,T6 adapters.

Signed-off-by: Varun Prakash <varun@chelsio.com>
---
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
index 0039beb..90522d4 100644
--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
+++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
@@ -1451,8 +1451,8 @@ static int init_act_open(struct cxgbi_sock *csk)
 		csk->mtu = dst_mtu(csk->dst);
 	cxgb4_best_mtu(lldi->mtus, csk->mtu, &csk->mss_idx);
 	csk->tx_chan = cxgb4_port_chan(ndev);
-	/* SMT two entries per row */
-	csk->smac_idx = ((cxgb4_port_viid(ndev) & 0x7F)) << 1;
+	csk->smac_idx = cxgb4_tp_smt_idx(lldi->adapter_type,
+					 cxgb4_port_viid(ndev));
 	step = lldi->ntxq / lldi->nchan;
 	csk->txq_idx = cxgb4_port_idx(ndev) * step;
 	step = lldi->nrxq / lldi->nchan;
-- 
2.0.2

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox