Netdev List


* Ethernet on my CycloneV broke since 4.9.124
From: Clément Péron @ 2018-10-31 13:01 UTC
  To: Dinh Nguyen; +Cc: netdev

Hi,

The patch "net: stmmac: socfpga: add additional ocp reset line for
Stratix10", introduced in 4.9.124, broke Ethernet on my CycloneV
board.

When I boot, I get this error:

socfpga-dwmac ff702000.ethernet: error getting reset control of ocp -2
socfpga-dwmac: probe of ff702000.ethernet failed with error -2

Reverting commit 6f37f7b62baa6a71d7f3f298acb64de51275e724 fixes the issue.

Thanks,
Clement


* [PATCH v2] usbnet: smsc95xx: disable carrier check while suspending
From: Frieder Schrempf @ 2018-10-31 21:52 UTC
  To: steve.glendinning, UNGLinuxDriver
  Cc: davem, netdev, linux-usb, linux-kernel, Frieder Schrempf, stable

We need to make sure that the carrier check polling is disabled
while suspending. Otherwise we can end up with usbnet_read_cmd()
being issued when only usbnet_read_cmd_nopm() is allowed. If this
happens, read operations lock up.

Fixes: d69d169493 ("usbnet: smsc95xx: fix link detection for disabled autonegotiation")
Cc: <stable@vger.kernel.org>
Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
---
Changes in v2:
 * move cancel_delayed_work_sync() to correct error path

 drivers/net/usb/smsc95xx.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index 262e7a3..2d17f3b 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -1598,6 +1598,8 @@ static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message)
 		return ret;
 	}
 
+	cancel_delayed_work_sync(&pdata->carrier_check);
+
 	if (pdata->suspend_flags) {
 		netdev_warn(dev->net, "error during last resume\n");
 		pdata->suspend_flags = 0;
@@ -1840,6 +1842,11 @@ static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message)
 	 */
 	if (ret && PMSG_IS_AUTO(message))
 		usbnet_resume(intf);
+
+	if (ret)
+		schedule_delayed_work(&pdata->carrier_check,
+				      CARRIER_CHECK_DELAY);
+
 	return ret;
 }
 
-- 
2.7.4
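
The ordering used in v2 can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `toy_dev`, `toy_suspend()` and its two result parameters are hypothetical stand-ins, a boolean replaces the real delayed work item, and nothing here calls the usbnet API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the driver state relevant to the patch. */
struct toy_dev {
	bool poll_armed;	/* stands in for the carrier_check delayed work */
	bool suspended;
};

/*
 * core_ret models the result of usbnet_suspend(); setup_ret models the
 * result of the rest of the suspend path. Both are assumptions made for
 * illustration only.
 */
static int toy_suspend(struct toy_dev *dev, int core_ret, int setup_ret)
{
	assert(dev);

	if (core_ret < 0)
		return core_ret;	/* early failure: polling is left running */

	dev->poll_armed = false;	/* models cancel_delayed_work_sync() */

	if (setup_ret) {
		dev->poll_armed = true;	/* models schedule_delayed_work() */
		return setup_ret;
	}

	dev->suspended = true;
	return 0;
}
```

In this model the poll is stopped only on the path that actually suspends, and it is re-armed whenever the later part of the suspend sequence fails, so a failed suspend never leaves the device without carrier polling.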


* Re: [PATCH net] net: sched: Remove TCA_OPTIONS from policy
From: Marco Berizzi @ 2018-10-31 12:42 UTC
  To: Cong Wang; +Cc: Linux Kernel Network Developers, dsahern
In-Reply-To: <CAM_iQpWzthb3JoHyMv0AtJZQNVr9CQgkoJ8jOpsZpMSX1R7WKg@mail.gmail.com>

> On 26 October 2018 at 20:19, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> 
> On Fri, Oct 26, 2018 at 4:35 AM Marco Berizzi <pupilla@libero.it> wrote:
> 
> > Apologies for bothering you again.
> > I applied your patch to 4.19, but after issuing this
> > command:
> > 
> > root@Calimero:~# tc qdisc add dev eth0 root handle 1:0 hfsc default 1
> > root@Calimero:~# ping 10.81.104.1
> > PING 10.81.104.1 (10.81.104.1) 56(84) bytes of data.
> > ^C
> > --- 10.81.104.1 ping statistics ---
> > 2 packets transmitted, 0 received, 100% packet loss, time 1001ms
> > 
> > I'm losing ipv4 connectivity.
> > If I remove the qdisc everything is going to work again:
> 
> Did this really work before?
> 
> You specify a default class without adding it, so the packets are dropped.
> 
> How would you expect this to work? :)

:-) yes indeed. Apologies for the noise.


* [PATCH] usbnet: smsc95xx: disable carrier check while suspending
From: Frieder Schrempf @ 2018-10-31 21:27 UTC
  To: steve.glendinning, UNGLinuxDriver
  Cc: davem, netdev, linux-usb, linux-kernel, Frieder Schrempf, stable

We need to make sure that the carrier check polling is disabled
while suspending. Otherwise we can end up with usbnet_read_cmd()
being issued when only usbnet_read_cmd_nopm() is allowed. If this
happens, read operations lock up.

Fixes: d69d169493 ("usbnet: smsc95xx: fix link detection for disabled autonegotiation")
Cc: <stable@vger.kernel.org>
Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
---
 drivers/net/usb/smsc95xx.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index 262e7a3..3bc9633 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -1592,6 +1592,8 @@ static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message)
 	u32 val, link_up;
 	int ret;
 
+	cancel_delayed_work_sync(&pdata->carrier_check);
+
 	ret = usbnet_suspend(intf, message);
 	if (ret < 0) {
 		netdev_warn(dev->net, "usbnet_suspend error\n");
@@ -1840,6 +1842,11 @@ static int smsc95xx_suspend(struct usb_interface *intf, pm_message_t message)
 	 */
 	if (ret && PMSG_IS_AUTO(message))
 		usbnet_resume(intf);
+
+	if (ret)
+		schedule_delayed_work(&pdata->carrier_check,
+				      CARRIER_CHECK_DELAY);
+
 	return ret;
 }
 
-- 
2.7.4


* Re: [PATCH net] net/mlx4_en: add a missing <net/ip.h> include
From: Tariq Toukan @ 2018-10-31 11:46 UTC
  To: edumazet@google.com, abdhalee@linux.vnet.ibm.com
  Cc: David Miller, netdev@vger.kernel.org, Tariq Toukan,
	eric.dumazet@gmail.com
In-Reply-To: <20181030.111924.2171118850630801686.davem@davemloft.net>



On 30/10/2018 8:19 PM, David Miller wrote:
> From: Eric Dumazet <edumazet@google.com>
> Date: Tue, 30 Oct 2018 00:18:12 -0700
> 
>> Abdul Haleem reported a build error on ppc :
>>
>> drivers/net/ethernet/mellanox/mlx4/en_rx.c:582:18: warning: `struct
>> iphdr` declared inside parameter list [enabled by default]
>>             struct iphdr *iph)
>>                    ^
>> drivers/net/ethernet/mellanox/mlx4/en_rx.c:582:18: warning: its scope is
>> only this definition or declaration, which is probably not what you want
>> [enabled by default]
>> drivers/net/ethernet/mellanox/mlx4/en_rx.c: In function
>> get_fixed_ipv4_csum:
>> drivers/net/ethernet/mellanox/mlx4/en_rx.c:586:20: error: dereferencing
>> pointer to incomplete type
>>    __u8 ipproto = iph->protocol;
>>                      ^
>>
>> Fixes: 55469bc6b577 ("drivers: net: remove <net/busy_poll.h> inclusion when not needed")
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
> 
> Applied, thanks Eric.
> 

Thanks for the report, Abdul Haleem.
Thanks for your patch, Eric.

Regards,
Tariq


* Re: [PATCH net 3/3] net/mlx4_en: use netdev_tx_sent_queue_more()
From: Tariq Toukan @ 2018-10-31 11:37 UTC
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Willem de Bruijn, Tariq Toukan, Eric Dumazet
In-Reply-To: <20181029232539.217268-4-edumazet@google.com>



On 30/10/2018 1:25 AM, Eric Dumazet wrote:
> This patch has two changes :
> 
> 1) Use netdev_tx_sent_queue_more() for skbs with xmit_more
>     This avoids mangling BQL status, since we only need to
>     take care of it for the last skb of the batch.
> 
> 2) doorbell only depends on xmit_more and netif_tx_queue_stopped()
> 
>    While not strictly necessary after 1), it is more consistent
>    this way.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/en_tx.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> index 1857ee0f0871d48285a6d3711f7c3e9a1e08a05f..3acce02ade6a115881ecd72e4710e332d3f380cb 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> @@ -1006,7 +1006,6 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
>   		ring->packets++;
>   	}
>   	ring->bytes += tx_info->nr_bytes;
> -	netdev_tx_sent_queue(ring->tx_queue, tx_info->nr_bytes);
>   	AVG_PERF_COUNTER(priv->pstats.tx_pktsz_avg, skb->len);
>   
>   	if (tx_info->inl)
> @@ -1044,7 +1043,14 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
>   		netif_tx_stop_queue(ring->tx_queue);
>   		ring->queue_stopped++;
>   	}
> -	send_doorbell = !skb->xmit_more || netif_xmit_stopped(ring->tx_queue);
> +
> +	if (skb->xmit_more) {
> +		netdev_tx_sent_queue_more(ring->tx_queue, tx_info->nr_bytes);
> +		send_doorbell = netif_tx_queue_stopped(ring->tx_queue);
> +	} else {
> +		netdev_tx_sent_queue(ring->tx_queue, tx_info->nr_bytes);
> +		send_doorbell = true;
> +	}
>   
>   	real_size = (real_size / 16) & 0x3f;
>   
> 

The drivers' code template would be nicer if we unified the two functions
netdev_tx_sent_queue/netdev_tx_sent_queue_more into a single one with a
parameter.

Currently, every driver that wants to benefit from this optimization
has to repeat these if/else blocks.

Regards,
Tariq


* Re: [PATCH net 1/3] net: bql: add netdev_tx_sent_queue_more() helper
From: Tariq Toukan @ 2018-10-31 11:30 UTC
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Willem de Bruijn, Tariq Toukan, Eric Dumazet
In-Reply-To: <20181029232539.217268-2-edumazet@google.com>



On 30/10/2018 1:25 AM, Eric Dumazet wrote:
> When qdisc_run() tries to use BQL budget to bulk-dequeue a batch
> of packets, GSO can later transform this list in another list
> of skbs, and each skb is sent to device ndo_start_xmit(),
> one at a time, with skb->xmit_more being set to one but
> for last skb.
> 
> Problem is that very often, BQL limit is hit in the middle of
> the packet train, forcing dev_hard_start_xmit() to stop the
> bulk send and requeue the end of the list.
> 
> BQL role is to avoid head of line blocking, making sure
> a qdisc can deliver high priority packets before low priority ones.
> 
> But there is no way requeued packets can be bypassed by fresh
> packets in the qdisc.
> 
> Aborting the bulk send increases TX softirqs, and hot cache
> lines (after skb_segment()) are wasted.
> 
> Note that for TSO packets, we never split a packet in the middle
> because of BQL limit being hit.
> 
> Drivers should be able to update BQL counters without
> flipping/caring about BQL status, if the current skb
> has xmit_more set.
> 
> Upper layers are ultimately responsible to stop sending another
> packet train when BQL limit is hit.
> 
> Code template in a driver might look like the following :
> 
> 	if (skb->xmit_more) {
> 		netdev_tx_sent_queue_more(tx_queue, nr_bytes);
> 		send_doorbell = netif_tx_queue_stopped(tx_queue);
> 	} else {
> 		netdev_tx_sent_queue(tx_queue, nr_bytes);
> 		send_doorbell = true;
> 	}
> 

Hi Eric,
Nice optimization.

I thought of another way of implementing it, by just extending the 
existing netdev_tx_sent_queue function with a new xmit_more parameter, 
that the driver passes.
Something like this:

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d837dad24b4c..feb9cbcb5759 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3129,12 +3129,12 @@ static inline void netdev_txq_bql_complete_prefetchw(struct netdev_queue *dev_qu
  }

  static inline void netdev_tx_sent_queue(struct netdev_queue *dev_queue,
-                                       unsigned int bytes)
+                                       unsigned int bytes, bool more)
  {
  #ifdef CONFIG_BQL
         dql_queued(&dev_queue->dql, bytes);

-       if (likely(dql_avail(&dev_queue->dql) >= 0))
+       if (more || likely(dql_avail(&dev_queue->dql) >= 0))
                 return;

         set_bit(__QUEUE_STATE_STACK_XOFF, &dev_queue->state);


This unifies and simplifies both the stack and driver code, as the newly
suggested function netdev_tx_sent_queue_more() becomes a special case
of the existing one.
This would, however, require a one-time update of the 31 existing
usages of the function:
$ git grep netdev_tx_sent_queue drivers/net/ethernet/ | wc -l
31

What do you think?
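
To make the suggestion above concrete, here is a minimal userspace model of the unified helper. It is a sketch, not kernel code: the `toy_` types and functions are hypothetical stand-ins for `struct dql`, `struct netdev_queue`, `dql_queued()` and `dql_avail()`, and the model leaves out the memory barrier and re-check that the real netdev_tx_sent_queue() performs after setting the XOFF bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct dql. */
struct toy_dql {
	unsigned int num_queued;	/* bytes handed to the device so far */
	unsigned int adj_limit;		/* byte budget before the stack must stop */
};

/* Hypothetical stand-in for struct netdev_queue. */
struct toy_txq {
	struct toy_dql dql;
	bool stack_xoff;		/* models __QUEUE_STATE_STACK_XOFF */
};

static void toy_dql_queued(struct toy_dql *dql, unsigned int bytes)
{
	dql->num_queued += bytes;
}

static int toy_dql_avail(const struct toy_dql *dql)
{
	return (int)dql->adj_limit - (int)dql->num_queued;
}

/* Unified helper: with more == true only the byte count is updated. */
static void toy_tx_sent_queue(struct toy_txq *txq, unsigned int bytes, bool more)
{
	assert(txq);

	toy_dql_queued(&txq->dql, bytes);

	if (more || toy_dql_avail(&txq->dql) >= 0)
		return;

	txq->stack_xoff = true;
}
```

With a budget of 3000 bytes, two 2000-byte calls with `more` set leave `stack_xoff` clear even though the budget is already exceeded. Only the final call of the batch, with `more` cleared, flips the flag.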

> Note that netdev_tx_sent_queue_more() use is not mandatory,
> since following patch will change dev_hard_start_xmit()
> to not care about BQL status.

OK, so changing the driver code as suggested here becomes safe starting
with the next patch, and you only do it in patch 3, so it's fine.

> 
> But it is higly recommended so that xmit_more full benefits

typo: highly, just in case you re-spin.

> can be reached (less doorbells sent, and less atomic operations as well)
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>   include/linux/netdevice.h | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index dc1d9ed33b3192e9406b17c3107b3235b28ff1b9..beb37232688f7e4a71c932e472454e94df18b865 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3166,6 +3166,18 @@ static inline void netdev_txq_bql_complete_prefetchw(struct netdev_queue *dev_qu
>   #endif
>   }
>   
> +/* Variant of netdev_tx_sent_queue() for packets with xmit_more.
> + * We do want to change __QUEUE_STATE_STACK_XOFF only for the last
> + * skb of a batch.
> + */
> +static inline void netdev_tx_sent_queue_more(struct netdev_queue *dev_queue,
> +					     unsigned int bytes)
> +{
> +#ifdef CONFIG_BQL
> +	dql_queued(&dev_queue->dql, bytes);
> +#endif
> +}
> +
>   static inline void netdev_tx_sent_queue(struct netdev_queue *dev_queue,
>   					unsigned int bytes)
>   {
> 


* Re: [PATCH net] net/mlx5e: fix csum adjustments caused by RXFCS
From: Eran Ben Elisha @ 2018-10-31 11:27 UTC
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Eric Dumazet, Saeed Mahameed, Dimitris Michailidis,
	Cong Wang, Paweł Staszewski, Maria Pasechnik
In-Reply-To: <20181030075725.195824-1-edumazet@google.com>



On 10/30/2018 9:57 AM, Eric Dumazet wrote:
> As shown by Dimitris, we need to use csum_block_add() instead of csum_add()
> when adding the FCS contribution to skb csum.
> 
> Before 4.18 (more exactly commit 88078d98d1bb "net: pskb_trim_rcsum()
> and CHECKSUM_COMPLETE are friends"), the whole skb csum was thrown away,
> so RXFCS changes were ignored.
> 
> Then before commit d55bef5059dd ("net: fix pskb_trim_rcsum_slow() with
> odd trim offset") both mlx5 and pskb_trim_rcsum_slow() bugs were canceling
> each other.
> 
> Now we fixed pskb_trim_rcsum_slow() we need to fix mlx5.
> 
> Note that this patch also rewrites mlx5e_get_fcs() to :
> 
> - Use skb_header_pointer() instead of reinventing it.
> - Use __get_unaligned_cpu32() to avoid possible non aligned accesses
>    as Dimitris pointed out.
> 
> Fixes: 902a545904c7 ("net/mlx5e: When RXFCS is set, add FCS data into checksum calculation")
> Reported-by: Paweł Staszewski <pstaszewski@itcare.pl>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Eran Ben Elisha <eranbe@mellanox.com>
> Cc: Saeed Mahameed <saeedm@mellanox.com>
> Cc: Dimitris Michailidis <dmichail@google.com>
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> Cc: Paweł Staszewski <pstaszewski@itcare.pl>
> ---
>   .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 45 ++++---------------
>   1 file changed, 9 insertions(+), 36 deletions(-)

Thanks for the modification!
We ran a direct test we had for this scenario and it passed.

Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Tested-by: Maria Pasechnik <mariap@mellanox.com>
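
Why the offset matters can be shown with a small userspace model of the ones' complement arithmetic. This is a sketch under stated assumptions: the `toy_` functions are hypothetical reimplementations over big-endian 16-bit words, loosely modeled on what csum_add() and csum_block_add() do, and they are not the kernel helpers themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones' complement 32-bit add with end-around carry. */
static uint32_t toy_csum_add(uint32_t csum, uint32_t addend)
{
	uint32_t res = csum + addend;

	return res + (res < addend);
}

/* Add csum2, which was computed over a block starting at byte `offset`. */
static uint32_t toy_csum_block_add(uint32_t csum, uint32_t csum2, size_t offset)
{
	if (offset & 1)	/* odd start: swap the bytes of each 16-bit half */
		csum2 = ((csum2 & 0x00FF00FFu) << 8) + ((csum2 >> 8) & 0x00FF00FFu);
	return toy_csum_add(csum, csum2);
}

/* Sum a buffer as big-endian 16-bit words; an odd tail byte is zero-padded. */
static uint32_t toy_csum_partial(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(buf || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum = toy_csum_add(sum, ((uint32_t)buf[i] << 8) | buf[i + 1]);
	if (len & 1)
		sum = toy_csum_add(sum, (uint32_t)buf[len - 1] << 8);
	return sum;
}

/* Fold the 32-bit accumulator to 16 bits (not inverted). */
static uint16_t toy_fold16(uint32_t sum)
{
	sum = (sum & 0xFFFFu) + (sum >> 16);
	sum = (sum & 0xFFFFu) + (sum >> 16);
	return (uint16_t)sum;
}
```

For the bytes 01 02 03 followed by a 4-byte trailer 04 05 06 07, the sum over all seven bytes folds to 0x100C. Combining the two partial sums with `toy_csum_block_add(a, b, 3)` gives the same value, while plain `toy_csum_add(a, b)` gives 0x0E0E, because the trailer starts at an odd offset and its bytes land in the opposite halves of each 16-bit word.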


* Re: [RFC PATCH 1/3] can: m_can: Create m_can core to leverage common code
From: Dan Murphy @ 2018-10-31 20:15 UTC
  To: Wolfgang Grandegger, mkl, davem; +Cc: linux-can, netdev, linux-kernel
In-Reply-To: <52811b27-00c0-f5e2-2b18-608ccf846723@grandegger.com>

Wolfgang

Thanks for the review

On 10/27/2018 09:19 AM, Wolfgang Grandegger wrote:
> Hello Dan,
> 
> for the RFC, could you please just do the necessary changes to the
> existing code. We can discuss about better names, etc. later. For
> the review of the common code I quickly did:
> 
>   mv m_can.c m_can_platform.c
>   mv m_can_core.c m_can.c
> 
> The file names are similar to what we have for the C_CAN driver.
> 
>   s/classdev/priv/
>   variable name s/m_can_dev/priv/
> 
> Then your patch 1/3 looks as shown below. I'm going to comment on that
> one. The comments start with "***"....
> 

So you would like me to align the names with the c_can driver?

<snip>
> 
> *** I didn't review the rest of the patch for now.
> 

snipped the code to reply to the comment.

> Looking to the generic code, you didn't really change the way
> the driver is accessing the registers. Also the interrupt handling
> and rx polling is as it was before. Does that work properly using
> the SPI interface of the TCAN4x5x?

I don't want to change any of that yet.  Maybe my cover letter was not clear
or did not go through.

But the intention was just to break out the functionality to create an MCAN
framework that can be used by devices that contain the Bosch MCAN core and
provide their own protocol to access the registers in the device.

I don't want to do any functional changes at this time on the IP code itself until we have a framework.
There should be no regression in the io mapped code.

I did comment on the interrupt handling and asked if a threaded work queue would affect CAN timing.
For the original TCAN driver this was the way it was implemented.

> 
> I was also thinking about optimized read/write functions handling
> more than 4 bytes of data, e.g. for the CAN payload data. That
> would speed-up SPI transfers, I think. But that could also be
> introduced later-on.

That would be the plan.

> 
> Wolfgang.
> 
> 
> 
> 

-- 
------------------
Dan Murphy


* [PATCH] bonding:avoid repeated display of same link status change
From: mk.singh @ 2018-10-31 10:57 UTC
  To: netdev
  Cc: eric.dumazet, mkubecek, Manish Kumar Singh, Jay Vosburgh,
	Veaceslav Falico, Andy Gospodarek, David S. Miller, linux-kernel

From: Manish Kumar Singh <mk.singh@oracle.com>

When link status change needs to be committed and rtnl lock couldn't be
taken, avoid redisplay of same link status change message.

Signed-off-by: Manish Kumar Singh <mk.singh@oracle.com>
---
 drivers/net/bonding/bond_main.c | 8 ++++++--
 include/net/bonding.h           | 1 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 2b01180be834..b3d95c7040ac 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2096,7 +2096,8 @@ static int bond_miimon_inspect(struct bonding *bond)
 			bond_propose_link_state(slave, BOND_LINK_FAIL);
 			commit++;
 			slave->delay = bond->params.downdelay;
-			if (slave->delay) {
+			if (slave->delay &&
+			    !atomic64_read(&bond->rtnl_needed)) {
 				netdev_info(bond->dev, "link status down for %sinterface %s, disabling it in %d ms\n",
 					    (BOND_MODE(bond) ==
 					     BOND_MODE_ACTIVEBACKUP) ?
@@ -2136,7 +2137,8 @@ static int bond_miimon_inspect(struct bonding *bond)
 			commit++;
 			slave->delay = bond->params.updelay;
 
-			if (slave->delay) {
+			if (slave->delay &&
+			    !atomic64_read(&bond->rtnl_needed)) {
 				netdev_info(bond->dev, "link status up for interface %s, enabling it in %d ms\n",
 					    slave->dev->name,
 					    ignore_updelay ? 0 :
@@ -2310,9 +2312,11 @@ static void bond_mii_monitor(struct work_struct *work)
 		if (!rtnl_trylock()) {
 			delay = 1;
 			should_notify_peers = false;
+			atomic64_set(&bond->rtnl_needed, 1);
 			goto re_arm;
 		}
 
+		atomic64_set(&bond->rtnl_needed, 0);
 		bond_for_each_slave(bond, slave, iter) {
 			bond_commit_link_state(slave, BOND_SLAVE_NOTIFY_LATER);
 		}
diff --git a/include/net/bonding.h b/include/net/bonding.h
index a4f116f06c50..20c3c875266f 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -229,6 +229,7 @@ struct bonding {
 	struct	 dentry *debug_dir;
 #endif /* CONFIG_DEBUG_FS */
 	struct rtnl_link_stats64 bond_stats;
+	atomic64_t rtnl_needed;
 };
 
 #define bond_slave_get_rcu(dev) \
-- 
2.14.1


* Re: [PATCH net] vhost: Fix Spectre V1 vulnerability
From: David Miller @ 2018-10-31 19:39 UTC
  To: jasowang; +Cc: mst, kvm, virtualization, netdev, linux-kernel, jpoimboe,
	aarcange
In-Reply-To: <20181030061049.7424-1-jasowang@redhat.com>

From: Jason Wang <jasowang@redhat.com>
Date: Tue, 30 Oct 2018 14:10:49 +0800

> The idx in vhost_vring_ioctl() was controlled by userspace, hence a
> potential exploitation of the Spectre variant 1 vulnerability.
> 
> Fixing this by sanitizing idx before using it to index d->vqs.
> 
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Applied and queued up for -stable.
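
The kind of sanitizing described in the quoted commit message can be sketched in userspace as a branch-free index clamp. This is an illustration under stated assumptions: `toy_index_mask()` and `toy_index_nospec()` are hypothetical names, the diff itself is not quoted here so whether it uses the kernel's array_index_nospec() from <linux/nospec.h> is an assumption, and a userspace compiler is free to transform this code in ways the kernel helper is specifically written to prevent.

```c
#include <assert.h>
#include <limits.h>

/* Returns all ones when index < size, and 0 otherwise, without branching. */
static unsigned long toy_index_mask(unsigned long index, unsigned long size)
{
	const unsigned int top = sizeof(unsigned long) * CHAR_BIT - 1;
	unsigned long out_of_range;

	assert(size > 0 && size <= (unsigned long)LONG_MAX);

	/* Top bit is set if index > LONG_MAX or if size - 1 - index wrapped. */
	out_of_range = (index | (size - 1UL - index)) >> top;

	/* 0 - 1 is all ones (in range); 0 - 0 is 0 (out of range). */
	return 0UL - (out_of_range ^ 1UL);
}

/* Clamp an untrusted index to 0 when it is out of bounds. */
static unsigned long toy_index_nospec(unsigned long index, unsigned long size)
{
	return index & toy_index_mask(index, size);
}
```

The point of the mask is that an out-of-bounds index becomes 0 through data flow rather than through a conditional branch, so a mispredicted bounds check cannot be followed by a speculative load at an attacker-chosen offset.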


* Re: [PATCH][net-next] net: hns3: fix spelling mistake "intrerrupt" -> "interrupt"
From: David Miller @ 2018-10-31 19:33 UTC
  To: colin.king
  Cc: yisen.zhuang, salil.mehta, shiju.jose, netdev, kernel-janitors,
	linux-kernel
In-Reply-To: <20181029224611.20008-1-colin.king@canonical.com>

From: Colin King <colin.king@canonical.com>
Date: Mon, 29 Oct 2018 22:46:11 +0000

> From: Colin Ian King <colin.king@canonical.com>
> 
> Trivial fix to spelling mistake in dev_err message
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

net-next is closed, but a spelling fix is fine for 'net'.

Applied, thanks.


* Re: [PATCH ghak90 (was ghak32) V4 09/10] audit: NETFILTER_PKT: record each container ID associated with a netNS
From: Richard Guy Briggs @ 2018-10-31 19:30 UTC
  To: Paul Moore
  Cc: containers, linux-audit, linux-kernel, netdev, netfilter-devel,
	ebiederm, luto, carlos, dhowells, viro, simo, Eric Paris,
	Serge Hallyn
In-Reply-To: <CAHC9VhT2GJH2cJsrCsxCZMLsdv67oXJhH9mHAgpfMRY4=y43WQ@mail.gmail.com>

On 2018-10-19 19:18, Paul Moore wrote:
> On Sun, Aug 5, 2018 at 4:33 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> > Add audit container identifier auxiliary record(s) to NETFILTER_PKT
> > event standalone records.  Iterate through all potential audit container
> > identifiers associated with a network namespace.
> >
> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > ---
> >  include/linux/audit.h    |  5 +++++
> >  kernel/audit.c           | 26 ++++++++++++++++++++++++++
> >  net/netfilter/xt_AUDIT.c | 12 ++++++++++--
> >  3 files changed, 41 insertions(+), 2 deletions(-)
> 
> ...
> 
> > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > index 9a02095..8755f4d 100644
> > --- a/include/linux/audit.h
> > +++ b/include/linux/audit.h
> > @@ -169,6 +169,8 @@ extern int audit_log_contid(struct audit_context *context,
> >  extern void audit_netns_contid_add(struct net *net, u64 contid);
> >  extern void audit_netns_contid_del(struct net *net, u64 contid);
> >  extern void audit_switch_task_namespaces(struct nsproxy *ns, struct task_struct *p);
> > +extern void audit_log_netns_contid_list(struct net *net,
> > +                                struct audit_context *context);
> >
> >  extern int                 audit_update_lsm_rules(void);
> >
> > @@ -228,6 +230,9 @@ static inline void audit_netns_contid_del(struct net *net, u64 contid)
> >  { }
> >  static inline void audit_switch_task_namespaces(struct nsproxy *ns, struct task_struct *p)
> >  { }
> > +static inline void audit_log_netns_contid_list(struct net *net,
> > +                                       struct audit_context *context)
> > +{ }
> >
> >  #define audit_enabled AUDIT_OFF
> >  #endif /* CONFIG_AUDIT */
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index c5fed3b..b23711c 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -392,6 +392,32 @@ void audit_switch_task_namespaces(struct nsproxy *ns, struct task_struct *p)
> >                 audit_netns_contid_add(new->net_ns, contid);
> >  }
> >
> > +void audit_log_netns_contid_list(struct net *net, struct audit_context *context)
> > +{
> > +       spinlock_t *lock = audit_get_netns_contid_list_lock(net);
> > +       struct audit_buffer *ab;
> > +       struct audit_contid *cont;
> > +       bool first = true;
> > +
> > +       /* Generate AUDIT_CONTAINER record with container ID CSV list */
> > +       ab = audit_log_start(context, GFP_ATOMIC, AUDIT_CONTAINER);
> > +       if (!ab) {
> > +               audit_log_lost("out of memory in audit_log_netns_contid_list");
> > +               return;
> > +       }
> > +       audit_log_format(ab, "contid=");
> > +       spin_lock(lock);
> > +       list_for_each_entry(cont, audit_get_netns_contid_list(net), list) {
> > +               if (!first)
> > +                       audit_log_format(ab, ",");
> > +               audit_log_format(ab, "%llu", cont->id);
> > +               first = false;
> > +       }
> > +       spin_unlock(lock);
> 
> This is looking like potentially a lot of work to be doing under a
> spinlock, not to mention a single spinlock that is shared across CPUs.
> Considering that I expect changes to the list to be somewhat
> infrequent, this might be a good candidate for a RCU based locking
> scheme.

Would something like this look reasonable?
(This is on top of a patch to make contid list lock and unlock
functions.)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index be5d6eb..9428fc3 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -92,6 +92,7 @@ struct audit_contid {
 	struct list_head	list;
 	u64			id;
 	refcount_t		refcount;
+	struct rcu_head		rcu;
 };
 
 extern int is_audit_feature_set(int which);
diff --git a/kernel/audit.c b/kernel/audit.c
index d5b58163..6f84c25 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -106,7 +106,6 @@
 struct audit_net {
 	struct sock *sk;
 	struct list_head contid_list;
-	spinlock_t contid_list_lock;
 };
 
 /**
@@ -327,26 +326,6 @@ struct list_head *audit_get_netns_contid_list(const struct net *net)
 	return &aunet->contid_list;
 }
 
-static int audit_netns_contid_lock(const struct net *net)
-{
-	struct audit_net *aunet = net_generic(net, audit_net_id);
-
-	if (!aunet)
-		return -EINVAL;
-	spin_lock(aunet->contid_list_lock);
-	return 0;
-}
-
-static int audit_netns_contid_unlock(const struct net *net)
-{
-	struct audit_net *aunet = net_generic(net, audit_net_id);
-
-	if (!aunet)
-		return -EINVAL;
-	spin_unlock(aunet->contid_list_lock);
-	return 0;
-}
-
 void audit_netns_contid_add(struct net *net, u64 contid)
 {
 	struct list_head *contid_list = audit_get_netns_contid_list(net);
@@ -354,10 +333,9 @@ void audit_netns_contid_add(struct net *net, u64 contid)
 
 	if (!audit_contid_valid(contid))
 		return;
-	if (audit_netns_contid_lock(net))
-		return;
+	rcu_read_lock();
 	if (!list_empty(contid_list))
-		list_for_each_entry(cont, contid_list, list)
+		list_for_each_entry_rcu(cont, contid_list, list)
 			if (cont->id == contid) {
 				refcount_inc(&cont->refcount);
 				goto out;
@@ -367,10 +345,16 @@ void audit_netns_contid_add(struct net *net, u64 contid)
 		INIT_LIST_HEAD(&cont->list);
 		cont->id = contid;
 		refcount_set(&cont->refcount, 1);
-		list_add(&cont->list, contid_list);
+		list_add_rcu(&cont->list, contid_list);
 	}
 out:
-	audit_netns_contid_unlock(net);
+	rcu_read_unlock();
+}
+
+static void audit_free_contid_rcu(struct rcu_head *head) {
+	struct audit_contid *contid = container_of(head, struct audit_contid, rcu);
+
+	kfree(contid);
 }
 
 void audit_netns_contid_del(struct net *net, u64 contid)
@@ -380,17 +364,16 @@ void audit_netns_contid_del(struct net *net, u64 contid)
 
 	if (!audit_contid_valid(contid))
 		return;
-	if (audit_netns_contid_lock(net))
-		return;
+	rcu_read_lock();
 	if (!list_empty(contid_list))
-		list_for_each_entry(cont, contid_list, list)
+		list_for_each_entry_rcu(cont, contid_list, list)
 			if (cont->id == contid) {
-				list_del(&cont->list);
+				list_del_rcu(&cont->list);
 				if (refcount_dec_and_test(&cont->refcount))
-					kfree(cont);
+					call_rcu(&cont->rcu, audit_free_contid_rcu);
 				break;
 			}
-	audit_netns_contid_unlock(net);
+	rcu_read_unlock();
 }
 
 void audit_switch_task_namespaces(struct nsproxy *ns, struct task_struct *p)
@@ -418,15 +401,14 @@ void audit_log_netns_contid_list(struct net *net, struct audit_context *context)
 		return;
 	}
 	audit_log_format(ab, "ref=net contid=");
-	if (audit_netns_contid_lock(net))
-		return;
-	list_for_each_entry(cont, audit_get_netns_contid_list(net), list) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(cont, audit_get_netns_contid_list(net), list) {
 		if (!first)
 			audit_log_format(ab, ",");
 		audit_log_format(ab, "%llu", cont->id);
 		first = false;
 	}
-	audit_netns_contid_unlock(net);
+	rcu_read_unlock();
 	audit_log_end(ab);
 }
 EXPORT_SYMBOL(audit_log_netns_contid_list);
@@ -1674,7 +1656,6 @@ static int __net_init audit_net_init(struct net *net)
 		.flags	= NL_CFG_F_NONROOT_RECV,
 		.groups	= AUDIT_NLGRP_MAX,
 	};
-
 	struct audit_net *aunet = net_generic(net, audit_net_id);
 
 	aunet->sk = netlink_kernel_create(net, NETLINK_AUDIT, &cfg);
@@ -1684,8 +1665,6 @@ static int __net_init audit_net_init(struct net *net)
 	}
 	aunet->sk->sk_sndtimeo = MAX_SCHEDULE_TIMEOUT;
 	INIT_LIST_HEAD(&aunet->contid_list);
-	spin_lock_init(&aunet->contid_list_lock);
-
 	return 0;
 }
 
> 
> > +       audit_log_end(ab);
> > +}
> > +EXPORT_SYMBOL(audit_log_netns_contid_list);
> >
> >  void audit_panic(const char *message)
> >  {
> >         switch (audit_failure) {
> > diff --git a/net/netfilter/xt_AUDIT.c b/net/netfilter/xt_AUDIT.c
> > index af883f1..44fac3f 100644
> > --- a/net/netfilter/xt_AUDIT.c
> > +++ b/net/netfilter/xt_AUDIT.c
> > @@ -71,10 +71,13 @@ static bool audit_ip6(struct audit_buffer *ab, struct sk_buff *skb)
> >  {
> >         struct audit_buffer *ab;
> >         int fam = -1;
> > +       struct audit_context *context;
> > +       struct net *net;
> >
> >         if (audit_enabled == AUDIT_OFF)
> > -               goto errout;
> > -       ab = audit_log_start(NULL, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
> > +               goto out;
> > +       context = audit_alloc_local(GFP_ATOMIC);
> > +       ab = audit_log_start(context, GFP_ATOMIC, AUDIT_NETFILTER_PKT);
> >         if (ab == NULL)
> >                 goto errout;
> >
> > @@ -104,7 +107,12 @@ static bool audit_ip6(struct audit_buffer *ab, struct sk_buff *skb)
> >
> >         audit_log_end(ab);
> >
> > +       net = xt_net(par);
> > +       audit_log_netns_contid_list(net, context);
> > +
> >  errout:
> > +       audit_free_context(context);
> > +out:
> >         return XT_CONTINUE;
> >  }
> >
> 
> --
> paul moore
> www.paul-moore.com

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635


* [PATCH rdma] net/mlx5: Fix XRC SRQ umem valid bits
From: Leon Romanovsky @ 2018-10-31 10:20 UTC
  To: Doug Ledford, Jason Gunthorpe
  Cc: Yishai Hadas, RDMA mailing list, Artemy Kovalyov, Saeed Mahameed,
	linux-netdev, Leon Romanovsky

From: Yishai Hadas <yishaih@mellanox.com>

Adapt XRC SRQ to the latest HW specification with fixed definition
around umem valid bits. The previous definition relied on a bit which
was taken for other purposes in legacy FW.

Fixes: bd37197554eb ("net/mlx5: Update mlx5_ifc with DEVX UID bits")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
Hi Doug, Jason

This commit fixes code sent in this merge window, so I'm not marking it
with any rdma-rc/rdma-next. It would be best to send it during this merge
window if you have an extra pull request to issue, or otherwise as -rc
material.

BTW, we didn't combine reserved fields, because our convention is to align such
fields to 32 bits for better readability.

Thanks
---
 include/linux/mlx5/mlx5_ifc.h | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 0f460fb22c31..248116bc13a8 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -2470,14 +2470,15 @@ struct mlx5_ifc_xrc_srqc_bits {

 	u8         wq_signature[0x1];
 	u8         cont_srq[0x1];
-	u8         dbr_umem_valid[0x1];
+	u8         reserved_at_22[0x1];
 	u8         rlky[0x1];
 	u8         basic_cyclic_rcv_wqe[0x1];
 	u8         log_rq_stride[0x3];
 	u8         xrcd[0x18];

 	u8         page_offset[0x6];
-	u8         reserved_at_46[0x2];
+	u8         reserved_at_46[0x1];
+	u8         dbr_umem_valid[0x1];
 	u8         cqn[0x18];

 	u8         reserved_at_60[0x20];
@@ -6685,9 +6686,12 @@ struct mlx5_ifc_create_xrc_srq_in_bits {

 	struct mlx5_ifc_xrc_srqc_bits xrc_srq_context_entry;

-	u8         reserved_at_280[0x40];
+	u8         reserved_at_280[0x60];
+
 	u8         xrc_srq_umem_valid[0x1];
-	u8         reserved_at_2c1[0x5bf];
+	u8         reserved_at_2e1[0x1f];
+
+	u8         reserved_at_300[0x580];

 	u8         pas[0][0x40];
 };
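
The field names in mlx5_ifc.h encode each field's starting bit offset
(reserved_at_2e1 starts at bit 0x2e1), so the reworked layout can be
sanity-checked by summing widths. The following is a minimal sketch in plain
C, with hypothetical type and helper names, that walks the fields from the
second hunk and checks that the new layout ends at the same bit as the old
one. The starting offset 0x280 is inferred from the field name, not from the
full struct.

```c
#include <stddef.h>

/* One field of a bit layout: its name and width in bits.
 * Hypothetical type, for illustration only. */
struct bit_field {
	const char *name;
	unsigned int width;
};

/* Region following xrc_srq_context_entry, before the patch. */
static const struct bit_field old_layout[] = {
	{ "reserved_at_280",    0x40  },
	{ "xrc_srq_umem_valid", 0x1   },
	{ "reserved_at_2c1",    0x5bf },
};

/* Same region after the patch. */
static const struct bit_field new_layout[] = {
	{ "reserved_at_280",    0x60  },
	{ "xrc_srq_umem_valid", 0x1   },
	{ "reserved_at_2e1",    0x1f  },
	{ "reserved_at_300",    0x580 },
};

/* Return the bit offset just past the first n fields, starting at 'start'. */
static unsigned int layout_end(unsigned int start,
			       const struct bit_field *f, size_t n)
{
	unsigned int off = start;
	size_t i;

	for (i = 0; i < n; i++)
		off += f[i].width;
	return off;
}
```

Calling layout_end(0x280, ...) on either table gives the offset at which
pas[] begins, so the patch moves xrc_srq_umem_valid without changing the
overall size of the command layout.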

^ permalink raw reply related

* [PATCH net 4/4] selftests: mlxsw: qos_mc_aware: Add a test for UC awareness
From: Ido Schimmel @ 2018-10-31  9:56 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata, mlxsw,
	Ido Schimmel
In-Reply-To: <20181031095601.29846-1-idosch@mellanox.com>

From: Petr Machata <petrm@mellanox.com>

In a previous patch, mlxsw was updated to configure a minimum bandwidth
allowance on MC TCs. Test that this indeed fixes the problem of UC
traffic overload pushing out all MC traffic.

Fixes: b5638d46c90a ("selftests: mlxsw: Add a test for UC behavior under MC flood")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../drivers/net/mlxsw/qos_mc_aware.sh         | 93 ++++++++++++++-----
 1 file changed, 70 insertions(+), 23 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/mlxsw/qos_mc_aware.sh b/tools/testing/selftests/drivers/net/mlxsw/qos_mc_aware.sh
index a8fc36d670e1..117f6f35d72f 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/qos_mc_aware.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/qos_mc_aware.sh
@@ -25,24 +25,24 @@
 # Thus we set MTU to 10K on all involved interfaces. Then both unicast and
 # multicast traffic uses 8K frames.
 #
-# +-----------------------+                +----------------------------------+
-# | H1                    |                |                               H2 |
-# |                       |                |  unicast --> + $h2.111           |
-# |                       |                |  traffic     | 192.0.2.129/28    |
-# |          multicast    |                |              | e-qos-map 0:1     |
-# |          traffic      |                |              |                   |
-# | $h1 + <-----          |                |              + $h2               |
-# +-----|-----------------+                +--------------|-------------------+
-#       |                                                 |
-# +-----|-------------------------------------------------|-------------------+
-# |     + $swp1                                           + $swp2             |
-# |     | >1Gbps                                          | >1Gbps            |
-# | +---|----------------+                     +----------|----------------+  |
-# | |   + $swp1.1        |                     |          + $swp2.111      |  |
+# +---------------------------+            +----------------------------------+
+# | H1                        |            |                               H2 |
+# |                           |            |  unicast --> + $h2.111           |
+# |                 multicast |            |  traffic     | 192.0.2.129/28    |
+# |                 traffic   |            |              | e-qos-map 0:1     |
+# |           $h1 + <-----    |            |              |                   |
+# | 192.0.2.65/28 |           |            |              + $h2               |
+# +---------------|-----------+            +--------------|-------------------+
+#                 |                                       |
+# +---------------|---------------------------------------|-------------------+
+# |         $swp1 +                                       + $swp2             |
+# |        >1Gbps |                                       | >1Gbps            |
+# | +-------------|------+                     +----------|----------------+  |
+# | |     $swp1.1 +      |                     |          + $swp2.111      |  |
 # | |                BR1 |             SW      | BR111                     |  |
-# | |   + $swp3.1        |                     |          + $swp3.111      |  |
-# | +---|----------------+                     +----------|----------------+  |
-# |     \_________________________________________________/                   |
+# | |     $swp3.1 +      |                     |          + $swp3.111      |  |
+# | +-------------|------+                     +----------|----------------+  |
+# |               \_______________________________________/                   |
 # |                                    |                                      |
 # |                                    + $swp3                                |
 # |                                    | 1Gbps bottleneck                     |
@@ -51,6 +51,7 @@
 #                                      |
 #                                   +--|-----------------+
 #                                   |  + $h3          H3 |
+#                                   |  | 192.0.2.66/28   |
 #                                   |  |                 |
 #                                   |  + $h3.111         |
 #                                   |    192.0.2.130/28  |
@@ -59,6 +60,7 @@
 ALL_TESTS="
 	ping_ipv4
 	test_mc_aware
+	test_uc_aware
 "
 
 lib_dir=$(dirname $0)/../../../net/forwarding
@@ -68,14 +70,14 @@ source $lib_dir/lib.sh
 
 h1_create()
 {
-	simple_if_init $h1
+	simple_if_init $h1 192.0.2.65/28
 	mtu_set $h1 10000
 }
 
 h1_destroy()
 {
 	mtu_restore $h1
-	simple_if_fini $h1
+	simple_if_fini $h1 192.0.2.65/28
 }
 
 h2_create()
@@ -97,7 +99,7 @@ h2_destroy()
 
 h3_create()
 {
-	simple_if_init $h3
+	simple_if_init $h3 192.0.2.66/28
 	mtu_set $h3 10000
 
 	vlan_create $h3 111 v$h3 192.0.2.130/28
@@ -108,7 +110,7 @@ h3_destroy()
 	vlan_destroy $h3 111
 
 	mtu_restore $h3
-	simple_if_fini $h3
+	simple_if_fini $h3 192.0.2.66/28
 }
 
 switch_create()
@@ -251,7 +253,7 @@ measure_uc_rate()
 	# average ingress rate to somewhat mitigate this.
 	local min_ingress=2147483648
 
-	mausezahn $h2.111 -p 8000 -A 192.0.2.129 -B 192.0.2.130 -c 0 \
+	$MZ $h2.111 -p 8000 -A 192.0.2.129 -B 192.0.2.130 -c 0 \
 		-a own -b $h3mac -t udp -q &
 	sleep 1
 
@@ -291,7 +293,7 @@ test_mc_aware()
 	check_err $? "Could not get high enough UC-only ingress rate"
 	local ucth1=${uc_rate[1]}
 
-	mausezahn $h1 -p 8000 -c 0 -a own -b bc -t udp -q &
+	$MZ $h1 -p 8000 -c 0 -a own -b bc -t udp -q &
 
 	local d0=$(date +%s)
 	local t0=$(ethtool_stats_get $h3 rx_octets_prio_0)
@@ -335,6 +337,51 @@ test_mc_aware()
 	echo "    egress UC throughput  $(humanize ${uc_rate_2[1]})"
 	echo "    ingress MC throughput $(humanize $mc_ir)"
 	echo "    egress MC throughput  $(humanize $mc_er)"
+	echo
+}
+
+test_uc_aware()
+{
+	RET=0
+
+	$MZ $h2.111 -p 8000 -A 192.0.2.129 -B 192.0.2.130 -c 0 \
+		-a own -b $h3mac -t udp -q &
+
+	local d0=$(date +%s)
+	local t0=$(ethtool_stats_get $h3 rx_octets_prio_1)
+	local u0=$(ethtool_stats_get $swp2 rx_octets_prio_1)
+	sleep 1
+
+	local attempts=50
+	local passes=0
+	local i
+
+	for ((i = 0; i < attempts; ++i)); do
+		if $ARPING -c 1 -I $h1 -b 192.0.2.66 -q -w 0.1; then
+			((passes++))
+		fi
+
+		sleep 0.1
+	done
+
+	local d1=$(date +%s)
+	local t1=$(ethtool_stats_get $h3 rx_octets_prio_1)
+	local u1=$(ethtool_stats_get $swp2 rx_octets_prio_1)
+
+	local interval=$((d1 - d0))
+	local uc_ir=$(rate $u0 $u1 $interval)
+	local uc_er=$(rate $t0 $t1 $interval)
+
+	((attempts == passes))
+	check_err $?
+
+	# Suppress noise from killing mausezahn.
+	{ kill %% && wait; } 2>/dev/null
+
+	log_test "MC performance under UC overload"
+	echo "    ingress UC throughput $(humanize ${uc_ir})"
+	echo "    egress UC throughput  $(humanize ${uc_er})"
+	echo "    sent $attempts BC ARPs, got $passes responses"
 }
 
 trap cleanup EXIT
-- 
2.17.2
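
The throughput figures printed by test_uc_aware come from octet counters
sampled before and after the ARP loop. Below is a minimal sketch in C of that
arithmetic and of the all-or-nothing pass criterion. It assumes that the
`rate` shell helper converts an octet delta into bits per second (its
definition is not part of this patch), and the function names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits per second from two octet counter samples taken 'interval'
 * seconds apart. Returns 0 for a zero interval to avoid dividing by zero. */
static uint64_t rate_bps(uint64_t octets_t0, uint64_t octets_t1,
			 uint64_t interval)
{
	if (interval == 0)
		return 0;
	return 8 * (octets_t1 - octets_t0) / interval;
}

/* The test passes only if every broadcast ARP query got a response. */
static bool uc_aware_passed(unsigned int attempts, unsigned int passes)
{
	return attempts == passes;
}
```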

^ permalink raw reply related

* [PATCH net 2/4] mlxsw: spectrum: Set minimum shaper on MC TCs
From: Ido Schimmel @ 2018-10-31  9:56 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata, mlxsw,
	Ido Schimmel
In-Reply-To: <20181031095601.29846-1-idosch@mellanox.com>

From: Petr Machata <petrm@mellanox.com>

An MC-aware mode was introduced in commit 7b8195306694 ("mlxsw:
spectrum: Configure MC-aware mode on mlxsw ports"). In MC-aware mode,
BUM traffic gets a special treatment by being assigned to a separate set
of traffic classes 8..15. Pairs of TCs 0 and 8, 1 and 9, etc., are then
configured to strictly prioritize the lower-numbered ones. The intention
is to prevent BUM traffic from flooding the switch and pushing out all UC
traffic, which would otherwise happen, and instead give UC traffic
precedence.

However, strictly prioritizing UC traffic has the effect that UC overload
pushes out all BUM traffic, such as legitimate ARP queries. These
packets are kept in queues for a while, but under sustained UC overload,
their lifetime eventually expires and these packets are dropped. That is
detrimental to network performance as well.

Therefore configure the MC TCs (8..15) with a minimum shaper of 200Mbps
(the minimum permitted value) to allow a trickle of necessary control
traffic to get through.

Fixes: 7b8195306694 ("mlxsw: spectrum: Configure MC-aware mode on mlxsw ports")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum.c    | 25 +++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 8a4983adae94..a2df12b79f8e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -2740,6 +2740,21 @@ int mlxsw_sp_port_ets_maxrate_set(struct mlxsw_sp_port *mlxsw_sp_port,
 	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(qeec), qeec_pl);
 }

+static int mlxsw_sp_port_min_bw_set(struct mlxsw_sp_port *mlxsw_sp_port,
+				    enum mlxsw_reg_qeec_hr hr, u8 index,
+				    u8 next_index, u32 minrate)
+{
+	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
+	char qeec_pl[MLXSW_REG_QEEC_LEN];
+
+	mlxsw_reg_qeec_pack(qeec_pl, mlxsw_sp_port->local_port, hr, index,
+			    next_index);
+	mlxsw_reg_qeec_mise_set(qeec_pl, true);
+	mlxsw_reg_qeec_min_shaper_rate_set(qeec_pl, minrate);
+
+	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(qeec), qeec_pl);
+}
+
 int mlxsw_sp_port_prio_tc_set(struct mlxsw_sp_port *mlxsw_sp_port,
 			      u8 switch_prio, u8 tclass)
 {
@@ -2817,6 +2832,16 @@ static int mlxsw_sp_port_ets_init(struct mlxsw_sp_port *mlxsw_sp_port)
 			return err;
 	}

+	/* Configure the min shaper for multicast TCs. */
+	for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
+		err = mlxsw_sp_port_min_bw_set(mlxsw_sp_port,
+					       MLXSW_REG_QEEC_HIERARCY_TC,
+					       i + 8, i,
+					       MLXSW_REG_QEEC_MIS_MIN);
+		if (err)
+			return err;
+	}
+
 	/* Map all priorities to traffic class 0. */
 	for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
 		err = mlxsw_sp_port_prio_tc_set(mlxsw_sp_port, i, 0);
-- 
2.17.2
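
The loop added to mlxsw_sp_port_ets_init() walks the eight traffic classes
and, for each i, configures the multicast TC i + 8 with the smallest permitted
minimum shaper. A minimal user-space sketch of that pairing follows. The type
and function names are hypothetical, and the constants are copied from this
series (8 TCs, 200000 Kbps).

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TCS          8        /* mirrors IEEE_8021QAZ_MAX_TCS */
#define MC_TC_OFFSET     8        /* BUM traffic uses TCs 8..15 */
#define MIN_SHAPER_KBPS  200000   /* mirrors MLXSW_REG_QEEC_MIS_MIN */

/* One min-shaper configuration entry. Hypothetical type for illustration. */
struct min_shaper_cfg {
	uint8_t  index;          /* multicast TC being configured (i + 8) */
	uint8_t  next_index;     /* next element index, i in the patch */
	uint32_t min_rate_kbps;  /* minimum shaper rate */
};

/* Fill 'cfg' with one entry per multicast TC; returns the number written. */
static size_t build_mc_min_shapers(struct min_shaper_cfg *cfg, size_t cap)
{
	size_t i;

	for (i = 0; i < MAX_TCS && i < cap; i++) {
		cfg[i].index = (uint8_t)(i + MC_TC_OFFSET);
		cfg[i].next_index = (uint8_t)i;
		cfg[i].min_rate_kbps = MIN_SHAPER_KBPS;
	}
	return i;
}
```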

^ permalink raw reply related

* [PATCH net 3/4] selftests: mlxsw: qos_mc_aware: Tweak for min shaper
From: Ido Schimmel @ 2018-10-31  9:56 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata, mlxsw,
	Ido Schimmel
In-Reply-To: <20181031095601.29846-1-idosch@mellanox.com>

From: Petr Machata <petrm@mellanox.com>

Since the minimum shaper is now being enabled for MC TCs, it's
unreasonable to expect no UC traffic loss. The smallest permitted min
shaper value is 200Mbps, which is 20% of the 1Gbps that this test
configures on egress. To cover for glitches, tolerate up to 25% UC
degradation under MC overload.

Fixes: b5638d46c90a ("selftests: mlxsw: Add a test for UC behavior under MC flood")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 tools/testing/selftests/drivers/net/mlxsw/qos_mc_aware.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/drivers/net/mlxsw/qos_mc_aware.sh b/tools/testing/selftests/drivers/net/mlxsw/qos_mc_aware.sh
index 0150bb2741eb..a8fc36d670e1 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/qos_mc_aware.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/qos_mc_aware.sh
@@ -311,7 +311,7 @@ test_mc_aware()
 			ret = 100 * ($ucth1 - $ucth2) / $ucth1
 			if (ret > 0) { ret } else { 0 }
 		    ")
-	check_err $(bc <<< "$deg > 10")
+	check_err $(bc <<< "$deg > 25")
 
 	local interval=$((d1 - d0))
 	local mc_ir=$(rate $u0 $u1 $interval)
-- 
2.17.2
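
The new threshold follows from simple arithmetic: a 200Mbps guarantee for MC
traffic on a 1Gbps egress can take 20% away from UC, and 25% leaves some
slack. The sketch below mirrors the bc expression from the test in C. The
function names are hypothetical, and integer arithmetic is assumed to be
precise enough for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Percentage of UC throughput lost, clamped at 0 when throughput went up.
 * 'before' must be non-zero. */
static int64_t uc_degradation_pct(int64_t before, int64_t after)
{
	int64_t deg = 100 * (before - after) / before;

	return deg > 0 ? deg : 0;
}

/* The check fails when degradation exceeds the tolerated percentage. */
static bool degradation_too_high(int64_t before, int64_t after,
				 int64_t limit_pct)
{
	return uc_degradation_pct(before, after) > limit_pct;
}
```

With 1Gbps before and 800Mbps after, the degradation is 20%, which the old
limit of 10 would reject and the new limit of 25 accepts.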

^ permalink raw reply related

* [PATCH net 1/4] mlxsw: reg: QEEC: Add minimum shaper fields
From: Ido Schimmel @ 2018-10-31  9:56 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata, mlxsw,
	Ido Schimmel
In-Reply-To: <20181031095601.29846-1-idosch@mellanox.com>

From: Petr Machata <petrm@mellanox.com>

Add QEEC.mise (minimum shaper enable) and QEEC.min_shaper_rate to enable
configuration of minimum shaper.

Increase the QEEC length to 0x20 as well: that's the length that the
register has had for a long time now, but with the configurations that
mlxsw typically exercises, the firmware tolerated 0x1C-sized packets.
With mise=true however, FW rejects packets unless they have the full
required length.

Fixes: b9b7cee40579 ("mlxsw: reg: Add QoS ETS Element Configuration register")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 32cb6718bb17..db3d2790aeec 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -3284,7 +3284,7 @@ static inline void mlxsw_reg_qtct_pack(char *payload, u8 local_port,
  * Configures the ETS elements.
  */
 #define MLXSW_REG_QEEC_ID 0x400D
-#define MLXSW_REG_QEEC_LEN 0x1C
+#define MLXSW_REG_QEEC_LEN 0x20
 
 MLXSW_REG_DEFINE(qeec, MLXSW_REG_QEEC_ID, MLXSW_REG_QEEC_LEN);
 
@@ -3326,6 +3326,15 @@ MLXSW_ITEM32(reg, qeec, element_index, 0x04, 0, 8);
  */
 MLXSW_ITEM32(reg, qeec, next_element_index, 0x08, 0, 8);
 
+/* reg_qeec_mise
+ * Min shaper configuration enable. Enables configuration of the min
+ * shaper on this ETS element
+ * 0 - Disable
+ * 1 - Enable
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, qeec, mise, 0x0C, 31, 1);
+
 enum {
 	MLXSW_REG_QEEC_BYTES_MODE,
 	MLXSW_REG_QEEC_PACKETS_MODE,
@@ -3342,6 +3351,17 @@ enum {
  */
 MLXSW_ITEM32(reg, qeec, pb, 0x0C, 28, 1);
 
+/* The smallest permitted min shaper rate. */
+#define MLXSW_REG_QEEC_MIS_MIN	200000		/* Kbps */
+
+/* reg_qeec_min_shaper_rate
+ * Min shaper information rate.
+ * For CPU port, can only be configured for port hierarchy.
+ * When in bytes mode, value is specified in units of 1000bps.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, qeec, min_shaper_rate, 0x0C, 0, 28);
+
 /* reg_qeec_mase
  * Max shaper configuration enable. Enables configuration of the max
  * shaper on this ETS element.
-- 
2.17.2
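
Per the MLXSW_ITEM32 definitions in this patch, both new fields live in the
32-bit word at offset 0x0C: mise at bit 31 and min_shaper_rate in bits 0..27.
The following is a minimal stand-alone sketch of packing that word. The helper
names are hypothetical, the payload is assumed to store the word big-endian,
and this is not the accessor code that the driver macros generate.

```c
#include <stdint.h>

#define QEEC_LEN            0x20          /* register length after this patch */
#define QEEC_MIN_OFF        0x0C          /* word holding mise and min_shaper_rate */
#define QEEC_MISE_SHIFT     31            /* min shaper enable bit */
#define QEEC_MIN_RATE_MASK  0x0FFFFFFFu   /* bits 0..27 */

/* Read a big-endian 32-bit word from the payload. */
static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write a big-endian 32-bit word to the payload. */
static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Enable the min shaper and set its rate, leaving the other bits of the
 * word (such as pb at bit 28) untouched. */
static void qeec_min_shaper_set(uint8_t payload[QEEC_LEN], uint32_t rate_kbps)
{
	uint32_t w = get_be32(payload + QEEC_MIN_OFF);

	w &= ~QEEC_MIN_RATE_MASK;
	w |= rate_kbps & QEEC_MIN_RATE_MASK;
	w |= 1u << QEEC_MISE_SHIFT;
	put_be32(payload + QEEC_MIN_OFF, w);
}
```

The word ends at offset 0x10, inside both the old 0x1C and the new 0x20
length. The commit message explains the length bump by the firmware rejecting
short packets once mise is set, not by the field placement.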

^ permalink raw reply related

* [PATCH net 0/4] mlxsw: Enable minimum shaper on MC TCs
From: Ido Schimmel @ 2018-10-31  9:56 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: davem@davemloft.net, Jiri Pirko, Petr Machata, mlxsw,
	Ido Schimmel

Petr says:

An MC-aware mode was introduced in commit 7b8195306694 ("mlxsw:
spectrum: Configure MC-aware mode on mlxsw ports"). In MC-aware mode,
BUM traffic gets a special treatment by being assigned to a separate set
of traffic classes 8..15. Pairs of TCs 0 and 8, 1 and 9, etc., are then
configured to strictly prioritize the lower-numbered ones. The intention
is to prevent BUM traffic from flooding the switch and pushing out all UC
traffic, which would otherwise happen, and instead give UC traffic
precedence.

However, strictly prioritizing UC traffic has the effect that UC overload
pushes out all BUM traffic, such as legitimate ARP queries. These
packets are kept in queues for a while, but under sustained UC overload,
their lifetime eventually expires and these packets are dropped. That is
detrimental to network performance as well.

In this patchset, MC TCs (8..15) are configured with a minimum shaper of
200Mbps (the minimum permitted value) to allow a trickle of necessary
control traffic to get through.

First in patch #1, the QEEC register is extended with fields necessary
to configure the minimum shaper.

In patch #2, minimum shaper is enabled on TCs 8..15.

In patches #3 and #4, first the MC-awareness test is tweaked to support
the minimum shaper, and then a new test is introduced to test that MC
traffic behaves well under UC overload.

Petr Machata (4):
  mlxsw: reg: QEEC: Add minimum shaper fields
  mlxsw: spectrum: Set minimum shaper on MC TCs
  selftests: mlxsw: qos_mc_aware: Tweak for min shaper
  selftests: mlxsw: qos_mc_aware: Add a test for UC awareness

 drivers/net/ethernet/mellanox/mlxsw/reg.h     | 22 ++++-
 .../net/ethernet/mellanox/mlxsw/spectrum.c    | 25 +++++
 .../drivers/net/mlxsw/qos_mc_aware.sh         | 95 ++++++++++++++-----
 3 files changed, 117 insertions(+), 25 deletions(-)

-- 
2.17.2

^ permalink raw reply

* Re: [RFC PATCH v3 06/10] udp: cope with UDP GRO packet misdirection
From: Paolo Abeni @ 2018-10-31  9:54 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Willem de Bruijn, Steffen Klassert,
	Subash Abhinov Kasiviswanathan
In-Reply-To: <0fbc6f0cea4b3a976a003a593c6365cb4e4a9e99.1540920083.git.pabeni@redhat.com>

On Tue, 2018-10-30 at 18:24 +0100, Paolo Abeni wrote:
> --- a/include/net/udp.h
> +++ b/include/net/udp.h
> @@ -406,17 +406,24 @@ static inline int copy_linear_skb(struct sk_buff *skb, int len, int off,
>  } while(0)
>  
>  #if IS_ENABLED(CONFIG_IPV6)
> -#define __UDPX_INC_STATS(sk, field)					\
> -do {									\
> -	if ((sk)->sk_family == AF_INET)					\
> -		__UDP_INC_STATS(sock_net(sk), field, 0);		\
> -	else								\
> -		__UDP6_INC_STATS(sock_net(sk), field, 0);		\
> -} while (0)
> +#define __UDPX_MIB(sk, ipv4)						\
> +({									\
> +	ipv4 ? (IS_UDPLITE(sk) ? sock_net(sk)->mib.udplite_statistics :	\
> +				 sock_net(sk)->mib.udp_statistics) :	\
> +		(IS_UDPLITE(sk) ? sock_net(sk)->mib.udplite_stats_in6 :	\
> +				 sock_net(sk)->mib.udp_stats_in6);	\
> +})
>  #else
> -#define __UDPX_INC_STATS(sk, field) __UDP_INC_STATS(sock_net(sk), field, 0)
> +#define __UDPX_MIB(sk, ipv4)						\
> +({									\
> +	IS_UDPLITE(sk) ? sock_net(sk)->mib.udplite_statistics :		\
> +			 sock_net(sk)->mib.udp_statistics;		\
> +})
>  #endif
>  
> +#define __UDPX_INC_STATS(sk, field) \
> +	__SNMP_INC_STATS(__UDPX_MIB(sk, (sk)->sk_family == AF_INET, field)
> +

This is broken (the build complains only if CONFIG_AF_RXRPC is set); I will
fix it in the next iteration (thanks, kbuildbot).

But I'd prefer to keep the above helper: it can be used in a follow-up
patch to clean up udp6_recvmsg() a bit.

>  #ifdef CONFIG_PROC_FS
>  struct udp_seq_afinfo {
>  	sa_family_t			family;
> @@ -450,4 +457,32 @@ DECLARE_STATIC_KEY_FALSE(udpv6_encap_needed_key);
>  void udpv6_encap_enable(void);
>  #endif
>  
> +static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
> +					      struct sk_buff *skb)
> +{
> +	bool ipv4 = skb->protocol == htons(ETH_P_IP);

And this causes a compile warning when CONFIG_IPV6 is not set; I will
fix it in the next iteration (again, thanks kbuildbot).

Cheers,

Paolo

^ permalink raw reply

* Re: [RFC PATCH 3/4] igb: add support for extended PHC gettime
From: Miroslav Lichvar @ 2018-10-31  9:39 UTC (permalink / raw)
  To: Richard Cochran; +Cc: netdev, intel-wired-lan, Jacob Keller
In-Reply-To: <20181031022916.nuvbjegnn6bsqxss@localhost>

On Tue, Oct 30, 2018 at 07:29:16PM -0700, Richard Cochran wrote:
> On Fri, Oct 26, 2018 at 06:27:41PM +0200, Miroslav Lichvar wrote:
> > +static int igb_ptp_gettimex(struct ptp_clock_info *ptp,
> > +			    struct ptp_system_timestamp *sts)
> > +{
> > +	struct igb_adapter *igb = container_of(ptp, struct igb_adapter,
> > +					       ptp_caps);
> > +	struct e1000_hw *hw = &igb->hw;
> > +	unsigned long flags;
> > +	u32 lo, hi;
> > +	u64 ns;
> > +
> > +	spin_lock_irqsave(&igb->tmreg_lock, flags);
> > +
> > +	/* 82576 doesn't have SYSTIMR */
> > +	if (igb->hw.mac.type == e1000_82576) {
> 
> Instead of if/then/else, can't you follow the pattern of providing
> different function flavors ...

I can. I was just trying to minimize the amount of triplicated code.
In the next version I'll add a patch to deprecate the old gettime
functions, as Jacob suggested, and replace them with the extended
versions, so the amount of code will not change that much.
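
The "different function flavors" pattern Richard refers to can be sketched as
choosing a per-device callback once at init time, instead of branching on the
MAC type inside a single function. The following is a minimal user-space
illustration in C. All names are hypothetical, and which flavor goes with
which device is made up for the example; only the idea is taken from the
thread.

```c
#include <stdint.h>

enum mac_type { MAC_82576, MAC_OTHER };   /* hypothetical subset */

struct clock_dev;
typedef uint64_t (*gettime_fn)(const struct clock_dev *dev);

struct clock_dev {
	enum mac_type type;
	uint32_t lo, hi;        /* stand-ins for latched time registers */
	gettime_fn gettime;     /* flavor chosen at init */
};

/* Illustrative flavor: the two registers form a single 64-bit counter. */
static uint64_t gettime_counter64(const struct clock_dev *dev)
{
	return ((uint64_t)dev->hi << 32) | dev->lo;
}

/* Illustrative flavor: 'hi' holds seconds and 'lo' holds nanoseconds. */
static uint64_t gettime_sec_nsec(const struct clock_dev *dev)
{
	return (uint64_t)dev->hi * 1000000000ull + dev->lo;
}

/* Pick the flavor once, so the read path has no if/else on the MAC type. */
static void clock_dev_init(struct clock_dev *dev, enum mac_type type)
{
	dev->type = type;
	dev->lo = 0;
	dev->hi = 0;
	dev->gettime = (type == MAC_82576) ? gettime_counter64
					   : gettime_sec_nsec;
}
```

The trade-off raised in the reply still applies: each flavor repeats the
locking and timestamp capture around its register reads, which is the
duplication the if/else version was trying to avoid.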

Thanks,

-- 
Miroslav Lichvar

^ permalink raw reply

* Re: [PATCH 4/4] net: macb: Add support for suspend/resume with full power down
From: Harini Katakam @ 2018-10-31 16:56 UTC (permalink / raw)
  To: Claudiu Beznea
  Cc: Harini Katakam, Nicolas Ferre, David Miller, netdev, linux-kernel,
	Michal Simek, appanad
In-Reply-To: <f69c428d-565e-cf8e-de53-45850f2625cf@microchip.com>

Hi Claudiu,

> Hi Harini,
>
> I applied these patches on net-next/master cloned from [1], updated this
> moment, but I don't have a phy_dev member in struct macb. Maybe you wanted
> to use netdev->phydev ?
>
> Thank you,
> Claudiu Beznea
>
I apologize. Yes, I'm using netdev->phydev; that change was left uncommitted on my branch:
@@ -4264,8 +4264,8 @@ static int __maybe_unused macb_suspend(struct device *dev)
                netif_device_detach(netdev);
                 for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue)
                        napi_disable(&queue->napi);
-               phy_stop(bp->phy_dev);
-               phy_suspend(bp->phy_dev);
+               phy_stop(netdev->phydev);
+               phy_suspend(netdev->phydev);
                spin_lock_irqsave(&bp->lock, flags);
                macb_reset_hw(bp);
                spin_unlock_irqrestore(&bp->lock, flags);
@@ -4300,9 +4300,9 @@ static int __maybe_unused macb_resume(struct device *dev)
                macb_writel(bp, NCR, MACB_BIT(MPE));
                 for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue)
                        napi_enable(&queue->napi);
-               phy_resume(bp->phy_dev);
-               phy_init_hw(bp->phy_dev);
-               phy_start(bp->phy_dev);
+               phy_resume(netdev->phydev);
+               phy_init_hw(netdev->phydev);
+               phy_start(netdev->phydev);
Will fix in next set.

Regards,
Harini

^ permalink raw reply

* Re: WARNING in rds_message_alloc_sgs
From: Santosh Shilimkar @ 2018-10-31 16:44 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: syzbot, linux-rdma, netdev, rds-devel, syzkaller-bugs, davem,
	linux-kernel
In-Reply-To: <20181031064220.GN3974@mtr-leonro.mtl.com>

On 10/30/2018 11:42 PM, Leon Romanovsky wrote:
> On Tue, Oct 30, 2018 at 12:38:02PM -0700, Santosh Shilimkar wrote:
>> On 10/30/2018 12:28 PM, syzbot wrote:
>>> Hello,
>>>
>>> syzbot found the following crash on:
>>>
>>> HEAD commit:    6201f31a39f8 Add linux-next specific files for 20181030
>>> git tree:       linux-next
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1397d06d400000
>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=2a22859d870756c1
>>> dashboard link:
>>> https://syzkaller.appspot.com/bug?extid=26de17458aeda9d305d8
>>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=10bb52eb400000
>>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=118bdfc5400000
>>>
>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>> Reported-by: syzbot+26de17458aeda9d305d8@syzkaller.appspotmail.com
>>>
>>> WARNING: CPU: 0 PID: 19789 at net/rds/message.c:316
>>> rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316
>>> Kernel panic - not syncing: panic_on_warn set ...
>> Looks like this kernel build has panic on warn enabled which
>> triggers panic for " WARN_ON(!nr_pages)" case. Will look into
>> it. Thanks !!
> 
> Please don't forget to remove user triggered WARN_ON.
> https://lwn.net/Articles/769365/
> "Greg Kroah-Hartman raised the problem of core kernel API code that will
> use WARN_ON_ONCE() to complain about bad usage; that will not generate
> the desired result if WARN_ON_ONCE() is configured to crash the machine.
> He was told that the code should just call pr_warn() instead, and that
> the called function should return an error in such situations. It was
> generally agreed that any WARN_ON() or WARN_ON_ONCE() calls that can be
> triggered from user space need to be fixed."
> 
OK. Thanks for the note !!
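
The approach quoted from the LWN summary, returning an error instead of
warning when user space can reach the check, can be sketched in plain C. This
is a user-space illustration with a hypothetical function name, not the
eventual fix to rds_message_alloc_sgs(); fprintf() stands in for pr_warn().

```c
#include <errno.h>
#include <stdio.h>

/* Validate a caller-supplied page count.
 * A zero count is reachable from user input, so report it and return
 * -EINVAL rather than tripping a WARN_ON() that may panic the machine
 * when panic_on_warn is set. */
static int check_nr_pages(unsigned int nr_pages)
{
	if (nr_pages == 0) {
		fprintf(stderr, "invalid request: nr_pages is 0\n");
		return -EINVAL;
	}
	return 0;
}
```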

^ permalink raw reply

* Re: [PATCH net] net: ethernet: cadence: fix socket buffer corruption problem
From: Claudiu.Beznea @ 2018-10-31  7:45 UTC (permalink / raw)
  To: Tristram.Ha; +Cc: UNGLinuxDriver, netdev, davem, Nicolas.Ferre
In-Reply-To: <93AF473E2DA327428DE3D46B72B1E9FD411D09C8@CHN-SV-EXMX02.mchp-main.com>



On 30.10.2018 21:36, Tristram Ha - C24268 wrote:
>> Could you check on your side that applying this on top of your patch, your
>> scenario is still working?
>>
>> diff --git a/drivers/net/ethernet/cadence/macb_main.c
>> b/drivers/net/ethernet/cadence/macb_main.c
>> index 1d86b4d5645a..e1347d6d1b50 100644
>> --- a/drivers/net/ethernet/cadence/macb_main.c
>> +++ b/drivers/net/ethernet/cadence/macb_main.c
>> @@ -1702,12 +1702,8 @@ static int macb_pad_and_fcs(struct sk_buff **skb,
>> struct net_device *ndev)
>>                 *skb = nskb;
>>         }
>>
>> -       if (padlen) {
>> -               if (padlen >= ETH_FCS_LEN)
>> -                       skb_put_zero(*skb, padlen - ETH_FCS_LEN);
>> -               else
>> -                       skb_trim(*skb, ETH_FCS_LEN - padlen);
>> -       }
>> +       if (padlen > ETH_FCS_LEN)
>> +               skb_put_zero(*skb, padlen - ETH_FCS_LEN);
> 
> I think it is okay but I need to check all paths are covered.

On my side I checked with pktgen, generating packets with sizes ranging
from 1 to MTU. I used the same scripts when I first implemented this, but it
seems that your scenario wasn't covered. Please have a look and let me know
how it works on your side.

> 
>> It was reported in [1] that UDP checksum is offloaded to hardware no matter
>> the application previously computed it.
>>
>> The code should be executed only for packets that has checksum computed
>> by applications ((*skb)->ip_summed != CHECKSUM_PARTIAL). The idea was to
>> not recompute checksum for packets with checksum already computed. To do
>> so, while hardware checksum is enabled (NETIF_F_HW_CSUM), TX_NOCRC bit
>> should be set on buffer descriptor. But to do so, packets must have a
>> minimum size of 64 and FCS to be computed.
>>
>> The NETIF_F_HW_CSUM check was placed there because the issue described
>> in [1] is reproducible because hardware checksum is enabled and overrides
>> the checksum provided by applications.
>>
>> [1] https://www.spinics.net/lists/netdev/msg505065.html
> 
> I understand the issue now.  It is weird that the transmit descriptor does not
> have direct control over turning on checksum generation or not, but it wastes
> 3 bits returning the error codes of such generation.  What can the driver do
> with such information?

Yep, from my POV it would have been nice if the hardware could do the
padding and FCS even when hardware checksum is enabled and the TX_NOCRC bit
is set in the descriptor. For cases when hardware checksum is disabled it
seems that it could handle it.
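
The size constraint behind this discussion: when the driver asks the hardware
not to append the FCS (TX_NOCRC), it has to hand over a frame that already
meets the 64-byte Ethernet minimum, that is, 60 bytes of data plus a 4-byte
FCS. Below is a minimal sketch of the resulting zero-padding arithmetic in C,
using the standard values of ETH_ZLEN and ETH_FCS_LEN. The helper names are
hypothetical and this is not the logic of macb_pad_and_fcs() itself.

```c
#include <stddef.h>

#define ETH_ZLEN     60   /* minimum frame length without FCS */
#define ETH_FCS_LEN  4    /* length of the frame check sequence */

/* Number of zero bytes to append so that frame plus FCS reaches 64 bytes. */
static size_t eth_pad_len(size_t frame_len)
{
	return frame_len < ETH_ZLEN ? ETH_ZLEN - frame_len : 0;
}

/* Total on-wire length once padding and FCS have been added in software. */
static size_t eth_wire_len(size_t frame_len)
{
	return frame_len + eth_pad_len(frame_len) + ETH_FCS_LEN;
}
```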

> 
> In my opinion then hardware transmit checksumming cannot be supported
> In Linux.
> 
>>> NETIF_F_SG is not enabled in the MAC I used, so enabling NETIF_IF_HW_CSUM
>>> is rather pointless.  With the padding code the transmit throughput cannot
>>> get higher than 100Mbps in a gigabit connection.
>>>
>>> I would recommend to add this option to disable manual padding in one of
>>> those macb_config structures.
>>
>> In this way the user would have to know from the beginning what kind of
>> packets are used.
>>
> 
> The kernel already does a good job of calculating checksum.  Using hardware to
> do that does not improve performance much.
> 
> Alternative is to use ethtool to disable hardware tx checksum so that software can
> intentionally send wrong checksums.
> 

^ permalink raw reply

* Re: [PATCH net] net: dsa: microchip: initialize mutex before use
From: Pavel Machek @ 2018-10-31  7:42 UTC (permalink / raw)
  To: Tristram.Ha
  Cc: David S. Miller, Andrew Lunn, Florian Fainelli, UNGLinuxDriver,
	netdev
In-Reply-To: <1540943149-26832-1-git-send-email-Tristram.Ha@microchip.com>

On Tue 2018-10-30 16:45:49, Tristram.Ha@microchip.com wrote:
> From: Tristram Ha <Tristram.Ha@microchip.com>
> 
> Initialize mutex before use.  Avoid kernel complaint when
> CONFIG_DEBUG_LOCK_ALLOC is enabled.
> 
> Fixes: b987e98e50ab90e5 ("dsa: add DSA switch driver for Microchip KSZ9477")
> Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>

Acked-by: Pavel Machek <pavel@ucw.cz>

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply
