Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net-next 0/3] net: stmmac: Performance improvements in Multi-Queue
From: Jose Abreu @ 2019-02-15 13:42 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Jose Abreu, Florian Fainelli, Joao Pinto, David S . Miller,
	Giuseppe Cavallaro, Alexandre Torgue

Tested in XGMAC2 and GMAC5. I guess 1/3 can be backported but besides being a
kind of "not that small" refactoring it's also for a scenario that no-one
complained yet ... Advice ?

Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>

Jose Abreu (3):
  net: stmmac: Fix NAPI poll in TX path when in multi-queue
  net: stmmac: dwmac4: Also use TBU interrupt to clean TX path
  net: stmmac: dwxgmac2: Also use TBU interrupt to clean TX path

 drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c   |  24 ++---
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h     |   4 +-
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c |   8 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac.h       |   5 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 112 ++++++++++++---------
 5 files changed, 77 insertions(+), 76 deletions(-)

-- 
2.7.4

^ permalink raw reply

* [PATCH net-next 1/3] net: stmmac: Fix NAPI poll in TX path when in multi-queue
From: Jose Abreu @ 2019-02-15 13:42 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Jose Abreu, Florian Fainelli, Joao Pinto, David S . Miller,
	Giuseppe Cavallaro, Alexandre Torgue
In-Reply-To: <cover.1550237884.git.joabreu@synopsys.com>

Commit 8fce33317023 introduced the concept of NAPI per-channel and
independent cleaning of TX path.

This is currently breaking performance in some cases. The scenario
happens when all packets are being received in Queue 0 but the TX is
performed in Queue != 0.

Fix this by using different NAPI instances per each TX and RX queue, as
suggested by Florian.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac.h      |   5 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 112 ++++++++++++----------
 2 files changed, 64 insertions(+), 53 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index 63e1064b27a2..e697ecd9b0a6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -78,11 +78,10 @@ struct stmmac_rx_queue {
 };
 
 struct stmmac_channel {
-	struct napi_struct napi ____cacheline_aligned_in_smp;
+	struct napi_struct rx_napi ____cacheline_aligned_in_smp;
+	struct napi_struct tx_napi ____cacheline_aligned_in_smp;
 	struct stmmac_priv *priv_data;
 	u32 index;
-	int has_rx;
-	int has_tx;
 };
 
 struct stmmac_tc_entry {
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 685d20472358..89a4dd75af76 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -155,7 +155,10 @@ static void stmmac_disable_all_queues(struct stmmac_priv *priv)
 	for (queue = 0; queue < maxq; queue++) {
 		struct stmmac_channel *ch = &priv->channel[queue];
 
-		napi_disable(&ch->napi);
+		if (queue < rx_queues_cnt)
+			napi_disable(&ch->rx_napi);
+		if (queue < tx_queues_cnt)
+			napi_disable(&ch->tx_napi);
 	}
 }
 
@@ -173,7 +176,10 @@ static void stmmac_enable_all_queues(struct stmmac_priv *priv)
 	for (queue = 0; queue < maxq; queue++) {
 		struct stmmac_channel *ch = &priv->channel[queue];
 
-		napi_enable(&ch->napi);
+		if (queue < rx_queues_cnt)
+			napi_enable(&ch->rx_napi);
+		if (queue < tx_queues_cnt)
+			napi_enable(&ch->tx_napi);
 	}
 }
 
@@ -1939,6 +1945,10 @@ static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue)
 		mod_timer(&priv->eee_ctrl_timer, STMMAC_LPI_T(eee_timer));
 	}
 
+	/* We still have pending packets, let's call for a new scheduling */
+	if (tx_q->dirty_tx != tx_q->cur_tx)
+		mod_timer(&tx_q->txtimer, STMMAC_COAL_TIMER(10));
+
 	__netif_tx_unlock_bh(netdev_get_tx_queue(priv->dev, queue));
 
 	return count;
@@ -2029,23 +2039,15 @@ static int stmmac_napi_check(struct stmmac_priv *priv, u32 chan)
 	int status = stmmac_dma_interrupt_status(priv, priv->ioaddr,
 						 &priv->xstats, chan);
 	struct stmmac_channel *ch = &priv->channel[chan];
-	bool needs_work = false;
-
-	if ((status & handle_rx) && ch->has_rx) {
-		needs_work = true;
-	} else {
-		status &= ~handle_rx;
-	}
 
-	if ((status & handle_tx) && ch->has_tx) {
-		needs_work = true;
-	} else {
-		status &= ~handle_tx;
+	if ((status & handle_rx) && (chan < priv->plat->rx_queues_to_use)) {
+		stmmac_disable_dma_irq(priv, priv->ioaddr, chan);
+		napi_schedule_irqoff(&ch->rx_napi);
 	}
 
-	if (needs_work && napi_schedule_prep(&ch->napi)) {
+	if ((status & handle_tx) && (chan < priv->plat->tx_queues_to_use)) {
 		stmmac_disable_dma_irq(priv, priv->ioaddr, chan);
-		__napi_schedule(&ch->napi);
+		napi_schedule_irqoff(&ch->tx_napi);
 	}
 
 	return status;
@@ -2241,8 +2243,14 @@ static void stmmac_tx_timer(struct timer_list *t)
 
 	ch = &priv->channel[tx_q->queue_index];
 
-	if (likely(napi_schedule_prep(&ch->napi)))
-		__napi_schedule(&ch->napi);
+	/*
+	 * If NAPI is already running we can miss some events. Let's rearm
+	 * the timer and try again.
+	 */
+	if (likely(napi_schedule_prep(&ch->tx_napi)))
+		__napi_schedule(&ch->tx_napi);
+	else
+		mod_timer(&tx_q->txtimer, STMMAC_COAL_TIMER(10));
 }
 
 /**
@@ -3498,7 +3506,7 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
 			else
 				skb->ip_summed = CHECKSUM_UNNECESSARY;
 
-			napi_gro_receive(&ch->napi, skb);
+			napi_gro_receive(&ch->rx_napi, skb);
 
 			priv->dev->stats.rx_packets++;
 			priv->dev->stats.rx_bytes += frame_len;
@@ -3513,41 +3521,41 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
 	return count;
 }
 
-/**
- *  stmmac_poll - stmmac poll method (NAPI)
- *  @napi : pointer to the napi structure.
- *  @budget : maximum number of packets that the current CPU can receive from
- *	      all interfaces.
- *  Description :
- *  To look at the incoming frames and clear the tx resources.
- */
-static int stmmac_napi_poll(struct napi_struct *napi, int budget)
+static int stmmac_napi_poll_rx(struct napi_struct *napi, int budget)
 {
 	struct stmmac_channel *ch =
-		container_of(napi, struct stmmac_channel, napi);
+		container_of(napi, struct stmmac_channel, rx_napi);
 	struct stmmac_priv *priv = ch->priv_data;
-	int work_done, rx_done = 0, tx_done = 0;
 	u32 chan = ch->index;
+	int work_done;
 
 	priv->xstats.napi_poll++;
 
-	if (ch->has_tx)
-		tx_done = stmmac_tx_clean(priv, budget, chan);
-	if (ch->has_rx)
-		rx_done = stmmac_rx(priv, budget, chan);
+	work_done = stmmac_rx(priv, budget, chan);
+	if (work_done < budget && napi_complete_done(napi, work_done))
+		stmmac_enable_dma_irq(priv, priv->ioaddr, chan);
+	return work_done;
+}
 
-	work_done = max(rx_done, tx_done);
-	work_done = min(work_done, budget);
+static int stmmac_napi_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct stmmac_channel *ch =
+		container_of(napi, struct stmmac_channel, tx_napi);
+	struct stmmac_priv *priv = ch->priv_data;
+	struct stmmac_tx_queue *tx_q;
+	u32 chan = ch->index;
+	int work_done;
 
-	if (work_done < budget && napi_complete_done(napi, work_done)) {
-		int stat;
+	priv->xstats.napi_poll++;
 
+	work_done = stmmac_tx_clean(priv, budget, chan);
+	if (work_done < budget && napi_complete_done(napi, work_done))
 		stmmac_enable_dma_irq(priv, priv->ioaddr, chan);
-		stat = stmmac_dma_interrupt_status(priv, priv->ioaddr,
-						   &priv->xstats, chan);
-		if (stat && napi_reschedule(napi))
-			stmmac_disable_dma_irq(priv, priv->ioaddr, chan);
-	}
+
+	/* Force transmission restart */
+	tx_q = &priv->tx_queue[chan];
+	stmmac_enable_dma_transmission(priv, priv->ioaddr);
+	stmmac_set_tx_tail_ptr(priv, priv->ioaddr, tx_q->tx_tail_addr, chan);
 
 	return work_done;
 }
@@ -4323,13 +4331,14 @@ int stmmac_dvr_probe(struct device *device,
 		ch->priv_data = priv;
 		ch->index = queue;
 
-		if (queue < priv->plat->rx_queues_to_use)
-			ch->has_rx = true;
-		if (queue < priv->plat->tx_queues_to_use)
-			ch->has_tx = true;
-
-		netif_napi_add(ndev, &ch->napi, stmmac_napi_poll,
-			       NAPI_POLL_WEIGHT);
+		if (queue < priv->plat->rx_queues_to_use) {
+			netif_napi_add(ndev, &ch->rx_napi, stmmac_napi_poll_rx,
+				       NAPI_POLL_WEIGHT);
+		}
+		if (queue < priv->plat->tx_queues_to_use) {
+			netif_napi_add(ndev, &ch->tx_napi, stmmac_napi_poll_tx,
+				       NAPI_POLL_WEIGHT);
+		}
 	}
 
 	mutex_init(&priv->lock);
@@ -4385,7 +4394,10 @@ int stmmac_dvr_probe(struct device *device,
 	for (queue = 0; queue < maxq; queue++) {
 		struct stmmac_channel *ch = &priv->channel[queue];
 
-		netif_napi_del(&ch->napi);
+		if (queue < priv->plat->rx_queues_to_use)
+			netif_napi_del(&ch->rx_napi);
+		if (queue < priv->plat->tx_queues_to_use)
+			netif_napi_del(&ch->tx_napi);
 	}
 error_hw_init:
 	destroy_workqueue(priv->wq);
-- 
2.7.4


^ permalink raw reply related

* [RFC PATCH] bonding: use mutex lock in bond_get_stats()
From: Kefeng Wang @ 2019-02-15 13:50 UTC (permalink / raw)
  To: netdev, Willem de Bruijn, David S . Miller, Jay Vosburgh,
	Veaceslav Falico, Eric Dumazet
  Cc: weiyongjun1, Kefeng Wang

With CONFIG_DEBUG_SPINLOCK=y, we find following stack,

 BUG: spinlock wrong CPU on CPU#0, ip/16047
  lock: 0xffff803f5febc998, .magic: dead4ead, .owner: ip/16047, .owner_cpu: 0
 CPU: 1 PID: 16047 Comm: ip Kdump: loaded Tainted: G            E 4.19.12.aarch64 #1
 Hardware name: Huawei TaiShan 2280 V2/BC82AMDA, BIOS TA BIOS TaiShan 2280 V2 - B900 01/29/2019
 Call trace:
  dump_backtrace+0x0/0x1c0
  show_stack+0x24/0x30
  dump_stack+0x90/0xbc
  spin_dump+0x84/0xa8
  do_raw_spin_unlock+0xf8/0x100
  _raw_spin_unlock+0x20/0x30
  bond_get_stats+0x110/0x140 [bonding]
  rtnl_fill_stats+0x50/0x150
  rtnl_fill_ifinfo+0x4d4/0xd18
  rtnl_dump_ifinfo+0x200/0x3a8
  netlink_dump+0x100/0x2b0
  netlink_recvmsg+0x310/0x3e8
  sock_recvmsg+0x58/0x68
  ___sys_recvmsg+0xd0/0x278
  __sys_recvmsg+0x74/0xd0
  __arm64_sys_recvmsg+0x2c/0x38
  el0_svc_common+0x7c/0x118
  el0_svc_handler+0x30/0x40
  el0_svc+0x8/0xc

and then lead to softlockup issue, fix this by using mutex lock instead
of spin lock.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---

Not sure if this is right fix, please correct me if I'm wrong.

 drivers/net/bonding/bond_main.c | 6 +++---
 include/net/bonding.h           | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 485462d3087f..3f7849525759 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3452,7 +3452,7 @@ static void bond_get_stats(struct net_device *bond_dev,
 	struct list_head *iter;
 	struct slave *slave;
 
-	spin_lock_nested(&bond->stats_lock, bond_get_nest_level(bond_dev));
+	mutex_lock_nested(&bond->stats_lock, bond_get_nest_level(bond_dev));
 	memcpy(stats, &bond->bond_stats, sizeof(*stats));
 
 	rcu_read_lock();
@@ -3468,7 +3468,7 @@ static void bond_get_stats(struct net_device *bond_dev,
 	rcu_read_unlock();
 
 	memcpy(&bond->bond_stats, stats, sizeof(*stats));
-	spin_unlock(&bond->stats_lock);
+	mutex_unlock(&bond->stats_lock);
 }
 
 static int bond_do_ioctl(struct net_device *bond_dev, struct ifreq *ifr, int cmd)
@@ -4284,7 +4284,7 @@ void bond_setup(struct net_device *bond_dev)
 	struct bonding *bond = netdev_priv(bond_dev);
 
 	spin_lock_init(&bond->mode_lock);
-	spin_lock_init(&bond->stats_lock);
+	mutex_init(&bond->stats_lock);
 	bond->params = bonding_defaults;
 
 	/* Initialize pointers */
diff --git a/include/net/bonding.h b/include/net/bonding.h
index b46d68acf701..3a6dbb2b376c 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -205,7 +205,7 @@ struct bonding {
 	 * ALB mode (6) - to sync the use and modifications of its hash table
 	 */
 	spinlock_t mode_lock;
-	spinlock_t stats_lock;
+	struct mutex	stats_lock;
 	u8	 send_peer_notif;
 	u8       igmp_retrans;
 #ifdef CONFIG_PROC_FS
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next v2 3/3] enetc: Add ENETC PF level external MDIO support
From: Andrew Lunn @ 2019-02-15 13:34 UTC (permalink / raw)
  To: Claudiu Manoil
  Cc: Shawn Guo, Li Yang, David S . Miller, alexandru.marginean,
	linux-arm-kernel, devicetree, netdev, linux-kernel
In-Reply-To: <1550225414-12125-4-git-send-email-claudiu.manoil@nxp.com>

On Fri, Feb 15, 2019 at 12:10:14PM +0200, Claudiu Manoil wrote:
> Each ENETC PF has its own MDIO interface, the corresponding
> MDIO registers are mapped in the ENETC's Port register block.
> The current patch adds a driver for these PF level MDIO buses,
> so that each PF can manage directly its own external link.
> 
> Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com>
> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
> ---
> v2 - used readx_poll_timeout()
>    - added mdio node child to the port node

Hi Claudiu

Please document this in the device tree binding.

> +	/* return all Fs if nothing was there */
> +	if (enetc_rd_reg(&regs->mdio_cfg) & MDIO_CFG_RD_ER) {
> +		dev_err(&bus->dev,
> +			"Error while reading PHY%d reg at %d.%hhu\n",
> +			phy_id, dev_addr, regnum);
> +		return 0xffff;

I'm not sure you want a dev_err() here. The device tree binding allows
you to have a phy node without a reg value. When that happens, the
core code will scan all 32 addresses to find the PHY. This is going to
spam the log. dev_dbg() might be better.

     Andrew

^ permalink raw reply

* Re: [PATCH net-next] net: phy: marvell10g: Don't explicitly set Pause and Asym_Pause
From: Andrew Lunn @ 2019-02-15 13:24 UTC (permalink / raw)
  To: Maxime Chevallier
  Cc: davem, netdev, linux-kernel, Florian Fainelli, Heiner Kallweit,
	Russell King, linux-arm-kernel, Antoine Tenart, thomas.petazzoni,
	gregory.clement, miquel.raynal
In-Reply-To: <20190215083347.5886-1-maxime.chevallier@bootlin.com>

On Fri, Feb 15, 2019 at 09:33:47AM +0100, Maxime Chevallier wrote:
> The PHY core expects PHY drivers not to set Pause and Asym_Pause bits,
> unless the driver only wants to specify one of them due to HW
> limitation. In the case of the Marvell10g driver, we don't need to set
> them.
> 
> Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
> Suggested-by: Andrew Lunn <andrew@lunn.ch>

Thanks Maxime

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH RFC] net: bridge: don't flood known multicast traffic when snooping is enabled
From: Nikolay Aleksandrov @ 2019-02-15 13:09 UTC (permalink / raw)
  To: netdev; +Cc: roopa, wkok, anuradhak, bridge, linus.luessing, davem, stephen
In-Reply-To: <20190215130427.29824-1-nikolay@cumulusnetworks.com>

On 15/02/2019 15:04, Nikolay Aleksandrov wrote:
> The behaviour since b00589af3b04 ("bridge: disable snooping if there is
> no querier") is wrong, we shouldn't be flooding multicast traffic when
> there is an mdb entry and we know where it should be forwarded to when
> multicast snooping is enabled. This patch changes the behaviour to not
> flood known unicast traffic. I'll give two obviously broken cases:
>  - most obvious: static mdb created by the user with snooping enabled
>  - user-space daemon controlling the mdb table (e.g. MLAG)
> 

I had to mention: when snooping is enabled and there is *no querier
configured*, that is the important bit here. In most setups the querier
is explicitly configured when there is no mcast router, but it shouldn't
be required to have the intuitive and normal behaviour.

> Every user would expect to have traffic forwarded only to the configured
> mdb destination when snooping is enabled, instead now to get that one
> needs to enable both snooping and querier. Enabling querier on all
> switches could be problematic and is not a good solution, for example
> as summarized by our multicast experts:
> "every switch would send an IGMP query for any random multicast traffic it
> received across the entire domain and it would send it forever as long as a
> host exists wanting that stream even if it has no downstream/directly
> connected receivers"
> 
> Sending as an RFC to get the discussion going, but I'm strongly for
> removing this behaviour and would like to send this patch officially.
> 
> We could make this behaviour possible via a knob if necessary, but
> it really should not be the default.
> 
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> ---
>  net/bridge/br_device.c | 3 +--
>  net/bridge/br_input.c  | 3 +--
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
> index 013323b6dbe4..2aa8a6509924 100644
> --- a/net/bridge/br_device.c
> +++ b/net/bridge/br_device.c
> @@ -96,8 +96,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
>  		}
>  
>  		mdst = br_mdb_get(br, skb, vid);
> -		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
> -		    br_multicast_querier_exists(br, eth_hdr(skb)))
> +		if (mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb))
>  			br_multicast_flood(mdst, skb, false, true);
>  		else
>  			br_flood(br, skb, BR_PKT_MULTICAST, false, true);
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index 5ea7e56119c1..aae78095cf67 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -136,8 +136,7 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
>  	switch (pkt_type) {
>  	case BR_PKT_MULTICAST:
>  		mdst = br_mdb_get(br, skb, vid);
> -		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
> -		    br_multicast_querier_exists(br, eth_hdr(skb))) {
> +		if (mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) {
>  			if ((mdst && mdst->host_joined) ||
>  			    br_multicast_is_router(br)) {
>  				local_rcv = true;
> 


^ permalink raw reply

* [PATCH RFC] net: bridge: don't flood known multicast traffic when snooping is enabled
From: Nikolay Aleksandrov @ 2019-02-15 13:04 UTC (permalink / raw)
  To: netdev
  Cc: roopa, wkok, anuradhak, bridge, linus.luessing, davem, stephen,
	Nikolay Aleksandrov

The behaviour since b00589af3b04 ("bridge: disable snooping if there is
no querier") is wrong, we shouldn't be flooding multicast traffic when
there is an mdb entry and we know where it should be forwarded to when
multicast snooping is enabled. This patch changes the behaviour to not
flood known unicast traffic. I'll give two obviously broken cases:
 - most obvious: static mdb created by the user with snooping enabled
 - user-space daemon controlling the mdb table (e.g. MLAG)

Every user would expect to have traffic forwarded only to the configured
mdb destination when snooping is enabled, instead now to get that one
needs to enable both snooping and querier. Enabling querier on all
switches could be problematic and is not a good solution, for example
as summarized by our multicast experts:
"every switch would send an IGMP query for any random multicast traffic it
received across the entire domain and it would send it forever as long as a
host exists wanting that stream even if it has no downstream/directly
connected receivers"

Sending as an RFC to get the discussion going, but I'm strongly for
removing this behaviour and would like to send this patch officially.

We could make this behaviour possible via a knob if necessary, but
it really should not be the default.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
 net/bridge/br_device.c | 3 +--
 net/bridge/br_input.c  | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 013323b6dbe4..2aa8a6509924 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -96,8 +96,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 
 		mdst = br_mdb_get(br, skb, vid);
-		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
-		    br_multicast_querier_exists(br, eth_hdr(skb)))
+		if (mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb))
 			br_multicast_flood(mdst, skb, false, true);
 		else
 			br_flood(br, skb, BR_PKT_MULTICAST, false, true);
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 5ea7e56119c1..aae78095cf67 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -136,8 +136,7 @@ int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb
 	switch (pkt_type) {
 	case BR_PKT_MULTICAST:
 		mdst = br_mdb_get(br, skb, vid);
-		if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
-		    br_multicast_querier_exists(br, eth_hdr(skb))) {
+		if (mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) {
 			if ((mdst && mdst->host_joined) ||
 			    br_multicast_is_router(br)) {
 				local_rcv = true;
-- 
2.17.2


^ permalink raw reply related

* pull-request: mac80211 2019-02-15
From: Johannes Berg @ 2019-02-15 12:51 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-wireless

Hi Dave,

It's clear things are winding down, this is basically just the stuff
from Herbert that we've been discussing. I threw in a simple error
path fix, mostly because it's simple :-)

Please pull and let me know if there's any problem.

Thanks,
johannes



The following changes since commit f9bcc9f3ee4fbbe8f11dfec76745476f5780517e:

  net: ethernet: freescale: set FEC ethtool regs version (2019-02-14 12:45:35 -0500)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211.git tags/mac80211-for-davem-2019-02-15

for you to fetch changes up to 83e37e0bdd1470bbe6612250b745ad39b1a7b130:

  mac80211: Restore vif beacon interval if start ap fails (2019-02-15 13:30:24 +0100)

----------------------------------------------------------------
Just a few fixes this time:
 * mesh rhashtable fixes from Herbert
 * a small error path fix when starting AP interfaces

----------------------------------------------------------------
Herbert Xu (2):
      mac80211: Use linked list instead of rhashtable walk for mesh tables
      mac80211: Free mpath object when rhashtable insertion fails

Rakesh Pillai (1):
      mac80211: Restore vif beacon interval if start ap fails

 net/mac80211/cfg.c          |   6 +-
 net/mac80211/mesh.h         |   6 ++
 net/mac80211/mesh_pathtbl.c | 155 +++++++++++++-------------------------------
 3 files changed, 57 insertions(+), 110 deletions(-)


^ permalink raw reply

* Re: [PATCH] qmi_wwan: apply SET_DTR quirk to Sierra WP7607
From: Bjørn Mork @ 2019-02-15 12:32 UTC (permalink / raw)
  To: Beniamino Galvani; +Cc: David S . Miller, netdev, linux-usb, linux-kernel
In-Reply-To: <20190215122042.23195-1-bgalvani@redhat.com>

Beniamino Galvani <bgalvani@redhat.com> writes:

> The 1199:68C0 USB ID is reused by Sierra WP7607 which requires the DTR
> quirk to be detected. Apply QMI_QUIRK_SET_DTR unconditionally as
> already done for other IDs shared between different devices.
>
> Signed-off-by: Beniamino Galvani <bgalvani@redhat.com>

Thanks. Looks like it's time to consider applying this unconditionally
to all devices.  Feel free to propose something like that the next time
this issue comes up.

Acked-by: Bjørn Mork <bjorn@mork.no>

^ permalink raw reply

* Re: [PATCH net-next 03/12] net: sched: flower: introduce reference counting for filters
From: Stefano Brivio @ 2019-02-15 12:32 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: netdev@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net
In-Reply-To: <vbfef89jny6.fsf@mellanox.com>

On Fri, 15 Feb 2019 11:22:45 +0000
Vlad Buslov <vladbu@mellanox.com> wrote:

> On Thu 14 Feb 2019 at 20:34, Stefano Brivio <sbrivio@redhat.com> wrote:
> > On Thu, 14 Feb 2019 09:47:03 +0200
> > Vlad Buslov <vladbu@mellanox.com> wrote:
> >  
> >> +static struct cls_fl_filter *fl_get_next_filter(struct tcf_proto *tp,
> >> +						unsigned long *handle)
> >> +{
> >> +	struct cls_fl_head *head = fl_head_dereference(tp);
> >> +	struct cls_fl_filter *f;
> >> +
> >> +	rcu_read_lock();
> >> +	/* don't return filters that are being deleted */
> >> +	while ((f = idr_get_next_ul(&head->handle_idr,
> >> +				    handle)) != NULL &&
> >> +	       !refcount_inc_not_zero(&f->refcnt))
> >> +		++(*handle);  
> >
> > This... hurts :) What about:
> >
> > 	while ((f = idr_get_next_ul(&head->handle_idr, &handle))) {
> > 		if (refcount_inc_not_zero(&f->refcnt))
> > 			break;
> > 		++(*handle);
> > 	}
> >
> > ?  
> 
> I prefer to avoid using value of assignment as boolean and
> non-structured jumps, when possible. In this case it seems OK either
> way, but how about:
> 
> 	for (f = idr_get_next_ul(&head->handle_idr, handle);
> 	     f && !refcount_inc_not_zero(&f->refcnt);
> 	     f = idr_get_next_ul(&head->handle_idr, handle))
> 		++(*handle);

Honestly, I preferred the original, this is repeating idr_get_next_ul()
twice.

Maybe, just:

	[...]
	struct idr *idr;

	[...]
	idr = &head->handle_idr;
	while ((f = idr_get_next_ul(idr, handle)) != NULL &&
	       !refcount_inc_not_zero(&f->refcnt))
		++(*handle);

also rather ugly, but not entirely unreadable. I tried drafting a
helper for this, but it just ends up hiding what this does.

> >> @@ -1349,6 +1404,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
> >>  		err = -ENOBUFS;
> >>  		goto errout_tb;
> >>  	}
> >> +	refcount_set(&fnew->refcnt, 1);
> >>
> >>  	err = tcf_exts_init(&fnew->exts, TCA_FLOWER_ACT, 0);
> >>  	if (err < 0)
> >> @@ -1381,6 +1437,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
> >>  	if (!tc_in_hw(fnew->flags))
> >>  		fnew->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
> >>
> >> +	refcount_inc(&fnew->refcnt);  
> >
> > I guess I'm not getting the semantics but... why is it 2 now?  
> 
> As soon as fnew is inserted into head->handle_idr (one reference), it
> becomes accessible to concurrent users, which means that it can be
> deleted at any time. However, tp->change() returns a reference to newly
> created filter to cls_api by assigning "arg" parameter to it (second
> reference). After tp->change() returns, cls API continues to use fnew
> and releases it with tfilter_put() when finished.

I see, thanks for the explanation!

-- 
Stefano

^ permalink raw reply

* Re: [PATCH AUTOSEL 3.18 15/16] cfg80211: extend range deviation for DMG
From: Johannes Berg @ 2019-02-15 12:31 UTC (permalink / raw)
  To: Sasha Levin, linux-kernel, stable
  Cc: Chaitanya Tata, Chaitanya Tata, linux-wireless, netdev
In-Reply-To: <20190215021546.179605-15-sashal@kernel.org>

On Thu, 2019-02-14 at 21:15 -0500, Sasha Levin wrote:
> From: Chaitanya Tata <chaitanya.tata@bluwirelesstechnology.com>
> 
> [ Upstream commit 93183bdbe73bbdd03e9566c8dc37c9d06b0d0db6 ]
> 
> Recently, DMG frequency bands have been extended till 71GHz, so extend
> the range check till 20GHz (45-71GHZ), else some channels will be marked
> as disabled.

There's not really any danger in picking this up for old kernels, but
also practically no value since those kernels wouldn't have supoprt for
the higher ranges ("recently, ...") part :)

johannes


^ permalink raw reply

* [PATCH] qmi_wwan: apply SET_DTR quirk to Sierra WP7607
From: Beniamino Galvani @ 2019-02-15 12:20 UTC (permalink / raw)
  To: Bjørn Mork, David S . Miller, netdev, linux-usb,
	linux-kernel

The 1199:68C0 USB ID is reused by Sierra WP7607 which requires the DTR
quirk to be detected. Apply QMI_QUIRK_SET_DTR unconditionally as
already done for other IDs shared between different devices.

Signed-off-by: Beniamino Galvani <bgalvani@redhat.com>
---
 drivers/net/usb/qmi_wwan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 735ad838e2ba..18af2f8eee96 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -1201,8 +1201,8 @@ static const struct usb_device_id products[] = {
 	{QMI_FIXED_INTF(0x114f, 0x68a2, 8)},    /* Sierra Wireless MC7750 */
 	{QMI_FIXED_INTF(0x1199, 0x68a2, 8)},	/* Sierra Wireless MC7710 in QMI mode */
 	{QMI_FIXED_INTF(0x1199, 0x68a2, 19)},	/* Sierra Wireless MC7710 in QMI mode */
-	{QMI_FIXED_INTF(0x1199, 0x68c0, 8)},	/* Sierra Wireless MC7304/MC7354 */
-	{QMI_FIXED_INTF(0x1199, 0x68c0, 10)},	/* Sierra Wireless MC7304/MC7354 */
+	{QMI_QUIRK_SET_DTR(0x1199, 0x68c0, 8)},	/* Sierra Wireless MC7304/MC7354, WP76xx */
+	{QMI_QUIRK_SET_DTR(0x1199, 0x68c0, 10)},/* Sierra Wireless MC7304/MC7354 */
 	{QMI_FIXED_INTF(0x1199, 0x901c, 8)},    /* Sierra Wireless EM7700 */
 	{QMI_FIXED_INTF(0x1199, 0x901f, 8)},    /* Sierra Wireless EM7355 */
 	{QMI_FIXED_INTF(0x1199, 0x9041, 8)},	/* Sierra Wireless MC7305/MC7355 */
-- 
2.20.1


^ permalink raw reply related

* Re: [v3 PATCH 0/4] mac80211: Fix incorrect usage of rhashtable walk API
From: Johannes Berg @ 2019-02-15 12:21 UTC (permalink / raw)
  To: Herbert Xu, David Miller, linux-wireless, netdev, j, tgraf,
	Julia Lawall
In-Reply-To: <20190214140236.omt74prxhkfaasue@gondor.apana.org.au>

> The first two patches in this series are bug fixes and should be
> backported to stable.
> 
> They fixes a number of issues with the use of the rhashtable API
> in mac80211.  First of all it converts the use of rashtable walks over
> to a simple linked list.  This is because an rhashtable walk is
> inherently unstable and not meant for uses that require stability,
> e.g., when you're trying to lookup an object to delete.

Thanks a lot, Herbert.

Applied those now, I'll send a pull request to Dave with them. Once that
trickles back into net-next I'll apply the third patch (it doesn't apply
without the others), and then Dave you can take the rhashtable one.

Let me know if you'd prefer I take the rhashtable one through my tree,
which really would be only so you don't have to track the dependency.

NB: it'd be easier in patchwork if you tagged all the patches with v3 in
their own PATCH tag, or put the "v3" tag into the actual subject (not
the "[PATCH 0/4]" tag because evidently patchwork drops the tags and
doesn't track them for the *series* just each *patch* ... so with what
you did nothing is visible in patchwork, even just appending "(v3)" to
the subject of the cover letter would've fixed that...

johannes

^ permalink raw reply

* Re: [PATCH net-next v4 07/17] net: sched: protect filter_chain list with filter_chain_lock mutex
From: Vlad Buslov @ 2019-02-15 12:15 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
	ast@kernel.org, daniel@iogearbox.net
In-Reply-To: <20190215113041.GA10511@splinter>


On Fri 15 Feb 2019 at 11:30, Ido Schimmel <idosch@idosch.org> wrote:
> On Fri, Feb 15, 2019 at 10:02:11AM +0000, Vlad Buslov wrote:
>>
>> On Thu 14 Feb 2019 at 18:24, Ido Schimmel <idosch@idosch.org> wrote:
>> > On Mon, Feb 11, 2019 at 10:55:38AM +0200, Vlad Buslov wrote:
>> >> Extend tcf_chain with new filter_chain_lock mutex. Always lock the chain
>> >> when accessing filter_chain list, instead of relying on rtnl lock.
>> >> Dereference filter_chain with tcf_chain_dereference() lockdep macro to
>> >> verify that all users of chain_list have the lock taken.
>> >>
>> >> Rearrange tp insert/remove code in tc_new_tfilter/tc_del_tfilter to execute
>> >> all necessary code while holding chain lock in order to prevent
>> >> invalidation of chain_info structure by potential concurrent change. This
>> >> also serializes calls to tcf_chain0_head_change(), which allows head change
>> >> callbacks to rely on filter_chain_lock for synchronization instead of rtnl
>> >> mutex.
>> >>
>> >> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>> >> Acked-by: Jiri Pirko <jiri@mellanox.com>
>> >
>> > With this sequence [1] I get the following trace [2]. Bisected it to
>> > this patch. Note that second filter is rejected by the device driver
>> > (that's the intention). I guess it is not properly removed from the
>> > filter chain in the error path?
>> >
>> > Thanks
>> >
>> > [1]
>> > # tc qdisc add dev swp3 clsact
>> > # tc filter add dev swp3 ingress pref 1 matchall skip_sw \
>> > 	action mirred egress mirror dev swp7
>> > # tc filter add dev swp3 ingress pref 2 matchall skip_sw \
>> > 	action mirred egress mirror dev swp7
>> > RTNETLINK answers: File exists
>> > We have an error talking to the kernel, -1
>> > # tc qdisc del dev swp3 clsact
>> >
>> > [2]
>> > [   70.545131] kasan: GPF could be caused by NULL-ptr deref or user memory access
>> > [   70.553394] general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
>> > [   70.560618] CPU: 2 PID: 2268 Comm: tc Not tainted 5.0.0-rc5-custom-02103-g415d39427317 #1304
>> > [   70.570065] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
>> > [   70.580204] RIP: 0010:mall_reoffload+0x14a/0x760
>> > [   70.585382] Code: c1 0f 85 ba 05 00 00 31 c0 4d 8d 6c 24 34 b9 06 00 00 00 4c 89 ff f3 48 ab 4c 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <0f> b6 14 02 4c 89 e8 83
>> > e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 bd
>> > [   70.606382] RSP: 0018:ffff888231faefc0 EFLAGS: 00010207
>> > [   70.612235] RAX: dffffc0000000000 RBX: 1ffff110463f5dfe RCX: 0000000000000000
>> > [   70.620220] RDX: 0000000000000006 RSI: 1ffff110463f5e01 RDI: ffff888231faf040
>> > [   70.628206] RBP: ffff8881ef151a00 R08: 0000000000000000 R09: ffffed10463f5dfa
>> > [   70.636192] R10: ffffed10463f5dfa R11: 0000000000000003 R12: 0000000000000000
>> > [   70.644177] R13: 0000000000000034 R14: 0000000000000000 R15: ffff888231faf010
>> > [   70.652163] FS:  00007f46b5bf0800(0000) GS:ffff888236c00000(0000) knlGS:0000000000000000
>> > [   70.661218] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > [   70.667649] CR2: 0000000001d590a8 CR3: 0000000231c3c000 CR4: 00000000001006e0
>> > [   70.675633] Call Trace:
>> > [   70.693046]  tcf_block_playback_offloads+0x94/0x230
>> > [   70.710617]  __tcf_block_cb_unregister+0xf7/0x2d0
>> > [   70.734143]  mlxsw_sp_setup_tc+0x20f/0x660
>> > [   70.738739]  tcf_block_offload_unbind+0x22b/0x350
>> > [   70.748898]  __tcf_block_put+0x203/0x630
>> > [   70.769700]  tcf_block_put_ext+0x2f/0x40
>> > [   70.774098]  clsact_destroy+0x7a/0xb0
>> > [   70.782604]  qdisc_destroy+0x11a/0x5c0
>> > [   70.786810]  qdisc_put+0x5a/0x70
>> > [   70.790435]  notify_and_destroy+0x8e/0xa0
>> > [   70.794933]  qdisc_graft+0xbb7/0xef0
>> > [   70.809009]  tc_get_qdisc+0x518/0xa30
>> > [   70.821530]  rtnetlink_rcv_msg+0x397/0xa20
>> > [   70.835510]  netlink_rcv_skb+0x132/0x380
>> > [   70.848419]  netlink_unicast+0x4c0/0x690
>> > [   70.866019]  netlink_sendmsg+0x929/0xe10
>> > [   70.882134]  sock_sendmsg+0xc8/0x110
>> > [   70.886144]  ___sys_sendmsg+0x77a/0x8f0
>> > [   70.924064]  __sys_sendmsg+0xf7/0x250
>> > [   70.955727]  do_syscall_64+0x14d/0x610
>>
>> Hi Ido,
>>
>> Thanks for reporting this.
>>
>> I looked at the code and problem seems to be matchall classifier
>> specific. My implementation of unlocked cls API assumes that concurrent
>> insertions are possible and checks for it when deleting "empty" tp.
>> Since classifiers don't expose number of elements, the only way to test
>> this is to do tp->walk() on them and assume that walk callback is called
>> once per filter on every classifier. In your example new tp is created
>> for second filter, filter insertion fails, number of elements on newly
>> created tp is checked with tp->walk() before deleting it. However,
>> matchall classifier always calls the tp->walk() callback once, even when
>> it doesn't have a valid filter (in this case with NULL filter pointer).
>>
>> Possible ways to fix this:
>>
>> 1) Check for NULL filter pointer in tp->walk() callback and ignore it
>> when counting filters on tp - will work but I don't like it because I
>> don't think it is ever correct to call tp->walk() callback with NULL
>> filter pointer.
>>
>> 2) Fix matchall to not call tp->walk() callback with NULL filter
>> pointers - my preferred simple solution.
>>
>> 3) Extend tp API to have direct way to count filters by implementing
>> tp->count - requires change to every classifier. Or maybe add it as
>> optional API that only unlocked classifiers are required to implement
>> and just delete rtnl lock dependent tp's without checking for concurrent
>> insertions.
>>
>> What do you think?
>
> Since the problem is matchall-specific, then it makes sense to fix it in
> matchall like you suggested in option #2.
>
> Can you please use this opportunity and audit other classifiers to
> confirm problem is indeed specific to matchall?
>
> In any case, feel free to send me a patch and I'll test it.
>
> Thanks

I've sent you the patch for matchall and will audit all other
classifiers for this issue.

Thanks,
Vlad

^ permalink raw reply

* [PATCH] net: sched: matchall: verify that filter is not NULL in mall_walk()
From: Vlad Buslov @ 2019-02-15 12:11 UTC (permalink / raw)
  To: idosch; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem, Vlad Buslov
In-Reply-To: <20190215113041.GA10511@splinter>

Check that filter is not NULL before passing it to tcf_walker->fn()
callback. This can happen when mall_change() failed to offload filter to
hardware.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
 net/sched/cls_matchall.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index a37137430e61..1f9d481b0fbb 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -247,6 +247,9 @@ static void mall_walk(struct tcf_proto *tp, struct tcf_walker *arg,
 
 	if (arg->count < arg->skip)
 		goto skip;
+
+	if (!head)
+		return;
 	if (arg->fn(tp, head, arg) < 0)
 		arg->stop = 1;
 skip:
-- 
2.13.6


^ permalink raw reply related

* Re: [BUG] 4.20: mv88e6xxx: WARNING: possible circular locking dependency detected
From: Russell King - ARM Linux admin @ 2019-02-15 12:11 UTC (permalink / raw)
  To: Andrew Lunn, Vivien Didelot; +Cc: netdev
In-Reply-To: <20190215114617.lfag4sarqzbtca6c@shell.armlinux.org.uk>

On Fri, Feb 15, 2019 at 11:46:17AM +0000, Russell King - ARM Linux admin wrote:
> Hi Andrew, Vivien,
> 
> I decided to try adding support for the DSA switch interrupt on
> SolidRun's Clearfog platform, but I notice having done so I get:
> 
> WARNING: possible circular locking dependency detected
> 4.20.0+ #297 Not tainted
> ------------------------------------------------------
> systemd-udevd/157 is trying to acquire lock:
> ecc4a080 (&chip->reg_lock){+.+.}, at: __setup_irq+0x640/0x704
> 
> but task is already holding lock:
> edf9c940 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #1 (&desc->request_mutex){+.+.}:
>        mutex_lock_nested+0x1c/0x24
>        __setup_irq+0xa0/0x704
>        request_threaded_irq+0xd0/0x150
>        mv88e6xxx_probe+0x41c/0x694 [mv88e6xxx]
>        mdio_probe+0x2c/0x54
>        really_probe+0x200/0x2c4
>        driver_probe_device+0x5c/0x174
>        __driver_attach+0xd8/0xdc
>        bus_for_each_dev+0x58/0x7c
>        bus_add_driver+0xe4/0x1f0
>        driver_register+0x7c/0x110
>        mdio_driver_register+0x24/0x58
>        do_one_initcall+0x74/0x2e8
>        do_init_module+0x60/0x1d0
>        load_module+0x1968/0x1ff4
>        sys_finit_module+0x8c/0x98
>        ret_fast_syscall+0x0/0x28
>        0xbee82ae8
> 
> -> #0 (&chip->reg_lock){+.+.}:
>        __mutex_lock+0x50/0x8b8
>        mutex_lock_nested+0x1c/0x24
>        __setup_irq+0x640/0x704
>        request_threaded_irq+0xd0/0x150
>        mv88e6xxx_g2_irq_setup+0xcc/0x1b4 [mv88e6xxx]
>        mv88e6xxx_probe+0x44c/0x694 [mv88e6xxx]
>        mdio_probe+0x2c/0x54
>        really_probe+0x200/0x2c4
>        driver_probe_device+0x5c/0x174
>        __driver_attach+0xd8/0xdc
>        bus_for_each_dev+0x58/0x7c
>        bus_add_driver+0xe4/0x1f0
>        driver_register+0x7c/0x110
>        mdio_driver_register+0x24/0x58
>        do_one_initcall+0x74/0x2e8
>        do_init_module+0x60/0x1d0
>        load_module+0x1968/0x1ff4
>        sys_finit_module+0x8c/0x98
>        ret_fast_syscall+0x0/0x28
>        0xbee82ae8
> 
> other info that might help us debug this:
> 
>  Possible unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(&desc->request_mutex);
>                                lock(&chip->reg_lock);
>                                lock(&desc->request_mutex);
>   lock(&chip->reg_lock);
> 
>  *** DEADLOCK ***
> 
> 2 locks held by systemd-udevd/157:
>  #0: ee040868 (&dev->mutex){....}, at: __driver_attach+0x70/0xdc
> ------------[ cut here ]------------
>  #1: edf9c940 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704
> WARNING: CPU: 0 PID: 152 at kernel/locking/lockdep.c:355
> stack backtrace:
> downgrading a read lock
> Modules linked in:
> CPU: 1 PID: 157 Comm: systemd-udevd Not tainted 4.20.0+ #297
>  marvell_cesa(+) mv88e6xxx(+) dsa_core devlink xhci_plat_hcd(+) xhci_hcd armada_Hardware name: Marvell Armada 380/385 (Device Tree)
> [<c0019638>] (unwind_backtrace) from [<c0014888>] (show_stack+0x10/0x14)
> [<c0014888>] (show_stack) from [<c07f55c0>] (dump_stack+0x9c/0xd4)
> [<c07f55c0>] (dump_stack) from [<c0088afc>] (print_circular_bug+0x284/0x2d8)
> [<c0088afc>] (print_circular_bug) from [<c0086b5c>] (__lock_acquire+0x15d4/0x19b[<c0086b5c>] (__lock_acquire) from [<c0087828>] (lock_acquire+0xc4/0x1dc)
> [<c0087828>] (lock_acquire) from [<c080fe68>] (__mutex_lock+0x50/0x8b8)
> [<c080fe68>] (__mutex_lock) from [<c0810758>] (mutex_lock_nested+0x1c/0x24)
> [<c0810758>] (mutex_lock_nested) from [<c009e060>] (__setup_irq+0x640/0x704)
> [<c009e060>] (__setup_irq) from [<c009e2e0>] (request_threaded_irq+0xd0/0x150)
> [<c009e2e0>] (request_threaded_irq) from [<bf0dc970>] (mv88e6xxx_g2_irq_setup+0x[<bf0dc970>] (mv88e6xxx_g2_irq_setup [mv88e6xxx]) from [<bf0d5a90>] (mv88e6xxx_p[<bf0d5a90>] (mv88e6xxx_probe [mv88e6xxx]) from [<c050d420>] (mdio_probe+0x2c/0x[<c050d420>] (mdio_probe) from [<c0496eac>] (really_probe+0x200/0x2c4)
> [<c0496eac>] (really_probe) from [<c0497140>] (driver_probe_device+0x5c/0x174)
> [<c0497140>] (driver_probe_device) from [<c0497330>] (__driver_attach+0xd8/0xdc)[<c0497330>] (__driver_attach) from [<c0495494>] (bus_for_each_dev+0x58/0x7c)
> [<c0495494>] (bus_for_each_dev) from [<c04963d4>] (bus_add_driver+0xe4/0x1f0)
> [<c04963d4>] (bus_add_driver) from [<c0498038>] (driver_register+0x7c/0x110)
> [<c0498038>] (driver_register) from [<c050d338>] (mdio_driver_register+0x24/0x58[<c050d338>] (mdio_driver_register) from [<c000afdc>] (do_one_initcall+0x74/0x2e[<c000afdc>] (do_one_initcall) from [<c00d4994>] (do_init_module+0x60/0x1d0)
> [<c00d4994>] (do_init_module) from [<c00d39e0>] (load_module+0x1968/0x1ff4)
> [<c00d39e0>] (load_module) from [<c00d4248>] (sys_finit_module+0x8c/0x98)
> [<c00d4248>] (sys_finit_module) from [<c0009000>] (ret_fast_syscall+0x0/0x28)
> Exception stack(0xed42bfa8 to 0xed42bff0)
> bfa0:                   00020000 00000000 0000000b b6e814b5 00000000 010b31e0
> bfc0: 00020000 00000000 00000000 0000017b 010b1b30 00020000 00000000 010b31e0
> bfe0: bee82af8 bee82ae8 b6e7b2ac b6ddad70
> CPU: 0 PID: 152 Comm: systemd-udevd Not tainted 4.20.0+ #297
> Hardware name: Marvell Armada 380/385 (Device Tree)
> [<c0019638>] (unwind_backtrace) from [<c0014888>] (show_stack+0x10/0x14)
> [<c0014888>] (show_stack) from [<c07f55c0>] (dump_stack+0x9c/0xd4)
> [<c07f55c0>] (dump_stack) from [<c00312bc>] (__warn+0xf8/0x124)
> [<c00312bc>] (__warn) from [<c00313b0>] (warn_slowpath_fmt+0x38/0x48)
> [<c00313b0>] (warn_slowpath_fmt) from [<c0087518>] (lock_downgrade+0x14c/0x1b8)
> [<c0087518>] (lock_downgrade) from [<c0081650>] (downgrade_write+0x14/0xd4)
> [<c0081650>] (downgrade_write) from [<c0189d10>] (__do_munmap+0x2b8/0x31c)
> [<c0189d10>] (__do_munmap) from [<c0189dd4>] (__vm_munmap+0x60/0xa0)
> [<c0189dd4>] (__vm_munmap) from [<c0009000>] (ret_fast_syscall+0x0/0x28)
> Exception stack(0xed443fa8 to 0xed443ff0)
> 3fa0:                   010a8240 00001000 b665f000 00001000 00000000 00000000
> 3fc0: 010a8240 00001000 00000000 0000005b 00000000 00000007 b6ee5f10 00000000
> 3fe0: fbad2418 bee7f124 b6d7c7b4 b6ddafac
> irq event stamp: 83666
> hardirqs last  enabled at (83665): [<c001d6b0>] do_page_fault+0x190/0x360
> hardirqs last disabled at (83666): [<c080e474>] __schedule+0xbc/0x9c4
> softirqs last  enabled at (82980): [<c000a484>] __do_softirq+0x344/0x540
> softirqs last disabled at (82971): [<c00386e0>] irq_exit+0x124/0x144
> ---[ end trace c91466d44e5e3485 ]---
> 
> This is caused by the locking order inversion in mv88e6xxx_probe:
> 
>         mutex_lock(&chip->reg_lock);
>         if (chip->irq > 0)
>                 err = mv88e6xxx_g1_irq_setup(chip);
>         else
>                 err = mv88e6xxx_irq_poll_setup(chip);
>         mutex_unlock(&chip->reg_lock);
> 
> Here, we take chip->reg_lock, and then call into mv88e6xxx_g1_irq_setup()
> which then calls request_threaded_irq(), taking the request_mutex.
> However, when we request the g2 interrupt, we call request_threaded_irq()
> again, which takes the request_mutex, which then goes on to call
> chip_bus_lock().  This comes through to mv88e6xxx_g1_irq_bus_lock,
> which then tries to grab chip->reg_lock.
> 
> It looks to me like the mutex_lock()/unlock() for reg_lock should be
> moved inside mv88e6xxx_g1_irq_free_common() and
> mv88e6xxx_g1_irq_setup_common(), which will avoid holding it while
> calling request_threaded_irq() or setting up the delayed work.

Maybe something like this, which seems to solve the problem here:

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index c771a58b975e..c859efd8d329 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -349,9 +349,11 @@ static void mv88e6xxx_g1_irq_free_common(struct mv88e6xxx_chip *chip)
 	int irq, virq;
 	u16 mask;
 
+	mutex_lock(&chip->reg_lock);
 	mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &mask);
 	mask &= ~GENMASK(chip->g1_irq.nirqs, 0);
 	mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, mask);
+	mutex_unlock(&chip->reg_lock);
 
 	for (irq = 0; irq < chip->g1_irq.nirqs; irq++) {
 		virq = irq_find_mapping(chip->g1_irq.domain, irq);
@@ -369,9 +371,7 @@ static void mv88e6xxx_g1_irq_free(struct mv88e6xxx_chip *chip)
 	 */
 	free_irq(chip->irq, chip);
 
-	mutex_lock(&chip->reg_lock);
 	mv88e6xxx_g1_irq_free_common(chip);
-	mutex_unlock(&chip->reg_lock);
 }
 
 static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip)
@@ -392,6 +392,7 @@ static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip)
 	chip->g1_irq.chip = mv88e6xxx_g1_irq_chip;
 	chip->g1_irq.masked = ~0;
 
+	mutex_lock(&chip->reg_lock);
 	err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &mask);
 	if (err)
 		goto out_mapping;
@@ -406,6 +407,7 @@ static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip)
 	err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_STS, &reg);
 	if (err)
 		goto out_disable;
+	mutex_unlock(&chip->reg_lock);
 
 	return 0;
 
@@ -414,6 +416,7 @@ static int mv88e6xxx_g1_irq_setup_common(struct mv88e6xxx_chip *chip)
 	mv88e6xxx_g1_write(chip, MV88E6XXX_G1_CTL1, mask);
 
 out_mapping:
+	mutex_unlock(&chip->reg_lock);
 	for (irq = 0; irq < 16; irq++) {
 		virq = irq_find_mapping(chip->g1_irq.domain, irq);
 		irq_dispose_mapping(virq);
@@ -479,9 +482,7 @@ static void mv88e6xxx_irq_poll_free(struct mv88e6xxx_chip *chip)
 	kthread_cancel_delayed_work_sync(&chip->irq_poll_work);
 	kthread_destroy_worker(chip->kworker);
 
-	mutex_lock(&chip->reg_lock);
 	mv88e6xxx_g1_irq_free_common(chip);
-	mutex_unlock(&chip->reg_lock);
 }
 
 int mv88e6xxx_wait(struct mv88e6xxx_chip *chip, int addr, int reg, u16 mask)
@@ -4808,12 +4809,10 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
 	 * the PHYs will link their interrupts to these interrupt
 	 * controllers
 	 */
-	mutex_lock(&chip->reg_lock);
 	if (chip->irq > 0)
 		err = mv88e6xxx_g1_irq_setup(chip);
 	else
 		err = mv88e6xxx_irq_poll_setup(chip);
-	mutex_unlock(&chip->reg_lock);
 
 	if (err)
 		goto out;

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply related

* Re: [PATCH] net: hns: Fix object reference leaks in hns_dsaf_roce_reset()
From: John Garry @ 2019-02-15 12:00 UTC (permalink / raw)
  To: Salil Mehta, Huang Zijiang, Zhuangyuzeng (Yisen)
  Cc: davem@davemloft.net, lipeng (Y), liuyonglong, yuehaibing,
	keescook@chromium.org, wangxi (M), netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, wang.yi59@zte.com.cn, Linuxarm
In-Reply-To: <11b3bcdf50ee435587fbfad6043e49bc@huawei.com>

On 15/02/2019 11:25, Salil Mehta wrote:
>> From: John Garry
>> Sent: Friday, February 15, 2019 10:52 AM
>>
>> On 14/02/2019 06:41, Huang Zijiang wrote:
>>> The of_find_device_by_node() takes a reference to the underlying device
>>> structure, we should release that reference.
>>
>> of_find_device_by_node() is not called for every path, so is this change proper:
>
>
> It looks okay to me and with the suggested device reference is being
> released from every possible error leg in the function. Could you be
> more specific which error path is not being addressed?
>
>
>
>> 	/* find the platform device corresponding to fwnode */
>> 	if (is_of_node(dsaf_fwnode)) {
>> 		pdev = of_find_device_by_node(to_of_node(dsaf_fwnode));
>
>
> This will get the reference to the device, which needs to be
> released later.
>
>
>> 	} else if (is_acpi_device_node(dsaf_fwnode)) {
>> 		pdev = hns_dsaf_find_platform_device(dsaf_fwnode);
>
>
> This will also get the reference to the device when bus_find_device()
> gets called and returns the 'device'. Therefore, this needs to be
> released as well using put_device()

OK, I see. That's non-obvious.

And so the commit message was simply incomplete.

>

So who finally drops this reference? This patch only seems to release on 
error path. I checked the callers and they don't seem to.

John

>
>
>> 	} else {
>> 		pr_err("fwnode is neither OF or ACPI type\n");
>> 		return -EINVAL;
>> 	}
>>
>> 	/* check if we were a success in fetching pdev */
>> 	if (!pdev) {
>> 		pr_err("couldn't find platform device for node\n");
>> 		return -ENODEV;
>> 	}
>>
>> 	/* retrieve the dsaf_device from the driver data */
>> 	dsaf_dev = dev_get_drvdata(&pdev->dev);
>> 	if (!dsaf_dev) {
>> 		dev_err(&pdev->dev, "dsaf_dev is NULL\n");
>> 		return -ENODEV;
>> 	}
>>
>> John
>
>
> [...]
>
>>> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
>>> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
>>> @@ -3081,6 +3081,7 @@ int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool dereset)
>>>  	dsaf_dev = dev_get_drvdata(&pdev->dev);
>>>  	if (!dsaf_dev) {
>>>  		dev_err(&pdev->dev, "dsaf_dev is NULL\n");
>>> +		put_device(&pdev->dev);
>
>
> This looks okay.
>
>
>>>  		return -ENODEV;
>>>  	}
>>>
>>> @@ -3088,6 +3089,7 @@ int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool dereset)
>>>  	if (AE_IS_VER1(dsaf_dev->dsaf_ver)) {
>>>  		dev_err(dsaf_dev->dev, "%s v1 chip doesn't support RoCE!\n",
>>>  			dsaf_dev->ae_dev.name);
>>> +		put_device(&pdev->dev);
>
>
> This looks okay.
>
>
> Salil.
>
> .
>



^ permalink raw reply

* [BUG] 4.20: mv88e6xxx: WARNING: possible circular locking dependency detected
From: Russell King - ARM Linux admin @ 2019-02-15 11:46 UTC (permalink / raw)
  To: Andrew Lunn, Vivien Didelot; +Cc: netdev

Hi Andrew, Vivien,

I decided to try adding support for the DSA switch interrupt on
SolidRun's Clearfog platform, but I notice having done so I get:

WARNING: possible circular locking dependency detected
4.20.0+ #297 Not tainted
------------------------------------------------------
systemd-udevd/157 is trying to acquire lock:
ecc4a080 (&chip->reg_lock){+.+.}, at: __setup_irq+0x640/0x704

but task is already holding lock:
edf9c940 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&desc->request_mutex){+.+.}:
       mutex_lock_nested+0x1c/0x24
       __setup_irq+0xa0/0x704
       request_threaded_irq+0xd0/0x150
       mv88e6xxx_probe+0x41c/0x694 [mv88e6xxx]
       mdio_probe+0x2c/0x54
       really_probe+0x200/0x2c4
       driver_probe_device+0x5c/0x174
       __driver_attach+0xd8/0xdc
       bus_for_each_dev+0x58/0x7c
       bus_add_driver+0xe4/0x1f0
       driver_register+0x7c/0x110
       mdio_driver_register+0x24/0x58
       do_one_initcall+0x74/0x2e8
       do_init_module+0x60/0x1d0
       load_module+0x1968/0x1ff4
       sys_finit_module+0x8c/0x98
       ret_fast_syscall+0x0/0x28
       0xbee82ae8

-> #0 (&chip->reg_lock){+.+.}:
       __mutex_lock+0x50/0x8b8
       mutex_lock_nested+0x1c/0x24
       __setup_irq+0x640/0x704
       request_threaded_irq+0xd0/0x150
       mv88e6xxx_g2_irq_setup+0xcc/0x1b4 [mv88e6xxx]
       mv88e6xxx_probe+0x44c/0x694 [mv88e6xxx]
       mdio_probe+0x2c/0x54
       really_probe+0x200/0x2c4
       driver_probe_device+0x5c/0x174
       __driver_attach+0xd8/0xdc
       bus_for_each_dev+0x58/0x7c
       bus_add_driver+0xe4/0x1f0
       driver_register+0x7c/0x110
       mdio_driver_register+0x24/0x58
       do_one_initcall+0x74/0x2e8
       do_init_module+0x60/0x1d0
       load_module+0x1968/0x1ff4
       sys_finit_module+0x8c/0x98
       ret_fast_syscall+0x0/0x28
       0xbee82ae8

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&desc->request_mutex);
                               lock(&chip->reg_lock);
                               lock(&desc->request_mutex);
  lock(&chip->reg_lock);

 *** DEADLOCK ***

2 locks held by systemd-udevd/157:
 #0: ee040868 (&dev->mutex){....}, at: __driver_attach+0x70/0xdc
------------[ cut here ]------------
 #1: edf9c940 (&desc->request_mutex){+.+.}, at: __setup_irq+0xa0/0x704
WARNING: CPU: 0 PID: 152 at kernel/locking/lockdep.c:355
stack backtrace:
downgrading a read lock
Modules linked in:
CPU: 1 PID: 157 Comm: systemd-udevd Not tainted 4.20.0+ #297
 marvell_cesa(+) mv88e6xxx(+) dsa_core devlink xhci_plat_hcd(+) xhci_hcd armada_Hardware name: Marvell Armada 380/385 (Device Tree)
[<c0019638>] (unwind_backtrace) from [<c0014888>] (show_stack+0x10/0x14)
[<c0014888>] (show_stack) from [<c07f55c0>] (dump_stack+0x9c/0xd4)
[<c07f55c0>] (dump_stack) from [<c0088afc>] (print_circular_bug+0x284/0x2d8)
[<c0088afc>] (print_circular_bug) from [<c0086b5c>] (__lock_acquire+0x15d4/0x19b[<c0086b5c>] (__lock_acquire) from [<c0087828>] (lock_acquire+0xc4/0x1dc)
[<c0087828>] (lock_acquire) from [<c080fe68>] (__mutex_lock+0x50/0x8b8)
[<c080fe68>] (__mutex_lock) from [<c0810758>] (mutex_lock_nested+0x1c/0x24)
[<c0810758>] (mutex_lock_nested) from [<c009e060>] (__setup_irq+0x640/0x704)
[<c009e060>] (__setup_irq) from [<c009e2e0>] (request_threaded_irq+0xd0/0x150)
[<c009e2e0>] (request_threaded_irq) from [<bf0dc970>] (mv88e6xxx_g2_irq_setup+0x[<bf0dc970>] (mv88e6xxx_g2_irq_setup [mv88e6xxx]) from [<bf0d5a90>] (mv88e6xxx_p[<bf0d5a90>] (mv88e6xxx_probe [mv88e6xxx]) from [<c050d420>] (mdio_probe+0x2c/0x[<c050d420>] (mdio_probe) from [<c0496eac>] (really_probe+0x200/0x2c4)
[<c0496eac>] (really_probe) from [<c0497140>] (driver_probe_device+0x5c/0x174)
[<c0497140>] (driver_probe_device) from [<c0497330>] (__driver_attach+0xd8/0xdc)[<c0497330>] (__driver_attach) from [<c0495494>] (bus_for_each_dev+0x58/0x7c)
[<c0495494>] (bus_for_each_dev) from [<c04963d4>] (bus_add_driver+0xe4/0x1f0)
[<c04963d4>] (bus_add_driver) from [<c0498038>] (driver_register+0x7c/0x110)
[<c0498038>] (driver_register) from [<c050d338>] (mdio_driver_register+0x24/0x58[<c050d338>] (mdio_driver_register) from [<c000afdc>] (do_one_initcall+0x74/0x2e[<c000afdc>] (do_one_initcall) from [<c00d4994>] (do_init_module+0x60/0x1d0)
[<c00d4994>] (do_init_module) from [<c00d39e0>] (load_module+0x1968/0x1ff4)
[<c00d39e0>] (load_module) from [<c00d4248>] (sys_finit_module+0x8c/0x98)
[<c00d4248>] (sys_finit_module) from [<c0009000>] (ret_fast_syscall+0x0/0x28)
Exception stack(0xed42bfa8 to 0xed42bff0)
bfa0:                   00020000 00000000 0000000b b6e814b5 00000000 010b31e0
bfc0: 00020000 00000000 00000000 0000017b 010b1b30 00020000 00000000 010b31e0
bfe0: bee82af8 bee82ae8 b6e7b2ac b6ddad70
CPU: 0 PID: 152 Comm: systemd-udevd Not tainted 4.20.0+ #297
Hardware name: Marvell Armada 380/385 (Device Tree)
[<c0019638>] (unwind_backtrace) from [<c0014888>] (show_stack+0x10/0x14)
[<c0014888>] (show_stack) from [<c07f55c0>] (dump_stack+0x9c/0xd4)
[<c07f55c0>] (dump_stack) from [<c00312bc>] (__warn+0xf8/0x124)
[<c00312bc>] (__warn) from [<c00313b0>] (warn_slowpath_fmt+0x38/0x48)
[<c00313b0>] (warn_slowpath_fmt) from [<c0087518>] (lock_downgrade+0x14c/0x1b8)
[<c0087518>] (lock_downgrade) from [<c0081650>] (downgrade_write+0x14/0xd4)
[<c0081650>] (downgrade_write) from [<c0189d10>] (__do_munmap+0x2b8/0x31c)
[<c0189d10>] (__do_munmap) from [<c0189dd4>] (__vm_munmap+0x60/0xa0)
[<c0189dd4>] (__vm_munmap) from [<c0009000>] (ret_fast_syscall+0x0/0x28)
Exception stack(0xed443fa8 to 0xed443ff0)
3fa0:                   010a8240 00001000 b665f000 00001000 00000000 00000000
3fc0: 010a8240 00001000 00000000 0000005b 00000000 00000007 b6ee5f10 00000000
3fe0: fbad2418 bee7f124 b6d7c7b4 b6ddafac
irq event stamp: 83666
hardirqs last  enabled at (83665): [<c001d6b0>] do_page_fault+0x190/0x360
hardirqs last disabled at (83666): [<c080e474>] __schedule+0xbc/0x9c4
softirqs last  enabled at (82980): [<c000a484>] __do_softirq+0x344/0x540
softirqs last disabled at (82971): [<c00386e0>] irq_exit+0x124/0x144
---[ end trace c91466d44e5e3485 ]---

This is caused by the locking order inversion in mv88e6xxx_probe:

        mutex_lock(&chip->reg_lock);
        if (chip->irq > 0)
                err = mv88e6xxx_g1_irq_setup(chip);
        else
                err = mv88e6xxx_irq_poll_setup(chip);
        mutex_unlock(&chip->reg_lock);

Here, we take chip->reg_lock, and then call into mv88e6xxx_g1_irq_setup()
which then calls request_threaded_irq(), taking the request_mutex.
However, when we request the g2 interrupt, we call request_threaded_irq()
again, which takes the request_mutex, which then goes on to call
chip_bus_lock().  This comes through to mv88e6xxx_g1_irq_bus_lock,
which then tries to grab chip->reg_lock.

It looks to me like the mutex_lock()/unlock() for reg_lock should be
moved inside mv88e6xxx_g1_irq_free_common() and
mv88e6xxx_g1_irq_setup_common(), which will avoid holding it while
calling request_threaded_irq() or setting up the delayed work.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* Re: [PATCH net-next v4 07/17] net: sched: protect filter_chain list with filter_chain_lock mutex
From: Ido Schimmel @ 2019-02-15 11:30 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: netdev@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net,
	ast@kernel.org, daniel@iogearbox.net
In-Reply-To: <vbfh8d5jroi.fsf@mellanox.com>

On Fri, Feb 15, 2019 at 10:02:11AM +0000, Vlad Buslov wrote:
> 
> On Thu 14 Feb 2019 at 18:24, Ido Schimmel <idosch@idosch.org> wrote:
> > On Mon, Feb 11, 2019 at 10:55:38AM +0200, Vlad Buslov wrote:
> >> Extend tcf_chain with new filter_chain_lock mutex. Always lock the chain
> >> when accessing filter_chain list, instead of relying on rtnl lock.
> >> Dereference filter_chain with tcf_chain_dereference() lockdep macro to
> >> verify that all users of chain_list have the lock taken.
> >>
> >> Rearrange tp insert/remove code in tc_new_tfilter/tc_del_tfilter to execute
> >> all necessary code while holding chain lock in order to prevent
> >> invalidation of chain_info structure by potential concurrent change. This
> >> also serializes calls to tcf_chain0_head_change(), which allows head change
> >> callbacks to rely on filter_chain_lock for synchronization instead of rtnl
> >> mutex.
> >>
> >> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> >> Acked-by: Jiri Pirko <jiri@mellanox.com>
> >
> > With this sequence [1] I get the following trace [2]. Bisected it to
> > this patch. Note that second filter is rejected by the device driver
> > (that's the intention). I guess it is not properly removed from the
> > filter chain in the error path?
> >
> > Thanks
> >
> > [1]
> > # tc qdisc add dev swp3 clsact
> > # tc filter add dev swp3 ingress pref 1 matchall skip_sw \
> > 	action mirred egress mirror dev swp7
> > # tc filter add dev swp3 ingress pref 2 matchall skip_sw \
> > 	action mirred egress mirror dev swp7
> > RTNETLINK answers: File exists
> > We have an error talking to the kernel, -1
> > # tc qdisc del dev swp3 clsact
> >
> > [2]
> > [   70.545131] kasan: GPF could be caused by NULL-ptr deref or user memory access
> > [   70.553394] general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
> > [   70.560618] CPU: 2 PID: 2268 Comm: tc Not tainted 5.0.0-rc5-custom-02103-g415d39427317 #1304
> > [   70.570065] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
> > [   70.580204] RIP: 0010:mall_reoffload+0x14a/0x760
> > [   70.585382] Code: c1 0f 85 ba 05 00 00 31 c0 4d 8d 6c 24 34 b9 06 00 00 00 4c 89 ff f3 48 ab 4c 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <0f> b6 14 02 4c 89 e8 83
> > e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 bd
> > [   70.606382] RSP: 0018:ffff888231faefc0 EFLAGS: 00010207
> > [   70.612235] RAX: dffffc0000000000 RBX: 1ffff110463f5dfe RCX: 0000000000000000
> > [   70.620220] RDX: 0000000000000006 RSI: 1ffff110463f5e01 RDI: ffff888231faf040
> > [   70.628206] RBP: ffff8881ef151a00 R08: 0000000000000000 R09: ffffed10463f5dfa
> > [   70.636192] R10: ffffed10463f5dfa R11: 0000000000000003 R12: 0000000000000000
> > [   70.644177] R13: 0000000000000034 R14: 0000000000000000 R15: ffff888231faf010
> > [   70.652163] FS:  00007f46b5bf0800(0000) GS:ffff888236c00000(0000) knlGS:0000000000000000
> > [   70.661218] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   70.667649] CR2: 0000000001d590a8 CR3: 0000000231c3c000 CR4: 00000000001006e0
> > [   70.675633] Call Trace:
> > [   70.693046]  tcf_block_playback_offloads+0x94/0x230
> > [   70.710617]  __tcf_block_cb_unregister+0xf7/0x2d0
> > [   70.734143]  mlxsw_sp_setup_tc+0x20f/0x660
> > [   70.738739]  tcf_block_offload_unbind+0x22b/0x350
> > [   70.748898]  __tcf_block_put+0x203/0x630
> > [   70.769700]  tcf_block_put_ext+0x2f/0x40
> > [   70.774098]  clsact_destroy+0x7a/0xb0
> > [   70.782604]  qdisc_destroy+0x11a/0x5c0
> > [   70.786810]  qdisc_put+0x5a/0x70
> > [   70.790435]  notify_and_destroy+0x8e/0xa0
> > [   70.794933]  qdisc_graft+0xbb7/0xef0
> > [   70.809009]  tc_get_qdisc+0x518/0xa30
> > [   70.821530]  rtnetlink_rcv_msg+0x397/0xa20
> > [   70.835510]  netlink_rcv_skb+0x132/0x380
> > [   70.848419]  netlink_unicast+0x4c0/0x690
> > [   70.866019]  netlink_sendmsg+0x929/0xe10
> > [   70.882134]  sock_sendmsg+0xc8/0x110
> > [   70.886144]  ___sys_sendmsg+0x77a/0x8f0
> > [   70.924064]  __sys_sendmsg+0xf7/0x250
> > [   70.955727]  do_syscall_64+0x14d/0x610
> 
> Hi Ido,
> 
> Thanks for reporting this.
> 
> I looked at the code and problem seems to be matchall classifier
> specific. My implementation of unlocked cls API assumes that concurrent
> insertions are possible and checks for it when deleting "empty" tp.
> Since classifiers don't expose number of elements, the only way to test
> this is to do tp->walk() on them and assume that walk callback is called
> once per filter on every classifier. In your example new tp is created
> for second filter, filter insertion fails, number of elements on newly
> created tp is checked with tp->walk() before deleting it. However,
> matchall classifier always calls the tp->walk() callback once, even when
> it doesn't have a valid filter (in this case with NULL filter pointer).
> 
> Possible ways to fix this:
> 
> 1) Check for NULL filter pointer in tp->walk() callback and ignore it
> when counting filters on tp - will work but I don't like it because I
> don't think it is ever correct to call tp->walk() callback with NULL
> filter pointer.
> 
> 2) Fix matchall to not call tp->walk() callback with NULL filter
> pointers - my preferred simple solution.
> 
> 3) Extend tp API to have direct way to count filters by implementing
> tp->count - requires change to every classifier. Or maybe add it as
> optional API that only unlocked classifiers are required to implement
> and just delete rtnl lock dependent tp's without checking for concurrent
> insertions.
> 
> What do you think?

Since the problem is matchall-specific, then it makes sense to fix it in
matchall like you suggested in option #2.

Can you please use this opportunity and audit other classifiers to
confirm problem is indeed specific to matchall?

In any case, feel free to send me a patch and I'll test it.

Thanks

^ permalink raw reply

* RE: [PATCH] net: hns: Fix object reference leaks in hns_dsaf_roce_reset()
From: Salil Mehta @ 2019-02-15 11:25 UTC (permalink / raw)
  To: John Garry, Huang Zijiang, Zhuangyuzeng (Yisen)
  Cc: davem@davemloft.net, lipeng (Y), liuyonglong, yuehaibing,
	keescook@chromium.org, wangxi (M), netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, wang.yi59@zte.com.cn, Linuxarm
In-Reply-To: <b20d235c-779c-77bd-e69f-2d63ffd79ec1@huawei.com>

> From: John Garry
> Sent: Friday, February 15, 2019 10:52 AM
> 
> On 14/02/2019 06:41, Huang Zijiang wrote:
> > The of_find_device_by_node() takes a reference to the underlying device
> > structure, we should release that reference.
> 
> of_find_device_by_node() is not called for every path, so is this change proper:


It looks okay to me and with the suggested device reference is being
released from every possible error leg in the function. Could you be
more specific which error path is not being addressed?



> 	/* find the platform device corresponding to fwnode */
> 	if (is_of_node(dsaf_fwnode)) {
> 		pdev = of_find_device_by_node(to_of_node(dsaf_fwnode));


This will get the reference to the device, which needs to be
released later.


> 	} else if (is_acpi_device_node(dsaf_fwnode)) {
> 		pdev = hns_dsaf_find_platform_device(dsaf_fwnode);


This will also get the reference to the device when bus_find_device()
gets called and returns the 'device'. Therefore, this needs to be
released as well using put_device()



> 	} else {
> 		pr_err("fwnode is neither OF or ACPI type\n");
> 		return -EINVAL;
> 	}
> 
> 	/* check if we were a success in fetching pdev */
> 	if (!pdev) {
> 		pr_err("couldn't find platform device for node\n");
> 		return -ENODEV;
> 	}
> 
> 	/* retrieve the dsaf_device from the driver data */
> 	dsaf_dev = dev_get_drvdata(&pdev->dev);
> 	if (!dsaf_dev) {
> 		dev_err(&pdev->dev, "dsaf_dev is NULL\n");
> 		return -ENODEV;
> 	}
> 
> John


[...]

> > --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
> > +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
> > @@ -3081,6 +3081,7 @@ int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool dereset)
> >  	dsaf_dev = dev_get_drvdata(&pdev->dev);
> >  	if (!dsaf_dev) {
> >  		dev_err(&pdev->dev, "dsaf_dev is NULL\n");
> > +		put_device(&pdev->dev);


This looks okay.


> >  		return -ENODEV;
> >  	}
> >
> > @@ -3088,6 +3089,7 @@ int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool dereset)
> >  	if (AE_IS_VER1(dsaf_dev->dsaf_ver)) {
> >  		dev_err(dsaf_dev->dev, "%s v1 chip doesn't support RoCE!\n",
> >  			dsaf_dev->ae_dev.name);
> > +		put_device(&pdev->dev);


This looks okay.


Salil.

^ permalink raw reply

* Re: [PATCH net-next 03/12] net: sched: flower: introduce reference counting for filters
From: Vlad Buslov @ 2019-02-15 11:22 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: netdev@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net
In-Reply-To: <20190214213423.2260f5b9@redhat.com>


On Thu 14 Feb 2019 at 20:34, Stefano Brivio <sbrivio@redhat.com> wrote:
> On Thu, 14 Feb 2019 09:47:03 +0200
> Vlad Buslov <vladbu@mellanox.com> wrote:
>
>> +static struct cls_fl_filter *fl_get_next_filter(struct tcf_proto *tp,
>> +						unsigned long *handle)
>> +{
>> +	struct cls_fl_head *head = fl_head_dereference(tp);
>> +	struct cls_fl_filter *f;
>> +
>> +	rcu_read_lock();
>> +	/* don't return filters that are being deleted */
>> +	while ((f = idr_get_next_ul(&head->handle_idr,
>> +				    handle)) != NULL &&
>> +	       !refcount_inc_not_zero(&f->refcnt))
>> +		++(*handle);
>
> This... hurts :) What about:
>
> 	while ((f = idr_get_next_ul(&head->handle_idr, &handle))) {
> 		if (refcount_inc_not_zero(&f->refcnt))
> 			break;
> 		++(*handle);
> 	}
>
> ?

I prefer to avoid using value of assignment as boolean and
non-structured jumps, when possible. In this case it seems OK either
way, but how about:

	for (f = idr_get_next_ul(&head->handle_idr, handle);
	     f && !refcount_inc_not_zero(&f->refcnt);
	     f = idr_get_next_ul(&head->handle_idr, handle))
		++(*handle);

>
>> +	rcu_read_unlock();
>> +
>> +	return f;
>> +}
>> +
>>  static bool __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
>>  			struct netlink_ext_ack *extack)
>>  {
>> @@ -456,10 +503,7 @@ static bool __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
>>  	if (!tc_skip_hw(f->flags))
>>  		fl_hw_destroy_filter(tp, f, extack);
>>  	tcf_unbind_filter(tp, &f->res);
>> -	if (async)
>> -		tcf_queue_work(&f->rwork, fl_destroy_filter_work);
>> -	else
>> -		__fl_destroy_filter(f);
>> +	__fl_put(f);
>>
>>  	return last;
>>  }
>> @@ -494,11 +538,18 @@ static void fl_destroy(struct tcf_proto *tp, bool rtnl_held,
>>  	tcf_queue_work(&head->rwork, fl_destroy_sleepable);
>>  }
>>
>> +static void fl_put(struct tcf_proto *tp, void *arg)
>> +{
>> +	struct cls_fl_filter *f = arg;
>> +
>> +	__fl_put(f);
>> +}
>> +
>>  static void *fl_get(struct tcf_proto *tp, u32 handle)
>>  {
>>  	struct cls_fl_head *head = fl_head_dereference(tp);
>>
>> -	return idr_find(&head->handle_idr, handle);
>> +	return __fl_get(head, handle);
>>  }
>>
>>  static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
>> @@ -1321,12 +1372,16 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>>  	struct nlattr **tb;
>>  	int err;
>>
>> -	if (!tca[TCA_OPTIONS])
>> -		return -EINVAL;
>> +	if (!tca[TCA_OPTIONS]) {
>> +		err = -EINVAL;
>> +		goto errout_fold;
>> +	}
>>
>>  	mask = kzalloc(sizeof(struct fl_flow_mask), GFP_KERNEL);
>> -	if (!mask)
>> -		return -ENOBUFS;
>> +	if (!mask) {
>> +		err = -ENOBUFS;
>> +		goto errout_fold;
>> +	}
>>
>>  	tb = kcalloc(TCA_FLOWER_MAX + 1, sizeof(struct nlattr *), GFP_KERNEL);
>>  	if (!tb) {
>> @@ -1349,6 +1404,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>>  		err = -ENOBUFS;
>>  		goto errout_tb;
>>  	}
>> +	refcount_set(&fnew->refcnt, 1);
>>
>>  	err = tcf_exts_init(&fnew->exts, TCA_FLOWER_ACT, 0);
>>  	if (err < 0)
>> @@ -1381,6 +1437,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>>  	if (!tc_in_hw(fnew->flags))
>>  		fnew->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
>>
>> +	refcount_inc(&fnew->refcnt);
>
> I guess I'm not getting the semantics but... why is it 2 now?

As soon as fnew is inserted into head->handle_idr (one reference), it
becomes accessible to concurrent users, which means that it can be
deleted at any time. However, tp->change() returns a reference to newly
created filter to cls_api by assigning "arg" parameter to it (second
reference). After tp->change() returns, cls API continues to use fnew
and releases it with tfilter_put() when finished.

^ permalink raw reply

* Re: [REGRESSION 4.20] mvneta - DMA-API: device driver tries to sync DMA memory it has not allocated
From: Russell King - ARM Linux admin @ 2019-02-15 11:10 UTC (permalink / raw)
  To: Thomas Petazzoni, netdev
In-Reply-To: <20190212135919.xbka3lamjn4ifcki@shell.armlinux.org.uk>

On Tue, Feb 12, 2019 at 01:59:19PM +0000, Russell King - ARM Linux admin wrote:
> Hi,
> 
> Booting 4.20 on SolidRun Clearfog reliably provokes the following
> warning - this is with mvneta built in, but DSA as modules:
> 
> WARNING: CPU: 0 PID: 555 at kernel/dma/debug.c:1230 check_sync+0x514/0x5bc
> mvneta f1070000.ethernet: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000002dd7dc00] [size=240 bytes]
> Modules linked in: ahci mv88e6xxx dsa_core xhci_plat_hcd xhci_hcd devlink armada_thermal marvell_cesa des_generic ehci_orion phy_armada38x_comphy mcp3021 spi_orion evbug sfp mdio_i2c ip_tables x_tables
> CPU: 0 PID: 555 Comm: bridge-network- Not tainted 4.20.0+ #291
> Hardware name: Marvell Armada 380/385 (Device Tree)
> [<c0019638>] (unwind_backtrace) from [<c0014888>] (show_stack+0x10/0x14)
> [<c0014888>] (show_stack) from [<c07f54e0>] (dump_stack+0x9c/0xd4)
> [<c07f54e0>] (dump_stack) from [<c00312bc>] (__warn+0xf8/0x124)
> [<c00312bc>] (__warn) from [<c00313b0>] (warn_slowpath_fmt+0x38/0x48)
> [<c00313b0>] (warn_slowpath_fmt) from [<c00b0370>] (check_sync+0x514/0x5bc)
> [<c00b0370>] (check_sync) from [<c00b04f8>] (debug_dma_sync_single_range_for_cpu+0x6c/0x74)
> [<c00b04f8>] (debug_dma_sync_single_range_for_cpu) from [<c051bd14>] (mvneta_poll+0x298/0xf58)
> [<c051bd14>] (mvneta_poll) from [<c0656194>] (net_rx_action+0x128/0x424)
> [<c0656194>] (net_rx_action) from [<c000a230>] (__do_softirq+0xf0/0x540)
> [<c000a230>] (__do_softirq) from [<c00386e0>] (irq_exit+0x124/0x144)
> [<c00386e0>] (irq_exit) from [<c009b5e0>] (__handle_domain_irq+0x58/0xb0)
> [<c009b5e0>] (__handle_domain_irq) from [<c03a63c4>] (gic_handle_irq+0x48/0x98)
> [<c03a63c4>] (gic_handle_irq) from [<c0009a10>] (__irq_svc+0x70/0x98)
> ...
> 
> This appears to be from:
> 
>                 if (rx_bytes <= rx_copybreak) {
>                         /* better copy a small frame and not unmap the DMA region */
>                         skb = netdev_alloc_skb_ip_align(dev, rx_bytes);
>                         if (unlikely(!skb))
>                                 goto err_drop_frame_ret_pool;
> 
>                         dma_sync_single_range_for_cpu(dev->dev.parent,
>                                                       rx_desc->buf_phys_addr,
>                                                       MVNETA_MH_SIZE + NET_SKB_PAD,
>                                                       rx_bytes,
>                                                       DMA_FROM_DEVICE);
> 
> which suggests that rx_desc->buf_phys_addr is not something that should
> be passed to dma_sync_single_range_for_cpu().  I've not been able to
> track down why that is, nor which interface is provoking that.
> 
> As I don't have the details of how the buffer management hardware works
> on Armada 388, I'm unable to debug this myself.

Doing what debugging I _can_ do, it seems that this has been a long-term
error in mvneta, but one that was merely uncovered by:

  commit 562e2f467e71f45f0400ebee5077eaa426d3e426
  Author: Yelena Krivosheev <yelena@marvell.com>
  Date:   Wed Jul 18 18:10:57 2018 +0200

The buffer that is being complained about is sync'd using a device of
dev->dev.parent 'f1070000.ethernet', but is allocated by
mvneta_bm_construct() against a different device:

mvneta_bm_construct: 0x2dd85c00 +0x140 for ee113294 (f10c8000.bm)

namely 'f10c8000.bm'.

It's long-term, because it will only trigger in older kernels if we
hit the copy-break stuff, which used to do:

	if (rx_bytes <= rx_copybreak) {
		skb = netdev_alloc_skb_ip_align(dev, rx_bytes);
		if (unlikely(!skb)) {
			...
		}
		
		dma_sync_single_range_for_cpu(dev->dev.parent,
					      phys_addr,
					      MVNETA_MH_SIZE + NET_SKB_PAD,
					      rx_bytes,
					      DMA_FROM_DEVICE);

where rx_copybreak is 256 bytes.  Quite why that hasn't been seen
already, I do not know.

Looking at the code after the commit, if mvneta is used on a
non-coherent platform, then we have problems:

	copy_size = min(skb_size, rx_bytes);
	...
	memcpy(rxq->skb->data, data + MVNETA_MH_SIZE,
	       copy_size);
	...
	if (rxq->left_size == 0) {
		int size = copy_size + MVNETA_MH_SIZE;
		dma_sync_single_range_for_cpu(dev->dev.parent,
					      phys_addr, 0,
					      size,
					      DMA_FROM_DEVICE);

Since the sync is done _after_ we've copied data from a non-coherent
buffer.

If this code has been written to assume that we're always coherent,
then is there any point at all to having the incorrect dma_sync_*()
calls at all?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up

^ permalink raw reply

* Re: [PATCH] net: hns: Fix object reference leaks in hns_dsaf_roce_reset()
From: John Garry @ 2019-02-15 10:52 UTC (permalink / raw)
  To: Huang Zijiang, yisen.zhuang
  Cc: salil.mehta, davem, lipeng321, liuyonglong, yuehaibing, keescook,
	wangxi11, netdev, linux-kernel, wang.yi59, Linuxarm
In-Reply-To: <1550126505-28394-1-git-send-email-huang.zijiang@zte.com.cn>

On 14/02/2019 06:41, Huang Zijiang wrote:
> The of_find_device_by_node() takes a reference to the underlying device
> structure, we should release that reference.

of_find_device_by_node() is not called for every path, so is this change 
proper:

	/* find the platform device corresponding to fwnode */
	if (is_of_node(dsaf_fwnode)) {
		pdev = of_find_device_by_node(to_of_node(dsaf_fwnode));
	} else if (is_acpi_device_node(dsaf_fwnode)) {
		pdev = hns_dsaf_find_platform_device(dsaf_fwnode);
	} else {
		pr_err("fwnode is neither OF or ACPI type\n");
		return -EINVAL;
	}

	/* check if we were a success in fetching pdev */
	if (!pdev) {
		pr_err("couldn't find platform device for node\n");
		return -ENODEV;
	}

	/* retrieve the dsaf_device from the driver data */
	dsaf_dev = dev_get_drvdata(&pdev->dev);
	if (!dsaf_dev) {
		dev_err(&pdev->dev, "dsaf_dev is NULL\n");
		return -ENODEV;
	}

John

>
> Signed-off-by: Huang Zijiang <huang.zijiang@zte.com.cn>
> ---
>  drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
> index 14d7ec7..697d929 100644
> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
> @@ -3081,6 +3081,7 @@ int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool dereset)
>  	dsaf_dev = dev_get_drvdata(&pdev->dev);
>  	if (!dsaf_dev) {
>  		dev_err(&pdev->dev, "dsaf_dev is NULL\n");
> +		put_device(&pdev->dev);
>  		return -ENODEV;
>  	}
>
> @@ -3088,6 +3089,7 @@ int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool dereset)
>  	if (AE_IS_VER1(dsaf_dev->dsaf_ver)) {
>  		dev_err(dsaf_dev->dev, "%s v1 chip doesn't support RoCE!\n",
>  			dsaf_dev->ae_dev.name);
> +		put_device(&pdev->dev);
>  		return -ENODEV;
>  	}
>
>



^ permalink raw reply

* Re: [PATCH net-next 02/12] net: sched: flower: refactor fl_change
From: Stefano Brivio @ 2019-02-15 10:47 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: netdev@vger.kernel.org, jhs@mojatatu.com,
	xiyou.wangcong@gmail.com, jiri@resnulli.us, davem@davemloft.net
In-Reply-To: <vbfftspjq48.fsf@mellanox.com>

On Fri, 15 Feb 2019 10:38:04 +0000
Vlad Buslov <vladbu@mellanox.com> wrote:

> On Thu 14 Feb 2019 at 20:34, Stefano Brivio <sbrivio@redhat.com> wrote:
> > On Thu, 14 Feb 2019 09:47:02 +0200
> > Vlad Buslov <vladbu@mellanox.com> wrote:
> >  
> >> As a preparation for using classifier spinlock instead of relying on
> >> external rtnl lock, rearrange code in fl_change. The goal is to group the
> >> code which changes classifier state in single block in order to allow
> >> following commits in this set to protect it from parallel modification with
> >> tp->lock. Data structures that require tp->lock protection are mask
> >> hashtable and filters list, and classifier handle_idr.
> >>
> >> fl_hw_replace_filter() is a sleeping function and cannot be called while
> >> holding a spinlock. In order to execute all sequence of changes to shared
> >> classifier data structures atomically, call fl_hw_replace_filter() before
> >> modifying them.
> >>
> >> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> >> Acked-by: Jiri Pirko <jiri@mellanox.com>
> >> ---
> >>  net/sched/cls_flower.c | 85 ++++++++++++++++++++++++++------------------------
> >>  1 file changed, 44 insertions(+), 41 deletions(-)
> >>
> >> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
> >> index 88d7af78ba7e..91596a6271f8 100644
> >> --- a/net/sched/cls_flower.c
> >> +++ b/net/sched/cls_flower.c
> >> @@ -1354,90 +1354,93 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
> >>  	if (err < 0)
> >>  		goto errout;
> >>
> >> -	if (!handle) {
> >> -		handle = 1;
> >> -		err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
> >> -				    INT_MAX, GFP_KERNEL);
> >> -	} else if (!fold) {
> >> -		/* user specifies a handle and it doesn't exist */
> >> -		err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
> >> -				    handle, GFP_KERNEL);
> >> -	}
> >> -	if (err)
> >> -		goto errout;
> >> -	fnew->handle = handle;
> >> -
> >>
> >> [...]
> >>
> >>  	if (fold) {
> >> +		fnew->handle = handle;  
> >
> > I'm probably missing something, but what if fold is passed and the
> > handle isn't specified? That can still happen, right? In that case we
> > wouldn't be allocating the handle.  
> 
> Hi Stefano,
> 
> Thank you for reviewing my code.
> 
> Cls API lookups fold by handle, so this pointer can only be not NULL
> when user specified a handle and filter with such handle exists on tp.

Ah, of course. Thanks for clarifying. By the way, what tricked me here
was this check in fl_change():

	if (fold && handle && fold->handle != handle)
		...

which could be turned into:

	if (fold && fold->handle != handle)
		...

at this point.

-- 
Stefano

^ permalink raw reply

* r8169 driver causing screen flickering since commit a99790bf5c7f3d68d8b01e015d3212a98ee7bd57
From: Elias Rydberg @ 2019-02-15 10:44 UTC (permalink / raw)
  To: netdev

Hi,

Since commit a99790bf5c7f3d68d8b01e015d3212a98ee7bd57, some systems have 
experienced screen flickering, followed by a black screen, when using 
WiFi (for example starting NetworkManager). So far this only seems to be 
a problem with the following spec:

CPU: Intel i7-8550U
WiFi module: Intel AC-9260 1730Mbps

I contacted the author of the commit and got the following reply:

"From platforms I’ve seen, if ASPM on r8169 is disabled, the deepest 
Package C-State on Intel CPU will limit to PC3.

If the ASPM on r8169 is enabled, it can reach to PC8 when the display is 
on, and be able to reach PC10 when display is off.

So the issue seems to be a platform bug - deeper Package C state makes 
your system unstable.

Please contact the system vendor, the deepest Package C state can be 
reach should be set by BIOS."

Now, Kai-Heng suggest contacting the system vendor, but after consulting 
my distributions forum, I got the suggestion of contacting this email 
list in hope of implementing a patch disabling the ASPM in the r8169 in 
systems with this spec. I apologize if this is a completely unreasonable 
request; I'm not experienced with kernel developing.

Kind regards,
Elias Rydberg

Thread in my distributions forum, which includes more information:
https://bbs.archlinux.org/viewtopic.php?id=244115 
<https://bbs.archlinux.org/viewtopic.php?id=244115>

The result of a kernel bisection I did:

|a99790bf5c7f3d68d8b01e015d3212a98ee7bd57 is the first bad commit commit 
a99790bf5c7f3d68d8b01e015d3212a98ee7bd57 Author: Kai-Heng Feng 
<kai.heng.feng@canonical.com <mailto:kai.heng.feng@canonical.com>> Date: 
Thu Jun 21 16:30:39 2018 +0800 r8169: Reinstate ASPM Support On Intel 
platforms (Skylake and newer), ASPM support in r8169 is the last missing 
puzzle to let CPU's Package C-State reaches PC8. Without ASPM support, 
the CPU cannot reach beyond PC3. PC8 can save additional ~3W in 
comparison with PC3 on a Coffee Lake platform, Dell G3 3779. This is 
based on the work from Chunhao Lin <hau@realtek.com 
<mailto:hau@realtek.com>>. Signed-off-by: Kai-Heng Feng 
<kai.heng.feng@canonical.com <mailto:kai.heng.feng@canonical.com>> 
Signed-off-by: David S. Miller <davem@davemloft.net 
<mailto:davem@davemloft.net>> :040000 040000 
87d4515c14f8eb8cf99b6f47f97ca4db700931c1 
e542595c74609136a7094777afb10b83012f6643 M drivers|

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox