Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH bpf] nfp: bpf: fix latency bug when updating stack index register
From: Song Liu @ 2019-08-26  5:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, Networking, oss-drivers,
	Jiong Wang
In-Reply-To: <20190824020028.6242-1-jakub.kicinski@netronome.com>

On Fri, Aug 23, 2019 at 7:04 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> From: Jiong Wang <jiong.wang@netronome.com>
>
> NFP is using Local Memory to model stack. LM_addr could be used as base of
> a 16 32-bit word region of Local Memory. Then, if the stack offset is
> beyond the current region, the local index needs to be updated. The update
> needs at least three cycles to take effect, therefore the sequence normally
> looks like:
>
>   local_csr_wr[ActLMAddr3, gprB_5]
>   nop
>   nop
>   nop
>
> If the local index switch happens on a narrow loads, then the instruction
> preparing value to zero high 32-bit of the destination register could be
> counted as one cycle, the sequence then could be something like:
>
>   local_csr_wr[ActLMAddr3, gprB_5]
>   nop
>   nop
>   immed[gprB_5, 0]
>
> However, we have zero extension optimization that zeroing high 32-bit could
> be eliminated, therefore above IMMED insn won't be available for which case
> the first sequence needs to be generated.
>
> Fixes: 0b4de1ff19bf ("nfp: bpf: eliminate zero extension code-gen")
> Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
I haven't looked into the code yet. But ^^^ should be

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

right?

^ permalink raw reply

* Re: [PATCH net-next 04/14] bnxt_en: Handle firmware reset status during IF_UP.
From: David Miller @ 2019-08-26  5:47 UTC (permalink / raw)
  To: michael.chan; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui
In-Reply-To: <1566791705-20473-5-git-send-email-michael.chan@broadcom.com>

From: Michael Chan <michael.chan@broadcom.com>
Date: Sun, 25 Aug 2019 23:54:55 -0400

> @@ -7005,7 +7005,9 @@ static int __bnxt_hwrm_ver_get(struct bnxt *bp, bool silent)
>  
>  	rc = bnxt_hwrm_do_send_msg(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT,
>  				   silent);
> -	return rc;
> +	if (rc)
> +		return -ENODEV;
> +	return 0;
>  }
>  
>  static int bnxt_hwrm_ver_get(struct bnxt *bp)
 ...
> @@ -8528,26 +8533,53 @@ static int bnxt_hwrm_if_change(struct bnxt *bp, bool up)
>  		req.flags = cpu_to_le32(FUNC_DRV_IF_CHANGE_REQ_FLAGS_UP);
>  	mutex_lock(&bp->hwrm_cmd_lock);
>  	rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
> -	if (!rc && (resp->flags &
> -		    cpu_to_le32(FUNC_DRV_IF_CHANGE_RESP_FLAGS_RESC_CHANGE)))
> -		resc_reinit = true;
> +	if (!rc)
> +		flags = le32_to_cpu(resp->flags);
>  	mutex_unlock(&bp->hwrm_cmd_lock);
> +	if (rc)
> +		return -EIO;

Following up to my other review comments, if _hwrm_send_message() et
al. returned consistently proper error codes instead of sometimes -1,
couldn't you avoid at least some of these 'rc' remappings?

^ permalink raw reply

* Re: [PATCH net-next 05/14] bnxt_en: Discover firmware error recovery capabilities.
From: David Miller @ 2019-08-26  5:49 UTC (permalink / raw)
  To: michael.chan; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui
In-Reply-To: <1566791705-20473-6-git-send-email-michael.chan@broadcom.com>

From: Michael Chan <michael.chan@broadcom.com>
Date: Sun, 25 Aug 2019 23:54:56 -0400

> +static int bnxt_hwrm_error_recovery_qcfg(struct bnxt *bp)
> +{
> +	struct hwrm_error_recovery_qcfg_output *resp = bp->hwrm_cmd_resp_addr;
> +	struct bnxt_fw_health *fw_health = bp->fw_health;
> +	struct hwrm_error_recovery_qcfg_input req = {0};
> +	int rc, i;
> +
> +	if (!(bp->fw_cap & BNXT_FW_CAP_ERROR_RECOVERY))
> +		return 0;
> +
> +	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_ERROR_RECOVERY_QCFG, -1, -1);
> +	mutex_lock(&bp->hwrm_cmd_lock);
> +	rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
> +	if (rc) {
> +		rc = -EOPNOTSUPP;
> +		goto err_recovery_out;
> +	}

How is this logically an unsupported operation if you're guarding it's use
with an appropriate capability check?

^ permalink raw reply

* Re: [PATCH net-next 08/14] bnxt_en: Add BNXT_STATE_IN_FW_RESET state and pf->registered_vfs.
From: David Miller @ 2019-08-26  5:59 UTC (permalink / raw)
  To: michael.chan; +Cc: netdev, vasundhara-v.volam, jiri, ray.jui
In-Reply-To: <1566791705-20473-9-git-send-email-michael.chan@broadcom.com>

From: Michael Chan <michael.chan@broadcom.com>
Date: Sun, 25 Aug 2019 23:54:59 -0400

> @@ -9234,6 +9243,13 @@ int bnxt_close_nic(struct bnxt *bp, bool irq_re_init, bool link_re_init)
>  {
>  	int rc = 0;
>  
> +	while (test_bit(BNXT_STATE_IN_FW_RESET, &bp->state)) {
> +		netdev_info(bp->dev, "FW reset in progress, delaying close");
> +		rtnl_unlock();
> +		msleep(250);
> +		rtnl_lock();
> +	}

Dropping the RTNL here is extremely dangerous.

Operations other than actual device close can get into the
bnxt_close_nic() code paths (changing features, ethtool ops, etc.)

So we can thus re-enter this function once you drop the RTNL mutex.

Furthermore, and I understand what pains you go into in patch #9 to
avoid this, but it's an endless loop.  If there are bugs there, we
will get stuck in this close path forever.

^ permalink raw reply

* Re: [PATCH net-next 03/14] bnxt_en: Refactor bnxt_sriov_enable().
From: Leon Romanovsky @ 2019-08-26  6:00 UTC (permalink / raw)
  To: Michael Chan; +Cc: davem, netdev, vasundhara-v.volam, jiri, ray.jui
In-Reply-To: <1566791705-20473-4-git-send-email-michael.chan@broadcom.com>

On Sun, Aug 25, 2019 at 11:54:54PM -0400, Michael Chan wrote:
> Refactor the hardware/firmware configuration portion in
> bnxt_sriov_enable() into a new function bnxt_cfg_hw_sriov().  This
> new function can be called after a firmware reset to reconfigure the
> VFs previously enabled.

I wonder what does it mean for already bound VFs to vfio driver?
Will you rebind them as well? Can I assume that FW error in one VF
will trigger "restart" of other VFs too?

Thanks

^ permalink raw reply

* Re: [PATCH net-next 03/14] bnxt_en: Refactor bnxt_sriov_enable().
From: Michael Chan @ 2019-08-26  6:06 UTC (permalink / raw)
  To: David Miller; +Cc: Netdev, Vasundhara Volam, Jiri Pirko, Ray Jui
In-Reply-To: <20190825.223603.2113058192469260500.davem@davemloft.net>

On Sun, Aug 25, 2019 at 10:36 PM David Miller <davem@davemloft.net> wrote:
>
> From: Michael Chan <michael.chan@broadcom.com>
> Date: Sun, 25 Aug 2019 23:54:54 -0400
>
> > @@ -687,6 +687,32 @@ static int bnxt_func_cfg(struct bnxt *bp, int num_vfs)
> >               return bnxt_hwrm_func_cfg(bp, num_vfs);
> >  }
> >
> > +int bnxt_cfg_hw_sriov(struct bnxt *bp, int *num_vfs)
> > +{
> > +     int rc;
> > +
> > +     /* Register buffers for VFs */
> > +     rc = bnxt_hwrm_func_buf_rgtr(bp);
> > +     if (rc)
> > +             return rc;
> > +
> > +     /* Reserve resources for VFs */
> > +     rc = bnxt_func_cfg(bp, *num_vfs);
> > +     if (rc != *num_vfs) {
>
> I notice that these two operations are reversed here from where they were in the
> bnxt_sriov_enable() function.  Does the BUF_RGTR operation have to be undone if
> the bnxt_func_cfg() fails?
>
> When it's not a straight extraction of code into a helper function one really
> should do one of two things in my opinion:
>
> 1) Explain the differences in the commit message.
>
> 2) Do a straight extration in one commit, change the ordering in another.

OK.  Will break it up into 2 commits with explanations.  The reason is
that during normal init, the PF is always initialized first and the
exact bring-up sequence does not matter too much.  During error
recovery, the PF and VFs can be all trying to recover at about the
same time and the sequence matters more.  Will explain all of this
again in v2.  Thanks.

^ permalink raw reply

* [PATCH bpf-next v2 0/4] xsk: various CPU barrier and {READ, WRITE}_ONCE fixes
From: Björn Töpel @ 2019-08-26  6:10 UTC (permalink / raw)
  To: ast, daniel, netdev
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, bpf,
	bjorn.topel, jonathan.lemon, syzbot+c82697e3043781e08802, hdanton,
	i.maximets

This is a four patch series of various barrier, {READ, WRITE}_ONCE
cleanups in the AF_XDP socket code. More details can be found in the
corresponding commit message.

For an AF_XDP socket, most control plane operations are done under the
control mutex (struct xdp_sock, mutex), but there are some places
where members of the struct is read outside the control mutex. This,
as pointed out by Daniel in [1], requires proper {READ,
WRITE}_ONCE-correctness [2] [3]. To address this, and to simplify the
code, the state variable (introduced by Ilya), is now used a point of
synchronization ("is the socket in a valid state, or not").

Thanks,
Björn

[1] https://lore.kernel.org/bpf/beef16bb-a09b-40f1-7dd0-c323b4b89b17@iogearbox.net/
[2] https://lwn.net/Articles/793253/
[3] https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE

v1->v2:
  Removed redundant dev check. (Jonathan)

Björn Töpel (4):
  xsk: avoid store-tearing when assigning queues
  xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
  xsk: avoid store-tearing when assigning umem
  xsk: lock the control mutex in sock_diag interface

 net/xdp/xsk.c      | 61 +++++++++++++++++++++++++++++++---------------
 net/xdp/xsk_diag.c |  3 +++
 2 files changed, 45 insertions(+), 19 deletions(-)

-- 
2.20.1

^ permalink raw reply

* [PATCH bpf-next v2 1/4] xsk: avoid store-tearing when assigning queues
From: Björn Töpel @ 2019-08-26  6:10 UTC (permalink / raw)
  To: ast, daniel, netdev
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, bpf,
	jonathan.lemon, syzbot+c82697e3043781e08802, hdanton, i.maximets
In-Reply-To: <20190826061053.15996-1-bjorn.topel@gmail.com>

From: Björn Töpel <bjorn.topel@intel.com>

Use WRITE_ONCE when doing the store of tx, rx, fq, and cq, to avoid
potential store-tearing. These members are read outside of the control
mutex in the mmap implementation.

Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Fixes: 37b076933a8e ("xsk: add missing write- and data-dependency barrier")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index ee4428a892fa..f3351013c2a5 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -409,7 +409,7 @@ static int xsk_init_queue(u32 entries, struct xsk_queue **queue,
 
 	/* Make sure queue is ready before it can be seen by others */
 	smp_wmb();
-	*queue = q;
+	WRITE_ONCE(*queue, q);
 	return 0;
 }
 
-- 
2.20.1


^ permalink raw reply related

* [PATCH bpf-next v2 2/4] xsk: add proper barriers and {READ, WRITE}_ONCE-correctness for state
From: Björn Töpel @ 2019-08-26  6:10 UTC (permalink / raw)
  To: ast, daniel, netdev
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, bpf,
	jonathan.lemon, syzbot+c82697e3043781e08802, hdanton, i.maximets
In-Reply-To: <20190826061053.15996-1-bjorn.topel@gmail.com>

From: Björn Töpel <bjorn.topel@intel.com>

The state variable was read, and written outside the control mutex
(struct xdp_sock, mutex), without proper barriers and {READ,
WRITE}_ONCE correctness.

In this commit this issue is addressed, and the state member is now
used a point of synchronization whether the socket is setup correctly
or not.

This also fixes a race, found by syzcaller, in xsk_poll() where umem
could be accessed when stale.

Suggested-by: Hillf Danton <hdanton@sina.com>
Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
Fixes: 77cd0d7b3f25 ("xsk: add support for need_wakeup flag in AF_XDP rings")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk.c | 57 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 40 insertions(+), 17 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index f3351013c2a5..8fafa3ce3ae6 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -162,10 +162,23 @@ static int __xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 	return err;
 }
 
+static bool xsk_is_bound(struct xdp_sock *xs)
+{
+	if (READ_ONCE(xs->state) == XSK_BOUND) {
+		/* Matches smp_wmb() in bind(). */
+		smp_rmb();
+		return true;
+	}
+	return false;
+}
+
 int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
 	u32 len;
 
+	if (!xsk_is_bound(xs))
+		return -EINVAL;
+
 	if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
 		return -EINVAL;
 
@@ -362,7 +375,7 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 	struct sock *sk = sock->sk;
 	struct xdp_sock *xs = xdp_sk(sk);
 
-	if (unlikely(!xs->dev))
+	if (unlikely(!xsk_is_bound(xs)))
 		return -ENXIO;
 	if (unlikely(!(xs->dev->flags & IFF_UP)))
 		return -ENETDOWN;
@@ -378,10 +391,15 @@ static unsigned int xsk_poll(struct file *file, struct socket *sock,
 			     struct poll_table_struct *wait)
 {
 	unsigned int mask = datagram_poll(file, sock, wait);
-	struct sock *sk = sock->sk;
-	struct xdp_sock *xs = xdp_sk(sk);
-	struct net_device *dev = xs->dev;
-	struct xdp_umem *umem = xs->umem;
+	struct xdp_sock *xs = xdp_sk(sock->sk);
+	struct net_device *dev;
+	struct xdp_umem *umem;
+
+	if (unlikely(!xsk_is_bound(xs)))
+		return mask;
+
+	dev = xs->dev;
+	umem = xs->umem;
 
 	if (umem->need_wakeup)
 		dev->netdev_ops->ndo_xsk_wakeup(dev, xs->queue_id,
@@ -417,10 +435,9 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
 {
 	struct net_device *dev = xs->dev;
 
-	if (!dev || xs->state != XSK_BOUND)
+	if (xs->state != XSK_BOUND)
 		return;
-
-	xs->state = XSK_UNBOUND;
+	WRITE_ONCE(xs->state, XSK_UNBOUND);
 
 	/* Wait for driver to stop using the xdp socket. */
 	xdp_del_sk_umem(xs->umem, xs);
@@ -495,7 +512,9 @@ static int xsk_release(struct socket *sock)
 	local_bh_enable();
 
 	xsk_delete_from_maps(xs);
+	mutex_lock(&xs->mutex);
 	xsk_unbind_dev(xs);
+	mutex_unlock(&xs->mutex);
 
 	xskq_destroy(xs->rx);
 	xskq_destroy(xs->tx);
@@ -589,19 +608,18 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 		}
 
 		umem_xs = xdp_sk(sock->sk);
-		if (!umem_xs->umem) {
-			/* No umem to inherit. */
+		if (!xsk_is_bound(umem_xs)) {
 			err = -EBADF;
 			sockfd_put(sock);
 			goto out_unlock;
-		} else if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
+		}
+		if (umem_xs->dev != dev || umem_xs->queue_id != qid) {
 			err = -EINVAL;
 			sockfd_put(sock);
 			goto out_unlock;
 		}
-
 		xdp_get_umem(umem_xs->umem);
-		xs->umem = umem_xs->umem;
+		WRITE_ONCE(xs->umem, umem_xs->umem);
 		sockfd_put(sock);
 	} else if (!xs->umem || !xdp_umem_validate_queues(xs->umem)) {
 		err = -EINVAL;
@@ -626,10 +644,15 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
 	xdp_add_sk_umem(xs->umem, xs);
 
 out_unlock:
-	if (err)
+	if (err) {
 		dev_put(dev);
-	else
-		xs->state = XSK_BOUND;
+	} else {
+		/* Matches smp_rmb() in bind() for shared umem
+		 * sockets, and xsk_is_bound().
+		 */
+		smp_wmb();
+		WRITE_ONCE(xs->state, XSK_BOUND);
+	}
 out_release:
 	mutex_unlock(&xs->mutex);
 	rtnl_unlock();
@@ -869,7 +892,7 @@ static int xsk_mmap(struct file *file, struct socket *sock,
 	unsigned long pfn;
 	struct page *qpg;
 
-	if (xs->state != XSK_READY)
+	if (READ_ONCE(xs->state) != XSK_READY)
 		return -EBUSY;
 
 	if (offset == XDP_PGOFF_RX_RING) {
-- 
2.20.1


^ permalink raw reply related

* [PATCH bpf-next v2 3/4] xsk: avoid store-tearing when assigning umem
From: Björn Töpel @ 2019-08-26  6:10 UTC (permalink / raw)
  To: ast, daniel, netdev
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, bpf,
	jonathan.lemon, syzbot+c82697e3043781e08802, hdanton, i.maximets
In-Reply-To: <20190826061053.15996-1-bjorn.topel@gmail.com>

From: Björn Töpel <bjorn.topel@intel.com>

The umem member of struct xdp_sock is read outside of the control
mutex, in the mmap implementation, and needs a WRITE_ONCE to avoid
potentional store-tearing.

Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Fixes: 423f38329d26 ("xsk: add umem fill queue support and mmap")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 8fafa3ce3ae6..e3e99ee5631b 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -716,7 +716,7 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname,
 
 		/* Make sure umem is ready before it can be seen by others */
 		smp_wmb();
-		xs->umem = umem;
+		WRITE_ONCE(xs->umem, umem);
 		mutex_unlock(&xs->mutex);
 		return 0;
 	}
-- 
2.20.1


^ permalink raw reply related

* [PATCH bpf-next v2 4/4] xsk: lock the control mutex in sock_diag interface
From: Björn Töpel @ 2019-08-26  6:10 UTC (permalink / raw)
  To: ast, daniel, netdev
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, bpf,
	jonathan.lemon, syzbot+c82697e3043781e08802, hdanton, i.maximets
In-Reply-To: <20190826061053.15996-1-bjorn.topel@gmail.com>

From: Björn Töpel <bjorn.topel@intel.com>

When accessing the members of an XDP socket, the control mutex should
be held. This commit fixes that.

Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Fixes: a36b38aa2af6 ("xsk: add sock_diag interface for AF_XDP")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 net/xdp/xsk_diag.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/xdp/xsk_diag.c b/net/xdp/xsk_diag.c
index d5e06c8e0cbf..c8f4f11edbbc 100644
--- a/net/xdp/xsk_diag.c
+++ b/net/xdp/xsk_diag.c
@@ -97,6 +97,7 @@ static int xsk_diag_fill(struct sock *sk, struct sk_buff *nlskb,
 	msg->xdiag_ino = sk_ino;
 	sock_diag_save_cookie(sk, msg->xdiag_cookie);
 
+	mutex_lock(&xs->mutex);
 	if ((req->xdiag_show & XDP_SHOW_INFO) && xsk_diag_put_info(xs, nlskb))
 		goto out_nlmsg_trim;
 
@@ -117,10 +118,12 @@ static int xsk_diag_fill(struct sock *sk, struct sk_buff *nlskb,
 	    sock_diag_put_meminfo(sk, nlskb, XDP_DIAG_MEMINFO))
 		goto out_nlmsg_trim;
 
+	mutex_unlock(&xs->mutex);
 	nlmsg_end(nlskb, nlh);
 	return 0;
 
 out_nlmsg_trim:
+	mutex_unlock(&xs->mutex);
 	nlmsg_cancel(nlskb, nlh);
 	return -EMSGSIZE;
 }
-- 
2.20.1


^ permalink raw reply related

* [PATCH v2] net: fix skb use after free in netpoll
From: Feng Sun @ 2019-08-26  6:13 UTC (permalink / raw)
  To: davem
  Cc: edumazet, dsterba, dbanerje, fw, davej, tglx, matwey,
	sakari.ailus, netdev, linux-kernel, Feng Sun, Xiaojun Zhao
In-Reply-To: <1566577920-20956-1-git-send-email-loyou85@gmail.com>

After commit baeababb5b85d5c4e6c917efe2a1504179438d3b
("tun: return NET_XMIT_DROP for dropped packets"),
when tun_net_xmit drop packets, it will free skb and return NET_XMIT_DROP,
netpoll_send_skb_on_dev will run into following use after free cases:
1. retry netpoll_start_xmit with freed skb;
2. queue freed skb in npinfo->txq.
queue_process will also run into use after free case.

hit netpoll_send_skb_on_dev first case with following kernel log:

[  117.864773] kernel BUG at mm/slub.c:306!
[  117.864773] invalid opcode: 0000 [#1] SMP PTI
[  117.864774] CPU: 3 PID: 2627 Comm: loop_printmsg Kdump: loaded Tainted: P           OE     5.3.0-050300rc5-generic #201908182231
[  117.864775] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[  117.864775] RIP: 0010:kmem_cache_free+0x28d/0x2b0
[  117.864781] Call Trace:
[  117.864781]  ? tun_net_xmit+0x21c/0x460
[  117.864781]  kfree_skbmem+0x4e/0x60
[  117.864782]  kfree_skb+0x3a/0xa0
[  117.864782]  tun_net_xmit+0x21c/0x460
[  117.864782]  netpoll_start_xmit+0x11d/0x1b0
[  117.864788]  netpoll_send_skb_on_dev+0x1b8/0x200
[  117.864789]  __br_forward+0x1b9/0x1e0 [bridge]
[  117.864789]  ? skb_clone+0x53/0xd0
[  117.864790]  ? __skb_clone+0x2e/0x120
[  117.864790]  deliver_clone+0x37/0x50 [bridge]
[  117.864790]  maybe_deliver+0x89/0xc0 [bridge]
[  117.864791]  br_flood+0x6c/0x130 [bridge]
[  117.864791]  br_dev_xmit+0x315/0x3c0 [bridge]
[  117.864792]  netpoll_start_xmit+0x11d/0x1b0
[  117.864792]  netpoll_send_skb_on_dev+0x1b8/0x200
[  117.864792]  netpoll_send_udp+0x2c6/0x3e8
[  117.864793]  write_msg+0xd9/0xf0 [netconsole]
[  117.864793]  console_unlock+0x386/0x4e0
[  117.864793]  vprintk_emit+0x17e/0x280
[  117.864794]  vprintk_default+0x29/0x50
[  117.864794]  vprintk_func+0x4c/0xbc
[  117.864794]  printk+0x58/0x6f
[  117.864795]  loop_fun+0x24/0x41 [printmsg_loop]
[  117.864795]  kthread+0x104/0x140
[  117.864795]  ? 0xffffffffc05b1000
[  117.864796]  ? kthread_park+0x80/0x80
[  117.864796]  ret_from_fork+0x35/0x40

Signed-off-by: Feng Sun <loyou85@gmail.com>
Signed-off-by: Xiaojun Zhao <xiaojunzhao141@gmail.com>
---
Changes in v2:
- change commit and title
- add netpoll_xmit_complete helper
- add one more return value check of netpoll_start_xmit
---
 net/core/netpoll.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 2cf27da..a3f20e9 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -95,6 +95,11 @@ static int netpoll_start_xmit(struct sk_buff *skb, struct net_device *dev,
 	return status;
 }
 
+static inline bool netpoll_xmit_complete(int rc)
+{
+	return dev_xmit_complete(rc);
+}
+
 static void queue_process(struct work_struct *work)
 {
 	struct netpoll_info *npinfo =
@@ -122,7 +127,7 @@ static void queue_process(struct work_struct *work)
 		txq = netdev_get_tx_queue(dev, q_index);
 		HARD_TX_LOCK(dev, txq, smp_processor_id());
 		if (netif_xmit_frozen_or_stopped(txq) ||
-		    netpoll_start_xmit(skb, dev, txq) != NETDEV_TX_OK) {
+		    !netpoll_xmit_complete(netpoll_start_xmit(skb, dev, txq))) {
 			skb_queue_head(&npinfo->txq, skb);
 			HARD_TX_UNLOCK(dev, txq);
 			local_irq_restore(flags);
@@ -335,7 +340,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
 
 				HARD_TX_UNLOCK(dev, txq);
 
-				if (status == NETDEV_TX_OK)
+				if (netpoll_xmit_complete(status))
 					break;
 
 			}
@@ -352,7 +357,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
 
 	}
 
-	if (status != NETDEV_TX_OK) {
+	if (!netpoll_xmit_complete(status)) {
 		skb_queue_tail(&npinfo->txq, skb);
 		schedule_delayed_work(&npinfo->tx_work,0);
 	}
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH net-next 01/14] bnxt_en: Suppress all error messages in hwrm_do_send_msg() in silent mode.
From: Michael Chan @ 2019-08-26  6:17 UTC (permalink / raw)
  To: David Miller; +Cc: Netdev, Vasundhara Volam, Jiri Pirko, Ray Jui
In-Reply-To: <20190825.221507.1465677703637201643.davem@davemloft.net>

On Sun, Aug 25, 2019 at 10:15 PM David Miller <davem@davemloft.net> wrote:
>
> From: Michael Chan <michael.chan@broadcom.com>
> Date: Sun, 25 Aug 2019 23:54:52 -0400
>
> > If the silent parameter is set, suppress all messages when there is
> > no response from firmware.  When polling for firmware to come out of
> > reset, no response may be normal and we want to suppress the error
> > messages.  Also, don't poll for the firmware DMA response if Bus Master
> > is disabled.  This is in preparation for error recovery when firmware
> > may be in error or reset state or Bus Master is disabled.
> >
> > Signed-off-by: Michael Chan <michael.chan@broadcom.com>
>
> The function bnxt_hwrm_do_send_msg() seems to be an interesting mix of return
> values, what are the semantics?
>
> It seems to use 0 for success, some error codes, and -1.  Does -1 have special
> meaning?
>
> Just curious, and really this unorthodox return value semantic should
> be documented into a comment above the function.

Sadly, it was coded initially to return firmware defined error codes.
But in some cases, the return code gets propagated all the way back to
userspace.  The long term goal is to convert to standard error codes
and so we try to use standard error codes whenever we add new patches
related to this function.  I will see what I can do to make this
better in v2.  Thanks.

^ permalink raw reply

* Re: [PATCH v2] net: fix skb use after free in netpoll
From: David Miller @ 2019-08-26  6:20 UTC (permalink / raw)
  To: loyou85
  Cc: edumazet, dsterba, dbanerje, fw, davej, tglx, matwey,
	sakari.ailus, netdev, linux-kernel, xiaojunzhao141
In-Reply-To: <1566800020-10007-1-git-send-email-loyou85@gmail.com>

From: Feng Sun <loyou85@gmail.com>
Date: Mon, 26 Aug 2019 14:13:40 +0800

> +static inline bool netpoll_xmit_complete(int rc)
> +{
> +	return dev_xmit_complete(rc);
> +}

There is no need for this useless indirection, just call dev_xmit_complete()
staright.

Also, even if it was suitable, never use the inline keyword in foo.c files.

^ permalink raw reply

* Re: [PATCH net-next 05/14] bnxt_en: Discover firmware error recovery capabilities.
From: Michael Chan @ 2019-08-26  6:23 UTC (permalink / raw)
  To: David Miller; +Cc: Netdev, Vasundhara Volam, Jiri Pirko, Ray Jui
In-Reply-To: <20190825.224913.1760774642952210371.davem@davemloft.net>

On Sun, Aug 25, 2019 at 10:49 PM David Miller <davem@davemloft.net> wrote:
>
> From: Michael Chan <michael.chan@broadcom.com>
> Date: Sun, 25 Aug 2019 23:54:56 -0400
>
> > +static int bnxt_hwrm_error_recovery_qcfg(struct bnxt *bp)
> > +{
> > +     struct hwrm_error_recovery_qcfg_output *resp = bp->hwrm_cmd_resp_addr;
> > +     struct bnxt_fw_health *fw_health = bp->fw_health;
> > +     struct hwrm_error_recovery_qcfg_input req = {0};
> > +     int rc, i;
> > +
> > +     if (!(bp->fw_cap & BNXT_FW_CAP_ERROR_RECOVERY))
> > +             return 0;
> > +
> > +     bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_ERROR_RECOVERY_QCFG, -1, -1);
> > +     mutex_lock(&bp->hwrm_cmd_lock);
> > +     rc = _hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
> > +     if (rc) {
> > +             rc = -EOPNOTSUPP;
> > +             goto err_recovery_out;
> > +     }
>
> How is this logically an unsupported operation if you're guarding it's use
> with an appropriate capability check?

The BNXT_FW_CAP_ERROR_RECOVERY flag says that error recovery should be
supported, but if the firmware call to get the recovery parameters
fails, we return -EOPNOTSUPP to let the caller know that error
recovery cannot be supported.  Again, I will try to clean up the error
codes in v2.

^ permalink raw reply

* [GIT] Networking
From: David Miller @ 2019-08-26  6:29 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) Use 32-bit index for tails calls in s390 bpf JIT, from Ilya Leoshkevich.

2) Fix missed EPOLLOUT events in TCP, from Eric Dumazet.  Same fix for SMC
   from Jason Baron.

3) ipv6_mc_may_pull() should return 0 for malformed packets, not -EINVAL.
   From Stefano Brivio.

4) Don't forget to unpin umem xdp pages in error path of
   xdp_umem_reg().  From Ivan Khoronzhuk.

5) Fix sta object leak in mac80211, from Johannes Berg.

6) Fix regression by not configuring PHYLINK on CPU port of bcm_sf2
   switches.  From Florian Fainelli.

7) Revert DMA sync removal from r8169 which was causing regressions on some
   MIPS Loongson platforms.  From Heiner Kallweit.

8) Use after free in flow dissector, from Jakub Sitnicki.

9) Fix NULL derefs of net devices during ICMP processing across collect_md
   tunnels, from Hangbin Liu.

10) proto_register() memory leaks, from Zhang Lin.

11) Set NLM_F_MULTI flag in multipart netlink messages consistently, from
    John Fastabend.

Please pull, thanks a lot!

The following changes since commit 06821504fd47a5e5b641aeeb638a0ae10a216ef8:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (2019-08-19 10:00:01 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git 

for you to fetch changes up to f53a7ad189594a112167efaf17ea8d0242b5ac00:

  r8152: Set memory to all 0xFFs on failed reg reads (2019-08-25 19:52:59 -0700)

----------------------------------------------------------------
Alexander Wetzel (1):
      cfg80211: Fix Extended Key ID key install checks

Alexei Starovoitov (1):
      bpf: fix precision tracking in presence of bpf2bpf calls

Alexey Kodanev (1):
      ipv4: mpls: fix mpls_xmit for iptunnel

Anders Roxell (2):
      selftests/bpf: add config fragment BPF_JIT
      selftests/bpf: install files test_xdp_vlan.sh

Andrew Lunn (1):
      MAINTAINERS: Add phylink keyword to SFF/SFP/SFP+ MODULE SUPPORT

Antoine Tenart (1):
      net: cpsw: fix NULL pointer exception in the probe error path

Christophe JAILLET (1):
      Kconfig: Fix the reference to the IDT77105 Phy driver in the description of ATM_NICSTAR_USE_IDT77105

Colin Ian King (1):
      net: ieee802154: remove redundant assignment to rc

Dan Carpenter (1):
      gve: Copy and paste bug in gve_get_stats()

Daniel Borkmann (1):
      bpf: fix use after free in prog symbol exposure

David Ahern (1):
      nexthop: Fix nexthop_num_path for blackhole nexthops

David S. Miller (8):
      Merge git://git.kernel.org/.../pablo/nf
      Merge tag 'mac80211-for-davem-2019-08-21' of git://git.kernel.org/.../jberg/mac80211
      Merge tag 'batadv-net-for-davem-20190821' of git://git.open-mesh.org/linux-merge
      Merge tag 'wireless-drivers-for-davem-2019-08-21' of git://git.kernel.org/.../kvalo/wireless-drivers
      Merge git://git.kernel.org/.../bpf/bpf
      Merge branch 'ieee802154-for-davem-2019-08-24' of git://git.kernel.org/.../sschmidt/wpan
      Merge branch 'collect_md-mode-dev-null'
      Merge tag 'mlx5-fixes-2019-08-22' of git://git.kernel.org/.../saeed/linux

Denis Efremov (2):
      MAINTAINERS: Remove IP MASQUERADING record
      MAINTAINERS: net_failover: Fix typo in a filepath

Emmanuel Grumbach (1):
      iwlwifi: pcie: fix the byte count table format for 22560 devices

Eran Ben Elisha (2):
      net/mlx5e: Add num bytes metadata to WQE info
      net/mlx5e: Remove ethernet segment from dump WQE

Eric Dumazet (2):
      batman-adv: fix uninit-value in batadv_netlink_get_ifindex()
      tcp: make sure EPOLLOUT wont be missed

Florian Fainelli (1):
      net: dsa: bcm_sf2: Do not configure PHYLINK on CPU port

Hangbin Liu (3):
      ipv6/addrconf: allow adding multicast addr if IFA_F_MCAUTOJOIN is set
      ipv4/icmp: fix rt dst dev null pointer dereference
      xfrm/xfrm_policy: fix dst dev null pointer dereference in collect_md mode

Heiner Kallweit (1):
      Revert "r8169: remove not needed call to dma_sync_single_for_device"

Hodaszi, Robert (1):
      Revert "cfg80211: fix processing world regdomain when non modular"

Ilan Peer (1):
      iwlwifi: mvm: Allow multicast data frames only when associated

Ilya Leoshkevich (6):
      s390/bpf: fix lcgr instruction encoding
      s390/bpf: use 32-bit index for tail calls
      selftests/bpf: fix "bind{4, 6} deny specific IP & port" on s390
      selftests/bpf: fix test_cgroup_storage on s390
      selftests/bpf: fix test_btf_dump with O=
      bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0

Ivan Khoronzhuk (1):
      xdp: unpin xdp umem pages in error path

Jakub Sitnicki (1):
      flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH

Jason Baron (1):
      net/smc: make sure EPOLLOUT is raised

Johannes Berg (1):
      mac80211: fix possible sta leak

John Fastabend (1):
      net: route dump netlink NLM_F_MULTI flag missing

Julian Wiedmann (1):
      s390/qeth: reject oversized SNMP requests

Juliana Rodrigueiro (1):
      netfilter: xt_nfacct: Fix alignment mismatch in xt_nfacct_match_info

Justin.Lee1@Dell.com (1):
      net/ncsi: Fix the payload copying for the request coming from Netlink

Li RongQing (2):
      net: fix __ip_mc_inc_group usage
      net: fix icmp_socket_deliver argument 2 input

Luca Coelho (2):
      iwlwifi: pcie: don't switch FW to qnj when ax201 is detected
      iwlwifi: pcie: fix recognition of QuZ devices

Masahiro Yamada (1):
      netfilter: add include guard to nf_conntrack_h323_types.h

Mike Rapoport (1):
      trivial: netns: fix typo in 'struct net.passive' description

Moshe Shemesh (2):
      net/mlx5: Fix crdump chunks print
      net/mlx5: Fix delay in fw fatal report handling due to fw report

Pablo Neira Ayuso (1):
      netfilter: nft_flow_offload: missing netlink attribute policy

Prashant Malani (1):
      r8152: Set memory to all 0xFFs on failed reg reads

Quentin Monnet (1):
      tools: bpftool: close prog FD before exit on showing a single program

Sabrina Dubroca (1):
      ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idev

Stanislaw Gruszka (2):
      mt76: mt76x0u: do not reset radio on resume
      rt2x00: clear IV's on start to fix AP mode regression

Stefano Brivio (1):
      ipv6: Fix return value of ipv6_mc_may_pull() for malformed packets

Terry S. Duncan (1):
      net/ncsi: Ensure 32-bit boundary for data cksum

Todd Seidelmann (1):
      netfilter: ebtables: Fix argument order to ADD_COUNTER

Vlad Buslov (1):
      nfp: flower: verify that block cb is not busy before binding

Wenwen Wang (1):
      qed: Add cleanup in qed_slowpath_start()

Yangbo Lu (1):
      ocelot_ace: fix action of trap

Yi-Hung Wei (2):
      openvswitch: Fix log message in ovs conntrack
      openvswitch: Fix conntrack cache with timeout

YueHaibing (2):
      ieee802154: hwsim: Fix error handle path in hwsim_init_module
      ieee802154: hwsim: unregister hw while hwsim_subscribe_all_others fails

Zhu Yanjun (1):
      net: rds: add service level support in rds-info

zhanglin (1):
      sock: fix potential memory leak in proto_register()

 MAINTAINERS                                                |  8 ++------
 arch/s390/net/bpf_jit_comp.c                               | 12 +++++++-----
 drivers/atm/Kconfig                                        |  2 +-
 drivers/net/dsa/bcm_sf2.c                                  | 10 ++++++++--
 drivers/net/ethernet/google/gve/gve_main.c                 |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 38 +++++++++++++++++---------------------
 drivers/net/ethernet/mellanox/mlx5/core/health.c           | 22 ++++++++++++----------
 drivers/net/ethernet/mscc/ocelot_ace.c                     |  2 +-
 drivers/net/ethernet/netronome/nfp/flower/offload.c        |  7 +++++++
 drivers/net/ethernet/qlogic/qed/qed_main.c                 |  4 +++-
 drivers/net/ethernet/realtek/r8169_main.c                  |  1 +
 drivers/net/ethernet/ti/cpsw.c                             |  2 +-
 drivers/net/ieee802154/mac802154_hwsim.c                   |  8 +++++---
 drivers/net/usb/r8152.c                                    |  5 ++++-
 drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c          | 33 ++++++++++++++++++++++++++++++---
 drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c          | 10 ++++++++++
 drivers/net/wireless/intel/iwlwifi/pcie/drv.c              | 17 +++++++++++++++++
 drivers/net/wireless/intel/iwlwifi/pcie/trans.c            |  1 +
 drivers/net/wireless/intel/iwlwifi/pcie/tx-gen2.c          | 20 +++++++++++++-------
 drivers/net/wireless/mediatek/mt76/mt76x0/usb.c            |  8 ++++----
 drivers/net/wireless/ralink/rt2x00/rt2800lib.c             |  9 +++++++++
 drivers/net/wireless/ralink/rt2x00/rt2x00.h                |  1 +
 drivers/net/wireless/ralink/rt2x00/rt2x00dev.c             | 13 ++++++++-----
 drivers/s390/net/qeth_core_main.c                          |  4 ++++
 include/linux/netfilter/nf_conntrack_h323_types.h          |  5 +++++
 include/net/addrconf.h                                     |  2 +-
 include/net/net_namespace.h                                |  2 +-
 include/net/nexthop.h                                      |  6 ------
 include/net/route.h                                        |  2 +-
 include/uapi/linux/netfilter/xt_nfacct.h                   |  5 +++++
 include/uapi/linux/rds.h                                   |  2 ++
 kernel/bpf/syscall.c                                       | 30 ++++++++++++++++++------------
 kernel/bpf/verifier.c                                      |  9 +++++----
 net/batman-adv/netlink.c                                   |  2 +-
 net/bridge/netfilter/ebtables.c                            |  8 ++++----
 net/core/filter.c                                          |  8 ++++----
 net/core/flow_dissector.c                                  |  2 +-
 net/core/sock.c                                            | 31 +++++++++++++++++++++----------
 net/core/stream.c                                          | 16 +++++++++-------
 net/ieee802154/socket.c                                    |  2 +-
 net/ipv4/fib_trie.c                                        |  2 +-
 net/ipv4/icmp.c                                            | 10 ++++++++--
 net/ipv4/igmp.c                                            |  4 ++--
 net/ipv4/route.c                                           | 17 ++++++++++-------
 net/ipv6/addrconf.c                                        | 19 ++++++++++---------
 net/mac80211/cfg.c                                         |  9 +++++----
 net/mpls/mpls_iptunnel.c                                   |  8 ++++----
 net/ncsi/ncsi-cmd.c                                        | 13 ++++++++++---
 net/ncsi/ncsi-rsp.c                                        |  9 ++++++---
 net/netfilter/nft_flow_offload.c                           |  6 ++++++
 net/netfilter/xt_nfacct.c                                  | 36 +++++++++++++++++++++++++-----------
 net/openvswitch/conntrack.c                                | 15 ++++++++++++++-
 net/rds/ib.c                                               | 16 ++++++++++------
 net/rds/ib.h                                               |  1 +
 net/rds/ib_cm.c                                            |  3 +++
 net/rds/rdma_transport.c                                   | 10 ++++++++--
 net/smc/smc_tx.c                                           |  6 ++----
 net/wireless/reg.c                                         |  2 +-
 net/wireless/util.c                                        | 23 ++++++++++++++---------
 net/xdp/xdp_umem.c                                         |  4 +++-
 net/xfrm/xfrm_policy.c                                     |  4 ++--
 tools/bpf/bpftool/prog.c                                   |  4 +++-
 tools/testing/selftests/bpf/Makefile                       |  6 +++++-
 tools/testing/selftests/bpf/config                         |  1 +
 tools/testing/selftests/bpf/test_btf_dump.c                |  7 +++++++
 tools/testing/selftests/bpf/test_cgroup_storage.c          |  6 +++---
 tools/testing/selftests/bpf/test_sock.c                    |  7 +++++--
 67 files changed, 415 insertions(+), 204 deletions(-)

^ permalink raw reply

* Re: Unable to create htb tc classes more than 64K
From: Eric Dumazet @ 2019-08-26  6:32 UTC (permalink / raw)
  To: Cong Wang, Akshat Kakkar; +Cc: Anton Danilov, NetFilter, lartc, netdev
In-Reply-To: <CAM_iQpVtGUH6CAAegRtTgyemLtHsO+RFP8f6LH2WtiYu9-srfw@mail.gmail.com>



On 8/25/19 7:52 PM, Cong Wang wrote:
> On Wed, Aug 21, 2019 at 11:00 PM Akshat Kakkar <akshat.1984@gmail.com> wrote:
>>
>> On Thu, Aug 22, 2019 at 3:37 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>> I am using ipset +  iptables to classify and not filters. Besides, if
>>>> tc is allowing me to define qdisc -> classes -> qdsic -> classes
>>>> (1,2,3 ...) sort of structure (ie like the one shown in ascii tree)
>>>> then how can those lowest child classes be actually used or consumed?
>>>
>>> Just install tc filters on the lower level too.
>>
>> If I understand correctly, you are saying,
>> instead of :
>> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
>> 0x00000001 fw flowid 1:10
>> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
>> 0x00000002 fw flowid 1:20
>> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
>> 0x00000003 fw flowid 2:10
>> tc filter add dev eno2 parent 100: protocol ip prio 1 handle
>> 0x00000004 fw flowid 2:20
>>
>>
>> I should do this: (i.e. changing parent to just immediate qdisc)
>> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000001
>> fw flowid 1:10
>> tc filter add dev eno2 parent 1: protocol ip prio 1 handle 0x00000002
>> fw flowid 1:20
>> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000003
>> fw flowid 2:10
>> tc filter add dev eno2 parent 2: protocol ip prio 1 handle 0x00000004
>> fw flowid 2:20
> 
> 
> Yes, this is what I meant.
> 
> 
>>
>> I tried this previously. But there is not change in the result.
>> Behaviour is exactly same, i.e. I am still getting 100Mbps and not
>> 100kbps or 300kbps
>>
>> Besides, as I mentioned previously I am using ipset + skbprio and not
>> filters stuff. Filters I used just to test.
>>
>> ipset  -N foo hash:ip,mark skbinfo
>>
>> ipset -A foo 10.10.10.10, 0x0x00000001 skbprio 1:10
>> ipset -A foo 10.10.10.20, 0x0x00000002 skbprio 1:20
>> ipset -A foo 10.10.10.30, 0x0x00000003 skbprio 2:10
>> ipset -A foo 10.10.10.40, 0x0x00000004 skbprio 2:20
>>
>> iptables -A POSTROUTING -j SET --map-set foo dst,dst --map-prio
> 
> Hmm..
> 
> I am not familiar with ipset, but it seems to save the skbprio into
> skb->priority, so it doesn't need TC filter to classify it again.
> 
> I guess your packets might go to the direct queue of HTB, which
> bypasses the token bucket. Can you dump the stats and check?

With more than 64K 'classes' I suggest to use a single FQ qdisc [1], and
an eBPF program using EDT model (Earliest Departure Time)

The BPF program would perform the classification, then find a data structure
based on the 'class', and then update/maintain class virtual times and skb->tstamp

TBF = bpf_map_lookup_elem(&map, &classid);

uint64_t now = bpf_ktime_get_ns();
uint64_t time_to_send = max(TBF->time_to_send, now);

time_to_send += (u64)qdisc_pkt_len(skb) * NSEC_PER_SEC / TBF->rate;
if (time_to_send > TBF->max_horizon) {
    return TC_ACT_SHOT;
}
TBF->time_to_send = time_to_send;
skb->tstamp = max(time_to_send, skb->tstamp);
if (time_to_send - now > TBF->ecn_horizon)
    bpf_skb_ecn_set_ce(skb);
return TC_ACT_OK;

tools/testing/selftests/bpf/progs/test_tc_edt.c shows something similar.


[1]  MQ + FQ if the device is multi-queues.

   Note that this setup scales very well on SMP, since we no longer are forced
 to use a single HTB hierarchy (protected by a single spinlock)


^ permalink raw reply

* Re: libbpf distro packaging
From: Jiri Olsa @ 2019-08-26  6:42 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Julia Kartseva, Andrii Nakryiko, labbott@redhat.com,
	acme@kernel.org, debian-kernel@lists.debian.org,
	netdev@vger.kernel.org, Andrey Ignatov, Yonghong Song,
	jolsa@kernel.org, Daniel Borkmann
In-Reply-To: <a00bab9b-dae8-23d8-8de0-3751a1d1b023@fb.com>

On Fri, Aug 23, 2019 at 04:00:01PM +0000, Alexei Starovoitov wrote:
> On 8/23/19 2:22 AM, Jiri Olsa wrote:
> > btw, the libbpf GH repo tag v0.0.4 has 0.0.3 version set in Makefile:
> > 
> >    VERSION = 0
> >    PATCHLEVEL = 0
> >    EXTRAVERSION = 3
> > 
> > current code takes version from libbpf.map so it's fine,
> > but would be great to start from 0.0.5 so we don't need to
> > bother with rpm patches.. is 0.0.5 planned soon?
> 
> Technically we can bump it at any time.
> The goal was to bump it only when new kernel is released
> to capture a collection of new APIs in a given 0.0.X release.
> So that libbpf versions are synchronized with kernel versions
> in some what loose way.
> In this case we can make an exception and bump it now.

I see, I dont think it's worth of the exception now,
the patch is simple or we'll start with 0.0.3

jirka

^ permalink raw reply

* [PATCH v3] net: fix skb use after free in netpoll
From: Feng Sun @ 2019-08-26  6:46 UTC (permalink / raw)
  To: davem
  Cc: edumazet, dsterba, dbanerje, fw, davej, tglx, matwey,
	sakari.ailus, netdev, linux-kernel, Feng Sun, Xiaojun Zhao
In-Reply-To: <20190825.232003.1145950065287854577.davem@davemloft.net>

After commit baeababb5b85d5c4e6c917efe2a1504179438d3b
("tun: return NET_XMIT_DROP for dropped packets"),
when tun_net_xmit drop packets, it will free skb and return NET_XMIT_DROP,
netpoll_send_skb_on_dev will run into following use after free cases:
1. retry netpoll_start_xmit with freed skb;
2. queue freed skb in npinfo->txq.
queue_process will also run into use after free case.

hit netpoll_send_skb_on_dev first case with following kernel log:

[  117.864773] kernel BUG at mm/slub.c:306!
[  117.864773] invalid opcode: 0000 [#1] SMP PTI
[  117.864774] CPU: 3 PID: 2627 Comm: loop_printmsg Kdump: loaded Tainted: P           OE     5.3.0-050300rc5-generic #201908182231
[  117.864775] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[  117.864775] RIP: 0010:kmem_cache_free+0x28d/0x2b0
[  117.864781] Call Trace:
[  117.864781]  ? tun_net_xmit+0x21c/0x460
[  117.864781]  kfree_skbmem+0x4e/0x60
[  117.864782]  kfree_skb+0x3a/0xa0
[  117.864782]  tun_net_xmit+0x21c/0x460
[  117.864782]  netpoll_start_xmit+0x11d/0x1b0
[  117.864788]  netpoll_send_skb_on_dev+0x1b8/0x200
[  117.864789]  __br_forward+0x1b9/0x1e0 [bridge]
[  117.864789]  ? skb_clone+0x53/0xd0
[  117.864790]  ? __skb_clone+0x2e/0x120
[  117.864790]  deliver_clone+0x37/0x50 [bridge]
[  117.864790]  maybe_deliver+0x89/0xc0 [bridge]
[  117.864791]  br_flood+0x6c/0x130 [bridge]
[  117.864791]  br_dev_xmit+0x315/0x3c0 [bridge]
[  117.864792]  netpoll_start_xmit+0x11d/0x1b0
[  117.864792]  netpoll_send_skb_on_dev+0x1b8/0x200
[  117.864792]  netpoll_send_udp+0x2c6/0x3e8
[  117.864793]  write_msg+0xd9/0xf0 [netconsole]
[  117.864793]  console_unlock+0x386/0x4e0
[  117.864793]  vprintk_emit+0x17e/0x280
[  117.864794]  vprintk_default+0x29/0x50
[  117.864794]  vprintk_func+0x4c/0xbc
[  117.864794]  printk+0x58/0x6f
[  117.864795]  loop_fun+0x24/0x41 [printmsg_loop]
[  117.864795]  kthread+0x104/0x140
[  117.864795]  ? 0xffffffffc05b1000
[  117.864796]  ? kthread_park+0x80/0x80
[  117.864796]  ret_from_fork+0x35/0x40

Signed-off-by: Feng Sun <loyou85@gmail.com>
Signed-off-by: Xiaojun Zhao <xiaojunzhao141@gmail.com>
---
Changes in v3:
- use dev_xmit_complete directly

Changes in v2:
- change commit and title
- add netpoll_xmit_complete helper
- add one more return value check of netpoll_start_xmit
---
 net/core/netpoll.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 2cf27da..849380a6 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -122,7 +122,7 @@ static void queue_process(struct work_struct *work)
 		txq = netdev_get_tx_queue(dev, q_index);
 		HARD_TX_LOCK(dev, txq, smp_processor_id());
 		if (netif_xmit_frozen_or_stopped(txq) ||
-		    netpoll_start_xmit(skb, dev, txq) != NETDEV_TX_OK) {
+		    !dev_xmit_complete(netpoll_start_xmit(skb, dev, txq))) {
 			skb_queue_head(&npinfo->txq, skb);
 			HARD_TX_UNLOCK(dev, txq);
 			local_irq_restore(flags);
@@ -335,7 +335,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
 
 				HARD_TX_UNLOCK(dev, txq);
 
-				if (status == NETDEV_TX_OK)
+				if (dev_xmit_complete(status))
 					break;
 
 			}
@@ -352,7 +352,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
 
 	}
 
-	if (status != NETDEV_TX_OK) {
+	if (!dev_xmit_complete(status)) {
 		skb_queue_tail(&npinfo->txq, skb);
 		schedule_delayed_work(&npinfo->tx_work,0);
 	}
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH] bpf: handle 32-bit zext during constant blinding
From: Naveen N. Rao @ 2019-08-26  6:59 UTC (permalink / raw)
  To: Jiong Wang, Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, linux-kernel, linuxppc-dev, Michael Ellerman, netdev
In-Reply-To: <87zhk2faqg.fsf@netronome.com>

Jiong Wang wrote:
> 
> Naveen N. Rao writes:
> 
>> Since BPF constant blinding is performed after the verifier pass, the
>> ALU32 instructions inserted for doubleword immediate loads don't have a
>> corresponding zext instruction. This is causing a kernel oops on powerpc
>> and can be reproduced by running 'test_cgroup_storage' with
>> bpf_jit_harden=2.
>>
>> Fix this by emitting BPF_ZEXT during constant blinding if
>> prog->aux->verifier_zext is set.
>>
>> Fixes: a4b1d3c1ddf6cb ("bpf: verifier: insert zero extension according to analysis result")
>> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
>> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> 
> Thanks for the fix.
> 
> Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
> 
> Just two other comments during review in case I am wrong on somewhere.
> 
>   - Use verifier_zext instead of bpf_jit_needs_zext() seems better, even
>     though the latter could avoid extending function argument.
> 
>     Because JIT back-ends look at verifier_zext, true means zext inserted
>     by verifier so JITs won't do the code-gen.
> 
>     Use verifier_zext is sort of keeping JIT blinding the same behaviour
>     has verifier even though blinding doesn't belong to verifier, but for
>     such insn patching, it could be seen as a extension of verifier,
>     therefore use verifier_zext seems better than bpf_jit_needs_zext() to
>     me.
>    
>   - JIT blinding is also escaping the HI32 randomization which happens
>     inside verifier, otherwise x86-64 regression should have caught this issue.

Jiong,
Thanks for the review.

Alexei, Daniel,
Can you please pick this up for v5.3. This is a regression and is 
causing a crash on powerpc.


- Naveen


^ permalink raw reply

* Re: [PATCH v2 -next] net: mediatek: remove set but not used variable 'status'
From: René van Dorst @ 2019-08-26  7:10 UTC (permalink / raw)
  To: Mao Wenan, sr
  Cc: nbd, john, sean.wang, nelson.chang, davem, matthias.bgg,
	kernel-janitors, netdev, linux-arm-kernel, linux-mediatek,
	linux-kernel
In-Reply-To: <20190826013118.22720-1-maowenan@huawei.com>

Let's add Stefan to the conversation.
He is the author of this commit.

Quoting Mao Wenan <maowenan@huawei.com>:

> Fixes gcc '-Wunused-but-set-variable' warning:
> drivers/net/ethernet/mediatek/mtk_eth_soc.c: In function mtk_handle_irq:
> drivers/net/ethernet/mediatek/mtk_eth_soc.c:1951:6: warning:  
> variable status set but not used [-Wunused-but-set-variable]
>
> Fixes: 296c9120752b ("net: ethernet: mediatek: Add MT7628/88 SoC support")
> Signed-off-by: Mao Wenan <maowenan@huawei.com>
> ---
>  v2: change format of 'Fixes' tag.
>  drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c  
> b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> index 8ddbb8d..bb7d623 100644
> --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> @@ -1948,9 +1948,7 @@ static irqreturn_t mtk_handle_irq_tx(int irq,  
> void *_eth)
>  static irqreturn_t mtk_handle_irq(int irq, void *_eth)
>  {
>  	struct mtk_eth *eth = _eth;
> -	u32 status;
>
> -	status = mtk_r32(eth, MTK_PDMA_INT_STATUS);

Hi Stefan,

You added an extra MTK_PDMA_INT_STATUS read in mtk_handle_irq()
Is that read necessary to work properly?

Greats,

René


>  	if (mtk_r32(eth, MTK_PDMA_INT_MASK) & MTK_RX_DONE_INT) {
>  		if (mtk_r32(eth, MTK_PDMA_INT_STATUS) & MTK_RX_DONE_INT)
>  			mtk_handle_irq_rx(irq, _eth);
> --
> 2.7.4





^ permalink raw reply

* [PATCH v4] rtw88: pci: Move a mass of jobs in hw IRQ to soft IRQ
From: Jian-Hong Pan @ 2019-08-26  7:08 UTC (permalink / raw)
  To: Yan-Hsuan Chuang, Kalle Valo, David S . Miller
  Cc: linux-wireless, netdev, linux-kernel, linux, Jian-Hong Pan
In-Reply-To: <F7CD281DE3E379468C6D07993EA72F84D18A5786@RTITMBSVM04.realtek.com.tw>

There is a mass of jobs between spin lock and unlock in the hardware
IRQ which will occupy much time originally. To make system work more
efficiently, this patch moves the jobs to the soft IRQ (bottom half) to
reduce the time in hardware IRQ.

Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
---
v2:
 Change the spin_lock_irqsave/unlock_irqrestore to spin_lock/unlock in
 rtw_pci_interrupt_handler. Because the interrupts are already disabled
 in the hardware interrupt handler.

v3:
 Extend the spin lock protecting area for the TX path in
 rtw_pci_interrupt_threadfn by Realtek's suggestion

v4:
 Remove the WiFi running check in rtw_pci_interrupt_threadfn to avoid AP
 connection failed by Realtek's suggestion.

 drivers/net/wireless/realtek/rtw88/pci.c | 32 +++++++++++++++++++-----
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index 00ef229552d5..955dd6c6fb57 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -866,12 +866,29 @@ static irqreturn_t rtw_pci_interrupt_handler(int irq, void *dev)
 {
 	struct rtw_dev *rtwdev = dev;
 	struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
-	u32 irq_status[4];
 
 	spin_lock(&rtwpci->irq_lock);
 	if (!rtwpci->irq_enabled)
 		goto out;
 
+	/* disable RTW PCI interrupt to avoid more interrupts before the end of
+	 * thread function
+	 */
+	rtw_pci_disable_interrupt(rtwdev, rtwpci);
+out:
+	spin_unlock(&rtwpci->irq_lock);
+
+	return IRQ_WAKE_THREAD;
+}
+
+static irqreturn_t rtw_pci_interrupt_threadfn(int irq, void *dev)
+{
+	struct rtw_dev *rtwdev = dev;
+	struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
+	unsigned long flags;
+	u32 irq_status[4];
+
+	spin_lock_irqsave(&rtwpci->irq_lock, flags);
 	rtw_pci_irq_recognized(rtwdev, rtwpci, irq_status);
 
 	if (irq_status[0] & IMR_MGNTDOK)
@@ -891,8 +908,9 @@ static irqreturn_t rtw_pci_interrupt_handler(int irq, void *dev)
 	if (irq_status[0] & IMR_ROK)
 		rtw_pci_rx_isr(rtwdev, rtwpci, RTW_RX_QUEUE_MPDU);
 
-out:
-	spin_unlock(&rtwpci->irq_lock);
+	/* all of the jobs for this interrupt have been done */
+	rtw_pci_enable_interrupt(rtwdev, rtwpci);
+	spin_unlock_irqrestore(&rtwpci->irq_lock, flags);
 
 	return IRQ_HANDLED;
 }
@@ -1152,8 +1170,10 @@ static int rtw_pci_probe(struct pci_dev *pdev,
 		goto err_destroy_pci;
 	}
 
-	ret = request_irq(pdev->irq, &rtw_pci_interrupt_handler,
-			  IRQF_SHARED, KBUILD_MODNAME, rtwdev);
+	ret = devm_request_threaded_irq(rtwdev->dev, pdev->irq,
+					rtw_pci_interrupt_handler,
+					rtw_pci_interrupt_threadfn,
+					IRQF_SHARED, KBUILD_MODNAME, rtwdev);
 	if (ret) {
 		ieee80211_unregister_hw(hw);
 		goto err_destroy_pci;
@@ -1192,7 +1212,7 @@ static void rtw_pci_remove(struct pci_dev *pdev)
 	rtw_pci_disable_interrupt(rtwdev, rtwpci);
 	rtw_pci_destroy(rtwdev, pdev);
 	rtw_pci_declaim(rtwdev, pdev);
-	free_irq(rtwpci->pdev->irq, rtwdev);
+	devm_free_irq(rtwdev->dev, rtwpci->pdev->irq, rtwdev);
 	rtw_core_deinit(rtwdev);
 	ieee80211_free_hw(hw);
 }
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v3] rtw88: pci: Move a mass of jobs in hw IRQ to soft IRQ
From: Jian-Hong Pan @ 2019-08-26  7:21 UTC (permalink / raw)
  To: Tony Chuang
  Cc: Kalle Valo, David S . Miller, linux-wireless@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux@endlessm.com
In-Reply-To: <F7CD281DE3E379468C6D07993EA72F84D18A5786@RTITMBSVM04.realtek.com.tw>

Tony Chuang <yhchuang@realtek.com> 於 2019年8月21日 週三 下午4:16寫道：
>
> Hi,
>
> > From: Jian-Hong Pan [mailto:jian-hong@endlessm.com]
> >
> > There is a mass of jobs between spin lock and unlock in the hardware
> > IRQ which will occupy much time originally. To make system work more
> > efficiently, this patch moves the jobs to the soft IRQ (bottom half) to
> > reduce the time in hardware IRQ.
> >
> > Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
> > ---
> > v2:
> >  Change the spin_lock_irqsave/unlock_irqrestore to spin_lock/unlock in
> >  rtw_pci_interrupt_handler. Because the interrupts are already disabled
> >  in the hardware interrupt handler.
> >
> > v3:
> >  Extend the spin lock protecting area for the TX path in
> >  rtw_pci_interrupt_threadfn by Realtek's suggestion
> >
> >  drivers/net/wireless/realtek/rtw88/pci.c | 33 +++++++++++++++++++-----
> >  1 file changed, 27 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/net/wireless/realtek/rtw88/pci.c
> > b/drivers/net/wireless/realtek/rtw88/pci.c
> > index 00ef229552d5..a8c17a01f318 100644
> > --- a/drivers/net/wireless/realtek/rtw88/pci.c
> > +++ b/drivers/net/wireless/realtek/rtw88/pci.c
> > @@ -866,12 +866,29 @@ static irqreturn_t rtw_pci_interrupt_handler(int irq,
> > void *dev)
> >  {
> >       struct rtw_dev *rtwdev = dev;
> >       struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
> > -     u32 irq_status[4];
> >
> >       spin_lock(&rtwpci->irq_lock);
> >       if (!rtwpci->irq_enabled)
> >               goto out;
> >
> > +     /* disable RTW PCI interrupt to avoid more interrupts before the end of
> > +      * thread function
> > +      */
> > +     rtw_pci_disable_interrupt(rtwdev, rtwpci);
> > +out:
> > +     spin_unlock(&rtwpci->irq_lock);
> > +
> > +     return IRQ_WAKE_THREAD;
> > +}
> > +
> > +static irqreturn_t rtw_pci_interrupt_threadfn(int irq, void *dev)
> > +{
> > +     struct rtw_dev *rtwdev = dev;
> > +     struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
> > +     unsigned long flags;
> > +     u32 irq_status[4];
> > +
> > +     spin_lock_irqsave(&rtwpci->irq_lock, flags);
> >       rtw_pci_irq_recognized(rtwdev, rtwpci, irq_status);
> >
> >       if (irq_status[0] & IMR_MGNTDOK)
> > @@ -891,8 +908,10 @@ static irqreturn_t rtw_pci_interrupt_handler(int irq,
> > void *dev)
> >       if (irq_status[0] & IMR_ROK)
> >               rtw_pci_rx_isr(rtwdev, rtwpci, RTW_RX_QUEUE_MPDU);
> >
> > -out:
> > -     spin_unlock(&rtwpci->irq_lock);
> > +     /* all of the jobs for this interrupt have been done */
> > +     if (rtw_flag_check(rtwdev, RTW_FLAG_RUNNING))
> > +             rtw_pci_enable_interrupt(rtwdev, rtwpci);
>
> I've applied this patch and tested it.
> But I failed to connect to AP, it seems that the
> scan_result is empty. And when I failed to connect
> to the AP, I found that the IMR is not enabled.
> I guess the check bypass the interrupt enable function.
>
> And I also found that *without MSI*, the driver is
> able to connect to the AP. Could you please verify
> this patch again with MSI interrupt enabled and
> send a v4?
>
> You can find my MSI patch on
> https://patchwork.kernel.org/patch/11081539/

I have just sent v4 patch.  Also tested the modified MSI patch like below:
The WiFi works fine on ASUS X512DK (including MSI enabled).

Jian-Hong Pan

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c
b/drivers/net/wireless/realtek/rtw88/pci.c
index 955dd6c6fb57..d18f5aae1585 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -11,6 +11,10 @@
 #include "fw.h"
 #include "debug.h"

+static bool rtw_disable_msi;
+module_param_named(disable_msi, rtw_disable_msi, bool, 0644);
+MODULE_PARM_DESC(disable_msi, "Set Y to disable MSI interrupt support");
+
 static u32 rtw_pci_tx_queue_idx_addr[] = {
     [RTW_TX_QUEUE_BK]    = RTK_PCI_TXBD_IDX_BKQ,
     [RTW_TX_QUEUE_BE]    = RTK_PCI_TXBD_IDX_BEQ,
@@ -1116,6 +1120,48 @@ static struct rtw_hci_ops rtw_pci_ops = {
     .write_data_h2c = rtw_pci_write_data_h2c,
 };

+static int rtw_pci_request_irq(struct rtw_dev *rtwdev, struct pci_dev *pdev)
+{
+    struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
+    int ret;
+
+    if (!rtw_disable_msi) {
+        ret = pci_enable_msi(pdev);
+        if (ret) {
+            rtw_warn(rtwdev, "failed to enable msi %d, using legacy irq\n",
+                 ret);
+        } else {
+            rtw_warn(rtwdev, "pci msi enabled\n");
+            rtwpci->msi_enabled = true;
+        }
+    }
+
+    ret = devm_request_threaded_irq(rtwdev->dev, pdev->irq,
+                    rtw_pci_interrupt_handler,
+                    rtw_pci_interrupt_threadfn,
+                    IRQF_SHARED, KBUILD_MODNAME, rtwdev);
+    if (ret) {
+        rtw_err(rtwdev, "failed to request irq %d\n", ret);
+        if (rtwpci->msi_enabled) {
+            pci_disable_msi(pdev);
+            rtwpci->msi_enabled = false;
+        }
+    }
+
+    return ret;
+}
+
+static void rtw_pci_free_irq(struct rtw_dev *rtwdev, struct pci_dev *pdev)
+{
+    struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
+
+    devm_free_irq(rtwdev->dev, rtwpci->pdev->irq, rtwdev);
+    if (rtwpci->msi_enabled) {
+        pci_disable_msi(pdev);
+        rtwpci->msi_enabled = false;
+    }
+}
+
 static int rtw_pci_probe(struct pci_dev *pdev,
              const struct pci_device_id *id)
 {
@@ -1170,10 +1216,7 @@ static int rtw_pci_probe(struct pci_dev *pdev,
         goto err_destroy_pci;
     }

-    ret = devm_request_threaded_irq(rtwdev->dev, pdev->irq,
-                    rtw_pci_interrupt_handler,
-                    rtw_pci_interrupt_threadfn,
-                    IRQF_SHARED, KBUILD_MODNAME, rtwdev);
+    ret = rtw_pci_request_irq(rtwdev, pdev);
     if (ret) {
         ieee80211_unregister_hw(hw);
         goto err_destroy_pci;
@@ -1212,7 +1255,7 @@ static void rtw_pci_remove(struct pci_dev *pdev)
     rtw_pci_disable_interrupt(rtwdev, rtwpci);
     rtw_pci_destroy(rtwdev, pdev);
     rtw_pci_declaim(rtwdev, pdev);
-    devm_free_irq(rtwdev->dev, rtwpci->pdev->irq, rtwdev);
+    rtw_pci_free_irq(rtwdev, pdev);
     rtw_core_deinit(rtwdev);
     ieee80211_free_hw(hw);
 }
diff --git a/drivers/net/wireless/realtek/rtw88/pci.h
b/drivers/net/wireless/realtek/rtw88/pci.h
index 87824a4caba9..a8e369c5eaca 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.h
+++ b/drivers/net/wireless/realtek/rtw88/pci.h
@@ -186,6 +186,7 @@ struct rtw_pci {
     spinlock_t irq_lock;
     u32 irq_mask[4];
     bool irq_enabled;
+    bool msi_enabled;

     u16 rx_tag;
     struct rtw_pci_tx_ring tx_rings[RTK_MAX_TX_QUEUE_NUM];


> > +     spin_unlock_irqrestore(&rtwpci->irq_lock, flags);
> >
> >       return IRQ_HANDLED;
> >  }
> > @@ -1152,8 +1171,10 @@ static int rtw_pci_probe(struct pci_dev *pdev,
> >               goto err_destroy_pci;
> >       }
> >
> > -     ret = request_irq(pdev->irq, &rtw_pci_interrupt_handler,
> > -                       IRQF_SHARED, KBUILD_MODNAME, rtwdev);
> > +     ret = devm_request_threaded_irq(rtwdev->dev, pdev->irq,
> > +                                     rtw_pci_interrupt_handler,
> > +                                     rtw_pci_interrupt_threadfn,
> > +                                     IRQF_SHARED, KBUILD_MODNAME, rtwdev);
> >       if (ret) {
> >               ieee80211_unregister_hw(hw);
> >               goto err_destroy_pci;
> > @@ -1192,7 +1213,7 @@ static void rtw_pci_remove(struct pci_dev *pdev)
> >       rtw_pci_disable_interrupt(rtwdev, rtwpci);
> >       rtw_pci_destroy(rtwdev, pdev);
> >       rtw_pci_declaim(rtwdev, pdev);
> > -     free_irq(rtwpci->pdev->irq, rtwdev);
> > +     devm_free_irq(rtwdev->dev, rtwpci->pdev->irq, rtwdev);
> >       rtw_core_deinit(rtwdev);
> >       ieee80211_free_hw(hw);
> >  }
> > --
>
>
> NACK
> Need to verify it with MSI https://patchwork.kernel.org/patch/11081539/
> And hope v4 could fix it.
>
> Yan-Hsuan
>

^ permalink raw reply related

* [PATCH] net: Adding parameter detection in __ethtool_get_link_ksettings.
From: Dongxu Liu @ 2019-08-26  7:23 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-kernel

The __ethtool_get_link_ksettings symbol will be exported,
and external users may use an illegal address.
We should check the parameters before using them,
otherwise the system will crash.

[ 8980.991134] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 8980.993049] IP: [<ffffffff8155aca7>] __ethtool_get_link_ksettings+0x27/0x140
[ 8980.994285] PGD 0
[ 8980.995013] Oops: 0000 [#1] SMP
[ 8980.995896] Modules linked in: sch_ingress ...
[ 8981.013220] CPU: 3 PID: 25174 Comm: kworker/3:3 Tainted: G           O   ----V-------   3.10.0-327.36.58.4.x86_64 #1
[ 8981.017667] Workqueue: events linkwatch_event
[ 8981.018652] task: ffff8800a8348000 ti: ffff8800b045c000 task.ti: ffff8800b045c000
[ 8981.020418] RIP: 0010:[<ffffffff8155aca7>]  [<ffffffff8155aca7>] __ethtool_get_link_ksettings+0x27/0x140
[ 8981.022383] RSP: 0018:ffff8800b045fc88  EFLAGS: 00010202
[ 8981.023453] RAX: 0000000000000000 RBX: ffff8800b045fcac RCX: 0000000000000000
[ 8981.024726] RDX: ffff8800b658f600 RSI: ffff8800b045fcac RDI: ffff8802296e0000
[ 8981.026000] RBP: ffff8800b045fc98 R08: 0000000000000000 R09: 0000000000000001
[ 8981.027273] R10: 00000000000073e0 R11: 0000082b0cc8adea R12: ffff8802296e0000
[ 8981.028561] R13: ffff8800b566e8c0 R14: ffff8800b658f600 R15: ffff8800b566e000
[ 8981.029841] FS:  0000000000000000(0000) GS:ffff88023ed80000(0000) knlGS:0000000000000000
[ 8981.031715] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8981.032845] CR2: 0000000000000000 CR3: 00000000b39a9000 CR4: 00000000003407e0
[ 8981.034137] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8981.035427] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8981.036702] Stack:
[ 8981.037406]  ffff8800b658f600 0000000000009c40 ffff8800b045fce8 ffffffffa047a71d
[ 8981.039238]  000000000000004d ffff8800b045fcc8 ffff8800b045fd28 ffffffff815cb198
[ 8981.041070]  ffff8800b045fcd8 ffffffff810807e6 00000000e8212951 0000000000000001
[ 8981.042910] Call Trace:
[ 8981.043660]  [<ffffffffa047a71d>] bond_update_speed_duplex+0x3d/0x90 [bonding]
[ 8981.045424]  [<ffffffff815cb198>] ? inetdev_event+0x38/0x530
[ 8981.046554]  [<ffffffff810807e6>] ? put_online_cpus+0x56/0x80
[ 8981.047688]  [<ffffffffa0480d67>] bond_netdev_event+0x137/0x360 [bonding]
...

Signed-off-by: Dongxu Liu <liudongxu3@huawei.com>
---
 net/core/ethtool.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 6288e69..9a50b64 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -545,6 +545,8 @@ int __ethtool_get_link_ksettings(struct net_device *dev,
 {
 	ASSERT_RTNL();
 
+	if (!dev || !dev->ethtool_ops)
+		return -EOPNOTSUPP;
 	if (!dev->ethtool_ops->get_link_ksettings)
 		return -EOPNOTSUPP;
 
-- 
2.12.3



^ permalink raw reply related

* Re: [PATCH v2 -next] net: mediatek: remove set but not used variable 'status'
From: Stefan Roese @ 2019-08-26  7:27 UTC (permalink / raw)
  To: René van Dorst, Mao Wenan
  Cc: nbd, john, sean.wang, nelson.chang, davem, matthias.bgg,
	kernel-janitors, netdev, linux-arm-kernel, linux-mediatek,
	linux-kernel
In-Reply-To: <20190826071048.Horde.gwS9nzceYYiYGJLnJ6-x2hz@www.vdorst.com>

Hi!

On 26.08.19 09:10, René van Dorst wrote:
> Let's add Stefan to the conversation.
> He is the author of this commit.

Thanks Rene.
  
> Quoting Mao Wenan <maowenan@huawei.com>:
> 
>> Fixes gcc '-Wunused-but-set-variable' warning:
>> drivers/net/ethernet/mediatek/mtk_eth_soc.c: In function mtk_handle_irq:
>> drivers/net/ethernet/mediatek/mtk_eth_soc.c:1951:6: warning:
>> variable status set but not used [-Wunused-but-set-variable]
>>
>> Fixes: 296c9120752b ("net: ethernet: mediatek: Add MT7628/88 SoC support")
>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
>> ---
>>   v2: change format of 'Fixes' tag.
>>   drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 --
>>   1 file changed, 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
>> b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
>> index 8ddbb8d..bb7d623 100644
>> --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
>> +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
>> @@ -1948,9 +1948,7 @@ static irqreturn_t mtk_handle_irq_tx(int irq,
>> void *_eth)
>>   static irqreturn_t mtk_handle_irq(int irq, void *_eth)
>>   {
>>   	struct mtk_eth *eth = _eth;
>> -	u32 status;
>>
>> -	status = mtk_r32(eth, MTK_PDMA_INT_STATUS);
> 
> Hi Stefan,
> 
> You added an extra MTK_PDMA_INT_STATUS read in mtk_handle_irq()
> Is that read necessary to work properly?

No, its not needed. This read must have "slipped in" from some earlier
patch versions and I forgot to remove it later. Thanks for catching it.

Reviewed-by: Stefan Roese <sr@denx.de>

Thanks,
Stefan

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox