Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH V2 4/9] vhost: reset invalidate_count in vhost_set_vring_num_addr()
From: Jason Gunthorpe @ 2019-07-31 12:41 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, kvm, virtualization, netdev, linux-kernel, linux-mm
In-Reply-To: <20190731084655.7024-5-jasowang@redhat.com>

On Wed, Jul 31, 2019 at 04:46:50AM -0400, Jason Wang wrote:
> The vhost_set_vring_num_addr() could be called in the middle of
> invalidate_range_start() and invalidate_range_end(). If we don't reset
> invalidate_count after the un-registering of MMU notifier, the
> invalidate_cont will run out of sync (e.g never reach zero). This will
> in fact disable the fast accessor path. Fixing by reset the count to
> zero.
> 
> Reported-by: Michael S. Tsirkin <mst@redhat.com>

Did Michael report this as well?

> Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
>  drivers/vhost/vhost.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 2a3154976277..2a7217c33668 100644
> +++ b/drivers/vhost/vhost.c
> @@ -2073,6 +2073,10 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
>  		d->has_notifier = false;
>  	}
>  
> +	/* reset invalidate_count in case we are in the middle of
> +	 * invalidate_start() and invalidate_end().
> +	 */
> +	vq->invalidate_count = 0;
>  	vhost_uninit_vq_maps(vq);
>  #endif
>  

^ permalink raw reply

* Re: [PATCH V2 7/9] vhost: do not use RCU to synchronize MMU notifier with worker
From: Jason Gunthorpe @ 2019-07-31 12:39 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, kvm, virtualization, netdev, linux-kernel, linux-mm
In-Reply-To: <20190731084655.7024-8-jasowang@redhat.com>

On Wed, Jul 31, 2019 at 04:46:53AM -0400, Jason Wang wrote:
> We used to use RCU to synchronize MMU notifier with worker. This leads
> calling synchronize_rcu() in invalidate_range_start(). But on a busy
> system, there would be many factors that may slow down the
> synchronize_rcu() which makes it unsuitable to be called in MMU
> notifier.
> 
> A solution is SRCU but its overhead is obvious with the expensive full
> memory barrier. Another choice is to use seqlock, but it doesn't
> provide a synchronization method between readers and writers. The last
> choice is to use vq mutex, but it need to deal with the worst case
> that MMU notifier must be blocked and wait for the finish of swap in.
> 
> So this patch switches use a counter to track whether or not the map
> was used. The counter was increased when vq try to start or finish
> uses the map. This means, when it was even, we're sure there's no
> readers and MMU notifier is synchronized. When it was odd, it means
> there's a reader we need to wait it to be even again then we are
> synchronized. 

You just described a seqlock.

We've been talking about providing this as some core service from mmu
notifiers because nearly every use of this API needs it.

IMHO this gets the whole thing backwards, the common pattern is to
protect the 'shadow pte' data with a seqlock (usually open coded),
such that the mmu notififer side has the write side of that lock and
the read side is consumed by the thread accessing or updating the SPTE.


> Reported-by: Michael S. Tsirkin <mst@redhat.com>
> Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
>  drivers/vhost/vhost.c | 145 ++++++++++++++++++++++++++----------------
>  drivers/vhost/vhost.h |   7 +-
>  2 files changed, 94 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index cfc11f9ed9c9..db2c81cb1e90 100644
> +++ b/drivers/vhost/vhost.c
> @@ -324,17 +324,16 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
>  
>  	spin_lock(&vq->mmu_lock);
>  	for (i = 0; i < VHOST_NUM_ADDRS; i++) {
> -		map[i] = rcu_dereference_protected(vq->maps[i],
> -				  lockdep_is_held(&vq->mmu_lock));
> +		map[i] = vq->maps[i];
>  		if (map[i]) {
>  			vhost_set_map_dirty(vq, map[i], i);
> -			rcu_assign_pointer(vq->maps[i], NULL);
> +			vq->maps[i] = NULL;
>  		}
>  	}
>  	spin_unlock(&vq->mmu_lock);
>  
> -	/* No need for synchronize_rcu() or kfree_rcu() since we are
> -	 * serialized with memory accessors (e.g vq mutex held).
> +	/* No need for synchronization since we are serialized with
> +	 * memory accessors (e.g vq mutex held).
>  	 */
>  
>  	for (i = 0; i < VHOST_NUM_ADDRS; i++)
> @@ -362,6 +361,44 @@ static bool vhost_map_range_overlap(struct vhost_uaddr *uaddr,
>  	return !(end < uaddr->uaddr || start > uaddr->uaddr - 1 + uaddr->size);
>  }
>  
> +static void inline vhost_vq_access_map_begin(struct vhost_virtqueue *vq)
> +{
> +	int ref = READ_ONCE(vq->ref);

Is a lock/single threaded supposed to be held for this?

> +
> +	smp_store_release(&vq->ref, ref + 1);
> +	/* Make sure ref counter is visible before accessing the map */
> +	smp_load_acquire(&vq->ref);

release/acquire semantics are intended to protect blocks of related
data, so reading something with acquire and throwing away the result
is nonsense.

> +}
> +
> +static void inline vhost_vq_access_map_end(struct vhost_virtqueue *vq)
> +{
> +	int ref = READ_ONCE(vq->ref);

If the write to vq->ref is not locked this algorithm won't work, if it
is locked the READ_ONCE is not needed.

> +	/* Make sure vq access is done before increasing ref counter */
> +	smp_store_release(&vq->ref, ref + 1);
> +}
> +
> +static void inline vhost_vq_sync_access(struct vhost_virtqueue *vq)
> +{
> +	int ref;
> +
> +	/* Make sure map change was done before checking ref counter */
> +	smp_mb();

This is probably smp_rmb after reading ref, and if you are setting ref
with smp_store_release then this should be smp_load_acquire() without
an explicit mb.

> +	ref = READ_ONCE(vq->ref);
> +	if (ref & 0x1) {
> +		/* When ref change, we are sure no reader can see
> +		 * previous map */
> +		while (READ_ONCE(vq->ref) == ref) {
> +			set_current_state(TASK_RUNNING);
> +			schedule();
> +		}
> +	}

This is basically read_seqcount_begin()' with a schedule instead of
cpu_relax


> +	/* Make sure ref counter was checked before any other
> +	 * operations that was dene on map. */
> +	smp_mb();

should be in a smp_load_acquire()

> +}
> +
>  static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
>  				      int index,
>  				      unsigned long start,
> @@ -376,16 +413,15 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
>  	spin_lock(&vq->mmu_lock);
>  	++vq->invalidate_count;
>  
> -	map = rcu_dereference_protected(vq->maps[index],
> -					lockdep_is_held(&vq->mmu_lock));
> +	map = vq->maps[index];
>  	if (map) {
>  		vhost_set_map_dirty(vq, map, index);
> -		rcu_assign_pointer(vq->maps[index], NULL);
> +		vq->maps[index] = NULL;
>  	}
>  	spin_unlock(&vq->mmu_lock);
>  
>  	if (map) {
> -		synchronize_rcu();
> +		vhost_vq_sync_access(vq);

What prevents racing with vhost_vq_access_map_end here?

>  		vhost_map_unprefetch(map);
>  	}
>  }

Overall I don't like it. 

We are trying to get rid of these botique mmu notifier patterns in
drivers. 

Jason

^ permalink raw reply

* [PATCH 2/2] cnic: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-07-31 12:22 UTC (permalink / raw)
  Cc: David S . Miller, netdev, linux-kernel, Chuhong Yuan

refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
 drivers/net/ethernet/broadcom/cnic.c    | 26 ++++++++++++-------------
 drivers/net/ethernet/broadcom/cnic_if.h |  6 +++---
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/cnic.c b/drivers/net/ethernet/broadcom/cnic.c
index 57dc3cbff36e..215777d63cda 100644
--- a/drivers/net/ethernet/broadcom/cnic.c
+++ b/drivers/net/ethernet/broadcom/cnic.c
@@ -141,22 +141,22 @@ static int cnic_uio_close(struct uio_info *uinfo, struct inode *inode)
 
 static inline void cnic_hold(struct cnic_dev *dev)
 {
-	atomic_inc(&dev->ref_count);
+	refcount_inc(&dev->ref_count);
 }
 
 static inline void cnic_put(struct cnic_dev *dev)
 {
-	atomic_dec(&dev->ref_count);
+	refcount_dec(&dev->ref_count);
 }
 
 static inline void csk_hold(struct cnic_sock *csk)
 {
-	atomic_inc(&csk->ref_count);
+	refcount_inc(&csk->ref_count);
 }
 
 static inline void csk_put(struct cnic_sock *csk)
 {
-	atomic_dec(&csk->ref_count);
+	refcount_dec(&csk->ref_count);
 }
 
 static struct cnic_dev *cnic_from_netdev(struct net_device *netdev)
@@ -177,12 +177,12 @@ static struct cnic_dev *cnic_from_netdev(struct net_device *netdev)
 
 static inline void ulp_get(struct cnic_ulp_ops *ulp_ops)
 {
-	atomic_inc(&ulp_ops->ref_count);
+	refcount_inc(&ulp_ops->ref_count);
 }
 
 static inline void ulp_put(struct cnic_ulp_ops *ulp_ops)
 {
-	atomic_dec(&ulp_ops->ref_count);
+	refcount_dec(&ulp_ops->ref_count);
 }
 
 static void cnic_ctx_wr(struct cnic_dev *dev, u32 cid_addr, u32 off, u32 val)
@@ -494,7 +494,7 @@ int cnic_register_driver(int ulp_type, struct cnic_ulp_ops *ulp_ops)
 	}
 	read_unlock(&cnic_dev_lock);
 
-	atomic_set(&ulp_ops->ref_count, 0);
+	refcount_set(&ulp_ops->ref_count, 0);
 	rcu_assign_pointer(cnic_ulp_tbl[ulp_type], ulp_ops);
 	mutex_unlock(&cnic_lock);
 
@@ -545,12 +545,12 @@ int cnic_unregister_driver(int ulp_type)
 
 	mutex_unlock(&cnic_lock);
 	synchronize_rcu();
-	while ((atomic_read(&ulp_ops->ref_count) != 0) && (i < 20)) {
+	while ((refcount_read(&ulp_ops->ref_count) != 0) && (i < 20)) {
 		msleep(100);
 		i++;
 	}
 
-	if (atomic_read(&ulp_ops->ref_count) != 0)
+	if (refcount_read(&ulp_ops->ref_count) != 0)
 		pr_warn("%s: Failed waiting for ref count to go to zero\n",
 			__func__);
 	return 0;
@@ -3596,7 +3596,7 @@ static int cnic_cm_create(struct cnic_dev *dev, int ulp_type, u32 cid,
 	}
 
 	csk1 = &cp->csk_tbl[l5_cid];
-	if (atomic_read(&csk1->ref_count))
+	if (refcount_read(&csk1->ref_count))
 		return -EAGAIN;
 
 	if (test_and_set_bit(SK_F_INUSE, &csk1->flags))
@@ -3651,7 +3651,7 @@ static int cnic_cm_destroy(struct cnic_sock *csk)
 	csk_hold(csk);
 	clear_bit(SK_F_INUSE, &csk->flags);
 	smp_mb__after_atomic();
-	while (atomic_read(&csk->ref_count) != 1)
+	while (refcount_read(&csk->ref_count) != 1)
 		msleep(1);
 	cnic_cm_cleanup(csk);
 
@@ -5432,11 +5432,11 @@ static void cnic_free_dev(struct cnic_dev *dev)
 {
 	int i = 0;
 
-	while ((atomic_read(&dev->ref_count) != 0) && i < 10) {
+	while ((refcount_read(&dev->ref_count) != 0) && i < 10) {
 		msleep(100);
 		i++;
 	}
-	if (atomic_read(&dev->ref_count) != 0)
+	if (refcount_read(&dev->ref_count) != 0)
 		netdev_err(dev->netdev, "Failed waiting for ref count to go to zero\n");
 
 	netdev_info(dev->netdev, "Removed CNIC device\n");
diff --git a/drivers/net/ethernet/broadcom/cnic_if.h b/drivers/net/ethernet/broadcom/cnic_if.h
index 789e5c7e9311..5232a05ac7ba 100644
--- a/drivers/net/ethernet/broadcom/cnic_if.h
+++ b/drivers/net/ethernet/broadcom/cnic_if.h
@@ -300,7 +300,7 @@ struct cnic_sock {
 #define SK_F_CLOSING		7
 #define SK_F_HW_ERR		8
 
-	atomic_t ref_count;
+	refcount_t ref_count;
 	u32 state;
 	struct kwqe kwqe1;
 	struct kwqe kwqe2;
@@ -335,7 +335,7 @@ struct cnic_dev {
 #define CNIC_F_CNIC_UP		1
 #define CNIC_F_BNX2_CLASS	3
 #define CNIC_F_BNX2X_CLASS	4
-	atomic_t	ref_count;
+	refcount_t	ref_count;
 	u8		mac_addr[ETH_ALEN];
 
 	int		max_iscsi_conn;
@@ -378,7 +378,7 @@ struct cnic_ulp_ops {
 				  char *data, u16 data_size);
 	int (*cnic_get_stats)(void *ulp_ctx);
 	struct module *owner;
-	atomic_t ref_count;
+	refcount_t ref_count;
 };
 
 int cnic_register_driver(int ulp_type, struct cnic_ulp_ops *ulp_ops);
-- 
2.20.1


^ permalink raw reply related

* [PATCH 1/2] bnxt_en: Use refcount_t for refcount
From: Chuhong Yuan @ 2019-07-31 12:22 UTC (permalink / raw)
  Cc: Michael Chan, David S . Miller, netdev, linux-kernel,
	Chuhong Yuan

refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 8 ++++----
 drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
index fc77caf0a076..eb7ed34639e2 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c
@@ -49,7 +49,7 @@ static int bnxt_register_dev(struct bnxt_en_dev *edev, int ulp_id,
 			return -ENOMEM;
 	}
 
-	atomic_set(&ulp->ref_count, 0);
+	refcount_set(&ulp->ref_count, 0);
 	ulp->handle = handle;
 	rcu_assign_pointer(ulp->ulp_ops, ulp_ops);
 
@@ -87,7 +87,7 @@ static int bnxt_unregister_dev(struct bnxt_en_dev *edev, int ulp_id)
 	synchronize_rcu();
 	ulp->max_async_event_id = 0;
 	ulp->async_events_bmap = NULL;
-	while (atomic_read(&ulp->ref_count) != 0 && i < 10) {
+	while (refcount_read(&ulp->ref_count) != 0 && i < 10) {
 		msleep(100);
 		i++;
 	}
@@ -246,12 +246,12 @@ static int bnxt_send_msg(struct bnxt_en_dev *edev, int ulp_id,
 
 static void bnxt_ulp_get(struct bnxt_ulp *ulp)
 {
-	atomic_inc(&ulp->ref_count);
+	refcount_inc(&ulp->ref_count);
 }
 
 static void bnxt_ulp_put(struct bnxt_ulp *ulp)
 {
-	atomic_dec(&ulp->ref_count);
+	refcount_dec(&ulp->ref_count);
 }
 
 void bnxt_ulp_stop(struct bnxt *bp)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
index cd78453d0bf0..fc4aa582d190 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.h
@@ -52,7 +52,7 @@ struct bnxt_ulp {
 	u16		max_async_event_id;
 	u16		msix_requested;
 	u16		msix_base;
-	atomic_t	ref_count;
+	refcount_t	ref_count;
 };
 
 struct bnxt_en_dev {
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH] net: sctp: Rename fallthrough label to unhandled
From: Neil Horman @ 2019-07-31 12:16 UTC (permalink / raw)
  To: Joe Perches
  Cc: Vlad Yasevich, Marcelo Ricardo Leitner, David S. Miller,
	linux-sctp, netdev, linux-kernel
In-Reply-To: <eac3fe457d553a2b366e1c1898d47ae8c048087c.camel@perches.com>

On Wed, Jul 31, 2019 at 04:32:43AM -0700, Joe Perches wrote:
> On Wed, 2019-07-31 at 07:19 -0400, Neil Horman wrote:
> > On Tue, Jul 30, 2019 at 10:04:37PM -0700, Joe Perches wrote:
> > > fallthrough may become a pseudo reserved keyword so this only use of
> > > fallthrough is better renamed to allow it.
> > > 
> > > Signed-off-by: Joe Perches <joe@perches.com>
> > Are you referring to the __attribute__((fallthrough)) statement that gcc
> > supports?  If so the compiler should by all rights be able to differentiate
> > between a null statement attribute and a explicit goto and label without the
> > need for renaming here.  Or are you referring to something else?
> 
> Hi.
> 
> I sent after this a patch that adds
> 
> # define fallthrough                    __attribute__((__fallthrough__))
> 
> https://lore.kernel.org/patchwork/patch/1108577/
> 
> So this rename is a prerequisite to adding this #define.
> 
why not just define __fallthrough instead, like we do for all the other
attributes we alias (i.e. __read_mostly, __protected_by, __unused, __exception,
etc)

Neil

> > > diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> []
> > > @@ -2152,7 +2152,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
> > >  	case SCTP_PARAM_SET_PRIMARY:
> > >  		if (net->sctp.addip_enable)
> > >  			break;
> > > -		goto fallthrough;
> > > +		goto unhandled;
> 
> etc...
> 
> 
> 

^ permalink raw reply

* [PATCH nf,v2] netfilter: nf_tables: map basechain priority to hardware priority
From: Pablo Neira Ayuso @ 2019-07-31 12:16 UTC (permalink / raw)
  To: netfilter-devel
  Cc: marcelo.leitner, wenxu, jiri, saeedm, gerlitz.or, paulb, netdev,
	jakub.kicinski

This patch adds initial support for offloading basechains using the
priority range from -8192 to 8191.

The software priority -8192 is mapped to the hardware priority
0xC000 + 1. tcf_auto_prio() uses 0xC000 if the user specifies no
priority, then it subtracts 1 for each new tcf_proto object. This patch
reserves the hardware priority range from 0xC000 to 0xFFFF for
netfilter.

The software to hardware priority mapping is not exposed to userspace,
so it should be possible to update this / extend the range of supported
priorities later on.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
v2: fix indent in priority assignment - Marcelo Leitner.
    Missing << 16 bitshift.

 include/net/netfilter/nf_tables_offload.h |  2 ++
 net/netfilter/nf_tables_api.c             |  4 ++++
 net/netfilter/nf_tables_offload.c         | 31 ++++++++++++++++++++++++++++---
 3 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/include/net/netfilter/nf_tables_offload.h b/include/net/netfilter/nf_tables_offload.h
index 3196663a10e3..2d497394021e 100644
--- a/include/net/netfilter/nf_tables_offload.h
+++ b/include/net/netfilter/nf_tables_offload.h
@@ -73,4 +73,6 @@ int nft_flow_rule_offload_commit(struct net *net);
 	(__reg)->key		= __key;				\
 	memset(&(__reg)->mask, 0xff, (__reg)->len);
 
+u16 nft_chain_offload_priority(struct nft_base_chain *basechain);
+
 #endif
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 605a7cfe7ca7..9cf0fecf5cb9 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1662,6 +1662,10 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
 
 		chain->flags |= NFT_BASE_CHAIN | flags;
 		basechain->policy = NF_ACCEPT;
+		if (chain->flags & NFT_CHAIN_HW_OFFLOAD &&
+		    !nft_chain_offload_priority(basechain))
+			return -EOPNOTSUPP;
+
 		flow_block_init(&basechain->flow_block);
 	} else {
 		chain = kzalloc(sizeof(*chain), GFP_KERNEL);
diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c
index 64f5fd5f240e..a79efd49c8a8 100644
--- a/net/netfilter/nf_tables_offload.c
+++ b/net/netfilter/nf_tables_offload.c
@@ -103,10 +103,11 @@ void nft_offload_update_dependency(struct nft_offload_ctx *ctx,
 }
 
 static void nft_flow_offload_common_init(struct flow_cls_common_offload *common,
-					 __be16 proto,
-					struct netlink_ext_ack *extack)
+					 __be16 proto, int priority,
+					 struct netlink_ext_ack *extack)
 {
 	common->protocol = proto;
+	common->prio = priority << 16;
 	common->extack = extack;
 }
 
@@ -124,6 +125,28 @@ static int nft_setup_cb_call(struct nft_base_chain *basechain,
 	return 0;
 }
 
+/* Available priorities for hardware offload range: -8192..8191 */
+#define NFT_BASECHAIN_OFFLOAD_PRIO_MAX		(SHRT_MAX / 4)
+#define NFT_BASECHAIN_OFFLOAD_PRIO_MIN		(SHRT_MIN / 4)
+#define NFT_BASECHAIN_OFFLOAD_PRIO_RANGE	(USHRT_MAX / 4)
+/* tcf_auto_prio() uses 0xC000 as base, then subtract one for each new chain. */
+#define NFT_BASECHAIN_OFFLOAD_HW_PRIO_BASE	(0xC000 + 1)
+
+u16 nft_chain_offload_priority(struct nft_base_chain *basechain)
+{
+	u16 prio;
+
+	if (basechain->ops.priority < NFT_BASECHAIN_OFFLOAD_PRIO_MIN ||
+	    basechain->ops.priority > NFT_BASECHAIN_OFFLOAD_PRIO_MAX)
+		return 0;
+
+	/* map netfilter chain priority to hardware priority. */
+	prio = basechain->ops.priority + NFT_BASECHAIN_OFFLOAD_PRIO_MAX +
+		NFT_BASECHAIN_OFFLOAD_HW_PRIO_BASE;
+
+	return prio;
+}
+
 static int nft_flow_offload_rule(struct nft_trans *trans,
 				 enum flow_cls_command command)
 {
@@ -142,7 +165,9 @@ static int nft_flow_offload_rule(struct nft_trans *trans,
 	if (flow)
 		proto = flow->proto;
 
-	nft_flow_offload_common_init(&cls_flow.common, proto, &extack);
+	nft_flow_offload_common_init(&cls_flow.common, proto,
+				     nft_chain_offload_priority(basechain),
+				     &extack);
 	cls_flow.command = command;
 	cls_flow.cookie = (unsigned long) rule;
 	if (flow)
-- 
2.11.0


^ permalink raw reply related

* [PATCH 4/8] netfilter: add include guard to xt_connlabel.h
From: Pablo Neira Ayuso @ 2019-07-31 11:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190731115157.27020-1-pablo@netfilter.org>

From: Masahiro Yamada <yamada.masahiro@socionext.com>

Add a header include guard just in case.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/xt_connlabel.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/uapi/linux/netfilter/xt_connlabel.h b/include/uapi/linux/netfilter/xt_connlabel.h
index 2312f0ec07b2..323f0dfc2a4e 100644
--- a/include/uapi/linux/netfilter/xt_connlabel.h
+++ b/include/uapi/linux/netfilter/xt_connlabel.h
@@ -1,4 +1,8 @@
 /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef _UAPI_XT_CONNLABEL_H
+#define _UAPI_XT_CONNLABEL_H
+
 #include <linux/types.h>
 
 #define XT_CONNLABEL_MAXBIT 127
@@ -11,3 +15,5 @@ struct xt_connlabel_mtinfo {
 	__u16 bit;
 	__u16 options;
 };
+
+#endif /* _UAPI_XT_CONNLABEL_H */
-- 
2.11.0


^ permalink raw reply related

* [PATCH 8/8] netfilter: ebtables: also count base chain policies
From: Pablo Neira Ayuso @ 2019-07-31 11:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190731115157.27020-1-pablo@netfilter.org>

From: Florian Westphal <fw@strlen.de>

ebtables doesn't include the base chain policies in the rule count,
so we need to add them manually when we call into the x_tables core
to allocate space for the comapt offset table.

This lead syzbot to trigger:
WARNING: CPU: 1 PID: 9012 at net/netfilter/x_tables.c:649
xt_compat_add_offset.cold+0x11/0x36 net/netfilter/x_tables.c:649

Reported-by: syzbot+276ddebab3382bbf72db@syzkaller.appspotmail.com
Fixes: 2035f3ff8eaa ("netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/bridge/netfilter/ebtables.c | 28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index fd84b48e48b5..c8177a89f52c 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1770,20 +1770,28 @@ static int compat_calc_entry(const struct ebt_entry *e,
 	return 0;
 }
 
+static int ebt_compat_init_offsets(unsigned int number)
+{
+	if (number > INT_MAX)
+		return -EINVAL;
+
+	/* also count the base chain policies */
+	number += NF_BR_NUMHOOKS;
+
+	return xt_compat_init_offsets(NFPROTO_BRIDGE, number);
+}
 
 static int compat_table_info(const struct ebt_table_info *info,
 			     struct compat_ebt_replace *newinfo)
 {
 	unsigned int size = info->entries_size;
 	const void *entries = info->entries;
+	int ret;
 
 	newinfo->entries_size = size;
-	if (info->nentries) {
-		int ret = xt_compat_init_offsets(NFPROTO_BRIDGE,
-						 info->nentries);
-		if (ret)
-			return ret;
-	}
+	ret = ebt_compat_init_offsets(info->nentries);
+	if (ret)
+		return ret;
 
 	return EBT_ENTRY_ITERATE(entries, size, compat_calc_entry, info,
 							entries, newinfo);
@@ -2234,11 +2242,9 @@ static int compat_do_replace(struct net *net, void __user *user,
 
 	xt_compat_lock(NFPROTO_BRIDGE);
 
-	if (tmp.nentries) {
-		ret = xt_compat_init_offsets(NFPROTO_BRIDGE, tmp.nentries);
-		if (ret < 0)
-			goto out_unlock;
-	}
+	ret = ebt_compat_init_offsets(tmp.nentries);
+	if (ret < 0)
+		goto out_unlock;
 
 	ret = compat_copy_entries(entries_tmp, tmp.entries_size, &state);
 	if (ret < 0)
-- 
2.11.0


^ permalink raw reply related

* [PATCH 6/8] netfilter: ipset: Copy the right MAC address in bitmap:ip,mac and hash:ip,mac sets
From: Pablo Neira Ayuso @ 2019-07-31 11:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190731115157.27020-1-pablo@netfilter.org>

From: Stefano Brivio <sbrivio@redhat.com>

In commit 8cc4ccf58379 ("ipset: Allow matching on destination MAC address
for mac and ipmac sets"), ipset.git commit 1543514c46a7, I added to the
KADT functions for sets matching on MAC addreses the copy of source or
destination MAC address depending on the configured match.

This was done correctly for hash:mac, but for hash:ip,mac and
bitmap:ip,mac, copying and pasting the same code block presents an
obvious problem: in these two set types, the MAC address is the second
dimension, not the first one, and we are actually selecting the MAC
address depending on whether the first dimension (IP address) specifies
source or destination.

Fix this by checking for the IPSET_DIM_TWO_SRC flag in option flags.

This way, mixing source and destination matches for the two dimensions
of ip,mac set types works as expected. With this setup:

  ip netns add A
  ip link add veth1 type veth peer name veth2 netns A
  ip addr add 192.0.2.1/24 dev veth1
  ip -net A addr add 192.0.2.2/24 dev veth2
  ip link set veth1 up
  ip -net A link set veth2 up

  dst=$(ip netns exec A cat /sys/class/net/veth2/address)

  ip netns exec A ipset create test_bitmap bitmap:ip,mac range 192.0.0.0/16
  ip netns exec A ipset add test_bitmap 192.0.2.1,${dst}
  ip netns exec A iptables -A INPUT -m set ! --match-set test_bitmap src,dst -j DROP

  ip netns exec A ipset create test_hash hash:ip,mac
  ip netns exec A ipset add test_hash 192.0.2.1,${dst}
  ip netns exec A iptables -A INPUT -m set ! --match-set test_hash src,dst -j DROP

ipset correctly matches a test packet:

  # ping -c1 192.0.2.2 >/dev/null
  # echo $?
  0

Reported-by: Chen Yi <yiche@redhat.com>
Fixes: 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
---
 net/netfilter/ipset/ip_set_bitmap_ipmac.c | 2 +-
 net/netfilter/ipset/ip_set_hash_ipmac.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_bitmap_ipmac.c b/net/netfilter/ipset/ip_set_bitmap_ipmac.c
index ca7ac4a25ada..1d4e63326e68 100644
--- a/net/netfilter/ipset/ip_set_bitmap_ipmac.c
+++ b/net/netfilter/ipset/ip_set_bitmap_ipmac.c
@@ -226,7 +226,7 @@ bitmap_ipmac_kadt(struct ip_set *set, const struct sk_buff *skb,

 	e.id = ip_to_id(map, ip);

-	if (opt->flags & IPSET_DIM_ONE_SRC)
+	if (opt->flags & IPSET_DIM_TWO_SRC)
 		ether_addr_copy(e.ether, eth_hdr(skb)->h_source);
 	else
 		ether_addr_copy(e.ether, eth_hdr(skb)->h_dest);
diff --git a/net/netfilter/ipset/ip_set_hash_ipmac.c b/net/netfilter/ipset/ip_set_hash_ipmac.c
index eb1443408320..24d8f4df4230 100644
--- a/net/netfilter/ipset/ip_set_hash_ipmac.c
+++ b/net/netfilter/ipset/ip_set_hash_ipmac.c
@@ -93,7 +93,7 @@ hash_ipmac4_kadt(struct ip_set *set, const struct sk_buff *skb,
 	    (skb_mac_header(skb) + ETH_HLEN) > skb->data)
 		return -EINVAL;

-	if (opt->flags & IPSET_DIM_ONE_SRC)
+	if (opt->flags & IPSET_DIM_TWO_SRC)
 		ether_addr_copy(e.ether, eth_hdr(skb)->h_source);
 	else
 		ether_addr_copy(e.ether, eth_hdr(skb)->h_dest);
-- 
2.11.0

^ permalink raw reply related

* [PATCH 5/8] netfilter: ipset: Actually allow destination MAC address for hash:ip,mac sets too
From: Pablo Neira Ayuso @ 2019-07-31 11:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190731115157.27020-1-pablo@netfilter.org>

From: Stefano Brivio <sbrivio@redhat.com>

In commit 8cc4ccf58379 ("ipset: Allow matching on destination MAC address
for mac and ipmac sets"), ipset.git commit 1543514c46a7, I removed the
KADT check that prevents matching on destination MAC addresses for
hash:mac sets, but forgot to remove the same check for hash:ip,mac set.

Drop this check: functionality is now commented in man pages and there's
no reason to restrict to source MAC address matching anymore.

Reported-by: Chen Yi <yiche@redhat.com>
Fixes: 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
---
 net/netfilter/ipset/ip_set_hash_ipmac.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_ipmac.c b/net/netfilter/ipset/ip_set_hash_ipmac.c
index faf59b6a998f..eb1443408320 100644
--- a/net/netfilter/ipset/ip_set_hash_ipmac.c
+++ b/net/netfilter/ipset/ip_set_hash_ipmac.c
@@ -89,10 +89,6 @@ hash_ipmac4_kadt(struct ip_set *set, const struct sk_buff *skb,
 	struct hash_ipmac4_elem e = { .ip = 0, { .foo[0] = 0, .foo[1] = 0 } };
 	struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set);
 
-	 /* MAC can be src only */
-	if (!(opt->flags & IPSET_DIM_TWO_SRC))
-		return 0;
-
 	if (skb_mac_header(skb) < skb->head ||
 	    (skb_mac_header(skb) + ETH_HLEN) > skb->data)
 		return -EINVAL;
-- 
2.11.0


^ permalink raw reply related

* [PATCH 7/8] netfilter: ipset: Fix rename concurrency with listing
From: Pablo Neira Ayuso @ 2019-07-31 11:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190731115157.27020-1-pablo@netfilter.org>

From: Jozsef Kadlecsik <kadlec@netfilter.org>

Shijie Luo reported that when stress-testing ipset with multiple concurrent
create, rename, flush, list, destroy commands, it can result

ipset <version>: Broken LIST kernel message: missing DATA part!

error messages and broken list results. The problem was the rename operation
was not properly handled with respect of listing. The patch fixes the issue.

Reported-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
---
 net/netfilter/ipset/ip_set_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 2e151856ad99..e64d5f9a89dd 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1161,7 +1161,7 @@ static int ip_set_rename(struct net *net, struct sock *ctnl,
 		return -ENOENT;
 
 	write_lock_bh(&ip_set_ref_lock);
-	if (set->ref != 0) {
+	if (set->ref != 0 || set->ref_netlink != 0) {
 		ret = -IPSET_ERR_REFERENCED;
 		goto out;
 	}
-- 
2.11.0


^ permalink raw reply related

* [PATCH 2/8] netfilter: nf_tables: Make nft_meta expression more robust
From: Pablo Neira Ayuso @ 2019-07-31 11:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190731115157.27020-1-pablo@netfilter.org>

From: Phil Sutter <phil@nwl.cc>

nft_meta_get_eval()'s tendency to bail out setting NFT_BREAK verdict in
situations where required data is missing leads to unexpected behaviour
with inverted checks like so:

| meta iifname != eth0 accept

This rule will never match if there is no input interface (or it is not
known) which is not intuitive and, what's worse, breaks consistency of
iptables-nft with iptables-legacy.

Fix this by falling back to placing a value in dreg which never matches
(avoiding accidental matches), i.e. zero for interface index and an
empty string for interface name.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/bridge/netfilter/nft_meta_bridge.c |  6 +-----
 net/netfilter/nft_meta.c               | 16 ++++------------
 2 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/net/bridge/netfilter/nft_meta_bridge.c b/net/bridge/netfilter/nft_meta_bridge.c
index bed66f536b34..a98dec2cf0cf 100644
--- a/net/bridge/netfilter/nft_meta_bridge.c
+++ b/net/bridge/netfilter/nft_meta_bridge.c
@@ -30,13 +30,9 @@ static void nft_meta_bridge_get_eval(const struct nft_expr *expr,
 	switch (priv->key) {
 	case NFT_META_BRI_IIFNAME:
 		br_dev = nft_meta_get_bridge(in);
-		if (!br_dev)
-			goto err;
 		break;
 	case NFT_META_BRI_OIFNAME:
 		br_dev = nft_meta_get_bridge(out);
-		if (!br_dev)
-			goto err;
 		break;
 	case NFT_META_BRI_IIFPVID: {
 		u16 p_pvid;
@@ -64,7 +60,7 @@ static void nft_meta_bridge_get_eval(const struct nft_expr *expr,
 		goto out;
 	}
 
-	strncpy((char *)dest, br_dev->name, IFNAMSIZ);
+	strncpy((char *)dest, br_dev ? br_dev->name : "", IFNAMSIZ);
 	return;
 out:
 	return nft_meta_get_eval(expr, regs, pkt);
diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index f1b1d948c07b..f69afb9ff3cb 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -60,24 +60,16 @@ void nft_meta_get_eval(const struct nft_expr *expr,
 		*dest = skb->mark;
 		break;
 	case NFT_META_IIF:
-		if (in == NULL)
-			goto err;
-		*dest = in->ifindex;
+		*dest = in ? in->ifindex : 0;
 		break;
 	case NFT_META_OIF:
-		if (out == NULL)
-			goto err;
-		*dest = out->ifindex;
+		*dest = out ? out->ifindex : 0;
 		break;
 	case NFT_META_IIFNAME:
-		if (in == NULL)
-			goto err;
-		strncpy((char *)dest, in->name, IFNAMSIZ);
+		strncpy((char *)dest, in ? in->name : "", IFNAMSIZ);
 		break;
 	case NFT_META_OIFNAME:
-		if (out == NULL)
-			goto err;
-		strncpy((char *)dest, out->name, IFNAMSIZ);
+		strncpy((char *)dest, out ? out->name : "", IFNAMSIZ);
 		break;
 	case NFT_META_IIFTYPE:
 		if (in == NULL)
-- 
2.11.0


^ permalink raw reply related

* [PATCH 3/8] netfilter: nft_meta_bridge: Eliminate 'out' label
From: Pablo Neira Ayuso @ 2019-07-31 11:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190731115157.27020-1-pablo@netfilter.org>

From: Phil Sutter <phil@nwl.cc>

The label is used just once and the code it points at is not reused, no
point in keeping it.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/bridge/netfilter/nft_meta_bridge.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/bridge/netfilter/nft_meta_bridge.c b/net/bridge/netfilter/nft_meta_bridge.c
index a98dec2cf0cf..1804e867f715 100644
--- a/net/bridge/netfilter/nft_meta_bridge.c
+++ b/net/bridge/netfilter/nft_meta_bridge.c
@@ -57,13 +57,11 @@ static void nft_meta_bridge_get_eval(const struct nft_expr *expr,
 		return;
 	}
 	default:
-		goto out;
+		return nft_meta_get_eval(expr, regs, pkt);
 	}
 
 	strncpy((char *)dest, br_dev ? br_dev->name : "", IFNAMSIZ);
 	return;
-out:
-	return nft_meta_get_eval(expr, regs, pkt);
 err:
 	regs->verdict.code = NFT_BREAK;
 }
-- 
2.11.0


^ permalink raw reply related

* [PATCH 1/8] netfilter: ebtables: fix a memory leak bug in compat
From: Pablo Neira Ayuso @ 2019-07-31 11:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190731115157.27020-1-pablo@netfilter.org>

From: Wenwen Wang <wenwen@cs.uga.edu>

In compat_do_replace(), a temporary buffer is allocated through vmalloc()
to hold entries copied from the user space. The buffer address is firstly
saved to 'newinfo->entries', and later on assigned to 'entries_tmp'. Then
the entries in this temporary buffer is copied to the internal kernel
structure through compat_copy_entries(). If this copy process fails,
compat_do_replace() should be terminated. However, the allocated temporary
buffer is not freed on this path, leading to a memory leak.

To fix the bug, free the buffer before returning from compat_do_replace().

Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/bridge/netfilter/ebtables.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 963dfdc14827..fd84b48e48b5 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -2261,8 +2261,10 @@ static int compat_do_replace(struct net *net, void __user *user,
 	state.buf_kern_len = size64;

 	ret = compat_copy_entries(entries_tmp, tmp.entries_size, &state);
-	if (WARN_ON(ret < 0))
+	if (WARN_ON(ret < 0)) {
+		vfree(entries_tmp);
 		goto out_unlock;
+	}

 	vfree(entries_tmp);
 	tmp.entries_size = size64;
-- 
2.11.0

^ permalink raw reply related

* [PATCH 0/8] netfilter fixes for net
From: Pablo Neira Ayuso @ 2019-07-31 11:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

Hi,

The following patchset contains Netfilter fixes for your net tree:

1) memleak in ebtables from the error path for the 32/64 compat layer,
   from Florian Westphal.

2) Fix inverted meta ifname/ifidx matching when no interface is set
   on either from the input/output path, from Phil Sutter.

3) Remove goto label in nft_meta_bridge, also from Phil.

4) Missing include guard in xt_connlabel, from Masahiro Yamada.

5) Two patch to fix ipset destination MAC matching coming from
   Stephano Brivio, via Jozsef Kadlecsik.

6) Fix set rename and listing concurrency problem, from Shijie Luo.
   Patch also coming via Jozsef Kadlecsik.

7) ebtables 32/64 compat missing base chain policy in rule count,
   from Florian Westphal.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!

----------------------------------------------------------------

The following changes since commit 0cea0e1148fe134a4a3aaf0b1496f09241fb943a:

  net: phy: sfp: hwmon: Fix scaling of RX power (2019-07-21 11:51:50 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 7cdc4412284777c76c919e2ab33b3b8dbed18559:

  Merge branch 'master' of git://blackhole.kfki.hu/nf (2019-07-30 13:39:20 +0200)

----------------------------------------------------------------
Florian Westphal (1):
      netfilter: ebtables: also count base chain policies

Jozsef Kadlecsik (1):
      netfilter: ipset: Fix rename concurrency with listing

Masahiro Yamada (1):
      netfilter: add include guard to xt_connlabel.h

Pablo Neira Ayuso (1):
      Merge branch 'master' of git://blackhole.kfki.hu/nf

Phil Sutter (2):
      netfilter: nf_tables: Make nft_meta expression more robust
      netfilter: nft_meta_bridge: Eliminate 'out' label

Stefano Brivio (2):
      netfilter: ipset: Actually allow destination MAC address for hash:ip,mac sets too
      netfilter: ipset: Copy the right MAC address in bitmap:ip,mac and hash:ip,mac sets

Wenwen Wang (1):
      netfilter: ebtables: fix a memory leak bug in compat

 include/uapi/linux/netfilter/xt_connlabel.h |  6 ++++++
 net/bridge/netfilter/ebtables.c             | 32 ++++++++++++++++++-----------
 net/bridge/netfilter/nft_meta_bridge.c      | 10 ++-------
 net/netfilter/ipset/ip_set_bitmap_ipmac.c   |  2 +-
 net/netfilter/ipset/ip_set_core.c           |  2 +-
 net/netfilter/ipset/ip_set_hash_ipmac.c     |  6 +-----
 net/netfilter/nft_meta.c                    | 16 ++++-----------
 7 files changed, 35 insertions(+), 39 deletions(-)

^ permalink raw reply

* Re: [PATCH net] drop_monitor: Add missing uAPI file to MAINTAINERS file
From: Neil Horman @ 2019-07-31 11:36 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, davem, Ido Schimmel
In-Reply-To: <20190731063819.10001-1-idosch@idosch.org>

On Wed, Jul 31, 2019 at 09:38:19AM +0300, Ido Schimmel wrote:
> From: Ido Schimmel <idosch@mellanox.com>
> 
> Fixes: 6e43650cee64 ("add maintainer for network drop monitor kernel service")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 9f5b8bd4faf9..b540794cbd91 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11137,6 +11137,7 @@ L:	netdev@vger.kernel.org
>  S:	Maintained
>  W:	https://fedorahosted.org/dropwatch/
>  F:	net/core/drop_monitor.c
> +F:	include/uapi/linux/net_dropmon.h
>  
>  NETWORKING DRIVERS
>  M:	"David S. Miller" <davem@davemloft.net>
> -- 
> 2.21.0
> 
> 
Acked-by: Neil Horman <nhorman@tuxdriver.com>


^ permalink raw reply

* Re: next/master build: 221 builds: 11 failed, 210 passed, 13 errors, 1174 warnings (next-20190731)
From: Greg Kroah-Hartman @ 2019-07-31 11:35 UTC (permalink / raw)
  To: Mark Brown
  Cc: Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Matthew Wilcox (Oracle), devel, netdev, linux-next,
	linux-arm-kernel, kernel-build-reports
In-Reply-To: <20190731112441.GB4369@sirena.org.uk>

On Wed, Jul 31, 2019 at 12:24:41PM +0100, Mark Brown wrote:
> On Wed, Jul 31, 2019 at 04:07:41AM -0700, kernelci.org bot wrote:
> 
> Today's -next fails to build an ARM allmodconfig due to:
> 
> > allmodconfig (arm, gcc-8) — FAIL, 1 error, 40 warnings, 0 section mismatches
> > 
> > Errors:
> >     drivers/net/phy/mdio-cavium.h:111:36: error: implicit declaration of function 'writeq'; did you mean 'writel'? [-Werror=implicit-function-declaration]
> 
> as a result of the changes that introduced:
> 
> WARNING: unmet direct dependencies detected for MDIO_OCTEON
>   Depends on [n]: NETDEVICES [=y] && MDIO_DEVICE [=m] && MDIO_BUS [=m] && 64BIT && HAS_IOMEM [=y] && OF_MDIO [=m]
>   Selected by [m]:
>   - OCTEON_ETHERNET [=m] && STAGING [=y] && (CAVIUM_OCTEON_SOC && NETDEVICES [=y] || COMPILE_TEST [=y])
> 
> which is triggered by the staging OCTEON_ETHERNET driver which misses a
> 64BIT dependency but added COMPILE_TEST in 171a9bae68c72f2
> (staging/octeon: Allow test build on !MIPS).

A patch was posted for this, but it needs to go through the netdev tree
as that's where the offending patches are coming from.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH] net: sctp: Rename fallthrough label to unhandled
From: Joe Perches @ 2019-07-31 11:32 UTC (permalink / raw)
  To: Neil Horman
  Cc: Vlad Yasevich, Marcelo Ricardo Leitner, David S. Miller,
	linux-sctp, netdev, linux-kernel
In-Reply-To: <20190731111932.GA9823@hmswarspite.think-freely.org>

On Wed, 2019-07-31 at 07:19 -0400, Neil Horman wrote:
> On Tue, Jul 30, 2019 at 10:04:37PM -0700, Joe Perches wrote:
> > fallthrough may become a pseudo reserved keyword so this only use of
> > fallthrough is better renamed to allow it.
> > 
> > Signed-off-by: Joe Perches <joe@perches.com>
> Are you referring to the __attribute__((fallthrough)) statement that gcc
> supports?  If so the compiler should by all rights be able to differentiate
> between a null statement attribute and a explicit goto and label without the
> need for renaming here.  Or are you referring to something else?

Hi.

I sent after this a patch that adds

# define fallthrough                    __attribute__((__fallthrough__))

https://lore.kernel.org/patchwork/patch/1108577/

So this rename is a prerequisite to adding this #define.

> > diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
[]
> > @@ -2152,7 +2152,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
> >  	case SCTP_PARAM_SET_PRIMARY:
> >  		if (net->sctp.addip_enable)
> >  			break;
> > -		goto fallthrough;
> > +		goto unhandled;

etc...



^ permalink raw reply

* Re: next/master build: 221 builds: 11 failed, 210 passed, 13 errors, 1174 warnings (next-20190731)
From: Mark Brown @ 2019-07-31 11:24 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, Heiner Kallweit,
	Greg Kroah-Hartman, Matthew Wilcox (Oracle)
  Cc: kernel-build-reports, linux-arm-kernel, netdev, devel, linux-next
In-Reply-To: <5d41767d.1c69fb81.d6304.4c8c@mx.google.com>

[-- Attachment #1: Type: text/plain, Size: 915 bytes --]

On Wed, Jul 31, 2019 at 04:07:41AM -0700, kernelci.org bot wrote:

Today's -next fails to build an ARM allmodconfig due to:

> allmodconfig (arm, gcc-8) — FAIL, 1 error, 40 warnings, 0 section mismatches
> 
> Errors:
>     drivers/net/phy/mdio-cavium.h:111:36: error: implicit declaration of function 'writeq'; did you mean 'writel'? [-Werror=implicit-function-declaration]

as a result of the changes that introduced:

WARNING: unmet direct dependencies detected for MDIO_OCTEON
  Depends on [n]: NETDEVICES [=y] && MDIO_DEVICE [=m] && MDIO_BUS [=m] && 64BIT && HAS_IOMEM [=y] && OF_MDIO [=m]
  Selected by [m]:
  - OCTEON_ETHERNET [=m] && STAGING [=y] && (CAVIUM_OCTEON_SOC && NETDEVICES [=y] || COMPILE_TEST [=y])

which is triggered by the staging OCTEON_ETHERNET driver which misses a
64BIT dependency but added COMPILE_TEST in 171a9bae68c72f2
(staging/octeon: Allow test build on !MIPS).

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH] net: sctp: Rename fallthrough label to unhandled
From: Neil Horman @ 2019-07-31 11:19 UTC (permalink / raw)
  To: Joe Perches
  Cc: Vlad Yasevich, Marcelo Ricardo Leitner, David S. Miller,
	linux-sctp, netdev, linux-kernel
In-Reply-To: <e0dd3af448e38e342c1ac6e7c0c802696eb77fd6.1564549413.git.joe@perches.com>

On Tue, Jul 30, 2019 at 10:04:37PM -0700, Joe Perches wrote:
> fallthrough may become a pseudo reserved keyword so this only use of
> fallthrough is better renamed to allow it.
> 
> Signed-off-by: Joe Perches <joe@perches.com>
Are you referring to the __attribute__((fallthrough)) statement that gcc
supports?  If so the compiler should by all rights be able to differentiate
between a null statement attribute and a explicit goto and label without the
need for renaming here.  Or are you referring to something else?

Neil

> ---
>  net/sctp/sm_make_chunk.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index 36bd8a6e82df..3fdcaa2fbf12 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -2152,7 +2152,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
>  	case SCTP_PARAM_SET_PRIMARY:
>  		if (net->sctp.addip_enable)
>  			break;
> -		goto fallthrough;
> +		goto unhandled;
>  
>  	case SCTP_PARAM_HOST_NAME_ADDRESS:
>  		/* Tell the peer, we won't support this param.  */
> @@ -2163,11 +2163,11 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
>  	case SCTP_PARAM_FWD_TSN_SUPPORT:
>  		if (ep->prsctp_enable)
>  			break;
> -		goto fallthrough;
> +		goto unhandled;
>  
>  	case SCTP_PARAM_RANDOM:
>  		if (!ep->auth_enable)
> -			goto fallthrough;
> +			goto unhandled;
>  
>  		/* SCTP-AUTH: Secion 6.1
>  		 * If the random number is not 32 byte long the association
> @@ -2184,7 +2184,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
>  
>  	case SCTP_PARAM_CHUNKS:
>  		if (!ep->auth_enable)
> -			goto fallthrough;
> +			goto unhandled;
>  
>  		/* SCTP-AUTH: Section 3.2
>  		 * The CHUNKS parameter MUST be included once in the INIT or
> @@ -2200,7 +2200,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
>  
>  	case SCTP_PARAM_HMAC_ALGO:
>  		if (!ep->auth_enable)
> -			goto fallthrough;
> +			goto unhandled;
>  
>  		hmacs = (struct sctp_hmac_algo_param *)param.p;
>  		n_elt = (ntohs(param.p->length) -
> @@ -2223,7 +2223,7 @@ static enum sctp_ierror sctp_verify_param(struct net *net,
>  			retval = SCTP_IERROR_ABORT;
>  		}
>  		break;
> -fallthrough:
> +unhandled:
>  	default:
>  		pr_debug("%s: unrecognized param:%d for chunk:%d\n",
>  			 __func__, ntohs(param.p->type), cid);
> -- 
> 2.15.0
> 
> 

^ permalink raw reply

* Re: [PATCH net-next] be2net: disable bh with spin_lock in be_process_mcc
From: Denis Kirjanov @ 2019-07-31 10:54 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: sathya.perla, ajit.khaparde, sriharsha.basavapatna,
	Network Development, Denis Kirjanov
In-Reply-To: <CA+FuTSfnqV4zGvW+W0fh+=X-wm8rz1O5ZqGKXpxSVN1vPMD+sw@mail.gmail.com>

On 7/30/19, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> On Tue, Jul 30, 2019 at 7:33 AM Denis Kirjanov <kda@linux-powerpc.org>
> wrote:
>>
>> Signed-off-by: Denis Kirjanov <kdav@linux-powerpc.org>
>
> This is a partial revert of the previous change to these lines in 2012
> in commit 072a9c486004 ("netpoll: revert 6bdb7fe3104 and fix be_poll()
> instead").
>
> The commit message is empty. Can you give some context as to why this
> is needed and correct?

The idea was just to make some cleanup. Now I've checked that be_process_mcc
is invoked in 3 different places and always with BHs disabled except
the be_poll function but since it's invoked from softirq with BHs
disabled it won't hurt.


>

^ permalink raw reply

* [PATCH net-next 2/2] selftests: mlxsw: Add a test for leftover DSCP rule
From: Petr Machata @ 2019-07-31 10:30 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: Petr Machata, Ido Schimmel
In-Reply-To: <cover.1564568595.git.petrm@mellanox.com>

Commit dedfde2fe1c4 ("mlxsw: spectrum_dcb: Configure DSCP map as the last
rule is removed") fixed a problem in mlxsw where last DSCP rule to be
removed remained in effect when DSCP rewrite was applied.

Add a selftest that covers this problem.

Signed-off-by: Petr Machata <petrm@mellanox.com>
---
 .../drivers/net/mlxsw/qos_dscp_router.sh      | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_router.sh b/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_router.sh
index f25e3229e1cc..c745ce3befee 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_router.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_router.sh
@@ -31,6 +31,7 @@ ALL_TESTS="
 	ping_ipv4
 	test_update
 	test_no_update
+	test_dscp_leftover
 "
 
 lib_dir=$(dirname $0)/../../../net/forwarding
@@ -50,6 +51,11 @@ reprioritize()
 	echo ${reprio[$in]}
 }
 
+zero()
+{
+    echo 0
+}
+
 h1_create()
 {
 	simple_if_init $h1 192.0.2.1/28
@@ -225,6 +231,19 @@ test_no_update()
 	__test_update 0 echo
 }
 
+# Test that when the last APP rule is removed, the prio->DSCP map is properly
+# set to zeroes, and that the last APP rule does not stay active in the ASIC.
+test_dscp_leftover()
+{
+	lldptool -T -i $swp2 -V APP -d $(dscp_map 0) >/dev/null
+	lldpad_app_wait_del
+
+	__test_update 0 zero
+
+	lldptool -T -i $swp2 -V APP $(dscp_map 0) >/dev/null
+	lldpad_app_wait_set $swp2
+}
+
 trap cleanup EXIT
 
 setup_prepare
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next 1/2] selftests: mlxsw: Fix local variable declarations in DSCP tests
From: Petr Machata @ 2019-07-31 10:30 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: Petr Machata, Ido Schimmel
In-Reply-To: <cover.1564568595.git.petrm@mellanox.com>

These two tests have some problems in the global scope pollution and on
contrary, contain unnecessary local declarations. Fix them.

Signed-off-by: Petr Machata <petrm@mellanox.com>
---
 .../testing/selftests/drivers/net/mlxsw/qos_dscp_bridge.sh  | 6 ++++--
 .../testing/selftests/drivers/net/mlxsw/qos_dscp_router.sh  | 5 +++--
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_bridge.sh b/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_bridge.sh
index 40f16f2a3afd..5cbff8038f84 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_bridge.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_bridge.sh
@@ -36,8 +36,6 @@ source $lib_dir/lib.sh
 
 h1_create()
 {
-	local dscp;
-
 	simple_if_init $h1 192.0.2.1/28
 	tc qdisc add dev $h1 clsact
 	dscp_capture_install $h1 10
@@ -67,6 +65,7 @@ h2_destroy()
 dscp_map()
 {
 	local base=$1; shift
+	local prio
 
 	for prio in {0..7}; do
 		echo app=$prio,5,$((base + prio))
@@ -138,6 +137,7 @@ dscp_ping_test()
 	local prio=$1; shift
 	local dev_10=$1; shift
 	local dev_20=$1; shift
+	local key
 
 	local dscp_10=$(((prio + 10) << 2))
 	local dscp_20=$(((prio + 20) << 2))
@@ -175,6 +175,8 @@ dscp_ping_test()
 
 test_dscp()
 {
+	local prio
+
 	for prio in {0..7}; do
 		dscp_ping_test v$h1 192.0.2.1 192.0.2.2 $prio $h1 $h2
 	done
diff --git a/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_router.sh b/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_router.sh
index 9faf02e32627..f25e3229e1cc 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_router.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/qos_dscp_router.sh
@@ -52,8 +52,6 @@ reprioritize()
 
 h1_create()
 {
-	local dscp;
-
 	simple_if_init $h1 192.0.2.1/28
 	tc qdisc add dev $h1 clsact
 	dscp_capture_install $h1 0
@@ -87,6 +85,7 @@ h2_destroy()
 dscp_map()
 {
 	local base=$1; shift
+	local prio
 
 	for prio in {0..7}; do
 		echo app=$prio,5,$((base + prio))
@@ -156,6 +155,7 @@ dscp_ping_test()
 	local reprio=$1; shift
 	local dev1=$1; shift
 	local dev2=$1; shift
+	local i
 
 	local prio2=$($reprio $prio)   # ICMP Request egress prio
 	local prio3=$($reprio $prio2)  # ICMP Response egress prio
@@ -205,6 +205,7 @@ __test_update()
 {
 	local update=$1; shift
 	local reprio=$1; shift
+	local prio
 
 	sysctl_restore net.ipv4.ip_forward_update_priority
 	sysctl_set net.ipv4.ip_forward_update_priority $update
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next 0/2] mlxsw: Test coverage for DSCP leftover fix
From: Petr Machata @ 2019-07-31 10:30 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: Petr Machata, Ido Schimmel

This patch set fixes some global scope pollution issues in the DSCP tests
(in patch #1), and then proceeds (in patch #2) to add a new test for
checking whether, after DSCP prioritization rules are removed from a port,
DSCP rewrites consistently to zero, instead of the last removed rule still
staying in effect.

Petr Machata (2):
  selftests: mlxsw: Fix local variable declarations in DSCP tests
  selftests: mlxsw: Add a test for leftover DSCP rule

 .../drivers/net/mlxsw/qos_dscp_bridge.sh      |  6 +++--
 .../drivers/net/mlxsw/qos_dscp_router.sh      | 24 +++++++++++++++++--
 2 files changed, 26 insertions(+), 4 deletions(-)

-- 
2.20.1

^ permalink raw reply

* Re: [PATCH 1/2] tools: bpftool: add net load command to load XDP on interface
From: Jesper Dangaard Brouer @ 2019-07-31 10:08 UTC (permalink / raw)
  To: Daniel T. Lee
  Cc: brouer, Daniel Borkmann, Alexei Starovoitov, netdev,
	Quentin Monnet
In-Reply-To: <20190730184821.10833-2-danieltimlee@gmail.com>

On Wed, 31 Jul 2019 03:48:20 +0900
"Daniel T. Lee" <danieltimlee@gmail.com> wrote:

> By this commit, using `bpftool net load`, user can load XDP prog on
> interface. New type of enum 'net_load_type' has been made, as stated at
> cover-letter, the meaning of 'load' is, prog will be loaded on interface.

Why the keyword "load" ?
Why not "attach" (and "detach")?

For BPF there is a clear distinction between the "load" and "attach"
steps.  I know this is under subcommand "net", but to follow the
conversion of other subcommands e.g. "prog" there are both "load" and
"attach" commands.


> BPF prog will be loaded through libbpf 'bpf_set_link_xdp_fd'.

Again this is a "set" operation, not a "load" operation.

> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>

[...]
>  static int do_show(int argc, char **argv)
>  {
>  	struct bpf_attach_info attach_info = {};
> @@ -305,13 +405,17 @@ static int do_help(int argc, char **argv)
>  
>  	fprintf(stderr,
>  		"Usage: %s %s { show | list } [dev <devname>]\n"
> +		"       %s %s load PROG LOAD_TYPE <devname>\n"

The "PROG" here does it correspond to the 'bpftool prog' syntax?:

 PROG := { id PROG_ID | pinned FILE | tag PROG_TAG }

>  		"       %s %s help\n"
> +		"\n"
> +		"       " HELP_SPEC_PROGRAM "\n"
> +		"       LOAD_TYPE := { xdp | xdpgeneric | xdpdrv | xdpoffload }\n"
>  		"Note: Only xdp and tc attachments are supported now.\n"
>  		"      For progs attached to cgroups, use \"bpftool cgroup\"\n"
>  		"      to dump program attachments. For program types\n"
>  		"      sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
>  		"      consult iproute2.\n",


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox