Netdev List

Netdev List
 help / color / mirror / Atom feed

* RE: [PATCH v5 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions
From: Michal Kalderon @ 2019-07-09 10:29 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Ariel Elior, jgg@ziepe.ca, dledford@redhat.com,
	galpress@amazon.com, linux-rdma@vger.kernel.org,
	davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <20190709070244.GH7034@mtr-leonro.mtl.com>

> From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> owner@vger.kernel.org> On Behalf Of Leon Romanovsky
> 
> On Mon, Jul 08, 2019 at 12:14:58PM +0300, Michal Kalderon wrote:
> > Create some common API's for adding entries to a xa_mmap.
> > Searching for an entry and freeing one.
> >
> > The code was copied from the efa driver almost as is, just renamed
> > function to be generic and not efa specific.
> >
> > Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
> > ---
> >  drivers/infiniband/core/device.c      |   1 +
> >  drivers/infiniband/core/rdma_core.c   |   1 +
> >  drivers/infiniband/core/uverbs_cmd.c  |   1 +
> >  drivers/infiniband/core/uverbs_main.c | 105
> ++++++++++++++++++++++++++++++++++
> >  include/rdma/ib_verbs.h               |  32 +++++++++++
> >  5 files changed, 140 insertions(+)
> >
> > diff --git a/drivers/infiniband/core/device.c
> > b/drivers/infiniband/core/device.c
> > index 8a6ccb936dfe..a830c2c5d691 100644
> > --- a/drivers/infiniband/core/device.c
> > +++ b/drivers/infiniband/core/device.c
> > @@ -2521,6 +2521,7 @@ void ib_set_device_ops(struct ib_device *dev,
> const struct ib_device_ops *ops)
> >  	SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
> >  	SET_DEVICE_OP(dev_ops, map_phys_fmr);
> >  	SET_DEVICE_OP(dev_ops, mmap);
> > +	SET_DEVICE_OP(dev_ops, mmap_free);
> >  	SET_DEVICE_OP(dev_ops, modify_ah);
> >  	SET_DEVICE_OP(dev_ops, modify_cq);
> >  	SET_DEVICE_OP(dev_ops, modify_device); diff --git
> > a/drivers/infiniband/core/rdma_core.c
> > b/drivers/infiniband/core/rdma_core.c
> > index ccf4d069c25c..7166741834c8 100644
> > --- a/drivers/infiniband/core/rdma_core.c
> > +++ b/drivers/infiniband/core/rdma_core.c
> > @@ -817,6 +817,7 @@ static void ufile_destroy_ucontext(struct
> ib_uverbs_file *ufile,
> >  	rdma_restrack_del(&ucontext->res);
> >
> >  	ib_dev->ops.dealloc_ucontext(ucontext);
> > +	rdma_user_mmap_entries_remove_free(ucontext);
> >  	kfree(ucontext);
> >
> >  	ufile->ucontext = NULL;
> > diff --git a/drivers/infiniband/core/uverbs_cmd.c
> > b/drivers/infiniband/core/uverbs_cmd.c
> > index 7ddd0e5bc6b3..44c0600245e4 100644
> > --- a/drivers/infiniband/core/uverbs_cmd.c
> > +++ b/drivers/infiniband/core/uverbs_cmd.c
> > @@ -254,6 +254,7 @@ static int ib_uverbs_get_context(struct
> > uverbs_attr_bundle *attrs)
> >
> >  	mutex_init(&ucontext->per_mm_list_lock);
> >  	INIT_LIST_HEAD(&ucontext->per_mm_list);
> > +	xa_init(&ucontext->mmap_xa);
> >
> >  	ret = get_unused_fd_flags(O_CLOEXEC);
> >  	if (ret < 0)
> > diff --git a/drivers/infiniband/core/uverbs_main.c
> > b/drivers/infiniband/core/uverbs_main.c
> > index 11c13c1381cf..37507cc27e8c 100644
> > --- a/drivers/infiniband/core/uverbs_main.c
> > +++ b/drivers/infiniband/core/uverbs_main.c
> > @@ -965,6 +965,111 @@ int rdma_user_mmap_io(struct ib_ucontext
> > *ucontext, struct vm_area_struct *vma,  }
> > EXPORT_SYMBOL(rdma_user_mmap_io);
> >
> > +static inline u64
> > +rdma_user_mmap_get_key(const struct rdma_user_mmap_entry
> *entry) {
> > +	return (u64)entry->mmap_page << PAGE_SHIFT; }
> > +
> > +struct rdma_user_mmap_entry *
> > +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64
> > +len) {
> > +	struct rdma_user_mmap_entry *entry;
> > +	u64 mmap_page;
> > +
> > +	mmap_page = key >> PAGE_SHIFT;
> > +	if (mmap_page > U32_MAX)
> > +		return NULL;
> > +
> > +	entry = xa_load(&ucontext->mmap_xa, mmap_page);
> > +	if (!entry || rdma_user_mmap_get_key(entry) != key ||
> > +	    entry->length != len)
> > +		return NULL;
> > +
> > +	ibdev_dbg(ucontext->device,
> > +		  "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx]
> removed\n",
> > +		  entry->obj, key, entry->address, entry->length);
> > +
> > +	return entry;
> > +}
> > +EXPORT_SYMBOL(rdma_user_mmap_entry_get);
> 
> Please add function description in kernel doc format for all newly
> EXPORT_SYMBOL() functions you introduced in RDMA/core.
Ok. Could you give me a reference to an example? Where should the 
Documentation be added to? 


> 
> > +
> > +/*
> > + * Note this locking scheme cannot support removal of entries, except
> > +during
> > + * ucontext destruction when the core code guarentees no concurrency.
> > + */
> > +u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext, void
> *obj,
> > +				u64 address, u64 length, u8 mmap_flag) {
> > +	struct rdma_user_mmap_entry *entry;
> > +	u32 next_mmap_page;
> > +	int err;
> > +
> > +	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
> 
> It is worth to use kzalloc and not kmalloc.
> 
> Thanks

^ permalink raw reply

* RE: [PATCH v5 rdma-next 2/6] RDMA/efa: Use the common mmap_xa helpers
From: Michal Kalderon @ 2019-07-09 10:30 UTC (permalink / raw)
  To: Gal Pressman, Ariel Elior, jgg@ziepe.ca, dledford@redhat.com
  Cc: linux-rdma@vger.kernel.org, davem@davemloft.net,
	netdev@vger.kernel.org
In-Reply-To: <0a28f174-1875-452e-ea0a-c8db2d243ce5@amazon.com>

> From: Gal Pressman <galpress@amazon.com>
> Sent: Tuesday, July 9, 2019 12:03 PM
> 
> On 08/07/2019 12:14, Michal Kalderon wrote:
> 
> Hi, a few nits:
Thanks for the review, will fix them.

> 
> > Remove the functions related to managing the mmap_xa database.
> > This code was copied to the ib_core. Use the common API's instead.
> >
> > Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
> > ---
> >  drivers/infiniband/hw/efa/efa.h       |   3 +-
> >  drivers/infiniband/hw/efa/efa_main.c  |   1 +
> >  drivers/infiniband/hw/efa/efa_verbs.c | 183
> > ++++++++--------------------------
> >  3 files changed, 42 insertions(+), 145 deletions(-) diff --git
> > a/drivers/infiniband/hw/efa/efa_verbs.c
> > b/drivers/infiniband/hw/efa/efa_verbs.c
> > index df77bc312a25..5dff892da161 100644
> > --- a/drivers/infiniband/hw/efa/efa_verbs.c
> > +++ b/drivers/infiniband/hw/efa/efa_verbs.c
> > @@ -13,34 +13,15 @@
> >
> >  #include "efa.h"
> >
> > -#define EFA_MMAP_FLAG_SHIFT 56
> > -#define EFA_MMAP_PAGE_MASK GENMASK(EFA_MMAP_FLAG_SHIFT -
> 1, 0)
> > -#define EFA_MMAP_INVALID U64_MAX
> > -
> 
> Don't delete the blank line please.
> 
> >  enum {
> >  	EFA_MMAP_DMA_PAGE = 0,
> >  	EFA_MMAP_IO_WC,
> >  	EFA_MMAP_IO_NC,
> >  };
> > -
> >  #define EFA_AENQ_ENABLED_GROUPS \
> >  	(BIT(EFA_ADMIN_FATAL_ERROR) | BIT(EFA_ADMIN_WARNING) | \
> >  	 BIT(EFA_ADMIN_NOTIFICATION) | BIT(EFA_ADMIN_KEEP_ALIVE))
> >
> > -struct efa_mmap_entry {
> > -	void  *obj;
> > -	u64 address;
> > -	u64 length;
> > -	u32 mmap_page;
> > -	u8 mmap_flag;
> > -};
> > -
> > -static inline u64 get_mmap_key(const struct efa_mmap_entry *efa) -{
> > -	return ((u64)efa->mmap_flag << EFA_MMAP_FLAG_SHIFT) |
> > -	       ((u64)efa->mmap_page << PAGE_SHIFT);
> > -}
> > -
> >  #define EFA_CHUNK_PAYLOAD_SHIFT       12
> >  #define EFA_CHUNK_PAYLOAD_SIZE
> BIT(EFA_CHUNK_PAYLOAD_SHIFT)
> >  #define EFA_CHUNK_PAYLOAD_PTR_SIZE    8
> > @@ -145,105 +126,7 @@ static void *efa_zalloc_mapped(struct efa_dev
> *dev, dma_addr_t *dma_addr,
> >  	return addr;
> >  }
> >
> > -/*
> > - * This is only called when the ucontext is destroyed and there can
> > be no
> > - * concurrent query via mmap or allocate on the xarray, thus we can
> > be sure no
> > - * other thread is using the entry pointer. We also know that all the
> > BAR
> > - * pages have either been zap'd or munmaped at this point.  Normal
> > pages are
> > - * refcounted and will be freed at the proper time.
> > - */
> > -static void mmap_entries_remove_free(struct efa_dev *dev,
> > -				     struct efa_ucontext *ucontext)
> > -{
> > -	struct efa_mmap_entry *entry;
> > -	unsigned long mmap_page;
> >
> > -	xa_for_each(&ucontext->mmap_xa, mmap_page, entry) {
> > -		xa_erase(&ucontext->mmap_xa, mmap_page);
> > -
> > -		ibdev_dbg(
> > -			&dev->ibdev,
> > -			"mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx]
> removed\n",
> > -			entry->obj, get_mmap_key(entry), entry->address,
> > -			entry->length);
> > -		if (entry->mmap_flag == EFA_MMAP_DMA_PAGE)
> > -			/* DMA mapping is already gone, now free the pages
> */
> > -			free_pages_exact(phys_to_virt(entry->address),
> > -					 entry->length);
> > -		kfree(entry);
> > -	}
> > -}
> > -
> > -static struct efa_mmap_entry *mmap_entry_get(struct efa_dev *dev,
> > -					     struct efa_ucontext *ucontext,
> > -					     u64 key, u64 len)
> > -{
> > -	struct efa_mmap_entry *entry;
> > -	u64 mmap_page;
> > -
> > -	mmap_page = (key & EFA_MMAP_PAGE_MASK) >> PAGE_SHIFT;
> > -	if (mmap_page > U32_MAX)
> > -		return NULL;
> > -
> > -	entry = xa_load(&ucontext->mmap_xa, mmap_page);
> > -	if (!entry || get_mmap_key(entry) != key || entry->length != len)
> > -		return NULL;
> > -
> > -	ibdev_dbg(&dev->ibdev,
> > -		  "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx]
> removed\n",
> > -		  entry->obj, key, entry->address, entry->length);
> > -
> > -	return entry;
> > -}
> > -
> > -/*
> > - * Note this locking scheme cannot support removal of entries, except
> > during
> > - * ucontext destruction when the core code guarentees no concurrency.
> > - */
> > -static u64 mmap_entry_insert(struct efa_dev *dev, struct efa_ucontext
> *ucontext,
> > -			     void *obj, u64 address, u64 length, u8 mmap_flag)
> > -{
> > -	struct efa_mmap_entry *entry;
> > -	u32 next_mmap_page;
> > -	int err;
> > -
> > -	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
> > -	if (!entry)
> > -		return EFA_MMAP_INVALID;
> > -
> > -	entry->obj = obj;
> > -	entry->address = address;
> > -	entry->length = length;
> > -	entry->mmap_flag = mmap_flag;
> > -
> > -	xa_lock(&ucontext->mmap_xa);
> > -	if (check_add_overflow(ucontext->mmap_xa_page,
> > -			       (u32)(length >> PAGE_SHIFT),
> > -			       &next_mmap_page))
> > -		goto err_unlock;
> > -
> > -	entry->mmap_page = ucontext->mmap_xa_page;
> > -	ucontext->mmap_xa_page = next_mmap_page;
> > -	err = __xa_insert(&ucontext->mmap_xa, entry->mmap_page,
> entry,
> > -			  GFP_KERNEL);
> > -	if (err)
> > -		goto err_unlock;
> > -
> > -	xa_unlock(&ucontext->mmap_xa);
> > -
> > -	ibdev_dbg(
> > -		&dev->ibdev,
> > -		"mmap: obj[0x%p] addr[%#llx], len[%#llx], key[%#llx]
> inserted\n",
> > -		entry->obj, entry->address, entry->length,
> get_mmap_key(entry));
> > -
> > -	return get_mmap_key(entry);
> > -
> > -err_unlock:
> > -	xa_unlock(&ucontext->mmap_xa);
> > -	kfree(entry);
> > -	return EFA_MMAP_INVALID;
> > -
> > -}
> >
> 
> You left two extra blank lines between efa_zalloc_mapped and
> efa_query_device.
> 
> >  int efa_query_device(struct ib_device *ibdev,
> >  		     struct ib_device_attr *props,
> > @@ -488,45 +371,52 @@ static int qp_mmap_entries_setup(struct efa_qp
> *qp,
> >  				 struct efa_com_create_qp_params
> *params,
> >  				 struct efa_ibv_create_qp_resp *resp)  {
> > +	u64 address;
> > +	u64 length;
> 
> Line break.
> 
> >  	/*
> >  	 * Once an entry is inserted it might be mmapped, hence cannot be
> >  	 * cleaned up until dealloc_ucontext.
> >  	 */
> >  	resp->sq_db_mmap_key =
> > -		mmap_entry_insert(dev, ucontext, qp,
> > -				  dev->db_bar_addr + resp->sq_db_offset,
> > -				  PAGE_SIZE, EFA_MMAP_IO_NC);
> > -	if (resp->sq_db_mmap_key == EFA_MMAP_INVALID)
> > +		rdma_user_mmap_entry_insert(&ucontext->ibucontext, qp,
> > +					    dev->db_bar_addr +
> > +					    resp->sq_db_offset,
> > +					    PAGE_SIZE, EFA_MMAP_IO_NC);
> > +	if (resp->sq_db_mmap_key == RDMA_USER_MMAP_INVALID)
> >  		return -ENOMEM;
> >
> >  	resp->sq_db_offset &= ~PAGE_MASK;
> >
> > +	address = dev->mem_bar_addr + resp->llq_desc_offset;
> > +	length = PAGE_ALIGN(params->sq_ring_size_in_bytes +
> > +			    (resp->llq_desc_offset & ~PAGE_MASK));
> >  	resp->llq_desc_mmap_key =
> > -		mmap_entry_insert(dev, ucontext, qp,
> > -				  dev->mem_bar_addr + resp-
> >llq_desc_offset,
> > -				  PAGE_ALIGN(params-
> >sq_ring_size_in_bytes +
> > -					     (resp->llq_desc_offset &
> ~PAGE_MASK)),
> > -				  EFA_MMAP_IO_WC);
> > -	if (resp->llq_desc_mmap_key == EFA_MMAP_INVALID)
> > +		rdma_user_mmap_entry_insert(&ucontext->ibucontext, qp,
> > +					    address,
> > +					    length,
> > +					    EFA_MMAP_IO_WC);
> > +	if (resp->llq_desc_mmap_key == RDMA_USER_MMAP_INVALID)
> >  		return -ENOMEM;
> >
> >  	resp->llq_desc_offset &= ~PAGE_MASK;
> >
> >  	if (qp->rq_size) {
> > +		address = dev->db_bar_addr + resp->rq_db_offset;
> >  		resp->rq_db_mmap_key =
> > -			mmap_entry_insert(dev, ucontext, qp,
> > -					  dev->db_bar_addr + resp-
> >rq_db_offset,
> > -					  PAGE_SIZE, EFA_MMAP_IO_NC);
> > -		if (resp->rq_db_mmap_key == EFA_MMAP_INVALID)
> > +			rdma_user_mmap_entry_insert(&ucontext-
> >ibucontext, qp,
> > +						    address, PAGE_SIZE,
> > +						    EFA_MMAP_IO_NC);
> > +		if (resp->rq_db_mmap_key ==
> RDMA_USER_MMAP_INVALID)
> >  			return -ENOMEM;
> >
> >  		resp->rq_db_offset &= ~PAGE_MASK;
> >
> > +		address = virt_to_phys(qp->rq_cpu_addr);
> >  		resp->rq_mmap_key =
> > -			mmap_entry_insert(dev, ucontext, qp,
> > -					  virt_to_phys(qp->rq_cpu_addr),
> > -					  qp->rq_size,
> EFA_MMAP_DMA_PAGE);
> > -		if (resp->rq_mmap_key == EFA_MMAP_INVALID)
> > +			rdma_user_mmap_entry_insert(&ucontext-
> >ibucontext, qp,
> > +						    address, qp->rq_size,
> > +						    EFA_MMAP_DMA_PAGE);
> > +		if (resp->rq_mmap_key == RDMA_USER_MMAP_INVALID)
> >  			return -ENOMEM;
> >
> >  		resp->rq_mmap_size = qp->rq_size;
> > @@ -875,11 +765,13 @@ void efa_destroy_cq(struct ib_cq *ibcq, struct
> > ib_udata *udata)  static int cq_mmap_entries_setup(struct efa_dev *dev,
> struct efa_cq *cq,
> >  				 struct efa_ibv_create_cq_resp *resp)  {
> > +	struct efa_ucontext *ucontext = cq->ucontext;
> 
> Line break.
> 
> >  	resp->q_mmap_size = cq->size;
> > -	resp->q_mmap_key = mmap_entry_insert(dev, cq->ucontext, cq,
> > -					     virt_to_phys(cq->cpu_addr),
> > -					     cq->size,
> EFA_MMAP_DMA_PAGE);
> > -	if (resp->q_mmap_key == EFA_MMAP_INVALID)
> > +	resp->q_mmap_key =
> > +		rdma_user_mmap_entry_insert(&ucontext->ibucontext, cq,
> > +					    virt_to_phys(cq->cpu_addr),
> > +					    cq->size,
> EFA_MMAP_DMA_PAGE);
> > +	if (resp->q_mmap_key == RDMA_USER_MMAP_INVALID)
> >  		return -ENOMEM;
> >
> >  	return 0;
> > @@ -1531,7 +1423,6 @@ int efa_alloc_ucontext(struct ib_ucontext
> *ibucontext, struct ib_udata *udata)
> >  		goto err_out;
> >
> >  	ucontext->uarn = result.uarn;
> > -	xa_init(&ucontext->mmap_xa);
> >
> >  	resp.cmds_supp_udata_mask |=
> EFA_USER_CMDS_SUPP_UDATA_QUERY_DEVICE;
> >  	resp.cmds_supp_udata_mask |=
> EFA_USER_CMDS_SUPP_UDATA_CREATE_AH;
> > @@ -1560,19 +1451,25 @@ void efa_dealloc_ucontext(struct ib_ucontext
> *ibucontext)
> >  	struct efa_ucontext *ucontext = to_eucontext(ibucontext);
> >  	struct efa_dev *dev = to_edev(ibucontext->device);
> >
> > -	mmap_entries_remove_free(dev, ucontext);
> >  	efa_dealloc_uar(dev, ucontext->uarn);
> >  }
> >
> > +void efa_mmap_free(u64 address, u64 length, u8 mmap_flag)
> > +{
> > +	/* DMA mapping is already gone, now free the pages */
> > +	if (mmap_flag == EFA_MMAP_DMA_PAGE)
> > +		free_pages_exact(phys_to_virt(address), length);
> > +}
> > +
> >  static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext
> *ucontext,
> >  		      struct vm_area_struct *vma, u64 key, u64 length)
> >  {
> > -	struct efa_mmap_entry *entry;
> > +	struct rdma_user_mmap_entry *entry;
> >  	unsigned long va;
> >  	u64 pfn;
> >  	int err;
> >
> > -	entry = mmap_entry_get(dev, ucontext, key, length);
> > +	entry = rdma_user_mmap_entry_get(&ucontext->ibucontext, key,
> length);
> >  	if (!entry) {
> >  		ibdev_dbg(&dev->ibdev, "key[%#llx] does not have valid
> entry\n",
> >  			  key);
> >

^ permalink raw reply

* Re: [PATCH nf-next 1/3] netfilter: nf_nat_proto: add nf_nat_bridge_ops support
From: Florian Westphal @ 2019-07-09 10:42 UTC (permalink / raw)
  To: wenxu; +Cc: Florian Westphal, pablo, netfilter-devel, netdev
In-Reply-To: <0a4cf910-6c87-34b6-3018-3e25f6fecdce@ucloud.cn>

wenxu <wenxu@ucloud.cn> wrote:
> > For NAT on bridge, it should be possible already to push such packets
> > up the stack by
> >
> > bridge input meta iif eth0 ip saddr 192.168.0.0/16 \
> >        meta pkttype set unicast ether daddr set 00:11:22:33:44:55
> 
> yes, packet can be push up to IP stack to handle the nat through bridge device. 
> 
> In my case dnat 2.2.1.7 to 10.0.0.7, It assume the mac address of the two address
> is the same known by outer.

I think that in general they will have different MAC addresses, so plain
replacement of ip addresses won't work.

> But in This case modify the packet dmac to bridge device, the packet push up through bridge device
> Then do nat and route send back to bridge device.

Are you saying that you can use the send-to-ip-layer approach?

We might need/want a more convenient way to do this.
There are two ways that I can see:

1. a redirect support for nftables bridge family.
   The redirect expression would be same as "ether daddr set
   <bridge_mac>", but there is no need to know the bridge mac address.

2. Support ebtables -t broute in nftables.
   The route rework for ebtables has been completed already, so
   this needs a new expression.  Packet that is brouted behaves
   as if the bridge port was not part of the bridge.

^ permalink raw reply

* Re: [PATCH v5 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions
From: Leon Romanovsky @ 2019-07-09 10:52 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: Ariel Elior, jgg@ziepe.ca, dledford@redhat.com,
	galpress@amazon.com, linux-rdma@vger.kernel.org,
	davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <MN2PR18MB31825FB2CCC56C47763C9D0DA1F10@MN2PR18MB3182.namprd18.prod.outlook.com>

On Tue, Jul 09, 2019 at 10:29:36AM +0000, Michal Kalderon wrote:
> > From: linux-rdma-owner@vger.kernel.org <linux-rdma-
> > owner@vger.kernel.org> On Behalf Of Leon Romanovsky
> >
> > On Mon, Jul 08, 2019 at 12:14:58PM +0300, Michal Kalderon wrote:
> > > Create some common API's for adding entries to a xa_mmap.
> > > Searching for an entry and freeing one.
> > >
> > > The code was copied from the efa driver almost as is, just renamed
> > > function to be generic and not efa specific.
> > >
> > > Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
> > > ---
> > >  drivers/infiniband/core/device.c      |   1 +
> > >  drivers/infiniband/core/rdma_core.c   |   1 +
> > >  drivers/infiniband/core/uverbs_cmd.c  |   1 +
> > >  drivers/infiniband/core/uverbs_main.c | 105
> > ++++++++++++++++++++++++++++++++++
> > >  include/rdma/ib_verbs.h               |  32 +++++++++++
> > >  5 files changed, 140 insertions(+)
> > >
> > > diff --git a/drivers/infiniband/core/device.c
> > > b/drivers/infiniband/core/device.c
> > > index 8a6ccb936dfe..a830c2c5d691 100644
> > > --- a/drivers/infiniband/core/device.c
> > > +++ b/drivers/infiniband/core/device.c
> > > @@ -2521,6 +2521,7 @@ void ib_set_device_ops(struct ib_device *dev,
> > const struct ib_device_ops *ops)
> > >  	SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
> > >  	SET_DEVICE_OP(dev_ops, map_phys_fmr);
> > >  	SET_DEVICE_OP(dev_ops, mmap);
> > > +	SET_DEVICE_OP(dev_ops, mmap_free);
> > >  	SET_DEVICE_OP(dev_ops, modify_ah);
> > >  	SET_DEVICE_OP(dev_ops, modify_cq);
> > >  	SET_DEVICE_OP(dev_ops, modify_device); diff --git
> > > a/drivers/infiniband/core/rdma_core.c
> > > b/drivers/infiniband/core/rdma_core.c
> > > index ccf4d069c25c..7166741834c8 100644
> > > --- a/drivers/infiniband/core/rdma_core.c
> > > +++ b/drivers/infiniband/core/rdma_core.c
> > > @@ -817,6 +817,7 @@ static void ufile_destroy_ucontext(struct
> > ib_uverbs_file *ufile,
> > >  	rdma_restrack_del(&ucontext->res);
> > >
> > >  	ib_dev->ops.dealloc_ucontext(ucontext);
> > > +	rdma_user_mmap_entries_remove_free(ucontext);
> > >  	kfree(ucontext);
> > >
> > >  	ufile->ucontext = NULL;
> > > diff --git a/drivers/infiniband/core/uverbs_cmd.c
> > > b/drivers/infiniband/core/uverbs_cmd.c
> > > index 7ddd0e5bc6b3..44c0600245e4 100644
> > > --- a/drivers/infiniband/core/uverbs_cmd.c
> > > +++ b/drivers/infiniband/core/uverbs_cmd.c
> > > @@ -254,6 +254,7 @@ static int ib_uverbs_get_context(struct
> > > uverbs_attr_bundle *attrs)
> > >
> > >  	mutex_init(&ucontext->per_mm_list_lock);
> > >  	INIT_LIST_HEAD(&ucontext->per_mm_list);
> > > +	xa_init(&ucontext->mmap_xa);
> > >
> > >  	ret = get_unused_fd_flags(O_CLOEXEC);
> > >  	if (ret < 0)
> > > diff --git a/drivers/infiniband/core/uverbs_main.c
> > > b/drivers/infiniband/core/uverbs_main.c
> > > index 11c13c1381cf..37507cc27e8c 100644
> > > --- a/drivers/infiniband/core/uverbs_main.c
> > > +++ b/drivers/infiniband/core/uverbs_main.c
> > > @@ -965,6 +965,111 @@ int rdma_user_mmap_io(struct ib_ucontext
> > > *ucontext, struct vm_area_struct *vma,  }
> > > EXPORT_SYMBOL(rdma_user_mmap_io);
> > >
> > > +static inline u64
> > > +rdma_user_mmap_get_key(const struct rdma_user_mmap_entry
> > *entry) {
> > > +	return (u64)entry->mmap_page << PAGE_SHIFT; }
> > > +
> > > +struct rdma_user_mmap_entry *
> > > +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64
> > > +len) {
> > > +	struct rdma_user_mmap_entry *entry;
> > > +	u64 mmap_page;
> > > +
> > > +	mmap_page = key >> PAGE_SHIFT;
> > > +	if (mmap_page > U32_MAX)
> > > +		return NULL;
> > > +
> > > +	entry = xa_load(&ucontext->mmap_xa, mmap_page);
> > > +	if (!entry || rdma_user_mmap_get_key(entry) != key ||
> > > +	    entry->length != len)
> > > +		return NULL;
> > > +
> > > +	ibdev_dbg(ucontext->device,
> > > +		  "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx]
> > removed\n",
> > > +		  entry->obj, key, entry->address, entry->length);
> > > +
> > > +	return entry;
> > > +}
> > > +EXPORT_SYMBOL(rdma_user_mmap_entry_get);
> >
> > Please add function description in kernel doc format for all newly
> > EXPORT_SYMBOL() functions you introduced in RDMA/core.
> Ok. Could you give me a reference to an example? Where should the
> Documentation be added to?

Above function in *.c file.

For example, see function rdma_set_ack_timeout():
https://patchwork.kernel.org/patch/10778827/

Thanks

^ permalink raw reply

* Re: [PATCH] iwlwifi: fix warning iwl-trans.h is included more than once
From: Luca Coelho @ 2019-07-09 10:58 UTC (permalink / raw)
  To: Hariprasad Kelam, Johannes Berg, Emmanuel Grumbach,
	Intel Linux Wireless, Kalle Valo, David S. Miller, linux-wireless,
	netdev, linux-kernel
In-Reply-To: <20190526113815.GA6328@hari-Inspiron-1545>

On Sun, 2019-05-26 at 17:08 +0530, Hariprasad Kelam wrote:
> remove duplication include of iwl-trans.h
> 
> issue identified by includecheck
> 
> Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
> ---

Thanks! I have applied this (with small modifications to the commit
message) in our internal tree and it will reach the mainline following
our normal upstreaming process.

--
Cheers,
Luca.


^ permalink raw reply

* IPv6 flow label reflection behave for RST packets
From: Marek Majkowski @ 2019-07-09 11:10 UTC (permalink / raw)
  To: kuznet, yoshfuji, Jakub Sitnicki; +Cc: netdev, kernel-team

Morning,

I'm experimenting with flow label reflection from a server point of
view. I'm able to get it working in both supported ways:

(a) per-socket with flow manager IPV6_FL_F_REFLECT and flowlabel_consistency=0

(b) with global flowlabel_reflect sysctl

However, I was surprised to see that RST after the connection is torn
down, doesn't have the correct flow label value:

IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [S]
IP6 (flowlabel 0x3ba3d) ::1.1235 > ::1.59276: Flags [S.]
IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [.]
IP6 (flowlabel 0x3ba3d) ::1.1235 > ::1.59276: Flags [F.]
IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [P.]
IP6 (flowlabel 0xdfc46) ::1.1235 > ::1.59276: Flags [R]

Notice, the last RST packet has inconsistent flow label. Perhaps we
can argue this behaviour might be acceptable for a per-socket
IPV6_FL_F_REFLECT option, but with global flowlabel_reflect, I would
expect the RST to preserve the reflected flow label value.

I suspect the same behaviour is true for kernel-generated ICMPv6.

Prepared test case:
https://gist.github.com/majek/139081b84f9b5b6187c8ccff802e3ab3

This behaviour is not necessarily a bug, more of a surprise. Flow
label reflection is mostly useful in deployments where Linux servers
stand behind ECMP router, which uses flow-label to compute the hash.
Flow label reflection allows ICMP PTB message to be routed back to
correct server.

It's hard to imagine a situation where generated RST or ICMP echo
response would trigger a ICMP PTB. Flow label reflection is explained
here:
https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
and:
https://tools.ietf.org/html/rfc7098
https://tools.ietf.org/html/rfc6438

Cheers,
    Marek

(Note: the unrelated "fwmark_reflect" toggle is about something
different - flow marks, but also addresses RST and ICMP generated by
the server)

^ permalink raw reply

* [PATCH] crypto: user - make NETLINK_CRYPTO work inside netns
From: Ondrej Mosnacek @ 2019-07-09 11:11 UTC (permalink / raw)
  To: linux-crypto, Herbert Xu
  Cc: netdev, David S . Miller, Stephan Mueller, Steffen Klassert,
	Don Zickus

Currently, NETLINK_CRYPTO works only in the init network namespace. It
doesn't make much sense to cut it out of the other network namespaces,
so do the minor plumbing work necessary to make it work in any network
namespace. Code inspired by net/core/sock_diag.c.

Tested using kcapi-dgst from libkcapi [1]:
Before:
    # unshare -n kcapi-dgst -c sha256 </dev/null | wc -c
    libkcapi - Error: Netlink error: sendmsg failed
    libkcapi - Error: Netlink error: sendmsg failed
    libkcapi - Error: NETLINK_CRYPTO: cannot obtain cipher information for hmac(sha512) (is required crypto_user.c patch missing? see documentation)
    0

After:
    # unshare -n kcapi-dgst -c sha256 </dev/null | wc -c
    32

[1] https://github.com/smuellerDD/libkcapi

Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
---
 crypto/crypto_user_base.c            | 37 +++++++++++++++++++---------
 crypto/crypto_user_stat.c            |  4 ++-
 include/crypto/internal/cryptouser.h |  2 --
 include/net/net_namespace.h          |  3 +++
 4 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/crypto/crypto_user_base.c b/crypto/crypto_user_base.c
index e48da3b75c71..c92d415eaf82 100644
--- a/crypto/crypto_user_base.c
+++ b/crypto/crypto_user_base.c
@@ -22,9 +22,10 @@
 #include <linux/crypto.h>
 #include <linux/cryptouser.h>
 #include <linux/sched.h>
-#include <net/netlink.h>
 #include <linux/security.h>
+#include <net/netlink.h>
 #include <net/net_namespace.h>
+#include <net/sock.h>
 #include <crypto/internal/skcipher.h>
 #include <crypto/internal/rng.h>
 #include <crypto/akcipher.h>
@@ -37,9 +38,6 @@
 
 static DEFINE_MUTEX(crypto_cfg_mutex);
 
-/* The crypto netlink socket */
-struct sock *crypto_nlsk;
-
 struct crypto_dump_info {
 	struct sk_buff *in_skb;
 	struct sk_buff *out_skb;
@@ -195,6 +193,7 @@ out:
 static int crypto_report(struct sk_buff *in_skb, struct nlmsghdr *in_nlh,
 			 struct nlattr **attrs)
 {
+	struct net *net = sock_net(in_skb->sk);
 	struct crypto_user_alg *p = nlmsg_data(in_nlh);
 	struct crypto_alg *alg;
 	struct sk_buff *skb;
@@ -226,7 +225,7 @@ drop_alg:
 	if (err)
 		return err;
 
-	return nlmsg_unicast(crypto_nlsk, skb, NETLINK_CB(in_skb).portid);
+	return nlmsg_unicast(net->crypto_nlsk, skb, NETLINK_CB(in_skb).portid);
 }
 
 static int crypto_dump_report(struct sk_buff *skb, struct netlink_callback *cb)
@@ -429,6 +428,7 @@ static const struct crypto_link {
 static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 			       struct netlink_ext_ack *extack)
 {
+	struct net *net = sock_net(skb->sk);
 	struct nlattr *attrs[CRYPTOCFGA_MAX+1];
 	const struct crypto_link *link;
 	int type, err;
@@ -459,7 +459,7 @@ static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 				.done = link->done,
 				.min_dump_alloc = min(dump_alloc, 65535UL),
 			};
-			err = netlink_dump_start(crypto_nlsk, skb, nlh, &c);
+			err = netlink_dump_start(net->crypto_nlsk, skb, nlh, &c);
 		}
 
 		return err;
@@ -483,22 +483,35 @@ static void crypto_netlink_rcv(struct sk_buff *skb)
 	mutex_unlock(&crypto_cfg_mutex);
 }
 
-static int __init crypto_user_init(void)
+static int __net_init crypto_netlink_init(struct net *net)
 {
 	struct netlink_kernel_cfg cfg = {
 		.input	= crypto_netlink_rcv,
 	};
 
-	crypto_nlsk = netlink_kernel_create(&init_net, NETLINK_CRYPTO, &cfg);
-	if (!crypto_nlsk)
-		return -ENOMEM;
+	net->crypto_nlsk = netlink_kernel_create(net, NETLINK_CRYPTO, &cfg);
+	return net->crypto_nlsk == NULL ? -ENOMEM : 0;
+}
 
-	return 0;
+static void __net_exit crypto_netlink_exit(struct net *net)
+{
+	netlink_kernel_release(net->crypto_nlsk);
+	net->crypto_nlsk = NULL;
+}
+
+static struct pernet_operations crypto_netlink_net_ops = {
+	.init = crypto_netlink_init,
+	.exit = crypto_netlink_exit,
+};
+
+static int __init crypto_user_init(void)
+{
+	return register_pernet_subsys(&crypto_netlink_net_ops);
 }
 
 static void __exit crypto_user_exit(void)
 {
-	netlink_kernel_release(crypto_nlsk);
+	unregister_pernet_subsys(&crypto_netlink_net_ops);
 }
 
 module_init(crypto_user_init);
diff --git a/crypto/crypto_user_stat.c b/crypto/crypto_user_stat.c
index a03f326a63d3..8bad88413de1 100644
--- a/crypto/crypto_user_stat.c
+++ b/crypto/crypto_user_stat.c
@@ -10,6 +10,7 @@
 #include <linux/cryptouser.h>
 #include <linux/sched.h>
 #include <net/netlink.h>
+#include <net/sock.h>
 #include <crypto/internal/skcipher.h>
 #include <crypto/internal/rng.h>
 #include <crypto/akcipher.h>
@@ -298,6 +299,7 @@ out:
 int crypto_reportstat(struct sk_buff *in_skb, struct nlmsghdr *in_nlh,
 		      struct nlattr **attrs)
 {
+	struct net *net = sock_net(in_skb->sk);
 	struct crypto_user_alg *p = nlmsg_data(in_nlh);
 	struct crypto_alg *alg;
 	struct sk_buff *skb;
@@ -329,7 +331,7 @@ drop_alg:
 	if (err)
 		return err;
 
-	return nlmsg_unicast(crypto_nlsk, skb, NETLINK_CB(in_skb).portid);
+	return nlmsg_unicast(net->crypto_nlsk, skb, NETLINK_CB(in_skb).portid);
 }
 
 MODULE_LICENSE("GPL");
diff --git a/include/crypto/internal/cryptouser.h b/include/crypto/internal/cryptouser.h
index 8c602b187c58..40623f4457df 100644
--- a/include/crypto/internal/cryptouser.h
+++ b/include/crypto/internal/cryptouser.h
@@ -1,8 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #include <net/netlink.h>
 
-extern struct sock *crypto_nlsk;
-
 struct crypto_alg *crypto_alg_match(struct crypto_user_alg *p, int exact);
 
 #ifdef CONFIG_CRYPTO_STATS
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 12689ddfc24c..610e40eaea52 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -165,6 +165,9 @@ struct net {
 #endif
 #ifdef CONFIG_XDP_SOCKETS
 	struct netns_xdp	xdp;
+#endif
+#if IS_ENABLED(CONFIG_CRYPTO_USER)
+	struct sock		*crypto_nlsk;
 #endif
 	struct sock		*diag_nlsk;
 	atomic_t		fnhe_genid;
-- 
2.21.0


^ permalink raw reply related

* Re: IPv6 flow label reflection behave for RST packets
From: Eric Dumazet @ 2019-07-09 11:19 UTC (permalink / raw)
  To: Marek Majkowski, kuznet, yoshfuji, Jakub Sitnicki; +Cc: netdev, kernel-team
In-Reply-To: <CAJPywT++ibhPSzL8pCS6Jpej9EeR3g9x89xssK8U=vi6FqLUUw@mail.gmail.com>



On 7/9/19 1:10 PM, Marek Majkowski wrote:
> Morning,
> 
> I'm experimenting with flow label reflection from a server point of
> view. I'm able to get it working in both supported ways:
> 
> (a) per-socket with flow manager IPV6_FL_F_REFLECT and flowlabel_consistency=0
> 
> (b) with global flowlabel_reflect sysctl
> 
> However, I was surprised to see that RST after the connection is torn
> down, doesn't have the correct flow label value:
> 
> IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [S]
> IP6 (flowlabel 0x3ba3d) ::1.1235 > ::1.59276: Flags [S.]
> IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [.]
> IP6 (flowlabel 0x3ba3d) ::1.1235 > ::1.59276: Flags [F.]
> IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [P.]
> IP6 (flowlabel 0xdfc46) ::1.1235 > ::1.59276: Flags [R]
> 
> Notice, the last RST packet has inconsistent flow label. Perhaps we
> can argue this behaviour might be acceptable for a per-socket
> IPV6_FL_F_REFLECT option, but with global flowlabel_reflect, I would
> expect the RST to preserve the reflected flow label value.
> 
> I suspect the same behaviour is true for kernel-generated ICMPv6.
> 
> Prepared test case:
> https://gist.github.com/majek/139081b84f9b5b6187c8ccff802e3ab3
> 
> This behaviour is not necessarily a bug, more of a surprise. Flow
> label reflection is mostly useful in deployments where Linux servers
> stand behind ECMP router, which uses flow-label to compute the hash.
> Flow label reflection allows ICMP PTB message to be routed back to
> correct server.
> 
> It's hard to imagine a situation where generated RST or ICMP echo
> response would trigger a ICMP PTB. Flow label reflection is explained
> here:
> https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
> and:
> https://tools.ietf.org/html/rfc7098
> https://tools.ietf.org/html/rfc6438
> 
> Cheers,
>     Marek
> 
> 
> (Note: the unrelated "fwmark_reflect" toggle is about something
> different - flow marks, but also addresses RST and ICMP generated by
> the server)
> 

Please check the recent commits, scheduled for linux-5.3

a346abe051bd2bd0d5d0140b2da9ec95639acad7 ipv6: icmp: allow flowlabel reflection in echo replies
c67b85558ff20cb1ff20874461d12af456bee5d0 ipv6: tcp: send consistent autoflowlabel in TIME_WAIT state
392096736a06bc9d8f2b42fd4bb1a44b245b9fed ipv6: tcp: fix potential NULL deref in tcp_v6_send_reset()
50a8accf10627b343109a9c9d5c361751bf753b0 ipv6: tcp: send consistent flowlabel in TIME_WAIT state
323a53c41292a0d7efc8748856c623324c8d7c21 ipv6: tcp: enable flowlabel reflection in some RST packets


^ permalink raw reply

* [PATCH] vhost: fix null pointer dereference in vhost_del_umem_range
From: Denis Kirjanov @ 2019-07-09 11:42 UTC (permalink / raw)
  To: mst, jasowang; +Cc: kvm, netdev, Denis Kirjanov

> UBSAN: Undefined behaviour in ../drivers/vhost/vhost.c:52:1
> member access within null pointer of type 'struct rb_root'
> CPU: 2 PID: 1450 Comm: syz-executor493 Not tainted
> 4.12.14-525.g4d6309b-default #1 SLE15 (unreleased)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.0.0-prebuilt.qemu-project.org 04/01/2014
> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0xc6/0x159 lib/dump_stack.c:52
>  ubsan_epilogue+0xe/0x81 lib/ubsan.c:164
>  handle_missaligned_access lib/ubsan.c:299 [inline]
>  __ubsan_handle_type_mismatch+0x2ed/0x44e lib/ubsan.c:325
>  vhost_chr_write_iter+0x103e/0x1330 [vhost]
>  call_write_iter include/linux/fs.h:1764 [inline]
>  new_sync_write fs/read_write.c:497 [inline]
>  __vfs_write+0x67c/0x9b0 fs/read_write.c:510
>  vfs_write+0x1a2/0x610 fs/read_write.c:558
>  SYSC_write fs/read_write.c:605 [inline]
>  SyS_write+0xd8/0x1f0 fs/read_write.c:597
>  do_syscall_64+0x26b/0x6e0 arch/x86/entry/common.c:284
>  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> RIP: 0033:0x7fdb7687a739
> RSP: 002b:00007ffd48719f38 EFLAGS: 00000213 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fdb7687a739
> RDX: 0000000000000068 RSI: 0000000020000300 RDI: 0000000000000003
> RBP: 00000000004006f0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000213 R12: 00000000004004a0
> R13: 00007ffd4871a020 R14: 0000000000000000 R15: 0000000000000000
> ================================================================================
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault: 0000 [#1] SMP KASAN PTI
> Modules linked in: vhost_net tun vhost tap af_packet iscsi_ibft
> iscsi_boot_sysfs bochs_drm ttm ppdev drm_kms_helper drm
> drm_panel_orientation_quirks joydev e1000 syscopyarea sysfillrect
> pcspkr sysimgblt fb_sys_fops i2c_piix4 parport_pc parport qemu_fw_cfg
> button ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic
> ata_piix ahci libahci libata serio_raw floppy sg dm_multipath dm_mod
> scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
> Supported: No, Unreleased kernel
> CPU: 2 PID: 1450 Comm: syz-executor493 Not tainted
> 4.12.14-525.g4d6309b-default #1 SLE15 (unreleased)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.0.0-prebuilt.qemu-project.org 04/01/2014
> task: ffff88004b858040 task.stack: ffff88004a820000
> RIP: 0010:vhost_chr_write_iter+0x525/0x1330 [vhost]
> RSP: 0018:ffff88004a827a38 EFLAGS: 00010246
> RAX: dffffc0000000000 RBX: 1ffff10009504f51 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffffed0009504f40
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 1ffff10009504ec1 R11: 0000000000000000 R12: 0000000020000040
> R13: dffffc0000000000 R14: ffff8800352f0000 R15: ffffed0006a5e005
> FS:  00007fdb76f79740(0000) GS:ffff880060d00000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055b42161d980 CR3: 000000003422c000 CR4: 00000000000006e0
> Call Trace:
>  call_write_iter include/linux/fs.h:1764 [inline]
>  new_sync_write fs/read_write.c:497 [inline]
>  __vfs_write+0x67c/0x9b0 fs/read_write.c:510
>  vfs_write+0x1a2/0x610 fs/read_write.c:558
>  SYSC_write fs/read_write.c:605 [inline]
>  SyS_write+0xd8/0x1f0 fs/read_write.c:597
>  do_syscall_64+0x26b/0x6e0 arch/x86/entry/common.c:284
>  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> RIP: 0033:0x7fdb7687a739
> RSP: 002b:00007ffd48719f38 EFLAGS: 00000213 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fdb7687a739
> RDX: 0000000000000068 RSI: 0000000020000300 RDI: 0000000000000003
> RBP: 00000000004006f0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000213 R12: 00000000004004a0
> R13: 00007ffd4871a020 R14: 0000000000000000 R15: 0000000000000000
> Code: fc ff df 80 3c 02 00 0f 85 3c 0b 00 00 49 8b 6e 60 48 85 ed 0f
> 84 1c 0b 00 00 48 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80>
> 3c 02 00 0f 85 f4 0a 00 00 4c 8b 7d 00 4d 85 ff 0f 84 e7 06
> RIP: vhost_chr_write_iter+0x525/0x1330 [vhost] RSP: ffff88004a827a38
> ---[ end trace 49849730b5255f76 ]---

Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
---
 drivers/vhost/vhost.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index e995c12d8e24..026123a6fc7b 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -962,7 +962,8 @@ static void vhost_del_umem_range(struct vhost_umem *umem,
 
 	while ((node = vhost_umem_interval_tree_iter_first(&umem->umem_tree,
 							   start, end)))
-		vhost_umem_free(umem, node);
+		if (node)
+			vhost_umem_free(umem, node);
 }
 
 static void vhost_iotlb_notify_vq(struct vhost_dev *d,
-- 
2.12.3


^ permalink raw reply related

* Re: [PATCH net-next 0/2] tc-testing: Add plugin for simple traffic generation
From: Jamal Hadi Salim @ 2019-07-09 11:44 UTC (permalink / raw)
  To: Lucas Bates, davem
  Cc: netdev, xiyou.wangcong, jiri, mleitner, vladbu, dcaratti, kernel
In-Reply-To: <1562636067-1338-1-git-send-email-lucasb@mojatatu.com>

On 2019-07-08 9:34 p.m., Lucas Bates wrote:
> This series supersedes the previous submission that included a patch for test
> case verification using JSON output.  It adds a new tdc plugin, scapyPlugin, as
> a way to send traffic to test tc filters and actions.
> 
> The first patch makes a change to the TdcPlugin module that will allow tdc
> plugins to examine the test case currently being executed, so plugins can
> play a more active role in testing by accepting information or commands from
> the test case.  This is required for scapyPlugin to work.
> 
> The second patch adds scapyPlugin itself, and an example test case file to
> demonstrate how the scapy block works in the test cases.
> 

Shouldve said V3 in the subject line - but fwiw,

ACKed-by: Jamal Hadi Salim <jhs@mojatatu.com>


cheers,
jamal

^ permalink raw reply

* KASAN: global-out-of-bounds Read in load_next_firmware_from_table
From: syzbot @ 2019-07-09 12:27 UTC (permalink / raw)
  To: andreyknvl, davem, gregkh, kvalo, libertas-dev, linux-kernel,
	linux-usb, linux-wireless, netdev, syzkaller-bugs, tglx

Hello,

syzbot found the following crash on:

HEAD commit:    7829a896 usb-fuzzer: main usb gadget fuzzer driver
git tree:       https://github.com/google/kasan.git usb-fuzzer
console output: https://syzkaller.appspot.com/x/log.txt?x=12fd0e9ba00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=f6d4561982f71f63
dashboard link: https://syzkaller.appspot.com/bug?extid=98156c174c5a2cad9f8f
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=125f669ba00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=146b806ba00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+98156c174c5a2cad9f8f@syzkaller.appspotmail.com

usb 1-1: Direct firmware load for libertas/usb8388_v5.bin failed with error  
-2
usb 1-1: Direct firmware load for libertas/usb8388.bin failed with error -2
usb 1-1: Direct firmware load for usb8388.bin failed with error -2
==================================================================
BUG: KASAN: global-out-of-bounds in  
load_next_firmware_from_table+0x267/0x2d0  
drivers/net/wireless/marvell/libertas/firmware.c:99
Read of size 8 at addr ffffffff860942b8 by task kworker/1:1/21

CPU: 1 PID: 21 Comm: kworker/1:1 Not tainted 5.2.0-rc6+ #13
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Workqueue: events request_firmware_work_func
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0xca/0x13e lib/dump_stack.c:113
  print_address_description+0x67/0x231 mm/kasan/report.c:188
  __kasan_report.cold+0x1a/0x32 mm/kasan/report.c:317
  kasan_report+0xe/0x20 mm/kasan/common.c:614
  load_next_firmware_from_table+0x267/0x2d0  
drivers/net/wireless/marvell/libertas/firmware.c:99
  helper_firmware_cb+0xdc/0x100  
drivers/net/wireless/marvell/libertas/firmware.c:70
  request_firmware_work_func+0x126/0x242  
drivers/base/firmware_loader/main.c:785
  process_one_work+0x905/0x1570 kernel/workqueue.c:2269
  worker_thread+0x96/0xe20 kernel/workqueue.c:2415
  kthread+0x30b/0x410 kernel/kthread.c:255
  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the variable:
  fw_table+0x98/0x5c0

Memory state around the buggy address:
  ffffffff86094180: fa fa fa fa 00 04 fa fa fa fa fa fa 00 00 05 fa
  ffffffff86094200: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
> ffffffff86094280: 00 00 00 00 00 00 fa fa fa fa fa fa 00 00 00 00
                                         ^
  ffffffff86094300: 00 00 00 01 fa fa fa fa 00 00 00 00 02 fa fa fa
  ffffffff86094380: fa fa fa fa 00 03 fa fa fa fa fa fa 00 00 00 00
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: IPv6 flow label reflection behave for RST packets
From: Marek Majkowski @ 2019-07-09 12:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: kuznet, yoshfuji, Jakub Sitnicki, netdev, kernel-team
In-Reply-To: <a854848f-9fb3-47b9-cb18-e76455e5e664@gmail.com>

Ha, thanks. I missed that.

There is a caveat though. I don't think it's working as intended...
Running my script:

$ sysctl -w net.ipv6.flowlabel_reflect=3

$ tail reflect.py
cd2.close()
cd.send(b"a")

$ python3 reflect.py
IP6 (flowlabel 0xf2927, hlim 64) ::1.1235 > ::1.60246: Flags [F.]
IP6 (flowlabel 0xf2927, hlim 64) ::1.60246 > ::1.1235: Flags [P.]
IP6 (flowlabel 0x58ecd, hlim 64) ::1.1235 > ::1.60246: Flags [R]

Note. The RST is opportunistic, depending on timing I sometimes get a
proper FIN, without RST.

If I change the script to introduce some delay:

$ tail reflect.py
cd2.close()
time.sleep(0.1)
cd.send(b"a")

$ python3 reflect.py
IP6 (flowlabel 0x2f60c, hlim 64) ::1.60326 > ::1.1235: Flags [.]
IP6 (flowlabel 0x2f60c, hlim 64) ::1.60326 > ::1.1235: Flags [P.]
IP6 (flowlabel 0x2f60c, hlim 64) ::1.1235 > ::1.60326: Flags [R]

Now it seem to work reliably. Tested on net-next under virtme.

Marek

On Tue, Jul 9, 2019 at 1:19 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 7/9/19 1:10 PM, Marek Majkowski wrote:
> > Morning,
> >
> > I'm experimenting with flow label reflection from a server point of
> > view. I'm able to get it working in both supported ways:
> >
> > (a) per-socket with flow manager IPV6_FL_F_REFLECT and flowlabel_consistency=0
> >
> > (b) with global flowlabel_reflect sysctl
> >
> > However, I was surprised to see that RST after the connection is torn
> > down, doesn't have the correct flow label value:
> >
> > IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [S]
> > IP6 (flowlabel 0x3ba3d) ::1.1235 > ::1.59276: Flags [S.]
> > IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [.]
> > IP6 (flowlabel 0x3ba3d) ::1.1235 > ::1.59276: Flags [F.]
> > IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [P.]
> > IP6 (flowlabel 0xdfc46) ::1.1235 > ::1.59276: Flags [R]
> >
> > Notice, the last RST packet has inconsistent flow label. Perhaps we
> > can argue this behaviour might be acceptable for a per-socket
> > IPV6_FL_F_REFLECT option, but with global flowlabel_reflect, I would
> > expect the RST to preserve the reflected flow label value.
> >
> > I suspect the same behaviour is true for kernel-generated ICMPv6.
> >
> > Prepared test case:
> > https://gist.github.com/majek/139081b84f9b5b6187c8ccff802e3ab3
> >
> > This behaviour is not necessarily a bug, more of a surprise. Flow
> > label reflection is mostly useful in deployments where Linux servers
> > stand behind ECMP router, which uses flow-label to compute the hash.
> > Flow label reflection allows ICMP PTB message to be routed back to
> > correct server.
> >
> > It's hard to imagine a situation where generated RST or ICMP echo
> > response would trigger a ICMP PTB. Flow label reflection is explained
> > here:
> > https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
> > and:
> > https://tools.ietf.org/html/rfc7098
> > https://tools.ietf.org/html/rfc6438
> >
> > Cheers,
> >     Marek
> >
> >
> > (Note: the unrelated "fwmark_reflect" toggle is about something
> > different - flow marks, but also addresses RST and ICMP generated by
> > the server)
> >
>
> Please check the recent commits, scheduled for linux-5.3
>
> a346abe051bd2bd0d5d0140b2da9ec95639acad7 ipv6: icmp: allow flowlabel reflection in echo replies
> c67b85558ff20cb1ff20874461d12af456bee5d0 ipv6: tcp: send consistent autoflowlabel in TIME_WAIT state
> 392096736a06bc9d8f2b42fd4bb1a44b245b9fed ipv6: tcp: fix potential NULL deref in tcp_v6_send_reset()
> 50a8accf10627b343109a9c9d5c361751bf753b0 ipv6: tcp: send consistent flowlabel in TIME_WAIT state
> 323a53c41292a0d7efc8748856c623324c8d7c21 ipv6: tcp: enable flowlabel reflection in some RST packets
>

^ permalink raw reply

* [PATCH] net: netsec: start using buffers if page_pool registration succeeded
From: Ilias Apalodimas @ 2019-07-09 12:35 UTC (permalink / raw)
  To: netdev, jaswinder.singh, davem; +Cc: Ilias Apalodimas

The current driver starts using page_pool buffers before calling
xdp_rxq_info_reg_mem_model(). Start using the buffers after the
registration succeeded, so we won't have to call
page_pool_request_shutdown() in case of failure

Fixes: 5c67bf0ec4d0 ("net: netsec: Use page_pool API")
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
 drivers/net/ethernet/socionext/netsec.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index d7307ab90d74..c3a4f86f56ee 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -1309,6 +1309,15 @@ static int netsec_setup_rx_dring(struct netsec_priv *priv)
 		goto err_out;
 	}
 
+	err = xdp_rxq_info_reg(&dring->xdp_rxq, priv->ndev, 0);
+	if (err)
+		goto err_out;
+
+	err = xdp_rxq_info_reg_mem_model(&dring->xdp_rxq, MEM_TYPE_PAGE_POOL,
+					 dring->page_pool);
+	if (err)
+		goto err_out;
+
 	for (i = 0; i < DESC_NUM; i++) {
 		struct netsec_desc *desc = &dring->desc[i];
 		dma_addr_t dma_handle;
@@ -1327,14 +1336,6 @@ static int netsec_setup_rx_dring(struct netsec_priv *priv)
 	}
 
 	netsec_rx_fill(priv, 0, DESC_NUM);
-	err = xdp_rxq_info_reg(&dring->xdp_rxq, priv->ndev, 0);
-	if (err)
-		goto err_out;
-
-	err = xdp_rxq_info_reg_mem_model(&dring->xdp_rxq, MEM_TYPE_PAGE_POOL,
-					 dring->page_pool);
-	if (err)
-		goto err_out;
 
 	return 0;
 
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
From: Ido Schimmel @ 2019-07-09 12:38 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Miller, netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy,
	pablo, pieter.jansenvanvuuren, andrew, f.fainelli, vivien.didelot,
	idosch, Alexei Starovoitov
In-Reply-To: <20190708155158.3f75b57c@cakuba.netronome.com>

On Mon, Jul 08, 2019 at 03:51:58PM -0700, Jakub Kicinski wrote:
> On Mon, 8 Jul 2019 16:19:08 +0300, Ido Schimmel wrote:
> > On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:
> > > From: Ido Schimmel <idosch@idosch.org>
> > > Date: Sun,  7 Jul 2019 10:58:17 +0300
> > >   
> > > > Users have several ways to debug the kernel and understand why a packet
> > > > was dropped. For example, using "drop monitor" and "perf". Both
> > > > utilities trace kfree_skb(), which is the function called when a packet
> > > > is freed as part of a failure. The information provided by these tools
> > > > is invaluable when trying to understand the cause of a packet loss.
> > > > 
> > > > In recent years, large portions of the kernel data path were offloaded
> > > > to capable devices. Today, it is possible to perform L2 and L3
> > > > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
> > > > Different TC classifiers and actions are also offloaded to capable
> > > > devices, at both ingress and egress.
> > > > 
> > > > However, when the data path is offloaded it is not possible to achieve
> > > > the same level of introspection as tools such "perf" and "drop monitor"
> > > > become irrelevant.
> > > > 
> > > > This patchset aims to solve this by allowing users to monitor packets
> > > > that the underlying device decided to drop along with relevant metadata
> > > > such as the drop reason and ingress port.  
> > > 
> > > We are now going to have 5 or so ways to capture packets passing through
> > > the system, this is nonsense.
> > > 
> > > AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
> > > devlink thing.
> > > 
> > > This is insanity, too many ways to do the same thing and therefore the
> > > worst possible user experience.
> > > 
> > > Pick _ONE_ method to trap packets and forward normal kfree_skb events,
> > > XDP perf events, and these taps there too.
> > > 
> > > I mean really, think about it from the average user's perspective.  To
> > > see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
> > > listen on devlink but configure a special tap thing beforehand and then
> > > if someone is using XDP I gotta setup another perf event buffer capture
> > > thing too.  
> > 
> > Let me try to explain again because I probably wasn't clear enough. The
> > devlink-trap mechanism is not doing the same thing as other solutions.
> > 
> > The packets we are capturing in this patchset are packets that the
> > kernel (the CPU) never saw up until now - they were silently dropped by
> > the underlying device performing the packet forwarding instead of the
> > CPU.

Jakub,

It seems to me that most of the criticism is about consolidation of
interfaces because you believe I'm doing something you can already do
today, but this is not the case.

Switch ASICs have dedicated traps for specific packets. Usually, these
packets are control packets (e.g., ARP, BGP) which are required for the
correct functioning of the control plane. You can see this in the SAI
interface, which is an abstraction layer over vendors' SDKs:

https://github.com/opencomputeproject/SAI/blob/master/inc/saihostif.h#L157

We need to be able to configure the hardware policers of these traps and
read their statistics to understand how many packets they dropped. We
currently do not have a way to do any of that and we rely on hardcoded
defaults in the driver which do not fit every use case (from
experience):

https://elixir.bootlin.com/linux/v5.2/source/drivers/net/ethernet/mellanox/mlxsw/spectrum.c#L4103

We plan to extend devlink-trap mechanism to cover all these use cases. I
hope you agree that this functionality belongs in devlink given it is a
device-specific configuration and not a netdev-specific one.

That being said, in its current form, this mechanism is focused on traps
that correlate to packets the device decided to drop as this is very
useful for debugging.

Given that the entire configuration is done via devlink and that devlink
stores all the information about these traps, it seems logical to also
report these packets and their metadata to user space as devlink events.

If this is not desirable, we can try to call into drop_monitor from
devlink and add a new command (e.g., NET_DM_CMD_HW_ALERT), which will
encode all the information we currently have in DEVLINK_CMD_TRAP_REPORT.

IMO, this is less desirable, as instead of having one tool (devlink) to
interact with this mechanism we will need two (devlink & dropwatch).

Below I tried to answer all your questions and refer to all the points
you brought up.

> When you say silently dropped do you mean that mlxsw as of today
> doesn't have any counters exposed for those events?

Some of these packets are counted, but not all of them.

> If we wanted to consolidate this into something existing we can either
>  (a) add similar traps in the kernel data path;
>  (b) make these traps extension of statistics.
> 
> My knee jerk reaction to seeing the patches was that it adds a new
> place where device statistics are reported.

Not at all. This would be a step back. We can already count discards due
to VLAN membership on ingress on a per-port basis. A software maintained
global counter does not buy us anything.

By also getting the dropped packet - coupled with the drop reason and
ingress port - you can understand exactly why and on which VLAN the
packet was dropped. I wrote a Wireshark dissector for these netlink
packets to make our life easier. You can see the details in my comment
to the cover letter:

https://marc.info/?l=linux-netdev&m=156248736710238&w=2

In case you do not care about individual packets, but still want more
fine-grained statistics for your monitoring application, you can use
eBPF. For example, one thing we did is attaching a kprobe to
devlink_trap_report() with an eBPF program that dissects the incoming
skbs and maintains a counter per-{5 tuple, drop reason}. With
ebpf_exporter you can export these statistics to Prometheus on which you
can run queries and visualize the results with Grafana. This is
especially useful for tail and early drops since it allows you to
understand which flows contribute to most of the drops.

> Users who want to know why things are dropped will not get detailed
> breakdown from ethtool -S which for better or worse is the one stop
> shop for device stats today.

I hope I managed to explain why counters are not enough, but I also want
to point out that ethtool statistics are not properly documented and
this hinders their effectiveness. I did my best to document the exposed
traps in order to avoid the same fate:

https://patchwork.ozlabs.org/patch/1128585/

In addition, there are selftests to show how each trap can be triggered
to reduce the ambiguity even further:

https://patchwork.ozlabs.org/patch/1128610/

And a note in the documentation to make sure future functionality is
tested as well:

https://patchwork.ozlabs.org/patch/1128608/

> Having thought about it some more, however, I think that having a
> forwarding "exception" object and hanging statistics off of it is a
> better design, even if we need to deal with some duplication to get
> there.
> 
> IOW having an way to "trap all packets which would increment a
> statistic" (option (b) above) is probably a bad design.
> 
> As for (a) I wonder how many of those events have a corresponding event
> in the kernel stack?

Generic packet drops all have a corresponding kfree_skb() calls in the
kernel, but that does not mean that every packet dropped by the hardware
would also be dropped by the kernel if it were to be injected to its Rx
path. In my reply to Dave I gave buffer drops as an example.

There are also situations in which packets can be dropped due to
device-specific exceptions and these do not have a corresponding drop
reason in the kernel. See example here:

https://patchwork.ozlabs.org/patch/1128587/

> If we could add corresponding trace points and just feed those from
> the device driver, that'd obviously be a holy grail.

Unlike tracepoints, netlink gives you a structured and extensible
interface. For example, in Spectrum-1 we cannot provide the Tx port for
early/tail drops, whereas for Spectrum-2 and later we can. With netlink,
we can just omit the DEVLINK_ATTR_TRAP_OUT_PORT attribute for
Spectrum-1. You also get a programmatic interface that you can query for
this information:

# devlink -v trap show netdevsim/netdevsim10 trap ingress_vlan_filter
netdevsim/netdevsim10:
  name ingress_vlan_filter type drop generic true report false action drop group l2_drops
    metadata:
       input_port

Thanks

^ permalink raw reply

* [PATCH v2 0/4] Fix hang of Armada 8040 SoC in orion-mdio
From: josua @ 2019-07-09 13:00 UTC (permalink / raw)
  To: netdev; +Cc: Josua Mayer
In-Reply-To: <20190706151900.14355-1-josua@solid-run.com>

From: Josua Mayer <josua.mayer@jm0.eu>

With a modular kernel as configured by Debian a hang was observed with
the Armada 8040 SoC in the Clearfog GT and Macchiatobin boards.

The 8040 SoC actually requires four clocks to be enabled for the mdio
interface to function. All 4 clocks are already specified in
armada-cp110.dtsi. It has however been missed that the orion-mdio driver
only supports enabling up to three clocks.

This patch-set allows the orion-mdio driver to handle four clocks and
adds a warning when more clocks are specified to prevent this particular
oversight in the future.

Changes since v1:
- fixed condition for priting the warning (Andrew Lunn)
- rephrased commit description for deferred probing (Andrew Lunn)
- fixed compiler warnings (kbuild test robot)

Josua Mayer (4):
  dt-bindings: allow up to four clocks for orion-mdio
  net: mvmdio: allow up to four clocks to be specified for orion-mdio
  net: mvmdio: print warning when orion-mdio has too many clocks
  net: mvmdio: defer probe of orion-mdio if a clock is not ready

 Documentation/devicetree/bindings/net/marvell-orion-mdio.txt |  2 +-
 drivers/net/ethernet/marvell/mvmdio.c                        | 11 ++++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

-- 
2.16.4

^ permalink raw reply

* [PATCH v2 4/4] net: mvmdio: defer probe of orion-mdio if a clock is not ready
From: josua @ 2019-07-09 13:01 UTC (permalink / raw)
  To: netdev; +Cc: Josua Mayer, David S. Miller, Andrew Lunn
In-Reply-To: <20190709130101.5160-1-josua@solid-run.com>

From: Josua Mayer <josua@solid-run.com>

Defer probing of the orion-mdio interface when getting a clock returns
EPROBE_DEFER. This avoids locking up the Armada 8k SoC when mdio is used
before all clocks have been enabled.

Signed-off-by: Josua Mayer <josua@solid-run.com>
---
 drivers/net/ethernet/marvell/mvmdio.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvmdio.c b/drivers/net/ethernet/marvell/mvmdio.c
index eba18065a4da..f660cc2b8258 100644
--- a/drivers/net/ethernet/marvell/mvmdio.c
+++ b/drivers/net/ethernet/marvell/mvmdio.c
@@ -321,6 +321,10 @@ static int orion_mdio_probe(struct platform_device *pdev)
 
 	for (i = 0; i < ARRAY_SIZE(dev->clk); i++) {
 		dev->clk[i] = of_clk_get(pdev->dev.of_node, i);
+		if (PTR_ERR(dev->clk[i]) == -EPROBE_DEFER) {
+			ret = -EPROBE_DEFER;
+			goto out_clk;
+		}
 		if (IS_ERR(dev->clk[i]))
 			break;
 		clk_prepare_enable(dev->clk[i]);
@@ -366,6 +370,7 @@ static int orion_mdio_probe(struct platform_device *pdev)
 	if (dev->err_interrupt > 0)
 		writel(0, dev->regs + MVMDIO_ERR_INT_MASK);
 
+out_clk:
 	for (i = 0; i < ARRAY_SIZE(dev->clk); i++) {
 		if (IS_ERR(dev->clk[i]))
 			break;
-- 
2.16.4


^ permalink raw reply related

* [PATCH v2 2/4] net: mvmdio: allow up to four clocks to be specified for orion-mdio
From: josua @ 2019-07-09 13:00 UTC (permalink / raw)
  To: netdev; +Cc: Josua Mayer, stable, David S. Miller, Andrew Lunn
In-Reply-To: <20190709130101.5160-1-josua@solid-run.com>

From: Josua Mayer <josua@solid-run.com>

Allow up to four clocks to be specified and enabled for the orion-mdio
interface, which are required by the Armada 8k and defined in
armada-cp110.dtsi.

Fixes a hang in probing the mvmdio driver that was encountered on the
Clearfog GT 8K with all drivers built as modules, but also affects other
boards such as the MacchiatoBIN.

Cc: stable@vger.kernel.org
Fixes: 96cb43423822 ("net: mvmdio: allow up to three clocks to be specified for orion-mdio")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Josua Mayer <josua@solid-run.com>
---
 drivers/net/ethernet/marvell/mvmdio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mvmdio.c b/drivers/net/ethernet/marvell/mvmdio.c
index c5dac6bd2be4..e17d563e97a6 100644
--- a/drivers/net/ethernet/marvell/mvmdio.c
+++ b/drivers/net/ethernet/marvell/mvmdio.c
@@ -64,7 +64,7 @@

 struct orion_mdio_dev {
 	void __iomem *regs;
-	struct clk *clk[3];
+	struct clk *clk[4];
 	/*
 	 * If we have access to the error interrupt pin (which is
 	 * somewhat misnamed as it not only reflects internal errors
-- 
2.16.4

^ permalink raw reply related

* [PATCH v2 1/4] dt-bindings: allow up to four clocks for orion-mdio
From: josua @ 2019-07-09 13:00 UTC (permalink / raw)
  To: netdev
  Cc: Josua Mayer, stable, David S. Miller, Rob Herring, Mark Rutland,
	Andrew Lunn
In-Reply-To: <20190709130101.5160-1-josua@solid-run.com>

From: Josua Mayer <josua@solid-run.com>

Armada 8040 needs four clocks to be enabled for MDIO accesses to work.
Update the binding to allow the extra clock to be specified.

Cc: stable@vger.kernel.org
Fixes: 6d6a331f44a1 ("dt-bindings: allow up to three clocks for orion-mdio")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Josua Mayer <josua@solid-run.com>
---
 Documentation/devicetree/bindings/net/marvell-orion-mdio.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/marvell-orion-mdio.txt b/Documentation/devicetree/bindings/net/marvell-orion-mdio.txt
index 42cd81090a2c..3f3cfc1d8d4d 100644
--- a/Documentation/devicetree/bindings/net/marvell-orion-mdio.txt
+++ b/Documentation/devicetree/bindings/net/marvell-orion-mdio.txt
@@ -16,7 +16,7 @@ Required properties:
 
 Optional properties:
 - interrupts: interrupt line number for the SMI error/done interrupt
-- clocks: phandle for up to three required clocks for the MDIO instance
+- clocks: phandle for up to four required clocks for the MDIO instance
 
 The child nodes of the MDIO driver are the individual PHY devices
 connected to this MDIO bus. They must have a "reg" property given the
-- 
2.16.4


^ permalink raw reply related

* [PATCH v2 3/4] net: mvmdio: print warning when orion-mdio has too many clocks
From: josua @ 2019-07-09 13:01 UTC (permalink / raw)
  To: netdev; +Cc: Josua Mayer, David S. Miller, Andrew Lunn
In-Reply-To: <20190709130101.5160-1-josua@solid-run.com>

From: Josua Mayer <josua@solid-run.com>

Print a warning when device tree specifies more than the maximum of four
clocks supported by orion-mdio. Because reading from mdio can lock up
the Armada 8k when a required clock is not initialized, it is important
to notify the user when a specified clock is ignored.

Signed-off-by: Josua Mayer <josua@solid-run.com>
---
 drivers/net/ethernet/marvell/mvmdio.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvmdio.c b/drivers/net/ethernet/marvell/mvmdio.c
index e17d563e97a6..eba18065a4da 100644
--- a/drivers/net/ethernet/marvell/mvmdio.c
+++ b/drivers/net/ethernet/marvell/mvmdio.c
@@ -326,6 +326,10 @@ static int orion_mdio_probe(struct platform_device *pdev)
 		clk_prepare_enable(dev->clk[i]);
 	}

+	if (!IS_ERR(of_clk_get(pdev->dev.of_node, ARRAY_SIZE(dev->clk))))
+		dev_warn(&pdev->dev, "unsupported number of clocks, limiting to the first "
+			 __stringify(ARRAY_SIZE(dev->clk)) "\n");
+
 	dev->err_interrupt = platform_get_irq(pdev, 0);
 	if (dev->err_interrupt > 0 &&
 	    resource_size(r) < MVMDIO_ERR_INT_MASK + 4) {
-- 
2.16.4

^ permalink raw reply related

* Re: [PATCH v2 3/4] net: mvmdio: print warning when orion-mdio has too many clocks
From: Andrew Lunn @ 2019-07-09 13:14 UTC (permalink / raw)
  To: josua; +Cc: netdev, David S. Miller
In-Reply-To: <20190709130101.5160-4-josua@solid-run.com>

On Tue, Jul 09, 2019 at 03:01:00PM +0200, josua@solid-run.com wrote:
> From: Josua Mayer <josua@solid-run.com>
> 
> Print a warning when device tree specifies more than the maximum of four
> clocks supported by orion-mdio. Because reading from mdio can lock up
> the Armada 8k when a required clock is not initialized, it is important
> to notify the user when a specified clock is ignored.
> 
> Signed-off-by: Josua Mayer <josua@solid-run.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH v2 4/4] net: mvmdio: defer probe of orion-mdio if a clock is not ready
From: Andrew Lunn @ 2019-07-09 13:15 UTC (permalink / raw)
  To: josua; +Cc: netdev, David S. Miller
In-Reply-To: <20190709130101.5160-5-josua@solid-run.com>

On Tue, Jul 09, 2019 at 03:01:01PM +0200, josua@solid-run.com wrote:
> From: Josua Mayer <josua@solid-run.com>
> 
> Defer probing of the orion-mdio interface when getting a clock returns
> EPROBE_DEFER. This avoids locking up the Armada 8k SoC when mdio is used
> before all clocks have been enabled.
> 
> Signed-off-by: Josua Mayer <josua@solid-run.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* [PATCH iproute2 1/2] Revert "ip6tunnel: fix 'ip -6 {show|change} dev <name>' cmds"
From: Andrea Claudi @ 2019-07-09 13:16 UTC (permalink / raw)
  To: netdev; +Cc: stephen, dsahern
In-Reply-To: <cover.1562667648.git.aclaudi@redhat.com>

This reverts commit ba126dcad20e6d0e472586541d78bdd1ac4f1123.
It breaks tunnel creation when using 'dev' parameter:

$ ip link add type dummy
$ ip -6 tunnel add ip6tnl1 mode ip6ip6 remote 2001:db8:ffff:100::2 local 2001:db8:ffff:100::1 hoplimit 1 tclass 0x0 dev dummy0
add tunnel "ip6tnl0" failed: File exists

dev parameter must be used to specify the device to which
the tunnel is binded, and not the tunnel itself.

Reported-by: Jianwen Ji <jiji@redhat.com>
Reviewed-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
---
 ip/ip6tunnel.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/ip/ip6tunnel.c b/ip/ip6tunnel.c
index 56fd3466ed062..999408ed801b1 100644
--- a/ip/ip6tunnel.c
+++ b/ip/ip6tunnel.c
@@ -298,8 +298,6 @@ static int parse_args(int argc, char **argv, int cmd, struct ip6_tnl_parm2 *p)
 		p->link = ll_name_to_index(medium);
 		if (!p->link)
 			return nodev(medium);
-		else
-			strlcpy(p->name, medium, sizeof(p->name));
 	}
 	return 0;
 }
-- 
2.20.1


^ permalink raw reply related

* [PATCH iproute2 0/2] Fix IPv6 tunnel add when dev param is used
From: Andrea Claudi @ 2019-07-09 13:16 UTC (permalink / raw)
  To: netdev; +Cc: stephen, dsahern

Commit ba126dcad20e6 ("ip6tunnel: fix 'ip -6 {show|change} dev
<name>' cmds") breaks IPv6 tunnel creation when dev parameter
is used.

This series revert the original commit, which mistakenly use
dev for tunnel name, while addressing a issue on tunnel change
when no interface name is specified.

Andrea Claudi (2):
  Revert "ip6tunnel: fix 'ip -6 {show|change} dev <name>' cmds"
  ip tunnel: warn when changing IPv6 tunnel without tunnel name

 ip/ip6tunnel.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

-- 
2.20.1

^ permalink raw reply

* [PATCH iproute2 2/2] ip tunnel: warn when changing IPv6 tunnel without tunnel name
From: Andrea Claudi @ 2019-07-09 13:16 UTC (permalink / raw)
  To: netdev; +Cc: stephen, dsahern
In-Reply-To: <cover.1562667648.git.aclaudi@redhat.com>

Tunnel change fails if a tunnel name is not specified while using
'ip -6 tunnel change'. However, no warning message is printed and
no error code is returned.

$ ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote fd::2 tos inherit ttl 127 encaplimit none dev dummy0
$ ip -6 tunnel change dev dummy0 local 2001:1234::1 remote 2001:1234::2
$ ip -6 tunnel show ip6tnl1
ip6tnl1: gre/ipv6 remote fd::2 local fd::1 dev dummy0 encaplimit none hoplimit 127 tclass inherit flowlabel 0x00000 (flowinfo 0x00000000)

This commit checks if tunnel interface name is equal to an empty
string: in this case, it prints a warning message to the user.
It intentionally avoids to return an error to not break existing
script setup.

This is the output after this commit:
$ ip -6 tunnel add ip6tnl1 mode ip6gre local fd::1 remote fd::2 tos inherit ttl 127 encaplimit none dev dummy0
$ ip -6 tunnel change dev dummy0 local 2001:1234::1 remote 2001:1234::2
Tunnel interface name not specified
$ ip -6 tunnel show ip6tnl1
ip6tnl1: gre/ipv6 remote fd::2 local fd::1 dev dummy0 encaplimit none hoplimit 127 tclass inherit flowlabel 0x00000 (flowinfo 0x00000000)

Reviewed-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
---
 ip/ip6tunnel.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/ip/ip6tunnel.c b/ip/ip6tunnel.c
index 999408ed801b1..e3da11eb4518e 100644
--- a/ip/ip6tunnel.c
+++ b/ip/ip6tunnel.c
@@ -386,6 +386,9 @@ static int do_add(int cmd, int argc, char **argv)
 	if (parse_args(argc, argv, cmd, &p) < 0)
 		return -1;
 
+	if (!*p.name)
+		fprintf(stderr, "Tunnel interface name not specified\n");
+
 	if (p.proto == IPPROTO_GRE)
 		basedev = "ip6gre0";
 	else if (p.i_flags & VTI_ISVTI)
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next] bnxt_en: Add page_pool_destroy() during RX ring cleanup.
From: Andy Gospodarek @ 2019-07-09 13:18 UTC (permalink / raw)
  To: Michael Chan; +Cc: davem, netdev, Ilias Apalodimas
In-Reply-To: <1562658607-30048-1-git-send-email-michael.chan@broadcom.com>

On Tue, Jul 09, 2019 at 03:50:07AM -0400, Michael Chan wrote:
> Add page_pool_destroy() in bnxt_free_rx_rings() during normal RX ring
> cleanup, as Ilias has informed us that the following commit has been
> merged:
> 
> 1da4bbeffe41 ("net: core: page_pool: add user refcnt and reintroduce page_pool_destroy")
> 
> The special error handling code to call page_pool_free() can now be
> removed.  bnxt_free_rx_rings() will always be called during normal
> shutdown or any error paths.
> 
> Fixes: 322b87ca55f2 ("bnxt_en: add page_pool support")
> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Cc: Andy Gospodarek <gospo@broadcom.com>
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index e9d3bd8..2b5b0ab 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -2500,6 +2500,7 @@ static void bnxt_free_rx_rings(struct bnxt *bp)
>  		if (xdp_rxq_info_is_reg(&rxr->xdp_rxq))
>  			xdp_rxq_info_unreg(&rxr->xdp_rxq);
>  
> +		page_pool_destroy(rxr->page_pool);
>  		rxr->page_pool = NULL;
>  
>  		kfree(rxr->rx_tpa);
> @@ -2560,19 +2561,14 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
>  			return rc;
>  
>  		rc = xdp_rxq_info_reg(&rxr->xdp_rxq, bp->dev, i);
> -		if (rc < 0) {
> -			page_pool_free(rxr->page_pool);
> -			rxr->page_pool = NULL;
> +		if (rc < 0)
>  			return rc;
> -		}
>  
>  		rc = xdp_rxq_info_reg_mem_model(&rxr->xdp_rxq,
>  						MEM_TYPE_PAGE_POOL,
>  						rxr->page_pool);
>  		if (rc) {
>  			xdp_rxq_info_unreg(&rxr->xdp_rxq);
> -			page_pool_free(rxr->page_pool);
> -			rxr->page_pool = NULL;

Rather than deleting these lines it would also be acceptable to do:

                if (rc) {
                        xdp_rxq_info_unreg(&rxr->xdp_rxq);
-                       page_pool_free(rxr->page_pool);
+                       page_pool_destroy(rxr->page_pool);
                        rxr->page_pool = NULL;
                        return rc;
                }

but anytime there is a failure to bnxt_alloc_rx_rings the driver will
immediately follow it up with a call to bnxt_free_rx_rings, so
page_pool_destroy will be called.

Thanks for pushing this out so quickly!

Acked-by: Andy Gospodarek <gospo@broadcom.com> 


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox