Netdev List
 help / color / mirror / Atom feed
* [PATCH] ethtool: igb: dump RR2DCDELAY register
From: Artem Bityutskiy @ 2019-07-15  6:52 UTC (permalink / raw)
  To: John W . Linville; +Cc: netdev

Decode 'RR2DCDELAY' register which Linux kernel provides starting from version
5.3. The corresponding commit in the Linux kernel is:
    cd502a7f7c9c igb: add RR2DCDELAY to ethtool registers dump

The RR2DCDELAY register is present in I210 and I211 Intel Gigabit Ethernet
chips and it stands for "Read Request To Data Completion Delay". Here is how
this register is described in the I210 datasheet:

"This field captures the maximum PCIe split time in 16 ns units, which is the
maximum delay between the read request to the first data completion. This is
giving an estimation of the PCIe round trip time."

In practice, this register can be used to measure the time it takes the NIC to
read data from the host memory.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
---
 igb.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/igb.c b/igb.c
index e0ccef9..ab0896f 100644
--- a/igb.c
+++ b/igb.c
@@ -859,6 +859,18 @@ igb_dump_regs(struct ethtool_drvinfo *info, struct ethtool_regs *regs)
 		"0x03430: TDFPC       (Tx data FIFO packet count)      0x%08X\n",
 		regs_buff[550]);
 
+	/*
+	 * Starting from kernel version 5.3 the registers dump buffer grew from
+	 * 739 4-byte words to 740 words, and word 740 contains the RR2DCDLAY
+	 * register.
+	 */
+	if (regs->len < 740)
+		return 0;
+
+	fprintf(stdout,
+		"0x05BF4: RR2DCDELAY  (Max. DMA read delay)            0x%08X\n",
+		regs_buff[739]);
+
 	return 0;
 }
 
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH] mm/gup: Use put_user_page*() instead of put_page*()
From: Bharath Vedartham @ 2019-07-15  6:56 UTC (permalink / raw)
  To: John Hubbard
  Cc: akpm, ira.weiny, Mauro Carvalho Chehab, Dimitri Sivanich,
	Arnd Bergmann, Greg Kroah-Hartman, Alex Williamson, Cornelia Huck,
	Jens Axboe, Alexander Viro, Björn Töpel,
	Magnus Karlsson, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, Jakub Kicinski, Jesper Dangaard Brouer,
	John Fastabend, Enrico Weigelt, Thomas Gleixner, Alexios Zavras,
	Dan Carpenter, Max Filippov, Matt Sickler, Kirill A. Shutemov,
	Keith Busch, YueHaibing, linux-media, linux-kernel, devel, kvm,
	linux-block, linux-fsdevel, linux-mm, netdev, bpf, xdp-newbies,
	Jason Gunthorpe
In-Reply-To: <deea584f-2da2-8e1f-5a07-e97bf32c63bb@nvidia.com>

On Sun, Jul 14, 2019 at 04:33:42PM -0700, John Hubbard wrote:
> On 7/14/19 12:08 PM, Bharath Vedartham wrote:
> > This patch converts all call sites of get_user_pages
> > to use put_user_page*() instead of put_page*() functions to
> > release reference to gup pinned pages.
Hi John, 
> Hi Bharath,
> 
> Thanks for jumping in to help, and welcome to the party!
> 
> You've caught everyone in the middle of a merge window, btw.  As a
> result, I'm busy rebasing and reworking the get_user_pages call sites, 
> and gup tracking, in the wake of some semi-traumatic changes to bio 
> and gup and such. I plan to re-post right after 5.3-rc1 shows up, from 
> here:
> 
>     https://github.com/johnhubbard/linux/commits/gup_dma_core
> 
> ...which you'll find already covers the changes you've posted, except for:
> 
>     drivers/misc/sgi-gru/grufault.c
>     drivers/staging/kpc2000/kpc_dma/fileops.c
> 
> ...and this one, which is undergoing to larger local changes, due to
> bvec, so let's leave it out of the choices:
> 
>     fs/io_uring.c
> 
> Therefore, until -rc1, if you'd like to help, I'd recommend one or more
> of the following ideas:
> 
> 1. Pull down https://github.com/johnhubbard/linux/commits/gup_dma_core
> and find missing conversions: look for any additional missing 
> get_user_pages/put_page conversions. You've already found a couple missing 
> ones. I haven't re-run a search in a long time, so there's probably even more.
> 	a) And find more, after I rebase to 5.3-rc1: people probably are adding
> 	get_user_pages() calls as we speak. :)
Shouldn't this be documented then? I don't see any docs for using
put_user_page*() in v5.2.1 in the memory management API section?
> 2. Patches: Focus on just one subsystem at a time, and perfect the patch for
> it. For example, I think this the staging driver would be perfect to start with:
> 
>     drivers/staging/kpc2000/kpc_dma/fileops.c
> 
> 	a) verify that you've really, corrected converted the whole
> 	driver. (Hint: I think you might be overlooking a put_page call.)
Yup. I did see that! Will fix it!
> 	b) Attempt to test it if you can (I'm being hypocritical in
> 	the extreme here, but one of my problems is that testing
> 	has been light, so any help is very valuable). qemu...?
> 	OTOH, maybe even qemu cannot easily test a kpc2000, but
> 	perhaps `git blame` and talking to the authors would help
> 	figure out a way to validate the changes.
Great! I ll do that, I ll mail the patch authors and ask them for help
in testing. 
> 	Thinking about whether you can run a test that would prove or
> 	disprove my claim in (a), above, could be useful in coming up
> 	with tests to run.

> In other words, a few very high quality conversions (even just one) that
> we can really put our faith in, is what I value most here. Tested patches
> are awesome.
I understand that! 
> 3. Once I re-post, turn on the new CONFIG_DEBUG_GET_USER_PAGES_REFERENCES
> and run things such as xfstest/fstest. (Again, doing so would be going
> further than I have yet--very helpful). Help clarify what conversions have
> actually been tested and work, and which ones remain unvalidated.
> Other: Please note that this:
Yup will do that.
>     https://github.com/johnhubbard/linux/commits/gup_dma_core
> 
>     a) gets rebased often, and
> 
>     b) has a bunch of commits (iov_iter and related) that conflict
>        with the latest linux.git,
> 
>     c) has some bugs in the bio area, that I'm fixing, so I don't trust
>        that's it's safely runnable, for a few more days.
I assume your repo contains only work related to fixing gup issues and
not the main repo for gup development? i.e where gup changes are merged?
Also are release_pages and put_user_pages interchangable? 
> One note below, for the future:
> 
> > 
> > This is a bunch of trivial conversions which is a part of an effort
> > by John Hubbard to solve issues with gup pinned pages and 
> > filesystem writeback.
> > 
> > The issue is more clearly described in John Hubbard's patch[1] where
> > put_user_page*() functions are introduced.
> > 
> > Currently put_user_page*() simply does put_page but future implementations
> > look to change that once treewide change of put_page callsites to 
> > put_user_page*() is finished.
> > 
> > The lwn article describing the issue with gup pinned pages and filesystem 
> > writeback [2].
> > 
> > This patch has been tested by building and booting the kernel as I don't
> > have the required hardware to test the device drivers.
> > 
> > I did not modify gpu/drm drivers which use release_pages instead of
> > put_page() to release reference of gup pinned pages as I am not clear
> > whether release_pages and put_page are interchangable. 
> > 
> > [1] https://lkml.org/lkml/2019/3/26/1396
> 
> When referring to patches in a commit description, please use the 
> commit hash, not an external link. See Submitting Patches [1] for details.
> 
> Also, once you figure out the right maintainers and other involved people,
> putting Cc: in the commit description is common practice, too.
> 
> [1] https://www.kernel.org/doc/html/latest/process/submitting-patches.html
Will work on that! Thanks!
> thanks,
> -- 
> John Hubbard
> NVIDIA
> 
> > 
> > [2] https://lwn.net/Articles/784574/
> > 
> > Signed-off-by: Bharath Vedartham <linux.bhar@gmail.com>
> > ---
> >  drivers/media/v4l2-core/videobuf-dma-sg.c | 3 +--
> >  drivers/misc/sgi-gru/grufault.c           | 2 +-
> >  drivers/staging/kpc2000/kpc_dma/fileops.c | 4 +---
> >  drivers/vfio/vfio_iommu_type1.c           | 2 +-
> >  fs/io_uring.c                             | 7 +++----
> >  mm/gup_benchmark.c                        | 6 +-----
> >  net/xdp/xdp_umem.c                        | 7 +------
> >  7 files changed, 9 insertions(+), 22 deletions(-)
> > 
> > diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
> > index 66a6c6c..d6eeb43 100644
> > --- a/drivers/media/v4l2-core/videobuf-dma-sg.c
> > +++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
> > @@ -349,8 +349,7 @@ int videobuf_dma_free(struct videobuf_dmabuf *dma)
> >  	BUG_ON(dma->sglen);
> >  
> >  	if (dma->pages) {
> > -		for (i = 0; i < dma->nr_pages; i++)
> > -			put_page(dma->pages[i]);
> > +		put_user_pages(dma->pages, dma->nr_pages);
> >  		kfree(dma->pages);
> >  		dma->pages = NULL;
> >  	}
> > diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> > index 4b713a8..61b3447 100644
> > --- a/drivers/misc/sgi-gru/grufault.c
> > +++ b/drivers/misc/sgi-gru/grufault.c
> > @@ -188,7 +188,7 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
> >  	if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
> >  		return -EFAULT;
> >  	*paddr = page_to_phys(page);
> > -	put_page(page);
> > +	put_user_page(page);
> >  	return 0;
> >  }
> >  
> > diff --git a/drivers/staging/kpc2000/kpc_dma/fileops.c b/drivers/staging/kpc2000/kpc_dma/fileops.c
> > index 6166587..26dceed 100644
> > --- a/drivers/staging/kpc2000/kpc_dma/fileops.c
> > +++ b/drivers/staging/kpc2000/kpc_dma/fileops.c
> > @@ -198,9 +198,7 @@ int  kpc_dma_transfer(struct dev_private_data *priv, struct kiocb *kcb, unsigned
> >  	sg_free_table(&acd->sgt);
> >   err_dma_map_sg:
> >   err_alloc_sg_table:
> > -	for (i = 0 ; i < acd->page_count ; i++){
> > -		put_page(acd->user_pages[i]);
> > -	}
> > +	put_user_pages(acd->user_pages, acd->page_count);
> >   err_get_user_pages:
> >  	kfree(acd->user_pages);
> >   err_alloc_userpages:
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index add34ad..c491524 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -369,7 +369,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
> >  		 */
> >  		if (ret > 0 && vma_is_fsdax(vmas[0])) {
> >  			ret = -EOPNOTSUPP;
> > -			put_page(page[0]);
> > +			put_user_page(page[0]);
> >  		}
> >  	}
> >  	up_read(&mm->mmap_sem);
> > diff --git a/fs/io_uring.c b/fs/io_uring.c
> > index 4ef62a4..b4a4549 100644
> > --- a/fs/io_uring.c
> > +++ b/fs/io_uring.c
> > @@ -2694,10 +2694,9 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
> >  			 * if we did partial map, or found file backed vmas,
> >  			 * release any pages we did get
> >  			 */
> > -			if (pret > 0) {
> > -				for (j = 0; j < pret; j++)
> > -					put_page(pages[j]);
> > -			}
> > +			if (pret > 0)
> > +				put_user_pages(pages, pret);
> > +
> >  			if (ctx->account_mem)
> >  				io_unaccount_mem(ctx->user, nr_pages);
> >  			kvfree(imu->bvec);
> > diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c
> > index 7dd602d..15fc7a2 100644
> > --- a/mm/gup_benchmark.c
> > +++ b/mm/gup_benchmark.c
> > @@ -76,11 +76,7 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
> >  	gup->size = addr - gup->addr;
> >  
> >  	start_time = ktime_get();
> > -	for (i = 0; i < nr_pages; i++) {
> > -		if (!pages[i])
> > -			break;
> > -		put_page(pages[i]);
> > -	}
> > +	put_user_pages(pages, nr_pages);
> >  	end_time = ktime_get();
> >  	gup->put_delta_usec = ktime_us_delta(end_time, start_time);
> >  
> > diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> > index 9c6de4f..6103e19 100644
> > --- a/net/xdp/xdp_umem.c
> > +++ b/net/xdp/xdp_umem.c
> > @@ -173,12 +173,7 @@ static void xdp_umem_unpin_pages(struct xdp_umem *umem)
> >  {
> >  	unsigned int i;
> >  
> > -	for (i = 0; i < umem->npgs; i++) {
> > -		struct page *page = umem->pgs[i];
> > -
> > -		set_page_dirty_lock(page);
> > -		put_page(page);
> > -	}
> > +	put_user_pages_dirty_lock(umem->pgs, umem->npgs);
> >  
> >  	kfree(umem->pgs);
> >  	umem->pgs = NULL;
> > 

^ permalink raw reply

* Re: [PATCH] mm/gup: Use put_user_page*() instead of put_page*()
From: Bharath Vedartham @ 2019-07-15  6:59 UTC (permalink / raw)
  To: Jens Axboe
  Cc: akpm, ira.weiny, jhubbard, Mauro Carvalho Chehab,
	Dimitri Sivanich, Arnd Bergmann, Greg Kroah-Hartman,
	Alex Williamson, Cornelia Huck, Alexander Viro,
	Björn Töpel, Magnus Karlsson, David S. Miller,
	Alexei Starovoitov, Daniel Borkmann, Jakub Kicinski,
	Jesper Dangaard Brouer, John Fastabend, Enrico Weigelt,
	Thomas Gleixner, Alexios Zavras, Dan Carpenter, Max Filippov,
	Matt Sickler, Kirill A. Shutemov, Keith Busch, YueHaibing,
	linux-media, linux-kernel, devel, kvm, linux-block, linux-fsdevel,
	linux-mm, netdev, bpf, xdp-newbies
In-Reply-To: <018ee3d1-e2f0-ca12-9f63-945056c09985@kernel.dk>

On Sun, Jul 14, 2019 at 08:33:57PM -0600, Jens Axboe wrote:
> On 7/14/19 1:08 PM, Bharath Vedartham wrote:
> > diff --git a/fs/io_uring.c b/fs/io_uring.c
> > index 4ef62a4..b4a4549 100644
> > --- a/fs/io_uring.c
> > +++ b/fs/io_uring.c
> > @@ -2694,10 +2694,9 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, void __user *arg,
> >   			 * if we did partial map, or found file backed vmas,
> >   			 * release any pages we did get
> >   			 */
> > -			if (pret > 0) {
> > -				for (j = 0; j < pret; j++)
> > -					put_page(pages[j]);
> > -			}
> > +			if (pret > 0)
> > +				put_user_pages(pages, pret);
> > +
> >   			if (ctx->account_mem)
> >   				io_unaccount_mem(ctx->user, nr_pages);
> >   			kvfree(imu->bvec);
> 
> You handled just the failure case of the buffer registration, but not
> the actual free in io_sqe_buffer_unregister().
> 
> -- 
> Jens Axboe
Yup got it! Thanks! I won't be sending a patch again as fs/io_uring.c
may have larger local changes for put_user_pages.

Thanks

^ permalink raw reply

* Re: [PATCH v2 3/9] rcu/sync: Remove custom check for reader-section
From: Oleg Nesterov @ 2019-07-15  7:26 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Paul E. McKenney, linux-kernel, Alexey Kuznetsov, Bjorn Helgaas,
	Borislav Petkov, c0d1n61at3, David S. Miller, edumazet,
	Greg Kroah-Hartman, Hideaki YOSHIFUJI, H. Peter Anvin,
	Ingo Molnar, Jonathan Corbet, Josh Triplett, keescook,
	kernel-hardening, kernel-team, Lai Jiangshan, Len Brown,
	linux-acpi, linux-doc, linux-pci, linux-pm, Mathieu Desnoyers,
	neilb, netdev, Pavel Machek, peterz, Rafael J. Wysocki,
	Rasmus Villemoes, rcu, Steven Rostedt, Tejun Heo, Thomas Gleixner,
	will, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)
In-Reply-To: <20190713031008.GA248225@google.com>

On 07/12, Joel Fernandes wrote:
>
>  static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
>  {
> -	RCU_LOCKDEP_WARN(!rcu_read_lock_held() &&
> -			 !rcu_read_lock_bh_held() &&
> -			 !rcu_read_lock_sched_held(),
> +	RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),

Yes, this is what I meant.

Sorry for confusion, I should have mentioned that rcu_sync_is_idle()
was recently updated when I suggested to use the new helper.

Acked-by: Oleg Nesterov <oleg@redhat.com>


^ permalink raw reply

* Re: [RFC] virtio-net: share receive_*() and add_recvbuf_*() with virtio-vsock
From: Stefano Garzarella @ 2019-07-15  7:44 UTC (permalink / raw)
  To: Jason Wang; +Cc: Michael S. Tsirkin, Stefan Hajnoczi, virtualization, netdev
In-Reply-To: <a514d8a4-3a12-feeb-4467-af7a9fbf5183@redhat.com>

On Fri, Jul 12, 2019 at 06:14:39PM +0800, Jason Wang wrote:
> 
> On 2019/7/12 下午6:00, Stefano Garzarella wrote:
> > On Thu, Jul 11, 2019 at 03:52:21PM -0400, Michael S. Tsirkin wrote:
> > > On Thu, Jul 11, 2019 at 01:41:34PM +0200, Stefano Garzarella wrote:
> > > > On Thu, Jul 11, 2019 at 03:37:00PM +0800, Jason Wang wrote:
> > > > > On 2019/7/10 下午11:37, Stefano Garzarella wrote:
> > > > > > Hi,
> > > > > > as Jason suggested some months ago, I looked better at the virtio-net driver to
> > > > > > understand if we can reuse some parts also in the virtio-vsock driver, since we
> > > > > > have similar challenges (mergeable buffers, page allocation, small
> > > > > > packets, etc.).
> > > > > > 
> > > > > > Initially, I would add the skbuff in the virtio-vsock in order to re-use
> > > > > > receive_*() functions.
> > > > > 
> > > > > Yes, that will be a good step.
> > > > > 
> > > > Okay, I'll go on this way.
> > > > 
> > > > > > Then I would move receive_[small, big, mergeable]() and
> > > > > > add_recvbuf_[small, big, mergeable]() outside of virtio-net driver, in order to
> > > > > > call them also from virtio-vsock. I need to do some refactoring (e.g. leave the
> > > > > > XDP part on the virtio-net driver), but I think it is feasible.
> > > > > > 
> > > > > > The idea is to create a virtio-skb.[h,c] where put these functions and a new
> > > > > > object where stores some attributes needed (e.g. hdr_len ) and status (e.g.
> > > > > > some fields of struct receive_queue).
> > > > > 
> > > > > My understanding is we could be more ambitious here. Do you see any blocker
> > > > > for reusing virtio-net directly? It's better to reuse not only the functions
> > > > > but also the logic like NAPI to avoid re-inventing something buggy and
> > > > > duplicated.
> > > > > 
> > > > These are my concerns:
> > > > - virtio-vsock is not a "net_device", so a lot of code related to
> > > >    ethtool, net devices (MAC address, MTU, speed, VLAN, XDP, offloading) will be
> > > >    not used by virtio-vsock.
> 
> 
> Linux support device other than ethernet, so it should not be a problem.
> 
> 
> > > > 
> > > > - virtio-vsock has a different header. We can consider it as part of
> > > >    virtio_net payload, but it precludes the compatibility with old hosts. This
> > > >    was one of the major doubts that made me think about using only the
> > > >    send/recv skbuff functions, that it shouldn't break the compatibility.
> 
> 
> We can extend the current vnet header helper for it to work for vsock.

Okay, I'll do it.

> 
> 
> > > > 
> > > > > > This is an idea of virtio-skb.h that
> > > > > > I have in mind:
> > > > > >       struct virtskb;
> > > > > 
> > > > > What fields do you want to store in virtskb? It looks to be exist sk_buff is
> > > > > flexible enough to us?
> > > > My idea is to store queues information, like struct receive_queue or
> > > > struct send_queue, and some device attributes (e.g. hdr_len ).
> 
> 
> If you reuse skb or virtnet_info, there is not necessary.
> 

Okay.

> 
> > > > 
> > > > > 
> > > > > >       struct sk_buff *virtskb_receive_small(struct virtskb *vs, ...);
> > > > > >       struct sk_buff *virtskb_receive_big(struct virtskb *vs, ...);
> > > > > >       struct sk_buff *virtskb_receive_mergeable(struct virtskb *vs, ...);
> > > > > > 
> > > > > >       int virtskb_add_recvbuf_small(struct virtskb*vs, ...);
> > > > > >       int virtskb_add_recvbuf_big(struct virtskb *vs, ...);
> > > > > >       int virtskb_add_recvbuf_mergeable(struct virtskb *vs, ...);
> > > > > > 
> > > > > > For the Guest->Host path it should be easier, so maybe I can add a
> > > > > > "virtskb_send(struct virtskb *vs, struct sk_buff *skb)" with a part of the code
> > > > > > of xmit_skb().
> > > > > 
> > > > > I may miss something, but I don't see any thing that prevents us from using
> > > > > xmit_skb() directly.
> > > > > 
> > > > Yes, but my initial idea was to make it more parametric and not related to the
> > > > virtio_net_hdr, so the 'hdr_len' could be a parameter and the
> > > > 'num_buffers' should be handled by the caller.
> > > > 
> > > > > > Let me know if you have in mind better names or if I should put these function
> > > > > > in another place.
> > > > > > 
> > > > > > I would like to leave the control part completely separate, so, for example,
> > > > > > the two drivers will negotiate the features independently and they will call
> > > > > > the right virtskb_receive_*() function based on the negotiation.
> > > > > 
> > > > > If it's one the issue of negotiation, we can simply change the
> > > > > virtnet_probe() to deal with different devices.
> > > > > 
> > > > > 
> > > > > > I already started to work on it, but before to do more steps and send an RFC
> > > > > > patch, I would like to hear your opinion.
> > > > > > Do you think that makes sense?
> > > > > > Do you see any issue or a better solution?
> > > > > 
> > > > > I still think we need to seek a way of adding some codes on virtio-net.c
> > > > > directly if there's no huge different in the processing of TX/RX. That would
> > > > > save us a lot time.
> > > > After the reading of the buffers from the virtqueue I think the process
> > > > is slightly different, because virtio-net will interface with the network
> > > > stack, while virtio-vsock will interface with the vsock-core (socket).
> > > > So the virtio-vsock implements the following:
> > > > - control flow mechanism to avoid to loose packets, informing the peer
> > > >    about the amount of memory available in the receive queue using some
> > > >    fields in the virtio_vsock_hdr
> > > > - de-multiplexing parsing the virtio_vsock_hdr and choosing the right
> > > >    socket depending on the port
> > > > - socket state handling
> 
> 
> I think it's just a branch, for ethernet, go for networking stack. otherwise
> go for vsock core?
> 

Yes, that should work.

So, I should refactor the functions that can be called also from the vsock
core, in order to remove "struct net_device *dev" parameter.
Maybe creating some wrappers for the network stack.

Otherwise I should create a fake net_device for vsock_core.

What do you suggest?

> 
> > > > 
> > > > We can use the virtio-net as transport, but we should add a lot of
> > > > code to skip "net device" stuff when it is used by the virtio-vsock.
> 
> 
> This could be another choice, but consider it was not transparent to the
> admin and require new features, we may seek a transparent solution here.
> 
> 
> > > > This could break something in virtio-net, for this reason, I thought to reuse
> > > > only the send/recv functions starting from the idea to split the virtio-net
> > > > driver in two parts:
> > > > a. one with all stuff related to the network stack
> > > > b. one with the stuff needed to communicate with the host
> > > > 
> > > > And use skbuff to communicate between parts. In this way, virtio-vsock
> > > > can use only the b part.
> > > > 
> > > > Maybe we can do this split in a better way, but I'm not sure it is
> > > > simple.
> > > > 
> > > > Thanks,
> > > > Stefano
> > > Frankly, skb is a huge structure which adds a lot of
> > > overhead. I am not sure that using it is such a great idea
> > > if building a device that does not have to interface
> > > with the networking stack.
> 
> 
> I believe vsock is mainly used for stream performance not for PPS. So the
> impact should be minimal. We can use other metadata, just need branch in
> recv_xxx().
> 

Yes, I think stream performance is the case.

> 
> > Thanks for the advice!
> > 
> > > So I agree with Jason in theory. To clarify, he is basically saying
> > > current implementation is all wrong, it should be a protocol and we
> > > should teach networking stack that there are reliable net devices that
> > > handle just this protocol. We could add a flag in virtio net that
> > > will say it's such a device.
> > > 
> > > Whether it's doable, I don't know, and it's definitely not simple - in
> > > particular you will have to also re-implement existing devices in these
> > > terms, and not just virtio - vmware vsock too.
> 
> 
> Merging vsock protocol to exist networking stack could be a long term goal,
> I believe for the first phase, we can seek to use virtio-net first.
>

Yes, I agree.

> 
> > > 
> > > If you want to do a POC you can add a new address family,
> > > that's easier.
> > Very interesting!
> > I agree with you. In this way we can completely split the protocol
> > logic, from the device.
> > 
> > As you said, it will not simple to do, but can be an opportunity to learn
> > better the Linux networking stack!
> > I'll try to do a PoC with AF_VSOCK2 that will use the virtio-net.
> 
> 
> I suggest to do this step by step:
> 
> 1) use virtio-net but keep some protocol logic
> 
> 2) separate protocol logic and merge it to exist Linux networking stack

Make sense, thanks for the suggestions, I'll try to do these steps!

Thanks,
Stefano

^ permalink raw reply

* Re: [PATCH] ipvs: remove unnecessary space
From: Pablo Neira Ayuso @ 2019-07-15  7:57 UTC (permalink / raw)
  To: yangxingwu
  Cc: wensong, horms, ja, kadlec, fw, davem, netdev, lvs-devel,
	netfilter-devel, coreteam, linux-kernel, joe
In-Reply-To: <20190712130721.7168-1-xingwu.yang@gmail.com>

On Fri, Jul 12, 2019 at 09:07:21PM +0800, yangxingwu wrote:
> this patch removes the extra space and use bitmap_zalloc instead
> 
> Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
> ---
>  net/netfilter/ipvs/ip_vs_mh.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
> index 94d9d34..3229867 100644
> --- a/net/netfilter/ipvs/ip_vs_mh.c
> +++ b/net/netfilter/ipvs/ip_vs_mh.c
> @@ -174,8 +174,7 @@ static int ip_vs_mh_populate(struct ip_vs_mh_state *s,
>  		return 0;
>  	}
>  
> -	table =  kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
> -			 sizeof(unsigned long), GFP_KERNEL);
> +	table = bitmap_zalloc(IP_VS_MH_TAB_SIZE, GFP_KERNEL);

By doing:

        git grep "=  " ...

on the netfilter folders, I see more of these, it would be good if you
fix them at once or, probably, you want to use coccinelle for this.

Thanks.

^ permalink raw reply

* Re: [PATCH 8/8] docs: remove extra conf.py files
From: Mauro Carvalho Chehab @ 2019-07-15  7:57 UTC (permalink / raw)
  To: Markus Heiser
  Cc: Linux Doc Mailing List, Mauro Carvalho Chehab, linux-kernel,
	Jonathan Corbet, Herbert Xu, David S. Miller, David Airlie,
	Daniel Vetter, Maarten Lankhorst, Maxime Ripard, Sean Paul,
	Dmitry Torokhov, Yoshinori Sato, Rich Felker, Jaroslav Kysela,
	Takashi Iwai, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, x86, linux-crypto, dri-devel, linux-input, netdev,
	linux-sh, alsa-devel
In-Reply-To: <e3ff0a8a-6956-3855-07be-9c126df2da2d@darmarit.de>

Em Mon, 15 Jul 2019 08:16:54 +0200
Markus Heiser <markus.heiser@darmarit.de> escreveu:

> Hi Mauro,
> 
> sorry, I havn't tested your patch, but one question ...
> 
> Am 14.07.19 um 17:10 schrieb Mauro Carvalho Chehab:
> > Now that the latex_documents are handled automatically, we can
> > remove those extra conf.py files.  
> 
> We need these conf.py also for compiling books' into HTML.  For this
> the tags.add("subproject") is needed.  Should we realy drop this feature?
> 
> -- Markus --

You're right: adding "subproject" tags is needed for html. Folding this
to patch 7/8 makes both htmldocs and pdfdocs to work with SPHINXDIRS
without the need of a per-subdir conf.py.

Regards,
Mauro

diff --git a/Documentation/sphinx/load_config.py b/Documentation/sphinx/load_config.py
index 75f527ff4c95..e4a04f367b41 100644
--- a/Documentation/sphinx/load_config.py
+++ b/Documentation/sphinx/load_config.py
@@ -51,3 +51,7 @@ def loadConfig(namespace):
             execfile_(config_file, config)
             del config['__file__']
             namespace.update(config)
+        else:
+            config = namespace.copy()
+            config['tags'].add("subproject")
+            namespace.update(config)

Thanks,
Mauro

^ permalink raw reply related

* Re: [PATCH] ipvs: remove unnecessary space
From: Pablo Neira Ayuso @ 2019-07-15  8:26 UTC (permalink / raw)
  To: yangxingwu
  Cc: wensong, horms, ja, kadlec, fw, davem, netdev, lvs-devel,
	netfilter-devel, coreteam, linux-kernel, joe
In-Reply-To: <20190715075703.ak6nk3sbnqksjh72@salvia>

On Mon, Jul 15, 2019 at 09:57:03AM +0200, Pablo Neira Ayuso wrote:
> On Fri, Jul 12, 2019 at 09:07:21PM +0800, yangxingwu wrote:
> > this patch removes the extra space and use bitmap_zalloc instead
> > 
> > Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
> > ---
> >  net/netfilter/ipvs/ip_vs_mh.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
> > index 94d9d34..3229867 100644
> > --- a/net/netfilter/ipvs/ip_vs_mh.c
> > +++ b/net/netfilter/ipvs/ip_vs_mh.c
> > @@ -174,8 +174,7 @@ static int ip_vs_mh_populate(struct ip_vs_mh_state *s,
> >  		return 0;
> >  	}
> >  
> > -	table =  kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
> > -			 sizeof(unsigned long), GFP_KERNEL);
> > +	table = bitmap_zalloc(IP_VS_MH_TAB_SIZE, GFP_KERNEL);
> 
> By doing:
> 
>         git grep "=  " ...
> 
> on the netfilter folders, I see more of these, it would be good if you
> fix them at once or, probably, you want to use coccinelle for this.

If patch subject is "remove unnecessary space" then just remove
unnecessary spaces in the patch, otherwise I suggest you rename this
to "ipvs: use bitmap_zalloc()" or such, since the space removal here
is irrelevant.

^ permalink raw reply

* Re: [PATCH] ipvs: remove unnecessary space
From: Pablo Neira Ayuso @ 2019-07-15  8:27 UTC (permalink / raw)
  To: Simon Horman
  Cc: yangxingwu, wensong, ja, kadlec, fw, davem, netdev, lvs-devel,
	netfilter-devel, coreteam, linux-kernel
In-Reply-To: <20190710080609.smxjqe2d5jyro4hv@verge.net.au>

On Wed, Jul 10, 2019 at 10:06:09AM +0200, Simon Horman wrote:
> On Wed, Jul 10, 2019 at 03:45:52PM +0800, yangxingwu wrote:
> > ---
> >  net/netfilter/ipvs/ip_vs_mh.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
> > index 94d9d34..98e358e 100644
> > --- a/net/netfilter/ipvs/ip_vs_mh.c
> > +++ b/net/netfilter/ipvs/ip_vs_mh.c
> > @@ -174,8 +174,8 @@ static int ip_vs_mh_populate(struct ip_vs_mh_state *s,
> >  		return 0;
> >  	}
> >  
> > -	table =  kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
> > -			 sizeof(unsigned long), GFP_KERNEL);
> > +	table =	kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
> > +			sizeof(unsigned long), GFP_KERNEL);

May I ask one thing? :-)

Please, remove all unnecessary spaces in one go, search for:

        git grep "=  "

in the netfilter tree, and send a v2 for this one.

Thanks.

^ permalink raw reply

* [PATCH] e1000e: Make speed detection on hotplugging cable more reliable
From: Kai-Heng Feng @ 2019-07-15  8:43 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: intel-wired-lan, netdev, linux-kernel, Kai-Heng Feng

After hotplugging an 1Gbps ethernet cable with 1Gbps link partner, the
MII_BMSR may reports 10Mbps, renders the network rather slow.

The issue has much lower fail rate after commit 59653e6497d1 ("e1000e:
Make watchdog use delayed work"), which esssentially introduces some
delay before running the watchdog task.

But there's still a chance that the hotplugging event and the queued
watchdog task gets run at the same time, then the original issue can be
observed once again.

So let's use mod_delayed_work() to add a deterministic 1 second delay
before running watchdog task, after an interrupt.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index e4baa13b3cda..c83bf5349d53 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1780,8 +1780,8 @@ static irqreturn_t e1000_intr_msi(int __always_unused irq, void *data)
 		}
 		/* guard against interrupt when we're going down */
 		if (!test_bit(__E1000_DOWN, &adapter->state))
-			queue_delayed_work(adapter->e1000_workqueue,
-					   &adapter->watchdog_task, 1);
+			mod_delayed_work(adapter->e1000_workqueue,
+					 &adapter->watchdog_task, HZ);
 	}
 
 	/* Reset on uncorrectable ECC error */
@@ -1861,8 +1861,8 @@ static irqreturn_t e1000_intr(int __always_unused irq, void *data)
 		}
 		/* guard against interrupt when we're going down */
 		if (!test_bit(__E1000_DOWN, &adapter->state))
-			queue_delayed_work(adapter->e1000_workqueue,
-					   &adapter->watchdog_task, 1);
+			mod_delayed_work(adapter->e1000_workqueue,
+					 &adapter->watchdog_task, HZ);
 	}
 
 	/* Reset on uncorrectable ECC error */
@@ -1907,8 +1907,8 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data)
 		hw->mac.get_link_status = true;
 		/* guard against interrupt when we're going down */
 		if (!test_bit(__E1000_DOWN, &adapter->state))
-			queue_delayed_work(adapter->e1000_workqueue,
-					   &adapter->watchdog_task, 1);
+			mod_delayed_work(adapter->e1000_workqueue,
+					 &adapter->watchdog_task, HZ);
 	}
 
 	if (!test_bit(__E1000_DOWN, &adapter->state))
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH v2 1/2] rt2x00usb: fix rx queue hang
From: Kalle Valo @ 2019-07-15  8:48 UTC (permalink / raw)
  To: Soeren Moch
  Cc: Stanislaw Gruszka, stable, Helmut Schaa, David S. Miller,
	linux-wireless, netdev, linux-kernel
In-Reply-To: <20190701105314.9707-1-smoch@web.de>

Soeren Moch <smoch@web.de> writes:

> Since commit ed194d136769 ("usb: core: remove local_irq_save() around
>  ->complete() handler") the handler rt2x00usb_interrupt_rxdone() is
> not running with interrupts disabled anymore. So this completion handler
> is not guaranteed to run completely before workqueue processing starts
> for the same queue entry.
> Be sure to set all other flags in the entry correctly before marking
> this entry ready for workqueue processing. This way we cannot miss error
> conditions that need to be signalled from the completion handler to the
> worker thread.
> Note that rt2x00usb_work_rxdone() processes all available entries, not
> only such for which queue_work() was called.
>
> This patch is similar to what commit df71c9cfceea ("rt2x00: fix order
> of entry flags modification") did for TX processing.
>
> This fixes a regression on a RT5370 based wifi stick in AP mode, which
> suddenly stopped data transmission after some period of heavy load. Also
> stopping the hanging hostapd resulted in the error message "ieee80211
> phy0: rt2x00queue_flush_queue: Warning - Queue 14 failed to flush".
> Other operation modes are probably affected as well, this just was
> the used testcase.
>
> Fixes: ed194d136769 ("usb: core: remove local_irq_save() around ->complete() handler")
> Cc: stable@vger.kernel.org # 4.20+
> Signed-off-by: Soeren Moch <smoch@web.de>

I'll queue this for v5.3.

-- 
Kalle Valo

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH] e1000e: Make speed detection on hotplugging cable more reliable
From: Paul Menzel @ 2019-07-15  8:52 UTC (permalink / raw)
  To: Kai-Heng Feng; +Cc: Jeff Kirsher, netdev, intel-wired-lan, linux-kernel
In-Reply-To: <20190715084355.9962-1-kai.heng.feng@canonical.com>

[-- Attachment #1: Type: text/plain, Size: 1158 bytes --]

Dear Kai-Heng,


Thank you for the patch.

On 7/15/19 10:43 AM, Kai-Heng Feng wrote:
> After hotplugging an 1Gbps ethernet cable with 1Gbps link partner, the
> MII_BMSR may reports 10Mbps, renders the network rather slow.

s/may reports/may report/
s/renders/rendering/

> The issue has much lower fail rate after commit 59653e6497d1 ("e1000e:
> Make watchdog use delayed work"), which esssentially introduces some

essentially

> delay before running the watchdog task.
> 
> But there's still a chance that the hotplugging event and the queued
> watchdog task gets run at the same time, then the original issue can be
> observed once again.
> 
> So let's use mod_delayed_work() to add a deterministic 1 second delay
> before running watchdog task, after an interrupt.

I am not clear about the effects for the user. Could you elaborate
please? Does the link now come up up to one second later?

> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>

Any bug URL?

> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)


Kind regards,

Paul


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH] e1000e: Make speed detection on hotplugging cable more reliable
From: Kai Heng Feng @ 2019-07-15  9:00 UTC (permalink / raw)
  To: Paul Menzel; +Cc: Jeff Kirsher, netdev, intel-wired-lan, linux-kernel
In-Reply-To: <017771d5-f168-493a-46a1-88e513941ba1@molgen.mpg.de>

at 4:52 PM, Paul Menzel <pmenzel@molgen.mpg.de> wrote:

> Dear Kai-Heng,
>
>
> Thank you for the patch.
>
> On 7/15/19 10:43 AM, Kai-Heng Feng wrote:
>> After hotplugging an 1Gbps ethernet cable with 1Gbps link partner, the
>> MII_BMSR may reports 10Mbps, renders the network rather slow.
>
> s/may reports/may report/
> s/renders/rendering/

Apparently English isn’t my mother tongue ;)

>
>> The issue has much lower fail rate after commit 59653e6497d1 ("e1000e:
>> Make watchdog use delayed work"), which esssentially introduces some
>
> essentially

Ok.

>
>> delay before running the watchdog task.
>>
>> But there's still a chance that the hotplugging event and the queued
>> watchdog task gets run at the same time, then the original issue can be
>> observed once again.
>>
>> So let's use mod_delayed_work() to add a deterministic 1 second delay
>> before running watchdog task, after an interrupt.
>
> I am not clear about the effects for the user. Could you elaborate
> please? Does the link now come up up to one second later?

Yes, the link will be up on a fixed one second later.

The delay varies between 0 to 2 seconds without this patch.

>
>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>
> Any bug URL?

If maintainers think it’s necessary then I’ll file one.

Kai-Heng

>
>> ---
>>  drivers/net/ethernet/intel/e1000e/netdev.c | 12 ++++++------
>>  1 file changed, 6 insertions(+), 6 deletions(-)
>
>
> Kind regards,
>
> Paul



^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH] e1000e: Make speed detection on hotplugging cable more reliable
From: Paul Menzel @ 2019-07-15  9:06 UTC (permalink / raw)
  To: Kai Heng Feng; +Cc: Jeff Kirsher, netdev, intel-wired-lan, linux-kernel
In-Reply-To: <F9859C57-4F6D-4685-B4B6-D1D7DCB50D27@canonical.com>

[-- Attachment #1: Type: text/plain, Size: 1725 bytes --]

Dear Kai Heng,


(with or without hyphen?)

On 7/15/19 11:00 AM, Kai Heng Feng wrote:
> at 4:52 PM, Paul Menzel <pmenzel@molgen.mpg.de> wrote:

>> On 7/15/19 10:43 AM, Kai-Heng Feng wrote:
>>> After hotplugging an 1Gbps ethernet cable with 1Gbps link partner, the
>>> MII_BMSR may reports 10Mbps, renders the network rather slow.
>>
>> s/may reports/may report/
>> s/renders/rendering/
> 
> Apparently English isn’t my mother tongue ;)

No problem. Mine neither.

>>> The issue has much lower fail rate after commit 59653e6497d1 ("e1000e:
>>> Make watchdog use delayed work"), which esssentially introduces some
>>
>> essentially
> 
> Ok.
> 
>>> delay before running the watchdog task.
>>>
>>> But there's still a chance that the hotplugging event and the queued
>>> watchdog task gets run at the same time, then the original issue can be
>>> observed once again.
>>>
>>> So let's use mod_delayed_work() to add a deterministic 1 second delay
>>> before running watchdog task, after an interrupt.
>>
>> I am not clear about the effects for the user. Could you elaborate
>> please? Does the link now come up up to one second later?
> 
> Yes, the link will be up on a fixed one second later.
> 
> The delay varies between 0 to 2 seconds without this patch.

Is there no other fix? Regarding booting a system fast (less than six
seconds), a fixed one second delay is quite a regression on systems where
it worked before.

>>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>>
>> Any bug URL?
> 
> If maintainers think it’s necessary then I’ll file one.

Not necessary, if there is none. I thought you had one in Launchpad or so.


Kind regards,

Paul


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply

* Re: [PATCH v3 20/24] wireless: Remove call to memset after dma_alloc_coherent
From: Kalle Valo @ 2019-07-15  9:06 UTC (permalink / raw)
  To: Fuqian Huang
  Cc: David S . Miller, Arend van Spriel, Franky Lin, Hante Meuleman,
	Chi-Hsien Lin, Wright Feng, Igor Mitsyanko, Avinash Patil,
	Sergey Matyukevich, ath10k, linux-wireless, netdev, linux-kernel,
	brcm80211-dev-list.pdl, brcm80211-dev-list
In-Reply-To: <20190715031941.7120-1-huangfq.daxian@gmail.com>

Fuqian Huang <huangfq.daxian@gmail.com> writes:

> In commit 518a2f1925c3
> ("dma-mapping: zero memory returned from dma_alloc_*"),
> dma_alloc_coherent has already zeroed the memory.
> So memset is not needed.
>
> Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
> ---
> Changes in v3:
>   - Use actual commit rather than the merge commit in the commit message
>
>  drivers/net/wireless/ath/ath10k/ce.c                     | 5 -----
>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c  | 2 --
>  drivers/net/wireless/quantenna/qtnfmac/pcie/pearl_pcie.c | 2 --
>  drivers/net/wireless/quantenna/qtnfmac/pcie/topaz_pcie.c | 2 --
>  4 files changed, 11 deletions(-)

Via which tree is this supposed to go? Can I take this to
wireless-drivers-next?

-- 
Kalle Valo

^ permalink raw reply

* Re: [PATCH v3 20/24] wireless: Remove call to memset after dma_alloc_coherent
From: Kalle Valo @ 2019-07-15  9:07 UTC (permalink / raw)
  To: Fuqian Huang
  Cc: David S . Miller, Arend van Spriel, Franky Lin, Hante Meuleman,
	Chi-Hsien Lin, Wright Feng, Igor Mitsyanko, Avinash Patil,
	Sergey Matyukevich, ath10k, linux-wireless, netdev, linux-kernel,
	brcm80211-dev-list.pdl, brcm80211-dev-list
In-Reply-To: <20190715031941.7120-1-huangfq.daxian@gmail.com>

Fuqian Huang <huangfq.daxian@gmail.com> writes:

> In commit 518a2f1925c3
> ("dma-mapping: zero memory returned from dma_alloc_*"),
> dma_alloc_coherent has already zeroed the memory.
> So memset is not needed.
>
> Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
> ---
> Changes in v3:
>   - Use actual commit rather than the merge commit in the commit message
>
>  drivers/net/wireless/ath/ath10k/ce.c                     | 5 -----
>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/pcie.c  | 2 --
>  drivers/net/wireless/quantenna/qtnfmac/pcie/pearl_pcie.c | 2 --
>  drivers/net/wireless/quantenna/qtnfmac/pcie/topaz_pcie.c | 2 --
>  4 files changed, 11 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
> index eca87f7c5b6c..294fbc1e89ab 100644
> --- a/drivers/net/wireless/ath/ath10k/ce.c
> +++ b/drivers/net/wireless/ath/ath10k/ce.c
> @@ -1704,9 +1704,6 @@ ath10k_ce_alloc_dest_ring_64(struct ath10k *ar, unsigned int ce_id,
>  	/* Correctly initialize memory to 0 to prevent garbage
>  	 * data crashing system when download firmware
>  	 */
> -	memset(dest_ring->base_addr_owner_space_unaligned, 0,
> -	       nentries * sizeof(struct ce_desc_64) + CE_DESC_RING_ALIGN);

Shouldn't you also remove the comment?

-- 
Kalle Valo

^ permalink raw reply

* [PATCH bpf] samples/bpf: build with -D__TARGET_ARCH_$(SRCARCH)
From: Ilya Leoshkevich @ 2019-07-15  9:11 UTC (permalink / raw)
  To: bpf, netdev; +Cc: gor, heiko.carstens, Ilya Leoshkevich

While $ARCH can be relatively flexible (see Makefile and
tools/scripts/Makefile.arch), $SRCARCH always corresponds to a directory
name under arch/.

Therefore, build samples with -D__TARGET_ARCH_$(SRCARCH), since that
matches the expectations of bpf_helpers.h.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
---
 samples/bpf/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index f90daadfbc89..1d9be26b4edd 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -284,7 +284,7 @@ $(obj)/%.o: $(src)/%.c
 	$(Q)$(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) -I$(obj) \
 		-I$(srctree)/tools/testing/selftests/bpf/ \
 		-D__KERNEL__ -D__BPF_TRACING__ -Wno-unused-value -Wno-pointer-sign \
-		-D__TARGET_ARCH_$(ARCH) -Wno-compare-distinct-pointer-types \
+		-D__TARGET_ARCH_$(SRCARCH) -Wno-compare-distinct-pointer-types \
 		-Wno-gnu-variable-sized-type-not-at-end \
 		-Wno-address-of-packed-member -Wno-tautological-compare \
 		-Wno-unknown-warning-option $(CLANG_ARCH_ARGS) \
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH] ethtool: igb: dump RR2DCDELAY register
From: Michal Kubecek @ 2019-07-15  9:13 UTC (permalink / raw)
  To: Artem Bityutskiy; +Cc: John W . Linville, netdev
In-Reply-To: <20190715065228.88377-1-artem.bityutskiy@linux.intel.com>

On Mon, Jul 15, 2019 at 09:52:28AM +0300, Artem Bityutskiy wrote:
> Decode 'RR2DCDELAY' register which Linux kernel provides starting from version
> 5.3. The corresponding commit in the Linux kernel is:
>     cd502a7f7c9c igb: add RR2DCDELAY to ethtool registers dump
> 
> The RR2DCDELAY register is present in I210 and I211 Intel Gigabit Ethernet
> chips and it stands for "Read Request To Data Completion Delay". Here is how
> this register is described in the I210 datasheet:
> 
> "This field captures the maximum PCIe split time in 16 ns units, which is the
> maximum delay between the read request to the first data completion. This is
> giving an estimation of the PCIe round trip time."
> 
> In practice, this register can be used to measure the time it takes the NIC to
> read data from the host memory.
> 
> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> ---
>  igb.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/igb.c b/igb.c
> index e0ccef9..ab0896f 100644
> --- a/igb.c
> +++ b/igb.c
> @@ -859,6 +859,18 @@ igb_dump_regs(struct ethtool_drvinfo *info, struct ethtool_regs *regs)
>  		"0x03430: TDFPC       (Tx data FIFO packet count)      0x%08X\n",
>  		regs_buff[550]);
>  
> +	/*
> +	 * Starting from kernel version 5.3 the registers dump buffer grew from
> +	 * 739 4-byte words to 740 words, and word 740 contains the RR2DCDLAY

nit: "E" missing here:                                                    ^

> +	 * register.
> +	 */
> +	if (regs->len < 740)
> +		return 0;
> +
> +	fprintf(stdout,
> +		"0x05BF4: RR2DCDELAY  (Max. DMA read delay)            0x%08X\n",
> +		regs_buff[739]);

Showing a delay as hex number doesn't seem very user friendly but that
also applies to many existing registers so it's probably better to be
consistent and perhaps do an overall cleanup later.

Michal

^ permalink raw reply

* Re: [RFC] virtio-net: share receive_*() and add_recvbuf_*() with virtio-vsock
From: Jason Wang @ 2019-07-15  9:16 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, Stefan Hajnoczi, virtualization, netdev
In-Reply-To: <20190715074416.a3s2i5ausognotbn@steredhat>


>>>>>>>        struct sk_buff *virtskb_receive_small(struct virtskb *vs, ...);
>>>>>>>        struct sk_buff *virtskb_receive_big(struct virtskb *vs, ...);
>>>>>>>        struct sk_buff *virtskb_receive_mergeable(struct virtskb *vs, ...);
>>>>>>>
>>>>>>>        int virtskb_add_recvbuf_small(struct virtskb*vs, ...);
>>>>>>>        int virtskb_add_recvbuf_big(struct virtskb *vs, ...);
>>>>>>>        int virtskb_add_recvbuf_mergeable(struct virtskb *vs, ...);
>>>>>>>
>>>>>>> For the Guest->Host path it should be easier, so maybe I can add a
>>>>>>> "virtskb_send(struct virtskb *vs, struct sk_buff *skb)" with a part of the code
>>>>>>> of xmit_skb().
>>>>>> I may miss something, but I don't see any thing that prevents us from using
>>>>>> xmit_skb() directly.
>>>>>>
>>>>> Yes, but my initial idea was to make it more parametric and not related to the
>>>>> virtio_net_hdr, so the 'hdr_len' could be a parameter and the
>>>>> 'num_buffers' should be handled by the caller.
>>>>>
>>>>>>> Let me know if you have in mind better names or if I should put these function
>>>>>>> in another place.
>>>>>>>
>>>>>>> I would like to leave the control part completely separate, so, for example,
>>>>>>> the two drivers will negotiate the features independently and they will call
>>>>>>> the right virtskb_receive_*() function based on the negotiation.
>>>>>> If it's one the issue of negotiation, we can simply change the
>>>>>> virtnet_probe() to deal with different devices.
>>>>>>
>>>>>>
>>>>>>> I already started to work on it, but before to do more steps and send an RFC
>>>>>>> patch, I would like to hear your opinion.
>>>>>>> Do you think that makes sense?
>>>>>>> Do you see any issue or a better solution?
>>>>>> I still think we need to seek a way of adding some codes on virtio-net.c
>>>>>> directly if there's no huge different in the processing of TX/RX. That would
>>>>>> save us a lot time.
>>>>> After the reading of the buffers from the virtqueue I think the process
>>>>> is slightly different, because virtio-net will interface with the network
>>>>> stack, while virtio-vsock will interface with the vsock-core (socket).
>>>>> So the virtio-vsock implements the following:
>>>>> - control flow mechanism to avoid to loose packets, informing the peer
>>>>>     about the amount of memory available in the receive queue using some
>>>>>     fields in the virtio_vsock_hdr
>>>>> - de-multiplexing parsing the virtio_vsock_hdr and choosing the right
>>>>>     socket depending on the port
>>>>> - socket state handling
>>
>> I think it's just a branch, for ethernet, go for networking stack. otherwise
>> go for vsock core?
>>
> Yes, that should work.
>
> So, I should refactor the functions that can be called also from the vsock
> core, in order to remove "struct net_device *dev" parameter.
> Maybe creating some wrappers for the network stack.
>
> Otherwise I should create a fake net_device for vsock_core.
>
> What do you suggest?


I'm not quite sure I get the question. Can you just use the one that 
created by virtio_net?


Thanks

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH] e1000e: Make speed detection on hotplugging cable more reliable
From: Kai Heng Feng @ 2019-07-15  9:21 UTC (permalink / raw)
  To: Paul Menzel; +Cc: Jeff Kirsher, netdev, intel-wired-lan, linux-kernel
In-Reply-To: <37a1e2af-64c6-4515-5dcc-6051e1192636@molgen.mpg.de>

at 5:06 PM, Paul Menzel <pmenzel@molgen.mpg.de> wrote:

> Dear Kai Heng,
>
>
> (with or without hyphen?)
>
> On 7/15/19 11:00 AM, Kai Heng Feng wrote:
>> at 4:52 PM, Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
>>> On 7/15/19 10:43 AM, Kai-Heng Feng wrote:
>>>> After hotplugging an 1Gbps ethernet cable with 1Gbps link partner, the
>>>> MII_BMSR may reports 10Mbps, renders the network rather slow.
>>>
>>> s/may reports/may report/
>>> s/renders/rendering/
>>
>> Apparently English isn’t my mother tongue ;)
>
> No problem. Mine neither.
>
>>>> The issue has much lower fail rate after commit 59653e6497d1 ("e1000e:
>>>> Make watchdog use delayed work"), which esssentially introduces some
>>>
>>> essentially
>>
>> Ok.
>>
>>>> delay before running the watchdog task.
>>>>
>>>> But there's still a chance that the hotplugging event and the queued
>>>> watchdog task gets run at the same time, then the original issue can be
>>>> observed once again.
>>>>
>>>> So let's use mod_delayed_work() to add a deterministic 1 second delay
>>>> before running watchdog task, after an interrupt.
>>>
>>> I am not clear about the effects for the user. Could you elaborate
>>> please? Does the link now come up up to one second later?
>>
>> Yes, the link will be up on a fixed one second later.
>>
>> The delay varies between 0 to 2 seconds without this patch.
>
> Is there no other fix? Regarding booting a system fast (less than six
> seconds), a fixed one second delay is quite a regression on systems where
> it worked before.

This only affects when ethernet cable is hot plugged.

Kai-Heng

>
>>>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>>>
>>> Any bug URL?
>>
>> If maintainers think it’s necessary then I’ll file one.
>
> Not necessary, if there is none. I thought you had one in Launchpad or so.
>
>
> Kind regards,
>
> Paul



^ permalink raw reply

* Re: [RFC bpf-next 0/8] bpf: accelerate insn patching speed
From: Jiong Wang @ 2019-07-15  9:21 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Jiong Wang, Alexei Starovoitov, Daniel Borkmann, Edward Cree,
	Naveen N. Rao, Andrii Nakryiko, Jakub Kicinski, bpf, Networking,
	oss-drivers, Yonghong Song
In-Reply-To: <CAEf4BzaPFbYKUQzu7VoRd7idrqPDMEFF=UEmT2pGf+Lxz06+sA@mail.gmail.com>


Andrii Nakryiko writes:

> On Thu, Jul 11, 2019 at 4:22 AM Jiong Wang <jiong.wang@netronome.com> wrote:
>>
>>
>> Andrii Nakryiko writes:
>>
>> > On Thu, Jul 4, 2019 at 2:31 PM Jiong Wang <jiong.wang@netronome.com> wrote:
>> >>
>> >> This is an RFC based on latest bpf-next about acclerating insn patching
>> >> speed, it is now near the shape of final PATCH set, and we could see the
>> >> changes migrating to list patching would brings, so send out for
>> >> comments. Most of the info are in cover letter. I splitted the code in a
>> >> way to show API migration more easily.
>> >
>> >
>> > Hey Jiong,
>> >
>> >
>> > Sorry, took me a while to get to this and learn more about instruction
>> > patching. Overall this looks good and I think is a good direction.
>> > I'll post high-level feedback here, and some more
>> > implementation-specific ones in corresponding patches.
>>
>> Great, thanks very much for the feedbacks. Most of your feedbacks are
>> hitting those pain points I exactly had ran into. For some of them, I
>> thought similar solutions like yours, but failed due to various
>> reasons. Let's go through them again, I could have missed some important
>> things.
>>
>> Please see my replies below.
>
> Thanks for thoughtful reply :)
>
>>
>> >>
>> >> Test Results
>> >> ===
>> >>   - Full pass on test_verifier/test_prog/test_prog_32 under all three
>> >>     modes (interpreter, JIT, JIT with blinding).
>> >>
>> >>   - Benchmarking shows 10 ~ 15x faster on medium sized prog, and reduce
>> >>     patching time from 5100s (nearly one and a half hour) to less than
>> >>     0.5s for 1M insn patching.
>> >>
>> >> Known Issues
>> >> ===
>> >>   - The following warning is triggered when running scale test which
>> >>     contains 1M insns and patching:
>> >>       warning of mm/page_alloc.c:4639 __alloc_pages_nodemask+0x29e/0x330
>> >>
>> >>     This is caused by existing code, it can be reproduced on bpf-next
>> >>     master with jit blinding enabled, then run scale unit test, it will
>> >>     shown up after half an hour. After this set, patching is very fast, so
>> >>     it shows up quickly.
>> >>
>> >>   - No line info adjustment support when doing insn delete, subprog adj
>> >>     is with bug when doing insn delete as well. Generally, removal of insns
>> >>     could possibly cause remove of entire line or subprog, therefore
>> >>     entries of prog->aux->linfo or env->subprog needs to be deleted. I
>> >>     don't have good idea and clean code for integrating this into the
>> >>     linearization code at the moment, will do more experimenting,
>> >>     appreciate ideas and suggestions on this.
>> >
>> > Is there any specific problem to detect which line info to delete? Or
>> > what am I missing besides careful implementation?
>>
>> Mostly line info and subprog info are range info which covers a range of
>> insns. Deleting insns could causing you adjusting the range or removing one
>> range entirely. subprog info could be fully recalcuated during
>> linearization while line info I need some careful implementation and I
>> failed to have clean code for this during linearization also as said no
>> unit tests to help me understand whether the code is correct or not.
>>
>
> Ok, that's good that it's just about clean implementation. Try to
> implement it as clearly as possible. Then post it here, and if it can
> be improved someone (me?) will try to help to clean it up further.
>
> Not a big expert on line info, so can't comment on that,
> unfortunately. Maybe Yonghong can chime in (cc'ed)
>
>
>> I will described this latter, spent too much time writing the following
>> reply. Might worth an separate discussion thread.
>>
>> >>
>> >>     Insn delete doesn't happen on normal programs, for example Cilium
>> >>     benchmarks, and happens rarely on test_progs, so the test coverage is
>> >>     not good. That's also why this RFC have a full pass on selftest with
>> >>     this known issue.
>> >
>> > I hope you'll add test for deletion (and w/ corresponding line info)
>> > in final patch set :)
>>
>> Will try. Need to spend some time on BTF format.
>> >
>> >>
>> >>   - Could further use mem pool to accelerate the speed, changes are trivial
>> >>     on top of this RFC, and could be 2x extra faster. Not included in this
>> >>     RFC as reducing the algo complexity from quadratic to linear of insn
>> >>     number is the first step.
>> >
>> > Honestly, I think that would add more complexity than necessary, and I
>> > think we can further speed up performance without that, see below.
>> >
>> >>
>> >> Background
>> >> ===
>> >> This RFC aims to accelerate BPF insn patching speed, patching means expand
>> >> one bpf insn at any offset inside bpf prog into a set of new insns, or
>> >> remove insns.
>> >>
>> >> At the moment, insn patching is quadratic of insn number, this is due to
>> >> branch targets of jump insns needs to be adjusted, and the algo used is:
>> >>
>> >>   for insn inside prog
>> >>     patch insn + regeneate bpf prog
>> >>     for insn inside new prog
>> >>       adjust jump target
>> >>
>> >> This is causing significant time spending when a bpf prog requires large
>> >> amount of patching on different insns. Benchmarking shows it could take
>> >> more than half minutes to finish patching when patching number is more
>> >> than 50K, and the time spent could be more than one hour when patching
>> >> number is around 1M.
>> >>
>> >>   15000   :    3s
>> >>   45000   :   29s
>> >>   95000   :  125s
>> >>   195000  :  712s
>> >>   1000000 : 5100s
>> >>
>> >> This RFC introduces new patching infrastructure. Before doing insn
>> >> patching, insns in bpf prog are turned into a singly linked list, insert
>> >> new insns just insert new list node, delete insns just set delete flag.
>> >> And finally, the list is linearized back into array, and branch target
>> >> adjustment is done for all jump insns during linearization. This algo
>> >> brings the time complexity from quadratic to linear of insn number.
>> >>
>> >> Benchmarking shows the new patching infrastructure could be 10 ~ 15x faster
>> >> on medium sized prog, and for a 1M patching it reduce the time from 5100s
>> >> to less than 0.5s.
>> >>
>> >> Patching API
>> >> ===
>> >> Insn patching could happen on two layers inside BPF. One is "core layer"
>> >> where only BPF insns are patched. The other is "verification layer" where
>> >> insns have corresponding aux info as well high level subprog info, so
>> >> insn patching means aux info needs to be patched as well, and subprog info
>> >> needs to be adjusted. BPF prog also has debug info associated, so line info
>> >> should always be updated after insn patching.
>> >>
>> >> So, list creation, destroy, insert, delete is the same for both layer,
>> >> but lineration is different. "verification layer" patching require extra
>> >> work. Therefore the patch APIs are:
>> >>
>> >>    list creation:                bpf_create_list_insn
>> >>    list patch:                   bpf_patch_list_insn
>> >>    list pre-patch:               bpf_prepatch_list_insn
>> >
>> > I think pre-patch name is very confusing, until I read full
>> > description I couldn't understand what it's supposed to be used for.
>> > Speaking of bpf_patch_list_insn, patch is also generic enough to leave
>> > me wondering whether instruction buffer is inserted after instruction,
>> > or instruction is replaced with a bunch of instructions.
>> >
>> > So how about two more specific names:
>> > bpf_patch_list_insn -> bpf_list_insn_replace (meaning replace given
>> > instruction with a list of patch instructions)
>> > bpf_prepatch_list_insn -> bpf_list_insn_prepend (well, I think this
>> > one is pretty clear).
>>
>> My sense on English word is not great, will switch to above which indeed
>> reads more clear.
>>
>> >>    list lineration (core layer): prog = bpf_linearize_list_insn(prog, list)
>> >>    list lineration (veri layer): env = verifier_linearize_list_insn(env, list)
>> >
>> > These two functions are both quite involved, as well as share a lot of
>> > common code. I'd rather have one linearize instruction, that takes env
>> > as an optional parameter. If env is specified (which is the case for
>> > all cases except for constant blinding pass), then adjust aux_data and
>> > subprogs along the way.
>>
>> Two version of lineration and how to unify them was a painpoint to me. I
>> thought to factor out some of the common code out, but it actually doesn't
>> count much, the final size counting + insnsi resize parts are the same,
>> then things start to diverge since the "Copy over insn" loop.
>>
>> verifier layer needs to copy and initialize aux data etc. And jump
>> relocation is different. At core layer, the use case is JIT blinding which
>> could expand an jump_imm insn into a and/or/jump_reg sequence, and the
>
> Sorry, I didn't get what "could expand an jump_imm insn into a
> and/or/jump_reg sequence", maybe you can clarify if I'm missing
> something.
>
> But from your cover letter description, core layer has no jumps at
> all, while verifier has jumps inside patch buffer. So, if you support
> jumps inside of patch buffer, it will automatically work for core
> layer. Or what am I missing?

I meant in core layer (JIT blinding), there is the following patching:

input:
  insn 0             insn 0
  insn 1             insn 1
  jmp_imm   >>       mov_imm  \
  insn 2             xor_imm    insn seq expanded from jmp_imm
  insn 3             jmp_reg  /
                     insn 2
                     insn 3


jmp_imm is the insn that will be patched, and the actually transformation
is to expand it into mov_imm/xor_imm/jmp_reg sequence. "jmp_reg", sitting
at the end of the patch buffer, must jump to the same destination as the
original jmp_imm, so "jmp_reg" is an insn inside patch buffer but should
be relocated, and the jump destination is outside of patch buffer.

This means for core layer (jit blinding), it needs to take care of insn
inside patch buffer.
  
> Just compared two version of linearize side by side. From what I can
> see, unified version could look like this, high-level:
>
> 1. Count final insn count (but see my other suggestions how to avoid
> that altogether). If not changed - exit.
> 2. Realloc insn buffer, copy just instructions (not aux_data yet).
> Build idx_map, if necessary.
> 3. (if env) then bpf_patchable_insn has aux_data, so now do another
> pass and copy it into resulting array.
> 4. (if env) Copy sub info. Though I'd see if we can just reuse old
> ones and just adjust offsets. I'm not sure why we need to allocate new
> array, subprogram count shouldn't change, right?

If there is no dead insn elimination opt, then we could just adjust
offsets. When there is insn deleting, I feel the logic becomes more
complex. One subprog could be completely deleted or partially deleted, so
I feel just recalculate the whole subprog info as a side-product is
much simpler.

> 5. (common) Relocate jumps. Not clear why core layer doesn't care
> about PATCHED (or, alternatively, why verifier layer cares).

See above, in this RFC, core layer care PATCHED during relocating jumps,
and verifier layer doesn't.

> And again, with targets pointer it will look totally different (and
> simpler).

Yes, will see how the code looks.

> 6. (if env) adjust subprogs
> 7. (common) Adjust prog's line info.
>
> The devil is in the details, but I think this will still be better if
> contained in one function if a bunch of `if (env)` checks. Still
> pretty linear.
>
>> jump_reg is at the end of the patch buffer, it should be relocated. While
>> all use case in verifier layer, no jump in the prog will be patched and all
>> new jumps in patch buffer will jump inside the buffer locally so no need to
>> resolve.
>>
>> And yes we could unify them into one and control the diverge using
>> argument, but then where to place the function is an issue. My
>> understanding is verifier.c is designed to be on top of core.c and core.c
>> should not reference and no need to be aware of any verifier specific data
>> structures, for example env or bpf_aux_insn_data etc.
>
> Func prototype where it is. Maybe forward-declare verifier env struct.
> Implementation in verifier.c?
>
>>
>> So, in this RFC, I had choosed to write separate linerization function for
>> core and verifier layer. Does this make sense?
>
> See above. Let's still try to make it better.
>
>>
>> >
>> > This would keep logic less duplicated and shouldn't complexity beyond
>> > few null checks in few places.
>> >
>> >>    list destroy:                 bpf_destroy_list_insn
>> >>
>> >
>> > I'd also add a macro foreach_list_insn instead of explicit for loops
>> > in multiple places. That would also allow to skip deleted instructions
>> > transparently.
>> >
>> >> list patch could change the insn at patch point, it will invalid the aux
>> >
>> > typo: invalid -> invalidate
>>
>> Ack.
>>
>> >
>> >> info at patching point. list pre-patch insert new insns before patch point
>> >> where the insn and associated aux info are not touched, it is used for
>> >> example in convert_ctx_access when generating prologue.
>> >>
>> >> Typical API sequence for one patching pass:
>> >>
>> >>    struct bpf_list_insn list = bpf_create_list_insn(struct bpf_prog);
>> >>    for (elem = list; elem; elem = elem->next)
>> >>       patch_buf = gen_patch_buf_logic;
>> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
>> >>    bpf_prog = bpf_linearize_list_insn(list)
>> >>    bpf_destroy_list_insn(list)
>> >>
>> >> Several patching passes could also share the same list:
>> >>
>> >>    struct bpf_list_insn list = bpf_create_list_insn(struct bpf_prog);
>> >>    for (elem = list; elem; elem = elem->next)
>> >>       patch_buf = gen_patch_buf_logic1;
>> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
>> >>    for (elem = list; elem; elem = elem->next)
>> >>       patch_buf = gen_patch_buf_logic2;
>> >>       elem = bpf_patch_list_insn(elem, patch_buf, cnt);
>> >>    bpf_prog = bpf_linearize_list_insn(list)
>> >>    bpf_destroy_list_insn(list)
>> >>
>> >> but note new inserted insns int early passes won't have aux info except
>> >> zext info. So, if one patch pass requires all aux info updated and
>> >> recalculated for all insns including those pathced, it should first
>> >> linearize the old list, then re-create the list. The RFC always create and
>> >> linearize the list for each migrated patching pass separately.
>> >
>> > I think we should do just one list creation, few passes of patching
>> > and then linearize once. That will save quite a lot of memory
>> > allocation and will speed up a lot of things. All the verifier
>> > patching happens one after the other without any other functionality
>> > in between, so there shouldn't be any problem.
>>
>> Yes, as mentioned above, it is possible and I had tried to do it in an very
>> initial impl. IIRC convert_ctx_access + fixup_bpf_calls could share the
>> same list, but then the 32-bit zero extension insertion pass requires
>> aux.zext_dst set properly for all instructions including those patched
>
> So zext_dst. Seems like it's easily calculatable, so doesn't seem like
> it even needs to be accessed from aux_data.
>
> But. I can see at least two ways to do this:
> 1. those patching passes that care about aux_data, should just do
> extra check for NULL. Because when we adjust insns now, we just leave
> zero-initialized aux_data, except for zext_dst and seen. So it's easy
> to default to them if aux_data is NULL for patchable_insn.
> 2. just allocate and fill them out them when applying patch insns
> buffer. It's not a duplication, we already fill them out during
> patching today. So just do the same, except through malloc()'ed
> pointer instead. At the end they will be copied into linear resulting
> array during linearization (uniformly with non-patched insns).
>
>> one which we need to linearize the list first (as we set zext_dst during
>> linerization), or the other choice is we do the zext_dst initialization
>> during bpf_patch_list_insn, but this then make bpf_patch_list_insn diverge
>> between core and verifier layer.
>
> List construction is much simpler, even if we have to have extra
> check, similar to `if (env) { do_extra(); }`, IMO, it's fine.
>
>>
>> > As for aux_data. We can solve that even more simply and reliably by
>> > storing a pointer along the struct bpf_list_insn
>>
>> This is exactly what I had implemented initially, but then the issue is how
>> to handle aux_data for patched insn? IIRC I was leave it as a NULL pointer,
>> but later found zext_dst info is required for all insns, so I end up
>> duplicating zext_dst in bpf_list_insn.
>
> See above. No duplication. You have a pointer. Whether aux_data is in
> original array or was malloc()'ed, doesn't matter. But no duplication
> of fields.
>
>>
>> This leads me worrying we need to keep duplicating fields there as soon as
>> there is new similar requirements in future patching pass and I thought it
>> might be better to just reference the aux_data inside env using orig_idx,
>> this avoids duplicating information, but we need to make sure used fields
>> inside aux_data for patched insn update-to-date during linearization or
>> patching list.
>>
>> > (btw, how about calling it bpf_patchable_insn?).
>>
>> No preference, will use this one.
>>
>> > Here's how I propose to represent this patchable instruction:
>> >
>> > struct bpf_list_insn {
>> >        struct bpf_insn insn;
>> >        struct bpf_list_insn *next;
>> >        struct bpf_list_insn *target;
>> >        struct bpf_insn_aux_data *aux_data;
>> >        s32 orig_idx; // can repurpose this to have three meanings:
>> >                      // -2 - deleted
>> >                      // -1 - patched/inserted insn
>> >                      // >=0 - original idx
>>
>> I actually had experimented the -2/-1/0 trick, exactly the same number
>> assignment :) IIRC the code was not clear compared with using flag, the
>> reason seems to be:
>>   1. we still need orig_idx of an patched insn somehow, meaning negate the
>>      index.
>
> Not following, original index with be >=0, no?
>
>>   2. somehow somecode need to know whether one insn is deleted or patched
>>      after the negation, so I end up with some ugly code.
>
> So that's why you'll have constants with descriptive name for -2 and -1.
>
>>
>> Anyway, I might had not thought hard enough on this, I will retry using the
>> special index instead of flag, hopefully I could have clean code this time.
>>
>
> Yeah, please try again. All those `orig_idx = insn->orig_idx - 1; if
> (orig_idx >= 0) { ... }` are very confusing.
>
>> > };
>> >
>> > The idea would be as follows:
>> > 1. when creating original list, target pointer will point directly to
>> > a patchable instruction wrapper for jumps/calls. This will allow to
>> > stop tracking and re-calculating jump offsets and instruction indicies
>> > until linearization.
>>
>> Not sure I have followed the idea of "target" pointer. At the moment we are
>> using index mapping array (generated as by-product during coping insn).
>>
>> While the "target" pointer means to during list initialization, each jump
>> insn will have target initialized to the list node of the converted jump
>> destination insn, and all those non-jump insns are with NULL? Then during
>> linearization you assign index to each list node (could be done as
>> by-product of other pass) before insn coping which could then relocate the
>> insn during the coping as the "target" would have final index calculated?
>> Am I following correctly?
>
> Yes, I think you are understanding correctly what I'm saying. For
> implementation, you can do it in few ways, through few passes or with
> some additional data, is less important. See what's cleanest.
>
>>
>> > 2. aux_data is also filled at that point. Later at linearization time
>> > you'd just iterate over all the instructions in final order and copy
>> > original aux_data, if it's present. And then just repace env's
>> > aux_data array at the end, should be very simple and fast.
>>
>> As explained, I am worried making aux_data a pointer will causing
>> duplicating some fields into list_insn if the fields are required for
>> patched insns.
>
> Addressed above, I don't think there will be any duplication, because
> we pass aux_data by pointer.
>
>>
>> > 3. during fix_bpf_calls, zext, ctx rewrite passes, we'll reuse the
>> > same list of instructions and those passes will just keep inserting
>> > instruction buffers. Given we have restriction that all the jumps are
>> > only within patch buffer, it will be trivial to construct proper
>> > patchable instruction wrappers for newly added instructions, with NULL
>> > for aux_data and possibly non-NULL target (if it's a JMP insn).
>> > 4. After those passes, linearize, adjust subprogs (for this you'll
>> > probably still need to create index mapping, right?), copy or create
>> > new aux_data.
>> > 5. Done.
>> >
>> > What do you think? I think this should be overall simpler and faster.
>> > But let me know if I'm missing something.
>>
>> Thanks for all these thoughts, they are very good suggestions and reminds
>> me to revisit some points I had forgotten. I will do the following things:
>>
>>   1. retry the negative index solution to eliminate flag if the result code
>>      could be clean.
>>   2. the "target" pointer seems make sense, it makes list_insn bigger but
>>      normally space trade with time, so I will try to implement it to see
>>      how the code looks like.
>>   3. I still have concerns on making aux_data as pointer. Mostly due to
>>      patched insn will have NULL pointer and in case aux info of patched
>>      insn is required, we need to duplicate info inside list_insn. For
>>      example 32-bit zext opt requires zext_dst.
>>
>
>
> So one more thing I wanted to suggest. I'll try to keep high-level
> suggestions here.
>
> What about having a wrapper for patchable_insn list, where you can
> store some additional data, like final count and whatever else. It
> will eliminate some passes (counting) and will make list handling
> easier (because you can have a dummy head pointer, so no special
> handling of first element

Will try it.

> you had this concern in patch #1, I
> believe). But it will be clear if it's beneficial once implemented.

>> Regards,
>> Jiong
>>
>> >>
>> >> Compared with old patching code, this new infrastructure has much less core
>> >> code, even though the final code has a couple of extra lines but that is
>> >> mostly due to for list based infrastructure, we need to do more error
>> >> checks, so the list and associated aux data structure could be freed when
>> >> errors happens.
>> >>
>> >> Patching Restrictions
>> >> ===
>> >>   - For core layer, the linearization assume no new jumps inside patch buf.
>> >>     Currently, the only user of this layer is jit blinding.
>> >>   - For verifier layer, there could be new jumps inside patch buf, but
>> >>     they should have branch target resolved themselves, meaning new jumps
>> >>     doesn't jump to insns out of the patch buf. This is the case for all
>> >>     existing verifier layer users.
>> >>   - bpf_insn_aux_data for all patched insns including the one at patch
>> >>     point are invalidated, only 32-bit zext info will be recalcuated.
>> >>     If the aux data of insn at patch point needs to be retained, it is
>> >>     purely insn insertion, so need to use the pre-patch API.
>> >>
>> >> I plan to send out a PATCH set once I finished insn deletion line info adj
>> >> support, please have a looks at this RFC, and appreciate feedbacks.
>> >>
>> >> Jiong Wang (8):
>> >>   bpf: introducing list based insn patching infra to core layer
>> >>   bpf: extend list based insn patching infra to verification layer
>> >>   bpf: migrate jit blinding to list patching infra
>> >>   bpf: migrate convert_ctx_accesses to list patching infra
>> >>   bpf: migrate fixup_bpf_calls to list patching infra
>> >>   bpf: migrate zero extension opt to list patching infra
>> >>   bpf: migrate insn remove to list patching infra
>> >>   bpf: delete all those code around old insn patching infrastructure
>> >>
>> >>  include/linux/bpf_verifier.h |   1 -
>> >>  include/linux/filter.h       |  27 +-
>> >>  kernel/bpf/core.c            | 431 +++++++++++++++++-----------
>> >>  kernel/bpf/verifier.c        | 649 +++++++++++++++++++------------------------
>> >>  4 files changed, 580 insertions(+), 528 deletions(-)
>> >>
>> >> --
>> >> 2.7.4
>> >>
>>


^ permalink raw reply

* Re: [PATCH 16/30] net/wireless: Use kmemdup rather than duplicating its implementation
From: Kalle Valo @ 2019-07-15  9:32 UTC (permalink / raw)
  To: Fuqian Huang
  Cc: David S . Miller, Solomon Peachy, linux-wireless, netdev,
	linux-kernel
In-Reply-To: <20190703131614.25408-1-huangfq.daxian@gmail.com>

Fuqian Huang <huangfq.daxian@gmail.com> writes:

> kmemdup is introduced to duplicate a region of memory in a neat way.
> Rather than kmalloc/kzalloc + memset, which the programmer needs to
> write the size twice (sometimes lead to mistakes), kmemdup improves
> readability, leads to smaller code and also reduce the chances of mistakes.
> Suggestion to use kmemdup rather than using kmalloc/kzalloc + memset.
>
> Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>

I assume I can take this to wireless-drivers-next. If not, please
holler.

Fuqian, it would help the maintainers a lot if you could clearly
indicate to which tree the patches are planned to be commited. If I just
see one patch from a 30 patch set I have no clue what is your plan,
either is someone else going to apply the full patchset or the
maintainers should pick respective patches individually.

-- 
Kalle Valo

^ permalink raw reply

* Re: [PATCH v2 17/35] net/wireless: Use kmemdup rather than duplicating its implementation
From: Kalle Valo @ 2019-07-15  9:33 UTC (permalink / raw)
  To: Fuqian Huang
  Cc: David S . Miller, Solomon Peachy, linux-wireless, netdev,
	linux-kernel
In-Reply-To: <20190703162934.32645-1-huangfq.daxian@gmail.com>

Fuqian Huang <huangfq.daxian@gmail.com> writes:

> kmemdup is introduced to duplicate a region of memory in a neat way.
> Rather than kmalloc/kzalloc + memcpy, which the programmer needs to
> write the size twice (sometimes lead to mistakes), kmemdup improves
> readability, leads to smaller code and also reduce the chances of mistakes.
> Suggestion to use kmemdup rather than using kmalloc/kzalloc + memcpy.
>
> Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>

I'm planning to take this.

-- 
Kalle Valo

^ permalink raw reply

* Re: [PATCH] wl3501_cs: remove redundant variable ret
From: Kalle Valo @ 2019-07-15  9:40 UTC (permalink / raw)
  To: Colin King
  Cc: David S . Miller, linux-wireless, netdev, kernel-janitors,
	linux-kernel
In-Reply-To: <20190705103732.30568-1-colin.king@canonical.com>

Colin King <colin.king@canonical.com> writes:

> From: Colin Ian King <colin.king@canonical.com>
>
> The variable ret is being initialized with a value that is never
> read and it is being updated later with a new value that is returned.
> The variable is redundant and can be replaced with a return 0 as
> there are no other return points in this function.
>
> Addresses-Coverity: ("Unused value")
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>  drivers/net/wireless/wl3501_cs.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/net/wireless/wl3501_cs.c b/drivers/net/wireless/wl3501_cs.c
> index a25b17932edb..007bf6803293 100644
> --- a/drivers/net/wireless/wl3501_cs.c
> +++ b/drivers/net/wireless/wl3501_cs.c
> @@ -1226,7 +1226,6 @@ static int wl3501_init_firmware(struct wl3501_card *this)
>  static int wl3501_close(struct net_device *dev)
>  {
>  	struct wl3501_card *this = netdev_priv(dev);
> -	int rc = -ENODEV;

I'll manually fix the commit log with:

s/variable ret/variable rc/

-- 
Kalle Valo

^ permalink raw reply

* Re: [PATCH] ath10k: work around uninitialized vht_pfr variable
From: Kalle Valo @ 2019-07-15  9:44 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Miaoqing Pan, David S. Miller, Rakesh Pillai, Brian Norris,
	Balaji Pothunoori, Wen Gong, Pradeep kumar Chitrapu, Sriram R,
	ath10k, linux-wireless, netdev, linux-kernel, clang-built-linux
In-Reply-To: <20190708125050.3689133-1-arnd@arndb.de>

Arnd Bergmann <arnd@arndb.de> writes:

> As clang points out, the vht_pfr is assigned to a struct member
> without being initialized in one case:
>
> drivers/net/wireless/ath/ath10k/mac.c:7528:7: error: variable 'vht_pfr' is used uninitialized whenever 'if' condition
>       is false [-Werror,-Wsometimes-uninitialized]
>                 if (!ath10k_mac_can_set_bitrate_mask(ar, band, mask,
>                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> drivers/net/wireless/ath/ath10k/mac.c:7551:20: note: uninitialized use occurs here
>                 arvif->vht_pfr = vht_pfr;
>                                  ^~~~~~~
> drivers/net/wireless/ath/ath10k/mac.c:7528:3: note: remove the 'if' if its condition is always true
>                 if (!ath10k_mac_can_set_bitrate_mask(ar, band, mask,
>                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> drivers/net/wireless/ath/ath10k/mac.c:7483:12: note: initialize the variable 'vht_pfr' to silence this warning
>         u8 vht_pfr;
>
> Add an explicit but probably incorrect initialization here.
> I suspect we want a better fix here, but chose this approach to
> illustrate the issue.
>
> Fixes: 8b97b055dc9d ("ath10k: fix failure to set multiple fixed rate")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

I'll queue this for v5.3.

-- 
Kalle Valo

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox