* Re: [PATCH v7 00/10] x86: macrofying inline asm for better compilation
From: Ingo Molnar @ 2018-09-10 13:00 UTC (permalink / raw)
To: Nadav Amit
Cc: Kate Stewart, Peter Zijlstra, Christopher Li,
virtualization@lists.linux-foundation.org, Max Filippov,
Jan Beulich, H. Peter Anvin, Sam Ravnborg, Alok Kataria, X86 ML,
linux-sparse@vger.kernel.org, Ingo Molnar,
linux-xtensa@linux-xtensa.org, Kees Cook, Josh Poimboeuf,
Thomas Gleixner, Juergen Gross, Chris Zankel, Masahiro Yamada,
Greg Kroah-Hartman, LKML, Ph
In-Reply-To: <4EDCB9E5-22C4-4168-A80A-4EC81ECE43EF@vmware.com>
* Nadav Amit <namit@vmware.com> wrote:
> Ping.
Masahiro Yamada noted that some Reviewed-by tags were not added - could you please double check
past mails and add them and re-send against the latest kernel?
Thanks,
Ingo
^ permalink raw reply
* Re: [PATCH] qxl: refactor to use drm_fb_helper_fbdev_setup
From: kbuild test robot @ 2018-09-11 2:22 UTC (permalink / raw)
To: Peter Wu; +Cc: linux-kernel, dri-devel, virtualization, kbuild-all, Dave Airlie
In-Reply-To: <20180910132156.23201-1-peter@lekensteyn.nl>
Hi Peter,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on linus/master]
[also build test WARNING on v4.19-rc3 next-20180910]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Peter-Wu/qxl-refactor-to-use-drm_fb_helper_fbdev_setup/20180911-071413
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__
sparse warnings: (new ones prefixed by >>)
drivers/gpu/drm/qxl/qxl_drv.c:144:9: sparse: undefined identifier 'qxl_fbdev_set_suspend'
drivers/gpu/drm/qxl/qxl_drv.c:181:9: sparse: undefined identifier 'qxl_fbdev_set_suspend'
>> drivers/gpu/drm/qxl/qxl_drv.c:144:30: sparse: call with no type!
drivers/gpu/drm/qxl/qxl_drv.c:181:30: sparse: call with no type!
drivers/gpu/drm/qxl/qxl_drv.c: In function 'qxl_drm_freeze':
drivers/gpu/drm/qxl/qxl_drv.c:144:2: error: implicit declaration of function 'qxl_fbdev_set_suspend'; did you mean 'fb_set_suspend'? [-Werror=implicit-function-declaration]
qxl_fbdev_set_suspend(qdev, 1);
^~~~~~~~~~~~~~~~~~~~~
fb_set_suspend
cc1: some warnings being treated as errors
--
>> drivers/gpu/drm/qxl/qxl_fb.c:166:21: sparse: incorrect type in assignment (different address spaces) @@ expected char const *data @@ got char [noderchar const *data @@
drivers/gpu/drm/qxl/qxl_fb.c:166:21: expected char const *data
drivers/gpu/drm/qxl/qxl_fb.c:166:21: got char [noderef] <asn:2>*
include/linux/overflow.h:251:13: sparse: undefined identifier '__builtin_mul_overflow'
include/linux/overflow.h:251:13: sparse: incorrect type in conditional
include/linux/overflow.h:251:13: got void
include/linux/overflow.h:251:13: sparse: call with no type!
vim +166 drivers/gpu/drm/qxl/qxl_fb.c
130
131 /*
132 * FIXME
133 * It should not be necessary to have a special dirty() callback for fbdev.
134 */
135 static int qxlfb_framebuffer_dirty(struct drm_framebuffer *fb,
136 struct drm_file *file_priv,
137 unsigned flags, unsigned color,
138 struct drm_clip_rect *clips,
139 unsigned num_clips)
140 {
141 struct qxl_device *qdev = fb->dev->dev_private;
142 struct fb_info *info = qdev->fb_helper.fbdev;
143 struct qxl_fb_image qxl_fb_image;
144 struct fb_image *image = &qxl_fb_image.fb_image;
145
146 /* TODO: hard coding 32 bpp */
147 int stride = fb->pitches[0];
148
149 /*
150 * we are using a shadow draw buffer, at qdev->surface0_shadow
151 */
152 image->dx = clips->x1;
153 image->dy = clips->y1;
154 image->width = clips->x2 - clips->x1;
155 image->height = clips->y2 - clips->y1;
156 image->fg_color = 0xffffffff; /* unused, just to avoid uninitialized
157 warnings */
158 image->bg_color = 0;
159 image->depth = 32; /* TODO: take from somewhere? */
160 image->cmap.start = 0;
161 image->cmap.len = 0;
162 image->cmap.red = NULL;
163 image->cmap.green = NULL;
164 image->cmap.blue = NULL;
165 image->cmap.transp = NULL;
> 166 image->data = info->screen_base + (clips->x1 * 4) + (stride * clips->y1);
167
168 qxl_fb_image_init(&qxl_fb_image, qdev, info, NULL);
169 qxl_draw_opaque_fb(&qxl_fb_image, stride);
170
171 return 0;
172 }
173
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
^ permalink raw reply
* Re: [virtio-dev] Re: [PATCH net-next v2 0/5] virtio: support packed ring
From: Tiwei Bie @ 2018-09-11 5:37 UTC (permalink / raw)
To: Jason Wang
Cc: virtio-dev, Michael S. Tsirkin, netdev, linux-kernel,
virtualization, wexu
In-Reply-To: <f5ef0393-cf02-0795-a51a-8783ee300037@redhat.com>
On Mon, Sep 10, 2018 at 11:33:17AM +0800, Jason Wang wrote:
> On 2018年09月10日 11:00, Tiwei Bie wrote:
> > On Fri, Sep 07, 2018 at 09:00:49AM -0400, Michael S. Tsirkin wrote:
> > > On Fri, Sep 07, 2018 at 09:22:25AM +0800, Tiwei Bie wrote:
> > > > On Mon, Aug 27, 2018 at 05:00:40PM +0300, Michael S. Tsirkin wrote:
> > > > > Are there still plans to test the performance with vost pmd?
> > > > > vhost doesn't seem to show a performance gain ...
> > > > >
> > > > I tried some performance tests with vhost PMD. In guest, the
> > > > XDP program will return XDP_DROP directly. And in host, testpmd
> > > > will do txonly fwd.
> > > >
> > > > When burst size is 1 and packet size is 64 in testpmd and
> > > > testpmd needs to iterate 5 Tx queues (but only the first two
> > > > queues are enabled) to prepare and inject packets, I got ~12%
> > > > performance boost (5.7Mpps -> 6.4Mpps). And if the vhost PMD
> > > > is faster (e.g. just need to iterate the first two queues to
> > > > prepare and inject packets), then I got similar performance
> > > > for both rings (~9.9Mpps) (packed ring's performance can be
> > > > lower, because it's more complicated in driver.)
> > > >
> > > > I think packed ring makes vhost PMD faster, but it doesn't make
> > > > the driver faster. In packed ring, the ring is simplified, and
> > > > the handling of the ring in vhost (device) is also simplified,
> > > > but things are not simplified in driver, e.g. although there is
> > > > no desc table in the virtqueue anymore, driver still needs to
> > > > maintain a private desc state table (which is still managed as
> > > > a list in this patch set) to support the out-of-order desc
> > > > processing in vhost (device).
> > > >
> > > > I think this patch set is mainly to make the driver have a full
> > > > functional support for the packed ring, which makes it possible
> > > > to leverage the packed ring feature in vhost (device). But I'm
> > > > not sure whether there is any other better idea, I'd like to
> > > > hear your thoughts. Thanks!
> > > Just this: Jens seems to report a nice gain with virtio and
> > > vhost pmd across the board. Try to compare virtio and
> > > virtio pmd to see what does pmd do better?
> > The virtio PMD (drivers/net/virtio) in DPDK doesn't need to share
> > the virtio ring operation code with other drivers and is highly
> > optimized for network. E.g. in Rx, the Rx burst function won't
> > chain descs. So the ID management for the Rx ring can be quite
> > simple and straightforward, we just need to initialize these IDs
> > when initializing the ring and don't need to change these IDs
> > in data path anymore (the mergable Rx code in that patch set
> > assumes the descs will be written back in order, which should be
> > fixed. I.e., the ID in the desc should be used to index vq->descx[]).
> > The Tx code in that patch set also assumes the descs will be
> > written back by device in order, which should be fixed.
>
> Yes it is. I think I've pointed it out in some early version of pmd patch.
> So I suspect part (or all) of the boost may come from in order feature.
>
> >
> > But in kernel virtio driver, the virtio_ring.c is very generic.
> > The enqueue (virtqueue_add()) and dequeue (virtqueue_get_buf_ctx())
> > functions need to support all the virtio devices and should be
> > able to handle all the possible cases that may happen. So although
> > the packed ring can be very efficient in some cases, currently
> > the room to optimize the performance in kernel's virtio_ring.c
> > isn't that much. If we want to take the fully advantage of the
> > packed ring's efficiency, we need some further e.g. API changes
> > in virtio_ring.c, which shouldn't be part of this patch set.
>
> Could you please share more thoughts on this e.g how to improve the API?
> Notice since the API is shared by both split ring and packed ring, it may
> improve the performance of split ring as well. One can easily imagine a
> batching API, but it does not have many real users now, the only case is the
> XDP transmission which can accept an array of XDP frames.
I don't have detailed thoughts on this yet. But kernel's
virtio_ring.c is quite generic compared with what we did
in virtio PMD.
>
> > So
> > I still think this patch set is mainly to make the kernel virtio
> > driver to have a full functional support of the packed ring, and
> > we can't expect impressive performance boost with it.
>
> We can only gain when virtio ring layout is the bottleneck. If there're
> bottlenecks elsewhere, we probably won't see any increasing in the numbers.
> Vhost-net is an example, and lots of optimizations have proved that virtio
> ring is not the main bottleneck for the current codes. I suspect it also the
> case of virtio driver. Did perf tell us any interesting things in virtio
> driver?
>
> Thanks
>
> >
> > >
> > > > > On Wed, Jul 11, 2018 at 10:27:06AM +0800, Tiwei Bie wrote:
> > > > > > Hello everyone,
> > > > > >
> > > > > > This patch set implements packed ring support in virtio driver.
> > > > > >
> > > > > > Some functional tests have been done with Jason's
> > > > > > packed ring implementation in vhost:
> > > > > >
> > > > > > https://lkml.org/lkml/2018/7/3/33
> > > > > >
> > > > > > Both of ping and netperf worked as expected.
> > > > > >
> > > > > > v1 -> v2:
> > > > > > - Use READ_ONCE() to read event off_wrap and flags together (Jason);
> > > > > > - Add comments related to ccw (Jason);
> > > > > >
> > > > > > RFC (v6) -> v1:
> > > > > > - Avoid extra virtio_wmb() in virtqueue_enable_cb_delayed_packed()
> > > > > > when event idx is off (Jason);
> > > > > > - Fix bufs calculation in virtqueue_enable_cb_delayed_packed() (Jason);
> > > > > > - Test the state of the desc at used_idx instead of last_used_idx
> > > > > > in virtqueue_enable_cb_delayed_packed() (Jason);
> > > > > > - Save wrap counter (as part of queue state) in the return value
> > > > > > of virtqueue_enable_cb_prepare_packed();
> > > > > > - Refine the packed ring definitions in uapi;
> > > > > > - Rebase on the net-next tree;
> > > > > >
> > > > > > RFC v5 -> RFC v6:
> > > > > > - Avoid tracking addr/len/flags when DMA API isn't used (MST/Jason);
> > > > > > - Define wrap counter as bool (Jason);
> > > > > > - Use ALIGN() in vring_init_packed() (Jason);
> > > > > > - Avoid using pointer to track `next` in detach_buf_packed() (Jason);
> > > > > > - Add comments for barriers (Jason);
> > > > > > - Don't enable RING_PACKED on ccw for now (noticed by Jason);
> > > > > > - Refine the memory barrier in virtqueue_poll();
> > > > > > - Add a missing memory barrier in virtqueue_enable_cb_delayed_packed();
> > > > > > - Remove the hacks in virtqueue_enable_cb_prepare_packed();
> > > > > >
> > > > > > RFC v4 -> RFC v5:
> > > > > > - Save DMA addr, etc in desc state (Jason);
> > > > > > - Track used wrap counter;
> > > > > >
> > > > > > RFC v3 -> RFC v4:
> > > > > > - Make ID allocation support out-of-order (Jason);
> > > > > > - Various fixes for EVENT_IDX support;
> > > > > >
> > > > > > RFC v2 -> RFC v3:
> > > > > > - Split into small patches (Jason);
> > > > > > - Add helper virtqueue_use_indirect() (Jason);
> > > > > > - Just set id for the last descriptor of a list (Jason);
> > > > > > - Calculate the prev in virtqueue_add_packed() (Jason);
> > > > > > - Fix/improve desc suppression code (Jason/MST);
> > > > > > - Refine the code layout for XXX_split/packed and wrappers (MST);
> > > > > > - Fix the comments and API in uapi (MST);
> > > > > > - Remove the BUG_ON() for indirect (Jason);
> > > > > > - Some other refinements and bug fixes;
> > > > > >
> > > > > > RFC v1 -> RFC v2:
> > > > > > - Add indirect descriptor support - compile test only;
> > > > > > - Add event suppression supprt - compile test only;
> > > > > > - Move vring_packed_init() out of uapi (Jason, MST);
> > > > > > - Merge two loops into one in virtqueue_add_packed() (Jason);
> > > > > > - Split vring_unmap_one() for packed ring and split ring (Jason);
> > > > > > - Avoid using '%' operator (Jason);
> > > > > > - Rename free_head -> next_avail_idx (Jason);
> > > > > > - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
> > > > > > - Some other refinements and bug fixes;
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > Tiwei Bie (5):
> > > > > > virtio: add packed ring definitions
> > > > > > virtio_ring: support creating packed ring
> > > > > > virtio_ring: add packed ring support
> > > > > > virtio_ring: add event idx support in packed ring
> > > > > > virtio_ring: enable packed ring
> > > > > >
> > > > > > drivers/s390/virtio/virtio_ccw.c | 14 +
> > > > > > drivers/virtio/virtio_ring.c | 1365 ++++++++++++++++++++++------
> > > > > > include/linux/virtio_ring.h | 8 +-
> > > > > > include/uapi/linux/virtio_config.h | 3 +
> > > > > > include/uapi/linux/virtio_ring.h | 43 +
> > > > > > 5 files changed, 1157 insertions(+), 276 deletions(-)
> > > > > >
> > > > > > --
> > > > > > 2.18.0
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > >
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* [PATCH v2] x86/paravirt: Cleanup native_patch()
From: Borislav Petkov @ 2018-09-11 9:15 UTC (permalink / raw)
To: Juergen Gross; +Cc: x86, LKML, virtualization
In-Reply-To: <dfe856d6-a4db-b406-ef18-dba3b6b8c342@suse.com>
When CONFIG_PARAVIRT_SPINLOCKS=n, it fires
arch/x86/kernel/paravirt_patch_64.c: In function ‘native_patch’:
arch/x86/kernel/paravirt_patch_64.c:89:1: warning: label ‘patch_site’ defined but not used [-Wunused-label]
patch_site:
but those labels can simply be removed by directly calling the
respective functions there.
Get rid of local variables too, while at it. Also, simplify function
flow for better readability.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: x86@kernel.org
Cc: virtualization@lists.linux-foundation.org
---
Here it is, rebased ontop of rc3+tip/master.
arch/x86/kernel/paravirt_patch_32.c | 44 ++++++++++-----------------
arch/x86/kernel/paravirt_patch_64.c | 46 +++++++++++------------------
2 files changed, 33 insertions(+), 57 deletions(-)
diff --git a/arch/x86/kernel/paravirt_patch_32.c b/arch/x86/kernel/paravirt_patch_32.c
index d460cbcabcfe..6368c22fa1fa 100644
--- a/arch/x86/kernel/paravirt_patch_32.c
+++ b/arch/x86/kernel/paravirt_patch_32.c
@@ -34,14 +34,10 @@ extern bool pv_is_native_vcpu_is_preempted(void);
unsigned native_patch(u8 type, void *ibuf, unsigned long addr, unsigned len)
{
- const unsigned char *start, *end;
- unsigned ret;
-
#define PATCH_SITE(ops, x) \
- case PARAVIRT_PATCH(ops.x): \
- start = start_##ops##_##x; \
- end = end_##ops##_##x; \
- goto patch_site
+ case PARAVIRT_PATCH(ops.x): \
+ return paravirt_patch_insns(ibuf, len, start_##ops##_##x, end_##ops##_##x)
+
switch (type) {
#ifdef CONFIG_PARAVIRT_XXL
PATCH_SITE(irq, irq_disable);
@@ -54,32 +50,24 @@ unsigned native_patch(u8 type, void *ibuf, unsigned long addr, unsigned len)
PATCH_SITE(mmu, write_cr3);
#endif
#if defined(CONFIG_PARAVIRT_SPINLOCKS)
- case PARAVIRT_PATCH(lock.queued_spin_unlock):
- if (pv_is_native_spin_unlock()) {
- start = start_lock_queued_spin_unlock;
- end = end_lock_queued_spin_unlock;
- goto patch_site;
- }
- goto patch_default;
+ case PARAVIRT_PATCH(lock.queued_spin_unlock):
+ if (pv_is_native_spin_unlock())
+ return paravirt_patch_insns(ibuf, len,
+ start_lock_queued_spin_unlock,
+ end_lock_queued_spin_unlock);
+ break;
- case PARAVIRT_PATCH(lock.vcpu_is_preempted):
- if (pv_is_native_vcpu_is_preempted()) {
- start = start_lock_vcpu_is_preempted;
- end = end_lock_vcpu_is_preempted;
- goto patch_site;
- }
- goto patch_default;
+ case PARAVIRT_PATCH(lock.vcpu_is_preempted):
+ if (pv_is_native_vcpu_is_preempted())
+ return paravirt_patch_insns(ibuf, len,
+ start_lock_vcpu_is_preempted,
+ end_lock_vcpu_is_preempted);
+ break;
#endif
default:
-patch_default: __maybe_unused
- ret = paravirt_patch_default(type, ibuf, addr, len);
- break;
-
-patch_site:
- ret = paravirt_patch_insns(ibuf, len, start, end);
break;
}
#undef PATCH_SITE
- return ret;
+ return paravirt_patch_default(type, ibuf, addr, len);
}
diff --git a/arch/x86/kernel/paravirt_patch_64.c b/arch/x86/kernel/paravirt_patch_64.c
index 5ad5bcda9dc6..7ca9cb726f4d 100644
--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -42,15 +42,11 @@ extern bool pv_is_native_vcpu_is_preempted(void);
unsigned native_patch(u8 type, void *ibuf, unsigned long addr, unsigned len)
{
- const unsigned char *start, *end;
- unsigned ret;
-
#define PATCH_SITE(ops, x) \
- case PARAVIRT_PATCH(ops.x): \
- start = start_##ops##_##x; \
- end = end_##ops##_##x; \
- goto patch_site
- switch(type) {
+ case PARAVIRT_PATCH(ops.x): \
+ return paravirt_patch_insns(ibuf, len, start_##ops##_##x, end_##ops##_##x)
+
+ switch (type) {
#ifdef CONFIG_PARAVIRT_XXL
PATCH_SITE(irq, restore_fl);
PATCH_SITE(irq, save_fl);
@@ -64,32 +60,24 @@ unsigned native_patch(u8 type, void *ibuf, unsigned long addr, unsigned len)
PATCH_SITE(mmu, write_cr3);
#endif
#if defined(CONFIG_PARAVIRT_SPINLOCKS)
- case PARAVIRT_PATCH(lock.queued_spin_unlock):
- if (pv_is_native_spin_unlock()) {
- start = start_lock_queued_spin_unlock;
- end = end_lock_queued_spin_unlock;
- goto patch_site;
- }
- goto patch_default;
+ case PARAVIRT_PATCH(lock.queued_spin_unlock):
+ if (pv_is_native_spin_unlock())
+ return paravirt_patch_insns(ibuf, len,
+ start_lock_queued_spin_unlock,
+ end_lock_queued_spin_unlock);
+ break;
- case PARAVIRT_PATCH(lock.vcpu_is_preempted):
- if (pv_is_native_vcpu_is_preempted()) {
- start = start_lock_vcpu_is_preempted;
- end = end_lock_vcpu_is_preempted;
- goto patch_site;
- }
- goto patch_default;
+ case PARAVIRT_PATCH(lock.vcpu_is_preempted):
+ if (pv_is_native_vcpu_is_preempted())
+ return paravirt_patch_insns(ibuf, len,
+ start_lock_vcpu_is_preempted,
+ end_lock_vcpu_is_preempted);
+ break;
#endif
default:
-patch_default: __maybe_unused
- ret = paravirt_patch_default(type, ibuf, addr, len);
- break;
-
-patch_site:
- ret = paravirt_patch_insns(ibuf, len, start, end);
break;
}
#undef PATCH_SITE
- return ret;
+ return paravirt_patch_default(type, ibuf, addr, len);
}
--
2.17.0.582.gccdcbd54c
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply related
* Re: [PATCH v2] x86/paravirt: Cleanup native_patch()
From: Juergen Gross @ 2018-09-11 9:44 UTC (permalink / raw)
To: Borislav Petkov; +Cc: x86, LKML, virtualization
In-Reply-To: <20180911091510.GA12094@zn.tnic>
On 11/09/18 11:15, Borislav Petkov wrote:
> When CONFIG_PARAVIRT_SPINLOCKS=n, it fires
>
> arch/x86/kernel/paravirt_patch_64.c: In function ‘native_patch’:
> arch/x86/kernel/paravirt_patch_64.c:89:1: warning: label ‘patch_site’ defined but not used [-Wunused-label]
> patch_site:
>
> but those labels can simply be removed by directly calling the
> respective functions there.
>
> Get rid of local variables too, while at it. Also, simplify function
> flow for better readability.
>
> Signed-off-by: Borislav Petkov <bp@suse.de>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: x86@kernel.org
> Cc: virtualization@lists.linux-foundation.org
Reviewed-by: Juergen Gross <jgross@suse.com>
Juergen
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* [PATCH net-next V2 00/11] vhost_net TX batching
From: Jason Wang @ 2018-09-12 3:16 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: mst, kvm, virtualization
Hi all:
This series tries to batch submitting packets to underlayer socket
through msg_control during sendmsg(). This is done by:
1) Doing userspace copy inside vhost_net
2) Build XDP buff
3) Batch at most 64 (VHOST_NET_BATCH) XDP buffs and submit them once
through msg_control during sendmsg().
4) Underlayer sockets can use XDP buffs directly when XDP is enalbed,
or build skb based on XDP buff.
For the packet that can not be built easily with XDP or for the case
that batch submission is hard (e.g sndbuf is limited). We will go for
the previous slow path, passing iov iterator to underlayer socket
through sendmsg() once per packet.
This can help to improve cache utilization and avoid lots of indirect
calls with sendmsg(). It can also co-operate with the batching support
of the underlayer sockets (e.g the case of XDP redirection through
maps).
Testpmd(txonly) in guest shows obvious improvements:
Test /+pps%
XDP_DROP on TAP /+44.8%
XDP_REDIRECT on TAP /+29%
macvtap (skb) /+26%
Netperf TCP_STREAM TX from guest shows obvious improvements on small
packet:
size/session/+thu%/+normalize%
64/ 1/ +2%/ 0%
64/ 2/ +3%/ +1%
64/ 4/ +7%/ +5%
64/ 8/ +8%/ +6%
256/ 1/ +3%/ 0%
256/ 2/ +10%/ +7%
256/ 4/ +26%/ +22%
256/ 8/ +27%/ +23%
512/ 1/ +3%/ +2%
512/ 2/ +19%/ +14%
512/ 4/ +43%/ +40%
512/ 8/ +45%/ +41%
1024/ 1/ +4%/ 0%
1024/ 2/ +27%/ +21%
1024/ 4/ +38%/ +73%
1024/ 8/ +15%/ +24%
2048/ 1/ +10%/ +7%
2048/ 2/ +16%/ +12%
2048/ 4/ 0%/ +2%
2048/ 8/ 0%/ +2%
4096/ 1/ +36%/ +60%
4096/ 2/ -11%/ -26%
4096/ 4/ 0%/ +14%
4096/ 8/ 0%/ +4%
16384/ 1/ -1%/ +5%
16384/ 2/ 0%/ +2%
16384/ 4/ 0%/ -3%
16384/ 8/ 0%/ +4%
65535/ 1/ 0%/ +10%
65535/ 2/ 0%/ +8%
65535/ 4/ 0%/ +1%
65535/ 8/ 0%/ +3%
Please review.
Thanks
Changes from V1:
- don't hold page refcnt for XDP_DROP
- release page if build_skb() fails
- introduce a helper to build skb for the path of no batching mode
- rename tun_do_xdp() as tun_xdp_act() and use negative for reporting
errors instead of a pointer to integer
- introduce a num field in tun_msg_ctl
- introduce a new structure for storing metadata in the head of XDP
packet
- do not store batche XDP inside vhoet_net_virtqueue
Jason Wang (11):
net: sock: introduce SOCK_XDP
tuntap: switch to use XDP_PACKET_HEADROOM
tuntap: enable bh early during processing XDP
tuntap: simplify error handling in tun_build_skb()
tuntap: tweak on the path of skb XDP case in tun_build_skb()
tuntap: split out XDP logic
tuntap: move XDP flushing out of tun_do_xdp()
tun: switch to new type of msg_control
tuntap: accept an array of XDP buffs through sendmsg()
tap: accept an array of XDP buffs through sendmsg()
vhost_net: batch submitting XDP buffers to underlayer sockets
drivers/net/tap.c | 88 +++++++++++++-
drivers/net/tun.c | 267 ++++++++++++++++++++++++++++++++---------
drivers/vhost/net.c | 181 +++++++++++++++++++++++++---
include/linux/if_tun.h | 14 +++
include/net/sock.h | 1 +
5 files changed, 471 insertions(+), 80 deletions(-)
--
2.17.1
^ permalink raw reply
* [PATCH net-next V2 01/11] net: sock: introduce SOCK_XDP
From: Jason Wang @ 2018-09-12 3:16 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: mst, kvm, virtualization
In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com>
This patch introduces a new sock flag - SOCK_XDP. This will be used
for notifying the upper layer that XDP program is attached on the
lower socket, and requires for extra headroom.
TUN will be the first user.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/tun.c | 19 +++++++++++++++++++
include/net/sock.h | 1 +
2 files changed, 20 insertions(+)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index ebd07ad82431..2c548bd20393 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -869,6 +869,9 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
tun_napi_init(tun, tfile, napi);
}
+ if (rtnl_dereference(tun->xdp_prog))
+ sock_set_flag(&tfile->sk, SOCK_XDP);
+
tun_set_real_num_queues(tun);
/* device is allowed to go away first, so no need to hold extra
@@ -1241,13 +1244,29 @@ static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog,
struct netlink_ext_ack *extack)
{
struct tun_struct *tun = netdev_priv(dev);
+ struct tun_file *tfile;
struct bpf_prog *old_prog;
+ int i;
old_prog = rtnl_dereference(tun->xdp_prog);
rcu_assign_pointer(tun->xdp_prog, prog);
if (old_prog)
bpf_prog_put(old_prog);
+ for (i = 0; i < tun->numqueues; i++) {
+ tfile = rtnl_dereference(tun->tfiles[i]);
+ if (prog)
+ sock_set_flag(&tfile->sk, SOCK_XDP);
+ else
+ sock_reset_flag(&tfile->sk, SOCK_XDP);
+ }
+ list_for_each_entry(tfile, &tun->disabled, next) {
+ if (prog)
+ sock_set_flag(&tfile->sk, SOCK_XDP);
+ else
+ sock_reset_flag(&tfile->sk, SOCK_XDP);
+ }
+
return 0;
}
diff --git a/include/net/sock.h b/include/net/sock.h
index 433f45fc2d68..38cae35f6e16 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -800,6 +800,7 @@ enum sock_flags {
SOCK_SELECT_ERR_QUEUE, /* Wake select on error queue */
SOCK_RCU_FREE, /* wait rcu grace period in sk_destruct() */
SOCK_TXTIME,
+ SOCK_XDP, /* XDP is attached */
};
#define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE))
--
2.17.1
^ permalink raw reply related
* [PATCH net-next V2 02/11] tuntap: switch to use XDP_PACKET_HEADROOM
From: Jason Wang @ 2018-09-12 3:17 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: mst, kvm, virtualization
In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/tun.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 2c548bd20393..d3677a544b56 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -113,7 +113,6 @@ do { \
} while (0)
#endif
-#define TUN_HEADROOM 256
#define TUN_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD)
/* TUN device flags */
@@ -1654,7 +1653,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
rcu_read_lock();
xdp_prog = rcu_dereference(tun->xdp_prog);
if (xdp_prog)
- pad += TUN_HEADROOM;
+ pad += XDP_PACKET_HEADROOM;
buflen += SKB_DATA_ALIGN(len + pad);
rcu_read_unlock();
--
2.17.1
^ permalink raw reply related
* [PATCH net-next V2 03/11] tuntap: enable bh early during processing XDP
From: Jason Wang @ 2018-09-12 3:17 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: mst, kvm, virtualization
In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com>
This patch move the bh enabling a little bit earlier, this will be
used for factoring out the core XDP logic of tuntap.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/tun.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index d3677a544b56..372caf7d67d9 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1726,22 +1726,18 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
goto err_xdp;
}
}
+ rcu_read_unlock();
+ local_bh_enable();
skb = build_skb(buf, buflen);
- if (!skb) {
- rcu_read_unlock();
- local_bh_enable();
+ if (!skb)
return ERR_PTR(-ENOMEM);
- }
skb_reserve(skb, pad - delta);
skb_put(skb, len);
get_page(alloc_frag->page);
alloc_frag->offset += buflen;
- rcu_read_unlock();
- local_bh_enable();
-
return skb;
err_redirect:
--
2.17.1
^ permalink raw reply related
* [PATCH net-next V2 04/11] tuntap: simplify error handling in tun_build_skb()
From: Jason Wang @ 2018-09-12 3:17 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: mst, kvm, virtualization
In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com>
There's no need to duplicate page get logic in each action. So this
patch tries to get page and calculate the offset before processing XDP
actions (except for XDP_DROP), and undo them when meet errors (we
don't care the performance on errors). This will be used for factoring
out XDP logic.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/tun.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 372caf7d67d9..257cf7342d54 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1701,17 +1701,13 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
xdp_do_flush_map();
if (err)
goto err_redirect;
- rcu_read_unlock();
- local_bh_enable();
- return NULL;
+ goto out;
case XDP_TX:
get_page(alloc_frag->page);
alloc_frag->offset += buflen;
if (tun_xdp_tx(tun->dev, &xdp) < 0)
goto err_redirect;
- rcu_read_unlock();
- local_bh_enable();
- return NULL;
+ goto out;
case XDP_PASS:
delta = orig_data - xdp.data;
len = xdp.data_end - xdp.data;
@@ -1742,7 +1738,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
err_redirect:
put_page(alloc_frag->page);
-err_xdp:
+out:
rcu_read_unlock();
local_bh_enable();
this_cpu_inc(tun->pcpu_stats->rx_dropped);
--
2.17.1
^ permalink raw reply related
* [PATCH net-next V2 05/11] tuntap: tweak on the path of skb XDP case in tun_build_skb()
From: Jason Wang @ 2018-09-12 3:17 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: mst, kvm, virtualization
In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com>
If we're sure not to go native XDP, there's no need for several things
like bh and rcu stuffs. So this patch introduces a helper to build skb
and hold page refcnt. When we found we will go through skb path, build
skb directly.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/tun.c | 39 ++++++++++++++++++++++++---------------
1 file changed, 24 insertions(+), 15 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 257cf7342d54..946c6148ed75 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1635,6 +1635,23 @@ static bool tun_can_build_skb(struct tun_struct *tun, struct tun_file *tfile,
return true;
}
+static struct sk_buff *__tun_build_skb(struct page_frag *alloc_frag, char *buf,
+ int buflen, int len, int pad, int delta)
+{
+ struct sk_buff *skb = build_skb(buf, buflen);
+
+ if (!skb)
+ return ERR_PTR(-ENOMEM);
+
+ skb_reserve(skb, pad - delta);
+ skb_put(skb, len);
+
+ get_page(alloc_frag->page);
+ alloc_frag->offset += buflen;
+
+ return skb;
+}
+
static struct sk_buff *tun_build_skb(struct tun_struct *tun,
struct tun_file *tfile,
struct iov_iter *from,
@@ -1642,7 +1659,6 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
int len, int *skb_xdp)
{
struct page_frag *alloc_frag = ¤t->task_frag;
- struct sk_buff *skb;
struct bpf_prog *xdp_prog;
int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
unsigned int delta = 0;
@@ -1672,10 +1688,12 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
* of xdp_prog above, this should be rare and for simplicity
* we do XDP on skb in case the headroom is not enough.
*/
- if (hdr->gso_type || !xdp_prog)
+ if (hdr->gso_type || !xdp_prog) {
*skb_xdp = 1;
- else
- *skb_xdp = 0;
+ return __tun_build_skb(alloc_frag, buf, buflen, len, pad, delta);
+ }
+
+ *skb_xdp = 0;
local_bh_disable();
rcu_read_lock();
@@ -1719,22 +1737,13 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
trace_xdp_exception(tun->dev, xdp_prog, act);
/* fall through */
case XDP_DROP:
- goto err_xdp;
+ goto out;
}
}
rcu_read_unlock();
local_bh_enable();
- skb = build_skb(buf, buflen);
- if (!skb)
- return ERR_PTR(-ENOMEM);
-
- skb_reserve(skb, pad - delta);
- skb_put(skb, len);
- get_page(alloc_frag->page);
- alloc_frag->offset += buflen;
-
- return skb;
+ return __tun_build_skb(alloc_frag, buf, buflen, len, pad, delta);
err_redirect:
put_page(alloc_frag->page);
--
2.17.1
^ permalink raw reply related
* [PATCH net-next V2 06/11] tuntap: split out XDP logic
From: Jason Wang @ 2018-09-12 3:17 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: mst, kvm, virtualization
In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com>
This patch split out XDP logic into a single function. This make it to
be reused by XDP batching path in the following patch.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/tun.c | 88 +++++++++++++++++++++++++++--------------------
1 file changed, 51 insertions(+), 37 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 946c6148ed75..14fe94098180 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1636,14 +1636,14 @@ static bool tun_can_build_skb(struct tun_struct *tun, struct tun_file *tfile,
}
static struct sk_buff *__tun_build_skb(struct page_frag *alloc_frag, char *buf,
- int buflen, int len, int pad, int delta)
+ int buflen, int len, int pad)
{
struct sk_buff *skb = build_skb(buf, buflen);
if (!skb)
return ERR_PTR(-ENOMEM);
- skb_reserve(skb, pad - delta);
+ skb_reserve(skb, pad);
skb_put(skb, len);
get_page(alloc_frag->page);
@@ -1652,6 +1652,39 @@ static struct sk_buff *__tun_build_skb(struct page_frag *alloc_frag, char *buf,
return skb;
}
+static int tun_xdp_act(struct tun_struct *tun, struct bpf_prog *xdp_prog,
+ struct xdp_buff *xdp, u32 act)
+{
+ int err;
+
+ switch (act) {
+ case XDP_REDIRECT:
+ err = xdp_do_redirect(tun->dev, xdp, xdp_prog);
+ xdp_do_flush_map();
+ if (err)
+ return err;
+ break;
+ case XDP_TX:
+ err = tun_xdp_tx(tun->dev, xdp);
+ if (err < 0)
+ return err;
+ break;
+ case XDP_PASS:
+ break;
+ default:
+ bpf_warn_invalid_xdp_action(act);
+ /* fall through */
+ case XDP_ABORTED:
+ trace_xdp_exception(tun->dev, xdp_prog, act);
+ /* fall through */
+ case XDP_DROP:
+ this_cpu_inc(tun->pcpu_stats->rx_dropped);
+ break;
+ }
+
+ return act;
+}
+
static struct sk_buff *tun_build_skb(struct tun_struct *tun,
struct tun_file *tfile,
struct iov_iter *from,
@@ -1661,10 +1694,10 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
struct page_frag *alloc_frag = ¤t->task_frag;
struct bpf_prog *xdp_prog;
int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
- unsigned int delta = 0;
char *buf;
size_t copied;
- int err, pad = TUN_RX_PAD;
+ int pad = TUN_RX_PAD;
+ int err = 0;
rcu_read_lock();
xdp_prog = rcu_dereference(tun->xdp_prog);
@@ -1690,7 +1723,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
*/
if (hdr->gso_type || !xdp_prog) {
*skb_xdp = 1;
- return __tun_build_skb(alloc_frag, buf, buflen, len, pad, delta);
+ return __tun_build_skb(alloc_frag, buf, buflen, len, pad);
}
*skb_xdp = 0;
@@ -1698,9 +1731,8 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
local_bh_disable();
rcu_read_lock();
xdp_prog = rcu_dereference(tun->xdp_prog);
- if (xdp_prog && !*skb_xdp) {
+ if (xdp_prog) {
struct xdp_buff xdp;
- void *orig_data;
u32 act;
xdp.data_hard_start = buf;
@@ -1708,49 +1740,31 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
xdp_set_data_meta_invalid(&xdp);
xdp.data_end = xdp.data + len;
xdp.rxq = &tfile->xdp_rxq;
- orig_data = xdp.data;
- act = bpf_prog_run_xdp(xdp_prog, &xdp);
- switch (act) {
- case XDP_REDIRECT:
- get_page(alloc_frag->page);
- alloc_frag->offset += buflen;
- err = xdp_do_redirect(tun->dev, &xdp, xdp_prog);
- xdp_do_flush_map();
- if (err)
- goto err_redirect;
- goto out;
- case XDP_TX:
+ act = bpf_prog_run_xdp(xdp_prog, &xdp);
+ if (act == XDP_REDIRECT || act == XDP_TX) {
get_page(alloc_frag->page);
alloc_frag->offset += buflen;
- if (tun_xdp_tx(tun->dev, &xdp) < 0)
- goto err_redirect;
- goto out;
- case XDP_PASS:
- delta = orig_data - xdp.data;
- len = xdp.data_end - xdp.data;
- break;
- default:
- bpf_warn_invalid_xdp_action(act);
- /* fall through */
- case XDP_ABORTED:
- trace_xdp_exception(tun->dev, xdp_prog, act);
- /* fall through */
- case XDP_DROP:
- goto out;
}
+ err = tun_xdp_act(tun, xdp_prog, &xdp, act);
+ if (err < 0)
+ goto err_xdp;
+ if (err != XDP_PASS)
+ goto out;
+
+ pad = xdp.data - xdp.data_hard_start;
+ len = xdp.data_end - xdp.data;
}
rcu_read_unlock();
local_bh_enable();
- return __tun_build_skb(alloc_frag, buf, buflen, len, pad, delta);
+ return __tun_build_skb(alloc_frag, buf, buflen, len, pad);
-err_redirect:
+err_xdp:
put_page(alloc_frag->page);
out:
rcu_read_unlock();
local_bh_enable();
- this_cpu_inc(tun->pcpu_stats->rx_dropped);
return NULL;
}
--
2.17.1
^ permalink raw reply related
* [PATCH net-next V2 07/11] tuntap: move XDP flushing out of tun_do_xdp()
From: Jason Wang @ 2018-09-12 3:17 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: mst, kvm, virtualization
In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com>
This will allow adding batch flushing on top.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/tun.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 14fe94098180..3ae539374f6b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1660,7 +1660,6 @@ static int tun_xdp_act(struct tun_struct *tun, struct bpf_prog *xdp_prog,
switch (act) {
case XDP_REDIRECT:
err = xdp_do_redirect(tun->dev, xdp, xdp_prog);
- xdp_do_flush_map();
if (err)
return err;
break;
@@ -1749,6 +1748,8 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
err = tun_xdp_act(tun, xdp_prog, &xdp, act);
if (err < 0)
goto err_xdp;
+ if (err == XDP_REDIRECT)
+ xdp_do_flush_map();
if (err != XDP_PASS)
goto out;
--
2.17.1
^ permalink raw reply related
* [PATCH net-next V2 08/11] tun: switch to new type of msg_control
From: Jason Wang @ 2018-09-12 3:17 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: mst, kvm, virtualization
In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com>
This patch introduces to a new tun/tap specific msg_control:
#define TUN_MSG_UBUF 1
#define TUN_MSG_PTR 2
struct tun_msg_ctl {
int type;
void *ptr;
};
This allows us to pass different kinds of msg_control through
sendmsg(). The first supported type is ubuf (TUN_MSG_UBUF) which will
be used by the existed vhost_net zerocopy code. The second is XDP
buff, which allows vhost_net to pass XDP buff to TUN. This could be
used to implement accepting an array of XDP buffs from vhost_net in
the following patches.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/tap.c | 18 ++++++++++++------
drivers/net/tun.c | 6 +++++-
drivers/vhost/net.c | 7 +++++--
include/linux/if_tun.h | 14 ++++++++++++++
4 files changed, 36 insertions(+), 9 deletions(-)
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index f0f7cd977667..7996ed7cbf18 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -619,7 +619,7 @@ static inline struct sk_buff *tap_alloc_skb(struct sock *sk, size_t prepad,
#define TAP_RESERVE HH_DATA_OFF(ETH_HLEN)
/* Get packet from user space buffer */
-static ssize_t tap_get_user(struct tap_queue *q, struct msghdr *m,
+static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
struct iov_iter *from, int noblock)
{
int good_linear = SKB_MAX_HEAD(TAP_RESERVE);
@@ -663,7 +663,7 @@ static ssize_t tap_get_user(struct tap_queue *q, struct msghdr *m,
if (unlikely(len < ETH_HLEN))
goto err;
- if (m && m->msg_control && sock_flag(&q->sk, SOCK_ZEROCOPY)) {
+ if (msg_control && sock_flag(&q->sk, SOCK_ZEROCOPY)) {
struct iov_iter i;
copylen = vnet_hdr.hdr_len ?
@@ -724,11 +724,11 @@ static ssize_t tap_get_user(struct tap_queue *q, struct msghdr *m,
tap = rcu_dereference(q->tap);
/* copy skb_ubuf_info for callback when skb has no error */
if (zerocopy) {
- skb_shinfo(skb)->destructor_arg = m->msg_control;
+ skb_shinfo(skb)->destructor_arg = msg_control;
skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
- } else if (m && m->msg_control) {
- struct ubuf_info *uarg = m->msg_control;
+ } else if (msg_control) {
+ struct ubuf_info *uarg = msg_control;
uarg->callback(uarg, false);
}
@@ -1150,7 +1150,13 @@ static int tap_sendmsg(struct socket *sock, struct msghdr *m,
size_t total_len)
{
struct tap_queue *q = container_of(sock, struct tap_queue, sock);
- return tap_get_user(q, m, &m->msg_iter, m->msg_flags & MSG_DONTWAIT);
+ struct tun_msg_ctl *ctl = m->msg_control;
+
+ if (ctl && ctl->type != TUN_MSG_UBUF)
+ return -EINVAL;
+
+ return tap_get_user(q, ctl ? ctl->ptr : NULL, &m->msg_iter,
+ m->msg_flags & MSG_DONTWAIT);
}
static int tap_recvmsg(struct socket *sock, struct msghdr *m,
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 3ae539374f6b..89779b58c7ca 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2431,11 +2431,15 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
int ret;
struct tun_file *tfile = container_of(sock, struct tun_file, socket);
struct tun_struct *tun = tun_get(tfile);
+ struct tun_msg_ctl *ctl = m->msg_control;
if (!tun)
return -EBADFD;
- ret = tun_get_user(tun, tfile, m->msg_control, &m->msg_iter,
+ if (ctl && ctl->type != TUN_MSG_UBUF)
+ return -EINVAL;
+
+ ret = tun_get_user(tun, tfile, ctl ? ctl->ptr : NULL, &m->msg_iter,
m->msg_flags & MSG_DONTWAIT,
m->msg_flags & MSG_MORE);
tun_put(tun);
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 4e656f89cb22..fb01ce6d981c 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -620,6 +620,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
.msg_controllen = 0,
.msg_flags = MSG_DONTWAIT,
};
+ struct tun_msg_ctl ctl;
size_t len, total_len = 0;
int err;
struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
@@ -664,8 +665,10 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
ubuf->ctx = nvq->ubufs;
ubuf->desc = nvq->upend_idx;
refcount_set(&ubuf->refcnt, 1);
- msg.msg_control = ubuf;
- msg.msg_controllen = sizeof(ubuf);
+ msg.msg_control = &ctl;
+ ctl.type = TUN_MSG_UBUF;
+ ctl.ptr = ubuf;
+ msg.msg_controllen = sizeof(ctl);
ubufs = nvq->ubufs;
atomic_inc(&ubufs->refcount);
nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index 3d2996dc7d85..12e3eebf0ce6 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -16,9 +16,23 @@
#define __IF_TUN_H
#include <uapi/linux/if_tun.h>
+#include <uapi/linux/virtio_net.h>
#define TUN_XDP_FLAG 0x1UL
+#define TUN_MSG_UBUF 1
+#define TUN_MSG_PTR 2
+struct tun_msg_ctl {
+ unsigned short type;
+ unsigned short num;
+ void *ptr;
+};
+
+struct tun_xdp_hdr {
+ int buflen;
+ struct virtio_net_hdr gso;
+};
+
#if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
struct socket *tun_get_socket(struct file *);
struct ptr_ring *tun_get_tx_ring(struct file *file);
--
2.17.1
^ permalink raw reply related
* [PATCH net-next V2 09/11] tuntap: accept an array of XDP buffs through sendmsg()
From: Jason Wang @ 2018-09-12 3:17 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: mst, kvm, virtualization
In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com>
This patch implement TUN_MSG_PTR msg_control type. This type allows
the caller to pass an array of XDP buffs to tuntap through ptr field
of the tun_msg_control. If an XDP program is attached, tuntap can run
XDP program directly. If not, tuntap will build skb and do a fast
receiving since part of the work has been done by vhost_net.
This will avoid lots of indirect calls thus improves the icache
utilization and allows to do XDP batched flushing when doing XDP
redirection.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/tun.c | 117 ++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 114 insertions(+), 3 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 89779b58c7ca..2a2cd35853b7 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2426,22 +2426,133 @@ static void tun_sock_write_space(struct sock *sk)
kill_fasync(&tfile->fasync, SIGIO, POLL_OUT);
}
+static int tun_xdp_one(struct tun_struct *tun,
+ struct tun_file *tfile,
+ struct xdp_buff *xdp, int *flush)
+{
+ struct tun_xdp_hdr *hdr = xdp->data_hard_start;
+ struct virtio_net_hdr *gso = &hdr->gso;
+ struct tun_pcpu_stats *stats;
+ struct bpf_prog *xdp_prog;
+ struct sk_buff *skb = NULL;
+ u32 rxhash = 0, act;
+ int buflen = hdr->buflen;
+ int err = 0;
+ bool skb_xdp = false;
+
+ xdp_prog = rcu_dereference(tun->xdp_prog);
+ if (xdp_prog) {
+ if (gso->gso_type) {
+ skb_xdp = true;
+ goto build;
+ }
+ xdp_set_data_meta_invalid(xdp);
+ xdp->rxq = &tfile->xdp_rxq;
+
+ act = bpf_prog_run_xdp(xdp_prog, xdp);
+ err = tun_xdp_act(tun, xdp_prog, xdp, act);
+ if (err < 0) {
+ put_page(virt_to_head_page(xdp->data));
+ return err;
+ }
+
+ switch (err) {
+ case XDP_REDIRECT:
+ *flush = true;
+ /* fall through */
+ case XDP_TX:
+ return 0;
+ case XDP_PASS:
+ break;
+ default:
+ put_page(virt_to_head_page(xdp->data));
+ return 0;
+ }
+ }
+
+build:
+ skb = build_skb(xdp->data_hard_start, buflen);
+ if (!skb) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ skb_reserve(skb, xdp->data - xdp->data_hard_start);
+ skb_put(skb, xdp->data_end - xdp->data);
+
+ if (virtio_net_hdr_to_skb(skb, gso, tun_is_little_endian(tun))) {
+ this_cpu_inc(tun->pcpu_stats->rx_frame_errors);
+ kfree_skb(skb);
+ err = -EINVAL;
+ goto out;
+ }
+
+ skb->protocol = eth_type_trans(skb, tun->dev);
+ skb_reset_network_header(skb);
+ skb_probe_transport_header(skb, 0);
+
+ if (skb_xdp) {
+ err = do_xdp_generic(xdp_prog, skb);
+ if (err != XDP_PASS)
+ goto out;
+ }
+
+ if (!rcu_dereference(tun->steering_prog))
+ rxhash = __skb_get_hash_symmetric(skb);
+
+ netif_receive_skb(skb);
+
+ stats = get_cpu_ptr(tun->pcpu_stats);
+ u64_stats_update_begin(&stats->syncp);
+ stats->rx_packets++;
+ stats->rx_bytes += skb->len;
+ u64_stats_update_end(&stats->syncp);
+ put_cpu_ptr(stats);
+
+ if (rxhash)
+ tun_flow_update(tun, rxhash, tfile);
+
+out:
+ return err;
+}
+
static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
{
- int ret;
+ int ret, i;
struct tun_file *tfile = container_of(sock, struct tun_file, socket);
struct tun_struct *tun = tun_get(tfile);
struct tun_msg_ctl *ctl = m->msg_control;
+ struct xdp_buff *xdp;
if (!tun)
return -EBADFD;
- if (ctl && ctl->type != TUN_MSG_UBUF)
- return -EINVAL;
+ if (ctl && (ctl->type == TUN_MSG_PTR)) {
+ int n = ctl->num;
+ int flush = 0;
+
+ local_bh_disable();
+ rcu_read_lock();
+
+ for (i = 0; i < n; i++) {
+ xdp = &((struct xdp_buff *)ctl->ptr)[i];
+ tun_xdp_one(tun, tfile, xdp, &flush);
+ }
+
+ if (flush)
+ xdp_do_flush_map();
+
+ rcu_read_unlock();
+ local_bh_enable();
+
+ ret = total_len;
+ goto out;
+ }
ret = tun_get_user(tun, tfile, ctl ? ctl->ptr : NULL, &m->msg_iter,
m->msg_flags & MSG_DONTWAIT,
m->msg_flags & MSG_MORE);
+out:
tun_put(tun);
return ret;
}
--
2.17.1
^ permalink raw reply related
* [PATCH net-next V2 10/11] tap: accept an array of XDP buffs through sendmsg()
From: Jason Wang @ 2018-09-12 3:17 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: mst, kvm, virtualization
In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com>
This patch implement TUN_MSG_PTR msg_control type. This type allows
the caller to pass an array of XDP buffs to tuntap through ptr field
of the tun_msg_control. Tap will build skb through those XDP buffers.
This will avoid lots of indirect calls thus improves the icache
utilization and allows to do XDP batched flushing when doing XDP
redirection.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/tap.c | 74 +++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 72 insertions(+), 2 deletions(-)
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 7996ed7cbf18..a4ab4a791fe7 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1146,14 +1146,84 @@ static const struct file_operations tap_fops = {
#endif
};
+static int tap_get_user_xdp(struct tap_queue *q, struct xdp_buff *xdp)
+{
+ struct tun_xdp_hdr *hdr = xdp->data_hard_start;
+ struct virtio_net_hdr *gso = &hdr->gso;
+ int buflen = hdr->buflen;
+ int vnet_hdr_len = 0;
+ struct tap_dev *tap;
+ struct sk_buff *skb;
+ int err, depth;
+
+ if (q->flags & IFF_VNET_HDR)
+ vnet_hdr_len = READ_ONCE(q->vnet_hdr_sz);
+
+ skb = build_skb(xdp->data_hard_start, buflen);
+ if (!skb) {
+ err = -ENOMEM;
+ goto err;
+ }
+
+ skb_reserve(skb, xdp->data - xdp->data_hard_start);
+ skb_put(skb, xdp->data_end - xdp->data);
+
+ skb_set_network_header(skb, ETH_HLEN);
+ skb_reset_mac_header(skb);
+ skb->protocol = eth_hdr(skb)->h_proto;
+
+ if (vnet_hdr_len) {
+ err = virtio_net_hdr_to_skb(skb, gso, tap_is_little_endian(q));
+ if (err)
+ goto err_kfree;
+ }
+
+ skb_probe_transport_header(skb, ETH_HLEN);
+
+ /* Move network header to the right position for VLAN tagged packets */
+ if ((skb->protocol == htons(ETH_P_8021Q) ||
+ skb->protocol == htons(ETH_P_8021AD)) &&
+ __vlan_get_protocol(skb, skb->protocol, &depth) != 0)
+ skb_set_network_header(skb, depth);
+
+ rcu_read_lock();
+ tap = rcu_dereference(q->tap);
+ if (tap) {
+ skb->dev = tap->dev;
+ dev_queue_xmit(skb);
+ } else {
+ kfree_skb(skb);
+ }
+ rcu_read_unlock();
+
+ return 0;
+
+err_kfree:
+ kfree_skb(skb);
+err:
+ rcu_read_lock();
+ tap = rcu_dereference(q->tap);
+ if (tap && tap->count_tx_dropped)
+ tap->count_tx_dropped(tap);
+ rcu_read_unlock();
+ return err;
+}
+
static int tap_sendmsg(struct socket *sock, struct msghdr *m,
size_t total_len)
{
struct tap_queue *q = container_of(sock, struct tap_queue, sock);
struct tun_msg_ctl *ctl = m->msg_control;
+ struct xdp_buff *xdp;
+ int i;
- if (ctl && ctl->type != TUN_MSG_UBUF)
- return -EINVAL;
+ if (ctl && (ctl->type == TUN_MSG_PTR)) {
+ for (i = 0; i < ctl->num; i++) {
+ xdp = &((struct xdp_buff *)ctl->ptr)[i];
+ tap_get_user_xdp(q, xdp);
+ }
+ return 0;
+ }
return tap_get_user(q, ctl ? ctl->ptr : NULL, &m->msg_iter,
m->msg_flags & MSG_DONTWAIT);
--
2.17.1
^ permalink raw reply related
* [PATCH net-next V2 11/11] vhost_net: batch submitting XDP buffers to underlayer sockets
From: Jason Wang @ 2018-09-12 3:17 UTC (permalink / raw)
To: netdev, linux-kernel; +Cc: mst, kvm, virtualization
In-Reply-To: <20180912031709.14112-1-jasowang@redhat.com>
This patch implements XDP batching for vhost_net. The idea is first to
try to do userspace copy and build XDP buff directly in vhost. Instead
of submitting the packet immediately, vhost_net will batch them in an
array and submit every 64 (VHOST_NET_BATCH) packets to the under layer
sockets through msg_control of sendmsg().
When XDP is enabled on the TUN/TAP, TUN/TAP can process XDP inside a
loop without caring GUP thus it can do batch map flushing. When XDP is
not enabled or not supported, the underlayer socket need to build skb
and pass it to network core. The batched packet submission allows us
to do batching like netif_receive_skb_list() in the future.
This saves lots of indirect calls for better cache utilization. For
the case that we can't so batching e.g when sndbuf is limited or
packet size is too large, we will go for usual one packet per
sendmsg() way.
Doing testpmd on various setups gives us:
Test /+pps%
XDP_DROP on TAP /+44.8%
XDP_REDIRECT on TAP /+29%
macvtap (skb) /+26%
Netperf tests shows obvious improvements for small packet transmission:
size/session/+thu%/+normalize%
64/ 1/ +2%/ 0%
64/ 2/ +3%/ +1%
64/ 4/ +7%/ +5%
64/ 8/ +8%/ +6%
256/ 1/ +3%/ 0%
256/ 2/ +10%/ +7%
256/ 4/ +26%/ +22%
256/ 8/ +27%/ +23%
512/ 1/ +3%/ +2%
512/ 2/ +19%/ +14%
512/ 4/ +43%/ +40%
512/ 8/ +45%/ +41%
1024/ 1/ +4%/ 0%
1024/ 2/ +27%/ +21%
1024/ 4/ +38%/ +73%
1024/ 8/ +15%/ +24%
2048/ 1/ +10%/ +7%
2048/ 2/ +16%/ +12%
2048/ 4/ 0%/ +2%
2048/ 8/ 0%/ +2%
4096/ 1/ +36%/ +60%
4096/ 2/ -11%/ -26%
4096/ 4/ 0%/ +14%
4096/ 8/ 0%/ +4%
16384/ 1/ -1%/ +5%
16384/ 2/ 0%/ +2%
16384/ 4/ 0%/ -3%
16384/ 8/ 0%/ +4%
65535/ 1/ 0%/ +10%
65535/ 2/ 0%/ +8%
65535/ 4/ 0%/ +1%
65535/ 8/ 0%/ +3%
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 174 ++++++++++++++++++++++++++++++++++++++++----
1 file changed, 161 insertions(+), 13 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index fb01ce6d981c..dd4e0a301635 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -116,6 +116,8 @@ struct vhost_net_virtqueue {
* For RX, number of batched heads
*/
int done_idx;
+ /* Number of XDP frames batched */
+ int batched_xdp;
/* an array of userspace buffers info */
struct ubuf_info *ubuf_info;
/* Reference counting for outstanding ubufs.
@@ -123,6 +125,8 @@ struct vhost_net_virtqueue {
struct vhost_net_ubuf_ref *ubufs;
struct ptr_ring *rx_ring;
struct vhost_net_buf rxq;
+ /* Batched XDP buffs */
+ struct xdp_buff *xdp;
};
struct vhost_net {
@@ -338,6 +342,11 @@ static bool vhost_sock_zcopy(struct socket *sock)
sock_flag(sock->sk, SOCK_ZEROCOPY);
}
+static bool vhost_sock_xdp(struct socket *sock)
+{
+ return sock_flag(sock->sk, SOCK_XDP);
+}
+
/* In case of DMA done not in order in lower device driver for some reason.
* upend_idx is used to track end of used idx, done_idx is used to track head
* of used idx. Once lower device DMA done contiguously, we will signal KVM
@@ -444,10 +453,37 @@ static void vhost_net_signal_used(struct vhost_net_virtqueue *nvq)
nvq->done_idx = 0;
}
+static void vhost_tx_batch(struct vhost_net *net,
+ struct vhost_net_virtqueue *nvq,
+ struct socket *sock,
+ struct msghdr *msghdr)
+{
+ struct tun_msg_ctl ctl = {
+ .type = TUN_MSG_PTR,
+ .num = nvq->batched_xdp,
+ .ptr = nvq->xdp,
+ };
+ int err;
+
+ if (nvq->batched_xdp == 0)
+ goto signal_used;
+
+ msghdr->msg_control = &ctl;
+ err = sock->ops->sendmsg(sock, msghdr, 0);
+ if (unlikely(err < 0)) {
+ vq_err(&nvq->vq, "Fail to batch sending packets\n");
+ return;
+ }
+
+signal_used:
+ vhost_net_signal_used(nvq);
+ nvq->batched_xdp = 0;
+}
+
static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
struct vhost_net_virtqueue *nvq,
unsigned int *out_num, unsigned int *in_num,
- bool *busyloop_intr)
+ struct msghdr *msghdr, bool *busyloop_intr)
{
struct vhost_virtqueue *vq = &nvq->vq;
unsigned long uninitialized_var(endtime);
@@ -455,8 +491,9 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
out_num, in_num, NULL, NULL);
if (r == vq->num && vq->busyloop_timeout) {
+ /* Flush batched packets first */
if (!vhost_sock_zcopy(vq->private_data))
- vhost_net_signal_used(nvq);
+ vhost_tx_batch(net, nvq, vq->private_data, msghdr);
preempt_disable();
endtime = busy_clock() + vq->busyloop_timeout;
while (vhost_can_busy_poll(endtime)) {
@@ -512,7 +549,7 @@ static int get_tx_bufs(struct vhost_net *net,
struct vhost_virtqueue *vq = &nvq->vq;
int ret;
- ret = vhost_net_tx_get_vq_desc(net, nvq, out, in, busyloop_intr);
+ ret = vhost_net_tx_get_vq_desc(net, nvq, out, in, msg, busyloop_intr);
if (ret < 0 || ret == vq->num)
return ret;
@@ -540,6 +577,80 @@ static bool tx_can_batch(struct vhost_virtqueue *vq, size_t total_len)
!vhost_vq_avail_empty(vq->dev, vq);
}
+#define VHOST_NET_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD)
+
+static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
+ struct iov_iter *from)
+{
+ struct vhost_virtqueue *vq = &nvq->vq;
+ struct socket *sock = vq->private_data;
+ struct page_frag *alloc_frag = ¤t->task_frag;
+ struct virtio_net_hdr *gso;
+ struct xdp_buff *xdp = &nvq->xdp[nvq->batched_xdp];
+ struct tun_xdp_hdr *hdr;
+ size_t len = iov_iter_count(from);
+ int headroom = vhost_sock_xdp(sock) ? XDP_PACKET_HEADROOM : 0;
+ int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+ int pad = SKB_DATA_ALIGN(VHOST_NET_RX_PAD + headroom + nvq->sock_hlen);
+ int sock_hlen = nvq->sock_hlen;
+ void *buf;
+ int copied;
+
+ if (unlikely(len < nvq->sock_hlen))
+ return -EFAULT;
+
+ if (SKB_DATA_ALIGN(len + pad) +
+ SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE)
+ return -ENOSPC;
+
+ buflen += SKB_DATA_ALIGN(len + pad);
+ alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES);
+ if (unlikely(!skb_page_frag_refill(buflen, alloc_frag, GFP_KERNEL)))
+ return -ENOMEM;
+
+ buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
+ copied = copy_page_from_iter(alloc_frag->page,
+ alloc_frag->offset +
+ offsetof(struct tun_xdp_hdr, gso),
+ sock_hlen, from);
+ if (copied != sock_hlen)
+ return -EFAULT;
+
+ hdr = buf;
+ gso = &hdr->gso;
+
+ if ((gso->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
+ vhost16_to_cpu(vq, gso->csum_start) +
+ vhost16_to_cpu(vq, gso->csum_offset) + 2 >
+ vhost16_to_cpu(vq, gso->hdr_len)) {
+ gso->hdr_len = cpu_to_vhost16(vq,
+ vhost16_to_cpu(vq, gso->csum_start) +
+ vhost16_to_cpu(vq, gso->csum_offset) + 2);
+
+ if (vhost16_to_cpu(vq, gso->hdr_len) > len)
+ return -EINVAL;
+ }
+
+ len -= sock_hlen;
+ copied = copy_page_from_iter(alloc_frag->page,
+ alloc_frag->offset + pad,
+ len, from);
+ if (copied != len)
+ return -EFAULT;
+
+ xdp->data_hard_start = buf;
+ xdp->data = buf + pad;
+ xdp->data_end = xdp->data + len;
+ hdr->buflen = buflen;
+
+ get_page(alloc_frag->page);
+ alloc_frag->offset += buflen;
+
+ ++nvq->batched_xdp;
+
+ return 0;
+}
+
static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
{
struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
@@ -556,10 +667,14 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
size_t len, total_len = 0;
int err;
int sent_pkts = 0;
+ bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX);
for (;;) {
bool busyloop_intr = false;
+ if (nvq->done_idx == VHOST_NET_BATCH)
+ vhost_tx_batch(net, nvq, sock, &msg);
+
head = get_tx_bufs(net, nvq, &msg, &out, &in, &len,
&busyloop_intr);
/* On error, stop handling until the next kick. */
@@ -577,14 +692,34 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
break;
}
- vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
- vq->heads[nvq->done_idx].len = 0;
-
total_len += len;
- if (tx_can_batch(vq, total_len))
- msg.msg_flags |= MSG_MORE;
- else
- msg.msg_flags &= ~MSG_MORE;
+
+ /* For simplicity, TX batching is only enabled if
+ * sndbuf is unlimited.
+ */
+ if (sock_can_batch) {
+ err = vhost_net_build_xdp(nvq, &msg.msg_iter);
+ if (!err) {
+ goto done;
+ } else if (unlikely(err != -ENOSPC)) {
+ vhost_tx_batch(net, nvq, sock, &msg);
+ vhost_discard_vq_desc(vq, 1);
+ vhost_net_enable_vq(net, vq);
+ break;
+ }
+
+ /* We can't build XDP buff, go for single
+ * packet path but let's flush batched
+ * packets.
+ */
+ vhost_tx_batch(net, nvq, sock, &msg);
+ msg.msg_control = NULL;
+ } else {
+ if (tx_can_batch(vq, total_len))
+ msg.msg_flags |= MSG_MORE;
+ else
+ msg.msg_flags &= ~MSG_MORE;
+ }
/* TODO: Check specific error and bomb out unless ENOBUFS? */
err = sock->ops->sendmsg(sock, &msg, len);
@@ -596,15 +731,17 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
if (err != len)
pr_debug("Truncated TX packet: len %d != %zd\n",
err, len);
- if (++nvq->done_idx >= VHOST_NET_BATCH)
- vhost_net_signal_used(nvq);
+done:
+ vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
+ vq->heads[nvq->done_idx].len = 0;
+ ++nvq->done_idx;
if (vhost_exceeds_weight(++sent_pkts, total_len)) {
vhost_poll_queue(&vq->poll);
break;
}
}
- vhost_net_signal_used(nvq);
+ vhost_tx_batch(net, nvq, sock, &msg);
}
static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
@@ -1081,6 +1218,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
struct vhost_dev *dev;
struct vhost_virtqueue **vqs;
void **queue;
+ struct xdp_buff *xdp;
int i;
n = kvmalloc(sizeof *n, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
@@ -1101,6 +1239,14 @@ static int vhost_net_open(struct inode *inode, struct file *f)
}
n->vqs[VHOST_NET_VQ_RX].rxq.queue = queue;
+ xdp = kmalloc_array(VHOST_NET_BATCH, sizeof(*xdp), GFP_KERNEL);
+ if (!xdp) {
+ kfree(vqs);
+ kvfree(n);
+ kfree(queue);
+ }
+ n->vqs[VHOST_NET_VQ_TX].xdp = xdp;
+
dev = &n->dev;
vqs[VHOST_NET_VQ_TX] = &n->vqs[VHOST_NET_VQ_TX].vq;
vqs[VHOST_NET_VQ_RX] = &n->vqs[VHOST_NET_VQ_RX].vq;
@@ -1111,6 +1257,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
n->vqs[i].ubuf_info = NULL;
n->vqs[i].upend_idx = 0;
n->vqs[i].done_idx = 0;
+ n->vqs[i].batched_xdp = 0;
n->vqs[i].vhost_hlen = 0;
n->vqs[i].sock_hlen = 0;
n->vqs[i].rx_ring = NULL;
@@ -1194,6 +1341,7 @@ static int vhost_net_release(struct inode *inode, struct file *f)
* since jobs can re-queue themselves. */
vhost_net_flush(n);
kfree(n->vqs[VHOST_NET_VQ_RX].rxq.queue);
+ kfree(n->vqs[VHOST_NET_VQ_TX].xdp);
kfree(n->dev.vqs);
kvfree(n);
return 0;
--
2.17.1
^ permalink raw reply related
* Re: [PATCH] qxl: refactor to use drm_fb_helper_fbdev_setup
From: Gerd Hoffmann @ 2018-09-12 7:03 UTC (permalink / raw)
To: Peter Wu; +Cc: Dave Airlie, linux-kernel, dri-devel, virtualization
In-Reply-To: <20180910132156.23201-1-peter@lekensteyn.nl>
On Mon, Sep 10, 2018 at 03:21:56PM +0200, Peter Wu wrote:
> Lots of code can be removed by relying on fb-helper:
> - "struct drm_framebuffer" moves to fb_helper.fb.
> - "struct drm_gem_object" moves to fb_helper.obj[0].
> - "struct qxl_device" can be inferred as drm_fb_helper is embedded.
> - qxl_user_framebuffer_create -> drm_gem_fb_create.
> - qxl_user_framebuffer_destroy -> drm_gem_fb_destroy.
> - qxl_fbdev_destroy -> drm_fb_helper_fbdev_teardown + vfree(shadow).
>
> Remove unused code:
> - qxl_fbdev_qobj_is_fb, qxl_fbdev_set_suspend.
> - Unused fields of qxl_fbdev: delayed_ops, delayed_ops_lock, size.
>
> Misc notes:
> - The dirty callback is preserved as it is necessary to trigger update
> commands in the hw (the screen stays black otherwise).
> - No idea when .create_handle in drm_framebuffer_funcs is used, but use
> the same drm_gem_fb_create_handle to match drm_gem_fb_funcs.
> - I don't know why qxl_fb_find_or_create_single used to check for an
> existing framebuffer and removed that check to match other drivers.
> - Use of drm_fb_helper_fbdev_teardown also requires "info->fbdefio" to
> be dynamically allocated. Replace the existing defio config by
> drm_fb_helper_defio_init to accomodate this.
>
> Testing results: startx with fbdev, modesetting and qxl all seems to
> work. Tested also with CONFIG_DRM_FBDEV_EMULATION=n, fbdev obviously
> fails but others are fine. QEMU -spice and QEMU -spice with vdagent and
> multiple (resized) displays (via remote-viewer) also works.
> unbind vtconsole and rmmod has *not* regressed (i.e. it still trips on a
> use-after-free in qxl_check_idle via qxl_ttm_fini).
>
> Ideally setup/teardown is replaced by drm_fbdev_generic_setup as that
> would result in further code reduction, improve error handling (like not
> leaking shadow memory), but unfortunately QXL has no implementation for
> qxl_gem_prime_vmap.
Pushed to drm-misc-next.
thanks,
Gerd
^ permalink raw reply
* Re: [virtio-dev] [PATCH 2/2] drm/virtio: add iommu support.
From: Gerd Hoffmann @ 2018-09-12 7:25 UTC (permalink / raw)
To: Jiandi An
Cc: virtio-dev, Dave Airlie, LKML, dri-devel,
open list:VIRTIO CORE, NET..., Dave Airlie
In-Reply-To: <9bf20fa1-3752-edd1-588c-e38cd166c3af@amd.com>
Hi,
> I attempted to fix it in the ttm layer and here is the discussion
> https://lore.kernel.org/lkml/b44280d7-eb13-0996-71f5-3fbdeb466801@amd.com/
>
> The ttm maintainer Christian is suggesting to map and set ttm->pages as decrypted
> right after ttm->pages are allocated.
>
> Just checking with you guys maybe there is a better way to handle this in
> the virtio gpu layer instead of the common ttm layer. Could you guys shine some
> light on this? Thanks.
I think the tty layer is the right place to fix this. virtio just calls
down to ttm for mappings. I think virtio should just hint to ttm using
a flag in some struct, probably ttm_buffer_object or
ttm_mem_type_manager, that the objects need decrypted mappings.
cheers,
Gerd
^ permalink raw reply
* Re: [PATCH net-next v2 2/5] virtio_ring: support creating packed ring
From: Tiwei Bie @ 2018-09-12 7:51 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: virtio-dev, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180910022837.GA11822@debian>
On Mon, Sep 10, 2018 at 10:28:37AM +0800, Tiwei Bie wrote:
> On Fri, Sep 07, 2018 at 10:03:24AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 11, 2018 at 10:27:08AM +0800, Tiwei Bie wrote:
> > > This commit introduces the support for creating packed ring.
> > > All split ring specific functions are added _split suffix.
> > > Some necessary stubs for packed ring are also added.
> > >
> > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> >
[...]
> > > +
> > > +/*
> > > + * The layout for the packed ring is a continuous chunk of memory
> > > + * which looks like this.
> > > + *
> > > + * struct vring_packed {
> > > + * // The actual descriptors (16 bytes each)
> > > + * struct vring_packed_desc desc[num];
> > > + *
> > > + * // Padding to the next align boundary.
> > > + * char pad[];
> > > + *
> > > + * // Driver Event Suppression
> > > + * struct vring_packed_desc_event driver;
> > > + *
> > > + * // Device Event Suppression
> > > + * struct vring_packed_desc_event device;
> > > + * };
> > > + */
> >
> > Why not just allocate event structures separately?
> > Is it a win to have them share a cache line for some reason?
Users may call vring_new_virtqueue() with preallocated
memory to setup the ring, e.g.:
https://github.com/torvalds/linux/blob/11da3a7f84f1/drivers/s390/virtio/virtio_ccw.c#L513-L522
https://github.com/torvalds/linux/blob/11da3a7f84f1/drivers/misc/mic/vop/vop_main.c#L306-L307
Below is the corresponding definition in split ring:
https://github.com/oasis-tcs/virtio-spec/blob/89dd55f5e606/split-ring.tex#L64-L78
If my understanding is correct, this is just for legacy
interfaces, and we won't define layout in packed ring
and don't need to support vring_new_virtqueue() in packed
ring. Is it correct? Thanks!
>
> Will do that.
>
> >
> > > +static inline void vring_init_packed(struct vring_packed *vr, unsigned int num,
> > > + void *p, unsigned long align)
> > > +{
> > > + vr->num = num;
> > > + vr->desc = p;
> > > + vr->driver = (void *)ALIGN(((uintptr_t)p +
> > > + sizeof(struct vring_packed_desc) * num), align);
> > > + vr->device = vr->driver + 1;
> > > +}
> >
> > What's all this about alignment? Where does it come from?
>
> It comes from the `vring_align` parameter of vring_create_virtqueue()
> and vring_new_virtqueue():
>
> https://github.com/torvalds/linux/blob/a49a9dcce802/drivers/virtio/virtio_ring.c#L1061
> https://github.com/torvalds/linux/blob/a49a9dcce802/drivers/virtio/virtio_ring.c#L1123
>
> Should I just ignore it in packed ring?
>
> CCW defined this:
>
> #define KVM_VIRTIO_CCW_RING_ALIGN 4096
>
> I'm not familiar with CCW. Currently, in this patch set, packed ring
> isn't enabled on CCW dues to some legacy accessors are not implemented
> in packed ring yet.
>
> >
> > > +
> > > +static inline unsigned vring_size_packed(unsigned int num, unsigned long align)
> > > +{
> > > + return ((sizeof(struct vring_packed_desc) * num + align - 1)
> > > + & ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
> > > +}
> [...]
> > > @@ -1129,10 +1388,17 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
> > > void (*callback)(struct virtqueue *vq),
> > > const char *name)
> > > {
> > > - struct vring vring;
> > > - vring_init(&vring, num, pages, vring_align);
> > > - return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> > > - notify, callback, name);
> > > + union vring_union vring;
> > > + bool packed;
> > > +
> > > + packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> > > + if (packed)
> > > + vring_init_packed(&vring.vring_packed, num, pages, vring_align);
> > > + else
> > > + vring_init(&vring.vring_split, num, pages, vring_align);
> >
> >
> > vring_init in the UAPI header is more or less a bug.
> > I'd just stop using it, keep it around for legacy userspace.
>
> Got it. I'd like to do that. Thanks.
>
> >
> > > +
> > > + return __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> > > + context, notify, callback, name);
> > > }
> > > EXPORT_SYMBOL_GPL(vring_new_virtqueue);
> > >
> > > @@ -1142,7 +1408,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
> > >
> > > if (vq->we_own_ring) {
> > > vring_free_queue(vq->vq.vdev, vq->queue_size_in_bytes,
> > > - vq->vring.desc, vq->queue_dma_addr);
> > > + vq->packed ? (void *)vq->vring_packed.desc :
> > > + (void *)vq->vring.desc,
> > > + vq->queue_dma_addr);
> > > }
> > > list_del(&_vq->list);
> > > kfree(vq);
> > > @@ -1184,7 +1452,7 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
> > >
> > > struct vring_virtqueue *vq = to_vvq(_vq);
> > >
> > > - return vq->vring.num;
> > > + return vq->packed ? vq->vring_packed.num : vq->vring.num;
> > > }
> > > EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
> > >
> > > @@ -1227,6 +1495,10 @@ dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
> > >
> > > BUG_ON(!vq->we_own_ring);
> > >
> > > + if (vq->packed)
> > > + return vq->queue_dma_addr + ((char *)vq->vring_packed.driver -
> > > + (char *)vq->vring_packed.desc);
> > > +
> > > return vq->queue_dma_addr +
> > > ((char *)vq->vring.avail - (char *)vq->vring.desc);
> > > }
> > > @@ -1238,11 +1510,16 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
> > >
> > > BUG_ON(!vq->we_own_ring);
> > >
> > > + if (vq->packed)
> > > + return vq->queue_dma_addr + ((char *)vq->vring_packed.device -
> > > + (char *)vq->vring_packed.desc);
> > > +
> > > return vq->queue_dma_addr +
> > > ((char *)vq->vring.used - (char *)vq->vring.desc);
> > > }
> > > EXPORT_SYMBOL_GPL(virtqueue_get_used_addr);
> > >
> > > +/* Only available for split ring */
> > > const struct vring *virtqueue_get_vring(struct virtqueue *vq)
> > > {
> > > return &to_vvq(vq)->vring;
> > > diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> > > index fab02133a919..992142b35f55 100644
> > > --- a/include/linux/virtio_ring.h
> > > +++ b/include/linux/virtio_ring.h
> > > @@ -60,6 +60,11 @@ static inline void virtio_store_mb(bool weak_barriers,
> > > struct virtio_device;
> > > struct virtqueue;
> > >
> > > +union vring_union {
> > > + struct vring vring_split;
> > > + struct vring_packed vring_packed;
> > > +};
> > > +
> > > /*
> > > * Creates a virtqueue and allocates the descriptor ring. If
> > > * may_reduce_num is set, then this may allocate a smaller ring than
> > > @@ -79,7 +84,8 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
> > >
> > > /* Creates a virtqueue with a custom layout. */
> > > struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > - struct vring vring,
> > > + union vring_union vring,
> > > + bool packed,
> > > struct virtio_device *vdev,
> > > bool weak_barriers,
> > > bool ctx,
> > > --
> > > 2.18.0
^ permalink raw reply
* Re: [PATCH net-next v2 2/5] virtio_ring: support creating packed ring
From: Michael S. Tsirkin @ 2018-09-12 12:45 UTC (permalink / raw)
To: Tiwei Bie; +Cc: virtio-dev, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180910022837.GA11822@debian>
On Mon, Sep 10, 2018 at 10:28:37AM +0800, Tiwei Bie wrote:
> On Fri, Sep 07, 2018 at 10:03:24AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 11, 2018 at 10:27:08AM +0800, Tiwei Bie wrote:
> > > This commit introduces the support for creating packed ring.
> > > All split ring specific functions are added _split suffix.
> > > Some necessary stubs for packed ring are also added.
> > >
> > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> >
> > I'd rather have a patch just renaming split functions, then
> > add all packed stuff in as a separate patch on top.
>
> Sure, I will do that.
>
> >
> >
> > > ---
> > > drivers/virtio/virtio_ring.c | 801 +++++++++++++++++++++++------------
> > > include/linux/virtio_ring.h | 8 +-
> > > 2 files changed, 546 insertions(+), 263 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 814b395007b2..c4f8abc7445a 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -60,11 +60,15 @@ struct vring_desc_state {
> > > struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> > > };
> > >
> > > +struct vring_desc_state_packed {
> > > + int next; /* The next desc state. */
> >
> > So this can go away with IN_ORDER?
>
> Yes. If IN_ORDER is negotiated, next won't be needed anymore.
> Currently, IN_ORDER isn't included in this patch set, because
> some changes for split ring are needed to make sure that it
> will use the descs in the expected order. After that,
> optimizations can be done for both of split ring and packed
> ring respectively.
>
> >
> > > +};
> > > +
> > > struct vring_virtqueue {
> > > struct virtqueue vq;
> > >
> > > - /* Actual memory layout for this queue */
> > > - struct vring vring;
> > > + /* Is this a packed ring? */
> > > + bool packed;
> > >
> > > /* Can we use weak barriers? */
> > > bool weak_barriers;
> > > @@ -86,11 +90,39 @@ struct vring_virtqueue {
> > > /* Last used index we've seen. */
> > > u16 last_used_idx;
> > >
> > > - /* Last written value to avail->flags */
> > > - u16 avail_flags_shadow;
> > > + union {
> > > + /* Available for split ring */
> > > + struct {
> > > + /* Actual memory layout for this queue. */
> > > + struct vring vring;
> > >
> > > - /* Last written value to avail->idx in guest byte order */
> > > - u16 avail_idx_shadow;
> > > + /* Last written value to avail->flags */
> > > + u16 avail_flags_shadow;
> > > +
> > > + /* Last written value to avail->idx in
> > > + * guest byte order. */
> > > + u16 avail_idx_shadow;
> > > + };
> >
> > Name this field split so it's easier to detect misuse of e.g.
> > packed fields in split code?
>
> Good point, I'll do that.
>
> >
> > > +
> > > + /* Available for packed ring */
> > > + struct {
> > > + /* Actual memory layout for this queue. */
> > > + struct vring_packed vring_packed;
> > > +
> > > + /* Driver ring wrap counter. */
> > > + bool avail_wrap_counter;
> > > +
> > > + /* Device ring wrap counter. */
> > > + bool used_wrap_counter;
> > > +
> > > + /* Index of the next avail descriptor. */
> > > + u16 next_avail_idx;
> > > +
> > > + /* Last written value to driver->flags in
> > > + * guest byte order. */
> > > + u16 event_flags_shadow;
> > > + };
> > > + };
> [...]
> > > +
> > > +/*
> > > + * The layout for the packed ring is a continuous chunk of memory
> > > + * which looks like this.
> > > + *
> > > + * struct vring_packed {
> > > + * // The actual descriptors (16 bytes each)
> > > + * struct vring_packed_desc desc[num];
> > > + *
> > > + * // Padding to the next align boundary.
> > > + * char pad[];
> > > + *
> > > + * // Driver Event Suppression
> > > + * struct vring_packed_desc_event driver;
> > > + *
> > > + * // Device Event Suppression
> > > + * struct vring_packed_desc_event device;
> > > + * };
> > > + */
> >
> > Why not just allocate event structures separately?
> > Is it a win to have them share a cache line for some reason?
>
> Will do that.
>
> >
> > > +static inline void vring_init_packed(struct vring_packed *vr, unsigned int num,
> > > + void *p, unsigned long align)
> > > +{
> > > + vr->num = num;
> > > + vr->desc = p;
> > > + vr->driver = (void *)ALIGN(((uintptr_t)p +
> > > + sizeof(struct vring_packed_desc) * num), align);
> > > + vr->device = vr->driver + 1;
> > > +}
> >
> > What's all this about alignment? Where does it come from?
>
> It comes from the `vring_align` parameter of vring_create_virtqueue()
> and vring_new_virtqueue():
>
> https://github.com/torvalds/linux/blob/a49a9dcce802/drivers/virtio/virtio_ring.c#L1061
> https://github.com/torvalds/linux/blob/a49a9dcce802/drivers/virtio/virtio_ring.c#L1123
Note the TODO - we just never got to fixing it. It would be great to fix
this for virtio 1 rings, but if not - at least let's not add this
stuff for packed rings.
> Should I just ignore it in packed ring?
>
> CCW defined this:
>
> #define KVM_VIRTIO_CCW_RING_ALIGN 4096
>
> I'm not familiar with CCW. Currently, in this patch set, packed ring
> isn't enabled on CCW dues to some legacy accessors are not implemented
> in packed ring yet.
Then you need to take steps not to negotiate this feature bit for
ccw drivers.
> >
> > > +
> > > +static inline unsigned vring_size_packed(unsigned int num, unsigned long align)
> > > +{
> > > + return ((sizeof(struct vring_packed_desc) * num + align - 1)
> > > + & ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
> > > +}
> [...]
> > > @@ -1129,10 +1388,17 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
> > > void (*callback)(struct virtqueue *vq),
> > > const char *name)
> > > {
> > > - struct vring vring;
> > > - vring_init(&vring, num, pages, vring_align);
> > > - return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> > > - notify, callback, name);
> > > + union vring_union vring;
> > > + bool packed;
> > > +
> > > + packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> > > + if (packed)
> > > + vring_init_packed(&vring.vring_packed, num, pages, vring_align);
> > > + else
> > > + vring_init(&vring.vring_split, num, pages, vring_align);
> >
> >
> > vring_init in the UAPI header is more or less a bug.
> > I'd just stop using it, keep it around for legacy userspace.
>
> Got it. I'd like to do that. Thanks.
>
> >
> > > +
> > > + return __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> > > + context, notify, callback, name);
> > > }
> > > EXPORT_SYMBOL_GPL(vring_new_virtqueue);
> > >
> > > @@ -1142,7 +1408,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
> > >
> > > if (vq->we_own_ring) {
> > > vring_free_queue(vq->vq.vdev, vq->queue_size_in_bytes,
> > > - vq->vring.desc, vq->queue_dma_addr);
> > > + vq->packed ? (void *)vq->vring_packed.desc :
> > > + (void *)vq->vring.desc,
> > > + vq->queue_dma_addr);
> > > }
> > > list_del(&_vq->list);
> > > kfree(vq);
> > > @@ -1184,7 +1452,7 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
> > >
> > > struct vring_virtqueue *vq = to_vvq(_vq);
> > >
> > > - return vq->vring.num;
> > > + return vq->packed ? vq->vring_packed.num : vq->vring.num;
> > > }
> > > EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
> > >
> > > @@ -1227,6 +1495,10 @@ dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
> > >
> > > BUG_ON(!vq->we_own_ring);
> > >
> > > + if (vq->packed)
> > > + return vq->queue_dma_addr + ((char *)vq->vring_packed.driver -
> > > + (char *)vq->vring_packed.desc);
> > > +
> > > return vq->queue_dma_addr +
> > > ((char *)vq->vring.avail - (char *)vq->vring.desc);
> > > }
> > > @@ -1238,11 +1510,16 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
> > >
> > > BUG_ON(!vq->we_own_ring);
> > >
> > > + if (vq->packed)
> > > + return vq->queue_dma_addr + ((char *)vq->vring_packed.device -
> > > + (char *)vq->vring_packed.desc);
> > > +
> > > return vq->queue_dma_addr +
> > > ((char *)vq->vring.used - (char *)vq->vring.desc);
> > > }
> > > EXPORT_SYMBOL_GPL(virtqueue_get_used_addr);
> > >
> > > +/* Only available for split ring */
> > > const struct vring *virtqueue_get_vring(struct virtqueue *vq)
> > > {
> > > return &to_vvq(vq)->vring;
> > > diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> > > index fab02133a919..992142b35f55 100644
> > > --- a/include/linux/virtio_ring.h
> > > +++ b/include/linux/virtio_ring.h
> > > @@ -60,6 +60,11 @@ static inline void virtio_store_mb(bool weak_barriers,
> > > struct virtio_device;
> > > struct virtqueue;
> > >
> > > +union vring_union {
> > > + struct vring vring_split;
> > > + struct vring_packed vring_packed;
> > > +};
> > > +
> > > /*
> > > * Creates a virtqueue and allocates the descriptor ring. If
> > > * may_reduce_num is set, then this may allocate a smaller ring than
> > > @@ -79,7 +84,8 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
> > >
> > > /* Creates a virtqueue with a custom layout. */
> > > struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > - struct vring vring,
> > > + union vring_union vring,
> > > + bool packed,
> > > struct virtio_device *vdev,
> > > bool weak_barriers,
> > > bool ctx,
> > > --
> > > 2.18.0
^ permalink raw reply
* Re: [PATCH net-next v2 1/5] virtio: add packed ring definitions
From: Michael S. Tsirkin @ 2018-09-12 12:53 UTC (permalink / raw)
To: Tiwei Bie; +Cc: virtio-dev, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180910021322.GA10912@debian>
On Mon, Sep 10, 2018 at 10:13:22AM +0800, Tiwei Bie wrote:
> On Fri, Sep 07, 2018 at 09:51:23AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 11, 2018 at 10:27:07AM +0800, Tiwei Bie wrote:
> > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > ---
> > > include/uapi/linux/virtio_config.h | 3 +++
> > > include/uapi/linux/virtio_ring.h | 43 ++++++++++++++++++++++++++++++
> > > 2 files changed, 46 insertions(+)
> > >
> > > diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> > > index 449132c76b1c..1196e1c1d4f6 100644
> > > --- a/include/uapi/linux/virtio_config.h
> > > +++ b/include/uapi/linux/virtio_config.h
> > > @@ -75,6 +75,9 @@
> > > */
> > > #define VIRTIO_F_IOMMU_PLATFORM 33
> > >
> > > +/* This feature indicates support for the packed virtqueue layout. */
> > > +#define VIRTIO_F_RING_PACKED 34
> > > +
> > > /*
> > > * Does the device support Single Root I/O Virtualization?
> > > */
> > > diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
> > > index 6d5d5faa989b..0254a2ba29cf 100644
> > > --- a/include/uapi/linux/virtio_ring.h
> > > +++ b/include/uapi/linux/virtio_ring.h
> > > @@ -44,6 +44,10 @@
> > > /* This means the buffer contains a list of buffer descriptors. */
> > > #define VRING_DESC_F_INDIRECT 4
> > >
> > > +/* Mark a descriptor as available or used. */
> > > +#define VRING_DESC_F_AVAIL (1ul << 7)
> > > +#define VRING_DESC_F_USED (1ul << 15)
> > > +
> > > /* The Host uses this in used->flags to advise the Guest: don't kick me when
> > > * you add a buffer. It's unreliable, so it's simply an optimization. Guest
> > > * will still kick if it's out of buffers. */
> > > @@ -53,6 +57,17 @@
> > > * optimization. */
> > > #define VRING_AVAIL_F_NO_INTERRUPT 1
> > >
> > > +/* Enable events. */
> > > +#define VRING_EVENT_F_ENABLE 0x0
> > > +/* Disable events. */
> > > +#define VRING_EVENT_F_DISABLE 0x1
> > > +/*
> > > + * Enable events for a specific descriptor
> > > + * (as specified by Descriptor Ring Change Event Offset/Wrap Counter).
> > > + * Only valid if VIRTIO_RING_F_EVENT_IDX has been negotiated.
> > > + */
> > > +#define VRING_EVENT_F_DESC 0x2
> > > +
> > > /* We support indirect buffer descriptors */
> > > #define VIRTIO_RING_F_INDIRECT_DESC 28
> > >
> >
> > These are for the packed ring, right? Pls prefix accordingly.
>
> How about something like this:
>
> #define VRING_PACKED_DESC_F_AVAIL (1u << 7)
> #define VRING_PACKED_DESC_F_USED (1u << 15)
_F_ should be a bit number, not a shifted value.
Yes I know current virtio_ring.h is inconsistent in
that respect. changes welcome though we need to
maintain the old names for the sake of old apps.
>
> #define VRING_PACKED_EVENT_F_ENABLE 0x0
> #define VRING_PACKED_EVENT_F_DISABLE 0x1
> #define VRING_PACKED_EVENT_F_DESC 0x2
These are values, I think they should not have _F_.
Yes I know current virtio_ring.h is inconsistent in
that respect.
>
> > Also, you likely need macros for the wrap counters.
>
> How about something like this:
>
> #define VRING_PACKED_EVENT_WRAP_COUNTER_SHIFT 15
Given it's a single bit, maybe even
VRING_PACKED_EVENT_F_WRAP_CTR
> #define VRING_PACKED_EVENT_WRAP_COUNTER_MASK \
> (1u << VRING_PACKED_WRAP_COUNTER_SHIFT)
> #define VRING_PACKED_EVENT_OFFSET_MASK \
> ((1u << VRING_PACKED_WRAP_COUNTER_SHIFT) - 1)
I'm not sure the mask is justified.
> >
> > > @@ -171,4 +186,32 @@ static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
> > > return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
> > > }
> > >
> > > +struct vring_packed_desc_event {
> > > + /* Descriptor Ring Change Event Offset/Wrap Counter. */
> > > + __virtio16 off_wrap;
> > > + /* Descriptor Ring Change Event Flags. */
> > > + __virtio16 flags;
> > > +};
> > > +
> > > +struct vring_packed_desc {
> > > + /* Buffer Address. */
> > > + __virtio64 addr;
> > > + /* Buffer Length. */
> > > + __virtio32 len;
> > > + /* Buffer ID. */
> > > + __virtio16 id;
> > > + /* The flags depending on descriptor type. */
> > > + __virtio16 flags;
> > > +};
> >
> > Don't use __virtioXX types, just __leXX ones.
>
> Got it, will do that.
>
> >
> > > +
> > > +struct vring_packed {
> > > + unsigned int num;
> > > +
> > > + struct vring_packed_desc *desc;
> > > +
> > > + struct vring_packed_desc_event *driver;
> > > +
> > > + struct vring_packed_desc_event *device;
> > > +};
> > > +
> > > #endif /* _UAPI_LINUX_VIRTIO_RING_H */
> > > --
> > > 2.18.0
^ permalink raw reply
* Re: [virtio-dev] Re: [PATCH net-next v2 0/5] virtio: support packed ring
From: Michael S. Tsirkin @ 2018-09-12 13:06 UTC (permalink / raw)
To: Tiwei Bie; +Cc: virtio-dev, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180910030053.GA15645@debian>
On Mon, Sep 10, 2018 at 11:00:53AM +0800, Tiwei Bie wrote:
> On Fri, Sep 07, 2018 at 09:00:49AM -0400, Michael S. Tsirkin wrote:
> > On Fri, Sep 07, 2018 at 09:22:25AM +0800, Tiwei Bie wrote:
> > > On Mon, Aug 27, 2018 at 05:00:40PM +0300, Michael S. Tsirkin wrote:
> > > > Are there still plans to test the performance with vost pmd?
> > > > vhost doesn't seem to show a performance gain ...
> > > >
> > >
> > > I tried some performance tests with vhost PMD. In guest, the
> > > XDP program will return XDP_DROP directly. And in host, testpmd
> > > will do txonly fwd.
> > >
> > > When burst size is 1 and packet size is 64 in testpmd and
> > > testpmd needs to iterate 5 Tx queues (but only the first two
> > > queues are enabled) to prepare and inject packets, I got ~12%
> > > performance boost (5.7Mpps -> 6.4Mpps). And if the vhost PMD
> > > is faster (e.g. just need to iterate the first two queues to
> > > prepare and inject packets), then I got similar performance
> > > for both rings (~9.9Mpps) (packed ring's performance can be
> > > lower, because it's more complicated in driver.)
> > >
> > > I think packed ring makes vhost PMD faster, but it doesn't make
> > > the driver faster. In packed ring, the ring is simplified, and
> > > the handling of the ring in vhost (device) is also simplified,
> > > but things are not simplified in driver, e.g. although there is
> > > no desc table in the virtqueue anymore, driver still needs to
> > > maintain a private desc state table (which is still managed as
> > > a list in this patch set) to support the out-of-order desc
> > > processing in vhost (device).
> > >
> > > I think this patch set is mainly to make the driver have a full
> > > functional support for the packed ring, which makes it possible
> > > to leverage the packed ring feature in vhost (device). But I'm
> > > not sure whether there is any other better idea, I'd like to
> > > hear your thoughts. Thanks!
> >
> > Just this: Jens seems to report a nice gain with virtio and
> > vhost pmd across the board. Try to compare virtio and
> > virtio pmd to see what does pmd do better?
>
> The virtio PMD (drivers/net/virtio) in DPDK doesn't need to share
> the virtio ring operation code with other drivers and is highly
> optimized for network. E.g. in Rx, the Rx burst function won't
> chain descs.
> So the ID management for the Rx ring can be quite
> simple and straightforward, we just need to initialize these IDs
> when initializing the ring and don't need to change these IDs
> in data path anymore (the mergable Rx code in that patch set
> assumes the descs will be written back in order, which should be
> fixed. I.e., the ID in the desc should be used to index vq->descx[]).
> The Tx code in that patch set also assumes the descs will be
> written back by device in order, which should be fixed.
>
> But in kernel virtio driver, the virtio_ring.c is very generic.
> The enqueue (virtqueue_add()) and dequeue (virtqueue_get_buf_ctx())
> functions need to support all the virtio devices and should be
> able to handle all the possible cases that may happen. So although
> the packed ring can be very efficient in some cases, currently
> the room to optimize the performance in kernel's virtio_ring.c
> isn't that much. If we want to take the fully advantage of the
> packed ring's efficiency, we need some further e.g. API changes
> in virtio_ring.c, which shouldn't be part of this patch set. So
> I still think this patch set is mainly to make the kernel virtio
> driver to have a full functional support of the packed ring, and
> we can't expect impressive performance boost with it.
So what are the cases that make things complex?
Maybe we should drop support for them completely.
> >
> >
> > >
> > > >
> > > > On Wed, Jul 11, 2018 at 10:27:06AM +0800, Tiwei Bie wrote:
> > > > > Hello everyone,
> > > > >
> > > > > This patch set implements packed ring support in virtio driver.
> > > > >
> > > > > Some functional tests have been done with Jason's
> > > > > packed ring implementation in vhost:
> > > > >
> > > > > https://lkml.org/lkml/2018/7/3/33
> > > > >
> > > > > Both of ping and netperf worked as expected.
> > > > >
> > > > > v1 -> v2:
> > > > > - Use READ_ONCE() to read event off_wrap and flags together (Jason);
> > > > > - Add comments related to ccw (Jason);
> > > > >
> > > > > RFC (v6) -> v1:
> > > > > - Avoid extra virtio_wmb() in virtqueue_enable_cb_delayed_packed()
> > > > > when event idx is off (Jason);
> > > > > - Fix bufs calculation in virtqueue_enable_cb_delayed_packed() (Jason);
> > > > > - Test the state of the desc at used_idx instead of last_used_idx
> > > > > in virtqueue_enable_cb_delayed_packed() (Jason);
> > > > > - Save wrap counter (as part of queue state) in the return value
> > > > > of virtqueue_enable_cb_prepare_packed();
> > > > > - Refine the packed ring definitions in uapi;
> > > > > - Rebase on the net-next tree;
> > > > >
> > > > > RFC v5 -> RFC v6:
> > > > > - Avoid tracking addr/len/flags when DMA API isn't used (MST/Jason);
> > > > > - Define wrap counter as bool (Jason);
> > > > > - Use ALIGN() in vring_init_packed() (Jason);
> > > > > - Avoid using pointer to track `next` in detach_buf_packed() (Jason);
> > > > > - Add comments for barriers (Jason);
> > > > > - Don't enable RING_PACKED on ccw for now (noticed by Jason);
> > > > > - Refine the memory barrier in virtqueue_poll();
> > > > > - Add a missing memory barrier in virtqueue_enable_cb_delayed_packed();
> > > > > - Remove the hacks in virtqueue_enable_cb_prepare_packed();
> > > > >
> > > > > RFC v4 -> RFC v5:
> > > > > - Save DMA addr, etc in desc state (Jason);
> > > > > - Track used wrap counter;
> > > > >
> > > > > RFC v3 -> RFC v4:
> > > > > - Make ID allocation support out-of-order (Jason);
> > > > > - Various fixes for EVENT_IDX support;
> > > > >
> > > > > RFC v2 -> RFC v3:
> > > > > - Split into small patches (Jason);
> > > > > - Add helper virtqueue_use_indirect() (Jason);
> > > > > - Just set id for the last descriptor of a list (Jason);
> > > > > - Calculate the prev in virtqueue_add_packed() (Jason);
> > > > > - Fix/improve desc suppression code (Jason/MST);
> > > > > - Refine the code layout for XXX_split/packed and wrappers (MST);
> > > > > - Fix the comments and API in uapi (MST);
> > > > > - Remove the BUG_ON() for indirect (Jason);
> > > > > - Some other refinements and bug fixes;
> > > > >
> > > > > RFC v1 -> RFC v2:
> > > > > - Add indirect descriptor support - compile test only;
> > > > > - Add event suppression supprt - compile test only;
> > > > > - Move vring_packed_init() out of uapi (Jason, MST);
> > > > > - Merge two loops into one in virtqueue_add_packed() (Jason);
> > > > > - Split vring_unmap_one() for packed ring and split ring (Jason);
> > > > > - Avoid using '%' operator (Jason);
> > > > > - Rename free_head -> next_avail_idx (Jason);
> > > > > - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
> > > > > - Some other refinements and bug fixes;
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Tiwei Bie (5):
> > > > > virtio: add packed ring definitions
> > > > > virtio_ring: support creating packed ring
> > > > > virtio_ring: add packed ring support
> > > > > virtio_ring: add event idx support in packed ring
> > > > > virtio_ring: enable packed ring
> > > > >
> > > > > drivers/s390/virtio/virtio_ccw.c | 14 +
> > > > > drivers/virtio/virtio_ring.c | 1365 ++++++++++++++++++++++------
> > > > > include/linux/virtio_ring.h | 8 +-
> > > > > include/uapi/linux/virtio_config.h | 3 +
> > > > > include/uapi/linux/virtio_ring.h | 43 +
> > > > > 5 files changed, 1157 insertions(+), 276 deletions(-)
> > > > >
> > > > > --
> > > > > 2.18.0
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > >
^ permalink raw reply
* [PATCH v2 02/17] compat_ioctl: move drivers to generic_compat_ioctl_ptrarg
From: Arnd Bergmann @ 2018-09-12 15:01 UTC (permalink / raw)
To: viro
Cc: kvm, Alexander Shishkin, Jarkko Sakkinen, virtualization,
Benjamin Tissoires, linux-mtd, Peter Huewe, linux1394-devel,
devel, Jason Gunthorpe, Marek Vasut, linux-input, Tomas Winkler,
Arnd Bergmann, Jiri Kosina, Artem Bityutskiy, Greg Kroah-Hartman,
linux-usb, linux-kernel, Sudip Mukherjee, Stefan Richter, netdev,
linux-fsdevel, linux-integrity
In-Reply-To: <20180912150142.157913-1-arnd@arndb.de>
Each of these drivers has a copy of the same trivial helper function to
convert the pointer argument and then call the native ioctl handler.
We now have a generic implementation of that, so use it.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
drivers/char/ppdev.c | 12 +---------
drivers/char/tpm/tpm_vtpm_proxy.c | 12 +---------
drivers/firewire/core-cdev.c | 12 +---------
drivers/hid/usbhid/hiddev.c | 11 +--------
drivers/hwtracing/stm/core.c | 12 +---------
drivers/misc/mei/main.c | 22 +----------------
drivers/mtd/ubi/cdev.c | 36 +++-------------------------
drivers/net/tap.c | 12 +---------
drivers/staging/pi433/pi433_if.c | 12 +---------
drivers/usb/core/devio.c | 16 +------------
drivers/vfio/vfio.c | 39 +++----------------------------
drivers/vhost/net.c | 12 +---------
drivers/vhost/scsi.c | 12 +---------
drivers/vhost/test.c | 12 +---------
drivers/vhost/vsock.c | 12 +---------
fs/fat/file.c | 13 +----------
16 files changed, 20 insertions(+), 237 deletions(-)
diff --git a/drivers/char/ppdev.c b/drivers/char/ppdev.c
index 1ae77b41050a..c38a62457cf0 100644
--- a/drivers/char/ppdev.c
+++ b/drivers/char/ppdev.c
@@ -674,14 +674,6 @@ static long pp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
return ret;
}
-#ifdef CONFIG_COMPAT
-static long pp_compat_ioctl(struct file *file, unsigned int cmd,
- unsigned long arg)
-{
- return pp_ioctl(file, cmd, (unsigned long)compat_ptr(arg));
-}
-#endif
-
static int pp_open(struct inode *inode, struct file *file)
{
unsigned int minor = iminor(inode);
@@ -790,9 +782,7 @@ static const struct file_operations pp_fops = {
.write = pp_write,
.poll = pp_poll,
.unlocked_ioctl = pp_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = pp_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.open = pp_open,
.release = pp_release,
};
diff --git a/drivers/char/tpm/tpm_vtpm_proxy.c b/drivers/char/tpm/tpm_vtpm_proxy.c
index 87a0ce47f201..a170f5ca7416 100644
--- a/drivers/char/tpm/tpm_vtpm_proxy.c
+++ b/drivers/char/tpm/tpm_vtpm_proxy.c
@@ -678,20 +678,10 @@ static long vtpmx_fops_ioctl(struct file *f, unsigned int ioctl,
}
}
-#ifdef CONFIG_COMPAT
-static long vtpmx_fops_compat_ioctl(struct file *f, unsigned int ioctl,
- unsigned long arg)
-{
- return vtpmx_fops_ioctl(f, ioctl, (unsigned long)compat_ptr(arg));
-}
-#endif
-
static const struct file_operations vtpmx_fops = {
.owner = THIS_MODULE,
.unlocked_ioctl = vtpmx_fops_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = vtpmx_fops_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.llseek = noop_llseek,
};
diff --git a/drivers/firewire/core-cdev.c b/drivers/firewire/core-cdev.c
index d8e185582642..2acc0c9ddf94 100644
--- a/drivers/firewire/core-cdev.c
+++ b/drivers/firewire/core-cdev.c
@@ -1659,14 +1659,6 @@ static long fw_device_op_ioctl(struct file *file,
return dispatch_ioctl(file->private_data, cmd, (void __user *)arg);
}
-#ifdef CONFIG_COMPAT
-static long fw_device_op_compat_ioctl(struct file *file,
- unsigned int cmd, unsigned long arg)
-{
- return dispatch_ioctl(file->private_data, cmd, compat_ptr(arg));
-}
-#endif
-
static int fw_device_op_mmap(struct file *file, struct vm_area_struct *vma)
{
struct client *client = file->private_data;
@@ -1808,7 +1800,5 @@ const struct file_operations fw_device_ops = {
.mmap = fw_device_op_mmap,
.release = fw_device_op_release,
.poll = fw_device_op_poll,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = fw_device_op_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
};
diff --git a/drivers/hid/usbhid/hiddev.c b/drivers/hid/usbhid/hiddev.c
index 23872d08308c..73a168f97024 100644
--- a/drivers/hid/usbhid/hiddev.c
+++ b/drivers/hid/usbhid/hiddev.c
@@ -845,13 +845,6 @@ static long hiddev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
return r;
}
-#ifdef CONFIG_COMPAT
-static long hiddev_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
-{
- return hiddev_ioctl(file, cmd, (unsigned long)compat_ptr(arg));
-}
-#endif
-
static const struct file_operations hiddev_fops = {
.owner = THIS_MODULE,
.read = hiddev_read,
@@ -861,9 +854,7 @@ static const struct file_operations hiddev_fops = {
.release = hiddev_release,
.unlocked_ioctl = hiddev_ioctl,
.fasync = hiddev_fasync,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = hiddev_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.llseek = noop_llseek,
};
diff --git a/drivers/hwtracing/stm/core.c b/drivers/hwtracing/stm/core.c
index 10bcb5d73f90..3f5cbb948781 100644
--- a/drivers/hwtracing/stm/core.c
+++ b/drivers/hwtracing/stm/core.c
@@ -651,23 +651,13 @@ stm_char_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
return err;
}
-#ifdef CONFIG_COMPAT
-static long
-stm_char_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
-{
- return stm_char_ioctl(file, cmd, (unsigned long)compat_ptr(arg));
-}
-#else
-#define stm_char_compat_ioctl NULL
-#endif
-
static const struct file_operations stm_fops = {
.open = stm_char_open,
.release = stm_char_release,
.write = stm_char_write,
.mmap = stm_char_mmap,
.unlocked_ioctl = stm_char_ioctl,
- .compat_ioctl = stm_char_compat_ioctl,
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.llseek = no_llseek,
};
diff --git a/drivers/misc/mei/main.c b/drivers/misc/mei/main.c
index 4d77a6ae183a..a26b645e432f 100644
--- a/drivers/misc/mei/main.c
+++ b/drivers/misc/mei/main.c
@@ -535,24 +535,6 @@ static long mei_ioctl(struct file *file, unsigned int cmd, unsigned long data)
return rets;
}
-/**
- * mei_compat_ioctl - the compat IOCTL function
- *
- * @file: pointer to file structure
- * @cmd: ioctl command
- * @data: pointer to mei message structure
- *
- * Return: 0 on success , <0 on error
- */
-#ifdef CONFIG_COMPAT
-static long mei_compat_ioctl(struct file *file,
- unsigned int cmd, unsigned long data)
-{
- return mei_ioctl(file, cmd, (unsigned long)compat_ptr(data));
-}
-#endif
-
-
/**
* mei_poll - the poll function
*
@@ -855,9 +837,7 @@ static const struct file_operations mei_fops = {
.owner = THIS_MODULE,
.read = mei_read,
.unlocked_ioctl = mei_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = mei_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.open = mei_open,
.release = mei_release,
.write = mei_write,
diff --git a/drivers/mtd/ubi/cdev.c b/drivers/mtd/ubi/cdev.c
index 22547d7a84ea..2aad1da86acc 100644
--- a/drivers/mtd/ubi/cdev.c
+++ b/drivers/mtd/ubi/cdev.c
@@ -1061,36 +1061,6 @@ static long ctrl_cdev_ioctl(struct file *file, unsigned int cmd,
return err;
}
-#ifdef CONFIG_COMPAT
-static long vol_cdev_compat_ioctl(struct file *file, unsigned int cmd,
- unsigned long arg)
-{
- unsigned long translated_arg = (unsigned long)compat_ptr(arg);
-
- return vol_cdev_ioctl(file, cmd, translated_arg);
-}
-
-static long ubi_cdev_compat_ioctl(struct file *file, unsigned int cmd,
- unsigned long arg)
-{
- unsigned long translated_arg = (unsigned long)compat_ptr(arg);
-
- return ubi_cdev_ioctl(file, cmd, translated_arg);
-}
-
-static long ctrl_cdev_compat_ioctl(struct file *file, unsigned int cmd,
- unsigned long arg)
-{
- unsigned long translated_arg = (unsigned long)compat_ptr(arg);
-
- return ctrl_cdev_ioctl(file, cmd, translated_arg);
-}
-#else
-#define vol_cdev_compat_ioctl NULL
-#define ubi_cdev_compat_ioctl NULL
-#define ctrl_cdev_compat_ioctl NULL
-#endif
-
/* UBI volume character device operations */
const struct file_operations ubi_vol_cdev_operations = {
.owner = THIS_MODULE,
@@ -1101,7 +1071,7 @@ const struct file_operations ubi_vol_cdev_operations = {
.write = vol_cdev_write,
.fsync = vol_cdev_fsync,
.unlocked_ioctl = vol_cdev_ioctl,
- .compat_ioctl = vol_cdev_compat_ioctl,
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
};
/* UBI character device operations */
@@ -1109,13 +1079,13 @@ const struct file_operations ubi_cdev_operations = {
.owner = THIS_MODULE,
.llseek = no_llseek,
.unlocked_ioctl = ubi_cdev_ioctl,
- .compat_ioctl = ubi_cdev_compat_ioctl,
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
};
/* UBI control character device operations */
const struct file_operations ubi_ctrl_cdev_operations = {
.owner = THIS_MODULE,
.unlocked_ioctl = ctrl_cdev_ioctl,
- .compat_ioctl = ctrl_cdev_compat_ioctl,
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.llseek = no_llseek,
};
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index f0f7cd977667..720deb07f2b4 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1124,14 +1124,6 @@ static long tap_ioctl(struct file *file, unsigned int cmd,
}
}
-#ifdef CONFIG_COMPAT
-static long tap_compat_ioctl(struct file *file, unsigned int cmd,
- unsigned long arg)
-{
- return tap_ioctl(file, cmd, (unsigned long)compat_ptr(arg));
-}
-#endif
-
static const struct file_operations tap_fops = {
.owner = THIS_MODULE,
.open = tap_open,
@@ -1141,9 +1133,7 @@ static const struct file_operations tap_fops = {
.poll = tap_poll,
.llseek = no_llseek,
.unlocked_ioctl = tap_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = tap_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
};
static int tap_sendmsg(struct socket *sock, struct msghdr *m,
diff --git a/drivers/staging/pi433/pi433_if.c b/drivers/staging/pi433/pi433_if.c
index c85a805a1243..9e4caf7ad384 100644
--- a/drivers/staging/pi433/pi433_if.c
+++ b/drivers/staging/pi433/pi433_if.c
@@ -945,16 +945,6 @@ pi433_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return retval;
}
-#ifdef CONFIG_COMPAT
-static long
-pi433_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
-{
- return pi433_ioctl(filp, cmd, (unsigned long)compat_ptr(arg));
-}
-#else
-#define pi433_compat_ioctl NULL
-#endif /* CONFIG_COMPAT */
-
/*-------------------------------------------------------------------------*/
static int pi433_open(struct inode *inode, struct file *filp)
@@ -1111,7 +1101,7 @@ static const struct file_operations pi433_fops = {
.write = pi433_write,
.read = pi433_read,
.unlocked_ioctl = pi433_ioctl,
- .compat_ioctl = pi433_compat_ioctl,
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.open = pi433_open,
.release = pi433_release,
.llseek = no_llseek,
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 6ce77b33da61..269e0befba2d 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -2553,18 +2553,6 @@ static long usbdev_ioctl(struct file *file, unsigned int cmd,
return ret;
}
-#ifdef CONFIG_COMPAT
-static long usbdev_compat_ioctl(struct file *file, unsigned int cmd,
- unsigned long arg)
-{
- int ret;
-
- ret = usbdev_do_ioctl(file, cmd, compat_ptr(arg));
-
- return ret;
-}
-#endif
-
/* No kernel lock - fine */
static __poll_t usbdev_poll(struct file *file,
struct poll_table_struct *wait)
@@ -2588,9 +2576,7 @@ const struct file_operations usbdev_file_operations = {
.read = usbdev_read,
.poll = usbdev_poll,
.unlocked_ioctl = usbdev_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = usbdev_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.mmap = usbdev_mmap,
.open = usbdev_open,
.release = usbdev_release,
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 64833879f75d..79f08a99602d 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1200,15 +1200,6 @@ static long vfio_fops_unl_ioctl(struct file *filep,
return ret;
}
-#ifdef CONFIG_COMPAT
-static long vfio_fops_compat_ioctl(struct file *filep,
- unsigned int cmd, unsigned long arg)
-{
- arg = (unsigned long)compat_ptr(arg);
- return vfio_fops_unl_ioctl(filep, cmd, arg);
-}
-#endif /* CONFIG_COMPAT */
-
static int vfio_fops_open(struct inode *inode, struct file *filep)
{
struct vfio_container *container;
@@ -1291,9 +1282,7 @@ static const struct file_operations vfio_fops = {
.read = vfio_fops_read,
.write = vfio_fops_write,
.unlocked_ioctl = vfio_fops_unl_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = vfio_fops_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.mmap = vfio_fops_mmap,
};
@@ -1572,15 +1561,6 @@ static long vfio_group_fops_unl_ioctl(struct file *filep,
return ret;
}
-#ifdef CONFIG_COMPAT
-static long vfio_group_fops_compat_ioctl(struct file *filep,
- unsigned int cmd, unsigned long arg)
-{
- arg = (unsigned long)compat_ptr(arg);
- return vfio_group_fops_unl_ioctl(filep, cmd, arg);
-}
-#endif /* CONFIG_COMPAT */
-
static int vfio_group_fops_open(struct inode *inode, struct file *filep)
{
struct vfio_group *group;
@@ -1636,9 +1616,7 @@ static int vfio_group_fops_release(struct inode *inode, struct file *filep)
static const struct file_operations vfio_group_fops = {
.owner = THIS_MODULE,
.unlocked_ioctl = vfio_group_fops_unl_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = vfio_group_fops_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.open = vfio_group_fops_open,
.release = vfio_group_fops_release,
};
@@ -1703,24 +1681,13 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
return device->ops->mmap(device->device_data, vma);
}
-#ifdef CONFIG_COMPAT
-static long vfio_device_fops_compat_ioctl(struct file *filep,
- unsigned int cmd, unsigned long arg)
-{
- arg = (unsigned long)compat_ptr(arg);
- return vfio_device_fops_unl_ioctl(filep, cmd, arg);
-}
-#endif /* CONFIG_COMPAT */
-
static const struct file_operations vfio_device_fops = {
.owner = THIS_MODULE,
.release = vfio_device_fops_release,
.read = vfio_device_fops_read,
.write = vfio_device_fops_write,
.unlocked_ioctl = vfio_device_fops_unl_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = vfio_device_fops_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.mmap = vfio_device_fops_mmap,
};
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 4e656f89cb22..e9624350f6a5 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1535,14 +1535,6 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
}
}
-#ifdef CONFIG_COMPAT
-static long vhost_net_compat_ioctl(struct file *f, unsigned int ioctl,
- unsigned long arg)
-{
- return vhost_net_ioctl(f, ioctl, (unsigned long)compat_ptr(arg));
-}
-#endif
-
static ssize_t vhost_net_chr_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
struct file *file = iocb->ki_filp;
@@ -1578,9 +1570,7 @@ static const struct file_operations vhost_net_fops = {
.write_iter = vhost_net_chr_write_iter,
.poll = vhost_net_chr_poll,
.unlocked_ioctl = vhost_net_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = vhost_net_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.open = vhost_net_open,
.llseek = noop_llseek,
};
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index c24bb690680b..9180d38de353 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1495,21 +1495,11 @@ vhost_scsi_ioctl(struct file *f,
}
}
-#ifdef CONFIG_COMPAT
-static long vhost_scsi_compat_ioctl(struct file *f, unsigned int ioctl,
- unsigned long arg)
-{
- return vhost_scsi_ioctl(f, ioctl, (unsigned long)compat_ptr(arg));
-}
-#endif
-
static const struct file_operations vhost_scsi_fops = {
.owner = THIS_MODULE,
.release = vhost_scsi_release,
.unlocked_ioctl = vhost_scsi_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = vhost_scsi_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.open = vhost_scsi_open,
.llseek = noop_llseek,
};
diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 40589850eb33..0b185b4712fb 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -298,21 +298,11 @@ static long vhost_test_ioctl(struct file *f, unsigned int ioctl,
}
}
-#ifdef CONFIG_COMPAT
-static long vhost_test_compat_ioctl(struct file *f, unsigned int ioctl,
- unsigned long arg)
-{
- return vhost_test_ioctl(f, ioctl, (unsigned long)compat_ptr(arg));
-}
-#endif
-
static const struct file_operations vhost_test_fops = {
.owner = THIS_MODULE,
.release = vhost_test_release,
.unlocked_ioctl = vhost_test_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = vhost_test_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.open = vhost_test_open,
.llseek = noop_llseek,
};
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 34bc3ab40c6d..83c60f3a9c09 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -699,23 +699,13 @@ static long vhost_vsock_dev_ioctl(struct file *f, unsigned int ioctl,
}
}
-#ifdef CONFIG_COMPAT
-static long vhost_vsock_dev_compat_ioctl(struct file *f, unsigned int ioctl,
- unsigned long arg)
-{
- return vhost_vsock_dev_ioctl(f, ioctl, (unsigned long)compat_ptr(arg));
-}
-#endif
-
static const struct file_operations vhost_vsock_fops = {
.owner = THIS_MODULE,
.open = vhost_vsock_dev_open,
.release = vhost_vsock_dev_release,
.llseek = noop_llseek,
.unlocked_ioctl = vhost_vsock_dev_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = vhost_vsock_dev_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
};
static struct miscdevice vhost_vsock_misc = {
diff --git a/fs/fat/file.c b/fs/fat/file.c
index 4f3d72fb1e60..c52c9e9ca36b 100644
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@ -171,15 +171,6 @@ long fat_generic_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
}
}
-#ifdef CONFIG_COMPAT
-static long fat_generic_compat_ioctl(struct file *filp, unsigned int cmd,
- unsigned long arg)
-
-{
- return fat_generic_ioctl(filp, cmd, (unsigned long)compat_ptr(arg));
-}
-#endif
-
static int fat_file_release(struct inode *inode, struct file *filp)
{
if ((filp->f_mode & FMODE_WRITE) &&
@@ -209,9 +200,7 @@ const struct file_operations fat_file_operations = {
.mmap = generic_file_mmap,
.release = fat_file_release,
.unlocked_ioctl = fat_generic_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = fat_generic_compat_ioctl,
-#endif
+ .compat_ioctl = generic_compat_ioctl_ptrarg,
.fsync = fat_file_fsync,
.splice_read = generic_file_splice_read,
.fallocate = fat_fallocate,
--
2.18.0
^ permalink raw reply related
* Re: [PATCH net-next v2 2/5] virtio_ring: support creating packed ring
From: Michael S. Tsirkin @ 2018-09-12 16:12 UTC (permalink / raw)
To: Tiwei Bie; +Cc: virtio-dev, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180912075140.GA22545@debian>
On Wed, Sep 12, 2018 at 03:51:40PM +0800, Tiwei Bie wrote:
> On Mon, Sep 10, 2018 at 10:28:37AM +0800, Tiwei Bie wrote:
> > On Fri, Sep 07, 2018 at 10:03:24AM -0400, Michael S. Tsirkin wrote:
> > > On Wed, Jul 11, 2018 at 10:27:08AM +0800, Tiwei Bie wrote:
> > > > This commit introduces the support for creating packed ring.
> > > > All split ring specific functions are added _split suffix.
> > > > Some necessary stubs for packed ring are also added.
> > > >
> > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > >
> [...]
> > > > +
> > > > +/*
> > > > + * The layout for the packed ring is a continuous chunk of memory
> > > > + * which looks like this.
> > > > + *
> > > > + * struct vring_packed {
> > > > + * // The actual descriptors (16 bytes each)
> > > > + * struct vring_packed_desc desc[num];
> > > > + *
> > > > + * // Padding to the next align boundary.
> > > > + * char pad[];
> > > > + *
> > > > + * // Driver Event Suppression
> > > > + * struct vring_packed_desc_event driver;
> > > > + *
> > > > + * // Device Event Suppression
> > > > + * struct vring_packed_desc_event device;
> > > > + * };
> > > > + */
> > >
> > > Why not just allocate event structures separately?
> > > Is it a win to have them share a cache line for some reason?
>
> Users may call vring_new_virtqueue() with preallocated
> memory to setup the ring, e.g.:
>
> https://github.com/torvalds/linux/blob/11da3a7f84f1/drivers/s390/virtio/virtio_ccw.c#L513-L522
> https://github.com/torvalds/linux/blob/11da3a7f84f1/drivers/misc/mic/vop/vop_main.c#L306-L307
>
> Below is the corresponding definition in split ring:
>
> https://github.com/oasis-tcs/virtio-spec/blob/89dd55f5e606/split-ring.tex#L64-L78
>
> If my understanding is correct, this is just for legacy
> interfaces, and we won't define layout in packed ring
> and don't need to support vring_new_virtqueue() in packed
> ring. Is it correct? Thanks!
mic doesn't support 1.0 yet but ccw does.
It's probably best to look into converting ccw to
virtio_create_virtqueue, then you can just fail
vring_new_virtqueue for packed.
Cornelia, are there any gotchas to look out for?
>
>
> >
> > Will do that.
> >
> > >
> > > > +static inline void vring_init_packed(struct vring_packed *vr, unsigned int num,
> > > > + void *p, unsigned long align)
> > > > +{
> > > > + vr->num = num;
> > > > + vr->desc = p;
> > > > + vr->driver = (void *)ALIGN(((uintptr_t)p +
> > > > + sizeof(struct vring_packed_desc) * num), align);
> > > > + vr->device = vr->driver + 1;
> > > > +}
> > >
> > > What's all this about alignment? Where does it come from?
> >
> > It comes from the `vring_align` parameter of vring_create_virtqueue()
> > and vring_new_virtqueue():
> >
> > https://github.com/torvalds/linux/blob/a49a9dcce802/drivers/virtio/virtio_ring.c#L1061
> > https://github.com/torvalds/linux/blob/a49a9dcce802/drivers/virtio/virtio_ring.c#L1123
> >
> > Should I just ignore it in packed ring?
> >
> > CCW defined this:
> >
> > #define KVM_VIRTIO_CCW_RING_ALIGN 4096
> >
> > I'm not familiar with CCW. Currently, in this patch set, packed ring
> > isn't enabled on CCW dues to some legacy accessors are not implemented
> > in packed ring yet.
> >
> > >
> > > > +
> > > > +static inline unsigned vring_size_packed(unsigned int num, unsigned long align)
> > > > +{
> > > > + return ((sizeof(struct vring_packed_desc) * num + align - 1)
> > > > + & ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
> > > > +}
> > [...]
> > > > @@ -1129,10 +1388,17 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
> > > > void (*callback)(struct virtqueue *vq),
> > > > const char *name)
> > > > {
> > > > - struct vring vring;
> > > > - vring_init(&vring, num, pages, vring_align);
> > > > - return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> > > > - notify, callback, name);
> > > > + union vring_union vring;
> > > > + bool packed;
> > > > +
> > > > + packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> > > > + if (packed)
> > > > + vring_init_packed(&vring.vring_packed, num, pages, vring_align);
> > > > + else
> > > > + vring_init(&vring.vring_split, num, pages, vring_align);
> > >
> > >
> > > vring_init in the UAPI header is more or less a bug.
> > > I'd just stop using it, keep it around for legacy userspace.
> >
> > Got it. I'd like to do that. Thanks.
> >
> > >
> > > > +
> > > > + return __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> > > > + context, notify, callback, name);
> > > > }
> > > > EXPORT_SYMBOL_GPL(vring_new_virtqueue);
> > > >
> > > > @@ -1142,7 +1408,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
> > > >
> > > > if (vq->we_own_ring) {
> > > > vring_free_queue(vq->vq.vdev, vq->queue_size_in_bytes,
> > > > - vq->vring.desc, vq->queue_dma_addr);
> > > > + vq->packed ? (void *)vq->vring_packed.desc :
> > > > + (void *)vq->vring.desc,
> > > > + vq->queue_dma_addr);
> > > > }
> > > > list_del(&_vq->list);
> > > > kfree(vq);
> > > > @@ -1184,7 +1452,7 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
> > > >
> > > > struct vring_virtqueue *vq = to_vvq(_vq);
> > > >
> > > > - return vq->vring.num;
> > > > + return vq->packed ? vq->vring_packed.num : vq->vring.num;
> > > > }
> > > > EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
> > > >
> > > > @@ -1227,6 +1495,10 @@ dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
> > > >
> > > > BUG_ON(!vq->we_own_ring);
> > > >
> > > > + if (vq->packed)
> > > > + return vq->queue_dma_addr + ((char *)vq->vring_packed.driver -
> > > > + (char *)vq->vring_packed.desc);
> > > > +
> > > > return vq->queue_dma_addr +
> > > > ((char *)vq->vring.avail - (char *)vq->vring.desc);
> > > > }
> > > > @@ -1238,11 +1510,16 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
> > > >
> > > > BUG_ON(!vq->we_own_ring);
> > > >
> > > > + if (vq->packed)
> > > > + return vq->queue_dma_addr + ((char *)vq->vring_packed.device -
> > > > + (char *)vq->vring_packed.desc);
> > > > +
> > > > return vq->queue_dma_addr +
> > > > ((char *)vq->vring.used - (char *)vq->vring.desc);
> > > > }
> > > > EXPORT_SYMBOL_GPL(virtqueue_get_used_addr);
> > > >
> > > > +/* Only available for split ring */
> > > > const struct vring *virtqueue_get_vring(struct virtqueue *vq)
> > > > {
> > > > return &to_vvq(vq)->vring;
> > > > diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> > > > index fab02133a919..992142b35f55 100644
> > > > --- a/include/linux/virtio_ring.h
> > > > +++ b/include/linux/virtio_ring.h
> > > > @@ -60,6 +60,11 @@ static inline void virtio_store_mb(bool weak_barriers,
> > > > struct virtio_device;
> > > > struct virtqueue;
> > > >
> > > > +union vring_union {
> > > > + struct vring vring_split;
> > > > + struct vring_packed vring_packed;
> > > > +};
> > > > +
> > > > /*
> > > > * Creates a virtqueue and allocates the descriptor ring. If
> > > > * may_reduce_num is set, then this may allocate a smaller ring than
> > > > @@ -79,7 +84,8 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
> > > >
> > > > /* Creates a virtqueue with a custom layout. */
> > > > struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > - struct vring vring,
> > > > + union vring_union vring,
> > > > + bool packed,
> > > > struct virtio_device *vdev,
> > > > bool weak_barriers,
> > > > bool ctx,
> > > > --
> > > > 2.18.0
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox