* [PATCH 00/23] drm: Eliminate plane->fb/crtc usage for atomic drivers
From: Ville Syrjala @ 2018-03-22 15:22 UTC (permalink / raw)
To: dri-devel
Cc: David Airlie, Daniel Vetter, chris, Eric Anholt,
Benjamin Gaignard, Dave Airlie, Boris Brezillon, Thomas Hellstrom,
Joonyoung Shim, Sinclair Yeh, Rob Clark, amd-gfx, martin.peres,
VMware Graphics, Harry Wentland, David (ChunMing) Zhou,
linux-arm-msm, intel-gfx, Maarten Lankhorst, Inki Dae,
virtualization, Vincent Abriou, Shawn Guo
From: Ville Syrjälä <ville.syrjala@linux.intel.com>
I really just wanted to fix i915 to re-enable its planes afer load
detection (a two line patch). This is what I actually ended up with
after I ran into a framebuffer refcount leak with said two line patch.
I've tested this on a few i915 boxes and so far it's looking
good. Everything else is just compile tested.
Entire series available here:
git://github.com/vsyrjala/linux.git plane_fb_crtc_nuke
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: chris@chris-wilson.co.uk
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@gmail.com>
Cc: David Airlie <airlied@linux.ie>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: freedreno@lists.freedesktop.org
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: linux-arm-msm@vger.kernel.org
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: martin.peres@free.fr
Cc: Rob Clark <robdclark@gmail.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sinclair Yeh <syeh@vmware.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Vincent Abriou <vincent.abriou@st.com>
Cc: virtualization@lists.linux-foundation.org
Cc: VMware Graphics <linux-graphics-maintainer@vmware.com>
Ville Syrjälä (23):
Revert "drm/atomic-helper: Fix leak in disable_all"
drm/atomic-helper: Make drm_atomic_helper_disable_all() update the
plane->fb pointers
drm: Clear crtc->primary->crtc when disabling the crtc via setcrtc()
drm/atomic-helper: WARN if legacy plane fb pointers are bogus when
committing duplicated state
drm: Add local 'plane' variable for primary/cursor planes
drm: Adjust whitespace for legibility
drm: Make the fb refcount handover less magic
drm: Use plane->state->fb over plane->fb
drm/i915: Stop consulting plane->fb
drm/msm: Stop consulting plane->fb
drm/sti: Stop consulting plane->fb
drm/vmwgfx: Stop consulting plane->fb
drm/zte: Stop consulting plane->fb
drm/atmel-hlcdc: Stop using plane->fb
drm: Stop updating plane->crtc/fb/old_fb on atomic drivers
drm/amdgpu/dc: Stop updating plane->fb
drm/i915: Stop updating plane->fb/crtc
drm/exynos: Stop updating plane->crtc
drm/msm: Stop updating plane->fb/crtc
drm/virtio: Stop updating plane->fb
drm/vc4: Stop updating plane->fb/crtc
drm/i915: Restore planes after load detection
drm/i915: Make force_load_detect effective even w/ DMI quirks/hotplug
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 -
drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c | 12 +---
drivers/gpu/drm/drm_atomic.c | 55 ++--------------
drivers/gpu/drm/drm_atomic_helper.c | 79 ++++++++++-------------
drivers/gpu/drm/drm_crtc.c | 51 ++++++++++-----
drivers/gpu/drm/drm_fb_helper.c | 7 --
drivers/gpu/drm/drm_framebuffer.c | 5 --
drivers/gpu/drm/drm_plane.c | 64 +++++++++++-------
drivers/gpu/drm/drm_plane_helper.c | 4 +-
drivers/gpu/drm/exynos/exynos_drm_plane.c | 2 -
drivers/gpu/drm/i915/intel_crt.c | 6 ++
drivers/gpu/drm/i915/intel_display.c | 9 +--
drivers/gpu/drm/i915/intel_fbdev.c | 2 +-
drivers/gpu/drm/msm/disp/mdp4/mdp4_crtc.c | 3 +-
drivers/gpu/drm/msm/disp/mdp4/mdp4_plane.c | 2 -
drivers/gpu/drm/msm/disp/mdp5/mdp5_plane.c | 2 -
drivers/gpu/drm/sti/sti_plane.c | 9 +--
drivers/gpu/drm/vc4/vc4_crtc.c | 3 -
drivers/gpu/drm/virtio/virtgpu_display.c | 2 -
drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 6 +-
drivers/gpu/drm/zte/zx_vou.c | 2 +-
include/drm/drm_atomic.h | 3 -
22 files changed, 143 insertions(+), 187 deletions(-)
--
2.16.1
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* [PATCH 20/23] drm/virtio: Stop updating plane->fb
From: Ville Syrjala @ 2018-03-22 15:23 UTC (permalink / raw)
To: dri-devel; +Cc: David Airlie, intel-gfx, virtualization
In-Reply-To: <20180322152313.6561-1-ville.syrjala@linux.intel.com>
From: Ville Syrjälä <ville.syrjala@linux.intel.com>
We want to get rid of plane->fb on atomic drivers. Stop setting it.
Cc: David Airlie <airlied@linux.ie>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
drivers/gpu/drm/virtio/virtgpu_display.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/gpu/drm/virtio/virtgpu_display.c b/drivers/gpu/drm/virtio/virtgpu_display.c
index 8cc8c34d67f5..42e842ceb53c 100644
--- a/drivers/gpu/drm/virtio/virtgpu_display.c
+++ b/drivers/gpu/drm/virtio/virtgpu_display.c
@@ -302,8 +302,6 @@ static int vgdev_output_init(struct virtio_gpu_device *vgdev, int index)
drm_crtc_init_with_planes(dev, crtc, primary, cursor,
&virtio_gpu_crtc_funcs, NULL);
drm_crtc_helper_add(crtc, &virtio_gpu_crtc_helper_funcs);
- primary->crtc = crtc;
- cursor->crtc = crtc;
drm_connector_init(dev, connector, &virtio_gpu_connector_funcs,
DRM_MODE_CONNECTOR_VIRTUAL);
--
2.16.1
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply related
* Re: [PATCH 20/23] drm/virtio: Stop updating plane->fb
From: Gerd Hoffmann @ 2018-03-22 16:11 UTC (permalink / raw)
To: Ville Syrjala; +Cc: David Airlie, intel-gfx, dri-devel, virtualization
In-Reply-To: <20180322152313.6561-21-ville.syrjala@linux.intel.com>
> We want to get rid of plane->fb on atomic drivers. Stop setting it.
> - primary->crtc = crtc;
> - cursor->crtc = crtc;
commit msg vs. patch content mismatch here ...
cheers,
Gerd
^ permalink raw reply
* Re: [PATCH 00/23] drm: Eliminate plane->fb/crtc usage for atomic drivers
From: Noralf Trønnes @ 2018-03-22 16:51 UTC (permalink / raw)
To: Ville Syrjala, dri-devel
Cc: Boris Brezillon, Thomas Hellstrom, David Airlie, Daniel Vetter,
intel-gfx, Seung-Woo Kim, amd-gfx, virtualization, Kyungmin Park,
VMware Graphics, linux-arm-msm, Alex Deucher, Shawn Guo,
freedreno, Vincent Abriou, Christian König
In-Reply-To: <20180322152313.6561-1-ville.syrjala@linux.intel.com>
Den 22.03.2018 16.22, skrev Ville Syrjala:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>
> I really just wanted to fix i915 to re-enable its planes afer load
> detection (a two line patch). This is what I actually ended up with
> after I ran into a framebuffer refcount leak with said two line patch.
>
> I've tested this on a few i915 boxes and so far it's looking
> good. Everything else is just compile tested.
>
> Entire series available here:
> git://github.com/vsyrjala/linux.git plane_fb_crtc_nuke
>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: amd-gfx@lists.freedesktop.org
> Cc: Benjamin Gaignard <benjamin.gaignard@linaro.org>
> Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
> Cc: chris@chris-wilson.co.uk
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Dave Airlie <airlied@gmail.com>
> Cc: David Airlie <airlied@linux.ie>
> Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
> Cc: Eric Anholt <eric@anholt.net>
> Cc: freedreno@lists.freedesktop.org
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Cc: Harry Wentland <harry.wentland@amd.com>
> Cc: Inki Dae <inki.dae@samsung.com>
> Cc: Joonyoung Shim <jy0922.shim@samsung.com>
> Cc: Kyungmin Park <kyungmin.park@samsung.com>
> Cc: linux-arm-msm@vger.kernel.org
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: martin.peres@free.fr
> Cc: Rob Clark <robdclark@gmail.com>
> Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
> Cc: Shawn Guo <shawnguo@kernel.org>
> Cc: Sinclair Yeh <syeh@vmware.com>
> Cc: Thomas Hellstrom <thellstrom@vmware.com>
> Cc: Vincent Abriou <vincent.abriou@st.com>
> Cc: virtualization@lists.linux-foundation.org
> Cc: VMware Graphics <linux-graphics-maintainer@vmware.com>
>
> Ville Syrjälä (23):
> Revert "drm/atomic-helper: Fix leak in disable_all"
> drm/atomic-helper: Make drm_atomic_helper_disable_all() update the
> plane->fb pointers
> drm: Clear crtc->primary->crtc when disabling the crtc via setcrtc()
> drm/atomic-helper: WARN if legacy plane fb pointers are bogus when
> committing duplicated state
> drm: Add local 'plane' variable for primary/cursor planes
> drm: Adjust whitespace for legibility
> drm: Make the fb refcount handover less magic
> drm: Use plane->state->fb over plane->fb
> drm/i915: Stop consulting plane->fb
> drm/msm: Stop consulting plane->fb
> drm/sti: Stop consulting plane->fb
> drm/vmwgfx: Stop consulting plane->fb
> drm/zte: Stop consulting plane->fb
> drm/atmel-hlcdc: Stop using plane->fb
> drm: Stop updating plane->crtc/fb/old_fb on atomic drivers
> drm/amdgpu/dc: Stop updating plane->fb
> drm/i915: Stop updating plane->fb/crtc
> drm/exynos: Stop updating plane->crtc
> drm/msm: Stop updating plane->fb/crtc
> drm/virtio: Stop updating plane->fb
> drm/vc4: Stop updating plane->fb/crtc
> drm/i915: Restore planes after load detection
> drm/i915: Make force_load_detect effective even w/ DMI quirks/hotplug
tinydrm is also using plane->fb:
$ grep -r "plane\.fb" drivers/gpu/drm/tinydrm/
drivers/gpu/drm/tinydrm/repaper.c: if (tdev->pipe.plane.fb != fb)
drivers/gpu/drm/tinydrm/mipi-dbi.c: if (tdev->pipe.plane.fb != fb)
drivers/gpu/drm/tinydrm/mipi-dbi.c: struct drm_framebuffer *fb =
mipi->tinydrm.pipe.plane.fb;
drivers/gpu/drm/tinydrm/core/tinydrm-pipe.c: pipe->plane.fb = fb;
drivers/gpu/drm/tinydrm/ili9225.c: if (tdev->pipe.plane.fb != fb)
drivers/gpu/drm/tinydrm/st7586.c: if (tdev->pipe.plane.fb != fb)
Noralf.
> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 -
> drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c | 12 +---
> drivers/gpu/drm/drm_atomic.c | 55 ++--------------
> drivers/gpu/drm/drm_atomic_helper.c | 79 ++++++++++-------------
> drivers/gpu/drm/drm_crtc.c | 51 ++++++++++-----
> drivers/gpu/drm/drm_fb_helper.c | 7 --
> drivers/gpu/drm/drm_framebuffer.c | 5 --
> drivers/gpu/drm/drm_plane.c | 64 +++++++++++-------
> drivers/gpu/drm/drm_plane_helper.c | 4 +-
> drivers/gpu/drm/exynos/exynos_drm_plane.c | 2 -
> drivers/gpu/drm/i915/intel_crt.c | 6 ++
> drivers/gpu/drm/i915/intel_display.c | 9 +--
> drivers/gpu/drm/i915/intel_fbdev.c | 2 +-
> drivers/gpu/drm/msm/disp/mdp4/mdp4_crtc.c | 3 +-
> drivers/gpu/drm/msm/disp/mdp4/mdp4_plane.c | 2 -
> drivers/gpu/drm/msm/disp/mdp5/mdp5_plane.c | 2 -
> drivers/gpu/drm/sti/sti_plane.c | 9 +--
> drivers/gpu/drm/vc4/vc4_crtc.c | 3 -
> drivers/gpu/drm/virtio/virtgpu_display.c | 2 -
> drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 6 +-
> drivers/gpu/drm/zte/zx_vou.c | 2 +-
> include/drm/drm_atomic.h | 3 -
> 22 files changed, 143 insertions(+), 187 deletions(-)
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* [PATCH v2 20/23] drm/virtio: Stop updating plane->crtc
From: Ville Syrjala @ 2018-03-22 17:40 UTC (permalink / raw)
To: dri-devel; +Cc: David Airlie, intel-gfx, virtualization
In-Reply-To: <20180322152313.6561-21-ville.syrjala@linux.intel.com>
From: Ville Syrjälä <ville.syrjala@linux.intel.com>
We want to get rid of plane->crtc on atomic drivers. Stop setting it.
v2: s/fb/crtc/ in the commit message (Gerd)
Cc: David Airlie <airlied@linux.ie>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
drivers/gpu/drm/virtio/virtgpu_display.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/gpu/drm/virtio/virtgpu_display.c b/drivers/gpu/drm/virtio/virtgpu_display.c
index 8cc8c34d67f5..42e842ceb53c 100644
--- a/drivers/gpu/drm/virtio/virtgpu_display.c
+++ b/drivers/gpu/drm/virtio/virtgpu_display.c
@@ -302,8 +302,6 @@ static int vgdev_output_init(struct virtio_gpu_device *vgdev, int index)
drm_crtc_init_with_planes(dev, crtc, primary, cursor,
&virtio_gpu_crtc_funcs, NULL);
drm_crtc_helper_add(crtc, &virtio_gpu_crtc_helper_funcs);
- primary->crtc = crtc;
- cursor->crtc = crtc;
drm_connector_init(dev, connector, &virtio_gpu_connector_funcs,
DRM_MODE_CONNECTOR_VIRTUAL);
--
2.16.1
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply related
* Re: [PATCH 00/23] drm: Eliminate plane->fb/crtc usage for atomic drivers
From: Emil Velikov @ 2018-03-22 17:54 UTC (permalink / raw)
To: Ville Syrjala
Cc: Boris Brezillon, Thomas Hellstrom, amd-gfx mailing list,
David Airlie, Daniel Vetter, intel-gfx, Seung-Woo Kim,
ML dri-devel, open list:VIRTIO GPU DRIVER, Kyungmin Park,
VMware Graphics, linux-arm-msm, Alex Deucher, Shawn Guo,
freedreno, Vincent Abriou, Christian König
In-Reply-To: <20180322152313.6561-1-ville.syrjala@linux.intel.com>
Hi Ville,
On 22 March 2018 at 15:22, Ville Syrjala <ville.syrjala@linux.intel.com> wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>
> I really just wanted to fix i915 to re-enable its planes afer load
> detection (a two line patch). This is what I actually ended up with
> after I ran into a framebuffer refcount leak with said two line patch.
>
> I've tested this on a few i915 boxes and so far it's looking
> good. Everything else is just compile tested.
>
Mostly thinking out loud:
Wondering if one cannot somehow (re)move plane->fb/crtc altogether.
Otherwise drivers will reintroduce similar code, despite the WARNs and
beefy documentation :-\
HTH
Emil
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH 00/23] drm: Eliminate plane->fb/crtc usage for atomic drivers
From: Emil Velikov @ 2018-03-22 18:34 UTC (permalink / raw)
To: Harry Wentland
Cc: Boris Brezillon, Thomas Hellstrom, David Airlie, Daniel Vetter,
intel-gfx, Seung-Woo Kim, amd-gfx mailing list,
open list:VIRTIO GPU DRIVER, Vincent Abriou, Kyungmin Park,
VMware Graphics, ML dri-devel, linux-arm-msm, Alex Deucher,
freedreno, Shawn Guo, Christian König, Ville Syrjala
In-Reply-To: <a0d3ec32-be68-c96e-607c-0af4a97a3dd6@amd.com>
On 22 March 2018 at 18:03, Harry Wentland <harry.wentland@amd.com> wrote:
> On 2018-03-22 01:54 PM, Emil Velikov wrote:
>> Hi Ville,
>>
>> On 22 March 2018 at 15:22, Ville Syrjala <ville.syrjala@linux.intel.com> wrote:
>>> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>>
>>> I really just wanted to fix i915 to re-enable its planes afer load
>>> detection (a two line patch). This is what I actually ended up with
>>> after I ran into a framebuffer refcount leak with said two line patch.
>>>
>>> I've tested this on a few i915 boxes and so far it's looking
>>> good. Everything else is just compile tested.
>>>
>> Mostly thinking out loud:
>>
>> Wondering if one cannot somehow (re)move plane->fb/crtc altogether.
>> Otherwise drivers will reintroduce similar code, despite the WARNs and
>> beefy documentation :-\
>
> Wouldn't that require an atomic conversion of all remaining drivers?
>
That or maybe move into plane->legacy->{fb,crtc}. Feel free to swap
'legacy' with flashier name.
Hmm back in 2015 we had a GSoC that updated BOCHS and CIRRUS drivers,
but they never got merged.
Don't recall the details - from memory the conversion seemed fine, but
there was either shortage on review/other.
Might be worth reviving that... regardless it's getting a bit off-topic.
-Emil
[1] https://www.google-melange.com/archive/gsoc/2015/orgs/xorg/projects/johnhunter.html
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH 00/23] drm: Eliminate plane->fb/crtc usage for atomic drivers
From: Ville Syrjälä @ 2018-03-22 18:49 UTC (permalink / raw)
To: Noralf Trønnes
Cc: Boris Brezillon, Thomas Hellstrom, amd-gfx, David Airlie,
Daniel Vetter, intel-gfx, Seung-Woo Kim, dri-devel,
virtualization, Kyungmin Park, VMware Graphics, linux-arm-msm,
Alex Deucher, Shawn Guo, freedreno, Vincent Abriou,
Christian König
In-Reply-To: <b92fe986-295c-fecb-dca1-82cb9bf7b7b1@tronnes.org>
On Thu, Mar 22, 2018 at 05:51:35PM +0100, Noralf Trønnes wrote:
> tinydrm is also using plane->fb:
>
> $ grep -r "plane\.fb" drivers/gpu/drm/tinydrm/
> drivers/gpu/drm/tinydrm/repaper.c: if (tdev->pipe.plane.fb != fb)
> drivers/gpu/drm/tinydrm/mipi-dbi.c: if (tdev->pipe.plane.fb != fb)
> drivers/gpu/drm/tinydrm/mipi-dbi.c: struct drm_framebuffer *fb =
> mipi->tinydrm.pipe.plane.fb;
Oh dear, and naturally it's the most annoying one of the bunch. So we
want to take the plane lock in the fb.dirty() hook to look at the fb,
but mipi-dbi.c calls that directly from the bowels of its
.atomic_enable() hook. So that would deadlock pretty neatly if the
commit happens while already holding the lock.
So we'd either need to plumb an acquire context into fb.dirty(),
or maybe have tinydrm provide a lower level lockless dirty() hook
that gets called by both (there are just too many layers in this
particular cake to immediately see where such a hook would be best
placed). Or maybe there's some other solution I didn't think of.
Looking at this I'm also left wondering how this is supposed
handle fb.dirty() getting called concurrently with a modeset.
The dirty_mutex only seems to offer any protection for
fb.dirty() against another fb.dirty()...
--
Ville Syrjälä
Intel OTC
^ permalink raw reply
* Re: [PATCH 00/23] drm: Eliminate plane->fb/crtc usage for atomic drivers
From: Noralf Trønnes @ 2018-03-22 23:28 UTC (permalink / raw)
To: Ville Syrjälä
Cc: Boris Brezillon, Thomas Hellstrom, amd-gfx, David Airlie,
Daniel Vetter, intel-gfx, Seung-Woo Kim, dri-devel,
virtualization, Kyungmin Park, VMware Graphics, linux-arm-msm,
Alex Deucher, Shawn Guo, freedreno, Vincent Abriou,
Christian König
In-Reply-To: <20180322184912.GM5453@intel.com>
Den 22.03.2018 19.49, skrev Ville Syrjälä:
> On Thu, Mar 22, 2018 at 05:51:35PM +0100, Noralf Trønnes wrote:
>> tinydrm is also using plane->fb:
>>
>> $ grep -r "plane\.fb" drivers/gpu/drm/tinydrm/
>> drivers/gpu/drm/tinydrm/repaper.c: if (tdev->pipe.plane.fb != fb)
>> drivers/gpu/drm/tinydrm/mipi-dbi.c: if (tdev->pipe.plane.fb != fb)
>> drivers/gpu/drm/tinydrm/mipi-dbi.c: struct drm_framebuffer *fb =
>> mipi->tinydrm.pipe.plane.fb;
> Oh dear, and naturally it's the most annoying one of the bunch. So we
> want to take the plane lock in the fb.dirty() hook to look at the fb,
> but mipi-dbi.c calls that directly from the bowels of its
> .atomic_enable() hook. So that would deadlock pretty neatly if the
> commit happens while already holding the lock.
>
> So we'd either need to plumb an acquire context into fb.dirty(),
> or maybe have tinydrm provide a lower level lockless dirty() hook
> that gets called by both (there are just too many layers in this
> particular cake to immediately see where such a hook would be best
> placed). Or maybe there's some other solution I didn't think of.
>
> Looking at this I'm also left wondering how this is supposed
> handle fb.dirty() getting called concurrently with a modeset.
> The dirty_mutex only seems to offer any protection for
> fb.dirty() against another fb.dirty()...
>
I have been waiting for the 'page-flip with damage'[1] work to come to
fruition so I could handle all flushing during atomic commit.
The idea is to use the same damage rect for a generic dirtyfb callback.
Maybe a temporary tinydrm specific solution is possible until that work
lands, but I don't know enough about how to implement such a dirtyfb
callback.
I imagine it could look something like this:
struct tinydrm_device {
+ struct drm_clip_rect dirtyfb_rect;
};
static int tinydrm_fb_dirty(struct drm_framebuffer *fb,
struct drm_file *file_priv,
unsigned int flags, unsigned int color,
struct drm_clip_rect *clips,
unsigned int num_clips)
{
struct tinydrm_device *tdev = fb->dev->dev_private;
struct drm_framebuffer *planefb;
/* Grab whatever lock needed to check this */
planefb = tdev->pipe.plane.state->fb;
/* fbdev can flush even when we're not interested */
if (fb != planefb)
return 0;
/* Protect dirtyfb_rect with a lock */
tinydrm_merge_clips(&tdev->dirty_rectfb, clips, num_clips, flags,
fb->width, fb->height);
/*
* Somehow do an atomic commit that results in the atomic update
* callback being called
*/
ret = drm_atomic_commit(state);
...
}
static void mipi_dbi_flush(struct drm_framebuffer *fb,
struct drm_clip_rect *clip)
{
/* Flush out framebuffer */
}
void mipi_dbi_pipe_update(struct drm_simple_display_pipe *pipe,
struct drm_plane_state *old_state)
{
struct tinydrm_device *tdev = pipe_to_tinydrm(pipe);
struct mipi_dbi *mipi = mipi_dbi_from_tinydrm(tdev);
struct drm_framebuffer *fb = pipe->plane.state->fb;
struct drm_crtc *crtc = &tdev->pipe.crtc;
if (!fb || (fb == old_state->fb && !tdev->dirty_rect.x2))
goto out_vblank;
/* Don't flush if the controller isn't initialized yet */
if (!mipi->enabled)
goto out_vblank;
if (fb != old_state->fb) { /* Page flip or new, flush all */
mipi_dbi_flush(fb, NULL);
} else { /* dirtyfb ioctl */
mipi_dbi_flush(fb, &tdev->dirtyfb_rect);
memset(&tdev->dirtyfb_rect, 0, sizeof(tdev->dirtyfb_rect));
}
out_vblank:
if (crtc->state->event) {
spin_lock_irq(&crtc->dev->event_lock);
drm_crtc_send_vblank_event(crtc, crtc->state->event);
spin_unlock_irq(&crtc->dev->event_lock);
crtc->state->event = NULL;
}
}
This is called from the driver pipe enable callback after the controller
has been initialized:
void mipi_dbi_pipe_enable(struct drm_simple_display_pipe *pipe,
struct drm_crtc_state *crtc_state)
{
struct tinydrm_device *tdev = pipe_to_tinydrm(pipe);
struct mipi_dbi *mipi = mipi_dbi_from_tinydrm(tdev);
- struct drm_framebuffer *fb = pipe->plane.fb;
+ struct drm_framebuffer *fb = pipe->plane.state->fb;
DRM_DEBUG_KMS("\n");
mipi->enabled = true;
- if (fb)
- fb->funcs->dirty(fb, NULL, 0, 0, NULL, 0);
+ mipi_dbi_flush(fb, NULL);
tinydrm_enable_backlight(mipi->backlight);
}
I can make patches if this makes any sense and you can help me with the
dirtyfb implementation.
Noralf.
[1]
https://lists.freedesktop.org/archives/dri-devel/2017-December/161003.html
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [virtio-dev] Re: [PATCH v2] virtio_balloon: export hugetlb page allocation counts
From: Jason Wang @ 2018-03-23 2:38 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: virtio-dev, virtualization, linux-kernel
In-Reply-To: <20180322050842-mutt-send-email-mst@kernel.org>
On 2018年03月22日 11:10, Michael S. Tsirkin wrote:
> On Thu, Mar 22, 2018 at 09:52:18AM +0800, Jason Wang wrote:
>> On 2018年03月20日 12:26, Jonathan Helman wrote:
>>>> On Mar 19, 2018, at 7:31 PM, Jason Wang<jasowang@redhat.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2018年03月20日 06:14, Jonathan Helman wrote:
>>>>> Export the number of successful and failed hugetlb page
>>>>> allocations via the virtio balloon driver. These 2 counts
>>>>> come directly from the vm_events HTLB_BUDDY_PGALLOC and
>>>>> HTLB_BUDDY_PGALLOC_FAIL.
>>>>>
>>>>> Signed-off-by: Jonathan Helman<jonathan.helman@oracle.com>
>>>> Reviewed-by: Jason Wang<jasowang@redhat.com>
>>> Thanks.
>>>
>>>>> ---
>>>>> drivers/virtio/virtio_balloon.c | 6 ++++++
>>>>> include/uapi/linux/virtio_balloon.h | 4 +++-
>>>>> 2 files changed, 9 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
>>>>> index dfe5684..6b237e3 100644
>>>>> --- a/drivers/virtio/virtio_balloon.c
>>>>> +++ b/drivers/virtio/virtio_balloon.c
>>>>> @@ -272,6 +272,12 @@ static unsigned int update_balloon_stats(struct virtio_balloon *vb)
>>>>> pages_to_bytes(events[PSWPOUT]));
>>>>> update_stat(vb, idx++, VIRTIO_BALLOON_S_MAJFLT, events[PGMAJFAULT]);
>>>>> update_stat(vb, idx++, VIRTIO_BALLOON_S_MINFLT, events[PGFAULT]);
>>>>> +#ifdef CONFIG_HUGETLB_PAGE
>>>>> + update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGALLOC,
>>>>> + events[HTLB_BUDDY_PGALLOC]);
>>>>> + update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGFAIL,
>>>>> + events[HTLB_BUDDY_PGALLOC_FAIL]);
>>>>> +#endif
>>>>> #endif
>>>>> update_stat(vb, idx++, VIRTIO_BALLOON_S_MEMFREE,
>>>>> pages_to_bytes(i.freeram));
>>>>> diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
>>>>> index 4e8b830..40297a3 100644
>>>>> --- a/include/uapi/linux/virtio_balloon.h
>>>>> +++ b/include/uapi/linux/virtio_balloon.h
>>>>> @@ -53,7 +53,9 @@ struct virtio_balloon_config {
>>>>> #define VIRTIO_BALLOON_S_MEMTOT 5 /* Total amount of memory */
>>>>> #define VIRTIO_BALLOON_S_AVAIL 6 /* Available memory as in /proc */
>>>>> #define VIRTIO_BALLOON_S_CACHES 7 /* Disk caches */
>>>>> -#define VIRTIO_BALLOON_S_NR 8
>>>>> +#define VIRTIO_BALLOON_S_HTLB_PGALLOC 8 /* Hugetlb page allocations */
>>>>> +#define VIRTIO_BALLOON_S_HTLB_PGFAIL 9 /* Hugetlb page allocation failures */
>>>>> +#define VIRTIO_BALLOON_S_NR 10
>>>>> /*
>>>>> * Memory statistics structure.
>>>> Not for this patch, but it looks to me that exporting such nr through uapi is fragile.
>>> Sorry, can you explain what you mean here?
>>>
>>> Jon
>> Spec said "Within an output buffer submitted to the statsq, the device MUST
>> ignore entries with tag values that it does not recognize". So exporting
>> VIRTIO_BALLOON_S_NR seems useless and device implementation can not depend
>> on such number in uapi.
>>
>> Thanks
> Suggestions? I don't like to break build for people ...
>
Didn't have a good idea. But maybe we should keep VIRTIO_BALLOON_S_NR
unchanged, and add a comment here.
Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* RE: [PATCH 1/4] iommu: Add virtio-iommu driver
From: Tian, Kevin @ 2018-03-23 8:27 UTC (permalink / raw)
To: 'Robin Murphy', Jean-Philippe Brucker,
iommu@lists.linux-foundation.org, kvm@vger.kernel.org,
virtualization@lists.linux-foundation.org,
virtio-dev@lists.oasis-open.org, kvmarm@lists.cs.columbia.edu
Cc: jayachandran.nair@cavium.com, tnowicki@caviumnetworks.com,
mst@redhat.com, Marc Zyngier, Will Deacon,
jintack@cs.columbia.edu, eric.auger.pro@gmail.com
In-Reply-To: <AADFC41AFE54684AB9EE6CBC0274A5D19108DC42@SHSMSX101.ccr.corp.intel.com>
> From: Tian, Kevin
> Sent: Thursday, March 22, 2018 6:06 PM
>
> > From: Robin Murphy [mailto:robin.murphy@arm.com]
> > Sent: Wednesday, March 21, 2018 10:24 PM
> >
> > On 21/03/18 13:14, Jean-Philippe Brucker wrote:
> > > On 21/03/18 06:43, Tian, Kevin wrote:
> > > [...]
> > >>> +
> > >>> +#include <uapi/linux/virtio_iommu.h>
> > >>> +
> > >>> +#define MSI_IOVA_BASE 0x8000000
> > >>> +#define MSI_IOVA_LENGTH 0x100000
> > >>
> > >> this is ARM specific, and according to virtio-iommu spec isn't it
> > >> better probed on the endpoint instead of hard-coding here?
> > >
> > > These values are arbitrary, not really ARM-specific even if ARM is the
> > > only user yet: we're just reserving a random IOVA region for mapping
> > MSIs.
> > > It is hard-coded because of the way iommu-dma.c works, but I don't
> > quite
> > > remember why that allocation isn't dynamic.
> >
> > The host kernel needs to have *some* MSI region in place before the
> > guest can start configuring interrupts, otherwise it won't know what
> > address to give to the underlying hardware. However, as soon as the host
> > kernel has picked a region, host userspace needs to know that it can no
> > longer use addresses in that region for DMA-able guest memory. It's a
> > lot easier when the address is fixed in hardware and the host userspace
> > will never be stupid enough to try and VFIO_IOMMU_DMA_MAP it, but in
> > the
> > more general case where MSI writes undergo IOMMU address translation
> > so
> > it's an arbitrary IOVA, this has the potential to conflict with stuff
> > like guest memory hotplug.
> >
> > What we currently have is just the simplest option, with the host kernel
> > just picking something up-front and pretending to host userspace that
> > it's a fixed hardware address. There's certainly scope for it to be a
> > bit more dynamic in the sense of adding an interface to let userspace
> > move it around (before attaching any devices, at least), but I don't
> > think it's feasible for the host kernel to second-guess userspace enough
> > to make it entirely transparent like it is in the DMA API domain case.
> >
> > Of course, that's all assuming the host itself is using a virtio-iommu
> > (e.g. in a nested virt or emulation scenario). When it's purely within a
> > guest then an MSI reservation shouldn't matter so much, since the guest
> > won't be anywhere near the real hardware configuration anyway.
> >
> > Robin.
>
> Curious since anyway we are defining a new iommu architecture
> is it possible to avoid those ARM-specific burden completely?
>
OK, after some study around those tricks below is my learning:
- MSI_IOVA window is used only on request (iommu_dma_get
_msi_page), not meant to take effect on all architectures once
initialized. e.g. ARM GIC does it but not x86. So it is reasonable
for virtio-iommu driver to implement such capability;
- I thought whether hardware MSI doorbell can be always reported
on virtio-iommu since it's newly defined. Looks there is a problem
if underlying IOMMU is sw-managed MSI style - valid mapping is
expected in all level of translations, meaning guest has to manage
stage-1 mapping in nested configuration since stage-1 is owned
by guest.
Then virtio-iommu is naturally expected to report the same MSI
model as supported by underlying hardware. Below are some
further thoughts along this route (use 'IOMMU' to represent the
physical one and 'virtio-iommu' for virtual one):
----
In the scope of current virtio-iommu spec v.6, there is no nested
consideration yet. Guest driver is expected to use MAP/UNMAP
interface on assigned endpoints. In this case the MAP requests
(IOVA->GPA) is caught and maintained within Qemu which then
further talks to VFIO to map IOVA->HPA in IOMMU.
Qemu can learn the MSI model of IOMMU from sysfs.
For hardware MSI doorbell (x86 and some ARM):
* Host kernel reports to Qemu as IOMMU_RESV_MSI
* Qemu report to guest as VIRTIO_IOMMU_RESV_MEM_T_MSI
* Guest takes the range as IOMMU_RESV_MSI. reserved
* Qemu MAP database has no mapping for the doorbell
* Physical IOMMU page table has no mapping for the doorbell
* MSI from passthrough device bypass IOMMU
* MSI from emulated device bypass virtio-iommu
For software MSI doorbell (most ARM):
* Host kernel reports to Qemu as IOMMU_RESV_SW_MSI
* Qemu report to guest as VIRTIO_IOMMU_RESV_MEM_T_RESERVED
* Guest takes the range as IOMMU_RESV_RESERVED
* vGIC requests to map 'GPA of the virtual doorbell'
* a map request (IOVA->GPA) sent on endpoint
* Qemu maintains the mapping in MAP database
* but no VFIO_MAP request since it's purely virtual
* GIC requests to map 'HPA of the physical doorbell'
* e.g. triggered by VFIO enable msi
* IOMMU now includes a valid mapping (IOVA->HPA)
* MSI from emulated device go through Qemu MAP
database (IOVA->'GPA of virtual doorbell') and then hit vGIC
* MSI from passthrough device go through IOMMU
(IOVA->'HPA of physical doorbell') and then hit GIC
In this case, host doorbell is treated as reserved resource in
guest side. Guest has its own sw-management for virtual
doorbell which is only used for emulated device. two paths
are completely separated.
If above captures the right flow, current v0.6 spec is complete
regarding to required function definition.
----
Then comes nested case, with two level page tables (stage-1
and stage-2) in IOMMU. stage-1 is for IOVA->GPA and stage-2
is for GPA->HPA. VFIO map/unmap happens on stage-2,
while stage-1 is directly managed by guest (and bound to
IOMMU which enables nested translation from IOVA->GPA
->HPA).
For hardware MSI, there is nothing special compared to
previous requirement. Both host/guest treat the doorbell
as reserved and guarantee no mapping in either stage-1 or
stage-2.
For software MSI, more consideration is required:
* for emulated device it is just fine as long as guest keeps
IOVA->'GPA of virtual doorbell' in stage-1. Qemu is expected
to walk stage-1 page table upon MSI request from emulated
device to hit vGIC;
* for passthrough device however there is a problem. We
need valid mapping in both stage-1 and stage-2, while host
kernel is only responsible for stage-2:
1) if we expect to keep same isolation policy (i.e.
host MSI fully managed by host kernel), then an identity
mapping for host-reported MSI range is expected in stage-1.
In such case we need a new type VIRTIO_IOMMU_RESV_
MEM_T_DIRECT to teach guest setup identity mapping.
it should be the right thing to add since anyway there might
be true IOMMU_RESV_DIRECT range reported from host
which also should be handled.
2) Alternatively we could instead allow Qemu to
request dynamic change of physical doorbell mapping in
stage2, e.g. from GPA of virtual doorbell to HPA of physical
doorbell. But it doesn't like a good design - VFIO doesn't
assign interrupt controller to user space then why should
VFIO allow user mapping to doorbell...
if 1) is agreed, looks the missing part in spec is just VIRTIO_
IOMMU_RESV_MEM_T_DIRECT, though the whole story
is lengthy and fully enabling nested require many other
works. :-)
Thanks
Kevin
^ permalink raw reply
* Re: [PATCH 1/4] iommu: Add virtio-iommu driver
From: Robin Murphy @ 2018-03-23 14:48 UTC (permalink / raw)
To: Jean-Philippe Brucker, iommu, kvm, virtualization, virtio-dev,
kvmarm
Cc: jayachandran.nair, lorenzo.pieralisi, tnowicki, mst, marc.zyngier,
will.deacon, jintack, eric.auger, joro, eric.auger.pro
In-Reply-To: <20180214145340.1223-2-jean-philippe.brucker@arm.com>
On 14/02/18 14:53, Jean-Philippe Brucker wrote:
> The virtio IOMMU is a para-virtualized device, allowing to send IOMMU
> requests such as map/unmap over virtio-mmio transport without emulating
> page tables. This implementation handles ATTACH, DETACH, MAP and UNMAP
> requests.
>
> The bulk of the code transforms calls coming from the IOMMU API into
> corresponding virtio requests. Mappings are kept in an interval tree
> instead of page tables.
>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
> MAINTAINERS | 6 +
> drivers/iommu/Kconfig | 11 +
> drivers/iommu/Makefile | 1 +
> drivers/iommu/virtio-iommu.c | 960 ++++++++++++++++++++++++++++++++++++++
> include/uapi/linux/virtio_ids.h | 1 +
> include/uapi/linux/virtio_iommu.h | 116 +++++
> 6 files changed, 1095 insertions(+)
> create mode 100644 drivers/iommu/virtio-iommu.c
> create mode 100644 include/uapi/linux/virtio_iommu.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3bdc260e36b7..2a181924d420 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -14818,6 +14818,12 @@ S: Maintained
> F: drivers/virtio/virtio_input.c
> F: include/uapi/linux/virtio_input.h
>
> +VIRTIO IOMMU DRIVER
> +M: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> +S: Maintained
> +F: drivers/iommu/virtio-iommu.c
> +F: include/uapi/linux/virtio_iommu.h
> +
> VIRTUAL BOX GUEST DEVICE DRIVER
> M: Hans de Goede <hdegoede@redhat.com>
> M: Arnd Bergmann <arnd@arndb.de>
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index f3a21343e636..1ea0ec74524f 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -381,4 +381,15 @@ config QCOM_IOMMU
> help
> Support for IOMMU on certain Qualcomm SoCs.
>
> +config VIRTIO_IOMMU
> + bool "Virtio IOMMU driver"
> + depends on VIRTIO_MMIO
> + select IOMMU_API
> + select INTERVAL_TREE
> + select ARM_DMA_USE_IOMMU if ARM
> + help
> + Para-virtualised IOMMU driver with virtio.
> +
> + Say Y here if you intend to run this kernel as a guest.
> +
> endif # IOMMU_SUPPORT
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 1fb695854809..9c68be1365e1 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -29,3 +29,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
> obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
> obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
> obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
> +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> new file mode 100644
> index 000000000000..a9c9245e8ba2
> --- /dev/null
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -0,0 +1,960 @@
> +/*
> + * Virtio driver for the paravirtualized IOMMU
> + *
> + * Copyright (C) 2018 ARM Limited
> + * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0
This wants to be a // comment at the very top of the file (thankfully
the policy is now properly documented in-tree since
Documentation/process/license-rules.rst got merged)
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/amba/bus.h>
> +#include <linux/delay.h>
> +#include <linux/dma-iommu.h>
> +#include <linux/freezer.h>
> +#include <linux/interval_tree.h>
> +#include <linux/iommu.h>
> +#include <linux/module.h>
> +#include <linux/of_iommu.h>
> +#include <linux/of_platform.h>
> +#include <linux/pci.h>
> +#include <linux/platform_device.h>
> +#include <linux/virtio.h>
> +#include <linux/virtio_config.h>
> +#include <linux/virtio_ids.h>
> +#include <linux/wait.h>
> +
> +#include <uapi/linux/virtio_iommu.h>
> +
> +#define MSI_IOVA_BASE 0x8000000
> +#define MSI_IOVA_LENGTH 0x100000
> +
> +struct viommu_dev {
> + struct iommu_device iommu;
> + struct device *dev;
> + struct virtio_device *vdev;
> +
> + struct ida domain_ids;
> +
> + struct virtqueue *vq;
> + /* Serialize anything touching the request queue */
> + spinlock_t request_lock;
> +
> + /* Device configuration */
> + struct iommu_domain_geometry geometry;
> + u64 pgsize_bitmap;
> + u8 domain_bits;
> +};
> +
> +struct viommu_mapping {
> + phys_addr_t paddr;
> + struct interval_tree_node iova;
> + union {
> + struct virtio_iommu_req_map map;
> + struct virtio_iommu_req_unmap unmap;
> + } req;
> +};
> +
> +struct viommu_domain {
> + struct iommu_domain domain;
> + struct viommu_dev *viommu;
> + struct mutex mutex;
> + unsigned int id;
> +
> + spinlock_t mappings_lock;
> + struct rb_root_cached mappings;
> +
> + /* Number of endpoints attached to this domain */
> + unsigned long endpoints;
> +};
> +
> +struct viommu_endpoint {
> + struct viommu_dev *viommu;
> + struct viommu_domain *vdomain;
> +};
> +
> +struct viommu_request {
> + struct scatterlist top;
> + struct scatterlist bottom;
> +
> + int written;
> + struct list_head list;
> +};
> +
> +#define to_viommu_domain(domain) \
> + container_of(domain, struct viommu_domain, domain)
> +
> +/* Virtio transport */
> +
> +static int viommu_status_to_errno(u8 status)
> +{
> + switch (status) {
> + case VIRTIO_IOMMU_S_OK:
> + return 0;
> + case VIRTIO_IOMMU_S_UNSUPP:
> + return -ENOSYS;
> + case VIRTIO_IOMMU_S_INVAL:
> + return -EINVAL;
> + case VIRTIO_IOMMU_S_RANGE:
> + return -ERANGE;
> + case VIRTIO_IOMMU_S_NOENT:
> + return -ENOENT;
> + case VIRTIO_IOMMU_S_FAULT:
> + return -EFAULT;
> + case VIRTIO_IOMMU_S_IOERR:
> + case VIRTIO_IOMMU_S_DEVERR:
> + default:
> + return -EIO;
> + }
> +}
> +
> +/*
> + * viommu_get_req_size - compute request size
> + *
> + * A virtio-iommu request is split into one device-read-only part (top) and one
> + * device-write-only part (bottom). Given a request, return the sizes of the two
> + * parts in @top and @bottom.
> + *
> + * Return 0 on success, or an error when the request seems invalid.
> + */
> +static int viommu_get_req_size(struct viommu_dev *viommu,
> + struct virtio_iommu_req_head *req, size_t *top,
> + size_t *bottom)
> +{
> + size_t size;
> + union virtio_iommu_req *r = (void *)req;
> +
> + *bottom = sizeof(struct virtio_iommu_req_tail);
> +
> + switch (req->type) {
> + case VIRTIO_IOMMU_T_ATTACH:
> + size = sizeof(r->attach);
> + break;
> + case VIRTIO_IOMMU_T_DETACH:
> + size = sizeof(r->detach);
> + break;
> + case VIRTIO_IOMMU_T_MAP:
> + size = sizeof(r->map);
> + break;
> + case VIRTIO_IOMMU_T_UNMAP:
> + size = sizeof(r->unmap);
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + *top = size - *bottom;
> + return 0;
> +}
> +
> +static int viommu_receive_resp(struct viommu_dev *viommu, int nr_sent,
> + struct list_head *sent)
> +{
> +
> + unsigned int len;
> + int nr_received = 0;
> + struct viommu_request *req, *pending;
> +
> + pending = list_first_entry_or_null(sent, struct viommu_request, list);
> + if (WARN_ON(!pending))
> + return 0;
> +
> + while ((req = virtqueue_get_buf(viommu->vq, &len)) != NULL) {
> + if (req != pending) {
> + dev_warn(viommu->dev, "discarding stale request\n");
> + continue;
> + }
> +
> + pending->written = len;
> +
> + if (++nr_received == nr_sent) {
> + WARN_ON(!list_is_last(&pending->list, sent));
> + break;
> + } else if (WARN_ON(list_is_last(&pending->list, sent))) {
> + break;
> + }
> +
> + pending = list_next_entry(pending, list);
> + }
> +
> + return nr_received;
> +}
> +
> +static int _viommu_send_reqs_sync(struct viommu_dev *viommu,
> + struct viommu_request *req, int nr,
> + int *nr_sent)
> +{
> + int i, ret;
> + ktime_t timeout;
> + LIST_HEAD(pending);
> + int nr_received = 0;
> + struct scatterlist *sg[2];
> + /*
> + * The timeout is chosen arbitrarily. It's only here to prevent locking
> + * up the CPU in case of a device bug.
> + */
> + unsigned long timeout_ms = 1000;
> +
> + *nr_sent = 0;
> +
> + for (i = 0; i < nr; i++, req++) {
> + req->written = 0;
> +
> + sg[0] = &req->top;
> + sg[1] = &req->bottom;
> +
> + ret = virtqueue_add_sgs(viommu->vq, sg, 1, 1, req,
> + GFP_ATOMIC);
> + if (ret)
> + break;
> +
> + list_add_tail(&req->list, &pending);
> + }
> +
> + if (i && !virtqueue_kick(viommu->vq))
> + return -EPIPE;
> +
> + timeout = ktime_add_ms(ktime_get(), timeout_ms * i);
> + while (nr_received < i && ktime_before(ktime_get(), timeout)) {
> + nr_received += viommu_receive_resp(viommu, i - nr_received,
> + &pending);
> + if (nr_received < i)
> + cpu_relax();
> + }
> +
> + if (nr_received != i)
> + ret = -ETIMEDOUT;
> +
> + if (ret == -ENOSPC && nr_received)
> + /*
> + * We've freed some space since virtio told us that the ring is
> + * full, tell the caller to come back for more.
> + */
> + ret = -EAGAIN;
> +
> + *nr_sent = nr_received;
> +
> + return ret;
> +}
> +
> +/*
> + * viommu_send_reqs_sync - add a batch of requests, kick the host and wait for
> + * them to return
> + *
> + * @req: array of requests
> + * @nr: array length
> + * @nr_sent: on return, contains the number of requests actually sent
> + *
> + * Return 0 on success, or an error if we failed to send some of the requests.
> + */
> +static int viommu_send_reqs_sync(struct viommu_dev *viommu,
> + struct viommu_request *req, int nr,
> + int *nr_sent)
> +{
> + int ret;
> + int sent = 0;
> + unsigned long flags;
> +
> + *nr_sent = 0;
> + do {
> + spin_lock_irqsave(&viommu->request_lock, flags);
> + ret = _viommu_send_reqs_sync(viommu, req, nr, &sent);
> + spin_unlock_irqrestore(&viommu->request_lock, flags);
> +
> + *nr_sent += sent;
> + req += sent;
> + nr -= sent;
> + } while (ret == -EAGAIN);
> +
> + return ret;
> +}
> +
> +/*
> + * viommu_send_req_sync - send one request and wait for reply
> + *
> + * @top: pointer to a virtio_iommu_req_* structure
> + *
> + * Returns 0 if the request was successful, or an error number otherwise. No
> + * distinction is done between transport and request errors.
> + */
> +static int viommu_send_req_sync(struct viommu_dev *viommu, void *top)
> +{
> + int ret;
> + int nr_sent;
> + void *bottom;
> + size_t top_size, bottom_size;
> + struct virtio_iommu_req_tail *tail;
> + struct virtio_iommu_req_head *head = top;
> + struct viommu_request req = {
> + .written = 0
> + };
> +
> + ret = viommu_get_req_size(viommu, head, &top_size, &bottom_size);
> + if (ret)
> + return ret;
> +
> + bottom = top + top_size;
> + tail = bottom + bottom_size - sizeof(*tail);
> +
> + sg_init_one(&req.top, top, top_size);
> + sg_init_one(&req.bottom, bottom, bottom_size);
> +
> + ret = viommu_send_reqs_sync(viommu, &req, 1, &nr_sent);
> + if (ret || !req.written || nr_sent != 1) {
> + dev_err(viommu->dev, "failed to send request\n");
> + return -EIO;
> + }
> +
> + return viommu_status_to_errno(tail->status);
> +}
> +
> +/*
> + * viommu_add_mapping - add a mapping to the internal tree
> + *
> + * On success, return the new mapping. Otherwise return NULL.
> + */
> +static struct viommu_mapping *
> +viommu_add_mapping(struct viommu_domain *vdomain, unsigned long iova,
> + phys_addr_t paddr, size_t size)
> +{
> + unsigned long flags;
> + struct viommu_mapping *mapping;
> +
> + mapping = kzalloc(sizeof(*mapping), GFP_ATOMIC);
> + if (!mapping)
> + return NULL;
> +
> + mapping->paddr = paddr;
> + mapping->iova.start = iova;
> + mapping->iova.last = iova + size - 1;
> +
> + spin_lock_irqsave(&vdomain->mappings_lock, flags);
> + interval_tree_insert(&mapping->iova, &vdomain->mappings);
> + spin_unlock_irqrestore(&vdomain->mappings_lock, flags);
> +
> + return mapping;
> +}
> +
> +/*
> + * viommu_del_mappings - remove mappings from the internal tree
> + *
> + * @vdomain: the domain
> + * @iova: start of the range
> + * @size: size of the range. A size of 0 corresponds to the entire address
> + * space.
> + * @out_mapping: if not NULL, the first removed mapping is returned in there.
> + * This allows the caller to reuse the buffer for the unmap request. When
> + * the returned size is greater than zero, if a mapping is returned, the
> + * caller must free it.
This "free multiple mappings except maybe hand one of them off to the
caller" interface is really unintuitive. AFAICS it's only used by
viommu_unmap() to grab mapping->req, but that doesn't seem to care about
mapping itself, so I wonder whether it wouldn't make more sense to just
have a global kmem_cache of struct virtio_iommu_req_unmap for that and
avoid a lot of complexity...
> + *
> + * On success, returns the number of unmapped bytes (>= size)
> + */
> +static size_t viommu_del_mappings(struct viommu_domain *vdomain,
> + unsigned long iova, size_t size,
> + struct viommu_mapping **out_mapping)
> +{
> + size_t unmapped = 0;
> + unsigned long flags;
> + unsigned long last = iova + size - 1;
> + struct viommu_mapping *mapping = NULL;
> + struct interval_tree_node *node, *next;
> +
> + spin_lock_irqsave(&vdomain->mappings_lock, flags);
> + next = interval_tree_iter_first(&vdomain->mappings, iova, last);
> +
> + if (next) {
> + mapping = container_of(next, struct viommu_mapping, iova);
> + /* Trying to split a mapping? */
> + if (WARN_ON(mapping->iova.start < iova))
> + next = NULL;
> + }
> +
> + while (next) {
> + node = next;
> + mapping = container_of(node, struct viommu_mapping, iova);
> +
> + next = interval_tree_iter_next(node, iova, last);
> +
> + /*
> + * Note that for a partial range, this will return the full
> + * mapping so we avoid sending split requests to the device.
> + */
> + unmapped += mapping->iova.last - mapping->iova.start + 1;
> +
> + interval_tree_remove(node, &vdomain->mappings);
> +
> + if (out_mapping && !(*out_mapping))
> + *out_mapping = mapping;
> + else
> + kfree(mapping);
> + }
> + spin_unlock_irqrestore(&vdomain->mappings_lock, flags);
> +
> + return unmapped;
> +}
> +
> +/*
> + * viommu_replay_mappings - re-send MAP requests
> + *
> + * When reattaching a domain that was previously detached from all endpoints,
> + * mappings were deleted from the device. Re-create the mappings available in
> + * the internal tree.
> + */
> +static int viommu_replay_mappings(struct viommu_domain *vdomain)
> +{
> + unsigned long flags;
> + int i = 1, ret, nr_sent;
> + struct viommu_request *reqs;
> + struct viommu_mapping *mapping;
> + struct interval_tree_node *node;
> + size_t top_size, bottom_size;
> +
> + spin_lock_irqsave(&vdomain->mappings_lock, flags);
> + node = interval_tree_iter_first(&vdomain->mappings, 0, -1UL);
> + if (!node) {
> + spin_unlock_irqrestore(&vdomain->mappings_lock, flags);
> + return 0;
> + }
> +
> + while ((node = interval_tree_iter_next(node, 0, -1UL)) != NULL)
> + i++;
> + spin_unlock_irqrestore(&vdomain->mappings_lock, flags);
> +
> + reqs = kcalloc(i, sizeof(*reqs), GFP_KERNEL);
> + if (!reqs)
> + return -ENOMEM;
> +
> + bottom_size = sizeof(struct virtio_iommu_req_tail);
> + top_size = sizeof(struct virtio_iommu_req_map) - bottom_size;
> +
> + i = 0;
> + spin_lock_irqsave(&vdomain->mappings_lock, flags);
> + node = interval_tree_iter_first(&vdomain->mappings, 0, -1UL);
> + while (node) {
> + mapping = container_of(node, struct viommu_mapping, iova);
> + sg_init_one(&reqs[i].top, &mapping->req.map, top_size);
> + sg_init_one(&reqs[i].bottom, &mapping->req.map.tail,
> + bottom_size);
> +
> + node = interval_tree_iter_next(node, 0, -1UL);
> + i++;
> + }
> + spin_unlock_irqrestore(&vdomain->mappings_lock, flags);
> +
> + ret = viommu_send_reqs_sync(vdomain->viommu, reqs, i, &nr_sent);
> + kfree(reqs);
> +
> + return ret;
> +}
> +
> +/* IOMMU API */
> +
> +static bool viommu_capable(enum iommu_cap cap)
> +{
> + return false;
> +}
The .capable callback is optional, so it's only worth implementing once
you want it to do something beyond the default behaviour.
> +
> +static struct iommu_domain *viommu_domain_alloc(unsigned type)
> +{
> + struct viommu_domain *vdomain;
> +
> + if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
> + return NULL;
> +
> + vdomain = kzalloc(sizeof(*vdomain), GFP_KERNEL);
> + if (!vdomain)
> + return NULL;
> +
> + mutex_init(&vdomain->mutex);
> + spin_lock_init(&vdomain->mappings_lock);
> + vdomain->mappings = RB_ROOT_CACHED;
> +
> + if (type == IOMMU_DOMAIN_DMA &&
> + iommu_get_dma_cookie(&vdomain->domain)) {
> + kfree(vdomain);
> + return NULL;
> + }
> +
> + return &vdomain->domain;
> +}
> +
> +static int viommu_domain_finalise(struct viommu_dev *viommu,
> + struct iommu_domain *domain)
> +{
> + int ret;
> + struct viommu_domain *vdomain = to_viommu_domain(domain);
> + /* ida limits size to 31 bits. A value of 0 means "max" */
> + unsigned int max_domain = viommu->domain_bits >= 31 ? 0 :
> + 1U << viommu->domain_bits;
> +
> + vdomain->viommu = viommu;
> +
> + domain->pgsize_bitmap = viommu->pgsize_bitmap;
> + domain->geometry = viommu->geometry;
> +
> + ret = ida_simple_get(&viommu->domain_ids, 0, max_domain, GFP_KERNEL);
> + if (ret >= 0)
> + vdomain->id = (unsigned int)ret;
> +
> + return ret > 0 ? 0 : ret;
> +}
> +
> +static void viommu_domain_free(struct iommu_domain *domain)
> +{
> + struct viommu_domain *vdomain = to_viommu_domain(domain);
> +
> + iommu_put_dma_cookie(domain);
> +
> + /* Free all remaining mappings (size 2^64) */
> + viommu_del_mappings(vdomain, 0, 0, NULL);
> +
> + if (vdomain->viommu)
> + ida_simple_remove(&vdomain->viommu->domain_ids, vdomain->id);
> +
> + kfree(vdomain);
> +}
> +
> +static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
> +{
> + int i;
> + int ret = 0;
> + struct virtio_iommu_req_attach *req;
> + struct iommu_fwspec *fwspec = dev->iommu_fwspec;
> + struct viommu_endpoint *vdev = fwspec->iommu_priv;
> + struct viommu_domain *vdomain = to_viommu_domain(domain);
> +
> + mutex_lock(&vdomain->mutex);
> + if (!vdomain->viommu) {
> + /*
> + * Initialize the domain proper now that we know which viommu
> + * owns it.
> + */
> + ret = viommu_domain_finalise(vdev->viommu, domain);
> + } else if (vdomain->viommu != vdev->viommu) {
> + dev_err(dev, "cannot attach to foreign vIOMMU\n");
> + ret = -EXDEV;
> + }
> + mutex_unlock(&vdomain->mutex);
> +
> + if (ret)
> + return ret;
> +
> + /*
> + * In the virtio-iommu device, when attaching the endpoint to a new
> + * domain, it is detached from the old one and, if as as a result the
> + * old domain isn't attached to any endpoint, all mappings are removed
> + * from the old domain and it is freed.
> + *
> + * In the driver the old domain still exists, and its mappings will be
> + * recreated if it gets reattached to an endpoint. Otherwise it will be
> + * freed explicitly.
> + *
> + * vdev->vdomain is protected by group->mutex
> + */
> + if (vdev->vdomain)
> + vdev->vdomain->endpoints--;
> +
> + /* DMA to the stack is forbidden, store request on the heap */
> + req = kzalloc(sizeof(*req), GFP_KERNEL);
> + if (!req)
> + return -ENOMEM;
> +
> + *req = (struct virtio_iommu_req_attach) {
> + .head.type = VIRTIO_IOMMU_T_ATTACH,
> + .domain = cpu_to_le32(vdomain->id),
> + };
> +
> + for (i = 0; i < fwspec->num_ids; i++) {
> + req->endpoint = cpu_to_le32(fwspec->ids[i]);
> +
> + ret = viommu_send_req_sync(vdomain->viommu, req);
> + if (ret)
> + break;
> + }
> +
> + kfree(req);
> +
> + if (ret)
> + return ret;
> +
> + if (!vdomain->endpoints) {
> + /*
> + * This endpoint is the first to be attached to the domain.
> + * Replay existing mappings if any (e.g. SW MSI).
> + */
> + ret = viommu_replay_mappings(vdomain);
> + if (ret)
> + return ret;
> + }
> +
> + vdomain->endpoints++;
> + vdev->vdomain = vdomain;
> +
> + return 0;
> +}
> +
> +static int viommu_map(struct iommu_domain *domain, unsigned long iova,
> + phys_addr_t paddr, size_t size, int prot)
> +{
> + int ret;
> + int flags;
> + struct viommu_mapping *mapping;
> + struct viommu_domain *vdomain = to_viommu_domain(domain);
> +
> + mapping = viommu_add_mapping(vdomain, iova, paddr, size);
> + if (!mapping)
> + return -ENOMEM;
> +
> + flags = (prot & IOMMU_READ ? VIRTIO_IOMMU_MAP_F_READ : 0) |
> + (prot & IOMMU_WRITE ? VIRTIO_IOMMU_MAP_F_WRITE : 0);
> +
> + mapping->req.map = (struct virtio_iommu_req_map) {
> + .head.type = VIRTIO_IOMMU_T_MAP,
> + .domain = cpu_to_le32(vdomain->id),
> + .virt_start = cpu_to_le64(iova),
> + .phys_start = cpu_to_le64(paddr),
> + .virt_end = cpu_to_le64(iova + size - 1),
> + .flags = cpu_to_le32(flags),
> + };
> +
> + if (!vdomain->endpoints)
> + return 0;
> +
> + ret = viommu_send_req_sync(vdomain->viommu, &mapping->req);
> + if (ret)
> + viommu_del_mappings(vdomain, iova, size, NULL);
> +
> + return ret;
> +}
> +
> +static size_t viommu_unmap(struct iommu_domain *domain, unsigned long iova,
> + size_t size)
> +{
> + int ret = 0;
> + size_t unmapped;
> + struct viommu_mapping *mapping = NULL;
> + struct viommu_domain *vdomain = to_viommu_domain(domain);
> +
> + unmapped = viommu_del_mappings(vdomain, iova, size, &mapping);
> + if (unmapped < size) {
> + ret = -EINVAL;
> + goto out_free;
> + }
> +
> + /* Device already removed all mappings after detach. */
> + if (!vdomain->endpoints)
> + goto out_free;
> +
> + if (WARN_ON(!mapping))
> + return 0;
> +
> + mapping->req.unmap = (struct virtio_iommu_req_unmap) {
> + .head.type = VIRTIO_IOMMU_T_UNMAP,
> + .domain = cpu_to_le32(vdomain->id),
> + .virt_start = cpu_to_le64(iova),
> + .virt_end = cpu_to_le64(iova + unmapped - 1),
> + };
...In fact, the kmem_cache idea might be moot since it looks like with a
bit of restructuring you could get away with just a single per-viommu
virtio_iommu_req_unmap structure; this lot could be passed around on the
stack until request_lock is taken, at which point it would be copied
into the 'real' DMA-able structure. The equivalent might apply to
viommu_map() too - now that I'm looking at it, it seems like it would
take pretty minimal effort to encapsulate the whole business cleanly in
viommu_send_req_sync(), which could do something like this instead of
going through viommu_send_reqs_sync():
...
spin_lock_irqsave(&viommu->request_lock, flags);
viommu_copy_req(viommu->dma_req, req);
ret = _viommu_send_reqs_sync(viommu, viommu->dma_req, 1, &sent);
spin_unlock_irqrestore(&viommu->request_lock, flags);
...
> +
> + ret = viommu_send_req_sync(vdomain->viommu, &mapping->req);
> +
> +out_free:
> + kfree(mapping);
> +
> + return ret ? 0 : unmapped;
> +}
> +
> +static phys_addr_t viommu_iova_to_phys(struct iommu_domain *domain,
> + dma_addr_t iova)
> +{
> + u64 paddr = 0;
> + unsigned long flags;
> + struct viommu_mapping *mapping;
> + struct interval_tree_node *node;
> + struct viommu_domain *vdomain = to_viommu_domain(domain);
> +
> + spin_lock_irqsave(&vdomain->mappings_lock, flags);
> + node = interval_tree_iter_first(&vdomain->mappings, iova, iova);
> + if (node) {
> + mapping = container_of(node, struct viommu_mapping, iova);
> + paddr = mapping->paddr + (iova - mapping->iova.start);
> + }
> + spin_unlock_irqrestore(&vdomain->mappings_lock, flags);
> +
> + return paddr;
> +}
> +
> +static struct iommu_ops viommu_ops;
> +static struct virtio_driver virtio_iommu_drv;
> +
> +static int viommu_match_node(struct device *dev, void *data)
> +{
> + return dev->parent->fwnode == data;
> +}
> +
> +static struct viommu_dev *viommu_get_by_fwnode(struct fwnode_handle *fwnode)
> +{
> + struct device *dev = driver_find_device(&virtio_iommu_drv.driver, NULL,
> + fwnode, viommu_match_node);
> + put_device(dev);
> +
> + return dev ? dev_to_virtio(dev)->priv : NULL;
> +}
> +
> +static int viommu_add_device(struct device *dev)
> +{
> + struct iommu_group *group;
> + struct viommu_endpoint *vdev;
> + struct viommu_dev *viommu = NULL;
> + struct iommu_fwspec *fwspec = dev->iommu_fwspec;
> +
> + if (!fwspec || fwspec->ops != &viommu_ops)
> + return -ENODEV;
> +
> + viommu = viommu_get_by_fwnode(fwspec->iommu_fwnode);
> + if (!viommu)
> + return -ENODEV;
> +
> + vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
> + if (!vdev)
> + return -ENOMEM;
> +
> + vdev->viommu = viommu;
> + fwspec->iommu_priv = vdev;
> +
> + /*
> + * Last step creates a default domain and attaches to it. Everything
> + * must be ready.
> + */
> + group = iommu_group_get_for_dev(dev);
> + if (!IS_ERR(group))
> + iommu_group_put(group);
Since you create the sysfs IOMMU device, maybe also create the links for
the masters?
> +
> + return PTR_ERR_OR_ZERO(group);
> +}
> +
> +static void viommu_remove_device(struct device *dev)
> +{
You need to remove dev from its group, too (basically, .remove_device
should always undo everything .add_device did)
It would also be good practice to verify that dev->iommu_fwspec exists
and is one of yours before touching anything, although having checked
the core code I see we do currently just about get away with it thanks
to the horrible per-bus ops.
> + kfree(dev->iommu_fwspec->iommu_priv);
> +}
> +
> +static struct iommu_group *viommu_device_group(struct device *dev)
> +{
> + if (dev_is_pci(dev))
> + return pci_device_group(dev);
> + else
> + return generic_device_group(dev);
> +}
> +
> +static int viommu_of_xlate(struct device *dev, struct of_phandle_args *args)
> +{
> + return iommu_fwspec_add_ids(dev, args->args, 1);
I'm sure a DT binding somewhere needs to document the appropriate value
and meaning for #iommu-cells - I guess that probably falls under the
virtio-mmio binding?
> +}
> +
> +static void viommu_get_resv_regions(struct device *dev, struct list_head *head)
> +{
> + struct iommu_resv_region *region;
> + int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
> +
> + region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH, prot,
> + IOMMU_RESV_SW_MSI);
> + if (!region)
> + return;
> +
> + list_add_tail(®ion->list, head);
> + iommu_dma_get_resv_regions(dev, head);
> +}
> +
> +static void viommu_put_resv_regions(struct device *dev, struct list_head *head)
> +{
> + struct iommu_resv_region *entry, *next;
> +
> + list_for_each_entry_safe(entry, next, head, list)
> + kfree(entry);
> +}
> +
> +static struct iommu_ops viommu_ops = {
> + .capable = viommu_capable,
> + .domain_alloc = viommu_domain_alloc,
> + .domain_free = viommu_domain_free,
> + .attach_dev = viommu_attach_dev,
> + .map = viommu_map,
> + .unmap = viommu_unmap,
> + .map_sg = default_iommu_map_sg,
> + .iova_to_phys = viommu_iova_to_phys,
> + .add_device = viommu_add_device,
> + .remove_device = viommu_remove_device,
> + .device_group = viommu_device_group,
> + .of_xlate = viommu_of_xlate,
> + .get_resv_regions = viommu_get_resv_regions,
> + .put_resv_regions = viommu_put_resv_regions,
> +};
> +
> +static int viommu_init_vq(struct viommu_dev *viommu)
> +{
> + struct virtio_device *vdev = dev_to_virtio(viommu->dev);
> + const char *name = "request";
> + void *ret;
> +
> + ret = virtio_find_single_vq(vdev, NULL, name);
> + if (IS_ERR(ret)) {
> + dev_err(viommu->dev, "cannot find VQ\n");
> + return PTR_ERR(ret);
> + }
> +
> + viommu->vq = ret;
> +
> + return 0;
> +}
> +
> +static int viommu_probe(struct virtio_device *vdev)
> +{
> + struct device *parent_dev = vdev->dev.parent;
> + struct viommu_dev *viommu = NULL;
> + struct device *dev = &vdev->dev;
> + u64 input_start = 0;
> + u64 input_end = -1UL;
> + int ret;
> +
> + viommu = devm_kzalloc(dev, sizeof(*viommu), GFP_KERNEL);
> + if (!viommu)
> + return -ENOMEM;
> +
> + spin_lock_init(&viommu->request_lock);
> + ida_init(&viommu->domain_ids);
> + viommu->dev = dev;
> + viommu->vdev = vdev;
> +
> + ret = viommu_init_vq(viommu);
> + if (ret)
> + return ret;
> +
> + virtio_cread(vdev, struct virtio_iommu_config, page_size_mask,
> + &viommu->pgsize_bitmap);
> +
> + if (!viommu->pgsize_bitmap) {
> + ret = -EINVAL;
> + goto err_free_vqs;
> + }
> +
> + viommu->domain_bits = 32;
> +
> + /* Optional features */
> + virtio_cread_feature(vdev, VIRTIO_IOMMU_F_INPUT_RANGE,
> + struct virtio_iommu_config, input_range.start,
> + &input_start);
> +
> + virtio_cread_feature(vdev, VIRTIO_IOMMU_F_INPUT_RANGE,
> + struct virtio_iommu_config, input_range.end,
> + &input_end);
> +
> + virtio_cread_feature(vdev, VIRTIO_IOMMU_F_DOMAIN_BITS,
> + struct virtio_iommu_config, domain_bits,
> + &viommu->domain_bits);
> +
> + viommu->geometry = (struct iommu_domain_geometry) {
> + .aperture_start = input_start,
> + .aperture_end = input_end,
> + .force_aperture = true,
> + };
> +
> + viommu_ops.pgsize_bitmap = viommu->pgsize_bitmap;
> +
> + virtio_device_ready(vdev);
> +
> + ret = iommu_device_sysfs_add(&viommu->iommu, dev, NULL, "%s",
> + virtio_bus_name(vdev));
> + if (ret)
> + goto err_free_vqs;
> +
> + iommu_device_set_ops(&viommu->iommu, &viommu_ops);
> + iommu_device_set_fwnode(&viommu->iommu, parent_dev->fwnode);
> +
> + iommu_device_register(&viommu->iommu);
> +
> +#ifdef CONFIG_PCI
> + if (pci_bus_type.iommu_ops != &viommu_ops) {
> + pci_request_acs();
> + ret = bus_set_iommu(&pci_bus_type, &viommu_ops);
> + if (ret)
> + goto err_unregister;
> + }
> +#endif
> +#ifdef CONFIG_ARM_AMBA
> + if (amba_bustype.iommu_ops != &viommu_ops) {
> + ret = bus_set_iommu(&amba_bustype, &viommu_ops);
> + if (ret)
> + goto err_unregister;
> + }
> +#endif
> + if (platform_bus_type.iommu_ops != &viommu_ops) {
> + ret = bus_set_iommu(&platform_bus_type, &viommu_ops);
> + if (ret)
> + goto err_unregister;
> + }
> +
> + vdev->priv = viommu;
> +
> + dev_info(dev, "input address: %u bits\n",
> + order_base_2(viommu->geometry.aperture_end));
> + dev_info(dev, "page mask: %#llx\n", viommu->pgsize_bitmap);
> +
> + return 0;
> +
> +err_unregister:
> + iommu_device_sysfs_remove(&viommu->iommu);
> + iommu_device_unregister(&viommu->iommu);
> +err_free_vqs:
> + vdev->config->del_vqs(vdev);
> +
> + return ret;
> +}
> +
> +static void viommu_remove(struct virtio_device *vdev)
> +{
> + struct viommu_dev *viommu = vdev->priv;
> +
> + iommu_device_sysfs_remove(&viommu->iommu);
> + iommu_device_unregister(&viommu->iommu);
> +
> + /* Stop all virtqueues */
> + vdev->config->reset(vdev);
> + vdev->config->del_vqs(vdev);
> +
> + dev_info(&vdev->dev, "device removed\n");
> +}
> +
> +static void viommu_config_changed(struct virtio_device *vdev)
> +{
> + dev_warn(&vdev->dev, "config changed\n");
> +}
> +
> +static unsigned int features[] = {
> + VIRTIO_IOMMU_F_MAP_UNMAP,
> + VIRTIO_IOMMU_F_DOMAIN_BITS,
> + VIRTIO_IOMMU_F_INPUT_RANGE,
> +};
> +
> +static struct virtio_device_id id_table[] = {
> + { VIRTIO_ID_IOMMU, VIRTIO_DEV_ANY_ID },
> + { 0 },
> +};
> +
> +static struct virtio_driver virtio_iommu_drv = {
> + .driver.name = KBUILD_MODNAME,
> + .driver.owner = THIS_MODULE,
> + .id_table = id_table,
> + .feature_table = features,
> + .feature_table_size = ARRAY_SIZE(features),
> + .probe = viommu_probe,
> + .remove = viommu_remove,
> + .config_changed = viommu_config_changed,
> +};
> +
> +module_virtio_driver(virtio_iommu_drv);
> +
> +IOMMU_OF_DECLARE(viommu, "virtio,mmio");
> +
> +MODULE_DESCRIPTION("Virtio IOMMU driver");
> +MODULE_AUTHOR("Jean-Philippe Brucker <jean-philippe.brucker@arm.com>");
> +MODULE_LICENSE("GPL v2");
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index 6d5c3b2d4f4d..cfe47c5d9a56 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -43,5 +43,6 @@
> #define VIRTIO_ID_INPUT 18 /* virtio input */
> #define VIRTIO_ID_VSOCK 19 /* virtio vsock transport */
> #define VIRTIO_ID_CRYPTO 20 /* virtio crypto */
> +#define VIRTIO_ID_IOMMU 23 /* virtio IOMMU */
>
> #endif /* _LINUX_VIRTIO_IDS_H */
> diff --git a/include/uapi/linux/virtio_iommu.h b/include/uapi/linux/virtio_iommu.h
> new file mode 100644
> index 000000000000..0de9b44db14d
> --- /dev/null
> +++ b/include/uapi/linux/virtio_iommu.h
> @@ -0,0 +1,116 @@
> +/*
> + * Virtio-iommu definition v0.6
> + *
> + * Copyright (C) 2018 ARM Ltd.
> + *
> + * SPDX-License-Identifier: BSD-3-Clause
Again, at the top, although in /* */ here since it's a header.
Robin.
> + */
> +#ifndef _UAPI_LINUX_VIRTIO_IOMMU_H
> +#define _UAPI_LINUX_VIRTIO_IOMMU_H
> +
> +#include <linux/types.h>
> +
> +/* Feature bits */
> +#define VIRTIO_IOMMU_F_INPUT_RANGE 0
> +#define VIRTIO_IOMMU_F_DOMAIN_BITS 1
> +#define VIRTIO_IOMMU_F_MAP_UNMAP 2
> +#define VIRTIO_IOMMU_F_BYPASS 3
> +
> +struct virtio_iommu_config {
> + /* Supported page sizes */
> + __u64 page_size_mask;
> + /* Supported IOVA range */
> + struct virtio_iommu_range {
> + __u64 start;
> + __u64 end;
> + } input_range;
> + /* Max domain ID size */
> + __u8 domain_bits;
> +} __packed;
> +
> +/* Request types */
> +#define VIRTIO_IOMMU_T_ATTACH 0x01
> +#define VIRTIO_IOMMU_T_DETACH 0x02
> +#define VIRTIO_IOMMU_T_MAP 0x03
> +#define VIRTIO_IOMMU_T_UNMAP 0x04
> +
> +/* Status types */
> +#define VIRTIO_IOMMU_S_OK 0x00
> +#define VIRTIO_IOMMU_S_IOERR 0x01
> +#define VIRTIO_IOMMU_S_UNSUPP 0x02
> +#define VIRTIO_IOMMU_S_DEVERR 0x03
> +#define VIRTIO_IOMMU_S_INVAL 0x04
> +#define VIRTIO_IOMMU_S_RANGE 0x05
> +#define VIRTIO_IOMMU_S_NOENT 0x06
> +#define VIRTIO_IOMMU_S_FAULT 0x07
> +
> +struct virtio_iommu_req_head {
> + __u8 type;
> + __u8 reserved[3];
> +} __packed;
> +
> +struct virtio_iommu_req_tail {
> + __u8 status;
> + __u8 reserved[3];
> +} __packed;
> +
> +struct virtio_iommu_req_attach {
> + struct virtio_iommu_req_head head;
> +
> + __le32 domain;
> + __le32 endpoint;
> + __le32 reserved;
> +
> + struct virtio_iommu_req_tail tail;
> +} __packed;
> +
> +struct virtio_iommu_req_detach {
> + struct virtio_iommu_req_head head;
> +
> + __le32 endpoint;
> + __le32 reserved;
> +
> + struct virtio_iommu_req_tail tail;
> +} __packed;
> +
> +#define VIRTIO_IOMMU_MAP_F_READ (1 << 0)
> +#define VIRTIO_IOMMU_MAP_F_WRITE (1 << 1)
> +#define VIRTIO_IOMMU_MAP_F_EXEC (1 << 2)
> +
> +#define VIRTIO_IOMMU_MAP_F_MASK (VIRTIO_IOMMU_MAP_F_READ | \
> + VIRTIO_IOMMU_MAP_F_WRITE | \
> + VIRTIO_IOMMU_MAP_F_EXEC)
> +
> +struct virtio_iommu_req_map {
> + struct virtio_iommu_req_head head;
> +
> + __le32 domain;
> + __le64 virt_start;
> + __le64 virt_end;
> + __le64 phys_start;
> + __le32 flags;
> +
> + struct virtio_iommu_req_tail tail;
> +} __packed;
> +
> +struct virtio_iommu_req_unmap {
> + struct virtio_iommu_req_head head;
> +
> + __le32 domain;
> + __le64 virt_start;
> + __le64 virt_end;
> + __le32 reserved;
> +
> + struct virtio_iommu_req_tail tail;
> +} __packed;
> +
> +union virtio_iommu_req {
> + struct virtio_iommu_req_head head;
> +
> + struct virtio_iommu_req_attach attach;
> + struct virtio_iommu_req_detach detach;
> + struct virtio_iommu_req_map map;
> + struct virtio_iommu_req_unmap unmap;
> +};
> +
> +#endif
>
^ permalink raw reply
* Re: [PATCH 2/4] iommu/virtio: Add probe request
From: Robin Murphy @ 2018-03-23 15:00 UTC (permalink / raw)
To: Jean-Philippe Brucker, iommu, kvm, virtualization, virtio-dev,
kvmarm
Cc: jayachandran.nair, lorenzo.pieralisi, tnowicki, mst, marc.zyngier,
will.deacon, jintack, eric.auger, joro, eric.auger.pro
In-Reply-To: <20180214145340.1223-3-jean-philippe.brucker@arm.com>
On 14/02/18 14:53, Jean-Philippe Brucker wrote:
> When the device offers the probe feature, send a probe request for each
> device managed by the IOMMU. Extract RESV_MEM information. When we
> encounter a MSI doorbell region, set it up as a IOMMU_RESV_MSI region.
> This will tell other subsystems that there is no need to map the MSI
> doorbell in the virtio-iommu, because MSIs bypass it.
>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> ---
> drivers/iommu/virtio-iommu.c | 163 ++++++++++++++++++++++++++++++++++++--
> include/uapi/linux/virtio_iommu.h | 37 +++++++++
> 2 files changed, 193 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index a9c9245e8ba2..3ac4b38eaf19 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -45,6 +45,7 @@ struct viommu_dev {
> struct iommu_domain_geometry geometry;
> u64 pgsize_bitmap;
> u8 domain_bits;
> + u32 probe_size;
> };
>
> struct viommu_mapping {
> @@ -72,6 +73,7 @@ struct viommu_domain {
> struct viommu_endpoint {
> struct viommu_dev *viommu;
> struct viommu_domain *vdomain;
> + struct list_head resv_regions;
> };
>
> struct viommu_request {
> @@ -140,6 +142,10 @@ static int viommu_get_req_size(struct viommu_dev *viommu,
> case VIRTIO_IOMMU_T_UNMAP:
> size = sizeof(r->unmap);
> break;
> + case VIRTIO_IOMMU_T_PROBE:
> + *bottom += viommu->probe_size;
> + size = sizeof(r->probe) + *bottom;
> + break;
> default:
> return -EINVAL;
> }
> @@ -448,6 +454,105 @@ static int viommu_replay_mappings(struct viommu_domain *vdomain)
> return ret;
> }
>
> +static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
> + struct virtio_iommu_probe_resv_mem *mem,
> + size_t len)
> +{
> + struct iommu_resv_region *region = NULL;
> + unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
> +
> + u64 addr = le64_to_cpu(mem->addr);
> + u64 size = le64_to_cpu(mem->size);
> +
> + if (len < sizeof(*mem))
> + return -EINVAL;
> +
> + switch (mem->subtype) {
> + case VIRTIO_IOMMU_RESV_MEM_T_MSI:
> + region = iommu_alloc_resv_region(addr, size, prot,
> + IOMMU_RESV_MSI);
> + break;
> + case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
> + default:
> + region = iommu_alloc_resv_region(addr, size, 0,
> + IOMMU_RESV_RESERVED);
> + break;
> + }
> +
> + list_add(&vdev->resv_regions, ®ion->list);
> +
> + /*
> + * Treat unknown subtype as RESERVED, but urge users to update their
> + * driver.
> + */
> + if (mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
> + mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_MSI)
> + pr_warn("unknown resv mem subtype 0x%x\n", mem->subtype);
Might as well avoid the extra comparisons by incorporating this into the
switch statement, i.e.:
default:
dev_warn(vdev->viommu_dev->dev, ...);
/* Fallthrough */
case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
...
(dev_warn is generally preferable to pr_warn when feasible)
> +
> + return 0;
> +}
> +
> +static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev)
> +{
> + int ret;
> + u16 type, len;
> + size_t cur = 0;
> + struct virtio_iommu_req_probe *probe;
> + struct virtio_iommu_probe_property *prop;
> + struct iommu_fwspec *fwspec = dev->iommu_fwspec;
> + struct viommu_endpoint *vdev = fwspec->iommu_priv;
> +
> + if (!fwspec->num_ids)
> + /* Trouble ahead. */
> + return -EINVAL;
> +
> + probe = kzalloc(sizeof(*probe) + viommu->probe_size +
> + sizeof(struct virtio_iommu_req_tail), GFP_KERNEL);
> + if (!probe)
> + return -ENOMEM;
> +
> + probe->head.type = VIRTIO_IOMMU_T_PROBE;
> + /*
> + * For now, assume that properties of an endpoint that outputs multiple
> + * IDs are consistent. Only probe the first one.
> + */
> + probe->endpoint = cpu_to_le32(fwspec->ids[0]);
> +
> + ret = viommu_send_req_sync(viommu, probe);
> + if (ret)
> + goto out_free;
> +
> + prop = (void *)probe->properties;
> + type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK;
> +
> + while (type != VIRTIO_IOMMU_PROBE_T_NONE &&
> + cur < viommu->probe_size) {
> + len = le16_to_cpu(prop->length);
> +
> + switch (type) {
> + case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
> + ret = viommu_add_resv_mem(vdev, (void *)prop->value, len);
> + break;
> + default:
> + dev_dbg(dev, "unknown viommu prop 0x%x\n", type);
> + }
> +
> + if (ret)
> + dev_err(dev, "failed to parse viommu prop 0x%x\n", type);
> +
> + cur += sizeof(*prop) + len;
> + if (cur >= viommu->probe_size)
> + break;
> +
> + prop = (void *)probe->properties + cur;
> + type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK;
> + }
> +
> +out_free:
> + kfree(probe);
> + return ret;
> +}
> +
> /* IOMMU API */
>
> static bool viommu_capable(enum iommu_cap cap)
> @@ -703,6 +808,7 @@ static struct viommu_dev *viommu_get_by_fwnode(struct fwnode_handle *fwnode)
>
> static int viommu_add_device(struct device *dev)
> {
> + int ret;
> struct iommu_group *group;
> struct viommu_endpoint *vdev;
> struct viommu_dev *viommu = NULL;
> @@ -720,8 +826,16 @@ static int viommu_add_device(struct device *dev)
> return -ENOMEM;
>
> vdev->viommu = viommu;
> + INIT_LIST_HEAD(&vdev->resv_regions);
> fwspec->iommu_priv = vdev;
>
> + if (viommu->probe_size) {
> + /* Get additional information for this endpoint */
> + ret = viommu_probe_endpoint(viommu, dev);
> + if (ret)
> + return ret;
> + }
> +
> /*
> * Last step creates a default domain and attaches to it. Everything
> * must be ready.
> @@ -735,7 +849,19 @@ static int viommu_add_device(struct device *dev)
>
> static void viommu_remove_device(struct device *dev)
> {
> - kfree(dev->iommu_fwspec->iommu_priv);
> + struct viommu_endpoint *vdev;
> + struct iommu_resv_region *entry, *next;
> + struct iommu_fwspec *fwspec = dev->iommu_fwspec;
> +
> + if (!fwspec || fwspec->ops != &viommu_ops)
> + return;
Oh good :) I guess that was just a patch-splitting issue. The group
thing still applies, though.
Robin.
> +
> + vdev = fwspec->iommu_priv;
> +
> + list_for_each_entry_safe(entry, next, &vdev->resv_regions, list)
> + kfree(entry);
> +
> + kfree(vdev);
> }
>
> static struct iommu_group *viommu_device_group(struct device *dev)
> @@ -753,15 +879,33 @@ static int viommu_of_xlate(struct device *dev, struct of_phandle_args *args)
>
> static void viommu_get_resv_regions(struct device *dev, struct list_head *head)
> {
> - struct iommu_resv_region *region;
> + struct iommu_resv_region *entry, *new_entry, *msi = NULL;
> + struct viommu_endpoint *vdev = dev->iommu_fwspec->iommu_priv;
> int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
>
> - region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH, prot,
> - IOMMU_RESV_SW_MSI);
> - if (!region)
> - return;
> + list_for_each_entry(entry, &vdev->resv_regions, list) {
> + /*
> + * If the device registered a bypass MSI windows, use it.
> + * Otherwise add a software-mapped region
> + */
> + if (entry->type == IOMMU_RESV_MSI)
> + msi = entry;
> +
> + new_entry = kmemdup(entry, sizeof(*entry), GFP_KERNEL);
> + if (!new_entry)
> + return;
> + list_add_tail(&new_entry->list, head);
> + }
> +
> + if (!msi) {
> + msi = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
> + prot, IOMMU_RESV_SW_MSI);
> + if (!msi)
> + return;
> +
> + list_add_tail(&msi->list, head);
> + }
>
> - list_add_tail(®ion->list, head);
> iommu_dma_get_resv_regions(dev, head);
> }
>
> @@ -852,6 +996,10 @@ static int viommu_probe(struct virtio_device *vdev)
> struct virtio_iommu_config, domain_bits,
> &viommu->domain_bits);
>
> + virtio_cread_feature(vdev, VIRTIO_IOMMU_F_PROBE,
> + struct virtio_iommu_config, probe_size,
> + &viommu->probe_size);
> +
> viommu->geometry = (struct iommu_domain_geometry) {
> .aperture_start = input_start,
> .aperture_end = input_end,
> @@ -933,6 +1081,7 @@ static unsigned int features[] = {
> VIRTIO_IOMMU_F_MAP_UNMAP,
> VIRTIO_IOMMU_F_DOMAIN_BITS,
> VIRTIO_IOMMU_F_INPUT_RANGE,
> + VIRTIO_IOMMU_F_PROBE,
> };
>
> static struct virtio_device_id id_table[] = {
> diff --git a/include/uapi/linux/virtio_iommu.h b/include/uapi/linux/virtio_iommu.h
> index 0de9b44db14d..2335d9ed4676 100644
> --- a/include/uapi/linux/virtio_iommu.h
> +++ b/include/uapi/linux/virtio_iommu.h
> @@ -15,6 +15,7 @@
> #define VIRTIO_IOMMU_F_DOMAIN_BITS 1
> #define VIRTIO_IOMMU_F_MAP_UNMAP 2
> #define VIRTIO_IOMMU_F_BYPASS 3
> +#define VIRTIO_IOMMU_F_PROBE 4
>
> struct virtio_iommu_config {
> /* Supported page sizes */
> @@ -26,6 +27,9 @@ struct virtio_iommu_config {
> } input_range;
> /* Max domain ID size */
> __u8 domain_bits;
> + __u8 padding[3];
> + /* Probe buffer size */
> + __u32 probe_size;
> } __packed;
>
> /* Request types */
> @@ -33,6 +37,7 @@ struct virtio_iommu_config {
> #define VIRTIO_IOMMU_T_DETACH 0x02
> #define VIRTIO_IOMMU_T_MAP 0x03
> #define VIRTIO_IOMMU_T_UNMAP 0x04
> +#define VIRTIO_IOMMU_T_PROBE 0x05
>
> /* Status types */
> #define VIRTIO_IOMMU_S_OK 0x00
> @@ -104,6 +109,37 @@ struct virtio_iommu_req_unmap {
> struct virtio_iommu_req_tail tail;
> } __packed;
>
> +#define VIRTIO_IOMMU_RESV_MEM_T_RESERVED 0
> +#define VIRTIO_IOMMU_RESV_MEM_T_MSI 1
> +
> +struct virtio_iommu_probe_resv_mem {
> + __u8 subtype;
> + __u8 reserved[3];
> + __le64 addr;
> + __le64 size;
> +} __packed;
> +
> +#define VIRTIO_IOMMU_PROBE_T_NONE 0
> +#define VIRTIO_IOMMU_PROBE_T_RESV_MEM 1
> +
> +#define VIRTIO_IOMMU_PROBE_T_MASK 0xfff
> +
> +struct virtio_iommu_probe_property {
> + __le16 type;
> + __le16 length;
> + __u8 value[];
> +} __packed;
> +
> +struct virtio_iommu_req_probe {
> + struct virtio_iommu_req_head head;
> + __le32 endpoint;
> + __u8 reserved[64];
> +
> + __u8 properties[];
> +
> + /* Tail follows the variable-length properties array (no padding) */
> +} __packed;
> +
> union virtio_iommu_req {
> struct virtio_iommu_req_head head;
>
> @@ -111,6 +147,7 @@ union virtio_iommu_req {
> struct virtio_iommu_req_detach detach;
> struct virtio_iommu_req_map map;
> struct virtio_iommu_req_unmap unmap;
> + struct virtio_iommu_req_probe probe;
> };
>
> #endif
>
^ permalink raw reply
* [PATCH v29 0/4] Virtio-balloon: support free page reporting
From: Wei Wang @ 2018-03-26 2:39 UTC (permalink / raw)
To: virtio-dev, linux-kernel, virtualization, kvm, linux-mm, mst,
mhocko, akpm
Cc: yang.zhang.wz, riel, quan.xu0, liliang.opensource, huangzhichao,
pbonzini, nilal
This patch series is separated from the previous "Virtio-balloon
Enhancement" series. The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT,
implemented by this series enables the virtio-balloon driver to report
hints of guest free pages to the host. It can be used to accelerate live
migration of VMs. Here is an introduction of this usage:
Live migration needs to transfer the VM's memory from the source machine
to the destination round by round. For the 1st round, all the VM's memory
is transferred. From the 2nd round, only the pieces of memory that were
written by the guest (after the 1st round) are transferred. One method
that is popularly used by the hypervisor to track which part of memory is
written is to write-protect all the guest memory.
This feature enables the optimization by skipping the transfer of guest
free pages during VM live migration. It is not concerned that the memory
pages are used after they are given to the hypervisor as a hint of the
free pages, because they will be tracked by the hypervisor and transferred
in the subsequent round if they are used and written.
* Tests
- Test Environment
Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Guest: 8G RAM, 4 vCPU
Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
- Test Results
- Idle Guest Live Migration Time (results are averaged over 10 runs)
- Optimization v.s. Legacy = 261ms vs 1769ms --> ~86% reduction
- Guest with Linux Compilation Workload (make bzImage -j4):
- Live Migration Time (average)
Optimization v.s. Legacy = 1260ms v.s. 2634ms --> ~51% reduction
- Linux Compilation Time
Optimization v.s. Legacy = 4min58s v.s. 5min3s
--> no obvious difference
ChangeLog:
v28->v29:
- mm/page_poison: only expose page_poison_enabled(), rather than more
changes did in v28, as we are not 100% confident about that for now.
- virtio-balloon: use a separate buffer for the stop cmd, instead of
having the start and stop cmd use the same buffer. This avoids the
corner case that the start cmd is overridden by the stop cmd when
the host has a delay in reading the start cmd.
v27->v28:
- mm/page_poison: Move PAGE_POISON to page_poison.c and add a function
to expose page poison val to kernel modules.
v26->v27:
- add a new patch to expose page_poisoning_enabled to kernel modules
- virtio-balloon: set poison_val to 0xaaaaaaaa, instead of 0xaa
v25->v26: virtio-balloon changes only
- remove kicking free page vq since the host now polls the vq after
initiating the reporting
- report_free_page_func: detach all the used buffers after sending
the stop cmd id. This avoids leaving the detaching burden (i.e.
overhead) to the next cmd id. Detaching here isn't considered
overhead since the stop cmd id has been sent, and host has already
moved formard.
v24->v25:
- mm: change walk_free_mem_block to return 0 (instead of true) on
completing the report, and return a non-zero value from the
callabck, which stops the reporting.
- virtio-balloon:
- use enum instead of define for VIRTIO_BALLOON_VQ_INFLATE etc.
- avoid __virtio_clear_bit when bailing out;
- a new method to avoid reporting the some cmd id to host twice
- destroy_workqueue can cancel free page work when the feature is
negotiated;
- fail probe when the free page vq size is less than 2.
v23->v24:
- change feature name VIRTIO_BALLOON_F_FREE_PAGE_VQ to
VIRTIO_BALLOON_F_FREE_PAGE_HINT
- kick when vq->num_free < half full, instead of "= half full"
- replace BUG_ON with bailing out
- check vb->balloon_wq in probe(), if null, bail out
- add a new feature bit for page poisoning
- solve the corner case that one cmd id being sent to host twice
v22->v23:
- change to kick the device when the vq is half-way full;
- open-code batch_free_page_sg into add_one_sg;
- change cmd_id from "uint32_t" to "__virtio32";
- reserver one entry in the vq for the driver to send cmd_id, instead
of busywaiting for an available entry;
- add "stop_update" check before queue_work for prudence purpose for
now, will have a separate patch to discuss this flag check later;
- init_vqs: change to put some variables on stack to have simpler
implementation;
- add destroy_workqueue(vb->balloon_wq);
v21->v22:
- add_one_sg: some code and comment re-arrangement
- send_cmd_id: handle a cornercase
For previous ChangeLog, please reference
https://lwn.net/Articles/743660/
Wei Wang (4):
mm: support reporting free page blocks
virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
mm/page_poison: expose page_poisoning_enabled to kernel modules
virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
drivers/virtio/virtio_balloon.c | 267 +++++++++++++++++++++++++++++++-----
include/linux/mm.h | 6 +
include/uapi/linux/virtio_balloon.h | 7 +
mm/page_alloc.c | 96 +++++++++++++
mm/page_poison.c | 6 +
5 files changed, 346 insertions(+), 36 deletions(-)
--
2.7.4
^ permalink raw reply
* [PATCH v29 1/4] mm: support reporting free page blocks
From: Wei Wang @ 2018-03-26 2:39 UTC (permalink / raw)
To: virtio-dev, linux-kernel, virtualization, kvm, linux-mm, mst,
mhocko, akpm
Cc: yang.zhang.wz, riel, quan.xu0, liliang.opensource, huangzhichao,
pbonzini, nilal
In-Reply-To: <1522031994-7246-1-git-send-email-wei.w.wang@intel.com>
This patch adds support to walk through the free page blocks in the
system and report them via a callback function. Some page blocks may
leave the free list after zone->lock is released, so it is the caller's
responsibility to either detect or prevent the use of such pages.
One use example of this patch is to accelerate live migration by skipping
the transfer of free pages reported from the guest. A popular method used
by the hypervisor to track which part of memory is written during live
migration is to write-protect all the guest memory. So, those pages that
are reported as free pages but are written after the report function
returns will be captured by the hypervisor, and they will be added to the
next round of memory transfer.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Liang Li <liang.z.li@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michal Hocko <mhocko@kernel.org>
---
include/linux/mm.h | 6 ++++
mm/page_alloc.c | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 102 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ad06d42..c72b5a9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1944,6 +1944,12 @@ extern void free_area_init_node(int nid, unsigned long * zones_size,
unsigned long zone_start_pfn, unsigned long *zholes_size);
extern void free_initmem(void);
+extern int walk_free_mem_block(void *opaque,
+ int min_order,
+ int (*report_pfn_range)(void *opaque,
+ unsigned long pfn,
+ unsigned long num));
+
/*
* Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
* into the buddy system. The freed pages will be poisoned with pattern
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 635d7dd..d58de87 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4912,6 +4912,102 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
show_swap_cache_info();
}
+/*
+ * Walk through a free page list and report the found pfn range via the
+ * callback.
+ *
+ * Return 0 if it completes the reporting. Otherwise, return the non-zero
+ * value returned from the callback.
+ */
+static int walk_free_page_list(void *opaque,
+ struct zone *zone,
+ int order,
+ enum migratetype mt,
+ int (*report_pfn_range)(void *,
+ unsigned long,
+ unsigned long))
+{
+ struct page *page;
+ struct list_head *list;
+ unsigned long pfn, flags;
+ int ret = 0;
+
+ spin_lock_irqsave(&zone->lock, flags);
+ list = &zone->free_area[order].free_list[mt];
+ list_for_each_entry(page, list, lru) {
+ pfn = page_to_pfn(page);
+ ret = report_pfn_range(opaque, pfn, 1 << order);
+ if (ret)
+ break;
+ }
+ spin_unlock_irqrestore(&zone->lock, flags);
+
+ return ret;
+}
+
+/**
+ * walk_free_mem_block - Walk through the free page blocks in the system
+ * @opaque: the context passed from the caller
+ * @min_order: the minimum order of free lists to check
+ * @report_pfn_range: the callback to report the pfn range of the free pages
+ *
+ * If the callback returns a non-zero value, stop iterating the list of free
+ * page blocks. Otherwise, continue to report.
+ *
+ * Please note that there are no locking guarantees for the callback and
+ * that the reported pfn range might be freed or disappear after the
+ * callback returns so the caller has to be very careful how it is used.
+ *
+ * The callback itself must not sleep or perform any operations which would
+ * require any memory allocations directly (not even GFP_NOWAIT/GFP_ATOMIC)
+ * or via any lock dependency. It is generally advisable to implement
+ * the callback as simple as possible and defer any heavy lifting to a
+ * different context.
+ *
+ * There is no guarantee that each free range will be reported only once
+ * during one walk_free_mem_block invocation.
+ *
+ * pfn_to_page on the given range is strongly discouraged and if there is
+ * an absolute need for that make sure to contact MM people to discuss
+ * potential problems.
+ *
+ * The function itself might sleep so it cannot be called from atomic
+ * contexts.
+ *
+ * In general low orders tend to be very volatile and so it makes more
+ * sense to query larger ones first for various optimizations which like
+ * ballooning etc... This will reduce the overhead as well.
+ *
+ * Return 0 if it completes the reporting. Otherwise, return the non-zero
+ * value returned from the callback.
+ */
+int walk_free_mem_block(void *opaque,
+ int min_order,
+ int (*report_pfn_range)(void *opaque,
+ unsigned long pfn,
+ unsigned long num))
+{
+ struct zone *zone;
+ int order;
+ enum migratetype mt;
+ int ret;
+
+ for_each_populated_zone(zone) {
+ for (order = MAX_ORDER - 1; order >= min_order; order--) {
+ for (mt = 0; mt < MIGRATE_TYPES; mt++) {
+ ret = walk_free_page_list(opaque, zone,
+ order, mt,
+ report_pfn_range);
+ if (ret)
+ return ret;
+ }
+ }
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(walk_free_mem_block);
+
static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
{
zoneref->zone = zone;
--
2.7.4
^ permalink raw reply related
* [PATCH v29 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
From: Wei Wang @ 2018-03-26 2:39 UTC (permalink / raw)
To: virtio-dev, linux-kernel, virtualization, kvm, linux-mm, mst,
mhocko, akpm
Cc: yang.zhang.wz, riel, quan.xu0, liliang.opensource, huangzhichao,
pbonzini, nilal
In-Reply-To: <1522031994-7246-1-git-send-email-wei.w.wang@intel.com>
Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.
Host requests the guest to report free page hints by sending a new cmd
id to the guest via the free_page_report_cmd_id configuration register.
When the guest starts to report, the first element added to the free page
vq is the cmd id given by host. When the guest finishes the reporting
of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
to the vq to tell host that the reporting is done. Host polls the free
page vq after sending the starting cmd id, so the guest doesn't need to
kick after filling an element to the vq.
Host may also requests the guest to stop the reporting in advance by
sending the stop cmd id to the guest via the configuration register.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Liang Li <liang.z.li@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
---
drivers/virtio/virtio_balloon.c | 257 +++++++++++++++++++++++++++++++-----
include/uapi/linux/virtio_balloon.h | 4 +
2 files changed, 225 insertions(+), 36 deletions(-)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index dfe5684..18d24a4 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -51,9 +51,22 @@ MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
static struct vfsmount *balloon_mnt;
#endif
+enum virtio_balloon_vq {
+ VIRTIO_BALLOON_VQ_INFLATE,
+ VIRTIO_BALLOON_VQ_DEFLATE,
+ VIRTIO_BALLOON_VQ_STATS,
+ VIRTIO_BALLOON_VQ_FREE_PAGE,
+ VIRTIO_BALLOON_VQ_MAX
+};
+
struct virtio_balloon {
struct virtio_device *vdev;
- struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+ struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+ /* Balloon's own wq for cpu-intensive work items */
+ struct workqueue_struct *balloon_wq;
+ /* The free page reporting work item submitted to the balloon wq */
+ struct work_struct report_free_page_work;
/* The balloon servicing is delegated to a freezable workqueue. */
struct work_struct update_balloon_stats_work;
@@ -63,6 +76,13 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
+ /* The new cmd id received from host */
+ uint32_t cmd_id_received;
+ /* The cmd id that is in use */
+ __virtio32 cmd_id_use;
+ /* Buffer to store the stop sign */
+ __virtio32 stop_cmd_id;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
@@ -320,17 +340,6 @@ static void stats_handle_request(struct virtio_balloon *vb)
virtqueue_kick(vq);
}
-static void virtballoon_changed(struct virtio_device *vdev)
-{
- struct virtio_balloon *vb = vdev->priv;
- unsigned long flags;
-
- spin_lock_irqsave(&vb->stop_update_lock, flags);
- if (!vb->stop_update)
- queue_work(system_freezable_wq, &vb->update_balloon_size_work);
- spin_unlock_irqrestore(&vb->stop_update_lock, flags);
-}
-
static inline s64 towards_target(struct virtio_balloon *vb)
{
s64 target;
@@ -347,6 +356,34 @@ static inline s64 towards_target(struct virtio_balloon *vb)
return target - vb->num_pages;
}
+static void virtballoon_changed(struct virtio_device *vdev)
+{
+ struct virtio_balloon *vb = vdev->priv;
+ unsigned long flags;
+ s64 diff = towards_target(vb);
+
+ if (diff) {
+ spin_lock_irqsave(&vb->stop_update_lock, flags);
+ if (!vb->stop_update)
+ queue_work(system_freezable_wq,
+ &vb->update_balloon_size_work);
+ spin_unlock_irqrestore(&vb->stop_update_lock, flags);
+ }
+
+ if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+ virtio_cread(vdev, struct virtio_balloon_config,
+ free_page_report_cmd_id, &vb->cmd_id_received);
+ if (vb->cmd_id_received !=
+ VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID) {
+ spin_lock_irqsave(&vb->stop_update_lock, flags);
+ if (!vb->stop_update)
+ queue_work(vb->balloon_wq,
+ &vb->report_free_page_work);
+ spin_unlock_irqrestore(&vb->stop_update_lock, flags);
+ }
+ }
+}
+
static void update_balloon_size(struct virtio_balloon *vb)
{
u32 actual = vb->num_pages;
@@ -421,42 +458,163 @@ static void update_balloon_size_func(struct work_struct *work)
static int init_vqs(struct virtio_balloon *vb)
{
- struct virtqueue *vqs[3];
- vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
- static const char * const names[] = { "inflate", "deflate", "stats" };
- int err, nvqs;
+ struct virtqueue *vqs[VIRTIO_BALLOON_VQ_MAX];
+ vq_callback_t *callbacks[VIRTIO_BALLOON_VQ_MAX];
+ const char *names[VIRTIO_BALLOON_VQ_MAX];
+ struct scatterlist sg;
+ int ret;
/*
- * We expect two virtqueues: inflate and deflate, and
- * optionally stat.
+ * Inflateq and deflateq are used unconditionally. The names[]
+ * will be NULL if the related feature is not enabled, which will
+ * cause no allocation for the corresponding virtqueue in find_vqs.
*/
- nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
- err = virtio_find_vqs(vb->vdev, nvqs, vqs, callbacks, names, NULL);
- if (err)
- return err;
+ callbacks[VIRTIO_BALLOON_VQ_INFLATE] = balloon_ack;
+ names[VIRTIO_BALLOON_VQ_INFLATE] = "inflate";
+ callbacks[VIRTIO_BALLOON_VQ_DEFLATE] = balloon_ack;
+ names[VIRTIO_BALLOON_VQ_DEFLATE] = "deflate";
+ names[VIRTIO_BALLOON_VQ_STATS] = NULL;
+ names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
- vb->inflate_vq = vqs[0];
- vb->deflate_vq = vqs[1];
if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
- struct scatterlist sg;
- unsigned int num_stats;
- vb->stats_vq = vqs[2];
+ names[VIRTIO_BALLOON_VQ_STATS] = "stats";
+ callbacks[VIRTIO_BALLOON_VQ_STATS] = stats_request;
+ }
+
+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+ names[VIRTIO_BALLOON_VQ_FREE_PAGE] = "free_page_vq";
+ callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
+ }
+ ret = vb->vdev->config->find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX,
+ vqs, callbacks, names, NULL, NULL);
+ if (ret)
+ return ret;
+
+ vb->inflate_vq = vqs[VIRTIO_BALLOON_VQ_INFLATE];
+ vb->deflate_vq = vqs[VIRTIO_BALLOON_VQ_DEFLATE];
+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
+ vb->stats_vq = vqs[VIRTIO_BALLOON_VQ_STATS];
/*
* Prime this virtqueue with one buffer so the hypervisor can
* use it to signal us later (it can't be broken yet!).
*/
- num_stats = update_balloon_stats(vb);
-
- sg_init_one(&sg, vb->stats, sizeof(vb->stats[0]) * num_stats);
- if (virtqueue_add_outbuf(vb->stats_vq, &sg, 1, vb, GFP_KERNEL)
- < 0)
- BUG();
+ sg_init_one(&sg, vb->stats, sizeof(vb->stats));
+ ret = virtqueue_add_outbuf(vb->stats_vq, &sg, 1, vb,
+ GFP_KERNEL);
+ if (ret) {
+ dev_warn(&vb->vdev->dev, "%s: add stat_vq failed\n",
+ __func__);
+ return ret;
+ }
virtqueue_kick(vb->stats_vq);
}
+
+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+ vb->free_page_vq = vqs[VIRTIO_BALLOON_VQ_FREE_PAGE];
+
return 0;
}
+static int add_one_sg(struct virtqueue *vq, unsigned long pfn, uint32_t len)
+{
+ struct scatterlist sg;
+ unsigned int unused;
+
+ sg_init_table(&sg, 1);
+ sg_set_page(&sg, pfn_to_page(pfn), len, 0);
+
+ /* Detach all the used buffers from the vq */
+ while (virtqueue_get_buf(vq, &unused))
+ ;
+
+ /*
+ * Since this is an optimization feature, losing a couple of free
+ * pages to report isn't important. We simply return without adding
+ * the page hint if the vq is full.
+ * We are adding one entry each time, which essentially results in no
+ * memory allocation, so the GFP_KERNEL flag below can be ignored.
+ * Host works by polling the free page vq for hints after sending the
+ * starting cmd id, so the driver doesn't need to kick after filling
+ * the vq.
+ * Lastly, there is always one entry reserved for the cmd id to use.
+ */
+ if (vq->num_free > 1)
+ return virtqueue_add_inbuf(vq, &sg, 1, vq, GFP_KERNEL);
+
+ return 0;
+}
+
+static int virtio_balloon_send_free_pages(void *opaque, unsigned long pfn,
+ unsigned long nr_pages)
+{
+ struct virtio_balloon *vb = (struct virtio_balloon *)opaque;
+ uint32_t len = nr_pages << PAGE_SHIFT;
+
+ /*
+ * If a stop id or a new cmd id was just received from host, stop
+ * the reporting, and return 1 to indicate an active stop.
+ */
+ if (virtio32_to_cpu(vb->vdev, vb->cmd_id_use) != vb->cmd_id_received)
+ return 1;
+
+ return add_one_sg(vb->free_page_vq, pfn, len);
+}
+
+static int send_start_cmd_id(struct virtio_balloon *vb, uint32_t cmd_id)
+{
+ struct scatterlist sg;
+ struct virtqueue *vq = vb->free_page_vq;
+
+ vb->cmd_id_use = cpu_to_virtio32(vb->vdev, cmd_id);
+ sg_init_one(&sg, &vb->cmd_id_use, sizeof(vb->cmd_id_use));
+ return virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL);
+}
+
+static int send_stop_cmd_id(struct virtio_balloon *vb)
+{
+ struct scatterlist sg;
+ struct virtqueue *vq = vb->free_page_vq;
+
+ sg_init_one(&sg, &vb->stop_cmd_id, sizeof(vb->cmd_id_use));
+ return virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL);
+}
+
+static void report_free_page_func(struct work_struct *work)
+{
+ struct virtio_balloon *vb;
+ struct virtqueue *vq;
+ unsigned int unused;
+ int ret;
+
+ vb = container_of(work, struct virtio_balloon, report_free_page_work);
+ vq = vb->free_page_vq;
+
+ /* Start by sending the received cmd id to host with an outbuf */
+ ret = send_start_cmd_id(vb, vb->cmd_id_received);
+ if (unlikely(ret))
+ goto err;
+
+ ret = walk_free_mem_block(vb, 0, &virtio_balloon_send_free_pages);
+ if (unlikely(ret == -EIO))
+ goto err;
+
+ /* End by sending a stop id to host with an outbuf */
+ ret = send_stop_cmd_id(vb);
+ if (likely(!ret)) {
+ /*
+ * Ending: make sure all the used buffers have been detached
+ * from the vq.
+ */
+ while (vq->num_free != virtqueue_get_vring_size(vq))
+ virtqueue_get_buf(vq, &unused);
+ return;
+ }
+err:
+ dev_err(&vb->vdev->dev, "%s: free page vq failure, ret=%d\n",
+ __func__, ret);
+}
+
#ifdef CONFIG_BALLOON_COMPACTION
/*
* virtballoon_migratepage - perform the balloon page migration on behalf of
@@ -570,18 +728,36 @@ static int virtballoon_probe(struct virtio_device *vdev)
if (err)
goto out_free_vb;
+ if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+ /*
+ * There is always one entry reserved for cmd id, so the ring
+ * size needs to be at least two to report free page hints.
+ */
+ if (virtqueue_get_vring_size(vb->free_page_vq) < 2)
+ goto out_free_vb;
+ vb->balloon_wq = alloc_workqueue("balloon-wq",
+ WQ_FREEZABLE | WQ_CPU_INTENSIVE, 0);
+ if (!vb->balloon_wq) {
+ err = -ENOMEM;
+ goto out_del_vqs;
+ }
+ vb->stop_cmd_id = cpu_to_virtio32(vb->vdev,
+ VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
+ INIT_WORK(&vb->report_free_page_work, report_free_page_func);
+ }
+
vb->nb.notifier_call = virtballoon_oom_notify;
vb->nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY;
err = register_oom_notifier(&vb->nb);
if (err < 0)
- goto out_del_vqs;
+ goto out_del_balloon_wq;
#ifdef CONFIG_BALLOON_COMPACTION
balloon_mnt = kern_mount(&balloon_fs);
if (IS_ERR(balloon_mnt)) {
err = PTR_ERR(balloon_mnt);
unregister_oom_notifier(&vb->nb);
- goto out_del_vqs;
+ goto out_del_balloon_wq;
}
vb->vb_dev_info.migratepage = virtballoon_migratepage;
@@ -591,7 +767,7 @@ static int virtballoon_probe(struct virtio_device *vdev)
kern_unmount(balloon_mnt);
unregister_oom_notifier(&vb->nb);
vb->vb_dev_info.inode = NULL;
- goto out_del_vqs;
+ goto out_del_balloon_wq;
}
vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
#endif
@@ -602,6 +778,9 @@ static int virtballoon_probe(struct virtio_device *vdev)
virtballoon_changed(vdev);
return 0;
+out_del_balloon_wq:
+ if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+ destroy_workqueue(vb->balloon_wq);
out_del_vqs:
vdev->config->del_vqs(vdev);
out_free_vb:
@@ -635,6 +814,11 @@ static void virtballoon_remove(struct virtio_device *vdev)
cancel_work_sync(&vb->update_balloon_size_work);
cancel_work_sync(&vb->update_balloon_stats_work);
+ if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+ cancel_work_sync(&vb->report_free_page_work);
+ destroy_workqueue(vb->balloon_wq);
+ }
+
remove_common(vb);
#ifdef CONFIG_BALLOON_COMPACTION
if (vb->vb_dev_info.inode)
@@ -686,6 +870,7 @@ static unsigned int features[] = {
VIRTIO_BALLOON_F_MUST_TELL_HOST,
VIRTIO_BALLOON_F_STATS_VQ,
VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
+ VIRTIO_BALLOON_F_FREE_PAGE_HINT,
};
static struct virtio_driver virtio_balloon_driver = {
diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index 4e8b830..b2d86c2 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -34,15 +34,19 @@
#define VIRTIO_BALLOON_F_MUST_TELL_HOST 0 /* Tell before reclaiming pages */
#define VIRTIO_BALLOON_F_STATS_VQ 1 /* Memory Stats virtqueue */
#define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */
+#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
/* Size of a PFN in the balloon interface. */
#define VIRTIO_BALLOON_PFN_SHIFT 12
+#define VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID 0
struct virtio_balloon_config {
/* Number of pages host wants Guest to give up. */
__u32 num_pages;
/* Number of pages we've actually got in balloon. */
__u32 actual;
+ /* Free page report command id, readonly by guest */
+ __u32 free_page_report_cmd_id;
};
#define VIRTIO_BALLOON_S_SWAP_IN 0 /* Amount of memory swapped in */
--
2.7.4
^ permalink raw reply related
* [PATCH v29 3/4] mm/page_poison: expose page_poisoning_enabled to kernel modules
From: Wei Wang @ 2018-03-26 2:39 UTC (permalink / raw)
To: virtio-dev, linux-kernel, virtualization, kvm, linux-mm, mst,
mhocko, akpm
Cc: yang.zhang.wz, riel, quan.xu0, liliang.opensource, huangzhichao,
pbonzini, nilal
In-Reply-To: <1522031994-7246-1-git-send-email-wei.w.wang@intel.com>
In some usages, e.g. virtio-balloon, a kernel module needs to know if
page poisoning is in use. This patch exposes the page_poisoning_enabled
function to kernel modules.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
---
mm/page_poison.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/mm/page_poison.c b/mm/page_poison.c
index e83fd44..762b472 100644
--- a/mm/page_poison.c
+++ b/mm/page_poison.c
@@ -17,6 +17,11 @@ static int early_page_poison_param(char *buf)
}
early_param("page_poison", early_page_poison_param);
+/**
+ * page_poisoning_enabled - check if page poisoning is enabled
+ *
+ * Return true if page poisoning is enabled, or false if not.
+ */
bool page_poisoning_enabled(void)
{
/*
@@ -29,6 +34,7 @@ bool page_poisoning_enabled(void)
(!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) &&
debug_pagealloc_enabled()));
}
+EXPORT_SYMBOL_GPL(page_poisoning_enabled);
static void poison_page(struct page *page)
{
--
2.7.4
^ permalink raw reply related
* [PATCH v29 4/4] virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
From: Wei Wang @ 2018-03-26 2:39 UTC (permalink / raw)
To: virtio-dev, linux-kernel, virtualization, kvm, linux-mm, mst,
mhocko, akpm
Cc: yang.zhang.wz, riel, quan.xu0, liliang.opensource, huangzhichao,
pbonzini, nilal
In-Reply-To: <1522031994-7246-1-git-send-email-wei.w.wang@intel.com>
The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the
guest is using page poisoning. Guest writes to the poison_val config
field to tell host about the page poisoning value in use.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
drivers/virtio/virtio_balloon.c | 10 ++++++++++
include/uapi/linux/virtio_balloon.h | 3 +++
2 files changed, 13 insertions(+)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 18d24a4..6de9339 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -699,6 +699,7 @@ static struct file_system_type balloon_fs = {
static int virtballoon_probe(struct virtio_device *vdev)
{
struct virtio_balloon *vb;
+ __u32 poison_val;
int err;
if (!vdev->config->get) {
@@ -744,6 +745,11 @@ static int virtballoon_probe(struct virtio_device *vdev)
vb->stop_cmd_id = cpu_to_virtio32(vb->vdev,
VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
INIT_WORK(&vb->report_free_page_work, report_free_page_func);
+ if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
+ memset(&poison_val, PAGE_POISON, sizeof(poison_val));
+ virtio_cwrite(vb->vdev, struct virtio_balloon_config,
+ poison_val, &poison_val);
+ }
}
vb->nb.notifier_call = virtballoon_oom_notify;
@@ -862,6 +868,9 @@ static int virtballoon_restore(struct virtio_device *vdev)
static int virtballoon_validate(struct virtio_device *vdev)
{
+ if (!page_poisoning_enabled())
+ __virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_POISON);
+
__virtio_clear_bit(vdev, VIRTIO_F_IOMMU_PLATFORM);
return 0;
}
@@ -871,6 +880,7 @@ static unsigned int features[] = {
VIRTIO_BALLOON_F_STATS_VQ,
VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
VIRTIO_BALLOON_F_FREE_PAGE_HINT,
+ VIRTIO_BALLOON_F_PAGE_POISON,
};
static struct virtio_driver virtio_balloon_driver = {
diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index b2d86c2..8b93581 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -35,6 +35,7 @@
#define VIRTIO_BALLOON_F_STATS_VQ 1 /* Memory Stats virtqueue */
#define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */
#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
+#define VIRTIO_BALLOON_F_PAGE_POISON 4 /* Guest is using page poisoning */
/* Size of a PFN in the balloon interface. */
#define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -47,6 +48,8 @@ struct virtio_balloon_config {
__u32 actual;
/* Free page report command id, readonly by guest */
__u32 free_page_report_cmd_id;
+ /* Stores PAGE_POISON if page poisoning is in use */
+ __u32 poison_val;
};
#define VIRTIO_BALLOON_S_SWAP_IN 0 /* Amount of memory swapped in */
--
2.7.4
^ permalink raw reply related
* RE: [PATCH v29 3/4] mm/page_poison: expose page_poisoning_enabled to kernel modules
From: Wang, Wei W @ 2018-03-26 3:24 UTC (permalink / raw)
To: virtio-dev@lists.oasis-open.org, linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
linux-mm@kvack.org, mst@redhat.com, mhocko@kernel.org,
akpm@linux-foundation.org
Cc: yang.zhang.wz@gmail.com, riel@redhat.com, quan.xu0@gmail.com,
liliang.opensource@gmail.com, huangzhichao@huawei.com,
pbonzini@redhat.com, nilal@redhat.com
In-Reply-To: <1522031994-7246-4-git-send-email-wei.w.wang@intel.com>
On Monday, March 26, 2018 10:40 AM, Wang, Wei W wrote:
> Subject: [PATCH v29 3/4] mm/page_poison: expose page_poisoning_enabled
> to kernel modules
>
> In some usages, e.g. virtio-balloon, a kernel module needs to know if page
> poisoning is in use. This patch exposes the page_poisoning_enabled function
> to kernel modules.
>
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> ---
> mm/page_poison.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/mm/page_poison.c b/mm/page_poison.c index e83fd44..762b472
> 100644
> --- a/mm/page_poison.c
> +++ b/mm/page_poison.c
> @@ -17,6 +17,11 @@ static int early_page_poison_param(char *buf) }
> early_param("page_poison", early_page_poison_param);
>
> +/**
> + * page_poisoning_enabled - check if page poisoning is enabled
> + *
> + * Return true if page poisoning is enabled, or false if not.
> + */
> bool page_poisoning_enabled(void)
> {
> /*
> @@ -29,6 +34,7 @@ bool page_poisoning_enabled(void)
>
> (!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) &&
> debug_pagealloc_enabled()));
> }
> +EXPORT_SYMBOL_GPL(page_poisoning_enabled);
>
> static void poison_page(struct page *page) {
> --
> 2.7.4
Could we get a review of this patch? We've reviewed other parts, and this one seems to be the last part of this feature. Thanks.
Best,
Wei
^ permalink raw reply
* [RFC PATCH V2 0/8] Packed ring for vhost
From: Jason Wang @ 2018-03-26 3:38 UTC (permalink / raw)
To: mst, virtualization; +Cc: netdev, linux-kernel, kvm
Hi all:
This RFC implement packed ring layout. The code were tested with pmd
implement by Jens at
http://dpdk.org/ml/archives/dev/2018-January/089417.html. Minor change
was needed for pmd codes to kick virtqueue since it assumes a busy
polling backend.
Test were done between localhost and guest. Testpmd (rxonly) in guest
reports 2.4Mpps. Testpmd (txonly) repots about 2.1Mpps.
Notes: The event suppression /indirect descriptor support is complied
test only because of lacked driver support.
Changes from V1:
- Refactor vhost used elem code to avoid open coding on used elem
- Event suppression support (compile test only).
- Indirect descriptor support (compile test only).
- Zerocopy support.
- vIOMMU support.
- SCSI/VSOCK support (compile test only).
- Fix several bugs
For simplicity, I don't implement batching or other optimizations.
Please review.
Thanks
Jason Wang (8):
vhost: move get_rx_bufs to vhost.c
vhost: hide used ring layout from device
vhost: do not use vring_used_elem
vhost_net: do not explicitly manipulate vhost_used_elem
vhost: vhost_put_user() can accept metadata type
virtio: introduce packed ring defines
vhost: packed ring support
vhost: event suppression for packed ring
drivers/vhost/net.c | 138 ++-----
drivers/vhost/scsi.c | 62 +--
drivers/vhost/vhost.c | 818 ++++++++++++++++++++++++++++++++++---
drivers/vhost/vhost.h | 46 ++-
drivers/vhost/vsock.c | 42 +-
include/uapi/linux/virtio_config.h | 9 +
include/uapi/linux/virtio_ring.h | 32 ++
7 files changed, 921 insertions(+), 226 deletions(-)
--
2.7.4
^ permalink raw reply
* [RFC PATCH V2 1/8] vhost: move get_rx_bufs to vhost.c
From: Jason Wang @ 2018-03-26 3:38 UTC (permalink / raw)
To: mst, virtualization; +Cc: netdev, linux-kernel, kvm
In-Reply-To: <1522035533-11786-1-git-send-email-jasowang@redhat.com>
Move get_rx_bufs() to vhost.c and rename it to
vhost_get_rx_bufs(). This helps to hide vring internal layout from
specific device implementation. Packed ring implementation will
benefit from this.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 83 ++-------------------------------------------------
drivers/vhost/vhost.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++
drivers/vhost/vhost.h | 7 +++++
3 files changed, 88 insertions(+), 80 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8139bc7..57dfa63 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -658,83 +658,6 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk)
return len;
}
-/* This is a multi-buffer version of vhost_get_desc, that works if
- * vq has read descriptors only.
- * @vq - the relevant virtqueue
- * @datalen - data length we'll be reading
- * @iovcount - returned count of io vectors we fill
- * @log - vhost log
- * @log_num - log offset
- * @quota - headcount quota, 1 for big buffer
- * returns number of buffer heads allocated, negative on error
- */
-static int get_rx_bufs(struct vhost_virtqueue *vq,
- struct vring_used_elem *heads,
- int datalen,
- unsigned *iovcount,
- struct vhost_log *log,
- unsigned *log_num,
- unsigned int quota)
-{
- unsigned int out, in;
- int seg = 0;
- int headcount = 0;
- unsigned d;
- int r, nlogs = 0;
- /* len is always initialized before use since we are always called with
- * datalen > 0.
- */
- u32 uninitialized_var(len);
-
- while (datalen > 0 && headcount < quota) {
- if (unlikely(seg >= UIO_MAXIOV)) {
- r = -ENOBUFS;
- goto err;
- }
- r = vhost_get_vq_desc(vq, vq->iov + seg,
- ARRAY_SIZE(vq->iov) - seg, &out,
- &in, log, log_num);
- if (unlikely(r < 0))
- goto err;
-
- d = r;
- if (d == vq->num) {
- r = 0;
- goto err;
- }
- if (unlikely(out || in <= 0)) {
- vq_err(vq, "unexpected descriptor format for RX: "
- "out %d, in %d\n", out, in);
- r = -EINVAL;
- goto err;
- }
- if (unlikely(log)) {
- nlogs += *log_num;
- log += *log_num;
- }
- heads[headcount].id = cpu_to_vhost32(vq, d);
- len = iov_length(vq->iov + seg, in);
- heads[headcount].len = cpu_to_vhost32(vq, len);
- datalen -= len;
- ++headcount;
- seg += in;
- }
- heads[headcount - 1].len = cpu_to_vhost32(vq, len + datalen);
- *iovcount = seg;
- if (unlikely(log))
- *log_num = nlogs;
-
- /* Detect overrun */
- if (unlikely(datalen > 0)) {
- r = UIO_MAXIOV + 1;
- goto err;
- }
- return headcount;
-err:
- vhost_discard_vq_desc(vq, headcount);
- return r;
-}
-
/* Expects to be always run from workqueue - which acts as
* read-size critical section for our kind of RCU. */
static void handle_rx(struct vhost_net *net)
@@ -784,9 +707,9 @@ static void handle_rx(struct vhost_net *net)
while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk))) {
sock_len += sock_hlen;
vhost_len = sock_len + vhost_hlen;
- headcount = get_rx_bufs(vq, vq->heads + nheads, vhost_len,
- &in, vq_log, &log,
- likely(mergeable) ? UIO_MAXIOV : 1);
+ headcount = vhost_get_bufs(vq, vq->heads + nheads, vhost_len,
+ &in, vq_log, &log,
+ likely(mergeable) ? UIO_MAXIOV : 1);
/* On error, stop handling until the next kick. */
if (unlikely(headcount < 0))
goto out;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 1b3e8d2d..c57df71 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2098,6 +2098,84 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
}
EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
+/* This is a multi-buffer version of vhost_get_desc, that works if
+ * vq has read descriptors only.
+ * @vq - the relevant virtqueue
+ * @datalen - data length we'll be reading
+ * @iovcount - returned count of io vectors we fill
+ * @log - vhost log
+ * @log_num - log offset
+ * @quota - headcount quota, 1 for big buffer
+ * returns number of buffer heads allocated, negative on error
+ */
+int vhost_get_bufs(struct vhost_virtqueue *vq,
+ struct vring_used_elem *heads,
+ int datalen,
+ unsigned *iovcount,
+ struct vhost_log *log,
+ unsigned *log_num,
+ unsigned int quota)
+{
+ unsigned int out, in;
+ int seg = 0;
+ int headcount = 0;
+ unsigned d;
+ int r, nlogs = 0;
+ /* len is always initialized before use since we are always called with
+ * datalen > 0.
+ */
+ u32 uninitialized_var(len);
+
+ while (datalen > 0 && headcount < quota) {
+ if (unlikely(seg >= UIO_MAXIOV)) {
+ r = -ENOBUFS;
+ goto err;
+ }
+ r = vhost_get_vq_desc(vq, vq->iov + seg,
+ ARRAY_SIZE(vq->iov) - seg, &out,
+ &in, log, log_num);
+ if (unlikely(r < 0))
+ goto err;
+
+ d = r;
+ if (d == vq->num) {
+ r = 0;
+ goto err;
+ }
+ if (unlikely(out || in <= 0)) {
+ vq_err(vq, "unexpected descriptor format for RX: "
+ "out %d, in %d\n", out, in);
+ r = -EINVAL;
+ goto err;
+ }
+ if (unlikely(log)) {
+ nlogs += *log_num;
+ log += *log_num;
+ }
+ heads[headcount].id = cpu_to_vhost32(vq, d);
+ len = iov_length(vq->iov + seg, in);
+ heads[headcount].len = cpu_to_vhost32(vq, len);
+ datalen -= len;
+ ++headcount;
+ seg += in;
+ }
+ heads[headcount - 1].len = cpu_to_vhost32(vq, len + datalen);
+ *iovcount = seg;
+ if (unlikely(log))
+ *log_num = nlogs;
+
+ /* Detect overrun */
+ if (unlikely(datalen > 0)) {
+ r = UIO_MAXIOV + 1;
+ goto err;
+ }
+ return headcount;
+err:
+ vhost_discard_vq_desc(vq, headcount);
+ return r;
+}
+EXPORT_SYMBOL_GPL(vhost_get_bufs);
+
/* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
{
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index ac4b605..cbc2c80 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -185,6 +185,13 @@ int vhost_get_vq_desc(struct vhost_virtqueue *,
struct iovec iov[], unsigned int iov_count,
unsigned int *out_num, unsigned int *in_num,
struct vhost_log *log, unsigned int *log_num);
+int vhost_get_bufs(struct vhost_virtqueue *vq,
+ struct vring_used_elem *heads,
+ int datalen,
+ unsigned *iovcount,
+ struct vhost_log *log,
+ unsigned *log_num,
+ unsigned int quota);
void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
int vhost_vq_init_access(struct vhost_virtqueue *);
--
2.7.4
^ permalink raw reply related
* [RFC PATCH V2 2/8] vhost: hide used ring layout from device
From: Jason Wang @ 2018-03-26 3:38 UTC (permalink / raw)
To: mst, virtualization; +Cc: netdev, linux-kernel, kvm
In-Reply-To: <1522035533-11786-1-git-send-email-jasowang@redhat.com>
We used to return descriptor head by vhost_get_vq_desc() to device and
pass it back to vhost_add_used() and its friends. This exposes the
internal used ring layout to device which makes it hard to be extended for
e.g packed ring layout.
So this patch tries to hide the used ring layout by
- letting vhost_get_vq_desc() return pointer to struct vring_used_elem
- accepting pointer to struct vring_used_elem in vhost_add_used() and
vhost_add_used_and_signal()
This could help to hide used ring layout and make it easier to
implement packed ring on top.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 46 +++++++++++++++++++++-----------------
drivers/vhost/scsi.c | 62 +++++++++++++++++++++++++++------------------------
drivers/vhost/vhost.c | 52 +++++++++++++++++++++---------------------
drivers/vhost/vhost.h | 9 +++++---
drivers/vhost/vsock.c | 42 +++++++++++++++++-----------------
5 files changed, 112 insertions(+), 99 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 57dfa63..f821fcd 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -422,22 +422,24 @@ static int vhost_net_enable_vq(struct vhost_net *n,
static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
struct vhost_virtqueue *vq,
+ struct vring_used_elem *used_elem,
struct iovec iov[], unsigned int iov_size,
unsigned int *out_num, unsigned int *in_num)
{
unsigned long uninitialized_var(endtime);
- int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+ int r = vhost_get_vq_desc(vq, used_elem, vq->iov, ARRAY_SIZE(vq->iov),
out_num, in_num, NULL, NULL);
- if (r == vq->num && vq->busyloop_timeout) {
+ if (r == -ENOSPC && vq->busyloop_timeout) {
preempt_disable();
endtime = busy_clock() + vq->busyloop_timeout;
while (vhost_can_busy_poll(vq->dev, endtime) &&
vhost_vq_avail_empty(vq->dev, vq))
cpu_relax();
preempt_enable();
- r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
- out_num, in_num, NULL, NULL);
+ r = vhost_get_vq_desc(vq, used_elem, vq->iov,
+ ARRAY_SIZE(vq->iov), out_num, in_num,
+ NULL, NULL);
}
return r;
@@ -459,7 +461,6 @@ static void handle_tx(struct vhost_net *net)
struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
struct vhost_virtqueue *vq = &nvq->vq;
unsigned out, in;
- int head;
struct msghdr msg = {
.msg_name = NULL,
.msg_namelen = 0,
@@ -472,6 +473,7 @@ static void handle_tx(struct vhost_net *net)
size_t hdr_size;
struct socket *sock;
struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
+ struct vring_used_elem used;
bool zcopy, zcopy_used;
mutex_lock(&vq->mutex);
@@ -494,20 +496,20 @@ static void handle_tx(struct vhost_net *net)
vhost_zerocopy_signal_used(net, vq);
- head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
- ARRAY_SIZE(vq->iov),
- &out, &in);
- /* On error, stop handling until the next kick. */
- if (unlikely(head < 0))
- break;
+ err = vhost_net_tx_get_vq_desc(net, vq, &used, vq->iov,
+ ARRAY_SIZE(vq->iov),
+ &out, &in);
/* Nothing new? Wait for eventfd to tell us they refilled. */
- if (head == vq->num) {
+ if (err == -ENOSPC) {
if (unlikely(vhost_enable_notify(&net->dev, vq))) {
vhost_disable_notify(&net->dev, vq);
continue;
}
break;
}
+ /* On error, stop handling until the next kick. */
+ if (unlikely(err < 0))
+ break;
if (in) {
vq_err(vq, "Unexpected descriptor format for TX: "
"out %d, int %d\n", out, in);
@@ -535,7 +537,8 @@ static void handle_tx(struct vhost_net *net)
struct ubuf_info *ubuf;
ubuf = nvq->ubuf_info + nvq->upend_idx;
- vq->heads[nvq->upend_idx].id = cpu_to_vhost32(vq, head);
+ vq->heads[nvq->upend_idx].id =
+ cpu_to_vhost32(vq, used.id);
vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
ubuf->callback = vhost_zerocopy_callback;
ubuf->ctx = nvq->ubufs;
@@ -576,7 +579,7 @@ static void handle_tx(struct vhost_net *net)
pr_debug("Truncated TX packet: "
" len %d != %zd\n", err, len);
if (!zcopy_used)
- vhost_add_used_and_signal(&net->dev, vq, head, 0);
+ vhost_add_used_and_signal(&net->dev, vq, &used, 0);
else
vhost_zerocopy_signal_used(net, vq);
vhost_net_tx_packet(net);
@@ -707,14 +710,12 @@ static void handle_rx(struct vhost_net *net)
while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk))) {
sock_len += sock_hlen;
vhost_len = sock_len + vhost_hlen;
- headcount = vhost_get_bufs(vq, vq->heads + nheads, vhost_len,
- &in, vq_log, &log,
- likely(mergeable) ? UIO_MAXIOV : 1);
- /* On error, stop handling until the next kick. */
- if (unlikely(headcount < 0))
- goto out;
+ err = vhost_get_bufs(vq, vq->heads + nheads, vhost_len,
+ &in, vq_log, &log,
+ likely(mergeable) ? UIO_MAXIOV : 1,
+ &headcount);
/* OK, now we need to know about added descriptors. */
- if (!headcount) {
+ if (err == -ENOSPC) {
if (unlikely(vhost_enable_notify(&net->dev, vq))) {
/* They have slipped one in as we were
* doing that: check again. */
@@ -725,6 +726,9 @@ static void handle_rx(struct vhost_net *net)
* they refilled. */
goto out;
}
+ /* On error, stop handling until the next kick. */
+ if (unlikely(err < 0))
+ goto out;
if (nvq->rx_ring)
msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
/* On overrun, truncate and discard */
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 7ad5709..654c71f 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -67,7 +67,7 @@ struct vhost_scsi_inflight {
struct vhost_scsi_cmd {
/* Descriptor from vhost_get_vq_desc() for virt_queue segment */
- int tvc_vq_desc;
+ struct vring_used_elem tvc_vq_used;
/* virtio-scsi initiator task attribute */
int tvc_task_attr;
/* virtio-scsi response incoming iovecs */
@@ -441,8 +441,9 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
struct vhost_virtqueue *vq = &vs->vqs[VHOST_SCSI_VQ_EVT].vq;
struct virtio_scsi_event *event = &evt->event;
struct virtio_scsi_event __user *eventp;
+ struct vring_used_elem used;
unsigned out, in;
- int head, ret;
+ int ret;
if (!vq->private_data) {
vs->vs_events_missed = true;
@@ -451,16 +452,16 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
again:
vhost_disable_notify(&vs->dev, vq);
- head = vhost_get_vq_desc(vq, vq->iov,
+ ret = vhost_get_vq_desc(vq, &used, vq->iov,
ARRAY_SIZE(vq->iov), &out, &in,
NULL, NULL);
- if (head < 0) {
+ if (ret == -ENOSPC) {
+ if (vhost_enable_notify(&vs->dev, vq))
+ goto again;
vs->vs_events_missed = true;
return;
}
- if (head == vq->num) {
- if (vhost_enable_notify(&vs->dev, vq))
- goto again;
+ if (ret < 0) {
vs->vs_events_missed = true;
return;
}
@@ -480,7 +481,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
eventp = vq->iov[out].iov_base;
ret = __copy_to_user(eventp, event, sizeof(*event));
if (!ret)
- vhost_add_used_and_signal(&vs->dev, vq, head, 0);
+ vhost_add_used_and_signal(&vs->dev, vq, &used, 0);
else
vq_err(vq, "Faulted on vhost_scsi_send_event\n");
}
@@ -541,7 +542,7 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
ret = copy_to_iter(&v_rsp, sizeof(v_rsp), &iov_iter);
if (likely(ret == sizeof(v_rsp))) {
struct vhost_scsi_virtqueue *q;
- vhost_add_used(cmd->tvc_vq, cmd->tvc_vq_desc, 0);
+ vhost_add_used(cmd->tvc_vq, &cmd->tvc_vq_used, 0);
q = container_of(cmd->tvc_vq, struct vhost_scsi_virtqueue, vq);
vq = q - vs->vqs;
__set_bit(vq, signal);
@@ -784,7 +785,7 @@ static void vhost_scsi_submission_work(struct work_struct *work)
static void
vhost_scsi_send_bad_target(struct vhost_scsi *vs,
struct vhost_virtqueue *vq,
- int head, unsigned out)
+ struct vring_used_elem *used, unsigned out)
{
struct virtio_scsi_cmd_resp __user *resp;
struct virtio_scsi_cmd_resp rsp;
@@ -795,7 +796,7 @@ vhost_scsi_send_bad_target(struct vhost_scsi *vs,
resp = vq->iov[out].iov_base;
ret = __copy_to_user(resp, &rsp, sizeof(rsp));
if (!ret)
- vhost_add_used_and_signal(&vs->dev, vq, head, 0);
+ vhost_add_used_and_signal(&vs->dev, vq, used, 0);
else
pr_err("Faulted on virtio_scsi_cmd_resp\n");
}
@@ -807,11 +808,12 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
struct virtio_scsi_cmd_req v_req;
struct virtio_scsi_cmd_req_pi v_req_pi;
struct vhost_scsi_cmd *cmd;
+ struct vring_used_elem used;
struct iov_iter out_iter, in_iter, prot_iter, data_iter;
u64 tag;
u32 exp_data_len, data_direction;
unsigned int out = 0, in = 0;
- int head, ret, prot_bytes;
+ int ret, prot_bytes;
size_t req_size, rsp_size = sizeof(struct virtio_scsi_cmd_resp);
size_t out_size, in_size;
u16 lun;
@@ -831,22 +833,22 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
vhost_disable_notify(&vs->dev, vq);
for (;;) {
- head = vhost_get_vq_desc(vq, vq->iov,
- ARRAY_SIZE(vq->iov), &out, &in,
- NULL, NULL);
+ ret = vhost_get_vq_desc(vq, &used, vq->iov,
+ ARRAY_SIZE(vq->iov), &out, &in,
+ NULL, NULL);
pr_debug("vhost_get_vq_desc: head: %d, out: %u in: %u\n",
- head, out, in);
- /* On error, stop handling until the next kick. */
- if (unlikely(head < 0))
- break;
+ used.id, out, in);
/* Nothing new? Wait for eventfd to tell us they refilled. */
- if (head == vq->num) {
+ if (ret == -ENOSPC) {
if (unlikely(vhost_enable_notify(&vs->dev, vq))) {
vhost_disable_notify(&vs->dev, vq);
continue;
}
break;
}
+ /* On error, stop handling until the next kick. */
+ if (unlikely(ret < 0))
+ break;
/*
* Check for a sane response buffer so we can report early
* errors back to the guest.
@@ -891,20 +893,20 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
if (unlikely(!copy_from_iter_full(req, req_size, &out_iter))) {
vq_err(vq, "Faulted on copy_from_iter\n");
- vhost_scsi_send_bad_target(vs, vq, head, out);
+ vhost_scsi_send_bad_target(vs, vq, &used, out);
continue;
}
/* virtio-scsi spec requires byte 0 of the lun to be 1 */
if (unlikely(*lunp != 1)) {
vq_err(vq, "Illegal virtio-scsi lun: %u\n", *lunp);
- vhost_scsi_send_bad_target(vs, vq, head, out);
+ vhost_scsi_send_bad_target(vs, vq, &used, out);
continue;
}
tpg = READ_ONCE(vs_tpg[*target]);
if (unlikely(!tpg)) {
/* Target does not exist, fail the request */
- vhost_scsi_send_bad_target(vs, vq, head, out);
+ vhost_scsi_send_bad_target(vs, vq, &used, out);
continue;
}
/*
@@ -950,7 +952,8 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
if (data_direction != DMA_TO_DEVICE) {
vq_err(vq, "Received non zero pi_bytesout,"
" but wrong data_direction\n");
- vhost_scsi_send_bad_target(vs, vq, head, out);
+ vhost_scsi_send_bad_target(vs, vq,
+ &used, out);
continue;
}
prot_bytes = vhost32_to_cpu(vq, v_req_pi.pi_bytesout);
@@ -958,7 +961,8 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
if (data_direction != DMA_FROM_DEVICE) {
vq_err(vq, "Received non zero pi_bytesin,"
" but wrong data_direction\n");
- vhost_scsi_send_bad_target(vs, vq, head, out);
+ vhost_scsi_send_bad_target(vs, vq,
+ &used, out);
continue;
}
prot_bytes = vhost32_to_cpu(vq, v_req_pi.pi_bytesin);
@@ -996,7 +1000,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
vq_err(vq, "Received SCSI CDB with command_size: %d that"
" exceeds SCSI_MAX_VARLEN_CDB_SIZE: %d\n",
scsi_command_size(cdb), VHOST_SCSI_MAX_CDB_SIZE);
- vhost_scsi_send_bad_target(vs, vq, head, out);
+ vhost_scsi_send_bad_target(vs, vq, &used, out);
continue;
}
cmd = vhost_scsi_get_tag(vq, tpg, cdb, tag, lun, task_attr,
@@ -1005,7 +1009,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
if (IS_ERR(cmd)) {
vq_err(vq, "vhost_scsi_get_tag failed %ld\n",
PTR_ERR(cmd));
- vhost_scsi_send_bad_target(vs, vq, head, out);
+ vhost_scsi_send_bad_target(vs, vq, &used, out);
continue;
}
cmd->tvc_vhost = vs;
@@ -1025,7 +1029,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
if (unlikely(ret)) {
vq_err(vq, "Failed to map iov to sgl\n");
vhost_scsi_release_cmd(&cmd->tvc_se_cmd);
- vhost_scsi_send_bad_target(vs, vq, head, out);
+ vhost_scsi_send_bad_target(vs, vq, &used, out);
continue;
}
}
@@ -1034,7 +1038,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
* complete the virtio-scsi request in TCM callback context via
* vhost_scsi_queue_data_in() and vhost_scsi_queue_status()
*/
- cmd->tvc_vq_desc = head;
+ cmd->tvc_vq_used = used;
/*
* Dispatch cmd descriptor for cmwq execution in process
* context provided by vhost_scsi_workqueue. This also ensures
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c57df71..83cb1dc 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1956,6 +1956,7 @@ static int get_indirect(struct vhost_virtqueue *vq,
* never a valid descriptor number) if none was found. A negative code is
* returned on error. */
int vhost_get_vq_desc(struct vhost_virtqueue *vq,
+ struct vring_used_elem *used,
struct iovec iov[], unsigned int iov_size,
unsigned int *out_num, unsigned int *in_num,
struct vhost_log *log, unsigned int *log_num)
@@ -1988,7 +1989,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
* invalid.
*/
if (vq->avail_idx == last_avail_idx)
- return vq->num;
+ return -ENOSPC;
/* Only get avail ring entries after they have been
* exposed by guest.
@@ -2006,6 +2007,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
return -EFAULT;
}
+ used->id = ring_head;
head = vhost16_to_cpu(vq, ring_head);
/* If their number is silly, that's an error. */
@@ -2094,10 +2096,16 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
/* Assume notifications from guest are disabled at this point,
* if they aren't we would need to update avail_event index. */
BUG_ON(!(vq->used_flags & VRING_USED_F_NO_NOTIFY));
- return head;
+ return 0;
}
EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
+static void vhost_set_used_len(struct vhost_virtqueue *vq,
+ struct vring_used_elem *used, int len)
+{
+ used->len = cpu_to_vhost32(vq, len);
+}
+
/* This is a multi-buffer version of vhost_get_desc, that works if
* vq has read descriptors only.
* @vq - the relevant virtqueue
@@ -2114,13 +2122,13 @@ int vhost_get_bufs(struct vhost_virtqueue *vq,
unsigned *iovcount,
struct vhost_log *log,
unsigned *log_num,
- unsigned int quota)
+ unsigned int quota,
+ s16 *count)
{
unsigned int out, in;
int seg = 0;
int headcount = 0;
- unsigned d;
- int r, nlogs = 0;
+ int r = 0, nlogs = 0;
/* len is always initialized before use since we are always called with
* datalen > 0.
*/
@@ -2131,17 +2139,12 @@ int vhost_get_bufs(struct vhost_virtqueue *vq,
r = -ENOBUFS;
goto err;
}
- r = vhost_get_vq_desc(vq, vq->iov + seg,
+ r = vhost_get_vq_desc(vq, &heads[headcount], vq->iov + seg,
ARRAY_SIZE(vq->iov) - seg, &out,
&in, log, log_num);
if (unlikely(r < 0))
goto err;
- d = r;
- if (d == vq->num) {
- r = 0;
- goto err;
- }
if (unlikely(out || in <= 0)) {
vq_err(vq, "unexpected descriptor format for RX: "
"out %d, in %d\n", out, in);
@@ -2152,24 +2155,26 @@ int vhost_get_bufs(struct vhost_virtqueue *vq,
nlogs += *log_num;
log += *log_num;
}
- heads[headcount].id = cpu_to_vhost32(vq, d);
+
len = iov_length(vq->iov + seg, in);
- heads[headcount].len = cpu_to_vhost32(vq, len);
+ vhost_set_used_len(vq, &heads[headcount], len);
datalen -= len;
++headcount;
seg += in;
}
- heads[headcount - 1].len = cpu_to_vhost32(vq, len + datalen);
+ vhost_set_used_len(vq, &heads[headcount - 1], len + datalen);
*iovcount = seg;
if (unlikely(log))
*log_num = nlogs;
/* Detect overrun */
if (unlikely(datalen > 0)) {
- r = UIO_MAXIOV + 1;
+ headcount = UIO_MAXIOV + 1;
goto err;
}
- return headcount;
+
+ *count = headcount;
+ return 0;
err:
vhost_discard_vq_desc(vq, headcount);
return r;
@@ -2185,14 +2190,11 @@ EXPORT_SYMBOL_GPL(vhost_discard_vq_desc);
/* After we've used one of their buffers, we tell them about it. We'll then
* want to notify the guest, using eventfd. */
-int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
+int vhost_add_used(struct vhost_virtqueue *vq, struct vring_used_elem *used,
+ int len)
{
- struct vring_used_elem heads = {
- cpu_to_vhost32(vq, head),
- cpu_to_vhost32(vq, len)
- };
-
- return vhost_add_used_n(vq, &heads, 1);
+ vhost_set_used_len(vq, used, len);
+ return vhost_add_used_n(vq, used, 1);
}
EXPORT_SYMBOL_GPL(vhost_add_used);
@@ -2325,9 +2327,9 @@ EXPORT_SYMBOL_GPL(vhost_signal);
/* And here's the combo meal deal. Supersize me! */
void vhost_add_used_and_signal(struct vhost_dev *dev,
struct vhost_virtqueue *vq,
- unsigned int head, int len)
+ struct vring_used_elem *used, int len)
{
- vhost_add_used(vq, head, len);
+ vhost_add_used(vq, used, len);
vhost_signal(dev, vq);
}
EXPORT_SYMBOL_GPL(vhost_add_used_and_signal);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index cbc2c80..477bfd2 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -182,6 +182,7 @@ int vhost_vq_access_ok(struct vhost_virtqueue *vq);
int vhost_log_access_ok(struct vhost_dev *);
int vhost_get_vq_desc(struct vhost_virtqueue *,
+ struct vring_used_elem *used_elem,
struct iovec iov[], unsigned int iov_count,
unsigned int *out_num, unsigned int *in_num,
struct vhost_log *log, unsigned int *log_num);
@@ -191,15 +192,17 @@ int vhost_get_bufs(struct vhost_virtqueue *vq,
unsigned *iovcount,
struct vhost_log *log,
unsigned *log_num,
- unsigned int quota);
+ unsigned int quota,
+ s16 *count);
void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
int vhost_vq_init_access(struct vhost_virtqueue *);
-int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
+int vhost_add_used(struct vhost_virtqueue *vq,
+ struct vring_used_elem *elem, int len);
int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads,
unsigned count);
void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *,
- unsigned int id, int len);
+ struct vring_used_elem *, int len);
void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *,
struct vring_used_elem *heads, unsigned count);
void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 0d14e2f..fc20292 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -98,11 +98,12 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
for (;;) {
struct virtio_vsock_pkt *pkt;
+ struct vring_used_elem used;
struct iov_iter iov_iter;
unsigned out, in;
size_t nbytes;
size_t len;
- int head;
+ int ret;
spin_lock_bh(&vsock->send_pkt_list_lock);
if (list_empty(&vsock->send_pkt_list)) {
@@ -116,16 +117,9 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
list_del_init(&pkt->list);
spin_unlock_bh(&vsock->send_pkt_list_lock);
- head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
- &out, &in, NULL, NULL);
- if (head < 0) {
- spin_lock_bh(&vsock->send_pkt_list_lock);
- list_add(&pkt->list, &vsock->send_pkt_list);
- spin_unlock_bh(&vsock->send_pkt_list_lock);
- break;
- }
-
- if (head == vq->num) {
+ ret = vhost_get_vq_desc(vq, &used, vq->iov, ARRAY_SIZE(vq->iov),
+ &out, &in, NULL, NULL);
+ if (ret == -ENOSPC) {
spin_lock_bh(&vsock->send_pkt_list_lock);
list_add(&pkt->list, &vsock->send_pkt_list);
spin_unlock_bh(&vsock->send_pkt_list_lock);
@@ -139,6 +133,12 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
}
break;
}
+ if (ret < 0) {
+ spin_lock_bh(&vsock->send_pkt_list_lock);
+ list_add(&pkt->list, &vsock->send_pkt_list);
+ spin_unlock_bh(&vsock->send_pkt_list_lock);
+ break;
+ }
if (out) {
virtio_transport_free_pkt(pkt);
@@ -146,7 +146,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
break;
}
- len = iov_length(&vq->iov[out], in);
+ len = vhost32_to_cpu(vq, used.len);
iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
@@ -163,7 +163,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
break;
}
- vhost_add_used(vq, head, sizeof(pkt->hdr) + pkt->len);
+ vhost_add_used(vq, &used, sizeof(pkt->hdr) + pkt->len);
added = true;
if (pkt->reply) {
@@ -346,7 +346,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
struct vhost_vsock *vsock = container_of(vq->dev, struct vhost_vsock,
dev);
struct virtio_vsock_pkt *pkt;
- int head;
+ struct vring_used_elem used;
+ int ret;
unsigned int out, in;
bool added = false;
@@ -367,18 +368,17 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
goto no_more_replies;
}
- head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
- &out, &in, NULL, NULL);
- if (head < 0)
- break;
-
- if (head == vq->num) {
+ ret = vhost_get_vq_desc(vq, &used, vq->iov, ARRAY_SIZE(vq->iov),
+ &out, &in, NULL, NULL);
+ if (ret == -ENOSPC) {
if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
vhost_disable_notify(&vsock->dev, vq);
continue;
}
break;
}
+ if (ret < 0)
+ break;
pkt = vhost_vsock_alloc_pkt(vq, out, in);
if (!pkt) {
@@ -397,7 +397,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
else
virtio_transport_free_pkt(pkt);
- vhost_add_used(vq, head, sizeof(pkt->hdr) + len);
+ vhost_add_used(vq, &used, sizeof(pkt->hdr) + len);
added = true;
}
--
2.7.4
^ permalink raw reply related
* [RFC PATCH V2 3/8] vhost: do not use vring_used_elem
From: Jason Wang @ 2018-03-26 3:38 UTC (permalink / raw)
To: mst, virtualization; +Cc: netdev, linux-kernel, kvm
In-Reply-To: <1522035533-11786-1-git-send-email-jasowang@redhat.com>
Instead of depending on the exported vring_used_elem, this patch
switches to use a new internal structure vhost_used_elem which embed
vring_used_elem in itself. This could be used to let vhost to record
extra metadata for the incoming packed ring layout.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 19 ++++++++++---------
drivers/vhost/scsi.c | 10 +++++-----
drivers/vhost/vhost.c | 33 ++++++++++++++++-----------------
drivers/vhost/vhost.h | 18 +++++++++++-------
drivers/vhost/vsock.c | 6 +++---
5 files changed, 45 insertions(+), 41 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index f821fcd..7ea2aee 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -337,10 +337,10 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
int j = 0;
for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
- if (vq->heads[i].len == VHOST_DMA_FAILED_LEN)
+ if (vq->heads[i].elem.len == VHOST_DMA_FAILED_LEN)
vhost_net_tx_err(net);
- if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
- vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
+ if (VHOST_DMA_IS_DONE(vq->heads[i].elem.len)) {
+ vq->heads[i].elem.len = VHOST_DMA_CLEAR_LEN;
++j;
} else
break;
@@ -363,7 +363,7 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
rcu_read_lock_bh();
/* set len to mark this desc buffers done DMA */
- vq->heads[ubuf->desc].len = success ?
+ vq->heads[ubuf->desc].elem.len = success ?
VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
cnt = vhost_net_ubuf_put(ubufs);
@@ -422,7 +422,7 @@ static int vhost_net_enable_vq(struct vhost_net *n,
static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
struct vhost_virtqueue *vq,
- struct vring_used_elem *used_elem,
+ struct vhost_used_elem *used_elem,
struct iovec iov[], unsigned int iov_size,
unsigned int *out_num, unsigned int *in_num)
{
@@ -473,7 +473,7 @@ static void handle_tx(struct vhost_net *net)
size_t hdr_size;
struct socket *sock;
struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
- struct vring_used_elem used;
+ struct vhost_used_elem used;
bool zcopy, zcopy_used;
mutex_lock(&vq->mutex);
@@ -537,9 +537,10 @@ static void handle_tx(struct vhost_net *net)
struct ubuf_info *ubuf;
ubuf = nvq->ubuf_info + nvq->upend_idx;
- vq->heads[nvq->upend_idx].id =
- cpu_to_vhost32(vq, used.id);
- vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
+ vq->heads[nvq->upend_idx].elem.id =
+ cpu_to_vhost32(vq, used.elem.id);
+ vq->heads[nvq->upend_idx].elem.len =
+ VHOST_DMA_IN_PROGRESS;
ubuf->callback = vhost_zerocopy_callback;
ubuf->ctx = nvq->ubufs;
ubuf->desc = nvq->upend_idx;
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 654c71f..ac11412 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -67,7 +67,7 @@ struct vhost_scsi_inflight {
struct vhost_scsi_cmd {
/* Descriptor from vhost_get_vq_desc() for virt_queue segment */
- struct vring_used_elem tvc_vq_used;
+ struct vhost_used_elem tvc_vq_used;
/* virtio-scsi initiator task attribute */
int tvc_task_attr;
/* virtio-scsi response incoming iovecs */
@@ -441,7 +441,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
struct vhost_virtqueue *vq = &vs->vqs[VHOST_SCSI_VQ_EVT].vq;
struct virtio_scsi_event *event = &evt->event;
struct virtio_scsi_event __user *eventp;
- struct vring_used_elem used;
+ struct vhost_used_elem used;
unsigned out, in;
int ret;
@@ -785,7 +785,7 @@ static void vhost_scsi_submission_work(struct work_struct *work)
static void
vhost_scsi_send_bad_target(struct vhost_scsi *vs,
struct vhost_virtqueue *vq,
- struct vring_used_elem *used, unsigned out)
+ struct vhost_used_elem *used, unsigned out)
{
struct virtio_scsi_cmd_resp __user *resp;
struct virtio_scsi_cmd_resp rsp;
@@ -808,7 +808,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
struct virtio_scsi_cmd_req v_req;
struct virtio_scsi_cmd_req_pi v_req_pi;
struct vhost_scsi_cmd *cmd;
- struct vring_used_elem used;
+ struct vhost_used_elem used;
struct iov_iter out_iter, in_iter, prot_iter, data_iter;
u64 tag;
u32 exp_data_len, data_direction;
@@ -837,7 +837,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
ARRAY_SIZE(vq->iov), &out, &in,
NULL, NULL);
pr_debug("vhost_get_vq_desc: head: %d, out: %u in: %u\n",
- used.id, out, in);
+ used.elem.id, out, in);
/* Nothing new? Wait for eventfd to tell us they refilled. */
if (ret == -ENOSPC) {
if (unlikely(vhost_enable_notify(&vs->dev, vq))) {
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 83cb1dc..8744dae 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1956,7 +1956,7 @@ static int get_indirect(struct vhost_virtqueue *vq,
* never a valid descriptor number) if none was found. A negative code is
* returned on error. */
int vhost_get_vq_desc(struct vhost_virtqueue *vq,
- struct vring_used_elem *used,
+ struct vhost_used_elem *used,
struct iovec iov[], unsigned int iov_size,
unsigned int *out_num, unsigned int *in_num,
struct vhost_log *log, unsigned int *log_num)
@@ -2007,7 +2007,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
return -EFAULT;
}
- used->id = ring_head;
+ used->elem.id = ring_head;
head = vhost16_to_cpu(vq, ring_head);
/* If their number is silly, that's an error. */
@@ -2101,9 +2101,9 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
static void vhost_set_used_len(struct vhost_virtqueue *vq,
- struct vring_used_elem *used, int len)
+ struct vhost_used_elem *used, int len)
{
- used->len = cpu_to_vhost32(vq, len);
+ used->elem.len = cpu_to_vhost32(vq, len);
}
/* This is a multi-buffer version of vhost_get_desc, that works if
@@ -2117,7 +2117,7 @@ static void vhost_set_used_len(struct vhost_virtqueue *vq,
* returns number of buffer heads allocated, negative on error
*/
int vhost_get_bufs(struct vhost_virtqueue *vq,
- struct vring_used_elem *heads,
+ struct vhost_used_elem *heads,
int datalen,
unsigned *iovcount,
struct vhost_log *log,
@@ -2190,7 +2190,7 @@ EXPORT_SYMBOL_GPL(vhost_discard_vq_desc);
/* After we've used one of their buffers, we tell them about it. We'll then
* want to notify the guest, using eventfd. */
-int vhost_add_used(struct vhost_virtqueue *vq, struct vring_used_elem *used,
+int vhost_add_used(struct vhost_virtqueue *vq, struct vhost_used_elem *used,
int len)
{
vhost_set_used_len(vq, used, len);
@@ -2199,27 +2199,26 @@ int vhost_add_used(struct vhost_virtqueue *vq, struct vring_used_elem *used,
EXPORT_SYMBOL_GPL(vhost_add_used);
static int __vhost_add_used_n(struct vhost_virtqueue *vq,
- struct vring_used_elem *heads,
+ struct vhost_used_elem *heads,
unsigned count)
{
struct vring_used_elem __user *used;
u16 old, new;
- int start;
+ int start, i;
start = vq->last_used_idx & (vq->num - 1);
used = vq->used->ring + start;
- if (count == 1) {
- if (vhost_put_user(vq, heads[0].id, &used->id)) {
+ for (i = 0; i < count; i++) {
+ if (unlikely(vhost_put_user(vq, heads[i].elem.id,
+ &used[i].id))) {
vq_err(vq, "Failed to write used id");
return -EFAULT;
}
- if (vhost_put_user(vq, heads[0].len, &used->len)) {
+ if (unlikely(vhost_put_user(vq, heads[i].elem.len,
+ &used[i].len))) {
vq_err(vq, "Failed to write used len");
return -EFAULT;
}
- } else if (vhost_copy_to_user(vq, used, heads, count * sizeof *used)) {
- vq_err(vq, "Failed to write used");
- return -EFAULT;
}
if (unlikely(vq->log_used)) {
/* Make sure data is seen before log. */
@@ -2243,7 +2242,7 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
/* After we've used one of their buffers, we tell them about it. We'll then
* want to notify the guest, using eventfd. */
-int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
+int vhost_add_used_n(struct vhost_virtqueue *vq, struct vhost_used_elem *heads,
unsigned count)
{
int start, n, r;
@@ -2327,7 +2326,7 @@ EXPORT_SYMBOL_GPL(vhost_signal);
/* And here's the combo meal deal. Supersize me! */
void vhost_add_used_and_signal(struct vhost_dev *dev,
struct vhost_virtqueue *vq,
- struct vring_used_elem *used, int len)
+ struct vhost_used_elem *used, int len)
{
vhost_add_used(vq, used, len);
vhost_signal(dev, vq);
@@ -2337,7 +2336,7 @@ EXPORT_SYMBOL_GPL(vhost_add_used_and_signal);
/* multi-buffer version of vhost_add_used_and_signal */
void vhost_add_used_and_signal_n(struct vhost_dev *dev,
struct vhost_virtqueue *vq,
- struct vring_used_elem *heads, unsigned count)
+ struct vhost_used_elem *heads, unsigned count)
{
vhost_add_used_n(vq, heads, count);
vhost_signal(dev, vq);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 477bfd2..8399887 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -34,6 +34,10 @@ struct vhost_poll {
struct vhost_dev *dev;
};
+struct vhost_used_elem {
+ struct vring_used_elem elem;
+};
+
void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
bool vhost_has_work(struct vhost_dev *dev);
@@ -126,7 +130,7 @@ struct vhost_virtqueue {
struct iovec iov[UIO_MAXIOV];
struct iovec iotlb_iov[64];
struct iovec *indirect;
- struct vring_used_elem *heads;
+ struct vhost_used_elem *heads;
/* Protected by virtqueue mutex. */
struct vhost_umem *umem;
struct vhost_umem *iotlb;
@@ -182,12 +186,12 @@ int vhost_vq_access_ok(struct vhost_virtqueue *vq);
int vhost_log_access_ok(struct vhost_dev *);
int vhost_get_vq_desc(struct vhost_virtqueue *,
- struct vring_used_elem *used_elem,
+ struct vhost_used_elem *used_elem,
struct iovec iov[], unsigned int iov_count,
unsigned int *out_num, unsigned int *in_num,
struct vhost_log *log, unsigned int *log_num);
int vhost_get_bufs(struct vhost_virtqueue *vq,
- struct vring_used_elem *heads,
+ struct vhost_used_elem *heads,
int datalen,
unsigned *iovcount,
struct vhost_log *log,
@@ -198,13 +202,13 @@ void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
int vhost_vq_init_access(struct vhost_virtqueue *);
int vhost_add_used(struct vhost_virtqueue *vq,
- struct vring_used_elem *elem, int len);
-int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads,
+ struct vhost_used_elem *elem, int len);
+int vhost_add_used_n(struct vhost_virtqueue *vq, struct vhost_used_elem *heads,
unsigned count);
void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *,
- struct vring_used_elem *, int len);
+ struct vhost_used_elem *, int len);
void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *,
- struct vring_used_elem *heads, unsigned count);
+ struct vhost_used_elem *heads, unsigned count);
void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
void vhost_disable_notify(struct vhost_dev *, struct vhost_virtqueue *);
bool vhost_vq_avail_empty(struct vhost_dev *, struct vhost_virtqueue *);
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index fc20292..5d36208 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -98,7 +98,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
for (;;) {
struct virtio_vsock_pkt *pkt;
- struct vring_used_elem used;
+ struct vhost_used_elem used;
struct iov_iter iov_iter;
unsigned out, in;
size_t nbytes;
@@ -146,7 +146,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
break;
}
- len = vhost32_to_cpu(vq, used.len);
+ len = vhost32_to_cpu(vq, used.elem.len);
iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
@@ -346,7 +346,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
struct vhost_vsock *vsock = container_of(vq->dev, struct vhost_vsock,
dev);
struct virtio_vsock_pkt *pkt;
- struct vring_used_elem used;
+ struct vhost_used_elem used;
int ret;
unsigned int out, in;
bool added = false;
--
2.7.4
^ permalink raw reply related
* [RFC PATCH V2 4/8] vhost_net: do not explicitly manipulate vhost_used_elem
From: Jason Wang @ 2018-03-26 3:38 UTC (permalink / raw)
To: mst, virtualization; +Cc: netdev, linux-kernel, kvm
In-Reply-To: <1522035533-11786-1-git-send-email-jasowang@redhat.com>
Two helpers of setting/getting used len were introduced to avoid
explicitly manipulating vhost_used_elem in zerocopy code. This will be
used to hide used_elem internals and simplify packed ring
implementation.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 11 +++++------
drivers/vhost/vhost.c | 12 ++++++++++--
drivers/vhost/vhost.h | 5 +++++
3 files changed, 20 insertions(+), 8 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7ea2aee..7be8b55 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -337,9 +337,10 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
int j = 0;
for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
- if (vq->heads[i].elem.len == VHOST_DMA_FAILED_LEN)
+ if (vhost_get_used_len(vq, &vq->heads[i]) ==
+ VHOST_DMA_FAILED_LEN)
vhost_net_tx_err(net);
- if (VHOST_DMA_IS_DONE(vq->heads[i].elem.len)) {
+ if (VHOST_DMA_IS_DONE(vhost_get_used_len(vq, &vq->heads[i]))) {
vq->heads[i].elem.len = VHOST_DMA_CLEAR_LEN;
++j;
} else
@@ -537,10 +538,8 @@ static void handle_tx(struct vhost_net *net)
struct ubuf_info *ubuf;
ubuf = nvq->ubuf_info + nvq->upend_idx;
- vq->heads[nvq->upend_idx].elem.id =
- cpu_to_vhost32(vq, used.elem.id);
- vq->heads[nvq->upend_idx].elem.len =
- VHOST_DMA_IN_PROGRESS;
+ vhost_set_used_len(vq, &used, VHOST_DMA_IN_PROGRESS);
+ vq->heads[nvq->upend_idx] = used;
ubuf->callback = vhost_zerocopy_callback;
ubuf->ctx = nvq->ubufs;
ubuf->desc = nvq->upend_idx;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 8744dae..65954d6 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2100,11 +2100,19 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
}
EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
-static void vhost_set_used_len(struct vhost_virtqueue *vq,
- struct vhost_used_elem *used, int len)
+void vhost_set_used_len(struct vhost_virtqueue *vq,
+ struct vhost_used_elem *used, int len)
{
used->elem.len = cpu_to_vhost32(vq, len);
}
+EXPORT_SYMBOL_GPL(vhost_set_used_len);
+
+int vhost_get_used_len(struct vhost_virtqueue *vq,
+ struct vhost_used_elem *used)
+{
+ return vhost32_to_cpu(vq, used->elem.len);
+}
+EXPORT_SYMBOL_GPL(vhost_get_used_len);
/* This is a multi-buffer version of vhost_get_desc, that works if
* vq has read descriptors only.
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 8399887..d57c875 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -198,6 +198,11 @@ int vhost_get_bufs(struct vhost_virtqueue *vq,
unsigned *log_num,
unsigned int quota,
s16 *count);
+void vhost_set_used_len(struct vhost_virtqueue *vq,
+ struct vhost_used_elem *used,
+ int len);
+int vhost_get_used_len(struct vhost_virtqueue *vq,
+ struct vhost_used_elem *used);
void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
int vhost_vq_init_access(struct vhost_virtqueue *);
--
2.7.4
^ permalink raw reply related
* [RFC PATCH V2 5/8] vhost: vhost_put_user() can accept metadata type
From: Jason Wang @ 2018-03-26 3:38 UTC (permalink / raw)
To: mst, virtualization; +Cc: netdev, linux-kernel, kvm
In-Reply-To: <1522035533-11786-1-git-send-email-jasowang@redhat.com>
We assumes used ring update is the only user for vhost_put_user() in
the past. This may not be the case for the incoming packed ring which
may update the descriptor ring for used. So introduce a new type
parameter.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/vhost.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 65954d6..dcac4d4 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -847,7 +847,7 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
return __vhost_get_user_slow(vq, addr, size, type);
}
-#define vhost_put_user(vq, x, ptr) \
+#define vhost_put_user(vq, x, ptr, type) \
({ \
int ret = -EFAULT; \
if (!vq->iotlb) { \
@@ -855,7 +855,7 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
} else { \
__typeof__(ptr) to = \
(__typeof__(ptr)) __vhost_get_user(vq, ptr, \
- sizeof(*ptr), VHOST_ADDR_USED); \
+ sizeof(*ptr), type); \
if (to != NULL) \
ret = __put_user(x, to); \
else \
@@ -1716,7 +1716,7 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
{
void __user *used;
if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->used_flags),
- &vq->used->flags) < 0)
+ &vq->used->flags, VHOST_ADDR_USED) < 0)
return -EFAULT;
if (unlikely(vq->log_used)) {
/* Make sure the flag is seen before log. */
@@ -1735,7 +1735,7 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
{
if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->avail_idx),
- vhost_avail_event(vq)))
+ vhost_avail_event(vq), VHOST_ADDR_USED))
return -EFAULT;
if (unlikely(vq->log_used)) {
void __user *used;
@@ -2218,12 +2218,12 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
used = vq->used->ring + start;
for (i = 0; i < count; i++) {
if (unlikely(vhost_put_user(vq, heads[i].elem.id,
- &used[i].id))) {
+ &used[i].id, VHOST_ADDR_USED))) {
vq_err(vq, "Failed to write used id");
return -EFAULT;
}
if (unlikely(vhost_put_user(vq, heads[i].elem.len,
- &used[i].len))) {
+ &used[i].len, VHOST_ADDR_USED))) {
vq_err(vq, "Failed to write used len");
return -EFAULT;
}
@@ -2269,7 +2269,7 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vhost_used_elem *heads,
/* Make sure buffer is written before we update index. */
smp_wmb();
if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->last_used_idx),
- &vq->used->idx)) {
+ &vq->used->idx, VHOST_ADDR_USED)) {
vq_err(vq, "Failed to increment used idx");
return -EFAULT;
}
--
2.7.4
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox