* [PATCH 00/23] drm: Eliminate plane->fb/crtc usage for atomic drivers
From: Ville Syrjala @ 2018-03-22 15:22 UTC (permalink / raw)
To: dri-devel
Cc: David Airlie, Daniel Vetter, chris, Eric Anholt,
Benjamin Gaignard, Dave Airlie, Boris Brezillon, Thomas Hellstrom,
Joonyoung Shim, Sinclair Yeh, Rob Clark, amd-gfx, martin.peres,
VMware Graphics, Harry Wentland, David (ChunMing) Zhou,
linux-arm-msm, intel-gfx, Maarten Lankhorst, Inki Dae,
virtualization, Vincent Abriou, Shawn Guo
From: Ville Syrjälä <ville.syrjala@linux.intel.com>
I really just wanted to fix i915 to re-enable its planes afer load
detection (a two line patch). This is what I actually ended up with
after I ran into a framebuffer refcount leak with said two line patch.
I've tested this on a few i915 boxes and so far it's looking
good. Everything else is just compile tested.
Entire series available here:
git://github.com/vsyrjala/linux.git plane_fb_crtc_nuke
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Cc: Boris Brezillon <boris.brezillon@free-electrons.com>
Cc: chris@chris-wilson.co.uk
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@gmail.com>
Cc: David Airlie <airlied@linux.ie>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: freedreno@lists.freedesktop.org
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: linux-arm-msm@vger.kernel.org
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: martin.peres@free.fr
Cc: Rob Clark <robdclark@gmail.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sinclair Yeh <syeh@vmware.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Vincent Abriou <vincent.abriou@st.com>
Cc: virtualization@lists.linux-foundation.org
Cc: VMware Graphics <linux-graphics-maintainer@vmware.com>
Ville Syrjälä (23):
Revert "drm/atomic-helper: Fix leak in disable_all"
drm/atomic-helper: Make drm_atomic_helper_disable_all() update the
plane->fb pointers
drm: Clear crtc->primary->crtc when disabling the crtc via setcrtc()
drm/atomic-helper: WARN if legacy plane fb pointers are bogus when
committing duplicated state
drm: Add local 'plane' variable for primary/cursor planes
drm: Adjust whitespace for legibility
drm: Make the fb refcount handover less magic
drm: Use plane->state->fb over plane->fb
drm/i915: Stop consulting plane->fb
drm/msm: Stop consulting plane->fb
drm/sti: Stop consulting plane->fb
drm/vmwgfx: Stop consulting plane->fb
drm/zte: Stop consulting plane->fb
drm/atmel-hlcdc: Stop using plane->fb
drm: Stop updating plane->crtc/fb/old_fb on atomic drivers
drm/amdgpu/dc: Stop updating plane->fb
drm/i915: Stop updating plane->fb/crtc
drm/exynos: Stop updating plane->crtc
drm/msm: Stop updating plane->fb/crtc
drm/virtio: Stop updating plane->fb
drm/vc4: Stop updating plane->fb/crtc
drm/i915: Restore planes after load detection
drm/i915: Make force_load_detect effective even w/ DMI quirks/hotplug
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 -
drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c | 12 +---
drivers/gpu/drm/drm_atomic.c | 55 ++--------------
drivers/gpu/drm/drm_atomic_helper.c | 79 ++++++++++-------------
drivers/gpu/drm/drm_crtc.c | 51 ++++++++++-----
drivers/gpu/drm/drm_fb_helper.c | 7 --
drivers/gpu/drm/drm_framebuffer.c | 5 --
drivers/gpu/drm/drm_plane.c | 64 +++++++++++-------
drivers/gpu/drm/drm_plane_helper.c | 4 +-
drivers/gpu/drm/exynos/exynos_drm_plane.c | 2 -
drivers/gpu/drm/i915/intel_crt.c | 6 ++
drivers/gpu/drm/i915/intel_display.c | 9 +--
drivers/gpu/drm/i915/intel_fbdev.c | 2 +-
drivers/gpu/drm/msm/disp/mdp4/mdp4_crtc.c | 3 +-
drivers/gpu/drm/msm/disp/mdp4/mdp4_plane.c | 2 -
drivers/gpu/drm/msm/disp/mdp5/mdp5_plane.c | 2 -
drivers/gpu/drm/sti/sti_plane.c | 9 +--
drivers/gpu/drm/vc4/vc4_crtc.c | 3 -
drivers/gpu/drm/virtio/virtgpu_display.c | 2 -
drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 6 +-
drivers/gpu/drm/zte/zx_vou.c | 2 +-
include/drm/drm_atomic.h | 3 -
22 files changed, 143 insertions(+), 187 deletions(-)
--
2.16.1
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* RE: [PATCH 1/4] iommu: Add virtio-iommu driver
From: Tian, Kevin @ 2018-03-22 10:06 UTC (permalink / raw)
To: Robin Murphy, Jean-Philippe Brucker,
iommu@lists.linux-foundation.org, kvm@vger.kernel.org,
virtualization@lists.linux-foundation.org,
virtio-dev@lists.oasis-open.org, kvmarm@lists.cs.columbia.edu
Cc: jayachandran.nair@cavium.com, tnowicki@caviumnetworks.com,
mst@redhat.com, Marc Zyngier, Will Deacon,
jintack@cs.columbia.edu, eric.auger.pro@gmail.com
In-Reply-To: <739fbfdf-1651-8036-367a-52940b93dee6@arm.com>
> From: Robin Murphy [mailto:robin.murphy@arm.com]
> Sent: Wednesday, March 21, 2018 10:24 PM
>
> On 21/03/18 13:14, Jean-Philippe Brucker wrote:
> > On 21/03/18 06:43, Tian, Kevin wrote:
> > [...]
> >>> +
> >>> +#include <uapi/linux/virtio_iommu.h>
> >>> +
> >>> +#define MSI_IOVA_BASE 0x8000000
> >>> +#define MSI_IOVA_LENGTH 0x100000
> >>
> >> this is ARM specific, and according to virtio-iommu spec isn't it
> >> better probed on the endpoint instead of hard-coding here?
> >
> > These values are arbitrary, not really ARM-specific even if ARM is the
> > only user yet: we're just reserving a random IOVA region for mapping
> MSIs.
> > It is hard-coded because of the way iommu-dma.c works, but I don't
> quite
> > remember why that allocation isn't dynamic.
>
> The host kernel needs to have *some* MSI region in place before the
> guest can start configuring interrupts, otherwise it won't know what
> address to give to the underlying hardware. However, as soon as the host
> kernel has picked a region, host userspace needs to know that it can no
> longer use addresses in that region for DMA-able guest memory. It's a
> lot easier when the address is fixed in hardware and the host userspace
> will never be stupid enough to try and VFIO_IOMMU_DMA_MAP it, but in
> the
> more general case where MSI writes undergo IOMMU address translation
> so
> it's an arbitrary IOVA, this has the potential to conflict with stuff
> like guest memory hotplug.
>
> What we currently have is just the simplest option, with the host kernel
> just picking something up-front and pretending to host userspace that
> it's a fixed hardware address. There's certainly scope for it to be a
> bit more dynamic in the sense of adding an interface to let userspace
> move it around (before attaching any devices, at least), but I don't
> think it's feasible for the host kernel to second-guess userspace enough
> to make it entirely transparent like it is in the DMA API domain case.
>
> Of course, that's all assuming the host itself is using a virtio-iommu
> (e.g. in a nested virt or emulation scenario). When it's purely within a
> guest then an MSI reservation shouldn't matter so much, since the guest
> won't be anywhere near the real hardware configuration anyway.
>
> Robin.
Curious since anyway we are defining a new iommu architecture
is it possible to avoid those ARM-specific burden completely?
Thanks
Kevin
^ permalink raw reply
* RE: [virtio-dev] [RFC] virtio-iommu version 0.6
From: Tian, Kevin @ 2018-03-22 9:44 UTC (permalink / raw)
To: Jean-Philippe Brucker, virtio-dev@lists.oasis-open.org,
virtualization@lists.linux-foundation.org
Cc: eric.auger@redhat.com, eric.auger.pro@gmail.com
In-Reply-To: <3575fbb8-36a9-ef38-8234-6adf6a573b3e@arm.com>
> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Wednesday, March 21, 2018 9:14 PM
>
> Hi Kevin,
>
> Thanks for the comments
>
> On 19/03/18 10:03, Tian, Kevin wrote:
> > BYPASS feature bit is not covered in "2.3.1/2.3.2/2.3.3"". Is it
> > intended?
>
> In my opinion BYPASS is a bit different from the other features: while the
> others are needed for correctness, this one is optional and even if the
> guest supports BYPASS, it should be allowed not to accept it. For security
> reasons it may not want to let endpoints access the whole guest-physical
> address space.
ok, possibly because I'm not familiar with virtio spec convention.
My original feeling is that each feature bit will have a behavior
description regarding to whether device reports and whether
driver accepts. If no need to cover optional feature, then it's fine
to me. :-)
>
> > --2.6.2 Device Requirements: Device operations--
> >
> > "If the VIRTIO_IOMMU_F_INPUT_RANGE feature is offered,
> > the device MUST truncate the range described by virt_start
> > and virt_end in requests to ft in the range described by
> > input_range."
> >
> > "If the VIRTIO_IOMMU_F_DOMAIN_BITS is offered, the device
> > MUST ignore bits above domain_bits in field domain of requests."
> >
> > shouldn't device return error upon above situation instead
> > of continuing operation with modified value?
>
> The intent was to allow device and driver to pass metadata in the top bits
> of pointers and domain IDs, perhaps for a future extension or for
> debugging. I'm not convinced it's useful anymore (and do wonder what my
> initial reasoning was...) Such extension would be determined by a new
> feature bit and the device would know that the fields contain additional
> info, if the driver accepted that feature.
>
> > --2.6.4 DETACH request--
> >
> > " Detach an endpoint from its domain. When this request
> > completes, the endpoint cannot access any mapping from that
> > domain anymore "
> >
> > Does it make sense to mention BYPASS situation upon this request?
>
> Sure, I'll add something to clarify that situation
>
> > --2.6.8.2 Property RESV_MEM --
> >
> > "VIRTIO_IOMMU_RESV_MEM_T_RESERVED (0) Accesses to
> > virtual addresses in this region are not translated by the device.
> > They may either be aborted by the device (or the underlying
> > IOMMU), bypass it, or never even reach it"
> >
> > in 3.3 hardware device assignment you described an example
> > that a reserved range is stolen by host to map the MSI
> > doorbell... such map behavior is not described above.
>
> Right, we can say that accesses to the region may be used for host
> translation
>
> > Then comes a question for VIRTIO_IOMMU_RESV_MEM_T_MSI.
> > I know there were quite some discussion around this flag before,
> > but my mental picture now still is a bit difficult to understand its
> > usage based on examples in implementation notes:
> >
> > - for x86, you describe it as indicating MSI bypass for
> > oxfeexxxxx. However guest doesn't need to know this fact. Only
> > requirement is to treat it as reserved range (as on bare metal)
> > then T_RESERVED is sufficient for this purpose>
> > - for ARM, either let guest or let host to choose a virtual
> > address for mapping to MSI doorbell. the former doesn't require
> > a reserved range. for the latter also T_RESERVED is enough as
> > the example in hardware device assignment section.
>
> It might be nicer to have the host decide it, but when the physical IOMMU
> is ARM SMMU, nesting translation complicates things, because the guest
> *has* to create a mapping:
confirm one thing first. v0.6 doesn't support binding to guest IOVA
page table yet. So based on current map/unmap interface, there is
no such complicity right? stage-1 is just a shadowed translation (IOVA
->HPA) to guest side (IOVA->GPA) with nesting disabled. In that case
the default behavior is host-reserved style.
Then comes nested scenario:
>
> * The guest is in charge of stage-1. It creates IOVA->GPA mapping for the
> MSI doorbell. The GPA is mapped to the physical MSI doorbell at
> stage-2 by the host.
Is it a must that above GPA is mapped to physical MSI doorbell?
Ideally:
1) Host reserves some IOVA range for mapping MSI doorbell
2) the range is reported to user space
3) Qemu reported the range as reserved on endpoints, marked
as T_IDENTITY (a new type to be introduced), meaning guest
needs setup identity mapping in stage-1
4) Then device should be able to ring physical MSI doorbell
5) I'm not sure whether guest still needs to allocate its own
IOVA and mapping to vGIC doorbell in such case...
>
> * when it writes that IOVA in the virtual MSI-X tables, the host updates
> the physical MSI-X tables using an ioctl (it can't map the physical
> tables into the guest, because MSI-X vector data also contain physical
> device info that shouldn't be known by the guest.)
>
> So we need support for software-mapped MSIs in the guest anyway. In
> Linux
> the virtio-iommu driver can use existing plumbing for that, no change
> needed in the guest.
>
> > Then what's the real point of differentiating MSI bypass from
> > normal reserved range? Is there other example where guest
> > may use such info to do special thing?
>
> The presence of an MSI bypass regions allows the driver and the DMA layer
> (in Linux iommu_dma_map_msi_msg) to know whether it needs to create a
> mapping or if it's bypassed. When writing the entry into the MSI-X table,
> if the address in the MSI vector already corresponds to an MSI bypass
> region, then there is no need to create a mapping. Making the bypass MSI
> region a normal RESV range hides that information.
ok, will take a look to understand this part.
>
> Another way of choosing would be with #ifdef CONFIG_ARM64,
> CONFIG_X86 etc,
> but I find it nasty, and I personally prefer using MSI bypass for ARM when
> possible.
however from current v0.6 examples, BYPASS is only listed for x86
case. ARM usage is the missing piece making me confused.
>
> > -- 3.2 Message Signaled Interrupts --
> >
> > " Section 3.2.2 describes the added complexity (from the host
> > point of view) of translating the IRQ chip doorbell "
> >
> > however later 3.2.2 only says one sentence about host
> > complexity - " However, this mode of operations may add
> > significant complexity in the host implementation". If you plan
> > to elaborate that part in the future, please add a note. :-)
>
> Oh right, I don't really want to elaborate, I'll try to reword 3.2. The
> kvmtool patches have support for both MSI bypass and translation
> (selectable with a cmdline flag), which I think is good enough. I'd rather
> read a code example than some clumsy prose describing it.
>
sounds good.
Thanks
Kevin
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [virtio-dev] Re: [PATCH v2] virtio_balloon: export hugetlb page allocation counts
From: Michael S. Tsirkin @ 2018-03-22 3:10 UTC (permalink / raw)
To: Jason Wang; +Cc: virtio-dev, virtualization, linux-kernel
In-Reply-To: <c09d3570-a8f9-7a84-5dd6-46d385da2e15@redhat.com>
On Thu, Mar 22, 2018 at 09:52:18AM +0800, Jason Wang wrote:
>
>
> On 2018年03月20日 12:26, Jonathan Helman wrote:
> > > On Mar 19, 2018, at 7:31 PM, Jason Wang <jasowang@redhat.com> wrote:
> > >
> > >
> > >
> > > On 2018年03月20日 06:14, Jonathan Helman wrote:
> > > > Export the number of successful and failed hugetlb page
> > > > allocations via the virtio balloon driver. These 2 counts
> > > > come directly from the vm_events HTLB_BUDDY_PGALLOC and
> > > > HTLB_BUDDY_PGALLOC_FAIL.
> > > >
> > > > Signed-off-by: Jonathan Helman <jonathan.helman@oracle.com>
> > > Reviewed-by: Jason Wang <jasowang@redhat.com>
> > Thanks.
> >
> > > > ---
> > > > drivers/virtio/virtio_balloon.c | 6 ++++++
> > > > include/uapi/linux/virtio_balloon.h | 4 +++-
> > > > 2 files changed, 9 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> > > > index dfe5684..6b237e3 100644
> > > > --- a/drivers/virtio/virtio_balloon.c
> > > > +++ b/drivers/virtio/virtio_balloon.c
> > > > @@ -272,6 +272,12 @@ static unsigned int update_balloon_stats(struct virtio_balloon *vb)
> > > > pages_to_bytes(events[PSWPOUT]));
> > > > update_stat(vb, idx++, VIRTIO_BALLOON_S_MAJFLT, events[PGMAJFAULT]);
> > > > update_stat(vb, idx++, VIRTIO_BALLOON_S_MINFLT, events[PGFAULT]);
> > > > +#ifdef CONFIG_HUGETLB_PAGE
> > > > + update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGALLOC,
> > > > + events[HTLB_BUDDY_PGALLOC]);
> > > > + update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGFAIL,
> > > > + events[HTLB_BUDDY_PGALLOC_FAIL]);
> > > > +#endif
> > > > #endif
> > > > update_stat(vb, idx++, VIRTIO_BALLOON_S_MEMFREE,
> > > > pages_to_bytes(i.freeram));
> > > > diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
> > > > index 4e8b830..40297a3 100644
> > > > --- a/include/uapi/linux/virtio_balloon.h
> > > > +++ b/include/uapi/linux/virtio_balloon.h
> > > > @@ -53,7 +53,9 @@ struct virtio_balloon_config {
> > > > #define VIRTIO_BALLOON_S_MEMTOT 5 /* Total amount of memory */
> > > > #define VIRTIO_BALLOON_S_AVAIL 6 /* Available memory as in /proc */
> > > > #define VIRTIO_BALLOON_S_CACHES 7 /* Disk caches */
> > > > -#define VIRTIO_BALLOON_S_NR 8
> > > > +#define VIRTIO_BALLOON_S_HTLB_PGALLOC 8 /* Hugetlb page allocations */
> > > > +#define VIRTIO_BALLOON_S_HTLB_PGFAIL 9 /* Hugetlb page allocation failures */
> > > > +#define VIRTIO_BALLOON_S_NR 10
> > > > /*
> > > > * Memory statistics structure.
> > > Not for this patch, but it looks to me that exporting such nr through uapi is fragile.
> > Sorry, can you explain what you mean here?
> >
> > Jon
>
> Spec said "Within an output buffer submitted to the statsq, the device MUST
> ignore entries with tag values that it does not recognize". So exporting
> VIRTIO_BALLOON_S_NR seems useless and device implementation can not depend
> on such number in uapi.
>
> Thanks
Suggestions? I don't like to break build for people ...
> >
> > > Thanks
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [virtio-dev] Re: [PATCH v2] virtio_balloon: export hugetlb page allocation counts
From: Jason Wang @ 2018-03-22 1:52 UTC (permalink / raw)
To: Jonathan Helman; +Cc: virtio-dev, virtualization, linux-kernel, mst
In-Reply-To: <6ED6CAC9-FAD9-4A27-A9B1-C6EFC7627D68@oracle.com>
On 2018年03月20日 12:26, Jonathan Helman wrote:
>> On Mar 19, 2018, at 7:31 PM, Jason Wang <jasowang@redhat.com> wrote:
>>
>>
>>
>> On 2018年03月20日 06:14, Jonathan Helman wrote:
>>> Export the number of successful and failed hugetlb page
>>> allocations via the virtio balloon driver. These 2 counts
>>> come directly from the vm_events HTLB_BUDDY_PGALLOC and
>>> HTLB_BUDDY_PGALLOC_FAIL.
>>>
>>> Signed-off-by: Jonathan Helman <jonathan.helman@oracle.com>
>> Reviewed-by: Jason Wang <jasowang@redhat.com>
> Thanks.
>
>>> ---
>>> drivers/virtio/virtio_balloon.c | 6 ++++++
>>> include/uapi/linux/virtio_balloon.h | 4 +++-
>>> 2 files changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
>>> index dfe5684..6b237e3 100644
>>> --- a/drivers/virtio/virtio_balloon.c
>>> +++ b/drivers/virtio/virtio_balloon.c
>>> @@ -272,6 +272,12 @@ static unsigned int update_balloon_stats(struct virtio_balloon *vb)
>>> pages_to_bytes(events[PSWPOUT]));
>>> update_stat(vb, idx++, VIRTIO_BALLOON_S_MAJFLT, events[PGMAJFAULT]);
>>> update_stat(vb, idx++, VIRTIO_BALLOON_S_MINFLT, events[PGFAULT]);
>>> +#ifdef CONFIG_HUGETLB_PAGE
>>> + update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGALLOC,
>>> + events[HTLB_BUDDY_PGALLOC]);
>>> + update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGFAIL,
>>> + events[HTLB_BUDDY_PGALLOC_FAIL]);
>>> +#endif
>>> #endif
>>> update_stat(vb, idx++, VIRTIO_BALLOON_S_MEMFREE,
>>> pages_to_bytes(i.freeram));
>>> diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
>>> index 4e8b830..40297a3 100644
>>> --- a/include/uapi/linux/virtio_balloon.h
>>> +++ b/include/uapi/linux/virtio_balloon.h
>>> @@ -53,7 +53,9 @@ struct virtio_balloon_config {
>>> #define VIRTIO_BALLOON_S_MEMTOT 5 /* Total amount of memory */
>>> #define VIRTIO_BALLOON_S_AVAIL 6 /* Available memory as in /proc */
>>> #define VIRTIO_BALLOON_S_CACHES 7 /* Disk caches */
>>> -#define VIRTIO_BALLOON_S_NR 8
>>> +#define VIRTIO_BALLOON_S_HTLB_PGALLOC 8 /* Hugetlb page allocations */
>>> +#define VIRTIO_BALLOON_S_HTLB_PGFAIL 9 /* Hugetlb page allocation failures */
>>> +#define VIRTIO_BALLOON_S_NR 10
>>> /*
>>> * Memory statistics structure.
>> Not for this patch, but it looks to me that exporting such nr through uapi is fragile.
> Sorry, can you explain what you mean here?
>
> Jon
Spec said "Within an output buffer submitted to the statsq, the device
MUST ignore entries with tag values that it does not recognize". So
exporting VIRTIO_BALLOON_S_NR seems useless and device implementation
can not depend on such number in uapi.
Thanks
>
>> Thanks
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH 1/4] iommu: Add virtio-iommu driver
From: Robin Murphy @ 2018-03-21 14:23 UTC (permalink / raw)
To: Jean-Philippe Brucker, Tian, Kevin,
iommu@lists.linux-foundation.org, kvm@vger.kernel.org,
virtualization@lists.linux-foundation.org,
virtio-dev@lists.oasis-open.org, kvmarm@lists.cs.columbia.edu
Cc: jayachandran.nair@cavium.com, tnowicki@caviumnetworks.com,
mst@redhat.com, Marc Zyngier, Will Deacon,
jintack@cs.columbia.edu, eric.auger.pro@gmail.com
In-Reply-To: <a11f1b52-c031-014c-8c23-1732f3d4666c@arm.com>
On 21/03/18 13:14, Jean-Philippe Brucker wrote:
> On 21/03/18 06:43, Tian, Kevin wrote:
> [...]
>>> +
>>> +#include <uapi/linux/virtio_iommu.h>
>>> +
>>> +#define MSI_IOVA_BASE 0x8000000
>>> +#define MSI_IOVA_LENGTH 0x100000
>>
>> this is ARM specific, and according to virtio-iommu spec isn't it
>> better probed on the endpoint instead of hard-coding here?
>
> These values are arbitrary, not really ARM-specific even if ARM is the
> only user yet: we're just reserving a random IOVA region for mapping MSIs.
> It is hard-coded because of the way iommu-dma.c works, but I don't quite
> remember why that allocation isn't dynamic.
The host kernel needs to have *some* MSI region in place before the
guest can start configuring interrupts, otherwise it won't know what
address to give to the underlying hardware. However, as soon as the host
kernel has picked a region, host userspace needs to know that it can no
longer use addresses in that region for DMA-able guest memory. It's a
lot easier when the address is fixed in hardware and the host userspace
will never be stupid enough to try and VFIO_IOMMU_DMA_MAP it, but in the
more general case where MSI writes undergo IOMMU address translation so
it's an arbitrary IOVA, this has the potential to conflict with stuff
like guest memory hotplug.
What we currently have is just the simplest option, with the host kernel
just picking something up-front and pretending to host userspace that
it's a fixed hardware address. There's certainly scope for it to be a
bit more dynamic in the sense of adding an interface to let userspace
move it around (before attaching any devices, at least), but I don't
think it's feasible for the host kernel to second-guess userspace enough
to make it entirely transparent like it is in the DMA API domain case.
Of course, that's all assuming the host itself is using a virtio-iommu
(e.g. in a nested virt or emulation scenario). When it's purely within a
guest then an MSI reservation shouldn't matter so much, since the guest
won't be anywhere near the real hardware configuration anyway.
Robin.
> As said on the v0.6 spec thread, I'm not sure allocating the IOVA range in
> the host is preferable. With nested translation the guest has to map it
> anyway, and I believe dealing with IOVA allocation should be left to the
> guest when possible.
>
> Thanks,
> Jean
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>
^ permalink raw reply
* Re: [PATCH 1/4] iommu: Add virtio-iommu driver
From: Jean-Philippe Brucker @ 2018-03-21 13:14 UTC (permalink / raw)
To: Tian, Kevin, iommu@lists.linux-foundation.org,
kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
virtio-dev@lists.oasis-open.org, kvmarm@lists.cs.columbia.edu
Cc: jayachandran.nair@cavium.com, Lorenzo Pieralisi,
tnowicki@caviumnetworks.com, mst@redhat.com, Marc Zyngier,
Will Deacon, jintack@cs.columbia.edu, eric.auger@redhat.com,
Robin Murphy, joro@8bytes.org, eric.auger.pro@gmail.com
In-Reply-To: <AADFC41AFE54684AB9EE6CBC0274A5D19108B0FE@SHSMSX101.ccr.corp.intel.com>
On 21/03/18 06:43, Tian, Kevin wrote:
[...]
>> +
>> +#include <uapi/linux/virtio_iommu.h>
>> +
>> +#define MSI_IOVA_BASE 0x8000000
>> +#define MSI_IOVA_LENGTH 0x100000
>
> this is ARM specific, and according to virtio-iommu spec isn't it
> better probed on the endpoint instead of hard-coding here?
These values are arbitrary, not really ARM-specific even if ARM is the
only user yet: we're just reserving a random IOVA region for mapping MSIs.
It is hard-coded because of the way iommu-dma.c works, but I don't quite
remember why that allocation isn't dynamic.
As said on the v0.6 spec thread, I'm not sure allocating the IOVA range in
the host is preferable. With nested translation the guest has to map it
anyway, and I believe dealing with IOVA allocation should be left to the
guest when possible.
Thanks,
Jean
^ permalink raw reply
* Re: [virtio-dev] [RFC] virtio-iommu version 0.6
From: Jean-Philippe Brucker @ 2018-03-21 13:13 UTC (permalink / raw)
To: Tian, Kevin, virtio-dev@lists.oasis-open.org,
virtualization@lists.linux-foundation.org
Cc: eric.auger@redhat.com, eric.auger.pro@gmail.com
In-Reply-To: <AADFC41AFE54684AB9EE6CBC0274A5D191085A76@SHSMSX101.ccr.corp.intel.com>
Hi Kevin,
Thanks for the comments
On 19/03/18 10:03, Tian, Kevin wrote:
> BYPASS feature bit is not covered in "2.3.1/2.3.2/2.3.3"". Is it
> intended?
In my opinion BYPASS is a bit different from the other features: while the
others are needed for correctness, this one is optional and even if the
guest supports BYPASS, it should be allowed not to accept it. For security
reasons it may not want to let endpoints access the whole guest-physical
address space.
> --2.6.2 Device Requirements: Device operations--
>
> "If the VIRTIO_IOMMU_F_INPUT_RANGE feature is offered,
> the device MUST truncate the range described by virt_start
> and virt_end in requests to ft in the range described by
> input_range."
>
> "If the VIRTIO_IOMMU_F_DOMAIN_BITS is offered, the device
> MUST ignore bits above domain_bits in field domain of requests."
>
> shouldn't device return error upon above situation instead
> of continuing operation with modified value?
The intent was to allow device and driver to pass metadata in the top bits
of pointers and domain IDs, perhaps for a future extension or for
debugging. I'm not convinced it's useful anymore (and do wonder what my
initial reasoning was...) Such extension would be determined by a new
feature bit and the device would know that the fields contain additional
info, if the driver accepted that feature.
> --2.6.4 DETACH request--
>
> " Detach an endpoint from its domain. When this request
> completes, the endpoint cannot access any mapping from that
> domain anymore "
>
> Does it make sense to mention BYPASS situation upon this request?
Sure, I'll add something to clarify that situation
> --2.6.8.2 Property RESV_MEM --
>
> "VIRTIO_IOMMU_RESV_MEM_T_RESERVED (0) Accesses to
> virtual addresses in this region are not translated by the device.
> They may either be aborted by the device (or the underlying
> IOMMU), bypass it, or never even reach it"
>
> in 3.3 hardware device assignment you described an example
> that a reserved range is stolen by host to map the MSI
> doorbell... such map behavior is not described above.
Right, we can say that accesses to the region may be used for host translation
> Then comes a question for VIRTIO_IOMMU_RESV_MEM_T_MSI.
> I know there were quite some discussion around this flag before,
> but my mental picture now still is a bit difficult to understand its
> usage based on examples in implementation notes:
>
> - for x86, you describe it as indicating MSI bypass for
> oxfeexxxxx. However guest doesn't need to know this fact. Only
> requirement is to treat it as reserved range (as on bare metal)
> then T_RESERVED is sufficient for this purpose>
> - for ARM, either let guest or let host to choose a virtual
> address for mapping to MSI doorbell. the former doesn't require
> a reserved range. for the latter also T_RESERVED is enough as
> the example in hardware device assignment section.
It might be nicer to have the host decide it, but when the physical IOMMU
is ARM SMMU, nesting translation complicates things, because the guest
*has* to create a mapping:
* The guest is in charge of stage-1. It creates IOVA->GPA mapping for the
MSI doorbell. The GPA is mapped to the physical MSI doorbell at
stage-2 by the host.
* when it writes that IOVA in the virtual MSI-X tables, the host updates
the physical MSI-X tables using an ioctl (it can't map the physical
tables into the guest, because MSI-X vector data also contain physical
device info that shouldn't be known by the guest.)
So we need support for software-mapped MSIs in the guest anyway. In Linux
the virtio-iommu driver can use existing plumbing for that, no change
needed in the guest.
> Then what's the real point of differentiating MSI bypass from
> normal reserved range? Is there other example where guest
> may use such info to do special thing?
The presence of an MSI bypass regions allows the driver and the DMA layer
(in Linux iommu_dma_map_msi_msg) to know whether it needs to create a
mapping or if it's bypassed. When writing the entry into the MSI-X table,
if the address in the MSI vector already corresponds to an MSI bypass
region, then there is no need to create a mapping. Making the bypass MSI
region a normal RESV range hides that information.
Another way of choosing would be with #ifdef CONFIG_ARM64, CONFIG_X86 etc,
but I find it nasty, and I personally prefer using MSI bypass for ARM when
possible.
> -- 3.2 Message Signaled Interrupts --
>
> " Section 3.2.2 describes the added complexity (from the host
> point of view) of translating the IRQ chip doorbell "
>
> however later 3.2.2 only says one sentence about host
> complexity - " However, this mode of operations may add
> significant complexity in the host implementation". If you plan
> to elaborate that part in the future, please add a note. :-)
Oh right, I don't really want to elaborate, I'll try to reword 3.2. The
kvmtool patches have support for both MSI bypass and translation
(selectable with a cmdline flag), which I think is good enough. I'd rather
read a code example than some clumsy prose describing it.
Thanks,
Jean
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [Outreachy kernel] [PATCH] drm/qxl: Replace drm_gem_object_reference/unreference() with _get/put()
From: Daniel Vetter @ 2018-03-21 8:20 UTC (permalink / raw)
To: Santha Meena Ramamoorthy
Cc: airlied, linux-kernel, dri-devel, virtualization,
outreachy-kernel, seanpaul, daniel.vetter, airlied
In-Reply-To: <1521570567-22519-1-git-send-email-santhameena13@gmail.com>
On Tue, Mar 20, 2018 at 11:29:27AM -0700, Santha Meena Ramamoorthy wrote:
> Replace drm_gem_object_reference/unreference function with *_get/put()
> suffixes, because it is shorter and consistent with the kernel
> kref_get/put() functions. The following Coccinelle script was used:
>
> @@
> expression e;
> @@
>
> (
> -drm_gem_object_reference(e);
> +drm_gem_object_get(e);
> |
> -drm_gem_object_unreference(e);
> +drm_gem_object_put(e);
> |
> -drm_gem_object_unreference_unlocked(e);
> +drm_gem_object_put_unlocked(e);
> )
>
> Signed-off-by: Santha Meena Ramamoorthy <santhameena13@gmail.com>
lgtm, thanks for your patch. Applied to drm-misc-next.
-Daniel
> ---
> drivers/gpu/drm/qxl/qxl_display.c | 4 ++--
> drivers/gpu/drm/qxl/qxl_dumb.c | 2 +-
> drivers/gpu/drm/qxl/qxl_fb.c | 6 +++---
> drivers/gpu/drm/qxl/qxl_gem.c | 2 +-
> drivers/gpu/drm/qxl/qxl_ioctl.c | 4 ++--
> drivers/gpu/drm/qxl/qxl_object.c | 6 +++---
> 6 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c
> index 9a9214a..ecb35ed 100644
> --- a/drivers/gpu/drm/qxl/qxl_display.c
> +++ b/drivers/gpu/drm/qxl/qxl_display.c
> @@ -309,7 +309,7 @@ void qxl_user_framebuffer_destroy(struct drm_framebuffer *fb)
> struct qxl_bo *bo = gem_to_qxl_bo(qxl_fb->obj);
>
> WARN_ON(bo->shadow);
> - drm_gem_object_unreference_unlocked(qxl_fb->obj);
> + drm_gem_object_put_unlocked(qxl_fb->obj);
> drm_framebuffer_cleanup(fb);
> kfree(qxl_fb);
> }
> @@ -1215,7 +1215,7 @@ qxl_user_framebuffer_create(struct drm_device *dev,
> ret = qxl_framebuffer_init(dev, qxl_fb, mode_cmd, obj, &qxl_fb_funcs);
> if (ret) {
> kfree(qxl_fb);
> - drm_gem_object_unreference_unlocked(obj);
> + drm_gem_object_put_unlocked(obj);
> return NULL;
> }
>
> diff --git a/drivers/gpu/drm/qxl/qxl_dumb.c b/drivers/gpu/drm/qxl/qxl_dumb.c
> index 11085ab..c666b89 100644
> --- a/drivers/gpu/drm/qxl/qxl_dumb.c
> +++ b/drivers/gpu/drm/qxl/qxl_dumb.c
> @@ -82,6 +82,6 @@ int qxl_mode_dumb_mmap(struct drm_file *file_priv,
> return -ENOENT;
> qobj = gem_to_qxl_bo(gobj);
> *offset_p = qxl_bo_mmap_offset(qobj);
> - drm_gem_object_unreference_unlocked(gobj);
> + drm_gem_object_put_unlocked(gobj);
> return 0;
> }
> diff --git a/drivers/gpu/drm/qxl/qxl_fb.c b/drivers/gpu/drm/qxl/qxl_fb.c
> index 23af3e3..3388914 100644
> --- a/drivers/gpu/drm/qxl/qxl_fb.c
> +++ b/drivers/gpu/drm/qxl/qxl_fb.c
> @@ -95,7 +95,7 @@ static void qxlfb_destroy_pinned_object(struct drm_gem_object *gobj)
> qxl_bo_kunmap(qbo);
> qxl_bo_unpin(qbo);
>
> - drm_gem_object_unreference_unlocked(gobj);
> + drm_gem_object_put_unlocked(gobj);
> }
>
> int qxl_get_handle_for_primary_fb(struct qxl_device *qdev,
> @@ -316,11 +316,11 @@ static int qxlfb_create(struct qxl_fbdev *qfbdev,
> qxl_bo_unpin(qbo);
> }
> if (fb && ret) {
> - drm_gem_object_unreference_unlocked(gobj);
> + drm_gem_object_put_unlocked(gobj);
> drm_framebuffer_cleanup(fb);
> kfree(fb);
> }
> - drm_gem_object_unreference_unlocked(gobj);
> + drm_gem_object_put_unlocked(gobj);
> return ret;
> }
>
> diff --git a/drivers/gpu/drm/qxl/qxl_gem.c b/drivers/gpu/drm/qxl/qxl_gem.c
> index 85f5467..f5c1e78 100644
> --- a/drivers/gpu/drm/qxl/qxl_gem.c
> +++ b/drivers/gpu/drm/qxl/qxl_gem.c
> @@ -98,7 +98,7 @@ int qxl_gem_object_create_with_handle(struct qxl_device *qdev,
> return r;
> /* drop reference from allocate - handle holds it now */
> *qobj = gem_to_qxl_bo(gobj);
> - drm_gem_object_unreference_unlocked(gobj);
> + drm_gem_object_put_unlocked(gobj);
> return 0;
> }
>
> diff --git a/drivers/gpu/drm/qxl/qxl_ioctl.c b/drivers/gpu/drm/qxl/qxl_ioctl.c
> index e8c0b10..e238a1a 100644
> --- a/drivers/gpu/drm/qxl/qxl_ioctl.c
> +++ b/drivers/gpu/drm/qxl/qxl_ioctl.c
> @@ -121,7 +121,7 @@ static int qxlhw_handle_to_bo(struct drm_file *file_priv, uint64_t handle,
> qobj = gem_to_qxl_bo(gobj);
>
> ret = qxl_release_list_add(release, qobj);
> - drm_gem_object_unreference_unlocked(gobj);
> + drm_gem_object_put_unlocked(gobj);
> if (ret)
> return ret;
>
> @@ -343,7 +343,7 @@ static int qxl_update_area_ioctl(struct drm_device *dev, void *data,
> qxl_bo_unreserve(qobj);
>
> out:
> - drm_gem_object_unreference_unlocked(gobj);
> + drm_gem_object_put_unlocked(gobj);
> return ret;
> }
>
> diff --git a/drivers/gpu/drm/qxl/qxl_object.c b/drivers/gpu/drm/qxl/qxl_object.c
> index f6b80fe..e9fb0ab 100644
> --- a/drivers/gpu/drm/qxl/qxl_object.c
> +++ b/drivers/gpu/drm/qxl/qxl_object.c
> @@ -211,13 +211,13 @@ void qxl_bo_unref(struct qxl_bo **bo)
> if ((*bo) == NULL)
> return;
>
> - drm_gem_object_unreference_unlocked(&(*bo)->gem_base);
> + drm_gem_object_put_unlocked(&(*bo)->gem_base);
> *bo = NULL;
> }
>
> struct qxl_bo *qxl_bo_ref(struct qxl_bo *bo)
> {
> - drm_gem_object_reference(&bo->gem_base);
> + drm_gem_object_get(&bo->gem_base);
> return bo;
> }
>
> @@ -318,7 +318,7 @@ void qxl_bo_force_delete(struct qxl_device *qdev)
> list_del_init(&bo->list);
> mutex_unlock(&qdev->gem.mutex);
> /* this should unref the ttm bo */
> - drm_gem_object_unreference_unlocked(&bo->gem_base);
> + drm_gem_object_put_unlocked(&bo->gem_base);
> }
> }
>
> --
> 2.7.4
>
> --
> You received this message because you are subscribed to the Google Groups "outreachy-kernel" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to outreachy-kernel+unsubscribe@googlegroups.com.
> To post to this group, send email to outreachy-kernel@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/outreachy-kernel/1521570567-22519-1-git-send-email-santhameena13%40gmail.com.
> For more options, visit https://groups.google.com/d/optout.
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply
* Re: [PATCH RFC 2/2] virtio_ring: support packed ring
From: Tiwei Bie @ 2018-03-21 7:35 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180316145702-mutt-send-email-mst@kernel.org>
On Fri, Mar 16, 2018 at 04:30:02PM +0200, Michael S. Tsirkin wrote:
> On Fri, Mar 16, 2018 at 07:36:47PM +0800, Jason Wang wrote:
> > > > @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
> > > > > > > > > if (!queue) {
> > > > > > > > > /* Try to get a single page. You are my only hope! */
> > > > > > > > > - queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> > > > > > > > > + queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> > > > > > > > > + packed),
> > > > > > > > > &dma_addr, GFP_KERNEL|__GFP_ZERO);
> > > > > > > > > }
> > > > > > > > > if (!queue)
> > > > > > > > > return NULL;
> > > > > > > > > - queue_size_in_bytes = vring_size(num, vring_align);
> > > > > > > > > - vring_init(&vring, num, queue, vring_align);
> > > > > > > > > + queue_size_in_bytes = __vring_size(num, vring_align, packed);
> > > > > > > > > + if (packed)
> > > > > > > > > + vring_packed_init(&vring.vring_packed, num, queue, vring_align);
> > > > > > > > > + else
> > > > > > > > > + vring_init(&vring.vring_split, num, queue, vring_align);
> > > > > > > > Let's rename vring_init to vring_init_split() like other helpers?
> > > > > > > The vring_init() is a public API in include/uapi/linux/virtio_ring.h.
> > > > > > > I don't think we can rename it.
> > > > > > I see, then this need more thoughts to unify the API.
> > > > > My thought is to keep the old API as is, and introduce
> > > > > new types and helpers for packed ring.
> > > > I admit it's not a fault of this patch. But we'd better think of this in the
> > > > future, consider we may have new kinds of ring.
> > > >
> > > > > More details can be found in this patch:
> > > > > https://lkml.org/lkml/2018/2/23/243
> > > > > (PS. The type which has bit fields is just for reference,
> > > > > and will be changed in next version.)
> > > > >
> > > > > Do you have any other suggestions?
> > > > No.
> > > Hmm.. Sorry, I didn't describe my question well.
> > > I mean do you have any suggestions about the API
> > > design for packed ring in uapi header? Currently
> > > I introduced below two new helpers:
> > >
> > > static inline void vring_packed_init(struct vring_packed *vr, unsigned int num,
> > > void *p, unsigned long align);
> > > static inline unsigned vring_packed_size(unsigned int num, unsigned long align);
> > >
> > > When new rings are introduced in the future, above
> > > helpers can't be reused. Maybe we should make the
> > > helpers be able to determine the ring type?
> >
> > Let's wait for Michael's comment here. Generally, I fail to understand why
> > vring_init() become a part of uapi. Git grep shows the only use cases are
> > virtio_test/vringh_test.
> >
> > Thanks
>
> For init - I think it's a mistake that stems from lguest which sometimes
> made it less than obvious which code is where. I don't see a reason to
> add to it.
Got it! I'll move vring_packed_init() out of uapi. Many thanks! :)
Best regards,
Tiwei Bie
>
> --
> MST
^ permalink raw reply
* Re: [PATCH RFC 2/2] virtio_ring: support packed ring
From: Tiwei Bie @ 2018-03-21 7:30 UTC (permalink / raw)
To: Jason Wang; +Cc: mst, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <094ca28b-d8af-bf7a-ea7e-0d0bf7518bda@redhat.com>
On Fri, Mar 16, 2018 at 07:36:47PM +0800, Jason Wang wrote:
> On 2018年03月16日 18:04, Tiwei Bie wrote:
> > On Fri, Mar 16, 2018 at 04:34:28PM +0800, Jason Wang wrote:
> > > On 2018年03月16日 15:40, Tiwei Bie wrote:
> > > > On Fri, Mar 16, 2018 at 02:44:12PM +0800, Jason Wang wrote:
> > > > > On 2018年03月16日 14:10, Tiwei Bie wrote:
> > > > > > On Fri, Mar 16, 2018 at 12:03:25PM +0800, Jason Wang wrote:
> > > > > > > On 2018年02月23日 19:18, Tiwei Bie wrote:
> > > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > > > ---
> > > > > > > > drivers/virtio/virtio_ring.c | 699 +++++++++++++++++++++++++++++++++++++------
> > > > > > > > include/linux/virtio_ring.h | 8 +-
> > > > > > > > 2 files changed, 618 insertions(+), 89 deletions(-)
[...]
> > > @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
> > > > > > > > if (!queue) {
> > > > > > > > /* Try to get a single page. You are my only hope! */
> > > > > > > > - queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> > > > > > > > + queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> > > > > > > > + packed),
> > > > > > > > &dma_addr, GFP_KERNEL|__GFP_ZERO);
> > > > > > > > }
> > > > > > > > if (!queue)
> > > > > > > > return NULL;
> > > > > > > > - queue_size_in_bytes = vring_size(num, vring_align);
> > > > > > > > - vring_init(&vring, num, queue, vring_align);
> > > > > > > > + queue_size_in_bytes = __vring_size(num, vring_align, packed);
> > > > > > > > + if (packed)
> > > > > > > > + vring_packed_init(&vring.vring_packed, num, queue, vring_align);
> > > > > > > > + else
> > > > > > > > + vring_init(&vring.vring_split, num, queue, vring_align);
> > > > > > > Let's rename vring_init to vring_init_split() like other helpers?
> > > > > > The vring_init() is a public API in include/uapi/linux/virtio_ring.h.
> > > > > > I don't think we can rename it.
> > > > > I see, then this need more thoughts to unify the API.
> > > > My thought is to keep the old API as is, and introduce
> > > > new types and helpers for packed ring.
> > > I admit it's not a fault of this patch. But we'd better think of this in the
> > > future, consider we may have new kinds of ring.
> > >
> > > > More details can be found in this patch:
> > > > https://lkml.org/lkml/2018/2/23/243
> > > > (PS. The type which has bit fields is just for reference,
> > > > and will be changed in next version.)
> > > >
> > > > Do you have any other suggestions?
> > > No.
> > Hmm.. Sorry, I didn't describe my question well.
> > I mean do you have any suggestions about the API
> > design for packed ring in uapi header? Currently
> > I introduced below two new helpers:
> >
> > static inline void vring_packed_init(struct vring_packed *vr, unsigned int num,
> > void *p, unsigned long align);
> > static inline unsigned vring_packed_size(unsigned int num, unsigned long align);
> >
> > When new rings are introduced in the future, above
> > helpers can't be reused. Maybe we should make the
> > helpers be able to determine the ring type?
>
> Let's wait for Michael's comment here. Generally, I fail to understand why
> vring_init() become a part of uapi. Git grep shows the only use cases are
> virtio_test/vringh_test.
Thank you very much for the review on this patch!
I'll send out a new version ASAP to address these
comments. :)
Best regards,
Tiwei Bie
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* RE: [PATCH 1/4] iommu: Add virtio-iommu driver
From: Tian, Kevin @ 2018-03-21 6:43 UTC (permalink / raw)
To: Jean-Philippe Brucker, iommu@lists.linux-foundation.org,
kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
virtio-dev@lists.oasis-open.org, kvmarm@lists.cs.columbia.edu
Cc: jayachandran.nair@cavium.com, lorenzo.pieralisi@arm.com,
tnowicki@caviumnetworks.com, mst@redhat.com, marc.zyngier@arm.com,
will.deacon@arm.com, jintack@cs.columbia.edu,
eric.auger@redhat.com, robin.murphy@arm.com, joro@8bytes.org,
eric.auger.pro@gmail.com
In-Reply-To: <20180214145340.1223-2-jean-philippe.brucker@arm.com>
> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker@arm.com]
> Sent: Wednesday, February 14, 2018 10:54 PM
>
> The virtio IOMMU is a para-virtualized device, allowing to send IOMMU
> requests such as map/unmap over virtio-mmio transport without
> emulating
> page tables. This implementation handles ATTACH, DETACH, MAP and
> UNMAP
> requests.
>
> The bulk of the code transforms calls coming from the IOMMU API into
> corresponding virtio requests. Mappings are kept in an interval tree
> instead of page tables.
>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
[...]
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> new file mode 100644
> index 000000000000..a9c9245e8ba2
> --- /dev/null
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -0,0 +1,960 @@
> +/*
> + * Virtio driver for the paravirtualized IOMMU
> + *
> + * Copyright (C) 2018 ARM Limited
> + * Author: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/amba/bus.h>
> +#include <linux/delay.h>
> +#include <linux/dma-iommu.h>
> +#include <linux/freezer.h>
> +#include <linux/interval_tree.h>
> +#include <linux/iommu.h>
> +#include <linux/module.h>
> +#include <linux/of_iommu.h>
> +#include <linux/of_platform.h>
> +#include <linux/pci.h>
> +#include <linux/platform_device.h>
> +#include <linux/virtio.h>
> +#include <linux/virtio_config.h>
> +#include <linux/virtio_ids.h>
> +#include <linux/wait.h>
> +
> +#include <uapi/linux/virtio_iommu.h>
> +
> +#define MSI_IOVA_BASE 0x8000000
> +#define MSI_IOVA_LENGTH 0x100000
this is ARM specific, and according to virtio-iommu spec isn't it
better probed on the endpoint instead of hard-coding here?
Thanks
Kevin
^ permalink raw reply
* Re: [PATCH v2] virtio_balloon: export hugetlb page allocation counts
From: Jonathan Helman @ 2018-03-20 4:26 UTC (permalink / raw)
To: Jason Wang; +Cc: virtio-dev, virtualization, linux-kernel, mst
In-Reply-To: <db6371af-f93d-4409-c910-2ff712196c5b@redhat.com>
> On Mar 19, 2018, at 7:31 PM, Jason Wang <jasowang@redhat.com> wrote:
>
>
>
> On 2018年03月20日 06:14, Jonathan Helman wrote:
>> Export the number of successful and failed hugetlb page
>> allocations via the virtio balloon driver. These 2 counts
>> come directly from the vm_events HTLB_BUDDY_PGALLOC and
>> HTLB_BUDDY_PGALLOC_FAIL.
>>
>> Signed-off-by: Jonathan Helman <jonathan.helman@oracle.com>
>
> Reviewed-by: Jason Wang <jasowang@redhat.com>
Thanks.
>
>> ---
>> drivers/virtio/virtio_balloon.c | 6 ++++++
>> include/uapi/linux/virtio_balloon.h | 4 +++-
>> 2 files changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
>> index dfe5684..6b237e3 100644
>> --- a/drivers/virtio/virtio_balloon.c
>> +++ b/drivers/virtio/virtio_balloon.c
>> @@ -272,6 +272,12 @@ static unsigned int update_balloon_stats(struct virtio_balloon *vb)
>> pages_to_bytes(events[PSWPOUT]));
>> update_stat(vb, idx++, VIRTIO_BALLOON_S_MAJFLT, events[PGMAJFAULT]);
>> update_stat(vb, idx++, VIRTIO_BALLOON_S_MINFLT, events[PGFAULT]);
>> +#ifdef CONFIG_HUGETLB_PAGE
>> + update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGALLOC,
>> + events[HTLB_BUDDY_PGALLOC]);
>> + update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGFAIL,
>> + events[HTLB_BUDDY_PGALLOC_FAIL]);
>> +#endif
>> #endif
>> update_stat(vb, idx++, VIRTIO_BALLOON_S_MEMFREE,
>> pages_to_bytes(i.freeram));
>> diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
>> index 4e8b830..40297a3 100644
>> --- a/include/uapi/linux/virtio_balloon.h
>> +++ b/include/uapi/linux/virtio_balloon.h
>> @@ -53,7 +53,9 @@ struct virtio_balloon_config {
>> #define VIRTIO_BALLOON_S_MEMTOT 5 /* Total amount of memory */
>> #define VIRTIO_BALLOON_S_AVAIL 6 /* Available memory as in /proc */
>> #define VIRTIO_BALLOON_S_CACHES 7 /* Disk caches */
>> -#define VIRTIO_BALLOON_S_NR 8
>> +#define VIRTIO_BALLOON_S_HTLB_PGALLOC 8 /* Hugetlb page allocations */
>> +#define VIRTIO_BALLOON_S_HTLB_PGFAIL 9 /* Hugetlb page allocation failures */
>> +#define VIRTIO_BALLOON_S_NR 10
>> /*
>> * Memory statistics structure.
>
> Not for this patch, but it looks to me that exporting such nr through uapi is fragile.
Sorry, can you explain what you mean here?
Jon
>
> Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v2] virtio_balloon: export hugetlb page allocation counts
From: Jason Wang @ 2018-03-20 2:31 UTC (permalink / raw)
To: Jonathan Helman, mst; +Cc: virtio-dev, linux-kernel, virtualization
In-Reply-To: <1521497654-24807-1-git-send-email-jonathan.helman@oracle.com>
On 2018年03月20日 06:14, Jonathan Helman wrote:
> Export the number of successful and failed hugetlb page
> allocations via the virtio balloon driver. These 2 counts
> come directly from the vm_events HTLB_BUDDY_PGALLOC and
> HTLB_BUDDY_PGALLOC_FAIL.
>
> Signed-off-by: Jonathan Helman <jonathan.helman@oracle.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/virtio/virtio_balloon.c | 6 ++++++
> include/uapi/linux/virtio_balloon.h | 4 +++-
> 2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index dfe5684..6b237e3 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -272,6 +272,12 @@ static unsigned int update_balloon_stats(struct virtio_balloon *vb)
> pages_to_bytes(events[PSWPOUT]));
> update_stat(vb, idx++, VIRTIO_BALLOON_S_MAJFLT, events[PGMAJFAULT]);
> update_stat(vb, idx++, VIRTIO_BALLOON_S_MINFLT, events[PGFAULT]);
> +#ifdef CONFIG_HUGETLB_PAGE
> + update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGALLOC,
> + events[HTLB_BUDDY_PGALLOC]);
> + update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGFAIL,
> + events[HTLB_BUDDY_PGALLOC_FAIL]);
> +#endif
> #endif
> update_stat(vb, idx++, VIRTIO_BALLOON_S_MEMFREE,
> pages_to_bytes(i.freeram));
> diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
> index 4e8b830..40297a3 100644
> --- a/include/uapi/linux/virtio_balloon.h
> +++ b/include/uapi/linux/virtio_balloon.h
> @@ -53,7 +53,9 @@ struct virtio_balloon_config {
> #define VIRTIO_BALLOON_S_MEMTOT 5 /* Total amount of memory */
> #define VIRTIO_BALLOON_S_AVAIL 6 /* Available memory as in /proc */
> #define VIRTIO_BALLOON_S_CACHES 7 /* Disk caches */
> -#define VIRTIO_BALLOON_S_NR 8
> +#define VIRTIO_BALLOON_S_HTLB_PGALLOC 8 /* Hugetlb page allocations */
> +#define VIRTIO_BALLOON_S_HTLB_PGFAIL 9 /* Hugetlb page allocation failures */
> +#define VIRTIO_BALLOON_S_NR 10
>
> /*
> * Memory statistics structure.
Not for this patch, but it looks to me that exporting such nr through
uapi is fragile.
Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH] ptr_ring: fix build
From: Jason Wang @ 2018-03-20 2:29 UTC (permalink / raw)
To: Michael S. Tsirkin, linux-kernel; +Cc: virtualization
In-Reply-To: <1521506496-455194-1-git-send-email-mst@redhat.com>
On 2018年03月20日 08:41, Michael S. Tsirkin wrote:
> Fixes after recent use of kvmalloc
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> tools/virtio/ringtest/ptr_ring.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/tools/virtio/ringtest/ptr_ring.c b/tools/virtio/ringtest/ptr_ring.c
> index 477899c..2d566fb 100644
> --- a/tools/virtio/ringtest/ptr_ring.c
> +++ b/tools/virtio/ringtest/ptr_ring.c
> @@ -17,6 +17,8 @@
> #define likely(x) (__builtin_expect(!!(x), 1))
> #define ALIGN(x, a) (((x) + (a) - 1) / (a) * (a))
> #define SIZE_MAX (~(size_t)0)
> +#define KMALLOC_MAX_SIZE SIZE_MAX
> +#define BUG_ON(x) assert(x)
>
> typedef pthread_spinlock_t spinlock_t;
>
> @@ -57,6 +59,9 @@ static void kfree(void *p)
> free(p);
> }
>
> +#define kvmalloc_array kmalloc_array
> +#define kvfree kfree
> +
> static void spin_lock_init(spinlock_t *lock)
> {
> int r = pthread_spin_init(lock, 0);
Acked-by: Jason Wang <jasowang@redhat.com>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* [PATCH] ptr_ring: fix build
From: Michael S. Tsirkin @ 2018-03-20 0:41 UTC (permalink / raw)
To: linux-kernel; +Cc: virtualization
Fixes after recent use of kvmalloc
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
tools/virtio/ringtest/ptr_ring.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/virtio/ringtest/ptr_ring.c b/tools/virtio/ringtest/ptr_ring.c
index 477899c..2d566fb 100644
--- a/tools/virtio/ringtest/ptr_ring.c
+++ b/tools/virtio/ringtest/ptr_ring.c
@@ -17,6 +17,8 @@
#define likely(x) (__builtin_expect(!!(x), 1))
#define ALIGN(x, a) (((x) + (a) - 1) / (a) * (a))
#define SIZE_MAX (~(size_t)0)
+#define KMALLOC_MAX_SIZE SIZE_MAX
+#define BUG_ON(x) assert(x)
typedef pthread_spinlock_t spinlock_t;
@@ -57,6 +59,9 @@ static void kfree(void *p)
free(p);
}
+#define kvmalloc_array kmalloc_array
+#define kvfree kfree
+
static void spin_lock_init(spinlock_t *lock)
{
int r = pthread_spin_init(lock, 0);
--
MST
^ permalink raw reply related
* [PATCH v2] virtio_balloon: export hugetlb page allocation counts
From: Jonathan Helman @ 2018-03-19 22:14 UTC (permalink / raw)
To: mst, jasowang; +Cc: virtio-dev, linux-kernel, virtualization
Export the number of successful and failed hugetlb page
allocations via the virtio balloon driver. These 2 counts
come directly from the vm_events HTLB_BUDDY_PGALLOC and
HTLB_BUDDY_PGALLOC_FAIL.
Signed-off-by: Jonathan Helman <jonathan.helman@oracle.com>
---
drivers/virtio/virtio_balloon.c | 6 ++++++
include/uapi/linux/virtio_balloon.h | 4 +++-
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index dfe5684..6b237e3 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -272,6 +272,12 @@ static unsigned int update_balloon_stats(struct virtio_balloon *vb)
pages_to_bytes(events[PSWPOUT]));
update_stat(vb, idx++, VIRTIO_BALLOON_S_MAJFLT, events[PGMAJFAULT]);
update_stat(vb, idx++, VIRTIO_BALLOON_S_MINFLT, events[PGFAULT]);
+#ifdef CONFIG_HUGETLB_PAGE
+ update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGALLOC,
+ events[HTLB_BUDDY_PGALLOC]);
+ update_stat(vb, idx++, VIRTIO_BALLOON_S_HTLB_PGFAIL,
+ events[HTLB_BUDDY_PGALLOC_FAIL]);
+#endif
#endif
update_stat(vb, idx++, VIRTIO_BALLOON_S_MEMFREE,
pages_to_bytes(i.freeram));
diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index 4e8b830..40297a3 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -53,7 +53,9 @@ struct virtio_balloon_config {
#define VIRTIO_BALLOON_S_MEMTOT 5 /* Total amount of memory */
#define VIRTIO_BALLOON_S_AVAIL 6 /* Available memory as in /proc */
#define VIRTIO_BALLOON_S_CACHES 7 /* Disk caches */
-#define VIRTIO_BALLOON_S_NR 8
+#define VIRTIO_BALLOON_S_HTLB_PGALLOC 8 /* Hugetlb page allocations */
+#define VIRTIO_BALLOON_S_HTLB_PGFAIL 9 /* Hugetlb page allocation failures */
+#define VIRTIO_BALLOON_S_NR 10
/*
* Memory statistics structure.
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH v5 0/7] jailhouse: Enhance secondary Jailhouse guest support /wrt PCI
From: Thomas Gleixner @ 2018-03-19 19:50 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: jailhouse-dev, Juergen Gross, Benedikt Spranger, Jan Kiszka, x86,
linux-pci, Linux Kernel Mailing List, Rob Herring, virtualization,
Andy Shevchenko, Ingo Molnar, H . Peter Anvin, Bjorn Helgaas,
Otavio Pontes, Mark Rutland
In-Reply-To: <20180319194813.GJ77194@bhelgaas-glaptop.roam.corp.google.com>
Bjorn,
On Mon, 19 Mar 2018, Bjorn Helgaas wrote:
> On Wed, Mar 07, 2018 at 08:39:11AM +0100, Jan Kiszka wrote:
> > Basic x86 support [1] for running Linux as secondary Jailhouse [2] guest
> > is currently pending in the tip tree. This builds on top and enhances
> > the PCI support for x86 and also ARM guests (ARM[64] does not require
> > platform patches and works already).
> > ...
>
> Hi Jan,
>
> What tree do you plan to merge this through? I was sort of assuming
> x86, since the main point of jailhouse is not PCI, but I do notice
> most of the changes here are PCI-related, so thought I should make
> sure you're not waiting on me.
I picked it up already. It's sitting in tip x86/platform
Thanks,
tglx
^ permalink raw reply
* Re: get_user_pages returning 0 (was Re: kernel BUG at drivers/vhost/vhost.c:LINE!)
From: Dmitry Vyukov via Virtualization @ 2018-03-19 18:18 UTC (permalink / raw)
To: dsterba, Michael S. Tsirkin, syzbot, Michel Lespinasse,
syzkaller-bugs, Linux-MM, Andrew Morton, virtualization,
Andrea Arcangeli, Jason Wang, KVM list, LKML, netdev
In-Reply-To: <20180319152936.GI6955@suse.cz>
On Mon, Mar 19, 2018 at 4:29 PM, David Sterba <dsterba@suse.cz> wrote:
> On Mon, Mar 19, 2018 at 05:09:28PM +0200, Michael S. Tsirkin wrote:
>> Hello!
>> The following code triggered by syzbot
>>
>> r = get_user_pages_fast(log, 1, 1, &page);
>> if (r < 0)
>> return r;
>> BUG_ON(r != 1);
>>
>> Just looking at get_user_pages_fast's documentation this seems
>> impossible - it is supposed to only ever return # of pages
>> pinned or errno.
>>
>> However, poking at code, I see at least one path that might cause this:
>>
>> ret = faultin_page(tsk, vma, start, &foll_flags,
>> nonblocking);
>> switch (ret) {
>> case 0:
>> goto retry;
>> case -EFAULT:
>> case -ENOMEM:
>> case -EHWPOISON:
>> return i ? i : ret;
>> case -EBUSY:
>> return i;
>>
>> which originally comes from:
>>
>> commit 53a7706d5ed8f1a53ba062b318773160cc476dde
>> Author: Michel Lespinasse <walken@google.com>
>> Date: Thu Jan 13 15:46:14 2011 -0800
>>
>> mlock: do not hold mmap_sem for extended periods of time
>>
>> __get_user_pages gets a new 'nonblocking' parameter to signal that the
>> caller is prepared to re-acquire mmap_sem and retry the operation if
>> needed. This is used to split off long operations if they are going to
>> block on a disk transfer, or when we detect contention on the mmap_sem.
>>
>> [akpm@linux-foundation.org: remove ref to rwsem_is_contended()]
>> Signed-off-by: Michel Lespinasse <walken@google.com>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: Rik van Riel <riel@redhat.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Nick Piggin <npiggin@kernel.dk>
>> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> Cc: Ingo Molnar <mingo@elte.hu>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: David Howells <dhowells@redhat.com>
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>>
>> I started looking into this, if anyone has any feedback meanwhile,
>> that would be appreciated.
>>
>> In particular I don't really see why would this trigger
>> on commit 8f5fd927c3a7576d57248a2d7a0861c3f2795973:
>>
>> Merge: 8757ae2 093e037
>> Author: Linus Torvalds <torvalds@linux-foundation.org>
>> Date: Fri Mar 16 13:37:42 2018 -0700
>>
>> Merge tag 'for-4.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
>>
>> is btrfs used on these systems?
>
> There were 3 patches pulled by that tag, none of them is even remotely
> related to the reported bug, AFAICS. If there's some impact, it must be
> indirect, obvious bugs like NULL pointer would exhibit in a different
> way and leave at least some trace in the stacks.
That is just a commit on which the bug was hit. It's provided so that
developers can make sense out of line numbers and check if the tree
includes/not includes a particular commit, etc. It's not that that
commit introduced the bug.
^ permalink raw reply
* get_user_pages returning 0 (was Re: kernel BUG at drivers/vhost/vhost.c:LINE!)
From: Michael S. Tsirkin @ 2018-03-19 15:09 UTC (permalink / raw)
To: syzbot
Cc: aarcange, kvm, netdev, syzkaller-bugs, linux-kernel,
virtualization, linux-mm, David Sterba, Andrew Morton,
Michel Lespinasse
In-Reply-To: <001a11427716098c150567bcd12f@google.com>
Hello!
The following code triggered by syzbot
r = get_user_pages_fast(log, 1, 1, &page);
if (r < 0)
return r;
BUG_ON(r != 1);
Just looking at get_user_pages_fast's documentation this seems
impossible - it is supposed to only ever return # of pages
pinned or errno.
However, poking at code, I see at least one path that might cause this:
ret = faultin_page(tsk, vma, start, &foll_flags,
nonblocking);
switch (ret) {
case 0:
goto retry;
case -EFAULT:
case -ENOMEM:
case -EHWPOISON:
return i ? i : ret;
case -EBUSY:
return i;
which originally comes from:
commit 53a7706d5ed8f1a53ba062b318773160cc476dde
Author: Michel Lespinasse <walken@google.com>
Date: Thu Jan 13 15:46:14 2011 -0800
mlock: do not hold mmap_sem for extended periods of time
__get_user_pages gets a new 'nonblocking' parameter to signal that the
caller is prepared to re-acquire mmap_sem and retry the operation if
needed. This is used to split off long operations if they are going to
block on a disk transfer, or when we detect contention on the mmap_sem.
[akpm@linux-foundation.org: remove ref to rwsem_is_contended()]
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I started looking into this, if anyone has any feedback meanwhile,
that would be appreciated.
In particular I don't really see why would this trigger
on commit 8f5fd927c3a7576d57248a2d7a0861c3f2795973:
Merge: 8757ae2 093e037
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri Mar 16 13:37:42 2018 -0700
Merge tag 'for-4.16-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
is btrfs used on these systems?
syzbot output below:
------------------------
Hello,
syzbot hit the following crash on upstream commit
8f5fd927c3a7576d57248a2d7a0861c3f2795973 (Fri Mar 16 20:37:42 2018 +0000)
Merge tag 'for-4.16-rc5-tag' of
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
So far this crash happened 2 times on upstream.
C reproducer is attached.
syzkaller reproducer is attached.
Raw console output is attached.
compiler: gcc (GCC) 7.1.1 20170620
.config is attached.
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+6304bf97ef436580fede@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for
details.
If you forward the report, please keep this part and the footer.
audit: type=1400 audit(1521377060.016:6): avc: denied { map } for
pid=4210 comm="bash" path="/bin/bash" dev="sda1" ino=1457
scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
tcontext=system_u:object_r:file_t:s0 tclass=file permissive=1
audit: type=1400 audit(1521377077.866:7): avc: denied { map } for
pid=4228 comm="syzkaller050160" path="/root/syzkaller050160487" dev="sda1"
ino=16481 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
------------[ cut here ]------------
kernel BUG at drivers/vhost/vhost.c:1655!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 4228 Comm: syzkaller050160 Not tainted 4.16.0-rc5+ #357
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:set_bit_to_user drivers/vhost/vhost.c:1655 [inline]
RIP: 0010:log_write+0x3ca/0x490 drivers/vhost/vhost.c:1679
RSP: 0018:ffff8801b0fa77b0 EFLAGS: 00010293
RAX: ffff8801af534240 RBX: dffffc0000000000 RCX: ffffffff8443f50a
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8801af535618
RBP: ffff8801b0fa78f0 R08: 0000000000000040 R09: 0000000000000001
R10: ffff8801b0fa76d0 R11: 0000000000000002 R12: 0001ffffffffffff
R13: ffffed00361f4f09 R14: ffff8801b0fa78c8 R15: ffff8801b0fa7848
FS: 00000000007df880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020d7c000 CR3: 00000001d3e5b005 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
vhost_update_used_flags+0x379/0x480 drivers/vhost/vhost.c:1726
vhost_vq_init_access+0xca/0x540 drivers/vhost/vhost.c:1766
vhost_net_set_backend drivers/vhost/net.c:1166 [inline]
vhost_net_ioctl+0xee0/0x1920 drivers/vhost/net.c:1320
vfs_ioctl fs/ioctl.c:46 [inline]
do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
SYSC_ioctl fs/ioctl.c:701 [inline]
SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x43ff09
RSP: 002b:00007ffe94d57fc8 EFLAGS: 00000207 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043ff09
RDX: 0000000020d7c000 RSI: 000000004008af30 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8
R10: 00000000004002c8 R11: 0000000000000207 R12: 0000000000401830
R13: 00000000004018c0 R14: 0000000000000000 R15: 0000000000000000
Code: 5e 41 5f 5d c3 31 c0 eb a6 e8 b3 22 2d fd 4c 89 ef e8 5b bb 4d fd 4c
89 f8 48 c1 e8 03 c6 04 18 f8 e9 3a ff ff ff e8 96 22 2d fd <0f> 0b e8 8f 22
2d fd 4d 8d 6c 24 ff e9 89 fe ff ff e8 80 22 2d
RIP: set_bit_to_user drivers/vhost/vhost.c:1655 [inline] RSP:
ffff8801b0fa77b0
RIP: log_write+0x3ca/0x490 drivers/vhost/vhost.c:1679 RSP: ffff8801b0fa77b0
---[ end trace 867ce9e35847b153 ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..
[....] Starting enhanced syslogd: rsyslogd[ 15.400171] audit: type=1400 audit(1521377057.354:5): avc: denied { syslog } for pid=4071 comm="rsyslogd" capability=34 scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=capability2 permissive=1
^[[?25l^[[?1c^[7^[[1G[^[[32m ok ^[[39;49m^[8^[[?25h^[[?0c.
Starting mcstransd:
[....] Starting periodic command scheduler: cron^[[?25l^[[?1c^[7^[[1G[^[[32m ok ^[[39;49m^[8^[[?25h^[[?0c.
[....] Starting OpenBSD Secure Shell server: sshd^[[?25l^[[?1c^[7^[[1G[^[[32m ok ^[[39;49m^[8^[[?25h^[[?0c.
[....] Starting file context maintaining daemon: restorecond^[[?25l^[[?1c^[7^[[1G[^[[32m ok ^[[39;49m^[8^[[?25h^[[?0c.
Debian GNU/Linux 7 syzkaller ttyS0
syzkaller login: [ 18.063012] audit: type=1400 audit(1521377060.016:6): avc: denied { map } for pid=4210 comm="bash" path="/bin/bash" dev="sda1" ino=1457 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 tcontext=system_u:object_r:file_t:s0 tclass=file permissive=1
Warning: Permanently added '10.128.0.48' (ECDSA) to the list of known hosts.
executing program
[ 35.912387] audit: type=1400 audit(1521377077.866:7): avc: denied { map } for pid=4228 comm="syzkaller050160" path="/root/syzkaller050160487" dev="sda1" ino=16481 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
[ 35.918516] ------------[ cut here ]------------
[ 35.943043] kernel BUG at drivers/vhost/vhost.c:1655!
[ 35.948327] invalid opcode: 0000 [#1] SMP KASAN
[ 35.952967] Dumping ftrace buffer:
[ 35.956472] (ftrace buffer empty)
[ 35.960152] Modules linked in:
[ 35.963316] CPU: 1 PID: 4228 Comm: syzkaller050160 Not tainted 4.16.0-rc5+ #357
[ 35.970729] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 35.980057] RIP: 0010:log_write+0x3ca/0x490
[ 35.984344] RSP: 0018:ffff8801b0fa77b0 EFLAGS: 00010293
[ 35.989677] RAX: ffff8801af534240 RBX: dffffc0000000000 RCX: ffffffff8443f50a
[ 35.996918] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8801af535618
[ 36.004155] RBP: ffff8801b0fa78f0 R08: 0000000000000040 R09: 0000000000000001
[ 36.011392] R10: ffff8801b0fa76d0 R11: 0000000000000002 R12: 0001ffffffffffff
[ 36.018632] R13: ffffed00361f4f09 R14: ffff8801b0fa78c8 R15: ffff8801b0fa7848
[ 36.025871] FS: 00000000007df880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
[ 36.034065] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 36.039915] CR2: 0000000020d7c000 CR3: 00000001d3e5b005 CR4: 00000000001606e0
[ 36.047156] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 36.054393] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 36.061631] Call Trace:
[ 36.064192] ? copy_overflow+0x30/0x30
[ 36.068047] ? translate_desc+0x2bd/0x590
[ 36.072170] vhost_update_used_flags+0x379/0x480
[ 36.076895] vhost_vq_init_access+0xca/0x540
[ 36.081276] vhost_net_ioctl+0xee0/0x1920
[ 36.085397] ? vhost_net_stop_vq+0xf0/0xf0
[ 36.089603] ? avc_ss_reset+0x110/0x110
[ 36.093547] ? __handle_mm_fault+0x5ba/0x38c0
[ 36.098012] ? __pmd_alloc+0x4e0/0x4e0
[ 36.101869] ? trace_hardirqs_off+0x10/0x10
[ 36.106161] ? __fd_install+0x25f/0x740
[ 36.110106] ? find_held_lock+0x35/0x1d0
[ 36.114145] ? check_same_owner+0x320/0x320
[ 36.118438] ? rcu_note_context_switch+0x710/0x710
[ 36.123334] ? __do_page_fault+0x5f7/0xc90
[ 36.127540] ? vhost_net_stop_vq+0xf0/0xf0
[ 36.131742] do_vfs_ioctl+0x1b1/0x1520
[ 36.135601] ? ioctl_preallocate+0x2b0/0x2b0
[ 36.139979] ? selinux_capable+0x40/0x40
[ 36.144009] ? up_read+0x1a/0x40
[ 36.147349] ? security_file_ioctl+0x7d/0xb0
[ 36.151727] ? security_file_ioctl+0x89/0xb0
[ 36.156104] SyS_ioctl+0x8f/0xc0
[ 36.159436] ? do_vfs_ioctl+0x1520/0x1520
[ 36.163557] do_syscall_64+0x281/0x940
[ 36.167412] ? __do_page_fault+0xc90/0xc90
[ 36.171615] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 36.176338] ? syscall_return_slowpath+0x550/0x550
[ 36.181233] ? syscall_return_slowpath+0x2ac/0x550
[ 36.186128] ? prepare_exit_to_usermode+0x350/0x350
[ 36.191113] ? retint_user+0x18/0x18
[ 36.194799] ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 36.199613] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 36.204772] RIP: 0033:0x43ff09
[ 36.207934] RSP: 002b:00007ffe94d57fc8 EFLAGS: 00000207 ORIG_RAX: 0000000000000010
[ 36.215608] RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043ff09
[ 36.222847] RDX: 0000000020d7c000 RSI: 000000004008af30 RDI: 0000000000000003
[ 36.230085] RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8
[ 36.237324] R10: 00000000004002c8 R11: 0000000000000207 R12: 0000000000401830
[ 36.244563] R13: 00000000004018c0 R14: 0000000000000000 R15: 0000000000000000
[ 36.251810] Code: 5e 41 5f 5d c3 31 c0 eb a6 e8 b3 22 2d fd 4c 89 ef e8 5b bb 4d fd 4c 89 f8 48 c1 e8 03 c6 04 18 f8 e9 3a ff ff ff e8 96 22 2d fd <0f> 0b e8 8f 22 2d fd 4d 8d 6c 24 ff e9 89 fe ff ff e8 80 22 2d
[ 36.270877] RIP: log_write+0x3ca/0x490 RSP: ffff8801b0fa77b0
[ 36.276691] ---[ end trace 867ce9e35847b153 ]---
[ 36.281425] Kernel panic - not syncing: Fatal exception
[ 36.287129] Dumping ftrace buffer:
[ 36.290635] (ftrace buffer empty)
[ 36.294312] Kernel Offset: disabled
[ 36.297909] Rebooting in 86400 seconds..
# See https://goo.gl/kgGztJ for information about syzkaller reproducers.
#{Threaded:false Collide:false Repeat:false Procs:1 Sandbox: Fault:false FaultCall:-1 FaultNth:0 EnableTun:false UseTmpDir:false HandleSegv:false WaitRepeat:false Debug:false Repro:false}
r0 = openat$vnet(0xffffffffffffff9c, &(0x7f00002ac000)='/dev/vhost-net\x00', 0x2, 0x0)
ioctl$int_in(r0, 0x40000000af01, &(0x7f0000000040))
r1 = openat$audio(0xffffffffffffff9c, &(0x7f0000000180)='/dev/audio\x00', 0x0, 0x0)
close(r1)
socket$packet(0x11, 0x3, 0x300)
ioctl$VHOST_SET_VRING_ADDR(r0, 0x4028af11, &(0x7f0000000500)={0x0, 0x1, &(0x7f0000000740)=""/142, &(0x7f00000003c0)=""/69, &(0x7f0000000140)=""/14, 0xfffffffffffffffc})
ioctl$VHOST_SET_FEATURES(r0, 0x4008af00, &(0x7f0000000640)=0x200000000)
write$vnet(r0, &(0x7f0000000580)={0x1, {&(0x7f00000001c0)=""/219, 0x34c, &(0x7f0000000480)=""/98, 0xffffffffffffffff, 0x2}}, 0x68)
ioctl$VHOST_NET_SET_BACKEND(r0, 0x4008af30, &(0x7f0000d7c000)={0x0, r1})
// autogenerated by syzkaller (http://github.com/google/syzkaller)
#define _GNU_SOURCE
#include <endian.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
uint64_t r[2] = {0xffffffffffffffff, 0xffffffffffffffff};
void loop()
{
long res;
memcpy((void*)0x202ac000, "/dev/vhost-net", 15);
res = syscall(__NR_openat, 0xffffffffffffff9c, 0x202ac000, 2, 0);
if (res != -1)
r[0] = res;
*(uint64_t*)0x20000040 = 0;
syscall(__NR_ioctl, r[0], 0x40000000af01, 0x20000040);
memcpy((void*)0x20000180, "/dev/audio", 11);
res = syscall(__NR_openat, 0xffffffffffffff9c, 0x20000180, 0, 0);
if (res != -1)
r[1] = res;
syscall(__NR_close, r[1]);
syscall(__NR_socket, 0x11, 3, 0x300);
*(uint32_t*)0x20000500 = 0;
*(uint32_t*)0x20000504 = 1;
*(uint64_t*)0x20000508 = 0x20000740;
*(uint64_t*)0x20000510 = 0x200003c0;
*(uint64_t*)0x20000518 = 0x20000140;
*(uint64_t*)0x20000520 = 0xfffffffffffffffc;
syscall(__NR_ioctl, r[0], 0x4028af11, 0x20000500);
*(uint64_t*)0x20000640 = 0x200000000;
syscall(__NR_ioctl, r[0], 0x4008af00, 0x20000640);
*(uint32_t*)0x20000580 = 1;
*(uint64_t*)0x20000588 = 0x200001c0;
*(uint64_t*)0x20000590 = 0x34c;
*(uint64_t*)0x20000598 = 0x20000480;
*(uint8_t*)0x200005a0 = -1;
*(uint8_t*)0x200005a1 = 2;
*(uint64_t*)0x200005a8 = 0;
*(uint64_t*)0x200005b0 = 0;
*(uint64_t*)0x200005b8 = 0;
*(uint64_t*)0x200005c0 = 0;
*(uint64_t*)0x200005c8 = 0;
*(uint64_t*)0x200005d0 = 0;
*(uint64_t*)0x200005d8 = 0;
*(uint64_t*)0x200005e0 = 0;
syscall(__NR_write, r[0], 0x20000580, 0x68);
*(uint32_t*)0x20d7c000 = 0;
*(uint32_t*)0x20d7c004 = r[1];
syscall(__NR_ioctl, r[0], 0x4008af30, 0x20d7c000);
}
int main()
{
syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
loop();
return 0;
}
#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.16.0-rc5 Kernel Configuration
#
.. skipped ...
---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkaller@googlegroups.com.
syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug
report.
Note: all commands must start from beginning of the line in the email body.
^ permalink raw reply
* RE: [virtio-dev] [RFC] virtio-iommu version 0.6
From: Tian, Kevin @ 2018-03-19 10:03 UTC (permalink / raw)
To: Jean-Philippe Brucker, virtio-dev@lists.oasis-open.org,
virtualization@lists.linux-foundation.org
Cc: eric.auger@redhat.com, eric.auger.pro@gmail.com
In-Reply-To: <20180206181106.21815-1-jean-philippe.brucker@arm.com>
> From: Jean-Philippe Brucker
> Sent: Wednesday, February 7, 2018 2:11 AM
>
> Please find version 0.6 of the virtio-iommu specification at the following
> locations.
>
> Document: http://jpbrucker.net/virtio-iommu/spec/virtio-iommu.pdf
> http://jpbrucker.net/virtio-iommu/spec/virtio-iommu.html
>
> Sources: git://linux-arm.org/virtio-iommu.git viommu/v0.6
>
> I didn't receive any comment for v0.5 [1], so this update is fairly light:
> * Changed range definition in map and unmap requests
> * Use virtio-iommu device ID
> * Update IORT node ID and simplify layout
>
> The following document shows the difference between v0.5 and v0.6:
> http://jpbrucker.net/virtio-iommu/spec/diffs/virtio-iommu-pdf-diff-v0.5-
> v0.6.pdf
>
> Next version will hopefully add vSVA (PASID+PRI) and/or architectural
> optimizations, but I can't provide a timeline yet. I'll probably send a
> small draft first.
>
> I will send the Linux driver soon. In the meantime you can fetch it on my
> development branches, based on next-20180206.
>
> git://linux-arm.org/linux-jpb.git virtio-iommu/devel
> git://linux-arm.org/kvmtool-jpb.git virtio-iommu/devel
>
> Any comments welcome!
>
> Thanks,
> Jean
>
> [1] https://www.spinics.net/lists/kvm/msg157402.html
some comments here:
--
BYPASS feature bit is not covered in "2.3.1/2.3.2/2.3.3"". Is it
intended?
--2.6.2 Device Requirements: Device operations--
"If the VIRTIO_IOMMU_F_INPUT_RANGE feature is offered,
the device MUST truncate the range described by virt_start
and virt_end in requests to ft in the range described by
input_range."
"If the VIRTIO_IOMMU_F_DOMAIN_BITS is offered, the device
MUST ignore bits above domain_bits in field domain of requests."
shouldn't device return error upon above situation instead
of continuing operation with modified value?
--2.6.4 DETACH request--
" Detach an endpoint from its domain. When this request
completes, the endpoint cannot access any mapping from that
domain anymore "
Does it make sense to mention BYPASS situation upon this request?
--2.6.8.2 Property RESV_MEM --
"VIRTIO_IOMMU_RESV_MEM_T_RESERVED (0) Accesses to
virtual addresses in this region are not translated by the device.
They may either be aborted by the device (or the underlying
IOMMU), bypass it, or never even reach it"
in 3.3 hardware device assignment you described an example
that a reserved range is stolen by host to map the MSI
doorbell... such map behavior is not described above.
Then comes a question for VIRTIO_IOMMU_RESV_MEM_T_MSI.
I know there were quite some discussion around this flag before,
but my mental picture now still is a bit difficult to understand its
usage based on examples in implementation notes:
- for x86, you describe it as indicating MSI bypass for
oxfeexxxxx. However guest doesn't need to know this fact. Only
requirement is to treat it as reserved range (as on bare metal)
then T_RESERVED is sufficient for this purpose
- for ARM, either let guest or let host to choose a virtual
address for mapping to MSI doorbell. the former doesn't require
a reserved range. for the latter also T_RESERVED is enough as
the example in hardware device assignment section.
Then what's the real point of differentiating MSI bypass from
normal reserved range? Is there other example where guest
may use such info to do special thing?
-- 3.2 Message Signaled Interrupts --
" Section 3.2.2 describes the added complexity (from the host
point of view) of translating the IRQ chip doorbell "
however later 3.2.2 only says one sentence about host
complexity - " However, this mode of operations may add
significant complexity in the host implementation". If you plan
to elaborate that part in the future, please add a note. :-)
Thanks
Kevin
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH] crypto: virtio - remove dependency on CRYPTO_AUTHENC
From: Herbert Xu @ 2018-03-16 15:56 UTC (permalink / raw)
To: Peter Wu; +Cc: virtualization, linux-crypto, Michael S . Tsirkin
In-Reply-To: <20180306235315.11550-1-peter@lekensteyn.nl>
On Wed, Mar 07, 2018 at 12:53:15AM +0100, Peter Wu wrote:
> virtio_crypto does not use function crypto_authenc_extractkeys, remove
> this unnecessary dependency. Compiles fine and passes cryptodev-linux
> cipher and speed tests from https://wiki.qemu.org/Features/VirtioCrypto
>
> Fixes: dbaf0624ffa5 ("crypto: add virtio-crypto driver")
> Signed-off-by: Peter Wu <peter@lekensteyn.nl>
Patch applied. Thanks.
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH RFC 2/2] virtio_ring: support packed ring
From: Michael S. Tsirkin @ 2018-03-16 14:30 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, linux-kernel, virtualization, wexu
In-Reply-To: <094ca28b-d8af-bf7a-ea7e-0d0bf7518bda@redhat.com>
On Fri, Mar 16, 2018 at 07:36:47PM +0800, Jason Wang wrote:
> > > @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
> > > > > > > > if (!queue) {
> > > > > > > > /* Try to get a single page. You are my only hope! */
> > > > > > > > - queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> > > > > > > > + queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> > > > > > > > + packed),
> > > > > > > > &dma_addr, GFP_KERNEL|__GFP_ZERO);
> > > > > > > > }
> > > > > > > > if (!queue)
> > > > > > > > return NULL;
> > > > > > > > - queue_size_in_bytes = vring_size(num, vring_align);
> > > > > > > > - vring_init(&vring, num, queue, vring_align);
> > > > > > > > + queue_size_in_bytes = __vring_size(num, vring_align, packed);
> > > > > > > > + if (packed)
> > > > > > > > + vring_packed_init(&vring.vring_packed, num, queue, vring_align);
> > > > > > > > + else
> > > > > > > > + vring_init(&vring.vring_split, num, queue, vring_align);
> > > > > > > Let's rename vring_init to vring_init_split() like other helpers?
> > > > > > The vring_init() is a public API in include/uapi/linux/virtio_ring.h.
> > > > > > I don't think we can rename it.
> > > > > I see, then this need more thoughts to unify the API.
> > > > My thought is to keep the old API as is, and introduce
> > > > new types and helpers for packed ring.
> > > I admit it's not a fault of this patch. But we'd better think of this in the
> > > future, consider we may have new kinds of ring.
> > >
> > > > More details can be found in this patch:
> > > > https://lkml.org/lkml/2018/2/23/243
> > > > (PS. The type which has bit fields is just for reference,
> > > > and will be changed in next version.)
> > > >
> > > > Do you have any other suggestions?
> > > No.
> > Hmm.. Sorry, I didn't describe my question well.
> > I mean do you have any suggestions about the API
> > design for packed ring in uapi header? Currently
> > I introduced below two new helpers:
> >
> > static inline void vring_packed_init(struct vring_packed *vr, unsigned int num,
> > void *p, unsigned long align);
> > static inline unsigned vring_packed_size(unsigned int num, unsigned long align);
> >
> > When new rings are introduced in the future, above
> > helpers can't be reused. Maybe we should make the
> > helpers be able to determine the ring type?
>
> Let's wait for Michael's comment here. Generally, I fail to understand why
> vring_init() become a part of uapi. Git grep shows the only use cases are
> virtio_test/vringh_test.
>
> Thanks
For init - I think it's a mistake that stems from lguest which sometimes
made it less than obvious which code is where. I don't see a reason to
add to it.
--
MST
^ permalink raw reply
* Re: [PATCH RFC 2/2] virtio_ring: support packed ring
From: Jason Wang @ 2018-03-16 11:36 UTC (permalink / raw)
To: Tiwei Bie; +Cc: mst, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <20180316100413.vtqwatregzrmhvt3@debian>
On 2018年03月16日 18:04, Tiwei Bie wrote:
> On Fri, Mar 16, 2018 at 04:34:28PM +0800, Jason Wang wrote:
>> On 2018年03月16日 15:40, Tiwei Bie wrote:
>>> On Fri, Mar 16, 2018 at 02:44:12PM +0800, Jason Wang wrote:
>>>> On 2018年03月16日 14:10, Tiwei Bie wrote:
>>>>> On Fri, Mar 16, 2018 at 12:03:25PM +0800, Jason Wang wrote:
>>>>>> On 2018年02月23日 19:18, Tiwei Bie wrote:
>>>>>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>>>>>> ---
>>>>>>> drivers/virtio/virtio_ring.c | 699 +++++++++++++++++++++++++++++++++++++------
>>>>>>> include/linux/virtio_ring.h | 8 +-
>>>>>>> 2 files changed, 618 insertions(+), 89 deletions(-)
>>>
[...]
>>>>>>> + }
>>>>>>> + }
>>>>>>> + for (; n < (out_sgs + in_sgs); n++) {
>>>>>>> + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
>>>>>>> + dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
>>>>>>> + if (vring_mapping_error(vq, addr))
>>>>>>> + goto unmap_release;
>>>>>>> +
>>>>>>> + flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
>>>>>>> + VRING_DESC_F_WRITE |
>>>>>>> + VRING_DESC_F_AVAIL(vq->wrap_counter) |
>>>>>>> + VRING_DESC_F_USED(!vq->wrap_counter));
>>>>>>> + if (!indirect && i == head)
>>>>>>> + head_flags = flags;
>>>>>>> + else
>>>>>>> + desc[i].flags = flags;
>>>>>>> +
>>>>>>> + desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
>>>>>>> + desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
>>>>>>> + desc[i].id = cpu_to_virtio32(_vq->vdev, head);
>>>>>>> + prev = i;
>>>>>>> + i++;
>>>>>>> + if (!indirect && i >= vq->vring_packed.num) {
>>>>>>> + i = 0;
>>>>>>> + vq->wrap_counter ^= 1;
>>>>>>> + }
>>>>>>> + }
>>>>>>> + }
>>>>>>> + /* Last one doesn't continue. */
>>>>>>> + if (!indirect && (head + 1) % vq->vring_packed.num == i)
>>>>>>> + head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
>>>>>> I can't get the why we need this here.
>>>>> If only one desc is used, we will need to clear the
>>>>> VRING_DESC_F_NEXT flag from the head_flags.
>>>> Yes, I meant why following desc[prev].flags won't work for this?
>>> Because the update of desc[head].flags (in above case,
>>> prev == head) has been delayed. The flags is saved in
>>> head_flags.
>> Ok, but let's try to avoid modular here e.g tracking the number of sgs in a
>> counter.
>>
>> And I see lots of duplication in the above two loops, I believe we can unify
>> them with a a single loop. the only difference is dma direction and write
>> flag.
> The above implementation for packed ring is basically
> an mirror of the existing implementation in split ring
> as I want to keep the coding style consistent. Below
> is the corresponding code in split ring:
>
> static inline int virtqueue_add(struct virtqueue *_vq,
> struct scatterlist *sgs[],
> unsigned int total_sg,
> unsigned int out_sgs,
> unsigned int in_sgs,
> void *data,
> void *ctx,
> gfp_t gfp)
> {
> ......
>
> for (n = 0; n < out_sgs; n++) {
> for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
> if (vring_mapping_error(vq, addr))
> goto unmap_release;
>
> desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT);
> desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> prev = i;
> i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> }
> }
> for (; n < (out_sgs + in_sgs); n++) {
> for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
> if (vring_mapping_error(vq, addr))
> goto unmap_release;
>
> desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT | VRING_DESC_F_WRITE);
> desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> prev = i;
> i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> }
> }
>
> ......
> }
There's no need for such consistency especially consider it's a new kind
of ring. Anyway, you can stick to it.
[...]
>>>>>>> + } else
>>>>>>> + vq->free_head = i;
>>>>>> ID is only valid in the last descriptor in the list, so head + 1 should be
>>>>>> ok too?
>>>>> I don't really get your point. The vq->free_head stores
>>>>> the index of the next avail desc.
>>>> I think I get your idea now, free_head has two meanings:
>>>>
>>>> - next avail index
>>>> - buffer id
>>> In my design, free_head is just the index of the next
>>> avail desc.
>>>
>>> Driver can set anything to buffer ID.
>> Then you need another method to track id to context e.g hashing.
> I keep the context in desc_state[desc_idx]. So there is
> no extra method needed to track the context.
Well, it works for this patch but my reply was for "set anything to
buffer ID". The size of desc_state is limited, so in fact you can't use
a value greater than vq.num.
>
[...]
>> @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
>>>>>>> if (!queue) {
>>>>>>> /* Try to get a single page. You are my only hope! */
>>>>>>> - queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
>>>>>>> + queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
>>>>>>> + packed),
>>>>>>> &dma_addr, GFP_KERNEL|__GFP_ZERO);
>>>>>>> }
>>>>>>> if (!queue)
>>>>>>> return NULL;
>>>>>>> - queue_size_in_bytes = vring_size(num, vring_align);
>>>>>>> - vring_init(&vring, num, queue, vring_align);
>>>>>>> + queue_size_in_bytes = __vring_size(num, vring_align, packed);
>>>>>>> + if (packed)
>>>>>>> + vring_packed_init(&vring.vring_packed, num, queue, vring_align);
>>>>>>> + else
>>>>>>> + vring_init(&vring.vring_split, num, queue, vring_align);
>>>>>> Let's rename vring_init to vring_init_split() like other helpers?
>>>>> The vring_init() is a public API in include/uapi/linux/virtio_ring.h.
>>>>> I don't think we can rename it.
>>>> I see, then this need more thoughts to unify the API.
>>> My thought is to keep the old API as is, and introduce
>>> new types and helpers for packed ring.
>> I admit it's not a fault of this patch. But we'd better think of this in the
>> future, consider we may have new kinds of ring.
>>
>>> More details can be found in this patch:
>>> https://lkml.org/lkml/2018/2/23/243
>>> (PS. The type which has bit fields is just for reference,
>>> and will be changed in next version.)
>>>
>>> Do you have any other suggestions?
>> No.
> Hmm.. Sorry, I didn't describe my question well.
> I mean do you have any suggestions about the API
> design for packed ring in uapi header? Currently
> I introduced below two new helpers:
>
> static inline void vring_packed_init(struct vring_packed *vr, unsigned int num,
> void *p, unsigned long align);
> static inline unsigned vring_packed_size(unsigned int num, unsigned long align);
>
> When new rings are introduced in the future, above
> helpers can't be reused. Maybe we should make the
> helpers be able to determine the ring type?
Let's wait for Michael's comment here. Generally, I fail to understand
why vring_init() become a part of uapi. Git grep shows the only use
cases are virtio_test/vringh_test.
Thanks
>
> Best regards,
> Tiwei Bie
>
>> Thanks
>>
>>> Best regards,
>>> Tiwei Bie
>>>
>>>>>>> - vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
>>>>>>> - notify, callback, name);
>>>>>>> + vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
>>>>>>> + context, notify, callback, name);
>>>>>>> if (!vq) {
>>>>>>> vring_free_queue(vdev, queue_size_in_bytes, queue,
>>>>>>> dma_addr);
>>> [...]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH RFC 2/2] virtio_ring: support packed ring
From: Tiwei Bie @ 2018-03-16 10:04 UTC (permalink / raw)
To: Jason Wang; +Cc: mst, netdev, linux-kernel, virtualization, wexu
In-Reply-To: <02a3da02-8226-ba4e-1b47-d25755b2c429@redhat.com>
On Fri, Mar 16, 2018 at 04:34:28PM +0800, Jason Wang wrote:
> On 2018年03月16日 15:40, Tiwei Bie wrote:
> > On Fri, Mar 16, 2018 at 02:44:12PM +0800, Jason Wang wrote:
> > > On 2018年03月16日 14:10, Tiwei Bie wrote:
> > > > On Fri, Mar 16, 2018 at 12:03:25PM +0800, Jason Wang wrote:
> > > > > On 2018年02月23日 19:18, Tiwei Bie wrote:
> > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > ---
> > > > > > drivers/virtio/virtio_ring.c | 699 +++++++++++++++++++++++++++++++++++++------
> > > > > > include/linux/virtio_ring.h | 8 +-
> > > > > > 2 files changed, 618 insertions(+), 89 deletions(-)
> > [...]
> > > > > > cpu_addr, size, direction);
> > > > > > }
> > > > > > -static void vring_unmap_one(const struct vring_virtqueue *vq,
> > > > > > - struct vring_desc *desc)
> > > > > > +static void vring_unmap_one(const struct vring_virtqueue *vq, void *_desc)
> > > > > > {
> > > > > Let's split the helpers to packed/split version like other helpers?
> > > > > (Consider the caller has already known the type of vq).
> > > > Okay.
> > > >
> > > [...]
> > >
> > > > > > + desc[i].flags = flags;
> > > > > > +
> > > > > > + desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> > > > > > + desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> > > > > > + desc[i].id = cpu_to_virtio32(_vq->vdev, head);
> > > > > If it's a part of chain, we only need to do this for last buffer I think.
> > > > I'm not sure I've got your point about the "last buffer".
> > > > But, yes, id just needs to be set for the last desc.
> > > Right, I think I meant "last descriptor" :)
> > >
> > > > > > + prev = i;
> > > > > > + i++;
> > > > > It looks to me prev is always i - 1?
> > > > No. prev will be (vq->vring_packed.num - 1) when i becomes 0.
> > > Right, so prev = i ? i - 1 : vq->vring_packed.num - 1.
> > Yes, i wraps together with vq->wrap_counter in following code:
> >
> > > > > > + if (!indirect && i >= vq->vring_packed.num) {
> > > > > > + i = 0;
> > > > > > + vq->wrap_counter ^= 1;
> > > > > > + }
> >
> > > > > > + }
> > > > > > + }
> > > > > > + for (; n < (out_sgs + in_sgs); n++) {
> > > > > > + for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > + dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
> > > > > > + if (vring_mapping_error(vq, addr))
> > > > > > + goto unmap_release;
> > > > > > +
> > > > > > + flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
> > > > > > + VRING_DESC_F_WRITE |
> > > > > > + VRING_DESC_F_AVAIL(vq->wrap_counter) |
> > > > > > + VRING_DESC_F_USED(!vq->wrap_counter));
> > > > > > + if (!indirect && i == head)
> > > > > > + head_flags = flags;
> > > > > > + else
> > > > > > + desc[i].flags = flags;
> > > > > > +
> > > > > > + desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> > > > > > + desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> > > > > > + desc[i].id = cpu_to_virtio32(_vq->vdev, head);
> > > > > > + prev = i;
> > > > > > + i++;
> > > > > > + if (!indirect && i >= vq->vring_packed.num) {
> > > > > > + i = 0;
> > > > > > + vq->wrap_counter ^= 1;
> > > > > > + }
> > > > > > + }
> > > > > > + }
> > > > > > + /* Last one doesn't continue. */
> > > > > > + if (!indirect && (head + 1) % vq->vring_packed.num == i)
> > > > > > + head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > > > > I can't get the why we need this here.
> > > > If only one desc is used, we will need to clear the
> > > > VRING_DESC_F_NEXT flag from the head_flags.
> > > Yes, I meant why following desc[prev].flags won't work for this?
> > Because the update of desc[head].flags (in above case,
> > prev == head) has been delayed. The flags is saved in
> > head_flags.
>
> Ok, but let's try to avoid modular here e.g tracking the number of sgs in a
> counter.
>
> And I see lots of duplication in the above two loops, I believe we can unify
> them with a a single loop. the only difference is dma direction and write
> flag.
The above implementation for packed ring is basically
an mirror of the existing implementation in split ring
as I want to keep the coding style consistent. Below
is the corresponding code in split ring:
static inline int virtqueue_add(struct virtqueue *_vq,
struct scatterlist *sgs[],
unsigned int total_sg,
unsigned int out_sgs,
unsigned int in_sgs,
void *data,
void *ctx,
gfp_t gfp)
{
......
for (n = 0; n < out_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
if (vring_mapping_error(vq, addr))
goto unmap_release;
desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT);
desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
prev = i;
i = virtio16_to_cpu(_vq->vdev, desc[i].next);
}
}
for (; n < (out_sgs + in_sgs); n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
if (vring_mapping_error(vq, addr))
goto unmap_release;
desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT | VRING_DESC_F_WRITE);
desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
prev = i;
i = virtio16_to_cpu(_vq->vdev, desc[i].next);
}
}
......
}
>
> >
> > > > > > + else
> > > > > > + desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > > > > > +
> > > > > > + if (indirect) {
> > > > > > + /* FIXME: to be implemented */
> > > > > > +
> > > > > > + /* Now that the indirect table is filled in, map it. */
> > > > > > + dma_addr_t addr = vring_map_single(
> > > > > > + vq, desc, total_sg * sizeof(struct vring_packed_desc),
> > > > > > + DMA_TO_DEVICE);
> > > > > > + if (vring_mapping_error(vq, addr))
> > > > > > + goto unmap_release;
> > > > > > +
> > > > > > + head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
> > > > > > + VRING_DESC_F_AVAIL(wrap_counter) |
> > > > > > + VRING_DESC_F_USED(!wrap_counter));
> > > > > > + vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev, addr);
> > > > > > + vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
> > > > > > + total_sg * sizeof(struct vring_packed_desc));
> > > > > > + vq->vring_packed.desc[head].id = cpu_to_virtio32(_vq->vdev, head);
> > > > > > + }
> > > > > > +
> > > > > > + /* We're using some buffers from the free list. */
> > > > > > + vq->vq.num_free -= descs_used;
> > > > > > +
> > > > > > + /* Update free pointer */
> > > > > > + if (indirect) {
> > > > > > + n = head + 1;
> > > > > > + if (n >= vq->vring_packed.num) {
> > > > > > + n = 0;
> > > > > > + vq->wrap_counter ^= 1;
> > > > > > + }
> > > > > > + vq->free_head = n;
> > > > > detach_buf_packed() does not even touch free_head here, so need to explain
> > > > > its meaning for packed ring.
> > > > Above code is for indirect support which isn't really
> > > > implemented in this patch yet.
> > > >
> > > > For your question, free_head stores the index of the
> > > > next avail desc. I'll add a comment for it or move it
> > > > to union and give it a better name in next version.
> > > Yes, something like avail_idx might be better.
> > >
> > > > > > + } else
> > > > > > + vq->free_head = i;
> > > > > ID is only valid in the last descriptor in the list, so head + 1 should be
> > > > > ok too?
> > > > I don't really get your point. The vq->free_head stores
> > > > the index of the next avail desc.
> > > I think I get your idea now, free_head has two meanings:
> > >
> > > - next avail index
> > > - buffer id
> > In my design, free_head is just the index of the next
> > avail desc.
> >
> > Driver can set anything to buffer ID.
>
> Then you need another method to track id to context e.g hashing.
I keep the context in desc_state[desc_idx]. So there is
no extra method needed to track the context.
>
> > And in my design,
> > I save desc index in buffer ID.
> >
> > I'll add comments for them.
> >
> > > If I'm correct, let's better add a comment for this.
> > >
> > > > > > +
> > > > > > + /* Store token and indirect buffer state. */
> > > > > > + vq->desc_state[head].num = descs_used;
> > > > > > + vq->desc_state[head].data = data;
> > > > > > + if (indirect)
> > > > > > + vq->desc_state[head].indir_desc = desc;
> > > > > > + else
> > > > > > + vq->desc_state[head].indir_desc = ctx;
> > > > > > +
> > > > > > + virtio_wmb(vq->weak_barriers);
> > > > > Let's add a comment to explain the barrier here.
> > > > Okay.
> > > >
> > > > > > + vq->vring_packed.desc[head].flags = head_flags;
> > > > > > + vq->num_added++;
> > > > > > +
> > > > > > + pr_debug("Added buffer head %i to %p\n", head, vq);
> > > > > > + END_USE(vq);
> > > > > > +
> > > > > > + return 0;
> > > > > > +
> > > > > > +unmap_release:
> > > > > > + err_idx = i;
> > > > > > + i = head;
> > > > > > +
> > > > > > + for (n = 0; n < total_sg; n++) {
> > > > > > + if (i == err_idx)
> > > > > > + break;
> > > > > > + vring_unmap_one(vq, &desc[i]);
> > > > > > + i++;
> > > > > > + if (!indirect && i >= vq->vring_packed.num)
> > > > > > + i = 0;
> > > > > > + }
> > > > > > +
> > > > > > + vq->wrap_counter = wrap_counter;
> > > > > > +
> > > > > > + if (indirect)
> > > > > > + kfree(desc);
> > > > > > +
> > > > > > + END_USE(vq);
> > > > > > + return -EIO;
> > > > > > +}
> > [...]
> > > > > > @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
> > > > > > if (!queue) {
> > > > > > /* Try to get a single page. You are my only hope! */
> > > > > > - queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> > > > > > + queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> > > > > > + packed),
> > > > > > &dma_addr, GFP_KERNEL|__GFP_ZERO);
> > > > > > }
> > > > > > if (!queue)
> > > > > > return NULL;
> > > > > > - queue_size_in_bytes = vring_size(num, vring_align);
> > > > > > - vring_init(&vring, num, queue, vring_align);
> > > > > > + queue_size_in_bytes = __vring_size(num, vring_align, packed);
> > > > > > + if (packed)
> > > > > > + vring_packed_init(&vring.vring_packed, num, queue, vring_align);
> > > > > > + else
> > > > > > + vring_init(&vring.vring_split, num, queue, vring_align);
> > > > > Let's rename vring_init to vring_init_split() like other helpers?
> > > > The vring_init() is a public API in include/uapi/linux/virtio_ring.h.
> > > > I don't think we can rename it.
> > > I see, then this need more thoughts to unify the API.
> > My thought is to keep the old API as is, and introduce
> > new types and helpers for packed ring.
>
> I admit it's not a fault of this patch. But we'd better think of this in the
> future, consider we may have new kinds of ring.
>
> >
> > More details can be found in this patch:
> > https://lkml.org/lkml/2018/2/23/243
> > (PS. The type which has bit fields is just for reference,
> > and will be changed in next version.)
> >
> > Do you have any other suggestions?
>
> No.
Hmm.. Sorry, I didn't describe my question well.
I mean do you have any suggestions about the API
design for packed ring in uapi header? Currently
I introduced below two new helpers:
static inline void vring_packed_init(struct vring_packed *vr, unsigned int num,
void *p, unsigned long align);
static inline unsigned vring_packed_size(unsigned int num, unsigned long align);
When new rings are introduced in the future, above
helpers can't be reused. Maybe we should make the
helpers be able to determine the ring type?
Best regards,
Tiwei Bie
>
> Thanks
>
> >
> > Best regards,
> > Tiwei Bie
> >
> > > > > > - vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> > > > > > - notify, callback, name);
> > > > > > + vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> > > > > > + context, notify, callback, name);
> > > > > > if (!vq) {
> > > > > > vring_free_queue(vdev, queue_size_in_bytes, queue,
> > > > > > dma_addr);
> > [...]
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox