Linux virtualization list
 help / color / mirror / Atom feed
* CFP ICEIS 2019 - 21st Int.l Conf. on Enterprise Information Systems (Heraklion, Crete/Greece)
From: iceis @ 2018-10-16  9:32 UTC (permalink / raw)
  To: virtualization

SUBMISSION DEADLINE 

21st International Conference on Enterprise Information Systems

Submission Deadline: December 10, 2018

http://www.iceis.org/

May 3 - 5, 2019
Heraklion, Crete, Greece.

 ICEIS is organized in 6 major tracks:

 - Databases and Information Systems Integration
 - Artificial Intelligence and Decision Support Systems
 - Information Systems Analysis and Specification
 - Software Agents and Internet Computing
 - Human-Computer Interaction
 - Enterprise Architecture


In Cooperation with: ACM SIGCAS, EDSO, Cluster Habitat Sustentável, OSGP Alliance and Siemens. <br/>
Proceedings will be submitted for indexation by: DBLP, Thomson Reuters, EI, SCOPUS and Semantic Scholar. <br/>
 
A short list of presented papers will be selected so that revised and extended versions of these papers will be published by Springer.
 
All papers presented at the congress venue will also be available at the SCITEPRESS Digital Library (http://www.scitepress.org/DigitalLibrary/).
  
Should you have any question please don't hesitate contacting me.
 

Kind regards,
ICEIS Secretariat

Address: Av. D. Manuel I, 27A, 2º esq.
2910-595 Setubal, Portugal
Tel: +351 265 520 184
Fax: +351 265 520 186
Web: http://www.iceis.org/
e-mail: iceis.secretariat@insticc.org

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* CFP SMARTGREENS 2019 - 8th Int.l Conf. on Smart Cities and Green ICT Systems (Heraklion, Crete/Greece)
From: smartgreens @ 2018-10-16  9:31 UTC (permalink / raw)
  To: virtualization

SUBMISSION DEADLINE 

8th International Conference on Smart Cities and Green ICT Systems

Submission Deadline: December 10, 2018

http://www.smartgreens.org/

May 3 - 5, 2019
Heraklion, Crete, Greece.

 SMARTGREENS is organized in 5 major tracks:

 - Energy-Aware Systems and Technologies
 - Sustainable Computing and Systems
 - Smart Cities and Smart Buildings
 - Demos and Use-Cases
 - Smart and Digital Services


In Cooperation with: ACM SIGCAS, EDSO, Cluster Habitat Sustentável, OSGP Alliance and Siemens. <br/>
Proceedings will be submitted for indexation by: DBLP, Thomson Reuters, EI, SCOPUS and Semantic Scholar. <br/>
 
With the presence of internationally distinguished keynote speakers:
Norbert Streitz, Founder and Scientific Director, Smart Future Initiative, Germany
Rudolf Giffinger, Centre of Urban and Regional Research, Department of Spatial Planning, TU Wien, Austria
Javier Sánchez-Medina, University of Las Palmas de Gran Canaria, Spain
Jeroen Ploeg, 2getthere B.V., Utrecht, The Netherlands Lead Cooperative Driving Eindhoven University of Technology, Eindhoven; and The Netherlands Associate Professor (part time), Faculty of Mechanical Engineering, Dynamics and Control Group, Netherlands


A short list of presented papers will be selected so that revised and extended versions of these papers will be published by Springer.
 
All papers presented at the congress venue will also be available at the SCITEPRESS Digital Library (http://www.scitepress.org/DigitalLibrary/).
  
Should you have any question please don't hesitate contacting me.
 

Kind regards,
SMARTGREENS Secretariat

Address: Av. D. Manuel I, 27A, 2º esq.
2910-595 Setubal, Portugal
Tel: +351 265 520 185
Fax: +351 265 520 186
Web: http://www.smartgreens.org/
e-mail: smartgreens.secretariat@insticc.org

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* CFP VEHITS 2019 - 5th Int.l Conf. on Vehicle Technology and Intelligent Transport Systems (Heraklion, Crete/Greece)
From: vehits @ 2018-10-16  9:31 UTC (permalink / raw)
  To: virtualization

SUBMISSION DEADLINE 

5th International Conference on Vehicle Technology and Intelligent Transport Systems

Submission Deadline: December 10, 2018

http://www.vehits.org/

May 3 - 5, 2019
Heraklion, Crete, Greece.

 VEHITS is organized in 5 major tracks:

 - Intelligent Vehicle Technologies
 - Intelligent Transport Systems and Infrastructure
 - Connected Vehicles
 - Sustainable Transport 
 - Data Analytics


In Cooperation with: ACM SIGCAS, EDSO, Cluster Habitat Sustentável, OSGP Alliance and Siemens. <br/>
Proceedings will be submitted for indexation by: DBLP, Thomson Reuters, EI, SCOPUS and Semantic Scholar. <br/>
 
With the presence of internationally distinguished keynote speakers:
Javier Sánchez-Medina, University of Las Palmas de Gran Canaria, Spain
Jeroen Ploeg, 2getthere B.V., Utrecht, The Netherlands Lead Cooperative Driving Eindhoven University of Technology, Eindhoven; and The Netherlands Associate Professor (part time), Faculty of Mechanical Engineering, Dynamics and Control Group, Netherlands


A short list of presented papers will be selected so that revised and extended versions of these papers will be published by Springer.
 
All papers presented at the congress venue will also be available at the SCITEPRESS Digital Library (http://www.scitepress.org/DigitalLibrary/).
  
Should you have any question please don't hesitate contacting me.
 

Kind regards,
VEHITS Secretariat

Address: Av. D. Manuel I, 27A, 2º esq.
2910-595 Setubal, Portugal
Tel: +351 265 520 185
Fax: +351 265 520 186
Web: http://www.vehits.org/
e-mail: vehits.secretariat@insticc.org

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v3 0/7] Add virtio-iommu driver
From: Auger Eric @ 2018-10-16  9:25 UTC (permalink / raw)
  To: Jean-Philippe Brucker, iommu, virtualization, devicetree
  Cc: mark.rutland, peter.maydell, lorenzo.pieralisi, tnowicki, mst,
	marc.zyngier, linux-pci, will.deacon, kvmarm, robh+dt,
	robin.murphy, joro
In-Reply-To: <20181012145917.6840-1-jean-philippe.brucker@arm.com>

Hi Jean,

On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote:
> Implement the virtio-iommu driver, following specification v0.8 [1].
> Changes since v2 [2]:
> 
> * Patches 2-4 allow virtio-iommu to use the PCI transport, since QEMU
>   would like to phase out the MMIO transport. This produces a complex
>   topology where the programming interface of the IOMMU could appear
>   lower than the endpoints that it translates. It's not unheard of (e.g.
>   AMD IOMMU), and the guest easily copes with this.
>   
>   The "Firmware description" section of the specification has been
>   updated with all combinations of PCI, MMIO and DT, ACPI.

I have a question wrt the FW specification. The IOMMU consumes 1 slot in
the PCI domain and one needs to leave a RID hole in the iommu-map.  It
is not obvious to me that this RID always is predictable given the pcie
enumeration mechanism. Generally we have a coarse grain mapping of RID
onto iommu phandles/STREAMIDs. Here, if I understand correctly we need
to precisely identify the RID granted to the iommu. On QEMU this may
depend on the instantiation order of the virtio-pci device right? So
this does not look trivial to build this info. Isn't it possible to do
this exclusion at kernel level instead?

Thanks

Eric
> 
> * Fix structures layout, they don't need the "packed" attribute anymore.
> 
> * While we're at it, add domain parameter to DETACH request, and leave
>   some padding. This way the next version, that adds PASID support,
>   won't have to introduce a "DETACH2" request to stay backward
>   compatible.
> 
> * Require virtio device 1.0+. Remove legacy transport notes from the
>   specification.
> 
> * Request timeout is now only enabled with DEBUG.
> 
> * The patch for VFIO Kconfig (previously patch 5/5) is in next.
> 
> You can find Linux driver and kvmtool device on branches
> virtio-iommu/v0.8 [3] (currently based on 4.19-rc7 but rebasing onto
> next only produced a trivial conflict). Branch virtio-iommu/devel
> contains a few patches that I'd like to send once the base is upstream:
> 
> * virtio-iommu as a module. It got *much* nicer after Rob's probe
>   deferral rework, but I still have a bug to fix when re-loading the
>   virtio-iommu module.
> 
> * ACPI support requires a minor IORT spec update (reservation of node
>   ID). I think it should be easier to obtain once the device and drivers
>   are upstream.
> 
> [1] Virtio-iommu specification v0.8, diff from v0.7, and sources
>     git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.8
>     http://jpbrucker.net/virtio-iommu/spec/v0.8/virtio-iommu-v0.8.pdf
>     http://jpbrucker.net/virtio-iommu/spec/diffs/virtio-iommu-pdf-diff-v0.7-v0.8.pdf
> 
> [2] [PATCH v2 0/5] Add virtio-iommu driver
>     https://www.spinics.net/lists/kvm/msg170655.html
> 
> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.8
>     git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.8
> 
> Jean-Philippe Brucker (7):
>   dt-bindings: virtio-mmio: Add IOMMU description
>   dt-bindings: virtio: Add virtio-pci-iommu node
>   PCI: OF: allow endpoints to bypass the iommu
>   PCI: OF: Initialize dev->fwnode appropriately
>   iommu: Add virtio-iommu driver
>   iommu/virtio: Add probe request
>   iommu/virtio: Add event queue
> 
>  .../devicetree/bindings/virtio/iommu.txt      |   66 +
>  .../devicetree/bindings/virtio/mmio.txt       |   30 +
>  MAINTAINERS                                   |    7 +
>  drivers/iommu/Kconfig                         |   11 +
>  drivers/iommu/Makefile                        |    1 +
>  drivers/iommu/virtio-iommu.c                  | 1171 +++++++++++++++++
>  drivers/pci/of.c                              |   14 +-
>  include/uapi/linux/virtio_ids.h               |    1 +
>  include/uapi/linux/virtio_iommu.h             |  159 +++
>  9 files changed, 1457 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
>  create mode 100644 drivers/iommu/virtio-iommu.c
>  create mode 100644 include/uapi/linux/virtio_iommu.h
> 

^ permalink raw reply

* Re: [PATCH] virtio_net: enable tx after resuming from suspend
From: Jason Wang @ 2018-10-16  8:53 UTC (permalink / raw)
  To: ake
  Cc: netdev, virtualization, David S. Miller, linux-kernel,
	Michael S. Tsirkin
In-Reply-To: <e2baaccc-4ead-e61d-fc1e-d79435012e1c@igel.co.jp>


On 2018/10/15 下午6:08, ake wrote:
>
> On 2018年10月12日 18:18, ake wrote:
>>
>> On 2018年10月12日 17:23, Jason Wang wrote:
>>>
>>> On 2018年10月12日 12:30, ake wrote:
>>>> On 2018年10月11日 22:06, Jason Wang wrote:
>>>>> On 2018年10月11日 18:22, ake wrote:
>>>>>> On 2018年10月11日 18:44, Jason Wang wrote:
>>>>>>> On 2018年10月11日 15:51, Ake Koomsin wrote:
>>>>>>>> commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
>>>>>>>> disabled the virtio tx before going to suspend to avoid a use after
>>>>>>>> free.
>>>>>>>> However, after resuming, it causes the virtio_net device to lose its
>>>>>>>> network connectivity.
>>>>>>>>
>>>>>>>> To solve the issue, we need to enable tx after resuming.
>>>>>>>>
>>>>>>>> Fixes commit 713a98d90c5e ("virtio-net: serialize tx routine during
>>>>>>>> reset")
>>>>>>>> Signed-off-by: Ake Koomsin <ake@igel.co.jp>
>>>>>>>> ---
>>>>>>>>      drivers/net/virtio_net.c | 1 +
>>>>>>>>      1 file changed, 1 insertion(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>>>> index dab504ec5e50..3453d80f5f81 100644
>>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>>> @@ -2256,6 +2256,7 @@ static int virtnet_restore_up(struct
>>>>>>>> virtio_device *vdev)
>>>>>>>>          }
>>>>>>>>            netif_device_attach(vi->dev);
>>>>>>>> +    netif_start_queue(vi->dev);
>>>>>>> I believe this is duplicated with netif_tx_wake_all_queues() in
>>>>>>> netif_device_attach() above?
>>>>>> Thank you for your review.
>>>>>>
>>>>>> If both netif_tx_wake_all_queues() and netif_start_queue() result in
>>>>>> clearing __QUEUE_STATE_DRV_XOFF, then is it possible that some
>>>>>> conditions in netif_device_attach() is not satisfied?
>>>>> Yes, maybe. One case I can see now is when the device is down, in this
>>>>> case netif_device_attach() won't try to wakeup the queue.
>>>>>
>>>>>>     Without
>>>>>> netif_start_queue(), the virtio_net device does not resume properly
>>>>>> after waking up.
>>>>> How do you trigger the issue? Just do suspend/resume?
>>>> Yes, simply suspend and resume.
>>>>
>>>> Here is how I trigger the issue:
>>>>
>>>> 1) Start the Virtual Machine Manager GUI program.
>>>> 2) Create a guest Linux OS. Make sure that the guest OS kernel is
>>>>      >= 4.12. Make sure that it uses virtio_net as its network device.
>>>>      In addition, make sure that the video adapter is VGA. Otherwise,
>>>>      waking up with the virtual power button does not work.
>>>> 3) After installing the guest OS, log in, and test the network
>>>>      connectivity by ping the host machine.
>>>> 4) Suspend. After this, the screen is blank.
>>>> 5) Resume by hitting the virtual power button. The login screen
>>>>      appears again.
>>>> 6) Log in again. The guest loses its network connection.
>>>>
>>>> In my test:
>>>> Guest: Ubuntu 16.04/18.04 with kernel 4.15.0-36-generic
>>>> Host: Ubuntu 16.04 with kernel 4.15.0-36-generic/4.4.0-137-generic
>>> I can not reproduce this issue if virtio-net interface is up in guest
>>> before the suspend. I'm using net-next.git and qemu master. But I do
>>> reproduce when virtio-net interface is down in guest before suspend,
>>> after resume, even if I make it up, the network is still lost.
>>>
>>> I think the interface is up in your case, but please confirm this.
>> If you mean the interface state before I hit the suspend button,
>> the answer is yes. The interface is up before I suspend the guest
>> machine.
>>
>> Note that my current QEMU version is QEMU emulator version 2.5.0
>> (Debian 1:2.5+dfsg-5ubuntu10.32).
>>
>> I will try with net-next.git and qemu master later and see if I can
>> reproduce the issue.
> Update. I tried with net-next and qemu master. Interestingly, the result
> is different from yours. The network is lost even if the virtio_net
> interface is up before suspending.
>
> Host: Ubuntu 16.04 with net-next kernel (default configuration)
> Guest: Ubuntu 18.04 with net-next kernel (default configuration)
> Qemu: master
> Qemu command:
> qemu-system-x86_64 -cpu host -m 2048 -enable-kvm \
> -bios /usr/share/OVMF/OVMF_CODE.fd \
> -drive file=/var/lib/libvirt/images/virtio_test.qcow2,if=virtio \
> -netdev user,id=hostnet0 \
> -device virtio-net-pci,netdev=hostnet0 \
> -device VGA,id=video0,vgamem_mb=16 \
> -global PIIX4_PM.disable_s3=1 \
> -global PIIX4_PM.disable_s4=1 -monitor stdio


Interesting, just notice you're using userspace network. To isolate the 
issue, can you retry with e.g tap or e1000 to make sure it's not a fault 
of slirp or virito-net?

Thanks

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* RE: [PATCH v8] virtio_blk: add discard and write zeroes support
From: Liu, Changpeng @ 2018-10-16  1:45 UTC (permalink / raw)
  To: Daniel Verkamp, hch@infradead.org
  Cc: axboe@kernel.dk, mst@redhat.com,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, stefanha@redhat.com
In-Reply-To: <CABVzXAn1vhC=TBAzcVuKw7S63t5HUQWgWjMGX-yEXd=hx_1KHA@mail.gmail.com>



> -----Original Message-----
> From: Daniel Verkamp [mailto:dverkamp@chromium.org]
> Sent: Tuesday, October 16, 2018 7:16 AM
> To: hch@infradead.org
> Cc: virtualization@lists.linux-foundation.org; linux-block@vger.kernel.org;
> mst@redhat.com; jasowang@redhat.com; axboe@kernel.dk;
> stefanha@redhat.com; Liu, Changpeng <changpeng.liu@intel.com>
> Subject: Re: [PATCH v8] virtio_blk: add discard and write zeroes support
> 
> On Mon, Oct 15, 2018 at 2:27 AM Christoph Hellwig <hch@infradead.org> wrote:
> > On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote:
> > > From: Changpeng Liu <changpeng.liu@intel.com>
> > >
> > > In commit 88c85538, "virtio-blk: add discard and write zeroes features
> > > to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio
> >
> > There is some issues in this spec.  For one using the multiple ranges
> > also for write zeroes is rather inefficient.  Write zeroes really should
> > use the same format as read and write.
> 
> I wasn't involved in the writing of the spec, so I'll defer to Michael
> and Changpeng here, but I'm not sure how "set in stone" the virtio
> specification is, or if it can be updated somehow without breaking
> compatibility.
> 
> I agree that Write Zeroes would be simpler to implement as a single
> LBA + length rather than a list.  However, it's not really possible to
> use the same format as the regular virtio block read/write commands
> (VIRTIO_BLK_T_IN/VIRTIO_BLK_T_OUT), since the read/write commands do
> not specify a length explicitly; length is implied by the length of
> the data buffer as defined by the virtio descriptor, but a Write
> Zeroes command does not require a data buffer.  At best, this could be
> a separate command mirroring the layout of struct virtio_blk_req but
> with data replaced with a length field; I'm not sure that buys much in
> the way of consistency.
Yeah, that's the consideration here.
> 
> > Second the unmap flag isn't properly specified at all, as nothing
> > says the device may not unmap without the unmap flag.  Please take
> > a look at the SCSI or NVMe ѕpec for some guidance.
> 
> This could probably use some clarifying text in the specification, but
> given that there is nothing in the spec describing what the device
> needs to do when unmap = 0, I would assume that the device can do
> whatever it likes, as long as the blocks read back as 0s afterwards.
> Reading back 0s is required by the definition of the Write Zeroes
> command in the same virtio spec change.  It would probably be good to
> clarify this and explicitly define what the device is allowed to do in
> response to both settings of the unmap bit.
> 
> My understanding of the corresponding feature in NVMe (the Deallocate
> bit in the Write Zeroes command) is that the only difference between
> Deallocate = 1 and 0 is that the device "should" versus "may" (no
> "shall" on either side) deallocate the corresponding blocks, but only
> if the device supports reading 0s back after blocks are deallocated.
> If the device does not support reading 0s after deallocation, it is
> not allowed to deallocate blocks as part of a Write Zeroes command
> regardless of the setting of the Deallocate bit.
> 
> Some similar wording could probably be added to the virtio spec to
> clarify the meaning of unmap, although I would prefer something that
> makes it a little clearer that the bit is only intended as a hint from
> the driver to indicate whether the device should attempt to keep
> storage allocated for the zeroed blocks, if that is indeed the
> intended behavior.
Yes, that's the original idea.  Adding a clear description to the specification may be better. 
> 
> Is there some in-kernel doc that describes what behavior the Linux
> block layer needs from a write zeroes command?
> 
> > > +static inline int virtblk_setup_discard_write_zeroes(struct request *req,
> > > +                                             bool unmap)
> >
> > Why is this an inline function?
> 
> I don't think there's any reason it needs to be inline; I can drop the
> inline in the next revision.
> 
> Given (as far as I can tell) your concerns seem to apply to the Write
> Zeroes command specifically, would it be reasonable to start with a
> patch that just adds support for the Discard command (along with fixes
> for Ming's feedback)?  This would be sufficient for my particular use
> case (although I can't speak for Changpeng), and we can revisit Write
> Zeroes once the spec concerns are worked out.
> 
> Thanks,
> -- Daniel
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* RE: [PATCH v8] virtio_blk: add discard and write zeroes support
From: Liu, Changpeng @ 2018-10-16  1:40 UTC (permalink / raw)
  To: Christoph Hellwig, Daniel Verkamp
  Cc: Jens Axboe, Michael S. Tsirkin,
	virtualization@lists.linux-foundation.org,
	linux-block@vger.kernel.org, Stefan Hajnoczi
In-Reply-To: <20181015092740.GA3964@infradead.org>



> -----Original Message-----
> From: Christoph Hellwig [mailto:hch@infradead.org]
> Sent: Monday, October 15, 2018 5:28 PM
> To: Daniel Verkamp <dverkamp@chromium.org>
> Cc: virtualization@lists.linux-foundation.org; linux-block@vger.kernel.org;
> Michael S. Tsirkin <mst@redhat.com>; Jason Wang <jasowang@redhat.com>;
> Jens Axboe <axboe@kernel.dk>; Stefan Hajnoczi <stefanha@redhat.com>; Liu,
> Changpeng <changpeng.liu@intel.com>
> Subject: Re: [PATCH v8] virtio_blk: add discard and write zeroes support
> 
> On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote:
> > From: Changpeng Liu <changpeng.liu@intel.com>
> >
> > In commit 88c85538, "virtio-blk: add discard and write zeroes features
> > to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio
> 
> There is some issues in this spec.  For one using the multiple ranges
> also for write zeroes is rather inefficient.  Write zeroes really should
> use the same format as read and write.
Because there is no length parameter for virtio block specification, adding the
two extra commands will not break the existing specification and driver implementation. 
Also existing Linux implementation for write zeroes will not use multiple segment
at all so there is always one range in practice.
> 
> Second the unmap flag isn't properly specified at all, as nothing
> says the device may not unmap without the unmap flag.  Please take
> a look at the SCSI or NVMe ѕpec for some guidance.
The unmap flag is only used for write zeroes command, as discard command will not 
guarantee the spaces will be zeroed, so adding this flag means (Discard + Write Zeroes),
so this definitely is backend related, the backend implementation can use same code
to implement discard and write zeroes commands.
> 
> > +static inline int virtblk_setup_discard_write_zeroes(struct request *req,
> > +						bool unmap)
> 
> Why is this an inline function?
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v8] virtio_blk: add discard and write zeroes support
From: Daniel Verkamp @ 2018-10-15 23:16 UTC (permalink / raw)
  To: hch; +Cc: axboe, mst, virtualization, linux-block, stefanha, Changpeng Liu
In-Reply-To: <20181015092740.GA3964@infradead.org>

On Mon, Oct 15, 2018 at 2:27 AM Christoph Hellwig <hch@infradead.org> wrote:
> On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote:
> > From: Changpeng Liu <changpeng.liu@intel.com>
> >
> > In commit 88c85538, "virtio-blk: add discard and write zeroes features
> > to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio
>
> There is some issues in this spec.  For one using the multiple ranges
> also for write zeroes is rather inefficient.  Write zeroes really should
> use the same format as read and write.

I wasn't involved in the writing of the spec, so I'll defer to Michael
and Changpeng here, but I'm not sure how "set in stone" the virtio
specification is, or if it can be updated somehow without breaking
compatibility.

I agree that Write Zeroes would be simpler to implement as a single
LBA + length rather than a list.  However, it's not really possible to
use the same format as the regular virtio block read/write commands
(VIRTIO_BLK_T_IN/VIRTIO_BLK_T_OUT), since the read/write commands do
not specify a length explicitly; length is implied by the length of
the data buffer as defined by the virtio descriptor, but a Write
Zeroes command does not require a data buffer.  At best, this could be
a separate command mirroring the layout of struct virtio_blk_req but
with data replaced with a length field; I'm not sure that buys much in
the way of consistency.

> Second the unmap flag isn't properly specified at all, as nothing
> says the device may not unmap without the unmap flag.  Please take
> a look at the SCSI or NVMe ѕpec for some guidance.

This could probably use some clarifying text in the specification, but
given that there is nothing in the spec describing what the device
needs to do when unmap = 0, I would assume that the device can do
whatever it likes, as long as the blocks read back as 0s afterwards.
Reading back 0s is required by the definition of the Write Zeroes
command in the same virtio spec change.  It would probably be good to
clarify this and explicitly define what the device is allowed to do in
response to both settings of the unmap bit.

My understanding of the corresponding feature in NVMe (the Deallocate
bit in the Write Zeroes command) is that the only difference between
Deallocate = 1 and 0 is that the device "should" versus "may" (no
"shall" on either side) deallocate the corresponding blocks, but only
if the device supports reading 0s back after blocks are deallocated.
If the device does not support reading 0s after deallocation, it is
not allowed to deallocate blocks as part of a Write Zeroes command
regardless of the setting of the Deallocate bit.

Some similar wording could probably be added to the virtio spec to
clarify the meaning of unmap, although I would prefer something that
makes it a little clearer that the bit is only intended as a hint from
the driver to indicate whether the device should attempt to keep
storage allocated for the zeroed blocks, if that is indeed the
intended behavior.

Is there some in-kernel doc that describes what behavior the Linux
block layer needs from a write zeroes command?

> > +static inline int virtblk_setup_discard_write_zeroes(struct request *req,
> > +                                             bool unmap)
>
> Why is this an inline function?

I don't think there's any reason it needs to be inline; I can drop the
inline in the next revision.

Given (as far as I can tell) your concerns seem to apply to the Write
Zeroes command specifically, would it be reasonable to start with a
patch that just adds support for the Discard command (along with fixes
for Ming's feedback)?  This would be sufficient for my particular use
case (although I can't speak for Changpeng), and we can revisit Write
Zeroes once the spec concerns are worked out.

Thanks,
-- Daniel
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [Qemu-devel] virtio-console downgrade the virtio-pci-blk performance
From: Amit Shah @ 2018-10-15 18:51 UTC (permalink / raw)
  To: Feng Li
  Cc: linux-scsi@vger.kernel.org, amit, qemu-devel, linux-kernel,
	qemu-discuss, virtualization, dgilbert
In-Reply-To: <CAEK8JBC-HKGb43UynWx69Lj-y48Em038eNMpOb7G72XfLF_mDg@mail.gmail.com>

On (Thu) 11 Oct 2018 [18:15:41], Feng Li wrote:
> Add Amit Shah.
> 
> After some tests, we found:
> - the virtio serial port number is inversely proportional to the iSCSI
> virtio-blk-pci performance.
> If we set the virio-serial ports to 2("<controller
> type='virtio-serial' index='0' ports='2'/>), the performance downgrade
> is minimal.

If you use multiple virtio-net (or blk) devices -- just register, not
necessarily use -- does that also bring the performance down?  I
suspect it's the number of interrupts that get allocated for the
ports.  Also, could you check if MSI is enabled?  Can you try with and
without?  Can you also reproduce if you have multiple virtio-serial
controllers with 2 ports each (totalling up to whatever number that
reproduces the issue).

		Amit

> 
> - use local disk/ram disk as virtio-blk-pci disk, the performance
> downgrade is still obvious.
> 
> 
> Could anyone give some help about this issue?
> 
> Feng Li <lifeng1519@gmail.com> 于2018年10月1日周一 下午10:58写道:
> >
> > Hi Dave,
> > My comments are in-line.
> >
> > Dr. David Alan Gilbert <dgilbert@redhat.com> 于2018年10月1日周一 下午7:41写道:
> > >
> > > * Feng Li (lifeng1519@gmail.com) wrote:
> > > > Hi,
> > > > I found an obvious performance downgrade when virtio-console combined
> > > > with virtio-pci-blk.
> > > >
> > > > This phenomenon exists in nearly all Qemu versions and all Linux
> > > > (CentOS7, Fedora 28, Ubuntu 18.04) distros.
> > > >
> > > > This is a disk cmd:
> > > > -drive file=iscsi://127.0.0.1:3260/iqn.2016-02.com.test:system:fl-iscsi/1,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native
> > > > -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on
> > > >
> > > > If I add "-device
> > > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5  ", the virtio
> > > > disk 4k iops (randread/randwrite) would downgrade from 60k to 40k.
> > > >
> > > > In VM, if I rmmod virtio-console, the performance will back to normal.
> > > >
> > > > Any idea about this issue?
> > > >
> > > > I don't know this is a qemu issue or kernel issue.
> > >
> > > It sounds odd;  can you provide more details on:
> > >   a) The benchmark you're using.
> > I'm using fio, the config is:
> > [global]
> > ioengine=libaio
> > iodepth=128
> > runtime=120
> > time_based
> > direct=1
> >
> > [randread]
> > stonewall
> > bs=4k
> > filename=/dev/vdb
> > rw=randread
> >
> > >   b) the host and the guest config (number of cpus etc)
> > The qemu cmd is : /usr/libexec/qemu-kvm --device virtio-balloon -m 16G
> > --enable-kvm -cpu host -smp 8
> > or qemu-system-x86_64 --device virtio-balloon -m 16G --enable-kvm -cpu
> > host -smp 8
> >
> > The result is the same.
> >
> > >   c) Why are you running it with iscsi back to the same host - why not
> > >      just simplify the test back to a simple file?
> > >
> >
> > Because my ISCSI target could supply a high IOPS performance.
> > If using a slow disk, the performance downgrade would be not so obvious.
> > It's easy to be seen, you could try it.
> >
> >
> > > Dave
> > >
> > > >
> > > > Thanks in advance.
> > > > --
> > > > Thanks and Best Regards,
> > > > Alex
> > > >
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
> >
> >
> > --
> > Thanks and Best Regards,
> > Feng Li(Alex)
> 
> 
> 
> --
> Thanks and Best Regards,
> Feng Li(Alex)

		Amit
-- 
http://amitshah.net/
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [patch 00/11] x86/vdso: Cleanups, simmplifications and CLOCK_TAI support
From: Marcelo Tosatti @ 2018-10-15 13:39 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Wanpeng Li, Florian Weimer, Juergen Gross, Arnd Bergmann,
	Radim Krcmar, Peter Zijlstra, X86 ML, LKML, Linux Virtualization,
	Stephen Boyd, John Stultz, devel, Paolo Bonzini, Thomas Gleixner,
	Matt Rickard
In-Reply-To: <CALCETrXMEqHOa-8s51gGE_gxWVzTWY5eLiZrGLhMUkWKZkfmDw@mail.gmail.com>

On Thu, Oct 11, 2018 at 04:00:29PM -0700, Andy Lutomirski wrote:
> On Thu, Oct 11, 2018 at 3:28 PM Marcelo Tosatti <mtosatti@redhat.com> wrote:
> >
> > On Tue, Oct 09, 2018 at 01:09:42PM -0700, Andy Lutomirski wrote:
> > > On Tue, Oct 9, 2018 at 8:28 AM Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > > >
> > > > On Mon, Oct 08, 2018 at 10:38:22AM -0700, Andy Lutomirski wrote:
> > > > > On Mon, Oct 8, 2018 at 8:27 AM Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > >
> > > > > I read the comment three more times and even dug through the git
> > > > > history.  It seems like what you're saying is that, under certain
> > > > > conditions (which arguably would be bugs in the core Linux timing
> > > > > code),
> > > >
> > > > I don't see that as a bug. Its just a side effect of reading two
> > > > different clocks (one is CLOCK_MONOTONIC and the other is TSC),
> > > > and using those two clocks to as a "base + offset".
> > > >
> > > > As the comment explains, if you do that, can't guarantee monotonicity.
> > > >
> > > > > actually calling ktime_get_boot_ns() could be non-monotonic
> > > > > with respect to the kvmclock timing.  But get_kvmclock_ns() isn't used
> > > > > for VM timing as such -- it's used for the IOCTL interfaces for
> > > > > updating the time offset.  So can you explain how my patch is
> > > > > incorrect?
> > > >
> > > > ktime_get_boot_ns() has frequency correction applied, while
> > > > reading masterclock + TSC offset does not.
> > > >
> > > > So the clock reads differ.
> > > >
> > >
> > > Ah, okay, I finally think I see what's going on.  In the kvmclock data
> > > exposed to the guest, tsc_shift and tsc_to_system_mul come from
> > > tgt_tsc_khz, whereas master_kernel_ns and master_cycle_now come from
> > > CLOCK_BOOTTIME.  So the kvmclock and kernel clock drift apart at a
> > > rate given by the frequency shift and then suddenly agree again every
> > > time the pvclock data is updated.
> >
> > Yes.
> >
> > > Is there a reason to do it this way?
> >
> > Since pvclock updates which update system_timestamp are expensive (must stop all vcpus),
> > they should be avoided.
> >
> 
> Fair enough.
> 
> > So only HW TSC counts
> 
> makes sense.
> 
> >, and used as offset against vcpu's tsc_timestamp.
> >
> 
> Why don't you just expose CLOCK_MONTONIC_RAW or CLOCK_MONOTONIC_RAW
> plus suspend time, though?  Then you would actually be tracking a real
> kernel timekeeping mode, and you wouldn't need all this complicated
> offsetting work to avoid accidentally going backwards.

Can you outline how that would work ? 

^ permalink raw reply

* Re: [PATCH v3 3/7] PCI: OF: Allow endpoints to bypass the iommu
From: Robin Murphy @ 2018-10-15 11:32 UTC (permalink / raw)
  To: Bjorn Helgaas, Jean-Philippe Brucker
  Cc: mark.rutland, peter.maydell, tnowicki, devicetree, marc.zyngier,
	linux-pci, mst, will.deacon, virtualization, iommu, robh+dt
In-Reply-To: <20181012194158.GX5906@bhelgaas-glaptop.roam.corp.google.com>

On 12/10/18 20:41, Bjorn Helgaas wrote:
> s/iommu/IOMMU/ in subject
> 
> On Fri, Oct 12, 2018 at 03:59:13PM +0100, Jean-Philippe Brucker wrote:
>> Using the iommu-map binding, endpoints in a given PCI domain can be
>> managed by different IOMMUs. Some virtual machines may allow a subset of
>> endpoints to bypass the IOMMU. In some case the IOMMU itself is presented
> 
> s/case/cases/
> 
>> as a PCI endpoint (e.g. AMD IOMMU and virtio-iommu). Currently, when a
>> PCI root complex has an iommu-map property, the driver requires all
>> endpoints to be described by the property. Allow the iommu-map property to
>> have gaps.
> 
> I'm not an IOMMU or virtio expert, so it's not obvious to me why it is
> safe to allow devices to bypass the IOMMU.  Does this mean a typo in
> iommu-map could inadvertently allow devices to bypass it?  Should we
> indicate something in dmesg (and/or sysfs) about devices that bypass
> it?

It's not really "allow devices to bypass the IOMMU" so much as "allow DT 
to describe devices which the IOMMU doesn't translate". It's a bit of an 
edge case for not-really-PCI devices, but FWIW I can certainly think of 
several ways to build real hardware like that. As for inadvertent errors 
leaving out IDs which *should* be in the map, that really depends on the 
IOMMU/driver implementation - e.g. SMMUv2 with arm-smmu.disable_bypass=0 
would treat the device as untranslated, whereas SMMUv3 would always 
generate a fault upon any transaction due to no valid stream table entry 
being programmed (not even a bypass one).

I reckon it's a sufficiently unusual case that keeping some sort of 
message probably is worthwhile (at pr_info rather than pr_err) in case 
someone does hit it by mistake.

>> Relaxing of_pci_map_rid also allows the msi-map property to have gaps,

At worst, I suppose we could always add yet another parameter for each 
caller to choose whether a missing entry is considered an error or not.

Robin.

> s/of_pci_map_rid/of_pci_map_rid()/
> 
>> which is invalid since MSIs always reach an MSI controller. Thankfully
>> Linux will error out later, when attempting to find an MSI domain for the
>> device.
> 
> Not clear to me what "error out" means here.  In a userspace program,
> I would infer that the program exits with an error message, but I
> doubt you mean that Linux exits.
> 
>> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
>> ---
>>   drivers/pci/of.c | 7 ++++---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
>> index 1836b8ddf292..2f5015bdb256 100644
>> --- a/drivers/pci/of.c
>> +++ b/drivers/pci/of.c
>> @@ -451,9 +451,10 @@ int of_pci_map_rid(struct device_node *np, u32 rid,
>>   		return 0;
>>   	}
>>   
>> -	pr_err("%pOF: Invalid %s translation - no match for rid 0x%x on %pOF\n",
>> -		np, map_name, rid, target && *target ? *target : NULL);
>> -	return -EFAULT;
>> +	/* Bypasses translation */
>> +	if (id_out)
>> +		*id_out = rid;
>> +	return 0;
>>   }
>>   
>>   #if IS_ENABLED(CONFIG_OF_IRQ)
>> -- 
>> 2.19.1
>>
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
> 

^ permalink raw reply

* Re: [PATCH v3 3/7] PCI: OF: Allow endpoints to bypass the iommu
From: Michael S. Tsirkin @ 2018-10-15 10:52 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: mark.rutland, devicetree, lorenzo.pieralisi, tnowicki,
	peter.maydell, Jean-Philippe Brucker, linux-pci, joro,
	will.deacon, robin.murphy, virtualization, eric.auger, iommu,
	robh+dt, marc.zyngier, kvmarm
In-Reply-To: <20181012194158.GX5906@bhelgaas-glaptop.roam.corp.google.com>

On Fri, Oct 12, 2018 at 02:41:59PM -0500, Bjorn Helgaas wrote:
> s/iommu/IOMMU/ in subject
> 
> On Fri, Oct 12, 2018 at 03:59:13PM +0100, Jean-Philippe Brucker wrote:
> > Using the iommu-map binding, endpoints in a given PCI domain can be
> > managed by different IOMMUs. Some virtual machines may allow a subset of
> > endpoints to bypass the IOMMU. In some case the IOMMU itself is presented
> 
> s/case/cases/
> 
> > as a PCI endpoint (e.g. AMD IOMMU and virtio-iommu). Currently, when a
> > PCI root complex has an iommu-map property, the driver requires all
> > endpoints to be described by the property. Allow the iommu-map property to
> > have gaps.
> 
> I'm not an IOMMU or virtio expert, so it's not obvious to me why it is
> safe to allow devices to bypass the IOMMU.  Does this mean a typo in
> iommu-map could inadvertently allow devices to bypass it?


Thinking about this comment, I would like to ask: can't the
virtio device indicate the ranges in a portable way?
This would minimize the dependency on dt bindings and ACPI,
enabling support for systems that have neither but do
have virtio e.g. through pci.

>  Should we
> indicate something in dmesg (and/or sysfs) about devices that bypass
> it?
> 
> > Relaxing of_pci_map_rid also allows the msi-map property to have gaps,
> 
> s/of_pci_map_rid/of_pci_map_rid()/
> 
> > which is invalid since MSIs always reach an MSI controller. Thankfully
> > Linux will error out later, when attempting to find an MSI domain for the
> > device.
> 
> Not clear to me what "error out" means here.  In a userspace program,
> I would infer that the program exits with an error message, but I
> doubt you mean that Linux exits.
> 
> > Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
> > ---
> >  drivers/pci/of.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> > index 1836b8ddf292..2f5015bdb256 100644
> > --- a/drivers/pci/of.c
> > +++ b/drivers/pci/of.c
> > @@ -451,9 +451,10 @@ int of_pci_map_rid(struct device_node *np, u32 rid,
> >  		return 0;
> >  	}
> >  
> > -	pr_err("%pOF: Invalid %s translation - no match for rid 0x%x on %pOF\n",
> > -		np, map_name, rid, target && *target ? *target : NULL);
> > -	return -EFAULT;
> > +	/* Bypasses translation */
> > +	if (id_out)
> > +		*id_out = rid;
> > +	return 0;
> >  }
> >  
> >  #if IS_ENABLED(CONFIG_OF_IRQ)
> > -- 
> > 2.19.1
> > 

^ permalink raw reply

* Re: [PATCH net-next V2 6/8] vhost: packed ring support
From: Michael S. Tsirkin @ 2018-10-15 10:25 UTC (permalink / raw)
  To: Jason Wang
  Cc: kvm, netdev, linux-kernel, virtualization, maxime.coquelin, wexu
In-Reply-To: <ee3785be-b309-c4ae-e959-737018c21464@redhat.com>

On Mon, Oct 15, 2018 at 10:51:06AM +0800, Jason Wang wrote:
> 
> 
> On 2018年10月15日 10:43, Michael S. Tsirkin wrote:
> > On Mon, Oct 15, 2018 at 10:22:33AM +0800, Jason Wang wrote:
> > > 
> > > On 2018年10月13日 01:23, Michael S. Tsirkin wrote:
> > > > On Fri, Oct 12, 2018 at 10:32:44PM +0800, Tiwei Bie wrote:
> > > > > On Mon, Jul 16, 2018 at 11:28:09AM +0800, Jason Wang wrote:
> > > > > [...]
> > > > > > @@ -1367,10 +1397,48 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
> > > > > >    		vq->last_avail_idx = s.num;
> > > > > >    		/* Forget the cached index value. */
> > > > > >    		vq->avail_idx = vq->last_avail_idx;
> > > > > > +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
> > > > > > +			vq->last_avail_wrap_counter = wrap_counter;
> > > > > > +			vq->avail_wrap_counter = vq->last_avail_wrap_counter;
> > > > > > +		}
> > > > > >    		break;
> > > > > >    	case VHOST_GET_VRING_BASE:
> > > > > >    		s.index = idx;
> > > > > >    		s.num = vq->last_avail_idx;
> > > > > > +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> > > > > > +			s.num |= vq->last_avail_wrap_counter << 31;
> > > > > > +		if (copy_to_user(argp, &s, sizeof(s)))
> > > > > > +			r = -EFAULT;
> > > > > > +		break;
> > > > > > +	case VHOST_SET_VRING_USED_BASE:
> > > > > > +		/* Moving base with an active backend?
> > > > > > +		 * You don't want to do that.
> > > > > > +		 */
> > > > > > +		if (vq->private_data) {
> > > > > > +			r = -EBUSY;
> > > > > > +			break;
> > > > > > +		}
> > > > > > +		if (copy_from_user(&s, argp, sizeof(s))) {
> > > > > > +			r = -EFAULT;
> > > > > > +			break;
> > > > > > +		}
> > > > > > +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
> > > > > > +			wrap_counter = s.num >> 31;
> > > > > > +			s.num &= ~(1 << 31);
> > > > > > +		}
> > > > > > +		if (s.num > 0xffff) {
> > > > > > +			r = -EINVAL;
> > > > > > +			break;
> > > > > > +		}
> > > > > Do we want to put wrap_counter at bit 15?
> > > > I think I second that - seems to be consistent with
> > > > e.g. event suppression structure and the proposed
> > > > extension to driver notifications.
> > > Ok, I assumes packed virtqueue support 64K but looks not. I can change it to
> > > bit 15 and GET_VRING_BASE need to be changed as well.
> > > 
> > > > 
> > > > > If put wrap_counter at bit 31, the check (s.num > 0xffff)
> > > > > won't be able to catch the illegal index 0x8000~0xffff for
> > > > > packed ring.
> > > > > 
> > > Do we need to clarify this in the spec?
> > Isn't this all internal vhost stuff?
> 
> I meant the illegal index 0x8000-0xffff.

It does say packed virtqueues support up to 2 15 entries each.

But yes we can add a requirement that devices do not expose
larger rings. Split does not support 2**16 either, right?
With 2**16 enties avail index becomes 0 and ring looks empty.

> 
> > 
> > > > > > +		vq->last_used_idx = s.num;
> > > > > > +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> > > > > > +			vq->last_used_wrap_counter = wrap_counter;
> > > > > > +		break;
> > > > > > +	case VHOST_GET_VRING_USED_BASE:
> > > > > Do we need the new VHOST_GET_VRING_USED_BASE and
> > > > > VHOST_SET_VRING_USED_BASE ops?
> > > > > 
> > > > > We are going to merge below series in DPDK:
> > > > > 
> > > > > http://patches.dpdk.org/patch/45874/
> > > > > 
> > > > > We may need to reach an agreement first.
> > > If we agree that 64K virtqueue won't be supported, I'm ok with either.
> > Well the spec says right at the beginning:
> > 
> > Packed virtqueues support up to 2 15 entries each.
> 
> Ok. I get it.
> 
> Then I can change vhost to match what dpdk did.
> 
> Thanks
> 
> > 
> > 
> > > Btw the code assumes used_wrap_counter is equal to avail_wrap_counter which
> > > looks wrong?
> > > 
> > > Thanks
> > > 
> > > > > > +		s.index = idx;
> > > > > > +		s.num = vq->last_used_idx;
> > > > > > +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> > > > > > +			s.num |= vq->last_used_wrap_counter << 31;
> > > > > >    		if (copy_to_user(argp, &s, sizeof s))
> > > > > >    			r = -EFAULT;
> > > > > >    		break;
> > > > > [...]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH v8] virtio_blk: add discard and write zeroes support
From: Christoph Hellwig @ 2018-10-15  9:27 UTC (permalink / raw)
  To: Daniel Verkamp
  Cc: Jens Axboe, Michael S. Tsirkin, virtualization, linux-block,
	Stefan Hajnoczi, Changpeng Liu
In-Reply-To: <20181012210628.226361-1-dverkamp@chromium.org>

On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote:
> From: Changpeng Liu <changpeng.liu@intel.com>
> 
> In commit 88c85538, "virtio-blk: add discard and write zeroes features
> to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio

There is some issues in this spec.  For one using the multiple ranges
also for write zeroes is rather inefficient.  Write zeroes really should
use the same format as read and write.

Second the unmap flag isn't properly specified at all, as nothing
says the device may not unmap without the unmap flag.  Please take
a look at the SCSI or NVMe ѕpec for some guidance.

> +static inline int virtblk_setup_discard_write_zeroes(struct request *req,
> +						bool unmap)

Why is this an inline function?
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [RFC] VSOCK: The performance problem of vhost_vsock.
From: jiangyiwen @ 2018-10-15  6:12 UTC (permalink / raw)
  To: Jason Wang, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <30d7c370-b206-cdac-dc85-53e9be1e1c63@redhat.com>

On 2018/10/15 10:33, Jason Wang wrote:
> 
> 
> On 2018年10月15日 09:43, jiangyiwen wrote:
>> Hi Stefan & All:
>>
>> Now I find vhost-vsock has two performance problems even if it
>> is not designed for performance.
>>
>> First, I think vhost-vsock should faster than vhost-net because it
>> is no TCP/IP stack, but the real test result vhost-net is 5~10
>> times than vhost-vsock, currently I am looking for the reason.
> 
> TCP/IP is not a must for vhost-net.
> 
> How do you test and compare the performance?
> 
> Thanks
> 

I test the performance used my test tool, like follows:

Server                   Client
socket()
bind()
listen()

                         socket(AF_VSOCK) or socket(AF_INET)
Accept() <-------------->connect()
                         *======Start Record Time======*
                         Call syscall sendfile()
Recv()
                         Send end
Receive end
Send(file_size)
                         Recv(file_size)
                         *======End Record Time======*

The test result, vhost-vsock is about 500MB/s, and vhost-net is about 2500MB/s.

By the way, vhost-net use single queue.

Thanks.

>> Second, vhost-vsock only supports two vqs(tx and rx), that means
>> if multiple sockets in the guest will use the same vq to transmit
>> the message and get the response. So if there are multiple applications
>> in the guest, we should support "Multiqueue" feature for Virtio-vsock.
>>
>> Stefan, have you encountered these problems?
>>
>> Thanks,
>> Yiwen.
>>
> 
> 
> .
> 


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH net-next V2 6/8] vhost: packed ring support
From: Jason Wang @ 2018-10-15  2:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, netdev, linux-kernel, virtualization, maxime.coquelin, wexu
In-Reply-To: <20181014224208-mutt-send-email-mst@kernel.org>



On 2018年10月15日 10:43, Michael S. Tsirkin wrote:
> On Mon, Oct 15, 2018 at 10:22:33AM +0800, Jason Wang wrote:
>>
>> On 2018年10月13日 01:23, Michael S. Tsirkin wrote:
>>> On Fri, Oct 12, 2018 at 10:32:44PM +0800, Tiwei Bie wrote:
>>>> On Mon, Jul 16, 2018 at 11:28:09AM +0800, Jason Wang wrote:
>>>> [...]
>>>>> @@ -1367,10 +1397,48 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
>>>>>    		vq->last_avail_idx = s.num;
>>>>>    		/* Forget the cached index value. */
>>>>>    		vq->avail_idx = vq->last_avail_idx;
>>>>> +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
>>>>> +			vq->last_avail_wrap_counter = wrap_counter;
>>>>> +			vq->avail_wrap_counter = vq->last_avail_wrap_counter;
>>>>> +		}
>>>>>    		break;
>>>>>    	case VHOST_GET_VRING_BASE:
>>>>>    		s.index = idx;
>>>>>    		s.num = vq->last_avail_idx;
>>>>> +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
>>>>> +			s.num |= vq->last_avail_wrap_counter << 31;
>>>>> +		if (copy_to_user(argp, &s, sizeof(s)))
>>>>> +			r = -EFAULT;
>>>>> +		break;
>>>>> +	case VHOST_SET_VRING_USED_BASE:
>>>>> +		/* Moving base with an active backend?
>>>>> +		 * You don't want to do that.
>>>>> +		 */
>>>>> +		if (vq->private_data) {
>>>>> +			r = -EBUSY;
>>>>> +			break;
>>>>> +		}
>>>>> +		if (copy_from_user(&s, argp, sizeof(s))) {
>>>>> +			r = -EFAULT;
>>>>> +			break;
>>>>> +		}
>>>>> +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
>>>>> +			wrap_counter = s.num >> 31;
>>>>> +			s.num &= ~(1 << 31);
>>>>> +		}
>>>>> +		if (s.num > 0xffff) {
>>>>> +			r = -EINVAL;
>>>>> +			break;
>>>>> +		}
>>>> Do we want to put wrap_counter at bit 15?
>>> I think I second that - seems to be consistent with
>>> e.g. event suppression structure and the proposed
>>> extension to driver notifications.
>> Ok, I assumes packed virtqueue support 64K but looks not. I can change it to
>> bit 15 and GET_VRING_BASE need to be changed as well.
>>
>>>
>>>> If put wrap_counter at bit 31, the check (s.num > 0xffff)
>>>> won't be able to catch the illegal index 0x8000~0xffff for
>>>> packed ring.
>>>>
>> Do we need to clarify this in the spec?
> Isn't this all internal vhost stuff?

I meant the illegal index 0x8000-0xffff.

>
>>>>> +		vq->last_used_idx = s.num;
>>>>> +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
>>>>> +			vq->last_used_wrap_counter = wrap_counter;
>>>>> +		break;
>>>>> +	case VHOST_GET_VRING_USED_BASE:
>>>> Do we need the new VHOST_GET_VRING_USED_BASE and
>>>> VHOST_SET_VRING_USED_BASE ops?
>>>>
>>>> We are going to merge below series in DPDK:
>>>>
>>>> http://patches.dpdk.org/patch/45874/
>>>>
>>>> We may need to reach an agreement first.
>> If we agree that 64K virtqueue won't be supported, I'm ok with either.
> Well the spec says right at the beginning:
>
> Packed virtqueues support up to 2 15 entries each.

Ok. I get it.

Then I can change vhost to match what dpdk did.

Thanks

>
>
>> Btw the code assumes used_wrap_counter is equal to avail_wrap_counter which
>> looks wrong?
>>
>> Thanks
>>
>>>>> +		s.index = idx;
>>>>> +		s.num = vq->last_used_idx;
>>>>> +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
>>>>> +			s.num |= vq->last_used_wrap_counter << 31;
>>>>>    		if (copy_to_user(argp, &s, sizeof s))
>>>>>    			r = -EFAULT;
>>>>>    		break;
>>>> [...]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH net-next V2 6/8] vhost: packed ring support
From: Michael S. Tsirkin @ 2018-10-15  2:43 UTC (permalink / raw)
  To: Jason Wang
  Cc: kvm, netdev, linux-kernel, virtualization, maxime.coquelin, wexu
In-Reply-To: <447f47fa-32dd-a408-dd81-13a9839e0748@redhat.com>

On Mon, Oct 15, 2018 at 10:22:33AM +0800, Jason Wang wrote:
> 
> 
> On 2018年10月13日 01:23, Michael S. Tsirkin wrote:
> > On Fri, Oct 12, 2018 at 10:32:44PM +0800, Tiwei Bie wrote:
> > > On Mon, Jul 16, 2018 at 11:28:09AM +0800, Jason Wang wrote:
> > > [...]
> > > > @@ -1367,10 +1397,48 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
> > > >   		vq->last_avail_idx = s.num;
> > > >   		/* Forget the cached index value. */
> > > >   		vq->avail_idx = vq->last_avail_idx;
> > > > +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
> > > > +			vq->last_avail_wrap_counter = wrap_counter;
> > > > +			vq->avail_wrap_counter = vq->last_avail_wrap_counter;
> > > > +		}
> > > >   		break;
> > > >   	case VHOST_GET_VRING_BASE:
> > > >   		s.index = idx;
> > > >   		s.num = vq->last_avail_idx;
> > > > +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> > > > +			s.num |= vq->last_avail_wrap_counter << 31;
> > > > +		if (copy_to_user(argp, &s, sizeof(s)))
> > > > +			r = -EFAULT;
> > > > +		break;
> > > > +	case VHOST_SET_VRING_USED_BASE:
> > > > +		/* Moving base with an active backend?
> > > > +		 * You don't want to do that.
> > > > +		 */
> > > > +		if (vq->private_data) {
> > > > +			r = -EBUSY;
> > > > +			break;
> > > > +		}
> > > > +		if (copy_from_user(&s, argp, sizeof(s))) {
> > > > +			r = -EFAULT;
> > > > +			break;
> > > > +		}
> > > > +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
> > > > +			wrap_counter = s.num >> 31;
> > > > +			s.num &= ~(1 << 31);
> > > > +		}
> > > > +		if (s.num > 0xffff) {
> > > > +			r = -EINVAL;
> > > > +			break;
> > > > +		}
> > > Do we want to put wrap_counter at bit 15?
> > I think I second that - seems to be consistent with
> > e.g. event suppression structure and the proposed
> > extension to driver notifications.
> 
> Ok, I assumes packed virtqueue support 64K but looks not. I can change it to
> bit 15 and GET_VRING_BASE need to be changed as well.
> 
> > 
> > 
> > > If put wrap_counter at bit 31, the check (s.num > 0xffff)
> > > won't be able to catch the illegal index 0x8000~0xffff for
> > > packed ring.
> > > 
> 
> Do we need to clarify this in the spec?

Isn't this all internal vhost stuff?

> > > > +		vq->last_used_idx = s.num;
> > > > +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> > > > +			vq->last_used_wrap_counter = wrap_counter;
> > > > +		break;
> > > > +	case VHOST_GET_VRING_USED_BASE:
> > > Do we need the new VHOST_GET_VRING_USED_BASE and
> > > VHOST_SET_VRING_USED_BASE ops?
> > > 
> > > We are going to merge below series in DPDK:
> > > 
> > > http://patches.dpdk.org/patch/45874/
> > > 
> > > We may need to reach an agreement first.
> 
> If we agree that 64K virtqueue won't be supported, I'm ok with either.

Well the spec says right at the beginning:

Packed virtqueues support up to 2 15 entries each.


> Btw the code assumes used_wrap_counter is equal to avail_wrap_counter which
> looks wrong?
> 
> Thanks
> 
> > > 
> > > > +		s.index = idx;
> > > > +		s.num = vq->last_used_idx;
> > > > +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> > > > +			s.num |= vq->last_used_wrap_counter << 31;
> > > >   		if (copy_to_user(argp, &s, sizeof s))
> > > >   			r = -EFAULT;
> > > >   		break;
> > > [...]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [RFC] VSOCK: The performance problem of vhost_vsock.
From: Jason Wang @ 2018-10-15  2:33 UTC (permalink / raw)
  To: jiangyiwen, stefanha; +Cc: netdev, kvm, virtualization
In-Reply-To: <5BC3F0D4.60409@huawei.com>



On 2018年10月15日 09:43, jiangyiwen wrote:
> Hi Stefan & All:
>
> Now I find vhost-vsock has two performance problems even if it
> is not designed for performance.
>
> First, I think vhost-vsock should faster than vhost-net because it
> is no TCP/IP stack, but the real test result vhost-net is 5~10
> times than vhost-vsock, currently I am looking for the reason.

TCP/IP is not a must for vhost-net.

How do you test and compare the performance?

Thanks

> Second, vhost-vsock only supports two vqs(tx and rx), that means
> if multiple sockets in the guest will use the same vq to transmit
> the message and get the response. So if there are multiple applications
> in the guest, we should support "Multiqueue" feature for Virtio-vsock.
>
> Stefan, have you encountered these problems?
>
> Thanks,
> Yiwen.
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH net-next V2 6/8] vhost: packed ring support
From: Jason Wang @ 2018-10-15  2:22 UTC (permalink / raw)
  To: Michael S. Tsirkin, Tiwei Bie
  Cc: kvm, netdev, linux-kernel, virtualization, maxime.coquelin, wexu
In-Reply-To: <20181012131812-mutt-send-email-mst@kernel.org>



On 2018年10月13日 01:23, Michael S. Tsirkin wrote:
> On Fri, Oct 12, 2018 at 10:32:44PM +0800, Tiwei Bie wrote:
>> On Mon, Jul 16, 2018 at 11:28:09AM +0800, Jason Wang wrote:
>> [...]
>>> @@ -1367,10 +1397,48 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
>>>   		vq->last_avail_idx = s.num;
>>>   		/* Forget the cached index value. */
>>>   		vq->avail_idx = vq->last_avail_idx;
>>> +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
>>> +			vq->last_avail_wrap_counter = wrap_counter;
>>> +			vq->avail_wrap_counter = vq->last_avail_wrap_counter;
>>> +		}
>>>   		break;
>>>   	case VHOST_GET_VRING_BASE:
>>>   		s.index = idx;
>>>   		s.num = vq->last_avail_idx;
>>> +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
>>> +			s.num |= vq->last_avail_wrap_counter << 31;
>>> +		if (copy_to_user(argp, &s, sizeof(s)))
>>> +			r = -EFAULT;
>>> +		break;
>>> +	case VHOST_SET_VRING_USED_BASE:
>>> +		/* Moving base with an active backend?
>>> +		 * You don't want to do that.
>>> +		 */
>>> +		if (vq->private_data) {
>>> +			r = -EBUSY;
>>> +			break;
>>> +		}
>>> +		if (copy_from_user(&s, argp, sizeof(s))) {
>>> +			r = -EFAULT;
>>> +			break;
>>> +		}
>>> +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
>>> +			wrap_counter = s.num >> 31;
>>> +			s.num &= ~(1 << 31);
>>> +		}
>>> +		if (s.num > 0xffff) {
>>> +			r = -EINVAL;
>>> +			break;
>>> +		}
>> Do we want to put wrap_counter at bit 15?
> I think I second that - seems to be consistent with
> e.g. event suppression structure and the proposed
> extension to driver notifications.

Ok, I assumes packed virtqueue support 64K but looks not. I can change 
it to bit 15 and GET_VRING_BASE need to be changed as well.

>
>
>> If put wrap_counter at bit 31, the check (s.num > 0xffff)
>> won't be able to catch the illegal index 0x8000~0xffff for
>> packed ring.
>>

Do we need to clarify this in the spec?

>>> +		vq->last_used_idx = s.num;
>>> +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
>>> +			vq->last_used_wrap_counter = wrap_counter;
>>> +		break;
>>> +	case VHOST_GET_VRING_USED_BASE:
>> Do we need the new VHOST_GET_VRING_USED_BASE and
>> VHOST_SET_VRING_USED_BASE ops?
>>
>> We are going to merge below series in DPDK:
>>
>> http://patches.dpdk.org/patch/45874/
>>
>> We may need to reach an agreement first.

If we agree that 64K virtqueue won't be supported, I'm ok with either.

Btw the code assumes used_wrap_counter is equal to avail_wrap_counter 
which looks wrong?

Thanks

>>
>>> +		s.index = idx;
>>> +		s.num = vq->last_used_idx;
>>> +		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
>>> +			s.num |= vq->last_used_wrap_counter << 31;
>>>   		if (copy_to_user(argp, &s, sizeof s))
>>>   			r = -EFAULT;
>>>   		break;
>> [...]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* [RFC] VSOCK: The performance problem of vhost_vsock.
From: jiangyiwen @ 2018-10-15  1:43 UTC (permalink / raw)
  To: stefanha; +Cc: netdev, kvm, virtualization

Hi Stefan & All:

Now I find vhost-vsock has two performance problems even if it
is not designed for performance.

First, I think vhost-vsock should faster than vhost-net because it
is no TCP/IP stack, but the real test result vhost-net is 5~10
times than vhost-vsock, currently I am looking for the reason.

Second, vhost-vsock only supports two vqs(tx and rx), that means
if multiple sockets in the guest will use the same vq to transmit
the message and get the response. So if there are multiple applications
in the guest, we should support "Multiqueue" feature for Virtio-vsock.

Stefan, have you encountered these problems?

Thanks,
Yiwen.

^ permalink raw reply

* Re: [PATCH v8] virtio_blk: add discard and write zeroes support
From: Michael S. Tsirkin @ 2018-10-15  0:54 UTC (permalink / raw)
  To: Daniel Verkamp
  Cc: Jens Axboe, virtualization, linux-block, Stefan Hajnoczi,
	pbonzini, Changpeng Liu
In-Reply-To: <20181012210628.226361-1-dverkamp@chromium.org>

On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote:
> From: Changpeng Liu <changpeng.liu@intel.com>
> 
> In commit 88c85538, "virtio-blk: add discard and write zeroes features
> to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio
> block specification has been extended to add VIRTIO_BLK_T_DISCARD and
> VIRTIO_BLK_T_WRITE_ZEROES commands.  This patch enables support for
> discard and write zeroes in the virtio-blk driver when the device
> advertises the corresponding features, VIRTIO_BLK_F_DISCARD and
> VIRTIO_BLK_F_WRITE_ZEROES.
> 
> Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
> Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>

Cc Paolo as well.

> ---
> dverkamp: I've picked up this patch and made a few minor changes (as
> listed below); most notably, I changed the kmalloc back to GFP_ATOMIC,
> since it can be called from a context where sleeping is not allowed.
> To prevent large allocations, I've also clamped the maximum number of
> discard segments to 256; this results in a 4K allocation and should be
> plenty of descriptors for most use cases.
> 
> I also removed most of the description from the commit message, since it
> was duplicating the comments from virtio_blk.h and quoting parts of the
> spec without adding any extra information.  I have tested this iteration
> of the patch using crosvm with modifications to enable the new features:
> https://chromium.googlesource.com/chromiumos/platform/crosvm/
> 
> CHANGELOG:
> v8: [dverkamp] replace shifts by 9 with SECTOR_SHIFT constant
> v7: [dverkamp] use GFP_ATOMIC for allocation that may not sleep; clarify
> descriptor flags field; comment wording cleanups.
> v6: don't set T_OUT bit to discard and write zeroes commands.
> v5: use new block layer API: blk_queue_flag_set.
> v4: several optimizations based on MST's comments, remove bit field
> usage for command descriptor.
> v3: define the virtio-blk protocol to add discard and write zeroes
> support, first version implementation based on proposed specification.
> v2: add write zeroes command support.
> v1: initial proposal implementation for discard command.
> ---
>  drivers/block/virtio_blk.c      | 95 ++++++++++++++++++++++++++++++++-
>  include/uapi/linux/virtio_blk.h | 54 +++++++++++++++++++
>  2 files changed, 147 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 23752dc99b00..04a7ae602e2f 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -18,6 +18,7 @@
>  
>  #define PART_BITS 4
>  #define VQ_NAME_LEN 16
> +#define MAX_DISCARD_SEGMENTS 256
>  
>  static int major;
>  static DEFINE_IDA(vd_index_ida);
> @@ -172,10 +173,50 @@ static int virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr,
>  	return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC);
>  }
>  
> +
> +static inline int virtblk_setup_discard_write_zeroes(struct request *req,
> +						bool unmap)
> +{
> +	unsigned short segments = blk_rq_nr_discard_segments(req);
> +	unsigned short n = 0;
> +	struct virtio_blk_discard_write_zeroes *range;
> +	struct bio *bio;
> +	u32 flags = 0;
> +
> +	if (unmap)
> +		flags |= VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP;
> +
> +	range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC);
> +	if (!range)
> +		return -ENOMEM;
> +
> +	__rq_for_each_bio(bio, req) {
> +		u64 sector = bio->bi_iter.bi_sector;
> +		u32 num_sectors = bio->bi_iter.bi_size >> SECTOR_SHIFT;
> +
> +		range[n].flags = cpu_to_le32(flags);
> +		range[n].num_sectors = cpu_to_le32(num_sectors);
> +		range[n].sector = cpu_to_le64(sector);
> +		n++;
> +	}
> +
> +	req->special_vec.bv_page = virt_to_page(range);
> +	req->special_vec.bv_offset = offset_in_page(range);
> +	req->special_vec.bv_len = sizeof(*range) * segments;
> +	req->rq_flags |= RQF_SPECIAL_PAYLOAD;
> +
> +	return 0;
> +}
> +
>  static inline void virtblk_request_done(struct request *req)
>  {
>  	struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
>  
> +	if (req->rq_flags & RQF_SPECIAL_PAYLOAD) {
> +		kfree(page_address(req->special_vec.bv_page) +
> +		      req->special_vec.bv_offset);
> +	}
> +
>  	switch (req_op(req)) {
>  	case REQ_OP_SCSI_IN:
>  	case REQ_OP_SCSI_OUT:
> @@ -225,6 +266,7 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>  	int qid = hctx->queue_num;
>  	int err;
>  	bool notify = false;
> +	bool unmap = false;
>  	u32 type;
>  
>  	BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
> @@ -237,6 +279,13 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>  	case REQ_OP_FLUSH:
>  		type = VIRTIO_BLK_T_FLUSH;
>  		break;
> +	case REQ_OP_DISCARD:
> +		type = VIRTIO_BLK_T_DISCARD;
> +		break;
> +	case REQ_OP_WRITE_ZEROES:
> +		type = VIRTIO_BLK_T_WRITE_ZEROES;
> +		unmap = !(req->cmd_flags & REQ_NOUNMAP);
> +		break;
>  	case REQ_OP_SCSI_IN:
>  	case REQ_OP_SCSI_OUT:
>  		type = VIRTIO_BLK_T_SCSI_CMD;
> @@ -256,6 +305,12 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>  
>  	blk_mq_start_request(req);
>  
> +	if (type == VIRTIO_BLK_T_DISCARD || type == VIRTIO_BLK_T_WRITE_ZEROES) {
> +		err = virtblk_setup_discard_write_zeroes(req, unmap);
> +		if (err)
> +			return BLK_STS_RESOURCE;
> +	}
> +
>  	num = blk_rq_map_sg(hctx->queue, req, vbr->sg);
>  	if (num) {
>  		if (rq_data_dir(req) == WRITE)
> @@ -777,6 +832,42 @@ static int virtblk_probe(struct virtio_device *vdev)
>  	if (!err && opt_io_size)
>  		blk_queue_io_opt(q, blk_size * opt_io_size);
>  
> +	if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) {
> +		q->limits.discard_granularity = blk_size;
> +
> +		virtio_cread(vdev, struct virtio_blk_config,
> +				discard_sector_alignment, &v);
> +		if (v)
> +			q->limits.discard_alignment = v << SECTOR_SHIFT;
> +		else
> +			q->limits.discard_alignment = 0;
> +
> +		virtio_cread(vdev, struct virtio_blk_config,
> +				max_discard_sectors, &v);
> +		if (v)
> +			blk_queue_max_discard_sectors(q, v);
> +		else
> +			blk_queue_max_discard_sectors(q, UINT_MAX);
> +
> +		virtio_cread(vdev, struct virtio_blk_config, max_discard_seg,
> +				&v);
> +		if (v && v <= MAX_DISCARD_SEGMENTS)
> +			blk_queue_max_discard_segments(q, v);
> +		else
> +			blk_queue_max_discard_segments(q, MAX_DISCARD_SEGMENTS);
> +
> +		blk_queue_flag_set(QUEUE_FLAG_DISCARD, q);
> +	}
> +
> +	if (virtio_has_feature(vdev, VIRTIO_BLK_F_WRITE_ZEROES)) {
> +		virtio_cread(vdev, struct virtio_blk_config,
> +				max_write_zeroes_sectors, &v);
> +		if (v)
> +			blk_queue_max_write_zeroes_sectors(q, v);
> +		else
> +			blk_queue_max_write_zeroes_sectors(q, UINT_MAX);
> +	}
> +
>  	virtblk_update_capacity(vblk, false);
>  	virtio_device_ready(vdev);
>  
> @@ -885,14 +976,14 @@ static unsigned int features_legacy[] = {
>  	VIRTIO_BLK_F_SCSI,
>  #endif
>  	VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE,
> -	VIRTIO_BLK_F_MQ,
> +	VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES,
>  }
>  ;
>  static unsigned int features[] = {
>  	VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY,
>  	VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE,
>  	VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE,
> -	VIRTIO_BLK_F_MQ,
> +	VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES,
>  };
>  
>  static struct virtio_driver virtio_blk = {
> diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h
> index 9ebe4d968dd5..682afbfe3aa4 100644
> --- a/include/uapi/linux/virtio_blk.h
> +++ b/include/uapi/linux/virtio_blk.h
> @@ -38,6 +38,8 @@
>  #define VIRTIO_BLK_F_BLK_SIZE	6	/* Block size of disk is available*/
>  #define VIRTIO_BLK_F_TOPOLOGY	10	/* Topology information is available */
>  #define VIRTIO_BLK_F_MQ		12	/* support more than one vq */
> +#define VIRTIO_BLK_F_DISCARD	13	/* DISCARD is supported */
> +#define VIRTIO_BLK_F_WRITE_ZEROES	14	/* WRITE ZEROES is supported */
>  
>  /* Legacy feature bits */
>  #ifndef VIRTIO_BLK_NO_LEGACY
> @@ -86,6 +88,39 @@ struct virtio_blk_config {
>  
>  	/* number of vqs, only available when VIRTIO_BLK_F_MQ is set */
>  	__u16 num_queues;
> +
> +	/* the next 3 entries are guarded by VIRTIO_BLK_F_DISCARD */
> +	/*
> +	 * The maximum discard sectors (in 512-byte sectors) for
> +	 * one segment.
> +	 */
> +	__u32 max_discard_sectors;
> +	/*
> +	 * The maximum number of discard segments in a
> +	 * discard command.
> +	 */
> +	__u32 max_discard_seg;
> +	/* Discard commands must be aligned to this number of sectors. */
> +	__u32 discard_sector_alignment;
> +
> +	/* the next 3 entries are guarded by VIRTIO_BLK_F_WRITE_ZEROES */
> +	/*
> +	 * The maximum number of write zeroes sectors (in 512-byte sectors) in
> +	 * one segment.
> +	 */
> +	__u32 max_write_zeroes_sectors;
> +	/*
> +	 * The maximum number of segments in a write zeroes
> +	 * command.
> +	 */
> +	__u32 max_write_zeroes_seg;
> +	/*
> +	 * Set if a VIRTIO_BLK_T_WRITE_ZEROES request may result in the
> +	 * deallocation of one or more of the sectors.
> +	 */
> +	__u8 write_zeroes_may_unmap;
> +
> +	__u8 unused1[3];
>  } __attribute__((packed));
>  
>  /*
> @@ -114,6 +149,12 @@ struct virtio_blk_config {
>  /* Get device ID command */
>  #define VIRTIO_BLK_T_GET_ID    8
>  
> +/* Discard command */
> +#define VIRTIO_BLK_T_DISCARD	11
> +
> +/* Write zeroes command */
> +#define VIRTIO_BLK_T_WRITE_ZEROES	13
> +
>  #ifndef VIRTIO_BLK_NO_LEGACY
>  /* Barrier before this op. */
>  #define VIRTIO_BLK_T_BARRIER	0x80000000
> @@ -133,6 +174,19 @@ struct virtio_blk_outhdr {
>  	__virtio64 sector;
>  };
>  
> +/* Unmap this range (only valid for write zeroes command) */
> +#define VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP	0x00000001
> +
> +/* Discard/write zeroes range for each request. */
> +struct virtio_blk_discard_write_zeroes {
> +	/* discard/write zeroes start sector */
> +	__virtio64 sector;
> +	/* number of discard/write zeroes sectors */
> +	__virtio32 num_sectors;
> +	/* flags for this range */
> +	__virtio32 flags;
> +};
> +
>  #ifndef VIRTIO_BLK_NO_LEGACY
>  struct virtio_scsi_inhdr {
>  	__virtio32 errors;
> -- 
> 2.19.0.605.g01d371f741-goog

^ permalink raw reply

* Re: PROPOSAL: Extend inline asm syntax with size spec
From: Borislav Petkov @ 2018-10-13 21:30 UTC (permalink / raw)
  To: Alexander Monakov
  Cc: Kate Stewart, Peter Zijlstra, Christopher Li, virtualization,
	Masahiro Yamada, Nadav Amit, Jan Beulich, H. Peter Anvin,
	Sam Ravnborg, Ingo Molnar, x86, linux-sparse, Ingo Molnar,
	linux-xtensa, Kees Cook, Segher Boessenkool, Chris Zankel,
	Michael Matz, Josh Poimboeuf, Alok Kataria, Juergen Gross, gcc,
	Richard Biener, Max Filippov, Greg Kroah-Hartman
In-Reply-To: <alpine.LNX.2.20.13.1810132355180.13914@monopod.intra.ispras.ru>

On Sun, Oct 14, 2018 at 12:14:02AM +0300, Alexander Monakov wrote:
> I apologize for coming in late here with an alternative proposal, but would
> you be happy if GCC gave you a way to designate a portion of the asm template
> string that shouldn't be counted as its cost because it doesn't go into the
> .text section? This wouldn't interact with your redefinitions of the inline
> keyword, and you could do something like (assuming we go with %` ... %`
> delimiters)

I don't mind it but I see you guys are still discussing what would be
the better solution here, on the gcc ML. And we, as one user, are a
happy camper as long as it does what it is meant to do. But how the
feature looks like syntactically is something for gcc folk to decide as
they're going to support it for the foreseeable future and I'm very well
aware of how important it is for a supportable feature to be designed
properly.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply

* Re: PROPOSAL: Extend inline asm syntax with size spec
From: Borislav Petkov @ 2018-10-13 19:33 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Kate Stewart, Peter Zijlstra, Christopher Li, virtualization,
	Masahiro Yamada, Nadav Amit, Jan Beulich, H. Peter Anvin,
	Sam Ravnborg, Ingo Molnar, x86, linux-sparse, Ingo Molnar,
	linux-xtensa, Kees Cook, Chris Zankel, Michael Matz,
	Josh Poimboeuf, Alok Kataria, Juergen Gross, gcc, Richard Biener,
	Max Filippov, Greg Kroah-Hartman, linux-kernel, Thomas Gleixner
In-Reply-To: <20181010191427.GF5533@zn.tnic>

Ok,

with Segher's help I've been playing with his patch ontop of bleeding
edge gcc 9 and here are my observations. Please double-check me for
booboos so that they can be addressed while there's time.

So here's what I see ontop of 4.19-rc7:

First marked the alternative asm() as inline and undeffed the "inline"
keyword. I need to do that because of the funky games we do with
"inline" redefinitions in include/linux/compiler_types.h.

And Segher hinted at either doing:

asm volatile inline(...

or

asm volatile __inline__(...

but both "inline" variants are defined as macros in that file.

Which means we either need to #undef inline before using it in asm() or
come up with something cleverer.

Anyway, did this:

---
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 4cd6a3b71824..7c0639087da7 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -165,11 +165,13 @@ static inline int alternatives_text_reserved(void *start, void *end)
  * For non barrier like inlines please define new variants
  * without volatile and memory clobber.
  */
+
+#undef inline
 #define alternative(oldinstr, newinstr, feature)                       \
-       asm volatile (ALTERNATIVE(oldinstr, newinstr, feature) : : : "memory")
+       asm volatile inline(ALTERNATIVE(oldinstr, newinstr, feature) : : : "memory")
 
 #define alternative_2(oldinstr, newinstr1, feature1, newinstr2, feature2) \
-       asm volatile(ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2) ::: "memory")
+       asm volatile inline(ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2) ::: "memory")
 
 /*
  * Alternative inline assembly with input.
---

Build failed at link time with:

arch/x86/boot/compressed/cmdline.o: In function `native_save_fl':
cmdline.c:(.text+0x0): multiple definition of `native_save_fl'
arch/x86/boot/compressed/misc.o:misc.c:(.text+0x1e0): first defined here
arch/x86/boot/compressed/cmdline.o: In function `native_restore_fl':
cmdline.c:(.text+0x10): multiple definition of `native_restore_fl'
arch/x86/boot/compressed/misc.o:misc.c:(.text+0x1f0): first defined here
arch/x86/boot/compressed/error.o: In function `native_save_fl':
error.c:(.text+0x0): multiple definition of `native_save_fl'

which I had to fix with this:

---
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 15450a675031..0d772598c37c 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -14,8 +14,7 @@
  */
 
 /* Declaration required for gcc < 4.9 to prevent -Werror=missing-prototypes */
-extern inline unsigned long native_save_fl(void);
-extern inline unsigned long native_save_fl(void)
+static inline unsigned long native_save_fl(void)
 {
        unsigned long flags;
 
@@ -33,8 +32,7 @@ ex
---

That "extern inline" declaration looks fishy to me anyway, maybe not really
needed ? Apparently, gcc < 4.9 complains with -Werror=missing-prototypes...

Then the build worked and the results look like this:

   text    data     bss     dec     hex filename
17287384        5040656 2019532 24347572        17383b4 vmlinux-before
17288020        5040656 2015436 24344112        1737630 vmlinux-2nd-version

so some inlining must be happening.

Then I did this:

---
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index 9c5606d88f61..a0170344cf08 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -20,7 +20,7 @@ unsigned long __clear_user(void __user *addr, unsigned long size)
        /* no memory constraint because it doesn't change any memory gcc knows
           about */
        stac();
-       asm volatile(
+       asm volatile inline(
                "       testq  %[size8],%[size8]\n"
                "       jz     4f\n"
                "0:     movq $0,(%[dst])\n"
---

to force inlining of a somewhat bigger asm() statement. And yap, more
got inlined:

   text    data     bss     dec     hex filename
17287384        5040656 2019532 24347572        17383b4 vmlinux-before
17288020        5040656 2015436 24344112        1737630 vmlinux-2nd-version
17288076        5040656 2015436 24344168        1737668 vmlinux-2nd-version__clear_user

so more stuff gets inlined.

Looking at the asm output, it had before:

---
clear_user:
# ./arch/x86/include/asm/current.h:15:  return this_cpu_read_stable(current_task);
#APP
# 15 "./arch/x86/include/asm/current.h" 1
        movq %gs:current_task,%rax      #, pfo_ret__
# 0 "" 2
# arch/x86/lib/usercopy_64.c:51:        if (access_ok(VERIFY_WRITE, to, n))
#NO_APP
        movq    2520(%rax), %rdx        # pfo_ret___12->thread.addr_limit.seg, _1
        movq    %rdi, %rax      # to, tmp93
        addq    %rsi, %rax      # n, tmp93
        jc      .L3     #,
# arch/x86/lib/usercopy_64.c:51:        if (access_ok(VERIFY_WRITE, to, n))
        cmpq    %rax, %rdx      # tmp93, _1
        jb      .L3     #,
# arch/x86/lib/usercopy_64.c:52:                return __clear_user(to, n);
        jmp     __clear_user    #
---

note the JMP to __clear_user. After marking the asm() in __clear_user() as
inline, clear_user() inlines __clear_user() directly:

---
clear_user:
# ./arch/x86/include/asm/current.h:15:  return this_cpu_read_stable(current_task);
#APP
# 15 "./arch/x86/include/asm/current.h" 1
        movq %gs:current_task,%rax      #, pfo_ret__
# 0 "" 2
# arch/x86/lib/usercopy_64.c:51:        if (access_ok(VERIFY_WRITE, to, n))
#NO_APP
        movq    2520(%rax), %rdx        # pfo_ret___12->thread.addr_limit.seg, _1
        movq    %rdi, %rax      # to, tmp95
        addq    %rsi, %rax      # n, tmp95
        jc      .L8     #,
# arch/x86/lib/usercopy_64.c:51:        if (access_ok(VERIFY_WRITE, to, n))
        cmpq    %rax, %rdx      # tmp95, _1
        jb      .L8     #,
# ./arch/x86/include/asm/smap.h:58:     alternative("", __stringify(__ASM_STAC), X86_FEATURE_SMAP);
...

this last line is the stac() macro which gets inlined due to the
alternative() inlined macro above.

So I guess this all looks like what we wanted.

And if this lands in gcc9, we would need to do a asm_volatile() macro
which is defined differently based on the compiler used.

Thoughts, suggestions, etc are most welcome.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related

* [PATCH v8] virtio_blk: add discard and write zeroes support
From: Daniel Verkamp @ 2018-10-12 21:06 UTC (permalink / raw)
  To: virtualization, linux-block
  Cc: Jens Axboe, Michael S. Tsirkin, Stefan Hajnoczi, Changpeng Liu
In-Reply-To: <1528258740-6581-1-git-send-email-changpeng.liu@intel.com>

From: Changpeng Liu <changpeng.liu@intel.com>

In commit 88c85538, "virtio-blk: add discard and write zeroes features
to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio
block specification has been extended to add VIRTIO_BLK_T_DISCARD and
VIRTIO_BLK_T_WRITE_ZEROES commands.  This patch enables support for
discard and write zeroes in the virtio-blk driver when the device
advertises the corresponding features, VIRTIO_BLK_F_DISCARD and
VIRTIO_BLK_F_WRITE_ZEROES.

Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
---
dverkamp: I've picked up this patch and made a few minor changes (as
listed below); most notably, I changed the kmalloc back to GFP_ATOMIC,
since it can be called from a context where sleeping is not allowed.
To prevent large allocations, I've also clamped the maximum number of
discard segments to 256; this results in a 4K allocation and should be
plenty of descriptors for most use cases.

I also removed most of the description from the commit message, since it
was duplicating the comments from virtio_blk.h and quoting parts of the
spec without adding any extra information.  I have tested this iteration
of the patch using crosvm with modifications to enable the new features:
https://chromium.googlesource.com/chromiumos/platform/crosvm/

CHANGELOG:
v8: [dverkamp] replace shifts by 9 with SECTOR_SHIFT constant
v7: [dverkamp] use GFP_ATOMIC for allocation that may not sleep; clarify
descriptor flags field; comment wording cleanups.
v6: don't set T_OUT bit to discard and write zeroes commands.
v5: use new block layer API: blk_queue_flag_set.
v4: several optimizations based on MST's comments, remove bit field
usage for command descriptor.
v3: define the virtio-blk protocol to add discard and write zeroes
support, first version implementation based on proposed specification.
v2: add write zeroes command support.
v1: initial proposal implementation for discard command.
---
 drivers/block/virtio_blk.c      | 95 ++++++++++++++++++++++++++++++++-
 include/uapi/linux/virtio_blk.h | 54 +++++++++++++++++++
 2 files changed, 147 insertions(+), 2 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 23752dc99b00..04a7ae602e2f 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -18,6 +18,7 @@
 
 #define PART_BITS 4
 #define VQ_NAME_LEN 16
+#define MAX_DISCARD_SEGMENTS 256
 
 static int major;
 static DEFINE_IDA(vd_index_ida);
@@ -172,10 +173,50 @@ static int virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr,
 	return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC);
 }
 
+
+static inline int virtblk_setup_discard_write_zeroes(struct request *req,
+						bool unmap)
+{
+	unsigned short segments = blk_rq_nr_discard_segments(req);
+	unsigned short n = 0;
+	struct virtio_blk_discard_write_zeroes *range;
+	struct bio *bio;
+	u32 flags = 0;
+
+	if (unmap)
+		flags |= VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP;
+
+	range = kmalloc_array(segments, sizeof(*range), GFP_ATOMIC);
+	if (!range)
+		return -ENOMEM;
+
+	__rq_for_each_bio(bio, req) {
+		u64 sector = bio->bi_iter.bi_sector;
+		u32 num_sectors = bio->bi_iter.bi_size >> SECTOR_SHIFT;
+
+		range[n].flags = cpu_to_le32(flags);
+		range[n].num_sectors = cpu_to_le32(num_sectors);
+		range[n].sector = cpu_to_le64(sector);
+		n++;
+	}
+
+	req->special_vec.bv_page = virt_to_page(range);
+	req->special_vec.bv_offset = offset_in_page(range);
+	req->special_vec.bv_len = sizeof(*range) * segments;
+	req->rq_flags |= RQF_SPECIAL_PAYLOAD;
+
+	return 0;
+}
+
 static inline void virtblk_request_done(struct request *req)
 {
 	struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
 
+	if (req->rq_flags & RQF_SPECIAL_PAYLOAD) {
+		kfree(page_address(req->special_vec.bv_page) +
+		      req->special_vec.bv_offset);
+	}
+
 	switch (req_op(req)) {
 	case REQ_OP_SCSI_IN:
 	case REQ_OP_SCSI_OUT:
@@ -225,6 +266,7 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
 	int qid = hctx->queue_num;
 	int err;
 	bool notify = false;
+	bool unmap = false;
 	u32 type;
 
 	BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
@@ -237,6 +279,13 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
 	case REQ_OP_FLUSH:
 		type = VIRTIO_BLK_T_FLUSH;
 		break;
+	case REQ_OP_DISCARD:
+		type = VIRTIO_BLK_T_DISCARD;
+		break;
+	case REQ_OP_WRITE_ZEROES:
+		type = VIRTIO_BLK_T_WRITE_ZEROES;
+		unmap = !(req->cmd_flags & REQ_NOUNMAP);
+		break;
 	case REQ_OP_SCSI_IN:
 	case REQ_OP_SCSI_OUT:
 		type = VIRTIO_BLK_T_SCSI_CMD;
@@ -256,6 +305,12 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	blk_mq_start_request(req);
 
+	if (type == VIRTIO_BLK_T_DISCARD || type == VIRTIO_BLK_T_WRITE_ZEROES) {
+		err = virtblk_setup_discard_write_zeroes(req, unmap);
+		if (err)
+			return BLK_STS_RESOURCE;
+	}
+
 	num = blk_rq_map_sg(hctx->queue, req, vbr->sg);
 	if (num) {
 		if (rq_data_dir(req) == WRITE)
@@ -777,6 +832,42 @@ static int virtblk_probe(struct virtio_device *vdev)
 	if (!err && opt_io_size)
 		blk_queue_io_opt(q, blk_size * opt_io_size);
 
+	if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) {
+		q->limits.discard_granularity = blk_size;
+
+		virtio_cread(vdev, struct virtio_blk_config,
+				discard_sector_alignment, &v);
+		if (v)
+			q->limits.discard_alignment = v << SECTOR_SHIFT;
+		else
+			q->limits.discard_alignment = 0;
+
+		virtio_cread(vdev, struct virtio_blk_config,
+				max_discard_sectors, &v);
+		if (v)
+			blk_queue_max_discard_sectors(q, v);
+		else
+			blk_queue_max_discard_sectors(q, UINT_MAX);
+
+		virtio_cread(vdev, struct virtio_blk_config, max_discard_seg,
+				&v);
+		if (v && v <= MAX_DISCARD_SEGMENTS)
+			blk_queue_max_discard_segments(q, v);
+		else
+			blk_queue_max_discard_segments(q, MAX_DISCARD_SEGMENTS);
+
+		blk_queue_flag_set(QUEUE_FLAG_DISCARD, q);
+	}
+
+	if (virtio_has_feature(vdev, VIRTIO_BLK_F_WRITE_ZEROES)) {
+		virtio_cread(vdev, struct virtio_blk_config,
+				max_write_zeroes_sectors, &v);
+		if (v)
+			blk_queue_max_write_zeroes_sectors(q, v);
+		else
+			blk_queue_max_write_zeroes_sectors(q, UINT_MAX);
+	}
+
 	virtblk_update_capacity(vblk, false);
 	virtio_device_ready(vdev);
 
@@ -885,14 +976,14 @@ static unsigned int features_legacy[] = {
 	VIRTIO_BLK_F_SCSI,
 #endif
 	VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE,
-	VIRTIO_BLK_F_MQ,
+	VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES,
 }
 ;
 static unsigned int features[] = {
 	VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY,
 	VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE,
 	VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE,
-	VIRTIO_BLK_F_MQ,
+	VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES,
 };
 
 static struct virtio_driver virtio_blk = {
diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h
index 9ebe4d968dd5..682afbfe3aa4 100644
--- a/include/uapi/linux/virtio_blk.h
+++ b/include/uapi/linux/virtio_blk.h
@@ -38,6 +38,8 @@
 #define VIRTIO_BLK_F_BLK_SIZE	6	/* Block size of disk is available*/
 #define VIRTIO_BLK_F_TOPOLOGY	10	/* Topology information is available */
 #define VIRTIO_BLK_F_MQ		12	/* support more than one vq */
+#define VIRTIO_BLK_F_DISCARD	13	/* DISCARD is supported */
+#define VIRTIO_BLK_F_WRITE_ZEROES	14	/* WRITE ZEROES is supported */
 
 /* Legacy feature bits */
 #ifndef VIRTIO_BLK_NO_LEGACY
@@ -86,6 +88,39 @@ struct virtio_blk_config {
 
 	/* number of vqs, only available when VIRTIO_BLK_F_MQ is set */
 	__u16 num_queues;
+
+	/* the next 3 entries are guarded by VIRTIO_BLK_F_DISCARD */
+	/*
+	 * The maximum discard sectors (in 512-byte sectors) for
+	 * one segment.
+	 */
+	__u32 max_discard_sectors;
+	/*
+	 * The maximum number of discard segments in a
+	 * discard command.
+	 */
+	__u32 max_discard_seg;
+	/* Discard commands must be aligned to this number of sectors. */
+	__u32 discard_sector_alignment;
+
+	/* the next 3 entries are guarded by VIRTIO_BLK_F_WRITE_ZEROES */
+	/*
+	 * The maximum number of write zeroes sectors (in 512-byte sectors) in
+	 * one segment.
+	 */
+	__u32 max_write_zeroes_sectors;
+	/*
+	 * The maximum number of segments in a write zeroes
+	 * command.
+	 */
+	__u32 max_write_zeroes_seg;
+	/*
+	 * Set if a VIRTIO_BLK_T_WRITE_ZEROES request may result in the
+	 * deallocation of one or more of the sectors.
+	 */
+	__u8 write_zeroes_may_unmap;
+
+	__u8 unused1[3];
 } __attribute__((packed));
 
 /*
@@ -114,6 +149,12 @@ struct virtio_blk_config {
 /* Get device ID command */
 #define VIRTIO_BLK_T_GET_ID    8
 
+/* Discard command */
+#define VIRTIO_BLK_T_DISCARD	11
+
+/* Write zeroes command */
+#define VIRTIO_BLK_T_WRITE_ZEROES	13
+
 #ifndef VIRTIO_BLK_NO_LEGACY
 /* Barrier before this op. */
 #define VIRTIO_BLK_T_BARRIER	0x80000000
@@ -133,6 +174,19 @@ struct virtio_blk_outhdr {
 	__virtio64 sector;
 };
 
+/* Unmap this range (only valid for write zeroes command) */
+#define VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP	0x00000001
+
+/* Discard/write zeroes range for each request. */
+struct virtio_blk_discard_write_zeroes {
+	/* discard/write zeroes start sector */
+	__virtio64 sector;
+	/* number of discard/write zeroes sectors */
+	__virtio32 num_sectors;
+	/* flags for this range */
+	__virtio32 flags;
+};
+
 #ifndef VIRTIO_BLK_NO_LEGACY
 struct virtio_scsi_inhdr {
 	__virtio32 errors;
-- 
2.19.0.605.g01d371f741-goog

^ permalink raw reply related

* Re: [PATCH v3 0/7] Add virtio-iommu driver
From: Jean-Philippe Brucker @ 2018-10-12 18:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: mark.rutland, peter.maydell, tnowicki, devicetree, marc.zyngier,
	linux-pci, will.deacon, virtualization, iommu, robh+dt,
	robin.murphy, kvmarm
In-Reply-To: <20181012125443-mutt-send-email-mst@kernel.org>

On 12/10/2018 18:00, Michael S. Tsirkin wrote:
> This all looks good to me. Minor nits:
> - I think DEBUG mode is best just removed for now
> - Slightly wrong patch splitup causing a misaligned structure
>   in uapi until all patches are applied.

Thanks a lot for the review, I'll fix these up and send a new version

> You should Cc Bjorn on the pci change - I'd like to see his ack on it
> being merged through my tree.

Argh, I don't know how I missed him. However patches 1-4 are device tree
changes, and need acks from Rob or Mark (on Cc)

> And pls Cc the virtio-dev list on any virtio uapi changes.
> 
> At a feature level I have some ideas for more features we
> could add, but for now I think I'll put this version in -next
> while you iron out the above wrinkles. Hope you can make the
> merge window.
Thanks, I also have some work lined up for hardware acceleration and
shared address spaces.

Jean

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox