* Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
From: Michael S. Tsirkin @ 2011-11-30 14:59 UTC
To: Ohad Ben-Cohen; +Cc: linux-arm-kernel, linux-kernel, kvm, virtualization
In-Reply-To: <CAK=WgbZK+k4uuyMPHUfr_PMyrR5ceRTvuDRpmhXfgwpSUOPw0g@mail.gmail.com>
On Wed, Nov 30, 2011 at 01:45:05PM +0200, Ohad Ben-Cohen wrote:
> > So you put virtio rings in MMIO memory?
>
> I'll be precise: the vrings are created in non-cacheable memory, which
> both processors have access to.
>
> > Could you please give a couple of examples of breakage?
>
> Sure. Basically, the order of the vring memory operations appears
> different to the observing processor. For example, avail->idx gets
> updated before the new entry is put in the available array...
I see. And this happens because the ARM processor reorders
memory writes to this uncacheable memory?
And in an SMP configuration, writes are somehow not reordered?
For example, if we had such an AMP configuration with an x86
processor, wmb() (sfence) would be wrong and smp_wmb() would be sufficient.
Just checking that this is not a bug in the smp_wmb implementation
for the specific platform.
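To make the ordering requirement concrete, here is a minimal user-space sketch in C11 of the publish step being discussed. It is not the kernel's vring code: the `toy_` names, the ring size and the use of `atomic_thread_fence` as a stand-in for a write barrier are all assumptions made for illustration. The C11 fence only shows where the barrier has to sit; whether an SMP barrier or a mandatory barrier is needed when the peer is a remote processor is exactly the open question here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u /* hypothetical ring size */

static_assert((TOY_RING_SIZE & (TOY_RING_SIZE - 1u)) == 0,
              "ring size must be a power of two");

/* Toy stand-in for the available ring; not the real struct vring_avail. */
struct toy_avail {
    _Atomic uint16_t idx;         /* free-running producer index */
    uint16_t ring[TOY_RING_SIZE]; /* descriptor heads */
};

/* Producer: write the ring entry first, then make it visible via idx. */
static void toy_avail_publish(struct toy_avail *avail, uint16_t head)
{
    uint16_t idx = atomic_load_explicit(&avail->idx, memory_order_relaxed);

    avail->ring[idx % TOY_RING_SIZE] = head;

    /* Stand-in for the write barrier: without it the idx update could
     * become visible before the ring entry, which is the reported bug. */
    atomic_thread_fence(memory_order_release);

    atomic_store_explicit(&avail->idx, (uint16_t)(idx + 1u),
                          memory_order_relaxed);
}

/* Consumer: returns 1 and stores the head if a new entry is available. */
static int toy_avail_consume(struct toy_avail *avail, uint16_t *last_seen,
                             uint16_t *head)
{
    uint16_t idx = atomic_load_explicit(&avail->idx, memory_order_relaxed);

    if (idx == *last_seen)
        return 0;

    /* Pairs with the release fence in toy_avail_publish(). */
    atomic_thread_fence(memory_order_acquire);

    *head = avail->ring[*last_seen % TOY_RING_SIZE];
    (*last_seen)++;
    return 1;
}
```

If the fence in toy_avail_publish() is removed, a weakly ordered CPU is allowed to make the new idx visible before the ring entry, and the consumer then reads a stale head.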
--
MST
* Re: [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Pawel Moll @ 2011-11-30 14:51 UTC
To: Arnd Bergmann, Stefano Stabellini
Cc: xen-devel@lists.xensource.com, linaro-dev@lists.linaro.org,
Ian Campbell, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
android-virt@lists.cs.columbia.edu,
embeddedxen-devel@lists.sourceforge.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <201111301432.54463.arnd@arndb.de>
On Wed, 2011-11-30 at 14:32 +0000, Arnd Bergmann wrote:
> I don't care much either way, but I think it would be good to
> use similar solutions across all hypervisors. The two options
> that I've seen discussed for KVM were to use either a virtual PCI
> bus with individual virtio-pci devices as on the PC, or to
> use the new virtio-mmio driver and individually put virtio devices
> into the device tree.
Let me just add that the virtio-mmio devices can already be instantiated
from DT (see Documentation/devicetree/bindings/virtio/mmio.txt).
For A9-based VE I'd suggest placing them around 0x1001e000, e.g.:

	virtio_block@1001e000 {
		compatible = "virtio,mmio";
		reg = <0x1001e000 0x100>;
		interrupts = <41>;
	};
Cheers!
Paweł
* Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
From: Michael S. Tsirkin @ 2011-11-30 14:50 UTC
To: Ohad Ben-Cohen; +Cc: linux-arm-kernel, linux-kernel, kvm, virtualization
In-Reply-To: <CAK=WgbZ5aBFNro3XC0cN+Ao4ZWrBHwwd3d__CSHA=n56gKmKXw@mail.gmail.com>
On Wed, Nov 30, 2011 at 01:55:53PM +0200, Ohad Ben-Cohen wrote:
> On Tue, Nov 29, 2011 at 5:19 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Nov 29, 2011 at 03:57:19PM +0200, Ohad Ben-Cohen wrote:
> >> > Is an extra branch faster or slower than reverting d57ed95?
> >>
> >> Sorry, unfortunately I have no way to measure this, as I don't have
> >> any virtualization/x86 setup. I'm developing on ARM SoCs, where
> >> virtualization hardware is coming, but not here yet.
> >
> > You can try using the micro-benchmark in tools/virtio/.
>
> Hmm, care to show me exactly what you mean?
make headers_install
make -C tools/virtio/
(you'll need an empty stub for tools/virtio/linux/module.h,
I just sent a patch to add that)
sudo insmod tools/virtio/vhost_test/vhost_test.ko
./tools/virtio/virtio_test
> Though I somewhat suspect that any micro-benchmarking I'll do with my
> random ARM SoC will not have much value to real virtualization/x86
> workloads.
>
> Thanks,
> Ohad.
Real virtualization/x86 can keep using current smp_XX barriers, right?
We can have some config for your kind of setup.
--
MST
* Re: [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Arnd Bergmann @ 2011-11-30 14:32 UTC
To: Ian Campbell
Cc: xen-devel@lists.xensource.com, linaro-dev@lists.linaro.org,
Pawel Moll, kvm@vger.kernel.org, Stefano Stabellini,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
android-virt@lists.cs.columbia.edu,
embeddedxen-devel@lists.sourceforge.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <1322659535.31810.97.camel@zakaz.uk.xensource.com>
On Wednesday 30 November 2011, Ian Campbell wrote:
> On Wed, 2011-11-30 at 13:03 +0000, Arnd Bergmann wrote:
> > On Wednesday 30 November 2011, Stefano Stabellini wrote:
> > This is the same choice people have made for KVM, but it's not
> > necessarily the best option in the long run. In particular, this
> > board has a lot of hardware that you claim to have by putting the
> > machine number there, when you don't really want to emulate it.
>
> This code is actually setting up dom0 which (for the most part) sees the
> real hardware.
Ok, I see.
> > Pawel Moll is working on a variant of the vexpress code that uses
> > the flattened device tree to describe the present hardware [1], and
> > I think that would be a much better target for an official release.
> > Ideally, the hypervisor should provide the device tree binary (dtb)
> > to the guest OS describing the hardware that is actually there.
>
> Agreed. Our intention was to use DT so this fits perfectly with our
> plans.
>
> For dom0 we would expose a (possibly filtered) version of the DT given
> to us by the firmware (e.g. we might hide a serial port to reserve it
> for Xen's use, we'd likely fiddle with the memory map etc).
Ah, very good.
> For domU the DT would presumably be constructed by the toolstack (in
> dom0 userspace) as appropriate for the guest configuration. I guess this
> needn't correspond to any particular "real" hardware platform.
Correct, but it needs to correspond to some platform that is supported
by the guest OS, which leaves the choice between emulating a real
hardware platform, adding a completely new platform specifically for
virtual machines, or something in between the two.
What I suggested to the KVM developers is to start out with the
vexpress platform, but then generalize it to the point where it fits
your needs. All hardware that one expects a guest to have (GIC, timer,
...) will still show up in the same location as on a real vexpress,
while anything that makes no sense or is better paravirtualized (LCD,
storage, ...) just becomes optional and has to be described in the
device tree if it's actually there.
> > This would also be the place where you tell the guest that it should
> > look for PV devices. I'm not familiar with how Xen announces PV
> > devices to the guest on other architectures, but you have the
> > choice between providing a full "binding", i.e. a formal specification
> > in device tree format for the guest to detect PV devices in the
> > same way as physical or emulated devices, or just providing a single
> > place in the device tree in which the guest detects the presence
> > of a xen device bus and then uses hcalls to find the devices on that
> > bus.
>
> On x86 there is an emulated PCI device which serves as the hooking point
> for the PV drivers. For ARM I don't think it would be unreasonable to
> have a DT entry instead. I think it would be fine to just represent the
> root of the "xenbus" and further discovery would occur using the normal
> xenbus mechanisms (so not a full binding). AIUI for buses which are
> enumerable this is the preferred DT scheme to use.
In general that is the case, yes. One could argue that any software
protocol between Xen and the guest is as good as any other, so it
makes sense to use the device tree to describe all devices here.
The counterargument to that is that Linux and other OSs already
support Xenbus, so there is no need to come up with a new binding.
I don't care much either way, but I think it would be good to
use similar solutions across all hypervisors. The two options
that I've seen discussed for KVM were to use either a virtual PCI
bus with individual virtio-pci devices as on the PC, or to
use the new virtio-mmio driver and individually put virtio devices
into the device tree.
> > Another topic is the question whether there are any hcalls that
> > we should try to standardize before we get another architecture
> > with multiple conflicting hcall APIs as we have on x86 and powerpc.
>
> The hcall API we are currently targeting is the existing Xen API (at
> least the generic parts of it). These generally deal with fairly Xen
> specific concepts like grant tables etc.
Ok. It would of course still be possible to agree on an argument passing
convention so that we can share the macros used to issue the hcalls,
even if the individual commands are all different. I think I also
remember talk about the need for a set of hypervisor independent calls
that everyone should implement, but I can't remember what those were.
Maybe we can split the number space into a range of some generic and
some vendor specific hcalls?
Arnd
* Re: [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Stefano Stabellini @ 2011-11-30 14:20 UTC
To: Catalin Marinas
Cc: xen-devel@lists.xensource.com, linaro-dev@lists.linaro.org,
Arnd Bergmann, kvm@vger.kernel.org, Stefano Stabellini,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
android-virt@lists.cs.columbia.edu,
embeddedxen-devel@lists.sourceforge.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <CAHkRjk48jHO19H04rf8252+v04hk9qSdKBrfYCqJsZyamVhMEw@mail.gmail.com>
On Wed, 30 Nov 2011, Catalin Marinas wrote:
> On 30 November 2011 11:39, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > A git branch is available here (not ready for submission):
> >
> > git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git arm
> >
> > the branch above is based on git://linux-arm.org/linux-2.6.git arm-lpae,
> > even though guests don't really need lpae support to run on Xen.
>
> Indeed, you don't really need LPAE. What you may need though is
> generic timers support for A15, which would allow fewer hypervisor traps.
> For up-to-date architecture patches (well, development tree, not
> guaranteed to be stable), I would recommend this (they get into
> mainline at some point):
>
> http://git.kernel.org/?p=linux/kernel/git/cmarinas/linux-arm-arch.git;a=summary
>
> Either use master or just cherry-pick the branches that you are interested in.
Thanks, I'll rebase on that.
* Re: virtio-scsi spec (was Re: [PATCH] Add virtio-scsi to the virtio spec)
From: Hannes Reinecke @ 2011-11-30 14:17 UTC
To: Paolo Bonzini
Cc: Stefan Hajnoczi, Michael S. Tsirkin, LKML, linux-scsi,
virtualization
In-Reply-To: <1322661042-28191-2-git-send-email-pbonzini@redhat.com>
On 11/30/2011 02:50 PM, Paolo Bonzini wrote:
> Appendix H: SCSI Host Device
>
> The virtio SCSI host device groups together one or more simple
> virtual devices (i.e. disks), and allows communicating with these
> devices using the SCSI protocol. An instance of the device
> represents a SCSI host with possibly many buses (also known as
> channels or paths), targets and LUNs attached.
>
> The virtio SCSI device services two kinds of requests:
>
> * command requests for a logical unit;
>
> * task management functions related to a logical unit, target or
> command.
>
> The device is also able to send out notifications about added and
> removed logical units. Together, these capabilities provide a
> SCSI transport protocol that uses virtqueues as the transfer
> medium. In the transport protocol, the virtio driver acts as the
> initiator, while the virtio SCSI host provides one or more
> targets that receive and process the requests.
>
> Configuration
> =============
>
> * Subsystem Device ID 7
>
> * Virtqueues 0:controlq; 1:eventq; 2..n:request queues.
>
> * Feature bits
>
> VIRTIO_SCSI_F_INOUT (0)
> A single request can include both read-only and write-only data buffers.
>
> * Device configuration layout
> All fields of this configuration are always available. sense_size and
> cdb_size are writable by the guest.
>
> struct virtio_scsi_config {
> u32 num_queues;
> u32 seg_max;
> u32 event_info_size;
> u32 sense_size;
> u32 cdb_size;
> u16 max_channel;
> u16 max_target;
> u32 max_lun;
> };
>
> num_queues is the total number of virtqueues exposed by the
> device. The driver is free to use only one request queue, or
> it can use more to achieve better performance.
>
> seg_max is the maximum number of segments that can be in a
> command. A bidirectional command can include seg_max input
> segments and seg_max output segments.
>
I would like to have the other request_queue limitations exposed
here, too.
Most notably we're missing the maximum size of an individual segment
and the maximum size of the overall I/O request.
Without it we can't efficiently map onto pass-through devices.
> event_info_size is the maximum size that the device will fill
> for buffers that the driver places in the eventq. The driver
> should always put buffers at least of this size. It is
> written by the device depending on the set of negotiated
> features.
>
> sense_size is the maximum size of the sense data that the
> device will write. The default value is written by the device
> and will always be 96, but the driver can modify it. It is
> restored to the default when the device is reset.
>
> cdb_size is the maximum size of the CDB that the driver will
> write. The default value is written by the device and will
> always be 32, but the driver can likewise modify it. It is
> restored to the default when the device is reset.
>
> max_channel, max_target and max_lun can be used by the driver
> as hints for scanning the logical units on the host. In the
> current version of the spec, they will always be respectively
> 0, 255 and 16383.
>
As this is the host specification I really would like to see a host
identifier somewhere in there.
Otherwise we won't be able to reliably identify a virtio SCSI host.
Plus you can't calculate the ITL nexus information, making
Persistent Reservations impossible.
However, we should be able to delegate this to a specific controlq
command.
> Device Initialization
> =====================
>
> The initialization routine should first of all discover the
> device's virtqueues.
>
> If the driver uses the eventq, it should then place at least one
> buffer in the eventq.
>
> The driver can immediately issue requests (for example, INQUIRY
> or REPORT LUNS) or task management functions (for example, I_T
> RESET).
>
> Device Operation: request queues
> ================================
>
> The driver queues requests to an arbitrary request queue, and they are
> used by the device on that same queue. In this version of the spec,
> if a driver uses more than one queue it is the responsibility of the
> driver to ensure strict request ordering; commands placed on different
> queues will be consumed with no order constraints.
>
> Requests have the following format:
>
> struct virtio_scsi_req_cmd {
> u8 lun[8];
> u64 id;
> u8 task_attr;
> u8 prio;
> u8 crn;
> char cdb[cdb_size];
> char dataout[];
> u32 sense_len;
> u32 residual;
> u16 status_qualifier;
> u8 status;
> u8 response;
> u8 sense[sense_size];
> char datain[];
> };
>
> /* command-specific response values */
> #define VIRTIO_SCSI_S_OK 0
> #define VIRTIO_SCSI_S_UNDERRUN 1
> #define VIRTIO_SCSI_S_ABORTED 2
> #define VIRTIO_SCSI_S_BAD_TARGET 3
> #define VIRTIO_SCSI_S_RESET 4
> #define VIRTIO_SCSI_S_TRANSPORT_FAILURE 5
> #define VIRTIO_SCSI_S_TARGET_FAILURE 6
> #define VIRTIO_SCSI_S_NEXUS_FAILURE 7
> #define VIRTIO_SCSI_S_FAILURE 8
>
> /* task_attr */
> #define VIRTIO_SCSI_S_SIMPLE 0
> #define VIRTIO_SCSI_S_ORDERED 1
> #define VIRTIO_SCSI_S_HEAD 2
> #define VIRTIO_SCSI_S_ACA 3
>
> The lun field addresses a target and logical unit in the
> virtio-scsi device's SCSI domain. In this version of the spec,
> the only supported format for the LUN field is: first byte set to
> 1, second byte set to target, third and fourth byte representing
> a single level LUN structure, followed by four zero bytes. With
> this representation, a virtio-scsi device can serve up to 256
> targets and 16384 LUNs per target.
>
> The id field is the command identifier ("tag").
>
> Task_attr, prio and crn should be left to zero: command priority
> is explicitly not supported by this version of the device;
> task_attr defines the task attribute as in the table above, but
> all task attributes may be mapped to SIMPLE by the device; crn
> may also be provided by clients, but is generally expected to be
> 0. The maximum CRN value defined by the protocol is 255, since
> CRN is stored in an 8-bit integer.
>
> All of these fields are defined in SAM. They are always
> read-only, as are the cdb and dataout fields. The cdb_size is
> taken from the configuration space.
>
> sense and subsequent fields are always write-only. The sense_len
> field indicates the number of bytes actually written to the sense
> buffer. The residual field indicates the residual size,
> calculated as "data_length - number_of_transferred_bytes", for
> read or write operations. For bidirectional commands, the
> number_of_transferred_bytes includes both read and written bytes.
> A residual field that is less than the size of datain means that
> the dataout field was processed entirely. A residual field that
> exceeds the size of datain means that the dataout field was
> processed partially and the datain field was not processed at
> all.
>
> The status byte is written by the device to be the status
> code as defined by SAM.
>
> The response byte is written by the device to be one of the
> following:
>
> VIRTIO_SCSI_S_OK when the request was completed and the status
> byte is filled with a SCSI status code (not necessarily
> "GOOD").
>
> VIRTIO_SCSI_S_UNDERRUN if the content of the CDB requires
> transferring more data than is available in the data buffers.
>
> VIRTIO_SCSI_S_ABORTED if the request was cancelled due to an
> ABORT TASK or ABORT TASK SET task management function.
>
> VIRTIO_SCSI_S_BAD_TARGET if the request was never processed
> because the target indicated by the lun field does not exist.
>
> VIRTIO_SCSI_S_RESET if the request was cancelled due to a bus
> or device reset.
>
> VIRTIO_SCSI_S_TRANSPORT_FAILURE if the request failed due to a
> problem in the connection between the host and the target
> (severed link).
>
> VIRTIO_SCSI_S_TARGET_FAILURE if the target is suffering a
> failure and the guest should not retry on other paths.
>
> VIRTIO_SCSI_S_NEXUS_FAILURE if the nexus is suffering a failure
> but retrying on other paths might yield a different result.
>
> VIRTIO_SCSI_S_FAILURE for other host or guest error. In
> particular, if neither dataout nor datain is empty, and the
> VIRTIO_SCSI_F_INOUT feature has not been negotiated, the
> request will be immediately returned with a response equal to
> VIRTIO_SCSI_S_FAILURE.
>
We should be adding
VIRTIO_SCSI_S_BUSY
for a temporary failure, indicating that a command retry
might be sufficient to clear this situation.
Equivalent to VIRTIO_SCSI_S_NEXUS_FAILURE, but issuing a retry on
the same path.
Thanks for the write-up.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
* Re: [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Catalin Marinas @ 2011-11-30 14:11 UTC
To: Stefano Stabellini
Cc: xen-devel@lists.xensource.com, linaro-dev@lists.linaro.org,
kvm@vger.kernel.org, Arnd Bergmann, linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
android-virt@lists.cs.columbia.edu,
embeddedxen-devel@lists.sourceforge.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <alpine.DEB.2.00.1111301053300.31179@kaball-desktop>
On 30 November 2011 11:39, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> A git branch is available here (not ready for submission):
>
> git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git arm
>
> the branch above is based on git://linux-arm.org/linux-2.6.git arm-lpae,
> even though guests don't really need lpae support to run on Xen.
Indeed, you don't really need LPAE. What you may need though is
generic timers support for A15, which would allow fewer hypervisor traps.
For up-to-date architecture patches (well, development tree, not
guaranteed to be stable), I would recommend this (they get into
mainline at some point):
http://git.kernel.org/?p=linux/kernel/git/cmarinas/linux-arm-arch.git;a=summary
Either use master or just cherry-pick the branches that you are interested in.
--
Catalin
* virtio-scsi spec (was Re: [PATCH] Add virtio-scsi to the virtio spec)
From: Paolo Bonzini @ 2011-11-30 13:50 UTC
To: Rusty Russell, virtualization
Cc: linux-scsi, LKML, Stefan Hajnoczi, Michael S. Tsirkin
In-Reply-To: <1322661042-28191-1-git-send-email-pbonzini@redhat.com>
Appendix H: SCSI Host Device
The virtio SCSI host device groups together one or more simple
virtual devices (i.e. disks), and allows communicating with these
devices using the SCSI protocol. An instance of the device
represents a SCSI host with possibly many buses (also known as
channels or paths), targets and LUNs attached.
The virtio SCSI device services two kinds of requests:
* command requests for a logical unit;
* task management functions related to a logical unit, target or
command.
The device is also able to send out notifications about added and
removed logical units. Together, these capabilities provide a
SCSI transport protocol that uses virtqueues as the transfer
medium. In the transport protocol, the virtio driver acts as the
initiator, while the virtio SCSI host provides one or more
targets that receive and process the requests.
Configuration
=============
* Subsystem Device ID 7
* Virtqueues 0:controlq; 1:eventq; 2..n:request queues.
* Feature bits
VIRTIO_SCSI_F_INOUT (0)
A single request can include both read-only and write-only data buffers.
* Device configuration layout
All fields of this configuration are always available. sense_size and
cdb_size are writable by the guest.
struct virtio_scsi_config {
u32 num_queues;
u32 seg_max;
u32 event_info_size;
u32 sense_size;
u32 cdb_size;
u16 max_channel;
u16 max_target;
u32 max_lun;
};
num_queues is the total number of virtqueues exposed by the
device. The driver is free to use only one request queue, or
it can use more to achieve better performance.
seg_max is the maximum number of segments that can be in a
command. A bidirectional command can include seg_max input
segments and seg_max output segments.
event_info_size is the maximum size that the device will fill
for buffers that the driver places in the eventq. The driver
should always put buffers at least of this size. It is
written by the device depending on the set of negotiated
features.
sense_size is the maximum size of the sense data that the
device will write. The default value is written by the device
and will always be 96, but the driver can modify it. It is
restored to the default when the device is reset.
cdb_size is the maximum size of the CDB that the driver will
write. The default value is written by the device and will
always be 32, but the driver can likewise modify it. It is
restored to the default when the device is reset.
max_channel, max_target and max_lun can be used by the driver
as hints for scanning the logical units on the host. In the
current version of the spec, they will always be respectively
0, 255 and 16383.
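As a rough sketch of how a driver might mirror this layout in C, assuming natural alignment and ignoring endianness conversion, the struct below restates the draft with fixed-width types and checks its size at compile time. The two helpers and their `vscsi_` names are hypothetical. The request-queue count is one reading of the draft: num_queues counts all virtqueues, and queues 0 and 1 are the controlq and eventq.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the draft's configuration layout, with fixed-width types. */
struct virtio_scsi_config {
    uint32_t num_queues;
    uint32_t seg_max;
    uint32_t event_info_size;
    uint32_t sense_size;
    uint32_t cdb_size;
    uint16_t max_channel;
    uint16_t max_target;
    uint32_t max_lun;
};

/* 5 * u32 + 2 * u16 + u32 = 28 bytes, with no padding needed. */
static_assert(sizeof(struct virtio_scsi_config) == 28,
              "unexpected padding in virtio_scsi_config");
static_assert(offsetof(struct virtio_scsi_config, max_lun) == 24,
              "max_lun should follow the two u16 fields directly");

/* Hypothetical helper: virtqueues left over for requests once the
 * controlq (0) and eventq (1) are subtracted from num_queues. */
static uint32_t vscsi_request_queue_count(const struct virtio_scsi_config *cfg)
{
    return cfg->num_queues >= 2 ? cfg->num_queues - 2 : 0;
}

/* Hypothetical helper: do the scan hints match the fixed values that
 * this version of the draft promises (0, 255 and 16383)? */
static int vscsi_scan_hints_match_draft(const struct virtio_scsi_config *cfg)
{
    return cfg->max_channel == 0 && cfg->max_target == 255 &&
           cfg->max_lun == 16383;
}
```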
Device Initialization
=====================
The initialization routine should first of all discover the
device's virtqueues.
If the driver uses the eventq, it should then place at least one
buffer in the eventq.
The driver can immediately issue requests (for example, INQUIRY
or REPORT LUNS) or task management functions (for example, I_T
RESET).
Device Operation: request queues
================================
The driver queues requests to an arbitrary request queue, and they are
used by the device on that same queue. In this version of the spec,
if a driver uses more than one queue it is the responsibility of the
driver to ensure strict request ordering; commands placed on different
queues will be consumed with no order constraints.
Requests have the following format:
struct virtio_scsi_req_cmd {
u8 lun[8];
u64 id;
u8 task_attr;
u8 prio;
u8 crn;
char cdb[cdb_size];
char dataout[];
u32 sense_len;
u32 residual;
u16 status_qualifier;
u8 status;
u8 response;
u8 sense[sense_size];
char datain[];
};
/* command-specific response values */
#define VIRTIO_SCSI_S_OK 0
#define VIRTIO_SCSI_S_UNDERRUN 1
#define VIRTIO_SCSI_S_ABORTED 2
#define VIRTIO_SCSI_S_BAD_TARGET 3
#define VIRTIO_SCSI_S_RESET 4
#define VIRTIO_SCSI_S_TRANSPORT_FAILURE 5
#define VIRTIO_SCSI_S_TARGET_FAILURE 6
#define VIRTIO_SCSI_S_NEXUS_FAILURE 7
#define VIRTIO_SCSI_S_FAILURE 8
/* task_attr */
#define VIRTIO_SCSI_S_SIMPLE 0
#define VIRTIO_SCSI_S_ORDERED 1
#define VIRTIO_SCSI_S_HEAD 2
#define VIRTIO_SCSI_S_ACA 3
The lun field addresses a target and logical unit in the
virtio-scsi device's SCSI domain. In this version of the spec,
the only supported format for the LUN field is: first byte set to
1, second byte set to target, third and fourth byte representing
a single level LUN structure, followed by four zero bytes. With
this representation, a virtio-scsi device can serve up to 256
targets and 16384 LUNs per target.
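The encoding above could be sketched as follows. The draft does not spell out how the third and fourth bytes hold the LUN; this sketch assumes the SAM flat space addressing method (top two bits 01b followed by a 14-bit LUN), which matches the 16384-LUN limit but is an assumption, as are the `vscsi_` and `VSCSI_` names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VSCSI_MAX_LUN 16383u /* hypothetical name for the draft's limit */

static_assert(VSCSI_MAX_LUN == (1u << 14) - 1u,
              "a single level LUN is assumed to be 14 bits wide");

/* Fill the 8-byte lun field for (target, lun).
 * Returns 0 on success, -1 if lun does not fit in 14 bits. */
static int vscsi_encode_lun(uint8_t out[8], uint8_t target, uint16_t lun)
{
    if (lun > VSCSI_MAX_LUN)
        return -1;

    memset(out, 0, 8);                      /* bytes 4..7 stay zero */
    out[0] = 1;                             /* first byte set to 1 */
    out[1] = target;                        /* second byte: target */
    out[2] = (uint8_t)(0x40u | (lun >> 8)); /* assumed: flat space method */
    out[3] = (uint8_t)(lun & 0xffu);        /* low 8 bits of the LUN */
    return 0;
}
```

For example, target 5 and LUN 300 (0x12c) would be encoded as 01 05 41 2c 00 00 00 00 under that assumption.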
The id field is the command identifier ("tag").
Task_attr, prio and crn should be left to zero: command priority
is explicitly not supported by this version of the device;
task_attr defines the task attribute as in the table above, but
all task attributes may be mapped to SIMPLE by the device; crn
may also be provided by clients, but is generally expected to be
0. The maximum CRN value defined by the protocol is 255, since
CRN is stored in an 8-bit integer.
All of these fields are defined in SAM. They are always
read-only, as are the cdb and dataout fields. The cdb_size is
taken from the configuration space.
sense and subsequent fields are always write-only. The sense_len
field indicates the number of bytes actually written to the sense
buffer. The residual field indicates the residual size,
calculated as "data_length - number_of_transferred_bytes", for
read or write operations. For bidirectional commands, the
number_of_transferred_bytes includes both read and written bytes.
A residual field that is less than the size of datain means that
the dataout field was processed entirely. A residual field that
exceeds the size of datain means that the dataout field was
processed partially and the datain field was not processed at
all.
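A small sketch of how a driver might turn the residual back into per-direction byte counts, following the two rules above. It assumes dataout is consumed completely before any datain is produced, and it treats a residual exactly equal to the datain size as "dataout complete, no datain", a case the text does not cover. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: bytes actually moved in each direction. */
struct vscsi_xfer {
    uint32_t dataout_done;
    uint32_t datain_done;
};

/* Split a residual into per-direction counts.
 * Returns 0 on success, -1 if residual exceeds the total data length. */
static int vscsi_split_residual(uint32_t dataout_len, uint32_t datain_len,
                                uint32_t residual, struct vscsi_xfer *x)
{
    uint64_t total = (uint64_t)dataout_len + datain_len;

    assert(x != NULL);
    if (residual > total)
        return -1;

    if (residual <= datain_len) {
        /* dataout was processed entirely; datain is short by residual. */
        x->dataout_done = dataout_len;
        x->datain_done = datain_len - residual;
    } else {
        /* datain was not processed at all; the rest of the residual
         * is the part of dataout that was left unprocessed. */
        x->dataout_done = dataout_len - (residual - datain_len);
        x->datain_done = 0;
    }
    return 0;
}
```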
The status byte is written by the device to be the status
code as defined by SAM.
The response byte is written by the device to be one of the
following:
VIRTIO_SCSI_S_OK when the request was completed and the status
byte is filled with a SCSI status code (not necessarily
"GOOD").
VIRTIO_SCSI_S_UNDERRUN if the content of the CDB requires
transferring more data than is available in the data buffers.
VIRTIO_SCSI_S_ABORTED if the request was cancelled due to an
ABORT TASK or ABORT TASK SET task management function.
VIRTIO_SCSI_S_BAD_TARGET if the request was never processed
because the target indicated by the lun field does not exist.
VIRTIO_SCSI_S_RESET if the request was cancelled due to a bus
or device reset.
VIRTIO_SCSI_S_TRANSPORT_FAILURE if the request failed due to a
problem in the connection between the host and the target
(severed link).
VIRTIO_SCSI_S_TARGET_FAILURE if the target is suffering a
failure and the guest should not retry on other paths.
VIRTIO_SCSI_S_NEXUS_FAILURE if the nexus is suffering a failure
but retrying on other paths might yield a different result.
VIRTIO_SCSI_S_FAILURE for other host or guest error. In
particular, if neither dataout nor datain is empty, and the
VIRTIO_SCSI_F_INOUT feature has not been negotiated, the
request will be immediately returned with a response equal to
VIRTIO_SCSI_S_FAILURE.
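One way a driver could act on the response byte is sketched below. The constants are copied from the draft; the helper names are hypothetical. The retry hint only encodes what the text states explicitly (NEXUS_FAILURE may succeed on another path, TARGET_FAILURE should not be retried elsewhere) and conservatively answers "no" for every other value.

```c
#include <assert.h>
#include <stdint.h>

/* Response values copied from the draft above. */
#define VIRTIO_SCSI_S_OK                0
#define VIRTIO_SCSI_S_UNDERRUN          1
#define VIRTIO_SCSI_S_ABORTED           2
#define VIRTIO_SCSI_S_BAD_TARGET        3
#define VIRTIO_SCSI_S_RESET             4
#define VIRTIO_SCSI_S_TRANSPORT_FAILURE 5
#define VIRTIO_SCSI_S_TARGET_FAILURE    6
#define VIRTIO_SCSI_S_NEXUS_FAILURE     7
#define VIRTIO_SCSI_S_FAILURE           8

static const char *const vscsi_response_names[] = {
    "OK", "UNDERRUN", "ABORTED", "BAD_TARGET", "RESET",
    "TRANSPORT_FAILURE", "TARGET_FAILURE", "NEXUS_FAILURE", "FAILURE",
};

static_assert(sizeof(vscsi_response_names) / sizeof(vscsi_response_names[0])
              == VIRTIO_SCSI_S_FAILURE + 1, "one name per response value");

/* Printable name for logging; "unknown" for values outside the table. */
static const char *vscsi_response_name(uint8_t response)
{
    if (response > VIRTIO_SCSI_S_FAILURE)
        return "unknown";
    return vscsi_response_names[response];
}

/* The status byte carries a SCSI status code only when response is OK. */
static int vscsi_status_is_valid(uint8_t response)
{
    return response == VIRTIO_SCSI_S_OK;
}

/* Only NEXUS_FAILURE is described as worth retrying on another path. */
static int vscsi_try_other_path(uint8_t response)
{
    return response == VIRTIO_SCSI_S_NEXUS_FAILURE;
}
```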
Device Operation: controlq
==========================
The controlq is used for other SCSI transport operations.
Requests have the following format:
struct virtio_scsi_ctrl {
u32 type;
...
u8 response;
};
/* response values valid for all commands */
#define VIRTIO_SCSI_S_OK 0
#define VIRTIO_SCSI_S_BAD_TARGET 3
#define VIRTIO_SCSI_S_TRANSPORT_FAILURE 5
#define VIRTIO_SCSI_S_TARGET_FAILURE 6
#define VIRTIO_SCSI_S_NEXUS_FAILURE 7
#define VIRTIO_SCSI_S_FAILURE 8
#define VIRTIO_SCSI_S_INCORRECT_LUN 11
The type identifies the remaining fields.
The following commands are defined:
* Task management function
#define VIRTIO_SCSI_T_TMF 0
#define VIRTIO_SCSI_T_TMF_ABORT_TASK 0
#define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET 1
#define VIRTIO_SCSI_T_TMF_CLEAR_ACA 2
#define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET 3
#define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET 4
#define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET 5
#define VIRTIO_SCSI_T_TMF_QUERY_TASK 6
#define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET 7
struct virtio_scsi_ctrl_tmf {
u32 type;
u32 subtype;
u8 lun[8];
u64 id;
u8 additional[];
u8 response;
};
/* command-specific response values */
#define VIRTIO_SCSI_S_FUNCTION_COMPLETE 0
#define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED 9
#define VIRTIO_SCSI_S_FUNCTION_REJECTED 10
The type is VIRTIO_SCSI_T_TMF. All fields except response are
filled by the driver. The subtype field must always be specified
and identifies the requested task management function. Other
fields that are irrelevant for the requested TMF are ignored.
The lun field is in the same
format specified for request queues; the single level LUN is
ignored when the task management function addresses a whole I_T
nexus. When relevant, the value of the id field is matched
against the id values passed on the requestq.
Note that since ACA is not supported by this version of the
spec, VIRTIO_SCSI_T_TMF_CLEAR_ACA is always a no-operation.
The outcome of the task management function is written by the
device in the response field. The command-specific response
values map 1-to-1 with those defined in SAM.
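As an illustration, an ABORT TASK request could be filled in roughly like this. The sketch splits the message into a driver-written part and a device-written response byte and leaves out the variable-length additional[] field; that split, and the `vscsi_` names, are assumptions rather than part of the draft. Treating both FUNCTION_COMPLETE and FUNCTION_SUCCEEDED as success is likewise a reading of SAM, not something the draft states.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Constants copied from the draft above. */
#define VIRTIO_SCSI_T_TMF                0
#define VIRTIO_SCSI_T_TMF_ABORT_TASK     0
#define VIRTIO_SCSI_S_FUNCTION_COMPLETE  0
#define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED 9
#define VIRTIO_SCSI_S_FUNCTION_REJECTED  10

/* Hypothetical driver-written part of a TMF request (no additional[]). */
struct vscsi_tmf_req {
    uint32_t type;
    uint32_t subtype;
    uint8_t  lun[8];
    uint64_t id;
};

static_assert(sizeof(struct vscsi_tmf_req) == 24,
              "unexpected padding in the TMF request header");

/* Build an ABORT TASK request for the command tagged `id` on `lun`. */
static void vscsi_tmf_abort_task(struct vscsi_tmf_req *req,
                                 const uint8_t lun[8], uint64_t id)
{
    memset(req, 0, sizeof(*req));
    req->type = VIRTIO_SCSI_T_TMF;
    req->subtype = VIRTIO_SCSI_T_TMF_ABORT_TASK;
    memcpy(req->lun, lun, 8);
    req->id = id; /* matched against ids passed on the request queue */
}

/* Assumed interpretation: COMPLETE and SUCCEEDED both mean success. */
static int vscsi_tmf_succeeded(uint8_t response)
{
    return response == VIRTIO_SCSI_S_FUNCTION_COMPLETE ||
           response == VIRTIO_SCSI_S_FUNCTION_SUCCEEDED;
}
```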
* Asynchronous notification query
#define VIRTIO_SCSI_T_AN_QUERY 1
struct virtio_scsi_ctrl_an {
u32 type;
u8 lun[8];
u32 event_requested;
u32 event_actual;
u8 response;
};
#define VIRTIO_SCSI_EVT_ASYNC_OPERATIONAL_CHANGE 2
#define VIRTIO_SCSI_EVT_ASYNC_POWER_MGMT 4
#define VIRTIO_SCSI_EVT_ASYNC_EXTERNAL_REQUEST 8
#define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE 16
#define VIRTIO_SCSI_EVT_ASYNC_MULTI_HOST 32
#define VIRTIO_SCSI_EVT_ASYNC_DEVICE_BUSY 64
By sending this command, the driver asks the device which
events the given LUN can report, as described in paragraphs 6.6
and A.6 of the SCSI MMC specification. The driver writes the
events it is interested in into the event_requested; the device
responds by writing the events that it supports into
event_actual.
The type is VIRTIO_SCSI_T_AN_QUERY. The lun and event_requested
fields are written by the driver. The event_actual and response
fields are written by the device.
No command-specific values are defined for the response byte.
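On the driver side, comparing event_requested with event_actual is plain bit arithmetic. The sketch below uses the event masks from the draft; the helper names are hypothetical. It works whether the device reports only the requested events it supports or every event it supports, since the draft leaves that open.

```c
#include <assert.h>
#include <stdint.h>

/* Event masks copied from the draft above. */
#define VIRTIO_SCSI_EVT_ASYNC_OPERATIONAL_CHANGE 2u
#define VIRTIO_SCSI_EVT_ASYNC_POWER_MGMT         4u
#define VIRTIO_SCSI_EVT_ASYNC_EXTERNAL_REQUEST   8u
#define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE       16u
#define VIRTIO_SCSI_EVT_ASYNC_MULTI_HOST         32u
#define VIRTIO_SCSI_EVT_ASYNC_DEVICE_BUSY        64u

/* Hypothetical convenience mask of every event defined so far. */
#define VSCSI_AN_ALL_EVENTS                            \
    (VIRTIO_SCSI_EVT_ASYNC_OPERATIONAL_CHANGE |        \
     VIRTIO_SCSI_EVT_ASYNC_POWER_MGMT |                \
     VIRTIO_SCSI_EVT_ASYNC_EXTERNAL_REQUEST |          \
     VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE |              \
     VIRTIO_SCSI_EVT_ASYNC_MULTI_HOST |                \
     VIRTIO_SCSI_EVT_ASYNC_DEVICE_BUSY)

static_assert(VSCSI_AN_ALL_EVENTS == 126u, "six distinct bits, 2 through 64");

/* Requested events that the device says it can report. */
static uint32_t vscsi_an_usable(uint32_t requested, uint32_t actual)
{
    return requested & actual;
}

/* Requested events that the device did not acknowledge. */
static uint32_t vscsi_an_missing(uint32_t requested, uint32_t actual)
{
    return requested & ~actual;
}

/* Does the request contain only events this draft defines? */
static int vscsi_an_request_is_valid(uint32_t requested)
{
    return (requested & ~VSCSI_AN_ALL_EVENTS) == 0;
}
```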
* Asynchronous notification subscription
#define VIRTIO_SCSI_T_AN_SUBSCRIBE 2
struct virtio_scsi_ctrl_an {
u32 type;
u8 lun[8];
u32 event_requested;
u32 event_actual;
u8 response;
};
By sending this command, the driver asks the specified LUN to
report events for its physical interface, again as described in
the SCSI MMC specification. The driver writes the events it is
interested in into the event_requested; the device responds by
writing the events that it supports into event_actual.
Event types are the same as for the asynchronous notification
query message.
The type is VIRTIO_SCSI_T_AN_SUBSCRIBE. The lun and
event_requested fields are written by the driver. The
event_actual and response fields are written by the device.
No command-specific values are defined for the response byte.
Device Operation: eventq
========================
The eventq is used by the device to report information on logical
units that are attached to it. The driver should always leave a few
buffers ready in the eventq. In general, the device will not queue
events to cope with an empty eventq, and will end up dropping events if
it finds no buffer ready. However, when reporting events for many LUNs
(e.g. when a whole target disappears), the device can throttle events
to avoid dropping them. For this reason, placing 10-15 buffers on the
event queue should be enough.
Buffers are placed in the eventq and filled by the device when
interesting events occur. The buffers should be strictly
write-only (device-filled) and the size of the buffers should be
at least the value given in the device's configuration
information.
Buffers returned by the device on the eventq will be referred to
as "events" in the rest of this section. Events have the
following format:
#define VIRTIO_SCSI_T_EVENTS_MISSED 0x80000000
struct virtio_scsi_event {
u32 event;
...
}
If bit 31 is set in the event field, the device failed to report
an event due to missing buffers. In this case, the driver should
poll the logical units for unit attention conditions, and/or do
whatever form of bus scan is appropriate for the guest operating
system.
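A minimal sketch of how a driver might separate the "events missed"
flag from the event type is shown here. The helper names event_missed
and event_type are hypothetical; the flag value is the one defined
above.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_SCSI_T_EVENTS_MISSED 0x80000000

/* True if bit 31 is set, i.e. the device had to drop at least one event. */
static bool event_missed(uint32_t event)
{
    return (event & VIRTIO_SCSI_T_EVENTS_MISSED) != 0;
}

/* The event type with the "missed" flag masked out. */
static uint32_t event_type(uint32_t event)
{
    return event & ~VIRTIO_SCSI_T_EVENTS_MISSED;
}
```

When event_missed() returns true, the driver would start whatever
polling or bus scan is appropriate before handling event_type().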
Other data that the device writes to the buffer depends on the
contents of the event field. The following events are defined:
* No event
#define VIRTIO_SCSI_T_NO_EVENT 0
This event is fired in the following cases:
* When the device detects in the eventq a buffer that is
shorter than what is indicated in the configuration field, it
might use it immediately and put this dummy value in the
event field. A well-written driver will never observe this
situation.
* When events are dropped, the device may signal this event as
soon as the driver makes a buffer available, in order to
request action from the driver. In this case, of course, this
event will be reported with the VIRTIO_SCSI_T_EVENTS_MISSED
flag.
* Transport reset
#define VIRTIO_SCSI_T_TRANSPORT_RESET 1
struct virtio_scsi_reset {
u32 event;
u8 lun[8];
u32 reason;
}
#define VIRTIO_SCSI_EVT_RESET_HARD 0
#define VIRTIO_SCSI_EVT_RESET_RESCAN 1
#define VIRTIO_SCSI_EVT_RESET_REMOVED 2
By sending this event, the device signals that a logical unit
on a target has been reset, including the case of a new device
appearing or disappearing on the bus. The device fills in all
fields. The event field is set to
VIRTIO_SCSI_T_TRANSPORT_RESET. The lun field addresses a
logical unit in the SCSI host.
The reason value is one of the three #define values appearing
above:
* VIRTIO_SCSI_EVT_RESET_REMOVED ("LUN/target removed") is used
if the target or logical unit is no longer able to receive
commands.
* VIRTIO_SCSI_EVT_RESET_HARD ("LUN hard reset") is used if the
logical unit has been reset, but is still present.
* VIRTIO_SCSI_EVT_RESET_RESCAN ("rescan LUN/target") is used if
a target or logical unit has just appeared on the device.
The "removed" and "rescan" events, when sent for LUN 0, may
apply to the entire target. After receiving them the driver
should ask the initiator to rescan the target, in order to
detect the case when an entire target has appeared or
disappeared.
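The LUN 0 rule above could be checked along these lines. This is only a
sketch: the helper names are hypothetical, and the decoding of the
single level LUN from the third and fourth bytes (masking off the top
two bits of the third byte) is an assumption about the address format,
not something this text spells out.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_SCSI_EVT_RESET_HARD    0
#define VIRTIO_SCSI_EVT_RESET_RESCAN  1
#define VIRTIO_SCSI_EVT_RESET_REMOVED 2

struct virtio_scsi_reset {
    uint32_t event;
    uint8_t  lun[8];
    uint32_t reason;
};

/* Assumption: 14-bit LUN held in bytes 2 and 3, address-method bits masked. */
static unsigned reset_lun(const struct virtio_scsi_reset *ev)
{
    return ((unsigned)(ev->lun[2] & 0x3f) << 8) | ev->lun[3];
}

/* True if the event may describe a whole target appearing or disappearing. */
static bool reset_may_affect_target(const struct virtio_scsi_reset *ev)
{
    bool added_or_removed = ev->reason == VIRTIO_SCSI_EVT_RESET_REMOVED ||
                            ev->reason == VIRTIO_SCSI_EVT_RESET_RESCAN;
    return added_or_removed && reset_lun(ev) == 0;
}
```

When the helper returns true, the driver would ask the initiator to
rescan the target named in the second byte of the lun field.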
Events will also be reported via sense codes (this obviously
does not apply to newly appeared buses or targets, since the
application has never discovered them):
* "LUN/target removed" maps to sense key ILLEGAL REQUEST, asc
0x25, ascq 0x00 (LOGICAL UNIT NOT SUPPORTED)
* "LUN hard reset" maps to sense key UNIT ATTENTION, asc 0x29
(POWER ON, RESET OR BUS DEVICE RESET OCCURRED)
* "rescan LUN/target" maps to sense key UNIT ATTENTION, asc
0x3f, ascq 0x0e (REPORTED LUNS DATA HAS CHANGED)
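The mapping in the list above can be written as a small lookup table.
This is a sketch with hypothetical names (reset_sense,
reset_reason_to_sense). The numeric sense key codes are the standard SPC
assignments rather than values given in this text, and the ascq for the
hard reset case is left unspecified because the list does not give one.

```c
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_SCSI_EVT_RESET_HARD    0
#define VIRTIO_SCSI_EVT_RESET_RESCAN  1
#define VIRTIO_SCSI_EVT_RESET_REMOVED 2

/* Sense key codes as assigned by SPC (not spelled out in this text). */
#define SENSE_KEY_ILLEGAL_REQUEST 0x05
#define SENSE_KEY_UNIT_ATTENTION  0x06

/* Marker for an ascq that the list above does not specify. */
#define ASCQ_UNSPECIFIED (-1)

struct reset_sense {
    uint8_t key;
    uint8_t asc;
    int     ascq;
};

/* Returns the sense data for a reset reason, or NULL if it is unknown. */
static const struct reset_sense *reset_reason_to_sense(uint32_t reason)
{
    static const struct reset_sense table[] = {
        [VIRTIO_SCSI_EVT_RESET_HARD]    = { SENSE_KEY_UNIT_ATTENTION,  0x29, ASCQ_UNSPECIFIED },
        [VIRTIO_SCSI_EVT_RESET_RESCAN]  = { SENSE_KEY_UNIT_ATTENTION,  0x3f, 0x0e },
        [VIRTIO_SCSI_EVT_RESET_REMOVED] = { SENSE_KEY_ILLEGAL_REQUEST, 0x25, 0x00 },
    };

    if (reason >= sizeof(table) / sizeof(table[0]))
        return NULL;
    return &table[reason];
}
```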
The preferred way to detect transport reset is always to use
events, because sense codes are only seen by the driver when it
sends a SCSI command to the logical unit or target. However, in
case events are dropped, the initiator will still be able to
synchronize with the actual state of the controller if the
driver asks the initiator to rescan the SCSI bus. During the
rescan, the initiator will be able to observe the above sense
codes, and it will process them as if the driver had
received the equivalent event.
* Asynchronous notification
#define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
struct virtio_scsi_an_event {
u32 event;
u8 lun[8];
u32 reason;
}
By sending this event, the device signals that an asynchronous
event was fired from a physical interface.
All fields are written by the device. The event field is set to
VIRTIO_SCSI_T_ASYNC_NOTIFY. The lun field addresses a logical
unit in the SCSI host. The reason field is a subset of the
events that the driver has subscribed to via the "Asynchronous
notification subscription" command.
When dropped events are reported, the driver should poll for
asynchronous events manually using SCSI commands.
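Since the reason field is described as a subset of the subscribed
events, a driver could sanity-check an incoming notification as in this
sketch. The helper name an_reason_valid is hypothetical, and the
subscribed mask is assumed to be whatever the driver recorded from its
last subscription command.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_SCSI_T_ASYNC_NOTIFY 2

struct virtio_scsi_an_event {
    uint32_t event;
    uint8_t  lun[8];
    uint32_t reason;
};

/* True if every event bit in reason was previously subscribed to. */
static bool an_reason_valid(uint32_t reason, uint32_t subscribed)
{
    return (reason & ~subscribed) == 0;
}
```

A notification that fails this check would point at a device bug or at
stale subscription state in the driver.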
* [PATCH] Add virtio-scsi to the virtio spec
From: Paolo Bonzini @ 2011-11-30 13:50 UTC (permalink / raw)
To: Rusty Russell, virtualization
Cc: linux-scsi, LKML, Stefan Hajnoczi, Michael S. Tsirkin
Hi all,
here is the specification for a virtio-based SCSI host (controller, HBA,
you name it). The virtio SCSI host is the basis of an alternative
storage stack for KVM. This stack would overcome several limitations of
the current solution, virtio-blk:
1) scalability limitations: virtio-blk-over-PCI puts a strong upper
limit on the number of devices that can be added to a guest. Common
configurations have a limit of ~30 devices. While this can be worked
around by implementing a PCI-to-PCI bridge, or by using multifunction
virtio-blk devices, these solutions either have not been implemented
yet, or introduce management restrictions. On the other hand, the SCSI
architecture is well known for its scalability and virtio-scsi supports
advanced features such as multiqueueing.
2) limited flexibility: virtio-blk does not support all possible storage
scenarios. For example, it only allows limited SCSI passthrough.
In principle, virtio-scsi provides anything that the underlying SCSI
target (be it emulated by QEMU, physical storage, iSCSI or the in-kernel
target) supports.
3) limited extensibility: over time, many features have been added
to virtio-blk. Each such change requires modifications to the virtio
specification, to the guest drivers, and to the device model in the
host. The virtio-scsi spec has been written to follow SAM conventions,
and exposing new features to the guest will only require changes to the
host's SCSI target implementation.
This includes all the changes suggested when I posted the first version
of the draft (https://lkml.org/lkml/2011/6/7/252). The only exception is
that I did not add a "list target ports" command; instead I added hints
to the configuration space for probing the bus. Even though channels
should be obsolete and thus not supported in this version of the spec,
they still exist even in modern drivers (MegaSAS) so I kept them in
configuration space to simplify future extensions.
Here is a summary of the changes:
* clarified multiqueue semantics
* specified format of LUNs, with no references to hierarchical LUNs
* added more failure codes roughly corresponding to Linux driver_statuses
* assigned subsystem id
* configuration space changes (the only ones that were actually prompted
by implementation...): added seg_max, clarified reset behavior,
added hints for probing the bus.
* minor edits (especially clarifying device vs. driver, host vs. guest,
target vs. initiator)
Here is the lyx version. The PDF version is at
http://people.redhat.com/pbonzini/virtio-spec.pdf and
the text version of the spec is in a reply to this message.
--- virtio-spec.lyx.saved 2011-11-29 14:00:59.782659120 +0100
+++ virtio-spec.lyx 2011-11-30 12:47:48.363580452 +0100
@@ -56,6 +56,7 @@
\html_math_output 0
\html_css_as_file 0
\html_be_strict false
+\author 1531152142 "pbonzini"
\end_header
\begin_body
@@ -321,7 +322,7 @@
\begin_layout Standard
\begin_inset Tabular
-<lyxtabular version="3" rows="8" columns="3">
+<lyxtabular version="3" rows="9" columns="3">
<features tabularvalignment="middle">
<column alignment="center" valignment="top" width="0">
<column alignment="center" valignment="top" width="0">
@@ -530,6 +531,41 @@
</cell>
</row>
<row>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322650850
+7
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322650855
+SCSI host
+\end_layout
+
+\end_inset
+</cell>
+<cell alignment="center" valignment="top" topline="true" leftline="true" rightline="true" usebox="none">
+\begin_inset Text
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322650861
+Appendix H
+\end_layout
+
+\end_inset
+</cell>
+</row>
+<row>
<cell alignment="center" valignment="top" topline="true" bottomline="true" leftline="true" usebox="none">
\begin_inset Text
@@ -6427,6 +6463,2052 @@
\end_layout
\begin_layout Chapter*
+
+\change_inserted 1531152142 1322571716
+Appendix H: SCSI Host Device
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322653067
+The virtio SCSI host device groups together one or more virtual logical
+ units (i.e.
+ disks), and allows communicating with them using the SCSI protocol.
+ An instance of the device represents a SCSI host to which many targets
+ and LUNs are attached.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322571726
+The virtio SCSI device services two kinds of requests:
+\end_layout
+
+\begin_layout Itemize
+
+\change_inserted 1531152142 1322571726
+command requests for a logical unit;
+\end_layout
+
+\begin_layout Itemize
+
+\change_inserted 1531152142 1322571726
+task management functions related to a logical unit, target or command.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322571726
+The device is also able to send out notifications about added and removed
+ logical units.
+ Together, these capabilities provide a SCSI transport protocol that uses
+ virtqueues as the transfer medium.
+ In the transport protocol, the virtio driver acts as the initiator, while
+ the virtio SCSI host provides one or more targets that receive and process
+ the requests.
+
+\end_layout
+
+\begin_layout Section*
+
+\change_inserted 1531152142 1322571697
+Configuration
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322651166
+Subsystem
+\begin_inset space ~
+\end_inset
+
+Device
+\begin_inset space ~
+\end_inset
+
+ID 7
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322571777
+Virtqueues 0:controlq; 1:eventq; 2..n:request queues.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322571813
+Feature
+\begin_inset space ~
+\end_inset
+
+bits
+\end_layout
+
+\begin_deeper
+\begin_layout Description
+
+\change_inserted 1531152142 1322653523
+VIRTIO_SCSI_F_INOUT
+\begin_inset space ~
+\end_inset
+
+(0) A single request can include both read-only and write-only data buffers.
+\end_layout
+
+\end_deeper
+\begin_layout Description
+
+\change_inserted 1531152142 1322651190
+Device
+\begin_inset space ~
+\end_inset
+
+configuration
+\begin_inset space ~
+\end_inset
+
+layout All fields of this configuration are always available.
+
+\series bold
+sense_size
+\series default
+ and
+\series bold
+cdb_size
+\series default
+ are writable by the guest.
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322571919
+
+struct virtio_scsi_config {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575810
+
+ u32 num_queues;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575810
+
+ u32 seg_max;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575811
+
+ u32 event_info_size;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575811
+
+ u32 sense_size;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575812
+
+ u32 cdb_size;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322576412
+
+ u16 max_channel;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322576413
+
+ u16 max_target;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322576414
+
+ u32 max_lun;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322571878
+
+};
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_deeper
+\begin_layout Description
+
+\change_inserted 1531152142 1322571959
+num_queues is the total number of virtqueues exposed by the device.
+ The driver is free to use only one request queue, or it can use more to
+ achieve better performance.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322576073
+seg_max is the maximum number of segments that can be in a command.
+ A bidirectional command can include
+\series bold
+seg_max
+\series default
+ input segments and
+\series bold
+seg_max
+\series default
+output segments.
+\change_unchanged
+
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322571959
+event_info_size is the maximum size that the device will fill for buffers
+ that the driver places in the eventq.
+ The driver should always put buffers at least of this size.
+ It is written by the device depending on the set of negotiated features.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322571997
+sense_size is the maximum size of the sense data that the device will write.
+ The default value is written by the device and will always be 96, but the
+ driver can modify it.
+ It is restored to the default when the device is reset.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322575599
+cdb_size is the maximum size of the CDB that the driver will write.
+ The default value is written by the device and will always be 32, but the
+ driver can likewise modify it.
+ It is restored to the default when the device is reset.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322575670
+max_channel,
+\begin_inset space \space{}
+\end_inset
+
+max_target
+\series medium
+
+\begin_inset space ~
+\end_inset
+
+and
+\begin_inset space \space{}
+\end_inset
+
+
+\series default
+max_lun can be used by the driver as hints for scanning the logical units
+ on the host.
+ In the current version of the spec, they will always be respectively 0,
+ 255 and 16383.
+\change_unchanged
+
+\end_layout
+
+\end_deeper
+\begin_layout Section*
+
+\change_inserted 1531152142 1322571959
+Device Initialization
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572042
+The initialization routine should first of all discover the device's virtqueues.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572054
+If the driver uses the eventq, it should then place at least one buffer in
+ the eventq.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572042
+The driver can immediately issue requests (for example, INQUIRY or REPORT
+ LUNS) or task management functions (for example, I_T RESET).
+
+\end_layout
+
+\begin_layout Section*
+
+\change_inserted 1531152142 1322572348
+Device Operation: request queues
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322652394
+The driver queues requests to an arbitrary request queue, and they are used
+ by the device on that same queue.
+ In this version of the spec, if a driver uses more than one queue it is
+ the responsibility of the driver to ensure strict request ordering; commands
+ placed on different queues will be consumed with no order constraints.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572395
+Requests have the following format:
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572526
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572414
+
+struct virtio_scsi_req_cmd {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572417
+
+ u8 lun[8];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572419
+
+ u64 id;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572420
+
+ u8 task_attr;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572422
+
+ u8 prio;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572425
+
+ u8 crn;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572426
+
+ char cdb[cdb_size];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572410
+
+ char dataout[];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572429
+
+ u32 sense_len;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572430
+
+ u32 residual;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572432
+
+ u16 status_qualifier;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572434
+
+ u8 status;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572435
+
+ u8 response;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572437
+
+ u8 sense[sense_size];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572439
+
+ char datain[];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572471
+
+};
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572410
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572476
+
+/* command-specific response values */
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572480
+
+#define VIRTIO_SCSI_S_OK 0
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572483
+
+#define VIRTIO_SCSI_S_UNDERRUN 1
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572489
+
+#define VIRTIO_SCSI_S_ABORTED 2
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572491
+
+#define VIRTIO_SCSI_S_BAD_TARGET 3
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572494
+
+#define VIRTIO_SCSI_S_RESET 4
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572496
+
+#define VIRTIO_SCSI_S_TRANSPORT_FAILURE 5
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572498
+
+#define VIRTIO_SCSI_S_TARGET_FAILURE 6
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572501
+
+#define VIRTIO_SCSI_S_NEXUS_FAILURE 7
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572410
+
+#define VIRTIO_SCSI_S_FAILURE 8
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572502
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572507
+
+/* task_attr */
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572510
+
+#define VIRTIO_SCSI_S_SIMPLE 0
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572513
+
+#define VIRTIO_SCSI_S_ORDERED 1
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572516
+
+#define VIRTIO_SCSI_S_HEAD 2
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322572504
+
+#define VIRTIO_SCSI_S_ACA 3
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322652926
+The
+\series bold
+lun
+\series default
+ field addresses a target and logical unit in the virtio-scsi device's SCSI
+ domain.
+ In this version of the spec, the only supported format for the LUN field
+ is: first byte set to 1, second byte set to target, third and fourth byte
+ representing a single level LUN structure, followed by four zero bytes.
+ With this representation, a virtio-scsi device can serve up to 256 targets
+ and 16384 LUNs per target.
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572562
+The
+\series bold
+id
+\series default
+ field is the command identifier (
+\begin_inset Quotes eld
+\end_inset
+
+tag
+\begin_inset Quotes erd
+\end_inset
+
+).
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572580
+
+\series bold
+Task_attr
+\series default
+,
+\series bold
+prio
+\series default
+ and
+\series bold
+crn
+\series default
+ should be left at zero: command priority is explicitly not supported by
+ this version of the device;
+\series bold
+task_attr
+\series default
+ defines the task attribute as in the table above, but all task attributes
+ may be mapped to SIMPLE by the device;
+\series bold
+crn
+\series default
+ may also be provided by clients, but is generally expected to be 0.
+ The maximum CRN value defined by the protocol is 255, since CRN is stored
+ in an 8-bit integer.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572647
+All of these fields are defined in SAM.
+ They are always read-only, as are the
+\series bold
+cdb
+\series default
+ and
+\series bold
+dataout
+\series default
+ fields.
+ The
+\series bold
+cdb_size
+\series default
+ is taken from the configuration space.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572919
+
+\series bold
+sense_len
+\series default
+ and subsequent fields are always write-only.
+ The
+\series bold
+sense_len
+\series default
+ field indicates the number of bytes actually written to the sense buffer.
+ The
+\series bold
+residual
+\series default
+ field indicates the residual size, calculated as
+\begin_inset Quotes eld
+\end_inset
+
+data_length - number_of_transferred_bytes
+\begin_inset Quotes erd
+\end_inset
+
+, for read or write operations.
+ For bidirectional commands, the number_of_transferred_bytes includes both
+ read and written bytes.
+ A residual field that is less than the size of datain means that the dataout
+ field was processed entirely.
+ A residual field that exceeds the size of datain means that the dataout
+ field was processed partially and the datain field was not processed at
+ all.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572971
+The
+\series bold
+status
+\series default
+ byte is written by the device to be the status code as defined in SAM.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322572971
+The
+\series bold
+response
+\series default
+ byte is written by the device to be one of the following:
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322572971
+VIRTIO_SCSI_S_OK when the request was completed and the status byte is filled
+ with a SCSI status code (not necessarily "GOOD").
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322572971
+VIRTIO_SCSI_S_UNDERRUN if the content of the CDB requires transferring more
+ data than is available in the data buffers.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322652973
+VIRTIO_SCSI_S_ABORTED if the request was cancelled due to an ABORT TASK
+ or ABORT TASK SET task management function.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322573041
+VIRTIO_SCSI_S_BAD_TARGET if the request was never processed because the
+ target indicated by the
+\series bold
+lun
+\series default
+ field does not exist.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322653176
+VIRTIO_SCSI_S_RESET if the request was cancelled due to a bus or device
+ reset (including a task management function).
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322572971
+VIRTIO_SCSI_S_TRANSPORT_FAILURE if the request failed due to a problem in
+ the connection between the host and the target (severed link).
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322572971
+VIRTIO_SCSI_S_TARGET_FAILURE if the target is suffering a failure and the
+ guest should not retry on other paths.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322572971
+VIRTIO_SCSI_S_NEXUS_FAILURE if the nexus is suffering a failure but retrying
+ on other paths might yield a different result.
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322573068
+VIRTIO_SCSI_S_FAILURE for other host or guest errors.
+ In particular, if neither dataout nor datain is empty, and the VIRTIO_SCSI_F_IN
+OUT feature has not been negotiated, the request will be immediately returned
+ with a response equal to VIRTIO_SCSI_S_FAILURE.
+
+\end_layout
+
+\begin_layout Section*
+
+\change_inserted 1531152142 1322573130
+Device Operation: controlq
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322573193
+The controlq is used for other SCSI transport operations.
+ Requests have the following format:
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322573233
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573243
+
+struct virtio_scsi_ctrl {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573246
+
+ u32 type;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573248
+
+ ...
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573250
+
+ u8 response;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574229
+
+};
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574230
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574236
+
+/* response values valid for all commands */
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574310
+
+#define VIRTIO_SCSI_S_OK 0
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574295
+
+#define VIRTIO_SCSI_S_BAD_TARGET 3
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574230
+
+#define VIRTIO_SCSI_S_TRANSPORT_FAILURE 5
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574230
+
+#define VIRTIO_SCSI_S_TARGET_FAILURE 6
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574230
+
+#define VIRTIO_SCSI_S_NEXUS_FAILURE 7
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574230
+
+#define VIRTIO_SCSI_S_FAILURE 8
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574230
+
+#define VIRTIO_SCSI_S_INCORRECT_LUN 11
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322573193
+The
+\series bold
+type
+\series default
+ identifies the remaining fields.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322573193
+The following commands are defined:
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322576973
+Task
+\begin_inset space \space{}
+\end_inset
+
+management
+\begin_inset space \space{}
+\end_inset
+
+function
+\begin_inset space ~
+\end_inset
+
+
+\begin_inset Newline newline
+\end_inset
+
+
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_T_TMF 0
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_T_TMF_ABORT_TASK 0
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET 1
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_T_TMF_CLEAR_ACA 2
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET 3
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET 4
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET 5
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_T_TMF_QUERY_TASK 6
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET 7
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+struct virtio_scsi_ctrl_tmf
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+{
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+ u32 type;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+ u32 subtype;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+ u8 lun[8];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+ u64 id;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+ u8 additional[];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+ u8 response;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+}
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+/* command-specific response values */
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_S_FUNCTION_COMPLETE 0
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED 9
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322573683
+
+#define VIRTIO_SCSI_S_FUNCTION_REJECTED 10
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_deeper
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574667
+The type is VIRTIO_SCSI_T_TMF.
+ All fields except
+\series bold
+response
+\series default
+ are filled by the driver.
+ The
+\series bold
+subtype
+\series default
+ field must always be specified and identifies the requested task management
+ function.
+ Other fields that are irrelevant for the requested TMF are ignored.
+ The
+\series bold
+lun
+\series default
+ field is in the same format specified for request queues; the single level
+ LUN is ignored when the task management function addresses a whole I_T
+ nexus.
+ When relevant, the value of the
+\series bold
+id
+\series default
+ field is matched against the id values passed on the requestq.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574668
+Note that since ACA is not supported by this version of the spec, VIRTIO_SCSI_T_
+TMF_CLEAR_ACA is always a no-operation.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574270
+The outcome of the task management function is written by the device in
+ the response field.
+ The command-specific response values map 1-to-1 with those defined in SAM.
+\end_layout
+
+\end_deeper
+\begin_layout Description
+
+\change_inserted 1531152142 1322576979
+Asynchronous
+\begin_inset space \space{}
+\end_inset
+
+notification
+\begin_inset space \space{}
+\end_inset
+
+query
+\begin_inset space ~
+\end_inset
+
+
+\begin_inset Newline newline
+\end_inset
+
+
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+#define VIRTIO_SCSI_T_AN_QUERY 1
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+struct virtio_scsi_ctrl_an {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+ u32 type;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+ u8 lun[8];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+ u32 event_requested;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+ u32 event_actual;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+ u8 response;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+}
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+#define VIRTIO_SCSI_EVT_ASYNC_OPERATIONAL_CHANGE 2
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+#define VIRTIO_SCSI_EVT_ASYNC_POWER_MGMT 4
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+#define VIRTIO_SCSI_EVT_ASYNC_EXTERNAL_REQUEST 8
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+#define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE 16
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+#define VIRTIO_SCSI_EVT_ASYNC_MULTI_HOST 32
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574160
+
+#define VIRTIO_SCSI_EVT_ASYNC_DEVICE_BUSY 64
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_deeper
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574687
+By sending this command, the driver asks the device which events the given
+ LUN can report, as described in paragraphs 6.6 and A.6 of the SCSI MMC specificat
+ion.
+ The driver writes the events it is interested in into the event_requested;
+ the device responds by writing the events that it supports into event_actual.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574688
+The
+\series bold
+type
+\series default
+ is VIRTIO_SCSI_T_AN_QUERY.
+ The
+\series bold
+lun
+\series default
+ and
+\series bold
+event_requested
+\series default
+ fields are written by the driver.
+ The
+\series bold
+event_actual
+\series default
+ and
+\series bold
+response
+\series default
+ fields are written by the device.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574345
+No command-specific values are defined for the response byte.
+\end_layout
+
+\end_deeper
+\begin_layout Description
+
+\change_inserted 1531152142 1322576981
+Asynchronous
+\begin_inset space \space{}
+\end_inset
+
+notification
+\begin_inset space \space{}
+\end_inset
+
+subscription
+\begin_inset space ~
+\end_inset
+
+
+\begin_inset Newline newline
+\end_inset
+
+
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574354
+
+#define VIRTIO_SCSI_T_AN_SUBSCRIBE 2
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574342
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574342
+
+struct virtio_scsi_ctrl_an {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574342
+
+ u32 type;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574342
+
+ u8 lun[8];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574342
+
+ u32 event_requested;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574342
+
+ u32 event_actual;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574342
+
+ u8 response;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574342
+
+}
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_deeper
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574708
+By sending this command, the driver asks the specified LUN to report events
+ for its physical interface, again as described in the SCSI MMC specification.
+ The driver writes the events it is interested in into the event_requested;
+ the device responds by writing the events that it supports into event_actual.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574709
+Event types are the same as for the asynchronous notification query message.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574710
+The
+\series bold
+type
+\series default
+ is VIRTIO_SCSI_T_AN_SUBSCRIBE.
+ The
+\series bold
+lun
+\series default
+ and
+\series bold
+event_requested
+\series default
+ fields are written by the driver.
+ The
+\series bold
+event_actual
+\series default
+ and
+\series bold
+response
+\series default
+ fields are written by the device.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574419
+No command-specific values are defined for the response byte.
+\end_layout
+
+\end_deeper
+\begin_layout Section*
+
+\change_inserted 1531152142 1322574433
+Device Operation: eventq
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322653610
+The eventq is used by the device to report information on logical units
+ that are attached to it.
+ The driver should always leave a few buffers ready in the eventq.
+ In general, the device will not queue events to cope with an empty eventq,
+ and will end up dropping events if it finds no buffer ready.
+ However, when reporting events for many LUNs (e.g.
+ when a whole target disappears), the device can throttle events to avoid
+ dropping them.
+ For this reason, placing 10-15 buffers on the event queue should be enough.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574442
+Buffers are placed in the eventq and filled by the device when interesting
+ events occur.
+ The buffers should be strictly write-only (device-filled) and the size
+ of the buffers should be at least the value given in the device's configuration
+ information.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574487
+Buffers returned by the device on the eventq will be referred to as "events"
+ in the rest of this section.
+ Events have the following format:
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574508
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574500
+
+#define VIRTIO_SCSI_T_EVENTS_MISSED 0x80000000
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574500
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574500
+
+struct virtio_scsi_event {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574500
+
+ u32 event;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574500
+
+ ...
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574500
+
+}
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574516
+If bit 31 is set in the event field, the device failed to report an event
+ due to missing buffers.
+ In this case, the driver should poll the logical units for unit attention
+ conditions, and/or do whatever form of bus scan is appropriate for the
+ guest operating system.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574521
+Other data that the device writes to the buffer depends on the contents
+ of the event field.
+ The following events are defined:
+\end_layout
+
+\begin_layout Description
+
+\change_inserted 1531152142 1322653652
+No
+\begin_inset space \space{}
+\end_inset
+
+event
+\begin_inset space ~
+\end_inset
+
+
+\begin_inset Newline newline
+\end_inset
+
+
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574545
+
+#define VIRTIO_SCSI_T_NO_EVENT 0
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_deeper
+\begin_layout Standard
+
+\change_inserted 1531152142 1322576984
+This event is fired in the following cases:
+\end_layout
+
+\begin_layout Itemize
+
+\change_inserted 1531152142 1322574588
+When the device detects in the eventq a buffer that is shorter than what
+ is indicated in the configuration field, it might use it immediately and
+ put this dummy value in the event field.
+ A well-written driver will never observe this situation.
+\end_layout
+
+\begin_layout Itemize
+
+\change_inserted 1531152142 1322574604
+When events are dropped, the device may signal this event as soon as the
+ driver makes a buffer available, in order to request action from the driver.
+ In this case, of course, this event will be reported with the VIRTIO_SCSI_T_EVE
+NTS_MISSED flag.
+
+\end_layout
+
+\end_deeper
+\begin_layout Description
+
+\change_inserted 1531152142 1322576985
+Transport
+\begin_inset space \space{}
+\end_inset
+
+reset
+\begin_inset space ~
+\end_inset
+
+
+\begin_inset Newline newline
+\end_inset
+
+
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+#define VIRTIO_SCSI_T_TRANSPORT_RESET 1
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+struct virtio_scsi_reset {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+ u32 event;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+ u8 lun[8];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+ u32 reason;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+}
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+#define VIRTIO_SCSI_EVT_RESET_HARD 0
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+#define VIRTIO_SCSI_EVT_RESET_RESCAN 1
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322574628
+
+#define VIRTIO_SCSI_EVT_RESET_REMOVED 2
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_deeper
+\begin_layout Standard
+
+\change_inserted 1531152142 1322574756
+By sending this event, the device signals that a logical unit on a target
+ has been reset, including the case of a new device appearing or disappearing
+ on the bus. The device fills in all fields.
+ The
+\series bold
+event
+\series default
+ field is set to VIRTIO_SCSI_T_TRANSPORT_RESET.
+ The
+\series bold
+lun
+\series default
+ field addresses a logical unit in the SCSI host.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322577082
+The
+\series bold
+reason
+\series default
+ value is one of the three #define values appearing above:
+\end_layout
+
+\begin_layout Itemize
+
+\change_inserted 1531152142 1322577449
+
+\series bold
+VIRTIO_SCSI_EVT_RESET_REMOVED
+\series default
+ (
+\begin_inset Quotes eld
+\end_inset
+
+LUN/target removed
+\begin_inset Quotes erd
+\end_inset
+
+) is used if the target or logical unit is no longer able to receive commands.
+\end_layout
+
+\begin_layout Itemize
+
+\change_inserted 1531152142 1322577452
+
+\series bold
+VIRTIO_SCSI_EVT_RESET_HARD
+\series default
+ (
+\begin_inset Quotes eld
+\end_inset
+
+LUN hard reset
+\begin_inset Quotes erd
+\end_inset
+
+) is used if the logical unit has been reset, but is still present.
+\end_layout
+
+\begin_layout Itemize
+
+\change_inserted 1531152142 1322577446
+
+\series bold
+VIRTIO_SCSI_EVT_RESET_RESCAN
+\series default
+ (
+\begin_inset Quotes eld
+\end_inset
+
+rescan LUN/target
+\begin_inset Quotes erd
+\end_inset
+
+) is used if a target or logical unit has just appeared on the device.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322577382
+The
+\begin_inset Quotes eld
+\end_inset
+
+removed
+\begin_inset Quotes erd
+\end_inset
+
+ and
+\begin_inset Quotes eld
+\end_inset
+
+rescan
+\begin_inset Quotes erd
+\end_inset
+
+ events, when sent for LUN 0, may apply to the entire target.
+ After receiving them the driver should ask the initiator to rescan the
+ target, in order to detect the case when an entire target has appeared
+ or disappeared.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322577057
+Events will also be reported via sense codes (this obviously does not apply
+ to newly appeared buses or targets, since the application has never discovered
+ them):
+\end_layout
+
+\begin_layout Itemize
+
+\change_inserted 1531152142 1322577457
+\begin_inset Quotes eld
+\end_inset
+
+LUN/target removed
+\begin_inset Quotes erd
+\end_inset
+
+ maps to sense key ILLEGAL REQUEST, asc 0x25, ascq 0x00 (LOGICAL UNIT NOT
+ SUPPORTED)
+\end_layout
+
+\begin_layout Itemize
+
+\change_inserted 1531152142 1322577460
+\begin_inset Quotes eld
+\end_inset
+
+LUN hard reset
+\begin_inset Quotes erd
+\end_inset
+
+ maps to sense key UNIT ATTENTION, asc 0x29 (POWER ON, RESET OR BUS DEVICE
+ RESET OCCURRED)
+\end_layout
+
+\begin_layout Itemize
+
+\change_inserted 1531152142 1322577462
+\begin_inset Quotes eld
+\end_inset
+
+rescan LUN/target
+\begin_inset Quotes erd
+\end_inset
+
+ maps to sense key UNIT ATTENTION, asc 0x3f, ascq 0x0e (REPORTED LUNS DATA
+ HAS CHANGED)
+\change_unchanged
+
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322575482
+The preferred way to detect transport reset is always to use events, because
+ sense codes are only seen by the driver when it sends a SCSI command to
+ the logical unit or target.
+ However, in case events are dropped, the initiator will still be able to
+ synchronize with the actual state of the controller if the driver asks
+ the initiator to rescan the SCSI bus.
+ During the rescan, the initiator will be able to observe the above sense
+ codes, and it will process them as if the driver had received the equivalent
+ event.
+
+\end_layout
+
+\end_deeper
+\begin_layout Description
+
+\change_inserted 1531152142 1322576987
+Asynchronous
+\begin_inset space \space{}
+\end_inset
+
+notification
+\begin_inset space ~
+\end_inset
+
+
+\begin_inset Newline newline
+\end_inset
+
+
+\begin_inset listings
+inline false
+status open
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575505
+
+ #define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575505
+
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575505
+
+ struct virtio_scsi_an_event {
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575505
+
+ u32 event;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575505
+
+ u8 lun[8];
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575505
+
+ u32 reason;
+\end_layout
+
+\begin_layout Plain Layout
+
+\change_inserted 1531152142 1322575505
+
+ }
+\end_layout
+
+\end_inset
+
+
+\end_layout
+
+\begin_deeper
+\begin_layout Standard
+
+\change_inserted 1531152142 1322575520
+By sending this event, the device signals that an asynchronous event was
+ fired from a physical interface.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322575546
+All fields are written by the device.
+ The
+\series bold
+event
+\series default
+ field is set to VIRTIO_SCSI_T_ASYNC_NOTIFY.
+ The
+\series bold
+lun
+\series default
+ field addresses a logical unit in the SCSI host.
+ The
+\series bold
+reason
+\series default
+ field is a subset of the events that the driver has subscribed to via the
+ "Asynchronous notification subscription" command.
+\end_layout
+
+\begin_layout Standard
+
+\change_inserted 1531152142 1322575520
+When dropped events are reported, the driver should poll for asynchronous
+ events manually using SCSI commands.
+\change_unchanged
+
+\end_layout
+
+\end_deeper
+\begin_layout Chapter*
Appendix X: virtio-mmio
\end_layout
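As a reading aid for the eventq text in the patch above, here is a minimal C sketch of how a driver might classify a returned event buffer. The event and reason values are copied from the listings in the patch; the struct name, the action enum and demo_classify_event() are hypothetical and not part of the spec, and the sketch assumes the fields have already been converted to host byte order.

```c
#include <stdint.h>

/* Values copied from the listings in the patch. */
#define VIRTIO_SCSI_T_EVENTS_MISSED   0x80000000u
#define VIRTIO_SCSI_T_NO_EVENT        0u
#define VIRTIO_SCSI_T_TRANSPORT_RESET 1u
#define VIRTIO_SCSI_T_ASYNC_NOTIFY    2u

#define VIRTIO_SCSI_EVT_RESET_HARD    0u
#define VIRTIO_SCSI_EVT_RESET_RESCAN  1u
#define VIRTIO_SCSI_EVT_RESET_REMOVED 2u

/* Hypothetical: same field order as virtio_scsi_reset / virtio_scsi_an_event. */
struct demo_scsi_event {
    uint32_t event;
    uint8_t  lun[8];
    uint32_t reason;
};

/* Hypothetical driver-side actions, for illustration only. */
enum demo_action {
    DEMO_NOTHING,         /* dummy event, nothing to do             */
    DEMO_POLL_AND_RESCAN, /* events were dropped: poll LUNs, rescan */
    DEMO_RESET_LUN,       /* LUN was reset but is still present     */
    DEMO_SCAN_LUN,        /* LUN or target has just appeared        */
    DEMO_REMOVE_LUN,      /* LUN or target can no longer be reached */
    DEMO_HANDLE_AN,       /* asynchronous notification, see reason  */
    DEMO_UNKNOWN
};

static enum demo_action demo_classify_event(const struct demo_scsi_event *ev)
{
    /* Bit 31 set: the device dropped at least one event. */
    if (ev->event & VIRTIO_SCSI_T_EVENTS_MISSED)
        return DEMO_POLL_AND_RESCAN;

    switch (ev->event) {
    case VIRTIO_SCSI_T_NO_EVENT:
        return DEMO_NOTHING;
    case VIRTIO_SCSI_T_TRANSPORT_RESET:
        switch (ev->reason) {
        case VIRTIO_SCSI_EVT_RESET_HARD:    return DEMO_RESET_LUN;
        case VIRTIO_SCSI_EVT_RESET_RESCAN:  return DEMO_SCAN_LUN;
        case VIRTIO_SCSI_EVT_RESET_REMOVED: return DEMO_REMOVE_LUN;
        default:                            return DEMO_UNKNOWN;
        }
    case VIRTIO_SCSI_T_ASYNC_NOTIFY:
        /* reason is a bitmask of VIRTIO_SCSI_EVT_ASYNC_* values. */
        return DEMO_HANDLE_AN;
    default:
        return DEMO_UNKNOWN;
    }
}
```

Checking the missed-events flag first is a design choice of this sketch: once events have been lost, a poll or rescan is needed regardless of what else the buffer carries.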
^ permalink raw reply
* RE: [PATCH 1/1] Staging: hv: storvsc: Move the storage driver out of the staging area
From: KY Srinivasan @ 2011-11-30 13:46 UTC (permalink / raw)
To: James Bottomley
Cc: gregkh@suse.de, linux-kernel@vger.kernel.org,
devel@linuxdriverproject.org, virtualization@lists.osdl.org,
linux-scsi@vger.kernel.org, ohering@suse.com, hch@infradead.org
In-Reply-To: <1321554393.3041.77.camel@dabdike.int.hansenpartnership.com>
> -----Original Message-----
> From: KY Srinivasan
> Sent: Thursday, November 17, 2011 11:53 PM
> To: 'James Bottomley'
> Cc: gregkh@suse.de; linux-kernel@vger.kernel.org;
> devel@linuxdriverproject.org; virtualization@lists.osdl.org; linux-
> scsi@vger.kernel.org; ohering@suse.com; hch@infradead.org
> Subject: RE: [PATCH 1/1] Staging: hv: storvsc: Move the storage driver out of the
> staging area
>
>
>
> > -----Original Message-----
> > From: James Bottomley [mailto:James.Bottomley@HansenPartnership.com]
> > Sent: Thursday, November 17, 2011 1:27 PM
> > To: KY Srinivasan
> > Cc: gregkh@suse.de; linux-kernel@vger.kernel.org;
> > devel@linuxdriverproject.org; virtualization@lists.osdl.org; linux-
> > scsi@vger.kernel.org; ohering@suse.com; hch@infradead.org
> > Subject: Re: [PATCH 1/1] Staging: hv: storvsc: Move the storage driver out of
> the
> > staging area
> >
> > On Tue, 2011-11-08 at 10:13 -0800, K. Y. Srinivasan wrote:
> > > The storage driver (storvsc_drv.c) handles all block storage devices
> > > assigned to Linux guests hosted on Hyper-V. This driver has been in the
> > > staging tree for a while and this patch moves it out of the staging area.
> > > As per Greg's recommendation, this patch makes no changes to the
> staging/hv
> > > directory. Once the driver moves out of staging, we will cleanup the
> > > staging/hv directory.
> > >
> > > This patch includes all the patches that I have sent against the staging/hv
> > > tree to address the comments I have gotten to date on this storage driver.
> >
> > First comment is that it would have been easier to see the individual
> > patches for comment before you committed them.
>
> I am not sure if the patches have been committed yet. All patches were sent
> to various mailing lists and you were copied as well. In the future, I will include
> the scsi mailing list in the set of lists I include for the staging patches.
Greg has checked in these patches now.
>
> >
> > The way you did mempool isn't entirely right: the problem is that to
> > prevent a memory to I/O deadlock we need to ensure forward progress on
> > the drain device. Just having 64 commands available to the host doesn't
> > necessarily achieve this because LUN1 could consume them all and starve
> > LUN0 which is the drain device leading to the deadlock, so the mempool
> > really needs to be per device using slave_alloc.
>
Presently, Linux on Hyper-V can only boot from IDE devices. While IDE devices are
handled via this stor driver, the way we handle them is a little different compared
to block devices configured as scsi devices:
For IDE devices, we have an HBA per LUN. Given that for a long time the drain device will
be an IDE device, my current implementation does address the concern that you had raised
with regard to deadlock avoidance.
>
> >
> > +static int storvsc_device_alloc(struct scsi_device *sdevice)
> > +{
> > + /*
> > + * This enables luns to be located sparsely. Otherwise, we may not
> > + * discover them.
> > + */
> > + sdevice->sdev_bflags |= BLIST_SPARSELUN | BLIST_LARGELUN;
> > + return 0;
> > +}
> >
> > Looks bogus ... this should happen automatically for SCSI-3 devices ...
> > unless your hypervisor has some strange (and wrong) identification? I
> > really think you want to use SCSI-3 because it will do report LUN
> > scanning, which consumes far fewer resources.
>
Done.
>
> >
> > I still think you need to disable clustering and junk the bvec merge
> > function. Your object seems to be to accumulate in page size multiples
> > (and not aggregate over this) ... that's what clustering is designed to
> > do.
Done. James, I am going to send you (and the scsi mailing list) the patches addressing
your comments. I will also send out a consolidated patch for getting the driver out of
staging as well. I would like to thank you for your help in cleaning up this driver.
Regards,
K. Y
^ permalink raw reply
* Re: [Xen-devel] [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Ian Campbell @ 2011-11-30 13:25 UTC (permalink / raw)
To: Arnd Bergmann
Cc: xen-devel@lists.xensource.com, linaro-dev@lists.linaro.org,
Pawel Moll, kvm@vger.kernel.org, Stefano Stabellini,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
android-virt@lists.cs.columbia.edu,
embeddedxen-devel@lists.sourceforge.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <201111301303.23851.arnd@arndb.de>
On Wed, 2011-11-30 at 13:03 +0000, Arnd Bergmann wrote:
> On Wednesday 30 November 2011, Stefano Stabellini wrote:
> > On Tue, 29 Nov 2011, Arnd Bergmann wrote:
> > > On Tuesday 29 November 2011, Stefano Stabellini wrote:
> > >
> > > Do you have a pointer to the kernel sources for the Linux guest?
> >
> > We have very few changes to the Linux kernel at the moment (only 3
> > commits!), just enough to be able to issue hypercalls and start a PV
> > console.
> >
> > A git branch is available here (not ready for submission):
> >
> > git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git arm
>
> Ok, interesting. There really isn't much of the platform support
> that I was expecting there. I finally found the information
> I was looking for in the xen construct_dom0() function:
>
> regs->r0 = 0; /* SBZ */
> regs->r1 = 2272; /* Machine NR: Versatile Express */
> regs->r2 = 0xc0000100; /* ATAGS */
>
> What this means is that you are emulating the current ARM/Keil reference
> board, at least to the degree that is necessary to get the guest started.
>
> This is the same choice people have made for KVM, but it's not
> necessarily the best option in the long run. In particular, this
> board has a lot of hardware that you claim to have by putting the
> machine number there, when you don't really want to emulate it.
This code is actually setting up dom0 which (for the most part) sees the
real hardware.
The hardcoding of the platform is just a short term hack.
> Pawel Moll is working on a variant of the vexpress code that uses
> the flattened device tree to describe the present hardware [1], and
> I think that would be a much better target for an official release.
> Ideally, the hypervisor should provide the device tree binary (dtb)
> to the guest OS describing the hardware that is actually there.
Agreed. Our intention was to use DT so this fits perfectly with our
plans.
For dom0 we would expose a (possibly filtered) version of the DT given
to us by the firmware (e.g. we might hide a serial port to reserve it
for Xen's use, we'd likely fiddle with the memory map etc).
For domU the DT would presumably be constructed by the toolstack (in
dom0 userspace) as appropriate for the guest configuration. I guess this
needn't correspond to any particular "real" hardware platform.
> This would also be the place where you tell the guest that it should
> look for PV devices. I'm not familiar with how Xen announces PV
> devices to the guest on other architectures, but you have the
> choice between providing a full "binding", i.e. a formal specification
> in device tree format for the guest to detect PV devices in the
> same way as physical or emulated devices, or just providing a single
> place in the device tree in which the guest detects the presence
> of a xen device bus and then uses hcalls to find the devices on that
> bus.
On x86 there is an emulated PCI device which serves as the hooking point
for the PV drivers. For ARM I don't think it would be unreasonable to
have a DT entry instead. I think it would be fine to just represent the
root of the "xenbus" and further discovery would occur using the normal
xenbus mechanisms (so not a full binding). AIUI for buses which are
enumerable this is the preferred DT scheme to use.
> Another topic is the question whether there are any hcalls that
> we should try to standardize before we get another architecture
> with multiple conflicting hcall APIs as we have on x86 and powerpc.
The hcall API we are currently targeting is the existing Xen API (at
least the generic parts of it). These generally deal with fairly Xen
specific concepts like grant tables etc.
Ian.
>
> Arnd
>
> [1] http://www.spinics.net/lists/arm-kernel/msg149604.html
^ permalink raw reply
* Re: [PATCHv3 RFC] virtio-pci: flexible configuration layout
From: Sasha Levin @ 2011-11-30 13:12 UTC (permalink / raw)
To: Rusty Russell
Cc: Krishna Kumar, kvm, Pawel Moll, Michael S. Tsirkin,
Alexey Kardashevskiy, Wang Sheng-Hui, lkml - Kernel Mailing List,
virtualization, Christian Borntraeger, Amit Shah
In-Reply-To: <87pqgats49.fsf@rustcorp.com.au>
On Wed, 2011-11-30 at 10:10 +1030, Rusty Russell wrote:
> On Mon, 28 Nov 2011 11:15:31 +0200, Sasha Levin <levinsasha928@gmail.com> wrote:
> > On Mon, 2011-11-28 at 11:25 +1030, Rusty Russell wrote:
> > > I'd like to see kvmtools remove support for legacy mode altogether,
> > > but they probably have existing users.
> >
> > While we can't simply remove it right away, instead of mixing our
> > implementation for both legacy and new spec in the same code we can
> > split the virtio-pci implementation into two:
> >
> > - virtio/virtio-pci-legacy.c
> > - virtio/virtio-pci.c
> >
> > At that point we can #ifdef the entire virtio-pci-legacy.c for now and
> > remove it at the same time legacy virtio-pci is removed from the kernel.
>
> Hmm, that might be neat, but we can't tell the driver core to try
> virtio-pci before virtio-pci-legacy, so we need detection code in both
> modules (and add a "force" flag to virtio-pci-legacy to tell it to
> accept the device even if it's not a legacy-only one).
I was thinking more in the direction of fallback code in virtio-pci.c to
virtio-pci-legacy.c.
Something like:
#ifdef VIRTIO_PCI_LEGACY
[Create BAR0 and map it to virtio-pci-legacy.c]
#endif
So BAR0 is defined only as long as the legacy code is there, which makes
falling back to legacy pretty simple.
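A rough, self-contained C sketch of that idea, not actual kvmtool code: the demo_* names and the BAR assigned to the new layout are made up for illustration. The only point is that the legacy BAR exists solely when VIRTIO_PCI_LEGACY is compiled in.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one BAR exposed by the device model. */
struct demo_bar {
    int         index;   /* PCI BAR number                     */
    const char *handler; /* which implementation backs the BAR */
};

/* Fill in the BARs the device exposes; returns how many were set up. */
static size_t demo_setup_bars(struct demo_bar *bars, size_t max)
{
    size_t n = 0;

#ifdef VIRTIO_PCI_LEGACY
    /* Legacy layout lives in BAR0 and is handled by the legacy code. */
    if (n < max)
        bars[n++] = (struct demo_bar){ 0, "virtio-pci-legacy" };
#endif
    /* New layout; using BAR1 here is an assumption of this sketch. */
    if (n < max)
        bars[n++] = (struct demo_bar){ 1, "virtio-pci" };

    return n;
}

/* True if a legacy BAR0 was created, i.e. fallback is possible. */
static bool demo_has_legacy_bar(const struct demo_bar *bars, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (bars[i].index == 0)
            return true;
    return false;
}
```

Building without -DVIRTIO_PCI_LEGACY leaves only the new-layout BAR, which is the state we would end up in once legacy support is dropped.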
--
Sasha.
^ permalink raw reply
* Re: [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Arnd Bergmann @ 2011-11-30 13:03 UTC (permalink / raw)
To: Stefano Stabellini
Cc: xen-devel@lists.xensource.com, linaro-dev@lists.linaro.org,
Pawel Moll, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
android-virt@lists.cs.columbia.edu,
embeddedxen-devel@lists.sourceforge.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <alpine.DEB.2.00.1111301053300.31179@kaball-desktop>
On Wednesday 30 November 2011, Stefano Stabellini wrote:
> On Tue, 29 Nov 2011, Arnd Bergmann wrote:
> > On Tuesday 29 November 2011, Stefano Stabellini wrote:
> >
> > Do you have a pointer to the kernel sources for the Linux guest?
>
> We have very few changes to the Linux kernel at the moment (only 3
> commits!), just enough to be able to issue hypercalls and start a PV
> console.
>
> A git branch is available here (not ready for submission):
>
> git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git arm
Ok, interesting. There really isn't much of the platform support
that I was expecting there. I finally found the information
I was looking for in the xen construct_dom0() function:
regs->r0 = 0; /* SBZ */
regs->r1 = 2272; /* Machine NR: Versatile Express */
regs->r2 = 0xc0000100; /* ATAGS */
What this means is that you are emulating the current ARM/Keil reference
board, at least to the degree that is necessary to get the guest started.
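For reference, the three registers follow the usual ARM Linux boot convention: r0 is zero, r1 carries the machine type number, and r2 points at the ATAGS list (or a dtb). A minimal standalone sketch, with hypothetical demo_* names that are not Xen's own types:

```c
#include <stdint.h>

/* Hypothetical stand-in for the guest register state. */
struct demo_boot_regs {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
};

/* Machine number taken from the snippet above (Versatile Express). */
#define DEMO_MACH_TYPE_VEXPRESS 2272u

/* Set up the registers the kernel entry point expects. */
static void demo_set_boot_regs(struct demo_boot_regs *regs,
                               uint32_t mach_type, uint32_t params_addr)
{
    regs->r0 = 0;           /* should be zero              */
    regs->r1 = mach_type;   /* selects the board code      */
    regs->r2 = params_addr; /* address of ATAGS or the dtb */
}
```

The machine number in r1 is what makes the guest believe it runs on a full Versatile Express.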
This is the same choice people have made for KVM, but it's not
necessarily the best option in the long run. In particular, this
board has a lot of hardware that you claim to have by putting the
machine number there, when you don't really want to emulate it.
Pawel Moll is working on a variant of the vexpress code that uses
the flattened device tree to describe the present hardware [1], and
I think that would be a much better target for an official release.
Ideally, the hypervisor should provide the device tree binary (dtb)
to the guest OS describing the hardware that is actually there.
This would also be the place where you tell the guest that it should
look for PV devices. I'm not familiar with how Xen announces PV
devices to the guest on other architectures, but you have the
choice between providing a full "binding", i.e. a formal specification
in device tree format for the guest to detect PV devices in the
same way as physical or emulated devices, or just providing a single
place in the device tree in which the guest detects the presence
of a xen device bus and then uses hcalls to find the devices on that
bus.
Another topic is the question whether there are any hcalls that
we should try to standardize before we get another architecture
with multiple conflicting hcall APIs as we have on x86 and powerpc.
Arnd
[1] http://www.spinics.net/lists/arm-kernel/msg149604.html
^ permalink raw reply
* Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
From: Ohad Ben-Cohen @ 2011-11-30 11:55 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: linux-arm-kernel, linux-kernel, kvm, virtualization
In-Reply-To: <20111129151958.GA31789@redhat.com>
On Tue, Nov 29, 2011 at 5:19 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Nov 29, 2011 at 03:57:19PM +0200, Ohad Ben-Cohen wrote:
>> > Is an extra branch faster or slower than reverting d57ed95?
>>
>> Sorry, unfortunately I have no way to measure this, as I don't have
>> any virtualization/x86 setup. I'm developing on ARM SoCs, where
>> virtualization hardware is coming, but not here yet.
>
> You can try using the micro-benchmark in tools/virtio/.
Hmm, care to show me exactly what you mean?
Though I somewhat suspect that any micro-benchmarking I'll do with my
random ARM SoC will not have much value to real virtualization/x86
workloads.
Thanks,
Ohad.
^ permalink raw reply
* Re: [RFC] virtio: use mandatory barriers for remote processor vdevs
From: Ohad Ben-Cohen @ 2011-11-30 11:45 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: linux-arm-kernel, linux-kernel, kvm, virtualization
In-Reply-To: <20111129151607.GE30966@redhat.com>
On Tue, Nov 29, 2011 at 5:16 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> This mentions iommu - is there a need to use dma api to let
> the firmware access the rings? Or does it have access to all
> of memory?
IOMMU may or may not be used, it really depends on the hardware (my
personal SoC does employ one, while others don't).
The vrings are created in non-cacheable memory, which is allocated
using dma_alloc_coherent, but that isn't necessarily controlling the
remote processor access to the memory (a notable example is an
iommu-less remote processor which can directly access the physical
memory).
> Is there cache snooping? If yes access from an external device
> typically works mostly in the same way as smp ...
No, nothing fancy like that. Every processor has its own cache, with
no coherency protocol. The remote processor should really be treated
as a device, and not as a processor that is part of an SMP
configuration, and we must prohibit both the compiler and the CPU from
reordering memory operations.
> So you put virtio rings in MMIO memory?
I'll be precise: the vrings are created in non-cacheable memory, which
both processors have access to.
> Could you please give a couple of examples of breakage?
Sure. Basically, the order of the vring memory operations appears
differently to the observing processor. For example, avail->idx gets
updated before the new entry is put in the available array...
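To make the failure mode concrete, here is a minimal userspace sketch, not the virtio code itself. The struct layout and the names toy_avail, toy_publish and toy_consume are made up for illustration, and a C11 release/acquire fence merely stands in for the kernel barrier under discussion; it says nothing about which barrier flavor is right for a non-coherent remote processor.

```c
#include <stdatomic.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u

/* Hypothetical, simplified stand-in for the vring "available" ring. */
struct toy_avail {
	uint16_t ring[TOY_RING_SIZE];
	_Atomic uint16_t idx;	/* free-running count of published entries */
};

/* Producer: store the entry first, then make it visible by bumping idx. */
static void toy_publish(struct toy_avail *a, uint16_t desc_head)
{
	uint16_t idx = atomic_load_explicit(&a->idx, memory_order_relaxed);

	a->ring[idx % TOY_RING_SIZE] = desc_head;
	/* Stand-in for the write barrier: entry must be visible before idx. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&a->idx, (uint16_t)(idx + 1),
			      memory_order_relaxed);
}

/* Consumer: returns 1 and fills *desc_head if a new entry is available. */
static int toy_consume(struct toy_avail *a, uint16_t *last_seen,
		       uint16_t *desc_head)
{
	uint16_t idx = atomic_load_explicit(&a->idx, memory_order_relaxed);

	if (idx == *last_seen)
		return 0;
	/* Pairs with the producer's fence: idx is read before the entry. */
	atomic_thread_fence(memory_order_acquire);
	*desc_head = a->ring[*last_seen % TOY_RING_SIZE];
	(*last_seen)++;
	return 1;
}
```

If the two stores in toy_publish were allowed to become visible in the opposite order, toy_consume could see the new idx and read a stale ring slot, which is the breakage described above.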
Thanks,
Ohad.
^ permalink raw reply
* Re: [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Stefano Stabellini @ 2011-11-30 11:41 UTC (permalink / raw)
To: Anup Patel
Cc: xen-devel@lists.xensource.com, linaro-dev@lists.linaro.org,
Arnd Bergmann, kvm@vger.kernel.org, Stefano Stabellini,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
android-virt@lists.cs.columbia.edu,
embeddedxen-devel@lists.sourceforge.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <CAAhSdy3F1oUQP=f_Tig4_MnufwPRpNooZYW8_cSTAse7aOaDoA@mail.gmail.com>
On Wed, 30 Nov 2011, Anup Patel wrote:
> Hi all,
>
> I wanted to know how Xen-ARM for A15 will address following concerns:
>
> - How will Xen-ARM for A15 support legacy guest environment like ARMv5 or ARMv6 ?
It is not our focus at the moment; we are targeting operating systems
that support a modern ARMv7 machine with GIC support.
That said, it might be possible to run legacy guests in the future,
introducing more emulation to the hypervisor.
> - What if my Cortex-A15 board does not have a GIC with virtualization support ?
We expect most hardware vendors to provide a GIC with virtualization
support. However if they do not, in order to support their boards we'll
have to do more emulation in the hypervisor.
^ permalink raw reply
* Re: [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Stefano Stabellini @ 2011-11-30 11:39 UTC (permalink / raw)
To: Arnd Bergmann
Cc: xen-devel@lists.xensource.com, linaro-dev@lists.linaro.org,
kvm@vger.kernel.org, Stefano Stabellini,
linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
android-virt@lists.cs.columbia.edu,
embeddedxen-devel@lists.sourceforge.net,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <201111292129.20444.arnd@arndb.de>
On Tue, 29 Nov 2011, Arnd Bergmann wrote:
> On Tuesday 29 November 2011, Stefano Stabellini wrote:
> > Hi all,
> > a few weeks ago I (and a few others) started hacking on a
> > proof-of-concept hypervisor port to Cortex-A15 which uses and requires
> > ARMv7 virtualization extensions. The intention of this work was to find
> > out how to best support ARM v7+ on Xen. See
> > http://old-list-archives.xen.org/archives/html/xen-arm/2011-09/msg00013.html
> > for more details.
> >
> > I am pleased to announce that significant progress has been made, and
> > that we now have a nascent Xen port for Cortex-A15. The port is based on
> > xen-unstable (HG CS 8d6edc3d26d2) and written from scratch exploiting
> > the latest virtualization, LPAE, GIC and generic timer support in
> > hardware.
>
> Very nice!
>
> Do you have a pointer to the kernel sources for the Linux guest?
We have very few changes to the Linux kernel at the moment (only 3
commits!), just enough to be able to issue hypercalls and start a PV
console.
A git branch is available here (not ready for submission):
git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git arm
the branch above is based on git://linux-arm.org/linux-2.6.git arm-lpae,
even though guests don't really need lpae support to run on Xen.
> Since Xen and KVM are both in an early working state right now,
> it would be very nice if we could agree on the guest model to make
> sure that it's always possible to run the same kernel in both
> (and potentially other future) hypervisors without modifications.
Yes, that would be ideal.
We don't plan on making many changes other than enabling PV frontends
and backends.
^ permalink raw reply
* [PATCH RFC V3 4/4] kvm : pv-ticketlocks support for linux guests running on KVM hypervisor
From: Raghavendra K T @ 2011-11-30 9:00 UTC (permalink / raw)
To: Greg Kroah-Hartman, KVM, Sedat Dilek, Ingo Molnar, Virtualization,
Jeremy Fitzhardinge, x86, H. Peter Anvin, Dave Jiang,
Thomas Gleixner, Stefano Stabellini, Gleb Natapov, Yinghai Lu,
Marcelo Tosatti, Xen, Avi Kivity, Rik van Riel,
Konrad Rzeszutek Wilk, LKML
Cc: Peter Zijlstra, Raghavendra K T, Srivatsa Vaddagiri, Dave Hansen,
Suzuki Poulose, Sasha Levin
In-Reply-To: <20111130085921.23386.89708.sendpatchset@oc5400248562.ibm.com>
This patch extends Linux guests running on KVM hypervisor to support
pv-ticketlocks.
During smp_boot_cpus, a paravirtualized KVM guest detects whether the hypervisor has the
required feature (KVM_FEATURE_KICK_VCPU) to support pv-ticketlocks. If so,
support for pv-ticketlocks is registered via pv_lock_ops.
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 8b1d65d..7e419ad 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -195,10 +195,21 @@ void kvm_async_pf_task_wait(u32 token);
void kvm_async_pf_task_wake(u32 token);
u32 kvm_read_and_reset_pf_reason(void);
extern void kvm_disable_steal_time(void);
-#else
-#define kvm_guest_init() do { } while (0)
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init kvm_spinlock_init(void);
+#else /* CONFIG_PARAVIRT_SPINLOCKS */
+static inline void kvm_spinlock_init(void)
+{
+}
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
+
+#else /* CONFIG_KVM_GUEST */
+#define kvm_guest_init() do {} while (0)
#define kvm_async_pf_task_wait(T) do {} while(0)
#define kvm_async_pf_task_wake(T) do {} while(0)
+#define kvm_spinlock_init() do {} while (0)
+
static inline u32 kvm_read_and_reset_pf_reason(void)
{
return 0;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index a9c2116..dffeea3 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -33,6 +33,7 @@
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/kprobes.h>
+#include <linux/debugfs.h>
#include <asm/timer.h>
#include <asm/cpu.h>
#include <asm/traps.h>
@@ -545,6 +546,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
#endif
kvm_guest_cpu_init();
native_smp_prepare_boot_cpu();
+ kvm_spinlock_init();
}
static void __cpuinit kvm_guest_cpu_online(void *dummy)
@@ -627,3 +629,248 @@ static __init int activate_jump_labels(void)
return 0;
}
arch_initcall(activate_jump_labels);
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+enum kvm_contention_stat {
+ TAKEN_SLOW,
+ TAKEN_SLOW_PICKUP,
+ RELEASED_SLOW,
+ RELEASED_SLOW_KICKED,
+ NR_CONTENTION_STATS
+};
+
+#ifdef CONFIG_KVM_DEBUG_FS
+
+static struct kvm_spinlock_stats
+{
+ u32 contention_stats[NR_CONTENTION_STATS];
+
+#define HISTO_BUCKETS 30
+ u32 histo_spin_blocked[HISTO_BUCKETS+1];
+
+ u64 time_blocked;
+} spinlock_stats;
+
+static u8 zero_stats;
+
+static inline void check_zero(void)
+{
+ u8 ret;
+ u8 old = ACCESS_ONCE(zero_stats);
+ if (unlikely(old)) {
+ ret = cmpxchg(&zero_stats, old, 0);
+ /* This ensures only one fellow resets the stat */
+ if (ret == old)
+ memset(&spinlock_stats, 0, sizeof(spinlock_stats));
+ }
+}
+
+static inline void add_stats(enum kvm_contention_stat var, int val)
+{
+ check_zero();
+ spinlock_stats.contention_stats[var] += val;
+}
+
+
+static inline u64 spin_time_start(void)
+{
+ return sched_clock();
+}
+
+static void __spin_time_accum(u64 delta, u32 *array)
+{
+ unsigned index = ilog2(delta);
+
+ check_zero();
+
+ if (index < HISTO_BUCKETS)
+ array[index]++;
+ else
+ array[HISTO_BUCKETS]++;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+ u32 delta = sched_clock() - start;
+
+ __spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
+ spinlock_stats.time_blocked += delta;
+}
+
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+
+struct dentry *kvm_init_debugfs(void)
+{
+ d_kvm_debug = debugfs_create_dir("kvm", NULL);
+ if (!d_kvm_debug)
+ printk(KERN_WARNING "Could not create 'kvm' debugfs directory\n");
+
+ return d_kvm_debug;
+}
+
+static int __init kvm_spinlock_debugfs(void)
+{
+ struct dentry *d_kvm = kvm_init_debugfs();
+
+ if (d_kvm == NULL)
+ return -ENOMEM;
+
+ d_spin_debug = debugfs_create_dir("spinlocks", d_kvm);
+
+ debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
+
+ debugfs_create_u32("taken_slow", 0444, d_spin_debug,
+ &spinlock_stats.contention_stats[TAKEN_SLOW]);
+ debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
+ &spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
+
+ debugfs_create_u32("released_slow", 0444, d_spin_debug,
+ &spinlock_stats.contention_stats[RELEASED_SLOW]);
+ debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
+ &spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
+
+ debugfs_create_u64("time_blocked", 0444, d_spin_debug,
+ &spinlock_stats.time_blocked);
+
+ debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
+ spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
+
+ return 0;
+}
+fs_initcall(kvm_spinlock_debugfs);
+#else /* !CONFIG_KVM_DEBUG_FS */
+#define TIMEOUT (1 << 10)
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+}
+
+static inline u64 spin_time_start(void)
+{
+ return 0;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+}
+#endif /* CONFIG_KVM_DEBUG_FS */
+
+struct kvm_lock_waiting {
+ struct arch_spinlock *lock;
+ __ticket_t want;
+};
+
+/* cpus 'waiting' on a spinlock to become available */
+static cpumask_t waiting_cpus;
+
+/* Track spinlock on which a cpu is waiting */
+static DEFINE_PER_CPU(struct kvm_lock_waiting, lock_waiting);
+
+static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
+{
+ struct kvm_lock_waiting *w = &__get_cpu_var(lock_waiting);
+ int cpu = smp_processor_id();
+ u64 start;
+ unsigned long flags;
+
+ start = spin_time_start();
+
+ /*
+ * Make sure an interrupt handler can't upset things in a
+ * partially setup state.
+ */
+ local_irq_save(flags);
+
+ /*
+ * The ordering protocol on this is that the "lock" pointer
+ * may only be set non-NULL if the "want" ticket is correct.
+ * If we're updating "want", we must first clear "lock".
+ */
+ w->lock = NULL;
+ smp_wmb();
+ w->want = want;
+ smp_wmb();
+ w->lock = lock;
+
+ add_stats(TAKEN_SLOW, 1);
+
+ /*
+ * This uses set_bit, which is atomic but we should not rely on its
+ * reordering guarantees. So barrier is needed after this call.
+ */
+ cpumask_set_cpu(cpu, &waiting_cpus);
+
+ barrier();
+
+ /*
+ * Mark entry to slowpath before doing the pickup test to make
+ * sure we don't deadlock with an unlocker.
+ */
+ __ticket_enter_slowpath(lock);
+
+ /*
+ * check again make sure it didn't become free while
+ * we weren't looking.
+ */
+ if (ACCESS_ONCE(lock->tickets.head) == want) {
+ add_stats(TAKEN_SLOW_PICKUP, 1);
+ goto out;
+ }
+
+ /* Allow interrupts while blocked */
+ local_irq_restore(flags);
+
+ /* halt until it's our turn and kicked. */
+ halt();
+
+ local_irq_save(flags);
+out:
+ cpumask_clear_cpu(cpu, &waiting_cpus);
+ w->lock = NULL;
+ local_irq_restore(flags);
+ spin_time_accum_blocked(start);
+}
+PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
+
+/* Kick a cpu */
+static inline void kvm_kick_cpu(int cpu)
+{
+ kvm_hypercall1(KVM_HC_KICK_CPU, cpu);
+}
+
+/* Kick vcpu waiting on @lock->head to reach value @ticket */
+static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
+{
+ int cpu;
+
+ add_stats(RELEASED_SLOW, 1);
+
+ for_each_cpu(cpu, &waiting_cpus) {
+ const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
+ if (ACCESS_ONCE(w->lock) == lock &&
+ ACCESS_ONCE(w->want) == ticket) {
+ add_stats(RELEASED_SLOW_KICKED, 1);
+ kvm_kick_cpu(cpu);
+ break;
+ }
+ }
+}
+
+/*
+ * Setup pv_lock_ops to exploit KVM_FEATURE_KICK_VCPU if present.
+ */
+void __init kvm_spinlock_init(void)
+{
+ if (!kvm_para_available())
+ return;
+ /* Does host kernel support KVM_FEATURE_KICK_VCPU? */
+ if (!kvm_para_has_feature(KVM_FEATURE_KICK_VCPU))
+ return;
+
+ jump_label_inc(&paravirt_ticketlocks_enabled);
+
+ pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
+ pv_lock_ops.unlock_kick = kvm_unlock_kick;
+}
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8f4b6db..a89546b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1518,6 +1518,12 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
kvm_make_request(KVM_REQ_UNHALT, vcpu);
break;
}
+ if (vcpu->kicked) {
+ vcpu->kicked = 0;
+ barrier();
+ kvm_make_request(KVM_REQ_UNHALT, vcpu);
+ break;
+ }
if (kvm_cpu_has_pending_timer(vcpu))
break;
if (signal_pending(current))
^ permalink raw reply related
* [PATCH RFC V3 3/4] kvm guest : Added configuration support to enable debug information for KVM Guests
From: Raghavendra K T @ 2011-11-30 9:00 UTC (permalink / raw)
To: Greg Kroah-Hartman, H. Peter Anvin, Gleb Natapov, Virtualization,
Jeremy Fitzhardinge, x86, KVM, Dave Jiang, Thomas Gleixner,
Stefano Stabellini, Xen, Sedat Dilek, Yinghai Lu, Marcelo Tosatti,
Ingo Molnar, Avi Kivity, Rik van Riel, Konrad Rzeszutek Wilk,
LKML
Cc: Peter Zijlstra, Raghavendra K T, Srivatsa Vaddagiri, Dave Hansen,
Suzuki Poulose, Sasha Levin
In-Reply-To: <20111130085921.23386.89708.sendpatchset@oc5400248562.ibm.com>
Added configuration support to enable debug information
for KVM Guests in debugfs
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5d8152d..526e3ae 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -561,6 +561,15 @@ config KVM_GUEST
This option enables various optimizations for running under the KVM
hypervisor.
+config KVM_DEBUG_FS
+ bool "Enable debug information for KVM Guests in debugfs"
+ depends on KVM_GUEST && DEBUG_FS
+ default n
+ ---help---
+ This option enables collection of various statistics for KVM guest.
+ Statistics are displayed in debugfs filesystem. Enabling this option
+ may incur significant overhead.
+
source "arch/x86/lguest/Kconfig"
config PARAVIRT
^ permalink raw reply related
* [PATCH RFC V3 2/4] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
From: Raghavendra K T @ 2011-11-30 8:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, KVM, Konrad Rzeszutek Wilk, Sedat Dilek,
Virtualization, Jeremy Fitzhardinge, x86, H. Peter Anvin,
Dave Jiang, Thomas Gleixner, Stefano Stabellini, Gleb Natapov,
Yinghai Lu, Marcelo Tosatti, Ingo Molnar, Avi Kivity,
Rik van Riel, Xen, LKML
Cc: Peter Zijlstra, Raghavendra K T, Srivatsa Vaddagiri, Dave Hansen,
Suzuki Poulose, Sasha Levin
In-Reply-To: <20111130085921.23386.89708.sendpatchset@oc5400248562.ibm.com>
Add a hypercall to KVM hypervisor to support pv-ticketlocks
KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state.
The presence of this hypercall is indicated to the guest via
KVM_FEATURE_KICK_VCPU/KVM_CAP_KICK_VCPU.
Qemu needs a corresponding patch to pass up the presence of this feature to
guest via cpuid. Patch to qemu will be sent separately.
There is no Xen/KVM hypercall interface to await kick from.
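As a rough illustration of the kick/halt handshake described above, here is a single-threaded userspace model. It assumes the intent is that a kick arriving before the target blocks must not be lost; toy_vcpu, toy_kick and toy_block are hypothetical names, and the real code additionally needs barriers and the wait-queue wakeup.

```c
#include <stdbool.h>

/* Hypothetical, much-reduced stand-in for struct kvm_vcpu. */
struct toy_vcpu {
	int kicked;	/* set by the kicker, consumed by the blocked vcpu */
	bool halted;
};

/* Kick side: record the kick first, then wake the target. */
static void toy_kick(struct toy_vcpu *target)
{
	target->kicked = 1;
	target->halted = false;
}

/*
 * Block side: returns true if a pending kick was consumed (the vcpu
 * keeps running), false if the vcpu went to sleep.
 */
static bool toy_block(struct toy_vcpu *self)
{
	if (self->kicked) {
		self->kicked = 0;
		self->halted = false;
		return true;
	}
	self->halted = true;
	return false;
}
```

With this shape, calling toy_kick() and then toy_block() on the same vcpu returns immediately instead of sleeping, which is the property the kicked flag is meant to provide.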
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 734c376..8b1d65d 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -16,12 +16,14 @@
#define KVM_FEATURE_CLOCKSOURCE 0
#define KVM_FEATURE_NOP_IO_DELAY 1
#define KVM_FEATURE_MMU_OP 2
+
/* This indicates that the new set of kvmclock msrs
* are available. The use of 0x11 and 0x12 is deprecated
*/
#define KVM_FEATURE_CLOCKSOURCE2 3
#define KVM_FEATURE_ASYNC_PF 4
#define KVM_FEATURE_STEAL_TIME 5
+#define KVM_FEATURE_KICK_VCPU 6
/* The last 8 bits are used to indicate how to interpret the flags field
* in pvclock structure. If no bits are set, all flags are ignored.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c38efd7..6e1c8b4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2103,6 +2103,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_XSAVE:
case KVM_CAP_ASYNC_PF:
case KVM_CAP_GET_TSC_KHZ:
+ case KVM_CAP_KICK_VCPU:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -2577,7 +2578,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
(1 << KVM_FEATURE_NOP_IO_DELAY) |
(1 << KVM_FEATURE_CLOCKSOURCE2) |
(1 << KVM_FEATURE_ASYNC_PF) |
- (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
+ (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
+ (1 << KVM_FEATURE_KICK_VCPU);
if (sched_info_on())
entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
@@ -5305,6 +5307,26 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
return 1;
}
+/*
+ * kvm_pv_kick_cpu_op: Kick a vcpu.
+ *
+ * @cpu - vcpu to be kicked.
+ */
+static void kvm_pv_kick_cpu_op(struct kvm *kvm, int cpu)
+{
+ struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, cpu);
+ struct kvm_mp_state mp_state;
+
+ mp_state.mp_state = KVM_MP_STATE_RUNNABLE;
+ if (vcpu) {
+ vcpu->kicked = 1;
+ /* Ensure kicked is always set before wakeup */
+ barrier();
+ }
+ kvm_arch_vcpu_ioctl_set_mpstate(vcpu, &mp_state);
+ kvm_vcpu_kick(vcpu);
+}
+
int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
{
unsigned long nr, a0, a1, a2, a3, ret;
@@ -5341,6 +5363,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
case KVM_HC_MMU_OP:
r = kvm_pv_mmu_op(vcpu, a0, hc_gpa(vcpu, a1, a2), &ret);
break;
+ case KVM_HC_KICK_CPU:
+ kvm_pv_kick_cpu_op(vcpu->kvm, a0);
+ ret = 0;
+ break;
default:
ret = -KVM_ENOSYS;
break;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index f47fcd3..e760035 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo {
#define KVM_CAP_PPC_HIOR 67
#define KVM_CAP_PPC_PAPR 68
#define KVM_CAP_S390_GMAP 71
+#define KVM_CAP_KICK_VCPU 72
#ifdef KVM_CAP_IRQ_ROUTING
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d526231..ff3b6ff 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -154,6 +154,11 @@ struct kvm_vcpu {
#endif
struct kvm_vcpu_arch arch;
+
+ /*
+ * blocked vcpu wakes up by checking this flag set by unlocker.
+ */
+ int kicked;
};
static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index 47a070b..19f10bd 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -19,6 +19,7 @@
#define KVM_HC_MMU_OP 2
#define KVM_HC_FEATURES 3
#define KVM_HC_PPC_MAP_MAGIC_PAGE 4
+#define KVM_HC_KICK_CPU 5
/*
* hypercalls use architecture specific
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d9cfb78..8f4b6db 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -226,6 +226,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
vcpu->kvm = kvm;
vcpu->vcpu_id = id;
vcpu->pid = NULL;
+ vcpu->kicked = 0;
init_waitqueue_head(&vcpu->wq);
kvm_async_pf_vcpu_init(vcpu);
^ permalink raw reply related
* [PATCH RFC V3 1/4] debugfs: Add support to print u32 array in debugfs
From: Raghavendra K T @ 2011-11-30 8:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Virtualization, Gleb Natapov, H. Peter Anvin,
Jeremy Fitzhardinge, x86, KVM, Dave Jiang, Thomas Gleixner,
Stefano Stabellini, LKML, Sedat Dilek, Yinghai Lu,
Marcelo Tosatti, Ingo Molnar, Avi Kivity, Rik van Riel, Xen,
Konrad Rzeszutek Wilk
Cc: Peter Zijlstra, Raghavendra K T, Srivatsa Vaddagiri, Dave Hansen,
Suzuki Poulose, Sasha Levin
In-Reply-To: <20111130085921.23386.89708.sendpatchset@oc5400248562.ibm.com>
Add debugfs support to print u32-arrays in debugfs. Move the code from Xen to debugfs
to make the code common for other users as well.
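For readers unfamiliar with the moved code, the formatting helper uses a two-pass pattern: size the output with a NULL buffer, allocate, then format for real. Below is a minimal userspace sketch of that pattern, assuming standard C snprintf semantics; the toy_* names are illustrative and this is not the kernel implementation.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format n values as "v0 v1 ... vn-1\n". With buf == NULL nothing is
 * written and only the required size (including the NUL) is returned.
 * Returns 0 on error.
 */
static size_t toy_format_array(char *buf, size_t bufsize,
			       const uint32_t *array, size_t n)
{
	size_t used = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int len;

		if (buf && used >= bufsize)
			return 0;	/* caller's buffer is too small */
		/* snprintf(NULL, 0, ...) only reports the length needed. */
		len = snprintf(buf ? buf + used : NULL,
			       buf ? bufsize - used : 0,
			       "%" PRIu32 "%c", array[i],
			       i == n - 1 ? '\n' : ' ');
		if (len < 0)
			return 0;
		used += (size_t)len;
	}
	if (buf && used < bufsize)
		buf[used] = '\0';	/* also covers the n == 0 case */
	return used + 1;
}

/* First pass sizes the output, second pass fills the allocation. */
static char *toy_format_array_alloc(const uint32_t *array, size_t n)
{
	size_t len = toy_format_array(NULL, 0, array, n);
	char *out;

	if (len == 0)
		return NULL;
	out = malloc(len);
	if (out == NULL)
		return NULL;
	toy_format_array(out, len, array, n);
	return out;
}
```

The caller owns the returned string and frees it, much as the release handler frees file->private_data in the patch.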
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Suzuki Poulose <suzuki@in.ibm.com>
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
diff --git a/arch/x86/xen/debugfs.c b/arch/x86/xen/debugfs.c
index 7c0fedd..c8377fb 100644
--- a/arch/x86/xen/debugfs.c
+++ b/arch/x86/xen/debugfs.c
@@ -19,107 +19,3 @@ struct dentry * __init xen_init_debugfs(void)
return d_xen_debug;
}
-struct array_data
-{
- void *array;
- unsigned elements;
-};
-
-static int u32_array_open(struct inode *inode, struct file *file)
-{
- file->private_data = NULL;
- return nonseekable_open(inode, file);
-}
-
-static size_t format_array(char *buf, size_t bufsize, const char *fmt,
- u32 *array, unsigned array_size)
-{
- size_t ret = 0;
- unsigned i;
-
- for(i = 0; i < array_size; i++) {
- size_t len;
-
- len = snprintf(buf, bufsize, fmt, array[i]);
- len++; /* ' ' or '\n' */
- ret += len;
-
- if (buf) {
- buf += len;
- bufsize -= len;
- buf[-1] = (i == array_size-1) ? '\n' : ' ';
- }
- }
-
- ret++; /* \0 */
- if (buf)
- *buf = '\0';
-
- return ret;
-}
-
-static char *format_array_alloc(const char *fmt, u32 *array, unsigned array_size)
-{
- size_t len = format_array(NULL, 0, fmt, array, array_size);
- char *ret;
-
- ret = kmalloc(len, GFP_KERNEL);
- if (ret == NULL)
- return NULL;
-
- format_array(ret, len, fmt, array, array_size);
- return ret;
-}
-
-static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len,
- loff_t *ppos)
-{
- struct inode *inode = file->f_path.dentry->d_inode;
- struct array_data *data = inode->i_private;
- size_t size;
-
- if (*ppos == 0) {
- if (file->private_data) {
- kfree(file->private_data);
- file->private_data = NULL;
- }
-
- file->private_data = format_array_alloc("%u", data->array, data->elements);
- }
-
- size = 0;
- if (file->private_data)
- size = strlen(file->private_data);
-
- return simple_read_from_buffer(buf, len, ppos, file->private_data, size);
-}
-
-static int xen_array_release(struct inode *inode, struct file *file)
-{
- kfree(file->private_data);
-
- return 0;
-}
-
-static const struct file_operations u32_array_fops = {
- .owner = THIS_MODULE,
- .open = u32_array_open,
- .release= xen_array_release,
- .read = u32_array_read,
- .llseek = no_llseek,
-};
-
-struct dentry *xen_debugfs_create_u32_array(const char *name, mode_t mode,
- struct dentry *parent,
- u32 *array, unsigned elements)
-{
- struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL);
-
- if (data == NULL)
- return NULL;
-
- data->array = array;
- data->elements = elements;
-
- return debugfs_create_file(name, mode, parent, data, &u32_array_fops);
-}
diff --git a/arch/x86/xen/debugfs.h b/arch/x86/xen/debugfs.h
index e281320..12ebf33 100644
--- a/arch/x86/xen/debugfs.h
+++ b/arch/x86/xen/debugfs.h
@@ -3,8 +3,4 @@
struct dentry * __init xen_init_debugfs(void);
-struct dentry *xen_debugfs_create_u32_array(const char *name, mode_t mode,
- struct dentry *parent,
- u32 *array, unsigned elements);
-
#endif /* _XEN_DEBUGFS_H */
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index fc506e6..14a8961 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -286,7 +286,7 @@ static int __init xen_spinlock_debugfs(void)
debugfs_create_u64("time_blocked", 0444, d_spin_debug,
&spinlock_stats.time_blocked);
- xen_debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
+ debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
return 0;
diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
index 90f7657..df44ccf 100644
--- a/fs/debugfs/file.c
+++ b/fs/debugfs/file.c
@@ -18,6 +18,7 @@
#include <linux/pagemap.h>
#include <linux/namei.h>
#include <linux/debugfs.h>
+#include <linux/slab.h>
static ssize_t default_read_file(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
@@ -525,3 +526,130 @@ struct dentry *debugfs_create_blob(const char *name, mode_t mode,
return debugfs_create_file(name, mode, parent, blob, &fops_blob);
}
EXPORT_SYMBOL_GPL(debugfs_create_blob);
+
+struct array_data {
+ void *array;
+ u32 elements;
+};
+
+static int u32_array_open(struct inode *inode, struct file *file)
+{
+ file->private_data = NULL;
+ return nonseekable_open(inode, file);
+}
+
+static size_t format_array(char *buf, size_t bufsize, const char *fmt,
+ u32 *array, u32 array_size)
+{
+ size_t ret = 0;
+ u32 i;
+
+ for (i = 0; i < array_size; i++) {
+ size_t len;
+
+ len = snprintf(buf, bufsize, fmt, array[i]);
+ len++; /* ' ' or '\n' */
+ ret += len;
+
+ if (buf) {
+ buf += len;
+ bufsize -= len;
+ buf[-1] = (i == array_size-1) ? '\n' : ' ';
+ }
+ }
+
+ ret++; /* \0 */
+ if (buf)
+ *buf = '\0';
+
+ return ret;
+}
+
+static char *format_array_alloc(const char *fmt, u32 *array,
+ u32 array_size)
+{
+ size_t len = format_array(NULL, 0, fmt, array, array_size);
+ char *ret;
+
+ ret = kmalloc(len, GFP_KERNEL);
+ if (ret == NULL)
+ return NULL;
+
+ format_array(ret, len, fmt, array, array_size);
+ return ret;
+}
+
+static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len,
+ loff_t *ppos)
+{
+ struct inode *inode = file->f_path.dentry->d_inode;
+ struct array_data *data = inode->i_private;
+ size_t size;
+
+ if (*ppos == 0) {
+ if (file->private_data) {
+ kfree(file->private_data);
+ file->private_data = NULL;
+ }
+
+ file->private_data = format_array_alloc("%u", data->array,
+ data->elements);
+ }
+
+ size = 0;
+ if (file->private_data)
+ size = strlen(file->private_data);
+
+ return simple_read_from_buffer(buf, len, ppos,
+ file->private_data, size);
+}
+
+static int u32_array_release(struct inode *inode, struct file *file)
+{
+ kfree(file->private_data);
+
+ return 0;
+}
+
+static const struct file_operations u32_array_fops = {
+ .owner = THIS_MODULE,
+ .open = u32_array_open,
+ .release = u32_array_release,
+ .read = u32_array_read,
+ .llseek = no_llseek,
+};
+
+/**
+ * debugfs_create_u32_array - create a debugfs file that is used to read u32
+ * array.
+ * @name: a pointer to a string containing the name of the file to create.
+ * @mode: the permission that the file should have.
+ * @parent: a pointer to the parent dentry for this file. This should be a
+ * directory dentry if set. If this parameter is %NULL, then the
+ * file will be created in the root of the debugfs filesystem.
+ * @array: u32 array that provides data.
+ * @elements: total number of elements in the array.
+ *
+ * This function creates a file in debugfs with the given name that exports
+ * @array as data. If the @mode variable is so set it can be read from.
+ * Writing is not supported. Seek within the file is also not supported.
+ * Once array is created its size can not be changed.
+ *
+ * The function returns a pointer to dentry on success. If debugfs is not
+ * enabled in the kernel, the value -%ENODEV will be returned.
+ */
+struct dentry *debugfs_create_u32_array(const char *name, mode_t mode,
+ struct dentry *parent,
+ u32 *array, u32 elements)
+{
+ struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL);
+
+ if (data == NULL)
+ return NULL;
+
+ data->array = array;
+ data->elements = elements;
+
+ return debugfs_create_file(name, mode, parent, data, &u32_array_fops);
+}
+EXPORT_SYMBOL_GPL(debugfs_create_u32_array);
diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h
index e7d9b20..253e2fb 100644
--- a/include/linux/debugfs.h
+++ b/include/linux/debugfs.h
@@ -74,6 +74,10 @@ struct dentry *debugfs_create_blob(const char *name, mode_t mode,
struct dentry *parent,
struct debugfs_blob_wrapper *blob);
+struct dentry *debugfs_create_u32_array(const char *name, mode_t mode,
+ struct dentry *parent,
+ u32 *array, u32 elements);
+
bool debugfs_initialized(void);
#else
@@ -193,6 +197,13 @@ static inline bool debugfs_initialized(void)
return false;
}
+static inline struct dentry *debugfs_create_u32_array(const char *name, mode_t mode,
+ struct dentry *parent,
+ u32 *array, u32 elements)
+{
+ return ERR_PTR(-ENODEV);
+}
+
#endif
#endif
^ permalink raw reply related
* [PATCH RFC V3 0/4] kvm : Paravirt-spinlock support for KVM guests
From: Raghavendra K T @ 2011-11-30 8:59 UTC (permalink / raw)
To: Greg Kroah-Hartman, Sedat Dilek, Stefano Stabellini, KVM,
Jeremy Fitzhardinge, x86, H. Peter Anvin, Dave Jiang,
Thomas Gleixner, Marcelo Tosatti, Yinghai Lu, Gleb Natapov,
Ingo Molnar, Avi Kivity, Xen, Virtualization, Rik van Riel,
Konrad Rzeszutek Wilk, LKML
Cc: Peter Zijlstra, Raghavendra K T, Srivatsa Vaddagiri, Dave Hansen,
Suzuki Poulose, Sasha Levin
The 4-patch series to follow this email extends KVM-hypervisor and Linux guest
running on KVM-hypervisor to support pv-ticket spinlocks, based on Xen's implementation.
One hypercall is introduced in the KVM hypervisor that allows a vcpu to kick
another vcpu out of halt state.
The blocking of vcpu is done using halt() in (lock_spinning) slowpath.
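The bookkeeping behind the slowpath can be modeled roughly as follows. This is a single-threaded userspace sketch with hypothetical toy_* names and a fixed CPU count; it omits the barriers, the cpumask of waiting cpus, the interrupt handling, and the hypercall itself.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_CPUS 4

/* Hypothetical stand-in for a ticket lock. */
struct toy_lock {
	uint16_t head;
	uint16_t tail;
};

struct toy_waiting {
	struct toy_lock *lock;	/* NULL when this cpu is not waiting */
	uint16_t want;		/* ticket this cpu is waiting for */
};

static struct toy_waiting toy_waiters[TOY_NR_CPUS];

/* Slowpath entry: record what this cpu waits for before it halts. */
static void toy_start_waiting(int cpu, struct toy_lock *lock, uint16_t want)
{
	struct toy_waiting *w = &toy_waiters[cpu];

	/* The patch orders these three stores with smp_wmb(). */
	w->lock = NULL;
	w->want = want;
	w->lock = lock;
}

/* Slowpath exit: this cpu no longer waits on anything. */
static void toy_stop_waiting(int cpu)
{
	toy_waiters[cpu].lock = NULL;
}

/* Unlock side: which cpu, if any, should be kicked for this ticket? */
static int toy_find_waiter(const struct toy_lock *lock, uint16_t ticket)
{
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		const struct toy_waiting *w = &toy_waiters[cpu];

		if (w->lock == lock && w->want == ticket)
			return cpu;
	}
	return -1;	/* nobody is blocked on this ticket */
}
```

In the series, the cpu found this way is the one passed to the kick hypercall, while a waiter whose ticket does not match stays halted.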
The V2 change discussion was in:
https://lkml.org/lkml/2011/10/23/207
Previous discussions : (posted by Srivatsa V).
https://lkml.org/lkml/2010/7/26/24
https://lkml.org/lkml/2011/1/19/212
The BASE patch is tip 3.2-rc1 + Jeremy's following patches.
xadd (https://lkml.org/lkml/2011/10/4/328)
x86/ticketlocklock (https://lkml.org/lkml/2011/10/12/496).
Changes in V3:
- rebased to 3.2-rc1
- use halt() instead of wait for kick hypercall.
- modify kick hypercall to wake up a halted vcpu.
- hook kvm_spinlock_init to smp_prepare_cpus call (moved the call out of head##.c).
- fix the potential race when zero_stats is read.
- export debugfs_create_u32_array and add documentation to API.
- use static inline and enum instead of ADDSTAT macro.
- add barrier() after setting the kicked flag.
- empty static inline function for kvm_spinlock_init.
- combine patches one and two to reduce overhead.
- make KVM_DEBUG_FS depend on DEBUG_FS.
- include debugfs header unconditionally.
Changes in V2:
- rebased patches to -rc9
- synchronization related changes based on Jeremy's changes (Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>) pointed out by
Stephan Diestelhorst <stephan.diestelhorst@amd.com>
- enabling 32 bit guests
- split patches into two more chunks
Srivatsa Vaddagiri, Suzuki Poulose, Raghavendra K T (4):
Add debugfs support to print u32-arrays in debugfs
Add a hypercall to KVM hypervisor to support pv-ticketlocks
Added configuration support to enable debug information for KVM Guests
pv-ticketlocks support for linux guests running on KVM hypervisor
Results:
From the results we can see that patched kernel performance is similar to
BASE when there is no lock contention. But once we start seeing more
contention, patched kernel outperforms BASE.
set up :
Kernel for host/guest : 3.2-rc1 + Jeremy's xadd, pv spinlock patches as BASE
3 guests with 8 VCPUs and 4GB RAM each: 1 used for kernbench (kernbench -f -H -M -o 20), the others for cpuhog (a shell
script running a while-true loop with an instruction)
scenario A: unpinned
1x: no hogs
2x: 8 hogs in one guest
3x: 8 hogs each in two guests
Result for Non PLE machine :
Machine : IBM xSeries with Intel(R) Xeon(R) x5570 2.93GHz CPU with 8 cores, 64GB RAM
              BASE                 BASE+patch           %improvement
              mean (sd)            mean (sd)
Scenario A:
case 1x:      157.548 (10.624)     156.408 (11.1622)    0.723589
case 2x:      1110.18 (807.019)    310.96 (105.194)     71.9901
case 3x:      3110.36 (2408.03)    303.688 (110.474)    90.2362
Result for PLE machine:
Machine : IBM xSeries with Intel(R) Xeon(R) X7560 2.27GHz CPU with 32/64 core, with 8
online cores and 4*64GB RAM
              BASE                 BASE+patch           %improvement
              mean (sd)            mean (sd)
Scenario A:
case 1x:      159.725 (47.4906)    159.07 (47.8133)     0.41008
case 2x:      190.957 (49.2976)    187.273 (50.5469)    1.92923
case 3x:      226.317 (88.6023)    223.698 (90.4362)    1.15723
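The %improvement column appears to be the relative reduction in mean kernbench time with respect to BASE. A small sketch of that arithmetic, assuming this reading of the column (pct_improvement is an illustrative name, not part of the series):

```c
/*
 * Percentage improvement of the patched mean over the BASE mean.
 * Lower kernbench time is better, so a positive result is a gain.
 */
static double pct_improvement(double base_mean, double patched_mean)
{
	return (base_mean - patched_mean) / base_mean * 100.0;
}
```

For the non-PLE 2x case, (1110.18 - 310.96) / 1110.18 * 100 comes to about 71.99, which agrees with the table.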
---
arch/x86/Kconfig | 9 ++
arch/x86/include/asm/kvm_para.h | 17 +++-
arch/x86/kernel/kvm.c | 247 +++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 28 +++++-
arch/x86/xen/debugfs.c | 104 ----------------
arch/x86/xen/debugfs.h | 4 -
arch/x86/xen/spinlock.c | 2 +-
fs/debugfs/file.c | 128 ++++++++++++++++++++
include/linux/debugfs.h | 11 ++
include/linux/kvm.h | 1 +
include/linux/kvm_host.h | 5 +
include/linux/kvm_para.h | 1 +
virt/kvm/kvm_main.c | 7 +
13 files changed, 452 insertions(+), 112 deletions(-)
^ permalink raw reply
* Re: [PATCHv3 RFC] virtio-pci: flexible configuration layout
From: Michael S. Tsirkin @ 2011-11-30 8:14 UTC (permalink / raw)
To: Rusty Russell
Cc: Krishna Kumar, kvm, Pawel Moll, Wang Sheng-Hui,
Alexey Kardashevskiy, lkml - Kernel Mailing List, virtualization,
Christian Borntraeger, Sasha Levin, Amit Shah
In-Reply-To: <87pqgats49.fsf@rustcorp.com.au>
On Wed, Nov 30, 2011 at 10:10:22AM +1030, Rusty Russell wrote:
> On Mon, 28 Nov 2011 11:15:31 +0200, Sasha Levin <levinsasha928@gmail.com> wrote:
> > On Mon, 2011-11-28 at 11:25 +1030, Rusty Russell wrote:
> > > I'd like to see kvmtools remove support for legacy mode altogether,
> > > but they probably have existing users.
> >
> > While we can't simply remove it right away, instead of mixing our
> > implementation for both legacy and new spec in the same code we can
> > split the virtio-pci implementation into two:
> >
> > - virtio/virtio-pci-legacy.c
> > - virtio/virtio-pci.c
> >
> > At that point we can #ifdef the entire virtio-pci-legacy.c for now and
> > remove it at the same time legacy virtio-pci is removed from the kernel.
>
> Hmm, that might be neat, but we can't tell the driver core to try
> virtio-pci before virtio-pci-legacy, so we need detection code in both
> modules (and add a "force" flag to virtio-pci-legacy to tell it to
> accept the device even if it's not a legacy-only one).
This flag might need to be per device ideally, which is tricky ...
>
> Then it should work...
> Cheers,
> Rusty.
One also wonders whether and how this will work on other OS-es.
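The detection scheme Rusty describes in the quoted text could look roughly like the sketch below. This is only an illustration under stated assumptions: struct vdev_info, has_new_layout and both function names are hypothetical, not taken from any existing driver, and the force flag is modelled as a single global parameter rather than a per-device one.

```c
#include <stdbool.h>

/* Hypothetical summary of what a probe routine learned about the device. */
struct vdev_info {
    bool has_new_layout;   /* device advertises the new configuration layout */
};

/* New-style driver: bind only when the new layout is present. */
static bool modern_probe_accepts(const struct vdev_info *dev)
{
    return dev->has_new_layout;
}

/* Legacy driver: bind to legacy-only devices, or to any device when the
 * (hypothetical) force flag is set. */
static bool legacy_probe_accepts(const struct vdev_info *dev, bool force)
{
    return force || !dev->has_new_layout;
}
```

Without the force flag, exactly one of the two probes accepts a given device, so the order in which the driver core tries them does not matter.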
--
MST
^ permalink raw reply
* Re: [PATCHv3 RFC] virtio-pci: flexible configuration layout
From: Michael S. Tsirkin @ 2011-11-30 7:18 UTC (permalink / raw)
To: Rusty Russell
Cc: Krishna Kumar, kvm, Pawel Moll, Wang Sheng-Hui,
Alexey Kardashevskiy, lkml - Kernel Mailing List, virtualization,
Christian Borntraeger, Sasha Levin, Amit Shah
In-Reply-To: <87sjl6tsnm.fsf@rustcorp.com.au>
On Wed, Nov 30, 2011 at 09:58:45AM +1030, Rusty Russell wrote:
> > I think I see a way to do that in a relatively painless way.
> > Do you prefer seeing driver patches or spec? Or are you not interested
> > in reusing the same structure at all?
>
> I think we should look at code at this point; my gut says we're going to
> be not-quite-similar-enough-to-be-useful. At which point, a clean-slate
> approach is more appealing. But the code will show, one way or another.
>
> Thanks,
> Rusty.
Makes sense, absolutely. So I'll hack on it and post and we can
judge the result.
One small comment that I'm afraid was lost in the noise
is that we should not add any 64 bit fields in the common area,
because there's no generic iowrite64/ioread64.
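To illustrate the point, a 64-bit value can be exposed as two 32-bit fields and accessed with two 32-bit operations. The sketch below is a userspace stand-in, not driver code: fake_regs and the reg_* helpers are hypothetical, and in a real driver the 32-bit accesses would be iowrite32()/ioread32() on a mapped region.

```c
#include <stdint.h>

/* Hypothetical stand-in for two adjacent 32-bit device registers. */
static volatile uint32_t fake_regs[2];

static void reg_write32(unsigned int idx, uint32_t val)
{
    fake_regs[idx] = val;
}

static uint32_t reg_read32(unsigned int idx)
{
    return fake_regs[idx];
}

/* Write a 64-bit value as a low half followed by a high half. */
static void reg_write64_split(unsigned int lo_idx, uint64_t val)
{
    reg_write32(lo_idx, (uint32_t)(val & 0xffffffffu));
    reg_write32(lo_idx + 1, (uint32_t)(val >> 32));
}

/* Read the two halves back and combine them. */
static uint64_t reg_read64_split(unsigned int lo_idx)
{
    uint64_t lo = reg_read32(lo_idx);
    uint64_t hi = reg_read32(lo_idx + 1);
    return (hi << 32) | lo;
}
```

Note that the two halves are not updated atomically, so a layout built this way has to tolerate the other side observing one half before the other.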
--
MST
^ permalink raw reply
* Re: [ANNOUNCE] Xen port to Cortex-A15 / ARMv7 with virt extensions
From: Anup Patel @ 2011-11-30 4:42 UTC (permalink / raw)
To: Arnd Bergmann
Cc: xen-devel, linaro-dev, kvm, Stefano Stabellini, linux-kernel,
virtualization, android-virt, embeddedxen-devel, linux-arm-kernel
In-Reply-To: <201111292129.20444.arnd@arndb.de>
Hi all,
I wanted to know how Xen-ARM for A15 will address the following concerns:
- How will Xen-ARM for A15 support legacy guest environments like ARMv5 or
ARMv6?
- What if my Cortex-A15 board does not have a GIC with virtualization
support?
Best Regards,
Anup Patel
On Wed, Nov 30, 2011 at 2:59 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tuesday 29 November 2011, Stefano Stabellini wrote:
> > Hi all,
> > a few weeks ago I (and a few others) started hacking on a
> > proof-of-concept hypervisor port to Cortex-A15 which uses and requires
> > ARMv7 virtualization extensions. The intention of this work was to find
> > out how to best support ARM v7+ on Xen. See
> >
> http://old-list-archives.xen.org/archives/html/xen-arm/2011-09/msg00013.html
> > for more details.
> >
> > I am pleased to announce that significant progress has been made, and
> > that we now have a nascent Xen port for Cortex-A15. The port is based on
> > xen-unstable (HG CS 8d6edc3d26d2) and written from scratch exploiting
> > the latest virtualization, LPAE, GIC and generic timer support in
> > hardware.
>
> Very nice!
>
> Do you have a pointer to the kernel sources for the Linux guest?
> Since Xen and KVM are both in an early working state right now,
> it would be very nice if we could agree on the guest model to make
> sure that it's always possible to run the same kernel in both
> (and potentially other future) hypervisors without modifications.
>
> Arnd
^ permalink raw reply