From: Paolo Bonzini <pbonzini@redhat.com>
To: Hannes Reinecke <hare@suse.de>
Cc: Lin Ma <lma@suse.com>, Stefan Hajnoczi <stefanha@gmail.com>,
	Zhiqiang Zhou <ZZhou@suse.com>, Fam Zheng <famz@redhat.com>,
	qemu-devel@nongnu.org, stefanha@redhat.com, mst@redhat.com
Subject: Re: [Qemu-devel] Reply: Re: [RFC] virtio-fc: draft idea of virtual fibre channel HBA
Date: Tue, 16 May 2017 04:19:27 -0400 (EDT)
Message-ID: <1282321742.7961608.1494922767010.JavaMail.zimbra@redhat.com>
In-Reply-To: <5c410c0e-f0b0-a049-84ed-7e31eb4e1dab@suse.de>


> Maybe a union with an overall size of 256 bytes (to hold the iSCSI IQN
> string), which for FC carries the WWPN and the WWNN?

That depends on how you would like to do controller passthrough in
general.  iSCSI doesn't have the 64-bit target ID, and doesn't have
(AFAIK) hot-plug/hot-unplug support, so it's less important than FC.
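
Something along those lines, say (just a sketch; every name below is
made up for illustration, none of it is in the virtio-scsi spec):

#include <stdint.h>

/* FC identity: a WWPN/WWNN pair. */
struct virtio_scsi_fc_port_id {
    uint64_t wwpn;                     /* world-wide port name */
    uint64_t wwnn;                     /* world-wide node name */
};

/* 256 bytes overall, enough to hold a full iSCSI IQN string. */
union virtio_scsi_port_id {
    uint8_t  iscsi_iqn[256];           /* NUL-terminated IQN string */
    struct virtio_scsi_fc_port_id fc;
    uint8_t  raw[256];                 /* pad to a fixed 256 bytes */
};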

> > 2) If the initiator ID is the moral equivalent of a MAC address,
> > shouldn't it be the host that provides the initiator ID to the guest in
> > the virtio-scsi config space?  (From your proposal, I'd guess it's the
> > latter, but maybe I am not reading correctly).
> 
> That would be dependent on the emulation. For an emulated SCSI disk I guess
> we need to specify it on the command line somewhere, but for SCSI
> passthrough we could grab it from the underlying device.

Wait, that would be the target ID.  The initiator ID would be the NPIV
vport's WWNN/WWPN.  It could be specified on the QEMU command line, or
it could be tied to some file descriptor (created and initialized by
libvirt, which has CAP_SYS_ADMIN, and then passed to QEMU; similar to
tap file descriptors).
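
The handoff itself would be the usual SCM_RIGHTS dance, the same one
used for tap fds; here is a sketch (note that a kernel interface that
actually produces such a vport fd is hypothetical today):

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send an already-opened fd (e.g. the vport fd created by libvirt,
 * which has CAP_SYS_ADMIN) over a Unix domain socket to QEMU. */
static int send_fd(int sock, int fd)
{
    char dummy = 'F';
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = u.buf,
        .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}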

> >> b) stop exposing the devices attached to that NPIV host to the guest
> > 
> > What do you mean exactly?
> > 
> That's one of the longer term plans I have.
> When doing NPIV, currently all devices from the NPIV host appear on the
> host, including all partitions, LVM devices, and what not. [...]
> If we make the (guest) initiator ID identical to the NPIV WWPN we can
> tag the _host_ to not expose any partitions on any LUNs, making the
> above quite easy.

Yes, definitely.

> > At this point, I can think of several ways to do this, one being SG_IO
> > in QEMU while the others are more esoteric.
> > 
> > 1) use virtio-scsi with userspace passthrough (current solution).
> 
> With option (1) and the target/initiator ID extensions we should be able
> to get basic NPIV support to work, and would even be able to handle
> reservations in a sane manner.

Agreed, but I'm no longer so sure that the advantages outweigh the
disadvantages.  Also, let's add the lack of FC-NVMe support to the
disadvantages.

> > 2) the exact opposite: use the recently added "mediated device
> > passthrough" (mdev) framework to present a "fake" PCI device to the
> > guest.
> 
> (2) sounds interesting, but I'd have to have a look into the code to
> figure out if it could easily be done.

Not that easy, but it's the bread and butter of the hardware manufacturers.
If we want them to do the work on their own, (2) is the way to go.  Both
NVIDIA and Intel are already using it.

> > 3) handle passthrough with a kernel driver.  Under this model, the guest
> > uses the virtio device, but the passthrough of commands and TMFs is
> > performed by the host driver.
> > 
> > We can then choose whether to do it with virtio-scsi or with a new
> > virtio-fc.
>
> (3) would be feasible, as it would effectively mean 'just' updating the
> current NPIV mechanism. However, this would essentially lock us in to
> FC; any other types (think NVMe) would require yet another solution.

An FC-NVMe driver could also expose the same vhost interface, couldn't it?
FC-NVMe wouldn't have to share the Linux code, but sharing the virtio
standard and the userspace ABI would be great.

In fact, the main advantage of virtio-fc would be that (if we define it properly)
it could be reused for FC-NVMe instead of having to extend e.g. virtio-blk.
For example, virtio-scsi's format is request, to-device payload, response,
from-device payload.  virtio-fc's request format could be the initiator and
target port identifiers, followed by FCP_CMND, to-device payload, FCP_RSP,
from-device payload.
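
Purely as a strawman, the framing could look like this (all field
names and sizes invented for illustration, nothing here is specified):

#include <stdint.h>

/* Request header: port identifiers plus the FCP_CMND IU.  The
 * to-device payload follows in the descriptor chain. */
struct virtio_fc_req {
    uint64_t initiator_id;             /* e.g. the vport's WWPN */
    uint64_t target_id;                /* the remote port's WWPN */
    uint8_t  fcp_cmnd[64];             /* FCP_CMND IU */
};

/* Response: the FCP_RSP IU, followed by the from-device payload. */
struct virtio_fc_resp {
    uint8_t  fcp_rsp[96];              /* FCP_RSP IU */
};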

> > 4) same as (3), but in userspace with a "macvtap" like layer (e.g.,
> > socket+bind creates an NPIV vport).  This layer can work on some kind of
> > FCP encapsulation, not the raw thing, and virtio-fc could be designed
> > according to a similar format for simplicity.
>
> (4) would require raw FCP frame access, which is one thing we do _not_
> have. Each card (except for the pure FCoE ones like bnx2fc, fnic, and
> fcoe) only allows access to pre-formatted I/O commands, and has its own
> mechanism for generating sequence IDs etc. So anything requiring raw FCP
> access is basically out of the game.

Not raw.  It could even be defined at the exchange level (plus some special
things for discovery and login services).  But I agree that (4) is a bit
pie-in-the-sky.

> Overall, I would vote to specify a new virtio scsi format _first_,
> keeping in mind all of these options.
> (1), (3), and (4) all require an update anyway :-)
> 
> The big advantage I see with (1) is that it can be added with just some
> code changes to qemu and virtio-scsi. Every other option requires some
> vendor buy-in, which inevitably leads to more discussions, delays, and
> more complex interaction (changes to qemu, virtio, _and_ the affected HBAs).

I agree.  But if we have to reinvent everything in a couple of years for
NVMe over fabrics, maybe it's not worth it.

> While we're at it: We also need to add a 'timeout' field to the virtio
> request structure. I even posted an RFC for it :-)

Yup, I've seen it. :)
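
For reference, the idea is roughly the following (a sketch from memory;
field placement and units are my guesses, the RFC is the authoritative
version):

#include <stdint.h>

struct virtio_scsi_cmd_req_timeout {
    uint8_t  lun[8];                   /* existing virtio-scsi fields */
    uint64_t id;
    uint8_t  task_attr;
    uint8_t  prio;
    uint8_t  crn;
    uint32_t timeout;                  /* new: per-command timeout in
                                          seconds, 0 means no timeout */
    uint8_t  cdb[32];                  /* CDB_SIZE is 32 in the spec */
};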

Paolo
