From: Don Dutile <ddutile@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: "Benoît Canet" <benoit.canet@irqsave.net>,
	iommu@lists.linux-foundation.org, qemu-devel@nongnu.org
Subject: Re: VFIO and scheduled SR-IOV cards
Date: Mon, 03 Jun 2013 14:34:29 -0400
Message-ID: <51ACE1B5.2050102@redhat.com>
In-Reply-To: <1370282529.30975.344.camel@ul30vt.home>

On 06/03/2013 02:02 PM, Alex Williamson wrote:
> On Mon, 2013-06-03 at 18:33 +0200, Benoît Canet wrote:
>> Hello,
>>
>> I plan to write a PF driver for an SR-IOV card and make the VFs work with QEMU's
>> VFIO passthrough so I am asking the following design question before trying to
>> write and push code.
>>
>> After SR-IOV is enabled on this hardware, only one VF can be active
>> at a given time.
>
> Is this actually an SR-IOV device or are you trying to write a driver
> that emulates SR-IOV for a PF?
>
>> The PF host kernel driver is acting as a scheduler.
>> Every few milliseconds it switches which VF is the active function, while
>> disabling the other VFs.
>>
that's time-sharing of hw, which sw doesn't see ... so, ok.

>> One consequence of how the hardware works is that the MMR regions of the
>> switched off VFs must be unmapped and their io access should block until the VF
>> is switched on again.
>
This violates the spec and does impact sw: how can one assign such a VF to a guest
when it does not work independently of the other VFs?

> MMR = Memory Mapped Register?
>
> This seems contradictory to the SR-IOV spec, which states:
>
>          Each VF contains a non-shared set of physical resources required
>          to deliver Function-specific
>          services, e.g., resources such as work queues, data buffers,
>          etc. These resources can be directly
>          accessed by an SI without requiring VI or SR-PCIM intervention.
>
> Furthermore, each VF should have a separate requester ID.  What's being
> suggested here seems like maybe that's not the case.  If true, it would
I didn't read it that way above.  I read it as the PCIe end being time-shared
between VFs (& PFs?), with some VFs disappearing (from a driver perspective)
as if the device had been hot-unplugged without notification.  That will probably
cause read timeouts & SME's, bringing down most enterprise-level systems.

> make iommu groups challenging.  Is there any VF save/restore around the
> scheduling?
>
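For reference on the requester ID point above: a requester ID is the 16-bit
bus/device/function number a function uses on the bus, and under SR-IOV each
VF's ID is derived from the PF's ID using the First VF Offset and VF Stride
fields of the PF's SR-IOV capability.  A minimal sketch of that arithmetic
follows.  The function names are hypothetical, and the PF address 03:00.0,
offset 0x80 and stride 2 are made-up example values, not taken from the
hardware discussed in this thread.

```python
def requester_id(bus, dev, fn):
    """Pack bus (8 bits), device (5 bits), function (3 bits) into a 16-bit ID."""
    return ((bus & 0xFF) << 8) | ((dev & 0x1F) << 3) | (fn & 0x7)


def vf_requester_id(pf_rid, first_vf_offset, vf_stride, vf_index):
    """ID of the vf_index-th VF (1-based), derived from the PF's ID."""
    return (pf_rid + first_vf_offset + (vf_index - 1) * vf_stride) & 0xFFFF


def format_bdf(rid):
    """Render a 16-bit ID in the usual bus:dev.fn notation."""
    return f"{rid >> 8:02x}:{(rid >> 3) & 0x1F:02x}.{rid & 0x7}"


# Example values only (assumptions): PF at 03:00.0, offset 0x80, stride 2.
pf = requester_id(3, 0, 0)
for n in (1, 2):
    print(n, format_bdf(vf_requester_id(pf, 0x80, 2, n)))
# prints "1 03:10.0" then "2 03:10.2"
```

With distinct IDs like these the IOMMU can tell the VFs apart, which is why
no unmapping would be needed just to isolate one VF from another.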
>> Each IOMMU map/unmap should be done in less than 100ns.
>
> I think that may be a lot to ask if we need to unmap the regions in the
> guest and in the iommu.  If the "VFs" used different requester IDs,
> iommu unmapping wouldn't be necessary.  I experimented with switching
> between trapped (read/write) access to memory regions and mmap'd (direct
> mapping) for handling legacy interrupts.  There was a noticeable
> performance penalty switching per interrupt.
>
>> As the kernel iommu module is being called by the VFIO driver the PF driver
>> cannot interface with it.
>>
>> Currently the only interface of the VFIO code is for the userland QEMU process
>> and I fear that notifying QEMU that it should do the unmap/block would take more
>> than 100ns.
>>
>> Also blocking the IO access in QEMU under the BQL would freeze QEMU.
>>
>> Do you have an idea of how to write this required map and block/unmap feature?
>
> It seems like there are several options, but I'm doubtful that any of
> them will meet 100ns.  If this is completely fake SR-IOV and there's not
> a different requester ID per VF, I'd start with seeing if you can even
> do the iommu_unmap/iommu_map of the MMIO BARs in under 100ns.  If that's
> close to your limit, then your only real option for QEMU is to freeze
> it, which still involves getting multiple (maybe many) vCPUs out of VM
> mode.  That's not free either.  If by some miracle you have time to
> spare, you could remap the regions to trapped mode and let the vCPUs run
> while vfio blocks on read/write.
>
> Maybe there's even a question whether mmap'd mode is worthwhile for this
> device.  Trapping every read/write is orders of magnitude slower, but
> allows you to handle the "wait for VF" on the kernel side.
>
> If you can provide more info on the device design/constraints, maybe we
> can come up with better options.  Thanks,
>
> Alex
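
To make the trapped-mode option above concrete, here is a toy userspace model
of it, not kernel or VFIO code.  Every register access goes through a gate
that blocks until the scheduler has switched the corresponding VF in, which is
roughly what "vfio blocks on read/write" would mean.  The class and method
names are hypothetical, and a Python condition variable stands in for whatever
wait mechanism a real driver would use.

```python
import threading


class VfGate:
    """Toy model: trapped register accesses block while their VF is inactive."""

    def __init__(self, num_vfs):
        self._cond = threading.Condition()
        self._active = None                     # VF currently scheduled in
        self._regs = [dict() for _ in range(num_vfs)]

    def schedule(self, vf):
        """Scheduler side: make `vf` the only active function."""
        with self._cond:
            self._active = vf
            self._cond.notify_all()             # wake accesses waiting on it

    def _wait_active(self, vf, timeout):
        # Caller holds self._cond; wait_for releases it while sleeping.
        if not self._cond.wait_for(lambda: self._active == vf, timeout):
            raise TimeoutError(f"VF {vf} was not scheduled in time")

    def write(self, vf, offset, value, timeout=None):
        with self._cond:
            self._wait_active(vf, timeout)
            self._regs[vf][offset] = value

    def read(self, vf, offset, timeout=None):
        with self._cond:
            self._wait_active(vf, timeout)
            return self._regs[vf].get(offset, 0)


if __name__ == "__main__":
    gate = VfGate(2)
    gate.schedule(0)
    gate.write(0, 0x10, 0xABCD)                 # VF 0 is active: no wait
    result = []
    reader = threading.Thread(target=lambda: result.append(gate.read(1, 0x10)))
    reader.start()                              # waits while VF 1 is switched off
    gate.schedule(1)                            # switching VF 1 in releases it
    reader.join()
    print(result)
```

The point of the model is only that the wait lives behind the access path, so
the accessing thread stalls while everything else keeps running.  It says
nothing about whether a real implementation could meet the 100ns target.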