* Re: [virtio-dev] Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Siwei Liu @ 2018-06-05 21:16 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Alexander Duyck, virtio-dev, qemu-devel, Samudrala, Sridhar,
virtualization
In-Reply-To: <20180605152344-mutt-send-email-mst@kernel.org>
Good to see this discussion going. I share the same feeling that the
decision of plugging the primary (passthrough) should only be made
until guest driver acknowledges DRIVER_OK and _F_STANDBY.
Architecturally this intelligence should be baken to QEMU itself
rather than moving up to management stack, such as libvirt.
A few questions in line below.
On Tue, Jun 5, 2018 at 5:33 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> I don't think this is sufficient.
>
> If both primary and standby devices are present, a legacy guest without
> support for the feature might see two devices with same mac and get
> confused.
>
> I think that we should only make primary visible after guest acked the
> backup feature bit.
>
> And on reset or when backup is cleared in some other way, unplug the
> primary.
>
> Something like the below will do the job:
>
> Primary device is added with a special "primary-failover" flag.
I wonder if you envision this flag as a user interface or some
internal attribute/property to QEMU device?
Since QEMU needs to associate this particular primary-failover device
with a virtio standby instance for checking DRIVER_OK as you indicate
below, I wonder if the group ID thing will be helpful to set
"primary-failover" flag internally without having to add another
option in CLI?
> A virtual machine is then initialized with just a standby virtio
> device. Primary is not yet added.
>
> Later QEMU detects that guest driver device set DRIVER_OK.
> It then exposes the primary device to the guest, and triggers
> a device addition event (hot-plug event) for it.
Sounds OK. While there might be a small window the guest already
starts to use virtio interface before the passthrough gets plugged in,
I think that should be fine.
Another option is to wait until the primary appears and gets
registered in the guest, so it can bring up the upper failover
interface.
>
> If QEMU detects guest driver removal, it initiates a hot-unplug sequence
> to remove the primary driver. In particular, if QEMU detects guest
> re-initialization (e.g. by detecting guest reset) it immediately removes
> the primary device.
Agreed.
For legacy guest, user might prefer seeing a single passthrough device
rather than a virtio device. I would hope there's an option to allow
it to happen, instead of removing all passthrough devices.
Regards,
-Siwei
>
> We can move some of this code to management as well, architecturally it
> does not make too much sense but it might be easier implementation-wise.
>
> HTH
>
> On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote:
>> Ping on this patch now that the kernel patches are accepted into davem's net-next tree.
>> https://patchwork.ozlabs.org/cover/920005/
>>
>>
>> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote:
>> > This feature bit can be used by hypervisor to indicate virtio_net device to
>> > act as a standby for another device with the same MAC address.
>> >
>> > I tested this with a small change to the patch to mark the STANDBY feature 'true'
>> > by default as i am using libvirt to start the VMs.
>> > Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
>> > XML file?
>> >
>> > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> > ---
>> > hw/net/virtio-net.c | 2 ++
>> > include/standard-headers/linux/virtio_net.h | 3 +++
>> > 2 files changed, 5 insertions(+)
>> >
>> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> > index 90502fca7c..38b3140670 100644
>> > --- a/hw/net/virtio-net.c
>> > +++ b/hw/net/virtio-net.c
>> > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
>> > true),
>> > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
>> > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
>> > + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
>> > + false),
>> > DEFINE_PROP_END_OF_LIST(),
>> > };
>> > diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
>> > index e9f255ea3f..01ec09684c 100644
>> > --- a/include/standard-headers/linux/virtio_net.h
>> > +++ b/include/standard-headers/linux/virtio_net.h
>> > @@ -57,6 +57,9 @@
>> > * Steering */
>> > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
>> > +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device
>> > + * with the same MAC.
>> > + */
>> > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */
>> > #ifndef VIRTIO_NET_NO_LEGACY
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>
^ permalink raw reply
* Re: [PATCH v2 2/2] x86: paravirt: make native_save_fl extern inline
From: Arnd Bergmann @ 2018-06-05 21:28 UTC (permalink / raw)
To: Nick Desaulniers
Cc: linux-efi, Brijesh Singh, Boris Ostrovsky, Jan Kiszka,
Will Deacon, Jarkko Sakkinen, virtualization, Masahiro Yamada,
manojgupta, H. Peter Anvin, akataria, tweek, mawilcox,
the arch/x86 maintainers, Greg Hackmann, Matthias Kaehlcke,
Geert Uytterhoeven, David Rientjes, Andrey Ryabinin,
Lendacky, Thomas, Linux Kbuild mailing list, Kees Cook,
Steven Rostedt, Cao jin <caoj.f>
In-Reply-To: <20180605170532.170361-3-ndesaulniers@google.com>
On Tue, Jun 5, 2018 at 7:05 PM, Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> The semantics of extern inline has changed since gnu89. This means that
> folks using GCC versions >= 5.1 may see symbol redefinition errors at
> link time for subdirs that override KBUILD_CFLAGS (making the C standard
> used implicit) regardless of this patch. This has been cleaned up
> earlier in the patch set, but is left as a note in the commit message
> for future travelers.
I think the keyword you are missing is
__attribute__((gnu_inline))
which forces the gnu89 behavior on all compiler versions. It's been supported
since gcc-4.2, so it should not cause problems on any compiler that is able
to build an x86 kernel.
Arnd
^ permalink raw reply
* Re: [PATCH v2 2/2] x86: paravirt: make native_save_fl extern inline
From: Arnd Bergmann @ 2018-06-05 21:31 UTC (permalink / raw)
To: Nick Desaulniers
Cc: linux-efi, Brijesh Singh, Boris Ostrovsky, Jan Kiszka,
Will Deacon, Jarkko Sakkinen, virtualization, Masahiro Yamada,
manojgupta, H. Peter Anvin, akataria, tweek, mawilcox,
the arch/x86 maintainers, Greg Hackmann, Matthias Kaehlcke,
Geert Uytterhoeven, David Rientjes, Andrey Ryabinin,
Lendacky, Thomas, Linux Kbuild mailing list, Kees Cook,
Steven Rostedt, Cao jin <caoj.f>
In-Reply-To: <CAK8P3a0v6sr_oZmYTZV7E-uJ8JdG6D8JqUesHDPkPY2UrWKdvw@mail.gmail.com>
On Tue, Jun 5, 2018 at 11:28 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tue, Jun 5, 2018 at 7:05 PM, Nick Desaulniers
> <ndesaulniers@google.com> wrote:
>>
>> The semantics of extern inline has changed since gnu89. This means that
>> folks using GCC versions >= 5.1 may see symbol redefinition errors at
>> link time for subdirs that override KBUILD_CFLAGS (making the C standard
>> used implicit) regardless of this patch. This has been cleaned up
>> earlier in the patch set, but is left as a note in the commit message
>> for future travelers.
>
> I think the keyword you are missing is
>
> __attribute__((gnu_inline))
>
> which forces the gnu89 behavior on all compiler versions. It's been supported
> since gcc-4.2, so it should not cause problems on any compiler that is able
> to build an x86 kernel.
Nevermind, I just saw you already posted that.
Arnd
^ permalink raw reply
* Re: [virtio-dev] Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-05 21:32 UTC (permalink / raw)
To: Siwei Liu
Cc: Alexander Duyck, virtio-dev, qemu-devel, Samudrala, Sridhar,
virtualization
In-Reply-To: <CADGSJ21Qf6KpLzr+og5DOFjMPqu69n6dWRLvvj1FGAR5kVRNew@mail.gmail.com>
On Tue, Jun 05, 2018 at 02:16:44PM -0700, Siwei Liu wrote:
> Good to see this discussion going. I share the same feeling that the
> decision of plugging the primary (passthrough) should only be made
> until guest driver acknowledges DRIVER_OK and _F_STANDBY.
> Architecturally this intelligence should be baken to QEMU itself
> rather than moving up to management stack, such as libvirt.
>
> A few questions in line below.
>
> On Tue, Jun 5, 2018 at 5:33 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > I don't think this is sufficient.
> >
> > If both primary and standby devices are present, a legacy guest without
> > support for the feature might see two devices with same mac and get
> > confused.
> >
> > I think that we should only make primary visible after guest acked the
> > backup feature bit.
> >
> > And on reset or when backup is cleared in some other way, unplug the
> > primary.
> >
> > Something like the below will do the job:
> >
> > Primary device is added with a special "primary-failover" flag.
>
> I wonder if you envision this flag as a user interface or some
> internal attribute/property to QEMU device?
>
> Since QEMU needs to associate this particular primary-failover device
> with a virtio standby instance for checking DRIVER_OK as you indicate
> below, I wonder if the group ID thing will be helpful to set
> "primary-failover" flag internally without having to add another
> option in CLI?
I haven't thought about it but it's an option.
> > A virtual machine is then initialized with just a standby virtio
> > device. Primary is not yet added.
> >
> > Later QEMU detects that guest driver device set DRIVER_OK.
> > It then exposes the primary device to the guest, and triggers
> > a device addition event (hot-plug event) for it.
>
> Sounds OK. While there might be a small window the guest already
> starts to use virtio interface before the passthrough gets plugged in,
> I think that should be fine.
>
> Another option is to wait until the primary appears and gets
> registered in the guest, so it can bring up the upper failover
> interface.
We can't be sure this will ever happen, can we? We might be asked to
migrate at any time.
> >
> > If QEMU detects guest driver removal, it initiates a hot-unplug sequence
> > to remove the primary driver. In particular, if QEMU detects guest
> > re-initialization (e.g. by detecting guest reset) it immediately removes
> > the primary device.
>
> Agreed.
>
> For legacy guest, user might prefer seeing a single passthrough device
> rather than a virtio device. I would hope there's an option to allow
> it to happen, instead of removing all passthrough devices.
>
> Regards,
> -Siwei
I don't see a way to make it work since then you can't migrate, can you?
The way I see it, if you don't need migration, just use PT without
failover. If you do, then you either use virtio or failover if guest
supports it.
> >
> > We can move some of this code to management as well, architecturally it
> > does not make too much sense but it might be easier implementation-wise.
> >
> > HTH
> >
> > On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote:
> >> Ping on this patch now that the kernel patches are accepted into davem's net-next tree.
> >> https://patchwork.ozlabs.org/cover/920005/
> >>
> >>
> >> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote:
> >> > This feature bit can be used by hypervisor to indicate virtio_net device to
> >> > act as a standby for another device with the same MAC address.
> >> >
> >> > I tested this with a small change to the patch to mark the STANDBY feature 'true'
> >> > by default as i am using libvirt to start the VMs.
> >> > Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
> >> > XML file?
> >> >
> >> > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >> > ---
> >> > hw/net/virtio-net.c | 2 ++
> >> > include/standard-headers/linux/virtio_net.h | 3 +++
> >> > 2 files changed, 5 insertions(+)
> >> >
> >> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >> > index 90502fca7c..38b3140670 100644
> >> > --- a/hw/net/virtio-net.c
> >> > +++ b/hw/net/virtio-net.c
> >> > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
> >> > true),
> >> > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
> >> > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> >> > + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
> >> > + false),
> >> > DEFINE_PROP_END_OF_LIST(),
> >> > };
> >> > diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
> >> > index e9f255ea3f..01ec09684c 100644
> >> > --- a/include/standard-headers/linux/virtio_net.h
> >> > +++ b/include/standard-headers/linux/virtio_net.h
> >> > @@ -57,6 +57,9 @@
> >> > * Steering */
> >> > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
> >> > +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device
> >> > + * with the same MAC.
> >> > + */
> >> > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */
> >> > #ifndef VIRTIO_NET_NO_LEGACY
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >
^ permalink raw reply
* Re: [PATCH v2 1/2] compiler-gcc.h: add gnu_inline to all inline declarations
From: Joe Perches @ 2018-06-05 21:59 UTC (permalink / raw)
To: Nick Desaulniers
Cc: linux-efi, brijesh.singh, boris.ostrovsky, J. Kiszka, Will Deacon,
jarkko.sakkinen, virtualization, Masahiro Yamada, Manoj Gupta,
hpa, akataria, Thiebaud Weksteen, mawilcox, x86, Greg Hackmann,
Matthias Kaehlcke, geert, David Rientjes, Andrey Ryabinin,
thomas.lendacky, Arnd Bergmann, Linux Kbuild mailing list,
Kees Cook, rostedt, Cao jin, Michal Marek, Josh Poimboeuf
In-Reply-To: <CAKwvOd=zuy0BJC3f3EKT_9Mg83qdZxAxw_RJ4okAMKrpKjJRHQ@mail.gmail.com>
On Tue, 2018-06-05 at 12:50 -0700, Nick Desaulniers wrote:
> On Tue, Jun 5, 2018 at 12:14 PM Joe Perches <joe@perches.com> wrote:
> >
> > On Tue, 2018-06-05 at 10:23 -0700, Joe Perches wrote:
> > > Perhaps these are simpler as
> > >
> > > #define __inline__ inline
> > > #define __inline inline
> >
> > Currently, there are these uses of inline variants in the kernel
> >
> > $ git grep -w inline | wc -l
> > 68410
> > $ git grep -w __inline__ | wc -l
> > 503
> > $ git grep -w __inline | wc -l
> > 57
> >
> > So it seems it's also reasonable to sed all uses of __inline to inline
> > and perhaps remove __inline eventually altogether.
> > (perhaps there are still too many __inline__ uses)
>
> Yeah, that sounds good. Should I split that into 3 patches:
>
> > Excluding scripts and a few other files,
> > here's a possible patch done with:
> >
> > $ git grep -w --name-only __inline | \
> > grep -vP '^(?:arch/alpha/|include/|scripts/)' | \
> > xargs sed -r -i -e 's/\b__inline\b/inline/g' \
> > -e 's/\binline\s+static\b/static inline/g'
> > ---
> > Documentation/trace/tracepoint-analysis.rst | 2 +-
> > drivers/staging/rtl8723bs/core/rtw_pwrctrl.c | 4 ++--
> > drivers/staging/rtl8723bs/core/rtw_wlan_util.c | 2 +-
> > drivers/staging/rtl8723bs/include/drv_types.h | 6 +++---
> > drivers/staging/rtl8723bs/include/ieee80211.h | 6 +++---
> > drivers/staging/rtl8723bs/include/osdep_service.h | 10 +++++-----
> > drivers/staging/rtl8723bs/include/osdep_service_linux.h | 14 +++++++-------
> > drivers/staging/rtl8723bs/include/rtw_mlme.h | 14 +++++++-------
> > drivers/staging/rtl8723bs/include/rtw_recv.h | 16 ++++++++--------
> > drivers/staging/rtl8723bs/include/sta_info.h | 2 +-
> > drivers/staging/rtl8723bs/include/wifi.h | 14 +++++++-------
> > drivers/staging/rtl8723bs/include/wlan_bssdef.h | 2 +-
> > lib/zstd/mem.h | 2 +-
> > 13 files changed, 47 insertions(+), 47 deletions(-)
>
>
> 1 for documentation, 1 for rtl8723bs, 1 for zstd?
Seems sensible to me.
> Follow up set or include in v3?
Your choice. Probably a follow up would work best.
Also, the remaining __inline uses would be:
arch/alpha/include/asm/compiler.h:#undef __inline
include/linux/compiler-gcc.h:#define __inline __inline __attribute__((always_inline,unused)) notrace
include/linux/compiler-gcc.h:#define __inline __inline __attribute__((unused)) notrace
scripts/checkpatch.pl:our $Inline = qr{inline|__always_inline|noinline|__inline|__inline__};
scripts/checkpatch.pl:# Check for __inline__ and __inline, prefer inline
scripts/checkpatch.pl: $line =~ /\b(__inline__|__inline)\b/) {
scripts/checkpatch.pl: $fixed[$fixlinenr] =~ s/\b(__inline__|__inline)\b/inline/;
scripts/genksyms/keywords.c: { "__inline", INLINE_KEYW },
scripts/kernel-doc: $prototype =~ s/^__inline +//;
So all of these could be removed/updated appropriately too
^ permalink raw reply
* Re: [virtio-dev] Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Siwei Liu @ 2018-06-05 22:09 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Alexander Duyck, virtio-dev, qemu-devel, Samudrala, Sridhar,
virtualization
In-Reply-To: <20180606002650-mutt-send-email-mst@kernel.org>
On Tue, Jun 5, 2018 at 2:32 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Jun 05, 2018 at 02:16:44PM -0700, Siwei Liu wrote:
>> Good to see this discussion going. I share the same feeling that the
>> decision of plugging the primary (passthrough) should only be made
>> until guest driver acknowledges DRIVER_OK and _F_STANDBY.
>> Architecturally this intelligence should be baken to QEMU itself
>> rather than moving up to management stack, such as libvirt.
>>
>> A few questions in line below.
>>
>> On Tue, Jun 5, 2018 at 5:33 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
>> > I don't think this is sufficient.
>> >
>> > If both primary and standby devices are present, a legacy guest without
>> > support for the feature might see two devices with same mac and get
>> > confused.
>> >
>> > I think that we should only make primary visible after guest acked the
>> > backup feature bit.
>> >
>> > And on reset or when backup is cleared in some other way, unplug the
>> > primary.
>> >
>> > Something like the below will do the job:
>> >
>> > Primary device is added with a special "primary-failover" flag.
>>
>> I wonder if you envision this flag as a user interface or some
>> internal attribute/property to QEMU device?
>>
>> Since QEMU needs to associate this particular primary-failover device
>> with a virtio standby instance for checking DRIVER_OK as you indicate
>> below, I wonder if the group ID thing will be helpful to set
>> "primary-failover" flag internally without having to add another
>> option in CLI?
>
> I haven't thought about it but it's an option.
>
>> > A virtual machine is then initialized with just a standby virtio
>> > device. Primary is not yet added.
>> >
>> > Later QEMU detects that guest driver device set DRIVER_OK.
>> > It then exposes the primary device to the guest, and triggers
>> > a device addition event (hot-plug event) for it.
>>
>> Sounds OK. While there might be a small window the guest already
>> starts to use virtio interface before the passthrough gets plugged in,
>> I think that should be fine.
>>
>> Another option is to wait until the primary appears and gets
>> registered in the guest, so it can bring up the upper failover
>> interface.
>
> We can't be sure this will ever happen, can we? We might be asked to
> migrate at any time.
Right. For that to work, virtio driver needs a complicated state
tracking hand-in-hand with the host for VF's hot plugging activity. If
hot plugging is defered due to migration activity the failover
interface can be brought up immediately.
As said, this is just an option. I don't have strong preference here.
>
>> >
>> > If QEMU detects guest driver removal, it initiates a hot-unplug sequence
>> > to remove the primary driver. In particular, if QEMU detects guest
>> > re-initialization (e.g. by detecting guest reset) it immediately removes
>> > the primary device.
>>
>> Agreed.
>>
>> For legacy guest, user might prefer seeing a single passthrough device
>> rather than a virtio device. I would hope there's an option to allow
>> it to happen, instead of removing all passthrough devices.
>>
>> Regards,
>> -Siwei
>
> I don't see a way to make it work since then you can't migrate, can you?
The thing is cloud service provider might prefer sticking to the same
level of service agreement (SLA) of keeping VF over migration,
particularly when guest OS or kernel does not have the support.The
footprint in guest OS support might NOT be prevailing in the early
days. There's no point the cloud vendor can switch back and forth, as
there's no way for hypervisor to detect what kernel version even OS
the guest will be running ahead of time, until the guest gets fully
loaded and boots up, right?
-Siwei
>
> The way I see it, if you don't need migration, just use PT without
> failover. If you do, then you either use virtio or failover if guest
> supports it.
>
>> >
>> > We can move some of this code to management as well, architecturally it
>> > does not make too much sense but it might be easier implementation-wise.
>> >
>> > HTH
>> >
>> > On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote:
>> >> Ping on this patch now that the kernel patches are accepted into davem's net-next tree.
>> >> https://patchwork.ozlabs.org/cover/920005/
>> >>
>> >>
>> >> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote:
>> >> > This feature bit can be used by hypervisor to indicate virtio_net device to
>> >> > act as a standby for another device with the same MAC address.
>> >> >
>> >> > I tested this with a small change to the patch to mark the STANDBY feature 'true'
>> >> > by default as i am using libvirt to start the VMs.
>> >> > Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
>> >> > XML file?
>> >> >
>> >> > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> >> > ---
>> >> > hw/net/virtio-net.c | 2 ++
>> >> > include/standard-headers/linux/virtio_net.h | 3 +++
>> >> > 2 files changed, 5 insertions(+)
>> >> >
>> >> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> >> > index 90502fca7c..38b3140670 100644
>> >> > --- a/hw/net/virtio-net.c
>> >> > +++ b/hw/net/virtio-net.c
>> >> > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
>> >> > true),
>> >> > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
>> >> > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
>> >> > + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
>> >> > + false),
>> >> > DEFINE_PROP_END_OF_LIST(),
>> >> > };
>> >> > diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
>> >> > index e9f255ea3f..01ec09684c 100644
>> >> > --- a/include/standard-headers/linux/virtio_net.h
>> >> > +++ b/include/standard-headers/linux/virtio_net.h
>> >> > @@ -57,6 +57,9 @@
>> >> > * Steering */
>> >> > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
>> >> > +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device
>> >> > + * with the same MAC.
>> >> > + */
>> >> > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */
>> >> > #ifndef VIRTIO_NET_NO_LEGACY
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>> >
^ permalink raw reply
* Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Jason Wang @ 2018-06-06 2:29 UTC (permalink / raw)
To: Michael S. Tsirkin, Samudrala, Sridhar
Cc: alexander.h.duyck, virtio-dev, qemu-devel, virtualization
In-Reply-To: <20180605152344-mutt-send-email-mst@kernel.org>
On 2018年06月05日 20:33, Michael S. Tsirkin wrote:
> I don't think this is sufficient.
>
> If both primary and standby devices are present, a legacy guest without
> support for the feature might see two devices with same mac and get
> confused.
>
> I think that we should only make primary visible after guest acked the
> backup feature bit.
I think we want exactly the reverse? E.g fail the negotiation when guest
does not ack backup feature.
Otherwise legacy guest won't even have the chance to see primary device
in the guest.
>
> And on reset or when backup is cleared in some other way, unplug the
> primary.
What if guest does not support hotplug?
>
> Something like the below will do the job:
>
> Primary device is added with a special "primary-failover" flag.
> A virtual machine is then initialized with just a standby virtio
> device. Primary is not yet added.
A question is how to do the matching? Qemu knows nothing about e.g mac
address of a pass-through device I believe?
>
> Later QEMU detects that guest driver device set DRIVER_OK.
> It then exposes the primary device to the guest, and triggers
> a device addition event (hot-plug event) for it.
Do you mean we won't have primary device in the initial qemu cli?
>
> If QEMU detects guest driver removal, it initiates a hot-unplug sequence
> to remove the primary driver. In particular, if QEMU detects guest
> re-initialization (e.g. by detecting guest reset) it immediately removes
> the primary device.
I believe guest failover module should handle this gracefully?
Thanks
>
> We can move some of this code to management as well, architecturally it
> does not make too much sense but it might be easier implementation-wise.
>
> HTH
>
> On Mon, Jun 04, 2018 at 06:41:48PM -0700, Samudrala, Sridhar wrote:
>> Ping on this patch now that the kernel patches are accepted into davem's net-next tree.
>> https://patchwork.ozlabs.org/cover/920005/
>>
>>
>> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote:
>>> This feature bit can be used by hypervisor to indicate virtio_net device to
>>> act as a standby for another device with the same MAC address.
>>>
>>> I tested this with a small change to the patch to mark the STANDBY feature 'true'
>>> by default as i am using libvirt to start the VMs.
>>> Is there a way to pass the newly added feature bit 'standby' to qemu via libvirt
>>> XML file?
>>>
>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>> ---
>>> hw/net/virtio-net.c | 2 ++
>>> include/standard-headers/linux/virtio_net.h | 3 +++
>>> 2 files changed, 5 insertions(+)
>>>
>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>> index 90502fca7c..38b3140670 100644
>>> --- a/hw/net/virtio-net.c
>>> +++ b/hw/net/virtio-net.c
>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
>>> true),
>>> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
>>> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
>>> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
>>> + false),
>>> DEFINE_PROP_END_OF_LIST(),
>>> };
>>> diff --git a/include/standard-headers/linux/virtio_net.h b/include/standard-headers/linux/virtio_net.h
>>> index e9f255ea3f..01ec09684c 100644
>>> --- a/include/standard-headers/linux/virtio_net.h
>>> +++ b/include/standard-headers/linux/virtio_net.h
>>> @@ -57,6 +57,9 @@
>>> * Steering */
>>> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
>>> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for another device
>>> + * with the same MAC.
>>> + */
>>> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */
>>> #ifndef VIRTIO_NET_NO_LEGACY
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands support
From: Changpeng Liu @ 2018-06-06 4:19 UTC (permalink / raw)
To: virtualization, changpeng.liu; +Cc: pbonzini, cavery, stefanha
Existing virtio-blk protocol doesn't have DISCARD/WRITE ZEROES commands
support, this will impact the performance when using SSD backend over
file systems.
Commit 88c85538 "virtio-blk: add discard and write zeroes features to
specification"(see https://github.com/oasis-tcs/virtio-spec) extended
existing virtio-blk protocol, adding extra DISCARD and WRITE ZEROES
commands support.
While here, using 16 bytes descriptor to describe one segment of DISCARD
or WRITE ZEROES commands, each command may contain one or more decriptors.
The following data structure shows the definition of one descriptor:
struct virtio_blk_discard_write_zeroes {
le64 sector;
le32 num_sectors;
le32 unmap;
};
Field 'sector' means the start sector for DISCARD and WRITE ZEROES,
filed 'num_sectors' means the number of sectors for DISCARD and WRITE
ZEROES, if both DISCARD and WRITE ZEROES are supported, field 'unmap'
maybe used for WRITE ZEROES command with DISCARD enabled.
We also extended the virtio-blk configuration space to let backend
device put DISCARD and WRITE ZEROES configuration parameters.
struct virtio_blk_config {
[...]
le32 max_discard_sectors;
le32 max_discard_seg;
le32 discard_sector_alignment;
le32 max_write_zeroes_sectors;
le32 max_write_zeroes_seg;
u8 write_zeroes_may_unmap;
}
New feature bit [VIRTIO_BLK_F_DISCARD (13)]: Device can support discard
command, maximum discard sectors size in field 'max_discard_sectors' and
maximum discard segment number in field 'max_discard_seg'.
New feature [VIRTIO_BLK_F_WRITE_ZEROES (14)]: Device can support write
zeroes command, maximum write zeroes sectors size in field
'max_write_zeroes_sectors' and maximum write zeroes segment number in
field 'max_write_zeroes_seg'.
The parameters in the configuration space of the device field
'max_discard_sectors' and field 'discard_sector_alignment' are expressed in
512-byte units if the VIRTIO_BLK_F_DISCARD feature bit is negotiated. The
field 'max_write_zeroes_sectors' is expressed in 512-byte units if the
VIRTIO_BLK_F_WRITE_ZEROES feature bit is negotiated.
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
---
CHANGELOG:
v6: don't set T_OUT bit to discard and write zeroes commands.
v5: use new block layer API: blk_queue_flag_set.
v4: several optimizations based on MST's comments, remove bit field usage for
command descriptor.
v3: define the virtio-blk protocol to add discard and write zeroes support,
first version implementation based on proposed specification.
v2: add write zeroes command support.
v1: initial proposal implementation for discard command.
---
drivers/block/virtio_blk.c | 89 ++++++++++++++++++++++++++++++++++++++++-
include/uapi/linux/virtio_blk.h | 43 ++++++++++++++++++++
2 files changed, 130 insertions(+), 2 deletions(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 4a07593c..5aabc63 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -172,10 +172,45 @@ static int virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr,
return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC);
}
+static inline int virtblk_setup_discard_write_zeroes(struct request *req,
+ bool unmap)
+{
+ unsigned short segments = blk_rq_nr_discard_segments(req);
+ unsigned short n = 0;
+ struct virtio_blk_discard_write_zeroes *range;
+ struct bio *bio;
+
+ range = kmalloc_array(segments, sizeof(*range), GFP_KERNEL);
+ if (!range)
+ return -ENOMEM;
+
+ __rq_for_each_bio(bio, req) {
+ u64 sector = bio->bi_iter.bi_sector;
+ u32 num_sectors = bio->bi_iter.bi_size >> 9;
+
+ range[n].unmap = cpu_to_le32(unmap);
+ range[n].num_sectors = cpu_to_le32(num_sectors);
+ range[n].sector = cpu_to_le64(sector);
+ n++;
+ }
+
+ req->special_vec.bv_page = virt_to_page(range);
+ req->special_vec.bv_offset = offset_in_page(range);
+ req->special_vec.bv_len = sizeof(*range) * segments;
+ req->rq_flags |= RQF_SPECIAL_PAYLOAD;
+
+ return 0;
+}
+
static inline void virtblk_request_done(struct request *req)
{
struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
+ if (req->rq_flags & RQF_SPECIAL_PAYLOAD) {
+ kfree(page_address(req->special_vec.bv_page) +
+ req->special_vec.bv_offset);
+ }
+
switch (req_op(req)) {
case REQ_OP_SCSI_IN:
case REQ_OP_SCSI_OUT:
@@ -225,6 +260,7 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
int qid = hctx->queue_num;
int err;
bool notify = false;
+ bool unmap = false;
u32 type;
BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
@@ -237,6 +273,13 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
case REQ_OP_FLUSH:
type = VIRTIO_BLK_T_FLUSH;
break;
+ case REQ_OP_DISCARD:
+ type = VIRTIO_BLK_T_DISCARD;
+ break;
+ case REQ_OP_WRITE_ZEROES:
+ type = VIRTIO_BLK_T_WRITE_ZEROES;
+ unmap = !(req->cmd_flags & REQ_NOUNMAP);
+ break;
case REQ_OP_SCSI_IN:
case REQ_OP_SCSI_OUT:
type = VIRTIO_BLK_T_SCSI_CMD;
@@ -256,6 +299,12 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
blk_mq_start_request(req);
+ if (type == VIRTIO_BLK_T_DISCARD || type == VIRTIO_BLK_T_WRITE_ZEROES) {
+ err = virtblk_setup_discard_write_zeroes(req, unmap);
+ if (err)
+ return BLK_STS_RESOURCE;
+ }
+
num = blk_rq_map_sg(hctx->queue, req, vbr->sg);
if (num) {
if (rq_data_dir(req) == WRITE)
@@ -777,6 +826,42 @@ static int virtblk_probe(struct virtio_device *vdev)
if (!err && opt_io_size)
blk_queue_io_opt(q, blk_size * opt_io_size);
+ if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) {
+ q->limits.discard_granularity = blk_size;
+
+ virtio_cread(vdev, struct virtio_blk_config,
+ discard_sector_alignment, &v);
+ if (v)
+ q->limits.discard_alignment = v << 9;
+ else
+ q->limits.discard_alignment = 0;
+
+ virtio_cread(vdev, struct virtio_blk_config,
+ max_discard_sectors, &v);
+ if (v)
+ blk_queue_max_discard_sectors(q, v);
+ else
+ blk_queue_max_discard_sectors(q, UINT_MAX);
+
+ virtio_cread(vdev, struct virtio_blk_config, max_discard_seg,
+ &v);
+ if (v)
+ blk_queue_max_discard_segments(q, v);
+ else
+ blk_queue_max_discard_segments(q, USHRT_MAX);
+
+ blk_queue_flag_set(QUEUE_FLAG_DISCARD, q);
+ }
+
+ if (virtio_has_feature(vdev, VIRTIO_BLK_F_WRITE_ZEROES)) {
+ virtio_cread(vdev, struct virtio_blk_config,
+ max_write_zeroes_sectors, &v);
+ if (v)
+ blk_queue_max_write_zeroes_sectors(q, v);
+ else
+ blk_queue_max_write_zeroes_sectors(q, UINT_MAX);
+ }
+
virtblk_update_capacity(vblk, false);
virtio_device_ready(vdev);
@@ -885,14 +970,14 @@ static int virtblk_restore(struct virtio_device *vdev)
VIRTIO_BLK_F_SCSI,
#endif
VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE,
- VIRTIO_BLK_F_MQ,
+ VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES,
}
;
static unsigned int features[] = {
VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY,
VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE,
VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE,
- VIRTIO_BLK_F_MQ,
+ VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES,
};
static struct virtio_driver virtio_blk = {
diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h
index 9ebe4d9..8e7a015 100644
--- a/include/uapi/linux/virtio_blk.h
+++ b/include/uapi/linux/virtio_blk.h
@@ -38,6 +38,8 @@
#define VIRTIO_BLK_F_BLK_SIZE 6 /* Block size of disk is available*/
#define VIRTIO_BLK_F_TOPOLOGY 10 /* Topology information is available */
#define VIRTIO_BLK_F_MQ 12 /* support more than one vq */
+#define VIRTIO_BLK_F_DISCARD 13 /* DISCARD is supported */
+#define VIRTIO_BLK_F_WRITE_ZEROES 14 /* WRITE ZEROES is supported */
/* Legacy feature bits */
#ifndef VIRTIO_BLK_NO_LEGACY
@@ -86,6 +88,29 @@ struct virtio_blk_config {
/* number of vqs, only available when VIRTIO_BLK_F_MQ is set */
__u16 num_queues;
+ /* The maximum discard sectors (in 512-byte sectors) for
+ * one segment (if VIRTIO_BLK_F_DISCARD)
+ */
+ __u32 max_discard_sectors;
+ /* The maximum number of discard segments in a
+ * discard command (if VIRTIO_BLK_F_DISCARD)
+ */
+ __u32 max_discard_seg;
+ /* The alignment sectors for discard (if VIRTIO_BLK_F_DISCARD) */
+ __u32 discard_sector_alignment;
+ /* The maximum number of write zeroes sectors (in 512-byte sectors) in
+ * one segment (if VIRTIO_BLK_F_WRITE_ZEROES)
+ */
+ __u32 max_write_zeroes_sectors;
+ /* The maximum number of segments in a write zeroes
+ * command (if VIRTIO_BLK_F_WRITE_ZEROES)
+ */
+ __u32 max_write_zeroes_seg;
+ /* Device clear this bit when write zeroes command can't result in
+ * unmapping sectors (if VIRITO_BLK_F_WRITE_ZEROES and with unmap)
+ */
+ __u8 write_zeroes_may_unmap;
+ __u8 unused1[3];
} __attribute__((packed));
/*
@@ -114,6 +139,12 @@ struct virtio_blk_config {
/* Get device ID command */
#define VIRTIO_BLK_T_GET_ID 8
+/* Discard command */
+#define VIRTIO_BLK_T_DISCARD 11
+
+/* Write zeroes command */
+#define VIRTIO_BLK_T_WRITE_ZEROES 13
+
#ifndef VIRTIO_BLK_NO_LEGACY
/* Barrier before this op. */
#define VIRTIO_BLK_T_BARRIER 0x80000000
@@ -133,6 +164,18 @@ struct virtio_blk_outhdr {
__virtio64 sector;
};
+/*
+ * discard/write zeroes range for each request.
+ */
+struct virtio_blk_discard_write_zeroes {
+ /* discard/write zeroes start sector */
+ __virtio64 sector;
+ /* number of discard/write zeroes sectors */
+ __virtio32 num_sectors;
+ /* valid for write zeroes command */
+ __virtio32 unmap;
+};
+
#ifndef VIRTIO_BLK_NO_LEGACY
struct virtio_scsi_inhdr {
__virtio32 errors;
--
1.9.3
^ permalink raw reply related
* Re: [virtio-dev] Re: [PATCH v3] virtio_pci: support enabling VFs
From: Tiwei Bie @ 2018-06-06 12:11 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: alexander.h.duyck, virtio-dev, linux-pci, linux-kernel,
virtualization, stefanha, zhihong.wang, bhelgaas, mark.d.rustad
In-Reply-To: <20180605152151-mutt-send-email-mst@kernel.org>
On Tue, Jun 05, 2018 at 03:23:11PM +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 05, 2018 at 09:36:53AM +0800, Tiwei Bie wrote:
> > On Mon, Jun 04, 2018 at 07:32:25PM +0300, Michael S. Tsirkin wrote:
> > > On Fri, Jun 01, 2018 at 12:02:39PM +0800, Tiwei Bie wrote:
> > > > There is a new feature bit allocated in virtio spec to
> > > > support SR-IOV (Single Root I/O Virtualization):
> > > >
> > > > https://github.com/oasis-tcs/virtio-spec/issues/11
> > > >
> > > > This patch enables the support for this feature bit in
> > > > virtio driver.
> > > >
> > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > ---
> > >
> > > OK but what about freeze/restore functions?
>
> So for restore, don't you need to restore the
> sriov capability?
Currently I'm not familiar with the PM part.
But I still think the sriov capability should
be handled by PCI core. I'm trying to understand
all the relevant code..
For your question, based on what I found from
the code currently, I guess the sriov capability
will be restored by pci_restore_state() which
will be called by the ops in pci_dev_pm_ops.
The sriov_restore_state() will be called
eventually.
Best regards,
Tiwei Bie
>
>
> > > I also wonder about kexec - virtio.c currently does:
> > >
> > > /* We always start by resetting the device, in case a previous
> > > * driver messed it up. This also tests that code path a little. */
> > > dev->config->reset(dev);
> > >
> > > Do we need to do something like this for sriov?
> >
> > I think VFs are managed by PCI core. Once they are
> > allocated, virtio driver doesn't have to care too
> > much about how to manage them. The proposal for the
> > spec is just to provide a feature bit based virtio
> > way for virtio drivers to know whether a virtio
> > device is SR-IOV capable (and virtio drivers can
> > support configuring SR-IOV based on the feature
> > bit negotiation result).
> >
> > >
> > > I also wonder whether PCI core should disable sriov for us.
> > >
> > >
> > > I wish there was a patch emulating this without vDPA for QEMU,
> > > would make it easy to test your patches. Do you happen
> > > to have something like this?
> >
> > Sorry, currently I don't have anything like this..
> >
> > Best regards,
> > Tiwei Bie
> >
> > >
> > > Thanks,
> > >
> > >
> > > > v3:
> > > > - Drop the acks;
> > > >
> > > > v2:
> > > > - Disable VFs when unbinding the driver (Alex, MST);
> > > > - Don't use pci_sriov_configure_simple (Alex);
> > > >
> > > > drivers/virtio/virtio_pci_common.c | 30 ++++++++++++++++++++++++++++++
> > > > drivers/virtio/virtio_pci_modern.c | 14 ++++++++++++++
> > > > include/uapi/linux/virtio_config.h | 7 ++++++-
> > > > 3 files changed, 50 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > > index 48d4d1cf1cb6..1d4467b2dc31 100644
> > > > --- a/drivers/virtio/virtio_pci_common.c
> > > > +++ b/drivers/virtio/virtio_pci_common.c
> > > > @@ -577,6 +577,8 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> > > > struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > > > struct device *dev = get_device(&vp_dev->vdev.dev);
> > > >
> > > > + pci_disable_sriov(pci_dev);
> > > > +
> > > > unregister_virtio_device(&vp_dev->vdev);
> > > >
> > > > if (vp_dev->ioaddr)
> > > > @@ -588,6 +590,33 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> > > > put_device(dev);
> > > > }
> > > >
> > > > +static int virtio_pci_sriov_configure(struct pci_dev *pci_dev, int num_vfs)
> > > > +{
> > > > + struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > > > + struct virtio_device *vdev = &vp_dev->vdev;
> > > > + int ret;
> > > > +
> > > > + if (!(vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_DRIVER_OK))
> > > > + return -EBUSY;
> > > > +
> > > > + if (!__virtio_test_bit(vdev, VIRTIO_F_SR_IOV))
> > > > + return -EINVAL;
> > > > +
> > > > + if (pci_vfs_assigned(pci_dev))
> > > > + return -EPERM;
> > > > +
> > > > + if (num_vfs == 0) {
> > > > + pci_disable_sriov(pci_dev);
> > > > + return 0;
> > > > + }
> > > > +
> > > > + ret = pci_enable_sriov(pci_dev, num_vfs);
> > > > + if (ret < 0)
> > > > + return ret;
> > > > +
> > > > + return num_vfs;
> > > > +}
> > > > +
> > > > static struct pci_driver virtio_pci_driver = {
> > > > .name = "virtio-pci",
> > > > .id_table = virtio_pci_id_table,
> > > > @@ -596,6 +625,7 @@ static struct pci_driver virtio_pci_driver = {
> > > > #ifdef CONFIG_PM_SLEEP
> > > > .driver.pm = &virtio_pci_pm_ops,
> > > > #endif
> > > > + .sriov_configure = virtio_pci_sriov_configure,
> > > > };
> > > >
> > > > module_pci_driver(virtio_pci_driver);
> > > > diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > > > index 2555d80f6eec..07571daccfec 100644
> > > > --- a/drivers/virtio/virtio_pci_modern.c
> > > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > > @@ -153,14 +153,28 @@ static u64 vp_get_features(struct virtio_device *vdev)
> > > > return features;
> > > > }
> > > >
> > > > +static void vp_transport_features(struct virtio_device *vdev, u64 features)
> > > > +{
> > > > + struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > + struct pci_dev *pci_dev = vp_dev->pci_dev;
> > > > +
> > > > + if ((features & BIT_ULL(VIRTIO_F_SR_IOV)) &&
> > > > + pci_find_ext_capability(pci_dev, PCI_EXT_CAP_ID_SRIOV))
> > > > + __virtio_set_bit(vdev, VIRTIO_F_SR_IOV);
> > > > +}
> > > > +
> > > > /* virtio config->finalize_features() implementation */
> > > > static int vp_finalize_features(struct virtio_device *vdev)
> > > > {
> > > > struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > + u64 features = vdev->features;
> > > >
> > > > /* Give virtio_ring a chance to accept features. */
> > > > vring_transport_features(vdev);
> > > >
> > > > + /* Give virtio_pci a chance to accept features. */
> > > > + vp_transport_features(vdev, features);
> > > > +
> > > > if (!__virtio_test_bit(vdev, VIRTIO_F_VERSION_1)) {
> > > > dev_err(&vdev->dev, "virtio: device uses modern interface "
> > > > "but does not have VIRTIO_F_VERSION_1\n");
> > > > diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> > > > index 308e2096291f..b7c1f4e7d59e 100644
> > > > --- a/include/uapi/linux/virtio_config.h
> > > > +++ b/include/uapi/linux/virtio_config.h
> > > > @@ -49,7 +49,7 @@
> > > > * transport being used (eg. virtio_ring), the rest are per-device feature
> > > > * bits. */
> > > > #define VIRTIO_TRANSPORT_F_START 28
> > > > -#define VIRTIO_TRANSPORT_F_END 34
> > > > +#define VIRTIO_TRANSPORT_F_END 38
> > > >
> > > > #ifndef VIRTIO_CONFIG_NO_LEGACY
> > > > /* Do we get callbacks when the ring is completely used, even if we've
> > > > @@ -71,4 +71,9 @@
> > > > * this is for compatibility with legacy systems.
> > > > */
> > > > #define VIRTIO_F_IOMMU_PLATFORM 33
> > > > +
> > > > +/*
> > > > + * Does the device support Single Root I/O Virtualization?
> > > > + */
> > > > +#define VIRTIO_F_SR_IOV 37
> > > > #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
> > > > --
> > > > 2.17.0
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > >
^ permalink raw reply
* Re: [virtio-dev] Re: [PATCH v3] virtio_pci: support enabling VFs
From: Michael S. Tsirkin @ 2018-06-06 12:44 UTC (permalink / raw)
To: Tiwei Bie
Cc: alexander.h.duyck, virtio-dev, linux-pci, linux-kernel,
virtualization, stefanha, zhihong.wang, bhelgaas, mark.d.rustad
In-Reply-To: <20180606121154.GA9599@debian>
On Wed, Jun 06, 2018 at 08:11:54PM +0800, Tiwei Bie wrote:
> On Tue, Jun 05, 2018 at 03:23:11PM +0300, Michael S. Tsirkin wrote:
> > On Tue, Jun 05, 2018 at 09:36:53AM +0800, Tiwei Bie wrote:
> > > On Mon, Jun 04, 2018 at 07:32:25PM +0300, Michael S. Tsirkin wrote:
> > > > On Fri, Jun 01, 2018 at 12:02:39PM +0800, Tiwei Bie wrote:
> > > > > There is a new feature bit allocated in virtio spec to
> > > > > support SR-IOV (Single Root I/O Virtualization):
> > > > >
> > > > > https://github.com/oasis-tcs/virtio-spec/issues/11
> > > > >
> > > > > This patch enables the support for this feature bit in
> > > > > virtio driver.
> > > > >
> > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > ---
> > > >
> > > > OK but what about freeze/restore functions?
> >
> > So for restore, don't you need to restore the
> > sriov capability?
>
> Currently I'm not familiar with the PM part.
> But I still think the sriov capability should
> be handled by PCI core.
OK but the point is restore looks just like power up for device.
> I'm trying to understand
> all the relevant code..
> For your question, based on what I found from
> the code currently, I guess the sriov capability
> will be restored by pci_restore_state() which
> will be called by the ops in pci_dev_pm_ops.
> The sriov_restore_state() will be called
> eventually.
>
> Best regards,
> Tiwei Bie
Right but my point is during resume SRIOV gets enabled first before
driver ok.
Maybe we should relax the requirements in the spec:
- only require FEATURES_OK from device, not DRIVER_OK from driver
- explain that it only has to happen once, not on each reset,
and driver can remember the result
>
> >
> >
> > > > I also wonder about kexec - virtio.c currently does:
> > > >
> > > > /* We always start by resetting the device, in case a previous
> > > > * driver messed it up. This also tests that code path a little. */
> > > > dev->config->reset(dev);
> > > >
> > > > Do we need to do something like this for sriov?
> > >
> > > I think VFs are managed by PCI core. Once they are
> > > allocated, virtio driver doesn't have to care too
> > > much about how to manage them. The proposal for the
> > > spec is just to provide a feature bit based virtio
> > > way for virtio drivers to know whether a virtio
> > > device is SR-IOV capable (and virtio drivers can
> > > support configuring SR-IOV based on the feature
> > > bit negotiation result).
> > >
> > > >
> > > > I also wonder whether PCI core should disable sriov for us.
> > > >
> > > >
> > > > I wish there was a patch emulating this without vDPA for QEMU,
> > > > would make it easy to test your patches. Do you happen
> > > > to have something like this?
> > >
> > > Sorry, currently I don't have anything like this..
> > >
> > > Best regards,
> > > Tiwei Bie
> > >
> > > >
> > > > Thanks,
> > > >
> > > >
> > > > > v3:
> > > > > - Drop the acks;
> > > > >
> > > > > v2:
> > > > > - Disable VFs when unbinding the driver (Alex, MST);
> > > > > - Don't use pci_sriov_configure_simple (Alex);
> > > > >
> > > > > drivers/virtio/virtio_pci_common.c | 30 ++++++++++++++++++++++++++++++
> > > > > drivers/virtio/virtio_pci_modern.c | 14 ++++++++++++++
> > > > > include/uapi/linux/virtio_config.h | 7 ++++++-
> > > > > 3 files changed, 50 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > > > index 48d4d1cf1cb6..1d4467b2dc31 100644
> > > > > --- a/drivers/virtio/virtio_pci_common.c
> > > > > +++ b/drivers/virtio/virtio_pci_common.c
> > > > > @@ -577,6 +577,8 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> > > > > struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > > > > struct device *dev = get_device(&vp_dev->vdev.dev);
> > > > >
> > > > > + pci_disable_sriov(pci_dev);
> > > > > +
> > > > > unregister_virtio_device(&vp_dev->vdev);
> > > > >
> > > > > if (vp_dev->ioaddr)
> > > > > @@ -588,6 +590,33 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> > > > > put_device(dev);
> > > > > }
> > > > >
> > > > > +static int virtio_pci_sriov_configure(struct pci_dev *pci_dev, int num_vfs)
> > > > > +{
> > > > > + struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > > > > + struct virtio_device *vdev = &vp_dev->vdev;
> > > > > + int ret;
> > > > > +
> > > > > + if (!(vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_DRIVER_OK))
> > > > > + return -EBUSY;
> > > > > +
> > > > > + if (!__virtio_test_bit(vdev, VIRTIO_F_SR_IOV))
> > > > > + return -EINVAL;
> > > > > +
> > > > > + if (pci_vfs_assigned(pci_dev))
> > > > > + return -EPERM;
> > > > > +
> > > > > + if (num_vfs == 0) {
> > > > > + pci_disable_sriov(pci_dev);
> > > > > + return 0;
> > > > > + }
> > > > > +
> > > > > + ret = pci_enable_sriov(pci_dev, num_vfs);
> > > > > + if (ret < 0)
> > > > > + return ret;
> > > > > +
> > > > > + return num_vfs;
> > > > > +}
> > > > > +
> > > > > static struct pci_driver virtio_pci_driver = {
> > > > > .name = "virtio-pci",
> > > > > .id_table = virtio_pci_id_table,
> > > > > @@ -596,6 +625,7 @@ static struct pci_driver virtio_pci_driver = {
> > > > > #ifdef CONFIG_PM_SLEEP
> > > > > .driver.pm = &virtio_pci_pm_ops,
> > > > > #endif
> > > > > + .sriov_configure = virtio_pci_sriov_configure,
> > > > > };
> > > > >
> > > > > module_pci_driver(virtio_pci_driver);
> > > > > diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > > > > index 2555d80f6eec..07571daccfec 100644
> > > > > --- a/drivers/virtio/virtio_pci_modern.c
> > > > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > > > @@ -153,14 +153,28 @@ static u64 vp_get_features(struct virtio_device *vdev)
> > > > > return features;
> > > > > }
> > > > >
> > > > > +static void vp_transport_features(struct virtio_device *vdev, u64 features)
> > > > > +{
> > > > > + struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > + struct pci_dev *pci_dev = vp_dev->pci_dev;
> > > > > +
> > > > > + if ((features & BIT_ULL(VIRTIO_F_SR_IOV)) &&
> > > > > + pci_find_ext_capability(pci_dev, PCI_EXT_CAP_ID_SRIOV))
> > > > > + __virtio_set_bit(vdev, VIRTIO_F_SR_IOV);
> > > > > +}
> > > > > +
> > > > > /* virtio config->finalize_features() implementation */
> > > > > static int vp_finalize_features(struct virtio_device *vdev)
> > > > > {
> > > > > struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > + u64 features = vdev->features;
> > > > >
> > > > > /* Give virtio_ring a chance to accept features. */
> > > > > vring_transport_features(vdev);
> > > > >
> > > > > + /* Give virtio_pci a chance to accept features. */
> > > > > + vp_transport_features(vdev, features);
> > > > > +
> > > > > if (!__virtio_test_bit(vdev, VIRTIO_F_VERSION_1)) {
> > > > > dev_err(&vdev->dev, "virtio: device uses modern interface "
> > > > > "but does not have VIRTIO_F_VERSION_1\n");
> > > > > diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> > > > > index 308e2096291f..b7c1f4e7d59e 100644
> > > > > --- a/include/uapi/linux/virtio_config.h
> > > > > +++ b/include/uapi/linux/virtio_config.h
> > > > > @@ -49,7 +49,7 @@
> > > > > * transport being used (eg. virtio_ring), the rest are per-device feature
> > > > > * bits. */
> > > > > #define VIRTIO_TRANSPORT_F_START 28
> > > > > -#define VIRTIO_TRANSPORT_F_END 34
> > > > > +#define VIRTIO_TRANSPORT_F_END 38
> > > > >
> > > > > #ifndef VIRTIO_CONFIG_NO_LEGACY
> > > > > /* Do we get callbacks when the ring is completely used, even if we've
> > > > > @@ -71,4 +71,9 @@
> > > > > * this is for compatibility with legacy systems.
> > > > > */
> > > > > #define VIRTIO_F_IOMMU_PLATFORM 33
> > > > > +
> > > > > +/*
> > > > > + * Does the device support Single Root I/O Virtualization?
> > > > > + */
> > > > > +#define VIRTIO_F_SR_IOV 37
> > > > > #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
> > > > > --
> > > > > 2.17.0
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > >
^ permalink raw reply
* Re: [virtio-dev] Re: [PATCH v3] virtio_pci: support enabling VFs
From: Tiwei Bie @ 2018-06-06 14:19 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: alexander.h.duyck, virtio-dev, linux-pci, linux-kernel,
virtualization, stefanha, zhihong.wang, bhelgaas, mark.d.rustad
In-Reply-To: <20180606153236-mutt-send-email-mst@kernel.org>
On Wed, Jun 06, 2018 at 03:44:11PM +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 06, 2018 at 08:11:54PM +0800, Tiwei Bie wrote:
> > On Tue, Jun 05, 2018 at 03:23:11PM +0300, Michael S. Tsirkin wrote:
> > > On Tue, Jun 05, 2018 at 09:36:53AM +0800, Tiwei Bie wrote:
> > > > On Mon, Jun 04, 2018 at 07:32:25PM +0300, Michael S. Tsirkin wrote:
> > > > > On Fri, Jun 01, 2018 at 12:02:39PM +0800, Tiwei Bie wrote:
> > > > > > There is a new feature bit allocated in virtio spec to
> > > > > > support SR-IOV (Single Root I/O Virtualization):
> > > > > >
> > > > > > https://github.com/oasis-tcs/virtio-spec/issues/11
> > > > > >
> > > > > > This patch enables the support for this feature bit in
> > > > > > virtio driver.
> > > > > >
> > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > ---
> > > > >
> > > > > OK but what about freeze/restore functions?
> > >
> > > So for restore, don't you need to restore the
> > > sriov capability?
> >
> > Currently I'm not familiar with the PM part.
> > But I still think the sriov capability should
> > be handled by PCI core.
>
> OK but the point is restore looks just like power up for device.
>
> > I'm trying to understand
> > all the relevant code..
> > For your question, based on what I found from
> > the code currently, I guess the sriov capability
> > will be restored by pci_restore_state() which
> > will be called by the ops in pci_dev_pm_ops.
> > The sriov_restore_state() will be called
> > eventually.
> >
> > Best regards,
> > Tiwei Bie
>
> Right but my point is during resume SRIOV gets enabled first before
> driver ok.
>
> Maybe we should relax the requirements in the spec:
> - only require FEATURES_OK from device, not DRIVER_OK from driver
> - explain that it only has to happen once, not on each reset,
> and driver can remember the result
I got your point now! I'd like to relax the
requirements in the spec.
Best regards,
Tiwei Bie
>
>
> >
> > >
> > >
> > > > > I also wonder about kexec - virtio.c currently does:
> > > > >
> > > > > /* We always start by resetting the device, in case a previous
> > > > > * driver messed it up. This also tests that code path a little. */
> > > > > dev->config->reset(dev);
> > > > >
> > > > > Do we need to do something like this for sriov?
> > > >
> > > > I think VFs are managed by PCI core. Once they are
> > > > allocated, virtio driver doesn't have to care too
> > > > much about how to manage them. The proposal for the
> > > > spec is just to provide a feature bit based virtio
> > > > way for virtio drivers to know whether a virtio
> > > > device is SR-IOV capable (and virtio drivers can
> > > > support configuring SR-IOV based on the feature
> > > > bit negotiation result).
> > > >
> > > > >
> > > > > I also wonder whether PCI core should disable sriov for us.
> > > > >
> > > > >
> > > > > I wish there was a patch emulating this without vDPA for QEMU,
> > > > > would make it easy to test your patches. Do you happen
> > > > > to have something like this?
> > > >
> > > > Sorry, currently I don't have anything like this..
> > > >
> > > > Best regards,
> > > > Tiwei Bie
> > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > >
> > > > > > v3:
> > > > > > - Drop the acks;
> > > > > >
> > > > > > v2:
> > > > > > - Disable VFs when unbinding the driver (Alex, MST);
> > > > > > - Don't use pci_sriov_configure_simple (Alex);
> > > > > >
> > > > > > drivers/virtio/virtio_pci_common.c | 30 ++++++++++++++++++++++++++++++
> > > > > > drivers/virtio/virtio_pci_modern.c | 14 ++++++++++++++
> > > > > > include/uapi/linux/virtio_config.h | 7 ++++++-
> > > > > > 3 files changed, 50 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > > > > index 48d4d1cf1cb6..1d4467b2dc31 100644
> > > > > > --- a/drivers/virtio/virtio_pci_common.c
> > > > > > +++ b/drivers/virtio/virtio_pci_common.c
> > > > > > @@ -577,6 +577,8 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> > > > > > struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > > > > > struct device *dev = get_device(&vp_dev->vdev.dev);
> > > > > >
> > > > > > + pci_disable_sriov(pci_dev);
> > > > > > +
> > > > > > unregister_virtio_device(&vp_dev->vdev);
> > > > > >
> > > > > > if (vp_dev->ioaddr)
> > > > > > @@ -588,6 +590,33 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> > > > > > put_device(dev);
> > > > > > }
> > > > > >
> > > > > > +static int virtio_pci_sriov_configure(struct pci_dev *pci_dev, int num_vfs)
> > > > > > +{
> > > > > > + struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > > > > > + struct virtio_device *vdev = &vp_dev->vdev;
> > > > > > + int ret;
> > > > > > +
> > > > > > + if (!(vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_DRIVER_OK))
> > > > > > + return -EBUSY;
> > > > > > +
> > > > > > + if (!__virtio_test_bit(vdev, VIRTIO_F_SR_IOV))
> > > > > > + return -EINVAL;
> > > > > > +
> > > > > > + if (pci_vfs_assigned(pci_dev))
> > > > > > + return -EPERM;
> > > > > > +
> > > > > > + if (num_vfs == 0) {
> > > > > > + pci_disable_sriov(pci_dev);
> > > > > > + return 0;
> > > > > > + }
> > > > > > +
> > > > > > + ret = pci_enable_sriov(pci_dev, num_vfs);
> > > > > > + if (ret < 0)
> > > > > > + return ret;
> > > > > > +
> > > > > > + return num_vfs;
> > > > > > +}
> > > > > > +
> > > > > > static struct pci_driver virtio_pci_driver = {
> > > > > > .name = "virtio-pci",
> > > > > > .id_table = virtio_pci_id_table,
> > > > > > @@ -596,6 +625,7 @@ static struct pci_driver virtio_pci_driver = {
> > > > > > #ifdef CONFIG_PM_SLEEP
> > > > > > .driver.pm = &virtio_pci_pm_ops,
> > > > > > #endif
> > > > > > + .sriov_configure = virtio_pci_sriov_configure,
> > > > > > };
> > > > > >
> > > > > > module_pci_driver(virtio_pci_driver);
> > > > > > diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > > > > > index 2555d80f6eec..07571daccfec 100644
> > > > > > --- a/drivers/virtio/virtio_pci_modern.c
> > > > > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > > > > @@ -153,14 +153,28 @@ static u64 vp_get_features(struct virtio_device *vdev)
> > > > > > return features;
> > > > > > }
> > > > > >
> > > > > > +static void vp_transport_features(struct virtio_device *vdev, u64 features)
> > > > > > +{
> > > > > > + struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > > + struct pci_dev *pci_dev = vp_dev->pci_dev;
> > > > > > +
> > > > > > + if ((features & BIT_ULL(VIRTIO_F_SR_IOV)) &&
> > > > > > + pci_find_ext_capability(pci_dev, PCI_EXT_CAP_ID_SRIOV))
> > > > > > + __virtio_set_bit(vdev, VIRTIO_F_SR_IOV);
> > > > > > +}
> > > > > > +
> > > > > > /* virtio config->finalize_features() implementation */
> > > > > > static int vp_finalize_features(struct virtio_device *vdev)
> > > > > > {
> > > > > > struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > > + u64 features = vdev->features;
> > > > > >
> > > > > > /* Give virtio_ring a chance to accept features. */
> > > > > > vring_transport_features(vdev);
> > > > > >
> > > > > > + /* Give virtio_pci a chance to accept features. */
> > > > > > + vp_transport_features(vdev, features);
> > > > > > +
> > > > > > if (!__virtio_test_bit(vdev, VIRTIO_F_VERSION_1)) {
> > > > > > dev_err(&vdev->dev, "virtio: device uses modern interface "
> > > > > > "but does not have VIRTIO_F_VERSION_1\n");
> > > > > > diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> > > > > > index 308e2096291f..b7c1f4e7d59e 100644
> > > > > > --- a/include/uapi/linux/virtio_config.h
> > > > > > +++ b/include/uapi/linux/virtio_config.h
> > > > > > @@ -49,7 +49,7 @@
> > > > > > * transport being used (eg. virtio_ring), the rest are per-device feature
> > > > > > * bits. */
> > > > > > #define VIRTIO_TRANSPORT_F_START 28
> > > > > > -#define VIRTIO_TRANSPORT_F_END 34
> > > > > > +#define VIRTIO_TRANSPORT_F_END 38
> > > > > >
> > > > > > #ifndef VIRTIO_CONFIG_NO_LEGACY
> > > > > > /* Do we get callbacks when the ring is completely used, even if we've
> > > > > > @@ -71,4 +71,9 @@
> > > > > > * this is for compatibility with legacy systems.
> > > > > > */
> > > > > > #define VIRTIO_F_IOMMU_PLATFORM 33
> > > > > > +
> > > > > > +/*
> > > > > > + * Does the device support Single Root I/O Virtualization?
> > > > > > + */
> > > > > > +#define VIRTIO_F_SR_IOV 37
> > > > > > #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
> > > > > > --
> > > > > > 2.17.0
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > >
^ permalink raw reply
* Re: [virtio-dev] Re: [PATCH v3] virtio_pci: support enabling VFs
From: Michael S. Tsirkin @ 2018-06-06 14:27 UTC (permalink / raw)
To: Tiwei Bie
Cc: alexander.h.duyck, virtio-dev, linux-pci, linux-kernel,
virtualization, stefanha, zhihong.wang, bhelgaas, mark.d.rustad
In-Reply-To: <20180606141943.GA13904@debian>
On Wed, Jun 06, 2018 at 10:19:43PM +0800, Tiwei Bie wrote:
> On Wed, Jun 06, 2018 at 03:44:11PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Jun 06, 2018 at 08:11:54PM +0800, Tiwei Bie wrote:
> > > On Tue, Jun 05, 2018 at 03:23:11PM +0300, Michael S. Tsirkin wrote:
> > > > On Tue, Jun 05, 2018 at 09:36:53AM +0800, Tiwei Bie wrote:
> > > > > On Mon, Jun 04, 2018 at 07:32:25PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Fri, Jun 01, 2018 at 12:02:39PM +0800, Tiwei Bie wrote:
> > > > > > > There is a new feature bit allocated in virtio spec to
> > > > > > > support SR-IOV (Single Root I/O Virtualization):
> > > > > > >
> > > > > > > https://github.com/oasis-tcs/virtio-spec/issues/11
> > > > > > >
> > > > > > > This patch enables the support for this feature bit in
> > > > > > > virtio driver.
> > > > > > >
> > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > > ---
> > > > > >
> > > > > > OK but what about freeze/restore functions?
> > > >
> > > > So for restore, don't you need to restore the
> > > > sriov capability?
> > >
> > > Currently I'm not familiar with the PM part.
> > > But I still think the sriov capability should
> > > be handled by PCI core.
> >
> > OK but the point is restore looks just like power up for device.
> >
> > > I'm trying to understand
> > > all the relevant code..
> > > For your question, based on what I found from
> > > the code currently, I guess the sriov capability
> > > will be restored by pci_restore_state() which
> > > will be called by the ops in pci_dev_pm_ops.
> > > The sriov_restore_state() will be called
> > > eventually.
> > >
> > > Best regards,
> > > Tiwei Bie
> >
> > Right but my point is during resume SRIOV gets enabled first before
> > driver ok.
> >
> > Maybe we should relax the requirements in the spec:
> > - only require FEATURES_OK from device, not DRIVER_OK from driver
> > - explain that it only has to happen once, not on each reset,
> > and driver can remember the result
>
> I got your point now! I'd like to relax the
> requirements in the spec.
>
> Best regards,
> Tiwei Bie
Well the ballot approving your change closed. I think we should apply
the first chunks reserving the feature bit then, and defer the rest, and
you can work on new wording documenting the actual behaviour with a new
github issue to track that - does this make sense?
Let's do it quickly though - I don't want to bother the
TC with re-voting the deferral, then the new patch.
> >
> >
> > >
> > > >
> > > >
> > > > > > I also wonder about kexec - virtio.c currently does:
> > > > > >
> > > > > > /* We always start by resetting the device, in case a previous
> > > > > > * driver messed it up. This also tests that code path a little. */
> > > > > > dev->config->reset(dev);
> > > > > >
> > > > > > Do we need to do something like this for sriov?
> > > > >
> > > > > I think VFs are managed by PCI core. Once they are
> > > > > allocated, virtio driver doesn't have to care too
> > > > > much about how to manage them. The proposal for the
> > > > > spec is just to provide a feature bit based virtio
> > > > > way for virtio drivers to know whether a virtio
> > > > > device is SR-IOV capable (and virtio drivers can
> > > > > support configuring SR-IOV based on the feature
> > > > > bit negotiation result).
> > > > >
> > > > > >
> > > > > > I also wonder whether PCI core should disable sriov for us.
> > > > > >
> > > > > >
> > > > > > I wish there was a patch emulating this without vDPA for QEMU,
> > > > > > would make it easy to test your patches. Do you happen
> > > > > > to have something like this?
> > > > >
> > > > > Sorry, currently I don't have anything like this..
> > > > >
> > > > > Best regards,
> > > > > Tiwei Bie
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > >
> > > > > > > v3:
> > > > > > > - Drop the acks;
> > > > > > >
> > > > > > > v2:
> > > > > > > - Disable VFs when unbinding the driver (Alex, MST);
> > > > > > > - Don't use pci_sriov_configure_simple (Alex);
> > > > > > >
> > > > > > > drivers/virtio/virtio_pci_common.c | 30 ++++++++++++++++++++++++++++++
> > > > > > > drivers/virtio/virtio_pci_modern.c | 14 ++++++++++++++
> > > > > > > include/uapi/linux/virtio_config.h | 7 ++++++-
> > > > > > > 3 files changed, 50 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > > > > > index 48d4d1cf1cb6..1d4467b2dc31 100644
> > > > > > > --- a/drivers/virtio/virtio_pci_common.c
> > > > > > > +++ b/drivers/virtio/virtio_pci_common.c
> > > > > > > @@ -577,6 +577,8 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> > > > > > > struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > > > > > > struct device *dev = get_device(&vp_dev->vdev.dev);
> > > > > > >
> > > > > > > + pci_disable_sriov(pci_dev);
> > > > > > > +
> > > > > > > unregister_virtio_device(&vp_dev->vdev);
> > > > > > >
> > > > > > > if (vp_dev->ioaddr)
> > > > > > > @@ -588,6 +590,33 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> > > > > > > put_device(dev);
> > > > > > > }
> > > > > > >
> > > > > > > +static int virtio_pci_sriov_configure(struct pci_dev *pci_dev, int num_vfs)
> > > > > > > +{
> > > > > > > + struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > > > > > > + struct virtio_device *vdev = &vp_dev->vdev;
> > > > > > > + int ret;
> > > > > > > +
> > > > > > > + if (!(vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_DRIVER_OK))
> > > > > > > + return -EBUSY;
> > > > > > > +
> > > > > > > + if (!__virtio_test_bit(vdev, VIRTIO_F_SR_IOV))
> > > > > > > + return -EINVAL;
> > > > > > > +
> > > > > > > + if (pci_vfs_assigned(pci_dev))
> > > > > > > + return -EPERM;
> > > > > > > +
> > > > > > > + if (num_vfs == 0) {
> > > > > > > + pci_disable_sriov(pci_dev);
> > > > > > > + return 0;
> > > > > > > + }
> > > > > > > +
> > > > > > > + ret = pci_enable_sriov(pci_dev, num_vfs);
> > > > > > > + if (ret < 0)
> > > > > > > + return ret;
> > > > > > > +
> > > > > > > + return num_vfs;
> > > > > > > +}
> > > > > > > +
> > > > > > > static struct pci_driver virtio_pci_driver = {
> > > > > > > .name = "virtio-pci",
> > > > > > > .id_table = virtio_pci_id_table,
> > > > > > > @@ -596,6 +625,7 @@ static struct pci_driver virtio_pci_driver = {
> > > > > > > #ifdef CONFIG_PM_SLEEP
> > > > > > > .driver.pm = &virtio_pci_pm_ops,
> > > > > > > #endif
> > > > > > > + .sriov_configure = virtio_pci_sriov_configure,
> > > > > > > };
> > > > > > >
> > > > > > > module_pci_driver(virtio_pci_driver);
> > > > > > > diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > > > > > > index 2555d80f6eec..07571daccfec 100644
> > > > > > > --- a/drivers/virtio/virtio_pci_modern.c
> > > > > > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > > > > > @@ -153,14 +153,28 @@ static u64 vp_get_features(struct virtio_device *vdev)
> > > > > > > return features;
> > > > > > > }
> > > > > > >
> > > > > > > +static void vp_transport_features(struct virtio_device *vdev, u64 features)
> > > > > > > +{
> > > > > > > + struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > > > + struct pci_dev *pci_dev = vp_dev->pci_dev;
> > > > > > > +
> > > > > > > + if ((features & BIT_ULL(VIRTIO_F_SR_IOV)) &&
> > > > > > > + pci_find_ext_capability(pci_dev, PCI_EXT_CAP_ID_SRIOV))
> > > > > > > + __virtio_set_bit(vdev, VIRTIO_F_SR_IOV);
> > > > > > > +}
> > > > > > > +
> > > > > > > /* virtio config->finalize_features() implementation */
> > > > > > > static int vp_finalize_features(struct virtio_device *vdev)
> > > > > > > {
> > > > > > > struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > > > + u64 features = vdev->features;
> > > > > > >
> > > > > > > /* Give virtio_ring a chance to accept features. */
> > > > > > > vring_transport_features(vdev);
> > > > > > >
> > > > > > > + /* Give virtio_pci a chance to accept features. */
> > > > > > > + vp_transport_features(vdev, features);
> > > > > > > +
> > > > > > > if (!__virtio_test_bit(vdev, VIRTIO_F_VERSION_1)) {
> > > > > > > dev_err(&vdev->dev, "virtio: device uses modern interface "
> > > > > > > "but does not have VIRTIO_F_VERSION_1\n");
> > > > > > > diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> > > > > > > index 308e2096291f..b7c1f4e7d59e 100644
> > > > > > > --- a/include/uapi/linux/virtio_config.h
> > > > > > > +++ b/include/uapi/linux/virtio_config.h
> > > > > > > @@ -49,7 +49,7 @@
> > > > > > > * transport being used (eg. virtio_ring), the rest are per-device feature
> > > > > > > * bits. */
> > > > > > > #define VIRTIO_TRANSPORT_F_START 28
> > > > > > > -#define VIRTIO_TRANSPORT_F_END 34
> > > > > > > +#define VIRTIO_TRANSPORT_F_END 38
> > > > > > >
> > > > > > > #ifndef VIRTIO_CONFIG_NO_LEGACY
> > > > > > > /* Do we get callbacks when the ring is completely used, even if we've
> > > > > > > @@ -71,4 +71,9 @@
> > > > > > > * this is for compatibility with legacy systems.
> > > > > > > */
> > > > > > > #define VIRTIO_F_IOMMU_PLATFORM 33
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Does the device support Single Root I/O Virtualization?
> > > > > > > + */
> > > > > > > +#define VIRTIO_F_SR_IOV 37
> > > > > > > #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
> > > > > > > --
> > > > > > > 2.17.0
> > > > > >
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > >
^ permalink raw reply
* Re: [virtio-dev] Re: [PATCH v3] virtio_pci: support enabling VFs
From: Tiwei Bie @ 2018-06-06 15:08 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: alexander.h.duyck, virtio-dev, linux-pci, linux-kernel,
virtualization, stefanha, zhihong.wang, bhelgaas, mark.d.rustad
In-Reply-To: <20180606172139-mutt-send-email-mst@kernel.org>
On Wed, Jun 06, 2018 at 05:27:07PM +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 06, 2018 at 10:19:43PM +0800, Tiwei Bie wrote:
> > On Wed, Jun 06, 2018 at 03:44:11PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, Jun 06, 2018 at 08:11:54PM +0800, Tiwei Bie wrote:
> > > > On Tue, Jun 05, 2018 at 03:23:11PM +0300, Michael S. Tsirkin wrote:
> > > > > On Tue, Jun 05, 2018 at 09:36:53AM +0800, Tiwei Bie wrote:
> > > > > > On Mon, Jun 04, 2018 at 07:32:25PM +0300, Michael S. Tsirkin wrote:
> > > > > > > On Fri, Jun 01, 2018 at 12:02:39PM +0800, Tiwei Bie wrote:
> > > > > > > > There is a new feature bit allocated in virtio spec to
> > > > > > > > support SR-IOV (Single Root I/O Virtualization):
> > > > > > > >
> > > > > > > > https://github.com/oasis-tcs/virtio-spec/issues/11
> > > > > > > >
> > > > > > > > This patch enables the support for this feature bit in
> > > > > > > > virtio driver.
> > > > > > > >
> > > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > > > ---
> > > > > > >
> > > > > > > OK but what about freeze/restore functions?
> > > > >
> > > > > So for restore, don't you need to restore the
> > > > > sriov capability?
> > > >
> > > > Currently I'm not familiar with the PM part.
> > > > But I still think the sriov capability should
> > > > be handled by PCI core.
> > >
> > > OK but the point is restore looks just like power up for device.
> > >
> > > > I'm trying to understand
> > > > all the relevant code..
> > > > For your question, based on what I found from
> > > > the code currently, I guess the sriov capability
> > > > will be restored by pci_restore_state() which
> > > > will be called by the ops in pci_dev_pm_ops.
> > > > The sriov_restore_state() will be called
> > > > eventually.
> > > >
> > > > Best regards,
> > > > Tiwei Bie
> > >
> > > Right but my point is during resume SRIOV gets enabled first before
> > > driver ok.
> > >
> > > Maybe we should relax the requirements in the spec:
> > > - only require FEATURES_OK from device, not DRIVER_OK from driver
> > > - explain that it only has to happen once, not on each reset,
> > > and driver can remember the result
> >
> > I got your point now! I'd like to relax the
> > requirements in the spec.
> >
> > Best regards,
> > Tiwei Bie
>
> Well the ballot approving your change closed. I think we should apply
> the first chunks reserving the feature bit then, and defer the rest, and
> you can work on new wording documenting the actual behaviour with a new
> github issue to track that - does this make sense?
> Let's do it quickly though - I don't want to bother the
> TC with re-voting the deferral, then the new patch.
Yeah. It makes sense! I think it's a good idea
to reserve the feature bit first and defer the
rest. I'll work on the new wording documenting
the actual behavior with a new github issue to
track it. Thanks a lot!
Best regards,
Tiwei Bie
>
> > >
> > >
> > > >
> > > > >
> > > > >
> > > > > > > I also wonder about kexec - virtio.c currently does:
> > > > > > >
> > > > > > > /* We always start by resetting the device, in case a previous
> > > > > > > * driver messed it up. This also tests that code path a little. */
> > > > > > > dev->config->reset(dev);
> > > > > > >
> > > > > > > Do we need to do something like this for sriov?
> > > > > >
> > > > > > I think VFs are managed by PCI core. Once they are
> > > > > > allocated, virtio driver doesn't have to care too
> > > > > > much about how to manage them. The proposal for the
> > > > > > spec is just to provide a feature bit based virtio
> > > > > > way for virtio drivers to know whether a virtio
> > > > > > device is SR-IOV capable (and virtio drivers can
> > > > > > support configuring SR-IOV based on the feature
> > > > > > bit negotiation result).
> > > > > >
> > > > > > >
> > > > > > > I also wonder whether PCI core should disable sriov for us.
> > > > > > >
> > > > > > >
> > > > > > > I wish there was a patch emulating this without vDPA for QEMU,
> > > > > > > would make it easy to test your patches. Do you happen
> > > > > > > to have something like this?
> > > > > >
> > > > > > Sorry, currently I don't have anything like this..
> > > > > >
> > > > > > Best regards,
> > > > > > Tiwei Bie
> > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > >
> > > > > > > > v3:
> > > > > > > > - Drop the acks;
> > > > > > > >
> > > > > > > > v2:
> > > > > > > > - Disable VFs when unbinding the driver (Alex, MST);
> > > > > > > > - Don't use pci_sriov_configure_simple (Alex);
> > > > > > > >
> > > > > > > > drivers/virtio/virtio_pci_common.c | 30 ++++++++++++++++++++++++++++++
> > > > > > > > drivers/virtio/virtio_pci_modern.c | 14 ++++++++++++++
> > > > > > > > include/uapi/linux/virtio_config.h | 7 ++++++-
> > > > > > > > 3 files changed, 50 insertions(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > > > > > > index 48d4d1cf1cb6..1d4467b2dc31 100644
> > > > > > > > --- a/drivers/virtio/virtio_pci_common.c
> > > > > > > > +++ b/drivers/virtio/virtio_pci_common.c
> > > > > > > > @@ -577,6 +577,8 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> > > > > > > > struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > > > > > > > struct device *dev = get_device(&vp_dev->vdev.dev);
> > > > > > > >
> > > > > > > > + pci_disable_sriov(pci_dev);
> > > > > > > > +
> > > > > > > > unregister_virtio_device(&vp_dev->vdev);
> > > > > > > >
> > > > > > > > if (vp_dev->ioaddr)
> > > > > > > > @@ -588,6 +590,33 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> > > > > > > > put_device(dev);
> > > > > > > > }
> > > > > > > >
> > > > > > > > +static int virtio_pci_sriov_configure(struct pci_dev *pci_dev, int num_vfs)
> > > > > > > > +{
> > > > > > > > + struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > > > > > > > + struct virtio_device *vdev = &vp_dev->vdev;
> > > > > > > > + int ret;
> > > > > > > > +
> > > > > > > > + if (!(vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_DRIVER_OK))
> > > > > > > > + return -EBUSY;
> > > > > > > > +
> > > > > > > > + if (!__virtio_test_bit(vdev, VIRTIO_F_SR_IOV))
> > > > > > > > + return -EINVAL;
> > > > > > > > +
> > > > > > > > + if (pci_vfs_assigned(pci_dev))
> > > > > > > > + return -EPERM;
> > > > > > > > +
> > > > > > > > + if (num_vfs == 0) {
> > > > > > > > + pci_disable_sriov(pci_dev);
> > > > > > > > + return 0;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + ret = pci_enable_sriov(pci_dev, num_vfs);
> > > > > > > > + if (ret < 0)
> > > > > > > > + return ret;
> > > > > > > > +
> > > > > > > > + return num_vfs;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > static struct pci_driver virtio_pci_driver = {
> > > > > > > > .name = "virtio-pci",
> > > > > > > > .id_table = virtio_pci_id_table,
> > > > > > > > @@ -596,6 +625,7 @@ static struct pci_driver virtio_pci_driver = {
> > > > > > > > #ifdef CONFIG_PM_SLEEP
> > > > > > > > .driver.pm = &virtio_pci_pm_ops,
> > > > > > > > #endif
> > > > > > > > + .sriov_configure = virtio_pci_sriov_configure,
> > > > > > > > };
> > > > > > > >
> > > > > > > > module_pci_driver(virtio_pci_driver);
> > > > > > > > diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> > > > > > > > index 2555d80f6eec..07571daccfec 100644
> > > > > > > > --- a/drivers/virtio/virtio_pci_modern.c
> > > > > > > > +++ b/drivers/virtio/virtio_pci_modern.c
> > > > > > > > @@ -153,14 +153,28 @@ static u64 vp_get_features(struct virtio_device *vdev)
> > > > > > > > return features;
> > > > > > > > }
> > > > > > > >
> > > > > > > > +static void vp_transport_features(struct virtio_device *vdev, u64 features)
> > > > > > > > +{
> > > > > > > > + struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > > > > + struct pci_dev *pci_dev = vp_dev->pci_dev;
> > > > > > > > +
> > > > > > > > + if ((features & BIT_ULL(VIRTIO_F_SR_IOV)) &&
> > > > > > > > + pci_find_ext_capability(pci_dev, PCI_EXT_CAP_ID_SRIOV))
> > > > > > > > + __virtio_set_bit(vdev, VIRTIO_F_SR_IOV);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > /* virtio config->finalize_features() implementation */
> > > > > > > > static int vp_finalize_features(struct virtio_device *vdev)
> > > > > > > > {
> > > > > > > > struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> > > > > > > > + u64 features = vdev->features;
> > > > > > > >
> > > > > > > > /* Give virtio_ring a chance to accept features. */
> > > > > > > > vring_transport_features(vdev);
> > > > > > > >
> > > > > > > > + /* Give virtio_pci a chance to accept features. */
> > > > > > > > + vp_transport_features(vdev, features);
> > > > > > > > +
> > > > > > > > if (!__virtio_test_bit(vdev, VIRTIO_F_VERSION_1)) {
> > > > > > > > dev_err(&vdev->dev, "virtio: device uses modern interface "
> > > > > > > > "but does not have VIRTIO_F_VERSION_1\n");
> > > > > > > > diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> > > > > > > > index 308e2096291f..b7c1f4e7d59e 100644
> > > > > > > > --- a/include/uapi/linux/virtio_config.h
> > > > > > > > +++ b/include/uapi/linux/virtio_config.h
> > > > > > > > @@ -49,7 +49,7 @@
> > > > > > > > * transport being used (eg. virtio_ring), the rest are per-device feature
> > > > > > > > * bits. */
> > > > > > > > #define VIRTIO_TRANSPORT_F_START 28
> > > > > > > > -#define VIRTIO_TRANSPORT_F_END 34
> > > > > > > > +#define VIRTIO_TRANSPORT_F_END 38
> > > > > > > >
> > > > > > > > #ifndef VIRTIO_CONFIG_NO_LEGACY
> > > > > > > > /* Do we get callbacks when the ring is completely used, even if we've
> > > > > > > > @@ -71,4 +71,9 @@
> > > > > > > > * this is for compatibility with legacy systems.
> > > > > > > > */
> > > > > > > > #define VIRTIO_F_IOMMU_PLATFORM 33
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * Does the device support Single Root I/O Virtualization?
> > > > > > > > + */
> > > > > > > > +#define VIRTIO_F_SR_IOV 37
> > > > > > > > #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
> > > > > > > > --
> > > > > > > > 2.17.0
> > > > > > >
> > > > > > > ---------------------------------------------------------------------
> > > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > > >
^ permalink raw reply
* Re: [PATCH v2 1/2] compiler-gcc.h: add gnu_inline to all inline declarations
From: hpa @ 2018-06-06 16:39 UTC (permalink / raw)
To: sedat.dilek
Cc: linux-efi, brijesh.singh, boris.ostrovsky, jan.kiszka,
will.deacon, jarkko.sakkinen, virtualization, yamada.masahiro,
manojgupta, akataria, tweek, mawilcox, x86, ghackmann, mka, geert,
rientjes, aryabinin, thomas.lendacky, Arnd Bergmann, linux-kbuild,
keescook, rostedt, caoj.fnst, michal.lkml, jpoimboe, tglx, mingo,
jgross, astrachan, ard.biesheuvel, gregkh, tstellar,
Nick Desaulniers, linux-kernel, mjg59
In-Reply-To: <CA+icZUXEpe954z0TSbhSJPdFCfOqsBtpO9mPZBcJ2HWubSo_zQ@mail.gmail.com>
On June 6, 2018 1:05:45 AM PDT, Sedat Dilek <sedat.dilek@gmail.com> wrote:
>Hi,
>
>when discovering 'gnu_inline', I found ...
>
>$ git grep -w __FORTIFY_INLINE
>include/linux/string.h:#define __FORTIFY_INLINE extern __always_inline
>__attribute__((gnu_inline))
>include/linux/string.h:__FORTIFY_INLINE char *strncpy(char *p, const
>char *q, __kernel_size_t size)
>include/linux/string.h:__FORTIFY_INLINE char *strcat(char *p, const
>char *q)
>include/linux/string.h:__FORTIFY_INLINE __kernel_size_t strlen(const
>char *p)
>include/linux/string.h:__FORTIFY_INLINE __kernel_size_t strnlen(const
>char *p, __kernel_size_t maxlen)
>include/linux/string.h:__FORTIFY_INLINE size_t strlcpy(char *p, const
>char *q, size_t size)
>include/linux/string.h:__FORTIFY_INLINE char *strncat(char *p, const
>char *q, __kernel_size_t count)
>include/linux/string.h:__FORTIFY_INLINE void *memset(void *p, int c,
>__kernel_size_t size)
>include/linux/string.h:__FORTIFY_INLINE void *memcpy(void *p, const
>void *q, __kernel_size_t size)
>include/linux/string.h:__FORTIFY_INLINE void *memmove(void *p, const
>void *q, __kernel_size_t size)
>include/linux/string.h:__FORTIFY_INLINE void *memscan(void *p, int c,
>__kernel_size_t size)
>include/linux/string.h:__FORTIFY_INLINE int memcmp(const void *p,
>const void *q, __kernel_size_t size)
>include/linux/string.h:__FORTIFY_INLINE void *memchr(const void *p,
>int c, __kernel_size_t size)
>include/linux/string.h:__FORTIFY_INLINE void *memchr_inv(const void
>*p, int c, size_t size)
>include/linux/string.h:__FORTIFY_INLINE void *kmemdup(const void *p,
>size_t size, gfp_t gfp)
>include/linux/string.h:__FORTIFY_INLINE char *strcpy(char *p, const
>char *q)
>
>After the inline changes suggested by Joe this can be adapted?
>
>Beyond this, a general question: Can someone explain why all these
>inline defines are in compiler-gcc.h (as there exists compiler.h and
>compiler-clang.h)?
>
>Thanks.
>
>Regards,
>- Sedat -
Because gcc itself also supports both GNU89-style and C99-style inlines, but the kernel was built with the former, and it is not necessarily a trivial modification, except for "static inline" which is the same for both.
The other option is to pass -fgnu89-inline on the command line, which is supported by both gcc and clang. The two methods are fully equivalent.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
^ permalink raw reply
* Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Samudrala, Sridhar @ 2018-06-06 18:17 UTC (permalink / raw)
To: Jason Wang, mst, virtualization, virtio-dev, jesse.brandeburg,
alexander.h.duyck, qemu-devel, laine, libvir-list
In-Reply-To: <d61c88c7-0803-ba1c-08ab-fc668cffa40b@redhat.com>
On 6/4/2018 7:06 PM, Jason Wang wrote:
>
>
> On 2018年06月05日 09:41, Samudrala, Sridhar wrote:
>> Ping on this patch now that the kernel patches are accepted into
>> davem's net-next tree.
>> https://patchwork.ozlabs.org/cover/920005/
>>
>>
>> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote:
>>> This feature bit can be used by hypervisor to indicate virtio_net
>>> device to
>>> act as a standby for another device with the same MAC address.
>>>
>>> I tested this with a small change to the patch to mark the STANDBY
>>> feature 'true'
>>> by default as i am using libvirt to start the VMs.
>>> Is there a way to pass the newly added feature bit 'standby' to qemu
>>> via libvirt
>>> XML file?
>>>
>
> Maybe you can try qemu command line passthrough:
>
> https://libvirt.org/drvqemu.html#qemucommand
It looks like this can be used to pass command line arguments to qemu.
Is it possible to specify a virtio specific attribute via this method?
For ex: to say mrg_rxbuf is off we can add the following line to virtio
section of the domain xml file.
<host mrg_rxbuf='off'/>
I think libvirt needs to be extended to to support the new 'standby' attribute
via this mechanism.
Adding Liane Stump and libvirt to the CC list.
Michael,
Can we start with getting this patch into Qemu and an update to libvirt to
support the 'standby' feature so that this feature can be enabled via
some scripts/orchestration layer for now.
We could improve this solution by enhancing Qemu to do automatic management of the
addition/deletion of the primary device based on feature negotiation as a later patch.
>
> The patch looks good to me. Let's see if Michael have any comment.
>
> Thanks
>
>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>> ---
>>> hw/net/virtio-net.c | 2 ++
>>> include/standard-headers/linux/virtio_net.h | 3 +++
>>> 2 files changed, 5 insertions(+)
>>>
>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>> index 90502fca7c..38b3140670 100644
>>> --- a/hw/net/virtio-net.c
>>> +++ b/hw/net/virtio-net.c
>>> @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
>>> true),
>>> DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed,
>>> SPEED_UNKNOWN),
>>> DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
>>> + DEFINE_PROP_BIT64("standby", VirtIONet, host_features,
>>> VIRTIO_NET_F_STANDBY,
>>> + false),
>>> DEFINE_PROP_END_OF_LIST(),
>>> };
>>> diff --git a/include/standard-headers/linux/virtio_net.h
>>> b/include/standard-headers/linux/virtio_net.h
>>> index e9f255ea3f..01ec09684c 100644
>>> --- a/include/standard-headers/linux/virtio_net.h
>>> +++ b/include/standard-headers/linux/virtio_net.h
>>> @@ -57,6 +57,9 @@
>>> * Steering */
>>> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
>>> +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for
>>> another device
>>> + * with the same MAC.
>>> + */
>>> #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed
>>> and duplex */
>>> #ifndef VIRTIO_NET_NO_LEGACY
>>
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-06 18:53 UTC (permalink / raw)
To: Samudrala, Sridhar
Cc: alexander.h.duyck, virtio-dev, libvir-list, qemu-devel, laine,
virtualization
In-Reply-To: <8e411566-1146-03a2-5372-8f8134ee0ce6@intel.com>
On Wed, Jun 06, 2018 at 11:17:36AM -0700, Samudrala, Sridhar wrote:
> On 6/4/2018 7:06 PM, Jason Wang wrote:
> >
> >
> > On 2018年06月05日 09:41, Samudrala, Sridhar wrote:
> > > Ping on this patch now that the kernel patches are accepted into
> > > davem's net-next tree.
> > > https://patchwork.ozlabs.org/cover/920005/
> > >
> > >
> > > On 5/7/2018 4:09 PM, Sridhar Samudrala wrote:
> > > > This feature bit can be used by hypervisor to indicate
> > > > virtio_net device to
> > > > act as a standby for another device with the same MAC address.
> > > >
> > > > I tested this with a small change to the patch to mark the
> > > > STANDBY feature 'true'
> > > > by default as i am using libvirt to start the VMs.
> > > > Is there a way to pass the newly added feature bit 'standby' to
> > > > qemu via libvirt
> > > > XML file?
> > > >
> >
> > Maybe you can try qemu command line passthrough:
> >
> > https://libvirt.org/drvqemu.html#qemucommand
>
> It looks like this can be used to pass command line arguments to qemu.
> Is it possible to specify a virtio specific attribute via this method?
>
> For ex: to say mrg_rxbuf is off we can add the following line to virtio
> section of the domain xml file.
> <host mrg_rxbuf='off'/>
>
> I think libvirt needs to be extended to to support the new 'standby' attribute
> via this mechanism.
> Adding Liane Stump and libvirt to the CC list.
>
> Michael,
> Can we start with getting this patch into Qemu and an update to libvirt to
> support the 'standby' feature so that this feature can be enabled via
> some scripts/orchestration layer for now.
Unfortunately we don't give the hypothetical orchestration layer
enough info about driver being bound, so it does not know
when is it safe to add a primary. And a similar issue around reset.
We could add events for that which would be a small patch,
but the issue then is that guest can trigger a storm of
these events.
> We could improve this solution by enhancing Qemu to do automatic management of the
> addition/deletion of the primary device based on feature negotiation as a later patch.
I'm not sure what the point of the back and forth would be though.
Typically after people invest in a specific config and get it working,
it's very hard to move them to another solution.
>
> >
> > The patch looks good to me. Let's see if Michael have any comment.
> >
> > Thanks
> >
> > > > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > > > ---
> > > > hw/net/virtio-net.c | 2 ++
> > > > include/standard-headers/linux/virtio_net.h | 3 +++
> > > > 2 files changed, 5 insertions(+)
> > > >
> > > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > > index 90502fca7c..38b3140670 100644
> > > > --- a/hw/net/virtio-net.c
> > > > +++ b/hw/net/virtio-net.c
> > > > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
> > > > true),
> > > > DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed,
> > > > SPEED_UNKNOWN),
> > > > DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> > > > + DEFINE_PROP_BIT64("standby", VirtIONet, host_features,
> > > > VIRTIO_NET_F_STANDBY,
> > > > + false),
> > > > DEFINE_PROP_END_OF_LIST(),
> > > > };
> > > > diff --git a/include/standard-headers/linux/virtio_net.h
> > > > b/include/standard-headers/linux/virtio_net.h
> > > > index e9f255ea3f..01ec09684c 100644
> > > > --- a/include/standard-headers/linux/virtio_net.h
> > > > +++ b/include/standard-headers/linux/virtio_net.h
> > > > @@ -57,6 +57,9 @@
> > > > * Steering */
> > > > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
> > > > +#define VIRTIO_NET_F_STANDBY 62 /* Act as standby for
> > > > another device
> > > > + * with the same MAC.
> > > > + */
> > > > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set
> > > > linkspeed and duplex */
> > > > #ifndef VIRTIO_NET_NO_LEGACY
> > >
> >
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [libvirt] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Samudrala, Sridhar @ 2018-06-06 19:39 UTC (permalink / raw)
To: Ján Tomko
Cc: alexander.h.duyck, virtio-dev, mst, qemu-devel, virtualization,
libvir-list, laine
In-Reply-To: <20180606185214.GH6767@dnr>
On 6/6/2018 11:52 AM, Ján Tomko wrote:
> On Wed, Jun 06, 2018 at 11:17:36AM -0700, Samudrala, Sridhar wrote:
>> On 6/4/2018 7:06 PM, Jason Wang wrote:
>>>
>>>
>>> On 2018年06月05日 09:41, Samudrala, Sridhar wrote:
>>>> Ping on this patch now that the kernel patches are accepted into
>>>> davem's net-next tree.
>>>> https://patchwork.ozlabs.org/cover/920005/
>>>>
>>>>
>>>> On 5/7/2018 4:09 PM, Sridhar Samudrala wrote:
>>>>> This feature bit can be used by hypervisor to indicate virtio_net
>>>>> device to
>>>>> act as a standby for another device with the same MAC address.
>>>>>
>>>>> I tested this with a small change to the patch to mark the STANDBY
>>>>> feature 'true'
>>>>> by default as i am using libvirt to start the VMs.
>>>>> Is there a way to pass the newly added feature bit 'standby' to qemu
>>>>> via libvirt
>>>>> XML file?
>>>>>
>>>
>>> Maybe you can try qemu command line passthrough:
>>>
>>> https://libvirt.org/drvqemu.html#qemucommand
>>
>> It looks like this can be used to pass command line arguments to qemu.
>> Is it possible to specify a virtio specific attribute via this method?
>>
>
> Yes, for testing purposes you should be able to do this via using QEMU's
> -set command line argument:
> http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html
>
> i.e.:
>
> <domain type='kvm'
> xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
> ...
> <qemu:commandline>
> <qemu:arg value='-set'/>
> <qemu:arg value='device.net0.standby=on'/>
> </qemu:commandline>
> </domain>
Thanks. Will try this.
>
>> For ex: to say mrg_rxbuf is off we can add the following line to virtio
>> section of the domain xml file.
>> <host mrg_rxbuf='off'/>
>>
>> I think libvirt needs to be extended to to support the new 'standby'
>> attribute
>> via this mechanism.
>> Adding Liane Stump and libvirt to the CC list.
>
> *Laine
>
>>
>> Michael,
>> Can we start with getting this patch into Qemu and an update to
>> libvirt to
>> support the 'standby' feature so that this feature can be enabled via
>> some scripts/orchestration layer for now.
>>
>> We could improve this solution by enhancing Qemu to do automatic
>> management of the
>> addition/deletion of the primary device based on feature negotiation
>> as a later patch.
>>
>
> If that means the libvirt attribute would no longer be needed, I don't
> see the reason to add it to libvirt in the first place.
I think we still need this attribute supported via libvirt so that a user/admin
can enable this feature via domain XML specification.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices
From: Christoph Hellwig @ 2018-06-07 5:23 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: robh, pawel.moll, Tom Lendacky, cohuck, Ram Pai, linux-kernel,
virtualization, hch, joe, Rustad, Mark D, david, linuxppc-dev,
elfring, Anshuman Khandual
In-Reply-To: <20180531204320-mutt-send-email-mst@kernel.org>
On Thu, May 31, 2018 at 08:43:58PM +0300, Michael S. Tsirkin wrote:
> Pls work on a long term solution. Short term needs can be served by
> enabling the iommu platform in qemu.
So, I spent some time looking at converting virtio to dma ops overrides,
and the current virtio spec, and the sad through I have to tell is that
both the spec and the Linux implementation are complete and utterly fucked
up.
Both in the flag naming and the implementation there is an implication
of DMA API == IOMMU, which is fundamentally wrong.
The DMA API does a few different things:
a) address translation
This does include IOMMUs. But it also includes random offsets
between PCI bars and system memory that we see on various
platforms. Worse so some of these offsets might be based on
banks, e.g. on the broadcom bmips platform. It also deals
with bitmask in physical addresses related to memory encryption
like AMD SEV. I'd be really curious how for example the
Intel virtio based NIC is going to work on any of those
plaforms.
b) coherency
On many architectures DMA is not cache coherent, and we need
to invalidate and/or write back cache lines before doing
DMA. Again, I wonder how this is every going to work with
hardware based virtio implementations. Even worse I think this
is actually broken at least for VIVT event for virtualized
implementations. E.g. a KVM guest is going to access memory
using different virtual addresses than qemu, vhost might throw
in another different address space.
c) bounce buffering
Many DMA implementations can not address all physical memory
due to addressing limitations. In such cases we copy the
DMA memory into a known addressable bounc buffer and DMA
from there.
d) flushing write combining buffers or similar
On some hardware platforms we need workarounds to e.g. read
from a certain mmio address to make sure DMA can actually
see memory written by the host.
All of this is bypassed by virtio by default despite generally being
platform issues, not particular to a given device.
^ permalink raw reply
* Re: [RFC v6 4/5] virtio_ring: add event idx support in packed ring
From: Jason Wang @ 2018-06-07 9:50 UTC (permalink / raw)
To: Tiwei Bie, mst, virtualization, linux-kernel, netdev; +Cc: wexu
In-Reply-To: <20180605074046.20709-5-tiwei.bie@intel.com>
On 2018年06月05日 15:40, Tiwei Bie wrote:
> static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> + u16 bufs, used_idx, wrap_counter;
>
> START_USE(vq);
>
> /* We optimistically turn back on interrupts, then check if there was
> * more to do. */
> + /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
> + * either clear the flags bit or point the event index at the next
> + * entry. Always update the event index to keep code simple. */
> +
Maybe for packed ring, it's time to treat event index separately to
avoid a virtio_wmb() for event idx is off.
> + /* TODO: tune this threshold */
> + if (vq->next_avail_idx < vq->last_used_idx)
> + bufs = (vq->vring_packed.num + vq->next_avail_idx -
> + vq->last_used_idx) * 3 / 4;
> + else
> + bufs = (vq->next_avail_idx - vq->last_used_idx) * 3 / 4;
vq->next_avail-idx could be equal to vq->last_usd_idx when the ring is
full. Though virito-net is the only user now and it can guarantee this
won't happen. But consider this is a core API, we should make sure it
can work for any cases.
It looks to me that bufs is just vq->vring_packed.num - vq->num_free?
> +
> + wrap_counter = vq->used_wrap_counter;
> +
> + used_idx = vq->last_used_idx + bufs;
> + if (used_idx >= vq->vring_packed.num) {
> + used_idx -= vq->vring_packed.num;
> + wrap_counter ^= 1;
> + }
> +
> + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> + used_idx | (wrap_counter << 15));
>
> if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> - vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> + /* We need to update event offset and event wrap
> + * counter first before updating event flags. */
> + virtio_wmb(vq->weak_barriers);
> + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> + VRING_EVENT_F_ENABLE;
> vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> vq->event_flags_shadow);
> - /* We need to enable interrupts first before re-checking
> - * for more used buffers. */
> - virtio_mb(vq->weak_barriers);
> }
>
> + /* We need to update event suppression structure first
> + * before re-checking for more used buffers. */
> + virtio_mb(vq->weak_barriers);
> +
> if (more_used_packed(vq)) {
> END_USE(vq);
> return false;
I think what we need to to make sure the descriptor used_idx is used?
Otherwise we may stop and restart qdisc too frequently?
Thanks
> --
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v3] virtio: update the comments for transport features
From: Stefan Hajnoczi @ 2018-06-07 13:04 UTC (permalink / raw)
To: Tiwei Bie; +Cc: virtio-dev, mst, cohuck, linux-kernel, virtualization
In-Reply-To: <20180601112632.12651-1-tiwei.bie@intel.com>
[-- Attachment #1.1: Type: text/plain, Size: 776 bytes --]
On Fri, Jun 01, 2018 at 07:26:32PM +0800, Tiwei Bie wrote:
> The existing comments for transport features are outdated.
> So update them to address the latest changes in the spec.
>
> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
> ---
> This patch is generated on top of below patch:
> https://lists.oasis-open.org/archives/virtio-dev/201805/msg00212.html
>
> v3:
> - Fix a typo in the commit message (Cornelia);
>
> v2:
> - Improve the comments (Cornelia);
> - Improve the commit message;
>
> include/uapi/linux/virtio_config.h | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [PATCH v6] virtio_blk: add DISCARD and WRIET ZEROES commands support
From: Stefan Hajnoczi @ 2018-06-07 13:10 UTC (permalink / raw)
To: Changpeng Liu; +Cc: pbonzini, cavery, virtualization
In-Reply-To: <1528258740-6581-1-git-send-email-changpeng.liu@intel.com>
[-- Attachment #1.1: Type: text/plain, Size: 3765 bytes --]
On Wed, Jun 06, 2018 at 12:19:00PM +0800, Changpeng Liu wrote:
> Existing virtio-blk protocol doesn't have DISCARD/WRITE ZEROES commands
> support, this will impact the performance when using SSD backend over
> file systems.
>
> Commit 88c85538 "virtio-blk: add discard and write zeroes features to
> specification"(see https://github.com/oasis-tcs/virtio-spec) extended
> existing virtio-blk protocol, adding extra DISCARD and WRITE ZEROES
> commands support.
>
> While here, using 16 bytes descriptor to describe one segment of DISCARD
> or WRITE ZEROES commands, each command may contain one or more decriptors.
>
> The following data structure shows the definition of one descriptor:
>
> struct virtio_blk_discard_write_zeroes {
> le64 sector;
> le32 num_sectors;
> le32 unmap;
> };
>
> Field 'sector' means the start sector for DISCARD and WRITE ZEROES,
> filed 'num_sectors' means the number of sectors for DISCARD and WRITE
> ZEROES, if both DISCARD and WRITE ZEROES are supported, field 'unmap'
> maybe used for WRITE ZEROES command with DISCARD enabled.
>
> We also extended the virtio-blk configuration space to let backend
> device put DISCARD and WRITE ZEROES configuration parameters.
>
> struct virtio_blk_config {
> [...]
>
> le32 max_discard_sectors;
> le32 max_discard_seg;
> le32 discard_sector_alignment;
> le32 max_write_zeroes_sectors;
> le32 max_write_zeroes_seg;
> u8 write_zeroes_may_unmap;
> }
>
> New feature bit [VIRTIO_BLK_F_DISCARD (13)]: Device can support discard
> command, maximum discard sectors size in field 'max_discard_sectors' and
> maximum discard segment number in field 'max_discard_seg'.
>
> New feature [VIRTIO_BLK_F_WRITE_ZEROES (14)]: Device can support write
> zeroes command, maximum write zeroes sectors size in field
> 'max_write_zeroes_sectors' and maximum write zeroes segment number in
> field 'max_write_zeroes_seg'.
>
> The parameters in the configuration space of the device field
> 'max_discard_sectors' and field 'discard_sector_alignment' are expressed in
> 512-byte units if the VIRTIO_BLK_F_DISCARD feature bit is negotiated. The
> field 'max_write_zeroes_sectors' is expressed in 512-byte units if the
> VIRTIO_BLK_F_WRITE_ZEROES feature bit is negotiated.
>
> Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
> ---
> CHANGELOG:
> v6: don't set T_OUT bit to discard and write zeroes commands.
I don't see this in the patch...
> @@ -225,6 +260,7 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
> int qid = hctx->queue_num;
> int err;
> bool notify = false;
> + bool unmap = false;
> u32 type;
>
> BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
> @@ -237,6 +273,13 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
> case REQ_OP_FLUSH:
> type = VIRTIO_BLK_T_FLUSH;
> break;
> + case REQ_OP_DISCARD:
> + type = VIRTIO_BLK_T_DISCARD;
> + break;
> + case REQ_OP_WRITE_ZEROES:
> + type = VIRTIO_BLK_T_WRITE_ZEROES;
> + unmap = !(req->cmd_flags & REQ_NOUNMAP);
> + break;
> case REQ_OP_SCSI_IN:
> case REQ_OP_SCSI_OUT:
> type = VIRTIO_BLK_T_SCSI_CMD;
> @@ -256,6 +299,12 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
>
> blk_mq_start_request(req);
>
> + if (type == VIRTIO_BLK_T_DISCARD || type == VIRTIO_BLK_T_WRITE_ZEROES) {
> + err = virtblk_setup_discard_write_zeroes(req, unmap);
> + if (err)
> + return BLK_STS_RESOURCE;
> + }
> +
> num = blk_rq_map_sg(hctx->queue, req, vbr->sg);
> if (num) {
> if (rq_data_dir(req) == WRITE)
...since we still do blk_rq_map_sg() here and num should be != 0.
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: KMSAN: uninit-value in _copy_to_iter (2)
From: Michael S. Tsirkin @ 2018-06-07 15:38 UTC (permalink / raw)
To: syzbot
Cc: willemb, avagin, kvm, netdev, matthew, linux-kernel, mingo,
syzkaller-bugs, edumazet, viro, dingtianhong, pabeni,
virtualization, davem, elena.reshetova
In-Reply-To: <000000000000cf4578056ab12452@google.com>
#syz test: https://github.com/google/kmsan.git/master d2d741e5d1898dfde1a75ea3d29a9a3e2edf0617
Subject: vhost: fix info leak
Fixes: CVE-2018-1118
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index f0be5f35ab28..9beefa6ed1ce 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2345,6 +2345,9 @@ struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type)
struct vhost_msg_node *node = kmalloc(sizeof *node, GFP_KERNEL);
if (!node)
return NULL;
+
+ /* Make sure all padding within the structure is initialized. */
+ memset(&node->msg, 0, sizeof node->msg);
node->vq = vq;
node->msg.type = type;
return node;
^ permalink raw reply related
* Re: KMSAN: uninit-value in _copy_to_iter (2)
From: Dmitry Vyukov via Virtualization @ 2018-06-07 16:25 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Willem de Bruijn, avagin, KVM list, netdev, Matthew Dawson, LKML,
Ingo Molnar, syzkaller-bugs, Eric Dumazet, Al Viro, Ding Tianhong,
syzbot, Paolo Abeni, virtualization, David Miller,
Reshetova, Elena
In-Reply-To: <20180607183627-mutt-send-email-mst@kernel.org>
[-- Attachment #1: Type: text/plain, Size: 1599 bytes --]
On Thu, Jun 7, 2018 at 5:38 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> #syz test: https://github.com/google/kmsan.git/master d2d741e5d1898dfde1a75ea3d29a9a3e2edf0617
Hi Michael,
We need:
#syz test: https://github.com/google/kmsan.git master
here. Please see
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches
for more info.
Please also add the Reported-by tag when mailing the patch for review.
Thanks
> Subject: vhost: fix info leak
>
> Fixes: CVE-2018-1118
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index f0be5f35ab28..9beefa6ed1ce 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2345,6 +2345,9 @@ struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type)
> struct vhost_msg_node *node = kmalloc(sizeof *node, GFP_KERNEL);
> if (!node)
> return NULL;
> +
> + /* Make sure all padding within the structure is initialized. */
> + memset(&node->msg, 0, sizeof node->msg);
> node->vq = vq;
> node->msg.type = type;
> return node;
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20180607183627-mutt-send-email-mst%40kernel.org.
> For more options, visit https://groups.google.com/d/optout.
[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 518 bytes --]
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index f0be5f35ab28..9beefa6ed1ce 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2345,6 +2345,9 @@ struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type)
struct vhost_msg_node *node = kmalloc(sizeof *node, GFP_KERNEL);
if (!node)
return NULL;
+
+ /* Make sure all padding within the structure is initialized. */
+ memset(&node->msg, 0, sizeof node->msg);
node->vq = vq;
node->msg.type = type;
return node;
[-- Attachment #3: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply related
* Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices
From: Michael S. Tsirkin @ 2018-06-07 16:28 UTC (permalink / raw)
To: Christoph Hellwig
Cc: robh, pawel.moll, Tom Lendacky, cohuck, Ram Pai, linux-kernel,
virtualization, joe, Rustad, Mark D, david, linuxppc-dev, elfring,
Anshuman Khandual
In-Reply-To: <20180607052306.GA1532@infradead.org>
On Wed, Jun 06, 2018 at 10:23:06PM -0700, Christoph Hellwig wrote:
> On Thu, May 31, 2018 at 08:43:58PM +0300, Michael S. Tsirkin wrote:
> > Pls work on a long term solution. Short term needs can be served by
> > enabling the iommu platform in qemu.
>
> So, I spent some time looking at converting virtio to dma ops overrides,
> and the current virtio spec, and the sad through I have to tell is that
> both the spec and the Linux implementation are complete and utterly fucked
> up.
Let me restate it: DMA API has support for a wide range of hardware, and
hardware based virtio implementations likely won't benefit from all of
it.
And given virtio right now is optimized for specific workloads, improving
portability without regressing performance isn't easy.
I think it's unsurprising since it started a strictly a guest/host
mechanism. People did implement offloads on specific platforms though,
and they are known to work. To improve portability even further,
we might need to make spec and code changes.
I'm not really sympathetic to people complaining that they can't even
set a flag in qemu though. If that's the case the stack in question is
way too inflexible.
> Both in the flag naming and the implementation there is an implication
> of DMA API == IOMMU, which is fundamentally wrong.
Maybe we need to extend the meaning of PLATFORM_IOMMU or rename it.
It's possible that some setups will benefit from a more
fine-grained approach where some aspects of the DMA
API are bypassed, others aren't.
This seems to be what was being asked for in this thread,
with comments claiming IOMMU flag adds too much overhead.
> The DMA API does a few different things:
>
> a) address translation
>
> This does include IOMMUs. But it also includes random offsets
> between PCI bars and system memory that we see on various
> platforms.
I don't think you mean bars. That's unrelated to DMA.
> Worse so some of these offsets might be based on
> banks, e.g. on the broadcom bmips platform. It also deals
> with bitmask in physical addresses related to memory encryption
> like AMD SEV. I'd be really curious how for example the
> Intel virtio based NIC is going to work on any of those
> plaforms.
SEV guys report that they just set the iommu flag and then it all works.
I guess if there's translation we can think of this as a kind of iommu.
Maybe we should rename PLATFORM_IOMMU to PLARTFORM_TRANSLATION?
And apparently some people complain that just setting that flag makes
qemu check translation on each access with an unacceptable performance
overhead. Forcing same behaviour for everyone on general principles
even without the flag is unlikely to make them happy.
> b) coherency
>
> On many architectures DMA is not cache coherent, and we need
> to invalidate and/or write back cache lines before doing
> DMA. Again, I wonder how this is every going to work with
> hardware based virtio implementations.
You mean dma_Xmb and friends?
There's a new feature VIRTIO_F_IO_BARRIER that's being proposed
for that.
> Even worse I think this
> is actually broken at least for VIVT event for virtualized
> implementations. E.g. a KVM guest is going to access memory
> using different virtual addresses than qemu, vhost might throw
> in another different address space.
I don't really know what VIVT is. Could you help me please?
> c) bounce buffering
>
> Many DMA implementations can not address all physical memory
> due to addressing limitations. In such cases we copy the
> DMA memory into a known addressable bounc buffer and DMA
> from there.
Don't do it then?
> d) flushing write combining buffers or similar
>
> On some hardware platforms we need workarounds to e.g. read
> from a certain mmio address to make sure DMA can actually
> see memory written by the host.
I guess it isn't an issue as long as WC isn't actually used.
It will become an issue when virtio spec adds some WC capability -
I suspect we can ignore this for now.
>
> All of this is bypassed by virtio by default despite generally being
> platform issues, not particular to a given device.
It's both a device and a platform issue. A PV device is often more like
another CPU than like a PCI device.
--
MST
^ permalink raw reply
* Re: KMSAN: uninit-value in _copy_to_iter (2)
From: Michael S. Tsirkin @ 2018-06-07 17:10 UTC (permalink / raw)
To: syzbot
Cc: willemb, avagin, kvm, netdev, matthew, linux-kernel, mingo,
syzkaller-bugs, edumazet, viro, dingtianhong, pabeni,
virtualization, davem, elena.reshetova
In-Reply-To: <000000000000cf4578056ab12452@google.com>
#syz test: https://github.com/google/kmsan.git master
Subject: vhost: fix info leak
Fixes: CVE-2018-1118
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index f0be5f35ab28..9beefa6ed1ce 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2345,6 +2345,9 @@ struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type)
struct vhost_msg_node *node = kmalloc(sizeof *node, GFP_KERNEL);
if (!node)
return NULL;
+
+ /* Make sure all padding within the structure is initialized. */
+ memset(&node->msg, 0, sizeof node->msg);
node->vq = vq;
node->msg.type = type;
return node;
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox