* Re: [PATCH] vhost-net: add module alias
From: Stephen Hemminger @ 2012-01-11 16:58 UTC (permalink / raw)
To: Michael Tokarev
Cc: netdev, virtualization, Kay Sievers, kvm, Michael S. Tsirkin
In-Reply-To: <4F0D3543.7020401@msgid.tls.msk.ru>
On Wed, 11 Jan 2012 11:07:47 +0400
Michael Tokarev <mjt@tls.msk.ru> wrote:
> On 11.01.2012 08:54, Stephen Hemminger wrote:
> > By adding the a module alias, programs (or users) won't have to explicitly
> > call modprobe. Vhost-net will always be available if built into the kernel.
> > It does require assigning a permanent minor number for depmod to work.
> > Choose one next to TUN since this driver is related to it.
>
> Why do you think a statically-allocated device number will do any good
> at all? Static /dev is gone almost completely, at least on the systems
> where whole virt stuff makes any sense, so you don't have pre-created
> vhost-net device anymore, and hence this allocation makes no sense.
> Just IMHO anyway.
The statically allocated device number is required for the udev/module
autoloading to work. Probably the udev infrastructure needs a consistent
number to hang off of.
It looks like:
* driver adds MODULE_ALIAS() for devname and character device
* depmod scans modules and creates modules.devname (in /lib/modules)
* udev uses modules.devname to autoload the module
$ /sbin/modinfo vhost_net
filename: /lib/modules/3.2.0-net+/kernel/drivers/vhost/vhost_net.ko
alias: devname:vhost-net
alias: char-major-10-201
description: Host kernel accelerator for virtio net
...
See also: https://lkml.org/lkml/2010/5/21/134
^ permalink raw reply
* Re: [PATCH] vhost-net: add module alias
From: Stephen Hemminger @ 2012-01-11 16:54 UTC (permalink / raw)
To: Amos Kong; +Cc: kvm, Michael S. Tsirkin, netdev, virtualization, device, alan
In-Reply-To: <CAFeW=paA4Aqk2gy2qpUUM6nANjr3ahio3N02Tz4LVEGVVxNJWA@mail.gmail.com>
On Wed, 11 Jan 2012 15:43:42 +0800
Amos Kong <kongjianjun@gmail.com> wrote:
> On Wed, Jan 11, 2012 at 12:54 PM, Stephen Hemminger
> <shemminger@vyatta.com>wrote:
>
> > By adding the a module alias, programs (or users) won't have to explicitly
> > call modprobe. Vhost-net will always be available if built into the kernel.
> > It does require assigning a permanent minor number for depmod to work.
> > Choose one next to TUN since this driver is related to it.
> >
> > Also, use C99 style initialization.
> >
> > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> >
> > ---
> > drivers/vhost/net.c | 8 +++++---
> > include/linux/miscdevice.h | 1 +
> > 2 files changed, 6 insertions(+), 3 deletions(-)
> >
:
> /*
> * These allocations are managed by device@lanana.org. If you use an
> * entry that is not in assigned your entry may well be moved and
> * reassigned, or set dynamic if a fixed value is not justified.
> */
Didn't that mailing address was ever used any more. Like many places
in kernel, the comment looked like a historical leftover.
^ permalink raw reply
* [PATCH 4/4] xen kconfig: describe xen tmem in the config menu
From: Andrew Jones @ 2012-01-11 16:36 UTC (permalink / raw)
To: konrad; +Cc: jeremy, xen-devel, virtualization, konrad.wilk,
stefano.stabellini
In-Reply-To: <1326299801-7966-1-git-send-email-drjones@redhat.com>
Add a description to the config menu for xen tmem.
Signed-off-by: Andrew Jones <drjones@redhat.com>
---
drivers/xen/Kconfig | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 1d24061..7e8d728 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -143,7 +143,7 @@ config SWIOTLB_XEN
select SWIOTLB
config XEN_TMEM
- bool
+ bool "Xen Transcendent Memory (tmem)"
default y if (CLEANCACHE || FRONTSWAP)
help
Shim to interface in-kernel Transcendent Memory hooks
--
1.7.7.5
^ permalink raw reply related
* [PATCH 3/4 v2] xen kconfig: add dom0 support help text
From: Andrew Jones @ 2012-01-11 16:36 UTC (permalink / raw)
To: konrad; +Cc: jeremy, xen-devel, virtualization, konrad.wilk,
stefano.stabellini
In-Reply-To: <1326299801-7966-1-git-send-email-drjones@redhat.com>
Describe dom0 support in the config menu and supply help text for it.
v2 adds 'if EXPERT' to keep it out of the "standard" menu.
Signed-off-by: Andrew Jones <drjones@redhat.com>
---
arch/x86/xen/Kconfig | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 26c731a..31ec975 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -14,9 +14,14 @@ config XEN
Xen hypervisor.
config XEN_DOM0
- def_bool y
+ bool "Xen Initial Domain (Dom0) support" if EXPERT
+ default y
depends on XEN && PCI_XEN && SWIOTLB_XEN
depends on X86_LOCAL_APIC && X86_IO_APIC && ACPI && PCI
+ help
+ This allows the kernel to be used for the initial Xen domain,
+ Domain0. This is a privileged guest that supplies backends
+ and is used to manage the other Xen domains.
# Dummy symbol since people have come to rely on the PRIVILEGED_GUEST
# name in tools.
--
1.7.7.5
^ permalink raw reply related
* [PATCH 2/4 v2] xen kconfig: relax INPUT_XEN_KBDDEV_FRONTEND deps
From: Andrew Jones @ 2012-01-11 16:36 UTC (permalink / raw)
To: konrad; +Cc: jeremy, xen-devel, virtualization, konrad.wilk,
stefano.stabellini
In-Reply-To: <1326299801-7966-1-git-send-email-drjones@redhat.com>
PV-on-HVM guests may want to use the xen keyboard/mouse frontend, but
they don't use the xen frame buffer frontend. For this case it doesn't
make much sense for INPUT_XEN_KBDDEV_FRONTEND to depend on
XEN_FBDEV_FRONTEND. The opposite direction always makes more sense, i.e.
if you're using xenfb, then you'll want xenkbd. Switch the dependencies.
Signed-off-by: Andrew Jones <drjones@redhat.com>
---
drivers/input/misc/Kconfig | 2 +-
drivers/video/Kconfig | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/input/misc/Kconfig b/drivers/input/misc/Kconfig
index 22d875f..36c15bf 100644
--- a/drivers/input/misc/Kconfig
+++ b/drivers/input/misc/Kconfig
@@ -533,7 +533,7 @@ config INPUT_CMA3000_I2C
config INPUT_XEN_KBDDEV_FRONTEND
tristate "Xen virtual keyboard and mouse support"
- depends on XEN_FBDEV_FRONTEND
+ depends on XEN
default y
select XEN_XENBUS_FRONTEND
help
diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
index d83e967..3e38c2f 100644
--- a/drivers/video/Kconfig
+++ b/drivers/video/Kconfig
@@ -2263,7 +2263,7 @@ config FB_VIRTUAL
config XEN_FBDEV_FRONTEND
tristate "Xen virtual frame buffer support"
- depends on FB && XEN
+ depends on FB && XEN && INPUT_XEN_KBDDEV_FRONTEND
select FB_SYS_FILLRECT
select FB_SYS_COPYAREA
select FB_SYS_IMAGEBLIT
--
1.7.7.5
^ permalink raw reply related
* [PATCH 1/4] xen kconfig: keep XEN_XENBUS_FRONTEND builtin
From: Andrew Jones @ 2012-01-11 16:36 UTC (permalink / raw)
To: konrad; +Cc: jeremy, xen-devel, virtualization, konrad.wilk,
stefano.stabellini
In-Reply-To: <20120111161911.GB18203@andromeda.dapyr.net>
When XEN_XENBUS_FRONTEND gets selected as a module it can lead to
unbootable configs. If we need it, then we should just build it in.
Signed-off-by: Andrew Jones <drjones@redhat.com>
---
drivers/xen/Kconfig | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 8795480..1d24061 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -118,7 +118,7 @@ config XEN_SYS_HYPERVISOR
but will have no xen contents.
config XEN_XENBUS_FRONTEND
- tristate
+ bool
config XEN_GNTDEV
tristate "userspace grant access device driver"
--
1.7.7.5
^ permalink raw reply related
* Re: [Xen-devel] [PATCH 2/4 v2] xen kconfig: relax INPUT_XEN_KBDDEV_FRONTEND deps
From: Andrew Jones @ 2012-01-11 16:29 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: jeremy, xen-devel, konrad wilk, FlorianSchandinat,
dmitry torokhov, virtualization
In-Reply-To: <20120111161130.GA18203@andromeda.dapyr.net>
----- Original Message -----
> On Mon, Jan 09, 2012 at 06:51:41PM +0100, Andrew Jones wrote:
> > PV-on-HVM guests may want to use the xen keyboard/mouse frontend,
> > but
> > they don't use the xen frame buffer frontend. For this case it
> > doesn't
>
> Ok, but PV does?
> > make much sense for INPUT_XEN_KBDDEV_FRONTEND to depend on
> > XEN_FBDEV_FRONTEND. The opposite direction always makes more sense,
> > i.e.
> > if you're using xenfb, then you'll want xenkbd. Switch the
> > dependencies.
>
> That sounds like it would be universal irregardless if it is
> PV or PVonHVM?
This patch makes it such that if you want to use both, then you must
select both. It also says that if you want FB, then you need the
KBD. However, if you only want the KBD then you're fine with just
that. So there isn't any risk of breaking configs designed to use
FB, because FB should be manually selected for those configs anyway.
Drew
> >
> > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > ---
> > drivers/input/misc/Kconfig | 2 +-
> > drivers/video/Kconfig | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/input/misc/Kconfig
> > b/drivers/input/misc/Kconfig
> > index 22d875f..36c15bf 100644
> > --- a/drivers/input/misc/Kconfig
> > +++ b/drivers/input/misc/Kconfig
> > @@ -533,7 +533,7 @@ config INPUT_CMA3000_I2C
> >
> > config INPUT_XEN_KBDDEV_FRONTEND
> > tristate "Xen virtual keyboard and mouse support"
> > - depends on XEN_FBDEV_FRONTEND
> > + depends on XEN
> > default y
> > select XEN_XENBUS_FRONTEND
> > help
> > diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
> > index d83e967..3e38c2f 100644
> > --- a/drivers/video/Kconfig
> > +++ b/drivers/video/Kconfig
> > @@ -2263,7 +2263,7 @@ config FB_VIRTUAL
> >
> > config XEN_FBDEV_FRONTEND
> > tristate "Xen virtual frame buffer support"
> > - depends on FB && XEN
> > + depends on FB && XEN && INPUT_XEN_KBDDEV_FRONTEND
> > select FB_SYS_FILLRECT
> > select FB_SYS_COPYAREA
> > select FB_SYS_IMAGEBLIT
> > --
> > 1.7.7.5
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xensource.com
> > http://lists.xensource.com/xen-devel
>
^ permalink raw reply
* Re: [Xen-devel] [PATCH] xen: remove CONFIG_XEN_DOM0 compile option
From: Konrad Rzeszutek Wilk @ 2012-01-11 16:19 UTC (permalink / raw)
To: Andrew Jones
Cc: Jeremy Fitzhardinge, xen-devel, virtualization,
Konrad Rzeszutek Wilk, Stefano Stabellini
In-Reply-To: <8e643150-6be5-44ba-aba3-a987b22fc82b@zmail13.collab.prod.int.phx2.redhat.com>
> > If the root complaint is that "customers think that anything set in
> > .config is a supported feature", then the solutions are to support
> > all
> > the features in .config, re-educate the customers that they're wrong,
> > or
> > maintain a local patch to do this stuff.
>
> If only re-educating people was free, like preempting questions is.
> Local patches are of course always an option, and perhaps in this
> case it's the best one. However, I think we already made a case for
> better xen configurability for the driver domains, so I'm not 100%
Could you repost those backend patches please? At this point I am not
sure which one we have discarded?
> convinced my initial patch (making dom0 configurable) isn't worthy
> of upstream. Also, I didn't see any comments on my v2[*] of that
> patch, which I believe satisfies the menu complexity issue and
> brings in more configurability. That said, I'm about to reply to
> that patch myself, since there's an issue with it.
>
> Drew
>
> [*] http://article.gmane.org/gmane.linux.kernel.virtualization/14635
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
^ permalink raw reply
* Re: [Xen-devel] [PATCH 2/4 v2] xen kconfig: relax INPUT_XEN_KBDDEV_FRONTEND deps
From: Konrad Rzeszutek Wilk @ 2012-01-11 16:11 UTC (permalink / raw)
To: Andrew Jones
Cc: jeremy, xen-devel, konrad.wilk, FlorianSchandinat,
dmitry.torokhov, virtualization
In-Reply-To: <1326131501-9610-1-git-send-email-drjones@redhat.com>
On Mon, Jan 09, 2012 at 06:51:41PM +0100, Andrew Jones wrote:
> PV-on-HVM guests may want to use the xen keyboard/mouse frontend, but
> they don't use the xen frame buffer frontend. For this case it doesn't
Ok, but PV does?
> make much sense for INPUT_XEN_KBDDEV_FRONTEND to depend on
> XEN_FBDEV_FRONTEND. The opposite direction always makes more sense, i.e.
> if you're using xenfb, then you'll want xenkbd. Switch the dependencies.
That sounds like it would be universal irregardless if it is
PV or PVonHVM?
>
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
> drivers/input/misc/Kconfig | 2 +-
> drivers/video/Kconfig | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/input/misc/Kconfig b/drivers/input/misc/Kconfig
> index 22d875f..36c15bf 100644
> --- a/drivers/input/misc/Kconfig
> +++ b/drivers/input/misc/Kconfig
> @@ -533,7 +533,7 @@ config INPUT_CMA3000_I2C
>
> config INPUT_XEN_KBDDEV_FRONTEND
> tristate "Xen virtual keyboard and mouse support"
> - depends on XEN_FBDEV_FRONTEND
> + depends on XEN
> default y
> select XEN_XENBUS_FRONTEND
> help
> diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
> index d83e967..3e38c2f 100644
> --- a/drivers/video/Kconfig
> +++ b/drivers/video/Kconfig
> @@ -2263,7 +2263,7 @@ config FB_VIRTUAL
>
> config XEN_FBDEV_FRONTEND
> tristate "Xen virtual frame buffer support"
> - depends on FB && XEN
> + depends on FB && XEN && INPUT_XEN_KBDDEV_FRONTEND
> select FB_SYS_FILLRECT
> select FB_SYS_COPYAREA
> select FB_SYS_IMAGEBLIT
> --
> 1.7.7.5
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
^ permalink raw reply
* Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
From: Anthony Liguori @ 2012-01-11 16:02 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Pawel Moll, Benjamin Herrenschmidt, virtualization,
Christian Borntraeger, Sasha Levin
In-Reply-To: <20120111154515.GD20570@redhat.com>
On 01/11/2012 09:45 AM, Michael S. Tsirkin wrote:
> On Wed, Jan 11, 2012 at 09:28:27AM -0600, Anthony Liguori wrote:
>> On 01/11/2012 09:21 AM, Michael S. Tsirkin wrote:
>>> On Wed, Jan 11, 2012 at 09:15:49AM -0600, Anthony Liguori wrote:
>>>>> This is similar to what we have now. But it's still buggy: e.g. if guest
>>>>> updates MAC byte by byte, we have no way to know when it's done doing
>>>>> so.
>>>>
>>>> This is no different than a normal network card. You have to use a
>>>> secondary register to trigger an update.
>>>>
>>>> Regards,
>>>>
>>>> Anthony Liguori
>>>
>>> Possible but doesn't let us layer nicely to allow unchanged drivers
>>> that work with all transports (new pci, old pci, non pci).
>>
>> If we declare config space LE, then we have to touch all drivers.
>> There's no way around it because the virtio API is byte-based, not
>> word based.
>
> Fine but don't we want to be compatible with old hypervisors?
We can modify the drivers to work either with a virtio1 or virtio2 transport.
If the only difference is that we move to word access instead of byte access for
the config space, it's a nop because drivers don't rely on sub-word access today.
>> This is why I'm suggesting making the virtio API (and then the
>> virtio-pci ABI) word based. It gives us the flexibility to make
>> endianness a property of the transport and not a property of the
>> devices.
>>
>> Regards,
>>
>> Anthony Liguori
>
> Some fields are 64 bit, this is still tricky to do atomically.
> What's the objection to using a config VQ?
Then we move very far away from something that looks like a PCI device. The
problem we're having here is specifically where we've deviated from what a
normal PCI device would do. Fixing that by deviating even further seems counter
intuitive to me.
Regards,
Anthony Liguori
>
^ permalink raw reply
* Re: [Xen-devel] [PATCH 3/4 v2] xen kconfig: add dom0 support help text
From: Andrew Jones @ 2012-01-11 15:45 UTC (permalink / raw)
To: xen-devel
Cc: jeremy, stefano stabellini, Ian Campbell, konrad wilk,
virtualization
In-Reply-To: <1326132476-10047-1-git-send-email-drjones@redhat.com>
----- Original Message -----
> Describe dom0 support in the config menu and supply help text for it.
>
> v2 adds 'if EXPERT' to keep it out of the "standard" menu.
>
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
> arch/x86/xen/Kconfig | 7 ++++++-
> 1 files changed, 6 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
> index 26c731a..31ec975 100644
> --- a/arch/x86/xen/Kconfig
> +++ b/arch/x86/xen/Kconfig
> @@ -14,9 +14,14 @@ config XEN
> Xen hypervisor.
>
> config XEN_DOM0
> - def_bool y
> + bool "Xen Initial Domain (Dom0) support" if EXPERT
> + default y
> depends on XEN && PCI_XEN && SWIOTLB_XEN
> depends on X86_LOCAL_APIC && X86_IO_APIC && ACPI && PCI
> + help
> + This allows the kernel to be used for the initial Xen domain,
> + Domain0. This is a privileged guest that supplies backends
> + and is used to manage the other Xen domains.
>
> # Dummy symbol since people have come to rely on the
> PRIVILEGED_GUEST
> # name in tools.
> --
> 1.7.7.5
There's currently an issue using EXPERT. It doesn't make this
patch wrong, but any users that would want to turn EXPERT on, in
order to turn DOM0 off, would find that their configs have gotten
twisted in other ways as well. I've posted a patch for EXPERT, and
if it gets merged, then I really can't see any good arguments against
this patch.
Drew
^ permalink raw reply
* Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
From: Michael S. Tsirkin @ 2012-01-11 15:45 UTC (permalink / raw)
To: Anthony Liguori
Cc: Pawel Moll, Benjamin Herrenschmidt, virtualization,
Christian Borntraeger, Sasha Levin
In-Reply-To: <4F0DAA9B.7060703@codemonkey.ws>
On Wed, Jan 11, 2012 at 09:28:27AM -0600, Anthony Liguori wrote:
> On 01/11/2012 09:21 AM, Michael S. Tsirkin wrote:
> >On Wed, Jan 11, 2012 at 09:15:49AM -0600, Anthony Liguori wrote:
> >>>This is similar to what we have now. But it's still buggy: e.g. if guest
> >>>updates MAC byte by byte, we have no way to know when it's done doing
> >>>so.
> >>
> >>This is no different than a normal network card. You have to use a
> >>secondary register to trigger an update.
> >>
> >>Regards,
> >>
> >>Anthony Liguori
> >
> >Possible but doesn't let us layer nicely to allow unchanged drivers
> >that work with all transports (new pci, old pci, non pci).
>
> If we declare config space LE, then we have to touch all drivers.
> There's no way around it because the virtio API is byte-based, not
> word based.
Fine but don't we want to be compatible with old hypervisors?
> This is why I'm suggesting making the virtio API (and then the
> virtio-pci ABI) word based. It gives us the flexibility to make
> endianness a property of the transport and not a property of the
> devices.
>
> Regards,
>
> Anthony Liguori
Some fields are 64 bit, this is still tricky to do atomically.
What's the objection to using a config VQ?
^ permalink raw reply
* Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
From: Michael S. Tsirkin @ 2012-01-11 15:39 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Pawel Moll, Benjamin Herrenschmidt, virtualization,
Christian Borntraeger, Sasha Levin
In-Reply-To: <CAJSP0QWMBhsg9ExmTsDG7d32p=7tokCPpm5Z0DMAm2_ADvvb+g@mail.gmail.com>
On Wed, Jan 11, 2012 at 02:28:48PM +0000, Stefan Hajnoczi wrote:
> On Wed, Jan 11, 2012 at 9:10 AM, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> > On Wed, 2012-01-11 at 08:47 +0000, Stefan Hajnoczi wrote:
> >>
> >> This is also an opportunity to stop using CPU physical addresses in
> >> the ring and instead perform DMA like a normal PCI device (use bus
> >> addresses).
> >
> > Euh why ?
>
> Because it's a paravirt hack that ends up hitting corner cases. It's
> not possible to do virtio-pci passthrough under nested virtualization
> unless we use an IOMMU. Imagine passing virtio-net from L0 into the
> L2 guest (i.e. PCI-passthrough). If virtio-pci is really "PCI" this
> should be possible but it's not when we use physical addresses instead
> of bus addresses.
>
> Stefan
It won't be hard to show siginificant performance regression if
we do this. Hard to justify for something as niche as nested virt.
--
MST
^ permalink raw reply
* Re: [Xen-devel] [PATCH] xen: remove CONFIG_XEN_DOM0 compile option
From: Andrew Jones @ 2012-01-11 15:37 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Stefano Stabellini, xen-devel, Konrad Rzeszutek Wilk,
virtualization
In-Reply-To: <4F0B8D06.8050501@goop.org>
----- Original Message -----
> On 01/10/2012 05:44 AM, Stefano Stabellini wrote:
> > On Mon, 9 Jan 2012, Andrew Jones wrote:
> >> I guess if we did the s/XEN_DOM0/LOCAL_APIC && IO_APIC && ACPI/ in
> >> arch/x86/pci/xen.c it would be pretty easy to review for
> >> equivalence.
> >> Then keep CONFIG_PRIVILIGED, but drop XEN_DOM0 from everywhere
> >> else and
> >> compile in the 3 files. I don't think it makes much sense to do it
> >> though. XEN_DOM0 keeps things tidier now and might be useful
> >> later.
> > we can keep things clean with the following:
> >
> > #ifdef CONFIG_LOCAL_APIC && CONFIG_IO_APIC && CONFIG_ACPI &&
> > CONFIG_PCI_XEN
> >
> > #define XEN_DOM0
> >
> > #endif
> >
> > in include/xen/xen.h.
> >
> > So in the source files we can still '#ifdef XEN_DOM0', but at the
> > same
> > time we can get rid of the build symbol: everybody wins.
>
> No, really, I think this is silly. This kind of dependency
> information
> should be encoded in the Kconfig system, and not reimplemented in an
> ad-hoc way.
>
> If there were a clean way to do what Andrew wants then I'd support
> it,
> but this thread has descended into a spiral of madness, which makes
> me
> think its a lost cause.
>
> If the root complaint is that "customers think that anything set in
> .config is a supported feature", then the solutions are to support
> all
> the features in .config, re-educate the customers that they're wrong,
> or
> maintain a local patch to do this stuff.
If only re-educating people was free, like preempting questions is.
Local patches are of course always an option, and perhaps in this
case it's the best one. However, I think we already made a case for
better xen configurability for the driver domains, so I'm not 100%
convinced my initial patch (making dom0 configurable) isn't worthy
of upstream. Also, I didn't see any comments on my v2[*] of that
patch, which I believe satisfies the menu complexity issue and
brings in more configurability. That said, I'm about to reply to
that patch myself, since there's an issue with it.
Drew
[*] http://article.gmane.org/gmane.linux.kernel.virtualization/14635
^ permalink raw reply
* Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
From: Anthony Liguori @ 2012-01-11 15:28 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Pawel Moll, Benjamin Herrenschmidt, virtualization,
Christian Borntraeger, Sasha Levin
In-Reply-To: <20120111152129.GB20570@redhat.com>
On 01/11/2012 09:21 AM, Michael S. Tsirkin wrote:
> On Wed, Jan 11, 2012 at 09:15:49AM -0600, Anthony Liguori wrote:
>>> This is similar to what we have now. But it's still buggy: e.g. if guest
>>> updates MAC byte by byte, we have no way to know when it's done doing
>>> so.
>>
>> This is no different than a normal network card. You have to use a
>> secondary register to trigger an update.
>>
>> Regards,
>>
>> Anthony Liguori
>
> Possible but doesn't let us layer nicely to allow unchanged drivers
> that work with all transports (new pci, old pci, non pci).
If we declare config space LE, then we have to touch all drivers. There's no
way around it because the virtio API is byte-based, not word based.
This is why I'm suggesting making the virtio API (and then the virtio-pci ABI)
word based. It gives us the flexibility to make endianness a property of the
transport and not a property of the devices.
Regards,
Anthony Liguori
^ permalink raw reply
* Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
From: Michael S. Tsirkin @ 2012-01-11 15:21 UTC (permalink / raw)
To: Anthony Liguori
Cc: Pawel Moll, Benjamin Herrenschmidt, virtualization,
Christian Borntraeger, Sasha Levin
In-Reply-To: <4F0DA7A5.7050600@codemonkey.ws>
On Wed, Jan 11, 2012 at 09:15:49AM -0600, Anthony Liguori wrote:
> On 01/11/2012 09:12 AM, Michael S. Tsirkin wrote:
> >On Wed, Jan 11, 2012 at 07:30:34AM -0600, Anthony Liguori wrote:
> >>On 01/10/2012 06:25 PM, Rusty Russell wrote:
> >>>On Tue, 10 Jan 2012 19:03:36 +0200, "Michael S. Tsirkin"<mst@redhat.com> wrote:
> >>>>On Wed, Dec 21, 2011 at 11:03:25AM +1030, Rusty Russell wrote:
> >>>>>Yes. The idea that we can alter fields in the device-specific config
> >>>>>area is flawed. There may be cases where it doesn't matter, but as an
> >>>>>idea it was holed to begin with.
> >>>>>
> >>>>>We can reduce probability by doing a double read to check, but there are
> >>>>>still cases where it will fail.
> >>>>
> >>>>Okay - want me to propose an interface for that?
> >>>
> >>>Had a brief chat with BenH (CC'd).
> >>>
> >>>I think we should deprecate writing to the config space. Only balloon
> >>>does it AFAICT, and I can't quite figure out *why* it has an 'active'
> >>>field. This solves half the problem, of sync guest writes. For the
> >>>other half, I suggest a generation counter; odd means inconsistent. The
> >>>guest can poll.
> >>>
> >>>BenH also convinced me we should finally make the config space LE if
> >>>we're going to change things. Since PCI is the most common transport,
> >>>guest-endian confuses people. And it sucks for really weird machines
> >>
> >>I think the more important thing to do is require accesses to
> >>integers in the config space to always be aligned and to use the
> >>appropriate accessor. Non-integer fields should be restricted to
> >>byte access.
> >>
> >>That limits config space entries to 32-bit but also means that there
> >>is no need for a generation counter. It's also easier to deal with
> >>endian conversion that way.
> >
> >This is similar to what we have now. But it's still buggy: e.g. if guest
> >updates MAC byte by byte, we have no way to know when it's done doing
> >so.
>
> This is no different than a normal network card. You have to use a
> secondary register to trigger an update.
>
> Regards,
>
> Anthony Liguori
Possible but doesn't let us layer nicely to allow unchanged drivers
that work with all transports (new pci, old pci, non pci).
Something like a command VQ would be a generic transport
that can be hidden behind config->set(...).
> >
> >
> >>But it means the backend code ends up being much simpler to write
> >>(because it behaves more like a normal PCI device).
> >>
> >>If we're already making the change, the endianness ought to be a feature bit.
> >>
> >>>We should also change the ring (to a single ring, I think).
> >>
> >>Ack.
> >>
> >>>Descriptors
> >>>to 24 bytes long (8 byte cookie, 8 byte addr, 4 byte len, 4 byte flags).
> >>>We might be able to squeeze it into 20 bytes but that means packing. We
> >>>should support inline, chained or indirect. Let the other side ack by
> >>>setting flag, cookie and len (if written).
> >>>
> >>>Moreover, I think we should make all these changes at once (at least, in
> >>>the spec). That makes it a big change, and it'll take longer to
> >>>develop, but makes it easy in the long run to differentiate legacy and
> >>>modern virtio.
> >>
> >>Ack. Long live virtio2! :-)
> >>
> >>Regards,
> >>
> >>Anthony Liguori
> >>
> >>>
> >>>Thoughts?
> >>>Rusty.
^ permalink raw reply
* Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
From: Anthony Liguori @ 2012-01-11 15:15 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Pawel Moll, Benjamin Herrenschmidt, virtualization,
Christian Borntraeger, Sasha Levin
In-Reply-To: <20120111151230.GA20570@redhat.com>
On 01/11/2012 09:12 AM, Michael S. Tsirkin wrote:
> On Wed, Jan 11, 2012 at 07:30:34AM -0600, Anthony Liguori wrote:
>> On 01/10/2012 06:25 PM, Rusty Russell wrote:
>>> On Tue, 10 Jan 2012 19:03:36 +0200, "Michael S. Tsirkin"<mst@redhat.com> wrote:
>>>> On Wed, Dec 21, 2011 at 11:03:25AM +1030, Rusty Russell wrote:
>>>>> Yes. The idea that we can alter fields in the device-specific config
>>>>> area is flawed. There may be cases where it doesn't matter, but as an
>>>>> idea it was holed to begin with.
>>>>>
>>>>> We can reduce probability by doing a double read to check, but there are
>>>>> still cases where it will fail.
>>>>
>>>> Okay - want me to propose an interface for that?
>>>
>>> Had a brief chat with BenH (CC'd).
>>>
>>> I think we should deprecate writing to the config space. Only balloon
>>> does it AFAICT, and I can't quite figure out *why* it has an 'active'
>>> field. This solves half the problem, of sync guest writes. For the
>>> other half, I suggest a generation counter; odd means inconsistent. The
>>> guest can poll.
>>>
>>> BenH also convinced me we should finally make the config space LE if
>>> we're going to change things. Since PCI is the most common transport,
>>> guest-endian confuses people. And it sucks for really weird machines
>>
>> I think the more important thing to do is require accesses to
>> integers in the config space to always be aligned and to use the
>> appropriate accessor. Non-integer fields should be restricted to
>> byte access.
>>
>> That limits config space entries to 32-bit but also means that there
>> is no need for a generation counter. It's also easier to deal with
>> endian conversion that way.
>
> This is similar to what we have now. But it's still buggy: e.g. if guest
> updates MAC byte by byte, we have no way to know when it's done doing
> so.
This is no different than a normal network card. You have to use a secondary
register to trigger an update.
Regards,
Anthony Liguori
>
>
>> But it means the backend code ends up being much simpler to write
>> (because it behaves more like a normal PCI device).
>>
>> If we're already making the change, the endianness ought to be a feature bit.
>>
>>> We should also change the ring (to a single ring, I think).
>>
>> Ack.
>>
>>> Descriptors
>>> to 24 bytes long (8 byte cookie, 8 byte addr, 4 byte len, 4 byte flags).
>>> We might be able to squeeze it into 20 bytes but that means packing. We
>>> should support inline, chained or indirect. Let the other side ack by
>>> setting flag, cookie and len (if written).
>>>
>>> Moreover, I think we should make all these changes at once (at least, in
>>> the spec). That makes it a big change, and it'll take longer to
>>> develop, but makes it easy in the long run to differentiate legacy and
>>> modern virtio.
>>
>> Ack. Long live virtio2! :-)
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>>
>>> Thoughts?
>>> Rusty.
^ permalink raw reply
* Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
From: Michael S. Tsirkin @ 2012-01-11 15:12 UTC (permalink / raw)
To: Anthony Liguori
Cc: Pawel Moll, Benjamin Herrenschmidt, virtualization,
Christian Borntraeger, Sasha Levin
In-Reply-To: <4F0D8EFA.3010503@codemonkey.ws>
On Wed, Jan 11, 2012 at 07:30:34AM -0600, Anthony Liguori wrote:
> On 01/10/2012 06:25 PM, Rusty Russell wrote:
> >On Tue, 10 Jan 2012 19:03:36 +0200, "Michael S. Tsirkin"<mst@redhat.com> wrote:
> >>On Wed, Dec 21, 2011 at 11:03:25AM +1030, Rusty Russell wrote:
> >>>Yes. The idea that we can alter fields in the device-specific config
> >>>area is flawed. There may be cases where it doesn't matter, but as an
> >>>idea it was holed to begin with.
> >>>
> >>>We can reduce probability by doing a double read to check, but there are
> >>>still cases where it will fail.
> >>
> >>Okay - want me to propose an interface for that?
> >
> >Had a brief chat with BenH (CC'd).
> >
> >I think we should deprecate writing to the config space. Only balloon
> >does it AFAICT, and I can't quite figure out *why* it has an 'active'
> >field. This solves half the problem, of sync guest writes. For the
> >other half, I suggest a generation counter; odd means inconsistent. The
> >guest can poll.
> >
> >BenH also convinced me we should finally make the config space LE if
> >we're going to change things. Since PCI is the most common transport,
> >guest-endian confuses people. And it sucks for really weird machines
>
> I think the more important thing to do is require accesses to
> integers in the config space to always be aligned and to use the
> appropriate accessor. Non-integer fields should be restricted to
> byte access.
>
> That limits config space entries to 32-bit but also means that there
> is no need for a generation counter. It's also easier to deal with
> endian conversion that way.
This is similar to what we have now. But it's still buggy: e.g. if guest
updates MAC byte by byte, we have no way to know when it's done doing
so.
> But it means the backend code ends up being much simpler to write
> (because it behaves more like a normal PCI device).
>
> If we're already making the change, the endianness ought to be a feature bit.
>
> >We should also change the ring (to a single ring, I think).
>
> Ack.
>
> >Descriptors
> >to 24 bytes long (8 byte cookie, 8 byte addr, 4 byte len, 4 byte flags).
> >We might be able to squeeze it into 20 bytes but that means packing. We
> >should support inline, chained or indirect. Let the other side ack by
> >setting flag, cookie and len (if written).
> >
> >Moreover, I think we should make all these changes at once (at least, in
> >the spec). That makes it a big change, and it'll take longer to
> >develop, but makes it easy in the long run to differentiate legacy and
> >modern virtio.
>
> Ack. Long live virtio2! :-)
>
> Regards,
>
> Anthony Liguori
>
> >
> >Thoughts?
> >Rusty.
^ permalink raw reply
* Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
From: Stefan Hajnoczi @ 2012-01-11 14:28 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Pawel Moll, Michael S. Tsirkin, virtualization,
Christian Borntraeger, Sasha Levin
In-Reply-To: <1326273019.23910.118.camel@pasglop>
On Wed, Jan 11, 2012 at 9:10 AM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Wed, 2012-01-11 at 08:47 +0000, Stefan Hajnoczi wrote:
>>
>> This is also an opportunity to stop using CPU physical addresses in
>> the ring and instead perform DMA like a normal PCI device (use bus
>> addresses).
>
> Euh why ?
Because it's a paravirt hack that ends up hitting corner cases. It's
not possible to do virtio-pci passthrough under nested virtualization
unless we use an IOMMU. Imagine passing virtio-net from L0 into the
L2 guest (i.e. PCI-passthrough). If virtio-pci is really "PCI" this
should be possible but it's not when we use physical addresses instead
of bus addresses.
Stefan
^ permalink raw reply
* Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
From: Anthony Liguori @ 2012-01-11 13:30 UTC (permalink / raw)
To: Rusty Russell
Cc: Pawel Moll, Michael S. Tsirkin, Benjamin Herrenschmidt,
virtualization, Christian Borntraeger, Sasha Levin
In-Reply-To: <8762gj6q5r.fsf@rustcorp.com.au>
On 01/10/2012 06:25 PM, Rusty Russell wrote:
> On Tue, 10 Jan 2012 19:03:36 +0200, "Michael S. Tsirkin"<mst@redhat.com> wrote:
>> On Wed, Dec 21, 2011 at 11:03:25AM +1030, Rusty Russell wrote:
>>> Yes. The idea that we can alter fields in the device-specific config
>>> area is flawed. There may be cases where it doesn't matter, but as an
>>> idea it was holed to begin with.
>>>
>>> We can reduce probability by doing a double read to check, but there are
>>> still cases where it will fail.
>>
>> Okay - want me to propose an interface for that?
>
> Had a brief chat with BenH (CC'd).
>
> I think we should deprecate writing to the config space. Only balloon
> does it AFAICT, and I can't quite figure out *why* it has an 'active'
> field. This solves half the problem, of sync guest writes. For the
> other half, I suggest a generation counter; odd means inconsistent. The
> guest can poll.
>
> BenH also convinced me we should finally make the config space LE if
> we're going to change things. Since PCI is the most common transport,
> guest-endian confuses people. And it sucks for really weird machines
I think the more important thing to do is require accesses to integers in the
config space to always be aligned and to use the appropriate accessor.
Non-integer fields should be restricted to byte access.
That limits config space entries to 32-bit but also means that there is no need
for a generation counter. It's also easier to deal with endian conversion that way.
But it means the backend code ends up being much simpler to write (because it
behaves more like a normal PCI device).
If we're already making the change, the endianness ought to be a feature bit.
> We should also change the ring (to a single ring, I think).
Ack.
> Descriptors
> to 24 bytes long (8 byte cookie, 8 byte addr, 4 byte len, 4 byte flags).
> We might be able to squeeze it into 20 bytes but that means packing. We
> should support inline, chained or indirect. Let the other side ack by
> setting flag, cookie and len (if written).
>
> Moreover, I think we should make all these changes at once (at least, in
> the spec). That makes it a big change, and it'll take longer to
> develop, but makes it easy in the long run to differentiate legacy and
> modern virtio.
Ack. Long live virtio2! :-)
Regards,
Anthony Liguori
>
> Thoughts?
> Rusty.
^ permalink raw reply
* Re: [PATCH 1/1][RESEND] Staging: hv: storvsc: Move the storage driver out of the staging area
From: Christoph Hellwig @ 2012-01-11 10:36 UTC (permalink / raw)
To: K. Y. Srinivasan
Cc: gregkh, linux-kernel, devel, virtualization, ohering,
James.Bottomley, hch, linux-scsi
In-Reply-To: <1325622441-15600-1-git-send-email-kys@microsoft.com>
> +#define STORVSC_MIN_BUF_NR 64
What about a comment explaining what this is for?
> +#define STORVSC_RING_BUFFER_SIZE (20*PAGE_SIZE)
> +static int storvsc_ringbuffer_size = STORVSC_RING_BUFFER_SIZE;
I don't think you need the #define here.
> +/* to alert the user that structure sizes may be mismatched even though the */
> +/* protocol versions match. */
> +
> +
> +#define REVISION_STRING(REVISION_) #REVISION_
> +#define FILL_VMSTOR_REVISION(RESULT_LVALUE_) \
> + do { \
> + char *revision_string \
> + = REVISION_STRING($Rev : 6 $) + 6; \
> + RESULT_LVALUE_ = 0; \
> + while (*revision_string >= '0' \
> + && *revision_string <= '9') { \
> + RESULT_LVALUE_ *= 10; \
> + RESULT_LVALUE_ += *revision_string - '0'; \
> + revision_string++; \
> + } \
> + } while (0)
> +
How can these macros work? I don't think git ever expans $Rev, and in
either case I really don't get how this is supposed to work.
> +/* Major/minor macros. Minor version is in LSB, meaning that earlier flat */
> +/* version numbers will be interpreted as "0.x" (i.e., 1 becomes 0.1). */
Please use comment formats like this:
/*
* Blah ....
* .....
*/
(this also applies to a few other places).
> +#define VMSTOR_PROTOCOL_MAJOR(VERSION_) (((VERSION_) >> 8) & 0xff)
> +#define VMSTOR_PROTOCOL_MINOR(VERSION_) (((VERSION_)) & 0xff)
> +#define VMSTOR_PROTOCOL_VERSION(MAJOR_, MINOR_) ((((MAJOR_) & 0xff) << 8) | \
> + (((MINOR_) & 0xff)))
All these should be inline functions.
> +#define VMSTOR_INVALID_PROTOCOL_VERSION (-1)
Not used.
> + * Platform neutral description of a scsi request -
> + * this remains the same across the write regardless of 32/64 bit
> + * note: it's patterned off the SCSI_PASS_THROUGH structure
> + */
> +#define CDB16GENERIC_LENGTH 0x10
> +
> +#ifndef SENSE_BUFFER_SIZE
> +#define SENSE_BUFFER_SIZE 0x12
> +#endif
> +
> +#define MAX_DATA_BUF_LEN_WITH_PADDING 0x14
Please don't conditionally re-use the SCSI-layer SENSE_BUFFER_SIZE for
your on the wire packets, but define your own. Please also give all
theses constants a VMSCSI_ or similar prefix.
> +struct vmscsi_request {
> + unsigned short length;
> + unsigned char srb_status;
> + unsigned char scsi_status;
> +
> + unsigned char port_number;
> + unsigned char path_id;
> + unsigned char target_id;
> + unsigned char lun;
> +
> + unsigned char cdb_length;
> + unsigned char sense_info_length;
> + unsigned char data_in;
> + unsigned char reserved;
> +
> + unsigned int data_transfer_length;
> +
> + union {
> + unsigned char cdb[CDB16GENERIC_LENGTH];
> + unsigned char sense_data[SENSE_BUFFER_SIZE];
> + unsigned char reserved_array[MAX_DATA_BUF_LEN_WITH_PADDING];
> + };
> +} __attribute((packed));
Please use fixed size u8/16/32/etc types for all the on-wire defintions.
I'd also really recommend splitting the actual protocol defintion in
a header separate from the driver implementation to make it clear what
is part of the protocol and what's internal to the driver.
> +/* This is the set of flags that the vsc can set in any packets it sends */
> +#define VSC_LEGAL_FLAGS (REQUEST_COMPLETION_FLAG)
unused.
> +#define STORVSC_MAX_CMD_LEN 16
This seems to duplicate the CDB16GENERIC_LENGTH define used above,
please make sure you only have one #define for this constant.
> +
> +/* Matches Windows-end */
> +enum storvsc_request_type {
> + WRITE_TYPE,
> + READ_TYPE,
> + UNKNOWN_TYPE,
> +};
For enums specifying the protocol please always use explicit values.
> +struct hv_storvsc_request {
> + struct hv_device *device;
> +
> + /* Synchronize the request/response if needed */
> + struct completion wait_event;
> +
> + unsigned char *sense_buffer;
> + void *context;
This always is a struct storvsc_cmd_request, and should be typed as
such.
> + void (*on_io_completion)(struct hv_storvsc_request *request);
This always points to storvsc_command_completion, so please remove the
indirect call and use it directly.
> + struct hv_multipage_buffer data_buffer;
> +
> + struct vstor_packet vstor_packet;
> +};
Btw, what is the reason that storvsc_cmd_request and hv_storvsc_request
are separate structures and not merged into one? This wastes a tiny
amount of memory for the init/reset requests, but keeps the code
a lot simpler.
> +static inline struct storvsc_device *get_out_stor_device(
> + struct hv_device *device)
> +static inline struct storvsc_device *get_in_stor_device(
> + struct hv_device *device)
I'm pretty sure you defended this odd reference counting scheme last
time the discussion came up, but please write up a long comment in
the code explaning it so that the question doesn't come up again
all the time.
> + /*
> + * The current SCSI handling on the host side does
> + * not correctly handle:
> + * INQUIRY command with page code parameter set to 0x80
> + * MODE_SENSE command with cmd[2] == 0x1c
> + *
> + * Setup srb and scsi status so this won't be fatal.
> + * We do this so we can distinguish truly fatal failues
> + * (srb status == 0x4) and off-line the device in that case.
> + */
> +
> + if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
> + (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
This should be:
if (stor_pkt->vm_srb.cdb[0] == INQUIRY ||
stor_pkt->vm_srb.cdb[0] == MODE_SENSE) {
or even better use a switch statement to clarify what's going on.
> +static int storvsc_remove(struct hv_device *dev)
> +{
> + struct storvsc_device *stor_device = hv_get_drvdata(dev);
> + struct Scsi_Host *host = stor_device->host;
> +
> + scsi_remove_host(host);
> +
> + scsi_host_put(host);
> +
> + storvsc_dev_remove(dev);
> +
> + return 0;
> +}
Why isn't this near storvcs_probe at the end of the file given that
they logically belong together? Also please only do the scsi_host_put
once you are fully done with it, that is after storvsc_dev_remove.
> +/*
> + * storvsc_host_reset_handler - Reset the scsi HBA
> + */
> +static int storvsc_host_reset_handler(struct scsi_cmnd *scmnd)
> +{
> + struct hv_host_device *host_dev = shost_priv(scmnd->device->host);
> + struct hv_device *dev = host_dev->dev;
> +
> + return storvsc_host_reset(dev);
> +}
Why is storvsc_host_reset split from this function? Also the comment
really doesn't tell us anything, I'd rather leave it away.
> +/*
> + * storvsc_command_completion - Command completion processing
> + */
> +static void storvsc_command_completion(struct hv_storvsc_request *request)
I really don't see how this comment adds any value over the pure
function name.
> + if (vm_srb->srb_status == 0x4)
> + if (vm_srb->srb_status == 0x20) {
Please add defines for the vm_srb->srb_status values.
> +static bool storvsc_check_scsi_cmd(struct scsi_cmnd *scmnd)
Please call this storvsc_ignore_command or similar to make the use more
obvious.
> + /* If retrying, no need to prep the cmd */
> + if (scmnd->host_scribble) {
> +
> + cmd_request =
> + (struct storvsc_cmd_request *)scmnd->host_scribble;
> +
> + goto retry_request;
> + }
I don't think this can ever happen given that you make sure to always
clear ->host_scribble when returning SCSI_MLQUEUE_*_BUSY.
> + request_size = sizeof(struct storvsc_cmd_request);
> +
> + cmd_request = mempool_alloc(memp->request_mempool,
> + GFP_ATOMIC);
> + if (!cmd_request)
> + return SCSI_MLQUEUE_DEVICE_BUSY;
The point of the mempool allocator is that it will never return NULL.
> +
> + memset(cmd_request, 0, sizeof(struct storvsc_cmd_request));
> +
> + /* Setup the cmd request */
> + cmd_request->bounce_sgl_count = 0;
> + cmd_request->bounce_sgl = NULL;
> + cmd_request->cmd = scmnd;
Either use memset for the whole structure, or set the fields that need
it to 0 explicitly, but not both. I doubt it's more efficient to only
initialise the fields that need it.
> + if (!cmd_request->bounce_sgl) {
> + scmnd->host_scribble = NULL;
> + mempool_free(cmd_request,
> + memp->request_mempool);
> +
> + return SCSI_MLQUEUE_HOST_BUSY;
> + }
> + if (cmd_request->bounce_sgl_count)
> + destroy_bounce_buffer(cmd_request->bounce_sgl,
> + cmd_request->bounce_sgl_count);
> +
> + mempool_free(cmd_request, memp->request_mempool);
> +
> + scmnd->host_scribble = NULL;
> +
> + ret = SCSI_MLQUEUE_DEVICE_BUSY;
Please use goto labels and a single error exit for unwindining on
failure.
> +/* Scsi driver */
Not a useful comment, just remove it.
> +/*
> + * storvsc_probe - Add a new device for this driver
> + */
Not a very useful comment, just remove it.
> + if (dev_is_ide)
> + storvsc_get_ide_info(device, &target, &path);
path is never used, so drop that argument from storvsc_get_ide_info
please. Target is only used in the next dev_is_ide block, so please
move this call to it.
> + /* max # of devices per target */
> + host->max_lun = STORVSC_MAX_LUNS_PER_TARGET;
> + /* max # of targets per channel */
> + host->max_id = STORVSC_MAX_TARGETS;
> + /* max # of channels */
> + host->max_channel = STORVSC_MAX_CHANNELS - 1;
> + /* max cmd length */
> + host->max_cmd_len = STORVSC_MAX_CMD_LEN;
Any reason these aren't set directly in the host template?
> + if (!dev_is_ide) {
> + scsi_scan_host(host);
> + return 0;
> + }
> + ret = scsi_add_device(host, 0, target, 0);
> + if (ret) {
> + scsi_remove_host(host);
> + goto err_out2;
> + }
> + return 0;
I'd write this as an if/else block to be be more clear.
> +/* The one and only one */
this isn't an overly useful comment.
^ permalink raw reply
* Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
From: Michael S. Tsirkin @ 2012-01-11 10:21 UTC (permalink / raw)
To: Rusty Russell
Cc: Christian Borntraeger, Benjamin Herrenschmidt, Sasha Levin,
Pawel Moll, virtualization
In-Reply-To: <8762gj6q5r.fsf@rustcorp.com.au>
On Wed, Jan 11, 2012 at 10:55:52AM +1030, Rusty Russell wrote:
> On Tue, 10 Jan 2012 19:03:36 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Wed, Dec 21, 2011 at 11:03:25AM +1030, Rusty Russell wrote:
> > > Yes. The idea that we can alter fields in the device-specific config
> > > area is flawed. There may be cases where it doesn't matter, but as an
> > > idea it was holed to begin with.
> > >
> > > We can reduce probability by doing a double read to check, but there are
> > > still cases where it will fail.
> >
> > Okay - want me to propose an interface for that?
>
> Had a brief chat with BenH (CC'd).
>
> I think we should deprecate writing to the config space. Only balloon
> does it AFAICT, and I can't quite figure out *why* it has an 'active'
> field.
Are you sure? I think net writes a mac address.
> This solves half the problem, of sync guest writes. For the
> other half, I suggest a generation counter; odd means inconsistent. The
> guest can poll.
So we get the counter until it's even, get the config, if it's changed
repeat? Yes it works. However, I would like to have a way to detect
config change just by looking at memory. ATM we need to read ISR to
know. If we used a VQ, the advantage would be that the device can work
with a single MSIX vector shared by all VQs.
If we do require config VQ anyway, why not use it to notify
guest of config changes? Guest could pre-post an in buffer
and host uses that.
> BenH also convinced me we should finally make the config space LE if
> we're going to change things. Since PCI is the most common transport,
> guest-endian confuses people. And it sucks for really weird machines.
Are we going to keep guest endian for e.g. virtio net header?
If yes the benefit of switching config space is not that big.
And changes in devices would affect non-PCI transports.
> We should also change the ring (to a single ring, I think). Descriptors
> to 24 bytes long (8 byte cookie, 8 byte addr, 4 byte len, 4 byte flags).
> We might be able to squeeze it into 20 bytes but that means packing. We
> should support inline, chained or indirect. Let the other side ack by
> setting flag, cookie and len (if written).
Quite possibly all or some of these things help performance
but do we have to change the spec before we have experimental
proof?
I did experiment with a single ring using tools/virtio and
I didn't see a measureable performance gain. Two rings
do have the advantage of not requiring host side copy,
which copy would surely add to cache pressure. Since
host doesn't change desriptors, we could also
preformat some descriptors in the current design.
There is a fragmentation problem in theory but it can be alleviated with
a smart allocator. About inline - it can only help very small buffers.
Which workloads do you have in mind exactly?
> Moreover, I think we should make all these changes at once (at least, in
> the spec). That makes it a big change, and it'll take longer to
> develop, but makes it easy in the long run to differentiate legacy and
> modern virtio.
>
> Thoughts?
> Rusty.
BTW this seems to be the reverse from what you have in Mar 2001,
see 87mxkjls61.fsf@rustcorp.com.au :)
I am much less concerned with what we do for configuration,
but I do not believe we have learned all performance lessons
from virtio ring1. Is there any reason why we shouldn't be
able to experiment with inline within virtio1 and see
whether that gets us anything?
If we do a bunch of changes to the ring at once, we can't
figure out what's right, what's wrong, or back out of
mistakes later.
Since there are non PCI transports that use the ring,
we really shouldn't make both the configuration and
the ring changes depend on the same feature bit.
--
MST
^ permalink raw reply
* Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
From: Benjamin Herrenschmidt @ 2012-01-11 9:10 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Pawel Moll, Michael S. Tsirkin, virtualization,
Christian Borntraeger, Sasha Levin
In-Reply-To: <CAJSP0QU_viNks2jv003c8+gYWFkyW=FvZpU5w6wyroeEyFkXsA@mail.gmail.com>
On Wed, 2012-01-11 at 08:47 +0000, Stefan Hajnoczi wrote:
>
> This is also an opportunity to stop using CPU physical addresses in
> the ring and instead perform DMA like a normal PCI device (use bus
> addresses).
Euh why ?
That would mean in many cases adding a layer of iommu, which will slow
things down a lot ... unless we create a special virtio bus which has
its own dma-ops that do a direct v:p translation that is. But anything
PCI doing dma_map_sg and co will involved emulated iommu mapping &
unmapping and will be a huge hit on performances.
Cheers,
Ben.
^ permalink raw reply
* Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
From: Stefan Hajnoczi @ 2012-01-11 8:47 UTC (permalink / raw)
To: Rusty Russell
Cc: Pawel Moll, Michael S. Tsirkin, Benjamin Herrenschmidt,
virtualization, Christian Borntraeger, Sasha Levin
In-Reply-To: <8762gj6q5r.fsf@rustcorp.com.au>
On Wed, Jan 11, 2012 at 12:25 AM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> On Tue, 10 Jan 2012 19:03:36 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> On Wed, Dec 21, 2011 at 11:03:25AM +1030, Rusty Russell wrote:
>> > Yes. The idea that we can alter fields in the device-specific config
>> > area is flawed. There may be cases where it doesn't matter, but as an
>> > idea it was holed to begin with.
>> >
>> > We can reduce probability by doing a double read to check, but there are
>> > still cases where it will fail.
>>
>> Okay - want me to propose an interface for that?
>
> Had a brief chat with BenH (CC'd).
>
> I think we should deprecate writing to the config space. Only balloon
> does it AFAICT, and I can't quite figure out *why* it has an 'active'
> field. This solves half the problem, of sync guest writes. For the
> other half, I suggest a generation counter; odd means inconsistent. The
> guest can poll.
>
> BenH also convinced me we should finally make the config space LE if
> we're going to change things. Since PCI is the most common transport,
> guest-endian confuses people. And it sucks for really weird machines.
>
> We should also change the ring (to a single ring, I think). Descriptors
> to 24 bytes long (8 byte cookie, 8 byte addr, 4 byte len, 4 byte flags).
> We might be able to squeeze it into 20 bytes but that means packing. We
> should support inline, chained or indirect. Let the other side ack by
> setting flag, cookie and len (if written).
>
> Moreover, I think we should make all these changes at once (at least, in
> the spec). That makes it a big change, and it'll take longer to
> develop, but makes it easy in the long run to differentiate legacy and
> modern virtio.
This is also an opportunity to stop using CPU physical addresses in
the ring and instead perform DMA like a normal PCI device (use bus
addresses).
Stefan
^ permalink raw reply
* Re: [PATCH] vhost-net: add module alias
From: Amos Kong @ 2012-01-11 7:43 UTC (permalink / raw)
To: Stephen Hemminger
Cc: kvm, Michael S. Tsirkin, netdev, virtualization, device, alan
In-Reply-To: <20120110205400.6c1cb306@nehalam.linuxnetplumber.net>
[-- Attachment #1.1: Type: text/plain, Size: 2350 bytes --]
On Wed, Jan 11, 2012 at 12:54 PM, Stephen Hemminger
<shemminger@vyatta.com>wrote:
> By adding the a module alias, programs (or users) won't have to explicitly
> call modprobe. Vhost-net will always be available if built into the kernel.
> It does require assigning a permanent minor number for depmod to work.
> Choose one next to TUN since this driver is related to it.
>
> Also, use C99 style initialization.
>
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>
> ---
> drivers/vhost/net.c | 8 +++++---
> include/linux/miscdevice.h | 1 +
> 2 files changed, 6 insertions(+), 3 deletions(-)
>
> --- a/drivers/vhost/net.c 2012-01-10 10:56:58.883179194 -0800
> +++ b/drivers/vhost/net.c 2012-01-10 19:48:23.650225892 -0800
> @@ -856,9 +856,9 @@ static const struct file_operations vhos
> };
>
> static struct miscdevice vhost_net_misc = {
> - MISC_DYNAMIC_MINOR,
> - "vhost-net",
> - &vhost_net_fops,
> + .minor = VHOST_NET_MINOR,
> + .name = "vhost-net",
> + .fops = &vhost_net_fops,
> };
>
> static int vhost_net_init(void)
> @@ -879,3 +879,5 @@ MODULE_VERSION("0.0.1");
> MODULE_LICENSE("GPL v2");
> MODULE_AUTHOR("Michael S. Tsirkin");
> MODULE_DESCRIPTION("Host kernel accelerator for virtio net");
> +MODULE_ALIAS_MISCDEV(VHOST_NET_MINOR);
> +MODULE_ALIAS("devname:vhost-net");
> --- a/include/linux/miscdevice.h 2012-01-10 10:56:59.779189436
> -0800
> +++ b/include/linux/miscdevice.h 2012-01-10 19:49:56.091748210
> -0800
> @@ -31,6 +31,7 @@
> #define I2O_MINOR 166
> #define MICROCODE_MINOR 184
> #define TUN_MINOR 200
> +#define VHOST_NET_MINOR 201
>
CC: alan@linux.intel.com
CC: device@lanana.org
include/linux/miscdevice.h:
/*
* These allocations are managed by device@lanana.org. If you use an
* entry that is not in assigned your entry may well be moved and
* reassigned, or set dynamic if a fixed value is not justified.
*/
#define MWAVE_MINOR 219 /* ACP/Mwave Modem */
> #define MPT_MINOR 220
> #define MPT2SAS_MINOR 221
>
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
>
[-- Attachment #1.2: Type: text/html, Size: 3664 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox