* Re: [Xen-devel] [PATCH] xen: remove CONFIG_XEN_DOM0 compile option
From: Andrew Jones @ 2012-01-09 10:37 UTC (permalink / raw)
To: Stefano Stabellini; +Cc: jeremy, xen-devel, konrad wilk, virtualization
In-Reply-To: <alpine.DEB.2.00.1201061648440.3150@kaball-desktop>
----- Original Message -----
> On Fri, 6 Jan 2012, Andrew Jones wrote:
> > XEN_DOM0 is a silent option that has been automatically selected
> > when
> > CONFIG_XEN is selected since 6b0661a5e6fbf. If this option was
> > changed
> > to a menu configurable option then it would only give users the
> > ability
> > to compile out about 100 kbytes of code.
>
> I think that it is less than that because backends are not tied to
> dom0
> in any way, even though currently they cannot be compiled without
> XEN_DOM0.
> Running a network or block backend in domU is a well known
> configuration.
> Therefore the code compiled out only amounts to about 10K.
>
>
> > diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> > index 8795480..0725228 100644
> > --- a/drivers/xen/Kconfig
> > +++ b/drivers/xen/Kconfig
> > @@ -78,8 +78,9 @@ config XEN_DEV_EVTCHN
> >
> > config XEN_BACKEND
> > bool "Backend driver support"
> > - depends on XEN_DOM0
> > default y
> > + depends on XEN && PCI_XEN && SWIOTLB_XEN
> > + depends on X86_LOCAL_APIC && X86_IO_APIC && ACPI && PCI
> > help
> > Support for backend device drivers that provide I/O services
> > to other virtual machines.
>
> I think we can trimmer the dependency list here: after all backends
> can
> be run in unpriviledged guests as well (see driver domains).
> So I think it should depend only on XEN.
Hmm.. Actually, with dependency lists in mind, I think a really messed
this patch up. I should have either had CONFIG_XEN inherit these deps,
or I should have spread them around to be required at only the specific
places they're needed, and then left the stub functions in. Neither of
those options seems like a good idea to me. We either force all XEN
guests to always have all the deps needed for an initial domain, which
is exactly the opposite of the suggestion above (trimming XEN_BACKEND
deps for driver domains), or we actually make the code messier and
more complicated with more #ifdefs, which are now neatly packaged with
XEN_DOM0.
The driver domain concept may actually be the technical need for
making XEN_DOM0 optional. If we want the ability to build a minimally
configed XEN kernel to be used as a driver domain, then it doesn't
seem like we'd want it to also be capable of running as an initial
domain, or to be forced to have all the dependencies and code of an
initial domain.
With that in mind, I propose we go back to my initial patch, relax
deps for XEN_BACKEND, and double check that each individual backend
has the appropriate minimal deps. In general, it should be a goal to
make sure most XEN options are menu selectable and that all dep lists
are minimized in order to make sure driver domains can be built with
minimal configs.
Drew
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>
^ permalink raw reply
* Re: [Xen-devel] [PATCH 2/4] xen kconfig: relax INPUT_XEN_KBDDEV_FRONTEND deps
From: Andrew Jones @ 2012-01-09 10:43 UTC (permalink / raw)
To: Dmitry Torokhov
Cc: jeremy, xen-devel, konrad wilk, FlorianSchandinat, virtualization,
Konrad Rzeszutek Wilk
In-Reply-To: <20120109075911.GA4049@core.coreip.homeip.net>
----- Original Message -----
> Hi Andrew,
>
> On Fri, Jan 06, 2012 at 10:58:06AM -0500, Andrew Jones wrote:
> >
> >
> > ----- Original Message -----
> > > On Fri, Jan 06, 2012 at 10:43:09AM +0100, Andrew Jones wrote:
> > > > PV-on-HVM guests may want to use the xen keyboard/mouse
> > > > frontend,
> > > > but
> > > > they don't use the xen frame buffer frontend. For this case it
> > > > doesn't
> > > > make much sense for INPUT_XEN_KBDDEV_FRONTEND to depend on
> > > > XEN_FBDEV_FRONTEND. The opposite direction always makes more
> > > > sense,
> > > > i.e.
> > > > if you're using xenfb, then you'll want xenkbd. Switch the
> > > > dependencies.
> > >
> > > You need to CC as well these people that have 'maintainer' field
> > > on
> > > them:
> > >
> > > konrad@phenom:~/work/linux$ scripts/get_maintainer.pl -f
> > > drivers/video/Kconfig
> > > Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
> > > (maintainer:FRAMEBUFFER LAYER)
> > > linux-fbdev@vger.kernel.org (open list:FRAMEBUFFER LAYER)
> > > linux-kernel@vger.kernel.org (open list)
> > > konrad@phenom:~/work/linux$ scripts/get_maintainer.pl -f
> > > drivers/input/misc/Kconfig
> > > Dmitry Torokhov <dmitry.torokhov@gmail.com> (maintainer:INPUT
> > > (KEYBOARD,...,commit_signer:9/16=56%)
> > > Samuel Ortiz <sameo@linux.intel.com> (commit_signer:3/16=19%)
> > > Anirudh Ghayal <aghayal@codeaurora.org> (commit_signer:2/16=12%)
> > > Peter Ujfalusi <peter.ujfalusi@ti.com> (commit_signer:2/16=12%)
> > > Alan Cox <alan@linux.intel.com> (commit_signer:2/16=12%)
> > > linux-input@vger.kernel.org (open list:INPUT (KEYBOARD,...)
> > > linux-kernel@vger.kernel.org (open list)
> > >
> >
> > Thanks. Replied with them in CC.
> >
> > Drew
> >
> > > >
> > > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > > ---
> > > > drivers/input/misc/Kconfig | 2 +-
> > > > drivers/video/Kconfig | 1 +
> > > > 2 files changed, 2 insertions(+), 1 deletions(-)
> > > >
> > > > diff --git a/drivers/input/misc/Kconfig
> > > > b/drivers/input/misc/Kconfig
> > > > index 22d875f..36c15bf 100644
> > > > --- a/drivers/input/misc/Kconfig
> > > > +++ b/drivers/input/misc/Kconfig
> > > > @@ -533,7 +533,7 @@ config INPUT_CMA3000_I2C
> > > >
> > > > config INPUT_XEN_KBDDEV_FRONTEND
> > > > tristate "Xen virtual keyboard and mouse support"
> > > > - depends on XEN_FBDEV_FRONTEND
> > > > + depends on XEN
>
> This is OK with me.
>
> > > > default y
> > > > select XEN_XENBUS_FRONTEND
> > > > help
> > > > diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
> > > > index d83e967..269b299 100644
> > > > --- a/drivers/video/Kconfig
> > > > +++ b/drivers/video/Kconfig
> > > > @@ -2269,6 +2269,7 @@ config XEN_FBDEV_FRONTEND
> > > > select FB_SYS_IMAGEBLIT
> > > > select FB_SYS_FOPS
> > > > select FB_DEFERRED_IO
> > > > + select INPUT_XEN_KBDDEV_FRONTEND
>
> But here you need to either depend on or select INPUT as select does
> not
> resolve dependencies for selected symbol.
>
Would I actually need 'select INPUT' and select 'INPUT_MISC'? Maybe
'depends on' would just be cleaner and safer. I'll send a V2.
Thanks,
Drew
> Thanks.
>
> --
> Dmitry
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>
^ permalink raw reply
* Re: [Xen-devel] [PATCH] xen: remove CONFIG_XEN_DOM0 compile option
From: Stefano Stabellini @ 2012-01-09 11:39 UTC (permalink / raw)
To: Andrew Jones
Cc: jeremy@goop.org, xen-devel@lists.xensource.com,
virtualization@lists.linux-foundation.org, konrad wilk,
Stefano Stabellini
In-Reply-To: <d6148ae9-932e-4fba-947e-ee70b2e25fd2@zmail13.collab.prod.int.phx2.redhat.com>
On Mon, 9 Jan 2012, Andrew Jones wrote:
> ----- Original Message -----
> > On Fri, 6 Jan 2012, Andrew Jones wrote:
> > > XEN_DOM0 is a silent option that has been automatically selected
> > > when
> > > CONFIG_XEN is selected since 6b0661a5e6fbf. If this option was
> > > changed
> > > to a menu configurable option then it would only give users the
> > > ability
> > > to compile out about 100 kbytes of code.
> >
> > I think that it is less than that because backends are not tied to
> > dom0
> > in any way, even though currently they cannot be compiled without
> > XEN_DOM0.
> > Running a network or block backend in domU is a well known
> > configuration.
> > Therefore the code compiled out only amounts to about 10K.
> >
> >
> > > diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> > > index 8795480..0725228 100644
> > > --- a/drivers/xen/Kconfig
> > > +++ b/drivers/xen/Kconfig
> > > @@ -78,8 +78,9 @@ config XEN_DEV_EVTCHN
> > >
> > > config XEN_BACKEND
> > > bool "Backend driver support"
> > > - depends on XEN_DOM0
> > > default y
> > > + depends on XEN && PCI_XEN && SWIOTLB_XEN
> > > + depends on X86_LOCAL_APIC && X86_IO_APIC && ACPI && PCI
> > > help
> > > Support for backend device drivers that provide I/O services
> > > to other virtual machines.
> >
> > I think we can trimmer the dependency list here: after all backends
> > can
> > be run in unpriviledged guests as well (see driver domains).
> > So I think it should depend only on XEN.
>
> Hmm.. Actually, with dependency lists in mind, I think a really messed
> this patch up. I should have either had CONFIG_XEN inherit these deps,
> or I should have spread them around to be required at only the specific
> places they're needed, and then left the stub functions in. Neither of
> those options seems like a good idea to me. We either force all XEN
> guests to always have all the deps needed for an initial domain, which
> is exactly the opposite of the suggestion above (trimming XEN_BACKEND
> deps for driver domains), or we actually make the code messier and
> more complicated with more #ifdefs, which are now neatly packaged with
> XEN_DOM0.
I don't think we should add "PCI_XEN && SWIOTLB_XEN && X86_LOCAL_APIC &&
X86_IO_APIC && ACPI && PCI" to XEN either.
However it should be possible to add only the right dependencies to the
right places.
In practice it should be sufficient to:
- make PCI_XEN depend on X86_LOCAL_APIC && X86_IO_APIC && ACPI;
- make XEN_PCIDEV_FRONTEND and XEN_PCIDEV_BACKEND depend on PCI_XEN;
- remove the 'ifdef CONFIG_ACPI' from arch/x86/pci/xen.c.
> The driver domain concept may actually be the technical need for
> making XEN_DOM0 optional. If we want the ability to build a minimally
> configed XEN kernel to be used as a driver domain, then it doesn't
> seem like we'd want it to also be capable of running as an initial
> domain, or to be forced to have all the dependencies and code of an
> initial domain.
In practice any decent driver domain needs PCI and ACPI support, so
basically it has the same config requirement of XEN_DOM0. At that point
we have already discussed that having XEN_DOM0 enabled or disabled
hardly makes any differences, so we can just get rid of it.
^ permalink raw reply
* Re: [Xen-devel] [PATCH] xen: remove CONFIG_XEN_DOM0 compile option
From: Konrad Rzeszutek Wilk @ 2012-01-09 14:56 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Andrew Jones, xen-devel@lists.xensource.com, jeremy@goop.org,
virtualization@lists.linux-foundation.org
In-Reply-To: <alpine.DEB.2.00.1201091054111.3150@kaball-desktop>
On Mon, Jan 09, 2012 at 11:39:44AM +0000, Stefano Stabellini wrote:
> On Mon, 9 Jan 2012, Andrew Jones wrote:
> > ----- Original Message -----
> > > On Fri, 6 Jan 2012, Andrew Jones wrote:
> > > > XEN_DOM0 is a silent option that has been automatically selected
> > > > when
> > > > CONFIG_XEN is selected since 6b0661a5e6fbf. If this option was
> > > > changed
> > > > to a menu configurable option then it would only give users the
> > > > ability
> > > > to compile out about 100 kbytes of code.
> > >
> > > I think that it is less than that because backends are not tied to
> > > dom0
> > > in any way, even though currently they cannot be compiled without
> > > XEN_DOM0.
> > > Running a network or block backend in domU is a well known
> > > configuration.
> > > Therefore the code compiled out only amounts to about 10K.
> > >
> > >
> > > > diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> > > > index 8795480..0725228 100644
> > > > --- a/drivers/xen/Kconfig
> > > > +++ b/drivers/xen/Kconfig
> > > > @@ -78,8 +78,9 @@ config XEN_DEV_EVTCHN
> > > >
> > > > config XEN_BACKEND
> > > > bool "Backend driver support"
> > > > - depends on XEN_DOM0
> > > > default y
> > > > + depends on XEN && PCI_XEN && SWIOTLB_XEN
> > > > + depends on X86_LOCAL_APIC && X86_IO_APIC && ACPI && PCI
> > > > help
> > > > Support for backend device drivers that provide I/O services
> > > > to other virtual machines.
> > >
> > > I think we can trimmer the dependency list here: after all backends
> > > can
> > > be run in unpriviledged guests as well (see driver domains).
> > > So I think it should depend only on XEN.
> >
> > Hmm.. Actually, with dependency lists in mind, I think a really messed
> > this patch up. I should have either had CONFIG_XEN inherit these deps,
> > or I should have spread them around to be required at only the specific
> > places they're needed, and then left the stub functions in. Neither of
> > those options seems like a good idea to me. We either force all XEN
> > guests to always have all the deps needed for an initial domain, which
> > is exactly the opposite of the suggestion above (trimming XEN_BACKEND
> > deps for driver domains), or we actually make the code messier and
> > more complicated with more #ifdefs, which are now neatly packaged with
> > XEN_DOM0.
>
> I don't think we should add "PCI_XEN && SWIOTLB_XEN && X86_LOCAL_APIC &&
> X86_IO_APIC && ACPI && PCI" to XEN either.
> However it should be possible to add only the right dependencies to the
> right places.
> In practice it should be sufficient to:
>
> - make PCI_XEN depend on X86_LOCAL_APIC && X86_IO_APIC && ACPI;
Not good. You can make non-ACPI builds with PCI.
.. and you are missing CONFIG_PCI in there.
>
> - make XEN_PCIDEV_FRONTEND and XEN_PCIDEV_BACKEND depend on PCI_XEN;
OK. That sounds good.
>
> - remove the 'ifdef CONFIG_ACPI' from arch/x86/pci/xen.c.
No.. same issue - non-ACPI builds can be with PCI.
>
>
>
> > The driver domain concept may actually be the technical need for
> > making XEN_DOM0 optional. If we want the ability to build a minimally
> > configed XEN kernel to be used as a driver domain, then it doesn't
> > seem like we'd want it to also be capable of running as an initial
> > domain, or to be forced to have all the dependencies and code of an
> > initial domain.
>
> In practice any decent driver domain needs PCI and ACPI support, so
> basically it has the same config requirement of XEN_DOM0. At that point
Sure.. for backends. But for frontends that requirement is not always true.
> we have already discussed that having XEN_DOM0 enabled or disabled
> hardly makes any differences, so we can just get rid of it.
Or in other words, substitute it as CONFIG_PCI_XEN.
But this brings a question about MCE, ACPI cpu freq and ACPI S3.
Those all belong to the dom0 - not to any driver domain. So getting rid
of CONFIG_XEN_DOM0 would present a problem - what would those depend on?
^ permalink raw reply
* Re: [Xen-devel] [PATCH] xen: remove CONFIG_XEN_DOM0 compile option
From: Stefano Stabellini @ 2012-01-09 16:12 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: Andrew Jones, xen-devel@lists.xensource.com,
virtualization@lists.linux-foundation.org, jeremy@goop.org,
Stefano Stabellini
In-Reply-To: <20120109145648.GA25563@phenom.dumpdata.com>
On Mon, 9 Jan 2012, Konrad Rzeszutek Wilk wrote:
> On Mon, Jan 09, 2012 at 11:39:44AM +0000, Stefano Stabellini wrote:
> > I don't think we should add "PCI_XEN && SWIOTLB_XEN && X86_LOCAL_APIC &&
> > X86_IO_APIC && ACPI && PCI" to XEN either.
> > However it should be possible to add only the right dependencies to the
> > right places.
> > In practice it should be sufficient to:
> >
> > - make PCI_XEN depend on X86_LOCAL_APIC && X86_IO_APIC && ACPI;
>
> Not good. You can make non-ACPI builds with PCI.
>
> .. and you are missing CONFIG_PCI in there.
> >
> > - make XEN_PCIDEV_FRONTEND and XEN_PCIDEV_BACKEND depend on PCI_XEN;
>
> OK. That sounds good.
> >
> > - remove the 'ifdef CONFIG_ACPI' from arch/x86/pci/xen.c.
>
> No.. same issue - non-ACPI builds can be with PCI.
> >
Right.. in that case we should carefully replace the 'ifdef
CONFIG_XEN_DOM0' with 'ifdef CONFIG_ACPI' in arch/x86/pci/xen.c.
It would effectively keep things as they are now: if ACPI and XEN are
enabled together in the config file, your kernel is able to setup
interrupts when running as DOM0.
Regarding X86_LOCAL_APIC and X86_IO_APIC I cannot recall anymore where
these dependencies come from exactly.
> > > The driver domain concept may actually be the technical need for
> > > making XEN_DOM0 optional. If we want the ability to build a minimally
> > > configed XEN kernel to be used as a driver domain, then it doesn't
> > > seem like we'd want it to also be capable of running as an initial
> > > domain, or to be forced to have all the dependencies and code of an
> > > initial domain.
> >
> > In practice any decent driver domain needs PCI and ACPI support, so
> > basically it has the same config requirement of XEN_DOM0. At that point
>
> Sure.. for backends. But for frontends that requirement is not always true.
Right but a driver domain is bound to have at least the pci backend.
> > we have already discussed that having XEN_DOM0 enabled or disabled
> > hardly makes any differences, so we can just get rid of it.
>
> Or in other words, substitute it as CONFIG_PCI_XEN.
I was just trying to assign the dependencies where they really belong
and I proposed to remove the 'ifdef CONFIG_ACPI' because I didn't
realize that one can build working PCI configurations on XEN without it.
The fact that PCI_XEN would be used then almost as XEN_DOM0 is used now
is just a side effect, due to the fact that PCI and device interrupts
initialization is what mainly differentiates dom0 from other
configurations.
> But this brings a question about MCE, ACPI cpu freq and ACPI S3.
> Those all belong to the dom0 - not to any driver domain. So getting rid
> of CONFIG_XEN_DOM0 would present a problem - what would those depend on?
They would depend on XEN and whatever else they really depend on, for
example ACPI.
^ permalink raw reply
* Re: [Xen-devel] [PATCH] xen: remove CONFIG_XEN_DOM0 compile option
From: Konrad Rzeszutek Wilk @ 2012-01-09 17:04 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Andrew Jones, xen-devel@lists.xensource.com, jeremy@goop.org,
virtualization@lists.linux-foundation.org
In-Reply-To: <alpine.DEB.2.00.1201091532420.3150@kaball-desktop>
On Mon, Jan 09, 2012 at 04:12:10PM +0000, Stefano Stabellini wrote:
> On Mon, 9 Jan 2012, Konrad Rzeszutek Wilk wrote:
> > On Mon, Jan 09, 2012 at 11:39:44AM +0000, Stefano Stabellini wrote:
> > > I don't think we should add "PCI_XEN && SWIOTLB_XEN && X86_LOCAL_APIC &&
> > > X86_IO_APIC && ACPI && PCI" to XEN either.
> > > However it should be possible to add only the right dependencies to the
> > > right places.
> > > In practice it should be sufficient to:
> > >
> > > - make PCI_XEN depend on X86_LOCAL_APIC && X86_IO_APIC && ACPI;
> >
> > Not good. You can make non-ACPI builds with PCI.
> >
> > .. and you are missing CONFIG_PCI in there.
> > >
> > > - make XEN_PCIDEV_FRONTEND and XEN_PCIDEV_BACKEND depend on PCI_XEN;
> >
> > OK. That sounds good.
> > >
> > > - remove the 'ifdef CONFIG_ACPI' from arch/x86/pci/xen.c.
> >
> > No.. same issue - non-ACPI builds can be with PCI.
> > >
>
> Right.. in that case we should carefully replace the 'ifdef
> CONFIG_XEN_DOM0' with 'ifdef CONFIG_ACPI' in arch/x86/pci/xen.c.
.. which I think I did at some point as part of cleanup and then
removed them since it peppered the xen.c with tons of #ifdef CONFIG_ACPI.
What about PVonHVM. There is this nagging feeling in the back of my
head that turning CONFIG_XEN_DOM0 disables the CONFIG_PVHVM option?
Or something like that? Maybe it is the other way around?
It would seem we need to seperate the PVHVM from DOM0. But the extra
complexity comes with Mukesh's PVonHVM Hybrid which can do dom0 stuff
with PVonHVM (should be searchable on xen-devel to find the patches).
Ian also mentioned that we MUST have the XEN_PRIVILIGED option, otherwise
we will cause a regression with the GRUB tools. So that must stay or we need
provide a deprecation schedule for the next 3 years to remove it and
have patches in the GRUB toolchains.
Either way, there is also the question of making sure that no regressions
are introduced - so a lot of the CONFIG_* interdependencies will have
to be checked. I think that means checking CONFIG_XEN, CONFIG_PRIVILIGED,
CONFIG_PVHVM, CONFIG_PCI, CONFIG_ACPI, CONFIG_XEN_BACKEND in various combinations.
Oh, I forgot CONFIG_XEN_FRONTEND.. And I think that one can also become
a module (ditto for XEN_BACKEND)
And everytime we did something to those we managed to mess them up so
we should be dilligient.
^ permalink raw reply
* Re: [Xen-devel] [PATCH] xen: remove CONFIG_XEN_DOM0 compile option
From: Andrew Jones @ 2012-01-09 17:26 UTC (permalink / raw)
To: Stefano Stabellini
Cc: jeremy, xen-devel, Konrad Rzeszutek Wilk, virtualization
In-Reply-To: <alpine.DEB.2.00.1201091532420.3150@kaball-desktop>
----- Original Message -----
> On Mon, 9 Jan 2012, Konrad Rzeszutek Wilk wrote:
> > On Mon, Jan 09, 2012 at 11:39:44AM +0000, Stefano Stabellini wrote:
> > > I don't think we should add "PCI_XEN && SWIOTLB_XEN &&
> > > X86_LOCAL_APIC &&
> > > X86_IO_APIC && ACPI && PCI" to XEN either.
> > > However it should be possible to add only the right dependencies
> > > to the
> > > right places.
> > > In practice it should be sufficient to:
> > >
> > > - make PCI_XEN depend on X86_LOCAL_APIC && X86_IO_APIC && ACPI;
> >
> > Not good. You can make non-ACPI builds with PCI.
> >
> > .. and you are missing CONFIG_PCI in there.
> > >
> > > - make XEN_PCIDEV_FRONTEND and XEN_PCIDEV_BACKEND depend on
> > > PCI_XEN;
> >
> > OK. That sounds good.
> > >
> > > - remove the 'ifdef CONFIG_ACPI' from arch/x86/pci/xen.c.
> >
> > No.. same issue - non-ACPI builds can be with PCI.
> > >
>
> Right.. in that case we should carefully replace the 'ifdef
> CONFIG_XEN_DOM0' with 'ifdef CONFIG_ACPI' in arch/x86/pci/xen.c.
> It would effectively keep things as they are now: if ACPI and XEN are
> enabled together in the config file, your kernel is able to setup
> interrupts when running as DOM0.
> Regarding X86_LOCAL_APIC and X86_IO_APIC I cannot recall anymore
> where
> these dependencies come from exactly.
>
Here's one path
pci_xen_initial_domain [arch/x86/pci/xen.c] (#ifdef XEN_DOM0)
xen_setup_acpi_sci [arch/x86/pci/xen.c] (#ifdef XEN_DOM0)
acpi_get_override_irq [arch/x86/kernel/apic/io_apic.c]
(#ifdef X86_IO_APIC)
The messiness I stated before comes in when we try to find all these
paths and make sure they're appropriately #ifdef'ed with the minimal
set of symbols from the set that XEN_DOM0 used to cover for us.
However, without alternative implementations in the absence of
particular config options, then we still need to stub the functions out
at the top level. So we could simply s/r XEN_DOM0 with X86_LOCAL_APIC
&& X86_IO_APIC && ACPI in arch/x86/pci/xen.c, but that doesn't make
much sense either, because XEN_DOM0 does a nice job keeping things neat
and well documented, i.e. we can quickly tell what functions are
dom0-only.
>
> > > > The driver domain concept may actually be the technical need
> > > > for
> > > > making XEN_DOM0 optional. If we want the ability to build a
> > > > minimally
> > > > configed XEN kernel to be used as a driver domain, then it
> > > > doesn't
> > > > seem like we'd want it to also be capable of running as an
> > > > initial
> > > > domain, or to be forced to have all the dependencies and code
> > > > of an
> > > > initial domain.
> > >
> > > In practice any decent driver domain needs PCI and ACPI support,
> > > so
> > > basically it has the same config requirement of XEN_DOM0. At that
> > > point
> >
> > Sure.. for backends. But for frontends that requirement is not
> > always true.
>
> Right but a driver domain is bound to have at least the pci backend.
>
>
> > > we have already discussed that having XEN_DOM0 enabled or
> > > disabled
> > > hardly makes any differences, so we can just get rid of it.
> >
> > Or in other words, substitute it as CONFIG_PCI_XEN.
>
> I was just trying to assign the dependencies where they really belong
> and I proposed to remove the 'ifdef CONFIG_ACPI' because I didn't
> realize that one can build working PCI configurations on XEN without
> it.
>
> The fact that PCI_XEN would be used then almost as XEN_DOM0 is used
> now
> is just a side effect, due to the fact that PCI and device interrupts
> initialization is what mainly differentiates dom0 from other
> configurations.
>
>
> > But this brings a question about MCE, ACPI cpu freq and ACPI S3.
> > Those all belong to the dom0 - not to any driver domain. So getting
> > rid
> > of CONFIG_XEN_DOM0 would present a problem - what would those
> > depend on?
>
> They would depend on XEN and whatever else they really depend on, for
> example ACPI.
This brings up my comment before about me not being the best person
to suggest removing XEN_DOM0. It appears this symbol is fairly useful
even now for the reasons I mentioned above, and future work may once
again find it the cleanest way to manage dependencies. Possibly
dependencies will even expand to a point where it sufficiently
diverges from the requirements of driver domains. Then users may like
to be able to configure it out. I don't know.
Drew
^ permalink raw reply
* Re: [Xen-devel] [PATCH] xen: remove CONFIG_XEN_DOM0 compile option
From: Andrew Jones @ 2012-01-09 17:38 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: jeremy, xen-devel, Stefano Stabellini, virtualization
In-Reply-To: <20120109170418.GD28700@phenom.dumpdata.com>
----- Original Message -----
> On Mon, Jan 09, 2012 at 04:12:10PM +0000, Stefano Stabellini wrote:
> > On Mon, 9 Jan 2012, Konrad Rzeszutek Wilk wrote:
> > > On Mon, Jan 09, 2012 at 11:39:44AM +0000, Stefano Stabellini
> > > wrote:
> > > > I don't think we should add "PCI_XEN && SWIOTLB_XEN &&
> > > > X86_LOCAL_APIC &&
> > > > X86_IO_APIC && ACPI && PCI" to XEN either.
> > > > However it should be possible to add only the right
> > > > dependencies to the
> > > > right places.
> > > > In practice it should be sufficient to:
> > > >
> > > > - make PCI_XEN depend on X86_LOCAL_APIC && X86_IO_APIC && ACPI;
> > >
> > > Not good. You can make non-ACPI builds with PCI.
> > >
> > > .. and you are missing CONFIG_PCI in there.
> > > >
> > > > - make XEN_PCIDEV_FRONTEND and XEN_PCIDEV_BACKEND depend on
> > > > PCI_XEN;
> > >
> > > OK. That sounds good.
> > > >
> > > > - remove the 'ifdef CONFIG_ACPI' from arch/x86/pci/xen.c.
> > >
> > > No.. same issue - non-ACPI builds can be with PCI.
> > > >
> >
> > Right.. in that case we should carefully replace the 'ifdef
> > CONFIG_XEN_DOM0' with 'ifdef CONFIG_ACPI' in arch/x86/pci/xen.c.
>
> .. which I think I did at some point as part of cleanup and then
> removed them since it peppered the xen.c with tons of #ifdef
> CONFIG_ACPI.
>
> What about PVonHVM. There is this nagging feeling in the back of my
> head that turning CONFIG_XEN_DOM0 disables the CONFIG_PVHVM option?
> Or something like that? Maybe it is the other way around?
>
> It would seem we need to seperate the PVHVM from DOM0. But the extra
> complexity comes with Mukesh's PVonHVM Hybrid which can do dom0 stuff
> with PVonHVM (should be searchable on xen-devel to find the patches).
They're currently separate, i.e. no problem with DOM0 off and PVHVM on,
or vice-versa afaict. I don't know anything about the hybrid though.
>
> Ian also mentioned that we MUST have the XEN_PRIVILIGED option,
> otherwise
> we will cause a regression with the GRUB tools. So that must stay or
> we need
> provide a deprecation schedule for the next 3 years to remove it and
> have patches in the GRUB toolchains.
Yeah, that's a bummer.
>
> Either way, there is also the question of making sure that no
> regressions
> are introduced - so a lot of the CONFIG_* interdependencies will have
> to be checked. I think that means checking CONFIG_XEN,
> CONFIG_PRIVILIGED,
> CONFIG_PVHVM, CONFIG_PCI, CONFIG_ACPI, CONFIG_XEN_BACKEND in various
> combinations.
>
I guess if we did the s/XEN_DOM0/LOCAL_APIC && IO_APIC && ACPI/ in
arch/x86/pci/xen.c it would be pretty easy to review for equivalence.
Then keep CONFIG_PRIVILIGED, but drop XEN_DOM0 from everywhere else and
compile in the 3 files. I don't think it makes much sense to do it
though. XEN_DOM0 keeps things tidier now and might be useful later.
> Oh, I forgot CONFIG_XEN_FRONTEND.. And I think that one can also
> become
> a module (ditto for XEN_BACKEND)
>
> And everytime we did something to those we managed to mess them up so
> we should be dilligient.
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>
^ permalink raw reply
* [PATCH 2/4 v2] xen kconfig: relax INPUT_XEN_KBDDEV_FRONTEND deps
From: Andrew Jones @ 2012-01-09 17:51 UTC (permalink / raw)
To: xen-devel
Cc: jeremy, virtualization, dmitry.torokhov, FlorianSchandinat,
konrad.wilk
In-Reply-To: <725950ad-d5cf-4ddb-9870-a5c8d75cfa51@zmail13.collab.prod.int.phx2.redhat.com>
PV-on-HVM guests may want to use the xen keyboard/mouse frontend, but
they don't use the xen frame buffer frontend. For this case it doesn't
make much sense for INPUT_XEN_KBDDEV_FRONTEND to depend on
XEN_FBDEV_FRONTEND. The opposite direction always makes more sense, i.e.
if you're using xenfb, then you'll want xenkbd. Switch the dependencies.
Signed-off-by: Andrew Jones <drjones@redhat.com>
---
drivers/input/misc/Kconfig | 2 +-
drivers/video/Kconfig | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/input/misc/Kconfig b/drivers/input/misc/Kconfig
index 22d875f..36c15bf 100644
--- a/drivers/input/misc/Kconfig
+++ b/drivers/input/misc/Kconfig
@@ -533,7 +533,7 @@ config INPUT_CMA3000_I2C
config INPUT_XEN_KBDDEV_FRONTEND
tristate "Xen virtual keyboard and mouse support"
- depends on XEN_FBDEV_FRONTEND
+ depends on XEN
default y
select XEN_XENBUS_FRONTEND
help
diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
index d83e967..3e38c2f 100644
--- a/drivers/video/Kconfig
+++ b/drivers/video/Kconfig
@@ -2263,7 +2263,7 @@ config FB_VIRTUAL
config XEN_FBDEV_FRONTEND
tristate "Xen virtual frame buffer support"
- depends on FB && XEN
+ depends on FB && XEN && INPUT_XEN_KBDDEV_FRONTEND
select FB_SYS_FILLRECT
select FB_SYS_COPYAREA
select FB_SYS_IMAGEBLIT
--
1.7.7.5
^ permalink raw reply related
* [PATCH 3/4 v2] xen kconfig: add dom0 support help text
From: Andrew Jones @ 2012-01-09 18:07 UTC (permalink / raw)
To: xen-devel
Cc: jeremy, virtualization, Ian.Campbell, stefano.stabellini,
konrad.wilk
In-Reply-To: <1325842991-4404-4-git-send-email-drjones@redhat.com>
Describe dom0 support in the config menu and supply help text for it.
v2 adds 'if EXPERT' to keep it out of the "standard" menu.
Signed-off-by: Andrew Jones <drjones@redhat.com>
---
arch/x86/xen/Kconfig | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 26c731a..31ec975 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -14,9 +14,14 @@ config XEN
Xen hypervisor.
config XEN_DOM0
- def_bool y
+ bool "Xen Initial Domain (Dom0) support" if EXPERT
+ default y
depends on XEN && PCI_XEN && SWIOTLB_XEN
depends on X86_LOCAL_APIC && X86_IO_APIC && ACPI && PCI
+ help
+ This allows the kernel to be used for the initial Xen domain,
+ Domain0. This is a privileged guest that supplies backends
+ and is used to manage the other Xen domains.
# Dummy symbol since people have come to rely on the PRIVILEGED_GUEST
# name in tools.
--
1.7.7.5
^ permalink raw reply related
* Re: [Xen-devel] [PATCH] xen: remove CONFIG_XEN_DOM0 compile option
From: Stefano Stabellini @ 2012-01-09 18:12 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: Andrew Jones, xen-devel@lists.xensource.com,
virtualization@lists.linux-foundation.org, jeremy@goop.org,
Stefano Stabellini
In-Reply-To: <20120109170418.GD28700@phenom.dumpdata.com>
On Mon, 9 Jan 2012, Konrad Rzeszutek Wilk wrote:
> On Mon, Jan 09, 2012 at 04:12:10PM +0000, Stefano Stabellini wrote:
> > On Mon, 9 Jan 2012, Konrad Rzeszutek Wilk wrote:
> > > On Mon, Jan 09, 2012 at 11:39:44AM +0000, Stefano Stabellini wrote:
> > > > I don't think we should add "PCI_XEN && SWIOTLB_XEN && X86_LOCAL_APIC &&
> > > > X86_IO_APIC && ACPI && PCI" to XEN either.
> > > > However it should be possible to add only the right dependencies to the
> > > > right places.
> > > > In practice it should be sufficient to:
> > > >
> > > > - make PCI_XEN depend on X86_LOCAL_APIC && X86_IO_APIC && ACPI;
> > >
> > > Not good. You can make non-ACPI builds with PCI.
> > >
> > > .. and you are missing CONFIG_PCI in there.
> > > >
> > > > - make XEN_PCIDEV_FRONTEND and XEN_PCIDEV_BACKEND depend on PCI_XEN;
> > >
> > > OK. That sounds good.
> > > >
> > > > - remove the 'ifdef CONFIG_ACPI' from arch/x86/pci/xen.c.
> > >
> > > No.. same issue - non-ACPI builds can be with PCI.
> > > >
> >
> > Right.. in that case we should carefully replace the 'ifdef
> > CONFIG_XEN_DOM0' with 'ifdef CONFIG_ACPI' in arch/x86/pci/xen.c.
>
> .. which I think I did at some point as part of cleanup and then
> removed them since it peppered the xen.c with tons of #ifdef CONFIG_ACPI.
that's unfortunate
> What about PVonHVM. There is this nagging feeling in the back of my
> head that turning CONFIG_XEN_DOM0 disables the CONFIG_PVHVM option?
> Or something like that? Maybe it is the other way around?
nope, nothing I can think of.
> It would seem we need to seperate the PVHVM from DOM0. But the extra
> complexity comes with Mukesh's PVonHVM Hybrid which can do dom0 stuff
> with PVonHVM (should be searchable on xen-devel to find the patches).
I think that at the moment PVHVM and DOM0 aren't tie to one another in
any way, apart from the fact that they use the same generic
infrastructure to remap interrupts and MSIs into event channels.
^ permalink raw reply
* Re: [Xen-devel] [PATCH] xen: remove CONFIG_XEN_DOM0 compile option
From: Stefano Stabellini @ 2012-01-09 18:44 UTC (permalink / raw)
To: Andrew Jones
Cc: jeremy@goop.org, xen-devel@lists.xensource.com,
virtualization@lists.linux-foundation.org, Stefano Stabellini,
Konrad Rzeszutek Wilk
In-Reply-To: <3f35c565-bcaf-49d2-ba46-c61ca95aa51b@zmail13.collab.prod.int.phx2.redhat.com>
On Mon, 9 Jan 2012, Andrew Jones wrote:
> I guess if we did the s/XEN_DOM0/LOCAL_APIC && IO_APIC && ACPI/ in
> arch/x86/pci/xen.c it would be pretty easy to review for equivalence.
> Then keep CONFIG_PRIVILIGED, but drop XEN_DOM0 from everywhere else and
> compile in the 3 files. I don't think it makes much sense to do it
> though. XEN_DOM0 keeps things tidier now and might be useful later.
we can keep things clean with the following:
#ifdef CONFIG_LOCAL_APIC && CONFIG_IO_APIC && CONFIG_ACPI && CONFIG_PCI_XEN
#define XEN_DOM0
#endif
in include/xen/xen.h.
So in the source files we can still '#ifdef XEN_DOM0', but at the same
time we can get rid of the build symbol: everybody wins.
^ permalink raw reply
* Re: [Xen-devel] [PATCH] xen: remove CONFIG_XEN_DOM0 compile option
From: Jeremy Fitzhardinge @ 2012-01-10 0:57 UTC (permalink / raw)
To: Stefano Stabellini
Cc: Andrew Jones, xen-devel@lists.xensource.com,
virtualization@lists.linux-foundation.org, Konrad Rzeszutek Wilk
In-Reply-To: <alpine.DEB.2.00.1201091838490.3150@kaball-desktop>
On 01/10/2012 05:44 AM, Stefano Stabellini wrote:
> On Mon, 9 Jan 2012, Andrew Jones wrote:
>> I guess if we did the s/XEN_DOM0/LOCAL_APIC && IO_APIC && ACPI/ in
>> arch/x86/pci/xen.c it would be pretty easy to review for equivalence.
>> Then keep CONFIG_PRIVILIGED, but drop XEN_DOM0 from everywhere else and
>> compile in the 3 files. I don't think it makes much sense to do it
>> though. XEN_DOM0 keeps things tidier now and might be useful later.
> we can keep things clean with the following:
>
> #ifdef CONFIG_LOCAL_APIC && CONFIG_IO_APIC && CONFIG_ACPI && CONFIG_PCI_XEN
>
> #define XEN_DOM0
>
> #endif
>
> in include/xen/xen.h.
>
> So in the source files we can still '#ifdef XEN_DOM0', but at the same
> time we can get rid of the build symbol: everybody wins.
No, really, I think this is silly. This kind of dependency information
should be encoded in the Kconfig system, and not reimplemented in an
ad-hoc way.
If there were a clean way to do what Andrew wants then I'd support it,
but this thread has descended into a spiral of madness, which makes me
think its a lost cause.
If the root complaint is that "customers think that anything set in
.config is a supported feature", then the solutions are to support all
the features in .config, re-educate the customers that they're wrong, or
maintain a local patch to do this stuff.
J
^ permalink raw reply
* Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
From: Michael S. Tsirkin @ 2012-01-10 17:03 UTC (permalink / raw)
To: Rusty Russell
Cc: Christian Borntraeger, Sasha Levin, Pawel Moll, virtualization
In-Reply-To: <878vm6daqy.fsf@rustcorp.com.au>
On Wed, Dec 21, 2011 at 11:03:25AM +1030, Rusty Russell wrote:
> On Tue, 20 Dec 2011 13:37:18 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Mon, Dec 19, 2011 at 09:38:42PM +1030, Rusty Russell wrote:
> > > On Mon, 19 Dec 2011 11:13:26 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >
> > > Actually, the host doesn't need anything, since it can always lock out
> > > the guest while it updates the area.
> > > It's the guest which can't do atomic updates.
> >
> > There are 2 cases
> > - updates by guest accesses by host
> > - accesses by guest updates by host
> >
> > Both are problematic because the guest accesses are split.
> > Consider the example I gave at the beginning was with capacity read
> > by guest. Host can not solve it without guest changes, right?
>
> Indeed, my brain fart. Let's pretend I didn't say that, and you didn't
> have to explain it to me in baby words :)
>
> > > > The counter can be in guest memory, right? So we don't pay extra
> > > > exits.
> > >
> > > Could be, but I'm not delighted about the design.
> >
> > OK, so this is an argument for always using a control vq, right?
>
> Yes. The idea that we can alter fields in the device-specific config
> area is flawed. There may be cases where it doesn't matter, but as an
> idea it was holed to begin with.
>
> We can reduce probability by doing a double read to check, but there are
> still cases where it will fail.
Okay - want me to propose an interface for that?
^ permalink raw reply
* [PATCH v2 0/3] virtio_net: Better low memory handling.
From: Mike Waychison @ 2012-01-10 17:40 UTC (permalink / raw)
To: Rusty Russell, Michael S. Tsirkin
Cc: netdev, earhart, digitaleric, linux-kernel, virtualization
The following series applies to net-next.
The following series changes the low memory paths in virtio_net to not
disable NAPI while waiting in the allocator in the slow path.
It attempts to rectify some performance problems we've seen where the
network performance drops significantly when memory is low. The working
theory is that the disabling of NAPI while allocations are occuring in
the slow refill_work() path can in turn cause significant delays in
processing of the receive queue.
The observed situation on an unmodified kernel is that a system with low
memory will in turn quickly stop being being able to refill the rx queue
with buffers, in turn causing packets to be dropped by the host OS (as
NAPI polling is disabled) while the driver wait for memory to be
reclaimed. The way the process-context refill loop works, the memory
subsystem is effectively polled every half second when things look
bleak, leading to significant packet loss and congestion window
collapse.
The first patch prepares for a batched refill path by splitting the
receive buffer allocation and enqueue bits into separate functions.
The second patch introduces the batched refill path itself. As
implemented, it still disables NAPI completely for the whole refill.
The third patch then pushed the disabling and enabling of NAPI down into
the refill loop so that it only covers the actual enqueueing of receive
buffers on the virtqueue.
This patchset seems to help the test setups that I'm playing with.
Example tests include using several bit-torrent clients within a VM and
watching the network throughputs on the host. Other similar workloads
(high network traffic, reasonably high disk IO causing memory pressure)
also appear to have throughput problems alleviated by these changes.
As a warning, I've only tested this using the "small buffers" and
"mergeable" buffers paths. I haven't figured out how to force qemu-kvm
to not support the feature bits that trigger 'mergable' receive buffers.
ChangeLog:
==========
v2:
- Rewritten to not introduce a spinlock to protect the virtqueue data.
---
Mike Waychison (3):
virtio_net: Split receive buffer alloc/add
virtio_net: Batch receive buffer filling
virtio_net: Don't disable NAPI while allocating
drivers/net/virtio_net.c | 325 ++++++++++++++++++++++++++++++++++++++++------
1 files changed, 285 insertions(+), 40 deletions(-)
^ permalink raw reply
* [PATCH v2 1/3] virtio_net: Split receive buffer alloc/add
From: Mike Waychison @ 2012-01-10 17:41 UTC (permalink / raw)
To: Rusty Russell, Michael S. Tsirkin
Cc: netdev, earhart, digitaleric, linux-kernel, virtualization
In-Reply-To: <20120110174052.4505.66514.stgit@mike2.sea.corp.google.com>
In preparation for allocating receive buffers in the slow path without
disabling NAPI, split the allocation and addition of receive buffers
apart into two separate functions (per receive buffer type).
While here, move the vi->num accounting into the add functions.
Signed-off-by: Mike Waychison <mikew@google.com>
---
drivers/net/virtio_net.c | 150 +++++++++++++++++++++++++++++++++++-----------
1 files changed, 113 insertions(+), 37 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 76fe14e..5531089 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -353,17 +353,35 @@ frame_err:
dev_kfree_skb(skb);
}
-static int add_recvbuf_small(struct virtnet_info *vi, gfp_t gfp)
+/*
+ * Allocate an skb for "small" receive buffer configurations.
+ * May return NULL if oom.
+ * No serialization required.
+ */
+static struct sk_buff *alloc_recvbuf_small(struct virtnet_info *vi, gfp_t gfp)
{
struct sk_buff *skb;
- struct skb_vnet_hdr *hdr;
- int err;
skb = __netdev_alloc_skb_ip_align(vi->dev, MAX_PACKET_LEN, gfp);
if (unlikely(!skb))
- return -ENOMEM;
+ return NULL;
skb_put(skb, MAX_PACKET_LEN);
+ return skb;
+}
+
+/*
+ * Add a skb to the receive queue for "small" receive buffer configurations.
+ * Returns the number of virtqueue slots left free on success, negative errno
+ * otherwise.
+ * Always consumes skb.
+ * Must be serialized with the napi poll path.
+ */
+static int add_recvbuf_small(struct virtnet_info *vi, struct sk_buff *skb,
+ gfp_t gfp)
+{
+ struct skb_vnet_hdr *hdr;
+ int err;
hdr = skb_vnet_hdr(skb);
sg_set_buf(vi->rx_sg, &hdr->hdr, sizeof hdr->hdr);
@@ -373,36 +391,54 @@ static int add_recvbuf_small(struct virtnet_info *vi, gfp_t gfp)
err = virtqueue_add_buf_gfp(vi->rvq, vi->rx_sg, 0, 2, skb, gfp);
if (err < 0)
dev_kfree_skb(skb);
+ else
+ vi->num++;
return err;
}
-static int add_recvbuf_big(struct virtnet_info *vi, gfp_t gfp)
+/*
+ * Allocate an list of pages for "big" receive buffer configurations.
+ * Pages are chained through ->private.
+ * May return null if oom.
+ * No serialization required.
+ */
+static struct page *alloc_recvbuf_big(struct virtnet_info *vi, gfp_t gfp)
{
- struct page *first, *list = NULL;
- char *p;
- int i, err, offset;
+ struct page *first, *tail = NULL;
+ int i;
- /* page in vi->rx_sg[MAX_SKB_FRAGS + 1] is list tail */
- for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
+ /* Build a list of pages chained through ->private. Built in reverse order */
+ for (i = 0; i < MAX_SKB_FRAGS + 1; ++i) {
first = get_a_page(vi, gfp);
if (!first) {
- if (list)
- give_pages(vi, list);
- return -ENOMEM;
+ if (tail)
+ give_pages(vi, tail);
+ return NULL;
}
- sg_set_buf(&vi->rx_sg[i], page_address(first), PAGE_SIZE);
- /* chain new page in list head to match sg */
- first->private = (unsigned long)list;
- list = first;
+ /* chain new page in list head */
+ first->private = (unsigned long)tail;
+ tail = first;
}
+ return first;
+}
+
+/*
+ * Add a chain of pages to the receive queue for "big" receive buffer
+ * configurations.
+ * Returns the number of virtqueue slots left free on success, negative errno
+ * otherwise.
+ * Always consumes the entire chain of pages.
+ * Must be serialized with the napi poll path.
+ */
+static int add_recvbuf_big(struct virtnet_info *vi, struct page *first,
+ gfp_t gfp)
+{
+ struct page *page;
+ char *p;
+ int i, err, offset;
- first = get_a_page(vi, gfp);
- if (!first) {
- give_pages(vi, list);
- return -ENOMEM;
- }
p = page_address(first);
/* vi->rx_sg[0], vi->rx_sg[1] share the same page */
@@ -413,30 +449,55 @@ static int add_recvbuf_big(struct virtnet_info *vi, gfp_t gfp)
offset = sizeof(struct padded_vnet_hdr);
sg_set_buf(&vi->rx_sg[1], p + offset, PAGE_SIZE - offset);
- /* chain first in list head */
- first->private = (unsigned long)list;
+ /* Chain in the rest of the pages */
+ i = 2; /* Offset to insert further pages into the sg */
+ page = (struct page *)first->private;
+ while (page) {
+ sg_set_buf(&vi->rx_sg[i], page_address(page), PAGE_SIZE);
+ page = (struct page *)page->private;
+ i++;
+ }
+
err = virtqueue_add_buf_gfp(vi->rvq, vi->rx_sg, 0, MAX_SKB_FRAGS + 2,
first, gfp);
if (err < 0)
give_pages(vi, first);
+ else
+ vi->num++;
return err;
}
-static int add_recvbuf_mergeable(struct virtnet_info *vi, gfp_t gfp)
+/*
+ * Allocate a page for "mergeable" receive buffer configurations.
+ * May return NULL if oom.
+ * No serialization required.
+ */
+static struct page *alloc_recvbuf_mergeable(struct virtnet_info *vi, gfp_t gfp)
+{
+ return get_a_page(vi, gfp);
+}
+
+/*
+ * Add a page to the receive queue for "mergeable" receive buffer
+ * configurations.
+ * Returns the number of virtqueue slots left free on success, negative errno
+ * otherwise.
+ * Always consumes the page.
+ * Must be serialized with the napi poll path.
+ */
+static int add_recvbuf_mergeable(struct virtnet_info *vi, struct page *page,
+ gfp_t gfp)
{
- struct page *page;
int err;
- page = get_a_page(vi, gfp);
- if (!page)
- return -ENOMEM;
-
sg_init_one(vi->rx_sg, page_address(page), PAGE_SIZE);
err = virtqueue_add_buf_gfp(vi->rvq, vi->rx_sg, 0, 1, page, gfp);
if (err < 0)
give_pages(vi, page);
+ else
+ vi->num++;
return err;
}
@@ -454,17 +515,32 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
bool oom;
do {
- if (vi->mergeable_rx_bufs)
- err = add_recvbuf_mergeable(vi, gfp);
- else if (vi->big_packets)
- err = add_recvbuf_big(vi, gfp);
- else
- err = add_recvbuf_small(vi, gfp);
+ if (vi->mergeable_rx_bufs) {
+ struct page *page;
+ page = alloc_recvbuf_mergeable(vi, gfp);
+ if (!page)
+ err = -ENOMEM;
+ else
+ err = add_recvbuf_mergeable(vi, page, gfp);
+ } else if (vi->big_packets) {
+ struct page *page;
+ page = alloc_recvbuf_big(vi, gfp);
+ if (!page)
+ err = -ENOMEM;
+ else
+ err = add_recvbuf_big(vi, page, gfp);
+ } else {
+ struct sk_buff *skb;
+ skb = alloc_recvbuf_small(vi, gfp);
+ if (!skb)
+ err = -ENOMEM;
+ else
+ err = add_recvbuf_small(vi, skb, gfp);
+ }
oom = err == -ENOMEM;
if (err < 0)
break;
- ++vi->num;
} while (err > 0);
if (unlikely(vi->num > vi->max))
vi->max = vi->num;
^ permalink raw reply related
* [PATCH v2 2/3] virtio_net: Batch receive buffer filling
From: Mike Waychison @ 2012-01-10 17:41 UTC (permalink / raw)
To: Rusty Russell, Michael S. Tsirkin
Cc: netdev, earhart, digitaleric, linux-kernel, virtualization
In-Reply-To: <20120110174052.4505.66514.stgit@mike2.sea.corp.google.com>
In preparation of moving the allocation of receive buffers on the slow
path outside of the NAPI disable block in refill_work(), introduce a new
method, try_fill_recvbatch(), which fill the receive buffers in a
batched mode. Although their algorithms are similar, the list enqeueing
and cleanup are different enough that duplicating the overall algorithm
resulted in cleaner code.
This new function is implemented either by fill_recvbatch_pages() in the
case of "big" or "mergeable" receive buffers, or fill_recvbatch_small()
for the small buffer fallback case.
The batched operation allows us to later push the disabling of napi on
the virtio_net device down to only cover the bits that manipulate the
virtio queue, letting the bulk of the allocations operate while the nic
can still process received packets.
Signed-off-by: Mike Waychison <mikew@google.com>
---
drivers/net/virtio_net.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 141 insertions(+), 1 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5531089..10d9761 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -37,6 +37,7 @@ module_param(gso, bool, 0444);
/* FIXME: MTU in config. */
#define MAX_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
#define GOOD_COPY_LEN 128
+#define MAX_RX_ALLOCATE_BATCH 32
#define VIRTNET_SEND_COMMAND_SG_MAX 2
#define VIRTNET_DRIVER_VERSION "1.0.0"
@@ -572,6 +573,144 @@ static void virtnet_napi_enable(struct virtnet_info *vi)
}
}
+/*
+ * Try to fill a "big" or "mergeable" receive queue using batching.
+ * Caller must serialize against NAPI.
+ * Returns false if we failed to finish due to oom.
+ */
+static bool fill_recvbatch_pages(struct virtnet_info *vi)
+{
+ bool oom = false;
+ bool full = false;
+ LIST_HEAD(local_list);
+ struct page *page, *npage;
+ int i;
+
+ BUG_ON(!vi->big_packets && !vi->mergeable_rx_bufs);
+fill_more:
+ /* Allocate a batch. */
+ for (i = 0; i < MAX_RX_ALLOCATE_BATCH; i++) {
+ if (vi->mergeable_rx_bufs)
+ page = alloc_recvbuf_mergeable(vi, GFP_KERNEL);
+ else /* vi->big_packets */
+ page = alloc_recvbuf_big(vi, GFP_KERNEL);
+ if (!page) {
+ oom = true;
+ break;
+ }
+ list_add_tail(&page->lru, &local_list);
+ }
+
+ /* Enqueue batch as available. */
+ list_for_each_entry_safe(page, npage, &local_list, lru) {
+ int err;
+
+ list_del(&page->lru);
+ if (vi->mergeable_rx_bufs)
+ err = add_recvbuf_mergeable(vi, page, GFP_KERNEL);
+ else /* vi->big_packets */
+ err = add_recvbuf_big(vi, page, GFP_KERNEL);
+ if (err > 0)
+ continue;
+ if (err == -ENOSPC || err == 0)
+ full = true;
+ else if (err == -ENOMEM)
+ oom = true;
+ else
+ BUG();
+ break;
+ }
+ if (unlikely(vi->num > vi->max))
+ vi->max = vi->num;
+
+ /* Cleanup any remaining entries on the list */
+ if (unlikely(!list_empty(&local_list))) {
+ list_for_each_entry_safe(page, npage, &local_list, lru) {
+ list_del(&page->lru);
+ give_pages(vi, page);
+ }
+ }
+
+ if (!oom && !full)
+ goto fill_more;
+
+ return !oom;
+}
+
+/*
+ * Try to fill a "small" receive queue using batching.
+ * Caller must serialize against NAPI.
+ * Returns false if we failed to finish due to oom.
+ */
+static bool fill_recvbatch_small(struct virtnet_info *vi)
+{
+ bool oom = false;
+ bool full = false;
+ LIST_HEAD(local_list);
+ struct list_head *pos, *npos;
+ struct sk_buff *skb;
+ int i;
+
+fill_more:
+ /* Allocate a batch. */
+ for (i = 0; i < MAX_RX_ALLOCATE_BATCH; i++) {
+ skb = alloc_recvbuf_small(vi, GFP_KERNEL);
+ if (!skb) {
+ oom = true;
+ break;
+ }
+ list_add_tail((struct list_head *)skb, &local_list);
+ }
+
+ /* Enqueue batch as available. */
+ list_for_each_safe(pos, npos, &local_list) {
+ int err;
+
+ list_del(pos);
+ skb = (struct sk_buff *)pos;
+
+ err = add_recvbuf_small(vi, skb, GFP_KERNEL);
+ if (err > 0)
+ continue;
+ if (err == -ENOSPC || err == 0)
+ full = true;
+ else if (err == -ENOMEM)
+ oom = true;
+ else
+ BUG();
+ break;
+ }
+ if (unlikely(vi->num > vi->max))
+ vi->max = vi->num;
+
+ /* Cleanup any remaining entries on the list */
+ if (unlikely(!list_empty(&local_list))) {
+ list_for_each_safe(pos, npos, &local_list) {
+ skb = (struct sk_buff *)pos;
+ list_del(pos);
+ dev_kfree_skb(skb);
+ }
+ }
+
+ if (!oom && !full)
+ goto fill_more;
+
+ return !oom;
+}
+
+/*
+ * Refill the receive queues from process context.
+ * Caller must serialize against NAPI.
+ * Returns false if we failed to allocate due to memory pressure.
+ */
+static bool try_fill_recvbatch(struct virtnet_info *vi)
+{
+ if (vi->mergeable_rx_bufs || vi->big_packets)
+ return fill_recvbatch_pages(vi);
+ else
+ return fill_recvbatch_small(vi);
+}
+
static void refill_work(struct work_struct *work)
{
struct virtnet_info *vi;
@@ -579,7 +718,8 @@ static void refill_work(struct work_struct *work)
vi = container_of(work, struct virtnet_info, refill.work);
napi_disable(&vi->napi);
- still_empty = !try_fill_recv(vi, GFP_KERNEL);
+ still_empty = !try_fill_recvbatch(vi);
+ virtqueue_kick(vi->rvq);
virtnet_napi_enable(vi);
/* In theory, this can happen: if we don't get any buffers in
^ permalink raw reply related
* [PATCH v2 3/3] virtio_net: Don't disable NAPI while allocating
From: Mike Waychison @ 2012-01-10 17:41 UTC (permalink / raw)
To: Rusty Russell, Michael S. Tsirkin
Cc: netdev, earhart, digitaleric, linux-kernel, virtualization
In-Reply-To: <20120110174052.4505.66514.stgit@mike2.sea.corp.google.com>
This patch changes the batch refill path so that NAPI is only disabled
when we are in the midst of enqueue available receive buffers.
allocators.
For "small" receive buffers, pushing the enabling and disabling of NAPI
is pretty straight-forward. For the "big" and "mergeable" receive
buffers however care must be taken to avoid letting either the allocator
or the release code from touching the per-device free-page list at
vi->pages (which is only serialized by the virtue of NAPI
serialization).
To handle this case, the allocators are modified to take a
"napi_serialized" flag, which indicates whether or not they should be
touching vi->pages. As well, for page release, __give_pages() is
introduced to free chains of pages (threaded through ->private) and is
to be used when we haven't stopped NAPI.
The failure paths in the "add" functions for the receive buffers do not
need to be updated, as they must be called serially with the NAPI poll,
and therefore are allowed to touch vi->pages (indirectly through
give_pages).
Signed-off-by: Mike Waychison <mikew@google.com>
---
drivers/net/virtio_net.c | 67 +++++++++++++++++++++++++++++++++-------------
1 files changed, 48 insertions(+), 19 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 10d9761..ec0e1fb 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -118,6 +118,20 @@ static void give_pages(struct virtnet_info *vi, struct page *page)
vi->pages = page;
}
+/*
+ * Give a list of pages threaded through the ->private field back to the page
+ * allocator. To be used by paths that aren't serialized with napi.
+ */
+static void __give_pages(struct page *page)
+{
+ struct page *nextpage;
+ do {
+ nextpage = (struct page *)page->private;
+ __free_page(page);
+ page = nextpage;
+ } while (page);
+}
+
static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
{
struct page *p = vi->pages;
@@ -402,19 +416,24 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct sk_buff *skb,
* Allocate an list of pages for "big" receive buffer configurations.
* Pages are chained through ->private.
* May return null if oom.
- * No serialization required.
+ * napi_serialized must be false if we aren't serialized with the napi path.
*/
-static struct page *alloc_recvbuf_big(struct virtnet_info *vi, gfp_t gfp)
+static struct page *alloc_recvbuf_big(struct virtnet_info *vi,
+ bool napi_serialized, gfp_t gfp)
{
struct page *first, *tail = NULL;
int i;
/* Build a list of pages chained through ->private. Built in reverse order */
for (i = 0; i < MAX_SKB_FRAGS + 1; ++i) {
- first = get_a_page(vi, gfp);
+ first = napi_serialized ? get_a_page(vi, gfp) : alloc_page(gfp);
if (!first) {
- if (tail)
- give_pages(vi, tail);
+ if (tail) {
+ if (napi_serialized)
+ give_pages(vi, tail);
+ else
+ __give_pages(tail);
+ }
return NULL;
}
@@ -472,11 +491,18 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct page *first,
/*
* Allocate a page for "mergeable" receive buffer configurations.
* May return NULL if oom.
- * No serialization required.
+ * @napi_serialized must be false if we aren't serialized with the napi path.
*/
-static struct page *alloc_recvbuf_mergeable(struct virtnet_info *vi, gfp_t gfp)
+static struct page *alloc_recvbuf_mergeable(struct virtnet_info *vi,
+ bool napi_serialized, gfp_t gfp)
{
- return get_a_page(vi, gfp);
+ struct page *page;
+ if (napi_serialized)
+ return get_a_page(vi, gfp);
+ page = alloc_page(gfp);
+ if (page)
+ page->private = 0;
+ return page;
}
/*
@@ -518,14 +544,14 @@ static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
do {
if (vi->mergeable_rx_bufs) {
struct page *page;
- page = alloc_recvbuf_mergeable(vi, gfp);
+ page = alloc_recvbuf_mergeable(vi, true, gfp);
if (!page)
err = -ENOMEM;
else
err = add_recvbuf_mergeable(vi, page, gfp);
} else if (vi->big_packets) {
struct page *page;
- page = alloc_recvbuf_big(vi, gfp);
+ page = alloc_recvbuf_big(vi, true, gfp);
if (!page)
err = -ENOMEM;
else
@@ -575,7 +601,7 @@ static void virtnet_napi_enable(struct virtnet_info *vi)
/*
* Try to fill a "big" or "mergeable" receive queue using batching.
- * Caller must serialize against NAPI.
+ * This call will serialize itself against NAPI.
* Returns false if we failed to finish due to oom.
*/
static bool fill_recvbatch_pages(struct virtnet_info *vi)
@@ -591,9 +617,9 @@ fill_more:
/* Allocate a batch. */
for (i = 0; i < MAX_RX_ALLOCATE_BATCH; i++) {
if (vi->mergeable_rx_bufs)
- page = alloc_recvbuf_mergeable(vi, GFP_KERNEL);
+ page = alloc_recvbuf_mergeable(vi, false, GFP_KERNEL);
else /* vi->big_packets */
- page = alloc_recvbuf_big(vi, GFP_KERNEL);
+ page = alloc_recvbuf_big(vi, false, GFP_KERNEL);
if (!page) {
oom = true;
break;
@@ -602,6 +628,7 @@ fill_more:
}
/* Enqueue batch as available. */
+ napi_disable(&vi->napi);
list_for_each_entry_safe(page, npage, &local_list, lru) {
int err;
@@ -622,12 +649,14 @@ fill_more:
}
if (unlikely(vi->num > vi->max))
vi->max = vi->num;
+ virtqueue_kick(vi->rvq);
+ virtnet_napi_enable(vi);
/* Cleanup any remaining entries on the list */
if (unlikely(!list_empty(&local_list))) {
list_for_each_entry_safe(page, npage, &local_list, lru) {
list_del(&page->lru);
- give_pages(vi, page);
+ __give_pages(page);
}
}
@@ -639,7 +668,7 @@ fill_more:
/*
* Try to fill a "small" receive queue using batching.
- * Caller must serialize against NAPI.
+ * This call will serialize itself against NAPI.
* Returns false if we failed to finish due to oom.
*/
static bool fill_recvbatch_small(struct virtnet_info *vi)
@@ -663,6 +692,7 @@ fill_more:
}
/* Enqueue batch as available. */
+ napi_disable(&vi->napi);
list_for_each_safe(pos, npos, &local_list) {
int err;
@@ -682,6 +712,8 @@ fill_more:
}
if (unlikely(vi->num > vi->max))
vi->max = vi->num;
+ virtqueue_kick(vi->rvq);
+ virtnet_napi_enable(vi);
/* Cleanup any remaining entries on the list */
if (unlikely(!list_empty(&local_list))) {
@@ -700,7 +732,7 @@ fill_more:
/*
* Refill the receive queues from process context.
- * Caller must serialize against NAPI.
+ * Callee will serialize against NAPI itself.
* Returns false if we failed to allocate due to memory pressure.
*/
static bool try_fill_recvbatch(struct virtnet_info *vi)
@@ -717,10 +749,7 @@ static void refill_work(struct work_struct *work)
bool still_empty;
vi = container_of(work, struct virtnet_info, refill.work);
- napi_disable(&vi->napi);
still_empty = !try_fill_recvbatch(vi);
- virtqueue_kick(vi->rvq);
- virtnet_napi_enable(vi);
/* In theory, this can happen: if we don't get any buffers in
* we will *never* try to fill again. */
^ permalink raw reply related
* Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
From: Rusty Russell @ 2012-01-11 0:25 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Christian Borntraeger, Benjamin Herrenschmidt, Sasha Levin,
Pawel Moll, virtualization
In-Reply-To: <20120110170334.GA18404@redhat.com>
On Tue, 10 Jan 2012 19:03:36 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Dec 21, 2011 at 11:03:25AM +1030, Rusty Russell wrote:
> > Yes. The idea that we can alter fields in the device-specific config
> > area is flawed. There may be cases where it doesn't matter, but as an
> > idea it was holed to begin with.
> >
> > We can reduce probability by doing a double read to check, but there are
> > still cases where it will fail.
>
> Okay - want me to propose an interface for that?
Had a brief chat with BenH (CC'd).
I think we should deprecate writing to the config space. Only balloon
does it AFAICT, and I can't quite figure out *why* it has an 'active'
field. This solves half the problem, of sync guest writes. For the
other half, I suggest a generation counter; odd means inconsistent. The
guest can poll.
BenH also convinced me we should finally make the config space LE if
we're going to change things. Since PCI is the most common transport,
guest-endian confuses people. And it sucks for really weird machines.
We should also change the ring (to a single ring, I think). Descriptors
to 24 bytes long (8 byte cookie, 8 byte addr, 4 byte len, 4 byte flags).
We might be able to squeeze it into 20 bytes but that means packing. We
should support inline, chained or indirect. Let the other side ack by
setting flag, cookie and len (if written).
Moreover, I think we should make all these changes at once (at least, in
the spec). That makes it a big change, and it'll take longer to
develop, but makes it easy in the long run to differentiate legacy and
modern virtio.
Thoughts?
Rusty.
^ permalink raw reply
* Re: [PATCH v2 1/3] virtio_net: Split receive buffer alloc/add
From: Rusty Russell @ 2012-01-11 1:30 UTC (permalink / raw)
To: Mike Waychison, Michael S. Tsirkin
Cc: netdev, earhart, digitaleric, linux-kernel, virtualization
In-Reply-To: <20120110174100.4505.8939.stgit@mike2.sea.corp.google.com>
On Tue, 10 Jan 2012 09:41:01 -0800, Mike Waychison <mikew@google.com> wrote:
> In preparation for allocating receive buffers in the slow path without
> disabling NAPI, split the allocation and addition of receive buffers
> apart into two separate functions (per receive buffer type).
>
> While here, move the vi->num accounting into the add functions.
>
> Signed-off-by: Mike Waychison <mikew@google.com>
Hi Mike...
This exposes a nasty ugliness in the way virtio_net works. We
allocate an skbuff for the small packet case, and just allocate the
pages for the large packet cases, and alloc the skbuff when we fill the
pages.
I think all the allocators should return a populated skbuff;
this uses a bit more memory in theory, but should make the code simpler.
As an added bonus, your life should get much simpler for these patches.
I'll try to create such a patch tonight, but I'm busy finalizing my
linux.conf.au presentation, so it might take longer :(
Thanks,
Rusty.
^ permalink raw reply
* Re: [PATCH v2 1/3] virtio_net: Split receive buffer alloc/add
From: Mike Waychison @ 2012-01-11 1:42 UTC (permalink / raw)
To: Rusty Russell
Cc: earhart, Michael S. Tsirkin, netdev, linux-kernel, virtualization,
digitaleric
In-Reply-To: <8739bn6n66.fsf@rustcorp.com.au>
On Tue, Jan 10, 2012 at 5:30 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> On Tue, 10 Jan 2012 09:41:01 -0800, Mike Waychison <mikew@google.com> wrote:
>> In preparation for allocating receive buffers in the slow path without
>> disabling NAPI, split the allocation and addition of receive buffers
>> apart into two separate functions (per receive buffer type).
>>
>> While here, move the vi->num accounting into the add functions.
>>
>> Signed-off-by: Mike Waychison <mikew@google.com>
>
> Hi Mike...
>
> This exposes a nasty ugliness in the way virtio_net works. We
> allocate an skbuff for the small packet case, and just allocate the
> pages for the large packet cases, and alloc the skbuff when we fill the
> pages.
>
> I think all the allocators should return a populated skbuff;
> this uses a bit more memory in theory, but should make the code simpler.
> As an added bonus, your life should get much simpler for these patches.
>
> I'll try to create such a patch tonight, but I'm busy finalizing my
> linux.conf.au presentation, so it might take longer :(
>
This seems reasonable for the "big" receive buffers, but I don't think
it makes a lot of sense for "mergeable" receive buffers as best I can
tell. Hopefully making this work isn't too hard though :)
^ permalink raw reply
* Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
From: Benjamin Herrenschmidt @ 2012-01-11 1:48 UTC (permalink / raw)
To: Rusty Russell
Cc: Christian Borntraeger, virtualization, Sasha Levin, Pawel Moll,
Michael S. Tsirkin
In-Reply-To: <8762gj6q5r.fsf@rustcorp.com.au>
On Wed, 2012-01-11 at 10:55 +1030, Rusty Russell wrote:
> On Tue, 10 Jan 2012 19:03:36 +0200, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Wed, Dec 21, 2011 at 11:03:25AM +1030, Rusty Russell wrote:
> > > Yes. The idea that we can alter fields in the device-specific config
> > > area is flawed. There may be cases where it doesn't matter, but as an
> > > idea it was holed to begin with.
> > >
> > > We can reduce probability by doing a double read to check, but there are
> > > still cases where it will fail.
> >
> > Okay - want me to propose an interface for that?
>
> Had a brief chat with BenH (CC'd).
Thanks Rusty :-)
> I think we should deprecate writing to the config space. Only balloon
> does it AFAICT, and I can't quite figure out *why* it has an 'active'
> field. This solves half the problem, of sync guest writes. For the
> other half, I suggest a generation counter; odd means inconsistent. The
> guest can poll.
Agreed. Commands are the way to go.
> BenH also convinced me we should finally make the config space LE if
> we're going to change things. Since PCI is the most common transport,
> guest-endian confuses people. And it sucks for really weird machines.
I have a few more "good" reasons but yes, basically it confuses things.
Just see the bugs we are still fixing in qemu :-)
A fixed endian makes everything simpler, there is no need to play games
when using virtio in heterogenous environment (slave CPUs), no need to
handle variable endian on CPUs that support both etc... and the cost of
byteswap is negligible (we do it routinely for normal PCI hardware and
it's never been a hot spot).
> We should also change the ring (to a single ring, I think). Descriptors
> to 24 bytes long (8 byte cookie, 8 byte addr, 4 byte len, 4 byte flags).
Do we care about space waste ? 32 bytes is a nicely round power of two
and would make general math, but also debugging (visual inspection of
the ring) etc... easier.
If we support immediate data, that means we'd have enough for most
common commands and status straight in the ring.
Another approach that may or may not be worthwhile is to have the size
of the ring elements configurable (as power of two) so that it can be
adjusted to contain both immediate and a descriptor. This is done
typically in network hardware to put the routing headers in the
immediate part and the rest of the packet elsewhere which improves cache
affinity when processing large amount of packets.
> We might be able to squeeze it into 20 bytes but that means packing. We
> should support inline, chained or indirect. Let the other side ack by
> setting flag, cookie and len (if written).
>
> Moreover, I think we should make all these changes at once (at least, in
> the spec). That makes it a big change, and it'll take longer to
> develop, but makes it easy in the long run to differentiate legacy and
> modern virtio.
Agreed,
Cheers,
Ben.
^ permalink raw reply
* [PATCH] vhost-net: add module alias
From: Stephen Hemminger @ 2012-01-11 4:54 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: netdev, kvm, virtualization
By adding the a module alias, programs (or users) won't have to explicitly
call modprobe. Vhost-net will always be available if built into the kernel.
It does require assigning a permanent minor number for depmod to work.
Choose one next to TUN since this driver is related to it.
Also, use C99 style initialization.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
---
drivers/vhost/net.c | 8 +++++---
include/linux/miscdevice.h | 1 +
2 files changed, 6 insertions(+), 3 deletions(-)
--- a/drivers/vhost/net.c 2012-01-10 10:56:58.883179194 -0800
+++ b/drivers/vhost/net.c 2012-01-10 19:48:23.650225892 -0800
@@ -856,9 +856,9 @@ static const struct file_operations vhos
};
static struct miscdevice vhost_net_misc = {
- MISC_DYNAMIC_MINOR,
- "vhost-net",
- &vhost_net_fops,
+ .minor = VHOST_NET_MINOR,
+ .name = "vhost-net",
+ .fops = &vhost_net_fops,
};
static int vhost_net_init(void)
@@ -879,3 +879,5 @@ MODULE_VERSION("0.0.1");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Michael S. Tsirkin");
MODULE_DESCRIPTION("Host kernel accelerator for virtio net");
+MODULE_ALIAS_MISCDEV(VHOST_NET_MINOR);
+MODULE_ALIAS("devname:vhost-net");
--- a/include/linux/miscdevice.h 2012-01-10 10:56:59.779189436 -0800
+++ b/include/linux/miscdevice.h 2012-01-10 19:49:56.091748210 -0800
@@ -31,6 +31,7 @@
#define I2O_MINOR 166
#define MICROCODE_MINOR 184
#define TUN_MINOR 200
+#define VHOST_NET_MINOR 201
#define MWAVE_MINOR 219 /* ACP/Mwave Modem */
#define MPT_MINOR 220
#define MPT2SAS_MINOR 221
^ permalink raw reply
* Re: [PATCH] vhost-net: add module alias
From: Michael Tokarev @ 2012-01-11 7:07 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, virtualization, kvm, Michael S. Tsirkin
In-Reply-To: <20120110205400.6c1cb306@nehalam.linuxnetplumber.net>
On 11.01.2012 08:54, Stephen Hemminger wrote:
> By adding the a module alias, programs (or users) won't have to explicitly
> call modprobe. Vhost-net will always be available if built into the kernel.
> It does require assigning a permanent minor number for depmod to work.
> Choose one next to TUN since this driver is related to it.
Why do you think a statically-allocated device number will do any good
at all? Static /dev is gone almost completely, at least on the systems
where whole virt stuff makes any sense, so you don't have pre-created
vhost-net device anymore, and hence this allocation makes no sense.
Just IMHO anyway.
Thanks,
/mjt
^ permalink raw reply
* Re: [PATCH] vhost-net: add module alias
From: Amos Kong @ 2012-01-11 7:43 UTC (permalink / raw)
To: Stephen Hemminger
Cc: kvm, Michael S. Tsirkin, netdev, virtualization, device, alan
In-Reply-To: <20120110205400.6c1cb306@nehalam.linuxnetplumber.net>
[-- Attachment #1.1: Type: text/plain, Size: 2350 bytes --]
On Wed, Jan 11, 2012 at 12:54 PM, Stephen Hemminger
<shemminger@vyatta.com>wrote:
> By adding the a module alias, programs (or users) won't have to explicitly
> call modprobe. Vhost-net will always be available if built into the kernel.
> It does require assigning a permanent minor number for depmod to work.
> Choose one next to TUN since this driver is related to it.
>
> Also, use C99 style initialization.
>
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>
> ---
> drivers/vhost/net.c | 8 +++++---
> include/linux/miscdevice.h | 1 +
> 2 files changed, 6 insertions(+), 3 deletions(-)
>
> --- a/drivers/vhost/net.c 2012-01-10 10:56:58.883179194 -0800
> +++ b/drivers/vhost/net.c 2012-01-10 19:48:23.650225892 -0800
> @@ -856,9 +856,9 @@ static const struct file_operations vhos
> };
>
> static struct miscdevice vhost_net_misc = {
> - MISC_DYNAMIC_MINOR,
> - "vhost-net",
> - &vhost_net_fops,
> + .minor = VHOST_NET_MINOR,
> + .name = "vhost-net",
> + .fops = &vhost_net_fops,
> };
>
> static int vhost_net_init(void)
> @@ -879,3 +879,5 @@ MODULE_VERSION("0.0.1");
> MODULE_LICENSE("GPL v2");
> MODULE_AUTHOR("Michael S. Tsirkin");
> MODULE_DESCRIPTION("Host kernel accelerator for virtio net");
> +MODULE_ALIAS_MISCDEV(VHOST_NET_MINOR);
> +MODULE_ALIAS("devname:vhost-net");
> --- a/include/linux/miscdevice.h 2012-01-10 10:56:59.779189436
> -0800
> +++ b/include/linux/miscdevice.h 2012-01-10 19:49:56.091748210
> -0800
> @@ -31,6 +31,7 @@
> #define I2O_MINOR 166
> #define MICROCODE_MINOR 184
> #define TUN_MINOR 200
> +#define VHOST_NET_MINOR 201
>
CC: alan@linux.intel.com
CC: device@lanana.org
include/linux/miscdevice.h:
/*
* These allocations are managed by device@lanana.org. If you use an
* entry that is not in assigned your entry may well be moved and
* reassigned, or set dynamic if a fixed value is not justified.
*/
#define MWAVE_MINOR 219 /* ACP/Mwave Modem */
> #define MPT_MINOR 220
> #define MPT2SAS_MINOR 221
>
> _______________________________________________
> Virtualization mailing list
> Virtualization@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
>
[-- Attachment #1.2: Type: text/html, Size: 3664 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox