Linux virtualization list
 help / color / mirror / Atom feed
* Re: [Xen-devel] [PATCH] xen: remove CONFIG_XEN_DOM0 compile option
From: Andrew Jones @ 2012-01-09 10:37 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: jeremy, xen-devel, konrad wilk, virtualization
In-Reply-To: <alpine.DEB.2.00.1201061648440.3150@kaball-desktop>



----- Original Message -----
> On Fri, 6 Jan 2012, Andrew Jones wrote:
> > XEN_DOM0 is a silent option that has been automatically selected
> > when
> > CONFIG_XEN is selected since 6b0661a5e6fbf. If this option was
> > changed
> > to a menu configurable option then it would only give users the
> > ability
> > to compile out about 100 kbytes of code.
> 
> I think that it is less than that because backends are not tied to
> dom0
> in any way, even though currently they cannot be compiled without
> XEN_DOM0.
> Running a network or block backend in domU is a well known
> configuration.
> Therefore the code compiled out only amounts to about 10K.
> 
> 
> > diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> > index 8795480..0725228 100644
> > --- a/drivers/xen/Kconfig
> > +++ b/drivers/xen/Kconfig
> > @@ -78,8 +78,9 @@ config XEN_DEV_EVTCHN
> >  
> >  config XEN_BACKEND
> >  	bool "Backend driver support"
> > -	depends on XEN_DOM0
> >  	default y
> > +	depends on XEN && PCI_XEN && SWIOTLB_XEN
> > +	depends on X86_LOCAL_APIC && X86_IO_APIC && ACPI && PCI
> >  	help
> >  	  Support for backend device drivers that provide I/O services
> >  	  to other virtual machines.
> 
> I think we can trimmer the dependency list here: after all backends
> can
> be run in unpriviledged guests as well (see driver domains).
> So I think it should depend only on XEN.

Hmm.. Actually, with dependency lists in mind, I think a really messed
this patch up. I should have either had CONFIG_XEN inherit these deps,
or I should have spread them around to be required at only the specific
places they're needed, and then left the stub functions in. Neither of
those options seems like a good idea to me. We either force all XEN
guests to always have all the deps needed for an initial domain, which
is exactly the opposite of the suggestion above (trimming XEN_BACKEND
deps for driver domains), or we actually make the code messier and
more complicated with more #ifdefs, which are now neatly packaged with
XEN_DOM0.

The driver domain concept may actually be the technical need for
making XEN_DOM0 optional. If we want the ability to build a minimally
configed XEN kernel to be used as a driver domain, then it doesn't
seem like we'd want it to also be capable of running as an initial
domain, or to be forced to have all the dependencies and code of an
initial domain.

With that in mind, I propose we go back to my initial patch, relax
deps for XEN_BACKEND, and double check that each individual backend
has the appropriate minimal deps. In general, it should be a goal to
make sure most XEN options are menu selectable and that all dep lists
are minimized in order to make sure driver domains can be built with
minimal configs.

Drew

> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
> 

^ permalink raw reply

* Re: [Xen-devel] [PATCH 2/4] xen kconfig: relax INPUT_XEN_KBDDEV_FRONTEND deps
From: Dmitry Torokhov @ 2012-01-09  7:59 UTC (permalink / raw)
  To: Andrew Jones
  Cc: jeremy, xen-devel, FlorianSchandinat, konrad wilk, virtualization,
	Konrad Rzeszutek Wilk
In-Reply-To: <da05c51a-a541-4edd-8356-f0b5d2af0c03@zmail13.collab.prod.int.phx2.redhat.com>

Hi Andrew,

On Fri, Jan 06, 2012 at 10:58:06AM -0500, Andrew Jones wrote:
> 
> 
> ----- Original Message -----
> > On Fri, Jan 06, 2012 at 10:43:09AM +0100, Andrew Jones wrote:
> > > PV-on-HVM guests may want to use the xen keyboard/mouse frontend,
> > > but
> > > they don't use the xen frame buffer frontend. For this case it
> > > doesn't
> > > make much sense for INPUT_XEN_KBDDEV_FRONTEND to depend on
> > > XEN_FBDEV_FRONTEND. The opposite direction always makes more sense,
> > > i.e.
> > > if you're using xenfb, then you'll want xenkbd. Switch the
> > > dependencies.
> > 
> > You need to CC as well these people that have 'maintainer' field on
> > them:
> > 
> > konrad@phenom:~/work/linux$ scripts/get_maintainer.pl -f
> > drivers/video/Kconfig
> > Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
> > (maintainer:FRAMEBUFFER LAYER)
> > linux-fbdev@vger.kernel.org (open list:FRAMEBUFFER LAYER)
> > linux-kernel@vger.kernel.org (open list)
> > konrad@phenom:~/work/linux$ scripts/get_maintainer.pl -f
> > drivers/input/misc/Kconfig
> > Dmitry Torokhov <dmitry.torokhov@gmail.com> (maintainer:INPUT
> > (KEYBOARD,...,commit_signer:9/16=56%)
> > Samuel Ortiz <sameo@linux.intel.com> (commit_signer:3/16=19%)
> > Anirudh Ghayal <aghayal@codeaurora.org> (commit_signer:2/16=12%)
> > Peter Ujfalusi <peter.ujfalusi@ti.com> (commit_signer:2/16=12%)
> > Alan Cox <alan@linux.intel.com> (commit_signer:2/16=12%)
> > linux-input@vger.kernel.org (open list:INPUT (KEYBOARD,...)
> > linux-kernel@vger.kernel.org (open list)
> > 
> 
> Thanks. Replied with them in CC.
> 
> Drew
> 
> > > 
> > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > ---
> > >  drivers/input/misc/Kconfig |    2 +-
> > >  drivers/video/Kconfig      |    1 +
> > >  2 files changed, 2 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/drivers/input/misc/Kconfig
> > > b/drivers/input/misc/Kconfig
> > > index 22d875f..36c15bf 100644
> > > --- a/drivers/input/misc/Kconfig
> > > +++ b/drivers/input/misc/Kconfig
> > > @@ -533,7 +533,7 @@ config INPUT_CMA3000_I2C
> > >  
> > >  config INPUT_XEN_KBDDEV_FRONTEND
> > >  	tristate "Xen virtual keyboard and mouse support"
> > > -	depends on XEN_FBDEV_FRONTEND
> > > +	depends on XEN

This is OK with me.

> > >  	default y
> > >  	select XEN_XENBUS_FRONTEND
> > >  	help
> > > diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
> > > index d83e967..269b299 100644
> > > --- a/drivers/video/Kconfig
> > > +++ b/drivers/video/Kconfig
> > > @@ -2269,6 +2269,7 @@ config XEN_FBDEV_FRONTEND
> > >  	select FB_SYS_IMAGEBLIT
> > >  	select FB_SYS_FOPS
> > >  	select FB_DEFERRED_IO
> > > +	select INPUT_XEN_KBDDEV_FRONTEND

But here you need to either depend on or select INPUT as select does not
resolve dependencies for selected symbol.

Thanks.

-- 
Dmitry

^ permalink raw reply

* Re: [RFC PATCH v1 2/2] virtio_net: Don't disable napi on low memory.
From: Rusty Russell @ 2012-01-09  6:46 UTC (permalink / raw)
  To: Mike Waychison; +Cc: netdev, virtualization, linux-kernel, Michael S. Tsirkin
In-Reply-To: <CAGTjWtBbLtm47kozBC2fdwGYKsOooQec0W9PwUEAF58mYDioaQ@mail.gmail.com>

On Fri, 6 Jan 2012 09:54:46 -0800, Mike Waychison <mikew@google.com> wrote:
> On Wed, Jan 4, 2012 at 6:46 PM, Mike Waychison <mikew@google.com> wrote:
> > On Wed, Jan 4, 2012 at 4:31 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> >> 4) You use the skb data for the linked list; use the skb head's list.
> 
> What did you mean by this?  I was under the impression that the ->next
> and ->prev fields in sk_buff were the first two elements specifically
> so that the pointer could be treated as a list_head.  If it's the cast
> in particular that you have an objection with, I can easily change
> this to a singly linked list threaded through ->next if that's
> cleaner.

Yep, I saw the cast and misread your code.  I could have sworn that skb
used a real list_head these days, but I'm wrong.

> >>
> >> Instead, here's how I think it should be done:
> ...
> >
> > This sounds reasonable to me.  I'll see what I can muster together this week.
> >
> 
> So I started implementing it the way you were mentioning, and ran into
> a problem with the original patchset.
> 
> Currently the "mergeable" and "big" receive buffers use a private page
> free list (virtnet_info->pages) which has no synchronization itself.
> This means that the batched version can't use get_a_page() and
> give_pages() as is, which reduces the need to re-use the same alloc
> halves that I've split.   Alternatives I can think of at this point:
> 
> - pass in a flag to the allocators like "bool is_serial" that is true
> if we are serializing with napi, (which determines if we can much with
> vi->pages)
> or
> - not use the same allocators for the "mergeable" and "big" paths.
> The mergeable allocator in the non-serialized case reduces to
> alloc_page(), while the big allocator looks like a copy and paste that
> uses alloc_page instead of get_a_page().
> 
> Preferences?  I'll code one of the two up and see what it looks like.

Whatever results in a cleaner driver, I'm happy.

Thanks,
Rusty.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [PATCH 1/2] virtio: console: Serialise control work
From: Amit Shah @ 2012-01-09  3:53 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Virtualization List, miche, Michael S. Tsirkin
In-Reply-To: <87lipha9k8.fsf@rustcorp.com.au>

On (Mon) 09 Jan 2012 [13:03:59], Rusty Russell wrote:
> On Fri,  6 Jan 2012 16:19:07 +0530, Amit Shah <amit.shah@redhat.com> wrote:
> > We currently allow multiple instances of the control work handler to run
> > in parallel.  This isn't expected to work; serialise access by disabling
> > interrupts on new packets from the Host and enable them when all the
> > existing ones are consumed.
> > 
> > Testcase: hot-plug/unplug a port in a loop.  Without this patch, some
> > sysfs warnings are seen along with freezes.  With the patch, all works
> > fine.
> 
> Wait a sec... This might *work*, but it's not correct.  Remember that
> disabling callbacks is a hint, and it's cetainly not synchronous.
> 
> You need something else to explicitly prevent reentry.

OK, right.  I'll use the nrt workqueue to synchronise then.

		Amit

^ permalink raw reply

* Re: [PATCH 1/2] virtio: console: Serialise control work
From: Rusty Russell @ 2012-01-09  2:33 UTC (permalink / raw)
  Cc: Amit Shah, Virtualization List, miche, Michael S. Tsirkin
In-Reply-To: <0ee4e2fc77b40f1985391ce3f9b8a2cf3f729324.1325845993.git.amit.shah@redhat.com>

On Fri,  6 Jan 2012 16:19:07 +0530, Amit Shah <amit.shah@redhat.com> wrote:
> We currently allow multiple instances of the control work handler to run
> in parallel.  This isn't expected to work; serialise access by disabling
> interrupts on new packets from the Host and enable them when all the
> existing ones are consumed.
> 
> Testcase: hot-plug/unplug a port in a loop.  Without this patch, some
> sysfs warnings are seen along with freezes.  With the patch, all works
> fine.

Wait a sec... This might *work*, but it's not correct.  Remember that
disabling callbacks is a hint, and it's cetainly not synchronous.

You need something else to explicitly prevent reentry.

Unapplied,
Rusty.

^ permalink raw reply

* Re: [PATCH 0/2] virtio: console: control queue race fixes
From: Rusty Russell @ 2012-01-09  2:31 UTC (permalink / raw)
  Cc: Amit Shah, Virtualization List, miche, Michael S. Tsirkin
In-Reply-To: <cover.1325845993.git.amit.shah@redhat.com>

On Fri,  6 Jan 2012 16:19:06 +0530, Amit Shah <amit.shah@redhat.com> wrote:
> Hello,
> 
> The first patch here fixes the race seen by Miche.  He hasn't yet
> reported back if this fixes the races he saw, but Joy Pu from Red Hat
> tested this patch with hot-plugging/unplugging ports in a loop.
> Before this patch, he saw some freezes as well as sysfs warnings.
> After applying the patch, all was well.


So, the first one should be cc: stable?

> The second patch can be folded into the series fixing S4 for
> virtio-console.  It ensures all callbacks are disabled in the freeze
> function.  This patch has a dependency on the 1st one in the sense the
> double disabling of vq won't make sense w/o patch 1.
> 
> Please review and apply.

Done, thanks.

Cheers,
Rusty.

^ permalink raw reply

* [PATCH 1/1] drivers: hv: Get rid of some unnecessary code
From: K. Y. Srinivasan @ 2012-01-08 18:12 UTC (permalink / raw)
  To: gregkh, linux-kernel, devel, virtualization, ohering
  Cc: K. Y. Srinivasan, Haiyang Zhang

The current code unnecessarily limits the number of offers we handle.
Get rid of this limitation. As part of this cleanup, also get rid of an
unused define - MAX_MSG_TYPES.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/hv/channel_mgmt.c |   87 ---------------------------------------------
 1 files changed, 0 insertions(+), 87 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 36484db..9ffbfc5 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -37,81 +37,6 @@ struct vmbus_channel_message_table_entry {
 	void (*message_handler)(struct vmbus_channel_message_header *msg);
 };
 
-#define MAX_MSG_TYPES                    4
-#define MAX_NUM_DEVICE_CLASSES_SUPPORTED 8
-
-static const uuid_le
-	supported_device_classes[MAX_NUM_DEVICE_CLASSES_SUPPORTED] = {
-	/* {ba6163d9-04a1-4d29-b605-72e2ffb1dc7f} */
-	/* Storage - SCSI */
-	{
-		.b  = {
-			0xd9, 0x63, 0x61, 0xba, 0xa1, 0x04, 0x29, 0x4d,
-			0xb6, 0x05, 0x72, 0xe2, 0xff, 0xb1, 0xdc, 0x7f
-		}
-	},
-
-	/* {F8615163-DF3E-46c5-913F-F2D2F965ED0E} */
-	/* Network */
-	{
-		.b = {
-			0x63, 0x51, 0x61, 0xF8, 0x3E, 0xDF, 0xc5, 0x46,
-			0x91, 0x3F, 0xF2, 0xD2, 0xF9, 0x65, 0xED, 0x0E
-		}
-	},
-
-	/* {CFA8B69E-5B4A-4cc0-B98B-8BA1A1F3F95A} */
-	/* Input */
-	{
-		.b = {
-			0x9E, 0xB6, 0xA8, 0xCF, 0x4A, 0x5B, 0xc0, 0x4c,
-			0xB9, 0x8B, 0x8B, 0xA1, 0xA1, 0xF3, 0xF9, 0x5A
-		}
-	},
-
-	/* {32412632-86cb-44a2-9b5c-50d1417354f5} */
-	/* IDE */
-	{
-		.b = {
-			0x32, 0x26, 0x41, 0x32, 0xcb, 0x86, 0xa2, 0x44,
-			0x9b, 0x5c, 0x50, 0xd1, 0x41, 0x73, 0x54, 0xf5
-		}
-	},
-	/* 0E0B6031-5213-4934-818B-38D90CED39DB */
-	/* Shutdown */
-	{
-		.b = {
-			0x31, 0x60, 0x0B, 0X0E, 0x13, 0x52, 0x34, 0x49,
-			0x81, 0x8B, 0x38, 0XD9, 0x0C, 0xED, 0x39, 0xDB
-		}
-	},
-	/* {9527E630-D0AE-497b-ADCE-E80AB0175CAF} */
-	/* TimeSync */
-	{
-		.b = {
-			0x30, 0xe6, 0x27, 0x95, 0xae, 0xd0, 0x7b, 0x49,
-			0xad, 0xce, 0xe8, 0x0a, 0xb0, 0x17, 0x5c, 0xaf
-		}
-	},
-	/* {57164f39-9115-4e78-ab55-382f3bd5422d} */
-	/* Heartbeat */
-	{
-		.b = {
-			0x39, 0x4f, 0x16, 0x57, 0x15, 0x91, 0x78, 0x4e,
-			0xab, 0x55, 0x38, 0x2f, 0x3b, 0xd5, 0x42, 0x2d
-		}
-	},
-	/* {A9A0F4E7-5A45-4d96-B827-8A841E8C03E6} */
-	/* KVP */
-	{
-		.b = {
-			0xe7, 0xf4, 0xa0, 0xa9, 0x45, 0x5a, 0x96, 0x4d,
-			0xb8, 0x27, 0x8a, 0x84, 0x1e, 0x8c, 0x3,  0xe6
-	}
-	},
-
-};
-
 
 /**
  * vmbus_prep_negotiate_resp() - Create default response for Hyper-V Negotiate message
@@ -321,20 +246,8 @@ static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 	struct vmbus_channel *newchannel;
 	uuid_le *guidtype;
 	uuid_le *guidinstance;
-	int i;
-	int fsupported = 0;
 
 	offer = (struct vmbus_channel_offer_channel *)hdr;
-	for (i = 0; i < MAX_NUM_DEVICE_CLASSES_SUPPORTED; i++) {
-		if (!uuid_le_cmp(offer->offer.if_type,
-				supported_device_classes[i])) {
-			fsupported = 1;
-			break;
-		}
-	}
-
-	if (!fsupported)
-		return;
 
 	guidtype = &offer->offer.if_type;
 	guidinstance = &offer->offer.if_instance;
-- 
1.7.4.1

^ permalink raw reply related

* Re: [RFC PATCH v1 2/2] virtio_net: Don't disable napi on low memory.
From: Mike Waychison @ 2012-01-06 17:54 UTC (permalink / raw)
  To: Rusty Russell; +Cc: netdev, virtualization, linux-kernel, Michael S. Tsirkin
In-Reply-To: <CAGTjWtB3PjWDfaunXjF-KQwKYg-bPgrn-8N=8iwL0EPvviXMuw@mail.gmail.com>

On Wed, Jan 4, 2012 at 6:46 PM, Mike Waychison <mikew@google.com> wrote:
> On Wed, Jan 4, 2012 at 4:31 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:
>> 4) You use the skb data for the linked list; use the skb head's list.

What did you mean by this?  I was under the impression that the ->next
and ->prev fields in sk_buff were the first two elements specifically
so that the pointer could be treated as a list_head.  If it's the cast
in particular that you have an objection with, I can easily change
this to a singly linked list threaded through ->next if that's
cleaner.

>>
>> Instead, here's how I think it should be done:
...
>
> This sounds reasonable to me.  I'll see what I can muster together this week.
>

So I started implementing it the way you were mentioning, and ran into
a problem with the original patchset.

Currently the "mergeable" and "big" receive buffers use a private page
free list (virtnet_info->pages) which has no synchronization itself.
This means that the batched version can't use get_a_page() and
give_pages() as is, which reduces the need to re-use the same alloc
halves that I've split.   Alternatives I can think of at this point:

- pass in a flag to the allocators like "bool is_serial" that is true
if we are serializing with napi, (which determines if we can much with
vi->pages)
or
- not use the same allocators for the "mergeable" and "big" paths.
The mergeable allocator in the non-serialized case reduces to
alloc_page(), while the big allocator looks like a copy and paste that
uses alloc_page instead of get_a_page().

Preferences?  I'll code one of the two up and see what it looks like.

^ permalink raw reply

* Re: [Xen-devel] [PATCH] xen: remove CONFIG_XEN_DOM0 compile option
From: Ian Campbell @ 2012-01-06 17:16 UTC (permalink / raw)
  To: Andrew Jones
  Cc: jeremy@goop.org, xen-devel@lists.xensource.com,
	konrad.wilk@oracle.com, Stefano Stabellini,
	virtualization@lists.linux-foundation.org
In-Reply-To: <1325867990-15018-1-git-send-email-drjones@redhat.com>

On Fri, 2012-01-06 at 16:39 +0000, Andrew Jones wrote:
> remove XEN_PRIVILEGED_GUEST as it's just an alias for XEN_DOM0.

Hmm, this one is used by tools like update-grub to know when it is ok to
create xen+kernel entries so I think it needs to stay, or we at least
need to give lengthly warning to distros etc that it is going away.

Perhaps you can just patch it out locally?

Ian.

> 
> I compile tested this on a latest pull using an F16 config. The compile
> succeeded and 'make oldconfig' only removed these two options as
> expected.
> 
> CONFIG_XEN_DOM0=y
> CONFIG_XEN_PRIVILEGED_GUEST=y
> 
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  arch/x86/include/asm/xen/pci.h |   21 +--------------------
>  arch/x86/pci/xen.c             |    6 ------
>  arch/x86/xen/Kconfig           |   10 ----------
>  arch/x86/xen/Makefile          |    3 +--
>  arch/x86/xen/xen-ops.h         |    7 -------
>  drivers/xen/Kconfig            |    3 ++-
>  drivers/xen/Makefile           |    3 +--
>  drivers/xen/xenfs/Makefile     |    3 +--
>  include/xen/xen.h              |   11 +++--------
>  9 files changed, 9 insertions(+), 58 deletions(-)
> 
> diff --git a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h
> index 968d57d..b423889 100644
> --- a/arch/x86/include/asm/xen/pci.h
> +++ b/arch/x86/include/asm/xen/pci.h
> @@ -13,30 +13,11 @@ static inline int pci_xen_hvm_init(void)
>  	return -1;
>  }
>  #endif
> -#if defined(CONFIG_XEN_DOM0)
> +
>  int __init pci_xen_initial_domain(void);
>  int xen_find_device_domain_owner(struct pci_dev *dev);
>  int xen_register_device_domain_owner(struct pci_dev *dev, uint16_t domain);
>  int xen_unregister_device_domain_owner(struct pci_dev *dev);
> -#else
> -static inline int __init pci_xen_initial_domain(void)
> -{
> -	return -1;
> -}
> -static inline int xen_find_device_domain_owner(struct pci_dev *dev)
> -{
> -	return -1;
> -}
> -static inline int xen_register_device_domain_owner(struct pci_dev *dev,
> -						   uint16_t domain)
> -{
> -	return -1;
> -}
> -static inline int xen_unregister_device_domain_owner(struct pci_dev *dev)
> -{
> -	return -1;
> -}
> -#endif
>  
>  #if defined(CONFIG_PCI_MSI)
>  #if defined(CONFIG_PCI_XEN)
> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
> index 492ade8..e298726 100644
> --- a/arch/x86/pci/xen.c
> +++ b/arch/x86/pci/xen.c
> @@ -108,7 +108,6 @@ static int acpi_register_gsi_xen_hvm(struct device *dev, u32 gsi,
>  				 false /* no mapping of GSI to PIRQ */);
>  }
>  
> -#ifdef CONFIG_XEN_DOM0
>  static int xen_register_gsi(u32 gsi, int gsi_override, int triggering, int polarity)
>  {
>  	int rc, irq;
> @@ -143,7 +142,6 @@ static int acpi_register_gsi_xen(struct device *dev, u32 gsi,
>  	return xen_register_gsi(gsi, -1 /* no GSI override */, trigger, polarity);
>  }
>  #endif
> -#endif
>  
>  #if defined(CONFIG_PCI_MSI)
>  #include <linux/msi.h>
> @@ -251,7 +249,6 @@ error:
>  	return irq;
>  }
>  
> -#ifdef CONFIG_XEN_DOM0
>  static bool __read_mostly pci_seg_supported = true;
>  
>  static int xen_initdom_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
> @@ -324,7 +321,6 @@ static int xen_initdom_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>  out:
>  	return ret;
>  }
> -#endif
>  
>  static void xen_teardown_msi_irqs(struct pci_dev *dev)
>  {
> @@ -392,7 +388,6 @@ int __init pci_xen_hvm_init(void)
>  	return 0;
>  }
>  
> -#ifdef CONFIG_XEN_DOM0
>  static __init void xen_setup_acpi_sci(void)
>  {
>  	int rc;
> @@ -539,4 +534,3 @@ int xen_unregister_device_domain_owner(struct pci_dev *dev)
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(xen_unregister_device_domain_owner);
> -#endif
> diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
> index 26c731a..3c7e89a 100644
> --- a/arch/x86/xen/Kconfig
> +++ b/arch/x86/xen/Kconfig
> @@ -13,16 +13,6 @@ config XEN
>  	  kernel to boot in a paravirtualized environment under the
>  	  Xen hypervisor.
>  
> -config XEN_DOM0
> -	def_bool y
> -	depends on XEN && PCI_XEN && SWIOTLB_XEN
> -	depends on X86_LOCAL_APIC && X86_IO_APIC && ACPI && PCI
> -
> -# Dummy symbol since people have come to rely on the PRIVILEGED_GUEST
> -# name in tools.
> -config XEN_PRIVILEGED_GUEST
> -	def_bool XEN_DOM0
> -
>  config XEN_PVHVM
>  	def_bool y
>  	depends on XEN && PCI && X86_LOCAL_APIC
> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
> index add2c2d..b2d4c4b 100644
> --- a/arch/x86/xen/Makefile
> +++ b/arch/x86/xen/Makefile
> @@ -13,12 +13,11 @@ CFLAGS_mmu.o			:= $(nostackp)
>  obj-y		:= enlighten.o setup.o multicalls.o mmu.o irq.o \
>  			time.o xen-asm.o xen-asm_$(BITS).o \
>  			grant-table.o suspend.o platform-pci-unplug.o \
> -			p2m.o
> +			p2m.o vga.o
>  
>  obj-$(CONFIG_EVENT_TRACING) += trace.o
>  
>  obj-$(CONFIG_SMP)		+= smp.o
>  obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= spinlock.o
>  obj-$(CONFIG_XEN_DEBUG_FS)	+= debugfs.o
> -obj-$(CONFIG_XEN_DOM0)		+= vga.o
>  obj-$(CONFIG_SWIOTLB_XEN)	+= pci-swiotlb-xen.o
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index b095739..d9dbbca 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -90,14 +90,7 @@ static inline void xen_uninit_lock_cpu(int cpu)
>  
>  struct dom0_vga_console_info;
>  
> -#ifdef CONFIG_XEN_DOM0
>  void __init xen_init_vga(const struct dom0_vga_console_info *, size_t size);
> -#else
> -static inline void __init xen_init_vga(const struct dom0_vga_console_info *info,
> -				       size_t size)
> -{
> -}
> -#endif
>  
>  /* Declare an asm function, along with symbols needed to make it
>     inlineable */
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index 8795480..0725228 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -78,8 +78,9 @@ config XEN_DEV_EVTCHN
>  
>  config XEN_BACKEND
>  	bool "Backend driver support"
> -	depends on XEN_DOM0
>  	default y
> +	depends on XEN && PCI_XEN && SWIOTLB_XEN
> +	depends on X86_LOCAL_APIC && X86_IO_APIC && ACPI && PCI
>  	help
>  	  Support for backend device drivers that provide I/O services
>  	  to other virtual machines.
> diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
> index 974fffd..7e61662 100644
> --- a/drivers/xen/Makefile
> +++ b/drivers/xen/Makefile
> @@ -1,4 +1,4 @@
> -obj-y	+= grant-table.o features.o events.o manage.o balloon.o
> +obj-y	+= grant-table.o features.o events.o manage.o balloon.o pci.o
>  obj-y	+= xenbus/
>  
>  nostackp := $(call cc-option, -fno-stack-protector)
> @@ -17,7 +17,6 @@ obj-$(CONFIG_XEN_SYS_HYPERVISOR)	+= sys-hypervisor.o
>  obj-$(CONFIG_XEN_PVHVM)			+= platform-pci.o
>  obj-$(CONFIG_XEN_TMEM)			+= tmem.o
>  obj-$(CONFIG_SWIOTLB_XEN)		+= swiotlb-xen.o
> -obj-$(CONFIG_XEN_DOM0)			+= pci.o
>  obj-$(CONFIG_XEN_PCIDEV_BACKEND)	+= xen-pciback/
>  
>  xen-evtchn-y				:= evtchn.o
> diff --git a/drivers/xen/xenfs/Makefile b/drivers/xen/xenfs/Makefile
> index 4fde944..d20c060 100644
> --- a/drivers/xen/xenfs/Makefile
> +++ b/drivers/xen/xenfs/Makefile
> @@ -1,4 +1,3 @@
>  obj-$(CONFIG_XENFS) += xenfs.o
>  
> -xenfs-y			  = super.o xenbus.o privcmd.o
> -xenfs-$(CONFIG_XEN_DOM0) += xenstored.o
> +xenfs-y			  = super.o xenbus.o privcmd.o xenstored.o
> diff --git a/include/xen/xen.h b/include/xen/xen.h
> index a164024..d814ef7 100644
> --- a/include/xen/xen.h
> +++ b/include/xen/xen.h
> @@ -1,6 +1,9 @@
>  #ifndef _XEN_XEN_H
>  #define _XEN_XEN_H
>  
> +#include <xen/interface/xen.h>
> +#include <asm/xen/hypervisor.h>
> +
>  enum xen_domain_type {
>  	XEN_NATIVE,		/* running on bare hardware    */
>  	XEN_PV_DOMAIN,		/* running in a PV domain      */
> @@ -18,15 +21,7 @@ extern enum xen_domain_type xen_domain_type;
>  				 xen_domain_type == XEN_PV_DOMAIN)
>  #define xen_hvm_domain()	(xen_domain() &&			\
>  				 xen_domain_type == XEN_HVM_DOMAIN)
> -
> -#ifdef CONFIG_XEN_DOM0
> -#include <xen/interface/xen.h>
> -#include <asm/xen/hypervisor.h>
> -
>  #define xen_initial_domain()	(xen_pv_domain() && \
>  				 xen_start_info->flags & SIF_INITDOMAIN)
> -#else  /* !CONFIG_XEN_DOM0 */
> -#define xen_initial_domain()	(0)
> -#endif	/* CONFIG_XEN_DOM0 */
>  
>  #endif	/* _XEN_XEN_H */

^ permalink raw reply

* Re: [PATCH] xen: remove CONFIG_XEN_DOM0 compile option
From: Stefano Stabellini @ 2012-01-06 16:59 UTC (permalink / raw)
  To: Andrew Jones
  Cc: jeremy@goop.org, xen-devel@lists.xensource.com,
	virtualization@lists.linux-foundation.org, Stefano Stabellini,
	konrad.wilk@oracle.com
In-Reply-To: <1325867990-15018-1-git-send-email-drjones@redhat.com>

On Fri, 6 Jan 2012, Andrew Jones wrote:
> XEN_DOM0 is a silent option that has been automatically selected when
> CONFIG_XEN is selected since 6b0661a5e6fbf. If this option was changed
> to a menu configurable option then it would only give users the ability
> to compile out about 100 kbytes of code.

I think that it is less than that because backends are not tied to dom0
in any way, even though currently they cannot be compiled without
XEN_DOM0.
Running a network or block backend in domU is a well known
configuration.
Therefore the code compiled out only amounts to about 10K.


> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index 8795480..0725228 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -78,8 +78,9 @@ config XEN_DEV_EVTCHN
>  
>  config XEN_BACKEND
>  	bool "Backend driver support"
> -	depends on XEN_DOM0
>  	default y
> +	depends on XEN && PCI_XEN && SWIOTLB_XEN
> +	depends on X86_LOCAL_APIC && X86_IO_APIC && ACPI && PCI
>  	help
>  	  Support for backend device drivers that provide I/O services
>  	  to other virtual machines.

I think we can trimmer the dependency list here: after all backends can
be run in unpriviledged guests as well (see driver domains).
So I think it should depend only on XEN.

^ permalink raw reply

* Re: [Xen-devel] [PATCH 2/4] xen kconfig: relax INPUT_XEN_KBDDEV_FRONTEND deps
From: Konrad Rzeszutek Wilk @ 2012-01-06 15:46 UTC (permalink / raw)
  To: Andrew Jones; +Cc: jeremy, xen-devel, konrad.wilk, virtualization
In-Reply-To: <1325842991-4404-3-git-send-email-drjones@redhat.com>

On Fri, Jan 06, 2012 at 10:43:09AM +0100, Andrew Jones wrote:
> PV-on-HVM guests may want to use the xen keyboard/mouse frontend, but
> they don't use the xen frame buffer frontend. For this case it doesn't
> make much sense for INPUT_XEN_KBDDEV_FRONTEND to depend on
> XEN_FBDEV_FRONTEND. The opposite direction always makes more sense, i.e.
> if you're using xenfb, then you'll want xenkbd. Switch the dependencies.

You need to CC as well these people that have 'maintainer' field on them:

konrad@phenom:~/work/linux$ scripts/get_maintainer.pl -f
drivers/video/Kconfig
Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
(maintainer:FRAMEBUFFER LAYER)
linux-fbdev@vger.kernel.org (open list:FRAMEBUFFER LAYER)
linux-kernel@vger.kernel.org (open list)
konrad@phenom:~/work/linux$ scripts/get_maintainer.pl -f
drivers/input/misc/Kconfig
Dmitry Torokhov <dmitry.torokhov@gmail.com> (maintainer:INPUT
(KEYBOARD,...,commit_signer:9/16=56%)
Samuel Ortiz <sameo@linux.intel.com> (commit_signer:3/16=19%)
Anirudh Ghayal <aghayal@codeaurora.org> (commit_signer:2/16=12%)
Peter Ujfalusi <peter.ujfalusi@ti.com> (commit_signer:2/16=12%)
Alan Cox <alan@linux.intel.com> (commit_signer:2/16=12%)
linux-input@vger.kernel.org (open list:INPUT (KEYBOARD,...)
linux-kernel@vger.kernel.org (open list)

> 
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  drivers/input/misc/Kconfig |    2 +-
>  drivers/video/Kconfig      |    1 +
>  2 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/input/misc/Kconfig b/drivers/input/misc/Kconfig
> index 22d875f..36c15bf 100644
> --- a/drivers/input/misc/Kconfig
> +++ b/drivers/input/misc/Kconfig
> @@ -533,7 +533,7 @@ config INPUT_CMA3000_I2C
>  
>  config INPUT_XEN_KBDDEV_FRONTEND
>  	tristate "Xen virtual keyboard and mouse support"
> -	depends on XEN_FBDEV_FRONTEND
> +	depends on XEN
>  	default y
>  	select XEN_XENBUS_FRONTEND
>  	help
> diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
> index d83e967..269b299 100644
> --- a/drivers/video/Kconfig
> +++ b/drivers/video/Kconfig
> @@ -2269,6 +2269,7 @@ config XEN_FBDEV_FRONTEND
>  	select FB_SYS_IMAGEBLIT
>  	select FB_SYS_FOPS
>  	select FB_DEFERRED_IO
> +	select INPUT_XEN_KBDDEV_FRONTEND
>  	select XEN_XENBUS_FRONTEND
>  	default y
>  	help
> -- 
> 1.7.7.5
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply

* Re: [Xen-devel] [PATCH 3/4] xen kconfig: add dom0 support help text
From: Konrad Rzeszutek Wilk @ 2012-01-06 15:07 UTC (permalink / raw)
  To: Andrew Jones
  Cc: jeremy, xen-devel, virtualization, Ian Campbell,
	Stefano Stabellini
In-Reply-To: <c802fb58-5067-4075-80b3-ca8004005364@zmail13.collab.prod.int.phx2.redhat.com>

> > The reason to remove is easy: it is already a silent option and
> > disabling it saves almost nothing.
> > I think that removing it should be easy enough but if you don't
> > feel confident doing it, I can come up with a patch.
> > 
> 
> I'll write the patch. It's not the patch I thought would be hard to
> do (although I'll only be posting it compile tested, as I don't
> have an environment where I can test it set up). What I thought would

Um.. Fedora Core 16?

> be difficult is the justification. However, considering you're
> advocating it, I guess I'm good there too.

We don't bite :-)

^ permalink raw reply

* Re: [Xen-devel] [PATCH 3/4] xen kconfig: add dom0 support help text
From: Stefano Stabellini @ 2012-01-06 12:35 UTC (permalink / raw)
  To: Andrew Jones
  Cc: jeremy@goop.org, xen-devel@lists.xensource.com, Ian Campbell,
	konrad wilk, Stefano Stabellini,
	virtualization@lists.linux-foundation.org
In-Reply-To: <a0f8cfee-aeee-4993-a433-4166d63a34e6@zmail13.collab.prod.int.phx2.redhat.com>

On Fri, 6 Jan 2012, Andrew Jones wrote:
> ----- Original Message -----
> > On Fri, 6 Jan 2012, Andrew Jones wrote:
> > > > Almost all of the things which dom0 needs (e.g. PCI device
> > > > management
> > > > etc) is also required by a domU with passthrough enabled so the
> > > > savings
> > > > are really very slight.
> > > > 
> > > > We are talking less than 1k of code AFAICT, 319 bytes for
> > > > arch/x86/xen/vga.o and 573 for drivers/xen/xenfs/xenstored.o plus
> > > > whatever xen_register_gsi (a couple of dozen lines of code) adds
> > > > to
> > > > arch/x86/pci/xen.o. grep doesn't show CONFIG_XEN_DOM0 being used
> > > > anywhere else. What savings do you see in practice from disabling
> > > > just
> > > > this symbol?
> > > 
> > > I completely agree that the saving are near none. The savings,
> > > however,
> > > aren't the only reason to drive the change. It's actually the
> > > symbol
> > > name itself. Unfortunately configs can be perceived as a contract
> > > of
> > > support, i.e. if feature xyz is enabled in the distro's config,
> > > then
> > > the distributor must have selected, and therefore will support,
> > > said
> > > feature.
> > > 
> > > I didn't make this motivation clear in my initial post, because I
> > > was
> > > hoping to spare people some eye rolling.
> > 
> > I thought that in the kernel community we make decisions based on
> > technical merits rather than "contracts of support".
> 
> Sorry.
> 
> > Given that disabling the symbol saves near to nothing, the logical
> > thing
> > to do is removing the symbol altogether.
> > 
> 
> I thought of that too. If the symbol just goes away, then my
> non-technical requirement will be met and the functionality will
> stay. I consider that a bigger win actually. I didn't suggest it
> though, since I've never done any dom0 development, nor had any
> consideration for dom0 code needs of the future. With
> anti-credentials like that, I'd guess it'd be tougher for me to sell
> the need to remove it, than it is for me to just make it
> configurable.
 
The reason to remove is easy: it is already a silent option and
disabling it saves almost nothing.
I think that removing it should be easy enough but if you don't
feel confident doing it, I can come up with a patch.

^ permalink raw reply

* Re: [Xen-devel] [PATCH 3/4] xen kconfig: add dom0 support help text
From: Stefano Stabellini @ 2012-01-06 11:21 UTC (permalink / raw)
  To: Andrew Jones
  Cc: jeremy@goop.org, xen-devel@lists.xensource.com,
	virtualization@lists.linux-foundation.org, Ian Campbell,
	konrad wilk
In-Reply-To: <0872cdd4-a141-4904-8149-ff84cee00514@zmail13.collab.prod.int.phx2.redhat.com>

On Fri, 6 Jan 2012, Andrew Jones wrote:
> > Almost all of the things which dom0 needs (e.g. PCI device management
> > etc) is also required by a domU with passthrough enabled so the
> > savings
> > are really very slight.
> > 
> > We are talking less than 1k of code AFAICT, 319 bytes for
> > arch/x86/xen/vga.o and 573 for drivers/xen/xenfs/xenstored.o plus
> > whatever xen_register_gsi (a couple of dozen lines of code) adds to
> > arch/x86/pci/xen.o. grep doesn't show CONFIG_XEN_DOM0 being used
> > anywhere else. What savings do you see in practice from disabling
> > just
> > this symbol?
> 
> I completely agree that the saving are near none. The savings, however,
> aren't the only reason to drive the change. It's actually the symbol
> name itself. Unfortunately configs can be perceived as a contract of
> support, i.e. if feature xyz is enabled in the distro's config, then
> the distributor must have selected, and therefore will support, said
> feature.
> 
> I didn't make this motivation clear in my initial post, because I was
> hoping to spare people some eye rolling.

I thought that in the kernel community we make decisions based on
technical merits rather than "contracts of support". 
Given that disabling the symbol saves near to nothing, the logical thing
to do is removing the symbol altogether.

^ permalink raw reply

* [PATCH 2/2] virtio: console: Disable callbacks for virtqueues at start of S4 freeze
From: Amit Shah @ 2012-01-06 10:49 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Amit Shah, Virtualization List, miche, Michael S. Tsirkin
In-Reply-To: <cover.1325845993.git.amit.shah@redhat.com>

To ensure we don't receive any more interrupts from the host after we
enter the freeze function, disable all vq interrupts.

There wasn't any problem seen due to this in tests, but applying this
patch makes the freeze case more robust.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 drivers/char/virtio_console.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index fc54a79..c1c45d8 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1860,10 +1860,18 @@ static int virtcons_freeze(struct virtio_device *vdev)
 
 	vdev->config->reset(vdev);
 
+	virtqueue_disable_cb(portdev->c_ivq);
 	cancel_work_sync(&portdev->control_work);
+	/*
+	 * Once more: if control_work_handler() was running, it would
+	 * enable the cb as the last step.
+	 */
+	virtqueue_disable_cb(portdev->c_ivq);
 	remove_controlq_data(portdev);
 
 	list_for_each_entry(port, &portdev->ports, list) {
+		virtqueue_disable_cb(port->in_vq);
+		virtqueue_disable_cb(port->out_vq);
 		/*
 		 * We'll ask the host later if the new invocation has
 		 * the port opened or closed.
-- 
1.7.7.4

^ permalink raw reply related

* [PATCH 1/2] virtio: console: Serialise control work
From: Amit Shah @ 2012-01-06 10:49 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Amit Shah, Virtualization List, miche, Michael S. Tsirkin
In-Reply-To: <cover.1325845993.git.amit.shah@redhat.com>

We currently allow multiple instances of the control work handler to run
in parallel.  This isn't expected to work; serialise access by disabling
interrupts on new packets from the Host and enable them when all the
existing ones are consumed.

Testcase: hot-plug/unplug a port in a loop.  Without this patch, some
sysfs warnings are seen along with freezes.  With the patch, all works
fine.

Reported-by: Miche Baker-Harvey <miche@google.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 drivers/char/virtio_console.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index fd2fd6f..fc54a79 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -1471,6 +1471,7 @@ static void control_work_handler(struct work_struct *work)
 	portdev = container_of(work, struct ports_device, control_work);
 	vq = portdev->c_ivq;
 
+ start:
 	spin_lock(&portdev->cvq_lock);
 	while ((buf = virtqueue_get_buf(vq, &len))) {
 		spin_unlock(&portdev->cvq_lock);
@@ -1488,6 +1489,10 @@ static void control_work_handler(struct work_struct *work)
 		}
 	}
 	spin_unlock(&portdev->cvq_lock);
+	if (unlikely(!virtqueue_enable_cb(vq))) {
+		virtqueue_disable_cb(vq);
+		goto start;
+	}
 }
 
 static void out_intr(struct virtqueue *vq)
@@ -1538,6 +1543,7 @@ static void control_intr(struct virtqueue *vq)
 {
 	struct ports_device *portdev;
 
+	virtqueue_disable_cb(vq);
 	portdev = vq->vdev->priv;
 	schedule_work(&portdev->control_work);
 }
-- 
1.7.7.4

^ permalink raw reply related

* [PATCH 0/2] virtio: console: control queue race fixes
From: Amit Shah @ 2012-01-06 10:49 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Amit Shah, Virtualization List, miche, Michael S. Tsirkin

Hello,

The first patch here fixes the race seen by Miche.  He hasn't yet
reported back if this fixes the races he saw, but Joy Pu from Red Hat
tested this patch with hot-plugging/unplugging ports in a loop.
Before this patch, he saw some freezes as well as sysfs warnings.
After applying the patch, all was well.

The second patch can be folded into the series fixing S4 for
virtio-console.  It ensures all callbacks are disabled in the freeze
function.  This patch has a dependency on the 1st one in the sense the
double disabling of vq won't make sense w/o patch 1.

Please review and apply.


Amit Shah (2):
  virtio: console: Serialise control work
  virtio: console: Disable callbacks for virtqueues at start of S4
    freeze

 drivers/char/virtio_console.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

-- 
1.7.7.4

^ permalink raw reply

* Re: [Xen-devel] [PATCH 3/4] xen kconfig: add dom0 support help text
From: Stefano Stabellini @ 2012-01-06 10:42 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Andrew Jones, xen-devel@lists.xensource.com,
	virtualization@lists.linux-foundation.org, jeremy@goop.org,
	konrad.wilk@oracle.com
In-Reply-To: <1325843743.25206.438.camel@zakaz.uk.xensource.com>

On Fri, 6 Jan 2012, Ian Campbell wrote:
> On Fri, 2012-01-06 at 09:26 +0000, Andrew Jones wrote:
> > 
> > ----- Original Message -----
> > > On Fri, 2012-01-06 at 08:57 +0000, Andrew Jones wrote:
> > > > Describe dom0 support in the config menu and supply help text for
> > > > it.
> > > 
> > > This turns a non-user visible symbol into a user visible one.
> > > Previously
> > > if Xen was enabled and the other prerequisites were met you would get
> > > dom0 support automatically -- do we really want to change that?
> > > According to 6b0661a5e6fbf it was a deliberate decision to have it
> > > this
> > > way.
> > 
> > I think it's a necessary evil in order to give users the ability to
> > compile kernels without the support. I know it doesn't make much sense
> > for most users, but...
> 
> Who actually wants to do this though and why? Do you have a bug report
> requesting this change?
> 
> Almost all of the things which dom0 needs (e.g. PCI device management
> etc) is also required by a domU with passthrough enabled so the savings
> are really very slight.
> 
> We are talking less than 1k of code AFAICT, 319 bytes for
> arch/x86/xen/vga.o and 573 for drivers/xen/xenfs/xenstored.o plus
> whatever xen_register_gsi (a couple of dozen lines of code) adds to
> arch/x86/pci/xen.o. grep doesn't show CONFIG_XEN_DOM0 being used
> anywhere else. What savings do you see in practice from disabling just
> this symbol?
> 
> We need to weigh up the size change against the complexity of asking the
> user yet another question, I'm not convinced the question is worth it on
> balance.

I agree, there is nothing to gain disabling dom0 support if the
conditions it depends upon are already met.
XEN_DOM0 was meant to be a silent option since the beginning to avoid
unnecessary complexity.

^ permalink raw reply

* Re: [Xen-devel] [PATCH 3/4] xen kconfig: add dom0 support help text
From: Ian Campbell @ 2012-01-06  9:55 UTC (permalink / raw)
  To: Andrew Jones
  Cc: jeremy@goop.org, xen-devel@lists.xensource.com,
	konrad.wilk@oracle.com, virtualization@lists.linux-foundation.org
In-Reply-To: <5a67dfb3-fab0-4792-ba17-95f0934cf7af@zmail13.collab.prod.int.phx2.redhat.com>

On Fri, 2012-01-06 at 09:26 +0000, Andrew Jones wrote:
> 
> ----- Original Message -----
> > On Fri, 2012-01-06 at 08:57 +0000, Andrew Jones wrote:
> > > Describe dom0 support in the config menu and supply help text for
> > > it.
> > 
> > This turns a non-user visible symbol into a user visible one.
> > Previously
> > if Xen was enabled and the other prerequisites were met you would get
> > dom0 support automatically -- do we really want to change that?
> > According to 6b0661a5e6fbf it was a deliberate decision to have it
> > this
> > way.
> 
> I think it's a necessary evil in order to give users the ability to
> compile kernels without the support. I know it doesn't make much sense
> for most users, but...

Who actually wants to do this though and why? Do you have a bug report
requesting this change?

Almost all of the things which dom0 needs (e.g. PCI device management
etc) is also required by a domU with passthrough enabled so the savings
are really very slight.

We are talking less than 1k of code AFAICT, 319 bytes for
arch/x86/xen/vga.o and 573 for drivers/xen/xenfs/xenstored.o plus
whatever xen_register_gsi (a couple of dozen lines of code) adds to
arch/x86/pci/xen.o. grep doesn't show CONFIG_XEN_DOM0 being used
anywhere else. What savings do you see in practice from disabling just
this symbol?

We need to weigh up the size change against the complexity of asking the
user yet another question, I'm not convinced the question is worth it on
balance.

> > 
> > BTW, you forgot a Signed-off-by and the appropriate CCs (please use
> > MAINTAINERS or ./scripts/get-maintainer.pl).
> > 
> 
> Sorry, I'll resend properly.

I've added those CC's to this reply too.

Ian.

> 
> Drew
> 
> > Ian.
> > 
> > > ---
> > >  arch/x86/xen/Kconfig |    7 ++++++-
> > >  1 files changed, 6 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
> > > index 26c731a..88862d5 100644
> > > --- a/arch/x86/xen/Kconfig
> > > +++ b/arch/x86/xen/Kconfig
> > > @@ -14,9 +14,14 @@ config XEN
> > >  	  Xen hypervisor.
> > >  
> > >  config XEN_DOM0
> > > -	def_bool y
> > > +	bool "Xen Initial Domain (Dom0) support"
> > > +	default y
> > >  	depends on XEN && PCI_XEN && SWIOTLB_XEN
> > >  	depends on X86_LOCAL_APIC && X86_IO_APIC && ACPI && PCI
> > > +	help
> > > +	  This allows the kernel to be used for the initial Xen domain,
> > > +	  Domain0. This is a privileged guest that supplies backends
> > > +	  and is used to manage the other Xen domains.
> > >  
> > >  # Dummy symbol since people have come to rely on the
> > >  PRIVILEGED_GUEST
> > >  # name in tools.
> > 
> > 
> > 

^ permalink raw reply

* Re: [RFC PATCH v1 1/2] virtio_net: Pass gfp flags when allocating rx buffers.
From: David Miller @ 2012-01-05 18:21 UTC (permalink / raw)
  To: rusty; +Cc: netdev, mst, virtualization, linux-kernel, mikew
In-Reply-To: <871urfc8md.fsf@rustcorp.com.au>

From: Rusty Russell <rusty@rustcorp.com.au>
Date: Thu, 05 Jan 2012 10:40:02 +1030

> On Wed, 04 Jan 2012 14:52:32 -0800, Mike Waychison <mikew@google.com> wrote:
>> Currently, the refill path for RX buffers will always allocate the
>> buffers as GFP_ATOMIC, even if we are in process context.  This will
>> fail to apply memory pressure as the worker thread will not contribute
>> to the freeing of memory.
>> 
>> Fix this by changing add_recvbuf_small to use the gfp variant allocator,
>> __netdev_alloc_skb_ip_align().
>> 
>> Signed-off-by: Mike Waychison <mikew@google.com>
> 
> OK, this is a no-brainer.  Thanks!  Dave, can you pick this up?
> 
> Acked-by: Rusty Russell <rusty@rustcorp.com.au>

Applied.

^ permalink raw reply

* Re: [RFC PATCH v1 2/2] virtio_net: Don't disable napi on low memory.
From: Mike Waychison @ 2012-01-05  2:46 UTC (permalink / raw)
  To: Rusty Russell; +Cc: netdev, virtualization, linux-kernel, Michael S. Tsirkin
In-Reply-To: <87y5tnat2g.fsf@rustcorp.com.au>

On Wed, Jan 4, 2012 at 4:31 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> On Wed, 04 Jan 2012 14:52:38 -0800, Mike Waychison <mikew@google.com> wrote:
>> Currently, the virtio_net driver deals with low memory memory by kicking
>> off delayed work in process context to try and refill the rx queues.
>> This process-context filling is synchronized against the receive
>> bottom-half by serially:
>>
>>   - disabling NAPI polling on the device,
>>   - allocating buffers,
>>   - enqueueing the buffers,
>>   - re-enabling napi.
>>
>> Disabling NAPI just to synchronize virtio_add_buf calls is a bit
>> overkill, and there isn't any reason we shouldn't be able to continue
>> processing received packets as they come in.  In the simple case, this
>> does not involve allocating any memory, and in fact allows memory to be
>> released as the guest system receives and processes packets.
>
> Adding a spinlock to the fastpath for a weird case in the slow path is
> bad too.
>
> I dislike several things about this patch:
> 1) You seem to leak memory if you allocate a batch then don't add it all.
> 2) You have a module parameter, which I guarantee noone else will ever use.
> 3) You've made three changes at once: allocating outside the lock,
>   batching, and replacing the napi lock with a spinlock.
> 4) You use the skb data for the linked list; use the skb head's list.
>
> Instead, here's how I think it should be done:
>
> 1) Split alloc and add, but make alloc return the skb.
> 2) Make try_fill_recv() only called in the receive path, keep it
>   basically the same (no batching).
> 3) Add a new try_fill_recv_batch() for the workqueue and init paths.
>   This should try to allocate max - num, though in practice the
>   first alloc would probably kick off reclaim and take forever,
>   then by the time it's done, the problem is solved.  Since it's
>   reading max and num outside the lock, write this loop carefully!
> 4) try_fill_recv_batch() should then stop napi and add as many as it
>   can, then restart it, then free any remainder.

This sounds reasonable to me.  I'll see what I can muster together this week.

Thanks for taking a look :)

>
> Cheers,
> Rusty.

^ permalink raw reply

* Re: [RFC PATCH v1 2/2] virtio_net: Don't disable napi on low memory.
From: Rusty Russell @ 2012-01-05  0:31 UTC (permalink / raw)
  To: Mike Waychison, Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20120104225237.18184.71853.stgit@mike2.sea.corp.google.com>

On Wed, 04 Jan 2012 14:52:38 -0800, Mike Waychison <mikew@google.com> wrote:
> Currently, the virtio_net driver deals with low memory memory by kicking
> off delayed work in process context to try and refill the rx queues.
> This process-context filling is synchronized against the receive
> bottom-half by serially:
> 
>   - disabling NAPI polling on the device,
>   - allocating buffers,
>   - enqueueing the buffers,
>   - re-enabling napi.
> 
> Disabling NAPI just to synchronize virtio_add_buf calls is a bit
> overkill, and there isn't any reason we shouldn't be able to continue
> processing received packets as they come in.  In the simple case, this
> does not involve allocating any memory, and in fact allows memory to be
> released as the guest system receives and processes packets.

Adding a spinlock to the fastpath for a weird case in the slow path is
bad too.

I dislike several things about this patch:
1) You seem to leak memory if you allocate a batch then don't add it all.
2) You have a module parameter, which I guarantee noone else will ever use.
3) You've made three changes at once: allocating outside the lock,
   batching, and replacing the napi lock with a spinlock.
4) You use the skb data for the linked list; use the skb head's list.

Instead, here's how I think it should be done:

1) Split alloc and add, but make alloc return the skb.
2) Make try_fill_recv() only called in the receive path, keep it
   basically the same (no batching).
3) Add a new try_fill_recv_batch() for the workqueue and init paths.
   This should try to allocate max - num, though in practice the
   first alloc would probably kick off reclaim and take forever,
   then by the time it's done, the problem is solved.  Since it's
   reading max and num outside the lock, write this loop carefully!
4) try_fill_recv_batch() should then stop napi and add as many as it
   can, then restart it, then free any remainder.

Cheers,
Rusty.

^ permalink raw reply

* Re: [RFC PATCH v1 1/2] virtio_net: Pass gfp flags when allocating rx buffers.
From: Rusty Russell @ 2012-01-05  0:10 UTC (permalink / raw)
  To: Mike Waychison, Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20120104225231.18184.96390.stgit@mike2.sea.corp.google.com>

On Wed, 04 Jan 2012 14:52:32 -0800, Mike Waychison <mikew@google.com> wrote:
> Currently, the refill path for RX buffers will always allocate the
> buffers as GFP_ATOMIC, even if we are in process context.  This will
> fail to apply memory pressure as the worker thread will not contribute
> to the freeing of memory.
> 
> Fix this by changing add_recvbuf_small to use the gfp variant allocator,
> __netdev_alloc_skb_ip_align().
> 
> Signed-off-by: Mike Waychison <mikew@google.com>

OK, this is a no-brainer.  Thanks!  Dave, can you pick this up?

Acked-by: Rusty Russell <rusty@rustcorp.com.au>

Cheers,
Rusty.

^ permalink raw reply

* [RFC PATCH v1 2/2] virtio_net: Don't disable napi on low memory.
From: Mike Waychison @ 2012-01-04 22:52 UTC (permalink / raw)
  To: Rusty Russell, Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20120104225223.18184.1537.stgit@mike2.sea.corp.google.com>

Currently, the virtio_net driver deals with low memory memory by kicking
off delayed work in process context to try and refill the rx queues.
This process-context filling is synchronized against the receive
bottom-half by serially:

  - disabling NAPI polling on the device,
  - allocating buffers,
  - enqueueing the buffers,
  - re-enabling napi.

Disabling NAPI just to synchronize virtio_add_buf calls is a bit
overkill, and there isn't any reason we shouldn't be able to continue
processing received packets as they come in.  In the simple case, this
does not involve allocating any memory, and in fact allows memory to be
released as the guest system receives and processes packets.

Instead of disabling NAPI, this change re-arranges the allocation and
enqueueing of buffers such that allocation of all buffers happens
separately from the enqueueing of said buffers.  In lieu of serializing
using NAPI for the refill out-of-band path, we now serialize using a
newly introduced bottom-half spinlock, vi->rx_fill_lock.

With the allocation split off from the enqueueing, we now must be a bit
more careful about how many allocations to make.  Before-hand, we were
limitting our allocations up to the point where the virtio_add_buf call
returned -ENOSPC, but with allocations done up front, we don't really
know how many allocations to perform.  Instead of grabbing the lock all
the time, we create batches of buffer allocations, as defined by the new
rx_allocate_batch module parameter.

As well, due to the way we infer the "vi->max" number of buffers the
queue can take, (as we really don't know up front), we now simply assume
that vi->max is whatever the number of successful allocations made were
at driver initialization, and update it later if we were wrong.

In empirical experiments where memory is tight, this patch allows the
out-of-band refill path to contribute to memory pressure for reclaim,
while reducing the perceived outage of the interface's receive path.
This in turn reduces network stalls that lead to congestion window
collapses.

Signed-off-by: Mike Waychison <mikew@google.com>
---
 drivers/net/virtio_net.c |  210 +++++++++++++++++++++++++++++++++++-----------
 1 files changed, 160 insertions(+), 50 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 76fe14e..90b1e6d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -34,6 +34,9 @@ static bool csum = true, gso = true;
 module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
 
+static int rx_allocate_batch = 32;
+module_param(rx_allocate_batch, int, 0644);
+
 /* FIXME: MTU in config. */
 #define MAX_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
 #define GOOD_COPY_LEN	128
@@ -72,6 +75,9 @@ struct virtnet_info {
 	/* Work struct for refilling if we run low on memory. */
 	struct delayed_work refill;
 
+	/* Lock used to synchronize adding buffers to the rx queue */
+	spinlock_t rx_fill_lock;
+
 	/* Chain pages by the private ptr. */
 	struct page *pages;
 
@@ -353,58 +359,82 @@ frame_err:
 	dev_kfree_skb(skb);
 }
 
-static int add_recvbuf_small(struct virtnet_info *vi, gfp_t gfp)
+static int alloc_recvbuf_small(struct virtnet_info *vi, struct list_head *list,
+			       gfp_t gfp)
 {
 	struct sk_buff *skb;
-	struct skb_vnet_hdr *hdr;
-	int err;
 
 	skb = __netdev_alloc_skb_ip_align(vi->dev, MAX_PACKET_LEN, gfp);
 	if (unlikely(!skb))
 		return -ENOMEM;
 
 	skb_put(skb, MAX_PACKET_LEN);
+	list_add_tail((struct list_head *)skb, list);
+	return 0;
+}
+
+static int add_recvbuf_small(struct virtnet_info *vi, struct sk_buff *skb)
+{
+	struct skb_vnet_hdr *hdr;
+	int err;
 
 	hdr = skb_vnet_hdr(skb);
 	sg_set_buf(vi->rx_sg, &hdr->hdr, sizeof hdr->hdr);
 
 	skb_to_sgvec(skb, vi->rx_sg + 1, 0, skb->len);
 
-	err = virtqueue_add_buf_gfp(vi->rvq, vi->rx_sg, 0, 2, skb, gfp);
+	err = virtqueue_add_buf(vi->rvq, vi->rx_sg, 0, 2, skb);
 	if (err < 0)
 		dev_kfree_skb(skb);
-
 	return err;
 }
 
-static int add_recvbuf_big(struct virtnet_info *vi, gfp_t gfp)
+static int alloc_recvbuf_big(struct virtnet_info *vi, struct list_head *list,
+			     gfp_t gfp)
 {
-	struct page *first, *list = NULL;
-	char *p;
-	int i, err, offset;
+	struct page *first, *tail = NULL;
+	int i;
 
 	/* page in vi->rx_sg[MAX_SKB_FRAGS + 1] is list tail */
 	for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
 		first = get_a_page(vi, gfp);
 		if (!first) {
-			if (list)
-				give_pages(vi, list);
+			if (tail)
+				give_pages(vi, tail);
 			return -ENOMEM;
 		}
 		sg_set_buf(&vi->rx_sg[i], page_address(first), PAGE_SIZE);
 
 		/* chain new page in list head to match sg */
-		first->private = (unsigned long)list;
-		list = first;
+		first->private = (unsigned long)tail;
+		tail = first;
 	}
 
 	first = get_a_page(vi, gfp);
 	if (!first) {
-		give_pages(vi, list);
+		give_pages(vi, tail);
 		return -ENOMEM;
 	}
-	p = page_address(first);
 
+	/* chain first in list head */
+	first->private = (unsigned long)tail;
+
+	list_add_tail(&first->lru, list);
+
+	return 0;
+}
+
+/*
+ *  Add a big page threaded through first->private to the rx queue.
+ *
+ *  Consumes the page list, even in case of failure.
+ */
+static int add_recvbuf_big(struct virtnet_info *vi, struct page *first)
+{
+	char *p;
+	int err, offset;
+
+	p = page_address(first);
 	/* vi->rx_sg[0], vi->rx_sg[1] share the same page */
 	/* a separated vi->rx_sg[0] for virtio_net_hdr only due to QEMU bug */
 	sg_set_buf(&vi->rx_sg[0], p, sizeof(struct virtio_net_hdr));
@@ -413,65 +443,140 @@ static int add_recvbuf_big(struct virtnet_info *vi, gfp_t gfp)
 	offset = sizeof(struct padded_vnet_hdr);
 	sg_set_buf(&vi->rx_sg[1], p + offset, PAGE_SIZE - offset);
 
-	/* chain first in list head */
-	first->private = (unsigned long)list;
-	err = virtqueue_add_buf_gfp(vi->rvq, vi->rx_sg, 0, MAX_SKB_FRAGS + 2,
-				    first, gfp);
+	err = virtqueue_add_buf(vi->rvq, vi->rx_sg, 0, MAX_SKB_FRAGS + 2,
+				first);
 	if (err < 0)
 		give_pages(vi, first);
-
 	return err;
 }
 
-static int add_recvbuf_mergeable(struct virtnet_info *vi, gfp_t gfp)
+static int alloc_recvbuf_mergeable(struct virtnet_info *vi,
+				   struct list_head *list, gfp_t gfp)
 {
 	struct page *page;
-	int err;
 
 	page = get_a_page(vi, gfp);
 	if (!page)
 		return -ENOMEM;
 
-	sg_init_one(vi->rx_sg, page_address(page), PAGE_SIZE);
+	list_add_tail(&page->lru, list);
+	return 0;
+}
 
-	err = virtqueue_add_buf_gfp(vi->rvq, vi->rx_sg, 0, 1, page, gfp);
+/* Add a page to the rx queue.  Consumes the page even in failure. */
+static int add_recvbuf_mergeable(struct virtnet_info *vi, struct page *page)
+{
+	int err;
+	sg_init_one(vi->rx_sg, page_address(page), PAGE_SIZE);
+	err = virtqueue_add_buf(vi->rvq, vi->rx_sg, 0, 1, page);
 	if (err < 0)
 		give_pages(vi, page);
-
 	return err;
 }
 
-/*
- * Returns false if we couldn't fill entirely (OOM).
- *
- * Normally run in the receive path, but can also be run from ndo_open
- * before we're receiving packets, or from refill_work which is
- * careful to disable receiving (using napi_disable).
- */
-static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
+/* Returns false if we couldn't fill entirely (OOM). */
+static inline bool try_fill_recv_pages(struct virtnet_info *vi, gfp_t gfp,
+		int (*alloc)(struct virtnet_info *, struct list_head *, gfp_t),
+		int (*add)(struct virtnet_info *, struct page *))
 {
+	LIST_HEAD(local_list);
+	struct page *page, *nextpage;
 	int err;
 	bool oom;
 
-	do {
-		if (vi->mergeable_rx_bufs)
-			err = add_recvbuf_mergeable(vi, gfp);
-		else if (vi->big_packets)
-			err = add_recvbuf_big(vi, gfp);
-		else
-			err = add_recvbuf_small(vi, gfp);
+	while (1) {
+		int count = rx_allocate_batch;
+		do {
+			err = alloc(vi, &local_list, gfp);
+			oom |= err == -ENOMEM;
+			if (err < 0)
+				break;
+		} while (err >= 0 && --count);
+
+		spin_lock_bh(&vi->rx_fill_lock);
+		list_for_each_entry_safe(page, nextpage, &local_list, lru) {
+			list_del(&page->lru);
+			err = add(vi, page);
+			oom |= err == -ENOMEM;
+			if (err >= 0)
+				++vi->num;
+		}
+		if (unlikely(vi->num > vi->max))
+			vi->max = vi->num;
+		if (err < 0 || vi->num == vi->max)
+			break;
+		spin_unlock_bh(&vi->rx_fill_lock);
+	}
+	virtqueue_kick(vi->rvq);
+	spin_unlock_bh(&vi->rx_fill_lock);
+	return !oom;
+}
+
+static bool try_fill_recv_mergeable(struct virtnet_info *vi, gfp_t gfp)
+{
+	return try_fill_recv_pages(vi, gfp, alloc_recvbuf_mergeable,
+				   add_recvbuf_mergeable);
+}
+
+static bool try_fill_recv_big(struct virtnet_info *vi, gfp_t gfp)
+{
+	return try_fill_recv_pages(vi, gfp, alloc_recvbuf_big, add_recvbuf_big);
+}
+
+static bool try_fill_recv_small(struct virtnet_info *vi, gfp_t gfp)
+{
+	LIST_HEAD(local_list);
+	struct list_head *pos, *n;
+	int err;
+	bool oom;
 
-		oom = err == -ENOMEM;
-		if (err < 0)
+	while (1) {
+		int count = rx_allocate_batch;
+		do {
+			err = alloc_recvbuf_small(vi, &local_list, gfp);
+			oom |= err == -ENOMEM;
+			/* In case of error, still process the local_list */
+			if (err < 0)
+				break;
+		} while (err >= 0 && --count);
+
+		spin_lock_bh(&vi->rx_fill_lock);
+		list_for_each_safe(pos, n, &local_list) {
+			struct sk_buff *skb = (struct sk_buff *)pos;
+			list_del(pos);
+			err = add_recvbuf_small(vi, skb);
+			oom |= err == -ENOMEM;
+			if (err >= 0)
+				++vi->num;
+		}
+		if (unlikely(vi->num > vi->max))
+			vi->max = vi->num;
+		if (err < 0 || vi->num == vi->max)
 			break;
-		++vi->num;
-	} while (err > 0);
-	if (unlikely(vi->num > vi->max))
-		vi->max = vi->num;
+		spin_unlock_bh(&vi->rx_fill_lock);
+	}
 	virtqueue_kick(vi->rvq);
+	spin_unlock_bh(&vi->rx_fill_lock);
 	return !oom;
 }
 
+/*
+ * Returns false if we couldn't fill entirely (OOM).
+ *
+ * Normally run in the receive path, but can also be run from ndo_open
+ * before we're receiving packets, or from refill_work.  Serializes
+ * access to the virtioqueue using vi->rx_fill_lock.
+ */
+static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
+{
+	if (vi->mergeable_rx_bufs)
+		return try_fill_recv_mergeable(vi, gfp);
+	else if (vi->big_packets)
+		return try_fill_recv_big(vi, gfp);
+	else
+		return try_fill_recv_small(vi, gfp);
+}
+
 static void skb_recv_done(struct virtqueue *rvq)
 {
 	struct virtnet_info *vi = rvq->vdev->priv;
@@ -502,9 +607,7 @@ static void refill_work(struct work_struct *work)
 	bool still_empty;
 
 	vi = container_of(work, struct virtnet_info, refill.work);
-	napi_disable(&vi->napi);
 	still_empty = !try_fill_recv(vi, GFP_KERNEL);
-	virtnet_napi_enable(vi);
 
 	/* In theory, this can happen: if we don't get any buffers in
 	 * we will *never* try to fill again. */
@@ -519,12 +622,15 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 	unsigned int len, received = 0;
 
 again:
+	/* We must serialize access to the queue from any concurrent refills. */
+	spin_lock(&vi->rx_fill_lock);
 	while (received < budget &&
 	       (buf = virtqueue_get_buf(vi->rvq, &len)) != NULL) {
 		receive_buf(vi->dev, buf, len);
 		--vi->num;
 		received++;
 	}
+	spin_unlock(&vi->rx_fill_lock);
 
 	if (vi->num < vi->max / 2) {
 		if (!try_fill_recv(vi, GFP_ATOMIC))
@@ -675,7 +781,7 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
 
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
 		vdev->config->set(vdev, offsetof(struct virtio_net_config, mac),
-		                  dev->dev_addr, dev->addr_len);
+				  dev->dev_addr, dev->addr_len);
 
 	return 0;
 }
@@ -1053,6 +1159,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 		goto free;
 
 	INIT_DELAYED_WORK(&vi->refill, refill_work);
+	spin_lock_init(&vi->rx_fill_lock);
 	sg_init_table(vi->rx_sg, ARRAY_SIZE(vi->rx_sg));
 	sg_init_table(vi->tx_sg, ARRAY_SIZE(vi->tx_sg));
 
@@ -1089,8 +1196,11 @@ static int virtnet_probe(struct virtio_device *vdev)
 		goto free_vqs;
 	}
 
-	/* Last of all, set up some receive buffers. */
+	/* Last of all, set up some receive buffers.
+	 * Start off with a huge max, and then clamp it down to the real max */
+	vi->max = ~0U;
 	try_fill_recv(vi, GFP_KERNEL);
+	vi->max = vi->num;
 
 	/* If we didn't even get one input buffer, we're useless. */
 	if (vi->num == 0) {

^ permalink raw reply related

* [RFC PATCH v1 1/2] virtio_net: Pass gfp flags when allocating rx buffers.
From: Mike Waychison @ 2012-01-04 22:52 UTC (permalink / raw)
  To: Rusty Russell, Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20120104225223.18184.1537.stgit@mike2.sea.corp.google.com>

Currently, the refill path for RX buffers will always allocate the
buffers as GFP_ATOMIC, even if we are in process context.  This will
fail to apply memory pressure as the worker thread will not contribute
to the freeing of memory.

Fix this by changing add_recvbuf_small to use the gfp variant allocator,
__netdev_alloc_skb_ip_align().

Signed-off-by: Mike Waychison <mikew@google.com>
---
 drivers/net/virtio_net.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2055386..76fe14e 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -156,6 +156,7 @@ static void set_skb_frag(struct sk_buff *skb, struct page *page,
 	*len -= size;
 }
 
+/* Called from bottom half context */
 static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 				   struct page *page, unsigned int len)
 {
@@ -358,7 +359,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, gfp_t gfp)
 	struct skb_vnet_hdr *hdr;
 	int err;
 
-	skb = netdev_alloc_skb_ip_align(vi->dev, MAX_PACKET_LEN);
+	skb = __netdev_alloc_skb_ip_align(vi->dev, MAX_PACKET_LEN, gfp);
 	if (unlikely(!skb))
 		return -ENOMEM;

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox