Linux virtualization list
* [patch net-next] virtio_net: allow to change mac when iface is running
From: Jiri Pirko @ 2012-06-27 15:27 UTC (permalink / raw)
  To: netdev; +Cc: brouer, virtualization, davem, mst

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/virtio_net.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f18149a..36a16d5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -679,11 +679,12 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct virtio_device *vdev = vi->vdev;
-	int ret;
+	struct sockaddr *addr = p;
 
-	ret = eth_mac_addr(dev, p);
-	if (ret)
-		return ret;
+	if (!is_valid_ether_addr(addr->sa_data))
+		return -EADDRNOTAVAIL;
+	memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
+	dev->addr_assign_type &= ~NET_ADDR_RANDOM;
 
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
 		vdev->config->set(vdev, offsetof(struct virtio_net_config, mac),
-- 
1.7.10.4


* Re: [PATCH 1/4] mm: introduce compaction and migration for virtio ballooned pages
From: Konrad Rzeszutek Wilk @ 2012-06-27 15:30 UTC (permalink / raw)
  To: Rafael Aquini
  Cc: Rik van Riel, Michael S. Tsirkin, linux-kernel, virtualization,
	linux-mm, Konrad Rzeszutek Wilk
In-Reply-To: <20120627151716.GA3653@t510.redhat.com>

On Wed, Jun 27, 2012 at 12:17:17PM -0300, Rafael Aquini wrote:
> On Tue, Jun 26, 2012 at 07:57:55PM -0400, Konrad Rzeszutek Wilk wrote:
> > > +#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE)
.. snip..
> > > +struct address_space *balloon_mapping;
> > > +EXPORT_SYMBOL(balloon_mapping);
> > 
> > Why don't you call this kvm_balloon_mapping - and when other balloon
> > drivers use it, then change it to something more generic. Also at that
> > future point the other balloon drivers might do it a bit differently so
> > it might be that will be reworked completly.
> 
> Ok, I see your point. However I really think it's better to keep the naming as
> generic as possible today and, in the future, those who need to change it a bit can
> do it with no pain at all. I believe this way we potentially prevent unnecessary code
> duplication, as it will just be a matter of adjusting those preprocessor checking to
> include other balloon driver to the scheme, or get rid of all of them (in case all 
> balloon drivers assume the very same technique for their page mobility primitives).

Either way, if a driver is going to use this, they would need to adjust the
preprocessor checking (as you pointed out) to include: #ifdef CONFIG_HYPERVISORX_BALLOON
in this file. At which point they might as well rename the exported symbol to be more
generic - and do whatever else they need to do (add extra stuff maybe?).

> 
> As I can be utterly wrong on this, lets see if other folks raise the same
> concerns about this naming scheme I'm using here. If it ends up being a general
> concern that it would be better not being generic at this point, I'll happily
> switch my approach to whatever comes up to be the most feasible way of doing it.

My point here is that it's more a matter of namespace pollution. I've gotten flak for
doing this with drivers which had very generic-sounding names, and it made more sense
to rename them with a proper prefix. You are adding pieces of code for the
benefit of one driver.

But that (getting flak on the namespace) might be because the mailing list where I
had posted had more aggressive reviewers and this one is composed of more mellow folks
who are OK with this. Andrew is the final man - and I am not sure what he
prefers.


* Re: [PATCH] Add a page cache-backed balloon device driver.
From: Frank Swiderski @ 2012-06-27 15:48 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Andrea Arcangeli, riel, kvm, Michael S. Tsirkin, linux-kernel,
	virtualization, mikew
In-Reply-To: <87lij91myw.fsf@rustcorp.com.au>

On Tue, Jun 26, 2012 at 7:56 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> On Wed, 27 Jun 2012 00:41:06 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> On Tue, Jun 26, 2012 at 01:32:58PM -0700, Frank Swiderski wrote:
>> > This implementation of a virtio balloon driver uses the page cache to
>> > "store" pages that have been released to the host.  The communication
>> > (outside of target counts) is one way--the guest notifies the host when
>> > it adds a page to the page cache, allowing the host to madvise(2) with
>> > MADV_DONTNEED.  Reclaim in the guest is therefore automatic and implicit
>> > (via the regular page reclaim).  This means that inflating the balloon
>> > is similar to the existing balloon mechanism, but the deflate is
>> > different--it re-uses existing Linux kernel functionality to
>> > automatically reclaim.
>> >
>> > Signed-off-by: Frank Swiderski <fes@google.com>
>>
>> I'm pondering this:
>>
>> Should it really be a separate driver/device ID?
>> If it behaves the same from host POV, maybe it
>> should be up to the guest how to inflate/deflate
>> the balloon internally?
>
> Well, it shouldn't steal ID 10, either way :)  Either use a completely
> bogus number, or ask for an id.
>
> But AFAICT this should be a an alternate driver of for the same device:
> it's not really a separate device, is it?
>
> Cheers,
> Rusty.

Apologies, Rusty.  Asking for an ID is in the virtio spec, and I
completely neglected that step.  Though as you and others have pointed
out, this probably fits better as a different driver for the same
device.  Since it changes whether or not the deflate operation is
necessary, it also seems that this should take the form of different
behavior selected by a feature bit in the device.

If that sounds reasonable, then what I'll do with this patch is merge
it with the existing virtio balloon driver with a feature bit for
determining which behavior to use.

I also think the idea of a generic balloon that the different balloon
drivers use for the inflate/deflate operations is interesting and
useful, though I think the suggestion of deferring that until later is
correct.

Sounds reasonable?

Regards,
fes
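
The feature-bit idea above can be sketched in plain userspace C. This is
only an illustration under stated assumptions: the bit number and every
name below (BALLOON_F_PAGE_CACHE_HYPOTHETICAL, balloon_select_mode(),
balloon_needs_explicit_deflate()) are invented for the example and come
neither from the virtio spec nor from the posted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit number, NOT taken from the virtio spec. */
#define BALLOON_F_PAGE_CACHE_HYPOTHETICAL 31

enum balloon_mode {
	BALLOON_MODE_CLASSIC,		/* explicit inflate and deflate */
	BALLOON_MODE_PAGE_CACHE,	/* inflate only, guest reclaim replaces deflate */
};

/* Test a single bit in the negotiated feature word. */
static bool balloon_has_feature(uint32_t features, unsigned int bit)
{
	assert(bit < 32);
	return (features >> bit) & 1u;
}

/* One driver, two behaviors: pick the mode from the negotiated features. */
static enum balloon_mode balloon_select_mode(uint32_t negotiated_features)
{
	if (balloon_has_feature(negotiated_features,
				BALLOON_F_PAGE_CACHE_HYPOTHETICAL))
		return BALLOON_MODE_PAGE_CACHE;
	return BALLOON_MODE_CLASSIC;
}

/* The deflate path is only needed when the page cache mode is not in use. */
static bool balloon_needs_explicit_deflate(uint32_t negotiated_features)
{
	return balloon_select_mode(negotiated_features) == BALLOON_MODE_CLASSIC;
}
```

With a single device ID, a guest that does not know the bit simply never
negotiates it and keeps the existing inflate/deflate behavior.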


* Re: [PATCH] Add a page cache-backed balloon device driver.
From: Michael S. Tsirkin @ 2012-06-27 16:06 UTC (permalink / raw)
  To: Frank Swiderski
  Cc: Andrea Arcangeli, riel, kvm, linux-kernel, virtualization, mikew
In-Reply-To: <CAK+C7kWRsDcsB-W9m=Hn65xekvb-uOZC4oAMT0z48CC5q00oJw@mail.gmail.com>

On Wed, Jun 27, 2012 at 08:48:55AM -0700, Frank Swiderski wrote:
> On Tue, Jun 26, 2012 at 7:56 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> > On Wed, 27 Jun 2012 00:41:06 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> On Tue, Jun 26, 2012 at 01:32:58PM -0700, Frank Swiderski wrote:
> >> > This implementation of a virtio balloon driver uses the page cache to
> >> > "store" pages that have been released to the host.  The communication
> >> > (outside of target counts) is one way--the guest notifies the host when
> >> > it adds a page to the page cache, allowing the host to madvise(2) with
> >> > MADV_DONTNEED.  Reclaim in the guest is therefore automatic and implicit
> >> > (via the regular page reclaim).  This means that inflating the balloon
> >> > is similar to the existing balloon mechanism, but the deflate is
> >> > different--it re-uses existing Linux kernel functionality to
> >> > automatically reclaim.
> >> >
> >> > Signed-off-by: Frank Swiderski <fes@google.com>
> >>
> >> I'm pondering this:
> >>
> >> Should it really be a separate driver/device ID?
> >> If it behaves the same from host POV, maybe it
> >> should be up to the guest how to inflate/deflate
> >> the balloon internally?
> >
> > Well, it shouldn't steal ID 10, either way :)  Either use a completely
> > bogus number, or ask for an id.
> >
> > But AFAICT this should be a an alternate driver of for the same device:
> > it's not really a separate device, is it?
> >
> > Cheers,
> > Rusty.
> 
> Apologies, Rusty.  Asking for an ID is in the virtio spec, and I
> completely neglected that step.  Though as you and others have pointed
> out, this probably fits better as a different driver for the same
> device.  Since it changes whether or not the deflate operation is
> necessary, it also seems that how this should look is different
> behavior based on a feature bit in the device.
> 
> If that sounds reasonable, then what I'll do with this patch is merge
> it with the existing virtio balloon driver with a feature bit for
> determining which behavior to use.
> 
> I also think the idea of a generic balloon that the different balloon
> drivers use for the inflate/deflate operations is interesting and
> useful, though I think the suggestion of pending that until later is
> correct.
> 
> Sounds reasonable?
> 
> Regards,
> fes

I think a spec patch would be a good step at this point.
You can get the spec from Rusty, or a mirror
from my git:

git://git.kernel.org/pub/scm/virt/kvm/mst/virtio-spec.git


* Re: [PATCH] Add a page cache-backed balloon device driver.
From: Frank Swiderski @ 2012-06-27 16:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Andrea Arcangeli, riel, kvm, linux-kernel, virtualization, mikew
In-Reply-To: <20120627160644.GD21393@redhat.com>

On Wed, Jun 27, 2012 at 9:06 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Wed, Jun 27, 2012 at 08:48:55AM -0700, Frank Swiderski wrote:
>> On Tue, Jun 26, 2012 at 7:56 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:
>> > On Wed, 27 Jun 2012 00:41:06 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >> On Tue, Jun 26, 2012 at 01:32:58PM -0700, Frank Swiderski wrote:
>> >> > This implementation of a virtio balloon driver uses the page cache to
>> >> > "store" pages that have been released to the host.  The communication
>> >> > (outside of target counts) is one way--the guest notifies the host when
>> >> > it adds a page to the page cache, allowing the host to madvise(2) with
>> >> > MADV_DONTNEED.  Reclaim in the guest is therefore automatic and implicit
>> >> > (via the regular page reclaim).  This means that inflating the balloon
>> >> > is similar to the existing balloon mechanism, but the deflate is
>> >> > different--it re-uses existing Linux kernel functionality to
>> >> > automatically reclaim.
>> >> >
>> >> > Signed-off-by: Frank Swiderski <fes@google.com>
>> >>
>> >> I'm pondering this:
>> >>
>> >> Should it really be a separate driver/device ID?
>> >> If it behaves the same from host POV, maybe it
>> >> should be up to the guest how to inflate/deflate
>> >> the balloon internally?
>> >
>> > Well, it shouldn't steal ID 10, either way :)  Either use a completely
>> > bogus number, or ask for an id.
>> >
>> > But AFAICT this should be a an alternate driver of for the same device:
>> > it's not really a separate device, is it?
>> >
>> > Cheers,
>> > Rusty.
>>
>> Apologies, Rusty.  Asking for an ID is in the virtio spec, and I
>> completely neglected that step.  Though as you and others have pointed
>> out, this probably fits better as a different driver for the same
>> device.  Since it changes whether or not the deflate operation is
>> necessary, it also seems that how this should look is different
>> behavior based on a feature bit in the device.
>>
>> If that sounds reasonable, then what I'll do with this patch is merge
>> it with the existing virtio balloon driver with a feature bit for
>> determining which behavior to use.
>>
>> I also think the idea of a generic balloon that the different balloon
>> drivers use for the inflate/deflate operations is interesting and
>> useful, though I think the suggestion of pending that until later is
>> correct.
>>
>> Sounds reasonable?
>>
>> Regards,
>> fes
>
> I think a spec patch would be a good spec at this point.
> You can get the spec from Rusty, or a mirror
> from my git:
>
> git://git.kernel.org/pub/scm/virt/kvm/mst/virtio-spec.git
>
>
>


Got it, thanks, will do.

Regards,
fes


* Re: [patch net-next] virtio_net: allow to change mac when iface is running
From: David Miller @ 2012-06-28  4:30 UTC (permalink / raw)
  To: jpirko; +Cc: netdev, virtualization, brouer, mst
In-Reply-To: <1340810866-1017-1-git-send-email-jpirko@redhat.com>

From: Jiri Pirko <jpirko@redhat.com>
Date: Wed, 27 Jun 2012 17:27:46 +0200

> Signed-off-by: Jiri Pirko <jpirko@redhat.com>

Applied, but this seriously makes eth_mac_addr() completely useless.

Technically, every eth_mac_addr() user in a software/virtual device
should behave the way virtio_net does now.

It therefore probably makes sense to add a boolean arg which when true
elides the netif_running() check then fixup and audit every caller.


* Re: [patch net-next] virtio_net: allow to change mac when iface is running
From: Jiri Pirko @ 2012-06-28  6:35 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, virtualization, brouer, mst
In-Reply-To: <20120627.213046.1244710404799995026.davem@davemloft.net>

Thu, Jun 28, 2012 at 06:30:46AM CEST, davem@davemloft.net wrote:
>From: Jiri Pirko <jpirko@redhat.com>
>Date: Wed, 27 Jun 2012 17:27:46 +0200
>
>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>
>Applied, but this seriously makes eth_mac_addr() completely useless.
>
>Technically, every eth_mac_addr() user in a software/virtual device
>should behave the way virtio_net does now.

I guess so. But for some HW devices eth_mac_addr() is needed (when they
do not support "live" mac change)

>
>It therefore probably makes sense to add a boolean arg which when true
>elides the netif_running() check then fixup and audit every caller.

I was thinking about this. Maybe add __eth_mac_addr(), which does
not have the netif_running() check, and have eth_mac_addr() do the
netif_running() check and then call __eth_mac_addr().

What do you think?

Jirka
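
The split proposed in this message can be modeled in userspace roughly as
follows. It is a sketch, not kernel code: struct mock_netdev and the
mock_* functions are made-up stand-ins, and mock_eth_mac_addr_unchecked()
plays the role of the proposed __eth_mac_addr(). The validity check
follows the usual rule that a valid station address is neither all-zero
nor multicast.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

/* Userspace stand-in for the few net_device fields used here. */
struct mock_netdev {
	bool running;				/* models netif_running() */
	unsigned char dev_addr[ETH_ALEN];
};

/* Valid station address: multicast bit clear and not all-zero. */
static bool mock_is_valid_ether_addr(const unsigned char *a)
{
	static const unsigned char zero[ETH_ALEN];

	return !(a[0] & 0x01) && memcmp(a, zero, ETH_ALEN) != 0;
}

/* Inner helper: validate and copy, with no running check. */
static int mock_eth_mac_addr_unchecked(struct mock_netdev *dev,
				       const unsigned char *addr)
{
	assert(dev && addr);
	if (!mock_is_valid_ether_addr(addr))
		return -EADDRNOTAVAIL;
	memcpy(dev->dev_addr, addr, ETH_ALEN);
	return 0;
}

/* Outer wrapper: refuse while running, then defer to the helper. */
static int mock_eth_mac_addr(struct mock_netdev *dev,
			     const unsigned char *addr)
{
	assert(dev && addr);
	if (dev->running)
		return -EBUSY;
	return mock_eth_mac_addr_unchecked(dev, addr);
}
```

A software device would call the unchecked helper directly, while a
hardware driver that cannot change its address while up would keep using
the wrapper.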


* Re: [patch net-next] virtio_net: allow to change mac when iface is running
From: David Miller @ 2012-06-28  8:30 UTC (permalink / raw)
  To: jpirko; +Cc: netdev, virtualization, brouer, mst
In-Reply-To: <20120628063525.GA1520@minipsycho.orion>

From: Jiri Pirko <jpirko@redhat.com>
Date: Thu, 28 Jun 2012 08:35:25 +0200

> Thu, Jun 28, 2012 at 06:30:46AM CEST, davem@davemloft.net wrote:
>>It therefore probably makes sense to add a boolean arg which when true
>>elides the netif_running() check then fixup and audit every caller.
> 
> I was thinking about this. Maybe probably __eth_mac_addr() which does
> not have netif_running() check and eth_mac_addr() calling
> netif_running() check and __eth_mac_addr() after that.
> 
> What do you think?

Yes, sounds good.


* [patch net-next 0/4] net: introduce and use IFF_LIFE_ADDR_CHANGE
From: Jiri Pirko @ 2012-06-28 14:10 UTC (permalink / raw)
  To: netdev; +Cc: mst, shimoda.hiroaki, virtualization, danny.kukawka, edumazet,
	davem

Three drivers are updated, but this can be used in many others.

Jiri Pirko (4):
  net: introduce new priv_flag indicating iface capable of change mac
    when running
  virtio_net: use IFF_LIFE_ADDR_CHANGE priv_flag
  team: use IFF_LIFE_ADDR_CHANGE priv_flag
  dummy: use IFF_LIFE_ADDR_CHANGE priv_flag

 drivers/net/dummy.c      |   15 ++-------------
 drivers/net/team/team.c  |    9 +++++----
 drivers/net/virtio_net.c |   11 +++++------
 include/linux/if.h       |    2 ++
 net/ethernet/eth.c       |    2 +-
 5 files changed, 15 insertions(+), 24 deletions(-)

-- 
1.7.10.4


* [patch net-next 1/4] net: introduce new priv_flag indicating iface capable of change mac when running
From: Jiri Pirko @ 2012-06-28 14:10 UTC (permalink / raw)
  To: netdev; +Cc: mst, shimoda.hiroaki, virtualization, danny.kukawka, edumazet,
	davem
In-Reply-To: <1340892639-1111-1-git-send-email-jpirko@redhat.com>

Introduce IFF_LIFE_ADDR_CHANGE priv_flag and use it to disable
netif_running() check in eth_mac_addr()

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 include/linux/if.h |    2 ++
 net/ethernet/eth.c |    2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/if.h b/include/linux/if.h
index f995c66..fd9ee7c 100644
--- a/include/linux/if.h
+++ b/include/linux/if.h
@@ -81,6 +81,8 @@
 #define IFF_UNICAST_FLT	0x20000		/* Supports unicast filtering	*/
 #define IFF_TEAM_PORT	0x40000		/* device used as team port */
 #define IFF_SUPP_NOFCS	0x80000		/* device supports sending custom FCS */
+#define IFF_LIFE_ADDR_CHANGE 0x100000	/* device supports hardware address
+					 * change when it's running */
 
 
 #define IF_GET_IFACE	0x0001		/* for querying only */
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 36e5880..8f8ded4 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -283,7 +283,7 @@ int eth_mac_addr(struct net_device *dev, void *p)
 {
 	struct sockaddr *addr = p;
 
-	if (netif_running(dev))
+	if (!(dev->priv_flags & IFF_LIFE_ADDR_CHANGE) && netif_running(dev))
 		return -EBUSY;
 	if (!is_valid_ether_addr(addr->sa_data))
 		return -EADDRNOTAVAIL;
-- 
1.7.10.4
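
The effect of the changed condition in eth_mac_addr() can be tried out in
userspace with a small model. This is a sketch under assumptions, not the
kernel implementation: struct fake_netdev and the fake_* functions are
invented stand-ins, and only the flag value and the order of the checks
are taken from the patch above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6
/* Same value as in the patch; redefined so this builds outside the kernel. */
#define IFF_LIFE_ADDR_CHANGE 0x100000

/* Userspace stand-in for the few net_device fields used here. */
struct fake_netdev {
	unsigned int priv_flags;
	bool running;				/* models netif_running() */
	unsigned char dev_addr[ETH_ALEN];
};

/* Valid station address: multicast bit clear and not all-zero. */
static bool fake_is_valid_ether_addr(const unsigned char *a)
{
	static const unsigned char zero[ETH_ALEN];

	return !(a[0] & 0x01) && memcmp(a, zero, ETH_ALEN) != 0;
}

static int fake_eth_mac_addr(struct fake_netdev *dev,
			     const unsigned char *addr)
{
	assert(dev && addr);
	/* A running device is refused only when it lacks the flag. */
	if (!(dev->priv_flags & IFF_LIFE_ADDR_CHANGE) && dev->running)
		return -EBUSY;
	if (!fake_is_valid_ether_addr(addr))
		return -EADDRNOTAVAIL;
	memcpy(dev->dev_addr, addr, ETH_ALEN);
	return 0;
}
```

A running device without the flag still gets -EBUSY, so drivers that do
not opt in keep their current behavior.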


* [patch net-next 2/4] virtio_net: use IFF_LIFE_ADDR_CHANGE priv_flag
From: Jiri Pirko @ 2012-06-28 14:10 UTC (permalink / raw)
  To: netdev; +Cc: mst, shimoda.hiroaki, virtualization, danny.kukawka, edumazet,
	davem
In-Reply-To: <1340892639-1111-1-git-send-email-jpirko@redhat.com>

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/virtio_net.c |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 36a16d5..6a0f526 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -679,12 +679,11 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct virtio_device *vdev = vi->vdev;
-	struct sockaddr *addr = p;
+	int ret;
 
-	if (!is_valid_ether_addr(addr->sa_data))
-		return -EADDRNOTAVAIL;
-	memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
-	dev->addr_assign_type &= ~NET_ADDR_RANDOM;
+	ret = eth_mac_addr(dev, p);
+	if (ret)
+		return ret;
 
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
 		vdev->config->set(vdev, offsetof(struct virtio_net_config, mac),
@@ -1063,7 +1062,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 		return -ENOMEM;
 
 	/* Set up network device as normal. */
-	dev->priv_flags |= IFF_UNICAST_FLT;
+	dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIFE_ADDR_CHANGE;
 	dev->netdev_ops = &virtnet_netdev;
 	dev->features = NETIF_F_HIGHDMA;
 
-- 
1.7.10.4


* [patch net-next 3/4] team: use IFF_LIFE_ADDR_CHANGE priv_flag
From: Jiri Pirko @ 2012-06-28 14:10 UTC (permalink / raw)
  To: netdev; +Cc: mst, shimoda.hiroaki, virtualization, danny.kukawka, edumazet,
	davem
In-Reply-To: <1340892639-1111-1-git-send-email-jpirko@redhat.com>

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/team/team.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 5350eea..019d658 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1188,10 +1188,11 @@ static int team_set_mac_address(struct net_device *dev, void *p)
 {
 	struct team *team = netdev_priv(dev);
 	struct team_port *port;
-	struct sockaddr *addr = p;
+	int err;
 
-	dev->addr_assign_type &= ~NET_ADDR_RANDOM;
-	memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
+	err = eth_mac_addr(dev, p);
+	if (err)
+		return err;
 	rcu_read_lock();
 	list_for_each_entry_rcu(port, &team->port_list, list)
 		if (team->ops.port_change_mac)
@@ -1393,7 +1394,7 @@ static void team_setup(struct net_device *dev)
 	 * bring us to promisc mode in case a unicast addr is added.
 	 * Let this up to underlay drivers.
 	 */
-	dev->priv_flags |= IFF_UNICAST_FLT;
+	dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIFE_ADDR_CHANGE;
 
 	dev->features |= NETIF_F_LLTX;
 	dev->features |= NETIF_F_GRO;
-- 
1.7.10.4


* [patch net-next 4/4] dummy: use IFF_LIFE_ADDR_CHANGE priv_flag
From: Jiri Pirko @ 2012-06-28 14:10 UTC (permalink / raw)
  To: netdev; +Cc: mst, shimoda.hiroaki, virtualization, danny.kukawka, edumazet,
	davem
In-Reply-To: <1340892639-1111-1-git-send-email-jpirko@redhat.com>

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
---
 drivers/net/dummy.c |   15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index bab0158..0352246 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -40,18 +40,6 @@
 
 static int numdummies = 1;
 
-static int dummy_set_address(struct net_device *dev, void *p)
-{
-	struct sockaddr *sa = p;
-
-	if (!is_valid_ether_addr(sa->sa_data))
-		return -EADDRNOTAVAIL;
-
-	dev->addr_assign_type &= ~NET_ADDR_RANDOM;
-	memcpy(dev->dev_addr, sa->sa_data, ETH_ALEN);
-	return 0;
-}
-
 /* fake multicast ability */
 static void set_multicast_list(struct net_device *dev)
 {
@@ -118,7 +106,7 @@ static const struct net_device_ops dummy_netdev_ops = {
 	.ndo_start_xmit		= dummy_xmit,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_set_rx_mode	= set_multicast_list,
-	.ndo_set_mac_address	= dummy_set_address,
+	.ndo_set_mac_address	= eth_mac_addr,
 	.ndo_get_stats64	= dummy_get_stats64,
 };
 
@@ -134,6 +122,7 @@ static void dummy_setup(struct net_device *dev)
 	dev->tx_queue_len = 0;
 	dev->flags |= IFF_NOARP;
 	dev->flags &= ~IFF_MULTICAST;
+	dev->priv_flags |= IFF_LIFE_ADDR_CHANGE;
 	dev->features	|= NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_TSO;
 	dev->features	|= NETIF_F_HW_CSUM | NETIF_F_HIGHDMA | NETIF_F_LLTX;
 	eth_hw_addr_random(dev);
-- 
1.7.10.4


* Re: [PATCH 00/13] drivers: hv: kvp
From: Olaf Hering @ 2012-06-28 14:23 UTC (permalink / raw)
  To: KY Srinivasan; +Cc: Greg KH, apw, devel, virtualization, linux-kernel
In-Reply-To: <426367E2313C2449837CD2DE46E7EAF9155ED68D@SN2PRD0310MB382.namprd03.prod.outlook.com>

On Tue, Jun 26, KY Srinivasan wrote:

> > From: Greg KH [mailto:gregkh@linuxfoundation.org]
> > The fact that it was Red Hat specific was the main part, this should be
> > done in a standard way, with standard tools, right?
> 
> The reason I asked this question was to make sure I address these
> issues in addition to whatever I am debugging now. I use the standard
> tools and calls to retrieve all the IP configuration. As I look at
> each distribution the files they keep persistent IP configuration
> Information is different and that is the reason I chose to start with
> RedHat. If there is a standard way to store the configuration, I will
> do that.


KY,

instead of using system() in kvp_get_ipconfig_info and kvp_set_ip_info,
wouldn't it be easier to call an external helper script which does all
the distribution specific work? Just define some API to pass values to
the script, and something to read values collected by the script back
into the daemon.

If the work is done in a script it will be much easier for an admin to
debug and adjust it.

I think there is no standard way to configure all relevant distros in
the same way. Maybe one day NetworkManager can finally handle all
possible ways to configure network related things. But until that
happens the config files need to be adjusted manually.



Some of the functions have deep indentation levels due to 'while() {
switch() }' usage. Perhaps such code could be moved into its own
function so that lines don't need to be wrapped that much due to the odd
80 column limit.

Olaf
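
One possible shape for the helper-script interface suggested above is
sketched here. Everything in it is an assumption made for illustration:
the function name helper_get_value() and the "key=value" line format are
not part of the KVP daemon, and the command string stands in for whatever
distribution-specific script would be installed.

```c
#define _POSIX_C_SOURCE 200809L	/* for popen()/pclose() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Run `cmd` through the shell, scan its output for the first line of the
 * form "key=value" and copy the value into `out`.
 * Returns 0 on success, -1 if the key was not found or the command failed.
 */
static int helper_get_value(const char *cmd, const char *key,
			    char *out, size_t outlen)
{
	char line[256];
	size_t klen;
	int ret = -1;
	FILE *fp;

	assert(cmd && key && out && outlen > 0);
	klen = strlen(key);

	fp = popen(cmd, "r");
	if (!fp)
		return -1;

	/* Keep reading after a match so the helper can finish writing. */
	while (fgets(line, sizeof(line), fp)) {
		line[strcspn(line, "\r\n")] = '\0';
		if (ret != 0 && strncmp(line, key, klen) == 0 &&
		    line[klen] == '=') {
			snprintf(out, outlen, "%s", line + klen + 1);
			ret = 0;
		}
	}

	/* A non-zero exit status from the helper counts as failure. */
	if (pclose(fp) != 0)
		ret = -1;
	return ret;
}
```

Keeping the distribution-specific logic in the script means an admin can
run it by hand and inspect its output, without rebuilding the daemon.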


* Re: [patch net-next 0/4] net: introduce and use IFF_LIFE_ADDR_CHANGE
From: Richard Cochran @ 2012-06-28 15:15 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: mst, netdev, shimoda.hiroaki, virtualization, danny.kukawka,
	edumazet, davem
In-Reply-To: <1340892639-1111-1-git-send-email-jpirko@redhat.com>

On Thu, Jun 28, 2012 at 04:10:35PM +0200, Jiri Pirko wrote:
> three drivers updated, but this can be used in many others.
> 
> Jiri Pirko (4):
>   net: introduce new priv_flag indicating iface capable of change mac
>     when running
>   virtio_net: use IFF_LIFE_ADDR_CHANGE priv_flag
>   team: use IFF_LIFE_ADDR_CHANGE priv_flag
>   dummy: use IFF_LIFE_ADDR_CHANGE priv_flag

I think you must mean LIVE and not LIFE...

Thanks,
Richard


> 
>  drivers/net/dummy.c      |   15 ++-------------
>  drivers/net/team/team.c  |    9 +++++----
>  drivers/net/virtio_net.c |   11 +++++------
>  include/linux/if.h       |    2 ++
>  net/ethernet/eth.c       |    2 +-
>  5 files changed, 15 insertions(+), 24 deletions(-)
> 
> -- 
> 1.7.10.4
> 


* Re: [patch net-next 1/4] net: introduce new priv_flag indicating iface capable of change mac when running
From: Eric Dumazet @ 2012-06-28 15:24 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: mst, netdev, shimoda.hiroaki, virtualization, danny.kukawka,
	edumazet, davem
In-Reply-To: <1340892639-1111-2-git-send-email-jpirko@redhat.com>

On Thu, 2012-06-28 at 16:10 +0200, Jiri Pirko wrote:
> Introduce IFF_LIFE_ADDR_CHANGE priv_flag and use it to disable
> netif_running() check in eth_mac_addr()
> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
> ---
>  include/linux/if.h |    2 ++
>  net/ethernet/eth.c |    2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/if.h b/include/linux/if.h
> index f995c66..fd9ee7c 100644
> --- a/include/linux/if.h
> +++ b/include/linux/if.h
> @@ -81,6 +81,8 @@
>  #define IFF_UNICAST_FLT	0x20000		/* Supports unicast filtering	*/
>  #define IFF_TEAM_PORT	0x40000		/* device used as team port */
>  #define IFF_SUPP_NOFCS	0x80000		/* device supports sending custom FCS */
> +#define IFF_LIFE_ADDR_CHANGE 0x100000	/* device supports hardware address
> +					 * change when it's running */
>  
> 
>  #define IF_GET_IFACE	0x0001		/* for querying only */
> diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
> index 36e5880..8f8ded4 100644
> --- a/net/ethernet/eth.c
> +++ b/net/ethernet/eth.c
> @@ -283,7 +283,7 @@ int eth_mac_addr(struct net_device *dev, void *p)
>  {
>  	struct sockaddr *addr = p;
>  
> -	if (netif_running(dev))
> +	if (!(dev->priv_flags & IFF_LIFE_ADDR_CHANGE) && netif_running(dev))
>  		return -EBUSY;
>  	if (!is_valid_ether_addr(addr->sa_data))
>  		return -EADDRNOTAVAIL;

Since the memcpy() is not atomic, there is a small window where a reader
could get a half-changed mac address. I guess it's a detail.
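
The window described above can be made visible with a single-threaded
model. This is a sketch only: struct mac_slot and the mac_* helpers are
invented for the example, the interrupted memcpy() is simulated by
copying a prefix, and the sequence counter merely illustrates one way a
reader could detect an overlapping update. It has no memory barriers, so
it is not safe for real concurrent use and is not how the patch or
eth_mac_addr() behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

struct mac_slot {
	unsigned int seq;		/* odd while an update is in flight */
	unsigned char addr[ETH_ALEN];
};

static void mac_write_begin(struct mac_slot *s)
{
	assert(!(s->seq & 1u));
	s->seq++;
}

static void mac_write_end(struct mac_slot *s)
{
	assert(s->seq & 1u);
	s->seq++;
}

/* Model a memcpy() that has only copied the first n bytes so far. */
static void mac_write_partial(struct mac_slot *s,
			      const unsigned char *new_addr, size_t n)
{
	assert(n <= ETH_ALEN);
	memcpy(s->addr, new_addr, n);
}

/*
 * Returns true only if no update overlapped the read; the contents of
 * `out` are meaningful only in that case.
 */
static bool mac_read_try(const struct mac_slot *s, unsigned char *out)
{
	unsigned int start = s->seq;

	if (start & 1u)
		return false;
	memcpy(out, s->addr, ETH_ALEN);
	return s->seq == start;
}
```

After mac_write_partial() with n = 3 the stored bytes match neither the
old nor the new address, which is the half-changed state a plain reader
could observe.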


* Re: [patch net-next 0/4] net: introduce and use IFF_LIFE_ADDR_CHANGE
From: Jiri Pirko @ 2012-06-28 15:41 UTC (permalink / raw)
  To: Richard Cochran
  Cc: mst, netdev, shimoda.hiroaki, virtualization, danny.kukawka,
	edumazet, davem
In-Reply-To: <20120628151507.GA5920@localhost.localdomain>

Thu, Jun 28, 2012 at 05:15:07PM CEST, richardcochran@gmail.com wrote:
>On Thu, Jun 28, 2012 at 04:10:35PM +0200, Jiri Pirko wrote:
>> three drivers updated, but this can be used in many others.
>> 
>> Jiri Pirko (4):
>>   net: introduce new priv_flag indicating iface capable of change mac
>>     when running
>>   virtio_net: use IFF_LIFE_ADDR_CHANGE priv_flag
>>   team: use IFF_LIFE_ADDR_CHANGE priv_flag
>>   dummy: use IFF_LIFE_ADDR_CHANGE priv_flag
>
>I think you must mean LIVE and not LIFE...

Good point. I will change it and repost, but I will give it some time so
people can express themselves.

Thanks!

Jirka
>
>Thanks,
>Richard
>
>
>> 
>>  drivers/net/dummy.c      |   15 ++-------------
>>  drivers/net/team/team.c  |    9 +++++----
>>  drivers/net/virtio_net.c |   11 +++++------
>>  include/linux/if.h       |    2 ++
>>  net/ethernet/eth.c       |    2 +-
>>  5 files changed, 15 insertions(+), 24 deletions(-)
>> 
>> -- 
>> 1.7.10.4
>> 


* Re: [PATCH RFC V6 1/5] kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks
From: Raghavendra K T @ 2012-06-28 18:17 UTC (permalink / raw)
  To: Gleb Natapov, Avi Kivity
  Cc: Jeremy Fitzhardinge, X86, KVM, Konrad Rzeszutek Wilk, linux-doc,
	LKML, Greg Kroah-Hartman, Virtualization, Ingo Molnar,
	Srivatsa Vaddagiri, Sasha Levin, H. Peter Anvin, Xen,
	Stefano Stabellini
In-Reply-To: <20120427155318.GI6833@redhat.com>

On 04/27/2012 09:23 PM, Gleb Natapov wrote:
> On Fri, Apr 27, 2012 at 04:15:35PM +0530, Raghavendra K T wrote:
>> On 04/24/2012 03:29 PM, Gleb Natapov wrote:
>>> On Mon, Apr 23, 2012 at 03:29:47PM +0530, Raghavendra K T wrote:
>>>> From: Srivatsa Vaddagiri<vatsa@linux.vnet.ibm.com>
>>>>
>>>> KVM_HC_KICK_CPU allows the calling vcpu to kick another vcpu out of halt state.
>>>>
>>>> The presence of these hypercalls is indicated to guest via
>>>> KVM_FEATURE_PV_UNHALT/KVM_CAP_PV_UNHALT.
>>>>
>>>> Signed-off-by: Srivatsa Vaddagiri<vatsa@linux.vnet.ibm.com>
>>>> Signed-off-by: Suzuki Poulose<suzuki@in.ibm.com>
>>>> Signed-off-by: Raghavendra K T<raghavendra.kt@linux.vnet.ibm.com>
>>>> ---
>> [...]
>>>> +/*
>>>> + * kvm_pv_kick_cpu_op:  Kick a vcpu.
>>>> + *
>>>> + * @apicid - apicid of vcpu to be kicked.
>>>> + */
>>>> +static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
>>>> +{
>>>> +	struct kvm_vcpu *vcpu = NULL;
>>>> +	int i;
>>>> +
>>>> +	kvm_for_each_vcpu(i, vcpu, kvm) {
>>>> +		if (!kvm_apic_present(vcpu))
>>>> +			continue;
>>>> +
>>>> +		if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0))
>>>> +			break;
>>>> +	}
>>>> +	if (vcpu) {
>>>> +		/*
>>>> +		 * Setting unhalt flag here can result in spurious runnable
>>>> +		 * state when unhalt reset does not happen in vcpu_block.
>>>> +		 * But that is harmless since that should soon result in halt.
>>>> +		 */
>>>> +		vcpu->arch.pv.pv_unhalted = 1;
>>>> +		/* We need everybody see unhalt before vcpu unblocks */
>>>> +		smp_mb();
>>>> +		kvm_vcpu_kick(vcpu);
>>>> +	}
>>>> +}
>>> This is too similar to kvm_irq_delivery_to_apic(). Why not reuse it. We
>>> can use one of reserved delivery modes as PV delivery mode. We will
>>> disallow guest to trigger it through apic interface, so this will not be
>>> part of ABI and can be changed at will.
[...]
>> kvm/x86.c
>> =========
>> kvm_pv_kick_cpu_op()
>> {
>>
>>   struct kvm_lapic_irq lapic_irq;
>>
>>   lapic_irq.shorthand = 0;
>>   lapic_irq.dest_mode = 0;
>>   lapic_irq.dest_id = apicid;
>>
>>   lapic_irq.delivery_mode = PV_DELIVERY_MODE;
>>   kvm_irq_delivery_to_apic(kvm, 0,&lapic_irq );
>>
>> }
>>
>> kvm/lapic.c
>> ==========
>> _apic_accept_irq()
>> {
>> ...
>> case APIC_DM_REMRD:
>>                  result = 1;
>>                  vcpu->pv_unhalted = 1
>>                  smp_mb();
>>                  kvm_make_request(KVM_REQ_EVENT, vcpu);
>>                  kvm_vcpu_kick(vcpu);
>>                  break;
>>
>> ...
>> }
>>
>> here using PV_DELIVERY_MODE = APIC_DM_REMRD, which was unused.
>>
> Yes, this is what I mean except that PV_DELIVERY_MODE should be
> number defined as reserved by Intel spec.
>

Hi Gleb, Avi,

This has been a TODO in my V8 patches.
I'll fold this into V9 (while rebasing to
3.5-rc).
Please let me know if it is OK.


* eScience Conference submission extension - 18 July
From: Ioan Raicu @ 2012-06-28 18:52 UTC (permalink / raw)
  To: virtualization



Due to numerous requests, we have extended the paper deadline and are 
dropping the abstract deadline completely:


CALL FOR PAPERS

8th IEEE International Conference on eScience
http://www.ci.uchicago.edu/escience2012/
October 8-12, 2012
Chicago, IL, USA

Researchers in all disciplines are increasingly adopting digital tools, 
techniques and practices, often in communities and projects that span 
disciplines, laboratories, organizations, and national boundaries. The 
eScience 2012 conference is designed to bring together leading 
international and interdisciplinary research communities, developers, 
and users of eScience applications and enabling IT technologies. The 
conference serves as a forum to present the results of the latest 
applications research and product/tool developments and to highlight 
related activities from around the world. Also, we are now entering the 
second decade of eScience and the 2012 conference gives an opportunity 
to take stock of what has been achieved so far and look forward to the 
challenges and opportunities the next decade will bring.

A special emphasis of the 2012 conference is on advances in the 
application of technology in a particular discipline. Accordingly, 
significant advances in applications science and technology will be 
considered as important as the development of new technologies 
themselves. Further, we welcome contributions in educational activities 
under any of these disciplines.

As a result, the conference will be structured around two e-Science tracks:

. eScience Algorithms and Applications
  . eScience application areas, including:
    . Physical sciences
    . Biomedical sciences
    . Social sciences and humanities
  . Data-oriented approaches and applications
  . Compute-oriented approaches and applications
  . Extreme scale approaches and applications
. Cyberinfrastructure to support eScience
  . Novel hardware
  . Novel uses of production infrastructure
  . Software and services
  . Tools

The conference proceedings will be published by the IEEE Computer 
Society Press, USA and will be made available online through the IEEE 
Digital Library. Selected papers will be invited to submit extended 
versions to a special issue of the Future Generation Computer Systems 
(FGCS) journal.

SUBMISSION PROCESS
Authors are invited to submit papers with unpublished, original work of 
not more than 8 pages of double column text using single spaced 10 point 
size on 8.5 x 11 inch pages, as per IEEE 8.5 x 11 manuscript guidelines. 
(Up to 2 additional pages may be purchased for US$150/page)

Templates are available from 
http://www.ieee.org/conferences_events/conferences/publishing/templates.html.

Authors should submit a PDF file that will print on a PostScript printer 
to https://www.easychair.org/conferences/?conf=escience2012

(Note that paper submitters also must submit an abstract in advance of 
the paper deadline. This should be done through the same site where 
papers are submitted.)

It is a requirement that at least one author of each accepted paper 
attend the conference.

IMPORTANT DATES

Abstract submission no longer required
Paper submission: extended to 18 July 2012 (firm)
Paper author notification: 22 August 2012
Camera-ready papers due: 10 September 2012
Conference: 8-12 October 2012




In addition to the eScience conference itself, there are six associated 
workshops and one tutorial 
(http://www.ci.uchicago.edu/escience2012/workshops.php)

  * Extending High-Performance Computing Beyond its Traditional User
    Communities, http://www.psc.edu/events/escience-2012-workshop/
  * 2nd International Workshop on Analyzing and Improving Collaborative
    eScience with Social Networks (eSoN 12),
    http://www.ci.uchicago.edu/eson2012/
  * Advances in eHealth,
    http://www.scalalife.eu/content/advances-ehealth-2012-workshop
  * Maintainable Software Practices in e-Science,
    http://software.ac.uk/maintainable-software-practice-workshop
  * eScience Meets the Instrument,
    https://confluence-vre.its.monash.edu.au/display/escience2012/eScience+Meets+the+Instrument
  * Collaborative research using eScience infrastructure and high speed
    networks,
    http://www.surfnet.nl/en/Hybride_netwerk/SURFlichtpaden/Pages/CollaborativeresearchusingeScienceinfrastructureandhighspeednetworks.aspx


  * Tutorial: Big Data Processing: Lessons from Industry and
    Applications in Science,
    http://www.ci.uchicago.edu/escience2012/tutorial.php






CONFERENCE ORGANIZATION

General Chair
. Ian Foster, University of Chicago & Argonne National Laboratory, USA
Program Co-Chairs
. Daniel S. Katz, University of Chicago & Argonne National Laboratory, USA
. Heinz Stockinger, SIB Swiss Institute of Bioinformatics, Switzerland
Program Vice Co-Chairs
. eScience Algorithms and Applications Track
. David Abramson, Monash University, Australia
. Gabrielle Allen, Louisiana State University, USA
. Cyberinfrastructure to support eScience Track
. Rosa M. Badia, Barcelona Supercomputing Center / CSIC, Spain
. Geoffrey Fox, Indiana University, USA
Early Results and Works-in-Progress Posters Chair
. Roger Barga, Microsoft, USA
Workshops Chair
. Ruth Pordes, FNAL, USA
Sponsorship Chair
. Charlie Catlett, Argonne National Laboratory, USA
Conference Manager and Finance Chair
. Julie Wulf-Knoerzer, University of Chicago & Argonne National 
Laboratory, USA
Publicity Chairs
. Kento Aida, National Institute of Informatics, Japan
. Ioan Raicu, Illinois Institute of Technology, USA
. David Wallom, Oxford e-Research Centre, UK
Local Organizing Committee
. Ninfa Mayorga, University of Chicago, USA
. Evelyn Rayburn, University of Chicago, USA
. Lynn Valentini, Argonne National Laboratory, USA
Program Committee
. eScience Algorithms and Applications Track
. Srinivas Aluru, Iowa State University, USA
. Ashiq Anjum, University of Derby, UK
. David A. Bader, Georgia Institute of Technology, USA
. Jon Blower, University of Reading, UK
. Paul Bonnington, Monash University, Australia
. Simon Cox, University of Southampton, UK
. David De Roure, Oxford e-Research Centre, UK
. George Djorgovski, California Institute of Technology, USA
. Anshu Dubey, University of Chicago & Argonne National Laboratory, USA
. Yuri Estrin, Monash University, Australia
. Dan Fay, Microsoft, USA
. Jeremy Frey, University of Southampton, UK
. Wolfgang Gentzsch, HPC Consultant, Germany
. Lutz Gross, The University of Queensland, Australia
. Sverker Holmgren, Uppsala University, Sweden
. Bill Howe, University of Washington, USA
. Marina Jirotka, University of Oxford, UK
. Timoleon Kipouros, University of Cambridge, UK
. Kerstin Kleese van Dam, Pacific Northwest National Laboratory, USA
. Arun S. Konagurthu, Monash University, Australia
. Peter Kunszt, SystemsX.ch, Switzerland
. Alexey Lastovetsky, University College Dublin, Ireland
. Andrew Lewis, Griffith University, Australia
. Sergio Maffioletti, University of Zurich, Switzerland
. Amitava Majumdar, San Diego Supercomputer Center, University of 
California at San Diego, USA
. Rui Mao, Shenzhen University, China
. Madhav V. Marathe, Virginia Tech, USA
. Maryann Martone, University of California at San Diego, USA
. Louis Moresi, Monash University, Australia
. Riccardo Murri, University of Zurich, Switzerland
. Silvia D. Olabarriaga, Academic Medical Center of the University of 
Amsterdam, Netherlands
. Enrique S. Quintana-Ortí, Universidad Jaume I, Spain
. Abani Patra, University at Buffalo, USA
. Rob Pennington, NSF, USA
. Andrew Perry, Monash University, Australia
. Beth Plale, Indiana University, USA
. Michael Resch, University of Stuttgart, Germany
. Adrian Sandu, Virginia Tech, USA
. Mark Savill, Cranfield University, UK
. Erik Schnetter, Perimeter Institute for Theoretical Physics, Canada
. Edward Seidel, Louisiana State University, USA
. Suzanne M. Shontz, The Pennsylvania State University, USA
. David Skinner, Lawrence Berkeley National Laboratory, USA
. Alan Sussman, University of Maryland, USA
. Alex Szalay, Johns Hopkins University, USA
. Domenico Talia, ICAR-CNR & University of Calabria, Italy
. Jian Tao, Louisiana State University, USA
. David Wallom, Oxford e-Research Centre, UK
. Shaowen Wang, University of Illinois at Urbana-Champaign, USA
. Michael Wilde, Argonne National Laboratory & University of Chicago, USA
. Nancy Wilkins-Diehr, San Diego Supercomputer Center, University of 
California at San Diego, USA
. Wu Zhang, Shanghai University, China
. Yunquan Zhang, Chinese Academy of Sciences, China
. Cyberinfrastructure to support eScience Track
. Deb Agarwal, Lawrence Berkeley National Laboratory, USA
. Ilkay Altintas, San Diego Supercomputer Center, University of 
California at San Diego, USA
. Henri Bal, Vrije Universiteit, Netherlands
. Roger Barga, Microsoft, USA
. Martin Berzins, University of Utah, USA
. John Brooke, University of Manchester, UK
. Thomas Fahringer, University of Innsbruck, Austria
. Gilles Fedak, INRIA, France
. José A. B. Fortes, University of Florida, USA
. Yolanda Gil, ISI/USC, USA
. Madhusudhan Govindaraju, SUNY Binghamton, USA
. Thomas Hacker, Purdue University, USA
. Ken Hawick, Massey University, New Zealand
. Marty Humphrey, University of Virginia, USA
. Hai Jin, Huazhong University of Science and Technology, China
. Thilo Kielmann, Vrije Universiteit, Netherlands
. Scott Klasky, Oak Ridge National Laboratory, USA
. Isao Kojima, AIST, Japan
. Tevfik Kosar, University at Buffalo, USA
. Dieter Kranzlmueller, LMU & LRZ Munich, Germany
. Erwin Laure, KTH, Sweden
. Jysoo Lee, KISTI, Korea
. Li Xiaoming, Peking University, China
. Bertram Ludäscher, University of California, Davis, USA
. Andrew Lumsdaine, Indiana University, USA
. Tanu Malik, University of Chicago, USA
. Satoshi Matsuoka, Tokyo Institute of Technology, Japan
. Reagan Moore, University of North Carolina at Chapel Hill, USA
. Shirley Moore, University of Kentucky, USA
. Steven Newhouse, EGI, Netherlands
. Dhabaleswar K. (DK) Panda, The Ohio State University, USA
. Manish Parashar, Rutgers University, USA
. Ron Perrott, University of Oxford, UK
. Depei Qian, Beihang University, China
. Judy Qiu, Indiana University, USA
. Ioan Raicu, Illinois Institute of Technology, USA
. Lavanya Ramakrishnan, Lawrence Berkeley National Laboratory, USA
. Omer Rana, Cardiff University, UK
. Paul Roe, Queensland University of Technology, Australia
. Bruno Schulze, LNCC, Brazil
. Marc Snir, Argonne National Laboratory & University of Illinois at 
Urbana-Champaign, USA
. Xian-He Sun, Illinois Institute of Technology, USA
. Yoshio Tanaka, AIST, Japan
. Michela Taufer, University of Delaware, USA
. Kerry Taylor, CSIRO, Australia
. Douglas Thain, University of Notre Dame, USA
. Paul Watson, Newcastle University, UK
. Jian Zhang, Northern Illinois University, USA
. Jun Zhao, University of Oxford, UK

Sponsors:
. University of Chicago
. Argonne National Laboratory
. IEEE
. CSIRO
. Indiana University
. additional sponsorship opportunities are available

-- 
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Cel:    1-847-722-0876
Office: 1-312-567-5704
Email:  iraicu@cs.iit.edu
Web:    http://www.cs.iit.edu/~iraicu/
Web:    http://datasys.cs.iit.edu/
=================================================================
=================================================================


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply

* Re: [patch net-next 2/4] virtio_net: use IFF_LIFE_ADDR_CHANGE priv_flag
From: Michael S. Tsirkin @ 2012-06-28 19:21 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, shimoda.hiroaki, virtualization, danny.kukawka, edumazet,
	davem
In-Reply-To: <1340892639-1111-3-git-send-email-jpirko@redhat.com>

On Thu, Jun 28, 2012 at 04:10:37PM +0200, Jiri Pirko wrote:
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>

FWIW

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/virtio_net.c |   11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 36a16d5..6a0f526 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -679,12 +679,11 @@ static int virtnet_set_mac_address(struct net_device *dev, void *p)
>  {
>  	struct virtnet_info *vi = netdev_priv(dev);
>  	struct virtio_device *vdev = vi->vdev;
> -	struct sockaddr *addr = p;
> +	int ret;
>  
> -	if (!is_valid_ether_addr(addr->sa_data))
> -		return -EADDRNOTAVAIL;
> -	memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
> -	dev->addr_assign_type &= ~NET_ADDR_RANDOM;
> +	ret = eth_mac_addr(dev, p);
> +	if (ret)
> +		return ret;
>  
>  	if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
>  		vdev->config->set(vdev, offsetof(struct virtio_net_config, mac),
> @@ -1063,7 +1062,7 @@ static int virtnet_probe(struct virtio_device *vdev)
>  		return -ENOMEM;
>  
>  	/* Set up network device as normal. */
> -	dev->priv_flags |= IFF_UNICAST_FLT;
> +	dev->priv_flags |= IFF_UNICAST_FLT | IFF_LIFE_ADDR_CHANGE;
>  	dev->netdev_ops = &virtnet_netdev;
>  	dev->features = NETIF_F_HIGHDMA;
>  
> -- 
> 1.7.10.4

^ permalink raw reply

* Re: [patch net-next 1/4] net: introduce new priv_flag indicating iface capable of change mac when running
From: Michael S. Tsirkin @ 2012-06-28 20:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jiri Pirko, netdev, shimoda.hiroaki, virtualization,
	danny.kukawka, edumazet, davem
In-Reply-To: <1340897092.13187.110.camel@edumazet-glaptop>

On Thu, Jun 28, 2012 at 05:24:52PM +0200, Eric Dumazet wrote:
> On Thu, 2012-06-28 at 16:10 +0200, Jiri Pirko wrote:
> > Introduce IFF_LIFE_ADDR_CHANGE priv_flag and use it to disable
> > netif_running() check in eth_mac_addr()
> > 
> > Signed-off-by: Jiri Pirko <jpirko@redhat.com>
> > ---
> >  include/linux/if.h |    2 ++
> >  net/ethernet/eth.c |    2 +-
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/if.h b/include/linux/if.h
> > index f995c66..fd9ee7c 100644
> > --- a/include/linux/if.h
> > +++ b/include/linux/if.h
> > @@ -81,6 +81,8 @@
> >  #define IFF_UNICAST_FLT	0x20000		/* Supports unicast filtering	*/
> >  #define IFF_TEAM_PORT	0x40000		/* device used as team port */
> >  #define IFF_SUPP_NOFCS	0x80000		/* device supports sending custom FCS */
> > +#define IFF_LIFE_ADDR_CHANGE 0x100000	/* device supports hardware address
> > +					 * change when it's running */
> >  
> > 
> >  #define IF_GET_IFACE	0x0001		/* for querying only */
> > diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
> > index 36e5880..8f8ded4 100644
> > --- a/net/ethernet/eth.c
> > +++ b/net/ethernet/eth.c
> > @@ -283,7 +283,7 @@ int eth_mac_addr(struct net_device *dev, void *p)
> >  {
> >  	struct sockaddr *addr = p;
> >  
> > -	if (netif_running(dev))
> > +	if (!(dev->priv_flags & IFF_LIFE_ADDR_CHANGE) && netif_running(dev))
> >  		return -EBUSY;
> >  	if (!is_valid_ether_addr(addr->sa_data))
> >  		return -EADDRNOTAVAIL;
> 
> Since the memcpy() is not atomic, there is a small window where a reader
> could get a half-changed mac address. I guess its a detail.
> 

At least for virtio nothing changes - we had this bug forever.
How'd you fix this?

-- 
MST
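
One possible answer to the question above, sketched in userspace C11: a
6-byte address fits in a single 64-bit word, so it can be published with
one atomic store and a reader never observes a half-changed value. This is
only an illustration under assumptions, not what the kernel or virtio_net
does; toy_dev_addr, toy_set_mac() and toy_get_mac() are hypothetical names,
and real dev->dev_addr is a byte array that many callers read directly.

```c
#include <stdatomic.h>
#include <stdint.h>

#define ETH_ALEN 6

/* Hypothetical replacement for a byte-array address: one atomic word. */
static _Atomic uint64_t toy_dev_addr;

/* Pack the six address bytes, most significant first, into the low 48 bits. */
static uint64_t mac_pack(const uint8_t mac[ETH_ALEN])
{
	uint64_t w = 0;

	for (int i = 0; i < ETH_ALEN; i++)
		w = (w << 8) | mac[i];
	return w;
}

/* Inverse of mac_pack(). */
static void mac_unpack(uint64_t w, uint8_t mac[ETH_ALEN])
{
	for (int i = ETH_ALEN - 1; i >= 0; i--) {
		mac[i] = (uint8_t)(w & 0xff);
		w >>= 8;
	}
}

/* Writer: a single atomic store replaces the non-atomic memcpy(). */
static void toy_set_mac(const uint8_t mac[ETH_ALEN])
{
	atomic_store(&toy_dev_addr, mac_pack(mac));
}

/* Reader: one atomic load, so the result is always a complete address. */
static void toy_get_mac(uint8_t mac[ETH_ALEN])
{
	mac_unpack(atomic_load(&toy_dev_addr), mac);
}
```

The trade-off is that every reader has to go through the accessor instead
of touching the bytes in place, which is probably why this remained "a
detail" in the thread.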

^ permalink raw reply

* [PATCH v2 0/4] make balloon pages movable by compaction
From: Rafael Aquini @ 2012-06-28 21:49 UTC (permalink / raw)
  To: linux-mm
  Cc: Rik van Riel, Rafael Aquini, Konrad Rzeszutek Wilk,
	Michael S. Tsirkin, linux-kernel, virtualization, Andi Kleen,
	Andrew Morton

This patchset follows the main idea discussed at the 2012 LSFMMS session:
"Ballooning for transparent huge pages" -- http://lwn.net/Articles/490114/

It introduces the required changes to the virtio_balloon driver, as well as
changes to the core compaction & migration bits, in order to allow
memory balloon pages to become movable within a guest.

Rafael Aquini (4):
  mm: introduce compaction and migration for virtio ballooned pages
  virtio_balloon: handle concurrent accesses to virtio_balloon struct
    elements
  virtio_balloon: introduce migration primitives to balloon pages
  mm: add vm event counters for balloon pages compaction

 drivers/virtio/virtio_balloon.c |  142 +++++++++++++++++++++++++++++++++++----
 include/linux/mm.h              |   16 +++++
 include/linux/virtio_balloon.h  |    6 ++
 include/linux/vm_event_item.h   |    2 +
 mm/compaction.c                 |  111 ++++++++++++++++++++++++------
 mm/migrate.c                    |   32 ++++++++-
 mm/vmstat.c                     |    4 ++
 7 files changed, 280 insertions(+), 33 deletions(-)


V2: address Mel Gorman's review comments

TODO:
- check on naming changes suggested by Konrad (original series discussion)


Preliminary test results:
(2 VCPU 1024 MB RAM KVM guest running 3.5.0_rc4+)

* 64 MB balloon:
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 0
compact_pages_moved 0
compact_pagemigrate_failed 0
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_migrated 0
compact_balloon_failed 0
compact_balloon_isolated 0
compact_balloon_freed 0
[root@localhost ~]#
[root@localhost ~]# for i in $(seq 1 4); do echo 1> /proc/sys/vm/compact_memory & done &>/dev/null
[1]   Done                    echo > /proc/sys/vm/compact_memory
[2]   Done                    echo > /proc/sys/vm/compact_memory
[3]-  Done                    echo > /proc/sys/vm/compact_memory
[4]+  Done                    echo > /proc/sys/vm/compact_memory
[root@localhost ~]#
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 2717
compact_pages_moved 46697
compact_pagemigrate_failed 75
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_migrated 16384
compact_balloon_failed 0
compact_balloon_isolated 16384
compact_balloon_freed 16384


* 128 MB balloon:
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 0
compact_pages_moved 0
compact_pagemigrate_failed 0
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_migrated 0
compact_balloon_failed 0
compact_balloon_isolated 0
compact_balloon_freed 0
[root@localhost ~]#
[root@localhost ~]# for i in $(seq 1 4); do echo 1> /proc/sys/vm/compact_memory & done &>/dev/null
[1]   Done                    echo > /proc/sys/vm/compact_memory
[2]   Done                    echo > /proc/sys/vm/compact_memory
[3]-  Done                    echo > /proc/sys/vm/compact_memory
[4]+  Done                    echo > /proc/sys/vm/compact_memory
[root@localhost ~]#
[root@localhost ~]# awk '/compact/ {print}' /proc/vmstat
compact_blocks_moved 2598
compact_pages_moved 47660
compact_pagemigrate_failed 103
compact_stall 0
compact_fail 0
compact_success 0
compact_balloon_migrated 26652
compact_balloon_failed 76
compact_balloon_isolated 26728
compact_balloon_freed 26652
-- 
1.7.10.2
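
A quick way to read the counters above, as a small C sketch. It assumes
4 KiB guest pages (not stated in the posting), and the helper names are
hypothetical. Under that assumption a 64 MB balloon holds 16384 pages,
which matches compact_balloon_migrated in the first run, and a 128 MB
balloon holds 32768 pages, of which 26728 were isolated in the second run.
In both runs the posted numbers satisfy isolated = migrated + failed.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u  /* assumption: 4 KiB guest pages */

/* Number of guest pages held by a balloon of the given size in MiB. */
static size_t balloon_pages(size_t balloon_mib)
{
	return balloon_mib * 1024u * 1024u / TOY_PAGE_SIZE;
}

/* Check that every isolated balloon page was either migrated or failed. */
static int counters_consistent(size_t isolated, size_t migrated, size_t failed)
{
	return isolated == migrated + failed;
}
```

For example, counters_consistent(26728, 26652, 76) holds for the 128 MB
run, so the 76 failures account for the whole gap between isolated and
migrated pages.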

^ permalink raw reply

* [PATCH v2 1/4] mm: introduce compaction and migration for virtio ballooned pages
From: Rafael Aquini @ 2012-06-28 21:49 UTC (permalink / raw)
  To: linux-mm
  Cc: Rik van Riel, Rafael Aquini, Konrad Rzeszutek Wilk,
	Michael S. Tsirkin, linux-kernel, virtualization, Andi Kleen,
	Andrew Morton
In-Reply-To: <cover.1340916058.git.aquini@redhat.com>

This patch introduces the helper functions as well as the necessary changes
to teach compaction and migration bits how to cope with pages which are
part of a guest memory balloon, in order to make them movable by memory
compaction procedures.

Signed-off-by: Rafael Aquini <aquini@redhat.com>
---
 include/linux/mm.h |   16 ++++++++
 mm/compaction.c    |  110 +++++++++++++++++++++++++++++++++++++++++++---------
 mm/migrate.c       |   30 +++++++++++++-
 3 files changed, 136 insertions(+), 20 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b36d08c..35568fc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1629,5 +1629,21 @@ static inline unsigned int debug_guardpage_minorder(void) { return 0; }
 static inline bool page_is_guard(struct page *page) { return false; }
 #endif /* CONFIG_DEBUG_PAGEALLOC */
 
+#if (defined(CONFIG_VIRTIO_BALLOON) || \
+	defined(CONFIG_VIRTIO_BALLOON_MODULE)) && defined(CONFIG_COMPACTION)
+extern bool isolate_balloon_page(struct page *);
+extern bool putback_balloon_page(struct page *);
+extern struct address_space *balloon_mapping;
+
+static inline bool is_balloon_page(struct page *page)
+{
+        return (page->mapping == balloon_mapping) ? true : false;
+}
+#else
+static inline bool is_balloon_page(struct page *page)       { return false; }
+static inline bool isolate_balloon_page(struct page *page)  { return false; }
+static inline bool putback_balloon_page(struct page *page)  { return false; }
+#endif /* (VIRTIO_BALLOON || VIRTIO_BALLOON_MODULE) && COMPACTION */
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/mm/compaction.c b/mm/compaction.c
index 7ea259d..6c6e572 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -14,6 +14,7 @@
 #include <linux/backing-dev.h>
 #include <linux/sysctl.h>
 #include <linux/sysfs.h>
+#include <linux/export.h>
 #include "internal.h"
 
 #if defined CONFIG_COMPACTION || defined CONFIG_CMA
@@ -312,32 +313,40 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 			continue;
 		}
 
-		if (!PageLRU(page))
-			continue;
-
 		/*
-		 * PageLRU is set, and lru_lock excludes isolation,
-		 * splitting and collapsing (collapsing has already
-		 * happened if PageLRU is set).
+		 * It is possible to migrate LRU pages and balloon pages.
+		 * Skip any other type of page.
 		 */
-		if (PageTransHuge(page)) {
-			low_pfn += (1 << compound_order(page)) - 1;
-			continue;
-		}
+		if (likely(PageLRU(page))) {
+			/*
+			 * PageLRU is set, and lru_lock excludes isolation,
+			 * splitting and collapsing (collapsing has already
+			 * happened if PageLRU is set).
+			 */
+			if (PageTransHuge(page)) {
+				low_pfn += (1 << compound_order(page)) - 1;
+				continue;
+			}
 
-		if (!cc->sync)
-			mode |= ISOLATE_ASYNC_MIGRATE;
+			if (!cc->sync)
+				mode |= ISOLATE_ASYNC_MIGRATE;
 
-		lruvec = mem_cgroup_page_lruvec(page, zone);
+			lruvec = mem_cgroup_page_lruvec(page, zone);
 
-		/* Try isolate the page */
-		if (__isolate_lru_page(page, mode) != 0)
-			continue;
+			/* Try isolate the page */
+			if (__isolate_lru_page(page, mode) != 0)
+				continue;
 
-		VM_BUG_ON(PageTransCompound(page));
+			VM_BUG_ON(PageTransCompound(page));
+
+			/* Successfully isolated */
+			del_page_from_lru_list(page, lruvec, page_lru(page));
+		} else if (is_balloon_page(page)) {
+			if (!isolate_balloon_page(page))
+				continue;
+		} else
+			continue;
 
-		/* Successfully isolated */
-		del_page_from_lru_list(page, lruvec, page_lru(page));
 		list_add(&page->lru, migratelist);
 		cc->nr_migratepages++;
 		nr_isolated++;
@@ -903,4 +912,67 @@ void compaction_unregister_node(struct node *node)
 }
 #endif /* CONFIG_SYSFS && CONFIG_NUMA */
 
+#if defined(CONFIG_VIRTIO_BALLOON) || defined(CONFIG_VIRTIO_BALLOON_MODULE)
+/*
+ * Balloon pages special page->mapping.
+ * users must properly allocate and initialize an instance of balloon_mapping,
+ * and set it as the page->mapping for balloon enlisted page instances.
+ *
+ * address_space_operations necessary methods for ballooned pages:
+ *   .migratepage    - used to perform balloon's page migration (as is)
+ *   .invalidatepage - used to isolate a page from balloon's page list
+ *   .freepage       - used to reinsert an isolated page to balloon's page list
+ */
+struct address_space *balloon_mapping;
+EXPORT_SYMBOL_GPL(balloon_mapping);
+
+/* __isolate_lru_page() counterpart for a ballooned page */
+bool isolate_balloon_page(struct page *page)
+{
+	if (WARN_ON(!is_balloon_page(page)))
+		return false;
+
+	if (likely(get_page_unless_zero(page))) {
+		/*
+		 * We can race against move_to_new_page() & __unmap_and_move().
+		 * If we stumble across a locked balloon page and succeed on
+		 * isolating it, the result tends to be disastrous.
+		 */
+		if (likely(trylock_page(page))) {
+			/*
+			 * A ballooned page, by default, has just one refcount.
+			 * Prevent concurrent compaction threads from isolating
+			 * an already isolated balloon page.
+			 */
+			if (is_balloon_page(page) && (page_count(page) == 2)) {
+				page->mapping->a_ops->invalidatepage(page, 0);
+				unlock_page(page);
+				return true;
+			}
+			unlock_page(page);
+		}
+		/* Drop refcount taken for this already isolated page */
+		put_page(page);
+	}
+	return false;
+}
+
+/* putback_lru_page() counterpart for a ballooned page */
+bool putback_balloon_page(struct page *page)
+{
+	if (WARN_ON(!is_balloon_page(page)))
+		return false;
+
+	if (likely(trylock_page(page))) {
+		if(is_balloon_page(page)) {
+			page->mapping->a_ops->freepage(page);
+			put_page(page);
+			unlock_page(page);
+			return true;
+		}
+		unlock_page(page);
+	}
+	return false;
+}
+#endif /* CONFIG_VIRTIO_BALLOON || CONFIG_VIRTIO_BALLOON_MODULE */
 #endif /* CONFIG_COMPACTION */
diff --git a/mm/migrate.c b/mm/migrate.c
index be26d5c..59c7bc5 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -78,7 +78,10 @@ void putback_lru_pages(struct list_head *l)
 		list_del(&page->lru);
 		dec_zone_page_state(page, NR_ISOLATED_ANON +
 				page_is_file_cache(page));
-		putback_lru_page(page);
+		if (unlikely(is_balloon_page(page)))
+			WARN_ON(!putback_balloon_page(page));
+		else
+			putback_lru_page(page);
 	}
 }
 
@@ -783,6 +786,17 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 		}
 	}
 
+	if (is_balloon_page(page)) {
+		/*
+		 * A ballooned page does not need any special attention from
+		 * physical to virtual reverse mapping procedures.
+		 * Skip any attempt to unmap PTEs or to remap swap cache,
+		 * in order to avoid burning cycles at rmap level.
+		 */
+		remap_swapcache = 0;
+		goto skip_unmap;
+	}
+
 	/*
 	 * Corner case handling:
 	 * 1. When a new swap-cache page is read into, it is added to the LRU
@@ -852,6 +866,20 @@ static int unmap_and_move(new_page_t get_new_page, unsigned long private,
 			goto out;
 
 	rc = __unmap_and_move(page, newpage, force, offlining, mode);
+
+	if (is_balloon_page(newpage)) {
+		/*
+		 * A ballooned page has been migrated already. Now it is
+		 * time to wrap up counters, hand the old page back to Buddy
+		 * and return.
+		 */
+		list_del(&page->lru);
+		dec_zone_page_state(page, NR_ISOLATED_ANON +
+				    page_is_file_cache(page));
+		put_page(page);
+		__free_page(page);
+		return rc;
+	}
 out:
 	if (rc != -EAGAIN) {
 		/*
-- 
1.7.10.2
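
The reference-count rule used by isolate_balloon_page() and
putback_balloon_page() in the patch above can be modelled in userspace C.
This is a simplified sketch, not kernel code: struct toy_page and both
functions are hypothetical, the lock is a plain flag, and the
->invalidatepage / ->freepage callbacks are reduced to toggling list
membership.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced model of a ballooned struct page. */
struct toy_page {
	int refcount;     /* 1 while the page only sits on the balloon list */
	bool locked;      /* models the page lock */
	bool in_balloon;  /* models page->mapping == balloon_mapping */
	bool on_list;     /* models membership of the driver's page list */
};

/* Mirrors isolate_balloon_page(): succeed only for a not yet isolated page. */
static bool toy_isolate(struct toy_page *p)
{
	if (!p->in_balloon || p->refcount == 0)
		return false;
	p->refcount++;                      /* get_page_unless_zero() */
	if (!p->locked) {                   /* trylock_page() */
		p->locked = true;
		if (p->in_balloon && p->refcount == 2) {
			p->on_list = false; /* ->invalidatepage() */
			p->locked = false;
			return true;        /* the extra reference is kept */
		}
		p->locked = false;
	}
	p->refcount--;                      /* already isolated: drop our ref */
	return false;
}

/* Mirrors putback_balloon_page(): reinsert and drop the isolation ref. */
static bool toy_putback(struct toy_page *p)
{
	if (!p->in_balloon || p->locked)
		return false;
	p->on_list = true;                  /* ->freepage() */
	p->refcount--;
	return true;
}
```

A second toy_isolate() on the same page sees a count of 3 after taking its
own reference, fails the "== 2" test and backs out, which is how the patch
keeps concurrent compaction threads from isolating a page twice.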

^ permalink raw reply related

* [PATCH v2 2/4] virtio_balloon: handle concurrent accesses to virtio_balloon struct elements
From: Rafael Aquini @ 2012-06-28 21:49 UTC (permalink / raw)
  To: linux-mm
  Cc: Rik van Riel, Rafael Aquini, Konrad Rzeszutek Wilk,
	Michael S. Tsirkin, linux-kernel, virtualization, Andi Kleen,
	Andrew Morton
In-Reply-To: <cover.1340916058.git.aquini@redhat.com>

This patch introduces access synchronization to critical elements of struct
virtio_balloon, in order to cope with the thread concurrency that the
compaction/migration bits might end up imposing on the balloon driver in
several situations.

Signed-off-by: Rafael Aquini <aquini@redhat.com>
---
 drivers/virtio/virtio_balloon.c |   45 +++++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 11 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index bfbc15c..d47c5c2 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -51,6 +51,10 @@ struct virtio_balloon
 
 	/* Number of balloon pages we've told the Host we're not using. */
 	unsigned int num_pages;
+
+	/* Protect 'pages', 'pfns' & 'num_pfns' against concurrent updates */
+	spinlock_t pfn_list_lock;
+
 	/*
 	 * The pages we've told the Host we're not using.
 	 * Each page on this list adds VIRTIO_BALLOON_PAGES_PER_PAGE
@@ -97,21 +101,23 @@ static void balloon_ack(struct virtqueue *vq)
 		complete(&vb->acked);
 }
 
-static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
-{
-	struct scatterlist sg;
-
-	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
+/* Protection for concurrent accesses to balloon virtqueues and vb->acked */
+DEFINE_MUTEX(vb_queue_completion);
 
+static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq,
+		      struct scatterlist *sg)
+{
+	mutex_lock(&vb_queue_completion);
 	init_completion(&vb->acked);
 
 	/* We should always be able to add one buffer to an empty queue. */
-	if (virtqueue_add_buf(vq, &sg, 1, 0, vb, GFP_KERNEL) < 0)
+	if (virtqueue_add_buf(vq, sg, 1, 0, vb, GFP_KERNEL) < 0)
 		BUG();
 	virtqueue_kick(vq);
 
 	/* When host has read buffer, this completes via balloon_ack */
 	wait_for_completion(&vb->acked);
+	mutex_unlock(&vb_queue_completion);
 }
 
 static void set_page_pfns(u32 pfns[], struct page *page)
@@ -126,9 +132,12 @@ static void set_page_pfns(u32 pfns[], struct page *page)
 
 static void fill_balloon(struct virtio_balloon *vb, size_t num)
 {
+	struct scatterlist sg;
+	int alloc_failed = 0;
 	/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
 
+	spin_lock(&vb->pfn_list_lock);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
 	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
 		struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
@@ -138,8 +147,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
 				dev_printk(KERN_INFO, &vb->vdev->dev,
 					   "Out of puff! Can't get %zu pages\n",
 					   num);
-			/* Sleep for at least 1/5 of a second before retry. */
-			msleep(200);
+			alloc_failed = 1;
 			break;
 		}
 		set_page_pfns(vb->pfns + vb->num_pfns, page);
@@ -149,10 +157,19 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
 	}
 
 	/* Didn't get any?  Oh well. */
-	if (vb->num_pfns == 0)
+	if (vb->num_pfns == 0) {
+		spin_unlock(&vb->pfn_list_lock);
 		return;
+	}
+
+	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
+	spin_unlock(&vb->pfn_list_lock);
 
-	tell_host(vb, vb->inflate_vq);
+	/* alloc_page failed, sleep for at least 1/5 of a sec before retry. */
+	if (alloc_failed)
+		msleep(200);
+
+	tell_host(vb, vb->inflate_vq, &sg);
 }
 
 static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -169,10 +186,12 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
 static void leak_balloon(struct virtio_balloon *vb, size_t num)
 {
 	struct page *page;
+	struct scatterlist sg;
 
 	/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
 
+	spin_lock(&vb->pfn_list_lock);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
 	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
 		page = list_first_entry(&vb->pages, struct page, lru);
@@ -180,13 +199,15 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
 		set_page_pfns(vb->pfns + vb->num_pfns, page);
 		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
 	}
+	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
+	spin_unlock(&vb->pfn_list_lock);
 
 	/*
 	 * Note that if
 	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
 	 * is true, we *have* to do it in this order
 	 */
-	tell_host(vb, vb->deflate_vq);
+	tell_host(vb, vb->deflate_vq, &sg);
 	release_pages_by_pfn(vb->pfns, vb->num_pfns);
 }
 
@@ -356,6 +377,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	}
 
 	INIT_LIST_HEAD(&vb->pages);
+	spin_lock_init(&vb->pfn_list_lock);
+
 	vb->num_pages = 0;
 	init_waitqueue_head(&vb->config_change);
 	vb->vdev = vdev;
-- 
1.7.10.2
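
The restructuring of fill_balloon() in the patch above follows a common
rule: collect the batch while holding the spinlock, then do anything that
may block (the msleep() and the wait inside tell_host()) only after the
lock is dropped. A userspace sketch of that ordering, with hypothetical
names throughout and the lock reduced to a flag so that a violation can be
counted:

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_BATCH 4

/* Hypothetical model of the balloon state guarded by pfn_list_lock. */
struct toy_balloon {
	bool lock_held;               /* models vb->pfn_list_lock */
	unsigned pfns[TOY_BATCH];
	size_t num_pfns;
	unsigned blocked_under_lock;  /* counts the bug being avoided */
	unsigned host_calls;
};

/* Any call that may sleep; records whether the lock was still held. */
static void toy_block(struct toy_balloon *vb)
{
	if (vb->lock_held)
		vb->blocked_under_lock++;
}

/* Models tell_host(): blocks until the host acknowledges the buffer. */
static void toy_tell_host(struct toy_balloon *vb)
{
	toy_block(vb);
	vb->host_calls++;
}

/* 'avail' is how many pages the (toy) allocator can still hand out. */
static size_t toy_fill(struct toy_balloon *vb, size_t num, size_t avail)
{
	bool alloc_failed = false;

	if (num > TOY_BATCH)
		num = TOY_BATCH;

	vb->lock_held = true;                     /* spin_lock() */
	for (vb->num_pfns = 0; vb->num_pfns < num; vb->num_pfns++) {
		if (avail == 0) {
			alloc_failed = true;      /* remember, do not sleep here */
			break;
		}
		avail--;
		vb->pfns[vb->num_pfns] = (unsigned)(1000 + vb->num_pfns);
	}
	vb->lock_held = false;                    /* spin_unlock() */

	if (vb->num_pfns == 0)
		return 0;
	if (alloc_failed)
		toy_block(vb);                    /* msleep(200) stand-in */
	toy_tell_host(vb);
	return vb->num_pfns;
}
```

With an allocator that runs dry after three pages, toy_fill(&vb, 8, 3)
reports three pages to the host and blocked_under_lock stays at zero.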

^ permalink raw reply related

* [PATCH v2 3/4] virtio_balloon: introduce migration primitives to balloon pages
From: Rafael Aquini @ 2012-06-28 21:49 UTC (permalink / raw)
  To: linux-mm
  Cc: Rik van Riel, Rafael Aquini, Konrad Rzeszutek Wilk,
	Michael S. Tsirkin, linux-kernel, virtualization, Andi Kleen,
	Andrew Morton
In-Reply-To: <cover.1340916058.git.aquini@redhat.com>

This patch makes balloon pages movable at allocation time and introduces the
infrastructure needed to perform the balloon page migration operation.

Signed-off-by: Rafael Aquini <aquini@redhat.com>
---
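The two-step swap described in the virtballoon_migratepage() comment in
this patch can be modelled in user space. The following is only a minimal
sketch under stated assumptions: every toy_* name is hypothetical and does
not exist in the kernel, the balloon page list is reduced to an array of
PFNs, and tell_host() is reduced to appending to a log, so none of the
locking or virtqueue handling is represented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; these names do not exist in the kernel. */
#define TOY_MAX_PAGES 8
#define TOY_MAX_EVENTS 16

enum toy_queue { TOY_INFLATE, TOY_DEFLATE };

struct toy_event {
	enum toy_queue queue;	/* which virtqueue the host was told on */
	unsigned int pfn;	/* page frame number reported */
};

struct toy_balloon {
	unsigned int pfns[TOY_MAX_PAGES];	/* stands in for vb->pages */
	size_t num_pages;
	struct toy_event log[TOY_MAX_EVENTS];	/* what the host has seen */
	size_t log_len;
};

/* Stands in for tell_host(): record the notification in order. */
static int toy_tell_host(struct toy_balloon *b, enum toy_queue q,
			 unsigned int pfn)
{
	if (b->log_len == TOY_MAX_EVENTS)
		return -1;
	b->log[b->log_len].queue = q;
	b->log[b->log_len].pfn = pfn;
	b->log_len++;
	return 0;
}

/* List-only insert, modelled on the putbackpage callback. */
static int toy_putback(struct toy_balloon *b, unsigned int pfn)
{
	if (b->num_pages == TOY_MAX_PAGES)
		return -1;
	b->pfns[b->num_pages++] = pfn;
	return 0;
}

/* List-only removal, modelled on the isolatepage callback. */
static int toy_isolate(struct toy_balloon *b, unsigned int pfn)
{
	size_t i;

	for (i = 0; i < b->num_pages; i++) {
		if (b->pfns[i] == pfn) {
			/* Fill the hole with the last entry; order is irrelevant. */
			b->pfns[i] = b->pfns[--b->num_pages];
			return 0;
		}
	}
	return -1;	/* not in the balloon */
}

/* Two-step swap; old_pfn must already have been isolated. */
static int toy_migrate(struct toy_balloon *b, unsigned int old_pfn,
		       unsigned int new_pfn)
{
	/* Step 1: enlist the new page and inflate it on the host side. */
	if (toy_putback(b, new_pfn))
		return -1;
	if (toy_tell_host(b, TOY_INFLATE, new_pfn))
		return -1;
	/* Step 2: deflate the old page, which is no longer on the list. */
	return toy_tell_host(b, TOY_DEFLATE, old_pfn);
}
```

In this model, isolating PFN 10 and then migrating it to PFN 42 leaves the
page count unchanged, and the host sees an inflate for 42 followed by a
deflate for 10.
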
 drivers/virtio/virtio_balloon.c |   96 ++++++++++++++++++++++++++++++++++++++-
 include/linux/virtio_balloon.h  |    6 +++
 2 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index d47c5c2..53386aa 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -27,6 +27,8 @@
 #include <linux/delay.h>
 #include <linux/slab.h>
 #include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/pagemap.h>
 
 /*
  * Balloon device works in 4K page units.  So each page is pointed to by
@@ -140,8 +142,9 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
 	spin_lock(&vb->pfn_list_lock);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
 	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-		struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
-					__GFP_NOMEMALLOC | __GFP_NOWARN);
+		struct page *page = alloc_page(GFP_HIGHUSER_MOVABLE |
+						__GFP_NORETRY | __GFP_NOWARN |
+						__GFP_NOMEMALLOC);
 		if (!page) {
 			if (printk_ratelimit())
 				dev_printk(KERN_INFO, &vb->vdev->dev,
@@ -154,6 +157,7 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
 		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
 		totalram_pages--;
 		list_add(&page->lru, &vb->pages);
+		page->mapping = balloon_mapping;
 	}
 
 	/* Didn't get any?  Oh well. */
@@ -195,6 +199,7 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
 	for (vb->num_pfns = 0; vb->num_pfns < num;
 	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
 		page = list_first_entry(&vb->pages, struct page, lru);
+		page->mapping = NULL;
 		list_del(&page->lru);
 		set_page_pfns(vb->pfns + vb->num_pfns, page);
 		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
@@ -365,6 +370,77 @@ static int init_vqs(struct virtio_balloon *vb)
 	return 0;
 }
 
+/*
+ * Populate balloon_mapping->a_ops->migratepage method to perform the balloon
+ * page migration task.
+ *
+ * After compaction isolates a ballooned page, this function performs the
+ * page migration on behalf of move_to_new_page(), when the latter calls
+ * page->mapping->a_ops->migratepage.
+ *
+ * Page migration for virtio balloon is done in a simple swap fashion which
+ * follows these two steps:
+ *  1) insert newpage into the vb->pages list and tell the host about it;
+ *  2) tell the host that the old page was removed from the vb->pages list.
+ */
+int virtballoon_migratepage(struct address_space *mapping,
+		struct page *newpage, struct page *page, enum migrate_mode mode)
+{
+	struct virtio_balloon *vb = (void *)mapping->backing_dev_info;
+	struct scatterlist sg;
+
+	/* balloon's page migration 1st step */
+	spin_lock(&vb->pfn_list_lock);
+	vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
+	list_add(&newpage->lru, &vb->pages);
+	set_page_pfns(vb->pfns, newpage);
+	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
+	spin_unlock(&vb->pfn_list_lock);
+	tell_host(vb, vb->inflate_vq, &sg);
+
+	/* balloon's page migration 2nd step */
+	spin_lock(&vb->pfn_list_lock);
+	vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
+	set_page_pfns(vb->pfns, page);
+	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
+	spin_unlock(&vb->pfn_list_lock);
+	tell_host(vb, vb->deflate_vq, &sg);
+
+	return 0;
+}
+
+/*
+ * Populate balloon_mapping->a_ops->invalidatepage method to help compaction
+ * isolate a page from the balloon page list.
+ */
+void virtballoon_isolatepage(struct page *page, unsigned long mode)
+{
+	struct address_space *mapping = page->mapping;
+	struct virtio_balloon *vb = (void *)mapping->backing_dev_info;
+	spin_lock(&vb->pfn_list_lock);
+	list_del(&page->lru);
+	spin_unlock(&vb->pfn_list_lock);
+}
+
+/*
+ * Populate balloon_mapping->a_ops->freepage method to help compaction
+ * re-insert an isolated page into the balloon page list.
+ */
+void virtballoon_putbackpage(struct page *page)
+{
+	struct address_space *mapping = page->mapping;
+	struct virtio_balloon *vb = (void *)mapping->backing_dev_info;
+	spin_lock(&vb->pfn_list_lock);
+	list_add(&page->lru, &vb->pages);
+	spin_unlock(&vb->pfn_list_lock);
+}
+
+static const struct address_space_operations virtio_balloon_aops = {
+	.migratepage = virtballoon_migratepage,
+	.invalidatepage = virtballoon_isolatepage,
+	.freepage = virtballoon_putbackpage,
+};
+
 static int virtballoon_probe(struct virtio_device *vdev)
 {
 	struct virtio_balloon *vb;
@@ -384,6 +460,19 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	vb->vdev = vdev;
 	vb->need_stats_update = 0;
 
+	/* Init balloon_mapping, the special page->mapping of ballooned pages */
+	balloon_mapping = kmalloc(sizeof(*balloon_mapping), GFP_KERNEL);
+	if (!balloon_mapping) {
+		err = -ENOMEM;
+		goto out_free_mapping;
+	}
+
+	INIT_RADIX_TREE(&balloon_mapping->page_tree, GFP_ATOMIC | __GFP_NOWARN);
+	INIT_LIST_HEAD(&balloon_mapping->i_mmap_nonlinear);
+	spin_lock_init(&balloon_mapping->tree_lock);
+	balloon_mapping->a_ops = &virtio_balloon_aops;
+	balloon_mapping->backing_dev_info = (void *)vb;
+
 	err = init_vqs(vb);
 	if (err)
 		goto out_free_vb;
@@ -398,6 +487,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
 
 out_del_vqs:
 	vdev->config->del_vqs(vdev);
+out_free_mapping:
+	kfree(balloon_mapping);
 out_free_vb:
 	kfree(vb);
 out:
@@ -424,6 +515,7 @@ static void __devexit virtballoon_remove(struct virtio_device *vdev)
 	kthread_stop(vb->thread);
 	remove_common(vb);
 	kfree(vb);
+	kfree(balloon_mapping);
 }
 
 #ifdef CONFIG_PM
diff --git a/include/linux/virtio_balloon.h b/include/linux/virtio_balloon.h
index 652dc8b..db21300 100644
--- a/include/linux/virtio_balloon.h
+++ b/include/linux/virtio_balloon.h
@@ -56,4 +56,10 @@ struct virtio_balloon_stat {
 	u64 val;
 } __attribute__((packed));
 
+#if defined(CONFIG_COMPACTION)
+extern struct address_space *balloon_mapping;
+#else
+struct address_space *balloon_mapping;
+#endif
+
 #endif /* _LINUX_VIRTIO_BALLOON_H */
-- 
1.7.10.2
