* Re: [virtio-dev] virtio-vsock live migration
From: Michael S. Tsirkin @ 2016-03-15 16:12 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: virtio-dev, Claudio Imbrenda, Christian Borntraeger,
Matt Benjamin, virtualization, Christoffer Dall
In-Reply-To: <20160315151529.GB26263@stefanha-x1.localdomain>
On Tue, Mar 15, 2016 at 03:15:29PM +0000, Stefan Hajnoczi wrote:
> On Mon, Mar 14, 2016 at 01:13:24PM +0200, Michael S. Tsirkin wrote:
> > On Thu, Mar 03, 2016 at 03:37:37PM +0000, Stefan Hajnoczi wrote:
> > > Michael pointed out that the virtio-vsock draft specification does not
> > > address live migration and in fact currently precludes migration.
> > >
> > > Migration is fundamental so the device specification at least mustn't
> > > preclude it. Having brainstormed migration with Matthew Benjamin and
> > > Michael Tsirkin, I am now summarizing the approach that I want to
> > > include in the next draft specification.
> > >
> > > Feedback and comments welcome! In the meantime I will implement this in
> > > code and update the draft specification.
> >
> > Most of the issue seems to be a consequence of using a 4 byte CID.
> >
> > I think the right thing to do is just to teach guests
> > about 64 bit CIDs.
> >
> > For now, can we drop guest CID from guest to host communication completely,
> > making CID only host-visible? Maybe leave the space
> > in the packet so we can add CID there later.
> > It seems that in theory this will allow changing CID
> > during migration, transparently to the guest.
> >
> > Guest visible CID is required for guest to guest communication -
> > but IIUC that is not currently supported.
> > Maybe that can be made conditional on 64 bit addressing.
> > Alternatively, it seems much easier to accept that these channels get broken
> > across migration.
>
> I reached the conclusion that channels break across migration because:
>
> 1. 32-bit CIDs are in sockaddr_vm and we'd break AF_VSOCK ABI by
> changing it to 64-bit. Application code would be specific
> virtio-vsock and wouldn't work with other AF_VSOCK transports that
> use the 32-bit sockaddr_vm struct.
You don't have to repeat the IPv6 mistake. Make all 32 bit CIDs
64 bit CIDs by padding with 0s, then 64 bit apps can use
any CID.
Old 32 bit CID applications will not be able to use the extended
addresses, but hardcoding bugs
does not seem sane.
> 2. Dropping guest CIDs from the protocol breaks network protocols that
> send addresses.
Stick it in config space if you really have to.
But why do you need it on each packet?
> NFS and netperf are the first two protocols I looked
> at and both transmit address information across the connection...
Does netperf really attempt to get local IP
and then send that inline within the connection?
--
MST
^ permalink raw reply
* Re: [PATCH v1 01/19] mm: use put_page to free page instead of putback_lru_page
From: Vlastimil Babka @ 2016-03-15 19:06 UTC (permalink / raw)
To: Minchan Kim
Cc: jlayton, Rik van Riel, aquini, rknize, Sergey Senozhatsky,
Hugh Dickins, linux-kernel, virtualization, bfields, linux-mm,
Gioh Kim, Mel Gorman, Andrew Morton, Naoya Horiguchi, Joonsoo Kim,
koct9i
In-Reply-To: <20160315011656.GD19514@bbox>
On 15.3.2016 2:16, Minchan Kim wrote:
> On Mon, Mar 14, 2016 at 09:48:33AM +0100, Vlastimil Babka wrote:
>> On 03/11/2016 08:30 AM, Minchan Kim wrote:
>>
>> Yeah, and compaction (perhaps also other migration users) has to
>> drain the lru pvec... Getting rid of this stuff is worth even by
>> itself.
>
> Good note. Although we cannot remove lru pvec draining completely,
> at least, this patch removes a case which should drain pvec for
> returning freed page to buddy.
And this is in fact the only interesting case, right. The migrated page (at its
new target) doesn't concern compaction that much, that can go to lru pvec just
fine. But we do want the freed buddy pages to merge ASAP. I guess that's the
same for CMA, page isolation...
^ permalink raw reply
* Re: [RFC qemu 0/4] A PV solution for live migration optimization
From: Dr. David Alan Gilbert @ 2016-03-15 19:55 UTC (permalink / raw)
To: Li, Liang Z
Cc: ehabkost@redhat.com, kvm@vger.kernel.org, simhan@hpe.com,
Michael S. Tsirkin, linux-kernel@vger.kernel.org,
qemu-devel@nongnu.org, jitendra.kolhe@hpe.com, linux-mm@kvack.org,
mohan_parthasarathy@hpe.com, Amit Shah, pbonzini@redhat.com,
akpm@linux-foundation.org,
virtualization@lists.linux-foundation.org, rth@twiddle.net
In-Reply-To: <F2CBF3009FA73547804AE4C663CAB28E0414E385@shsmsx102.ccr.corp.intel.com>
* Li, Liang Z (liang.z.li@intel.com) wrote:
> > On Mon, Mar 14, 2016 at 05:03:34PM +0000, Dr. David Alan Gilbert wrote:
> > > * Li, Liang Z (liang.z.li@intel.com) wrote:
> > > > >
> > > > > Hi,
> > > > > I'm just catching back up on this thread; so without reference
> > > > > to any particular previous mail in the thread.
> > > > >
> > > > > 1) How many of the free pages do we tell the host about?
> > > > > Your main change is telling the host about all the
> > > > > free pages.
> > > >
> > > > Yes, all the guest's free pages.
> > > >
> > > > > If we tell the host about all the free pages, then we might
> > > > > end up needing to allocate more pages and update the host
> > > > > with pages we now want to use; that would have to wait for the
> > > > > host to acknowledge that use of these pages, since if we don't
> > > > > wait for it then it might have skipped migrating a page we
> > > > > just started using (I don't understand how your series solves that).
> > > > > So the guest probably needs to keep some free pages - how many?
> > > >
> > > > Actually, there is no need to care about whether the free pages will be
> > used by the host.
> > > > We only care about some of the free pages we get reused by the guest,
> > right?
> > > >
> > > > The dirty page logging can be used to solve this, starting the dirty
> > > > page logging before getting the free pages informant from guest.
> > > > Even some of the free pages are modified by the guest during the
> > > > process of getting the free pages information, these modified pages will
> > be traced by the dirty page logging mechanism. So in the following
> > migration_bitmap_sync() function.
> > > > The pages in the free pages bitmap, but latter was modified, will be
> > > > reset to dirty. We won't omit any dirtied pages.
> > > >
> > > > So, guest doesn't need to keep any free pages.
> > >
> > > OK, yes, that works; so we do:
> > > * enable dirty logging
> > > * ask guest for free pages
> > > * initialise the migration bitmap as everything-free
> > > * then later we do the normal sync-dirty bitmap stuff and it all just works.
> > >
> > > That's nice and simple.
> >
> > This works once, sure. But there's an issue is that you have to defer migration
> > until you get the free page list, and this only works once. So you end up with
> > heuristics about how long to wait.
> >
> > Instead I propose:
> >
> > - mark all pages dirty as we do now.
> >
> > - at start of migration, start tracking dirty
> > pages in kvm, and tell guest to start tracking free pages
> >
> > we can now introduce any kind of delay, for example wait for ack from guest,
> > or do whatever else, or even just start migrating pages
> >
> > - repeatedly:
> > - get list of free pages from guest
> > - clear them in migration bitmap
> > - get dirty list from kvm
> >
> > - at end of migration, stop tracking writes in kvm,
> > and tell guest to stop tracking free pages
>
> I had thought of filtering out the free pages in each migration bitmap synchronization.
> The advantage is we can skip process as many free pages as possible. Not just once.
> The disadvantage is that we should change the current memory management code to track the free pages,
> instead of traversing the free page list to construct the free pages bitmap, to reduce the overhead to get the free pages bitmap.
> I am not sure the if the Kernel people would like it.
>
> If keeping the traversing mechanism, because of the overhead, maybe it's not worth to filter out the free pages repeatedly.
Well, Michael's idea of not waiting for the dirty
bitmap to be filled does make that idea of constnatly
using the free-bitmap better.
In that case, is it easier if something (guest/host?)
allocates some memory in the guests physical RAM space
and just points the host to it, rather than having an
explicit 'send'.
Dave
> Liang
>
>
>
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply
* Re: [RFC -next 2/2] virtio_net: Read and use the advised MTU
From: Aaron Conole @ 2016-03-15 20:52 UTC (permalink / raw)
To: Paolo Abeni; +Cc: netdev, virtualization, linux-kernel, Michael S. Tsirkin
In-Reply-To: <1457621848.9753.44.camel@redhat.com>
Paolo Abeni <pabeni@redhat.com> writes:
> On Thu, 2016-03-10 at 09:28 -0500, Aaron Conole wrote:
>> This patch checks the feature bit for the VIRTIO_NET_F_MTU feature. If it
>> exists, read the advised MTU and use it.
>>
>> No proper error handling is provided for the case where a user changes the
>> negotiated MTU. A future commit will add proper error handling. Instead, a
>> warning is emitted if the guest changes the device MTU after previously being
>> given advice.
>>
>> Signed-off-by: Aaron Conole <aconole@redhat.com>
>> ---
>> drivers/net/virtio_net.c | 15 ++++++++++++++-
>> 1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 767ab11..7175563 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -146,6 +146,7 @@ struct virtnet_info {
>> virtio_net_ctrl_ack ctrl_status;
>> u8 ctrl_promisc;
>> u8 ctrl_allmulti;
>> + bool negotiated_mtu;
>> };
>>
>> struct padded_vnet_hdr {
>> @@ -1390,8 +1391,12 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
>>
>> static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
>> {
>> + struct virtnet_info *vi = netdev_priv(dev);
>> if (new_mtu < MIN_MTU || new_mtu > MAX_MTU)
>> return -EINVAL;
>> + if (vi->negotiated_mtu == true) {
>
> why don't:
>
> if ((vi->negotiated_mtu == true) && (dev->mtu != new_mtu))
>
> ?
Okay, I'll put this test in.
>> + pr_warn("changing mtu from negotiated mtu.");
>> + }
>> dev->mtu = new_mtu;
>> return 0;
>> }
>> @@ -1836,6 +1841,13 @@ static int virtnet_probe(struct virtio_device *vdev)
>> if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
>> vi->has_cvq = true;
>>
>> + if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
>> + vi->negotiated_mtu = true;
>> + dev->mtu = virtio_cread16(vdev,
>> + offsetof(struct virtio_net_config,
>> + mtu));
>> + }
>> +
>> if (vi->any_header_sg)
>> dev->needed_headroom = vi->hdr_len;
>>
>> @@ -2017,8 +2029,9 @@ static unsigned int features[] = {
>> VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
>> VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
>> VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
>> - VIRTIO_NET_F_CTRL_MAC_ADDR,
>> + VIRTIO_NET_F_CTRL_MAC_ADDR,
>
> Here a trailing white space slipped-in.
>
> Otherwise LGTM.
>
> Paolo
D'oh! Okay, v2 will have this fixed.
>> VIRTIO_F_ANY_LAYOUT,
>> + VIRTIO_NET_F_MTU,
>> };
>>
>> static struct virtio_driver virtio_net_driver = {
Thanks so much for the review, Paolo!
-Aaron
^ permalink raw reply
* Re: [RFC -next 2/2] virtio_net: Read and use the advised MTU
From: Aaron Conole @ 2016-03-15 20:53 UTC (permalink / raw)
To: Sergei Shtylyov; +Cc: netdev, virtualization, linux-kernel, Michael S. Tsirkin
In-Reply-To: <56E1C372.60204@cogentembedded.com>
Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> writes:
> Hello.
Hi Sergei,
> On 03/10/2016 05:28 PM, Aaron Conole wrote:
>
>> This patch checks the feature bit for the VIRTIO_NET_F_MTU feature. If it
>> exists, read the advised MTU and use it.
>>
>> No proper error handling is provided for the case where a user changes the
>> negotiated MTU. A future commit will add proper error handling. Instead, a
>> warning is emitted if the guest changes the device MTU after previously being
>> given advice.
>>
>> Signed-off-by: Aaron Conole <aconole@redhat.com>
>> ---
>> drivers/net/virtio_net.c | 15 ++++++++++++++-
>> 1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 767ab11..7175563 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
> [...]
>> @@ -1390,8 +1391,12 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
>>
>> static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
>> {
>> + struct virtnet_info *vi = netdev_priv(dev);
>> if (new_mtu < MIN_MTU || new_mtu > MAX_MTU)
>> return -EINVAL;
>> + if (vi->negotiated_mtu == true) {
>> + pr_warn("changing mtu from negotiated mtu.");
>> + }
>
> {} not needed, see Documentation/CodingStyle.
Okay, I'll make sure to fix this with v2.
> [...]
>
> MBR, Sergei
Thanks so much for the review!
-Aaron
^ permalink raw reply
* [RFC v2 -next 0/2] virtio-net: Advised MTU feature
From: Aaron Conole @ 2016-03-15 21:04 UTC (permalink / raw)
To: netdev, Michael S. Tsirkin, virtualization, linux-kernel
The following series adds the ability for a hypervisor to set an MTU on the
guest during feature negotiation phase. This is useful for VM orchestration
when, for instance, tunneling is involved and the MTU of the various systems
should be homogenous.
The first patch adds the feature bit as described in the proposed VFIO spec
addition found at
https://lists.oasis-open.org/archives/virtio-dev/201603/msg00001.html
The second patch adds a user of the bit, and a warning when the guest changes
the MTU from the hypervisor advised MTU. Future patches may add more thorough
error handling.
v2:
* Whitespace and code style cleanups from Sergei Shtylyov and Paolo Abeni
* Additional test before printing a warning
Aaron Conole (2):
virtio: Start feature MTU support
virtio_net: Read the advised MTU
drivers/net/virtio_net.c | 12 ++++++++++++
include/uapi/linux/virtio_net.h | 3 +++
2 files changed, 15 insertions(+)
--
2.5.0
^ permalink raw reply
* [RFC v2 -next 1/2] virtio: Start feature MTU support
From: Aaron Conole @ 2016-03-15 21:04 UTC (permalink / raw)
To: netdev, Michael S. Tsirkin, virtualization, linux-kernel
In-Reply-To: <1458075853-14789-1-git-send-email-aconole@redhat.com>
This commit adds the feature bit and associated mtu device entry for the
virtio network device. Future commits will make use of these bits to support
negotiated MTU.
Signed-off-by: Aaron Conole <aconole@bytheb.org>
---
v2:
* No change
include/uapi/linux/virtio_net.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index ec32293..41a6a01 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -55,6 +55,7 @@
#define VIRTIO_NET_F_MQ 22 /* Device supports Receive Flow
* Steering */
#define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
+#define VIRTIO_NET_F_MTU 25 /* Device supports Default MTU Negotiation */
#ifndef VIRTIO_NET_NO_LEGACY
#define VIRTIO_NET_F_GSO 6 /* Host handles pkts w/ any GSO type */
@@ -73,6 +74,8 @@ struct virtio_net_config {
* Legal values are between 1 and 0x8000
*/
__u16 max_virtqueue_pairs;
+ /* Default maximum transmit unit advice */
+ __u16 mtu;
} __attribute__((packed));
/*
--
2.5.0
^ permalink raw reply related
* [RFC v2 -next 2/2] virtio_net: Read the advised MTU
From: Aaron Conole @ 2016-03-15 21:04 UTC (permalink / raw)
To: netdev, Michael S. Tsirkin, virtualization, linux-kernel
In-Reply-To: <1458075853-14789-1-git-send-email-aconole@redhat.com>
This patch checks the feature bit for the VIRTIO_NET_F_MTU feature. If it
exists, read the advised MTU and use it.
No proper error handling is provided for the case where a user changes the
negotiated MTU. A future commit will add proper error handling. Instead, a
warning is emitted if the guest changes the device MTU after previously
being given advice.
Signed-off-by: Aaron Conole <aconole@bytheb.org>
---
v2:
* Whitespace cleanup in the last hunk
* Code style change around the pr_warn
* Additional test for mtu change before printing warning
drivers/net/virtio_net.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 767ab11..429fe01 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -146,6 +146,7 @@ struct virtnet_info {
virtio_net_ctrl_ack ctrl_status;
u8 ctrl_promisc;
u8 ctrl_allmulti;
+ bool negotiated_mtu;
};
struct padded_vnet_hdr {
@@ -1390,8 +1391,11 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
{
+ struct virtnet_info *vi = netdev_priv(dev);
if (new_mtu < MIN_MTU || new_mtu > MAX_MTU)
return -EINVAL;
+ if ((vi->negotiated_mtu) && (dev->mtu != new_mtu))
+ pr_warn("changing mtu while the advised mtu bit exists.");
dev->mtu = new_mtu;
return 0;
}
@@ -1836,6 +1840,13 @@ static int virtnet_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
vi->has_cvq = true;
+ if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
+ vi->negotiated_mtu = true;
+ dev->mtu = virtio_cread16(vdev,
+ offsetof(struct virtio_net_config,
+ mtu));
+ }
+
if (vi->any_header_sg)
dev->needed_headroom = vi->hdr_len;
@@ -2019,6 +2030,7 @@ static unsigned int features[] = {
VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
VIRTIO_NET_F_CTRL_MAC_ADDR,
VIRTIO_F_ANY_LAYOUT,
+ VIRTIO_NET_F_MTU,
};
static struct virtio_driver virtio_net_driver = {
--
2.5.0
^ permalink raw reply related
* Re: [RFC v2 -next 0/2] virtio-net: Advised MTU feature
From: Rick Jones @ 2016-03-15 21:34 UTC (permalink / raw)
To: Aaron Conole, netdev, Michael S. Tsirkin, virtualization,
linux-kernel
In-Reply-To: <1458075853-14789-1-git-send-email-aconole@redhat.com>
On 03/15/2016 02:04 PM, Aaron Conole wrote:
> The following series adds the ability for a hypervisor to set an MTU on the
> guest during feature negotiation phase. This is useful for VM orchestration
> when, for instance, tunneling is involved and the MTU of the various systems
> should be homogenous.
>
> The first patch adds the feature bit as described in the proposed VFIO spec
> addition found at
> https://lists.oasis-open.org/archives/virtio-dev/201603/msg00001.html
>
> The second patch adds a user of the bit, and a warning when the guest changes
> the MTU from the hypervisor advised MTU. Future patches may add more thorough
> error handling.
How do you see this interacting with VMs getting MTU settings via DHCP?
rick jones
>
> v2:
> * Whitespace and code style cleanups from Sergei Shtylyov and Paolo Abeni
> * Additional test before printing a warning
>
> Aaron Conole (2):
> virtio: Start feature MTU support
> virtio_net: Read the advised MTU
>
> drivers/net/virtio_net.c | 12 ++++++++++++
> include/uapi/linux/virtio_net.h | 3 +++
> 2 files changed, 15 insertions(+)
>
^ permalink raw reply
* RE: [RFC qemu 0/4] A PV solution for live migration optimization
From: Li, Liang Z @ 2016-03-16 1:20 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: ehabkost@redhat.com, kvm@vger.kernel.org, simhan@hpe.com,
Michael S. Tsirkin, linux-kernel@vger.kernel.org,
qemu-devel@nongnu.org, jitendra.kolhe@hpe.com, linux-mm@kvack.org,
mohan_parthasarathy@hpe.com, Amit Shah, pbonzini@redhat.com,
akpm@linux-foundation.org,
virtualization@lists.linux-foundation.org, rth@twiddle.net
In-Reply-To: <20160315195515.GL11728@work-vm>
> > > > > > I'm just catching back up on this thread; so without
> > > > > > reference to any particular previous mail in the thread.
> > > > > >
> > > > > > 1) How many of the free pages do we tell the host about?
> > > > > > Your main change is telling the host about all the
> > > > > > free pages.
> > > > >
> > > > > Yes, all the guest's free pages.
> > > > >
> > > > > > If we tell the host about all the free pages, then we might
> > > > > > end up needing to allocate more pages and update the host
> > > > > > with pages we now want to use; that would have to wait for the
> > > > > > host to acknowledge that use of these pages, since if we don't
> > > > > > wait for it then it might have skipped migrating a page we
> > > > > > just started using (I don't understand how your series solves that).
> > > > > > So the guest probably needs to keep some free pages - how
> many?
> > > > >
> > > > > Actually, there is no need to care about whether the free pages
> > > > > will be
> > > used by the host.
> > > > > We only care about some of the free pages we get reused by the
> > > > > guest,
> > > right?
> > > > >
> > > > > The dirty page logging can be used to solve this, starting the
> > > > > dirty page logging before getting the free pages informant from guest.
> > > > > Even some of the free pages are modified by the guest during the
> > > > > process of getting the free pages information, these modified
> > > > > pages will
> > > be traced by the dirty page logging mechanism. So in the following
> > > migration_bitmap_sync() function.
> > > > > The pages in the free pages bitmap, but latter was modified,
> > > > > will be reset to dirty. We won't omit any dirtied pages.
> > > > >
> > > > > So, guest doesn't need to keep any free pages.
> > > >
> > > > OK, yes, that works; so we do:
> > > > * enable dirty logging
> > > > * ask guest for free pages
> > > > * initialise the migration bitmap as everything-free
> > > > * then later we do the normal sync-dirty bitmap stuff and it all just
> works.
> > > >
> > > > That's nice and simple.
> > >
> > > This works once, sure. But there's an issue is that you have to
> > > defer migration until you get the free page list, and this only
> > > works once. So you end up with heuristics about how long to wait.
> > >
> > > Instead I propose:
> > >
> > > - mark all pages dirty as we do now.
> > >
> > > - at start of migration, start tracking dirty
> > > pages in kvm, and tell guest to start tracking free pages
> > >
> > > we can now introduce any kind of delay, for example wait for ack
> > > from guest, or do whatever else, or even just start migrating pages
> > >
> > > - repeatedly:
> > > - get list of free pages from guest
> > > - clear them in migration bitmap
> > > - get dirty list from kvm
> > >
> > > - at end of migration, stop tracking writes in kvm,
> > > and tell guest to stop tracking free pages
> >
> > I had thought of filtering out the free pages in each migration bitmap
> synchronization.
> > The advantage is we can skip process as many free pages as possible. Not
> just once.
> > The disadvantage is that we should change the current memory
> > management code to track the free pages, instead of traversing the free
> page list to construct the free pages bitmap, to reduce the overhead to get
> the free pages bitmap.
> > I am not sure the if the Kernel people would like it.
> >
> > If keeping the traversing mechanism, because of the overhead, maybe it's
> not worth to filter out the free pages repeatedly.
>
> Well, Michael's idea of not waiting for the dirty bitmap to be filled does make
> that idea of constnatly using the free-bitmap better.
>
No wait is a good idea.
Actually, we could shorten the waiting time by pre allocating the free pages bit map
and update it when guest allocating/freeing pages. it requires to modify the mm
related code. I don't know whether the kernel people like this.
> In that case, is it easier if something (guest/host?) allocates some memory in
> the guests physical RAM space and just points the host to it, rather than
> having an explicit 'send'.
>
Good idea too.
Liang
> Dave
^ permalink raw reply
* Re: [RFC v2 -next 0/2] virtio-net: Advised MTU feature
From: Pankaj Gupta @ 2016-03-16 4:55 UTC (permalink / raw)
To: Aaron Conole; +Cc: netdev, virtualization, linux-kernel, Michael S. Tsirkin
In-Reply-To: <1458075853-14789-1-git-send-email-aconole@redhat.com>
>
> The following series adds the ability for a hypervisor to set an MTU on the
> guest during feature negotiation phase. This is useful for VM orchestration
> when, for instance, tunneling is involved and the MTU of the various systems
> should be homogenous.
>
> The first patch adds the feature bit as described in the proposed VFIO spec
You mean VIRTIO spec?
> addition found at
> https://lists.oasis-open.org/archives/virtio-dev/201603/msg00001.html
>
> The second patch adds a user of the bit, and a warning when the guest changes
> the MTU from the hypervisor advised MTU. Future patches may add more thorough
> error handling.
>
> v2:
> * Whitespace and code style cleanups from Sergei Shtylyov and Paolo Abeni
> * Additional test before printing a warning
>
> Aaron Conole (2):
> virtio: Start feature MTU support
> virtio_net: Read the advised MTU
>
> drivers/net/virtio_net.c | 12 ++++++++++++
> include/uapi/linux/virtio_net.h | 3 +++
> 2 files changed, 15 insertions(+)
>
> --
> 2.5.0
>
>
^ permalink raw reply
* Re: [RFC v2 -next 2/2] virtio_net: Read the advised MTU
From: Sergei Shtylyov @ 2016-03-16 14:11 UTC (permalink / raw)
To: Aaron Conole, netdev, Michael S. Tsirkin, virtualization,
linux-kernel
In-Reply-To: <1458075853-14789-3-git-send-email-aconole@redhat.com>
Hello.
On 3/16/2016 12:04 AM, Aaron Conole wrote:
> This patch checks the feature bit for the VIRTIO_NET_F_MTU feature. If it
> exists, read the advised MTU and use it.
>
> No proper error handling is provided for the case where a user changes the
> negotiated MTU. A future commit will add proper error handling. Instead, a
> warning is emitted if the guest changes the device MTU after previously
> being given advice.
>
> Signed-off-by: Aaron Conole <aconole@bytheb.org>
> ---
> v2:
> * Whitespace cleanup in the last hunk
> * Code style change around the pr_warn
> * Additional test for mtu change before printing warning
>
> drivers/net/virtio_net.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 767ab11..429fe01 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
[...]
> @@ -1390,8 +1391,11 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
>
> static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
> {
> + struct virtnet_info *vi = netdev_priv(dev);
> if (new_mtu < MIN_MTU || new_mtu > MAX_MTU)
> return -EINVAL;
> + if ((vi->negotiated_mtu) && (dev->mtu != new_mtu))
Inner parens not needed, please be consistent with the code above.
[...]
MBR, Sergei
^ permalink raw reply
* Re: [RFC v2 -next 2/2] virtio_net: Read the advised MTU
From: Michael S. Tsirkin @ 2016-03-16 14:24 UTC (permalink / raw)
To: Aaron Conole; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <1458075853-14789-3-git-send-email-aconole@redhat.com>
On Tue, Mar 15, 2016 at 05:04:13PM -0400, Aaron Conole wrote:
> This patch checks the feature bit for the VIRTIO_NET_F_MTU feature. If it
> exists, read the advised MTU and use it.
>
> No proper error handling is provided for the case where a user changes the
> negotiated MTU. A future commit will add proper error handling. Instead, a
> warning is emitted if the guest changes the device MTU after previously
> being given advice.
I don't see this as an error. Device might at best give a hint,
user/network admin always knows best.
>
> Signed-off-by: Aaron Conole <aconole@bytheb.org>
> ---
> v2:
> * Whitespace cleanup in the last hunk
> * Code style change around the pr_warn
> * Additional test for mtu change before printing warning
>
> drivers/net/virtio_net.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 767ab11..429fe01 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -146,6 +146,7 @@ struct virtnet_info {
> virtio_net_ctrl_ack ctrl_status;
> u8 ctrl_promisc;
> u8 ctrl_allmulti;
> + bool negotiated_mtu;
> };
>
> struct padded_vnet_hdr {
> @@ -1390,8 +1391,11 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
>
> static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
> {
> + struct virtnet_info *vi = netdev_priv(dev);
> if (new_mtu < MIN_MTU || new_mtu > MAX_MTU)
> return -EINVAL;
> + if ((vi->negotiated_mtu) && (dev->mtu != new_mtu))
> + pr_warn("changing mtu while the advised mtu bit exists.");
I don't really see why are we warning here. Just drop this chunk,
as well as the flag in struct virtnet_info.
> dev->mtu = new_mtu;
> return 0;
> }
> @@ -1836,6 +1840,13 @@ static int virtnet_probe(struct virtio_device *vdev)
> if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
> vi->has_cvq = true;
>
> + if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
> + vi->negotiated_mtu = true;
> + dev->mtu = virtio_cread16(vdev,
> + offsetof(struct virtio_net_config,
> + mtu));
> + }
> +
> if (vi->any_header_sg)
> dev->needed_headroom = vi->hdr_len;
>
> @@ -2019,6 +2030,7 @@ static unsigned int features[] = {
> VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
> VIRTIO_NET_F_CTRL_MAC_ADDR,
> VIRTIO_F_ANY_LAYOUT,
> + VIRTIO_NET_F_MTU,
> };
>
> static struct virtio_driver virtio_net_driver = {
> --
> 2.5.0
^ permalink raw reply
* Re: [virtio-dev] virtio-vsock live migration
From: Stefan Hajnoczi @ 2016-03-16 14:32 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: virtio-dev, Claudio Imbrenda, Christian Borntraeger,
Matt Benjamin, virtualization, Christoffer Dall
In-Reply-To: <20160315180916-mutt-send-email-mst@redhat.com>
[-- Attachment #1.1: Type: text/plain, Size: 3907 bytes --]
On Tue, Mar 15, 2016 at 06:12:55PM +0200, Michael S. Tsirkin wrote:
> On Tue, Mar 15, 2016 at 03:15:29PM +0000, Stefan Hajnoczi wrote:
> > On Mon, Mar 14, 2016 at 01:13:24PM +0200, Michael S. Tsirkin wrote:
> > > On Thu, Mar 03, 2016 at 03:37:37PM +0000, Stefan Hajnoczi wrote:
> > > > Michael pointed out that the virtio-vsock draft specification does not
> > > > address live migration and in fact currently precludes migration.
> > > >
> > > > Migration is fundamental so the device specification at least mustn't
> > > > preclude it. Having brainstormed migration with Matthew Benjamin and
> > > > Michael Tsirkin, I am now summarizing the approach that I want to
> > > > include in the next draft specification.
> > > >
> > > > Feedback and comments welcome! In the meantime I will implement this in
> > > > code and update the draft specification.
> > >
> > > Most of the issue seems to be a consequence of using a 4 byte CID.
> > >
> > > I think the right thing to do is just to teach guests
> > > about 64 bit CIDs.
> > >
> > > For now, can we drop guest CID from guest to host communication completely,
> > > making CID only host-visible? Maybe leave the space
> > > in the packet so we can add CID there later.
> > > It seems that in theory this will allow changing CID
> > > during migration, transparently to the guest.
> > >
> > > Guest visible CID is required for guest to guest communication -
> > > but IIUC that is not currently supported.
> > > Maybe that can be made conditional on 64 bit addressing.
> > > Alternatively, it seems much easier to accept that these channels get broken
> > > across migration.
> >
> > I reached the conclusion that channels break across migration because:
> >
> > 1. 32-bit CIDs are in sockaddr_vm and we'd break AF_VSOCK ABI by
> > changing it to 64-bit. Application code would be specific
> > virtio-vsock and wouldn't work with other AF_VSOCK transports that
> > use the 32-bit sockaddr_vm struct.
>
> You don't have to repeat the IPv6 mistake. Make all 32 bit CIDs
> 64 bit CIDs by padding with 0s, then 64 bit apps can use
> any CID.
>
> Old 32 bit CID applications will not be able to use the extended
> addresses, but hardcoding bugs
> does not seem sane.
A mixed 32-bit and 64-bit CID world is complex. The host doesn't know
in advance whether all applications (especially inside the guest) will
support 64-bit CIDs or not. 32-bit CID applications won't work if a
64-bit CID has been assigned.
It also opens up the question how unique CIDs are allocated across
hosts.
Given that AF_VSOCK in Linux already exists in the 32-bit CID version,
I'd prefer to make virtio-vsock compatible with that for the time being.
Extensions can be added in the future but just implementing existing
AF_VSOCK semantics will already allow the applications to run.
> > 2. Dropping guest CIDs from the protocol breaks network protocols that
> > send addresses.
>
> Stick it in config space if you really have to.
> But why do you need it on each packet?
If packets are implicitly guest<->host then adding guest<->guest
communication requires a virtio spec change. If packets contain
source/destination CIDs then allowing/forbidding guest<->host or
guest<->guest communication is purely a host policy decision. I think
it's worth keeping that in from the start.
> > NFS and netperf are the first two protocols I looked
> > at and both transmit address information across the connection...
>
>
> Does netperf really attempt to get local IP
> and then send that inline within the connection?
Yes, netperf has separate control and data sockets. I think part of the
reason for this split is that the control connection can communicate the
address details for the data connection over a different protocol (TCP +
RDMA?), but I'm not sure.
Stefan
[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* Re: [virtio-dev] virtio-vsock live migration
From: Matt Benjamin @ 2016-03-16 14:58 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: virtio-dev, Michael S. Tsirkin, virtualization,
Christian Borntraeger, Claudio Imbrenda, Christoffer Dall
In-Reply-To: <20160316143200.GA689@stefanha-x1.localdomain>
Hi,
----- Original Message -----
> From: "Stefan Hajnoczi" <stefanha@redhat.com>
> To: "Michael S. Tsirkin" <mst@redhat.com>
> > > > I think the right thing to do is just to teach guests
> > > > about 64 bit CIDs.
> > > >
> > > > For now, can we drop guest CID from guest to host communication
> > > > completely,
> > > > making CID only host-visible? Maybe leave the space
> > > > in the packet so we can add CID there later.
> > > > It seems that in theory this will allow changing CID
> > > > during migration, transparently to the guest.
> > > >
> > > > Guest visible CID is required for guest to guest communication -
> > > > but IIUC that is not currently supported.
> > > > Maybe that can be made conditional on 64 bit addressing.
> > > > Alternatively, it seems much easier to accept that these channels get
> > > > broken
> > > > across migration.
> > >
> > > I reached the conclusion that channels break across migration because:
> > >
> > > 1. 32-bit CIDs are in sockaddr_vm and we'd break AF_VSOCK ABI by
> > > changing it to 64-bit. Application code would be specific
> > > virtio-vsock and wouldn't work with other AF_VSOCK transports that
> > > use the 32-bit sockaddr_vm struct.
> >
> > You don't have to repeat the IPv6 mistake. Make all 32 bit CIDs
> > 64 bit CIDs by padding with 0s, then 64 bit apps can use
> > any CID.
> >
> > Old 32 bit CID applications will not be able to use the extended
> > addresses, but hardcoding bugs
> > does not seem sane.
>
> A mixed 32-bit and 64-bit CID world is complex. The host doesn't know
> in advance whether all applications (especially inside the guest) will
> support 64-bit CIDs or not. 32-bit CID applications won't work if a
> 64-bit CID has been assigned.
>
> It also opens up the question how unique CIDs are allocated across
> hosts.
>
> Given that AF_VSOCK in Linux already exists in the 32-bit CID version,
> I'd prefer to make virtio-vsock compatible with that for the time being.
> Extensions can be added in the future but just implementing existing
> AF_VSOCK semantics will already allow the applications to run.
>
> > > 2. Dropping guest CIDs from the protocol breaks network protocols that
> > > send addresses.
> >
> > Stick it in config space if you really have to.
> > But why do you need it on each packet?
>
> If packets are implicitly guest<->host then adding guest<->guest
> communication requires a virtio spec change. If packets contain
> source/destination CIDs then allowing/forbidding guest<->host or
> guest<->guest communication is purely a host policy decision. I think
> it's worth keeping that in from the start.
I'm just the downstream consumer of vsock, but this was my intuition, as well.
Matt
>
> > > NFS and netperf are the first two protocols I looked
> > > at and both transmit address information across the connection...
> >
> >
> Stefan
>
--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103
http://www.redhat.com/en/technologies/storage
tel. 734-707-0660
fax. 734-769-8938
cel. 734-216-5309
^ permalink raw reply
* Re: [virtio-dev] virtio-vsock live migration
From: Michael S. Tsirkin @ 2016-03-16 15:05 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: virtio-dev, Claudio Imbrenda, Christian Borntraeger,
Matt Benjamin, virtualization, Christoffer Dall
In-Reply-To: <20160316143200.GA689@stefanha-x1.localdomain>
On Wed, Mar 16, 2016 at 02:32:00PM +0000, Stefan Hajnoczi wrote:
> On Tue, Mar 15, 2016 at 06:12:55PM +0200, Michael S. Tsirkin wrote:
> > On Tue, Mar 15, 2016 at 03:15:29PM +0000, Stefan Hajnoczi wrote:
> > > On Mon, Mar 14, 2016 at 01:13:24PM +0200, Michael S. Tsirkin wrote:
> > > > On Thu, Mar 03, 2016 at 03:37:37PM +0000, Stefan Hajnoczi wrote:
> > > > > Michael pointed out that the virtio-vsock draft specification does not
> > > > > address live migration and in fact currently precludes migration.
> > > > >
> > > > > Migration is fundamental so the device specification at least mustn't
> > > > > preclude it. Having brainstormed migration with Matthew Benjamin and
> > > > > Michael Tsirkin, I am now summarizing the approach that I want to
> > > > > include in the next draft specification.
> > > > >
> > > > > Feedback and comments welcome! In the meantime I will implement this in
> > > > > code and update the draft specification.
> > > >
> > > > Most of the issue seems to be a consequence of using a 4 byte CID.
> > > >
> > > > I think the right thing to do is just to teach guests
> > > > about 64 bit CIDs.
> > > >
> > > > For now, can we drop guest CID from guest to host communication completely,
> > > > making CID only host-visible? Maybe leave the space
> > > > in the packet so we can add CID there later.
> > > > It seems that in theory this will allow changing CID
> > > > during migration, transparently to the guest.
> > > >
> > > > Guest visible CID is required for guest to guest communication -
> > > > but IIUC that is not currently supported.
> > > > Maybe that can be made conditional on 64 bit addressing.
> > > > Alternatively, it seems much easier to accept that these channels get broken
> > > > across migration.
> > >
> > > I reached the conclusion that channels break across migration because:
> > >
> > > 1. 32-bit CIDs are in sockaddr_vm and we'd break AF_VSOCK ABI by
> > > changing it to 64-bit. Application code would be specific
> > > virtio-vsock and wouldn't work with other AF_VSOCK transports that
> > > use the 32-bit sockaddr_vm struct.
> >
> > You don't have to repeat the IPv6 mistake. Make all 32 bit CIDs
> > 64 bit CIDs by padding with 0s, then 64 bit apps can use
> > any CID.
> >
> > Old 32 bit CID applications will not be able to use the extended
> > addresses, but hardcoding bugs
> > does not seem sane.
>
> A mixed 32-bit and 64-bit CID world is complex. The host doesn't know
> in advance whether all applications (especially inside the guest) will
> support 64-bit CIDs or not. 32-bit CID applications won't work if a
> 64-bit CID has been assigned.
Only for guest to guest communication, correct?
Host can do dual addressing as well.
Applications that do not want connections to be broken
will use 64 bit addresses. Old applications will keep
running until you migrate.
> It also opens up the question how unique CIDs are allocated across
> hosts.
I think it's actually a good idea to define this, rather than
leave things in the air. For example, EUI-64 can be used.
> Given that AF_VSOCK in Linux already exists in the 32-bit CID version,
> I'd prefer to make virtio-vsock compatible with that for the time being.
Yes, so we cut corners in order to ship it quickly,
but that is implementation. Linux can be extended.
Why limit the protocol to follow current implementation bugs?
> Extensions can be added in the future but just implementing existing
> AF_VSOCK semantics will already allow the applications to run.
It's an important goal. At the spec level, I do not think
it is a good idea to put this limitation in, but users can
just use a subset of the available address space in order
to make existing apps work.
> > > 2. Dropping guest CIDs from the protocol breaks network protocols that
> > > send addresses.
> >
> > Stick it in config space if you really have to.
> > But why do you need it on each packet?
>
> If packets are implicitly guest<->host then adding guest<->guest
> communication requires a virtio spec change. If packets contain
> source/destination CIDs then allowing/forbidding guest<->host or
> guest<->guest communication is purely a host policy decision. I think
> it's worth keeping that in from the start.
OK.
> > > NFS and netperf are the first two protocols I looked
> > > at and both transmit address information across the connection...
> >
> >
> > Does netperf really attempt to get local IP
> > and then send that inline within the connection?
>
> Yes, netperf has separate control and data sockets. I think part of the
> reason for this split is that the control connection can communicate the
> address details for the data connection over a different protocol (TCP +
> RDMA?), but I'm not sure.
>
> Stefan
Thinking about it, netperf does not survive disconnects.
So the current protocol would be useless for it.
I am not sure about NFS but from (long past) experience it did not
attempt to re-resolve the name to address, so changing an
address would break it as well.
So I think these applications would have to use a 64 bit CID.
Why, then, do we care about one aspect of these applications
(creating connections) and not another (not breaking them)?
--
MST
^ permalink raw reply
* Re: [RFC v2 -next 1/2] virtio: Start feature MTU support
From: Stephen Hemminger @ 2016-03-16 18:23 UTC (permalink / raw)
To: Aaron Conole; +Cc: netdev, virtualization, linux-kernel, Michael S. Tsirkin
In-Reply-To: <1458075853-14789-2-git-send-email-aconole@redhat.com>
On Tue, 15 Mar 2016 17:04:12 -0400
Aaron Conole <aconole@redhat.com> wrote:
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -55,6 +55,7 @@
> #define VIRTIO_NET_F_MQ 22 /* Device supports Receive Flow
> * Steering */
> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
> +#define VIRTIO_NET_F_MTU 25 /* Device supports Default MTU Negotiation */
>
> #ifndef VIRTIO_NET_NO_LEGACY
> #define VIRTIO_NET_F_GSO 6 /* Host handles pkts w/ any GSO type */
> @@ -73,6 +74,8 @@ struct virtio_net_config {
> * Legal values are between 1 and 0x8000
> */
> __u16 max_virtqueue_pairs;
> + /* Default maximum transmit unit advice */
> + __u16 mtu;
> } __attribute__((packed));
>
> /*
You can't change user visible headers without breaking ABI.
This structure might be used by other user code. Also how can this
work if host is using old size of structure.
^ permalink raw reply
* Re: [RFC v2 -next 1/2] virtio: Start feature MTU support
From: Michael S. Tsirkin @ 2016-03-16 18:29 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20160316112314.4510d386@samsung9>
On Wed, Mar 16, 2016 at 11:23:14AM -0700, Stephen Hemminger wrote:
> On Tue, 15 Mar 2016 17:04:12 -0400
> Aaron Conole <aconole@redhat.com> wrote:
>
> > --- a/include/uapi/linux/virtio_net.h
> > +++ b/include/uapi/linux/virtio_net.h
> > @@ -55,6 +55,7 @@
> > #define VIRTIO_NET_F_MQ 22 /* Device supports Receive Flow
> > * Steering */
> > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
> > +#define VIRTIO_NET_F_MTU 25 /* Device supports Default MTU Negotiation */
> >
> > #ifndef VIRTIO_NET_NO_LEGACY
> > #define VIRTIO_NET_F_GSO 6 /* Host handles pkts w/ any GSO type */
> > @@ -73,6 +74,8 @@ struct virtio_net_config {
> > * Legal values are between 1 and 0x8000
> > */
> > __u16 max_virtqueue_pairs;
> > + /* Default maximum transmit unit advice */
> > + __u16 mtu;
> > } __attribute__((packed));
> >
> > /*
>
> You can't change user visible headers without breaking ABI.
> This structure might be used by other user code.
Then this userspace is broken.
If someone uses virtio code one has to follow virtio spec.
And Virtio spec makes it very clear that
1. config space size can change at any time
2. fields can only be accessed if the correct feature bit
has been both advertized in host bitmap and acknowledged
in guest bitmap
> Also how can this
> work if host is using old size of structure.
It works because access is guarded by feature bit check.
--
MST
^ permalink raw reply
* Re: [PATCH v1 11/19] zsmalloc: squeeze freelist into page->mapping
From: YiPing Xu @ 2016-03-17 12:09 UTC (permalink / raw)
To: Minchan Kim, Sergey Senozhatsky
Cc: Rik van Riel, aquini, rknize, Sergey Senozhatsky, Hugh Dickins,
linux-kernel, virtualization, bfields, linux-mm, Gioh Kim, koct9i,
Joonsoo Kim, Andrew Morton, jlayton, Vlastimil Babka, Mel Gorman
In-Reply-To: <20160315065126.GA3039@bbox>
On 2016/3/15 14:51, Minchan Kim wrote:
> On Tue, Mar 15, 2016 at 03:40:53PM +0900, Sergey Senozhatsky wrote:
>> On (03/11/16 16:30), Minchan Kim wrote:
>>> -static void *location_to_obj(struct page *page, unsigned long obj_idx)
>>> +static void objidx_to_page_and_ofs(struct size_class *class,
>>> + struct page *first_page,
>>> + unsigned long obj_idx,
>>> + struct page **obj_page,
>>> + unsigned long *ofs_in_page)
>>
>> this looks big; 5 params, function "returning" both page and offset...
>> any chance to split it in two steps, perhaps?
>
> Yes, it's rather ugly but I don't have a good idea.
> Feel free to suggest if you have a better idea.
>
>>
>> besides, it is more intuitive (at least to me) when 'offset'
>> shortened to 'offt', not 'ofs'.
the purpose to get 'obj_page' and 'ofs_in_page' is to map the page and
get the meta-data pointer in the page, so, we can finish this in a
single function.
just like this, and maybe we could have a better function name
static unsigned long *map_handle(struct size_class *class,
struct page *first_page, unsigned long obj_idx)
{
struct page *cursor = first_page;
unsigned long offset = obj_idx * class->size;
int nr_page = offset >> PAGE_SHIFT;
unsigned long offset_in_page = offset & ~PAGE_MASK;
void *addr;
int i;
if (class->huge) {
VM_BUG_ON_PAGE(!is_first_page(page), page);
return &page_private(page);
}
for (i = 0; i < nr_page; i++)
cursor = get_next_page(cursor);
addr = kmap_atomic(cursor);
return addr + offset_in_page;
}
static void unmap_handle(unsigned long *addr)
{
if (class->huge) {
return;
}
kunmap_atomic(addr & ~PAGE_MASK);
}
all functions called "objidx_to_page_and_ofs" could use it like this,
for example:
static unsigned long handle_from_obj(struct size_class *class,
struct page *first_page, int obj_idx)
{
unsigned long *head = map_handle(class, first_page, obj_idx);
if (*head & OBJ_ALLOCATED_TAG)
handle = *head & ~OBJ_ALLOCATED_TAG;
unmap_handle(*head);
return handle;
}
'freeze_zspage', u'nfreeze_zspage' use it in the same way.
but in 'obj_malloc', we still have to get the page to get obj.
obj = location_to_obj(m_page, obj);
> Indeed. I will change it to get_page_and_offset instead of
> abbreviation if we cannot refactor it more.
>
>>
>> -ss
>>
>>> {
>>> - unsigned long obj;
>>> + int i;
>>> + unsigned long ofs;
>>> + struct page *cursor;
>>> + int nr_page;
>>>
>>> - if (!page) {
>>> - VM_BUG_ON(obj_idx);
>>> - return NULL;
>>> - }
>>> + ofs = obj_idx * class->size;
>>> + cursor = first_page;
>>> + nr_page = ofs >> PAGE_SHIFT;
>>>
>>> - obj = page_to_pfn(page) << OBJ_INDEX_BITS;
>>> - obj |= ((obj_idx) & OBJ_INDEX_MASK);
>>> - obj <<= OBJ_TAG_BITS;
>>> + *ofs_in_page = ofs & ~PAGE_MASK;
>>> +
>>> + for (i = 0; i < nr_page; i++)
>>> + cursor = get_next_page(cursor);
>>>
>>> - return (void *)obj;
>>> + *obj_page = cursor;
>>> }
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
> .
>
^ permalink raw reply
* [PATCH net-next] virtio_net: replace netdev_alloc_skb_ip_align() with napi_alloc_skb()
From: Paolo Abeni @ 2016-03-17 14:44 UTC (permalink / raw)
To: netdev; +Cc: virtualization, Hannes Frederic Sowa, Michael S. Tsirkin
This gives small but noticeable rx performance improvement (2-3%)
and will allow exploiting future napi improvement.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
drivers/net/virtio_net.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index fb0eae4..49d84e5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -260,7 +260,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
p = page_address(page) + offset;
/* copy small packet so we can reuse these pages for small data */
- skb = netdev_alloc_skb_ip_align(vi->dev, GOOD_COPY_LEN);
+ skb = napi_alloc_skb(&rq->napi, GOOD_COPY_LEN);
if (unlikely(!skb))
return NULL;
--
1.8.3.1
^ permalink raw reply related
* Re: [RFC v2 -next 1/2] virtio: Start feature MTU support
From: Aaron Conole @ 2016-03-17 21:10 UTC (permalink / raw)
To: Stephen Hemminger
Cc: netdev, virtualization, linux-kernel, Michael S. Tsirkin
In-Reply-To: <20160316112314.4510d386@samsung9>
Stephen Hemminger <stephen@networkplumber.org> writes:
> On Tue, 15 Mar 2016 17:04:12 -0400
> Aaron Conole <aconole@redhat.com> wrote:
>
>> --- a/include/uapi/linux/virtio_net.h
>> +++ b/include/uapi/linux/virtio_net.h
>> @@ -55,6 +55,7 @@
>> #define VIRTIO_NET_F_MQ 22 /* Device supports Receive Flow
>> * Steering */
>> #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
>> +#define VIRTIO_NET_F_MTU 25 /* Device supports Default MTU Negotiation */
>>
>> #ifndef VIRTIO_NET_NO_LEGACY
>> #define VIRTIO_NET_F_GSO 6 /* Host handles pkts w/ any GSO type */
>> @@ -73,6 +74,8 @@ struct virtio_net_config {
>> * Legal values are between 1 and 0x8000
>> */
>> __u16 max_virtqueue_pairs;
>> + /* Default maximum transmit unit advice */
>> + __u16 mtu;
>> } __attribute__((packed));
>>
>> /*
>
> You can't change user visible headers without breaking ABI.
> This structure might be used by other user code. Also how can this
> work if host is using old size of structure.
How else can this field be added and remain compliant with the spec? The
spec requires that mtu be passed in the virtio_net_config field.
As for old sizeof, I think the absence of the VIRTIO_NET_F_MTU bit being
asserted is confirmation that mtu is not valid (at least, it is implied
in the spec).
Thanks so much for the review, Stephen!
-Aaron
^ permalink raw reply
* Re: [RFC v2 -next 2/2] virtio_net: Read the advised MTU
From: Aaron Conole @ 2016-03-17 21:15 UTC (permalink / raw)
To: Sergei Shtylyov; +Cc: netdev, virtualization, linux-kernel, Michael S. Tsirkin
In-Reply-To: <56E9699A.9030908@cogentembedded.com>
Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> writes:
> Hello.
>
> On 3/16/2016 12:04 AM, Aaron Conole wrote:
>
>> This patch checks the feature bit for the VIRTIO_NET_F_MTU feature. If it
>> exists, read the advised MTU and use it.
>>
>> No proper error handling is provided for the case where a user changes the
>> negotiated MTU. A future commit will add proper error handling. Instead, a
>> warning is emitted if the guest changes the device MTU after previously
>> being given advice.
>>
>> Signed-off-by: Aaron Conole <aconole@bytheb.org>
>> ---
>> v2:
>> * Whitespace cleanup in the last hunk
>> * Code style change around the pr_warn
>> * Additional test for mtu change before printing warning
>>
>> drivers/net/virtio_net.c | 12 ++++++++++++
>> 1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 767ab11..429fe01 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
> [...]
>> @@ -1390,8 +1391,11 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
>>
>> static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
>> {
>> + struct virtnet_info *vi = netdev_priv(dev);
>> if (new_mtu < MIN_MTU || new_mtu > MAX_MTU)
>> return -EINVAL;
>> + if ((vi->negotiated_mtu) && (dev->mtu != new_mtu))
>
> Inner parens not needed, please be consistent with the code above.
Okay, I will do that.
Thanks so much for the review (again), Sergei!
-Aaron
> [...]
>
> MBR, Sergei
^ permalink raw reply
* Re: [RFC v2 -next 2/2] virtio_net: Read the advised MTU
From: Aaron Conole @ 2016-03-17 21:20 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20160316142446.GA4446@redhat.com>
"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Tue, Mar 15, 2016 at 05:04:13PM -0400, Aaron Conole wrote:
>> This patch checks the feature bit for the VIRTIO_NET_F_MTU feature. If it
>> exists, read the advised MTU and use it.
>>
>> No proper error handling is provided for the case where a user changes the
>> negotiated MTU. A future commit will add proper error handling. Instead, a
>> warning is emitted if the guest changes the device MTU after previously
>> being given advice.
>
> I don't see this as an error. Device might at best give a hint,
> user/network admin always knows best.
>
>>
>> Signed-off-by: Aaron Conole <aconole@bytheb.org>
>> ---
>> v2:
>> * Whitespace cleanup in the last hunk
>> * Code style change around the pr_warn
>> * Additional test for mtu change before printing warning
>>
>> drivers/net/virtio_net.c | 12 ++++++++++++
>> 1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 767ab11..429fe01 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -146,6 +146,7 @@ struct virtnet_info {
>> virtio_net_ctrl_ack ctrl_status;
>> u8 ctrl_promisc;
>> u8 ctrl_allmulti;
>> + bool negotiated_mtu;
>> };
>>
>> struct padded_vnet_hdr {
>> @@ -1390,8 +1391,11 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
>>
>> static int virtnet_change_mtu(struct net_device *dev, int new_mtu)
>> {
>> + struct virtnet_info *vi = netdev_priv(dev);
>> if (new_mtu < MIN_MTU || new_mtu > MAX_MTU)
>> return -EINVAL;
>> + if ((vi->negotiated_mtu) && (dev->mtu != new_mtu))
>> + pr_warn("changing mtu while the advised mtu bit exists.");
>
> I don't really see why are we warning here. Just drop this chunk,
> as well as the flag in struct virtnet_info.
Okay. I was warning because the user is changing this after telling to
use something different - but if you don't think it's an error, I will
drop it.
Thanks so much for the review, Michael!
-Aaron
>> dev->mtu = new_mtu;
>> return 0;
>> }
>> @@ -1836,6 +1840,13 @@ static int virtnet_probe(struct virtio_device *vdev)
>> if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
>> vi->has_cvq = true;
>>
>> + if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
>> + vi->negotiated_mtu = true;
>> + dev->mtu = virtio_cread16(vdev,
>> + offsetof(struct virtio_net_config,
>> + mtu));
>> + }
>> +
>> if (vi->any_header_sg)
>> dev->needed_headroom = vi->hdr_len;
>>
>> @@ -2019,6 +2030,7 @@ static unsigned int features[] = {
>> VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
>> VIRTIO_NET_F_CTRL_MAC_ADDR,
>> VIRTIO_F_ANY_LAYOUT,
>> + VIRTIO_NET_F_MTU,
>> };
>>
>> static struct virtio_driver virtio_net_driver = {
>> --
>> 2.5.0
^ permalink raw reply
* Re: [RFC v2 -next 0/2] virtio-net: Advised MTU feature
From: Aaron Conole @ 2016-03-17 21:24 UTC (permalink / raw)
To: Rick Jones; +Cc: netdev, virtualization, linux-kernel, Michael S. Tsirkin
In-Reply-To: <56E87FEC.2030600@hpe.com>
Rick Jones <rick.jones2@hpe.com> writes:
> On 03/15/2016 02:04 PM, Aaron Conole wrote:
>> The following series adds the ability for a hypervisor to set an MTU on the
>> guest during feature negotiation phase. This is useful for VM orchestration
>> when, for instance, tunneling is involved and the MTU of the various systems
>> should be homogenous.
>>
>> The first patch adds the feature bit as described in the proposed VFIO spec
>> addition found at
>> https://lists.oasis-open.org/archives/virtio-dev/201603/msg00001.html
>>
>> The second patch adds a user of the bit, and a warning when the guest changes
>> the MTU from the hypervisor advised MTU. Future patches may add more thorough
>> error handling.
>
> How do you see this interacting with VMs getting MTU settings via DHCP?
This is intended for networks where the VMs are not given MTU via
DHCP. I don't think it should negatively interfere with such a
network. Does that make sense?
-Aaron
> rick jones
>
>>
>> v2:
>> * Whitespace and code style cleanups from Sergei Shtylyov and Paolo Abeni
>> * Additional test before printing a warning
>>
>> Aaron Conole (2):
>> virtio: Start feature MTU support
>> virtio_net: Read the advised MTU
>>
>> drivers/net/virtio_net.c | 12 ++++++++++++
>> include/uapi/linux/virtio_net.h | 3 +++
>> 2 files changed, 15 insertions(+)
>>
^ permalink raw reply
* Re: [RFC v2 -next 0/2] virtio-net: Advised MTU feature
From: Aaron Conole @ 2016-03-17 21:24 UTC (permalink / raw)
To: Pankaj Gupta; +Cc: netdev, virtualization, linux-kernel, Michael S. Tsirkin
In-Reply-To: <1969060351.30231176.1458104130844.JavaMail.zimbra@redhat.com>
Pankaj Gupta <pagupta@redhat.com> writes:
>>
>> The following series adds the ability for a hypervisor to set an MTU on the
>> guest during feature negotiation phase. This is useful for VM orchestration
>> when, for instance, tunneling is involved and the MTU of the various systems
>> should be homogenous.
>>
>> The first patch adds the feature bit as described in the proposed VFIO spec
>
> You mean VIRTIO spec?
Yes, sorry.
>> addition found at
>> https://lists.oasis-open.org/archives/virtio-dev/201603/msg00001.html
>>
>> The second patch adds a user of the bit, and a warning when the guest changes
>> the MTU from the hypervisor advised MTU. Future patches may add more thorough
>> error handling.
>>
>> v2:
>> * Whitespace and code style cleanups from Sergei Shtylyov and Paolo Abeni
>> * Additional test before printing a warning
>>
>> Aaron Conole (2):
>> virtio: Start feature MTU support
>> virtio_net: Read the advised MTU
>>
>> drivers/net/virtio_net.c | 12 ++++++++++++
>> include/uapi/linux/virtio_net.h | 3 +++
>> 2 files changed, 15 insertions(+)
>>
>> --
>> 2.5.0
>>
>>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox