public inbox for kvm@vger.kernel.org
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Eddie Chapman <eddie@ehuk.net>
Cc: kvm@vger.kernel.org, Romain Francoise <romain@orebokech.com>,
	Michael Mueller <mimu@linux.vnet.ibm.com>,
	mityapetuhov@gmail.com
Subject: Re: Possible to backport this vhost-net fix to 3.10?
Date: Sun, 5 Oct 2014 18:44:45 +0300	[thread overview]
Message-ID: <20141005154445.GA14840@redhat.com> (raw)
In-Reply-To: <543164EF.4070007@ehuk.net>

On Sun, Oct 05, 2014 at 04:34:07PM +0100, Eddie Chapman wrote:
> 
> On 04/10/14 19:35, Michael S. Tsirkin wrote:
> >On Sat, Oct 04, 2014 at 12:38:24AM +0100, Eddie Chapman wrote:
> >>Hi,
> >>
> >>I've been regularly seeing on the 3.10 stable kernels the same problem as
> >>reported by Romain Francoise here:
> >>https://lkml.org/lkml/2013/1/23/492
> >>
> >>An example from my setup is at the bottom of this mail. It's a problem
> >>because qemu fails to run when it hits this; the only solution is to do
> >>all qemu launches with vhost=off after it happens. It starts happening
> >>after the machine has been running for a while and a few VMs have been
> >>started.
> >>I guess that is the fragmentation issue as the machine is never under any
> >>serious memory pressure when it happens.
> >>
> >>I see this set of changes for 3.16 has a couple of fixes which appear to
> >>address the problem:
> >>https://lkml.org/lkml/2014/6/11/302
> >>
> >>I was just wondering if there are any plans to backport these to 3.10, or
> >>even if it is actually possible (I'm not a kernel dev so wouldn't know)?
> >>
> >>If not, are there any other workarounds other than vhost=off?
> >>
> >>thanks,
> >>Eddie
> >
> >Yes, these patches aren't hard to backport.
> >Go ahead and post the backport, I'll review and ack.
> 
> Thanks Michael,
> 
> Actually I just discovered that Dmitry Petuhov backported
> 23cc5a991c7a9fb7e6d6550e65cee4f4173111c5 ("vhost-net: extend device
> allocation to vmalloc") last month to the Proxmox 3.10 kernel
> https://www.mail-archive.com/pve-devel@pve.proxmox.com/msg08873.html
> 
> He appears to have tested it quite thoroughly himself with a heavy workload,
> with no problems, though it hasn't gone into a Proxmox release yet.
> 
> His patch applies to vanilla kernel.org 3.10.55 with only slight fuzziness,
> so I've done some slight whitespace cleanup so it applies cleanly. Vanilla
> 3.10.55 compiles fine on my machine with it, without any errors or warnings.
> Is it OK (below)? Not sure it will meet stable submission rules?

OK but pls cleanup indentation, it's all scrambled.  You'll also need to
add proper attribution (using >From: header), your signature etc.


> 
> Dmitry also says that d04257b07f2362d4eb550952d5bf5f4241a8046d ("vhost-net:
> don't open-code kvfree") is not applicable in 3.10 because there's no
> open-coded kvfree() function (this appears in v3.15-rc5).

Yes that's just a cleanup, we don't do these in stable.

> Have added Dmitry to CC.
> 
> thanks,
> Eddie
> --- a/drivers/vhost/net.c	2014-10-05 15:34:12.282126999 +0100
> +++ b/drivers/vhost/net.c	2014-10-05 15:34:15.862140883 +0100
> @@ -18,6 +18,7 @@
>  #include <linux/rcupdate.h>
>  #include <linux/file.h>
>  #include <linux/slab.h>
> +#include <linux/vmalloc.h>
> 
>  #include <linux/net.h>
>  #include <linux/if_packet.h>
> @@ -707,18 +708,30 @@
>  	handle_rx(net);
>  }
> 
> +static void vhost_net_free(void *addr)
> +{
> +	if (is_vmalloc_addr(addr))
> +		vfree(addr);
> +	else
> +		kfree(addr);
> +}
> +
>  static int vhost_net_open(struct inode *inode, struct file *f)
>  {
> -	struct vhost_net *n = kmalloc(sizeof *n, GFP_KERNEL);
> +	struct vhost_net *n;
>  	struct vhost_dev *dev;
>  	struct vhost_virtqueue **vqs;
>  	int r, i;
> 
> -	if (!n)
> -		return -ENOMEM;
> +	n = kmalloc(sizeof *n, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
> +	if (!n) {
> +		n = vmalloc(sizeof *n);
> +		if (!n)
> +			return -ENOMEM;
> +	}
>  	vqs = kmalloc(VHOST_NET_VQ_MAX * sizeof(*vqs), GFP_KERNEL);
>  	if (!vqs) {
> -		kfree(n);
> +		vhost_net_free(n);
>  		return -ENOMEM;
>  	}
> 
> @@ -737,7 +750,7 @@
>  	}
>  	r = vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
>  	if (r < 0) {
> -		kfree(n);
> +		vhost_net_free(n);
>  		kfree(vqs);
>  		return r;
>  	}
> @@ -840,7 +853,7 @@
>  	 * since jobs can re-queue themselves. */
>  	vhost_net_flush(n);
>  	kfree(n->dev.vqs);
> -	kfree(n);
> +	vhost_net_free(n);
>  	return 0;
>  }
> 
> 
> >
> >
> >>[1948751.794040] qemu-system-x86: page allocation failure: order:4,
> >>mode:0x1040d0
> >>[1948751.810341] CPU: 4 PID: 41198 Comm: qemu-system-x86 Not tainted
> >>3.10.53-rc1 #3
> >>[1948751.826846] Hardware name: Intel Corporation S1200BTL/S1200BTL, BIOS
> >>S1200BT.86B.02.00.0041.120520121743 12/05/2012
> >>[1948751.847285]  0000000000000004 ffff8802eaf3b9d8 ffffffff8162ff4d
> >>ffff8802eaf3ba68
> >>[1948751.864257]  ffffffff810ab771 0000000000000001 ffff8802eaf3bb48
> >>ffff8802eaf3ba68
> >>[1948751.881209]  ffffffff810abe68 ffffffff81ca2f40 ffffffff00000000
> >>0000000200000040
> >>[1948751.898276] Call Trace:
> >>[1948751.909628]  [<ffffffff8162ff4d>] dump_stack+0x19/0x1c
> >>[1948751.924284]  [<ffffffff810ab771>] warn_alloc_failed+0x111/0x126
> >>[1948751.939774]  [<ffffffff810abe68>] ?
> >>__alloc_pages_direct_compact+0x181/0x198
> >>[1948751.956650]  [<ffffffff810ac5ae>] __alloc_pages_nodemask+0x72f/0x77c
> >>[1948751.972853]  [<ffffffff810ac676>] __get_free_pages+0x12/0x41
> >>[1948751.988297]  [<ffffffffa04ac71b>] vhost_net_open+0x23/0x171 [vhost_net]
> >>[1948752.004938]  [<ffffffff8130d6c3>] misc_open+0x119/0x17d
> >>[1948752.020111]  [<ffffffff810e99b4>] chrdev_open+0x134/0x155
> >>[1948752.035604]  [<ffffffff81053193>] ? lg_local_unlock+0x1e/0x31
> >>[1948752.051436]  [<ffffffff810e9880>] ? cdev_put+0x24/0x24
> >>[1948752.066540]  [<ffffffff810e46b8>] do_dentry_open+0x15c/0x20f
> >>[1948752.082214]  [<ffffffff810e484b>] finish_open+0x34/0x3f
> >>[1948752.097234]  [<ffffffff810f2737>] do_last+0x996/0xbcb
> >>[1948752.111983]  [<ffffffff810ef98e>] ? link_path_walk+0x5e/0x791
> >>[1948752.127447]  [<ffffffff810f0296>] ? path_init+0x11d/0x403
> >>[1948752.142517]  [<ffffffff810f2a32>] path_openat+0xc6/0x43b
> >>[1948752.157207]  [<ffffffff81070f08>] ? __lock_acquire+0x9ae/0xa4a
> >>[1948752.172369]  [<ffffffff815ac2ef>] ? rtnl_unlock+0x9/0xb
> >>[1948752.186893]  [<ffffffff810f2eac>] do_filp_open+0x38/0x84
> >>[1948752.201503]  [<ffffffff81633673>] ? _raw_spin_unlock+0x26/0x2a
> >>[1948752.216719]  [<ffffffff810fdfef>] ? __alloc_fd+0xf6/0x10a
> >>[1948752.231521]  [<ffffffff810e437c>] do_sys_open+0x114/0x1a6
> >>[1948752.246396]  [<ffffffff810e4438>] SyS_open+0x19/0x1b
> >>[1948752.260709]  [<ffffffff816341d2>] system_call_fastpath+0x16/0x1b
> >

Thread overview: 7+ messages
2014-10-03 23:38 Possible to backport this vhost-net fix to 3.10? Eddie Chapman
2014-10-04 18:35 ` Michael S. Tsirkin
2014-10-05 15:34   ` Eddie Chapman
2014-10-05 15:44     ` Michael S. Tsirkin [this message]
2014-10-07  3:42       ` Dmitry Petuhov
2014-10-07  3:47         ` Dmitry Petuhov
2014-10-07 12:34           ` Eddie Chapman
