From: "Michael S. Tsirkin" <mst@redhat.com>
To: Eddie Chapman <eddie@ehuk.net>
Cc: kvm@vger.kernel.org, Romain Francoise <romain@orebokech.com>,
	Michael Mueller <mimu@linux.vnet.ibm.com>,
	mityapetuhov@gmail.com
Subject: Re: Possible to backport this vhost-net fix to 3.10?
Date: Sun, 5 Oct 2014 18:44:45 +0300
Message-ID: <20141005154445.GA14840@redhat.com>
In-Reply-To: <543164EF.4070007@ehuk.net>

On Sun, Oct 05, 2014 at 04:34:07PM +0100, Eddie Chapman wrote:
> 
> On 04/10/14 19:35, Michael S. Tsirkin wrote:
> >On Sat, Oct 04, 2014 at 12:38:24AM +0100, Eddie Chapman wrote:
> >>Hi,
> >>
> >>I've been regularly seeing on the 3.10 stable kernels the same problem as
> >>reported by Romain Francoise here:
> >>https://lkml.org/lkml/2013/1/23/492
> >>
> >>An example from my setup is at the bottom of this mail. It's a problem as
> >>qemu fails to run when it hits this, only solution is to do all qemu
> >>launches with vhost=off after it happens. It starts happening after the
> >>machine has been running for a while and after a few VMs have been started.
> >>I guess that is the fragmentation issue as the machine is never under any
> >>serious memory pressure when it happens.
> >>
> >>I see this set of changes for 3.16 has a couple of fixes which appear to
> >>address the problem:
> >>https://lkml.org/lkml/2014/6/11/302
> >>
> >>I was just wondering if there are any plans to backport these to 3.10, or
> >>even if it is actually possible (I'm not a kernel dev so wouldn't know)?
> >>
> >>If not, are there any other workarounds other than vhost=off?
> >>
> >>thanks,
> >>Eddie
> >
> >Yes, these patches aren't hard to backport.
> >Go ahead and post the backport, I'll review and ack.
> 
> Thanks Michael,
> 
> Actually I just discovered that Dmitry Petuhov backported
> 23cc5a991c7a9fb7e6d6550e65cee4f4173111c5 ("vhost-net: extend device
> allocation to vmalloc") last month to the Proxmox 3.10 kernel
> https://www.mail-archive.com/pve-devel@pve.proxmox.com/msg08873.html
> 
> He appears to have tested it quite thoroughly himself with a heavy workload,
> with no problems, though it hasn't gone into a Proxmox release yet.
> 
> His patch applies to vanilla kernel.org 3.10.55 with only slight fuzziness,
> so I've done some slight white space cleanup so it applies cleanly. vanilla
> 3.10.55 compiles fine on my machine without any errors or warnings with it.
> Is it OK (below)? Not sure it will meet stable submission rules?

OK, but please clean up the indentation; it's all scrambled.  You'll also
need to add proper attribution (using a From: header), your signature, etc.


> 
> Dmitry also says that d04257b07f2362d4eb550952d5bf5f4241a8046d ("vhost-net:
> don't open-code kvfree") is not applicable in 3.10 because there's no
> open-coded kvfree() function (this appears in v3.15-rc5).

Yes that's just a cleanup, we don't do these in stable.
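
For anyone following the thread, the pattern the backport relies on (try the
physically contiguous allocator first, fall back to a non-contiguous one, and
pick the matching free routine later) can be sketched in userspace C. This is
only an illustration under stated assumptions: contig_alloc(),
fallback_alloc(), is_fallback_addr(), dev_alloc(), dev_free(), CONTIG_LIMIT
and the tagging header are hypothetical stand-ins for kmalloc(), vmalloc(),
is_vmalloc_addr() and the vhost_net_free() helper in the patch. The kernel
needs no header because is_vmalloc_addr() looks at the address range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical limit: the largest request the simulated "contiguous"
 * allocator can still satisfy, standing in for memory fragmentation. */
#define CONTIG_LIMIT 4096

/* Hypothetical tag stored in front of every block. The union keeps the
 * payload aligned for any object type. */
union alloc_hdr {
	bool from_fallback;
	max_align_t align;
};

/* Counters so a caller can observe which free path was taken. */
int contig_frees;
int fallback_frees;

static void *tagged_alloc(size_t size, bool from_fallback)
{
	union alloc_hdr *hdr = malloc(sizeof(*hdr) + size);

	if (!hdr)
		return NULL;
	hdr->from_fallback = from_fallback;
	return hdr + 1;		/* payload starts right after the tag */
}

/* Stand-in for kmalloc(): fails for large requests. */
void *contig_alloc(size_t size)
{
	if (size > CONTIG_LIMIT)
		return NULL;
	return tagged_alloc(size, false);
}

/* Stand-in for vmalloc(): accepts any size malloc() can serve. */
void *fallback_alloc(size_t size)
{
	return tagged_alloc(size, true);
}

/* Stand-in for is_vmalloc_addr(): reads the tag instead of the address. */
bool is_fallback_addr(const void *addr)
{
	assert(addr != NULL);
	return ((const union alloc_hdr *)addr - 1)->from_fallback;
}

/* Same shape as the allocation in the patched vhost_net_open(). */
void *dev_alloc(size_t size)
{
	void *p = contig_alloc(size);

	if (!p)
		p = fallback_alloc(size);
	return p;
}

/* Same shape as vhost_net_free(): choose the free path per allocation. */
void dev_free(void *addr)
{
	if (!addr)
		return;
	if (is_fallback_addr(addr))
		fallback_frees++;
	else
		contig_frees++;
	free((union alloc_hdr *)addr - 1);
}
```

With these stand-ins, dev_alloc(512) is served by the contiguous path, while
dev_alloc(65536) (the size of an order-4 request with 4 KiB pages, as in the
trace) falls through to the fallback, and dev_free() handles both.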

> Have added Dmitry to CC.
> 
> thanks,
> Eddie
> --- a/drivers/vhost/net.c	2014-10-05 15:34:12.282126999 +0100
> +++ b/drivers/vhost/net.c	2014-10-05 15:34:15.862140883 +0100
> @@ -18,6 +18,7 @@
>  #include <linux/rcupdate.h>
>  #include <linux/file.h>
>  #include <linux/slab.h>
> +#include <linux/vmalloc.h>
> 
>  #include <linux/net.h>
>  #include <linux/if_packet.h>
> @@ -707,18 +708,30 @@
>  	handle_rx(net);
>  }
> 
> +static void vhost_net_free(void *addr)
> +{
> +	if (is_vmalloc_addr(addr))
> +	vfree(addr);
> +	else
> +	kfree(addr);
> +}
> +
>  static int vhost_net_open(struct inode *inode, struct file *f)
>  {
> -	struct vhost_net *n = kmalloc(sizeof *n, GFP_KERNEL);
> +	struct vhost_net *n;
>  	struct vhost_dev *dev;
>  	struct vhost_virtqueue **vqs;
>  	int r, i;
> 
> -	if (!n)
> -		return -ENOMEM;
> +		n = kmalloc(sizeof *n, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
> +		if (!n) {
> +			n = vmalloc(sizeof *n);
> +			if (!n)
> +			return -ENOMEM;
> +		}
>  	vqs = kmalloc(VHOST_NET_VQ_MAX * sizeof(*vqs), GFP_KERNEL);
>  	if (!vqs) {
> -		kfree(n);
> +		vhost_net_free(n);
>  		return -ENOMEM;
>  	}
> 
> @@ -737,7 +750,7 @@
>  	}
>  	r = vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
>  	if (r < 0) {
> -		kfree(n);
> +		vhost_net_free(n);
>  		kfree(vqs);
>  		return r;
>  	}
> @@ -840,7 +853,7 @@
>  	 * since jobs can re-queue themselves. */
>  	vhost_net_flush(n);
>  	kfree(n->dev.vqs);
> -	kfree(n);
> +	vhost_net_free(n);
>  	return 0;
>  }
> 
> 
> >
> >
> >>[1948751.794040] qemu-system-x86: page allocation failure: order:4,
> >>mode:0x1040d0
> >>[1948751.810341] CPU: 4 PID: 41198 Comm: qemu-system-x86 Not tainted
> >>3.10.53-rc1 #3
> >>[1948751.826846] Hardware name: Intel Corporation S1200BTL/S1200BTL, BIOS
> >>S1200BT.86B.02.00.0041.120520121743 12/05/2012
> >>[1948751.847285]  0000000000000004 ffff8802eaf3b9d8 ffffffff8162ff4d
> >>ffff8802eaf3ba68
> >>[1948751.864257]  ffffffff810ab771 0000000000000001 ffff8802eaf3bb48
> >>ffff8802eaf3ba68
> >>[1948751.881209]  ffffffff810abe68 ffffffff81ca2f40 ffffffff00000000
> >>0000000200000040
> >>[1948751.898276] Call Trace:
> >>[1948751.909628]  [<ffffffff8162ff4d>] dump_stack+0x19/0x1c
> >>[1948751.924284]  [<ffffffff810ab771>] warn_alloc_failed+0x111/0x126
> >>[1948751.939774]  [<ffffffff810abe68>] ?
> >>__alloc_pages_direct_compact+0x181/0x198
> >>[1948751.956650]  [<ffffffff810ac5ae>] __alloc_pages_nodemask+0x72f/0x77c
> >>[1948751.972853]  [<ffffffff810ac676>] __get_free_pages+0x12/0x41
> >>[1948751.988297]  [<ffffffffa04ac71b>] vhost_net_open+0x23/0x171 [vhost_net]
> >>[1948752.004938]  [<ffffffff8130d6c3>] misc_open+0x119/0x17d
> >>[1948752.020111]  [<ffffffff810e99b4>] chrdev_open+0x134/0x155
> >>[1948752.035604]  [<ffffffff81053193>] ? lg_local_unlock+0x1e/0x31
> >>[1948752.051436]  [<ffffffff810e9880>] ? cdev_put+0x24/0x24
> >>[1948752.066540]  [<ffffffff810e46b8>] do_dentry_open+0x15c/0x20f
> >>[1948752.082214]  [<ffffffff810e484b>] finish_open+0x34/0x3f
> >>[1948752.097234]  [<ffffffff810f2737>] do_last+0x996/0xbcb
> >>[1948752.111983]  [<ffffffff810ef98e>] ? link_path_walk+0x5e/0x791
> >>[1948752.127447]  [<ffffffff810f0296>] ? path_init+0x11d/0x403
> >>[1948752.142517]  [<ffffffff810f2a32>] path_openat+0xc6/0x43b
> >>[1948752.157207]  [<ffffffff81070f08>] ? __lock_acquire+0x9ae/0xa4a
> >>[1948752.172369]  [<ffffffff815ac2ef>] ? rtnl_unlock+0x9/0xb
> >>[1948752.186893]  [<ffffffff810f2eac>] do_filp_open+0x38/0x84
> >>[1948752.201503]  [<ffffffff81633673>] ? _raw_spin_unlock+0x26/0x2a
> >>[1948752.216719]  [<ffffffff810fdfef>] ? __alloc_fd+0xf6/0x10a
> >>[1948752.231521]  [<ffffffff810e437c>] do_sys_open+0x114/0x1a6
> >>[1948752.246396]  [<ffffffff810e4438>] SyS_open+0x19/0x1b
> >>[1948752.260709]  [<ffffffff816341d2>] system_call_fastpath+0x16/0x1b
> >
