From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Michael S. Tsirkin"
Subject: Re: Possible to backport this vhost-net fix to 3.10?
Date: Sun, 5 Oct 2014 18:44:45 +0300
Message-ID: <20141005154445.GA14840@redhat.com>
References: <542F3370.1090405@ehuk.net> <20141004183508.GA15194@redhat.com>
 <543164EF.4070007@ehuk.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: kvm@vger.kernel.org, Romain Francoise, Michael Mueller,
 mityapetuhov@gmail.com
To: Eddie Chapman
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:8414 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751625AbaJEPla
 (ORCPT ); Sun, 5 Oct 2014 11:41:30 -0400
Content-Disposition: inline
In-Reply-To: <543164EF.4070007@ehuk.net>
Sender: kvm-owner@vger.kernel.org
List-ID:

On Sun, Oct 05, 2014 at 04:34:07PM +0100, Eddie Chapman wrote:
> 
> On 04/10/14 19:35, Michael S. Tsirkin wrote:
> >On Sat, Oct 04, 2014 at 12:38:24AM +0100, Eddie Chapman wrote:
> >>Hi,
> >>
> >>I've been regularly seeing on the 3.10 stable kernels the same problem as
> >>reported by Romain Francoise here:
> >>https://lkml.org/lkml/2013/1/23/492
> >>
> >>An example from my setup is at the bottom of this mail. It's a problem, as
> >>qemu fails to run when it hits this; the only solution is to do all qemu
> >>launches with vhost=off after it happens. It starts happening after the
> >>machine has been running for a while and after a few VMs have been started.
> >>I guess that is the fragmentation issue, as the machine is never under any
> >>serious memory pressure when it happens.
> >>
> >>I see this set of changes for 3.16 has a couple of fixes which appear to
> >>address the problem:
> >>https://lkml.org/lkml/2014/6/11/302
> >>
> >>I was just wondering if there are any plans to backport these to 3.10, or
> >>even if it is actually possible (I'm not a kernel dev so wouldn't know)?
> >>
> >>If not, are there any other workarounds other than vhost=off?
> >>
> >>thanks,
> >>Eddie
> >
> >Yes, these patches aren't hard to backport.
> >Go ahead and post the backport, I'll review and ack.
> 
> Thanks Michael,
> 
> Actually I just discovered that Dmitry Petuhov backported
> 23cc5a991c7a9fb7e6d6550e65cee4f4173111c5 ("vhost-net: extend device
> allocation to vmalloc") last month to the Proxmox 3.10 kernel:
> https://www.mail-archive.com/pve-devel@pve.proxmox.com/msg08873.html
> 
> He appears to have tested it quite thoroughly himself with a heavy
> workload, with no problems, though it hasn't gone into a Proxmox release
> yet.
> 
> His patch applies to vanilla kernel.org 3.10.55 with only slight fuzziness,
> so I've done some slight whitespace cleanup so that it applies cleanly.
> Vanilla 3.10.55 compiles fine on my machine with it, without any errors or
> warnings. Is it OK (below)? I'm not sure it will meet the stable submission
> rules?

OK, but please clean up the indentation; it's all scrambled.
You'll also need to add proper attribution (using a >From: header),
your signature, etc.

> 
> Dmitry also says that d04257b07f2362d4eb550952d5bf5f4241a8046d ("vhost-net:
> don't open-code kvfree") is not applicable to 3.10, because kvfree() itself
> doesn't exist there (it first appears in v3.15-rc5).

Yes, that's just a cleanup; we don't do these in stable.

> Have added Dmitry to CC.
> 
> thanks,
> Eddie

> --- a/drivers/vhost/net.c	2014-10-05 15:34:12.282126999 +0100
> +++ b/drivers/vhost/net.c	2014-10-05 15:34:15.862140883 +0100
> @@ -18,6 +18,7 @@
>  #include
>  #include
>  #include
> +#include <linux/vmalloc.h>
> 
>  #include
>  #include
> @@ -707,18 +708,30 @@
>  		handle_rx(net);
>  }
> 
> +static void vhost_net_free(void *addr)
> +{
> +	if (is_vmalloc_addr(addr))
> +		vfree(addr);
> +	else
> +		kfree(addr);
> +}
> +
>  static int vhost_net_open(struct inode *inode, struct file *f)
>  {
> -	struct vhost_net *n = kmalloc(sizeof *n, GFP_KERNEL);
> +	struct vhost_net *n;
>  	struct vhost_dev *dev;
>  	struct vhost_virtqueue **vqs;
>  	int r, i;
> 
> -	if (!n)
> -		return -ENOMEM;
> +	n = kmalloc(sizeof *n, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
> +	if (!n) {
> +		n = vmalloc(sizeof *n);
> +		if (!n)
> +			return -ENOMEM;
> +	}
>  	vqs = kmalloc(VHOST_NET_VQ_MAX * sizeof(*vqs), GFP_KERNEL);
>  	if (!vqs) {
> -		kfree(n);
> +		vhost_net_free(n);
>  		return -ENOMEM;
>  	}
> 
> @@ -737,7 +750,7 @@
>  	}
>  	r = vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
>  	if (r < 0) {
> -		kfree(n);
> +		vhost_net_free(n);
>  		kfree(vqs);
>  		return r;
>  	}
> 
> @@ -840,7 +853,7 @@
>  	 * since jobs can re-queue themselves.
>  	 */
>  	vhost_net_flush(n);
>  	kfree(n->dev.vqs);
> -	kfree(n);
> +	vhost_net_free(n);
>  	return 0;
> }
> 
> 
> >
> >
> >>[1948751.794040] qemu-system-x86: page allocation failure: order:4,
> >>mode:0x1040d0
> >>[1948751.810341] CPU: 4 PID: 41198 Comm: qemu-system-x86 Not tainted
> >>3.10.53-rc1 #3
> >>[1948751.826846] Hardware name: Intel Corporation S1200BTL/S1200BTL, BIOS
> >>S1200BT.86B.02.00.0041.120520121743 12/05/2012
> >>[1948751.847285] 0000000000000004 ffff8802eaf3b9d8 ffffffff8162ff4d
> >>ffff8802eaf3ba68
> >>[1948751.864257] ffffffff810ab771 0000000000000001 ffff8802eaf3bb48
> >>ffff8802eaf3ba68
> >>[1948751.881209] ffffffff810abe68 ffffffff81ca2f40 ffffffff00000000
> >>0000000200000040
> >>[1948751.898276] Call Trace:
> >>[1948751.909628] [] dump_stack+0x19/0x1c
> >>[1948751.924284] [] warn_alloc_failed+0x111/0x126
> >>[1948751.939774] [] ? __alloc_pages_direct_compact+0x181/0x198
> >>[1948751.956650] [] __alloc_pages_nodemask+0x72f/0x77c
> >>[1948751.972853] [] __get_free_pages+0x12/0x41
> >>[1948751.988297] [] vhost_net_open+0x23/0x171 [vhost_net]
> >>[1948752.004938] [] misc_open+0x119/0x17d
> >>[1948752.020111] [] chrdev_open+0x134/0x155
> >>[1948752.035604] [] ? lg_local_unlock+0x1e/0x31
> >>[1948752.051436] [] ? cdev_put+0x24/0x24
> >>[1948752.066540] [] do_dentry_open+0x15c/0x20f
> >>[1948752.082214] [] finish_open+0x34/0x3f
> >>[1948752.097234] [] do_last+0x996/0xbcb
> >>[1948752.111983] [] ? link_path_walk+0x5e/0x791
> >>[1948752.127447] [] ? path_init+0x11d/0x403
> >>[1948752.142517] [] path_openat+0xc6/0x43b
> >>[1948752.157207] [] ? __lock_acquire+0x9ae/0xa4a
> >>[1948752.172369] [] ? rtnl_unlock+0x9/0xb
> >>[1948752.186893] [] do_filp_open+0x38/0x84
> >>[1948752.201503] [] ? _raw_spin_unlock+0x26/0x2a
> >>[1948752.216719] [] ? __alloc_fd+0xf6/0x10a
> >>[1948752.231521] [] do_sys_open+0x114/0x1a6
> >>[1948752.246396] [] SyS_open+0x19/0x1b
> >>[1948752.260709] [] system_call_fastpath+0x16/0x1b
> >