From mboxrd@z Thu Jan 1 00:00:00 1970 From: Konrad Rzeszutek Wilk Subject: Re: Load increase after memory upgrade (part2) Date: Tue, 24 Jan 2012 09:17:35 -0500 Message-ID: <20120124141735.GA31731@phenom.dumpdata.com> References: <20111219145609.GB28969@andromeda.dapyr.net> <20120110215533.GA21862@phenom.dumpdata.com> <1442969761.20120112230601@eikelenboom.it> <20120113151307.GC5025@phenom.dumpdata.com> <1383590207.20120115123259@eikelenboom.it> <20120117210225.GA23782@phenom.dumpdata.com> <4F16BC97020000780006D6D6@nat28.tlf.novell.com> <20120118142923.GA6052@andromeda.dapyr.net> <20120123223213.GA31929@phenom.dumpdata.com> <4F1E80BE020000780006E9FB@nat28.tlf.novell.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: Content-Disposition: inline In-Reply-To: <4F1E80BE020000780006E9FB@nat28.tlf.novell.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: Jan Beulich Cc: Konrad Rzeszutek Wilk , xen-devel , Sander Eikelenboom List-Id: xen-devel@lists.xenproject.org On Tue, Jan 24, 2012 at 08:58:22AM +0000, Jan Beulich wrote: > >>> On 23.01.12 at 23:32, Konrad Rzeszutek Wilk wrote: > > On Wed, Jan 18, 2012 at 10:29:23AM -0400, Konrad Rzeszutek Wilk wrote: > >> On Wed, Jan 18, 2012 at 11:35:35AM +0000, Jan Beulich wrote: > >> > >>> On 17.01.12 at 22:02, Konrad Rzeszutek Wilk wrote: > >> > > The issue as I understand is that the DVB drivers allocate their buffers > >> > > from 0->4GB most (all the time?) so they never have to do bounce-buffering. > >> > > > >> > > While the pv-ops one ends up quite frequently doing the bounce-buffering, > >> > > which > >> > > implies that the DVB drivers end up allocating their buffers above the > > 4GB. > >> > > This means we end up spending some CPU time (in the guest) copying the > >> > > memory > >> > > from >4GB to 0-4GB region (And vice-versa). > >> > > >> > This reminds me of something (not sure what XenoLinux you use for > >> > comparison) - how are they allocating that memory? Not vmalloc_32() > >> > >> I was using the 2.6.18, then the one I saw on Google for Gentoo, and now > >> I am going to look at the 2.6.38 from OpenSuSE. > >> > >> > by chance (I remember having seen numerous uses under - iirc - > >> > drivers/media/)? > >> > > >> > Obviously, vmalloc_32() and any GFP_DMA32 allocations do *not* do > >> > what their (driver) callers might expect in a PV guest (including the > >> > contiguity assumption for the latter, recalling that you earlier said > >> > you were able to see the problem after several guest starts), and I > >> > had put into our kernels an adjustment to make vmalloc_32() actually > >> > behave as expected. > >> > >> Aaah.. The plot thickens! Let me look in the sources! Thanks for the > >> pointer. > > > > Jan hints lead me to the videobuf-dma-sg.c which does indeed to vmalloc_32 > > and then performs PCI DMA operations on the allocted vmalloc_32 > > area. > > > > So I cobbled up the attached patch (hadn't actually tested it and sadly > > won't until next week) which removes the call to vmalloc_32 and instead > > sets up DMA allocated set of pages. > > What a big patch (which would need re-doing for every vmalloc_32() > caller)! Fixing vmalloc_32() would be much less intrusive (reproducing > our 3.2 version of the affected function below, but clearly that's not > pv-ops ready). I just want to get to the bottom of this before attempting a proper fix. > > Jan > > static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, > pgprot_t prot, int node, void *caller) > { > const int order = 0; > struct page **pages; > unsigned int nr_pages, array_size, i; > gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO; > #ifdef CONFIG_XEN > gfp_t dma_mask = gfp_mask & (__GFP_DMA | __GFP_DMA32); > > BUILD_BUG_ON((__GFP_DMA | __GFP_DMA32) != (__GFP_DMA + __GFP_DMA32)); > if (dma_mask == (__GFP_DMA | __GFP_DMA32)) > gfp_mask &= ~(__GFP_DMA | __GFP_DMA32); > #endif > > nr_pages = (area->size - PAGE_SIZE) >> PAGE_SHIFT; > array_size = (nr_pages * sizeof(struct page *)); > > area->nr_pages = nr_pages; > /* Please note that the recursion is strictly bounded. */ > if (array_size > PAGE_SIZE) { > pages = __vmalloc_node(array_size, 1, nested_gfp|__GFP_HIGHMEM, > PAGE_KERNEL, node, caller); > area->flags |= VM_VPAGES; > } else { > pages = kmalloc_node(array_size, nested_gfp, node); > } > area->pages = pages; > area->caller = caller; > if (!area->pages) { > remove_vm_area(area->addr); > kfree(area); > return NULL; > } > > for (i = 0; i < area->nr_pages; i++) { > struct page *page; > gfp_t tmp_mask = gfp_mask | __GFP_NOWARN; > > if (node < 0) > page = alloc_page(tmp_mask); > else > page = alloc_pages_node(node, tmp_mask, order); > > if (unlikely(!page)) { > /* Successfully allocated i pages, free them in __vunmap() */ > area->nr_pages = i; > goto fail; > } > area->pages[i] = page; > #ifdef CONFIG_XEN > if (dma_mask) { > if (xen_limit_pages_to_max_mfn(page, 0, 32)) { > area->nr_pages = i + 1; > goto fail; > } > if (gfp_mask & __GFP_ZERO) > clear_highpage(page); > } > #endif > } > > if (map_vm_area(area, prot, &pages)) > goto fail; > return area->addr; > > fail: > warn_alloc_failed(gfp_mask, order, > "vmalloc: allocation failure, allocated %ld of %ld bytes\n", > (area->nr_pages*PAGE_SIZE), area->size); > vfree(area->addr); > return NULL; > } > > ... > > #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32) > #define GFP_VMALLOC32 GFP_DMA32 | GFP_KERNEL > #elif defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA) > #define GFP_VMALLOC32 GFP_DMA | GFP_KERNEL > #elif defined(CONFIG_XEN) > #define GFP_VMALLOC32 __GFP_DMA | __GFP_DMA32 | GFP_KERNEL > #else > #define GFP_VMALLOC32 GFP_KERNEL > #endif