From mboxrd@z Thu Jan  1 00:00:00 1970
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Subject: Re: [RFC PATCH] page_alloc: use first half of higher
 order chunks when halving
Date: Fri, 28 Mar 2014 13:02:01 -0400
Message-ID: <20140328170201.GB12659@phenom.dumpdata.com>
References: <5331E269.9090708@gmail.com>
	<20140326095533.GA7885@deinos.phlegethon.org>
	<20140326101746.GA14195@u109add4315675089e695.ant.amazon.com>
	<20140326150801.GD18387@phenom.dumpdata.com>
	<20140326151507.GF14195@u109add4315675089e695.ant.amazon.com>
	<5332F948.1020909@gmail.com>
	<20140326163609.GD21368@phenom.dumpdata.com>
	<533312C0.1050507@gmail.com>
	<20140326175606.GA24179@phenom.dumpdata.com>
	<5333518E.40203@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta3.messagelabs.com ([195.245.230.39])
	by lists.xen.org with esmtp (Exim 4.72)
	(envelope-from <konrad.wilk@oracle.com>) id 1WTaAa-00028D-F7
	for xen-devel@lists.xenproject.org; Fri, 28 Mar 2014 17:02:16 +0000
Content-Disposition: inline
In-Reply-To: <5333518E.40203@gmail.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Matthew Rushton <mvrushton@gmail.com>
Cc: Keir Fraser <keir@xen.org>, Matt Wilson <msw@amazon.com>, Matt Wilson <msw@linux.com>, Tim Deegan <tim@xen.org>, Jan Beulich <jbeulich@suse.com>, Andrew Cooper <andrew.cooper3@citrix.com>, xen-devel@lists.xenproject.org
List-Id: xen-devel@lists.xenproject.org

On Wed, Mar 26, 2014 at 03:15:42PM -0700, Matthew Rushton wrote:
> On 03/26/14 10:56, Konrad Rzeszutek Wilk wrote:
> >On Wed, Mar 26, 2014 at 10:47:44AM -0700, Matthew Rushton wrote:
> >>On 03/26/14 09:36, Konrad Rzeszutek Wilk wrote:
> >>>On Wed, Mar 26, 2014 at 08:59:04AM -0700, Matthew Rushton wrote:
> >>>>On 03/26/14 08:15, Matt Wilson wrote:
> >>>>>On Wed, Mar 26, 2014 at 11:08:01AM -0400, Konrad Rzeszutek Wilk wrote:
> >>>>>>Could you elaborate a bit more on the use-case please?
> >>>>>>My understanding is that most drivers use a scatter gather list - in which
> >>>>>>case it does not matter if the underlying MFNs in the PFN space are
> >>>>>>not contiguous.
> >>>>>>
> >>>>>>But I presume the issue you are hitting is with drivers doing dma_map_page
> >>>>>>and the page is not 4KB but rather large (compound page). Is that the
> >>>>>>problem you have observed?
> >>>>>Drivers are using very large size arguments to dma_alloc_coherent()
> >>>>>for things like RX and TX descriptor rings.
> >>>Large size like larger than 512kB? That would also cause problems
> >>>on baremetal then when swiotlb is activated I believe.
> >>I was looking at network IO performance so the buffers would not
> >>have been that large. I think large in this context is relative to
> >>the 4k page size and the odds of the buffer spanning a page
> >>boundary. For context I saw ~5-10% performance increase with guest
> >>network throughput by avoiding bounce buffers and also saw dom0 tcp
> >>streaming performance go from ~6Gb/s to over 9Gb/s on my test setup
> >>with a 10Gb NIC.
> >OK, but that would not be the dma_alloc_coherent ones then? That sounds
> >more like the generic TCP mechanism allocated 64KB pages instead of 4KB
> >and used those.
> >
> >Did you try looking at this hack that Ian proposed a long time ago
> >to verify that it is said problem?
> >
> >https://lkml.org/lkml/2013/9/4/540
> >
> 
> Yes I had seen that and initially had the same reaction but the
> change was relatively recent and not relevant. I *think* all the
> coherent allocations are ok since the swiotlb makes them contiguous.
> The problem comes with the use of the streaming api. As one example
> with jumbo frames enabled a driver might use larger rx buffers which
> triggers the problem.
> 
> I think the right thing to do is to make the dma streaming api work
> better with larger buffers on dom0. That way it works across all

OK.
> drivers and device types regardless of how they were designed.

Can you point me to an example of the DMA streaming API?

I am not sure whether by 'streaming API' you mean scatter-gather
operations using the DMA API?

Is there a particularly easy way for me to reproduce this? I have
to say I haven't enabled jumbo frames on my box since I am not even
sure the switch I have can do it. Is there an idiot's punch list
of how to reproduce this?

Thanks!
> 
> >>>>>--msw
> >>>>It's the dma streaming api I've noticed the problem with, so
> >>>>dma_map_single(). Applicable swiotlb code would be
> >>>>xen_swiotlb_map_page() and range_straddles_page_boundary(). So yes
> >>>>for larger buffers it can cause bouncing.
>
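
For reference, the check discussed above (a streaming mapping bounces when
the buffer crosses a page boundary and the backing machine frames are not
contiguous) can be sketched in userspace roughly as follows. This is only a
minimal illustration, not the kernel code: toy_p2m, toy_pfn_to_mfn(),
pages_machine_contiguous() and needs_bounce() are hypothetical names, the
PFN-to-MFN table is made up, and 4KB pages are assumed.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Hypothetical PFN -> MFN table: PFNs 0-2 are machine contiguous,
 * PFNs 3-4 are contiguous with each other, PFN 5 stands alone. */
static const unsigned long toy_p2m[] = { 100, 101, 102, 500, 501, 900 };

/* Stand-in for a real PFN-to-MFN lookup; callers must stay in the table. */
static unsigned long toy_pfn_to_mfn(unsigned long pfn)
{
    return toy_p2m[pfn];
}

/* Return 1 if every page touched by [offset, offset + length) starting at
 * pfn is backed by consecutive machine frames, 0 otherwise. */
static int pages_machine_contiguous(unsigned long pfn, unsigned long offset,
                                    size_t length)
{
    unsigned long nr_pages = (offset + length + PAGE_SIZE - 1) >> PAGE_SHIFT;
    unsigned long first_mfn = toy_pfn_to_mfn(pfn);
    unsigned long i;

    for (i = 1; i < nr_pages; i++) {
        if (toy_pfn_to_mfn(pfn + i) != first_mfn + i)
            return 0;
    }
    return 1;
}

/* Return 1 if a buffer at pseudo-physical address paddr of the given size
 * would have to be bounced: it crosses a page boundary and the machine
 * frames behind it are not contiguous. */
static int needs_bounce(uint64_t paddr, size_t size)
{
    unsigned long pfn = (unsigned long)(paddr >> PAGE_SHIFT);
    unsigned long offset = (unsigned long)(paddr & (PAGE_SIZE - 1));

    if (offset + size <= PAGE_SIZE)
        return 0;   /* fits inside a single page */
    return !pages_machine_contiguous(pfn, offset, size);
}
```

With this toy table, a 9000-byte (jumbo-frame sized) buffer starting at PFN 0
spans three contiguous machine frames and would not bounce, while the same
buffer starting at PFN 1 runs into the gap between PFN 2 and PFN 3 and would.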