From: Robin Murphy <robin.murphy@arm.com>
To: Douglas Anderson <dianders@chromium.org>,
Russell King <linux@arm.linux.org.uk>,
Mauro Carvalho Chehab <mchehab@osg.samsung.com>,
Tomasz Figa <tfiga@chromium.org>,
Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Pawel Osciak <pawel@osciak.com>,
Dmitry Torokhov <dmitry.torokhov@gmail.com>,
will.deacon@arm.com, akpm@linux-foundation.org,
dan.j.williams@intel.com, carlo@caione.org,
laurent.pinchart+renesas@ideasonboard.com,
mike.looijmans@topic.nl, lorenx4@gmail.com,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 1/5] ARM: dma-mapping: Optimize allocation
Date: Wed, 13 Jan 2016 12:17:52 +0000 [thread overview]
Message-ID: <56964070.1020303@arm.com> (raw)
In-Reply-To: <1452294332-23415-2-git-send-email-dianders@chromium.org>
Hi Doug,
On 08/01/16 23:05, Douglas Anderson wrote:
> The __iommu_alloc_buffer() is expected to be called to allocate pretty
> sizeable buffers. Upon simple tests of video I saw it trying to
> allocate 4,194,304 bytes. The function tries to allocate large chunks
> in order to optimize IOMMU TLB usage.
>
> The current function is very, very slow.
>
> One problem is the way it keeps trying and trying to allocate big
> chunks. Imagine a very fragmented memory that has 4M free but no
> contiguous pages at all. Further imagine allocating 4M (1024 pages).
> We'll do the following memory allocations:
> - For page 1:
> - Try to allocate order 10 (no retry)
> - Try to allocate order 9 (no retry)
> - ...
> - Try to allocate order 0 (with retry, but not needed)
> - For page 2:
> - Try to allocate order 9 (no retry)
> - Try to allocate order 8 (no retry)
> - ...
> - Try to allocate order 0 (with retry, but not needed)
> - ...
> - ...
>
> Total number of calls to alloc() calls for this case is:
> sum(int(math.log(i, 2)) + 1 for i in range(1, 1025))
> => 9228
>
> The above is obviously worse case, but given how slow alloc can be we
> really want to try to avoid even somewhat bad cases. I timed the old
> code with a device under memory pressure and it wasn't hard to see it
> take more than 120 seconds to allocate 4 megs of memory! (NOTE: testing
> was done on kernel 3.14, so possibly mainline would behave
> differently).
>
> A second problem is that allocating big chunks under memory pressure
> when we don't need them is just not a great idea anyway unless we really
> need them. We can make due pretty well with smaller chunks so it's
> probably wise to leave bigger chunks for other users once memory
> pressure is on.
>
> Let's adjust the allocation like this:
>
> 1. If a big chunk fails, stop trying to hard and bump down to lower
> order allocations.
> 2. Don't try useless orders. The whole point of big chunks is to
> optimize the TLB and it can really only make use of 2M, 1M, 64K and
> 4K sizes.
>
> We'll still tend to eat up a bunch of big chunks, but that might be the
> right answer for some users. A future patch could possibly add a new
> DMA_ATTR that would let the caller decide that TLB optimization isn't
> important and that we should use smaller chunks. Presumably this would
> be a sane strategy for some callers.
Now that I've had time to think about it properly:
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
I just had an absolutely disgusting idea of how to get the same
progression with just a single variable and no static array, but I'll
keep that firmly to myself as it's almost IOCCC-grade WTF :D
Thanks,
Robin.
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
> Changes in v5: None
> Changes in v4:
> - Added Marek's ack
>
> Changes in v3: None
> Changes in v2:
> - No longer just 1 page at a time, but gives up higher order quickly.
> - Only tries important higher order allocations that might help us.
>
> arch/arm/mm/dma-mapping.c | 34 ++++++++++++++++++++--------------
> 1 file changed, 20 insertions(+), 14 deletions(-)
>
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 0eca3812527e..bc9cebfa0891 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -1122,6 +1122,9 @@ static inline void __free_iova(struct dma_iommu_mapping *mapping,
> spin_unlock_irqrestore(&mapping->lock, flags);
> }
>
> +/* We'll try 2M, 1M, 64K, and finally 4K; array must end with 0! */
> +static const int iommu_order_array[] = { 9, 8, 4, 0 };
> +
> static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
> gfp_t gfp, struct dma_attrs *attrs)
> {
> @@ -1129,6 +1132,7 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
> int count = size >> PAGE_SHIFT;
> int array_size = count * sizeof(struct page *);
> int i = 0;
> + int order_idx = 0;
>
> if (array_size <= PAGE_SIZE)
> pages = kzalloc(array_size, GFP_KERNEL);
> @@ -1162,22 +1166,24 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
> while (count) {
> int j, order;
>
> - for (order = __fls(count); order > 0; --order) {
> - /*
> - * We do not want OOM killer to be invoked as long
> - * as we can fall back to single pages, so we force
> - * __GFP_NORETRY for orders higher than zero.
> - */
> - pages[i] = alloc_pages(gfp | __GFP_NORETRY, order);
> - if (pages[i])
> - break;
> + order = iommu_order_array[order_idx];
> +
> + /* Drop down when we get small */
> + if (__fls(count) < order) {
> + order_idx++;
> + continue;
> }
>
> - if (!pages[i]) {
> - /*
> - * Fall back to single page allocation.
> - * Might invoke OOM killer as last resort.
> - */
> + if (order) {
> + /* See if it's easy to allocate a high-order chunk */
> + pages[i] = alloc_pages(gfp | __GFP_NORETRY, order);
> +
> + /* Go down a notch at first sign of pressure */
> + if (!pages[i]) {
> + order_idx++;
> + continue;
> + }
> + } else {
> pages[i] = alloc_pages(gfp, 0);
> if (!pages[i])
> goto error;
>
next prev parent reply other threads:[~2016-01-13 12:18 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-08 23:05 [PATCH v5 0/5] dma-mapping: Patches for speeding up allocation Douglas Anderson
2016-01-08 23:05 ` [PATCH v5 1/5] ARM: dma-mapping: Optimize allocation Douglas Anderson
2016-01-13 12:17 ` Robin Murphy [this message]
2016-01-13 17:33 ` Tomasz Figa
2016-01-13 17:44 ` Robin Murphy
2016-01-08 23:05 ` [PATCH v5 2/5] common: DMA-mapping: add DMA_ATTR_NO_HUGE_PAGE attribute Douglas Anderson
2016-01-08 23:05 ` [PATCH v5 3/5] ARM: dma-mapping: Use DMA_ATTR_NO_HUGE_PAGE hint to optimize allocation Douglas Anderson
2016-01-08 23:05 ` [PATCH v5 4/5] [media] videobuf2-dc: Let drivers specify DMA attrs Douglas Anderson
2016-01-08 23:05 ` [PATCH v5 5/5] [media] s5p-mfc: Set DMA_ATTR_NO_HUGE_PAGE Douglas Anderson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56964070.1020303@arm.com \
--to=robin.murphy@arm.com \
--cc=akpm@linux-foundation.org \
--cc=carlo@caione.org \
--cc=dan.j.williams@intel.com \
--cc=dianders@chromium.org \
--cc=dmitry.torokhov@gmail.com \
--cc=laurent.pinchart+renesas@ideasonboard.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux@arm.linux.org.uk \
--cc=lorenx4@gmail.com \
--cc=m.szyprowski@samsung.com \
--cc=mchehab@osg.samsung.com \
--cc=mike.looijmans@topic.nl \
--cc=pawel@osciak.com \
--cc=tfiga@chromium.org \
--cc=will.deacon@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox