Subject: Re: [PATCH 0/4] Optimise 64-bit IOVA allocations
From: "Leizhen (ThunderTown)"
To: Robin Murphy, Ard Biesheuvel
CC: Joerg Roedel, "linux-arm-kernel@lists.infradead.org",
    "linux-kernel@vger.kernel.org", David Woodhouse, Lorenzo Pieralisi
Date: Fri, 21 Jul 2017 17:48:25 +0800
Message-ID: <5971CDE9.5040600@huawei.com>
In-Reply-To: <19661034-093e-a744-b6fb-3d23a285ebe3@arm.com>
List-ID: linux-kernel@vger.kernel.org

On 2017/7/19 18:23, Robin Murphy wrote:
> On 19/07/17 09:37, Ard Biesheuvel wrote:
>> On 18 July 2017 at 17:57, Robin Murphy wrote:
>>> Hi all,
>>>
>>> In the wake of the ARM SMMU optimisation efforts, it seems that certain
>>> workloads (e.g. storage I/O with large scatterlists) probably remain
>>> quite heavily influenced by IOVA allocation performance.
>>> Separately, Ard also reported massive performance drops for a graphical
>>> desktop on AMD Seattle when enabling SMMUs via IORT, which we traced to
>>> dma_32bit_pfn in the DMA ops domain getting initialised differently for
>>> ACPI vs. DT, and exposing the overhead of the rbtree slow path. Whilst
>>> we could go around trying to close up all the little gaps that lead to
>>> hitting the slowest case, it seems a much better idea to simply make
>>> said slowest case a lot less slow.
>>>
>>> I had a go at rebasing Leizhen's last IOVA series[1], but ended up
>>> finding the changes rather too hard to follow, so I've taken the liberty
>>> here of picking the whole thing up and reimplementing the main part in a
>>> rather less invasive manner.
>>>
>>> Robin.
>>>
>>> [1] https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg17753.html
>>>
>>> Robin Murphy (1):
>>>   iommu/iova: Extend rbtree node caching
>>>
>>> Zhen Lei (3):
>>>   iommu/iova: Optimise rbtree searching
>>>   iommu/iova: Optimise the padding calculation
>>>   iommu/iova: Make dma_32bit_pfn implicit
>>>
>>>  drivers/gpu/drm/tegra/drm.c      |   3 +-
>>>  drivers/gpu/host1x/dev.c         |   3 +-
>>>  drivers/iommu/amd_iommu.c        |   7 +--
>>>  drivers/iommu/dma-iommu.c        |  18 +------
>>>  drivers/iommu/intel-iommu.c      |  11 ++--
>>>  drivers/iommu/iova.c             | 112 ++++++++++++++++-----------------
>>>  drivers/misc/mic/scif/scif_rma.c |   3 +-
>>>  include/linux/iova.h             |   8 +--
>>>  8 files changed, 60 insertions(+), 105 deletions(-)
>>>
>>
>> These patches look suspiciously like the ones I have been using over
>> the past couple of weeks (modulo the tegra and host1x changes) from
>> your git tree. They work fine on my AMD Overdrive B1, both in DT and
>> in ACPI/IORT modes, although it is difficult to quantify any
>> performance deltas on my setup.
>
> Indeed - this is a rebase (to account for those new callers) with a
> couple of trivial tweaks to error paths and corner cases that normal
> usage shouldn't have been hitting anyway.
"No longer unusably awful" is > a good enough performance delta for me :) > >> Tested-by: Ard Biesheuvel I got the same performance data compared with my patch version. It works well. Tested-by: Zhen Lei > > Thanks! > > Robin. > > . > -- Thanks! BestRegards