From: Baruch Siach
To: Marek Szyprowski
Cc: Christoph Hellwig, Catalin Marinas, Will Deacon, Robin Murphy,
 iommu@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 linux-s390@vger.kernel.org, Petr Tesařík, Ramon Fried, Elad Nachman,
 linux-rockchip@lists.infradead.org
Subject: Re: [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit
In-Reply-To: (Marek Szyprowski's message of "Tue, 27 Aug 2024 08:14:03 +0200")
References: <17c067618b93e5d71f19c37826d54db4299621a3.1723359916.git.baruch@tkos.co.il>
 <53d988b1-bdce-422a-ae4e-158f305ad703@samsung.com>
 <87mskyva7o.fsf@tarshish>
Date: Tue, 27 Aug 2024 10:03:21 +0300
Message-ID: <87ikvmv45i.fsf@tarshish>

Hi Marek,

On Tue, Aug 27 2024, Marek Szyprowski wrote:
> On 27.08.2024 06:52, Baruch Siach wrote:
>> Hi Marek,
>>
>> Thanks for your report.
>>
>> On Mon, Aug 26 2024, Marek Szyprowski wrote:
>>> On 11.08.2024 09:09, Baruch Siach wrote:
>>>> From: Catalin Marinas
>>>>
>>>> Hardware DMA limit might not be power of 2. When RAM range starts above
>>>> 0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit
>>>> can not encode this limit.
>>>>
>>>> Use plain address for DMA zone limit.
>>>>
>>>> Since DMA zone can now potentially span beyond 4GB physical limit of
>>>> DMA32, make sure to use DMA zone for GFP_DMA32 allocations in that case.
>>>>
>>>> Signed-off-by: Catalin Marinas
>>>> Co-developed-by: Baruch Siach
>>>> Signed-off-by: Baruch Siach
>>>> ---
>>> This patch landed recently in linux-next as commit ba0fb44aed47
>>> ("dma-mapping: replace zone_dma_bits by zone_dma_limit"). During my
>>> tests I found that it introduces the following warning on ARM64/Rockchip
>>> based Odroid M1 board (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts):
>> Does this warning go away if you revert both 3be9b846896d and ba0fb44aed47?
>
> Yes, linux-next with above mentioned commits reverted works fine.
>
>
>> Upstream rockchip DTs have no dma-ranges property. Is that the case for
>> your platform as well?
>>
>> Can you share kernel report of DMA zones and swiotlb? On my platform I get:
>>
>> [    0.000000] Zone ranges:
>> [    0.000000]   DMA      [mem 0x0000000800000000-0x000000083fffffff]
>> [    0.000000]   DMA32    empty
>> [    0.000000]   Normal   [mem 0x0000000840000000-0x0000000fffffffff]
>> ...
>> [    0.000000] software IO TLB: area num 8.
>> [    0.000000] software IO TLB: mapped [mem 0x000000083be38000-0x000000083fe38000] (64MB)
>>
>> What do you get at your end?
>
> On ba0fb44aed47 I got:
>
> [    0.000000] NUMA: No NUMA configuration found
> [    0.000000] NUMA: Faking a node at [mem 0x0000000000200000-0x00000001ffffffff]
> [    0.000000] NUMA: NODE_DATA [mem 0x1ff7a0600-0x1ff7a2fff]
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000000200000-0x00000001ffffffff]
> [    0.000000]   DMA32    empty
> [    0.000000]   Normal   empty
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000083fffff]
> [    0.000000]   node   0: [mem 0x0000000009400000-0x00000000efffffff]
> [    0.000000]   node   0: [mem 0x00000001f0000000-0x00000001ffffffff]
> [    0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000001ffffffff]
> [    0.000000] On node 0, zone DMA: 512 pages in unavailable ranges
> [    0.000000] On node 0, zone DMA: 4096 pages in unavailable ranges
> [    0.000000] cma: Reserved 96 MiB at 0x00000001f0000000 on node -1
>
> ...
>
> [    0.000000] software IO TLB: SWIOTLB bounce buffer size adjusted to 3MB
> [    0.000000] software IO TLB: area num 4.
> [    0.000000] software IO TLB: mapped [mem 0x00000001fac00000-0x00000001fb000000] (4MB)
>
> On the fa3c109a6d30 (parent commit of the $subject) I got:
>
> [    0.000000] NUMA: No NUMA configuration found
> [    0.000000] NUMA: Faking a node at [mem 0x0000000000200000-0x00000001ffffffff]
> [    0.000000] NUMA: NODE_DATA [mem 0x1ff7a0600-0x1ff7a2fff]
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000000200000-0x00000000ffffffff]
> [    0.000000]   DMA32    empty
> [    0.000000]   Normal   [mem 0x0000000100000000-0x00000001ffffffff]
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000083fffff]
> [    0.000000]   node   0: [mem 0x0000000009400000-0x00000000efffffff]
> [    0.000000]   node   0: [mem 0x00000001f0000000-0x00000001ffffffff]
> [    0.000000] Initmem setup node 0 [mem 0x0000000000200000-0x00000001ffffffff]
> [    0.000000] On node 0, zone DMA: 512 pages in unavailable ranges
> [    0.000000] On node 0, zone DMA: 4096 pages in unavailable ranges
> [    0.000000] cma: Reserved 96 MiB at 0x00000000ea000000 on node -1
>
> ...
>
> [    0.000000] software IO TLB: area num 4.
> [    0.000000] software IO TLB: mapped [mem 0x00000000e6000000-0x00000000ea000000] (64MB)
>
> It looks that for some reasons $subject patch changes the default zone
> and swiotlb configuration.

Does this fix the issue?
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index bfb10969cbf0..7fcd0aaa9bb6 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -116,6 +116,9 @@ static void __init arch_reserve_crashkernel(void)
 
 static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
 {
+	if (memblock_start_of_DRAM() < U32_MAX)
+		zone_limit = min(zone_limit, U32_MAX);
+
 	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
 }
 

Thanks,
baruch

>>> ------------[ cut here ]------------
>>> dwmmc_rockchip fe2b0000.mmc: swiotlb addr 0x00000001faf00000+4096 overflow (mask ffffffff, bus limit 0).
>>> WARNING: CPU: 3 PID: 1 at kernel/dma/swiotlb.c:1594 swiotlb_map+0x2f0/0x308
>>> Modules linked in:
>>> CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-rc4+ #15278
>>> Hardware name: Hardkernel ODROID-M1 (DT)
>>> pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> pc : swiotlb_map+0x2f0/0x308
>>> lr : swiotlb_map+0x2f0/0x308
>>> ...
>>> Call trace:
>>>  swiotlb_map+0x2f0/0x308
>>>  dma_direct_map_sg+0x9c/0x2e4
>>>  __dma_map_sg_attrs+0x28/0x94
>>>  dma_map_sg_attrs+0x10/0x24
>>>  dw_mci_pre_dma_transfer+0xb8/0xf4
>>>  dw_mci_pre_req+0x50/0x68
>>>  mmc_blk_mq_issue_rq+0x3e0/0x964
>>>  mmc_mq_queue_rq+0x118/0x2b4
>>>  blk_mq_dispatch_rq_list+0x21c/0x714
>>>  __blk_mq_sched_dispatch_requests+0x490/0x58c
>>>  blk_mq_sched_dispatch_requests+0x30/0x6c
>>>  blk_mq_run_hw_queue+0x284/0x40c
>>>  blk_mq_flush_plug_list.part.0+0x190/0x974
>>>  blk_mq_flush_plug_list+0x1c/0x2c
>>>  __blk_flush_plug+0xe4/0x140
>>>  blk_finish_plug+0x38/0x4c
>>>  __ext4_get_inode_loc+0x22c/0x654
>>>  __ext4_get_inode_loc_noinmem+0x40/0xa8
>>>  __ext4_iget+0x154/0xcc0
>>>  ext4_get_journal_inode+0x30/0x110
>>>  ext4_load_and_init_journal+0x9c/0xaf0
>>>  ext4_fill_super+0x1fec/0x2d90
>>>  get_tree_bdev+0x140/0x1d8
>>>  ext4_get_tree+0x18/0x24
>>>  vfs_get_tree+0x28/0xe8
>>>  path_mount+0x3e8/0xb7c
>>>  init_mount+0x68/0xac
>>>  do_mount_root+0x108/0x1dc
>>>  mount_root_generic+0x100/0x330
>>>  mount_root+0x160/0x2d0
>>>  initrd_load+0x1f0/0x2a0
>>>  prepare_namespace+0x4c/0x29c
>>>  kernel_init_freeable+0x4b4/0x50c
>>>  kernel_init+0x20/0x1d8
>>>  ret_from_fork+0x10/0x20
>>> irq event stamp: 1305682
>>> hardirqs last  enabled at (1305681): [] console_unlock+0x124/0x130
>>> hardirqs last disabled at (1305682): [] el1_dbg+0x24/0x8c
>>> softirqs last  enabled at (1305678): [] handle_softirqs+0x4cc/0x4e4
>>> softirqs last disabled at (1305665): [] __do_softirq+0x14/0x20
>>> ---[ end trace 0000000000000000 ]---
>>>
>>> This "bus limit 0" seems to be a bit suspicious to me as well as the
>>> fact that swiotlb is used for the MMC DMA. I will investigate this
>>> further tomorrow. The board boots fine though.
>> Looking at the code I guess that bus_dma_limit set to 0 means no bus
>> limit. But dma_mask for your device indicates 32-bit device limit. This
>> can't work with address above 4GB. For some reason DMA code tries to
>> allocate from higher address. This is most likely the reason
>> dma_capable() returns false.
>
> Indeed this looks like a source of the problem:
>
> [    3.123618] Synopsys Designware Multimedia Card Interface Driver
> [    3.139653] dwmmc_rockchip fe2b0000.mmc: IDMAC supports 32-bit address mode.
> [    3.147739] dwmmc_rockchip fe2b0000.mmc: Using internal DMA controller.
> [    3.161659] dwmmc_rockchip fe2b0000.mmc: Version ID is 270a
> [    3.168455] dwmmc_rockchip fe2b0000.mmc: DW MMC controller at irq 56,32 bit host data width,256 deep fifo
> [    3.182651] dwmmc_rockchip fe2b0000.mmc: Got CD GPIO
>
> ...
>
> [   11.009258] ------------[ cut here ]------------
> [   11.014762] dwmmc_rockchip fe2b0000.mmc: swiotlb addr 0x00000001faf00000+4096 overflow (mask ffffffff, bus limit 0).
>
>
>> ...
>
> Best regards

-- 
                                                     ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -
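
Editorial note: the arithmetic behind the proposed max_zone_phys() change can be checked in user space. The sketch below is not kernel code. The type phys_addr_sketch_t, the U32_MAX_SKETCH macro and every function name are hypothetical stand-ins, and DRAM start/end are passed as parameters instead of being read from memblock. It also assumes that on the Odroid M1 the limit passed in is effectively unbounded, since the DT has no dma-ranges; that assumption is not stated in the thread. The DRAM ranges and the 0x83fffffff limit are taken from the zone reports quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for phys_addr_t and U32_MAX. */
typedef uint64_t phys_addr_sketch_t;
#define U32_MAX_SKETCH ((phys_addr_sketch_t)UINT32_MAX)

static phys_addr_sketch_t min_phys(phys_addr_sketch_t a, phys_addr_sketch_t b)
{
	return a < b ? a : b;
}

/* Zone end without the proposed clamp; dram_end is exclusive. */
static phys_addr_sketch_t zone_end_before(phys_addr_sketch_t zone_limit,
					  phys_addr_sketch_t dram_end)
{
	return min_phys(zone_limit, dram_end - 1) + 1;
}

/* Zone end with the proposed clamp: cap at 4GB when DRAM starts below 4GB. */
static phys_addr_sketch_t zone_end_after(phys_addr_sketch_t zone_limit,
					 phys_addr_sketch_t dram_start,
					 phys_addr_sketch_t dram_end)
{
	if (dram_start < U32_MAX_SKETCH)
		zone_limit = min_phys(zone_limit, U32_MAX_SKETCH);

	return min_phys(zone_limit, dram_end - 1) + 1;
}

/* Checks the two platforms from the thread against their zone reports. */
static void zone_sketch_selfcheck(void)
{
	/* Odroid M1, assumed unbounded limit: DMA zone covers all RAM. */
	assert(zone_end_before(UINT64_MAX, 0x200000000ULL) == 0x200000000ULL);
	/* Same board with the clamp: DMA zone ends at 4GB again. */
	assert(zone_end_after(UINT64_MAX, 0x200000ULL, 0x200000000ULL) ==
	       0x100000000ULL);
	/* RAM starting at 32GB: the clamp does not apply. */
	assert(zone_end_after(0x83fffffffULL, 0x800000000ULL,
			      0x1000000000ULL) == 0x840000000ULL);
}
```

Under these assumptions the clamp restores the 0x0000000000200000-0x00000000ffffffff DMA zone seen on fa3c109a6d30, while a platform whose RAM starts above 4GB keeps its DMA zone unchanged.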