Message-ID: <2eb026b8-9e13-2b60-9e14-06417b142ac9@bytedance.com>
Date: Thu, 27 Apr 2023 11:26:50 +0800
To: Will Deacon, Tomasz Nowicki, Laura Abbott
Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, Anshuman Khandual, Mark Rutland, Kefeng Wang, Feiyang Chen, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
From: Gang Li
Subject: [QUESTION FOR ARM64 TLB] performance issue and implementation difference of TLB flush

Hi all,

I have encountered a performance issue on our ARM64 machine, which seems to be caused by flush_tlb_kernel_range.

Here is the call stack on the ARM64 machine:

# ARM64:
```
ghes_unmap
clear_fixmap
__set_fixmap
flush_tlb_kernel_range
```

As we can see, the ARM64 implementation eventually calls flush_tlb_kernel_range, which flushes the TLB on all cores. On AMD64, however, the implementation calls flush_tlb_one_kernel instead.

# AMD64:
```
ghes_unmap
clear_fixmap
__set_fixmap
mmu.set_fixmap
native_set_fixmap
__native_set_fixmap
set_pte_vaddr
set_pte_vaddr_p4d
__set_pte_vaddr
flush_tlb_one_kernel
```

On our ARM64 machine, flush_tlb_kernel_range is causing a noticeable performance degradation.
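
As a rough illustration of the difference between the two paths, here is a toy user-space C model. It is not kernel code: all `toy_*` names and `TOY_NR_CPUS` are hypothetical, and each CPU's TLB is reduced to a single valid bit for one fixmap address. It only shows why a local-only flush could leave entries on other CPUs, assuming those CPUs have cached the translation (for example through speculation).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 4 /* hypothetical core count for the model */

/* Toy "TLB": one valid bit per CPU for a single fixmap virtual address. */
struct toy_tlb {
    bool valid[TOY_NR_CPUS];
};

/* Assume every CPU has cached the translation (e.g. via speculation). */
static void toy_fill_all(struct toy_tlb *tlb)
{
    for (size_t cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        tlb->valid[cpu] = true;
}

/* Local flush: only the calling CPU drops its entry
 * (models the flush_tlb_one_kernel behaviour described for AMD64). */
static void toy_flush_local(struct toy_tlb *tlb, size_t this_cpu)
{
    assert(this_cpu < TOY_NR_CPUS);
    tlb->valid[this_cpu] = false;
}

/* Broadcast flush: every CPU drops its entry
 * (models the flush_tlb_kernel_range behaviour described for ARM64). */
static void toy_flush_broadcast(struct toy_tlb *tlb)
{
    for (size_t cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        tlb->valid[cpu] = false;
}

/* Number of CPUs that still hold the (now stale) entry. */
static size_t toy_stale_count(const struct toy_tlb *tlb)
{
    size_t n = 0;

    for (size_t cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        if (tlb->valid[cpu])
            n++;
    return n;
}
```

In this model, calling `toy_fill_all()` followed by `toy_flush_local(&tlb, 0)` leaves `toy_stale_count()` at `TOY_NR_CPUS - 1`, while `toy_flush_broadcast()` brings it to zero. Whether those remaining entries can actually exist, and whether they matter, on real ARM64 hardware is exactly what the questions in this mail ask.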

The comment in this arm64 patch says:
https://lore.kernel.org/all/20161201135112.15396-1-fu.wei@linaro.org/
(commit 9f9a35a7b654e006250530425eb1fb527f0d32e9)

```
/*
 * Despite its name, this function must still broadcast the TLB
 * invalidation in order to ensure other CPUs don't end up with junk
 * entries as a result of speculation. Unusually, its also called in
 * IRQ context (ghes_iounmap_irq) so if we ever need to use IPIs for
 * TLB broadcasting, then we're in trouble here.
 */
static inline void arch_apei_flush_tlb_one(unsigned long addr)
{
	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
}
```

1. Why does ARM64's clear_fixmap flush the TLB on all cores, while AMD64 flushes it on only a single core? Are there any TLB design details that make a difference here?

2. Is it possible for ARM64 to flush the TLB on just one core, as AMD64 does?

3. If so, would there be any potential drawbacks or limitations to making such a change?

Thanks,
Gang Li