From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752774AbeAaVB0 (ORCPT ); Wed, 31 Jan 2018 16:01:26 -0500
Received: from mga11.intel.com ([192.55.52.93]:17672 "EHLO mga11.intel.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1752757AbeAaVBZ (ORCPT ); Wed, 31 Jan 2018 16:01:25 -0500
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.46,441,1511856000"; d="scan'208";a="170690603"
Subject: Re: [PATCH] x86: Align TLB invalidation info
To: Nadav Amit , x86@kernel.org
References: <20180131201118.1694-1-namit@vmware.com>
Cc: Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" ,
        linux-kernel@vger.kernel.org, Peter Zijlstra , Nadav Amit ,
        Andy Lutomirski
From: Dave Hansen
Message-ID: <8bb352bc-4e1f-4e87-80e3-a8e65d618d2a@linux.intel.com>
Date: Wed, 31 Jan 2018 13:01:24 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0
MIME-Version: 1.0
In-Reply-To: <20180131201118.1694-1-namit@vmware.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/31/2018 12:11 PM, Nadav Amit wrote:
> The TLB invalidation info is allocated on the stack, which might cause
> it to be unaligned. Since this information may be transferred to
> different cores for TLB shootdown, this might result in an additional
> cache-line bouncing between the cores.
>
> GCC provides a way to deal with it by using
> __builtin_alloca_with_align(). Use it to avoid the bouncing cache lines.

It doesn't really *bounce*, though, does it? I don't see any writes on
the remote side. The remote use seems entirely read-only.

You also don't have to exhaustively test this, but I'd love to see at least a sanity check with a microbenchmark (or something) that, yes, this does help *something*. Maybe it makes the remote flush_tlb_func_common() run faster because it's pulling in fewer lines, or maybe you can even detect fewer misses in there.
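
To make the "pulling in fewer lines" point concrete, here is a rough user-space sketch of the arithmetic only, not the benchmark itself. It assumes 64-byte cache lines; struct tlb_info_sketch and lines_touched() are hypothetical names made up for illustration, and the struct is only a stand-in, not the kernel's definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on typical current x86 parts. */
#define CACHE_LINE_SIZE 64u

/*
 * Hypothetical stand-in for the on-stack invalidation info.
 * This is NOT the kernel's definition, just something of similar shape.
 */
struct tlb_info_sketch {
	void *mm;
	unsigned long start;
	unsigned long end;
	uint64_t new_tlb_gen;
};

/* Number of cache lines covered by the byte range [addr, addr + size). */
static size_t lines_touched(uintptr_t addr, size_t size, size_t line_size)
{
	uintptr_t first, last;

	assert(line_size != 0 && size != 0);

	first = addr / line_size;              /* line holding the first byte */
	last = (addr + size - 1) / line_size;  /* line holding the last byte */

	return (size_t)(last - first + 1);
}
```

With a 32-byte object, lines_touched(48, 32, CACHE_LINE_SIZE) is 2 while lines_touched(0, 32, CACHE_LINE_SIZE) is 1, so an unlucky stack offset means the remote CPU reads one extra line. Whether that extra read is measurable is exactly what the microbenchmark would have to show.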