From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtpout.efficios.com (smtpout.efficios.com [167.114.26.122]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AF8C87A13A for ; Thu, 28 Nov 2024 19:46:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=167.114.26.122 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732823221; cv=none; b=ojYnotGSgIYhdwBufgyXrJfgAP1xiiUoobcfNpNL89VIVqAdj0t9P0fvzo4iekhK8i4VsdrEGFXzRvjQMRwXchxdP3hqt+RBg7Mjt+22MrvfAbeGJ645E+5gClyylZ/6SPSuOqweQGdLgcHQqKpVjpHeShGP5zcoyaqZgfHXSuI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1732823221; c=relaxed/simple; bh=2ELePXYEM3g2x27jUZk9MIdx9Ot+pCc2LJhc59w9m48=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=l6E757NUrSMwku3eLXfdxr1FT6VX9dY//OAkvJCR04znJTW1JTYMfr/Saxya/x3TiaBR0R1/rpIrSYRmjarrlTPMvAs9L9PGMivHClHnn3ie5Fsonx7D9hiHruL6ZFIu1kWRmWd1mRp4s3pduaBvB0GocLxo1TfbEe+FnF/cWNA= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=efficios.com; spf=pass smtp.mailfrom=efficios.com; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b=gyhy5hDX; arc=none smtp.client-ip=167.114.26.122 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=efficios.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=efficios.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=efficios.com header.i=@efficios.com header.b="gyhy5hDX" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=efficios.com; s=smtpout1; t=1732823218; bh=2ELePXYEM3g2x27jUZk9MIdx9Ot+pCc2LJhc59w9m48=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=gyhy5hDXjDMnGrSLW8KjTYCg8BFsEwEdJf4GDepDN1rnr9qczkN1k44GIGziCx8hV dWFMF+jG721UI6S3cYMqrRug+MCgtzVZbQ1L4vtq3LHQ6V6kHprKu08r8/lzTbvDDZ LMJT7eLpCW12ThitnRxjQu+FFjYXVJrcY/QT3qZ5M3/Z5oxzAGIucZ54JUIMDlH/gG +bapxvRqUEfRKy67WY+X0z+DfNcmgiYR37g8ZYrWW5G2vPmaYwRuqdq1rC4nO4dAMe ThPxOPPcQHtwX72Tw1ujMeDtufzBoUwBKbh+Q7X6908f3z6fb1rN37EoE5nz+H+ieE cAINsJ/VenZmg== Received: from localhost.localdomain (96-127-217-162.qc.cable.ebox.net [96.127.217.162]) by smtpout.efficios.com (Postfix) with ESMTPSA id 4XzmzQ2bnTzly5; Thu, 28 Nov 2024 14:46:58 -0500 (EST) Date: Thu, 28 Nov 2024 14:46:57 -0500 From: Mathieu Desnoyers To: Peter Zijlstra , Rik van Riel Cc: kernel test robot , oe-lkp@lists.linux.dev, lkp@intel.com, linux-kernel@vger.kernel.org, x86@kernel.org, Ingo Molnar , Dave Hansen , Linus Torvalds , Mel Gorman Subject: Re: [tip:x86/mm] [x86/mm/tlb] 209954cbc7: will-it-scale.per_thread_ops 13.2% regression Message-ID: References: <202411282207.6bd28eae-lkp@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <202411282207.6bd28eae-lkp@intel.com> On 28-Nov-2024 10:57:35 PM, kernel test robot wrote: > > > Hello, > > kernel test robot noticed a 13.2% regression of will-it-scale.per_thread_ops on: > > > commit: 209954cbc7d0ce1a190fc725d20ce303d74d2680 ("x86/mm/tlb: Update mm_cpumask lazily") > https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/mm AFAIU, this commit changes the way TLB flushes are inhibited when context switching away from a mm. This means that one additional TLB flush is performed to a given CPU even after it has context switched away from the mm, and only then is the mm_cpumask cleared for that CPU. This could result in additional TLB flush IPI overhead in specific scenarios where the IPIs are typically triggered after a thread has context-switched out. May I recommend looking into a scheme similar to rseq mm_cid for this ? We're already adding a per-mm per-cpu data: mm_struct: /** * @pcpu_cid: Per-cpu current cid. * * Keep track of the currently allocated mm_cid for each cpu. * The per-cpu mm_cid values are serialized by their respective * runqueue locks. */ struct mm_cid __percpu *pcpu_cid; struct mm_cid { u64 time; int cid; int recent_cid; }; I suspect you could use a similar per-cpu data structure per-mm to keep track of the pending TLB flush mask, and update it simply with load/store to per-CPU data rather than have to cacheline-bounce all over the place due to frequent mm_cpumask atomic updates. Then you get all the benefits without introducing a window where useless TLB flush IPIs get triggered. Of course it's slightly less compact in terms of memory footprint than a cpumask, but you gain a lot by removing cache line bouncing on this frequent context switch code path. Thoughts ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com