Date: Thu, 18 Dec 2025 08:22:37 +0100
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [PATCH v10 6/8] x86/clear_page: Introduce clear_pages()
To: Ankur Arora, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 x86@kernel.org
Cc: akpm@linux-foundation.org, bp@alien8.de, dave.hansen@linux.intel.com,
 hpa@zytor.com, mingo@redhat.com, mjguzik@gmail.com, luto@kernel.org,
 peterz@infradead.org, tglx@linutronix.de, willy@infradead.org,
 raghavendra.kt@amd.com, chleroy@kernel.org, ioworker0@gmail.com,
 boris.ostrovsky@oracle.com, konrad.wilk@oracle.com
References: <20251215204922.475324-1-ankur.a.arora@oracle.com>
 <20251215204922.475324-7-ankur.a.arora@oracle.com>
From: "David Hildenbrand (Red Hat)"
In-Reply-To: <20251215204922.475324-7-ankur.a.arora@oracle.com>

On 12/15/25 21:49, Ankur Arora wrote:
> Performance when clearing with string instructions (x86-64-stosq and
> similar) can vary significantly based on the chunk-size used.
>
>   $ perf bench mem memset -k 4KB -s 4GB -f x86-64-stosq
>   # Running 'mem/memset' benchmark:
>   # function 'x86-64-stosq' (movsq-based memset() in arch/x86/lib/memset_64.S)
>   # Copying 4GB bytes ...
>
>       13.748208 GB/sec
>
>   $ perf bench mem memset -k 2MB -s 4GB -f x86-64-stosq
>   # Running 'mem/memset' benchmark:
>   # function 'x86-64-stosq' (movsq-based memset() in
>   # arch/x86/lib/memset_64.S)
>   # Copying 4GB bytes ...
>
>       15.067900 GB/sec
>
>   $ perf bench mem memset -k 1GB -s 4GB -f x86-64-stosq
>   # Running 'mem/memset' benchmark:
>   # function 'x86-64-stosq' (movsq-based memset() in arch/x86/lib/memset_64.S)
>   # Copying 4GB bytes ...
>
>       38.104311 GB/sec
>
> (Both on AMD Milan.)
>
> With a change in chunk-size from 4KB to 1GB, we see the performance go
> from 13.7 GB/sec to 38.1 GB/sec. For the chunk-size of 2MB the change isn't
> quite as drastic but it is worth adding a clear_page() variant that can
> handle contiguous page-extents.
>
> Signed-off-by: Ankur Arora
> Tested-by: Raghavendra K T

Nothing jumped out at me.

Reviewed-by: David Hildenbrand (Red Hat)

-- 
Cheers

David