From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from shards.monkeyblade.net (unknown [184.105.139.130])
	by lists.ozlabs.org (Postfix) with ESMTP id 3wNKhB54WtzDqL5
	for ; Thu, 11 May 2017 01:21:11 +1000 (AEST)
Date: Wed, 10 May 2017 11:20:59 -0400 (EDT)
Message-Id: <20170510.112059.169845404310247896.davem@davemloft.net>
To: pasha.tatashin@oracle.com
Cc: mhocko@kernel.org, linux-kernel@vger.kernel.org,
	sparclinux@vger.kernel.org, linux-mm@kvack.org,
	linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
	borntraeger@de.ibm.com, heiko.carstens@de.ibm.com
Subject: Re: [v3 0/9] parallelized "struct page" zeroing
From: David Miller
In-Reply-To:
References: <3f5f1416-aa91-a2ff-cc89-b97fcaa3e4db@oracle.com>
	<20170510145726.GM31466@dhcp22.suse.cz>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
List-Id: Linux on PowerPC Developers Mail List
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,

From: Pasha Tatashin
Date: Wed, 10 May 2017 11:01:40 -0400

> Perhaps you are right, and I will measure on x86. But, I suspect hit
> can become unacceptable on some platforms: there is an overhead of
> calling a function, even if it is leaf-optimized, and there is an
> overhead in memset() to check for alignments of size and address,
> types of setting (zeroing vs. non-zeroing), etc., that adds up
> quickly.

Another source of overhead on the sparc64 side is that we must do
memory barriers around the block initializing stores. So batching
calls to memset() amortizes that as well.
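
To make the amortization argument concrete, here is a minimal user-space
sketch, not kernel code and not taken from the patch series. The names
struct fake_page, fixed_cost_events, zero_with_fixed_cost(), zero_each()
and zero_batched() are all hypothetical. The counter simply stands in for
whatever fixed cost each memset() call carries (function call, alignment
checks, and on sparc64 the barriers around the block initializing
stores); it does not measure real time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct page; the size is illustrative only. */
struct fake_page {
	unsigned long words[8];
};

/* Counts how many times the fixed per-call cost is paid (assumption:
 * one unit per call, standing in for call overhead and barriers). */
static unsigned long fixed_cost_events;

/* Models a memset() that pays a fixed cost on every invocation,
 * regardless of how many bytes it clears. */
static void zero_with_fixed_cost(void *dst, size_t len)
{
	fixed_cost_events++;
	memset(dst, 0, len);
}

/* Unbatched: one call, and so one fixed cost, per structure. */
static void zero_each(struct fake_page *pages, size_t n)
{
	assert(pages != NULL);
	for (size_t i = 0; i < n; i++)
		zero_with_fixed_cost(&pages[i], sizeof(pages[i]));
}

/* Batched: one call covers the whole contiguous array, so the fixed
 * cost is paid once and spread over all n structures. */
static void zero_batched(struct fake_page *pages, size_t n)
{
	assert(pages != NULL);
	zero_with_fixed_cost(pages, n * sizeof(*pages));
}
```

With n structures, zero_each() pays the fixed cost n times while
zero_batched() pays it once. Whether that difference matters in practice
depends on the platform and still has to be measured.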