Subject: Re: [v3 0/9] parallelized "struct page" zeroing
From: Pasha Tatashin
To: David Miller
Cc: mhocko@kernel.org, linux-kernel@vger.kernel.org, sparclinux@vger.kernel.org,
    linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
    borntraeger@de.ibm.com, heiko.carstens@de.ibm.com
Date: Fri, 12 May 2017 13:24:52 -0400
Message-ID: <6da8d4a6-3332-8331-c329-b05efd88a70d@oracle.com>
In-Reply-To: <20170512.125708.475573831936972365.davem@davemloft.net>
References: <9088ad7e-8b3b-8eba-2fdf-7b0e36e4582e@oracle.com>
 <65b8a658-76d1-0617-ece8-ff7a3c1c4046@oracle.com>
 <20170512.125708.475573831936972365.davem@davemloft.net>

On 05/12/2017 12:57 PM, David Miller wrote:
> From: Pasha Tatashin
> Date: Thu, 11 May 2017 16:59:33 -0400
>
>> We should either keep memset() only for deferred struct pages, as I
>> have in my patches.
>>
>> Another option is to add a new function, struct_page_clear(), which
>> would default to memset() and to something else on platforms that
>> decide to optimize it.
>>
>> On SPARC it would use STBIs, and we would do one membar call after
>> all "struct pages" are initialized.
>
> No membars will be performed for a single individual page struct
> clear; the cutoff to use the STBI is larger than that.

Right now the cutoff is larger, but what I suggested is to add a new
optimized routine just for this case, which would do an STBI for the
64-byte struct page without a membar (and issue the membar once, at the
end of memmap_init_zone() and deferred_init_memmap()):

#define struct_page_clear(page)					\
	__asm__ __volatile__(					\
	"stxa	%%g0, [%0]%2\n"					\
	"stxa	%%g0, [%0 + %1]%2\n"				\
	: /* No output */					\
	: "r" (page), "r" (0x20), "i" (ASI_BLK_INIT_QUAD_LDD_P))

And insert it into __init_single_page() instead of memset().

The final result is 4.01s/T, which is even faster than the current
4.97s/T.

Pasha
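
For illustration, here is a rough sketch of how the proposal above could be
wired together: a struct_page_clear() that defaults to memset(), the sparc64
block-init variant, and a single barrier issued after the whole loop. This is
not code from the v3 series; the struct_page_clear_flush() helper name, the
"memory" clobbers, and the "membar #Sync" flavor are my assumptions.

#include <linux/string.h>		/* memset() */

#ifdef CONFIG_SPARC64
#include <asm/asi.h>			/* ASI_BLK_INIT_QUAD_LDD_P */

/*
 * sparc64: two block-init stores cover the 64-byte struct page without a
 * read-modify-write of the cache line.  No membar here; one membar is
 * issued after all struct pages in the range have been initialized.
 */
#define struct_page_clear(page)						\
	__asm__ __volatile__(						\
	"stxa	%%g0, [%0]%2\n"						\
	"stxa	%%g0, [%0 + %1]%2\n"					\
	: /* no outputs */						\
	: "r" (page), "r" (0x20), "i" (ASI_BLK_INIT_QUAD_LDD_P)	\
	: "memory")

#define struct_page_clear_flush()					\
	__asm__ __volatile__("membar	#Sync" : : : "memory")

#else  /* !CONFIG_SPARC64 */

/* Generic fallback: the same memset() as today, and the flush is a no-op. */
#define struct_page_clear(page)		memset((page), 0, sizeof(struct page))
#define struct_page_clear_flush()	do { } while (0)

#endif /* CONFIG_SPARC64 */

A caller would then look roughly like this (signature as in the 4.11-era
mm/page_alloc.c, body abbreviated):

static void __meminit __init_single_page(struct page *page, unsigned long pfn,
					 unsigned long zone, int nid)
{
	struct_page_clear(page);	/* instead of memset(page, 0, sizeof(*page)) */
	set_page_links(page, zone, nid, pfn);
	init_page_count(page);
	/* ... rest of the per-page initialization unchanged ... */
}

while memmap_init_zone() and deferred_init_memmap() would each call
struct_page_clear_flush() once, after their loop over the pfn range, so a
single membar covers all of the block-init stores.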