From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751200AbdE3RRK (ORCPT ); Tue, 30 May 2017 13:17:10 -0400
Received: from userp1040.oracle.com ([156.151.31.81]:31929 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750908AbdE3RRI (ORCPT ); Tue, 30 May 2017 13:17:08 -0400
Subject: Re: [v3 0/9] parallelized "struct page" zeroing
To: Michal Hocko
Cc: linux-kernel@vger.kernel.org, sparclinux@vger.kernel.org, linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, borntraeger@de.ibm.com, heiko.carstens@de.ibm.com, davem@davemloft.net
References: <1494003796-748672-1-git-send-email-pasha.tatashin@oracle.com> <20170509181234.GA4397@dhcp22.suse.cz> <20170515193817.GC7551@dhcp22.suse.cz> <9b3d68aa-d2b6-2b02-4e75-f8372cbeb041@oracle.com> <20170516083601.GB2481@dhcp22.suse.cz> <07a6772b-711d-4fdc-f688-db76f1ec4c45@oracle.com> <20170529115358.GJ19725@dhcp22.suse.cz>
From: Pasha Tatashin
Message-ID:
Date: Tue, 30 May 2017 13:16:50 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.1
MIME-Version: 1.0
In-Reply-To: <20170529115358.GJ19725@dhcp22.suse.cz>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Source-IP: userv0022.oracle.com [156.151.31.74]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

> Could you be more specific? E.g. how are other stores done in
> __init_single_page safe then? I am sorry to be dense here but how does
> the full 64B store differ from other stores done in the same function.

Hi Michal,

It is safe to do regular 8-byte and smaller stores (stx, st, sth, stb) without a membar, but they are slower than STBI, which requires a membar before the memory can be accessed. So when we zero a larger span of memory on SPARC, it is faster to use STBI and do one membar at the end.
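
To illustrate the difference, here is a minimal sketch in portable C. It is not the patch code: struct fake_page, init_each(), zero_then_init() and pages_equal() are made-up names, memset() only stands in for a loop of block stores, and atomic_thread_fence() only stands in for the SPARC membar. The field values (refcount 1, mapcount -1) are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for "struct page"; most fields end up zero. */
struct fake_page {
    unsigned long flags;
    int refcount;
    int mapcount;
    unsigned long other[6];
};

/* Approach 1: write every field of every entry with ordinary stores.
 * No barrier is needed, but each entry costs many small stores. */
void init_each(struct fake_page *p, size_t n)
{
    assert(p != NULL);
    for (size_t i = 0; i < n; i++) {
        p[i].flags = 0;
        p[i].refcount = 1;
        p[i].mapcount = -1;
        for (size_t j = 0; j < 6; j++)
            p[i].other[j] = 0;
    }
}

/* Approach 2: zero the whole range in bulk, issue one barrier,
 * then store only the fields that must be non-zero. */
void zero_then_init(struct fake_page *p, size_t n)
{
    assert(p != NULL);
    memset(p, 0, n * sizeof(*p));               /* stand-in for block stores */
    atomic_thread_fence(memory_order_seq_cst);  /* stand-in for one membar */
    for (size_t i = 0; i < n; i++) {
        p[i].refcount = 1;
        p[i].mapcount = -1;
    }
}

/* Field-by-field comparison, so struct padding (if any) is ignored. */
int pages_equal(const struct fake_page *a, const struct fake_page *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i].flags != b[i].flags ||
            a[i].refcount != b[i].refcount ||
            a[i].mapcount != b[i].mapcount ||
            memcmp(a[i].other, b[i].other, sizeof(a[i].other)) != 0)
            return 0;
    }
    return 1;
}
```

Both functions leave the array in the same state. The second one pays for a single barrier but touches only two fields per entry after the bulk zeroing.
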
This is why, for a single thread, it is faster to zero multiple pages of memory and then initialize only the fields that are needed in "struct page". I believe the same is true for ppc64, as they clear a whole 128-byte cacheline at a time with larger memsets.

Pasha