Date: Tue, 8 Dec 2020 15:34:46 -0400
From: Jason Gunthorpe
To: Joao Martins, John Hubbard, Daniel Jordan
Cc: linux-mm@kvack.org, Dan Williams, Ira Weiny, linux-nvdimm@lists.01.org,
    Matthew Wilcox, Jane Chu, Muchun Song, Mike Kravetz, Andrew Morton
Subject: Re: [PATCH RFC 7/9] mm/gup: Decrement head page once for group of subpages
Message-ID: <20201208193446.GP5487@ziepe.ca>
References: <20201208172901.17384-1-joao.m.martins@oracle.com>
 <20201208172901.17384-9-joao.m.martins@oracle.com>
In-Reply-To: <20201208172901.17384-9-joao.m.martins@oracle.com>

On Tue, Dec 08, 2020 at 05:28:59PM +0000, Joao Martins wrote:
> Rather than decrementing the ref count one by one, we walk the page
> array and check which pages belong to the same compound_head. Later on
> we decrement the calculated amount of references in a single write to
> the head page.
>
> Signed-off-by: Joao Martins
>
>  mm/gup.c | 41 ++++++++++++++++++++++++++++++++---------
>  1 file changed, 32 insertions(+), 9 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 194e6981eb03..3a9a7229f418 100644
> +++ b/mm/gup.c
> @@ -212,6 +212,18 @@ static bool __unpin_devmap_managed_user_page(struct page *page)
>  }
>  #endif /* CONFIG_DEV_PAGEMAP_OPS */
>
> +static int record_refs(struct page **pages, int npages)
> +{
> +	struct page *head = compound_head(pages[0]);
> +	int refs = 1, index;
> +
> +	for (index = 1; index < npages; index++, refs++)
> +		if (compound_head(pages[index]) != head)
> +			break;
> +
> +	return refs;
> +}
> +
>  /**
>   * unpin_user_page() - release a dma-pinned page
>   * @page: pointer to page to be released
> @@ -221,9 +233,9 @@ static bool __unpin_devmap_managed_user_page(struct page *page)
>   * that such pages can be separately tracked and uniquely handled. In
>   * particular, interactions with RDMA and filesystems need special handling.
>   */
> -void unpin_user_page(struct page *page)
> +static void __unpin_user_page(struct page *page, int refs)

Refs should be unsigned everywhere.

I suggest using clear language: 'page' here should always be a compound
head called 'head' (or do we have another common variable name for
this?)
'refs' is the number of tail pages within the compound, so 'ntails' or
something.

> {
> -	int refs = 1;
> +	int orig_refs = refs;
>
>  	page = compound_head(page);

Caller should always do this

> @@ -237,14 +249,19 @@ void unpin_user_page(struct page *page)
>  		return;
>
>  	if (hpage_pincount_available(page))
> -		hpage_pincount_sub(page, 1);
> +		hpage_pincount_sub(page, refs);
>  	else
> -		refs = GUP_PIN_COUNTING_BIAS;
> +		refs *= GUP_PIN_COUNTING_BIAS;
>
>  	if (page_ref_sub_and_test(page, refs))
>  		__put_page(page);
>
> -	mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED, 1);
> +	mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED, orig_refs);
> +}

And really this should be placed directly after try_grab_compound_head()
and be given a similar name, 'unpin_compound_head()'. Even better would
be to split the FOLL_PIN part into its own function so there is a clear
logical pairing.

And reviewing it like that, I want to ask if this unpin sequence is in
the right order.. I would expect it to be the reverse order of the get.

John?

Is it safe to call mod_node_page_state() after releasing the refcount?
This could race with hot-unplugging the struct pages, so I think it is
wrong.

> +void unpin_user_page(struct page *page)
> +{
> +	__unpin_user_page(page, 1);

Thus this is

   __unpin_user_page(compound_head(page), 1);

> @@ -274,6 +291,7 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>  				  bool make_dirty)
>  {
>  	unsigned long index;
> +	int refs = 1;
>
>  	/*
>  	 * TODO: this can be optimized for huge pages: if a series of pages is
> @@ -286,8 +304,9 @@ void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>  		return;
>  	}
>
> -	for (index = 0; index < npages; index++) {
> +	for (index = 0; index < npages; index += refs) {
>  		struct page *page = compound_head(pages[index]);
> +

I think this is really hard to read; it should end up as something like:

   for_each_compound_head(page_list, page_list_len, &head, &ntails) {
           if (!PageDirty(head))
                   set_page_dirty_lock(head, ntails);
           unpin_user_page(head, ntails);
   }

And maybe you open code that iteration, but that basic idea of finding a
compound_head and its ntails should be the computational work performed
here.

No reason not to fix set_page_dirty_lock() too while you are here.

Also, this patch and the next can be completely independent of the rest
of the series; they are valuable regardless of the other tricks. You can
split them out and progress them independently.

.. and I was just talking about this with Daniel Jordan and some other
people at your company :)

Thanks,
Jason
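
[Archive note, not part of the original mail: a minimal sketch of the
compound-head grouping suggested in the review above. The helper name
count_ntails() is invented here to follow the 'ntails' naming idea;
__unpin_user_page() is the helper introduced by the patch under review,
and set_page_dirty_lock() keeps its existing one-argument form rather
than the two-argument variant shown in the review's sketch.]

   /*
    * Hypothetical sketch only: group a pinned page array into
    * (head, ntails) runs so each compound head is touched once.
    */
   static unsigned int count_ntails(struct page **pages, unsigned long i,
                                    unsigned long npages, struct page **head)
   {
           struct page *h = compound_head(pages[i]);
           unsigned long nr;

           /* count how many consecutive entries share this head */
           for (nr = i + 1; nr < npages; nr++)
                   if (compound_head(pages[nr]) != h)
                           break;

           *head = h;
           return nr - i;
   }

   void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
                                    bool make_dirty)
   {
           unsigned long i;
           unsigned int ntails;
           struct page *head;

           for (i = 0; i < npages; i += ntails) {
                   ntails = count_ntails(pages, i, npages, &head);
                   if (make_dirty && !PageDirty(head))
                           set_page_dirty_lock(head);
                   /* release all pins for this run with one update */
                   __unpin_user_page(head, ntails);
           }
   }

Whether the grouping lives in an explicit for_each_compound_head()-style
iterator or is open coded as above, the point is the same: compute the
head and ntails once per compound page and release the pin references in
a single write, rather than once per subpage.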