public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed
From: Luiz Capitulino <luizcap@redhat.com>
To: Mel Gorman <mgorman@techsingularity.net>,
	Yunsheng Lin <linyunsheng@huawei.com>
Cc: linux-mm@kvack.org, willy@infradead.org, david@redhat.com,
	linux-kernel@vger.kernel.org, lcapitulino@gmail.com
Subject: Re: [PATCH v2 1/2] mm: alloc_pages_bulk_noprof: drop page_list argument
Date: Fri, 3 Jan 2025 09:46:56 -0500	[thread overview]
Message-ID: <e55694ce-7562-4e73-920a-8c03eb37c3c2@redhat.com> (raw)
In-Reply-To: <rat5ihytbh4r7w476ezwxrzjzloilqygpb3dgjg7ewwmm7og2s@6osaxcvgd7ys>

On 2025-01-03 09:27, Mel Gorman wrote:
> On Fri, Jan 03, 2025 at 07:29:30PM +0800, Yunsheng Lin wrote:
>> On 2025/1/3 4:00, Mel Gorman wrote:
>>> On Wed, Dec 25, 2024 at 08:36:04PM +0800, Yunsheng Lin wrote:
>>>> On 2024/12/24 6:00, Luiz Capitulino wrote:
>>>>
>>>>>   /*
>>>>> - * __alloc_pages_bulk - Allocate a number of order-0 pages to a list or array
>>>>> + * __alloc_pages_bulk - Allocate a number of order-0 pages to an array
>>>>>    * @gfp: GFP flags for the allocation
>>>>>    * @preferred_nid: The preferred NUMA node ID to allocate from
>>>>>    * @nodemask: Set of nodes to allocate from, may be NULL
>>>>> - * @nr_pages: The number of pages desired on the list or array
>>>>> - * @page_list: Optional list to store the allocated pages
>>>>> - * @page_array: Optional array to store the pages
>>>>> + * @nr_pages: The number of pages desired in the array
>>>>> + * @page_array: Array to store the pages
>>>>>    *
>>>>>    * This is a batched version of the page allocator that attempts to
>>>>> - * allocate nr_pages quickly. Pages are added to page_list if page_list
>>>>> - * is not NULL, otherwise it is assumed that the page_array is valid.
>>>>> + * allocate nr_pages quickly. Pages are added to the page_array.
>>>>>    *
>>>>> - * For lists, nr_pages is the number of pages that should be allocated.
>>>>> - *
>>>>> - * For arrays, only NULL elements are populated with pages and nr_pages
>>>>> + * Note that only NULL elements are populated with pages and nr_pages
>>>>
>>>> It is not really related to this patch, but while we are at this, the above
>>>> seems like an odd behavior. By roughly looking at all the callers of that
>>>> API, it seems like only the below callers rely on that?
>>>> fs/erofs/zutil.c: z_erofs_gbuf_growsize()
>>>> fs/xfs/xfs_buf.c: xfs_buf_alloc_pages()
>>>>
>>>> It seems it is quite straight forward to change the above callers to not
>>>> rely on the above behavior, and we might be able to avoid more checking
>>>> by removing the above behavior?
>>>>
>>>
>>> It was implemented that way for an early user, net/sunrpc/svc_xprt.c.
>>> The behaviour removes a burden from the caller to track the number of
>>> populated elements and then pass the exact number of pages that must be
>>> allocated. If the API does not handle that detail, each caller needs
>>> similar state tracking implementations. As the overhead is going to be
>>> the same whether the API implements it once or each caller implements
>>> there own, it is simplier if there is just one implementation.
>>
>> It seems it is quite straight forward to change the above use case to
>> not rely on that by something like below?
>>
>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>> index 43c57124de52..52800bfddc86 100644
>> --- a/net/sunrpc/svc_xprt.c
>> +++ b/net/sunrpc/svc_xprt.c
>> @@ -670,19 +670,21 @@ static bool svc_alloc_arg(struct svc_rqst *rqstp)
>>                  pages = RPCSVC_MAXPAGES;
>>          }
>>
>> -       for (filled = 0; filled < pages; filled = ret) {
>> -               ret = alloc_pages_bulk_array(GFP_KERNEL, pages,
>> -                                            rqstp->rq_pages);
>> -               if (ret > filled)
>> +       for (filled = 0; filled < pages;) {
>> +               ret = alloc_pages_bulk_array(GFP_KERNEL, pages - filled,
>> +                                            rqstp->rq_pages + filled);
>> +               if (ret) {
>> +                       filled += ret;
>>                          /* Made progress, don't sleep yet */
>>                          continue;
>> +               }
>>
>>                  set_current_state(TASK_IDLE);
>>                  if (svc_thread_should_stop(rqstp)) {
>>                          set_current_state(TASK_RUNNING);
>>                          return false;
>>                  }
>> -               trace_svc_alloc_arg_err(pages, ret);
>> +               trace_svc_alloc_arg_err(pages, filled);
>>                  memalloc_retry_wait(GFP_KERNEL);
>>          }
>>          rqstp->rq_page_end = &rqstp->rq_pages[pages];
>>
> 
> The API implementation would also need to change to make this work as the
> return value is a number of pages that are on the array, not the number of
> new pages allocated. Even if fixed, it still moves cost and complexity to
> the caller and the API is harder to use and easier to make mistakes. That
> shift in responsibility and the maintenance burden would need to be
> justified. While it is possible to use wrappers to allow callers to decide
> whether to manage the tracking or let the API handle it, the requirement
> then is to show that there is a performance gain for a common use case.

I agree with all your points, but I think there's value in simplifying
alloc_pages_bulk_noprof() if the wrappers turn out not be complex or
difficult to use.

If all users of this case always fill the array sequentially, then
maybe we could just have a wrapper that scans the array first (this
is the first loop in alloc_pages_bulk_noprof()) or maybe callers
could keep track of 'filled' (via a pointer or macro usage). But
this is just brainstorming, we'd need to see the resulting code to
know if it works and makes sense.

> This is outside the scope of a serise that removes the page_list
> argument. Even if a series was proposed to shift responsibility to the
> caller, I would not expect it to be submitted with a page_list removal.

Yes, absolutely.



  reply	other threads:[~2025-01-03 14:47 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-23 22:00 [PATCH v2 0/2] mm: alloc_pages_bulk: small API refactor Luiz Capitulino
2024-12-23 22:00 ` [PATCH v2 1/2] mm: alloc_pages_bulk_noprof: drop page_list argument Luiz Capitulino
2024-12-25 12:36   ` Yunsheng Lin
2025-01-02 16:38     ` Luiz Capitulino
2025-01-03 11:21       ` Yunsheng Lin
2025-01-03 14:12         ` Luiz Capitulino
2025-01-02 20:00     ` Mel Gorman
2025-01-03 11:29       ` Yunsheng Lin
2025-01-03 14:27         ` Mel Gorman
2025-01-03 14:46           ` Luiz Capitulino [this message]
2025-01-08 13:39   ` David Hildenbrand
2024-12-23 22:00 ` [PATCH v2 2/2] mm: alloc_pages_bulk: rename API Luiz Capitulino
2025-01-08 13:40   ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e55694ce-7562-4e73-920a-8c03eb37c3c2@redhat.com \
    --to=luizcap@redhat.com \
    --cc=david@redhat.com \
    --cc=lcapitulino@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linyunsheng@huawei.com \
    --cc=mgorman@techsingularity.net \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox