Re: cma_alloc(), add sleep-and-retry for temporary page pinning

All of lore.kernel.org
 help / color / mirror / Atom feed

From: cgoldswo@codeaurora.org
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-arm-msm@vger.kernel.org,
	linux-kernel@vger.kernel.org, pratikp@codeaurora.org,
	pdaly@codeaurora.org, sudraja@codeaurora.org,
	iamjoonsoo.kim@lge.com, linux-arm-msm-owner@vger.kernel.org
Subject: Re: cma_alloc(), add sleep-and-retry for temporary page pinning
Date: Thu, 20 Aug 2020 21:17:28 -0700	[thread overview]
Message-ID: <896458e8daf87a274ba1ce8ced30ac8e@codeaurora.org> (raw)
In-Reply-To: <896f92e8c37936e7cb2914e79273e9e8@codeaurora.org>

On 2020-08-11 15:20, cgoldswo@codeaurora.org wrote:
> On 2020-08-06 18:31, Andrew Morton wrote:
>> On Wed,  5 Aug 2020 19:56:21 -0700 Chris Goldsworthy
>> <cgoldswo@codeaurora.org> wrote:
>> 
>>> On mobile devices, failure to allocate from a CMA area constitutes a
>>> functional failure.  Sometimes during CMA allocations, we have 
>>> observed
>>> that pages in a CMA area allocated through alloc_pages(), that we're 
>>> trying
>>> to migrate away to make room for a CMA allocation, are temporarily 
>>> pinned.
>>> This temporary pinning can occur when a process that owns the pinned 
>>> page
>>> is being forked (the example is explained further in the commit 
>>> text).
>>> This patch addresses this issue by adding a sleep-and-retry loop in
>>> cma_alloc() . There's another example we know of similar to the above 
>>> that
>>> occurs during exit_mmap() (in zap_pte_range() specifically), but I 
>>> need to
>>> determine if this is still relevant today.
>> 
> 
>> Sounds fairly serious but boy, we're late for 5.9.
>> 
>> I can queue it for 5.10 with a cc:stable so that it gets backported
>> into earlier kernels a couple of months from now, if we think the
>> seriousness justifies backporting(?).
>> 
> 
> Queuing this seems like the best way to proceed, if we were to pick up
> this patch.
> I think we can forgo back-porting this, as this is something that will 
> only be
> needed as vendors such as our selves start using Google's Generic 
> Kernel Image
> (we've carried this patch in our tree for over four years).
> 
>> 
>> And...  it really is a sad little patch, isn't it?  Instead of fixing
>> the problem, it reduces the problem's probability by 5x.  Can't we do
>> better than this?
> 
> I have one alternative in mind.  I have been able to review the 
> exit_mmap()
> case, so before proceeding, let's do a breakdown of the problem: we can
> categorize the pinning issue we're trying to address here as being one 
> of
> (1) incrementing _refcount and getting context-switched out before
> incrementing _mapcount (applies to forking a process / copy_one_pte()), 
> and
> (2) decrementing _mapcount and getting context-switched out before
> decrementing _refcount (applies to tearing down a process / 
> exit_mmap()).
> So, one alternative would be to insert preempt_disable/enable() calls 
> at
> affected sites. So, for the copy_one_pte() pinning case, we could do 
> the
> following inside of copy_one_pte():
> 
>         if (page) {
> +               preempt_disable();
>                 get_page(page);
>                 page_dup_rmap(page, false);
> +               preempt_enable();
>                 rss[mm_counter(page)]++;
>         }
> 
> I'm not sure if this approach would be acceptable for the exit_mmap()
> pinning case (applicable when CONFIG_MMU_GATHER_NO_GATHER=y).  For the
> purposes of this discussion, we can look at two function calls inside 
> of
> exit_mmap(), in the order they're called in, to show how the pinning is
> occuring:
> 
>     1. Calling unmap_vmas(): this unmaps the pages in each VMA for an
>     exiting task, using zap_pte_range() - zap_pte_range() reduces the
>     _mapcount for each page in a VMA, using page_remove_rmap().  After
>     calling page_remove_rmap(), the page is placed into a list in
>     __tlb_remove_page().  This list of pages will be used when flushing
>     TLB entries later on during the process teardown.
> 
>     2. Calling tlb_finish_mmu(): This is will flush the TLB entries
>     associated with pages, before calling put_page() on them, using the
>     previously collected pages from __tlb_remove_page() - the call flow 
> is
>     tlb_flush_mmu() > tlb_flush_mmu() > tlb_flush_mmu_free()
>     > tlb_batch_pages_flush() > free_pages_and_swap_cache() >
>     release_pages(), where release_pages() is described as a "batched
>     put_page()"
> 
> The preempt_disable/enable() approach would entail doing the following
> inside of exit_mmap():
> 
> +       preempt_disable();
>         unmap_vmas(&tlb, vma, 0, -1);
>         free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, 
> USER_PGTABLES_CEILING);
>         tlb_finish_mmu(&tlb, 0, -1);
> +       preempt_enable();
> 
> I'm not sure doing this is feasible, given how long it could take to do 
> the
> process teardown.
> 
> The good thing about this patch is that it has been stable in our 
> kernel
> for four years (though for some SoCs we increased the retry counts).  
> One
> thing to stress is that there are other instances of CMA page pinning, 
> that
> this patch isn't attempting to address. Please let me know if you're 
> okay
> with queuing this for the 5.10 merge window - if you are, I can add an
> option to configure the number of retries, and will resend the patch 
> once
> the 5.9 merge window closes.
> 
> Thanks,
> 
> Chris.

Hi Andrew,

Have you been able to give the patch any further consideration?

Thanks,

Chris.

-- 
The Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,
a Linux Foundation Collaborative Project

next prev parent reply	other threads:[~2020-08-21  4:17 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-11 22:20 cma_alloc(), add sleep-and-retry for temporary page pinning cgoldswo
2020-08-21  4:17 ` cgoldswo [this message]
2020-08-21 22:01 ` Andrew Morton
2020-09-11 18:39   ` Chris Goldsworthy
2020-09-11 18:39   ` Chris Goldsworthy
  -- strict thread matches above, loose matches on Subject: below --
2020-08-06  2:56 Chris Goldsworthy
2020-08-07  1:31 ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=896458e8daf87a274ba1ce8ced30ac8e@codeaurora.org \
    --to=cgoldswo@codeaurora.org \
    --cc=akpm@linux-foundation.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linux-arm-msm-owner@vger.kernel.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=pdaly@codeaurora.org \
    --cc=pratikp@codeaurora.org \
    --cc=sudraja@codeaurora.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.