From: Jesper Dangaard Brouer <hawk@kernel.org>
To: Yunsheng Lin <linyunsheng@huawei.com>,
davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com
Cc: liuyonglong@huawei.com, fanghaiqing@huawei.com,
zhangkun09@huawei.com, Robin Murphy <robin.murphy@arm.com>,
Alexander Duyck <alexander.duyck@gmail.com>,
IOMMU <iommu@lists.linux.dev>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
Eric Dumazet <edumazet@google.com>,
Simon Horman <horms@kernel.org>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC v4 2/3] page_pool: fix IOMMU crash when driver has already unbound
Date: Tue, 26 Nov 2024 11:22:39 +0100 [thread overview]
Message-ID: <554e768b-e990-49ff-bad4-805ee931597f@kernel.org> (raw)
In-Reply-To: <6233e2c3-3fea-4ed0-bdcc-9a625270da37@huawei.com>
On 26/11/2024 09.22, Yunsheng Lin wrote:
> On 2024/11/25 23:25, Jesper Dangaard Brouer wrote:
>
> ...
>
>>>>> +
>>>>> void page_pool_destroy(struct page_pool *pool)
>>>>> {
>>>>> if (!pool)
>>>>> @@ -1139,6 +1206,8 @@ void page_pool_destroy(struct page_pool *pool)
>>>>> */
>>>>> synchronize_rcu();
>>>>> + page_pool_inflight_unmap(pool);
>>>>> +
>>>>
>>>> Reaching here means we have detected in-flight packets/pages.
>>>>
>>>> In "page_pool_inflight_unmap" we scan and find those in-flight pages to
>>>> DMA unmap them. Then below we wait for these in-flight pages again.
>>>> Why don't we just "release" (page_pool_release_page) those in-flight
>>>> pages from belonging to the page_pool, when we found them during scanning?
>>>>
>>>> If doing so, we can hopefully remove the periodic checking code below.
>>>
>>> I thought about that too, but it means more complicated work than just
>>> calling the page_pool_release_page() as page->pp_ref_count need to be
>>> converted into page->_refcount for the above to work, it seems hard to
>>> do that with least performance degradation as the racing against
>>> page_pool_put_page() being called concurrently.
>>>
>>
>> Maybe we can have a design that avoid/reduce concurrency. Can we
>> convert the suggested pool->destroy_lock into an atomic?
>> (Doing an *atomic* READ in page_pool_return_page, should be fast if we
>> keep this cache in in (cache coherence) Shared state).
>>
>> In your new/proposed page_pool_return_page() when we see the
>> "destroy_cnt" (now atomic READ) bigger than zero, then we can do nothing
>> (or maybe we need decrement page-refcnt?), as we know the destroy code
>
> Is it valid to have a page->_refcount of zero when page_pool still own
> the page if we only decrement page->_refcount and not clear page->pp_magic?
No, page_pool keeps page->_refcount equal 1 (for "release") and also
clears page->pp_magic.
> What happens if put_page() is called from other subsystem for a page_pool
> owned page, isn't that mean the page might be returned to buddy page
> allocator, causing use-after-free problem?
>
Notice that page_pool_release_page() didn't decrement page refcnt.
It disconnects a page (from a page_pool). To allow it to be used as
a regular page (that will eventually be returned to the normal
page-allocator via put_page).
>> will be taking care of "releasing" the pages from the page pool.
>
> If page->_refcount is not decremented in page_pool_return_page(), how
> does page_pool_destroy() know if a specific page have been called with
> page_pool_return_page()? Does an extra state is needed to indicate that?
>
Good point. In page_pool_return_page(), we will need to handle the two
cases. (1) page still belongs to a page_pool, (2) page have been
released to look like a normal page. For (2) we do need to call
put_page(). For (1) yes we would either need some extra state, such
that page_pool_destroy() knows, or a lock like this patchset.
> And there might still be concurrency between checking/handling of the extra
> state in page_pool_destroy() and the setting of extra state in
> page_pool_return_page(), something like lock might still be needed to avoid
> the above concurrency.
>
I agree, we (likely) cannot avoid this lock.
>>
>> Once the a page is release from a page pool it becomes a normal page,
>> that adhere to normal page refcnt'ing. That is how it worked before with
>> page_pool_release_page().
>> The later extensions with page fragment support and devmem might have
>> complicated this code path.
>
> As page_pool_return_page() and page_pool_destroy() both try to "release"
> the page concurrently for a specific page, I am not sure how using some
> simple *atomic* can avoid this kind of concurrency even before page
> fragment and devmem are supported, it would be good to be more specific
> about that by using some pseudocode.
>
Okay, some my simple atomic idea will not work.
NEW IDEA:
So, the my concern in this patchset is that BH-disabling spin_lock
pool->destroy_lock is held in the outer loop of
page_pool_inflight_unmap() that scans all pages. Disabling BH for this
long have nasty side-effects.
Will it be enough to grab the pool->destroy_lock only when we detect a
page that belongs to our page pool? Of-cause after obtaining the lock.
the code need to recheck if the page still belongs to the pool.
> I looked at it more closely, previously page_pool_put_page() seemed to
> not be allowed to be called after page_pool_release_page() had been
> called for a specific page mainly because of concurrently checking/handlig
> and clearing of page->pp_magic if I understand it correctly:
> https://elixir.bootlin.com/linux/v5.16.20/source/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c#L5316
As I said above
The page_pool_release_page() didn't decrement page refcnt.
It disconnects a page (from a page_pool). To allow it to be used as
a regular page (that will eventually be returned to the normal
page-allocator via put_page).
--Jesper
next prev parent reply other threads:[~2024-11-26 10:22 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-20 10:34 [PATCH RFC v4 0/3] fix two bugs related to page_pool Yunsheng Lin
2024-11-20 10:34 ` [PATCH RFC v4 1/3] page_pool: fix timing for checking and disabling napi_local Yunsheng Lin
2024-12-03 2:49 ` Jakub Kicinski
2024-12-04 11:01 ` Yunsheng Lin
2024-12-05 1:28 ` Jakub Kicinski
2024-12-05 11:43 ` Yunsheng Lin
2024-12-06 0:42 ` Jakub Kicinski
2024-12-06 12:29 ` Yunsheng Lin
2024-12-06 16:09 ` Jakub Kicinski
2024-12-07 5:52 ` Yunsheng Lin
2024-11-20 10:34 ` [PATCH RFC v4 2/3] page_pool: fix IOMMU crash when driver has already unbound Yunsheng Lin
2024-11-20 15:10 ` Jesper Dangaard Brouer
2024-11-21 8:03 ` Yunsheng Lin
2024-11-25 15:25 ` Jesper Dangaard Brouer
2024-11-26 8:22 ` Yunsheng Lin
2024-11-26 10:22 ` Jesper Dangaard Brouer [this message]
2024-11-26 11:46 ` Yunsheng Lin
2024-11-26 21:51 ` Mina Almasry
2024-11-26 23:53 ` Alexander Duyck
2024-11-27 9:35 ` Yunsheng Lin
2024-11-27 15:31 ` Robin Murphy
2024-11-27 16:27 ` Alexander Duyck
2024-12-04 11:16 ` Yunsheng Lin
2024-11-27 19:39 ` Mina Almasry
2024-11-20 10:34 ` [PATCH RFC v4 3/3] page_pool: skip dma sync operation for inflight pages Yunsheng Lin
2024-11-20 16:17 ` Robin Murphy
2024-11-21 8:04 ` Yunsheng Lin
2024-11-21 13:44 ` Robin Murphy
2024-11-22 7:20 ` Yunsheng Lin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=554e768b-e990-49ff-bad4-805ee931597f@kernel.org \
--to=hawk@kernel.org \
--cc=alexander.duyck@gmail.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=fanghaiqing@huawei.com \
--cc=horms@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=iommu@lists.linux.dev \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linyunsheng@huawei.com \
--cc=liuyonglong@huawei.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=robin.murphy@arm.com \
--cc=zhangkun09@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.