From: Jesper Dangaard Brouer <hawk@kernel.org>
To: Yunsheng Lin <linyunsheng@huawei.com>,
davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com
Cc: liuyonglong@huawei.com, fanghaiqing@huawei.com,
zhangkun09@huawei.com, Robin Murphy <robin.murphy@arm.com>,
Alexander Duyck <alexander.duyck@gmail.com>,
IOMMU <iommu@lists.linux.dev>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
Eric Dumazet <edumazet@google.com>,
Simon Horman <horms@kernel.org>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC v4 2/3] page_pool: fix IOMMU crash when driver has already unbound
Date: Tue, 26 Nov 2024 11:22:39 +0100 [thread overview]
Message-ID: <554e768b-e990-49ff-bad4-805ee931597f@kernel.org> (raw)
In-Reply-To: <6233e2c3-3fea-4ed0-bdcc-9a625270da37@huawei.com>
On 26/11/2024 09.22, Yunsheng Lin wrote:
> On 2024/11/25 23:25, Jesper Dangaard Brouer wrote:
>
> ...
>
>>>>> +
>>>>> void page_pool_destroy(struct page_pool *pool)
>>>>> {
>>>>> if (!pool)
>>>>> @@ -1139,6 +1206,8 @@ void page_pool_destroy(struct page_pool *pool)
>>>>> */
>>>>> synchronize_rcu();
>>>>> + page_pool_inflight_unmap(pool);
>>>>> +
>>>>
>>>> Reaching here means we have detected in-flight packets/pages.
>>>>
>>>> In "page_pool_inflight_unmap" we scan and find those in-flight pages to
>>>> DMA unmap them. Then below we wait for these in-flight pages again.
>>>> Why don't we just "release" (page_pool_release_page) those in-flight
>>>> pages from belonging to the page_pool, when we found them during scanning?
>>>>
>>>> If doing so, we can hopefully remove the periodic checking code below.
>>>
>>> I thought about that too, but it means more complicated work than just
>>> calling the page_pool_release_page() as page->pp_ref_count need to be
>>> converted into page->_refcount for the above to work, it seems hard to
>>> do that with least performance degradation as the racing against
>>> page_pool_put_page() being called concurrently.
>>>
>>
>> Maybe we can have a design that avoid/reduce concurrency. Can we
>> convert the suggested pool->destroy_lock into an atomic?
>> (Doing an *atomic* READ in page_pool_return_page, should be fast if we
>> keep this cache in in (cache coherence) Shared state).
>>
>> In your new/proposed page_pool_return_page() when we see the
>> "destroy_cnt" (now atomic READ) bigger than zero, then we can do nothing
>> (or maybe we need decrement page-refcnt?), as we know the destroy code
>
> Is it valid to have a page->_refcount of zero when page_pool still own
> the page if we only decrement page->_refcount and not clear page->pp_magic?
No, page_pool keeps page->_refcount equal 1 (for "release") and also
clears page->pp_magic.
> What happens if put_page() is called from other subsystem for a page_pool
> owned page, isn't that mean the page might be returned to buddy page
> allocator, causing use-after-free problem?
>
Notice that page_pool_release_page() didn't decrement page refcnt.
It disconnects a page (from a page_pool). To allow it to be used as
a regular page (that will eventually be returned to the normal
page-allocator via put_page).
>> will be taking care of "releasing" the pages from the page pool.
>
> If page->_refcount is not decremented in page_pool_return_page(), how
> does page_pool_destroy() know if a specific page have been called with
> page_pool_return_page()? Does an extra state is needed to indicate that?
>
Good point. In page_pool_return_page(), we will need to handle the two
cases. (1) page still belongs to a page_pool, (2) page have been
released to look like a normal page. For (2) we do need to call
put_page(). For (1) yes we would either need some extra state, such
that page_pool_destroy() knows, or a lock like this patchset.
> And there might still be concurrency between checking/handling of the extra
> state in page_pool_destroy() and the setting of extra state in
> page_pool_return_page(), something like lock might still be needed to avoid
> the above concurrency.
>
I agree, we (likely) cannot avoid this lock.
>>
>> Once the a page is release from a page pool it becomes a normal page,
>> that adhere to normal page refcnt'ing. That is how it worked before with
>> page_pool_release_page().
>> The later extensions with page fragment support and devmem might have
>> complicated this code path.
>
> As page_pool_return_page() and page_pool_destroy() both try to "release"
> the page concurrently for a specific page, I am not sure how using some
> simple *atomic* can avoid this kind of concurrency even before page
> fragment and devmem are supported, it would be good to be more specific
> about that by using some pseudocode.
>
Okay, some my simple atomic idea will not work.
NEW IDEA:
So, the my concern in this patchset is that BH-disabling spin_lock
pool->destroy_lock is held in the outer loop of
page_pool_inflight_unmap() that scans all pages. Disabling BH for this
long have nasty side-effects.
Will it be enough to grab the pool->destroy_lock only when we detect a
page that belongs to our page pool? Of-cause after obtaining the lock.
the code need to recheck if the page still belongs to the pool.
> I looked at it more closely, previously page_pool_put_page() seemed to
> not be allowed to be called after page_pool_release_page() had been
> called for a specific page mainly because of concurrently checking/handlig
> and clearing of page->pp_magic if I understand it correctly:
> https://elixir.bootlin.com/linux/v5.16.20/source/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c#L5316
As I said above
The page_pool_release_page() didn't decrement page refcnt.
It disconnects a page (from a page_pool). To allow it to be used as
a regular page (that will eventually be returned to the normal
page-allocator via put_page).
--Jesper
next prev parent reply other threads:[~2024-11-26 10:22 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20241120103456.396577-1-linyunsheng@huawei.com>
2024-11-20 10:34 ` [PATCH RFC v4 1/3] page_pool: fix timing for checking and disabling napi_local Yunsheng Lin
2024-12-03 2:49 ` Jakub Kicinski
2024-12-04 11:01 ` Yunsheng Lin
2024-12-05 1:28 ` Jakub Kicinski
2024-12-05 11:43 ` Yunsheng Lin
2024-12-06 0:42 ` Jakub Kicinski
2024-12-06 12:29 ` Yunsheng Lin
2024-12-06 16:09 ` Jakub Kicinski
2024-12-07 5:52 ` Yunsheng Lin
2024-11-20 10:34 ` [PATCH RFC v4 2/3] page_pool: fix IOMMU crash when driver has already unbound Yunsheng Lin
2024-11-20 15:10 ` Jesper Dangaard Brouer
2024-11-21 8:03 ` Yunsheng Lin
2024-11-25 15:25 ` Jesper Dangaard Brouer
2024-11-26 8:22 ` Yunsheng Lin
2024-11-26 10:22 ` Jesper Dangaard Brouer [this message]
2024-11-26 11:46 ` Yunsheng Lin
2024-11-26 21:51 ` Mina Almasry
2024-11-26 23:53 ` Alexander Duyck
2024-11-27 9:35 ` Yunsheng Lin
2024-11-27 15:31 ` Robin Murphy
2024-11-27 16:27 ` Alexander Duyck
2024-12-04 11:16 ` Yunsheng Lin
2024-11-27 19:39 ` Mina Almasry
2024-11-20 10:34 ` [PATCH RFC v4 3/3] page_pool: skip dma sync operation for inflight pages Yunsheng Lin
2024-11-20 16:17 ` Robin Murphy
2024-11-21 8:04 ` Yunsheng Lin
2024-11-21 13:44 ` Robin Murphy
2024-11-22 7:20 ` Yunsheng Lin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=554e768b-e990-49ff-bad4-805ee931597f@kernel.org \
--to=hawk@kernel.org \
--cc=alexander.duyck@gmail.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=fanghaiqing@huawei.com \
--cc=horms@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=iommu@lists.linux.dev \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linyunsheng@huawei.com \
--cc=liuyonglong@huawei.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=robin.murphy@arm.com \
--cc=zhangkun09@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).