From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
To: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux.dev,
"Michael S. Tsirkin" <mst@redhat.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
netdev@vger.kernel.org, Jesper Dangaard Brouer <hawk@kernel.org>
Subject: Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
Date: Wed, 17 Apr 2024 16:20:55 +0800 [thread overview]
Message-ID: <1713342055.436048-1-xuanzhuo@linux.alibaba.com> (raw)
In-Reply-To: <CACGkMEvjwXpF_mLR3H8ZW9PUE+3spcxKMQV1VvUARb0-Lt7NKQ@mail.gmail.com>
On Wed, 17 Apr 2024 12:08:10 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Apr 17, 2024 at 9:38 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Tue, 16 Apr 2024 11:24:53 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Mon, Apr 15, 2024 at 5:04 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Mon, 15 Apr 2024 16:56:45 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Mon, Apr 15, 2024 at 4:50 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Now, we chain the pages of big mode by the page's private variable.
> > > > > > > > > > > > But a subsequent patch aims to make the big mode to support
> > > > > > > > > > > > premapped mode. This requires additional space to store the dma addr.
> > > > > > > > > > > >
> > > > > > > > > > > > Within the sub-struct that contains the 'private', there is no suitable
> > > > > > > > > > > > variable for storing the DMA addr.
> > > > > > > > > > > >
> > > > > > > > > > > > struct { /* Page cache and anonymous pages */
> > > > > > > > > > > > /**
> > > > > > > > > > > > * @lru: Pageout list, eg. active_list protected by
> > > > > > > > > > > > * lruvec->lru_lock. Sometimes used as a generic list
> > > > > > > > > > > > * by the page owner.
> > > > > > > > > > > > */
> > > > > > > > > > > > union {
> > > > > > > > > > > > struct list_head lru;
> > > > > > > > > > > >
> > > > > > > > > > > > /* Or, for the Unevictable "LRU list" slot */
> > > > > > > > > > > > struct {
> > > > > > > > > > > > /* Always even, to negate PageTail */
> > > > > > > > > > > > void *__filler;
> > > > > > > > > > > > /* Count page's or folio's mlocks */
> > > > > > > > > > > > unsigned int mlock_count;
> > > > > > > > > > > > };
> > > > > > > > > > > >
> > > > > > > > > > > > /* Or, free page */
> > > > > > > > > > > > struct list_head buddy_list;
> > > > > > > > > > > > struct list_head pcp_list;
> > > > > > > > > > > > };
> > > > > > > > > > > > /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > > > > > > > > > struct address_space *mapping;
> > > > > > > > > > > > union {
> > > > > > > > > > > > pgoff_t index; /* Our offset within mapping. */
> > > > > > > > > > > > unsigned long share; /* share count for fsdax */
> > > > > > > > > > > > };
> > > > > > > > > > > > /**
> > > > > > > > > > > > * @private: Mapping-private opaque data.
> > > > > > > > > > > > * Usually used for buffer_heads if PagePrivate.
> > > > > > > > > > > > * Used for swp_entry_t if PageSwapCache.
> > > > > > > > > > > > * Indicates order in the buddy system if PageBuddy.
> > > > > > > > > > > > */
> > > > > > > > > > > > unsigned long private;
> > > > > > > > > > > > };
> > > > > > > > > > > >
> > > > > > > > > > > > But within the page pool struct, we have a variable called
> > > > > > > > > > > > dma_addr that is appropriate for storing dma addr.
> > > > > > > > > > > > And that struct is used by netstack. That works to our advantage.
> > > > > > > > > > > >
> > > > > > > > > > > > struct { /* page_pool used by netstack */
> > > > > > > > > > > > /**
> > > > > > > > > > > > * @pp_magic: magic value to avoid recycling non
> > > > > > > > > > > > * page_pool allocated pages.
> > > > > > > > > > > > */
> > > > > > > > > > > > unsigned long pp_magic;
> > > > > > > > > > > > struct page_pool *pp;
> > > > > > > > > > > > unsigned long _pp_mapping_pad;
> > > > > > > > > > > > unsigned long dma_addr;
> > > > > > > > > > > > atomic_long_t pp_ref_count;
> > > > > > > > > > > > };
> > > > > > > > > > > >
> > > > > > > > > > > > On the other side, we should use variables from the same sub-struct.
> > > > > > > > > > > > So this patch replaces the "private" with "pp".
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > > > ---
> > > > > > > > > > >
> > > > > > > > > > > Instead of doing a customized version of page pool, can we simply
> > > > > > > > > > > switch to use page pool for big mode instead? Then we don't need to
> > > > > > > > > > > bother the dma stuffs.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > The page pool needs to do the dma by the DMA APIs.
> > > > > > > > > > So we can not use the page pool directly.
> > > > > > > > >
> > > > > > > > > I found this:
> > > > > > > > >
> > > > > > > > > define PP_FLAG_DMA_MAP BIT(0) /* Should page_pool do the DMA
> > > > > > > > > * map/unmap
> > > > > > > > >
> > > > > > > > > It seems to work here?
> > > > > > > >
> > > > > > > >
> > > > > > > > I have studied the page pool mechanism and believe that we cannot use it
> > > > > > > > directly. We can make the page pool to bypass the DMA operations.
> > > > > > > > This allows us to handle DMA within virtio-net for pages allocated from the page
> > > > > > > > pool. Furthermore, we can utilize page pool helpers to associate the DMA address
> > > > > > > > to the page.
> > > > > > > >
> > > > > > > > However, the critical issue pertains to unmapping. Ideally, we want to return
> > > > > > > > the mapped pages to the page pool and reuse them. In doing so, we can omit the
> > > > > > > > unmapping and remapping steps.
> > > > > > > >
> > > > > > > > Currently, there's a caveat: when the page pool cache is full, it disconnects
> > > > > > > > and releases the pages. When the pool hits its capacity, pages are relinquished
> > > > > > > > without a chance for unmapping.
> > > > > > >
> > > > > > > Technically, when ptr_ring is full there could be a fallback, but then
> > > > > > > it requires expensive synchronization between producer and consumer.
> > > > > > > For virtio-net, it might not be a problem because add/get has been
> > > > > > > synchronized. (It might be relaxed in the future, actually we've
> > > > > > > already seen a requirement in the past for virito-blk).
> > > > > >
> > > > > > The point is that the page will be released by page pool directly,
> > > > > > we will have no change to unmap that, if we work with page pool.
> > > > >
> > > > > I mean if we have a fallback, there would be no need to release these
> > > > > pages but put them into a link list.
> > > >
> > > >
> > > > What fallback?
> > >
> > > https://lore.kernel.org/netdev/1519607771-20613-1-git-send-email-mst@redhat.com/
> > >
> > > >
> > > > If we put the pages to the link list, why we use the page pool?
> > >
> > > The size of the cache and ptr_ring needs to be fixed.
> > >
> > > Again, as explained above, it needs more benchmarks and looks like a
> > > separate topic.
> > >
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > If we were to unmap pages each time before
> > > > > > > > returning them to the pool, we would negate the benefits of bypassing the
> > > > > > > > mapping and unmapping process altogether.
> > > > > > >
> > > > > > > Yes, but the problem in this approach is that it creates a corner
> > > > > > > exception where dma_addr is used outside the page pool.
> > > > > >
> > > > > > YES. This is a corner exception. We need to introduce this case to the page
> > > > > > pool.
> > > > > >
> > > > > > So for introducing the page-pool to virtio-net(not only for big mode),
> > > > > > we may need to push the page-pool to support dma by drivers.
> > > > >
> > > > > Adding Jesper for some comments.
> > > > >
> > > > > >
> > > > > > Back to this patch set, I think we should keep the virtio-net to manage
> > > > > > the pages.
> > > > > >
> > > > > > What do you think?
> > > > >
> > > > > I might be wrong, but I think if we need to either
> > > > >
> > > > > 1) seek a way to manage the pages by yourself but not touching page
> > > > > pool metadata (or Jesper is fine with this)
> > > >
> > > > Do you mean working with page pool or not?
> > > >
> > >
> > > I meant if Jesper is fine with reusing page pool metadata like this patch.
> > >
> > > > If we manage the pages by self(no page pool), we do not care the metadata is for
> > > > page pool or not. We just use the space of pages like the "private".
> > >
> > > That's also fine.
> > >
> > > >
> > > >
> > > > > 2) optimize the unmap for page pool
> > > > >
> > > > > or even
> > > > >
> > > > > 3) just do dma_unmap before returning the page back to the page pool,
> > > > > we don't get all the benefits of page pool but we end up with simple
> > > > > codes (no fallback for premapping).
> > > >
> > > > I am ok for this.
> > >
> > > Right, we just need to make sure there's no performance regression,
> > > then it would be fine.
> > >
> > > I see for example mana did this as well.
> >
> > I think we should not use page pool directly now,
> > because the mana does not need a space to store the dma address.
> > We need to store the dma address for unmapping.
> >
> > If we use page pool without PP_FLAG_DMA_MAP, then store the dma address by
> > page.dma_addr, I think that is not safe.
>
> Jesper, could you comment on this?
>
> >
> > I think the way of this patch set is fine.
>
> So it reuses page pool structure in the page structure for another use case.
>
> > We just use the
> > space of the page whatever it is page pool or not to store
> > the link and dma address.
>
> Probably because we've already "abused" page->private. I would leave
> it for other maintainers to decide.
If we do not want to use the elements of the page directly,
the page pool is a good way.
But we must make the page pool to work without PP_FLAG_DMA_MAP, because the
virtio-net must use the DMA APIs wrapped by virtio core.
And we still need to store the dma address, because virtio-net
can not access the descs directly.
@Jesper can we setup the page pool without PP_FLAG_DMA_MAP,
and call page_pool_set_dma_addr() from the virtio-net driver?
Thanks.
>
> Thanks
>
> >
> > Thanks.
> >
> > >
> > > Thanks
> > >
> > > >
> > > >
> > > > Thanks.
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > Maybe for big mode it doesn't matter too much if there's no
> > > > > > > performance improvement.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
>
next prev parent reply other threads:[~2024-04-17 8:43 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-11 2:51 [PATCH vhost 0/6] virtio_net: rx enable premapped mode by default Xuan Zhuo
2024-04-11 2:51 ` [PATCH vhost 1/6] virtio_ring: introduce dma map api for page Xuan Zhuo
2024-04-11 11:45 ` Alexander Lobakin
2024-04-12 3:48 ` Xuan Zhuo
2024-04-18 6:08 ` Jason Wang
2024-04-11 2:51 ` [PATCH vhost 2/6] virtio_ring: enable premapped mode whatever use_dma_api Xuan Zhuo
2024-04-18 6:09 ` Jason Wang
2024-04-18 6:13 ` Jason Wang
2024-04-11 2:51 ` [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page Xuan Zhuo
2024-04-12 4:47 ` Jason Wang
2024-04-12 5:35 ` Xuan Zhuo
2024-04-12 5:49 ` Jason Wang
2024-04-12 6:02 ` Xuan Zhuo
2024-04-15 2:08 ` Xuan Zhuo
2024-04-15 6:43 ` Jason Wang
2024-04-15 8:36 ` Xuan Zhuo
2024-04-15 8:56 ` Jason Wang
2024-04-15 8:59 ` Xuan Zhuo
2024-04-16 3:24 ` Jason Wang
2024-04-17 1:30 ` Xuan Zhuo
2024-04-17 4:08 ` Jason Wang
2024-04-17 8:20 ` Xuan Zhuo [this message]
2024-04-18 4:15 ` Jason Wang
2024-04-18 4:16 ` Jason Wang
2024-04-18 20:19 ` Jesper Dangaard Brouer
2024-04-18 21:56 ` Matthew Wilcox
2024-04-19 7:11 ` Xuan Zhuo
2024-04-18 6:11 ` Jason Wang
2024-04-11 2:51 ` [PATCH vhost 4/6] virtio_net: big mode support premapped Xuan Zhuo
2024-04-11 16:34 ` kernel test robot
2024-04-11 20:11 ` kernel test robot
2024-04-14 9:48 ` Dan Carpenter
2024-04-18 6:25 ` Jason Wang
2024-04-18 8:29 ` Xuan Zhuo
2024-04-19 0:43 ` Jason Wang
2024-04-19 4:21 ` Xuan Zhuo
2024-04-19 5:46 ` Jason Wang
2024-04-19 7:03 ` Xuan Zhuo
2024-04-19 7:24 ` Jason Wang
2024-04-19 7:26 ` Xuan Zhuo
2024-04-19 8:12 ` Jason Wang
2024-04-19 8:14 ` Xuan Zhuo
2024-04-19 7:52 ` Xuan Zhuo
2024-04-11 2:51 ` [PATCH vhost 5/6] virtio_net: enable premapped by default Xuan Zhuo
2024-04-18 6:26 ` Jason Wang
2024-04-18 8:35 ` Xuan Zhuo
2024-04-19 0:44 ` Jason Wang
2024-04-11 2:51 ` [PATCH vhost 6/6] virtio_net: rx remove premapped failover code Xuan Zhuo
2024-04-18 6:31 ` Jason Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1713342055.436048-1-xuanzhuo@linux.alibaba.com \
--to=xuanzhuo@linux.alibaba.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=hawk@kernel.org \
--cc=jasowang@redhat.com \
--cc=kuba@kernel.org \
--cc=mst@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=virtualization@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).