netdev.vger.kernel.org archive mirror
From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
To: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux.dev,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	netdev@vger.kernel.org
Subject: Re: [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page
Date: Mon, 15 Apr 2024 16:36:41 +0800	[thread overview]
Message-ID: <1713170201.06163-2-xuanzhuo@linux.alibaba.com> (raw)
In-Reply-To: <CACGkMEvmaH9NE-5VDBPpZOpAAg4bX39Lf0-iGiYzxdV5JuZWww@mail.gmail.com>

On Mon, 15 Apr 2024 14:43:24 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Apr 15, 2024 at 10:35 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Fri, 12 Apr 2024 13:49:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Fri, Apr 12, 2024 at 1:39 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Fri, 12 Apr 2024 12:47:55 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Thu, Apr 11, 2024 at 10:51 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > Currently, we chain the pages of big mode via the page's private field.
> > > > > > But a subsequent patch aims to make big mode support premapped
> > > > > > mode, which requires additional space to store the DMA address.
> > > > > >
> > > > > > Within the sub-struct that contains 'private', there is no suitable
> > > > > > field for storing the DMA address.
> > > > > >
> > > > > >                 struct {        /* Page cache and anonymous pages */
> > > > > >                         /**
> > > > > >                          * @lru: Pageout list, eg. active_list protected by
> > > > > >                          * lruvec->lru_lock.  Sometimes used as a generic list
> > > > > >                          * by the page owner.
> > > > > >                          */
> > > > > >                         union {
> > > > > >                                 struct list_head lru;
> > > > > >
> > > > > >                                 /* Or, for the Unevictable "LRU list" slot */
> > > > > >                                 struct {
> > > > > >                                         /* Always even, to negate PageTail */
> > > > > >                                         void *__filler;
> > > > > >                                         /* Count page's or folio's mlocks */
> > > > > >                                         unsigned int mlock_count;
> > > > > >                                 };
> > > > > >
> > > > > >                                 /* Or, free page */
> > > > > >                                 struct list_head buddy_list;
> > > > > >                                 struct list_head pcp_list;
> > > > > >                         };
> > > > > >                         /* See page-flags.h for PAGE_MAPPING_FLAGS */
> > > > > >                         struct address_space *mapping;
> > > > > >                         union {
> > > > > >                                 pgoff_t index;          /* Our offset within mapping. */
> > > > > >                                 unsigned long share;    /* share count for fsdax */
> > > > > >                         };
> > > > > >                         /**
> > > > > >                          * @private: Mapping-private opaque data.
> > > > > >                          * Usually used for buffer_heads if PagePrivate.
> > > > > >                          * Used for swp_entry_t if PageSwapCache.
> > > > > >                          * Indicates order in the buddy system if PageBuddy.
> > > > > >                          */
> > > > > >                         unsigned long private;
> > > > > >                 };
> > > > > >
> > > > > > But within the page_pool sub-struct, there is a field called
> > > > > > dma_addr that is appropriate for storing the DMA address.
> > > > > > And that sub-struct is used by the netstack, which works to our advantage.
> > > > > >
> > > > > >                 struct {        /* page_pool used by netstack */
> > > > > >                         /**
> > > > > >                          * @pp_magic: magic value to avoid recycling non
> > > > > >                          * page_pool allocated pages.
> > > > > >                          */
> > > > > >                         unsigned long pp_magic;
> > > > > >                         struct page_pool *pp;
> > > > > >                         unsigned long _pp_mapping_pad;
> > > > > >                         unsigned long dma_addr;
> > > > > >                         atomic_long_t pp_ref_count;
> > > > > >                 };
> > > > > >
> > > > > > On the other hand, we should only use fields from the same sub-struct,
> > > > > > since the sub-structs overlay each other in the union.
> > > > > > So this patch replaces the use of "private" with the "pp" sub-struct.
> > > > > >
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > ---
> > > > >
> > > > > Instead of doing a customized version of the page pool, can we simply
> > > > > switch to using the page pool for big mode instead? Then we don't need to
> > > > > bother with the DMA stuff.
> > > >
> > > >
> > > > The page pool needs to do the DMA mapping via the DMA APIs,
> > > > so we cannot use the page pool directly.
> > >
> > > I found this:
> > >
> > > #define PP_FLAG_DMA_MAP         BIT(0) /* Should page_pool do the DMA
> > >                                         * map/unmap
> > >                                         */
> > >
> > > It seems to work here?
> >
> >
> > I have studied the page pool mechanism and believe that we cannot use it
> > directly. We could make the page pool bypass the DMA operations.
> > This would allow us to handle DMA within virtio-net for pages allocated from
> > the page pool. Furthermore, we could use the page pool helpers to associate
> > the DMA address with the page.
> >
> > However, the critical issue pertains to unmapping. Ideally, we want to return
> > the mapped pages to the page pool and reuse them. In doing so, we can omit the
> > unmapping and remapping steps.
> >
> > Currently, there's a caveat: when the page pool cache is full, it releases
> > the pages directly, so those pages are relinquished without giving the
> > driver a chance to unmap them.
>
> Technically, when the ptr_ring is full there could be a fallback, but that
> requires expensive synchronization between producer and consumer.
> For virtio-net, it might not be a problem because add/get are already
> synchronized. (That might be relaxed in the future; we've actually
> already seen such a requirement in the past for virtio-blk.)

The point is that the page will be released by the page pool directly;
we will have no chance to unmap it if we work with the page pool.
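
For reference, a minimal sketch of the setup that the PP_FLAG_DMA_MAP
suggestion refers to, where the page pool itself owns the DMA mapping and
therefore also unmaps pages when it releases them. This is illustrative
only; the device pointer and pool size are placeholders, not code from
this series:

	/* Sketch only: let page_pool do the DMA map/unmap for its pages. */
	struct page_pool_params pp_params = {
		.flags     = PP_FLAG_DMA_MAP,	/* pool maps/unmaps pages itself */
		.order     = 0,
		.pool_size = 256,		/* placeholder */
		.nid       = NUMA_NO_NODE,
		.dev       = dma_dev,		/* placeholder: device used for DMA */
		.dma_dir   = DMA_FROM_DEVICE,
	};
	struct page_pool *pool = page_pool_create(&pp_params);
	struct page *page = page_pool_alloc_pages(pool, GFP_KERNEL);
	dma_addr_t addr = page_pool_get_dma_addr(page);
	/* ... post addr to the rx ring, later return the page ... */
	page_pool_put_full_page(pool, page, false);	/* pool unmaps when it frees */

In that model the unmap happens inside the pool. The problem described
above is the opposite case: virtio-net does the mapping itself, so any page
the pool frees on its own is never unmapped by us.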

>
> > If we were to unmap pages each time before
> > returning them to the pool, we would negate the benefits of bypassing the
> > mapping and unmapping process altogether.
>
> Yes, but the problem with this approach is that it creates a corner-case
> exception where dma_addr is used outside the page pool.

YES. This is a corner-case exception. We would need to introduce this case
to the page pool.

So, to introduce the page pool into virtio-net (not only for big mode),
we may need to push the page pool to support DMA handled by the drivers.

Back to this patch set, I think we should keep virtio-net managing the
pages itself, as sketched below.
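
To make that concrete, here is a rough sketch (hypothetical helper names,
not necessarily the code in patch 3/6) of what managing the pages inside
virtio-net looks like: big mode keeps chaining its pages, but through the
"pp" slot of the page_pool sub-struct, and the premapped DMA address is
kept in the "dma_addr" slot of the same sub-struct:

	/* Hypothetical helpers; the actual patch may name/shape these differently. */
	static inline struct page *vnet_page_chain_next(struct page *p)
	{
		/* reuse the "pp" slot as the next pointer of the page chain */
		return (struct page *)p->pp;
	}

	static inline void vnet_page_chain_add(struct page *p, struct page *next)
	{
		p->pp = (struct page_pool *)next;
	}

	static inline void vnet_page_set_dma(struct page *p, dma_addr_t addr)
	{
		/* assumes dma_addr_t fits in unsigned long (64-bit build) */
		p->dma_addr = (unsigned long)addr;
	}

	static inline dma_addr_t vnet_page_get_dma(struct page *p)
	{
		return (dma_addr_t)p->dma_addr;
	}

Both slots live in the same sub-struct of struct page, which is the whole
point of patch 3/6.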

What do you think?

Thanks

>
> Maybe for big mode it doesn't matter too much if there's no
> performance improvement.
>
> Thanks
>
> >
> > Thanks.
> >
> >
> >
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > >
> >
>


Thread overview: 49+ messages
2024-04-11  2:51 [PATCH vhost 0/6] virtio_net: rx enable premapped mode by default Xuan Zhuo
2024-04-11  2:51 ` [PATCH vhost 1/6] virtio_ring: introduce dma map api for page Xuan Zhuo
2024-04-11 11:45   ` Alexander Lobakin
2024-04-12  3:48     ` Xuan Zhuo
2024-04-18  6:08   ` Jason Wang
2024-04-11  2:51 ` [PATCH vhost 2/6] virtio_ring: enable premapped mode whatever use_dma_api Xuan Zhuo
2024-04-18  6:09   ` Jason Wang
2024-04-18  6:13   ` Jason Wang
2024-04-11  2:51 ` [PATCH vhost 3/6] virtio_net: replace private by pp struct inside page Xuan Zhuo
2024-04-12  4:47   ` Jason Wang
2024-04-12  5:35     ` Xuan Zhuo
2024-04-12  5:49       ` Jason Wang
2024-04-12  6:02         ` Xuan Zhuo
2024-04-15  2:08         ` Xuan Zhuo
2024-04-15  6:43           ` Jason Wang
2024-04-15  8:36             ` Xuan Zhuo [this message]
2024-04-15  8:56               ` Jason Wang
2024-04-15  8:59                 ` Xuan Zhuo
2024-04-16  3:24                   ` Jason Wang
2024-04-17  1:30                     ` Xuan Zhuo
2024-04-17  4:08                       ` Jason Wang
2024-04-17  8:20                         ` Xuan Zhuo
2024-04-18  4:15                           ` Jason Wang
2024-04-18  4:16                             ` Jason Wang
2024-04-18 20:19                           ` Jesper Dangaard Brouer
2024-04-18 21:56                             ` Matthew Wilcox
2024-04-19  7:11                             ` Xuan Zhuo
2024-04-18  6:11   ` Jason Wang
2024-04-11  2:51 ` [PATCH vhost 4/6] virtio_net: big mode support premapped Xuan Zhuo
2024-04-11 16:34   ` kernel test robot
2024-04-11 20:11   ` kernel test robot
2024-04-14  9:48   ` Dan Carpenter
2024-04-18  6:25   ` Jason Wang
2024-04-18  8:29     ` Xuan Zhuo
2024-04-19  0:43       ` Jason Wang
2024-04-19  4:21         ` Xuan Zhuo
2024-04-19  5:46           ` Jason Wang
2024-04-19  7:03             ` Xuan Zhuo
2024-04-19  7:24               ` Jason Wang
2024-04-19  7:26                 ` Xuan Zhuo
2024-04-19  8:12                   ` Jason Wang
2024-04-19  8:14                     ` Xuan Zhuo
2024-04-19  7:52                 ` Xuan Zhuo
2024-04-11  2:51 ` [PATCH vhost 5/6] virtio_net: enable premapped by default Xuan Zhuo
2024-04-18  6:26   ` Jason Wang
2024-04-18  8:35     ` Xuan Zhuo
2024-04-19  0:44       ` Jason Wang
2024-04-11  2:51 ` [PATCH vhost 6/6] virtio_net: rx remove premapped failover code Xuan Zhuo
2024-04-18  6:31   ` Jason Wang
