netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jason Wang <jasowang@redhat.com>
To: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	virtualization@lists.linux-foundation.org, bpf@vger.kernel.org,
	netdev@vger.kernel.org
Subject: Re: [PATCH net 2/2] virtio-net: get build_skb() buf by data ptr
Date: Tue, 1 Jun 2021 14:17:27 +0800	[thread overview]
Message-ID: <24aee726-83a9-9741-23bb-1f8be0e775e6@redhat.com> (raw)
In-Reply-To: <1622527177.2087624-1-xuanzhuo@linux.alibaba.com>


在 2021/6/1 下午1:59, Xuan Zhuo 写道:
> On Tue, 1 Jun 2021 11:27:12 +0800, Jason Wang <jasowang@redhat.com> wrote:
>> 在 2021/6/1 上午11:08, Xuan Zhuo 写道:
>>> On Tue, 1 Jun 2021 11:03:37 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>> 在 2021/5/31 下午6:58, Xuan Zhuo 写道:
>>>>> On Mon, 31 May 2021 14:10:55 +0800, Jason Wang <jasowang@redhat.com> wrote:
>>>>>> 在 2021/5/14 下午11:16, Xuan Zhuo 写道:
>>>>>>> In the case of merge, the page passed into page_to_skb() may be a head
>>>>>>> page, not the page where the current data is located.
>>>>>> I don't get how this can happen?
>>>>>>
>>>>>> Maybe you can explain a little bit more?
>>>>>>
>>>>>> receive_mergeable() call page_to_skb() in two places:
>>>>>>
>>>>>> 1) XDP_PASS for linearized page , in this case we use xdp_page
>>>>>> 2) page_to_skb() for "normal" page, in this case the page contains the data
>>>>> The offset may be greater than PAGE_SIZE, because page is obtained by
>>>>> virt_to_head_page(), not the page where buf is located. And "offset" is the offset
>>>>> of buf relative to page.
>>>>>
>>>>> 	tailroom = truesize - len - offset;
>>>>>
>>>>> In this case, the tailroom must be less than 0. Although there may be enough
>>>>> content on this page to save skb_shared_info.
>>>> Interesting, I think we don't use compound pages for virtio-net. (We
>>>> don't define SKB_FRAG_PAGE_ORDER).
>>>>
>>>> Am I wrong?
>>> It seems to me that it seems to be a fixed setting, not for us to configure
>>> independently
>>
>> Looks like you are right.
>>
>> See comments below.
>>
>>
>>> Thanks.
>>>
>>> ==========================================
>>>
>>> net/sock.c
>>>
>>> #define SKB_FRAG_PAGE_ORDER	get_order(32768)
>>> DEFINE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key);
>>>
>>> /**
>>>    * skb_page_frag_refill - check that a page_frag contains enough room
>>>    * @sz: minimum size of the fragment we want to get
>>>    * @pfrag: pointer to page_frag
>>>    * @gfp: priority for memory allocation
>>>    *
>>>    * Note: While this allocator tries to use high order pages, there is
>>>    * no guarantee that allocations succeed. Therefore, @sz MUST be
>>>    * less or equal than PAGE_SIZE.
>>>    */
>>> bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
>>> {
>>> 	if (pfrag->page) {
>>> 		if (page_ref_count(pfrag->page) == 1) {
>>> 			pfrag->offset = 0;
>>> 			return true;
>>> 		}
>>> 		if (pfrag->offset + sz <= pfrag->size)
>>> 			return true;
>>> 		put_page(pfrag->page);
>>> 	}
>>>
>>> 	pfrag->offset = 0;
>>> 	if (SKB_FRAG_PAGE_ORDER &&
>>> 	    !static_branch_unlikely(&net_high_order_alloc_disable_key)) {
>>> 		/* Avoid direct reclaim but allow kswapd to wake */
>>> 		pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
>>> 					  __GFP_COMP | __GFP_NOWARN |
>>> 					  __GFP_NORETRY,
>>> 					  SKB_FRAG_PAGE_ORDER);
>>> 		if (likely(pfrag->page)) {
>>> 			pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
>>> 			return true;
>>> 		}
>>> 	}
>>> 	pfrag->page = alloc_page(gfp);
>>> 	if (likely(pfrag->page)) {
>>> 		pfrag->size = PAGE_SIZE;
>>> 		return true;
>>> 	}
>>> 	return false;
>>> }
>>> EXPORT_SYMBOL(skb_page_frag_refill);
>>>
>>>
>>>> Thanks
>>>>
>>>>
>>>>> Thanks.
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>> So when trying to
>>>>>>> get the buf where the data is located, you should directly use the
>>>>>>> pointer(p) to get the address corresponding to the page.
>>>>>>>
>>>>>>> At the same time, the offset of the data in the page should also be
>>>>>>> obtained using offset_in_page().
>>>>>>>
>>>>>>> This patch solves this problem. But if you don’t use this patch, the
>>>>>>> original code can also run, because if the page is not the page of the
>>>>>>> current data, the calculated tailroom will be less than 0, and will not
>>>>>>> enter the logic of build_skb() . The significance of this patch is to
>>>>>>> modify this logical problem, allowing more situations to use
>>>>>>> build_skb().
>>>>>>>
>>>>>>> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>>>>>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>>>>>>> ---
>>>>>>>      drivers/net/virtio_net.c | 8 ++++++--
>>>>>>>      1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>>> index 3e46c12dde08..073fec4c0df1 100644
>>>>>>> --- a/drivers/net/virtio_net.c
>>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>>> @@ -407,8 +407,12 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>>>>>>      		 * see add_recvbuf_mergeable() + get_mergeable_buf_len()
>>>>>>>      		 */
>>>>>>>      		truesize = PAGE_SIZE;
>>>>>>> -		tailroom = truesize - len - offset;
>>>>>>> -		buf = page_address(page);
>>>>>>> +
>>>>>>> +		/* page maybe head page, so we should get the buf by p, not the
>>>>>>> +		 * page
>>>>>>> +		 */
>>>>>>> +		tailroom = truesize - len - offset_in_page(p);
>>
>> I wonder why offset_in_page(p) is correct? I guess it should be:
>>
>> tailroom = truesize - len - headroom;
>>
>> The reason is that the buffer is not necessarily allocated at the page
>> boundary.
> In my understanding, offset_in_page(p) is the offset of p in the page where it
> is located. In this case, the two should be equal.


I think not, if the frag is not page aligned. offset_in_page(p) doesn't 
equal to headroom.

Consider the case that the frag start from page offset 1500.


>   This has nothing to do with
> which page is allocated.
>
> Of course I think using headroom is a good idea, and it is semantically better.
>
> Thanks.


Thanks


>
>> Thanks
>>
>>
>>>>>>> +		buf = (char *)((unsigned long)p & PAGE_MASK);
>>>>>>>      	} else {
>>>>>>>      		tailroom = truesize - len;
>>>>>>>      		buf = p;


       reply	other threads:[~2021-06-01  6:17 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1622527177.2087624-1-xuanzhuo@linux.alibaba.com>
2021-06-01  6:17 ` Jason Wang [this message]
     [not found] <1622516885.7439268-1-xuanzhuo@linux.alibaba.com>
2021-06-01  3:27 ` [PATCH net 2/2] virtio-net: get build_skb() buf by data ptr Jason Wang
     [not found] <1622458734.837168-1-xuanzhuo@linux.alibaba.com>
2021-06-01  3:03 ` Jason Wang
2021-05-14 15:16 [PATCH net 0/2] virtio-net: fix for build_skb() Xuan Zhuo
2021-05-14 15:16 ` [PATCH net 2/2] virtio-net: get build_skb() buf by data ptr Xuan Zhuo
2021-05-31  6:10   ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=24aee726-83a9-9741-23bb-1f8be0e775e6@redhat.com \
    --to=jasowang@redhat.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=hawk@kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=xuanzhuo@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).