From mboxrd@z Thu Jan  1 00:00:00 1970
From: Tushar Dave <tushar.n.dave@oracle.com>
Subject: Re: [PATCH net-next 1/5] bpf: use __GFP_COMP while allocating page
Date: Wed, 12 Sep 2018 13:15:48 -0700
Message-ID: <8f22f67e-437b-6533-4c87-e13d4fe33f95@oracle.com>
References: <1536694684-3200-1-git-send-email-tushar.n.dave@oracle.com>
 <1536694684-3200-2-git-send-email-tushar.n.dave@oracle.com>
 <2fd601e3-5679-08f4-1610-b3c22de80935@oracle.com>
 <6855742d-5925-0d94-e4f3-74bf118ca3d2@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
To: John Fastabend <john.fastabend@gmail.com>, ast@kernel.org,
        daniel@iogearbox.net, davem@davemloft.net,
        santosh.shilimkar@oracle.com, jakub.kicinski@netronome.com,
        quentin.monnet@netronome.com, jiong.wang@netronome.com,
        sandipan@linux.vnet.ibm.com, kafai@fb.com, rdna@fb.com, yhs@fb.com,
        netdev@vger.kernel.org, rds-devel@oss.oracle.com,
        sowmini.varadhan@oracle.com
Return-path: <netdev-owner@vger.kernel.org>
Received: from userp2130.oracle.com ([156.151.31.86]:58848 "EHLO
        userp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1726069AbeIMBXN (ORCPT
        <rfc822;netdev@vger.kernel.org>); Wed, 12 Sep 2018 21:23:13 -0400
In-Reply-To: <6855742d-5925-0d94-e4f3-74bf118ca3d2@gmail.com>
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>


On 09/12/2018 09:51 AM, John Fastabend wrote:
> On 09/12/2018 09:21 AM, Tushar Dave wrote:
>>
>>
>> On 09/11/2018 12:38 PM, Tushar Dave wrote:
>>> Helper bpg_msg_pull_data() can allocate multiple pages while
>>> linearizing multiple scatterlist elements into one shared page.
>>> However, if the shared page has size > PAGE_SIZE, using
>>> copy_page_to_iter() causes below warning.
>>>
>>> e.g.
>>> [ 6367.019832] WARNING: CPU: 2 PID: 7410 at lib/iov_iter.c:825
>>> page_copy_sane.part.8+0x0/0x8
>>>
>>> To avoid above warning, use __GFP_COMP while allocating multiple
>>> contiguous pages.
>>>
>>> Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
>>> ---
>>>    net/core/filter.c | 3 ++-
>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index d301134..0b40f95 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -2344,7 +2344,8 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
>>>        if (unlikely(bytes_sg_total > copy))
>>>            return -EINVAL;
>>>    -    page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC, get_order(copy));
>>> +    page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
>>> +               get_order(copy));
>>>        if (unlikely(!page))
>>>            return -ENOMEM;
>>>        p = page_address(page);
>>
>> I should have mentioned that I could re-order this patch anywhere in
>> patch series (as long as it doesn't break git bisect). I kept it first
>> because I think it is more like a bug fix. I sent it along with these
>> patch series considering we have a context of why and for what I need
>> this patch!
>>
>> Daniel, John,
>>
>> Not sure if you guys hit this page_copy_sane warning. I hit it when RDS
>> copy sg page to userspace using copy_page_to_iter().
>>
> 
> I have not hit this before but I'm working on a set of patches for
> test_sockmap to test the bpf_msg_pull_data() so I'll add a case
> for this. Currently, we only test the simple case where we pull
> data out of a single page in selftests. This was sufficient for
> my use case but missed a handful of other valid cases.
> 
>> example:
>>
>> RDS packet size 8KB represented in scatterlist:
>> sg_data[0].length = 1400
>> sg_data[1].length = 1448
>> sg_data[2].length = 1448
>> sg_data[3].length = 1448
>> sg_data[4].length = 1448
>> sg_data[5].length = 1000
>>
>> If start=0 and end=8192, bpf_msg_pull_data() will linearize all
>> sg_data elements into one shared page. e.g. sg_data[0].length = 8192.
>> Using this sg_data[0].page in function copy_page_to_iter() causes:
>> WARNING: CPU: 2 PID: 7410 at lib/iov_iter.c:825
>> page_copy_sane.part.8+0x0/0x8
>>
>> (FYI, patch 4 has code that does copy_page_to_iter)
>>
> 
> How about sending it as a bugfix against bpf on its own. It
> looks like we could reproduce this with a combination of
> bpf_msg_pull_data() + redirect (to ingress) perhaps. Either
> way seems like a candidate for the bpf fixes tree to me.

Done.

Thanks.
-Tushar
> 
> Thanks,
> John
> 
>>
>> Comments?
>>
>> Thanks in advance,
>> -Tushar
>>
>>
> 
>