From: John Fastabend
Subject: Re: [bpf-next PATCH v2 3/4] bpf: sockmap, BPF_F_INGRESS flag for BPF_SK_SKB_STREAM_VERDICT:
Date: Wed, 28 Mar 2018 08:45:25 -0700
Message-ID: <3e7dd92b-a932-d977-2a7e-505b374733df@gmail.com>
References: <20180327172042.11354.81872.stgit@john-Precision-Tower-5810>
 <20180327172322.11354.54016.stgit@john-Precision-Tower-5810>
 <71a13f21-6886-85c3-6911-8ac33c486901@iogearbox.net>
In-Reply-To: <71a13f21-6886-85c3-6911-8ac33c486901@iogearbox.net>
To: Daniel Borkmann, ast@kernel.org
Cc: netdev@vger.kernel.org, davem@davemloft.net
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8

On 03/28/2018 07:21 AM, Daniel Borkmann wrote:
> On 03/27/2018 07:23 PM, John Fastabend wrote:
>> Add support for the BPF_F_INGRESS flag in the skb redirect helper. To
>> do this, convert the skb into a scatterlist and push it onto the
>> ingress queue. This is the same logic used in the sk_msg redirect
>> helper, so it should feel familiar.
>>
>> Signed-off-by: John Fastabend
>> ---
>>  include/linux/filter.h |    1 +
>>  kernel/bpf/sockmap.c   |   94 +++++++++++++++++++++++++++++++++++++++---------
>>  net/core/filter.c      |    2 +
>>  3 files changed, 78 insertions(+), 19 deletions(-)
> [...]
>>  	if (!sg->length && md->sg_start == md->sg_end) {
>>  		list_del(&md->list);
>> +		if (md->skb)
>> +			consume_skb(md->skb);
>>  		kfree(md);
>>  	}
>>  }
>> @@ -1045,27 +1048,72 @@ static int smap_verdict_func(struct smap_psock *psock, struct sk_buff *skb)
>>  		__SK_DROP;
>>  }
>>  
>> +static int smap_do_ingress(struct smap_psock *psock, struct sk_buff *skb)
>> +{
>> +	struct sock *sk = psock->sock;
>> +	int copied = 0, num_sg;
>> +	struct sk_msg_buff *r;
>> +
>> +	r = kzalloc(sizeof(struct sk_msg_buff), __GFP_NOWARN | GFP_ATOMIC);
>> +	if (unlikely(!r))
>> +		return -EAGAIN;
>> +
>> +	if (!sk_rmem_schedule(sk, skb, skb->len)) {
>> +		kfree(r);
>> +		return -EAGAIN;
>> +	}
>> +	sk_mem_charge(sk, skb->len);
>
> Usually mem accounting is based on truesize. This is not done here since
> you need the exact length of the skb for the sg list later on, right?

Correct.

>
>> +	sg_init_table(r->sg_data, MAX_SKB_FRAGS);
>> +	num_sg = skb_to_sgvec(skb, r->sg_data, 0, skb->len);
>> +	if (unlikely(num_sg < 0)) {
>> +		kfree(r);
>
> Don't we need to undo the mem charge here in case of error?
>

Actually, I'll just move the sk_mem_charge() call below this error path; then we don't need to unwind it.

>> +		return num_sg;
>> +	}
>> +	copied = skb->len;
>> +	r->sg_start = 0;
>> +	r->sg_end = num_sg == MAX_SKB_FRAGS ? 0 : num_sg;
>> +	r->skb = skb;
>> +	list_add_tail(&r->list, &psock->ingress);
>> +	sk->sk_data_ready(sk);
>> +	return copied;
>> +}