BPF List
From: Yonghong Song <yonghong.song@linux.dev>
To: Fabian Pfitzner <f.pfitzner@tu-braunschweig.de>, bpf@vger.kernel.org
Subject: Re: No direct copy from ctx to map possible, why?
Date: Mon, 15 Apr 2024 20:22:58 -0700
Message-ID: <39a68b12-a921-471b-83ff-6d59b21aa4a9@linux.dev>
In-Reply-To: <6d224ee5-ca50-44a9-882e-074710bf8477@tu-braunschweig.de>


On 4/15/24 1:25 PM, Fabian Pfitzner wrote:
>> Looks like you intend to copy packet data. So from the above, 
>> 'expected=fp,pkt,pkt_meta...', you can just put the first argument
>> with xdp->data, right? 
> Yes, I intend to copy packet data. What do you mean by "first 
> argument"? I'd like to put the whole data that is depicted by 
> xdp->data into a map that stores them as raw bytes (by using a char 
> array as map element to store the data).

Sorry, typo. 'first argument' should be 'third argument'.

>
>> Verifier rejects 'ctx' since 'ctx' contents are subject to verifier 
>> rewrite. So actual 'ctx' contents/layouts may not match uapi definition. 
> Sorry but I do not understand what you mean by "subject to verifier 
> rewrite". What kind of rewrite happens when using the ctx as argument? 
> Furthermore, am I correct that you assume that the uapi may dictate 
> the structure of the data that can be stored in a map? How is it 
> different to the case when first storing it on the stack and then 
> putting it into a map?

The UAPI xdp_md struct:

struct xdp_md {
         __u32 data;
         __u32 data_end;
         __u32 data_meta;
         /* Below access go through struct xdp_rxq_info */
         __u32 ingress_ifindex; /* rxq->dev->ifindex */
         __u32 rx_queue_index;  /* rxq->queue_index  */

         __u32 egress_ifindex;  /* txq->dev->ifindex */
};

The actual kernel representation of xdp_md:

struct xdp_buff {
         void *data;
         void *data_end;
         void *data_meta;
         void *data_hard_start;
         struct xdp_rxq_info *rxq;
         struct xdp_txq_info *txq;
         u32 frame_sz; /* frame size to deduce data_hard_end/reserved tailroom */
         u32 flags; /* supported values defined in xdp_buff_flags */
};

You can see they are quite different. So to use the pointee of 'ctx' as the map
value, we would need to allocate sizeof(struct xdp_md) bytes on the stack and copy
the necessary fields into that structure. For example,
xdp_md->ingress_ifindex = xdp_buff->rxq->dev->ifindex, etc.
Some fields do not even make sense to copy, e.g., data/data_end/data_meta on 64-bit
architectures, where they are 32-bit in xdp_md but real pointers in xdp_buff.
Since a stack allocation is needed anyway, disallowing ctx and requiring the user
to use the stack explicitly makes sense (if they want to use *ctx as the map update value).
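
To make the mismatch concrete, here is a rough userspace sketch of the copy
described above. Everything prefixed with mock_ is hypothetical and only mirrors
the field names from the two structs; it is not kernel code and not what the
verifier actually emits, just an illustration of why *ctx cannot be used as a
plain block of memory.

```c
#include <assert.h>
#include <stdint.h>

/* Mock of the UAPI view: six 32-bit fields, 24 bytes in total. */
struct mock_xdp_md {
    uint32_t data, data_end, data_meta;
    uint32_t ingress_ifindex, rx_queue_index, egress_ifindex;
};
static_assert(sizeof(struct mock_xdp_md) == 24, "UAPI view is 6 x u32");

/* Mocks of the kernel-side objects, reduced to the fields used below. */
struct mock_net_device { int ifindex; };
struct mock_rxq_info { struct mock_net_device *dev; uint32_t queue_index; };
struct mock_txq_info { struct mock_net_device *dev; };

struct mock_xdp_buff {
    void *data, *data_end, *data_meta, *data_hard_start;
    struct mock_rxq_info *rxq;
    struct mock_txq_info *txq;
    uint32_t frame_sz, flags;
};

/*
 * Build the UAPI view field by field from the kernel-side object.
 * A memcpy() of the xdp_buff would yield none of these values.
 */
void mock_fill_xdp_md(struct mock_xdp_md *md, const struct mock_xdp_buff *xdp)
{
    /* Pointers do not fit into __u32 on 64-bit: these values are truncated. */
    md->data      = (uint32_t)(uintptr_t)xdp->data;
    md->data_end  = (uint32_t)(uintptr_t)xdp->data_end;
    md->data_meta = (uint32_t)(uintptr_t)xdp->data_meta;

    /* These fields live behind one or two pointer dereferences. */
    md->ingress_ifindex = (uint32_t)xdp->rxq->dev->ifindex;
    md->rx_queue_index  = xdp->rxq->queue_index;
    md->egress_ifindex  = xdp->txq ? (uint32_t)xdp->txq->dev->ifindex : 0;
}
```

The three ifindex/queue fields need pointer chasing and the three data fields
lose information on 64-bit, which is the reason a direct copy of *ctx would not
give you the struct xdp_md you see in the UAPI header.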

In your particular example, since you intend to copy the packet data, you can pass
the xdp_md->data pointer directly as the third argument. There is no need to copy
ctx, which is not what you want anyway.

>
> On 4/15/24 6:01 PM, Yonghong Song wrote:
>>
>> On 4/14/24 2:34 PM, Fabian Pfitzner wrote:
>>> Hello,
>>>
>>> is there a specific reason why it is not allowed to copy data from 
>>> ctx directly into a map via the bpf_map_update_elem helper?
>>> I develop a XDP program where I need to store incoming packets 
>>> (including the whole payload) into a map in order to buffer them.
>>> I thought I could simply put them into a map via the mentioned 
>>> helper function, but the verifier complains about expecting another 
>>> type as "ctx" (R3 type=ctx expected=fp, pkt, pkt_meta, .....).
>>
>> Looks like you intend to copy packet data. So from the above, 
>> 'expected=fp,pkt,pkt_meta...', you can just put the first argument
>> with xdp->data, right?
>> Verifier rejects 'ctx' since 'ctx' contents are subject to verifier 
>> rewrite. So actual 'ctx' contents/layouts may not match uapi definition.
>>
>>>
>>> I was able to circumvent this error by first putting the packet onto 
>>> the stack (via xdp->data) and then write it into the map.
>>> The only limitation with this is that I cannot store packets larger 
>>> than 512 bytes due to the maximum stack size.
>>>
>>> I was also able to circumvent this by slicing chunks, that are 
>>> smaller than 512 bytes, out of the packet so that I can use the 
>>> stack as a clipboard before putting them into the map. This is a 
>>> really ugly solution, but I have not found a better one yet.
>>>
>>> So my question is: Why does this limitation exist? I am not sure if 
>>> its only related to XDP programs as this restriction is defined 
>>> inside of the bpf_map_update_elem_proto struct (arg3_type restricts 
>>> this), so I think it is a general limitation that affects all 
>>> program types.
>>>
>>> Best regards,
>>> Fabian Pfitzner
>>>
>>>
>>>
>>>
>
