BPF List
 help / color / mirror / Atom feed
* Re: [RFC PATCH net-next 0/5] net: In-kernel QUIC implementation with Userspace handshake
       [not found]                         ` <CADvbK_e7i08GAiOenJNTP_m+-MeYjSf7J-vkF+hgRfYGNCjkwQ@mail.gmail.com>
@ 2024-04-26  4:58                           ` Martin KaFai Lau
  0 siblings, 0 replies; only message in thread
From: Martin KaFai Lau @ 2024-04-26  4:58 UTC (permalink / raw)
  To: Xin Long, Stefan Metzmacher
  Cc: network dev, davem, kuba, Eric Dumazet, Paolo Abeni, Steve French,
	Namjae Jeon, Chuck Lever III, Jeff Layton, Sabrina Dubroca,
	Tyler Fanelli, Pengtao He, linux-cifs@vger.kernel.org,
	Samba Technical, bpf

On 4/22/24 1:58 PM, Xin Long wrote:
> On Sun, Apr 21, 2024 at 3:27 PM Stefan Metzmacher <metze@samba.org> wrote:
>>
>> Am 20.04.24 um 21:32 schrieb Xin Long:
>>> On Fri, Apr 19, 2024 at 3:19 PM Xin Long <lucien.xin@gmail.com> wrote:
>>>>
>>>> On Fri, Apr 19, 2024 at 2:51 PM Stefan Metzmacher <metze@samba.org> wrote:
>>>>>
>>>>> Hi Xin Long,
>>>>>
>>>>>>> But I think its unavoidable for the ALPN and SNI fields on
>>>>>>> the server side. As every service tries to use udp port 443
>>>>>>> and somehow that needs to be shared if multiple services want to
>>>>>>> use it.
>>>>>>>
>>>>>>> I guess on the acceptor side we would need to somehow detach low level
>>>>>>> udp struct sock from the logical listen struct sock.
>>>>>>>
>>>>>>> And quic_do_listen_rcv() would need to find the correct logical listening
>>>>>>> socket and call quic_request_sock_enqueue() on the logical socket
>>>>>>> not the lowlevel udo socket. The same for all stuff happening after
>>>>>>> quic_request_sock_enqueue() at the end of quic_do_listen_rcv.
>>>>>>>
>>>>>> The implementation allows one low level UDP sock to serve for multiple
>>>>>> QUIC socks.
>>>>>>
>>>>>> Currently, if your 3 quic applications listen to the same address:port
>>>>>> with SO_REUSEPORT socket option set, the incoming connection will choose
>>>>>> one of your applications randomly with hash(client_addr+port) vi
>>>>>> reuseport_select_sock() in quic_sock_lookup().
>>>>>>
>>>>>> It should be easy to do a further match with ALPN between these 3 quic
>>>>>> socks that listens to the same address:port to get the right quic sock,
>>>>>> instead of that randomly choosing.
>>>>>
>>>>> Ah, that sounds good.
>>>>>
>>>>>> The problem is to parse the TLS Client_Hello message to get the ALPN in
>>>>>> quic_sock_lookup(), which is not a proper thing to do in kernel, and
>>>>>> might be rejected by networking maintainers, I need to check with them.
>>>>>
>>>>> Is the reassembling of CRYPTO frames done in the kernel or
>>>>> userspace? Can you point me to the place in the code?
>>>> In quic_inq_handshake_tail() in kernel, for Client Initial packet
>>>> is processed when calling accept(), this is the path:
>>>>
>>>> quic_accept()-> quic_accept_sock_init() -> quic_packet_process() ->
>>>> quic_packet_handshake_process() -> quic_frame_process() ->
>>>> quic_frame_crypto_process() -> quic_inq_handshake_tail().
>>>>
>>>> Note that it's with the accept sock, not the listen sock.
>>>>
>>>>>
>>>>> If it's really impossible to do in C code maybe
>>>>> registering a bpf function in order to allow a listener
>>>>> to check the intial quic packet and decide if it wants to serve
>>>>> that connection would be possible as last resort?
>>>> That's a smart idea! man.
>>>> I think the bpf hook in reuseport_select_sock() is meant to do such
>>>> selection.
>>>>
>>>> For the Client initial packet (the only packet you need to handle),
>>>> I double you will need to do the reassembling, as Client Hello TLS message
>>>> is always less than 400 byte in my env.
>>>>
>>>> But I think you need to do the decryption for the Client initial packet
>>>> before decoding it then parsing the TLS message from its crypto frame.
>>> I created this patch:
>>>
>>> https://github.com/lxin/quic/commit/aee0b7c77df3f39941f98bb901c73fdc560befb8
>>>
>>> to do this decryption in quic_sock_look() before calling
>>> reuseport_select_sock(), so that it provides the bpf selector with
>>> a plain-text QUIC initial packet:
>>>
>>> https://datatracker.ietf.org/doc/html/rfc9000#section-17.2.2
>>>
>>> If it's complex for you to do the decryption for the initial packet in
>>> the bpf selector, I will apply this patch. Please let me know.
>>
>> I guess in addition to quic_server_handshake(), which is called
>> after accept(), there should be quic_server_prepare_listen()
>> (and something similar for in kernel servers) that setup the reuseport
>> magic for the socket, so that it's not needed in every application.
> It's done when calling listen(), see quic_inet_listen()->quic_hash()
> where only listening sockets with its sk_reuseport set will be
> added into the reuseport group.
> 
> It means SO_REUSEPORT sockopt must be set for every socket
> before calling listen().
> 
>>
>> It seems there is only a single ebpf program possible per
>> reuseport group, so there has to be just a single one.
> Yes, a single ebpf program per reuseport group should work.
> see prepare_sk_fds() in kernel selftests for select_reuseport bfp.
> 
>>
>> But is it possible for in kernel servers to also register an epbf program?
> Good question. TBH, I don't really know much about epbf programming.
> I guess the real problem is how you pass the .o file to kernel space?
> 
> Another question is, in the selftests:
> tools/testing/selftests/bpf/prog_tests/s
> tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
> 
> it created a global reuseport_array, and then added these sockets
> into this array for the later lookup, but these sockets are all created
> in the same process.
> 
> But your case is that the sockets are created in different processes.
> I'm not sure if it's possible to add sockets from different processes
> into the same reuseport_array?
> 
> Added Martin who introduced BPF_PROG_TYPE_SK_REUSEPORT,
> I guess he may know the answers.

I didn't read the patchset, so I don't know what wanted to be done.

 From capturing the questions in this and next email:

the reuseport_array is a bpf map. Like any bpf map, it can be shared across
different processes. Meaning different processes can add sk to the map.

The bpf prog that selects a sk from the reuseport_array is set by the userspace 
through setsockopt(SO_ATTACH_REUSEPORT_EBPF). It is the only way right now, iirc.

If you can summarize what want to be done, it could help to see if there
are ways that work for the use case.


> 
> Thanks.
> 


^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2024-04-26  4:59 UTC | newest]

Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <cover.1710173427.git.lucien.xin@gmail.com>
     [not found] ` <74d5db09-6b5c-4054-b9d3-542f34769083@samba.org>
     [not found]   ` <CADvbK_dzVcDKsJ9RN9oc0K1Jwd+kYjxgE6q=ioRbVGhJx7Qznw@mail.gmail.com>
     [not found]     ` <f427b422-6cfc-45ac-88eb-3e7694168b63@samba.org>
     [not found]       ` <CADvbK_cA-RCLiUUWkyNsS=4OhkWrUWb68QLg28yO2=8PqNuGBQ@mail.gmail.com>
     [not found]         ` <438496a6-7f90-403d-9558-4a813e842540@samba.org>
     [not found]           ` <CADvbK_fkbOnhKL+Rb+pp+NF+VzppOQ68c=nk_6MSNjM_dxpCoQ@mail.gmail.com>
     [not found]             ` <1456b69c-4ffd-4a08-b120-6a00abf1eb05@samba.org>
     [not found]               ` <CADvbK_cQRpyzHG4UUOzfgmqLndvpx5Cd+d59rrqGRp0ic3PyxA@mail.gmail.com>
     [not found]                 ` <95922a2f-07a1-4555-acd2-c745e59bcb8e@samba.org>
     [not found]                   ` <CADvbK_eR4++HbR_RncjV9N__M-uTHtmqcC+_Of1RKVw7Uqf9Cw@mail.gmail.com>
     [not found]                     ` <CADvbK_dEWNNA_i1maRk4cmAB_uk4G4x0eZfZbrVX=zJ+2H9o_A@mail.gmail.com>
     [not found]                       ` <dc3815af-5b46-452b-8bcc-30a0934740a2@samba.org>
     [not found]                         ` <CADvbK_e7i08GAiOenJNTP_m+-MeYjSf7J-vkF+hgRfYGNCjkwQ@mail.gmail.com>
2024-04-26  4:58                           ` [RFC PATCH net-next 0/5] net: In-kernel QUIC implementation with Userspace handshake Martin KaFai Lau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox