handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)

BPF List
 help / color / mirror / Atom feed

* handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)
       [not found]         ` <ZDoEG0VF6fb9y0EC@google.com>
@ 2023-04-18  1:03           ` Martin KaFai Lau
  2023-04-18 16:47             ` Stanislav Fomichev
  0 siblings, 1 reply; 8+ messages in thread
From: Martin KaFai Lau @ 2023-04-18  1:03 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Eric Dumazet, davem, linux-kernel, netdev, daniel, Jakub Kicinski,
	Paolo Abeni, Leon Romanovsky, David Ahern, Arnd Bergmann,
	Kees Cook, Christian Brauner, Kuniyuki Iwashima,
	Lennart Poettering, linux-arch, Aleksandr Mikhalitsyn, bpf

On 4/14/23 6:55 PM, Stanislav Fomichev wrote:
> On 04/13, Stanislav Fomichev wrote:
>> On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
>> <aleksandr.mikhalitsyn@canonical.com> wrote:
>>>
>>> On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
>>>>
>>>> On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
>>>>>
>>>>> During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
>>>>> that bpf cgroup hook can cause FD leaks when used with sockopts which
>>>>> install FDs into the process fdtable.
>>>>>
>>>>> After some offlist discussion it was proposed to add a blacklist of
>>>>
>>>> We try to replace this word by either denylist or blocklist, even in changelogs.
>>>
>>> Hi Eric,
>>>
>>> Oh, I'm sorry about that. :( Sure.
>>>
>>>>
>>>>> socket options those can cause troubles when BPF cgroup hook is enabled.
>>>>>
>>>>
>>>> Can we find the appropriate Fixes: tag to help stable teams ?
>>>
>>> Sure, I will add next time.
>>>
>>> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
>>>
>>> I think it's better to add Stanislav Fomichev to CC.
>>
>> Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
>> use it for tcp zerocopy, I'm assuming it should work in this case as
>> well?
> 
> Jakub reminded me of the other things I wanted to ask here bug forgot:
> 
> - setsockopt is probably not needed, right? setsockopt hook triggers
>    before the kernel and shouldn't leak anything
> - for getsockopt, instead of bypassing bpf completely, should we instead
>    ignore the error from the bpf program? that would still preserve
>    the observability aspect

stealing this thread to discuss the optlen issue which may make sense to bypass 
also.

There has been issue with optlen. Other than this older post related to optlen > 
PAGE_SIZE: 
https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/, the 
recent one related to optlen that we have seen is NETLINK_LIST_MEMBERSHIPS. The 
userspace passed in optlen == 0 and the kernel put the expected optlen (> 0) and 
'return 0;' to userspace. The userspace intention is to learn the expected 
optlen. This makes 'ctx.optlen > max_optlen' and 
__cgroup_bpf_run_filter_getsockopt() ends up returning -EFAULT to the userspace 
even the bpf prog has not changed anything.

Does it make sense to also bypass the bpf prog when 'ctx.optlen > max_optlen' 
for now (and this can use a separate patch which as usual requires a bpf selftests)?

In the future, does it make sense to have a specific cgroup-bpf-prog (a specific 
attach type?) that only uses bpf_dynptr kfunc to access the optval such that it 
can enforce read-only for some optname and potentially also track if bpf-prog 
has written a new optval? The bpf-prog can only return 1 (OK) and only allows 
using bpf_set_retval() instead. Likely there is still holes but could be a seed 
of thought to continue polishing the idea.

> - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
>    gets called whenever bpf returns an error to make sure protocols have
>    a chance to handle that condition (and free the fd)
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)
  2023-04-18  1:03           ` handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook) Martin KaFai Lau
@ 2023-04-18 16:47             ` Stanislav Fomichev
  2023-04-25 17:59               ` Kui-Feng Lee
  0 siblings, 1 reply; 8+ messages in thread
From: Stanislav Fomichev @ 2023-04-18 16:47 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Eric Dumazet, davem, linux-kernel, netdev, daniel, Jakub Kicinski,
	Paolo Abeni, Leon Romanovsky, David Ahern, Arnd Bergmann,
	Kees Cook, Christian Brauner, Kuniyuki Iwashima,
	Lennart Poettering, linux-arch, Aleksandr Mikhalitsyn, bpf

On 04/17, Martin KaFai Lau wrote:
> On 4/14/23 6:55 PM, Stanislav Fomichev wrote:
> > On 04/13, Stanislav Fomichev wrote:
> > > On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
> > > <aleksandr.mikhalitsyn@canonical.com> wrote:
> > > > 
> > > > On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
> > > > > 
> > > > > On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
> > > > > <aleksandr.mikhalitsyn@canonical.com> wrote:
> > > > > > 
> > > > > > During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
> > > > > > that bpf cgroup hook can cause FD leaks when used with sockopts which
> > > > > > install FDs into the process fdtable.
> > > > > > 
> > > > > > After some offlist discussion it was proposed to add a blacklist of
> > > > > 
> > > > > We try to replace this word by either denylist or blocklist, even in changelogs.
> > > > 
> > > > Hi Eric,
> > > > 
> > > > Oh, I'm sorry about that. :( Sure.
> > > > 
> > > > > 
> > > > > > socket options those can cause troubles when BPF cgroup hook is enabled.
> > > > > > 
> > > > > 
> > > > > Can we find the appropriate Fixes: tag to help stable teams ?
> > > > 
> > > > Sure, I will add next time.
> > > > 
> > > > Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> > > > 
> > > > I think it's better to add Stanislav Fomichev to CC.
> > > 
> > > Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
> > > use it for tcp zerocopy, I'm assuming it should work in this case as
> > > well?
> > 
> > Jakub reminded me of the other things I wanted to ask here bug forgot:
> > 
> > - setsockopt is probably not needed, right? setsockopt hook triggers
> >    before the kernel and shouldn't leak anything
> > - for getsockopt, instead of bypassing bpf completely, should we instead
> >    ignore the error from the bpf program? that would still preserve
> >    the observability aspect
> 
> stealing this thread to discuss the optlen issue which may make sense to
> bypass also.
> 
> There has been issue with optlen. Other than this older post related to
> optlen > PAGE_SIZE:
> https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/,
> the recent one related to optlen that we have seen is
> NETLINK_LIST_MEMBERSHIPS. The userspace passed in optlen == 0 and the kernel
> put the expected optlen (> 0) and 'return 0;' to userspace. The userspace
> intention is to learn the expected optlen. This makes 'ctx.optlen >
> max_optlen' and __cgroup_bpf_run_filter_getsockopt() ends up returning
> -EFAULT to the userspace even the bpf prog has not changed anything.

(ignoring -EFAULT issue) this seems like it needs to be

	if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
		/* error */
	}

?

> Does it make sense to also bypass the bpf prog when 'ctx.optlen >
> max_optlen' for now (and this can use a separate patch which as usual
> requires a bpf selftests)?

Yeah, makes sense. Replacing this -EFAULT with WARN_ON_ONCE or something
seems like the way to go. It caused too much trouble already :-(

Should I prepare a patch or do you want to take a stab at it?

> In the future, does it make sense to have a specific cgroup-bpf-prog (a
> specific attach type?) that only uses bpf_dynptr kfunc to access the optval
> such that it can enforce read-only for some optname and potentially also
> track if bpf-prog has written a new optval? The bpf-prog can only return 1
> (OK) and only allows using bpf_set_retval() instead. Likely there is still
> holes but could be a seed of thought to continue polishing the idea.

Ack, let's think about it.

Maybe we should re-evaluate 'getsockopt-happens-after-the-kernel' idea
as well? If we can have a sleepable hook that can copy_from_user/copy_to_user,
and we have a mostly working bpf_getsockopt (after your refactoring),
I don't see why we need to continue the current scheme of triggering
after the kernel?

> > - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
> >    gets called whenever bpf returns an error to make sure protocols have
> >    a chance to handle that condition (and free the fd)
> > 
> 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)
  2023-04-18 16:47             ` Stanislav Fomichev
@ 2023-04-25 17:59               ` Kui-Feng Lee
  2023-04-25 18:42                 ` Stanislav Fomichev
  0 siblings, 1 reply; 8+ messages in thread
From: Kui-Feng Lee @ 2023-04-25 17:59 UTC (permalink / raw)
  To: Stanislav Fomichev, Martin KaFai Lau
  Cc: Eric Dumazet, davem, linux-kernel, netdev, daniel, Jakub Kicinski,
	Paolo Abeni, Leon Romanovsky, David Ahern, Arnd Bergmann,
	Kees Cook, Christian Brauner, Kuniyuki Iwashima,
	Lennart Poettering, linux-arch, Aleksandr Mikhalitsyn, bpf



On 4/18/23 09:47, Stanislav Fomichev wrote:
> On 04/17, Martin KaFai Lau wrote:
>> On 4/14/23 6:55 PM, Stanislav Fomichev wrote:
>>> On 04/13, Stanislav Fomichev wrote:
>>>> On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
>>>>>
>>>>> On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
>>>>>>
>>>>>> On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
>>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
>>>>>>>
>>>>>>> During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
>>>>>>> that bpf cgroup hook can cause FD leaks when used with sockopts which
>>>>>>> install FDs into the process fdtable.
>>>>>>>
>>>>>>> After some offlist discussion it was proposed to add a blacklist of
>>>>>>
>>>>>> We try to replace this word by either denylist or blocklist, even in changelogs.
>>>>>
>>>>> Hi Eric,
>>>>>
>>>>> Oh, I'm sorry about that. :( Sure.
>>>>>
>>>>>>
>>>>>>> socket options those can cause troubles when BPF cgroup hook is enabled.
>>>>>>>
>>>>>>
>>>>>> Can we find the appropriate Fixes: tag to help stable teams ?
>>>>>
>>>>> Sure, I will add next time.
>>>>>
>>>>> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
>>>>>
>>>>> I think it's better to add Stanislav Fomichev to CC.
>>>>
>>>> Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
>>>> use it for tcp zerocopy, I'm assuming it should work in this case as
>>>> well?
>>>
>>> Jakub reminded me of the other things I wanted to ask here bug forgot:
>>>
>>> - setsockopt is probably not needed, right? setsockopt hook triggers
>>>     before the kernel and shouldn't leak anything
>>> - for getsockopt, instead of bypassing bpf completely, should we instead
>>>     ignore the error from the bpf program? that would still preserve
>>>     the observability aspect
>>
>> stealing this thread to discuss the optlen issue which may make sense to
>> bypass also.
>>
>> There has been issue with optlen. Other than this older post related to
>> optlen > PAGE_SIZE:
>> https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/,
>> the recent one related to optlen that we have seen is
>> NETLINK_LIST_MEMBERSHIPS. The userspace passed in optlen == 0 and the kernel
>> put the expected optlen (> 0) and 'return 0;' to userspace. The userspace
>> intention is to learn the expected optlen. This makes 'ctx.optlen >
>> max_optlen' and __cgroup_bpf_run_filter_getsockopt() ends up returning
>> -EFAULT to the userspace even the bpf prog has not changed anything.
> 
> (ignoring -EFAULT issue) this seems like it needs to be
> 
> 	if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
> 		/* error */
> 	}
> 
> ?
> 
>> Does it make sense to also bypass the bpf prog when 'ctx.optlen >
>> max_optlen' for now (and this can use a separate patch which as usual
>> requires a bpf selftests)?
> 
> Yeah, makes sense. Replacing this -EFAULT with WARN_ON_ONCE or something
> seems like the way to go. It caused too much trouble already :-(
> 
> Should I prepare a patch or do you want to take a stab at it?
> 
>> In the future, does it make sense to have a specific cgroup-bpf-prog (a
>> specific attach type?) that only uses bpf_dynptr kfunc to access the optval
>> such that it can enforce read-only for some optname and potentially also
>> track if bpf-prog has written a new optval? The bpf-prog can only return 1
>> (OK) and only allows using bpf_set_retval() instead. Likely there is still
>> holes but could be a seed of thought to continue polishing the idea.
> 
> Ack, let's think about it.
> 
> Maybe we should re-evaluate 'getsockopt-happens-after-the-kernel' idea
> as well? If we can have a sleepable hook that can copy_from_user/copy_to_user,
> and we have a mostly working bpf_getsockopt (after your refactoring),
> I don't see why we need to continue the current scheme of triggering
> after the kernel?

Since a sleepable hook would cause some restrictions, perhaps, we could 
introduce something like the promise pattern.  In our case here, BPF 
program call an async version of copy_from_user()/copy_to_user() to 
return a promise.

> 
>>> - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
>>>     gets called whenever bpf returns an error to make sure protocols have
>>>     a chance to handle that condition (and free the fd)
>>>
>>
>>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)
  2023-04-25 17:59               ` Kui-Feng Lee
@ 2023-04-25 18:42                 ` Stanislav Fomichev
  2023-04-25 21:11                   ` Kui-Feng Lee
  2023-04-25 21:28                   ` Kui-Feng Lee
  0 siblings, 2 replies; 8+ messages in thread
From: Stanislav Fomichev @ 2023-04-25 18:42 UTC (permalink / raw)
  To: Kui-Feng Lee
  Cc: Martin KaFai Lau, Eric Dumazet, davem, linux-kernel, netdev,
	daniel, Jakub Kicinski, Paolo Abeni, Leon Romanovsky, David Ahern,
	Arnd Bergmann, Kees Cook, Christian Brauner, Kuniyuki Iwashima,
	Lennart Poettering, linux-arch, Aleksandr Mikhalitsyn, bpf

On Tue, Apr 25, 2023 at 10:59 AM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>
>
>
> On 4/18/23 09:47, Stanislav Fomichev wrote:
> > On 04/17, Martin KaFai Lau wrote:
> >> On 4/14/23 6:55 PM, Stanislav Fomichev wrote:
> >>> On 04/13, Stanislav Fomichev wrote:
> >>>> On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
> >>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
> >>>>>
> >>>>> On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
> >>>>>>
> >>>>>> On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
> >>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
> >>>>>>>
> >>>>>>> During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
> >>>>>>> that bpf cgroup hook can cause FD leaks when used with sockopts which
> >>>>>>> install FDs into the process fdtable.
> >>>>>>>
> >>>>>>> After some offlist discussion it was proposed to add a blacklist of
> >>>>>>
> >>>>>> We try to replace this word by either denylist or blocklist, even in changelogs.
> >>>>>
> >>>>> Hi Eric,
> >>>>>
> >>>>> Oh, I'm sorry about that. :( Sure.
> >>>>>
> >>>>>>
> >>>>>>> socket options those can cause troubles when BPF cgroup hook is enabled.
> >>>>>>>
> >>>>>>
> >>>>>> Can we find the appropriate Fixes: tag to help stable teams ?
> >>>>>
> >>>>> Sure, I will add next time.
> >>>>>
> >>>>> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> >>>>>
> >>>>> I think it's better to add Stanislav Fomichev to CC.
> >>>>
> >>>> Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
> >>>> use it for tcp zerocopy, I'm assuming it should work in this case as
> >>>> well?
> >>>
> >>> Jakub reminded me of the other things I wanted to ask here bug forgot:
> >>>
> >>> - setsockopt is probably not needed, right? setsockopt hook triggers
> >>>     before the kernel and shouldn't leak anything
> >>> - for getsockopt, instead of bypassing bpf completely, should we instead
> >>>     ignore the error from the bpf program? that would still preserve
> >>>     the observability aspect
> >>
> >> stealing this thread to discuss the optlen issue which may make sense to
> >> bypass also.
> >>
> >> There has been issue with optlen. Other than this older post related to
> >> optlen > PAGE_SIZE:
> >> https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/,
> >> the recent one related to optlen that we have seen is
> >> NETLINK_LIST_MEMBERSHIPS. The userspace passed in optlen == 0 and the kernel
> >> put the expected optlen (> 0) and 'return 0;' to userspace. The userspace
> >> intention is to learn the expected optlen. This makes 'ctx.optlen >
> >> max_optlen' and __cgroup_bpf_run_filter_getsockopt() ends up returning
> >> -EFAULT to the userspace even the bpf prog has not changed anything.
> >
> > (ignoring -EFAULT issue) this seems like it needs to be
> >
> >       if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
> >               /* error */
> >       }
> >
> > ?
> >
> >> Does it make sense to also bypass the bpf prog when 'ctx.optlen >
> >> max_optlen' for now (and this can use a separate patch which as usual
> >> requires a bpf selftests)?
> >
> > Yeah, makes sense. Replacing this -EFAULT with WARN_ON_ONCE or something
> > seems like the way to go. It caused too much trouble already :-(
> >
> > Should I prepare a patch or do you want to take a stab at it?
> >
> >> In the future, does it make sense to have a specific cgroup-bpf-prog (a
> >> specific attach type?) that only uses bpf_dynptr kfunc to access the optval
> >> such that it can enforce read-only for some optname and potentially also
> >> track if bpf-prog has written a new optval? The bpf-prog can only return 1
> >> (OK) and only allows using bpf_set_retval() instead. Likely there is still
> >> holes but could be a seed of thought to continue polishing the idea.
> >
> > Ack, let's think about it.
> >
> > Maybe we should re-evaluate 'getsockopt-happens-after-the-kernel' idea
> > as well? If we can have a sleepable hook that can copy_from_user/copy_to_user,
> > and we have a mostly working bpf_getsockopt (after your refactoring),
> > I don't see why we need to continue the current scheme of triggering
> > after the kernel?
>
> Since a sleepable hook would cause some restrictions, perhaps, we could
> introduce something like the promise pattern.  In our case here, BPF
> program call an async version of copy_from_user()/copy_to_user() to
> return a promise.

Having a promise might work. This is essentially what we already do
with sockets/etc with acquire/release pattern.

What are the sleepable restrictions you're hinting about? I feel like
with the sleepable bpf, we can also remove all the temporary buffer
management / extra copies which sounds like a win to me. (we have this
ugly heuristics with BPF_SOCKOPT_KERN_BUF_SIZE) The program can
allocate temporary buffers if needed..

> >>> - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
> >>>     gets called whenever bpf returns an error to make sure protocols have
> >>>     a chance to handle that condition (and free the fd)
> >>>
> >>
> >>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)
  2023-04-25 18:42                 ` Stanislav Fomichev
@ 2023-04-25 21:11                   ` Kui-Feng Lee
  2023-04-26 15:50                     ` Stanislav Fomichev
  2023-04-25 21:28                   ` Kui-Feng Lee
  1 sibling, 1 reply; 8+ messages in thread
From: Kui-Feng Lee @ 2023-04-25 21:11 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Martin KaFai Lau, Eric Dumazet, davem, linux-kernel, netdev,
	daniel, Jakub Kicinski, Paolo Abeni, Leon Romanovsky, David Ahern,
	Arnd Bergmann, Kees Cook, Christian Brauner, Kuniyuki Iwashima,
	Lennart Poettering, linux-arch, Aleksandr Mikhalitsyn, bpf



On 4/25/23 11:42, Stanislav Fomichev wrote:
> On Tue, Apr 25, 2023 at 10:59 AM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>>
>>
>>
>> On 4/18/23 09:47, Stanislav Fomichev wrote:
>>> On 04/17, Martin KaFai Lau wrote:
>>>> On 4/14/23 6:55 PM, Stanislav Fomichev wrote:
>>>>> On 04/13, Stanislav Fomichev wrote:
>>>>>> On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
>>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
>>>>>>>
>>>>>>> On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
>>>>>>>>
>>>>>>>> On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
>>>>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
>>>>>>>>>
>>>>>>>>> During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
>>>>>>>>> that bpf cgroup hook can cause FD leaks when used with sockopts which
>>>>>>>>> install FDs into the process fdtable.
>>>>>>>>>
>>>>>>>>> After some offlist discussion it was proposed to add a blacklist of
>>>>>>>>
>>>>>>>> We try to replace this word by either denylist or blocklist, even in changelogs.
>>>>>>>
>>>>>>> Hi Eric,
>>>>>>>
>>>>>>> Oh, I'm sorry about that. :( Sure.
>>>>>>>
>>>>>>>>
>>>>>>>>> socket options those can cause troubles when BPF cgroup hook is enabled.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Can we find the appropriate Fixes: tag to help stable teams ?
>>>>>>>
>>>>>>> Sure, I will add next time.
>>>>>>>
>>>>>>> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
>>>>>>>
>>>>>>> I think it's better to add Stanislav Fomichev to CC.
>>>>>>
>>>>>> Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
>>>>>> use it for tcp zerocopy, I'm assuming it should work in this case as
>>>>>> well?
>>>>>
>>>>> Jakub reminded me of the other things I wanted to ask here bug forgot:
>>>>>
>>>>> - setsockopt is probably not needed, right? setsockopt hook triggers
>>>>>      before the kernel and shouldn't leak anything
>>>>> - for getsockopt, instead of bypassing bpf completely, should we instead
>>>>>      ignore the error from the bpf program? that would still preserve
>>>>>      the observability aspect
>>>>
>>>> stealing this thread to discuss the optlen issue which may make sense to
>>>> bypass also.
>>>>
>>>> There has been issue with optlen. Other than this older post related to
>>>> optlen > PAGE_SIZE:
>>>> https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/,
>>>> the recent one related to optlen that we have seen is
>>>> NETLINK_LIST_MEMBERSHIPS. The userspace passed in optlen == 0 and the kernel
>>>> put the expected optlen (> 0) and 'return 0;' to userspace. The userspace
>>>> intention is to learn the expected optlen. This makes 'ctx.optlen >
>>>> max_optlen' and __cgroup_bpf_run_filter_getsockopt() ends up returning
>>>> -EFAULT to the userspace even the bpf prog has not changed anything.
>>>
>>> (ignoring -EFAULT issue) this seems like it needs to be
>>>
>>>        if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
>>>                /* error */
>>>        }
>>>
>>> ?
>>>
>>>> Does it make sense to also bypass the bpf prog when 'ctx.optlen >
>>>> max_optlen' for now (and this can use a separate patch which as usual
>>>> requires a bpf selftests)?
>>>
>>> Yeah, makes sense. Replacing this -EFAULT with WARN_ON_ONCE or something
>>> seems like the way to go. It caused too much trouble already :-(
>>>
>>> Should I prepare a patch or do you want to take a stab at it?
>>>
>>>> In the future, does it make sense to have a specific cgroup-bpf-prog (a
>>>> specific attach type?) that only uses bpf_dynptr kfunc to access the optval
>>>> such that it can enforce read-only for some optname and potentially also
>>>> track if bpf-prog has written a new optval? The bpf-prog can only return 1
>>>> (OK) and only allows using bpf_set_retval() instead. Likely there is still
>>>> holes but could be a seed of thought to continue polishing the idea.
>>>
>>> Ack, let's think about it.
>>>
>>> Maybe we should re-evaluate 'getsockopt-happens-after-the-kernel' idea
>>> as well? If we can have a sleepable hook that can copy_from_user/copy_to_user,
>>> and we have a mostly working bpf_getsockopt (after your refactoring),
>>> I don't see why we need to continue the current scheme of triggering
>>> after the kernel?
>>
>> Since a sleepable hook would cause some restrictions, perhaps, we could
>> introduce something like the promise pattern.  In our case here, BPF
>> program call an async version of copy_from_user()/copy_to_user() to
>> return a promise.
> 
> Having a promise might work. This is essentially what we already do
> with sockets/etc with acquire/release pattern.
> 
> What are the sleepable restrictions you're hinting about? I feel like
AFAIK, a sleepable program can use only some types of maps; for example,
array, hash, ringbuf,... etc.  Other types of maps are unavailable to
sleepable programs; for example, sockmap, sockhash.

> with the sleepable bpf, we can also remove all the temporary buffer
> management / extra copies which sounds like a win to me. (we have this
> ugly heuristics with BPF_SOCKOPT_KERN_BUF_SIZE) The program can
> allocate temporary buffers if needed..
Agree!


> 
>>>>> - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
>>>>>      gets called whenever bpf returns an error to make sure protocols have
>>>>>      a chance to handle that condition (and free the fd)
>>>>>
>>>>
>>>>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)
  2023-04-25 18:42                 ` Stanislav Fomichev
  2023-04-25 21:11                   ` Kui-Feng Lee
@ 2023-04-25 21:28                   ` Kui-Feng Lee
  2023-04-26 15:50                     ` Stanislav Fomichev
  1 sibling, 1 reply; 8+ messages in thread
From: Kui-Feng Lee @ 2023-04-25 21:28 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Martin KaFai Lau, Eric Dumazet, davem, linux-kernel, netdev,
	daniel, Jakub Kicinski, Paolo Abeni, Leon Romanovsky, David Ahern,
	Arnd Bergmann, Kees Cook, Christian Brauner, Kuniyuki Iwashima,
	Lennart Poettering, linux-arch, Aleksandr Mikhalitsyn, bpf



On 4/25/23 11:42, Stanislav Fomichev wrote:
> On Tue, Apr 25, 2023 at 10:59 AM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>>
>>
>>
>> On 4/18/23 09:47, Stanislav Fomichev wrote:
>>> On 04/17, Martin KaFai Lau wrote:
>>>> On 4/14/23 6:55 PM, Stanislav Fomichev wrote:
>>>>> On 04/13, Stanislav Fomichev wrote:
>>>>>> On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
>>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
>>>>>>>
>>>>>>> On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
>>>>>>>>
>>>>>>>> On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
>>>>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
>>>>>>>>>
>>>>>>>>> During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
>>>>>>>>> that bpf cgroup hook can cause FD leaks when used with sockopts which
>>>>>>>>> install FDs into the process fdtable.
>>>>>>>>>
>>>>>>>>> After some offlist discussion it was proposed to add a blacklist of
>>>>>>>>
>>>>>>>> We try to replace this word by either denylist or blocklist, even in changelogs.
>>>>>>>
>>>>>>> Hi Eric,
>>>>>>>
>>>>>>> Oh, I'm sorry about that. :( Sure.
>>>>>>>
>>>>>>>>
>>>>>>>>> socket options those can cause troubles when BPF cgroup hook is enabled.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Can we find the appropriate Fixes: tag to help stable teams ?
>>>>>>>
>>>>>>> Sure, I will add next time.
>>>>>>>
>>>>>>> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
>>>>>>>
>>>>>>> I think it's better to add Stanislav Fomichev to CC.
>>>>>>
>>>>>> Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
>>>>>> use it for tcp zerocopy, I'm assuming it should work in this case as
>>>>>> well?
>>>>>
>>>>> Jakub reminded me of the other things I wanted to ask here bug forgot:
>>>>>
>>>>> - setsockopt is probably not needed, right? setsockopt hook triggers
>>>>>      before the kernel and shouldn't leak anything
>>>>> - for getsockopt, instead of bypassing bpf completely, should we instead
>>>>>      ignore the error from the bpf program? that would still preserve
>>>>>      the observability aspect
>>>>
>>>> stealing this thread to discuss the optlen issue which may make sense to
>>>> bypass also.
>>>>
>>>> There has been issue with optlen. Other than this older post related to
>>>> optlen > PAGE_SIZE:
>>>> https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/,
>>>> the recent one related to optlen that we have seen is
>>>> NETLINK_LIST_MEMBERSHIPS. The userspace passed in optlen == 0 and the kernel
>>>> put the expected optlen (> 0) and 'return 0;' to userspace. The userspace
>>>> intention is to learn the expected optlen. This makes 'ctx.optlen >
>>>> max_optlen' and __cgroup_bpf_run_filter_getsockopt() ends up returning
>>>> -EFAULT to the userspace even the bpf prog has not changed anything.
>>>
>>> (ignoring -EFAULT issue) this seems like it needs to be
>>>
>>>        if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
>>>                /* error */
>>>        }
>>>
>>> ?
>>>
>>>> Does it make sense to also bypass the bpf prog when 'ctx.optlen >
>>>> max_optlen' for now (and this can use a separate patch which as usual
>>>> requires a bpf selftests)?
>>>
>>> Yeah, makes sense. Replacing this -EFAULT with WARN_ON_ONCE or something
>>> seems like the way to go. It caused too much trouble already :-(
>>>
>>> Should I prepare a patch or do you want to take a stab at it?
>>>
>>>> In the future, does it make sense to have a specific cgroup-bpf-prog (a
>>>> specific attach type?) that only uses bpf_dynptr kfunc to access the optval
>>>> such that it can enforce read-only for some optname and potentially also
>>>> track if bpf-prog has written a new optval? The bpf-prog can only return 1
>>>> (OK) and only allows using bpf_set_retval() instead. Likely there is still
>>>> holes but could be a seed of thought to continue polishing the idea.
>>>
>>> Ack, let's think about it.
>>>
>>> Maybe we should re-evaluate 'getsockopt-happens-after-the-kernel' idea
>>> as well? If we can have a sleepable hook that can copy_from_user/copy_to_user,
>>> and we have a mostly working bpf_getsockopt (after your refactoring),
>>> I don't see why we need to continue the current scheme of triggering
>>> after the kernel?
>>
>> Since a sleepable hook would cause some restrictions, perhaps, we could
>> introduce something like the promise pattern.  In our case here, BPF
>> program call an async version of copy_from_user()/copy_to_user() to
>> return a promise.
> 
> Having a promise might work. This is essentially what we already do
> with sockets/etc with acquire/release pattern.

Would you mind to give me some context of the socket things?

> 
> What are the sleepable restrictions you're hinting about? I feel like
> with the sleepable bpf, we can also remove all the temporary buffer
> management / extra copies which sounds like a win to me. (we have this
> ugly heuristics with BPF_SOCKOPT_KERN_BUF_SIZE) The program can
> allocate temporary buffers if needed..
> 
>>>>> - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
>>>>>      gets called whenever bpf returns an error to make sure protocols have
>>>>>      a chance to handle that condition (and free the fd)
>>>>>
>>>>
>>>>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)
  2023-04-25 21:28                   ` Kui-Feng Lee
@ 2023-04-26 15:50                     ` Stanislav Fomichev
  0 siblings, 0 replies; 8+ messages in thread
From: Stanislav Fomichev @ 2023-04-26 15:50 UTC (permalink / raw)
  To: Kui-Feng Lee
  Cc: Martin KaFai Lau, Eric Dumazet, davem, linux-kernel, netdev,
	daniel, Jakub Kicinski, Paolo Abeni, Leon Romanovsky, David Ahern,
	Arnd Bergmann, Kees Cook, Christian Brauner, Kuniyuki Iwashima,
	Lennart Poettering, linux-arch, Aleksandr Mikhalitsyn, bpf

On Tue, Apr 25, 2023 at 2:28 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>
>
>
> On 4/25/23 11:42, Stanislav Fomichev wrote:
> > On Tue, Apr 25, 2023 at 10:59 AM Kui-Feng Lee <sinquersw@gmail.com> wrote:
> >>
> >>
> >>
> >> On 4/18/23 09:47, Stanislav Fomichev wrote:
> >>> On 04/17, Martin KaFai Lau wrote:
> >>>> On 4/14/23 6:55 PM, Stanislav Fomichev wrote:
> >>>>> On 04/13, Stanislav Fomichev wrote:
> >>>>>> On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
> >>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
> >>>>>>>
> >>>>>>> On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
> >>>>>>>>
> >>>>>>>> On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
> >>>>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
> >>>>>>>>>
> >>>>>>>>> During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
> >>>>>>>>> that bpf cgroup hook can cause FD leaks when used with sockopts which
> >>>>>>>>> install FDs into the process fdtable.
> >>>>>>>>>
> >>>>>>>>> After some offlist discussion it was proposed to add a blacklist of
> >>>>>>>>
> >>>>>>>> We try to replace this word by either denylist or blocklist, even in changelogs.
> >>>>>>>
> >>>>>>> Hi Eric,
> >>>>>>>
> >>>>>>> Oh, I'm sorry about that. :( Sure.
> >>>>>>>
> >>>>>>>>
> >>>>>>>>> socket options those can cause troubles when BPF cgroup hook is enabled.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Can we find the appropriate Fixes: tag to help stable teams ?
> >>>>>>>
> >>>>>>> Sure, I will add next time.
> >>>>>>>
> >>>>>>> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> >>>>>>>
> >>>>>>> I think it's better to add Stanislav Fomichev to CC.
> >>>>>>
> >>>>>> Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
> >>>>>> use it for tcp zerocopy, I'm assuming it should work in this case as
> >>>>>> well?
> >>>>>
> >>>>> Jakub reminded me of the other things I wanted to ask here bug forgot:
> >>>>>
> >>>>> - setsockopt is probably not needed, right? setsockopt hook triggers
> >>>>>      before the kernel and shouldn't leak anything
> >>>>> - for getsockopt, instead of bypassing bpf completely, should we instead
> >>>>>      ignore the error from the bpf program? that would still preserve
> >>>>>      the observability aspect
> >>>>
> >>>> stealing this thread to discuss the optlen issue which may make sense to
> >>>> bypass also.
> >>>>
> >>>> There has been issue with optlen. Other than this older post related to
> >>>> optlen > PAGE_SIZE:
> >>>> https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/,
> >>>> the recent one related to optlen that we have seen is
> >>>> NETLINK_LIST_MEMBERSHIPS. The userspace passed in optlen == 0 and the kernel
> >>>> put the expected optlen (> 0) and 'return 0;' to userspace. The userspace
> >>>> intention is to learn the expected optlen. This makes 'ctx.optlen >
> >>>> max_optlen' and __cgroup_bpf_run_filter_getsockopt() ends up returning
> >>>> -EFAULT to the userspace even the bpf prog has not changed anything.
> >>>
> >>> (ignoring -EFAULT issue) this seems like it needs to be
> >>>
> >>>        if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
> >>>                /* error */
> >>>        }
> >>>
> >>> ?
> >>>
> >>>> Does it make sense to also bypass the bpf prog when 'ctx.optlen >
> >>>> max_optlen' for now (and this can use a separate patch which as usual
> >>>> requires a bpf selftests)?
> >>>
> >>> Yeah, makes sense. Replacing this -EFAULT with WARN_ON_ONCE or something
> >>> seems like the way to go. It caused too much trouble already :-(
> >>>
> >>> Should I prepare a patch or do you want to take a stab at it?
> >>>
> >>>> In the future, does it make sense to have a specific cgroup-bpf-prog (a
> >>>> specific attach type?) that only uses bpf_dynptr kfunc to access the optval
> >>>> such that it can enforce read-only for some optname and potentially also
> >>>> track if bpf-prog has written a new optval? The bpf-prog can only return 1
> >>>> (OK) and only allows using bpf_set_retval() instead. Likely there is still
> >>>> holes but could be a seed of thought to continue polishing the idea.
> >>>
> >>> Ack, let's think about it.
> >>>
> >>> Maybe we should re-evaluate 'getsockopt-happens-after-the-kernel' idea
> >>> as well? If we can have a sleepable hook that can copy_from_user/copy_to_user,
> >>> and we have a mostly working bpf_getsockopt (after your refactoring),
> >>> I don't see why we need to continue the current scheme of triggering
> >>> after the kernel?
> >>
> >> Since a sleepable hook would cause some restrictions, perhaps, we could
> >> introduce something like the promise pattern.  In our case here, BPF
> >> program call an async version of copy_from_user()/copy_to_user() to
> >> return a promise.
> >
> > Having a promise might work. This is essentially what we already do
> > with sockets/etc with acquire/release pattern.
>
> Would you mind to give me some context of the socket things?

I'm mostly referring to the infra around KF_ACQUIRE/KF_RELEASE where
the verifier already has some resource tracking functionality. We can
probably extend it to verify that the program does copy_to_user
equivalent at the end of the run (or somehow specifically marks that
it's not needed).

> >
> > What are the sleepable restrictions you're hinting about? I feel like
> > with the sleepable bpf, we can also remove all the temporary buffer
> > management / extra copies which sounds like a win to me. (we have this
> > ugly heuristics with BPF_SOCKOPT_KERN_BUF_SIZE) The program can
> > allocate temporary buffers if needed..
> >
> >>>>> - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
> >>>>>      gets called whenever bpf returns an error to make sure protocols have
> >>>>>      a chance to handle that condition (and free the fd)
> >>>>>
> >>>>
> >>>>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook)
  2023-04-25 21:11                   ` Kui-Feng Lee
@ 2023-04-26 15:50                     ` Stanislav Fomichev
  0 siblings, 0 replies; 8+ messages in thread
From: Stanislav Fomichev @ 2023-04-26 15:50 UTC (permalink / raw)
  To: Kui-Feng Lee
  Cc: Martin KaFai Lau, Eric Dumazet, davem, linux-kernel, netdev,
	daniel, Jakub Kicinski, Paolo Abeni, Leon Romanovsky, David Ahern,
	Arnd Bergmann, Kees Cook, Christian Brauner, Kuniyuki Iwashima,
	Lennart Poettering, linux-arch, Aleksandr Mikhalitsyn, bpf

On Tue, Apr 25, 2023 at 2:11 PM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>
>
>
> On 4/25/23 11:42, Stanislav Fomichev wrote:
> > On Tue, Apr 25, 2023 at 10:59 AM Kui-Feng Lee <sinquersw@gmail.com> wrote:
> >>
> >>
> >>
> >> On 4/18/23 09:47, Stanislav Fomichev wrote:
> >>> On 04/17, Martin KaFai Lau wrote:
> >>>> On 4/14/23 6:55 PM, Stanislav Fomichev wrote:
> >>>>> On 04/13, Stanislav Fomichev wrote:
> >>>>>> On Thu, Apr 13, 2023 at 7:38 AM Aleksandr Mikhalitsyn
> >>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
> >>>>>>>
> >>>>>>> On Thu, Apr 13, 2023 at 4:22 PM Eric Dumazet <edumazet@google.com> wrote:
> >>>>>>>>
> >>>>>>>> On Thu, Apr 13, 2023 at 3:35 PM Alexander Mikhalitsyn
> >>>>>>>> <aleksandr.mikhalitsyn@canonical.com> wrote:
> >>>>>>>>>
> >>>>>>>>> During work on SO_PEERPIDFD, it was discovered (thanks to Christian),
> >>>>>>>>> that bpf cgroup hook can cause FD leaks when used with sockopts which
> >>>>>>>>> install FDs into the process fdtable.
> >>>>>>>>>
> >>>>>>>>> After some offlist discussion it was proposed to add a blacklist of
> >>>>>>>>
> >>>>>>>> We try to replace this word by either denylist or blocklist, even in changelogs.
> >>>>>>>
> >>>>>>> Hi Eric,
> >>>>>>>
> >>>>>>> Oh, I'm sorry about that. :( Sure.
> >>>>>>>
> >>>>>>>>
> >>>>>>>>> socket options those can cause troubles when BPF cgroup hook is enabled.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Can we find the appropriate Fixes: tag to help stable teams ?
> >>>>>>>
> >>>>>>> Sure, I will add next time.
> >>>>>>>
> >>>>>>> Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
> >>>>>>>
> >>>>>>> I think it's better to add Stanislav Fomichev to CC.
> >>>>>>
> >>>>>> Can we use 'struct proto' bpf_bypass_getsockopt instead? We already
> >>>>>> use it for tcp zerocopy, I'm assuming it should work in this case as
> >>>>>> well?
> >>>>>
> >>>>> Jakub reminded me of the other things I wanted to ask here bug forgot:
> >>>>>
> >>>>> - setsockopt is probably not needed, right? setsockopt hook triggers
> >>>>>      before the kernel and shouldn't leak anything
> >>>>> - for getsockopt, instead of bypassing bpf completely, should we instead
> >>>>>      ignore the error from the bpf program? that would still preserve
> >>>>>      the observability aspect
> >>>>
> >>>> stealing this thread to discuss the optlen issue which may make sense to
> >>>> bypass also.
> >>>>
> >>>> There has been issue with optlen. Other than this older post related to
> >>>> optlen > PAGE_SIZE:
> >>>> https://lore.kernel.org/bpf/5c8b7d59-1f28-2284-f7b9-49d946f2e982@linux.dev/,
> >>>> the recent one related to optlen that we have seen is
> >>>> NETLINK_LIST_MEMBERSHIPS. The userspace passed in optlen == 0 and the kernel
> >>>> put the expected optlen (> 0) and 'return 0;' to userspace. The userspace
> >>>> intention is to learn the expected optlen. This makes 'ctx.optlen >
> >>>> max_optlen' and __cgroup_bpf_run_filter_getsockopt() ends up returning
> >>>> -EFAULT to the userspace even the bpf prog has not changed anything.
> >>>
> >>> (ignoring -EFAULT issue) this seems like it needs to be
> >>>
> >>>        if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
> >>>                /* error */
> >>>        }
> >>>
> >>> ?
> >>>
> >>>> Does it make sense to also bypass the bpf prog when 'ctx.optlen >
> >>>> max_optlen' for now (and this can use a separate patch which as usual
> >>>> requires a bpf selftests)?
> >>>
> >>> Yeah, makes sense. Replacing this -EFAULT with WARN_ON_ONCE or something
> >>> seems like the way to go. It caused too much trouble already :-(
> >>>
> >>> Should I prepare a patch or do you want to take a stab at it?
> >>>
> >>>> In the future, does it make sense to have a specific cgroup-bpf-prog (a
> >>>> specific attach type?) that only uses bpf_dynptr kfunc to access the optval
> >>>> such that it can enforce read-only for some optname and potentially also
> >>>> track if bpf-prog has written a new optval? The bpf-prog can only return 1
> >>>> (OK) and only allows using bpf_set_retval() instead. Likely there is still
> >>>> holes but could be a seed of thought to continue polishing the idea.
> >>>
> >>> Ack, let's think about it.
> >>>
> >>> Maybe we should re-evaluate 'getsockopt-happens-after-the-kernel' idea
> >>> as well? If we can have a sleepable hook that can copy_from_user/copy_to_user,
> >>> and we have a mostly working bpf_getsockopt (after your refactoring),
> >>> I don't see why we need to continue the current scheme of triggering
> >>> after the kernel?
> >>
> >> Since a sleepable hook would cause some restrictions, perhaps, we could
> >> introduce something like the promise pattern.  In our case here, BPF
> >> program call an async version of copy_from_user()/copy_to_user() to
> >> return a promise.
> >
> > Having a promise might work. This is essentially what we already do
> > with sockets/etc with acquire/release pattern.
> >
> > What are the sleepable restrictions you're hinting about? I feel like
> AFAIK, a sleepable program can use only some types of maps; for example,
> array, hash, ringbuf,... etc.  Other types of maps are unavailable to
> sleepable programs; for example, sockmap, sockhash.

Sure, but it doesn't have to stay that way. (hypothetically) If we
think that sleepable makes sense, we can try to expand the scope.

> > with the sleepable bpf, we can also remove all the temporary buffer
> > management / extra copies which sounds like a win to me. (we have this
> > ugly heuristics with BPF_SOCKOPT_KERN_BUF_SIZE) The program can
> > allocate temporary buffers if needed..
> Agree!
>
>
> >
> >>>>> - or maybe we can even have a per-proto bpf_getsockopt_cleanup call that
> >>>>>      gets called whenever bpf returns an error to make sure protocols have
> >>>>>      a chance to handle that condition (and free the fd)
> >>>>>
> >>>>
> >>>>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-04-26 15:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20230413133355.350571-1-aleksandr.mikhalitsyn@canonical.com>
     [not found] ` <20230413133355.350571-3-aleksandr.mikhalitsyn@canonical.com>
     [not found]   ` <CANn89iLuLkUvX-dDC=rJhtFcxjnVmfn_-crOevbQe+EjaEDGbg@mail.gmail.com>
     [not found]     ` <CAEivzxcEhfLttf0VK=NmHdQxF7CRYXNm6NwUVx6jx=-u2k-T6w@mail.gmail.com>
     [not found]       ` <CAKH8qBt+xPygUVPMUuzbi1HCJuxc4gYOdU6JkrFmSouRQgoG6g@mail.gmail.com>
     [not found]         ` <ZDoEG0VF6fb9y0EC@google.com>
2023-04-18  1:03           ` handling unsupported optlen in cgroup bpf getsockopt: (was [PATCH net-next v4 2/4] net: socket: add sockopts blacklist for BPF cgroup hook) Martin KaFai Lau
2023-04-18 16:47             ` Stanislav Fomichev
2023-04-25 17:59               ` Kui-Feng Lee
2023-04-25 18:42                 ` Stanislav Fomichev
2023-04-25 21:11                   ` Kui-Feng Lee
2023-04-26 15:50                     ` Stanislav Fomichev
2023-04-25 21:28                   ` Kui-Feng Lee
2023-04-26 15:50                     ` Stanislav Fomichev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox