netdev.vger.kernel.org archive mirror
From: Gur Stavi <gur.stavi@huawei.com>
To: 'Willem de Bruijn' <willemdebruijn.kernel@gmail.com>
Cc: <davem@davemloft.net>, <edumazet@google.com>, <kuba@kernel.org>,
	<linux-kernel@vger.kernel.org>, <linux-kselftest@vger.kernel.org>,
	<netdev@vger.kernel.org>, <pabeni@redhat.com>, <shuah@kernel.org>
Subject: RE: [PATCH net-next v02 1/2] af_packet: allow fanout_add when socket is not RUNNING
Date: Fri, 11 Oct 2024 08:17:12 +0300
Message-ID: <000201db1b9c$db32f6c0$9198e440$@huawei.com>
In-Reply-To: <67085135e4fe2_21530629429@willemb.c.googlers.com.notmuch>

> Gur Stavi wrote:
> > > Gur Stavi wrote:
> > > > > Gur Stavi wrote:
> > > > > > > Gur Stavi wrote:
> > > > > > > > >> @@ -1846,21 +1846,21 @@ static int fanout_add(struct sock *sk, struct fanout_args *args)
> > > > > > > > >>  	err = -EINVAL;
> > > > > > > > >>
> > > > > > > > >>  	spin_lock(&po->bind_lock);
> > > > > > > > >> -	if (packet_sock_flag(po, PACKET_SOCK_RUNNING) &&
> > > > > > > > >> -	    match->type == type &&
> > > > > > > > >> +	if (match->type == type &&
> > > > > > > > >>  	    match->prot_hook.type == po->prot_hook.type &&
> > > > > > > > >>  	    match->prot_hook.dev == po->prot_hook.dev) {
> > > > > > > > >
> > > > > > > > > Remaining unaddressed issue is that the socket can now be added
> > > > > > > > > before being bound. See comment in v1.
> > > > > > > >
> > > > > > > > I extended the psock_fanout test with unbound fanout test.
> > > > > > > >
> > > > > > > > As far as I understand, the easiest way to verify bind is to test that
> > > > > > > > po->prot_hook.dev != NULL, since we are under a bind_lock anyway.
> > > > > > > > But perhaps a more readable and direct approach to test "bind" would be
> > > > > > > > to test po->ifindex != -1, as ifindex is commented as "bound device".
> > > > > > > > However, at the moment ifindex is not initialized to -1, I can add such
> > > > > > > > initialization, but perhaps I do not fully understand all the logic.
> > > > > > > >
> > > > > > > > Any preferences?
> > > > > > >
> > > > > > > prot_hook.dev is not necessarily set if a packet socket is bound.
> > > > > > > It may be bound to any device. See dev_add_pack and ptype_head.
> > > > > > >
> > > > > > > prot_hook.type, on the other hand, must be set if bound and is only
> > > > > > > modified with the bind_lock held too.
> > > > > > >
> > > > > > > Well, and in packet_create. But setsockopt PACKET_FANOUT_ADD also
> > > > > > > succeeds in case bind() was not called explicitly first to bind to
> > > > > > > a specific device or change ptype.
> > > > > >
> > > > > > Please clarify the last paragraph? When you say "also succeeds" do you
> > > > > > mean SHOULD succeed or MAY SUCCEED by mistake if "something" happens ???
> > > > >
> > > > > I mean it succeeds currently. Which behavior must then be maintained.
> > > > >
> > > > > > Do you refer to the following scenario: socket is created with non-zero
> > > > > > protocol and becomes RUNNING "without bind" for all devices. In that case
> > > > > > it can be added to FANOUT without bind. Is that considered a bug or does
> > > > > > the bind requirement for fanout only apply for all-protocol (0) sockets?
> > > > >
> > > > > I'm beginning to think that this bind requirement is not needed.
> > > >
> > > > I agree with that. I think that is an historical mistake that socket
> > > > becomes implicitly bound to all interfaces if a protocol is defined
> > > > during create. Without this bind requirement would make sense.
> > > >
> > > > >
> > > > > All type and dev are valid, even if an ETH_P_NONE fanout group would
> > > > > be fairly useless.
> > > >
> > > > Fanout is all about RX, I think that refusing fanout for socket that
> > > > will not receive any packet is OK. The condition can be:
> > > > if (po->ifindex == -1 || !po->num)
> > >
> > > Fanout is not limited to sockets bound to a specific interface.
> > > This will break existing users.
> >
> > For specific interface ifindex >= 1
> > For "any interface" ifindex == 0
> > ifindex is -1 only if the socket was created unbound with proto == 0
> > or for the rare race case that during re-bind the new dev became unlisted.
> > For both of these cases fanout should fail.
> 
> The only case where packet_create does not call __register_prot_hook
> is if proto == 0. If proto is anything else, the socket will be bound,
> whether to a device hook, or ptype_all. I don't think we need this
> extra ifindex condition.
> 

Even though "unbound" is an unlikely state for such a socket, the code
should still handle this state consistently. If packet_do_bind sets
ifindex to -1 in the unlikely unlisted scenario, so should packet_create
in the more likely proto == 0 scenario.
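
To make the ifindex states concrete, here is a minimal userspace sketch
of the proposal, not kernel code. The names model_sock, model_create,
model_bind and model_fanout_allowed are made up for illustration and do
not exist in af_packet.c; the sketch only assumes the three ifindex
states listed above (-1 unbound, 0 any device, >= 1 specific device) and
the condition "po->ifindex == -1 || !po->num" suggested earlier in this
thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative userspace model only; none of these names exist in the kernel. */
struct model_sock {
	int ifindex;        /* -1: unbound, 0: any device, >= 1: specific device */
	unsigned short num; /* protocol; 0 means the socket receives nothing */
};

/* Models packet_create(): only proto == 0 leaves the socket unbound. */
static struct model_sock model_create(unsigned short proto)
{
	struct model_sock s = { .ifindex = proto ? 0 : -1, .num = proto };
	return s;
}

/*
 * Models packet_do_bind(). dev_listed == false stands for the rare race
 * where the new device became unlisted during re-bind.
 */
static void model_bind(struct model_sock *s, unsigned short proto,
		       int ifindex, bool dev_listed)
{
	assert(ifindex >= 0); /* callers pass 0 (any device) or a device index */
	s->num = proto;
	s->ifindex = dev_listed ? ifindex : -1;
}

/* The proposed fanout_add() check: refuse sockets that cannot receive. */
static bool model_fanout_allowed(const struct model_sock *s)
{
	return !(s->ifindex == -1 || !s->num);
}
```

In this model a socket created with proto == 0 is refused, a socket
created with a non-zero protocol such as ETH_P_IP (0x0800) is accepted
without an explicit bind, and a re-bind that hits the unlisted-device
race is refused again.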

> > >
> > > Binding to ETH_P_NONE is useless, but we're not going to slow down
> > > legitimate users with branches for cases that are harmless.
> > >
> >
> > With "branch", do you refer to performance or something else?
> > As I said in other mail, ETH_P_NONE could not be used in a fanout
> > before as well because socket cannot become RUNNING with proto == 0.
> 
> Good point.
> 
> > For performance, we removed the RUNNING condition and added this.
> > It is not like we need to perform 5M fanout registrations/sec. It is a
> > syscall after all.
> 
> It's as much about code complexity as performance. Both the patch and
> resulting code should be as small and self-evident as possible.
> 
> Patch v3 introduces a lot of code churn.

Did you look at a side by side comparison? There is really very little
extra code.

> 
> If we don't care about opening up fanout groups to ETH_P_NONE, then
> patch v2 seems sufficient. If explicitly blocking this, the ENXIO
> return can be added, but ideally without touching the other lines.
> 

I am not the one to decide whether opening it is a good idea, but it
would be ironic if a patch intended to remove the only-RUNNING
restriction ended up allowing never-RUNNING sockets into a fanout
group.

> > > > I realized another possible problem. We should consider adding ifindex
> > > > Field to struct packet_fanout to be used for lookup of an existing match.
> > > > There is little sense to bind sockets to different interfaces and then
> > > > put them in the same fanout group.
> > > > If you agree, I can prepare a separate patch for that.
> > > >
> > > > > The type and dev must match that of the fanout group, and once added
> > > > > to a fanout group can no longer be changed (bind will fail).
> > > > >
> > > > > I briefly considered the reason might be max_num_members accounting.
> > > > > Since f->num_members counts running sockets. But that is not used
> > > > > when tracking membership of the group, sk_ref is. Every packet socket
> > > > > whose po->rollover is increased increases this refcount.
> > > > >
> > > > > > What about using ifindex to detect bind? Initialize it to -1 in
> > > > > > packet_create and ensure that packet_do_bind, on success, sets it
> > > > > > to device id or 0?
> > > > > >
> > > > > > psock_fanout, should probably be extended with scenarios that test
> > > > > > "all devices" and all/specific protocols. Any specific scenario
> > > > > > suggestions?
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > >
> >
> >
> 



