From: Gur Stavi <gur.stavi@huawei.com>
To: 'Willem de Bruijn' <willemdebruijn.kernel@gmail.com>
Cc: <davem@davemloft.net>, <edumazet@google.com>, <kuba@kernel.org>,
<linux-kernel@vger.kernel.org>, <linux-kselftest@vger.kernel.org>,
<netdev@vger.kernel.org>, <pabeni@redhat.com>, <shuah@kernel.org>
Subject: RE: [PATCH net-next v02 1/2] af_packet: allow fanout_add when socket is not RUNNING
Date: Fri, 11 Oct 2024 08:17:12 +0300
Message-ID: <000201db1b9c$db32f6c0$9198e440$@huawei.com>
In-Reply-To: <67085135e4fe2_21530629429@willemb.c.googlers.com.notmuch>
> Gur Stavi wrote:
> > > Gur Stavi wrote:
> > > > > Gur Stavi wrote:
> > > > > > > Gur Stavi wrote:
> > > > > > > > >> @@ -1846,21 +1846,21 @@ static int fanout_add(struct sock *sk, struct fanout_args *args)
> > > > > > > > >> err = -EINVAL;
> > > > > > > > >>
> > > > > > > > >> spin_lock(&po->bind_lock);
> > > > > > > > >> - if (packet_sock_flag(po, PACKET_SOCK_RUNNING) &&
> > > > > > > > >> - match->type == type &&
> > > > > > > > >> + if (match->type == type &&
> > > > > > > > >> match->prot_hook.type == po->prot_hook.type &&
> > > > > > > > >> match->prot_hook.dev == po->prot_hook.dev) {
> > > > > > > > >
> > > > > > > > > Remaining unaddressed issue is that the socket can now be added
> > > > > > > > > before being bound. See comment in v1.
> > > > > > > >
> > > > > > > > I extended the psock_fanout test with unbound fanout test.
> > > > > > > >
> > > > > > > > As far as I understand, the easiest way to verify bind is to test that
> > > > > > > > po->prot_hook.dev != NULL, since we are under a bind_lock anyway.
> > > > > > > > But perhaps a more readable and direct approach to test "bind" would be
> > > > > > > > to test po->ifindex != -1, as ifindex is commented as "bound device".
> > > > > > > > However, at the moment ifindex is not initialized to -1, I can add such
> > > > > > > > initialization, but perhaps I do not fully understand all the logic.
> > > > > > > >
> > > > > > > > Any preferences?
> > > > > > >
> > > > > > > prot_hook.dev is not necessarily set if a packet socket is bound.
> > > > > > > It may be bound to any device. See dev_add_pack and ptype_head.
> > > > > > >
> > > > > > > prot_hook.type, on the other hand, must be set if bound and is only
> > > > > > > modified with the bind_lock held too.
> > > > > > >
> > > > > > > Well, and in packet_create. But setsockopt PACKET_FANOUT_ADD also
> > > > > > > succeeds in case bind() was not called explicitly first to bind to
> > > > > > > a specific device or change ptype.
> > > > > >
> > > > > > Please clarify the last paragraph? When you say "also succeeds" do you
> > > > > > mean SHOULD succeed or MAY SUCCEED by mistake if "something" happens ???
> > > > >
> > > > > I mean it succeeds currently. Which behavior must then be maintained.
> > > > >
> > > > > > Do you refer to the following scenario: socket is created with non-zero
> > > > > > protocol and becomes RUNNING "without bind" for all devices. In that case
> > > > > > it can be added to FANOUT without bind. Is that considered a bug or does
> > > > > > the bind requirement for fanout only apply for all-protocol (0) sockets?
> > > > >
> > > > > I'm beginning to think that this bind requirement is not needed.
> > > >
> > > > I agree with that. I think that is an historical mistake that socket
> > > > becomes implicitly bound to all interfaces if a protocol is defined
> > > > during create. Without this bind requirement would make sense.
> > > >
> > > > >
> > > > > All type and dev are valid, even if an ETH_P_NONE fanout group would
> > > > > be fairly useless.
> > > >
> > > > Fanout is all about RX, I think that refusing fanout for socket that
> > > > will not receive any packet is OK. The condition can be:
> > > > if (po->ifindex == -1 || !po->num)
> > >
> > > Fanout is not limited to sockets bound to a specific interface.
> > > This will break existing users.
> >
> > For specific interface ifindex >= 1
> > For "any interface" ifindex == 0
> > ifindex is -1 only if the socket was created unbound with proto == 0
> > or for the rare race case that during re-bind the new dev became unlisted.
> > For both of these cases fanout should fail.
>
> The only case where packet_create does not call __register_prot_hook
> is if proto == 0. If proto is anything else, the socket will be bound,
> whether to a device hook, or ptype_all. I don't think we need this
> extra ifindex condition.
>
Even though "unbound" is an unlikely state for such a socket, the code
should still address this state consistently. If do_bind sets ifindex
to -1 in the unlikely unlisted scenario, so should packet_create in the
more likely proto == 0 scenario.
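
A minimal userspace sketch of the ifindex convention argued for above,
assuming -1 means unbound, 0 means "any interface" and >= 1 means a
specific device. This is not kernel code: struct psock_model,
model_create and model_do_bind are hypothetical names, and model_create
shows the proposed (not the current) packet_create behavior for
proto == 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, NOT the kernel's struct packet_sock. */
struct psock_model {
	int ifindex;        /* -1: unbound, 0: any interface, >= 1: device */
	unsigned short num; /* protocol; 0 means no protocol was given */
};

/* Proposed create-time rule: proto == 0 leaves the socket unbound. */
static void model_create(struct psock_model *po, unsigned short proto)
{
	po->num = proto;
	po->ifindex = proto ? 0 : -1;
}

/*
 * Simplified bind: dev_ifindex == 0 stands for "any interface".
 * dev_listed == false models the rare race where the new device
 * became unlisted during re-bind, which leaves ifindex at -1.
 */
static void model_do_bind(struct psock_model *po, int dev_ifindex,
			  bool dev_listed)
{
	assert(dev_ifindex >= 0);
	if (dev_ifindex > 0 && !dev_listed)
		po->ifindex = -1;
	else
		po->ifindex = dev_ifindex;
}
```

With this rule both "unbound" paths end in the same observable state,
so a single ifindex == -1 test covers them.
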
> > >
> > > Binding to ETH_P_NONE is useless, but we're not going to slow down
> > > legitimate users with branches for cases that are harmless.
> > >
> >
> > With "branch", do you refer to performance or something else?
> > As I said in other mail, ETH_P_NONE could not be used in a fanout
> > before as well because socket cannot become RUNNING with proto == 0.
>
> Good point.
>
> > For performance, we removed the RUNNING condition and added this.
> > It is not like we need to perform 5M fanout registrations/sec. It is a
> > syscall after all.
>
> It's as much about code complexity as performance. Both the patch and
> resulting code should be as small and self-evident as possible.
>
> Patch v3 introduces a lot of code churn.
Did you look at a side-by-side comparison? There is really very little
extra code.
>
> If we don't care about opening up fanout groups to ETH_P_NONE, then
> patch v2 seems sufficient. If explicitly blocking this, the ENXIO
> return can be added, but ideally without touching the other lines.
>
I am not the one to decide whether opening it is a good idea, but it
would be ironic if a patch intended to remove the only-RUNNING
restriction ended up allowing never-RUNNING sockets into a fanout
group.
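
For illustration, the condition quoted earlier in this thread
(po->ifindex == -1 || !po->num) can be sketched as a standalone check.
This is a userspace model under stated assumptions, not the patch
itself: struct fanout_candidate and fanout_join_check are hypothetical
names, and -ENXIO is used only because it is the return value mentioned
above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the two packet_sock fields the check reads. */
struct fanout_candidate {
	int ifindex;        /* -1: unbound, 0: any interface, >= 1: device */
	unsigned short num; /* protocol; 0 means it can never be RUNNING */
};

/* Returns 0 if the socket may join a fanout group, -ENXIO otherwise. */
static int fanout_join_check(const struct fanout_candidate *po)
{
	assert(po);
	if (po->ifindex == -1 || !po->num)
		return -ENXIO;
	return 0;
}
```

A socket that is merely not RUNNING (for example, link down) still
passes this check. Only a socket that is unbound or has no protocol is
refused.
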
> > > > I realized another possible problem. We should consider adding ifindex
> > > > field to struct packet_fanout to be used for lookup of an existing match.
> > > > There is little sense to bind sockets to different interfaces and then
> > > > put them in the same fanout group.
> > > > If you agree, I can prepare a separate patch for that.
> > > >
> > > > > The type and dev must match that of the fanout group, and once added
> > > > > to a fanout group can no longer be changed (bind will fail).
> > > > >
> > > > > I briefly considered the reason might be max_num_members accounting.
> > > > > Since f->num_members counts running sockets. But that is not used
> > > > > when tracking membership of the group, sk_ref is. Every packet socket
> > > > > whose po->rollover is increased increases this refcount.
> > > > >
> > > > > > What about using ifindex to detect bind? Initialize it to -1 in
> > > > > > packet_create and ensure that packet_do_bind, on success, sets it
> > > > > > to device id or 0?
> > > > > >
> > > > > > psock_fanout, should probably be extended with scenarios that test
> > > > > > "all devices" and all/specific protocols. Any specific scenario
> > > > > > suggestions?
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > >
> >
> >
>