From: Jiri Pirko <jiri@resnulli.us>
To: David Wei <dw@davidwei.uk>
Cc: Jakub Kicinski <kuba@kernel.org>,
Sabrina Dubroca <sd@queasysnail.net>,
netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>
Subject: Re: [PATCH net-next v3 2/4] netdevsim: forward skbs from one connected port to another
Date: Sun, 17 Dec 2023 12:36:46 +0100 [thread overview]
Message-ID: <ZX7dTtwmMHFVMhNv@nanopsycho> (raw)
In-Reply-To: <3ca24281-5ec7-4102-9e01-6e6f826a3a8c@davidwei.uk>
Sun, Dec 17, 2023 at 03:59:53AM CET, dw@davidwei.uk wrote:
>On 2023-12-16 01:22, Jiri Pirko wrote:
>> Fri, Dec 15, 2023 at 07:31:42PM CET, dw@davidwei.uk wrote:
>>> On 2023-12-15 02:45, Jiri Pirko wrote:
>>>> Thu, Dec 14, 2023 at 10:24:41PM CET, dw@davidwei.uk wrote:
>>>>> Forward skbs sent from one netdevsim port to its connected netdevsim
>>>>> port using dev_forward_skb, in a spirit similar to veth.
>>>>
>>>> Perhaps better to write "dev_forward_skb()" to make obvious you talk
>>>> about function.
>>>
>>> Sorry, it's a bad habit at this point :)
>>>
>>>>
>>>>
>>>>>
>>>>> Signed-off-by: David Wei <dw@davidwei.uk>
>>>>> ---
>>>>> drivers/net/netdevsim/netdev.c | 23 ++++++++++++++++++-----
>>>>> 1 file changed, 18 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
>>>>> index e290c54b0e70..c5f53b1dbdcc 100644
>>>>> --- a/drivers/net/netdevsim/netdev.c
>>>>> +++ b/drivers/net/netdevsim/netdev.c
>>>>> @@ -29,19 +29,33 @@
>>>>> static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>>>> {
>>>>> struct netdevsim *ns = netdev_priv(dev);
>>>>> + struct netdevsim *peer_ns;
>>>>> + int ret = NETDEV_TX_OK;
>>>>>
>>>>> + rcu_read_lock();
>>>>
>>>> Why do you need to be in rcu read locked section here?
>>>
>>> So the RCU protected pointer `peer` does not change during the critical
>>> section. Veth does something similar in its xmit() for its peer.
>>
>> RCU does not work like this. Please check out the documentation.
>
>When destroying a netdevsim in nsim_destroy(), rtnl_lock is held which prevents
>concurrent destruction of netdevsims. Then, unregister_netdevice() will
>eventually call synchronize_rcu_expedited().
>
>Let's say we have two netdevsims, A linked with B, where A->peer is B and
>B->peer is A.
>
>If we're destroying B in nsim_destroy(), then we first do
>rcu_assign_pointer(A->peer, NULL). Of course, any read-side critical sections
>that dereferenced a non-NULL A->peer won't be affected by this update.
>
>Then B's nsim_destroy() calls unregister_netdevice(), followed eventually by
>synchronize_rcu_expedited(). As I understand RCU, this will wait for one RCU
>grace period, or any nsim_start_xmit() that started _before_ B's
>rcu_assign_pointer(A->peer, NULL) to complete.
>
>RCU docs say that the caller of synchronize_rcu() upon return may be again
>concurrent w/ another nsim_start_xmit() reader. But they should see NULL for
>A->peer ptr due to the rcu_assign_pointer(A->peer, NULL) update during B's
>nsim_destroy(). So after synchronize_rcu() no skb from A should be forwarded to
>B anymore.
>
>In fact, it looks like since v5.0 being in a softirq handler serves as an RCU
>read-side critical section. So the rcu_read_lock() here in nsim_start_xmit() is
>actually redundant.
>
>I believe this is veth's intention too. There is a comment in veth_dellink()
>that says the pair of peer devices are guaranteed to be not freed before one
>RCU grace period.
>
>As long as adding/removing links is also under rtnl_lock, I think with RCU
>guarantees discussed above we will be SMP safe. Does this seem right to you?
>
>>
>>
>>>
>>>>
>>>>
>>>>> if (!nsim_ipsec_tx(ns, skb))
>>>>> - goto out;
>>>>> + goto err;
>>>>
>>>> Not sure why you need to rename the label. Why "out" is not okay?
>>>>
>>>>>
>>>>> u64_stats_update_begin(&ns->syncp);
>>>>> ns->tx_packets++;
>>>>> ns->tx_bytes += skb->len;
>>>>> u64_stats_update_end(&ns->syncp);
>>>>>
>>>>> -out:
>>>>> - dev_kfree_skb(skb);
My point is, why don't take rcu_read_lock() here.
>>>>> + peer_ns = rcu_dereference(ns->peer);
>>>>> + if (!peer_ns)
>>>>> + goto err;
>>>>
>>>> This is definitelly not an error path, "err" label name is misleading.
>>>
>>> That's fair, I can change it back. Lots has changed since my original
>>> intentions.
>>>
>>>>
>>>>
>>>>> +
>>>>> + skb_tx_timestamp(skb);
>>>>> + if (unlikely(dev_forward_skb(peer_ns->netdev, skb) == NET_RX_DROP))
>>>>> + ret = NET_XMIT_DROP;
>>>>
>>>> Hmm, can't you track dropped packets in ns->tx_dropped and expose in
>>>> nsim_get_stats64() ?
>>>
>>> I can add this.
>>>
>>>>
>>>>
>>>>>
>>>>> - return NETDEV_TX_OK;
>>>>> + rcu_read_unlock();
>>>>> + return ret;
>>>>> +
>>>>> +err:
>>>>> + rcu_read_unlock();
>>>>> + dev_kfree_skb(skb);
>>>>> + return ret;
>>>>> }
>>>>>
>>>>> static void nsim_set_rx_mode(struct net_device *dev)
>>>>> @@ -302,7 +316,6 @@ static void nsim_setup(struct net_device *dev)
>>>>> eth_hw_addr_random(dev);
>>>>>
>>>>> dev->tx_queue_len = 0;
>>>>> - dev->flags |= IFF_NOARP;
>>>>> dev->flags &= ~IFF_MULTICAST;
>>>>> dev->priv_flags |= IFF_LIVE_ADDR_CHANGE |
>>>>> IFF_NO_QUEUE;
>>>>> --
>>>>> 2.39.3
>>>>>
next prev parent reply other threads:[~2023-12-17 11:36 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-14 21:24 [PATCH net-next v3 0/4] netdevsim: link and forward skbs between ports David Wei
2023-12-14 21:24 ` [PATCH net-next v3 1/4] netdevsim: allow two netdevsim ports to be connected David Wei
2023-12-15 11:11 ` Jiri Pirko
2023-12-15 19:13 ` David Wei
2023-12-16 9:21 ` Jiri Pirko
2023-12-16 20:52 ` David Wei
2023-12-14 21:24 ` [PATCH net-next v3 2/4] netdevsim: forward skbs from one connected port to another David Wei
2023-12-15 10:45 ` Jiri Pirko
2023-12-15 18:31 ` David Wei
2023-12-16 9:22 ` Jiri Pirko
2023-12-17 2:59 ` David Wei
2023-12-17 11:36 ` Jiri Pirko [this message]
2023-12-14 21:24 ` [PATCH net-next v3 3/4] netdevsim: add selftest for forwarding skb between connected ports David Wei
2023-12-14 21:24 ` [PATCH net-next v3 4/4] netdevsim: add Makefile for selftests David Wei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZX7dTtwmMHFVMhNv@nanopsycho \
--to=jiri@resnulli.us \
--cc=davem@davemloft.net \
--cc=dw@davidwei.uk \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sd@queasysnail.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.