From: Jiri Pirko <jiri@resnulli.us>
To: David Wei <dw@davidwei.uk>
Cc: Jakub Kicinski <kuba@kernel.org>,
Sabrina Dubroca <sd@queasysnail.net>,
netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>
Subject: Re: [PATCH net-next v3 2/4] netdevsim: forward skbs from one connected port to another
Date: Sun, 17 Dec 2023 12:36:46 +0100 [thread overview]
Message-ID: <ZX7dTtwmMHFVMhNv@nanopsycho> (raw)
In-Reply-To: <3ca24281-5ec7-4102-9e01-6e6f826a3a8c@davidwei.uk>
Sun, Dec 17, 2023 at 03:59:53AM CET, dw@davidwei.uk wrote:
>On 2023-12-16 01:22, Jiri Pirko wrote:
>> Fri, Dec 15, 2023 at 07:31:42PM CET, dw@davidwei.uk wrote:
>>> On 2023-12-15 02:45, Jiri Pirko wrote:
>>>> Thu, Dec 14, 2023 at 10:24:41PM CET, dw@davidwei.uk wrote:
>>>>> Forward skbs sent from one netdevsim port to its connected netdevsim
>>>>> port using dev_forward_skb, in a spirit similar to veth.
>>>>
>>>> Perhaps better to write "dev_forward_skb()" to make obvious you talk
>>>> about function.
>>>
>>> Sorry, it's a bad habit at this point :)
>>>
>>>>
>>>>
>>>>>
>>>>> Signed-off-by: David Wei <dw@davidwei.uk>
>>>>> ---
>>>>> drivers/net/netdevsim/netdev.c | 23 ++++++++++++++++++-----
>>>>> 1 file changed, 18 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
>>>>> index e290c54b0e70..c5f53b1dbdcc 100644
>>>>> --- a/drivers/net/netdevsim/netdev.c
>>>>> +++ b/drivers/net/netdevsim/netdev.c
>>>>> @@ -29,19 +29,33 @@
>>>>> static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>>>> {
>>>>> struct netdevsim *ns = netdev_priv(dev);
>>>>> + struct netdevsim *peer_ns;
>>>>> + int ret = NETDEV_TX_OK;
>>>>>
>>>>> + rcu_read_lock();
>>>>
>>>> Why do you need to be in rcu read locked section here?
>>>
>>> So the RCU protected pointer `peer` does not change during the critical
>>> section. Veth does something similar in its xmit() for its peer.
>>
>> RCU does not work like this. Please check out the documentation.
>
>When destroying a netdevsim in nsim_destroy(), rtnl_lock is held which prevents
>concurrent destruction of netdevsims. Then, unregister_netdevice() will
>eventually call synchronize_rcu_expedited().
>
>Let's say we have two netdevsims, A linked with B, where A->peer is B and
>B->peer is A.
>
>If we're destroying B in nsim_destroy(), then we first do
>rcu_assign_pointer(A->peer, NULL). Of course, any read-side critical sections
>that dereferenced a non-NULL A->peer won't be affected by this update.
>
>Then B's nsim_destroy() calls unregister_netdevice(), followed eventually by
>synchronize_rcu_expedited(). As I understand RCU, this will wait for one RCU
>grace period, or any nsim_start_xmit() that started _before_ B's
>rcu_assign_pointer(A->peer, NULL) to complete.
>
>RCU docs say that the caller of synchronize_rcu() upon return may be again
>concurrent w/ another nsim_start_xmit() reader. But they should see NULL for
>A->peer ptr due to the rcu_assign_pointer(A->peer, NULL) update during B's
>nsim_destroy(). So after synchronize_rcu() no skb from A should be forwarded to
>B anymore.
>
>In fact, it looks like since v5.0 being in a softirq handler serves as an RCU
>read-side critical section. So the rcu_read_lock() here in nsim_start_xmit() is
>actually redundant.
>
>I believe this is veth's intention too. There is a comment in veth_dellink()
>that says the pair of peer devices are guaranteed to be not freed before one
>RCU grace period.
>
>As long as adding/removing links is also under rtnl_lock, I think with RCU
>guarantees discussed above we will be SMP safe. Does this seem right to you?
>
>>
>>
>>>
>>>>
>>>>
>>>>> if (!nsim_ipsec_tx(ns, skb))
>>>>> - goto out;
>>>>> + goto err;
>>>>
>>>> Not sure why you need to rename the label. Why "out" is not okay?
>>>>
>>>>>
>>>>> u64_stats_update_begin(&ns->syncp);
>>>>> ns->tx_packets++;
>>>>> ns->tx_bytes += skb->len;
>>>>> u64_stats_update_end(&ns->syncp);
>>>>>
>>>>> -out:
>>>>> - dev_kfree_skb(skb);
My point is, why don't take rcu_read_lock() here.
>>>>> + peer_ns = rcu_dereference(ns->peer);
>>>>> + if (!peer_ns)
>>>>> + goto err;
>>>>
>>>> This is definitelly not an error path, "err" label name is misleading.
>>>
>>> That's fair, I can change it back. Lots has changed since my original
>>> intentions.
>>>
>>>>
>>>>
>>>>> +
>>>>> + skb_tx_timestamp(skb);
>>>>> + if (unlikely(dev_forward_skb(peer_ns->netdev, skb) == NET_RX_DROP))
>>>>> + ret = NET_XMIT_DROP;
>>>>
>>>> Hmm, can't you track dropped packets in ns->tx_dropped and expose in
>>>> nsim_get_stats64() ?
>>>
>>> I can add this.
>>>
>>>>
>>>>
>>>>>
>>>>> - return NETDEV_TX_OK;
>>>>> + rcu_read_unlock();
>>>>> + return ret;
>>>>> +
>>>>> +err:
>>>>> + rcu_read_unlock();
>>>>> + dev_kfree_skb(skb);
>>>>> + return ret;
>>>>> }
>>>>>
>>>>> static void nsim_set_rx_mode(struct net_device *dev)
>>>>> @@ -302,7 +316,6 @@ static void nsim_setup(struct net_device *dev)
>>>>> eth_hw_addr_random(dev);
>>>>>
>>>>> dev->tx_queue_len = 0;
>>>>> - dev->flags |= IFF_NOARP;
>>>>> dev->flags &= ~IFF_MULTICAST;
>>>>> dev->priv_flags |= IFF_LIVE_ADDR_CHANGE |
>>>>> IFF_NO_QUEUE;
>>>>> --
>>>>> 2.39.3
>>>>>
next prev parent reply other threads:[~2023-12-17 11:36 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-14 21:24 [PATCH net-next v3 0/4] netdevsim: link and forward skbs between ports David Wei
2023-12-14 21:24 ` [PATCH net-next v3 1/4] netdevsim: allow two netdevsim ports to be connected David Wei
2023-12-15 11:11 ` Jiri Pirko
2023-12-15 19:13 ` David Wei
2023-12-16 9:21 ` Jiri Pirko
2023-12-16 20:52 ` David Wei
2023-12-14 21:24 ` [PATCH net-next v3 2/4] netdevsim: forward skbs from one connected port to another David Wei
2023-12-15 10:45 ` Jiri Pirko
2023-12-15 18:31 ` David Wei
2023-12-16 9:22 ` Jiri Pirko
2023-12-17 2:59 ` David Wei
2023-12-17 11:36 ` Jiri Pirko [this message]
2023-12-14 21:24 ` [PATCH net-next v3 3/4] netdevsim: add selftest for forwarding skb between connected ports David Wei
2023-12-14 21:24 ` [PATCH net-next v3 4/4] netdevsim: add Makefile for selftests David Wei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZX7dTtwmMHFVMhNv@nanopsycho \
--to=jiri@resnulli.us \
--cc=davem@davemloft.net \
--cc=dw@davidwei.uk \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sd@queasysnail.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).