From: Thomas Karlsson <thomas.karlsson@paneda.se>
To: Jakub Kicinski <kuba@kernel.org>
Cc: "davem@davemloft.net" <davem@davemloft.net>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
<jiri@resnulli.us>, <kaber@trash.net>, <edumazet@google.com>,
<vyasevic@redhat.com>, <alexander.duyck@gmail.com>
Subject: Re: [PATCH net-next v3] macvlan: Support for high multicast packet rate
Date: Wed, 2 Dec 2020 12:28:47 +0100 [thread overview]
Message-ID: <333d17ee-b01c-3286-bc7c-30d100b223ae@paneda.se> (raw)
In-Reply-To: <20201201111143.2a82d744@kicinski-fedora-pc1c0hjn.DHCP.thefacebook.com>
On 2020-12-01 20:11, Jakub Kicinski wrote:
> On Mon, 30 Nov 2020 15:00:43 +0100 Thomas Karlsson wrote:
>> Background:
>> Broadcast and multicast packages are enqueued for later processing.
>> This queue was previously hardcoded to 1000.
>>
>> This proved insufficient for handling very high packet rates.
>> This resulted in packet drops for multicast.
>> While at the same time unicast worked fine.
>>
>> The change:
>> This patch make the queue length adjustable to accommodate
>> for environments with very high multicast packet rate.
>> But still keeps the default value of 1000 unless specified.
>>
>> The queue length is specified as a request per macvlan
>> using the IFLA_MACVLAN_BC_QUEUE_LEN parameter.
>>
>> The actual used queue length will then be the maximum of
>> any macvlan connected to the same port. The actual used
>> queue length for the port can be retrieved (read only)
>> by the IFLA_MACVLAN_BC_QUEUE_LEN_USED parameter for verification.
>>
>> This will be followed up by a patch to iproute2
>> in order to adjust the parameter from userspace.
>>
>> Signed-off-by: Thomas Karlsson <thomas.karlsson@paneda.se>
>
> Looks good! Minor nits below:
:)
>
>> @@ -1218,6 +1220,7 @@ static int macvlan_port_create(struct net_device *dev)
>> for (i = 0; i < MACVLAN_HASH_SIZE; i++)
>> INIT_HLIST_HEAD(&port->vlan_source_hash[i]);
>>
>> + port->bc_queue_len_used = MACVLAN_DEFAULT_BC_QUEUE_LEN;
>
> Should this be inited to 0? Otherwise if the first link asks for lower
> queue len than the default it will not get set, right?
Indeed, looks you are right, see also below
>> skb_queue_head_init(&port->bc_queue);
>> INIT_WORK(&port->bc_work, macvlan_process_broadcast);
>>
>> @@ -1486,6 +1489,12 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
>> goto destroy_macvlan_port;
>> }
>>
>> + vlan->bc_queue_len_requested = MACVLAN_DEFAULT_BC_QUEUE_LEN;
>> + if (data && data[IFLA_MACVLAN_BC_QUEUE_LEN])
>> + vlan->bc_queue_len_requested = nla_get_u32(data[IFLA_MACVLAN_BC_QUEUE_LEN]);
>> + if (vlan->bc_queue_len_requested > port->bc_queue_len_used)
>> + port->bc_queue_len_used = vlan->bc_queue_len_requested;
>
> Or perhaps we should just call update_port_bc_queue_len() here?
That would also have prevented the above bug... So yes, I think that is better
to keep the logic only in one place. I'll change to that.
>> err = register_netdevice(dev);
>> if (err < 0)
>> goto destroy_macvlan_port;
>
>> @@ -1658,6 +1684,8 @@ static const struct nla_policy macvlan_policy[IFLA_MACVLAN_MAX + 1] = {
>> [IFLA_MACVLAN_MACADDR] = { .type = NLA_BINARY, .len = MAX_ADDR_LEN },
>> [IFLA_MACVLAN_MACADDR_DATA] = { .type = NLA_NESTED },
>> [IFLA_MACVLAN_MACADDR_COUNT] = { .type = NLA_U32 },
>> + [IFLA_MACVLAN_BC_QUEUE_LEN] = { .type = NLA_U32 },
>> + [IFLA_MACVLAN_BC_QUEUE_LEN_USED] = { .type = NLA_U32 },
>
> This is an input policy, so you can set type to NLA_REJECT and you
> won't have to check if it's set on input.
>
Great!
>> };
>>
>> int macvlan_link_register(struct rtnl_link_ops *ops)
>> @@ -1688,6 +1716,18 @@ static struct rtnl_link_ops macvlan_link_ops = {
>> .priv_size = sizeof(struct macvlan_dev),
>> };
>>
>> +static void update_port_bc_queue_len(struct macvlan_port *port)
>> +{
>> + struct macvlan_dev *vlan;
>> + u32 max_bc_queue_len_requested = 0;
>
> Please reorder so that the vars are longest line to shortest.
>
got it
>> + list_for_each_entry_rcu(vlan, &port->vlans, list) {
>
> I don't think you need the _rcu() flavor here, this is always called
> from the configuration paths holding RTNL lock, right?
>
To be honest, what to use/not to use when traversing the list was what caused me the most
doubt/trouble of the patch :)
I sort of assumed that there must be some outer synchronisation that prevented
two or more concurrent calls to new/delte/change link. but wasn't sure how
and where that synchonisation took place. Now that I have googled RTLN lock I understand
that part much better.
The main reason I went with _rcu was because the existing code is using list_del_rcu and
list_add_tail_rcu when modifying the list as well as _rcu when accessing/traversing (in some places).
So I figured if they needed the _rcu variants I too would need that.
But from a closer inspection I think in that situation it is only needed because the list is accessed
from for example macvlan_handle_frame (obviously not protected by the RTLN lock) using _rcu version
and under the rcu_read_lock as protection. So then it must also be updated with _rcu
in all places of course. Even if all the updates are done under the RTNL lock.
This was a long ramble :)
But thanks, I think I understand the synchronisation mechanism in the kernel a bit better now!
As I'm only calling my function from the netlink configuration functions under RTLN lock
It should be safe to drop the _rcu version as you say, because the list is only
modified in those functions too. Great!
>> + if (vlan->bc_queue_len_requested > max_bc_queue_len_requested)
>> + max_bc_queue_len_requested = vlan->bc_queue_len_requested;
>> + }
>> + port->bc_queue_len_used = max_bc_queue_len_requested;
>> +}
>> +
>> static int macvlan_device_event(struct notifier_block *unused,
>> unsigned long event, void *ptr)
>> {
I also noticed I got a few line length warnings in patchworks but none when I ran the ./scrips/checkpatch.pl
locally. So is the net tree using strict 80 chars? I would prefer not to introduce extra line breaks
on those lines as I think it will hurt readability but of course I will if needed.
I will publish a v4 later today.
/Thomas
next prev parent reply other threads:[~2020-12-02 11:29 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <485531aec7e243659ee4e3bb7fa2186d@paneda.se>
2020-11-23 14:22 ` Hardcoded multicast queue length in macvlan.c driver causes poor multicast receive performance Thomas Karlsson
2020-11-23 22:30 ` Jakub Kicinski
2020-11-25 12:51 ` Thomas Karlsson
2020-11-25 16:57 ` [PATCH] macvlan: Support for high multicast packet rate Thomas Karlsson
2020-11-25 21:55 ` [PATCH net-next v2] " Thomas Karlsson
2020-11-25 16:58 ` Hardcoded multicast queue length in macvlan.c driver causes poor multicast receive performance Jakub Kicinski
2020-11-25 17:12 ` Thomas Karlsson
2020-11-25 18:07 ` Jakub Kicinski
2020-11-25 22:15 ` Thomas Karlsson
2020-11-25 23:01 ` Jakub Kicinski
2020-11-26 20:00 ` Thomas Karlsson
2020-11-27 17:27 ` Jakub Kicinski
2020-11-27 23:13 ` Thomas Karlsson
2020-11-30 14:00 ` [PATCH net-next v3] macvlan: Support for high multicast packet rate Thomas Karlsson
2020-12-01 19:11 ` Jakub Kicinski
2020-12-02 11:28 ` Thomas Karlsson [this message]
2020-12-02 16:44 ` Jakub Kicinski
2020-11-30 14:23 ` [PATCH iproute2-next v1] iplink macvlan: Added bcqueuelen parameter Thomas Karlsson
2020-12-10 16:07 ` Thomas Karlsson
2020-12-11 0:36 ` Jakub Kicinski
2020-12-02 18:49 ` [PATCH net-next v4] macvlan: Support for high multicast packet rate Thomas Karlsson
2020-12-03 16:20 ` Jakub Kicinski
2020-12-03 16:53 ` Thomas Karlsson
2020-12-14 10:42 ` [PATCH iproute2-next v2] iplink:macvlan: Added bcqueuelen parameter Thomas Karlsson
2020-12-14 17:03 ` David Ahern
2020-12-14 17:50 ` Thomas Karlsson
2020-12-16 4:07 ` David Ahern
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=333d17ee-b01c-3286-bc7c-30d100b223ae@paneda.se \
--to=thomas.karlsson@paneda.se \
--cc=alexander.duyck@gmail.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=jiri@resnulli.us \
--cc=kaber@trash.net \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=vyasevic@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).