netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Saeed Mahameed <saeedm@mellanox.com>,
	"davem@davemloft.net" <davem@davemloft.net>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [RFC net-next 1/2] net: net-porcfs: Reduce rcu lock critical section
Date: Wed, 11 Apr 2018 15:30:10 -0700	[thread overview]
Message-ID: <f1b3bc40-cfdf-5d89-9cfc-cf6996f99d8c@gmail.com> (raw)
In-Reply-To: <1523473143.3402.55.camel@mellanox.com>



On 04/11/2018 11:59 AM, Saeed Mahameed wrote:
> On Tue, 2018-04-10 at 13:35 -0700, Eric Dumazet wrote:
>>
>> On 04/10/2018 10:16 AM, David Miller wrote:
>>>
>>> The tradeoff here is that now you are doing two unnecessary atomic
>>> operations per stats dump.
>>>
>>> That is what the RCU lock allows us to avoid.
>>>
>>
>> dev_hold() and dev_put() are actually per cpu increment and
>> decrements,
>> pretty cheap these days.
>>
> 
> Yes, i am adding only 2 cpu instructions here.
> I think the trade-off here is too small and the price to finally have
> get_stats64 called from non atomic context is really worth it.

Oh... but you have not mentioned this reason in your changelog.

What about bonding stats ?

By sending partial patches like that, others have to take care of the details
and this is not really acceptable.

> 
> It  looks really odd to me that the device chain locks are held for
> such long periods, while we already have the means to avoid this, same
> goes for rtnl_lock, same trick can work here for many use cases and
> many ndos, we are just over protective for no reasons.
> 
> 
>> Problem here is that any preemption of the process holding device
>> reference
>> might trigger warnings in device unregister.
>>
> 
> This is true for any other place dev_hold is used,
> as explained in the commit message dev_hold is used for a very brief
> moment before calling the stats ndo and released directly after.

Not really.

Other places usually have notifiers to remove the refcount when needed.

We worked quite hard in the past to remove extra dev_hold()
(before we finally converted it to percpu counter)

> 
> looking at 
> 
> netdev_wait_allrefs(...)
> [...]
> 	msleep(250);
> 
> 	refcnt = netdev_refcnt_read(dev);
> 
> 	if (time_after(jiffies, warning_time + 10 * HZ)) {
> 		pr_emerg("unregister_netdevice: waiting for %s to
> become free. Usage count = %d\n",
> 			 dev->name, refcnt);
> 		warning_time = jiffies;
> 	}
> 
> The holder will get enough time to release the netdev way before the
> warning is triggered.
> 
> The warning is triggered only if someone holds the dev for more than 10
> seconds which is impossible for the stats ndo to take more than this,
> in fact i just did a quick measurement and it seems that in average
> get_stats64 ndo takes 0.5us !

Average is nice, but what about max time ?

Sleeping more than 10 seconds to satify GFP_KERNEL memory allocation can definitely
happen in the real world, or simply if you get preempted by some RT/high prio tasks.

Just say no to 'might sleep in ndo_get_stats()', and you will save a lot of issues.

  reply	other threads:[~2018-04-11 22:30 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-10 17:08 [RFC net-next 1/2] net: net-porcfs: Reduce rcu lock critical section Saeed Mahameed
2018-04-10 17:08 ` [RFC net-next 2/2] net: net-sysfs: Reduce netstat_show read_lock " Saeed Mahameed
2018-04-10 17:17   ` David Miller
2018-04-10 17:16 ` [RFC net-next 1/2] net: net-porcfs: Reduce rcu lock " David Miller
2018-04-10 20:35   ` Eric Dumazet
2018-04-11 18:59     ` Saeed Mahameed
2018-04-11 22:30       ` Eric Dumazet [this message]
2018-04-11 23:47         ` Saeed Mahameed
2018-04-12  2:59           ` Eric Dumazet
2018-04-12 19:12             ` Saeed Mahameed
2018-04-16 20:50               ` Saeed Mahameed
2018-04-16 21:07                 ` Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f1b3bc40-cfdf-5d89-9cfc-cf6996f99d8c@gmail.com \
    --to=eric.dumazet@gmail.com \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).