Re: [PATCHv4 iproute2 2/2] lib/libnetlink: update rtnl_talk to support malloc buff at run time

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Phil Sutter <phil@nwl.cc>
To: Stephen Hemminger <stephen@networkplumber.org>
Cc: Michal Kubecek <mkubecek@suse.cz>, Hangbin Liu <haliu@redhat.com>,
	netdev@vger.kernel.org, Hangbin Liu <liuhangbin@gmail.com>
Subject: Re: [PATCHv4 iproute2 2/2] lib/libnetlink: update rtnl_talk to support malloc buff at run time
Date: Wed, 11 Oct 2017 13:10:07 +0200	[thread overview]
Message-ID: <20171011111007.GA11332@orbyte.nwl.cc> (raw)
In-Reply-To: <20171010094743.6ae2baa8@shemminger-XPS-13-9360>

On Tue, Oct 10, 2017 at 09:47:43AM -0700, Stephen Hemminger wrote:
> On Tue, 10 Oct 2017 08:41:17 +0200
> Michal Kubecek <mkubecek@suse.cz> wrote:
> 
> > On Mon, Oct 09, 2017 at 10:25:25PM +0200, Phil Sutter wrote:
> > > Hi Stephen,
> > > 
> > > On Mon, Oct 02, 2017 at 10:37:08AM -0700, Stephen Hemminger wrote:  
> > > > On Thu, 28 Sep 2017 21:33:46 +0800
> > > > Hangbin Liu <haliu@redhat.com> wrote:
> > > >   
> > > > > From: Hangbin Liu <liuhangbin@gmail.com>
> > > > > 
> > > > > This is an update for 460c03f3f3cc ("iplink: double the buffer size also in
> > > > > iplink_get()"). After update, we will not need to double the buffer size
> > > > > every time when VFs number increased.
> > > > > 
> > > > > With call like rtnl_talk(&rth, &req.n, NULL, 0), we can simply remove the
> > > > > length parameter.
> > > > > 
> > > > > With call like rtnl_talk(&rth, nlh, nlh, sizeof(req), I add a new variable
> > > > > answer to avoid overwrite data in nlh, because it may has more info after
> > > > > nlh. also this will avoid nlh buffer not enough issue.
> > > > > 
> > > > > We need to free answer after using.
> > > > > 
> > > > > Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> > > > > Signed-off-by: Phil Sutter <phil@nwl.cc>
> > > > > ---  
> > > > 
> > > > Most of the uses of rtnl_talk() don't need to this peek and dynamic sizing.
> > > > Can only those places that need that be targeted?  
> > > 
> > > We could probably do that, by having a buffer on stack in __rtnl_talk()
> > > which will be used instead of the allocated one if 'answer' is NULL. Or
> > > maybe even introduce a dedicated API call for the dynamically allocated
> > > receive buffer. But I really doubt that's feasible: AFAICT, that stack
> > > buffer still needs to be reasonably sized since the reply might be
> > > larger than the request (reusing the request buffer would be the most
> > > simple way to tackle this), also there is support for extack which may
> > > bloat the response to arbitrary size. Hangbin has shown in his benchmark
> > > that the overhead of the second syscall is negligible, so why care about
> > > that and increase code complexity even further?
> > > 
> > > Not saying it's not possible, but I just doubt it's worth the effort.  
> > 
> > Agreed. Current code is based on the assumption that we can estimate the
> > maximum reply length in advance and the reason for this series is that
> > this assumption turned out to be wrong. I'm afraid that if we replace
> > it by an assumption that we can estimate the maximum reply length for
> > most requests with only few exceptions, it's only matter of time for us
> > to be proven wrong again.
> > 
> > Michal Kubecek
> > 
> 
> For query responses, yes the response may be large. But for the common cases of
> add address or add route, the response should just be ack or error.

And with extack, error is comprised of the original request plus an
arbitrarily sized error message, so we can't just reuse the request
buffer and are back to "guessing" the right length again.

To get an idea of what we're talking about, I wrote a simple benchmark
which adds 256 * 254 (= 65024) addresses to an interface, then removes
them again one by one and measured the time that takes for binaries with
and without Hangbin's patches:

OP	Vanilla		Hangbin		Delta
--------------------------------------------------------
add	real 2m16.244s	real 2m27.964s	+11.72s	(108.6%)
	user 0m15.241s	user 0m17.295s	+2.054s	(113.5%)
	sys  1m40.229s	sys  1m48.239s	+8.01s	(108.0%)

remove	real 1m44.950s	real 1m47.044s	+2.094s	(102.0%)
	user 0m13.899s	user 0m14.723s	+0.824s (105.9%)
	sys  1m30.798s	sys  1m31.938s	+1.140s (101.3%)

So the overhead of the second syscall and dynamic memory allocation is
less than 10% overall. Given the short time a single call to 'ip'
typically takes, I don't think the difference is noticeable even in
highly performance critical applications.

Cheers, Phil

next prev parent reply	other threads:[~2017-10-11 11:10 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-28 13:33 [PATCHv4 iproute2 0/2] libnetlink: malloc correct buff at run time Hangbin Liu
2017-09-28 13:33 ` [PATCHv4 iproute2 1/2] lib/libnetlink: re malloc buff if size is not enough Hangbin Liu
2017-09-29 12:55   ` Michal Kubecek
2017-09-29 17:54   ` Stephen Hemminger
2017-09-29 18:20     ` Michal Kubecek
2017-09-30 13:54     ` Hangbin Liu
2017-09-28 13:33 ` [PATCHv4 iproute2 2/2] lib/libnetlink: update rtnl_talk to support malloc buff at run time Hangbin Liu
2017-09-29 13:06   ` Michal Kubecek
2017-10-02 17:37   ` Stephen Hemminger
2017-10-09 20:25     ` Phil Sutter
2017-10-10  6:41       ` Michal Kubecek
2017-10-10 16:47         ` Stephen Hemminger
2017-10-11 10:40           ` Hangbin Liu
2017-10-11 11:10           ` Phil Sutter [this message]
2017-10-11 14:35             ` David Ahern
2017-10-12 16:07             ` Stephen Hemminger
2017-10-13 10:31               ` Phil Sutter
2017-10-25 10:45   ` Stephen Hemminger
2017-10-25 13:16     ` Hangbin Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171011111007.GA11332@orbyte.nwl.cc \
    --to=phil@nwl.cc \
    --cc=haliu@redhat.com \
    --cc=liuhangbin@gmail.com \
    --cc=mkubecek@suse.cz \
    --cc=netdev@vger.kernel.org \
    --cc=stephen@networkplumber.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).