All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lukasz Skalski <l.skalski@samsung.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Arnd Bergmann <arnd@arndb.de>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>,
	Tom Gundersen <teg@jklm.no>, Jiri Kosina <jkosina@suse.cz>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Daniel Mack <daniel@zonque.org>,
	David Herrmann <dh.herrmann@gmail.com>,
	Djalal Harouni <tixxdz@opendz.org>
Subject: Re: D-bus is three orders of magnitude too slow (Re: [GIT PULL] kdbus for 4.1-rc1)
Date: Tue, 28 Apr 2015 12:25:32 +0200	[thread overview]
Message-ID: <553F601C.6030309@samsung.com> (raw)
In-Reply-To: <CALCETrWLTLqZ0pioOEHakd_S+h=F1X2qXpODmEZ73JcFZNxYig@mail.gmail.com>

On 04/27/2015 10:08 PM, Andy Lutomirski wrote:
> On Fri, Apr 24, 2015 at 6:50 AM, Lukasz Skalski <l.skalski@samsung.com> wrote:
>> Hi All,
>>
>> On 04/23/2015 07:16 PM, Greg Kroah-Hartman wrote:
>>> On Thu, Apr 23, 2015 at 09:46:22AM -0700, Andy Lutomirski wrote:
>>>>  - There's still an open performance question.  Namely: is kdbus performant?
>>>
>>> Yes, I thought that was already answered.  Tizen posted some numbers
>>> with a much older version of the code, before David fixed a bunch of
>>> issues that he and you found, and that averaged between 25-50% faster.
>>> Details are in this presentation:
>>>       http://download.tizen.org/misc/media/conference2014/slides/tdc2014-kdbus-in-tizen3.pdf
>>>
>>> The Tizen and GENIVI developers are off running numbers with the latest
>>> code, or so they told me through emails, but I don't know when/if that
>>> will ever happen, so I can't promise more than what is already here.
>>>
>>
>> I'm working on kdbus support for GLib ([1],[2]). I saw some questions
>> about kdbus performance, so I've prepared simple benchmark. Because
>> David already has posted some comparison results between kdbus and UDS,
>> I've decided to use my GLib port with native kdbus support (it should
>> be noted, that this port is not finished yet and there are still some
>> places for improvements, thus please do not treat these test results as
>> final).
>>
>> To perform tests I've created two simple apps:
>>
>> - server: http://fpaste.org/215157/
>> - client: http://fpaste.org/215156/
> 
> After some fiddling (what's this G_BUS_TYPE_USER thing?
> G_BUS_TYPE_SESSION seems to work), I got this to run.
> 

As it was discussed some time ago with GLib developers, we have a plan
to create two new bus types called "user" (G_BUS_TYPE_USER) and
"machine" (G_BUS_TYPE_MACHINE). These ones try to connect to kdbus,
falling back to the old "session" and "system" buses (respectively) in
case of failure. The "session" (G_BUS_TYPE_SESSION) and "system"
(G_BUS_TYPE_SYSTEM) bus types always connect to the dbus socket, as
today.

> Can we please take a big step back from the kernel-vs-userspace debate
> here?  Can we try to understand why it's so bloody slow before
> thinking about merging something that might ossify it?
> 
> With a slightly improved test (it shows how many calls happened) and
> userspace dbus, I got:
> 
> $ taskset -c 0 dbus-launch bash
> $ perf stat ./test.sh
> HANDLER: bus_acquired_handler
> HANDLER: name_acquired_handler
> 20000 calls in 4.978810 s = 0.248940 ms per call
> 
>  Performance counter stats for './test.sh':
> 
>        3866.770672 task-clock (msec)         #    0.738 CPUs utilized
>            300,633 context-switches          #    0.078 M/sec
>                  3 cpu-migrations            #    0.001 K/sec
>                817 page-faults               #    0.211 K/sec
>     13,572,244,134 cycles                    #    3.510 GHz
>          [83.26%]
>      8,799,771,026 stalled-cycles-frontend   #   64.84% frontend
> cycles idle    [82.75%]
>      6,914,976,524 stalled-cycles-backend    #   50.95% backend
> cycles idle    [68.92%]
>      9,915,362,980 instructions              #    0.73  insns per cycle
>                                              #    0.89  stalled cycles
> per insn [84.82%]
>      2,343,233,242 branches                  #  605.992 M/sec
>          [82.25%]
>         44,699,493 branch-misses             #    1.91% of all
> branches         [82.83%]
> 
>        5.238823116 seconds time elapsed
> 
> 
> That's more than 15 context switches per call.  Something is severely
> broken.  There should be almost exactly four context switches per
> call.
> 
> NB: subtract 250ms from the elapsed time.  test.sh contains a sleep
> 0.25 to work around a stupid race.
> 
> This performance is absolutely terrible.  Kdbus should be 100-1000x
> faster, not 2x faster, so kdbus' performance is still absolutely
> terrible.
> 
> Sure, we can crank the message size so high that the only thing that
> matters is the per-byte overhead, in which case anything that uses
> memfd will win, but dbus-daemon could pretty easily do that to.
> 
> Anyway, here's part of the problem.  The client does this for each
> message (I added the nanosleep for this test so I could see what was
> going on):
> 
> [pid 29592] eventfd2(0, O_NONBLOCK|O_CLOEXEC) = 6
> [pid 29592] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
> [pid 29592] write(5, "\1\0\0\0\0\0\0\0", 8) = 8
> [pid 29592] poll([{fd=6, events=POLLIN}], 1, 25000) = 1 ([{fd=6,
> revents=POLLIN}])
> [pid 29592] poll([{fd=6, events=POLLIN}], 1, 25000) = 1 ([{fd=6,
> revents=POLLIN}])
> [pid 29592] read(6, "\1\0\0\0\0\0\0\0", 16) = 8
> [pid 29592] poll([{fd=6, events=POLLIN}], 1, 25000 <unfinished ...>
> [pid 29593] <... poll resumed> )        = 1 ([{fd=5, revents=POLLIN}])
> [pid 29593] read(5, "\1\0\0\0\0\0\0\0", 16) = 8
> [pid 29593] sendmsg(4, {msg_name(0)=NULL,
> msg_iov(1)=[{"l\1\0\1\5\4\0\0=\4\0\0e\0\0\0\10\1g\0\1s\0\0\1\1o\0\r\0\0\0"...,
> 1149}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 1149
> [pid 29593] poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}], 2,
> 4294967295) = 1 ([{fd=4, revents=POLLIN}])
> [pid 29593] read(5, 0x7f9115edfcf0, 16) = -1 EAGAIN (Resource
> temporarily unavailable)
> [pid 29593] write(5, "\1\0\0\0\0\0\0\0", 8) = 8
> [pid 29593] recvmsg(4, {msg_name(0)=NULL,
> msg_iov(1)=[{"l\2\1\1\7\0\0\0>\4\0\0-\0\0\0", 16}], msg_controllen=0,
> msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 16
> [pid 29593] poll([{fd=4, events=POLLIN}], 1, 0) = 1 ([{fd=4, revents=POLLIN}])
> [pid 29593] recvmsg(4, {msg_name(0)=NULL,
> msg_iov(1)=[{"\10\1g\0\1s\0\0\5\1u\0=\4\0\0\6\1s\0\4\0\0\0:1.1\0\0\0\0"...,
> 55}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC)
> = 55
> [pid 29593] write(5, "\1\0\0\0\0\0\0\0", 8) = 8
> [pid 29593] write(5, "\1\0\0\0\0\0\0\0", 8) = 8
> [pid 29593] poll([{fd=5, events=POLLIN}], 1, 0) = 1 ([{fd=5, revents=POLLIN}])
> [pid 29593] write(6, "\1\0\0\0\0\0\0\0", 8 <unfinished ...>
> [pid 29592] <... poll resumed> )        = 1 ([{fd=6, revents=POLLIN}])
> [pid 29592] futex(0x1f5c130, FUTEX_WAIT, 2, NULL <unfinished ...>
> [pid 29593] <... write resumed> )       = 8
> [pid 29593] futex(0x1f5c130, FUTEX_WAKE, 1 <unfinished ...>
> [pid 29592] <... futex resumed> )       = 0
> [pid 29592] read(6, "\1\0\0\0\0\0\0\0", 16) = 8
> [pid 29592] futex(0x1f5c130, FUTEX_WAKE, 1) = 0
> [pid 29592] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
> [pid 29592] futex(0x1f5c140, FUTEX_WAKE, 2147483647) = 0
> [pid 29592] nanosleep({0, 100000},  <unfinished ...>
> [pid 29593] <... futex resumed> )       = 1
> [pid 29593] close(6)                    = 0
> [pid 29593] poll([{fd=4, events=POLLIN}], 1, 0) = 0 (Timeout)
> [pid 29593] write(5, "\1\0\0\0\0\0\0\0", 8) = 8
> [pid 29593] poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}], 2,
> 4294967295) = 1 ([{fd=5, revents=POLLIN}])
> [pid 29593] read(5, "\4\0\0\0\0\0\0\0", 16) = 8
> [pid 29593] poll([{fd=5, events=POLLIN}, {fd=4, events=POLLIN}], 2,
> 4294967295 <unfinished ...>
> [pid 29592] <... nanosleep resumed> NULL) = 0
> 
> Hmm.  So you're closing an fd for each message, thereby forcing an RCU
> cleanup, and you're doing some kind of inter-thread communication
> using several poll calls and a few futex wakes.
> 
> Can someone who likes kdbus please benchmark kdbus against something
> that doesn't go out of its way to be inefficient?  I don't even want
> to review kernel code that has as a major claim to fame the ability to
> beat this mess by a mere factor of two.
> 
> For comparison, I have some code that uses Thrift, which is dog-slow
> [1], to serialize a message, send it, and deserialize it on the other
> end.  The entire process takes 28 microseconds on average.  That would
> be almost exactly three orders of magnitude faster.  Of course, I'm
> not comparing apples to apples here, but the species of fruit being
> compared does not justify a three order of magnitude difference.
> 
> [1] For all I know, the glib serialization stuff is also dog-slow, or
> perhaps snail-slow.  Nonetheless, Thrift, or at least the version I'm
> using, is not a shining example of performance.  If you want
> serialization performance, use Cap'n Proto.
> 
> --Andy
> 

-- 
Lukasz Skalski
Samsung R&D Institute Poland
Samsung Electronics
l.skalski@samsung.com

      reply	other threads:[~2015-04-28 10:25 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-27 20:08 D-bus is three orders of magnitude too slow (Re: [GIT PULL] kdbus for 4.1-rc1) Andy Lutomirski
2015-04-28 10:25 ` Lukasz Skalski [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=553F601C.6030309@samsung.com \
    --to=l.skalski@samsung.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=daniel@zonque.org \
    --cc=dh.herrmann@gmail.com \
    --cc=ebiederm@xmission.com \
    --cc=gnomes@lxorguk.ukuu.org.uk \
    --cc=gregkh@linuxfoundation.org \
    --cc=jkosina@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=teg@jklm.no \
    --cc=tixxdz@opendz.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.