From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
To: Paul Jackson <pj@engr.sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>,
akpm@osdl.org, greg@kroah.com, linux-kernel@vger.kernel.org,
jlan@engr.sgi.com, efocht@hpce.nec.com, linuxram@us.ibm.com,
gh@us.ibm.com, elsa-devel@lists.sourceforge.net
Subject: Re: [patch 1/2] fork_connector: add a fork connector
Date: Tue, 29 Mar 2005 11:04:16 +0400
Message-ID: <1112079856.5243.24.camel@uganda>
In-Reply-To: <20050328134242.4c6f7583.pj@engr.sgi.com>
On Mon, 2005-03-28 at 13:42 -0800, Paul Jackson wrote:
> Guillaume wrote:
> > The lmbench shows that the overhead (the construction and the sending
> > of the message) in the fork() routine is around 7%.
>
> Thanks for including the numbers. The 7% seems a bit costly, for a bit
> more accounting information. Perhaps dean's suggestion, to not use
> ascii, will help. I hope so, though I doubt it will make a huge
> difference. Was this 7% loss with or without a user level program
> consuming the sent messages? I would think that the number of interest
> would include a minimal consumer task.
There is no measurable overhead when CBUS is used.
On my old P2/256 MB SMP machine, creating and exiting a process took
about 950 usec both with the fork connector turned on and without it
even compiled in.
Calling the connector's method directly took about 1000-1100 usec.
The current fork connector does not use CBUS [yet, I hope].
> I don't see a good reason to make fork_connector() inline. Since it
> calls other subroutines and is not just a few lines, perhaps better to
> make it a real routine, so we can see it in "nm --print-size" output and
> debug stacks.
>
> Having the "#ifdef CONFIG_FORK_CONNECTOR" chunk of code right in fork.c
> seems unfortunate. Can the real fork_connector() be put elsewhere, and
> the ifdef put in a header file that makes it a no-op if not configured,
> or simply a function declaration, if configured?
>
> What's the status of the connector driver patch? I perhaps wasn't
> paying close enough attention, but all I see of it now is a couple of
> patches sent to lkml, from Evgeniy Polyakov, in September and January.
> I don't see it in my copies of *-mm or recent Linus bk trees. Am I
> missing something?
It was dropped from the -mm tree because the bk tree where it lives
was in maintenance mode.
I think the connector will appear in the next -mm release.
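
Paul's header suggestion in the quoted text above could look roughly like
the sketch below. It is only an illustration, not the code from the patch:
the signature fork_connector(pid_t parent, pid_t child) and the caller
do_fork_sketch() are hypothetical, and in a real tree the declaration would
live in a header while the routine itself would sit in its own .c file.

```c
#include <sys/types.h>  /* pid_t */

/*
 * In the kernel, CONFIG_FORK_CONNECTOR would be set by Kconfig.
 * It is left undefined here, so the no-op branch is the one compiled.
 */
#ifdef CONFIG_FORK_CONNECTOR
/* Configured: a plain declaration; the real, non-inline routine is
 * defined elsewhere and shows up in "nm --print-size" and stack traces. */
void fork_connector(pid_t parent, pid_t child);
#else
/* Not configured: an empty inline stub that the compiler discards. */
static inline void fork_connector(pid_t parent, pid_t child)
{
	(void)parent;
	(void)child;
}
#endif

/* Hypothetical caller standing in for the fork path: no #ifdef needed. */
static int do_fork_sketch(pid_t parent, pid_t child)
{
	fork_connector(parent, child);
	return 0;
}
```

With this split, fork.c would carry a single unconditional call and the
configuration logic would stay in one header.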
> This still seems to me like more apparatus than is desirable, just to
> get another form of session id, as best as I can figure it. However
> we've already been there, and apparently my concerns were not
> persuasive. If one does go down this path, then using this connector
> patch is as good an alternative as any I know of. Well, that or relayfs.
> My uneducated assumption is that relayfs might at least batch data
> packets up into big buffer chunks better, but someone more knowledgeable
> than me needs to consider that.
>
> It's a little sad, when almost all the required accounting information
> comes out in packed 64 byte records, carefully buffered and sent in
> big chunks, to minimize per-task costs. Then this one extra detail,
> of <parent-pid, child-pid> requires an entire netlink packet of
> its own of what size -- another 50 or 100 bytes? Is this packet
> received as a separate data packet, on its own recv(2) system call,
> by the user task, not in a big block of packets? The efficiency
> of getting this one extra <parent-pid, child-pid> out of the kernel
> seems to be one or two orders of magnitude worse than the rest of
> the accounting data.
That can easily be changed.
One could send the acct_t structure from kernel/acct.c out of the kernel
instead; the overhead would be the same: kmalloc will probably take the
new area from the same 256-byte pool, and the skb is still in cache.
> ===
>
> Hmmm ... perhaps one could add a _second_ accounting file, cutting and
> pasting code in kernel/acct.c and enabling writing additional
> information to that second file, using the same mechanisms as now used
> for the primary file. Use a more extensible record format for the
> second file (say start each record with a magic cookie, a byte record
> type and a byte record length, then that many bytes). That way, we have
> an escape valve for adding additional record types in the future.
> And that way we can efficiently write short records, with just say
> a couple of interesting values, and minimal overhead.
>
> Don't worry if the magic cookie appears as part of the raw data. If one
> has to resync such a data stream, one can look for a series of records,
> each starting with the magic cookie, sensible record type byte, and a
> length that ends right at the next such valid record. The occasional
> duplication of the same cookie within the data stream would not thwart a
> resync for long. And the main purpose of the magic cookie is to make
> sure you are still in sync, not reverting to garbage-in, garbage-out,
> mode. Almost any magic value other than 0x0000 will suffice for that
> purpose.
>
> I just ran a silly little test on my PC desktop Linux box, scanning
> /proc/kcore. The _least_ common 2 byte word seen was 0x2B91, with 31
> instances in a half-billion words scanned, so I nominate that value for
> the magic cookie ;).
>
> The key reason that it might make sense here to adapt the existing
> accounting file direct write mechanism, rather than using "connector" or
> "relayfs", is that we really do want to get this data to disk initially.
> Relayfs is optimized for getting a lot of data to a user daemon, and the
> connector for sending smaller packets of data to a user daemon. But
> accounting processing is sometimes done out of a cron job off-hours.
> During the day (the busy hours) you might just want to stash the stuff
> with as little performance impact as possible. If one can avoid _any_
> other task having to context switch in, in order to get this data on its
> way, that is a huge win.
Writing accounting data to a file [kernel/acct.c] is slower: it takes
global locks and requires process context to work with system calls.
relayfs is an interesting project, but as far as I can see it has
different aims: it was created for transferring huge amounts of data,
and it succeeds at that, while the connector is purely a
control/notification mechanism, suited, for example, to gathering
short-lived per-process accounting data.
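
The extensible record format proposed in the quoted text (magic cookie,
one type byte, one length byte, then that many payload bytes) and the
resync rule that goes with it can be sketched in user-space C as below.
This is an illustration under stated assumptions only: the names rec_put,
rec_valid_at, rec_resync, REC_TYPE_FORK and REC_TYPE_MAX are hypothetical,
the big-endian cookie order and the range of "sensible" type values are
arbitrary choices, and 0x2B91 is simply the value nominated above. None of
it is taken from kernel/acct.c or from the connector patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REC_MAGIC    0x2B91u  /* cookie value nominated in the mail */
#define REC_HDR_LEN  4u       /* 2-byte cookie + type byte + length byte */
#define REC_TYPE_MAX 15u      /* assumed upper bound for a sensible type */

enum { REC_TYPE_FORK = 1 };   /* hypothetical <parent-pid, child-pid> record */

/* Append one record to buf; returns bytes written, or 0 if it does not fit. */
static size_t rec_put(uint8_t *buf, size_t room, uint8_t type,
		      const void *payload, uint8_t len)
{
	assert(buf != NULL && payload != NULL);
	if (room < REC_HDR_LEN + len)
		return 0;
	buf[0] = REC_MAGIC >> 8;
	buf[1] = REC_MAGIC & 0xff;
	buf[2] = type;
	buf[3] = len;
	memcpy(buf + REC_HDR_LEN, payload, len);
	return REC_HDR_LEN + len;
}

/* Does a plausible record (cookie, sensible type, length in bounds)
 * start at offset off? */
static int rec_valid_at(const uint8_t *buf, size_t size, size_t off)
{
	if (off + REC_HDR_LEN > size)
		return 0;
	if (buf[off] != (REC_MAGIC >> 8) || buf[off + 1] != (REC_MAGIC & 0xff))
		return 0;
	if (buf[off + 2] == 0 || buf[off + 2] > REC_TYPE_MAX)
		return 0;
	return off + REC_HDR_LEN + buf[off + 3] <= size;
}

/*
 * Resync: return the first offset >= start at which `need` valid records
 * follow one another back to back, or size if there is no such offset.
 * A stray cookie inside raw data fails the chain check and is skipped.
 */
static size_t rec_resync(const uint8_t *buf, size_t size, size_t start,
			 unsigned need)
{
	for (size_t off = start; off < size; off++) {
		size_t pos = off;
		unsigned n = 0;

		while (n < need && rec_valid_at(buf, size, pos)) {
			pos += REC_HDR_LEN + buf[pos + 3];
			n++;
		}
		if (n == need)
			return off;
	}
	return size;
}
```

A reader that has lost its place would call rec_resync() with need set to
2 or 3 and resume parsing at the returned offset; a fork record carrying
two 32-bit pids costs 12 bytes in this layout.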
--
Evgeniy Polyakov
Crash is better than data corruption -- Arthur Grabowski