public inbox for linux-kernel@vger.kernel.org
From: Jay Lan <jlan@engr.sgi.com>
To: Paul Jackson <pj@engr.sgi.com>
Cc: johnpol@2ka.mipt.ru, guillaume.thouvenin@bull.net, akpm@osdl.org,
	greg@kroah.com, linux-kernel@vger.kernel.org,
	efocht@hpce.nec.com, linuxram@us.ibm.com, gh@us.ibm.com,
	elsa-devel@lists.sourceforge.net
Subject: Re: [patch 1/2] fork_connector: add a fork connector
Date: Tue, 29 Mar 2005 13:09:44 -0800
Message-ID: <4249C418.5040007@engr.sgi.com>
In-Reply-To: <20050329090304.23fbb340.pj@engr.sgi.com>

Paul,

The fork_connector is not designed to solve the accounting data
collection problem.

Accounting data collection must be done via a hook from do_exit().
The acct_process() hook invokes do_acct_process() to write BSD
accounting data to disk. CSA needs a similar hook off do_exit() to
collect more accounting data and write it to disk in a different
accounting record format. That work is not part of fork_connector.

ELSA does not care how accounting data is written to disk. However,
it needs the accounting data to be reliably and accurately collected
and written to disk by BSD and/or CSA, and it needs the <ppid, pid>
information to aggregate processes. It was never the fork_connector's
intention to piggyback that data onto the accounting file.

Thanks,
  - jay



Paul Jackson wrote:
> Evgeniy writes:
> 
>>Here forking connector module "exits" and can handle next fork() on the
>>same CPU.
> 
> 
> Fine ... but it's not about what the fork_connector does.  It's about
> getting the accounting data to disk, if I understand correctly.
> 
> 
>>That is why it is very fast in "fast-path".
> 
> 
> I don't care how fast a tool is.  I care how fast the job gets done.  If
> a tool is only doing part of the job, then we can't decide whether to
> use that tool just based on how fast that part of the job gets done.
> 
> 
>>The most expensive part is cn_netlink_send()/netlink_broadcast(), 
>>with CBUS it is deferred to the safe time,
> 
> 
> This is "safe time" for the immediate purpose of letting the forking
> process continue on its way.  But the deferred work of buffering up the
> data and writing it to disk still needs to be done, pretty soon.  When
> sizing a system to see how many users or jobs I can run on it at a time,
> I will have to include sufficient cpu, memory and disk i/o to handle
> getting this accounting data to disk, right?
> 
> 
>>> 2) Using a modified form of what BSD ACCOUNTING does now:
>>>	- forking process appends single fork data to in-kernel buffer
>>
>>It is not as simple.
>>It takes global locks several times, and it accesses a bunch of data
>>shared between CPUs.
>>It calls ->stat() and ->write() which may sleep.
> 
> 
> Hmmm ... good points.  The mechanisms in the kernel now (and for the
> last 25 years) to write out BSD ACCOUNTING data may not be numa friendly.
> 
> Perhaps there should be a per-cpu 512 byte buffer, which can gather up 8
> accounting records (64 bytes each) and only call the file system write
> once every 8 task exits.  Or perhaps a per-node buffer, with a spinlock
> to serialize access by the CPUs on that node.  Or perhaps per-node
> accounting files.  Or something like that.
> 
> Guillaume, Jay - do we (you ?) need to make classic BSD ACCOUNTING data
> collection numa friendly?  Based on the various frustrated comments at
> the top of kernel/acct.c, this could be a non-trivial effort to get
> right.  Maybe we need it, but can't afford it.
> 
> And perhaps my proposed variable length records for supplementary
> accounting, such as <parent pid, child pid> from fork, need to allow
> for some way to pad out the rest of a buffer, when the next record
> won't fit entirely.
> 
> 
>>That work is deferred and does not affect in-kernel processes.
> 
> 
> The accounting data collection cannot be deferred for long, perhaps
> just a few minutes.  Not until the data hits the disk can we rest
> indefinitely.  Unless, that is, I don't understand what problem is
> being solved here (quite possible ;).
> 
> 
>>And why should the userspace fork connector write data to the disk?
> 
> 
> I NEVER said it should.  I am NOT trying to redesign fork_connector.
> 
> Good grief ... how many times and ways do I have to say this ;)?
> 
> I am asking what is the best tool for accounting data collection,
> which, if I understand correctly, does need to write to disk.
> 
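
For reference, the batching idea Paul floats above (a 512 byte buffer
that gathers 8 accounting records of 64 bytes each and calls the file
system write only once every 8 task exits) can be sketched in plain
userspace C. This is only an illustration under stated assumptions, not
code from kernel/acct.c or from any posted patch: the names acct_batch,
acct_batch_add, flush_fn and count_flush are hypothetical, the flush
callback merely stands in for the real write, and the per-cpu placement
and any locking are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REC_SIZE     64                        /* one accounting record (from the text) */
#define RECS_PER_BUF 8                         /* 8 records per buffer (from the text) */
#define BUF_SIZE     (REC_SIZE * RECS_PER_BUF) /* 512 bytes */

/* Hypothetical callback standing in for the file system write. */
typedef void (*flush_fn)(const unsigned char *data, size_t len, void *ctx);

/* Hypothetical batch buffer; in the proposal there would be one per CPU. */
struct acct_batch {
    unsigned char buf[BUF_SIZE];
    size_t count;               /* records currently buffered */
    flush_fn flush;
    void *ctx;
};

static void acct_batch_init(struct acct_batch *b, flush_fn flush, void *ctx)
{
    memset(b, 0, sizeof(*b));
    b->flush = flush;
    b->ctx = ctx;
}

/*
 * Append one fixed-size record. When the buffer holds RECS_PER_BUF
 * records, hand the whole buffer to the flush callback and reset.
 * Returns 1 if this call triggered a flush, 0 otherwise.
 */
static int acct_batch_add(struct acct_batch *b, const unsigned char rec[REC_SIZE])
{
    assert(b->count < RECS_PER_BUF);
    memcpy(b->buf + b->count * REC_SIZE, rec, REC_SIZE);
    if (++b->count < RECS_PER_BUF)
        return 0;
    b->flush(b->buf, BUF_SIZE, b->ctx);
    b->count = 0;
    return 1;
}

/* Example flush target that only counts calls and bytes. */
struct flush_stats {
    size_t calls;
    size_t bytes;
};

static void count_flush(const unsigned char *data, size_t len, void *ctx)
{
    struct flush_stats *s = ctx;

    (void)data;
    s->calls++;
    s->bytes += len;
}
```

With this sketch, twenty calls to acct_batch_add() reach the flush
callback twice and leave four records buffered. That leftover is the
cost of batching: records sit in memory until the buffer fills, so a
real design would also need a way to flush a partial buffer before the
data is considered safe on disk.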


