public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH][RFC] fix BSD accounting (w/ long-term perspective ;-)
@ 2004-03-08 23:03 Tim Schmielau
  2004-03-08 23:33 ` Tim Schmielau
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Tim Schmielau @ 2004-03-08 23:03 UTC (permalink / raw)
  To: lkml; +Cc: Arthur Corliss

BSD accounting currently reports only the lower 16 bits of 32 bit uids,
reports times in units of 1/HZ seconds while it announces a unit of
1/100 seconds in the header file (which hurts especially on i386 where
HZ changed since 2.4) and cannot report times longer that 49 days.

We might want to fix that, but since we are in a stable kernel series
we need to insert some glue for binary compatibility.
This, of course, is nicer if we have a plot in place to zap such crap in 
the long term.
As only Arthur Corliss commented on my previous post (BSD accounting
dosn't seem that fancy these days...), I made up such a plot myself:

 - store the upper 16 bits of high uid/gids in the (currently zeroed)
   padding bytes of struct acct.

 - comp_t used for time values is actually able to hold up to 34 bit 
   values, so use that to allow times up to 194 days (with HZ=1024) to
   be reported. The GNU acct tools are able to deal with the extra bits.

 - units of time are the hard part. As userspace programs have no way 
   to determine which kernel version wrote the accounting file, and
   as the units of time have changed between 2.4 and 2.6, we have already
   messed up. It gets worse since we announce the constant (and thus 
   useless) value of AHZ=100 in <linux/acct.h> for all archs since ages.

   My proposed solution: keep backwards compatibility with 2.4 by using
   a unit of 1/USER_HZ, announce that correctly in the header file,
   and simply admit that BSD accounting on i386 is broken in 2.6.0-2.6.4.

 - store a version number in the last byte of struct acct, which allows
   for a smooth transition to a new binary format when 2.7 comes out.
   For 2.7, extend uid/gid fields to 32 bit, report times in terms
   of AHZ=100 on all platforms (thus allowing to report times up to 1988 
   days), and remove the compatibility stuff from the kernel.

 - introduce some compatibility glue into the userland tools, which can be 
   removed when 2.6 is history.

This would leave the user with two choices:

 - Don't mess with userland tools. Binary compatibility holds from 
   2.0 to 2.6, but breaks at the 2.6/2.7 border. For 2.7., userspace
   needs to be recompiled, but yields support for 32 bit uid/gids
   in return.

 - Install an updated userland accounting package. Immediately yields
   32 bit uid/gid support on kernels 2.6.5 and up.
   User may switch forth and back between kernels from 2.0 up to 2.7+,
   even within the same accounting file, without breaking compatibility.

Patch for 2.6 kernel is below. Proposed cleanup patch for 2.7 is at
  http://www.physik3.uni-rostock.de/tim/kernel/2.7/acct-cleanup-01.patch
Compatibility glue for GNU acct package is still to be done.

Comments?
Any other suggestions for incompatible changes to struct acct in 2.7?

Tim


--- linux-2.6.4-rc1/include/linux/acct.h	2004-02-04 04:43:17.000000000 +0100
+++ linux-2.6.4-rc1-acct/include/linux/acct.h	2004-03-08 22:19:21.000000000 +0100
@@ -16,6 +16,8 @@
 #define _LINUX_ACCT_H
 
 #include <linux/types.h>
+#include <linux/version.h>
+#include <asm/param.h>
 
 /* 
  *  comp_t is a 16-bit "floating" point number with a 3-bit base 8
@@ -34,12 +36,18 @@ typedef __u16	comp_t;
 
 #define ACCT_COMM	16
 
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,7,0)
+
+#define ACCT_VERSION    1
+#define AHZ		(USER_HZ)
+
+
 struct acct
 {
 	char		ac_flag;		/* Accounting Flags */
 /*
- *	No binary format break with 2.0 - but when we hit 32bit uid we'll
- *	have to bite one
+ *	No binary format break with 2.0 - but 32 bit uid/gid support is
+ *	a kludge.
  */
 	__u16		ac_uid;			/* Accounting Real User ID */
 	__u16		ac_gid;			/* Accounting Real Group ID */
@@ -56,9 +64,46 @@ struct acct
 	comp_t		ac_swaps;		/* Accounting Number of Swaps */
 	__u32		ac_exitcode;		/* Accounting Exitcode */
 	char		ac_comm[ACCT_COMM + 1];	/* Accounting Command Name */
-	char		ac_pad[10];		/* Accounting Padding Bytes */
+	__u16		ac_uid_hi;		/* Accounting Real User ID */
+	__u16		ac_gid_hi;		/* Accounting Real Group ID */
+	char		ac_pad[5];		/* Accounting Padding Bytes */
+	char		ac_version;		/* Always set to ACCT_VERSION */
+};
+
+#else
+
+#define ACCT_VERSION    2
+#define AHZ		100
+
+/*
+ * New binary format with 32 bit uid/gid.
+ * Scheduled for 2.7.0
+ */
+
+struct acct
+{
+	__u32		ac_uid;			/* Accounting Real User ID */
+	__u32		ac_gid;			/* Accounting Real Group ID */
+	__u32		ac_exitcode;		/* Accounting Exitcode */
+	__u32		ac_btime;		/* Accounting Process Creation Time */
+	comp_t		ac_utime;		/* Accounting User Time */
+	comp_t		ac_stime;		/* Accounting System Time */
+	comp_t		ac_etime;		/* Accounting Elapsed Time */
+	comp_t		ac_mem;			/* Accounting Average Memory Usage */
+	comp_t		ac_io;			/* Accounting Chars Transferred */
+	comp_t		ac_rw;			/* Accounting Blocks Read or Written */
+	comp_t		ac_minflt;		/* Accounting Minor Pagefaults */
+	comp_t		ac_majflt;		/* Accounting Major Pagefaults */
+	comp_t		ac_swaps;		/* Accounting Number of Swaps */
+	__u16		ac_tty;			/* Accounting Control Terminal */
+	char		ac_pad[9];		/* Accounting Padding Bytes */
+	char		ac_flag;		/* Accounting Flags */
+	char		ac_comm[ACCT_COMM + 1];	/* Accounting Command Name */
+	char		ac_version;		/* Always set to ACCT_VERSION */
 };
 
+#endif
+
 /*
  *  accounting flags
  */
@@ -69,8 +114,6 @@ struct acct
 #define ACORE		0x08	/* ... dumped core */
 #define AXSIG		0x10	/* ... was killed by a signal */
 
-#define AHZ		100
-
 #ifdef __KERNEL__
 
 #include <linux/config.h>
@@ -84,6 +127,28 @@ extern void acct_process(long exitcode);
 #define acct_process(x)		do { } while (0)
 #endif
 
+/*
+ * Yet another HZ to *HZ helper function.
+ * See <linux/timer.h> for the original.
+ */
+
+#if (HZ % AHZ)==0
+# define jiffies_to_AHZ(x) ((x) / (HZ / AHZ))
+#else
+# define jiffies_to_AHZ(x) ((clock_t) jiffies_64_to_AHZ((u64) x))
+#endif
+
+static inline u64 jiffies_64_to_AHZ(u64 x)
+{
+#if (HZ % AHZ)==0
+	do_div(x, HZ / AHZ);
+#else
+	x *= AHZ;
+	do_div(x, HZ);
+#endif
+	return x;
+}
+
 #endif	/* __KERNEL */
 
 #endif	/* _LINUX_ACCT_H */

--- linux-2.6.4-rc1/kernel/acct.c	2004-02-04 04:43:56.000000000 +0100
+++ linux-2.6.4-rc1-acct/kernel/acct.c	2004-03-08 22:31:26.000000000 +0100
@@ -265,13 +265,16 @@ void acct_auto_close(struct super_block 
  *  This routine has been adopted from the encode_comp_t() function in
  *  the kern_acct.c file of the FreeBSD operating system. The encoding
  *  is a 13-bit fraction with a 3-bit (base 8) exponent.
+ *
+ *  Bumped up to encode 64 bit values. Unfortunately the result may
+ *  overflow now.
  */
 
 #define	MANTSIZE	13			/* 13 bit mantissa. */
 #define	EXPSIZE		3			/* Base 8 (3 bit) exponent. */
 #define	MAXFRACT	((1 << MANTSIZE) - 1)	/* Maximum fractional value. */
 
-static comp_t encode_comp_t(unsigned long value)
+static comp_t encode_comp_t(u64 value)
 {
 	int exp, rnd;
 
@@ -293,9 +296,14 @@ static comp_t encode_comp_t(unsigned lon
 	/*
          * Clean it up and polish it off.
          */
-	exp <<= MANTSIZE;		/* Shift the exponent into place */
-	exp += value;			/* and add on the mantissa. */
-	return exp;
+	if (exp >= (1 << EXPSIZE)) {
+		/* Overflow. Return largest representable number instead. */
+		return (1ul << (MANTSIZE + EXPSIZE)) - 1;
+	} else {
+		exp <<= MANTSIZE;	/* Shift the exponent into place */
+		exp += value;		/* and add on the mantissa. */
+		return exp;
+	}
 }
 
 /*
@@ -334,15 +342,21 @@ static void do_acct_process(long exitcod
 	strlcpy(ac.ac_comm, current->comm, sizeof(ac.ac_comm));
 
 	elapsed = get_jiffies_64() - current->start_time;
-	ac.ac_etime = encode_comp_t(elapsed < (unsigned long) -1l ?
-	                       (unsigned long) elapsed : (unsigned long) -1l);
+	ac.ac_etime = encode_comp_t(jiffies_64_to_AHZ(elapsed));
 	do_div(elapsed, HZ);
 	ac.ac_btime = xtime.tv_sec - elapsed;
 	ac.ac_utime = encode_comp_t(current->utime);
 	ac.ac_stime = encode_comp_t(current->stime);
-	/* we really need to bite the bullet and change layout */
 	ac.ac_uid = current->uid;
 	ac.ac_gid = current->gid;
+#if ACCT_VERSION==1
+	/*
+	 * A change in layout to 32 bit uid/gid fields is scheduled for 2.7.
+	 * But let's keep it compatible for now.
+	 */
+	ac.ac_uid_hi = current->uid >> 16;
+	ac.ac_gid_hi = current->gid >> 16;
+#endif
 	ac.ac_tty = current->tty ? old_encode_dev(tty_devnum(current->tty)) : 0;
 
 	ac.ac_flag = 0;
@@ -374,6 +388,7 @@ static void do_acct_process(long exitcod
 	ac.ac_majflt = encode_comp_t(current->maj_flt);
 	ac.ac_swaps = encode_comp_t(current->nswap);
 	ac.ac_exitcode = exitcode;
+	ac.ac_version = ACCT_VERSION;
 
 	/*
          * Kernel segment override to datasegment and write it

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][RFC] fix BSD accounting (w/ long-term perspective ;-)
  2004-03-08 23:03 Tim Schmielau
@ 2004-03-08 23:33 ` Tim Schmielau
  2004-03-09  0:23 ` Arthur Corliss
  2004-03-12 17:59 ` Ragnar Kjørstad
  2 siblings, 0 replies; 5+ messages in thread
From: Tim Schmielau @ 2004-03-08 23:33 UTC (permalink / raw)
  To: lkml; +Cc: Arthur Corliss

> Patch for 2.6 kernel is below.

Whoops, missed the incremental hunk below. Corrected full patch is at
  http://www.physik3.uni-rostock.de/tim/kernel/2.6/acct-06.patch

Tim


--- linux-2.6.4-rc1-acct/kernel/acct.c	2004-03-08 22:31:26.000000000 +0100
+++ linux-2.6.4-rc1-acct1/kernel/acct.c	2004-03-09 00:25:50.000000000 +0100
@@ -345,8 +345,8 @@ static void do_acct_process(long exitcod
 	ac.ac_etime = encode_comp_t(jiffies_64_to_AHZ(elapsed));
 	do_div(elapsed, HZ);
 	ac.ac_btime = xtime.tv_sec - elapsed;
-	ac.ac_utime = encode_comp_t(current->utime);
-	ac.ac_stime = encode_comp_t(current->stime);
+	ac.ac_utime = encode_comp_t(jiffies_to_AHZ(current->utime));
+	ac.ac_stime = encode_comp_t(jiffies_to_AHZ(current->stime));
 	ac.ac_uid = current->uid;
 	ac.ac_gid = current->gid;
 #if ACCT_VERSION==1

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][RFC] fix BSD accounting (w/ long-term perspective ;-)
  2004-03-08 23:03 Tim Schmielau
  2004-03-08 23:33 ` Tim Schmielau
@ 2004-03-09  0:23 ` Arthur Corliss
  2004-03-12 17:59 ` Ragnar Kjørstad
  2 siblings, 0 replies; 5+ messages in thread
From: Arthur Corliss @ 2004-03-09  0:23 UTC (permalink / raw)
  To: Tim Schmielau; +Cc: lkml

On Tue, 9 Mar 2004, Tim Schmielau wrote:

> Comments?

The glue for the GNU accounting tools are already on the way, should this be
the patch accepted for 2.6.  If you would, please provide a similar patch
against 2.4 (or I'll do it, which ever is more convenient).  Great job on your
patch, BTW, I should have snarfed those padding bytes myself, but I wasn't
giving as much thought as you did to backwards compatibility.

> Any other suggestions for incompatible changes to struct acct in 2.7?

Taking a note from the BSD camp, instead of fixing the field size in acct.h
why not cast ac_uid/ac_gid as uid_t/gid_t and let the kernel determine the
size to log in 2.7 onwards?  Yes, this will break binary compatibility for
those upgrading older systems with a new stable series kernel.  However, this
would obviate the need in the future to provide a migration path.  Outside of
a recompile it would be transparent to the userland tools.

For fields working within a well-defined range I don't think we should be
second-guessing the kernel, and just use what the kernel dictates.  Not doing
so means we are acknowledging that we don't care about logging incorrect data.

Being the whiny little bitch that I am I'm going to repeat my mantra:  we
should not be in the business of logging bad data, regardless of how rare we
think we might actually do it.

Besides which, with your great idea of putting a version field in the struct
we can now put some intelligence into the GNU utils to read the different
versions of the records in the same file.

	--Arthur Corliss
	  Bolverk's Lair -- http://arthur.corlissfamily.org/
	  Digital Mages -- http://www.digitalmages.com/
	  "Live Free or Die, the Only Way to Live" -- NH State Motto

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][RFC] fix BSD accounting (w/ long-term perspective ;-)
  2004-03-08 23:03 Tim Schmielau
  2004-03-08 23:33 ` Tim Schmielau
  2004-03-09  0:23 ` Arthur Corliss
@ 2004-03-12 17:59 ` Ragnar Kjørstad
  2 siblings, 0 replies; 5+ messages in thread
From: Ragnar Kjørstad @ 2004-03-12 17:59 UTC (permalink / raw)
  To: Tim Schmielau; +Cc: lkml, Arthur Corliss, watters, law, te

[ CCing Sam Watters and Linda Walsh at SGI for feedback about using
pid/ppid/pgid/sid/paggid/whatever to track jobs in accounting-logs ]

On Tue, Mar 09, 2004 at 12:03:27AM +0100, Tim Schmielau wrote:
> BSD accounting currently reports only the lower 16 bits of 32 bit uids,
> reports times in units of 1/HZ seconds while it announces a unit of
> 1/100 seconds in the header file (which hurts especially on i386 where
> HZ changed since 2.4) and cannot report times longer that 49 days.

Red Hat uses a patch that adds 32 bit uids and 32 bit gids at the end of
struct acct, reducing the size of the padding to 2 bytes. If a simular
extension is added to the official kernel it would be nice if they were
compatible.


There is also a few other problems with the current implementation that
we would like to see fixed if there is going to be a format-change
anyway:
- Architecture indepencence. It would be nice if the binary format was
  independent of the architecture so files can be processed on other
  hosts. This would involve defining padding and byte-ordering, as well
  as either stop using 1/HZ units in the file or add AHZ to the record.

- Better accuracy. Specificly the records don't include an exact
  termination-time for the process. One can estimate termination-time
  based on creation time and elapsed time but since elapsed time is
  truncated into comp_t it is not accurate for long-lived processes.

- More information. Accounting is typically used to keep track of
  what resources different jobs have used, where the definition of
  "job" may differ from site to site. Quite often "job" is a group
  of related processes, so including pid/ppid would allow
  post-processing to rebuild the process-hierarchy and deduce what
  processes were part of the same job. Even better, if usespace
  could notify the kernel what job a process belongs to, it would be
  easier to track jobs across multiple nodes in a cluster and so on.
  Maybe adding pid, ppid and jobid to the format and then use the sid
  as the jobid for now? And then it can be replaced by a proper jobid
  from e.g. PAGG later?

>    My proposed solution: keep backwards compatibility with 2.4 by using
>    a unit of 1/USER_HZ, announce that correctly in the header file,
>    and simply admit that BSD accounting on i386 is broken in 2.6.0-2.6.4.

Sounds good.

>  - store a version number in the last byte of struct acct, which allows
>    for a smooth transition to a new binary format when 2.7 comes out.
>    For 2.7, extend uid/gid fields to 32 bit, report times in terms
>    of AHZ=100 on all platforms (thus allowing to report times up to 1988 
>    days), and remove the compatibility stuff from the kernel.

It may be useful to have the version-information in the beginning of the
struct to allow future versions to use a struct of a different size.
There should be some bits available in the flag-field, maybe that's
enough?


-- 
Ragnar Kjørstad
Software Engineer
Scali - http://www.scali.com
High Performance Clustering

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH][RFC] fix BSD accounting (w/ long-term perspective ;-)
@ 2004-03-12 23:20 Albert Cahalan
  0 siblings, 0 replies; 5+ messages in thread
From: Albert Cahalan @ 2004-03-12 23:20 UTC (permalink / raw)
  To: linux-kernel mailing list; +Cc: kernel, corliss, tim

> store a version number in the last byte of struct acct, which allows
> for a smooth transition to a new binary format when 2.7 comes out.
> For 2.7, extend uid/gid fields to 32 bit, report times in terms
> of AHZ=100 on all platforms (thus allowing to report times up to 1988 
> days), and remove the compatibility stuff from the kernel.

This is no good. The struct size varies. It is 64 bytes on
most architectures. It is likely to be larger on hppa, and
smaller on m68k or arm.

The second byte is available, as padding on every arch.
Use that instead.



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2004-03-12 23:15 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-03-12 23:20 [PATCH][RFC] fix BSD accounting (w/ long-term perspective ;-) Albert Cahalan
  -- strict thread matches above, loose matches on Subject: below --
2004-03-08 23:03 Tim Schmielau
2004-03-08 23:33 ` Tim Schmielau
2004-03-09  0:23 ` Arthur Corliss
2004-03-12 17:59 ` Ragnar Kjørstad

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox