All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oren Laadan <orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
To: "Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
Cc: Linux Containers <containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org>
Subject: Re: [PATCH RFC] Send checkpoint and restart debug info to a log	file (v2)
Date: Wed, 21 Oct 2009 19:14:24 -0400	[thread overview]
Message-ID: <4ADF95D0.8060806@librato.com> (raw)
In-Reply-To: <20091021224922.GA5827-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>



Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org):
>>
>> Serge E. Hallyn wrote:
>>> Until now, debug data was sent to syslog.  That's not really
>>> right.
>>>
>>> This patch sends ckpt_debug output to a logfile provided by
>>> the caller.  If no logfile is provided, then no debug output
>>> is needed.  So the user can pass in -1 for the logfd to say
>>> so.
>> I suggest to keep an option to write to the console/syslog too.
>> Otherwise we may lose important debug info if a nasty crash
>> occurs ...
> 
> Ok, that should be simple enough to tack on once the rest is
> settled - just have ckpt_log() sent fmt+##args to both
> do_ckpt_user_debug() (to write to user-provided file) and a
> ckpt_syslog() which is a noop if !CONFIG_CHECKPOINT_DEBUG.
> 
>>> Note that this does not address the potential (inevitable?)
>>> lockup of writing out a debug msg while checkpointing the
>>> debug file.  In practice so far it seems to work rather well.
>>> (with quite a bit of testing)
>> I believe the only concern here is that our debug code should
>> not write any debug info while the respective inode is locked.
> 
> BTW it occurs to me that self-restart with a logfd must be kinda
> hosed early on :)

Why would that be a problem ?  I actually think it's useful for
those doing self-restart:  you open a file, the kernel takes a
reference to it, then sys_restart() will eventually close that
file descriptor - but kernel still keeps a reference - so debug
data keeps flowing. When restart completes -- data is gone; If
restart fails - user will have information in that file.

> 
>>> This also means that we have to be more careful than we have
>>> been about not writing out sensitive data.
>>>
>>> This is pure RFC, not meant to be pretty.
>>>
>>> The split into ckpt_debug and ckpt_err (see changelog) suggests
>>> that we should rearrange (and be more consistent about) how and
>>> when we print out debug info.  Left as an exercise for later.
>> I hate homework...
>>
>> Besides, I think we need to address it now. This patch is pretty
>> intrusive and will be painful when I fold to ckpt-v19. I do not
>> want to go through the process twice.
> 
> Ok, why don't I take my best stab at this in my next patch.
> 
>> It's definitely time to come up with guidelines to when/where/how
>> to add debugging output, and when/where/how to add error output.
>>
>> In the 'how' section, besides avoiding leaking sensitive data, I'd
>> like to come up with a canonical format that will be easy to read
>> and to parse automatically (improve over ckpt_write_err).
>>
>> Hmm... ckpt_write_err() should change too -- be written to the log
>> instead.
> 
> Let's not be hasty - maybe duplicated, but an error message at the
> end of the checkpoint file is still a nice sanity check for userspace
> if they happened to checkpoint without a logfile.

Ok.

> 
>>> Changelog:
>>> 	Oct 21: split ckpt_debug into ckpt_debug and ckpt_err.
>>> 		Git rid of the split by memory debug info etc.
>> The split is useful to control the amount of log.
> 
> It's a stupid split!  And I've never used it.  Besides, when a log is
> for a single c/r, it's really not very big.

It may be stupid split!, yet it did prove very useful to me.
Maybe it's because you never debugged the memory checkpoint
page by page.

A typical scenario: you hit a bug -> you enable debugging ->
the bug disappears -> you disable debugging -> you hit the bug ...

IOW, debugging output in big doses affects the execution in a
way that makes heisen-bugs hide. Control over verbosity means you
get better chances at reproducing the behavior and still have
enough meaningful data.

> 
> More practically, requiring userspace to pass over a flag
> consisting of CKPT_DBG_MEM|CKPT_DBG|FILE|CKPT_DBG|TASK, and
> handle corresponding usage flags, is not nice.

I agree with you on about this. Maybe we want a better
interface ?

Which brings me to this random thought: maybe we want to
make the fourth argument of sys_{checkpoint,restart} a
structure, to make it easier to extend it in the future
without having to go throw a clone3-like hell...

Specifically, this structure could now be:

struct ckpt_args {
	int version;
	int logfd;
	int logmask;
};

(or use union checkpoint {} and union restart {} to tell
between checkpoint- and restart-related args.

> 
>>> 		Since userspace actively asks for debug info, I
>>> 		also made it not depend on CONFIG_CHECKPOINT_DEBUG.
>>> 		We may want to put it back again to limit kernel
>>> 		size, but for now it's a distraction, and I'm not
>>> 		convinced it makes sense
>> Then add CONFIG_CHECKPOINT_LOGGING ?
>> (automatically implied by CONFIG_CHECKPOINT_DEBUG)
> 
> Well...  for some reason I can't decide...  does that make sense?
> 
> Remember more config options means more ways for the code to
> not compile.  Before adding one, I'd prefer to compile an image
> with and without the ckpt_debugs (iow with the #defines under
> #ifdef 0) and see how much size it really adds.  And in face right
> now CONFIG_CHECKPOINT_DEBUG only directs creation of the extra restart
> task debug info, which is only printed out when ctx->uflags & CKPT_DEBUG_ALL,
> so that config option could go away entirely.
> 
>>> Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
>>> ---
>>>  arch/s390/kernel/compat_wrapper.S |    2 +
>>>  arch/x86/mm/checkpoint.c          |   11 ++---
>>>  checkpoint/checkpoint.c           |   22 ++++----
>>>  checkpoint/files.c                |   29 +++++------
>>>  checkpoint/memory.c               |   43 ++++++++-------
>>>  checkpoint/namespace.c            |    3 -
>>>  checkpoint/objhash.c              |   33 +++++++------
>>>  checkpoint/process.c              |   85 +++++++++++++++----------------
>>>  checkpoint/restart.c              |  101 +++++++++++++++++--------------------
>>>  checkpoint/signal.c               |    5 +--
>>>  checkpoint/sys.c                  |   94 ++++++++++++++++++++++------------
>>>  drivers/char/tty_io.c             |   22 ++++----
>>>  include/linux/checkpoint.h        |   85 ++++++++++++++++---------------
>>>  include/linux/checkpoint_types.h  |    5 +-
>>>  include/linux/syscalls.h          |    5 +-
>>>  ipc/checkpoint.c                  |   13 ++---
>>>  ipc/checkpoint_msg.c              |   13 ++---
>>>  ipc/checkpoint_sem.c              |   13 ++---
>>>  ipc/checkpoint_shm.c              |   15 ++---
>>>  kernel/cred.c                     |    2 +-
>>>  lib/Kconfig.debug                 |    4 +-
>>>  mm/filemap.c                      |    2 +-
>>>  mm/shmem.c                        |    2 +-
>>>  net/checkpoint.c                  |   46 ++++++++--------
>>>  net/ipv4/checkpoint.c             |   26 ++++-----
>>>  net/unix/checkpoint.c             |   65 +++++++++++++-----------
>>>  26 files changed, 380 insertions(+), 366 deletions(-)
>> Uhhh... when this matures from RFC to patch-for-inclusion, make
>> sure to split it nicely for me ... please.
> 
> I'm not sure how split-able it is, but i'll do my best.

One way it to decide that we change the name ckpt_debug() to,
say, ckpt_log_debug(), then you introduce the new function in
on patch, and in following patches convert file by file.

Or, you just split it to first change ckpt_debug(), and then one
patch per file (or per subsystem). It's not bisect-able, but
that's ok because it's not going as is to the kernel - I'll
fold it anyway into ckpt-v19 clean patchset.

Oren.

  parent reply	other threads:[~2009-10-21 23:14 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-10-21 21:05 [PATCH RFC] Send checkpoint and restart debug info to a log file (v2) Serge E. Hallyn
     [not found] ` <20091021210507.GA2098-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-10-21 21:05   ` [PATCH user-cr] Allow for logfile for kernel debug messages (v2) Serge E. Hallyn
     [not found]     ` <20091021210533.GA2165-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-10-21 21:46       ` Oren Laadan
     [not found]         ` <4ADF8140.3010102-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-10-21 22:02           ` Serge E. Hallyn
     [not found]             ` <20091021220245.GA8994-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
2009-10-21 22:18               ` Oren Laadan
     [not found]                 ` <4ADF88BD.20400-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-10-22  5:56                   ` Matt Helsley
2009-10-22  5:48           ` Matt Helsley
     [not found]             ` <20091022054853.GC7757-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
2009-10-23 18:59               ` Oren Laadan
2009-10-21 22:03   ` [PATCH RFC] Send checkpoint and restart debug info to a log file (v2) Oren Laadan
     [not found]     ` <4ADF853F.6080807-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-10-21 22:49       ` Serge E. Hallyn
     [not found]         ` <20091021224922.GA5827-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-10-21 23:14           ` Oren Laadan [this message]
     [not found]             ` <4ADF95D0.8060806-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-10-22  0:51               ` Serge E. Hallyn
     [not found]                 ` <20091022005157.GA11608-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-10-22  6:04                   ` Matt Helsley
     [not found]                     ` <20091022060400.GE7757-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
2009-10-23 18:48                       ` Oren Laadan
     [not found]                         ` <4AE1FA7A.5030702-RdfvBDnrOixBDgjK7y7TUQ@public.gmane.org>
2009-10-23 20:06                           ` Serge E. Hallyn
2009-10-26 21:52                           ` Serge E. Hallyn
     [not found]                             ` <20091026215238.GA10900-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2009-10-26 23:39                               ` Oren Laadan
2009-10-22 18:25               ` Serge E. Hallyn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4ADF95D0.8060806@librato.com \
    --to=orenl-rdfvbdnroixbdgjk7y7tuq@public.gmane.org \
    --cc=containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org \
    --cc=serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.