All of lore.kernel.org
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Cliff Wickman <cpw@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	andi@firstfloor.org, tglx@linutronix.de,
	linux-kernel@vger.kernel.org,
	the arch/x86 maintainers <x86@kernel.org>
Subject: Re: [PATCH] X86: reboot-notify additions
Date: Thu, 19 Jun 2008 14:50:49 -0700	[thread overview]
Message-ID: <m1d4mdksau.fsf@frodo.ebiederm.org> (raw)
In-Reply-To: <20080619145453.GA11929@sgi.com> (Cliff Wickman's message of "Thu, 19 Jun 2008 09:54:53 -0500")

Cliff Wickman <cpw@sgi.com> writes:

> On Thu, Jun 19, 2008 at 01:02:14PM +0200, Ingo Molnar wrote:
>> 
>> * Cliff Wickman <cpw@sgi.com> wrote:
>> 
>> > From: Cliff Wickman <cpw@sgi.com>
>> > 
>> > X86 reboot-notify additions.
>
> As Andi Kleen pointed out, this is not X86-specific.  (it started out
> to be, but what I hoped to achieve for X86 turns out to be generic).
>
>> > This patch adds scans of the "reboot_notifier_list" callback chain in
>> > a three other places where the kernel is being stopped and/or restarted.
>> > 
>> > Adds calls to blocking_notifier_call_chain() in:
>> >    crash_kexec(), emergency_restart(), sys_kexec_load()
>> > 
>> > In the crash_kexec() and emergency_restart() cases it is indicated to the
>> > called-back function that the system is not in a sane state, so that
>> > it can avoid taking a lock or some such potentially blocking action.
>> > 
>> > These callbacks are important to a partition system. The stopped kernel
> needs
>> > to inform other partitions of their need to disconnect (stop sharing
> memory).
>> > 
>> > Diffed against 2.6.26-rc6
>> > 
>> > Signed-off-by: Cliff Wickman <cpw@sgi.com>
>> > ---
>> >  include/linux/notifier.h |    4 ++++
>> >  kernel/kexec.c           |    5 +++++
>> >  kernel/sys.c             |    1 +
>> >  3 files changed, 10 insertions(+)
>> > 
>> > Index: linux/include/linux/notifier.h
>> > ===================================================================
>> > --- linux.orig/include/linux/notifier.h
>> > +++ linux/include/linux/notifier.h
>> > @@ -202,6 +202,10 @@ static inline int notifier_to_errno(int 
>> >  #define SYS_RESTART	SYS_DOWN
>> >  #define SYS_HALT	0x0002	/* Notify of system halt */
>> >  #define SYS_POWER_OFF	0x0003	/* Notify of system power off */
>> > +#define SYS_INSANE	0x0004	/* Notify of system error/panic/oops */
>> > +/* For the SYS_INSANE case, no locks should be taken by the called-back
>> > + * function.  The kernel is ready for an immediate reboot.
>> > + */
>> >  
>> >  #define NETLINK_URELEASE	0x0001	/* Unicast netlink socket released */
>> >  
>> > Index: linux/kernel/kexec.c
>> > ===================================================================
>> > --- linux.orig/kernel/kexec.c
>> > +++ linux/kernel/kexec.c
>> > @@ -1001,6 +1001,9 @@ asmlinkage long sys_kexec_load(unsigned 
>> >  		if (result)
>> >  			goto out;
>> >  	}
>> > +
>> > +	blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, NULL);
>> > +
>> >  	/* Install the new kernel, and  Uninstall the old */
>> >  	image = xchg(dest_image, image);
>> >  
>> > @@ -1068,6 +1071,8 @@ void crash_kexec(struct pt_regs *regs)
>> >  	if (!locked) {
>> >  		if (kexec_crash_image) {
>> >  			struct pt_regs fixed_regs;
>> > +			blocking_notifier_call_chain(&reboot_notifier_list,
>> > +				SYS_INSANE, NULL);
>> >  			crash_setup_regs(&fixed_regs, regs);
>> >  			crash_save_vmcoreinfo();
>> >  			machine_crash_shutdown(&fixed_regs);
>> > Index: linux/kernel/sys.c
>> > ===================================================================
>> > --- linux.orig/kernel/sys.c
>> > +++ linux/kernel/sys.c
>> > @@ -270,6 +270,7 @@ out_unlock:
>> >   */
>> >  void emergency_restart(void)
>> >  {
>> > +	blocking_notifier_call_chain(&reboot_notifier_list, SYS_INSANE, NULL);
>> >  	machine_emergency_restart();
>> >  }
>> >  EXPORT_SYMBOL_GPL(emergency_restart);
>> 
>> i dont think this is a good idea. reboot_notifier_list is a blocking 
>> notifier, i.e. it comes with a notifier->rwsem read-write mutex that is 
>> taken when blocking_notifier_call_chain() is executed.
>> 
>> i.e. this patch puts a sleeping mutex operation (a down_read()) into a 
>> highly critical code path of the kernel. This will decrease the 
>> reliability of the kernel.
>
> Andi pointed this out, too.
>
> For these emergency cases (I'll change "SYS_INSANE" to "SYS_EMERGENCY")
> I probably should be using raw_notifier_call_chain(), which requires
> a slightly different form of list header but doesn't try to protect
> against someone else adding to the notifier list.
>> 
>> what exactly are you trying to achieve?
>> 
>> 	Ingo
>
> The impetus for these additions is to call back a driver in every case
> that the kernel is going down.  In a partitioned system we need such a
> driver to inform all other partitions that they need to disconnect from
> the rebooting/halting/panicing partition (kernel image).  If they are not
> informed, they may bring themselves crashing down as well.
> (xpc is such a cross_partition driver)

Cool.  Someone who wants this kind of functionality and has code in
the kernel.  Perhaps we can have a reasonable discussion about this.
The last time this came up people wanted a hook so they could support
their out of tree blobs in an enterprise kernel.

emergency_restart only happens or only should happen with explicit admin
request Sysreq-r.  Or when a watchdog detects the system is borked.
By design it is not expected to call drivers.  The kexec on panic
case is similar.

sys_kexec_load is a ridiculous place to call any kind of reboot notifier
because it is a prep function that doesn't require any kind of connection
to rebooting.

As far as this being a generic problem I half agree.  It seems to depend
on your platform if you need something like this.

With that said.  I suggest we have a single platform specific function 
that can be called.  Possibly something like pm_power_off.  It is
insanely important that we be able to audit all of the code on these
code paths, and that a random loadable driver not be able to get in
and mess things up.

Eric

  reply	other threads:[~2008-06-19 22:01 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-18 22:03 [PATCH] X86: reboot-notify additions Cliff Wickman
2008-06-19  3:02 ` Andi Kleen
2008-06-19 11:02 ` Ingo Molnar
2008-06-19 14:54   ` Cliff Wickman
2008-06-19 21:50     ` Eric W. Biederman [this message]
2008-06-20 15:16       ` Cliff Wickman
2008-06-20 15:43         ` Ingo Molnar
2008-06-20 18:01         ` Eric W. Biederman
2008-06-20 19:18           ` Cliff Wickman
2008-06-20 21:33             ` Eric W. Biederman
2008-06-20 11:45     ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m1d4mdksau.fsf@frodo.ebiederm.org \
    --to=ebiederm@xmission.com \
    --cc=andi@firstfloor.org \
    --cc=cpw@sgi.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.