* What's the best way to call sd_shutdown() on all SCSI disks on shutdown?
@ 2018-07-30 19:17 Theodore Y. Ts'o
  2018-07-30 20:35 ` Bart Van Assche
  2018-07-30 20:35 ` James Bottomley
  0 siblings, 2 replies; 3+ messages in thread
From: Theodore Y. Ts'o @ 2018-07-30 19:17 UTC
  To: linux-scsi, linux-block

I've been looking at what's the best way to make sure everything gets
cleanly flushed out to disk on a powerdown.  Right now in
__orderly_poweroff(), we call emergency_sync() which kicks a workqueue
to flush all file systems and block devices --- and then we
immediately power down the system, before the scheduler even has a
chance to schedule the workqueue thread.  Hopefully userspace has
unmounted all file systems, which will have implicitly issued a cache
flush command, but if we have a userspace program writing to a block
device directly, currently there's nothing to make sure things will
get flushed out to the device.
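
For the raw-writer case described above, the writing program can at least
request the flush itself. Below is a minimal userspace sketch, assuming the
writer can simply call fsync() on its own file descriptor. On a block device
node, fsync() is expected to write back dirty pages and ask the device to flush
its volatile write cache. The helper name write_and_flush() is hypothetical and
not part of any existing API.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes from `buf` at the start of `path`
 * (a regular file or a block device node that already exists), then fsync()
 * so the data is pushed to stable storage before we return.
 * Returns 0 on success or a negative errno value on failure.
 */
int write_and_flush(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int fd, ret = 0;

	assert(path != NULL && buf != NULL);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* write() may be short or interrupted, so loop until done. */
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			ret = -errno;
			goto out;
		}
		p += n;
		len -= (size_t)n;
	}

	/* Without this, the data may still sit in the page cache at poweroff. */
	if (fsync(fd) < 0)
		ret = -errno;
out:
	if (close(fd) < 0 && ret == 0)
		ret = -errno;
	return ret;
}
```

This only helps a cooperating writer. It does not address the kernel-side
problem of powering off before the emergency_sync() work has run.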

Beyond that, though, I'm interested in figuring out how to make sure
that all SCSI devices will receive (and acknowledge) a SHUTDOWN command
so that the disks can be spun down and heads retracted to a safe
landing zone before we power down the system.

It appears the best way to do this is to call sd_shutdown(), since we
don't seem to have a high-level "shutdown" concept recognized in the
block layer (the way we currently have, say, support for "discard").

So the question is, what's the best way to architect something like
this?  I could implement a hacky iterator loop in the SCSI subsystem,
and call it directly from __orderly_poweroff in kernel/reboot.c.  But
I'm pretty sure that would never get accepted upstream, and so it
would remain a Google data center hack.

What do people think would be the best way of implementing something
that would be upstream acceptable?

Thanks,

						- Ted


* Re: What's the best way to call sd_shutdown() on all SCSI disks on shutdown?
From: Bart Van Assche @ 2018-07-30 20:35 UTC
  To: linux-scsi@vger.kernel.org, linux-block@vger.kernel.org,
	tytso@mit.edu

On Mon, 2018-07-30 at 15:17 -0400, Theodore Y. Ts'o wrote:
> I've been looking at what's the best way to make sure everything gets
> cleanly flushed out to disk on a powerdown.  Right now in
> __orderly_poweroff(), we call emergency_sync() which kicks a workqueue
> to flush all file systems and block devices --- and then we
> immediately power down the system, before the scheduler even has a
> chance to schedule the workqueue thread.  Hopefully userspace has
> unmounted all file systems, which will have implicitly issued a cache
> flush command, but if we have a userspace program writing to a block
> device directly, currently there's nothing to make sure things will
> get flushed out to the device.
>
> Beyond that, though, I'm interested in figuring out how to make sure
> that all SCSI devices will receive (and acknowledge) a SHUTDOWN command
> so that the disks can be spun down and heads retracted to a safe
> landing zone before we power down the system.
>
> It appears the best way to do this is to call sd_shutdown(), since we
> don't seem to have a high-level "shutdown" concept recognized in the
> block layer (the way we currently have, say, support for "discard").
>
> So the question is, what's the best way to architect something like
> this?  I could implement a hacky iterator loop in the SCSI subsystem,
> and call it directly from __orderly_poweroff in kernel/reboot.c.  But
> I'm pretty sure that would never get accepted upstream, and so it
> would remain a Google data center hack.
>
> What do people think would be the best way of implementing something
> that would be upstream acceptable?

Hi Ted,

Isn't that behavior a bug in __orderly_poweroff()? My understanding is that
__orderly_poweroff() calls run_cmd(poweroff_cmd). If the poweroff command
gets the chance to run then it will execute the reboot() system call to power
off the system. The reboot() system call then calls kernel_power_off(). That
last function then calls device_shutdown(). device_shutdown() calls the
.shutdown() method for all known devices. As you know the sd driver
implements that method. More in detail:

architecture code or ACPI code
-> orderly_poweroff()
  -> poweroff_work_func()
    -> __orderly_poweroff()
      -> run_cmd(poweroff_cmd)
        -> call_usermodehelper("/sbin/poweroff")

From the systemd source file src/core/shutdown.c:

    (void) reboot(RB_POWER_OFF);

From kernel/reboot.c:

SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
		void __user *, arg)
{
	...
	case LINUX_REBOOT_CMD_POWER_OFF:
		kernel_power_off();
		do_exit(0);
		break;
	...
}

void kernel_power_off(void)
{
	kernel_shutdown_prepare(SYSTEM_POWER_OFF);
	if (pm_power_off_prepare)
		pm_power_off_prepare();
	migrate_to_reboot_cpu();
	syscore_shutdown();
	pr_emerg("Power down\n");
	kmsg_dump(KMSG_DUMP_POWEROFF);
	machine_power_off();
}

static void kernel_shutdown_prepare(enum system_states state)
{
	blocking_notifier_call_chain(&reboot_notifier_list,
		(state == SYSTEM_HALT) ? SYS_HALT : SYS_POWER_OFF, NULL);
	system_state = state;
	usermodehelper_disable();
	device_shutdown();
}

From drivers/base/core.c:

void device_shutdown(void)
{
		...
		if (dev->class && dev->class->shutdown_pre) {
			if (initcall_debug)
				dev_info(dev, "shutdown_pre\n");
			dev->class->shutdown_pre(dev);
		}
		if (dev->bus && dev->bus->shutdown) {
			if (initcall_debug)
				dev_info(dev, "shutdown\n");
			dev->bus->shutdown(dev);
		} else if (dev->driver && dev->driver->shutdown) {
			if (initcall_debug)
				dev_info(dev, "shutdown\n");
			dev->driver->shutdown(dev);
		}
		...
}
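
The dispatch rule in the excerpt above can be modelled in plain userspace C.
This is only a simplified sketch: the model_* types and functions are
hypothetical stand-ins for the kernel structures, locking and the
class->shutdown_pre hook are left out, and the newest-first walk reflects my
understanding that device_shutdown() traverses the device list backwards. The
point it illustrates is that a bus-level .shutdown takes precedence, and the
driver-level .shutdown is used only when the bus has none.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct model_device;

/* Hypothetical stand-ins for struct bus_type and struct device_driver. */
struct model_bus {
	const char *name;
	void (*shutdown)(struct model_device *dev);
};

struct model_driver {
	const char *name;
	void (*shutdown)(struct model_device *dev);
};

struct model_device {
	const char *name;
	const struct model_bus *bus;
	const struct model_driver *driver;
	char via;	/* 'b' = bus hook ran, 'd' = driver hook ran, 0 = none */
	int order;	/* 1-based position in the shutdown sequence, 0 = skipped */
};

void model_bus_shutdown(struct model_device *dev)
{
	dev->via = 'b';
	printf("%s: shutdown via bus %s\n", dev->name, dev->bus->name);
}

void model_driver_shutdown(struct model_device *dev)
{
	dev->via = 'd';
	printf("%s: shutdown via driver %s\n", dev->name, dev->driver->name);
}

/*
 * Walk the devices from most recently registered to oldest and invoke at
 * most one shutdown hook per device.  Returns the number of hooks invoked.
 */
int model_device_shutdown(struct model_device *devs, size_t n)
{
	int called = 0;

	assert(devs != NULL || n == 0);

	for (size_t i = n; i-- > 0; ) {
		struct model_device *dev = &devs[i];

		if (dev->bus && dev->bus->shutdown) {
			dev->bus->shutdown(dev);
			dev->order = ++called;
		} else if (dev->driver && dev->driver->shutdown) {
			dev->driver->shutdown(dev);
			dev->order = ++called;
		}
	}
	return called;
}
```

In this model a disk whose bus has no shutdown hook is handled by its driver's
hook. That appears to be the path by which sd_shutdown() would be reached, but
it is worth confirming against the tree in question.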

Bart.


* Re: What's the best way to call sd_shutdown() on all SCSI disks on shutdown?
From: James Bottomley @ 2018-07-30 20:35 UTC
  To: Theodore Y. Ts'o, linux-scsi, linux-block

On Mon, 2018-07-30 at 15:17 -0400, Theodore Y. Ts'o wrote:
> I've been looking at what's the best way to make sure everything gets
> cleanly flushed out to disk on a powerdown.  Right now in
> __orderly_poweroff(), we call emergency_sync() which kicks a
> workqueue to flush all file systems and block devices --- and then we
> immediately power down the system, before the scheduler even has a
> chance to schedule the workqueue thread.  Hopefully userspace has
> unmounted all file systems, which will have implicitly issued a cache
> flush command, but if we have a userspace program writing to a block
> device directly, currently there's nothing to make sure things will
> get flushed out to the device.
> 
> Beyond that, though, I'm interested in figuring out how to make sure
> that all SCSI devices will receive (and acknowledge) a SHUTDOWN command
> so that the disks can be spun down and heads retracted to a safe
> landing zone before we power down the system.

The basic way to do this is to shut down the scsi bus, see below.

> It appears the best way to do this is to call sd_shutdown(), since we
> don't seem to have a high-level "shutdown" concept recognized in the
> block layer (the way we currently have, say, support for "discard").
> 
> So the question is, what's the best way to architect something like
> this?  I could implement a hacky iterator loop in the SCSI
> subsystem, and call it directly from __orderly_poweroff in
> kernel/reboot.c.  But I'm pretty sure that would never get accepted
> upstream, and so it would remain a Google data center hack.
> 
> What do people think would be the best way of implementing something
> that would be upstream acceptable?

The sd_shutdown function is fully plumbed in to the current sysfs model
with every scsi device being on a dummy scsi bus. So if you detach the
device from the scsi bus, the remove function (which calls sd_shutdown)
gets called as part of the detach.  At the moment, the way that happens
is either by specific detach of the device or via the module_exit
function of SCSI, so if you can get that called before the system shuts
down everything should just work.  To be honest, I really thought this
did actually happen anyway today.  The separate device_shutdown()
call in kernel_shutdown_prepare() should call our sd_shutdown
method (eventually).  Can you investigate why that isn't working for
you?  Is it being called too late?
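
As an illustration of the "specific detach of the device" path mentioned
above, here is a minimal userspace sketch. It assumes the per-device sysfs
attribute /sys/block/<disk>/device/delete, which SCSI devices expose. Writing
"1" to it detaches the device, and the detach path is where the remove
function (and so sd_shutdown) runs. The helper names are hypothetical. Doing
this on a live system needs root and makes the disk disappear, so treat it as
an experiment, not a shutdown mechanism.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: build "<sysfs_root>/block/<disk>/device/delete"
 * into `out`.  Returns 0, or -ENAMETOOLONG if `out` is too small.
 */
int scsi_delete_attr_path(char *out, size_t outlen,
			  const char *sysfs_root, const char *disk)
{
	int n;

	assert(out != NULL && sysfs_root != NULL && disk != NULL);

	n = snprintf(out, outlen, "%s/block/%s/device/delete",
		     sysfs_root, disk);
	if (n < 0 || (size_t)n >= outlen)
		return -ENAMETOOLONG;
	return 0;
}

/*
 * Hypothetical helper: write a string to an existing attribute file.
 * Returns 0 on success or a negative errno value on failure.
 */
int write_attr(const char *path, const char *value)
{
	size_t len = strlen(value);
	ssize_t n;
	int fd, ret = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, value, len);
	if (n < 0)
		ret = -errno;
	else if ((size_t)n != len)
		ret = -EIO;

	if (close(fd) < 0 && ret == 0)
		ret = -errno;
	return ret;
}
```

A caller would build the path with scsi_delete_attr_path(buf, sizeof(buf),
"/sys", "sdb") and then call write_attr(buf, "1"); the disk name "sdb" is only
an example.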

Alternatively, if you can find a way to get sysfs to trigger a shutdown
on all its busses at some point then we'll get swept up in that. 
Finally, you could keep a list of busses needing to be shut down for
storage safety and we could add scsi to that.

James
