* Re: What's the best way to call sd_shutdown() on all SCSI disks on shutdown?
2018-07-30 19:17 What's the best way to call sd_shutdown() on all SCSI disks on shutdown? Theodore Y. Ts'o
@ 2018-07-30 20:35 ` Bart Van Assche
2018-07-30 20:35 ` James Bottomley
1 sibling, 0 replies; 3+ messages in thread
From: Bart Van Assche @ 2018-07-30 20:35 UTC (permalink / raw)
To: linux-scsi@vger.kernel.org, linux-block@vger.kernel.org,
tytso@mit.edu
On Mon, 2018-07-30 at 15:17 -0400, Theodore Y. Ts'o wrote:
> I've been looking at what's the best way to make sure everything gets
> cleanly flushed out to disk on a powerdown. Right now in
> __orderly_poweroff(), we call emergency_sync() which kicks a workqueue
> to flush all file systems and block devices --- and then we
> immediately power down the system, before the scheduler even has a
> chance to schedule the workqueue thread. Hopefully userspace has
> unmounted all file systems, which will have implicitly issued a cache
> flush command, but if we have a userspace program writing to a block
> device directly, currently there's nothing to make sure things will
> get flushed out to the device.
>
> Beyond that, though, I'm interested in figuring out how to make sure
> that all SCSI devices will receive (and acknowledge) SHUTDOWN command
> so that the disks can be spun down and heads retracted to a safe
> landing zone before we power down the system.
>
> It appears the best way to do this is to call sd_shutdown(), since we
> don't seem to have a high-level "shutdown" concept recognized in the
> block layer (the way we currently have, say, support for "discard").
>
> So the question is, what's the best way to architect something like
> this. I could implement a hacky iterator loop in the SCSI subsystem,
> and call it directly from __orderly_poweroff in kernel/reboot.c. But
> I'm pretty sure that would never get accepted upstream, and so it
> would remain a Google data center hack.
>
> What do people think would be the best way of implementing something
> that would be upstream acceptable?
Hi Ted,
Isn't that behavior a bug in __orderly_poweroff()? My understanding is that
__orderly_poweroff() calls run_cmd(poweroff_cmd). If the poweroff command
gets the chance to run then it will execute the reboot() system call to power
off the system. The reboot() system call then calls kernel_power_off(). That
last function then calls device_shutdown(). device_shutdown() calls the
.shutdown() method for all known devices. As you know the sd driver
implements that method. In more detail:
architecture code or ACPI code
-> orderly_poweroff()
  -> poweroff_work_func()
    -> __orderly_poweroff()
      -> run_cmd(poweroff_cmd)
        -> call_usermodehelper("/sbin/poweroff")
From the systemd source file src/core/shutdown.c:
	(void) reboot(RB_POWER_OFF);
From kernel/reboot.c:
SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
		void __user *, arg)
{
	...
	case LINUX_REBOOT_CMD_POWER_OFF:
		kernel_power_off();
		do_exit(0);
		break;
	...
}
void kernel_power_off(void)
{
	kernel_shutdown_prepare(SYSTEM_POWER_OFF);
	if (pm_power_off_prepare)
		pm_power_off_prepare();
	migrate_to_reboot_cpu();
	syscore_shutdown();
	pr_emerg("Power down\n");
	kmsg_dump(KMSG_DUMP_POWEROFF);
	machine_power_off();
}
static void kernel_shutdown_prepare(enum system_states state)
{
	blocking_notifier_call_chain(&reboot_notifier_list,
		(state == SYSTEM_HALT) ? SYS_HALT : SYS_POWER_OFF, NULL);
	system_state = state;
	usermodehelper_disable();
	device_shutdown();
}
From drivers/base/core.c:
void device_shutdown(void)
{
	...
		if (dev->class && dev->class->shutdown_pre) {
			if (initcall_debug)
				dev_info(dev, "shutdown_pre\n");
			dev->class->shutdown_pre(dev);
		}
		if (dev->bus && dev->bus->shutdown) {
			if (initcall_debug)
				dev_info(dev, "shutdown\n");
			dev->bus->shutdown(dev);
		} else if (dev->driver && dev->driver->shutdown) {
			if (initcall_debug)
				dev_info(dev, "shutdown\n");
			dev->driver->shutdown(dev);
		}
	...
}
Bart.
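The dispatch order in the device_shutdown() excerpt above can be modeled in userspace. What follows is only a minimal sketch under stated assumptions, not kernel code: every toy_* name is hypothetical, the device list is a plain array walked from last to first, and the callbacks merely append to a log so the order is visible. It shows the class shutdown_pre hook running first, and a bus-level shutdown taking precedence over the driver's own shutdown method.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy stand-ins for the kernel structures (not the real ones). */
struct toy_device;
struct toy_class  { void (*shutdown_pre)(struct toy_device *dev); };
struct toy_bus    { void (*shutdown)(struct toy_device *dev); };
struct toy_driver { void (*shutdown)(struct toy_device *dev); };

struct toy_device {
	const char *name;
	const struct toy_class *class;
	const struct toy_bus *bus;
	const struct toy_driver *driver;
};

/* Records "<callback>:<device>;" for every callback that runs. */
static char toy_log[256];

static void toy_note(const char *what, const struct toy_device *dev)
{
	size_t used = strlen(toy_log);

	snprintf(toy_log + used, sizeof(toy_log) - used, "%s:%s;", what, dev->name);
}

static void toy_class_pre(struct toy_device *dev)       { toy_note("pre", dev); }
static void toy_bus_shutdown(struct toy_device *dev)    { toy_note("bus", dev); }
static void toy_driver_shutdown(struct toy_device *dev) { toy_note("driver", dev); }

/* Same branch structure as the excerpt: class pre-hook, then bus OR driver. */
static void toy_device_shutdown(struct toy_device *devs, size_t n)
{
	assert(devs != NULL || n == 0);

	/* Assumption of this sketch: the most recently added device goes first. */
	for (size_t i = n; i-- > 0; ) {
		struct toy_device *dev = &devs[i];

		if (dev->class && dev->class->shutdown_pre)
			dev->class->shutdown_pre(dev);

		if (dev->bus && dev->bus->shutdown)
			dev->bus->shutdown(dev);
		else if (dev->driver && dev->driver->shutdown)
			dev->driver->shutdown(dev);
	}
}

/* Builds three made-up devices and returns the order the callbacks ran in. */
static const char *toy_demo(void)
{
	static const struct toy_class cls = { .shutdown_pre = toy_class_pre };
	static const struct toy_bus bus = { .shutdown = toy_bus_shutdown };
	static const struct toy_driver drv = { .shutdown = toy_driver_shutdown };
	struct toy_device devs[] = {
		{ .name = "dev-a", .driver = &drv },
		{ .name = "dev-b", .bus = &bus, .driver = &drv },
		{ .name = "dev-c", .class = &cls, .driver = &drv },
	};

	toy_log[0] = '\0';
	toy_device_shutdown(devs, sizeof(devs) / sizeof(devs[0]));
	return toy_log;
}
```

Calling toy_demo() returns the log string. For dev-b the driver callback never runs because its bus provides a shutdown method, which is the else-if in the excerpt.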
* Re: What's the best way to call sd_shutdown() on all SCSI disks on shutdown?
From: James Bottomley @ 2018-07-30 20:35 UTC (permalink / raw)
To: Theodore Y. Ts'o, linux-scsi, linux-block
On Mon, 2018-07-30 at 15:17 -0400, Theodore Y. Ts'o wrote:
> I've been looking at what's the best way to make sure everything gets
> cleanly flushed out to disk on a powerdown. Right now in
> __orderly_poweroff(), we call emergency_sync() which kicks a
> workqueue to flush all file systems and block devices --- and then we
> immediately power down the system, before the scheduler even has a
> chance to schedule the workqueue thread. Hopefully userspace has
> unmounted all file systems, which will have implicitly issued a cache
> flush command, but if we have a userspace program writing to a block
> device directly, currently there's nothing to make sure things will
> get flushed out to the device.
>
> Beyond that, though, I'm interested in figuring out how to make sure
> that all SCSI devices will receive (and acknowledge) SHUTDOWN command
> so that the disks can be spun down and heads retracted to a safe
> landing zone before we power down the system.
The basic way to do this is to shut down the scsi bus, see below.
> It appears the best way to do this is to call sd_shutdown(), since we
> don't seem to have a high-level "shutdown" concept recognized in the
> block layer (the way we currently have, say, support for "discard").
>
> So the question is, what's the best way to architect something like
> this. I could implement a hacky iterator loop in the SCSI
> subsystem, and call it directly from __orderly_poweroff in
> kernel/reboot.c. But I'm pretty sure that would never get accepted
> upstream, and so it would remain a Google data center hack.
>
> What do people think would be the best way of implementing something
> that would be upstream acceptable?
The sd_shutdown function is fully plumbed in to the current sysfs model
with every scsi device being on a dummy scsi bus. So if you detach the
device from the scsi bus, the remove function (which calls sd_shutdown)
gets called as part of the detach. At the moment, the way that happens
is either by specific detach of the device or via the module_exit
function of SCSI, so if you can get that called before the system shuts
down everything should just work. To be honest, I really thought this
did actually happen anyway today. The separate device_shutdown()
call in kernel_shutdown_prepare() should (eventually) call our sd_shutdown
method. Can you investigate why that isn't working for you? Is it being
called too late?
Alternatively, if you can find a way to get sysfs to trigger a shutdown
on all its busses at some point then we'll get swept up in that.
Finally, you could keep a list of busses needing to be shut down for
storage safety and we could add scsi to that.
James
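The last alternative above, keeping a list of busses that need to be shut down for storage safety, could look roughly like the following. This is a userspace sketch only, under loud assumptions: none of these names (storage_safety_register, storage_safety_shutdown_all, count_hook) exist in the kernel, the list is a fixed-size array, and running hooks in reverse registration order is a choice made for the sketch, not something stated in the thread.

```c
#include <assert.h>
#include <stddef.h>

#define SAFE_LIST_MAX 8

/* Hypothetical hook type: one callback per bus that wants a safe shutdown. */
typedef void (*safe_shutdown_fn)(void *ctx);

struct safe_entry {
	const char *name;	/* label only, e.g. "scsi" */
	safe_shutdown_fn fn;
	void *ctx;
};

static struct safe_entry safe_list[SAFE_LIST_MAX];
static size_t safe_count;

/* Adds a hook to the list. Returns 0 on success, -1 if fn is NULL or the list is full. */
static int storage_safety_register(const char *name, safe_shutdown_fn fn, void *ctx)
{
	if (fn == NULL || safe_count >= SAFE_LIST_MAX)
		return -1;

	safe_list[safe_count].name = name;
	safe_list[safe_count].fn = fn;
	safe_list[safe_count].ctx = ctx;
	safe_count++;
	return 0;
}

/*
 * Runs every registered hook once, newest first, then empties the list so a
 * second call does nothing. Returns the number of hooks that ran.
 */
static size_t storage_safety_shutdown_all(void)
{
	size_t ran = 0;

	assert(safe_count <= SAFE_LIST_MAX);

	while (safe_count > 0) {
		struct safe_entry *e = &safe_list[--safe_count];

		e->fn(e->ctx);
		ran++;
	}
	return ran;
}

/* Example hook for illustration: just counts how often it was invoked. */
static void count_hook(void *ctx)
{
	++*(int *)ctx;
}
```

A caller would register one hook per bus during setup and invoke storage_safety_shutdown_all() once on the poweroff path. Whether such a list is preferable to relying on device_shutdown() is exactly the open question in this thread.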