public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Andi Kleen <andi@firstfloor.org>,
	Lucas De Marchi <lucas.de.marchi@gmail.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Paul Mackerras <paulus@samba.org>,
	david@gibson.dropbear.id.au, Kees Cook <keescook@chromium.org>,
	Serge Hallyn <serge.hallyn@canonical.com>,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	Feng Hong <hongfeng@marvell.com>,
	Lucas De Marchi <lucas.demarchi@profusion.mobi>
Subject: Re: [PATCH 1/1] poweroff: change orderly_poweroff() to use schedule_work()
Date: Fri, 15 Mar 2013 17:39:16 +0100	[thread overview]
Message-ID: <20130315163916.GA31995@redhat.com> (raw)
In-Reply-To: <20130314152819.7fb1242b493e8bad2d34671b@linux-foundation.org>

On 03/14, Andrew Morton wrote:
>
> On Wed, 13 Mar 2013 18:47:05 +0100 Oleg Nesterov <oleg@redhat.com> wrote:
>
> > This means that orderly_poweroff() becomes async even if we do not
> > run the command and always succeeds, schedule_work() can only fail
> > if the work is already pending. We can export __orderly_poweroff()
> > and change the non-atomic callers which want the old semantics.
> >
> > ...
> >
> > @@ -2218,21 +2237,9 @@ static int __orderly_poweroff(void)
> >   */
> >  int orderly_poweroff(bool force)
> >  {
> > +	if (force) /* do not override the pending "true" */
> > +		poweroff_force = true;
> > +	schedule_work(&poweroff_work);
> > +	return 0;
> >  }
>
> afaict the current version of orderly_poweroff() will never return -
> either __orderly_poweroff() will block until the machine shuts down or
> kernel_power_off() will do so.

Note that __orderly_poweroff() uses UMH_WAIT_EXEC, not UMH_WAIT_PROC,
so it returns right after /sbin/poweroff starts to execute.

So it is already asynchronous unless execve() fails.

> However with this patch there is a path via which orderly_poweroff()
> can return to its caller, I think?

See above, but please also read the changelog.

With this patch orderly_poweroff() is always async, even if exec fails,
but

> If so, the caller might be rather
> surprised and we're exercising never-before-used code paths.  In fact
> if the surprised caller goes oops, the poweroff might not occur at all.

This should not happen.

Anyway. Please also note that now we can export __orderly_poweroff() and
probably change it, it can have another argument "bool use_UMH_WAIT_PROC".

	int __orderly_poweroff(bool force, bool sync)
	{
		int wait = sync ? UMH_WAIT_EXEC : UMH_WAIT_EXEC;

		ret = call_usermodehelper(argv[0], argv, envp, wait);

		if (force) {
			// EXEC failed or /sbin/poweroff didn't do its work
			if (ret || sync)
				kernel_power_off();
		}
	}

The non-atomic callers can use __orderly_poweroff(sync => true).

---------------------------------------------------------------------------
And, Andrew, et all... Could you help with another mentioned problem? It is
really trivial, but exactly because it is trivial I do not know what should
I do.

To remind, say, argv_split(poweroff_cmd) can race with sysctl changing this
string, in this case it can write to the memory after argv[] array. We can
fix this, or we can rewrite argv_split/free:

	void argv_free(char **argv)
	{
		kfree(argv[-1]);
		kfree(argv);
	}

	char **argv_split(gfp_t gfp, const char *str, int *argcp)
	{
		char *argv_str;
		bool was_space;
		char **argv, **argv_ret;
		int argc;

		argv_str = kstrndup(str, KMALLOC_MAX_SIZE, gfp);
		if (!argv_str)
			return NULL;

		argc = count_argc(argv_str);
		argv = kmalloc(sizeof(*argv) * (argc + 2), gfp);
		if (!argv) {
			kfree(argv_str);
			return NULL;
		}

		*argv = argv_str;
		argv_ret = ++argv;
		for (was_space = true; *argv_str; argv_str++) {
			if (isspace(*argv_str)) {
				was_space = true;
				*argv_str = 0;
			} else if (was_space) {
				was_space = false;
				*argv++ = argv_str;
			}
		}
		*argv = NULL;

		if (argcp)
			*argcp = argc;
		return argv_ret;
	}

This way it uses a single kstrndup() to keep the arguments and it is
always safe.

But, whatever we do with argv_split(), it can hit the string "in between".
Personally I think we do not really care, but...

Perhaps we should add proc_dostring_lock() which takes some lock and
modify the callers of argv_split() (or add argv_split_lock) ?

Or perhaps we should introduce the rwsem which should protect every
sysctl-string and proc_dostring() should take this lock?

Help! I'd prefer to rewrite argv_split(), but I agree with any suggestion
in advance.

Oleg.


  reply	other threads:[~2013-03-15 16:41 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-12  3:25 Regression with orderly_poweroff() Benjamin Herrenschmidt
2013-03-12 14:46 ` Linus Torvalds
2013-03-12 17:46   ` Oleg Nesterov
2013-03-12 17:54   ` Lucas De Marchi
2013-03-12 18:22     ` Oleg Nesterov
2013-03-12 18:42       ` Linus Torvalds
2013-03-12 19:11         ` Oleg Nesterov
2013-03-12 19:20           ` Linus Torvalds
2013-03-12 20:35             ` Oleg Nesterov
2013-03-13 17:46               ` [PATCH 0/1] poweroff: change orderly_poweroff() to use schedule_work() Oleg Nesterov
2013-03-13 17:47                 ` [PATCH 1/1] " Oleg Nesterov
2013-03-14 22:28                   ` Andrew Morton
2013-03-15 16:39                     ` Oleg Nesterov [this message]
2013-03-16 20:23                       ` [PATCH 0/2] finx argv_split() vs sysctl race Oleg Nesterov
2013-03-16 20:23                         ` [PATCH 1/2] teach argv_split() to handle the mutable strings Oleg Nesterov
2013-03-18 16:03                           ` [PATCH v2 " Oleg Nesterov
2013-03-18 21:53                           ` [PATCH " Andrew Morton
2013-03-19 19:54                             ` [PATCH -mm] argv_split-teach-it-to-handle-mutable-strings-fix-2 Oleg Nesterov
2013-03-16 20:24                         ` [PATCH 2/2] set_task_comm: kill the pointless memset() + wmb() Oleg Nesterov
2013-03-16 20:32                         ` [PATCH 0/2] finx argv_split() vs sysctl race Andi Kleen
2013-03-16 20:45                           ` Oleg Nesterov
2013-03-16 20:56                             ` Andi Kleen
2013-03-16 21:23                               ` Oleg Nesterov
2013-03-16 21:54                                 ` Andi Kleen
2013-03-17 14:15                                   ` Oleg Nesterov
2013-03-18 16:03                                     ` Oleg Nesterov
2013-03-13 23:35                 ` [PATCH 0/1] poweroff: change orderly_poweroff() to use schedule_work() Lucas De Marchi
2013-03-12 20:13           ` Regression with orderly_poweroff() Andi Kleen
2013-03-12 19:28   ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130315163916.GA31995@redhat.com \
    --to=oleg@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=benh@kernel.crashing.org \
    --cc=david@gibson.dropbear.id.au \
    --cc=hongfeng@marvell.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lucas.de.marchi@gmail.com \
    --cc=lucas.demarchi@profusion.mobi \
    --cc=paulus@samba.org \
    --cc=rjw@sisk.pl \
    --cc=serge.hallyn@canonical.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox