All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Andi Kleen <andi@firstfloor.org>,
	Lucas De Marchi <lucas.de.marchi@gmail.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Paul Mackerras <paulus@samba.org>,
	david@gibson.dropbear.id.au, Kees Cook <keescook@chromium.org>,
	Serge Hallyn <serge.hallyn@canonical.com>,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	Feng Hong <hongfeng@marvell.com>,
	Lucas De Marchi <lucas.demarchi@profusion.mobi>
Subject: Re: [PATCH 1/1] poweroff: change orderly_poweroff() to use schedule_work()
Date: Fri, 15 Mar 2013 17:39:16 +0100	[thread overview]
Message-ID: <20130315163916.GA31995@redhat.com> (raw)
In-Reply-To: <20130314152819.7fb1242b493e8bad2d34671b@linux-foundation.org>

On 03/14, Andrew Morton wrote:
>
> On Wed, 13 Mar 2013 18:47:05 +0100 Oleg Nesterov <oleg@redhat.com> wrote:
>
> > This means that orderly_poweroff() becomes async even if we do not
> > run the command and always succeeds, schedule_work() can only fail
> > if the work is already pending. We can export __orderly_poweroff()
> > and change the non-atomic callers which want the old semantics.
> >
> > ...
> >
> > @@ -2218,21 +2237,9 @@ static int __orderly_poweroff(void)
> >   */
> >  int orderly_poweroff(bool force)
> >  {
> > +	if (force) /* do not override the pending "true" */
> > +		poweroff_force = true;
> > +	schedule_work(&poweroff_work);
> > +	return 0;
> >  }
>
> afaict the current version of orderly_poweroff() will never return -
> either __orderly_poweroff() will block until the machine shuts down or
> kernel_power_off() will do so.

Note that __orderly_poweroff() uses UMH_WAIT_EXEC, not UMH_WAIT_PROC,
so it returns right after /sbin/poweroff starts to execute.

So it is already asynchronous unless execve() fails.

> However with this patch there is a path via which orderly_poweroff()
> can return to its caller, I think?

See above, but please also read the changelog.

With this patch orderly_poweroff() is always async, even if exec fails,
but

> If so, the caller might be rather
> surprised and we're exercising never-before-used code paths.  In fact
> if the surprised caller goes oops, the poweroff might not occur at all.

This should not happen.

Anyway. Please also note that now we can export __orderly_poweroff() and
probably change it, it can have another argument "bool use_UMH_WAIT_PROC".

	int __orderly_poweroff(bool force, bool sync)
	{
		int wait = sync ? UMH_WAIT_EXEC : UMH_WAIT_EXEC;

		ret = call_usermodehelper(argv[0], argv, envp, wait);

		if (force) {
			// EXEC failed or /sbin/poweroff didn't do its work
			if (ret || sync)
				kernel_power_off();
		}
	}

The non-atomic callers can use __orderly_poweroff(sync => true).

---------------------------------------------------------------------------
And, Andrew, et all... Could you help with another mentioned problem? It is
really trivial, but exactly because it is trivial I do not know what should
I do.

To remind, say, argv_split(poweroff_cmd) can race with sysctl changing this
string, in this case it can write to the memory after argv[] array. We can
fix this, or we can rewrite argv_split/free:

	void argv_free(char **argv)
	{
		kfree(argv[-1]);
		kfree(argv);
	}

	char **argv_split(gfp_t gfp, const char *str, int *argcp)
	{
		char *argv_str;
		bool was_space;
		char **argv, **argv_ret;
		int argc;

		argv_str = kstrndup(str, KMALLOC_MAX_SIZE, gfp);
		if (!argv_str)
			return NULL;

		argc = count_argc(argv_str);
		argv = kmalloc(sizeof(*argv) * (argc + 2), gfp);
		if (!argv) {
			kfree(argv_str);
			return NULL;
		}

		*argv = argv_str;
		argv_ret = ++argv;
		for (was_space = true; *argv_str; argv_str++) {
			if (isspace(*argv_str)) {
				was_space = true;
				*argv_str = 0;
			} else if (was_space) {
				was_space = false;
				*argv++ = argv_str;
			}
		}
		*argv = NULL;

		if (argcp)
			*argcp = argc;
		return argv_ret;
	}

This way it uses a single kstrndup() to keep the arguments and it is
always safe.

But, whatever we do with argv_split(), it can hit the string "in between".
Personally I think we do not really care, but...

Perhaps we should add proc_dostring_lock() which takes some lock and
modify the callers of argv_split() (or add argv_split_lock) ?

Or perhaps we should introduce the rwsem which should protect every
sysctl-string and proc_dostring() should take this lock?

Help! I'd prefer to rewrite argv_split(), but I agree with any suggestion
in advance.

Oleg.


  reply	other threads:[~2013-03-15 16:41 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-12  3:25 Regression with orderly_poweroff() Benjamin Herrenschmidt
2013-03-12 14:46 ` Linus Torvalds
2013-03-12 17:46   ` Oleg Nesterov
2013-03-12 17:54   ` Lucas De Marchi
2013-03-12 18:22     ` Oleg Nesterov
2013-03-12 18:42       ` Linus Torvalds
2013-03-12 19:11         ` Oleg Nesterov
2013-03-12 19:20           ` Linus Torvalds
2013-03-12 20:35             ` Oleg Nesterov
2013-03-13 17:46               ` [PATCH 0/1] poweroff: change orderly_poweroff() to use schedule_work() Oleg Nesterov
2013-03-13 17:47                 ` [PATCH 1/1] " Oleg Nesterov
2013-03-14 22:28                   ` Andrew Morton
2013-03-15 16:39                     ` Oleg Nesterov [this message]
2013-03-16 20:23                       ` [PATCH 0/2] finx argv_split() vs sysctl race Oleg Nesterov
2013-03-16 20:23                         ` [PATCH 1/2] teach argv_split() to handle the mutable strings Oleg Nesterov
2013-03-18 16:03                           ` [PATCH v2 " Oleg Nesterov
2013-03-18 21:53                           ` [PATCH " Andrew Morton
2013-03-19 19:54                             ` [PATCH -mm] argv_split-teach-it-to-handle-mutable-strings-fix-2 Oleg Nesterov
2013-03-16 20:24                         ` [PATCH 2/2] set_task_comm: kill the pointless memset() + wmb() Oleg Nesterov
2013-03-16 20:32                         ` [PATCH 0/2] finx argv_split() vs sysctl race Andi Kleen
2013-03-16 20:45                           ` Oleg Nesterov
2013-03-16 20:56                             ` Andi Kleen
2013-03-16 21:23                               ` Oleg Nesterov
2013-03-16 21:54                                 ` Andi Kleen
2013-03-17 14:15                                   ` Oleg Nesterov
2013-03-18 16:03                                     ` Oleg Nesterov
2013-03-13 23:35                 ` [PATCH 0/1] poweroff: change orderly_poweroff() to use schedule_work() Lucas De Marchi
2013-03-12 20:13           ` Regression with orderly_poweroff() Andi Kleen
2013-03-12 19:28   ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130315163916.GA31995@redhat.com \
    --to=oleg@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=benh@kernel.crashing.org \
    --cc=david@gibson.dropbear.id.au \
    --cc=hongfeng@marvell.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lucas.de.marchi@gmail.com \
    --cc=lucas.demarchi@profusion.mobi \
    --cc=paulus@samba.org \
    --cc=rjw@sisk.pl \
    --cc=serge.hallyn@canonical.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.