From: Oleg Nesterov <oleg@redhat.com>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Ying Han <yinghan@google.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch 2/2] oom: fix race while temporarily setting current's oom_score_adj
Date: Tue, 30 Aug 2011 17:57:33 +0200
Message-ID: <20110830155733.GB22754@redhat.com>
In-Reply-To: <alpine.DEB.2.00.1108300041330.21066@chino.kir.corp.google.com>

On 08/30, David Rientjes wrote:
>
> Using that function to both set oom_score_adj to OOM_SCORE_ADJ_MAX and
> then reinstate the previous value is racy since it's possible that
> userspace can set the value to something else itself before the old value
> is reinstated.  That results in userspace setting current's oom_score_adj
> to a different value and then the kernel immediately setting it back to
> its previous value without notification.

Sure,

> To fix this, a new compare_swap_oom_score_adj() function is introduced
> with the same semantics as the compare and swap CAS instruction, or
> CMPXCHG on x86.  It is used to reinstate the previous value of
> oom_score_adj if and only if the present value is the same as the old
> value.

But this can't fix the race completely?

> +void compare_swap_oom_score_adj(int old_val, int new_val)
> +{
> +	struct sighand_struct *sighand = current->sighand;
> +
> +	spin_lock_irq(&sighand->siglock);
> +	if (current->signal->oom_score_adj == old_val)
> +		current->signal->oom_score_adj = new_val;
> +	spin_unlock_irq(&sighand->siglock);
> +}

So. This is used this way:

	old_val = test_set_oom_score_adj(OOM_SCORE_ADJ_MAX);

	do_something();

	compare_swap_oom_score_adj(OOM_SCORE_ADJ_MAX, old_val);

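What if userspace sets oom_score_adj = OOM_SCORE_ADJ_MAX in between?
Then the compare succeeds and the kernel still overwrites the value that
userspace just wrote. Maybe the callers should use OOM_SCORE_ADJ_MAX + 1
instead; this way we can't confuse our temporary value with a value set
from userspace...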
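
To make the interleaving concrete, here is a minimal userspace sketch of the
sequence above. It is not kernel code: the global oom_score_adj stands in for
current->signal->oom_score_adj, the siglock is omitted because the model is
single-threaded, and userspace_write() and replay_race() are hypothetical
helpers that exist only for this illustration.

```c
#define OOM_SCORE_ADJ_MIN	(-1000)
#define OOM_SCORE_ADJ_MAX	1000

/* Stand-in for current->signal->oom_score_adj (assumption: no locking). */
static int oom_score_adj;

/* Set a new value and return the previous one. */
static int test_set_oom_score_adj(int new_val)
{
	int old_val = oom_score_adj;

	oom_score_adj = new_val;
	return old_val;
}

/* Store new_val only if the current value still equals old_val. */
static void compare_swap_oom_score_adj(int old_val, int new_val)
{
	if (oom_score_adj == old_val)
		oom_score_adj = new_val;
}

/* Hypothetical: models a write to /proc/<pid>/oom_score_adj. */
static void userspace_write(int val)
{
	oom_score_adj = val;
}

/*
 * Replay the sequence with a userspace write in the middle and
 * return the value left behind once the kernel side is done.
 */
static int replay_race(int initial, int userspace_val)
{
	int old_val;

	oom_score_adj = initial;
	old_val = test_set_oom_score_adj(OOM_SCORE_ADJ_MAX);

	userspace_write(userspace_val);		/* the "in between" step */

	compare_swap_oom_score_adj(OOM_SCORE_ADJ_MAX, old_val);
	return oom_score_adj;
}
```

In this model replay_race(0, 500) leaves 500 in place, so the compare does
protect an ordinary userspace write. replay_race(0, OOM_SCORE_ADJ_MAX) ends
with 0, though: the write of OOM_SCORE_ADJ_MAX is indistinguishable from the
kernel's temporary value and is silently reverted.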

But in fact I am writing this email because I have a question.
Do we really need 2 helpers, and do we really need to allow setting
an arbitrary value?

I mean, perhaps we can do something like

	void set_oom_victim(bool on)
	{
		if (on) {
			oom_score_adj += ADJ_MAX - ADJ_MIN + 1;
		} else if (oom_score_adj > ADJ_MAX) {
			oom_score_adj -= ADJ_MAX - ADJ_MIN + 1;
		}
	}

Not sure this really makes sense, just curious.
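
A rough userspace model of this idea, to show why the offset avoids the
ambiguity. Again this is only a sketch under assumptions: ADJ_MIN and ADJ_MAX
are taken to be -1000 and 1000, locking is left out, and VICTIM_OFFSET,
userspace_write() and replay_victim() are hypothetical names used only here.

```c
#include <stdbool.h>

#define ADJ_MIN		(-1000)		/* assumed OOM_SCORE_ADJ_MIN */
#define ADJ_MAX		1000		/* assumed OOM_SCORE_ADJ_MAX */
#define VICTIM_OFFSET	(ADJ_MAX - ADJ_MIN + 1)

/* Stand-in for current->signal->oom_score_adj (assumption: no locking). */
static int oom_score_adj;

/*
 * Shift the value above ADJ_MAX while "on". On "off", shift it back
 * only if it is still above ADJ_MAX, i.e. nobody replaced it with an
 * in-range value in the meantime.
 */
static void set_oom_victim(bool on)
{
	if (on)
		oom_score_adj += VICTIM_OFFSET;
	else if (oom_score_adj > ADJ_MAX)
		oom_score_adj -= VICTIM_OFFSET;
}

/* Hypothetical: models a write to /proc/<pid>/oom_score_adj. */
static void userspace_write(int val)
{
	oom_score_adj = val;
}

/*
 * Replay on/off with an optional userspace write in between and
 * return the final value.
 */
static int replay_victim(int initial, bool do_write, int userspace_val)
{
	oom_score_adj = initial;
	set_oom_victim(true);

	if (do_write)
		userspace_write(userspace_val);

	set_oom_victim(false);
	return oom_score_adj;
}
```

Any in-range value plus VICTIM_OFFSET lands in [ADJ_MAX + 1, 2 * ADJ_MAX + 1 - ADJ_MIN],
which userspace can never write, so in this model replay_victim(0, true, ADJ_MAX)
keeps ADJ_MAX and replay_victim(-1000, false, 0) restores -1000. The sketch
assumes set_oom_victim(true) is not called twice without an "off" in between.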

Oleg.




