Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Oleg Nesterov <oleg@tv-sign.ru>
To: Christoph Lameter <clameter@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	Srivatsa Vaddagiri <vatsa@in.ibm.com>,
	"Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>,
	Gautham shenoy <ego@in.ibm.com>, Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org
Subject: Re: slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems
Date: Mon, 29 Jan 2007 20:19:23 +0300	[thread overview]
Message-ID: <20070129171923.GA138@tv-sign.ru> (raw)
In-Reply-To: <Pine.LNX.4.64.0701290852390.28200@schroedinger.engr.sgi.com>

On 01/29, Christoph Lameter wrote:
>
> On Mon, 29 Jan 2007, Oleg Nesterov wrote:
> 
> > Now,
> > 	static void __devinit start_cpu_timer(int cpu)
> > 	{
> > 		struct delayed_work *reap_work = &per_cpu(reap_work, cpu);
> > 
> > 		if (keventd_up() && reap_work->work.func == NULL) {
> > 			init_reap_node(cpu);
> > 			INIT_DELAYED_WORK(reap_work, cache_reap);
> > 			schedule_delayed_work_on(cpu, reap_work,
> > 						__round_jiffies_relative(HZ, cpu));
> > 		}
> > 	}
> > 
> > This is wrong. Suppose we have a CPU_UP,CPU_DOWN,CPU_UP sequence. The last
> > CPU_UP will not restart a per-cpu "cache_reap timer".
> 
> Why?

Because the last CPU_UP calls start_cpu_timer(), but since ->work.func != NULL
we don't do schedule_delayed_work_on(). I think (if I am right) this is a slab's
bug.

> > With or without recent changes, it is possible that work->func() will run on
> > another CPU (not that to which it was submitted) if CPU goes down. In fact,
> > this can happen while work->func() is running, so even smp_processor_id()
> > is not safe to use in work->func().
> 
> But the work func was scheduled by schedule_delayed_work_on(). Isnt that a 
> general problem with schedule_delayed_work_on() and keventd?

I think this is yet another problem with workqueues/cpu-hotplug interaction.
Probably, the problem is more general. With CONFIG_CPU_HOTPLUG, we can't
garantee that smp_processor_id() is stable even if the thread is pinned to
specific processor.

> > However, cache_reap() seems to wrongly assume that smp_processor_id() is stable,
> > this is the second problem.
> >
> > Is my understanding correct?
>
> cache_reap assumes that the processor id is stable based on the kevent 
> thread being pinned to a processor.

I think cache_reap() is not alone, and this is not its fault.

But please note another minor problem,

	void cache_reap(struct work_struct *unused)
	{
		...

		schedule_delayed_work(&__get_cpu_var(reap_work), ...);
	}

Even if smp_processor_id() was stable during the execution of cache_reap(),
this work_struct can be moved to another CPU if CPU_DEAD happens. We can't
avoid this, and this is correct.

This means that __get_cpu_var(reap_work) returns a "wrong" struct delayed_work.
This is absolutely harmless right now, but may be it's better to use
container_of(unused, struct delayed_work, work).

Oleg.

next prev parent reply	other threads:[~2007-01-29 17:20 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-01-29  1:13 slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems Oleg Nesterov
2007-01-29 16:54 ` Christoph Lameter
2007-01-29 17:19   ` Oleg Nesterov [this message]
2007-01-29 17:27     ` Christoph Lameter
2007-01-29 18:27       ` Oleg Nesterov
2007-01-29 19:09         ` Christoph Lameter
2007-01-29 19:29           ` Oleg Nesterov
2007-01-29 19:25         ` Christoph Lameter
2007-01-29 19:49           ` Oleg Nesterov
2007-01-29 20:29             ` Christoph Lameter
2007-01-29 21:05               ` Oleg Nesterov
2007-01-29 21:48                 ` Christoph Lameter
2007-01-29 22:14                   ` Oleg Nesterov
2007-02-20 18:39         ` Max Krasnyansky
2007-02-20 18:45           ` Christoph Lameter
2007-02-20 20:05             ` Oleg Nesterov
2007-02-20 21:22               ` Max Krasnyansky
2007-02-20 21:35                 ` Christoph Lameter
2007-02-20 22:01                   ` Max Krasnyansky
2007-02-20 22:14                     ` Christoph Lameter
2007-02-20 22:48                       ` SLAB cache reaper on isolated cpus Max Krasnyansky
2007-02-20 23:19                         ` Christoph Lameter
2007-02-21  3:41                           ` Max Krasnyansky
2007-02-20 21:05             ` slab: start_cpu_timer/cache_reap CONFIG_HOTPLUG_CPU problems Max Krasnyansky
2007-02-20 21:34               ` Christoph Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070129171923.GA138@tv-sign.ru \
    --to=oleg@tv-sign.ru \
    --cc=akpm@osdl.org \
    --cc=clameter@sgi.com \
    --cc=ego@in.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=vatsa@in.ibm.com \
    --cc=venkatesh.pallipadi@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.