Linux Power Management development

Linux Power Management development
 help / color / mirror / Atom feed

* [RFC PATCH v3 8/9] kvm, vmx: Add atomic synchronization with CPU Hotplug
From: Srivatsa S. Bhat @ 2012-12-07 17:40 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, oleg
  Cc: sbw, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121207173702.27305.1486.stgit@srivatsabhat.in.ibm.com>

preempt_disable() will no longer help prevent CPUs from going offline, once
stop_machine() gets removed from the CPU offline path. So use
get/put_online_cpus_atomic() in vmx_vcpu_load() to prevent CPUs from
going offline while clearing vmcs.

Reported-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Debugged-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 arch/x86/kvm/vmx.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f858159..d8a4cf1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1519,10 +1519,14 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
 
-	if (!vmm_exclusive)
+	if (!vmm_exclusive) {
 		kvm_cpu_vmxon(phys_addr);
-	else if (vmx->loaded_vmcs->cpu != cpu)
+	} else if (vmx->loaded_vmcs->cpu != cpu) {
+		/* Prevent any CPU from going offline */
+		get_online_cpus_atomic();
 		loaded_vmcs_clear(vmx->loaded_vmcs);
+		put_online_cpus_atomic();
+	}
 
 	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
 		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;

^ permalink raw reply related

* [RFC PATCH v3 9/9] cpu: No more __stop_machine() in _cpu_down()
From: Srivatsa S. Bhat @ 2012-12-07 17:40 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, oleg
  Cc: sbw, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121207173702.27305.1486.stgit@srivatsabhat.in.ibm.com>

From: Paul E. McKenney <paul.mckenney@linaro.org>

The _cpu_down() function invoked as part of the CPU-hotplug offlining
process currently invokes __stop_machine(), which is slow and inflicts
substantial real-time latencies on the entire system.  This patch
substitutes stop_one_cpu() for __stop_machine() in order to improve
both performance and real-time latency.

This is currently unsafe, because there are a number of uses of
preempt_disable() that are intended to block CPU-hotplug offlining.
These will be fixed by using get/put_online_cpus_atomic(), but in the
meantime, this commit is one way to help locate them.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ srivatsa.bhat@linux.vnet.ibm.com: Refer to the new sync primitives for
  readers (in the changelog), and s/stop_cpus/stop_one_cpu ]
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/cpu.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index d2076fa..71c65dc 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -532,7 +532,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
 	}
 	smpboot_park_threads(cpu);

-	err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
+	err = stop_one_cpu(cpu, take_cpu_down, &tcd_param);
 	if (err) {
 		/* CPU didn't die: tell everyone.  Can't complain. */
 		smpboot_unpark_threads(cpu);

^ permalink raw reply related

* [RFC PATCH v3 5/9] sched, cpu hotplug: Use stable online cpus in try_to_wake_up() & select_task_rq()
From: Srivatsa S. Bhat @ 2012-12-07 17:39 UTC (permalink / raw)
  To: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, oleg
  Cc: sbw, amit.kucheria, rostedt, rjw, srivatsa.bhat, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121207173702.27305.1486.stgit@srivatsabhat.in.ibm.com>

Once stop_machine() is gone from the CPU offline path, we won't be able to
depend on preempt_disable() to prevent CPUs from going offline from under us.

Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline,
while invoking from atomic context.

Scheduler functions such as try_to_wake_up() and select_task_rq() (and even
select_fallback_rq()) deal with picking new CPUs to run tasks. So they need
to synchronize with CPU offline operations.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/sched/core.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..f51e0aa 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1103,6 +1103,10 @@ EXPORT_SYMBOL_GPL(kick_process);
 #ifdef CONFIG_SMP
 /*
  * ->cpus_allowed is protected by both rq->lock and p->pi_lock
+ *
+ *  Must be called under get/put_online_cpus_atomic() or
+ *  equivalent, to avoid CPUs from going offline from underneath
+ *  us.
  */
 static int select_fallback_rq(int cpu, struct task_struct *p)
 {
@@ -1166,6 +1170,9 @@ out:
 
 /*
  * The caller (fork, wakeup) owns p->pi_lock, ->cpus_allowed is stable.
+ *
+ * Must be called under get/put_online_cpus_atomic(), to prevent
+ * CPUs from going offline from underneath us.
  */
 static inline
 int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
@@ -1406,6 +1413,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	int cpu, success = 0;
 
 	smp_wmb();
+	get_online_cpus_atomic();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	if (!(p->state & state))
 		goto out;
@@ -1446,6 +1454,7 @@ stat:
 	ttwu_stat(p, cpu, wake_flags);
 out:
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+	put_online_cpus_atomic();
 
 	return success;
 }
@@ -1624,6 +1633,7 @@ void wake_up_new_task(struct task_struct *p)
 	unsigned long flags;
 	struct rq *rq;
 
+	get_online_cpus_atomic();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 #ifdef CONFIG_SMP
 	/*
@@ -1644,6 +1654,7 @@ void wake_up_new_task(struct task_struct *p)
 		p->sched_class->task_woken(rq, p);
 #endif
 	task_rq_unlock(rq, p, &flags);
+	put_online_cpus_atomic();
 }
 
 #ifdef CONFIG_PREEMPT_NOTIFIERS
@@ -2541,6 +2552,7 @@ void sched_exec(void)
 	unsigned long flags;
 	int dest_cpu;
 
+	get_online_cpus_atomic();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
 	dest_cpu = p->sched_class->select_task_rq(p, SD_BALANCE_EXEC, 0);
 	if (dest_cpu == smp_processor_id())
@@ -2550,11 +2562,13 @@ void sched_exec(void)
 		struct migration_arg arg = { p, dest_cpu };
 
 		raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+		put_online_cpus_atomic();
 		stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);
 		return;
 	}
 unlock:
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+	put_online_cpus_atomic();
 }
 
 #endif


^ permalink raw reply related

* Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Tejun Heo @ 2012-12-07 17:57 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, oleg, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121207173759.27305.84316.stgit@srivatsabhat.in.ibm.com>

On Fri, Dec 07, 2012 at 11:08:13PM +0530, Srivatsa S. Bhat wrote:
> 4. No deadlock possibilities
> 
>    Per-cpu locking is not the way to go if we want to have relaxed rules
>    for lock-ordering. Because, we can end up in circular-locking dependencies
>    as explained in https://lkml.org/lkml/2012/12/6/290
> 
>    So, avoid per-cpu locking schemes (per-cpu locks/per-cpu atomic counters
>    with spin-on-contention etc) as much as possible.

I really can't say I like this approach.  percpu locking is very
tricky to get right and difficult to get right and we should try our
best to avoid implementing subsystem specific ones as much as
possible.  Also, I think the right approach would be auditing each
get_online_cpus_atomic() callsites and figure out proper locking order
rather than implementing a construct this unusual especially as
hunting down the incorrect cases shouldn't be difficult given proper
lockdep annotation.

lg_lock doesn't do local nesting and I'm not sure how big a deal that
is as I don't know how many should be converted.  But if nesting is an
absolute necessity, it would be much better to implement generic
rwlock variant (say, lg_rwlock) rather than implementing unusual
cpuhotplug-specific percpu synchronization construct.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Tejun Heo @ 2012-12-07 18:16 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, oleg, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121207175724.GA2821@htj.dyndns.org>

Hello, again.

On Fri, Dec 07, 2012 at 09:57:24AM -0800, Tejun Heo wrote:
> possible.  Also, I think the right approach would be auditing each
> get_online_cpus_atomic() callsites and figure out proper locking order
> rather than implementing a construct this unusual especially as
> hunting down the incorrect cases shouldn't be difficult given proper
> lockdep annotation.

On the second look, it looks like you're implementing proper
percpu_rwlock semantics as readers aren't supposed to induce circular
dependency directly.  Can you please work with Oleg to implement
proper percpu-rwlock and use that for CPU hotplug rather than
implementing it inside CPU hotplug?

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat @ 2012-12-07 18:24 UTC (permalink / raw)
  To: Tejun Heo
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, oleg, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121207175724.GA2821@htj.dyndns.org>

On 12/07/2012 11:27 PM, Tejun Heo wrote:
> On Fri, Dec 07, 2012 at 11:08:13PM +0530, Srivatsa S. Bhat wrote:
>> 4. No deadlock possibilities
>>
>>    Per-cpu locking is not the way to go if we want to have relaxed rules
>>    for lock-ordering. Because, we can end up in circular-locking dependencies
>>    as explained in https://lkml.org/lkml/2012/12/6/290
>>
>>    So, avoid per-cpu locking schemes (per-cpu locks/per-cpu atomic counters
>>    with spin-on-contention etc) as much as possible.
> 
> I really can't say I like this approach.  percpu locking is very
> tricky to get right and difficult to get right and we should try our
> best to avoid implementing subsystem specific ones as much as
> possible.  Also, I think the right approach would be auditing each
> get_online_cpus_atomic() callsites and figure out proper locking order
> rather than implementing a construct this unusual especially as
> hunting down the incorrect cases shouldn't be difficult given proper
> lockdep annotation.
> 
> lg_lock doesn't do local nesting and I'm not sure how big a deal that
> is as I don't know how many should be converted.  But if nesting is an
> absolute necessity, it would be much better to implement generic
> rwlock variant (say, lg_rwlock) rather than implementing unusual
> cpuhotplug-specific percpu synchronization construct.
>

To be honest, at a certain point in time while designing this, I did
realize that this was getting kinda overly complicated ;-) ... but I
wanted to see how this would actually work out when finished and get
some feedback on the same, hence I posted it out. But this also proves
that we _can_ actually compete with the flexibility of preempt_disable()
and still be safe with respect to locking, if we really want to ;-)

Regards,
Srivatsa S. Bhat


^ permalink raw reply

* Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Tejun Heo @ 2012-12-07 18:31 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, oleg, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <50C23441.7020309@linux.vnet.ibm.com>

Hello, Srivatsa.

On Fri, Dec 07, 2012 at 11:54:01PM +0530, Srivatsa S. Bhat wrote:
> > lg_lock doesn't do local nesting and I'm not sure how big a deal that
> > is as I don't know how many should be converted.  But if nesting is an
> > absolute necessity, it would be much better to implement generic
> > rwlock variant (say, lg_rwlock) rather than implementing unusual
> > cpuhotplug-specific percpu synchronization construct.
> 
> To be honest, at a certain point in time while designing this, I did
> realize that this was getting kinda overly complicated ;-) ... but I
> wanted to see how this would actually work out when finished and get
> some feedback on the same, hence I posted it out. But this also proves
> that we _can_ actually compete with the flexibility of preempt_disable()
> and still be safe with respect to locking, if we really want to ;-)

I got confused by comparison to preempt_disable() but you're right
that percpu rwlock shouldn't be able to introduce locking dependency
which doesn't exist with non-percpu rwlock.  ie. write locking should
be atomic w.r.t. to all readers.  At the simplest, this can be
implemented by writer backing out all the way if try-locking any CPU
fails and retrying the whole thing.  That should be correct but has
the potential of starving the writer.

What we need here is a generic percpu-rwlock.  I don't know which
exact implementation strategy we should choose.  Maybe your switching
to global rwlock is the right solution.  But, at any rate, I think it
would be best to implement proper percpu-rwlock and then apply it to
CPU hotplug.  It's actually gonna be pretty fitting as
get_online_cpus() is being converted to percpu-rwsem.  IIUC, Oleg has
been working on this for a while now.  Oleg, what do you think?

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat @ 2012-12-07 18:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, oleg, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121207181608.GB2821@htj.dyndns.org>

On 12/07/2012 11:46 PM, Tejun Heo wrote:
> Hello, again.
> 
> On Fri, Dec 07, 2012 at 09:57:24AM -0800, Tejun Heo wrote:
>> possible.  Also, I think the right approach would be auditing each
>> get_online_cpus_atomic() callsites and figure out proper locking order
>> rather than implementing a construct this unusual especially as
>> hunting down the incorrect cases shouldn't be difficult given proper
>> lockdep annotation.
> 
> On the second look, it looks like you're implementing proper
> percpu_rwlock semantics

Ah, nice! I didn't realize that I was actually doing what I intended
to avoid! ;-)

Looking at the implementation of lglocks, and going by Oleg's earlier
comment that we just need to replace spinlock_t with rwlock_t in them
to get percpu_rwlocks, I was horrified at the kinds of circular locking
dependencies that they would be prone to, unless used carefully.

So I devised this scheme to be safe, while still having relaxed rules.
But if *this* is what percpu_rwlocks should ideally look like,
then great! :-)

> as readers aren't supposed to induce circular
> dependency directly.

Yep, in this scheme, nobody will end up in circular dependency.

>  Can you please work with Oleg to implement
> proper percpu-rwlock and use that for CPU hotplug rather than
> implementing it inside CPU hotplug?
> 

Sure, I'd be more than happy to!

Regards,
Srivatsa S. Bhat

^ permalink raw reply

* Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat @ 2012-12-07 18:38 UTC (permalink / raw)
  To: Tejun Heo
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, oleg, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121207183141.GC2821@htj.dyndns.org>

On 12/08/2012 12:01 AM, Tejun Heo wrote:
> Hello, Srivatsa.
> 
> On Fri, Dec 07, 2012 at 11:54:01PM +0530, Srivatsa S. Bhat wrote:
>>> lg_lock doesn't do local nesting and I'm not sure how big a deal that
>>> is as I don't know how many should be converted.  But if nesting is an
>>> absolute necessity, it would be much better to implement generic
>>> rwlock variant (say, lg_rwlock) rather than implementing unusual
>>> cpuhotplug-specific percpu synchronization construct.
>>
>> To be honest, at a certain point in time while designing this, I did
>> realize that this was getting kinda overly complicated ;-) ... but I
>> wanted to see how this would actually work out when finished and get
>> some feedback on the same, hence I posted it out. But this also proves
>> that we _can_ actually compete with the flexibility of preempt_disable()
>> and still be safe with respect to locking, if we really want to ;-)
> 
> I got confused by comparison to preempt_disable() but you're right
> that percpu rwlock shouldn't be able to introduce locking dependency
> which doesn't exist with non-percpu rwlock.  ie. write locking should
> be atomic w.r.t. to all readers.

Yep!

>  At the simplest, this can be
> implemented by writer backing out all the way if try-locking any CPU
> fails and retrying the whole thing.  That should be correct but has
> the potential of starving the writer.
> 

Exactly! This is what I mentioned yesterday in the link below, and said
that its not good because of writer starvation ("too wasteful"):
https://lkml.org/lkml/2012/12/6/290

> What we need here is a generic percpu-rwlock.  I don't know which
> exact implementation strategy we should choose.  Maybe your switching
> to global rwlock is the right solution.  But, at any rate, I think it
> would be best to implement proper percpu-rwlock and then apply it to
> CPU hotplug.  It's actually gonna be pretty fitting as
> get_online_cpus() is being converted to percpu-rwsem.  IIUC, Oleg has
> been working on this for a while now.  Oleg, what do you think?
> 

Hmm, that sounds good.

Regards,
Srivatsa S. Bhat


^ permalink raw reply

* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline
From: Oleg Nesterov @ 2012-12-07 19:56 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tj, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <50C0E88E.9050909@linux.vnet.ibm.com>

On 12/07, Srivatsa S. Bhat wrote:
>
> On 12/06/2012 09:48 PM, Oleg Nesterov wrote:
> > On 12/06, Srivatsa S. Bhat wrote:
> >>
> >> +void get_online_cpus_atomic(void)
> >> +{
> >> +	int c, old;
> >> +
> >> +	preempt_disable();
> >> +	read_lock(&hotplug_rwlock);
> >
> > Confused... Why it also takes hotplug_rwlock?
>
> To avoid ABBA deadlocks.
>
> hotplug_rwlock was meant for the "light" readers.
> The atomic counters were meant for the "heavy/full" readers.

OK, I got lost a bit. I'll try to read v3 tomorrow.

> > Obviously you can't use get_online_cpus_atomic() under rq->lock or
> > task->pi_lock or any other lock CPU_DYING can take. Probably this is
> > fine, but perhaps it makes sense to add the lockdep annotations.
>
> Hmm, you are right. We can't use _atomic() in the CPU_DYING path.

Not sure I undestand... I simply meant that, say,
get_online_cpus_atomic() under task->pi_lock can obviously deadlock
with take_cpu_down() which can want the same task->pi_lock after
disable_atomic_reader().

Oleg.

^ permalink raw reply

* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline
From: Srivatsa S. Bhat @ 2012-12-07 20:25 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: tj, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121207195629.GA13238@redhat.com>

On 12/08/2012 01:26 AM, Oleg Nesterov wrote:
> On 12/07, Srivatsa S. Bhat wrote:
>>
>> On 12/06/2012 09:48 PM, Oleg Nesterov wrote:
>>> On 12/06, Srivatsa S. Bhat wrote:
>>>>
>>>> +void get_online_cpus_atomic(void)
>>>> +{
>>>> +	int c, old;
>>>> +
>>>> +	preempt_disable();
>>>> +	read_lock(&hotplug_rwlock);
>>>
>>> Confused... Why it also takes hotplug_rwlock?
>>
>> To avoid ABBA deadlocks.
>>
>> hotplug_rwlock was meant for the "light" readers.
>> The atomic counters were meant for the "heavy/full" readers.
> 
> OK, I got lost a bit. I'll try to read v3 tomorrow.
> 

OK, thanks! But note that v3 is completely different from v2.

>>> Obviously you can't use get_online_cpus_atomic() under rq->lock or
>>> task->pi_lock or any other lock CPU_DYING can take. Probably this is
>>> fine, but perhaps it makes sense to add the lockdep annotations.
>>
>> Hmm, you are right. We can't use _atomic() in the CPU_DYING path.
> 
> Not sure I undestand... I simply meant that, say,
> get_online_cpus_atomic() under task->pi_lock can obviously deadlock
> with take_cpu_down() which can want the same task->pi_lock after
> disable_atomic_reader().
> 

Right, I mistook your point for something else (i.e., ability for
the writer to do get_online_cpus_atomic() safely, which I fixed in
v3).

So, your point above is very valid. And yes, we can't do much
about it, we'll just have to teach lockdep to catch such usages.

Regards,
Srivatsa S. Bhat


^ permalink raw reply

* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline
From: Oleg Nesterov @ 2012-12-07 20:59 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tj, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <50C250CA.8070209@linux.vnet.ibm.com>

On 12/08, Srivatsa S. Bhat wrote:
>
> On 12/08/2012 01:26 AM, Oleg Nesterov wrote:
> > Not sure I undestand... I simply meant that, say,
> > get_online_cpus_atomic() under task->pi_lock can obviously deadlock
> > with take_cpu_down() which can want the same task->pi_lock after
> > disable_atomic_reader().
>
> Right, I mistook your point for something else (i.e., ability for
> the writer to do get_online_cpus_atomic() safely, which I fixed in
> v3).
>
> So, your point above is very valid. And yes, we can't do much
> about it, we'll just have to teach lockdep to catch such usages.

Afaics, this is simple. Just add the "fake" lockdep_map as, say,
lglock does. Except, you need rwlock_acquire_read(map, 0, 1, IP)
because this lock is recursive.

But. There is another email from you about the possible deadlock.
I'll write the reply in a minute...

Oleg.


^ permalink raw reply

* Re: [RFC PATCH v2 01/10] CPU hotplug: Provide APIs for "light" atomic readers to prevent CPU offline
From: Oleg Nesterov @ 2012-12-07 21:01 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tj, tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <50C0EF49.8050700@linux.vnet.ibm.com>

On 12/07, Srivatsa S. Bhat wrote:
>
> CPU 0                         CPU 1
> ------                        ------
>
> 1. Acquire lock A             Increment CPU1's
>                               atomic counter
>
>
>
> 2. Increment CPU0's            Try to acquire lock A
>    atomic counter
>
>
> Now consider what happens if a hotplug writer (cpu_down) begins,

Exactly. So the fake lockdep_map should be per-cpu as well.

lglock doesn't need this because lg_local_lock() is not recursive
and lockdep can catch the bug like this. So it can look as single
lock for lockdep.

IOW. If you use the global lockdep_map and want lockdep to notice
this deadlock you need rwlock_acquire_read(map, 0, 0, IP). But then
lockdep will complain if the task does "read lock" under "read lock".

Oleg.

^ permalink raw reply

* [PATCH 1/2] thermal: Add support for thermal sensor for Kirkwood SoC
From: Nobuhiro Iwamatsu @ 2012-12-07 23:15 UTC (permalink / raw)
  To: linux-pm; +Cc: linux-arm-kernel, rui.zhang, jason, Nobuhiro Iwamatsu

Some kirkwood SoC has thermal sensor.
This patch adds support for 88F6282 and 88F6283.

Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
---
 drivers/thermal/Kconfig            |    7 ++
 drivers/thermal/Makefile           |    1 +
 drivers/thermal/kirkwood_thermal.c |  131 ++++++++++++++++++++++++++++++++++++
 3 files changed, 139 insertions(+)
 create mode 100644 drivers/thermal/kirkwood_thermal.c

diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index e1cb6bd..8710ac2 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -55,3 +55,10 @@ config EXYNOS_THERMAL
 	help
 	  If you say yes here you get support for TMU (Thermal Managment
 	  Unit) on SAMSUNG EXYNOS series of SoC.
+
+config KIRKWOOD_THERMAL
+	tristate "Temperature sensor on Marvel Kirkwood"
+	depends on ARCH_KIRKWOOD && THERMAL
+	help
+	  Support for the Kirkwood thermal sensor driver into the Linux thermal
+	  framework. This supports only 88F6282 and 88F6283.
diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
index 885550d..4dbe5e1 100644
--- a/drivers/thermal/Makefile
+++ b/drivers/thermal/Makefile
@@ -6,4 +6,5 @@ obj-$(CONFIG_THERMAL)		+= thermal_sys.o
 obj-$(CONFIG_CPU_THERMAL)		+= cpu_cooling.o
 obj-$(CONFIG_SPEAR_THERMAL)		+= spear_thermal.o
 obj-$(CONFIG_RCAR_THERMAL)	+= rcar_thermal.o
+obj-$(CONFIG_KIRKWOOD_THERMAL)         += kirkwood_thermal.o
 obj-$(CONFIG_EXYNOS_THERMAL)		+= exynos_thermal.o
diff --git a/drivers/thermal/kirkwood_thermal.c b/drivers/thermal/kirkwood_thermal.c
new file mode 100644
index 0000000..bddcf49
--- /dev/null
+++ b/drivers/thermal/kirkwood_thermal.c
@@ -0,0 +1,131 @@
+/*
+ * kirkwood thermal sensor driver
+ *
+ * Copyright (C) 2012 Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/of.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/thermal.h>
+
+#define THERMAL_VALID_OFFSET	9
+#define THERMAL_VALID_MASK	0x1
+#define THERMAL_TEMP_OFFSET	10
+#define THERMAL_TEMP_MASK	0x1FF
+
+/* Kirkwood Thermal Sensor Dev Structure */
+struct kirkwood_thermal_dev {
+	void __iomem *base_addr;
+};
+
+static inline int kirkwood_get_temp(struct thermal_zone_device *thermal,
+			unsigned long *temp)
+{
+	unsigned long reg;
+	struct kirkwood_thermal_dev *thermal_dev = thermal->devdata;
+
+	reg = readl_relaxed(thermal_dev->base_addr);
+
+	/* Valid check */
+	if (!(reg >> THERMAL_VALID_OFFSET) & THERMAL_VALID_MASK) {
+		dev_info(&thermal->device, "Reading temperature is not valid\n");
+		return -EIO;
+	}
+
+	reg = (reg >> THERMAL_TEMP_OFFSET) & THERMAL_TEMP_MASK;
+	/* Calculate temperature. See Table 814 in hardware manual. */
+	*temp = ((322 - reg) * 10000) / 13625;
+
+	return 0;
+}
+
+static struct thermal_zone_device_ops ops = {
+	.get_temp = kirkwood_get_temp,
+};
+
+static int kirkwood_thermal_probe(struct platform_device *pdev)
+{
+	struct thermal_zone_device *thermal = NULL;
+	struct kirkwood_thermal_dev *thermal_dev;
+	struct resource *res;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!res) {
+		dev_err(&pdev->dev, "Failed to get platform resource\n");
+		return -ENODEV;
+	}
+
+	thermal_dev
+		= devm_kzalloc(&pdev->dev, sizeof(*thermal_dev), GFP_KERNEL);
+	if (!thermal_dev) {
+		dev_err(&pdev->dev, "kzalloc fail\n");
+		return -ENOMEM;
+	}
+
+	thermal_dev->base_addr = devm_ioremap(&pdev->dev, res->start,
+				resource_size(res));
+	if (!thermal_dev->base_addr) {
+		dev_err(&pdev->dev, "Failed to ioremap memory\n");
+		return -ENOMEM;
+	}
+
+	thermal = thermal_zone_device_register("kirkwood_thermal", 0, 0,
+				   thermal_dev, &ops, 0, 0);
+	if (IS_ERR(thermal)) {
+		dev_err(&pdev->dev, "Failed to register thermal zone device\n");
+		return  PTR_ERR(thermal);
+	}
+
+	platform_set_drvdata(pdev, thermal);
+
+	dev_info(&thermal->device, KBUILD_MODNAME ": Thermal sensor registered\n");
+
+	return 0;
+}
+
+static int kirkwood_thermal_exit(struct platform_device *pdev)
+{
+	struct thermal_zone_device *kirkwood_thermal
+			= platform_get_drvdata(pdev);
+
+	thermal_zone_device_unregister(kirkwood_thermal);
+	platform_set_drvdata(pdev, NULL);
+
+	return 0;
+}
+
+static const struct of_device_id kirkwood_thermal_id_table[] = {
+	{ .compatible = "marvel,thermal-kirkwood" },
+	{}
+};
+MODULE_DEVICE_TABLE(of, kirkwood_thermal_id_table);
+
+static struct platform_driver kirkwood_thermal_driver = {
+	.probe = kirkwood_thermal_probe,
+	.remove = kirkwood_thermal_exit,
+	.driver = {
+		.name = "kirkwood_thermal",
+		.owner = THIS_MODULE,
+		.of_match_table = of_match_ptr(kirkwood_thermal_id_table),
+	},
+};
+
+module_platform_driver(kirkwood_thermal_driver);
+
+MODULE_AUTHOR("Nobuhiro Iwamatsu <iwamatsu@nigauri.org>");
+MODULE_DESCRIPTION("kirkwood thermal driver");
+MODULE_LICENSE("GPL");
-- 
1.7.10.4


^ permalink raw reply related

* [PATCH 2/2] ARM: Kirkwood: Add support thermal sensor for 88F6282 and 88F6283
From: Nobuhiro Iwamatsu @ 2012-12-07 23:15 UTC (permalink / raw)
  To: linux-pm; +Cc: linux-arm-kernel, rui.zhang, jason, Nobuhiro Iwamatsu
In-Reply-To: <1354922151-3250-1-git-send-email-iwamatsu@nigauri.org>

88F6282 and 88F6283 has thermal sensor.
This patch enables DT and kernel config.

Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
---
 arch/arm/boot/dts/kirkwood-6282.dtsi |    6 ++++++
 arch/arm/configs/kirkwood_defconfig  |    2 ++
 2 files changed, 8 insertions(+)

diff --git a/arch/arm/boot/dts/kirkwood-6282.dtsi b/arch/arm/boot/dts/kirkwood-6282.dtsi
index 9ae2004..4d9cd74 100644
--- a/arch/arm/boot/dts/kirkwood-6282.dtsi
+++ b/arch/arm/boot/dts/kirkwood-6282.dtsi
@@ -41,5 +41,11 @@
 			clock-frequency = <100000>;
 			status = "disabled";
 		};
+
+		thermal@10078 {
+			compatible = "marvel,thermal-kirkwood";
+			reg = <0x10078 0x4>;
+			status = "okay";
+		};
 	};
 };
diff --git a/arch/arm/configs/kirkwood_defconfig b/arch/arm/configs/kirkwood_defconfig
index 93f3794..3d8667f 100644
--- a/arch/arm/configs/kirkwood_defconfig
+++ b/arch/arm/configs/kirkwood_defconfig
@@ -118,6 +118,8 @@ CONFIG_SPI=y
 CONFIG_SPI_ORION=y
 CONFIG_GPIO_SYSFS=y
 # CONFIG_HWMON is not set
+CONFIG_THERMAL=y
+CONFIG_KIRKWOOD_THERMAL=y
 CONFIG_WATCHDOG=y
 CONFIG_ORION_WATCHDOG=y
 CONFIG_HID_DRAGONRISE=y
-- 
1.7.10.4


^ permalink raw reply related

* Re: [PATCH 1/2] thermal: Add support for thermal sensor for Kirkwood SoC
From: Jason Gunthorpe @ 2012-12-07 23:24 UTC (permalink / raw)
  To: Nobuhiro Iwamatsu; +Cc: rui.zhang, jason, linux-arm-kernel, linux-pm
In-Reply-To: <1354922151-3250-1-git-send-email-iwamatsu@nigauri.org>

On Sat, Dec 08, 2012 at 08:15:50AM +0900, Nobuhiro Iwamatsu wrote:
> Some kirkwood SoC has thermal sensor.
> This patch adds support for 88F6282 and 88F6283.

Thanks! I was just about to write this.. Looks good here.

Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

Jason

^ permalink raw reply

* Re: [PATCH 1/2] thermal: Add support for thermal sensor for Kirkwood SoC
From: Jason Gunthorpe @ 2012-12-07 23:59 UTC (permalink / raw)
  To: Nobuhiro Iwamatsu; +Cc: linux-pm, rui.zhang, jason, linux-arm-kernel
In-Reply-To: <20121207232459.GA4304@obsidianresearch.com>

On Fri, Dec 07, 2012 at 04:24:59PM -0700, Jason Gunthorpe wrote:
> On Sat, Dec 08, 2012 at 08:15:50AM +0900, Nobuhiro Iwamatsu wrote:
> > Some kirkwood SoC has thermal sensor.
> > This patch adds support for 88F6282 and 88F6283.
> 
> Thanks! I was just about to write this.. Looks good here.

Ah, looking closer:

$ cat /sys/class/thermal/thermal_zone0/temp 
37

That should be 37000, the value out of the driver should be in
milli-Celsius.

I'd use this equation instead:

	*temp = ((322 - reg) * 10000 * 1000) / 13625;

Regards,
Jason

^ permalink raw reply

* Re: [PATCH 1/2] thermal: Add support for thermal sensor for Kirkwood SoC
From: Thomas Petazzoni @ 2012-12-08  0:07 UTC (permalink / raw)
  To: Nobuhiro Iwamatsu; +Cc: rui.zhang, jason, linux-arm-kernel, linux-pm
In-Reply-To: <1354922151-3250-1-git-send-email-iwamatsu@nigauri.org>

Hello,

On Sat,  8 Dec 2012 08:15:50 +0900, Nobuhiro Iwamatsu wrote:

> +config KIRKWOOD_THERMAL
> +	tristate "Temperature sensor on Marvel Kirkwood"

Marvell

> +	/* Valid check */
> +	if (!(reg >> THERMAL_VALID_OFFSET) & THERMAL_VALID_MASK) {
> +		dev_info(&thermal->device, "Reading temperature is not valid\n");

"Temperature read is not valid" maybe? An native english speaker could
say.

> +	thermal_dev
> +		= devm_kzalloc(&pdev->dev, sizeof(*thermal_dev), GFP_KERNEL);

I think the usual coding style is to have the = on the first line.

> +static int kirkwood_thermal_exit(struct platform_device *pdev)
> +{
> +	struct thermal_zone_device *kirkwood_thermal
> +			= platform_get_drvdata(pdev);

Ditto.

> +static const struct of_device_id kirkwood_thermal_id_table[] = {
> +	{ .compatible = "marvel,thermal-kirkwood" },

marvel -> marvell

Also, I think it should be marvell,kirkwood-thermal, since most other
DT compatible strings that we have for Marvell SoCs are
marvell,<soc>-<function>.

Also, the Device Tree binding documentation is missing (even though it
is admittedly going to be a very short documentation).

Thanks!

Thomas
-- 
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

^ permalink raw reply

* Re: [PATCH 1/2] thermal: Add support for thermal sensor for Kirkwood SoC
From: Russell King - ARM Linux @ 2012-12-08  0:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nobuhiro Iwamatsu, rui.zhang, jason, linux-arm-kernel, linux-pm
In-Reply-To: <20121207235904.GA15078@obsidianresearch.com>

On Fri, Dec 07, 2012 at 04:59:04PM -0700, Jason Gunthorpe wrote:
> On Fri, Dec 07, 2012 at 04:24:59PM -0700, Jason Gunthorpe wrote:
> > On Sat, Dec 08, 2012 at 08:15:50AM +0900, Nobuhiro Iwamatsu wrote:
> > > Some kirkwood SoC has thermal sensor.
> > > This patch adds support for 88F6282 and 88F6283.
> > 
> > Thanks! I was just about to write this.. Looks good here.
> 
> Ah, looking closer:
> 
> $ cat /sys/class/thermal/thermal_zone0/temp 
> 37
> 
> That should be 37000, the value out of the driver should be in
> milli-Celsius.
> 
> I'd use this equation instead:
> 
> 	*temp = ((322 - reg) * 10000 * 1000) / 13625;

Be careful of math overflows... make sure you do this calculation using
unsigned arithmetic as temperatures above 157 degress will cause this
to look like a negative number using signed math... However, that
probably won't ever be noticed because at 157 degrees, you'll definitely
be outside the operating limits of the device.

^ permalink raw reply

* Re: [PATCH 1/2] thermal: Add support for thermal sensor for Kirkwood SoC
From: Russell King - ARM Linux @ 2012-12-08  0:11 UTC (permalink / raw)
  To: Thomas Petazzoni
  Cc: Nobuhiro Iwamatsu, rui.zhang, jason, linux-arm-kernel, linux-pm
In-Reply-To: <20121208010708.260209f5@skate>

On Sat, Dec 08, 2012 at 01:07:08AM +0100, Thomas Petazzoni wrote:
> Hello,
> 
> On Sat,  8 Dec 2012 08:15:50 +0900, Nobuhiro Iwamatsu wrote:
> > +static const struct of_device_id kirkwood_thermal_id_table[] = {
> > +	{ .compatible = "marvel,thermal-kirkwood" },
> 
> marvel -> marvell
> 
> Also, I think it should be marvell,kirkwood-thermal, since most other
> DT compatible strings that we have for Marvell SoCs are
> marvell,<soc>-<function>.
> 
> Also, the Device Tree binding documentation is missing (even though it
> is admittedly going to be a very short documentation).

Is this in any way compatible with the thermal monitoring found on
Dove (510) stuff?  If so, should it have the SoC prefix in there,
or should it be "armada-thermal" for the SoC family?

^ permalink raw reply

* [PATCH] power: bq2414x: use module_i2c_driver macro
From: Devendra Naga @ 2012-12-08 17:16 UTC (permalink / raw)
  To: Anton Vorontsov, David Woodhouse, linux-pm; +Cc: Devendra Naga

use the module_i2c_driver macro and reduce code and also remove the
duplicated module init and exit places.

Signed-off-by: Devendra Naga <develkernel412222@gmail.com>
---
 drivers/power/bq2415x_charger.c |   13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/power/bq2415x_charger.c b/drivers/power/bq2415x_charger.c
index ee842b3..b05df0a 100644
--- a/drivers/power/bq2415x_charger.c
+++ b/drivers/power/bq2415x_charger.c
@@ -1653,18 +1653,7 @@ static struct i2c_driver bq2415x_driver = {
 	.id_table = bq2415x_i2c_id_table,
 };
 
-static int __init bq2415x_init(void)
-{
-	return i2c_add_driver(&bq2415x_driver);
-}
-module_init(bq2415x_init);
-
-static void __exit bq2415x_exit(void)
-{
-	i2c_del_driver(&bq2415x_driver);
-}
-module_exit(bq2415x_exit);
-
+module_i2c_driver(bq2415x_driver);
 MODULE_AUTHOR("Pali Rohár <pali.rohar@gmail.com>");
 MODULE_DESCRIPTION("bq2415x charger driver");
 MODULE_LICENSE("GPL");
-- 
1.7.10.4


^ permalink raw reply related

* Re: [PATCH 1/2] thermal: Add support for thermal sensor for Kirkwood SoC
From: Sebastian Hesselbarth @ 2012-12-09 13:54 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Thomas Petazzoni, linux-arm-kernel, rui.zhang, jason,
	Nobuhiro Iwamatsu, linux-pm
In-Reply-To: <20121208001152.GA14363@n2100.arm.linux.org.uk>

On 12/08/2012 01:11 AM, Russell King - ARM Linux wrote:
> On Sat, Dec 08, 2012 at 01:07:08AM +0100, Thomas Petazzoni wrote:
>> Hello,
>>
>> On Sat,  8 Dec 2012 08:15:50 +0900, Nobuhiro Iwamatsu wrote:
>>> +static const struct of_device_id kirkwood_thermal_id_table[] = {
>>> +	{ .compatible = "marvel,thermal-kirkwood" },
>>
>> marvel ->  marvell
>>
>> Also, I think it should be marvell,kirkwood-thermal, since most other
>> DT compatible strings that we have for Marvell SoCs are
>> marvell,<soc>-<function>.
>>
>> Also, the Device Tree binding documentation is missing (even though it
>> is admittedly going to be a very short documentation).
>
> Is this in any way compatible with the thermal monitoring found on
> Dove (510) stuff?  If so, should it have the SoC prefix in there,
> or should it be "armada-thermal" for the SoC family?

I haven't checked the driver in detail but at least register offsets
and the register-to-temperature function are different for Dove.
This is no big deal and can be handled with compatible strings.

But more important, "kirkwood" includes 88f618x, 88f619x, and 88f6281
that have no thermal diode - at least it is not mentioned in the public
datasheet. So, finally for Nobuhiro's patch I suggest to have two
compatible strings, one for marvell,88f6282-thermal and one for
marvell,88f6282-thermal. Numbering scheme of Marvell SoCs is a mess..

For the driver, the name should be either orion_thermal.c (as we will
reuse it for Dove), or mvebu_thermal.c if there is also a thermal diode
on Armada 370/XP. Using "armada" alone is not a good idea, as it also
includes some pxa-based SoCs - naming scheme of Marvell SoCs is even
more broken than numbering scheme ;)

Sebastian

^ permalink raw reply

* Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Oleg Nesterov @ 2012-12-09 19:14 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121207173759.27305.84316.stgit@srivatsabhat.in.ibm.com>

On 12/07, Srivatsa S. Bhat wrote:
>
> Per-cpu counters can help solve the cache-line bouncing problem. So we
> actually use the best of both: per-cpu counters (no-waiting) at the reader
> side in the fast-path, and global rwlocks in the slowpath.
>
> [ Fastpath = no writer is active; Slowpath = a writer is active ]
>
> IOW, the hotplug readers just increment/decrement their per-cpu refcounts
> when no writer is active.

Plus LOCK and cli/sti. I do not pretend I really know how bad this is
performance-wise though. And at first glance this look overcomplicated.

But yes, it is easy to blame somebody else's code ;) And I can't suggest
something better at least right now. If I understand correctly, we can not
use, say, synchronize_sched() in _cpu_down() path, you also want to improve
the latency. And I guess something like kick_all_cpus_sync() is "too heavy".

Also. After the quick reading this doesn't look correct, please see below.

> +void get_online_cpus_atomic(void)
> +{
> +	unsigned int cpu = smp_processor_id();
> +	unsigned long flags;
> +
> +	preempt_disable();
> +	local_irq_save(flags);
> +
> +	if (cpu_hotplug.active_writer == current)
> +		goto out;
> +
> +	smp_rmb(); /* Paired with smp_wmb() in drop_writer_signal() */
> +
> +	if (likely(!writer_active(cpu))) {

WINDOW. Suppose that reader_active() == F.

> +		mark_reader_fastpath();
> +		goto out;

Why take_cpu_down() can't do announce_cpu_offline_begin() + sync_all_readers()
in between?

Looks like we should increment the counter first, then check writer_active().
And sync_atomic_reader() needs rmb between 2 atomic_read's.

Or. Again, suppose that reader_active() == F. But is_new_writer() == T.

> +	if (is_new_writer(cpu)) {
> +		/*
> +		 * ACK the writer's signal only if this is a fresh read-side
> +		 * critical section, and not just an extension of a running
> +		 * (nested) read-side critical section.
> +		 */
> +		if (!reader_active(cpu)) {
> +			ack_writer_signal();

What if take_cpu_down() does announce_cpu_offline_end() right before
ack_writer_signal() ? In this case get_online_cpus_atomic() returns
with writer_signal == -1. If nothing else this breaks the next
raise_writer_signal().

Oleg.

^ permalink raw reply

* Re: [RFC PATCH v3 7/9] yield_to(), cpu-hotplug: Prevent offlining of other CPUs properly
From: Oleg Nesterov @ 2012-12-09 19:48 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121207173950.27305.39499.stgit@srivatsabhat.in.ibm.com>

On 12/07, Srivatsa S. Bhat wrote:
>
> Once stop_machine() is gone from the CPU offline path, we won't be able to
> depend on local_irq_save() to prevent CPUs from going offline from under us.

OK, I guess we need to avoid resched_task()->smp_send_reschedule()
after __cpu_disable() and before migrate_tasks().

But, whatever problem we have,

> Use the get/put_online_cpus_atomic() APIs to prevent CPUs from going offline,
> while invoking from atomic context.

it should be solved, so...

> -		if (preempt && rq != p_rq)
> +		if (preempt && rq != p_rq && cpu_online(task_cpu(p)))

Why do we need this change?

Afaics, you could add BUG_ON(!cpu_online(...)) instead?

I am just curious.

Oleg.


^ permalink raw reply

* Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
From: Srivatsa S. Bhat @ 2012-12-09 19:50 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: tglx, peterz, paulmck, rusty, mingo, akpm, namhyung,
	vincent.guittot, tj, sbw, amit.kucheria, rostedt, rjw, wangyun,
	xiaoguangrong, nikunj, linux-pm, linux-kernel
In-Reply-To: <20121209191437.GA2816@redhat.com>

On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
> On 12/07, Srivatsa S. Bhat wrote:
>>
>> Per-cpu counters can help solve the cache-line bouncing problem. So we
>> actually use the best of both: per-cpu counters (no-waiting) at the reader
>> side in the fast-path, and global rwlocks in the slowpath.
>>
>> [ Fastpath = no writer is active; Slowpath = a writer is active ]
>>
>> IOW, the hotplug readers just increment/decrement their per-cpu refcounts
>> when no writer is active.
> 
> Plus LOCK and cli/sti. I do not pretend I really know how bad this is
> performance-wise though. And at first glance this look overcomplicated.
>

Hehe, I agree ;-) But I couldn't think of any other way to get rid of the
deadlock possibilities, other than using global rwlocks. So I designed a
way in which we can switch between per-cpu counters and global rwlocks
dynamically. Probably there is a smarter way to achieve what we want, dunno...
 
> But yes, it is easy to blame somebody else's code ;) And I can't suggest
> something better at least right now. If I understand correctly, we can not
> use, say, synchronize_sched() in _cpu_down() path

We can't sleep in that code.. so that's a no-go.

>, you also want to improve
> the latency. And I guess something like kick_all_cpus_sync() is "too heavy".
> 

I hadn't considered that. Thinking of it, I don't think it would help us..
It won't get rid of the currently running preempt_disable() sections no?

> Also. After the quick reading this doesn't look correct, please see below.
> 
>> +void get_online_cpus_atomic(void)
>> +{
>> +	unsigned int cpu = smp_processor_id();
>> +	unsigned long flags;
>> +
>> +	preempt_disable();
>> +	local_irq_save(flags);
>> +
>> +	if (cpu_hotplug.active_writer == current)
>> +		goto out;
>> +
>> +	smp_rmb(); /* Paired with smp_wmb() in drop_writer_signal() */
>> +
>> +	if (likely(!writer_active(cpu))) {
> 
> WINDOW. Suppose that reader_active() == F.
> 
>> +		mark_reader_fastpath();
>> +		goto out;
> 
> Why take_cpu_down() can't do announce_cpu_offline_begin() + sync_all_readers()
> in between?
> 
> Looks like we should increment the counter first, then check writer_active().

You are right, I missed the above race-conditions.

> And sync_atomic_reader() needs rmb between 2 atomic_read's.
> 

OK.

> 
> Or. Again, suppose that reader_active() == F. But is_new_writer() == T.
> 
>> +	if (is_new_writer(cpu)) {
>> +		/*
>> +		 * ACK the writer's signal only if this is a fresh read-side
>> +		 * critical section, and not just an extension of a running
>> +		 * (nested) read-side critical section.
>> +		 */
>> +		if (!reader_active(cpu)) {
>> +			ack_writer_signal();
> 
> What if take_cpu_down() does announce_cpu_offline_end() right before
> ack_writer_signal() ? In this case get_online_cpus_atomic() returns
> with writer_signal == -1. If nothing else this breaks the next
> raise_writer_signal().
> 

Oh, yes, this one is wrong too! We need to mark ourselves as active
reader right in the beginning. And probably change the check to
"reader_nested()" or something like that.

Thanks a lot Oleg!
Regards,
Srivatsa S. Bhat


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox