All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Kohli, Gaurav" <gkohli@codeaurora.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: tglx@linutronix.de, mpe@ellerman.id.au, mingo@kernel.org,
	bigeasy@linutronix.de, linux-kernel@vger.kernel.org,
	linux-arm-msm@vger.kernel.org,
	Neeraj Upadhyay <neeraju@codeaurora.org>,
	Will Deacon <will.deacon@arm.com>,
	Oleg Nesterov <oleg@redhat.com>
Subject: Re: [PATCH v1] kthread/smpboot: Serialize kthread parking against wakeup
Date: Tue, 1 May 2018 16:10:53 +0530	[thread overview]
Message-ID: <f134c8d8-e56f-bf29-0177-e6702ae79f2b@codeaurora.org> (raw)
In-Reply-To: <20180501101845.GE12217@hirez.programming.kicks-ass.net>



On 5/1/2018 3:48 PM, Peter Zijlstra wrote:
> On Tue, May 01, 2018 at 01:20:26PM +0530, Kohli, Gaurav wrote:
>> But In our older case, where we have seen failure below is the wake up path
>> and ftraces, Wakeup occured  and completed before schedule call only.
>>
>> So final state of CPUHP is running not parked. I have also pasted debug
>> ftraces that we got during issue reproduction.
>>
>> Here wakeup for cpuhp is below:
>>
>> takedown_cpu-> kthread_park-> wake_up_process
>>
>>
>>   39,034,311,742,395  apps (10240)        Trace Printk cpuhp/0  (16)  [000]
>> 39015.625000: <debug> __kthread_parkme state=512 task=ffffffcc7458e680
>> flags: 0x5 -> state 5 -> state is parked inside parkme function
>>
>> 39,034,311,846,510  apps (10240)        Trace Printk cpuhp/0  (16)  [000]
>> 39015.625000: <debug> before schedule __kthread_parkme state=0
>> task=ffffffcc7458e680 flags: 0xd  ->  just before schedule call, state is
>> running
>>
>> tatic void __kthread_parkme(struct kthread *self)
>>
>> {
>>
>>          __set_current_state(TASK_PARKED);
>>
>>          while (test_bit(KTHREAD_SHOULD_PARK, &self->flags)) {
>>
>>                  if (!test_and_set_bit(KTHREAD_IS_PARKED, &self->flags))
>>
>>                          complete(&self->parked);
>>
>>                  schedule();
>>
>>                  __set_current_state(TASK_PARKED);
>>
>>          }
>>
>>          clear_bit(KTHREAD_IS_PARKED, &self->flags);
>>
>>          __set_current_state(TASK_RUNNING);
>>
>> }
>>
>> So my point is here also, if it is reschedule then it can set TASK_PARKED,
>> but it seems after takedown_cpu call this thread never get a chance to run,
>> So final state is TASK_RUNNING.
>>
>> In our current fix also can't we observe same scenario where final state is
>> TASK_RUNNING.
> 
> I'm not sure I understand your concern. Loosing the TASK_PARKED store
> with the above code is obviously bad. But with the loop as proposed I
> don't see a problem.

Yes with loop, it will reset TASK_PARKED but that is not happening in 
the dumps we have seen. Here before schedule state is RUNNING and cpuhp
got migrate to some core but never get a chance to run so state is running.
> 
> takedown_cpu() can proceed beyond smpboot_park_threads() and kill the
> CPU before any of the threads are parked -- per having the complete()
> before hitting schedule().
> 
> And, afaict, that is harmless. When we go offline, sched_cpu_dying() ->
> migrate_tasks() will migrate any still runnable threads off the cpu.
> But because at this point the thread must be in the PARKED wait-loop, it
> will hit schedule() and go to sleep eventually.
> 
> Also note that kthread_unpark() does __kthread_bind() to rebind the
> threads.
> 
> Aaaah... I think I've spotted a problem there. We clear SHOULD_PARK
> before we rebind, so if the thread lost the first PARKED store,
> does the completion, gets migrated, cycles through the loop and now
> observes !SHOULD_PARK and bails the wait-loop, then __kthread_bind()
> will forever wait.
> 

So during next unpark
__kthread_unpark -> __kthread_bind -> wait_task_inactive (this got 
failed, as current state is running so failed on below call:


  while (task_running(rq, p)) {
                         if (match_state && unlikely(p->state != 
match_state))
                                 return 0;
                         cpu_relax();
                 }

and gives warning:


         if (!wait_task_inactive(p, state)) {
                 WARN_ON(1);
                 return; -> return from here, and further binding call 
fail which is after this code.
         }


finally it is giving bug_on here as we failed to rebind hotplug to our core:
                         }
                         kthread_parkme();
                         /* We might have been woken for stop */
                         continue;
                 }

                 BUG_ON(td->cpu != smp_processor_id()); panic occured.


So it seems we always have to be in PARKED state only , not miss any 
single instance.
> Is that what you had in mind?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, 
Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

  parent reply	other threads:[~2018-05-01 10:40 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-25  8:33 [PATCH v1] kthread/smpboot: Serialize kthread parking against wakeup Gaurav Kohli
2018-04-25 20:09 ` Peter Zijlstra
2018-04-26  4:04   ` Kohli, Gaurav
2018-04-26  9:14     ` Peter Zijlstra
2018-04-26  8:41   ` Peter Zijlstra
2018-04-26  8:57     ` Peter Zijlstra
2018-04-26 15:53       ` Kohli, Gaurav
2018-04-30 11:17         ` Peter Zijlstra
2018-05-01  7:50           ` Kohli, Gaurav
2018-05-01 10:18             ` Peter Zijlstra
2018-05-01 10:40               ` Peter Zijlstra
2018-05-01 10:40               ` Kohli, Gaurav [this message]
2018-05-01 11:31                 ` Peter Zijlstra
2018-05-01 11:46                   ` Kohli, Gaurav
2018-05-01 13:19                     ` Peter Zijlstra
2018-05-02  5:15                       ` Kohli, Gaurav
2018-05-02  8:20                         ` Peter Zijlstra
2018-05-02 10:13                           ` Kohli, Gaurav
2018-05-07 11:09                             ` Kohli, Gaurav
2018-05-07 11:23                               ` Kohli, Gaurav
2018-06-05 11:13                                 ` Kohli, Gaurav
2018-06-05 15:08                                   ` Oleg Nesterov
2018-06-05 15:22                                     ` Peter Zijlstra
2018-06-05 15:40                                       ` Peter Zijlstra
2018-06-05 16:35                                         ` Oleg Nesterov
2018-06-05 18:21                                           ` Kohli, Gaurav
2018-06-05 20:13                                           ` Peter Zijlstra
2018-06-06 13:51                                             ` Oleg Nesterov
2018-06-06 15:03                                               ` Peter Zijlstra
2018-06-06 15:04                                               ` Peter Zijlstra
2018-06-06 15:22                                               ` Peter Zijlstra
2018-06-06 18:59                                               ` Peter Zijlstra
2018-06-07  8:30                                                 ` Kohli, Gaurav
2018-05-01 10:44               ` Peter Zijlstra
2018-04-26 16:02     ` Andrea Parri
2018-04-26 16:18     ` Oleg Nesterov
2018-04-30 11:20       ` Peter Zijlstra
2018-04-30 11:56         ` Peter Zijlstra
2018-04-28  6:43 ` [lkp-robot] [kthread/smpboot] cad8e99675: inconsistent{IN-HARDIRQ-W}->{HARDIRQ-ON-W}usage kernel test robot
2018-04-28  6:43   ` kernel test robot
2018-04-28  6:43   ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f134c8d8-e56f-bf29-0177-e6702ae79f2b@codeaurora.org \
    --to=gkohli@codeaurora.org \
    --cc=bigeasy@linutronix.de \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=neeraju@codeaurora.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.