All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jet Chen <jet.chen@intel.com>
To: Viresh Kumar <viresh.kumar@linaro.org>,
	Fengguang Wu <fengguang.wu@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: WARNING: CPU: 0 PID: 1935 at kernel/timer.c:1621 migrate_timer_list()
Date: Fri, 04 Apr 2014 15:46:39 +0800	[thread overview]
Message-ID: <533E635F.9050803@intel.com> (raw)
In-Reply-To: <CAKohponbGFFo+W7X24e4h3ub2L8Tkb89eP6_F17CRYELk0yeRQ@mail.gmail.com>

Hi Viresh,

I changed your print message as you suggested.

diff --git a/kernel/timer.c b/kernel/timer.c
index 6c3a371..193101d 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1617,8 +1617,8 @@ static void migrate_timer_list(struct tvec_base *new_base, struct list_head *hea

                  /* Check if CPU still has pinned timers */
                  if (unlikely(WARN(is_pinned,
-                                 "%s: can't migrate pinned timer: %p, deactivating it\n",
-                                 __func__, timer)))
+                                 "%s: can't migrate pinned timer: %p, timer->function: %p,deactivating it\n",
+                                 __func__, timer, timer->function)))
                          continue;

Then I reproduced the issue, and got the dmesg output,

[   37.918406] migrate_timer_list: can't migrate pinned timer: ffffffff81f06a60, timer->function: ffffffff810d7010,deactivating it

We reproduced this issue for several times in our LKP system. The address of timer ffffffff81f06a60 is very constant. So is timer->function, I believe.

Hope this information will help you. Please feel free to tell me what else I can do to help you.

Thanks,
Jet

On 04/04/2014 12:56 PM, Viresh Kumar wrote:
> Thanks Fengguang,
>
> On 4 April 2014 08:49, Fengguang Wu <fengguang.wu@intel.com> wrote:
>> Greetings,
>>
>> I got the below dmesg and the first bad commit is
>>
>> git://git.linaro.org/people/vireshk/linux timer-cleanup-for-tglx
>>
>> commit 6378cb51af5f4743db0dcb3cbcf862eac5908754
>> Author:     Viresh Kumar <viresh.kumar@linaro.org>
>> AuthorDate: Thu Mar 20 14:29:02 2014 +0530
>> Commit:     Viresh Kumar <viresh.kumar@linaro.org>
>> CommitDate: Wed Apr 2 14:54:57 2014 +0530
>>
>>      timer: don't migrate pinned timers
>>
>>      migrate_timer() is called when a CPU goes down and its timers are required to be
>>      migrated to some other CPU. Its the responsibility of the users of the timer to
>>      remove it before control reaches to migrate_timers().
>>
>>      As these were the pinned timers, the best we can do is: don't migrate these and
>>      report to the user as well.
>>
>>      That's all this patch does.
>>
>>      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>>
>> ===================================================
>> PARENT COMMIT NOT CLEAN. LOOK OUT FOR WRONG BISECT!
>> ===================================================
>> Attached one more dmesg for the NULL pointer bug in parent commit.
>>
>> +--------------------------------------------------------+------------+------------+------------+
>> |                                                        | 5a8530b7c3 | 6378cb51af | 7caf71f403 |
>> +--------------------------------------------------------+------------+------------+------------+
>> | boot_successes                                         | 103        | 14         | 10         |
>> | boot_failures                                          | 17         | 18         | 13         |
>> | BUG:unable_to_handle_kernel_NULL_pointer_dereference   | 16         |            |            |
>> | Oops:SMP                                               | 16         |            |            |
>> | Kernel_panic-not_syncing:Fatal_exception               | 16         |            |            |
>> | backtrace:vfs_read                                     | 16         |            |            |
>> | backtrace:SyS_read                                     | 16         |            |            |
>> | BUG:kernel_test_crashed                                | 1          |            |            |
>> | WARNING:CPU:PID:at_kernel/timer.c:migrate_timer_list() | 0          | 17         | 12         |
>> | backtrace:vfs_write                                    | 0          | 17         | 12         |
>> | backtrace:SyS_write                                    | 0          | 17         | 12         |
>> | BUG:kernel_early_hang_without_any_printk_output        | 0          | 1          | 1          |
>> +--------------------------------------------------------+------------+------------+------------+
>>
>> [   74.242293] Unregister pv shared memory for cpu 1
>> [   74.273280] smpboot: CPU 1 is now offline
>> [   74.274685] ------------[ cut here ]------------
>> [   74.275524] WARNING: CPU: 0 PID: 1935 at kernel/timer.c:1621 migrate_timer_list+0xd6/0xf0()
>> [   74.275524] migrate_timer_list: can't migrate pinned timer: ffffffff81f06a60, deactivating it
>
> Hmm, nice. So, my patch hasn't created a bug, but just highlighted it.
> I have added this piece of code while migrating timers away:
>
> if (unlikely(WARN(is_pinned,
>      "%s: can't migrate pinned timer: %p, deactivating it\n",
>      __func__, timer)))
>
> Which means, migrate all timers to other CPUs when a CPU is going down.
> But obviously we can't migrate the pinned timers. And it looks like we
> actually were doing that before this commit and things went unnoticed.
>
> But just due to this print, we are highlighting an existing issue here.
> @Thomas: So, in a sense my patch is doing some good work now :)
>
> Now, we need to fix the code which queued this pinned timer.
> @Fengguang: As I don't have the facilities to reproduce this, can you
> help me debugging this? Probably just change this print message to
> print the address of timer->function as well and then we can find
> the name of routine with help of objdump..
>
> What we need to do finally is to remove this timer before CPU is
> going down. I can fix the driver in question once we know which
> driver it is.
>
> --
> viresh
>

  reply	other threads:[~2014-04-04  7:47 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-04  3:19 WARNING: CPU: 0 PID: 1935 at kernel/timer.c:1621 migrate_timer_list() Fengguang Wu
2014-04-04  4:56 ` Viresh Kumar
2014-04-04  7:46   ` Jet Chen [this message]
2014-04-04  7:52     ` Viresh Kumar
2014-04-05  4:30       ` Jet Chen
2014-04-05  6:26         ` Viresh Kumar
2014-04-05  7:18           ` Jet Chen
2014-04-05 12:05             ` Viresh Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=533E635F.9050803@intel.com \
    --to=jet.chen@intel.com \
    --cc=fengguang.wu@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.