From: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
To: oleksandr@natalenko.name
Cc: viresh.kumar@linaro.org, LKML <linux-kernel@vger.kernel.org>,
	linux-pm@vger.kernel.org
Subject: Re: Fwd: [BUG] oops in cpufreq driver with AMD Kaveri CPU
Date: Tue, 12 Aug 2014 18:39:39 -0500	[thread overview]
Message-ID: <53EAA5BB.2010207@amd.com> (raw)
In-Reply-To: <CAOjmkp8h19de3bYaLpqmXxEynKi--gHBf6MxXuuNoDzZXw=O8Q@mail.gmail.com>

On 8/12/2014 2:51 PM, Aravind Gopalakrishnan wrote:
>
>
> Hello.
>
> Occasionally my machine hangs completely. Fortunately, I managed to
> capture and save the oops listing via netconsole before the hang; here
> it is [1].
>
> Here is a small excerpt of the oops from the link above:
>
> ===
> [15051.270461] BUG: unable to handle kernel paging request at 00000000ff5ae8e4
> [15051.271583] IP: [<ffffffff8109ae6e>] srcu_notifier_call_chain+0xe/0x20
> …
> [15051.956205] Call Trace:
> [15051.980641]  [<ffffffff81606085>] ? __cpufreq_notify_transition+0x95/0x1e0
> [15052.005640]  [<ffffffff816081ee>] cpufreq_notify_transition+0x3e/0x70
> [15052.030240]  [<ffffffff816083d8>] cpufreq_freq_transition_begin+0xe8/0x130
> [15052.054522]  [<ffffffff813b8940>] ? ucs2_strncmp+0x70/0x70
> [15052.078208]  [<ffffffff816089bf>] __target_index+0xbf/0x1a0
> [15052.101348]  [<ffffffff81608b9c>] __cpufreq_driver_target+0xfc/0x160
> [15052.124250]  [<ffffffff8160b0d4>] od_check_cpu+0xa4/0xb0
> [15052.146789]  [<ffffffff8160c9ec>] dbs_check_cpu+0x16c/0x1c0
> [15052.168935]  [<ffffffff8160b4dd>] od_dbs_timer+0x11d/0x180
> [15052.190607]  [<ffffffff8108e6ff>] process_one_work+0x17f/0x4c0
> [15052.211825]  [<ffffffff8108f46b>] worker_thread+0x11b/0x3f0
> [15052.232490]  [<ffffffff8108f350>] ? create_and_start_worker+0x80/0x80
> [15052.253127]  [<ffffffff81096479>] kthread+0xc9/0xe0
> [15052.273292]  [<ffffffff810963b0>] ? flush_kthread_worker+0xb0/0xb0
> [15052.293487]  [<ffffffff81793efc>] ret_from_fork+0x7c/0xb0
> [15052.313544]  [<ffffffff810963b0>] ? flush_kthread_worker+0xb0/0xb0
> …
> ===
>
> Here are my lspci [2] and cpuinfo [3] as well.
>
> Vanilla 3.15.8 and 3.16.0 are affected, as is the latest Ubuntu 3.13
> kernel.
>
> There is no visible trigger for the bug. After the hang, the machine
> doesn't respond over the network, there is no disk I/O, and it doesn't
> respond to the power button for a soft power-off.
>
> [1] https://gist.github.com/085af9da81197faf6637
> [2] https://gist.github.com/318ebda5576b099590b8
> [3] https://gist.github.com/9c1307463c7ad6835b2d
>
>
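
In the call trace quoted above, each frame has the form `[<address>] function+offset/size`, with offset and size in hex. On x86, a leading `?` marks an address that was found on the stack but could not be confirmed as part of the frame chain, so such entries may be stale. The following is a minimal Python sketch for pulling out the frames and keeping only the reliable ones. The helper names are hypothetical, and the sketch assumes each frame sits on a single line, as in the original netconsole output.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# One stack frame as printed in an x86 oops, e.g.
#   "[15052.078208]  [<ffffffff816089bf>] __target_index+0xbf/0x1a0"
#   "[15052.054522]  [<ffffffff813b8940>] ? ucs2_strncmp+0x70/0x70"
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unsure>\?\s+)?"
    r"(?P<func>[A-Za-z0-9_.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


class Frame(NamedTuple):
    addr: int       # absolute kernel address
    func: str       # symbol name
    offset: int     # byte offset of the address within the function
    size: int       # total size of the function in bytes
    reliable: bool  # False when the kernel printed a '?' marker


def parse_frame(line: str) -> Optional[Frame]:
    """Return the frame on this line, or None if the line holds no frame."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(
        addr=int(m.group("addr"), 16),
        func=m.group("func"),
        offset=int(m.group("off"), 16),
        size=int(m.group("size"), 16),
        reliable=m.group("unsure") is None,
    )


def reliable_chain(lines: Iterable[str]) -> List[str]:
    """Function names of the frames not marked '?', in printed order."""
    frames = (parse_frame(line) for line in lines)
    return [f.func for f in frames if f is not None and f.reliable]


trace = [
    "[15051.271583] IP: [<ffffffff8109ae6e>] srcu_notifier_call_chain+0xe/0x20",
    "[15051.980641]  [<ffffffff81606085>] ? __cpufreq_notify_transition+0x95/0x1e0",
    "[15052.005640]  [<ffffffff816081ee>] cpufreq_notify_transition+0x3e/0x70",
]
print(reliable_chain(trace))
# -> ['srcu_notifier_call_chain', 'cpufreq_notify_transition']
```

The sketch drops `?` frames only to make the confirmed chain easier to read. Those entries can still be useful hints when debugging.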

Hi,

I noticed this ping yesterday and tried to reproduce your issue on a
similar system I have (by the way, this is a 'Kabini' processor, not a
'Kaveri'), without success.

/proc/cpuinfo:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 22
model           : 0
model name      : AMD Opteron(tm) X2150 APU
stepping        : 1
microcode       : 0x7000106
cpu MHz         : 800.000
cache size      : 2048 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext 
fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc 
extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 cx16 sse4_1 
sse4_2 movbe popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic 
cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt 
topoext perfctr_nb perfctr_l2 arat xsaveopt hw_pstate proc_feedback npt 
lbrv svm_lock nrip_save tsc_scale flushbyasid decodeassists pausefilter 
pfthreshold bmi1
bogomips        : 3793.19
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate [11]
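
/proc/cpuinfo prints `cpu family` in decimal, while AMD documentation refers to families in hex. The family 22 shown above is therefore family 16h (the Jaguar-based parts, which include Kabini), whereas Kaveri belongs to family 15h. Below is a small Python sketch that reads those fields and checks for the `hw_pstate` (hardware P-state) flag. The helper names are hypothetical, and the sample is an abbreviated excerpt of the listing above.

```python
from typing import Dict


def parse_cpuinfo_block(text: str) -> Dict[str, str]:
    """Parse one 'processor' block of /proc/cpuinfo into a key -> value dict.

    Assumes each field sits on one line, as in the real file (mail clients
    often wrap the long 'flags' line; continuation lines are ignored here).
    """
    info: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a 'key : value' separator
            info[key.strip()] = value.strip()
    return info


def family_hex(info: Dict[str, str]) -> str:
    """Render the decimal 'cpu family' field in AMD's hex style, e.g. '16h'."""
    return f"{int(info['cpu family']):X}h"


def has_flag(info: Dict[str, str], flag: str) -> bool:
    """True if `flag` appears in the space-separated 'flags' field."""
    return flag in info.get("flags", "").split()


# Abbreviated excerpt of the cpuinfo listing above (flags shortened).
sample = """vendor_id       : AuthenticAMD
cpu family      : 22
model           : 0
model name      : AMD Opteron(tm) X2150 APU
flags           : fpu vme de pse tsc msr hw_pstate proc_feedback
"""
info = parse_cpuinfo_block(sample)
print(info["vendor_id"], family_hex(info), has_flag(info, "hw_pstate"))
# -> AuthenticAMD 16h True
```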

Since the BUG happens during a frequency transition, I tried the
following: periodically ramp up the CPU frequency by running a workload
that keeps all cores busy for some time, then let the frequency drop by
killing the load. I repeated this cycle overnight but did not hit the BUG.
(This was with the ondemand governor; uname -r: 3.16-rc4. I think you
mentioned you could reproduce on 3.16, so I'm assuming the -rc kernels
are affected too.)
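
The load/idle cycle described above could be scripted roughly as follows. This is a minimal sketch, not the workload actually used for the overnight test. It assumes the standard cpufreq sysfs files under /sys/devices/system/cpu/cpu0/cpufreq/ and uses busy-looping Python processes as the load. The function and constant names are hypothetical.

```python
import os
import subprocess
import sys
import time
from pathlib import Path
from typing import List, Optional, Tuple

# Standard cpufreq sysfs attributes for CPU 0 (absent in many VMs/containers).
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")
CUR_FREQ = CPUFREQ_DIR / "scaling_cur_freq"   # current frequency in kHz
GOVERNOR = CPUFREQ_DIR / "scaling_governor"   # e.g. "ondemand"


def read_sysfs(path: Path) -> Optional[str]:
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def load_cycle(cycles: int, busy_s: float, idle_s: float,
               workers: Optional[int] = None
               ) -> List[Tuple[Optional[str], Optional[str]]]:
    """Alternate an all-core busy period with an idle period, `cycles` times.

    Returns (frequency while loaded, frequency after idling) pairs for CPU 0.
    """
    workers = workers or os.cpu_count() or 1
    samples = []
    for _ in range(cycles):
        # One busy-looping interpreter per CPU keeps every core loaded,
        # which should make the ondemand governor raise the frequency.
        hogs = [subprocess.Popen([sys.executable, "-c", "while True: pass"])
                for _ in range(workers)]
        try:
            time.sleep(busy_s)
            loaded = read_sysfs(CUR_FREQ)
        finally:
            # Kill the load so the governor can lower the frequency again.
            for hog in hogs:
                hog.kill()
            for hog in hogs:
                hog.wait()
        time.sleep(idle_s)
        samples.append((loaded, read_sysfs(CUR_FREQ)))
    return samples
```

For an overnight run with the ondemand governor selected (check that `read_sysfs(GOVERNOR) == "ondemand"`), something like `load_cycle(cycles=720, busy_s=30, idle_s=30)` covers roughly 12 hours. Separate interpreter processes are used instead of threads so that the load actually occupies every core.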

Do you notice this BUG while running any particular load? If I had some
way to reproduce it, I could help with debugging or test patches as
needed.

-Aravind
