From: Alok Kataria <akataria@vmware.com>
To: "jacob.jun.pan@linux.intel.com" <jacob.jun.pan@linux.intel.com>
Cc: "rui.zhang@intel.com" <rui.zhang@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"eric.ernst@intel.com" <eric.ernst@intel.com>,
	"rjw@sisk.pl" <rjw@sisk.pl>
Subject: Re: Regression in intel_powerclamp, due to cpu whitelist removal
Date: Thu, 20 Oct 2016 04:02:55 +0000
Message-ID: <1476936498.2694.21.camel@vmware.com>
In-Reply-To: <20161019204530.3d2ec1d5@jacob-builder>

On Wed, 2016-10-19 at 20:45 -0700, Jacob Pan wrote:
> On Tue, 18 Oct 2016 14:20:49 +0000
> Alok Kataria <akataria@vmware.com> wrote:
> 
> > Hi Jacob, Zhang, 
> > 
> > One of your recent commits, "thermal/powerclamp: remove cpu
> > whitelist" [1], has caused a regression in the kernel.
> > 
> > That commit changed powerclamp_probe from requiring all of the
> > following features:
> > 
> > X86_FEATURE_NONSTOP_TSC
> > X86_FEATURE_CONSTANT_TSC
> > X86_FEATURE_MWAIT
> > X86_FEATURE_ARAT           
> > 
> > to *any* of them.  The problem is clamp_thread still wants to use
> > mwait_idle_with_hints even if the CPU doesn't support it. 
> >
> Hi Alok,
> 
> You are right, it should be AND not OR.
>  
> +Eric who has a patch to address this.
> 
> https://patchwork.kernel.org/patch/9365005/

Thanks Jacob. 
Also, I don't see stable copied on that submission. Shouldn't this be a
candidate for backporting to all affected kernel versions?

Thanks,
Alok
> 
> Rui/Rafael,
> 
> Could you consider this as an urgent fix?
> 
> Jacob
> > This was reported by our users when running Ubuntu 16.10
> > (4.8.0-22-generic) inside a VMware VM, though as mentioned above I
> > don’t think it is specific to our platform. We have seen kernel
> > panics due to invalid opcode because of this. Below is the stack
> > trace for your reference. 
> > 
> > [    5.736416] invalid opcode: 0000 [#1] SMP
> > [    5.736455] Modules linked in: vmw_vsock_vmci_transport vsock vmw_balloon intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw glue_helper ablk_helper cryptd intel_rapl_perf input_leds joydev serio_raw snd_ens1371 snd_ac97_codec gameport ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore i2c_piix4 shpchp vmw_vmci nfit floppy(+) mac_hid parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid ahci libahci e1000 mptspi mptscsih psmouse mptbase vmwgfx scsi_transport_spi ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm pata_acpi fjes
> > [    5.744370] CPU: 1 PID: 912 Comm: kidle_inject/1 Not tainted 4.8.0-22-generic #24-Ubuntu
> > [    5.744373] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
> > [    5.744375] task: ffff9658f7a663c0 task.stack: ffff9658fa908000
> > [    5.744378] RIP: 0010:[<ffffffffc05728b8>]  [<ffffffffc05728b8>] clamp_thread+0x2b8/0x5d0 [intel_powerclamp]
> > [    5.744380] RSP: 0018:ffff9658fa90be00  EFLAGS: 00010246
> > [    5.744383] RAX: ffff9658fa908008 RBX: 00000000fffee0a6 RCX: 0000000000000000
> > [    5.744386] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000246
> > [    5.744388] RBP: ffff9658fa90bec0 R08: ffff9658fa908000 R09: 0000000000000000
> > [    5.744391] R10: 000000000001cbf7 R11: 0000000000000000 R12: ffffffff8db581a0
> > [    5.744393] R13: ffff9658fa908000 R14: 0000000000000000 R15: ffff9658fa908000
> > [    5.744396] FS:  0000000000000000(0000) GS:ffff9658fc640000(0000) knlGS:0000000000000000
> > [    5.744398] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    5.744401] CR2: 00007ffa6cc262e8 CR3: 000000003ab3b000 CR4: 00000000001406e0
> > [    5.744403] Stack:
> > [    5.744406]  0000000000000001 ffff9658f7a66dc0 ffff9658fc659200 00000000e878d638
> > [    5.744409]  0000000000000001 00000002fc659200 0000000000000001 ffff9658fa908008
> > [    5.744411]  0000000000000000 ffff9658fc64fea8 00000000fffee0a6 ffffffffc05720a0
> > [    5.744414] Call Trace:
> > [    5.744416]  [<ffffffffc05720a0>] ? pkg_state_counter+0xa0/0xa0 [intel_powerclamp]
> > [    5.744419]  [<ffffffffc0572600>] ? powerclamp_set_cur_state+0x170/0x170 [intel_powerclamp]
> > [    5.744421]  [<ffffffffc0572600>] ? powerclamp_set_cur_state+0x170/0x170 [intel_powerclamp]
> > [    5.744424]  [<ffffffff8cca3c18>] kthread+0xd8/0xf0
> > [    5.744427]  [<ffffffff8d49f29f>] ret_from_fork+0x1f/0x40
> > [    5.744429]  [<ffffffff8cca3b40>] ? kthread_create_on_node+0x1e0/0x1e0
> > [    5.744432] Code: cc e9 ba 00 00 00 eb 19 0f 1f 00 0f ae f0 65 48 8b 04 25 04 69 01 00 0f ae b8 08 c0 ff ff 0f ae f0 31 d2 48 8b 44 24 38 48 89 d1 <0f> 01 c8 49 8b 45 08 a8 08 75 0b b9 01 00 00 00 4c 89 f0 0f 01
> > [    5.744434] RIP [<ffffffffc05728b8>] clamp_thread+0x2b8/0x5d0 [intel_powerclamp]
> > [    5.744437]  RSP <ffff9658fa90be00>
> > [    5.744440] invalid opcode: 0000 [#2] SMP
> > [    5.744452] ---[ end trace cf659c4076bf2804 ]---
> > 
> > Looking at the instruction at the RIP <ffffffffc05728b8> shows that
> > the kernel attempted to execute the "monitor" instruction.
> > 
> >  8b8:   0f 01 c8                monitor %rax,%rcx,%rdx
> >  8bb:   49 8b 45 08             mov    0x8(%r13),%rax
> > 
> > To fix this, I think you should restore the explicit feature check
> > "if block" that was removed in the above-mentioned commit. Can you
> > please look at this?
> > 
> > Thanks,
> > Alok
> > 
> > 
> > [1] b721ca0d192754deccb89fb01c77e41e6fd91ad9
> > https://github.com/torvalds/linux/commit/b721ca0d192754deccb89fb01c77e41e6fd91ad9
> > 
> 
> [Jacob Pan]
