public inbox for linux-kernel@vger.kernel.org
From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>, Mike Galbraith <efault@gmx.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Len Brown <len.brown@intel.com>,
	linux-pm@vger.kernel.org, x86@kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Len Brown <lenb@kernel.org>, "H. Peter Anvin" <hpa@zytor.com>,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86 idle: repair large-server 50-watt idle-power regression
Date: Thu, 19 Dec 2013 19:43:39 +0100
Message-ID: <20131219184339.GA32669@gmail.com>
In-Reply-To: <CA+55aFzGxcML7j8CEvQPYzh0W81uVoAAVmGctMOUZ7CZ1yYd2A@mail.gmail.com>


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> The x86 memory rules are that two loads always execute in order (ie 
> rmb is a no-op).
> 
> So I see no reason for a memory barrier after the monitor. [...]

Yes, I'm leaning towards that interpretation as well, but the reason 
I'm a bit cautious is the somewhat curious (to me!) wording of the 
MONITOR instruction's description:

  Sets up a linear address range to be monitored by hardware and 
  activates the monitor.

  ...

  The MONITOR instruction is ordered as a load operation with respect 
  to other memory transactions. The instruction can be used at all 
  privilege levels and is subject to all permission checking and 
  faults associated with a byte load. Like a load, the MONITOR 
  instruction sets the A-bit but not the D-bit in page tables.

Where apparently the 'range' means 'full cache line surrounding the 
memory address in question'.

We have no other load instructions that operate on such a large 
'range' of addresses, and I wanted to make sure it's a true (single 
byte) load for that specific address. The documentation does not 
appear to explicitly state that it's a load for that address - only 
that it's ordered as a load.

The reason I'm asking is because 'flags' itself might not be at the 
beginning of the cache line, as it's in the middle of thread_info:

 struct thread_info {
        struct task_struct      *task;          /* main task structure */
        struct exec_domain      *exec_domain;   /* execution domain */
        __u32                   flags;          /* low level flags */

while 'MONITOR' appears to work on the cache line. So are all 
addresses within that cache line ordered? Only the specific address 
given to the instruction itself? Only the first word of the cacheline 
itself?
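
To make the layout concern concrete, here is a minimal userspace sketch 
(not kernel code) that computes where 'flags' lands within its cache 
line. The 64-byte line size, the cut-down struct thread_info_sketch and 
the helpers line_base()/line_offset() are assumptions made purely for 
illustration:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* uintptr_t, uint32_t */

/* Assumption: 64-byte cache lines; the real size is CPU-specific. */
#define CACHE_LINE_SIZE 64u

struct task_struct;   /* opaque here, only pointers are needed */
struct exec_domain;

/* Hypothetical cut-down copy of the first three thread_info fields. */
struct thread_info_sketch {
	struct task_struct *task;         /* main task structure */
	struct exec_domain *exec_domain;  /* execution domain */
	uint32_t            flags;        /* low level flags */
};

/* 'flags' follows two pointers, so it is never the first byte. */
static_assert(offsetof(struct thread_info_sketch, flags) != 0,
	      "flags is expected to sit after the two pointers");

/* Start address of the cache line that contains addr. */
static uintptr_t line_base(const void *addr)
{
	return (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
}

/* Byte offset of addr within its cache line. */
static size_t line_offset(const void *addr)
{
	return (size_t)((uintptr_t)addr & (CACHE_LINE_SIZE - 1));
}

/* Line-aligned instance, so struct offsets equal in-line offsets. */
static _Alignas(CACHE_LINE_SIZE) struct thread_info_sketch ti_example;
```

With the struct aligned to a line boundary, line_offset(&ti_example.flags) 
equals offsetof(struct thread_info_sketch, flags): the address handed to 
MONITOR is in the middle of the line, not at its start. That is the case 
the questions above are about.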

The documentation is a bit vague, at least in my reading, and 
depending on which actual word the instruction reads (if it reads any 
word at all ... it's probably just setting up an address for MWAIT) 
from that cacheline, its ordering properties might be surprising.

> [...] But both sides of clflush sounds sane, and as mentioned the 
> "go to sleep" side isn't as critical as the "wake up" side of the 
> monitor.

Yeah.

> Please let's just make that pre-monitor hack be a static key, and do 
> mfence explicitly around the clflush inside that conditional 
> section.

Agreed.
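
For reference, the shape being agreed on can be sketched in userspace 
roughly as below. This is only an illustration under stated assumptions: 
it uses the SSE2 compiler intrinsics _mm_mfence() and _mm_clflush() on 
x86-64, a plain bool (needs_clflush_before_monitor, a made-up name) 
stands in for the static key, and the MONITOR/MWAIT step itself is left 
out:

```c
#include <assert.h>     /* assert */
#include <stdbool.h>    /* bool */
#include <stddef.h>     /* NULL */
#include <emmintrin.h>  /* _mm_mfence, _mm_clflush (SSE2, x86 only) */

/*
 * Hypothetical stand-in for the static key: true only on CPUs that
 * need the cache line flushed before the monitor is armed.
 */
static bool needs_clflush_before_monitor = true;

/*
 * Conditional pre-monitor workaround: full fence, flush the monitored
 * line, full fence again. Returns true if the flush was performed.
 */
static bool pre_monitor_workaround(const void *monitored_addr)
{
	assert(monitored_addr != NULL);

	if (!needs_clflush_before_monitor)
		return false;

	_mm_mfence();                 /* order earlier accesses before the flush */
	_mm_clflush(monitored_addr);  /* flush the line holding the address */
	_mm_mfence();                 /* complete the flush before what follows */
	return true;
}
```

The point of keeping both fences inside the conditional is that CPUs 
without the quirk pay nothing beyond the branch.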

Thanks,

	Ingo
