* Re: __force_order usage on x86's CRn accesses
[not found] <4FF2D6AB020000780008D460@nat28.tlf.novell.com>
@ 2012-07-03 9:55 ` Glauber Costa
2012-07-03 10:08 ` Jan Beulich
0 siblings, 1 reply; 3+ messages in thread
From: Glauber Costa @ 2012-07-03 9:55 UTC (permalink / raw)
To: Jan Beulich, jeremy, Ingo Molnar, H. Peter Anvin, Thomas Gleixner,
linux-kernel
>
> Pretty soon after the introduction of this via "x86: unify paravirt
> parts of system.h" Jeremy had posted a patch to correct the
> (reversed in direction of access) constraints using that auxiliary
> variable. What happened to that change? Was it dropped for a
> reason?
About the change itself, I don't know. Maybe Jeremy, who proposed it,
can give you a better answer.
> Furthermore, the addition of these constraints happened
> without any real explanation - the code comment that was added
> doesn't really help understand why "volatile" isn't sufficient here.
If my memory dont't fail me, I believe this is because gcc will feel
free to reorder a sequence of instructions that does not access memory.
Specially since it has no knowledge of what's in the inline assembly,
and what are its constraints. It only knows that it is an register
operation, and treats it like one.
Also, I believe what we are concerned with here is not arbitrary reorder
between that and other instructions, which we welcome, but reordering
between a read and a write to the same crX - specially of concern for
things doing read-modify-writes of control registers.
This is something I heavily looked at the time, but never again, so
confirmation from people with more knowledge on the compilers world can
be probably helpful here.
>
> Next, I also saw it being suggested somewhere to replace the
> static definition of the variable (causing an instance to appear
> everywhere any of the referencing inline function being used)
> by an extern one. That also never got carried out, causing
> about two dozen pointless instances of this variable to be
> scattered around, likely consuming cache bandwidth here and
> there.
I indeed see no reason not to do it.
> Now, rather than exporting an auxiliary variable like this, how
> about using an existing, rarely referenced (at least in affected
> contexts) but already exported one instead?
> empty_zero_page, __supported_pte_mask, x86_platform, or
> high_memory might be candidates.
A similar technique is used to determine a safe address for access
somewhere in the scheduler IIRC,
> Finally (and this is because I lack the explanation why the
> artificial constraint is needed in the first place), why is it that
> clts() doesn't need one too?
>
Because we're not using it to do read-modify-write of the control register.
But let me disclaim again that I might be wrong here...
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: __force_order usage on x86's CRn accesses
2012-07-03 9:55 ` __force_order usage on x86's CRn accesses Glauber Costa
@ 2012-07-03 10:08 ` Jan Beulich
0 siblings, 0 replies; 3+ messages in thread
From: Jan Beulich @ 2012-07-03 10:08 UTC (permalink / raw)
To: Glauber Costa
Cc: Ingo Molnar, jeremy, Thomas Gleixner, linux-kernel,
H. Peter Anvin
>>> On 03.07.12 at 11:55, Glauber Costa <glommer@parallels.com> wrote:
>> Furthermore, the addition of these constraints happened
>> without any real explanation - the code comment that was added
>> doesn't really help understand why "volatile" isn't sufficient here.
>
> If my memory dont't fail me, I believe this is because gcc will feel
> free to reorder a sequence of instructions that does not access memory.
> Specially since it has no knowledge of what's in the inline assembly,
> and what are its constraints. It only knows that it is an register
> operation, and treats it like one.
>
> Also, I believe what we are concerned with here is not arbitrary reorder
> between that and other instructions, which we welcome, but reordering
> between a read and a write to the same crX - specially of concern for
> things doing read-modify-writes of control registers.
But such sequences can't be re-ordered when the asm-s are
volatile qualified.
>> Finally (and this is because I lack the explanation why the
>> artificial constraint is needed in the first place), why is it that
>> clts() doesn't need one too?
>>
>
> Because we're not using it to do read-modify-write of the control register.
Sequences of read_cr0()/clts()/write_cr0() certainly exist (in
the xor implementations), so if a read_cr0()/write_cr0() can
have issues (which is impossible if the value written depends on
the one read), read_cr0()/clts() would be even more affected,
as there's no compiler visible dependency between the two.
But I don't think any such can happen in reality, and I was
hoping to be provided with an example of proving me wrong
(and understand the whole situation).
Jan
^ permalink raw reply [flat|nested] 3+ messages in thread
* __force_order usage on x86's CRn accesses
@ 2012-07-03 9:04 Jan Beulich
0 siblings, 0 replies; 3+ messages in thread
From: Jan Beulich @ 2012-07-03 9:04 UTC (permalink / raw)
To: mingo, tglx, gcosta, hpa; +Cc: Jeremy Fitzhardinge, linux-kernel
Pretty soon after the introduction of this via "x86: unify paravirt
parts of system.h" Jeremy had posted a patch to correct the
(reversed in direction of access) constraints using that auxiliary
variable. What happened to that change? Was it dropped for a
reason?
Furthermore, the addition of these constraints happened
without any real explanation - the code comment that was added
doesn't really help understand why "volatile" isn't sufficient here.
Next, I also saw it being suggested somewhere to replace the
static definition of the variable (causing an instance to appear
everywhere any of the referencing inline function being used)
by an extern one. That also never got carried out, causing
about two dozen pointless instances of this variable to be
scattered around, likely consuming cache bandwidth here and
there.
Now, rather than exporting an auxiliary variable like this, how
about using an existing, rarely referenced (at least in affected
contexts) but already exported one instead?
empty_zero_page, __supported_pte_mask, x86_platform, or
high_memory might be candidates.
Finally (and this is because I lack the explanation why the
artificial constraint is needed in the first place), why is it that
clts() doesn't need one too?
Thanks, Jan
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2012-07-03 10:07 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <4FF2D6AB020000780008D460@nat28.tlf.novell.com>
2012-07-03 9:55 ` __force_order usage on x86's CRn accesses Glauber Costa
2012-07-03 10:08 ` Jan Beulich
2012-07-03 9:04 Jan Beulich
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.