[GIT PULL 0/4] KVM: s390: Fixes for kvm/next (3.19) and stable

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

* [GIT PULL 0/4] KVM: s390: Fixes for kvm/next (3.19) and stable
@ 2014-11-07 11:45 Christian Borntraeger
  2014-11-07 11:45 ` [GIT PULL 1/4] KVM: s390: Fix ipte locking Christian Borntraeger
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Christian Borntraeger @ 2014-11-07 11:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: KVM, Gleb Natapov, Alexander Graf, Cornelia Huck, Jens Freimann,
	linux-s390, Christian Borntraeger

Paolo,

here are some fixes for KVM on s390 for kvm/next. The first two patches
have a stable tag and could also go via kvm/master - if you like.

The following changes since commit 0df1f2487d2f0d04703f142813d53615d62a1da4:

  Linux 3.18-rc3 (2014-11-02 15:01:51 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git  tags/kvm-s390-next-20141107

for you to fetch changes up to 365dc1633521a32d55d839f56b41bb9a531d957a:

----------------------------------------------------------------
KVM: s390: Fixes for kvm/next (3.19) and stable

1. We should flush TLBs for load control instruction emulation (stable)
2. A workaround for a compiler bug that renders ACCESS_ONCE broken (stable)
3. Fix program check handling for load control
4. Documentation Fix

----------------------------------------------------------------
Christian Borntraeger (2):
      KVM: s390: Fix ipte locking
      KVM: s390: flush CPU on load control

Dominik Dingel (1):
      KVM: fix vm device attribute documentation

Heiko Carstens (1):
      KVM: s390: fix handling of lctl[g]/stctl[g]

 Documentation/virtual/kvm/devices/vm.txt | 10 ++---
 arch/s390/kvm/gaccess.c                  | 20 ++++++---
 arch/s390/kvm/priv.c                     | 72 +++++++++++++++-----------------
 3 files changed, 53 insertions(+), 49 deletions(-)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [GIT PULL 1/4] KVM: s390: Fix ipte locking
  2014-11-07 11:45 [GIT PULL 0/4] KVM: s390: Fixes for kvm/next (3.19) and stable Christian Borntraeger
@ 2014-11-07 11:45 ` Christian Borntraeger
  2014-11-10 20:18   ` compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds Christian Borntraeger
  2014-11-07 11:45 ` [GIT PULL 2/4] KVM: s390: flush CPU on load control Christian Borntraeger
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Christian Borntraeger @ 2014-11-07 11:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: KVM, Gleb Natapov, Alexander Graf, Cornelia Huck, Jens Freimann,
	linux-s390, Christian Borntraeger, stable, #, v3.16+

ipte_unlock_siif uses cmpxchg to replace the in-memory data of the ipte
lock together with ACCESS_ONCE for the intial read.

union ipte_control {
        unsigned long val;
        struct {
                unsigned long k  : 1;
                unsigned long kh : 31;
                unsigned long kg : 32;
        };
};
[...]
static void ipte_unlock_siif(struct kvm_vcpu *vcpu)
{
        union ipte_control old, new, *ic;

        ic = &vcpu->kvm->arch.sca->ipte_control;
        do {
                new = old = ACCESS_ONCE(*ic);
                new.kh--;
                if (!new.kh)
                        new.k = 0;
        } while (cmpxchg(&ic->val, old.val, new.val) != old.val);
        if (!new.kh)
                wake_up(&vcpu->kvm->arch.ipte_wq);
}

The new value, is loaded twice from memory with gcc 4.7.2 of
fedora 18, despite the ACCESS_ONCE:

--->

l       %r4,0(%r3)      <--- load first 32 bit of lock (k and kh) in r4
alfi    %r4,2147483647  <--- add -1 to r4
llgtr   %r4,%r4         <--- zero out the sign bit of r4
lg      %r1,0(%r3)      <--- load all 64 bit of lock into new
lgr     %r2,%r1         <--- load the same into old
risbg   %r1,%r4,1,31,32 <--- shift and insert r4 into the bits 1-31 of
new
llihf   %r4,2147483647
ngrk    %r4,%r1,%r4
jne     aa0 <ipte_unlock+0xf8>
nihh    %r1,32767
lgr     %r4,%r2
csg     %r4,%r1,0(%r3)
cgr     %r2,%r4
jne     a70 <ipte_unlock+0xc8>

If the memory value changes between the first load (l) and the second
load (lg) we are broken. If that happens VCPU threads will hang
(unkillable) in handle_ipte_interlock.

Andreas Krebbel analyzed this and tracked it down to a compiler bug in
that version:
"while it is not that obvious the C99 standard basically forbids
duplicating the memory access also in that case. For an argumentation of
a similiar case please see:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22278#c43

For the implementation-defined cases regarding volatile there are some
GCC-specific clarifications which can be found here:
https://gcc.gnu.org/onlinedocs/gcc/Volatiles.html#Volatiles

I've tracked down the problem with a reduced testcase. The problem was
that during a tree level optimization (SRA - scalar replacement of
aggregates) the volatile marker is lost. And an RTL level optimizer (CSE
- common subexpression elimination) then propagated the memory read into
  its second use introducing another access to the memory location. So
indeed Christian's suspicion that the union access has something to do
with it is correct (since it triggered the SRA optimization).

This issue has been reported and fixed in the GCC 4.8 development cycle:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145"

This patch replaces the ACCESS_ONCE scheme with a barrier() based scheme
that should work for all supported compilers.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: stable@vger.kernel.org # v3.16+
---
 arch/s390/kvm/gaccess.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 0f961a1..6dc0ad9 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -229,10 +229,12 @@ static void ipte_lock_simple(struct kvm_vcpu *vcpu)
 		goto out;
 	ic = &vcpu->kvm->arch.sca->ipte_control;
 	do {
-		old = ACCESS_ONCE(*ic);
+		old = *ic;
+		barrier();
 		while (old.k) {
 			cond_resched();
-			old = ACCESS_ONCE(*ic);
+			old = *ic;
+			barrier();
 		}
 		new = old;
 		new.k = 1;
@@ -251,7 +253,9 @@ static void ipte_unlock_simple(struct kvm_vcpu *vcpu)
 		goto out;
 	ic = &vcpu->kvm->arch.sca->ipte_control;
 	do {
-		new = old = ACCESS_ONCE(*ic);
+		old = *ic;
+		barrier();
+		new = old;
 		new.k = 0;
 	} while (cmpxchg(&ic->val, old.val, new.val) != old.val);
 	wake_up(&vcpu->kvm->arch.ipte_wq);
@@ -265,10 +269,12 @@ static void ipte_lock_siif(struct kvm_vcpu *vcpu)
 
 	ic = &vcpu->kvm->arch.sca->ipte_control;
 	do {
-		old = ACCESS_ONCE(*ic);
+		old = *ic;
+		barrier();
 		while (old.kg) {
 			cond_resched();
-			old = ACCESS_ONCE(*ic);
+			old = *ic;
+			barrier();
 		}
 		new = old;
 		new.k = 1;
@@ -282,7 +288,9 @@ static void ipte_unlock_siif(struct kvm_vcpu *vcpu)
 
 	ic = &vcpu->kvm->arch.sca->ipte_control;
 	do {
-		new = old = ACCESS_ONCE(*ic);
+		old = *ic;
+		barrier();
+		new = old;
 		new.kh--;
 		if (!new.kh)
 			new.k = 0;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [GIT PULL 2/4] KVM: s390: flush CPU on load control
  2014-11-07 11:45 [GIT PULL 0/4] KVM: s390: Fixes for kvm/next (3.19) and stable Christian Borntraeger
  2014-11-07 11:45 ` [GIT PULL 1/4] KVM: s390: Fix ipte locking Christian Borntraeger
@ 2014-11-07 11:45 ` Christian Borntraeger
  2014-11-07 11:45 ` [GIT PULL 3/4] KVM: s390: fix handling of lctl[g]/stctl[g] Christian Borntraeger
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Christian Borntraeger @ 2014-11-07 11:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: KVM, Gleb Natapov, Alexander Graf, Cornelia Huck, Jens Freimann,
	linux-s390, Christian Borntraeger, stable

some control register changes will flush some aspects of the CPU, e.g.
POP explicitely mentions that for CR9-CR11 "TLBs may be cleared".
Instead of trying to be clever and only flush on specific CRs, let
play safe and flush on all lctl(g) as future machines might define
new bits in CRs. Load control intercept should not happen that often.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
---
 arch/s390/kvm/priv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 72bb2dd..9c565b6 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -791,7 +791,7 @@ int kvm_s390_handle_lctl(struct kvm_vcpu *vcpu)
 			break;
 		reg = (reg + 1) % 16;
 	} while (1);
-
+	kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 	return 0;
 }
 
@@ -863,7 +863,7 @@ static int handle_lctlg(struct kvm_vcpu *vcpu)
 			break;
 		reg = (reg + 1) % 16;
 	} while (1);
-
+	kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 	return 0;
 }
 
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [GIT PULL 3/4] KVM: s390: fix handling of lctl[g]/stctl[g]
  2014-11-07 11:45 [GIT PULL 0/4] KVM: s390: Fixes for kvm/next (3.19) and stable Christian Borntraeger
  2014-11-07 11:45 ` [GIT PULL 1/4] KVM: s390: Fix ipte locking Christian Borntraeger
  2014-11-07 11:45 ` [GIT PULL 2/4] KVM: s390: flush CPU on load control Christian Borntraeger
@ 2014-11-07 11:45 ` Christian Borntraeger
  2014-11-07 11:45 ` [GIT PULL 4/4] KVM: fix vm device attribute documentation Christian Borntraeger
  2014-11-07 12:07 ` [GIT PULL 0/4] KVM: s390: Fixes for kvm/next (3.19) and stable Paolo Bonzini
  4 siblings, 0 replies; 16+ messages in thread
From: Christian Borntraeger @ 2014-11-07 11:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: KVM, Gleb Natapov, Alexander Graf, Cornelia Huck, Jens Freimann,
	linux-s390, Heiko Carstens, Christian Borntraeger

From: Heiko Carstens <heiko.carstens@de.ibm.com>

According to the architecture all instructions are suppressing if memory
access is prohibited due to DAT protection, unless stated otherwise for
an instruction.
The lctl[g]/stctl[g] implementations handled this incorrectly since
control register handling was done piecemeal, which means they had
terminating instead of suppressing semantics.
This patch fixes this.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kvm/priv.c | 68 +++++++++++++++++++++++++---------------------------
 1 file changed, 32 insertions(+), 36 deletions(-)

diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 9c565b6..9bde32f 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -762,8 +762,8 @@ int kvm_s390_handle_lctl(struct kvm_vcpu *vcpu)
 {
 	int reg1 = (vcpu->arch.sie_block->ipa & 0x00f0) >> 4;
 	int reg3 = vcpu->arch.sie_block->ipa & 0x000f;
-	u32 val = 0;
-	int reg, rc;
+	int reg, rc, nr_regs;
+	u32 ctl_array[16];
 	u64 ga;
 
 	vcpu->stat.instruction_lctl++;
@@ -779,14 +779,15 @@ int kvm_s390_handle_lctl(struct kvm_vcpu *vcpu)
 	VCPU_EVENT(vcpu, 5, "lctl r1:%x, r3:%x, addr:%llx", reg1, reg3, ga);
 	trace_kvm_s390_handle_lctl(vcpu, 0, reg1, reg3, ga);
 
+	nr_regs = ((reg3 - reg1) & 0xf) + 1;
+	rc = read_guest(vcpu, ga, ctl_array, nr_regs * sizeof(u32));
+	if (rc)
+		return kvm_s390_inject_prog_cond(vcpu, rc);
 	reg = reg1;
+	nr_regs = 0;
 	do {
-		rc = read_guest(vcpu, ga, &val, sizeof(val));
-		if (rc)
-			return kvm_s390_inject_prog_cond(vcpu, rc);
 		vcpu->arch.sie_block->gcr[reg] &= 0xffffffff00000000ul;
-		vcpu->arch.sie_block->gcr[reg] |= val;
-		ga += 4;
+		vcpu->arch.sie_block->gcr[reg] |= ctl_array[nr_regs++];
 		if (reg == reg3)
 			break;
 		reg = (reg + 1) % 16;
@@ -799,9 +800,9 @@ int kvm_s390_handle_stctl(struct kvm_vcpu *vcpu)
 {
 	int reg1 = (vcpu->arch.sie_block->ipa & 0x00f0) >> 4;
 	int reg3 = vcpu->arch.sie_block->ipa & 0x000f;
+	int reg, rc, nr_regs;
+	u32 ctl_array[16];
 	u64 ga;
-	u32 val;
-	int reg, rc;
 
 	vcpu->stat.instruction_stctl++;
 
@@ -817,26 +818,24 @@ int kvm_s390_handle_stctl(struct kvm_vcpu *vcpu)
 	trace_kvm_s390_handle_stctl(vcpu, 0, reg1, reg3, ga);
 
 	reg = reg1;
+	nr_regs = 0;
 	do {
-		val = vcpu->arch.sie_block->gcr[reg] &  0x00000000fffffffful;
-		rc = write_guest(vcpu, ga, &val, sizeof(val));
-		if (rc)
-			return kvm_s390_inject_prog_cond(vcpu, rc);
-		ga += 4;
+		ctl_array[nr_regs++] = vcpu->arch.sie_block->gcr[reg];
 		if (reg == reg3)
 			break;
 		reg = (reg + 1) % 16;
 	} while (1);
-
-	return 0;
+	rc = write_guest(vcpu, ga, ctl_array, nr_regs * sizeof(u32));
+	return rc ? kvm_s390_inject_prog_cond(vcpu, rc) : 0;
 }
 
 static int handle_lctlg(struct kvm_vcpu *vcpu)
 {
 	int reg1 = (vcpu->arch.sie_block->ipa & 0x00f0) >> 4;
 	int reg3 = vcpu->arch.sie_block->ipa & 0x000f;
-	u64 ga, val;
-	int reg, rc;
+	int reg, rc, nr_regs;
+	u64 ctl_array[16];
+	u64 ga;
 
 	vcpu->stat.instruction_lctlg++;
 
@@ -848,17 +847,17 @@ static int handle_lctlg(struct kvm_vcpu *vcpu)
 	if (ga & 7)
 		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
 
-	reg = reg1;
-
 	VCPU_EVENT(vcpu, 5, "lctlg r1:%x, r3:%x, addr:%llx", reg1, reg3, ga);
 	trace_kvm_s390_handle_lctl(vcpu, 1, reg1, reg3, ga);
 
+	nr_regs = ((reg3 - reg1) & 0xf) + 1;
+	rc = read_guest(vcpu, ga, ctl_array, nr_regs * sizeof(u64));
+	if (rc)
+		return kvm_s390_inject_prog_cond(vcpu, rc);
+	reg = reg1;
+	nr_regs = 0;
 	do {
-		rc = read_guest(vcpu, ga, &val, sizeof(val));
-		if (rc)
-			return kvm_s390_inject_prog_cond(vcpu, rc);
-		vcpu->arch.sie_block->gcr[reg] = val;
-		ga += 8;
+		vcpu->arch.sie_block->gcr[reg] = ctl_array[nr_regs++];
 		if (reg == reg3)
 			break;
 		reg = (reg + 1) % 16;
@@ -871,8 +870,9 @@ static int handle_stctg(struct kvm_vcpu *vcpu)
 {
 	int reg1 = (vcpu->arch.sie_block->ipa & 0x00f0) >> 4;
 	int reg3 = vcpu->arch.sie_block->ipa & 0x000f;
-	u64 ga, val;
-	int reg, rc;
+	int reg, rc, nr_regs;
+	u64 ctl_array[16];
+	u64 ga;
 
 	vcpu->stat.instruction_stctg++;
 
@@ -884,23 +884,19 @@ static int handle_stctg(struct kvm_vcpu *vcpu)
 	if (ga & 7)
 		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
 
-	reg = reg1;
-
 	VCPU_EVENT(vcpu, 5, "stctg r1:%x, r3:%x, addr:%llx", reg1, reg3, ga);
 	trace_kvm_s390_handle_stctl(vcpu, 1, reg1, reg3, ga);
 
+	reg = reg1;
+	nr_regs = 0;
 	do {
-		val = vcpu->arch.sie_block->gcr[reg];
-		rc = write_guest(vcpu, ga, &val, sizeof(val));
-		if (rc)
-			return kvm_s390_inject_prog_cond(vcpu, rc);
-		ga += 8;
+		ctl_array[nr_regs++] = vcpu->arch.sie_block->gcr[reg];
 		if (reg == reg3)
 			break;
 		reg = (reg + 1) % 16;
 	} while (1);
-
-	return 0;
+	rc = write_guest(vcpu, ga, ctl_array, nr_regs * sizeof(u64));
+	return rc ? kvm_s390_inject_prog_cond(vcpu, rc) : 0;
 }
 
 static const intercept_handler_t eb_handlers[256] = {
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [GIT PULL 4/4] KVM: fix vm device attribute documentation
  2014-11-07 11:45 [GIT PULL 0/4] KVM: s390: Fixes for kvm/next (3.19) and stable Christian Borntraeger
                   ` (2 preceding siblings ...)
  2014-11-07 11:45 ` [GIT PULL 3/4] KVM: s390: fix handling of lctl[g]/stctl[g] Christian Borntraeger
@ 2014-11-07 11:45 ` Christian Borntraeger
  2014-11-07 12:07 ` [GIT PULL 0/4] KVM: s390: Fixes for kvm/next (3.19) and stable Paolo Bonzini
  4 siblings, 0 replies; 16+ messages in thread
From: Christian Borntraeger @ 2014-11-07 11:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: KVM, Gleb Natapov, Alexander Graf, Cornelia Huck, Jens Freimann,
	linux-s390, Dominik Dingel, Christian Borntraeger

From: Dominik Dingel <dingel@linux.vnet.ibm.com>

Documentation uses incorrect attribute names for some vm device
attributes: fix this.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 Documentation/virtual/kvm/devices/vm.txt | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/virtual/kvm/devices/vm.txt b/Documentation/virtual/kvm/devices/vm.txt
index 0d16f96..d426fc8 100644
--- a/Documentation/virtual/kvm/devices/vm.txt
+++ b/Documentation/virtual/kvm/devices/vm.txt
@@ -12,14 +12,14 @@ specific.
 1. GROUP: KVM_S390_VM_MEM_CTRL
 Architectures: s390
 
-1.1. ATTRIBUTE: KVM_S390_VM_MEM_CTRL
+1.1. ATTRIBUTE: KVM_S390_VM_MEM_ENABLE_CMMA
 Parameters: none
-Returns: -EBUSY if already a vcpus is defined, otherwise 0
+Returns: -EBUSY if a vcpu is already defined, otherwise 0
 
-Enables CMMA for the virtual machine
+Enables Collaborative Memory Management Assist (CMMA) for the virtual machine.
 
-1.2. ATTRIBUTE: KVM_S390_VM_CLR_CMMA
-Parameteres: none
+1.2. ATTRIBUTE: KVM_S390_VM_MEM_CLR_CMMA
+Parameters: none
 Returns: 0
 
 Clear the CMMA status for all guest pages, so any pages the guest marked
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [GIT PULL 0/4] KVM: s390: Fixes for kvm/next (3.19) and stable
  2014-11-07 11:45 [GIT PULL 0/4] KVM: s390: Fixes for kvm/next (3.19) and stable Christian Borntraeger
                   ` (3 preceding siblings ...)
  2014-11-07 11:45 ` [GIT PULL 4/4] KVM: fix vm device attribute documentation Christian Borntraeger
@ 2014-11-07 12:07 ` Paolo Bonzini
  4 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2014-11-07 12:07 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: KVM, Gleb Natapov, Alexander Graf, Cornelia Huck, Jens Freimann,
	linux-s390



On 07/11/2014 12:45, Christian Borntraeger wrote:
> Paolo,
> 
> here are some fixes for KVM on s390 for kvm/next. The first two patches
> have a stable tag and could also go via kvm/master - if you like.
> 
> The following changes since commit 0df1f2487d2f0d04703f142813d53615d62a1da4:
> 
>   Linux 3.18-rc3 (2014-11-02 15:01:51 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git  tags/kvm-s390-next-20141107
> 
> for you to fetch changes up to 365dc1633521a32d55d839f56b41bb9a531d957a:
> 
> ----------------------------------------------------------------
> KVM: s390: Fixes for kvm/next (3.19) and stable
> 
> 1. We should flush TLBs for load control instruction emulation (stable)
> 2. A workaround for a compiler bug that renders ACCESS_ONCE broken (stable)
> 3. Fix program check handling for load control
> 4. Documentation Fix
> 
> ----------------------------------------------------------------
> Christian Borntraeger (2):
>       KVM: s390: Fix ipte locking
>       KVM: s390: flush CPU on load control
> 
> Dominik Dingel (1):
>       KVM: fix vm device attribute documentation
> 
> Heiko Carstens (1):
>       KVM: s390: fix handling of lctl[g]/stctl[g]
> 
>  Documentation/virtual/kvm/devices/vm.txt | 10 ++---
>  arch/s390/kvm/gaccess.c                  | 20 ++++++---
>  arch/s390/kvm/priv.c                     | 72 +++++++++++++++-----------------
>  3 files changed, 53 insertions(+), 49 deletions(-)
> 

Pulled to kvm/next (at least locally, I have tests running and will push
to kernel.org next Monday).

Paolo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds
  2014-11-07 11:45 ` [GIT PULL 1/4] KVM: s390: Fix ipte locking Christian Borntraeger
@ 2014-11-10 20:18   ` Christian Borntraeger
  2014-11-10 21:07     ` Linus Torvalds
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Borntraeger @ 2014-11-10 20:18 UTC (permalink / raw)
  To: torvalds
  Cc: Paolo Bonzini, KVM,
	linux-kernel@vger.kernel.org >> Linux Kernel Mailing List,
	Heiko Carstens, Andreas Krebbel, Martin Schwidefsky,
	Cornelia Huck

Linus,

Last week I sent belows patch to Paolo for kvm/next.
Heiko Carstens pointed out that the kernel often does not 
work around compiler bugs, instead we blacklist the 
compiler, make a buildbug on or similar when we cant be 
sure to catch all broken cases, e.g. Heiko mentioned 
the the x86 specific test cases for stack protector.

Now: I can reproduces belows miscompile on gcc46 and gcc 47
gcc 45 seems ok, gcc 48 is fixed.  This makes blacklisting
a bit hard, especially since it is not limited to s390, but
covers all architectures.
In essence ACCESS_ONCE will not work reliably on aggregate 
types with gcc 4.6 and gcc 4.7.
In Linux we seem to use ACCESS_ONCE mostly on scalar types,
below code is an example were we dont - and break.

Linus, what is your take on workarounds of compiler bugs?
The barrier solution below will fix that particular case, but
are there others? Maybe we can come with a clever trick of
creating a build-bug for ACCESS_ONCE on non-scalar types?
A testcase is not trivial, since we have to rely on other
optimization steps to actually do the wrong thing and the 
gcc testcase test will dump the internal tree and check
that - something that does not seem to be ok for the kernel.

Christian

-------- Forwarded Message --------
Subject: [GIT PULL 1/4] KVM: s390: Fix ipte locking
Date: Fri,  7 Nov 2014 12:45:13 +0100
From: Christian Borntraeger <borntraeger@de.ibm.com>
To: Paolo Bonzini <pbonzini@redhat.com>
CC: KVM <kvm@vger.kernel.org>, Gleb Natapov <gleb@kernel.org>, Alexander Graf <agraf@suse.de>, Cornelia Huck <cornelia.huck@de.ibm.com>, Jens Freimann <jfrei@linux.vnet.ibm.com>, linux-s390 <linux-s390@vger.kernel.org>, Christian Borntraeger <borntraeger@de.ibm.com>, stable@vger.kernel.org

ipte_unlock_siif uses cmpxchg to replace the in-memory data of the ipte
lock together with ACCESS_ONCE for the intial read.

union ipte_control {
        unsigned long val;
        struct {
                unsigned long k  : 1;
                unsigned long kh : 31;
                unsigned long kg : 32;
        };
};
[...]
static void ipte_unlock_siif(struct kvm_vcpu *vcpu)
{
        union ipte_control old, new, *ic;

        ic = &vcpu->kvm->arch.sca->ipte_control;
        do {
                new = old = ACCESS_ONCE(*ic);
                new.kh--;
                if (!new.kh)
                        new.k = 0;
        } while (cmpxchg(&ic->val, old.val, new.val) != old.val);
        if (!new.kh)
                wake_up(&vcpu->kvm->arch.ipte_wq);
}

The new value, is loaded twice from memory with gcc 4.7.2 of
fedora 18, despite the ACCESS_ONCE:

--->

l       %r4,0(%r3)      <--- load first 32 bit of lock (k and kh) in r4
alfi    %r4,2147483647  <--- add -1 to r4
llgtr   %r4,%r4         <--- zero out the sign bit of r4
lg      %r1,0(%r3)      <--- load all 64 bit of lock into new
lgr     %r2,%r1         <--- load the same into old
risbg   %r1,%r4,1,31,32 <--- shift and insert r4 into the bits 1-31 of
new
llihf   %r4,2147483647
ngrk    %r4,%r1,%r4
jne     aa0 <ipte_unlock+0xf8>
nihh    %r1,32767
lgr     %r4,%r2
csg     %r4,%r1,0(%r3)
cgr     %r2,%r4
jne     a70 <ipte_unlock+0xc8>

If the memory value changes between the first load (l) and the second
load (lg) we are broken. If that happens VCPU threads will hang
(unkillable) in handle_ipte_interlock.

Andreas Krebbel analyzed this and tracked it down to a compiler bug in
that version:
"while it is not that obvious the C99 standard basically forbids
duplicating the memory access also in that case. For an argumentation of
a similiar case please see:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22278#c43

For the implementation-defined cases regarding volatile there are some
GCC-specific clarifications which can be found here:
https://gcc.gnu.org/onlinedocs/gcc/Volatiles.html#Volatiles

I've tracked down the problem with a reduced testcase. The problem was
that during a tree level optimization (SRA - scalar replacement of
aggregates) the volatile marker is lost. And an RTL level optimizer (CSE
- common subexpression elimination) then propagated the memory read into
  its second use introducing another access to the memory location. So
indeed Christian's suspicion that the union access has something to do
with it is correct (since it triggered the SRA optimization).

This issue has been reported and fixed in the GCC 4.8 development cycle:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145"

This patch replaces the ACCESS_ONCE scheme with a barrier() based scheme
that should work for all supported compilers.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: stable@vger.kernel.org # v3.16+
---
 arch/s390/kvm/gaccess.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 0f961a1..6dc0ad9 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -229,10 +229,12 @@ static void ipte_lock_simple(struct kvm_vcpu *vcpu)
 		goto out;
 	ic = &vcpu->kvm->arch.sca->ipte_control;
 	do {
-		old = ACCESS_ONCE(*ic);
+		old = *ic;
+		barrier();
 		while (old.k) {
 			cond_resched();
-			old = ACCESS_ONCE(*ic);
+			old = *ic;
+			barrier();
 		}
 		new = old;
 		new.k = 1;
@@ -251,7 +253,9 @@ static void ipte_unlock_simple(struct kvm_vcpu *vcpu)
 		goto out;
 	ic = &vcpu->kvm->arch.sca->ipte_control;
 	do {
-		new = old = ACCESS_ONCE(*ic);
+		old = *ic;
+		barrier();
+		new = old;
 		new.k = 0;
 	} while (cmpxchg(&ic->val, old.val, new.val) != old.val);
 	wake_up(&vcpu->kvm->arch.ipte_wq);
@@ -265,10 +269,12 @@ static void ipte_lock_siif(struct kvm_vcpu *vcpu)

 	ic = &vcpu->kvm->arch.sca->ipte_control;
 	do {
-		old = ACCESS_ONCE(*ic);
+		old = *ic;
+		barrier();
 		while (old.kg) {
 			cond_resched();
-			old = ACCESS_ONCE(*ic);
+			old = *ic;
+			barrier();
 		}
 		new = old;
 		new.k = 1;
@@ -282,7 +288,9 @@ static void ipte_unlock_siif(struct kvm_vcpu *vcpu)

 	ic = &vcpu->kvm->arch.sca->ipte_control;
 	do {
-		new = old = ACCESS_ONCE(*ic);
+		old = *ic;
+		barrier();
+		new = old;
 		new.kh--;
 		if (!new.kh)
 			new.k = 0;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds
  2014-11-10 20:18   ` compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds Christian Borntraeger
@ 2014-11-10 21:07     ` Linus Torvalds
  2014-11-11  0:37       ` Paul E. McKenney
                         ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Linus Torvalds @ 2014-11-10 21:07 UTC (permalink / raw)
  To: Christian Borntraeger, Paul McKenney, Ingo Molnar
  Cc: Paolo Bonzini, KVM, Linux Kernel Mailing List, Heiko Carstens,
	Andreas Krebbel, Martin Schwidefsky, Cornelia Huck,
	linux-arch@vger.kernel.org

On Mon, Nov 10, 2014 at 12:18 PM, Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
>
> Now: I can reproduces belows miscompile on gcc46 and gcc 47
> gcc 45 seems ok, gcc 48 is fixed.  This makes blacklisting
> a bit hard, especially since it is not limited to s390, but
> covers all architectures.
> In essence ACCESS_ONCE will not work reliably on aggregate
> types with gcc 4.6 and gcc 4.7.
> In Linux we seem to use ACCESS_ONCE mostly on scalar types,
> below code is an example were we dont - and break.

Hmm. I think we should see how painful it would be to make it a rule
that ACCESS_ONCE() only works on scalar types.

Even in the actual code you show as an example, the "fix" is really to
use the "unsigned long val" member of the union for the ACCESS_ONCE().
And that seems to be true in many other cases too.

So before blacklisting any compilers, let's first see if

 (a) we can actually make it a real rule that we only use ACCESS_ONCE on scalars
 (b) we can somehow enforce this with a compiler warning/error for mis-uses

For example, the attached patch works for some cases, but shows how we
use ACCESS_ONCE() on pointers to pte_t's etc, so it doesn't come even
close to compiling the whole kernel. But I wonder how painful that
would be to change.. The places where it complains are actually
somewhat debatable to begin with, like:

 - handle_pte_fault(.. pte_t *pte ..):

        entry = ACCESS_ONCE(*pte);

and the thing is, "pte" is actually possibly an 8-byte entity on
x86-32, and that ACCESS_ONCE() fundamentally will be two 32-byte
reads.

So there is a very valid argument for saying "well, you shouldn't do
that, then", and that we might be better off cleaning up our
ACCESS_ONCE() uses, than to just blindly blacklist compilers.

NOTE! I'm not at all advocating the attached patch. I'm sending it out
white-space damaged on purpose, it's more of a "hey, something like
this might be the direction we want to go in", with the spinlock.h
part of the patch also acting as an example of the kind of changes the
"ACCESS_ONCE() only works on scalars" rule would require.

So I do agree with Heiko that we generally don't want to work around
compiler bugs if we can avoid it. But sometimes the compiler bugs do
end up saying "your'e doing something very fragile". Maybe we should
try to be less fragile here.

And in your example, the whole

        old = ACCESS_ONCE(*ic);

*could* just be a

        old->val = ACCESS_ONCE(ic->val);

the same way the x86 spinlock.h changes below.

I did *not* try to see how many other cases we have. It's possible
that your "don't use ACCESS_ONCE, use a barrier() instead" ends up
being a valid workaround. For the pte case, that may well be the
solution, for example (because what we really care about is not so
much "it's an atomic access" but "it's still the same that we
originally assumed").  Sometimes we want ACCESS_ONCE() because we
really want an atomic value (and we just don't care if it's old or
new), but sometimes it's really because we don't want the compiler to
re-load it and possibly see two different values - one that we check,
and one that we actually use (and then a barrier() would generally be
perfectly sufficient)

Adding some more people to the discussion just to see if anybody else
has comments about ACCESS_ONCE() on aggregate types.

(Btw, it's not just aggregate types, even non-aggregate types like
"long long" are not necessarily safe, to give the same 64-bit on
x86-32 example. So adding an assert that it's smaller or equal in size
to a "long" might also not be unreasonable)

                   Linus

---
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index d5ad7b1118fc..63e82f1dfc1a 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -378,7 +378,11 @@ void ftrace_likely_update(struct
ftrace_branch_data *f, int val, int expect);
  * use is to mediate communication between process-level code and irq/NMI
  * handlers, all running on the same CPU.
  */
-#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
+#define get_scalar_volatile_pointer(x) ({ \
+       typeof(x) *__p = &(x); \
+       volatile typeof(x) *__vp = __p; \
+       (void)(long)*__p; __vp; })
+#define ACCESS_ONCE(x) (*get_scalar_volatile_pointer(x))

 /* Ignore/forbid kprobes attach on very low level functions marked by
this attribute: */
 #ifdef CONFIG_KPROBES
diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 9295016485c9..b7e6825af5e3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -105,7 +105,7 @@ static __always_inline int
arch_spin_trylock(arch_spinlock_t *lock)
 {
        arch_spinlock_t old, new;

-       old.tickets = ACCESS_ONCE(lock->tickets);
+       old.head_tail = ACCESS_ONCE(lock->head_tail);
        if (old.tickets.head != (old.tickets.tail & ~TICKET_SLOWPATH_FLAG))
                return 0;

@@ -162,16 +162,14 @@ static __always_inline void
arch_spin_unlock(arch_spinlock_t *lock)

 static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
-       struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
-
-       return tmp.tail != tmp.head;
+       struct arch_spinlock tmp = { .head_tail =
ACCESS_ONCE(lock->head_tail) };
+       return tmp.tickets.tail != tmp.tickets.head;
 }

 static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
-       struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
-
-       return (__ticket_t)(tmp.tail - tmp.head) > TICKET_LOCK_INC;
+       struct arch_spinlock tmp = { .head_tail =
ACCESS_ONCE(lock->head_tail) };
+       return (__ticket_t)(tmp.tickets.tail - tmp.tickets.head) >
TICKET_LOCK_INC;
 }
 #define arch_spin_is_contended arch_spin_is_contended

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds
  2014-11-10 21:07     ` Linus Torvalds
@ 2014-11-11  0:37       ` Paul E. McKenney
  2014-11-11 21:16       ` Christian Borntraeger
  2014-11-20 11:39       ` Christian Borntraeger
  2 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2014-11-11  0:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christian Borntraeger, Ingo Molnar, Paolo Bonzini, KVM,
	Linux Kernel Mailing List, Heiko Carstens, Andreas Krebbel,
	Martin Schwidefsky, Cornelia Huck, linux-arch@vger.kernel.org

On Mon, Nov 10, 2014 at 01:07:33PM -0800, Linus Torvalds wrote:
> On Mon, Nov 10, 2014 at 12:18 PM, Christian Borntraeger
> <borntraeger@de.ibm.com> wrote:
> >
> > Now: I can reproduces belows miscompile on gcc46 and gcc 47
> > gcc 45 seems ok, gcc 48 is fixed.  This makes blacklisting
> > a bit hard, especially since it is not limited to s390, but
> > covers all architectures.
> > In essence ACCESS_ONCE will not work reliably on aggregate
> > types with gcc 4.6 and gcc 4.7.
> > In Linux we seem to use ACCESS_ONCE mostly on scalar types,
> > below code is an example were we dont - and break.
> 
> Hmm. I think we should see how painful it would be to make it a rule
> that ACCESS_ONCE() only works on scalar types.

For whatever it is worth, I have been assuming that ACCESS_ONCE() was
only ever supposed to work on scalar types.  And one of the reasons
that I have been giving the pre-EV56 Alpha guys a hard time is because
I would like us to be able to continue using ACCESS_ONCE() on 8-bit and
16-bit quantities as well.

> Even in the actual code you show as an example, the "fix" is really to
> use the "unsigned long val" member of the union for the ACCESS_ONCE().
> And that seems to be true in many other cases too.
> 
> So before blacklisting any compilers, let's first see if
> 
>  (a) we can actually make it a real rule that we only use ACCESS_ONCE on scalars
>  (b) we can somehow enforce this with a compiler warning/error for mis-uses
> 
> For example, the attached patch works for some cases, but shows how we
> use ACCESS_ONCE() on pointers to pte_t's etc, so it doesn't come even
> close to compiling the whole kernel. But I wonder how painful that
> would be to change.. The places where it complains are actually
> somewhat debatable to begin with, like:
> 
>  - handle_pte_fault(.. pte_t *pte ..):
> 
>         entry = ACCESS_ONCE(*pte);
> 
> and the thing is, "pte" is actually possibly an 8-byte entity on
> x86-32, and that ACCESS_ONCE() fundamentally will be two 32-byte
> reads.
> 
> So there is a very valid argument for saying "well, you shouldn't do
> that, then", and that we might be better off cleaning up our
> ACCESS_ONCE() uses, than to just blindly blacklist compilers.
> 
> NOTE! I'm not at all advocating the attached patch. I'm sending it out
> white-space damaged on purpose, it's more of a "hey, something like
> this might be the direction we want to go in", with the spinlock.h
> part of the patch also acting as an example of the kind of changes the
> "ACCESS_ONCE() only works on scalars" rule would require.
> 
> So I do agree with Heiko that we generally don't want to work around
> compiler bugs if we can avoid it. But sometimes the compiler bugs do
> end up saying "your'e doing something very fragile". Maybe we should
> try to be less fragile here.
> 
> And in your example, the whole
> 
>         old = ACCESS_ONCE(*ic);
> 
> *could* just be a
> 
>         old->val = ACCESS_ONCE(ic->val);
> 
> the same way the x86 spinlock.h changes below.
> 
> I did *not* try to see how many other cases we have. It's possible
> that your "don't use ACCESS_ONCE, use a barrier() instead" ends up
> being a valid workaround. For the pte case, that may well be the
> solution, for example (because what we really care about is not so
> much "it's an atomic access" but "it's still the same that we
> originally assumed").  Sometimes we want ACCESS_ONCE() because we
> really want an atomic value (and we just don't care if it's old or
> new), but sometimes it's really because we don't want the compiler to
> re-load it and possibly see two different values - one that we check,
> and one that we actually use (and then a barrier() would generally be
> perfectly sufficient)
> 
> Adding some more people to the discussion just to see if anybody else
> has comments about ACCESS_ONCE() on aggregate types.
> 
> (Btw, it's not just aggregate types, even non-aggregate types like
> "long long" are not necessarily safe, to give the same 64-bit on
> x86-32 example. So adding an assert that it's smaller or equal in size
> to a "long" might also not be unreasonable)

Good point on "long long" on 32-bit systems.

Another reason to avoid trying to do anything that even smells atomic on
non-machine-sized/aligned variables is that the compiler guys' current
reaction to this sort of situation is "No problem!!!  The compiler can
just invent a lock to guard all such accesses!"  I don't think that we
want to go there.

>                    Linus
> 
> ---
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index d5ad7b1118fc..63e82f1dfc1a 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -378,7 +378,11 @@ void ftrace_likely_update(struct
> ftrace_branch_data *f, int val, int expect);
>   * use is to mediate communication between process-level code and irq/NMI
>   * handlers, all running on the same CPU.
>   */
> -#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> +#define get_scalar_volatile_pointer(x) ({ \
> +       typeof(x) *__p = &(x); \
> +       volatile typeof(x) *__vp = __p; \
> +       (void)(long)*__p; __vp; })
> +#define ACCESS_ONCE(x) (*get_scalar_volatile_pointer(x))

I know you said that this was to be experimental, but it happily loads
from a "long long" on 32-bit x86 running gcc version 4.6.3, and does it
32 bits at a time.  How about something like the following instead?

#define get_scalar_volatile_pointer(x) ({ \
	typeof(x) *__p = &(x); \
	BUILD_BUG_ON(sizeof(x) != sizeof(char) && \
		     sizeof(x) != sizeof(short) && \
		     sizeof(x) != sizeof(int) && \
		     sizeof(x) != sizeof(long)); \
	volatile typeof(x) *__vp = __p; \
	(void)(long)*__p; __vp; })
#define ACCESS_ONCE(x) (*get_scalar_volatile_pointer(x))

							Thanx, Paul

> 
>  /* Ignore/forbid kprobes attach on very low level functions marked by
> this attribute: */
>  #ifdef CONFIG_KPROBES
> diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
> index 9295016485c9..b7e6825af5e3 100644
> --- a/arch/x86/include/asm/spinlock.h
> +++ b/arch/x86/include/asm/spinlock.h
> @@ -105,7 +105,7 @@ static __always_inline int
> arch_spin_trylock(arch_spinlock_t *lock)
>  {
>         arch_spinlock_t old, new;
> 
> -       old.tickets = ACCESS_ONCE(lock->tickets);
> +       old.head_tail = ACCESS_ONCE(lock->head_tail);
>         if (old.tickets.head != (old.tickets.tail & ~TICKET_SLOWPATH_FLAG))
>                 return 0;
> 
> @@ -162,16 +162,14 @@ static __always_inline void
> arch_spin_unlock(arch_spinlock_t *lock)
> 
>  static inline int arch_spin_is_locked(arch_spinlock_t *lock)
>  {
> -       struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
> -
> -       return tmp.tail != tmp.head;
> +       struct arch_spinlock tmp = { .head_tail =
> ACCESS_ONCE(lock->head_tail) };
> +       return tmp.tickets.tail != tmp.tickets.head;
>  }
> 
>  static inline int arch_spin_is_contended(arch_spinlock_t *lock)
>  {
> -       struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
> -
> -       return (__ticket_t)(tmp.tail - tmp.head) > TICKET_LOCK_INC;
> +       struct arch_spinlock tmp = { .head_tail =
> ACCESS_ONCE(lock->head_tail) };
> +       return (__ticket_t)(tmp.tickets.tail - tmp.tickets.head) >
> TICKET_LOCK_INC;
>  }
>  #define arch_spin_is_contended arch_spin_is_contended
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds
  2014-11-10 21:07     ` Linus Torvalds
  2014-11-11  0:37       ` Paul E. McKenney
@ 2014-11-11 21:16       ` Christian Borntraeger
  2014-11-12  0:33         ` Linus Torvalds
  2014-11-20 11:39       ` Christian Borntraeger
  2 siblings, 1 reply; 16+ messages in thread
From: Christian Borntraeger @ 2014-11-11 21:16 UTC (permalink / raw)
  To: Linus Torvalds, Paul McKenney, Ingo Molnar
  Cc: Paolo Bonzini, KVM, Linux Kernel Mailing List, Heiko Carstens,
	Andreas Krebbel, Martin Schwidefsky, Cornelia Huck,
	linux-arch@vger.kernel.org

Am 10.11.2014 um 22:07 schrieb Linus Torvalds:
> On Mon, Nov 10, 2014 at 12:18 PM, Christian Borntraeger
> <borntraeger@de.ibm.com> wrote:
>>
>> Now: I can reproduces belows miscompile on gcc46 and gcc 47
>> gcc 45 seems ok, gcc 48 is fixed.  This makes blacklisting
>> a bit hard, especially since it is not limited to s390, but
>> covers all architectures.
>> In essence ACCESS_ONCE will not work reliably on aggregate
>> types with gcc 4.6 and gcc 4.7.
>> In Linux we seem to use ACCESS_ONCE mostly on scalar types,
>> below code is an example were we dont - and break.
> 
> Hmm. I think we should see how painful it would be to make it a rule
> that ACCESS_ONCE() only works on scalar types.
> 
> Even in the actual code you show as an example, the "fix" is really to
> use the "unsigned long val" member of the union for the ACCESS_ONCE().
> And that seems to be true in many other cases too.

Yes, using the val like in 
-               new = old = ACCESS_ONCE(*ic);
+               new.val = old.val = ACCESS_ONCE(ic->val);

does solve the problem as well. In fact, gcc does create the same binary
code on my 4.7.2.

Are you ok with the patch as is in kvm/next for the time being or shall
we revert that and replace it with the .val scheme?

We can also do the cleanup later on if we manage to get your initial patch
into a shape that works out.

Christian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds
  2014-11-11 21:16       ` Christian Borntraeger
@ 2014-11-12  0:33         ` Linus Torvalds
  2014-11-12  0:36           ` Linus Torvalds
  0 siblings, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2014-11-12  0:33 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Paul McKenney, Ingo Molnar, Paolo Bonzini, KVM,
	Linux Kernel Mailing List, Heiko Carstens, Andreas Krebbel,
	Martin Schwidefsky, Cornelia Huck, linux-arch@vger.kernel.org

On Tue, Nov 11, 2014 at 1:16 PM, Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
>
> Are you ok with the patch as is in kvm/next for the time being or shall
> we revert that and replace it with the .val scheme?

Is that the one that was quoted at the beginning of the thread, that
uses barrier()?

I guess as a workaround it is fine, as long as we don't lose sight of
trying to eventually do a better job.

                     Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds
  2014-11-12  0:33         ` Linus Torvalds
@ 2014-11-12  0:36           ` Linus Torvalds
  2014-11-12  8:05             ` Christian Borntraeger
  2014-11-12  9:28             ` Martin Schwidefsky
  0 siblings, 2 replies; 16+ messages in thread
From: Linus Torvalds @ 2014-11-12  0:36 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Paul McKenney, Ingo Molnar, Paolo Bonzini, KVM,
	Linux Kernel Mailing List, Heiko Carstens, Andreas Krebbel,
	Martin Schwidefsky, Cornelia Huck, linux-arch@vger.kernel.org

On Tue, Nov 11, 2014 at 4:33 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I guess as a workaround it is fine, as long as we don't lose sight of
> trying to eventually do a better job.

Oh, and when it comes to the actual gcc bug - do you have any reason
to believe that it's somehow triggered more easily by something
particular in the arch/s390/kvm/gaccess.c code?

IOW, why does this problem not hit the x86 spinlocks that also use
volatile pointers to aggregate types? Or does it?

                       Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds
  2014-11-12  0:36           ` Linus Torvalds
@ 2014-11-12  8:05             ` Christian Borntraeger
  2014-11-12  9:28             ` Martin Schwidefsky
  1 sibling, 0 replies; 16+ messages in thread
From: Christian Borntraeger @ 2014-11-12  8:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul McKenney, Ingo Molnar, Paolo Bonzini, KVM,
	Linux Kernel Mailing List, Heiko Carstens, Andreas Krebbel,
	Martin Schwidefsky, Cornelia Huck, linux-arch@vger.kernel.org

Am 12.11.2014 um 01:36 schrieb Linus Torvalds:
> On Tue, Nov 11, 2014 at 4:33 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> I guess as a workaround it is fine, as long as we don't lose sight of
>> trying to eventually do a better job.
> 
> Oh, and when it comes to the actual gcc bug - do you have any reason
> to believe that it's somehow triggered more easily by something
> particular in the arch/s390/kvm/gaccess.c code?

Yes there are reasons. First of all the bug if SRA removes the volatile tag, but that does not mean that this breaks things. As long as the operation is simple enough things will be mostly ok. If things are not simple like in gaccess things get more complicated. Lets look at the ipte lock. The lock itself consists of 3 parts: k (1 bit:locked), kh(31bit counter for the host) and kg(32 bit counter for the millicode when doing specific guest instructions). There are 3 valid states
1. k=0, kh=0; kg=0
2. k=1, kh!=0, kg=0
3. k=1, kh=0, kg!=0

So the host code must check if the guest counter is zero and can then set the k bit to one and increase the counter. (for unlock it has to decrement kh and if that becomes zero also zero the k bit)
That means that we have multiple accesses to subcomponents. As the host counter is bit 1-31 (ibm speak, linux speak bit 32-62) gcc wants to use the load thirty one bit instruction. 
So far so good. The ticket lock is also not a trivial set/clear bit.

Now: In gcc the memory costs for s390 are modeled to have the same costs as register accesses (TARGET_MEMORY_MOVE_COST==1, TARGET_REGISTER_MOVE_COST=1)
So for gcc a re-loading of a part of the lock from memory costs the same as loading it from a register. That probably triggered that bug.

Christian

> 
> IOW, why does this problem not hit the x86 spinlocks that also use
> volatile pointers to aggregate types? Or does it?

I think we would have noticed if that hits. But there are certainly cases where this bug triggers also on x86, see
the initial bug report of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 This bug is certainly different, (instead of transforming one load into multiple loads , it combines multiple write into one) but it shows, that a volatile marker is removed.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds
  2014-11-12  0:36           ` Linus Torvalds
  2014-11-12  8:05             ` Christian Borntraeger
@ 2014-11-12  9:28             ` Martin Schwidefsky
  1 sibling, 0 replies; 16+ messages in thread
From: Martin Schwidefsky @ 2014-11-12  9:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Christian Borntraeger, Paul McKenney, Ingo Molnar, Paolo Bonzini,
	KVM, Linux Kernel Mailing List, Heiko Carstens, Andreas Krebbel,
	Cornelia Huck, linux-arch@vger.kernel.org

On Tue, 11 Nov 2014 16:36:06 -0800
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Nov 11, 2014 at 4:33 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I guess as a workaround it is fine, as long as we don't lose sight of
> > trying to eventually do a better job.
> 
> Oh, and when it comes to the actual gcc bug - do you have any reason
> to believe that it's somehow triggered more easily by something
> particular in the arch/s390/kvm/gaccess.c code?
> 
> IOW, why does this problem not hit the x86 spinlocks that also use
> volatile pointers to aggregate types? Or does it?

This looks similiar to what we had on s390:

	old.tickets = ACCESS_ONCE(lock->tickets)

In theory x86 should be affected as well. On s390 we have lots of
instruction that operate on memory and the cost model of gcc makes
the compiler more inclined to access memory multiple times. My
guess would be that once the value is cached in a register the
cost model for x86 will usually make sure that the value is not
read a second time. But this is no guarantee.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds
  2014-11-10 21:07     ` Linus Torvalds
  2014-11-11  0:37       ` Paul E. McKenney
  2014-11-11 21:16       ` Christian Borntraeger
@ 2014-11-20 11:39       ` Christian Borntraeger
  2014-11-20 20:30         ` Linus Torvalds
  2 siblings, 1 reply; 16+ messages in thread
From: Christian Borntraeger @ 2014-11-20 11:39 UTC (permalink / raw)
  To: Linus Torvalds, Paul McKenney, Ingo Molnar
  Cc: Paolo Bonzini, KVM, Linux Kernel Mailing List, Heiko Carstens,
	Andreas Krebbel, Martin Schwidefsky, Cornelia Huck,
	linux-arch@vger.kernel.org

Am 10.11.2014 um 22:07 schrieb Linus Torvalds:
[...]
> So before blacklisting any compilers, let's first see if
> 
>  (a) we can actually make it a real rule that we only use ACCESS_ONCE on scalars
>  (b) we can somehow enforce this with a compiler warning/error for mis-uses
> 
> For example, the attached patch works for some cases, but shows how we
> use ACCESS_ONCE() on pointers to pte_t's etc, so it doesn't come even
> close to compiling the whole kernel. But I wonder how painful that
> would be to change.. The places where it complains are actually
> somewhat debatable to begin with, like:
> 
>  - handle_pte_fault(.. pte_t *pte ..):
> 
>         entry = ACCESS_ONCE(*pte);
> 
> and the thing is, "pte" is actually possibly an 8-byte entity on
> x86-32, and that ACCESS_ONCE() fundamentally will be two 32-byte
> reads.
> 
> So there is a very valid argument for saying "well, you shouldn't do
> that, then", and that we might be better off cleaning up our
> ACCESS_ONCE() uses, than to just blindly blacklist compilers.
> 
> NOTE! I'm not at all advocating the attached patch. I'm sending it out
> white-space damaged on purpose, it's more of a "hey, something like
> this might be the direction we want to go in", with the spinlock.h
> part of the patch also acting as an example of the kind of changes the
> "ACCESS_ONCE() only works on scalars" rule would require.

So I tried to see if I can come up with some results on how often this problem happens...

[...]


> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index d5ad7b1118fc..63e82f1dfc1a 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -378,7 +378,11 @@ void ftrace_likely_update(struct
> ftrace_branch_data *f, int val, int expect);
>   * use is to mediate communication between process-level code and irq/NMI
>   * handlers, all running on the same CPU.
>   */
> -#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> +#define get_scalar_volatile_pointer(x) ({ \
> +       typeof(x) *__p = &(x); \
> +       volatile typeof(x) *__vp = __p; \
> +       (void)(long)*__p; __vp; })
> +#define ACCESS_ONCE(x) (*get_scalar_volatile_pointer(x))

..and just took this patch. On s390 is pretty much clean with allyesconfig
In fact with the siif lock changed only the pte/pmd cases you mentioned trigger a compile error:

mm/memory.c: In function 'handle_pte_fault':
mm/memory.c:3203:2: error: aggregate value used where an integer was expected
  entry = ACCESS_ONCE(*pte);

mm/rmap.c: In function 'mm_find_pmd':
mm/rmap.c:584:2: error: aggregate value used where an integer was expected
  pmde = ACCESS_ONCE(*pmd);


Here a barrier() might be a good solution as well, I guess.
On x86 allyesconfig its almost the same.
- we need your spinlock changes (well, something different to make it compile)
- we need to fix pmd and pte
- we have gup_get_pte in arch/x86/mm/gup.c getting a ptep

So It looks like we could make a change to ACCESS_ONCE. Would something like

CONFIG_ARCH_SCALAR_ACCESS_ONCE be a good start?

This would boil down to
Patch1: Provide stricter ACCESS_ONCE if CONFIG_ARCH_SCALAR_ACCESS_ONCE is set + docu update + comments
Patch2: Change mm/* to barriers
Patch3: Change x86 locks
Patch4: Change x86 gup
Patch4: Enable CONFIG_ARCH_SCALAR_ACCESS_ONCE for s390x and x86

Makes sense?

Christian

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds
  2014-11-20 11:39       ` Christian Borntraeger
@ 2014-11-20 20:30         ` Linus Torvalds
  0 siblings, 0 replies; 16+ messages in thread
From: Linus Torvalds @ 2014-11-20 20:30 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Paul McKenney, Ingo Molnar, Paolo Bonzini, KVM,
	Linux Kernel Mailing List, Heiko Carstens, Andreas Krebbel,
	Martin Schwidefsky, Cornelia Huck, linux-arch@vger.kernel.org

On Thu, Nov 20, 2014 at 3:39 AM, Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
>
> So It looks like we could make a change to ACCESS_ONCE. Would something like
>
> CONFIG_ARCH_SCALAR_ACCESS_ONCE be a good start?

No, if it's just a handful of places to be fixed, let's not add config
options for broken cases.

> This would boil down to
> Patch1: Provide stricter ACCESS_ONCE if CONFIG_ARCH_SCALAR_ACCESS_ONCE is set + docu update + comments
> Patch2: Change mm/* to barriers
> Patch3: Change x86 locks
> Patch4: Change x86 gup
> Patch4: Enable CONFIG_ARCH_SCALAR_ACCESS_ONCE for s390x and x86

Just do patches 2-4 first, and then patch 1 unconditionally.

Obviously you'd need to spread the word on linux-arch to see how bad
it is for other cases, but if other architectures are at all like x86
and s390, and just require a few trivial patches, let's not make this
some config option.

                   Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-11-20 20:30 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-07 11:45 [GIT PULL 0/4] KVM: s390: Fixes for kvm/next (3.19) and stable Christian Borntraeger
2014-11-07 11:45 ` [GIT PULL 1/4] KVM: s390: Fix ipte locking Christian Borntraeger
2014-11-10 20:18   ` compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds Christian Borntraeger
2014-11-10 21:07     ` Linus Torvalds
2014-11-11  0:37       ` Paul E. McKenney
2014-11-11 21:16       ` Christian Borntraeger
2014-11-12  0:33         ` Linus Torvalds
2014-11-12  0:36           ` Linus Torvalds
2014-11-12  8:05             ` Christian Borntraeger
2014-11-12  9:28             ` Martin Schwidefsky
2014-11-20 11:39       ` Christian Borntraeger
2014-11-20 20:30         ` Linus Torvalds
2014-11-07 11:45 ` [GIT PULL 2/4] KVM: s390: flush CPU on load control Christian Borntraeger
2014-11-07 11:45 ` [GIT PULL 3/4] KVM: s390: fix handling of lctl[g]/stctl[g] Christian Borntraeger
2014-11-07 11:45 ` [GIT PULL 4/4] KVM: fix vm device attribute documentation Christian Borntraeger
2014-11-07 12:07 ` [GIT PULL 0/4] KVM: s390: Fixes for kvm/next (3.19) and stable Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox