From mboxrd@z Thu Jan 1 00:00:00 1970
From: simon.marchi@ericsson.com (Simon Marchi)
Date: Mon, 30 May 2016 13:48:11 -0400
Subject: Possible race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM?
Message-ID: <574C7CDB.7050103@ericsson.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

Hello knowledgeable ARM people!

(Background: https://sourceware.org/ml/gdb/2016-05/msg00020.html )

Debugging a flaky GDB test case on ARM led me to think there might be a
race between PTRACE_SETVFPREGS and PTRACE_CONT on ARM (PTRACE_SETVFPREGS
is ARM-specific anyway).  The test case (and the reproducer below)
changes the value of a VFP register (let's say d0) using
PTRACE_SETVFPREGS and resumes the thread with PTRACE_CONT.  Intermittently,
the thread resumes execution with the old value in d0 instead of the new
one.

Here is a minimal reproducing example.

test.S:

.global _start
_start:
	vldr.64 d0, constant
	vldr.64 d1, constant
break_here:
	vcmp.f64 d0, d1
	vmrs APSR_nzcv, fpscr

	# Exit code
	moveq r0, #1
	movne r0, #0

	# Exit syscall
	mov r7, #1
	svc 0

.align 8
constant:
	.word 0xc8b43958
	.word 0x40594676

Built with:

  $ gcc -g3 -O0 -o test test.S -nostdlib

And the gdb script, test.gdb:

file test
b break_here
run
p $d0 = 4.0
c

The test is run with:

  $ ./gdb -nx -x test.gdb -batch

The test loads the same constant in d0 and d1.  It then compares them
and exits with 1 (failure) if they are the same, 0 (success) if they are
different.  The GDB script breaks at "break_here", tries to change the
value of d0 to some other constant (4.0) and lets the program continue
and exit.  If our register write succeeded, the program should exit with
0 (values are different).  If our register write failed, the program
will exit with 1 (values are still the same).

The result is that I randomly see both cases, hinting at a race between
the register write and the time when the kernel restores the thread's
VFP registers.
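Because the failure is intermittent, a single run says little; the two
outcomes have to be counted over many runs.  What follows is only a
minimal sketch of such a loop and is not part of the reproducer itself.
It assumes GDB reports the inferior's exit in batch mode as "exited
normally" (status 0) or "exited with code NN" (non-zero status), and the
helper names classify_run and tally_runs are hypothetical.

```python
import re
import subprocess
from collections import Counter

# Command line taken from the report; run from the directory that holds
# the "test" binary and "test.gdb".
GDB_CMD = ["./gdb", "-nx", "-x", "test.gdb", "-batch"]

# Assumption: GDB prints "exited normally" for exit status 0 and
# "exited with code NN" for a non-zero status.
_EXIT_RE = re.compile(r"exited (normally|with code (\d+))")


def classify_run(gdb_output):
    """Classify one GDB transcript.

    Returns "new" if the program saw the written d0 (exit status 0),
    "old" if it saw the stale d0 (exit status 1), "unknown" otherwise.
    """
    match = _EXIT_RE.search(gdb_output)
    if match is None:
        return "unknown"
    if match.group(1) == "normally":
        return "new"
    if int(match.group(2)) == 1:
        return "old"
    return "unknown"


def tally_runs(runs, cmd=GDB_CMD):
    """Run the command `runs` times and count how each run was classified."""
    counts = Counter()
    for _ in range(runs):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        counts[classify_run(proc.stdout)] += 1
    return counts
```

Calling tally_runs(100) and printing the result gives a rough failure
rate.  Launching the same script under "taskset -c 0" would be one way to
compare against the single-core case.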
Note that when GDB's affinity is pinned to a single core, I do not see
the failure.  Also, note that when I remove the vldr.64 instructions, I
can't seem to reproduce the problem, so it looks like they are somehow
important.

I see this behavior on 3 different boards:

 - ODroid XU-4, kernel 3.10.96
 - Firefly RK3288, kernel 3.10.0
 - Raspberry Pi 2, kernel 4.4.8

Any ideas about this problem?

Thanks,

Simon