From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754311AbbK0IrH (ORCPT ); Fri, 27 Nov 2015 03:47:07 -0500
Received: from mail-wm0-f41.google.com ([74.125.82.41]:37380 "EHLO
	mail-wm0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753855AbbK0IrE (ORCPT );
	Fri, 27 Nov 2015 03:47:04 -0500
Date: Fri, 27 Nov 2015 09:47:00 +0100
From: Ingo Molnar
To: "Jason A. Donenfeld"
Cc: Thomas Gleixner , mingo@redhat.com, hpa@zytor.com, LKML
Subject: Re: irq_fpu_usable() is irreliable
Message-ID: <20151127084700.GB26693@gmail.com>
References: <20151118065508.GA18849@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

* Jason A. Donenfeld wrote:

> Intel 3820QM, but inside VMWare Workstation 12.
>
> > Third, could you post such a problematic stack trace?
>
> Sure: https://paste.kde.org/pfhhdchs9/7mmtvb

So it's:

[ 187.194226] CPU: 0 PID: 1165 Comm: iperf3 Tainted: G O 4.2.3-1-ARCH #1
[ 187.194229] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 187.194231] 0000000000000000 0000000062ca03ad ffff88003b82f0d0 ffffffff8156c0ca
[ 187.194233] ffff88003bfa0dc0 0000000000000090 ffff88003b82f260 ffffffffa03fc27e
[ 187.194234] 0000000000000010 ffff88003be05300 0000000000000000 ffff88003b82f3e0
[ 187.194235] Call Trace:
[ 187.194244] [] dump_stack+0x4c/0x6e
[ 187.194248] [] chacha20_avx+0x23e/0x250 [wireguard]
[ 187.194253] [] ? nommu_map_page+0x43/0x80
[ 187.194257] [] ? e1000_xmit_frame+0xdf1/0x11c0 [e1000]
[ 187.194259] [] ? poly1305_update_asm+0x11e/0x1b0 [wireguard]
[ 187.194260] [] chacha20_finish+0x3d/0x60 [wireguard]
[ 187.194262] [] chacha20poly1305_encrypt_finish+0x2e/0xf0 [wireguard]
[ 187.194263] [] noise_message_encrypt+0x162/0x180 [wireguard]
[ 187.194269] [] ? __kmalloc_node_track_caller+0x35/0x2e0
[ 187.194274] [] ? __alloc_skb+0x87/0x210
[ 187.194275] [] ? __kmalloc_reserve.isra.5+0x31/0x90
[ 187.194276] [] ? __alloc_skb+0x5b/0x210
[ 187.194278] [] ? __alloc_skb+0x9b/0x210
[ 187.194279] [] noise_message_create_data+0x55/0x80 [wireguard]
[ 187.194280] [] packet_send_queue+0x1f8/0x4d0 [wireguard]
[ 187.194285] [] ? dequeue_entity+0x149/0x690
[ 187.194287] [] ? put_prev_entity+0x31/0x420
[ 187.194289] [] ? __switch_to+0x25c/0x4a0
[ 187.194291] [] ? finish_task_switch+0x62/0x1b0
[ 187.194292] [] ? __schedule+0x340/0xa00
[ 187.194296] [] ? hrtimer_try_to_cancel+0x29/0x120
[ 187.194298] [] ? add_wait_queue+0x44/0x50
[ 187.194299] [] ? __kmalloc_node_track_caller+0x35/0x2e0
[ 187.194302] [] ? __pollwait+0x7e/0xe0
[ 187.194303] [] ? __alloc_skb+0x87/0x210
[ 187.194304] [] ? __kmalloc_reserve.isra.5+0x31/0x90
[ 187.194305] [] xmit+0x8f/0xe0 [wireguard]
[ 187.194308] [] dev_hard_start_xmit+0x24f/0x3f0
[ 187.194309] [] ? validate_xmit_skb.isra.34.part.35+0x1e/0x2a0
[ 187.194310] [] __dev_queue_xmit+0x4d2/0x540
[ 187.194311] [] dev_queue_xmit_sk+0x13/0x20
[ 187.194313] [] neigh_direct_output+0x12/0x20
[ 187.194315] [] ip_finish_output2+0x1b6/0x3c0
[ 187.194317] [] ? __ip_append_data.isra.3+0x6ae/0xac0
[ 187.194317] [] ip_finish_output+0x13c/0x1d0
[ 187.194318] [] ip_output+0x75/0xe0
[ 187.194319] [] ? ip_make_skb+0x10d/0x130
[ 187.194320] [] ip_local_out_sk+0x31/0x40
[ 187.194321] [] ip_send_skb+0x1a/0x50
[ 187.194323] [] udp_send_skb+0x151/0x280
[ 187.194325] [] udp_sendmsg+0x305/0x9d0
[ 187.194327] [] ? _raw_spin_unlock_bh+0xe/0x10
[ 187.194328] [] inet_sendmsg+0x7f/0xb0
[ 187.194329] [] sock_sendmsg+0x17/0x30
[ 187.194330] [] sock_write_iter+0x85/0xf0
[ 187.194332] [] __vfs_write+0xcc/0x100
[ 187.194333] [] vfs_write+0xa4/0x1a0
[ 187.194334] [] SyS_write+0x55/0xc0
[ 187.194335] [] entry_SYSCALL_64_fastpath+0x12/0x71

so this does not seem to be a very complex stack trace: we are trying to use
the FPU from a regular process, from a regular system call path. No interrupts,
no kernel threads, no complications.

We possibly context switched recently:

[ 187.194285] [] ? dequeue_entity+0x149/0x690
[ 187.194287] [] ? put_prev_entity+0x31/0x420
[ 187.194289] [] ? __switch_to+0x25c/0x4a0
[ 187.194291] [] ? finish_task_switch+0x62/0x1b0
[ 187.194292] [] ? __schedule+0x340/0xa00

but that's all that I can see in the trace.

So as a first step I'd try Linus's very latest kernel, to make sure it's not a
bug that got fixed meanwhile. If it still occurs, try to report it to the
vmware virtualization folks. Maybe it's some host kernel activity that changes
the state of the FPU. I don't know ...

Thanks,

	Ingo
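
A note on reading the trace above: frames printed with a leading "?" are return addresses that were found on the stack but could not be verified as part of the current call chain, so they may be stale leftovers from earlier calls (which is why the scheduler entries only suggest a possible recent context switch). The following is a minimal Python sketch, not a kernel tool, that keeps only the frames without "?" so the actual call path is easier to see. The names FRAME_RE, reliable_frames and SAMPLE are made up for this illustration, and the regular expression assumes the frame layout shown in this message.

```python
import re

# One frame of a kernel call trace, for example:
#   "[ 187.194248] [] chacha20_avx+0x23e/0x250 [wireguard]"
# The first bracket pair is the timestamp, the second holds the address
# (empty in this archived copy). An optional "?" marks a frame that the
# unwinder could not verify.
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+\[[^\]]*\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(trace):
    """Return the symbols of all frames that are not marked with '?'."""
    frames = []
    for match in FRAME_RE.finditer(trace):
        if match.group("unreliable"):
            continue  # skip possibly stale stack entries
        frames.append(match.group("symbol"))
    return frames


# A few lines copied from the trace in this message.
SAMPLE = """\
[ 187.194235] Call Trace:
[ 187.194244] [] dump_stack+0x4c/0x6e
[ 187.194248] [] chacha20_avx+0x23e/0x250 [wireguard]
[ 187.194253] [] ? nommu_map_page+0x43/0x80
[ 187.194260] [] chacha20_finish+0x3d/0x60 [wireguard]
"""

if __name__ == "__main__":
    for symbol in reliable_frames(SAMPLE):
        print(symbol)
```

Run on the full trace, this would leave the plain write() -> UDP -> IP -> wireguard xmit -> chacha20 path, with none of the scheduler entries in it.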