From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754311AbbK0IrH (ORCPT <rfc822;w@1wt.eu>);
	Fri, 27 Nov 2015 03:47:07 -0500
Received: from mail-wm0-f41.google.com ([74.125.82.41]:37380 "EHLO
	mail-wm0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753855AbbK0IrE (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Fri, 27 Nov 2015 03:47:04 -0500
Date: Fri, 27 Nov 2015 09:47:00 +0100
From: Ingo Molnar <mingo@kernel.org>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Thomas Gleixner <tglx@linutronix.de>, mingo@redhat.com, hpa@zytor.com,
        LKML <linux-kernel@vger.kernel.org>
Subject: Re: irq_fpu_usable() is irreliable
Message-ID: <20151127084700.GB26693@gmail.com>
References: <CAHmME9pTPzCzZSVaO5vKT+0gaZwu424v4AzBMfM1opAdwkMbYg@mail.gmail.com>
 <20151118065508.GA18849@gmail.com>
 <CAHmME9opFmC2494O7-5BL+Fct4CE1k5Rtc5=j3j+qWoZajtRxA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAHmME9opFmC2494O7-5BL+Fct4CE1k5Rtc5=j3j+qWoZajtRxA@mail.gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org


* Jason A. Donenfeld <Jason@zx2c4.com> wrote:

> Intel 3820QM, but inside VMWare Workstation 12.
> 
> > Third, could you post such a problematic stack trace?
> 
> Sure: https://paste.kde.org/pfhhdchs9/7mmtvb

So it's:

    [  187.194226] CPU: 0 PID: 1165 Comm: iperf3 Tainted: G           O    4.2.3-1-ARCH #1
    [  187.194229] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
    [  187.194231]  0000000000000000 0000000062ca03ad ffff88003b82f0d0 ffffffff8156c0ca
    [  187.194233]  ffff88003bfa0dc0 0000000000000090 ffff88003b82f260 ffffffffa03fc27e
    [  187.194234]  0000000000000010 ffff88003be05300 0000000000000000 ffff88003b82f3e0
    [  187.194235] Call Trace:
    [  187.194244]  [<ffffffff8156c0ca>] dump_stack+0x4c/0x6e
    [  187.194248]  [<ffffffffa03fc27e>] chacha20_avx+0x23e/0x250 [wireguard]
    [  187.194253]  [<ffffffff8101de03>] ? nommu_map_page+0x43/0x80
    [  187.194257]  [<ffffffffa0344161>] ? e1000_xmit_frame+0xdf1/0x11c0 [e1000]
    [  187.194259]  [<ffffffffa03fbe6e>] ? poly1305_update_asm+0x11e/0x1b0 [wireguard]
    [  187.194260]  [<ffffffffa03fcd0d>] chacha20_finish+0x3d/0x60 [wireguard]
    [  187.194262]  [<ffffffffa03f8eae>] chacha20poly1305_encrypt_finish+0x2e/0xf0 [wireguard]
    [  187.194263]  [<ffffffffa03efa32>] noise_message_encrypt+0x162/0x180 [wireguard]
    [  187.194269]  [<ffffffff811b60e5>] ? __kmalloc_node_track_caller+0x35/0x2e0
    [  187.194274]  [<ffffffff81460af7>] ? __alloc_skb+0x87/0x210
    [  187.194275]  [<ffffffff81460a11>] ? __kmalloc_reserve.isra.5+0x31/0x90
    [  187.194276]  [<ffffffff81460acb>] ? __alloc_skb+0x5b/0x210
    [  187.194278]  [<ffffffff81460b0b>] ? __alloc_skb+0x9b/0x210
    [  187.194279]  [<ffffffffa03f2a65>] noise_message_create_data+0x55/0x80 [wireguard]
    [  187.194280]  [<ffffffffa03e9708>] packet_send_queue+0x1f8/0x4d0 [wireguard]
    [  187.194285]  [<ffffffff810a8219>] ? dequeue_entity+0x149/0x690
    [  187.194287]  [<ffffffff810a9051>] ? put_prev_entity+0x31/0x420
    [  187.194289]  [<ffffffff810146ec>] ? __switch_to+0x25c/0x4a0
    [  187.194291]  [<ffffffff81099ce2>] ? finish_task_switch+0x62/0x1b0
    [  187.194292]  [<ffffffff8156d500>] ? __schedule+0x340/0xa00
    [  187.194296]  [<ffffffff810ddf19>] ? hrtimer_try_to_cancel+0x29/0x120
    [  187.194298]  [<ffffffff810b4464>] ? add_wait_queue+0x44/0x50
    [  187.194299]  [<ffffffff811b60e5>] ? __kmalloc_node_track_caller+0x35/0x2e0
    [  187.194302]  [<ffffffff811e33ce>] ? __pollwait+0x7e/0xe0
    [  187.194303]  [<ffffffff81460af7>] ? __alloc_skb+0x87/0x210
    [  187.194304]  [<ffffffff81460a11>] ? __kmalloc_reserve.isra.5+0x31/0x90
    [  187.194305]  [<ffffffffa03e861f>] xmit+0x8f/0xe0 [wireguard]
    [  187.194308]  [<ffffffff8147588f>] dev_hard_start_xmit+0x24f/0x3f0
    [  187.194309]  [<ffffffff814753be>] ? validate_xmit_skb.isra.34.part.35+0x1e/0x2a0
    [  187.194310]  [<ffffffff81476042>] __dev_queue_xmit+0x4d2/0x540
    [  187.194311]  [<ffffffff814760c3>] dev_queue_xmit_sk+0x13/0x20
    [  187.194313]  [<ffffffff8147d9c2>] neigh_direct_output+0x12/0x20
    [  187.194315]  [<ffffffff814b1756>] ip_finish_output2+0x1b6/0x3c0
    [  187.194317]  [<ffffffff814b309e>] ? __ip_append_data.isra.3+0x6ae/0xac0
    [  187.194317]  [<ffffffff814b376c>] ip_finish_output+0x13c/0x1d0
    [  187.194318]  [<ffffffff814b3b75>] ip_output+0x75/0xe0
    [  187.194319]  [<ffffffff814b468d>] ? ip_make_skb+0x10d/0x130
    [  187.194320]  [<ffffffff814b1381>] ip_local_out_sk+0x31/0x40
    [  187.194321]  [<ffffffff814b44ea>] ip_send_skb+0x1a/0x50
    [  187.194323]  [<ffffffff814dc221>] udp_send_skb+0x151/0x280
    [  187.194325]  [<ffffffff814dd7f5>] udp_sendmsg+0x305/0x9d0
    [  187.194327]  [<ffffffff8157115e>] ? _raw_spin_unlock_bh+0xe/0x10
    [  187.194328]  [<ffffffff814e8daf>] inet_sendmsg+0x7f/0xb0
    [  187.194329]  [<ffffffff81457227>] sock_sendmsg+0x17/0x30
    [  187.194330]  [<ffffffff814572c5>] sock_write_iter+0x85/0xf0
    [  187.194332]  [<ffffffff811d028c>] __vfs_write+0xcc/0x100
    [  187.194333]  [<ffffffff811d0b04>] vfs_write+0xa4/0x1a0
    [  187.194334]  [<ffffffff811d1815>] SyS_write+0x55/0xc0
    [  187.194335]  [<ffffffff8157162e>] entry_SYSCALL_64_fastpath+0x12/0x71

So this is not a complex stack trace: we are trying to use the FPU from a
regular process, in a regular system call path. No interrupts, no kernel
threads, no complications.

We possibly context switched recently:

    [  187.194285]  [<ffffffff810a8219>] ? dequeue_entity+0x149/0x690
    [  187.194287]  [<ffffffff810a9051>] ? put_prev_entity+0x31/0x420
    [  187.194289]  [<ffffffff810146ec>] ? __switch_to+0x25c/0x4a0
    [  187.194291]  [<ffffffff81099ce2>] ? finish_task_switch+0x62/0x1b0
    [  187.194292]  [<ffffffff8156d500>] ? __schedule+0x340/0xa00

but that's all that I can see in the trace.
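
(The "possibly" is because every one of those entries carries a '?'. The
kernel prints '?' for a text address it found on the stack but could not
confirm as part of the live call chain, so it may be a stale leftover from an
earlier call. To read only the confirmed chain, the two kinds of entries can
be separated with a few lines of script. This is a minimal sketch, assuming
the trace format shown above; the name split_frames is made up for
illustration and is not an existing tool.)

```python
import re

# Matches one call-trace entry, for example:
#   [  187.194289]  [<ffffffff810146ec>] ? __switch_to+0x25c/0x4a0
# The optional "unsure" group captures the '?' marker.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unsure>\?\s+)?(?P<sym>\S+)"
)


def split_frames(trace_text):
    """Return (reliable, unreliable) lists of symbol+offset strings."""
    reliable, unreliable = [], []
    for line in trace_text.splitlines():
        m = FRAME_RE.search(line)
        if m is None:
            continue  # not a call-trace entry (header, raw stack words, ...)
        target = unreliable if m.group("unsure") else reliable
        target.append(m.group("sym"))
    return reliable, unreliable
```

Fed the trace above, the reliable list is the plain write() -> UDP -> IP ->
xmit -> wireguard crypto chain, and the scheduler functions all land in the
unreliable list.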

So as a first step I'd try Linus's very latest kernel, to make sure it's not a
bug that got fixed in the meantime. If it still occurs, report it to the
VMware virtualization folks. Maybe it's some host kernel activity that changes
the state of the FPU. I don't know ...

Thanks,

	Ingo