From: Andrea Arcangeli
To: "Van De Ven, Arjan"
Cc: David Woodhouse, Hou Tao, linux-kernel@vger.kernel.org,
	mingo@redhat.com, Thomas Gleixner, ak@linux.intel.com,
	dave.hansen@linux.intel.com, peterz@infradead.org,
	qiuxishi@huawei.com, wangkefeng.wang@huawei.com
Subject: Re: [RH72 Spectre] ibpb_enabled = 1 leads to hard LOCKUP under x86_64 host machine
Date: Sat, 20 Jan 2018 16:22:29 +0100
Message-ID: <20180120152229.GA2042@redhat.com>
In-Reply-To: <0575AF4FD06DD142AD198903C74E1CC87A5EC9AE@ORSMSX103.amr.corp.intel.com>
References: <12e55119-4f5b-da63-2b1c-14fb70243b21@huawei.com>
	<1516438985.5087.71.camel@infradead.org>
	<0575AF4FD06DD142AD198903C74E1CC87A5EC9AE@ORSMSX103.amr.corp.intel.com>

Hello everyone,

On Sat, Jan 20, 2018 at 01:56:08PM +0000, Van De Ven, Arjan wrote:
> well first of all don't use IBRS, use retpoline

This issue triggers in the IBPB code during user-to-user context
switches, and IBPB is still needed there regardless of whether the
kernel uses retpolines or kernel IBRS.

In fact IBPB is still needed there even if retpolines+user_ibrs is in
use, or if always_ibrs/ibrs_enabled=2 is in use (IBRS doesn't protect
against poison generated in the same predictor mode, "especially" on
future CPUs). Only retpolining all of userland would avoid the IBPB
here, but I doubt you're suggesting that.

Kernel retpolines or kernel IBRS would make zero difference for this
specific issue.

> and if Andrea says this was a known issue in their code then I think that closes the issue.

It's an implementation bug we inherited from the merge of a CPU vendor
patch, and I can confirm it's already closed. In fact the fix already
shipped with the wave 2 update, and some other versions had the bug
fixed from the very first wave on 0day.

That deadlock nuisance only ever triggered in artificial QA test
cases, and even then it wasn't easily reproducible.

We've already moved the follow-ups to the vendor BZ to avoid using up
bandwidth here.

Thank you!

Andrea
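
P.S. For anyone who wants the concrete mechanics: the user-to-user
IBPB discussed above boils down to a single MSR write on the context
switch path when the incoming mm differs from the outgoing one. Below
is a minimal sketch; the MSR number and command bit are the
architectural definitions, but the helper names, the wrmsr wrapper
and the call-site policy are illustrative, not the code we actually
ship:

/*
 * Sketch only: the user-to-user IBPB reduced to its core.
 * MSR_IA32_PRED_CMD (0x49) and PRED_CMD_IBPB (bit 0) are the
 * architectural definitions; everything else is illustrative.
 */
#define MSR_IA32_PRED_CMD	0x00000049
#define PRED_CMD_IBPB		(1ULL << 0)

static inline void wrmsrl(unsigned int msr, unsigned long long val)
{
	/* wrmsr takes the MSR index in %ecx, the value in %edx:%eax */
	asm volatile("wrmsr"
		     : /* no outputs */
		     : "c" (msr), "a" ((unsigned int)val),
		       "d" ((unsigned int)(val >> 32))
		     : "memory");
}

/*
 * Flush indirect branch predictions.  Meant to be invoked on the
 * context switch path when switching to a different user mm, so the
 * outgoing task cannot steer indirect branches in the incoming one:
 * that's the user-to-user Spectre v2 case that neither kernel
 * retpolines nor kernel IBRS cover.
 */
static inline void indirect_branch_prediction_barrier(void)
{
	wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
}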