From mboxrd@z Thu Jan 1 00:00:00 1970
From: George Dunlap
Subject: Re: [PATCH v2] MCE: Fix race condition in mctelem_reserve
Date: Wed, 19 Feb 2014 10:35:26 +0000
Message-ID: <530488EE.7050204@eu.citrix.com>
References: <1390387834.32296.1.camel@hamster.uk.xensource.com>
 <52DFC5BC0200007800115C92@nat28.tlf.novell.com>
 <1390411039.32296.8.camel@hamster.uk.xensource.com>
 <1392803609.8843.3.camel@hamster.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <1392803609.8843.3.camel@hamster.uk.xensource.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Frediano Ziglio
Cc: Liu Jinsong, Christoph Egger, David Vrabel, Jan Beulich, "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

On 02/19/2014 09:53 AM, Frediano Ziglio wrote:
> On Tue, 2014-02-18 at 12:47 +0000, George Dunlap wrote:
>> On Wed, Jan 22, 2014 at 5:17 PM, Frediano Ziglio
>> wrote:
>>> These lines (in mctelem_reserve)
>>>
>>>         newhead = oldhead->mcte_next;
>>>         if (cmpxchgptr(freelp, oldhead, newhead) == oldhead) {
>>>
>>> are racy. After you read the newhead pointer it can happen that another
>>> flow (thread or recursive invocation) change all the list but set head
>>> with same value. So oldhead is the same as *freelp but you are setting
>>> a new head that could point to whatever element (even already used).
>>>
>>> This patch use instead a bit array and atomic bit operations.
>>>
>>> Signed-off-by: Frediano Ziglio
>> What is this like from a release perspective? When is this code run,
>> and how often is the bug triggered?
>>
>> -George
> The code handle MCE situation. So if your hardware is good is not a big
> deal. If your hardware start to have some problems in some situation is
> possible that cpu raise a mce quite often causing the race to happen.
>
> I think that the probability is not that high. The test was finely
> tested (not that easy to do even now) and solve a real bug.

OK thanks -- at this point then, I think I'd just as soon hold this off 
until 4.4.1, unless we get some other blocking bugs, just so that we can 
minimize the changes.

  -George
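
For readers following the thread: the race described in the quoted patch text is the classic ABA problem on a lock-free free list. The compare-and-exchange only checks that the head pointer is unchanged, not that the previously read next pointer is still valid. The sketch below illustrates the general "bit array and atomic bit operations" idea in standalone C11. It is not the code from the patch: `free_map`, `slot_reserve`, `slot_release`, `lowest_set_bit` and the pool size `NSLOTS` are hypothetical names chosen for illustration, and C11 `<stdatomic.h>` stands in for Xen's own bit operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NSLOTS 64 /* hypothetical pool size: one bit per slot in a 64-bit map */

/* Bit i set means slot i is free. All slots start out free. */
static _Atomic uint64_t free_map = UINT64_MAX;

/* Index of the lowest set bit; the caller guarantees v != 0. */
static int lowest_set_bit(uint64_t v)
{
    int i = 0;
    while (!(v & 1)) {
        v >>= 1;
        i++;
    }
    return i;
}

/* Reserve one free slot. Returns its index, or -1 if none is free. */
static int slot_reserve(void)
{
    uint64_t map = atomic_load(&free_map);

    while (map != 0) {
        int idx = lowest_set_bit(map);
        uint64_t bit = (uint64_t)1 << idx;

        /*
         * Atomically clear the bit. The returned old value says whether
         * the bit was still set at that instant, i.e. whether this
         * caller is the one that claimed the slot.
         */
        uint64_t old = atomic_fetch_and(&free_map, ~bit);
        if (old & bit)
            return idx;

        /* Someone else took this slot first; retry with a fresh view. */
        map = old & ~bit;
    }
    return -1;
}

/* Return a previously reserved slot to the pool. */
static void slot_release(int idx)
{
    assert(idx >= 0 && idx < NSLOTS);
    atomic_fetch_or(&free_map, (uint64_t)1 << idx);
}
```

The point of the design is that ownership of a slot is decided by a single atomic read-modify-write on that slot's own bit, so there is no stale "next" pointer that another thread or a recursive invocation could invalidate between the read and the update. A caller would pair each successful `slot_reserve()` with one `slot_release()` on the same index.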