From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1760385Ab2C2XZy (ORCPT ); Thu, 29 Mar 2012 19:25:54 -0400
Received: from li9-11.members.linode.com ([67.18.176.11]:53019 "EHLO
        test.thunk.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1760329Ab2C2XZt (ORCPT );
        Thu, 29 Mar 2012 19:25:49 -0400
Date: Thu, 29 Mar 2012 16:25:43 -0700
From: "Ted Ts'o"
To: Linus Torvalds
Cc: Dave Jones , Wu Fengguang , Linux Kernel Mailing List
Subject: Re: lockups shortly after booting in current git.
Message-ID: <20120329232543.GF13970@thunk.org>
Mail-Followup-To: Ted Ts'o , Linus Torvalds , Dave Jones , Wu Fengguang ,
        Linux Kernel Mailing List
References: <20120329195354.GA11790@redhat.com>
        <20120329202619.GA14001@redhat.com>
        <20120329203926.GA13970@thunk.org>
        <20120329211244.GA18684@redhat.com>
        <20120329214510.GD13970@thunk.org>
        <20120329214959.GA20783@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.20 (2009-06-14)
X-SA-Exim-Connect-IP:
X-SA-Exim-Mail-From: tytso@thunk.org
X-SA-Exim-Scanned: No (on test.thunk.org); SAEximRunCond expanded to false
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 29, 2012 at 02:52:40PM -0700, Linus Torvalds wrote:
>
> > > up 30 minutes with that reverted, no problems so far..
>
> Goodie. Let it run for a while more, and really pound on it.
>
> Ted, are there any downsides to just reverting that commit (ie any
> subtle interactions) for now? That's assuming that Dave's testing
> continues to confirm that it is that one commit.

That commit fixes a race which is seen when you write into fallocated
(and hence uninitialized) disk blocks under *very* heavy memory
pressure.
Furthermore, although theoretically it could trigger under normal
direct I/O writes, it only seems to trigger if you are issuing a huge
number of AIO writes, such that a just-written page can get evicted
from memory, and then read back into memory, before the workqueue has
a chance to update the extent tree.

This race has been around for a little over a year, and no one noticed
until two months ago; it only happens under fairly exotic conditions,
and in fact even after trying very hard to create a simple repro under
lab conditions, we could only reproduce the problem and confirm the
fix on production servers running MySQL on very fast PCIe-attached
flash devices.

Given that Dave was able to hit this problem pretty quickly, if we
confirm that this commit is at fault, the only reasonable thing to do
is to revert it IMO.

                                        - Ted