From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753649Ab2CISA0 (ORCPT ); Fri, 9 Mar 2012 13:00:26 -0500
Received: from mx1.redhat.com ([209.132.183.28]:2720 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752349Ab2CISAY (ORCPT ); Fri, 9 Mar 2012 13:00:24 -0500
Date: Fri, 9 Mar 2012 13:00:15 -0500
From: Dave Jones
To: Yang Bai
Cc: Fengguang Wu , Linux Kernel , Fedora Kernel Team , kernel@tesarici.cz
Subject: Re: inode->i_wb_list corruption.
Message-ID: <20120309180015.GA3862@redhat.com>
Mail-Followup-To: Dave Jones , Yang Bai , Fengguang Wu , Linux Kernel ,
	Fedora Kernel Team , kernel@tesarici.cz
References: <20120306185137.GA15881@redhat.com>
	<20120306210307.GC8781@quack.suse.cz>
	<20120307072608.GA24087@localhost>
	<20120307104240.GB18658@quack.suse.cz>
	<20120309145713.GA21543@redhat.com>
	<20120309151951.GA30160@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

(trimmed cc)

On Sat, Mar 10, 2012 at 12:14:37AM +0800, Yang Bai wrote:
> On Fri, Mar 9, 2012 at 11:19 PM, Dave Jones wrote:
> > And with that, this arrived..
> > https://bugzilla.redhat.com/show_bug.cgi?id=788433#c3
> >
> > I'm leaning strongly towards believing this is yet another case of i915
> > corrupting memory on resume.
>
> Nice catch. I am wondering
> 1) why all lists being affected and
> 2) why all list_head's prev being set to NULL.
>
> Any ideas?

This is probably the same bug: https://bugzilla.kernel.org/show_bug.cgi?id=37142
Petr noticed that the corruption is 32 bytes getting zeroed at the beginning
of a page. I think this may be responsible for a lot of different bugs that
we've had reported.
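
To make that signature concrete, here is a rough userspace sketch (not
kernel code) of what "32 bytes zeroed at the beginning of a page" would look
like to anything stored there. The 4 KiB page size, the stand-in
struct list_head, and the helper names page_head_zeroed() and
simulate_stray_write() are all assumptions made up for illustration; none of
them come from the bug reports.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_GUESS 4096  /* assumption: 4 KiB pages */
#define ZEROED_LEN      32    /* from the report: 32 bytes at the page start */

/* Userspace stand-in for the kernel's struct list_head (illustration only). */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical helper: return 1 if the first ZEROED_LEN bytes are all zero. */
int page_head_zeroed(const unsigned char *page)
{
    for (size_t i = 0; i < ZEROED_LEN; i++)
        if (page[i] != 0)
            return 0;
    return 1;
}

/* Hypothetical helper: mimic the stray write by zeroing the page's head. */
void simulate_stray_write(unsigned char *page, size_t len)
{
    assert(len >= ZEROED_LEN);
    memset(page, 0, ZEROED_LEN);
}
```

Filling a PAGE_SIZE_GUESS-byte buffer with a non-zero pattern, calling
simulate_stray_write() on it, and then copying the first bytes into a
struct list_head gives back next and prev both NULL, while everything from
byte 32 onwards is untouched. A check along the lines of page_head_zeroed()
is only a heuristic, since a legitimately zeroed page would trip it as well.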

i915_drm_thaw is a deep nest of functions though, so it is going to be hard
to track down where that write is coming from. Because the corruption seems
to happen to pages that are already allocated, we probably can't even rely on
DEBUG_PAGEALLOC, though it might be worth trying.

	Dave