From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932614Ab1AMLsx (ORCPT ); Thu, 13 Jan 2011 06:48:53 -0500
Received: from mtagate2.uk.ibm.com ([194.196.100.162]:58257 "EHLO
	mtagate2.uk.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756577Ab1AMLst (ORCPT );
	Thu, 13 Jan 2011 06:48:49 -0500
Date: Thu, 13 Jan 2011 12:48:40 +0100 (CET)
From: Sebastian Ott
X-X-Sender: sebott@localhost6.localdomain6
To: Theodore Tso
cc: "linux-ext4@vger.kernel.org development" ,
	LKML Kernel , pm list
Subject: Re: Oops while going into hibernate
In-Reply-To:
Message-ID:
References: <20110112162655.GA13496@thunk.org> <20110112172646.GB13496@thunk.org>
User-Agent: Alpine 2.02 (LFD 1266 2009-07-14)
Organization: "IBM Deutschland Research & Development GmbH
	Vorsitzender des Aufsichtsrats: Martin Jetter
	Geschaeftsfuehrung: Dirk Wittkopp
	Sitz der Gesellschaft: Boeblingen
	Registergericht: Amtsgericht Stuttgart, HRB 243294"
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 12 Jan 2011, Theodore Tso wrote:
> That looks really bogus. /usr/bin/killall is a system binary, and there's
> no good reason that hibernation should be trying to write pages to that
> binary.
>
> You said originally that the oops was happening "while going into
> hibernation right after resuming with...". So that means you did a
> successful suspend/resume, and then the second suspend caused the oops?

Yes. I basically did a

  echo reboot > /sys/power/disk
  for i in {1..5} ;do echo disk > /sys/power/state ;done

and it crashed very early in the second suspend.

> It looks like somehow the pages were left marked as dirty, so the
> writeback daemons attempted writing back a page to an inode which was
> never opened read/write (and in fact as a text page for
> /usr/bin/killall, was mapped read/only).
> Given that ext4 initializes jinode only when the file is opened
> read/write, the fact that it is null, and the fact that it makes no
> sense that a program would be modifying /usr/bin/killall as part of a
> suspend/resume, it looks very much like we just unmasked a software
> suspend bug....

Ah, ok. Thanks for the explanation!

Regards,
Sebastian