Re: ext4: 3.17? problems

From: Pavel Machek <pavel@ucw.cz>
To: "Theodore Ts'o" <tytso@mit.edu>,
	kernel list <linux-kernel@vger.kernel.org>,
	jack@suse.cz, linux-ext4@vger.kernel.org,
	adilger.kernel@dilger.ca, Dmitry Monakhov <dmonakhov@openvz.org>
Subject: Re: ext4: 3.17? problems
Date: Tue, 30 Sep 2014 23:01:08 +0200
Message-ID: <20140930210108.GB14283@amd>
In-Reply-To: <20140929114408.GC2738@quack.suse.cz> <87iok6ojis.fsf@openvz.org> <20140928124658.GB6694@thunk.org>

Hi!

On Sun 2014-09-28 08:46:58, Theodore Ts'o wrote:
> On Sun, Sep 28, 2014 at 12:44:56PM +0200, Pavel Machek wrote:
> > 
> > After update to debian testing, my machine sometimes fails to
> > reboot. (aptitude upgrade seems to be the trigger).
> > 
> > So I had to hard power-down the machine. That should be perfectly
> > safe, as ext4 has a journal, and this is plain SATA disk, right?
> > 
> > On next boot to Debian stable, I got stacktrace, and messages about
> > ext4 corruption. Back to Debian testing. systemd ran fsck, determined
> > it can't fix it, dropped me into emergency shell, _but mounted the
> > filesystem, anyway_. Oops.
> 
> I've been running 3.17-rc4 plus the ext4 dev patches and due to either
> regressions in i915 or the X server (not sure which) over the last
> couple of weeks, I've had to power-down my system a number of times
> after the system has hung when either shutting down the X server or
> when trying to add or remove an external display.  So I've had to
> unfortunately do a fair number of hard-power-offs on my T540p, and
> I've not noticed anything like what you've described.

Ok, I'm not 100% sure it was 3.17-rcX... but according to the logs, it
was: 3.17-rc4.

> Can you give any more details?  Are you using LVM or dm-crypt?  Is
> this repeatable?

No, I don't think it is repeatable in a way that is useful for debugging,
but it is not the first time it has happened here. No LVM or dm-crypt in use.

> > So I had to hard power-down the machine. That should be perfectly
> > safe, as ext4 has a journal, and this is plain SATA disk, right?
> > 
> AFAIU you have some corruption on your fs (the root cause is unknown
> at this moment).
> So you have the following stages:
> 1) fs corruption
> 2) boot -> mount attempt
> 3) fsck
> During (1), once the ext4 driver finds this error, it will call ext4_error,
> which will tag the sb with the FS_ERROR flag.
> During (2), it will find that tag and clear s_orphan, which results
> in the complaint you have seen during (3).
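
For anyone who wants to check the state described above on their own disk, here is a minimal sketch that reads the error flag and the orphan list head straight from the superblock. It assumes that "FS_ERROR" refers to the error bit (0x0002) in s_state and "s_orphan" to s_last_orphan, and that the field offsets follow the documented ext4 on-disk layout; the function names are made up for illustration and are not part of e2fsprogs.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # primary superblock starts 1024 bytes into the device
EXT4_MAGIC = 0xEF53       # s_magic, little-endian u16 at offset 0x38
STATE_VALID_FS = 0x0001   # s_state bit: cleanly unmounted
STATE_ERROR_FS = 0x0002   # s_state bit: errors detected (assumed to be "FS_ERROR")


def parse_superblock_state(sb):
    """Decode the clean/error bits and orphan list head from raw superblock bytes."""
    magic, state = struct.unpack_from("<HH", sb, 0x38)    # s_magic, s_state
    (last_orphan,) = struct.unpack_from("<I", sb, 0xE8)   # s_last_orphan
    if magic != EXT4_MAGIC:
        raise ValueError("not an ext2/3/4 superblock (magic 0x%04x)" % magic)
    return {
        "clean": bool(state & STATE_VALID_FS),
        "error_flag": bool(state & STATE_ERROR_FS),
        "last_orphan": last_orphan,  # 0 means the orphan list is empty
    }


def read_superblock(device_path):
    """Read the 1024-byte primary superblock; needs read access to the device."""
    with open(device_path, "rb") as dev:
        dev.seek(SUPERBLOCK_OFFSET)
        return dev.read(1024)
```

Calling parse_superblock_state(read_superblock("/dev/sdXN")) on an unmounted partition (the device name is a placeholder) only reads; it changes nothing on disk. `dumpe2fs -h` reports the same fields as "Filesystem state" and "First orphan inode".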

I tried to search the syslog, but could not find the original messages. It
happened during shutdown. I guess syslog was already stopped at that
point?

Logs say:

Sep 28 11:45:38 amd NetworkManager[3422]: <info> Activation (tun0) successful, device activated.
Sep 28 11:45:38 amd nm-dispatcher: Dispatching action 'up' for tun0
Sep 28 11:45:39 amd systemd[1]: Stopping OpenBSD Secure Shell server...
Sep 28 11:45:39 amd systemd[1]: Starting OpenBSD Secure Shell server...
Sep 28 11:45:39 amd systemd[1]: Started OpenBSD Secure Shell server.
Sep 28 11:45:41 amd NetworkManager[3422]: <warn> Could not send ARP for local address 10.10.0.14: Failed to execute child process "/sbin/arping" (No such file or directory)
Sep 28 11:45:49 amd ntpdate[1413]: adjust time server 193.85.174.5 offset 0.002797 sec
Sep 28 12:17:01 amd /USR/SBIN/CRON[3612]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Sep 28 12:58:12 amd rsyslogd: [origin software="rsyslogd" swVersion="8.4.0" x-pid="3380" x-info="http://www.rsyslog.com"] start
Sep 28 12:58:12 amd systemd[1]: Starting Load Kernel Modules...
Sep 28 12:58:12 amd systemd[1]: Mounted POSIX Message Queue File System.
Sep 28 12:58:12 amd systemd[1]: Starting udev Kernel Socket.
Sep 28 12:58:12 amd systemd[1]: Listening on udev Kernel Socket.
Sep 28 12:58:12 amd systemd[1]: Starting udev Control Socket.
Sep 28 12:58:12 amd systemd[1]: Listening on udev Control Socket.
Sep 28 12:58:12 amd systemd[1]: Starting udev Coldplug all Devices...
Sep 28 12:58:12 amd systemd[1]: Started Set Up Additional Binary Formats.
Sep 28 12:58:12 amd systemd[1]: Starting Dispatch Password Requests to Console Directory Watch.
Sep 28 12:58:12 amd systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
Sep 28 12:58:12 amd systemd[1]: Mounting Debug File System...
Sep 28 12:58:12 amd kernel: Initializing cgroup subsys cpu
Sep 28 12:58:12 amd kernel: Linux version 3.17.0-rc4 (pavel@amd) (gcc version 4.9.1 (Debian 4.9.1-12) ) #1 SMP Sun Sep 14 21:24:53 CEST 2014

> > After update to debian testing, my machine sometimes fails to
> > reboot. (aptitude upgrade seems to be the trigger).
> > 
> > So I had to hard power-down the machine. That should be perfectly
> > safe, as ext4 has a journal, and this is plain SATA disk, right?
>   Yes, it should be safe.

Good.

> > On next boot to Debian stable, I got stacktrace, and messages about
> > ext4 corruption. Back to Debian testing. systemd ran fsck, determined
>   It would be really good to get those messages... Ideally you could also
> use
>   e2image -r <partition> | bzip2 -c
> to store fs metadata before doing anything else with the fs to a usb stick.
> That is invaluable for future analysis.
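
As a rough sketch of the metadata capture suggested above: the e2image(8) manual page gives the pipeline as `e2image -r /dev/hda1 - | bzip2 > hda1.e2i.bz2`, where the trailing `-` sends the raw image to stdout. The helper names and the output path below are hypothetical, and this assumes e2fsprogs is installed and the script has read access to the partition.

```python
import bz2
import shutil
import subprocess


def e2image_argv(partition):
    """Build the e2image command line; '-' makes it write the raw image to stdout."""
    return ["e2image", "-r", partition, "-"]


def save_metadata_image(partition, out_path):
    """Stream the raw metadata image through bzip2 compression into out_path."""
    with subprocess.Popen(e2image_argv(partition), stdout=subprocess.PIPE) as proc, \
            bz2.open(out_path, "wb") as out:
        shutil.copyfileobj(proc.stdout, out)
    # Leaving the with-block waits for e2image, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError("e2image exited with status %d" % proc.returncode)
```

A call such as save_metadata_image("/dev/sdXN", "/mnt/usb/sdXN.e2i.bz2") (both paths are placeholders) should be made before running fsck or mounting the filesystem read-write, so that the image reflects the damaged state.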

Too late for that :-(.

> > it can't fix it, dropped me into emergency shell, _but mounted the
> > filesystem, anyway_. Oops.
>   What kernel versions are you running in Debian testing and stable?

Debian testing was 3.17-rc4, AFAICT. For Debian stable, I'm not sure.

> My guess would be that kernel had problems only during orphan inode
> recovery (i.e. when deleting already deleted files) and we let the mount
> proceed if this fails because it's a relatively harmless problem.

Is there some phase during shutdown where journalling no longer
protects fs integrity?

Thanks,
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
