From: Pavel Machek <pavel@ucw.cz>
To: "Theodore Ts'o" <tytso@mit.edu>,
kernel list <linux-kernel@vger.kernel.org>,
jack@suse.cz, linux-ext4@vger.kernel.org,
adilger.kernel@dilger.ca, Dmitry Monakhov <dmonakhov@openvz.org>
Subject: Re: ext4: 3.17? problems
Date: Tue, 30 Sep 2014 23:01:08 +0200
Message-ID: <20140930210108.GB14283@amd>
In-Reply-To: <20140929114408.GC2738@quack.suse.cz> <87iok6ojis.fsf@openvz.org> <20140928124658.GB6694@thunk.org>
Hi!
On Sun 2014-09-28 08:46:58, Theodore Ts'o wrote:
> On Sun, Sep 28, 2014 at 12:44:56PM +0200, Pavel Machek wrote:
> >
> > After update to debian testing, my machine sometimes fails to
> > reboot. (aptitude upgrade seems to be the trigger).
> >
> > So I had to hard power-down the machine. That should be perfectly
> > safe, as ext4 has a journal, and this is plain SATA disk, right?
> >
> > On next boot to Debian stable, I got stacktrace, and messages about
> > ext4 corruption. Back to Debian testing. systemd ran fsck, determined
> > it can't fix it, dropped me into emergency shell, _but mounted the
> > filesystem, anyway_. Oops.
>
> I've been running 3.17-rc4 plus the ext4 dev patches and due to either
> regressions in i915 or the X server (not sure which) over the last
> couple of weeks, I've had to power-down my system a number of times
> after the system has hung when either shutting down the X server or
> when trying to add or remove an external display. So I've had to
> unfortunately do a fair number of hard-power-offs on my T540p, and
> I've not noticed anything like what you've described.
OK, I'm not 100% sure it was 3.17-rcX... but according to the logs, it
was: 3.17-rc4.
> Can you give any more details? Are you using LVM or dm-crypt? Is
> this repeatable?
No, I don't think it is repeatable in a way that is useful for debugging,
but it is not the first time it has happened here. No LVM or dm-crypt in use.
> > So I had to hard power-down the machine. That should be perfectly
> > safe, as ext4 has a journal, and this is plain SATA disk, right?
> >
> AFAIU you have some corruption on your fs (the root cause is unknown
> at this moment).
> So you have the following stages:
> 1) fs corruption
> 2) boot -> mount attempt
> 3) fsck
> During (1), once the ext4 driver finds this error, it calls ext4_error,
> which tags the sb with the FS_ERROR flag.
> During (2), it finds that tag and clears s_orphan, which results
> in the complaint you have seen during (3).
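The state described above can be inspected offline. What follows is a minimal sketch, not kernel code: it assumes that the "FS_ERROR flag" corresponds to the EXT4_ERROR_FS bit (0x0002) in the superblock's s_state field and that "s_orphan" means the s_last_orphan field. The offsets follow the documented ext4 on-disk superblock layout. The function names are made up for illustration, and the device path passed to read_superblock is whatever partition you want to check (read access needed, ideally on an unmounted filesystem).

```python
import struct

SUPERBLOCK_OFFSET = 1024   # primary superblock starts 1024 bytes into the device
EXT4_MAGIC = 0xEF53        # s_magic, little-endian u16 at offset 0x38
S_MAGIC_OFFSET = 0x38      # s_magic is followed directly by s_state (0x3A)
S_LAST_ORPHAN_OFFSET = 0xE8

# Bits of s_state as documented for the ext4 on-disk format.
STATE_FLAGS = {
    0x0001: "VALID_FS (cleanly unmounted)",
    0x0002: "ERROR_FS (errors detected)",
    0x0004: "ORPHAN_FS (orphans being recovered)",
}


def parse_superblock(sb: bytes) -> dict:
    """Decode the error flag and orphan-list head from raw superblock bytes."""
    if len(sb) < S_LAST_ORPHAN_OFFSET + 4:
        raise ValueError("superblock buffer too short")
    magic, state = struct.unpack_from("<HH", sb, S_MAGIC_OFFSET)
    if magic != EXT4_MAGIC:
        raise ValueError("not an ext2/3/4 superblock (bad magic)")
    (last_orphan,) = struct.unpack_from("<I", sb, S_LAST_ORPHAN_OFFSET)
    return {
        "state": state,
        "flags": [name for bit, name in STATE_FLAGS.items() if state & bit],
        "errors_detected": bool(state & 0x0002),
        "last_orphan": last_orphan,  # 0 means the orphan list is empty
    }


def read_superblock(device_path: str) -> dict:
    """Read the primary superblock from a device or image file and decode it."""
    with open(device_path, "rb") as dev:
        dev.seek(SUPERBLOCK_OFFSET)
        return parse_superblock(dev.read(1024))
```

If errors_detected is true while last_orphan is 0, that would be consistent with stage (2) above having already cleared the orphan list; it does not by itself say what caused the original corruption.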
I tried to search syslog, but could not find the original messages. It
happened during shutdown. I guess syslog was already stopped at that
point...?
Logs say:
Sep 28 11:45:38 amd NetworkManager[3422]: <info> Activation (tun0)
successful, device activated.
Sep 28 11:45:38 amd nm-dispatcher: Dispatching action 'up' for tun0
Sep 28 11:45:39 amd systemd[1]: Stopping OpenBSD Secure Shell
server...
Sep 28 11:45:39 amd systemd[1]: Starting OpenBSD Secure Shell
server...
Sep 28 11:45:39 amd systemd[1]: Started OpenBSD Secure Shell server.
Sep 28 11:45:41 amd NetworkManager[3422]: <warn> Could not send ARP
for local address 10.10.0.14: Failed to execute child process
"/sbin/arping" (No such file or directory)
Sep 28 11:45:49 amd ntpdate[1413]: adjust time server 193.85.174.5
offset 0.002797 sec
Sep 28 12:17:01 amd /USR/SBIN/CRON[3612]: (root) CMD ( cd / &&
run-parts --report /etc/cron.hourly)
Sep 28 12:58:12 amd rsyslogd: [origin software="rsyslogd"
swVersion="8.4.0" x-pid="3380" x-info="http://www.rsyslog.com"] start
Sep 28 12:58:12 amd systemd[1]: Starting Load Kernel Modules...
Sep 28 12:58:12 amd systemd[1]: Mounted POSIX Message Queue File
System.
Sep 28 12:58:12 amd systemd[1]: Starting udev Kernel Socket.
Sep 28 12:58:12 amd systemd[1]: Listening on udev Kernel Socket.
Sep 28 12:58:12 amd systemd[1]: Starting udev Control Socket.
Sep 28 12:58:12 amd systemd[1]: Listening on udev Control Socket.
Sep 28 12:58:12 amd systemd[1]: Starting udev Coldplug all Devices...
Sep 28 12:58:12 amd systemd[1]: Started Set Up Additional Binary
Formats.
Sep 28 12:58:12 amd systemd[1]: Starting Dispatch Password Requests to
Console Directory Watch.
Sep 28 12:58:12 amd systemd[1]: Started Dispatch Password Requests to
Console Directory Watch.
Sep 28 12:58:12 amd systemd[1]: Mounting Debug File System...
Sep 28 12:58:12 amd kernel: Initializing cgroup subsys cpu
Sep 28 12:58:12 amd kernel: Linux version 3.17.0-rc4 (pavel@amd) (gcc
version 4.9.1 (Debian 4.9.1-12) ) #1 SMP Sun Sep 14 21:24:53 CEST 2014
> > After update to debian testing, my machine sometimes fails to
> > reboot. (aptitude upgrade seems to be the trigger).
> >
> > So I had to hard power-down the machine. That should be perfectly
> > safe, as ext4 has a journal, and this is plain SATA disk, right?
> Yes, it should be safe.
Good.
> > On next boot to Debian stable, I got stacktrace, and messages about
> > ext4 corruption. Back to Debian testing. systemd ran fsck, determined
> It would be really good to get those messages... Ideally you could also
> use
> e2image -r <partition> | bzip2 -c
> to store fs metadata before doing anything else with the fs to a usb stick.
> That is invaluable for future analysis.
Too late for that :-(.
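For next time, a minimal sketch of capturing that metadata image before touching the filesystem. One assumption beyond the quoted command: as far as I can tell from e2image(8), the image file argument has to be given as "-" for the output to go to stdout, so it is added here. The helper names and the output path are placeholders, and this needs root plus the e2fsprogs and bzip2 binaries.

```python
import subprocess


def e2image_commands(partition: str):
    """Return the argv lists for 'e2image -r <partition> - | bzip2 -c'."""
    dump_cmd = ["e2image", "-r", partition, "-"]   # "-" = write raw image to stdout
    compress_cmd = ["bzip2", "-c"]                 # compress stdin to stdout
    return dump_cmd, compress_cmd


def save_metadata_image(partition: str, out_path: str) -> None:
    """Pipe the raw metadata image through bzip2 into out_path (e.g. on a USB stick)."""
    dump_cmd, compress_cmd = e2image_commands(partition)
    with open(out_path, "wb") as out:
        dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
        compress = subprocess.Popen(compress_cmd, stdin=dump.stdout, stdout=out)
        dump.stdout.close()  # let e2image see a broken pipe if bzip2 exits early
        compress_rc = compress.wait()
        dump_rc = dump.wait()
    if dump_rc != 0 or compress_rc != 0:
        raise RuntimeError(
            f"e2image exited with {dump_rc}, bzip2 exited with {compress_rc}"
        )
```

Usage would be something like save_metadata_image("/dev/sdXN", "/mnt/usb/sdXN.e2i.bz2"), with both paths adjusted to the actual partition and the mounted USB stick.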
> > it can't fix it, dropped me into emergency shell, _but mounted the
> > filesystem, anyway_. Oops.
> What kernel versions are you running in Debian testing and stable?
Debian testing was 3.17-rc4, AFAICT. For Debian stable -- not sure.
> My guess would be that kernel had problems only during orphan inode
> recovery (i.e. when deleting already deleted files) and we let the mount
> proceed if this fails because it's a relatively harmless problem.
Is there some phase during shutdown where journalling no longer
protects fs integrity?
Thanks,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html