* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
@ 2014-12-13 3:36 ` bugzilla-daemon
2014-12-15 11:57 ` bugzilla-daemon
` (20 subsequent siblings)
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2014-12-13 3:36 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
Theodore Tso <tytso@mit.edu> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |tytso@mit.edu
--- Comment #1 from Theodore Tso <tytso@mit.edu> ---
This error just means there is an inconsistency between the number of free
blocks in the bitmap and the number of blocks reported in the block group
descriptors. So the errors reported by fsck could be consistent with what you
are seeing.
This might be fixed by d89b2f764e9f9eb6c1ac90d25e6fc5b43b676a48. It's a bit
painful because I'm not sure what is included in 3.16-0.bpo.2-amd64. You need
to ask the Debian kernel distro folks that question.
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
2014-12-13 3:36 ` [Bug 89621] " bugzilla-daemon
@ 2014-12-15 11:57 ` bugzilla-daemon
2014-12-15 14:51 ` bugzilla-daemon
` (19 subsequent siblings)
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2014-12-15 11:57 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
--- Comment #2 from John Hughes <john@calva.com> ---
Well, looking at fs/ext4/extents.c I see:
static int ext4_alloc_file_blocks(struct file *file, ext4_lblk_t offset,
ext4_lblk_t len, loff_t new_size,
int flags, int mode)
So it looks to me as if this kernel (based on 3.16.5) includes that commit.
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
2014-12-13 3:36 ` [Bug 89621] " bugzilla-daemon
2014-12-15 11:57 ` bugzilla-daemon
@ 2014-12-15 14:51 ` bugzilla-daemon
2014-12-15 15:21 ` bugzilla-daemon
` (18 subsequent siblings)
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2014-12-15 14:51 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
--- Comment #3 from Theodore Tso <tytso@mit.edu> ---
Well, was this a one-time thing or can you easily reproduce this error? The
problem is that almost anything can cause this --- bad memory, a wild pointer
derference coming from a proprietary video driver from Nvidia, etc. And by
the time we get the ext4-error, basically what we have is the dead body lying
in the alley. Where the bullet came from, and from what direction, and who
fired the gun, is not something that will be immediately evident; just that we
have a very specific kind of inconsistency between the allocation bitmap and
the block group summary statistics. So in order to track it down, we need to
be able to try to isolate causes. So for example, does the problem show up if
you use the latest bleeding edge kernel from mainline, etc.
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
` (2 preceding siblings ...)
2014-12-15 14:51 ` bugzilla-daemon
@ 2014-12-15 15:21 ` bugzilla-daemon
2014-12-15 22:27 ` bugzilla-daemon
` (17 subsequent siblings)
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2014-12-15 15:21 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
--- Comment #4 from John Hughes <john@calva.com> ---
I've seen it 3 times so far.
Once on boot, I ran fsck, the error has not come back (so far).
Once (on another, but similarly configured VM) the error came back almost
immediately after being fixed by fsck. A second run of fsck and no problems so
far.
I'm not running any odd drivers in these VM's. The host of one of them has an
Eicon DIVA card with it's drivers, the host of the other doesn't.
The strangest part of my setup is that I have an mdadm raid1 on the host
system, that is made available as a virtio block device to a guest (running
under LVM). In the guest I have LVM2 volumes, with ext3 formatted filesystems.
Everything is now running this 3.16 Debian kernel.
One filesystem was mounted with data=journal, the other is the root filesystem
mounted with the default data=ordered.
barriers are not explicitly disabled so they should be enabled by default by
now (yes?)
The virtio block devices have cache=none.
I'll try to schedule the installation of a more up-to-date kernel on at least
one of the machines, but it may take me a few days.
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
` (3 preceding siblings ...)
2014-12-15 15:21 ` bugzilla-daemon
@ 2014-12-15 22:27 ` bugzilla-daemon
2014-12-16 11:46 ` bugzilla-daemon
` (16 subsequent siblings)
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2014-12-15 22:27 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
--- Comment #5 from Theodore Tso <tytso@mit.edu> ---
Yes, barriers are enabled by default, and with cache=none, the flush commands
*should* be sent down to the host device. It might be worth doing a quick
blktrace to make sure the host device is getting the cache flushes and they
aren't getting dropped by qemu or by mdraid, but the configuration at least
looks sane.
In the cases where you've seen the problem, have there been any sense where
either (a) the host has suffered a crash or shutdown or power drop, or (b) the
guest VM has been abruptly shutdown or killed without triggering a normal guest
OS shutdown?
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
` (4 preceding siblings ...)
2014-12-15 22:27 ` bugzilla-daemon
@ 2014-12-16 11:46 ` bugzilla-daemon
2014-12-16 14:55 ` bugzilla-daemon
` (15 subsequent siblings)
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2014-12-16 11:46 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
--- Comment #6 from John Hughes <john@calva.com> ---
Could you give me a clue about what exactlt I should do for a "quick blktrace",
what should I look for?
As for crashes and so on:
case #1, I found my guest machine "cedric" dead on 3 Dec, root filesystem
read-only. I ran fsck manualy and after reboot the log shows:
Dec 3 11:04:26 cedric kernel: [ 1.440438] EXT4-fs (dm-0): mounting ext3
file system using the ext4 subsystem
Dec 3 11:04:26 cedric kernel: [ 1.468114] EXT4-fs (dm-0): mounted
filesystem with ordered data mode. Opts: (null)
Dec 3 11:04:26 cedric kernel: [ 6.139705] EXT4-fs (dm-0): re-mounted. Opts:
(null)
Dec 3 11:04:26 cedric kernel: [ 6.311768] EXT4-fs (dm-0): re-mounted. Opts:
errors=remount-ro
Dec 3 11:09:11 cedric kernel: [ 302.048065] EXT4-fs (dm-0): error count since
last fsck: 3
Dec 3 11:09:11 cedric kernel: [ 302.048102] EXT4-fs (dm-0): initial error at
time 1417556618: ext4_mb_release_inode_pa:3773
Dec 3 11:09:11 cedric kernel: [ 302.048105] EXT4-fs (dm-0): last error at
time 1417596801: ext4_remount:4853
"cedric" had been up since 26 Nov.
Case #2, a filesystem went read-only on my guest machine "olympic" while the
cyrus imap "squatter" was indexing all the messages.
Dec 11 17:17:06 olympic kernel: [957229.875900] EXT4-fs (dm-4): pa
ffff88001f7ff980: logic 0, phys. 9176576, len 128
Dec 11 17:17:06 olympic kernel: [957229.878640] EXT4-fs error (device dm-4):
ext4_mb_release_inode_pa:3773: group 280, free 128, pa_free 127
Dec 11 17:17:06 olympic kernel: [957229.881350] Aborting journal on device
dm-4-8.
Dec 11 17:17:06 olympic kernel: [957229.891273] EXT4-fs (dm-4): Remounting
filesystem read-only
Dec 11 17:22:37 olympic kernel: [957561.369731] EXT4-fs (dm-4): pa
ffff880009f7b028: logic 0, phys. 9192448, len 512
Dec 11 17:22:37 olympic kernel: [957561.371926] EXT4-fs error (device dm-4):
ext4_mb_release_inode_pa:3773: group 280, free 512, pa_free 511
Dec 11 17:24:45 olympic kernel: [957688.608554] EXT4-fs error (device dm-4):
ext4_put_super:792: Couldn't clean up the journal
I ran fsck to fix the problem and re-mounted the filesystem, the problem
re-occured:
Dec 11 17:34:19 olympic kernel: [958263.206923] EXT4-fs (dm-4): mounting ext3
file system using the ext4 subsystem
Dec 11 17:34:19 olympic kernel: [958263.253131] EXT4-fs (dm-4): mounted
filesystem with journalled data mode. Opts: data=journal,errors=remount-ro
Dec 11 17:35:27 olympic kernel: [958331.251355] EXT4-fs (dm-4): pa
ffff88003d99aa50: logic 0, phys. 60800, len 32
Dec 11 17:35:27 olympic kernel: [958331.254094] EXT4-fs error (device dm-4):
ext4_mb_release_inode_pa:3773: group 1, free 32, pa_free 31
Dec 11 17:35:27 olympic kernel: [958331.256878] Aborting journal on device
dm-4-8.
Dec 11 17:35:27 olympic kernel: [958331.266214] EXT4-fs (dm-4): Remounting
filesystem read-only
Dec 11 17:36:17 olympic kernel: [958380.640506] EXT4-fs error (device dm-4):
ext4_put_super:792: Couldn't clean up the journal
I fixed it again with fsck, now all seems well (so far)
Dec 11 17:48:47 olympic kernel: [959131.397372] EXT4-fs (dm-4): mounting ext3
file system using the ext4 subsystem
Dec 11 17:48:47 olympic kernel: [959131.430854] EXT4-fs (dm-4): mounted
filesystem with journalled data mode. Opts: data=journal,errors=remount-ro
The system had been running since Nov 30.
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
` (5 preceding siblings ...)
2014-12-16 11:46 ` bugzilla-daemon
@ 2014-12-16 14:55 ` bugzilla-daemon
2015-05-21 19:01 ` bugzilla-daemon
` (14 subsequent siblings)
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2014-12-16 14:55 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
--- Comment #7 from Theodore Tso <tytso@mit.edu> ---
OK, so vis-a-vis case #1, it looks like the fsck didn't take.
Dec 3 11:04:26 cedric kernel: [ 1.440438] EXT4-fs (dm-0): mounting ext3
file system using the ext4 subsystem
Dec 3 11:04:26 cedric kernel: [ 1.468114] EXT4-fs (dm-0): mounted
filesystem with ordered data mode. Opts: (null)
Dec 3 11:04:26 cedric kernel: [ 6.139705] EXT4-fs (dm-0): re-mounted. Opts:
(null)
Dec 3 11:04:26 cedric kernel: [ 6.311768] EXT4-fs (dm-0): re-mounted. Opts:
errors=remount-ro
Dec 3 11:09:11 cedric kernel: [ 302.048065] EXT4-fs (dm-0): error count since
last fsck: 3
Dec 3 11:09:11 cedric kernel: [ 302.048102] EXT4-fs (dm-0): initial error at
time 1417556618: ext4_mb_release_inode_pa:3773
Dec 3 11:09:11 cedric kernel: [ 302.048105] EXT4-fs (dm-0): last error at
time 1417596801: ext4_remount:4853
I'm not exactly sure what time zone your system messages are in, but e2fsck
1.42.5 (I assume you're using e2fsprogs from debian stable?) will clear the
error stats, and time 1417596801 is:
% TZ=GMT date -d @1417596801
Wed Dec 3 08:53:21 GMT 2014
Either that, or maybe the raid system is acting up and writes aren't going to
both halves, or something weird like that? Can you do me a favor and
configure a handful of VM's that ***don't*** use MD RAID, just so we can rule
out funny business on that score?
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
` (6 preceding siblings ...)
2014-12-16 14:55 ` bugzilla-daemon
@ 2015-05-21 19:01 ` bugzilla-daemon
2015-05-27 9:53 ` bugzilla-daemon
` (13 subsequent siblings)
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2015-05-21 19:01 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
Michal Cihar <michal@cihar.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |michal@cihar.com
--- Comment #8 from Michal Cihar <michal@cihar.com> ---
I've seen this several times as well. All cases were after upgrade to Debian
Jessie with 3.16.0 kernel running in KVM. The host kernel is unchanged, so I'm
pretty sure the guest kernel is guilty here. The host does LVM + RAID, the
guest just gets virtio based disk so nothing complex on it's side.
The error in my case:
[149757.256024] EXT4-fs (vda1): pa ffff88001c92bd90: logic 0, phys. 1944064,
len 64
[149757.323573] EXT4-fs error (device vda1): ext4_mb_release_inode_pa:3773:
group 59, free 4, pa_free 2
[149757.349776] Aborting journal on device vda1-8.
[149757.386808] EXT4-fs (vda1): Remounting filesystem read-only
[236229.600081] EXT4-fs (vda1): error count since last fsck: 1
[236229.600086] EXT4-fs (vda1): initial error at time 1432088178:
ext4_mb_release_inode_pa:3773
[236229.600088] EXT4-fs (vda1): last error at time 1432088178:
ext4_mb_release_inode_pa:3773
I'm not able to reproduce this, but I've already seen the error several times.
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
` (7 preceding siblings ...)
2015-05-21 19:01 ` bugzilla-daemon
@ 2015-05-27 9:53 ` bugzilla-daemon
2015-05-27 10:01 ` bugzilla-daemon
` (12 subsequent siblings)
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2015-05-27 9:53 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
--- Comment #9 from John Hughes <john@calva.com> ---
So the common factors between Michal's case and mine are:
Debian 3.16.0 kernel running in KVM
Host kernel doing MDADM (Raid 1 for me)
Michal is running LVM on the host, I'm running it on the guest.
Michal, my filesystems are ext3, (now being handled by ext4 subsystem). Are
yours ext3 or ext4?
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
` (8 preceding siblings ...)
2015-05-27 9:53 ` bugzilla-daemon
@ 2015-05-27 10:01 ` bugzilla-daemon
2015-05-28 0:27 ` bugzilla-daemon
` (11 subsequent siblings)
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2015-05-27 10:01 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
--- Comment #10 from Michal Cihar <michal@cihar.com> ---
In my case it is ext4. And I have LVM + RAID 1 on host.
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
` (9 preceding siblings ...)
2015-05-27 10:01 ` bugzilla-daemon
@ 2015-05-28 0:27 ` bugzilla-daemon
2015-05-28 1:22 ` bugzilla-daemon
` (10 subsequent siblings)
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2015-05-28 0:27 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
--- Comment #11 from Theodore Tso <tytso@mit.edu> ---
Why are you still using v3.16.0? The latest 3.16 kernel is v3.16.7. I
believe the bug in question was fixed in a backport that first showed up in
v3.16.2:
commit 868be28e321b764da29592cfd9fd659434e823aa
Author: Theodore Ts'o <tytso@mit.edu>
Date: Wed Jul 30 22:17:17 2014 -0400
ext4: fix ext4_discard_allocated_blocks() if we can't allocate the pa
struct
commit 86f0afd463215fc3e58020493482faa4ac3a4d69 upstream.
If there is a failure while allocating the preallocation structure, a
number of blocks can end up getting marked in the in-memory buddy
bitmap, and then not getting released. This can result in the
following corruption getting reported by the kernel:
EXT4-fs error (device sda3): ext4_mb_generate_buddy:758: group 1126,
12793 clusters in bitmap, 12729 in gd
In that case, we need to release the blocks using mb_free_blocks().
Tested: fs smoke test; also demonstrated that with injected errors,
the file system is no longer getting corrupted
Google-Bug-Id: 16657874
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
` (10 preceding siblings ...)
2015-05-28 0:27 ` bugzilla-daemon
@ 2015-05-28 1:22 ` bugzilla-daemon
2015-05-28 5:24 ` bugzilla-daemon
` (9 subsequent siblings)
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2015-05-28 1:22 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
--- Comment #12 from Theodore Tso <tytso@mit.edu> ---
Ah, never mind; the "debian 3.16.0" kernel is actually based on 3.16.7-ckt9 --
at least, the latest version of debian 3.16.0. You'd have to specify exactly
what version of the "debian 3.16.0" kernel to know for sure. (But this is why
many upstream kernel developers don't try to support distro kernels. It's hard
to see what commits might or might not be in a particular kernel distro.)
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
` (11 preceding siblings ...)
2015-05-28 1:22 ` bugzilla-daemon
@ 2015-05-28 5:24 ` bugzilla-daemon
2015-05-28 10:35 ` bugzilla-daemon
` (8 subsequent siblings)
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2015-05-28 5:24 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
--- Comment #13 from Michal Cihar <michal@cihar.com> ---
Sorry for being unclear, in my case it's 3.16.7-ckt9-3~deb8u1, latest version
available in Debian stable.
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
` (12 preceding siblings ...)
2015-05-28 5:24 ` bugzilla-daemon
@ 2015-05-28 10:35 ` bugzilla-daemon
2015-07-24 16:17 ` bugzilla-daemon
` (7 subsequent siblings)
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2015-05-28 10:35 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
--- Comment #14 from John Hughes <john@calva.com> ---
And for me its:
Linux olympic 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt7-1~bpo70+1
(2015-04-07) x86_64 GNU/Linux
The last but one backported version for oldstable. I can update to
3.16.7-ckt9...
(What a horrid name)
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
` (13 preceding siblings ...)
2015-05-28 10:35 ` bugzilla-daemon
@ 2015-07-24 16:17 ` bugzilla-daemon
2015-08-11 15:15 ` bugzilla-daemon
` (6 subsequent siblings)
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2015-07-24 16:17 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
--- Comment #15 from John Hughes <john@calva.com> ---
Since I've seen this bug regularly:
Jul 9 01:33:50 olympic kernel: [2914041.432613] Aborting journal on device
dm-4-8.
Jul 15 14:00:17 olympic kernel: [196797.659798] Aborting journal on device
dm-2-8.
Jun 23 11:16:37 olympic kernel: [1566607.798081] Aborting journal on device
dm-4-8.
And a bunch of times on / where it didn't get logged, but I needed to reboot
after fsck:
Jul 13 07:20:31 olympic kernel: imklog 5.8.11, log source = /proc/kmsg started.
Jul 15 14:33:18 olympic kernel: imklog 5.8.11, log source = /proc/kmsg started.
Jul 19 15:36:11 olympic kernel: imklog 5.8.11, log source = /proc/kmsg started.
Jul 23 08:00:30 olympic kernel: imklog 5.8.11, log source = /proc/kmsg started.
(More or less at least once every 4 days)
I've upgraded the kernel to stock 3.18.19.
I'll report here if I see the problem again.
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
` (14 preceding siblings ...)
2015-07-24 16:17 ` bugzilla-daemon
@ 2015-08-11 15:15 ` bugzilla-daemon
2015-08-11 18:16 ` bugzilla-daemon
` (5 subsequent siblings)
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2015-08-11 15:15 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
--- Comment #16 from John Hughes <john@calva.com> ---
Ok, I've now seen the bug while running 3.18.19 (compiled by me straight from
kernel git).
It stayed up for way longer this time :-) but then...
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Initializing cgroup subsys cpuacct
[ 0.000000] Linux version 3.18.19-jh1 (john@olympic) (gcc version 4.7.2
(Debian 4.7.2-5) ) #1 SMP Thu Jul 23 14:54:25 CEST 2015
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.18.19-jh1
root=/dev/mapper/olympic-olympic--root ro console=tty0 console=ttyS0,19200n8
quiet
...
[ 5.310778] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro
[ 7.917833] EXT4-fs (vda1): mounting ext3 file system using the ext4
subsystem
[ 7.937645] EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts:
errors=remount-ro
[ 7.973622] EXT4-fs (dm-2): mounting ext3 file system using the ext4
subsystem
[ 8.020501] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts:
errors=remount-ro
[ 8.051234] EXT4-fs (dm-3): mounting ext3 file system using the ext4
subsystem
[ 8.084443] EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts:
errors=remount-ro
[ 8.110174] EXT4-fs: Warning: mounting with data=journal disables delayed
allocation and O_DIRECT support!
[ 8.110180] EXT4-fs (dm-4): mounting ext3 file system using the ext4
subsystem
[ 8.163955] EXT4-fs (dm-4): mounted filesystem with journalled data mode.
Opts: data=journal,errors=remount-ro
...
[1570311.174406] EXT4-fs (dm-2): pa ffff8800281ac980: logic 0, phys. 1351168,
len 512
[1570311.176802] EXT4-fs error (device dm-2): ext4_mb_release_inode_pa:3773:
group 41, free 431, pa_free 430
[1570311.179413] Aborting journal on device dm-2-8.
[1570311.199091] EXT4-fs (dm-2): Remounting filesystem read-only
On reboot the system complained about dm-2 being bad:
[ 9.196972] EXT4-fs (dm-2): mounting ext3 file system using the ext4
subsystem
[ 10.134032] EXT4-fs warning (device dm-2): ext4_clear_journal_err:4659:
Filesystem error recorded from previous mount: IO failure
[ 10.136694] EXT4-fs warning (device dm-2): ext4_clear_journal_err:4660:
Marking fs in need of filesystem check.
[ 10.158051] EXT4-fs (dm-2): warning: mounting fs with errors, running e2fsck
is recommended
[ 10.171173] EXT4-fs (dm-2): Errors on filesystem, clearing orphan list.
[ 10.171173]
[ 10.173578] EXT4-fs (dm-2): recovery complete
[ 10.188013] EXT4-fs (dm-2): mounted filesystem with ordered data mode. Opts:
errors=remount-ro
So I unmounted it and did a fsck:
Script started on Tue Aug 11 16:54:56 2015
root@olympic:~# fsck -y -f /dev/mapper/olympic-olympic--home
fsck from util-linux 2.20.1
e2fsck 1.42.5 (29-Jul-2012)
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found. Fix? yes
Inode 3827814 was part of the orphaned inode list. FIXED.
Deleted inode 3828431 has zero dtime. Fix? yes
Inode 4137318 was part of the orphaned inode list. FIXED.
Inode 4137485 was part of the orphaned inode list. FIXED.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -(723856--723871) -(738144--738172) +1351589
-7897632 -(7926715--7926730) -(11616272--11616287) -(11627328--11627356)
-18842491 -(18846824--18846839) -29065743 -(29093767--29093782) -32965419
-(32983830--32983845) -(36864912--36864919) -(36965200--36965204)
Fix? yes
Free blocks count wrong for group #22 (27818, counted=27863).
Fix? yes
Free blocks count wrong for group #241 (25273, counted=25290).
Fix? yes
Free blocks count wrong for group #354 (23179, counted=23224).
Fix? yes
Free blocks count wrong for group #575 (25214, counted=25231).
Fix? yes
Free blocks count wrong for group #887 (24494, counted=24511).
Fix? yes
Free blocks count wrong for group #1006 (21249, counted=21266).
Fix? yes
Free blocks count wrong for group #1125 (25845, counted=25853).
Fix? yes
Free blocks count wrong for group #1128 (28938, counted=28943).
Fix? yes
Free blocks count wrong (22597710, counted=22597881).
Fix? yes
Inode bitmap differences: -3827814 -3828431 -4137318 -4137485
Fix? yes
Free inodes count wrong for group #467 (5325, counted=5327).
Fix? yes
Free inodes count wrong for group #505 (6985, counted=6987).
Fix? yes
Free inodes count wrong (9975485, counted=9975489).
Fix? yes
olympic-home: ***** FILE SYSTEM WAS MODIFIED *****
olympic-home: 510271/10485760 files (13.1% non-contiguous), 19345159/41943040
blocks
root@olympic:~# exit
Script done on Tue Aug 11 17:01:11 2015
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
` (15 preceding siblings ...)
2015-08-11 15:15 ` bugzilla-daemon
@ 2015-08-11 18:16 ` bugzilla-daemon
2015-08-11 21:11 ` bugzilla-daemon
` (4 subsequent siblings)
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2015-08-11 18:16 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
--- Comment #17 from Theodore Tso <tytso@mit.edu> ---
John, are you willing to give us a lot more details about your set up?
Hardware, software, details of the host OS, details of the guest OS, what is
the workload where the problem is happening, etc?
And it might be helpful if you opened a separate bug which we use exclusively
for tracking your particular issue. One of the things which is often
disastrous, and why I've largely given up on Ubuntu's launchpad, is when
multiple people assume that the problem that they are seeing is the same as
everyone else's. when in fact they could be separate bugs, or one of the bugs
could be a hardware problem, etc.
Then when one person doesn't give details about their hardware because they saw
someone else already had, it makes finding the problems even harder.
One thing to keep in mind when we are talking about problems found in the guest
OS is that misconfigurations at the host OS level can be indistinguishable from
hardware problems. So for example, depending on the caching policy used by
KVM in the host OS issues, it may be that cache flush commands in the guest OS
don't peculate down all the way down to the actual hardware platters. Thus,
after a power fail event, even if everything is configured correctly at the
guest OS, it can result in an inconsistent file system image because the KVM /
the host OS didn't make sure everything got pushed to the disk after a journal
commit.
So knowing whether or not were any shutdowns caused either by KVM crashing, or
the host OS crashing, or the guest OS crashing, is all very important in trying
to debug things, because the error message reported simply means that there was
an inconsistency detected in the allocation bitmaps. This can be caused by
any number of things --- which is why doctors hate it when someone asks them
what's wrong because "I have a cough" at a social occasion, and then someone
else chimes in, "well, my grandfather had a cough right before he was diagnosed
with terminal lung cancer"...
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
` (16 preceding siblings ...)
2015-08-11 18:16 ` bugzilla-daemon
@ 2015-08-11 21:11 ` bugzilla-daemon
2015-08-12 9:08 ` bugzilla-daemon
` (3 subsequent siblings)
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2015-08-11 21:11 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
cwseys@physics.wisc.edu changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |cwseys@physics.wisc.edu
--- Comment #18 from cwseys@physics.wisc.edu ---
Hi John,
Me too! :p We're having similar symptoms. Running VMs with Wheezy (kernel
version 3.2), SL6 (kernel version 2.6), and Jessie (kernel version 3.16).
Only Jessie VMs go read-only sporadically even when running on the same host.
I'm recording which VM host this is happening on, hoping to find a pattern,
but so far not luck.
Also, not easily reproducible. E.g. for me (maybe not same bug!) the hosts
have both been running software RAID and have been using hardware RAID cards.
Not all the VMs running 3.16 have had this problem, but some have had it
multiple times.
If I hit on a reliable failure I will start git bisecting between 3.2 and 3.16.
(A brute force approach, but maybe helpful.)
Chad.
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
` (17 preceding siblings ...)
2015-08-11 21:11 ` bugzilla-daemon
@ 2015-08-12 9:08 ` bugzilla-daemon
2015-08-12 9:12 ` bugzilla-daemon
` (2 subsequent siblings)
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2015-08-12 9:08 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
--- Comment #19 from John Hughes <john@calva.com> ---
(In reply to Theodore Tso from comment #17)
> John, are you willing to give us a lot more details about your set up?
> Hardware, software, details of the host OS, details of the guest OS, what is
> the workload where the problem is happening, etc?
>
> And it might be helpful if you opened a separate bug which we use
> exclusively for tracking your particular issue.
Ok, I've opened https://bugzilla.kernel.org/show_bug.cgi?id=102731
Please let me know if you need more info.
> One thing to keep in mind when we are talking about problems found in the
> guest OS is that misconfigurations at the host OS level can be
> indistinguishable from hardware problems. So for example, depending on the
> caching policy used by KVM in the host OS issues, it may be that cache flush
> commands in the guest OS don't peculate down all the way down to the actual
> hardware platters. Thus, after a power fail event, even if everything is
> configured correctly at the guest OS, it can result in an inconsistent file
> system image because the KVM / the host OS didn't make sure everything got
> pushed to the disk after a journal commit.
I have had one power fail, but since then mdadm has run a consistency check and
found no inconsistencies.
I'm also running regular (once a week) fsck's on (snapshots of) my filesystems.
So far the fsck have never found a problem.
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
` (18 preceding siblings ...)
2015-08-12 9:08 ` bugzilla-daemon
@ 2015-08-12 9:12 ` bugzilla-daemon
2022-02-04 13:13 ` bugzilla-daemon
2022-02-07 9:34 ` bugzilla-daemon
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2015-08-12 9:12 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
John Hughes <john@calva.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Regression|No |Yes
--
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
` (19 preceding siblings ...)
2015-08-12 9:12 ` bugzilla-daemon
@ 2022-02-04 13:13 ` bugzilla-daemon
2022-02-07 9:34 ` bugzilla-daemon
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2022-02-04 13:13 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
GSI (kernel.org-115@groovy-skills.com) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |kernel.org-115@groovy-skill
| |s.com
--- Comment #20 from GSI (kernel.org-115@groovy-skills.com) ---
This is on a HP ProLiant DL380 G7 with HP Smart Array P410 controller with 8
disks configured as RAID-6.
S.M.A.R.T. reports the disks as healthy.
The server has been running for roughly ten months before this issue occured
the 1st of this month.
fsck did some repairs. Two days later, 3rd, the issue occured again.
What is the meaning of the suffix -8 as reported by
https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/tree/fs/jbd2/journal.c#n2422
for j_devname?
Regardless of actual device, all bug reports appear to have that -8 in common.
Examples:
"Aborting journal on device sda3-8."
https://askubuntu.com/questions/595024/suddenly-read-only-file-system-ext4
"Aborting journal on device dm-2-8."
https://bugzilla.kernel.org/show_bug.cgi?id=102731
"Aborting journal on device sda2:8."
https://bbs.archlinux.org/viewtopic.php?id=70077
"Aborting journal on device vda1-8."
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1423672
"Aborting journal on device dm-1-8."
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1423672/comments/26
"Aborting journal on device dm-9-8."
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1423672/comments/28
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread
* [Bug 89621] EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32
2014-12-12 12:28 [Bug 89621] New: EXT4-fs error (device dm-1): ext4_mb_release_inode_pa:3773: group 24089, free 34, pa_free 32 bugzilla-daemon
` (20 preceding siblings ...)
2022-02-04 13:13 ` bugzilla-daemon
@ 2022-02-07 9:34 ` bugzilla-daemon
21 siblings, 0 replies; 23+ messages in thread
From: bugzilla-daemon @ 2022-02-07 9:34 UTC (permalink / raw)
To: linux-ext4
https://bugzilla.kernel.org/show_bug.cgi?id=89621
--- Comment #21 from GSI (kernel.org-115@groovy-skills.com) ---
"This" in my last comment referring to the following output, which I had been
omitting unintentionally:
kvm [12151]: vcpu0, guest rIP: 0xffffffff010410a8 disabled perfctr wrmsr: 0xc1
data 0xabcd
kvm [14901]: vcpu0, guest rIP: 0xffffffff010410a8 disabled perfctr wrmsr: 0xc1
data 0xabcd
blk_update_request: critical medium error, dev sda, sector 1736391592 op
0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
Aborting journal on device sda4-8.
EXT4-fs error (device sda4): ext4_journal_check_start:83: comm worker: Detected
aborted journal
EXT4-fs (sda4): Remounting filesystem read-only
The first two lines may not be related, but at least some (all?) bug reports on
the topic mentioned the machine in question being used as hypervisor.
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
^ permalink raw reply [flat|nested] 23+ messages in thread