* Re: [dm-devel] can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
2012-01-05 18:15 ` Ted Ts'o
@ 2012-01-06 16:40 ` Mikulas Patocka
2012-01-28 4:53 ` WIMPy
0 siblings, 1 reply; 8+ messages in thread
From: Mikulas Patocka @ 2012-01-06 16:40 UTC (permalink / raw)
To: device-mapper development; +Cc: Sander Eikelenboom, linux-ext4, linux-kernel
On Thu, 5 Jan 2012, Ted Ts'o wrote:
> On Thu, Jan 05, 2012 at 05:14:28PM +0100, Sander Eikelenboom wrote:
> >
> > OK spoke too soon, i have been able to trigger it again:
> > - copying files from LV to the same LV without the snapshot went OK
> > - copying from the RO snapshot of a LV to the same LV gave the error while copying the file again:
>
> OK. Originally, you said you did this:
>
> 1) fsck -v -p -f the filesystem
> 2) mount the filesystem
> 3) Try to copy a file
> 4) filesystem will be mounted RO on error (see below)
> 5) fsck again, journal will be recovered, no other errors
> 6) start at 1)
>
> Was this with a read-only snapshot always being in existence
> through all of these five steps? When was the RO snapshot created?
>
> If a RO snapshot has to be there in order for this to happen, then
> this is almost certainly a device-mapper regression. (dm-devel folks,
> this is a problem which apparently occurred when the user went from
> v3.1.5 to v3.2, so this looks like a 3.2 regression.)
>
> - Ted
The existence of a snapshot changes I/O completion times significantly, so
it may be a race condition in ext4 that gets triggered with changed
timings.
Mikulas
* Re: [dm-devel] can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
2012-01-06 16:40 ` [dm-devel] " Mikulas Patocka
@ 2012-01-28 4:53 ` WIMPy
2012-01-28 8:14 ` WIMPy
0 siblings, 1 reply; 8+ messages in thread
From: WIMPy @ 2012-01-28 4:53 UTC (permalink / raw)
To: linux-ext4
Mikulas Patocka <mpatocka <at> redhat.com> writes:
> The existence of a snapshot changes I/O completion times significantly, so
> it may be a race condition in ext4 that gets triggered which changed
> timings.
The idea that timing might cause issues on a FS is disturbing.
> > this is a problem which apparently occurred when the user went from
> > v3.1.5 to v3.2, so this looks likes 3.2 regression.)
I am on 3.2.0 as well.
It happened for me on a freshly created FS, made with
"mke2fs -j -O sparse_super -O dir_index -O extents -O filetype -O uninit_bg"
and mounted with no additional options. I got my first
"EXT4-fs error (device md127): ext4_mb_generate_buddy:739: group 28671, 32765
clusters in bitmap, 32766 in gd"
after writing about 3TB of data.
I do not have RO snapshots like the OP, but my md sits on top of LUKS containers.
So we do have the device mapper in common.
Just for the record: Unlike the contents, the hardware is not new and did not
have any known issues.
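For readers unfamiliar with the message quoted above: as far as I understand it, ext4_mb_generate_buddy counts the free clusters in a group's block bitmap and complains when that count disagrees with the free count stored in the group descriptor ("gd"). The following is only a toy sketch of that comparison in Python, not the kernel code. The function names and the group size of 32768 clusters are assumptions made for illustration.

```python
def count_free_clusters(bitmap: bytes, clusters_per_group: int) -> int:
    """Count zero bits (free clusters) among the first clusters_per_group bits."""
    free = 0
    for i in range(clusters_per_group):
        if not (bitmap[i // 8] >> (i % 8)) & 1:
            free += 1
    return free


def check_group(group: int, bitmap: bytes, gd_free: int, clusters_per_group: int):
    """Return a message in the style of the kernel error, or None if counts agree."""
    free = count_free_clusters(bitmap, clusters_per_group)
    if free != gd_free:
        return f"group {group}, {free} clusters in bitmap, {gd_free} in gd"
    return None


# Hypothetical group of 32768 clusters: the bitmap marks 3 clusters as in use,
# while the descriptor claims 32766 free clusters, so the two differ by one.
CLUSTERS_PER_GROUP = 32768  # assumption for this sketch
bitmap = bytearray(CLUSTERS_PER_GROUP // 8)
for used in (0, 1, 2):
    bitmap[used // 8] |= 1 << (used % 8)

message = check_group(28671, bytes(bitmap), 32766, CLUSTERS_PER_GROUP)
print(message)
```

The point of the sketch is only that the error reports two independently maintained counters that have drifted apart. It says nothing about which of the two is wrong.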
Greetings,
WIMPy
* Re: [dm-devel] can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
2012-01-28 4:53 ` WIMPy
@ 2012-01-28 8:14 ` WIMPy
2012-01-28 8:34 ` Andreas Dilger
0 siblings, 1 reply; 8+ messages in thread
From: WIMPy @ 2012-01-28 8:14 UTC (permalink / raw)
To: linux-ext4
Update:
> > > this is a problem which apparently occurred when the user went from
> > > v3.1.5 to v3.2, so this looks likes 3.2 regression.)
>
> I am on 3.2.0 as well.
I didn't spot anything obvious in the logs.
> It happened for me on a freshly created FS.
> "mke2fs -j -O sparse_super -O dir_index -O extents -O filetype -O uninit_bg"
> mounted with no additional options for the first time I got an
> "EXT4-fs error (device md127): ext4_mb_generate_buddy:739: group 28671, 32765
> clusters in bitmap, 32766 in gd"
> after writing about 3TB of data.
> I do not have RO snapshots as the OP, but my md sits on top of luks
> containers.
> So we do have the device mapper in common.
After I did an fsck and tried to continue, I didn't get that far.
After another 200GB or so it happened again.
And now it's reproducible:
I can run fsck and then try to continue (using rsync). But as soon as writing
starts, the process hangs for a long time: at least one minute, probably longer.
Then the ext4_mb_generate_buddy error comes again.
I upgraded e2fsprogs from 1.41.14 to 1.42 and the kernel to 3.2.2.
No difference.
That FS is unusable.
> Just for the records: Unlike the contents, the hardware is not new and did
> not have any known issues.
>
> Greetings,
> WIMPy
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo <at> vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
* Re: [dm-devel] can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
2012-01-28 8:14 ` WIMPy
@ 2012-01-28 8:34 ` Andreas Dilger
2012-01-28 15:31 ` WIMPy
0 siblings, 1 reply; 8+ messages in thread
From: Andreas Dilger @ 2012-01-28 8:34 UTC (permalink / raw)
To: WIMPy; +Cc: linux-ext4@vger.kernel.org
Could you please try to bisect the problem, if it is reproducible?
I was looking for a change which I thought might be responsible (removal of block bitmap initialization when inodes are first allocated from an uninitialized inode table) but I couldn't see it in the git log, so maybe that change has not landed yet.
I don't have any other ideas of which recent patches might be responsible at this point.
Cheers, Andreas
On 2012-01-28, at 1:14, WIMPy <WIMPy@yeti.dk> wrote:
> Update:
>
>>>>> this is a problem which apparently occurred when the user went from
>>>> v3.1.5 to v3.2, so this looks likes 3.2 regression.)
>>
>> I am on 3.2.0 as well.
>
> I didn't spot anything obvious in the logs.
>
>> It happened for me on a freshly created FS.
>> "mke2fs -j -O sparse_super -O dir_index -O extents -O filetype -O uninit_bg"
>> mounted with no additional options for the first time I got an
>> "EXT4-fs error (device md127): ext4_mb_generate_buddy:739: group 28671, 32765
>> clusters in bitmap, 32766 in gd"
>> after writing about 3TB of data.
>> I do not have RO snapshots as the OP, but my md sits on top of luks
>> containers.
>> So we do have the device mapper in common.
>
> After I did an fsck and tried to continue, I didn't get that far.
> After another 200GB or so it happened again.
> And now it's reproducible:
> I can run fsck and then try to continue (using rsync). But as soon as writing
> starts, the process hangs for a long time. At least one minute, probably longer.
> Then the ext4_mb_generate_buddy comes again.
>
> I upgraded e2fstools from 1.41.14 to 1.42 and the kernel to 3.2.2.
> No difference.
> That FS is unusable.
>
>> Just for the records: Unlike the contents, the hardware is not new and did
>> not have any known issues.
>>
>> Greetings,
>> WIMPy
* Re: [dm-devel] can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
2012-01-28 8:34 ` Andreas Dilger
@ 2012-01-28 15:31 ` WIMPy
2012-01-28 21:04 ` WIMPy
0 siblings, 1 reply; 8+ messages in thread
From: WIMPy @ 2012-01-28 15:31 UTC (permalink / raw)
To: linux-ext4
Andreas Dilger <adilger <at> dilger.ca> writes:
>
> Could you please try to bisect the problem, if it is reproducible?
If you or someone else has an idea how to do so, I will try to collect more
information.
There is actually an important bit I forgot to mention in the last message:
After I get the error and unmount the FS, I get lots of journal commit I/O
errors, but no indication as to what fails or why.
> I was looking for a change which I thought might be responsible (removal of
> block bitmap initialization when inodes are first allocated from an
> uninitialized inode table) but I couldn't see it in the git log, so
> maybe that change has not landed yet.
>
> I don't have any other ideas of which recent patches might be responsible at
> this point.
As there was a mention at the beginning that this may have happened after an
upgrade from 3.1.5 to 3.2, I will build a 3.1.5 and see if that really makes a
difference.
> On 2012-01-28, at 1:14, WIMPy <WIMPy <at> yeti.dk> wrote:
>
> > Update:
> >
> >>>>> this is a problem which apparently occurred when the user went from
> >>>> v3.1.5 to v3.2, so this looks likes 3.2 regression.)
> >>
> >> I am on 3.2.0 as well.
> >
> > I didn't spot anything obvious in the logs.
> >
> >> It happened for me on a freshly created FS.
> > >> "mke2fs -j -O sparse_super -O dir_index -O extents -O filetype -O uninit_bg"
> > >> mounted with no additional options for the first time I got an
> > >> "EXT4-fs error (device md127): ext4_mb_generate_buddy:739: group 28671, 32765
> > >> clusters in bitmap, 32766 in gd"
> >> after writing about 3TB of data.
> > >> I do not have RO snapshots as the OP, but my md sits on top of luks
> > >> containers.
> >> So we do have the device mapper in common.
> >
> > After I did an fsck and tried to continue, I didn't get that far.
> > After another 200GB or so it happened again.
> > And now it's reproducible:
> > > I can run fsck and then try to continue (using rsync). But as soon as
> > > writing starts, the process hangs for a long time. At least one minute,
> > > probably longer.
> > Then the ext4_mb_generate_buddy comes again.
> >
> > I upgraded e2fstools from 1.41.14 to 1.42 and the kernel to 3.2.2.
> > No difference.
> > That FS is unusable.
> >
> > >> Just for the records: Unlike the contents, the hardware is not new and did
> > >> not have any known issues.
> >>
> >> Greetings,
> >> WIMPy
* Re: [dm-devel] can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
2012-01-28 15:31 ` WIMPy
@ 2012-01-28 21:04 ` WIMPy
2012-02-03 5:30 ` WIMPy
0 siblings, 1 reply; 8+ messages in thread
From: WIMPy @ 2012-01-28 21:04 UTC (permalink / raw)
To: linux-ext4
... and another update:
> As there was a mention at the beginning that this may have happened after an
> upgrade from 3.1.5 to 3.2, I will build a 3.1.5 and see if that really makes
> a difference.
Yes it does.
3.1.5 has been working for 4.5 hours now, continuing from the point where 3.2
and 3.2.2 reproducibly barfed.
I see some changes to ext4 on January 9 and 10, but nothing thereafter, so I'm
not sure if it's worth trying something like 3.3-rc1.
The bad thing is that 3.2 worked for about 20 hours before it first failed, so
it's not a quick test.
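To put "not a quick test" into numbers: a bisection such as the one "git bisect" performs is a binary search over commits, so it needs roughly log2(N) build-and-test rounds for N candidate commits. The sketch below is plain arithmetic. The commit count is a hypothetical placeholder, not the real number of commits between the two kernels, and the 20 hours per round is taken from the observation above as a worst case.

```python
def bisect_rounds(commits: int) -> int:
    """Upper bound on test rounds for a binary search over `commits` candidates."""
    if commits < 1:
        raise ValueError("need at least one candidate commit")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without float rounding.
    return (commits - 1).bit_length()


HYPOTHETICAL_COMMITS = 10_000  # placeholder, not the actual commit count
HOURS_PER_ROUND = 20           # worst case: time 3.2 ran before the first error

rounds = bisect_rounds(HYPOTHETICAL_COMMITS)
total_hours = rounds * HOURS_PER_ROUND
print(f"{rounds} rounds, up to about {total_hours} hours of testing")
```

A "good" verdict is the expensive one here, because a kernel can only be called good after it has survived at least as long as the bad one took to fail. Restricting the search to ext4-related commits would shrink N and with it the number of rounds.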
> > >>>>> this is a problem which apparently occurred when the user went from
> > >>>> v3.1.5 to v3.2, so this looks likes 3.2 regression.)
> > >> It happened for me on a freshly created FS.
> > >> "mke2fs -j -O sparse_super -O dir_index -O extents -O filetype -O uninit_bg"
> > >> mounted with no additional options for the first time I got an
> > >> "EXT4-fs error (device md127): ext4_mb_generate_buddy:739: group 28671, 32765
> > >> clusters in bitmap, 32766 in gd"
> > >> after writing about 3TB of data.
> > >> I do not have RO snapshots as the OP, but my md sits on top of luks
> > > containers.
> > >> So we do have the device mapper in common.
> > >
> > > After I did an fsck and tried to continue, I didn't get that far.
> > > After another 200GB or so it happened again.
> > > And now it's reproducible:
> > > > I can run fsck and then try to continue (using rsync). But as soon as
> > > > writing starts, the process hangs for a long time. At least one minute,
> > > > probably longer.
> > > Then the ext4_mb_generate_buddy comes again.
Greetings,
WIMPy
* Re: [dm-devel] can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
2012-01-28 21:04 ` WIMPy
@ 2012-02-03 5:30 ` WIMPy
0 siblings, 0 replies; 8+ messages in thread
From: WIMPy @ 2012-02-03 5:30 UTC (permalink / raw)
To: linux-ext4
WIMPy <WIMPy <at> yeti.dk> writes:
> ... and another update:
I don't know what the cause is, but I think I've got the trigger.
Those errors appeared when using rsync on a directory containing a file that was
being written to (extended) while the rsync was running. That seems to be a
situation in which rsync causes a lot of stress. It certainly takes a hell of a
lot of time.
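The workload described above (one process extending a file while another keeps copying the directory that contains it) can be approximated with a small script. This is only a sketch of the access pattern, and nothing here is confirmed to reproduce the error. It uses shutil instead of rsync, runs in a temporary directory rather than on the affected filesystem, and all names and sizes are made up for illustration.

```python
import shutil
import tempfile
import threading
from pathlib import Path


def extend_file(path: Path, chunks: int, chunk_size: int, done: threading.Event) -> None:
    """Append chunks to path, flushing each one so the file keeps growing."""
    try:
        with path.open("ab") as f:
            for _ in range(chunks):
                f.write(b"\0" * chunk_size)
                f.flush()
    finally:
        done.set()  # always signal, so the copier cannot loop forever


def copy_while_growing(src: Path, dst: Path, done: threading.Event) -> int:
    """Copy src to dst repeatedly; the last pass starts after the writer is done."""
    passes = 0
    while True:
        finished = done.is_set()
        shutil.copytree(src, dst, dirs_exist_ok=True)
        passes += 1
        if finished:
            return passes


def run(chunks: int = 64, chunk_size: int = 1 << 16):
    """Run writer and copier concurrently; return (passes, final copied size)."""
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "src", Path(tmp) / "dst"
        src.mkdir()
        target = src / "growing.bin"
        target.touch()
        done = threading.Event()
        writer = threading.Thread(target=extend_file, args=(target, chunks, chunk_size, done))
        writer.start()
        passes = copy_while_growing(src, dst, done)
        writer.join()
        return passes, (dst / "growing.bin").stat().st_size


passes, copied = run()
print(f"{passes} copy passes, final copy is {copied} bytes")
```

To exercise a particular filesystem, the temporary directory would have to be replaced by paths on that mount, and the sizes raised far beyond the few megabytes used here.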
I suspect one of the ext4-related commits from Jan 9th/10th. From the log, I
guess they should still exist in 3.3-rc1. I'm currently testing that, but
unfortunately that might take some time.
And to repeat briefly: I'm using an md, but no lvm.
Greetings,
WIMPy
* Re: [dm-devel] can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd
@ 2012-03-19 23:06 Tony Hoyle
0 siblings, 0 replies; 8+ messages in thread
From: Tony Hoyle @ 2012-03-19 23:06 UTC (permalink / raw)
To: linux-ext4
I looked at the changelogs for 3.2.x and couldn't see anything that was
obviously related to this issue, hence posting on this (slightly old)
thread, since I can't find any follow-up. I've downgraded to 2.6.32
(the last Debian kernel available, since they don't seem to keep historical
kernels around) for now, which is running solidly.
WIMPy <wimpy <at> yeti.dk> writes:
> written to (extended) while the rsync was running, which seems to be
> a situation, where rsync causes a lot of stress. It certainly takes a
> hell of a lot of time.
I get it when I'm writing large files over nfs - exactly the same
symptoms as mentioned elsewhere in the thread, followed by nfsd going
into D state and things generally going downhill from there.
Started when I upgraded to 3.1.0 and continued up to 3.2.0. fsck shows
no errors on the disk, but the logs fill up with
ext4_mb_generate_buddy:739 errors anyway.
> And a short repeat: I'm using an md, but no lvm.
>
Same setup here - md, but no lvm. Another non-raid drive doesn't show
the same symptoms, if it's any help.
Tony
nb: Some logs, FWIW. As mentioned above, fsck says there are no errors
on the drive:
Mar 19 20:50:52 goliath kernel: [ 1721.686880] EXT4-fs error (device md0): ext4_mb_generate_buddy:739: group 21345, 32254 clusters in bitmap, 32258 in gd
Mar 19 20:50:52 goliath kernel: [ 1721.703397] JBD2: Spotted dirty metadata buffer (dev = md0, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
Mar 19 20:51:38 goliath kernel: [ 1767.622399] EXT4-fs error (device md0): ext4_mb_generate_buddy:739: group 21346, 32254 clusters in bitmap, 32258 in gd
Mar 19 20:52:18 goliath kernel: [ 1808.268856] EXT4-fs error (device md0): ext4_mb_generate_buddy:739: group 21347, 32254 clusters in bitmap, 32258 in gd
Mar 19 20:53:29 goliath kernel: [ 1879.257332] EXT4-fs error (device md0): ext4_mb_generate_buddy:739: group 21348, 32254 clusters in bitmap, 32258 in gd
Mar 19 20:54:45 goliath kernel: [ 1955.083019] EXT4-fs error (device md0): ext4_mb_generate_buddy:739: group 21349, 32254 clusters in bitmap, 32258 in gd
..etc. They don't vary much. A few thousand of these in rapid succession.
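With a few thousand such lines, it may help to summarize them mechanically. The following Python sketch parses messages of the form shown in the log excerpt above and prints the group number and both counts. The regular expression and the function name are my own and only cover this one message format. The sample lines are taken from the excerpt, each joined onto a single line.

```python
import re

# Three of the log lines quoted above, one message per line.
LOG = """\
Mar 19 20:50:52 goliath kernel: [ 1721.686880] EXT4-fs error (device md0): ext4_mb_generate_buddy:739: group 21345, 32254 clusters in bitmap, 32258 in gd
Mar 19 20:51:38 goliath kernel: [ 1767.622399] EXT4-fs error (device md0): ext4_mb_generate_buddy:739: group 21346, 32254 clusters in bitmap, 32258 in gd
Mar 19 20:52:18 goliath kernel: [ 1808.268856] EXT4-fs error (device md0): ext4_mb_generate_buddy:739: group 21347, 32254 clusters in bitmap, 32258 in gd
"""

PATTERN = re.compile(
    r"EXT4-fs error \(device (?P<dev>\S+)\): ext4_mb_generate_buddy:\d+: "
    r"group (?P<group>\d+), (?P<bitmap>\d+) clusters in bitmap, (?P<gd>\d+) in gd"
)


def parse(log: str):
    """Return (device, group, clusters_in_bitmap, clusters_in_gd) per matching line."""
    return [
        (m["dev"], int(m["group"]), int(m["bitmap"]), int(m["gd"]))
        for m in PATTERN.finditer(log)
    ]


rows = parse(LOG)
for dev, group, in_bitmap, in_gd in rows:
    print(f"{dev} group {group}: bitmap={in_bitmap} gd={in_gd} diff={in_gd - in_bitmap}")
```

On the sample lines the group numbers are consecutive and the difference between the two counts is the same for every group.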
end of thread, other threads: [~2012-03-19 23:16 UTC]
Thread overview: 8+ messages
2012-03-19 23:06 [dm-devel] can't recover ext4 on lvm from ext4_mb_generate_buddy:739: group 1687, 32254 clusters in bitmap, 32258 in gd Tony Hoyle
-- strict thread matches above, loose matches on Subject: below --
2012-01-05 10:37 Sander Eikelenboom
2012-01-05 13:21 ` Sander Eikelenboom
2012-01-05 14:45 ` Theodore Tso
[not found] ` <4910694144.20120105171428@eikelenboom.it>
2012-01-05 18:15 ` Ted Ts'o
2012-01-06 16:40 ` [dm-devel] " Mikulas Patocka
2012-01-28 4:53 ` WIMPy
2012-01-28 8:14 ` WIMPy
2012-01-28 8:34 ` Andreas Dilger
2012-01-28 15:31 ` WIMPy
2012-01-28 21:04 ` WIMPy
2012-02-03 5:30 ` WIMPy