* ext4fs error "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd" (with repro)
@ 2012-08-09 10:00 Paolo Bonzini
  2012-08-09 17:06 ` Theodore Ts'o
From: Paolo Bonzini @ 2012-08-09 10:00 UTC
To: tytso, Linux Kernel Mailing List, linux-ext4

Here is how to reproduce it. It happens during fstrim. I found other
occurrences of the error in the mailing list, but they were not related
to trim so they may be something different.

    modprobe scsi_debug dev_size_mb=256 lbpws=1
    dd if=/dev/zero of=/dev/sdb bs=1M
    fdisk /dev/sdb
    >> create a new partition accepting all defaults
    fdisk -lu /dev/sdb|tail -1
    >> should show: /dev/sdb1 57 524285 262114+ 83 Linux

    mkfs.ext4 /dev/sdb1
    mkdir test
    mount /dev/sdb1 test
    fstrim ./test

Here is the output in dmesg:

[140934.644166] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
[140941.562060] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:741: group 16, 8160 clusters in bitmap, 4064 in gd
[140941.603066] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:741: group 25, 8192 clusters in bitmap, 7934 in gd
[140941.613060] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:741: group 27, 8192 clusters in bitmap, 7934 in gd
[140941.634074] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:741: group 31, 8192 clusters in bitmap, 8159 in gd

Hope this helps,

Paolo
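The error text in the log above compares two free-cluster counts for a block group: one recounted from the block bitmap and one stored in the group descriptor ("gd"). The following is a minimal user-space sketch of that kind of consistency check, not the kernel's ext4_mb_generate_buddy(). The function names count_free_clusters and group_counts_match are hypothetical, and the sketch assumes that a clear bit marks a free cluster.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: count the clear bits (assumed to mean "free cluster")
 * in the first nbits bits of a bitmap, least significant bit first.
 */
static unsigned int count_free_clusters(const uint8_t *bitmap, size_t nbits)
{
    unsigned int free_count = 0;

    assert(bitmap != NULL);
    for (size_t i = 0; i < nbits; i++) {
        if (!(bitmap[i / 8] & (1u << (i % 8))))
            free_count++;
    }
    return free_count;
}

/*
 * Hypothetical check: returns 1 when the count recomputed from the bitmap
 * agrees with the count recorded in the group descriptor, 0 otherwise.
 */
static int group_counts_match(const uint8_t *bitmap, size_t nbits,
                              unsigned int gd_free)
{
    return count_free_clusters(bitmap, nbits) == gd_free;
}
```

Under this model, a bitmap block that has been zeroed counts every cluster as free, which would be consistent with the "8192 clusters in bitmap" lines in the log if each group holds 8192 clusters.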
* Re: ext4fs error "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd" (with repro)
  2012-08-09 10:00 ext4fs error "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd" (with repro) Paolo Bonzini
@ 2012-08-09 17:06 ` Theodore Ts'o
  2012-08-15  9:17   ` Lukáš Czerner
From: Theodore Ts'o @ 2012-08-09 17:06 UTC
To: Lukas Czerner
Cc: Paolo Bonzini, Linux Kernel Mailing List, linux-ext4

On Thu, Aug 09, 2012 at 12:00:09PM +0200, Paolo Bonzini wrote:
> Here is how to reproduce it. It happens during fstrim. I found other
> occurrences of the error in the mailing list, but they were not related
> to trim so they may be something different.
>
>     modprobe scsi_debug dev_size_mb=256 lbpws=1
>     dd if=/dev/zero of=/dev/sdb bs=1M
>     fdisk /dev/sdb
>     >> create a new partition accepting all defaults
>     fdisk -lu /dev/sdb|tail -1
>     >> should show: /dev/sdb1 57 524285 262114+ 83 Linux
>
>     mkfs.ext4 /dev/sdb1
>     mkdir test
>     mount /dev/sdb1 test
>     fstrim ./test

I can confirm that this accurately reproduces file system corruption
using a 3.5 kernel. It looks like some block allocation bitmap blocks
are getting trimmed when they shouldn't have been. Lukas, can you take a
look at this?

- Ted
* Re: ext4fs error "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd" (with repro)
  2012-08-09 17:06 ` Theodore Ts'o
@ 2012-08-15  9:17   ` Lukáš Czerner
  2012-08-16 14:28     ` Lukáš Czerner
From: Lukáš Czerner @ 2012-08-15 9:17 UTC
To: Theodore Ts'o
Cc: Lukas Czerner, Paolo Bonzini, Linux Kernel Mailing List, linux-ext4

On Thu, 9 Aug 2012, Theodore Ts'o wrote:

> Date: Thu, 9 Aug 2012 13:06:40 -0400
> From: Theodore Ts'o <tytso@mit.edu>
> To: Lukas Czerner <lczerner@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>,
>     Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
>     linux-ext4@vger.kernel.org
> Subject: Re: ext4fs error
>     "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd"
>     (with repro)
>
> On Thu, Aug 09, 2012 at 12:00:09PM +0200, Paolo Bonzini wrote:
> > Here is how to reproduce it. It happens during fstrim. I found other
> > occurrences of the error in the mailing list, but they were not related
> > to trim so they may be something different.
> >
> >     modprobe scsi_debug dev_size_mb=256 lbpws=1
> >     dd if=/dev/zero of=/dev/sdb bs=1M
> >     fdisk /dev/sdb
> >     >> create a new partition accepting all defaults
> >     fdisk -lu /dev/sdb|tail -1
> >     >> should show: /dev/sdb1 57 524285 262114+ 83 Linux
> >
> >     mkfs.ext4 /dev/sdb1
> >     mkdir test
> >     mount /dev/sdb1 test
> >     fstrim ./test
>
> I can confirm that this accurately reproduces file system corruption
> using a 3.5 kernel. It looks like some block allocation bitmap blocks
> are getting trimmed when they shouldn't have been. Lukas, can you take a
> look at this?
>
> - Ted

Hi Ted,

sorry for the delay, I've just got back from my vacation. I'll take
a look at it.

Thanks!
-Lukas
* Re: ext4fs error "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd" (with repro)
  2012-08-15  9:17   ` Lukáš Czerner
@ 2012-08-16 14:28     ` Lukáš Czerner
  2012-08-16 20:00       ` Theodore Ts'o
From: Lukáš Czerner @ 2012-08-16 14:28 UTC
To: Lukáš Czerner
Cc: Theodore Ts'o, Paolo Bonzini, Linux Kernel Mailing List, linux-ext4

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3295 bytes --]

On Wed, 15 Aug 2012, Lukáš Czerner wrote:

> Date: Wed, 15 Aug 2012 11:17:57 +0200 (CEST)
> From: Lukáš Czerner <lczerner@redhat.com>
> To: Theodore Ts'o <tytso@mit.edu>
> Cc: Lukas Czerner <lczerner@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>,
>     Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
>     linux-ext4@vger.kernel.org
> Subject: Re: ext4fs error
>     "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd"
>     (with repro)
>
> On Thu, 9 Aug 2012, Theodore Ts'o wrote:
>
> > Date: Thu, 9 Aug 2012 13:06:40 -0400
> > From: Theodore Ts'o <tytso@mit.edu>
> > To: Lukas Czerner <lczerner@redhat.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>,
> >     Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
> >     linux-ext4@vger.kernel.org
> > Subject: Re: ext4fs error
> >     "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd"
> >     (with repro)
> >
> > On Thu, Aug 09, 2012 at 12:00:09PM +0200, Paolo Bonzini wrote:
> > > Here is how to reproduce it. It happens during fstrim. I found other
> > > occurrences of the error in the mailing list, but they were not related
> > > to trim so they may be something different.
> > >
> > >     modprobe scsi_debug dev_size_mb=256 lbpws=1
> > >     dd if=/dev/zero of=/dev/sdb bs=1M
> > >     fdisk /dev/sdb
> > >     >> create a new partition accepting all defaults
> > >     fdisk -lu /dev/sdb|tail -1
> > >     >> should show: /dev/sdb1 57 524285 262114+ 83 Linux
> > >
> > >     mkfs.ext4 /dev/sdb1
> > >     mkdir test
> > >     mount /dev/sdb1 test
> > >     fstrim ./test
> >
> > I can confirm that this accurately reproduces file system corruption
> > using a 3.5 kernel. It looks like some block allocation bitmap blocks
> > are getting trimmed when they shouldn't have been. Lukas, can you take a
> > look at this?
> >
> > - Ted
>
> Hi Ted,
>
> sorry for the delay, I've just got back from my vacation. I'll take
> a look at it.
>
> Thanks!
> -Lukas

This does not seem like an ext4 problem. The ext4 code does not appear
to be able to discard blocks that are allocated. Moreover, I was not
able to reproduce the problem on a loop device with the same settings
as the reported scsi_debug device (a file system with a 1024-byte block
size on a 256MB image residing on a file system with a 1024-byte block
size).

After a little bit of tracing with systemtap and blktrace, ext4 does
not seem to be doing anything wrong, and yet we get part of the block
bitmap trimmed. This led me to the scsi_debug driver itself, and indeed
it seems that we have an off-by-one bug there in unmap_region() which
is causing the problem.

Here is the patch which fixes the problem for me. I'll resend the
proper patch in a bit.

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 182d5a5..f4cc413 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -2054,7 +2054,7 @@ static void unmap_region(sector_t lba, unsigned int len)
 		block = lba + alignment;
 		rem = do_div(block, granularity);
 
-		if (rem == 0 && lba + granularity <= end && block < map_size) {
+		if (rem == 0 && lba + granularity < end && block < map_size) {
 			clear_bit(block, map_storep);
 			if (scsi_debug_lbprz)
 				memset(fake_storep +

Thanks!
-Lukas
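The one-character change in the patch above alters which discard granules pass the boundary test. The sketch below is a simplified user-space model of that loop, not the driver code: it assumes an alignment offset of zero, uses a plain modulo in place of do_div(), and only counts the granules that would be cleared instead of touching any map. The name count_cleared_granules and the inclusive flag are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the granule walk over the request [lba, lba + len).
 * inclusive == true models the "<=" comparison (before the patch),
 * inclusive == false models the "<" comparison (after the patch).
 * Returns the number of granules that pass the boundary test.
 */
static unsigned int count_cleared_granules(unsigned long long lba,
                                           unsigned int len,
                                           unsigned int granularity,
                                           bool inclusive)
{
    unsigned long long end = lba + len;
    unsigned int cleared = 0;

    assert(granularity > 0);
    while (lba < end) {
        /* Offset of lba inside its granule (alignment assumed to be 0). */
        unsigned long long rem = lba % granularity;
        bool fits = inclusive ? (lba + granularity <= end)
                              : (lba + granularity < end);

        if (rem == 0 && fits)
            cleared++;
        /* Advance to the start of the next granule. */
        lba += granularity - rem;
    }
    return cleared;
}
```

In this model, for a request covering sectors 0 to 15 with a granularity of 8, the "<=" comparison clears two granules while the "<" comparison clears only the first, leaving the granule that ends exactly at the end of the request untouched.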
* Re: ext4fs error "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd" (with repro)
  2012-08-16 14:28     ` Lukáš Czerner
@ 2012-08-16 20:00       ` Theodore Ts'o
From: Theodore Ts'o @ 2012-08-16 20:00 UTC
To: Lukáš Czerner
Cc: Paolo Bonzini, Linux Kernel Mailing List, linux-ext4

On Thu, Aug 16, 2012 at 04:28:07PM +0200, Lukáš Czerner wrote:
>
> After a little bit of tracing with systemtap and blktrace, ext4 does
> not seem to be doing anything wrong, and yet we get part of the block
> bitmap trimmed. This led me to the scsi_debug driver itself, and indeed
> it seems that we have an off-by-one bug there in unmap_region() which
> is causing the problem.

Thanks for finding this --- I was getting scared that ext4 users were
losing data in production. It's good to know it was just a bug in the
scsi_debug driver.

- Ted
Thread overview: 5+ messages
2012-08-09 10:00 ext4fs error "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd" (with repro) Paolo Bonzini
2012-08-09 17:06 ` Theodore Ts'o
2012-08-15  9:17   ` Lukáš Czerner
2012-08-16 14:28     ` Lukáš Czerner
2012-08-16 20:00       ` Theodore Ts'o