* Help tracking down problem --- endless loop in __find_get_block_slow @ 2005-02-21 10:46 Thomas S. Iversen 2005-02-22 9:18 ` Andrew Morton 0 siblings, 1 reply; 14+ messages in thread From: Thomas S. Iversen @ 2005-02-21 10:46 UTC (permalink / raw) To: device-mapper development, linux-kernel Hi There I am trying to develop a device mapper plugin which does transparent block encryption and sector shuffling in the style freebsd does it (GBDE) Reads are support and working, but have trouble getting writes to work properly. If I do a simple: echo "test" > /mnt/test (where /mnt is /dev/mapper/gbde) sync it works just fine. If I do dd if=/dev/zero of=/mnt/testfile count=N, N=1..6 it works fine But if I do dd if=/dev/zero of=/mnt/testfile count=N, N>6 I get into an endless loop in __find_get_block_slow. My write path does something like this: recieve original BIO (eg. size=4096). Split BIO into sectorsize chunks. map chunks to physical sectors. encrypt sectors generate keys write sectors update keysectors ... when all sectors are written call bio_endio on the original request. To awoid allocating alot of pages during writes, I use the mem from the original request to do encryption "inplace". Could that be the cause of my problems? I would clearly like to minimize the need for page allocation in my dm-module, so I hope it isn't. Whats going on here? Every comments appriciated! Regards Thomas ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Help tracking down problem --- endless loop in __find_get_block_slow 2005-02-21 10:46 Help tracking down problem --- endless loop in __find_get_block_slow Thomas S. Iversen @ 2005-02-22 9:18 ` Andrew Morton 2005-02-22 22:46 ` Jeff Mahoney 2005-02-23 12:00 ` Help tracking down problem --- endless loop in __find_get_block_slow Thomas S. Iversen 0 siblings, 2 replies; 14+ messages in thread From: Andrew Morton @ 2005-02-22 9:18 UTC (permalink / raw) To: Thomas S. Iversen; +Cc: dm-devel, linux-kernel "Thomas S. Iversen" <zensonic@zensonic.dk> wrote: > > But if I do > > dd if=/dev/zero of=/mnt/testfile count=N, N>6 > > I get into an endless loop in __find_get_block_slow. The only way in which __find_get_block_slow() can loop is if something wrecked the buffer_head ring at page->private: something caused an internal loop via bh->b_this_page. Are you sure that's where things are hanging? That it's not stuck on a spinlock? A sysrq-P trace might help. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Help tracking down problem --- endless loop in __find_get_block_slow 2005-02-22 9:18 ` Andrew Morton @ 2005-02-22 22:46 ` Jeff Mahoney 2005-02-22 23:28 ` Andrew Morton 2005-02-23 12:00 ` Help tracking down problem --- endless loop in __find_get_block_slow Thomas S. Iversen 1 sibling, 1 reply; 14+ messages in thread From: Jeff Mahoney @ 2005-02-22 22:46 UTC (permalink / raw) To: Andrew Morton; +Cc: Thomas S. Iversen, dm-devel, linux-kernel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Andrew Morton wrote: > "Thomas S. Iversen" <zensonic@zensonic.dk> wrote: > >>But if I do >> >> dd if=/dev/zero of=/mnt/testfile count=N, N>6 >> >> I get into an endless loop in __find_get_block_slow. > > > The only way in which __find_get_block_slow() can loop is if something > wrecked the buffer_head ring at page->private: something caused an internal > loop via bh->b_this_page. > > Are you sure that's where things are hanging? That it's not stuck on a > spinlock? > > A sysrq-P trace might help. I've observed similar effects without DM involved at all. I've been able to reproduce using subfs (it brings out umount races nicely) on kernels from 2.6.5 to 2.6.11-rc4, it's platform and device independent. In my experience, the loop is actually outside of __find_get_block_slow(), in __getblk_slow(). I've been using xmon to interrupt the kernel, and the results vary but are all rooted in the for(;;) loop in __getblk_slow. It appears as though grow_buffers is finding/creating the page, but then __find_get_block can't locate the buffer it needs. - -Jeff - -- Jeff Mahoney SuSE Labs -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (GNU/Linux) iD8DBQFCG7ZeLPWxlyuTD7IRAixHAJsHORHEMfFtTIozqwUOkk9WGFxCggCgiSfn V3kCyFn/X87Mw4laVsJLUp4= =YNSw -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Help tracking down problem --- endless loop in __find_get_block_slow 2005-02-22 22:46 ` Jeff Mahoney @ 2005-02-22 23:28 ` Andrew Morton 2005-02-26 0:03 ` Jeff Mahoney 0 siblings, 1 reply; 14+ messages in thread From: Andrew Morton @ 2005-02-22 23:28 UTC (permalink / raw) To: Jeff Mahoney; +Cc: zensonic, dm-devel, linux-kernel Jeff Mahoney <jeffm@suse.com> wrote: > > In my experience, the loop is actually outside of > __find_get_block_slow(), in __getblk_slow(). I've been using xmon to > interrupt the kernel, and the results vary but are all rooted in the > for(;;) loop in __getblk_slow. It appears as though grow_buffers is > finding/creating the page, but then __find_get_block can't locate the > buffer it needs. Yes, that'll happen. Because there are still buffers attached to the page which have the wrong blocksize. Say, if someone is trying to read a 2k buffer_head which is backed by a page which already has 1k buffer_heads attached to it. Does your kernel not have that big printk in __find_get_block_slow()? If it does, maybe some of the buffers are unmapped. Try: --- 25/fs/buffer.c~a Tue Feb 22 15:27:35 2005 +++ 25-akpm/fs/buffer.c Tue Feb 22 15:27:41 2005 @@ -456,7 +456,7 @@ __find_get_block_slow(struct block_devic * file io on the block device and getblk. It gets dealt with * elsewhere, don't buffer_error if we had some unmapped buffers */ - if (all_mapped) { + { printk("__find_get_block_slow() failed. " "block=%llu, b_blocknr=%llu\n", (unsigned long long)block, (unsigned long long)bh->b_blocknr); _ ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Help tracking down problem --- endless loop in __find_get_block_slow 2005-02-22 23:28 ` Andrew Morton @ 2005-02-26 0:03 ` Jeff Mahoney 2005-02-28 21:06 ` Jeff Mahoney 2005-02-28 21:09 ` Help tracking down problem --- endless loop in __find_get_block_slow (now with the patch) Jeff Mahoney 0 siblings, 2 replies; 14+ messages in thread From: Jeff Mahoney @ 2005-02-26 0:03 UTC (permalink / raw) To: Andrew Morton; +Cc: zensonic, dm-devel, linux-kernel [-- Attachment #1: Type: text/plain, Size: 2844 bytes --] -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Andrew Morton wrote: > Jeff Mahoney <jeffm@suse.com> wrote: > >>In my experience, the loop is actually outside of >>__find_get_block_slow(), in __getblk_slow(). I've been using xmon to >>interrupt the kernel, and the results vary but are all rooted in the >>for(;;) loop in __getblk_slow. It appears as though grow_buffers is >>finding/creating the page, but then __find_get_block can't locate the >>buffer it needs. > > > Yes, that'll happen. Because there are still buffers attached to the page > which have the wrong blocksize. Say, if someone is trying to read a 2k > buffer_head which is backed by a page which already has 1k buffer_heads > attached to it. > > Does your kernel not have that big printk in __find_get_block_slow()? If > it does, maybe some of the buffers are unmapped. Try: I think it's likely I'm experiencing a different bug than the original poster. I've tried making the printk unconditional, and I get no output. However, I've continued to track it down, and I believe I've found a umount race. I can also reproduce it without subfs, with the attached script. I added some debug output to aid in my search: __find_get_block_slow: find_get_page [block=17508,blksize=2048,index=8754,sizebits=1,size=512] returned null returning page [index=2188,block=17504,size=512,sizebits=3] Couldn't find buffer @ block 17508 What I'm observing is that __find_get_block_slow is calculating the index using the blocksize for the device, and the grow_buffers call is using the blocksize handed down from the filesystem via sb_bread(). They *should* be the same, but here's where my suspected race comes in. Since the buffers are being searched for in the wrong place, they're never found, causing the infinite loop. The open_bdev_excl() call in get_sb_bdev() should be keeping callers out until the block device is actually closed, but it uses the fs_type struct as the holder which, given that the filesystem to be mounted is the same one as the one being umounted, will be the same. This allows the mount attempt to continue. If the superblock for the umounting filesystem is already in the process of getting shut down, sget() will create a new superblock and the mount attempt will use that one. The umount will continue, destroying the old superblock and setting the blocksize back to its original value, dropping all buffers in the process. If kill_block_super resets the blocksize while an sb_bread is in progress, the sizes won't match up and we'll get stuck in the loop. I'll be working on a fix, but figured I'd send out a quick update. - -Jeff - -- Jeff Mahoney SuSE Labs -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (GNU/Linux) iD8DBQFCH7zXLPWxlyuTD7IRAr/WAJ9B6MLsKl6cv48Qlcklx1saYERv7ACdHWGW UBXAsQBiEAge3T1R4akLKd0= =w1zP -----END PGP SIGNATURE----- [-- Attachment #2: test.sh --] [-- Type: application/x-sh, Size: 290 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Help tracking down problem --- endless loop in __find_get_block_slow 2005-02-26 0:03 ` Jeff Mahoney @ 2005-02-28 21:06 ` Jeff Mahoney 2005-02-28 21:09 ` Help tracking down problem --- endless loop in __find_get_block_slow (now with the patch) Jeff Mahoney 1 sibling, 0 replies; 14+ messages in thread From: Jeff Mahoney @ 2005-02-28 21:06 UTC (permalink / raw) To: Jeff Mahoney; +Cc: Andrew Morton, zensonic, dm-devel, linux-kernel, Al Viro -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Jeff Mahoney wrote: > Andrew Morton wrote: > >>>Jeff Mahoney <jeffm@suse.com> wrote: >>> >>> >>>>In my experience, the loop is actually outside of >>>>__find_get_block_slow(), in __getblk_slow(). I've been using xmon to >>>>interrupt the kernel, and the results vary but are all rooted in the >>>>for(;;) loop in __getblk_slow. It appears as though grow_buffers is >>>>finding/creating the page, but then __find_get_block can't locate the >>>>buffer it needs. >>> >>> >>>Yes, that'll happen. Because there are still buffers attached to the page >>>which have the wrong blocksize. Say, if someone is trying to read a 2k >>>buffer_head which is backed by a page which already has 1k buffer_heads >>>attached to it. >>> >>>Does your kernel not have that big printk in __find_get_block_slow()? If >>>it does, maybe some of the buffers are unmapped. Try: > > > I think it's likely I'm experiencing a different bug than the original > poster. I've tried making the printk unconditional, and I get no output. > However, I've continued to track it down, and I believe I've found a > umount race. I can also reproduce it without subfs, with the attached > script. > > I added some debug output to aid in my search: > __find_get_block_slow: find_get_page > [block=17508,blksize=2048,index=8754,sizebits=1,size=512] returned null > returning page [index=2188,block=17504,size=512,sizebits=3] > Couldn't find buffer @ block 17508 > > What I'm observing is that __find_get_block_slow is calculating the > index using the blocksize for the device, and the grow_buffers call is > using the blocksize handed down from the filesystem via sb_bread(). They > *should* be the same, but here's where my suspected race comes in. Since > the buffers are being searched for in the wrong place, they're never > found, causing the infinite loop. > > The open_bdev_excl() call in get_sb_bdev() should be keeping callers out > until the block device is actually closed, but it uses the fs_type > struct as the holder which, given that the filesystem to be mounted is > the same one as the one being umounted, will be the same. This allows > the mount attempt to continue. If the superblock for the umounting > filesystem is already in the process of getting shut down, sget() will > create a new superblock and the mount attempt will use that one. The > umount will continue, destroying the old superblock and setting the > blocksize back to its original value, dropping all buffers in the process. > > If kill_block_super resets the blocksize while an sb_bread is in > progress, the sizes won't match up and we'll get stuck in the loop. > > I'll be working on a fix, but figured I'd send out a quick update. To prove the race, I made bd_claim return -EBUSY even if the passed holder == current holder. This eliminated the race, but made a potentially unacceptable change to how block devices are handled. [1] Attached is a patch that isn't pretty but does still allow the multiple-open semantics of open_bdev_excl() while eliminating the race. Basically, it creates a call path that allows the device's block size to be changed only if the caller holds the sole reference to the block device. This has passed my admittedly small test cases without returning invalid results. Only the close case is protected by bdev_lock. This may seem racy, but I don't believe it to be. The only way a caller could have the opportunity to reset the block size while the block device is opened is if the holder value is the same, which means that it must be the same filesystem type. The same filesystem type opening the same device will result in finding the same on-disk superblock that the umounting filesystem was using, so it would be the same block size. set_blocksize will exit early if the block size is the same, thus avoiding the code that could cause issues. I'd appreciate any feedback. - -Jeff [1] Currently, mtd is the only non-filesystem caller of open_bdev_excl. While it's possible for mtd to call open_bdev_excl twice on the same device, it would require passing the same device more than once as a module parameter. - -- Jeff Mahoney SuSE Labs -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (GNU/Linux) iD8DBQFCI4fHLPWxlyuTD7IRAu5LAKCPaSFb+BCpHRw44YU5j2IE52X5AgCfVHby 4mz96aq70X9VOIccEWvnCko= =59ut -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Help tracking down problem --- endless loop in __find_get_block_slow (now with the patch) 2005-02-26 0:03 ` Jeff Mahoney 2005-02-28 21:06 ` Jeff Mahoney @ 2005-02-28 21:09 ` Jeff Mahoney 1 sibling, 0 replies; 14+ messages in thread From: Jeff Mahoney @ 2005-02-28 21:09 UTC (permalink / raw) To: Jeff Mahoney; +Cc: Andrew Morton, zensonic, dm-devel, linux-kernel, Al Viro [-- Attachment #1: Type: text/plain, Size: 4407 bytes --] -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Jeff Mahoney wrote: > Andrew Morton wrote: > >>>Jeff Mahoney <jeffm@suse.com> wrote: >>> >>> >>>>In my experience, the loop is actually outside of >>>>__find_get_block_slow(), in __getblk_slow(). I've been using xmon to >>>>interrupt the kernel, and the results vary but are all rooted in the >>>>for(;;) loop in __getblk_slow. It appears as though grow_buffers is >>>>finding/creating the page, but then __find_get_block can't locate the >>>>buffer it needs. >>> >>> >>>Yes, that'll happen. Because there are still buffers attached to the page >>>which have the wrong blocksize. Say, if someone is trying to read a 2k >>>buffer_head which is backed by a page which already has 1k buffer_heads >>>attached to it. >>> >>>Does your kernel not have that big printk in __find_get_block_slow()? If >>>it does, maybe some of the buffers are unmapped. Try: > > > I think it's likely I'm experiencing a different bug than the original > poster. I've tried making the printk unconditional, and I get no output. > However, I've continued to track it down, and I believe I've found a > umount race. I can also reproduce it without subfs, with the attached > script. > > I added some debug output to aid in my search: > __find_get_block_slow: find_get_page > [block=17508,blksize=2048,index=8754,sizebits=1,size=512] returned null > returning page [index=2188,block=17504,size=512,sizebits=3] > Couldn't find buffer @ block 17508 > > What I'm observing is that __find_get_block_slow is calculating the > index using the blocksize for the device, and the grow_buffers call is > using the blocksize handed down from the filesystem via sb_bread(). They > *should* be the same, but here's where my suspected race comes in. Since > the buffers are being searched for in the wrong place, they're never > found, causing the infinite loop. > > The open_bdev_excl() call in get_sb_bdev() should be keeping callers out > until the block device is actually closed, but it uses the fs_type > struct as the holder which, given that the filesystem to be mounted is > the same one as the one being umounted, will be the same. This allows > the mount attempt to continue. If the superblock for the umounting > filesystem is already in the process of getting shut down, sget() will > create a new superblock and the mount attempt will use that one. The > umount will continue, destroying the old superblock and setting the > blocksize back to its original value, dropping all buffers in the process. > > If kill_block_super resets the blocksize while an sb_bread is in > progress, the sizes won't match up and we'll get stuck in the loop. > > I'll be working on a fix, but figured I'd send out a quick update. Resent: Forgot to attach the patch. :-\ To prove the race, I made bd_claim return -EBUSY even if the passed holder == current holder. This eliminated the race, but made a potentially unacceptable change to how block devices are handled. [1] Attached is a patch that isn't pretty but does still allow the multiple-open semantics of open_bdev_excl() while eliminating the race. Basically, it creates a call path that allows the device's block size to be changed only if the caller holds the sole reference to the block device. This has passed my admittedly small test cases without returning invalid results. Only the close case is protected by bdev_lock. This may seem racy, but I don't believe it to be. The only way a caller could have the opportunity to reset the block size while the block device is opened is if the holder value is the same, which means that it must be the same filesystem type. The same filesystem type opening the same device will result in finding the same on-disk superblock that the umounting filesystem was using, so it would be the same block size. set_blocksize will exit early if the block size is the same, thus avoiding the code that could cause issues. I'd appreciate any feedback. - -Jeff [1] Currently, mtd is the only non-filesystem caller of open_bdev_excl. While it's possible for mtd to call open_bdev_excl twice on the same device, it would require passing the same device more than once as a module parameter. - -- Jeff Mahoney SuSE Labs -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.5 (GNU/Linux) iD8DBQFCI4iNLPWxlyuTD7IRAhdtAJ9s/GELyA4GE4SQaDiztuAtjzffgQCcCWyO MZ3swZSz7YJI+ElgH9YS5kE= =Sd5k -----END PGP SIGNATURE----- [-- Attachment #2: set_blocksize_race.diff --] [-- Type: text/x-patch, Size: 4591 bytes --] diff -ruNpX dontdiff linux-2.6.11-rc4.orig/fs/block_dev.c linux-2.6.11-rc4/fs/block_dev.c --- linux-2.6.11-rc4.orig/fs/block_dev.c 2005-02-28 14:06:59.000000000 -0500 +++ linux-2.6.11-rc4/fs/block_dev.c 2005-02-28 14:49:52.000000000 -0500 @@ -62,7 +62,7 @@ static void kill_bdev(struct block_devic truncate_inode_pages(bdev->bd_inode->i_mapping, 0); } -int set_blocksize(struct block_device *bdev, int size) +int __set_blocksize(struct block_device *bdev, int size, int sync) { /* Size must be a power of two, and between 512 and PAGE_SIZE */ if (size > PAGE_SIZE || size < 512 || (size & (size-1))) @@ -74,7 +74,8 @@ int set_blocksize(struct block_device *b /* Don't change the size if it is same as current */ if (bdev->bd_block_size != size) { - sync_blockdev(bdev); + if (sync) + sync_blockdev(bdev); bdev->bd_block_size = size; bdev->bd_inode->i_blkbits = blksize_bits(size); kill_bdev(bdev); @@ -82,7 +83,7 @@ int set_blocksize(struct block_device *b return 0; } -EXPORT_SYMBOL(set_blocksize); +EXPORT_SYMBOL(__set_blocksize); int sb_set_blocksize(struct super_block *sb, int size) { @@ -480,17 +481,19 @@ int bd_claim(struct block_device *bdev, EXPORT_SYMBOL(bd_claim); -void bd_release(struct block_device *bdev) +void __bd_release(struct block_device *bdev, int size) { spin_lock(&bdev_lock); if (!--bdev->bd_contains->bd_holders) bdev->bd_contains->bd_holder = NULL; - if (!--bdev->bd_holders) + if (!--bdev->bd_holders) { bdev->bd_holder = NULL; + set_blocksize_nosync (bdev, size); + } spin_unlock(&bdev_lock); } -EXPORT_SYMBOL(bd_release); +EXPORT_SYMBOL(__bd_release); /* * Tries to open block device by device number. Use it ONLY if you @@ -914,10 +917,10 @@ EXPORT_SYMBOL(open_bdev_excl); * * This is the counterpart to open_bdev_excl(). */ -void close_bdev_excl(struct block_device *bdev) +void __close_bdev_excl(struct block_device *bdev, int size) { - bd_release(bdev); + __bd_release(bdev, size); blkdev_put(bdev); } -EXPORT_SYMBOL(close_bdev_excl); +EXPORT_SYMBOL(__close_bdev_excl); diff -ruNpX dontdiff linux-2.6.11-rc4.orig/fs/super.c linux-2.6.11-rc4/fs/super.c --- linux-2.6.11-rc4.orig/fs/super.c 2005-02-28 14:07:01.000000000 -0500 +++ linux-2.6.11-rc4/fs/super.c 2005-02-28 14:42:49.000000000 -0500 @@ -732,8 +732,7 @@ void kill_block_super(struct super_block bdev_uevent(bdev, KOBJ_UMOUNT); generic_shutdown_super(sb); - set_blocksize(bdev, sb->s_old_blocksize); - close_bdev_excl(bdev); + __close_bdev_excl(bdev, sb->s_old_blocksize); } EXPORT_SYMBOL(kill_block_super); diff -ruNpX dontdiff linux-2.6.11-rc4.orig/include/linux/fs.h linux-2.6.11-rc4/include/linux/fs.h --- linux-2.6.11-rc4.orig/include/linux/fs.h 2005-02-28 14:07:42.000000000 -0500 +++ linux-2.6.11-rc4/include/linux/fs.h 2005-02-28 14:50:53.000000000 -0500 @@ -1294,7 +1294,10 @@ extern long compat_blkdev_ioctl(struct f extern int blkdev_get(struct block_device *, mode_t, unsigned); extern int blkdev_put(struct block_device *); extern int bd_claim(struct block_device *, void *); -extern void bd_release(struct block_device *); +extern void __bd_release(struct block_device *, int); +static inline void bd_release(struct block_device *bdev) { + __bd_release (bdev, bdev->bd_block_size); +} /* fs/char_dev.c */ extern int alloc_chrdev_region(dev_t *, unsigned, unsigned, const char *); @@ -1311,7 +1314,10 @@ extern const char *__bdevname(dev_t, cha extern const char *bdevname(struct block_device *bdev, char *buffer); extern struct block_device *lookup_bdev(const char *); extern struct block_device *open_bdev_excl(const char *, int, void *); -extern void close_bdev_excl(struct block_device *); +extern void __close_bdev_excl(struct block_device *, int); +static inline void close_bdev_excl(struct block_device *bdev) { + __close_bdev_excl(bdev, bdev->bd_block_size); +} extern void init_special_inode(struct inode *, umode_t, dev_t); @@ -1447,7 +1453,14 @@ extern void file_kill(struct file *f); struct bio; extern void submit_bio(int, struct bio *); extern int bdev_read_only(struct block_device *); -extern int set_blocksize(struct block_device *, int); +extern int __set_blocksize(struct block_device *, int, int); +static inline int set_blocksize(struct block_device *bdev, int size) { + return __set_blocksize (bdev, size, 1); +} +static inline int set_blocksize_nosync(struct block_device *bdev, int size) { + return __set_blocksize (bdev, size, 0); +} + extern int sb_set_blocksize(struct super_block *, int); extern int sb_min_blocksize(struct super_block *, int); ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Help tracking down problem --- endless loop in __find_get_block_slow 2005-02-22 9:18 ` Andrew Morton 2005-02-22 22:46 ` Jeff Mahoney @ 2005-02-23 12:00 ` Thomas S. Iversen 2005-02-23 12:10 ` Andrew Morton 1 sibling, 1 reply; 14+ messages in thread From: Thomas S. Iversen @ 2005-02-23 12:00 UTC (permalink / raw) To: Andrew Morton; +Cc: dm-devel, linux-kernel > > dd if=/dev/zero of=/mnt/testfile count=N, N>6 It turns out N>6 is variable. But large enough and it will hang. Sugests some kind of race I am afraid. > > I get into an endless loop in __find_get_block_slow. > > The only way in which __find_get_block_slow() can loop is if something > wrecked the buffer_head ring at page->private: something caused an internal > loop via bh->b_this_page. > > Are you sure that's where things are hanging? That it's not stuck on a > spinlock? I get the same message again and again all over (this trace with count=100 and a BUG() and panic inserted into buffer.c) Feb 23 13:38:34 localhost kernel: __find_get_block_slow() failed. block=101, b_blocknr=100 Feb 23 13:38:34 localhost kernel: b_state=0x00000010, b_size=2048 Feb 23 13:38:34 localhost kernel: device blocksize: 2048 Ad inf. The output of the BUG() is show below. My questions remain: 1. Whos fault is this? 2. How do I fix/avoid this problem? Regards Thomas __find_get_block_slow() failed. block=101, b_blocknr=100 b_state=0x00000010, b_size=2048 device blocksize: 2048 ------------[ cut here ]------------ DEBUG_PAGEALLOC Modules linked in: ufs sha512 aes_i586 dm_bde md5 ipv6 af_packet evdev ehci_hcd usbcore agpgart dm_mod rtc unix CPU: 0 EIP: 0060:[__find_get_block_slow+202/304] Not tainted EFLAGS: 00010286 (2.6.8) EIP is at __find_get_block_slow+0xca/0x130 eax: 00000017 ebx: dcd21208 ecx: c02c6950 edx: 000041e4 esi: c138d600 edi: 00000065 ebp: da63aed0 esp: da7e9c54 ds: 007b es: 007b ss: 0068 Process dd (pid: 1742, threadinfo=da7e8000 task=dc76da70) Stack: c029a003 00000800 00000800 00000064 00000000 00000000 00000000 da63ae6c 00000065 00000800 c014ab5f da63ae6c 00000065 00000800 00000800 00000065 da63ae6c da764df8 c014abdb da63ae6c 00000065 00000800 00000000 00000000 Call Trace: [__find_get_block+95/176] __find_get_block+0x5f/0xb0 [__getblk+43/96] __getblk+0x2b/0x60 [pg0+542048932/1070055424] ufs_new_fragments+0x1c4/0x740 [ufs] [pg0+542070875/1070055424] ufs_inode_getfrag+0x23b/0x3f0 [ufs] [pg0+542072873/1070055424] ufs_getfrag_block+0x349/0x3b0 [ufs] [create_empty_buffers+37/128] create_empty_buffers+0x25/0x80 [__block_prepare_write+467/960] __block_prepare_write+0x1d3/0x3c0 [kernel_map_pages+40/144] kernel_map_pages+0x28/0x90 [inode_update_time+167/224] inode_update_time+0xa7/0xe0 [block_prepare_write+52/80] block_prepare_write+0x34/0x50 [pg0+542072032/1070055424] ufs_getfrag_block+0x0/0x3b0 [ufs] [generic_file_aio_write_nolock+821/2624] generic_file_aio_write_nolock+0x335/0xa40 [pg0+542072032/1070055424] ufs_getfrag_block+0x0/0x3b0 [ufs] [do_no_page+99/688] do_no_page+0x63/0x2b0 [generic_file_write_nolock+116/144] generic_file_write_nolock+0x74/0x90 [clear_user+51/80] clear_user+0x33/0x50 [read_zero+452/528] read_zero+0x1c4/0x210 [generic_file_write+85/112] generic_file_write+0x55/0x70 [generic_file_write+0/112] generic_file_write+0x0/0x70 [vfs_write+184/304] vfs_write+0xb8/0x130 [copy_to_user+62/80] copy_to_user+0x3e/0x50 [sys_write+81/128] sys_write+0x51/0x80 [syscall_call+7/11] syscall_call+0x7/0xb Code: 0f 0b 15 02 c2 9f 29 c0 8b 06 f6 c4 08 75 17 8b 46 04 40 74 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Help tracking down problem --- endless loop in __find_get_block_slow 2005-02-23 12:00 ` Help tracking down problem --- endless loop in __find_get_block_slow Thomas S. Iversen @ 2005-02-23 12:10 ` Andrew Morton 2005-02-23 13:02 ` Thomas S. Iversen 0 siblings, 1 reply; 14+ messages in thread From: Andrew Morton @ 2005-02-23 12:10 UTC (permalink / raw) To: Thomas S. Iversen; +Cc: dm-devel, linux-kernel zensonic@zensonic.dk (Thomas S. Iversen) wrote: > > > > dd if=/dev/zero of=/mnt/testfile count=N, N>6 > > It turns out N>6 is variable. But large enough and it will hang. Sugests > some kind of race I am afraid. > > > > I get into an endless loop in __find_get_block_slow. > > > > The only way in which __find_get_block_slow() can loop is if something > > wrecked the buffer_head ring at page->private: something caused an internal > > loop via bh->b_this_page. > > > > Are you sure that's where things are hanging? That it's not stuck on a > > spinlock? > > I get the same message again and again all over (this trace with > count=100 and a BUG() and panic inserted into buffer.c) > > Feb 23 13:38:34 localhost kernel: __find_get_block_slow() failed. > block=101, b_blocknr=100 > Feb 23 13:38:34 localhost kernel: b_state=0x00000010, b_size=2048 > Feb 23 13:38:34 localhost kernel: device blocksize: 2048 OK, so we're looking for the buffer_head for block 101 and the first buffer_head which is attached to the page represents block 100. So the next buffer_head _should_ represent block 101. Please print it out: --- 25/fs/buffer.c~a 2005-02-23 04:06:12.000000000 -0800 +++ 25-akpm/fs/buffer.c 2005-02-23 04:06:51.000000000 -0800 @@ -459,8 +459,10 @@ __find_get_block_slow(struct block_devic */ if (all_mapped) { printk("__find_get_block_slow() failed. " - "block=%llu, b_blocknr=%llu\n", - (unsigned long long)block, (unsigned long long)bh->b_blocknr); + "block=%llu, b_blocknr=%llu, next=%llu\n", + (unsigned long long)block, + (unsigned long long)bh->b_blocknr, + (unsigned long long)bh->b_this_page->b_blocknr); printk("b_state=0x%08lx, b_size=%u\n", bh->b_state, bh->b_size); printk("device blocksize: %d\n", 1 << bd_inode->i_blkbits); } _ > Ad inf. The output of the BUG() is show below. My questions remain: > > 1. Whos fault is this? Could be UFS. But what does "transparent block encryption and sector shuffling" mean? How is the sector shuffling implemented? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Help tracking down problem --- endless loop in __find_get_block_slow 2005-02-23 12:10 ` Andrew Morton @ 2005-02-23 13:02 ` Thomas S. Iversen 2005-02-23 20:09 ` Andrew Morton 0 siblings, 1 reply; 14+ messages in thread From: Thomas S. Iversen @ 2005-02-23 13:02 UTC (permalink / raw) To: Andrew Morton; +Cc: dm-devel, linux-kernel > OK, so we're looking for the buffer_head for block 101 and the first > buffer_head which is attached to the page represents block 100. So the > next buffer_head _should_ represent block 101. Please print it out: Not quite the same, but simelar: Feb 23 14:50:24 localhost kernel: __find_get_block_slow() failed. block=102, b_blocknr=128, next=129 Feb 23 14:50:24 localhost kernel: b_state=0x00000013, b_size=2048 Feb 23 14:50:24 localhost kernel: device blocksize: 2048 Feb 23 14:50:24 localhost kernel: ------------[ cut here ]------------ > Could be UFS. But what does "transparent block encryption and sector > shuffling" mean? How is the sector shuffling implemented? GDBE is a block level encrypter. It encrypts the actual sectors transparently via the GEOM API (corresponds to the devicemapper api in linux). GBDE assigns a key for each block, thereby introducing keysectors. Furthermore the sectors are remapped so that one can not guess where e.g. metadata is located on the physical disk. It is a rather simple remap: Taken from GDBE: /* * Convert an unencrypted side offset to offsets on the encrypted side. * * Security objective: Make it harder to identify what sectors contain what * on a "cold" disk image. * * We do this by adding the "keyoffset" from the lock to the physical sector * number modulus the available number of sectors. Since all physical sectors * presumably look the same cold, this will do. * * As part of the mapping we have to skip the lock sectors which we know * the physical address off. We also truncate the work packet, respecting * zone boundaries and lock sectors, so that we end up with a sequence of * sectors which are physically contiguous. * * Shuffling things further is an option, but the incremental frustration is * not currently deemed worth the run-time performance hit resulting from the * increased number of disk arm movements it would incur. * * This function offers nothing but a trivial diversion for an attacker able * to do "the cleaning lady attack" in its current static mapping form. */" Further reading: http://phk.freebsd.dk/pubs/bsdcon-03.gbde.paper.pdf Regards Thomas ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Help tracking down problem --- endless loop in __find_get_block_slow 2005-02-23 13:02 ` Thomas S. Iversen @ 2005-02-23 20:09 ` Andrew Morton 2005-02-23 22:24 ` Thomas S. Iversen 0 siblings, 1 reply; 14+ messages in thread From: Andrew Morton @ 2005-02-23 20:09 UTC (permalink / raw) To: Thomas S. Iversen; +Cc: dm-devel, linux-kernel zensonic@zensonic.dk (Thomas S. Iversen) wrote: > > > OK, so we're looking for the buffer_head for block 101 and the first > > buffer_head which is attached to the page represents block 100. So the > > next buffer_head _should_ represent block 101. Please print it out: > > Not quite the same, but simelar: > > Feb 23 14:50:24 localhost kernel: __find_get_block_slow() failed. block=102, > b_blocknr=128, next=129 > Feb 23 14:50:24 localhost kernel: b_state=0x00000013, b_size=2048 > Feb 23 14:50:24 localhost kernel: device blocksize: 2048 > Feb 23 14:50:24 localhost kernel: ------------[ cut here ]------------ Something has caused the page at offset 51 (block 102) to have buffer_heads for blocks 128 and 129 attached to it. > > Could be UFS. But what does "transparent block encryption and sector > > shuffling" mean? How is the sector shuffling implemented? > > GDBE is a block level encrypter. It encrypts the actual sectors > transparently via the GEOM API (corresponds to the devicemapper api in linux). > > GBDE assigns a key for each block, thereby introducing keysectors. > Furthermore the sectors are remapped so that one can not guess where e.g. > metadata is located on the physical disk. It is a rather simple remap: I'd be suspecting that the sector remapping is the cause of the problem. How is it implemented? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Help tracking down problem --- endless loop in __find_get_block_slow 2005-02-23 20:09 ` Andrew Morton @ 2005-02-23 22:24 ` Thomas S. Iversen 2005-02-23 23:17 ` Andrew Morton 0 siblings, 1 reply; 14+ messages in thread From: Thomas S. Iversen @ 2005-02-23 22:24 UTC (permalink / raw) To: Andrew Morton; +Cc: dm-devel, linux-kernel > Something has caused the page at offset 51 (block 102) to have buffer_heads > for blocks 128 and 129 attached to it. Ok. > I'd be suspecting that the sector remapping is the cause of the problem. > How is it implemented? Quite simple actually. You're most welcome to see the code, but I have just done a test like the one below. Never mind the performance figures, correctness comes as a very first priority. It does not block, cause endless loops or anything funny if I take the filsystem out of the question and access the raw devicemapped block device. This should assert that it is not a fault in my code, right? zensonic@www:~$ ssh 192.168.1.9 -l root Password: Last login: Wed Feb 23 23:30:41 2005 maibritt:~# unload maibritt:~# modprobe dm-bde maibritt:~# alias dmc alias dmc='/usr/src/device-mapper.1.00.19/dmsetup/dmsetup create dmbde' maibritt:~# dmc maibritt:~# dd if=/dev/urandom of=/dev/mapper/dmbde dd: skriver til '/dev/mapper/dmbde': Ikke mere plads på enheden 46721+0 blokke ind 46720+0 blokke ud 23920640 bytes transferred in 13,004132 seconds (1839465 bytes/sec) maibritt:~# hexdump /dev/mapper/dmbde | head 0000000 f348 b77e 96b8 62cc ea69 1ddf e232 1d74 0000010 b1fe 521c 4f4f 30cd 62a0 d697 0ce9 55de 0000020 2bae ee86 53b1 c04d 88ab cc63 5826 11d7 0000030 98c5 2ee3 d81e 1658 29f2 9196 8591 fe20 0000040 7c4c 2bf4 a088 906e 23ae d327 9def 09ea 0000050 b435 75c6 3b78 d45e 16aa 038c f592 2e03 0000060 52fe 5b05 03c1 95ff 927a 84ed 6aab f0f8 0000070 a870 0797 5a82 e574 964d c633 8365 42b8 0000080 8e6e 6e60 8544 792f 597f 5065 7766 fbbb 0000090 5eb7 e6f9 9632 477b a17a 91d3 121e 8692 maibritt:~# dd if=/dev/zero of=/dev/mapper/dmbde dd: skriver til '/dev/mapper/dmbde': Ikke mere plads på enheden 46721+0 blokke ind 46720+0 blokke ud 23920640 bytes transferred in 6,335617 seconds (3775582 bytes/sec) maibritt:~# hexdump /dev/mapper/dmbde 0000000 0000 0000 0000 0000 0000 0000 0000 0000 * 16d0000 maibritt:~# maibritt:~# ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Help tracking down problem --- endless loop in __find_get_block_slow 2005-02-23 22:24 ` Thomas S. Iversen @ 2005-02-23 23:17 ` Andrew Morton 2005-02-24 8:54 ` Thomas S. Iversen 0 siblings, 1 reply; 14+ messages in thread From: Andrew Morton @ 2005-02-23 23:17 UTC (permalink / raw) To: Thomas S. Iversen; +Cc: dm-devel, linux-kernel "Thomas S. Iversen" <zensonic@zensonic.dk> wrote: > > > I'd be suspecting that the sector remapping is the cause of the problem. > > How is it implemented? > > Quite simple actually. You're most welcome to see the code, but I have > just done a test like the one below. Never mind the performance figures, > correctness comes as a very first priority. It does not block, cause > endless loops or anything funny if I take the filsystem out of the > question and access the raw devicemapped block device. > > This should assert that it is not a fault in my code, right? I don't know. Can you describe how your driver implements the remapping? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Help tracking down problem --- endless loop in __find_get_block_slow 2005-02-23 23:17 ` Andrew Morton @ 2005-02-24 8:54 ` Thomas S. Iversen 0 siblings, 0 replies; 14+ messages in thread From: Thomas S. Iversen @ 2005-02-24 8:54 UTC (permalink / raw) To: Andrew Morton; +Cc: dm-devel, linux-kernel > I don't know. Can you describe how your driver implements the remapping? I have tested with ext2. I get double faults when trying to sync. What are those? (By the look of the error message, I can see they are not something I want to have :-) I'm at loss how all of this can happend. All I do is to allocate BIOs and kernel memory, process the BIOs and deal with deallocation afterwards. Thomas ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2005-02-28 21:13 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-02-21 10:46 Help tracking down problem --- endless loop in __find_get_block_slow Thomas S. Iversen 2005-02-22 9:18 ` Andrew Morton 2005-02-22 22:46 ` Jeff Mahoney 2005-02-22 23:28 ` Andrew Morton 2005-02-26 0:03 ` Jeff Mahoney 2005-02-28 21:06 ` Jeff Mahoney 2005-02-28 21:09 ` Help tracking down problem --- endless loop in __find_get_block_slow (now with the patch) Jeff Mahoney 2005-02-23 12:00 ` Help tracking down problem --- endless loop in __find_get_block_slow Thomas S. Iversen 2005-02-23 12:10 ` Andrew Morton 2005-02-23 13:02 ` Thomas S. Iversen 2005-02-23 20:09 ` Andrew Morton 2005-02-23 22:24 ` Thomas S. Iversen 2005-02-23 23:17 ` Andrew Morton 2005-02-24 8:54 ` Thomas S. Iversen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox