* limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+] [not found] <20180408040005.GA19128@ming.t460p> @ 2018-04-09 15:51 ` Mike Snitzer 2018-04-09 18:38 ` Mike Snitzer 0 siblings, 1 reply; 14+ messages in thread From: Mike Snitzer @ 2018-04-09 15:51 UTC (permalink / raw) To: Ming Lei; +Cc: dm-devel, linux-block On Sun, Apr 08 2018 at 12:00am -0400, Ming Lei <ming.lei@redhat.com> wrote: > Hi, > > The following kernel oops(divide error) is triggered when running > xfstest(generic/347) on ext4. > > [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44 > [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI > [ 443.840201] Dumping ftrace buffer: > [ 443.840692] (ftrace buffer empty) > [ 443.841195] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison dm_snapshot dm_bufio xfs libcrc32c dm_flakey isofs iTCO_wdt iTCO_vendor_support lpc_ich i2c_i801 i2c_core mfd_core ip_tables sr_mod cdrom sd_mod usb_storage ahci libahci libata nvme crc32c_intel nvme_core virtio_scsi qemu_fw_cfg dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_debug] > [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted 4.16.0_f605ba97fb80_master+ #1 > [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014 > [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool] > [ 443.848949] RSP: 0018:ffffc90001407af0 EFLAGS: 00010246 > [ 443.849679] RAX: 0000000000000400 RBX: ffffc90001407b48 RCX: 0000000000000000 > [ 443.850969] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > [ 443.852097] RBP: ffff88006ce028a0 R08: 00000000ffffffff R09: 0000000000000001 > [ 443.853099] R10: ffffc90001407b20 R11: ffffea0001cfad60 R12: ffff88006de62000 > [ 443.854404] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > [ 443.856129] FS: 00007fb30462d840(0000) GS:ffff88007bc80000(0000) knlGS:0000000000000000 > [ 443.857741] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 443.858576] CR2: 00007efc82a10440 CR3: 000000007e700006 CR4: 00000000007606e0 > [ 443.859583] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 443.860587] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 443.861595] PKRU: 55555554 > [ 443.861978] Call Trace: > [ 443.862344] dm_calculate_queue_limits+0xb5/0x262 [dm_mod] > [ 443.863128] dm_setup_md_queue+0xe2/0x131 [dm_mod] > [ 443.863819] table_load+0x15e/0x2a7 [dm_mod] > [ 443.864425] ? table_clear+0xc1/0xc1 [dm_mod] > [ 443.865079] ctl_ioctl+0x295/0x374 [dm_mod] > [ 443.865679] dm_ctl_ioctl+0xa/0xd [dm_mod] > [ 443.866262] vfs_ioctl+0x1e/0x2b > [ 443.866721] do_vfs_ioctl+0x515/0x53d > [ 443.867242] ? ksys_semctl+0xb9/0x126 > [ 443.867761] ? __fput+0x17a/0x18d > [ 443.868236] ksys_ioctl+0x3e/0x5d > [ 443.868707] SyS_ioctl+0xa/0xd > [ 443.869144] do_syscall_64+0x9d/0x15e > [ 443.869669] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 > [ 443.870381] RIP: 0033:0x7fb303ee8dc7 > [ 443.870886] RSP: 002b:00007ffdc3c81478 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 > [ 443.871937] RAX: ffffffffffffffda RBX: 00007fb3041cbec0 RCX: 00007fb303ee8dc7 > [ 443.872925] RDX: 0000563591b81c30 RSI: 00000000c138fd09 RDI: 0000000000000003 > [ 443.873912] RBP: 0000000000000000 R08: 00007fb3042071c8 R09: 00007ffdc3c812e0 > [ 443.874900] R10: 00007fb304206683 R11: 0000000000000246 R12: 0000000000000000 > [ 443.875901] R13: 0000563591b81c60 R14: 0000563591b81c30 R15: 0000563591b81a80 > [ 443.876905] Code: 72 41 eb 33 8d 41 ff 85 c8 75 03 89 43 24 8b 43 24 44 89 c1 48 0f bd c8 4c 89 c8 48 d3 e0 89 43 24 8b 73 24 41 8b 44 24 38 31 d2 <48> f7 f6 48 89 f1 85 d2 75 cf eb bf 31 d2 89 f8 48 f7 f1 48 85 > [ 443.879519] RIP: pool_io_hints+0x77/0x153 [dm_thin_pool] RSP: ffffc90001407af0 > [ 443.880549] ---[ end trace 56e7f9b41e671f53 ]--- I was able to reproduce (in my case RIP was pool_io_hints+0x45) Which on my kernel, is: crash> dis -l pool_io_hints+0x45 /root/snitm/git/linux/drivers/md/dm-thin.c: 2748 0xffffffffc0765165 <pool_io_hints+69>: div %rdi Which is drivers/md/dm-thin.c:is_factor()'s return !sector_div(block_size, n); SO looking at pool_io_hints() it would seem limits->max_sectors is 0 for this xfstests device... why would that be!? Clearly pool_io_hints() could stand to be more defensive with a !limits->max_sectors negative check but is it ever really valid for max_sectors to be 0? Pretty sure the ultimate bug is outside DM (but not seeing an obvious place where block core would set max_sectors to 0, all blk-settings.c uses min_not_zero(), etc). Mike ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+] 2018-04-09 15:51 ` limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+] Mike Snitzer @ 2018-04-09 18:38 ` Mike Snitzer 2018-04-09 19:32 ` Jens Axboe 0 siblings, 1 reply; 14+ messages in thread From: Mike Snitzer @ 2018-04-09 18:38 UTC (permalink / raw) To: Ming Lei; +Cc: dm-devel, linux-block On Mon, Apr 09 2018 at 11:51am -0400, Mike Snitzer <snitzer@redhat.com> wrote: > On Sun, Apr 08 2018 at 12:00am -0400, > Ming Lei <ming.lei@redhat.com> wrote: > > > Hi, > > > > The following kernel oops(divide error) is triggered when running > > xfstest(generic/347) on ext4. > > > > [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44 > > [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI > > [ 443.840201] Dumping ftrace buffer: > > [ 443.840692] (ftrace buffer empty) ... > > [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted 4.16.0_f605ba97fb80_master+ #1 > > [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014 > > [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool] ... > I was able to reproduce (in my case RIP was pool_io_hints+0x45) > > Which on my kernel, is: > > crash> dis -l pool_io_hints+0x45 > /root/snitm/git/linux/drivers/md/dm-thin.c: 2748 > 0xffffffffc0765165 <pool_io_hints+69>: div %rdi > > Which is drivers/md/dm-thin.c:is_factor()'s return > !sector_div(block_size, n); > > SO looking at pool_io_hints() it would seem limits->max_sectors is 0 for > this xfstests device... why would that be!? > > Clearly pool_io_hints() could stand to be more defensive with a > !limits->max_sectors negative check but is it ever really valid for > max_sectors to be 0? > > Pretty sure the ultimate bug is outside DM (but not seeing an obvious > place where block core would set max_sectors to 0, all blk-settings.c > uses min_not_zero(), etc). I successfully ran this test against the linux-dm.git "for-4.17/dm-changes" tag that Linus merged after the block changes: git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-4.17/dm-changes # ./check tests/generic/347 FSTYP -- ext4 PLATFORM -- Linux/x86_64 thegoat 4.16.0-rc5.snitm MKFS_OPTIONS -- /dev/mapper/test-xfstests_scratch MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/test-xfstests_scratch /scratch generic/347 65s Ran: generic/347 Passed all 1 tests SO this would seem to implicate some regression in the 4.17 block layer changes. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+] 2018-04-09 18:38 ` Mike Snitzer @ 2018-04-09 19:32 ` Jens Axboe 2018-04-09 21:26 ` Jens Axboe 0 siblings, 1 reply; 14+ messages in thread From: Jens Axboe @ 2018-04-09 19:32 UTC (permalink / raw) To: Mike Snitzer, Ming Lei; +Cc: dm-devel, linux-block On 4/9/18 12:38 PM, Mike Snitzer wrote: > On Mon, Apr 09 2018 at 11:51am -0400, > Mike Snitzer <snitzer@redhat.com> wrote: > >> On Sun, Apr 08 2018 at 12:00am -0400, >> Ming Lei <ming.lei@redhat.com> wrote: >> >>> Hi, >>> >>> The following kernel oops(divide error) is triggered when running >>> xfstest(generic/347) on ext4. >>> >>> [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44 >>> [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI >>> [ 443.840201] Dumping ftrace buffer: >>> [ 443.840692] (ftrace buffer empty) > ... >>> [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted 4.16.0_f605ba97fb80_master+ #1 >>> [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014 >>> [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool] > > ... > >> I was able to reproduce (in my case RIP was pool_io_hints+0x45) >> >> Which on my kernel, is: >> >> crash> dis -l pool_io_hints+0x45 >> /root/snitm/git/linux/drivers/md/dm-thin.c: 2748 >> 0xffffffffc0765165 <pool_io_hints+69>: div %rdi >> >> Which is drivers/md/dm-thin.c:is_factor()'s return >> !sector_div(block_size, n); >> >> SO looking at pool_io_hints() it would seem limits->max_sectors is 0 for >> this xfstests device... why would that be!? >> >> Clearly pool_io_hints() could stand to be more defensive with a >> !limits->max_sectors negative check but is it ever really valid for >> max_sectors to be 0? >> >> Pretty sure the ultimate bug is outside DM (but not seeing an obvious >> place where block core would set max_sectors to 0, all blk-settings.c >> uses min_not_zero(), etc). > > I successfully ran this test against the linux-dm.git > "for-4.17/dm-changes" tag that Linus merged after the block changes: > git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-4.17/dm-changes > > # ./check tests/generic/347 > FSTYP -- ext4 > PLATFORM -- Linux/x86_64 thegoat 4.16.0-rc5.snitm > MKFS_OPTIONS -- /dev/mapper/test-xfstests_scratch > MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/test-xfstests_scratch /scratch > > generic/347 65s > Ran: generic/347 > Passed all 1 tests > > SO this would seem to implicate some regression in the 4.17 block layer > changes. No immediate ideas come to mind, we didn't have a lot of changes and I don't see anything that looks problematic. Maybe you can try and bisect it and see what you come up with? -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+] 2018-04-09 19:32 ` Jens Axboe @ 2018-04-09 21:26 ` Jens Axboe 2018-04-09 21:56 ` Jens Axboe 0 siblings, 1 reply; 14+ messages in thread From: Jens Axboe @ 2018-04-09 21:26 UTC (permalink / raw) To: Mike Snitzer, Ming Lei; +Cc: dm-devel, linux-block, Kees Cook On 4/9/18 1:32 PM, Jens Axboe wrote: > On 4/9/18 12:38 PM, Mike Snitzer wrote: >> On Mon, Apr 09 2018 at 11:51am -0400, >> Mike Snitzer <snitzer@redhat.com> wrote: >> >>> On Sun, Apr 08 2018 at 12:00am -0400, >>> Ming Lei <ming.lei@redhat.com> wrote: >>> >>>> Hi, >>>> >>>> The following kernel oops(divide error) is triggered when running >>>> xfstest(generic/347) on ext4. >>>> >>>> [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44 >>>> [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI >>>> [ 443.840201] Dumping ftrace buffer: >>>> [ 443.840692] (ftrace buffer empty) >> ... >>>> [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted 4.16.0_f605ba97fb80_master+ #1 >>>> [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014 >>>> [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool] >> >> ... >> >>> I was able to reproduce (in my case RIP was pool_io_hints+0x45) >>> >>> Which on my kernel, is: >>> >>> crash> dis -l pool_io_hints+0x45 >>> /root/snitm/git/linux/drivers/md/dm-thin.c: 2748 >>> 0xffffffffc0765165 <pool_io_hints+69>: div %rdi >>> >>> Which is drivers/md/dm-thin.c:is_factor()'s return >>> !sector_div(block_size, n); >>> >>> SO looking at pool_io_hints() it would seem limits->max_sectors is 0 for >>> this xfstests device... why would that be!? >>> >>> Clearly pool_io_hints() could stand to be more defensive with a >>> !limits->max_sectors negative check but is it ever really valid for >>> max_sectors to be 0? >>> >>> Pretty sure the ultimate bug is outside DM (but not seeing an obvious >>> place where block core would set max_sectors to 0, all blk-settings.c >>> uses min_not_zero(), etc). >> >> I successfully ran this test against the linux-dm.git >> "for-4.17/dm-changes" tag that Linus merged after the block changes: >> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-4.17/dm-changes >> >> # ./check tests/generic/347 >> FSTYP -- ext4 >> PLATFORM -- Linux/x86_64 thegoat 4.16.0-rc5.snitm >> MKFS_OPTIONS -- /dev/mapper/test-xfstests_scratch >> MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/test-xfstests_scratch /scratch >> >> generic/347 65s >> Ran: generic/347 >> Passed all 1 tests >> >> SO this would seem to implicate some regression in the 4.17 block layer >> changes. > > No immediate ideas come to mind, we didn't have a lot of changes and I > don't see anything that looks problematic. Maybe you can try and > bisect it and see what you come up with? I ran it, problematic commit is: commit 3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91 Author: Kees Cook <keescook@chromium.org> Date: Fri Mar 30 18:52:36 2018 -0700 kernel.h: Retain constant expression output for max()/min() -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+] 2018-04-09 21:26 ` Jens Axboe @ 2018-04-09 21:56 ` Jens Axboe 2018-04-09 22:05 ` Kees Cook 2018-04-09 22:11 ` Linus Torvalds 0 siblings, 2 replies; 14+ messages in thread From: Jens Axboe @ 2018-04-09 21:56 UTC (permalink / raw) To: Mike Snitzer, Ming Lei Cc: dm-devel, linux-block, Kees Cook, Linus Torvalds, Chris Mason [-- Attachment #1: Type: text/plain, Size: 3183 bytes --] On 4/9/18 3:26 PM, Jens Axboe wrote: > On 4/9/18 1:32 PM, Jens Axboe wrote: >> On 4/9/18 12:38 PM, Mike Snitzer wrote: >>> On Mon, Apr 09 2018 at 11:51am -0400, >>> Mike Snitzer <snitzer@redhat.com> wrote: >>> >>>> On Sun, Apr 08 2018 at 12:00am -0400, >>>> Ming Lei <ming.lei@redhat.com> wrote: >>>> >>>>> Hi, >>>>> >>>>> The following kernel oops(divide error) is triggered when running >>>>> xfstest(generic/347) on ext4. >>>>> >>>>> [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44 >>>>> [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI >>>>> [ 443.840201] Dumping ftrace buffer: >>>>> [ 443.840692] (ftrace buffer empty) >>> ... >>>>> [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted 4.16.0_f605ba97fb80_master+ #1 >>>>> [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014 >>>>> [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool] >>> >>> ... >>> >>>> I was able to reproduce (in my case RIP was pool_io_hints+0x45) >>>> >>>> Which on my kernel, is: >>>> >>>> crash> dis -l pool_io_hints+0x45 >>>> /root/snitm/git/linux/drivers/md/dm-thin.c: 2748 >>>> 0xffffffffc0765165 <pool_io_hints+69>: div %rdi >>>> >>>> Which is drivers/md/dm-thin.c:is_factor()'s return >>>> !sector_div(block_size, n); >>>> >>>> SO looking at pool_io_hints() it would seem limits->max_sectors is 0 for >>>> this xfstests device... why would that be!? >>>> >>>> Clearly pool_io_hints() could stand to be more defensive with a >>>> !limits->max_sectors negative check but is it ever really valid for >>>> max_sectors to be 0? >>>> >>>> Pretty sure the ultimate bug is outside DM (but not seeing an obvious >>>> place where block core would set max_sectors to 0, all blk-settings.c >>>> uses min_not_zero(), etc). >>> >>> I successfully ran this test against the linux-dm.git >>> "for-4.17/dm-changes" tag that Linus merged after the block changes: >>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-4.17/dm-changes >>> >>> # ./check tests/generic/347 >>> FSTYP -- ext4 >>> PLATFORM -- Linux/x86_64 thegoat 4.16.0-rc5.snitm >>> MKFS_OPTIONS -- /dev/mapper/test-xfstests_scratch >>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/test-xfstests_scratch /scratch >>> >>> generic/347 65s >>> Ran: generic/347 >>> Passed all 1 tests >>> >>> SO this would seem to implicate some regression in the 4.17 block layer >>> changes. >> >> No immediate ideas come to mind, we didn't have a lot of changes and I >> don't see anything that looks problematic. Maybe you can try and >> bisect it and see what you come up with? > > I ran it, problematic commit is: > > commit 3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91 > Author: Kees Cook <keescook@chromium.org> > Date: Fri Mar 30 18:52:36 2018 -0700 > > kernel.h: Retain constant expression output for max()/min() > The fun continues. Thinking I'd try a userspace repro and thinking it would be difficult to reproduce, try the attached min.c that just copies all the bits from include/linux/kernel.h axboe@x1:~ $ gcc -Wall -O2 -o min min.c axboe@x1:~ $ ./min 128 256 min_not_zero(128, 256) = 0 -- Jens Axboe [-- Attachment #2: min.c --] [-- Type: text/x-csrc, Size: 1112 bytes --] #include <stdio.h> #include <stdlib.h> #define __is_constexpr(x) \ (sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8))) #define __no_side_effects(x, y) \ (__is_constexpr(x) && __is_constexpr(y)) #define __typecheck(x, y) \ (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) #define __safe_cmp(x, y) \ (__typecheck(x, y) && __no_side_effects(x, y)) #define __cmp(x, y, op) ((x) op (y) ? (x) : (y)) #define __cmp_once(x, y, op) ({ \ typeof(x) __x = (x); \ typeof(y) __y = (y); \ __cmp(__x, __y, op); }) #define __careful_cmp(x, y, op) \ __builtin_choose_expr(__safe_cmp(x, y), \ __cmp(x, y, op), __cmp_once(x, y, op)) #define min(x, y) __careful_cmp(x, y, <) #define min_not_zero(x, y) ({ \ typeof(x) __x = (x); \ typeof(y) __y = (y); \ __x == 0 ? __y : ((__y == 0) ? __x : min(__x, __y)); }) int main(int argc, char *argv[]) { int val1, val2; if (argc < 3) { printf("%s val1 val2\n", argv[0]); return 1; } val1 = atoi(argv[1]); val2 = atoi(argv[2]); printf("min_not_zero(%d, %d) = %d\n", val1, val2, min_not_zero(val1, val2)); return 0; } ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+] 2018-04-09 21:56 ` Jens Axboe @ 2018-04-09 22:05 ` Kees Cook 2018-04-09 22:10 ` Jens Axboe 2018-04-09 22:11 ` Linus Torvalds 1 sibling, 1 reply; 14+ messages in thread From: Kees Cook @ 2018-04-09 22:05 UTC (permalink / raw) To: Jens Axboe Cc: Mike Snitzer, Ming Lei, dm-devel, linux-block, Linus Torvalds, Chris Mason On Mon, Apr 9, 2018 at 2:56 PM, Jens Axboe <axboe@kernel.dk> wrote: > On 4/9/18 3:26 PM, Jens Axboe wrote: >> On 4/9/18 1:32 PM, Jens Axboe wrote: >>> On 4/9/18 12:38 PM, Mike Snitzer wrote: >>>> On Mon, Apr 09 2018 at 11:51am -0400, >>>> Mike Snitzer <snitzer@redhat.com> wrote: >>>> >>>>> On Sun, Apr 08 2018 at 12:00am -0400, >>>>> Ming Lei <ming.lei@redhat.com> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> The following kernel oops(divide error) is triggered when running >>>>>> xfstest(generic/347) on ext4. >>>>>> >>>>>> [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44 >>>>>> [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI >>>>>> [ 443.840201] Dumping ftrace buffer: >>>>>> [ 443.840692] (ftrace buffer empty) >>>> ... >>>>>> [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted 4.16.0_f605ba97fb80_master+ #1 >>>>>> [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014 >>>>>> [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool] >>>> >>>> ... >>>> >>>>> I was able to reproduce (in my case RIP was pool_io_hints+0x45) >>>>> >>>>> Which on my kernel, is: >>>>> >>>>> crash> dis -l pool_io_hints+0x45 >>>>> /root/snitm/git/linux/drivers/md/dm-thin.c: 2748 >>>>> 0xffffffffc0765165 <pool_io_hints+69>: div %rdi >>>>> >>>>> Which is drivers/md/dm-thin.c:is_factor()'s return >>>>> !sector_div(block_size, n); >>>>> >>>>> SO looking at pool_io_hints() it would seem limits->max_sectors is 0 for >>>>> this xfstests device... why would that be!? >>>>> >>>>> Clearly pool_io_hints() could stand to be more defensive with a >>>>> !limits->max_sectors negative check but is it ever really valid for >>>>> max_sectors to be 0? >>>>> >>>>> Pretty sure the ultimate bug is outside DM (but not seeing an obvious >>>>> place where block core would set max_sectors to 0, all blk-settings.c >>>>> uses min_not_zero(), etc). >>>> >>>> I successfully ran this test against the linux-dm.git >>>> "for-4.17/dm-changes" tag that Linus merged after the block changes: >>>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-4.17/dm-changes >>>> >>>> # ./check tests/generic/347 >>>> FSTYP -- ext4 >>>> PLATFORM -- Linux/x86_64 thegoat 4.16.0-rc5.snitm >>>> MKFS_OPTIONS -- /dev/mapper/test-xfstests_scratch >>>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/test-xfstests_scratch /scratch >>>> >>>> generic/347 65s >>>> Ran: generic/347 >>>> Passed all 1 tests >>>> >>>> SO this would seem to implicate some regression in the 4.17 block layer >>>> changes. >>> >>> No immediate ideas come to mind, we didn't have a lot of changes and I >>> don't see anything that looks problematic. Maybe you can try and >>> bisect it and see what you come up with? >> >> I ran it, problematic commit is: >> >> commit 3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91 >> Author: Kees Cook <keescook@chromium.org> >> Date: Fri Mar 30 18:52:36 2018 -0700 >> >> kernel.h: Retain constant expression output for max()/min() >> > > The fun continues. Thinking I'd try a userspace repro and thinking it > would be difficult to reproduce, try the attached min.c that just copies > all the bits from include/linux/kernel.h > > axboe@x1:~ $ gcc -Wall -O2 -o min min.c > axboe@x1:~ $ ./min 128 256 > min_not_zero(128, 256) = 0 This should be fixed with e9092d0d9796 ("Fix subtle macro variable shadowing in min_not_zero()"). -Kees -- Kees Cook Pixel Security ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+] 2018-04-09 22:05 ` Kees Cook @ 2018-04-09 22:10 ` Jens Axboe 2018-04-09 22:27 ` Ming Lei 0 siblings, 1 reply; 14+ messages in thread From: Jens Axboe @ 2018-04-09 22:10 UTC (permalink / raw) To: Kees Cook Cc: Mike Snitzer, Ming Lei, dm-devel, linux-block, Linus Torvalds, Chris Mason On 4/9/18 4:05 PM, Kees Cook wrote: > On Mon, Apr 9, 2018 at 2:56 PM, Jens Axboe <axboe@kernel.dk> wrote: >> On 4/9/18 3:26 PM, Jens Axboe wrote: >>> On 4/9/18 1:32 PM, Jens Axboe wrote: >>>> On 4/9/18 12:38 PM, Mike Snitzer wrote: >>>>> On Mon, Apr 09 2018 at 11:51am -0400, >>>>> Mike Snitzer <snitzer@redhat.com> wrote: >>>>> >>>>>> On Sun, Apr 08 2018 at 12:00am -0400, >>>>>> Ming Lei <ming.lei@redhat.com> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> The following kernel oops(divide error) is triggered when running >>>>>>> xfstest(generic/347) on ext4. >>>>>>> >>>>>>> [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44 >>>>>>> [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI >>>>>>> [ 443.840201] Dumping ftrace buffer: >>>>>>> [ 443.840692] (ftrace buffer empty) >>>>> ... >>>>>>> [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted 4.16.0_f605ba97fb80_master+ #1 >>>>>>> [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014 >>>>>>> [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool] >>>>> >>>>> ... >>>>> >>>>>> I was able to reproduce (in my case RIP was pool_io_hints+0x45) >>>>>> >>>>>> Which on my kernel, is: >>>>>> >>>>>> crash> dis -l pool_io_hints+0x45 >>>>>> /root/snitm/git/linux/drivers/md/dm-thin.c: 2748 >>>>>> 0xffffffffc0765165 <pool_io_hints+69>: div %rdi >>>>>> >>>>>> Which is drivers/md/dm-thin.c:is_factor()'s return >>>>>> !sector_div(block_size, n); >>>>>> >>>>>> SO looking at pool_io_hints() it would seem limits->max_sectors is 0 for >>>>>> this xfstests device... why would that be!? >>>>>> >>>>>> Clearly pool_io_hints() could stand to be more defensive with a >>>>>> !limits->max_sectors negative check but is it ever really valid for >>>>>> max_sectors to be 0? >>>>>> >>>>>> Pretty sure the ultimate bug is outside DM (but not seeing an obvious >>>>>> place where block core would set max_sectors to 0, all blk-settings.c >>>>>> uses min_not_zero(), etc). >>>>> >>>>> I successfully ran this test against the linux-dm.git >>>>> "for-4.17/dm-changes" tag that Linus merged after the block changes: >>>>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-4.17/dm-changes >>>>> >>>>> # ./check tests/generic/347 >>>>> FSTYP -- ext4 >>>>> PLATFORM -- Linux/x86_64 thegoat 4.16.0-rc5.snitm >>>>> MKFS_OPTIONS -- /dev/mapper/test-xfstests_scratch >>>>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/test-xfstests_scratch /scratch >>>>> >>>>> generic/347 65s >>>>> Ran: generic/347 >>>>> Passed all 1 tests >>>>> >>>>> SO this would seem to implicate some regression in the 4.17 block layer >>>>> changes. >>>> >>>> No immediate ideas come to mind, we didn't have a lot of changes and I >>>> don't see anything that looks problematic. Maybe you can try and >>>> bisect it and see what you come up with? >>> >>> I ran it, problematic commit is: >>> >>> commit 3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91 >>> Author: Kees Cook <keescook@chromium.org> >>> Date: Fri Mar 30 18:52:36 2018 -0700 >>> >>> kernel.h: Retain constant expression output for max()/min() >>> >> >> The fun continues. Thinking I'd try a userspace repro and thinking it >> would be difficult to reproduce, try the attached min.c that just copies >> all the bits from include/linux/kernel.h >> >> axboe@x1:~ $ gcc -Wall -O2 -o min min.c >> axboe@x1:~ $ ./min 128 256 >> min_not_zero(128, 256) = 0 > > This should be fixed with e9092d0d9796 ("Fix subtle macro variable > shadowing in min_not_zero()"). Yep that works, which is a relief. Some basic unit testing would have been very appropriate in this case, given how fundamentally broken it was... It's amazing nothing catastrophic happened. -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+] 2018-04-09 22:10 ` Jens Axboe @ 2018-04-09 22:27 ` Ming Lei 2018-04-09 22:32 ` Jens Axboe 0 siblings, 1 reply; 14+ messages in thread From: Ming Lei @ 2018-04-09 22:27 UTC (permalink / raw) To: Jens Axboe Cc: Kees Cook, Mike Snitzer, dm-devel, linux-block, Linus Torvalds, Chris Mason On Mon, Apr 09, 2018 at 04:10:17PM -0600, Jens Axboe wrote: > On 4/9/18 4:05 PM, Kees Cook wrote: > > On Mon, Apr 9, 2018 at 2:56 PM, Jens Axboe <axboe@kernel.dk> wrote: > >> On 4/9/18 3:26 PM, Jens Axboe wrote: > >>> On 4/9/18 1:32 PM, Jens Axboe wrote: > >>>> On 4/9/18 12:38 PM, Mike Snitzer wrote: > >>>>> On Mon, Apr 09 2018 at 11:51am -0400, > >>>>> Mike Snitzer <snitzer@redhat.com> wrote: > >>>>> > >>>>>> On Sun, Apr 08 2018 at 12:00am -0400, > >>>>>> Ming Lei <ming.lei@redhat.com> wrote: > >>>>>> > >>>>>>> Hi, > >>>>>>> > >>>>>>> The following kernel oops(divide error) is triggered when running > >>>>>>> xfstest(generic/347) on ext4. > >>>>>>> > >>>>>>> [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44 > >>>>>>> [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI > >>>>>>> [ 443.840201] Dumping ftrace buffer: > >>>>>>> [ 443.840692] (ftrace buffer empty) > >>>>> ... > >>>>>>> [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted 4.16.0_f605ba97fb80_master+ #1 > >>>>>>> [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014 > >>>>>>> [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool] > >>>>> > >>>>> ... > >>>>> > >>>>>> I was able to reproduce (in my case RIP was pool_io_hints+0x45) > >>>>>> > >>>>>> Which on my kernel, is: > >>>>>> > >>>>>> crash> dis -l pool_io_hints+0x45 > >>>>>> /root/snitm/git/linux/drivers/md/dm-thin.c: 2748 > >>>>>> 0xffffffffc0765165 <pool_io_hints+69>: div %rdi > >>>>>> > >>>>>> Which is drivers/md/dm-thin.c:is_factor()'s return > >>>>>> !sector_div(block_size, n); > >>>>>> > >>>>>> SO looking at pool_io_hints() it would seem limits->max_sectors is 0 for > >>>>>> this xfstests device... why would that be!? > >>>>>> > >>>>>> Clearly pool_io_hints() could stand to be more defensive with a > >>>>>> !limits->max_sectors negative check but is it ever really valid for > >>>>>> max_sectors to be 0? > >>>>>> > >>>>>> Pretty sure the ultimate bug is outside DM (but not seeing an obvious > >>>>>> place where block core would set max_sectors to 0, all blk-settings.c > >>>>>> uses min_not_zero(), etc). > >>>>> > >>>>> I successfully ran this test against the linux-dm.git > >>>>> "for-4.17/dm-changes" tag that Linus merged after the block changes: > >>>>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-4.17/dm-changes > >>>>> > >>>>> # ./check tests/generic/347 > >>>>> FSTYP -- ext4 > >>>>> PLATFORM -- Linux/x86_64 thegoat 4.16.0-rc5.snitm > >>>>> MKFS_OPTIONS -- /dev/mapper/test-xfstests_scratch > >>>>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/test-xfstests_scratch /scratch > >>>>> > >>>>> generic/347 65s > >>>>> Ran: generic/347 > >>>>> Passed all 1 tests > >>>>> > >>>>> SO this would seem to implicate some regression in the 4.17 block layer > >>>>> changes. > >>>> > >>>> No immediate ideas come to mind, we didn't have a lot of changes and I > >>>> don't see anything that looks problematic. Maybe you can try and > >>>> bisect it and see what you come up with? > >>> > >>> I ran it, problematic commit is: > >>> > >>> commit 3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91 > >>> Author: Kees Cook <keescook@chromium.org> > >>> Date: Fri Mar 30 18:52:36 2018 -0700 > >>> > >>> kernel.h: Retain constant expression output for max()/min() > >>> > >> > >> The fun continues. Thinking I'd try a userspace repro and thinking it > >> would be difficult to reproduce, try the attached min.c that just copies > >> all the bits from include/linux/kernel.h > >> > >> axboe@x1:~ $ gcc -Wall -O2 -o min min.c > >> axboe@x1:~ $ ./min 128 256 > >> min_not_zero(128, 256) = 0 > > > > This should be fixed with e9092d0d9796 ("Fix subtle macro variable > > shadowing in min_not_zero()"). > > Yep that works, which is a relief. Some basic unit testing would have > been very appropriate in this case, given how fundamentally broken it > was... It's amazing nothing catastrophic happened. Actually, there was, :-) https://lkml.org/lkml/2018/4/9/355 -- Ming ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+] 2018-04-09 22:27 ` Ming Lei @ 2018-04-09 22:32 ` Jens Axboe 2018-04-09 22:38 ` Kees Cook 2018-04-09 23:54 ` Linus Torvalds 0 siblings, 2 replies; 14+ messages in thread From: Jens Axboe @ 2018-04-09 22:32 UTC (permalink / raw) To: Ming Lei Cc: Kees Cook, Mike Snitzer, dm-devel, linux-block, Linus Torvalds, Chris Mason On 4/9/18 4:27 PM, Ming Lei wrote: > On Mon, Apr 09, 2018 at 04:10:17PM -0600, Jens Axboe wrote: >> On 4/9/18 4:05 PM, Kees Cook wrote: >>> On Mon, Apr 9, 2018 at 2:56 PM, Jens Axboe <axboe@kernel.dk> wrote: >>>> On 4/9/18 3:26 PM, Jens Axboe wrote: >>>>> On 4/9/18 1:32 PM, Jens Axboe wrote: >>>>>> On 4/9/18 12:38 PM, Mike Snitzer wrote: >>>>>>> On Mon, Apr 09 2018 at 11:51am -0400, >>>>>>> Mike Snitzer <snitzer@redhat.com> wrote: >>>>>>> >>>>>>>> On Sun, Apr 08 2018 at 12:00am -0400, >>>>>>>> Ming Lei <ming.lei@redhat.com> wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> The following kernel oops(divide error) is triggered when running >>>>>>>>> xfstest(generic/347) on ext4. >>>>>>>>> >>>>>>>>> [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44 >>>>>>>>> [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI >>>>>>>>> [ 443.840201] Dumping ftrace buffer: >>>>>>>>> [ 443.840692] (ftrace buffer empty) >>>>>>> ... >>>>>>>>> [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted 4.16.0_f605ba97fb80_master+ #1 >>>>>>>>> [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014 >>>>>>>>> [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool] >>>>>>> >>>>>>> ... >>>>>>> >>>>>>>> I was able to reproduce (in my case RIP was pool_io_hints+0x45) >>>>>>>> >>>>>>>> Which on my kernel, is: >>>>>>>> >>>>>>>> crash> dis -l pool_io_hints+0x45 >>>>>>>> /root/snitm/git/linux/drivers/md/dm-thin.c: 2748 >>>>>>>> 0xffffffffc0765165 <pool_io_hints+69>: div %rdi >>>>>>>> >>>>>>>> Which is drivers/md/dm-thin.c:is_factor()'s return >>>>>>>> !sector_div(block_size, n); >>>>>>>> >>>>>>>> SO looking at pool_io_hints() it would seem limits->max_sectors is 0 for >>>>>>>> this xfstests device... why would that be!? >>>>>>>> >>>>>>>> Clearly pool_io_hints() could stand to be more defensive with a >>>>>>>> !limits->max_sectors negative check but is it ever really valid for >>>>>>>> max_sectors to be 0? >>>>>>>> >>>>>>>> Pretty sure the ultimate bug is outside DM (but not seeing an obvious >>>>>>>> place where block core would set max_sectors to 0, all blk-settings.c >>>>>>>> uses min_not_zero(), etc). >>>>>>> >>>>>>> I successfully ran this test against the linux-dm.git >>>>>>> "for-4.17/dm-changes" tag that Linus merged after the block changes: >>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-4.17/dm-changes >>>>>>> >>>>>>> # ./check tests/generic/347 >>>>>>> FSTYP -- ext4 >>>>>>> PLATFORM -- Linux/x86_64 thegoat 4.16.0-rc5.snitm >>>>>>> MKFS_OPTIONS -- /dev/mapper/test-xfstests_scratch >>>>>>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/test-xfstests_scratch /scratch >>>>>>> >>>>>>> generic/347 65s >>>>>>> Ran: generic/347 >>>>>>> Passed all 1 tests >>>>>>> >>>>>>> SO this would seem to implicate some regression in the 4.17 block layer >>>>>>> changes. >>>>>> >>>>>> No immediate ideas come to mind, we didn't have a lot of changes and I >>>>>> don't see anything that looks problematic. Maybe you can try and >>>>>> bisect it and see what you come up with? >>>>> >>>>> I ran it, problematic commit is: >>>>> >>>>> commit 3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91 >>>>> Author: Kees Cook <keescook@chromium.org> >>>>> Date: Fri Mar 30 18:52:36 2018 -0700 >>>>> >>>>> kernel.h: Retain constant expression output for max()/min() >>>>> >>>> >>>> The fun continues. Thinking I'd try a userspace repro and thinking it >>>> would be difficult to reproduce, try the attached min.c that just copies >>>> all the bits from include/linux/kernel.h >>>> >>>> axboe@x1:~ $ gcc -Wall -O2 -o min min.c >>>> axboe@x1:~ $ ./min 128 256 >>>> min_not_zero(128, 256) = 0 >>> >>> This should be fixed with e9092d0d9796 ("Fix subtle macro variable >>> shadowing in min_not_zero()"). >> >> Yep that works, which is a relief. Some basic unit testing would have >> been very appropriate in this case, given how fundamentally broken it >> was... It's amazing nothing catastrophic happened. > > Actually, there was, :-) > > https://lkml.org/lkml/2018/4/9/355 That's bad, for sure, but my worry was bigger than an oops or crash, we could have had corruption due to this. The resulting min/max and friends would have been trivial to test, but clearly they weren't. -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+] 2018-04-09 22:32 ` Jens Axboe @ 2018-04-09 22:38 ` Kees Cook 2018-04-09 23:01 ` Jens Axboe 2018-04-09 23:54 ` Linus Torvalds 1 sibling, 1 reply; 14+ messages in thread From: Kees Cook @ 2018-04-09 22:38 UTC (permalink / raw) To: Jens Axboe, Linus Torvalds Cc: Ming Lei, Mike Snitzer, dm-devel, linux-block, Chris Mason On Mon, Apr 9, 2018 at 3:32 PM, Jens Axboe <axboe@kernel.dk> wrote: > That's bad, for sure, but my worry was bigger than an oops or crash, > we could have had corruption due to this. > > The resulting min/max and friends would have been trivial to test, but > clearly they weren't. Yeah, that was bad luck and my fault: I tested min(), max(), min_t(), and max_t(). My assumption was that since the others were built from them, they'd be fine. Not true in this shadow variable case, though. :( We could do something like this, which would have caught it: diff --git a/init/main.c b/init/main.c index e4a3160991ea..ce46afc53b8b 100644 --- a/init/main.c +++ b/init/main.c @@ -993,10 +993,32 @@ static inline void mark_readonly(void) } #endif +static inline void compiletime_sanity_checks(void) +{ + /* Sanity-check min()/max() family of macros. */ + BUILD_BUG_ON(min(5, 50) != 5); + BUILD_BUG_ON(max(5, 50) != 50); + BUILD_BUG_ON(min_t(int, (size_t)-1 , 50) != -1); + BUILD_BUG_ON(max_t(size_t, -1 , 50) != (size_t)-1); + BUILD_BUG_ON(min3(-50, 0, 1000) != -50); + BUILD_BUG_ON(max3(-50, 0, 1000) != 1000); + BUILD_BUG_ON(min_not_zero(0, 20) != 20); + BUILD_BUG_ON(min_not_zero(30, 0) != 30); + BUILD_BUG_ON(min_not_zero(150, 40) != 40); + BUILD_BUG_ON(clamp(20, 1, 7) != 7); + BUILD_BUG_ON(clamp(40, 20, 100) != 40); + BUILD_BUG_ON(clamp(1, 20, 100) != 20); + BUILD_BUG_ON(clamp_t(int, -5, (size_t)-1, 100) != -1); + BUILD_BUG_ON(clamp_t(int, -1, (size_t)-5, 100) != -1); + BUILD_BUG_ON(clamp_t(size_t, -10, 1, -50) != -50); +} + static int __ref kernel_init(void *unused) { int ret; + compiletime_sanity_checks(); + kernel_init_freeable(); /* need to finish all async __init code before freeing the memory */ async_synchronize_full(); -- Kees Cook Pixel Security ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+] 2018-04-09 22:38 ` Kees Cook @ 2018-04-09 23:01 ` Jens Axboe 0 siblings, 0 replies; 14+ messages in thread From: Jens Axboe @ 2018-04-09 23:01 UTC (permalink / raw) To: Kees Cook, Linus Torvalds Cc: Ming Lei, Mike Snitzer, dm-devel, linux-block, Chris Mason On 4/9/18 4:38 PM, Kees Cook wrote: > On Mon, Apr 9, 2018 at 3:32 PM, Jens Axboe <axboe@kernel.dk> wrote: >> That's bad, for sure, but my worry was bigger than an oops or crash, >> we could have had corruption due to this. >> >> The resulting min/max and friends would have been trivial to test, but >> clearly they weren't. > > Yeah, that was bad luck and my fault: I tested min(), max(), min_t(), It's only bad luck if it was tested :-) > and max_t(). My assumption was that since the others were built from > them, they'd be fine. Not true in this shadow variable case, though. > :( We could do something like this, which would have caught it: Might not hurt to do. -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+] 2018-04-09 22:32 ` Jens Axboe 2018-04-09 22:38 ` Kees Cook @ 2018-04-09 23:54 ` Linus Torvalds 2018-04-10 0:31 ` Jens Axboe 1 sibling, 1 reply; 14+ messages in thread From: Linus Torvalds @ 2018-04-09 23:54 UTC (permalink / raw) To: Jens Axboe Cc: Ming Lei, Kees Cook, Mike Snitzer, dm-devel, linux-block, Chris Mason On Mon, Apr 9, 2018 at 3:32 PM, Jens Axboe <axboe@kernel.dk> wrote: > > The resulting min/max and friends would have been trivial to test, but > clearly they weren't. Well, the min/max macros themselves actually were tested in user space by me. It was the interaction with the unrelated "min_not_zero()" that wasn't ;) It's easy in hind-sight to say "that's not at all unrelated", but within the context of doing min/max, it was. Linus ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+] 2018-04-09 23:54 ` Linus Torvalds @ 2018-04-10 0:31 ` Jens Axboe 0 siblings, 0 replies; 14+ messages in thread From: Jens Axboe @ 2018-04-10 0:31 UTC (permalink / raw) To: Linus Torvalds Cc: Ming Lei, Kees Cook, Mike Snitzer, dm-devel, linux-block, Chris Mason On 4/9/18 5:54 PM, Linus Torvalds wrote: > On Mon, Apr 9, 2018 at 3:32 PM, Jens Axboe <axboe@kernel.dk> wrote: >> >> The resulting min/max and friends would have been trivial to test, but >> clearly they weren't. > > Well, the min/max macros themselves actually were tested in user space by me. > > It was the interaction with the unrelated "min_not_zero()" that wasn't ;) > > It's easy in hind-sight to say "that's not at all unrelated", but > within the context of doing min/max, it was. I guess we should just be thankful that it didn't cause bigger issues than an oops. -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+] 2018-04-09 21:56 ` Jens Axboe 2018-04-09 22:05 ` Kees Cook @ 2018-04-09 22:11 ` Linus Torvalds 1 sibling, 0 replies; 14+ messages in thread From: Linus Torvalds @ 2018-04-09 22:11 UTC (permalink / raw) To: Jens Axboe Cc: Mike Snitzer, Ming Lei, dm-devel, linux-block, Kees Cook, Chris Mason [-- Attachment #1: Type: text/plain, Size: 3640 bytes --] On mobile, sorry for html crud and top posting, but here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9092d0d97961146655ce51f43850907d95f68c3 Should fix it. Linus On Mon, Apr 9, 2018, 14:56 Jens Axboe <axboe@kernel.dk> wrote: > On 4/9/18 3:26 PM, Jens Axboe wrote: > > On 4/9/18 1:32 PM, Jens Axboe wrote: > >> On 4/9/18 12:38 PM, Mike Snitzer wrote: > >>> On Mon, Apr 09 2018 at 11:51am -0400, > >>> Mike Snitzer <snitzer@redhat.com> wrote: > >>> > >>>> On Sun, Apr 08 2018 at 12:00am -0400, > >>>> Ming Lei <ming.lei@redhat.com> wrote: > >>>> > >>>>> Hi, > >>>>> > >>>>> The following kernel oops(divide error) is triggered when running > >>>>> xfstest(generic/347) on ext4. > >>>>> > >>>>> [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44 > >>>>> [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI > >>>>> [ 443.840201] Dumping ftrace buffer: > >>>>> [ 443.840692] (ftrace buffer empty) > >>> ... > >>>>> [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted > 4.16.0_f605ba97fb80_master+ #1 > >>>>> [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), > BIOS 1.10.2-2.fc27 04/01/2014 > >>>>> [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool] > >>> > >>> ... > >>> > >>>> I was able to reproduce (in my case RIP was pool_io_hints+0x45) > >>>> > >>>> Which on my kernel, is: > >>>> > >>>> crash> dis -l pool_io_hints+0x45 > >>>> /root/snitm/git/linux/drivers/md/dm-thin.c: 2748 > >>>> 0xffffffffc0765165 <pool_io_hints+69>: div %rdi > >>>> > >>>> Which is drivers/md/dm-thin.c:is_factor()'s return > >>>> !sector_div(block_size, n); > >>>> > >>>> SO looking at pool_io_hints() it would seem limits->max_sectors is 0 > for > >>>> this xfstests device... why would that be!? > >>>> > >>>> Clearly pool_io_hints() could stand to be more defensive with a > >>>> !limits->max_sectors negative check but is it ever really valid for > >>>> max_sectors to be 0? > >>>> > >>>> Pretty sure the ultimate bug is outside DM (but not seeing an obvious > >>>> place where block core would set max_sectors to 0, all blk-settings.c > >>>> uses min_not_zero(), etc). > >>> > >>> I successfully ran this test against the linux-dm.git > >>> "for-4.17/dm-changes" tag that Linus merged after the block changes: > >>> git:// > git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git > tags/for-4.17/dm-changes > >>> > >>> # ./check tests/generic/347 > >>> FSTYP -- ext4 > >>> PLATFORM -- Linux/x86_64 thegoat 4.16.0-rc5.snitm > >>> MKFS_OPTIONS -- /dev/mapper/test-xfstests_scratch > >>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/test-xfstests_scratch > /scratch > >>> > >>> generic/347 65s > >>> Ran: generic/347 > >>> Passed all 1 tests > >>> > >>> SO this would seem to implicate some regression in the 4.17 block layer > >>> changes. > >> > >> No immediate ideas come to mind, we didn't have a lot of changes and I > >> don't see anything that looks problematic. Maybe you can try and > >> bisect it and see what you come up with? > > > > I ran it, problematic commit is: > > > > commit 3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91 > > Author: Kees Cook <keescook@chromium.org> > > Date: Fri Mar 30 18:52:36 2018 -0700 > > > > kernel.h: Retain constant expression output for max()/min() > > > > The fun continues. Thinking I'd try a userspace repro and thinking it > would be difficult to reproduce, try the attached min.c that just copies > all the bits from include/linux/kernel.h > > axboe@x1:~ $ gcc -Wall -O2 -o min min.c > axboe@x1:~ $ ./min 128 256 > min_not_zero(128, 256) = 0 > > -- > Jens Axboe > > [-- Attachment #2: Type: text/html, Size: 5662 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2018-04-10 0:31 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20180408040005.GA19128@ming.t460p>
2018-04-09 15:51 ` limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+] Mike Snitzer
2018-04-09 18:38 ` Mike Snitzer
2018-04-09 19:32 ` Jens Axboe
2018-04-09 21:26 ` Jens Axboe
2018-04-09 21:56 ` Jens Axboe
2018-04-09 22:05 ` Kees Cook
2018-04-09 22:10 ` Jens Axboe
2018-04-09 22:27 ` Ming Lei
2018-04-09 22:32 ` Jens Axboe
2018-04-09 22:38 ` Kees Cook
2018-04-09 23:01 ` Jens Axboe
2018-04-09 23:54 ` Linus Torvalds
2018-04-10 0:31 ` Jens Axboe
2018-04-09 22:11 ` Linus Torvalds
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).