* limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]
[not found] <20180408040005.GA19128@ming.t460p>
@ 2018-04-09 15:51 ` Mike Snitzer
2018-04-09 18:38 ` Mike Snitzer
0 siblings, 1 reply; 14+ messages in thread
From: Mike Snitzer @ 2018-04-09 15:51 UTC (permalink / raw)
To: Ming Lei; +Cc: dm-devel, linux-block
On Sun, Apr 08 2018 at 12:00am -0400,
Ming Lei <ming.lei@redhat.com> wrote:
> Hi,
>
> The following kernel oops(divide error) is triggered when running
> xfstest(generic/347) on ext4.
>
> [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44
> [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI
> [ 443.840201] Dumping ftrace buffer:
> [ 443.840692] (ftrace buffer empty)
> [ 443.841195] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison dm_snapshot dm_bufio xfs libcrc32c dm_flakey isofs iTCO_wdt iTCO_vendor_support lpc_ich i2c_i801 i2c_core mfd_core ip_tables sr_mod cdrom sd_mod usb_storage ahci libahci libata nvme crc32c_intel nvme_core virtio_scsi qemu_fw_cfg dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_debug]
> [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted 4.16.0_f605ba97fb80_master+ #1
> [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
> [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool]
> [ 443.848949] RSP: 0018:ffffc90001407af0 EFLAGS: 00010246
> [ 443.849679] RAX: 0000000000000400 RBX: ffffc90001407b48 RCX: 0000000000000000
> [ 443.850969] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [ 443.852097] RBP: ffff88006ce028a0 R08: 00000000ffffffff R09: 0000000000000001
> [ 443.853099] R10: ffffc90001407b20 R11: ffffea0001cfad60 R12: ffff88006de62000
> [ 443.854404] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 443.856129] FS: 00007fb30462d840(0000) GS:ffff88007bc80000(0000) knlGS:0000000000000000
> [ 443.857741] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 443.858576] CR2: 00007efc82a10440 CR3: 000000007e700006 CR4: 00000000007606e0
> [ 443.859583] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 443.860587] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 443.861595] PKRU: 55555554
> [ 443.861978] Call Trace:
> [ 443.862344] dm_calculate_queue_limits+0xb5/0x262 [dm_mod]
> [ 443.863128] dm_setup_md_queue+0xe2/0x131 [dm_mod]
> [ 443.863819] table_load+0x15e/0x2a7 [dm_mod]
> [ 443.864425] ? table_clear+0xc1/0xc1 [dm_mod]
> [ 443.865079] ctl_ioctl+0x295/0x374 [dm_mod]
> [ 443.865679] dm_ctl_ioctl+0xa/0xd [dm_mod]
> [ 443.866262] vfs_ioctl+0x1e/0x2b
> [ 443.866721] do_vfs_ioctl+0x515/0x53d
> [ 443.867242] ? ksys_semctl+0xb9/0x126
> [ 443.867761] ? __fput+0x17a/0x18d
> [ 443.868236] ksys_ioctl+0x3e/0x5d
> [ 443.868707] SyS_ioctl+0xa/0xd
> [ 443.869144] do_syscall_64+0x9d/0x15e
> [ 443.869669] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [ 443.870381] RIP: 0033:0x7fb303ee8dc7
> [ 443.870886] RSP: 002b:00007ffdc3c81478 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 443.871937] RAX: ffffffffffffffda RBX: 00007fb3041cbec0 RCX: 00007fb303ee8dc7
> [ 443.872925] RDX: 0000563591b81c30 RSI: 00000000c138fd09 RDI: 0000000000000003
> [ 443.873912] RBP: 0000000000000000 R08: 00007fb3042071c8 R09: 00007ffdc3c812e0
> [ 443.874900] R10: 00007fb304206683 R11: 0000000000000246 R12: 0000000000000000
> [ 443.875901] R13: 0000563591b81c60 R14: 0000563591b81c30 R15: 0000563591b81a80
> [ 443.876905] Code: 72 41 eb 33 8d 41 ff 85 c8 75 03 89 43 24 8b 43 24 44 89 c1 48 0f bd c8 4c 89 c8 48 d3 e0 89 43 24 8b 73 24 41 8b 44 24 38 31 d2 <48> f7 f6 48 89 f1 85 d2 75 cf eb bf 31 d2 89 f8 48 f7 f1 48 85
> [ 443.879519] RIP: pool_io_hints+0x77/0x153 [dm_thin_pool] RSP: ffffc90001407af0
> [ 443.880549] ---[ end trace 56e7f9b41e671f53 ]---
I was able to reproduce (in my case RIP was pool_io_hints+0x45)
Which on my kernel, is:
crash> dis -l pool_io_hints+0x45
/root/snitm/git/linux/drivers/md/dm-thin.c: 2748
0xffffffffc0765165 <pool_io_hints+69>: div %rdi
Which is drivers/md/dm-thin.c:is_factor()'s return
!sector_div(block_size, n);
SO looking at pool_io_hints() it would seem limits->max_sectors is 0 for
this xfstests device... why would that be!?
Clearly pool_io_hints() could stand to be more defensive with a
!limits->max_sectors negative check but is it ever really valid for
max_sectors to be 0?
Pretty sure the ultimate bug is outside DM (but not seeing an obvious
place where block core would set max_sectors to 0, all blk-settings.c
uses min_not_zero(), etc).
Mike
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]
2018-04-09 15:51 ` limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+] Mike Snitzer
@ 2018-04-09 18:38 ` Mike Snitzer
2018-04-09 19:32 ` Jens Axboe
0 siblings, 1 reply; 14+ messages in thread
From: Mike Snitzer @ 2018-04-09 18:38 UTC (permalink / raw)
To: Ming Lei; +Cc: dm-devel, linux-block
On Mon, Apr 09 2018 at 11:51am -0400,
Mike Snitzer <snitzer@redhat.com> wrote:
> On Sun, Apr 08 2018 at 12:00am -0400,
> Ming Lei <ming.lei@redhat.com> wrote:
>
> > Hi,
> >
> > The following kernel oops(divide error) is triggered when running
> > xfstest(generic/347) on ext4.
> >
> > [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44
> > [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI
> > [ 443.840201] Dumping ftrace buffer:
> > [ 443.840692] (ftrace buffer empty)
...
> > [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted 4.16.0_f605ba97fb80_master+ #1
> > [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
> > [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool]
...
> I was able to reproduce (in my case RIP was pool_io_hints+0x45)
>
> Which on my kernel, is:
>
> crash> dis -l pool_io_hints+0x45
> /root/snitm/git/linux/drivers/md/dm-thin.c: 2748
> 0xffffffffc0765165 <pool_io_hints+69>: div %rdi
>
> Which is drivers/md/dm-thin.c:is_factor()'s return
> !sector_div(block_size, n);
>
> SO looking at pool_io_hints() it would seem limits->max_sectors is 0 for
> this xfstests device... why would that be!?
>
> Clearly pool_io_hints() could stand to be more defensive with a
> !limits->max_sectors negative check but is it ever really valid for
> max_sectors to be 0?
>
> Pretty sure the ultimate bug is outside DM (but not seeing an obvious
> place where block core would set max_sectors to 0, all blk-settings.c
> uses min_not_zero(), etc).
I successfully ran this test against the linux-dm.git
"for-4.17/dm-changes" tag that Linus merged after the block changes:
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-4.17/dm-changes
# ./check tests/generic/347
FSTYP -- ext4
PLATFORM -- Linux/x86_64 thegoat 4.16.0-rc5.snitm
MKFS_OPTIONS -- /dev/mapper/test-xfstests_scratch
MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/test-xfstests_scratch /scratch
generic/347 65s
Ran: generic/347
Passed all 1 tests
SO this would seem to implicate some regression in the 4.17 block layer
changes.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]
2018-04-09 18:38 ` Mike Snitzer
@ 2018-04-09 19:32 ` Jens Axboe
2018-04-09 21:26 ` Jens Axboe
0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2018-04-09 19:32 UTC (permalink / raw)
To: Mike Snitzer, Ming Lei; +Cc: dm-devel, linux-block
On 4/9/18 12:38 PM, Mike Snitzer wrote:
> On Mon, Apr 09 2018 at 11:51am -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
>
>> On Sun, Apr 08 2018 at 12:00am -0400,
>> Ming Lei <ming.lei@redhat.com> wrote:
>>
>>> Hi,
>>>
>>> The following kernel oops(divide error) is triggered when running
>>> xfstest(generic/347) on ext4.
>>>
>>> [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44
>>> [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI
>>> [ 443.840201] Dumping ftrace buffer:
>>> [ 443.840692] (ftrace buffer empty)
> ...
>>> [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted 4.16.0_f605ba97fb80_master+ #1
>>> [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
>>> [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool]
>
> ...
>
>> I was able to reproduce (in my case RIP was pool_io_hints+0x45)
>>
>> Which on my kernel, is:
>>
>> crash> dis -l pool_io_hints+0x45
>> /root/snitm/git/linux/drivers/md/dm-thin.c: 2748
>> 0xffffffffc0765165 <pool_io_hints+69>: div %rdi
>>
>> Which is drivers/md/dm-thin.c:is_factor()'s return
>> !sector_div(block_size, n);
>>
>> SO looking at pool_io_hints() it would seem limits->max_sectors is 0 for
>> this xfstests device... why would that be!?
>>
>> Clearly pool_io_hints() could stand to be more defensive with a
>> !limits->max_sectors negative check but is it ever really valid for
>> max_sectors to be 0?
>>
>> Pretty sure the ultimate bug is outside DM (but not seeing an obvious
>> place where block core would set max_sectors to 0, all blk-settings.c
>> uses min_not_zero(), etc).
>
> I successfully ran this test against the linux-dm.git
> "for-4.17/dm-changes" tag that Linus merged after the block changes:
> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-4.17/dm-changes
>
> # ./check tests/generic/347
> FSTYP -- ext4
> PLATFORM -- Linux/x86_64 thegoat 4.16.0-rc5.snitm
> MKFS_OPTIONS -- /dev/mapper/test-xfstests_scratch
> MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/test-xfstests_scratch /scratch
>
> generic/347 65s
> Ran: generic/347
> Passed all 1 tests
>
> SO this would seem to implicate some regression in the 4.17 block layer
> changes.
No immediate ideas come to mind, we didn't have a lot of changes and I
don't see anything that looks problematic. Maybe you can try and
bisect it and see what you come up with?
--
Jens Axboe
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]
2018-04-09 19:32 ` Jens Axboe
@ 2018-04-09 21:26 ` Jens Axboe
2018-04-09 21:56 ` Jens Axboe
0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2018-04-09 21:26 UTC (permalink / raw)
To: Mike Snitzer, Ming Lei; +Cc: dm-devel, linux-block, Kees Cook
On 4/9/18 1:32 PM, Jens Axboe wrote:
> On 4/9/18 12:38 PM, Mike Snitzer wrote:
>> On Mon, Apr 09 2018 at 11:51am -0400,
>> Mike Snitzer <snitzer@redhat.com> wrote:
>>
>>> On Sun, Apr 08 2018 at 12:00am -0400,
>>> Ming Lei <ming.lei@redhat.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> The following kernel oops(divide error) is triggered when running
>>>> xfstest(generic/347) on ext4.
>>>>
>>>> [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44
>>>> [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI
>>>> [ 443.840201] Dumping ftrace buffer:
>>>> [ 443.840692] (ftrace buffer empty)
>> ...
>>>> [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted 4.16.0_f605ba97fb80_master+ #1
>>>> [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
>>>> [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool]
>>
>> ...
>>
>>> I was able to reproduce (in my case RIP was pool_io_hints+0x45)
>>>
>>> Which on my kernel, is:
>>>
>>> crash> dis -l pool_io_hints+0x45
>>> /root/snitm/git/linux/drivers/md/dm-thin.c: 2748
>>> 0xffffffffc0765165 <pool_io_hints+69>: div %rdi
>>>
>>> Which is drivers/md/dm-thin.c:is_factor()'s return
>>> !sector_div(block_size, n);
>>>
>>> SO looking at pool_io_hints() it would seem limits->max_sectors is 0 for
>>> this xfstests device... why would that be!?
>>>
>>> Clearly pool_io_hints() could stand to be more defensive with a
>>> !limits->max_sectors negative check but is it ever really valid for
>>> max_sectors to be 0?
>>>
>>> Pretty sure the ultimate bug is outside DM (but not seeing an obvious
>>> place where block core would set max_sectors to 0, all blk-settings.c
>>> uses min_not_zero(), etc).
>>
>> I successfully ran this test against the linux-dm.git
>> "for-4.17/dm-changes" tag that Linus merged after the block changes:
>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-4.17/dm-changes
>>
>> # ./check tests/generic/347
>> FSTYP -- ext4
>> PLATFORM -- Linux/x86_64 thegoat 4.16.0-rc5.snitm
>> MKFS_OPTIONS -- /dev/mapper/test-xfstests_scratch
>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/test-xfstests_scratch /scratch
>>
>> generic/347 65s
>> Ran: generic/347
>> Passed all 1 tests
>>
>> SO this would seem to implicate some regression in the 4.17 block layer
>> changes.
>
> No immediate ideas come to mind, we didn't have a lot of changes and I
> don't see anything that looks problematic. Maybe you can try and
> bisect it and see what you come up with?
I ran it, problematic commit is:
commit 3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91
Author: Kees Cook <keescook@chromium.org>
Date: Fri Mar 30 18:52:36 2018 -0700
kernel.h: Retain constant expression output for max()/min()
--
Jens Axboe
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]
2018-04-09 21:26 ` Jens Axboe
@ 2018-04-09 21:56 ` Jens Axboe
2018-04-09 22:05 ` Kees Cook
2018-04-09 22:11 ` Linus Torvalds
0 siblings, 2 replies; 14+ messages in thread
From: Jens Axboe @ 2018-04-09 21:56 UTC (permalink / raw)
To: Mike Snitzer, Ming Lei
Cc: dm-devel, linux-block, Kees Cook, Linus Torvalds, Chris Mason
[-- Attachment #1: Type: text/plain, Size: 3183 bytes --]
On 4/9/18 3:26 PM, Jens Axboe wrote:
> On 4/9/18 1:32 PM, Jens Axboe wrote:
>> On 4/9/18 12:38 PM, Mike Snitzer wrote:
>>> On Mon, Apr 09 2018 at 11:51am -0400,
>>> Mike Snitzer <snitzer@redhat.com> wrote:
>>>
>>>> On Sun, Apr 08 2018 at 12:00am -0400,
>>>> Ming Lei <ming.lei@redhat.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> The following kernel oops(divide error) is triggered when running
>>>>> xfstest(generic/347) on ext4.
>>>>>
>>>>> [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44
>>>>> [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI
>>>>> [ 443.840201] Dumping ftrace buffer:
>>>>> [ 443.840692] (ftrace buffer empty)
>>> ...
>>>>> [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted 4.16.0_f605ba97fb80_master+ #1
>>>>> [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
>>>>> [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool]
>>>
>>> ...
>>>
>>>> I was able to reproduce (in my case RIP was pool_io_hints+0x45)
>>>>
>>>> Which on my kernel, is:
>>>>
>>>> crash> dis -l pool_io_hints+0x45
>>>> /root/snitm/git/linux/drivers/md/dm-thin.c: 2748
>>>> 0xffffffffc0765165 <pool_io_hints+69>: div %rdi
>>>>
>>>> Which is drivers/md/dm-thin.c:is_factor()'s return
>>>> !sector_div(block_size, n);
>>>>
>>>> SO looking at pool_io_hints() it would seem limits->max_sectors is 0 for
>>>> this xfstests device... why would that be!?
>>>>
>>>> Clearly pool_io_hints() could stand to be more defensive with a
>>>> !limits->max_sectors negative check but is it ever really valid for
>>>> max_sectors to be 0?
>>>>
>>>> Pretty sure the ultimate bug is outside DM (but not seeing an obvious
>>>> place where block core would set max_sectors to 0, all blk-settings.c
>>>> uses min_not_zero(), etc).
>>>
>>> I successfully ran this test against the linux-dm.git
>>> "for-4.17/dm-changes" tag that Linus merged after the block changes:
>>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-4.17/dm-changes
>>>
>>> # ./check tests/generic/347
>>> FSTYP -- ext4
>>> PLATFORM -- Linux/x86_64 thegoat 4.16.0-rc5.snitm
>>> MKFS_OPTIONS -- /dev/mapper/test-xfstests_scratch
>>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/test-xfstests_scratch /scratch
>>>
>>> generic/347 65s
>>> Ran: generic/347
>>> Passed all 1 tests
>>>
>>> SO this would seem to implicate some regression in the 4.17 block layer
>>> changes.
>>
>> No immediate ideas come to mind, we didn't have a lot of changes and I
>> don't see anything that looks problematic. Maybe you can try and
>> bisect it and see what you come up with?
>
> I ran it, problematic commit is:
>
> commit 3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91
> Author: Kees Cook <keescook@chromium.org>
> Date: Fri Mar 30 18:52:36 2018 -0700
>
> kernel.h: Retain constant expression output for max()/min()
>
The fun continues. Thinking I'd try a userspace repro and thinking it
would be difficult to reproduce, try the attached min.c that just copies
all the bits from include/linux/kernel.h
axboe@x1:~ $ gcc -Wall -O2 -o min min.c
axboe@x1:~ $ ./min 128 256
min_not_zero(128, 256) = 0
--
Jens Axboe
[-- Attachment #2: min.c --]
[-- Type: text/x-csrc, Size: 1112 bytes --]
#include <stdio.h>
#include <stdlib.h>
#define __is_constexpr(x) \
(sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))
#define __no_side_effects(x, y) \
(__is_constexpr(x) && __is_constexpr(y))
#define __typecheck(x, y) \
(!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
#define __safe_cmp(x, y) \
(__typecheck(x, y) && __no_side_effects(x, y))
#define __cmp(x, y, op) ((x) op (y) ? (x) : (y))
#define __cmp_once(x, y, op) ({ \
typeof(x) __x = (x); \
typeof(y) __y = (y); \
__cmp(__x, __y, op); })
#define __careful_cmp(x, y, op) \
__builtin_choose_expr(__safe_cmp(x, y), \
__cmp(x, y, op), __cmp_once(x, y, op))
#define min(x, y) __careful_cmp(x, y, <)
#define min_not_zero(x, y) ({ \
typeof(x) __x = (x); \
typeof(y) __y = (y); \
__x == 0 ? __y : ((__y == 0) ? __x : min(__x, __y)); })
int main(int argc, char *argv[])
{
int val1, val2;
if (argc < 3) {
printf("%s val1 val2\n", argv[0]);
return 1;
}
val1 = atoi(argv[1]);
val2 = atoi(argv[2]);
printf("min_not_zero(%d, %d) = %d\n", val1, val2, min_not_zero(val1, val2));
return 0;
}
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]
2018-04-09 21:56 ` Jens Axboe
@ 2018-04-09 22:05 ` Kees Cook
2018-04-09 22:10 ` Jens Axboe
2018-04-09 22:11 ` Linus Torvalds
1 sibling, 1 reply; 14+ messages in thread
From: Kees Cook @ 2018-04-09 22:05 UTC (permalink / raw)
To: Jens Axboe
Cc: Mike Snitzer, Ming Lei, dm-devel, linux-block, Linus Torvalds,
Chris Mason
On Mon, Apr 9, 2018 at 2:56 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 4/9/18 3:26 PM, Jens Axboe wrote:
>> On 4/9/18 1:32 PM, Jens Axboe wrote:
>>> On 4/9/18 12:38 PM, Mike Snitzer wrote:
>>>> On Mon, Apr 09 2018 at 11:51am -0400,
>>>> Mike Snitzer <snitzer@redhat.com> wrote:
>>>>
>>>>> On Sun, Apr 08 2018 at 12:00am -0400,
>>>>> Ming Lei <ming.lei@redhat.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The following kernel oops(divide error) is triggered when running
>>>>>> xfstest(generic/347) on ext4.
>>>>>>
>>>>>> [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44
>>>>>> [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI
>>>>>> [ 443.840201] Dumping ftrace buffer:
>>>>>> [ 443.840692] (ftrace buffer empty)
>>>> ...
>>>>>> [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted 4.16.0_f605ba97fb80_master+ #1
>>>>>> [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
>>>>>> [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool]
>>>>
>>>> ...
>>>>
>>>>> I was able to reproduce (in my case RIP was pool_io_hints+0x45)
>>>>>
>>>>> Which on my kernel, is:
>>>>>
>>>>> crash> dis -l pool_io_hints+0x45
>>>>> /root/snitm/git/linux/drivers/md/dm-thin.c: 2748
>>>>> 0xffffffffc0765165 <pool_io_hints+69>: div %rdi
>>>>>
>>>>> Which is drivers/md/dm-thin.c:is_factor()'s return
>>>>> !sector_div(block_size, n);
>>>>>
>>>>> SO looking at pool_io_hints() it would seem limits->max_sectors is 0 for
>>>>> this xfstests device... why would that be!?
>>>>>
>>>>> Clearly pool_io_hints() could stand to be more defensive with a
>>>>> !limits->max_sectors negative check but is it ever really valid for
>>>>> max_sectors to be 0?
>>>>>
>>>>> Pretty sure the ultimate bug is outside DM (but not seeing an obvious
>>>>> place where block core would set max_sectors to 0, all blk-settings.c
>>>>> uses min_not_zero(), etc).
>>>>
>>>> I successfully ran this test against the linux-dm.git
>>>> "for-4.17/dm-changes" tag that Linus merged after the block changes:
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-4.17/dm-changes
>>>>
>>>> # ./check tests/generic/347
>>>> FSTYP -- ext4
>>>> PLATFORM -- Linux/x86_64 thegoat 4.16.0-rc5.snitm
>>>> MKFS_OPTIONS -- /dev/mapper/test-xfstests_scratch
>>>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/test-xfstests_scratch /scratch
>>>>
>>>> generic/347 65s
>>>> Ran: generic/347
>>>> Passed all 1 tests
>>>>
>>>> SO this would seem to implicate some regression in the 4.17 block layer
>>>> changes.
>>>
>>> No immediate ideas come to mind, we didn't have a lot of changes and I
>>> don't see anything that looks problematic. Maybe you can try and
>>> bisect it and see what you come up with?
>>
>> I ran it, problematic commit is:
>>
>> commit 3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91
>> Author: Kees Cook <keescook@chromium.org>
>> Date: Fri Mar 30 18:52:36 2018 -0700
>>
>> kernel.h: Retain constant expression output for max()/min()
>>
>
> The fun continues. Thinking I'd try a userspace repro and thinking it
> would be difficult to reproduce, try the attached min.c that just copies
> all the bits from include/linux/kernel.h
>
> axboe@x1:~ $ gcc -Wall -O2 -o min min.c
> axboe@x1:~ $ ./min 128 256
> min_not_zero(128, 256) = 0
This should be fixed with e9092d0d9796 ("Fix subtle macro variable
shadowing in min_not_zero()").
-Kees
--
Kees Cook
Pixel Security
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]
2018-04-09 22:05 ` Kees Cook
@ 2018-04-09 22:10 ` Jens Axboe
2018-04-09 22:27 ` Ming Lei
0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2018-04-09 22:10 UTC (permalink / raw)
To: Kees Cook
Cc: Mike Snitzer, Ming Lei, dm-devel, linux-block, Linus Torvalds,
Chris Mason
On 4/9/18 4:05 PM, Kees Cook wrote:
> On Mon, Apr 9, 2018 at 2:56 PM, Jens Axboe <axboe@kernel.dk> wrote:
>> On 4/9/18 3:26 PM, Jens Axboe wrote:
>>> On 4/9/18 1:32 PM, Jens Axboe wrote:
>>>> On 4/9/18 12:38 PM, Mike Snitzer wrote:
>>>>> On Mon, Apr 09 2018 at 11:51am -0400,
>>>>> Mike Snitzer <snitzer@redhat.com> wrote:
>>>>>
>>>>>> On Sun, Apr 08 2018 at 12:00am -0400,
>>>>>> Ming Lei <ming.lei@redhat.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> The following kernel oops(divide error) is triggered when running
>>>>>>> xfstest(generic/347) on ext4.
>>>>>>>
>>>>>>> [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44
>>>>>>> [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI
>>>>>>> [ 443.840201] Dumping ftrace buffer:
>>>>>>> [ 443.840692] (ftrace buffer empty)
>>>>> ...
>>>>>>> [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted 4.16.0_f605ba97fb80_master+ #1
>>>>>>> [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
>>>>>>> [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool]
>>>>>
>>>>> ...
>>>>>
>>>>>> I was able to reproduce (in my case RIP was pool_io_hints+0x45)
>>>>>>
>>>>>> Which on my kernel, is:
>>>>>>
>>>>>> crash> dis -l pool_io_hints+0x45
>>>>>> /root/snitm/git/linux/drivers/md/dm-thin.c: 2748
>>>>>> 0xffffffffc0765165 <pool_io_hints+69>: div %rdi
>>>>>>
>>>>>> Which is drivers/md/dm-thin.c:is_factor()'s return
>>>>>> !sector_div(block_size, n);
>>>>>>
>>>>>> SO looking at pool_io_hints() it would seem limits->max_sectors is 0 for
>>>>>> this xfstests device... why would that be!?
>>>>>>
>>>>>> Clearly pool_io_hints() could stand to be more defensive with a
>>>>>> !limits->max_sectors negative check but is it ever really valid for
>>>>>> max_sectors to be 0?
>>>>>>
>>>>>> Pretty sure the ultimate bug is outside DM (but not seeing an obvious
>>>>>> place where block core would set max_sectors to 0, all blk-settings.c
>>>>>> uses min_not_zero(), etc).
>>>>>
>>>>> I successfully ran this test against the linux-dm.git
>>>>> "for-4.17/dm-changes" tag that Linus merged after the block changes:
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-4.17/dm-changes
>>>>>
>>>>> # ./check tests/generic/347
>>>>> FSTYP -- ext4
>>>>> PLATFORM -- Linux/x86_64 thegoat 4.16.0-rc5.snitm
>>>>> MKFS_OPTIONS -- /dev/mapper/test-xfstests_scratch
>>>>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/test-xfstests_scratch /scratch
>>>>>
>>>>> generic/347 65s
>>>>> Ran: generic/347
>>>>> Passed all 1 tests
>>>>>
>>>>> SO this would seem to implicate some regression in the 4.17 block layer
>>>>> changes.
>>>>
>>>> No immediate ideas come to mind, we didn't have a lot of changes and I
>>>> don't see anything that looks problematic. Maybe you can try and
>>>> bisect it and see what you come up with?
>>>
>>> I ran it, problematic commit is:
>>>
>>> commit 3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91
>>> Author: Kees Cook <keescook@chromium.org>
>>> Date: Fri Mar 30 18:52:36 2018 -0700
>>>
>>> kernel.h: Retain constant expression output for max()/min()
>>>
>>
>> The fun continues. Thinking I'd try a userspace repro and thinking it
>> would be difficult to reproduce, try the attached min.c that just copies
>> all the bits from include/linux/kernel.h
>>
>> axboe@x1:~ $ gcc -Wall -O2 -o min min.c
>> axboe@x1:~ $ ./min 128 256
>> min_not_zero(128, 256) = 0
>
> This should be fixed with e9092d0d9796 ("Fix subtle macro variable
> shadowing in min_not_zero()").
Yep that works, which is a relief. Some basic unit testing would have
been very appropriate in this case, given how fundamentally broken it
was... It's amazing nothing catastrophic happened.
--
Jens Axboe
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]
2018-04-09 21:56 ` Jens Axboe
2018-04-09 22:05 ` Kees Cook
@ 2018-04-09 22:11 ` Linus Torvalds
1 sibling, 0 replies; 14+ messages in thread
From: Linus Torvalds @ 2018-04-09 22:11 UTC (permalink / raw)
To: Jens Axboe
Cc: Mike Snitzer, Ming Lei, dm-devel, linux-block, Kees Cook,
Chris Mason
[-- Attachment #1: Type: text/plain, Size: 3640 bytes --]
On mobile, sorry for html crud and top posting, but here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9092d0d97961146655ce51f43850907d95f68c3
Should fix it.
Linus
On Mon, Apr 9, 2018, 14:56 Jens Axboe <axboe@kernel.dk> wrote:
> On 4/9/18 3:26 PM, Jens Axboe wrote:
> > On 4/9/18 1:32 PM, Jens Axboe wrote:
> >> On 4/9/18 12:38 PM, Mike Snitzer wrote:
> >>> On Mon, Apr 09 2018 at 11:51am -0400,
> >>> Mike Snitzer <snitzer@redhat.com> wrote:
> >>>
> >>>> On Sun, Apr 08 2018 at 12:00am -0400,
> >>>> Ming Lei <ming.lei@redhat.com> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> The following kernel oops(divide error) is triggered when running
> >>>>> xfstest(generic/347) on ext4.
> >>>>>
> >>>>> [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44
> >>>>> [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI
> >>>>> [ 443.840201] Dumping ftrace buffer:
> >>>>> [ 443.840692] (ftrace buffer empty)
> >>> ...
> >>>>> [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted
> 4.16.0_f605ba97fb80_master+ #1
> >>>>> [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS 1.10.2-2.fc27 04/01/2014
> >>>>> [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool]
> >>>
> >>> ...
> >>>
> >>>> I was able to reproduce (in my case RIP was pool_io_hints+0x45)
> >>>>
> >>>> Which on my kernel, is:
> >>>>
> >>>> crash> dis -l pool_io_hints+0x45
> >>>> /root/snitm/git/linux/drivers/md/dm-thin.c: 2748
> >>>> 0xffffffffc0765165 <pool_io_hints+69>: div %rdi
> >>>>
> >>>> Which is drivers/md/dm-thin.c:is_factor()'s return
> >>>> !sector_div(block_size, n);
> >>>>
> >>>> SO looking at pool_io_hints() it would seem limits->max_sectors is 0
> for
> >>>> this xfstests device... why would that be!?
> >>>>
> >>>> Clearly pool_io_hints() could stand to be more defensive with a
> >>>> !limits->max_sectors negative check but is it ever really valid for
> >>>> max_sectors to be 0?
> >>>>
> >>>> Pretty sure the ultimate bug is outside DM (but not seeing an obvious
> >>>> place where block core would set max_sectors to 0, all blk-settings.c
> >>>> uses min_not_zero(), etc).
> >>>
> >>> I successfully ran this test against the linux-dm.git
> >>> "for-4.17/dm-changes" tag that Linus merged after the block changes:
> >>> git://
> git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git
> tags/for-4.17/dm-changes
> >>>
> >>> # ./check tests/generic/347
> >>> FSTYP -- ext4
> >>> PLATFORM -- Linux/x86_64 thegoat 4.16.0-rc5.snitm
> >>> MKFS_OPTIONS -- /dev/mapper/test-xfstests_scratch
> >>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/test-xfstests_scratch
> /scratch
> >>>
> >>> generic/347 65s
> >>> Ran: generic/347
> >>> Passed all 1 tests
> >>>
> >>> SO this would seem to implicate some regression in the 4.17 block layer
> >>> changes.
> >>
> >> No immediate ideas come to mind, we didn't have a lot of changes and I
> >> don't see anything that looks problematic. Maybe you can try and
> >> bisect it and see what you come up with?
> >
> > I ran it, problematic commit is:
> >
> > commit 3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91
> > Author: Kees Cook <keescook@chromium.org>
> > Date: Fri Mar 30 18:52:36 2018 -0700
> >
> > kernel.h: Retain constant expression output for max()/min()
> >
>
> The fun continues. Thinking I'd try a userspace repro and thinking it
> would be difficult to reproduce, try the attached min.c that just copies
> all the bits from include/linux/kernel.h
>
> axboe@x1:~ $ gcc -Wall -O2 -o min min.c
> axboe@x1:~ $ ./min 128 256
> min_not_zero(128, 256) = 0
>
> --
> Jens Axboe
>
>
[-- Attachment #2: Type: text/html, Size: 5662 bytes --]
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]
2018-04-09 22:10 ` Jens Axboe
@ 2018-04-09 22:27 ` Ming Lei
2018-04-09 22:32 ` Jens Axboe
0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2018-04-09 22:27 UTC (permalink / raw)
To: Jens Axboe
Cc: Kees Cook, Mike Snitzer, dm-devel, linux-block, Linus Torvalds,
Chris Mason
On Mon, Apr 09, 2018 at 04:10:17PM -0600, Jens Axboe wrote:
> On 4/9/18 4:05 PM, Kees Cook wrote:
> > On Mon, Apr 9, 2018 at 2:56 PM, Jens Axboe <axboe@kernel.dk> wrote:
> >> On 4/9/18 3:26 PM, Jens Axboe wrote:
> >>> On 4/9/18 1:32 PM, Jens Axboe wrote:
> >>>> On 4/9/18 12:38 PM, Mike Snitzer wrote:
> >>>>> On Mon, Apr 09 2018 at 11:51am -0400,
> >>>>> Mike Snitzer <snitzer@redhat.com> wrote:
> >>>>>
> >>>>>> On Sun, Apr 08 2018 at 12:00am -0400,
> >>>>>> Ming Lei <ming.lei@redhat.com> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> The following kernel oops(divide error) is triggered when running
> >>>>>>> xfstest(generic/347) on ext4.
> >>>>>>>
> >>>>>>> [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44
> >>>>>>> [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI
> >>>>>>> [ 443.840201] Dumping ftrace buffer:
> >>>>>>> [ 443.840692] (ftrace buffer empty)
> >>>>> ...
> >>>>>>> [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted 4.16.0_f605ba97fb80_master+ #1
> >>>>>>> [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
> >>>>>>> [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool]
> >>>>>
> >>>>> ...
> >>>>>
> >>>>>> I was able to reproduce (in my case RIP was pool_io_hints+0x45)
> >>>>>>
> >>>>>> Which on my kernel, is:
> >>>>>>
> >>>>>> crash> dis -l pool_io_hints+0x45
> >>>>>> /root/snitm/git/linux/drivers/md/dm-thin.c: 2748
> >>>>>> 0xffffffffc0765165 <pool_io_hints+69>: div %rdi
> >>>>>>
> >>>>>> Which is drivers/md/dm-thin.c:is_factor()'s return
> >>>>>> !sector_div(block_size, n);
> >>>>>>
> >>>>>> SO looking at pool_io_hints() it would seem limits->max_sectors is 0 for
> >>>>>> this xfstests device... why would that be!?
> >>>>>>
> >>>>>> Clearly pool_io_hints() could stand to be more defensive with a
> >>>>>> !limits->max_sectors negative check but is it ever really valid for
> >>>>>> max_sectors to be 0?
> >>>>>>
> >>>>>> Pretty sure the ultimate bug is outside DM (but not seeing an obvious
> >>>>>> place where block core would set max_sectors to 0, all blk-settings.c
> >>>>>> uses min_not_zero(), etc).
> >>>>>
> >>>>> I successfully ran this test against the linux-dm.git
> >>>>> "for-4.17/dm-changes" tag that Linus merged after the block changes:
> >>>>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-4.17/dm-changes
> >>>>>
> >>>>> # ./check tests/generic/347
> >>>>> FSTYP -- ext4
> >>>>> PLATFORM -- Linux/x86_64 thegoat 4.16.0-rc5.snitm
> >>>>> MKFS_OPTIONS -- /dev/mapper/test-xfstests_scratch
> >>>>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/test-xfstests_scratch /scratch
> >>>>>
> >>>>> generic/347 65s
> >>>>> Ran: generic/347
> >>>>> Passed all 1 tests
> >>>>>
> >>>>> SO this would seem to implicate some regression in the 4.17 block layer
> >>>>> changes.
> >>>>
> >>>> No immediate ideas come to mind, we didn't have a lot of changes and I
> >>>> don't see anything that looks problematic. Maybe you can try and
> >>>> bisect it and see what you come up with?
> >>>
> >>> I ran it, problematic commit is:
> >>>
> >>> commit 3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91
> >>> Author: Kees Cook <keescook@chromium.org>
> >>> Date: Fri Mar 30 18:52:36 2018 -0700
> >>>
> >>> kernel.h: Retain constant expression output for max()/min()
> >>>
> >>
> >> The fun continues. Thinking I'd try a userspace repro and thinking it
> >> would be difficult to reproduce, try the attached min.c that just copies
> >> all the bits from include/linux/kernel.h
> >>
> >> axboe@x1:~ $ gcc -Wall -O2 -o min min.c
> >> axboe@x1:~ $ ./min 128 256
> >> min_not_zero(128, 256) = 0
> >
> > This should be fixed with e9092d0d9796 ("Fix subtle macro variable
> > shadowing in min_not_zero()").
>
> Yep that works, which is a relief. Some basic unit testing would have
> been very appropriate in this case, given how fundamentally broken it
> was... It's amazing nothing catastrophic happened.
Actually, there was, :-)
https://lkml.org/lkml/2018/4/9/355
--
Ming
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]
2018-04-09 22:27 ` Ming Lei
@ 2018-04-09 22:32 ` Jens Axboe
2018-04-09 22:38 ` Kees Cook
2018-04-09 23:54 ` Linus Torvalds
0 siblings, 2 replies; 14+ messages in thread
From: Jens Axboe @ 2018-04-09 22:32 UTC (permalink / raw)
To: Ming Lei
Cc: Kees Cook, Mike Snitzer, dm-devel, linux-block, Linus Torvalds,
Chris Mason
On 4/9/18 4:27 PM, Ming Lei wrote:
> On Mon, Apr 09, 2018 at 04:10:17PM -0600, Jens Axboe wrote:
>> On 4/9/18 4:05 PM, Kees Cook wrote:
>>> On Mon, Apr 9, 2018 at 2:56 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>>> On 4/9/18 3:26 PM, Jens Axboe wrote:
>>>>> On 4/9/18 1:32 PM, Jens Axboe wrote:
>>>>>> On 4/9/18 12:38 PM, Mike Snitzer wrote:
>>>>>>> On Mon, Apr 09 2018 at 11:51am -0400,
>>>>>>> Mike Snitzer <snitzer@redhat.com> wrote:
>>>>>>>
>>>>>>>> On Sun, Apr 08 2018 at 12:00am -0400,
>>>>>>>> Ming Lei <ming.lei@redhat.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> The following kernel oops(divide error) is triggered when running
>>>>>>>>> xfstest(generic/347) on ext4.
>>>>>>>>>
>>>>>>>>> [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44
>>>>>>>>> [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI
>>>>>>>>> [ 443.840201] Dumping ftrace buffer:
>>>>>>>>> [ 443.840692] (ftrace buffer empty)
>>>>>>> ...
>>>>>>>>> [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted 4.16.0_f605ba97fb80_master+ #1
>>>>>>>>> [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
>>>>>>>>> [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool]
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>> I was able to reproduce (in my case RIP was pool_io_hints+0x45)
>>>>>>>>
>>>>>>>> Which on my kernel, is:
>>>>>>>>
>>>>>>>> crash> dis -l pool_io_hints+0x45
>>>>>>>> /root/snitm/git/linux/drivers/md/dm-thin.c: 2748
>>>>>>>> 0xffffffffc0765165 <pool_io_hints+69>: div %rdi
>>>>>>>>
>>>>>>>> Which is drivers/md/dm-thin.c:is_factor()'s return
>>>>>>>> !sector_div(block_size, n);
>>>>>>>>
>>>>>>>> SO looking at pool_io_hints() it would seem limits->max_sectors is 0 for
>>>>>>>> this xfstests device... why would that be!?
>>>>>>>>
>>>>>>>> Clearly pool_io_hints() could stand to be more defensive with a
>>>>>>>> !limits->max_sectors negative check but is it ever really valid for
>>>>>>>> max_sectors to be 0?
>>>>>>>>
>>>>>>>> Pretty sure the ultimate bug is outside DM (but not seeing an obvious
>>>>>>>> place where block core would set max_sectors to 0, all blk-settings.c
>>>>>>>> uses min_not_zero(), etc).
>>>>>>>
>>>>>>> I successfully ran this test against the linux-dm.git
>>>>>>> "for-4.17/dm-changes" tag that Linus merged after the block changes:
>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-4.17/dm-changes
>>>>>>>
>>>>>>> # ./check tests/generic/347
>>>>>>> FSTYP -- ext4
>>>>>>> PLATFORM -- Linux/x86_64 thegoat 4.16.0-rc5.snitm
>>>>>>> MKFS_OPTIONS -- /dev/mapper/test-xfstests_scratch
>>>>>>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/test-xfstests_scratch /scratch
>>>>>>>
>>>>>>> generic/347 65s
>>>>>>> Ran: generic/347
>>>>>>> Passed all 1 tests
>>>>>>>
>>>>>>> SO this would seem to implicate some regression in the 4.17 block layer
>>>>>>> changes.
>>>>>>
>>>>>> No immediate ideas come to mind, we didn't have a lot of changes and I
>>>>>> don't see anything that looks problematic. Maybe you can try and
>>>>>> bisect it and see what you come up with?
>>>>>
>>>>> I ran it, problematic commit is:
>>>>>
>>>>> commit 3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91
>>>>> Author: Kees Cook <keescook@chromium.org>
>>>>> Date: Fri Mar 30 18:52:36 2018 -0700
>>>>>
>>>>> kernel.h: Retain constant expression output for max()/min()
>>>>>
>>>>
>>>> The fun continues. Thinking I'd try a userspace repro and thinking it
>>>> would be difficult to reproduce, try the attached min.c that just copies
>>>> all the bits from include/linux/kernel.h
>>>>
>>>> axboe@x1:~ $ gcc -Wall -O2 -o min min.c
>>>> axboe@x1:~ $ ./min 128 256
>>>> min_not_zero(128, 256) = 0
>>>
>>> This should be fixed with e9092d0d9796 ("Fix subtle macro variable
>>> shadowing in min_not_zero()").
>>
>> Yep that works, which is a relief. Some basic unit testing would have
>> been very appropriate in this case, given how fundamentally broken it
>> was... It's amazing nothing catastrophic happened.
>
> Actually, there was, :-)
>
> https://lkml.org/lkml/2018/4/9/355
That's bad, for sure, but my worry was bigger than an oops or crash,
we could have had corruption due to this.
The resulting min/max and friends would have been trivial to test, but
clearly they weren't.
--
Jens Axboe
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]
2018-04-09 22:32 ` Jens Axboe
@ 2018-04-09 22:38 ` Kees Cook
2018-04-09 23:01 ` Jens Axboe
2018-04-09 23:54 ` Linus Torvalds
1 sibling, 1 reply; 14+ messages in thread
From: Kees Cook @ 2018-04-09 22:38 UTC (permalink / raw)
To: Jens Axboe, Linus Torvalds
Cc: Ming Lei, Mike Snitzer, dm-devel, linux-block, Chris Mason
On Mon, Apr 9, 2018 at 3:32 PM, Jens Axboe <axboe@kernel.dk> wrote:
> That's bad, for sure, but my worry was bigger than an oops or crash,
> we could have had corruption due to this.
>
> The resulting min/max and friends would have been trivial to test, but
> clearly they weren't.
Yeah, that was bad luck and my fault: I tested min(), max(), min_t(),
and max_t(). My assumption was that since the others were built from
them, they'd be fine. Not true in this shadow variable case, though.
:( We could do something like this, which would have caught it:
diff --git a/init/main.c b/init/main.c
index e4a3160991ea..ce46afc53b8b 100644
--- a/init/main.c
+++ b/init/main.c
@@ -993,10 +993,32 @@ static inline void mark_readonly(void)
}
#endif
+static inline void compiletime_sanity_checks(void)
+{
+ /* Sanity-check min()/max() family of macros. */
+ BUILD_BUG_ON(min(5, 50) != 5);
+ BUILD_BUG_ON(max(5, 50) != 50);
+ BUILD_BUG_ON(min_t(int, (size_t)-1 , 50) != -1);
+ BUILD_BUG_ON(max_t(size_t, -1 , 50) != (size_t)-1);
+ BUILD_BUG_ON(min3(-50, 0, 1000) != -50);
+ BUILD_BUG_ON(max3(-50, 0, 1000) != 1000);
+ BUILD_BUG_ON(min_not_zero(0, 20) != 20);
+ BUILD_BUG_ON(min_not_zero(30, 0) != 30);
+ BUILD_BUG_ON(min_not_zero(150, 40) != 40);
+ BUILD_BUG_ON(clamp(20, 1, 7) != 7);
+ BUILD_BUG_ON(clamp(40, 20, 100) != 40);
+ BUILD_BUG_ON(clamp(1, 20, 100) != 20);
+ BUILD_BUG_ON(clamp_t(int, -5, (size_t)-1, 100) != -1);
+ BUILD_BUG_ON(clamp_t(int, -1, (size_t)-5, 100) != -1);
+ BUILD_BUG_ON(clamp_t(size_t, -10, 1, -50) != -50);
+}
+
static int __ref kernel_init(void *unused)
{
int ret;
+ compiletime_sanity_checks();
+
kernel_init_freeable();
/* need to finish all async __init code before freeing the memory */
async_synchronize_full();
--
Kees Cook
Pixel Security
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]
2018-04-09 22:38 ` Kees Cook
@ 2018-04-09 23:01 ` Jens Axboe
0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2018-04-09 23:01 UTC (permalink / raw)
To: Kees Cook, Linus Torvalds
Cc: Ming Lei, Mike Snitzer, dm-devel, linux-block, Chris Mason
On 4/9/18 4:38 PM, Kees Cook wrote:
> On Mon, Apr 9, 2018 at 3:32 PM, Jens Axboe <axboe@kernel.dk> wrote:
>> That's bad, for sure, but my worry was bigger than an oops or crash,
>> we could have had corruption due to this.
>>
>> The resulting min/max and friends would have been trivial to test, but
>> clearly they weren't.
>
> Yeah, that was bad luck and my fault: I tested min(), max(), min_t(),
It's only bad luck if it was tested :-)
> and max_t(). My assumption was that since the others were built from
> them, they'd be fine. Not true in this shadow variable case, though.
> :( We could do something like this, which would have caught it:
Might not hurt to do.
--
Jens Axboe
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]
2018-04-09 22:32 ` Jens Axboe
2018-04-09 22:38 ` Kees Cook
@ 2018-04-09 23:54 ` Linus Torvalds
2018-04-10 0:31 ` Jens Axboe
1 sibling, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2018-04-09 23:54 UTC (permalink / raw)
To: Jens Axboe
Cc: Ming Lei, Kees Cook, Mike Snitzer, dm-devel, linux-block,
Chris Mason
On Mon, Apr 9, 2018 at 3:32 PM, Jens Axboe <axboe@kernel.dk> wrote:
>
> The resulting min/max and friends would have been trivial to test, but
> clearly they weren't.
Well, the min/max macros themselves actually were tested in user space by me.
It was the interaction with the unrelated "min_not_zero()" that wasn't ;)
It's easy in hind-sight to say "that's not at all unrelated", but
within the context of doing min/max, it was.
Linus
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]
2018-04-09 23:54 ` Linus Torvalds
@ 2018-04-10 0:31 ` Jens Axboe
0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2018-04-10 0:31 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ming Lei, Kees Cook, Mike Snitzer, dm-devel, linux-block,
Chris Mason
On 4/9/18 5:54 PM, Linus Torvalds wrote:
> On Mon, Apr 9, 2018 at 3:32 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> The resulting min/max and friends would have been trivial to test, but
>> clearly they weren't.
>
> Well, the min/max macros themselves actually were tested in user space by me.
>
> It was the interaction with the unrelated "min_not_zero()" that wasn't ;)
>
> It's easy in hind-sight to say "that's not at all unrelated", but
> within the context of doing min/max, it was.
I guess we should just be thankful that it didn't cause bigger issues
than an oops.
--
Jens Axboe
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2018-04-10 0:31 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20180408040005.GA19128@ming.t460p>
2018-04-09 15:51 ` limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+] Mike Snitzer
2018-04-09 18:38 ` Mike Snitzer
2018-04-09 19:32 ` Jens Axboe
2018-04-09 21:26 ` Jens Axboe
2018-04-09 21:56 ` Jens Axboe
2018-04-09 22:05 ` Kees Cook
2018-04-09 22:10 ` Jens Axboe
2018-04-09 22:27 ` Ming Lei
2018-04-09 22:32 ` Jens Axboe
2018-04-09 22:38 ` Kees Cook
2018-04-09 23:01 ` Jens Axboe
2018-04-09 23:54 ` Linus Torvalds
2018-04-10 0:31 ` Jens Axboe
2018-04-09 22:11 ` Linus Torvalds
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).