From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
MIME-Version: 1.0
References: <20180408040005.GA19128@ming.t460p> <20180409155120.GA10990@redhat.com> <20180409183836.GA11256@redhat.com> <70f1d349-f091-19a2-9ec6-978ac1e7dda0@kernel.dk> <78732cb7-3666-e1b0-e6d3-2a3409a0d4ca@kernel.dk>
In-Reply-To: <78732cb7-3666-e1b0-e6d3-2a3409a0d4ca@kernel.dk>
From: Linus Torvalds
Date: Mon, 09 Apr 2018 22:11:55 +0000
Message-ID:
Subject: Re: limits->max_sectors is getting set to 0, why/where? [was: Re: dm: kernel oops by divide error on v4.16+]
To: Jens Axboe
Cc: Mike Snitzer, Ming Lei, dm-devel@redhat.com, linux-block@vger.kernel.org, Kees Cook, Chris Mason
Content-Type: multipart/alternative; boundary="00000000000035df8d056971b20f"
List-ID:

--00000000000035df8d056971b20f
Content-Type: text/plain; charset="UTF-8"

On mobile, sorry for html crud and top posting, but here:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9092d0d97961146655ce51f43850907d95f68c3

Should fix it.

Linus

On Mon, Apr 9, 2018, 14:56 Jens Axboe wrote:

> On 4/9/18 3:26 PM, Jens Axboe wrote:
> > On 4/9/18 1:32 PM, Jens Axboe wrote:
> >> On 4/9/18 12:38 PM, Mike Snitzer wrote:
> >>> On Mon, Apr 09 2018 at 11:51am -0400,
> >>> Mike Snitzer <snitzer@redhat.com> wrote:
> >>>
> >>>> On Sun, Apr 08 2018 at 12:00am -0400,
> >>>> Ming Lei <ming.lei@redhat.com> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> The following kernel oops(divide error) is triggered when running
> >>>>> xfstest(generic/347) on ext4.
> >>>>>
> >>>>> [ 442.632954] run fstests generic/347 at 2018-04-07 18:06:44
> >>>>> [ 443.839480] divide error: 0000 [#1] PREEMPT SMP PTI
> >>>>> [ 443.840201] Dumping ftrace buffer:
> >>>>> [ 443.840692] (ftrace buffer empty)
> >>> ...
> >>>>> [ 443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted 4.16.0_f605ba97fb80_master+ #1
> >>>>> [ 443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
> >>>>> [ 443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool]
> >>>
> >>> ...
> >>>
> >>>> I was able to reproduce (in my case RIP was pool_io_hints+0x45)
> >>>>
> >>>> Which on my kernel, is:
> >>>>
> >>>> crash> dis -l pool_io_hints+0x45
> >>>> /root/snitm/git/linux/drivers/md/dm-thin.c: 2748
> >>>> 0xffffffffc0765165 <pool_io_hints+69>: div %rdi
> >>>>
> >>>> Which is drivers/md/dm-thin.c:is_factor()'s return
> >>>> !sector_div(block_size, n);
> >>>>
> >>>> SO looking at pool_io_hints() it would seem limits->max_sectors is 0 for
> >>>> this xfstests device... why would that be!?
> >>>>
> >>>> Clearly pool_io_hints() could stand to be more defensive with a
> >>>> !limits->max_sectors negative check but is it ever really valid for
> >>>> max_sectors to be 0?
> >>>>
> >>>> Pretty sure the ultimate bug is outside DM (but not seeing an obvious
> >>>> place where block core would set max_sectors to 0, all blk-settings.c
> >>>> uses min_not_zero(), etc).
> >>>
> >>> I successfully ran this test against the linux-dm.git
> >>> "for-4.17/dm-changes" tag that Linus merged after the block changes:
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-4.17/dm-changes
> >>>
> >>> # ./check tests/generic/347
> >>> FSTYP -- ext4
> >>> PLATFORM -- Linux/x86_64 thegoat 4.16.0-rc5.snitm
> >>> MKFS_OPTIONS -- /dev/mapper/test-xfstests_scratch
> >>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/test-xfstests_scratch /scratch
> >>>
> >>> generic/347 65s
> >>> Ran: generic/347
> >>> Passed all 1 tests
> >>>
> >>> SO this would seem to implicate some regression in the 4.17 block layer
> >>> changes.
> >>
> >> No immediate ideas come to mind, we didn't have a lot of changes and I
> >> don't see anything that looks problematic. Maybe you can try and
> >> bisect it and see what you come up with?
> >
> > I ran it, problematic commit is:
> >
> > commit 3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91
> > Author: Kees Cook <keescook@chromium.org>
> > Date: Fri Mar 30 18:52:36 2018 -0700
> >
> > kernel.h: Retain constant expression output for max()/min()
> >
>
> The fun continues. Thinking I'd try a userspace repro and thinking it
> would be difficult to reproduce, try the attached min.c that just copies
> all the bits from include/linux/kernel.h
>
> axboe@x1:~ $ gcc -Wall -O2 -o min min.c
> axboe@x1:~ $ ./min 128 256
> min_not_zero(128, 256) = 0
>
> --
> Jens Axboe
>
>

--00000000000035df8d056971b20f
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
On mobile, sorry for html crud and top posting, but here:
On 4/9/18 3:26 PM, Jens Axboe wrote:
> On 4/9/18 1:32 PM, Jens Axboe wrote:
>> On 4/9/18 12:38 PM, Mike Snitzer wrote:
>>> On Mon, Apr 09 2018 at 11:51am -0400,
>>> Mike Snitzer <snitzer@redhat.com> wrote:
>>>
>>>> On Sun, Apr 08 2018 at 12:00am -0400,
>>>> Ming Lei <ming.lei@redhat.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> The following kernel oops(divide error) is triggered when running
>>>>> xfstest(generic/347) on ext4.
>>>>>
>>>>> [  442.632954] run fstests generic/347 at 2018-04-07 18:06:44
>>>>> [  443.839480] divide error: 0000 [#1] PREEMPT SMP PTI
>>>>> [  443.840201] Dumping ftrace buffer:
>>>>> [  443.840692]    (ftrace buffer empty)
>>> ...
>>>>> [  443.845756] CPU: 1 PID: 29607 Comm: dmsetup Not tainted 4.16.0_f605ba97fb80_master+ #1
>>>>> [  443.846968] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
>>>>> [  443.848147] RIP: 0010:pool_io_hints+0x77/0x153 [dm_thin_pool]
>>>
>>> ...
>>>
>>>> I was able to reproduce (in my case RIP was pool_io_hints+0x45)
>>>>
>>>> Which on my kernel, is:
>>>>
>>>> crash> dis -l pool_io_hints+0x45
>>>> /root/snitm/git/linux/drivers/md/dm-thin.c: 2748
>>>> 0xffffffffc0765165 <pool_io_hints+69>:  div    %rdi
>>>>
>>>> Which is drivers/md/dm-thin.c:is_factor()'s return
>>>> !sector_div(block_size, n);
>>>>
>>>> SO looking at pool_io_hints() it would seem limits->max_sectors is 0 for
>>>> this xfstests device... why would that be!?
>>>>
>>>> Clearly pool_io_hints() could stand to be more defensive with a
>>>> !limits->max_sectors negative check but is it ever really valid for
>>>> max_sectors to be 0?
>>>>
>>>> Pretty sure the ultimate bug is outside DM (but not seeing an obvious
>>>> place where block core would set max_sectors to 0, all blk-settings.c
>>>> uses min_not_zero(), etc).
>>>
>>> I successfully ran this test against the linux-dm.git
>>> "for-4.17/dm-changes" tag that Linus merged after the block changes:
>>>  git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-4.17/dm-changes
>>>
>>> # ./check tests/generic/347
>>> FSTYP         -- ext4
>>> PLATFORM      -- Linux/x86_64 thegoat 4.16.0-rc5.snitm
>>> MKFS_OPTIONS  -- /dev/mapper/test-xfstests_scratch
>>> MOUNT_OPTIONS -- -o acl,user_xattr /dev/mapper/test-xfstests_scratch /scratch
>>>
>>> generic/347      65s
>>> Ran: generic/347
>>> Passed all 1 tests
>>>
>>> SO this would seem to implicate some regression in the 4.17 block layer
>>> changes.
>>
>> No immediate ideas come to mind, we didn't have a lot of changes and I
>> don't see anything that looks problematic. Maybe you can try and
>> bisect it and see what you come up with?
>
> I ran it, problematic commit is:
>
> commit 3c8ba0d61d04ced9f8d9ff93977995a9e4e96e91
> Author: Kees Cook <keescook@chromium.org>
> Date:   Fri Mar 30 18:52:36 2018 -0700
>
>     kernel.h: Retain constant expression output for max()/min()
>

The fun continues. Thinking I'd try a userspace repro and thinking it would be difficult to reproduce, try the attached min.c that just copies all the bits from include/linux/kernel.h

axboe@x1:~ $ gcc -Wall -O2 -o min min.c
axboe@x1:~ $ ./min 128 256
min_not_zero(128, 256) = 0
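
For comparison, the result one would expect from min_not_zero(128, 256) is 128. A minimal sketch of the intended semantics, written as a plain C function rather than the kernel macro. The name sketch_min_not_zero is hypothetical, and this is not the attached min.c, which is not reproduced in this message:

```c
/*
 * Hypothetical helper, not the kernel macro: returns the smaller of
 * two values while ignoring zeros, which is the behavior callers of
 * min_not_zero() rely on. Only if both arguments are 0 is 0 returned.
 */
static unsigned int sketch_min_not_zero(unsigned int x, unsigned int y)
{
	if (x == 0)
		return y;	/* x carries no limit, use y as-is */
	if (y == 0)
		return x;	/* y carries no limit, use x as-is */
	return x < y ? x : y;	/* both set: plain minimum */
}
```

Under these assumptions, sketch_min_not_zero(128, 256) evaluates to 128, and a zero argument is ignored in favor of the other one. That suggests the 0 printed by the repro comes from the macro expansion rather than from the inputs.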

--
Jens Axboe

--00000000000035df8d056971b20f--