* cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848
@ 2015-07-29 13:27 Josh Boyer
2015-07-29 13:51 ` Johannes Weiner
0 siblings, 1 reply; 23+ messages in thread
From: Josh Boyer @ 2015-07-29 13:27 UTC (permalink / raw)
To: Ming Lei, Tejun Heo, Johannes Weiner
Cc: Jens Axboe, linux-kernel@vger.kernel.org
Hi All,
We've gotten a report[1] that all of the upcoming Fedora 23 install
images are failing on 32-bit VMs/machines. Looking at the first
instance of the oops, it seems to be a bad page state where a page
still charged to a cgroup is being freed. The oops output is below.
Has anyone seen this in their 32-bit testing at all? Thus far nobody
can recreate this on a 64-bit machine/VM.
josh
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1247382
[ 9.026738] systemd[1]: Switching root.
[ 9.036467] systemd-journald[149]: Received SIGTERM from PID 1 (systemd).
[ 9.082262] BUG: Bad page state in process kworker/u5:1 pfn:372ac
[ 9.083989] page:f3d32ae0 count:0 mapcount:0 mapping:f2252178 index:0x16a
[ 9.085755] flags: 0x40020021(locked|lru|mappedtodisk)
[ 9.087284] page dumped because: page still charged to cgroup
[ 9.088772] bad because of flags:
[ 9.089731] flags: 0x21(locked|lru)
[ 9.090818] page->mem_cgroup:f2c3e400
[ 9.091862] Modules linked in: loop nls_utf8 isofs 8021q garp stp
llc 8139too mrp 8139cp crc32_pclmul ata_generic crc32c_intel qxl
syscopyarea sysfillrect sysimgblt drm_kms_helper serio_raw mii
virtio_pci ttm pata_acpi drm scsi_dh_rdac scsi_dh_emc scsi_dh_alua
sunrpc dm_crypt dm_round_robin linear raid10 raid456 async_raid6_recov
async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0
iscsi_ibft iscsi_boot_sysfs floppy iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi squashfs cramfs edd dm_multipath
[ 9.104829] CPU: 0 PID: 745 Comm: kworker/u5:1 Not tainted
4.2.0-0.rc3.git4.1.fc23.i686 #1
[ 9.106987] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.7.5-20140709_153950- 04/01/2014
[ 9.109445] Workqueue: kloopd1 loop_queue_read_work [loop]
[ 9.110982] c0d439a7 af8cdfed 00000000 f6cfbd1c c0aa22c9 f3d32ae0
f6cfbd40 c054e30a
[ 9.113298] c0c6e4e0 f6dfc228 000372ac 00b13ce1 c0c7271d f2252178
00000000 f6cfbd60
[ 9.115562] c054eea9 f6cfbd5c 00000000 00000000 f3d32ae0 f3494000
40020021 f6cfbd8c
[ 9.117848] Call Trace:
[ 9.118738] [<c0aa22c9>] dump_stack+0x41/0x52
[ 9.120034] [<c054e30a>] bad_page.part.80+0xaa/0x100
[ 9.121461] [<c054eea9>] free_pages_prepare+0x3b9/0x3f0
[ 9.122934] [<c054fae2>] free_hot_cold_page+0x22/0x160
[ 9.124400] [<c071a22f>] ? copy_to_iter+0x1af/0x2a0
[ 9.125750] [<c054c4a3>] ? mempool_free_slab+0x13/0x20
[ 9.126840] [<c054fc57>] __free_pages+0x37/0x50
[ 9.127849] [<c054c4fd>] mempool_free_pages+0xd/0x10
[ 9.128908] [<c054c8b6>] mempool_free+0x26/0x80
[ 9.129895] [<c06f77e6>] bounce_end_io+0x56/0x80
[ 9.130923] [<c06f7ce2>] bounce_end_io_read+0x32/0x40
[ 9.131973] [<c06d8dc6>] bio_endio+0x56/0x90
[ 9.132953] [<c06df817>] blk_update_request+0x87/0x310
[ 9.134042] [<c04499f7>] ? kvm_clock_read+0x17/0x20
[ 9.135103] [<c040bdd8>] ? sched_clock+0x8/0x10
[ 9.136100] [<c06e7756>] blk_mq_end_request+0x16/0x60
[ 9.136912] [<c06e7fed>] __blk_mq_complete_request+0x9d/0xd0
[ 9.137730] [<c06e8035>] blk_mq_complete_request+0x15/0x20
[ 9.138515] [<f7e0851d>] loop_handle_cmd.isra.23+0x5d/0x8c0 [loop]
[ 9.139390] [<c0491b53>] ? pick_next_task_fair+0xa63/0xbb0
[ 9.140202] [<f7e08e60>] loop_queue_read_work+0x10/0x12 [loop]
[ 9.141043] [<c0471c55>] process_one_work+0x145/0x380
[ 9.141779] [<c0471ec9>] worker_thread+0x39/0x430
[ 9.142524] [<c0471e90>] ? process_one_work+0x380/0x380
[ 9.143303] [<c04772b6>] kthread+0xa6/0xc0
[ 9.143936] [<c0aa7a81>] ret_from_kernel_thread+0x21/0x30
[ 9.144742] [<c0477210>] ? kthread_worker_fn+0x130/0x130
[ 9.145529] Disabling lock debugging due to kernel taint
* Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848
2015-07-29 13:27 cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848 Josh Boyer
@ 2015-07-29 13:51 ` Johannes Weiner
2015-07-29 15:32 ` Ming Lei
From: Johannes Weiner @ 2015-07-29 13:51 UTC (permalink / raw)
To: Josh Boyer
Cc: Ming Lei, Tejun Heo, Jens Axboe, linux-kernel@vger.kernel.org
On Wed, Jul 29, 2015 at 09:27:16AM -0400, Josh Boyer wrote:
> Hi All,
>
> We've gotten a report[1] that any of the upcoming Fedora 23 install
> images are all failing on 32-bit VMs/machines. Looking at the first
> instance of the oops, it seems to be a bad page state where a page is
> still charged to a group and it is trying to be freed. The oops
> output is below.
>
> Has anyone seen this in their 32-bit testing at all? Thus far nobody
> can recreate this on a 64-bit machine/VM.
>
> josh
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1247382
>
> [ 9.026738] systemd[1]: Switching root.
> [ 9.036467] systemd-journald[149]: Received SIGTERM from PID 1 (systemd).
> [ 9.082262] BUG: Bad page state in process kworker/u5:1 pfn:372ac
> [ 9.083989] page:f3d32ae0 count:0 mapcount:0 mapping:f2252178 index:0x16a
> [ 9.085755] flags: 0x40020021(locked|lru|mappedtodisk)
> [ 9.087284] page dumped because: page still charged to cgroup
> [ 9.088772] bad because of flags:
> [ 9.089731] flags: 0x21(locked|lru)
> [ 9.090818] page->mem_cgroup:f2c3e400
It's also still locked and on the LRU. This page shouldn't have been
freed.
> [ 9.117848] Call Trace:
> [ 9.118738] [<c0aa22c9>] dump_stack+0x41/0x52
> [ 9.120034] [<c054e30a>] bad_page.part.80+0xaa/0x100
> [ 9.121461] [<c054eea9>] free_pages_prepare+0x3b9/0x3f0
> [ 9.122934] [<c054fae2>] free_hot_cold_page+0x22/0x160
> [ 9.124400] [<c071a22f>] ? copy_to_iter+0x1af/0x2a0
> [ 9.125750] [<c054c4a3>] ? mempool_free_slab+0x13/0x20
> [ 9.126840] [<c054fc57>] __free_pages+0x37/0x50
> [ 9.127849] [<c054c4fd>] mempool_free_pages+0xd/0x10
> [ 9.128908] [<c054c8b6>] mempool_free+0x26/0x80
> [ 9.129895] [<c06f77e6>] bounce_end_io+0x56/0x80
The page state looks completely off for a bounce buffer page. Did
somebody mess with a bounce bio's bv_page?
> [ 9.130923] [<c06f7ce2>] bounce_end_io_read+0x32/0x40
> [ 9.131973] [<c06d8dc6>] bio_endio+0x56/0x90
> [ 9.132953] [<c06df817>] blk_update_request+0x87/0x310
> [ 9.134042] [<c04499f7>] ? kvm_clock_read+0x17/0x20
> [ 9.135103] [<c040bdd8>] ? sched_clock+0x8/0x10
> [ 9.136100] [<c06e7756>] blk_mq_end_request+0x16/0x60
> [ 9.136912] [<c06e7fed>] __blk_mq_complete_request+0x9d/0xd0
> [ 9.137730] [<c06e8035>] blk_mq_complete_request+0x15/0x20
> [ 9.138515] [<f7e0851d>] loop_handle_cmd.isra.23+0x5d/0x8c0 [loop]
> [ 9.139390] [<c0491b53>] ? pick_next_task_fair+0xa63/0xbb0
> [ 9.140202] [<f7e08e60>] loop_queue_read_work+0x10/0x12 [loop]
> [ 9.141043] [<c0471c55>] process_one_work+0x145/0x380
> [ 9.141779] [<c0471ec9>] worker_thread+0x39/0x430
> [ 9.142524] [<c0471e90>] ? process_one_work+0x380/0x380
> [ 9.143303] [<c04772b6>] kthread+0xa6/0xc0
> [ 9.143936] [<c0aa7a81>] ret_from_kernel_thread+0x21/0x30
> [ 9.144742] [<c0477210>] ? kthread_worker_fn+0x130/0x130
> [ 9.145529] Disabling lock debugging due to kernel taint
* Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848
2015-07-29 13:51 ` Johannes Weiner
@ 2015-07-29 15:32 ` Ming Lei
2015-07-29 16:36 ` Josh Boyer
From: Ming Lei @ 2015-07-29 15:32 UTC (permalink / raw)
To: Johannes Weiner
Cc: Josh Boyer, Tejun Heo, Jens Axboe, linux-kernel@vger.kernel.org
On Wed, Jul 29, 2015 at 9:51 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Wed, Jul 29, 2015 at 09:27:16AM -0400, Josh Boyer wrote:
>> Hi All,
>>
>> We've gotten a report[1] that any of the upcoming Fedora 23 install
>> images are all failing on 32-bit VMs/machines. Looking at the first
>> instance of the oops, it seems to be a bad page state where a page is
>> still charged to a group and it is trying to be freed. The oops
>> output is below.
>>
>> Has anyone seen this in their 32-bit testing at all? Thus far nobody
>> can recreate this on a 64-bit machine/VM.
>>
>> josh
>>
>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1247382
>>
>> [ 9.026738] systemd[1]: Switching root.
>> [ 9.036467] systemd-journald[149]: Received SIGTERM from PID 1 (systemd).
>> [ 9.082262] BUG: Bad page state in process kworker/u5:1 pfn:372ac
>> [ 9.083989] page:f3d32ae0 count:0 mapcount:0 mapping:f2252178 index:0x16a
>> [ 9.085755] flags: 0x40020021(locked|lru|mappedtodisk)
>> [ 9.087284] page dumped because: page still charged to cgroup
>> [ 9.088772] bad because of flags:
>> [ 9.089731] flags: 0x21(locked|lru)
>> [ 9.090818] page->mem_cgroup:f2c3e400
>
> It's also still locked and on the LRU. This page shouldn't have been
> freed.
>
>> [ 9.117848] Call Trace:
>> [ 9.118738] [<c0aa22c9>] dump_stack+0x41/0x52
>> [ 9.120034] [<c054e30a>] bad_page.part.80+0xaa/0x100
>> [ 9.121461] [<c054eea9>] free_pages_prepare+0x3b9/0x3f0
>> [ 9.122934] [<c054fae2>] free_hot_cold_page+0x22/0x160
>> [ 9.124400] [<c071a22f>] ? copy_to_iter+0x1af/0x2a0
>> [ 9.125750] [<c054c4a3>] ? mempool_free_slab+0x13/0x20
>> [ 9.126840] [<c054fc57>] __free_pages+0x37/0x50
>> [ 9.127849] [<c054c4fd>] mempool_free_pages+0xd/0x10
>> [ 9.128908] [<c054c8b6>] mempool_free+0x26/0x80
>> [ 9.129895] [<c06f77e6>] bounce_end_io+0x56/0x80
>
> The page state looks completely off for a bounce buffer page. Did
> somebody mess with a bounce bio's bv_page?
It looks like the page isn't touched in either lo_read_transfer() or
lo_read_simple().
Maybe it is related to commit aa4d86163e4e ("block: loop: switch to
VFS ITER_BVEC"). If reverting aa4d86163e4e doesn't fix the issue, it
might be helpful to run 'git bisect', assuming the issue can be
reproduced easily.
>
>> [ 9.130923] [<c06f7ce2>] bounce_end_io_read+0x32/0x40
>> [ 9.131973] [<c06d8dc6>] bio_endio+0x56/0x90
>> [ 9.132953] [<c06df817>] blk_update_request+0x87/0x310
>> [ 9.134042] [<c04499f7>] ? kvm_clock_read+0x17/0x20
>> [ 9.135103] [<c040bdd8>] ? sched_clock+0x8/0x10
>> [ 9.136100] [<c06e7756>] blk_mq_end_request+0x16/0x60
>> [ 9.136912] [<c06e7fed>] __blk_mq_complete_request+0x9d/0xd0
>> [ 9.137730] [<c06e8035>] blk_mq_complete_request+0x15/0x20
>> [ 9.138515] [<f7e0851d>] loop_handle_cmd.isra.23+0x5d/0x8c0 [loop]
>> [ 9.139390] [<c0491b53>] ? pick_next_task_fair+0xa63/0xbb0
>> [ 9.140202] [<f7e08e60>] loop_queue_read_work+0x10/0x12 [loop]
>> [ 9.141043] [<c0471c55>] process_one_work+0x145/0x380
>> [ 9.141779] [<c0471ec9>] worker_thread+0x39/0x430
>> [ 9.142524] [<c0471e90>] ? process_one_work+0x380/0x380
>> [ 9.143303] [<c04772b6>] kthread+0xa6/0xc0
>> [ 9.143936] [<c0aa7a81>] ret_from_kernel_thread+0x21/0x30
>> [ 9.144742] [<c0477210>] ? kthread_worker_fn+0x130/0x130
>> [ 9.145529] Disabling lock debugging due to kernel taint
* Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848
2015-07-29 15:32 ` Ming Lei
@ 2015-07-29 16:36 ` Josh Boyer
2015-07-30 0:29 ` Ming Lei
From: Josh Boyer @ 2015-07-29 16:36 UTC (permalink / raw)
To: Ming Lei
Cc: Johannes Weiner, Tejun Heo, Jens Axboe,
linux-kernel@vger.kernel.org
On Wed, Jul 29, 2015 at 11:32 AM, Ming Lei <ming.lei@canonical.com> wrote:
> On Wed, Jul 29, 2015 at 9:51 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>> On Wed, Jul 29, 2015 at 09:27:16AM -0400, Josh Boyer wrote:
>>> Hi All,
>>>
>>> We've gotten a report[1] that any of the upcoming Fedora 23 install
>>> images are all failing on 32-bit VMs/machines. Looking at the first
>>> instance of the oops, it seems to be a bad page state where a page is
>>> still charged to a group and it is trying to be freed. The oops
>>> output is below.
>>>
>>> Has anyone seen this in their 32-bit testing at all? Thus far nobody
>>> can recreate this on a 64-bit machine/VM.
>>>
>>> josh
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1247382
>>>
>>> [ 9.026738] systemd[1]: Switching root.
>>> [ 9.036467] systemd-journald[149]: Received SIGTERM from PID 1 (systemd).
>>> [ 9.082262] BUG: Bad page state in process kworker/u5:1 pfn:372ac
>>> [ 9.083989] page:f3d32ae0 count:0 mapcount:0 mapping:f2252178 index:0x16a
>>> [ 9.085755] flags: 0x40020021(locked|lru|mappedtodisk)
>>> [ 9.087284] page dumped because: page still charged to cgroup
>>> [ 9.088772] bad because of flags:
>>> [ 9.089731] flags: 0x21(locked|lru)
>>> [ 9.090818] page->mem_cgroup:f2c3e400
>>
>> It's also still locked and on the LRU. This page shouldn't have been
>> freed.
>>
>>> [ 9.117848] Call Trace:
>>> [ 9.118738] [<c0aa22c9>] dump_stack+0x41/0x52
>>> [ 9.120034] [<c054e30a>] bad_page.part.80+0xaa/0x100
>>> [ 9.121461] [<c054eea9>] free_pages_prepare+0x3b9/0x3f0
>>> [ 9.122934] [<c054fae2>] free_hot_cold_page+0x22/0x160
>>> [ 9.124400] [<c071a22f>] ? copy_to_iter+0x1af/0x2a0
>>> [ 9.125750] [<c054c4a3>] ? mempool_free_slab+0x13/0x20
>>> [ 9.126840] [<c054fc57>] __free_pages+0x37/0x50
>>> [ 9.127849] [<c054c4fd>] mempool_free_pages+0xd/0x10
>>> [ 9.128908] [<c054c8b6>] mempool_free+0x26/0x80
>>> [ 9.129895] [<c06f77e6>] bounce_end_io+0x56/0x80
>>
>> The page state looks completely off for a bounce buffer page. Did
>> somebody mess with a bounce bio's bv_page?
>
> Looks the page isn't touched in both lo_read_transfer() and
> lo_read_simple().
>
> Maybe it is related with aa4d86163e4e(block: loop: switch to VFS ITER_BVEC),
> or it might be helpful to run 'git bisect' if reverting aa4d86163e4e can't
> fix the issue, suppose the issue can be reproduced easily.
I can try reverting that and getting someone to test it. It is
somewhat complicated by having to spin a new install ISO, so a report
back will be somewhat delayed. In the meantime, I'm also asking
people to track down the first kernel build that hits this, so
hopefully that gives us more of a clue as well.
It is odd that only 32-bit hits this issue though. At least from what
we've seen thus far.
josh
* Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848
2015-07-29 16:36 ` Josh Boyer
@ 2015-07-30 0:29 ` Ming Lei
2015-07-30 11:27 ` Josh Boyer
From: Ming Lei @ 2015-07-30 0:29 UTC (permalink / raw)
To: Josh Boyer
Cc: Johannes Weiner, Tejun Heo, Jens Axboe,
linux-kernel@vger.kernel.org
On Wed, Jul 29, 2015 at 12:36 PM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
> On Wed, Jul 29, 2015 at 11:32 AM, Ming Lei <ming.lei@canonical.com> wrote:
>> On Wed, Jul 29, 2015 at 9:51 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>>> On Wed, Jul 29, 2015 at 09:27:16AM -0400, Josh Boyer wrote:
>>>> Hi All,
>>>>
>>>> We've gotten a report[1] that any of the upcoming Fedora 23 install
>>>> images are all failing on 32-bit VMs/machines. Looking at the first
>>>> instance of the oops, it seems to be a bad page state where a page is
>>>> still charged to a group and it is trying to be freed. The oops
>>>> output is below.
>>>>
>>>> Has anyone seen this in their 32-bit testing at all? Thus far nobody
>>>> can recreate this on a 64-bit machine/VM.
>>>>
>>>> josh
>>>>
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1247382
>>>>
>>>> [ 9.026738] systemd[1]: Switching root.
>>>> [ 9.036467] systemd-journald[149]: Received SIGTERM from PID 1 (systemd).
>>>> [ 9.082262] BUG: Bad page state in process kworker/u5:1 pfn:372ac
>>>> [ 9.083989] page:f3d32ae0 count:0 mapcount:0 mapping:f2252178 index:0x16a
>>>> [ 9.085755] flags: 0x40020021(locked|lru|mappedtodisk)
>>>> [ 9.087284] page dumped because: page still charged to cgroup
>>>> [ 9.088772] bad because of flags:
>>>> [ 9.089731] flags: 0x21(locked|lru)
>>>> [ 9.090818] page->mem_cgroup:f2c3e400
>>>
>>> It's also still locked and on the LRU. This page shouldn't have been
>>> freed.
>>>
>>>> [ 9.117848] Call Trace:
>>>> [ 9.118738] [<c0aa22c9>] dump_stack+0x41/0x52
>>>> [ 9.120034] [<c054e30a>] bad_page.part.80+0xaa/0x100
>>>> [ 9.121461] [<c054eea9>] free_pages_prepare+0x3b9/0x3f0
>>>> [ 9.122934] [<c054fae2>] free_hot_cold_page+0x22/0x160
>>>> [ 9.124400] [<c071a22f>] ? copy_to_iter+0x1af/0x2a0
>>>> [ 9.125750] [<c054c4a3>] ? mempool_free_slab+0x13/0x20
>>>> [ 9.126840] [<c054fc57>] __free_pages+0x37/0x50
>>>> [ 9.127849] [<c054c4fd>] mempool_free_pages+0xd/0x10
>>>> [ 9.128908] [<c054c8b6>] mempool_free+0x26/0x80
>>>> [ 9.129895] [<c06f77e6>] bounce_end_io+0x56/0x80
>>>
>>> The page state looks completely off for a bounce buffer page. Did
>>> somebody mess with a bounce bio's bv_page?
>>
>> Looks the page isn't touched in both lo_read_transfer() and
>> lo_read_simple().
>>
>> Maybe it is related with aa4d86163e4e(block: loop: switch to VFS ITER_BVEC),
>> or it might be helpful to run 'git bisect' if reverting aa4d86163e4e can't
>> fix the issue, suppose the issue can be reproduced easily.
>
> I can try reverting that and getting someone to test it. It is
> somewhat complicated by having to spin a new install ISO, so a report
> back will be somewhat delayed. In the meantime, I'm also asking
> people to track down the first kernel build that hits this, so
> hopefully that gives us more of a clue as well.
>
> It is odd that only 32-bit hits this issue though. At least from what
> we've seen thus far.
Page bouncing may only come into play on 32-bit, and I will try to find
an ARM box to see if it can be reproduced there.
BTW, are there any extra steps for reproducing the issue? Such as
cgroup operations?
Thanks,
* Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848
2015-07-30 0:29 ` Ming Lei
@ 2015-07-30 11:27 ` Josh Boyer
2015-07-30 23:14 ` Josh Boyer
From: Josh Boyer @ 2015-07-30 11:27 UTC (permalink / raw)
To: Ming Lei
Cc: Johannes Weiner, Tejun Heo, Jens Axboe,
linux-kernel@vger.kernel.org
On Wed, Jul 29, 2015 at 8:29 PM, Ming Lei <ming.lei@canonical.com> wrote:
> On Wed, Jul 29, 2015 at 12:36 PM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
>> On Wed, Jul 29, 2015 at 11:32 AM, Ming Lei <ming.lei@canonical.com> wrote:
>>> On Wed, Jul 29, 2015 at 9:51 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>>>> On Wed, Jul 29, 2015 at 09:27:16AM -0400, Josh Boyer wrote:
>>>>> Hi All,
>>>>>
>>>>> We've gotten a report[1] that any of the upcoming Fedora 23 install
>>>>> images are all failing on 32-bit VMs/machines. Looking at the first
>>>>> instance of the oops, it seems to be a bad page state where a page is
>>>>> still charged to a group and it is trying to be freed. The oops
>>>>> output is below.
>>>>>
>>>>> Has anyone seen this in their 32-bit testing at all? Thus far nobody
>>>>> can recreate this on a 64-bit machine/VM.
>>>>>
>>>>> josh
>>>>>
>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1247382
>>>>>
>>>>> [ 9.026738] systemd[1]: Switching root.
>>>>> [ 9.036467] systemd-journald[149]: Received SIGTERM from PID 1 (systemd).
>>>>> [ 9.082262] BUG: Bad page state in process kworker/u5:1 pfn:372ac
>>>>> [ 9.083989] page:f3d32ae0 count:0 mapcount:0 mapping:f2252178 index:0x16a
>>>>> [ 9.085755] flags: 0x40020021(locked|lru|mappedtodisk)
>>>>> [ 9.087284] page dumped because: page still charged to cgroup
>>>>> [ 9.088772] bad because of flags:
>>>>> [ 9.089731] flags: 0x21(locked|lru)
>>>>> [ 9.090818] page->mem_cgroup:f2c3e400
>>>>
>>>> It's also still locked and on the LRU. This page shouldn't have been
>>>> freed.
>>>>
>>>>> [ 9.117848] Call Trace:
>>>>> [ 9.118738] [<c0aa22c9>] dump_stack+0x41/0x52
>>>>> [ 9.120034] [<c054e30a>] bad_page.part.80+0xaa/0x100
>>>>> [ 9.121461] [<c054eea9>] free_pages_prepare+0x3b9/0x3f0
>>>>> [ 9.122934] [<c054fae2>] free_hot_cold_page+0x22/0x160
>>>>> [ 9.124400] [<c071a22f>] ? copy_to_iter+0x1af/0x2a0
>>>>> [ 9.125750] [<c054c4a3>] ? mempool_free_slab+0x13/0x20
>>>>> [ 9.126840] [<c054fc57>] __free_pages+0x37/0x50
>>>>> [ 9.127849] [<c054c4fd>] mempool_free_pages+0xd/0x10
>>>>> [ 9.128908] [<c054c8b6>] mempool_free+0x26/0x80
>>>>> [ 9.129895] [<c06f77e6>] bounce_end_io+0x56/0x80
>>>>
>>>> The page state looks completely off for a bounce buffer page. Did
>>>> somebody mess with a bounce bio's bv_page?
>>>
>>> Looks the page isn't touched in both lo_read_transfer() and
>>> lo_read_simple().
>>>
>>> Maybe it is related with aa4d86163e4e(block: loop: switch to VFS ITER_BVEC),
>>> or it might be helpful to run 'git bisect' if reverting aa4d86163e4e can't
>>> fix the issue, suppose the issue can be reproduced easily.
>>
>> I can try reverting that and getting someone to test it. It is
>> somewhat complicated by having to spin a new install ISO, so a report
>> back will be somewhat delayed. In the meantime, I'm also asking
>> people to track down the first kernel build that hits this, so
>> hopefully that gives us more of a clue as well.
>>
>> It is odd that only 32-bit hits this issue though. At least from what
>> we've seen thus far.
>
> Page bounce may be just valid on 32-bit, and I will try to find one ARM
> box to see if it can be reproduced easily.
>
> BTW, are there any extra steps for reproducing the issue? Such as
> cgroup operations?
I'm not entirely sure what the install environment on the ISOs is
doing, but nobody sees this issue with a kernel after install. Thus
far recreate efforts have focused on recreating the install ISOs using
various kernels. That is working, but I don't expect other people to
easily be able to do that.
Also, our primary tester seems to have narrowed it down to breaking
somewhere between 4.1-rc5 (good) and 4.1-rc6 (bad). I'll be working
with him today to isolate it further, but the commit you pointed out
was in 4.1-rc1 and that worked. He still needs to test a 4.2-rc4
kernel with it reverted, but so far it seems to be something else that
came in with the 4.1 kernel.
josh
* Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848
2015-07-30 11:27 ` Josh Boyer
@ 2015-07-30 23:14 ` Josh Boyer
2015-07-31 0:19 ` Mike Snitzer
From: Josh Boyer @ 2015-07-30 23:14 UTC (permalink / raw)
To: Ming Lei, snitzer, ejt
Cc: Johannes Weiner, Tejun Heo, Jens Axboe,
linux-kernel@vger.kernel.org
On Thu, Jul 30, 2015 at 7:27 AM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
> On Wed, Jul 29, 2015 at 8:29 PM, Ming Lei <ming.lei@canonical.com> wrote:
>> On Wed, Jul 29, 2015 at 12:36 PM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
>>> On Wed, Jul 29, 2015 at 11:32 AM, Ming Lei <ming.lei@canonical.com> wrote:
>>>> On Wed, Jul 29, 2015 at 9:51 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>>>>> On Wed, Jul 29, 2015 at 09:27:16AM -0400, Josh Boyer wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> We've gotten a report[1] that any of the upcoming Fedora 23 install
>>>>>> images are all failing on 32-bit VMs/machines. Looking at the first
>>>>>> instance of the oops, it seems to be a bad page state where a page is
>>>>>> still charged to a group and it is trying to be freed. The oops
>>>>>> output is below.
>>>>>>
>>>>>> Has anyone seen this in their 32-bit testing at all? Thus far nobody
>>>>>> can recreate this on a 64-bit machine/VM.
>>>>>>
>>>>>> josh
>>>>>>
>>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1247382
>>>>>>
>>>>>> [ 9.026738] systemd[1]: Switching root.
>>>>>> [ 9.036467] systemd-journald[149]: Received SIGTERM from PID 1 (systemd).
>>>>>> [ 9.082262] BUG: Bad page state in process kworker/u5:1 pfn:372ac
>>>>>> [ 9.083989] page:f3d32ae0 count:0 mapcount:0 mapping:f2252178 index:0x16a
>>>>>> [ 9.085755] flags: 0x40020021(locked|lru|mappedtodisk)
>>>>>> [ 9.087284] page dumped because: page still charged to cgroup
>>>>>> [ 9.088772] bad because of flags:
>>>>>> [ 9.089731] flags: 0x21(locked|lru)
>>>>>> [ 9.090818] page->mem_cgroup:f2c3e400
>>>>>
>>>>> It's also still locked and on the LRU. This page shouldn't have been
>>>>> freed.
>>>>>
>>>>>> [ 9.117848] Call Trace:
>>>>>> [ 9.118738] [<c0aa22c9>] dump_stack+0x41/0x52
>>>>>> [ 9.120034] [<c054e30a>] bad_page.part.80+0xaa/0x100
>>>>>> [ 9.121461] [<c054eea9>] free_pages_prepare+0x3b9/0x3f0
>>>>>> [ 9.122934] [<c054fae2>] free_hot_cold_page+0x22/0x160
>>>>>> [ 9.124400] [<c071a22f>] ? copy_to_iter+0x1af/0x2a0
>>>>>> [ 9.125750] [<c054c4a3>] ? mempool_free_slab+0x13/0x20
>>>>>> [ 9.126840] [<c054fc57>] __free_pages+0x37/0x50
>>>>>> [ 9.127849] [<c054c4fd>] mempool_free_pages+0xd/0x10
>>>>>> [ 9.128908] [<c054c8b6>] mempool_free+0x26/0x80
>>>>>> [ 9.129895] [<c06f77e6>] bounce_end_io+0x56/0x80
>>>>>
>>>>> The page state looks completely off for a bounce buffer page. Did
>>>>> somebody mess with a bounce bio's bv_page?
>>>>
>>>> Looks the page isn't touched in both lo_read_transfer() and
>>>> lo_read_simple().
>>>>
>>>> Maybe it is related with aa4d86163e4e(block: loop: switch to VFS ITER_BVEC),
>>>> or it might be helpful to run 'git bisect' if reverting aa4d86163e4e can't
>>>> fix the issue, suppose the issue can be reproduced easily.
>>>
>>> I can try reverting that and getting someone to test it. It is
>>> somewhat complicated by having to spin a new install ISO, so a report
>>> back will be somewhat delayed. In the meantime, I'm also asking
>>> people to track down the first kernel build that hits this, so
>>> hopefully that gives us more of a clue as well.
The revert of that patch did not fix the issue.
>>> It is odd that only 32-bit hits this issue though. At least from what
>>> we've seen thus far.
>>
>> Page bounce may be just valid on 32-bit, and I will try to find one ARM
>> box to see if it can be reproduced easily.
>>
>> BTW, are there any extra steps for reproducing the issue? Such as
>> cgroup operations?
>
> I'm not entirely sure what the install environment on the ISOs is
> doing, but nobody sees this issue with a kernel after install. Thus
> far recreate efforts have focused on recreating the install ISOs using
> various kernels. That is working, but I don't expect other people to
> easily be able to do that.
>
> Also, our primary tester seems to have narrowed it down to breaking
> somewhere between 4.1-rc5 (good) and 4.1-rc6 (bad). I'll be working
> with him today to isolate it further, but the commit you pointed out
> was in 4.1-rc1 and that worked. He still needs to test a 4.2-rc4
> kernel with it reverted, but so far it seems to be something else that
> came in with the 4.1 kernel.
After doing some RPM bisecting, we've narrowed it down to the
following commit range:
[jwboyer@vader linux]$ git log --pretty=oneline c2102f3d73d8..0f1e5b5d19f6
0f1e5b5d19f6c06fe2078f946377db9861f3910d Merge tag 'dm-4.1-fixes-3' of
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
1c220c69ce0dcc0f234a9f263ad9c0864f971852 dm: fix casting bug in dm_merge_bvec()
15b94a690470038aa08247eedbebbe7e2218d5ee dm: fix reload failure of 0
path multipath mapping on blk-mq devices
e5d8de32cc02a259e1a237ab57cba00f2930fa6a dm: fix false warning in
free_rq_clone() for unmapped requests
45714fbed4556149d7f1730f5bae74f81d5e2cd5 dm: requeue from blk-mq
dm_mq_queue_rq() using BLK_MQ_RQ_QUEUE_BUSY
4c6dd53dd3674c310d7379c6b3273daa9fd95c79 dm mpath: fix leak of
dm_mpath_io structure in blk-mq .queue_rq error path
3a1407559a593d4360af12dd2df5296bf8eb0d28 dm: fix NULL pointer when
clone_and_map_rq returns !DM_MAPIO_REMAPPED
4ae9944d132b160d444fa3aa875307eb0fa3eeec dm: run queue on re-queue
[jwboyer@vader linux]$
It is interesting to note that we're also carrying a patch in our 4.1
kernel for loop performance reasons that went into upstream 4.2. That
patch is blk-loop-avoid-too-many-pending-per-work-IO.patch which
corresponds to upstream commit
4d4e41aef9429872ea3b105e83426941f7185ab6. All of those commits are in
4.2-rcX, which matches the failures we're seeing.
We can try a 4.1-rc5 snapshot build without the block patch to see if
that helps, but the patch was included in all the previously tested
good kernels and the issue only appeared after the DM merge commits
were included.
josh
* Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848
2015-07-30 23:14 ` Josh Boyer
@ 2015-07-31 0:19 ` Mike Snitzer
2015-07-31 18:58 ` Josh Boyer
From: Mike Snitzer @ 2015-07-31 0:19 UTC (permalink / raw)
To: Josh Boyer
Cc: Ming Lei, ejt, Johannes Weiner, Tejun Heo, Jens Axboe,
linux-kernel@vger.kernel.org
On Thu, Jul 30 2015 at 7:14pm -0400,
Josh Boyer <jwboyer@fedoraproject.org> wrote:
> On Thu, Jul 30, 2015 at 7:27 AM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
> > On Wed, Jul 29, 2015 at 8:29 PM, Ming Lei <ming.lei@canonical.com> wrote:
> >> On Wed, Jul 29, 2015 at 12:36 PM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
> >>> On Wed, Jul 29, 2015 at 11:32 AM, Ming Lei <ming.lei@canonical.com> wrote:
> >>>> On Wed, Jul 29, 2015 at 9:51 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> >>>>> On Wed, Jul 29, 2015 at 09:27:16AM -0400, Josh Boyer wrote:
> >>>>>> Hi All,
> >>>>>>
> >>>>>> We've gotten a report[1] that any of the upcoming Fedora 23 install
> >>>>>> images are all failing on 32-bit VMs/machines. Looking at the first
> >>>>>> instance of the oops, it seems to be a bad page state where a page is
> >>>>>> still charged to a group and it is trying to be freed. The oops
> >>>>>> output is below.
> >>>>>>
> >>>>>> Has anyone seen this in their 32-bit testing at all? Thus far nobody
> >>>>>> can recreate this on a 64-bit machine/VM.
> >>>>>>
> >>>>>> josh
> >>>>>>
> >>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1247382
> >>>>>>
> >>>>>> [ 9.026738] systemd[1]: Switching root.
> >>>>>> [ 9.036467] systemd-journald[149]: Received SIGTERM from PID 1 (systemd).
> >>>>>> [ 9.082262] BUG: Bad page state in process kworker/u5:1 pfn:372ac
> >>>>>> [ 9.083989] page:f3d32ae0 count:0 mapcount:0 mapping:f2252178 index:0x16a
> >>>>>> [ 9.085755] flags: 0x40020021(locked|lru|mappedtodisk)
> >>>>>> [ 9.087284] page dumped because: page still charged to cgroup
> >>>>>> [ 9.088772] bad because of flags:
> >>>>>> [ 9.089731] flags: 0x21(locked|lru)
> >>>>>> [ 9.090818] page->mem_cgroup:f2c3e400
> >>>>>
> >>>>> It's also still locked and on the LRU. This page shouldn't have been
> >>>>> freed.
> >>>>>
> >>>>>> [ 9.117848] Call Trace:
> >>>>>> [ 9.118738] [<c0aa22c9>] dump_stack+0x41/0x52
> >>>>>> [ 9.120034] [<c054e30a>] bad_page.part.80+0xaa/0x100
> >>>>>> [ 9.121461] [<c054eea9>] free_pages_prepare+0x3b9/0x3f0
> >>>>>> [ 9.122934] [<c054fae2>] free_hot_cold_page+0x22/0x160
> >>>>>> [ 9.124400] [<c071a22f>] ? copy_to_iter+0x1af/0x2a0
> >>>>>> [ 9.125750] [<c054c4a3>] ? mempool_free_slab+0x13/0x20
> >>>>>> [ 9.126840] [<c054fc57>] __free_pages+0x37/0x50
> >>>>>> [ 9.127849] [<c054c4fd>] mempool_free_pages+0xd/0x10
> >>>>>> [ 9.128908] [<c054c8b6>] mempool_free+0x26/0x80
> >>>>>> [ 9.129895] [<c06f77e6>] bounce_end_io+0x56/0x80
> >>>>>
> >>>>> The page state looks completely off for a bounce buffer page. Did
> >>>>> somebody mess with a bounce bio's bv_page?
> >>>>
> >>>> Looks the page isn't touched in both lo_read_transfer() and
> >>>> lo_read_simple().
> >>>>
> >>>> Maybe it is related with aa4d86163e4e(block: loop: switch to VFS ITER_BVEC),
> >>>> or it might be helpful to run 'git bisect' if reverting aa4d86163e4e can't
> >>>> fix the issue, suppose the issue can be reproduced easily.
> >>>
> >>> I can try reverting that and getting someone to test it. It is
> >>> somewhat complicated by having to spin a new install ISO, so a report
> >>> back will be somewhat delayed. In the meantime, I'm also asking
> >>> people to track down the first kernel build that hits this, so
> >>> hopefully that gives us more of a clue as well.
>
> The revert of that patch did not fix the issue.
>
> >>> It is odd that only 32-bit hits this issue though. At least from what
> >>> we've seen thus far.
> >>
> >> Page bounce may be just valid on 32-bit, and I will try to find one ARM
> >> box to see if it can be reproduced easily.
> >>
> >> BTW, are there any extra steps for reproducing the issue? Such as
> >> cgroup operations?
> >
> > I'm not entirely sure what the install environment on the ISOs is
> > doing, but nobody sees this issue with a kernel after install. Thus
> > far recreate efforts have focused on recreating the install ISOs using
> > various kernels. That is working, but I don't expect other people to
> > easily be able to do that.
> >
> > Also, our primary tester seems to have narrowed it down to breaking
> > somewhere between 4.1-rc5 (good) and 4.1-rc6 (bad). I'll be working
> > with him today to isolate it further, but the commit you pointed out
> > was in 4.1-rc1 and that worked. He still needs to test a 4.2-rc4
> > kernel with it reverted, but so far it seems to be something else that
> > came in with the 4.1 kernel.
>
> After doing some RPM bisecting, we've narrowed it down to the
> following commit range:
>
> [jwboyer@vader linux]$ git log --pretty=oneline c2102f3d73d8..0f1e5b5d19f6
> 0f1e5b5d19f6c06fe2078f946377db9861f3910d Merge tag 'dm-4.1-fixes-3' of
> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
> 1c220c69ce0dcc0f234a9f263ad9c0864f971852 dm: fix casting bug in dm_merge_bvec()
> 15b94a690470038aa08247eedbebbe7e2218d5ee dm: fix reload failure of 0
> path multipath mapping on blk-mq devices
> e5d8de32cc02a259e1a237ab57cba00f2930fa6a dm: fix false warning in
> free_rq_clone() for unmapped requests
> 45714fbed4556149d7f1730f5bae74f81d5e2cd5 dm: requeue from blk-mq
> dm_mq_queue_rq() using BLK_MQ_RQ_QUEUE_BUSY
> 4c6dd53dd3674c310d7379c6b3273daa9fd95c79 dm mpath: fix leak of
> dm_mpath_io structure in blk-mq .queue_rq error path
> 3a1407559a593d4360af12dd2df5296bf8eb0d28 dm: fix NULL pointer when
> clone_and_map_rq returns !DM_MAPIO_REMAPPED
> 4ae9944d132b160d444fa3aa875307eb0fa3eeec dm: run queue on re-queue
> [jwboyer@vader linux]$
>
> It is interesting to note that we're also carrying a patch in our 4.1
> kernel for loop performance reasons that went into upstream 4.2. That
> patch is blk-loop-avoid-too-many-pending-per-work-IO.patch which
> corresponds to upstream commit
> 4d4e41aef9429872ea3b105e83426941f7185ab6. All of those commits are in
> 4.2-rcX, which matches the failures we're seeing.
>
> We can try a 4.1-rc5 snapshot build without the block patch to see if
> that helps, but the patch was included in all the previously tested
> good kernels and the issue only appeared after the DM merge commits
> were included.
The only commit that looks even remotely related (given 32-bit concerns)
would be 1c220c69ce0dcc0f234a9f263ad9c0864f971852.
All the other DM commits are request-based changes that, AFAICT, aren't
applicable.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848
2015-07-31 0:19 ` Mike Snitzer
@ 2015-07-31 18:58 ` Josh Boyer
2015-08-02 14:01 ` Josh Boyer
0 siblings, 1 reply; 23+ messages in thread
From: Josh Boyer @ 2015-07-31 18:58 UTC
To: Mike Snitzer
Cc: Ming Lei, ejt, Johannes Weiner, Tejun Heo, Jens Axboe,
linux-kernel@vger.kernel.org
On Thu, Jul 30, 2015 at 8:19 PM, Mike Snitzer <snitzer@redhat.com> wrote:
> On Thu, Jul 30 2015 at 7:14pm -0400,
> Josh Boyer <jwboyer@fedoraproject.org> wrote:
>
>> On Thu, Jul 30, 2015 at 7:27 AM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
>> > On Wed, Jul 29, 2015 at 8:29 PM, Ming Lei <ming.lei@canonical.com> wrote:
>> >> On Wed, Jul 29, 2015 at 12:36 PM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
>> >>> On Wed, Jul 29, 2015 at 11:32 AM, Ming Lei <ming.lei@canonical.com> wrote:
>> >>>> On Wed, Jul 29, 2015 at 9:51 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>> >>>>> On Wed, Jul 29, 2015 at 09:27:16AM -0400, Josh Boyer wrote:
>> >>>>>> Hi All,
>> >>>>>>
>> >>>>>> We've gotten a report[1] that any of the upcoming Fedora 23 install
>> >>>>>> images are all failing on 32-bit VMs/machines. Looking at the first
>> >>>>>> instance of the oops, it seems to be a bad page state where a page is
>> >>>>>> still charged to a group and it is trying to be freed. The oops
>> >>>>>> output is below.
>> >>>>>>
>> >>>>>> Has anyone seen this in their 32-bit testing at all? Thus far nobody
>> >>>>>> can recreate this on a 64-bit machine/VM.
>> >>>>>>
>> >>>>>> josh
>> >>>>>>
>> >>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1247382
>> >>>>>>
>> >>>>>> [ 9.026738] systemd[1]: Switching root.
>> >>>>>> [ 9.036467] systemd-journald[149]: Received SIGTERM from PID 1 (systemd).
>> >>>>>> [ 9.082262] BUG: Bad page state in process kworker/u5:1 pfn:372ac
>> >>>>>> [ 9.083989] page:f3d32ae0 count:0 mapcount:0 mapping:f2252178 index:0x16a
>> >>>>>> [ 9.085755] flags: 0x40020021(locked|lru|mappedtodisk)
>> >>>>>> [ 9.087284] page dumped because: page still charged to cgroup
>> >>>>>> [ 9.088772] bad because of flags:
>> >>>>>> [ 9.089731] flags: 0x21(locked|lru)
>> >>>>>> [ 9.090818] page->mem_cgroup:f2c3e400
>> >>>>>
>> >>>>> It's also still locked and on the LRU. This page shouldn't have been
>> >>>>> freed.
>> >>>>>
>> >>>>>> [ 9.117848] Call Trace:
>> >>>>>> [ 9.118738] [<c0aa22c9>] dump_stack+0x41/0x52
>> >>>>>> [ 9.120034] [<c054e30a>] bad_page.part.80+0xaa/0x100
>> >>>>>> [ 9.121461] [<c054eea9>] free_pages_prepare+0x3b9/0x3f0
>> >>>>>> [ 9.122934] [<c054fae2>] free_hot_cold_page+0x22/0x160
>> >>>>>> [ 9.124400] [<c071a22f>] ? copy_to_iter+0x1af/0x2a0
>> >>>>>> [ 9.125750] [<c054c4a3>] ? mempool_free_slab+0x13/0x20
>> >>>>>> [ 9.126840] [<c054fc57>] __free_pages+0x37/0x50
>> >>>>>> [ 9.127849] [<c054c4fd>] mempool_free_pages+0xd/0x10
>> >>>>>> [ 9.128908] [<c054c8b6>] mempool_free+0x26/0x80
>> >>>>>> [ 9.129895] [<c06f77e6>] bounce_end_io+0x56/0x80
>> >>>>>
>> >>>>> The page state looks completely off for a bounce buffer page. Did
>> >>>>> somebody mess with a bounce bio's bv_page?
>> >>>>
>> >>>> Looks the page isn't touched in both lo_read_transfer() and
>> >>>> lo_read_simple().
>> >>>>
>> >>>> Maybe it is related with aa4d86163e4e(block: loop: switch to VFS ITER_BVEC),
>> >>>> or it might be helpful to run 'git bisect' if reverting aa4d86163e4e can't
>> >>>> fix the issue, suppose the issue can be reproduced easily.
>> >>>
>> >>> I can try reverting that and getting someone to test it. It is
>> >>> somewhat complicated by having to spin a new install ISO, so a report
>> >>> back will be somewhat delayed. In the meantime, I'm also asking
>> >>> people to track down the first kernel build that hits this, so
>> >>> hopefully that gives us more of a clue as well.
>>
>> The revert of that patch did not fix the issue.
>>
>> >>> It is odd that only 32-bit hits this issue though. At least from what
>> >>> we've seen thus far.
>> >>
>> >> Page bounce may be just valid on 32-bit, and I will try to find one ARM
>> >> box to see if it can be reproduced easily.
>> >>
>> >> BTW, are there any extra steps for reproducing the issue? Such as
>> >> cgroup operations?
>> >
>> > I'm not entirely sure what the install environment on the ISOs is
>> > doing, but nobody sees this issue with a kernel after install. Thus
>> > far recreate efforts have focused on recreating the install ISOs using
>> > various kernels. That is working, but I don't expect other people to
>> > easily be able to do that.
>> >
>> > Also, our primary tester seems to have narrowed it down to breaking
>> > somewhere between 4.1-rc5 (good) and 4.1-rc6 (bad). I'll be working
>> > with him today to isolate it further, but the commit you pointed out
>> > was in 4.1-rc1 and that worked. He still needs to test a 4.2-rc4
>> > kernel with it reverted, but so far it seems to be something else that
>> > came in with the 4.1 kernel.
>>
>> After doing some RPM bisecting, we've narrowed it down to the
>> following commit range:
>>
>> [jwboyer@vader linux]$ git log --pretty=oneline c2102f3d73d8..0f1e5b5d19f6
>> 0f1e5b5d19f6c06fe2078f946377db9861f3910d Merge tag 'dm-4.1-fixes-3' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
>> 1c220c69ce0dcc0f234a9f263ad9c0864f971852 dm: fix casting bug in dm_merge_bvec()
>> 15b94a690470038aa08247eedbebbe7e2218d5ee dm: fix reload failure of 0
>> path multipath mapping on blk-mq devices
>> e5d8de32cc02a259e1a237ab57cba00f2930fa6a dm: fix false warning in
>> free_rq_clone() for unmapped requests
>> 45714fbed4556149d7f1730f5bae74f81d5e2cd5 dm: requeue from blk-mq
>> dm_mq_queue_rq() using BLK_MQ_RQ_QUEUE_BUSY
>> 4c6dd53dd3674c310d7379c6b3273daa9fd95c79 dm mpath: fix leak of
>> dm_mpath_io structure in blk-mq .queue_rq error path
>> 3a1407559a593d4360af12dd2df5296bf8eb0d28 dm: fix NULL pointer when
>> clone_and_map_rq returns !DM_MAPIO_REMAPPED
>> 4ae9944d132b160d444fa3aa875307eb0fa3eeec dm: run queue on re-queue
>> [jwboyer@vader linux]$
>>
>> It is interesting to note that we're also carrying a patch in our 4.1
>> kernel for loop performance reasons that went into upstream 4.2. That
>> patch is blk-loop-avoid-too-many-pending-per-work-IO.patch which
>> corresponds to upstream commit
>> 4d4e41aef9429872ea3b105e83426941f7185ab6. All of those commits are in
>> 4.2-rcX, which matches the failures we're seeing.
>>
>> We can try a 4.1-rc5 snapshot build without the block patch to see if
>> that helps, but the patch was included in all the previously tested
>> good kernels and the issue only appeared after the DM merge commits
>> were included.
>
> The only commit that looks even remotely related (given 32bit concerns)
> would be 1c220c69ce0dcc0f234a9f263ad9c0864f971852
Confirmed. I built kernels for our tester that started with the
working snapshot and applied the patches above one at a time. The
failing patch was the commit you suspected.
I can try and build a 4.2-rc4 kernel with that reverted, but it would
be good if someone could start thinking about how that could cause
this issue.
josh
* Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848
2015-07-31 18:58 ` Josh Boyer
@ 2015-08-02 14:01 ` Josh Boyer
2015-08-03 14:28 ` Mike Snitzer
0 siblings, 1 reply; 23+ messages in thread
From: Josh Boyer @ 2015-08-02 14:01 UTC
To: ejt, Mike Snitzer
Cc: Ming Lei, Johannes Weiner, Tejun Heo, Jens Axboe,
linux-kernel@vger.kernel.org
On Fri, Jul 31, 2015 at 2:58 PM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
> On Thu, Jul 30, 2015 at 8:19 PM, Mike Snitzer <snitzer@redhat.com> wrote:
>> On Thu, Jul 30 2015 at 7:14pm -0400,
>> Josh Boyer <jwboyer@fedoraproject.org> wrote:
>>
>>> On Thu, Jul 30, 2015 at 7:27 AM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
>>> > On Wed, Jul 29, 2015 at 8:29 PM, Ming Lei <ming.lei@canonical.com> wrote:
>>> >> On Wed, Jul 29, 2015 at 12:36 PM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
>>> >>> On Wed, Jul 29, 2015 at 11:32 AM, Ming Lei <ming.lei@canonical.com> wrote:
>>> >>>> On Wed, Jul 29, 2015 at 9:51 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>>> >>>>> On Wed, Jul 29, 2015 at 09:27:16AM -0400, Josh Boyer wrote:
>>> >>>>>> Hi All,
>>> >>>>>>
>>> >>>>>> We've gotten a report[1] that any of the upcoming Fedora 23 install
>>> >>>>>> images are all failing on 32-bit VMs/machines. Looking at the first
>>> >>>>>> instance of the oops, it seems to be a bad page state where a page is
>>> >>>>>> still charged to a group and it is trying to be freed. The oops
>>> >>>>>> output is below.
>>> >>>>>>
>>> >>>>>> Has anyone seen this in their 32-bit testing at all? Thus far nobody
>>> >>>>>> can recreate this on a 64-bit machine/VM.
>>> >>>>>>
>>> >>>>>> josh
>>> >>>>>>
>>> >>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1247382
>>> >>>>>>
>>> >>>>>> [ 9.026738] systemd[1]: Switching root.
>>> >>>>>> [ 9.036467] systemd-journald[149]: Received SIGTERM from PID 1 (systemd).
>>> >>>>>> [ 9.082262] BUG: Bad page state in process kworker/u5:1 pfn:372ac
>>> >>>>>> [ 9.083989] page:f3d32ae0 count:0 mapcount:0 mapping:f2252178 index:0x16a
>>> >>>>>> [ 9.085755] flags: 0x40020021(locked|lru|mappedtodisk)
>>> >>>>>> [ 9.087284] page dumped because: page still charged to cgroup
>>> >>>>>> [ 9.088772] bad because of flags:
>>> >>>>>> [ 9.089731] flags: 0x21(locked|lru)
>>> >>>>>> [ 9.090818] page->mem_cgroup:f2c3e400
>>> >>>>>
>>> >>>>> It's also still locked and on the LRU. This page shouldn't have been
>>> >>>>> freed.
>>> >>>>>
>>> >>>>>> [ 9.117848] Call Trace:
>>> >>>>>> [ 9.118738] [<c0aa22c9>] dump_stack+0x41/0x52
>>> >>>>>> [ 9.120034] [<c054e30a>] bad_page.part.80+0xaa/0x100
>>> >>>>>> [ 9.121461] [<c054eea9>] free_pages_prepare+0x3b9/0x3f0
>>> >>>>>> [ 9.122934] [<c054fae2>] free_hot_cold_page+0x22/0x160
>>> >>>>>> [ 9.124400] [<c071a22f>] ? copy_to_iter+0x1af/0x2a0
>>> >>>>>> [ 9.125750] [<c054c4a3>] ? mempool_free_slab+0x13/0x20
>>> >>>>>> [ 9.126840] [<c054fc57>] __free_pages+0x37/0x50
>>> >>>>>> [ 9.127849] [<c054c4fd>] mempool_free_pages+0xd/0x10
>>> >>>>>> [ 9.128908] [<c054c8b6>] mempool_free+0x26/0x80
>>> >>>>>> [ 9.129895] [<c06f77e6>] bounce_end_io+0x56/0x80
>>> >>>>>
>>> >>>>> The page state looks completely off for a bounce buffer page. Did
>>> >>>>> somebody mess with a bounce bio's bv_page?
>>> >>>>
>>> >>>> Looks the page isn't touched in both lo_read_transfer() and
>>> >>>> lo_read_simple().
>>> >>>>
>>> >>>> Maybe it is related with aa4d86163e4e(block: loop: switch to VFS ITER_BVEC),
>>> >>>> or it might be helpful to run 'git bisect' if reverting aa4d86163e4e can't
>>> >>>> fix the issue, suppose the issue can be reproduced easily.
>>> >>>
>>> >>> I can try reverting that and getting someone to test it. It is
>>> >>> somewhat complicated by having to spin a new install ISO, so a report
>>> >>> back will be somewhat delayed. In the meantime, I'm also asking
>>> >>> people to track down the first kernel build that hits this, so
>>> >>> hopefully that gives us more of a clue as well.
>>>
>>> The revert of that patch did not fix the issue.
>>>
>>> >>> It is odd that only 32-bit hits this issue though. At least from what
>>> >>> we've seen thus far.
>>> >>
>>> >> Page bounce may be just valid on 32-bit, and I will try to find one ARM
>>> >> box to see if it can be reproduced easily.
>>> >>
>>> >> BTW, are there any extra steps for reproducing the issue? Such as
>>> >> cgroup operations?
>>> >
>>> > I'm not entirely sure what the install environment on the ISOs is
>>> > doing, but nobody sees this issue with a kernel after install. Thus
>>> > far recreate efforts have focused on recreating the install ISOs using
>>> > various kernels. That is working, but I don't expect other people to
>>> > easily be able to do that.
>>> >
>>> > Also, our primary tester seems to have narrowed it down to breaking
>>> > somewhere between 4.1-rc5 (good) and 4.1-rc6 (bad). I'll be working
>>> > with him today to isolate it further, but the commit you pointed out
>>> > was in 4.1-rc1 and that worked. He still needs to test a 4.2-rc4
>>> > kernel with it reverted, but so far it seems to be something else that
>>> > came in with the 4.1 kernel.
>>>
>>> After doing some RPM bisecting, we've narrowed it down to the
>>> following commit range:
>>>
>>> [jwboyer@vader linux]$ git log --pretty=oneline c2102f3d73d8..0f1e5b5d19f6
>>> 0f1e5b5d19f6c06fe2078f946377db9861f3910d Merge tag 'dm-4.1-fixes-3' of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
>>> 1c220c69ce0dcc0f234a9f263ad9c0864f971852 dm: fix casting bug in dm_merge_bvec()
>>> 15b94a690470038aa08247eedbebbe7e2218d5ee dm: fix reload failure of 0
>>> path multipath mapping on blk-mq devices
>>> e5d8de32cc02a259e1a237ab57cba00f2930fa6a dm: fix false warning in
>>> free_rq_clone() for unmapped requests
>>> 45714fbed4556149d7f1730f5bae74f81d5e2cd5 dm: requeue from blk-mq
>>> dm_mq_queue_rq() using BLK_MQ_RQ_QUEUE_BUSY
>>> 4c6dd53dd3674c310d7379c6b3273daa9fd95c79 dm mpath: fix leak of
>>> dm_mpath_io structure in blk-mq .queue_rq error path
>>> 3a1407559a593d4360af12dd2df5296bf8eb0d28 dm: fix NULL pointer when
>>> clone_and_map_rq returns !DM_MAPIO_REMAPPED
>>> 4ae9944d132b160d444fa3aa875307eb0fa3eeec dm: run queue on re-queue
>>> [jwboyer@vader linux]$
>>>
>>> It is interesting to note that we're also carrying a patch in our 4.1
>>> kernel for loop performance reasons that went into upstream 4.2. That
>>> patch is blk-loop-avoid-too-many-pending-per-work-IO.patch which
>>> corresponds to upstream commit
>>> 4d4e41aef9429872ea3b105e83426941f7185ab6. All of those commits are in
>>> 4.2-rcX, which matches the failures we're seeing.
>>>
>>> We can try a 4.1-rc5 snapshot build without the block patch to see if
>>> that helps, but the patch was included in all the previously tested
>>> good kernels and the issue only appeared after the DM merge commits
>>> were included.
>>
>> The only commit that looks even remotely related (given 32bit concerns)
>> would be 1c220c69ce0dcc0f234a9f263ad9c0864f971852
>
> Confirmed. I built kernels for our tester that started with the
> working snapshot and applied the patches above one at a time. The
> failing patch was the commit you suspected.
>
> I can try and build a 4.2-rc4 kernel with that reverted, but it would
> be good if someone could start thinking about how that could cause
> this issue.
A revert on top of 4.2-rc4 booted. So this is currently causing
issues with upstream as well.
josh
* Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848
2015-08-02 14:01 ` Josh Boyer
@ 2015-08-03 14:28 ` Mike Snitzer
2015-08-03 16:56 ` Josh Boyer
0 siblings, 1 reply; 23+ messages in thread
From: Mike Snitzer @ 2015-08-03 14:28 UTC
To: Josh Boyer
Cc: ejt, Ming Lei, Johannes Weiner, Tejun Heo, Jens Axboe,
linux-kernel@vger.kernel.org
On Sun, Aug 02 2015 at 10:01P -0400,
Josh Boyer <jwboyer@fedoraproject.org> wrote:
> On Fri, Jul 31, 2015 at 2:58 PM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
> > On Thu, Jul 30, 2015 at 8:19 PM, Mike Snitzer <snitzer@redhat.com> wrote:
> >>
> >> The only commit that looks even remotely related (given 32bit concerns)
> >> would be 1c220c69ce0dcc0f234a9f263ad9c0864f971852
> >
> > Confirmed. I built kernels for our tester that started with the
> > working snapshot and applied the patches above one at a time. The
> > failing patch was the commit you suspected.
> >
> > I can try and build a 4.2-rc4 kernel with that reverted, but it would
> > be good if someone could start thinking about how that could cause
> > this issue.
>
> A revert on top of 4.2-rc4 booted. So this is currently causing
> issues with upstream as well.
Hi Josh,
I've staged the following fix in linux-next (for 4.2-rc6 inclusion):
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=76270d574acc897178a5c8be0bd2a743a77e4bac
Can you please verify that it works for your 32bit testcase against
4.2-rc4 (or rc5)?
Thanks.
From: Mike Snitzer <snitzer@redhat.com>
Date: Mon, 3 Aug 2015 09:54:58 -0400
Subject: [PATCH] dm: fix dm_merge_bvec regression on 32 bit systems
A DM regression on 32 bit systems was reported against v4.2-rc3 here:
https://lkml.org/lkml/2015/7/29/401
Fix this by reverting both commit 1c220c69 ("dm: fix casting bug in
dm_merge_bvec()") and 148e51ba ("dm: improve documentation and code
clarity in dm_merge_bvec"). This combined revert is done to eliminate
the possibility of a partial revert in stable@ kernels.
In hindsight the correct fix, at the time 1c220c69 was applied to fix
the regression that 148e51ba introduced, should've been to simply revert
148e51ba.
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.19+
---
drivers/md/dm.c | 27 ++++++++++-----------------
1 file changed, 10 insertions(+), 17 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index ab37ae1..0d7ab20 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1729,7 +1729,8 @@ static int dm_merge_bvec(struct request_queue *q,
struct mapped_device *md = q->queuedata;
struct dm_table *map = dm_get_live_table_fast(md);
struct dm_target *ti;
- sector_t max_sectors, max_size = 0;
+ sector_t max_sectors;
+ int max_size = 0;
if (unlikely(!map))
goto out;
@@ -1742,18 +1743,10 @@ static int dm_merge_bvec(struct request_queue *q,
* Find maximum amount of I/O that won't need splitting
*/
max_sectors = min(max_io_len(bvm->bi_sector, ti),
- (sector_t) queue_max_sectors(q));
+ (sector_t) BIO_MAX_SECTORS);
max_size = (max_sectors << SECTOR_SHIFT) - bvm->bi_size;
-
- /*
- * FIXME: this stop-gap fix _must_ be cleaned up (by passing a sector_t
- * to the targets' merge function since it holds sectors not bytes).
- * Just doing this as an interim fix for stable@ because the more
- * comprehensive cleanup of switching to sector_t will impact every
- * DM target that implements a ->merge hook.
- */
- if (max_size > INT_MAX)
- max_size = INT_MAX;
+ if (max_size < 0)
+ max_size = 0;
/*
* merge_bvec_fn() returns number of bytes
@@ -1761,13 +1754,13 @@ static int dm_merge_bvec(struct request_queue *q,
* max is precomputed maximal io size
*/
if (max_size && ti->type->merge)
- max_size = ti->type->merge(ti, bvm, biovec, (int) max_size);
+ max_size = ti->type->merge(ti, bvm, biovec, max_size);
/*
* If the target doesn't support merge method and some of the devices
- * provided their merge_bvec method (we know this by looking for the
- * max_hw_sectors that dm_set_device_limits may set), then we can't
- * allow bios with multiple vector entries. So always set max_size
- * to 0, and the code below allows just one page.
+ * provided their merge_bvec method (we know this by looking at
+ * queue_max_hw_sectors), then we can't allow bios with multiple vector
+ * entries. So always set max_size to 0, and the code below allows
+ * just one page.
*/
else if (queue_max_hw_sectors(q) <= PAGE_SIZE >> 9)
max_size = 0;
--
2.3.2 (Apple Git-55)
* Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848
2015-08-03 14:28 ` Mike Snitzer
@ 2015-08-03 16:56 ` Josh Boyer
2015-08-04 1:11 ` Josh Boyer
0 siblings, 1 reply; 23+ messages in thread
From: Josh Boyer @ 2015-08-03 16:56 UTC
To: Mike Snitzer
Cc: ejt, Ming Lei, Johannes Weiner, Tejun Heo, Jens Axboe,
linux-kernel@vger.kernel.org
On Mon, Aug 3, 2015 at 10:28 AM, Mike Snitzer <snitzer@redhat.com> wrote:
> On Sun, Aug 02 2015 at 10:01P -0400,
> Josh Boyer <jwboyer@fedoraproject.org> wrote:
>
>> On Fri, Jul 31, 2015 at 2:58 PM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
>> > On Thu, Jul 30, 2015 at 8:19 PM, Mike Snitzer <snitzer@redhat.com> wrote:
>> >>
>> >> The only commit that looks even remotely related (given 32bit concerns)
>> >> would be 1c220c69ce0dcc0f234a9f263ad9c0864f971852
>> >
>> > Confirmed. I built kernels for our tester that started with the
>> > working snapshot and applied the patches above one at a time. The
>> > failing patch was the commit you suspected.
>> >
>> > I can try and build a 4.2-rc4 kernel with that reverted, but it would
>> > be good if someone could start thinking about how that could cause
>> > this issue.
>>
>> A revert on top of 4.2-rc4 booted. So this is currently causing
>> issues with upstream as well.
>
> Hi Josh,
>
> I've staged the following fix in linux-next (for 4.2-rc6 inclusion):
> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=76270d574acc897178a5c8be0bd2a743a77e4bac
>
> Can you please verify that it works for your 32bit testcase against
> 4.2-rc4 (or rc5)?
Sure, I'll get a kernel with this included spun up and ask Adam to test.
josh
> From: Mike Snitzer <snitzer@redhat.com>
> Date: Mon, 3 Aug 2015 09:54:58 -0400
> Subject: [PATCH] dm: fix dm_merge_bvec regression on 32 bit systems
>
> A DM regression on 32 bit systems was reported against v4.2-rc3 here:
> https://lkml.org/lkml/2015/7/29/401
>
> Fix this by reverting both commit 1c220c69 ("dm: fix casting bug in
> dm_merge_bvec()") and 148e51ba ("dm: improve documentation and code
> clarity in dm_merge_bvec"). This combined revert is done to eliminate
> the possibility of a partial revert in stable@ kernels.
>
> In hindsight the correct fix, at the time 1c220c69 was applied to fix
> the regression that 148e51ba introduced, should've been to simply revert
> 148e51ba.
>
> Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
> Acked-by: Joe Thornber <ejt@redhat.com>
> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
> Cc: stable@vger.kernel.org # 3.19+
> ---
> drivers/md/dm.c | 27 ++++++++++-----------------
> 1 file changed, 10 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index ab37ae1..0d7ab20 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1729,7 +1729,8 @@ static int dm_merge_bvec(struct request_queue *q,
> struct mapped_device *md = q->queuedata;
> struct dm_table *map = dm_get_live_table_fast(md);
> struct dm_target *ti;
> - sector_t max_sectors, max_size = 0;
> + sector_t max_sectors;
> + int max_size = 0;
>
> if (unlikely(!map))
> goto out;
> @@ -1742,18 +1743,10 @@ static int dm_merge_bvec(struct request_queue *q,
> * Find maximum amount of I/O that won't need splitting
> */
> max_sectors = min(max_io_len(bvm->bi_sector, ti),
> - (sector_t) queue_max_sectors(q));
> + (sector_t) BIO_MAX_SECTORS);
> max_size = (max_sectors << SECTOR_SHIFT) - bvm->bi_size;
> -
> - /*
> - * FIXME: this stop-gap fix _must_ be cleaned up (by passing a sector_t
> - * to the targets' merge function since it holds sectors not bytes).
> - * Just doing this as an interim fix for stable@ because the more
> - * comprehensive cleanup of switching to sector_t will impact every
> - * DM target that implements a ->merge hook.
> - */
> - if (max_size > INT_MAX)
> - max_size = INT_MAX;
> + if (max_size < 0)
> + max_size = 0;
>
> /*
> * merge_bvec_fn() returns number of bytes
> @@ -1761,13 +1754,13 @@ static int dm_merge_bvec(struct request_queue *q,
> * max is precomputed maximal io size
> */
> if (max_size && ti->type->merge)
> - max_size = ti->type->merge(ti, bvm, biovec, (int) max_size);
> + max_size = ti->type->merge(ti, bvm, biovec, max_size);
> /*
> * If the target doesn't support merge method and some of the devices
> - * provided their merge_bvec method (we know this by looking for the
> - * max_hw_sectors that dm_set_device_limits may set), then we can't
> - * allow bios with multiple vector entries. So always set max_size
> - * to 0, and the code below allows just one page.
> + * provided their merge_bvec method (we know this by looking at
> + * queue_max_hw_sectors), then we can't allow bios with multiple vector
> + * entries. So always set max_size to 0, and the code below allows
> + * just one page.
> */
> else if (queue_max_hw_sectors(q) <= PAGE_SIZE >> 9)
> max_size = 0;
> --
> 2.3.2 (Apple Git-55)
>
* Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848
2015-08-03 16:56 ` Josh Boyer
@ 2015-08-04 1:11 ` Josh Boyer
2015-09-11 21:43 ` 32-bit bio regression with 4.3 [was: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848] Mike Snitzer
0 siblings, 1 reply; 23+ messages in thread
From: Josh Boyer @ 2015-08-04 1:11 UTC
To: Mike Snitzer
Cc: ejt, Ming Lei, Johannes Weiner, Tejun Heo, Jens Axboe,
linux-kernel@vger.kernel.org
On Mon, Aug 3, 2015 at 12:56 PM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
> On Mon, Aug 3, 2015 at 10:28 AM, Mike Snitzer <snitzer@redhat.com> wrote:
>> On Sun, Aug 02 2015 at 10:01P -0400,
>> Josh Boyer <jwboyer@fedoraproject.org> wrote:
>>
>>> On Fri, Jul 31, 2015 at 2:58 PM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
>>> > On Thu, Jul 30, 2015 at 8:19 PM, Mike Snitzer <snitzer@redhat.com> wrote:
>>> >>
>>> >> The only commit that looks even remotely related (given 32bit concerns)
>>> >> would be 1c220c69ce0dcc0f234a9f263ad9c0864f971852
>>> >
>>> > Confirmed. I built kernels for our tester that started with the
>>> > working snapshot and applied the patches above one at a time. The
>>> > failing patch was the commit you suspected.
>>> >
>>> > I can try and build a 4.2-rc4 kernel with that reverted, but it would
>>> > be good if someone could start thinking about how that could cause
>>> > this issue.
>>>
>>> A revert on top of 4.2-rc4 booted. So this is currently causing
>>> issues with upstream as well.
>>
>> Hi Josh,
>>
>> I've staged the following fix in linux-next (for 4.2-rc6 inclusion):
>> https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=76270d574acc897178a5c8be0bd2a743a77e4bac
>>
>> Can you please verify that it works for your 32bit testcase against
>> 4.2-rc4 (or rc5)?
>
> Sure, I'll get a kernel with this included spun up and ask Adam to test.
Adam tested this with success. If you're still collecting patch
metadata, adding:
Tested-by: Adam Williamson <awilliam@redhat.com>
would be appreciated.
josh
>> From: Mike Snitzer <snitzer@redhat.com>
>> Date: Mon, 3 Aug 2015 09:54:58 -0400
>> Subject: [PATCH] dm: fix dm_merge_bvec regression on 32 bit systems
>>
>> A DM regression on 32 bit systems was reported against v4.2-rc3 here:
>> https://lkml.org/lkml/2015/7/29/401
>>
>> Fix this by reverting both commit 1c220c69 ("dm: fix casting bug in
>> dm_merge_bvec()") and 148e51ba ("dm: improve documentation and code
>> clarity in dm_merge_bvec"). This combined revert is done to eliminate
>> the possibility of a partial revert in stable@ kernels.
>>
>> In hindsight the correct fix, at the time 1c220c69 was applied to fix
>> the regression that 148e51ba introduced, should've been to simply revert
>> 148e51ba.
>>
>> Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
>> Acked-by: Joe Thornber <ejt@redhat.com>
>> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
>> Cc: stable@vger.kernel.org # 3.19+
>> ---
>> drivers/md/dm.c | 27 ++++++++++-----------------
>> 1 file changed, 10 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>> index ab37ae1..0d7ab20 100644
>> --- a/drivers/md/dm.c
>> +++ b/drivers/md/dm.c
>> @@ -1729,7 +1729,8 @@ static int dm_merge_bvec(struct request_queue *q,
>> struct mapped_device *md = q->queuedata;
>> struct dm_table *map = dm_get_live_table_fast(md);
>> struct dm_target *ti;
>> - sector_t max_sectors, max_size = 0;
>> + sector_t max_sectors;
>> + int max_size = 0;
>>
>> if (unlikely(!map))
>> goto out;
>> @@ -1742,18 +1743,10 @@ static int dm_merge_bvec(struct request_queue *q,
>> * Find maximum amount of I/O that won't need splitting
>> */
>> max_sectors = min(max_io_len(bvm->bi_sector, ti),
>> - (sector_t) queue_max_sectors(q));
>> + (sector_t) BIO_MAX_SECTORS);
>> max_size = (max_sectors << SECTOR_SHIFT) - bvm->bi_size;
>> -
>> - /*
>> - * FIXME: this stop-gap fix _must_ be cleaned up (by passing a sector_t
>> - * to the targets' merge function since it holds sectors not bytes).
>> - * Just doing this as an interim fix for stable@ because the more
>> - * comprehensive cleanup of switching to sector_t will impact every
>> - * DM target that implements a ->merge hook.
>> - */
>> - if (max_size > INT_MAX)
>> - max_size = INT_MAX;
>> + if (max_size < 0)
>> + max_size = 0;
>>
>> /*
>> * merge_bvec_fn() returns number of bytes
>> @@ -1761,13 +1754,13 @@ static int dm_merge_bvec(struct request_queue *q,
>> * max is precomputed maximal io size
>> */
>> if (max_size && ti->type->merge)
>> - max_size = ti->type->merge(ti, bvm, biovec, (int) max_size);
>> + max_size = ti->type->merge(ti, bvm, biovec, max_size);
>> /*
>> * If the target doesn't support merge method and some of the devices
>> - * provided their merge_bvec method (we know this by looking for the
>> - * max_hw_sectors that dm_set_device_limits may set), then we can't
>> - * allow bios with multiple vector entries. So always set max_size
>> - * to 0, and the code below allows just one page.
>> + * provided their merge_bvec method (we know this by looking at
>> + * queue_max_hw_sectors), then we can't allow bios with multiple vector
>> + * entries. So always set max_size to 0, and the code below allows
>> + * just one page.
>> */
>> else if (queue_max_hw_sectors(q) <= PAGE_SIZE >> 9)
>> max_size = 0;
>> --
>> 2.3.2 (Apple Git-55)
>>
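For readers following along, the failure mode the patch above guards against can be modeled in isolation. This is a toy sketch, not the kernel's code: the helper names are invented, and BIO_MAX_SECTORS is assumed to be 2048 (1 MiB of 512-byte sectors), as in kernels of that era. It shows how a byte count computed inside a 32-bit sector_t can wrap, and how bounding the sector count first and clamping a signed result avoids it:

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9
#define BIO_MAX_SECTORS 2048u   /* 1 MiB of 512-byte sectors (assumed) */

typedef uint32_t sector_t32;    /* sector_t on a 32-bit build */

/* Pre-fix shape: the shift happens inside the 32-bit type and can wrap
 * before the INT_MAX clamp ever sees the value. */
static uint32_t max_size_buggy(sector_t32 max_sectors, uint32_t bi_size)
{
    sector_t32 max_size = (max_sectors << SECTOR_SHIFT) - bi_size;

    if (max_size > INT32_MAX)   /* clamp after the damage is done */
        max_size = INT32_MAX;
    return max_size;
}

/* Post-fix shape: bound the sector count first, keep the byte count in
 * a signed int, and clamp negative results to zero. */
static int max_size_fixed(sector_t32 max_sectors, uint32_t bi_size)
{
    int max_size;

    if (max_sectors > BIO_MAX_SECTORS)
        max_sectors = BIO_MAX_SECTORS;
    max_size = (int)((max_sectors << SECTOR_SHIFT) - bi_size);
    if (max_size < 0)
        max_size = 0;
    return max_size;
}
```

On 64-bit builds sector_t is 64 bits wide and the shift cannot wrap, which is consistent with the report that only 32-bit machines hit the oops.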
^ permalink raw reply [flat|nested] 23+ messages in thread
* 32-bit bio regression with 4.3 [was: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848]
2015-08-04 1:11 ` Josh Boyer
@ 2015-09-11 21:43 ` Mike Snitzer
2015-09-11 21:50 ` Adam Williamson
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Mike Snitzer @ 2015-09-11 21:43 UTC (permalink / raw)
To: Josh Boyer
Cc: ejt, Ming Lei, Johannes Weiner, Tejun Heo, Jens Axboe,
linux-kernel@vger.kernel.org, mlin, awilliam
Ming, Jens, others:
Please see this BZ comment that speaks to a 4.3 regression due to the
late bio splitting changes:
https://bugzilla.redhat.com/show_bug.cgi?id=1247382#c41
But inlined here so we can continue on list:
(In reply to Josh Boyer from comment #40)
> The function that was fixed in 4.2 doesn't exist any longer in
> 4.3.0-0.rc0.git6.1.fc24. That kernel corresponds to Linux
> v4.2-6105-gdd5cdb48edfd which contains commit
> 8ae126660fddbeebb9251a174e6fa45b6ad8f932, which removed it completely. So
> whatever fix was made in dm_merge_bvec doesn't seem to have made it to
> whatever replaced it.
The dm core fix to dm_merge_bvec was commit bd4aaf8f9b ("dm: fix
dm_merge_bvec regression on 32 bit systems"). But I'm not sure there is
a clear equivalent in the late bio splitting code that replaced block
core's merge_bvec logic.
merge_bvec was all about limiting bios (by asking "can/should this page
be added to this bio?") whereas the late bio splitting is more "build
the bios as large as possible and worry about splitting later".
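The contrast described above can be sketched with a toy model (illustrative only; plain integers stand in for bios, and the names are invented): merge_bvec-style code gates each page before it is added, while late splitting accepts everything and divides the result afterwards.

```c
#include <assert.h>

/* merge_bvec style: ask before adding each page, so a bio never grows
 * past the device's per-request segment limit. */
static int can_add_page(int cur_segs, int max_segs)
{
    return cur_segs < max_segs;
}

/* Late-splitting style: build the bio as large as the caller likes,
 * then compute how many pieces it must be split into to satisfy the
 * same limit. */
static int split_count(int nsegs, int max_segs)
{
    return (nsegs + max_segs - 1) / max_segs;   /* ceiling division */
}
```

Both strategies end up enforcing the same limit; they differ only in when it is applied.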
Regardless, this regression needs to be reported to Ming Lin
<ming.l@ssi.samsung.com>, Jens Axboe and the others involved in
maintaining the late bio splitting changes in block core.
Josh and/or Adam: it would _really_ help if the regression test you guys
are using could be handed over and/or explained to us. Is it as simple
as booting a 32-bit kernel with a particular config? Can you share the
guest image if it is small enough?
Mike
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: 32-bit bio regression with 4.3 [was: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848]
2015-09-11 21:43 ` 32-bit bio regression with 4.3 [was: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848] Mike Snitzer
@ 2015-09-11 21:50 ` Adam Williamson
2015-09-12 4:43 ` Ming Lin
2015-09-12 13:19 ` Ming Lei
2 siblings, 0 replies; 23+ messages in thread
From: Adam Williamson @ 2015-09-11 21:50 UTC (permalink / raw)
To: Mike Snitzer, Josh Boyer
Cc: ejt, Ming Lei, Johannes Weiner, Tejun Heo, Jens Axboe,
linux-kernel@vger.kernel.org, mlin
On Fri, 2015-09-11 at 17:43 -0400, Mike Snitzer wrote:
> Josh and/or Adam: it would _really_ help if the regression test you
> guys
> are using could be handed-over and/or explained to us. Is it as
> simple
> as loading a 32bit with a particular config? Can you share the guest
> image if it is small enough?
The test is 'grab a Fedora 32-bit nightly boot.iso (network install)
image and try and boot it'. You can watch it failing in glorious video
technicolor here (the video is sped up, but you can step through frame-
by-frame to catch the errors):
https://openqa.happyassassin.net/tests/4631/file/video.ogv
You can download an affected image here:
https://kojipkgs.fedoraproject.org/mash/rawhide-20150911/rawhide/i386/os/images/boot.iso
I guess strictly speaking we're only sure it fails when booted in a
qemu-kvm VM, but I think we did hit the previous incarnation in bare
metal testing too (so far I haven't tried the 4.3 incarnation on
metal).
We don't keep the nightly ISOs around forever, but it'll be there for
at least a couple of weeks.
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: 32-bit bio regression with 4.3 [was: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848]
2015-09-11 21:43 ` 32-bit bio regression with 4.3 [was: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848] Mike Snitzer
2015-09-11 21:50 ` Adam Williamson
@ 2015-09-12 4:43 ` Ming Lin
2015-09-12 7:34 ` Ming Lin
2015-09-12 13:19 ` Ming Lei
2 siblings, 1 reply; 23+ messages in thread
From: Ming Lin @ 2015-09-12 4:43 UTC (permalink / raw)
To: Mike Snitzer
Cc: Josh Boyer, Joe Thornber, Ming Lei, Johannes Weiner, Tejun Heo,
Jens Axboe, linux-kernel@vger.kernel.org, awilliam
On Fri, Sep 11, 2015 at 2:43 PM, Mike Snitzer <snitzer@redhat.com> wrote:
> Ming, Jens, others:
>
> Please see this BZ comment that speaks to a 4.3 regression due to the
> late bio splitting changes:
> https://bugzilla.redhat.com/show_bug.cgi?id=1247382#c41
>
> But inlined here so we can continue on list:
> (In reply to Josh Boyer from comment #40)
>> The function that was fixed in 4.2 doesn't exist any longer in
>> 4.3.0-0.rc0.git6.1.fc24. That kernel corresponds to Linux
>> v4.2-6105-gdd5cdb48edfd which contains commit
>> 8ae126660fddbeebb9251a174e6fa45b6ad8f932, which removed it completely. So
>> whatever fix was made in dm_merge_bvec doesn't seem to have made it to
>> whatever replaced it.
>
> The dm core fix to dm_merge_bvec was commit bd4aaf8f9b ("dm: fix
> dm_merge_bvec regression on 32 bit systems"). But I'm not sure there is
> a clear equivalent in the late bio splitting code that replaced block
> core's merge_bvec logic.
>
> merge_bvec was all about limiting bios (by asking "can/should this page
> be added to this bio?") whereas the late bio splitting is more "build
> the bios as large as possible and worry about splitting later".
>
> Regardless, this regression needs to be reported to Ming Lin
> <ming.l@ssi.samsung.com>, Jens Axboe and the others involved in
> maintaining the late bio splitting changes in block core.
I'm looking at it now.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: 32-bit bio regression with 4.3 [was: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848]
2015-09-12 4:43 ` Ming Lin
@ 2015-09-12 7:34 ` Ming Lin
2015-09-12 7:52 ` Ming Lin
0 siblings, 1 reply; 23+ messages in thread
From: Ming Lin @ 2015-09-12 7:34 UTC (permalink / raw)
To: Mike Snitzer
Cc: Josh Boyer, Joe Thornber, Ming Lei, Johannes Weiner, Tejun Heo,
Jens Axboe, linux-kernel@vger.kernel.org, awilliam
On Fri, 2015-09-11 at 21:43 -0700, Ming Lin wrote:
> On Fri, Sep 11, 2015 at 2:43 PM, Mike Snitzer <snitzer@redhat.com> wrote:
> > Ming, Jens, others:
> >
> > Please see this BZ comment that speaks to a 4.3 regression due to the
> > late bio splitting changes:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1247382#c41
> >
> > But inlined here so we can continue on list:
> > (In reply to Josh Boyer from comment #40)
> >> The function that was fixed in 4.2 doesn't exist any longer in
> >> 4.3.0-0.rc0.git6.1.fc24. That kernel corresponds to Linux
> >> v4.2-6105-gdd5cdb48edfd which contains commit
> >> 8ae126660fddbeebb9251a174e6fa45b6ad8f932, which removed it completely. So
> >> whatever fix was made in dm_merge_bvec doesn't seem to have made it to
> >> whatever replaced it.
> >
> > The dm core fix to dm_merge_bvec was commit bd4aaf8f9b ("dm: fix
> > dm_merge_bvec regression on 32 bit systems"). But I'm not sure there is
> > a clear equivalent in the late bio splitting code that replaced block
> > core's merge_bvec logic.
> >
> > merge_bvec was all about limiting bios (by asking "can/should this page
> > be added to this bio?") whereas the late bio splitting is more "build
> > the bios as large as possible and worry about splitting later".
> >
> > Regardless, this regression needs to be reported to Ming Lin
> > <ming.l@ssi.samsung.com>, Jens Axboe and the others involved in
> > maintaining the late bio splitting changes in block core.
>
> I'm looking at it now.
I tried the rawhide-20150903 and rawhide-20150904 boot.iso images.
The 0903 boot.iso is OK, but the 0904 boot.iso just gets stuck at
"Reached target Basic System", so I can't see the panic.
http://www.minggr.net/pub/20150912/rawhide-20150904-boot.iso.png
I'll run the test on a 32-bit VM to see if I can reproduce the bug.
Adam,
Could you also help to confirm that commit 7140aaf is OK and commit
8ae1266 is bad?
Thanks,
Ming
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: 32-bit bio regression with 4.3 [was: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848]
2015-09-12 7:34 ` Ming Lin
@ 2015-09-12 7:52 ` Ming Lin
0 siblings, 0 replies; 23+ messages in thread
From: Ming Lin @ 2015-09-12 7:52 UTC (permalink / raw)
To: awilliam
Cc: Mike Snitzer, Josh Boyer, Joe Thornber, Ming Lei, Johannes Weiner,
Tejun Heo, Jens Axboe, linux-kernel@vger.kernel.org
On Sat, Sep 12, 2015 at 12:34 AM, Ming Lin <mlin@kernel.org> wrote:
> On Fri, 2015-09-11 at 21:43 -0700, Ming Lin wrote:
>> On Fri, Sep 11, 2015 at 2:43 PM, Mike Snitzer <snitzer@redhat.com> wrote:
>> > Ming, Jens, others:
>> >
>> > Please see this BZ comment that speaks to a 4.3 regression due to the
>> > late bio splitting changes:
>> > https://bugzilla.redhat.com/show_bug.cgi?id=1247382#c41
>> >
>> > But inlined here so we can continue on list:
>> > (In reply to Josh Boyer from comment #40)
>> >> The function that was fixed in 4.2 doesn't exist any longer in
>> >> 4.3.0-0.rc0.git6.1.fc24. That kernel corresponds to Linux
>> >> v4.2-6105-gdd5cdb48edfd which contains commit
>> >> 8ae126660fddbeebb9251a174e6fa45b6ad8f932, which removed it completely. So
>> >> whatever fix was made in dm_merge_bvec doesn't seem to have made it to
>> >> whatever replaced it.
>> >
>> > The dm core fix to dm_merge_bvec was commit bd4aaf8f9b ("dm: fix
>> > dm_merge_bvec regression on 32 bit systems"). But I'm not sure there is
>> > a clear equivalent in the late bio splitting code that replaced block
>> > core's merge_bvec logic.
>> >
>> > merge_bvec was all about limiting bios (by asking "can/should this page
>> > be added to this bio?") whereas the late bio splitting is more "build
>> > the bios as large as possible and worry about splitting later".
>> >
>> > Regardless, this regression needs to be reported to Ming Lin
>> > <ming.l@ssi.samsung.com>, Jens Axboe and the others involved in
>> > maintaining the late bio splitting changes in block core.
>>
>> I'm looking at it now.
>
> I tried the rawhide-20150903 and rawhide-20150904 boot.iso images.
> The 0903 boot.iso is OK, but the 0904 boot.iso just gets stuck at
> "Reached target Basic System", so I can't see the panic.
> http://www.minggr.net/pub/20150912/rawhide-20150904-boot.iso.png
>
> I'll run the test on a 32-bit VM to see if I can reproduce the bug.
>
> Adam,
>
> Could you also help to confirm that commit 7140aaf is OK and commit
> 8ae1266 is bad?
I meant: please confirm that "commit 7140aaf + git cherry-pick bd4aaf8"
is OK and that commit 8ae1266 is bad.
Thanks.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: 32-bit bio regression with 4.3 [was: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848]
2015-09-11 21:43 ` 32-bit bio regression with 4.3 [was: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848] Mike Snitzer
2015-09-11 21:50 ` Adam Williamson
2015-09-12 4:43 ` Ming Lin
@ 2015-09-12 13:19 ` Ming Lei
2015-09-15 12:14 ` Josh Boyer
2 siblings, 1 reply; 23+ messages in thread
From: Ming Lei @ 2015-09-12 13:19 UTC (permalink / raw)
To: Mike Snitzer
Cc: Josh Boyer, ejt, Johannes Weiner, Tejun Heo, Jens Axboe,
linux-kernel@vger.kernel.org, mlin, awilliam, tom.leiming
On Fri, 11 Sep 2015 17:43:15 -0400
Mike Snitzer <snitzer@redhat.com> wrote:
> Ming, Jens, others:
>
> Please see this BZ comment that speaks to a 4.3 regression due to the
> late bio splitting changes:
> https://bugzilla.redhat.com/show_bug.cgi?id=1247382#c41
I think it is a bug in bounce_end_io, and the following patch may
fix it.
----
From 08df0db0be41e6bea306bcf5b4d325f5a79dc7a1 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@canonical.com>
Date: Sat, 12 Sep 2015 20:48:42 +0800
Subject: [PATCH] block: fix bounce_end_io
When bio bounce is involved, a new bio and its io vector are
cloned from the incoming bio, which may itself be a fast-cloned
bio whose io vector is shared with another bio, especially now
that bio_split() has been introduced.
So it is wrong to assume that the start index of the original
bio's io vector is zero; it can be any value between 0 and
(bi_max_vecs - 1), especially after a bio split.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
block/bounce.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/block/bounce.c b/block/bounce.c
index 0611aea..1cb5dd3 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -128,12 +128,14 @@ static void bounce_end_io(struct bio *bio, mempool_t *pool)
struct bio *bio_orig = bio->bi_private;
struct bio_vec *bvec, *org_vec;
int i;
+ int start = bio_orig->bi_iter.bi_idx;
/*
* free up bounce indirect pages used
*/
bio_for_each_segment_all(bvec, bio, i) {
- org_vec = bio_orig->bi_io_vec + i;
+ org_vec = bio_orig->bi_io_vec + i + start;
+
if (bvec->bv_page == org_vec->bv_page)
continue;
--
1.9.1
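The indexing error this patch fixes can be reproduced in a toy model (the names and data below are invented for illustration; plain arrays stand in for the bio_vec arrays): when the original bio came from bio_split(), its vector effectively begins at bi_idx, so comparing bounce segments against slot 0 misidentifies which pages are bounce copies.

```c
#include <assert.h>
#include <stddef.h>

struct bvec { void *page; };

/* Count how many bounce segments differ from the matching original
 * segment, i.e. how many pages bounce_end_io would need to free.
 * orig_start plays the role of bio_orig->bi_iter.bi_idx. */
static int count_bounced(const struct bvec *bounce, int nseg,
                         const struct bvec *orig_vec, int orig_start)
{
    int i, bounced = 0;

    for (i = 0; i < nseg; i++)
        if (bounce[i].page != orig_vec[i + orig_start].page)
            bounced++;
    return bounced;
}

/* Four original pages; a split bio covers slots 2..3 (bi_idx == 2).
 * The bounce bio shares page c with the original and bounces page d. */
static char pg_a, pg_b, pg_c, pg_d, pg_bounce;
static const struct bvec orig_vec[4] =
    { { &pg_a }, { &pg_b }, { &pg_c }, { &pg_d } };
static const struct bvec bounce_vec[2] = { { &pg_c }, { &pg_bounce } };
```

With the offset wrongly taken as zero, the shared page looks like a bounce page and would be freed while still in use, which lines up with the "page still charged to cgroup" bad page state symptoms.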
> But inlined here so we can continue on list:
> (In reply to Josh Boyer from comment #40)
> > The function that was fixed in 4.2 doesn't exist any longer in
> > 4.3.0-0.rc0.git6.1.fc24. That kernel corresponds to Linux
> > v4.2-6105-gdd5cdb48edfd which contains commit
> > 8ae126660fddbeebb9251a174e6fa45b6ad8f932, which removed it completely. So
> > whatever fix was made in dm_merge_bvec doesn't seem to have made it to
> > whatever replaced it.
>
> The dm core fix to dm_merge_bvec was commit bd4aaf8f9b ("dm: fix
> dm_merge_bvec regression on 32 bit systems"). But I'm not sure there is
> a clear equivalent in the late bio splitting code that replaced block
> core's merge_bvec logic.
>
> merge_bvec was all about limiting bios (by asking "can/should this page
> be added to this bio?") whereas the late bio splitting is more "build
> the bios as large as possible and worry about splitting later".
IMO, given that one vector can only point to one page, there
shouldn't be any difference between the two.
>
> Regardless, this regression needs to be reported to Ming Lin
> <ming.l@ssi.samsung.com>, Jens Axboe and the others involved in
> maintaining the late bio splitting changes in block core.
>
> Josh and/or Adam: it would _really_ help if the regression test you guys
> are using could be handed-over and/or explained to us. Is it as simple
> as loading a 32bit with a particular config? Can you share the guest
> image if it is small enough?
Josh, Adam, would you mind testing the above patch to see if it can fix
your issue?
Thanks,
Ming
>
> Mike
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: 32-bit bio regression with 4.3 [was: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848]
2015-09-12 13:19 ` Ming Lei
@ 2015-09-15 12:14 ` Josh Boyer
2015-09-16 17:56 ` Josh Boyer
0 siblings, 1 reply; 23+ messages in thread
From: Josh Boyer @ 2015-09-15 12:14 UTC (permalink / raw)
To: Ming Lei
Cc: Mike Snitzer, ejt, Johannes Weiner, Tejun Heo, Jens Axboe,
linux-kernel@vger.kernel.org, mlin, Adam Williamson,
tom.leiming
On Sat, Sep 12, 2015 at 9:19 AM, Ming Lei <ming.lei@canonical.com> wrote:
> On Fri, 11 Sep 2015 17:43:15 -0400
> Mike Snitzer <snitzer@redhat.com> wrote:
>
>> Ming, Jens, others:
>>
>> Please see this BZ comment that speaks to a 4.3 regression due to the
>> late bio splitting changes:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1247382#c41
>
> I think it is a bug in bounce_end_io, and the following patch may
> fix it.
>
> ----
> From 08df0db0be41e6bea306bcf5b4d325f5a79dc7a1 Mon Sep 17 00:00:00 2001
> From: Ming Lei <ming.lei@canonical.com>
> Date: Sat, 12 Sep 2015 20:48:42 +0800
> Subject: [PATCH] block: fix bounce_end_io
>
> When bio bounce is involved, a new bio and its io vector are
> cloned from the incoming bio, which may itself be a fast-cloned
> bio whose io vector is shared with another bio, especially now
> that bio_split() has been introduced.
>
> So it is wrong to assume that the start index of the original
> bio's io vector is zero; it can be any value between 0 and
> (bi_max_vecs - 1), especially after a bio split.
>
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
> block/bounce.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/block/bounce.c b/block/bounce.c
> index 0611aea..1cb5dd3 100644
> --- a/block/bounce.c
> +++ b/block/bounce.c
> @@ -128,12 +128,14 @@ static void bounce_end_io(struct bio *bio, mempool_t *pool)
> struct bio *bio_orig = bio->bi_private;
> struct bio_vec *bvec, *org_vec;
> int i;
> + int start = bio_orig->bi_iter.bi_idx;
>
> /*
> * free up bounce indirect pages used
> */
> bio_for_each_segment_all(bvec, bio, i) {
> - org_vec = bio_orig->bi_io_vec + i;
> + org_vec = bio_orig->bi_io_vec + i + start;
> +
> if (bvec->bv_page == org_vec->bv_page)
> continue;
>
> --
> 1.9.1
>
>> But inlined here so we can continue on list:
>> (In reply to Josh Boyer from comment #40)
>> > The function that was fixed in 4.2 doesn't exist any longer in
>> > 4.3.0-0.rc0.git6.1.fc24. That kernel corresponds to Linux
>> > v4.2-6105-gdd5cdb48edfd which contains commit
>> > 8ae126660fddbeebb9251a174e6fa45b6ad8f932, which removed it completely. So
>> > whatever fix was made in dm_merge_bvec doesn't seem to have made it to
>> > whatever replaced it.
>>
>> The dm core fix to dm_merge_bvec was commit bd4aaf8f9b ("dm: fix
>> dm_merge_bvec regression on 32 bit systems"). But I'm not sure there is
>> a clear equivalent in the late bio splitting code that replaced block
>> core's merge_bvec logic.
>>
>> merge_bvec was all about limiting bios (by asking "can/should this page
>> be added to this bio?") whereas the late bio splitting is more "build
>> the bios as large as possible and worry about splitting later".
>
> IMO, given that one vector can only point to one page, there
> shouldn't be any difference between the two.
>
>>
>> Regardless, this regression needs to be reported to Ming Lin
>> <ming.l@ssi.samsung.com>, Jens Axboe and the others involved in
>> maintaining the late bio splitting changes in block core.
>>
>> Josh and/or Adam: it would _really_ help if the regression test you guys
>> are using could be handed-over and/or explained to us. Is it as simple
>> as loading a 32bit with a particular config? Can you share the guest
>> image if it is small enough?
>
> Josh, Adam, would you mind testing the above patch to see if it can fix
> your issue?
Sorry for the delay in reply. I'll try and work with Adam today to
get this tested.
josh
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: 32-bit bio regression with 4.3 [was: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848]
2015-09-15 12:14 ` Josh Boyer
@ 2015-09-16 17:56 ` Josh Boyer
2015-09-17 15:24 ` Adam Williamson
0 siblings, 1 reply; 23+ messages in thread
From: Josh Boyer @ 2015-09-16 17:56 UTC (permalink / raw)
To: Ming Lei
Cc: Mike Snitzer, ejt, Johannes Weiner, Tejun Heo, Jens Axboe,
linux-kernel@vger.kernel.org, mlin, Adam Williamson,
tom.leiming
On Tue, Sep 15, 2015 at 8:14 AM, Josh Boyer <jwboyer@fedoraproject.org> wrote:
> On Sat, Sep 12, 2015 at 9:19 AM, Ming Lei <ming.lei@canonical.com> wrote:
>> On Fri, 11 Sep 2015 17:43:15 -0400
>> Mike Snitzer <snitzer@redhat.com> wrote:
>>
>>> Ming, Jens, others:
>>>
>>> Please see this BZ comment that speaks to a 4.3 regression due to the
>>> late bio splitting changes:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1247382#c41
>>
>> I think it is a bug in bounce_end_io, and the following patch may
>> fix it.
>>
>> ----
>> From 08df0db0be41e6bea306bcf5b4d325f5a79dc7a1 Mon Sep 17 00:00:00 2001
>> From: Ming Lei <ming.lei@canonical.com>
>> Date: Sat, 12 Sep 2015 20:48:42 +0800
>> Subject: [PATCH] block: fix bounce_end_io
>>
>> When bio bounce is involved, a new bio and its io vector are
>> cloned from the incoming bio, which may itself be a fast-cloned
>> bio whose io vector is shared with another bio, especially now
>> that bio_split() has been introduced.
>>
>> So it is wrong to assume that the start index of the original
>> bio's io vector is zero; it can be any value between 0 and
>> (bi_max_vecs - 1), especially after a bio split.
>>
>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>> ---
>> block/bounce.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/bounce.c b/block/bounce.c
>> index 0611aea..1cb5dd3 100644
>> --- a/block/bounce.c
>> +++ b/block/bounce.c
>> @@ -128,12 +128,14 @@ static void bounce_end_io(struct bio *bio, mempool_t *pool)
>> struct bio *bio_orig = bio->bi_private;
>> struct bio_vec *bvec, *org_vec;
>> int i;
>> + int start = bio_orig->bi_iter.bi_idx;
>>
>> /*
>> * free up bounce indirect pages used
>> */
>> bio_for_each_segment_all(bvec, bio, i) {
>> - org_vec = bio_orig->bi_io_vec + i;
>> + org_vec = bio_orig->bi_io_vec + i + start;
>> +
>> if (bvec->bv_page == org_vec->bv_page)
>> continue;
>>
>> --
>> 1.9.1
>>
>>> But inlined here so we can continue on list:
>>> (In reply to Josh Boyer from comment #40)
>>> > The function that was fixed in 4.2 doesn't exist any longer in
>>> > 4.3.0-0.rc0.git6.1.fc24. That kernel corresponds to Linux
>>> > v4.2-6105-gdd5cdb48edfd which contains commit
>>> > 8ae126660fddbeebb9251a174e6fa45b6ad8f932, which removed it completely. So
>>> > whatever fix was made in dm_merge_bvec doesn't seem to have made it to
>>> > whatever replaced it.
>>>
>>> The dm core fix to dm_merge_bvec was commit bd4aaf8f9b ("dm: fix
>>> dm_merge_bvec regression on 32 bit systems"). But I'm not sure there is
>>> a clear equivalent in the late bio splitting code that replaced block
>>> core's merge_bvec logic.
>>>
>>> merge_bvec was all about limiting bios (by asking "can/should this page
>>> be added to this bio?") whereas the late bio splitting is more "build
>>> the bios as large as possible and worry about splitting later".
>>
>> IMO, given that one vector can only point to one page, there
>> shouldn't be any difference between the two.
>>
>>>
>>> Regardless, this regression needs to be reported to Ming Lin
>>> <ming.l@ssi.samsung.com>, Jens Axboe and the others involved in
>>> maintaining the late bio splitting changes in block core.
>>>
>>> Josh and/or Adam: it would _really_ help if the regression test you guys
>>> are using could be handed-over and/or explained to us. Is it as simple
>>> as loading a 32bit with a particular config? Can you share the guest
>>> image if it is small enough?
>>
>> Josh, Adam, would you mind testing the above patch to see if it can fix
>> your issue?
>
> Sorry for the delay in reply. I'll try and work with Adam today to
> get this tested.
FWIW, reproducing the environment to recreate this is rather difficult
at the moment for reasons unrelated to the kernel. We're going to add
the patch to our rawhide kernel so it gets pulled into tomorrow's
compose. We'll test it as soon as it is available and let you know.
josh
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: 32-bit bio regression with 4.3 [was: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848]
2015-09-16 17:56 ` Josh Boyer
@ 2015-09-17 15:24 ` Adam Williamson
2015-09-17 15:51 ` Ming Lei
0 siblings, 1 reply; 23+ messages in thread
From: Adam Williamson @ 2015-09-17 15:24 UTC (permalink / raw)
To: Josh Boyer, Ming Lei
Cc: Mike Snitzer, ejt, Johannes Weiner, Tejun Heo, Jens Axboe,
linux-kernel@vger.kernel.org, mlin, tom.leiming
On Wed, 2015-09-16 at 13:56 -0400, Josh Boyer wrote:
> > > Josh, Adam, would you mind testing the above patch to see if it
> > > can fix
> > > your issue?
> >
> > Sorry for the delay in reply. I'll try and work with Adam today to
> > get this tested.
>
> FWIW, reproducing the environment to recreate this is rather
> difficult
> at the moment for reasons unrelated to the kernel. We're going to
> add
> the patch to our rawhide kernel so it gets pulled into tomorrow's
> compose. We'll test it as soon as it is available and let you know.
The fix looks good in testing - today's 32-bit Rawhide installer
images boot successfully. Thanks.
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: 32-bit bio regression with 4.3 [was: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848]
2015-09-17 15:24 ` Adam Williamson
@ 2015-09-17 15:51 ` Ming Lei
0 siblings, 0 replies; 23+ messages in thread
From: Ming Lei @ 2015-09-17 15:51 UTC (permalink / raw)
To: Adam Williamson
Cc: Josh Boyer, Mike Snitzer, ejt, Johannes Weiner, Tejun Heo,
Jens Axboe, linux-kernel@vger.kernel.org, mlin
On Thu, Sep 17, 2015 at 11:24 PM, Adam Williamson <awilliam@redhat.com> wrote:
> On Wed, 2015-09-16 at 13:56 -0400, Josh Boyer wrote:
>
>> > > Josh, Adam, would you mind testing the above patch to see if it
>> > > can fix
>> > > your issue?
>> >
>> > Sorry for the delay in reply. I'll try and work with Adam today to
>> > get this tested.
>>
>> FWIW, reproducing the environment to recreate this is rather
>> difficult
>> at the moment for reasons unrelated to the kernel. We're going to
>> add
>> the patch to our rawhide kernel so it gets pulled into tomorrow's
>> compose. We'll test it as soon as it is available and let you know.
>
> The fix looks good in testing - today's 32-bit Rawhide installer
> images boot successfully. Thanks.
That is great. Thanks for your testing; I'll prepare a formal version
for merging.
Thanks,
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2015-09-17 15:51 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-29 13:27 cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848 Josh Boyer
2015-07-29 13:51 ` Johannes Weiner
2015-07-29 15:32 ` Ming Lei
2015-07-29 16:36 ` Josh Boyer
2015-07-30 0:29 ` Ming Lei
2015-07-30 11:27 ` Josh Boyer
2015-07-30 23:14 ` Josh Boyer
2015-07-31 0:19 ` Mike Snitzer
2015-07-31 18:58 ` Josh Boyer
2015-08-02 14:01 ` Josh Boyer
2015-08-03 14:28 ` Mike Snitzer
2015-08-03 16:56 ` Josh Boyer
2015-08-04 1:11 ` Josh Boyer
2015-09-11 21:43 ` 32-bit bio regression with 4.3 [was: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848] Mike Snitzer
2015-09-11 21:50 ` Adam Williamson
2015-09-12 4:43 ` Ming Lin
2015-09-12 7:34 ` Ming Lin
2015-09-12 7:52 ` Ming Lin
2015-09-12 13:19 ` Ming Lei
2015-09-15 12:14 ` Josh Boyer
2015-09-16 17:56 ` Josh Boyer
2015-09-17 15:24 ` Adam Williamson
2015-09-17 15:51 ` Ming Lei
This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).