* kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host
From: Eryu Guan @ 2015-12-11 12:01 UTC
To: linux-scsi; +Cc: Martin K. Petersen

Hi,

I saw this kernel BUG_ON on 4.4-rc4 kernel, and this can be reproduced
easily on ppc64 host by:

	modprobe scsi_debug sector_size=512 physblk_exp=3 dev_size_mb=256

And I bisected to this commit

commit ca369d51b3e1649be4a72addd6d6a168cfb3f537
Author: Martin K. Petersen <martin.petersen@oracle.com>
Date:   Fri Nov 13 16:46:48 2015 -0500

    block/sd: Fix device-imposed transfer length limits

I confirmed by reverting this commit on top of 4.4-rc4 kernel and test
passed.

Thanks,
Eryu

P.S. dmesg log

[ 817.477557] scsi_debug:sdebug_driver_probe: host protection
[ 817.477571] scsi host1: scsi_debug, version 1.85 [20141022], dev_size_mb=256, opts=0x0
[ 817.478202] scsi 1:0:0:0: Direct-Access Linux scsi_debug 0184 PQ: 0 ANSI: 6
[ 817.478733] sd 1:0:0:0: Attached scsi generic sg1 type 0
[ 817.496144] sd 1:0:0:0: [sdb] 524288 512-byte logical blocks: (268 MB/256 MiB)
[ 817.496155] sd 1:0:0:0: [sdb] 4096-byte physical blocks
[ 817.506142] sd 1:0:0:0: [sdb] Write Protect is off
[ 817.526134] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 817.646163] ------------[ cut here ]------------
[ 817.646168] kernel BUG at block/bio.c:1787!
[ 817.646172] Oops: Exception in kernel mode, sig: 5 [#1]
[ 817.646174] SMP NR_CPUS=2048 NUMA pSeries
[ 817.646178] Modules linked in: scsi_debug(E) nfsv3(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) fscache(E) dm_mod(E) loop(E) sg(E) pseries_rng(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) sunrpc(E) grace(E) ip_tables(E) xfs(E) libcrc32c(E) sd_mod(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E)
[ 817.646205] CPU: 6 PID: 166 Comm: kworker/u321:1 Tainted: G E 4.4.0-rc4 #1
[ 817.646211] Workqueue: events_unbound .async_run_entry_fn
[ 817.646215] task: c00000000a0c0000 ti: c00000000a180000 task.ti: c00000000a180000
[ 817.646218] NIP: c0000000003b1d54 LR: c0000000003c4780 CTR: c0000000003be420
[ 817.646222] REGS: c00000000a1826c0 TRAP: 0700 Tainted: G E (4.4.0-rc4)
[ 817.646225] MSR: 8000000100029032 <SF,EE,ME,IR,DR,RI> CR: 24732728 XER: 00000000
[ 817.646233] CFAR: c0000000003c477c SOFTE: 1
GPR00: c0000000003c4780 c00000000a182940 c000000001325e00 c00000016cebcf00
GPR04: 0000000000000000 0000000002400000 c00000013c5f4d80 0000000000000040
GPR08: f000000000436ac0 0000000000000001 0000000000000000 ffffffffffffffff
GPR12: 0000000024732722 c00000000e743900 0000000000000000 f000000000436ac0
GPR16: c0000000f9e3eee0 c00000010dab0000 0000000000000001 0000000000000000
GPR20: 0000000000000000 0000000000000080 0000000000000000 c00000016cebcf00
GPR24: c0000000ff9b5a20 c00000000a182bb8 c00000016cebcf88 0000000000000000
GPR28: 0000000000000000 c00000016cebcf00 0000000000000000 0000000000010000
[ 817.646273] NIP [c0000000003b1d54] .bio_split+0x34/0x110
[ 817.646277] LR [c0000000003c4780] .blk_queue_split+0x3b0/0x560
[ 817.646280] Call Trace:
[ 817.646282] [c00000000a182940] [c00000000a1829d0] 0xc00000000a1829d0 (unreliable)
[ 817.646287] [c00000000a1829d0] [c0000000003c4780] .blk_queue_split+0x3b0/0x560
[ 817.646291] [c00000000a182ae0] [c0000000003be460] .blk_queue_bio+0x40/0x430
[ 817.646295] [c00000000a182b80] [c0000000003bc0f0] .generic_make_request+0x150/0x210
[ 817.646299] [c00000000a182c30] [c0000000003bc26c] .submit_bio+0xbc/0x1c0
[ 817.646304] [c00000000a182cf0] [c0000000002cb64c] .submit_bh_wbc+0x19c/0x200
[ 817.646308] [c00000000a182d90] [c0000000002cbb10] .block_read_full_page+0x310/0x410
[ 817.646312] [c00000000a183290] [c0000000002cf11c] .blkdev_readpage+0x1c/0x30
[ 817.646316] [c00000000a183300] [c0000000001e51a0] .do_read_cache_page+0xc0/0x290
[ 817.646321] [c00000000a1833c0] [c0000000003d59f8] .read_dev_sector+0x38/0xb0
[ 817.646325] [c00000000a183440] [c0000000003d977c] .read_lba+0xcc/0x1f0
[ 817.646329] [c00000000a1834f0] [c0000000003da3b8] .efi_partition+0x118/0x780
[ 817.646333] [c00000000a183670] [c0000000003d6fcc] .check_partition+0x14c/0x2e0
[ 817.646337] [c00000000a183700] [c0000000003d6260] .rescan_partitions+0xd0/0x380
[ 817.646341] [c00000000a1837e0] [c0000000002d0b88] .__blkdev_get+0x3d8/0x530
[ 817.646345] [c00000000a1838a0] [c0000000002d0f10] .blkdev_get+0x230/0x4a0
[ 817.646348] [c00000000a1839a0] [c0000000003d3288] .add_disk+0x468/0x4f0
[ 817.646353] [c00000000a183a60] [d000000002026450] .sd_probe_async+0xf0/0x230 [sd_mod]
[ 817.646357] [c00000000a183af0] [c0000000000d23a8] .async_run_entry_fn+0x98/0x200
[ 817.646362] [c00000000a183ba0] [c0000000000c6d74] .process_one_work+0x1a4/0x490
[ 817.646366] [c00000000a183c40] [c0000000000c71dc] .worker_thread+0x17c/0x5a0
[ 817.646369] [c00000000a183d30] [c0000000000ce704] .kthread+0x104/0x130
[ 817.646374] [c00000000a183e30] [c000000000009534] .ret_from_kernel_thread+0x58/0xa4
[ 817.646377] Instruction dump:
[ 817.646379] 3924ffff 7d292378 fba1ffe8 55290ffe fbc1fff0 fb81ffe0 fbe1fff8 7c9e2378
[ 817.646386] 7c7d1b78 f8010010 7d2907b4 f821ff71 <0b090000> 81230028 789c0020 5529ba7e
[ 817.646394] ---[ end trace 0c08ee96e8610127 ]---
[ 817.647718]
[ 819.647756] Kernel panic - not syncing: Fatal exception
[ 819.656776] Rebooting in 10 seconds..
* Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host
From: Eryu Guan @ 2015-12-15 11:20 UTC
To: linux-scsi; +Cc: Martin K. Petersen

On Fri, Dec 11, 2015 at 07:53:40PM +0800, Eryu Guan wrote:
> Hi,
>
> I saw this kernel BUG_ON on 4.4-rc4 kernel, and this can be reproduced
> easily on ppc64 host by:

This is still reproducible with 4.4-rc5 kernel.

Thanks,
Eryu

>
> 	modprobe scsi_debug sector_size=512 physblk_exp=3 dev_size_mb=256
>
> And I bisected to this commit
>
> commit ca369d51b3e1649be4a72addd6d6a168cfb3f537
> Author: Martin K. Petersen <martin.petersen@oracle.com>
> Date:   Fri Nov 13 16:46:48 2015 -0500
>
>     block/sd: Fix device-imposed transfer length limits
>
> I confirmed by reverting this commit on top of 4.4-rc4 kernel and test
> passed.
* Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host
From: Ming Lei @ 2015-12-15 12:06 UTC
To: Eryu Guan; +Cc: Linux SCSI List, Martin K. Petersen

On Tue, Dec 15, 2015 at 7:20 PM, Eryu Guan <guaneryu@gmail.com> wrote:
> On Fri, Dec 11, 2015 at 07:53:40PM +0800, Eryu Guan wrote:
>> Hi,
>>
>> I saw this kernel BUG_ON on 4.4-rc4 kernel, and this can be reproduced
>> easily on ppc64 host by:
>
> This is still reproducible with 4.4-rc5 kernel.

Could you apply the attached patch, reproduce the issue, and capture
the debug log?

Thanks,
Ming Lei

[-- Attachment #2: bio-dbg.patch --]
[-- Type: text/x-patch, Size: 455 bytes --]

diff --git a/block/bio.c b/block/bio.c
index dbabd48..8d23a99 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1784,6 +1784,12 @@ struct bio *bio_split(struct bio *bio, int sectors,
 {
 	struct bio *split = NULL;
 
+	if (sectors <= 0 || (sectors >= bio_sectors(bio))) {
+		printk("%s: sectors %d, bio_sectors %u, bi_rw %x\n",
+				__func__, sectors, bio_sectors(bio),
+				bio->bi_rw);
+	}
+
 	BUG_ON(sectors <= 0);
 	BUG_ON(sectors >= bio_sectors(bio));
 
* Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host
From: Eryu Guan @ 2015-12-15 13:06 UTC
To: Ming Lei; +Cc: Linux SCSI List, Martin K. Petersen

On Tue, Dec 15, 2015 at 08:06:47PM +0800, Ming Lei wrote:
> On Tue, Dec 15, 2015 at 7:20 PM, Eryu Guan <guaneryu@gmail.com> wrote:
> > This is still reproducible with 4.4-rc5 kernel.
>
> Could you capture the debug log after appyling the attached patch and
> the reproduction?

Thanks for looking into this! dmesg shows:

[ 686.217682] bio_split: sectors 0, bio_sectors 128, bi_rw 0

Thanks,
Eryu

P.S. full call trace

[ 686.065692] scsi_debug:sdebug_driver_probe: host protection
[ 686.065710] scsi host1: scsi_debug, version 1.85 [20141022], dev_size_mb=256, opts=0x0
[ 686.065981] scsi 1:0:0:0: Direct-Access Linux scsi_debug 0184 PQ: 0 ANSI: 6
[ 686.066873] sd 1:0:0:0: Attached scsi generic sg1 type 0
[ 686.077683] sd 1:0:0:0: [sdb] 524288 512-byte logical blocks: (268 MB/256 MiB)
[ 686.077694] sd 1:0:0:0: [sdb] 4096-byte physical blocks
[ 686.087670] sd 1:0:0:0: [sdb] Write Protect is off
[ 686.107671] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 686.217682] bio_split: sectors 0, bio_sectors 128, bi_rw 0
[ 686.217695] ------------[ cut here ]------------
[ 686.217698] kernel BUG at block/bio.c:1793!
[ 686.217702] Oops: Exception in kernel mode, sig: 5 [#1]
[ 686.217704] SMP NR_CPUS=2048 NUMA pSeries
[ 686.217707] Modules linked in: scsi_debug sg pseries_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
[ 686.217727] CPU: 8 PID: 9515 Comm: kworker/u32:0 Not tainted 4.4.0-rc5+ #33
[ 686.217733] Workqueue: events_unbound async_run_entry_fn
[ 686.217737] task: c0000005edb23cc0 ti: c0000005f016c000 task.ti: c0000005f016c000
[ 686.217740] NIP: c0000000003c45c4 LR: c0000000003c46b8 CTR: 00000000013abb8c
[ 686.217743] REGS: c0000005f016ea20 TRAP: 0700 Not tainted (4.4.0-rc5+)
[ 686.217746] MSR: 8000000100029033 <SF,EE,ME,IR,DR,RI,LE> CR: 22bb2322 XER: 0000000f
[ 686.217756] CFAR: c0000000003c46cc SOFTE: 1
GPR00: c0000000003c46b8 c0000005f016eca0 c000000001068300 000000000000002e
GPR04: c0000005ffd09c50 c0000005ffd1b4a0 0000000000010000 0000000000000000
GPR08: 0000000000000001 c000000000bab284 00000005ff160000 0000000000000130
GPR12: 0000000000003f30 c00000000e7e4c00 0000000000000000 f0000000015d0e40
GPR16: c0000005f3c3b7a0 c000000574390000 0000000000000001 0000000000000000
GPR20: 0000000000000000 0000000000000080 0000000000000000 c0000005f5093200
GPR24: c0000005edb0efa0 c0000005f016ee60 c0000005f5093288 0000000000000000
GPR28: 0000000002400000 c0000005f5093200 0000000000000000 c0000005efd67600
[ 686.217797] NIP [c0000000003c45c4] bio_split+0x54/0x160
[ 686.217800] LR [c0000000003c46b8] bio_split+0x148/0x160
[ 686.217803] Call Trace:
[ 686.217805] [c0000005f016eca0] [c0000000003c46b8] bio_split+0x148/0x160 (unreliable)
[ 686.217810] [c0000005f016ed30] [c0000000003d75e0] blk_queue_split+0x3c0/0x570
[ 686.217814] [c0000005f016ee30] [c0000000003d10a8] blk_queue_bio+0x48/0x440
[ 686.217818] [c0000005f016ee90] [c0000000003cec9c] generic_make_request+0x15c/0x220
[ 686.217822] [c0000005f016eef0] [c0000000003cee24] submit_bio+0xc4/0x1d0
[ 686.217826] [c0000005f016efa0] [c0000000002db204] submit_bh_wbc+0x1a4/0x200
[ 686.217830] [c0000005f016eff0] [c0000000002db6f0] block_read_full_page+0x320/0x420
[ 686.217835] [c0000005f016f4a0] [c0000000002dedb4] blkdev_readpage+0x24/0x40
[ 686.217839] [c0000005f016f4c0] [c0000000001f06fc] do_read_cache_page+0xbc/0x290
[ 686.217844] [c0000005f016f530] [c0000000003e8e00] read_dev_sector+0x40/0xc0
[ 686.217848] [c0000005f016f560] [c0000000003ec6bc] read_lba+0xdc/0x200
[ 686.217851] [c0000005f016f5c0] [c0000000003ece4c] find_valid_gpt+0xec/0x740
[ 686.217855] [c0000005f016f6a0] [c0000000003ed894] efi_partition+0x3f4/0x450
[ 686.217859] [c0000005f016f820] [c0000000003ea428] check_partition+0x158/0x2f0
[ 686.217863] [c0000005f016f8a0] [c0000000003e9694] rescan_partitions+0xd4/0x390
[ 686.217867] [c0000005f016f970] [c0000000002e0938] __blkdev_get+0x3a8/0x4d0
[ 686.217871] [c0000005f016f9e0] [c0000000002e0c90] blkdev_get+0x230/0x4a0
[ 686.217875] [c0000005f016fa90] [c0000000003e65b8] add_disk+0x478/0x500
[ 686.217880] [c0000005f016fb40] [d000000003fa66a8] sd_probe_async+0xf8/0x240 [sd_mod]
[ 686.217884] [c0000005f016fbc0] [c0000000000d7db8] async_run_entry_fn+0x98/0x1f0
[ 686.217888] [c0000005f016fc50] [c0000000000cc1a0] process_one_work+0x190/0x470
[ 686.217892] [c0000005f016fce0] [c0000000000cc5fc] worker_thread+0x17c/0x5a0
[ 686.217896] [c0000005f016fd80] [c0000000000d3da8] kthread+0x108/0x130
[ 686.217901] [c0000005f016fe30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
[ 686.217904] Instruction dump:
[ 686.217906] 7cdf3378 7c9e2378 7c7d1b78 f8010010 7cbc2b78 f821ff71 80c30028 40dd00e8
[ 686.217912] 54caba7e 39000000 7f8a2040 40dd00d8 <0b080000> 54c9ba7e 7bdb0020 7f89d840
[ 686.217921] ---[ end trace 80d38b6aaec5b2ff ]---
* Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host
From: Ming Lei @ 2015-12-15 13:27 UTC
To: Eryu Guan; +Cc: Linux SCSI List, Martin K. Petersen

On Tue, Dec 15, 2015 at 9:06 PM, Eryu Guan <guaneryu@gmail.com> wrote:
> Thanks for looking into this! dmesg shows:
>
> [ 686.217682] bio_split: sectors 0, bio_sectors 128, bi_rw 0

Then I guess queue_max_sectors(q) is bad. Could you apply the attached
patch (together with the previous one) and post the log?

Ming Lei

[-- Attachment #2: blk-merge-dbg.patch --]
[-- Type: text/x-patch, Size: 502 bytes --]

diff --git a/block/blk-merge.c b/block/blk-merge.c
index b66f095..d0ea926 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -129,6 +129,15 @@ split:
 	*segs = nsegs;
 
 	if (do_split) {
+		if (!sectors) {
+			printk("%s: nseg %u, max_secs %u, max segs %u\n",
+					__func__, nsegs,
+					queue_max_sectors(q),
+					queue_max_segments(q));
+			printk("\t bv.len %u, bv.offset %u\n",
+					bv.bv_len, bv.bv_offset);
+		}
+
 		new = bio_split(bio, sectors, GFP_NOIO, bs);
 		if (new)
 			bio = new;
* Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host
From: Eryu Guan @ 2015-12-15 16:56 UTC
To: Ming Lei; +Cc: Linux SCSI List, Martin K. Petersen

On Tue, Dec 15, 2015 at 09:27:14PM +0800, Ming Lei wrote:
> On Tue, Dec 15, 2015 at 9:06 PM, Eryu Guan <guaneryu@gmail.com> wrote:
> > Thanks for looking into this! dmesg shows:
> >
> > [ 686.217682] bio_split: sectors 0, bio_sectors 128, bi_rw 0
>
> Then I guess queue_max_sectors(q) is bad, could you apply the
> attached patch(and the last patch) and post the log?

[ 301.279018] blk_bio_segment_split: nseg 0, max_secs 64, max segs 2048
[ 301.279023] bv.len 65536, bv.offset 0
[ 301.279026] bio_split: sectors 0, bio_sectors 128, bi_rw 0

If the full call trace is needed, please let me know.

Thanks,
Eryu
* Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host
From: Ming Lei @ 2015-12-16 1:15 UTC
To: Eryu Guan; +Cc: Linux SCSI List, Martin K. Petersen

On Wed, Dec 16, 2015 at 12:56 AM, Eryu Guan <guaneryu@gmail.com> wrote:
> On Tue, Dec 15, 2015 at 09:27:14PM +0800, Ming Lei wrote:
>> Then I guess queue_max_sectors(q) is bad, could you apply the
>> attached patch(and the last patch) and post the log?
>
> [ 301.279018] blk_bio_segment_split: nseg 0, max_secs 64, max segs 2048
> [ 301.279023] bv.len 65536, bv.offset 0
> [ 301.279026] bio_split: sectors 0, bio_sectors 128, bi_rw 0

Now the issue is quite obvious: the page size is 64K on your platform,
but max_sectors is set to 64 by commit ca369d51b3e164. I think it is
wrong to set max_sectors from OPTIMAL TRANSFER LENGTH.

It is also ugly to set limits->max_sectors from drivers directly;
drivers should call block helpers to do that.

> If full call trace is needed please let me know.

Thanks for your test. The above log is absolutely enough, :-)

Thanks,
Ming Lei
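The logged values are enough to replay the failure by hand. What follows is a minimal sketch in C, assuming a simplified splitter that cuts only on bvec boundaries and checks the limit before counting each bvec. The name sectors_before_split and its parameters are hypothetical; this is a model that happens to reproduce the logged numbers, not the kernel's blk_bio_segment_split().

```c
/*
 * Simplified model (NOT kernel code): given the byte lengths of a bio's
 * bvecs, return how many 512-byte sectors fit in front of the split
 * point when a bvec can never be cut in the middle.
 *
 * bvec_len, nr_bvecs and max_sectors are illustrative parameters.
 */
unsigned int sectors_before_split(const unsigned int *bvec_len,
				  unsigned int nr_bvecs,
				  unsigned int max_sectors)
{
	unsigned int sectors = 0;
	unsigned int i;

	for (i = 0; i < nr_bvecs; i++) {
		/* 512-byte sectors covered by this bvec */
		unsigned int bv_sectors = bvec_len[i] >> 9;

		/* Stop before the bvec that would exceed the limit. */
		if (sectors + bv_sectors > max_sectors)
			break;

		sectors += bv_sectors;
	}

	return sectors;
}
```

With a single 65536-byte bvec (128 sectors) and a limit of 64 sectors, the loop stops before counting anything and the function returns 0, the value that bio_split() rejects with BUG_ON(sectors <= 0). With sixteen 4096-byte bvecs and the same limit it returns 64, which would be consistent with the problem showing up only on a host with 64K pages.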
* Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host
From: Martin K. Petersen @ 2015-12-16 1:39 UTC
To: Ming Lei; +Cc: Eryu Guan, Linux SCSI List, Martin K. Petersen

>>>>> "Ming" == Ming Lei <tom.leiming@gmail.com> writes:

Ming> I think it is wrong to set max sectors from OPTIMAL TRANSFER
Ming> LENGTH.

OTL is the preferred size for REQ_TYPE_FS requests as reported by the
device. The intent is to honor that.

Your patch clamps rw_max to BLK_DEF_MAX_SECTORS, which is not correct.

Ming> Also it is ugly to set limits->max_sectors from drivers directly,
Ming> and drivers should have called block helpers to do that.

We're trying to avoid unnecessary accessor functions for the queue
limits.

But I will add a sanity check for the page size. And fix up scsi_debug.

-- 
Martin K. Petersen	Oracle Linux Engineering

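The "sanity check for the page size" mentioned above is not shown in this thread. As a rough illustration only, one way such a check might look is sketched below. The helper name opt_xfer_usable and its parameters are hypothetical and are not taken from sd.c; the sketch simply assumes that an OPTIMAL TRANSFER LENGTH should be honored only when it is non-zero, does not exceed the device maximum, and covers at least one page.

```c
#include <stdbool.h>

/*
 * Hypothetical check, not the actual sd.c change: accept the device's
 * OPTIMAL TRANSFER LENGTH only if it is non-zero, within the device
 * maximum, and covers at least one page.
 */
static bool opt_xfer_usable(unsigned int opt_xfer_blocks,
			    unsigned int dev_max_blocks,
			    unsigned int logical_block_size,
			    unsigned long page_size)
{
	/* Widen before multiplying so large block counts cannot overflow. */
	unsigned long long opt_bytes =
		(unsigned long long)opt_xfer_blocks * logical_block_size;

	return opt_xfer_blocks != 0 &&
	       opt_xfer_blocks <= dev_max_blocks &&
	       opt_bytes >= page_size;
}
```

For the case in this thread (64 blocks of 512 bytes, 64K pages) the helper returns false, so the reported value would be ignored; on a 4K-page host the same value would be accepted.
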
* Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host
From: Ming Lei @ 2015-12-15 15:38 UTC
To: Eryu Guan; +Cc: Linux SCSI List, Martin K. Petersen

On Tue, 15 Dec 2015 21:06:31 +0800
Eryu Guan <guaneryu@gmail.com> wrote:

> On Tue, Dec 15, 2015 at 08:06:47PM +0800, Ming Lei wrote:
> > On Tue, Dec 15, 2015 at 7:20 PM, Eryu Guan <guaneryu@gmail.com> wrote:
> > > On Fri, Dec 11, 2015 at 07:53:40PM +0800, Eryu Guan wrote:
> > >> Hi,
> > >>
> > >> I saw this kernel BUG_ON on 4.4-rc4 kernel, and this can be reproduced
> > >> easily on ppc64 host by:
> > >
> > > This is still reproducible with 4.4-rc5 kernel.
> >
> > Could you capture the debug log after applying the attached patch and
> > the reproduction?
>
> Thanks for looking into this! dmesg shows:
>
> [ 686.217682] bio_split: sectors 0, bio_sectors 128, bi_rw 0

I guess the following patch should fix the issue. ca369d51b3
uses OPTIMAL TRANSFER LENGTH to set limits->max_sectors, which
may be smaller than one page.

I don't understand the idea behind this change. Martin, could
you explain it a bit?

---
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 3d22fc3..d66d362 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2889,10 +2889,11 @@ static int sd_revalidate_disk(struct gendisk *disk)
 	 */
 	if (sdkp->opt_xfer_blocks && sdkp->opt_xfer_blocks <= dev_max &&
 	    sdkp->opt_xfer_blocks <= SD_DEF_XFER_BLOCKS)
-		rw_max = q->limits.io_opt =
+		q->limits.io_opt =
 			logical_to_sectors(sdp, sdkp->opt_xfer_blocks);
-	else
-		rw_max = BLK_DEF_MAX_SECTORS;
+
+	rw_max = min_t(unsigned, BLK_DEF_MAX_SECTORS,
+		       q->limits.max_dev_sectors);
 
 	/* Combine with controller limits */
 	q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q));
-- 
1.9.1

>
> Thanks,
> Eryu
>
> P.S. full call trace
>
> [ 686.065692] scsi_debug:sdebug_driver_probe: host protection
> [ 686.065710] scsi host1: scsi_debug, version 1.85 [20141022], dev_size_mb=256, opts=0x0
> [ 686.065981] scsi 1:0:0:0: Direct-Access Linux scsi_debug 0184 PQ: 0 ANSI: 6
> [ 686.066873] sd 1:0:0:0: Attached scsi generic sg1 type 0
> [ 686.077683] sd 1:0:0:0: [sdb] 524288 512-byte logical blocks: (268 MB/256 MiB)
> [ 686.077694] sd 1:0:0:0: [sdb] 4096-byte physical blocks
> [ 686.087670] sd 1:0:0:0: [sdb] Write Protect is off
> [ 686.107671] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
> [ 686.217682] bio_split: sectors 0, bio_sectors 128, bi_rw 0
> [ 686.217695] ------------[ cut here ]------------
> [ 686.217698] kernel BUG at block/bio.c:1793!
> [ 686.217702] Oops: Exception in kernel mode, sig: 5 [#1]
> [ 686.217704] SMP NR_CPUS=2048 NUMA pSeries
> [ 686.217707] Modules linked in: scsi_debug sg pseries_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
> [ 686.217727] CPU: 8 PID: 9515 Comm: kworker/u32:0 Not tainted 4.4.0-rc5+ #33
> [ 686.217733] Workqueue: events_unbound async_run_entry_fn
> [ 686.217737] task: c0000005edb23cc0 ti: c0000005f016c000 task.ti: c0000005f016c000
> [ 686.217740] NIP: c0000000003c45c4 LR: c0000000003c46b8 CTR: 00000000013abb8c
> [ 686.217743] REGS: c0000005f016ea20 TRAP: 0700 Not tainted (4.4.0-rc5+)
> [ 686.217746] MSR: 8000000100029033 <SF,EE,ME,IR,DR,RI,LE> CR: 22bb2322 XER: 0000000f
> [ 686.217756] CFAR: c0000000003c46cc SOFTE: 1
> GPR00: c0000000003c46b8 c0000005f016eca0 c000000001068300 000000000000002e
> GPR04: c0000005ffd09c50 c0000005ffd1b4a0 0000000000010000 0000000000000000
> GPR08: 0000000000000001 c000000000bab284 00000005ff160000 0000000000000130
> GPR12: 0000000000003f30 c00000000e7e4c00 0000000000000000 f0000000015d0e40
> GPR16: c0000005f3c3b7a0 c000000574390000 0000000000000001 0000000000000000
> GPR20: 0000000000000000 0000000000000080 0000000000000000 c0000005f5093200
> GPR24: c0000005edb0efa0 c0000005f016ee60 c0000005f5093288 0000000000000000
> GPR28: 0000000002400000 c0000005f5093200 0000000000000000 c0000005efd67600
> [ 686.217797] NIP [c0000000003c45c4] bio_split+0x54/0x160
> [ 686.217800] LR [c0000000003c46b8] bio_split+0x148/0x160
> [ 686.217803] Call Trace:
> [ 686.217805] [c0000005f016eca0] [c0000000003c46b8] bio_split+0x148/0x160 (unreliable)
> [ 686.217810] [c0000005f016ed30] [c0000000003d75e0] blk_queue_split+0x3c0/0x570
> [ 686.217814] [c0000005f016ee30] [c0000000003d10a8] blk_queue_bio+0x48/0x440
> [ 686.217818] [c0000005f016ee90] [c0000000003cec9c] generic_make_request+0x15c/0x220
> [ 686.217822] [c0000005f016eef0] [c0000000003cee24] submit_bio+0xc4/0x1d0
> [ 686.217826] [c0000005f016efa0] [c0000000002db204] submit_bh_wbc+0x1a4/0x200
> [ 686.217830] [c0000005f016eff0] [c0000000002db6f0] block_read_full_page+0x320/0x420
> [ 686.217835] [c0000005f016f4a0] [c0000000002dedb4] blkdev_readpage+0x24/0x40
> [ 686.217839] [c0000005f016f4c0] [c0000000001f06fc] do_read_cache_page+0xbc/0x290
> [ 686.217844] [c0000005f016f530] [c0000000003e8e00] read_dev_sector+0x40/0xc0
> [ 686.217848] [c0000005f016f560] [c0000000003ec6bc] read_lba+0xdc/0x200
> [ 686.217851] [c0000005f016f5c0] [c0000000003ece4c] find_valid_gpt+0xec/0x740
> [ 686.217855] [c0000005f016f6a0] [c0000000003ed894] efi_partition+0x3f4/0x450
> [ 686.217859] [c0000005f016f820] [c0000000003ea428] check_partition+0x158/0x2f0
> [ 686.217863] [c0000005f016f8a0] [c0000000003e9694] rescan_partitions+0xd4/0x390
> [ 686.217867] [c0000005f016f970] [c0000000002e0938] __blkdev_get+0x3a8/0x4d0
> [ 686.217871] [c0000005f016f9e0] [c0000000002e0c90] blkdev_get+0x230/0x4a0
> [ 686.217875] [c0000005f016fa90] [c0000000003e65b8] add_disk+0x478/0x500
> [ 686.217880] [c0000005f016fb40] [d000000003fa66a8] sd_probe_async+0xf8/0x240 [sd_mod]
> [ 686.217884] [c0000005f016fbc0] [c0000000000d7db8] async_run_entry_fn+0x98/0x1f0
> [ 686.217888] [c0000005f016fc50] [c0000000000cc1a0] process_one_work+0x190/0x470
> [ 686.217892] [c0000005f016fce0] [c0000000000cc5fc] worker_thread+0x17c/0x5a0
> [ 686.217896] [c0000005f016fd80] [c0000000000d3da8] kthread+0x108/0x130
> [ 686.217901] [c0000005f016fe30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
> [ 686.217904] Instruction dump:
> [ 686.217906] 7cdf3378 7c9e2378 7c7d1b78 f8010010 7cbc2b78 f821ff71 80c30028 40dd00e8
> [ 686.217912] 54caba7e 39000000 7f8a2040 40dd00d8 <0b080000> 54c9ba7e 7bdb0020 7f89d840
> [ 686.217921] ---[ end trace 80d38b6aaec5b2ff ]---

* Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host
From: Eryu Guan @ 2015-12-15 17:16 UTC
To: Ming Lei; +Cc: Linux SCSI List, Martin K. Petersen

On Tue, Dec 15, 2015 at 11:38:41PM +0800, Ming Lei wrote:
> On Tue, 15 Dec 2015 21:06:31 +0800
> Eryu Guan <guaneryu@gmail.com> wrote:
>
> > On Tue, Dec 15, 2015 at 08:06:47PM +0800, Ming Lei wrote:
> > > On Tue, Dec 15, 2015 at 7:20 PM, Eryu Guan <guaneryu@gmail.com> wrote:
> > > > On Fri, Dec 11, 2015 at 07:53:40PM +0800, Eryu Guan wrote:
> > > >> Hi,
> > > >>
> > > >> I saw this kernel BUG_ON on 4.4-rc4 kernel, and this can be reproduced
> > > >> easily on ppc64 host by:
> > > >
> > > > This is still reproducible with 4.4-rc5 kernel.
> > >
> > > Could you capture the debug log after applying the attached patch and
> > > the reproduction?
> >
> > Thanks for looking into this! dmesg shows:
> >
> > [ 686.217682] bio_split: sectors 0, bio_sectors 128, bi_rw 0
>
> I guess the following patch should fix the issue. ca369d51b3
> uses OPTIMAL TRANSFER LENGTH to set limits->max_sectors, which
> may be smaller than one page.
>
> I don't understand the idea behind this change. Martin, could
> you explain it a bit?
>
> ---
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 3d22fc3..d66d362 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2889,10 +2889,11 @@ static int sd_revalidate_disk(struct gendisk *disk)
>  	 */
>  	if (sdkp->opt_xfer_blocks && sdkp->opt_xfer_blocks <= dev_max &&
>  	    sdkp->opt_xfer_blocks <= SD_DEF_XFER_BLOCKS)
> -		rw_max = q->limits.io_opt =
> +		q->limits.io_opt =
>  			logical_to_sectors(sdp, sdkp->opt_xfer_blocks);
> -	else
> -		rw_max = BLK_DEF_MAX_SECTORS;
> +
> +	rw_max = min_t(unsigned, BLK_DEF_MAX_SECTORS,
> +		       q->limits.max_dev_sectors);
>
>  	/* Combine with controller limits */
>  	q->limits.max_sectors = min(rw_max, queue_max_hw_sectors(q));

I tested this patch: no BUG_ON this time, and the debug messages are
not triggered either.

Thanks,
Eryu

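The behavioral difference made by the patch tested above can be modeled in a few lines. This is a sketch under stated assumptions rather than the sd.c code: the function names below are invented for illustration, all limits are passed in as plain parameters instead of being read from the request queue, and opt_valid stands in for the three-part condition on opt_xfer_blocks in the diff.

```c
/* Minimum of two unsigned values. */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Model of the pre-patch logic: a usable OTL becomes rw_max. */
static unsigned int rw_max_before(unsigned int opt_sectors, int opt_valid,
				  unsigned int def_max_sectors)
{
	return opt_valid ? opt_sectors : def_max_sectors;
}

/* Model of the patched logic: OTL only feeds io_opt, not rw_max. */
static unsigned int rw_max_after(unsigned int def_max_sectors,
				 unsigned int max_dev_sectors)
{
	return min_uint(def_max_sectors, max_dev_sectors);
}

/* Both versions finish by combining rw_max with the controller limit. */
static unsigned int max_sectors(unsigned int rw_max,
				unsigned int max_hw_sectors)
{
	return min_uint(rw_max, max_hw_sectors);
}
```

With opt_sectors = 64 and any larger values for the other limits, the pre-patch model yields max_sectors = 64, which is below the 128 sectors of a single 64K page. The patched model never looks at the OTL when computing rw_max, so the result depends only on the default, device, and controller limits. That matches the test result above, and it is also the behavior Martin objects to elsewhere in the thread, since the device's preferred size is no longer honored for max_sectors.
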
* Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host
From: Martin K. Petersen @ 2015-12-15 18:29 UTC
To: Eryu Guan; +Cc: Ming Lei, Linux SCSI List, Martin K. Petersen

>>>>> "Eryu" == Eryu Guan <guaneryu@gmail.com> writes:

Eryu,

Does the patch below fix the issue?

-- 
Martin K. Petersen	Oracle Linux Engineering

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 3d22fc3e3c1a..d1eb7aa78b8d 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2667,8 +2667,9 @@ static void sd_read_block_limits(struct scsi_disk *sdkp)
 
 	if (buffer[3] == 0x3c) {
 		unsigned int lba_count, desc_count;
+		u64 max_ws = get_unaligned_be64(&buffer[36]);
 
-		sdkp->max_ws_blocks = (u32)get_unaligned_be64(&buffer[36]);
+		sdkp->max_ws_blocks = (u32)max_ws;
 
 		if (!sdkp->lbpme)
 			goto out;

* Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host
From: Ming Lei @ 2015-12-16 1:17 UTC
To: Martin K. Petersen; +Cc: Eryu Guan, Linux SCSI List

On Wed, Dec 16, 2015 at 2:29 AM, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>>>>>> "Eryu" == Eryu Guan <guaneryu@gmail.com> writes:
>
> Eryu,
>
> Does the patch below fix the issue?

No, it can't.

As the debug log shows, it is because you use 'OPTIMAL TRANSFER LENGTH'
to set the queue's max_sectors.

Thanks,

>
> --
> Martin K. Petersen	Oracle Linux Engineering
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 3d22fc3e3c1a..d1eb7aa78b8d 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2667,8 +2667,9 @@ static void sd_read_block_limits(struct scsi_disk *sdkp)
>
>  	if (buffer[3] == 0x3c) {
>  		unsigned int lba_count, desc_count;
> +		u64 max_ws = get_unaligned_be64(&buffer[36]);
>
> -		sdkp->max_ws_blocks = (u32)get_unaligned_be64(&buffer[36]);
> +		sdkp->max_ws_blocks = (u32)max_ws;
>
>  		if (!sdkp->lbpme)
>  			goto out;

-- 
Ming Lei

* Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host
From: Martin K. Petersen @ 2015-12-16 1:37 UTC
To: Ming Lei; +Cc: Martin K. Petersen, Eryu Guan, Linux SCSI List

>>>>> "Ming" == Ming Lei <tom.leiming@gmail.com> writes:

Ming,

Ming> No, it can't.

Well, it fixes a problem on one of my test systems where max_ws_blocks,
by virtue of being 64 bits, clobbers opt_xfer_blocks, causing rw_max and
thus max_sectors to be set incorrectly.

We haven't run into that issue on real hardware. Probably because
scsi_debug is the only driver reporting $LUDICROUS_NUMBER as the max hw
transfer.

Ming> As the debug log shows, it is because you use 'OPTIMAL TRANSFER
Ming> LENGTH' to set queue's max_sectors.

But that is intentional. I agree that the value chosen by scsi_debug in
this case is very low and we should fix that.

-- 
Martin K. Petersen	Oracle Linux Engineering

* Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host
From: Eryu Guan @ 2015-12-16 7:25 UTC
To: Martin K. Petersen; +Cc: Ming Lei, Linux SCSI List

On Tue, Dec 15, 2015 at 01:29:59PM -0500, Martin K. Petersen wrote:
> >>>>> "Eryu" == Eryu Guan <guaneryu@gmail.com> writes:
>
> Eryu,
>
> Does the patch below fix the issue?

Unfortunately no, still BUG_ON.

Thanks,
Eryu

>
> --
> Martin K. Petersen	Oracle Linux Engineering
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 3d22fc3e3c1a..d1eb7aa78b8d 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -2667,8 +2667,9 @@ static void sd_read_block_limits(struct scsi_disk *sdkp)
>
>  	if (buffer[3] == 0x3c) {
>  		unsigned int lba_count, desc_count;
> +		u64 max_ws = get_unaligned_be64(&buffer[36]);
>
> -		sdkp->max_ws_blocks = (u32)get_unaligned_be64(&buffer[36]);
> +		sdkp->max_ws_blocks = (u32)max_ws;
>
>  		if (!sdkp->lbpme)
>  			goto out;

* Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host
From: Martin K. Petersen @ 2015-12-15 15:38 UTC
To: Eryu Guan; +Cc: linux-scsi, Martin K. Petersen

>>>>> "Eryu" == Eryu Guan <guaneryu@gmail.com> writes:

Eryu,

Eryu> This is still reproducible with 4.4-rc5 kernel.

Sorry about the delay. I've been busy with a lab move and most of my
machines have been disconnected since last week. Almost done getting my
equipment back online.

However, I think I have found the smoking gun. More in a bit...

-- 
Martin K. Petersen	Oracle Linux Engineering

* Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host
From: Eryu Guan @ 2016-01-05 11:57 UTC
To: linux-scsi; +Cc: Martin K. Petersen

On Fri, Dec 11, 2015 at 07:53:40PM +0800, Eryu Guan wrote:
> Hi,
>
> I saw this kernel BUG_ON on 4.4-rc4 kernel, and this can be reproduced
> easily on ppc64 host by:
>
> modprobe scsi_debug sector_size=512 physblk_exp=3 dev_size_mb=256
>
> And I bisected to this commit
>
> commit ca369d51b3e1649be4a72addd6d6a168cfb3f537
> Author: Martin K. Petersen <martin.petersen@oracle.com>
> Date: Fri Nov 13 16:46:48 2015 -0500
>
>     block/sd: Fix device-imposed transfer length limits
>
> I confirmed by reverting this commit on top of 4.4-rc4 kernel and test
> passed.

Hi,

Any updates on this? It's still reproducible with 4.4-rc8 kernel, and
still blocks some of my tests :)

Thanks,
Eryu

* Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host
From: Martin K. Petersen @ 2016-01-05 23:58 UTC
To: Eryu Guan; +Cc: linux-scsi, Martin K. Petersen

>>>>> "Eryu" == Eryu Guan <guaneryu@gmail.com> writes:

Eryu> Any updates on this? It's still reproducible with 4.4-rc8 kernel,
Eryu> and still blocks some of my tests :)

http://git.kernel.org/cgit/linux/kernel/git/mkp/scsi.git/log/?h=4.4/scsi-fixes

It just hasn't made it to Linus yet...

-- 
Martin K. Petersen	Oracle Linux Engineering

* Re: kernel BUG at block/bio.c:1787! while initializing scsi_debug on ppc64 host
From: Eryu Guan @ 2016-01-06 3:33 UTC
To: Martin K. Petersen; +Cc: linux-scsi

On Tue, Jan 05, 2016 at 06:58:25PM -0500, Martin K. Petersen wrote:
> >>>>> "Eryu" == Eryu Guan <guaneryu@gmail.com> writes:
>
> Eryu> Any updates on this? It's still reproducible with 4.4-rc8 kernel,
> Eryu> and still blocks some of my tests :)
>
> http://git.kernel.org/cgit/linux/kernel/git/mkp/scsi.git/log/?h=4.4/scsi-fixes
>
> It just hasn't made it to Linus yet...

Great to hear that, thanks! (I'm not subscribed to linux-scsi@, so I
didn't see the patch being sent out.)

Eryu

>
> --
> Martin K. Petersen	Oracle Linux Engineering
