* Re: linux-next: Tree for Aug 1 [not found] ` <20180801224813.GA13074@roeck-us.net> @ 2018-08-01 22:52 ` James Bottomley 2018-08-01 23:00 ` James Bottomley 2018-08-01 23:47 ` Guenter Roeck 0 siblings, 2 replies; 21+ messages in thread From: James Bottomley @ 2018-08-01 22:52 UTC (permalink / raw) To: Guenter Roeck, Stephen Rothwell Cc: Linux-Next Mailing List, Linux Kernel Mailing List, linux-scsi On Wed, 2018-08-01 at 15:48 -0700, Guenter Roeck wrote: > On Wed, Aug 01, 2018 at 05:58:52PM +1000, Stephen Rothwell wrote: > > Hi all, > > > > Changes since 20180731: > > > > The pci tree gained a conflict against the pci-current tree. > > > > The net-next tree gained a conflict against the bpf tree. > > > > The block tree lost its build failure. > > > > The staging tree still had its build failure due to an interaction > > with > > the vfs tree for which I disabled CONFIG_EROFS_FS. > > > > The kspp tree lost its build failure. > > > > Non-merge commits (relative to Linus' tree): 10070 > > 9137 files changed, 417605 insertions(+), 179996 deletions(-) > > > > ----------------------------------------------------------------- > > ----------- > > > > The widespread kernel hang issues are still seen. I managed > to bisect it after working around the transient build failures. > Bisect log is attached below. Unfortunately, it doesn't help much. > The culprit is reported as: > > 2d542828c5e9 Merge remote-tracking branch 'scsi/for-next' > > The preceding merge, > > 453f1d821165 Merge remote-tracking branch 'cgroup/for-next' > > checks out fine, as does the tip of scsi-next (commit 103c7b7e0184, > "Merge branch 'misc' into for-next"). No idea how to proceed. 
This sounds like you may have a problem with this patch:

commit d5038a13eca72fb216c07eb717169092e92284f1
Author: Johannes Thumshirn <jthumshirn@suse.de>
Date:   Wed Jul 4 10:53:56 2018 +0200

    scsi: core: switch to scsi-mq by default

To verify, boot with the additional kernel parameter

scsi_mod.use_blk_mq=0

which will reverse the effect of the above patch.

We already have one report of this patch causing boot failures:

https://marc.info/?t=153305327000002

So I've added linux-scsi to see if they want any more information.

James

>
> Guenter
>
> ---
> bisect (mips:malta_defconfig and i386):
>
> # bad: [d9bd94c0bcaa42d9cace337590718afd22c47bcc] Add linux-next specific files for 20180801
> # good: [acb1872577b346bd15ab3a3f8dff780d6cca4b70] Linux 4.18-rc7
> git bisect start 'HEAD' 'v4.18-rc7'
> # good: [f7952a1210bce43e88d69f371c6226aed481f307] Merge remote-tracking branch 'spi-nor/spi-nor/next'
> git bisect good f7952a1210bce43e88d69f371c6226aed481f307
> # good: [fa3bb608cd0d41a02583a7ceb3d162c4dee7e0e4] Merge remote-tracking branch 'spi/for-next'
> git bisect good fa3bb608cd0d41a02583a7ceb3d162c4dee7e0e4
> # good: [39c5cce976449c934b60e31cec9cea6986531b94] Merge remote-tracking branch 'char-misc/char-misc-next'
> git bisect good 39c5cce976449c934b60e31cec9cea6986531b94
> # good: [453f1d8211658b75542f4581759f022420bdaea8] Merge remote-tracking branch 'cgroup/for-next'
> git bisect good 453f1d8211658b75542f4581759f022420bdaea8
> # bad: [f11e9f9af170533e660c5deddccc4c494784c1fa] Merge remote-tracking branch 'nvdimm/libnvdimm-for-next'
> git bisect bad f11e9f9af170533e660c5deddccc4c494784c1fa
> # bad: [d8a758324b2e012cdba05b82ecbda6b84905f6ec] Merge remote-tracking branch 'rpmsg/for-next'
> git bisect bad d8a758324b2e012cdba05b82ecbda6b84905f6ec
> # good: [97fe222524f8fdbcc528b44d160d1df71d96af86] scsi: arcmsr: Fix error of resuming from hibernation for adapter type E
> git bisect good 97fe222524f8fdbcc528b44d160d1df71d96af86
> # bad: [2d542828c5e94490480b2900f8a0cb7a8c46afb0] Merge remote-tracking branch 'scsi/for-next'
> git bisect bad 2d542828c5e94490480b2900f8a0cb7a8c46afb0
> # good: [cc74e31d4147f26ead6ea06e4649d63a14edc0fe] scsi: lpfc: remove null check on nvmebuf
> git bisect good cc74e31d4147f26ead6ea06e4649d63a14edc0fe
> # good: [dc335a995527fb1ee9ec5649162b22cd1ce728ee] scsi: tcmu: unmap if dev is configured
> git bisect good dc335a995527fb1ee9ec5649162b22cd1ce728ee
> # good: [d92f5db64445ac4e2ce9a2cb7e6c929a5f4e712b] Merge branch 'misc' into for-next
> git bisect good d92f5db64445ac4e2ce9a2cb7e6c929a5f4e712b
> # good: [0e0d75267107e6a557ea9314d55bcff05a6ede44] scsi: tcmu: use u64 for dev_size
> git bisect good 0e0d75267107e6a557ea9314d55bcff05a6ede44
> # good: [c8a75afbf72ee4c16dad5339f55f62095879f207] Revert "scsi: target/iscsi: Reduce number of __iscsit_free_cmd() callers"
> git bisect good c8a75afbf72ee4c16dad5339f55f62095879f207
> # good: [103c7b7e01849d3c5bc998168ccd4df2c443d24b] Merge branch 'misc' into for-next
> git bisect good 103c7b7e01849d3c5bc998168ccd4df2c443d24b
> # first bad commit: [2d542828c5e94490480b2900f8a0cb7a8c46afb0] Merge remote-tracking branch 'scsi/for-next'
>

^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: linux-next: Tree for Aug 1 2018-08-01 22:52 ` linux-next: Tree for Aug 1 James Bottomley @ 2018-08-01 23:00 ` James Bottomley 2018-08-02 0:05 ` Stephen Rothwell 2018-08-01 23:47 ` Guenter Roeck 1 sibling, 1 reply; 21+ messages in thread From: James Bottomley @ 2018-08-01 23:00 UTC (permalink / raw) To: Guenter Roeck, Stephen Rothwell Cc: Linux-Next Mailing List, Linux Kernel Mailing List, linux-scsi On Wed, 2018-08-01 at 15:52 -0700, James Bottomley wrote: > On Wed, 2018-08-01 at 15:48 -0700, Guenter Roeck wrote: > > On Wed, Aug 01, 2018 at 05:58:52PM +1000, Stephen Rothwell wrote: > > > Hi all, > > > > > > Changes since 20180731: > > > > > > The pci tree gained a conflict against the pci-current tree. > > > > > > The net-next tree gained a conflict against the bpf tree. > > > > > > The block tree lost its build failure. > > > > > > The staging tree still had its build failure due to an > > > interaction > > > with > > > the vfs tree for which I disabled CONFIG_EROFS_FS. > > > > > > The kspp tree lost its build failure. > > > > > > Non-merge commits (relative to Linus' tree): 10070 > > > 9137 files changed, 417605 insertions(+), 179996 deletions(-) > > > > > > ----------------------------------------------------------------- > > > ----------- > > > > > > > The widespread kernel hang issues are still seen. I managed > > to bisect it after working around the transient build failures. > > Bisect log is attached below. Unfortunately, it doesn't help much. > > The culprit is reported as: > > > > 2d542828c5e9 Merge remote-tracking branch 'scsi/for-next' > > > > The preceding merge, > > > > 453f1d821165 Merge remote-tracking branch 'cgroup/for-next' > > > > checks out fine, as does the tip of scsi-next (commit 103c7b7e0184, > > "Merge branch 'misc' into for-next"). No idea how to proceed. 
What seems to be happening is that some patch, somewhere between the merge base of my scsi-next series with the next tree and the commit just before scsi-next was merged, causes a boot failure when blk-mq is enabled. Could you try to find this patch? I think the way to do it is to bisect this range of linux-next with the kernel command line parameter

scsi_mod.use_blk_mq=1

which forces blk-mq to be the default, and to see where the first boot failure is. You don't need my scsi-next tree merged to do this, because all the offending patch does is flip the default state of the above flag.

James
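The bisection James describes could be scripted roughly as follows. This is only a sketch under assumptions that do not come from the thread: the kernel is booted under QEMU, the root file system image is called rootfs.img, its init prints the hypothetical marker "Boot successful", and the helper names kernel_cmdline and boot_test are made up for illustration. It also presumes that the chosen bad revision really fails to boot with the parameter set; otherwise there is nothing to bisect.

```shell
#!/bin/sh
# Sketch only. rootfs.img, the console/root arguments and the
# "Boot successful" marker are assumptions, not taken from the thread.

# Print a kernel command line that forces blk-mq on (1) or off (0).
kernel_cmdline() {
    printf 'console=ttyS0 root=/dev/sda rw scsi_mod.use_blk_mq=%s\n' "$1"
}

# Boot the kernel built in the current tree. A hang ends in a timeout,
# so the marker never appears and the function returns non-zero.
boot_test() {
    timeout 120 qemu-system-x86_64 -nographic -no-reboot \
        -kernel arch/x86/boot/bzImage \
        -drive file=rootfs.img,format=raw \
        -append "$(kernel_cmdline "$1")" 2>&1 | grep -q 'Boot successful'
}

# Possible use with git bisect (bad revision first, then good), with this
# file saved as the hypothetical, untracked ./boot-test.sh:
#   git bisect start 453f1d821165 "$(git merge-base 453f1d821165 103c7b7e0184)"
#   git bisect run sh -c '. ./boot-test.sh; make -j"$(nproc)" bzImage || exit 125; boot_test 1'
```

Exit code 125 asks git bisect to skip a revision that does not build, which matters here because the thread mentions transient build failures.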
* Re: linux-next: Tree for Aug 1
  2018-08-01 23:00 ` James Bottomley
@ 2018-08-02 0:05 ` Stephen Rothwell
  2018-08-02 1:19 ` Guenter Roeck
  0 siblings, 1 reply; 21+ messages in thread
From: Stephen Rothwell @ 2018-08-02 0:05 UTC
To: James Bottomley
Cc: Guenter Roeck, Linux-Next Mailing List, Linux Kernel Mailing List, linux-scsi

Hi all,

On Wed, 01 Aug 2018 16:00:54 -0700 James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
> So what seems to be happening to cause this is that there's a patch
> somewhere between the merge base of my scsi-next series and the next
> tree and the patch just before scsi-next was actually merged that
> actually causes a boot failure with blk-mq enabled. Could you try to
> find this patch? I think the way to do it is to try to bisect this
> range of linux-next using the command line
>
> scsi_mod.use_blk_mq=1
>
> Which forces block mq to be the default and seeing where the first boot
> failure is (you don't need my scsi-next tree merged to do this because
> all the offending patch does is flip the default state of the above
> flag).

So this means using v4.8-rc1 as the first good commit and 453f1d821165
("Merge remote-tracking branch 'cgroup/for-next'") as the first bad
(assuming that this latter fails to boot with "scsi_mod.use_blk_mq=1").

--
Cheers,
Stephen Rothwell
* Re: linux-next: Tree for Aug 1
  2018-08-02 0:05 ` Stephen Rothwell
@ 2018-08-02 1:19 ` Guenter Roeck
  0 siblings, 0 replies; 21+ messages in thread
From: Guenter Roeck @ 2018-08-02 1:19 UTC
To: Stephen Rothwell, James Bottomley
Cc: Linux-Next Mailing List, Linux Kernel Mailing List, linux-scsi

On 08/01/2018 05:05 PM, Stephen Rothwell wrote:
> Hi all,
>
> On Wed, 01 Aug 2018 16:00:54 -0700 James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>>
>> So what seems to be happening to cause this is that there's a patch
>> somewhere between the merge base of my scsi-next series and the next
>> tree and the patch just before scsi-next was actually merged that
>> actually causes a boot failure with blk-mq enabled. Could you try to
>> find this patch? I think the way to do it is to try to bisect this
>> range of linux-next using the command line
>>
>> scsi_mod.use_blk_mq=1
>>
>> Which forces block mq to be the default and seeing where the first boot
>> failure is (you don't need my scsi-next tree merged to do this because
>> all the offending patch does is flip the default state of the above
>> flag).
>
> So this means using v4.8-rc1 as the first good commit and 453f1d821165
> ("Merge remote-tracking branch 'cgroup/for-next'") as the first bad
> (assuming that this latter fails to boot with "scsi_mod.use_blk_mq=1").
>

Puzzled. Same results.

453f1d821165 works with both scsi_mod.use_blk_mq=0 and scsi_mod.use_blk_mq=1.
next-20180801 works with scsi_mod.use_blk_mq=0 and fails with scsi_mod.use_blk_mq=1.

Bisect still points to the same commit (which just changes the default) as culprit.
I know that doesn't make sense. I'll need to think about it.

Guenter
* Re: linux-next: Tree for Aug 1 2018-08-01 22:52 ` linux-next: Tree for Aug 1 James Bottomley 2018-08-01 23:00 ` James Bottomley @ 2018-08-01 23:47 ` Guenter Roeck 2018-08-01 23:57 ` Ming Lei 1 sibling, 1 reply; 21+ messages in thread From: Guenter Roeck @ 2018-08-01 23:47 UTC (permalink / raw) To: James Bottomley Cc: Stephen Rothwell, Linux-Next Mailing List, Linux Kernel Mailing List, linux-scsi On Wed, Aug 01, 2018 at 03:52:45PM -0700, James Bottomley wrote: > On Wed, 2018-08-01 at 15:48 -0700, Guenter Roeck wrote: > > On Wed, Aug 01, 2018 at 05:58:52PM +1000, Stephen Rothwell wrote: > > > Hi all, > > > > > > Changes since 20180731: > > > > > > The pci tree gained a conflict against the pci-current tree. > > > > > > The net-next tree gained a conflict against the bpf tree. > > > > > > The block tree lost its build failure. > > > > > > The staging tree still had its build failure due to an interaction > > > with > > > the vfs tree for which I disabled CONFIG_EROFS_FS. > > > > > > The kspp tree lost its build failure. > > > > > > Non-merge commits (relative to Linus' tree): 10070 > > > 9137 files changed, 417605 insertions(+), 179996 deletions(-) > > > > > > ----------------------------------------------------------------- > > > ----------- > > > > > > > The widespread kernel hang issues are still seen. I managed > > to bisect it after working around the transient build failures. > > Bisect log is attached below. Unfortunately, it doesn't help much. > > The culprit is reported as: > > > > 2d542828c5e9 Merge remote-tracking branch 'scsi/for-next' > > > > The preceding merge, > > > > 453f1d821165 Merge remote-tracking branch 'cgroup/for-next' > > > > checks out fine, as does the tip of scsi-next (commit 103c7b7e0184, > > "Merge branch 'misc' into for-next"). No idea how to proceed. 
> > This sounds like you may have a problem with this patch: > > commit d5038a13eca72fb216c07eb717169092e92284f1 > Author: Johannes Thumshirn <jthumshirn@suse.de> > Date: Wed Jul 4 10:53:56 2018 +0200 > > scsi: core: switch to scsi-mq by default > > To verify, boot with the additional kernel parameter > > scsi_mod.use_blk_mq=0 > > Which will reverse the effect of the above patch. > Yes, that fixes the problem. Guenter > We already have one report of this patch causing boot failures: > > https://marc.info/?t=153305327000002 > > So I've added linux-scsi to see if they want any more information. > > James > > > > > Guenter > > > > --- > > bisect (mips:malta_defconfig and i386): > > > > # bad: [d9bd94c0bcaa42d9cace337590718afd22c47bcc] Add linux-next > > specific files for 20180801 > > # good: [acb1872577b346bd15ab3a3f8dff780d6cca4b70] Linux 4.18-rc7 > > git bisect start 'HEAD' 'v4.18-rc7' > > # good: [f7952a1210bce43e88d69f371c6226aed481f307] Merge remote- > > tracking branch 'spi-nor/spi-nor/next' > > git bisect good f7952a1210bce43e88d69f371c6226aed481f307 > > # good: [fa3bb608cd0d41a02583a7ceb3d162c4dee7e0e4] Merge remote- > > tracking branch 'spi/for-next' > > git bisect good fa3bb608cd0d41a02583a7ceb3d162c4dee7e0e4 > > # good: [39c5cce976449c934b60e31cec9cea6986531b94] Merge remote- > > tracking branch 'char-misc/char-misc-next' > > git bisect good 39c5cce976449c934b60e31cec9cea6986531b94 > > # good: [453f1d8211658b75542f4581759f022420bdaea8] Merge remote- > > tracking branch 'cgroup/for-next' > > git bisect good 453f1d8211658b75542f4581759f022420bdaea8 > > # bad: [f11e9f9af170533e660c5deddccc4c494784c1fa] Merge remote- > > tracking branch 'nvdimm/libnvdimm-for-next' > > git bisect bad f11e9f9af170533e660c5deddccc4c494784c1fa > > # bad: [d8a758324b2e012cdba05b82ecbda6b84905f6ec] Merge remote- > > tracking branch 'rpmsg/for-next' > > git bisect bad d8a758324b2e012cdba05b82ecbda6b84905f6ec > > # good: [97fe222524f8fdbcc528b44d160d1df71d96af86] scsi: arcmsr: Fix > > 
error of resuming from hibernation for adapter type E > > git bisect good 97fe222524f8fdbcc528b44d160d1df71d96af86 > > # bad: [2d542828c5e94490480b2900f8a0cb7a8c46afb0] Merge remote- > > tracking branch 'scsi/for-next' > > git bisect bad 2d542828c5e94490480b2900f8a0cb7a8c46afb0 > > # good: [cc74e31d4147f26ead6ea06e4649d63a14edc0fe] scsi: lpfc: remove > > null check on nvmebuf > > git bisect good cc74e31d4147f26ead6ea06e4649d63a14edc0fe > > # good: [dc335a995527fb1ee9ec5649162b22cd1ce728ee] scsi: tcmu: unmap > > if dev is configured > > git bisect good dc335a995527fb1ee9ec5649162b22cd1ce728ee > > # good: [d92f5db64445ac4e2ce9a2cb7e6c929a5f4e712b] Merge branch > > 'misc' into for-next > > git bisect good d92f5db64445ac4e2ce9a2cb7e6c929a5f4e712b > > # good: [0e0d75267107e6a557ea9314d55bcff05a6ede44] scsi: tcmu: use > > u64 for dev_size > > git bisect good 0e0d75267107e6a557ea9314d55bcff05a6ede44 > > # good: [c8a75afbf72ee4c16dad5339f55f62095879f207] Revert "scsi: > > target/iscsi: Reduce number of __iscsit_free_cmd() callers" > > git bisect good c8a75afbf72ee4c16dad5339f55f62095879f207 > > # good: [103c7b7e01849d3c5bc998168ccd4df2c443d24b] Merge branch > > 'misc' into for-next > > git bisect good 103c7b7e01849d3c5bc998168ccd4df2c443d24b > > # first bad commit: [2d542828c5e94490480b2900f8a0cb7a8c46afb0] Merge > > remote-tracking branch 'scsi/for-next' > > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: linux-next: Tree for Aug 1 2018-08-01 23:47 ` Guenter Roeck @ 2018-08-01 23:57 ` Ming Lei 2018-08-02 0:03 ` James Bottomley 2018-08-02 0:12 ` Guenter Roeck 0 siblings, 2 replies; 21+ messages in thread From: Ming Lei @ 2018-08-01 23:57 UTC (permalink / raw) To: Guenter Roeck Cc: James Bottomley, Stephen Rothwell, Linux-Next Mailing List, Linux Kernel Mailing List, linux-scsi On Thu, Aug 2, 2018 at 7:47 AM, Guenter Roeck <linux@roeck-us.net> wrote: > On Wed, Aug 01, 2018 at 03:52:45PM -0700, James Bottomley wrote: >> On Wed, 2018-08-01 at 15:48 -0700, Guenter Roeck wrote: >> > On Wed, Aug 01, 2018 at 05:58:52PM +1000, Stephen Rothwell wrote: >> > > Hi all, >> > > >> > > Changes since 20180731: >> > > >> > > The pci tree gained a conflict against the pci-current tree. >> > > >> > > The net-next tree gained a conflict against the bpf tree. >> > > >> > > The block tree lost its build failure. >> > > >> > > The staging tree still had its build failure due to an interaction >> > > with >> > > the vfs tree for which I disabled CONFIG_EROFS_FS. >> > > >> > > The kspp tree lost its build failure. >> > > >> > > Non-merge commits (relative to Linus' tree): 10070 >> > > 9137 files changed, 417605 insertions(+), 179996 deletions(-) >> > > >> > > ----------------------------------------------------------------- >> > > ----------- >> > > >> > >> > The widespread kernel hang issues are still seen. I managed >> > to bisect it after working around the transient build failures. >> > Bisect log is attached below. Unfortunately, it doesn't help much. >> > The culprit is reported as: >> > >> > 2d542828c5e9 Merge remote-tracking branch 'scsi/for-next' >> > >> > The preceding merge, >> > >> > 453f1d821165 Merge remote-tracking branch 'cgroup/for-next' >> > >> > checks out fine, as does the tip of scsi-next (commit 103c7b7e0184, >> > "Merge branch 'misc' into for-next"). No idea how to proceed. 
>>
>> This sounds like you may have a problem with this patch:
>>
>> commit d5038a13eca72fb216c07eb717169092e92284f1
>> Author: Johannes Thumshirn <jthumshirn@suse.de>
>> Date:   Wed Jul 4 10:53:56 2018 +0200
>>
>>     scsi: core: switch to scsi-mq by default
>>
>> To verify, boot with the additional kernel parameter
>>
>> scsi_mod.use_blk_mq=0
>>
>> Which will reverse the effect of the above patch.
>>
> Yes, that fixes the problem.

That may not be the root cause: the issue has only been seen since next-20180731, but d5038a13eca7 (scsi: core: switch to scsi-mq by default) has been in -next for quite a while.

It seems something new is causing this issue.

Thanks,
Ming Lei
* Re: linux-next: Tree for Aug 1 2018-08-01 23:57 ` Ming Lei @ 2018-08-02 0:03 ` James Bottomley 2018-08-02 0:20 ` Guenter Roeck 2018-08-02 4:58 ` Guenter Roeck 2018-08-02 0:12 ` Guenter Roeck 1 sibling, 2 replies; 21+ messages in thread From: James Bottomley @ 2018-08-02 0:03 UTC (permalink / raw) To: Ming Lei, Guenter Roeck Cc: Stephen Rothwell, Linux-Next Mailing List, Linux Kernel Mailing List, linux-scsi On Thu, 2018-08-02 at 07:57 +0800, Ming Lei wrote: > On Thu, Aug 2, 2018 at 7:47 AM, Guenter Roeck <linux@roeck-us.net> > wrote: > > On Wed, Aug 01, 2018 at 03:52:45PM -0700, James Bottomley wrote: > > > On Wed, 2018-08-01 at 15:48 -0700, Guenter Roeck wrote: > > > > On Wed, Aug 01, 2018 at 05:58:52PM +1000, Stephen Rothwell > > > > wrote: > > > > > Hi all, > > > > > > > > > > Changes since 20180731: > > > > > > > > > > The pci tree gained a conflict against the pci-current tree. > > > > > > > > > > The net-next tree gained a conflict against the bpf tree. > > > > > > > > > > The block tree lost its build failure. > > > > > > > > > > The staging tree still had its build failure due to an > > > > > interaction > > > > > with > > > > > the vfs tree for which I disabled CONFIG_EROFS_FS. > > > > > > > > > > The kspp tree lost its build failure. > > > > > > > > > > Non-merge commits (relative to Linus' tree): 10070 > > > > > 9137 files changed, 417605 insertions(+), 179996 deletions(- > > > > > ) > > > > > > > > > > ----------------------------------------------------------- > > > > > ------ > > > > > ----------- > > > > > > > > > > > > > The widespread kernel hang issues are still seen. I managed > > > > to bisect it after working around the transient build failures. > > > > Bisect log is attached below. Unfortunately, it doesn't help > > > > much. 
> > > > The culprit is reported as:
> > > >
> > > > 2d542828c5e9 Merge remote-tracking branch 'scsi/for-next'
> > > >
> > > > The preceding merge,
> > > >
> > > > 453f1d821165 Merge remote-tracking branch 'cgroup/for-next'
> > > >
> > > > checks out fine, as does the tip of scsi-next (commit
> > > > 103c7b7e0184,
> > > > "Merge branch 'misc' into for-next"). No idea how to proceed.
> > >
> > > This sounds like you may have a problem with this patch:
> > >
> > > commit d5038a13eca72fb216c07eb717169092e92284f1
> > > Author: Johannes Thumshirn <jthumshirn@suse.de>
> > > Date:   Wed Jul 4 10:53:56 2018 +0200
> > >
> > >     scsi: core: switch to scsi-mq by default
> > >
> > > To verify, boot with the additional kernel parameter
> > >
> > > scsi_mod.use_blk_mq=0
> > >
> > > Which will reverse the effect of the above patch.
> > >
> >
> > Yes, that fixes the problem.
>
> That may not the root cause, given this issue is only started to
> see from next-20180731, but d5038a13eca7 (scsi: core: switch to
> scsi-mq by default)
> has been in -next for quite a while.
>
> Seems something new causes this issue.

Read my other email about how to find this.

https://marc.info/?l=linux-scsi&m=153316446223676

Now that we've confirmed the issue, Guenter, could you attempt to bisect it as that email describes?

Thanks,

James
* Re: linux-next: Tree for Aug 1 2018-08-02 0:03 ` James Bottomley @ 2018-08-02 0:20 ` Guenter Roeck 2018-08-02 4:58 ` Guenter Roeck 1 sibling, 0 replies; 21+ messages in thread From: Guenter Roeck @ 2018-08-02 0:20 UTC (permalink / raw) To: James Bottomley, Ming Lei Cc: Stephen Rothwell, Linux-Next Mailing List, Linux Kernel Mailing List, linux-scsi On 08/01/2018 05:03 PM, James Bottomley wrote: > On Thu, 2018-08-02 at 07:57 +0800, Ming Lei wrote: >> On Thu, Aug 2, 2018 at 7:47 AM, Guenter Roeck <linux@roeck-us.net> >> wrote: >>> On Wed, Aug 01, 2018 at 03:52:45PM -0700, James Bottomley wrote: >>>> On Wed, 2018-08-01 at 15:48 -0700, Guenter Roeck wrote: >>>>> On Wed, Aug 01, 2018 at 05:58:52PM +1000, Stephen Rothwell >>>>> wrote: >>>>>> Hi all, >>>>>> >>>>>> Changes since 20180731: >>>>>> >>>>>> The pci tree gained a conflict against the pci-current tree. >>>>>> >>>>>> The net-next tree gained a conflict against the bpf tree. >>>>>> >>>>>> The block tree lost its build failure. >>>>>> >>>>>> The staging tree still had its build failure due to an >>>>>> interaction >>>>>> with >>>>>> the vfs tree for which I disabled CONFIG_EROFS_FS. >>>>>> >>>>>> The kspp tree lost its build failure. >>>>>> >>>>>> Non-merge commits (relative to Linus' tree): 10070 >>>>>> 9137 files changed, 417605 insertions(+), 179996 deletions(- >>>>>> ) >>>>>> >>>>>> ----------------------------------------------------------- >>>>>> ------ >>>>>> ----------- >>>>>> >>>>> >>>>> The widespread kernel hang issues are still seen. I managed >>>>> to bisect it after working around the transient build failures. >>>>> Bisect log is attached below. Unfortunately, it doesn't help >>>>> much. 
>>>>> The culprit is reported as: >>>>> >>>>> 2d542828c5e9 Merge remote-tracking branch 'scsi/for-next' >>>>> >>>>> The preceding merge, >>>>> >>>>> 453f1d821165 Merge remote-tracking branch 'cgroup/for-next' >>>>> >>>>> checks out fine, as does the tip of scsi-next (commit >>>>> 103c7b7e0184, >>>>> "Merge branch 'misc' into for-next"). No idea how to proceed. >>>> >>>> This sounds like you may have a problem with this patch: >>>> >>>> commit d5038a13eca72fb216c07eb717169092e92284f1 >>>> Author: Johannes Thumshirn <jthumshirn@suse.de> >>>> Date: Wed Jul 4 10:53:56 2018 +0200 >>>> >>>> scsi: core: switch to scsi-mq by default >>>> >>>> To verify, boot with the additional kernel parameter >>>> >>>> scsi_mod.use_blk_mq=0 >>>> >>>> Which will reverse the effect of the above patch. >>>> >>> >>> Yes, that fixes the problem. >> >> That may not the root cause, given this issue is only started to >> see from next-20180731, but d5038a13eca7 (scsi: core: switch to >> scsi-mq by default) >> has been in -next for quite a while. >> >> Seems something new causes this issue. > > Read my other email about how to find this. > > https://marc.info/?l=linux-scsi&m=153316446223676 > > Now that we've confirmed the issue, Gunter, could you attempt to bisect > it as that email describes? > Already working on it. Guenter ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: linux-next: Tree for Aug 1 2018-08-02 0:03 ` James Bottomley 2018-08-02 0:20 ` Guenter Roeck @ 2018-08-02 4:58 ` Guenter Roeck 2018-08-02 5:04 ` Bart Van Assche 2018-08-02 11:35 ` Ming Lei 1 sibling, 2 replies; 21+ messages in thread From: Guenter Roeck @ 2018-08-02 4:58 UTC (permalink / raw) To: James Bottomley, Ming Lei Cc: Stephen Rothwell, Linux-Next Mailing List, Linux Kernel Mailing List, linux-scsi On 08/01/2018 05:03 PM, James Bottomley wrote: > On Thu, 2018-08-02 at 07:57 +0800, Ming Lei wrote: >> On Thu, Aug 2, 2018 at 7:47 AM, Guenter Roeck <linux@roeck-us.net> >> wrote: >>> On Wed, Aug 01, 2018 at 03:52:45PM -0700, James Bottomley wrote: >>>> On Wed, 2018-08-01 at 15:48 -0700, Guenter Roeck wrote: >>>>> On Wed, Aug 01, 2018 at 05:58:52PM +1000, Stephen Rothwell >>>>> wrote: >>>>>> Hi all, >>>>>> >>>>>> Changes since 20180731: >>>>>> >>>>>> The pci tree gained a conflict against the pci-current tree. >>>>>> >>>>>> The net-next tree gained a conflict against the bpf tree. >>>>>> >>>>>> The block tree lost its build failure. >>>>>> >>>>>> The staging tree still had its build failure due to an >>>>>> interaction >>>>>> with >>>>>> the vfs tree for which I disabled CONFIG_EROFS_FS. >>>>>> >>>>>> The kspp tree lost its build failure. >>>>>> >>>>>> Non-merge commits (relative to Linus' tree): 10070 >>>>>> 9137 files changed, 417605 insertions(+), 179996 deletions(- >>>>>> ) >>>>>> >>>>>> ----------------------------------------------------------- >>>>>> ------ >>>>>> ----------- >>>>>> >>>>> >>>>> The widespread kernel hang issues are still seen. I managed >>>>> to bisect it after working around the transient build failures. >>>>> Bisect log is attached below. Unfortunately, it doesn't help >>>>> much. 
>>>>> The culprit is reported as: >>>>> >>>>> 2d542828c5e9 Merge remote-tracking branch 'scsi/for-next' >>>>> >>>>> The preceding merge, >>>>> >>>>> 453f1d821165 Merge remote-tracking branch 'cgroup/for-next' >>>>> >>>>> checks out fine, as does the tip of scsi-next (commit >>>>> 103c7b7e0184, >>>>> "Merge branch 'misc' into for-next"). No idea how to proceed. >>>> >>>> This sounds like you may have a problem with this patch: >>>> >>>> commit d5038a13eca72fb216c07eb717169092e92284f1 >>>> Author: Johannes Thumshirn <jthumshirn@suse.de> >>>> Date: Wed Jul 4 10:53:56 2018 +0200 >>>> >>>> scsi: core: switch to scsi-mq by default >>>> >>>> To verify, boot with the additional kernel parameter >>>> >>>> scsi_mod.use_blk_mq=0 >>>> >>>> Which will reverse the effect of the above patch. >>>> >>> >>> Yes, that fixes the problem. >> >> That may not the root cause, given this issue is only started to >> see from next-20180731, but d5038a13eca7 (scsi: core: switch to >> scsi-mq by default) >> has been in -next for quite a while. >> >> Seems something new causes this issue. > > Read my other email about how to find this. > > https://marc.info/?l=linux-scsi&m=153316446223676 > > Now that we've confirmed the issue, Gunter, could you attempt to bisect > it as that email describes? > So, I am more and more baffled. I ran another round of bisect, this time each test executing twice, once with "scsi_mod.use_blk_mq=1" and once with "scsi_mod.use_blk_mq=0", requiring both to pass. Bisect still points to the merge as culprit. Ok, one step further: Actually _revert_ commit d5038a13eca72 before running each test, meaning the default is use_blk_mq=0. Still run both tests. Bisect _still_ points to the merge of scsi-next as culprit. 

So, to me it looks like the problem is triggered by _something_ in scsi-next combined with _something_ in -next prior to the merge. It is not specifically tied to use_blk_mq=[0|1] or to d5038a13eca72, but to the combination of some patch in scsi-next and some other patch.

I am running out of ideas. Any thoughts on how to track this down further?

Guenter
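The procedure Guenter describes (revert the default flip, then require a successful boot with both parameter values) might be automated along these lines. This is a sketch, not his actual test setup: the boot command passed to bisect_step is a hypothetical script that boots the freshly built kernel with scsi_mod.use_blk_mq set to its last argument and exits 0 on success, and the function names are invented here.

```shell
#!/bin/sh
# Sketch only: the boot command given in "$@" is a hypothetical script that
# boots the built kernel with scsi_mod.use_blk_mq=<last argument> and
# exits 0 when the boot succeeds.

# Map the two boot results to a git-bisect verdict: print 0 (good) only if
# the kernel booted with both use_blk_mq=1 ($1) and use_blk_mq=0 ($2).
bisect_verdict() {
    if [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then
        echo 0
    else
        echo 1
    fi
}

# One bisect step: revert the default flip if it is in the history,
# build, then boot once per parameter value.
bisect_step() {
    if git merge-base --is-ancestor d5038a13eca72 HEAD; then
        git revert --no-commit d5038a13eca72 || { git reset --hard; return 125; }
    fi
    if ! make -j"$(nproc)" bzImage; then
        git reset --hard
        return 125                  # 125 tells git bisect to skip this revision
    fi
    "$@" 1; r1=$?
    "$@" 0; r0=$?
    git reset --hard                # drop the uncommitted revert again
    return "$(bisect_verdict "$r1" "$r0")"
}

# Possible use, with this file saved as the hypothetical ./bisect-step.sh:
#   git bisect run sh -c '. ./bisect-step.sh; bisect_step ./boot.sh'
```

Requiring both boots to pass means a revision is marked bad as soon as either configuration hangs, which matches the stricter test described in the message above.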
* Re: linux-next: Tree for Aug 1
  2018-08-02 4:58 ` Guenter Roeck
@ 2018-08-02 5:04 ` Bart Van Assche
  2018-08-02 12:46 ` Guenter Roeck
  2018-08-02 11:35 ` Ming Lei
  1 sibling, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2018-08-02 5:04 UTC
To: linux@roeck-us.net, James.Bottomley@HansenPartnership.com, tom.leiming@gmail.com
Cc: sfr@canb.auug.org.au, linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org, linux-next@vger.kernel.org

On Wed, 2018-08-01 at 21:58 -0700, Guenter Roeck wrote:
> I am running out of ideas. Any thoughts on how to track this down further ?

Is a shell available when the hang occurs? If so, it would be helpful if you could provide a dump of the information in /sys/kernel/debug/block. That directory contains detailed information about pending commands.

Bart.
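If a shell can be reached while the system is stuck, the dump Bart asks for could be collected with something like the following. It is a minimal sketch: the function name dump_tree is made up, no particular file names below /sys/kernel/debug/block are assumed, and debugfs has to be mounted first.

```shell
#!/bin/sh
# Sketch only. If debugfs is not mounted yet:
#   mount -t debugfs none /sys/kernel/debug

# Print every readable regular file below a directory, each preceded by a
# "== <path>" header line. $1 defaults to the block layer debugfs directory.
dump_tree() {
    dir=${1:-/sys/kernel/debug/block}
    find "$dir" -type f 2>/dev/null | sort | while IFS= read -r f; do
        printf '== %s\n' "$f"
        cat "$f" 2>/dev/null      # unreadable entries are silently skipped
    done
}

# Possible use: dump_tree > /tmp/blk-debug.txt
```

Walking the tree generically avoids guessing at file names, which may differ between kernel versions.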
* Re: linux-next: Tree for Aug 1
  2018-08-02 5:04 ` Bart Van Assche
@ 2018-08-02 12:46 ` Guenter Roeck
  2018-08-02 12:51 ` Johannes Thumshirn
  0 siblings, 1 reply; 21+ messages in thread
From: Guenter Roeck @ 2018-08-02 12:46 UTC
To: Bart Van Assche, James.Bottomley@HansenPartnership.com, tom.leiming@gmail.com
Cc: sfr@canb.auug.org.au, linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org, linux-next@vger.kernel.org

On 08/01/2018 10:04 PM, Bart Van Assche wrote:
> On Wed, 2018-08-01 at 21:58 -0700, Guenter Roeck wrote:
>> I am running out of ideas. Any thoughts on how to track this down further ?
>
> Is a shell available when the hang occurs? If so, it would be helpful if you
> could provide a dump of the information in /sys/kernel/debug/block. There is
> namely detailed information in that directory about pending commands.
>

No, it hangs hard early in the boot process. See various logs at http://kerneltests.org/builders/, in the 'next' column.

Here is some interesting information from the x86_64 boot tests.

Building x86_64:q35:Broadwell-noTSX:defconfig:smp:sata:rootfs ... running .................................. failed (timeout)
Building x86_64:q35:IvyBridge:defconfig:smp:nvme:rootfs ... running .................................. failed (timeout)
Building x86_64:q35:SandyBridge:defconfig:smp:usb:rootfs ... running .................................. failed (timeout)
Building x86_64:q35:Haswell:defconfig:smp:usb-uas:rootfs ... running ...... passed
Building x86_64:q35:Skylake-Client:defconfig:smp:mmc:rootfs ... running .................................. failed (timeout)
Building x86_64:q35:Conroe:defconfig:smp:scsi[DC395]:rootfs ... running ........ passed
Building x86_64:q35:Nehalem:defconfig:smp:scsi[AM53C974]:rootfs ... running ...... passed
Building x86_64:q35:Westmere-IBRS:defconfig:smp:scsi[53C810]:rootfs ... running ....... passed
Building x86_64:q35:Skylake-Server:defconfig:smp:scsi[53C895A]:rootfs ... running ....... passed
Building x86_64:pc:EPYC:defconfig:smp:scsi[MEGASAS]:rootfs ... running ...... passed
Building x86_64:q35:EPYC-IBPB:defconfig:smp:scsi[MEGASAS2]:rootfs ... running ....... passed
Building x86_64:q35:Opteron_G5:defconfig:smp:scsi[FUSION]:rootfs ... running ....... passed
Building x86_64:pc:phenom:defconfig:smp:initrd ... running .................................. failed (timeout)
Building x86_64:q35:Opteron_G1:defconfig:smp:initrd ... running .................................. failed (timeout)
Building x86_64:pc:Opteron_G2:defconfig:smp:sata:rootfs ... running .................................. failed (timeout)
Building x86_64:q35:core2duo:defconfig:smp:usb:rootfs ... running .................................. failed (timeout)
Building x86_64:pc:Opteron_G3:defconfig:nosmp:usb:rootfs ... running .................................. failed (timeout)
Building x86_64:q35:Opteron_G4:defconfig:nosmp:sata:rootfs ... running .................................. failed (timeout)

This is consistent across multiple test runs. In summary,

- Boot from initrd fails
- Boot from SATA drive fails (this is with CONFIG_ATA)
- Boot from NVME fails
- Boot from USB drive fails
- Boot from MMC (SD) fails
- Boot from USB UAS drive passes
- Boot from various real SCSI drives passes

Platform (pc,q35), CPU type, or SMP/NOSMP does not seem to make a difference.

Guenter
* Re: linux-next: Tree for Aug 1
  2018-08-02 12:46 ` Guenter Roeck
@ 2018-08-02 12:51 ` Johannes Thumshirn
  2018-08-02 13:00 ` Guenter Roeck
  0 siblings, 1 reply; 21+ messages in thread
From: Johannes Thumshirn @ 2018-08-02 12:51 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Bart Van Assche, James.Bottomley@HansenPartnership.com, tom.leiming@gmail.com, sfr@canb.auug.org.au, linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org, linux-next@vger.kernel.org

On Thu, Aug 02, 2018 at 05:46:19AM -0700, Guenter Roeck wrote:
> This is consistent across multiple test runs. In summary,
>
> - Boot from initrd fails
> - Boot from SATA drive fails (this is with CONFIG_ATA)
> - Boot from NVME fails
> - Boot from USB drive fails
> - Boot from MMC (SD) fails
> - Boot from USB UAS drive passes
> - Boot from various real SCSI drives passes
>
> Platform (pc,q35), CPU type, or SMP/NOSMP does not seem to make a difference.

OK. I'm trying to bisect between next-20180727 (known good) and
next-20180731 (known bad) with forced scsi_mod.use_blk_mq=1, but so
far the only bad one I've seen is next-20180731.

Byte,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: linux-next: Tree for Aug 1
  2018-08-02 12:51 ` Johannes Thumshirn
@ 2018-08-02 13:00 ` Guenter Roeck
  2018-08-02 13:06 ` Johannes Thumshirn
  0 siblings, 1 reply; 21+ messages in thread
From: Guenter Roeck @ 2018-08-02 13:00 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Bart Van Assche, James.Bottomley@HansenPartnership.com, tom.leiming@gmail.com, sfr@canb.auug.org.au, linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org, linux-next@vger.kernel.org

On 08/02/2018 05:51 AM, Johannes Thumshirn wrote:
> On Thu, Aug 02, 2018 at 05:46:19AM -0700, Guenter Roeck wrote:
>> This is consistent across multiple test runs. In summary,
>>
>> - Boot from initrd fails
>> - Boot from SATA drive fails (this is with CONFIG_ATA)
>> - Boot from NVME fails
>> - Boot from USB drive fails
>> - Boot from MMC (SD) fails
>> - Boot from USB UAS drive passes
>> - Boot from various real SCSI drives passes
>>
>> Platform (pc,q35), CPU type, or SMP/NOSMP does not seem to make a difference.
>
> OK. I try to bisect between next-20180727 (known good) and
> next-20180731 (known bad) with forced scsi_mod.use_blk_mq=1, but so
> far the only bad I've seen is next-20180731.
>

Per my logs, next-20180730 is the first bad, next-20180727 is the last good.

Guenter

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: linux-next: Tree for Aug 1
  2018-08-02 13:00 ` Guenter Roeck
@ 2018-08-02 13:06 ` Johannes Thumshirn
  0 siblings, 0 replies; 21+ messages in thread
From: Johannes Thumshirn @ 2018-08-02 13:06 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Bart Van Assche, James.Bottomley@HansenPartnership.com, tom.leiming@gmail.com, sfr@canb.auug.org.au, linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org, linux-next@vger.kernel.org

On Thu, Aug 02, 2018 at 06:00:19AM -0700, Guenter Roeck wrote:
> Per my logs, next-20180730 is the first bad, next-20180727 is the last good.

OK, so my bisecting is correct (a bit too much but still).

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: linux-next: Tree for Aug 1 2018-08-02 4:58 ` Guenter Roeck 2018-08-02 5:04 ` Bart Van Assche @ 2018-08-02 11:35 ` Ming Lei 2018-08-02 13:05 ` Guenter Roeck 1 sibling, 1 reply; 21+ messages in thread From: Ming Lei @ 2018-08-02 11:35 UTC (permalink / raw) To: Guenter Roeck, linux-ide, Tejun Heo Cc: James Bottomley, Stephen Rothwell, Linux-Next Mailing List, Linux Kernel Mailing List, linux-scsi, Ming Lei On Thu, Aug 2, 2018 at 12:58 PM, Guenter Roeck <linux@roeck-us.net> wrote: > On 08/01/2018 05:03 PM, James Bottomley wrote: >> >> On Thu, 2018-08-02 at 07:57 +0800, Ming Lei wrote: >>> >>> On Thu, Aug 2, 2018 at 7:47 AM, Guenter Roeck <linux@roeck-us.net> >>> wrote: >>>> >>>> On Wed, Aug 01, 2018 at 03:52:45PM -0700, James Bottomley wrote: >>>>> >>>>> On Wed, 2018-08-01 at 15:48 -0700, Guenter Roeck wrote: >>>>>> >>>>>> On Wed, Aug 01, 2018 at 05:58:52PM +1000, Stephen Rothwell >>>>>> wrote: >>>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> Changes since 20180731: >>>>>>> >>>>>>> The pci tree gained a conflict against the pci-current tree. >>>>>>> >>>>>>> The net-next tree gained a conflict against the bpf tree. >>>>>>> >>>>>>> The block tree lost its build failure. >>>>>>> >>>>>>> The staging tree still had its build failure due to an >>>>>>> interaction >>>>>>> with >>>>>>> the vfs tree for which I disabled CONFIG_EROFS_FS. >>>>>>> >>>>>>> The kspp tree lost its build failure. >>>>>>> >>>>>>> Non-merge commits (relative to Linus' tree): 10070 >>>>>>> 9137 files changed, 417605 insertions(+), 179996 deletions(- >>>>>>> ) >>>>>>> >>>>>>> ----------------------------------------------------------- >>>>>>> ------ >>>>>>> ----------- >>>>>>> >>>>>> >>>>>> The widespread kernel hang issues are still seen. I managed >>>>>> to bisect it after working around the transient build failures. >>>>>> Bisect log is attached below. Unfortunately, it doesn't help >>>>>> much. 
>>>>>> The culprit is reported as: >>>>>> >>>>>> 2d542828c5e9 Merge remote-tracking branch 'scsi/for-next' >>>>>> >>>>>> The preceding merge, >>>>>> >>>>>> 453f1d821165 Merge remote-tracking branch 'cgroup/for-next' >>>>>> >>>>>> checks out fine, as does the tip of scsi-next (commit >>>>>> 103c7b7e0184, >>>>>> "Merge branch 'misc' into for-next"). No idea how to proceed. >>>>> >>>>> >>>>> This sounds like you may have a problem with this patch: >>>>> >>>>> commit d5038a13eca72fb216c07eb717169092e92284f1 >>>>> Author: Johannes Thumshirn <jthumshirn@suse.de> >>>>> Date: Wed Jul 4 10:53:56 2018 +0200 >>>>> >>>>> scsi: core: switch to scsi-mq by default >>>>> >>>>> To verify, boot with the additional kernel parameter >>>>> >>>>> scsi_mod.use_blk_mq=0 >>>>> >>>>> Which will reverse the effect of the above patch. >>>>> >>>> >>>> Yes, that fixes the problem. >>> >>> >>> That may not the root cause, given this issue is only started to >>> see from next-20180731, but d5038a13eca7 (scsi: core: switch to >>> scsi-mq by default) >>> has been in -next for quite a while. >>> >>> Seems something new causes this issue. >> >> >> Read my other email about how to find this. >> >> https://marc.info/?l=linux-scsi&m=153316446223676 >> >> Now that we've confirmed the issue, Gunter, could you attempt to bisect >> it as that email describes? >> > > So, I am more and more baffled. > > I ran another round of bisect, this time each test executing twice, > once with "scsi_mod.use_blk_mq=1" and once with "scsi_mod.use_blk_mq=0", > requiring both to pass. Bisect still points to the merge as culprit. > > Ok, one step further: Actually _revert_ commit d5038a13eca72 before running > each test, meaning the default is use_blk_mq=0. Still run both tests. > Bisect _still_ points to the merge of scsi-next as culprit. 
>
> So, to me it looks like the problem is triggered by _something_ in
> scsi-next, combined with _something_ in -next prior to the merge,
> not specifically associated with use_blk_mq=[0|1] or d5038a13eca72,
> but to a combination of some patch in scsi-next and some other patch.

I am a bit busy today and have not traced it much.

So far, I have found that the code hangs in scsi_test_unit_ready()
<-get_capabilities()<-sr_probe(): scsi_queue_rq()/ata_scsi_queuecmd()
queued the command successfully, but it never completed.

I also tried reverting the commits merged into the ata tree on the 30th
and 31st, but that made no difference.

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: linux-next: Tree for Aug 1 2018-08-02 11:35 ` Ming Lei @ 2018-08-02 13:05 ` Guenter Roeck 2018-08-02 16:27 ` Ming Lei 0 siblings, 1 reply; 21+ messages in thread From: Guenter Roeck @ 2018-08-02 13:05 UTC (permalink / raw) To: Ming Lei, linux-ide, Tejun Heo Cc: James Bottomley, Stephen Rothwell, Linux-Next Mailing List, Linux Kernel Mailing List, linux-scsi, Ming Lei On 08/02/2018 04:35 AM, Ming Lei wrote: > On Thu, Aug 2, 2018 at 12:58 PM, Guenter Roeck <linux@roeck-us.net> wrote: >> On 08/01/2018 05:03 PM, James Bottomley wrote: >>> >>> On Thu, 2018-08-02 at 07:57 +0800, Ming Lei wrote: >>>> >>>> On Thu, Aug 2, 2018 at 7:47 AM, Guenter Roeck <linux@roeck-us.net> >>>> wrote: >>>>> >>>>> On Wed, Aug 01, 2018 at 03:52:45PM -0700, James Bottomley wrote: >>>>>> >>>>>> On Wed, 2018-08-01 at 15:48 -0700, Guenter Roeck wrote: >>>>>>> >>>>>>> On Wed, Aug 01, 2018 at 05:58:52PM +1000, Stephen Rothwell >>>>>>> wrote: >>>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> Changes since 20180731: >>>>>>>> >>>>>>>> The pci tree gained a conflict against the pci-current tree. >>>>>>>> >>>>>>>> The net-next tree gained a conflict against the bpf tree. >>>>>>>> >>>>>>>> The block tree lost its build failure. >>>>>>>> >>>>>>>> The staging tree still had its build failure due to an >>>>>>>> interaction >>>>>>>> with >>>>>>>> the vfs tree for which I disabled CONFIG_EROFS_FS. >>>>>>>> >>>>>>>> The kspp tree lost its build failure. >>>>>>>> >>>>>>>> Non-merge commits (relative to Linus' tree): 10070 >>>>>>>> 9137 files changed, 417605 insertions(+), 179996 deletions(- >>>>>>>> ) >>>>>>>> >>>>>>>> ----------------------------------------------------------- >>>>>>>> ------ >>>>>>>> ----------- >>>>>>>> >>>>>>> >>>>>>> The widespread kernel hang issues are still seen. I managed >>>>>>> to bisect it after working around the transient build failures. >>>>>>> Bisect log is attached below. Unfortunately, it doesn't help >>>>>>> much. 
>>>>>>> The culprit is reported as: >>>>>>> >>>>>>> 2d542828c5e9 Merge remote-tracking branch 'scsi/for-next' >>>>>>> >>>>>>> The preceding merge, >>>>>>> >>>>>>> 453f1d821165 Merge remote-tracking branch 'cgroup/for-next' >>>>>>> >>>>>>> checks out fine, as does the tip of scsi-next (commit >>>>>>> 103c7b7e0184, >>>>>>> "Merge branch 'misc' into for-next"). No idea how to proceed. >>>>>> >>>>>> >>>>>> This sounds like you may have a problem with this patch: >>>>>> >>>>>> commit d5038a13eca72fb216c07eb717169092e92284f1 >>>>>> Author: Johannes Thumshirn <jthumshirn@suse.de> >>>>>> Date: Wed Jul 4 10:53:56 2018 +0200 >>>>>> >>>>>> scsi: core: switch to scsi-mq by default >>>>>> >>>>>> To verify, boot with the additional kernel parameter >>>>>> >>>>>> scsi_mod.use_blk_mq=0 >>>>>> >>>>>> Which will reverse the effect of the above patch. >>>>>> >>>>> >>>>> Yes, that fixes the problem. >>>> >>>> >>>> That may not the root cause, given this issue is only started to >>>> see from next-20180731, but d5038a13eca7 (scsi: core: switch to >>>> scsi-mq by default) >>>> has been in -next for quite a while. >>>> >>>> Seems something new causes this issue. >>> >>> >>> Read my other email about how to find this. >>> >>> https://marc.info/?l=linux-scsi&m=153316446223676 >>> >>> Now that we've confirmed the issue, Gunter, could you attempt to bisect >>> it as that email describes? >>> >> >> So, I am more and more baffled. >> >> I ran another round of bisect, this time each test executing twice, >> once with "scsi_mod.use_blk_mq=1" and once with "scsi_mod.use_blk_mq=0", >> requiring both to pass. Bisect still points to the merge as culprit. >> >> Ok, one step further: Actually _revert_ commit d5038a13eca72 before running >> each test, meaning the default is use_blk_mq=0. Still run both tests. >> Bisect _still_ points to the merge of scsi-next as culprit. 
>> >> So, to me it looks like the problem is triggered by _something_ in >> scsi-next, combined with _something_ in -next prior to the merge, >> not specifically associated with use_blk_mq=[0|1] or d5038a13eca72, >> but to a combination of some patch in scsi-next and some other patch. > > Today I am a bit busy, and not trace it much. > > So far, I found the code hangs in scsi_test_unit_ready() > <-get_capabilities()<-sr_probe(), and scsi_queue_rq()/ata_scsi_queuecmd() > has queued the command successfully, but never completed. > > Also tried to revert commits merged to ata tree on 30th, 31th, > but no difference. > Looking at my commit logs, the problem started to happen after various DMA changes were introduced. The boot tests fail on ppc (few), mips (all 32 bit, most 64 bit), i386 (all), x86_64 (most). All other platform pass, even with the same type of boot tests. Here is an example from alpha: Building alpha:defconfig:initrd ... running .... passed Building alpha:defconfig:sata:rootfs ... running ..... passed Building alpha:defconfig:usb:rootfs ... running ..... passed Building alpha:defconfig:usb-uas:rootfs ... running ...... passed Building alpha:defconfig:scsi[AM53C974]:rootfs ... running ....... passed Building alpha:defconfig:scsi[DC395]:rootfs ... running ....... passed Building alpha:defconfig:scsi[MEGASAS]:rootfs ... running ...... passed Building alpha:defconfig:scsi[MEGASAS2]:rootfs ... running ...... passed Building alpha:defconfig:scsi[FUSION]:rootfs ... running ...... passed Building alpha:defconfig:nvme:rootfs ... running ..... passed arm64: Building arm64:virt:defconfig:smp:initrd ... running ..... passed Building arm64:virt:defconfig:smp:usb:rootfs ... running ..... passed Building arm64:virt:defconfig:smp:usb-uas:rootfs ... running ..... passed Building arm64:virt:defconfig:smp:virtio:rootfs ... running ..... passed Building arm64:virt:defconfig:smp:nvme:rootfs ... running ..... passed Building arm64:virt:defconfig:smp:mmc:rootfs ... 
running ..... passed Building arm64:virt:defconfig:smp:scsi[DC395]:rootfs ... running ..... passed Building arm64:virt:defconfig:smp:scsi[AM53C974]:rootfs ... running ..... passed Building arm64:virt:defconfig:smp:scsi[MEGASAS]:rootfs ... running ..... passed Building arm64:virt:defconfig:smp:scsi[MEGASAS2]:rootfs ... running ..... passed Building arm64:virt:defconfig:smp:scsi[53C810]:rootfs ... running ...... passed Building arm64:virt:defconfig:smp:scsi[53C895A]:rootfs ... running ...... passed Building arm64:virt:defconfig:smp:scsi[FUSION]:rootfs ... running ...... passed Skipping arm64:xlnx-zcu102:defconfig:smp:initrd:xilinx/zynqmp-ep108 ... Skipping arm64:xlnx-zcu102:defconfig:smp:sd:rootfs:xilinx/zynqmp-ep108 ... Skipping arm64:xlnx-zcu102:defconfig:smp:sata:rootfs:xilinx/zynqmp-ep108 ... Building arm64:xlnx-zcu102:defconfig:smp:initrd:xilinx/zynqmp-zcu102-rev1.0 ... running ....... passed Building arm64:xlnx-zcu102:defconfig:smp:sd1:rootfs:xilinx/zynqmp-zcu102-rev1.0 ... running ......... passed Building arm64:xlnx-zcu102:defconfig:smp:sata:rootfs:xilinx/zynqmp-zcu102-rev1.0 ... running ...... passed Building arm64:raspi3:defconfig:smp:initrd:broadcom/bcm2837-rpi-3-b ... running ..... passed Building arm64:raspi3:defconfig:smp:sd:rootfs:broadcom/bcm2837-rpi-3-b ... running ........ passed Building arm64:virt:defconfig:nosmp:initrd ... running ..... passed Skipping arm64:xlnx-zcu102:defconfig:nosmp:initrd:xilinx/zynqmp-ep108 ... Skipping arm64:xlnx-zcu102:defconfig:nosmp:sd:rootfs:xilinx/zynqmp-ep108 ... Building arm64:xlnx-zcu102:defconfig:nosmp:initrd:xilinx/zynqmp-zcu102-rev1.0 ... running ......... passed Building arm64:xlnx-zcu102:defconfig:nosmp:sd1:rootfs:xilinx/zynqmp-zcu102-rev1.0 ... running ......... passed ppc: Building powerpc:mac99:qemu_ppc_book3s_defconfig:nosmp:rootfs ... running ....... passed Building powerpc:g3beige:qemu_ppc_book3s_defconfig:nosmp:rootfs ... running ...... 
passed Building powerpc:mac99:qemu_ppc_book3s_defconfig:smp:rootfs ... running ....... passed Building powerpc:virtex-ml507:44x/virtex5_defconfig:devtmpfs:initrd ... running .... passed Building powerpc:mpc8544ds:mpc85xx_defconfig:initrd ... running .... passed Building powerpc:mpc8544ds:mpc85xx_defconfig:scsi:rootfs ... running ..... passed Building powerpc:mpc8544ds:mpc85xx_defconfig:sata:rootfs ... running .... passed Building powerpc:mpc8544ds:mpc85xx_smp_defconfig:initrd ... running .... passed Building powerpc:mpc8544ds:mpc85xx_smp_defconfig:scsi:rootfs ... running ..... passed Building powerpc:mpc8544ds:mpc85xx_smp_defconfig:sata:rootfs ... running .... passed Building powerpc:bamboo:44x/bamboo_defconfig:devtmpfs:initrd ... running .... passed Building powerpc:bamboo:44x/bamboo_defconfig:devtmpfs:scsi[AM53C974]:rootfs ... running ..... passed Building powerpc:bamboo:44x/bamboo_defconfig:devtmpfs:smp:initrd ... running .... passed Building powerpc:bamboo:44x/bamboo_defconfig:devtmpfs:smp:scsi[AM53C974]:rootfs ... running ..... passed Building powerpc:sam460ex:44x/canyonlands_defconfig:devtmpfs:initrd ... running ..... passed Building powerpc:sam460ex:44x/canyonlands_defconfig:devtmpfs:usbdisk:rootfs ... running ...... passed Building powerpc:mac99:pmac32_defconfig:devtmpfs:zilog:initrd ... running .................................. failed (timeout) Building powerpc:mac99:pmac32_defconfig:devtmpfs:zilog:rootfs ... running .................................. failed (timeout) Maybe that is a coincidence, but it is at least suspicious. Guenter ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: linux-next: Tree for Aug 1 2018-08-02 13:05 ` Guenter Roeck @ 2018-08-02 16:27 ` Ming Lei 2018-08-02 16:40 ` Bart Van Assche 0 siblings, 1 reply; 21+ messages in thread From: Ming Lei @ 2018-08-02 16:27 UTC (permalink / raw) To: Guenter Roeck Cc: Ming Lei, linux-ide, Tejun Heo, James Bottomley, Stephen Rothwell, Linux-Next Mailing List, Linux Kernel Mailing List, linux-scsi, Christoph Hellwig, Josef Bacik, Jens Axboe On Thu, Aug 02, 2018 at 06:05:16AM -0700, Guenter Roeck wrote: > On 08/02/2018 04:35 AM, Ming Lei wrote: > > On Thu, Aug 2, 2018 at 12:58 PM, Guenter Roeck <linux@roeck-us.net> wrote: > > > On 08/01/2018 05:03 PM, James Bottomley wrote: > > > > > > > > On Thu, 2018-08-02 at 07:57 +0800, Ming Lei wrote: > > > > > > > > > > On Thu, Aug 2, 2018 at 7:47 AM, Guenter Roeck <linux@roeck-us.net> > > > > > wrote: > > > > > > > > > > > > On Wed, Aug 01, 2018 at 03:52:45PM -0700, James Bottomley wrote: > > > > > > > > > > > > > > On Wed, 2018-08-01 at 15:48 -0700, Guenter Roeck wrote: > > > > > > > > > > > > > > > > On Wed, Aug 01, 2018 at 05:58:52PM +1000, Stephen Rothwell > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > Hi all, > > > > > > > > > > > > > > > > > > Changes since 20180731: > > > > > > > > > > > > > > > > > > The pci tree gained a conflict against the pci-current tree. > > > > > > > > > > > > > > > > > > The net-next tree gained a conflict against the bpf tree. > > > > > > > > > > > > > > > > > > The block tree lost its build failure. > > > > > > > > > > > > > > > > > > The staging tree still had its build failure due to an > > > > > > > > > interaction > > > > > > > > > with > > > > > > > > > the vfs tree for which I disabled CONFIG_EROFS_FS. > > > > > > > > > > > > > > > > > > The kspp tree lost its build failure. 
> > > > > > > > > > > > > > > > > > Non-merge commits (relative to Linus' tree): 10070 > > > > > > > > > 9137 files changed, 417605 insertions(+), 179996 deletions(- > > > > > > > > > ) > > > > > > > > > > > > > > > > > > ----------------------------------------------------------- > > > > > > > > > ------ > > > > > > > > > ----------- > > > > > > > > > > > > > > > > > > > > > > > > > The widespread kernel hang issues are still seen. I managed > > > > > > > > to bisect it after working around the transient build failures. > > > > > > > > Bisect log is attached below. Unfortunately, it doesn't help > > > > > > > > much. > > > > > > > > The culprit is reported as: > > > > > > > > > > > > > > > > 2d542828c5e9 Merge remote-tracking branch 'scsi/for-next' > > > > > > > > > > > > > > > > The preceding merge, > > > > > > > > > > > > > > > > 453f1d821165 Merge remote-tracking branch 'cgroup/for-next' > > > > > > > > > > > > > > > > checks out fine, as does the tip of scsi-next (commit > > > > > > > > 103c7b7e0184, > > > > > > > > "Merge branch 'misc' into for-next"). No idea how to proceed. > > > > > > > > > > > > > > > > > > > > > This sounds like you may have a problem with this patch: > > > > > > > > > > > > > > commit d5038a13eca72fb216c07eb717169092e92284f1 > > > > > > > Author: Johannes Thumshirn <jthumshirn@suse.de> > > > > > > > Date: Wed Jul 4 10:53:56 2018 +0200 > > > > > > > > > > > > > > scsi: core: switch to scsi-mq by default > > > > > > > > > > > > > > To verify, boot with the additional kernel parameter > > > > > > > > > > > > > > scsi_mod.use_blk_mq=0 > > > > > > > > > > > > > > Which will reverse the effect of the above patch. > > > > > > > > > > > > > > > > > > > Yes, that fixes the problem. > > > > > > > > > > > > > > > That may not the root cause, given this issue is only started to > > > > > see from next-20180731, but d5038a13eca7 (scsi: core: switch to > > > > > scsi-mq by default) > > > > > has been in -next for quite a while. 
> > > > > > > > > > Seems something new causes this issue. > > > > > > > > > > > > Read my other email about how to find this. > > > > > > > > https://marc.info/?l=linux-scsi&m=153316446223676 > > > > > > > > Now that we've confirmed the issue, Gunter, could you attempt to bisect > > > > it as that email describes? > > > > > > > > > > So, I am more and more baffled. > > > > > > I ran another round of bisect, this time each test executing twice, > > > once with "scsi_mod.use_blk_mq=1" and once with "scsi_mod.use_blk_mq=0", > > > requiring both to pass. Bisect still points to the merge as culprit. > > > > > > Ok, one step further: Actually _revert_ commit d5038a13eca72 before running > > > each test, meaning the default is use_blk_mq=0. Still run both tests. > > > Bisect _still_ points to the merge of scsi-next as culprit. > > > > > > So, to me it looks like the problem is triggered by _something_ in > > > scsi-next, combined with _something_ in -next prior to the merge, > > > not specifically associated with use_blk_mq=[0|1] or d5038a13eca72, > > > but to a combination of some patch in scsi-next and some other patch. > > > > Today I am a bit busy, and not trace it much. > > > > So far, I found the code hangs in scsi_test_unit_ready() > > <-get_capabilities()<-sr_probe(), and scsi_queue_rq()/ata_scsi_queuecmd() > > has queued the command successfully, but never completed. > > > > Also tried to revert commits merged to ata tree on 30th, 31th, > > but no difference. > > > > Looking at my commit logs, the problem started to happen after various DMA > changes were introduced. The boot tests fail on ppc (few), mips (all 32 bit, > most 64 bit), i386 (all), x86_64 (most). All other platform pass, even with > the same type of boot tests. Here is an example from alpha: > > Building alpha:defconfig:initrd ... running .... passed > Building alpha:defconfig:sata:rootfs ... running ..... passed > Building alpha:defconfig:usb:rootfs ... running ..... 
passed > Building alpha:defconfig:usb-uas:rootfs ... running ...... passed > Building alpha:defconfig:scsi[AM53C974]:rootfs ... running ....... passed > Building alpha:defconfig:scsi[DC395]:rootfs ... running ....... passed > Building alpha:defconfig:scsi[MEGASAS]:rootfs ... running ...... passed > Building alpha:defconfig:scsi[MEGASAS2]:rootfs ... running ...... passed > Building alpha:defconfig:scsi[FUSION]:rootfs ... running ...... passed > Building alpha:defconfig:nvme:rootfs ... running ..... passed > > arm64: > > Building arm64:virt:defconfig:smp:initrd ... running ..... passed > Building arm64:virt:defconfig:smp:usb:rootfs ... running ..... passed > Building arm64:virt:defconfig:smp:usb-uas:rootfs ... running ..... passed > Building arm64:virt:defconfig:smp:virtio:rootfs ... running ..... passed > Building arm64:virt:defconfig:smp:nvme:rootfs ... running ..... passed > Building arm64:virt:defconfig:smp:mmc:rootfs ... running ..... passed > Building arm64:virt:defconfig:smp:scsi[DC395]:rootfs ... running ..... passed > Building arm64:virt:defconfig:smp:scsi[AM53C974]:rootfs ... running ..... passed > Building arm64:virt:defconfig:smp:scsi[MEGASAS]:rootfs ... running ..... passed > Building arm64:virt:defconfig:smp:scsi[MEGASAS2]:rootfs ... running ..... passed > Building arm64:virt:defconfig:smp:scsi[53C810]:rootfs ... running ...... passed > Building arm64:virt:defconfig:smp:scsi[53C895A]:rootfs ... running ...... passed > Building arm64:virt:defconfig:smp:scsi[FUSION]:rootfs ... running ...... passed > Skipping arm64:xlnx-zcu102:defconfig:smp:initrd:xilinx/zynqmp-ep108 ... > Skipping arm64:xlnx-zcu102:defconfig:smp:sd:rootfs:xilinx/zynqmp-ep108 ... > Skipping arm64:xlnx-zcu102:defconfig:smp:sata:rootfs:xilinx/zynqmp-ep108 ... > Building arm64:xlnx-zcu102:defconfig:smp:initrd:xilinx/zynqmp-zcu102-rev1.0 ... running ....... passed > Building arm64:xlnx-zcu102:defconfig:smp:sd1:rootfs:xilinx/zynqmp-zcu102-rev1.0 ... running ......... 
passed > Building arm64:xlnx-zcu102:defconfig:smp:sata:rootfs:xilinx/zynqmp-zcu102-rev1.0 ... running ...... passed > Building arm64:raspi3:defconfig:smp:initrd:broadcom/bcm2837-rpi-3-b ... running ..... passed > Building arm64:raspi3:defconfig:smp:sd:rootfs:broadcom/bcm2837-rpi-3-b ... running ........ passed > Building arm64:virt:defconfig:nosmp:initrd ... running ..... passed > Skipping arm64:xlnx-zcu102:defconfig:nosmp:initrd:xilinx/zynqmp-ep108 ... > Skipping arm64:xlnx-zcu102:defconfig:nosmp:sd:rootfs:xilinx/zynqmp-ep108 ... > Building arm64:xlnx-zcu102:defconfig:nosmp:initrd:xilinx/zynqmp-zcu102-rev1.0 ... running ......... passed > Building arm64:xlnx-zcu102:defconfig:nosmp:sd1:rootfs:xilinx/zynqmp-zcu102-rev1.0 ... running ......... passed > > ppc: > > Building powerpc:mac99:qemu_ppc_book3s_defconfig:nosmp:rootfs ... running ....... passed > Building powerpc:g3beige:qemu_ppc_book3s_defconfig:nosmp:rootfs ... running ...... passed > Building powerpc:mac99:qemu_ppc_book3s_defconfig:smp:rootfs ... running ....... passed > Building powerpc:virtex-ml507:44x/virtex5_defconfig:devtmpfs:initrd ... running .... passed > Building powerpc:mpc8544ds:mpc85xx_defconfig:initrd ... running .... passed > Building powerpc:mpc8544ds:mpc85xx_defconfig:scsi:rootfs ... running ..... passed > Building powerpc:mpc8544ds:mpc85xx_defconfig:sata:rootfs ... running .... passed > Building powerpc:mpc8544ds:mpc85xx_smp_defconfig:initrd ... running .... passed > Building powerpc:mpc8544ds:mpc85xx_smp_defconfig:scsi:rootfs ... running ..... passed > Building powerpc:mpc8544ds:mpc85xx_smp_defconfig:sata:rootfs ... running .... passed > Building powerpc:bamboo:44x/bamboo_defconfig:devtmpfs:initrd ... running .... passed > Building powerpc:bamboo:44x/bamboo_defconfig:devtmpfs:scsi[AM53C974]:rootfs ... running ..... passed > Building powerpc:bamboo:44x/bamboo_defconfig:devtmpfs:smp:initrd ... running .... 
passed > Building powerpc:bamboo:44x/bamboo_defconfig:devtmpfs:smp:scsi[AM53C974]:rootfs ... running ..... passed > Building powerpc:sam460ex:44x/canyonlands_defconfig:devtmpfs:initrd ... running ..... passed > Building powerpc:sam460ex:44x/canyonlands_defconfig:devtmpfs:usbdisk:rootfs ... running ...... passed > Building powerpc:mac99:pmac32_defconfig:devtmpfs:zilog:initrd ... running .................................. failed (timeout) > Building powerpc:mac99:pmac32_defconfig:devtmpfs:zilog:rootfs ... running .................................. failed (timeout) > > Maybe that is a coincidence, but it is at least suspicious. This issue can be fixed by reverting d250bf4e776ff09d5 ("blk-mq: only iterate over inflight requests in blk_mq_tagset_busy_iter"). This patch looks wrong, because 'blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT' isn't completely same with 'blk_mq_request_started(req)'. Thanks, Ming ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: linux-next: Tree for Aug 1
  2018-08-02 16:27 ` Ming Lei
@ 2018-08-02 16:40 ` Bart Van Assche
  2018-08-02 16:50 ` Ming Lei
  0 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2018-08-02 16:40 UTC (permalink / raw)
  To: linux@roeck-us.net, ming.lei@redhat.com
  Cc: linux-ide@vger.kernel.org, linux-kernel@vger.kernel.org, tom.leiming@gmail.com, hch@lst.de, axboe@kernel.dk, linux-scsi@vger.kernel.org, sfr@canb.auug.org.au, linux-next@vger.kernel.org, James.Bottomley@hansenpartnership.com, josef@toxicpanda.com, tj@kernel.org

On Fri, 2018-08-03 at 00:27 +0800, Ming Lei wrote:
> This issue can be fixed by reverting d250bf4e776ff09d5 ("blk-mq: only iterate over
> inflight requests in blk_mq_tagset_busy_iter").
>
> This patch looks wrong, because 'blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT'
> isn't completely same with 'blk_mq_request_started(req)'.

Please test the following change instead of reverting the commit mentioned
above:

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 09b2ee6694fb..25a0583d8b4c 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -271,7 +271,7 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
 	 * test and set the bit before assining ->rqs[].
 	 */
 	rq = tags->rqs[bitnr];
-	if (rq && blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT)
+	if (rq && blk_mq_rq_state(rq) != MQ_RQ_IDLE)
 		iter_data->fn(rq, iter_data->data, reserved);
 
 	return true;

Thanks,

Bart.

^ permalink raw reply related	[flat|nested] 21+ messages in thread
* Re: linux-next: Tree for Aug 1
  2018-08-02 16:40 ` Bart Van Assche
@ 2018-08-02 16:50 ` Ming Lei
  2018-08-02 16:57 ` Bart Van Assche
  0 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2018-08-02 16:50 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux@roeck-us.net, ming.lei@redhat.com, linux-ide@vger.kernel.org, linux-kernel@vger.kernel.org, hch@lst.de, axboe@kernel.dk, linux-scsi@vger.kernel.org, sfr@canb.auug.org.au, linux-next@vger.kernel.org, James.Bottomley@hansenpartnership.com, josef@toxicpanda.com, tj@kernel.org

On Fri, Aug 3, 2018 at 12:40 AM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> On Fri, 2018-08-03 at 00:27 +0800, Ming Lei wrote:
>> This issue can be fixed by reverting d250bf4e776ff09d5 ("blk-mq: only iterate over
>> inflight requests in blk_mq_tagset_busy_iter").
>>
>> This patch looks wrong, because 'blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT'
>> isn't completely same with 'blk_mq_request_started(req)'.
>
> Please test the following change instead of reverting the commit mentioned
> above:
>
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 09b2ee6694fb..25a0583d8b4c 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -271,7 +271,7 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
>  	 * test and set the bit before assining ->rqs[].
>  	 */
>  	rq = tags->rqs[bitnr];
> -	if (rq && blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT)
> +	if (rq && blk_mq_rq_state(rq) != MQ_RQ_IDLE)
>  		iter_data->fn(rq, iter_data->data, reserved);
>
>  	return true;
>

I just sent out a similar patch on list, but use blk_mq_request_started()
instead.

https://marc.info/?l=linux-scsi&m=153322823307754&w=2

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: linux-next: Tree for Aug 1
  2018-08-02 16:50 ` Ming Lei
@ 2018-08-02 16:57 ` Bart Van Assche
  0 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2018-08-02 16:57 UTC (permalink / raw)
  To: tom.leiming@gmail.com
  Cc: linux-kernel@vger.kernel.org, linux-ide@vger.kernel.org, linux@roeck-us.net, hch@lst.de, axboe@kernel.dk, ming.lei@redhat.com, linux-scsi@vger.kernel.org, sfr@canb.auug.org.au, linux-next@vger.kernel.org, James.Bottomley@hansenpartnership.com, josef@toxicpanda.com, tj@kernel.org

On Fri, 2018-08-03 at 00:50 +0800, Ming Lei wrote:
> On Fri, Aug 3, 2018 at 12:40 AM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> > On Fri, 2018-08-03 at 00:27 +0800, Ming Lei wrote:
> > > This issue can be fixed by reverting d250bf4e776ff09d5 ("blk-mq: only iterate over
> > > inflight requests in blk_mq_tagset_busy_iter").
> > >
> > > This patch looks wrong, because 'blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT'
> > > isn't completely same with 'blk_mq_request_started(req)'.
> >
> > Please test the following change instead of reverting the commit mentioned
> > above:
> >
> > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> > index 09b2ee6694fb..25a0583d8b4c 100644
> > --- a/block/blk-mq-tag.c
> > +++ b/block/blk-mq-tag.c
> > @@ -271,7 +271,7 @@ static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
> >  	 * test and set the bit before assining ->rqs[].
> >  	 */
> >  	rq = tags->rqs[bitnr];
> > -	if (rq && blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT)
> > +	if (rq && blk_mq_rq_state(rq) != MQ_RQ_IDLE)
> >  		iter_data->fn(rq, iter_data->data, reserved);
> >
> >  	return true;
> >
>
> I just sent out a similar patch on list, but use blk_mq_request_started()
> instead.
>
> https://marc.info/?l=linux-scsi&m=153322823307754&w=2

Hello Ming,

Since both patches are functionally equivalent, I'm fine with either version.

Bart.

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: linux-next: Tree for Aug 1
  2018-08-01 23:57 ` Ming Lei
  2018-08-02  0:03 ` James Bottomley
@ 2018-08-02  0:12 ` Guenter Roeck
  1 sibling, 0 replies; 21+ messages in thread
From: Guenter Roeck @ 2018-08-02 0:12 UTC
To: Ming Lei
Cc: James Bottomley, Stephen Rothwell, Linux-Next Mailing List,
    Linux Kernel Mailing List, linux-scsi

On 08/01/2018 04:57 PM, Ming Lei wrote:
> On Thu, Aug 2, 2018 at 7:47 AM, Guenter Roeck <linux@roeck-us.net> wrote:
>> On Wed, Aug 01, 2018 at 03:52:45PM -0700, James Bottomley wrote:
>>> On Wed, 2018-08-01 at 15:48 -0700, Guenter Roeck wrote:
>>>> On Wed, Aug 01, 2018 at 05:58:52PM +1000, Stephen Rothwell wrote:
>>>>> Hi all,
>>>>>
>>>>> Changes since 20180731:
>>>>>
>>>>> The pci tree gained a conflict against the pci-current tree.
>>>>>
>>>>> The net-next tree gained a conflict against the bpf tree.
>>>>>
>>>>> The block tree lost its build failure.
>>>>>
>>>>> The staging tree still had its build failure due to an interaction
>>>>> with
>>>>> the vfs tree for which I disabled CONFIG_EROFS_FS.
>>>>>
>>>>> The kspp tree lost its build failure.
>>>>>
>>>>> Non-merge commits (relative to Linus' tree): 10070
>>>>>  9137 files changed, 417605 insertions(+), 179996 deletions(-)
>>>>>
>>>>> -----------------------------------------------------------------
>>>>> -----------
>>>>>
>>>>
>>>> The widespread kernel hang issues are still seen. I managed
>>>> to bisect it after working around the transient build failures.
>>>> Bisect log is attached below. Unfortunately, it doesn't help much.
>>>> The culprit is reported as:
>>>>
>>>> 2d542828c5e9 Merge remote-tracking branch 'scsi/for-next'
>>>>
>>>> The preceding merge,
>>>>
>>>> 453f1d821165 Merge remote-tracking branch 'cgroup/for-next'
>>>>
>>>> checks out fine, as does the tip of scsi-next (commit 103c7b7e0184,
>>>> "Merge branch 'misc' into for-next"). No idea how to proceed.
>>>
>>> This sounds like you may have a problem with this patch:
>>>
>>> commit d5038a13eca72fb216c07eb717169092e92284f1
>>> Author: Johannes Thumshirn <jthumshirn@suse.de>
>>> Date:   Wed Jul 4 10:53:56 2018 +0200
>>>
>>>     scsi: core: switch to scsi-mq by default
>>>
>>> To verify, boot with the additional kernel parameter
>>>
>>> scsi_mod.use_blk_mq=0
>>>
>>> Which will reverse the effect of the above patch.
>>>
>> Yes, that fixes the problem.
>
> That may not the root cause, given this issue is only started to
> see from next-20180731, but d5038a13eca7 (scsi: core: switch to
> scsi-mq by default)
> has been in -next for quite a while.
>
> Seems something new causes this issue.
>

Agreed. I should have said "fixes the symptom". I'll try to bisect
with scsi_mod.use_blk_mq=1 as suggested by James.

Guenter
Thread overview: 21+ messages
[not found] <20180801175852.36549130@canb.auug.org.au>
[not found] ` <20180801224813.GA13074@roeck-us.net>
2018-08-01 22:52 ` linux-next: Tree for Aug 1 James Bottomley
2018-08-01 23:00 ` James Bottomley
2018-08-02 0:05 ` Stephen Rothwell
2018-08-02 1:19 ` Guenter Roeck
2018-08-01 23:47 ` Guenter Roeck
2018-08-01 23:57 ` Ming Lei
2018-08-02 0:03 ` James Bottomley
2018-08-02 0:20 ` Guenter Roeck
2018-08-02 4:58 ` Guenter Roeck
2018-08-02 5:04 ` Bart Van Assche
2018-08-02 12:46 ` Guenter Roeck
2018-08-02 12:51 ` Johannes Thumshirn
2018-08-02 13:00 ` Guenter Roeck
2018-08-02 13:06 ` Johannes Thumshirn
2018-08-02 11:35 ` Ming Lei
2018-08-02 13:05 ` Guenter Roeck
2018-08-02 16:27 ` Ming Lei
2018-08-02 16:40 ` Bart Van Assche
2018-08-02 16:50 ` Ming Lei
2018-08-02 16:57 ` Bart Van Assche
2018-08-02 0:12 ` Guenter Roeck