* Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
@ 2019-01-09 16:45 ` Frederic Barrat
0 siblings, 0 replies; 19+ messages in thread
From: Frederic Barrat @ 2019-01-09 16:45 UTC (permalink / raw)
To: Greg Kurz; +Cc: linuxppc-dev, aik, andrew.donnellan, stable
On 09/01/2019 at 17:25, Greg Kurz wrote:
> On Wed, 9 Jan 2019 16:13:42 +0100
> Frederic Barrat <fbarrat@linux.ibm.com> wrote:
>
>> With a recent change around IOMMU group, a system with an opencapi
>> adapter is no longer booting and we get a kernel oops:
>>
>> BUG: Kernel NULL pointer dereference at 0x00000028
>> Faulting instruction address: 0xc0000000000aa38c
>> Oops: Kernel access of bad area, sig: 7 [#1]
>> LE SMP NR_CPUS=2048 NUMA PowerNV
>> Modules linked in:
>> CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-00001-g3bd6e94bec12
>> NIP: c0000000000aa38c LR: c0000000000a6608 CTR: c000000000097480
>> REGS: c000000005783700 TRAP: 0300 Not tainted (5.0.0-rc1-fxb-00001-g3bd6
>> MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 28000228 XER: 20
>> CFAR: c0000000000a6604 DAR: 0000000000000028 DSISR: 00080000 IRQMASK: 0
>> GPR00: c0000000000a6608 c000000005783990 c000000001036100 c0000007bf761860
>> GPR04: 0000000000000000 c000000005783834 0000000000000000 0000000000000000
>> GPR08: 69626d2c6e707500 0000000000000000 0000000000000000 9000000002001003
>> GPR12: 0000000000000000 c0000007bfff8300 c000000000010450 0000000000000000
>> GPR16: c000000000ced938 0000000000000100 c000000000ced948 00000000000a0000
>> GPR20: 00000000000bfffe c000000000ced9a8 0000000000000200 c000000000ced978
>> GPR24: 00000000006080c0 c000000716d09828 c00000002e6fd000 0000000000000000
>> GPR28: c0000007bf4aff68 c0000007bf8d0080 c000000000f23938 c0000007bf761860
>> NIP [c0000000000aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
>> LR [c0000000000a6608] pnv_pci_ioda_fixup+0x1f8/0x660
>> Call Trace:
>> [c000000005783990] [c0000000000aa3d0] pnv_try_setup_npu_table_group+0x60/0x
>> [c0000000057839d0] [c0000000000a661c] pnv_pci_ioda_fixup+0x20c/0x660
>> [c000000005783ab0] [c000000000e1d4c0] pcibios_resource_survey+0x2c8/0x31c
>> [c000000005783b90] [c000000000e1caf4] pcibios_init+0xb0/0xe4
>> [c000000005783c10] [c000000000010054] do_one_initcall+0x64/0x264
>> [c000000005783ce0] [c000000000e1132c] kernel_init_freeable+0x36c/0x468
>> [c000000005783db0] [c000000000010474] kernel_init+0x2c/0x148
>> [c000000005783e20] [c00000000000b794] ret_from_kernel_thread+0x5c/0x68
>>
>> An opencapi device is using a device PE, so the current code breaks
>> because pe->pbus is not defined.
>>
>> More generally, there's no need to define an IOMMU group for opencapi,
>> as the device sends real addresses directly (admittedly, the
>> virtualization story is yet to be written). So let's fix it by
>
> Current plan is to go for mediated VFIO. The real HW stays under the control
> of the host ocxl driver, and we still don't need an IOMMU group.
>
>> skipping the IOMMU group setup for opencapi PHBs.
>>
>> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
>> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
>> ---
>
> Reviewed-by: Greg Kurz <groug@kaod.org>
>
> and
>
> Cc: stable@vger.kernel.org # v4.20
Thanks for the review! But why did you add stable? that problem is only
seen on 5.0-rc1, isn't it?
Fred
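
For readers decoding the oops quoted above: a fault address as small as 0x28 usually means a member was read through a NULL base pointer, so the faulting address is simply that member's offset. Below is a minimal sketch of the arithmetic. The structures `fake_pe` and `fake_bus`, the field `dev`, and the padding are all hypothetical and are not the kernel's real layout; the thread does not say which field sits at that offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the kernel's real struct: the padding is
 * chosen so that `dev` lands at offset 0x28, matching the DAR value. */
struct fake_bus {
	char pad[0x28];
	void *dev;
};

struct fake_pe {
	struct fake_bus *pbus;	/* left NULL for a device PE in this sketch */
};

_Static_assert(offsetof(struct fake_bus, dev) == 0x28,
	       "sketch assumes the member sits at offset 0x28");

/* Address a load of pe->pbus->dev would touch: base pointer + offset. */
static size_t fault_address(const struct fake_pe *pe)
{
	assert(pe != NULL);
	return (size_t)(uintptr_t)pe->pbus + offsetof(struct fake_bus, dev);
}
```

With `pbus` set to NULL, `fault_address()` returns 0x28 in this sketch, the same value the oops reports in DAR.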
>> arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>> index 1d6406a051f1..7db3119f8a5b 100644
>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>> @@ -2681,7 +2681,8 @@ static void pnv_pci_ioda_setup_iommu_api(void)
>>  	list_for_each_entry(hose, &hose_list, list_node) {
>>  		phb = hose->private_data;
>>
>> -		if (phb->type == PNV_PHB_NPU_NVLINK)
>> +		if (phb->type == PNV_PHB_NPU_NVLINK ||
>> +		    phb->type == PNV_PHB_NPU_OCAPI)
>>  			continue;
>>
>>  		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
>
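
The effect of the quoted hunk can be sketched outside the kernel as follows. This is an illustration under stated assumptions, not the kernel code: `struct phb`, `PNV_PHB_OTHER`, and both helper functions are hypothetical stand-ins, and only the two `PNV_PHB_NPU_*` names come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Only the two NPU names come from the patch; PNV_PHB_OTHER is a
 * hypothetical placeholder for every non-NPU PHB type. */
enum phb_type { PNV_PHB_OTHER, PNV_PHB_NPU_NVLINK, PNV_PHB_NPU_OCAPI };

struct phb {
	enum phb_type type;
};

/* Mirrors the patched condition: both NPU flavours are skipped. */
static int phb_skips_iommu_group_setup(const struct phb *phb)
{
	return phb->type == PNV_PHB_NPU_NVLINK ||
	       phb->type == PNV_PHB_NPU_OCAPI;
}

/* Counts the PHBs that would go on to the per-PE setup loop. */
static size_t count_phbs_needing_setup(const struct phb *phbs, size_t n)
{
	size_t i, count = 0;

	assert(phbs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (phb_skips_iommu_group_setup(&phbs[i]))
			continue;
		count++;
	}
	return count;
}
```

Before the patch only the NVLink case hit `continue`, so an opencapi PHB fell through to the per-PE loop that leads to the oops.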
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
2019-01-09 16:45 ` Frederic Barrat
@ 2019-01-09 16:56 ` Greg Kurz
-1 siblings, 0 replies; 19+ messages in thread
From: Greg Kurz @ 2019-01-09 16:56 UTC (permalink / raw)
To: Frederic Barrat; +Cc: aik, linuxppc-dev, stable, andrew.donnellan
On Wed, 9 Jan 2019 17:45:53 +0100
Frederic Barrat <fbarrat@linux.ibm.com> wrote:
> On 09/01/2019 at 17:25, Greg Kurz wrote:
> > On Wed, 9 Jan 2019 16:13:42 +0100
> > Frederic Barrat <fbarrat@linux.ibm.com> wrote:
> >
> >> With a recent change around IOMMU group, a system with an opencapi
> >> adapter is no longer booting and we get a kernel oops:
> >>
> >> BUG: Kernel NULL pointer dereference at 0x00000028
> >> Faulting instruction address: 0xc0000000000aa38c
> >> Oops: Kernel access of bad area, sig: 7 [#1]
> >> LE SMP NR_CPUS=2048 NUMA PowerNV
> >> Modules linked in:
> >> CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-00001-g3bd6e94bec12
> >> NIP: c0000000000aa38c LR: c0000000000a6608 CTR: c000000000097480
> >> REGS: c000000005783700 TRAP: 0300 Not tainted (5.0.0-rc1-fxb-00001-g3bd6
> >> MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 28000228 XER: 20
> >> CFAR: c0000000000a6604 DAR: 0000000000000028 DSISR: 00080000 IRQMASK: 0
> >> GPR00: c0000000000a6608 c000000005783990 c000000001036100 c0000007bf761860
> >> GPR04: 0000000000000000 c000000005783834 0000000000000000 0000000000000000
> >> GPR08: 69626d2c6e707500 0000000000000000 0000000000000000 9000000002001003
> >> GPR12: 0000000000000000 c0000007bfff8300 c000000000010450 0000000000000000
> >> GPR16: c000000000ced938 0000000000000100 c000000000ced948 00000000000a0000
> >> GPR20: 00000000000bfffe c000000000ced9a8 0000000000000200 c000000000ced978
> >> GPR24: 00000000006080c0 c000000716d09828 c00000002e6fd000 0000000000000000
> >> GPR28: c0000007bf4aff68 c0000007bf8d0080 c000000000f23938 c0000007bf761860
> >> NIP [c0000000000aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
> >> LR [c0000000000a6608] pnv_pci_ioda_fixup+0x1f8/0x660
> >> Call Trace:
> >> [c000000005783990] [c0000000000aa3d0] pnv_try_setup_npu_table_group+0x60/0x
> >> [c0000000057839d0] [c0000000000a661c] pnv_pci_ioda_fixup+0x20c/0x660
> >> [c000000005783ab0] [c000000000e1d4c0] pcibios_resource_survey+0x2c8/0x31c
> >> [c000000005783b90] [c000000000e1caf4] pcibios_init+0xb0/0xe4
> >> [c000000005783c10] [c000000000010054] do_one_initcall+0x64/0x264
> >> [c000000005783ce0] [c000000000e1132c] kernel_init_freeable+0x36c/0x468
> >> [c000000005783db0] [c000000000010474] kernel_init+0x2c/0x148
> >> [c000000005783e20] [c00000000000b794] ret_from_kernel_thread+0x5c/0x68
> >>
> >> An opencapi device is using a device PE, so the current code breaks
> >> because pe->pbus is not defined.
> >>
> >> More generally, there's no need to define an IOMMU group for opencapi,
> >> as the device sends real addresses directly (admittedly, the
> >> virtualization story is yet to be written). So let's fix it by
> >
> > Current plan is to go for mediated VFIO. The real HW stays under the control
> > of the host ocxl driver, and we still don't need an IOMMU group.
> >
> >> skipping the IOMMU group setup for opencapi PHBs.
> >>
> >> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> >> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> >> ---
> >
> > Reviewed-by: Greg Kurz <groug@kaod.org>
> >
> > and
> >
> > Cc: stable@vger.kernel.org # v4.20
>
> Thanks for the review! But why did you add stable? that problem is only
> seen on 5.0-rc1, isn't it?
>
Based on the fact that 0bd971676e68 was committed in 4.20... but I haven't
tested :)
> Fred
>
>
> >> arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
> >> 1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> >> index 1d6406a051f1..7db3119f8a5b 100644
> >> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >> @@ -2681,7 +2681,8 @@ static void pnv_pci_ioda_setup_iommu_api(void)
> >>  	list_for_each_entry(hose, &hose_list, list_node) {
> >>  		phb = hose->private_data;
> >>
> >> -		if (phb->type == PNV_PHB_NPU_NVLINK)
> >> +		if (phb->type == PNV_PHB_NPU_NVLINK ||
> >> +		    phb->type == PNV_PHB_NPU_OCAPI)
> >>  			continue;
> >>
> >>  		list_for_each_entry(pe, &phb->ioda.pe_list, list) {
> >
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
2019-01-09 16:56 ` Greg Kurz
@ 2019-01-10 12:25 ` Michael Ellerman
-1 siblings, 0 replies; 19+ messages in thread
From: Michael Ellerman @ 2019-01-10 12:25 UTC (permalink / raw)
To: Greg Kurz, Frederic Barrat; +Cc: aik, linuxppc-dev, andrew.donnellan, stable
Greg Kurz <groug@kaod.org> writes:
> On Wed, 9 Jan 2019 17:45:53 +0100
> Frederic Barrat <fbarrat@linux.ibm.com> wrote:
>
>> On 09/01/2019 at 17:25, Greg Kurz wrote:
>> > On Wed, 9 Jan 2019 16:13:42 +0100
>> > Frederic Barrat <fbarrat@linux.ibm.com> wrote:
>> >
>> >> With a recent change around IOMMU group, a system with an opencapi
>> >> adapter is no longer booting and we get a kernel oops:
>> >>
>> >> BUG: Kernel NULL pointer dereference at 0x00000028
>> >> Faulting instruction address: 0xc0000000000aa38c
>> >> Oops: Kernel access of bad area, sig: 7 [#1]
>> >> LE SMP NR_CPUS=2048 NUMA PowerNV
>> >> Modules linked in:
>> >> CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-00001-g3bd6e94bec12
>> >> NIP: c0000000000aa38c LR: c0000000000a6608 CTR: c000000000097480
>> >> REGS: c000000005783700 TRAP: 0300 Not tainted (5.0.0-rc1-fxb-00001-g3bd6
>> >> MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 28000228 XER: 20
>> >> CFAR: c0000000000a6604 DAR: 0000000000000028 DSISR: 00080000 IRQMASK: 0
>> >> GPR00: c0000000000a6608 c000000005783990 c000000001036100 c0000007bf761860
>> >> GPR04: 0000000000000000 c000000005783834 0000000000000000 0000000000000000
>> >> GPR08: 69626d2c6e707500 0000000000000000 0000000000000000 9000000002001003
>> >> GPR12: 0000000000000000 c0000007bfff8300 c000000000010450 0000000000000000
>> >> GPR16: c000000000ced938 0000000000000100 c000000000ced948 00000000000a0000
>> >> GPR20: 00000000000bfffe c000000000ced9a8 0000000000000200 c000000000ced978
>> >> GPR24: 00000000006080c0 c000000716d09828 c00000002e6fd000 0000000000000000
>> >> GPR28: c0000007bf4aff68 c0000007bf8d0080 c000000000f23938 c0000007bf761860
>> >> NIP [c0000000000aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
>> >> LR [c0000000000a6608] pnv_pci_ioda_fixup+0x1f8/0x660
>> >> Call Trace:
>> >> [c000000005783990] [c0000000000aa3d0] pnv_try_setup_npu_table_group+0x60/0x
>> >> [c0000000057839d0] [c0000000000a661c] pnv_pci_ioda_fixup+0x20c/0x660
>> >> [c000000005783ab0] [c000000000e1d4c0] pcibios_resource_survey+0x2c8/0x31c
>> >> [c000000005783b90] [c000000000e1caf4] pcibios_init+0xb0/0xe4
>> >> [c000000005783c10] [c000000000010054] do_one_initcall+0x64/0x264
>> >> [c000000005783ce0] [c000000000e1132c] kernel_init_freeable+0x36c/0x468
>> >> [c000000005783db0] [c000000000010474] kernel_init+0x2c/0x148
>> >> [c000000005783e20] [c00000000000b794] ret_from_kernel_thread+0x5c/0x68
>> >>
>> >> An opencapi device is using a device PE, so the current code breaks
>> >> because pe->pbus is not defined.
>> >>
>> >> More generally, there's no need to define an IOMMU group for opencapi,
>> >> as the device sends real addresses directly (admittedly, the
>> >> virtualization story is yet to be written). So let's fix it by
>> >
>> > Current plan is to go for mediated VFIO. The real HW stays under the control
>> > of the host ocxl driver, and we still don't need an IOMMU group.
>> >
>> >> skipping the IOMMU group setup for opencapi PHBs.
>> >>
>> >> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
>> >> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
>> >> ---
>> >
>> > Reviewed-by: Greg Kurz <groug@kaod.org>
>> >
>> > and
>> >
>> > Cc: stable@vger.kernel.org # v4.20
>>
>> Thanks for the review! But why did you add stable? that problem is only
>> seen on 5.0-rc1, isn't it?
>
> Based on the fact that 0bd971676e68 was committed in 4.20... but I haven't
> tested :)
It was committed to a branch based off 4.20-rc2, but it wasn't merged
into the 4.20 release.
$ git describe --match "v[0-9]*" --contains 0bd971676e68
v5.0-rc1~137^2~15
So it doesn't need to go to stable.
cheers
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
2019-01-10 12:25 ` Michael Ellerman
@ 2019-01-10 12:31 ` Greg Kurz
-1 siblings, 0 replies; 19+ messages in thread
From: Greg Kurz @ 2019-01-10 12:31 UTC (permalink / raw)
To: Michael Ellerman
Cc: Frederic Barrat, aik, linuxppc-dev, andrew.donnellan, stable
On Thu, 10 Jan 2019 23:25:11 +1100
Michael Ellerman <mpe@ellerman.id.au> wrote:
> Greg Kurz <groug@kaod.org> writes:
> > On Wed, 9 Jan 2019 17:45:53 +0100
> > Frederic Barrat <fbarrat@linux.ibm.com> wrote:
> >
> >> On 09/01/2019 at 17:25, Greg Kurz wrote:
> >> > On Wed, 9 Jan 2019 16:13:42 +0100
> >> > Frederic Barrat <fbarrat@linux.ibm.com> wrote:
> >> >
> >> >> With a recent change around IOMMU group, a system with an opencapi
> >> >> adapter is no longer booting and we get a kernel oops:
> >> >>
> >> >> BUG: Kernel NULL pointer dereference at 0x00000028
> >> >> Faulting instruction address: 0xc0000000000aa38c
> >> >> Oops: Kernel access of bad area, sig: 7 [#1]
> >> >> LE SMP NR_CPUS=2048 NUMA PowerNV
> >> >> Modules linked in:
> >> >> CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-00001-g3bd6e94bec12
> >> >> NIP: c0000000000aa38c LR: c0000000000a6608 CTR: c000000000097480
> >> >> REGS: c000000005783700 TRAP: 0300 Not tainted (5.0.0-rc1-fxb-00001-g3bd6
> >> >> MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 28000228 XER: 20
> >> >> CFAR: c0000000000a6604 DAR: 0000000000000028 DSISR: 00080000 IRQMASK: 0
> >> >> GPR00: c0000000000a6608 c000000005783990 c000000001036100 c0000007bf761860
> >> >> GPR04: 0000000000000000 c000000005783834 0000000000000000 0000000000000000
> >> >> GPR08: 69626d2c6e707500 0000000000000000 0000000000000000 9000000002001003
> >> >> GPR12: 0000000000000000 c0000007bfff8300 c000000000010450 0000000000000000
> >> >> GPR16: c000000000ced938 0000000000000100 c000000000ced948 00000000000a0000
> >> >> GPR20: 00000000000bfffe c000000000ced9a8 0000000000000200 c000000000ced978
> >> >> GPR24: 00000000006080c0 c000000716d09828 c00000002e6fd000 0000000000000000
> >> >> GPR28: c0000007bf4aff68 c0000007bf8d0080 c000000000f23938 c0000007bf761860
> >> >> NIP [c0000000000aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
> >> >> LR [c0000000000a6608] pnv_pci_ioda_fixup+0x1f8/0x660
> >> >> Call Trace:
> >> >> [c000000005783990] [c0000000000aa3d0] pnv_try_setup_npu_table_group+0x60/0x
> >> >> [c0000000057839d0] [c0000000000a661c] pnv_pci_ioda_fixup+0x20c/0x660
> >> >> [c000000005783ab0] [c000000000e1d4c0] pcibios_resource_survey+0x2c8/0x31c
> >> >> [c000000005783b90] [c000000000e1caf4] pcibios_init+0xb0/0xe4
> >> >> [c000000005783c10] [c000000000010054] do_one_initcall+0x64/0x264
> >> >> [c000000005783ce0] [c000000000e1132c] kernel_init_freeable+0x36c/0x468
> >> >> [c000000005783db0] [c000000000010474] kernel_init+0x2c/0x148
> >> >> [c000000005783e20] [c00000000000b794] ret_from_kernel_thread+0x5c/0x68
> >> >>
> >> >> An opencapi device is using a device PE, so the current code breaks
> >> >> because pe->pbus is not defined.
> >> >>
> >> >> More generally, there's no need to define an IOMMU group for opencapi,
> >> >> as the device sends real addresses directly (admittedly, the
> >> >> virtualization story is yet to be written). So let's fix it by
> >> >
> >> > Current plan is to go for mediated VFIO. The real HW stays under the control
> >> > of the host ocxl driver, and we still don't need an IOMMU group.
> >> >
> >> >> skipping the IOMMU group setup for opencapi PHBs.
> >> >>
> >> >> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> >> >> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> >> >> ---
> >> >
> >> > Reviewed-by: Greg Kurz <groug@kaod.org>
> >> >
> >> > and
> >> >
> >> > Cc: stable@vger.kernel.org # v4.20
> >>
> >> Thanks for the review! But why did you add stable? that problem is only
> >> seen on 5.0-rc1, isn't it?
> >
> > Based on the fact that 0bd971676e68 was committed in 4.20... but I haven't
> > tested :)
>
> It was committed to a branch based off 4.20-rc2, but it wasn't merged
> into the 4.20 release.
>
> $ git describe --match "v[0-9]*" --contains 0bd971676e68
> v5.0-rc1~137^2~15
>
> So it doesn't need to go to stable.
>
Yeah I realized that afterwards, sorry for the noise and Happy New Year :)
> cheers
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
2019-01-10 12:25 ` Michael Ellerman
@ 2019-01-10 12:58 ` Frederic Barrat
-1 siblings, 0 replies; 19+ messages in thread
From: Frederic Barrat @ 2019-01-10 12:58 UTC (permalink / raw)
To: Michael Ellerman, Greg Kurz, greg
Cc: aik, linuxppc-dev, andrew.donnellan, stable
On 10/01/2019 at 13:25, Michael Ellerman wrote:
> Greg Kurz <groug@kaod.org> writes:
>> On Wed, 9 Jan 2019 17:45:53 +0100
>> Frederic Barrat <fbarrat@linux.ibm.com> wrote:
>>
>>> On 09/01/2019 at 17:25, Greg Kurz wrote:
>>>> On Wed, 9 Jan 2019 16:13:42 +0100
>>>> Frederic Barrat <fbarrat@linux.ibm.com> wrote:
>>>>
>>>>> With a recent change around IOMMU group, a system with an opencapi
>>>>> adapter is no longer booting and we get a kernel oops:
>>>>>
>>>>> BUG: Kernel NULL pointer dereference at 0x00000028
>>>>> Faulting instruction address: 0xc0000000000aa38c
>>>>> Oops: Kernel access of bad area, sig: 7 [#1]
>>>>> LE SMP NR_CPUS=2048 NUMA PowerNV
>>>>> Modules linked in:
>>>>> CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-00001-g3bd6e94bec12
>>>>> NIP: c0000000000aa38c LR: c0000000000a6608 CTR: c000000000097480
>>>>> REGS: c000000005783700 TRAP: 0300 Not tainted (5.0.0-rc1-fxb-00001-g3bd6
>>>>> MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 28000228 XER: 20
>>>>> CFAR: c0000000000a6604 DAR: 0000000000000028 DSISR: 00080000 IRQMASK: 0
>>>>> GPR00: c0000000000a6608 c000000005783990 c000000001036100 c0000007bf761860
>>>>> GPR04: 0000000000000000 c000000005783834 0000000000000000 0000000000000000
>>>>> GPR08: 69626d2c6e707500 0000000000000000 0000000000000000 9000000002001003
>>>>> GPR12: 0000000000000000 c0000007bfff8300 c000000000010450 0000000000000000
>>>>> GPR16: c000000000ced938 0000000000000100 c000000000ced948 00000000000a0000
>>>>> GPR20: 00000000000bfffe c000000000ced9a8 0000000000000200 c000000000ced978
>>>>> GPR24: 00000000006080c0 c000000716d09828 c00000002e6fd000 0000000000000000
>>>>> GPR28: c0000007bf4aff68 c0000007bf8d0080 c000000000f23938 c0000007bf761860
>>>>> NIP [c0000000000aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
>>>>> LR [c0000000000a6608] pnv_pci_ioda_fixup+0x1f8/0x660
>>>>> Call Trace:
>>>>> [c000000005783990] [c0000000000aa3d0] pnv_try_setup_npu_table_group+0x60/0x
>>>>> [c0000000057839d0] [c0000000000a661c] pnv_pci_ioda_fixup+0x20c/0x660
>>>>> [c000000005783ab0] [c000000000e1d4c0] pcibios_resource_survey+0x2c8/0x31c
>>>>> [c000000005783b90] [c000000000e1caf4] pcibios_init+0xb0/0xe4
>>>>> [c000000005783c10] [c000000000010054] do_one_initcall+0x64/0x264
>>>>> [c000000005783ce0] [c000000000e1132c] kernel_init_freeable+0x36c/0x468
>>>>> [c000000005783db0] [c000000000010474] kernel_init+0x2c/0x148
>>>>> [c000000005783e20] [c00000000000b794] ret_from_kernel_thread+0x5c/0x68
>>>>>
>>>>> An opencapi device is using a device PE, so the current code breaks
>>>>> because pe->pbus is not defined.
>>>>>
>>>>> More generally, there's no need to define an IOMMU group for opencapi,
>>>>> as the device sends real addresses directly (admittedly, the
>>>>> virtualization story is yet to be written). So let's fix it by
>>>>
>>>> Current plan is to go for mediated VFIO. The real HW stays under the control
>>>> of the host ocxl driver, and we still don't need an IOMMU group.
>>>>
>>>>> skipping the IOMMU group setup for opencapi PHBs.
>>>>>
>>>>> Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
>>>>> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
>>>>> ---
>>>>
>>>> Reviewed-by: Greg Kurz <groug@kaod.org>
>>>>
>>>> and
>>>>
>>>> Cc: stable@vger.kernel.org # v4.20
>>>
>>> Thanks for the review! But why did you add stable? that problem is only
>>> seen on 5.0-rc1, isn't it?
>>
>> Based on the fact that 0bd971676e68 was committed in 4.20... but I haven't
>> tested :)
>
> It was committed to a branch based off 4.20-rc2, but it wasn't merged
> into the 4.20 release.
>
> $ git describe --match "v[0-9]*" --contains 0bd971676e68
> v5.0-rc1~137^2~15
>
> So it doesn't need to go to stable.
Which makes me wonder if Greg (KH) was really talking about that
original patch and whether something worthwhile was dropped from stable
by mistake?
Fred
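
For anyone who wants to repeat the check quoted above in their own kernel tree, here is a minimal sketch. The two helper names (tag_of_describe, commit_in_release) are made up for illustration and are not part of git or of this patch; the only git commands relied on are `git describe --contains` (as used by Michael) and `git merge-base --is-ancestor`.

```shell
#!/bin/sh
# Hypothetical helper: reduce `git describe --contains` output such as
# "v5.0-rc1~137^2~15" to the bare tag name by dropping everything from
# the first '~' or '^' onwards.
tag_of_describe() {
    printf '%s\n' "${1%%[~^]*}"
}

# Hypothetical helper: exit status 0 if commit $1 is reachable from
# release tag $2, non-zero otherwise. Must be run inside a git tree
# that contains both objects.
commit_in_release() {
    git merge-base --is-ancestor "$1" "$2"
}
```

In a kernel tree, `commit_in_release 0bd971676e68 v4.20` would be expected to fail if, as Michael says, the commit was not merged into the 4.20 release, and `tag_of_describe "$(git describe --match "v[0-9]*" --contains 0bd971676e68)"` gives the first tag that contains it.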
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
2019-01-10 12:58 ` Frederic Barrat
@ 2019-01-10 13:31 ` Greg KH
-1 siblings, 0 replies; 19+ messages in thread
From: Greg KH @ 2019-01-10 13:31 UTC (permalink / raw)
To: Frederic Barrat; +Cc: aik, Greg Kurz, stable, andrew.donnellan, linuxppc-dev
On Thu, Jan 10, 2019 at 01:58:31PM +0100, Frederic Barrat wrote:
>
>
> On 10/01/2019 at 13:25, Michael Ellerman wrote:
> > Greg Kurz <groug@kaod.org> writes:
> > > On Wed, 9 Jan 2019 17:45:53 +0100
> > > Frederic Barrat <fbarrat@linux.ibm.com> wrote:
> > >
> > > > On 09/01/2019 at 17:25, Greg Kurz wrote:
> > > > > On Wed, 9 Jan 2019 16:13:42 +0100
> > > > > Frederic Barrat <fbarrat@linux.ibm.com> wrote:
> > > > > > With a recent change around IOMMU group, a system with an opencapi
> > > > > > adapter is no longer booting and we get a kernel oops:
> > > > > >
> > > > > > BUG: Kernel NULL pointer dereference at 0x00000028
> > > > > > Faulting instruction address: 0xc0000000000aa38c
> > > > > > Oops: Kernel access of bad area, sig: 7 [#1]
> > > > > > LE SMP NR_CPUS=2048 NUMA PowerNV
> > > > > > Modules linked in:
> > > > > > CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-00001-g3bd6e94bec12
> > > > > > NIP: c0000000000aa38c LR: c0000000000a6608 CTR: c000000000097480
> > > > > > REGS: c000000005783700 TRAP: 0300 Not tainted (5.0.0-rc1-fxb-00001-g3bd6
> > > > > > MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 28000228 XER: 20
> > > > > > CFAR: c0000000000a6604 DAR: 0000000000000028 DSISR: 00080000 IRQMASK: 0
> > > > > > GPR00: c0000000000a6608 c000000005783990 c000000001036100 c0000007bf761860
> > > > > > GPR04: 0000000000000000 c000000005783834 0000000000000000 0000000000000000
> > > > > > GPR08: 69626d2c6e707500 0000000000000000 0000000000000000 9000000002001003
> > > > > > GPR12: 0000000000000000 c0000007bfff8300 c000000000010450 0000000000000000
> > > > > > GPR16: c000000000ced938 0000000000000100 c000000000ced948 00000000000a0000
> > > > > > GPR20: 00000000000bfffe c000000000ced9a8 0000000000000200 c000000000ced978
> > > > > > GPR24: 00000000006080c0 c000000716d09828 c00000002e6fd000 0000000000000000
> > > > > > GPR28: c0000007bf4aff68 c0000007bf8d0080 c000000000f23938 c0000007bf761860
> > > > > > NIP [c0000000000aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
> > > > > > LR [c0000000000a6608] pnv_pci_ioda_fixup+0x1f8/0x660
> > > > > > Call Trace:
> > > > > > [c000000005783990] [c0000000000aa3d0] pnv_try_setup_npu_table_group+0x60/0x
> > > > > > [c0000000057839d0] [c0000000000a661c] pnv_pci_ioda_fixup+0x20c/0x660
> > > > > > [c000000005783ab0] [c000000000e1d4c0] pcibios_resource_survey+0x2c8/0x31c
> > > > > > [c000000005783b90] [c000000000e1caf4] pcibios_init+0xb0/0xe4
> > > > > > [c000000005783c10] [c000000000010054] do_one_initcall+0x64/0x264
> > > > > > [c000000005783ce0] [c000000000e1132c] kernel_init_freeable+0x36c/0x468
> > > > > > [c000000005783db0] [c000000000010474] kernel_init+0x2c/0x148
> > > > > > [c000000005783e20] [c00000000000b794] ret_from_kernel_thread+0x5c/0x68
> > > > > >
> > > > > > An opencapi device is using a device PE, so the current code breaks
> > > > > > because pe->pbus is not defined.
> > > > > >
> > > > > > More generally, there's no need to define an IOMMU group for opencapi,
> > > > > > as the device sends real addresses directly (admittedly, the
> > > > > > virtualization story is yet to be written). So let's fix it by
> > > > >
> > > > > Current plan is to go for mediated VFIO. The real HW stays under the control
> > > > > of the host ocxl driver, and we still don't need an IOMMU group.
> > > > > > skipping the IOMMU group setup for opencapi PHBs.
> > > > > >
> > > > > > Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> > > > > > Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> > > > > > ---
> > > > >
> > > > > Reviewed-by: Greg Kurz <groug@kaod.org>
> > > > >
> > > > > and
> > > > >
> > > > > Cc: stable@vger.kernel.org # v4.20
> > > >
> > > > Thanks for the review! But why did you add stable? that problem is only
> > > > seen on 5.0-rc1, isn't it?
> > >
> > > Based on the fact that 0bd971676e68 was committed in 4.20... but I haven't
> > > tested :)
> >
> > It was committed to a branch based off 4.20-rc2, but it wasn't merged
> > into the 4.20 release.
> >
> > $ git describe --match "v[0-9]*" --contains 0bd971676e68
> > v5.0-rc1~137^2~15
> >
> > So it doesn't need to go to stable.
>
> Which makes me wonder if Greg (KH) was really talking about that original
> patch and whether something worthwhile was dropped from stable by mistake?
Totally different thread, sorry for the noise, my fault...
greg k-h
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] powerpc/powernv/npu: Fix oops in pnv_try_setup_npu_table_group()
2019-01-09 16:45 ` Frederic Barrat
@ 2019-01-09 16:58 ` Greg KH
-1 siblings, 0 replies; 19+ messages in thread
From: Greg KH @ 2019-01-09 16:58 UTC (permalink / raw)
To: Frederic Barrat; +Cc: aik, stable, linuxppc-dev, Greg Kurz, andrew.donnellan
On Wed, Jan 09, 2019 at 05:45:53PM +0100, Frederic Barrat wrote:
>
>
> On 09/01/2019 at 17:25, Greg Kurz wrote:
> > On Wed, 9 Jan 2019 16:13:42 +0100
> > Frederic Barrat <fbarrat@linux.ibm.com> wrote:
> >
> > > With a recent change around IOMMU group, a system with an opencapi
> > > adapter is no longer booting and we get a kernel oops:
> > >
> > > BUG: Kernel NULL pointer dereference at 0x00000028
> > > Faulting instruction address: 0xc0000000000aa38c
> > > Oops: Kernel access of bad area, sig: 7 [#1]
> > > LE SMP NR_CPUS=2048 NUMA PowerNV
> > > Modules linked in:
> > > CPU: 5 PID: 1 Comm: swapper/4 Not tainted 5.0.0-rc1-fxb-00001-g3bd6e94bec12
> > > NIP: c0000000000aa38c LR: c0000000000a6608 CTR: c000000000097480
> > > REGS: c000000005783700 TRAP: 0300 Not tainted (5.0.0-rc1-fxb-00001-g3bd6
> > > MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 28000228 XER: 20
> > > CFAR: c0000000000a6604 DAR: 0000000000000028 DSISR: 00080000 IRQMASK: 0
> > > GPR00: c0000000000a6608 c000000005783990 c000000001036100 c0000007bf761860
> > > GPR04: 0000000000000000 c000000005783834 0000000000000000 0000000000000000
> > > GPR08: 69626d2c6e707500 0000000000000000 0000000000000000 9000000002001003
> > > GPR12: 0000000000000000 c0000007bfff8300 c000000000010450 0000000000000000
> > > GPR16: c000000000ced938 0000000000000100 c000000000ced948 00000000000a0000
> > > GPR20: 00000000000bfffe c000000000ced9a8 0000000000000200 c000000000ced978
> > > GPR24: 00000000006080c0 c000000716d09828 c00000002e6fd000 0000000000000000
> > > GPR28: c0000007bf4aff68 c0000007bf8d0080 c000000000f23938 c0000007bf761860
> > > NIP [c0000000000aa38c] pnv_try_setup_npu_table_group+0x1c/0x1a0
> > > LR [c0000000000a6608] pnv_pci_ioda_fixup+0x1f8/0x660
> > > Call Trace:
> > > [c000000005783990] [c0000000000aa3d0] pnv_try_setup_npu_table_group+0x60/0x
> > > [c0000000057839d0] [c0000000000a661c] pnv_pci_ioda_fixup+0x20c/0x660
> > > [c000000005783ab0] [c000000000e1d4c0] pcibios_resource_survey+0x2c8/0x31c
> > > [c000000005783b90] [c000000000e1caf4] pcibios_init+0xb0/0xe4
> > > [c000000005783c10] [c000000000010054] do_one_initcall+0x64/0x264
> > > [c000000005783ce0] [c000000000e1132c] kernel_init_freeable+0x36c/0x468
> > > [c000000005783db0] [c000000000010474] kernel_init+0x2c/0x148
> > > [c000000005783e20] [c00000000000b794] ret_from_kernel_thread+0x5c/0x68
> > >
> > > An opencapi device is using a device PE, so the current code breaks
> > > because pe->pbus is not defined.
> > >
> > > More generally, there's no need to define an IOMMU group for opencapi,
> > > as the device sends real addresses directly (admittedly, the
> > > virtualization story is yet to be written). So let's fix it by
> >
> > Current plan is to go for mediated VFIO. The real HW stays under the control
> > of the host ocxl driver, and we still don't need an IOMMU group.
> >
> > > skipping the IOMMU group setup for opencapi PHBs.
> > >
> > > Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups")
> > > Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
> > > ---
> >
> > Reviewed-by: Greg Kurz <groug@kaod.org>
> >
> > and
> >
> > Cc: stable@vger.kernel.org # v4.20
>
> Thanks for the review! But why did you add stable? that problem is only seen
> on 5.0-rc1, isn't it?
No, this is fixing a patch that got backported to stable.
Well, attempted to be backported, I dropped it because of the problem :)
thanks,
greg k-h
^ permalink raw reply [flat|nested] 19+ messages in thread