* [PATCH] iommu/arm-smmu-qcom: NULL pointer check for driver data @ 2023-08-16 22:55 Aravind Vijayakumar 2023-08-17 1:01 ` Rob Clark 0 siblings, 1 reply; 8+ messages in thread From: Aravind Vijayakumar @ 2023-08-16 22:55 UTC (permalink / raw) To: will, joro Cc: robin.murphy, dmitry.baryshkov, quic_bjorande, konrad.dybcio, quic_eberman, robdclark, quic_psodagud, quic_rvishwak, quic_saipraka, quic_molvera, marijn.suijten, mani, linux-arm-kernel, iommu, linux-arm-msm, Aravind Vijayakumar The driver_data is NULL when qcom_adreno_smmu_init_context() is called before the dev_set_drvdata() from the client driver and is resulting in kernel crash. So add a null pointer check to handle the scenario where the client driver for the GPU SMMU device would be setting the driver data after the smmu client device probe is done and not necessarily before that. The function qcom_adreno_smmu_init_context() assumes that the client driver always set the driver data using dev_set_drvdata() before the smmu client device probe, but this assumption is not always true. Signed-off-by: Aravind Vijayakumar <quic_aprasann@quicinc.com> --- drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c index c71afda79d64..5323f82264ca 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c @@ -231,6 +231,9 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain, */ priv = dev_get_drvdata(dev); + if (!priv) + return 0; + priv->cookie = smmu_domain; priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg; priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg; -- 2.40.1 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] iommu/arm-smmu-qcom: NULL pointer check for driver data 2023-08-16 22:55 [PATCH] iommu/arm-smmu-qcom: NULL pointer check for driver data Aravind Vijayakumar @ 2023-08-17 1:01 ` Rob Clark 2023-08-28 21:35 ` Aravind Vijayakumar 0 siblings, 1 reply; 8+ messages in thread From: Rob Clark @ 2023-08-17 1:01 UTC (permalink / raw) To: Aravind Vijayakumar Cc: will, joro, robin.murphy, dmitry.baryshkov, quic_bjorande, konrad.dybcio, quic_eberman, quic_psodagud, quic_rvishwak, quic_saipraka, quic_molvera, marijn.suijten, mani, linux-arm-kernel, iommu, linux-arm-msm On Wed, Aug 16, 2023 at 3:55 PM Aravind Vijayakumar <quic_aprasann@quicinc.com> wrote: > > The driver_data is NULL when qcom_adreno_smmu_init_context() > is called before the dev_set_drvdata() from the client driver > and is resulting in kernel crash. > > So add a null pointer check to handle the scenario > where the client driver for the GPU SMMU device would > be setting the driver data after the smmu client device > probe is done and not necessarily before that. The function > qcom_adreno_smmu_init_context() assumes that the client > driver always set the driver data using dev_set_drvdata() > before the smmu client device probe, but this assumption > is not always true. > > Signed-off-by: Aravind Vijayakumar <quic_aprasann@quicinc.com> > --- > drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c > index c71afda79d64..5323f82264ca 100644 > --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c > @@ -231,6 +231,9 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain, > */ > > priv = dev_get_drvdata(dev); > + if (!priv) > + return 0; could this -EPROBE_DEFER instead, or something like that? I think you patch as proposed would result in per-process gpu pgtables silently failing BR, -R > + > priv->cookie = smmu_domain; > priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg; > priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg; > -- > 2.40.1 > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] iommu/arm-smmu-qcom: NULL pointer check for driver data 2023-08-17 1:01 ` Rob Clark @ 2023-08-28 21:35 ` Aravind Vijayakumar 2023-08-29 14:30 ` Rob Clark 0 siblings, 1 reply; 8+ messages in thread From: Aravind Vijayakumar @ 2023-08-28 21:35 UTC (permalink / raw) To: Rob Clark Cc: will, joro, robin.murphy, dmitry.baryshkov, quic_bjorande, konrad.dybcio, quic_eberman, quic_psodagud, quic_rvishwak, quic_saipraka, quic_molvera, marijn.suijten, mani, linux-arm-kernel, iommu, linux-arm-msm On 8/16/2023 6:01 PM, Rob Clark wrote: > On Wed, Aug 16, 2023 at 3:55 PM Aravind Vijayakumar > <quic_aprasann@quicinc.com> wrote: >> The driver_data is NULL when qcom_adreno_smmu_init_context() >> is called before the dev_set_drvdata() from the client driver >> and is resulting in kernel crash. >> >> So add a null pointer check to handle the scenario >> where the client driver for the GPU SMMU device would >> be setting the driver data after the smmu client device >> probe is done and not necessarily before that. The function >> qcom_adreno_smmu_init_context() assumes that the client >> driver always set the driver data using dev_set_drvdata() >> before the smmu client device probe, but this assumption >> is not always true. >> >> Signed-off-by: Aravind Vijayakumar <quic_aprasann@quicinc.com> >> --- >> drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >> index c71afda79d64..5323f82264ca 100644 >> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >> @@ -231,6 +231,9 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain, >> */ >> >> priv = dev_get_drvdata(dev); >> + if (!priv) >> + return 0; > could this -EPROBE_DEFER instead, or something like that? I think you > patch as proposed would result in per-process gpu pgtables silently > failing > > BR, > -R Thanks for the review comments. Returning -EPROBE_DEFER wont work because the probe of the client driver (which sets the driver data) will never get triggered. However, the probe of the client driver succeeds if we return -ENODATA. would that be acceptable? Regards, Aravind >> + >> priv->cookie = smmu_domain; >> priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg; >> priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg; >> -- >> 2.40.1 >> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] iommu/arm-smmu-qcom: NULL pointer check for driver data 2023-08-28 21:35 ` Aravind Vijayakumar @ 2023-08-29 14:30 ` Rob Clark 2023-09-08 5:17 ` Aravind Vijayakumar 0 siblings, 1 reply; 8+ messages in thread From: Rob Clark @ 2023-08-29 14:30 UTC (permalink / raw) To: Aravind Vijayakumar Cc: will, joro, robin.murphy, dmitry.baryshkov, quic_bjorande, konrad.dybcio, quic_eberman, quic_psodagud, quic_rvishwak, quic_saipraka, quic_molvera, marijn.suijten, mani, linux-arm-kernel, iommu, linux-arm-msm On Mon, Aug 28, 2023 at 2:35 PM Aravind Vijayakumar <quic_aprasann@quicinc.com> wrote: > > > On 8/16/2023 6:01 PM, Rob Clark wrote: > > On Wed, Aug 16, 2023 at 3:55 PM Aravind Vijayakumar > > <quic_aprasann@quicinc.com> wrote: > >> The driver_data is NULL when qcom_adreno_smmu_init_context() > >> is called before the dev_set_drvdata() from the client driver > >> and is resulting in kernel crash. > >> > >> So add a null pointer check to handle the scenario > >> where the client driver for the GPU SMMU device would > >> be setting the driver data after the smmu client device > >> probe is done and not necessarily before that. The function > >> qcom_adreno_smmu_init_context() assumes that the client > >> driver always set the driver data using dev_set_drvdata() > >> before the smmu client device probe, but this assumption > >> is not always true. > >> > >> Signed-off-by: Aravind Vijayakumar <quic_aprasann@quicinc.com> > >> --- > >> drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 3 +++ > >> 1 file changed, 3 insertions(+) > >> > >> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c > >> index c71afda79d64..5323f82264ca 100644 > >> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c > >> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c > >> @@ -231,6 +231,9 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain, > >> */ > >> > >> priv = dev_get_drvdata(dev); > >> + if (!priv) > >> + return 0; > > could this -EPROBE_DEFER instead, or something like that? I think you > > patch as proposed would result in per-process gpu pgtables silently > > failing > > > > BR, > > -R > > Thanks for the review comments. Returning -EPROBE_DEFER wont work > because the probe of the client driver (which sets the driver data) will > never get triggered. However, the probe of the client driver succeeds if > we return -ENODATA. would that be acceptable? I _think_ so.. I need to page back in the sequence of how this works, but I do have some warn_on's in drm/msm to complain loudly if we don't get per-process pgtables. I'd be interested to see the callstack where you hit this issue. From what I remember the sequence should be: 1) before the client dev probes, arm-smmu probes and attaches the dma-api managed iommu_domain (which IIRC should be an identity domain, and is otherwise unused).. at this point drvdata is NULL 2) the drm/msm can probe 3) at some point later when GPU fw is avail the GPU is loaded, drvdata is set, and we start creating and attaching the iommu_domain's that are actually used (one for kernel context and one each for userspace processes using the GPU I guess maybe if you are hitting this case of NULL drvdata, then you aren't getting an identity context for the dma-api managed iommu_domain? BR, -R > Regards, > > Aravind > > >> + > >> priv->cookie = smmu_domain; > >> priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg; > >> priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg; > >> -- > >> 2.40.1 > >> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] iommu/arm-smmu-qcom: NULL pointer check for driver data 2023-08-29 14:30 ` Rob Clark @ 2023-09-08 5:17 ` Aravind Vijayakumar 2023-09-08 12:21 ` Robin Murphy 0 siblings, 1 reply; 8+ messages in thread From: Aravind Vijayakumar @ 2023-09-08 5:17 UTC (permalink / raw) To: Rob Clark Cc: will, joro, robin.murphy, dmitry.baryshkov, quic_bjorande, konrad.dybcio, quic_eberman, quic_psodagud, quic_rvishwak, quic_saipraka, quic_molvera, marijn.suijten, mani, linux-arm-kernel, iommu, linux-arm-msm On 8/29/2023 7:30 AM, Rob Clark wrote: > On Mon, Aug 28, 2023 at 2:35 PM Aravind Vijayakumar > <quic_aprasann@quicinc.com> wrote: >> >> On 8/16/2023 6:01 PM, Rob Clark wrote: >>> On Wed, Aug 16, 2023 at 3:55 PM Aravind Vijayakumar >>> <quic_aprasann@quicinc.com> wrote: >>>> The driver_data is NULL when qcom_adreno_smmu_init_context() >>>> is called before the dev_set_drvdata() from the client driver >>>> and is resulting in kernel crash. >>>> >>>> So add a null pointer check to handle the scenario >>>> where the client driver for the GPU SMMU device would >>>> be setting the driver data after the smmu client device >>>> probe is done and not necessarily before that. The function >>>> qcom_adreno_smmu_init_context() assumes that the client >>>> driver always set the driver data using dev_set_drvdata() >>>> before the smmu client device probe, but this assumption >>>> is not always true. >>>> >>>> Signed-off-by: Aravind Vijayakumar <quic_aprasann@quicinc.com> >>>> --- >>>> drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 3 +++ >>>> 1 file changed, 3 insertions(+) >>>> >>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>>> index c71afda79d64..5323f82264ca 100644 >>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>>> @@ -231,6 +231,9 @@ static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain, >>>> */ >>>> >>>> priv = dev_get_drvdata(dev); >>>> + if (!priv) >>>> + return 0; >>> could this -EPROBE_DEFER instead, or something like that? I think you >>> patch as proposed would result in per-process gpu pgtables silently >>> failing >>> >>> BR, >>> -R >> Thanks for the review comments. Returning -EPROBE_DEFER wont work >> because the probe of the client driver (which sets the driver data) will >> never get triggered. However, the probe of the client driver succeeds if >> we return -ENODATA. would that be acceptable? > I _think_ so.. I need to page back in the sequence of how this works, > but I do have some warn_on's in drm/msm to complain loudly if we don't > get per-process pgtables. I'd be interested to see the callstack > where you hit this issue. From what I remember the sequence should > be: > > 1) before the client dev probes, arm-smmu probes and attaches the > dma-api managed iommu_domain (which IIRC should be an identity domain, > and is otherwise unused).. at this point drvdata is NULL > 2) the drm/msm can probe > 3) at some point later when GPU fw is avail the GPU is loaded, drvdata > is set, and we start creating and attaching the iommu_domain's that > are actually used (one for kernel context and one each for userspace > processes using the GPU > > I guess maybe if you are hitting this case of NULL drvdata, then you > aren't getting an identity context for the dma-api managed > iommu_domain? > > BR, > -R > Yes, there are some warn_ons in io-pgtable.c, which have helped a lot during debugging. The following is the call stack when we are hitting the issue: qcom_adreno_smmu_init_context+0x28/0x100 arm_smmu_init_domain_context+0x1fc/0x4cc arm_smmu_attach_dev+0x7c/0x410 __iommu_attach_device+0x28/0x110 iommu_probe_device+0x98/0x144 of_iommu_configure+0x1f0/0x278 of_dma_configure_id+0x15c/0x320 platform_dma_configure+0x24/0x90 really_probe+0x138/0x39c __driver_probe_device+0x114/0x190 device_driver_attach+0x4c/0xac bind_store+0xb8/0x110 This is the call stack during platform_driver_register() , if there is no NULL check then the initial probe crashes, if there is NULL check, instead of crashing, the really_probe returns and we can call of_dma_configure again from the driver probe after setting the driver data. Please let me know if there is any concerns? Regards, Aravind >> Regards, >> >> Aravind >> >>>> + >>>> priv->cookie = smmu_domain; >>>> priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg; >>>> priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg; >>>> -- >>>> 2.40.1 >>>> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] iommu/arm-smmu-qcom: NULL pointer check for driver data 2023-09-08 5:17 ` Aravind Vijayakumar @ 2023-09-08 12:21 ` Robin Murphy 2023-09-15 0:20 ` Aravind Vijayakumar 0 siblings, 1 reply; 8+ messages in thread From: Robin Murphy @ 2023-09-08 12:21 UTC (permalink / raw) To: Aravind Vijayakumar, Rob Clark Cc: will, joro, dmitry.baryshkov, quic_bjorande, konrad.dybcio, quic_eberman, quic_psodagud, quic_rvishwak, quic_saipraka, quic_molvera, marijn.suijten, mani, linux-arm-kernel, iommu, linux-arm-msm On 2023-09-08 06:17, Aravind Vijayakumar wrote: > > On 8/29/2023 7:30 AM, Rob Clark wrote: >> On Mon, Aug 28, 2023 at 2:35 PM Aravind Vijayakumar >> <quic_aprasann@quicinc.com> wrote: >>> >>> On 8/16/2023 6:01 PM, Rob Clark wrote: >>>> On Wed, Aug 16, 2023 at 3:55 PM Aravind Vijayakumar >>>> <quic_aprasann@quicinc.com> wrote: >>>>> The driver_data is NULL when qcom_adreno_smmu_init_context() >>>>> is called before the dev_set_drvdata() from the client driver >>>>> and is resulting in kernel crash. >>>>> >>>>> So add a null pointer check to handle the scenario >>>>> where the client driver for the GPU SMMU device would >>>>> be setting the driver data after the smmu client device >>>>> probe is done and not necessarily before that. The function >>>>> qcom_adreno_smmu_init_context() assumes that the client >>>>> driver always set the driver data using dev_set_drvdata() >>>>> before the smmu client device probe, but this assumption >>>>> is not always true. >>>>> >>>>> Signed-off-by: Aravind Vijayakumar <quic_aprasann@quicinc.com> >>>>> --- >>>>> drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 3 +++ >>>>> 1 file changed, 3 insertions(+) >>>>> >>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>>>> b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>>>> index c71afda79d64..5323f82264ca 100644 >>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>>>> @@ -231,6 +231,9 @@ static int qcom_adreno_smmu_init_context(struct >>>>> arm_smmu_domain *smmu_domain, >>>>> */ >>>>> >>>>> priv = dev_get_drvdata(dev); >>>>> + if (!priv) >>>>> + return 0; >>>> could this -EPROBE_DEFER instead, or something like that? I think you >>>> patch as proposed would result in per-process gpu pgtables silently >>>> failing >>>> >>>> BR, >>>> -R >>> Thanks for the review comments. Returning -EPROBE_DEFER wont work >>> because the probe of the client driver (which sets the driver data) will >>> never get triggered. However, the probe of the client driver succeeds if >>> we return -ENODATA. would that be acceptable? >> I _think_ so.. I need to page back in the sequence of how this works, >> but I do have some warn_on's in drm/msm to complain loudly if we don't >> get per-process pgtables. I'd be interested to see the callstack >> where you hit this issue. From what I remember the sequence should >> be: >> >> 1) before the client dev probes, arm-smmu probes and attaches the >> dma-api managed iommu_domain (which IIRC should be an identity domain, >> and is otherwise unused).. at this point drvdata is NULL >> 2) the drm/msm can probe >> 3) at some point later when GPU fw is avail the GPU is loaded, drvdata >> is set, and we start creating and attaching the iommu_domain's that >> are actually used (one for kernel context and one each for userspace >> processes using the GPU >> >> I guess maybe if you are hitting this case of NULL drvdata, then you >> aren't getting an identity context for the dma-api managed >> iommu_domain? >> >> BR, >> -R >> > Yes, there are some warn_ons in io-pgtable.c, which have helped a lot > during debugging. The following is the call stack when we are hitting > the issue: > > qcom_adreno_smmu_init_context+0x28/0x100 > arm_smmu_init_domain_context+0x1fc/0x4cc > arm_smmu_attach_dev+0x7c/0x410 > __iommu_attach_device+0x28/0x110 > iommu_probe_device+0x98/0x144 > of_iommu_configure+0x1f0/0x278 > of_dma_configure_id+0x15c/0x320 > platform_dma_configure+0x24/0x90 > really_probe+0x138/0x39c > __driver_probe_device+0x114/0x190 > device_driver_attach+0x4c/0xac > bind_store+0xb8/0x110 OK, so it looks like you are indeed getting a non-identity default domain as Rob suspected. I guess that means qcom_smmu_client_of_match needs updating for this platform? (In which case, maybe a WARN() here to point in that direction might be handy as well?) Thanks, Robin. > > This is the call stack during platform_driver_register() , if there is > no NULL check then the initial probe crashes, if there is NULL check, > instead of crashing, the really_probe returns and we can call > of_dma_configure again from the driver probe after setting the driver > data. Please let me know if there is any concerns? > > Regards, > > Aravind > >>> Regards, >>> >>> Aravind >>> >>>>> + >>>>> priv->cookie = smmu_domain; >>>>> priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg; >>>>> priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg; >>>>> -- >>>>> 2.40.1 >>>>> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] iommu/arm-smmu-qcom: NULL pointer check for driver data 2023-09-08 12:21 ` Robin Murphy @ 2023-09-15 0:20 ` Aravind Vijayakumar 2023-09-15 10:08 ` Robin Murphy 0 siblings, 1 reply; 8+ messages in thread From: Aravind Vijayakumar @ 2023-09-15 0:20 UTC (permalink / raw) To: Robin Murphy, Rob Clark Cc: will, joro, dmitry.baryshkov, quic_bjorande, konrad.dybcio, quic_eberman, quic_psodagud, quic_rvishwak, quic_saipraka, quic_molvera, marijn.suijten, mani, linux-arm-kernel, iommu, linux-arm-msm On 9/8/2023 5:21 AM, Robin Murphy wrote: > On 2023-09-08 06:17, Aravind Vijayakumar wrote: >> >> On 8/29/2023 7:30 AM, Rob Clark wrote: >>> On Mon, Aug 28, 2023 at 2:35 PM Aravind Vijayakumar >>> <quic_aprasann@quicinc.com> wrote: >>>> >>>> On 8/16/2023 6:01 PM, Rob Clark wrote: >>>>> On Wed, Aug 16, 2023 at 3:55 PM Aravind Vijayakumar >>>>> <quic_aprasann@quicinc.com> wrote: >>>>>> The driver_data is NULL when qcom_adreno_smmu_init_context() >>>>>> is called before the dev_set_drvdata() from the client driver >>>>>> and is resulting in kernel crash. >>>>>> >>>>>> So add a null pointer check to handle the scenario >>>>>> where the client driver for the GPU SMMU device would >>>>>> be setting the driver data after the smmu client device >>>>>> probe is done and not necessarily before that. The function >>>>>> qcom_adreno_smmu_init_context() assumes that the client >>>>>> driver always set the driver data using dev_set_drvdata() >>>>>> before the smmu client device probe, but this assumption >>>>>> is not always true. >>>>>> >>>>>> Signed-off-by: Aravind Vijayakumar <quic_aprasann@quicinc.com> >>>>>> --- >>>>>> drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 3 +++ >>>>>> 1 file changed, 3 insertions(+) >>>>>> >>>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>>>>> b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>>>>> index c71afda79d64..5323f82264ca 100644 >>>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>>>>> @@ -231,6 +231,9 @@ static int >>>>>> qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain, >>>>>> */ >>>>>> >>>>>> priv = dev_get_drvdata(dev); >>>>>> + if (!priv) >>>>>> + return 0; >>>>> could this -EPROBE_DEFER instead, or something like that? I think you >>>>> patch as proposed would result in per-process gpu pgtables silently >>>>> failing >>>>> >>>>> BR, >>>>> -R >>>> Thanks for the review comments. Returning -EPROBE_DEFER wont work >>>> because the probe of the client driver (which sets the driver data) >>>> will >>>> never get triggered. However, the probe of the client driver >>>> succeeds if >>>> we return -ENODATA. would that be acceptable? >>> I _think_ so.. I need to page back in the sequence of how this works, >>> but I do have some warn_on's in drm/msm to complain loudly if we don't >>> get per-process pgtables. I'd be interested to see the callstack >>> where you hit this issue. From what I remember the sequence should >>> be: >>> >>> 1) before the client dev probes, arm-smmu probes and attaches the >>> dma-api managed iommu_domain (which IIRC should be an identity domain, >>> and is otherwise unused).. at this point drvdata is NULL >>> 2) the drm/msm can probe >>> 3) at some point later when GPU fw is avail the GPU is loaded, drvdata >>> is set, and we start creating and attaching the iommu_domain's that >>> are actually used (one for kernel context and one each for userspace >>> processes using the GPU >>> >>> I guess maybe if you are hitting this case of NULL drvdata, then you >>> aren't getting an identity context for the dma-api managed >>> iommu_domain? >>> >>> BR, >>> -R >>> >> Yes, there are some warn_ons in io-pgtable.c, which have helped a lot >> during debugging. The following is the call stack when we are hitting >> the issue: >> >> qcom_adreno_smmu_init_context+0x28/0x100 >> arm_smmu_init_domain_context+0x1fc/0x4cc >> arm_smmu_attach_dev+0x7c/0x410 >> __iommu_attach_device+0x28/0x110 >> iommu_probe_device+0x98/0x144 >> of_iommu_configure+0x1f0/0x278 >> of_dma_configure_id+0x15c/0x320 >> platform_dma_configure+0x24/0x90 >> really_probe+0x138/0x39c >> __driver_probe_device+0x114/0x190 >> device_driver_attach+0x4c/0xac >> bind_store+0xb8/0x110 > > OK, so it looks like you are indeed getting a non-identity default > domain as Rob suspected. I guess that means qcom_smmu_client_of_match > needs updating for this platform? (In which case, maybe a WARN() here > to point in that direction might be handy as well?) > > Thanks, > Robin. > Hi Rob and Robin, With the NULL check and return -ENODATA, we are getting the dma-api managed iommu domain, and reattach the device to iommu domain by calling "of_dma_configure" from the driver probe, but without the NULL check it will be a kernel crash. Also, with the NULL check we don't have to update the qcom_smmu_client_of_match and we can use the existing function - "qcom_adreno_smmu_init_context" itself. Regards, Aravind >> >> This is the call stack during platform_driver_register() , if there >> is no NULL check then the initial probe crashes, if there is NULL >> check, instead of crashing, the really_probe returns and we can call >> of_dma_configure again from the driver probe after setting the driver >> data. Please let me know if there is any concerns? >> >> Regards, >> >> Aravind >> >>>> Regards, >>>> >>>> Aravind >>>> >>>>>> + >>>>>> priv->cookie = smmu_domain; >>>>>> priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg; >>>>>> priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg; >>>>>> -- >>>>>> 2.40.1 >>>>>> ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] iommu/arm-smmu-qcom: NULL pointer check for driver data 2023-09-15 0:20 ` Aravind Vijayakumar @ 2023-09-15 10:08 ` Robin Murphy 0 siblings, 0 replies; 8+ messages in thread From: Robin Murphy @ 2023-09-15 10:08 UTC (permalink / raw) To: Aravind Vijayakumar, Rob Clark Cc: will, joro, dmitry.baryshkov, quic_bjorande, konrad.dybcio, quic_eberman, quic_psodagud, quic_rvishwak, quic_saipraka, quic_molvera, marijn.suijten, mani, linux-arm-kernel, iommu, linux-arm-msm On 2023-09-15 01:20, Aravind Vijayakumar wrote: > > On 9/8/2023 5:21 AM, Robin Murphy wrote: >> On 2023-09-08 06:17, Aravind Vijayakumar wrote: >>> >>> On 8/29/2023 7:30 AM, Rob Clark wrote: >>>> On Mon, Aug 28, 2023 at 2:35 PM Aravind Vijayakumar >>>> <quic_aprasann@quicinc.com> wrote: >>>>> >>>>> On 8/16/2023 6:01 PM, Rob Clark wrote: >>>>>> On Wed, Aug 16, 2023 at 3:55 PM Aravind Vijayakumar >>>>>> <quic_aprasann@quicinc.com> wrote: >>>>>>> The driver_data is NULL when qcom_adreno_smmu_init_context() >>>>>>> is called before the dev_set_drvdata() from the client driver >>>>>>> and is resulting in kernel crash. >>>>>>> >>>>>>> So add a null pointer check to handle the scenario >>>>>>> where the client driver for the GPU SMMU device would >>>>>>> be setting the driver data after the smmu client device >>>>>>> probe is done and not necessarily before that. The function >>>>>>> qcom_adreno_smmu_init_context() assumes that the client >>>>>>> driver always set the driver data using dev_set_drvdata() >>>>>>> before the smmu client device probe, but this assumption >>>>>>> is not always true. >>>>>>> >>>>>>> Signed-off-by: Aravind Vijayakumar <quic_aprasann@quicinc.com> >>>>>>> --- >>>>>>> drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 3 +++ >>>>>>> 1 file changed, 3 insertions(+) >>>>>>> >>>>>>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>>>>>> b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>>>>>> index c71afda79d64..5323f82264ca 100644 >>>>>>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>>>>>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>>>>>> @@ -231,6 +231,9 @@ static int >>>>>>> qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain, >>>>>>> */ >>>>>>> >>>>>>> priv = dev_get_drvdata(dev); >>>>>>> + if (!priv) >>>>>>> + return 0; >>>>>> could this -EPROBE_DEFER instead, or something like that? I think you >>>>>> patch as proposed would result in per-process gpu pgtables silently >>>>>> failing >>>>>> >>>>>> BR, >>>>>> -R >>>>> Thanks for the review comments. Returning -EPROBE_DEFER wont work >>>>> because the probe of the client driver (which sets the driver data) >>>>> will >>>>> never get triggered. However, the probe of the client driver >>>>> succeeds if >>>>> we return -ENODATA. would that be acceptable? >>>> I _think_ so.. I need to page back in the sequence of how this works, >>>> but I do have some warn_on's in drm/msm to complain loudly if we don't >>>> get per-process pgtables. I'd be interested to see the callstack >>>> where you hit this issue. From what I remember the sequence should >>>> be: >>>> >>>> 1) before the client dev probes, arm-smmu probes and attaches the >>>> dma-api managed iommu_domain (which IIRC should be an identity domain, >>>> and is otherwise unused).. at this point drvdata is NULL >>>> 2) the drm/msm can probe >>>> 3) at some point later when GPU fw is avail the GPU is loaded, drvdata >>>> is set, and we start creating and attaching the iommu_domain's that >>>> are actually used (one for kernel context and one each for userspace >>>> processes using the GPU >>>> >>>> I guess maybe if you are hitting this case of NULL drvdata, then you >>>> aren't getting an identity context for the dma-api managed >>>> iommu_domain? >>>> >>>> BR, >>>> -R >>>> >>> Yes, there are some warn_ons in io-pgtable.c, which have helped a lot >>> during debugging. The following is the call stack when we are hitting >>> the issue: >>> >>> qcom_adreno_smmu_init_context+0x28/0x100 >>> arm_smmu_init_domain_context+0x1fc/0x4cc >>> arm_smmu_attach_dev+0x7c/0x410 >>> __iommu_attach_device+0x28/0x110 >>> iommu_probe_device+0x98/0x144 >>> of_iommu_configure+0x1f0/0x278 >>> of_dma_configure_id+0x15c/0x320 >>> platform_dma_configure+0x24/0x90 >>> really_probe+0x138/0x39c >>> __driver_probe_device+0x114/0x190 >>> device_driver_attach+0x4c/0xac >>> bind_store+0xb8/0x110 >> >> OK, so it looks like you are indeed getting a non-identity default >> domain as Rob suspected. I guess that means qcom_smmu_client_of_match >> needs updating for this platform? (In which case, maybe a WARN() here >> to point in that direction might be handy as well?) >> >> Thanks, >> Robin. >> > Hi Rob and Robin, > > With the NULL check and return -ENODATA, we are getting the dma-api > managed iommu domain, and reattach the device to iommu domain by calling > "of_dma_configure" from the driver probe, but without the NULL check it > will be a kernel crash. Um, what? Drivers should absolutely not be calling of_dma_configure() on their own devices, that is a massively egregious abuse of the API. > Also, with the NULL check we don't have to update the > qcom_smmu_client_of_match and we can use the existing function - > "qcom_adreno_smmu_init_context" itself. So making a bunch of conceptually-wrong code changes "saves" making the correct 1-line data change for the mechanism to work as designed? That hardly sounds like a good justification to me :/ Thanks, Robin. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-09-15 10:09 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-08-16 22:55 [PATCH] iommu/arm-smmu-qcom: NULL pointer check for driver data Aravind Vijayakumar 2023-08-17 1:01 ` Rob Clark 2023-08-28 21:35 ` Aravind Vijayakumar 2023-08-29 14:30 ` Rob Clark 2023-09-08 5:17 ` Aravind Vijayakumar 2023-09-08 12:21 ` Robin Murphy 2023-09-15 0:20 ` Aravind Vijayakumar 2023-09-15 10:08 ` Robin Murphy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox