netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* NULL pointer dereference when removing xmm7360 PCI device
@ 2023-04-27 10:31 Samuel Wein PhD
  2023-04-27 21:08 ` Jakub Kicinski
  0 siblings, 1 reply; 6+ messages in thread
From: Samuel Wein PhD @ 2023-04-27 10:31 UTC (permalink / raw)
  To: netdev@vger.kernel.org

Hi Folks,
I've been trying to get the xmm7360 working with IOSM and the ModemManager. This has been what my highschool advisor would call a "learning process".
When trying `echo 1 > /sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/remove` I get a variety of errors. One of these is a kernel error
`2023-04-27T12:23:38.937223+02:00 Nase kernel: [  587.997430] BUG: kernel NULL pointer dereference, address: 0000000000000048
2023-04-27T12:23:38.937237+02:00 Nase kernel: [  587.997447] #PF: supervisor read access in kernel mode
2023-04-27T12:23:38.937238+02:00 Nase kernel: [  587.997455] #PF: error_code(0x0000) - not-present page
2023-04-27T12:23:38.937241+02:00 Nase kernel: [  587.997463] PGD 0 P4D 0 
2023-04-27T12:23:38.937242+02:00 Nase kernel: [  587.997476] Oops: 0000 [#1] PREEMPT SMP NOPTI
2023-04-27T12:23:38.937242+02:00 Nase kernel: [  587.997489] CPU: 1 PID: 4767 Comm: bash Not tainted 6.3.0-060300-generic #202304232030
...
`
the full log is available at https://gist.github.com/poshul/0c5ffbde6106a71adcbc132d828dbcd7

Steps to reproduce: Boot device with xmm7360 installed and in PCI mode, place into suspend. Resume, and start issuing reset/remove commands to the PCI interface (without properly unloading the IOSM module first).

I'm not sure how widely applicable this is but wanted to at least report it.

Sam

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: NULL pointer dereference when removing xmm7360 PCI device
  2023-04-27 10:31 NULL pointer dereference when removing xmm7360 PCI device Samuel Wein PhD
@ 2023-04-27 21:08 ` Jakub Kicinski
  2023-04-28  9:21   ` Kumar, M Chetan
  0 siblings, 1 reply; 6+ messages in thread
From: Jakub Kicinski @ 2023-04-27 21:08 UTC (permalink / raw)
  To: M Chetan Kumar; +Cc: Samuel Wein PhD, netdev@vger.kernel.org, Intel Corporation

On Thu, 27 Apr 2023 10:31:29 +0000 Samuel Wein PhD wrote:
> Hi Folks,
> I've been trying to get the xmm7360 working with IOSM and the ModemManager. This has been what my highschool advisor would call a "learning process".
> When trying `echo 1 > /sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/remove` I get a variety of errors. One of these is a kernel error
> `2023-04-27T12:23:38.937223+02:00 Nase kernel: [  587.997430] BUG: kernel NULL pointer dereference, address: 0000000000000048
> 2023-04-27T12:23:38.937237+02:00 Nase kernel: [  587.997447] #PF: supervisor read access in kernel mode
> 2023-04-27T12:23:38.937238+02:00 Nase kernel: [  587.997455] #PF: error_code(0x0000) - not-present page
> 2023-04-27T12:23:38.937241+02:00 Nase kernel: [  587.997463] PGD 0 P4D 0 
> 2023-04-27T12:23:38.937242+02:00 Nase kernel: [  587.997476] Oops: 0000 [#1] PREEMPT SMP NOPTI
> 2023-04-27T12:23:38.937242+02:00 Nase kernel: [  587.997489] CPU: 1 PID: 4767 Comm: bash Not tainted 6.3.0-060300-generic #202304232030
> ...
> `
> the full log is available at https://gist.github.com/poshul/0c5ffbde6106a71adcbc132d828dbcd7
> 
> Steps to reproduce: Boot device with xmm7360 installed and in PCI mode, place into suspend. Resume, and start issuing reset/remove commands to the PCI interface (without properly unloading the IOSM module first).
> 
> I'm not sure how widely applicable this is but wanted to at least report it.

Intel folks, PTAL.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: NULL pointer dereference when removing xmm7360 PCI device
  2023-04-27 21:08 ` Jakub Kicinski
@ 2023-04-28  9:21   ` Kumar, M Chetan
  2023-04-28  9:47     ` Samuel Wein PhD
  0 siblings, 1 reply; 6+ messages in thread
From: Kumar, M Chetan @ 2023-04-28  9:21 UTC (permalink / raw)
  To: Samuel Wein PhD; +Cc: Jakub Kicinski, netdev@vger.kernel.org, linuxwwan

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Friday, April 28, 2023 2:38 AM
> To: Kumar, M Chetan <m.chetan.kumar@intel.com>
> Cc: Samuel Wein PhD <sam@samwein.com>; netdev@vger.kernel.org;
> linuxwwan <linuxwwan@intel.com>
> Subject: Re: NULL pointer dereference when removing xmm7360 PCI device
> 
> On Thu, 27 Apr 2023 10:31:29 +0000 Samuel Wein PhD wrote:
> > Hi Folks,
> > I've been trying to get the xmm7360 working with IOSM and the
> ModemManager. This has been what my highschool advisor would call a
> "learning process".
> > When trying `echo 1 >
> > /sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/remove` I get a
> > variety of errors. One of these is a kernel error
> > `2023-04-27T12:23:38.937223+02:00 Nase kernel: [  587.997430] BUG:
> > kernel NULL pointer dereference, address: 0000000000000048
> > 2023-04-27T12:23:38.937237+02:00 Nase kernel: [  587.997447] #PF:
> > supervisor read access in kernel mode
> > 2023-04-27T12:23:38.937238+02:00 Nase kernel: [  587.997455] #PF:
> > error_code(0x0000) - not-present page
> > 2023-04-27T12:23:38.937241+02:00 Nase kernel: [  587.997463] PGD 0 P4D
> > 0
> > 2023-04-27T12:23:38.937242+02:00 Nase kernel: [  587.997476] Oops:
> > 0000 [#1] PREEMPT SMP NOPTI
> > 2023-04-27T12:23:38.937242+02:00 Nase kernel: [  587.997489] CPU: 1
> > PID: 4767 Comm: bash Not tainted 6.3.0-060300-generic #202304232030 ...
> > `
> > the full log is available at
> > https://gist.github.com/poshul/0c5ffbde6106a71adcbc132d828dbcd7
> >
> > Steps to reproduce: Boot device with xmm7360 installed and in PCI mode,
> place into suspend. Resume, and start issuing reset/remove commands to
> the PCI interface (without properly unloading the IOSM module first).
> >
> > I'm not sure how widely applicable this is but wanted to at least report it.
> 
> Intel folks, PTAL.

I tried reproducing the issue by following the steps you mentioned but so far could not
reproduce it. Could you please share the logs from boot-up and procedure you carried out in steps.

Once you boot-up the laptop, driver will be in working condition why device removal ?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: NULL pointer dereference when removing xmm7360 PCI device
  2023-04-28  9:21   ` Kumar, M Chetan
@ 2023-04-28  9:47     ` Samuel Wein PhD
  2023-04-28 14:24       ` Kumar, M Chetan
  0 siblings, 1 reply; 6+ messages in thread
From: Samuel Wein PhD @ 2023-04-28  9:47 UTC (permalink / raw)
  To: Kumar, M Chetan; +Cc: Jakub Kicinski, netdev@vger.kernel.org, linuxwwan

I've added the full set of logs to the gist. The sequence was:
boot PC
suspend PC
resume PC
repeat the following several (it varies) times until the error:
`sudo echo 1 > /sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/remove`
`echo 1 > /sys/devices/pci0000:00/0000:00:1c.0/rescan`

The XMM7360 card doesn't resume properly from suspend, with the AT devices not responding to any commands. I've been working off of some of Shane Parslow's code to try to get ModemManager control of the device and was basically trying a bunch of ways to try to reset the card, this one clearly didn't work, but it also clearly exposed some weirdness in how the kernel handled the attempt. The fact that there is not a proper MBIM interface for this card really makes managing it difficult.


------- Original Message -------
On Friday, April 28th, 2023 at 11:21 AM, Kumar, M Chetan <m.chetan.kumar@intel.com> wrote:


> 
> 
> > -----Original Message-----
> 
> > From: Jakub Kicinski kuba@kernel.org
> > Sent: Friday, April 28, 2023 2:38 AM
> > To: Kumar, M Chetan m.chetan.kumar@intel.com
> > Cc: Samuel Wein PhD sam@samwein.com; netdev@vger.kernel.org;
> > linuxwwan linuxwwan@intel.com
> > Subject: Re: NULL pointer dereference when removing xmm7360 PCI device
> > 
> > On Thu, 27 Apr 2023 10:31:29 +0000 Samuel Wein PhD wrote:
> > 
> > > Hi Folks,
> > > I've been trying to get the xmm7360 working with IOSM and the
> > > ModemManager. This has been what my highschool advisor would call a
> > > "learning process".
> > > When trying `echo 1 > /sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/remove` I get a
> > > variety of errors. One of these is a kernel error
> > > `2023-04-27T12:23:38.937223+02:00 Nase kernel: [ 587.997430] BUG: kernel NULL pointer dereference, address: 0000000000000048 2023-04-27T12:23:38.937237+02:00 Nase kernel: [ 587.997447] #PF: supervisor read access in kernel mode 2023-04-27T12:23:38.937238+02:00 Nase kernel: [ 587.997455] #PF: error_code(0x0000) - not-present page 2023-04-27T12:23:38.937241+02:00 Nase kernel: [ 587.997463] PGD 0 P4D 0 2023-04-27T12:23:38.937242+02:00 Nase kernel: [ 587.997476] Oops: 0000 [#1] PREEMPT SMP NOPTI 2023-04-27T12:23:38.937242+02:00 Nase kernel: [ 587.997489] CPU: 1 PID: 4767 Comm: bash Not tainted 6.3.0-060300-generic #202304232030 ...`
> > > the full log is available at
> > > https://gist.github.com/poshul/0c5ffbde6106a71adcbc132d828dbcd7
> > > 
> > > Steps to reproduce: Boot device with xmm7360 installed and in PCI mode,
> > > place into suspend. Resume, and start issuing reset/remove commands to
> > > the PCI interface (without properly unloading the IOSM module first).
> > > 
> > > I'm not sure how widely applicable this is but wanted to at least report it.
> > 
> > Intel folks, PTAL.
> 
> 
> I tried reproducing the issue by following the steps you mentioned but so far could not
> reproduce it. Could you please share the logs from boot-up and procedure you carried out in steps.
> 
> Once you boot-up the laptop, driver will be in working condition why device removal ?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: NULL pointer dereference when removing xmm7360 PCI device
  2023-04-28  9:47     ` Samuel Wein PhD
@ 2023-04-28 14:24       ` Kumar, M Chetan
  2023-05-01 16:00         ` Samuel Wein PhD
  0 siblings, 1 reply; 6+ messages in thread
From: Kumar, M Chetan @ 2023-04-28 14:24 UTC (permalink / raw)
  To: Samuel Wein PhD; +Cc: Jakub Kicinski, netdev@vger.kernel.org, linuxwwan

> -----Original Message-----
> From: Samuel Wein PhD <sam@samwein.com>
> Sent: Friday, April 28, 2023 3:18 PM
> To: Kumar, M Chetan <m.chetan.kumar@intel.com>
> Cc: Jakub Kicinski <kuba@kernel.org>; netdev@vger.kernel.org; linuxwwan
> <linuxwwan@intel.com>
> Subject: RE: NULL pointer dereference when removing xmm7360 PCI device
> 
> I've added the full set of logs to the gist. The sequence was:
> boot PC
> suspend PC
> resume PC
> repeat the following several (it varies) times until the error:
> `sudo echo 1 > /sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/remove`
> `echo 1 > /sys/devices/pci0000:00/0000:00:1c.0/rescan`
> 
> The XMM7360 card doesn't resume properly from suspend, with the AT
> devices not responding to any commands. I've been working off of some of
> Shane Parslow's code to try to get ModemManager control of the device and
> was basically trying a bunch of ways to try to reset the card, this one clearly
> didn't work, but it also clearly exposed some weirdness in how the kernel
> handled the attempt. The fact that there is not a proper MBIM interface for
> this card really makes managing it difficult.
> 
> 
> ------- Original Message -------
> On Friday, April 28th, 2023 at 11:21 AM, Kumar, M Chetan
> <m.chetan.kumar@intel.com> wrote:
> 
> 
> >
> >
> > > -----Original Message-----
> >
> > > From: Jakub Kicinski kuba@kernel.org
> > > Sent: Friday, April 28, 2023 2:38 AM
> > > To: Kumar, M Chetan m.chetan.kumar@intel.com
> > > Cc: Samuel Wein PhD sam@samwein.com; netdev@vger.kernel.org;
> > > linuxwwan linuxwwan@intel.com
> > > Subject: Re: NULL pointer dereference when removing xmm7360 PCI
> > > device
> > >
> > > On Thu, 27 Apr 2023 10:31:29 +0000 Samuel Wein PhD wrote:
> > >
> > > > Hi Folks,
> > > > I've been trying to get the xmm7360 working with IOSM and the
> > > > ModemManager. This has been what my highschool advisor would call
> > > > a "learning process".
> > > > When trying `echo 1 >
> > > > /sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/remove` I get a
> > > > variety of errors. One of these is a kernel error
> > > > `2023-04-27T12:23:38.937223+02:00 Nase kernel: [ 587.997430] BUG:
> > > > kernel NULL pointer dereference, address: 0000000000000048
> > > > 2023-04-27T12:23:38.937237+02:00 Nase kernel: [ 587.997447] #PF:
> > > > supervisor read access in kernel mode
> > > > 2023-04-27T12:23:38.937238+02:00 Nase kernel: [ 587.997455] #PF:
> > > > error_code(0x0000) - not-present page
> > > > 2023-04-27T12:23:38.937241+02:00 Nase kernel: [ 587.997463] PGD 0
> > > > P4D 0 2023-04-27T12:23:38.937242+02:00 Nase kernel: [ 587.997476]
> > > > Oops: 0000 [#1] PREEMPT SMP NOPTI 2023-04-
> 27T12:23:38.937242+02:00
> > > > Nase kernel: [ 587.997489] CPU: 1 PID: 4767 Comm: bash Not tainted
> > > > 6.3.0-060300-generic #202304232030 ...` the full log is available
> > > > at
> > > > https://gist.github.com/poshul/0c5ffbde6106a71adcbc132d828dbcd7
> > > >
> > > > Steps to reproduce: Boot device with xmm7360 installed and in PCI
> > > > mode, place into suspend. Resume, and start issuing reset/remove
> > > > commands to the PCI interface (without properly unloading the IOSM
> module first).
> > > >
> > > > I'm not sure how widely applicable this is but wanted to at least report
> it.
> > >
> > > Intel folks, PTAL.
> >
> >
> > I tried reproducing the issue by following the steps you mentioned but
> > so far could not reproduce it. Could you please share the logs from boot-up
> and procedure you carried out in steps.
> >
> > Once you boot-up the laptop, driver will be in working condition why
> device removal ?

There seems to be some issue at device side due to which communication is not
getting restored after suspend exit.

At some point device communication is broken and wwan is not initialized.
In remove path, accessing wwan to release resource is leading to NULL pointer
exception.

Error handling need to be corrected. Try the patch [1] it should fix.

Another thing for restoring the communication from such state, remove and rescan is
not enough you may have to follow below procedure for a graceful recover.

echo 1 > /sys/bus/pci/drivers/iosm/0000:01:00.0/reset 		-> replace BDF
echo 1 > /sys/bus/pci/drivers/iosm/0000:01:00.0/remove
echo 1 > /sys/bus/pci/rescan

[1]
diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem.c b/drivers/net/wwan/iosm/iosm_ipc_imem.c
index c066b0040a3f..0b4efd2571dd 100644
--- a/drivers/net/wwan/iosm/iosm_ipc_imem.c
+++ b/drivers/net/wwan/iosm/iosm_ipc_imem.c
@@ -565,24 +565,32 @@ static void ipc_imem_run_state_worker(struct work_struct *instance)
 	struct ipc_mux_config mux_cfg;
 	struct iosm_imem *ipc_imem;
 	u8 ctrl_chl_idx = 0;
+	int ret;
 
 	ipc_imem = container_of(instance, struct iosm_imem, run_state_worker);
 
 	if (ipc_imem->phase != IPC_P_RUN) {
 		dev_err(ipc_imem->dev,
 			"Modem link down. Exit run state worker.");
-		return;
+		goto err_cp_phase;
 	}
 
 	if (test_and_clear_bit(IOSM_DEVLINK_INIT, &ipc_imem->flag))
 		ipc_devlink_deinit(ipc_imem->ipc_devlink);
 
-	if (!ipc_imem_setup_cp_mux_cap_init(ipc_imem, &mux_cfg))
-		ipc_imem->mux = ipc_mux_init(&mux_cfg, ipc_imem);
+	ret = ipc_imem_setup_cp_mux_cap_init(ipc_imem, &mux_cfg);
+	if (ret < 0)
+		goto err_cap_init;
+
+	ipc_imem->mux = ipc_mux_init(&mux_cfg, ipc_imem);
+	if (!ipc_imem->mux)
+		goto err_mux_init;
+
+	ret = ipc_imem_wwan_channel_init(ipc_imem, mux_cfg.protocol);
+	if (ret < 0)
+		goto err_channel_init;
 
-	ipc_imem_wwan_channel_init(ipc_imem, mux_cfg.protocol);
-	if (ipc_imem->mux)
-		ipc_imem->mux->wwan = ipc_imem->wwan;
+	ipc_imem->mux->wwan = ipc_imem->wwan;
 
 	while (ctrl_chl_idx < IPC_MEM_MAX_CHANNELS) {
 		if (!ipc_chnl_cfg_get(&chnl_cfg_port, ctrl_chl_idx)) {
@@ -622,6 +630,15 @@ static void ipc_imem_run_state_worker(struct work_struct *instance)
 
 	/* Complete all memory stores after setting bit */
 	smp_mb__after_atomic();
+
+	return;
+
+err_channel_init:
+	ipc_mux_deinit(ipc_imem->mux);
+err_mux_init:
+err_cap_init:
+err_cp_phase:
+	ipc_uevent_send(ipc_imem->dev, UEVENT_CD_READY_LINK_DOWN);
 }
 
 static void ipc_imem_handle_irq(struct iosm_imem *ipc_imem, int irq)
diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c
index 66b90cc4c346..109cf8930488 100644
--- a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c
+++ b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c
@@ -77,8 +77,8 @@ int ipc_imem_sys_wwan_transmit(struct iosm_imem *ipc_imem,
 }
 
 /* Initialize wwan channel */
-void ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
-				enum ipc_mux_protocol mux_type)
+int ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
+			       enum ipc_mux_protocol mux_type)
 {
 	struct ipc_chnl_cfg chnl_cfg = { 0 };
 
@@ -87,7 +87,7 @@ void ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
 	/* If modem version is invalid (0xffffffff), do not initialize WWAN. */
 	if (ipc_imem->cp_version == -1) {
 		dev_err(ipc_imem->dev, "invalid CP version");
-		return;
+		return -EIO;
 	}
 
 	ipc_chnl_cfg_get(&chnl_cfg, ipc_imem->nr_of_channels);
@@ -104,9 +104,13 @@ void ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
 
 	/* WWAN registration. */
 	ipc_imem->wwan = ipc_wwan_init(ipc_imem, ipc_imem->dev);
-	if (!ipc_imem->wwan)
+	if (!ipc_imem->wwan) {
 		dev_err(ipc_imem->dev,
 			"failed to register the ipc_wwan interfaces");
+		return -ENOMEM;
+	}
+
+	return 0;
 }
 
 /* Map SKB to DMA for transfer */
diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.h b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.h
index f8afb217d9e2..026c5bd0f999 100644
--- a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.h
+++ b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.h
@@ -91,9 +91,11 @@ int ipc_imem_sys_wwan_transmit(struct iosm_imem *ipc_imem, int if_id,
  *				MUX.
  * @ipc_imem:		Pointer to iosm_imem struct.
  * @mux_type:		Type of mux protocol.
+ *
+ * Return: 0 on success and failure value on error
  */
-void ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
-				enum ipc_mux_protocol mux_type);
+int ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
+			       enum ipc_mux_protocol mux_type);
 
 /**
  * ipc_imem_sys_devlink_open - Open a Flash/CD Channel link to CP

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* RE: NULL pointer dereference when removing xmm7360 PCI device
  2023-04-28 14:24       ` Kumar, M Chetan
@ 2023-05-01 16:00         ` Samuel Wein PhD
  0 siblings, 0 replies; 6+ messages in thread
From: Samuel Wein PhD @ 2023-05-01 16:00 UTC (permalink / raw)
  To: Kumar, M Chetan; +Cc: Jakub Kicinski, netdev@vger.kernel.org, linuxwwan

This appears to do the trick. I'm still having issues bringing the device back after suspend (even with the proper reset->reload->rescan sequence), so I will let you know if I find any more kernel issues.

Thanks!

Sam 




------- Original Message -------
On Friday, April 28th, 2023 at 4:24 PM, Kumar, M Chetan <m.chetan.kumar@intel.com> wrote:


> 
> 
> > -----Original Message-----
> 
> > From: Samuel Wein PhD sam@samwein.com
> > Sent: Friday, April 28, 2023 3:18 PM
> > To: Kumar, M Chetan m.chetan.kumar@intel.com
> > Cc: Jakub Kicinski kuba@kernel.org; netdev@vger.kernel.org; linuxwwan
> > linuxwwan@intel.com
> > Subject: RE: NULL pointer dereference when removing xmm7360 PCI device
> > 
> > I've added the full set of logs to the gist. The sequence was:
> > boot PC
> > suspend PC
> > resume PC
> > repeat the following several (it varies) times until the error:
> > `sudo echo 1 > /sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/remove`
> > `echo 1 > /sys/devices/pci0000:00/0000:00:1c.0/rescan`
> > 
> > The XMM7360 card doesn't resume properly from suspend, with the AT
> > devices not responding to any commands. I've been working off of some of
> > Shane Parslow's code to try to get ModemManager control of the device and
> > was basically trying a bunch of ways to try to reset the card, this one clearly
> > didn't work, but it also clearly exposed some weirdness in how the kernel
> > handled the attempt. The fact that there is not a proper MBIM interface for
> > this card really makes managing it difficult.
> > 
> > ------- Original Message -------
> > On Friday, April 28th, 2023 at 11:21 AM, Kumar, M Chetan
> > m.chetan.kumar@intel.com wrote:
> > 
> > > > -----Original Message-----
> > > 
> > > > From: Jakub Kicinski kuba@kernel.org
> > > > Sent: Friday, April 28, 2023 2:38 AM
> > > > To: Kumar, M Chetan m.chetan.kumar@intel.com
> > > > Cc: Samuel Wein PhD sam@samwein.com; netdev@vger.kernel.org;
> > > > linuxwwan linuxwwan@intel.com
> > > > Subject: Re: NULL pointer dereference when removing xmm7360 PCI
> > > > device
> > > > 
> > > > On Thu, 27 Apr 2023 10:31:29 +0000 Samuel Wein PhD wrote:
> > > > 
> > > > > Hi Folks,
> > > > > I've been trying to get the xmm7360 working with IOSM and the
> > > > > ModemManager. This has been what my highschool advisor would call
> > > > > a "learning process".
> > > > > When trying `echo 1 > /sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/remove` I get a
> > > > > variety of errors. One of these is a kernel error
> > > > > `2023-04-27T12:23:38.937223+02:00 Nase kernel: [ 587.997430] BUG: kernel NULL pointer dereference, address: 0000000000000048 2023-04-27T12:23:38.937237+02:00 Nase kernel: [ 587.997447] #PF: supervisor read access in kernel mode 2023-04-27T12:23:38.937238+02:00 Nase kernel: [ 587.997455] #PF: error_code(0x0000) - not-present page 2023-04-27T12:23:38.937241+02:00 Nase kernel: [ 587.997463] PGD 0 P4D 0 2023-04-27T12:23:38.937242+02:00 Nase kernel: [ 587.997476] Oops: 0000 [#1] PREEMPT SMP NOPTI 2023-04- 27T12:23:38.937242+02:00 Nase kernel: [ 587.997489] CPU: 1 PID: 4767 Comm: bash Not tainted 6.3.0-060300-generic #202304232030 ...` the full log is available
> > > > > at
> > > > > https://gist.github.com/poshul/0c5ffbde6106a71adcbc132d828dbcd7
> > > > > 
> > > > > Steps to reproduce: Boot device with xmm7360 installed and in PCI
> > > > > mode, place into suspend. Resume, and start issuing reset/remove
> > > > > commands to the PCI interface (without properly unloading the IOSM
> > > > > module first).
> > > > > 
> > > > > I'm not sure how widely applicable this is but wanted to at least report
> > > > > it.
> > > > 
> > > > Intel folks, PTAL.
> > > 
> > > I tried reproducing the issue by following the steps you mentioned but
> > > so far could not reproduce it. Could you please share the logs from boot-up
> > > and procedure you carried out in steps.
> > > 
> > > Once you boot-up the laptop, driver will be in working condition why
> > > device removal ?
> 
> 
> There seems to be some issue at device side due to which communication is not
> getting restored after suspend exit.
> 
> At some point device communication is broken and wwan is not initialized.
> In remove path, accessing wwan to release resource is leading to NULL pointer
> exception.
> 
> Error handling need to be corrected. Try the patch [1] it should fix.
> 
> Another thing for restoring the communication from such state, remove and rescan is
> not enough you may have to follow below procedure for a graceful recover.
> 
> echo 1 > /sys/bus/pci/drivers/iosm/0000:01:00.0/reset -> replace BDF
> 
> echo 1 > /sys/bus/pci/drivers/iosm/0000:01:00.0/remove
> 
> echo 1 > /sys/bus/pci/rescan
> 
> 
> [1]
> diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem.c b/drivers/net/wwan/iosm/iosm_ipc_imem.c
> index c066b0040a3f..0b4efd2571dd 100644
> --- a/drivers/net/wwan/iosm/iosm_ipc_imem.c
> +++ b/drivers/net/wwan/iosm/iosm_ipc_imem.c
> @@ -565,24 +565,32 @@ static void ipc_imem_run_state_worker(struct work_struct *instance)
> struct ipc_mux_config mux_cfg;
> struct iosm_imem *ipc_imem;
> u8 ctrl_chl_idx = 0;
> + int ret;
> 
> ipc_imem = container_of(instance, struct iosm_imem, run_state_worker);
> 
> if (ipc_imem->phase != IPC_P_RUN) {
> 
> dev_err(ipc_imem->dev,
> 
> "Modem link down. Exit run state worker.");
> - return;
> + goto err_cp_phase;
> }
> 
> if (test_and_clear_bit(IOSM_DEVLINK_INIT, &ipc_imem->flag))
> 
> ipc_devlink_deinit(ipc_imem->ipc_devlink);
> 
> 
> - if (!ipc_imem_setup_cp_mux_cap_init(ipc_imem, &mux_cfg))
> - ipc_imem->mux = ipc_mux_init(&mux_cfg, ipc_imem);
> 
> + ret = ipc_imem_setup_cp_mux_cap_init(ipc_imem, &mux_cfg);
> + if (ret < 0)
> + goto err_cap_init;
> +
> + ipc_imem->mux = ipc_mux_init(&mux_cfg, ipc_imem);
> 
> + if (!ipc_imem->mux)
> 
> + goto err_mux_init;
> +
> + ret = ipc_imem_wwan_channel_init(ipc_imem, mux_cfg.protocol);
> + if (ret < 0)
> + goto err_channel_init;
> 
> - ipc_imem_wwan_channel_init(ipc_imem, mux_cfg.protocol);
> - if (ipc_imem->mux)
> 
> - ipc_imem->mux->wwan = ipc_imem->wwan;
> 
> + ipc_imem->mux->wwan = ipc_imem->wwan;
> 
> 
> while (ctrl_chl_idx < IPC_MEM_MAX_CHANNELS) {
> if (!ipc_chnl_cfg_get(&chnl_cfg_port, ctrl_chl_idx)) {
> @@ -622,6 +630,15 @@ static void ipc_imem_run_state_worker(struct work_struct instance)
> 
> / Complete all memory stores after setting bit */
> smp_mb__after_atomic();
> +
> + return;
> +
> +err_channel_init:
> + ipc_mux_deinit(ipc_imem->mux);
> 
> +err_mux_init:
> +err_cap_init:
> +err_cp_phase:
> + ipc_uevent_send(ipc_imem->dev, UEVENT_CD_READY_LINK_DOWN);
> 
> }
> 
> static void ipc_imem_handle_irq(struct iosm_imem *ipc_imem, int irq)
> diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c
> index 66b90cc4c346..109cf8930488 100644
> --- a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c
> +++ b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.c
> @@ -77,8 +77,8 @@ int ipc_imem_sys_wwan_transmit(struct iosm_imem ipc_imem,
> }
> 
> / Initialize wwan channel */
> -void ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
> - enum ipc_mux_protocol mux_type)
> +int ipc_imem_wwan_channel_init(struct iosm_imem *ipc_imem,
> + enum ipc_mux_protocol mux_type)
> {
> struct ipc_chnl_cfg chnl_cfg = { 0 };
> 
> @@ -87,7 +87,7 @@ void ipc_imem_wwan_channel_init(struct iosm_imem ipc_imem,
> / If modem version is invalid (0xffffffff), do not initialize WWAN. */
> if (ipc_imem->cp_version == -1) {
> 
> dev_err(ipc_imem->dev, "invalid CP version");
> 
> - return;
> + return -EIO;
> }
> 
> ipc_chnl_cfg_get(&chnl_cfg, ipc_imem->nr_of_channels);
> 
> @@ -104,9 +104,13 @@ void ipc_imem_wwan_channel_init(struct iosm_imem ipc_imem,
> 
> / WWAN registration. */
> ipc_imem->wwan = ipc_wwan_init(ipc_imem, ipc_imem->dev);
> 
> - if (!ipc_imem->wwan)
> 
> + if (!ipc_imem->wwan) {
> 
> dev_err(ipc_imem->dev,
> 
> "failed to register the ipc_wwan interfaces");
> + return -ENOMEM;
> + }
> +
> + return 0;
> }
> 
> /* Map SKB to DMA for transfer */
> diff --git a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.h b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.h
> index f8afb217d9e2..026c5bd0f999 100644
> --- a/drivers/net/wwan/iosm/iosm_ipc_imem_ops.h
> +++ b/drivers/net/wwan/iosm/iosm_ipc_imem_ops.h
> @@ -91,9 +91,11 @@ int ipc_imem_sys_wwan_transmit(struct iosm_imem *ipc_imem, int if_id,
> * MUX.
> * @ipc_imem: Pointer to iosm_imem struct.
> * @mux_type: Type of mux protocol.
> + *
> + * Return: 0 on success and failure value on error
> */
> -void ipc_imem_wwan_channel_init(struct iosm_imem ipc_imem,
> - enum ipc_mux_protocol mux_type);
> +int ipc_imem_wwan_channel_init(struct iosm_imem ipc_imem,
> + enum ipc_mux_protocol mux_type);
> 
> /
> * ipc_imem_sys_devlink_open - Open a Flash/CD Channel link to CP

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-05-01 16:00 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-27 10:31 NULL pointer dereference when removing xmm7360 PCI device Samuel Wein PhD
2023-04-27 21:08 ` Jakub Kicinski
2023-04-28  9:21   ` Kumar, M Chetan
2023-04-28  9:47     ` Samuel Wein PhD
2023-04-28 14:24       ` Kumar, M Chetan
2023-05-01 16:00         ` Samuel Wein PhD

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).