Re: [PATCH v2] usb: ucsi: fix connector partner ucsi work issue

Linux USB
 help / color / mirror / Atom feed

From: Jack Pham <quic_jackp@quicinc.com>
To: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: Linyu Yuan <quic_linyyuan@quicinc.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	<linux-usb@vger.kernel.org>,
	Wesley Cheng <quic_wcheng@quicinc.com>,
	Subbaraman Narayanamurthy <quic_subbaram@quicinc.com>,
	Fenglin Wu <quic_fenglinw@quicinc.com>
Subject: Re: [PATCH v2] usb: ucsi: fix connector partner ucsi work issue
Date: Thu, 5 Jan 2023 10:34:41 -0800	[thread overview]
Message-ID: <20230105183441.GD28337@jackp-linux.qualcomm.com> (raw)
In-Reply-To: <Y7a0C+DkI5Q6hq6O@kuha.fi.intel.com>

Hi Heikki,

On Thu, Jan 05, 2023 at 01:27:07PM +0200, Heikki Krogerus wrote:
> Hi,
> 
> On Thu, Jan 05, 2023 at 01:42:40PM +0800, Linyu Yuan wrote:
> > When a PPM client unregisters with UCSI framework, connector specific work
> > queue is destroyed. However, a pending delayed work queued before may
> > still exist. Once the delay timer expires and the work is scheduled,
> > this can cause a system crash as the workqueue is destroyed already.
> 
> When the workqueue is destroyed it's also flushed, no? So how could
> there be still something pending, delayed or not? Did you actually see
> this happening?

Yes, we encountered this during a scenario in which our PPM firmware 
is undergoing a reset which is handled by calling ucsi_unregister().
The connectors' workqueues are destroyed but apparently the
destroy_workqueue() does *not* seem to take care pending delayed items.

	Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
	...
	Call trace:
	 __queue_work+0x1f4/0x550
	 delayed_work_timer_fn+0x28/0x38
	 call_timer_fn+0x3c/0x238
	 expire_timers+0xcc/0x168
	 __run_timers+0x194/0x1f8
	 run_timer_softirq+0x2c/0x54
	 _stext+0xec/0x3a8
	 __irq_exit_rcu+0xa0/0xfc
	 irq_exit_rcu+0x18/0x28
	 el0_interrupt+0x5c/0x138
	 __el0_irq_handler_common+0x20/0x30
	 el0t_64_irq_handler+0x18/0x28
	 el0t_64_irq+0x1a0/0x1a4
	Code: eb16013f 54000300 aa1a03e0 9441be2a (f9400280) 

In particular this is happening for the ucsi_check_connection() which is
the currently the only work item using a non-zero delay.  If we look
closely at queue_delayed_work() we can see in that case it defers by
using a separate timer:

	static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
					struct delayed_work *dwork, unsigned long delay)
	{
		struct timer_list *timer = &dwork->timer;
		struct work_struct *work = &dwork->work;

		<...snip...>

		/*
		 * If @delay is 0, queue @dwork->work immediately.  This is for
		 * both optimization and correctness.  The earliest @timer can
		 * expire is on the closest next tick and delayed_work users depend
		 * on that there's no such delay when @delay is 0.
		 */
		if (!delay) {
			__queue_work(cpu, wq, &dwork->work);
			return;
		}

		dwork->wq = wq;
		       ^^^^^^^^ wq gets saved here, but might be
				destroyed before timer expires

		dwork->cpu = cpu;
		timer->expires = jiffies + delay;

		if (unlikely(cpu != WORK_CPU_UNBOUND))
			add_timer_on(timer, cpu);
		else
			add_timer(timer);
	}

In ucsi_unregister() we destroy the connector's wq, but there is a
pending timer still for the ucsi_check_connection() item and upon
expiry it tries to do the real __queue_work() call on a dangling 'wq'.

> > Fix this by moving all partner related delayed work to connector instance
> > and cancel all of them when ucsi_unregister() is called by PPM client.
> 
> I would love to be able to cancel these works, though not because of
> driver removal - I'm more concerned about disconnections. The reason
> why each task is a unique work is because it allows the driver to add
> the same task to the queue as many times as needed, and that was
> needed inorder to recover from some firmware issues. If there's only a
> dedicated work per task like in your proposal, we can only schedule
> the task once at a time, and that may lead into other issues.

I see, queuing a task multiple times is a good reason to allocate the
workers dynamically.  Then what we really need is a way to reliably
cancel & reclaim any pending items that are sitting on their own timers,
since they are otherwise unreachable via the 'wq'. 

cancel_delayed_work(), cancel_delayed_work_sync(), flush_delayed_work()
all require the handle to the delayed_work itself which we don't keep a
reference to.

Do you have any other suggestions for this?

Thanks,
Jack

next prev parent reply	other threads:[~2023-01-05 18:35 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-05  5:42 [PATCH v2] usb: ucsi: fix connector partner ucsi work issue Linyu Yuan
2023-01-05 11:27 ` Heikki Krogerus
2023-01-05 18:34   ` Jack Pham [this message]
2023-01-09 11:49     ` Heikki Krogerus
2023-01-09 20:46       ` Jack Pham

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230105183441.GD28337@jackp-linux.qualcomm.com \
    --to=quic_jackp@quicinc.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=heikki.krogerus@linux.intel.com \
    --cc=linux-usb@vger.kernel.org \
    --cc=quic_fenglinw@quicinc.com \
    --cc=quic_linyyuan@quicinc.com \
    --cc=quic_subbaram@quicinc.com \
    --cc=quic_wcheng@quicinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox