Re: [PATCH v2] usb: ucsi: fix connector partner ucsi work issue

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jack Pham <quic_jackp@quicinc.com>
To: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Cc: Linyu Yuan <quic_linyyuan@quicinc.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	<linux-usb@vger.kernel.org>,
	Wesley Cheng <quic_wcheng@quicinc.com>,
	Subbaraman Narayanamurthy <quic_subbaram@quicinc.com>,
	Fenglin Wu <quic_fenglinw@quicinc.com>
Subject: Re: [PATCH v2] usb: ucsi: fix connector partner ucsi work issue
Date: Thu, 5 Jan 2023 10:34:41 -0800	[thread overview]
Message-ID: <20230105183441.GD28337@jackp-linux.qualcomm.com> (raw)
In-Reply-To: <Y7a0C+DkI5Q6hq6O@kuha.fi.intel.com>

Hi Heikki,

On Thu, Jan 05, 2023 at 01:27:07PM +0200, Heikki Krogerus wrote:
> Hi,
> 
> On Thu, Jan 05, 2023 at 01:42:40PM +0800, Linyu Yuan wrote:
> > When a PPM client unregisters with UCSI framework, connector specific work
> > queue is destroyed. However, a pending delayed work queued before may
> > still exist. Once the delay timer expires and the work is scheduled,
> > this can cause a system crash as the workqueue is destroyed already.
> 
> When the workqueue is destroyed it's also flushed, no? So how could
> there be still something pending, delayed or not? Did you actually see
> this happening?

Yes, we encountered this during a scenario in which our PPM firmware 
is undergoing a reset which is handled by calling ucsi_unregister().
The connectors' workqueues are destroyed but apparently the
destroy_workqueue() does *not* seem to take care pending delayed items.

	Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
	...
	Call trace:
	 __queue_work+0x1f4/0x550
	 delayed_work_timer_fn+0x28/0x38
	 call_timer_fn+0x3c/0x238
	 expire_timers+0xcc/0x168
	 __run_timers+0x194/0x1f8
	 run_timer_softirq+0x2c/0x54
	 _stext+0xec/0x3a8
	 __irq_exit_rcu+0xa0/0xfc
	 irq_exit_rcu+0x18/0x28
	 el0_interrupt+0x5c/0x138
	 __el0_irq_handler_common+0x20/0x30
	 el0t_64_irq_handler+0x18/0x28
	 el0t_64_irq+0x1a0/0x1a4
	Code: eb16013f 54000300 aa1a03e0 9441be2a (f9400280) 

In particular this is happening for the ucsi_check_connection() which is
the currently the only work item using a non-zero delay.  If we look
closely at queue_delayed_work() we can see in that case it defers by
using a separate timer:

	static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
					struct delayed_work *dwork, unsigned long delay)
	{
		struct timer_list *timer = &dwork->timer;
		struct work_struct *work = &dwork->work;

		<...snip...>

		/*
		 * If @delay is 0, queue @dwork->work immediately.  This is for
		 * both optimization and correctness.  The earliest @timer can
		 * expire is on the closest next tick and delayed_work users depend
		 * on that there's no such delay when @delay is 0.
		 */
		if (!delay) {
			__queue_work(cpu, wq, &dwork->work);
			return;
		}

		dwork->wq = wq;
		       ^^^^^^^^ wq gets saved here, but might be
				destroyed before timer expires

		dwork->cpu = cpu;
		timer->expires = jiffies + delay;

		if (unlikely(cpu != WORK_CPU_UNBOUND))
			add_timer_on(timer, cpu);
		else
			add_timer(timer);
	}

In ucsi_unregister() we destroy the connector's wq, but there is a
pending timer still for the ucsi_check_connection() item and upon
expiry it tries to do the real __queue_work() call on a dangling 'wq'.

> > Fix this by moving all partner related delayed work to connector instance
> > and cancel all of them when ucsi_unregister() is called by PPM client.
> 
> I would love to be able to cancel these works, though not because of
> driver removal - I'm more concerned about disconnections. The reason
> why each task is a unique work is because it allows the driver to add
> the same task to the queue as many times as needed, and that was
> needed inorder to recover from some firmware issues. If there's only a
> dedicated work per task like in your proposal, we can only schedule
> the task once at a time, and that may lead into other issues.

I see, queuing a task multiple times is a good reason to allocate the
workers dynamically.  Then what we really need is a way to reliably
cancel & reclaim any pending items that are sitting on their own timers,
since they are otherwise unreachable via the 'wq'. 

cancel_delayed_work(), cancel_delayed_work_sync(), flush_delayed_work()
all require the handle to the delayed_work itself which we don't keep a
reference to.

Do you have any other suggestions for this?

Thanks,
Jack

next prev parent reply	other threads:[~2023-01-05 18:35 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-05  5:42 [PATCH v2] usb: ucsi: fix connector partner ucsi work issue Linyu Yuan
2023-01-05 11:27 ` Heikki Krogerus
2023-01-05 18:34   ` Jack Pham [this message]
2023-01-09 11:49     ` Heikki Krogerus
2023-01-09 20:46       ` Jack Pham

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230105183441.GD28337@jackp-linux.qualcomm.com \
    --to=quic_jackp@quicinc.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=heikki.krogerus@linux.intel.com \
    --cc=linux-usb@vger.kernel.org \
    --cc=quic_fenglinw@quicinc.com \
    --cc=quic_linyyuan@quicinc.com \
    --cc=quic_subbaram@quicinc.com \
    --cc=quic_wcheng@quicinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.