From: Heikki Krogerus <heikki.krogerus@linux.intel.com>
To: Jack Pham <quic_jackp@quicinc.com>
Cc: Linyu Yuan <quic_linyyuan@quicinc.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
linux-usb@vger.kernel.org, Wesley Cheng <quic_wcheng@quicinc.com>,
Subbaraman Narayanamurthy <quic_subbaram@quicinc.com>,
Fenglin Wu <quic_fenglinw@quicinc.com>
Subject: Re: [PATCH v2] usb: ucsi: fix connector partner ucsi work issue
Date: Mon, 9 Jan 2023 13:49:22 +0200 [thread overview]
Message-ID: <Y7v/Qk+dyDYyCGT4@kuha.fi.intel.com> (raw)
In-Reply-To: <20230105183441.GD28337@jackp-linux.qualcomm.com>
Hi,
On Thu, Jan 05, 2023 at 10:34:41AM -0800, Jack Pham wrote:
> On Thu, Jan 05, 2023 at 01:27:07PM +0200, Heikki Krogerus wrote:
> > On Thu, Jan 05, 2023 at 01:42:40PM +0800, Linyu Yuan wrote:
> > > When a PPM client unregisters with UCSI framework, connector specific work
> > > queue is destroyed. However, a pending delayed work queued before may
> > > still exist. Once the delay timer expires and the work is scheduled,
> > > this can cause a system crash as the workqueue is destroyed already.
> >
> > When the workqueue is destroyed it's also flushed, no? So how could
> > there be still something pending, delayed or not? Did you actually see
> > this happening?
>
> Yes, we encountered this during a scenario in which our PPM firmware
> is undergoing a reset which is handled by calling ucsi_unregister().
> The connectors' workqueues are destroyed but apparently the
> destroy_workqueue() does *not* seem to take care pending delayed items.
>
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> ...
> Call trace:
> __queue_work+0x1f4/0x550
> delayed_work_timer_fn+0x28/0x38
> call_timer_fn+0x3c/0x238
> expire_timers+0xcc/0x168
> __run_timers+0x194/0x1f8
> run_timer_softirq+0x2c/0x54
> _stext+0xec/0x3a8
> __irq_exit_rcu+0xa0/0xfc
> irq_exit_rcu+0x18/0x28
> el0_interrupt+0x5c/0x138
> __el0_irq_handler_common+0x20/0x30
> el0t_64_irq_handler+0x18/0x28
> el0t_64_irq+0x1a0/0x1a4
> Code: eb16013f 54000300 aa1a03e0 9441be2a (f9400280)
>
> In particular this is happening for the ucsi_check_connection() which is
> the currently the only work item using a non-zero delay. If we look
> closely at queue_delayed_work() we can see in that case it defers by
> using a separate timer:
>
> static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
> struct delayed_work *dwork, unsigned long delay)
> {
> struct timer_list *timer = &dwork->timer;
> struct work_struct *work = &dwork->work;
>
> <...snip...>
>
> /*
> * If @delay is 0, queue @dwork->work immediately. This is for
> * both optimization and correctness. The earliest @timer can
> * expire is on the closest next tick and delayed_work users depend
> * on that there's no such delay when @delay is 0.
> */
> if (!delay) {
> __queue_work(cpu, wq, &dwork->work);
> return;
> }
>
> dwork->wq = wq;
> ^^^^^^^^ wq gets saved here, but might be
> destroyed before timer expires
>
> dwork->cpu = cpu;
> timer->expires = jiffies + delay;
>
> if (unlikely(cpu != WORK_CPU_UNBOUND))
> add_timer_on(timer, cpu);
> else
> add_timer(timer);
> }
>
> In ucsi_unregister() we destroy the connector's wq, but there is a
> pending timer still for the ucsi_check_connection() item and upon
> expiry it tries to do the real __queue_work() call on a dangling 'wq'.
Okay.
> > > Fix this by moving all partner related delayed work to connector instance
> > > and cancel all of them when ucsi_unregister() is called by PPM client.
> >
> > I would love to be able to cancel these works, though not because of
> > driver removal - I'm more concerned about disconnections. The reason
> > why each task is a unique work is because it allows the driver to add
> > the same task to the queue as many times as needed, and that was
> > needed inorder to recover from some firmware issues. If there's only a
> > dedicated work per task like in your proposal, we can only schedule
> > the task once at a time, and that may lead into other issues.
>
> I see, queuing a task multiple times is a good reason to allocate the
> workers dynamically. Then what we really need is a way to reliably
> cancel & reclaim any pending items that are sitting on their own timers,
> since they are otherwise unreachable via the 'wq'.
>
> cancel_delayed_work(), cancel_delayed_work_sync(), flush_delayed_work()
> all require the handle to the delayed_work itself which we don't keep a
> reference to.
>
> Do you have any other suggestions for this?
How about separate list for the works?
diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c
index 8695eb2c6227e..d5cf7573a2cfa 100644
--- a/drivers/usb/typec/ucsi/ucsi.c
+++ b/drivers/usb/typec/ucsi/ucsi.c
@@ -187,6 +187,7 @@ EXPORT_SYMBOL_GPL(ucsi_send_command);
struct ucsi_work {
struct delayed_work work;
+ struct list_head node;
unsigned long delay;
unsigned int count;
struct ucsi_connector *con;
@@ -202,6 +203,7 @@ static void ucsi_poll_worker(struct work_struct *work)
mutex_lock(&con->lock);
if (!con->partner) {
+ list_del(&uwork->node);
mutex_unlock(&con->lock);
kfree(uwork);
return;
@@ -209,10 +211,12 @@ static void ucsi_poll_worker(struct work_struct *work)
ret = uwork->cb(con);
- if (uwork->count-- && (ret == -EBUSY || ret == -ETIMEDOUT))
+ if (uwork->count-- && (ret == -EBUSY || ret == -ETIMEDOUT)) {
queue_delayed_work(con->wq, &uwork->work, uwork->delay);
- else
+ } else {
+ list_del(&uwork->node);
kfree(uwork);
+ }
mutex_unlock(&con->lock);
}
@@ -236,6 +240,7 @@ static int ucsi_partner_task(struct ucsi_connector *con,
uwork->con = con;
uwork->cb = cb;
+ list_add_tail(&uwork->node, &con->works);
queue_delayed_work(con->wq, &uwork->work, delay);
return 0;
@@ -1056,6 +1061,7 @@ static int ucsi_register_port(struct ucsi *ucsi, int index)
INIT_WORK(&con->work, ucsi_handle_connector_change);
init_completion(&con->complete);
mutex_init(&con->lock);
+ INIT_LIST_HEAD(&con->works);
con->num = index + 1;
con->ucsi = ucsi;
diff --git a/drivers/usb/typec/ucsi/ucsi.h b/drivers/usb/typec/ucsi/ucsi.h
index 8361e1cfc8eba..dcb792eeedb94 100644
--- a/drivers/usb/typec/ucsi/ucsi.h
+++ b/drivers/usb/typec/ucsi/ucsi.h
@@ -325,6 +325,7 @@ struct ucsi_connector {
struct work_struct work;
struct completion complete;
struct workqueue_struct *wq;
+ struct list_head works;
struct typec_port *port;
struct typec_partner *partner;
Something like that. Then just walk through the list and cancel each
work in it before destroying the wq. Would that work?
thanks,
--
heikki
next prev parent reply other threads:[~2023-01-09 11:49 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-05 5:42 [PATCH v2] usb: ucsi: fix connector partner ucsi work issue Linyu Yuan
2023-01-05 11:27 ` Heikki Krogerus
2023-01-05 18:34 ` Jack Pham
2023-01-09 11:49 ` Heikki Krogerus [this message]
2023-01-09 20:46 ` Jack Pham
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y7v/Qk+dyDYyCGT4@kuha.fi.intel.com \
--to=heikki.krogerus@linux.intel.com \
--cc=gregkh@linuxfoundation.org \
--cc=linux-usb@vger.kernel.org \
--cc=quic_fenglinw@quicinc.com \
--cc=quic_jackp@quicinc.com \
--cc=quic_linyyuan@quicinc.com \
--cc=quic_subbaram@quicinc.com \
--cc=quic_wcheng@quicinc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox