From: Sultan Alsawaf <sultan@kerneltoast.com>
To: Stuart Hayes <stuart.w.hayes@gmail.com>
Cc: linux-kernel@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J . Wysocki" <rafael@kernel.org>,
	Martin Belanger <Martin.Belanger@dell.com>,
	Oliver O'Halloran <oohall@gmail.com>,
	Daniel Wagner <dwagner@suse.de>, Keith Busch <kbusch@kernel.org>,
	Lukas Wunner <lukas@wunner.de>,
	David Jeffery <djeffery@redhat.com>,
	Jeremy Allison <jallison@ciq.com>, Jens Axboe <axboe@fb.com>,
	Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>,
	linux-nvme@lists.infradead.org,
	Nathan Chancellor <nathan@kernel.org>,
	Jan Kiszka <jan.kiszka@siemens.com>,
	Bert Karwatzki <spasswolf@web.de>
Subject: Re: [PATCH v10 1/5] kernel/async: streamline cookie synchronization
Date: Tue, 8 Jul 2025 15:17:23 -0700
Message-ID: <aG2Y8795VSeT75hH@sultan-box>
In-Reply-To: <20250625201853.84062-2-stuart.w.hayes@gmail.com>

On Wed, Jun 25, 2025 at 03:18:49PM -0500, Stuart Hayes wrote:
> From: David Jeffery <djeffery@redhat.com>
> 
> To prevent a thundering herd effect, implement a custom wake function for
> the async subsystem which will only wake waiters which have all their
> dependencies completed.
> 
> The async subsystem currently wakes all waiters on async_done when an async
> task completes. When there are many tasks trying to synchronize on different
> async values, this can create a thundering herd problem when an async task
> wakes up all waiters, most of whom go back to waiting after causing
> lock contention and wasting CPU.
> 
> Signed-off-by: David Jeffery <djeffery@redhat.com>
> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> ---
>  kernel/async.c | 42 +++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/async.c b/kernel/async.c
> index 4c3e6a44595f..ae327f29bac9 100644
> --- a/kernel/async.c
> +++ b/kernel/async.c
> @@ -76,6 +76,12 @@ struct async_entry {
>  	struct async_domain	*domain;
>  };
>  
> +struct async_wait_entry {
> +	wait_queue_entry_t wait;
> +	async_cookie_t cookie;
> +	struct async_domain *domain;
> +};
> +
>  static DECLARE_WAIT_QUEUE_HEAD(async_done);
>  
>  static atomic_t entry_count;
> @@ -298,6 +304,24 @@ void async_synchronize_full_domain(struct async_domain *domain)
>  }
>  EXPORT_SYMBOL_GPL(async_synchronize_full_domain);
>  
> +/**
> + * async_domain_wake_function - wait function for cookie synchronization
> + *
> + * Custom wait function for async_synchronize_cookie_domain to check cookie
> + * value.  This prevents waking up waiting threads unnecessarily.
> + */
> +static int async_domain_wake_function(struct wait_queue_entry *wait,
> +				      unsigned int mode, int sync, void *key)
> +{
> +	struct async_wait_entry *await =
> +		container_of(wait, struct async_wait_entry, wait);
> +
> +	if (lowest_in_progress(await->domain) < await->cookie)
> +		return 0;
> +
> +	return autoremove_wake_function(wait, mode, sync, key);
> +}
> +
>  /**
>   * async_synchronize_cookie_domain - synchronize asynchronous function calls within a certain domain with cookie checkpointing
>   * @cookie: async_cookie_t to use as checkpoint
> @@ -310,11 +334,27 @@ EXPORT_SYMBOL_GPL(async_synchronize_full_domain);
>  void async_synchronize_cookie_domain(async_cookie_t cookie, struct async_domain *domain)
>  {
>  	ktime_t starttime;
> +	struct async_wait_entry await = {
> +		.cookie = cookie,
> +		.domain = domain,
> +		.wait = {
> +			.func = async_domain_wake_function,
> +			.private = current,
> +			.flags = 0,
> +			.entry = LIST_HEAD_INIT(await.wait.entry),
> +		}};
>  
>  	pr_debug("async_waiting @ %i\n", task_pid_nr(current));
>  	starttime = ktime_get();
>  
> -	wait_event(async_done, lowest_in_progress(domain) >= cookie);
> +	for (;;) {
> +		prepare_to_wait(&async_done, &await.wait, TASK_UNINTERRUPTIBLE);
> +
> +		if (lowest_in_progress(domain) >= cookie)

This line introduces a bug on PREEMPT_RT, where lowest_in_progress() may sleep
because it locks a non-raw spin lock (async_lock). If it does sleep, it'll
corrupt the current task's state: prepare_to_wait() has just set the state to
TASK_UNINTERRUPTIBLE, and it gets set back to TASK_RUNNING after the sleep is
over. IOW, the current task's state might be TASK_RUNNING after
lowest_in_progress() returns.
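
To make the sequence concrete, here is a minimal userspace sketch of the
problem as described above. It is a toy model, not kernel code: every toy_*
name is hypothetical, the state values are stand-ins, and the "sleeping lock"
is reduced to a flag that resets the task state the way a sleep and wakeup
would. It only shows how a sleep between prepare_to_wait() and schedule()
can leave the task marked as running.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for task states; not the kernel's real values. */
enum toy_state { TOY_RUNNING, TOY_UNINTERRUPTIBLE };

struct toy_task {
	enum toy_state state;
	int times_blocked;	/* how often toy_schedule() really blocked */
};

/* Models prepare_to_wait(): mark the task as about to sleep. */
static void toy_prepare_to_wait(struct toy_task *t)
{
	t->state = TOY_UNINTERRUPTIBLE;
}

/*
 * Models a condition check that takes a lock. If the lock is a sleeping
 * lock and the task had to sleep on it (assumption of this toy model),
 * the task comes back marked as running.
 */
static bool toy_check_condition(struct toy_task *t, bool lock_sleeps, bool done)
{
	if (lock_sleeps)
		t->state = TOY_RUNNING;	/* side effect of sleeping on the lock */
	return done;
}

/* Models schedule(): it only blocks if the task is not marked running. */
static void toy_schedule(struct toy_task *t)
{
	if (t->state != TOY_RUNNING) {
		t->times_blocked++;
		t->state = TOY_RUNNING;	/* a later wakeup resumes the task */
	}
}

/*
 * One pass of the wait loop with the condition still false.
 * Returns true if toy_schedule() actually blocked.
 */
static bool toy_wait_iteration_blocks(bool lock_sleeps)
{
	struct toy_task t = { .state = TOY_RUNNING, .times_blocked = 0 };

	toy_prepare_to_wait(&t);
	assert(t.state == TOY_UNINTERRUPTIBLE);

	if (toy_check_condition(&t, lock_sleeps, false))
		return false;	/* condition met, no need to block */

	toy_schedule(&t);
	return t.times_blocked == 1;
}
```

In this model, toy_wait_iteration_blocks(false) reports that the task blocked
as intended, while toy_wait_iteration_blocks(true) reports that toy_schedule()
returned without blocking because the state had already been reset.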

> +			break;
> +		schedule();
> +	}
> +	finish_wait(&async_done, &await.wait);
>  
>  	pr_debug("async_continuing @ %i after %lli usec\n", task_pid_nr(current),
>  		 microseconds_since(starttime));
> -- 
> 2.39.3
> 

Sultan

