linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Nam Cao <namcao@linutronix.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	John Ogness <john.ogness@linutronix.de>,
	Clark Williams <clrkwllms@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-rt-devel@lists.linux.dev, linux-rt-users@vger.kernel.org,
	Joe Damato <jdamato@fastly.com>,
	Martin Karsten <mkarsten@uwaterloo.ca>,
	Jens Axboe <axboe@kernel.dk>,
	Frederic Weisbecker <frederic@kernel.org>,
	Valentin Schneider <vschneid@redhat.com>
Subject: Re: [PATCH v2] eventpoll: Fix priority inversion problem
Date: Fri, 23 May 2025 16:31:32 +0200	[thread overview]
Message-ID: <20250523143132.i_YkS3R3@linutronix.de> (raw)
In-Reply-To: <20250523061104.3490066-1-namcao@linutronix.de>

On 2025-05-23 08:11:04 [+0200], Nam Cao wrote:
> @@ -867,10 +837,25 @@ static bool __ep_remove(struct eventpoll *ep, struct epitem *epi, bool force)
>  
>  	rb_erase_cached(&epi->rbn, &ep->rbr);
>  
> -	write_lock_irq(&ep->lock);
> -	if (ep_is_linked(epi))
> -		list_del_init(&epi->rdllink);
> -	write_unlock_irq(&ep->lock);
> +	/*
> +	 * ep->mtx is held, which means no waiter is touching the ready list. This item is also no
> +	 * longer being added. Therefore, the ready flag can only mean one thing: this item is on
> +	 * the ready list.
> +	 */
> +	if (smp_load_acquire(&epi->ready)) {
> +		put_back_last = NULL;
> +		while (true) {
> +			struct llist_node *n = llist_del_first(&ep->rdllist);
> +
> +			if (&epi->rdllink == n || WARN_ON(!n))
> +				break;
> +			if (!put_back_last)
> +				put_back_last = n;
> +			llist_add(n, &put_back);

put_back is local, you cam use __llist_add()

You could llist_del_all() and then you could use the non-atomic
operations for the replacement. But you need to walk the whole list to
know the first and last for the batch.
the "wait" perf bench throws in "16320" items. Avoiding
llist_del_first() makes hardly a different to, worse, to slight
improvement. Since it is all random it depends when the atomic operation
outweighs  

The ctl case gets worse with this approach. On average it has to iterate
over 45% items to find the right item and it adds less than 200 items.
So the atomic does not outweigh the while iteration.

> +		}
> +		if (put_back_last)
> +			llist_add_batch(put_back.first, put_back_last, &ep->rdllist);
> +	}
>  
>  	wakeup_source_unregister(ep_wakeup_source(epi));
>  	/*
> @@ -1867,19 +1767,20 @@ static int ep_send_events(struct eventpoll *ep,
>  	init_poll_funcptr(&pt, NULL);
>  
>  	mutex_lock(&ep->mtx);
> -	ep_start_scan(ep, &txlist);
>  
> -	/*
> -	 * We can loop without lock because we are passed a task private list.
> -	 * Items cannot vanish during the loop we are holding ep->mtx.
> -	 */
> -	list_for_each_entry_safe(epi, tmp, &txlist, rdllink) {
> +	while (res < maxevents) {
>  		struct wakeup_source *ws;
> +		struct llist_node *n;
>  		__poll_t revents;
>  
> -		if (res >= maxevents)
> +		n = llist_del_first(&ep->rdllist);
> +		if (!n)
>  			break;
>  
> +		epi = llist_entry(n, struct epitem, rdllink);
> +		llist_add(n, &txlist);

txlist is local, you can use __list_add()

> +		smp_store_release(&epi->ready, false);
> +
>  		/*
>  		 * Activate ep->ws before deactivating epi->ws to prevent
>  		 * triggering auto-suspend here (in case we reactive epi->ws

Sebastian

  parent reply	other threads:[~2025-05-23 14:31 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-23  6:11 [PATCH v2] eventpoll: Fix priority inversion problem Nam Cao
2025-05-23 12:26 ` Sebastian Andrzej Siewior
2025-05-26  5:39   ` Nam Cao
2025-05-23 14:31 ` Sebastian Andrzej Siewior [this message]
2025-05-28  5:57 ` Holger Hoffstätte
2025-05-28  6:07   ` Holger Hoffstätte
2025-05-28  6:12   ` Nam Cao
2025-05-28  8:04     ` Nam Cao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250523143132.i_YkS3R3@linutronix.de \
    --to=bigeasy@linutronix.de \
    --cc=axboe@kernel.dk \
    --cc=brauner@kernel.org \
    --cc=clrkwllms@kernel.org \
    --cc=frederic@kernel.org \
    --cc=jack@suse.cz \
    --cc=jdamato@fastly.com \
    --cc=john.ogness@linutronix.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rt-devel@lists.linux.dev \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mkarsten@uwaterloo.ca \
    --cc=namcao@linutronix.de \
    --cc=rostedt@goodmis.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).