linux-fsdevel.vger.kernel.org archive mirror
From: Nam Cao <namcao@linutronix.de>
To: Valentin Schneider <vschneid@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	John Ogness <john.ogness@linutronix.de>,
	Clark Williams <clrkwllms@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-rt-devel@lists.linux.dev, linux-rt-users@vger.kernel.org,
	Joe Damato <jdamato@fastly.com>,
	Martin Karsten <mkarsten@uwaterloo.ca>,
	Jens Axboe <axboe@kernel.dk>,
	Frederic Weisbecker <frederic@kernel.org>
Subject: Re: [PATCH] eventpoll: Fix priority inversion problem
Date: Wed, 21 May 2025 16:53:20 +0200
Message-ID: <20250521145320.oqUIOaRG@linutronix.de>
In-Reply-To: <xhsmh8qmq9h37.mognet@vschneid-thinkpadt14sgen2i.remote.csb>

On Wed, May 21, 2025 at 04:25:00PM +0200, Valentin Schneider wrote:
> On 19/05/25 09:40, Nam Cao wrote:
> > @@ -136,14 +136,28 @@ struct epitem {
> >               struct rcu_head rcu;
> >       };
> >
> > -	/* List header used to link this structure to the eventpoll ready list */
> > -	struct list_head rdllink;
> > +	/*
> > +	 * Whether this item can be added to the eventpoll ready list. Adding to the list can be
> > +	 * blocked for two reasons:
> > +	 *
> > +	 *  1. This item is already on the list.
> > +	 *  2. A waiter is consuming the ready list and has consumed this item. The waiter therefore
> > +	 *     blocks this item from being added again, so that the same item is not seen twice.
> > +	 *     If adding is blocked due to this reason, the waiter will add this item to the list
> > +	 *     once consuming is done.
> > +	 */
> > +	bool link_locked;
> 
> Nit: IIUC it's not just the readylist, it's anytime the link is used
> (e.g. to punt it on a txlist), so how about:
> 
>         /*
>          * Whether epitem.rdllink is currently used in a list. When used, it
>          * cannot be detached or inserted elsewhere.
>          *
>          * It may be in use for two reasons:
>          *
>          * 1. This item is on the eventpoll ready list
>          * 2. This item is being consumed by a waiter and stashed on a temporary
>          *    list. If adding is blocked due to this reason, the waiter will add
>          *    this item to the list once consuming is done.
>          */
>          bool link_used;

Acked.

> >
> >       /*
> > -	 * Works together "struct eventpoll"->ovflist in keeping the
> > -	 * single linked chain of items.
> > +	 * Indicate whether this item is ready for consumption. All items on the ready list have this
> > +	 * flag set. Items that should be on the ready list, but cannot be added because of
> > +	 * link_locked (in other words, a waiter is consuming the ready list), also have this flag
> > +	 * set. When a waiter is done consuming, the waiter will add ready items to the ready list.
> >        */
> > -	struct epitem *next;
> > +	bool ready;
> > +
> > +	/* List header used to link this structure to the eventpoll ready list */
> > +	struct llist_node rdllink;
> >
> >       /* The file descriptor information this item refers to */
> >       struct epoll_filefd ffd;
> 
> > @@ -361,10 +368,27 @@ static inline int ep_cmp_ffd(struct epoll_filefd *p1,
> >               (p1->file < p2->file ? -1 : p1->fd - p2->fd));
> >  }
> >
> > -/* Tells us if the item is currently linked */
> > -static inline int ep_is_linked(struct epitem *epi)
> > +static void epitem_ready(struct epitem *epi)
> >  {
> > -	return !list_empty(&epi->rdllink);
> > +	/*
> > +	 * Mark it ready, just in case a waiter is blocking this item from going into the ready
> > +	 * list. This will tell the waiter to add this item to the ready list, after the waiter is
> > +	 * finished.
> > +	 */
> > +	xchg(&epi->ready, true);
> 
> Perhaps a stupid question, why use an xchg() for .ready and .link_locked
> (except for that epitem_ready() cmpxchg()) writes when the return value
> is always discarded? Wouldn't e.g. smp_store_release() suffice, considering
> the reads are smp_load_acquire()?

That's me being stupid, smp_store_release() is good enough.
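To make the ready/link_locked handshake in epitem_ready() concrete, here is a minimal userspace sketch in C11. Everything in it is hypothetical: struct item, struct ready_list, rl_push() and item_ready() stand in for struct epitem, llist_head, llist_add() and epitem_ready(). It deliberately uses C11's default sequentially consistent atomics throughout instead of mapping each access onto xchg(), cmpxchg(), smp_store_release() or smp_load_acquire(), so it shows the shape of the protocol, not the minimum ordering the kernel code needs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for llist_node, llist_head and struct epitem. */
struct node {
	struct node *next;
};

struct ready_list {
	_Atomic(struct node *) first;
};

struct item {
	atomic_bool ready;       /* models epitem.ready */
	atomic_bool link_locked; /* models epitem.link_locked (link_used) */
	struct node rdllink;     /* models epitem.rdllink */
};

static void rl_init(struct ready_list *rl)
{
	atomic_init(&rl->first, NULL);
}

static void item_init(struct item *it)
{
	atomic_init(&it->ready, false);
	atomic_init(&it->link_locked, false);
	it->rdllink.next = NULL;
}

/* Lock-free push onto the list head, in the spirit of llist_add(). */
static void rl_push(struct ready_list *rl, struct node *n)
{
	struct node *old = atomic_load(&rl->first);

	assert(n != NULL);
	do {
		n->next = old; /* a failed CAS refreshes old */
	} while (!atomic_compare_exchange_weak(&rl->first, &old, n));
}

/* Model of epitem_ready(): always mark ready, link only if unclaimed. */
static void item_ready(struct item *it, struct ready_list *rl)
{
	bool expected = false;

	/* Nothing reads the old value, so a store (not an exchange) is enough. */
	atomic_store(&it->ready, true);

	/*
	 * Here the old value matters: only the caller that flips link_locked
	 * from false to true may link the node. Otherwise the item is already
	 * queued, or a waiter holds it and will re-add it because ready is set.
	 */
	if (atomic_compare_exchange_strong(&it->link_locked, &expected, true))
		rl_push(rl, &it->rdllink);
}
```

Calling item_ready() twice on the same item links it once: the second call only sets ready, because the compare-and-exchange on link_locked fails. That is also why the exchange on ready can become a store while the cmpxchg() on link_locked cannot.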

> > +
> > +	/*
> > +	 * If this item is not blocked, add it to the ready list. This item could be blocked for two
> > +	 * reasons:
> > +	 *
> > +	 * 1. It is already on the ready list. Then nothing further is required.
> > +	 * 2. A waiter is consuming the ready list, and has consumed this item. The waiter is now
> > +	 *    blocking this item, so that this item won't be seen twice. In this case, the waiter
> > +	 *    will add this item to the ready list after the waiter is finished.
> > +	 */
> > +	if (!cmpxchg(&epi->link_locked, false, true))
> > +		llist_add(&epi->rdllink, &epi->ep->rdllist);
> > +
> >  }
> >
> >  static inline struct eppoll_entry *ep_pwq_from_wait(wait_queue_entry_t *p)
> 
> > @@ -1924,19 +1874,39 @@ static int ep_send_events(struct eventpoll *ep,
> >                        * Trigger mode, we need to insert back inside
> >                        * the ready list, so that the next call to
> >                        * epoll_wait() will check again the events
> > -			 * availability. At this point, no one can insert
> > -			 * into ep->rdllist besides us. The epoll_ctl()
> > -			 * callers are locked out by
> > -			 * ep_send_events() holding "mtx" and the
> > -			 * poll callback will queue them in ep->ovflist.
> > +			 * availability.
> >                        */
> > -			list_add_tail(&epi->rdllink, &ep->rdllist);
> > +			xchg(&epi->ready, true);
> > +		}
> > +	}
> > +
> > +	llist_for_each_entry_safe(epi, tmp, txlist.first, rdllink) {
> > +		/*
> > +		 * We are done iterating. Allow the items we took to be added back to the ready
> > +		 * list.
> > +		 */
> > +		xchg(&epi->link_locked, false);
> > +
> > +		/*
> > +		 * In the loop above, we may mark some items ready, and they should be added back.
> > +		 *
> > +		 * Additionally, someone else may also attempt to add the item to the ready list,
> > +		 * but got blocked by us. Add those blocked items now.
> > +		 */
> > +		if (smp_load_acquire(&epi->ready)) {
> >                       ep_pm_stay_awake(epi);
> > +			epitem_ready(epi);
> >               }
> 
> Isn't this missing a:
> 
>                 list_del_init(&epi->rdllink);
> 
> AFAICT we're always going to overwrite that link when next marking the item
> as ready, but I'd say it's best to exit this with a clean state. That would
> have to be before the clearing of link_locked so it doesn't race with a
> concurrent epitem_ready() call.

To confirm I understand you: there is no functional problem, and your
comment is more of a "just to be safe"?
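A companion sketch of the waiter side, under the same assumptions (hypothetical names, C11 seq_cst atomics, not the patch's code): rl_take_all() detaches the whole list the way llist_del_all() does, and finish_txlist() mirrors the llist_for_each_entry_safe() loop quoted above, with the suggested cleanup placed before link_locked is cleared. Because rdllink is an llist_node in this version, the model resets next rather than calling list_del_init(). Event reporting, and how ready gets cleared, are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not kernel code. */
struct node { struct node *next; };
struct ready_list { _Atomic(struct node *) first; };
struct item {
	atomic_bool ready;
	atomic_bool link_locked;
	struct node rdllink;
};

/* container_of() equivalent for this model. */
#define item_of(n) ((struct item *)((char *)(n) - offsetof(struct item, rdllink)))

static void rl_init(struct ready_list *rl) { atomic_init(&rl->first, NULL); }

static void item_init(struct item *it)
{
	atomic_init(&it->ready, false);
	atomic_init(&it->link_locked, false);
	it->rdllink.next = NULL;
}

static void rl_push(struct ready_list *rl, struct node *n)
{
	struct node *old = atomic_load(&rl->first);

	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak(&rl->first, &old, n));
}

static void item_ready(struct item *it, struct ready_list *rl)
{
	bool expected = false;

	atomic_store(&it->ready, true);
	if (atomic_compare_exchange_strong(&it->link_locked, &expected, true))
		rl_push(rl, &it->rdllink);
}

/* Like llist_del_all(): detach every queued node in one atomic step. */
static struct node *rl_take_all(struct ready_list *rl)
{
	return atomic_exchange(&rl->first, NULL);
}

/*
 * Model of the final loop of ep_send_events(). Every node on txlist still
 * has link_locked set, since it was claimed when it was queued.
 */
static void finish_txlist(struct node *txlist, struct ready_list *rl)
{
	struct node *n, *tmp;

	for (n = txlist; n != NULL; n = tmp) {
		struct item *it = item_of(n);

		tmp = n->next; /* save it: n may be re-queued below */
		assert(atomic_load(&it->link_locked));

		/*
		 * Suggested cleanup: reset the link while link_locked is still
		 * held. After the store below, a concurrent item_ready() may push
		 * this node and rewrite n->next, so resetting it then could
		 * corrupt the list.
		 */
		n->next = NULL;

		/* Allow the item to be linked again. */
		atomic_store(&it->link_locked, false);

		/* Re-add items that are (still or again) marked ready. */
		if (atomic_load(&it->ready))
			item_ready(it, rl);
	}
}
```

In this model the reset has no functional effect, since rl_push() overwrites next on every insertion; it only leaves items that were not re-queued in a clean state.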

Thanks so much for the review,
Nam

Thread overview: 7+ messages
2025-05-19  7:40 [PATCH] eventpoll: Fix priority inversion problem Nam Cao
2025-05-19  9:25 ` Florian Bezdeka
2025-05-19  9:45   ` Nam Cao
2025-05-19 14:22     ` Valentin Schneider
2025-05-21 14:25 ` Valentin Schneider
2025-05-21 14:53   ` Nam Cao [this message]
2025-05-21 15:37     ` Valentin Schneider
