All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@techsingularity.net>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Linux-RT <linux-rt-users@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] locking/rwbase: Prevent indefinite writer starvation
Date: Thu, 19 Jan 2023 11:02:20 +0000	[thread overview]
Message-ID: <20230119110220.kphftcehehhi5l5u@techsingularity.net> (raw)
In-Reply-To: <Y8j+lENBWNWgt4mf@linutronix.de>

On Thu, Jan 19, 2023 at 09:25:56AM +0100, Sebastian Andrzej Siewior wrote:
> On 2023-01-18 17:31:30 [+0000], Mel Gorman wrote:
>  > If we drop that "we prefer the RT reader" then it would block on the
> > > RTmutex. It will _still_ be preferred over the writer because it will be
> > > enqueued before the writer in the queue due to its RT priority. The only
> > > downside is that it has to wait until all readers are left.
> > 
> > The writer has to wait until all the readers have left anyway.
> 
> I meant the READER in case it has RT priority. It will enqueue itself on
> the RTmutex, first in line, and wait until all other READER leave.
> 

Ah.

> > If I understand you correctly, the patch becomes this;
> 
> exactly.
> 
> > --8<--
> ???
> > This patch records a timestamp when the first writer is blocked. DT /
> 
> s/DT/DL
> 

Fixed.

> > RT tasks can continue to take the lock for read as long as readers exist
> > indefinitely. Other readers can acquire the read lock unless a writer
> > has been blocked for a minimum of 4ms. This is sufficient to allow the
> > dio_truncate test case to complete within the 30 minutes timeout.
> > 
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> > ---
> ???
> > diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
> > index c201aadb9301..84c5e4e4d25b 100644
> > --- a/kernel/locking/rwbase_rt.c
> > +++ b/kernel/locking/rwbase_rt.c
> > @@ -74,9 +106,11 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
> >  	raw_spin_lock_irq(&rtm->wait_lock);
> >  	/*
> >  	 * Allow readers, as long as the writer has not completely
> > -	 * acquired the semaphore for write.
> > +	 * acquired the semaphore for write and reader bias is still
> > +	 * allowed.
> >  	 */
> > -	if (atomic_read(&rwb->readers) != WRITER_BIAS) {
> > +	if (atomic_read(&rwb->readers) != WRITER_BIAS &&
> > +	    rwbase_allow_reader_bias(rwb)) {
> >  		atomic_inc(&rwb->readers);
> >  		raw_spin_unlock_irq(&rtm->wait_lock);
> >  		return 0;
> > @@ -264,12 +298,17 @@ static int __sched rwbase_write_lock(struct rwbase_rt *rwb,
> >  		if (__rwbase_write_trylock(rwb))
> >  			break;
> >  
> > +		/* Record first new read/write contention. */
> > +		set_writer_blocked(rwb);
> > +
> >  		raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
> >  		rwbase_schedule();
> >  		raw_spin_lock_irqsave(&rtm->wait_lock, flags);
> >  
> >  		set_current_state(state);
> >  	}
> > +
> > +	rwb->waiter_timeout = 0;
> 
> Regarding memory ordering and ordering in general:
> - Should the writer leave from rwbase_schedule() due to a signal then
>   set_writer_blocked() sets a timeout but it is not cleared on the
>   signal leave.
> 

You're correct, it should be reset in the signal check block before the
wait_lock is released by __rwbase_write_unlock. I considered different
ways to avoid multiple reset points but it was untidy.

> - There is only writer in that for loop within rwbase_write_lock()
>   because only one writer can own the rtmutex at a time. (A second
>   writer blocks on the RTmutex and needs to wait, I may have spread some
>   confusion earler). Therefore it should be okay to unconditionally set
>   the timeout (instead of checking for zero).
> 

Ah ok, I see now or at least I think I do.

> - Once the writer removes READER_BIAS, it forces the reader into the
>   slowpath.

Removed in __rwbase_write_trylock IIUC

>   At that time the writer does not own the wait_lock meaning
>   the reader _could_ check the timeout before writer had a chance to set
>   it. The worst thing is probably that if jiffies does not have the
>   highest bit set then it will always disable the reader bias here.
>   The easiest thing is probably to check timeout vs 0 and ensure on the
>   writer side that the lowest bit is always set (in the unlikely case it
>   will end up as zero).
> 

I am missing something important. On the read side, we have

        raw_spin_lock_irq(&rtm->wait_lock);
        /*
         * Allow readers, as long as the writer has not completely
         * acquired the semaphore for write and reader bias is still
         * allowed.
         */
        if (atomic_read(&rwb->readers) != WRITER_BIAS &&
            rwbase_allow_reader_bias(rwb)) {
                atomic_inc(&rwb->readers);
                raw_spin_unlock_irq(&rtm->wait_lock);
                return 0;
        }

So rtm->wait_lock is held and both the WRITER_BASE and timeout are checked
under the lock. On the write side it's also held when the timeout is
updated here

        raw_spin_lock_irqsave(&rtm->wait_lock, flags);
	...
	for (;;) {
		...

                /* Record timeout when reader bias is ignored. */
                rwb->waiter_timeout = jiffies + RWBASE_RT_WAIT_TIMEOUT;

                raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
                rwbase_schedule();
                raw_spin_lock_irqsave(&rtm->wait_lock, flags);

                set_current_state(state);
        }

        rwb->waiter_timeout = 0;
	...

out_unlock:
        raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);

(it's also now held in the signal block when it is cleared)

I'm not seeing exactly what the problem is unless it's an issue in the
fast path but I think the atomic_try_cmpxchg_acquire works there.

In the absense of wait_lock, I guess there would be a potential
race between the atomic_read(&rwb->readers) != WRITER_BIAS and
rwbase_allow_reader_bias(rwb) but that shouldn't be the case now. Even
if it wasn't locked, it might allow a new reader through but it'd be
a transient problem and writer starvation should be prevented once the
timeout is observed.

-- 
Mel Gorman
SUSE Labs

  reply	other threads:[~2023-01-19 11:03 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-17  8:38 [PATCH v2] locking/rwbase: Prevent indefinite writer starvation Mel Gorman
     [not found] ` <20230117105031.2512-1-hdanton@sina.com>
2023-01-17 12:18   ` Mel Gorman
2023-01-17 14:22 ` Sebastian Andrzej Siewior
2023-01-17 16:50   ` Mel Gorman
2023-01-18 10:45     ` Ingo Molnar
2023-01-18 16:00       ` Mel Gorman
2023-01-18 15:25     ` Sebastian Andrzej Siewior
2023-01-18 17:31       ` Mel Gorman
2023-01-19  1:15         ` Hillf Danton
2023-01-19  8:32           ` Sebastian Andrzej Siewior
2023-01-19 13:59             ` Hillf Danton
2023-01-19 16:36               ` Sebastian Andrzej Siewior
2023-01-20  9:37                 ` Hillf Danton
2023-01-20 18:34                   ` Sebastian Andrzej Siewior
2023-01-21  3:46                     ` Hillf Danton
2023-01-19  8:25         ` Sebastian Andrzej Siewior
2023-01-19 11:02           ` Mel Gorman [this message]
2023-01-19 16:28             ` Sebastian Andrzej Siewior
2023-01-19 17:41               ` Mel Gorman
2023-01-19 17:48                 ` Davidlohr Bueso
2023-01-19 17:58                   ` Davidlohr Bueso
2023-01-20  8:25                 ` Sebastian Andrzej Siewior
2023-01-20 13:24                   ` Mel Gorman
2023-01-20 13:38                     ` Sebastian Andrzej Siewior
2023-01-20 14:07                       ` Mel Gorman
2023-01-20 15:36                     ` Davidlohr Bueso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230119110220.kphftcehehhi5l5u@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=bigeasy@linutronix.de \
    --cc=dave@stgolabs.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.