public inbox for linux-kernel@vger.kernel.org
From: Jens Axboe <axboe@kernel.dk>
To: Thomas Gleixner <tglx@linutronix.de>, linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, mingo@kernel.org
Subject: Re: [PATCH 1/2] sched/core: switch struct rq->nr_iowait to a normal int
Date: Thu, 29 Feb 2024 15:30:05 -0700	[thread overview]
Message-ID: <f1fc564c-528a-47d0-8b68-d596d6681eb5@kernel.dk> (raw)
In-Reply-To: <87sf1b6o9w.ffs@tglx>

On 2/29/24 12:52 PM, Thomas Gleixner wrote:
> On Thu, Feb 29 2024 at 10:49, Jens Axboe wrote:
>> On 2/29/24 10:42 AM, Thomas Gleixner wrote:
>>> But I just noticed that there is actually an issue with this:
>>>
>>>>  unsigned int nr_iowait_cpu(int cpu)
>>>>  {
>>>> -	return atomic_read(&cpu_rq(cpu)->nr_iowait);
>>>> +	struct rq *rq = cpu_rq(cpu);
>>>> +
>>>> +	return rq->nr_iowait - atomic_read(&rq->nr_iowait_remote);
>>>
>>> The access to rq->nr_iowait is not protected by the runqueue lock and
>>> therefore a data race when @cpu is not the current CPU.
>>>
>>> This needs to be properly annotated and explained why it does not
>>> matter.
>>
>> But that was always racy before as well, if someone else is inc/dec'ing
>> ->nr_iowait while it's being read, you could get either the before or
>> after value. This doesn't really change that. I could've sworn I
>> mentioned that in the commit message, but I did not.
> 
> There are actually two issues here:
> 
> 1) atomic_read() vs. atomic_inc/dec() guarantees that the read value
>    is consistent in itself.
> 
>    Non-atomic inc/dec is not guaranteeing that the concurrent read is a
>    consistent value as the compiler is free to do store/load
>    tearing. Unlikely but not guaranteed to never happen.
> 
>    KCSAN will complain about it sooner than later and then someone has
>    to go and do the analysis and the annotation. I rather let you do
>    the reasoning now than chasing you down later :)

Fair enough.

> 2) What's worse is that the result can be completely bogus:
> 
>    i.e.
> 
>    CPU0                                 CPU1                    CPU2
>    a = rq(CPU1)->nr_iowait; // 0
>                                         rq->nr_iowait++;
>                                                                 rq(CPU1)->nr_iowait_remote++;
>    b = rq(CPU1)->nr_iowait_remote; // 1
> 
>    r = a - b; // -1
>    return (unsigned int) r; // UINT_MAX
> 
>    The consumers of this interface might be upset. :)
> 
>    While with a single atomic_t it's guaranteed that the result is
>    always greater or equal zero.

Yeah OK, this is a real problem...
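
For anyone following along, the wraparound in that interleaving is plain C unsigned conversion — a trivial userspace demonstration (function name made up):

```c
#include <assert.h>
#include <limits.h>

/*
 * The interleaving above: the reader samples nr_iowait before the
 * increment and nr_iowait_remote after it, so the difference goes
 * negative and the implicit conversion to unsigned wraps.
 */
static unsigned int bogus_nr_iowait(int nr_iowait, int nr_iowait_remote)
{
	int r = nr_iowait - nr_iowait_remote;

	/* -1 converted to unsigned int is UINT_MAX */
	return (unsigned int)r;
}
```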

>>> So s/Reviewed-by/Un-Reviewed-by/
>>>
>>> Though thinking about it some more. Is this split a real benefit over
>>> always using the atomic? Do you have numbers to show?
>>
>> It was more on Peter's complaint that now we're trading a single atomic
>> for two, hence I got to thinking about nr_iowait in general. I don't
>> have numbers showing it matters, as mentioned in another email the most
>> costly part about this seems to be fetching task->in_iowait and not the
>> actual atomic.
> 
> On the write side (except for the remote case) the cache line is already
> dirty on the current CPU and I doubt that the atomic will be
> noticable. If there is concurrent remote access to the runqueue then the
> cache line is bouncing no matter what.

That was my exact thinking too: on the same cacheline, back-to-back
atomics don't really cost more than a single atomic on it.

> On the read side there is always an atomic operation required, so it's
> not really different.
> 
> I assume Peter's complaint was about the extra nr_iowait_acct part. I
> think that's solvable without the extra atomic_t member and with a
> single atomic_add()/sub(). atomic_t is 32bit wide, so what about
> splitting the thing and adding/subtracting both in one go?
> 
> While sketching this I noticed that prepare/finish can be written w/o
> any conditionals.
> 
> int io_schedule_prepare(void)
> {
> 	int flags = current->in_iowait + (current->in_iowait_acct << 16);
>
> 	current->in_iowait = 1;
> 	current->in_iowait_acct = 1;
> 	blk_flush_plug(current->plug, true);
> 	return flags;
> }
> 
> void io_schedule_finish(int old_wait_flags)
> {
> 	current->in_iowait = old_wait_flags & 0x01;
> 	current->in_iowait_acct = old_wait_flags >> 16;
> }
> 
> Now __schedule():
> 
> 	if (prev->in_iowait) {
> 		int x = 1 + (prev->in_iowait_acct << 16);
>
> 		atomic_add(x, &rq->nr_iowait);
> 		delayacct_blkio_start();
> 	}
> 
> and ttwu_do_activate():
> 
> 	if (p->in_iowait) {
> 		int x = 1 + (p->in_iowait_acct << 16);
>
> 		delayacct_blkio_end(p);
> 		atomic_sub(x, &task_rq(p)->nr_iowait);
> 	}
> 
> 
> and try_to_wake_up():
> 
> 	delayacct_blkio_end(p);
>
> 	int x = 1 + (p->in_iowait_acct << 16);
>
> 	atomic_sub(x, &task_rq(p)->nr_iowait);
> 
> nr_iowait_acct_cpu() becomes:
> 
>         return atomic_read(&cpu_rq(cpu)->nr_iowait) >> 16;
> 
> and nr_iowait_cpu():
> 
>         return atomic_read(&cpu_rq(cpu)->nr_iowait) & ((1 << 16) - 1);
> 
> Obviously written with proper inline wrappers and defines, but you get
> the idea.

I'll play with this a bit, but do we want to switch to an atomic_long_t
for this? 2^16 tasks in iowait on one runqueue seems extreme, but it
definitely seems possible to overflow 16 bits.
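
FWIW, here's roughly how I'd sketch the wrappers in userspace — C11 atomics standing in for the kernel's atomic_t, and all names here are made up for illustration, not the actual patch:

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Pack both counters into one 32-bit atomic: the accounted-iowait
 * count lives in the high 16 bits, the raw iowait count in the low 16.
 */
#define IOWAIT_ACCT_SHIFT	16
#define IOWAIT_MASK		((1 << IOWAIT_ACCT_SHIFT) - 1)

static atomic_int nr_iowait;	/* stand-in for rq->nr_iowait */

/* One delta covers both halves, so enter/exit is a single RMW op. */
static inline int iowait_delta(int in_iowait_acct)
{
	return 1 + (in_iowait_acct << IOWAIT_ACCT_SHIFT);
}

static inline void iowait_enter(int in_iowait_acct)
{
	atomic_fetch_add(&nr_iowait, iowait_delta(in_iowait_acct));
}

static inline void iowait_exit(int in_iowait_acct)
{
	atomic_fetch_sub(&nr_iowait, iowait_delta(in_iowait_acct));
}

static inline unsigned int nr_iowait_cpu_sketch(void)
{
	return atomic_load(&nr_iowait) & IOWAIT_MASK;
}

static inline unsigned int nr_iowait_acct_cpu_sketch(void)
{
	return atomic_load(&nr_iowait) >> IOWAIT_ACCT_SHIFT;
}
```

Since both halves move in one atomic op, a reader can never see the low half and high half out of step the way the two-counter version could.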

-- 
Jens Axboe



Thread overview: 12+ messages
2024-02-28 19:16 [PATCHSET v3 0/2] Split iowait into two states Jens Axboe
2024-02-28 19:16 ` [PATCH 1/2] sched/core: switch struct rq->nr_iowait to a normal int Jens Axboe
2024-02-29 16:53   ` Thomas Gleixner
2024-02-29 17:19     ` Jens Axboe
2024-02-29 17:42       ` Thomas Gleixner
2024-02-29 17:49         ` Jens Axboe
2024-02-29 19:52           ` Thomas Gleixner
2024-02-29 22:30             ` Jens Axboe [this message]
2024-03-01  0:02               ` Thomas Gleixner
2024-02-28 19:16 ` [PATCH 2/2] sched/core: split iowait state into two states Jens Axboe
2024-02-29 17:31   ` Thomas Gleixner
2024-02-29 17:45     ` Jens Axboe
