public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Jens Axboe <axboe@kernel.dk>, linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, mingo@kernel.org
Subject: Re: [PATCH 1/2] sched/core: switch struct rq->nr_iowait to a normal int
Date: Thu, 29 Feb 2024 20:52:59 +0100	[thread overview]
Message-ID: <87sf1b6o9w.ffs@tglx> (raw)
In-Reply-To: <edd520ab-b95f-4a60-a35a-2490a6d5057f@kernel.dk>

On Thu, Feb 29 2024 at 10:49, Jens Axboe wrote:
> On 2/29/24 10:42 AM, Thomas Gleixner wrote:
>> So but I just noticed that there is actually an issue with this:
>> 
>>>  unsigned int nr_iowait_cpu(int cpu)
>>>  {
>>> -	return atomic_read(&cpu_rq(cpu)->nr_iowait);
>>> +	struct rq *rq = cpu_rq(cpu);
>>> +
>>> +	return rq->nr_iowait - atomic_read(&rq->nr_iowait_remote);
>> 
>> The access to rq->nr_iowait is not protected by the runqueue lock and
>> therefore a data race when @cpu is not the current CPU.
>> 
>> This needs to be properly annotated and explained why it does not
>> matter.
>
> But that was always racy before as well, if someone else is inc/dec'ing
> ->nr_iowait while it's being read, you could get either the before or
> after value. This doesn't really change that. I could've sworn I
> mentioned that in the commit message, but I did not.

There are actually two issues here:

1) atomic_read() vs. atomic_inc/dec() guarantees that the read value
   is consistent in itself.

   Non-atomic inc/dec is not guaranteeing that the concurrent read is a
   consistent value as the compiler is free to do store/load
   tearing. Unlikely but not guaranteed to never happen.

   KCSAN will complain about it sooner than later and then someone has
   to go and do the analysis and the annotation. I rather let you do
   the reasoning now than chasing you down later :)

2) What's worse is that the result can be completely bogus:

   i.e.

   CPU0                                 CPU1                    CPU2
   a = rq(CPU1)->nr_iowait; // 0
                                        rq->nr_iowait++;
                                                                rq(CPU1)->nr_iowait_remote++;
   b = rq(CPU1)->nr_iowait_remote; // 1

   r = a - b; // -1
   return (unsigned int) r; // UINT_MAX

   The consumers of this interface might be upset. :)

   While with a single atomic_t it's guaranteed that the result is
   always greater or equal zero.

>> So s/Reviewed-by/Un-Reviewed-by/
>> 
>> Though thinking about it some more. Is this split a real benefit over
>> always using the atomic? Do you have numbers to show?
>
> It was more on Peter's complaint that now we're trading a single atomic
> for two, hence I got to thinking about nr_iowait in general. I don't
> have numbers showing it matters, as mentioned in another email the most
> costly part about this seems to be fetching task->in_iowait and not the
> actual atomic.

On the write side (except for the remote case) the cache line is already
dirty on the current CPU and I doubt that the atomic will be
noticable. If there is concurrent remote access to the runqueue then the
cache line is bouncing no matter what.

On the read side there is always an atomic operation required, so it's
not really different.

I assume Peter's complaint was about the extra nr_iowait_acct part. I
think that's solvable without the extra atomic_t member and with a
single atomic_add()/sub(). atomic_t is 32bit wide, so what about
splitting the thing and adding/subtracting both in one go?

While sketching this I noticed that prepare/finish can be written w/o
any conditionals.

int io_schedule_prepare(void)
{
	int flags = current->in_iowait + current->in_iowait_acct << 16;

	current->in_iowait = 1;
	current->in_iowait_acct = 1;
	blk_flush_plug(current->plug, true);
	return flags;
}

void io_schedule_finish(int old_wait_flags)
{
	current->in_iowait = flags & 0x01;
        current->in_iowait_acct = flags >> 16;
}

Now __schedule():

	if (prev->in_iowait) {
           	int x = 1 + current->in_iowait_acct << 16;

		atomic_add(x, rq->nr_iowait);
		delayacct_blkio_start();
	}

and ttwu_do_activate():

	if (p->in_iowait) {
        	int x = 1 + current->in_iowait_acct << 16;

                delayacct_blkio_end(p);
                atomic_sub(x, task_rq(p)->nr_iowait);
	}


and try_to_wake_up():

	delayacct_blkio_end(p);

	int x = 1 + current->in_iowait_acct << 16;

	atomic_add(x, task_rq(p)->nr_iowait);

nr_iowait_acct_cpu() becomes:

        return atomic_read(&cpu_rq(cpu)->nr_iowait) >> 16;

and nr_iowait_cpu():

        return atomic_read(&cpu_rq(cpu)->nr_iowait) & ((1 << 16) - 1);

Obviously written with proper inline wrappers and defines, but you get
the idea.

Hmm?

Thanks,

        tglx

  reply	other threads:[~2024-02-29 19:53 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-28 19:16 [PATCHSET v3 0/2] Split iowait into two states Jens Axboe
2024-02-28 19:16 ` [PATCH 1/2] sched/core: switch struct rq->nr_iowait to a normal int Jens Axboe
2024-02-29 16:53   ` Thomas Gleixner
2024-02-29 17:19     ` Jens Axboe
2024-02-29 17:42       ` Thomas Gleixner
2024-02-29 17:49         ` Jens Axboe
2024-02-29 19:52           ` Thomas Gleixner [this message]
2024-02-29 22:30             ` Jens Axboe
2024-03-01  0:02               ` Thomas Gleixner
2024-02-28 19:16 ` [PATCH 2/2] sched/core: split iowait state into two states Jens Axboe
2024-02-29 17:31   ` Thomas Gleixner
2024-02-29 17:45     ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87sf1b6o9w.ffs@tglx \
    --to=tglx@linutronix.de \
    --cc=axboe@kernel.dk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox