From: Jens Axboe <axboe@kernel.dk>
To: Michael Ellerman <mpe@ellerman.id.au>,
Nicholas Piggin <npiggin@gmail.com>,
Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: daniel@mariadb.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: Memory coherency issue with IO thread offloading?
Date: Tue, 28 Mar 2023 10:38:03 -0600
Message-ID: <91477fb8-c9d8-53e7-e657-f5d6ba2e276f@kernel.dk>
In-Reply-To: <87a5zxca3t.fsf@mpe.ellerman.id.au>
On 3/28/23 6:51 AM, Michael Ellerman wrote:
> Jens Axboe <axboe@kernel.dk> writes:
>>>> Can the queueing cause the creation of an IO thread (if one does not
>>>> exist, or all blocked?)
>>>
>>> Yep
>>>
>>> Since writing this email, I've gone through a lot of different tests.
>>> Here's a rough listing of what I found:
>>>
>>> - Like using the hack patch, if I just limit the number of IO thread
>>> workers to 1, it seems to pass. At least it runs longer than before,
>>> completing 1000 iterations.
>>>
>>> - If I pin each IO worker to a single CPU, it also passes.
>>>
>>> - If I liberally sprinkle smp_mb() on the io-wq side, the test still
>>> fails. I've added one before queueing the work item and one after, plus
>>> one before the io-wq worker grabs a work item and one after. The full
>>> hammer approach, in other words. This still fails.
>>>
>>> Puzzling... For the "pin each IO worker to a single CPU" case I added
>>> some basic code to try to ensure that a work item queued on CPU X
>>> would be processed by a worker on CPU X, and to a large degree this
>>> does happen. But since the work list is a normal list, it's quite
>>> possible that some other worker finishes its work on CPU Y just in time
>>> to grab the one from CPU X. I checked and this does happen in the test
>>> case, yet it still passes. This may be because I got a bit lucky, but
>>> that seems unlikely with thousands of passes of the test case.
>>>
>>> Another theory is that it's perhaps related to an io-wq worker
>>> being rescheduled on a different CPU. Though again I'm puzzled as to
>>> why the smp_mb() sprinkling didn't fix that, then. I'm going to try
>>> running the test case with JUST the io-wq worker pinning, not caring
>>> about where the work is processed, to see if that does anything.
>>
>> Just pinning each worker to whatever CPU they got created on seemingly
>> fixes the issue too. This does not mean that each worker will process
>> work on the CPU on which it was queued, just that each worker will
>> remain on whatever CPU it originally got created on.
>>
>> Puzzling...
>>
>> Note that it is indeed quite possible that this isn't a ppc issue at
>> all, just shows on ppc. It could be page cache related, or it could even
>> be a bug in mariadb itself.
>
> I tried binary patching every lwsync to hwsync (read/write to full
> barrier) in mariadbd and all the libraries it links. It didn't fix the
> problem.
>
> I also tried switching all the kernel barriers/spin locks to using a
> hwsync, but that also didn't fix it.
>
> It's still possible there's somewhere that currently has no barrier at
> all that needs one, the above would only fix the problem if we have a
> read/write barrier that actually needs to be a full barrier.
>
>
> I also looked at making all TLB invalidates broadcast, regardless of
> whether we think the thread has only been on a single CPU. That didn't
> help, but I'm not sure I got all places where we do TLB invalidates, so
> I'll look at that some more tomorrow.
Thanks, appreciate your testing! I have no new data points since
yesterday, but the key point from then still seems to be that if an io
worker never reschedules onto a different CPU, then the problem doesn't
occur. This could very well be a page cache issue, if it isn't an issue
on the powerpc side...
--
Jens Axboe