From: Jens Axboe <axboe@kernel.dk>
To: Nicholas Piggin <npiggin@gmail.com>,
Michael Ellerman <mpe@ellerman.id.au>,
Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: Memory coherency issue with IO thread offloading?
Date: Mon, 27 Mar 2023 06:39:33 -0600
Message-ID: <6f32b504-ccd5-d67c-1b67-95d8fe1cf185@kernel.dk>
In-Reply-To: <CRGVMXJ46PPN.1VWRMA1IMPHW2@bobo>
On 3/26/23 10:22 PM, Nicholas Piggin wrote:
> On Sat Mar 25, 2023 at 11:20 AM AEST, Jens Axboe wrote:
>> On 3/24/23 7:15 PM, Jens Axboe wrote:
>>>> Are there any CONFIG options I'd need to trip this?
>>>
>>> I don't think you need any special CONFIG options. I'll attach my config
>>> here, and I know the default distro one hits it too. But perhaps the
>>> mariadb version is not new enough? I think you need 10.6 or above, as
>>> it will use io_uring by default. What version are you running?
>>
>> And here's the .config and the patch for using queue_work().
>
> So if you *don't* apply this patch, the work gets queued up with an IO
> thread? In io-wq.c? Does that worker end up just doing an io_write()
> same as this one?
Right, without this patch, it gets added to the io-wq work pool. If a
thread is available to run it, it will. If one is not, then one is
created. I.e. either case can happen.

That thread does the exact same io_write() again.
> Can the queueing cause the creation of an IO thread (if one does not
> exist, or all blocked?)
Yep
Since writing this email, I've gone through a lot of different tests.
Here's a rough listing of what I found:
- As with the hack patch, if I just limit the number of IO thread
  workers to 1, it seems to pass. At least it lasts longer than before:
  it does 1000 iterations.
- If I pin each IO worker to a single CPU, it also passes.
- If I liberally sprinkle smp_mb() on the io-wq side, the test still
  fails. I've added one before queueing the work item, and one after.
  One before the io-wq worker grabs a work item and one after. I.e. the
  full hammer approach. This still fails.
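For illustration, the barrier placement described in the last bullet can be sketched in user-space C11. This is only an analogy and not the io-wq code: `struct work_slot`, `publish_work()` and `grab_work()` are hypothetical names, and `atomic_thread_fence(memory_order_seq_cst)` stands in for the kernel's smp_mb(). Note that, per the results above, this kind of fencing did not make the test pass.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-slot "work queue"; NOT the io-wq data structure. */
struct work_slot {
    int payload;        /* plain data written by the submitter */
    atomic_bool ready;  /* true once payload is valid and unclaimed */
};

/* Submitter side: a full fence before and after making the item visible. */
static void publish_work(struct work_slot *slot, int value)
{
    assert(slot != NULL);
    slot->payload = value;
    atomic_thread_fence(memory_order_seq_cst);  /* "before queueing" */
    atomic_store_explicit(&slot->ready, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* "after queueing" */
}

/*
 * Worker side: a full fence before and after grabbing the item.
 * Returns true and fills *out if an item was available.
 */
static bool grab_work(struct work_slot *slot, int *out)
{
    assert(slot != NULL && out != NULL);
    atomic_thread_fence(memory_order_seq_cst);  /* "before grabbing" */
    if (!atomic_load_explicit(&slot->ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_seq_cst);  /* "after grabbing" */
    *out = slot->payload;  /* ordered after the load of ready by the fence */
    atomic_store_explicit(&slot->ready, false, memory_order_relaxed);
    return true;
}
```

With fences on both sides, a worker that observes `ready == true` is guaranteed to see the payload written before it, which is why a failure that survives this placement points away from a simple missing barrier in the queue/dequeue path.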
Puzzling... For the "pin each IO worker to a single CPU" test, I added
some basic code to try to ensure that a work item queued on CPU X
would be processed by a worker on CPU X, and to a large degree, this
does happen. But since the work list is a normal list, it's quite
possible that some other worker finishes its work on CPU Y just in time
to grab the one from CPU X. I checked and this does happen in the test
case, yet it still passes. This may be because I got a bit lucky, but
that explanation seems suspect with thousands of passes of the test
case.
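To make the scenario above concrete, here is a small single-threaded model of a shared work list where a worker prefers items queued from its own CPU but falls back to any pending item. It is a hypothetical sketch, not the code used in the test: `struct work_item`, `queue_item()` and `grab_item()` are invented names, and the real io-wq list and locking are not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ITEMS 8

/* Hypothetical work item tagged with the CPU it was queued from. */
struct work_item {
    int id;
    int queued_cpu;
    int taken;      /* non-zero once a worker has grabbed it */
};

/* One shared list for all workers, as in the "normal list" case. */
struct work_list {
    struct work_item items[MAX_ITEMS];
    size_t count;
};

static void queue_item(struct work_list *wl, int id, int cpu)
{
    assert(wl->count < MAX_ITEMS);
    wl->items[wl->count++] = (struct work_item){ id, cpu, 0 };
}

/*
 * Prefer the oldest pending item queued on worker_cpu. If there is none,
 * take the oldest pending item from any CPU: this is the cross-CPU grab.
 * Returns NULL when nothing is pending.
 */
static struct work_item *grab_item(struct work_list *wl, int worker_cpu)
{
    struct work_item *fallback = NULL;

    for (size_t i = 0; i < wl->count; i++) {
        struct work_item *it = &wl->items[i];

        if (it->taken)
            continue;
        if (it->queued_cpu == worker_cpu) {
            it->taken = 1;
            return it;
        }
        if (fallback == NULL)
            fallback = it;
    }
    if (fallback != NULL)
        fallback->taken = 1;
    return fallback;
}
```

In this model, a worker on CPU 1 that finds no CPU 1 items still picks up work queued on CPU 0, so same-CPU processing is a tendency and not a guarantee.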
Another theory is that it's perhaps related to an io-wq worker being
rescheduled on a different CPU. Though again, I'm puzzled as to why the
smp_mb() sprinkling didn't fix that, then. I'm going to try and run the
test case with JUST the io-wq worker pinning, without caring about
where the work is processed, to see if that does anything.
> I'm wondering what the practical differences are between this patch and
> upstream.
>
> kthread_use_mm() should be basically the same as context switching to an
> IO thread. There is maybe a difference in that kthread_switch_mm() has
> a 'sync' instruction *after* the MMU is switched to the new thread from
> the membarrier code, but a regular context switch might not. The MMU
> switch does have an isync() after it though, so loads *should* be
> prohibited from moving ahead of that.
>
> Something like this adds a sync roughly where kthread_use_mm() has one.
> It's a pretty unlikely shot in the dark though. I'm more inclined to
> think the work submission to the IO thread might have a problem.
Didn't seem to change anything, fails pretty quickly:
[...]
encryption.innodb_encryption 'innodb,undo0' [ 38 pass ] 3083
encryption.innodb_encryption 'innodb,undo0' [ 39 pass ] 3135
encryption.innodb_encryption 'innodb,undo0' [ 40 fail ]
Test ended at 2023-03-27 12:20:46
CURRENT_TEST: encryption.innodb_encryption
mysqltest: At line 11: query 'SET @start_global_value = @@global.innodb_encryption_threads' failed: ER_UNKNOWN_SYSTEM_VARIABLE (1193): Unknown system variable 'innodb_encryption_threads'
The result from queries just before the failure was:
SET @start_global_value = @@global.innodb_encryption_threads;
- saving '/dev/shm/mysql/log/encryption.innodb_encryption-innodb,undo0/' to '/dev/shm/mysql/log/encryption.innodb_encryption-innodb,undo0/'
***Warnings generated in error logs during shutdown after running tests: encryption.innodb_encryption
2023-03-27 12:20:45 0 [Warning] Plugin 'example_key_management' is of maturity level experimental while the server is gamma
2023-03-27 12:20:45 0 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './ibdata1' page [page id: space=0, page number=214]. You may have to recover from a backup.
2023-03-27 12:20:45 0 [ERROR] InnoDB: File './ibdata1' is corrupted
2023-03-27 12:20:45 0 [ERROR] InnoDB: Plugin initialization aborted with error Page read from tablespace is corrupted.
2023-03-27 12:20:45 0 [ERROR] Plugin 'InnoDB' init function returned error.
2023-03-27 12:20:45 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
--
Jens Axboe