From: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
To: "Christian König" <christian.koenig@amd.com>,
"Sergei Miroshnichenko" <s.miroshnichenko@yadro.com>
Cc: "Alexander.Deucher@amd.com" <Alexander.Deucher@amd.com>,
"anatoli.antonovitch@amd.com" <anatoli.antonovitch@amd.com>,
"helgaas@kernel.org" <helgaas@kernel.org>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"linux@yadro.com" <linux@yadro.com>
Subject: Re: Question about supporting AMD eGPU hot plug case
Date: Tue, 16 Mar 2021 13:39:35 -0400 [thread overview]
Message-ID: <eb36c4d7-c69b-8e80-c911-d0757d99f124@amd.com> (raw)
In-Reply-To: <57a72ca6-c58e-76a3-3e19-8aca98fa831c@amd.com>
On 2021-03-16 12:17 p.m., Christian König wrote:
> Am 15.03.21 um 18:11 schrieb Andrey Grodzovsky:
>>
>> On 2021-03-15 12:55 p.m., Christian König wrote:
>>>
>>>
>>> Am 15.03.21 um 17:21 schrieb Andrey Grodzovsky:
>>>>
>>>> On 2021-03-15 12:10 p.m., Christian König wrote:
>>>>> Am 12.03.21 um 16:34 schrieb Andrey Grodzovsky:
>>>>>>
>>>>>>
>>>>>> On 2021-03-12 4:03 a.m., Christian König wrote:
>>>>>>> Am 11.03.21 um 23:40 schrieb Andrey Grodzovsky:
>>>>>>>> [SNIP]
>>>>>>>>>> The expected result is they all move closer to the start of
>>>>>>>>>> PCI address
>>>>>>>>>> space.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ok, I updated as you described. Also I removed PCI conf
>>>>>>>>> command to stop
>>>>>>>>> address decoding and restart later as I noticed PCI core does
>>>>>>>>> it itself
>>>>>>>>> when needed.
>>>>>>>>> I tested now also with graphic desktop enabled while submitting
>>>>>>>>> 3d draw commands and seems like under this scenario everything
>>>>>>>>> still
>>>>>>>>> works. Again, this all needs to be tested with VRAM BAR move
>>>>>>>>> as then
>>>>>>>>> I believe I will see more issues like handling of MMIO mapped
>>>>>>>>> VRAM objects (like GART table). In case you do have an AMD
>>>>>>>>> card you could also maybe give it a try. In the meanwhile I
>>>>>>>>> will add support to ioremapping of those VRAM objects.
>>>>>>>>>
>>>>>>>>> Andrey
>>>>>>>>
>>>>>>>> Just an update, added support for unmaping/remapping of all VRAM
>>>>>>>> objects, both user space mmaped and kernel ioremaped. Seems to
>>>>>>>> work
>>>>>>>> ok but again, without forcing VRAM BAR to move I can't be sure.
>>>>>>>> Alex, Chsristian - take a look when you have some time to give
>>>>>>>> me some
>>>>>>>> initial feedback on the amdgpu side.
>>>>>>>>
>>>>>>>> The code is at
>>>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/log/?h=yadro%2Fpcie_hotplug%2Fmovable_bars_v9.1
>>>>>>>>
>>>>>>>
>>>>>>> Mhm, that let's userspace busy retry until the BAR movement is
>>>>>>> done.
>>>>>>>
>>>>>>> Not sure if that can't live lock somehow.
>>>>>>>
>>>>>>> Christian.
>>>>>>
>>>>>> In my testing it didn't but, I can instead route them to some
>>>>>> global static dummy page while BARs are moving and then when
>>>>>> everything
>>>>>> done just invalidate the device address space again and let the
>>>>>> pagefaults fill in valid PFNs again.
>>>>>
>>>>> Well that won't work because the reads/writes which are done in
>>>>> the meantime do need to wait for the BAR to be available again.
>>>>>
>>>>> So waiting for the BAR move to finish is correct, but what we
>>>>> should do is to use a lock instead of an SRCU because that makes
>>>>> lockdep complain when we do something nasty.
>>>>>
>>>>> Christian.
>>>>
>>>>
>>>> Spinlock I assume ? We can't sleep there - it's an interrupt.
>>>
>>> Mhm, the BAR movement is in interrupt context?
>>
>>
>> No, BARs move is in task context I believe. The page faults are in
>> interrupt context and so we can only lock a spinlock there I assume,
>
> No, page faults are in task context as well! Otherwise you wouldn't be
> able to sleep for network I/O in a page fault for example.
Ok, that was a long standing confusion on my side, especially because
'Understanding the Linux Kernel' states that do_page_fault is an
interrupt handler here -
https://vistech.net/~champ/online-docs/books/linuxkernel2/060.htm
while in fact this is an exception handler which is ran in the context
of the user process causing it and hence can sleep ( as explained here
by Rober Love himself) https://www.spinics.net/lists/newbies/msg07287.html
>
>> not a mutex which might sleep. But we can't lock
>> spinlock for the entire BAR move because HW suspend + asic reset is a
>> long process with some sleeps/context switches inside it probably.
>>
>>
>>>
>>> Well that is rather bad. I was hoping to rename the GPU reset rw_sem
>>> into device_access rw_sem and then use the same lock for both (It's
>>> essentially the same problem).
>>
>>
>> I was thinking about it from day 1 but what looked to me different is
>> that in GPU reset case there is no technical need to block MMIO
>> accesses as the BARs are not moving
>> and so the page table entries remain valid. It's true that while the
>> device in reset those MMIO accesses are meaninglessness - so this
>> indeed could be good reason to block
>> access even during GPU reset.
>
> From the experience now I would say that we should block MMIO access
> during GPU reset as necessary.
>
> We can't do things like always taking the lock in each IOCTL, but for
> low level hardware access it shouldn't be a problem at all.
>
> Christian.
I will update the code then to reuse our adev->reset_sem for this locking.
Andrey
>
>>
>> Andrey
>>
>>
>>>
>>> But when we need to move the BAR in atomic/interrupt context that
>>> makes things a bit more complicated.
>>>
>>> Christian.
>>>
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Andrey
>>>>>>>
>>>>>
>>>
>
prev parent reply other threads:[~2021-03-16 17:40 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <ddef2da4-4726-7321-40fe-5f90788cc836@amd.com>
[not found] ` <MN2PR12MB44880052F5C4ECF3EF22B58EF7B69@MN2PR12MB4488.namprd12.prod.outlook.com>
[not found] ` <ffb816ae8336ff2373ec5fcf695e698900c3d557.camel@yadro.com>
[not found] ` <9c41221f-ecfa-5554-f2ea-6f72cfe7dc7e@amd.com>
[not found] ` <dae8dfd8-3a99-620d-f0aa-ceb39923b807@amd.com>
[not found] ` <7d9e947648ce47a4ba8d223cdec08197@yadro.com>
[not found] ` <c82919f3-5296-cd0a-0b8f-c33614ca3ea9@amd.com>
[not found] ` <340062dba82b813b311b2c78022d2d3d0e6f414d.camel@yadro.com>
[not found] ` <927d7fbe-756f-a4f1-44cd-c68aecd906d7@amd.com>
[not found] ` <dc2de228b92c4e27819c7681f650e1e5@yadro.com>
[not found] ` <a6af29ed-179a-7619-3dde-474542c180f4@amd.com>
[not found] ` <8f53f1403f0c4121885398487bbfa241@yadro.com>
[not found] ` <fc2ea091-8470-9501-242d-8d82714adecb@amd.com>
[not found] ` <50afd1079dbabeba36fd35fdef84e6e15470ef45.camel@yadro.com>
[not found] ` <c53c23b5-dd37-44f1-dffd-ff9699018a82@amd.com>
[not found] ` <8d7e2d7b7982d8d78c0ecaa74b9d40ace4db8453.camel@yadro.com>
2021-03-04 19:49 ` Question about supporting AMD eGPU hot plug case Andrey Grodzovsky
2021-03-05 16:08 ` Sergei Miroshnichenko
2021-03-05 17:13 ` Andrey Grodzovsky
2021-03-05 19:12 ` Sergei Miroshnichenko
2021-03-05 23:13 ` Andrey Grodzovsky
2021-03-11 22:40 ` Andrey Grodzovsky
2021-03-12 9:03 ` Christian König
2021-03-12 15:34 ` Andrey Grodzovsky
2021-03-15 16:00 ` Andrey Grodzovsky
2021-03-15 16:10 ` Christian König
2021-03-15 16:21 ` Andrey Grodzovsky
2021-03-15 16:55 ` Christian König
2021-03-15 17:11 ` Andrey Grodzovsky
2021-03-16 16:17 ` Christian König
2021-03-16 17:39 ` Andrey Grodzovsky [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=eb36c4d7-c69b-8e80-c911-d0757d99f124@amd.com \
--to=andrey.grodzovsky@amd.com \
--cc=Alexander.Deucher@amd.com \
--cc=anatoli.antonovitch@amd.com \
--cc=christian.koenig@amd.com \
--cc=helgaas@kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=linux@yadro.com \
--cc=s.miroshnichenko@yadro.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox