From: Jerome Glisse <j.glisse-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Tom St Denis <tom.stdenis-5C7GfCeVMHo@public.gmane.org>
Cc: "Deucher,
Alexander" <Alexander.Deucher-5C7GfCeVMHo@public.gmane.org>,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
"Koenig,
Christian" <Christian.Koenig-5C7GfCeVMHo@public.gmane.org>
Subject: Re: Feature Request: Ability to decode bus/dma address back into physical address
Date: Tue, 1 Aug 2017 15:55:57 -0400
Message-ID: <20170801195556.GD3443@gmail.com>
In-Reply-To: <42c5fe2b-f179-cb71-03d3-7ae991543edb-5C7GfCeVMHo@public.gmane.org>
On Tue, Aug 01, 2017 at 03:28:26PM -0400, Tom St Denis wrote:
> Adding the AMDGPU maintainers to get their opinions.
>
> Context:
> https://lists.linuxfoundation.org/pipermail/iommu/2017-August/023489.html
>
> (and others in case you missed it on the list).
>
>
> On 01/08/17 03:03 PM, Jerome Glisse wrote:
> > > > You would need to leverage things like uevent to get an event when
> > > > something happens, like a bo being destroyed or a command submission ...
> > >
> > > The problem with this approach is when I'm reading an IB I'm not given user
> > > space addresses but bus addresses. So I can't correlate anything I'm seeing
> > > in the hardware with the user task if I wanted to.
> > >
> > > In fact, to augment [say] OpenGL debugging I would have to correlate a
> > > buffer handle/pointer's page backing with the bus address in the IB so I
> > > could correlate the two (e.g. dump an IB and print out user process variable
> > > names that correspond to the IB contents...).
> >
> > When you read an IB you are provided with a GPU virtual address; you can
> > get the GPU virtual address from the same snoop ioctl, just add a field
> > in the bo_info above. So I don't see any issue here.
>
> This is effectively what I'm doing with the patch except via a trace not a
> bolted on snooping ioctl.
>
> Tracers are a bit better since there's not really any overhead in the
> normal case, which is desirable (and the code is relatively simple).
>
> In an ideal case we could simply search the ttm pages for a given device to
> look for a match for a given dma address. But that also has the problem
> that ...
>
> This will fail for all the other allocations e.g. for our GART, ring
> buffers, etc.
>
> So a "complete" solution would be nice where any bus address mapping that is
> still valid for a device could be resolved to the physical page that is
> backing it.
So you agree that you can get what you want with an ioctl? Now you object
that the ioctl overhead is too high?

I do not believe the ioctl overhead is an issue. As I said, in the model I
advised you mix ioctls with things like uevents or tracepoints so you get
"real time" notification of what is going on.
Why do I say that your approach of tracking individual page mappings is
insane? There are several obstacles (a rough sketch follows this list):
- To get from a GPU virtual address to a physical memory address you need
  to first walk the GPU page table to get the GPU PTE. From that you get
  either an address inside GPU memory (VRAM of the PCIe device) or a DMA
  bus address. If it is the latter (a DMA bus address) you then need to
  walk the IOMMU page table to find the physical address of the memory.
  So this is two page table walks.
- Neither of those walks can happen while holding enough locks that the
  GPU page table does not change behind your back (especially the second
  walk), so you are open to races.
- GPU device memory is not necessarily accessible (due to the PCIe BAR
  size limit; I know there are patches to allow remapping all GPU memory).
- You need to track a lot of information in userspace, i.e. what memory
  is behind each 4K of GPU virtual address space you are interested in,
  so you are basically duplicating information that already exists in
  kernel space into userland.
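
To make the first two obstacles concrete, here is a rough kernel-side
fragment of what such a reverse lookup would have to do. Only
iommu_get_domain_for_dev() and iommu_iova_to_phys() are the real IOMMU API;
amdgpu_vm_lookup_pte() and the pte helpers are hypothetical, and no locking
is shown, which is exactly the race problem above:

/* Hypothetical sketch only: amdgpu_vm_lookup_pte(), pte_points_to_vram()
 * and pte_to_dma_addr() do not exist as such in amdgpu, and nothing here
 * takes the locks that would keep either page table stable. */
static int gpu_va_to_phys(struct amdgpu_device *adev, struct amdgpu_vm *vm,
			  uint64_t gpu_va, phys_addr_t *phys)
{
	struct iommu_domain *dom;
	uint64_t pte;

	/* Walk 1: GPU page table, GPU VA -> PTE (VRAM or DMA bus address). */
	if (amdgpu_vm_lookup_pte(vm, gpu_va, &pte))	/* hypothetical */
		return -ENOENT;

	if (pte_points_to_vram(pte))			/* hypothetical */
		/* VRAM, not system memory; possibly not even CPU-visible
		 * through the PCIe BAR (third obstacle above). */
		return -EREMOTE;

	/* Walk 2: IOMMU page table, DMA bus address -> physical address. */
	dom = iommu_get_domain_for_dev(adev->dev);
	if (!dom) {
		/* No IOMMU translation: bus address is the physical address. */
		*phys = pte_to_dma_addr(pte);		/* hypothetical */
		return 0;
	}
	*phys = iommu_iova_to_phys(dom, pte_to_dma_addr(pte));
	return *phys ? 0 : -ENOENT;
}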
With what I outline you rely on the kernel's information to get to the
thing you want. You can even add something like:

struct amdgpu_snoop_snapshot_ioctl {
	uint64_t gpu_virtual_address;	/* start of the GPU VA range to snapshot */
	uint64_t size;			/* length of the range in bytes */
	uint64_t uptr;			/* user pointer to copy the contents into */
};

to snapshot a range of the GPU virtual address space, so you do not even
have to worry about GPU virtual address -> bo handle -> offset into bo.
You can even go further and add a /dev/dri/card0-snoop device file that
you can mmap to access GPU virtual address space. A process opens the
card0-snoop file and registers as a snooper against another process; from
that point on, a GPU virtual address in the snooped process becomes an
offset into the card0-snoop file. So now you can read/write on that device
file to access the GPU memory of a process without doing any ioctl. The
changes to amdgpu are not that big.
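
For illustration, here is how a debugger might consume either interface.
Everything below (the ioctl request code, the card0-snoop node, and the
omitted step of registering which process to snoop) is assumed from the
proposal above and does not exist in amdgpu today:

/* Debugger-side sketch of the proposal above. Neither the snapshot ioctl
 * nor /dev/dri/card0-snoop exists in amdgpu; the request code below is a
 * placeholder and the snoop-registration step is omitted. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

struct amdgpu_snoop_snapshot_ioctl {
	uint64_t gpu_virtual_address;
	uint64_t size;
	uint64_t uptr;
};

#define AMDGPU_SNOOP_SNAPSHOT \
	_IOWR('A', 0x60, struct amdgpu_snoop_snapshot_ioctl)	/* made-up code */

int main(int argc, char **argv)
{
	uint64_t gpu_va = argc > 1 ? strtoull(argv[1], NULL, 0) : 0;
	uint64_t size = 4096;
	void *buf = malloc(size);

	/* Variant 1: snapshot a GPU VA range through the proposed ioctl. */
	int fd = open("/dev/dri/card0", O_RDWR);
	struct amdgpu_snoop_snapshot_ioctl req = {
		.gpu_virtual_address = gpu_va,
		.size = size,
		.uptr = (uint64_t)(uintptr_t)buf,
	};
	if (fd >= 0 && ioctl(fd, AMDGPU_SNOOP_SNAPSHOT, &req) == 0)
		printf("snapshotted %llu bytes at GPU VA 0x%llx\n",
		       (unsigned long long)size, (unsigned long long)gpu_va);

	/* Variant 2: with the card0-snoop file the GPU VA is simply a file
	 * offset, so a plain pread() replaces the ioctl entirely. */
	int snoop = open("/dev/dri/card0-snoop", O_RDONLY);
	if (snoop >= 0)
		pread(snoop, buf, size, gpu_va);

	free(buf);
	return 0;
}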
>
> > > Again, not looking to rehash this debate. AMDGPU has had register level
> > > debugfs access for nearly two years now. At the point you can write to
> > > hardware registers you have to be root anyways. I could just as easily load
> > > a tainted AMDGPU driver if I wanted to at that point which then achieves the
> > > same debugfs-free badness for me.
> >
> > You also have the IOMMU that restricts what you can do. Even if you can
> > write registers to program a DMA operation you are still likely behind an
> > IOMMU and thus you can't write just anywhere (unless the IOMMU is disabled
> > or in passthrough, i.e. a 1:1 mapping).
>
> Fair point. All I was saying is at the point you have MMIO access you can
> typically do bad things if you wanted to (like crash the system).
>
> > I am saying you can have access to this memory with a sane API and not
> > with something insane. What is wrong with what I outlined above?
>
> Well aside from the fact it doesn't cover all mappings, I can't see how it's
> materially different than what I was already doing. For this to work I'd
> have to be able to mmap objects from another process into the debugger. I
> don't know if drm is really designed for that sort of flow and I suspect it
> would take a lot of work to add that.
The amount of work does not matter. We are aiming for a sane solution, not
an insane thing. We are aiming for something that is safe from a security
point of view. We are aiming for a design that lowers the barrier (i.e. no
need to be root to debug/trace the GPU). This is what matters.

Quite frankly, what I outline is not a lot of work. The snoop device file
scheme, for instance, is pretty easy to achieve. Every time there is a
change to a ttm object or to the GPU virtual address space you would need
to update any mmap of that file accordingly. Overall I would say this is in
the 500-1000 lines of code range.

GPU device memory is slightly harder; you probably need to do it as a
snapshot, i.e. you copy the device memory to system memory and map the
system memory.
>
> > > At the point I'm root I can attach to any process, change memory contents,
> > > do whatever I want. Unless the kernel has such broken vfs security that it
> > > allows non-root users to read/write debugfs/etc it shouldn't be a problem.
> >
> > With sane API you would not need to be root to debug/trace GPU activity of
> > your own process like gdb.
>
> You need to be root to read GPU registers though (simply reading some
> registers can cause a GPU hang). So if you want to see which waves are
> active, or the contents of a ring (or its IBs), you can't do that as a
> normal user. There's a very tiny subset of registers you can access through
> libdrm and often we need more than that (especially for npi work).
>
> Indeed we recommend that on devel machines people +s umr so they can invoke
> it from normal user processes. You wouldn't normally install umr unless you
> were a developer anyways so for customers/users it's not an issue.
>
> It's not exactly as confined as all-software which is where some compromises
> are made.
We want things like debuggers to work out of the box on any distribution,
so the design must be sane and you should not need special modules, special
kernel config options, or to be root. There is a way to do what you want
without any of these requirements, and thus that is what you should aim
for.
>
> > > > > The existing tracer patch "works" but it's not ideal and the maintainers of
> > > > > the AMDGPU driver are legitimately desiring a better solution.
> > > >
> > > > Again re-design what you are trying to do. There is no legitimate need to
> > > > track individual page mapping. All you need is access to all buffer object
> > > > a process have created and to be inform on anything a process do through
> > > > the amdgpu device file.
> > >
> > > Again, the problem is that the hardware is programmed with the dma
> > > address. So if I want to correlate what the hardware is doing with what
> > > the process is doing, I need the mappings.
> >
> > You only need the GPU virtual address and you can get it with the API i outline.
>
> Which is effectively what my tracer is. But it (and your design) can't
> translate all mappings, which is why we engaged with the iommu list in
> the first place.
My design does not need the reverse mapping at all; that is my point.
There is no need to engage the iommu folks and no need to ask them for
anything, as you have all the information from inside the amdgpu driver.
>
> Adding drm support to allow another process to GEM_MAP BOs (and a search
> ioctl) might work but seems like a lot more work (and will likely reduce
> performance) than the tracer and has the same limitation. And ultimately we
> find direct memory access useful in NPI work so that's unlikely to be
> removed.
You have no proof that it would be slower, and I outlined the special file
idea that avoids any ioctl.
>
> Thanks for your time Jerome. I do understand where you're coming from, but
> I'm not sure a drm update is either politically or technically feasible,
> and we need this functionality (limited though it is) sooner rather than
> later.
You have to understand that the kernel is not about compromising design so
that people can meet their schedule. Sane design does matter, and if a sane
design requires a lot of work then you should size your team accordingly to
face the challenges.

What you want to do can be done with very little code and with no changes
outside the amdgpu kernel driver, so I do not see any reason not to do so.
Jérôme