Message-ID: <53CD1FB6.1000602@amd.com>
Date: Mon, 21 Jul 2014 17:12:06 +0300
From: Oded Gabbay
Subject: Re: [PATCH v2 00/25] AMDKFD kernel driver
In-Reply-To: <53CD17FD.3000908@vodafone.de>
To: Christian König, Jerome Glisse
Cc: David Airlie, Alex Deucher, Andrew Morton, John Bridgman,
 Joerg Roedel, Andrew Lewycky, Michel Dänzer, Ben Goz,
 Alexey Skidanov, Evgeny Pinchuk, "linux-kernel@vger.kernel.org",
 "dri-devel@lists.freedesktop.org", linux-mm

On 21/07/14 16:39, Christian König wrote:
> On 21.07.2014 14:36, Oded Gabbay wrote:
>> On 20/07/14 20:46, Jerome Glisse wrote:
>>> On Thu, Jul 17, 2014 at 04:57:25PM +0300, Oded Gabbay wrote:
>>>> Forgot to cc the mailing list on the cover letter. Sorry.
>>>>
>>>> As a continuation of the existing discussion, here is a v2 patch
>>>> series restructured with a cleaner history and no
>>>> totally-different-early-versions of the code.
>>>>
>>>> Instead of 83 patches, there are now a total of 25 patches, where
>>>> 5 of them are modifications to the radeon driver and 18 of them
>>>> include only amdkfd code. No code goes away or is even modified
>>>> between patches; code is only added.
>>>>
>>>> The driver was renamed from radeon_kfd to amdkfd and moved to
>>>> reside under drm/radeon/amdkfd. This move was done to emphasize
>>>> the fact that this driver is an AMD-only driver at this point.
>>>> Having said that, we do foresee a generic HSA framework being
>>>> implemented in the future, and in that case we will adjust amdkfd
>>>> to work within that framework.
>>>>
>>>> As the amdkfd driver should support multiple AMD gfx drivers, we
>>>> want to keep it as a separate driver from radeon. Therefore, the
>>>> amdkfd code is contained in its own folder. The amdkfd folder was
>>>> put under the radeon folder because the only AMD gfx driver in the
>>>> Linux kernel at this point is the radeon driver. Having said that,
>>>> we will probably need to move it (maybe to be directly under drm)
>>>> after we integrate with additional AMD gfx drivers.
>>>>
>>>> For people who like to review using git, the v2 patch set is
>>>> located at:
>>>> http://cgit.freedesktop.org/~gabbayo/linux/log/?h=kfd-next-3.17-v2
>>>>
>>>> Written by Oded Gabbay
>>>
>>> So, quick comments before I finish going over all the patches.
>>> There are many things that need more documentation, especially as
>>> right now there is no userspace I can go look at.
>> So, quick comments on some of your questions, but first of all,
>> thanks for the time you dedicated to reviewing the code.
>>>
>>> There are a few show stoppers. The biggest one is GPU memory
>>> pinning; this is a big no, and it would need serious arguments for
>>> any hope of convincing me on that side.
>> We only do GPU memory pinning for kernel objects. There are no
>> userspace objects that are pinned in GPU memory in our driver. If
>> that is the case, is it still a show stopper?
>>
>> The kernel objects are:
>> - pipelines (4 per device)
>> - mqd per HIQ (only 1 per device)
>> - mqd per userspace queue. On KV, we support up to 1K queues per
>> process, for a total of 512K queues. Each mqd is 151 bytes, but the
>> allocation is done with 256-byte alignment, so the total *possible*
>> memory is 512K queues x 256 bytes = 128MB.
>> - kernel queue (only 1 per device)
>> - fence address for kernel queue
>> - runlists for the CP (1 or 2 per device)
>
> The main questions here are whether it is avoidable to pin down the
> memory, and whether the memory is pinned down at driver load, by
> request from userspace, or by anything else.
>
> As far as I can see, only the "mqd per userspace queue" might be a
> bit questionable; everything else sounds reasonable.
>
> Christian.

Most of the pin-downs are done on device initialization.
The "mqd per userspace queue" pinning is done per userspace queue
creation. However, as I said, it has an upper limit of 128MB on KV,
and considering the 2GB local memory, I think it is OK.
The runlists are also pinned on userspace queue creation/deletion,
but we only have 1 or 2 runlists per device, so it is not that bad.

Oded

>
>>>
>>> It might be better to add a drivers/gpu/drm/amd directory and add
>>> the common stuff there.
>>>
>>> Given that this is not intended to be the final HSA API, AFAICT, I
>>> would say it is far better to avoid the whole kfd module and add
>>> ioctls to radeon instead. This would avoid crazy communication
>>> between radeon and kfd.
>>>
>>> The whole aperture business needs some serious explanation.
>>> Especially as you want to use userspace addresses, there is nothing
>>> to prevent a userspace program from allocating things at the
>>> addresses you reserve for lds, scratch, ... The only sane way would
>>> be to move those lds and scratch apertures inside the virtual
>>> address range reserved for the kernel (see kernel memory map).
>>>
>>> The whole business of locking performance counters for exclusive
>>> per-process access is a big NO. Which leads me to the questionable
>>> usefulness of the userspace command ring.
>> That's like saying: "Which leads me to the questionable usefulness
>> of HSA". I find it analogous to a situation where a network
>> maintainer NACKs a driver for a network card which is slower than a
>> different network card. It doesn't seem reasonable that this
>> situation would happen. He would still put both drivers in the
>> kernel because people want to use the H/W and its features. So, I
>> don't think this is a valid reason to NACK the driver.
>>
>>> I only see issues with that. First and foremost, I would need to
>>> see solid figures showing that a kernel ioctl or syscall has an
>>> overhead that is measurable in any meaningful way against a simple
>>> function call. I know the userspace command ring is a big marketing
>>> feature that pleases ignorant userspace programmers. But really,
>>> this only brings issues and absolutely no upside, AFAICT.
>> Really? You think that doing a context switch to kernel space, with
>> all its overhead, is _not_ more expensive than just calling a
>> function in userspace which only puts a buffer on a ring and writes
>> a doorbell?
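
For reference, a minimal sketch of the kind of userspace submission
function being discussed here, assuming the doorbell model described
in the cover letter below (a power-of-two ring in user memory plus one
dword of the mapped doorbell page). All names are hypothetical
illustrations, not the libhsakmt API:

	#include <stdint.h>

	struct user_queue {
		uint32_t *ring;              /* ring buffer in user memory  */
		uint32_t size_dw;            /* ring size in dwords (pow 2) */
		uint32_t wptr;               /* software write pointer      */
		volatile uint32_t *doorbell; /* mapped doorbell dword       */
	};

	static void submit_packet(struct user_queue *q,
				  const uint32_t *pkt, uint32_t ndw)
	{
		uint32_t i;

		/* Copy the command packet into the ring, wrapping. */
		for (i = 0; i < ndw; i++)
			q->ring[(q->wptr + i) & (q->size_dw - 1)] = pkt[i];
		q->wptr += ndw;

		/* Order the packet stores before the doorbell store. */
		__sync_synchronize();

		/* GPU HW intercepts this store; no syscall involved. */
		*q->doorbell = q->wptr;
	}

The disagreement is whether replacing that final store with an ioctl,
as suggested below, would add overhead that is measurable in practice.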
>>>
>>> So I would rather see a very simple ioctl that writes the doorbell
>>> and might do more than that in the case of ring/queue overcommit,
>>> where it would first have to wait for a free ring/queue to schedule
>>> stuff. This would also allow a sane implementation of things like
>>> performance counters that could be acquired by the kernel for the
>>> duration of a job submitted by userspace. While still not optimal,
>>> this would be better than userspace locking.
>>>
>>>
>>> I might have more thoughts once I am done with all the patches.
>>>
>>> Cheers,
>>> Jérôme
>>>
>>>>
>>>> Original Cover Letter:
>>>>
>>>> This patch set implements a Heterogeneous System Architecture
>>>> (HSA) driver for radeon-family GPUs.
>>>> HSA allows different processor types (CPUs, DSPs, GPUs, etc.) to
>>>> share system resources more effectively via HW features including
>>>> shared pageable memory, userspace-accessible work queues, and
>>>> platform-level atomics. In addition to the memory protection
>>>> mechanisms in GPUVM and IOMMUv2, the Sea Islands family of GPUs
>>>> also performs HW-level validation of commands passed in through
>>>> the queues (aka rings).
>>>>
>>>> The code in this patch set is intended to serve both as a sample
>>>> driver for other HSA-compatible hardware devices and as a
>>>> production driver for radeon-family processors. The code is
>>>> architected to support multiple CPUs, each with connected GPUs,
>>>> although the current implementation focuses on a single
>>>> Kaveri/Berlin APU and works alongside the existing radeon kernel
>>>> graphics driver (kgd).
>>>> AMD GPUs designed for use with HSA (Sea Islands and up) share some
>>>> hardware functionality between HSA compute and regular gfx/compute
>>>> (memory, interrupts, registers), while other functionality has
>>>> been added specifically for HSA compute (an HW scheduler for
>>>> virtualized compute rings). All shared hardware is owned by the
>>>> radeon graphics driver, and an interface between kfd and kgd
>>>> allows the kfd to make use of those shared resources, while
>>>> HSA-specific functionality is managed directly by kfd by
>>>> submitting packets into an HSA-specific command queue (the "HIQ").
>>>>
>>>> During kfd module initialization a char device node (/dev/kfd) is
>>>> created (surviving until module exit), with ioctls for queue
>>>> creation & management, and data structures are initialized for
>>>> managing HSA device topology.
>>>> The rest of the initialization is driven by calls from the radeon
>>>> kgd at the following points (sketched below):
>>>>
>>>> - radeon_init (kfd_init)
>>>> - radeon_exit (kfd_fini)
>>>> - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
>>>> - radeon_driver_unload_kms (kfd_device_fini)
>>>>
>>>> During the probe and init processing, per-device data structures
>>>> are established which connect to the associated graphics kernel
>>>> driver. This information is exposed to userspace via sysfs, along
>>>> with a version number allowing userspace to determine if a
>>>> topology change has occurred while it was reading from sysfs.
>>>> The interface between kfd and kgd also allows the kfd to request
>>>> buffer management services from kgd, and allows kgd to route
>>>> interrupt requests to kfd code, since the interrupt block is
>>>> shared between the regular graphics/compute and HSA compute
>>>> subsystems in the GPU.
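
To make the init flow above concrete, here is a minimal sketch of the
kgd -> kfd entry points. Only the function names and call points come
from the cover letter; the signatures are assumptions for illustration:

	/*
	 * Hooks exported by amdkfd and driven by radeon (kgd).
	 * Signatures are illustrative assumptions only.
	 */
	struct kfd_dev;
	struct pci_dev;

	int kfd_init(void);                        /* from radeon_init */
	void kfd_fini(void);                       /* from radeon_exit */

	/* from radeon_driver_load_kms */
	struct kfd_dev *kfd_device_probe(struct pci_dev *pdev);
	int kfd_device_init(struct kfd_dev *kfd);

	/* from radeon_driver_unload_kms */
	void kfd_device_fini(struct kfd_dev *kfd);

Everything else (buffer management, interrupt routing) goes through
the kfd <-> kgd interface rather than direct hardware access, since
radeon owns the shared hardware.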
>>>>
>>>> The kfd code works with an open source usermode library
>>>> ("libhsakmt") which is in the final stages of IP review and should
>>>> be published in a separate repo over the next few days.
>>>> The code operates in one of three modes, selectable via the
>>>> sched_policy module parameter:
>>>>
>>>> - sched_policy=0 uses a hardware scheduler running in the MEC
>>>> block within CP, and allows oversubscription (more queues than HW
>>>> slots)
>>>> - sched_policy=1 also uses HW scheduling but does not allow
>>>> oversubscription, so create_queue requests fail when we run out of
>>>> HW slots
>>>> - sched_policy=2 does not use HW scheduling, so the driver
>>>> manually assigns queues to HW slots by programming registers
>>>>
>>>> The "no HW scheduling" option is for debug & new hardware bringup
>>>> only, so it has less test coverage than the other options. The
>>>> default in the current code is "HW scheduling without
>>>> oversubscription", since that is where we have the most test
>>>> coverage, but we expect to change the default to "HW scheduling
>>>> with oversubscription" after further testing. This effectively
>>>> removes the HW limit on the number of work queues available to
>>>> applications.
>>>>
>>>> Programs running on the GPU are associated with an address space
>>>> through the VMID field, which is translated to a unique PASID at
>>>> access time via a set of 16 VMID-to-PASID mapping registers. The
>>>> available VMIDs (currently 16) are partitioned (under control of
>>>> the radeon kgd) between current gfx/compute and HSA compute, with
>>>> each getting 8 in the current code. The VMID-to-PASID mapping
>>>> registers are updated by the HW scheduler when used, and by driver
>>>> code if HW scheduling is not being used.
>>>> The Sea Islands compute queues use a new "doorbell" mechanism
>>>> instead of the earlier kernel-managed write pointer registers.
>>>> Doorbells use a separate BAR dedicated for this purpose, and pages
>>>> within the doorbell aperture are mapped to userspace (each page
>>>> mapped to only one user address space). Writes to the doorbell
>>>> aperture are intercepted by GPU hardware, allowing userspace code
>>>> to safely manage work queues (rings) without requiring a kernel
>>>> call for every ring update.
>>>> The first step for an application process is to open the kfd
>>>> device. Calls to open create a kfd "process" structure only for
>>>> the first thread of the process. Subsequent open calls are checked
>>>> to see if they are from processes using the same mm_struct and, if
>>>> so, don't do anything. The kfd per-process data lives as long as
>>>> the mm_struct exists. Each mm_struct is associated with a unique
>>>> PASID, allowing the IOMMUv2 to make userspace process memory
>>>> accessible to the GPU.
>>>> The next step is for the application to collect topology
>>>> information via sysfs. This gives userspace enough information to
>>>> identify specific nodes (processors) in subsequent queue
>>>> management calls. Application processes can create queues on
>>>> multiple processors, and processors support queues from multiple
>>>> processes.
>>>> At this point the application can create work queues in userspace
>>>> memory and pass them through the usermode library to kfd to have
>>>> them mapped onto HW queue slots, so that commands written to the
>>>> queues can be executed by the GPU. Queue operations specify a
>>>> processor node, and so the bulk of this code is device-specific.
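
As an illustration of that application flow, here is a minimal
userspace sketch: open /dev/kfd, then ask kfd to map a userspace ring
onto a HW queue slot on a chosen topology node. The ioctl number and
the argument struct below are hypothetical placeholders, not the
actual definitions from include/uapi/linux/kfd_ioctl.h:

	#include <fcntl.h>
	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <unistd.h>

	/* Hypothetical argument layout; the real one lives in
	 * include/uapi/linux/kfd_ioctl.h. */
	struct kfd_create_queue_args {
		uint64_t ring_base;     /* userspace ring buffer address */
		uint32_t ring_size;     /* ring size in bytes            */
		uint32_t node_id;       /* processor node from sysfs     */
		uint64_t doorbell_off;  /* returned: offset to mmap()    */
	};

	/* Hypothetical ioctl number for illustration only. */
	#define KFD_IOC_CREATE_QUEUE \
		_IOWR('K', 1, struct kfd_create_queue_args)

	static int create_queue(void *ring, uint32_t size, uint32_t node,
				uint64_t *doorbell_off)
	{
		struct kfd_create_queue_args args = {
			.ring_base = (uintptr_t)ring,
			.ring_size = size,
			.node_id   = node,
		};
		/* First open in a process creates the kfd "process"
		 * structure; later opens from the same mm_struct reuse it. */
		int fd = open("/dev/kfd", O_RDWR);

		if (fd < 0)
			return -1;
		if (ioctl(fd, KFD_IOC_CREATE_QUEUE, &args) < 0) {
			close(fd);
			return -1;
		}
		/* mmap() this offset, then write doorbells directly. */
		*doorbell_off = args.doorbell_off;
		return fd;
	}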
>>>> Written by John Bridgman
>>>>
>>>>
>>>> Alexey Skidanov (1):
>>>>   amdkfd: Implement the Get Process Aperture IOCTL
>>>>
>>>> Andrew Lewycky (3):
>>>>   amdkfd: Add basic modules to amdkfd
>>>>   amdkfd: Add interrupt handling module
>>>>   amdkfd: Implement the Set Memory Policy IOCTL
>>>>
>>>> Ben Goz (8):
>>>>   amdkfd: Add queue module
>>>>   amdkfd: Add mqd_manager module
>>>>   amdkfd: Add kernel queue module
>>>>   amdkfd: Add module parameter of scheduling policy
>>>>   amdkfd: Add packet manager module
>>>>   amdkfd: Add process queue manager module
>>>>   amdkfd: Add device queue manager module
>>>>   amdkfd: Implement the create/destroy/update queue IOCTLs
>>>>
>>>> Evgeny Pinchuk (3):
>>>>   amdkfd: Add topology module to amdkfd
>>>>   amdkfd: Implement the Get Clock Counters IOCTL
>>>>   amdkfd: Implement the PMC Acquire/Release IOCTLs
>>>>
>>>> Oded Gabbay (10):
>>>>   mm: Add kfd_process pointer to mm_struct
>>>>   drm/radeon: reduce number of free VMIDs and pipes in KV
>>>>   drm/radeon/cik: Don't touch int of pipes 1-7
>>>>   drm/radeon: Report doorbell configuration to amdkfd
>>>>   drm/radeon: adding synchronization for GRBM GFX
>>>>   drm/radeon: Add radeon <--> amdkfd interface
>>>>   Update MAINTAINERS and CREDITS files with amdkfd info
>>>>   amdkfd: Add IOCTL set definitions of amdkfd
>>>>   amdkfd: Add amdkfd skeleton driver
>>>>   amdkfd: Add binding/unbinding calls to amd_iommu driver
>>>>
>>>>  CREDITS                                            |    7 +
>>>>  MAINTAINERS                                        |   10 +
>>>>  drivers/gpu/drm/radeon/Kconfig                     |    2 +
>>>>  drivers/gpu/drm/radeon/Makefile                    |    3 +
>>>>  drivers/gpu/drm/radeon/amdkfd/Kconfig              |   10 +
>>>>  drivers/gpu/drm/radeon/amdkfd/Makefile             |   14 +
>>>>  drivers/gpu/drm/radeon/amdkfd/cik_mqds.h           |  185 +++
>>>>  drivers/gpu/drm/radeon/amdkfd/cik_regs.h           |  220 ++++
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c       |  123 ++
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c        |  518 +++++++++
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_crat.h           |  294 +++++
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_device.c         |  254 ++++
>>>>  .../drm/radeon/amdkfd/kfd_device_queue_manager.c   |  985 ++++++++++++
>>>>  .../drm/radeon/amdkfd/kfd_device_queue_manager.h   |  101 ++
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c       |  264 +++++
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_interrupt.c      |  161 +++
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c   |  305 +++++
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h   |   66 ++
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_module.c         |  131 +++
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c    |  291 +++++
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h    |   54 +
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_packet_manager.c |  488 ++++++++
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c          |   97 ++
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h    |  682 +++++++++++
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h    |  107 ++
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_priv.h           |  466 ++++++++
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_process.c        |  405 +++++++
>>>>  .../drm/radeon/amdkfd/kfd_process_queue_manager.c  |  343 ++++++
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_queue.c          |  109 ++
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_topology.c       | 1207 ++++++++++++++++
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_topology.h       |  168 +++
>>>>  drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c         |   96 ++
>>>>  drivers/gpu/drm/radeon/cik.c                       |  154 +--
>>>>  drivers/gpu/drm/radeon/cik_reg.h                   |   65 ++
>>>>  drivers/gpu/drm/radeon/cikd.h                      |   51 +-
>>>>  drivers/gpu/drm/radeon/radeon.h                    |    9 +
>>>>  drivers/gpu/drm/radeon/radeon_device.c             |   32 +
>>>>  drivers/gpu/drm/radeon/radeon_drv.c                |    5 +
>>>>  drivers/gpu/drm/radeon/radeon_kfd.c                |  566 +++++++++
>>>>  drivers/gpu/drm/radeon/radeon_kfd.h                |  119 ++
>>>>  drivers/gpu/drm/radeon/radeon_kms.c                |    7 +
>>>>  include/linux/mm_types.h                           |   14 +
>>>>  include/uapi/linux/kfd_ioctl.h                     |  133 +++
>>>>  43 files changed, 9226 insertions(+), 95 deletions(-)
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/Kconfig
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/Makefile
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/cik_mqds.h
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/cik_regs.h
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_crat.h
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_device_queue_manager.c
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_device_queue_manager.h
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_interrupt.c
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_packet_manager.c
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_process.c
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_process_queue_manager.c
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_queue.c
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_topology.c
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_topology.h
>>>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c
>>>>  create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.c
>>>>  create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.h
>>>>  create mode 100644 include/uapi/linux/kfd_ioctl.h
>>>>
>>>> --
>>>> 1.9.1
>>>>
>>
>