Date: Mon, 21 Jul 2014 15:36:49 +0300
From: Oded Gabbay
Subject: Re: [PATCH v2 00/25] AMDKFD kernel driver
Message-ID: <53CD0961.4070505@amd.com>
References: <53C7D645.3070607@amd.com> <20140720174652.GE3068@gmail.com>
In-Reply-To: <20140720174652.GE3068@gmail.com>
To: Jerome Glisse
Cc: David Airlie, Alex Deucher, Andrew Morton, John Bridgman,
 Joerg Roedel, Andrew Lewycky, Christian König, Michel Dänzer,
 Ben Goz, Alexey Skidanov, Evgeny Pinchuk, linux-kernel@vger.kernel.org,
 dri-devel@lists.freedesktop.org, linux-mm

On 20/07/14 20:46, Jerome Glisse wrote:
> On Thu, Jul 17, 2014 at 04:57:25PM +0300, Oded Gabbay wrote:
>> Forgot to cc mailing list on cover letter. Sorry.
>>
>> As a continuation to the existing discussion, here is a v2 patch series
>> restructured with a cleaner history and no totally-different-early-versions
>> of the code.
>>
>> Instead of 83 patches, there are now a total of 25 patches, where 5 of
>> them are modifications to the radeon driver and 18 of them contain only
>> amdkfd code. No code goes away or is even modified between patches;
>> code is only added.
>>
>> The driver was renamed from radeon_kfd to amdkfd and moved to reside
>> under drm/radeon/amdkfd. This move was done to emphasize the fact that
>> this driver is an AMD-only driver at this point. Having said that, we
>> do foresee a generic hsa framework being implemented in the future
>> and, in that case, we will adjust amdkfd to work within that framework.
>>
>> As the amdkfd driver should support multiple AMD gfx drivers, we want
>> to keep it as a separate driver from radeon. Therefore, the amdkfd code
>> is contained in its own folder. The amdkfd folder was put under the
>> radeon folder because the only AMD gfx driver in the Linux kernel at
>> this point is the radeon driver. Having said that, we will probably
>> need to move it (maybe to be directly under drm) after we integrate
>> with additional AMD gfx drivers.
>>
>> For people who like to review using git, the v2 patch set is located at:
>> http://cgit.freedesktop.org/~gabbayo/linux/log/?h=kfd-next-3.17-v2
>>
>> Written by Oded Gabbay
>
> So, quick comments before I finish going over all patches. There are
> many things that need more documentation, especially as right now there
> is no userspace I can go look at.

So, quick comments on some of your questions, but first of all, thanks
for the time you dedicated to reviewing the code.

> There are a few show stoppers. The biggest one is gpu memory pinning;
> this is a big no that would need serious arguments for any hope of
> convincing me on that side.

We only do gpu memory pinning for kernel objects. There are no userspace
objects that are pinned in gpu memory in our driver. If that is the case,
is it still a show stopper?

The kernel objects are:
- pipelines (4 per device)
- mqd per hiq (only 1 per device)
- mqd per userspace queue. On KV, we support up to 1K queues per process,
  for a total of 512K queues. Each mqd is 151 bytes, but the allocation
  is done with 256-byte alignment, so the total *possible* memory is
  128MB (worked out in the sketch below).
- kernel queue (only 1 per device)
- fence address for kernel queue
- runlists for the CP (1 or 2 per device)
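To make that worst-case figure concrete, here is the arithmetic as a
small sketch (not driver code; the 512-process count is inferred from
the 1K-per-process and 512K totals above):

#define MAX_PROCESSES		512UL	/* inferred: 512K / 1K per process */
#define MAX_QUEUES_PER_PROC	1024UL
#define MQD_ALLOC_BYTES		256UL	/* each 151-byte mqd, 256-aligned */

/* 512 * 1024 * 256 = 134217728 bytes = 128 MiB */
static const unsigned long max_pinned_mqd_bytes =
	MAX_PROCESSES * MAX_QUEUES_PER_PROC * MQD_ALLOC_BYTES;

This is the upper bound; mqd memory is only pinned as userspace queues
are actually created.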
> It might be better to add a drivers/gpu/drm/amd directory and add common
> stuff there.
>
> Given that this is not intended to be the final HSA api AFAICT, I would
> say it is far better to avoid the whole kfd module and to add ioctls to
> radeon. This would avoid crazy communication between radeon and kfd.
>
> The whole aperture business needs some serious explanation. Especially
> as you want to use userspace addresses, there is nothing to prevent a
> userspace program from allocating things at the addresses you reserve
> for lds, scratch, ... The only sane way would be to move those lds and
> scratch apertures inside the virtual address range reserved for the
> kernel (see kernel memory map).
>
> The whole business of locking performance counters for exclusive
> per-process access is a big NO. Which leads me to the questionable
> usefulness of the userspace command ring.

That's like saying: "Which leads me to the questionable usefulness of
HSA". I find it analogous to a situation where a network maintainer NACKs
a driver for a network card because it is slower than a different network
card. It doesn't seem reasonable that such a situation would happen; he
would still put both drivers in the kernel, because people want to use
the H/W and its features. So I don't think this is a valid reason to NACK
the driver.

> I only see issues with that. First and foremost, I would need to see
> solid figures showing that a kernel ioctl or syscall has an overhead
> that is measurable in any meaningful way against a simple function
> call. I know the userspace command ring is a big marketing feature that
> pleases ignorant userspace programmers. But really, this only brings
> issues and absolutely no upside afaict.

Really? You think that doing a context switch to kernel space, with all
its overhead, is _not_ more expensive than just calling a function in
userspace which only puts a buffer on a ring and writes a doorbell?
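For reference, the userspace submission path being argued about amounts
to a couple of stores. A minimal sketch, assuming a ring buffer and
doorbell page already mapped by the driver (all names are illustrative,
not the actual library API, and one 64-bit word stands in for a real
command packet):

#include <stdint.h>

struct user_queue {
	uint64_t *ring;              /* ring buffer in process memory */
	volatile uint32_t *doorbell; /* mapped page of the doorbell BAR */
	uint32_t wptr;               /* software write pointer */
	uint32_t ring_mask;          /* ring size in entries, minus 1 */
};

static void submit(struct user_queue *q, uint64_t pkt)
{
	q->ring[q->wptr & q->ring_mask] = pkt; /* 1. store the command */
	__sync_synchronize();                  /* 2. order it before ... */
	q->wptr++;
	*q->doorbell = q->wptr;                /* 3. ... the doorbell write:
	                                        * one MMIO store, no
	                                        * kernel entry */
}

An ioctl-based submission, as suggested below, would replace step 3 (or
the whole sequence) with a system call; that per-submission entry into
the kernel is exactly the overhead under dispute.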
> So I would rather see a very simple ioctl that writes the doorbell and
> might do more than that in the case of ring/queue overcommit, where it
> would first have to wait for a free ring/queue to schedule stuff. This
> would also allow a sane implementation of things like performance
> counters, which could be acquired by the kernel for the duration of a
> job submitted by userspace. While still not optimal, this would be
> better than userspace locking.
>
>
> I might have more thoughts once I am done with all the patches.
>
> Cheers,
> Jérôme
>
>>
>> Original Cover Letter:
>>
>> This patch set implements a Heterogeneous System Architecture (HSA)
>> driver for radeon-family GPUs.
>> HSA allows different processor types (CPUs, DSPs, GPUs, etc.) to share
>> system resources more effectively via HW features including shared
>> pageable memory, userspace-accessible work queues, and platform-level
>> atomics. In addition to the memory protection mechanisms in GPUVM and
>> IOMMUv2, the Sea Islands family of GPUs also performs HW-level
>> validation of commands passed in through the queues (aka rings).
>>
>> The code in this patch set is intended to serve both as a sample
>> driver for other HSA-compatible hardware devices and as a production
>> driver for radeon-family processors. The code is architected to
>> support multiple CPUs, each with connected GPUs, although the current
>> implementation focuses on a single Kaveri/Berlin APU, and works
>> alongside the existing radeon kernel graphics driver (kgd).
>> AMD GPUs designed for use with HSA (Sea Islands and up) share some
>> hardware functionality between HSA compute and regular gfx/compute
>> (memory, interrupts, registers), while other functionality has been
>> added specifically for HSA compute (hw scheduler for virtualized
>> compute rings).
>> All shared hardware is owned by the radeon graphics driver, and an
>> interface between kfd and kgd allows the kfd to make use of those
>> shared resources, while HSA-specific functionality is managed directly
>> by kfd by submitting packets into an HSA-specific command queue (the
>> "HIQ").
>>
>> During kfd module initialization a char device node (/dev/kfd) is
>> created (surviving until module exit), with ioctls for queue creation
>> & management, and data structures are initialized for managing HSA
>> device topology.
>> The rest of the initialization is driven by calls from the radeon kgd
>> at the following points:
>>
>> - radeon_init (kfd_init)
>> - radeon_exit (kfd_fini)
>> - radeon_driver_load_kms (kfd_device_probe, kfd_device_init)
>> - radeon_driver_unload_kms (kfd_device_fini)
>>
>> During the probe and init processing, per-device data structures are
>> established which connect to the associated graphics kernel driver.
>> This information is exposed to userspace via sysfs, along with a
>> version number allowing userspace to determine if a topology change
>> has occurred while it was reading from sysfs.
>> The interface between kfd and kgd also allows the kfd to request
>> buffer management services from kgd, and allows kgd to route interrupt
>> requests to kfd code, since the interrupt block is shared between
>> regular graphics/compute and HSA compute subsystems in the GPU.
>>
>> The kfd code works with an open source usermode library ("libhsakmt")
>> which is in the final stages of IP review and should be published in a
>> separate repo over the next few days.
>> The code operates in one of three modes, selectable via the
>> sched_policy module parameter:
>>
>> - sched_policy=0 uses a hardware scheduler running in the MEC block
>> within CP, and allows oversubscription (more queues than HW slots)
>> - sched_policy=1 also uses HW scheduling but does not allow
>> oversubscription, so create_queue requests fail when we run out of HW
>> slots
>> - sched_policy=2 does not use HW scheduling, so the driver manually
>> assigns queues to HW slots by programming registers
>>
>> The "no HW scheduling" option is for debug & new hardware bringup
>> only, so it has less test coverage than the other options. The default
>> in the current code is "HW scheduling without oversubscription", since
>> that is where we have the most test coverage, but we expect to change
>> the default to "HW scheduling with oversubscription" after further
>> testing. This effectively removes the HW limit on the number of work
>> queues available to applications.
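As a minimal illustration of how such a module parameter is
conventionally declared in kernel code (a sketch only; the symbol names
are assumptions, not necessarily the driver's actual ones):

#include <linux/module.h>
#include <linux/moduleparam.h>

/* Default is 1, HW scheduling without oversubscription, per the above. */
static int sched_policy = 1;
module_param(sched_policy, int, 0444);
MODULE_PARM_DESC(sched_policy,
	"0 = HW scheduling with oversubscription, "
	"1 = HW scheduling without oversubscription (default), "
	"2 = no HW scheduling (debug/bringup only)");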
>>
>> Programs running on the GPU are associated with an address space
>> through the VMID field, which is translated to a unique PASID at
>> access time via a set of 16 VMID-to-PASID mapping registers. The
>> available VMIDs (currently 16) are partitioned (under control of the
>> radeon kgd) between current gfx/compute and HSA compute, with each
>> getting 8 in the current code. The VMID-to-PASID mapping registers are
>> updated by the HW scheduler when used, and by driver code if HW
>> scheduling is not being used.
>> The Sea Islands compute queues use a new "doorbell" mechanism instead
>> of the earlier kernel-managed write pointer registers. Doorbells use a
>> separate BAR dedicated for this purpose, and pages within the doorbell
>> aperture are mapped to userspace (each page mapped to only one user
>> address space). Writes to the doorbell aperture are intercepted by GPU
>> hardware, allowing userspace code to safely manage work queues (rings)
>> without requiring a kernel call for every ring update.
>> The first step for an application process is to open the kfd device.
>> Calls to open create a kfd "process" structure only for the first
>> thread of the process. Subsequent open calls are checked to see if
>> they are from processes using the same mm_struct and, if so, don't do
>> anything. The kfd per-process data lives as long as the mm_struct
>> exists. Each mm_struct is associated with a unique PASID, allowing the
>> IOMMUv2 to make userspace process memory accessible to the GPU.
>> The next step is for the application to collect topology information
>> via sysfs. This gives userspace enough information to be able to
>> identify specific nodes (processors) in subsequent queue management
>> calls. Application processes can create queues on multiple processors,
>> and processors support queues from multiple processes.
>> At this point the application can create work queues in userspace
>> memory and pass them through the usermode library to kfd to have them
>> mapped onto HW queue slots, so that commands written to the queues can
>> be executed by the GPU. Queue operations specify a processor node, and
>> so the bulk of this code is device-specific.
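The "one kfd process per mm_struct" open() behaviour described above
boils down to a lookup keyed on the caller's mm. A rough sketch under
assumed names (the structure, list, and helpers here are hypothetical,
and locking is omitted for brevity):

#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/list.h>

/* Hypothetical per-process structure, for illustration only. */
struct kfd_process {
	struct list_head node;
	struct mm_struct *mm;	/* key: one kfd_process per mm_struct */
	int pasid;		/* unique PASID bound to this mm */
};

static LIST_HEAD(kfd_processes);

static struct kfd_process *kfd_get_process(struct task_struct *task)
{
	struct kfd_process *p;

	/* A subsequent open() from any thread of the same process
	 * (i.e. the same mm_struct) reuses the existing structure. */
	list_for_each_entry(p, &kfd_processes, node)
		if (p->mm == task->mm)
			return p;

	/* First open() for this process: allocate the per-process data.
	 * The real driver would also bind a unique PASID to this mm for
	 * IOMMUv2 translation at this point. */
	p = kzalloc(sizeof(*p), GFP_KERNEL);
	if (!p)
		return NULL;
	p->mm = task->mm;
	list_add(&p->node, &kfd_processes);
	return p;
}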
>> Written by John Bridgman
>>
>>
>> Alexey Skidanov (1):
>>   amdkfd: Implement the Get Process Aperture IOCTL
>>
>> Andrew Lewycky (3):
>>   amdkfd: Add basic modules to amdkfd
>>   amdkfd: Add interrupt handling module
>>   amdkfd: Implement the Set Memory Policy IOCTL
>>
>> Ben Goz (8):
>>   amdkfd: Add queue module
>>   amdkfd: Add mqd_manager module
>>   amdkfd: Add kernel queue module
>>   amdkfd: Add module parameter of scheduling policy
>>   amdkfd: Add packet manager module
>>   amdkfd: Add process queue manager module
>>   amdkfd: Add device queue manager module
>>   amdkfd: Implement the create/destroy/update queue IOCTLs
>>
>> Evgeny Pinchuk (3):
>>   amdkfd: Add topology module to amdkfd
>>   amdkfd: Implement the Get Clock Counters IOCTL
>>   amdkfd: Implement the PMC Acquire/Release IOCTLs
>>
>> Oded Gabbay (10):
>>   mm: Add kfd_process pointer to mm_struct
>>   drm/radeon: reduce number of free VMIDs and pipes in KV
>>   drm/radeon/cik: Don't touch int of pipes 1-7
>>   drm/radeon: Report doorbell configuration to amdkfd
>>   drm/radeon: adding synchronization for GRBM GFX
>>   drm/radeon: Add radeon <--> amdkfd interface
>>   Update MAINTAINERS and CREDITS files with amdkfd info
>>   amdkfd: Add IOCTL set definitions of amdkfd
>>   amdkfd: Add amdkfd skeleton driver
>>   amdkfd: Add binding/unbinding calls to amd_iommu driver
>>
>>  CREDITS                                            |    7 +
>>  MAINTAINERS                                        |   10 +
>>  drivers/gpu/drm/radeon/Kconfig                     |    2 +
>>  drivers/gpu/drm/radeon/Makefile                    |    3 +
>>  drivers/gpu/drm/radeon/amdkfd/Kconfig              |   10 +
>>  drivers/gpu/drm/radeon/amdkfd/Makefile             |   14 +
>>  drivers/gpu/drm/radeon/amdkfd/cik_mqds.h           |  185 +++
>>  drivers/gpu/drm/radeon/amdkfd/cik_regs.h           |  220 ++++
>>  drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c       |  123 ++
>>  drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c        |  518 +++++++++
>>  drivers/gpu/drm/radeon/amdkfd/kfd_crat.h           |  294 +++++
>>  drivers/gpu/drm/radeon/amdkfd/kfd_device.c         |  254 ++++
>>  .../drm/radeon/amdkfd/kfd_device_queue_manager.c   |  985 ++++++++++++++
>>  .../drm/radeon/amdkfd/kfd_device_queue_manager.h   |  101 ++
>>  drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c       |  264 +++++
>>  drivers/gpu/drm/radeon/amdkfd/kfd_interrupt.c      |  161 +++
>>  drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c   |  305 +++++
>>  drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h   |   66 ++
>>  drivers/gpu/drm/radeon/amdkfd/kfd_module.c         |  131 +++
>>  drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c    |  291 +++++
>>  drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h    |   54 +
>>  drivers/gpu/drm/radeon/amdkfd/kfd_packet_manager.c |  488 ++++++++
>>  drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c          |   97 ++
>>  drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h    |  682 +++++++++++
>>  drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h    |  107 ++
>>  drivers/gpu/drm/radeon/amdkfd/kfd_priv.h           |  466 ++++++++
>>  drivers/gpu/drm/radeon/amdkfd/kfd_process.c        |  405 +++++++
>>  .../drm/radeon/amdkfd/kfd_process_queue_manager.c  |  343 ++++++
>>  drivers/gpu/drm/radeon/amdkfd/kfd_queue.c          |  109 ++
>>  drivers/gpu/drm/radeon/amdkfd/kfd_topology.c       | 1207 ++++++++++++++++++
>>  drivers/gpu/drm/radeon/amdkfd/kfd_topology.h       |  168 +++
>>  drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c         |   96 ++
>>  drivers/gpu/drm/radeon/cik.c                       |  154 +--
>>  drivers/gpu/drm/radeon/cik_reg.h                   |   65 ++
>>  drivers/gpu/drm/radeon/cikd.h                      |   51 +-
>>  drivers/gpu/drm/radeon/radeon.h                    |    9 +
>>  drivers/gpu/drm/radeon/radeon_device.c             |   32 +
>>  drivers/gpu/drm/radeon/radeon_drv.c                |    5 +
>>  drivers/gpu/drm/radeon/radeon_kfd.c                |  566 +++++++++
>>  drivers/gpu/drm/radeon/radeon_kfd.h                |  119 ++
>>  drivers/gpu/drm/radeon/radeon_kms.c                |    7 +
>>  include/linux/mm_types.h                           |   14 +
>>  include/uapi/linux/kfd_ioctl.h                     |  133 +++
>>  43 files changed, 9226 insertions(+), 95 deletions(-)
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/Kconfig
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/Makefile
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/cik_mqds.h
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/cik_regs.h
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_crat.h
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_device_queue_manager.c
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_device_queue_manager.h
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_interrupt.c
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_packet_manager.c
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_process.c
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_process_queue_manager.c
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_queue.c
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_topology.c
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_topology.h
>>  create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c
>>  create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.c
>>  create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.h
>>  create mode 100644 include/uapi/linux/kfd_ioctl.h
>>
>> --
>> 1.9.1
>>