* [PATCH v2 02/25] drm/radeon: reduce number of free VMIDs and pipes in KV
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-17 13:29 ` [PATCH v2 03/25] drm/radeon/cik: Don't touch int of pipes 1-7 Oded Gabbay
` (21 subsequent siblings)
22 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
To support HSA on KV, we need to limit the number of vmids and pipes
that are available for radeon's use with KV.
This patch reserves VMIDs 8-15 for amdkfd (so radeon can only use VMIDs
0-7) and also makes radeon think that KV has only a single MEC with a single
pipe in it.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/cik.c | 48 ++++++++++++++++++++++----------------------
1 file changed, 24 insertions(+), 24 deletions(-)
diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index 4bfc2c0..0b53633 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -4662,12 +4662,11 @@ static int cik_mec_init(struct radeon_device *rdev)
/*
* KV: 2 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 64 Queues total
* CI/KB: 1 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 32 Queues total
+ * Nonetheless, we assign only 1 pipe because all other pipes will
+ * be handled by KFD
*/
- if (rdev->family == CHIP_KAVERI)
- rdev->mec.num_mec = 2;
- else
- rdev->mec.num_mec = 1;
- rdev->mec.num_pipe = 4;
+ rdev->mec.num_mec = 1;
+ rdev->mec.num_pipe = 1;
rdev->mec.num_queue = rdev->mec.num_mec * rdev->mec.num_pipe * 8;
if (rdev->mec.hpd_eop_obj == NULL) {
@@ -4809,28 +4808,24 @@ static int cik_cp_compute_resume(struct radeon_device *rdev)
/* init the pipes */
mutex_lock(&rdev->srbm_mutex);
- for (i = 0; i < (rdev->mec.num_pipe * rdev->mec.num_mec); i++) {
- int me = (i < 4) ? 1 : 2;
- int pipe = (i < 4) ? i : (i - 4);
- eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr + (i * MEC_HPD_SIZE * 2);
+ eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr;
- cik_srbm_select(rdev, me, pipe, 0, 0);
+ cik_srbm_select(rdev, 0, 0, 0, 0);
- /* write the EOP addr */
- WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
- WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8);
+ /* write the EOP addr */
+ WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
+ WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8);
- /* set the VMID assigned */
- WREG32(CP_HPD_EOP_VMID, 0);
+ /* set the VMID assigned */
+ WREG32(CP_HPD_EOP_VMID, 0);
+
+ /* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
+ tmp = RREG32(CP_HPD_EOP_CONTROL);
+ tmp &= ~EOP_SIZE_MASK;
+ tmp |= order_base_2(MEC_HPD_SIZE / 8);
+ WREG32(CP_HPD_EOP_CONTROL, tmp);
- /* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
- tmp = RREG32(CP_HPD_EOP_CONTROL);
- tmp &= ~EOP_SIZE_MASK;
- tmp |= order_base_2(MEC_HPD_SIZE / 8);
- WREG32(CP_HPD_EOP_CONTROL, tmp);
- }
- cik_srbm_select(rdev, 0, 0, 0, 0);
mutex_unlock(&rdev->srbm_mutex);
/* init the queues. Just two for now. */
@@ -5876,8 +5871,13 @@ int cik_ib_parse(struct radeon_device *rdev, struct radeon_ib *ib)
*/
int cik_vm_init(struct radeon_device *rdev)
{
- /* number of VMs */
- rdev->vm_manager.nvm = 16;
+ /*
+ * number of VMs
+ * VMID 0 is reserved for System
+ * radeon graphics/compute will use VMIDs 1-7
+ * amdkfd will use VMIDs 8-15
+ */
+ rdev->vm_manager.nvm = 8;
/* base offset of vram pages */
if (rdev->flags & RADEON_IS_IGP) {
u64 tmp = RREG32(MC_VM_FB_OFFSET);
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* [PATCH v2 03/25] drm/radeon/cik: Don't touch int of pipes 1-7
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
2014-07-17 13:29 ` [PATCH v2 02/25] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-17 13:29 ` [PATCH v2 04/25] drm/radeon: Report doorbell configuration to amdkfd Oded Gabbay
` (20 subsequent siblings)
22 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
amdkfd, not radeon, is responsible for setting the interrupts of pipes 1-7,
so radeon must not touch them.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/cik.c | 71 +-------------------------------------------
1 file changed, 1 insertion(+), 70 deletions(-)
diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index 0b53633..1d7dd3b 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -7270,8 +7270,7 @@ static int cik_irq_init(struct radeon_device *rdev)
int cik_irq_set(struct radeon_device *rdev)
{
u32 cp_int_cntl;
- u32 cp_m1p0, cp_m1p1, cp_m1p2, cp_m1p3;
- u32 cp_m2p0, cp_m2p1, cp_m2p2, cp_m2p3;
+ u32 cp_m1p0;
u32 crtc1 = 0, crtc2 = 0, crtc3 = 0, crtc4 = 0, crtc5 = 0, crtc6 = 0;
u32 hpd1, hpd2, hpd3, hpd4, hpd5, hpd6;
u32 grbm_int_cntl = 0;
@@ -7305,13 +7304,6 @@ int cik_irq_set(struct radeon_device *rdev)
dma_cntl1 = RREG32(SDMA0_CNTL + SDMA1_REGISTER_OFFSET) & ~TRAP_ENABLE;
cp_m1p0 = RREG32(CP_ME1_PIPE0_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
- cp_m1p1 = RREG32(CP_ME1_PIPE1_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
- cp_m1p2 = RREG32(CP_ME1_PIPE2_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
- cp_m1p3 = RREG32(CP_ME1_PIPE3_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
- cp_m2p0 = RREG32(CP_ME2_PIPE0_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
- cp_m2p1 = RREG32(CP_ME2_PIPE1_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
- cp_m2p2 = RREG32(CP_ME2_PIPE2_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
- cp_m2p3 = RREG32(CP_ME2_PIPE3_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
if (rdev->flags & RADEON_IS_IGP)
thermal_int = RREG32_SMC(CG_THERMAL_INT_CTRL) &
@@ -7333,33 +7325,6 @@ int cik_irq_set(struct radeon_device *rdev)
case 0:
cp_m1p0 |= TIME_STAMP_INT_ENABLE;
break;
- case 1:
- cp_m1p1 |= TIME_STAMP_INT_ENABLE;
- break;
- case 2:
- cp_m1p2 |= TIME_STAMP_INT_ENABLE;
- break;
- case 3:
- cp_m1p2 |= TIME_STAMP_INT_ENABLE;
- break;
- default:
- DRM_DEBUG("si_irq_set: sw int cp1 invalid pipe %d\n", ring->pipe);
- break;
- }
- } else if (ring->me == 2) {
- switch (ring->pipe) {
- case 0:
- cp_m2p0 |= TIME_STAMP_INT_ENABLE;
- break;
- case 1:
- cp_m2p1 |= TIME_STAMP_INT_ENABLE;
- break;
- case 2:
- cp_m2p2 |= TIME_STAMP_INT_ENABLE;
- break;
- case 3:
- cp_m2p2 |= TIME_STAMP_INT_ENABLE;
- break;
default:
DRM_DEBUG("si_irq_set: sw int cp1 invalid pipe %d\n", ring->pipe);
break;
@@ -7376,33 +7341,6 @@ int cik_irq_set(struct radeon_device *rdev)
case 0:
cp_m1p0 |= TIME_STAMP_INT_ENABLE;
break;
- case 1:
- cp_m1p1 |= TIME_STAMP_INT_ENABLE;
- break;
- case 2:
- cp_m1p2 |= TIME_STAMP_INT_ENABLE;
- break;
- case 3:
- cp_m1p2 |= TIME_STAMP_INT_ENABLE;
- break;
- default:
- DRM_DEBUG("si_irq_set: sw int cp2 invalid pipe %d\n", ring->pipe);
- break;
- }
- } else if (ring->me == 2) {
- switch (ring->pipe) {
- case 0:
- cp_m2p0 |= TIME_STAMP_INT_ENABLE;
- break;
- case 1:
- cp_m2p1 |= TIME_STAMP_INT_ENABLE;
- break;
- case 2:
- cp_m2p2 |= TIME_STAMP_INT_ENABLE;
- break;
- case 3:
- cp_m2p2 |= TIME_STAMP_INT_ENABLE;
- break;
default:
DRM_DEBUG("si_irq_set: sw int cp2 invalid pipe %d\n", ring->pipe);
break;
@@ -7485,13 +7423,6 @@ int cik_irq_set(struct radeon_device *rdev)
WREG32(SDMA0_CNTL + SDMA1_REGISTER_OFFSET, dma_cntl1);
WREG32(CP_ME1_PIPE0_INT_CNTL, cp_m1p0);
- WREG32(CP_ME1_PIPE1_INT_CNTL, cp_m1p1);
- WREG32(CP_ME1_PIPE2_INT_CNTL, cp_m1p2);
- WREG32(CP_ME1_PIPE3_INT_CNTL, cp_m1p3);
- WREG32(CP_ME2_PIPE0_INT_CNTL, cp_m2p0);
- WREG32(CP_ME2_PIPE1_INT_CNTL, cp_m2p1);
- WREG32(CP_ME2_PIPE2_INT_CNTL, cp_m2p2);
- WREG32(CP_ME2_PIPE3_INT_CNTL, cp_m2p3);
WREG32(GRBM_INT_CNTL, grbm_int_cntl);
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* [PATCH v2 04/25] drm/radeon: Report doorbell configuration to amdkfd
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
2014-07-17 13:29 ` [PATCH v2 02/25] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
2014-07-17 13:29 ` [PATCH v2 03/25] drm/radeon/cik: Don't touch int of pipes 1-7 Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-17 13:29 ` [PATCH v2 05/25] drm/radeon: adding synchronization for GRBM GFX Oded Gabbay
` (19 subsequent siblings)
22 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
radeon and amdkfd share the doorbell aperture.
radeon sets it up, takes the doorbells required for its own rings
and reports the setup to amdkfd.
radeon reserved doorbells are at the start of the doorbell aperture.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/radeon.h | 4 ++++
drivers/gpu/drm/radeon/radeon_device.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 35 insertions(+)
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 7cda75d..4e7e41f 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -676,6 +676,10 @@ struct radeon_doorbell {
int radeon_doorbell_get(struct radeon_device *rdev, u32 *page);
void radeon_doorbell_free(struct radeon_device *rdev, u32 doorbell);
+void radeon_doorbell_get_kfd_info(struct radeon_device *rdev,
+ phys_addr_t *aperture_base,
+ size_t *aperture_size,
+ size_t *start_offset);
/*
* IRQS.
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 03686fa..98538d2 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -328,6 +328,37 @@ void radeon_doorbell_free(struct radeon_device *rdev, u32 doorbell)
__clear_bit(doorbell, rdev->doorbell.used);
}
+/**
+ * radeon_doorbell_get_kfd_info - Report doorbell configuration required to
+ * setup KFD
+ *
+ * @rdev: radeon_device pointer
+ * @aperture_base: output returning doorbell aperture base physical address
+ * @aperture_size: output returning doorbell aperture size in bytes
+ * @start_offset: output returning # of doorbell bytes reserved for radeon.
+ *
+ * Radeon and the KFD share the doorbell aperture. Radeon sets it up,
+ * takes doorbells required for its own rings and reports the setup to KFD.
+ * Radeon reserved doorbells are at the start of the doorbell aperture.
+ */
+void radeon_doorbell_get_kfd_info(struct radeon_device *rdev,
+ phys_addr_t *aperture_base,
+ size_t *aperture_size,
+ size_t *start_offset)
+{
+ /* The first num_doorbells are used by radeon.
+ * KFD takes whatever's left in the aperture. */
+ if (rdev->doorbell.size > rdev->doorbell.num_doorbells * sizeof(u32)) {
+ *aperture_base = rdev->doorbell.base;
+ *aperture_size = rdev->doorbell.size;
+ *start_offset = rdev->doorbell.num_doorbells * sizeof(u32);
+ } else {
+ *aperture_base = 0;
+ *aperture_size = 0;
+ *start_offset = 0;
+ }
+}
+
/*
* radeon_wb_*()
* Writeback is the the method by which the the GPU updates special pages
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* [PATCH v2 05/25] drm/radeon: adding synchronization for GRBM GFX
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
` (2 preceding siblings ...)
2014-07-17 13:29 ` [PATCH v2 04/25] drm/radeon: Report doorbell configuration to amdkfd Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-17 13:29 ` [PATCH v2 06/25] drm/radeon: Add radeon <--> amdkfd interface Oded Gabbay
` (18 subsequent siblings)
22 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
Implementing a lock for selecting and accessing shader engines and arrays.
This lock will make sure that radeon and amdkfd are not colliding when
accessing shader engines and arrays with GRBM_GFX_INDEX register.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/cik.c | 26 ++++++++++++++++++++++++++
drivers/gpu/drm/radeon/radeon.h | 2 ++
drivers/gpu/drm/radeon/radeon_device.c | 1 +
3 files changed, 29 insertions(+)
diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index 1d7dd3b..b4bbc22 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -1563,6 +1563,8 @@ static const u32 godavari_golden_registers[] =
static void cik_init_golden_registers(struct radeon_device *rdev)
{
+ /* Some of the registers might be dependant on GRBM_GFX_INDEX */
+ mutex_lock(&rdev->grbm_idx_mutex);
switch (rdev->family) {
case CHIP_BONAIRE:
radeon_program_register_sequence(rdev,
@@ -1637,6 +1639,7 @@ static void cik_init_golden_registers(struct radeon_device *rdev)
default:
break;
}
+ mutex_unlock(&rdev->grbm_idx_mutex);
}
/**
@@ -3418,6 +3421,7 @@ static void cik_setup_rb(struct radeon_device *rdev,
u32 disabled_rbs = 0;
u32 enabled_rbs = 0;
+ mutex_lock(&rdev->grbm_idx_mutex);
for (i = 0; i < se_num; i++) {
for (j = 0; j < sh_per_se; j++) {
cik_select_se_sh(rdev, i, j);
@@ -3429,6 +3433,7 @@ static void cik_setup_rb(struct radeon_device *rdev,
}
}
cik_select_se_sh(rdev, 0xffffffff, 0xffffffff);
+ mutex_unlock(&rdev->grbm_idx_mutex);
mask = 1;
for (i = 0; i < max_rb_num_per_se * se_num; i++) {
@@ -3439,6 +3444,7 @@ static void cik_setup_rb(struct radeon_device *rdev,
rdev->config.cik.backend_enable_mask = enabled_rbs;
+ mutex_lock(&rdev->grbm_idx_mutex);
for (i = 0; i < se_num; i++) {
cik_select_se_sh(rdev, i, 0xffffffff);
data = 0;
@@ -3466,6 +3472,7 @@ static void cik_setup_rb(struct radeon_device *rdev,
WREG32(PA_SC_RASTER_CONFIG, data);
}
cik_select_se_sh(rdev, 0xffffffff, 0xffffffff);
+ mutex_unlock(&rdev->grbm_idx_mutex);
}
/**
@@ -3683,6 +3690,12 @@ static void cik_gpu_init(struct radeon_device *rdev)
/* set HW defaults for 3D engine */
WREG32(CP_MEQ_THRESHOLDS, MEQ1_START(0x30) | MEQ2_START(0x60));
+ mutex_lock(&rdev->grbm_idx_mutex);
+ /*
+ * making sure that the following register writes will be broadcasted
+ * to all the shaders
+ */
+ cik_select_se_sh(rdev, 0xffffffff, 0xffffffff);
WREG32(SX_DEBUG_1, 0x20);
WREG32(TA_CNTL_AUX, 0x00010000);
@@ -3738,6 +3751,7 @@ static void cik_gpu_init(struct radeon_device *rdev)
WREG32(PA_CL_ENHANCE, CLIP_VTX_REORDER_ENA | NUM_CLIP_SEQ(3));
WREG32(PA_SC_ENHANCE, ENABLE_PA_SC_OUT_OF_ORDER);
+ mutex_unlock(&rdev->grbm_idx_mutex);
udelay(50);
}
@@ -6037,6 +6051,7 @@ static void cik_wait_for_rlc_serdes(struct radeon_device *rdev)
u32 i, j, k;
u32 mask;
+ mutex_lock(&rdev->grbm_idx_mutex);
for (i = 0; i < rdev->config.cik.max_shader_engines; i++) {
for (j = 0; j < rdev->config.cik.max_sh_per_se; j++) {
cik_select_se_sh(rdev, i, j);
@@ -6048,6 +6063,7 @@ static void cik_wait_for_rlc_serdes(struct radeon_device *rdev)
}
}
cik_select_se_sh(rdev, 0xffffffff, 0xffffffff);
+ mutex_unlock(&rdev->grbm_idx_mutex);
mask = SE_MASTER_BUSY_MASK | GC_MASTER_BUSY | TC0_MASTER_BUSY | TC1_MASTER_BUSY;
for (k = 0; k < rdev->usec_timeout; k++) {
@@ -6182,10 +6198,12 @@ static int cik_rlc_resume(struct radeon_device *rdev)
WREG32(RLC_LB_CNTR_INIT, 0);
WREG32(RLC_LB_CNTR_MAX, 0x00008000);
+ mutex_lock(&rdev->grbm_idx_mutex);
cik_select_se_sh(rdev, 0xffffffff, 0xffffffff);
WREG32(RLC_LB_INIT_CU_MASK, 0xffffffff);
WREG32(RLC_LB_PARAMS, 0x00600408);
WREG32(RLC_LB_CNTL, 0x80000004);
+ mutex_unlock(&rdev->grbm_idx_mutex);
WREG32(RLC_MC_CNTL, 0);
WREG32(RLC_UCODE_CNTL, 0);
@@ -6252,11 +6270,13 @@ static void cik_enable_cgcg(struct radeon_device *rdev, bool enable)
tmp = cik_halt_rlc(rdev);
+ mutex_lock(&rdev->grbm_idx_mutex);
cik_select_se_sh(rdev, 0xffffffff, 0xffffffff);
WREG32(RLC_SERDES_WR_CU_MASTER_MASK, 0xffffffff);
WREG32(RLC_SERDES_WR_NONCU_MASTER_MASK, 0xffffffff);
tmp2 = BPM_ADDR_MASK | CGCG_OVERRIDE_0 | CGLS_ENABLE;
WREG32(RLC_SERDES_WR_CTRL, tmp2);
+ mutex_unlock(&rdev->grbm_idx_mutex);
cik_update_rlc(rdev, tmp);
@@ -6298,11 +6318,13 @@ static void cik_enable_mgcg(struct radeon_device *rdev, bool enable)
tmp = cik_halt_rlc(rdev);
+ mutex_lock(&rdev->grbm_idx_mutex);
cik_select_se_sh(rdev, 0xffffffff, 0xffffffff);
WREG32(RLC_SERDES_WR_CU_MASTER_MASK, 0xffffffff);
WREG32(RLC_SERDES_WR_NONCU_MASTER_MASK, 0xffffffff);
data = BPM_ADDR_MASK | MGCG_OVERRIDE_0;
WREG32(RLC_SERDES_WR_CTRL, data);
+ mutex_unlock(&rdev->grbm_idx_mutex);
cik_update_rlc(rdev, tmp);
@@ -6346,11 +6368,13 @@ static void cik_enable_mgcg(struct radeon_device *rdev, bool enable)
tmp = cik_halt_rlc(rdev);
+ mutex_lock(&rdev->grbm_idx_mutex);
cik_select_se_sh(rdev, 0xffffffff, 0xffffffff);
WREG32(RLC_SERDES_WR_CU_MASTER_MASK, 0xffffffff);
WREG32(RLC_SERDES_WR_NONCU_MASTER_MASK, 0xffffffff);
data = BPM_ADDR_MASK | MGCG_OVERRIDE_1;
WREG32(RLC_SERDES_WR_CTRL, data);
+ mutex_unlock(&rdev->grbm_idx_mutex);
cik_update_rlc(rdev, tmp);
}
@@ -6783,10 +6807,12 @@ static u32 cik_get_cu_active_bitmap(struct radeon_device *rdev, u32 se, u32 sh)
u32 mask = 0, tmp, tmp1;
int i;
+ mutex_lock(&rdev->grbm_idx_mutex);
cik_select_se_sh(rdev, se, sh);
tmp = RREG32(CC_GC_SHADER_ARRAY_CONFIG);
tmp1 = RREG32(GC_USER_SHADER_ARRAY_CONFIG);
cik_select_se_sh(rdev, 0xffffffff, 0xffffffff);
+ mutex_unlock(&rdev->grbm_idx_mutex);
tmp &= 0xffff0000;
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 4e7e41f..5136855 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -2334,6 +2334,8 @@ struct radeon_device {
struct radeon_atcs atcs;
/* srbm instance registers */
struct mutex srbm_mutex;
+ /* GRBM index mutex. Protects concurrents access to GRBM index */
+ struct mutex grbm_idx_mutex;
/* clock, powergating flags */
u32 cg_flags;
u32 pg_flags;
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 98538d2..1b8b8b7 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1258,6 +1258,7 @@ int radeon_device_init(struct radeon_device *rdev,
mutex_init(&rdev->pm.mutex);
mutex_init(&rdev->gpu_clock_mutex);
mutex_init(&rdev->srbm_mutex);
+ mutex_init(&rdev->grbm_idx_mutex);
init_rwsem(&rdev->pm.mclk_lock);
init_rwsem(&rdev->exclusive_lock);
init_waitqueue_head(&rdev->irq.vblank_queue);
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* [PATCH v2 06/25] drm/radeon: Add radeon <--> amdkfd interface
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
` (3 preceding siblings ...)
2014-07-17 13:29 ` [PATCH v2 05/25] drm/radeon: adding synchronization for GRBM GFX Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-20 17:35 ` Jerome Glisse
2014-07-17 13:29 ` [PATCH v2 09/25] amdkfd: Add amdkfd skeleton driver Oded Gabbay
` (17 subsequent siblings)
22 siblings, 1 reply; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
This patch adds the interface between the radeon driver and the amdkfd driver.
The interface implementation is contained in radeon_kfd.c and radeon_kfd.h.
The interface itself is represented by a pointer to struct
kfd_dev. The pointer is located inside radeon_device structure.
All the register accesses that amdkfd needs are done using this interface. This
allows us to avoid direct register accesses in amdkfd proper, while also
avoiding locking between amdkfd and radeon.
The single exception is the doorbells that are used in both of the drivers.
However, because they are located in separate pci bar pages, the danger of
sharing registers between the drivers is minimal.
Having said that, we are planning to move the doorbells as well to radeon.
The loading of the amdkfd module is done via symbol lookup. According to the code review discussions, this may change in v3 of the patch set.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/Makefile | 1 +
drivers/gpu/drm/radeon/cik.c | 9 +
drivers/gpu/drm/radeon/cik_reg.h | 65 +++++
drivers/gpu/drm/radeon/cikd.h | 51 +++-
drivers/gpu/drm/radeon/radeon.h | 3 +
drivers/gpu/drm/radeon/radeon_drv.c | 5 +
drivers/gpu/drm/radeon/radeon_kfd.c | 566 ++++++++++++++++++++++++++++++++++++
drivers/gpu/drm/radeon/radeon_kfd.h | 119 ++++++++
drivers/gpu/drm/radeon/radeon_kms.c | 7 +
9 files changed, 825 insertions(+), 1 deletion(-)
create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.c
create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.h
diff --git a/drivers/gpu/drm/radeon/Makefile b/drivers/gpu/drm/radeon/Makefile
index 1b04002..a1c913d 100644
--- a/drivers/gpu/drm/radeon/Makefile
+++ b/drivers/gpu/drm/radeon/Makefile
@@ -104,6 +104,7 @@ radeon-y += \
radeon_vce.o \
vce_v1_0.o \
vce_v2_0.o \
+ radeon_kfd.o
radeon-$(CONFIG_COMPAT) += radeon_ioc32.o
radeon-$(CONFIG_VGA_SWITCHEROO) += radeon_atpx_handler.o
diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index b4bbc22..6f71095 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -32,6 +32,7 @@
#include "cik_blit_shaders.h"
#include "radeon_ucode.h"
#include "clearstate_ci.h"
+#include "radeon_kfd.h"
MODULE_FIRMWARE("radeon/BONAIRE_pfp.bin");
MODULE_FIRMWARE("radeon/BONAIRE_me.bin");
@@ -7727,6 +7728,9 @@ restart_ih:
while (rptr != wptr) {
/* wptr/rptr are in bytes! */
ring_index = rptr / 4;
+
+ radeon_kfd_interrupt(rdev, (const void *) &rdev->ih.ring[ring_index]);
+
src_id = le32_to_cpu(rdev->ih.ring[ring_index]) & 0xff;
src_data = le32_to_cpu(rdev->ih.ring[ring_index + 1]) & 0xfffffff;
ring_id = le32_to_cpu(rdev->ih.ring[ring_index + 2]) & 0xff;
@@ -8386,6 +8390,10 @@ static int cik_startup(struct radeon_device *rdev)
if (r)
return r;
+ r = radeon_kfd_resume(rdev);
+ if (r)
+ return r;
+
return 0;
}
@@ -8434,6 +8442,7 @@ int cik_resume(struct radeon_device *rdev)
*/
int cik_suspend(struct radeon_device *rdev)
{
+ radeon_kfd_suspend(rdev);
radeon_pm_suspend(rdev);
dce6_audio_fini(rdev);
radeon_vm_manager_fini(rdev);
diff --git a/drivers/gpu/drm/radeon/cik_reg.h b/drivers/gpu/drm/radeon/cik_reg.h
index ca1bb61..1ab3dbc 100644
--- a/drivers/gpu/drm/radeon/cik_reg.h
+++ b/drivers/gpu/drm/radeon/cik_reg.h
@@ -147,4 +147,69 @@
#define CIK_LB_DESKTOP_HEIGHT 0x6b0c
+struct cik_hqd_registers {
+ u32 cp_mqd_base_addr;
+ u32 cp_mqd_base_addr_hi;
+ u32 cp_hqd_active;
+ u32 cp_hqd_vmid;
+ u32 cp_hqd_persistent_state;
+ u32 cp_hqd_pipe_priority;
+ u32 cp_hqd_queue_priority;
+ u32 cp_hqd_quantum;
+ u32 cp_hqd_pq_base;
+ u32 cp_hqd_pq_base_hi;
+ u32 cp_hqd_pq_rptr;
+ u32 cp_hqd_pq_rptr_report_addr;
+ u32 cp_hqd_pq_rptr_report_addr_hi;
+ u32 cp_hqd_pq_wptr_poll_addr;
+ u32 cp_hqd_pq_wptr_poll_addr_hi;
+ u32 cp_hqd_pq_doorbell_control;
+ u32 cp_hqd_pq_wptr;
+ u32 cp_hqd_pq_control;
+ u32 cp_hqd_ib_base_addr;
+ u32 cp_hqd_ib_base_addr_hi;
+ u32 cp_hqd_ib_rptr;
+ u32 cp_hqd_ib_control;
+ u32 cp_hqd_iq_timer;
+ u32 cp_hqd_iq_rptr;
+ u32 cp_hqd_dequeue_request;
+ u32 cp_hqd_dma_offload;
+ u32 cp_hqd_sema_cmd;
+ u32 cp_hqd_msg_type;
+ u32 cp_hqd_atomic0_preop_lo;
+ u32 cp_hqd_atomic0_preop_hi;
+ u32 cp_hqd_atomic1_preop_lo;
+ u32 cp_hqd_atomic1_preop_hi;
+ u32 cp_hqd_hq_scheduler0;
+ u32 cp_hqd_hq_scheduler1;
+ u32 cp_mqd_control;
+};
+
+struct cik_mqd {
+ u32 header;
+ u32 dispatch_initiator;
+ u32 dimensions[3];
+ u32 start_idx[3];
+ u32 num_threads[3];
+ u32 pipeline_stat_enable;
+ u32 perf_counter_enable;
+ u32 pgm[2];
+ u32 tba[2];
+ u32 tma[2];
+ u32 pgm_rsrc[2];
+ u32 vmid;
+ u32 resource_limits;
+ u32 static_thread_mgmt01[2];
+ u32 tmp_ring_size;
+ u32 static_thread_mgmt23[2];
+ u32 restart[3];
+ u32 thread_trace_enable;
+ u32 reserved1;
+ u32 user_data[16];
+ u32 vgtcs_invoke_count[2];
+ struct cik_hqd_registers queue_state;
+ u32 dequeue_cntr;
+ u32 interrupt_queue[64];
+};
+
#endif
diff --git a/drivers/gpu/drm/radeon/cikd.h b/drivers/gpu/drm/radeon/cikd.h
index 0c6e1b5..0a2a403 100644
--- a/drivers/gpu/drm/radeon/cikd.h
+++ b/drivers/gpu/drm/radeon/cikd.h
@@ -1137,6 +1137,9 @@
#define SH_MEM_ALIGNMENT_MODE_UNALIGNED 3
#define DEFAULT_MTYPE(x) ((x) << 4)
#define APE1_MTYPE(x) ((x) << 7)
+/* valid for both DEFAULT_MTYPE and APE1_MTYPE */
+#define MTYPE_CACHED 0
+#define MTYPE_NONCACHED 3
#define SX_DEBUG_1 0x9060
@@ -1447,6 +1450,16 @@
#define CP_HQD_ACTIVE 0xC91C
#define CP_HQD_VMID 0xC920
+#define CP_HQD_PERSISTENT_STATE 0xC924u
+#define DEFAULT_CP_HQD_PERSISTENT_STATE (0x33U << 8)
+
+#define CP_HQD_PIPE_PRIORITY 0xC928u
+#define CP_HQD_QUEUE_PRIORITY 0xC92Cu
+#define CP_HQD_QUANTUM 0xC930u
+#define QUANTUM_EN 1U
+#define QUANTUM_SCALE_1MS (1U << 4)
+#define QUANTUM_DURATION(x) ((x) << 8)
+
#define CP_HQD_PQ_BASE 0xC934
#define CP_HQD_PQ_BASE_HI 0xC938
#define CP_HQD_PQ_RPTR 0xC93C
@@ -1474,12 +1487,32 @@
#define PRIV_STATE (1 << 30)
#define KMD_QUEUE (1 << 31)
-#define CP_HQD_DEQUEUE_REQUEST 0xC974
+#define CP_HQD_IB_BASE_ADDR 0xC95Cu
+#define CP_HQD_IB_BASE_ADDR_HI 0xC960u
+#define CP_HQD_IB_RPTR 0xC964u
+#define CP_HQD_IB_CONTROL 0xC968u
+#define IB_ATC_EN (1U << 23)
+#define DEFAULT_MIN_IB_AVAIL_SIZE (3U << 20)
+
+#define CP_HQD_DEQUEUE_REQUEST 0xC974
+#define DEQUEUE_REQUEST_DRAIN 1
+#define DEQUEUE_REQUEST_RESET 2
#define CP_MQD_CONTROL 0xC99C
#define MQD_VMID(x) ((x) << 0)
#define MQD_VMID_MASK (0xf << 0)
+#define CP_HQD_SEMA_CMD 0xC97Cu
+#define CP_HQD_MSG_TYPE 0xC980u
+#define CP_HQD_ATOMIC0_PREOP_LO 0xC984u
+#define CP_HQD_ATOMIC0_PREOP_HI 0xC988u
+#define CP_HQD_ATOMIC1_PREOP_LO 0xC98Cu
+#define CP_HQD_ATOMIC1_PREOP_HI 0xC990u
+#define CP_HQD_HQ_SCHEDULER0 0xC994u
+#define CP_HQD_HQ_SCHEDULER1 0xC998u
+
+#define SH_STATIC_MEM_CONFIG 0x9604u
+
#define DB_RENDER_CONTROL 0x28000
#define PA_SC_RASTER_CONFIG 0x28350
@@ -2069,4 +2102,20 @@
#define VCE_CMD_IB_AUTO 0x00000005
#define VCE_CMD_SEMAPHORE 0x00000006
+#define ATC_VMID0_PASID_MAPPING 0x339Cu
+#define ATC_VMID_PASID_MAPPING_UPDATE_STATUS 0x3398u
+#define ATC_VMID_PASID_MAPPING_VALID (1U << 31)
+
+#define ATC_VM_APERTURE0_CNTL 0x3310u
+#define ATS_ACCESS_MODE_NEVER 0
+#define ATS_ACCESS_MODE_ALWAYS 1
+
+#define ATC_VM_APERTURE0_CNTL2 0x3318u
+#define ATC_VM_APERTURE0_HIGH_ADDR 0x3308u
+#define ATC_VM_APERTURE0_LOW_ADDR 0x3300u
+#define ATC_VM_APERTURE1_CNTL 0x3314u
+#define ATC_VM_APERTURE1_CNTL2 0x331Cu
+#define ATC_VM_APERTURE1_HIGH_ADDR 0x330Cu
+#define ATC_VM_APERTURE1_LOW_ADDR 0x3304u
+
#endif
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 5136855..94b38a7 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -2342,6 +2342,9 @@ struct radeon_device {
struct dev_pm_domain vga_pm_domain;
bool have_disp_power_ref;
+
+ /* HSA KFD interface */
+ struct kfd_dev *kfd;
};
bool radeon_is_px(struct drm_device *dev);
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index cb14213..efaa086 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -39,6 +39,8 @@
#include <linux/pm_runtime.h>
#include <linux/vga_switcheroo.h>
#include "drm_crtc_helper.h"
+#include "radeon_kfd.h"
+
/*
* KMS wrapper.
* - 2.0.0 - initial interface
@@ -630,12 +632,15 @@ static int __init radeon_init(void)
#endif
}
+ radeon_kfd_init();
+
/* let modprobe override vga console setting */
return drm_pci_init(driver, pdriver);
}
static void __exit radeon_exit(void)
{
+ radeon_kfd_fini();
drm_pci_exit(driver, pdriver);
radeon_unregister_atpx_handler();
}
diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
new file mode 100644
index 0000000..0385239
--- /dev/null
+++ b/drivers/gpu/drm/radeon/radeon_kfd.c
@@ -0,0 +1,566 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/module.h>
+#include <linux/fdtable.h>
+#include <linux/uaccess.h>
+#include <drm/drmP.h>
+#include "radeon.h"
+#include "cikd.h"
+#include "cik_reg.h"
+#include "radeon_kfd.h"
+
+#define CIK_PIPE_PER_MEC (4)
+
/* Opaque handle returned to amdkfd by allocate_mem(); amdkfd never looks
 * inside, it only passes it back to the other mem callbacks.
 */
struct kgd_mem {
	struct radeon_bo *bo;		/* backing buffer object */
	u32 domain;			/* RADEON_GEM_DOMAIN_* placement chosen at allocation */
	struct radeon_bo_va *bo_va;	/* per-VM mapping; not referenced in this file */
};
+
+static int allocate_mem(struct kgd_dev *kgd, size_t size, size_t alignment,
+ enum kgd_memory_pool pool, struct kgd_mem **memory_handle);
+
+static void free_mem(struct kgd_dev *kgd, struct kgd_mem *memory_handle);
+
+static int gpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *vmid0_address);
+static void ungpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem);
+
+static int kmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, void **ptr);
+static void unkmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem);
+
+static uint64_t get_vmem_size(struct kgd_dev *kgd);
+static uint64_t get_gpu_clock_counter(struct kgd_dev *kgd);
+
+static uint32_t get_max_engine_clock_in_mhz(struct kgd_dev *kgd);
+
+/*
+ * Register access functions
+ */
+
+static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid, uint32_t sh_mem_config,
+ uint32_t sh_mem_ape1_base, uint32_t sh_mem_ape1_limit, uint32_t sh_mem_bases);
+static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid, unsigned int vmid);
+static int kgd_init_memory(struct kgd_dev *kgd);
+static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id, uint32_t hpd_size, uint64_t hpd_gpu_addr);
+static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id, uint32_t queue_id, uint32_t __user *wptr);
+static bool kgd_hqd_is_occupies(struct kgd_dev *kgd, uint64_t queue_address, uint32_t pipe_id, uint32_t queue_id);
+static int kgd_hqd_destroy(struct kgd_dev *kgd, bool is_reset, unsigned int timeout,
+ uint32_t pipe_id, uint32_t queue_id);
+
/* Services radeon exposes to amdkfd; handed over through kgd2kfd_init()
 * at module load time.
 */
static const struct kfd2kgd_calls kfd2kgd = {
	/* Memory management */
	.allocate_mem = allocate_mem,
	.free_mem = free_mem,
	.gpumap_mem = gpumap_mem,
	.ungpumap_mem = ungpumap_mem,
	.kmap_mem = kmap_mem,
	.unkmap_mem = unkmap_mem,
	/* Device information */
	.get_vmem_size = get_vmem_size,
	.get_gpu_clock_counter = get_gpu_clock_counter,
	.get_max_engine_clock_in_mhz = get_max_engine_clock_in_mhz,
	/* Register access on behalf of amdkfd */
	.program_sh_mem_settings = kgd_program_sh_mem_settings,
	.set_pasid_vmid_mapping = kgd_set_pasid_vmid_mapping,
	.init_memory = kgd_init_memory,
	.init_pipeline = kgd_init_pipeline,
	.hqd_load = kgd_hqd_load,
	.hqd_is_occupies = kgd_hqd_is_occupies,
	.hqd_destroy = kgd_hqd_destroy,
};
+
+static const struct kgd2kfd_calls *kgd2kfd;
+
+bool radeon_kfd_init(void)
+{
+ bool (*kgd2kfd_init_p)(unsigned, const struct kfd2kgd_calls*,
+ const struct kgd2kfd_calls**);
+
+ kgd2kfd_init_p = symbol_request(kgd2kfd_init);
+
+ if (kgd2kfd_init_p == NULL)
+ return false;
+
+ if (!kgd2kfd_init_p(KFD_INTERFACE_VERSION, &kfd2kgd, &kgd2kfd)) {
+ symbol_put(kgd2kfd_init);
+ kgd2kfd = NULL;
+
+ return false;
+ }
+
+ return true;
+}
+
+void radeon_kfd_fini(void)
+{
+ if (kgd2kfd) {
+ kgd2kfd->exit();
+ symbol_put(kgd2kfd_init);
+ }
+}
+
/*
 * Ask amdkfd to create a kfd_dev for this GPU.  rdev->kfd stays NULL when
 * amdkfd is absent or declines the device; every other radeon_kfd_*
 * entry point checks that pointer before calling into amdkfd.
 */
void radeon_kfd_device_probe(struct radeon_device *rdev)
{
	if (kgd2kfd)
		rdev->kfd = kgd2kfd->probe((struct kgd_dev *)rdev, rdev->pdev);
}
+
/* Hand amdkfd the GPU resources reserved for it and finish device init. */
void radeon_kfd_device_init(struct radeon_device *rdev)
{
	if (rdev->kfd) {
		struct kgd2kfd_shared_resources gpu_resources = {
			/* Bits 8-15 set: VMIDs 8-15 go to amdkfd, radeon
			 * keeps VMIDs 0-7. */
			.compute_vmid_bitmap = 0xFF00,

			/* radeon keeps MEC0/pipe0; amdkfd gets the other
			 * seven compute pipes (presumably 2 MECs * 4 pipes
			 * minus radeon's pipe — NOTE(review): confirm for
			 * single-MEC parts). */
			.first_compute_pipe = 1,
			.compute_pipe_count = 8 - 1,
		};

		radeon_doorbell_get_kfd_info(rdev,
					&gpu_resources.doorbell_physical_address,
					&gpu_resources.doorbell_aperture_size,
					&gpu_resources.doorbell_start_offset);

		kgd2kfd->device_init(rdev->kfd, &gpu_resources);
	}
}
+
+void radeon_kfd_device_fini(struct radeon_device *rdev)
+{
+ if (rdev->kfd) {
+ kgd2kfd->device_exit(rdev->kfd);
+ rdev->kfd = NULL;
+ }
+}
+
/*
 * Forward one IH ring entry to amdkfd.  Called from radeon's interrupt
 * handler (cik_irq_process) for every entry, so this runs in IRQ context.
 */
void radeon_kfd_interrupt(struct radeon_device *rdev, const void *ih_ring_entry)
{
	if (rdev->kfd)
		kgd2kfd->interrupt(rdev->kfd, ih_ring_entry);
}
+
/* Notify amdkfd that the GPU is about to be suspended (see cik_suspend). */
void radeon_kfd_suspend(struct radeon_device *rdev)
{
	if (rdev->kfd)
		kgd2kfd->suspend(rdev->kfd);
}
+
+int radeon_kfd_resume(struct radeon_device *rdev)
+{
+ int r = 0;
+
+ if (rdev->kfd)
+ r = kgd2kfd->resume(rdev->kfd);
+
+ return r;
+}
+
+static u32 pool_to_domain(enum kgd_memory_pool p)
+{
+ switch (p) {
+ case KGD_POOL_FRAMEBUFFER: return RADEON_GEM_DOMAIN_VRAM;
+ default: return RADEON_GEM_DOMAIN_GTT;
+ }
+}
+
+static int allocate_mem(struct kgd_dev *kgd, size_t size, size_t alignment,
+ enum kgd_memory_pool pool, struct kgd_mem **memory_handle)
+{
+ struct radeon_device *rdev = (struct radeon_device *)kgd;
+ struct kgd_mem *mem;
+ int r;
+
+ mem = kzalloc(sizeof(struct kgd_mem), GFP_KERNEL);
+ if (!mem)
+ return -ENOMEM;
+
+ mem->domain = pool_to_domain(pool);
+
+ r = radeon_bo_create(rdev, size, alignment, true, mem->domain, NULL, &mem->bo);
+ if (r) {
+ kfree(mem);
+ return r;
+ }
+
+ *memory_handle = mem;
+ return 0;
+}
+
/*
 * Release a handle from allocate_mem(): drop the BO reference and free
 * the wrapper.
 */
static void free_mem(struct kgd_dev *kgd, struct kgd_mem *mem)
{
	/* Assume that KFD will never free gpumapped or kmapped memory. This is not quite settled. */
	radeon_bo_unref(&mem->bo);
	kfree(mem);
}
+
/*
 * Pin the BO in its allocation domain and return the pinned GPU address
 * through @vmid0_address.  Returns radeon_bo_pin()'s result.
 */
static int gpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *vmid0_address)
{
	int r;

	r = radeon_bo_reserve(mem->bo, true);

	/*
	 * ttm_bo_reserve can only fail if the buffer reservation lock
	 * is held in circumstances that would deadlock
	 */
	BUG_ON(r != 0);
	r = radeon_bo_pin(mem->bo, mem->domain, vmid0_address);
	radeon_bo_unreserve(mem->bo);

	return r;
}
+
/* Undo gpumap_mem(): unpin the BO, making it evictable again. */
static void ungpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem)
{
	int r;

	r = radeon_bo_reserve(mem->bo, true);

	/*
	 * ttm_bo_reserve can only fail if the buffer reservation lock
	 * is held in circumstances that would deadlock
	 */
	BUG_ON(r != 0);
	r = radeon_bo_unpin(mem->bo);

	/*
	 * This unpin only removed NO_EVICT placement flags
	 * and should never fail
	 */
	BUG_ON(r != 0);
	radeon_bo_unreserve(mem->bo);
}
+
/*
 * Map the BO into kernel address space; the CPU pointer is returned
 * through @ptr.  Returns radeon_bo_kmap()'s result.
 */
static int kmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, void **ptr)
{
	int r;

	r = radeon_bo_reserve(mem->bo, true);

	/*
	 * ttm_bo_reserve can only fail if the buffer reservation lock
	 * is held in circumstances that would deadlock
	 */
	BUG_ON(r != 0);
	r = radeon_bo_kmap(mem->bo, ptr);
	radeon_bo_unreserve(mem->bo);

	return r;
}
+
/* Undo kmap_mem(): drop the kernel mapping of the BO. */
static void unkmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem)
{
	int r;

	r = radeon_bo_reserve(mem->bo, true);
	/*
	 * ttm_bo_reserve can only fail if the buffer reservation lock
	 * is held in circumstances that would deadlock
	 */
	BUG_ON(r != 0);
	radeon_bo_kunmap(mem->bo);
	radeon_bo_unreserve(mem->bo);
}
+
+static uint64_t get_vmem_size(struct kgd_dev *kgd)
+{
+ struct radeon_device *rdev = (struct radeon_device *)kgd;
+
+ BUG_ON(kgd == NULL);
+
+ return rdev->mc.real_vram_size;
+}
+
+static uint64_t get_gpu_clock_counter(struct kgd_dev *kgd)
+{
+ struct radeon_device *rdev = (struct radeon_device *)kgd;
+
+ return rdev->asic->get_gpu_clock_counter(rdev);
+}
+
/* Maximum engine clock on AC power, converted to MHz for amdkfd. */
static uint32_t get_max_engine_clock_in_mhz(struct kgd_dev *kgd)
{
	struct radeon_device *rdev = (struct radeon_device *)kgd;

	/* The sclk is in quantas of 10kHz */
	return rdev->pm.dpm.dyn_state.max_clock_voltage_on_ac.sclk / 100;
}
+
+/*
+ * kfd/radeon registers access interface
+ */
+
/*
 * Return the low 32 bits of @x.
 *
 * Made static: a plain `inline` definition in a .c file has no external
 * definition under C99/C11 inline rules and triggers missing-prototype
 * warnings; the helper is only used in this file.
 */
static inline uint32_t lower_32(uint64_t x)
{
	return (uint32_t)x;
}
+
/*
 * Return the high 32 bits of @x.
 *
 * Made static: a plain `inline` definition in a .c file has no external
 * definition under C99/C11 inline rules and triggers missing-prototype
 * warnings; the helper is only used in this file.
 */
static inline uint32_t upper_32(uint64_t x)
{
	return (uint32_t)(x >> 32);
}
+
/* A kgd_dev handle is simply the radeon_device it was created from. */
static inline struct radeon_device *get_radeon_device(struct kgd_dev *kgd)
{
	return (struct radeon_device *)kgd;
}
+
/* 32-bit MMIO write at byte @offset into the register BAR (rdev->rmmio). */
static void write_register(struct kgd_dev *kgd, uint32_t offset, uint32_t value)
{
	struct radeon_device *rdev = get_radeon_device(kgd);

	writel(value, (void __iomem *)(rdev->rmmio + offset));
}
+
/* 32-bit MMIO read at byte @offset from the register BAR (rdev->rmmio). */
static uint32_t read_register(struct kgd_dev *kgd, uint32_t offset)
{
	struct radeon_device *rdev = get_radeon_device(kgd);

	return readl((void __iomem *)(rdev->rmmio + offset));
}
+
/*
 * Select the SRBM register bank for (mec, pipe, queue, vmid) and take
 * srbm_mutex so banked register accesses that follow are serialized
 * against the rest of radeon.  Must be paired with unlock_srbm(), which
 * restores bank 0 and drops the mutex.
 */
static void lock_srbm(struct kgd_dev *kgd, uint32_t mec, uint32_t pipe, uint32_t queue, uint32_t vmid)
{
	struct radeon_device *rdev = get_radeon_device(kgd);
	uint32_t value = PIPEID(pipe) | MEID(mec) | VMID(vmid) | QUEUEID(queue);

	mutex_lock(&rdev->srbm_mutex);
	write_register(kgd, SRBM_GFX_CNTL, value);
}
+
/* Restore SRBM bank 0 and release srbm_mutex (pairs with lock_srbm()). */
static void unlock_srbm(struct kgd_dev *kgd)
{
	struct radeon_device *rdev = get_radeon_device(kgd);

	write_register(kgd, SRBM_GFX_CNTL, 0);
	mutex_unlock(&rdev->srbm_mutex);
}
+
+static void acquire_queue(struct kgd_dev *kgd, uint32_t pipe_id, uint32_t queue_id)
+{
+ uint32_t mec = (++pipe_id / CIK_PIPE_PER_MEC) + 1;
+ uint32_t pipe = (pipe_id % CIK_PIPE_PER_MEC);
+
+ lock_srbm(kgd, mec, pipe, queue_id, 0);
+}
+
/* Release the SRBM bank taken by acquire_queue(). */
static void release_queue(struct kgd_dev *kgd)
{
	unlock_srbm(kgd);
}
+
/*
 * Program the per-VMID shader memory registers (SH_MEM_*) for @vmid on
 * behalf of amdkfd; the values are written through verbatim.
 */
static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid, uint32_t sh_mem_config,
		uint32_t sh_mem_ape1_base, uint32_t sh_mem_ape1_limit, uint32_t sh_mem_bases)
{
	/* Bank the registers to @vmid; mec/pipe/queue are irrelevant here. */
	lock_srbm(kgd, 0, 0, 0, vmid);

	write_register(kgd, SH_MEM_CONFIG, sh_mem_config);
	write_register(kgd, SH_MEM_APE1_BASE, sh_mem_ape1_base);
	write_register(kgd, SH_MEM_APE1_LIMIT, sh_mem_ape1_limit);
	write_register(kgd, SH_MEM_BASES, sh_mem_bases);

	unlock_srbm(kgd);
}
+
/*
 * Bind process address space @pasid to hardware @vmid in the ATC;
 * pasid == 0 clears the mapping.  Always returns 0.
 */
static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid, unsigned int vmid)
{
	/* We have to assume that there is no outstanding mapping.
	 * The ATC_VMID_PASID_MAPPING_UPDATE_STATUS bit could be 0 because a mapping
	 * is in progress or because a mapping finished and the SW cleared it.
	 * So the protocol is to always wait & clear.
	 */
	uint32_t pasid_mapping = (pasid == 0) ? 0 : (uint32_t)pasid | ATC_VMID_PASID_MAPPING_VALID;

	write_register(kgd, ATC_VMID0_PASID_MAPPING + vmid*sizeof(uint32_t), pasid_mapping);

	/* NOTE(review): unbounded busy-wait — consider a timeout in case
	 * the hardware never acknowledges the update. */
	while (!(read_register(kgd, ATC_VMID_PASID_MAPPING_UPDATE_STATUS) & (1U << vmid)))
		cpu_relax();
	write_register(kgd, ATC_VMID_PASID_MAPPING_UPDATE_STATUS, 1U << vmid);

	return 0;
}
+
/*
 * One-time setup of the shader memory registers for the compute VMIDs
 * (8-15) that are reserved for amdkfd.  Always returns 0.
 */
static int kgd_init_memory(struct kgd_dev *kgd)
{
	/* Configure apertures:
	 * LDS: 0x60000000'00000000 - 0x60000001'00000000 (4GB)
	 * Scratch: 0x60000001'00000000 - 0x60000002'00000000 (4GB)
	 * GPUVM: 0x60010000'00000000 - 0x60020000'00000000 (1TB)
	 */
	int i;
	uint32_t sh_mem_bases = PRIVATE_BASE(0x6000) | SHARED_BASE(0x6000);

	/* VMIDs 8-15 match the compute_vmid_bitmap (0xFF00) handed to
	 * amdkfd in radeon_kfd_device_init(). */
	for (i = 8; i < 16; i++) {
		uint32_t sh_mem_config;

		/* Bank the SH_MEM_* registers to VMID i. */
		lock_srbm(kgd, 0, 0, 0, i);

		sh_mem_config = ALIGNMENT_MODE(SH_MEM_ALIGNMENT_MODE_UNALIGNED);
		sh_mem_config |= DEFAULT_MTYPE(MTYPE_NONCACHED);

		write_register(kgd, SH_MEM_CONFIG, sh_mem_config);

		write_register(kgd, SH_MEM_BASES, sh_mem_bases);

		/* Scratch aperture is not supported for now. */
		write_register(kgd, SH_STATIC_MEM_CONFIG, 0);

		/* APE1 disabled for now. */
		/* NOTE(review): base=1/limit=0 presumably yields an empty
		 * APE1 range, disabling it — confirm against the register
		 * documentation. */
		write_register(kgd, SH_MEM_APE1_BASE, 1);
		write_register(kgd, SH_MEM_APE1_LIMIT, 0);

		unlock_srbm(kgd);
	}

	return 0;
}
+
+static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id, uint32_t hpd_size, uint64_t hpd_gpu_addr)
+{
+ uint32_t mec = (++pipe_id / CIK_PIPE_PER_MEC) + 1;
+ uint32_t pipe = (pipe_id % CIK_PIPE_PER_MEC);
+
+ lock_srbm(kgd, mec, pipe, 0, 0);
+ write_register(kgd, CP_HPD_EOP_BASE_ADDR, lower_32(hpd_gpu_addr >> 8));
+ write_register(kgd, CP_HPD_EOP_BASE_ADDR_HI, upper_32(hpd_gpu_addr >> 8));
+ write_register(kgd, CP_HPD_EOP_VMID, 0);
+ write_register(kgd, CP_HPD_EOP_CONTROL, hpd_size);
+ unlock_srbm(kgd);
+
+ return 0;
+}
+
/* The opaque mqd pointer passed in by amdkfd is a CPU pointer to a
 * struct cik_mqd. */
static inline struct cik_mqd *get_mqd(void *mqd)
{
	return (struct cik_mqd *)mqd;
}
+
/*
 * Load an MQD snapshot into the HQD registers of (pipe_id, queue_id),
 * activating the queue.  The write-pointer shadow is read from the user
 * space address @wptr; if that read faults, the CP_HQD_PQ_WPTR write is
 * skipped.  Always returns 0.
 */
static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id, uint32_t queue_id, uint32_t __user *wptr)
{
	uint32_t wptr_shadow, is_wptr_shadow_valid;
	struct cik_mqd *m;

	m = get_mqd(mqd);

	/* get_user() returns 0 on success.  Done before acquire_queue()
	 * so the user-space access happens outside the banked section. */
	is_wptr_shadow_valid = !get_user(wptr_shadow, wptr);

	acquire_queue(kgd, pipe_id, queue_id);
	write_register(kgd, CP_MQD_BASE_ADDR, m->queue_state.cp_mqd_base_addr);
	write_register(kgd, CP_MQD_BASE_ADDR_HI, m->queue_state.cp_mqd_base_addr_hi);
	write_register(kgd, CP_MQD_CONTROL, m->queue_state.cp_mqd_control);

	write_register(kgd, CP_HQD_PQ_BASE, m->queue_state.cp_hqd_pq_base);
	write_register(kgd, CP_HQD_PQ_BASE_HI, m->queue_state.cp_hqd_pq_base_hi);
	write_register(kgd, CP_HQD_PQ_CONTROL, m->queue_state.cp_hqd_pq_control);

	write_register(kgd, CP_HQD_IB_CONTROL, m->queue_state.cp_hqd_ib_control);
	write_register(kgd, CP_HQD_IB_BASE_ADDR, m->queue_state.cp_hqd_ib_base_addr);
	write_register(kgd, CP_HQD_IB_BASE_ADDR_HI, m->queue_state.cp_hqd_ib_base_addr_hi);

	write_register(kgd, CP_HQD_IB_RPTR, m->queue_state.cp_hqd_ib_rptr);

	write_register(kgd, CP_HQD_PERSISTENT_STATE, m->queue_state.cp_hqd_persistent_state);
	write_register(kgd, CP_HQD_SEMA_CMD, m->queue_state.cp_hqd_sema_cmd);
	write_register(kgd, CP_HQD_MSG_TYPE, m->queue_state.cp_hqd_msg_type);

	write_register(kgd, CP_HQD_ATOMIC0_PREOP_LO, m->queue_state.cp_hqd_atomic0_preop_lo);
	write_register(kgd, CP_HQD_ATOMIC0_PREOP_HI, m->queue_state.cp_hqd_atomic0_preop_hi);
	write_register(kgd, CP_HQD_ATOMIC1_PREOP_LO, m->queue_state.cp_hqd_atomic1_preop_lo);
	write_register(kgd, CP_HQD_ATOMIC1_PREOP_HI, m->queue_state.cp_hqd_atomic1_preop_hi);

	write_register(kgd, CP_HQD_PQ_RPTR_REPORT_ADDR, m->queue_state.cp_hqd_pq_rptr_report_addr);
	write_register(kgd, CP_HQD_PQ_RPTR_REPORT_ADDR_HI, m->queue_state.cp_hqd_pq_rptr_report_addr_hi);
	write_register(kgd, CP_HQD_PQ_RPTR, m->queue_state.cp_hqd_pq_rptr);

	write_register(kgd, CP_HQD_PQ_WPTR_POLL_ADDR, m->queue_state.cp_hqd_pq_wptr_poll_addr);
	write_register(kgd, CP_HQD_PQ_WPTR_POLL_ADDR_HI, m->queue_state.cp_hqd_pq_wptr_poll_addr_hi);

	write_register(kgd, CP_HQD_PQ_DOORBELL_CONTROL, m->queue_state.cp_hqd_pq_doorbell_control);

	write_register(kgd, CP_HQD_VMID, m->queue_state.cp_hqd_vmid);

	write_register(kgd, CP_HQD_QUANTUM, m->queue_state.cp_hqd_quantum);

	write_register(kgd, CP_HQD_PIPE_PRIORITY, m->queue_state.cp_hqd_pipe_priority);
	write_register(kgd, CP_HQD_QUEUE_PRIORITY, m->queue_state.cp_hqd_queue_priority);

	write_register(kgd, CP_HQD_HQ_SCHEDULER0, m->queue_state.cp_hqd_hq_scheduler0);
	write_register(kgd, CP_HQD_HQ_SCHEDULER1, m->queue_state.cp_hqd_hq_scheduler1);

	if (is_wptr_shadow_valid)
		write_register(kgd, CP_HQD_PQ_WPTR, wptr_shadow);

	/* CP_HQD_ACTIVE is written last: it arms the queue, so every other
	 * register must already hold its final value. */
	write_register(kgd, CP_HQD_ACTIVE, m->queue_state.cp_hqd_active);
	release_queue(kgd);

	return 0;
}
+
/*
 * Return true when the HQD at (pipe_id, queue_id) is active AND currently
 * bound to the ring buffer at @queue_address (compared in 256-byte units,
 * matching CP_HQD_PQ_BASE granularity).
 * NOTE(review): name should probably be kgd_hqd_is_occupied; kept as-is
 * to match the kfd2kgd interface.
 */
static bool kgd_hqd_is_occupies(struct kgd_dev *kgd, uint64_t queue_address, uint32_t pipe_id, uint32_t queue_id)
{
	uint32_t act;
	bool retval = false;
	uint32_t low, high;

	acquire_queue(kgd, pipe_id, queue_id);
	act = read_register(kgd, CP_HQD_ACTIVE);
	if (act) {
		low = lower_32(queue_address >> 8);
		high = upper_32(queue_address >> 8);

		if (low == read_register(kgd, CP_HQD_PQ_BASE) &&
		    high == read_register(kgd, CP_HQD_PQ_BASE_HI))
			retval = true;
	}
	release_queue(kgd);
	return retval;
}
+
+static int kgd_hqd_destroy(struct kgd_dev *kgd, bool is_reset,
+ unsigned int timeout, uint32_t pipe_id,
+ uint32_t queue_id)
+{
+ int status = 0;
+ bool sync = (timeout > 0) ? true : false;
+
+ acquire_queue(kgd, pipe_id, queue_id);
+ write_register(kgd, CP_HQD_PQ_DOORBELL_CONTROL, 0);
+
+ if (is_reset)
+ write_register(kgd, CP_HQD_DEQUEUE_REQUEST, DEQUEUE_REQUEST_RESET);
+ else
+ write_register(kgd, CP_HQD_DEQUEUE_REQUEST, DEQUEUE_REQUEST_DRAIN);
+
+
+ while (read_register(kgd, CP_HQD_ACTIVE) != 0) {
+ if (sync && timeout <= 0) {
+ status = -EBUSY;
+ break;
+ }
+ msleep(20);
+ if (sync) {
+ if (timeout >= 20)
+ timeout -= 20;
+ else
+ timeout = 0;
+ }
+ }
+ release_queue(kgd);
+ return status;
+}
diff --git a/drivers/gpu/drm/radeon/radeon_kfd.h b/drivers/gpu/drm/radeon/radeon_kfd.h
new file mode 100644
index 0000000..5171726
--- /dev/null
+++ b/drivers/gpu/drm/radeon/radeon_kfd.h
@@ -0,0 +1,119 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/*
+ * radeon_kfd.h defines the private interface between the
+ * AMD kernel graphics drivers and the AMD KFD.
+ */
+
+#ifndef RADEON_KFD_H_INCLUDED
+#define RADEON_KFD_H_INCLUDED
+
+#include <linux/types.h>
+
+struct pci_dev;
+
+#define KFD_INTERFACE_VERSION 1
+
+struct kfd_dev;
+struct kgd_dev;
+
+struct kgd_mem;
+
+struct radeon_device;
+
/* Memory pools amdkfd can allocate from via allocate_mem(). */
enum kgd_memory_pool {
	KGD_POOL_SYSTEM_CACHEABLE = 1,
	KGD_POOL_SYSTEM_WRITECOMBINE = 2,
	KGD_POOL_FRAMEBUFFER = 3,	/* VRAM-backed */
};
+
/* GPU resources that radeon reserves for amdkfd's exclusive use. */
struct kgd2kfd_shared_resources {
	unsigned int compute_vmid_bitmap; /* Bit n == 1 means VMID n is available for KFD. */

	unsigned int first_compute_pipe; /* Compute pipes are counted starting from MEC0/pipe0 as 0. */
	unsigned int compute_pipe_count; /* Number of MEC pipes available for KFD. */

	phys_addr_t doorbell_physical_address; /* Base address of doorbell aperture. */
	size_t doorbell_aperture_size; /* Size in bytes of doorbell aperture. */
	size_t doorbell_start_offset; /* Number of bytes at start of aperture reserved for KGD. */
};
+
/* Calls implemented by amdkfd for the graphics driver (kgd) side,
 * obtained through kgd2kfd_init().
 */
struct kgd2kfd_calls {
	void (*exit)(void);	/* module-unload cleanup */
	struct kfd_dev* (*probe)(struct kgd_dev *kgd, struct pci_dev *pdev);	/* may return NULL to decline a device */
	bool (*device_init)(struct kfd_dev *kfd, const struct kgd2kfd_shared_resources *gpu_resources);
	void (*device_exit)(struct kfd_dev *kfd);
	void (*interrupt)(struct kfd_dev *kfd, const void *ih_ring_entry);	/* called per IH ring entry */
	void (*suspend)(struct kfd_dev *kfd);
	int (*resume)(struct kfd_dev *kfd);
};
+
/* Services the graphics driver (kgd == radeon) provides to amdkfd; the
 * concrete implementations live in radeon_kfd.c.
 */
struct kfd2kgd_calls {
	/* Memory management. */
	int (*allocate_mem)(struct kgd_dev *kgd,
			size_t size,
			size_t alignment,
			enum kgd_memory_pool pool,
			struct kgd_mem **memory_handle);

	void (*free_mem)(struct kgd_dev *kgd, struct kgd_mem *memory_handle);

	int (*gpumap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *vmid0_address);
	void (*ungpumap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem);

	int (*kmap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem, void **ptr);
	void (*unkmap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem);

	uint64_t (*get_vmem_size)(struct kgd_dev *kgd);
	uint64_t (*get_gpu_clock_counter)(struct kgd_dev *kgd);

	uint32_t (*get_max_engine_clock_in_mhz)(struct kgd_dev *kgd);

	/* Register access functions */
	void (*program_sh_mem_settings)(struct kgd_dev *kgd, uint32_t vmid, uint32_t sh_mem_config,
			uint32_t sh_mem_ape1_base, uint32_t sh_mem_ape1_limit, uint32_t sh_mem_bases);
	int (*set_pasid_vmid_mapping)(struct kgd_dev *kgd, unsigned int pasid, unsigned int vmid);
	int (*init_memory)(struct kgd_dev *kgd);
	int (*init_pipeline)(struct kgd_dev *kgd, uint32_t pipe_id, uint32_t hpd_size, uint64_t hpd_gpu_addr);
	int (*hqd_load)(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id, uint32_t queue_id, uint32_t __user *wptr);
	bool (*hqd_is_occupies)(struct kgd_dev *kgd, uint64_t queue_address, uint32_t pipe_id, uint32_t queue_id);
	int (*hqd_destroy)(struct kgd_dev *kgd, bool is_reset, unsigned int timeout,
			uint32_t pipe_id, uint32_t queue_id);
};
+
+bool radeon_kfd_init(void);
+void radeon_kfd_fini(void);
+bool kgd2kfd_init(unsigned interface_version,
+ const struct kfd2kgd_calls *f2g,
+ const struct kgd2kfd_calls **g2f);
+
+void radeon_kfd_suspend(struct radeon_device *rdev);
+int radeon_kfd_resume(struct radeon_device *rdev);
+void radeon_kfd_interrupt(struct radeon_device *rdev,
+ const void *ih_ring_entry);
+void radeon_kfd_device_probe(struct radeon_device *rdev);
+void radeon_kfd_device_init(struct radeon_device *rdev);
+void radeon_kfd_device_fini(struct radeon_device *rdev);
+
+#endif
+
diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
index 35d9318..929beda 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -34,6 +34,8 @@
#include <linux/slab.h>
#include <linux/pm_runtime.h>
+#include "radeon_kfd.h"
+
#if defined(CONFIG_VGA_SWITCHEROO)
bool radeon_has_atpx(void);
#else
@@ -63,6 +65,8 @@ int radeon_driver_unload_kms(struct drm_device *dev)
pm_runtime_get_sync(dev->dev);
+ radeon_kfd_device_fini(rdev);
+
radeon_acpi_fini(rdev);
radeon_modeset_fini(rdev);
@@ -142,6 +146,9 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags)
"Error during ACPI methods call\n");
}
+ radeon_kfd_device_probe(rdev);
+ radeon_kfd_device_init(rdev);
+
if (radeon_is_px(dev)) {
pm_runtime_use_autosuspend(dev->dev);
pm_runtime_set_autosuspend_delay(dev->dev, 5000);
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* Re: [PATCH v2 06/25] drm/radeon: Add radeon <--> amdkfd interface
2014-07-17 13:29 ` [PATCH v2 06/25] drm/radeon: Add radeon <--> amdkfd interface Oded Gabbay
@ 2014-07-20 17:35 ` Jerome Glisse
2014-08-02 20:07 ` Oded Gabbay
0 siblings, 1 reply; 46+ messages in thread
From: Jerome Glisse @ 2014-07-20 17:35 UTC (permalink / raw)
To: Oded Gabbay
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, dri-devel,
Evgeny Pinchuk, Alexey Skidanov, Alex Deucher, Andrew Morton,
Christian König
On Thu, Jul 17, 2014 at 04:29:13PM +0300, Oded Gabbay wrote:
> This patch adds the interface between the radeon driver and the amdkfd driver.
> The interface implementation is contained in radeon_kfd.c and radeon_kfd.h.
>
> The interface itself is represented by a pointer to struct
> kfd_dev. The pointer is located inside radeon_device structure.
>
> All the register accesses that amdkfd need are done using this interface. This
> allows us to avoid direct register accesses in amdkfd proper, while also
> avoiding locking between amdkfd and radeon.
>
> The single exception is the doorbells that are used in both of the drivers.
> However, because they are located in separate pci bar pages, the danger of
> sharing registers between the drivers is minimal.
>
> Having said that, we are planning to move the doorbells as well to radeon.
>
> The loading of the amdkfd module is done via symbol lookup. According to the code review discussions, this may change in v3 of the patch set.
>
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> ---
> drivers/gpu/drm/radeon/Makefile | 1 +
> drivers/gpu/drm/radeon/cik.c | 9 +
> drivers/gpu/drm/radeon/cik_reg.h | 65 +++++
> drivers/gpu/drm/radeon/cikd.h | 51 +++-
> drivers/gpu/drm/radeon/radeon.h | 3 +
> drivers/gpu/drm/radeon/radeon_drv.c | 5 +
> drivers/gpu/drm/radeon/radeon_kfd.c | 566 ++++++++++++++++++++++++++++++++++++
> drivers/gpu/drm/radeon/radeon_kfd.h | 119 ++++++++
> drivers/gpu/drm/radeon/radeon_kms.c | 7 +
> 9 files changed, 825 insertions(+), 1 deletion(-)
> create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.c
> create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.h
>
> diff --git a/drivers/gpu/drm/radeon/Makefile b/drivers/gpu/drm/radeon/Makefile
> index 1b04002..a1c913d 100644
> --- a/drivers/gpu/drm/radeon/Makefile
> +++ b/drivers/gpu/drm/radeon/Makefile
> @@ -104,6 +104,7 @@ radeon-y += \
> radeon_vce.o \
> vce_v1_0.o \
> vce_v2_0.o \
> + radeon_kfd.o
>
> radeon-$(CONFIG_COMPAT) += radeon_ioc32.o
> radeon-$(CONFIG_VGA_SWITCHEROO) += radeon_atpx_handler.o
> diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
> index b4bbc22..6f71095 100644
> --- a/drivers/gpu/drm/radeon/cik.c
> +++ b/drivers/gpu/drm/radeon/cik.c
> @@ -32,6 +32,7 @@
> #include "cik_blit_shaders.h"
> #include "radeon_ucode.h"
> #include "clearstate_ci.h"
> +#include "radeon_kfd.h"
>
> MODULE_FIRMWARE("radeon/BONAIRE_pfp.bin");
> MODULE_FIRMWARE("radeon/BONAIRE_me.bin");
> @@ -7727,6 +7728,9 @@ restart_ih:
> while (rptr != wptr) {
> /* wptr/rptr are in bytes! */
> ring_index = rptr / 4;
> +
> + radeon_kfd_interrupt(rdev, (const void *) &rdev->ih.ring[ring_index]);
> +
> src_id = le32_to_cpu(rdev->ih.ring[ring_index]) & 0xff;
> src_data = le32_to_cpu(rdev->ih.ring[ring_index + 1]) & 0xfffffff;
> ring_id = le32_to_cpu(rdev->ih.ring[ring_index + 2]) & 0xff;
> @@ -8386,6 +8390,10 @@ static int cik_startup(struct radeon_device *rdev)
> if (r)
> return r;
>
> + r = radeon_kfd_resume(rdev);
> + if (r)
> + return r;
> +
> return 0;
> }
>
> @@ -8434,6 +8442,7 @@ int cik_resume(struct radeon_device *rdev)
> */
> int cik_suspend(struct radeon_device *rdev)
> {
> + radeon_kfd_suspend(rdev);
> radeon_pm_suspend(rdev);
> dce6_audio_fini(rdev);
> radeon_vm_manager_fini(rdev);
> diff --git a/drivers/gpu/drm/radeon/cik_reg.h b/drivers/gpu/drm/radeon/cik_reg.h
> index ca1bb61..1ab3dbc 100644
> --- a/drivers/gpu/drm/radeon/cik_reg.h
> +++ b/drivers/gpu/drm/radeon/cik_reg.h
> @@ -147,4 +147,69 @@
>
> #define CIK_LB_DESKTOP_HEIGHT 0x6b0c
>
> +struct cik_hqd_registers {
> + u32 cp_mqd_base_addr;
> + u32 cp_mqd_base_addr_hi;
> + u32 cp_hqd_active;
> + u32 cp_hqd_vmid;
> + u32 cp_hqd_persistent_state;
> + u32 cp_hqd_pipe_priority;
> + u32 cp_hqd_queue_priority;
> + u32 cp_hqd_quantum;
> + u32 cp_hqd_pq_base;
> + u32 cp_hqd_pq_base_hi;
> + u32 cp_hqd_pq_rptr;
> + u32 cp_hqd_pq_rptr_report_addr;
> + u32 cp_hqd_pq_rptr_report_addr_hi;
> + u32 cp_hqd_pq_wptr_poll_addr;
> + u32 cp_hqd_pq_wptr_poll_addr_hi;
> + u32 cp_hqd_pq_doorbell_control;
> + u32 cp_hqd_pq_wptr;
> + u32 cp_hqd_pq_control;
> + u32 cp_hqd_ib_base_addr;
> + u32 cp_hqd_ib_base_addr_hi;
> + u32 cp_hqd_ib_rptr;
> + u32 cp_hqd_ib_control;
> + u32 cp_hqd_iq_timer;
> + u32 cp_hqd_iq_rptr;
> + u32 cp_hqd_dequeue_request;
> + u32 cp_hqd_dma_offload;
> + u32 cp_hqd_sema_cmd;
> + u32 cp_hqd_msg_type;
> + u32 cp_hqd_atomic0_preop_lo;
> + u32 cp_hqd_atomic0_preop_hi;
> + u32 cp_hqd_atomic1_preop_lo;
> + u32 cp_hqd_atomic1_preop_hi;
> + u32 cp_hqd_hq_scheduler0;
> + u32 cp_hqd_hq_scheduler1;
> + u32 cp_mqd_control;
> +};
> +
> +struct cik_mqd {
> + u32 header;
> + u32 dispatch_initiator;
> + u32 dimensions[3];
> + u32 start_idx[3];
> + u32 num_threads[3];
> + u32 pipeline_stat_enable;
> + u32 perf_counter_enable;
> + u32 pgm[2];
> + u32 tba[2];
> + u32 tma[2];
> + u32 pgm_rsrc[2];
> + u32 vmid;
> + u32 resource_limits;
> + u32 static_thread_mgmt01[2];
> + u32 tmp_ring_size;
> + u32 static_thread_mgmt23[2];
> + u32 restart[3];
> + u32 thread_trace_enable;
> + u32 reserved1;
> + u32 user_data[16];
> + u32 vgtcs_invoke_count[2];
> + struct cik_hqd_registers queue_state;
> + u32 dequeue_cntr;
> + u32 interrupt_queue[64];
> +};
> +
> #endif
> diff --git a/drivers/gpu/drm/radeon/cikd.h b/drivers/gpu/drm/radeon/cikd.h
> index 0c6e1b5..0a2a403 100644
> --- a/drivers/gpu/drm/radeon/cikd.h
> +++ b/drivers/gpu/drm/radeon/cikd.h
> @@ -1137,6 +1137,9 @@
> #define SH_MEM_ALIGNMENT_MODE_UNALIGNED 3
> #define DEFAULT_MTYPE(x) ((x) << 4)
> #define APE1_MTYPE(x) ((x) << 7)
> +/* valid for both DEFAULT_MTYPE and APE1_MTYPE */
> +#define MTYPE_CACHED 0
> +#define MTYPE_NONCACHED 3
>
> #define SX_DEBUG_1 0x9060
>
> @@ -1447,6 +1450,16 @@
> #define CP_HQD_ACTIVE 0xC91C
> #define CP_HQD_VMID 0xC920
>
> +#define CP_HQD_PERSISTENT_STATE 0xC924u
> +#define DEFAULT_CP_HQD_PERSISTENT_STATE (0x33U << 8)
> +
> +#define CP_HQD_PIPE_PRIORITY 0xC928u
> +#define CP_HQD_QUEUE_PRIORITY 0xC92Cu
> +#define CP_HQD_QUANTUM 0xC930u
> +#define QUANTUM_EN 1U
> +#define QUANTUM_SCALE_1MS (1U << 4)
> +#define QUANTUM_DURATION(x) ((x) << 8)
> +
We need documentation for these queue/pipe priorities to know their
granularity and how they are used exactly.
> #define CP_HQD_PQ_BASE 0xC934
> #define CP_HQD_PQ_BASE_HI 0xC938
> #define CP_HQD_PQ_RPTR 0xC93C
> @@ -1474,12 +1487,32 @@
> #define PRIV_STATE (1 << 30)
> #define KMD_QUEUE (1 << 31)
>
> -#define CP_HQD_DEQUEUE_REQUEST 0xC974
> +#define CP_HQD_IB_BASE_ADDR 0xC95Cu
> +#define CP_HQD_IB_BASE_ADDR_HI 0xC960u
> +#define CP_HQD_IB_RPTR 0xC964u
> +#define CP_HQD_IB_CONTROL 0xC968u
> +#define IB_ATC_EN (1U << 23)
> +#define DEFAULT_MIN_IB_AVAIL_SIZE (3U << 20)
> +
> +#define CP_HQD_DEQUEUE_REQUEST 0xC974
> +#define DEQUEUE_REQUEST_DRAIN 1
> +#define DEQUEUE_REQUEST_RESET 2
>
> #define CP_MQD_CONTROL 0xC99C
> #define MQD_VMID(x) ((x) << 0)
> #define MQD_VMID_MASK (0xf << 0)
>
> +#define CP_HQD_SEMA_CMD 0xC97Cu
> +#define CP_HQD_MSG_TYPE 0xC980u
> +#define CP_HQD_ATOMIC0_PREOP_LO 0xC984u
> +#define CP_HQD_ATOMIC0_PREOP_HI 0xC988u
> +#define CP_HQD_ATOMIC1_PREOP_LO 0xC98Cu
> +#define CP_HQD_ATOMIC1_PREOP_HI 0xC990u
> +#define CP_HQD_HQ_SCHEDULER0 0xC994u
> +#define CP_HQD_HQ_SCHEDULER1 0xC998u
> +
> +#define SH_STATIC_MEM_CONFIG 0x9604u
Same here, documentation is needed for all those registers.
> +
> #define DB_RENDER_CONTROL 0x28000
>
> #define PA_SC_RASTER_CONFIG 0x28350
> @@ -2069,4 +2102,20 @@
> #define VCE_CMD_IB_AUTO 0x00000005
> #define VCE_CMD_SEMAPHORE 0x00000006
>
> +#define ATC_VMID0_PASID_MAPPING 0x339Cu
> +#define ATC_VMID_PASID_MAPPING_UPDATE_STATUS 0x3398u
> +#define ATC_VMID_PASID_MAPPING_VALID (1U << 31)
> +
> +#define ATC_VM_APERTURE0_CNTL 0x3310u
> +#define ATS_ACCESS_MODE_NEVER 0
> +#define ATS_ACCESS_MODE_ALWAYS 1
> +
> +#define ATC_VM_APERTURE0_CNTL2 0x3318u
> +#define ATC_VM_APERTURE0_HIGH_ADDR 0x3308u
> +#define ATC_VM_APERTURE0_LOW_ADDR 0x3300u
> +#define ATC_VM_APERTURE1_CNTL 0x3314u
> +#define ATC_VM_APERTURE1_CNTL2 0x331Cu
> +#define ATC_VM_APERTURE1_HIGH_ADDR 0x330Cu
> +#define ATC_VM_APERTURE1_LOW_ADDR 0x3304u
> +
> #endif
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index 5136855..94b38a7 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -2342,6 +2342,9 @@ struct radeon_device {
>
> struct dev_pm_domain vga_pm_domain;
> bool have_disp_power_ref;
> +
> + /* HSA KFD interface */
> + struct kfd_dev *kfd;
> };
>
> bool radeon_is_px(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
> index cb14213..efaa086 100644
> --- a/drivers/gpu/drm/radeon/radeon_drv.c
> +++ b/drivers/gpu/drm/radeon/radeon_drv.c
> @@ -39,6 +39,8 @@
> #include <linux/pm_runtime.h>
> #include <linux/vga_switcheroo.h>
> #include "drm_crtc_helper.h"
> +#include "radeon_kfd.h"
> +
> /*
> * KMS wrapper.
> * - 2.0.0 - initial interface
> @@ -630,12 +632,15 @@ static int __init radeon_init(void)
> #endif
> }
>
> + radeon_kfd_init();
> +
> /* let modprobe override vga console setting */
> return drm_pci_init(driver, pdriver);
> }
>
> static void __exit radeon_exit(void)
> {
> + radeon_kfd_fini();
> drm_pci_exit(driver, pdriver);
> radeon_unregister_atpx_handler();
> }
> diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
> new file mode 100644
> index 0000000..0385239
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/radeon_kfd.c
> @@ -0,0 +1,566 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/fdtable.h>
> +#include <linux/uaccess.h>
> +#include <drm/drmP.h>
> +#include "radeon.h"
> +#include "cikd.h"
> +#include "cik_reg.h"
> +#include "radeon_kfd.h"
> +
> +#define CIK_PIPE_PER_MEC (4)
> +
> +struct kgd_mem {
> + struct radeon_bo *bo;
> + u32 domain;
> + struct radeon_bo_va *bo_va;
> +};
> +
> +static int allocate_mem(struct kgd_dev *kgd, size_t size, size_t alignment,
> + enum kgd_memory_pool pool, struct kgd_mem **memory_handle);
> +
> +static void free_mem(struct kgd_dev *kgd, struct kgd_mem *memory_handle);
> +
> +static int gpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *vmid0_address);
> +static void ungpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem);
> +
> +static int kmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, void **ptr);
> +static void unkmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem);
> +
> +static uint64_t get_vmem_size(struct kgd_dev *kgd);
> +static uint64_t get_gpu_clock_counter(struct kgd_dev *kgd);
> +
> +static uint32_t get_max_engine_clock_in_mhz(struct kgd_dev *kgd);
> +
> +/*
> + * Register access functions
> + */
> +
> +static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid, uint32_t sh_mem_config,
> + uint32_t sh_mem_ape1_base, uint32_t sh_mem_ape1_limit, uint32_t sh_mem_bases);
> +static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid, unsigned int vmid);
> +static int kgd_init_memory(struct kgd_dev *kgd);
> +static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id, uint32_t hpd_size, uint64_t hpd_gpu_addr);
> +static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id, uint32_t queue_id, uint32_t __user *wptr);
> +static bool kgd_hqd_is_occupies(struct kgd_dev *kgd, uint64_t queue_address, uint32_t pipe_id, uint32_t queue_id);
> +static int kgd_hqd_destroy(struct kgd_dev *kgd, bool is_reset, unsigned int timeout,
> + uint32_t pipe_id, uint32_t queue_id);
> +
> +static const struct kfd2kgd_calls kfd2kgd = {
> + .allocate_mem = allocate_mem,
> + .free_mem = free_mem,
> + .gpumap_mem = gpumap_mem,
> + .ungpumap_mem = ungpumap_mem,
> + .kmap_mem = kmap_mem,
> + .unkmap_mem = unkmap_mem,
> + .get_vmem_size = get_vmem_size,
> + .get_gpu_clock_counter = get_gpu_clock_counter,
> + .get_max_engine_clock_in_mhz = get_max_engine_clock_in_mhz,
> + .program_sh_mem_settings = kgd_program_sh_mem_settings,
> + .set_pasid_vmid_mapping = kgd_set_pasid_vmid_mapping,
> + .init_memory = kgd_init_memory,
> + .init_pipeline = kgd_init_pipeline,
> + .hqd_load = kgd_hqd_load,
> + .hqd_is_occupies = kgd_hqd_is_occupies,
> + .hqd_destroy = kgd_hqd_destroy,
> +};
> +
> +static const struct kgd2kfd_calls *kgd2kfd;
> +
> +bool radeon_kfd_init(void)
> +{
> + bool (*kgd2kfd_init_p)(unsigned, const struct kfd2kgd_calls*,
> + const struct kgd2kfd_calls**);
> +
> + kgd2kfd_init_p = symbol_request(kgd2kfd_init);
> +
> + if (kgd2kfd_init_p == NULL)
> + return false;
> +
> + if (!kgd2kfd_init_p(KFD_INTERFACE_VERSION, &kfd2kgd, &kgd2kfd)) {
> + symbol_put(kgd2kfd_init);
> + kgd2kfd = NULL;
> +
> + return false;
> + }
> +
> + return true;
> +}
> +
> +void radeon_kfd_fini(void)
> +{
> + if (kgd2kfd) {
> + kgd2kfd->exit();
> + symbol_put(kgd2kfd_init);
> + }
> +}
> +
> +void radeon_kfd_device_probe(struct radeon_device *rdev)
> +{
> + if (kgd2kfd)
> + rdev->kfd = kgd2kfd->probe((struct kgd_dev *)rdev, rdev->pdev);
> +}
> +
> +void radeon_kfd_device_init(struct radeon_device *rdev)
> +{
> + if (rdev->kfd) {
> + struct kgd2kfd_shared_resources gpu_resources = {
> + .compute_vmid_bitmap = 0xFF00,
> +
> + .first_compute_pipe = 1,
> + .compute_pipe_count = 8 - 1,
> + };
> +
> + radeon_doorbell_get_kfd_info(rdev,
> + &gpu_resources.doorbell_physical_address,
> + &gpu_resources.doorbell_aperture_size,
> + &gpu_resources.doorbell_start_offset);
> +
> + kgd2kfd->device_init(rdev->kfd, &gpu_resources);
> + }
> +}
> +
> +void radeon_kfd_device_fini(struct radeon_device *rdev)
> +{
> + if (rdev->kfd) {
> + kgd2kfd->device_exit(rdev->kfd);
> + rdev->kfd = NULL;
> + }
> +}
> +
> +void radeon_kfd_interrupt(struct radeon_device *rdev, const void *ih_ring_entry)
> +{
> + if (rdev->kfd)
> + kgd2kfd->interrupt(rdev->kfd, ih_ring_entry);
> +}
> +
> +void radeon_kfd_suspend(struct radeon_device *rdev)
> +{
> + if (rdev->kfd)
> + kgd2kfd->suspend(rdev->kfd);
> +}
> +
> +int radeon_kfd_resume(struct radeon_device *rdev)
> +{
> + int r = 0;
> +
> + if (rdev->kfd)
> + r = kgd2kfd->resume(rdev->kfd);
> +
> + return r;
> +}
All of the above wrapper functions should be moved to a header file and marked
as inline; this would allow for compiler optimization. I would still like
to see the possibility of building radeon without HSA.
> +
> +static u32 pool_to_domain(enum kgd_memory_pool p)
> +{
> + switch (p) {
> + case KGD_POOL_FRAMEBUFFER: return RADEON_GEM_DOMAIN_VRAM;
> + default: return RADEON_GEM_DOMAIN_GTT;
> + }
> +}
> +
> +static int allocate_mem(struct kgd_dev *kgd, size_t size, size_t alignment,
> + enum kgd_memory_pool pool, struct kgd_mem **memory_handle)
> +{
> + struct radeon_device *rdev = (struct radeon_device *)kgd;
> + struct kgd_mem *mem;
> + int r;
> +
> + mem = kzalloc(sizeof(struct kgd_mem), GFP_KERNEL);
> + if (!mem)
> + return -ENOMEM;
> +
> + mem->domain = pool_to_domain(pool);
> +
> + r = radeon_bo_create(rdev, size, alignment, true, mem->domain, NULL, &mem->bo);
> + if (r) {
> + kfree(mem);
> + return r;
> + }
> +
> + *memory_handle = mem;
> + return 0;
> +}
> +
> +static void free_mem(struct kgd_dev *kgd, struct kgd_mem *mem)
> +{
> + /* Assume that KFD will never free gpumapped or kmapped memory. This is not quite settled. */
> + radeon_bo_unref(&mem->bo);
> + kfree(mem);
> +}
> +
> +static int gpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *vmid0_address)
> +{
> + int r;
> +
> + r = radeon_bo_reserve(mem->bo, true);
> +
> + /*
> + * ttm_bo_reserve can only fail if the buffer reservation lock
> + * is held in circumstances that would deadlock
> + */
> + BUG_ON(r != 0);
> + r = radeon_bo_pin(mem->bo, mem->domain, vmid0_address);
> + radeon_bo_unreserve(mem->bo);
> +
> + return r;
> +}
NACK NACK NACK, no radeon_bo_pin — this is not acceptable. Buffer pinning should be done
very seldom, and I would say only the radeon module can do it, and only for buffer objects
under its control. We certainly cannot accept doing that for buffer objects that are
under userspace management.
> +
> +static void ungpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem)
> +{
> + int r;
> +
> + r = radeon_bo_reserve(mem->bo, true);
> +
> + /*
> + * ttm_bo_reserve can only fail if the buffer reservation lock
> + * is held in circumstances that would deadlock
> + */
> + BUG_ON(r != 0);
> + r = radeon_bo_unpin(mem->bo);
> +
> + /*
> + * This unpin only removed NO_EVICT placement flags
> + * and should never fail
> + */
> + BUG_ON(r != 0);
> + radeon_bo_unreserve(mem->bo);
> +}
> +
> +static int kmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, void **ptr)
> +{
> + int r;
> +
> + r = radeon_bo_reserve(mem->bo, true);
> +
> + /*
> + * ttm_bo_reserve can only fail if the buffer reservation lock
> + * is held in circumstances that would deadlock
> + */
> + BUG_ON(r != 0);
> + r = radeon_bo_kmap(mem->bo, ptr);
> + radeon_bo_unreserve(mem->bo);
> +
> + return r;
> +}
> +
> +static void unkmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem)
> +{
> + int r;
> +
> + r = radeon_bo_reserve(mem->bo, true);
> + /*
> + * ttm_bo_reserve can only fail if the buffer reservation lock
> + * is held in circumstances that would deadlock
> + */
> + BUG_ON(r != 0);
> + radeon_bo_kunmap(mem->bo);
> + radeon_bo_unreserve(mem->bo);
> +}
> +
> +static uint64_t get_vmem_size(struct kgd_dev *kgd)
> +{
> + struct radeon_device *rdev = (struct radeon_device *)kgd;
> +
> + BUG_ON(kgd == NULL);
> +
> + return rdev->mc.real_vram_size;
> +}
> +
> +static uint64_t get_gpu_clock_counter(struct kgd_dev *kgd)
> +{
> + struct radeon_device *rdev = (struct radeon_device *)kgd;
> +
> + return rdev->asic->get_gpu_clock_counter(rdev);
> +}
> +
> +static uint32_t get_max_engine_clock_in_mhz(struct kgd_dev *kgd)
> +{
> + struct radeon_device *rdev = (struct radeon_device *)kgd;
> +
> + /* The sclk is in quantas of 10kHz */
> + return rdev->pm.dpm.dyn_state.max_clock_voltage_on_ac.sclk / 100;
> +}
> +
> +/*
> + * kfd/radeon registers access interface
> + */
> +
> +inline uint32_t lower_32(uint64_t x)
> +{
> + return (uint32_t)x;
> +}
> +
> +inline uint32_t upper_32(uint64_t x)
> +{
> + return (uint32_t)(x >> 32);
> +}
Use the appropriate macros (upper_32_bits, lower_32_bits) instead of those
inline functions.
> +
> +static inline struct radeon_device *get_radeon_device(struct kgd_dev *kgd)
> +{
> + return (struct radeon_device *)kgd;
> +}
> +
> +static void write_register(struct kgd_dev *kgd, uint32_t offset, uint32_t value)
> +{
> + struct radeon_device *rdev = get_radeon_device(kgd);
> +
> + writel(value, (void __iomem *)(rdev->rmmio + offset));
> +}
> +
> +static uint32_t read_register(struct kgd_dev *kgd, uint32_t offset)
> +{
> + struct radeon_device *rdev = get_radeon_device(kgd);
> +
> + return readl((void __iomem *)(rdev->rmmio + offset));
> +}
> +
> +static void lock_srbm(struct kgd_dev *kgd, uint32_t mec, uint32_t pipe, uint32_t queue, uint32_t vmid)
> +{
> + struct radeon_device *rdev = get_radeon_device(kgd);
> + uint32_t value = PIPEID(pipe) | MEID(mec) | VMID(vmid) | QUEUEID(queue);
> +
> + mutex_lock(&rdev->srbm_mutex);
> + write_register(kgd, SRBM_GFX_CNTL, value);
> +}
> +
> +static void unlock_srbm(struct kgd_dev *kgd)
> +{
> + struct radeon_device *rdev = get_radeon_device(kgd);
> +
> + write_register(kgd, SRBM_GFX_CNTL, 0);
> + mutex_unlock(&rdev->srbm_mutex);
> +}
> +
> +static void acquire_queue(struct kgd_dev *kgd, uint32_t pipe_id, uint32_t queue_id)
> +{
> + uint32_t mec = (++pipe_id / CIK_PIPE_PER_MEC) + 1;
> + uint32_t pipe = (pipe_id % CIK_PIPE_PER_MEC);
> +
> + lock_srbm(kgd, mec, pipe, queue_id, 0);
> +}
> +
> +static void release_queue(struct kgd_dev *kgd)
> +{
> + unlock_srbm(kgd);
> +}
> +
> +static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid, uint32_t sh_mem_config,
> + uint32_t sh_mem_ape1_base, uint32_t sh_mem_ape1_limit, uint32_t sh_mem_bases)
> +{
> + lock_srbm(kgd, 0, 0, 0, vmid);
> +
> + write_register(kgd, SH_MEM_CONFIG, sh_mem_config);
> + write_register(kgd, SH_MEM_APE1_BASE, sh_mem_ape1_base);
> + write_register(kgd, SH_MEM_APE1_LIMIT, sh_mem_ape1_limit);
> + write_register(kgd, SH_MEM_BASES, sh_mem_bases);
> +
> + unlock_srbm(kgd);
> +}
> +
> +static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid, unsigned int vmid)
> +{
> + /* We have to assume that there is no outstanding mapping.
> + * The ATC_VMID_PASID_MAPPING_UPDATE_STATUS bit could be 0 because a mapping
> + * is in progress or because a mapping finished and the SW cleared it.
> + * So the protocol is to always wait & clear.
> + */
> + uint32_t pasid_mapping = (pasid == 0) ? 0 : (uint32_t)pasid | ATC_VMID_PASID_MAPPING_VALID;
> +
> + write_register(kgd, ATC_VMID0_PASID_MAPPING + vmid*sizeof(uint32_t), pasid_mapping);
> +
> + while (!(read_register(kgd, ATC_VMID_PASID_MAPPING_UPDATE_STATUS) & (1U << vmid)))
> + cpu_relax();
> + write_register(kgd, ATC_VMID_PASID_MAPPING_UPDATE_STATUS, 1U << vmid);
> +
> + return 0;
> +}
> +
> +static int kgd_init_memory(struct kgd_dev *kgd)
> +{
> + /* Configure apertures:
> + * LDS: 0x60000000'00000000 - 0x60000001'00000000 (4GB)
> + * Scratch: 0x60000001'00000000 - 0x60000002'00000000 (4GB)
> + * GPUVM: 0x60010000'00000000 - 0x60020000'00000000 (1TB)
> + */
Again, this whole aperture business needs some explanation somewhere.
> + int i;
> + uint32_t sh_mem_bases = PRIVATE_BASE(0x6000) | SHARED_BASE(0x6000);
> +
> + for (i = 8; i < 16; i++) {
> + uint32_t sh_mem_config;
> +
> + lock_srbm(kgd, 0, 0, 0, i);
> +
> + sh_mem_config = ALIGNMENT_MODE(SH_MEM_ALIGNMENT_MODE_UNALIGNED);
> + sh_mem_config |= DEFAULT_MTYPE(MTYPE_NONCACHED);
> +
> + write_register(kgd, SH_MEM_CONFIG, sh_mem_config);
> +
> + write_register(kgd, SH_MEM_BASES, sh_mem_bases);
> +
> + /* Scratch aperture is not supported for now. */
> + write_register(kgd, SH_STATIC_MEM_CONFIG, 0);
> +
> + /* APE1 disabled for now. */
> + write_register(kgd, SH_MEM_APE1_BASE, 1);
> + write_register(kgd, SH_MEM_APE1_LIMIT, 0);
> +
> + unlock_srbm(kgd);
> + }
> +
> + return 0;
> +}
> +
> +static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id, uint32_t hpd_size, uint64_t hpd_gpu_addr)
> +{
> + uint32_t mec = (++pipe_id / CIK_PIPE_PER_MEC) + 1;
> + uint32_t pipe = (pipe_id % CIK_PIPE_PER_MEC);
> +
> + lock_srbm(kgd, mec, pipe, 0, 0);
> + write_register(kgd, CP_HPD_EOP_BASE_ADDR, lower_32(hpd_gpu_addr >> 8));
> + write_register(kgd, CP_HPD_EOP_BASE_ADDR_HI, upper_32(hpd_gpu_addr >> 8));
> + write_register(kgd, CP_HPD_EOP_VMID, 0);
> + write_register(kgd, CP_HPD_EOP_CONTROL, hpd_size);
> + unlock_srbm(kgd);
> +
> + return 0;
> +}
> +
> +static inline struct cik_mqd *get_mqd(void *mqd)
> +{
> + return (struct cik_mqd *)mqd;
> +}
> +
> +static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id, uint32_t queue_id, uint32_t __user *wptr)
> +{
> + uint32_t wptr_shadow, is_wptr_shadow_valid;
> + struct cik_mqd *m;
> +
> + m = get_mqd(mqd);
> +
> + is_wptr_shadow_valid = !get_user(wptr_shadow, wptr);
> +
> + acquire_queue(kgd, pipe_id, queue_id);
> + write_register(kgd, CP_MQD_BASE_ADDR, m->queue_state.cp_mqd_base_addr);
> + write_register(kgd, CP_MQD_BASE_ADDR_HI, m->queue_state.cp_mqd_base_addr_hi);
> + write_register(kgd, CP_MQD_CONTROL, m->queue_state.cp_mqd_control);
> +
> + write_register(kgd, CP_HQD_PQ_BASE, m->queue_state.cp_hqd_pq_base);
> + write_register(kgd, CP_HQD_PQ_BASE_HI, m->queue_state.cp_hqd_pq_base_hi);
> + write_register(kgd, CP_HQD_PQ_CONTROL, m->queue_state.cp_hqd_pq_control);
> +
> + write_register(kgd, CP_HQD_IB_CONTROL, m->queue_state.cp_hqd_ib_control);
> + write_register(kgd, CP_HQD_IB_BASE_ADDR, m->queue_state.cp_hqd_ib_base_addr);
> + write_register(kgd, CP_HQD_IB_BASE_ADDR_HI, m->queue_state.cp_hqd_ib_base_addr_hi);
> +
> + write_register(kgd, CP_HQD_IB_RPTR, m->queue_state.cp_hqd_ib_rptr);
> +
> + write_register(kgd, CP_HQD_PERSISTENT_STATE, m->queue_state.cp_hqd_persistent_state);
> + write_register(kgd, CP_HQD_SEMA_CMD, m->queue_state.cp_hqd_sema_cmd);
> + write_register(kgd, CP_HQD_MSG_TYPE, m->queue_state.cp_hqd_msg_type);
> +
> + write_register(kgd, CP_HQD_ATOMIC0_PREOP_LO, m->queue_state.cp_hqd_atomic0_preop_lo);
> + write_register(kgd, CP_HQD_ATOMIC0_PREOP_HI, m->queue_state.cp_hqd_atomic0_preop_hi);
> + write_register(kgd, CP_HQD_ATOMIC1_PREOP_LO, m->queue_state.cp_hqd_atomic1_preop_lo);
> + write_register(kgd, CP_HQD_ATOMIC1_PREOP_HI, m->queue_state.cp_hqd_atomic1_preop_hi);
> +
> + write_register(kgd, CP_HQD_PQ_RPTR_REPORT_ADDR, m->queue_state.cp_hqd_pq_rptr_report_addr);
> + write_register(kgd, CP_HQD_PQ_RPTR_REPORT_ADDR_HI, m->queue_state.cp_hqd_pq_rptr_report_addr_hi);
> + write_register(kgd, CP_HQD_PQ_RPTR, m->queue_state.cp_hqd_pq_rptr);
> +
> + write_register(kgd, CP_HQD_PQ_WPTR_POLL_ADDR, m->queue_state.cp_hqd_pq_wptr_poll_addr);
> + write_register(kgd, CP_HQD_PQ_WPTR_POLL_ADDR_HI, m->queue_state.cp_hqd_pq_wptr_poll_addr_hi);
> +
> + write_register(kgd, CP_HQD_PQ_DOORBELL_CONTROL, m->queue_state.cp_hqd_pq_doorbell_control);
> +
> + write_register(kgd, CP_HQD_VMID, m->queue_state.cp_hqd_vmid);
> +
> + write_register(kgd, CP_HQD_QUANTUM, m->queue_state.cp_hqd_quantum);
> +
> + write_register(kgd, CP_HQD_PIPE_PRIORITY, m->queue_state.cp_hqd_pipe_priority);
> + write_register(kgd, CP_HQD_QUEUE_PRIORITY, m->queue_state.cp_hqd_queue_priority);
> +
> + write_register(kgd, CP_HQD_HQ_SCHEDULER0, m->queue_state.cp_hqd_hq_scheduler0);
> + write_register(kgd, CP_HQD_HQ_SCHEDULER1, m->queue_state.cp_hqd_hq_scheduler1);
> +
> + if (is_wptr_shadow_valid)
> + write_register(kgd, CP_HQD_PQ_WPTR, wptr_shadow);
> +
> + write_register(kgd, CP_HQD_ACTIVE, m->queue_state.cp_hqd_active);
> + release_queue(kgd);
> +
> + return 0;
> +}
> +
> +static bool kgd_hqd_is_occupies(struct kgd_dev *kgd, uint64_t queue_address, uint32_t pipe_id, uint32_t queue_id)
> +{
> + uint32_t act;
> + bool retval = false;
> + uint32_t low, high;
> +
> + acquire_queue(kgd, pipe_id, queue_id);
> + act = read_register(kgd, CP_HQD_ACTIVE);
> + if (act) {
> + low = lower_32(queue_address >> 8);
> + high = upper_32(queue_address >> 8);
> +
> + if (low == read_register(kgd, CP_HQD_PQ_BASE) &&
> + high == read_register(kgd, CP_HQD_PQ_BASE_HI))
> + retval = true;
> + }
> + release_queue(kgd);
> + return retval;
> +}
> +
> +static int kgd_hqd_destroy(struct kgd_dev *kgd, bool is_reset,
> + unsigned int timeout, uint32_t pipe_id,
> + uint32_t queue_id)
> +{
> + int status = 0;
> + bool sync = (timeout > 0) ? true : false;
> +
> + acquire_queue(kgd, pipe_id, queue_id);
> + write_register(kgd, CP_HQD_PQ_DOORBELL_CONTROL, 0);
> +
> + if (is_reset)
> + write_register(kgd, CP_HQD_DEQUEUE_REQUEST, DEQUEUE_REQUEST_RESET);
> + else
> + write_register(kgd, CP_HQD_DEQUEUE_REQUEST, DEQUEUE_REQUEST_DRAIN);
> +
> +
> + while (read_register(kgd, CP_HQD_ACTIVE) != 0) {
> + if (sync && timeout <= 0) {
> + status = -EBUSY;
> + break;
> + }
> + msleep(20);
> + if (sync) {
> + if (timeout >= 20)
> + timeout -= 20;
> + else
> + timeout = 0;
> + }
> + }
> + release_queue(kgd);
> + return status;
> +}
> diff --git a/drivers/gpu/drm/radeon/radeon_kfd.h b/drivers/gpu/drm/radeon/radeon_kfd.h
> new file mode 100644
> index 0000000..5171726
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/radeon_kfd.h
> @@ -0,0 +1,119 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +/*
> + * radeon_kfd.h defines the private interface between the
> + * AMD kernel graphics drivers and the AMD KFD.
> + */
> +
> +#ifndef RADEON_KFD_H_INCLUDED
> +#define RADEON_KFD_H_INCLUDED
> +
> +#include <linux/types.h>
> +
> +struct pci_dev;
> +
> +#define KFD_INTERFACE_VERSION 1
> +
> +struct kfd_dev;
> +struct kgd_dev;
> +
> +struct kgd_mem;
> +
> +struct radeon_device;
> +
> +enum kgd_memory_pool {
> + KGD_POOL_SYSTEM_CACHEABLE = 1,
> + KGD_POOL_SYSTEM_WRITECOMBINE = 2,
> + KGD_POOL_FRAMEBUFFER = 3,
> +};
> +
> +struct kgd2kfd_shared_resources {
> + unsigned int compute_vmid_bitmap; /* Bit n == 1 means VMID n is available for KFD. */
> +
> + unsigned int first_compute_pipe; /* Compute pipes are counted starting from MEC0/pipe0 as 0. */
> + unsigned int compute_pipe_count; /* Number of MEC pipes available for KFD. */
> +
> + phys_addr_t doorbell_physical_address; /* Base address of doorbell aperture. */
> + size_t doorbell_aperture_size; /* Size in bytes of doorbell aperture. */
> + size_t doorbell_start_offset; /* Number of bytes at start of aperture reserved for KGD. */
> +};
> +
> +struct kgd2kfd_calls {
> + void (*exit)(void);
> + struct kfd_dev* (*probe)(struct kgd_dev *kgd, struct pci_dev *pdev);
> + bool (*device_init)(struct kfd_dev *kfd, const struct kgd2kfd_shared_resources *gpu_resources);
> + void (*device_exit)(struct kfd_dev *kfd);
> + void (*interrupt)(struct kfd_dev *kfd, const void *ih_ring_entry);
> + void (*suspend)(struct kfd_dev *kfd);
> + int (*resume)(struct kfd_dev *kfd);
> +};
> +
> +struct kfd2kgd_calls {
> + /* Memory management. */
> + int (*allocate_mem)(struct kgd_dev *kgd,
> + size_t size,
> + size_t alignment,
> + enum kgd_memory_pool pool,
> + struct kgd_mem **memory_handle);
> +
> + void (*free_mem)(struct kgd_dev *kgd, struct kgd_mem *memory_handle);
> +
> + int (*gpumap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *vmid0_address);
> + void (*ungpumap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem);
> +
> + int (*kmap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem, void **ptr);
> + void (*unkmap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem);
> +
> + uint64_t (*get_vmem_size)(struct kgd_dev *kgd);
> + uint64_t (*get_gpu_clock_counter)(struct kgd_dev *kgd);
> +
> + uint32_t (*get_max_engine_clock_in_mhz)(struct kgd_dev *kgd);
> +
> + /* Register access functions */
> + void (*program_sh_mem_settings)(struct kgd_dev *kgd, uint32_t vmid, uint32_t sh_mem_config,
> + uint32_t sh_mem_ape1_base, uint32_t sh_mem_ape1_limit, uint32_t sh_mem_bases);
> + int (*set_pasid_vmid_mapping)(struct kgd_dev *kgd, unsigned int pasid, unsigned int vmid);
> + int (*init_memory)(struct kgd_dev *kgd);
> + int (*init_pipeline)(struct kgd_dev *kgd, uint32_t pipe_id, uint32_t hpd_size, uint64_t hpd_gpu_addr);
> + int (*hqd_load)(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id, uint32_t queue_id, uint32_t __user *wptr);
> + bool (*hqd_is_occupies)(struct kgd_dev *kgd, uint64_t queue_address, uint32_t pipe_id, uint32_t queue_id);
> + int (*hqd_destroy)(struct kgd_dev *kgd, bool is_reset, unsigned int timeout,
> + uint32_t pipe_id, uint32_t queue_id);
> +};
Such an interface should be documented; look at ttm or any other function-pointer structure
inside the kernel for an example of how to document these.
> +
> +bool radeon_kfd_init(void);
> +void radeon_kfd_fini(void);
> +bool kgd2kfd_init(unsigned interface_version,
> + const struct kfd2kgd_calls *f2g,
> + const struct kgd2kfd_calls **g2f);
> +
> +void radeon_kfd_suspend(struct radeon_device *rdev);
> +int radeon_kfd_resume(struct radeon_device *rdev);
> +void radeon_kfd_interrupt(struct radeon_device *rdev,
> + const void *ih_ring_entry);
> +void radeon_kfd_device_probe(struct radeon_device *rdev);
> +void radeon_kfd_device_init(struct radeon_device *rdev);
> +void radeon_kfd_device_fini(struct radeon_device *rdev);
> +
> +#endif
> +
> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
> index 35d9318..929beda 100644
> --- a/drivers/gpu/drm/radeon/radeon_kms.c
> +++ b/drivers/gpu/drm/radeon/radeon_kms.c
> @@ -34,6 +34,8 @@
> #include <linux/slab.h>
> #include <linux/pm_runtime.h>
>
> +#include "radeon_kfd.h"
> +
> #if defined(CONFIG_VGA_SWITCHEROO)
> bool radeon_has_atpx(void);
> #else
> @@ -63,6 +65,8 @@ int radeon_driver_unload_kms(struct drm_device *dev)
>
> pm_runtime_get_sync(dev->dev);
>
> + radeon_kfd_device_fini(rdev);
> +
> radeon_acpi_fini(rdev);
>
> radeon_modeset_fini(rdev);
> @@ -142,6 +146,9 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags)
> "Error during ACPI methods call\n");
> }
>
> + radeon_kfd_device_probe(rdev);
> + radeon_kfd_device_init(rdev);
> +
> if (radeon_is_px(dev)) {
> pm_runtime_use_autosuspend(dev->dev);
> pm_runtime_set_autosuspend_delay(dev->dev, 5000);
> --
> 1.9.1
>
^ permalink raw reply [flat|nested] 46+ messages in thread* Re: [PATCH v2 06/25] drm/radeon: Add radeon <--> amdkfd interface
2014-07-20 17:35 ` Jerome Glisse
@ 2014-08-02 20:07 ` Oded Gabbay
0 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-08-02 20:07 UTC (permalink / raw)
To: Jerome Glisse, linux-kernel, dri-devel
Cc: Andrew Lewycky, Michel Dänzer, Alexey Skidanov,
Andrew Morton
On 20/07/14 20:35, Jerome Glisse wrote:
> On Thu, Jul 17, 2014 at 04:29:13PM +0300, Oded Gabbay wrote:
>> This patch adds the interface between the radeon driver and the amdkfd driver.
>> The interface implementation is contained in radeon_kfd.c and radeon_kfd.h.
>>
>> The interface itself is represented by a pointer to struct
>> kfd_dev. The pointer is located inside radeon_device structure.
>>
>> All the register accesses that amdkfd need are done using this interface. This
>> allows us to avoid direct register accesses in amdkfd proper, while also
>> avoiding locking between amdkfd and radeon.
>>
>> The single exception is the doorbells that are used in both of the drivers.
>> However, because they are located in separate pci bar pages, the danger of
>> sharing registers between the drivers is minimal.
>>
>> Having said that, we are planning to move the doorbells as well to radeon.
>>
>> The loading of the amdkfd module is done via symbol lookup. According to the code review discussions, this may change in v3 of the patch set.
>>
>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>> ---
>> drivers/gpu/drm/radeon/Makefile | 1 +
>> drivers/gpu/drm/radeon/cik.c | 9 +
>> drivers/gpu/drm/radeon/cik_reg.h | 65 +++++
>> drivers/gpu/drm/radeon/cikd.h | 51 +++-
>> drivers/gpu/drm/radeon/radeon.h | 3 +
>> drivers/gpu/drm/radeon/radeon_drv.c | 5 +
>> drivers/gpu/drm/radeon/radeon_kfd.c | 566 ++++++++++++++++++++++++++++++++++++
>> drivers/gpu/drm/radeon/radeon_kfd.h | 119 ++++++++
>> drivers/gpu/drm/radeon/radeon_kms.c | 7 +
>> 9 files changed, 825 insertions(+), 1 deletion(-)
>> create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.c
>> create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.h
>>
>> diff --git a/drivers/gpu/drm/radeon/Makefile b/drivers/gpu/drm/radeon/Makefile
>> index 1b04002..a1c913d 100644
>> --- a/drivers/gpu/drm/radeon/Makefile
>> +++ b/drivers/gpu/drm/radeon/Makefile
>> @@ -104,6 +104,7 @@ radeon-y += \
>> radeon_vce.o \
>> vce_v1_0.o \
>> vce_v2_0.o \
>> + radeon_kfd.o
>>
>> radeon-$(CONFIG_COMPAT) += radeon_ioc32.o
>> radeon-$(CONFIG_VGA_SWITCHEROO) += radeon_atpx_handler.o
>> diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
>> index b4bbc22..6f71095 100644
>> --- a/drivers/gpu/drm/radeon/cik.c
>> +++ b/drivers/gpu/drm/radeon/cik.c
>> @@ -32,6 +32,7 @@
>> #include "cik_blit_shaders.h"
>> #include "radeon_ucode.h"
>> #include "clearstate_ci.h"
>> +#include "radeon_kfd.h"
>>
>> MODULE_FIRMWARE("radeon/BONAIRE_pfp.bin");
>> MODULE_FIRMWARE("radeon/BONAIRE_me.bin");
>> @@ -7727,6 +7728,9 @@ restart_ih:
>> while (rptr != wptr) {
>> /* wptr/rptr are in bytes! */
>> ring_index = rptr / 4;
>> +
>> + radeon_kfd_interrupt(rdev, (const void *) &rdev->ih.ring[ring_index]);
>> +
>> src_id = le32_to_cpu(rdev->ih.ring[ring_index]) & 0xff;
>> src_data = le32_to_cpu(rdev->ih.ring[ring_index + 1]) & 0xfffffff;
>> ring_id = le32_to_cpu(rdev->ih.ring[ring_index + 2]) & 0xff;
>> @@ -8386,6 +8390,10 @@ static int cik_startup(struct radeon_device *rdev)
>> if (r)
>> return r;
>>
>> + r = radeon_kfd_resume(rdev);
>> + if (r)
>> + return r;
>> +
>> return 0;
>> }
>>
>> @@ -8434,6 +8442,7 @@ int cik_resume(struct radeon_device *rdev)
>> */
>> int cik_suspend(struct radeon_device *rdev)
>> {
>> + radeon_kfd_suspend(rdev);
>> radeon_pm_suspend(rdev);
>> dce6_audio_fini(rdev);
>> radeon_vm_manager_fini(rdev);
>> diff --git a/drivers/gpu/drm/radeon/cik_reg.h b/drivers/gpu/drm/radeon/cik_reg.h
>> index ca1bb61..1ab3dbc 100644
>> --- a/drivers/gpu/drm/radeon/cik_reg.h
>> +++ b/drivers/gpu/drm/radeon/cik_reg.h
>> @@ -147,4 +147,69 @@
>>
>> #define CIK_LB_DESKTOP_HEIGHT 0x6b0c
>>
>> +struct cik_hqd_registers {
>> + u32 cp_mqd_base_addr;
>> + u32 cp_mqd_base_addr_hi;
>> + u32 cp_hqd_active;
>> + u32 cp_hqd_vmid;
>> + u32 cp_hqd_persistent_state;
>> + u32 cp_hqd_pipe_priority;
>> + u32 cp_hqd_queue_priority;
>> + u32 cp_hqd_quantum;
>> + u32 cp_hqd_pq_base;
>> + u32 cp_hqd_pq_base_hi;
>> + u32 cp_hqd_pq_rptr;
>> + u32 cp_hqd_pq_rptr_report_addr;
>> + u32 cp_hqd_pq_rptr_report_addr_hi;
>> + u32 cp_hqd_pq_wptr_poll_addr;
>> + u32 cp_hqd_pq_wptr_poll_addr_hi;
>> + u32 cp_hqd_pq_doorbell_control;
>> + u32 cp_hqd_pq_wptr;
>> + u32 cp_hqd_pq_control;
>> + u32 cp_hqd_ib_base_addr;
>> + u32 cp_hqd_ib_base_addr_hi;
>> + u32 cp_hqd_ib_rptr;
>> + u32 cp_hqd_ib_control;
>> + u32 cp_hqd_iq_timer;
>> + u32 cp_hqd_iq_rptr;
>> + u32 cp_hqd_dequeue_request;
>> + u32 cp_hqd_dma_offload;
>> + u32 cp_hqd_sema_cmd;
>> + u32 cp_hqd_msg_type;
>> + u32 cp_hqd_atomic0_preop_lo;
>> + u32 cp_hqd_atomic0_preop_hi;
>> + u32 cp_hqd_atomic1_preop_lo;
>> + u32 cp_hqd_atomic1_preop_hi;
>> + u32 cp_hqd_hq_scheduler0;
>> + u32 cp_hqd_hq_scheduler1;
>> + u32 cp_mqd_control;
>> +};
>> +
>> +struct cik_mqd {
>> + u32 header;
>> + u32 dispatch_initiator;
>> + u32 dimensions[3];
>> + u32 start_idx[3];
>> + u32 num_threads[3];
>> + u32 pipeline_stat_enable;
>> + u32 perf_counter_enable;
>> + u32 pgm[2];
>> + u32 tba[2];
>> + u32 tma[2];
>> + u32 pgm_rsrc[2];
>> + u32 vmid;
>> + u32 resource_limits;
>> + u32 static_thread_mgmt01[2];
>> + u32 tmp_ring_size;
>> + u32 static_thread_mgmt23[2];
>> + u32 restart[3];
>> + u32 thread_trace_enable;
>> + u32 reserved1;
>> + u32 user_data[16];
>> + u32 vgtcs_invoke_count[2];
>> + struct cik_hqd_registers queue_state;
>> + u32 dequeue_cntr;
>> + u32 interrupt_queue[64];
>> +};
>> +
>> #endif
>> diff --git a/drivers/gpu/drm/radeon/cikd.h b/drivers/gpu/drm/radeon/cikd.h
>> index 0c6e1b5..0a2a403 100644
>> --- a/drivers/gpu/drm/radeon/cikd.h
>> +++ b/drivers/gpu/drm/radeon/cikd.h
>> @@ -1137,6 +1137,9 @@
>> #define SH_MEM_ALIGNMENT_MODE_UNALIGNED 3
>> #define DEFAULT_MTYPE(x) ((x) << 4)
>> #define APE1_MTYPE(x) ((x) << 7)
>> +/* valid for both DEFAULT_MTYPE and APE1_MTYPE */
>> +#define MTYPE_CACHED 0
>> +#define MTYPE_NONCACHED 3
>>
>> #define SX_DEBUG_1 0x9060
>>
>> @@ -1447,6 +1450,16 @@
>> #define CP_HQD_ACTIVE 0xC91C
>> #define CP_HQD_VMID 0xC920
>>
>> +#define CP_HQD_PERSISTENT_STATE 0xC924u
>> +#define DEFAULT_CP_HQD_PERSISTENT_STATE (0x33U << 8)
>> +
>> +#define CP_HQD_PIPE_PRIORITY 0xC928u
>> +#define CP_HQD_QUEUE_PRIORITY 0xC92Cu
>> +#define CP_HQD_QUANTUM 0xC930u
>> +#define QUANTUM_EN 1U
>> +#define QUANTUM_SCALE_1MS (1U << 4)
>> +#define QUANTUM_DURATION(x) ((x) << 8)
>> +
>
> We need documentation for this queue/pipe priority to know their
> granularity and how they are use exactly.
>
Done in v3
>> #define CP_HQD_PQ_BASE 0xC934
>> #define CP_HQD_PQ_BASE_HI 0xC938
>> #define CP_HQD_PQ_RPTR 0xC93C
>> @@ -1474,12 +1487,32 @@
>> #define PRIV_STATE (1 << 30)
>> #define KMD_QUEUE (1 << 31)
>>
>> -#define CP_HQD_DEQUEUE_REQUEST 0xC974
>> +#define CP_HQD_IB_BASE_ADDR 0xC95Cu
>> +#define CP_HQD_IB_BASE_ADDR_HI 0xC960u
>> +#define CP_HQD_IB_RPTR 0xC964u
>> +#define CP_HQD_IB_CONTROL 0xC968u
>> +#define IB_ATC_EN (1U << 23)
>> +#define DEFAULT_MIN_IB_AVAIL_SIZE (3U << 20)
>> +
>> +#define CP_HQD_DEQUEUE_REQUEST 0xC974
>> +#define DEQUEUE_REQUEST_DRAIN 1
>> +#define DEQUEUE_REQUEST_RESET 2
>>
>> #define CP_MQD_CONTROL 0xC99C
>> #define MQD_VMID(x) ((x) << 0)
>> #define MQD_VMID_MASK (0xf << 0)
>>
>> +#define CP_HQD_SEMA_CMD 0xC97Cu
>> +#define CP_HQD_MSG_TYPE 0xC980u
>> +#define CP_HQD_ATOMIC0_PREOP_LO 0xC984u
>> +#define CP_HQD_ATOMIC0_PREOP_HI 0xC988u
>> +#define CP_HQD_ATOMIC1_PREOP_LO 0xC98Cu
>> +#define CP_HQD_ATOMIC1_PREOP_HI 0xC990u
>> +#define CP_HQD_HQ_SCHEDULER0 0xC994u
>> +#define CP_HQD_HQ_SCHEDULER1 0xC998u
>> +
>> +#define SH_STATIC_MEM_CONFIG 0x9604u
>
> Same here documentation is needed on all those register.
>
This is a bit more problematic. I need to find out what I can reveal. I
prefer to add this later (in v4 or as a separate patch).
>> +
>> #define DB_RENDER_CONTROL 0x28000
>>
>> #define PA_SC_RASTER_CONFIG 0x28350
>> @@ -2069,4 +2102,20 @@
>> #define VCE_CMD_IB_AUTO 0x00000005
>> #define VCE_CMD_SEMAPHORE 0x00000006
>>
>> +#define ATC_VMID0_PASID_MAPPING 0x339Cu
>> +#define ATC_VMID_PASID_MAPPING_UPDATE_STATUS 0x3398u
>> +#define ATC_VMID_PASID_MAPPING_VALID (1U << 31)
>> +
>> +#define ATC_VM_APERTURE0_CNTL 0x3310u
>> +#define ATS_ACCESS_MODE_NEVER 0
>> +#define ATS_ACCESS_MODE_ALWAYS 1
>> +
>> +#define ATC_VM_APERTURE0_CNTL2 0x3318u
>> +#define ATC_VM_APERTURE0_HIGH_ADDR 0x3308u
>> +#define ATC_VM_APERTURE0_LOW_ADDR 0x3300u
>> +#define ATC_VM_APERTURE1_CNTL 0x3314u
>> +#define ATC_VM_APERTURE1_CNTL2 0x331Cu
>> +#define ATC_VM_APERTURE1_HIGH_ADDR 0x330Cu
>> +#define ATC_VM_APERTURE1_LOW_ADDR 0x3304u
>> +
>> #endif
>> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
>> index 5136855..94b38a7 100644
>> --- a/drivers/gpu/drm/radeon/radeon.h
>> +++ b/drivers/gpu/drm/radeon/radeon.h
>> @@ -2342,6 +2342,9 @@ struct radeon_device {
>>
>> struct dev_pm_domain vga_pm_domain;
>> bool have_disp_power_ref;
>> +
>> + /* HSA KFD interface */
>> + struct kfd_dev *kfd;
>> };
>>
>> bool radeon_is_px(struct drm_device *dev);
>> diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
>> index cb14213..efaa086 100644
>> --- a/drivers/gpu/drm/radeon/radeon_drv.c
>> +++ b/drivers/gpu/drm/radeon/radeon_drv.c
>> @@ -39,6 +39,8 @@
>> #include <linux/pm_runtime.h>
>> #include <linux/vga_switcheroo.h>
>> #include "drm_crtc_helper.h"
>> +#include "radeon_kfd.h"
>> +
>> /*
>> * KMS wrapper.
>> * - 2.0.0 - initial interface
>> @@ -630,12 +632,15 @@ static int __init radeon_init(void)
>> #endif
>> }
>>
>> + radeon_kfd_init();
>> +
>> /* let modprobe override vga console setting */
>> return drm_pci_init(driver, pdriver);
>> }
>>
>> static void __exit radeon_exit(void)
>> {
>> + radeon_kfd_fini();
>> drm_pci_exit(driver, pdriver);
>> radeon_unregister_atpx_handler();
>> }
>> diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
>> new file mode 100644
>> index 0000000..0385239
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/radeon_kfd.c
>> @@ -0,0 +1,566 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#include <linux/module.h>
>> +#include <linux/fdtable.h>
>> +#include <linux/uaccess.h>
>> +#include <drm/drmP.h>
>> +#include "radeon.h"
>> +#include "cikd.h"
>> +#include "cik_reg.h"
>> +#include "radeon_kfd.h"
>> +
>> +#define CIK_PIPE_PER_MEC (4)
>> +
>> +struct kgd_mem {
>> + struct radeon_bo *bo;
>> + u32 domain;
>> + struct radeon_bo_va *bo_va;
>> +};
>> +
>> +static int allocate_mem(struct kgd_dev *kgd, size_t size, size_t alignment,
>> + enum kgd_memory_pool pool, struct kgd_mem **memory_handle);
>> +
>> +static void free_mem(struct kgd_dev *kgd, struct kgd_mem *memory_handle);
>> +
>> +static int gpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *vmid0_address);
>> +static void ungpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem);
>> +
>> +static int kmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, void **ptr);
>> +static void unkmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem);
>> +
>> +static uint64_t get_vmem_size(struct kgd_dev *kgd);
>> +static uint64_t get_gpu_clock_counter(struct kgd_dev *kgd);
>> +
>> +static uint32_t get_max_engine_clock_in_mhz(struct kgd_dev *kgd);
>> +
>> +/*
>> + * Register access functions
>> + */
>> +
>> +static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid, uint32_t sh_mem_config,
>> + uint32_t sh_mem_ape1_base, uint32_t sh_mem_ape1_limit, uint32_t sh_mem_bases);
>> +static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid, unsigned int vmid);
>> +static int kgd_init_memory(struct kgd_dev *kgd);
>> +static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id, uint32_t hpd_size, uint64_t hpd_gpu_addr);
>> +static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id, uint32_t queue_id, uint32_t __user *wptr);
>> +static bool kgd_hqd_is_occupies(struct kgd_dev *kgd, uint64_t queue_address, uint32_t pipe_id, uint32_t queue_id);
>> +static int kgd_hqd_destroy(struct kgd_dev *kgd, bool is_reset, unsigned int timeout,
>> + uint32_t pipe_id, uint32_t queue_id);
>> +
>> +static const struct kfd2kgd_calls kfd2kgd = {
>> + .allocate_mem = allocate_mem,
>> + .free_mem = free_mem,
>> + .gpumap_mem = gpumap_mem,
>> + .ungpumap_mem = ungpumap_mem,
>> + .kmap_mem = kmap_mem,
>> + .unkmap_mem = unkmap_mem,
>> + .get_vmem_size = get_vmem_size,
>> + .get_gpu_clock_counter = get_gpu_clock_counter,
>> + .get_max_engine_clock_in_mhz = get_max_engine_clock_in_mhz,
>> + .program_sh_mem_settings = kgd_program_sh_mem_settings,
>> + .set_pasid_vmid_mapping = kgd_set_pasid_vmid_mapping,
>> + .init_memory = kgd_init_memory,
>> + .init_pipeline = kgd_init_pipeline,
>> + .hqd_load = kgd_hqd_load,
>> + .hqd_is_occupies = kgd_hqd_is_occupies,
>> + .hqd_destroy = kgd_hqd_destroy,
>> +};
>> +
>> +static const struct kgd2kfd_calls *kgd2kfd;
>> +
>> +bool radeon_kfd_init(void)
>> +{
>> + bool (*kgd2kfd_init_p)(unsigned, const struct kfd2kgd_calls*,
>> + const struct kgd2kfd_calls**);
>> +
>> + kgd2kfd_init_p = symbol_request(kgd2kfd_init);
>> +
>> + if (kgd2kfd_init_p == NULL)
>> + return false;
>> +
>> + if (!kgd2kfd_init_p(KFD_INTERFACE_VERSION, &kfd2kgd, &kgd2kfd)) {
>> + symbol_put(kgd2kfd_init);
>> + kgd2kfd = NULL;
>> +
>> + return false;
>> + }
>> +
>> + return true;
>> +}
>> +
>> +void radeon_kfd_fini(void)
>> +{
>> + if (kgd2kfd) {
>> + kgd2kfd->exit();
>> + symbol_put(kgd2kfd_init);
>> + }
>> +}
>> +
>> +void radeon_kfd_device_probe(struct radeon_device *rdev)
>> +{
>> + if (kgd2kfd)
>> + rdev->kfd = kgd2kfd->probe((struct kgd_dev *)rdev, rdev->pdev);
>> +}
>> +
>> +void radeon_kfd_device_init(struct radeon_device *rdev)
>> +{
>> + if (rdev->kfd) {
>> + struct kgd2kfd_shared_resources gpu_resources = {
>> + .compute_vmid_bitmap = 0xFF00,
>> +
>> + .first_compute_pipe = 1,
>> + .compute_pipe_count = 8 - 1,
>> + };
>> +
>> + radeon_doorbell_get_kfd_info(rdev,
>> + &gpu_resources.doorbell_physical_address,
>> + &gpu_resources.doorbell_aperture_size,
>> + &gpu_resources.doorbell_start_offset);
>> +
>> + kgd2kfd->device_init(rdev->kfd, &gpu_resources);
>> + }
>> +}
>> +
>> +void radeon_kfd_device_fini(struct radeon_device *rdev)
>> +{
>> + if (rdev->kfd) {
>> + kgd2kfd->device_exit(rdev->kfd);
>> + rdev->kfd = NULL;
>> + }
>> +}
>> +
>> +void radeon_kfd_interrupt(struct radeon_device *rdev, const void *ih_ring_entry)
>> +{
>> + if (rdev->kfd)
>> + kgd2kfd->interrupt(rdev->kfd, ih_ring_entry);
>> +}
>> +
>> +void radeon_kfd_suspend(struct radeon_device *rdev)
>> +{
>> + if (rdev->kfd)
>> + kgd2kfd->suspend(rdev->kfd);
>> +}
>> +
>> +int radeon_kfd_resume(struct radeon_device *rdev)
>> +{
>> + int r = 0;
>> +
>> + if (rdev->kfd)
>> + r = kgd2kfd->resume(rdev->kfd);
>> +
>> + return r;
>> +}
>
> All of the above wrapper function should be move to header file and mark
> as inline this would allow for compiler optimization. I still would like
> to see the possibility to build radeon without hsa.
>
That is problematic, as they don't compile in the header file. Anyway,
these functions are rarely called, so compiler optimization is of little
use here.
Radeon can definitely build without amdkfd. This file will always be
built as it will be part of radeon.
>> +
>> +static u32 pool_to_domain(enum kgd_memory_pool p)
>> +{
>> + switch (p) {
>> + case KGD_POOL_FRAMEBUFFER: return RADEON_GEM_DOMAIN_VRAM;
>> + default: return RADEON_GEM_DOMAIN_GTT;
>> + }
>> +}
>> +
>> +static int allocate_mem(struct kgd_dev *kgd, size_t size, size_t alignment,
>> + enum kgd_memory_pool pool, struct kgd_mem **memory_handle)
>> +{
>> + struct radeon_device *rdev = (struct radeon_device *)kgd;
>> + struct kgd_mem *mem;
>> + int r;
>> +
>> + mem = kzalloc(sizeof(struct kgd_mem), GFP_KERNEL);
>> + if (!mem)
>> + return -ENOMEM;
>> +
>> + mem->domain = pool_to_domain(pool);
>> +
>> + r = radeon_bo_create(rdev, size, alignment, true, mem->domain, NULL, &mem->bo);
>> + if (r) {
>> + kfree(mem);
>> + return r;
>> + }
>> +
>> + *memory_handle = mem;
>> + return 0;
>> +}
>> +
>> +static void free_mem(struct kgd_dev *kgd, struct kgd_mem *mem)
>> +{
>> + /* Assume that KFD will never free gpumapped or kmapped memory. This is not quite settled. */
>> + radeon_bo_unref(&mem->bo);
>> + kfree(mem);
>> +}
>> +
>> +static int gpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *vmid0_address)
>> +{
>> + int r;
>> +
>> + r = radeon_bo_reserve(mem->bo, true);
>> +
>> + /*
>> + * ttm_bo_reserve can only fail if the buffer reservation lock
>> + * is held in circumstances that would deadlock
>> + */
>> + BUG_ON(r != 0);
>> + r = radeon_bo_pin(mem->bo, mem->domain, vmid0_address);
>> + radeon_bo_unreserve(mem->bo);
>> +
>> + return r;
>> +}
>
> NACK NACK NACK, no radeon_bo_pin this is not acceptable. Buffer pining should be done
> very seldomly and i would say only radeon module can do it and only for buffer object
> under its control. We certainly can not accept to do that for buffer object that are
> under userspace management.
>
Changed to new method in v3, as discussed in main thread.
>
>> +
>> +static void ungpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem)
>> +{
>> + int r;
>> +
>> + r = radeon_bo_reserve(mem->bo, true);
>> +
>> + /*
>> + * ttm_bo_reserve can only fail if the buffer reservation lock
>> + * is held in circumstances that would deadlock
>> + */
>> + BUG_ON(r != 0);
>> + r = radeon_bo_unpin(mem->bo);
>> +
>> + /*
>> + * This unpin only removed NO_EVICT placement flags
>> + * and should never fail
>> + */
>> + BUG_ON(r != 0);
>> + radeon_bo_unreserve(mem->bo);
>> +}
>> +
>> +static int kmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, void **ptr)
>> +{
>> + int r;
>> +
>> + r = radeon_bo_reserve(mem->bo, true);
>> +
>> + /*
>> + * ttm_bo_reserve can only fail if the buffer reservation lock
>> + * is held in circumstances that would deadlock
>> + */
>> + BUG_ON(r != 0);
>> + r = radeon_bo_kmap(mem->bo, ptr);
>> + radeon_bo_unreserve(mem->bo);
>> +
>> + return r;
>> +}
>> +
>> +static void unkmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem)
>> +{
>> + int r;
>> +
>> + r = radeon_bo_reserve(mem->bo, true);
>> + /*
>> + * ttm_bo_reserve can only fail if the buffer reservation lock
>> + * is held in circumstances that would deadlock
>> + */
>> + BUG_ON(r != 0);
>> + radeon_bo_kunmap(mem->bo);
>> + radeon_bo_unreserve(mem->bo);
>> +}
>> +
>> +static uint64_t get_vmem_size(struct kgd_dev *kgd)
>> +{
>> + struct radeon_device *rdev = (struct radeon_device *)kgd;
>> +
>> + BUG_ON(kgd == NULL);
>> +
>> + return rdev->mc.real_vram_size;
>> +}
>> +
>> +static uint64_t get_gpu_clock_counter(struct kgd_dev *kgd)
>> +{
>> + struct radeon_device *rdev = (struct radeon_device *)kgd;
>> +
>> + return rdev->asic->get_gpu_clock_counter(rdev);
>> +}
>> +
>> +static uint32_t get_max_engine_clock_in_mhz(struct kgd_dev *kgd)
>> +{
>> + struct radeon_device *rdev = (struct radeon_device *)kgd;
>> +
>> + /* The sclk is in quantas of 10kHz */
>> + return rdev->pm.dpm.dyn_state.max_clock_voltage_on_ac.sclk / 100;
>> +}
>> +
>> +/*
>> + * kfd/radeon registers access interface
>> + */
>> +
>> +inline uint32_t lower_32(uint64_t x)
>> +{
>> + return (uint32_t)x;
>> +}
>> +
>> +inline uint32_t upper_32(uint64_t x)
>> +{
>> + return (uint32_t)(x >> 32);
>> +}
>
> Use appropriate macro (upper_32_bits, lower_32_bits) instead of those
> inline function.
>
Done in v3.
>> +
>> +static inline struct radeon_device *get_radeon_device(struct kgd_dev *kgd)
>> +{
>> + return (struct radeon_device *)kgd;
>> +}
>> +
>> +static void write_register(struct kgd_dev *kgd, uint32_t offset, uint32_t value)
>> +{
>> + struct radeon_device *rdev = get_radeon_device(kgd);
>> +
>> + writel(value, (void __iomem *)(rdev->rmmio + offset));
>> +}
>> +
>> +static uint32_t read_register(struct kgd_dev *kgd, uint32_t offset)
>> +{
>> + struct radeon_device *rdev = get_radeon_device(kgd);
>> +
>> + return readl((void __iomem *)(rdev->rmmio + offset));
>> +}
>> +
>> +static void lock_srbm(struct kgd_dev *kgd, uint32_t mec, uint32_t pipe, uint32_t queue, uint32_t vmid)
>> +{
>> + struct radeon_device *rdev = get_radeon_device(kgd);
>> + uint32_t value = PIPEID(pipe) | MEID(mec) | VMID(vmid) | QUEUEID(queue);
>> +
>> + mutex_lock(&rdev->srbm_mutex);
>> + write_register(kgd, SRBM_GFX_CNTL, value);
>> +}
>> +
>> +static void unlock_srbm(struct kgd_dev *kgd)
>> +{
>> + struct radeon_device *rdev = get_radeon_device(kgd);
>> +
>> + write_register(kgd, SRBM_GFX_CNTL, 0);
>> + mutex_unlock(&rdev->srbm_mutex);
>> +}
>> +
>> +static void acquire_queue(struct kgd_dev *kgd, uint32_t pipe_id, uint32_t queue_id)
>> +{
>> + uint32_t mec = (++pipe_id / CIK_PIPE_PER_MEC) + 1;
>> + uint32_t pipe = (pipe_id % CIK_PIPE_PER_MEC);
>> +
>> + lock_srbm(kgd, mec, pipe, queue_id, 0);
>> +}
>> +
>> +static void release_queue(struct kgd_dev *kgd)
>> +{
>> + unlock_srbm(kgd);
>> +}
>> +
>> +static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid, uint32_t sh_mem_config,
>> + uint32_t sh_mem_ape1_base, uint32_t sh_mem_ape1_limit, uint32_t sh_mem_bases)
>> +{
>> + lock_srbm(kgd, 0, 0, 0, vmid);
>> +
>> + write_register(kgd, SH_MEM_CONFIG, sh_mem_config);
>> + write_register(kgd, SH_MEM_APE1_BASE, sh_mem_ape1_base);
>> + write_register(kgd, SH_MEM_APE1_LIMIT, sh_mem_ape1_limit);
>> + write_register(kgd, SH_MEM_BASES, sh_mem_bases);
>> +
>> + unlock_srbm(kgd);
>> +}
>> +
>> +static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid, unsigned int vmid)
>> +{
>> + /* We have to assume that there is no outstanding mapping.
>> + * The ATC_VMID_PASID_MAPPING_UPDATE_STATUS bit could be 0 because a mapping
>> + * is in progress or because a mapping finished and the SW cleared it.
>> + * So the protocol is to always wait & clear.
>> + */
>> + uint32_t pasid_mapping = (pasid == 0) ? 0 : (uint32_t)pasid | ATC_VMID_PASID_MAPPING_VALID;
>> +
>> + write_register(kgd, ATC_VMID0_PASID_MAPPING + vmid*sizeof(uint32_t), pasid_mapping);
>> +
>> + while (!(read_register(kgd, ATC_VMID_PASID_MAPPING_UPDATE_STATUS) & (1U << vmid)))
>> + cpu_relax();
>> + write_register(kgd, ATC_VMID_PASID_MAPPING_UPDATE_STATUS, 1U << vmid);
>> +
>> + return 0;
>> +}
>> +
>> +static int kgd_init_memory(struct kgd_dev *kgd)
>> +{
>> + /* Configure apertures:
>> + * LDS: 0x60000000'00000000 - 0x60000001'00000000 (4GB)
>> + * Scratch: 0x60000001'00000000 - 0x60000002'00000000 (4GB)
>> + * GPUVM: 0x60010000'00000000 - 0x60020000'00000000 (1TB)
>> + */
>
> Again this whole aperture business need some explanation somewhere.
>
Added explanation in v3.
>> + int i;
>> + uint32_t sh_mem_bases = PRIVATE_BASE(0x6000) | SHARED_BASE(0x6000);
>> +
>> + for (i = 8; i < 16; i++) {
>> + uint32_t sh_mem_config;
>> +
>> + lock_srbm(kgd, 0, 0, 0, i);
>> +
>> + sh_mem_config = ALIGNMENT_MODE(SH_MEM_ALIGNMENT_MODE_UNALIGNED);
>> + sh_mem_config |= DEFAULT_MTYPE(MTYPE_NONCACHED);
>> +
>> + write_register(kgd, SH_MEM_CONFIG, sh_mem_config);
>> +
>> + write_register(kgd, SH_MEM_BASES, sh_mem_bases);
>> +
>> + /* Scratch aperture is not supported for now. */
>> + write_register(kgd, SH_STATIC_MEM_CONFIG, 0);
>> +
>> + /* APE1 disabled for now. */
>> + write_register(kgd, SH_MEM_APE1_BASE, 1);
>> + write_register(kgd, SH_MEM_APE1_LIMIT, 0);
>> +
>> + unlock_srbm(kgd);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id, uint32_t hpd_size, uint64_t hpd_gpu_addr)
>> +{
>> + uint32_t mec = (++pipe_id / CIK_PIPE_PER_MEC) + 1;
>> + uint32_t pipe = (pipe_id % CIK_PIPE_PER_MEC);
>> +
>> + lock_srbm(kgd, mec, pipe, 0, 0);
>> + write_register(kgd, CP_HPD_EOP_BASE_ADDR, lower_32(hpd_gpu_addr >> 8));
>> + write_register(kgd, CP_HPD_EOP_BASE_ADDR_HI, upper_32(hpd_gpu_addr >> 8));
>> + write_register(kgd, CP_HPD_EOP_VMID, 0);
>> + write_register(kgd, CP_HPD_EOP_CONTROL, hpd_size);
>> + unlock_srbm(kgd);
>> +
>> + return 0;
>> +}
>> +
>> +static inline struct cik_mqd *get_mqd(void *mqd)
>> +{
>> + return (struct cik_mqd *)mqd;
>> +}
>> +
>> +static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id, uint32_t queue_id, uint32_t __user *wptr)
>> +{
>> + uint32_t wptr_shadow, is_wptr_shadow_valid;
>> + struct cik_mqd *m;
>> +
>> + m = get_mqd(mqd);
>> +
>> + is_wptr_shadow_valid = !get_user(wptr_shadow, wptr);
>> +
>> + acquire_queue(kgd, pipe_id, queue_id);
>> + write_register(kgd, CP_MQD_BASE_ADDR, m->queue_state.cp_mqd_base_addr);
>> + write_register(kgd, CP_MQD_BASE_ADDR_HI, m->queue_state.cp_mqd_base_addr_hi);
>> + write_register(kgd, CP_MQD_CONTROL, m->queue_state.cp_mqd_control);
>> +
>> + write_register(kgd, CP_HQD_PQ_BASE, m->queue_state.cp_hqd_pq_base);
>> + write_register(kgd, CP_HQD_PQ_BASE_HI, m->queue_state.cp_hqd_pq_base_hi);
>> + write_register(kgd, CP_HQD_PQ_CONTROL, m->queue_state.cp_hqd_pq_control);
>> +
>> + write_register(kgd, CP_HQD_IB_CONTROL, m->queue_state.cp_hqd_ib_control);
>> + write_register(kgd, CP_HQD_IB_BASE_ADDR, m->queue_state.cp_hqd_ib_base_addr);
>> + write_register(kgd, CP_HQD_IB_BASE_ADDR_HI, m->queue_state.cp_hqd_ib_base_addr_hi);
>> +
>> + write_register(kgd, CP_HQD_IB_RPTR, m->queue_state.cp_hqd_ib_rptr);
>> +
>> + write_register(kgd, CP_HQD_PERSISTENT_STATE, m->queue_state.cp_hqd_persistent_state);
>> + write_register(kgd, CP_HQD_SEMA_CMD, m->queue_state.cp_hqd_sema_cmd);
>> + write_register(kgd, CP_HQD_MSG_TYPE, m->queue_state.cp_hqd_msg_type);
>> +
>> + write_register(kgd, CP_HQD_ATOMIC0_PREOP_LO, m->queue_state.cp_hqd_atomic0_preop_lo);
>> + write_register(kgd, CP_HQD_ATOMIC0_PREOP_HI, m->queue_state.cp_hqd_atomic0_preop_hi);
>> + write_register(kgd, CP_HQD_ATOMIC1_PREOP_LO, m->queue_state.cp_hqd_atomic1_preop_lo);
>> + write_register(kgd, CP_HQD_ATOMIC1_PREOP_HI, m->queue_state.cp_hqd_atomic1_preop_hi);
>> +
>> + write_register(kgd, CP_HQD_PQ_RPTR_REPORT_ADDR, m->queue_state.cp_hqd_pq_rptr_report_addr);
>> + write_register(kgd, CP_HQD_PQ_RPTR_REPORT_ADDR_HI, m->queue_state.cp_hqd_pq_rptr_report_addr_hi);
>> + write_register(kgd, CP_HQD_PQ_RPTR, m->queue_state.cp_hqd_pq_rptr);
>> +
>> + write_register(kgd, CP_HQD_PQ_WPTR_POLL_ADDR, m->queue_state.cp_hqd_pq_wptr_poll_addr);
>> + write_register(kgd, CP_HQD_PQ_WPTR_POLL_ADDR_HI, m->queue_state.cp_hqd_pq_wptr_poll_addr_hi);
>> +
>> + write_register(kgd, CP_HQD_PQ_DOORBELL_CONTROL, m->queue_state.cp_hqd_pq_doorbell_control);
>> +
>> + write_register(kgd, CP_HQD_VMID, m->queue_state.cp_hqd_vmid);
>> +
>> + write_register(kgd, CP_HQD_QUANTUM, m->queue_state.cp_hqd_quantum);
>> +
>> + write_register(kgd, CP_HQD_PIPE_PRIORITY, m->queue_state.cp_hqd_pipe_priority);
>> + write_register(kgd, CP_HQD_QUEUE_PRIORITY, m->queue_state.cp_hqd_queue_priority);
>> +
>> + write_register(kgd, CP_HQD_HQ_SCHEDULER0, m->queue_state.cp_hqd_hq_scheduler0);
>> + write_register(kgd, CP_HQD_HQ_SCHEDULER1, m->queue_state.cp_hqd_hq_scheduler1);
>> +
>> + if (is_wptr_shadow_valid)
>> + write_register(kgd, CP_HQD_PQ_WPTR, wptr_shadow);
>> +
>> + write_register(kgd, CP_HQD_ACTIVE, m->queue_state.cp_hqd_active);
>> + release_queue(kgd);
>> +
>> + return 0;
>> +}
>> +
>> +static bool kgd_hqd_is_occupies(struct kgd_dev *kgd, uint64_t queue_address, uint32_t pipe_id, uint32_t queue_id)
>> +{
>> + uint32_t act;
>> + bool retval = false;
>> + uint32_t low, high;
>> +
>> + acquire_queue(kgd, pipe_id, queue_id);
>> + act = read_register(kgd, CP_HQD_ACTIVE);
>> + if (act) {
>> + low = lower_32(queue_address >> 8);
>> + high = upper_32(queue_address >> 8);
>> +
>> + if (low == read_register(kgd, CP_HQD_PQ_BASE) &&
>> + high == read_register(kgd, CP_HQD_PQ_BASE_HI))
>> + retval = true;
>> + }
>> + release_queue(kgd);
>> + return retval;
>> +}
>> +
>> +static int kgd_hqd_destroy(struct kgd_dev *kgd, bool is_reset,
>> + unsigned int timeout, uint32_t pipe_id,
>> + uint32_t queue_id)
>> +{
>> + int status = 0;
>> + bool sync = (timeout > 0) ? true : false;
>> +
>> + acquire_queue(kgd, pipe_id, queue_id);
>> + write_register(kgd, CP_HQD_PQ_DOORBELL_CONTROL, 0);
>> +
>> + if (is_reset)
>> + write_register(kgd, CP_HQD_DEQUEUE_REQUEST, DEQUEUE_REQUEST_RESET);
>> + else
>> + write_register(kgd, CP_HQD_DEQUEUE_REQUEST, DEQUEUE_REQUEST_DRAIN);
>> +
>> +
>> + while (read_register(kgd, CP_HQD_ACTIVE) != 0) {
>> + if (sync && timeout <= 0) {
>> + status = -EBUSY;
>> + break;
>> + }
>> + msleep(20);
>> + if (sync) {
>> + if (timeout >= 20)
>> + timeout -= 20;
>> + else
>> + timeout = 0;
>> + }
>> + }
>> + release_queue(kgd);
>> + return status;
>> +}
>> diff --git a/drivers/gpu/drm/radeon/radeon_kfd.h b/drivers/gpu/drm/radeon/radeon_kfd.h
>> new file mode 100644
>> index 0000000..5171726
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/radeon_kfd.h
>> @@ -0,0 +1,119 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +/*
>> + * radeon_kfd.h defines the private interface between the
>> + * AMD kernel graphics drivers and the AMD KFD.
>> + */
>> +
>> +#ifndef RADEON_KFD_H_INCLUDED
>> +#define RADEON_KFD_H_INCLUDED
>> +
>> +#include <linux/types.h>
>> +
>> +struct pci_dev;
>> +
>> +#define KFD_INTERFACE_VERSION 1
>> +
>> +struct kfd_dev;
>> +struct kgd_dev;
>> +
>> +struct kgd_mem;
>> +
>> +struct radeon_device;
>> +
>> +enum kgd_memory_pool {
>> + KGD_POOL_SYSTEM_CACHEABLE = 1,
>> + KGD_POOL_SYSTEM_WRITECOMBINE = 2,
>> + KGD_POOL_FRAMEBUFFER = 3,
>> +};
>> +
>> +struct kgd2kfd_shared_resources {
>> + unsigned int compute_vmid_bitmap; /* Bit n == 1 means VMID n is available for KFD. */
>> +
>> + unsigned int first_compute_pipe; /* Compute pipes are counted starting from MEC0/pipe0 as 0. */
>> + unsigned int compute_pipe_count; /* Number of MEC pipes available for KFD. */
>> +
>> + phys_addr_t doorbell_physical_address; /* Base address of doorbell aperture. */
>> + size_t doorbell_aperture_size; /* Size in bytes of doorbell aperture. */
>> + size_t doorbell_start_offset; /* Number of bytes at start of aperture reserved for KGD. */
>> +};
>> +
>> +struct kgd2kfd_calls {
>> + void (*exit)(void);
>> + struct kfd_dev* (*probe)(struct kgd_dev *kgd, struct pci_dev *pdev);
>> + bool (*device_init)(struct kfd_dev *kfd, const struct kgd2kfd_shared_resources *gpu_resources);
>> + void (*device_exit)(struct kfd_dev *kfd);
>> + void (*interrupt)(struct kfd_dev *kfd, const void *ih_ring_entry);
>> + void (*suspend)(struct kfd_dev *kfd);
>> + int (*resume)(struct kfd_dev *kfd);
>> +};
>> +
>> +struct kfd2kgd_calls {
>> + /* Memory management. */
>> + int (*allocate_mem)(struct kgd_dev *kgd,
>> + size_t size,
>> + size_t alignment,
>> + enum kgd_memory_pool pool,
>> + struct kgd_mem **memory_handle);
>> +
>> + void (*free_mem)(struct kgd_dev *kgd, struct kgd_mem *memory_handle);
>> +
>> + int (*gpumap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *vmid0_address);
>> + void (*ungpumap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem);
>> +
>> + int (*kmap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem, void **ptr);
>> + void (*unkmap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem);
>> +
>> + uint64_t (*get_vmem_size)(struct kgd_dev *kgd);
>> + uint64_t (*get_gpu_clock_counter)(struct kgd_dev *kgd);
>> +
>> + uint32_t (*get_max_engine_clock_in_mhz)(struct kgd_dev *kgd);
>> +
>> + /* Register access functions */
>> + void (*program_sh_mem_settings)(struct kgd_dev *kgd, uint32_t vmid, uint32_t sh_mem_config,
>> + uint32_t sh_mem_ape1_base, uint32_t sh_mem_ape1_limit, uint32_t sh_mem_bases);
>> + int (*set_pasid_vmid_mapping)(struct kgd_dev *kgd, unsigned int pasid, unsigned int vmid);
>> + int (*init_memory)(struct kgd_dev *kgd);
>> + int (*init_pipeline)(struct kgd_dev *kgd, uint32_t pipe_id, uint32_t hpd_size, uint64_t hpd_gpu_addr);
>> + int (*hqd_load)(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id, uint32_t queue_id, uint32_t __user *wptr);
>> + bool (*hqd_is_occupies)(struct kgd_dev *kgd, uint64_t queue_address, uint32_t pipe_id, uint32_t queue_id);
>> + int (*hqd_destroy)(struct kgd_dev *kgd, bool is_reset, unsigned int timeout,
>> + uint32_t pipe_id, uint32_t queue_id);
>> +};
>
> Such interface should be documented looks at ttm or any other function structure
> inside the kernel for example on how to document those.
>
Done in v3.
Oded
>> +
>> +bool radeon_kfd_init(void);
>> +void radeon_kfd_fini(void);
>> +bool kgd2kfd_init(unsigned interface_version,
>> + const struct kfd2kgd_calls *f2g,
>> + const struct kgd2kfd_calls **g2f);
>> +
>> +void radeon_kfd_suspend(struct radeon_device *rdev);
>> +int radeon_kfd_resume(struct radeon_device *rdev);
>> +void radeon_kfd_interrupt(struct radeon_device *rdev,
>> + const void *ih_ring_entry);
>> +void radeon_kfd_device_probe(struct radeon_device *rdev);
>> +void radeon_kfd_device_init(struct radeon_device *rdev);
>> +void radeon_kfd_device_fini(struct radeon_device *rdev);
>> +
>> +#endif
>> +
>> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
>> index 35d9318..929beda 100644
>> --- a/drivers/gpu/drm/radeon/radeon_kms.c
>> +++ b/drivers/gpu/drm/radeon/radeon_kms.c
>> @@ -34,6 +34,8 @@
>> #include <linux/slab.h>
>> #include <linux/pm_runtime.h>
>>
>> +#include "radeon_kfd.h"
>> +
>> #if defined(CONFIG_VGA_SWITCHEROO)
>> bool radeon_has_atpx(void);
>> #else
>> @@ -63,6 +65,8 @@ int radeon_driver_unload_kms(struct drm_device *dev)
>>
>> pm_runtime_get_sync(dev->dev);
>>
>> + radeon_kfd_device_fini(rdev);
>> +
>> radeon_acpi_fini(rdev);
>>
>> radeon_modeset_fini(rdev);
>> @@ -142,6 +146,9 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags)
>> "Error during ACPI methods call\n");
>> }
>>
>> + radeon_kfd_device_probe(rdev);
>> + radeon_kfd_device_init(rdev);
>> +
>> if (radeon_is_px(dev)) {
>> pm_runtime_use_autosuspend(dev->dev);
>> pm_runtime_set_autosuspend_delay(dev->dev, 5000);
>> --
>> 1.9.1
>>
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v2 09/25] amdkfd: Add amdkfd skeleton driver
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
` (4 preceding siblings ...)
2014-07-17 13:29 ` [PATCH v2 06/25] drm/radeon: Add radeon <--> amdkfd interface Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-20 17:09 ` Jerome Glisse
2014-07-17 13:29 ` [PATCH v2 10/25] amdkfd: Add topology module to amdkfd Oded Gabbay
` (16 subsequent siblings)
22 siblings, 1 reply; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
This patch adds the amdkfd skeleton driver. The driver does nothing except
define a /dev/kfd device.
It returns -ENODEV on all amdkfd IOCTLs.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/Kconfig | 2 +
drivers/gpu/drm/radeon/Makefile | 2 +
drivers/gpu/drm/radeon/amdkfd/Kconfig | 10 ++
drivers/gpu/drm/radeon/amdkfd/Makefile | 9 ++
drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c | 203 ++++++++++++++++++++++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_device.c | 129 ++++++++++++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_module.c | 98 ++++++++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 81 +++++++++++
8 files changed, 534 insertions(+)
create mode 100644 drivers/gpu/drm/radeon/amdkfd/Kconfig
create mode 100644 drivers/gpu/drm/radeon/amdkfd/Makefile
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_device.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_module.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
diff --git a/drivers/gpu/drm/radeon/Kconfig b/drivers/gpu/drm/radeon/Kconfig
index 970f8e9..b697321 100644
--- a/drivers/gpu/drm/radeon/Kconfig
+++ b/drivers/gpu/drm/radeon/Kconfig
@@ -6,3 +6,5 @@ config DRM_RADEON_UMS
Userspace modesetting is deprecated for quite some time now, so
enable this only if you have ancient versions of the DDX drivers.
+
+source "drivers/gpu/drm/radeon/amdkfd/Kconfig"
diff --git a/drivers/gpu/drm/radeon/Makefile b/drivers/gpu/drm/radeon/Makefile
index a1c913d..50823a1 100644
--- a/drivers/gpu/drm/radeon/Makefile
+++ b/drivers/gpu/drm/radeon/Makefile
@@ -112,4 +112,6 @@ radeon-$(CONFIG_ACPI) += radeon_acpi.o
obj-$(CONFIG_DRM_RADEON)+= radeon.o
+obj-$(CONFIG_HSA_RADEON)+= amdkfd/
+
CFLAGS_radeon_trace_points.o := -I$(src)
diff --git a/drivers/gpu/drm/radeon/amdkfd/Kconfig b/drivers/gpu/drm/radeon/amdkfd/Kconfig
new file mode 100644
index 0000000..900bb34
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/Kconfig
@@ -0,0 +1,10 @@
+#
+# Heterogeneous system architecture configuration
+#
+
+config HSA_RADEON
+ tristate "HSA kernel driver for AMD Radeon devices"
+ depends on DRM_RADEON && AMD_IOMMU_V2 && X86_64
+ default m
+ help
+ Enable this if you want to use HSA features on AMD radeon devices.
diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
new file mode 100644
index 0000000..9564e75
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
@@ -0,0 +1,9 @@
+#
+# Makefile for Heterogeneous System Architecture support for AMD radeon devices
+#
+
+ccflags-y := -Iinclude/drm
+
+amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o
+
+obj-$(CONFIG_HSA_RADEON) += amdkfd.o
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
new file mode 100644
index 0000000..b98bcb7
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
@@ -0,0 +1,203 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/device.h>
+#include <linux/export.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/compat.h>
+#include <uapi/linux/kfd_ioctl.h>
+#include <linux/time.h>
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+#include <uapi/asm-generic/mman-common.h>
+#include <asm/processor.h>
+#include "kfd_priv.h"
+
+static long kfd_ioctl(struct file *, unsigned int, unsigned long);
+static int kfd_open(struct inode *, struct file *);
+
+static const char kfd_dev_name[] = "kfd";
+
+static const struct file_operations kfd_fops = {
+ .owner = THIS_MODULE,
+ .unlocked_ioctl = kfd_ioctl,
+ .compat_ioctl = kfd_ioctl,
+ .open = kfd_open,
+};
+
+static int kfd_char_dev_major = -1;
+static struct class *kfd_class;
+struct device *kfd_device;
+
+int kfd_chardev_init(void)
+{
+ int err = 0;
+
+ kfd_char_dev_major = register_chrdev(0, kfd_dev_name, &kfd_fops);
+ err = kfd_char_dev_major;
+ if (err < 0)
+ goto err_register_chrdev;
+
+ kfd_class = class_create(THIS_MODULE, kfd_dev_name);
+ err = PTR_ERR(kfd_class);
+ if (IS_ERR(kfd_class))
+ goto err_class_create;
+
+ kfd_device = device_create(kfd_class, NULL, MKDEV(kfd_char_dev_major, 0), NULL, kfd_dev_name);
+ err = PTR_ERR(kfd_device);
+ if (IS_ERR(kfd_device))
+ goto err_device_create;
+
+ return 0;
+
+err_device_create:
+ class_destroy(kfd_class);
+err_class_create:
+ unregister_chrdev(kfd_char_dev_major, kfd_dev_name);
+err_register_chrdev:
+ return err;
+}
+
+void kfd_chardev_exit(void)
+{
+ device_destroy(kfd_class, MKDEV(kfd_char_dev_major, 0));
+ class_destroy(kfd_class);
+ unregister_chrdev(kfd_char_dev_major, kfd_dev_name);
+}
+
+struct device *kfd_chardev(void)
+{
+ return kfd_device;
+}
+
+
+static int kfd_open(struct inode *inode, struct file *filep)
+{
+ if (iminor(inode) != 0)
+ return -ENODEV;
+
+ return 0;
+}
+
+static long kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p, void __user *arg)
+{
+ return -ENODEV;
+}
+
+static int kfd_ioctl_destroy_queue(struct file *filp, struct kfd_process *p, void __user *arg)
+{
+ return -ENODEV;
+}
+
+static int kfd_ioctl_update_queue(struct file *filp, struct kfd_process *p, void __user *arg)
+{
+ return -ENODEV;
+}
+
+static long kfd_ioctl_set_memory_policy(struct file *filep, struct kfd_process *p, void __user *arg)
+{
+ return -ENODEV;
+}
+
+static long kfd_ioctl_get_clock_counters(struct file *filep, struct kfd_process *p, void __user *arg)
+{
+ return -ENODEV;
+}
+
+
+static int kfd_ioctl_get_process_apertures(struct file *filp, struct kfd_process *p, void __user *arg)
+{
+ return -ENODEV;
+}
+
+static long kfd_ioctl_pmc_acquire_access(struct file *filp, struct kfd_process *p, void __user *arg)
+{
+ return -ENODEV;
+}
+
+static long kfd_ioctl_pmc_release_access(struct file *filp, struct kfd_process *p, void __user *arg)
+{
+ return -ENODEV;
+}
+
+static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
+{
+ struct kfd_process *process;
+ long err = -EINVAL;
+
+ dev_dbg(kfd_device,
+ "ioctl cmd 0x%x (#%d), arg 0x%lx\n",
+ cmd, _IOC_NR(cmd), arg);
+
+ /* TODO: add function that retrieves process */
+ process = NULL;
+
+ switch (cmd) {
+ case KFD_IOC_CREATE_QUEUE:
+ err = kfd_ioctl_create_queue(filep, process, (void __user *)arg);
+ break;
+
+ case KFD_IOC_DESTROY_QUEUE:
+ err = kfd_ioctl_destroy_queue(filep, process, (void __user *)arg);
+ break;
+
+ case KFD_IOC_SET_MEMORY_POLICY:
+ err = kfd_ioctl_set_memory_policy(filep, process, (void __user *)arg);
+ break;
+
+ case KFD_IOC_GET_CLOCK_COUNTERS:
+ err = kfd_ioctl_get_clock_counters(filep, process, (void __user *)arg);
+ break;
+
+ case KFD_IOC_GET_PROCESS_APERTURES:
+ err = kfd_ioctl_get_process_apertures(filep, process, (void __user *)arg);
+ break;
+
+ case KFD_IOC_UPDATE_QUEUE:
+ err = kfd_ioctl_update_queue(filep, process, (void __user *)arg);
+ break;
+
+ case KFD_IOC_PMC_ACQUIRE_ACCESS:
+ err = kfd_ioctl_pmc_acquire_access(filep, process, (void __user *) arg);
+ break;
+
+ case KFD_IOC_PMC_RELEASE_ACCESS:
+ err = kfd_ioctl_pmc_release_access(filep, process, (void __user *) arg);
+ break;
+
+ default:
+ dev_err(kfd_device,
+ "unknown ioctl cmd 0x%x, arg 0x%lx)\n",
+ cmd, arg);
+ err = -EINVAL;
+ break;
+ }
+
+ if (err < 0)
+ dev_err(kfd_device, "ioctl error %ld\n", err);
+
+ return err;
+}
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
new file mode 100644
index 0000000..dd63ce09
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
@@ -0,0 +1,129 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/amd-iommu.h>
+#include <linux/bsearch.h>
+#include <linux/pci.h>
+#include <linux/slab.h>
+#include "kfd_priv.h"
+
+static const struct kfd_device_info kaveri_device_info = {
+ .max_pasid_bits = 16,
+};
+
+struct kfd_deviceid {
+ unsigned short did;
+ const struct kfd_device_info *device_info;
+};
+
+/* Please keep this sorted by increasing device id. */
+static const struct kfd_deviceid supported_devices[] = {
+ { 0x1304, &kaveri_device_info }, /* Kaveri */
+ { 0x1305, &kaveri_device_info }, /* Kaveri */
+ { 0x1306, &kaveri_device_info }, /* Kaveri */
+ { 0x1307, &kaveri_device_info }, /* Kaveri */
+ { 0x1309, &kaveri_device_info }, /* Kaveri */
+ { 0x130A, &kaveri_device_info }, /* Kaveri */
+ { 0x130B, &kaveri_device_info }, /* Kaveri */
+ { 0x130C, &kaveri_device_info }, /* Kaveri */
+ { 0x130D, &kaveri_device_info }, /* Kaveri */
+ { 0x130E, &kaveri_device_info }, /* Kaveri */
+ { 0x130F, &kaveri_device_info }, /* Kaveri */
+ { 0x1310, &kaveri_device_info }, /* Kaveri */
+ { 0x1311, &kaveri_device_info }, /* Kaveri */
+ { 0x1312, &kaveri_device_info }, /* Kaveri */
+ { 0x1313, &kaveri_device_info }, /* Kaveri */
+ { 0x1315, &kaveri_device_info }, /* Kaveri */
+ { 0x1316, &kaveri_device_info }, /* Kaveri */
+ { 0x1317, &kaveri_device_info }, /* Kaveri */
+ { 0x1318, &kaveri_device_info }, /* Kaveri */
+ { 0x131B, &kaveri_device_info }, /* Kaveri */
+ { 0x131C, &kaveri_device_info }, /* Kaveri */
+ { 0x131D, &kaveri_device_info }, /* Kaveri */
+};
+
+static const struct kfd_device_info *lookup_device_info(unsigned short did)
+{
+ size_t i;
+
+ for (i = 0; i < ARRAY_SIZE(supported_devices); i++) {
+ if (supported_devices[i].did == did) {
+ BUG_ON(supported_devices[i].device_info == NULL);
+ return supported_devices[i].device_info;
+ }
+ }
+
+ return NULL;
+}
+
+struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev)
+{
+ struct kfd_dev *kfd;
+
+ const struct kfd_device_info *device_info = lookup_device_info(pdev->device);
+
+ if (!device_info)
+ return NULL;
+
+ kfd = kzalloc(sizeof(*kfd), GFP_KERNEL);
+ if (!kfd)
+ return NULL;
+
+ kfd->kgd = kgd;
+ kfd->device_info = device_info;
+ kfd->pdev = pdev;
+
+ return kfd;
+}
+
+bool kgd2kfd_device_init(struct kfd_dev *kfd,
+ const struct kgd2kfd_shared_resources *gpu_resources)
+{
+ kfd->shared_resources = *gpu_resources;
+
+ kfd->init_complete = true;
+ dev_info(kfd_device, "added device (%x:%x)\n", kfd->pdev->vendor,
+ kfd->pdev->device);
+
+ return true;
+}
+
+void kgd2kfd_device_exit(struct kfd_dev *kfd)
+{
+ kfree(kfd);
+}
+
+void kgd2kfd_suspend(struct kfd_dev *kfd)
+{
+ BUG_ON(kfd == NULL);
+}
+
+int kgd2kfd_resume(struct kfd_dev *kfd)
+{
+ BUG_ON(kfd == NULL);
+
+ return 0;
+}
+
+void kgd2kfd_interrupt(struct kfd_dev *dev, const void *ih_ring_entry)
+{
+}
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
new file mode 100644
index 0000000..c7faac6
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
@@ -0,0 +1,98 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/notifier.h>
+#include <linux/moduleparam.h>
+#include <linux/device.h>
+#include "kfd_priv.h"
+
+#define KFD_DRIVER_AUTHOR "AMD Inc. and others"
+
+#define KFD_DRIVER_DESC "Standalone HSA driver for AMD's GPUs"
+#define KFD_DRIVER_DATE "20140710"
+#define KFD_DRIVER_MAJOR 0
+#define KFD_DRIVER_MINOR 6
+#define KFD_DRIVER_PATCHLEVEL 2
+
+const struct kfd2kgd_calls *kfd2kgd;
+static const struct kgd2kfd_calls kgd2kfd = {
+ .exit = kgd2kfd_exit,
+ .probe = kgd2kfd_probe,
+ .device_init = kgd2kfd_device_init,
+ .device_exit = kgd2kfd_device_exit,
+ .interrupt = kgd2kfd_interrupt,
+ .suspend = kgd2kfd_suspend,
+ .resume = kgd2kfd_resume,
+};
+
+bool kgd2kfd_init(unsigned interface_version,
+ const struct kfd2kgd_calls *f2g,
+ const struct kgd2kfd_calls **g2f)
+{
+ /* Only one interface version is supported, no kfd/kgd version skew allowed. */
+ if (interface_version != KFD_INTERFACE_VERSION)
+ return false;
+
+ kfd2kgd = f2g;
+ *g2f = &kgd2kfd;
+
+ return true;
+}
+EXPORT_SYMBOL(kgd2kfd_init);
+
+void kgd2kfd_exit(void)
+{
+}
+
+static int __init kfd_module_init(void)
+{
+ int err;
+
+ err = kfd_chardev_init();
+ if (err < 0)
+ goto err_ioctl;
+
+ dev_info(kfd_device, "Initialized module\n");
+
+ return 0;
+
+err_ioctl:
+ return err;
+}
+
+static void __exit kfd_module_exit(void)
+{
+ kfd_chardev_exit();
+ dev_info(kfd_device, "Removed module\n");
+}
+
+module_init(kfd_module_init);
+module_exit(kfd_module_exit);
+
+MODULE_AUTHOR(KFD_DRIVER_AUTHOR);
+MODULE_DESCRIPTION(KFD_DRIVER_DESC);
+MODULE_LICENSE("GPL and additional rights");
+MODULE_VERSION(__stringify(KFD_DRIVER_MAJOR) "."
+ __stringify(KFD_DRIVER_MINOR) "."
+ __stringify(KFD_DRIVER_PATCHLEVEL));
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
new file mode 100644
index 0000000..05e892f
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
@@ -0,0 +1,81 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef KFD_PRIV_H_INCLUDED
+#define KFD_PRIV_H_INCLUDED
+
+#include <linux/hashtable.h>
+#include <linux/mmu_notifier.h>
+#include <linux/mutex.h>
+#include <linux/types.h>
+#include <linux/atomic.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
+#include "../radeon_kfd.h"
+
+struct kfd_device_info {
+ const struct kfd_scheduler_class *scheduler_class;
+ unsigned int max_pasid_bits;
+ size_t ih_ring_entry_size;
+};
+
+struct kfd_dev {
+ struct kgd_dev *kgd;
+
+ const struct kfd_device_info *device_info;
+ struct pci_dev *pdev;
+
+ bool init_complete;
+
+ unsigned int id; /* topology stub index */
+
+ struct kgd2kfd_shared_resources shared_resources;
+};
+
+/* KGD2KFD callbacks */
+void kgd2kfd_exit(void);
+struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev);
+bool kgd2kfd_device_init(struct kfd_dev *kfd,
+ const struct kgd2kfd_shared_resources *gpu_resources);
+void kgd2kfd_device_exit(struct kfd_dev *kfd);
+
+extern const struct kfd2kgd_calls *kfd2kgd;
+
+/* Character device interface */
+int kfd_chardev_init(void);
+void kfd_chardev_exit(void);
+struct device *kfd_chardev(void);
+
+/* Process data */
+struct kfd_process {
+};
+
+extern struct device *kfd_device;
+
+/* Interrupts */
+void kgd2kfd_interrupt(struct kfd_dev *dev, const void *ih_ring_entry);
+
+/* Power Management */
+void kgd2kfd_suspend(struct kfd_dev *dev);
+int kgd2kfd_resume(struct kfd_dev *dev);
+
+#endif
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* Re: [PATCH v2 09/25] amdkfd: Add amdkfd skeleton driver
2014-07-17 13:29 ` [PATCH v2 09/25] amdkfd: Add amdkfd skeleton driver Oded Gabbay
@ 2014-07-20 17:09 ` Jerome Glisse
2014-08-02 19:55 ` Oded Gabbay
0 siblings, 1 reply; 46+ messages in thread
From: Jerome Glisse @ 2014-07-20 17:09 UTC (permalink / raw)
To: Oded Gabbay
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, dri-devel,
Evgeny Pinchuk, Alexey Skidanov, Alex Deucher, Andrew Morton,
Christian König
On Thu, Jul 17, 2014 at 04:29:16PM +0300, Oded Gabbay wrote:
> This patch adds the amdkfd skeleton driver. The driver does nothing except
> define a /dev/kfd device.
>
> It returns -ENODEV on all amdkfd IOCTLs.
>
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> ---
> drivers/gpu/drm/radeon/Kconfig | 2 +
> drivers/gpu/drm/radeon/Makefile | 2 +
> drivers/gpu/drm/radeon/amdkfd/Kconfig | 10 ++
> drivers/gpu/drm/radeon/amdkfd/Makefile | 9 ++
> drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c | 203 ++++++++++++++++++++++++++++
> drivers/gpu/drm/radeon/amdkfd/kfd_device.c | 129 ++++++++++++++++++
> drivers/gpu/drm/radeon/amdkfd/kfd_module.c | 98 ++++++++++++++
> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 81 +++++++++++
> 8 files changed, 534 insertions(+)
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/Kconfig
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/Makefile
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_device.c
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_module.c
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>
> diff --git a/drivers/gpu/drm/radeon/Kconfig b/drivers/gpu/drm/radeon/Kconfig
> index 970f8e9..b697321 100644
> --- a/drivers/gpu/drm/radeon/Kconfig
> +++ b/drivers/gpu/drm/radeon/Kconfig
> @@ -6,3 +6,5 @@ config DRM_RADEON_UMS
>
> Userspace modesetting is deprecated for quite some time now, so
> enable this only if you have ancient versions of the DDX drivers.
> +
> +source "drivers/gpu/drm/radeon/amdkfd/Kconfig"
> diff --git a/drivers/gpu/drm/radeon/Makefile b/drivers/gpu/drm/radeon/Makefile
> index a1c913d..50823a1 100644
> --- a/drivers/gpu/drm/radeon/Makefile
> +++ b/drivers/gpu/drm/radeon/Makefile
> @@ -112,4 +112,6 @@ radeon-$(CONFIG_ACPI) += radeon_acpi.o
>
> obj-$(CONFIG_DRM_RADEON)+= radeon.o
>
> +obj-$(CONFIG_HSA_RADEON)+= amdkfd/
> +
> CFLAGS_radeon_trace_points.o := -I$(src)
> diff --git a/drivers/gpu/drm/radeon/amdkfd/Kconfig b/drivers/gpu/drm/radeon/amdkfd/Kconfig
> new file mode 100644
> index 0000000..900bb34
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/Kconfig
> @@ -0,0 +1,10 @@
> +#
> +# Heterogenous system architecture configuration
> +#
> +
> +config HSA_RADEON
> + tristate "HSA kernel driver for AMD Radeon devices"
> + depends on DRM_RADEON && AMD_IOMMU_V2 && X86_64
> + default m
> + help
> + Enable this if you want to use HSA features on AMD radeon devices.
> diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
> new file mode 100644
> index 0000000..9564e75
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
> @@ -0,0 +1,9 @@
> +#
> +# Makefile for Heterogenous System Architecture support for AMD radeon devices
> +#
> +
> +ccflags-y := -Iinclude/drm
> +
> +amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o
> +
> +obj-$(CONFIG_HSA_RADEON) += amdkfd.o
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
> new file mode 100644
> index 0000000..b98bcb7
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
> @@ -0,0 +1,203 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/device.h>
> +#include <linux/export.h>
> +#include <linux/err.h>
> +#include <linux/fs.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/uaccess.h>
> +#include <linux/compat.h>
> +#include <uapi/linux/kfd_ioctl.h>
> +#include <linux/time.h>
> +#include <linux/mm.h>
> +#include <linux/uaccess.h>
> +#include <uapi/asm-generic/mman-common.h>
> +#include <asm/processor.h>
> +#include "kfd_priv.h"
> +
> +static long kfd_ioctl(struct file *, unsigned int, unsigned long);
> +static int kfd_open(struct inode *, struct file *);
> +
> +static const char kfd_dev_name[] = "kfd";
> +
> +static const struct file_operations kfd_fops = {
> + .owner = THIS_MODULE,
> + .unlocked_ioctl = kfd_ioctl,
> + .compat_ioctl = kfd_ioctl,
> + .open = kfd_open,
> +};
> +
> +static int kfd_char_dev_major = -1;
> +static struct class *kfd_class;
> +struct device *kfd_device;
> +
> +int kfd_chardev_init(void)
> +{
> + int err = 0;
> +
> + kfd_char_dev_major = register_chrdev(0, kfd_dev_name, &kfd_fops);
> + err = kfd_char_dev_major;
> + if (err < 0)
> + goto err_register_chrdev;
> +
> + kfd_class = class_create(THIS_MODULE, kfd_dev_name);
> + err = PTR_ERR(kfd_class);
> + if (IS_ERR(kfd_class))
> + goto err_class_create;
> +
> + kfd_device = device_create(kfd_class, NULL, MKDEV(kfd_char_dev_major, 0), NULL, kfd_dev_name);
> + err = PTR_ERR(kfd_device);
> + if (IS_ERR(kfd_device))
> + goto err_device_create;
> +
> + return 0;
> +
> +err_device_create:
> + class_destroy(kfd_class);
> +err_class_create:
> + unregister_chrdev(kfd_char_dev_major, kfd_dev_name);
> +err_register_chrdev:
> + return err;
> +}
> +
> +void kfd_chardev_exit(void)
> +{
> + device_destroy(kfd_class, MKDEV(kfd_char_dev_major, 0));
> + class_destroy(kfd_class);
> + unregister_chrdev(kfd_char_dev_major, kfd_dev_name);
> +}
> +
> +struct device *kfd_chardev(void)
> +{
> + return kfd_device;
> +}
> +
> +
> +static int kfd_open(struct inode *inode, struct file *filep)
> +{
> + if (iminor(inode) != 0)
> + return -ENODEV;
> +
> + return 0;
> +}
> +
> +static long kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p, void __user *arg)
> +{
> + return -ENODEV;
> +}
> +
> +static int kfd_ioctl_destroy_queue(struct file *filp, struct kfd_process *p, void __user *arg)
> +{
> + return -ENODEV;
> +}
> +
> +static int kfd_ioctl_update_queue(struct file *filp, struct kfd_process *p, void __user *arg)
> +{
> + return -ENODEV;
> +}
> +
> +static long kfd_ioctl_set_memory_policy(struct file *filep, struct kfd_process *p, void __user *arg)
> +{
> + return -ENODEV;
> +}
> +
> +static long kfd_ioctl_get_clock_counters(struct file *filep, struct kfd_process *p, void __user *arg)
> +{
> + return -ENODEV;
> +}
> +
> +
> +static int kfd_ioctl_get_process_apertures(struct file *filp, struct kfd_process *p, void __user *arg)
> +{
> + return -ENODEV;
> +}
> +
> +static long kfd_ioctl_pmc_acquire_access(struct file *filp, struct kfd_process *p, void __user *arg)
> +{
> + return -ENODEV;
> +}
> +
> +static long kfd_ioctl_pmc_release_access(struct file *filp, struct kfd_process *p, void __user *arg)
> +{
> + return -ENODEV;
> +}
> +
> +static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
> +{
> + struct kfd_process *process;
> + long err = -EINVAL;
> +
> + dev_dbg(kfd_device,
> + "ioctl cmd 0x%x (#%d), arg 0x%lx\n",
> + cmd, _IOC_NR(cmd), arg);
> +
> + /* TODO: add function that retrieves process */
> + process = NULL;
> +
> + switch (cmd) {
> + case KFD_IOC_CREATE_QUEUE:
> + err = kfd_ioctl_create_queue(filep, process, (void __user *)arg);
> + break;
> +
> + case KFD_IOC_DESTROY_QUEUE:
> + err = kfd_ioctl_destroy_queue(filep, process, (void __user *)arg);
> + break;
> +
> + case KFD_IOC_SET_MEMORY_POLICY:
> + err = kfd_ioctl_set_memory_policy(filep, process, (void __user *)arg);
> + break;
> +
> + case KFD_IOC_GET_CLOCK_COUNTERS:
> + err = kfd_ioctl_get_clock_counters(filep, process, (void __user *)arg);
> + break;
> +
> + case KFD_IOC_GET_PROCESS_APERTURES:
> + err = kfd_ioctl_get_process_apertures(filep, process, (void __user *)arg);
> + break;
> +
> + case KFD_IOC_UPDATE_QUEUE:
> + err = kfd_ioctl_update_queue(filep, process, (void __user *)arg);
> + break;
> +
> + case KFD_IOC_PMC_ACQUIRE_ACCESS:
> + err = kfd_ioctl_pmc_acquire_access(filep, process, (void __user *) arg);
> + break;
> +
> + case KFD_IOC_PMC_RELEASE_ACCESS:
> + err = kfd_ioctl_pmc_release_access(filep, process, (void __user *) arg);
> + break;
> +
> + default:
> + dev_err(kfd_device,
> + "unknown ioctl cmd 0x%x, arg 0x%lx)\n",
> + cmd, arg);
> + err = -EINVAL;
> + break;
> + }
> +
> + if (err < 0)
> + dev_err(kfd_device, "ioctl error %ld\n", err);
> +
> + return err;
> +}
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
> new file mode 100644
> index 0000000..dd63ce09
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
> @@ -0,0 +1,129 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/amd-iommu.h>
> +#include <linux/bsearch.h>
> +#include <linux/pci.h>
> +#include <linux/slab.h>
> +#include "kfd_priv.h"
> +
> +static const struct kfd_device_info kaveri_device_info = {
> + .max_pasid_bits = 16,
> +};
> +
> +struct kfd_deviceid {
> + unsigned short did;
> + const struct kfd_device_info *device_info;
> +};
> +
> +/* Please keep this sorted by increasing device id. */
> +static const struct kfd_deviceid supported_devices[] = {
> + { 0x1304, &kaveri_device_info }, /* Kaveri */
> + { 0x1305, &kaveri_device_info }, /* Kaveri */
> + { 0x1306, &kaveri_device_info }, /* Kaveri */
> + { 0x1307, &kaveri_device_info }, /* Kaveri */
> + { 0x1309, &kaveri_device_info }, /* Kaveri */
> + { 0x130A, &kaveri_device_info }, /* Kaveri */
> + { 0x130B, &kaveri_device_info }, /* Kaveri */
> + { 0x130C, &kaveri_device_info }, /* Kaveri */
> + { 0x130D, &kaveri_device_info }, /* Kaveri */
> + { 0x130E, &kaveri_device_info }, /* Kaveri */
> + { 0x130F, &kaveri_device_info }, /* Kaveri */
> + { 0x1310, &kaveri_device_info }, /* Kaveri */
> + { 0x1311, &kaveri_device_info }, /* Kaveri */
> + { 0x1312, &kaveri_device_info }, /* Kaveri */
> + { 0x1313, &kaveri_device_info }, /* Kaveri */
> + { 0x1315, &kaveri_device_info }, /* Kaveri */
> + { 0x1316, &kaveri_device_info }, /* Kaveri */
> + { 0x1317, &kaveri_device_info }, /* Kaveri */
> + { 0x1318, &kaveri_device_info }, /* Kaveri */
> + { 0x131B, &kaveri_device_info }, /* Kaveri */
> + { 0x131C, &kaveri_device_info }, /* Kaveri */
> + { 0x131D, &kaveri_device_info }, /* Kaveri */
> +};
> +
> +static const struct kfd_device_info *lookup_device_info(unsigned short did)
> +{
> + size_t i;
> +
> + for (i = 0; i < ARRAY_SIZE(supported_devices); i++) {
> + if (supported_devices[i].did == did) {
> + BUG_ON(supported_devices[i].device_info == NULL);
> + return supported_devices[i].device_info;
> + }
> + }
> +
> + return NULL;
> +}
> +
> +struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev)
> +{
> + struct kfd_dev *kfd;
> +
> + const struct kfd_device_info *device_info = lookup_device_info(pdev->device);
> +
> + if (!device_info)
> + return NULL;
> +
> + kfd = kzalloc(sizeof(*kfd), GFP_KERNEL);
> + if (!kfd)
> + return NULL;
> +
> + kfd->kgd = kgd;
> + kfd->device_info = device_info;
> + kfd->pdev = pdev;
> +
> + return kfd;
> +}
> +
> +bool kgd2kfd_device_init(struct kfd_dev *kfd,
> + const struct kgd2kfd_shared_resources *gpu_resources)
> +{
> + kfd->shared_resources = *gpu_resources;
> +
> + kfd->init_complete = true;
> + dev_info(kfd_device, "added device (%x:%x)\n", kfd->pdev->vendor,
> + kfd->pdev->device);
> +
> + return true;
> +}
> +
> +void kgd2kfd_device_exit(struct kfd_dev *kfd)
> +{
> + kfree(kfd);
> +}
> +
> +void kgd2kfd_suspend(struct kfd_dev *kfd)
> +{
> + BUG_ON(kfd == NULL);
> +}
> +
> +int kgd2kfd_resume(struct kfd_dev *kfd)
> +{
> + BUG_ON(kfd == NULL);
> +
> + return 0;
> +}
> +
> +void kgd2kfd_interrupt(struct kfd_dev *dev, const void *ih_ring_entry)
> +{
> +}
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
> new file mode 100644
> index 0000000..c7faac6
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
> @@ -0,0 +1,98 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/sched.h>
> +#include <linux/notifier.h>
> +#include <linux/moduleparam.h>
> +#include <linux/device.h>
> +#include "kfd_priv.h"
> +
> +#define KFD_DRIVER_AUTHOR "AMD Inc. and others"
> +
> +#define KFD_DRIVER_DESC "Standalone HSA driver for AMD's GPUs"
> +#define KFD_DRIVER_DATE "20140710"
> +#define KFD_DRIVER_MAJOR 0
> +#define KFD_DRIVER_MINOR 6
> +#define KFD_DRIVER_PATCHLEVEL 2
> +
> +const struct kfd2kgd_calls *kfd2kgd;
> +static const struct kgd2kfd_calls kgd2kfd = {
> + .exit = kgd2kfd_exit,
> + .probe = kgd2kfd_probe,
> + .device_init = kgd2kfd_device_init,
> + .device_exit = kgd2kfd_device_exit,
> + .interrupt = kgd2kfd_interrupt,
> + .suspend = kgd2kfd_suspend,
> + .resume = kgd2kfd_resume,
> +};
> +
> +bool kgd2kfd_init(unsigned interface_version,
> + const struct kfd2kgd_calls *f2g,
> + const struct kgd2kfd_calls **g2f)
> +{
> + /* Only one interface version is supported, no kfd/kgd version skew allowed. */
> + if (interface_version != KFD_INTERFACE_VERSION)
> + return false;
I am guessing this is for an out-of-tree module? Because otherwise this is
useless.
> +
> + kfd2kgd = f2g;
> + *g2f = &kgd2kfd;
> +
> + return true;
> +}
> +EXPORT_SYMBOL(kgd2kfd_init);
> +
> +void kgd2kfd_exit(void)
> +{
> +}
> +
> +static int __init kfd_module_init(void)
> +{
> + int err;
> +
> + err = kfd_chardev_init();
> + if (err < 0)
> + goto err_ioctl;
> +
> + dev_info(kfd_device, "Initialized module\n");
> +
Improve dev_info to provide some meaningful information, like bus id and device name.
> + return 0;
> +
> +err_ioctl:
> + return err;
> +}
> +
> +static void __exit kfd_module_exit(void)
> +{
> + kfd_chardev_exit();
> + dev_info(kfd_device, "Removed module\n");
> +}
Same as for module_init, improve dev_info.
> +
> +module_init(kfd_module_init);
> +module_exit(kfd_module_exit);
> +
> +MODULE_AUTHOR(KFD_DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(KFD_DRIVER_DESC);
> +MODULE_LICENSE("GPL and additional rights");
I would like to see all copyright headers reflect that, i.e. to clearly
state that the code can be licensed either under the GPL or under the BSD
license that you are using.
> +MODULE_VERSION(__stringify(KFD_DRIVER_MAJOR) "."
> + __stringify(KFD_DRIVER_MINOR) "."
> + __stringify(KFD_DRIVER_PATCHLEVEL));
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> new file mode 100644
> index 0000000..05e892f
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> @@ -0,0 +1,81 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef KFD_PRIV_H_INCLUDED
> +#define KFD_PRIV_H_INCLUDED
> +
> +#include <linux/hashtable.h>
> +#include <linux/mmu_notifier.h>
> +#include <linux/mutex.h>
> +#include <linux/types.h>
> +#include <linux/atomic.h>
> +#include <linux/workqueue.h>
> +#include <linux/spinlock.h>
> +#include "../radeon_kfd.h"
> +
> +struct kfd_device_info {
> + const struct kfd_scheduler_class *scheduler_class;
> + unsigned int max_pasid_bits;
> + size_t ih_ring_entry_size;
> +};
> +
> +struct kfd_dev {
> + struct kgd_dev *kgd;
> +
> + const struct kfd_device_info *device_info;
> + struct pci_dev *pdev;
> +
> + bool init_complete;
> +
> + unsigned int id; /* topology stub index */
> +
> + struct kgd2kfd_shared_resources shared_resources;
> +};
> +
> +/* KGD2KFD callbacks */
> +void kgd2kfd_exit(void);
> +struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev);
> +bool kgd2kfd_device_init(struct kfd_dev *kfd,
> + const struct kgd2kfd_shared_resources *gpu_resources);
> +void kgd2kfd_device_exit(struct kfd_dev *kfd);
> +
> +extern const struct kfd2kgd_calls *kfd2kgd;
> +
> +/* Character device interface */
> +int kfd_chardev_init(void);
> +void kfd_chardev_exit(void);
> +struct device *kfd_chardev(void);
> +
> +/* Process data */
> +struct kfd_process {
> +};
> +
> +extern struct device *kfd_device;
> +
> +/* Interrupts */
> +void kgd2kfd_interrupt(struct kfd_dev *dev, const void *ih_ring_entry);
> +
> +/* Power Management */
> +void kgd2kfd_suspend(struct kfd_dev *dev);
> +int kgd2kfd_resume(struct kfd_dev *dev);
> +
> +#endif
> --
> 1.9.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 46+ messages in thread* Re: [PATCH v2 09/25] amdkfd: Add amdkfd skeleton driver
2014-07-20 17:09 ` Jerome Glisse
@ 2014-08-02 19:55 ` Oded Gabbay
0 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-08-02 19:55 UTC (permalink / raw)
To: Jerome Glisse, linux-kernel, dri-devel
Cc: Andrew Lewycky, Michel Dänzer, Alexey Skidanov,
Andrew Morton
On 20/07/14 20:09, Jerome Glisse wrote:
> On Thu, Jul 17, 2014 at 04:29:16PM +0300, Oded Gabbay wrote:
>> This patch adds the amdkfd skeleton driver. The driver does nothing except
>> define a /dev/kfd device.
>>
>> It returns -ENODEV on all amdkfd IOCTLs.
>>
>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>> ---
>> drivers/gpu/drm/radeon/Kconfig | 2 +
>> drivers/gpu/drm/radeon/Makefile | 2 +
>> drivers/gpu/drm/radeon/amdkfd/Kconfig | 10 ++
>> drivers/gpu/drm/radeon/amdkfd/Makefile | 9 ++
>> drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c | 203 ++++++++++++++++++++++++++++
>> drivers/gpu/drm/radeon/amdkfd/kfd_device.c | 129 ++++++++++++++++++
>> drivers/gpu/drm/radeon/amdkfd/kfd_module.c | 98 ++++++++++++++
>> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 81 +++++++++++
>> 8 files changed, 534 insertions(+)
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/Kconfig
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/Makefile
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>>
>> diff --git a/drivers/gpu/drm/radeon/Kconfig b/drivers/gpu/drm/radeon/Kconfig
>> index 970f8e9..b697321 100644
>> --- a/drivers/gpu/drm/radeon/Kconfig
>> +++ b/drivers/gpu/drm/radeon/Kconfig
>> @@ -6,3 +6,5 @@ config DRM_RADEON_UMS
>>
>> Userspace modesetting is deprecated for quite some time now, so
>> enable this only if you have ancient versions of the DDX drivers.
>> +
>> +source "drivers/gpu/drm/radeon/amdkfd/Kconfig"
>> diff --git a/drivers/gpu/drm/radeon/Makefile b/drivers/gpu/drm/radeon/Makefile
>> index a1c913d..50823a1 100644
>> --- a/drivers/gpu/drm/radeon/Makefile
>> +++ b/drivers/gpu/drm/radeon/Makefile
>> @@ -112,4 +112,6 @@ radeon-$(CONFIG_ACPI) += radeon_acpi.o
>>
>> obj-$(CONFIG_DRM_RADEON)+= radeon.o
>>
>> +obj-$(CONFIG_HSA_RADEON)+= amdkfd/
>> +
>> CFLAGS_radeon_trace_points.o := -I$(src)
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/Kconfig b/drivers/gpu/drm/radeon/amdkfd/Kconfig
>> new file mode 100644
>> index 0000000..900bb34
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/Kconfig
>> @@ -0,0 +1,10 @@
>> +#
>> +# Heterogenous system architecture configuration
>> +#
>> +
>> +config HSA_RADEON
>> + tristate "HSA kernel driver for AMD Radeon devices"
>> + depends on DRM_RADEON && AMD_IOMMU_V2 && X86_64
>> + default m
>> + help
>> + Enable this if you want to use HSA features on AMD radeon devices.
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
>> new file mode 100644
>> index 0000000..9564e75
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
>> @@ -0,0 +1,9 @@
>> +#
>> +# Makefile for Heterogenous System Architecture support for AMD radeon devices
>> +#
>> +
>> +ccflags-y := -Iinclude/drm
>> +
>> +amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o
>> +
>> +obj-$(CONFIG_HSA_RADEON) += amdkfd.o
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
>> new file mode 100644
>> index 0000000..b98bcb7
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
>> @@ -0,0 +1,203 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#include <linux/device.h>
>> +#include <linux/export.h>
>> +#include <linux/err.h>
>> +#include <linux/fs.h>
>> +#include <linux/sched.h>
>> +#include <linux/slab.h>
>> +#include <linux/uaccess.h>
>> +#include <linux/compat.h>
>> +#include <uapi/linux/kfd_ioctl.h>
>> +#include <linux/time.h>
>> +#include <linux/mm.h>
>> +#include <linux/uaccess.h>
>> +#include <uapi/asm-generic/mman-common.h>
>> +#include <asm/processor.h>
>> +#include "kfd_priv.h"
>> +
>> +static long kfd_ioctl(struct file *, unsigned int, unsigned long);
>> +static int kfd_open(struct inode *, struct file *);
>> +
>> +static const char kfd_dev_name[] = "kfd";
>> +
>> +static const struct file_operations kfd_fops = {
>> + .owner = THIS_MODULE,
>> + .unlocked_ioctl = kfd_ioctl,
>> + .compat_ioctl = kfd_ioctl,
>> + .open = kfd_open,
>> +};
>> +
>> +static int kfd_char_dev_major = -1;
>> +static struct class *kfd_class;
>> +struct device *kfd_device;
>> +
>> +int kfd_chardev_init(void)
>> +{
>> + int err = 0;
>> +
>> + kfd_char_dev_major = register_chrdev(0, kfd_dev_name, &kfd_fops);
>> + err = kfd_char_dev_major;
>> + if (err < 0)
>> + goto err_register_chrdev;
>> +
>> + kfd_class = class_create(THIS_MODULE, kfd_dev_name);
>> + err = PTR_ERR(kfd_class);
>> + if (IS_ERR(kfd_class))
>> + goto err_class_create;
>> +
>> + kfd_device = device_create(kfd_class, NULL, MKDEV(kfd_char_dev_major, 0), NULL, kfd_dev_name);
>> + err = PTR_ERR(kfd_device);
>> + if (IS_ERR(kfd_device))
>> + goto err_device_create;
>> +
>> + return 0;
>> +
>> +err_device_create:
>> + class_destroy(kfd_class);
>> +err_class_create:
>> + unregister_chrdev(kfd_char_dev_major, kfd_dev_name);
>> +err_register_chrdev:
>> + return err;
>> +}
>> +
>> +void kfd_chardev_exit(void)
>> +{
>> + device_destroy(kfd_class, MKDEV(kfd_char_dev_major, 0));
>> + class_destroy(kfd_class);
>> + unregister_chrdev(kfd_char_dev_major, kfd_dev_name);
>> +}
>> +
>> +struct device *kfd_chardev(void)
>> +{
>> + return kfd_device;
>> +}
>> +
>> +
>> +static int kfd_open(struct inode *inode, struct file *filep)
>> +{
>> + if (iminor(inode) != 0)
>> + return -ENODEV;
>> +
>> + return 0;
>> +}
>> +
>> +static long kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p, void __user *arg)
>> +{
>> + return -ENODEV;
>> +}
>> +
>> +static int kfd_ioctl_destroy_queue(struct file *filp, struct kfd_process *p, void __user *arg)
>> +{
>> + return -ENODEV;
>> +}
>> +
>> +static int kfd_ioctl_update_queue(struct file *filp, struct kfd_process *p, void __user *arg)
>> +{
>> + return -ENODEV;
>> +}
>> +
>> +static long kfd_ioctl_set_memory_policy(struct file *filep, struct kfd_process *p, void __user *arg)
>> +{
>> + return -ENODEV;
>> +}
>> +
>> +static long kfd_ioctl_get_clock_counters(struct file *filep, struct kfd_process *p, void __user *arg)
>> +{
>> + return -ENODEV;
>> +}
>> +
>> +
>> +static int kfd_ioctl_get_process_apertures(struct file *filp, struct kfd_process *p, void __user *arg)
>> +{
>> + return -ENODEV;
>> +}
>> +
>> +static long kfd_ioctl_pmc_acquire_access(struct file *filp, struct kfd_process *p, void __user *arg)
>> +{
>> + return -ENODEV;
>> +}
>> +
>> +static long kfd_ioctl_pmc_release_access(struct file *filp, struct kfd_process *p, void __user *arg)
>> +{
>> + return -ENODEV;
>> +}
>> +
>> +static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>> +{
>> + struct kfd_process *process;
>> + long err = -EINVAL;
>> +
>> + dev_dbg(kfd_device,
>> + "ioctl cmd 0x%x (#%d), arg 0x%lx\n",
>> + cmd, _IOC_NR(cmd), arg);
>> +
>> + /* TODO: add function that retrieves process */
>> + process = NULL;
>> +
>> + switch (cmd) {
>> + case KFD_IOC_CREATE_QUEUE:
>> + err = kfd_ioctl_create_queue(filep, process, (void __user *)arg);
>> + break;
>> +
>> + case KFD_IOC_DESTROY_QUEUE:
>> + err = kfd_ioctl_destroy_queue(filep, process, (void __user *)arg);
>> + break;
>> +
>> + case KFD_IOC_SET_MEMORY_POLICY:
>> + err = kfd_ioctl_set_memory_policy(filep, process, (void __user *)arg);
>> + break;
>> +
>> + case KFD_IOC_GET_CLOCK_COUNTERS:
>> + err = kfd_ioctl_get_clock_counters(filep, process, (void __user *)arg);
>> + break;
>> +
>> + case KFD_IOC_GET_PROCESS_APERTURES:
>> + err = kfd_ioctl_get_process_apertures(filep, process, (void __user *)arg);
>> + break;
>> +
>> + case KFD_IOC_UPDATE_QUEUE:
>> + err = kfd_ioctl_update_queue(filep, process, (void __user *)arg);
>> + break;
>> +
>> + case KFD_IOC_PMC_ACQUIRE_ACCESS:
>> + err = kfd_ioctl_pmc_acquire_access(filep, process, (void __user *) arg);
>> + break;
>> +
>> + case KFD_IOC_PMC_RELEASE_ACCESS:
>> + err = kfd_ioctl_pmc_release_access(filep, process, (void __user *) arg);
>> + break;
>> +
>> + default:
>> + dev_err(kfd_device,
>> + "unknown ioctl cmd 0x%x, arg 0x%lx)\n",
>> + cmd, arg);
>> + err = -EINVAL;
>> + break;
>> + }
>> +
>> + if (err < 0)
>> + dev_err(kfd_device, "ioctl error %ld\n", err);
>> +
>> + return err;
>> +}
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>> new file mode 100644
>> index 0000000..dd63ce09
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>> @@ -0,0 +1,129 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#include <linux/amd-iommu.h>
>> +#include <linux/bsearch.h>
>> +#include <linux/pci.h>
>> +#include <linux/slab.h>
>> +#include "kfd_priv.h"
>> +
>> +static const struct kfd_device_info kaveri_device_info = {
>> + .max_pasid_bits = 16,
>> +};
>> +
>> +struct kfd_deviceid {
>> + unsigned short did;
>> + const struct kfd_device_info *device_info;
>> +};
>> +
>> +/* Please keep this sorted by increasing device id. */
>> +static const struct kfd_deviceid supported_devices[] = {
>> + { 0x1304, &kaveri_device_info }, /* Kaveri */
>> + { 0x1305, &kaveri_device_info }, /* Kaveri */
>> + { 0x1306, &kaveri_device_info }, /* Kaveri */
>> + { 0x1307, &kaveri_device_info }, /* Kaveri */
>> + { 0x1309, &kaveri_device_info }, /* Kaveri */
>> + { 0x130A, &kaveri_device_info }, /* Kaveri */
>> + { 0x130B, &kaveri_device_info }, /* Kaveri */
>> + { 0x130C, &kaveri_device_info }, /* Kaveri */
>> + { 0x130D, &kaveri_device_info }, /* Kaveri */
>> + { 0x130E, &kaveri_device_info }, /* Kaveri */
>> + { 0x130F, &kaveri_device_info }, /* Kaveri */
>> + { 0x1310, &kaveri_device_info }, /* Kaveri */
>> + { 0x1311, &kaveri_device_info }, /* Kaveri */
>> + { 0x1312, &kaveri_device_info }, /* Kaveri */
>> + { 0x1313, &kaveri_device_info }, /* Kaveri */
>> + { 0x1315, &kaveri_device_info }, /* Kaveri */
>> + { 0x1316, &kaveri_device_info }, /* Kaveri */
>> + { 0x1317, &kaveri_device_info }, /* Kaveri */
>> + { 0x1318, &kaveri_device_info }, /* Kaveri */
>> + { 0x131B, &kaveri_device_info }, /* Kaveri */
>> + { 0x131C, &kaveri_device_info }, /* Kaveri */
>> + { 0x131D, &kaveri_device_info }, /* Kaveri */
>> +};
>> +
>> +static const struct kfd_device_info *lookup_device_info(unsigned short did)
>> +{
>> + size_t i;
>> +
>> + for (i = 0; i < ARRAY_SIZE(supported_devices); i++) {
>> + if (supported_devices[i].did == did) {
>> + BUG_ON(supported_devices[i].device_info == NULL);
>> + return supported_devices[i].device_info;
>> + }
>> + }
>> +
>> + return NULL;
>> +}
>> +
>> +struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev)
>> +{
>> + struct kfd_dev *kfd;
>> +
>> + const struct kfd_device_info *device_info = lookup_device_info(pdev->device);
>> +
>> + if (!device_info)
>> + return NULL;
>> +
>> + kfd = kzalloc(sizeof(*kfd), GFP_KERNEL);
>> + if (!kfd)
>> + return NULL;
>> +
>> + kfd->kgd = kgd;
>> + kfd->device_info = device_info;
>> + kfd->pdev = pdev;
>> +
>> + return kfd;
>> +}
>> +
>> +bool kgd2kfd_device_init(struct kfd_dev *kfd,
>> + const struct kgd2kfd_shared_resources *gpu_resources)
>> +{
>> + kfd->shared_resources = *gpu_resources;
>> +
>> + kfd->init_complete = true;
>> + dev_info(kfd_device, "added device (%x:%x)\n", kfd->pdev->vendor,
>> + kfd->pdev->device);
>> +
>> + return true;
>> +}
>> +
>> +void kgd2kfd_device_exit(struct kfd_dev *kfd)
>> +{
>> + kfree(kfd);
>> +}
>> +
>> +void kgd2kfd_suspend(struct kfd_dev *kfd)
>> +{
>> + BUG_ON(kfd == NULL);
>> +}
>> +
>> +int kgd2kfd_resume(struct kfd_dev *kfd)
>> +{
>> + BUG_ON(kfd == NULL);
>> +
>> + return 0;
>> +}
>> +
>> +void kgd2kfd_interrupt(struct kfd_dev *dev, const void *ih_ring_entry)
>> +{
>> +}
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>> new file mode 100644
>> index 0000000..c7faac6
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>> @@ -0,0 +1,98 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#include <linux/module.h>
>> +#include <linux/sched.h>
>> +#include <linux/notifier.h>
>> +#include <linux/moduleparam.h>
>> +#include <linux/device.h>
>> +#include "kfd_priv.h"
>> +
>> +#define KFD_DRIVER_AUTHOR "AMD Inc. and others"
>> +
>> +#define KFD_DRIVER_DESC "Standalone HSA driver for AMD's GPUs"
>> +#define KFD_DRIVER_DATE "20140710"
>> +#define KFD_DRIVER_MAJOR 0
>> +#define KFD_DRIVER_MINOR 6
>> +#define KFD_DRIVER_PATCHLEVEL 2
>> +
>> +const struct kfd2kgd_calls *kfd2kgd;
>> +static const struct kgd2kfd_calls kgd2kfd = {
>> + .exit = kgd2kfd_exit,
>> + .probe = kgd2kfd_probe,
>> + .device_init = kgd2kfd_device_init,
>> + .device_exit = kgd2kfd_device_exit,
>> + .interrupt = kgd2kfd_interrupt,
>> + .suspend = kgd2kfd_suspend,
>> + .resume = kgd2kfd_resume,
>> +};
>> +
>> +bool kgd2kfd_init(unsigned interface_version,
>> + const struct kfd2kgd_calls *f2g,
>> + const struct kgd2kfd_calls **g2f)
>> +{
>> + /* Only one interface version is supported, no kfd/kgd version skew allowed. */
>> + if (interface_version != KFD_INTERFACE_VERSION)
>> + return false;
>
> I am guessing this is for an out-of-tree module? Because otherwise this is
> useless.
>
Yes
>> +
>> + kfd2kgd = f2g;
>> + *g2f = &kgd2kfd;
>> +
>> + return true;
>> +}
>> +EXPORT_SYMBOL(kgd2kfd_init);
>> +
>> +void kgd2kfd_exit(void)
>> +{
>> +}
>> +
>> +static int __init kfd_module_init(void)
>> +{
>> + int err;
>> +
>> + err = kfd_chardev_init();
>> + if (err < 0)
>> + goto err_ioctl;
>> +
>> + dev_info(kfd_device, "Initialized module\n");
>> +
>
> Improve dev_info to provide some meaningful information, like bus id and device name.
>
There is a single kfd module for all GPU devices, so at this stage, I
can't print the information you requested.
That information is printed when a device is added to kfd (by
kgd2kfd_probe and kgd2kfd_device_init).
>> + return 0;
>> +
>> +err_ioctl:
>> + return err;
>> +}
>> +
>> +static void __exit kfd_module_exit(void)
>> +{
>> + kfd_chardev_exit();
>> + dev_info(kfd_device, "Removed module\n");
>> +}
>
> Same as for module_init, improve dev_info.
>
Same answer as for module_init
>> +
>> +module_init(kfd_module_init);
>> +module_exit(kfd_module_exit);
>> +
>> +MODULE_AUTHOR(KFD_DRIVER_AUTHOR);
>> +MODULE_DESCRIPTION(KFD_DRIVER_DESC);
>> +MODULE_LICENSE("GPL and additional rights");
>
> I would like to see all copyright headers reflect that, i.e. to clearly
> state that the code can be licensed either under the GPL or under the BSD
> license that you are using.
I wrote exactly what is written in radeon_drv.c (last line of the file).
However, we are researching this issue and may change it before
upstreaming.
>
>> +MODULE_VERSION(__stringify(KFD_DRIVER_MAJOR) "."
>> + __stringify(KFD_DRIVER_MINOR) "."
>> + __stringify(KFD_DRIVER_PATCHLEVEL));
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> new file mode 100644
>> index 0000000..05e892f
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> @@ -0,0 +1,81 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#ifndef KFD_PRIV_H_INCLUDED
>> +#define KFD_PRIV_H_INCLUDED
>> +
>> +#include <linux/hashtable.h>
>> +#include <linux/mmu_notifier.h>
>> +#include <linux/mutex.h>
>> +#include <linux/types.h>
>> +#include <linux/atomic.h>
>> +#include <linux/workqueue.h>
>> +#include <linux/spinlock.h>
>> +#include "../radeon_kfd.h"
>> +
>> +struct kfd_device_info {
>> + const struct kfd_scheduler_class *scheduler_class;
>> + unsigned int max_pasid_bits;
>> + size_t ih_ring_entry_size;
>> +};
>> +
>> +struct kfd_dev {
>> + struct kgd_dev *kgd;
>> +
>> + const struct kfd_device_info *device_info;
>> + struct pci_dev *pdev;
>> +
>> + bool init_complete;
>> +
>> + unsigned int id; /* topology stub index */
>> +
>> + struct kgd2kfd_shared_resources shared_resources;
>> +};
>> +
>> +/* KGD2KFD callbacks */
>> +void kgd2kfd_exit(void);
>> +struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev);
>> +bool kgd2kfd_device_init(struct kfd_dev *kfd,
>> + const struct kgd2kfd_shared_resources *gpu_resources);
>> +void kgd2kfd_device_exit(struct kfd_dev *kfd);
>> +
>> +extern const struct kfd2kgd_calls *kfd2kgd;
>> +
>> +/* Character device interface */
>> +int kfd_chardev_init(void);
>> +void kfd_chardev_exit(void);
>> +struct device *kfd_chardev(void);
>> +
>> +/* Process data */
>> +struct kfd_process {
>> +};
>> +
>> +extern struct device *kfd_device;
>> +
>> +/* Interrupts */
>> +void kgd2kfd_interrupt(struct kfd_dev *dev, const void *ih_ring_entry);
>> +
>> +/* Power Management */
>> +void kgd2kfd_suspend(struct kfd_dev *dev);
>> +int kgd2kfd_resume(struct kfd_dev *dev);
>> +
>> +#endif
>> --
>> 1.9.1
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v2 10/25] amdkfd: Add topology module to amdkfd
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
` (5 preceding siblings ...)
2014-07-17 13:29 ` [PATCH v2 09/25] amdkfd: Add amdkfd skeleton driver Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-20 22:37 ` Jerome Glisse
2014-07-17 13:29 ` [PATCH v2 11/25] amdkfd: Add basic modules " Oded Gabbay
` (15 subsequent siblings)
22 siblings, 1 reply; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
From: Evgeny Pinchuk <evgeny.pinchuk@amd.com>
This patch adds the topology module to the driver. The topology is exposed to
userspace through sysfs.
The calls to add a device to and remove a device from the topology are made
by the radeon driver.
Signed-off-by: Evgeny Pinchuk <evgeny.pinchuk@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/amdkfd/Makefile | 2 +-
drivers/gpu/drm/radeon/amdkfd/kfd_crat.h | 294 +++++++
drivers/gpu/drm/radeon/amdkfd/kfd_device.c | 7 +
drivers/gpu/drm/radeon/amdkfd/kfd_module.c | 7 +
drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 17 +
drivers/gpu/drm/radeon/amdkfd/kfd_topology.c | 1207 ++++++++++++++++++++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_topology.h | 168 ++++
7 files changed, 1701 insertions(+), 1 deletion(-)
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_crat.h
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_topology.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_topology.h
diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
index 9564e75..08ecfcd 100644
--- a/drivers/gpu/drm/radeon/amdkfd/Makefile
+++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
@@ -4,6 +4,6 @@
ccflags-y := -Iinclude/drm
-amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o
+amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o
obj-$(CONFIG_HSA_RADEON) += amdkfd.o
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_crat.h b/drivers/gpu/drm/radeon/amdkfd/kfd_crat.h
new file mode 100644
index 0000000..a374fa3
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_crat.h
@@ -0,0 +1,294 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef KFD_CRAT_H_INCLUDED
+#define KFD_CRAT_H_INCLUDED
+
+#include <linux/types.h>
+
+#pragma pack(1)
+
+/*
+ * 4CC signature values for the CRAT and CDIT ACPI tables
+ */
+
+#define CRAT_SIGNATURE "CRAT"
+#define CDIT_SIGNATURE "CDIT"
+
+/*
+ * Component Resource Association Table (CRAT)
+ */
+
+#define CRAT_OEMID_LENGTH 6
+#define CRAT_OEMTABLEID_LENGTH 8
+#define CRAT_RESERVED_LENGTH 6
+
+#define CRAT_OEMID_64BIT_MASK ((1ULL << (CRAT_OEMID_LENGTH * 8)) - 1)
+
+struct crat_header {
+ uint32_t signature;
+ uint32_t length;
+ uint8_t revision;
+ uint8_t checksum;
+ uint8_t oem_id[CRAT_OEMID_LENGTH];
+ uint8_t oem_table_id[CRAT_OEMTABLEID_LENGTH];
+ uint32_t oem_revision;
+ uint32_t creator_id;
+ uint32_t creator_revision;
+ uint32_t total_entries;
+ uint16_t num_domains;
+ uint8_t reserved[CRAT_RESERVED_LENGTH];
+};
+
+/*
+ * The header structure is immediately followed by total_entries of the
+ * data definitions
+ */
+
+/*
+ * The currently defined subtype entries in the CRAT
+ */
+#define CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY 0
+#define CRAT_SUBTYPE_MEMORY_AFFINITY 1
+#define CRAT_SUBTYPE_CACHE_AFFINITY 2
+#define CRAT_SUBTYPE_TLB_AFFINITY 3
+#define CRAT_SUBTYPE_CCOMPUTE_AFFINITY 4
+#define CRAT_SUBTYPE_IOLINK_AFFINITY 5
+#define CRAT_SUBTYPE_MAX 6
+
+#define CRAT_SIBLINGMAP_SIZE 32
+
+/*
+ * ComputeUnit Affinity structure and definitions
+ */
+#define CRAT_CU_FLAGS_ENABLED 0x00000001
+#define CRAT_CU_FLAGS_HOT_PLUGGABLE 0x00000002
+#define CRAT_CU_FLAGS_CPU_PRESENT 0x00000004
+#define CRAT_CU_FLAGS_GPU_PRESENT 0x00000008
+#define CRAT_CU_FLAGS_IOMMU_PRESENT 0x00000010
+#define CRAT_CU_FLAGS_RESERVED 0xffffffe0
+
+#define CRAT_COMPUTEUNIT_RESERVED_LENGTH 4
+
+struct crat_subtype_computeunit {
+ uint8_t type;
+ uint8_t length;
+ uint16_t reserved;
+ uint32_t flags;
+ uint32_t proximity_domain;
+ uint32_t processor_id_low;
+ uint16_t num_cpu_cores;
+ uint16_t num_simd_cores;
+ uint16_t max_waves_simd;
+ uint16_t io_count;
+ uint16_t hsa_capability;
+ uint16_t lds_size_in_kb;
+ uint8_t wave_front_size;
+ uint8_t num_banks;
+ uint16_t micro_engine_id;
+ uint8_t num_arrays;
+ uint8_t num_cu_per_array;
+ uint8_t num_simd_per_cu;
+ uint8_t max_slots_scatch_cu;
+ uint8_t reserved2[CRAT_COMPUTEUNIT_RESERVED_LENGTH];
+};
+
+/*
+ * HSA Memory Affinity structure and definitions
+ */
+#define CRAT_MEM_FLAGS_ENABLED 0x00000001
+#define CRAT_MEM_FLAGS_HOT_PLUGGABLE 0x00000002
+#define CRAT_MEM_FLAGS_NON_VOLATILE 0x00000004
+#define CRAT_MEM_FLAGS_RESERVED 0xfffffff8
+
+#define CRAT_MEMORY_RESERVED_LENGTH 8
+
+struct crat_subtype_memory {
+ uint8_t type;
+ uint8_t length;
+ uint16_t reserved;
+ uint32_t flags;
+ uint32_t promixity_domain;
+ uint32_t base_addr_low;
+ uint32_t base_addr_high;
+ uint32_t length_low;
+ uint32_t length_high;
+ uint32_t width;
+ uint8_t reserved2[CRAT_MEMORY_RESERVED_LENGTH];
+};
+
+/*
+ * HSA Cache Affinity structure and definitions
+ */
+#define CRAT_CACHE_FLAGS_ENABLED 0x00000001
+#define CRAT_CACHE_FLAGS_DATA_CACHE 0x00000002
+#define CRAT_CACHE_FLAGS_INST_CACHE 0x00000004
+#define CRAT_CACHE_FLAGS_CPU_CACHE 0x00000008
+#define CRAT_CACHE_FLAGS_SIMD_CACHE 0x00000010
+#define CRAT_CACHE_FLAGS_RESERVED 0xffffffe0
+
+#define CRAT_CACHE_RESERVED_LENGTH 8
+
+struct crat_subtype_cache {
+ uint8_t type;
+ uint8_t length;
+ uint16_t reserved;
+ uint32_t flags;
+ uint32_t processor_id_low;
+ uint8_t sibling_map[CRAT_SIBLINGMAP_SIZE];
+ uint32_t cache_size;
+ uint8_t cache_level;
+ uint8_t lines_per_tag;
+ uint16_t cache_line_size;
+ uint8_t associativity;
+ uint8_t cache_properties;
+ uint16_t cache_latency;
+ uint8_t reserved2[CRAT_CACHE_RESERVED_LENGTH];
+};
+
+/*
+ * HSA TLB Affinity structure and definitions
+ */
+#define CRAT_TLB_FLAGS_ENABLED 0x00000001
+#define CRAT_TLB_FLAGS_DATA_TLB 0x00000002
+#define CRAT_TLB_FLAGS_INST_TLB 0x00000004
+#define CRAT_TLB_FLAGS_CPU_TLB 0x00000008
+#define CRAT_TLB_FLAGS_SIMD_TLB 0x00000010
+#define CRAT_TLB_FLAGS_RESERVED 0xffffffe0
+
+#define CRAT_TLB_RESERVED_LENGTH 4
+
+struct crat_subtype_tlb {
+ uint8_t type;
+ uint8_t length;
+ uint16_t reserved;
+ uint32_t flags;
+ uint32_t processor_id_low;
+ uint8_t sibling_map[CRAT_SIBLINGMAP_SIZE];
+ uint32_t tlb_level;
+ uint8_t data_tlb_associativity_2mb;
+ uint8_t data_tlb_size_2mb;
+ uint8_t instruction_tlb_associativity_2mb;
+ uint8_t instruction_tlb_size_2mb;
+ uint8_t data_tlb_associativity_4k;
+ uint8_t data_tlb_size_4k;
+ uint8_t instruction_tlb_associativity_4k;
+ uint8_t instruction_tlb_size_4k;
+ uint8_t data_tlb_associativity_1gb;
+ uint8_t data_tlb_size_1gb;
+ uint8_t instruction_tlb_associativity_1gb;
+ uint8_t instruction_tlb_size_1gb;
+ uint8_t reserved2[CRAT_TLB_RESERVED_LENGTH];
+};
+
+/*
+ * HSA CCompute/APU Affinity structure and definitions
+ */
+#define CRAT_CCOMPUTE_FLAGS_ENABLED 0x00000001
+#define CRAT_CCOMPUTE_FLAGS_RESERVED 0xfffffffe
+
+#define CRAT_CCOMPUTE_RESERVED_LENGTH 16
+
+struct crat_subtype_ccompute {
+ uint8_t type;
+ uint8_t length;
+ uint16_t reserved;
+ uint32_t flags;
+ uint32_t processor_id_low;
+ uint8_t sibling_map[CRAT_SIBLINGMAP_SIZE];
+ uint32_t apu_size;
+ uint8_t reserved2[CRAT_CCOMPUTE_RESERVED_LENGTH];
+};
+
+/*
+ * HSA IO Link Affinity structure and definitions
+ */
+#define CRAT_IOLINK_FLAGS_ENABLED 0x00000001
+#define CRAT_IOLINK_FLAGS_COHERENCY 0x00000002
+#define CRAT_IOLINK_FLAGS_RESERVED 0xfffffffc
+
+/*
+ * IO interface types
+ */
+#define CRAT_IOLINK_TYPE_UNDEFINED 0
+#define CRAT_IOLINK_TYPE_HYPERTRANSPORT 1
+#define CRAT_IOLINK_TYPE_PCIEXPRESS 2
+#define CRAT_IOLINK_TYPE_OTHER 3
+#define CRAT_IOLINK_TYPE_MAX 255
+
+#define CRAT_IOLINK_RESERVED_LENGTH 24
+
+struct crat_subtype_iolink {
+ uint8_t type;
+ uint8_t length;
+ uint16_t reserved;
+ uint32_t flags;
+ uint32_t proximity_domain_from;
+ uint32_t proximity_domain_to;
+ uint8_t io_interface_type;
+ uint8_t version_major;
+ uint16_t version_minor;
+ uint32_t minimum_latency;
+ uint32_t maximum_latency;
+ uint32_t minimum_bandwidth_mbs;
+ uint32_t maximum_bandwidth_mbs;
+ uint32_t recommended_transfer_size;
+ uint8_t reserved2[CRAT_IOLINK_RESERVED_LENGTH];
+};
+
+/*
+ * HSA generic sub-type header
+ */
+
+#define CRAT_SUBTYPE_FLAGS_ENABLED 0x00000001
+
+struct crat_subtype_generic {
+ uint8_t type;
+ uint8_t length;
+ uint16_t reserved;
+ uint32_t flags;
+};
+
+/*
+ * Component Locality Distance Information Table (CDIT)
+ */
+#define CDIT_OEMID_LENGTH 6
+#define CDIT_OEMTABLEID_LENGTH 8
+
+struct cdit_header {
+ uint32_t signature;
+ uint32_t length;
+ uint8_t revision;
+ uint8_t checksum;
+ uint8_t oem_id[CDIT_OEMID_LENGTH];
+ uint8_t oem_table_id[CDIT_OEMTABLEID_LENGTH];
+ uint32_t oem_revision;
+ uint32_t creator_id;
+ uint32_t creator_revision;
+ uint32_t total_entries;
+ uint16_t num_domains;
+ uint8_t entry[1];
+};
+
+#pragma pack()
+
+#endif /* KFD_CRAT_H_INCLUDED */
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
index dd63ce09..4138694 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
@@ -100,6 +100,9 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
{
kfd->shared_resources = *gpu_resources;
+ if (kfd_topology_add_device(kfd) != 0)
+ return false;
+
kfd->init_complete = true;
dev_info(kfd_device, "added device (%x:%x)\n", kfd->pdev->vendor,
kfd->pdev->device);
@@ -109,6 +112,10 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
void kgd2kfd_device_exit(struct kfd_dev *kfd)
{
+ int err = kfd_topology_remove_device(kfd);
+
+ BUG_ON(err != 0);
+
kfree(kfd);
}
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
index c7faac6..c51f981 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
@@ -73,16 +73,23 @@ static int __init kfd_module_init(void)
if (err < 0)
goto err_ioctl;
+ err = kfd_topology_init();
+ if (err < 0)
+ goto err_topology;
+
dev_info(kfd_device, "Initialized module\n");
return 0;
+err_topology:
+ kfd_chardev_exit();
err_ioctl:
return err;
}
static void __exit kfd_module_exit(void)
{
+ kfd_topology_shutdown();
kfd_chardev_exit();
dev_info(kfd_device, "Removed module\n");
}
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
index 05e892f..b391e24 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
@@ -32,6 +32,14 @@
#include <linux/spinlock.h>
#include "../radeon_kfd.h"
+#define KFD_SYSFS_FILE_MODE 0444
+
+/* GPU ID hash width in bits */
+#define KFD_GPU_ID_HASH_WIDTH 16
+
+/* Macro for allocating structures */
+#define kfd_alloc_struct(ptr_to_struct) ((typeof(ptr_to_struct)) kzalloc(sizeof(*ptr_to_struct), GFP_KERNEL))
+
struct kfd_device_info {
const struct kfd_scheduler_class *scheduler_class;
unsigned int max_pasid_bits;
@@ -71,6 +79,15 @@ struct kfd_process {
extern struct device *kfd_device;
+/* Topology */
+int kfd_topology_init(void);
+void kfd_topology_shutdown(void);
+int kfd_topology_add_device(struct kfd_dev *gpu);
+int kfd_topology_remove_device(struct kfd_dev *gpu);
+struct kfd_dev *kfd_device_by_id(uint32_t gpu_id);
+struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev);
+struct kfd_dev *kfd_topology_enum_kfd_devices(uint8_t idx);
+
/* Interrupts */
void kgd2kfd_interrupt(struct kfd_dev *dev, const void *ih_ring_entry);
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_topology.c b/drivers/gpu/drm/radeon/amdkfd/kfd_topology.c
new file mode 100644
index 0000000..30da4c3
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_topology.c
@@ -0,0 +1,1207 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/pci.h>
+#include <linux/errno.h>
+#include <linux/acpi.h>
+#include <linux/hash.h>
+#include <linux/cpufreq.h>
+
+#include "kfd_priv.h"
+#include "kfd_crat.h"
+#include "kfd_topology.h"
+
+static struct list_head topology_device_list;
+static int topology_crat_parsed;
+static struct kfd_system_properties sys_props;
+
+static DECLARE_RWSEM(topology_lock);
+
+struct kfd_dev *kfd_device_by_id(uint32_t gpu_id)
+{
+ struct kfd_topology_device *top_dev;
+ struct kfd_dev *device = NULL;
+
+ down_read(&topology_lock);
+
+ list_for_each_entry(top_dev, &topology_device_list, list)
+ if (top_dev->gpu_id == gpu_id) {
+ device = top_dev->gpu;
+ break;
+ }
+
+ up_read(&topology_lock);
+
+ return device;
+}
+
+struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev)
+{
+ struct kfd_topology_device *top_dev;
+ struct kfd_dev *device = NULL;
+
+ down_read(&topology_lock);
+
+ list_for_each_entry(top_dev, &topology_device_list, list)
+ if (top_dev->gpu->pdev == pdev) {
+ device = top_dev->gpu;
+ break;
+ }
+
+ up_read(&topology_lock);
+
+ return device;
+}
+
+static int kfd_topology_get_crat_acpi(void *crat_image, size_t *size)
+{
+ struct acpi_table_header *crat_table;
+ acpi_status status;
+
+ if (!size)
+ return -EINVAL;
+
+ /*
+ * Fetch the CRAT table from ACPI
+ */
+ status = acpi_get_table(CRAT_SIGNATURE, 0, &crat_table);
+ if (status == AE_NOT_FOUND) {
+ pr_warn("CRAT table not found\n");
+ return -ENODATA;
+ } else if (ACPI_FAILURE(status)) {
+ const char *err = acpi_format_exception(status);
+
+ pr_err("CRAT table error: %s\n", err);
+ return -EINVAL;
+ }
+
+ if (*size >= crat_table->length && crat_image != 0)
+ memcpy(crat_image, crat_table, crat_table->length);
+
+ *size = crat_table->length;
+
+ return 0;
+}
+
+static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
+ struct crat_subtype_computeunit *cu)
+{
+ BUG_ON(!dev);
+ BUG_ON(!cu);
+
+ dev->node_props.cpu_cores_count = cu->num_cpu_cores;
+ dev->node_props.cpu_core_id_base = cu->processor_id_low;
+ if (cu->hsa_capability & CRAT_CU_FLAGS_IOMMU_PRESENT)
+ dev->node_props.capability |= HSA_CAP_ATS_PRESENT;
+
+ pr_info("CU CPU: cores=%d id_base=%d\n", cu->num_cpu_cores,
+ cu->processor_id_low);
+}
+
+static void kfd_populated_cu_info_gpu(struct kfd_topology_device *dev,
+ struct crat_subtype_computeunit *cu)
+{
+ BUG_ON(!dev);
+ BUG_ON(!cu);
+
+ dev->node_props.simd_id_base = cu->processor_id_low;
+ dev->node_props.simd_count = cu->num_simd_cores;
+ dev->node_props.lds_size_in_kb = cu->lds_size_in_kb;
+ dev->node_props.max_waves_per_simd = cu->max_waves_simd;
+ dev->node_props.wave_front_size = cu->wave_front_size;
+ dev->node_props.mem_banks_count = cu->num_banks;
+ dev->node_props.array_count = cu->num_arrays;
+ dev->node_props.cu_per_simd_array = cu->num_cu_per_array;
+ dev->node_props.simd_per_cu = cu->num_simd_per_cu;
+ dev->node_props.max_slots_scratch_cu = cu->max_slots_scatch_cu;
+ if (cu->hsa_capability & CRAT_CU_FLAGS_HOT_PLUGGABLE)
+ dev->node_props.capability |= HSA_CAP_HOT_PLUGGABLE;
+ pr_info("CU GPU: simds=%d id_base=%d\n", cu->num_simd_cores,
+ cu->processor_id_low);
+}
+
+/* kfd_parse_subtype_cu is called when the topology mutex is already acquired */
+static int kfd_parse_subtype_cu(struct crat_subtype_computeunit *cu)
+{
+ struct kfd_topology_device *dev;
+ int i = 0;
+
+ BUG_ON(!cu);
+
+ pr_info("Found CU entry in CRAT table with proximity_domain=%d caps=%x\n",
+ cu->proximity_domain, cu->hsa_capability);
+ list_for_each_entry(dev, &topology_device_list, list) {
+ if (cu->proximity_domain == i) {
+ if (cu->flags & CRAT_CU_FLAGS_CPU_PRESENT)
+ kfd_populated_cu_info_cpu(dev, cu);
+
+ if (cu->flags & CRAT_CU_FLAGS_GPU_PRESENT)
+ kfd_populated_cu_info_gpu(dev, cu);
+ break;
+ }
+ i++;
+ }
+
+ return 0;
+}
+
+/* kfd_parse_subtype_mem is called when the topology mutex is already acquired */
+static int kfd_parse_subtype_mem(struct crat_subtype_memory *mem)
+{
+ struct kfd_mem_properties *props;
+ struct kfd_topology_device *dev;
+ int i = 0;
+
+ BUG_ON(!mem);
+
+ pr_info("Found memory entry in CRAT table with proximity_domain=%d\n",
+ mem->promixity_domain);
+ list_for_each_entry(dev, &topology_device_list, list) {
+ if (mem->promixity_domain == i) {
+ props = kfd_alloc_struct(props);
+ if (props == 0)
+ return -ENOMEM;
+
+ if (dev->node_props.cpu_cores_count == 0)
+ props->heap_type = HSA_MEM_HEAP_TYPE_FB_PRIVATE;
+ else
+ props->heap_type = HSA_MEM_HEAP_TYPE_SYSTEM;
+
+ if (mem->flags & CRAT_MEM_FLAGS_HOT_PLUGGABLE)
+ props->flags |= HSA_MEM_FLAGS_HOT_PLUGGABLE;
+ if (mem->flags & CRAT_MEM_FLAGS_NON_VOLATILE)
+ props->flags |= HSA_MEM_FLAGS_NON_VOLATILE;
+
+ props->size_in_bytes = ((uint64_t)mem->length_high << 32) +
+ mem->length_low;
+ props->width = mem->width;
+
+ dev->mem_bank_count++;
+ list_add_tail(&props->list, &dev->mem_props);
+
+ break;
+ }
+ i++;
+ }
+
+ return 0;
+}
+
+/* kfd_parse_subtype_cache is called when the topology mutex is already acquired */
+static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache)
+{
+ struct kfd_cache_properties *props;
+ struct kfd_topology_device *dev;
+ uint32_t id;
+
+ BUG_ON(!cache);
+
+ id = cache->processor_id_low;
+
+ pr_info("Found cache entry in CRAT table with processor_id=%d\n", id);
+ list_for_each_entry(dev, &topology_device_list, list)
+ if (id == dev->node_props.cpu_core_id_base ||
+ id == dev->node_props.simd_id_base) {
+ props = kfd_alloc_struct(props);
+ if (props == 0)
+ return -ENOMEM;
+
+ props->processor_id_low = id;
+ props->cache_level = cache->cache_level;
+ props->cache_size = cache->cache_size;
+ props->cacheline_size = cache->cache_line_size;
+ props->cachelines_per_tag = cache->lines_per_tag;
+ props->cache_assoc = cache->associativity;
+ props->cache_latency = cache->cache_latency;
+
+ if (cache->flags & CRAT_CACHE_FLAGS_DATA_CACHE)
+ props->cache_type |= HSA_CACHE_TYPE_DATA;
+ if (cache->flags & CRAT_CACHE_FLAGS_INST_CACHE)
+ props->cache_type |= HSA_CACHE_TYPE_INSTRUCTION;
+ if (cache->flags & CRAT_CACHE_FLAGS_CPU_CACHE)
+ props->cache_type |= HSA_CACHE_TYPE_CPU;
+ if (cache->flags & CRAT_CACHE_FLAGS_SIMD_CACHE)
+ props->cache_type |= HSA_CACHE_TYPE_HSACU;
+
+ dev->cache_count++;
+ dev->node_props.caches_count++;
+ list_add_tail(&props->list, &dev->cache_props);
+
+ break;
+ }
+
+ return 0;
+}
+
+/* kfd_parse_subtype_iolink is called when the topology mutex is already acquired */
+static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink)
+{
+ struct kfd_iolink_properties *props;
+ struct kfd_topology_device *dev;
+ uint32_t i = 0;
+ uint32_t id_from;
+ uint32_t id_to;
+
+ BUG_ON(!iolink);
+
+ id_from = iolink->proximity_domain_from;
+ id_to = iolink->proximity_domain_to;
+
+ pr_info("Found IO link entry in CRAT table with id_from=%d\n", id_from);
+ list_for_each_entry(dev, &topology_device_list, list) {
+ if (id_from == i) {
+ props = kfd_alloc_struct(props);
+ if (props == 0)
+ return -ENOMEM;
+
+ props->node_from = id_from;
+ props->node_to = id_to;
+ props->ver_maj = iolink->version_major;
+ props->ver_min = iolink->version_minor;
+
+ /*
+ * weight factor (derived from CDIR), currently always 1
+ */
+ props->weight = 1;
+
+ props->min_latency = iolink->minimum_latency;
+ props->max_latency = iolink->maximum_latency;
+ props->min_bandwidth = iolink->minimum_bandwidth_mbs;
+ props->max_bandwidth = iolink->maximum_bandwidth_mbs;
+ props->rec_transfer_size =
+ iolink->recommended_transfer_size;
+
+ dev->io_link_count++;
+ dev->node_props.io_links_count++;
+ list_add_tail(&props->list, &dev->io_link_props);
+
+ break;
+ }
+ i++;
+ }
+
+ return 0;
+}
+
+static int kfd_parse_subtype(struct crat_subtype_generic *sub_type_hdr)
+{
+ struct crat_subtype_computeunit *cu;
+ struct crat_subtype_memory *mem;
+ struct crat_subtype_cache *cache;
+ struct crat_subtype_iolink *iolink;
+ int ret = 0;
+
+ BUG_ON(!sub_type_hdr);
+
+ switch (sub_type_hdr->type) {
+ case CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY:
+ cu = (struct crat_subtype_computeunit *)sub_type_hdr;
+ ret = kfd_parse_subtype_cu(cu);
+ break;
+ case CRAT_SUBTYPE_MEMORY_AFFINITY:
+ mem = (struct crat_subtype_memory *)sub_type_hdr;
+ ret = kfd_parse_subtype_mem(mem);
+ break;
+ case CRAT_SUBTYPE_CACHE_AFFINITY:
+ cache = (struct crat_subtype_cache *)sub_type_hdr;
+ ret = kfd_parse_subtype_cache(cache);
+ break;
+ case CRAT_SUBTYPE_TLB_AFFINITY:
+ /*
+ * For now, nothing to do here
+ */
+ pr_info("Found TLB entry in CRAT table (not processing)\n");
+ break;
+ case CRAT_SUBTYPE_CCOMPUTE_AFFINITY:
+ /*
+ * For now, nothing to do here
+ */
+ pr_info("Found CCOMPUTE entry in CRAT table (not processing)\n");
+ break;
+ case CRAT_SUBTYPE_IOLINK_AFFINITY:
+ iolink = (struct crat_subtype_iolink *)sub_type_hdr;
+ ret = kfd_parse_subtype_iolink(iolink);
+ break;
+ default:
+ pr_warn("Unknown subtype (%d) in CRAT\n",
+ sub_type_hdr->type);
+ }
+
+ return ret;
+}
+
+static void kfd_release_topology_device(struct kfd_topology_device *dev)
+{
+ struct kfd_mem_properties *mem;
+ struct kfd_cache_properties *cache;
+ struct kfd_iolink_properties *iolink;
+
+ BUG_ON(!dev);
+
+ list_del(&dev->list);
+
+ while (dev->mem_props.next != &dev->mem_props) {
+ mem = container_of(dev->mem_props.next,
+ struct kfd_mem_properties, list);
+ list_del(&mem->list);
+ kfree(mem);
+ }
+
+ while (dev->cache_props.next != &dev->cache_props) {
+ cache = container_of(dev->cache_props.next,
+ struct kfd_cache_properties, list);
+ list_del(&cache->list);
+ kfree(cache);
+ }
+
+ while (dev->io_link_props.next != &dev->io_link_props) {
+ iolink = container_of(dev->io_link_props.next,
+ struct kfd_iolink_properties, list);
+ list_del(&iolink->list);
+ kfree(iolink);
+ }
+
+ kfree(dev);
+
+ sys_props.num_devices--;
+}
+
+static void kfd_release_live_view(void)
+{
+ struct kfd_topology_device *dev;
+
+ while (topology_device_list.next != &topology_device_list) {
+ dev = container_of(topology_device_list.next,
+ struct kfd_topology_device, list);
+ kfd_release_topology_device(dev);
+}
+
+ memset(&sys_props, 0, sizeof(sys_props));
+}
+
+static struct kfd_topology_device *kfd_create_topology_device(void)
+{
+ struct kfd_topology_device *dev;
+
+ dev = kfd_alloc_struct(dev);
+ if (dev == 0) {
+ pr_err("No memory to allocate a topology device");
+ return 0;
+ }
+
+ INIT_LIST_HEAD(&dev->mem_props);
+ INIT_LIST_HEAD(&dev->cache_props);
+ INIT_LIST_HEAD(&dev->io_link_props);
+
+ list_add_tail(&dev->list, &topology_device_list);
+ sys_props.num_devices++;
+
+ return dev;
+ }
+
+static int kfd_parse_crat_table(void *crat_image)
+{
+ struct kfd_topology_device *top_dev;
+ struct crat_subtype_generic *sub_type_hdr;
+ uint16_t node_id;
+ int ret;
+ struct crat_header *crat_table = (struct crat_header *)crat_image;
+ uint16_t num_nodes;
+ uint32_t image_len;
+
+ if (!crat_image)
+ return -EINVAL;
+
+ num_nodes = crat_table->num_domains;
+ image_len = crat_table->length;
+
+ pr_info("Parsing CRAT table with %d nodes\n", num_nodes);
+
+ for (node_id = 0; node_id < num_nodes; node_id++) {
+ top_dev = kfd_create_topology_device();
+ if (!top_dev) {
+ kfd_release_live_view();
+ return -ENOMEM;
+ }
+ }
+
+ sys_props.platform_id = (*((uint64_t *)crat_table->oem_id)) & CRAT_OEMID_64BIT_MASK;
+ sys_props.platform_oem = *((uint64_t *)crat_table->oem_table_id);
+ sys_props.platform_rev = crat_table->revision;
+
+ sub_type_hdr = (struct crat_subtype_generic *)(crat_table+1);
+ while ((char *)sub_type_hdr + sizeof(struct crat_subtype_generic) <
+ ((char *)crat_image) + image_len) {
+ if (sub_type_hdr->flags & CRAT_SUBTYPE_FLAGS_ENABLED) {
+ ret = kfd_parse_subtype(sub_type_hdr);
+ if (ret != 0) {
+ kfd_release_live_view();
+ return ret;
+ }
+ }
+
+ sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
+ sub_type_hdr->length);
+ }
+
+ sys_props.generation_count++;
+ topology_crat_parsed = 1;
+
+ return 0;
+}
+
+
+#define sysfs_show_gen_prop(buffer, fmt, ...) \
+ snprintf(buffer, PAGE_SIZE, "%s"fmt, buffer, __VA_ARGS__)
+#define sysfs_show_32bit_prop(buffer, name, value) \
+ sysfs_show_gen_prop(buffer, "%s %u\n", name, value)
+#define sysfs_show_64bit_prop(buffer, name, value) \
+ sysfs_show_gen_prop(buffer, "%s %llu\n", name, value)
+#define sysfs_show_32bit_val(buffer, value) \
+ sysfs_show_gen_prop(buffer, "%u\n", value)
+#define sysfs_show_str_val(buffer, value) \
+ sysfs_show_gen_prop(buffer, "%s\n", value)
+
+static ssize_t sysprops_show(struct kobject *kobj, struct attribute *attr,
+ char *buffer)
+{
+ ssize_t ret;
+
+ /* Making sure that the buffer is an empty string */
+ buffer[0] = 0;
+
+ if (attr == &sys_props.attr_genid) {
+ ret = sysfs_show_32bit_val(buffer, sys_props.generation_count);
+ } else if (attr == &sys_props.attr_props) {
+ sysfs_show_64bit_prop(buffer, "platform_oem",
+ sys_props.platform_oem);
+ sysfs_show_64bit_prop(buffer, "platform_id",
+ sys_props.platform_id);
+ ret = sysfs_show_64bit_prop(buffer, "platform_rev",
+ sys_props.platform_rev);
+ } else {
+ ret = -EINVAL;
+ }
+
+ return ret;
+}
+
+static const struct sysfs_ops sysprops_ops = {
+ .show = sysprops_show,
+};
+
+static struct kobj_type sysprops_type = {
+ .sysfs_ops = &sysprops_ops,
+};
+
+static ssize_t iolink_show(struct kobject *kobj, struct attribute *attr,
+ char *buffer)
+{
+ ssize_t ret;
+ struct kfd_iolink_properties *iolink;
+
+ /* Making sure that the buffer is an empty string */
+ buffer[0] = 0;
+
+ iolink = container_of(attr, struct kfd_iolink_properties, attr);
+ sysfs_show_32bit_prop(buffer, "type", iolink->iolink_type);
+ sysfs_show_32bit_prop(buffer, "version_major", iolink->ver_maj);
+ sysfs_show_32bit_prop(buffer, "version_minor", iolink->ver_min);
+ sysfs_show_32bit_prop(buffer, "node_from", iolink->node_from);
+ sysfs_show_32bit_prop(buffer, "node_to", iolink->node_to);
+ sysfs_show_32bit_prop(buffer, "weight", iolink->weight);
+ sysfs_show_32bit_prop(buffer, "min_latency", iolink->min_latency);
+ sysfs_show_32bit_prop(buffer, "max_latency", iolink->max_latency);
+ sysfs_show_32bit_prop(buffer, "min_bandwidth", iolink->min_bandwidth);
+ sysfs_show_32bit_prop(buffer, "max_bandwidth", iolink->max_bandwidth);
+ sysfs_show_32bit_prop(buffer, "recommended_transfer_size",
+ iolink->rec_transfer_size);
+ ret = sysfs_show_32bit_prop(buffer, "flags", iolink->flags);
+
+ return ret;
+}
+
+static const struct sysfs_ops iolink_ops = {
+ .show = iolink_show,
+};
+
+static struct kobj_type iolink_type = {
+ .sysfs_ops = &iolink_ops,
+};
+
+static ssize_t mem_show(struct kobject *kobj, struct attribute *attr,
+ char *buffer)
+{
+ ssize_t ret;
+ struct kfd_mem_properties *mem;
+
+ /* Making sure that the buffer is an empty string */
+ buffer[0] = 0;
+
+ mem = container_of(attr, struct kfd_mem_properties, attr);
+ sysfs_show_32bit_prop(buffer, "heap_type", mem->heap_type);
+ sysfs_show_64bit_prop(buffer, "size_in_bytes", mem->size_in_bytes);
+ sysfs_show_32bit_prop(buffer, "flags", mem->flags);
+ sysfs_show_32bit_prop(buffer, "width", mem->width);
+ ret = sysfs_show_32bit_prop(buffer, "mem_clk_max", mem->mem_clk_max);
+
+ return ret;
+}
+
+static const struct sysfs_ops mem_ops = {
+ .show = mem_show,
+};
+
+static struct kobj_type mem_type = {
+ .sysfs_ops = &mem_ops,
+};
+
+static ssize_t kfd_cache_show(struct kobject *kobj, struct attribute *attr,
+ char *buffer)
+{
+ ssize_t ret;
+ uint32_t i;
+ struct kfd_cache_properties *cache;
+
+ /* Making sure that the buffer is an empty string */
+ buffer[0] = 0;
+
+ cache = container_of(attr, struct kfd_cache_properties, attr);
+ sysfs_show_32bit_prop(buffer, "processor_id_low",
+ cache->processor_id_low);
+ sysfs_show_32bit_prop(buffer, "level", cache->cache_level);
+ sysfs_show_32bit_prop(buffer, "size", cache->cache_size);
+ sysfs_show_32bit_prop(buffer, "cache_line_size", cache->cacheline_size);
+ sysfs_show_32bit_prop(buffer, "cache_lines_per_tag",
+ cache->cachelines_per_tag);
+ sysfs_show_32bit_prop(buffer, "association", cache->cache_assoc);
+ sysfs_show_32bit_prop(buffer, "latency", cache->cache_latency);
+ sysfs_show_32bit_prop(buffer, "type", cache->cache_type);
+ snprintf(buffer, PAGE_SIZE, "%ssibling_map ", buffer);
+ for (i = 0; i < KFD_TOPOLOGY_CPU_SIBLINGS; i++)
+ ret = snprintf(buffer, PAGE_SIZE, "%s%d%s",
+ buffer, cache->sibling_map[i],
+ (i == KFD_TOPOLOGY_CPU_SIBLINGS-1) ?
+ "\n" : ",");
+
+ return ret;
+}
+
+static const struct sysfs_ops cache_ops = {
+ .show = kfd_cache_show,
+};
+
+static struct kobj_type cache_type = {
+ .sysfs_ops = &cache_ops,
+};
+
+static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
+ char *buffer)
+{
+ ssize_t ret;
+ struct kfd_topology_device *dev;
+ char public_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
+ uint32_t i;
+
+ /* Making sure that the buffer is an empty string */
+ buffer[0] = 0;
+
+ if (strcmp(attr->name, "gpu_id") == 0) {
+ dev = container_of(attr, struct kfd_topology_device,
+ attr_gpuid);
+ ret = sysfs_show_32bit_val(buffer, dev->gpu_id);
+ } else if (strcmp(attr->name, "name") == 0) {
+ dev = container_of(attr, struct kfd_topology_device,
+ attr_name);
+ for (i = 0; i < KFD_TOPOLOGY_PUBLIC_NAME_SIZE; i++) {
+ public_name[i] =
+ (char)dev->node_props.marketing_name[i];
+ if (dev->node_props.marketing_name[i] == 0)
+ break;
+ }
+ public_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE-1] = 0x0;
+ ret = sysfs_show_str_val(buffer, public_name);
+ } else {
+ dev = container_of(attr, struct kfd_topology_device,
+ attr_props);
+ sysfs_show_32bit_prop(buffer, "cpu_cores_count",
+ dev->node_props.cpu_cores_count);
+ sysfs_show_32bit_prop(buffer, "simd_count",
+ dev->node_props.simd_count);
+ sysfs_show_32bit_prop(buffer, "mem_banks_count",
+ dev->node_props.mem_banks_count);
+ sysfs_show_32bit_prop(buffer, "caches_count",
+ dev->node_props.caches_count);
+ sysfs_show_32bit_prop(buffer, "io_links_count",
+ dev->node_props.io_links_count);
+ sysfs_show_32bit_prop(buffer, "cpu_core_id_base",
+ dev->node_props.cpu_core_id_base);
+ sysfs_show_32bit_prop(buffer, "simd_id_base",
+ dev->node_props.simd_id_base);
+ sysfs_show_32bit_prop(buffer, "capability",
+ dev->node_props.capability);
+ sysfs_show_32bit_prop(buffer, "max_waves_per_simd",
+ dev->node_props.max_waves_per_simd);
+ sysfs_show_32bit_prop(buffer, "lds_size_in_kb",
+ dev->node_props.lds_size_in_kb);
+ sysfs_show_32bit_prop(buffer, "gds_size_in_kb",
+ dev->node_props.gds_size_in_kb);
+ sysfs_show_32bit_prop(buffer, "wave_front_size",
+ dev->node_props.wave_front_size);
+ sysfs_show_32bit_prop(buffer, "array_count",
+ dev->node_props.array_count);
+ sysfs_show_32bit_prop(buffer, "simd_arrays_per_engine",
+ dev->node_props.simd_arrays_per_engine);
+ sysfs_show_32bit_prop(buffer, "cu_per_simd_array",
+ dev->node_props.cu_per_simd_array);
+ sysfs_show_32bit_prop(buffer, "simd_per_cu",
+ dev->node_props.simd_per_cu);
+ sysfs_show_32bit_prop(buffer, "max_slots_scratch_cu",
+ dev->node_props.max_slots_scratch_cu);
+ sysfs_show_32bit_prop(buffer, "engine_id",
+ dev->node_props.engine_id);
+ sysfs_show_32bit_prop(buffer, "vendor_id",
+ dev->node_props.vendor_id);
+ sysfs_show_32bit_prop(buffer, "device_id",
+ dev->node_props.device_id);
+ sysfs_show_32bit_prop(buffer, "location_id",
+ dev->node_props.location_id);
+ sysfs_show_32bit_prop(buffer, "max_engine_clk_fcompute",
+ kfd2kgd->get_max_engine_clock_in_mhz(
+ dev->gpu->kgd));
+ sysfs_show_64bit_prop(buffer, "local_mem_size",
+ kfd2kgd->get_vmem_size(dev->gpu->kgd));
+ ret = sysfs_show_32bit_prop(buffer, "max_engine_clk_ccompute",
+ cpufreq_quick_get_max(0)/1000);
+ }
+
+ return ret;
+}
+
+static const struct sysfs_ops node_ops = {
+ .show = node_show,
+};
+
+static struct kobj_type node_type = {
+ .sysfs_ops = &node_ops,
+};
+
+static void kfd_remove_sysfs_file(struct kobject *kobj, struct attribute *attr)
+{
+ sysfs_remove_file(kobj, attr);
+ kobject_del(kobj);
+ kobject_put(kobj);
+}
+
+static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
+{
+ struct kfd_iolink_properties *iolink;
+ struct kfd_cache_properties *cache;
+ struct kfd_mem_properties *mem;
+
+ BUG_ON(!dev);
+
+ if (dev->kobj_iolink) {
+ list_for_each_entry(iolink, &dev->io_link_props, list)
+ if (iolink->kobj) {
+ kfd_remove_sysfs_file(iolink->kobj, &iolink->attr);
+ iolink->kobj = 0;
+ }
+ kobject_del(dev->kobj_iolink);
+ kobject_put(dev->kobj_iolink);
+ dev->kobj_iolink = 0;
+ }
+
+ if (dev->kobj_cache) {
+ list_for_each_entry(cache, &dev->cache_props, list)
+ if (cache->kobj) {
+ kfd_remove_sysfs_file(cache->kobj, &cache->attr);
+ cache->kobj = 0;
+ }
+ kobject_del(dev->kobj_cache);
+ kobject_put(dev->kobj_cache);
+ dev->kobj_cache = 0;
+ }
+
+ if (dev->kobj_mem) {
+ list_for_each_entry(mem, &dev->mem_props, list)
+ if (mem->kobj) {
+ kfd_remove_sysfs_file(mem->kobj, &mem->attr);
+ mem->kobj = 0;
+ }
+ kobject_del(dev->kobj_mem);
+ kobject_put(dev->kobj_mem);
+ dev->kobj_mem = 0;
+ }
+
+ if (dev->kobj_node) {
+ sysfs_remove_file(dev->kobj_node, &dev->attr_gpuid);
+ sysfs_remove_file(dev->kobj_node, &dev->attr_name);
+ sysfs_remove_file(dev->kobj_node, &dev->attr_props);
+ kobject_del(dev->kobj_node);
+ kobject_put(dev->kobj_node);
+ dev->kobj_node = 0;
+ }
+}
+
+static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
+ uint32_t id)
+{
+ struct kfd_iolink_properties *iolink;
+ struct kfd_cache_properties *cache;
+ struct kfd_mem_properties *mem;
+ int ret;
+ uint32_t i;
+
+ BUG_ON(!dev);
+
+ /*
+ * Creating the sysfs folders
+ */
+ BUG_ON(dev->kobj_node);
+ dev->kobj_node = kfd_alloc_struct(dev->kobj_node);
+ if (!dev->kobj_node)
+ return -ENOMEM;
+
+ ret = kobject_init_and_add(dev->kobj_node, &node_type,
+ sys_props.kobj_nodes, "%d", id);
+ if (ret < 0)
+ return ret;
+
+ dev->kobj_mem = kobject_create_and_add("mem_banks", dev->kobj_node);
+ if (!dev->kobj_mem)
+ return -ENOMEM;
+
+ dev->kobj_cache = kobject_create_and_add("caches", dev->kobj_node);
+ if (!dev->kobj_cache)
+ return -ENOMEM;
+
+ dev->kobj_iolink = kobject_create_and_add("io_links", dev->kobj_node);
+ if (!dev->kobj_iolink)
+ return -ENOMEM;
+
+ /*
+ * Creating sysfs files for node properties
+ */
+ dev->attr_gpuid.name = "gpu_id";
+ dev->attr_gpuid.mode = KFD_SYSFS_FILE_MODE;
+ sysfs_attr_init(&dev->attr_gpuid);
+ dev->attr_name.name = "name";
+ dev->attr_name.mode = KFD_SYSFS_FILE_MODE;
+ sysfs_attr_init(&dev->attr_name);
+ dev->attr_props.name = "properties";
+ dev->attr_props.mode = KFD_SYSFS_FILE_MODE;
+ sysfs_attr_init(&dev->attr_props);
+ ret = sysfs_create_file(dev->kobj_node, &dev->attr_gpuid);
+ if (ret < 0)
+ return ret;
+ ret = sysfs_create_file(dev->kobj_node, &dev->attr_name);
+ if (ret < 0)
+ return ret;
+ ret = sysfs_create_file(dev->kobj_node, &dev->attr_props);
+ if (ret < 0)
+ return ret;
+
+ i = 0;
+ list_for_each_entry(mem, &dev->mem_props, list) {
+ mem->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
+ if (!mem->kobj)
+ return -ENOMEM;
+ ret = kobject_init_and_add(mem->kobj, &mem_type,
+ dev->kobj_mem, "%d", i);
+ if (ret < 0)
+ return ret;
+
+ mem->attr.name = "properties";
+ mem->attr.mode = KFD_SYSFS_FILE_MODE;
+ sysfs_attr_init(&mem->attr);
+ ret = sysfs_create_file(mem->kobj, &mem->attr);
+ if (ret < 0)
+ return ret;
+ i++;
+ }
+
+ i = 0;
+ list_for_each_entry(cache, &dev->cache_props, list) {
+ cache->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
+ if (!cache->kobj)
+ return -ENOMEM;
+ ret = kobject_init_and_add(cache->kobj, &cache_type,
+ dev->kobj_cache, "%d", i);
+ if (ret < 0)
+ return ret;
+
+ cache->attr.name = "properties";
+ cache->attr.mode = KFD_SYSFS_FILE_MODE;
+ sysfs_attr_init(&cache->attr);
+ ret = sysfs_create_file(cache->kobj, &cache->attr);
+ if (ret < 0)
+ return ret;
+ i++;
+ }
+
+ i = 0;
+ list_for_each_entry(iolink, &dev->io_link_props, list) {
+ iolink->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
+ if (!iolink->kobj)
+ return -ENOMEM;
+ ret = kobject_init_and_add(iolink->kobj, &iolink_type,
+ dev->kobj_iolink, "%d", i);
+ if (ret < 0)
+ return ret;
+
+ iolink->attr.name = "properties";
+ iolink->attr.mode = KFD_SYSFS_FILE_MODE;
+ sysfs_attr_init(&iolink->attr);
+ ret = sysfs_create_file(iolink->kobj, &iolink->attr);
+ if (ret < 0)
+ return ret;
+ i++;
+}
+
+ return 0;
+}
+
+static int kfd_build_sysfs_node_tree(void)
+{
+ struct kfd_topology_device *dev;
+ int ret;
+ uint32_t i = 0;
+
+ list_for_each_entry(dev, &topology_device_list, list) {
+ ret = kfd_build_sysfs_node_entry(dev, 0);
+ if (ret < 0)
+ return ret;
+ i++;
+ }
+
+ return 0;
+}
+
+static void kfd_remove_sysfs_node_tree(void)
+{
+ struct kfd_topology_device *dev;
+
+ list_for_each_entry(dev, &topology_device_list, list)
+ kfd_remove_sysfs_node_entry(dev);
+}
+
+static int kfd_topology_update_sysfs(void)
+{
+ int ret;
+
+ pr_info("Creating topology SYSFS entries\n");
+ if (sys_props.kobj_topology == 0) {
+ sys_props.kobj_topology = kfd_alloc_struct(sys_props.kobj_topology);
+ if (!sys_props.kobj_topology)
+ return -ENOMEM;
+
+ ret = kobject_init_and_add(sys_props.kobj_topology,
+ &sysprops_type, &kfd_device->kobj,
+ "topology");
+ if (ret < 0)
+ return ret;
+
+ sys_props.kobj_nodes = kobject_create_and_add("nodes",
+ sys_props.kobj_topology);
+ if (!sys_props.kobj_nodes)
+ return -ENOMEM;
+
+ sys_props.attr_genid.name = "generation_id";
+ sys_props.attr_genid.mode = KFD_SYSFS_FILE_MODE;
+ sysfs_attr_init(&sys_props.attr_genid);
+ ret = sysfs_create_file(sys_props.kobj_topology,
+ &sys_props.attr_genid);
+ if (ret < 0)
+ return ret;
+
+ sys_props.attr_props.name = "system_properties";
+ sys_props.attr_props.mode = KFD_SYSFS_FILE_MODE;
+ sysfs_attr_init(&sys_props.attr_props);
+ ret = sysfs_create_file(sys_props.kobj_topology,
+ &sys_props.attr_props);
+ if (ret < 0)
+ return ret;
+ }
+
+ kfd_remove_sysfs_node_tree();
+
+ return kfd_build_sysfs_node_tree();
+}
+
+static void kfd_topology_release_sysfs(void)
+{
+ kfd_remove_sysfs_node_tree();
+ if (sys_props.kobj_topology) {
+ sysfs_remove_file(sys_props.kobj_topology,
+ &sys_props.attr_genid);
+ sysfs_remove_file(sys_props.kobj_topology,
+ &sys_props.attr_props);
+ if (sys_props.kobj_nodes) {
+ kobject_del(sys_props.kobj_nodes);
+ kobject_put(sys_props.kobj_nodes);
+ sys_props.kobj_nodes = 0;
+ }
+ kobject_del(sys_props.kobj_topology);
+ kobject_put(sys_props.kobj_topology);
+ sys_props.kobj_topology = 0;
+ }
+}
+
+int kfd_topology_init(void)
+{
+ void *crat_image = 0;
+ size_t image_size = 0;
+ int ret;
+
+ /*
+ * Initialize the head for the topology device list
+ */
+ INIT_LIST_HEAD(&topology_device_list);
+ init_rwsem(&topology_lock);
+ topology_crat_parsed = 0;
+
+ memset(&sys_props, 0, sizeof(sys_props));
+
+ /*
+ * Get the CRAT image from the ACPI
+ */
+ ret = kfd_topology_get_crat_acpi(crat_image, &image_size);
+ if (ret == 0 && image_size > 0) {
+ pr_info("Found CRAT image with size=%zd\n", image_size);
+ crat_image = kmalloc(image_size, GFP_KERNEL);
+ if (!crat_image) {
+ ret = -ENOMEM;
+ pr_err("No memory for allocating CRAT image\n");
+ goto err;
+ }
+ ret = kfd_topology_get_crat_acpi(crat_image, &image_size);
+
+ if (ret == 0) {
+ down_write(&topology_lock);
+ ret = kfd_parse_crat_table(crat_image);
+ if (ret == 0)
+ ret = kfd_topology_update_sysfs();
+ up_write(&topology_lock);
+ } else {
+ pr_err("Couldn't get CRAT table size from ACPI\n");
+ }
+ kfree(crat_image);
+ } else if (ret == -ENODATA) {
+ ret = 0;
+ } else {
+ pr_err("Couldn't get CRAT table size from ACPI\n");
+ }
+
+err:
+ pr_info("Finished initializing topology ret=%d\n", ret);
+ return ret;
+}
+
+void kfd_topology_shutdown(void)
+{
+ kfd_topology_release_sysfs();
+ kfd_release_live_view();
+}
+
+static void kfd_debug_print_topology(void)
+{
+ struct kfd_topology_device *dev;
+ uint32_t i = 0;
+
+ pr_info("DEBUG PRINT OF TOPOLOGY:");
+ list_for_each_entry(dev, &topology_device_list, list) {
+ pr_info("Node: %d\n", i);
+ pr_info("\tGPU assigned: %s\n", (dev->gpu ? "yes" : "no"));
+ pr_info("\tCPU count: %d\n", dev->node_props.cpu_cores_count);
+ pr_info("\tSIMD count: %d", dev->node_props.simd_count);
+ i++;
+ }
+}
+
+static uint32_t kfd_generate_gpu_id(struct kfd_dev *gpu)
+{
+ uint32_t hashout;
+ uint32_t buf[7];
+ int i;
+
+ if (!gpu)
+ return 0;
+
+ buf[0] = gpu->pdev->devfn;
+ buf[1] = gpu->pdev->subsystem_vendor;
+ buf[2] = gpu->pdev->subsystem_device;
+ buf[3] = gpu->pdev->device;
+ buf[4] = gpu->pdev->bus->number;
+ buf[5] = (uint32_t)(kfd2kgd->get_vmem_size(gpu->kgd) & 0xffffffff);
+ buf[6] = (uint32_t)(kfd2kgd->get_vmem_size(gpu->kgd) >> 32);
+
+ for (i = 0, hashout = 0; i < 7; i++)
+ hashout ^= hash_32(buf[i], KFD_GPU_ID_HASH_WIDTH);
+
+ return hashout;
+}
+
+static struct kfd_topology_device *kfd_assign_gpu(struct kfd_dev *gpu)
+{
+ struct kfd_topology_device *dev;
+ struct kfd_topology_device *out_dev = 0;
+
+ BUG_ON(!gpu);
+
+ list_for_each_entry(dev, &topology_device_list, list)
+ if (dev->gpu == 0 && dev->node_props.simd_count > 0) {
+ dev->gpu = gpu;
+ out_dev = dev;
+ break;
+ }
+
+ return out_dev;
+}
+
+static void kfd_notify_gpu_change(uint32_t gpu_id, int arrival)
+{
+ /*
+ * TODO: Generate an event for thunk about the arrival/removal
+ * of the GPU
+ */
+}
+
+int kfd_topology_add_device(struct kfd_dev *gpu)
+{
+ uint32_t gpu_id;
+ struct kfd_topology_device *dev;
+ int res;
+
+ BUG_ON(!gpu);
+
+ gpu_id = kfd_generate_gpu_id(gpu);
+
+ pr_debug("kfd: Adding new GPU (ID: 0x%x) to topology\n", gpu_id);
+
+ down_write(&topology_lock);
+ /*
+ * Try to assign the GPU to existing topology device (generated from
+ * CRAT table
+ */
+ dev = kfd_assign_gpu(gpu);
+ if (!dev) {
+ pr_info("GPU was not found in the current topology. Extending.\n");
+ kfd_debug_print_topology();
+ dev = kfd_create_topology_device();
+ if (!dev) {
+ res = -ENOMEM;
+ goto err;
+ }
+ dev->gpu = gpu;
+
+ /*
+ * TODO: Make a call to retrieve topology information from the
+ * GPU vBIOS
+ */
+
+ /*
+ * Update the SYSFS tree, since we added another topology device
+ */
+ if (kfd_topology_update_sysfs() < 0)
+ kfd_topology_release_sysfs();
+
+ }
+
+ dev->gpu_id = gpu_id;
+ gpu->id = gpu_id;
+ dev->node_props.vendor_id = gpu->pdev->vendor;
+ dev->node_props.device_id = gpu->pdev->device;
+ dev->node_props.location_id = (gpu->pdev->bus->number << 24) +
+ (gpu->pdev->devfn & 0xffffff);
+ /*
+ * TODO: Retrieve max engine clock values from KGD
+ */
+
+ res = 0;
+
+err:
+ up_write(&topology_lock);
+
+ if (res == 0)
+ kfd_notify_gpu_change(gpu_id, 1);
+
+ return res;
+}
+
+int kfd_topology_remove_device(struct kfd_dev *gpu)
+{
+ struct kfd_topology_device *dev;
+ uint32_t gpu_id;
+ int res = -ENODEV;
+
+ BUG_ON(!gpu);
+
+ down_write(&topology_lock);
+
+ list_for_each_entry(dev, &topology_device_list, list)
+ if (dev->gpu == gpu) {
+ gpu_id = dev->gpu_id;
+ kfd_remove_sysfs_node_entry(dev);
+ kfd_release_topology_device(dev);
+ res = 0;
+ if (kfd_topology_update_sysfs() < 0)
+ kfd_topology_release_sysfs();
+ break;
+ }
+
+ up_write(&topology_lock);
+
+ if (res == 0)
+ kfd_notify_gpu_change(gpu_id, 0);
+
+ return res;
+}
+
+/*
+ * When idx is out of bounds, the function will return NULL
+ */
+struct kfd_dev *kfd_topology_enum_kfd_devices(uint8_t idx)
+{
+
+ struct kfd_topology_device *top_dev;
+ struct kfd_dev *device = NULL;
+ uint8_t device_idx = 0;
+
+ down_read(&topology_lock);
+
+ list_for_each_entry(top_dev, &topology_device_list, list) {
+ if (device_idx == idx) {
+ device = top_dev->gpu;
+ break;
+ }
+
+ device_idx++;
+ }
+
+ up_read(&topology_lock);
+
+ return device;
+
+}
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_topology.h b/drivers/gpu/drm/radeon/amdkfd/kfd_topology.h
new file mode 100644
index 0000000..989624b
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_topology.h
@@ -0,0 +1,168 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef __KFD_TOPOLOGY_H__
+#define __KFD_TOPOLOGY_H__
+
+#include <linux/types.h>
+#include <linux/list.h>
+#include "kfd_priv.h"
+
+#define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 128
+
+#define HSA_CAP_HOT_PLUGGABLE 0x00000001
+#define HSA_CAP_ATS_PRESENT 0x00000002
+#define HSA_CAP_SHARED_WITH_GRAPHICS 0x00000004
+#define HSA_CAP_QUEUE_SIZE_POW2 0x00000008
+#define HSA_CAP_QUEUE_SIZE_32BIT 0x00000010
+#define HSA_CAP_QUEUE_IDLE_EVENT 0x00000020
+#define HSA_CAP_VA_LIMIT 0x00000040
+#define HSA_CAP_WATCH_POINTS_SUPPORTED 0x00000080
+#define HSA_CAP_WATCH_POINTS_TOTALBITS_MASK 0x00000f00
+#define HSA_CAP_WATCH_POINTS_TOTALBITS_SHIFT 8
+#define HSA_CAP_RESERVED 0xfffff000
+
+struct kfd_node_properties {
+ uint32_t cpu_cores_count;
+ uint32_t simd_count;
+ uint32_t mem_banks_count;
+ uint32_t caches_count;
+ uint32_t io_links_count;
+ uint32_t cpu_core_id_base;
+ uint32_t simd_id_base;
+ uint32_t capability;
+ uint32_t max_waves_per_simd;
+ uint32_t lds_size_in_kb;
+ uint32_t gds_size_in_kb;
+ uint32_t wave_front_size;
+ uint32_t array_count;
+ uint32_t simd_arrays_per_engine;
+ uint32_t cu_per_simd_array;
+ uint32_t simd_per_cu;
+ uint32_t max_slots_scratch_cu;
+ uint32_t engine_id;
+ uint32_t vendor_id;
+ uint32_t device_id;
+ uint32_t location_id;
+ uint32_t max_engine_clk_fcompute;
+ uint32_t max_engine_clk_ccompute;
+ uint16_t marketing_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
+};
+
+#define HSA_MEM_HEAP_TYPE_SYSTEM 0
+#define HSA_MEM_HEAP_TYPE_FB_PUBLIC 1
+#define HSA_MEM_HEAP_TYPE_FB_PRIVATE 2
+#define HSA_MEM_HEAP_TYPE_GPU_GDS 3
+#define HSA_MEM_HEAP_TYPE_GPU_LDS 4
+#define HSA_MEM_HEAP_TYPE_GPU_SCRATCH 5
+
+#define HSA_MEM_FLAGS_HOT_PLUGGABLE 0x00000001
+#define HSA_MEM_FLAGS_NON_VOLATILE 0x00000002
+#define HSA_MEM_FLAGS_RESERVED 0xfffffffc
+
+struct kfd_mem_properties {
+ struct list_head list;
+ uint32_t heap_type;
+ uint64_t size_in_bytes;
+ uint32_t flags;
+ uint32_t width;
+ uint32_t mem_clk_max;
+ struct kobject *kobj;
+ struct attribute attr;
+};
+
+#define KFD_TOPOLOGY_CPU_SIBLINGS 256
+
+#define HSA_CACHE_TYPE_DATA 0x00000001
+#define HSA_CACHE_TYPE_INSTRUCTION 0x00000002
+#define HSA_CACHE_TYPE_CPU 0x00000004
+#define HSA_CACHE_TYPE_HSACU 0x00000008
+#define HSA_CACHE_TYPE_RESERVED 0xfffffff0
+
+struct kfd_cache_properties {
+ struct list_head list;
+ uint32_t processor_id_low;
+ uint32_t cache_level;
+ uint32_t cache_size;
+ uint32_t cacheline_size;
+ uint32_t cachelines_per_tag;
+ uint32_t cache_assoc;
+ uint32_t cache_latency;
+ uint32_t cache_type;
+ uint8_t sibling_map[KFD_TOPOLOGY_CPU_SIBLINGS];
+ struct kobject *kobj;
+ struct attribute attr;
+};
+
+struct kfd_iolink_properties {
+ struct list_head list;
+ uint32_t iolink_type;
+ uint32_t ver_maj;
+ uint32_t ver_min;
+ uint32_t node_from;
+ uint32_t node_to;
+ uint32_t weight;
+ uint32_t min_latency;
+ uint32_t max_latency;
+ uint32_t min_bandwidth;
+ uint32_t max_bandwidth;
+ uint32_t rec_transfer_size;
+ uint32_t flags;
+ struct kobject *kobj;
+ struct attribute attr;
+};
+
+struct kfd_topology_device {
+ struct list_head list;
+ uint32_t gpu_id;
+ struct kfd_node_properties node_props;
+ uint32_t mem_bank_count;
+ struct list_head mem_props;
+ uint32_t cache_count;
+ struct list_head cache_props;
+ uint32_t io_link_count;
+ struct list_head io_link_props;
+ struct kfd_dev *gpu;
+ struct kobject *kobj_node;
+ struct kobject *kobj_mem;
+ struct kobject *kobj_cache;
+ struct kobject *kobj_iolink;
+ struct attribute attr_gpuid;
+ struct attribute attr_name;
+ struct attribute attr_props;
+};
+
+struct kfd_system_properties {
+ uint32_t num_devices; /* Number of H-NUMA nodes */
+ uint32_t generation_count;
+ uint64_t platform_oem;
+ uint64_t platform_id;
+ uint64_t platform_rev;
+ struct kobject *kobj_topology;
+ struct kobject *kobj_nodes;
+ struct attribute attr_genid;
+ struct attribute attr_props;
+};
+
+
+
+#endif /* __KFD_TOPOLOGY_H__ */
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* Re: [PATCH v2 10/25] amdkfd: Add topology module to amdkfd
2014-07-17 13:29 ` [PATCH v2 10/25] amdkfd: Add topology module to amdkfd Oded Gabbay
@ 2014-07-20 22:37 ` Jerome Glisse
2014-07-27 11:15 ` Oded Gabbay
2014-07-27 11:26 ` Oded Gabbay
0 siblings, 2 replies; 46+ messages in thread
From: Jerome Glisse @ 2014-07-20 22:37 UTC (permalink / raw)
To: Oded Gabbay
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, dri-devel,
Evgeny Pinchuk, Alexey Skidanov, Alex Deucher, Andrew Morton,
Christian König
On Thu, Jul 17, 2014 at 04:29:17PM +0300, Oded Gabbay wrote:
> From: Evgeny Pinchuk <evgeny.pinchuk@amd.com>
>
> This patch adds the topology module to the driver. The topology is exposed to
> userspace through the sysfs.
>
> The calls to add and remove a device to/from topology are done by the radeon
> driver.
So, overall, we already said that we do not want to see the CPU architecture
re-exposed by HSA in its own format. This patch is NACKed. Only expose additional,
not-yet-existent information, and also follow the number one rule of sysfs, which
is one value -> one file.
I understand the temptation to re-expose the CPU topology in your own way to
make life simpler, but there is already an API for this, so please use what exists
today, and if there are shortcomings then I am sure they can be fixed.
See :
/sys/devices/system/cpu/cpu*
>
> Signed-off-by: Evgeny Pinchuk <evgeny.pinchuk@amd.com>
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> ---
> drivers/gpu/drm/radeon/amdkfd/Makefile | 2 +-
> drivers/gpu/drm/radeon/amdkfd/kfd_crat.h | 294 +++++++
> drivers/gpu/drm/radeon/amdkfd/kfd_device.c | 7 +
> drivers/gpu/drm/radeon/amdkfd/kfd_module.c | 7 +
> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 17 +
> drivers/gpu/drm/radeon/amdkfd/kfd_topology.c | 1207 ++++++++++++++++++++++++++
> drivers/gpu/drm/radeon/amdkfd/kfd_topology.h | 168 ++++
> 7 files changed, 1701 insertions(+), 1 deletion(-)
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_crat.h
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_topology.c
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_topology.h
>
> diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
> index 9564e75..08ecfcd 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/Makefile
> +++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
> @@ -4,6 +4,6 @@
>
> ccflags-y := -Iinclude/drm
>
> -amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o
> +amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o
>
> obj-$(CONFIG_HSA_RADEON) += amdkfd.o
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_crat.h b/drivers/gpu/drm/radeon/amdkfd/kfd_crat.h
> new file mode 100644
> index 0000000..a374fa3
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_crat.h
> @@ -0,0 +1,294 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef KFD_CRAT_H_INCLUDED
> +#define KFD_CRAT_H_INCLUDED
> +
> +#include <linux/types.h>
> +
> +#pragma pack(1)
No pragma
> +
> +/*
> + * 4CC signature values for the CRAT and CDIT ACPI tables
> + */
> +
> +#define CRAT_SIGNATURE "CRAT"
> +#define CDIT_SIGNATURE "CDIT"
> +
> +/*
> + * Component Resource Association Table (CRAT)
> + */
> +
> +#define CRAT_OEMID_LENGTH 6
> +#define CRAT_OEMTABLEID_LENGTH 8
> +#define CRAT_RESERVED_LENGTH 6
> +
> +#define CRAT_OEMID_64BIT_MASK ((1ULL << (CRAT_OEMID_LENGTH * 8)) - 1)
> +
> +struct crat_header {
> + uint32_t signature;
> + uint32_t length;
> + uint8_t revision;
> + uint8_t checksum;
> + uint8_t oem_id[CRAT_OEMID_LENGTH];
> + uint8_t oem_table_id[CRAT_OEMTABLEID_LENGTH];
> + uint32_t oem_revision;
> + uint32_t creator_id;
> + uint32_t creator_revision;
> + uint32_t total_entries;
> + uint16_t num_domains;
> + uint8_t reserved[CRAT_RESERVED_LENGTH];
> +};
> +
> +/*
> + * The header structure is immediately followed by total_entries of the
> + * data definitions
> + */
> +
> +/*
> + * The currently defined subtype entries in the CRAT
> + */
> +#define CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY 0
> +#define CRAT_SUBTYPE_MEMORY_AFFINITY 1
> +#define CRAT_SUBTYPE_CACHE_AFFINITY 2
> +#define CRAT_SUBTYPE_TLB_AFFINITY 3
> +#define CRAT_SUBTYPE_CCOMPUTE_AFFINITY 4
> +#define CRAT_SUBTYPE_IOLINK_AFFINITY 5
> +#define CRAT_SUBTYPE_MAX 6
> +
> +#define CRAT_SIBLINGMAP_SIZE 32
> +
> +/*
> + * ComputeUnit Affinity structure and definitions
> + */
> +#define CRAT_CU_FLAGS_ENABLED 0x00000001
> +#define CRAT_CU_FLAGS_HOT_PLUGGABLE 0x00000002
> +#define CRAT_CU_FLAGS_CPU_PRESENT 0x00000004
> +#define CRAT_CU_FLAGS_GPU_PRESENT 0x00000008
> +#define CRAT_CU_FLAGS_IOMMU_PRESENT 0x00000010
> +#define CRAT_CU_FLAGS_RESERVED 0xffffffe0
> +
> +#define CRAT_COMPUTEUNIT_RESERVED_LENGTH 4
> +
> +struct crat_subtype_computeunit {
> + uint8_t type;
> + uint8_t length;
> + uint16_t reserved;
> + uint32_t flags;
> + uint32_t proximity_domain;
> + uint32_t processor_id_low;
> + uint16_t num_cpu_cores;
> + uint16_t num_simd_cores;
> + uint16_t max_waves_simd;
> + uint16_t io_count;
> + uint16_t hsa_capability;
> + uint16_t lds_size_in_kb;
> + uint8_t wave_front_size;
> + uint8_t num_banks;
> + uint16_t micro_engine_id;
> + uint8_t num_arrays;
> + uint8_t num_cu_per_array;
> + uint8_t num_simd_per_cu;
> + uint8_t max_slots_scatch_cu;
> + uint8_t reserved2[CRAT_COMPUTEUNIT_RESERVED_LENGTH];
> +};
> +
> +/*
> + * HSA Memory Affinity structure and definitions
> + */
> +#define CRAT_MEM_FLAGS_ENABLED 0x00000001
> +#define CRAT_MEM_FLAGS_HOT_PLUGGABLE 0x00000002
> +#define CRAT_MEM_FLAGS_NON_VOLATILE 0x00000004
> +#define CRAT_MEM_FLAGS_RESERVED 0xfffffff8
> +
> +#define CRAT_MEMORY_RESERVED_LENGTH 8
> +
> +struct crat_subtype_memory {
> + uint8_t type;
> + uint8_t length;
> + uint16_t reserved;
> + uint32_t flags;
> + uint32_t promixity_domain;
> + uint32_t base_addr_low;
> + uint32_t base_addr_high;
> + uint32_t length_low;
> + uint32_t length_high;
> + uint32_t width;
> + uint8_t reserved2[CRAT_MEMORY_RESERVED_LENGTH];
> +};
> +
> +/*
> + * HSA Cache Affinity structure and definitions
> + */
> +#define CRAT_CACHE_FLAGS_ENABLED 0x00000001
> +#define CRAT_CACHE_FLAGS_DATA_CACHE 0x00000002
> +#define CRAT_CACHE_FLAGS_INST_CACHE 0x00000004
> +#define CRAT_CACHE_FLAGS_CPU_CACHE 0x00000008
> +#define CRAT_CACHE_FLAGS_SIMD_CACHE 0x00000010
> +#define CRAT_CACHE_FLAGS_RESERVED 0xffffffe0
> +
> +#define CRAT_CACHE_RESERVED_LENGTH 8
> +
> +struct crat_subtype_cache {
> + uint8_t type;
> + uint8_t length;
> + uint16_t reserved;
> + uint32_t flags;
> + uint32_t processor_id_low;
> + uint8_t sibling_map[CRAT_SIBLINGMAP_SIZE];
> + uint32_t cache_size;
> + uint8_t cache_level;
> + uint8_t lines_per_tag;
> + uint16_t cache_line_size;
> + uint8_t associativity;
> + uint8_t cache_properties;
> + uint16_t cache_latency;
> + uint8_t reserved2[CRAT_CACHE_RESERVED_LENGTH];
> +};
> +
> +/*
> + * HSA TLB Affinity structure and definitions
> + */
> +#define CRAT_TLB_FLAGS_ENABLED 0x00000001
> +#define CRAT_TLB_FLAGS_DATA_TLB 0x00000002
> +#define CRAT_TLB_FLAGS_INST_TLB 0x00000004
> +#define CRAT_TLB_FLAGS_CPU_TLB 0x00000008
> +#define CRAT_TLB_FLAGS_SIMD_TLB 0x00000010
> +#define CRAT_TLB_FLAGS_RESERVED 0xffffffe0
> +
> +#define CRAT_TLB_RESERVED_LENGTH 4
> +
> +struct crat_subtype_tlb {
> + uint8_t type;
> + uint8_t length;
> + uint16_t reserved;
> + uint32_t flags;
> + uint32_t processor_id_low;
> + uint8_t sibling_map[CRAT_SIBLINGMAP_SIZE];
> + uint32_t tlb_level;
> + uint8_t data_tlb_associativity_2mb;
> + uint8_t data_tlb_size_2mb;
> + uint8_t instruction_tlb_associativity_2mb;
> + uint8_t instruction_tlb_size_2mb;
> + uint8_t data_tlb_associativity_4k;
> + uint8_t data_tlb_size_4k;
> + uint8_t instruction_tlb_associativity_4k;
> + uint8_t instruction_tlb_size_4k;
> + uint8_t data_tlb_associativity_1gb;
> + uint8_t data_tlb_size_1gb;
> + uint8_t instruction_tlb_associativity_1gb;
> + uint8_t instruction_tlb_size_1gb;
> + uint8_t reserved2[CRAT_TLB_RESERVED_LENGTH];
> +};
> +
> +/*
> + * HSA CCompute/APU Affinity structure and definitions
> + */
> +#define CRAT_CCOMPUTE_FLAGS_ENABLED 0x00000001
> +#define CRAT_CCOMPUTE_FLAGS_RESERVED 0xfffffffe
> +
> +#define CRAT_CCOMPUTE_RESERVED_LENGTH 16
> +
> +struct crat_subtype_ccompute {
> + uint8_t type;
> + uint8_t length;
> + uint16_t reserved;
> + uint32_t flags;
> + uint32_t processor_id_low;
> + uint8_t sibling_map[CRAT_SIBLINGMAP_SIZE];
> + uint32_t apu_size;
> + uint8_t reserved2[CRAT_CCOMPUTE_RESERVED_LENGTH];
> +};
> +
> +/*
> + * HSA IO Link Affinity structure and definitions
> + */
> +#define CRAT_IOLINK_FLAGS_ENABLED 0x00000001
> +#define CRAT_IOLINK_FLAGS_COHERENCY 0x00000002
> +#define CRAT_IOLINK_FLAGS_RESERVED 0xfffffffc
> +
> +/*
> + * IO interface types
> + */
> +#define CRAT_IOLINK_TYPE_UNDEFINED 0
> +#define CRAT_IOLINK_TYPE_HYPERTRANSPORT 1
> +#define CRAT_IOLINK_TYPE_PCIEXPRESS 2
> +#define CRAT_IOLINK_TYPE_OTHER 3
> +#define CRAT_IOLINK_TYPE_MAX 255
> +
> +#define CRAT_IOLINK_RESERVED_LENGTH 24
> +
> +struct crat_subtype_iolink {
> + uint8_t type;
> + uint8_t length;
> + uint16_t reserved;
> + uint32_t flags;
> + uint32_t proximity_domain_from;
> + uint32_t proximity_domain_to;
> + uint8_t io_interface_type;
> + uint8_t version_major;
> + uint16_t version_minor;
> + uint32_t minimum_latency;
> + uint32_t maximum_latency;
> + uint32_t minimum_bandwidth_mbs;
> + uint32_t maximum_bandwidth_mbs;
> + uint32_t recommended_transfer_size;
> + uint8_t reserved2[CRAT_IOLINK_RESERVED_LENGTH];
> +};
> +
> +/*
> + * HSA generic sub-type header
> + */
> +
> +#define CRAT_SUBTYPE_FLAGS_ENABLED 0x00000001
> +
> +struct crat_subtype_generic {
> + uint8_t type;
> + uint8_t length;
> + uint16_t reserved;
> + uint32_t flags;
> +};
> +
> +/*
> + * Component Locality Distance Information Table (CDIT)
> + */
> +#define CDIT_OEMID_LENGTH 6
> +#define CDIT_OEMTABLEID_LENGTH 8
> +
> +struct cdit_header {
> + uint32_t signature;
> + uint32_t length;
> + uint8_t revision;
> + uint8_t checksum;
> + uint8_t oem_id[CDIT_OEMID_LENGTH];
> + uint8_t oem_table_id[CDIT_OEMTABLEID_LENGTH];
> + uint32_t oem_revision;
> + uint32_t creator_id;
> + uint32_t creator_revision;
> + uint32_t total_entries;
> + uint16_t num_domains;
> + uint8_t entry[1];
> +};
> +
> +#pragma pack()
> +
> +#endif /* KFD_CRAT_H_INCLUDED */
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
> index dd63ce09..4138694 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
> @@ -100,6 +100,9 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
> {
> kfd->shared_resources = *gpu_resources;
>
> + if (kfd_topology_add_device(kfd) != 0)
> + return false;
> +
> kfd->init_complete = true;
> dev_info(kfd_device, "added device (%x:%x)\n", kfd->pdev->vendor,
> kfd->pdev->device);
> @@ -109,6 +112,10 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>
> void kgd2kfd_device_exit(struct kfd_dev *kfd)
> {
> + int err = kfd_topology_remove_device(kfd);
> +
> + BUG_ON(err != 0);
> +
> kfree(kfd);
> }
>
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
> index c7faac6..c51f981 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
> @@ -73,16 +73,23 @@ static int __init kfd_module_init(void)
> if (err < 0)
> goto err_ioctl;
>
> + err = kfd_topology_init();
> + if (err < 0)
> + goto err_topology;
> +
> dev_info(kfd_device, "Initialized module\n");
>
> return 0;
>
> +err_topology:
> + kfd_chardev_exit();
> err_ioctl:
> return err;
> }
>
> static void __exit kfd_module_exit(void)
> {
> + kfd_topology_shutdown();
> kfd_chardev_exit();
> dev_info(kfd_device, "Removed module\n");
> }
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> index 05e892f..b391e24 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> @@ -32,6 +32,14 @@
> #include <linux/spinlock.h>
> #include "../radeon_kfd.h"
>
> +#define KFD_SYSFS_FILE_MODE 0444
> +
> +/* GPU ID hash width in bits */
> +#define KFD_GPU_ID_HASH_WIDTH 16
> +
> +/* Macro for allocating structures */
> +#define kfd_alloc_struct(ptr_to_struct) ((typeof(ptr_to_struct)) kzalloc(sizeof(*ptr_to_struct), GFP_KERNEL))
> +
> struct kfd_device_info {
> const struct kfd_scheduler_class *scheduler_class;
> unsigned int max_pasid_bits;
> @@ -71,6 +79,15 @@ struct kfd_process {
>
> extern struct device *kfd_device;
>
> +/* Topology */
> +int kfd_topology_init(void);
> +void kfd_topology_shutdown(void);
> +int kfd_topology_add_device(struct kfd_dev *gpu);
> +int kfd_topology_remove_device(struct kfd_dev *gpu);
> +struct kfd_dev *kfd_device_by_id(uint32_t gpu_id);
> +struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev);
> +struct kfd_dev *kfd_topology_enum_kfd_devices(uint8_t idx);
> +
> /* Interrupts */
> void kgd2kfd_interrupt(struct kfd_dev *dev, const void *ih_ring_entry);
>
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_topology.c b/drivers/gpu/drm/radeon/amdkfd/kfd_topology.c
> new file mode 100644
> index 0000000..30da4c3
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_topology.c
> @@ -0,0 +1,1207 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/pci.h>
> +#include <linux/errno.h>
> +#include <linux/acpi.h>
> +#include <linux/hash.h>
> +#include <linux/cpufreq.h>
> +
> +#include "kfd_priv.h"
> +#include "kfd_crat.h"
> +#include "kfd_topology.h"
> +
> +static struct list_head topology_device_list;
> +static int topology_crat_parsed;
> +static struct kfd_system_properties sys_props;
> +
> +static DECLARE_RWSEM(topology_lock);
> +
> +struct kfd_dev *kfd_device_by_id(uint32_t gpu_id)
> +{
> + struct kfd_topology_device *top_dev;
> + struct kfd_dev *device = NULL;
> +
> + down_read(&topology_lock);
> +
> + list_for_each_entry(top_dev, &topology_device_list, list)
> + if (top_dev->gpu_id == gpu_id) {
> + device = top_dev->gpu;
> + break;
> + }
> +
> + up_read(&topology_lock);
> +
> + return device;
> +}
> +
> +struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev)
> +{
> + struct kfd_topology_device *top_dev;
> + struct kfd_dev *device = NULL;
> +
> + down_read(&topology_lock);
> +
> + list_for_each_entry(top_dev, &topology_device_list, list)
> + if (top_dev->gpu->pdev == pdev) {
> + device = top_dev->gpu;
> + break;
> + }
> +
> + up_read(&topology_lock);
> +
> + return device;
> +}
> +
> +static int kfd_topology_get_crat_acpi(void *crat_image, size_t *size)
> +{
> + struct acpi_table_header *crat_table;
> + acpi_status status;
> +
> + if (!size)
> + return -EINVAL;
> +
> + /*
> + * Fetch the CRAT table from ACPI
> + */
> + status = acpi_get_table(CRAT_SIGNATURE, 0, &crat_table);
> + if (status == AE_NOT_FOUND) {
> + pr_warn("CRAT table not found\n");
> + return -ENODATA;
> + } else if (ACPI_FAILURE(status)) {
> + const char *err = acpi_format_exception(status);
> +
> + pr_err("CRAT table error: %s\n", err);
> + return -EINVAL;
> + }
> +
> + if (*size >= crat_table->length && crat_image != 0)
> + memcpy(crat_image, crat_table, crat_table->length);
> +
> + *size = crat_table->length;
> +
> + return 0;
> +}
> +
> +static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
> + struct crat_subtype_computeunit *cu)
> +{
> + BUG_ON(!dev);
> + BUG_ON(!cu);
> +
> + dev->node_props.cpu_cores_count = cu->num_cpu_cores;
> + dev->node_props.cpu_core_id_base = cu->processor_id_low;
> + if (cu->hsa_capability & CRAT_CU_FLAGS_IOMMU_PRESENT)
> + dev->node_props.capability |= HSA_CAP_ATS_PRESENT;
> +
> + pr_info("CU CPU: cores=%d id_base=%d\n", cu->num_cpu_cores,
> + cu->processor_id_low);
> +}
> +
> +static void kfd_populated_cu_info_gpu(struct kfd_topology_device *dev,
> + struct crat_subtype_computeunit *cu)
> +{
> + BUG_ON(!dev);
> + BUG_ON(!cu);
> +
> + dev->node_props.simd_id_base = cu->processor_id_low;
> + dev->node_props.simd_count = cu->num_simd_cores;
> + dev->node_props.lds_size_in_kb = cu->lds_size_in_kb;
> + dev->node_props.max_waves_per_simd = cu->max_waves_simd;
> + dev->node_props.wave_front_size = cu->wave_front_size;
> + dev->node_props.mem_banks_count = cu->num_banks;
> + dev->node_props.array_count = cu->num_arrays;
> + dev->node_props.cu_per_simd_array = cu->num_cu_per_array;
> + dev->node_props.simd_per_cu = cu->num_simd_per_cu;
> + dev->node_props.max_slots_scratch_cu = cu->max_slots_scatch_cu;
> + if (cu->hsa_capability & CRAT_CU_FLAGS_HOT_PLUGGABLE)
> + dev->node_props.capability |= HSA_CAP_HOT_PLUGGABLE;
> + pr_info("CU GPU: simds=%d id_base=%d\n", cu->num_simd_cores,
> + cu->processor_id_low);
> +}
> +
> +/* kfd_parse_subtype_cu is called when the topology mutex is already acquired */
> +static int kfd_parse_subtype_cu(struct crat_subtype_computeunit *cu)
> +{
> + struct kfd_topology_device *dev;
> + int i = 0;
> +
> + BUG_ON(!cu);
> +
> + pr_info("Found CU entry in CRAT table with proximity_domain=%d caps=%x\n",
> + cu->proximity_domain, cu->hsa_capability);
> + list_for_each_entry(dev, &topology_device_list, list) {
> + if (cu->proximity_domain == i) {
> + if (cu->flags & CRAT_CU_FLAGS_CPU_PRESENT)
> + kfd_populated_cu_info_cpu(dev, cu);
> +
> + if (cu->flags & CRAT_CU_FLAGS_GPU_PRESENT)
> + kfd_populated_cu_info_gpu(dev, cu);
> + break;
> + }
> + i++;
> + }
> +
> + return 0;
> +}
> +
> +/* kfd_parse_subtype_mem is called when the topology mutex is already acquired */
> +static int kfd_parse_subtype_mem(struct crat_subtype_memory *mem)
> +{
> + struct kfd_mem_properties *props;
> + struct kfd_topology_device *dev;
> + int i = 0;
> +
> + BUG_ON(!mem);
> +
> + pr_info("Found memory entry in CRAT table with proximity_domain=%d\n",
> + mem->promixity_domain);
> + list_for_each_entry(dev, &topology_device_list, list) {
> + if (mem->promixity_domain == i) {
> + props = kfd_alloc_struct(props);
> + if (props == 0)
> + return -ENOMEM;
> +
> + if (dev->node_props.cpu_cores_count == 0)
> + props->heap_type = HSA_MEM_HEAP_TYPE_FB_PRIVATE;
> + else
> + props->heap_type = HSA_MEM_HEAP_TYPE_SYSTEM;
> +
> + if (mem->flags & CRAT_MEM_FLAGS_HOT_PLUGGABLE)
> + props->flags |= HSA_MEM_FLAGS_HOT_PLUGGABLE;
> + if (mem->flags & CRAT_MEM_FLAGS_NON_VOLATILE)
> + props->flags |= HSA_MEM_FLAGS_NON_VOLATILE;
> +
> + props->size_in_bytes = ((uint64_t)mem->length_high << 32) +
> + mem->length_low;
> + props->width = mem->width;
> +
> + dev->mem_bank_count++;
> + list_add_tail(&props->list, &dev->mem_props);
> +
> + break;
> + }
> + i++;
> + }
> +
> + return 0;
> +}
> +
> +/* kfd_parse_subtype_cache is called when the topology mutex is already acquired */
> +static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache)
> +{
> + struct kfd_cache_properties *props;
> + struct kfd_topology_device *dev;
> + uint32_t id;
> +
> + BUG_ON(!cache);
> +
> + id = cache->processor_id_low;
> +
> + pr_info("Found cache entry in CRAT table with processor_id=%d\n", id);
> + list_for_each_entry(dev, &topology_device_list, list)
> + if (id == dev->node_props.cpu_core_id_base ||
> + id == dev->node_props.simd_id_base) {
> + props = kfd_alloc_struct(props);
> + if (props == 0)
> + return -ENOMEM;
> +
> + props->processor_id_low = id;
> + props->cache_level = cache->cache_level;
> + props->cache_size = cache->cache_size;
> + props->cacheline_size = cache->cache_line_size;
> + props->cachelines_per_tag = cache->lines_per_tag;
> + props->cache_assoc = cache->associativity;
> + props->cache_latency = cache->cache_latency;
> +
> + if (cache->flags & CRAT_CACHE_FLAGS_DATA_CACHE)
> + props->cache_type |= HSA_CACHE_TYPE_DATA;
> + if (cache->flags & CRAT_CACHE_FLAGS_INST_CACHE)
> + props->cache_type |= HSA_CACHE_TYPE_INSTRUCTION;
> + if (cache->flags & CRAT_CACHE_FLAGS_CPU_CACHE)
> + props->cache_type |= HSA_CACHE_TYPE_CPU;
> + if (cache->flags & CRAT_CACHE_FLAGS_SIMD_CACHE)
> + props->cache_type |= HSA_CACHE_TYPE_HSACU;
> +
> + dev->cache_count++;
> + dev->node_props.caches_count++;
> + list_add_tail(&props->list, &dev->cache_props);
> +
> + break;
> + }
> +
> + return 0;
> +}
> +
> +/* kfd_parse_subtype_iolink is called when the topology mutex is already acquired */
> +static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink)
> +{
> + struct kfd_iolink_properties *props;
> + struct kfd_topology_device *dev;
> + uint32_t i = 0;
> + uint32_t id_from;
> + uint32_t id_to;
> +
> + BUG_ON(!iolink);
> +
> + id_from = iolink->proximity_domain_from;
> + id_to = iolink->proximity_domain_to;
> +
> + pr_info("Found IO link entry in CRAT table with id_from=%d\n", id_from);
> + list_for_each_entry(dev, &topology_device_list, list) {
> + if (id_from == i) {
> + props = kfd_alloc_struct(props);
> + if (props == 0)
> + return -ENOMEM;
> +
> + props->node_from = id_from;
> + props->node_to = id_to;
> + props->ver_maj = iolink->version_major;
> + props->ver_min = iolink->version_minor;
> +
> + /*
> + * weight factor (derived from CDIR), currently always 1
> + */
> + props->weight = 1;
> +
> + props->min_latency = iolink->minimum_latency;
> + props->max_latency = iolink->maximum_latency;
> + props->min_bandwidth = iolink->minimum_bandwidth_mbs;
> + props->max_bandwidth = iolink->maximum_bandwidth_mbs;
> + props->rec_transfer_size =
> + iolink->recommended_transfer_size;
> +
> + dev->io_link_count++;
> + dev->node_props.io_links_count++;
> + list_add_tail(&props->list, &dev->io_link_props);
> +
> + break;
> + }
> + i++;
> + }
> +
> + return 0;
> +}
> +
> +static int kfd_parse_subtype(struct crat_subtype_generic *sub_type_hdr)
> +{
> + struct crat_subtype_computeunit *cu;
> + struct crat_subtype_memory *mem;
> + struct crat_subtype_cache *cache;
> + struct crat_subtype_iolink *iolink;
> + int ret = 0;
> +
> + BUG_ON(!sub_type_hdr);
> +
> + switch (sub_type_hdr->type) {
> + case CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY:
> + cu = (struct crat_subtype_computeunit *)sub_type_hdr;
> + ret = kfd_parse_subtype_cu(cu);
> + break;
> + case CRAT_SUBTYPE_MEMORY_AFFINITY:
> + mem = (struct crat_subtype_memory *)sub_type_hdr;
> + ret = kfd_parse_subtype_mem(mem);
> + break;
> + case CRAT_SUBTYPE_CACHE_AFFINITY:
> + cache = (struct crat_subtype_cache *)sub_type_hdr;
> + ret = kfd_parse_subtype_cache(cache);
> + break;
> + case CRAT_SUBTYPE_TLB_AFFINITY:
> + /*
> + * For now, nothing to do here
> + */
> + pr_info("Found TLB entry in CRAT table (not processing)\n");
> + break;
> + case CRAT_SUBTYPE_CCOMPUTE_AFFINITY:
> + /*
> + * For now, nothing to do here
> + */
> + pr_info("Found CCOMPUTE entry in CRAT table (not processing)\n");
> + break;
> + case CRAT_SUBTYPE_IOLINK_AFFINITY:
> + iolink = (struct crat_subtype_iolink *)sub_type_hdr;
> + ret = kfd_parse_subtype_iolink(iolink);
> + break;
> + default:
> + pr_warn("Unknown subtype (%d) in CRAT\n",
> + sub_type_hdr->type);
> + }
> +
> + return ret;
> +}
> +
> +static void kfd_release_topology_device(struct kfd_topology_device *dev)
> +{
> + struct kfd_mem_properties *mem;
> + struct kfd_cache_properties *cache;
> + struct kfd_iolink_properties *iolink;
> +
> + BUG_ON(!dev);
> +
> + list_del(&dev->list);
> +
> + while (dev->mem_props.next != &dev->mem_props) {
> + mem = container_of(dev->mem_props.next,
> + struct kfd_mem_properties, list);
> + list_del(&mem->list);
> + kfree(mem);
> + }
> +
> + while (dev->cache_props.next != &dev->cache_props) {
> + cache = container_of(dev->cache_props.next,
> + struct kfd_cache_properties, list);
> + list_del(&cache->list);
> + kfree(cache);
> + }
> +
> + while (dev->io_link_props.next != &dev->io_link_props) {
> + iolink = container_of(dev->io_link_props.next,
> + struct kfd_iolink_properties, list);
> + list_del(&iolink->list);
> + kfree(iolink);
> + }
> +
> + kfree(dev);
> +
> + sys_props.num_devices--;
> +}
> +
> +static void kfd_release_live_view(void)
> +{
> + struct kfd_topology_device *dev;
> +
> + while (topology_device_list.next != &topology_device_list) {
> + dev = container_of(topology_device_list.next,
> + struct kfd_topology_device, list);
> + kfd_release_topology_device(dev);
> +}
> +
> + memset(&sys_props, 0, sizeof(sys_props));
> +}
> +
> +static struct kfd_topology_device *kfd_create_topology_device(void)
> +{
> + struct kfd_topology_device *dev;
> +
> + dev = kfd_alloc_struct(dev);
> + if (dev == 0) {
> + pr_err("No memory to allocate a topology device");
> + return 0;
> + }
> +
> + INIT_LIST_HEAD(&dev->mem_props);
> + INIT_LIST_HEAD(&dev->cache_props);
> + INIT_LIST_HEAD(&dev->io_link_props);
> +
> + list_add_tail(&dev->list, &topology_device_list);
> + sys_props.num_devices++;
> +
> + return dev;
> + }
> +
> +static int kfd_parse_crat_table(void *crat_image)
> +{
> + struct kfd_topology_device *top_dev;
> + struct crat_subtype_generic *sub_type_hdr;
> + uint16_t node_id;
> + int ret;
> + struct crat_header *crat_table = (struct crat_header *)crat_image;
> + uint16_t num_nodes;
> + uint32_t image_len;
> +
> + if (!crat_image)
> + return -EINVAL;
> +
> + num_nodes = crat_table->num_domains;
> + image_len = crat_table->length;
> +
> + pr_info("Parsing CRAT table with %d nodes\n", num_nodes);
> +
> + for (node_id = 0; node_id < num_nodes; node_id++) {
> + top_dev = kfd_create_topology_device();
> + if (!top_dev) {
> + kfd_release_live_view();
> + return -ENOMEM;
> + }
> + }
> +
> + sys_props.platform_id = (*((uint64_t *)crat_table->oem_id)) & CRAT_OEMID_64BIT_MASK;
> + sys_props.platform_oem = *((uint64_t *)crat_table->oem_table_id);
> + sys_props.platform_rev = crat_table->revision;
> +
> + sub_type_hdr = (struct crat_subtype_generic *)(crat_table+1);
> + while ((char *)sub_type_hdr + sizeof(struct crat_subtype_generic) <
> + ((char *)crat_image) + image_len) {
> + if (sub_type_hdr->flags & CRAT_SUBTYPE_FLAGS_ENABLED) {
> + ret = kfd_parse_subtype(sub_type_hdr);
> + if (ret != 0) {
> + kfd_release_live_view();
> + return ret;
> + }
> + }
> +
> + sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
> + sub_type_hdr->length);
> + }
> +
> + sys_props.generation_count++;
> + topology_crat_parsed = 1;
> +
> + return 0;
> +}
> +
> +
> +#define sysfs_show_gen_prop(buffer, fmt, ...) \
> + snprintf(buffer, PAGE_SIZE, "%s"fmt, buffer, __VA_ARGS__)
> +#define sysfs_show_32bit_prop(buffer, name, value) \
> + sysfs_show_gen_prop(buffer, "%s %u\n", name, value)
> +#define sysfs_show_64bit_prop(buffer, name, value) \
> + sysfs_show_gen_prop(buffer, "%s %llu\n", name, value)
> +#define sysfs_show_32bit_val(buffer, value) \
> + sysfs_show_gen_prop(buffer, "%u\n", value)
> +#define sysfs_show_str_val(buffer, value) \
> + sysfs_show_gen_prop(buffer, "%s\n", value)
> +
> +static ssize_t sysprops_show(struct kobject *kobj, struct attribute *attr,
> + char *buffer)
> +{
> + ssize_t ret;
> +
> + /* Making sure that the buffer is an empty string */
> + buffer[0] = 0;
> +
> + if (attr == &sys_props.attr_genid) {
> + ret = sysfs_show_32bit_val(buffer, sys_props.generation_count);
> + } else if (attr == &sys_props.attr_props) {
> + sysfs_show_64bit_prop(buffer, "platform_oem",
> + sys_props.platform_oem);
> + sysfs_show_64bit_prop(buffer, "platform_id",
> + sys_props.platform_id);
> + ret = sysfs_show_64bit_prop(buffer, "platform_rev",
> + sys_props.platform_rev);
> + } else {
> + ret = -EINVAL;
> + }
> +
> + return ret;
> +}
> +
> +static const struct sysfs_ops sysprops_ops = {
> + .show = sysprops_show,
> +};
> +
> +static struct kobj_type sysprops_type = {
> + .sysfs_ops = &sysprops_ops,
> +};
> +
> +static ssize_t iolink_show(struct kobject *kobj, struct attribute *attr,
> + char *buffer)
> +{
> + ssize_t ret;
> + struct kfd_iolink_properties *iolink;
> +
> + /* Making sure that the buffer is an empty string */
> + buffer[0] = 0;
> +
> + iolink = container_of(attr, struct kfd_iolink_properties, attr);
> + sysfs_show_32bit_prop(buffer, "type", iolink->iolink_type);
> + sysfs_show_32bit_prop(buffer, "version_major", iolink->ver_maj);
> + sysfs_show_32bit_prop(buffer, "version_minor", iolink->ver_min);
> + sysfs_show_32bit_prop(buffer, "node_from", iolink->node_from);
> + sysfs_show_32bit_prop(buffer, "node_to", iolink->node_to);
> + sysfs_show_32bit_prop(buffer, "weight", iolink->weight);
> + sysfs_show_32bit_prop(buffer, "min_latency", iolink->min_latency);
> + sysfs_show_32bit_prop(buffer, "max_latency", iolink->max_latency);
> + sysfs_show_32bit_prop(buffer, "min_bandwidth", iolink->min_bandwidth);
> + sysfs_show_32bit_prop(buffer, "max_bandwidth", iolink->max_bandwidth);
> + sysfs_show_32bit_prop(buffer, "recommended_transfer_size",
> + iolink->rec_transfer_size);
> + ret = sysfs_show_32bit_prop(buffer, "flags", iolink->flags);
> +
> + return ret;
> +}
> +
> +static const struct sysfs_ops iolink_ops = {
> + .show = iolink_show,
> +};
> +
> +static struct kobj_type iolink_type = {
> + .sysfs_ops = &iolink_ops,
> +};
> +
> +static ssize_t mem_show(struct kobject *kobj, struct attribute *attr,
> + char *buffer)
> +{
> + ssize_t ret;
> + struct kfd_mem_properties *mem;
> +
> + /* Making sure that the buffer is an empty string */
> + buffer[0] = 0;
> +
> + mem = container_of(attr, struct kfd_mem_properties, attr);
> + sysfs_show_32bit_prop(buffer, "heap_type", mem->heap_type);
> + sysfs_show_64bit_prop(buffer, "size_in_bytes", mem->size_in_bytes);
> + sysfs_show_32bit_prop(buffer, "flags", mem->flags);
> + sysfs_show_32bit_prop(buffer, "width", mem->width);
> + ret = sysfs_show_32bit_prop(buffer, "mem_clk_max", mem->mem_clk_max);
> +
> + return ret;
> +}
> +
> +static const struct sysfs_ops mem_ops = {
> + .show = mem_show,
> +};
> +
> +static struct kobj_type mem_type = {
> + .sysfs_ops = &mem_ops,
> +};
> +
> +static ssize_t kfd_cache_show(struct kobject *kobj, struct attribute *attr,
> + char *buffer)
> +{
> + ssize_t ret;
> + uint32_t i;
> + struct kfd_cache_properties *cache;
> +
> + /* Making sure that the buffer is an empty string */
> + buffer[0] = 0;
> +
> + cache = container_of(attr, struct kfd_cache_properties, attr);
> + sysfs_show_32bit_prop(buffer, "processor_id_low",
> + cache->processor_id_low);
> + sysfs_show_32bit_prop(buffer, "level", cache->cache_level);
> + sysfs_show_32bit_prop(buffer, "size", cache->cache_size);
> + sysfs_show_32bit_prop(buffer, "cache_line_size", cache->cacheline_size);
> + sysfs_show_32bit_prop(buffer, "cache_lines_per_tag",
> + cache->cachelines_per_tag);
> + sysfs_show_32bit_prop(buffer, "association", cache->cache_assoc);
> + sysfs_show_32bit_prop(buffer, "latency", cache->cache_latency);
> + sysfs_show_32bit_prop(buffer, "type", cache->cache_type);
> + snprintf(buffer, PAGE_SIZE, "%ssibling_map ", buffer);
> + for (i = 0; i < KFD_TOPOLOGY_CPU_SIBLINGS; i++)
> + ret = snprintf(buffer, PAGE_SIZE, "%s%d%s",
> + buffer, cache->sibling_map[i],
> + (i == KFD_TOPOLOGY_CPU_SIBLINGS-1) ?
> + "\n" : ",");
> +
> + return ret;
> +}
> +
> +static const struct sysfs_ops cache_ops = {
> + .show = kfd_cache_show,
> +};
> +
> +static struct kobj_type cache_type = {
> + .sysfs_ops = &cache_ops,
> +};
> +
> +static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
> + char *buffer)
> +{
> + ssize_t ret;
> + struct kfd_topology_device *dev;
> + char public_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
> + uint32_t i;
> +
> + /* Making sure that the buffer is an empty string */
> + buffer[0] = 0;
> +
> + if (strcmp(attr->name, "gpu_id") == 0) {
> + dev = container_of(attr, struct kfd_topology_device,
> + attr_gpuid);
> + ret = sysfs_show_32bit_val(buffer, dev->gpu_id);
> + } else if (strcmp(attr->name, "name") == 0) {
> + dev = container_of(attr, struct kfd_topology_device,
> + attr_name);
> + for (i = 0; i < KFD_TOPOLOGY_PUBLIC_NAME_SIZE; i++) {
> + public_name[i] =
> + (char)dev->node_props.marketing_name[i];
> + if (dev->node_props.marketing_name[i] == 0)
> + break;
> + }
> + public_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE-1] = 0x0;
> + ret = sysfs_show_str_val(buffer, public_name);
> + } else {
> + dev = container_of(attr, struct kfd_topology_device,
> + attr_props);
> + sysfs_show_32bit_prop(buffer, "cpu_cores_count",
> + dev->node_props.cpu_cores_count);
> + sysfs_show_32bit_prop(buffer, "simd_count",
> + dev->node_props.simd_count);
> + sysfs_show_32bit_prop(buffer, "mem_banks_count",
> + dev->node_props.mem_banks_count);
> + sysfs_show_32bit_prop(buffer, "caches_count",
> + dev->node_props.caches_count);
> + sysfs_show_32bit_prop(buffer, "io_links_count",
> + dev->node_props.io_links_count);
> + sysfs_show_32bit_prop(buffer, "cpu_core_id_base",
> + dev->node_props.cpu_core_id_base);
> + sysfs_show_32bit_prop(buffer, "simd_id_base",
> + dev->node_props.simd_id_base);
> + sysfs_show_32bit_prop(buffer, "capability",
> + dev->node_props.capability);
> + sysfs_show_32bit_prop(buffer, "max_waves_per_simd",
> + dev->node_props.max_waves_per_simd);
> + sysfs_show_32bit_prop(buffer, "lds_size_in_kb",
> + dev->node_props.lds_size_in_kb);
> + sysfs_show_32bit_prop(buffer, "gds_size_in_kb",
> + dev->node_props.gds_size_in_kb);
> + sysfs_show_32bit_prop(buffer, "wave_front_size",
> + dev->node_props.wave_front_size);
> + sysfs_show_32bit_prop(buffer, "array_count",
> + dev->node_props.array_count);
> + sysfs_show_32bit_prop(buffer, "simd_arrays_per_engine",
> + dev->node_props.simd_arrays_per_engine);
> + sysfs_show_32bit_prop(buffer, "cu_per_simd_array",
> + dev->node_props.cu_per_simd_array);
> + sysfs_show_32bit_prop(buffer, "simd_per_cu",
> + dev->node_props.simd_per_cu);
> + sysfs_show_32bit_prop(buffer, "max_slots_scratch_cu",
> + dev->node_props.max_slots_scratch_cu);
> + sysfs_show_32bit_prop(buffer, "engine_id",
> + dev->node_props.engine_id);
> + sysfs_show_32bit_prop(buffer, "vendor_id",
> + dev->node_props.vendor_id);
> + sysfs_show_32bit_prop(buffer, "device_id",
> + dev->node_props.device_id);
> + sysfs_show_32bit_prop(buffer, "location_id",
> + dev->node_props.location_id);
> + sysfs_show_32bit_prop(buffer, "max_engine_clk_fcompute",
> + kfd2kgd->get_max_engine_clock_in_mhz(
> + dev->gpu->kgd));
> + sysfs_show_64bit_prop(buffer, "local_mem_size",
> + kfd2kgd->get_vmem_size(dev->gpu->kgd));
> + ret = sysfs_show_32bit_prop(buffer, "max_engine_clk_ccompute",
> + cpufreq_quick_get_max(0)/1000);
> + }
> +
> + return ret;
> +}
> +
> +static const struct sysfs_ops node_ops = {
> + .show = node_show,
> +};
> +
> +static struct kobj_type node_type = {
> + .sysfs_ops = &node_ops,
> +};
> +
> +static void kfd_remove_sysfs_file(struct kobject *kobj, struct attribute *attr)
> +{
> + sysfs_remove_file(kobj, attr);
> + kobject_del(kobj);
> + kobject_put(kobj);
> +}
> +
> +static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
> +{
> + struct kfd_iolink_properties *iolink;
> + struct kfd_cache_properties *cache;
> + struct kfd_mem_properties *mem;
> +
> + BUG_ON(!dev);
> +
> + if (dev->kobj_iolink) {
> + list_for_each_entry(iolink, &dev->io_link_props, list)
> + if (iolink->kobj) {
> + kfd_remove_sysfs_file(iolink->kobj, &iolink->attr);
> + iolink->kobj = 0;
> + }
> + kobject_del(dev->kobj_iolink);
> + kobject_put(dev->kobj_iolink);
> + dev->kobj_iolink = 0;
> + }
> +
> + if (dev->kobj_cache) {
> + list_for_each_entry(cache, &dev->cache_props, list)
> + if (cache->kobj) {
> + kfd_remove_sysfs_file(cache->kobj, &cache->attr);
> + cache->kobj = 0;
> + }
> + kobject_del(dev->kobj_cache);
> + kobject_put(dev->kobj_cache);
> + dev->kobj_cache = 0;
> + }
> +
> + if (dev->kobj_mem) {
> + list_for_each_entry(mem, &dev->mem_props, list)
> + if (mem->kobj) {
> + kfd_remove_sysfs_file(mem->kobj, &mem->attr);
> + mem->kobj = 0;
> + }
> + kobject_del(dev->kobj_mem);
> + kobject_put(dev->kobj_mem);
> + dev->kobj_mem = 0;
> + }
> +
> + if (dev->kobj_node) {
> + sysfs_remove_file(dev->kobj_node, &dev->attr_gpuid);
> + sysfs_remove_file(dev->kobj_node, &dev->attr_name);
> + sysfs_remove_file(dev->kobj_node, &dev->attr_props);
> + kobject_del(dev->kobj_node);
> + kobject_put(dev->kobj_node);
> + dev->kobj_node = 0;
> + }
> +}
> +
> +static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
> + uint32_t id)
> +{
> + struct kfd_iolink_properties *iolink;
> + struct kfd_cache_properties *cache;
> + struct kfd_mem_properties *mem;
> + int ret;
> + uint32_t i;
> +
> + BUG_ON(!dev);
> +
> + /*
> + * Creating the sysfs folders
> + */
> + BUG_ON(dev->kobj_node);
> + dev->kobj_node = kfd_alloc_struct(dev->kobj_node);
> + if (!dev->kobj_node)
> + return -ENOMEM;
> +
> + ret = kobject_init_and_add(dev->kobj_node, &node_type,
> + sys_props.kobj_nodes, "%d", id);
> + if (ret < 0)
> + return ret;
> +
> + dev->kobj_mem = kobject_create_and_add("mem_banks", dev->kobj_node);
> + if (!dev->kobj_mem)
> + return -ENOMEM;
> +
> + dev->kobj_cache = kobject_create_and_add("caches", dev->kobj_node);
> + if (!dev->kobj_cache)
> + return -ENOMEM;
> +
> + dev->kobj_iolink = kobject_create_and_add("io_links", dev->kobj_node);
> + if (!dev->kobj_iolink)
> + return -ENOMEM;
> +
> + /*
> + * Creating sysfs files for node properties
> + */
> + dev->attr_gpuid.name = "gpu_id";
> + dev->attr_gpuid.mode = KFD_SYSFS_FILE_MODE;
> + sysfs_attr_init(&dev->attr_gpuid);
> + dev->attr_name.name = "name";
> + dev->attr_name.mode = KFD_SYSFS_FILE_MODE;
> + sysfs_attr_init(&dev->attr_name);
> + dev->attr_props.name = "properties";
> + dev->attr_props.mode = KFD_SYSFS_FILE_MODE;
> + sysfs_attr_init(&dev->attr_props);
> + ret = sysfs_create_file(dev->kobj_node, &dev->attr_gpuid);
> + if (ret < 0)
> + return ret;
> + ret = sysfs_create_file(dev->kobj_node, &dev->attr_name);
> + if (ret < 0)
> + return ret;
> + ret = sysfs_create_file(dev->kobj_node, &dev->attr_props);
> + if (ret < 0)
> + return ret;
> +
> + i = 0;
> + list_for_each_entry(mem, &dev->mem_props, list) {
> + mem->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> + if (!mem->kobj)
> + return -ENOMEM;
> + ret = kobject_init_and_add(mem->kobj, &mem_type,
> + dev->kobj_mem, "%d", i);
> + if (ret < 0)
> + return ret;
> +
> + mem->attr.name = "properties";
> + mem->attr.mode = KFD_SYSFS_FILE_MODE;
> + sysfs_attr_init(&mem->attr);
> + ret = sysfs_create_file(mem->kobj, &mem->attr);
> + if (ret < 0)
> + return ret;
> + i++;
> + }
> +
> + i = 0;
> + list_for_each_entry(cache, &dev->cache_props, list) {
> + cache->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> + if (!cache->kobj)
> + return -ENOMEM;
> + ret = kobject_init_and_add(cache->kobj, &cache_type,
> + dev->kobj_cache, "%d", i);
> + if (ret < 0)
> + return ret;
> +
> + cache->attr.name = "properties";
> + cache->attr.mode = KFD_SYSFS_FILE_MODE;
> + sysfs_attr_init(&cache->attr);
> + ret = sysfs_create_file(cache->kobj, &cache->attr);
> + if (ret < 0)
> + return ret;
> + i++;
> + }
> +
> + i = 0;
> + list_for_each_entry(iolink, &dev->io_link_props, list) {
> + iolink->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> + if (!iolink->kobj)
> + return -ENOMEM;
> + ret = kobject_init_and_add(iolink->kobj, &iolink_type,
> + dev->kobj_iolink, "%d", i);
> + if (ret < 0)
> + return ret;
> +
> + iolink->attr.name = "properties";
> + iolink->attr.mode = KFD_SYSFS_FILE_MODE;
> + sysfs_attr_init(&iolink->attr);
> + ret = sysfs_create_file(iolink->kobj, &iolink->attr);
> + if (ret < 0)
> + return ret;
> + i++;
> +}
> +
> + return 0;
> +}
> +
> +static int kfd_build_sysfs_node_tree(void)
> +{
> + struct kfd_topology_device *dev;
> + int ret;
> + uint32_t i = 0;
> +
> + list_for_each_entry(dev, &topology_device_list, list) {
> + ret = kfd_build_sysfs_node_entry(dev, i);
> + if (ret < 0)
> + return ret;
> + i++;
> + }
> +
> + return 0;
> +}
> +
> +static void kfd_remove_sysfs_node_tree(void)
> +{
> + struct kfd_topology_device *dev;
> +
> + list_for_each_entry(dev, &topology_device_list, list)
> + kfd_remove_sysfs_node_entry(dev);
> +}
> +
> +static int kfd_topology_update_sysfs(void)
> +{
> + int ret;
> +
> + pr_info("Creating topology SYSFS entries\n");
> + if (sys_props.kobj_topology == 0) {
> + sys_props.kobj_topology = kfd_alloc_struct(sys_props.kobj_topology);
> + if (!sys_props.kobj_topology)
> + return -ENOMEM;
> +
> + ret = kobject_init_and_add(sys_props.kobj_topology,
> + &sysprops_type, &kfd_device->kobj,
> + "topology");
> + if (ret < 0)
> + return ret;
> +
> + sys_props.kobj_nodes = kobject_create_and_add("nodes",
> + sys_props.kobj_topology);
> + if (!sys_props.kobj_nodes)
> + return -ENOMEM;
> +
> + sys_props.attr_genid.name = "generation_id";
> + sys_props.attr_genid.mode = KFD_SYSFS_FILE_MODE;
> + sysfs_attr_init(&sys_props.attr_genid);
> + ret = sysfs_create_file(sys_props.kobj_topology,
> + &sys_props.attr_genid);
> + if (ret < 0)
> + return ret;
> +
> + sys_props.attr_props.name = "system_properties";
> + sys_props.attr_props.mode = KFD_SYSFS_FILE_MODE;
> + sysfs_attr_init(&sys_props.attr_props);
> + ret = sysfs_create_file(sys_props.kobj_topology,
> + &sys_props.attr_props);
> + if (ret < 0)
> + return ret;
> + }
> +
> + kfd_remove_sysfs_node_tree();
> +
> + return kfd_build_sysfs_node_tree();
> +}
> +
> +static void kfd_topology_release_sysfs(void)
> +{
> + kfd_remove_sysfs_node_tree();
> + if (sys_props.kobj_topology) {
> + sysfs_remove_file(sys_props.kobj_topology,
> + &sys_props.attr_genid);
> + sysfs_remove_file(sys_props.kobj_topology,
> + &sys_props.attr_props);
> + if (sys_props.kobj_nodes) {
> + kobject_del(sys_props.kobj_nodes);
> + kobject_put(sys_props.kobj_nodes);
> + sys_props.kobj_nodes = 0;
> + }
> + kobject_del(sys_props.kobj_topology);
> + kobject_put(sys_props.kobj_topology);
> + sys_props.kobj_topology = 0;
> + }
> +}
> +
> +int kfd_topology_init(void)
> +{
> + void *crat_image = 0;
> + size_t image_size = 0;
> + int ret;
> +
> + /*
> + * Initialize the head for the topology device list
> + */
> + INIT_LIST_HEAD(&topology_device_list);
> + init_rwsem(&topology_lock);
> + topology_crat_parsed = 0;
> +
> + memset(&sys_props, 0, sizeof(sys_props));
> +
> + /*
> + * Get the CRAT image from the ACPI
> + */
> + ret = kfd_topology_get_crat_acpi(crat_image, &image_size);
> + if (ret == 0 && image_size > 0) {
> + pr_info("Found CRAT image with size=%zd\n", image_size);
> + crat_image = kmalloc(image_size, GFP_KERNEL);
> + if (!crat_image) {
> + ret = -ENOMEM;
> + pr_err("No memory for allocating CRAT image\n");
> + goto err;
> + }
> + ret = kfd_topology_get_crat_acpi(crat_image, &image_size);
> +
> + if (ret == 0) {
> + down_write(&topology_lock);
> + ret = kfd_parse_crat_table(crat_image);
> + if (ret == 0)
> + ret = kfd_topology_update_sysfs();
> + up_write(&topology_lock);
> + } else {
> + pr_err("Couldn't get CRAT table size from ACPI\n");
> + }
> + kfree(crat_image);
> + } else if (ret == -ENODATA) {
> + ret = 0;
> + } else {
> + pr_err("Couldn't get CRAT table size from ACPI\n");
> + }
> +
> +err:
> + pr_info("Finished initializing topology ret=%d\n", ret);
> + return ret;
> +}
> +
> +void kfd_topology_shutdown(void)
> +{
> + kfd_topology_release_sysfs();
> + kfd_release_live_view();
> +}
> +
> +static void kfd_debug_print_topology(void)
> +{
> + struct kfd_topology_device *dev;
> + uint32_t i = 0;
> +
> + pr_info("DEBUG PRINT OF TOPOLOGY:");
> + list_for_each_entry(dev, &topology_device_list, list) {
> + pr_info("Node: %d\n", i);
> + pr_info("\tGPU assigned: %s\n", (dev->gpu ? "yes" : "no"));
> + pr_info("\tCPU count: %d\n", dev->node_props.cpu_cores_count);
> + pr_info("\tSIMD count: %d", dev->node_props.simd_count);
> + i++;
> + }
> +}
> +
> +static uint32_t kfd_generate_gpu_id(struct kfd_dev *gpu)
> +{
> + uint32_t hashout;
> + uint32_t buf[7];
> + int i;
> +
> + if (!gpu)
> + return 0;
> +
> + buf[0] = gpu->pdev->devfn;
> + buf[1] = gpu->pdev->subsystem_vendor;
> + buf[2] = gpu->pdev->subsystem_device;
> + buf[3] = gpu->pdev->device;
> + buf[4] = gpu->pdev->bus->number;
> + buf[5] = (uint32_t)(kfd2kgd->get_vmem_size(gpu->kgd) & 0xffffffff);
> + buf[6] = (uint32_t)(kfd2kgd->get_vmem_size(gpu->kgd) >> 32);
> +
> + for (i = 0, hashout = 0; i < 7; i++)
> + hashout ^= hash_32(buf[i], KFD_GPU_ID_HASH_WIDTH);
> +
> + return hashout;
> +}
> +
> +static struct kfd_topology_device *kfd_assign_gpu(struct kfd_dev *gpu)
> +{
> + struct kfd_topology_device *dev;
> + struct kfd_topology_device *out_dev = 0;
> +
> + BUG_ON(!gpu);
> +
> + list_for_each_entry(dev, &topology_device_list, list)
> + if (dev->gpu == 0 && dev->node_props.simd_count > 0) {
> + dev->gpu = gpu;
> + out_dev = dev;
> + break;
> + }
> +
> + return out_dev;
> +}
> +
> +static void kfd_notify_gpu_change(uint32_t gpu_id, int arrival)
> +{
> + /*
> + * TODO: Generate an event for thunk about the arrival/removal
> + * of the GPU
> + */
> +}
> +
> +int kfd_topology_add_device(struct kfd_dev *gpu)
> +{
> + uint32_t gpu_id;
> + struct kfd_topology_device *dev;
> + int res;
> +
> + BUG_ON(!gpu);
> +
> + gpu_id = kfd_generate_gpu_id(gpu);
> +
> + pr_debug("kfd: Adding new GPU (ID: 0x%x) to topology\n", gpu_id);
> +
> + down_write(&topology_lock);
> + /*
> + * Try to assign the GPU to existing topology device (generated from
> + * CRAT table
> + */
> + dev = kfd_assign_gpu(gpu);
> + if (!dev) {
> + pr_info("GPU was not found in the current topology. Extending.\n");
> + kfd_debug_print_topology();
> + dev = kfd_create_topology_device();
> + if (!dev) {
> + res = -ENOMEM;
> + goto err;
> + }
> + dev->gpu = gpu;
> +
> + /*
> + * TODO: Make a call to retrieve topology information from the
> + * GPU vBIOS
> + */
> +
> + /*
> + * Update the SYSFS tree, since we added another topology device
> + */
> + if (kfd_topology_update_sysfs() < 0)
> + kfd_topology_release_sysfs();
> +
> + }
> +
> + dev->gpu_id = gpu_id;
> + gpu->id = gpu_id;
> + dev->node_props.vendor_id = gpu->pdev->vendor;
> + dev->node_props.device_id = gpu->pdev->device;
> + dev->node_props.location_id = (gpu->pdev->bus->number << 24) +
> + (gpu->pdev->devfn & 0xffffff);
> + /*
> + * TODO: Retrieve max engine clock values from KGD
> + */
> +
> + res = 0;
> +
> +err:
> + up_write(&topology_lock);
> +
> + if (res == 0)
> + kfd_notify_gpu_change(gpu_id, 1);
> +
> + return res;
> +}
> +
> +int kfd_topology_remove_device(struct kfd_dev *gpu)
> +{
> + struct kfd_topology_device *dev;
> + uint32_t gpu_id;
> + int res = -ENODEV;
> +
> + BUG_ON(!gpu);
> +
> + down_write(&topology_lock);
> +
> + list_for_each_entry(dev, &topology_device_list, list)
> + if (dev->gpu == gpu) {
> + gpu_id = dev->gpu_id;
> + kfd_remove_sysfs_node_entry(dev);
> + kfd_release_topology_device(dev);
> + res = 0;
> + if (kfd_topology_update_sysfs() < 0)
> + kfd_topology_release_sysfs();
> + break;
> + }
> +
> + up_write(&topology_lock);
> +
> + if (res == 0)
> + kfd_notify_gpu_change(gpu_id, 0);
> +
> + return res;
> +}
> +
> +/*
> + * When idx is out of bounds, the function will return NULL
> + */
> +struct kfd_dev *kfd_topology_enum_kfd_devices(uint8_t idx)
> +{
> +
> + struct kfd_topology_device *top_dev;
> + struct kfd_dev *device = NULL;
> + uint8_t device_idx = 0;
> +
> + down_read(&topology_lock);
> +
> + list_for_each_entry(top_dev, &topology_device_list, list) {
> + if (device_idx == idx) {
> + device = top_dev->gpu;
> + break;
> + }
> +
> + device_idx++;
> + }
> +
> + up_read(&topology_lock);
> +
> + return device;
> +
> +}
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_topology.h b/drivers/gpu/drm/radeon/amdkfd/kfd_topology.h
> new file mode 100644
> index 0000000..989624b
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_topology.h
> @@ -0,0 +1,168 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef __KFD_TOPOLOGY_H__
> +#define __KFD_TOPOLOGY_H__
> +
> +#include <linux/types.h>
> +#include <linux/list.h>
> +#include "kfd_priv.h"
> +
> +#define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 128
> +
> +#define HSA_CAP_HOT_PLUGGABLE 0x00000001
> +#define HSA_CAP_ATS_PRESENT 0x00000002
> +#define HSA_CAP_SHARED_WITH_GRAPHICS 0x00000004
> +#define HSA_CAP_QUEUE_SIZE_POW2 0x00000008
> +#define HSA_CAP_QUEUE_SIZE_32BIT 0x00000010
> +#define HSA_CAP_QUEUE_IDLE_EVENT 0x00000020
> +#define HSA_CAP_VA_LIMIT 0x00000040
> +#define HSA_CAP_WATCH_POINTS_SUPPORTED 0x00000080
> +#define HSA_CAP_WATCH_POINTS_TOTALBITS_MASK 0x00000f00
> +#define HSA_CAP_WATCH_POINTS_TOTALBITS_SHIFT 8
> +#define HSA_CAP_RESERVED 0xfffff000
> +
> +struct kfd_node_properties {
> + uint32_t cpu_cores_count;
> + uint32_t simd_count;
> + uint32_t mem_banks_count;
> + uint32_t caches_count;
> + uint32_t io_links_count;
> + uint32_t cpu_core_id_base;
> + uint32_t simd_id_base;
> + uint32_t capability;
> + uint32_t max_waves_per_simd;
> + uint32_t lds_size_in_kb;
> + uint32_t gds_size_in_kb;
> + uint32_t wave_front_size;
> + uint32_t array_count;
> + uint32_t simd_arrays_per_engine;
> + uint32_t cu_per_simd_array;
> + uint32_t simd_per_cu;
> + uint32_t max_slots_scratch_cu;
> + uint32_t engine_id;
> + uint32_t vendor_id;
> + uint32_t device_id;
> + uint32_t location_id;
> + uint32_t max_engine_clk_fcompute;
> + uint32_t max_engine_clk_ccompute;
> + uint16_t marketing_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
> +};
> +
> +#define HSA_MEM_HEAP_TYPE_SYSTEM 0
> +#define HSA_MEM_HEAP_TYPE_FB_PUBLIC 1
> +#define HSA_MEM_HEAP_TYPE_FB_PRIVATE 2
> +#define HSA_MEM_HEAP_TYPE_GPU_GDS 3
> +#define HSA_MEM_HEAP_TYPE_GPU_LDS 4
> +#define HSA_MEM_HEAP_TYPE_GPU_SCRATCH 5
> +
> +#define HSA_MEM_FLAGS_HOT_PLUGGABLE 0x00000001
> +#define HSA_MEM_FLAGS_NON_VOLATILE 0x00000002
> +#define HSA_MEM_FLAGS_RESERVED 0xfffffffc
> +
> +struct kfd_mem_properties {
> + struct list_head list;
> + uint32_t heap_type;
> + uint64_t size_in_bytes;
> + uint32_t flags;
> + uint32_t width;
> + uint32_t mem_clk_max;
> + struct kobject *kobj;
> + struct attribute attr;
> +};
> +
> +#define KFD_TOPOLOGY_CPU_SIBLINGS 256
> +
> +#define HSA_CACHE_TYPE_DATA 0x00000001
> +#define HSA_CACHE_TYPE_INSTRUCTION 0x00000002
> +#define HSA_CACHE_TYPE_CPU 0x00000004
> +#define HSA_CACHE_TYPE_HSACU 0x00000008
> +#define HSA_CACHE_TYPE_RESERVED 0xfffffff0
> +
> +struct kfd_cache_properties {
> + struct list_head list;
> + uint32_t processor_id_low;
> + uint32_t cache_level;
> + uint32_t cache_size;
> + uint32_t cacheline_size;
> + uint32_t cachelines_per_tag;
> + uint32_t cache_assoc;
> + uint32_t cache_latency;
> + uint32_t cache_type;
> + uint8_t sibling_map[KFD_TOPOLOGY_CPU_SIBLINGS];
> + struct kobject *kobj;
> + struct attribute attr;
> +};
> +
> +struct kfd_iolink_properties {
> + struct list_head list;
> + uint32_t iolink_type;
> + uint32_t ver_maj;
> + uint32_t ver_min;
> + uint32_t node_from;
> + uint32_t node_to;
> + uint32_t weight;
> + uint32_t min_latency;
> + uint32_t max_latency;
> + uint32_t min_bandwidth;
> + uint32_t max_bandwidth;
> + uint32_t rec_transfer_size;
> + uint32_t flags;
> + struct kobject *kobj;
> + struct attribute attr;
> +};
> +
> +struct kfd_topology_device {
> + struct list_head list;
> + uint32_t gpu_id;
> + struct kfd_node_properties node_props;
> + uint32_t mem_bank_count;
> + struct list_head mem_props;
> + uint32_t cache_count;
> + struct list_head cache_props;
> + uint32_t io_link_count;
> + struct list_head io_link_props;
> + struct kfd_dev *gpu;
> + struct kobject *kobj_node;
> + struct kobject *kobj_mem;
> + struct kobject *kobj_cache;
> + struct kobject *kobj_iolink;
> + struct attribute attr_gpuid;
> + struct attribute attr_name;
> + struct attribute attr_props;
> +};
> +
> +struct kfd_system_properties {
> + uint32_t num_devices; /* Number of H-NUMA nodes */
> + uint32_t generation_count;
> + uint64_t platform_oem;
> + uint64_t platform_id;
> + uint64_t platform_rev;
> + struct kobject *kobj_topology;
> + struct kobject *kobj_nodes;
> + struct attribute attr_genid;
> + struct attribute attr_props;
> +};
> +
> +
> +
> +#endif /* __KFD_TOPOLOGY_H__ */
> --
> 1.9.1
>
^ permalink raw reply [flat|nested] 46+ messages in thread* Re: [PATCH v2 10/25] amdkfd: Add topology module to amdkfd
2014-07-20 22:37 ` Jerome Glisse
@ 2014-07-27 11:15 ` Oded Gabbay
2014-07-30 12:10 ` Oded Gabbay
2014-07-27 11:26 ` Oded Gabbay
1 sibling, 1 reply; 46+ messages in thread
From: Oded Gabbay @ 2014-07-27 11:15 UTC (permalink / raw)
To: Jerome Glisse
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, dri-devel,
Alexey Skidanov, Andrew Morton
On 21/07/14 01:37, Jerome Glisse wrote:
> On Thu, Jul 17, 2014 at 04:29:17PM +0300, Oded Gabbay wrote:
>> From: Evgeny Pinchuk <evgeny.pinchuk@amd.com>
>>
>> This patch adds the topology module to the driver. The topology is exposed to
>> userspace through the sysfs.
>>
>> The calls to add and remove a device to/from topology are done by the radeon
>> driver.
>
> So overall we already said that we do not want to see the cpu architecture
> re-exposed by hsa in its own format. This patch is NACKed. Only expose additional
> non-existent information and also follow the number one rule of sysfs, which
> is one value -> one file.
>
> I understand the temptation to re-expose the cpu topology in your own way to
> make life simpler, but there is already an api for this, so please use what exists
> today, and if there are shortcomings then I am sure they can be fixed.
>
> See :
>
> /sys/devices/system/cpu/cpu*
>
Agreed and we will change the code. Hopefully it will be in v3, although we may
release v3 early this week and release v4 with this change next week.
Oded
>>
>> Signed-off-by: Evgeny Pinchuk <evgeny.pinchuk@amd.com>
>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>> ---
>> drivers/gpu/drm/radeon/amdkfd/Makefile | 2 +-
>> drivers/gpu/drm/radeon/amdkfd/kfd_crat.h | 294 +++++++
>> drivers/gpu/drm/radeon/amdkfd/kfd_device.c | 7 +
>> drivers/gpu/drm/radeon/amdkfd/kfd_module.c | 7 +
>> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 17 +
>> drivers/gpu/drm/radeon/amdkfd/kfd_topology.c | 1207 ++++++++++++++++++++++++++
>> drivers/gpu/drm/radeon/amdkfd/kfd_topology.h | 168 ++++
>> 7 files changed, 1701 insertions(+), 1 deletion(-)
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_crat.h
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_topology.c
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_topology.h
>>
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
>> index 9564e75..08ecfcd 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/Makefile
>> +++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
>> @@ -4,6 +4,6 @@
>>
>> ccflags-y := -Iinclude/drm
>>
>> -amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o
>> +amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o
>>
>> obj-$(CONFIG_HSA_RADEON) += amdkfd.o
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_crat.h b/drivers/gpu/drm/radeon/amdkfd/kfd_crat.h
>> new file mode 100644
>> index 0000000..a374fa3
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_crat.h
>> @@ -0,0 +1,294 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#ifndef KFD_CRAT_H_INCLUDED
>> +#define KFD_CRAT_H_INCLUDED
>> +
>> +#include <linux/types.h>
>> +
>> +#pragma pack(1)
>
> No pragma
>
>> +
>> +/*
>> + * 4CC signature values for the CRAT and CDIT ACPI tables
>> + */
>> +
>> +#define CRAT_SIGNATURE "CRAT"
>> +#define CDIT_SIGNATURE "CDIT"
>> +
>> +/*
>> + * Component Resource Association Table (CRAT)
>> + */
>> +
>> +#define CRAT_OEMID_LENGTH 6
>> +#define CRAT_OEMTABLEID_LENGTH 8
>> +#define CRAT_RESERVED_LENGTH 6
>> +
>> +#define CRAT_OEMID_64BIT_MASK ((1ULL << (CRAT_OEMID_LENGTH * 8)) - 1)
>> +
>> +struct crat_header {
>> + uint32_t signature;
>> + uint32_t length;
>> + uint8_t revision;
>> + uint8_t checksum;
>> + uint8_t oem_id[CRAT_OEMID_LENGTH];
>> + uint8_t oem_table_id[CRAT_OEMTABLEID_LENGTH];
>> + uint32_t oem_revision;
>> + uint32_t creator_id;
>> + uint32_t creator_revision;
>> + uint32_t total_entries;
>> + uint16_t num_domains;
>> + uint8_t reserved[CRAT_RESERVED_LENGTH];
>> +};
>> +
>> +/*
>> + * The header structure is immediately followed by total_entries of the
>> + * data definitions
>> + */
>> +
>> +/*
>> + * The currently defined subtype entries in the CRAT
>> + */
>> +#define CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY 0
>> +#define CRAT_SUBTYPE_MEMORY_AFFINITY 1
>> +#define CRAT_SUBTYPE_CACHE_AFFINITY 2
>> +#define CRAT_SUBTYPE_TLB_AFFINITY 3
>> +#define CRAT_SUBTYPE_CCOMPUTE_AFFINITY 4
>> +#define CRAT_SUBTYPE_IOLINK_AFFINITY 5
>> +#define CRAT_SUBTYPE_MAX 6
>> +
>> +#define CRAT_SIBLINGMAP_SIZE 32
>> +
>> +/*
>> + * ComputeUnit Affinity structure and definitions
>> + */
>> +#define CRAT_CU_FLAGS_ENABLED 0x00000001
>> +#define CRAT_CU_FLAGS_HOT_PLUGGABLE 0x00000002
>> +#define CRAT_CU_FLAGS_CPU_PRESENT 0x00000004
>> +#define CRAT_CU_FLAGS_GPU_PRESENT 0x00000008
>> +#define CRAT_CU_FLAGS_IOMMU_PRESENT 0x00000010
>> +#define CRAT_CU_FLAGS_RESERVED 0xffffffe0
>> +
>> +#define CRAT_COMPUTEUNIT_RESERVED_LENGTH 4
>> +
>> +struct crat_subtype_computeunit {
>> + uint8_t type;
>> + uint8_t length;
>> + uint16_t reserved;
>> + uint32_t flags;
>> + uint32_t proximity_domain;
>> + uint32_t processor_id_low;
>> + uint16_t num_cpu_cores;
>> + uint16_t num_simd_cores;
>> + uint16_t max_waves_simd;
>> + uint16_t io_count;
>> + uint16_t hsa_capability;
>> + uint16_t lds_size_in_kb;
>> + uint8_t wave_front_size;
>> + uint8_t num_banks;
>> + uint16_t micro_engine_id;
>> + uint8_t num_arrays;
>> + uint8_t num_cu_per_array;
>> + uint8_t num_simd_per_cu;
>> + uint8_t max_slots_scatch_cu;
>> + uint8_t reserved2[CRAT_COMPUTEUNIT_RESERVED_LENGTH];
>> +};
>> +
>> +/*
>> + * HSA Memory Affinity structure and definitions
>> + */
>> +#define CRAT_MEM_FLAGS_ENABLED 0x00000001
>> +#define CRAT_MEM_FLAGS_HOT_PLUGGABLE 0x00000002
>> +#define CRAT_MEM_FLAGS_NON_VOLATILE 0x00000004
>> +#define CRAT_MEM_FLAGS_RESERVED 0xfffffff8
>> +
>> +#define CRAT_MEMORY_RESERVED_LENGTH 8
>> +
>> +struct crat_subtype_memory {
>> + uint8_t type;
>> + uint8_t length;
>> + uint16_t reserved;
>> + uint32_t flags;
>> + uint32_t promixity_domain;
>> + uint32_t base_addr_low;
>> + uint32_t base_addr_high;
>> + uint32_t length_low;
>> + uint32_t length_high;
>> + uint32_t width;
>> + uint8_t reserved2[CRAT_MEMORY_RESERVED_LENGTH];
>> +};
>> +
>> +/*
>> + * HSA Cache Affinity structure and definitions
>> + */
>> +#define CRAT_CACHE_FLAGS_ENABLED 0x00000001
>> +#define CRAT_CACHE_FLAGS_DATA_CACHE 0x00000002
>> +#define CRAT_CACHE_FLAGS_INST_CACHE 0x00000004
>> +#define CRAT_CACHE_FLAGS_CPU_CACHE 0x00000008
>> +#define CRAT_CACHE_FLAGS_SIMD_CACHE 0x00000010
>> +#define CRAT_CACHE_FLAGS_RESERVED 0xffffffe0
>> +
>> +#define CRAT_CACHE_RESERVED_LENGTH 8
>> +
>> +struct crat_subtype_cache {
>> + uint8_t type;
>> + uint8_t length;
>> + uint16_t reserved;
>> + uint32_t flags;
>> + uint32_t processor_id_low;
>> + uint8_t sibling_map[CRAT_SIBLINGMAP_SIZE];
>> + uint32_t cache_size;
>> + uint8_t cache_level;
>> + uint8_t lines_per_tag;
>> + uint16_t cache_line_size;
>> + uint8_t associativity;
>> + uint8_t cache_properties;
>> + uint16_t cache_latency;
>> + uint8_t reserved2[CRAT_CACHE_RESERVED_LENGTH];
>> +};
>> +
>> +/*
>> + * HSA TLB Affinity structure and definitions
>> + */
>> +#define CRAT_TLB_FLAGS_ENABLED 0x00000001
>> +#define CRAT_TLB_FLAGS_DATA_TLB 0x00000002
>> +#define CRAT_TLB_FLAGS_INST_TLB 0x00000004
>> +#define CRAT_TLB_FLAGS_CPU_TLB 0x00000008
>> +#define CRAT_TLB_FLAGS_SIMD_TLB 0x00000010
>> +#define CRAT_TLB_FLAGS_RESERVED 0xffffffe0
>> +
>> +#define CRAT_TLB_RESERVED_LENGTH 4
>> +
>> +struct crat_subtype_tlb {
>> + uint8_t type;
>> + uint8_t length;
>> + uint16_t reserved;
>> + uint32_t flags;
>> + uint32_t processor_id_low;
>> + uint8_t sibling_map[CRAT_SIBLINGMAP_SIZE];
>> + uint32_t tlb_level;
>> + uint8_t data_tlb_associativity_2mb;
>> + uint8_t data_tlb_size_2mb;
>> + uint8_t instruction_tlb_associativity_2mb;
>> + uint8_t instruction_tlb_size_2mb;
>> + uint8_t data_tlb_associativity_4k;
>> + uint8_t data_tlb_size_4k;
>> + uint8_t instruction_tlb_associativity_4k;
>> + uint8_t instruction_tlb_size_4k;
>> + uint8_t data_tlb_associativity_1gb;
>> + uint8_t data_tlb_size_1gb;
>> + uint8_t instruction_tlb_associativity_1gb;
>> + uint8_t instruction_tlb_size_1gb;
>> + uint8_t reserved2[CRAT_TLB_RESERVED_LENGTH];
>> +};
>> +
>> +/*
>> + * HSA CCompute/APU Affinity structure and definitions
>> + */
>> +#define CRAT_CCOMPUTE_FLAGS_ENABLED 0x00000001
>> +#define CRAT_CCOMPUTE_FLAGS_RESERVED 0xfffffffe
>> +
>> +#define CRAT_CCOMPUTE_RESERVED_LENGTH 16
>> +
>> +struct crat_subtype_ccompute {
>> + uint8_t type;
>> + uint8_t length;
>> + uint16_t reserved;
>> + uint32_t flags;
>> + uint32_t processor_id_low;
>> + uint8_t sibling_map[CRAT_SIBLINGMAP_SIZE];
>> + uint32_t apu_size;
>> + uint8_t reserved2[CRAT_CCOMPUTE_RESERVED_LENGTH];
>> +};
>> +
>> +/*
>> + * HSA IO Link Affinity structure and definitions
>> + */
>> +#define CRAT_IOLINK_FLAGS_ENABLED 0x00000001
>> +#define CRAT_IOLINK_FLAGS_COHERENCY 0x00000002
>> +#define CRAT_IOLINK_FLAGS_RESERVED 0xfffffffc
>> +
>> +/*
>> + * IO interface types
>> + */
>> +#define CRAT_IOLINK_TYPE_UNDEFINED 0
>> +#define CRAT_IOLINK_TYPE_HYPERTRANSPORT 1
>> +#define CRAT_IOLINK_TYPE_PCIEXPRESS 2
>> +#define CRAT_IOLINK_TYPE_OTHER 3
>> +#define CRAT_IOLINK_TYPE_MAX 255
>> +
>> +#define CRAT_IOLINK_RESERVED_LENGTH 24
>> +
>> +struct crat_subtype_iolink {
>> + uint8_t type;
>> + uint8_t length;
>> + uint16_t reserved;
>> + uint32_t flags;
>> + uint32_t proximity_domain_from;
>> + uint32_t proximity_domain_to;
>> + uint8_t io_interface_type;
>> + uint8_t version_major;
>> + uint16_t version_minor;
>> + uint32_t minimum_latency;
>> + uint32_t maximum_latency;
>> + uint32_t minimum_bandwidth_mbs;
>> + uint32_t maximum_bandwidth_mbs;
>> + uint32_t recommended_transfer_size;
>> + uint8_t reserved2[CRAT_IOLINK_RESERVED_LENGTH];
>> +};
>> +
>> +/*
>> + * HSA generic sub-type header
>> + */
>> +
>> +#define CRAT_SUBTYPE_FLAGS_ENABLED 0x00000001
>> +
>> +struct crat_subtype_generic {
>> + uint8_t type;
>> + uint8_t length;
>> + uint16_t reserved;
>> + uint32_t flags;
>> +};
>> +
>> +/*
>> + * Component Locality Distance Information Table (CDIT)
>> + */
>> +#define CDIT_OEMID_LENGTH 6
>> +#define CDIT_OEMTABLEID_LENGTH 8
>> +
>> +struct cdit_header {
>> + uint32_t signature;
>> + uint32_t length;
>> + uint8_t revision;
>> + uint8_t checksum;
>> + uint8_t oem_id[CDIT_OEMID_LENGTH];
>> + uint8_t oem_table_id[CDIT_OEMTABLEID_LENGTH];
>> + uint32_t oem_revision;
>> + uint32_t creator_id;
>> + uint32_t creator_revision;
>> + uint32_t total_entries;
>> + uint16_t num_domains;
>> + uint8_t entry[1];
>> +};
>> +
>> +#pragma pack()
>> +
>> +#endif /* KFD_CRAT_H_INCLUDED */
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>> index dd63ce09..4138694 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>> @@ -100,6 +100,9 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>> {
>> kfd->shared_resources = *gpu_resources;
>>
>> + if (kfd_topology_add_device(kfd) != 0)
>> + return false;
>> +
>> kfd->init_complete = true;
>> dev_info(kfd_device, "added device (%x:%x)\n", kfd->pdev->vendor,
>> kfd->pdev->device);
>> @@ -109,6 +112,10 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>>
>> void kgd2kfd_device_exit(struct kfd_dev *kfd)
>> {
>> + int err = kfd_topology_remove_device(kfd);
>> +
>> + BUG_ON(err != 0);
>> +
>> kfree(kfd);
>> }
>>
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>> index c7faac6..c51f981 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>> @@ -73,16 +73,23 @@ static int __init kfd_module_init(void)
>> if (err < 0)
>> goto err_ioctl;
>>
>> + err = kfd_topology_init();
>> + if (err < 0)
>> + goto err_topology;
>> +
>> dev_info(kfd_device, "Initialized module\n");
>>
>> return 0;
>>
>> +err_topology:
>> + kfd_chardev_exit();
>> err_ioctl:
>> return err;
>> }
>>
>> static void __exit kfd_module_exit(void)
>> {
>> + kfd_topology_shutdown();
>> kfd_chardev_exit();
>> dev_info(kfd_device, "Removed module\n");
>> }
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> index 05e892f..b391e24 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> @@ -32,6 +32,14 @@
>> #include <linux/spinlock.h>
>> #include "../radeon_kfd.h"
>>
>> +#define KFD_SYSFS_FILE_MODE 0444
>> +
>> +/* GPU ID hash width in bits */
>> +#define KFD_GPU_ID_HASH_WIDTH 16
>> +
>> +/* Macro for allocating structures */
>> +#define kfd_alloc_struct(ptr_to_struct) ((typeof(ptr_to_struct)) kzalloc(sizeof(*ptr_to_struct), GFP_KERNEL))
>> +
>> struct kfd_device_info {
>> const struct kfd_scheduler_class *scheduler_class;
>> unsigned int max_pasid_bits;
>> @@ -71,6 +79,15 @@ struct kfd_process {
>>
>> extern struct device *kfd_device;
>>
>> +/* Topology */
>> +int kfd_topology_init(void);
>> +void kfd_topology_shutdown(void);
>> +int kfd_topology_add_device(struct kfd_dev *gpu);
>> +int kfd_topology_remove_device(struct kfd_dev *gpu);
>> +struct kfd_dev *kfd_device_by_id(uint32_t gpu_id);
>> +struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev);
>> +struct kfd_dev *kfd_topology_enum_kfd_devices(uint8_t idx);
>> +
>> /* Interrupts */
>> void kgd2kfd_interrupt(struct kfd_dev *dev, const void *ih_ring_entry);
>>
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_topology.c b/drivers/gpu/drm/radeon/amdkfd/kfd_topology.c
>> new file mode 100644
>> index 0000000..30da4c3
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_topology.c
>> @@ -0,0 +1,1207 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#include <linux/types.h>
>> +#include <linux/kernel.h>
>> +#include <linux/pci.h>
>> +#include <linux/errno.h>
>> +#include <linux/acpi.h>
>> +#include <linux/hash.h>
>> +#include <linux/cpufreq.h>
>> +
>> +#include "kfd_priv.h"
>> +#include "kfd_crat.h"
>> +#include "kfd_topology.h"
>> +
>> +static struct list_head topology_device_list;
>> +static int topology_crat_parsed;
>> +static struct kfd_system_properties sys_props;
>> +
>> +static DECLARE_RWSEM(topology_lock);
>> +
>> +struct kfd_dev *kfd_device_by_id(uint32_t gpu_id)
>> +{
>> + struct kfd_topology_device *top_dev;
>> + struct kfd_dev *device = NULL;
>> +
>> + down_read(&topology_lock);
>> +
>> + list_for_each_entry(top_dev, &topology_device_list, list)
>> + if (top_dev->gpu_id == gpu_id) {
>> + device = top_dev->gpu;
>> + break;
>> + }
>> +
>> + up_read(&topology_lock);
>> +
>> + return device;
>> +}
>> +
>> +struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev)
>> +{
>> + struct kfd_topology_device *top_dev;
>> + struct kfd_dev *device = NULL;
>> +
>> + down_read(&topology_lock);
>> +
>> + list_for_each_entry(top_dev, &topology_device_list, list)
>> + if (top_dev->gpu->pdev == pdev) {
>> + device = top_dev->gpu;
>> + break;
>> + }
>> +
>> + up_read(&topology_lock);
>> +
>> + return device;
>> +}
>> +
>> +static int kfd_topology_get_crat_acpi(void *crat_image, size_t *size)
>> +{
>> + struct acpi_table_header *crat_table;
>> + acpi_status status;
>> +
>> + if (!size)
>> + return -EINVAL;
>> +
>> + /*
>> + * Fetch the CRAT table from ACPI
>> + */
>> + status = acpi_get_table(CRAT_SIGNATURE, 0, &crat_table);
>> + if (status == AE_NOT_FOUND) {
>> + pr_warn("CRAT table not found\n");
>> + return -ENODATA;
>> + } else if (ACPI_FAILURE(status)) {
>> + const char *err = acpi_format_exception(status);
>> +
>> + pr_err("CRAT table error: %s\n", err);
>> + return -EINVAL;
>> + }
>> +
>> + if (*size >= crat_table->length && crat_image != 0)
>> + memcpy(crat_image, crat_table, crat_table->length);
>> +
>> + *size = crat_table->length;
>> +
>> + return 0;
>> +}
>> +
>> +static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
>> + struct crat_subtype_computeunit *cu)
>> +{
>> + BUG_ON(!dev);
>> + BUG_ON(!cu);
>> +
>> + dev->node_props.cpu_cores_count = cu->num_cpu_cores;
>> + dev->node_props.cpu_core_id_base = cu->processor_id_low;
>> + if (cu->hsa_capability & CRAT_CU_FLAGS_IOMMU_PRESENT)
>> + dev->node_props.capability |= HSA_CAP_ATS_PRESENT;
>> +
>> + pr_info("CU CPU: cores=%d id_base=%d\n", cu->num_cpu_cores,
>> + cu->processor_id_low);
>> +}
>> +
>> +static void kfd_populated_cu_info_gpu(struct kfd_topology_device *dev,
>> + struct crat_subtype_computeunit *cu)
>> +{
>> + BUG_ON(!dev);
>> + BUG_ON(!cu);
>> +
>> + dev->node_props.simd_id_base = cu->processor_id_low;
>> + dev->node_props.simd_count = cu->num_simd_cores;
>> + dev->node_props.lds_size_in_kb = cu->lds_size_in_kb;
>> + dev->node_props.max_waves_per_simd = cu->max_waves_simd;
>> + dev->node_props.wave_front_size = cu->wave_front_size;
>> + dev->node_props.mem_banks_count = cu->num_banks;
>> + dev->node_props.array_count = cu->num_arrays;
>> + dev->node_props.cu_per_simd_array = cu->num_cu_per_array;
>> + dev->node_props.simd_per_cu = cu->num_simd_per_cu;
>> + dev->node_props.max_slots_scratch_cu = cu->max_slots_scatch_cu;
>> + if (cu->hsa_capability & CRAT_CU_FLAGS_HOT_PLUGGABLE)
>> + dev->node_props.capability |= HSA_CAP_HOT_PLUGGABLE;
>> + pr_info("CU GPU: simds=%d id_base=%d\n", cu->num_simd_cores,
>> + cu->processor_id_low);
>> +}
>> +
>> +/* kfd_parse_subtype_cu is called when the topology mutex is already acquired */
>> +static int kfd_parse_subtype_cu(struct crat_subtype_computeunit *cu)
>> +{
>> + struct kfd_topology_device *dev;
>> + int i = 0;
>> +
>> + BUG_ON(!cu);
>> +
>> + pr_info("Found CU entry in CRAT table with proximity_domain=%d caps=%x\n",
>> + cu->proximity_domain, cu->hsa_capability);
>> + list_for_each_entry(dev, &topology_device_list, list) {
>> + if (cu->proximity_domain == i) {
>> + if (cu->flags & CRAT_CU_FLAGS_CPU_PRESENT)
>> + kfd_populated_cu_info_cpu(dev, cu);
>> +
>> + if (cu->flags & CRAT_CU_FLAGS_GPU_PRESENT)
>> + kfd_populated_cu_info_gpu(dev, cu);
>> + break;
>> + }
>> + i++;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/* kfd_parse_subtype_mem is called when the topology mutex is already acquired */
>> +static int kfd_parse_subtype_mem(struct crat_subtype_memory *mem)
>> +{
>> + struct kfd_mem_properties *props;
>> + struct kfd_topology_device *dev;
>> + int i = 0;
>> +
>> + BUG_ON(!mem);
>> +
>> + pr_info("Found memory entry in CRAT table with proximity_domain=%d\n",
>> + mem->promixity_domain);
>> + list_for_each_entry(dev, &topology_device_list, list) {
>> + if (mem->promixity_domain == i) {
>> + props = kfd_alloc_struct(props);
>> + if (props == 0)
>> + return -ENOMEM;
>> +
>> + if (dev->node_props.cpu_cores_count == 0)
>> + props->heap_type = HSA_MEM_HEAP_TYPE_FB_PRIVATE;
>> + else
>> + props->heap_type = HSA_MEM_HEAP_TYPE_SYSTEM;
>> +
>> + if (mem->flags & CRAT_MEM_FLAGS_HOT_PLUGGABLE)
>> + props->flags |= HSA_MEM_FLAGS_HOT_PLUGGABLE;
>> + if (mem->flags & CRAT_MEM_FLAGS_NON_VOLATILE)
>> + props->flags |= HSA_MEM_FLAGS_NON_VOLATILE;
>> +
>> + props->size_in_bytes = ((uint64_t)mem->length_high << 32) +
>> + mem->length_low;
>> + props->width = mem->width;
>> +
>> + dev->mem_bank_count++;
>> + list_add_tail(&props->list, &dev->mem_props);
>> +
>> + break;
>> + }
>> + i++;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/* kfd_parse_subtype_cache is called when the topology mutex is already acquired */
>> +static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache)
>> +{
>> + struct kfd_cache_properties *props;
>> + struct kfd_topology_device *dev;
>> + uint32_t id;
>> +
>> + BUG_ON(!cache);
>> +
>> + id = cache->processor_id_low;
>> +
>> + pr_info("Found cache entry in CRAT table with processor_id=%d\n", id);
>> + list_for_each_entry(dev, &topology_device_list, list)
>> + if (id == dev->node_props.cpu_core_id_base ||
>> + id == dev->node_props.simd_id_base) {
>> + props = kfd_alloc_struct(props);
>> + if (props == 0)
>> + return -ENOMEM;
>> +
>> + props->processor_id_low = id;
>> + props->cache_level = cache->cache_level;
>> + props->cache_size = cache->cache_size;
>> + props->cacheline_size = cache->cache_line_size;
>> + props->cachelines_per_tag = cache->lines_per_tag;
>> + props->cache_assoc = cache->associativity;
>> + props->cache_latency = cache->cache_latency;
>> +
>> + if (cache->flags & CRAT_CACHE_FLAGS_DATA_CACHE)
>> + props->cache_type |= HSA_CACHE_TYPE_DATA;
>> + if (cache->flags & CRAT_CACHE_FLAGS_INST_CACHE)
>> + props->cache_type |= HSA_CACHE_TYPE_INSTRUCTION;
>> + if (cache->flags & CRAT_CACHE_FLAGS_CPU_CACHE)
>> + props->cache_type |= HSA_CACHE_TYPE_CPU;
>> + if (cache->flags & CRAT_CACHE_FLAGS_SIMD_CACHE)
>> + props->cache_type |= HSA_CACHE_TYPE_HSACU;
>> +
>> + dev->cache_count++;
>> + dev->node_props.caches_count++;
>> + list_add_tail(&props->list, &dev->cache_props);
>> +
>> + break;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/* kfd_parse_subtype_iolink is called when the topology mutex is already acquired */
>> +static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink)
>> +{
>> + struct kfd_iolink_properties *props;
>> + struct kfd_topology_device *dev;
>> + uint32_t i = 0;
>> + uint32_t id_from;
>> + uint32_t id_to;
>> +
>> + BUG_ON(!iolink);
>> +
>> + id_from = iolink->proximity_domain_from;
>> + id_to = iolink->proximity_domain_to;
>> +
>> + pr_info("Found IO link entry in CRAT table with id_from=%d\n", id_from);
>> + list_for_each_entry(dev, &topology_device_list, list) {
>> + if (id_from == i) {
>> + props = kfd_alloc_struct(props);
>> + if (props == 0)
>> + return -ENOMEM;
>> +
>> + props->node_from = id_from;
>> + props->node_to = id_to;
>> + props->ver_maj = iolink->version_major;
>> + props->ver_min = iolink->version_minor;
>> +
>> + /*
>> + * weight factor (derived from CDIR), currently always 1
>> + */
>> + props->weight = 1;
>> +
>> + props->min_latency = iolink->minimum_latency;
>> + props->max_latency = iolink->maximum_latency;
>> + props->min_bandwidth = iolink->minimum_bandwidth_mbs;
>> + props->max_bandwidth = iolink->maximum_bandwidth_mbs;
>> + props->rec_transfer_size =
>> + iolink->recommended_transfer_size;
>> +
>> + dev->io_link_count++;
>> + dev->node_props.io_links_count++;
>> + list_add_tail(&props->list, &dev->io_link_props);
>> +
>> + break;
>> + }
>> + i++;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int kfd_parse_subtype(struct crat_subtype_generic *sub_type_hdr)
>> +{
>> + struct crat_subtype_computeunit *cu;
>> + struct crat_subtype_memory *mem;
>> + struct crat_subtype_cache *cache;
>> + struct crat_subtype_iolink *iolink;
>> + int ret = 0;
>> +
>> + BUG_ON(!sub_type_hdr);
>> +
>> + switch (sub_type_hdr->type) {
>> + case CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY:
>> + cu = (struct crat_subtype_computeunit *)sub_type_hdr;
>> + ret = kfd_parse_subtype_cu(cu);
>> + break;
>> + case CRAT_SUBTYPE_MEMORY_AFFINITY:
>> + mem = (struct crat_subtype_memory *)sub_type_hdr;
>> + ret = kfd_parse_subtype_mem(mem);
>> + break;
>> + case CRAT_SUBTYPE_CACHE_AFFINITY:
>> + cache = (struct crat_subtype_cache *)sub_type_hdr;
>> + ret = kfd_parse_subtype_cache(cache);
>> + break;
>> + case CRAT_SUBTYPE_TLB_AFFINITY:
>> + /*
>> + * For now, nothing to do here
>> + */
>> + pr_info("Found TLB entry in CRAT table (not processing)\n");
>> + break;
>> + case CRAT_SUBTYPE_CCOMPUTE_AFFINITY:
>> + /*
>> + * For now, nothing to do here
>> + */
>> + pr_info("Found CCOMPUTE entry in CRAT table (not processing)\n");
>> + break;
>> + case CRAT_SUBTYPE_IOLINK_AFFINITY:
>> + iolink = (struct crat_subtype_iolink *)sub_type_hdr;
>> + ret = kfd_parse_subtype_iolink(iolink);
>> + break;
>> + default:
>> + pr_warn("Unknown subtype (%d) in CRAT\n",
>> + sub_type_hdr->type);
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +static void kfd_release_topology_device(struct kfd_topology_device *dev)
>> +{
>> + struct kfd_mem_properties *mem;
>> + struct kfd_cache_properties *cache;
>> + struct kfd_iolink_properties *iolink;
>> +
>> + BUG_ON(!dev);
>> +
>> + list_del(&dev->list);
>> +
>> + while (dev->mem_props.next != &dev->mem_props) {
>> + mem = container_of(dev->mem_props.next,
>> + struct kfd_mem_properties, list);
>> + list_del(&mem->list);
>> + kfree(mem);
>> + }
>> +
>> + while (dev->cache_props.next != &dev->cache_props) {
>> + cache = container_of(dev->cache_props.next,
>> + struct kfd_cache_properties, list);
>> + list_del(&cache->list);
>> + kfree(cache);
>> + }
>> +
>> + while (dev->io_link_props.next != &dev->io_link_props) {
>> + iolink = container_of(dev->io_link_props.next,
>> + struct kfd_iolink_properties, list);
>> + list_del(&iolink->list);
>> + kfree(iolink);
>> + }
>> +
>> + kfree(dev);
>> +
>> + sys_props.num_devices--;
>> +}
>> +
>> +static void kfd_release_live_view(void)
>> +{
>> + struct kfd_topology_device *dev;
>> +
>> + while (topology_device_list.next != &topology_device_list) {
>> + dev = container_of(topology_device_list.next,
>> + struct kfd_topology_device, list);
>> + kfd_release_topology_device(dev);
>> +}
>> +
>> + memset(&sys_props, 0, sizeof(sys_props));
>> +}
>> +
>> +static struct kfd_topology_device *kfd_create_topology_device(void)
>> +{
>> + struct kfd_topology_device *dev;
>> +
>> + dev = kfd_alloc_struct(dev);
>> + if (dev == 0) {
>> + pr_err("No memory to allocate a topology device");
>> + return 0;
>> + }
>> +
>> + INIT_LIST_HEAD(&dev->mem_props);
>> + INIT_LIST_HEAD(&dev->cache_props);
>> + INIT_LIST_HEAD(&dev->io_link_props);
>> +
>> + list_add_tail(&dev->list, &topology_device_list);
>> + sys_props.num_devices++;
>> +
>> + return dev;
>> + }
>> +
>> +static int kfd_parse_crat_table(void *crat_image)
>> +{
>> + struct kfd_topology_device *top_dev;
>> + struct crat_subtype_generic *sub_type_hdr;
>> + uint16_t node_id;
>> + int ret;
>> + struct crat_header *crat_table = (struct crat_header *)crat_image;
>> + uint16_t num_nodes;
>> + uint32_t image_len;
>> +
>> + if (!crat_image)
>> + return -EINVAL;
>> +
>> + num_nodes = crat_table->num_domains;
>> + image_len = crat_table->length;
>> +
>> + pr_info("Parsing CRAT table with %d nodes\n", num_nodes);
>> +
>> + for (node_id = 0; node_id < num_nodes; node_id++) {
>> + top_dev = kfd_create_topology_device();
>> + if (!top_dev) {
>> + kfd_release_live_view();
>> + return -ENOMEM;
>> + }
>> + }
>> +
>> + sys_props.platform_id = (*((uint64_t *)crat_table->oem_id)) & CRAT_OEMID_64BIT_MASK;
>> + sys_props.platform_oem = *((uint64_t *)crat_table->oem_table_id);
>> + sys_props.platform_rev = crat_table->revision;
>> +
>> + sub_type_hdr = (struct crat_subtype_generic *)(crat_table+1);
>> + while ((char *)sub_type_hdr + sizeof(struct crat_subtype_generic) <
>> + ((char *)crat_image) + image_len) {
>> + if (sub_type_hdr->flags & CRAT_SUBTYPE_FLAGS_ENABLED) {
>> + ret = kfd_parse_subtype(sub_type_hdr);
>> + if (ret != 0) {
>> + kfd_release_live_view();
>> + return ret;
>> + }
>> + }
>> +
>> + sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
>> + sub_type_hdr->length);
>> + }
>> +
>> + sys_props.generation_count++;
>> + topology_crat_parsed = 1;
>> +
>> + return 0;
>> +}
>> +
>> +
>> +#define sysfs_show_gen_prop(buffer, fmt, ...) \
>> + snprintf(buffer, PAGE_SIZE, "%s"fmt, buffer, __VA_ARGS__)
>> +#define sysfs_show_32bit_prop(buffer, name, value) \
>> + sysfs_show_gen_prop(buffer, "%s %u\n", name, value)
>> +#define sysfs_show_64bit_prop(buffer, name, value) \
>> + sysfs_show_gen_prop(buffer, "%s %llu\n", name, value)
>> +#define sysfs_show_32bit_val(buffer, value) \
>> + sysfs_show_gen_prop(buffer, "%u\n", value)
>> +#define sysfs_show_str_val(buffer, value) \
>> + sysfs_show_gen_prop(buffer, "%s\n", value)
>> +
>> +static ssize_t sysprops_show(struct kobject *kobj, struct attribute *attr,
>> + char *buffer)
>> +{
>> + ssize_t ret;
>> +
>> + /* Making sure that the buffer is an empty string */
>> + buffer[0] = 0;
>> +
>> + if (attr == &sys_props.attr_genid) {
>> + ret = sysfs_show_32bit_val(buffer, sys_props.generation_count);
>> + } else if (attr == &sys_props.attr_props) {
>> + sysfs_show_64bit_prop(buffer, "platform_oem",
>> + sys_props.platform_oem);
>> + sysfs_show_64bit_prop(buffer, "platform_id",
>> + sys_props.platform_id);
>> + ret = sysfs_show_64bit_prop(buffer, "platform_rev",
>> + sys_props.platform_rev);
>> + } else {
>> + ret = -EINVAL;
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +static const struct sysfs_ops sysprops_ops = {
>> + .show = sysprops_show,
>> +};
>> +
>> +static struct kobj_type sysprops_type = {
>> + .sysfs_ops = &sysprops_ops,
>> +};
>> +
>> +static ssize_t iolink_show(struct kobject *kobj, struct attribute *attr,
>> + char *buffer)
>> +{
>> + ssize_t ret;
>> + struct kfd_iolink_properties *iolink;
>> +
>> + /* Making sure that the buffer is an empty string */
>> + buffer[0] = 0;
>> +
>> + iolink = container_of(attr, struct kfd_iolink_properties, attr);
>> + sysfs_show_32bit_prop(buffer, "type", iolink->iolink_type);
>> + sysfs_show_32bit_prop(buffer, "version_major", iolink->ver_maj);
>> + sysfs_show_32bit_prop(buffer, "version_minor", iolink->ver_min);
>> + sysfs_show_32bit_prop(buffer, "node_from", iolink->node_from);
>> + sysfs_show_32bit_prop(buffer, "node_to", iolink->node_to);
>> + sysfs_show_32bit_prop(buffer, "weight", iolink->weight);
>> + sysfs_show_32bit_prop(buffer, "min_latency", iolink->min_latency);
>> + sysfs_show_32bit_prop(buffer, "max_latency", iolink->max_latency);
>> + sysfs_show_32bit_prop(buffer, "min_bandwidth", iolink->min_bandwidth);
>> + sysfs_show_32bit_prop(buffer, "max_bandwidth", iolink->max_bandwidth);
>> + sysfs_show_32bit_prop(buffer, "recommended_transfer_size",
>> + iolink->rec_transfer_size);
>> + ret = sysfs_show_32bit_prop(buffer, "flags", iolink->flags);
>> +
>> + return ret;
>> +}
>> +
>> +static const struct sysfs_ops iolink_ops = {
>> + .show = iolink_show,
>> +};
>> +
>> +static struct kobj_type iolink_type = {
>> + .sysfs_ops = &iolink_ops,
>> +};
>> +
>> +static ssize_t mem_show(struct kobject *kobj, struct attribute *attr,
>> + char *buffer)
>> +{
>> + ssize_t ret;
>> + struct kfd_mem_properties *mem;
>> +
>> + /* Making sure that the buffer is an empty string */
>> + buffer[0] = 0;
>> +
>> + mem = container_of(attr, struct kfd_mem_properties, attr);
>> + sysfs_show_32bit_prop(buffer, "heap_type", mem->heap_type);
>> + sysfs_show_64bit_prop(buffer, "size_in_bytes", mem->size_in_bytes);
>> + sysfs_show_32bit_prop(buffer, "flags", mem->flags);
>> + sysfs_show_32bit_prop(buffer, "width", mem->width);
>> + ret = sysfs_show_32bit_prop(buffer, "mem_clk_max", mem->mem_clk_max);
>> +
>> + return ret;
>> +}
>> +
>> +static const struct sysfs_ops mem_ops = {
>> + .show = mem_show,
>> +};
>> +
>> +static struct kobj_type mem_type = {
>> + .sysfs_ops = &mem_ops,
>> +};
>> +
>> +static ssize_t kfd_cache_show(struct kobject *kobj, struct attribute *attr,
>> + char *buffer)
>> +{
>> + ssize_t ret;
>> + uint32_t i;
>> + struct kfd_cache_properties *cache;
>> +
>> + /* Making sure that the buffer is an empty string */
>> + buffer[0] = 0;
>> +
>> + cache = container_of(attr, struct kfd_cache_properties, attr);
>> + sysfs_show_32bit_prop(buffer, "processor_id_low",
>> + cache->processor_id_low);
>> + sysfs_show_32bit_prop(buffer, "level", cache->cache_level);
>> + sysfs_show_32bit_prop(buffer, "size", cache->cache_size);
>> + sysfs_show_32bit_prop(buffer, "cache_line_size", cache->cacheline_size);
>> + sysfs_show_32bit_prop(buffer, "cache_lines_per_tag",
>> + cache->cachelines_per_tag);
>> + sysfs_show_32bit_prop(buffer, "association", cache->cache_assoc);
>> + sysfs_show_32bit_prop(buffer, "latency", cache->cache_latency);
>> + sysfs_show_32bit_prop(buffer, "type", cache->cache_type);
>> + snprintf(buffer, PAGE_SIZE, "%ssibling_map ", buffer);
>> + for (i = 0; i < KFD_TOPOLOGY_CPU_SIBLINGS; i++)
>> + ret = snprintf(buffer, PAGE_SIZE, "%s%d%s",
>> + buffer, cache->sibling_map[i],
>> + (i == KFD_TOPOLOGY_CPU_SIBLINGS-1) ?
>> + "\n" : ",");
>> +
>> + return ret;
>> +}
>> +
>> +static const struct sysfs_ops cache_ops = {
>> + .show = kfd_cache_show,
>> +};
>> +
>> +static struct kobj_type cache_type = {
>> + .sysfs_ops = &cache_ops,
>> +};
>> +
>> +static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
>> + char *buffer)
>> +{
>> + ssize_t ret;
>> + struct kfd_topology_device *dev;
>> + char public_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
>> + uint32_t i;
>> +
>> + /* Making sure that the buffer is an empty string */
>> + buffer[0] = 0;
>> +
>> + if (strcmp(attr->name, "gpu_id") == 0) {
>> + dev = container_of(attr, struct kfd_topology_device,
>> + attr_gpuid);
>> + ret = sysfs_show_32bit_val(buffer, dev->gpu_id);
>> + } else if (strcmp(attr->name, "name") == 0) {
>> + dev = container_of(attr, struct kfd_topology_device,
>> + attr_name);
>> + for (i = 0; i < KFD_TOPOLOGY_PUBLIC_NAME_SIZE; i++) {
>> + public_name[i] =
>> + (char)dev->node_props.marketing_name[i];
>> + if (dev->node_props.marketing_name[i] == 0)
>> + break;
>> + }
>> + public_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE-1] = 0x0;
>> + ret = sysfs_show_str_val(buffer, public_name);
>> + } else {
>> + dev = container_of(attr, struct kfd_topology_device,
>> + attr_props);
>> + sysfs_show_32bit_prop(buffer, "cpu_cores_count",
>> + dev->node_props.cpu_cores_count);
>> + sysfs_show_32bit_prop(buffer, "simd_count",
>> + dev->node_props.simd_count);
>> + sysfs_show_32bit_prop(buffer, "mem_banks_count",
>> + dev->node_props.mem_banks_count);
>> + sysfs_show_32bit_prop(buffer, "caches_count",
>> + dev->node_props.caches_count);
>> + sysfs_show_32bit_prop(buffer, "io_links_count",
>> + dev->node_props.io_links_count);
>> + sysfs_show_32bit_prop(buffer, "cpu_core_id_base",
>> + dev->node_props.cpu_core_id_base);
>> + sysfs_show_32bit_prop(buffer, "simd_id_base",
>> + dev->node_props.simd_id_base);
>> + sysfs_show_32bit_prop(buffer, "capability",
>> + dev->node_props.capability);
>> + sysfs_show_32bit_prop(buffer, "max_waves_per_simd",
>> + dev->node_props.max_waves_per_simd);
>> + sysfs_show_32bit_prop(buffer, "lds_size_in_kb",
>> + dev->node_props.lds_size_in_kb);
>> + sysfs_show_32bit_prop(buffer, "gds_size_in_kb",
>> + dev->node_props.gds_size_in_kb);
>> + sysfs_show_32bit_prop(buffer, "wave_front_size",
>> + dev->node_props.wave_front_size);
>> + sysfs_show_32bit_prop(buffer, "array_count",
>> + dev->node_props.array_count);
>> + sysfs_show_32bit_prop(buffer, "simd_arrays_per_engine",
>> + dev->node_props.simd_arrays_per_engine);
>> + sysfs_show_32bit_prop(buffer, "cu_per_simd_array",
>> + dev->node_props.cu_per_simd_array);
>> + sysfs_show_32bit_prop(buffer, "simd_per_cu",
>> + dev->node_props.simd_per_cu);
>> + sysfs_show_32bit_prop(buffer, "max_slots_scratch_cu",
>> + dev->node_props.max_slots_scratch_cu);
>> + sysfs_show_32bit_prop(buffer, "engine_id",
>> + dev->node_props.engine_id);
>> + sysfs_show_32bit_prop(buffer, "vendor_id",
>> + dev->node_props.vendor_id);
>> + sysfs_show_32bit_prop(buffer, "device_id",
>> + dev->node_props.device_id);
>> + sysfs_show_32bit_prop(buffer, "location_id",
>> + dev->node_props.location_id);
>> + sysfs_show_32bit_prop(buffer, "max_engine_clk_fcompute",
>> + kfd2kgd->get_max_engine_clock_in_mhz(
>> + dev->gpu->kgd));
>> + sysfs_show_64bit_prop(buffer, "local_mem_size",
>> + kfd2kgd->get_vmem_size(dev->gpu->kgd));
>> + ret = sysfs_show_32bit_prop(buffer, "max_engine_clk_ccompute",
>> + cpufreq_quick_get_max(0)/1000);
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +static const struct sysfs_ops node_ops = {
>> + .show = node_show,
>> +};
>> +
>> +static struct kobj_type node_type = {
>> + .sysfs_ops = &node_ops,
>> +};
>> +
>> +static void kfd_remove_sysfs_file(struct kobject *kobj, struct attribute *attr)
>> +{
>> + sysfs_remove_file(kobj, attr);
>> + kobject_del(kobj);
>> + kobject_put(kobj);
>> +}
>> +
>> +static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>> +{
>> + struct kfd_iolink_properties *iolink;
>> + struct kfd_cache_properties *cache;
>> + struct kfd_mem_properties *mem;
>> +
>> + BUG_ON(!dev);
>> +
>> + if (dev->kobj_iolink) {
>> + list_for_each_entry(iolink, &dev->io_link_props, list)
>> + if (iolink->kobj) {
>> + kfd_remove_sysfs_file(iolink->kobj, &iolink->attr);
>> + iolink->kobj = 0;
>> + }
>> + kobject_del(dev->kobj_iolink);
>> + kobject_put(dev->kobj_iolink);
>> + dev->kobj_iolink = 0;
>> + }
>> +
>> + if (dev->kobj_cache) {
>> + list_for_each_entry(cache, &dev->cache_props, list)
>> + if (cache->kobj) {
>> + kfd_remove_sysfs_file(cache->kobj, &cache->attr);
>> + cache->kobj = 0;
>> + }
>> + kobject_del(dev->kobj_cache);
>> + kobject_put(dev->kobj_cache);
>> + dev->kobj_cache = 0;
>> + }
>> +
>> + if (dev->kobj_mem) {
>> + list_for_each_entry(mem, &dev->mem_props, list)
>> + if (mem->kobj) {
>> + kfd_remove_sysfs_file(mem->kobj, &mem->attr);
>> + mem->kobj = 0;
>> + }
>> + kobject_del(dev->kobj_mem);
>> + kobject_put(dev->kobj_mem);
>> + dev->kobj_mem = 0;
>> + }
>> +
>> + if (dev->kobj_node) {
>> + sysfs_remove_file(dev->kobj_node, &dev->attr_gpuid);
>> + sysfs_remove_file(dev->kobj_node, &dev->attr_name);
>> + sysfs_remove_file(dev->kobj_node, &dev->attr_props);
>> + kobject_del(dev->kobj_node);
>> + kobject_put(dev->kobj_node);
>> + dev->kobj_node = 0;
>> + }
>> +}
>> +
>> +static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>> + uint32_t id)
>> +{
>> + struct kfd_iolink_properties *iolink;
>> + struct kfd_cache_properties *cache;
>> + struct kfd_mem_properties *mem;
>> + int ret;
>> + uint32_t i;
>> +
>> + BUG_ON(!dev);
>> +
>> + /*
>> + * Creating the sysfs folders
>> + */
>> + BUG_ON(dev->kobj_node);
>> + dev->kobj_node = kfd_alloc_struct(dev->kobj_node);
>> + if (!dev->kobj_node)
>> + return -ENOMEM;
>> +
>> + ret = kobject_init_and_add(dev->kobj_node, &node_type,
>> + sys_props.kobj_nodes, "%d", id);
>> + if (ret < 0)
>> + return ret;
>> +
>> + dev->kobj_mem = kobject_create_and_add("mem_banks", dev->kobj_node);
>> + if (!dev->kobj_mem)
>> + return -ENOMEM;
>> +
>> + dev->kobj_cache = kobject_create_and_add("caches", dev->kobj_node);
>> + if (!dev->kobj_cache)
>> + return -ENOMEM;
>> +
>> + dev->kobj_iolink = kobject_create_and_add("io_links", dev->kobj_node);
>> + if (!dev->kobj_iolink)
>> + return -ENOMEM;
>> +
>> + /*
>> + * Creating sysfs files for node properties
>> + */
>> + dev->attr_gpuid.name = "gpu_id";
>> + dev->attr_gpuid.mode = KFD_SYSFS_FILE_MODE;
>> + sysfs_attr_init(&dev->attr_gpuid);
>> + dev->attr_name.name = "name";
>> + dev->attr_name.mode = KFD_SYSFS_FILE_MODE;
>> + sysfs_attr_init(&dev->attr_name);
>> + dev->attr_props.name = "properties";
>> + dev->attr_props.mode = KFD_SYSFS_FILE_MODE;
>> + sysfs_attr_init(&dev->attr_props);
>> + ret = sysfs_create_file(dev->kobj_node, &dev->attr_gpuid);
>> + if (ret < 0)
>> + return ret;
>> + ret = sysfs_create_file(dev->kobj_node, &dev->attr_name);
>> + if (ret < 0)
>> + return ret;
>> + ret = sysfs_create_file(dev->kobj_node, &dev->attr_props);
>> + if (ret < 0)
>> + return ret;
>> +
>> + i = 0;
>> + list_for_each_entry(mem, &dev->mem_props, list) {
>> + mem->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
>> + if (!mem->kobj)
>> + return -ENOMEM;
>> + ret = kobject_init_and_add(mem->kobj, &mem_type,
>> + dev->kobj_mem, "%d", i);
>> + if (ret < 0)
>> + return ret;
>> +
>> + mem->attr.name = "properties";
>> + mem->attr.mode = KFD_SYSFS_FILE_MODE;
>> + sysfs_attr_init(&mem->attr);
>> + ret = sysfs_create_file(mem->kobj, &mem->attr);
>> + if (ret < 0)
>> + return ret;
>> + i++;
>> + }
>> +
>> + i = 0;
>> + list_for_each_entry(cache, &dev->cache_props, list) {
>> + cache->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
>> + if (!cache->kobj)
>> + return -ENOMEM;
>> + ret = kobject_init_and_add(cache->kobj, &cache_type,
>> + dev->kobj_cache, "%d", i);
>> + if (ret < 0)
>> + return ret;
>> +
>> + cache->attr.name = "properties";
>> + cache->attr.mode = KFD_SYSFS_FILE_MODE;
>> + sysfs_attr_init(&cache->attr);
>> + ret = sysfs_create_file(cache->kobj, &cache->attr);
>> + if (ret < 0)
>> + return ret;
>> + i++;
>> + }
>> +
>> + i = 0;
>> + list_for_each_entry(iolink, &dev->io_link_props, list) {
>> + iolink->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
>> + if (!iolink->kobj)
>> + return -ENOMEM;
>> + ret = kobject_init_and_add(iolink->kobj, &iolink_type,
>> + dev->kobj_iolink, "%d", i);
>> + if (ret < 0)
>> + return ret;
>> +
>> + iolink->attr.name = "properties";
>> + iolink->attr.mode = KFD_SYSFS_FILE_MODE;
>> + sysfs_attr_init(&iolink->attr);
>> + ret = sysfs_create_file(iolink->kobj, &iolink->attr);
>> + if (ret < 0)
>> + return ret;
>> + i++;
>> +}
>> +
>> + return 0;
>> +}
>> +
>> +static int kfd_build_sysfs_node_tree(void)
>> +{
>> + struct kfd_topology_device *dev;
>> + int ret;
>> + uint32_t i = 0;
>> +
>> + list_for_each_entry(dev, &topology_device_list, list) {
>> + ret = kfd_build_sysfs_node_entry(dev, 0);
>> + if (ret < 0)
>> + return ret;
>> + i++;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static void kfd_remove_sysfs_node_tree(void)
>> +{
>> + struct kfd_topology_device *dev;
>> +
>> + list_for_each_entry(dev, &topology_device_list, list)
>> + kfd_remove_sysfs_node_entry(dev);
>> +}
>> +
>> +static int kfd_topology_update_sysfs(void)
>> +{
>> + int ret;
>> +
>> + pr_info("Creating topology SYSFS entries\n");
>> + if (sys_props.kobj_topology == 0) {
>> + sys_props.kobj_topology = kfd_alloc_struct(sys_props.kobj_topology);
>> + if (!sys_props.kobj_topology)
>> + return -ENOMEM;
>> +
>> + ret = kobject_init_and_add(sys_props.kobj_topology,
>> + &sysprops_type, &kfd_device->kobj,
>> + "topology");
>> + if (ret < 0)
>> + return ret;
>> +
>> + sys_props.kobj_nodes = kobject_create_and_add("nodes",
>> + sys_props.kobj_topology);
>> + if (!sys_props.kobj_nodes)
>> + return -ENOMEM;
>> +
>> + sys_props.attr_genid.name = "generation_id";
>> + sys_props.attr_genid.mode = KFD_SYSFS_FILE_MODE;
>> + sysfs_attr_init(&sys_props.attr_genid);
>> + ret = sysfs_create_file(sys_props.kobj_topology,
>> + &sys_props.attr_genid);
>> + if (ret < 0)
>> + return ret;
>> +
>> + sys_props.attr_props.name = "system_properties";
>> + sys_props.attr_props.mode = KFD_SYSFS_FILE_MODE;
>> + sysfs_attr_init(&sys_props.attr_props);
>> + ret = sysfs_create_file(sys_props.kobj_topology,
>> + &sys_props.attr_props);
>> + if (ret < 0)
>> + return ret;
>> + }
>> +
>> + kfd_remove_sysfs_node_tree();
>> +
>> + return kfd_build_sysfs_node_tree();
>> +}
>> +
>> +static void kfd_topology_release_sysfs(void)
>> +{
>> + kfd_remove_sysfs_node_tree();
>> + if (sys_props.kobj_topology) {
>> + sysfs_remove_file(sys_props.kobj_topology,
>> + &sys_props.attr_genid);
>> + sysfs_remove_file(sys_props.kobj_topology,
>> + &sys_props.attr_props);
>> + if (sys_props.kobj_nodes) {
>> + kobject_del(sys_props.kobj_nodes);
>> + kobject_put(sys_props.kobj_nodes);
>> + sys_props.kobj_nodes = 0;
>> + }
>> + kobject_del(sys_props.kobj_topology);
>> + kobject_put(sys_props.kobj_topology);
>> + sys_props.kobj_topology = 0;
>> + }
>> +}
>> +
>> +int kfd_topology_init(void)
>> +{
>> + void *crat_image = 0;
>> + size_t image_size = 0;
>> + int ret;
>> +
>> + /*
>> + * Initialize the head for the topology device list
>> + */
>> + INIT_LIST_HEAD(&topology_device_list);
>> + init_rwsem(&topology_lock);
>> + topology_crat_parsed = 0;
>> +
>> + memset(&sys_props, 0, sizeof(sys_props));
>> +
>> + /*
>> + * Get the CRAT image from the ACPI
>> + */
>> + ret = kfd_topology_get_crat_acpi(crat_image, &image_size);
>> + if (ret == 0 && image_size > 0) {
>> + pr_info("Found CRAT image with size=%zd\n", image_size);
>> + crat_image = kmalloc(image_size, GFP_KERNEL);
>> + if (!crat_image) {
>> + ret = -ENOMEM;
>> + pr_err("No memory for allocating CRAT image\n");
>> + goto err;
>> + }
>> + ret = kfd_topology_get_crat_acpi(crat_image, &image_size);
>> +
>> + if (ret == 0) {
>> + down_write(&topology_lock);
>> + ret = kfd_parse_crat_table(crat_image);
>> + if (ret == 0)
>> + ret = kfd_topology_update_sysfs();
>> + up_write(&topology_lock);
>> + } else {
>> + pr_err("Couldn't get CRAT table size from ACPI\n");
>> + }
>> + kfree(crat_image);
>> + } else if (ret == -ENODATA) {
>> + ret = 0;
>> + } else {
>> + pr_err("Couldn't get CRAT table size from ACPI\n");
>> + }
>> +
>> +err:
>> + pr_info("Finished initializing topology ret=%d\n", ret);
>> + return ret;
>> +}
>> +
>> +void kfd_topology_shutdown(void)
>> +{
>> + kfd_topology_release_sysfs();
>> + kfd_release_live_view();
>> +}
>> +
>> +static void kfd_debug_print_topology(void)
>> +{
>> + struct kfd_topology_device *dev;
>> + uint32_t i = 0;
>> +
>> + pr_info("DEBUG PRINT OF TOPOLOGY:");
>> + list_for_each_entry(dev, &topology_device_list, list) {
>> + pr_info("Node: %d\n", i);
>> + pr_info("\tGPU assigned: %s\n", (dev->gpu ? "yes" : "no"));
>> + pr_info("\tCPU count: %d\n", dev->node_props.cpu_cores_count);
>> + pr_info("\tSIMD count: %d", dev->node_props.simd_count);
>> + i++;
>> + }
>> +}
>> +
>> +static uint32_t kfd_generate_gpu_id(struct kfd_dev *gpu)
>> +{
>> + uint32_t hashout;
>> + uint32_t buf[7];
>> + int i;
>> +
>> + if (!gpu)
>> + return 0;
>> +
>> + buf[0] = gpu->pdev->devfn;
>> + buf[1] = gpu->pdev->subsystem_vendor;
>> + buf[2] = gpu->pdev->subsystem_device;
>> + buf[3] = gpu->pdev->device;
>> + buf[4] = gpu->pdev->bus->number;
>> + buf[5] = (uint32_t)(kfd2kgd->get_vmem_size(gpu->kgd) & 0xffffffff);
>> + buf[6] = (uint32_t)(kfd2kgd->get_vmem_size(gpu->kgd) >> 32);
>> +
>> + for (i = 0, hashout = 0; i < 7; i++)
>> + hashout ^= hash_32(buf[i], KFD_GPU_ID_HASH_WIDTH);
>> +
>> + return hashout;
>> +}
>> +
>> +static struct kfd_topology_device *kfd_assign_gpu(struct kfd_dev *gpu)
>> +{
>> + struct kfd_topology_device *dev;
>> + struct kfd_topology_device *out_dev = 0;
>> +
>> + BUG_ON(!gpu);
>> +
>> + list_for_each_entry(dev, &topology_device_list, list)
>> + if (dev->gpu == 0 && dev->node_props.simd_count > 0) {
>> + dev->gpu = gpu;
>> + out_dev = dev;
>> + break;
>> + }
>> +
>> + return out_dev;
>> +}
>> +
>> +static void kfd_notify_gpu_change(uint32_t gpu_id, int arrival)
>> +{
>> + /*
>> + * TODO: Generate an event for thunk about the arrival/removal
>> + * of the GPU
>> + */
>> +}
>> +
>> +int kfd_topology_add_device(struct kfd_dev *gpu)
>> +{
>> + uint32_t gpu_id;
>> + struct kfd_topology_device *dev;
>> + int res;
>> +
>> + BUG_ON(!gpu);
>> +
>> + gpu_id = kfd_generate_gpu_id(gpu);
>> +
>> + pr_debug("kfd: Adding new GPU (ID: 0x%x) to topology\n", gpu_id);
>> +
>> + down_write(&topology_lock);
>> + /*
>> + * Try to assign the GPU to existing topology device (generated from
>> + * CRAT table
>> + */
>> + dev = kfd_assign_gpu(gpu);
>> + if (!dev) {
>> + pr_info("GPU was not found in the current topology. Extending.\n");
>> + kfd_debug_print_topology();
>> + dev = kfd_create_topology_device();
>> + if (!dev) {
>> + res = -ENOMEM;
>> + goto err;
>> + }
>> + dev->gpu = gpu;
>> +
>> + /*
>> + * TODO: Make a call to retrieve topology information from the
>> + * GPU vBIOS
>> + */
>> +
>> + /*
>> + * Update the SYSFS tree, since we added another topology device
>> + */
>> + if (kfd_topology_update_sysfs() < 0)
>> + kfd_topology_release_sysfs();
>> +
>> + }
>> +
>> + dev->gpu_id = gpu_id;
>> + gpu->id = gpu_id;
>> + dev->node_props.vendor_id = gpu->pdev->vendor;
>> + dev->node_props.device_id = gpu->pdev->device;
>> + dev->node_props.location_id = (gpu->pdev->bus->number << 24) +
>> + (gpu->pdev->devfn & 0xffffff);
>> + /*
>> + * TODO: Retrieve max engine clock values from KGD
>> + */
>> +
>> + res = 0;
>> +
>> +err:
>> + up_write(&topology_lock);
>> +
>> + if (res == 0)
>> + kfd_notify_gpu_change(gpu_id, 1);
>> +
>> + return res;
>> +}
>> +
>> +int kfd_topology_remove_device(struct kfd_dev *gpu)
>> +{
>> + struct kfd_topology_device *dev;
>> + uint32_t gpu_id;
>> + int res = -ENODEV;
>> +
>> + BUG_ON(!gpu);
>> +
>> + down_write(&topology_lock);
>> +
>> + list_for_each_entry(dev, &topology_device_list, list)
>> + if (dev->gpu == gpu) {
>> + gpu_id = dev->gpu_id;
>> + kfd_remove_sysfs_node_entry(dev);
>> + kfd_release_topology_device(dev);
>> + res = 0;
>> + if (kfd_topology_update_sysfs() < 0)
>> + kfd_topology_release_sysfs();
>> + break;
>> + }
>> +
>> + up_write(&topology_lock);
>> +
>> + if (res == 0)
>> + kfd_notify_gpu_change(gpu_id, 0);
>> +
>> + return res;
>> +}
>> +
>> +/*
>> + * When idx is out of bounds, the function will return NULL
>> + */
>> +struct kfd_dev *kfd_topology_enum_kfd_devices(uint8_t idx)
>> +{
>> +
>> + struct kfd_topology_device *top_dev;
>> + struct kfd_dev *device = NULL;
>> + uint8_t device_idx = 0;
>> +
>> + down_read(&topology_lock);
>> +
>> + list_for_each_entry(top_dev, &topology_device_list, list) {
>> + if (device_idx == idx) {
>> + device = top_dev->gpu;
>> + break;
>> + }
>> +
>> + device_idx++;
>> + }
>> +
>> + up_read(&topology_lock);
>> +
>> + return device;
>> +
>> +}
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_topology.h b/drivers/gpu/drm/radeon/amdkfd/kfd_topology.h
>> new file mode 100644
>> index 0000000..989624b
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_topology.h
>> @@ -0,0 +1,168 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#ifndef __KFD_TOPOLOGY_H__
>> +#define __KFD_TOPOLOGY_H__
>> +
>> +#include <linux/types.h>
>> +#include <linux/list.h>
>> +#include "kfd_priv.h"
>> +
>> +#define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 128
>> +
>> +#define HSA_CAP_HOT_PLUGGABLE 0x00000001
>> +#define HSA_CAP_ATS_PRESENT 0x00000002
>> +#define HSA_CAP_SHARED_WITH_GRAPHICS 0x00000004
>> +#define HSA_CAP_QUEUE_SIZE_POW2 0x00000008
>> +#define HSA_CAP_QUEUE_SIZE_32BIT 0x00000010
>> +#define HSA_CAP_QUEUE_IDLE_EVENT 0x00000020
>> +#define HSA_CAP_VA_LIMIT 0x00000040
>> +#define HSA_CAP_WATCH_POINTS_SUPPORTED 0x00000080
>> +#define HSA_CAP_WATCH_POINTS_TOTALBITS_MASK 0x00000f00
>> +#define HSA_CAP_WATCH_POINTS_TOTALBITS_SHIFT 8
>> +#define HSA_CAP_RESERVED 0xfffff000
>> +
>> +struct kfd_node_properties {
>> + uint32_t cpu_cores_count;
>> + uint32_t simd_count;
>> + uint32_t mem_banks_count;
>> + uint32_t caches_count;
>> + uint32_t io_links_count;
>> + uint32_t cpu_core_id_base;
>> + uint32_t simd_id_base;
>> + uint32_t capability;
>> + uint32_t max_waves_per_simd;
>> + uint32_t lds_size_in_kb;
>> + uint32_t gds_size_in_kb;
>> + uint32_t wave_front_size;
>> + uint32_t array_count;
>> + uint32_t simd_arrays_per_engine;
>> + uint32_t cu_per_simd_array;
>> + uint32_t simd_per_cu;
>> + uint32_t max_slots_scratch_cu;
>> + uint32_t engine_id;
>> + uint32_t vendor_id;
>> + uint32_t device_id;
>> + uint32_t location_id;
>> + uint32_t max_engine_clk_fcompute;
>> + uint32_t max_engine_clk_ccompute;
>> + uint16_t marketing_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
>> +};
>> +
>> +#define HSA_MEM_HEAP_TYPE_SYSTEM 0
>> +#define HSA_MEM_HEAP_TYPE_FB_PUBLIC 1
>> +#define HSA_MEM_HEAP_TYPE_FB_PRIVATE 2
>> +#define HSA_MEM_HEAP_TYPE_GPU_GDS 3
>> +#define HSA_MEM_HEAP_TYPE_GPU_LDS 4
>> +#define HSA_MEM_HEAP_TYPE_GPU_SCRATCH 5
>> +
>> +#define HSA_MEM_FLAGS_HOT_PLUGGABLE 0x00000001
>> +#define HSA_MEM_FLAGS_NON_VOLATILE 0x00000002
>> +#define HSA_MEM_FLAGS_RESERVED 0xfffffffc
>> +
>> +struct kfd_mem_properties {
>> + struct list_head list;
>> + uint32_t heap_type;
>> + uint64_t size_in_bytes;
>> + uint32_t flags;
>> + uint32_t width;
>> + uint32_t mem_clk_max;
>> + struct kobject *kobj;
>> + struct attribute attr;
>> +};
>> +
>> +#define KFD_TOPOLOGY_CPU_SIBLINGS 256
>> +
>> +#define HSA_CACHE_TYPE_DATA 0x00000001
>> +#define HSA_CACHE_TYPE_INSTRUCTION 0x00000002
>> +#define HSA_CACHE_TYPE_CPU 0x00000004
>> +#define HSA_CACHE_TYPE_HSACU 0x00000008
>> +#define HSA_CACHE_TYPE_RESERVED 0xfffffff0
>> +
>> +struct kfd_cache_properties {
>> + struct list_head list;
>> + uint32_t processor_id_low;
>> + uint32_t cache_level;
>> + uint32_t cache_size;
>> + uint32_t cacheline_size;
>> + uint32_t cachelines_per_tag;
>> + uint32_t cache_assoc;
>> + uint32_t cache_latency;
>> + uint32_t cache_type;
>> + uint8_t sibling_map[KFD_TOPOLOGY_CPU_SIBLINGS];
>> + struct kobject *kobj;
>> + struct attribute attr;
>> +};
>> +
>> +struct kfd_iolink_properties {
>> + struct list_head list;
>> + uint32_t iolink_type;
>> + uint32_t ver_maj;
>> + uint32_t ver_min;
>> + uint32_t node_from;
>> + uint32_t node_to;
>> + uint32_t weight;
>> + uint32_t min_latency;
>> + uint32_t max_latency;
>> + uint32_t min_bandwidth;
>> + uint32_t max_bandwidth;
>> + uint32_t rec_transfer_size;
>> + uint32_t flags;
>> + struct kobject *kobj;
>> + struct attribute attr;
>> +};
>> +
>> +struct kfd_topology_device {
>> + struct list_head list;
>> + uint32_t gpu_id;
>> + struct kfd_node_properties node_props;
>> + uint32_t mem_bank_count;
>> + struct list_head mem_props;
>> + uint32_t cache_count;
>> + struct list_head cache_props;
>> + uint32_t io_link_count;
>> + struct list_head io_link_props;
>> + struct kfd_dev *gpu;
>> + struct kobject *kobj_node;
>> + struct kobject *kobj_mem;
>> + struct kobject *kobj_cache;
>> + struct kobject *kobj_iolink;
>> + struct attribute attr_gpuid;
>> + struct attribute attr_name;
>> + struct attribute attr_props;
>> +};
>> +
>> +struct kfd_system_properties {
>> + uint32_t num_devices; /* Number of H-NUMA nodes */
>> + uint32_t generation_count;
>> + uint64_t platform_oem;
>> + uint64_t platform_id;
>> + uint64_t platform_rev;
>> + struct kobject *kobj_topology;
>> + struct kobject *kobj_nodes;
>> + struct attribute attr_genid;
>> + struct attribute attr_props;
>> +};
>> +
>> +
>> +
>> +#endif /* __KFD_TOPOLOGY_H__ */
>> --
>> 1.9.1
>>
^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2 10/25] amdkfd: Add topology module to amdkfd
2014-07-27 11:15 ` Oded Gabbay
@ 2014-07-30 12:10 ` Oded Gabbay
0 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-07-30 12:10 UTC (permalink / raw)
To: Jerome Glisse, linux-kernel, dri-devel
Cc: Andrew Lewycky, Michel Dänzer, Alexey Skidanov, Dave Airlie,
Andrew Morton
On 27/07/14 14:15, Oded Gabbay wrote:
> On 21/07/14 01:37, Jerome Glisse wrote:
>> On Thu, Jul 17, 2014 at 04:29:17PM +0300, Oded Gabbay wrote:
>>> From: Evgeny Pinchuk <evgeny.pinchuk@amd.com>
>>>
>>> This patch adds the topology module to the driver. The topology is exposed to
>>> userspace through the sysfs.
>>>
>>> The calls to add and remove a device to/from topology are done by the radeon
>>> driver.
>>
>> So overall we already said that we do not want to see the CPU architecture
>> re-exposed by HSA in its own format. This patch is NACKed. Only expose additional,
>> not-already-existing information, and also follow the number one rule of sysfs,
>> which is one value -> one file.
>>
>> I understand the temptation to re-expose the CPU topology in your own way to
>> make life simpler, but there is already an API for this, so please use what exists
>> today, and if there are shortcomings then I am sure they can be fixed.
>>
>> See :
>>
>> /sys/devices/system/cpu/cpu*
>>
> Agreed and we will change the code. Hopefully it will be in v3, although we may
> release v3 early this week and release v4 with this change next week.
>
> Oded
So it seems that I jumped the gun here.
The CPU information, that is provided in the topology section of the amdkfd
driver, is extracted from the CRAT table. Unlike the CPU information located in
/sys/devices/system/cpu/cpu*, which is extracted from the SRAT table.
While the CPU information provided by the CRAT and the SRAT tables might be
identical, the node topology might be different. The SRAT table contains the
topology of CPU nodes only. The CRAT table contains the topology of CPU and GPU
nodes together (and can be interleaved). For example CPU node 1 in SRAT can be
CPU node 3 in CRAT. Furthermore, it's worth mentioning that the CRAT table
contains only HSA-compatible nodes (nodes which are compliant with the HSA spec).
To recap, amdkfd exposes a different kind of topology than the one exposed by
/sys/devices/system/cpu/cpu* even though it may contain similar information.
Oded
>>>
>>> Signed-off-by: Evgeny Pinchuk <evgeny.pinchuk@amd.com>
>>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>>> ---
>>> drivers/gpu/drm/radeon/amdkfd/Makefile | 2 +-
>>> drivers/gpu/drm/radeon/amdkfd/kfd_crat.h | 294 +++++++
>>> drivers/gpu/drm/radeon/amdkfd/kfd_device.c | 7 +
>>> drivers/gpu/drm/radeon/amdkfd/kfd_module.c | 7 +
>>> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 17 +
>>> drivers/gpu/drm/radeon/amdkfd/kfd_topology.c | 1207 ++++++++++++++++++++++++++
>>> drivers/gpu/drm/radeon/amdkfd/kfd_topology.h | 168 ++++
>>> 7 files changed, 1701 insertions(+), 1 deletion(-)
>>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_crat.h
>>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_topology.c
>>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_topology.h
>>>
>>> diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile
>>> b/drivers/gpu/drm/radeon/amdkfd/Makefile
>>> index 9564e75..08ecfcd 100644
>>> --- a/drivers/gpu/drm/radeon/amdkfd/Makefile
>>> +++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
>>> @@ -4,6 +4,6 @@
>>>
>>> ccflags-y := -Iinclude/drm
>>>
>>> -amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o
>>> +amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o
>>>
>>> obj-$(CONFIG_HSA_RADEON) += amdkfd.o
>>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_crat.h
>>> b/drivers/gpu/drm/radeon/amdkfd/kfd_crat.h
>>> new file mode 100644
>>> index 0000000..a374fa3
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_crat.h
>>> @@ -0,0 +1,294 @@
>>> +/*
>>> + * Copyright 2014 Advanced Micro Devices, Inc.
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>> + * copy of this software and associated documentation files (the "Software"),
>>> + * to deal in the Software without restriction, including without limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>> + * Software is furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice shall be included in
>>> + * all copies or substantial portions of the Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>>> + * OTHER DEALINGS IN THE SOFTWARE.
>>> + */
>>> +
>>> +#ifndef KFD_CRAT_H_INCLUDED
>>> +#define KFD_CRAT_H_INCLUDED
>>> +
>>> +#include <linux/types.h>
>>> +
>>> +#pragma pack(1)
>>
>> No pragma
>>
>>> +
>>> +/*
>>> + * 4CC signature values for the CRAT and CDIT ACPI tables
>>> + */
>>> +
>>> +#define CRAT_SIGNATURE "CRAT"
>>> +#define CDIT_SIGNATURE "CDIT"
>>> +
>>> +/*
>>> + * Component Resource Association Table (CRAT)
>>> + */
>>> +
>>> +#define CRAT_OEMID_LENGTH 6
>>> +#define CRAT_OEMTABLEID_LENGTH 8
>>> +#define CRAT_RESERVED_LENGTH 6
>>> +
>>> +#define CRAT_OEMID_64BIT_MASK ((1ULL << (CRAT_OEMID_LENGTH * 8)) - 1)
>>> +
>>> +struct crat_header {
>>> + uint32_t signature;
>>> + uint32_t length;
>>> + uint8_t revision;
>>> + uint8_t checksum;
>>> + uint8_t oem_id[CRAT_OEMID_LENGTH];
>>> + uint8_t oem_table_id[CRAT_OEMTABLEID_LENGTH];
>>> + uint32_t oem_revision;
>>> + uint32_t creator_id;
>>> + uint32_t creator_revision;
>>> + uint32_t total_entries;
>>> + uint16_t num_domains;
>>> + uint8_t reserved[CRAT_RESERVED_LENGTH];
>>> +};
>>> +
>>> +/*
>>> + * The header structure is immediately followed by total_entries of the
>>> + * data definitions
>>> + */
>>> +
>>> +/*
>>> + * The currently defined subtype entries in the CRAT
>>> + */
>>> +#define CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY 0
>>> +#define CRAT_SUBTYPE_MEMORY_AFFINITY 1
>>> +#define CRAT_SUBTYPE_CACHE_AFFINITY 2
>>> +#define CRAT_SUBTYPE_TLB_AFFINITY 3
>>> +#define CRAT_SUBTYPE_CCOMPUTE_AFFINITY 4
>>> +#define CRAT_SUBTYPE_IOLINK_AFFINITY 5
>>> +#define CRAT_SUBTYPE_MAX 6
>>> +
>>> +#define CRAT_SIBLINGMAP_SIZE 32
>>> +
>>> +/*
>>> + * ComputeUnit Affinity structure and definitions
>>> + */
>>> +#define CRAT_CU_FLAGS_ENABLED 0x00000001
>>> +#define CRAT_CU_FLAGS_HOT_PLUGGABLE 0x00000002
>>> +#define CRAT_CU_FLAGS_CPU_PRESENT 0x00000004
>>> +#define CRAT_CU_FLAGS_GPU_PRESENT 0x00000008
>>> +#define CRAT_CU_FLAGS_IOMMU_PRESENT 0x00000010
>>> +#define CRAT_CU_FLAGS_RESERVED 0xffffffe0
>>> +
>>> +#define CRAT_COMPUTEUNIT_RESERVED_LENGTH 4
>>> +
>>> +struct crat_subtype_computeunit {
>>> + uint8_t type;
>>> + uint8_t length;
>>> + uint16_t reserved;
>>> + uint32_t flags;
>>> + uint32_t proximity_domain;
>>> + uint32_t processor_id_low;
>>> + uint16_t num_cpu_cores;
>>> + uint16_t num_simd_cores;
>>> + uint16_t max_waves_simd;
>>> + uint16_t io_count;
>>> + uint16_t hsa_capability;
>>> + uint16_t lds_size_in_kb;
>>> + uint8_t wave_front_size;
>>> + uint8_t num_banks;
>>> + uint16_t micro_engine_id;
>>> + uint8_t num_arrays;
>>> + uint8_t num_cu_per_array;
>>> + uint8_t num_simd_per_cu;
>>> + uint8_t max_slots_scatch_cu;
>>> + uint8_t reserved2[CRAT_COMPUTEUNIT_RESERVED_LENGTH];
>>> +};
>>> +
>>> +/*
>>> + * HSA Memory Affinity structure and definitions
>>> + */
>>> +#define CRAT_MEM_FLAGS_ENABLED 0x00000001
>>> +#define CRAT_MEM_FLAGS_HOT_PLUGGABLE 0x00000002
>>> +#define CRAT_MEM_FLAGS_NON_VOLATILE 0x00000004
>>> +#define CRAT_MEM_FLAGS_RESERVED 0xfffffff8
>>> +
>>> +#define CRAT_MEMORY_RESERVED_LENGTH 8
>>> +
>>> +struct crat_subtype_memory {
>>> + uint8_t type;
>>> + uint8_t length;
>>> + uint16_t reserved;
>>> + uint32_t flags;
>>> + uint32_t promixity_domain;
>>> + uint32_t base_addr_low;
>>> + uint32_t base_addr_high;
>>> + uint32_t length_low;
>>> + uint32_t length_high;
>>> + uint32_t width;
>>> + uint8_t reserved2[CRAT_MEMORY_RESERVED_LENGTH];
>>> +};
>>> +
>>> +/*
>>> + * HSA Cache Affinity structure and definitions
>>> + */
>>> +#define CRAT_CACHE_FLAGS_ENABLED 0x00000001
>>> +#define CRAT_CACHE_FLAGS_DATA_CACHE 0x00000002
>>> +#define CRAT_CACHE_FLAGS_INST_CACHE 0x00000004
>>> +#define CRAT_CACHE_FLAGS_CPU_CACHE 0x00000008
>>> +#define CRAT_CACHE_FLAGS_SIMD_CACHE 0x00000010
>>> +#define CRAT_CACHE_FLAGS_RESERVED 0xffffffe0
>>> +
>>> +#define CRAT_CACHE_RESERVED_LENGTH 8
>>> +
>>> +struct crat_subtype_cache {
>>> + uint8_t type;
>>> + uint8_t length;
>>> + uint16_t reserved;
>>> + uint32_t flags;
>>> + uint32_t processor_id_low;
>>> + uint8_t sibling_map[CRAT_SIBLINGMAP_SIZE];
>>> + uint32_t cache_size;
>>> + uint8_t cache_level;
>>> + uint8_t lines_per_tag;
>>> + uint16_t cache_line_size;
>>> + uint8_t associativity;
>>> + uint8_t cache_properties;
>>> + uint16_t cache_latency;
>>> + uint8_t reserved2[CRAT_CACHE_RESERVED_LENGTH];
>>> +};
>>> +
>>> +/*
>>> + * HSA TLB Affinity structure and definitions
>>> + */
>>> +#define CRAT_TLB_FLAGS_ENABLED 0x00000001
>>> +#define CRAT_TLB_FLAGS_DATA_TLB 0x00000002
>>> +#define CRAT_TLB_FLAGS_INST_TLB 0x00000004
>>> +#define CRAT_TLB_FLAGS_CPU_TLB 0x00000008
>>> +#define CRAT_TLB_FLAGS_SIMD_TLB 0x00000010
>>> +#define CRAT_TLB_FLAGS_RESERVED 0xffffffe0
>>> +
>>> +#define CRAT_TLB_RESERVED_LENGTH 4
>>> +
>>> +struct crat_subtype_tlb {
>>> + uint8_t type;
>>> + uint8_t length;
>>> + uint16_t reserved;
>>> + uint32_t flags;
>>> + uint32_t processor_id_low;
>>> + uint8_t sibling_map[CRAT_SIBLINGMAP_SIZE];
>>> + uint32_t tlb_level;
>>> + uint8_t data_tlb_associativity_2mb;
>>> + uint8_t data_tlb_size_2mb;
>>> + uint8_t instruction_tlb_associativity_2mb;
>>> + uint8_t instruction_tlb_size_2mb;
>>> + uint8_t data_tlb_associativity_4k;
>>> + uint8_t data_tlb_size_4k;
>>> + uint8_t instruction_tlb_associativity_4k;
>>> + uint8_t instruction_tlb_size_4k;
>>> + uint8_t data_tlb_associativity_1gb;
>>> + uint8_t data_tlb_size_1gb;
>>> + uint8_t instruction_tlb_associativity_1gb;
>>> + uint8_t instruction_tlb_size_1gb;
>>> + uint8_t reserved2[CRAT_TLB_RESERVED_LENGTH];
>>> +};
>>> +
>>> +/*
>>> + * HSA CCompute/APU Affinity structure and definitions
>>> + */
>>> +#define CRAT_CCOMPUTE_FLAGS_ENABLED 0x00000001
>>> +#define CRAT_CCOMPUTE_FLAGS_RESERVED 0xfffffffe
>>> +
>>> +#define CRAT_CCOMPUTE_RESERVED_LENGTH 16
>>> +
>>> +struct crat_subtype_ccompute {
>>> + uint8_t type;
>>> + uint8_t length;
>>> + uint16_t reserved;
>>> + uint32_t flags;
>>> + uint32_t processor_id_low;
>>> + uint8_t sibling_map[CRAT_SIBLINGMAP_SIZE];
>>> + uint32_t apu_size;
>>> + uint8_t reserved2[CRAT_CCOMPUTE_RESERVED_LENGTH];
>>> +};
>>> +
>>> +/*
>>> + * HSA IO Link Affinity structure and definitions
>>> + */
>>> +#define CRAT_IOLINK_FLAGS_ENABLED 0x00000001
>>> +#define CRAT_IOLINK_FLAGS_COHERENCY 0x00000002
>>> +#define CRAT_IOLINK_FLAGS_RESERVED 0xfffffffc
>>> +
>>> +/*
>>> + * IO interface types
>>> + */
>>> +#define CRAT_IOLINK_TYPE_UNDEFINED 0
>>> +#define CRAT_IOLINK_TYPE_HYPERTRANSPORT 1
>>> +#define CRAT_IOLINK_TYPE_PCIEXPRESS 2
>>> +#define CRAT_IOLINK_TYPE_OTHER 3
>>> +#define CRAT_IOLINK_TYPE_MAX 255
>>> +
>>> +#define CRAT_IOLINK_RESERVED_LENGTH 24
>>> +
>>> +struct crat_subtype_iolink {
>>> + uint8_t type;
>>> + uint8_t length;
>>> + uint16_t reserved;
>>> + uint32_t flags;
>>> + uint32_t proximity_domain_from;
>>> + uint32_t proximity_domain_to;
>>> + uint8_t io_interface_type;
>>> + uint8_t version_major;
>>> + uint16_t version_minor;
>>> + uint32_t minimum_latency;
>>> + uint32_t maximum_latency;
>>> + uint32_t minimum_bandwidth_mbs;
>>> + uint32_t maximum_bandwidth_mbs;
>>> + uint32_t recommended_transfer_size;
>>> + uint8_t reserved2[CRAT_IOLINK_RESERVED_LENGTH];
>>> +};
>>> +
>>> +/*
>>> + * HSA generic sub-type header
>>> + */
>>> +
>>> +#define CRAT_SUBTYPE_FLAGS_ENABLED 0x00000001
>>> +
>>> +struct crat_subtype_generic {
>>> + uint8_t type;
>>> + uint8_t length;
>>> + uint16_t reserved;
>>> + uint32_t flags;
>>> +};
>>> +
>>> +/*
>>> + * Component Locality Distance Information Table (CDIT)
>>> + */
>>> +#define CDIT_OEMID_LENGTH 6
>>> +#define CDIT_OEMTABLEID_LENGTH 8
>>> +
>>> +struct cdit_header {
>>> + uint32_t signature;
>>> + uint32_t length;
>>> + uint8_t revision;
>>> + uint8_t checksum;
>>> + uint8_t oem_id[CDIT_OEMID_LENGTH];
>>> + uint8_t oem_table_id[CDIT_OEMTABLEID_LENGTH];
>>> + uint32_t oem_revision;
>>> + uint32_t creator_id;
>>> + uint32_t creator_revision;
>>> + uint32_t total_entries;
>>> + uint16_t num_domains;
>>> + uint8_t entry[1];
>>> +};
>>> +
>>> +#pragma pack()
>>> +
>>> +#endif /* KFD_CRAT_H_INCLUDED */
>>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>>> b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>>> index dd63ce09..4138694 100644
>>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>>> @@ -100,6 +100,9 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>>> {
>>> kfd->shared_resources = *gpu_resources;
>>>
>>> + if (kfd_topology_add_device(kfd) != 0)
>>> + return false;
>>> +
>>> kfd->init_complete = true;
>>> dev_info(kfd_device, "added device (%x:%x)\n", kfd->pdev->vendor,
>>> kfd->pdev->device);
>>> @@ -109,6 +112,10 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>>>
>>> void kgd2kfd_device_exit(struct kfd_dev *kfd)
>>> {
>>> + int err = kfd_topology_remove_device(kfd);
>>> +
>>> + BUG_ON(err != 0);
>>> +
>>> kfree(kfd);
>>> }
>>>
>>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>>> b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>>> index c7faac6..c51f981 100644
>>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>>> @@ -73,16 +73,23 @@ static int __init kfd_module_init(void)
>>> if (err < 0)
>>> goto err_ioctl;
>>>
>>> + err = kfd_topology_init();
>>> + if (err < 0)
>>> + goto err_topology;
>>> +
>>> dev_info(kfd_device, "Initialized module\n");
>>>
>>> return 0;
>>>
>>> +err_topology:
>>> + kfd_chardev_exit();
>>> err_ioctl:
>>> return err;
>>> }
>>>
>>> static void __exit kfd_module_exit(void)
>>> {
>>> + kfd_topology_shutdown();
>>> kfd_chardev_exit();
>>> dev_info(kfd_device, "Removed module\n");
>>> }
>>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>>> b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>>> index 05e892f..b391e24 100644
>>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>>> @@ -32,6 +32,14 @@
>>> #include <linux/spinlock.h>
>>> #include "../radeon_kfd.h"
>>>
>>> +#define KFD_SYSFS_FILE_MODE 0444
>>> +
>>> +/* GPU ID hash width in bits */
>>> +#define KFD_GPU_ID_HASH_WIDTH 16
>>> +
>>> +/* Macro for allocating structures */
>>> +#define kfd_alloc_struct(ptr_to_struct) ((typeof(ptr_to_struct))
>>> kzalloc(sizeof(*ptr_to_struct), GFP_KERNEL))
>>> +
>>> struct kfd_device_info {
>>> const struct kfd_scheduler_class *scheduler_class;
>>> unsigned int max_pasid_bits;
>>> @@ -71,6 +79,15 @@ struct kfd_process {
>>>
>>> extern struct device *kfd_device;
>>>
>>> +/* Topology */
>>> +int kfd_topology_init(void);
>>> +void kfd_topology_shutdown(void);
>>> +int kfd_topology_add_device(struct kfd_dev *gpu);
>>> +int kfd_topology_remove_device(struct kfd_dev *gpu);
>>> +struct kfd_dev *kfd_device_by_id(uint32_t gpu_id);
>>> +struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev);
>>> +struct kfd_dev *kfd_topology_enum_kfd_devices(uint8_t idx);
>>> +
>>> /* Interrupts */
>>> void kgd2kfd_interrupt(struct kfd_dev *dev, const void *ih_ring_entry);
>>>
>>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_topology.c
>>> b/drivers/gpu/drm/radeon/amdkfd/kfd_topology.c
>>> new file mode 100644
>>> index 0000000..30da4c3
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_topology.c
>>> @@ -0,0 +1,1207 @@
>>> +/*
>>> + * Copyright 2014 Advanced Micro Devices, Inc.
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>> + * copy of this software and associated documentation files (the "Software"),
>>> + * to deal in the Software without restriction, including without limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>> + * Software is furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice shall be included in
>>> + * all copies or substantial portions of the Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>>> + * OTHER DEALINGS IN THE SOFTWARE.
>>> + */
>>> +
>>> +#include <linux/types.h>
>>> +#include <linux/kernel.h>
>>> +#include <linux/pci.h>
>>> +#include <linux/errno.h>
>>> +#include <linux/acpi.h>
>>> +#include <linux/hash.h>
>>> +#include <linux/cpufreq.h>
>>> +
>>> +#include "kfd_priv.h"
>>> +#include "kfd_crat.h"
>>> +#include "kfd_topology.h"
>>> +
>>> +static struct list_head topology_device_list;
>>> +static int topology_crat_parsed;
>>> +static struct kfd_system_properties sys_props;
>>> +
>>> +static DECLARE_RWSEM(topology_lock);
>>> +
>>> +struct kfd_dev *kfd_device_by_id(uint32_t gpu_id)
>>> +{
>>> + struct kfd_topology_device *top_dev;
>>> + struct kfd_dev *device = NULL;
>>> +
>>> + down_read(&topology_lock);
>>> +
>>> + list_for_each_entry(top_dev, &topology_device_list, list)
>>> + if (top_dev->gpu_id == gpu_id) {
>>> + device = top_dev->gpu;
>>> + break;
>>> + }
>>> +
>>> + up_read(&topology_lock);
>>> +
>>> + return device;
>>> +}
>>> +
>>> +struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev)
>>> +{
>>> + struct kfd_topology_device *top_dev;
>>> + struct kfd_dev *device = NULL;
>>> +
>>> + down_read(&topology_lock);
>>> +
>>> + list_for_each_entry(top_dev, &topology_device_list, list)
>>> + if (top_dev->gpu->pdev == pdev) {
>>> + device = top_dev->gpu;
>>> + break;
>>> + }
>>> +
>>> + up_read(&topology_lock);
>>> +
>>> + return device;
>>> +}
>>> +
>>> +static int kfd_topology_get_crat_acpi(void *crat_image, size_t *size)
>>> +{
>>> + struct acpi_table_header *crat_table;
>>> + acpi_status status;
>>> +
>>> + if (!size)
>>> + return -EINVAL;
>>> +
>>> + /*
>>> + * Fetch the CRAT table from ACPI
>>> + */
>>> + status = acpi_get_table(CRAT_SIGNATURE, 0, &crat_table);
>>> + if (status == AE_NOT_FOUND) {
>>> + pr_warn("CRAT table not found\n");
>>> + return -ENODATA;
>>> + } else if (ACPI_FAILURE(status)) {
>>> + const char *err = acpi_format_exception(status);
>>> +
>>> + pr_err("CRAT table error: %s\n", err);
>>> + return -EINVAL;
>>> + }
>>> +
>>> + if (*size >= crat_table->length && crat_image != 0)
>>> + memcpy(crat_image, crat_table, crat_table->length);
>>> +
>>> + *size = crat_table->length;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
>>> + struct crat_subtype_computeunit *cu)
>>> +{
>>> + BUG_ON(!dev);
>>> + BUG_ON(!cu);
>>> +
>>> + dev->node_props.cpu_cores_count = cu->num_cpu_cores;
>>> + dev->node_props.cpu_core_id_base = cu->processor_id_low;
>>> + if (cu->hsa_capability & CRAT_CU_FLAGS_IOMMU_PRESENT)
>>> + dev->node_props.capability |= HSA_CAP_ATS_PRESENT;
>>> +
>>> + pr_info("CU CPU: cores=%d id_base=%d\n", cu->num_cpu_cores,
>>> + cu->processor_id_low);
>>> +}
>>> +
>>> +static void kfd_populated_cu_info_gpu(struct kfd_topology_device *dev,
>>> + struct crat_subtype_computeunit *cu)
>>> +{
>>> + BUG_ON(!dev);
>>> + BUG_ON(!cu);
>>> +
>>> + dev->node_props.simd_id_base = cu->processor_id_low;
>>> + dev->node_props.simd_count = cu->num_simd_cores;
>>> + dev->node_props.lds_size_in_kb = cu->lds_size_in_kb;
>>> + dev->node_props.max_waves_per_simd = cu->max_waves_simd;
>>> + dev->node_props.wave_front_size = cu->wave_front_size;
>>> + dev->node_props.mem_banks_count = cu->num_banks;
>>> + dev->node_props.array_count = cu->num_arrays;
>>> + dev->node_props.cu_per_simd_array = cu->num_cu_per_array;
>>> + dev->node_props.simd_per_cu = cu->num_simd_per_cu;
>>> + dev->node_props.max_slots_scratch_cu = cu->max_slots_scatch_cu;
>>> + if (cu->hsa_capability & CRAT_CU_FLAGS_HOT_PLUGGABLE)
>>> + dev->node_props.capability |= HSA_CAP_HOT_PLUGGABLE;
>>> + pr_info("CU GPU: simds=%d id_base=%d\n", cu->num_simd_cores,
>>> + cu->processor_id_low);
>>> +}
>>> +
>>> +/* kfd_parse_subtype_cu is called when the topology mutex is already
>>> acquired */
>>> +static int kfd_parse_subtype_cu(struct crat_subtype_computeunit *cu)
>>> +{
>>> + struct kfd_topology_device *dev;
>>> + int i = 0;
>>> +
>>> + BUG_ON(!cu);
>>> +
>>> + pr_info("Found CU entry in CRAT table with proximity_domain=%d caps=%x\n",
>>> + cu->proximity_domain, cu->hsa_capability);
>>> + list_for_each_entry(dev, &topology_device_list, list) {
>>> + if (cu->proximity_domain == i) {
>>> + if (cu->flags & CRAT_CU_FLAGS_CPU_PRESENT)
>>> + kfd_populated_cu_info_cpu(dev, cu);
>>> +
>>> + if (cu->flags & CRAT_CU_FLAGS_GPU_PRESENT)
>>> + kfd_populated_cu_info_gpu(dev, cu);
>>> + break;
>>> + }
>>> + i++;
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +/* kfd_parse_subtype_mem is called when the topology mutex is already
>>> acquired */
>>> +static int kfd_parse_subtype_mem(struct crat_subtype_memory *mem)
>>> +{
>>> + struct kfd_mem_properties *props;
>>> + struct kfd_topology_device *dev;
>>> + int i = 0;
>>> +
>>> + BUG_ON(!mem);
>>> +
>>> + pr_info("Found memory entry in CRAT table with proximity_domain=%d\n",
>>> + mem->promixity_domain);
>>> + list_for_each_entry(dev, &topology_device_list, list) {
>>> + if (mem->promixity_domain == i) {
>>> + props = kfd_alloc_struct(props);
>>> + if (props == 0)
>>> + return -ENOMEM;
>>> +
>>> + if (dev->node_props.cpu_cores_count == 0)
>>> + props->heap_type = HSA_MEM_HEAP_TYPE_FB_PRIVATE;
>>> + else
>>> + props->heap_type = HSA_MEM_HEAP_TYPE_SYSTEM;
>>> +
>>> + if (mem->flags & CRAT_MEM_FLAGS_HOT_PLUGGABLE)
>>> + props->flags |= HSA_MEM_FLAGS_HOT_PLUGGABLE;
>>> + if (mem->flags & CRAT_MEM_FLAGS_NON_VOLATILE)
>>> + props->flags |= HSA_MEM_FLAGS_NON_VOLATILE;
>>> +
>>> + props->size_in_bytes = ((uint64_t)mem->length_high << 32) +
>>> + mem->length_low;
>>> + props->width = mem->width;
>>> +
>>> + dev->mem_bank_count++;
>>> + list_add_tail(&props->list, &dev->mem_props);
>>> +
>>> + break;
>>> + }
>>> + i++;
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +/* kfd_parse_subtype_cache is called when the topology mutex is already
>>> acquired */
>>> +static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache)
>>> +{
>>> + struct kfd_cache_properties *props;
>>> + struct kfd_topology_device *dev;
>>> + uint32_t id;
>>> +
>>> + BUG_ON(!cache);
>>> +
>>> + id = cache->processor_id_low;
>>> +
>>> + pr_info("Found cache entry in CRAT table with processor_id=%d\n", id);
>>> + list_for_each_entry(dev, &topology_device_list, list)
>>> + if (id == dev->node_props.cpu_core_id_base ||
>>> + id == dev->node_props.simd_id_base) {
>>> + props = kfd_alloc_struct(props);
>>> + if (props == 0)
>>> + return -ENOMEM;
>>> +
>>> + props->processor_id_low = id;
>>> + props->cache_level = cache->cache_level;
>>> + props->cache_size = cache->cache_size;
>>> + props->cacheline_size = cache->cache_line_size;
>>> + props->cachelines_per_tag = cache->lines_per_tag;
>>> + props->cache_assoc = cache->associativity;
>>> + props->cache_latency = cache->cache_latency;
>>> +
>>> + if (cache->flags & CRAT_CACHE_FLAGS_DATA_CACHE)
>>> + props->cache_type |= HSA_CACHE_TYPE_DATA;
>>> + if (cache->flags & CRAT_CACHE_FLAGS_INST_CACHE)
>>> + props->cache_type |= HSA_CACHE_TYPE_INSTRUCTION;
>>> + if (cache->flags & CRAT_CACHE_FLAGS_CPU_CACHE)
>>> + props->cache_type |= HSA_CACHE_TYPE_CPU;
>>> + if (cache->flags & CRAT_CACHE_FLAGS_SIMD_CACHE)
>>> + props->cache_type |= HSA_CACHE_TYPE_HSACU;
>>> +
>>> + dev->cache_count++;
>>> + dev->node_props.caches_count++;
>>> + list_add_tail(&props->list, &dev->cache_props);
>>> +
>>> + break;
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +/* kfd_parse_subtype_iolink is called when the topology mutex is already
>>> acquired */
>>> +static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink)
>>> +{
>>> + struct kfd_iolink_properties *props;
>>> + struct kfd_topology_device *dev;
>>> + uint32_t i = 0;
>>> + uint32_t id_from;
>>> + uint32_t id_to;
>>> +
>>> + BUG_ON(!iolink);
>>> +
>>> + id_from = iolink->proximity_domain_from;
>>> + id_to = iolink->proximity_domain_to;
>>> +
>>> + pr_info("Found IO link entry in CRAT table with id_from=%d\n", id_from);
>>> + list_for_each_entry(dev, &topology_device_list, list) {
>>> + if (id_from == i) {
>>> + props = kfd_alloc_struct(props);
>>> + if (props == 0)
>>> + return -ENOMEM;
>>> +
>>> + props->node_from = id_from;
>>> + props->node_to = id_to;
>>> + props->ver_maj = iolink->version_major;
>>> + props->ver_min = iolink->version_minor;
>>> +
>>> + /*
>>> + * weight factor (derived from CDIR), currently always 1
>>> + */
>>> + props->weight = 1;
>>> +
>>> + props->min_latency = iolink->minimum_latency;
>>> + props->max_latency = iolink->maximum_latency;
>>> + props->min_bandwidth = iolink->minimum_bandwidth_mbs;
>>> + props->max_bandwidth = iolink->maximum_bandwidth_mbs;
>>> + props->rec_transfer_size =
>>> + iolink->recommended_transfer_size;
>>> +
>>> + dev->io_link_count++;
>>> + dev->node_props.io_links_count++;
>>> + list_add_tail(&props->list, &dev->io_link_props);
>>> +
>>> + break;
>>> + }
>>> + i++;
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int kfd_parse_subtype(struct crat_subtype_generic *sub_type_hdr)
>>> +{
>>> + struct crat_subtype_computeunit *cu;
>>> + struct crat_subtype_memory *mem;
>>> + struct crat_subtype_cache *cache;
>>> + struct crat_subtype_iolink *iolink;
>>> + int ret = 0;
>>> +
>>> + BUG_ON(!sub_type_hdr);
>>> +
>>> + switch (sub_type_hdr->type) {
>>> + case CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY:
>>> + cu = (struct crat_subtype_computeunit *)sub_type_hdr;
>>> + ret = kfd_parse_subtype_cu(cu);
>>> + break;
>>> + case CRAT_SUBTYPE_MEMORY_AFFINITY:
>>> + mem = (struct crat_subtype_memory *)sub_type_hdr;
>>> + ret = kfd_parse_subtype_mem(mem);
>>> + break;
>>> + case CRAT_SUBTYPE_CACHE_AFFINITY:
>>> + cache = (struct crat_subtype_cache *)sub_type_hdr;
>>> + ret = kfd_parse_subtype_cache(cache);
>>> + break;
>>> + case CRAT_SUBTYPE_TLB_AFFINITY:
>>> + /*
>>> + * For now, nothing to do here
>>> + */
>>> + pr_info("Found TLB entry in CRAT table (not processing)\n");
>>> + break;
>>> + case CRAT_SUBTYPE_CCOMPUTE_AFFINITY:
>>> + /*
>>> + * For now, nothing to do here
>>> + */
>>> + pr_info("Found CCOMPUTE entry in CRAT table (not processing)\n");
>>> + break;
>>> + case CRAT_SUBTYPE_IOLINK_AFFINITY:
>>> + iolink = (struct crat_subtype_iolink *)sub_type_hdr;
>>> + ret = kfd_parse_subtype_iolink(iolink);
>>> + break;
>>> + default:
>>> + pr_warn("Unknown subtype (%d) in CRAT\n",
>>> + sub_type_hdr->type);
>>> + }
>>> +
>>> + return ret;
>>> +}
>>> +
>>> +static void kfd_release_topology_device(struct kfd_topology_device *dev)
>>> +{
>>> + struct kfd_mem_properties *mem;
>>> + struct kfd_cache_properties *cache;
>>> + struct kfd_iolink_properties *iolink;
>>> +
>>> + BUG_ON(!dev);
>>> +
>>> + list_del(&dev->list);
>>> +
>>> + while (dev->mem_props.next != &dev->mem_props) {
>>> + mem = container_of(dev->mem_props.next,
>>> + struct kfd_mem_properties, list);
>>> + list_del(&mem->list);
>>> + kfree(mem);
>>> + }
>>> +
>>> + while (dev->cache_props.next != &dev->cache_props) {
>>> + cache = container_of(dev->cache_props.next,
>>> + struct kfd_cache_properties, list);
>>> + list_del(&cache->list);
>>> + kfree(cache);
>>> + }
>>> +
>>> + while (dev->io_link_props.next != &dev->io_link_props) {
>>> + iolink = container_of(dev->io_link_props.next,
>>> + struct kfd_iolink_properties, list);
>>> + list_del(&iolink->list);
>>> + kfree(iolink);
>>> + }
>>> +
>>> + kfree(dev);
>>> +
>>> + sys_props.num_devices--;
>>> +}
>>> +
>>> +static void kfd_release_live_view(void)
>>> +{
>>> + struct kfd_topology_device *dev;
>>> +
>>> + while (topology_device_list.next != &topology_device_list) {
>>> + dev = container_of(topology_device_list.next,
>>> + struct kfd_topology_device, list);
>>> + kfd_release_topology_device(dev);
>>> +}
>>> +
>>> + memset(&sys_props, 0, sizeof(sys_props));
>>> +}
>>> +
>>> +static struct kfd_topology_device *kfd_create_topology_device(void)
>>> +{
>>> + struct kfd_topology_device *dev;
>>> +
>>> + dev = kfd_alloc_struct(dev);
>>> + if (dev == 0) {
>>> + pr_err("No memory to allocate a topology device");
>>> + return 0;
>>> + }
>>> +
>>> + INIT_LIST_HEAD(&dev->mem_props);
>>> + INIT_LIST_HEAD(&dev->cache_props);
>>> + INIT_LIST_HEAD(&dev->io_link_props);
>>> +
>>> + list_add_tail(&dev->list, &topology_device_list);
>>> + sys_props.num_devices++;
>>> +
>>> + return dev;
>>> + }
>>> +
>>> +static int kfd_parse_crat_table(void *crat_image)
>>> +{
>>> + struct kfd_topology_device *top_dev;
>>> + struct crat_subtype_generic *sub_type_hdr;
>>> + uint16_t node_id;
>>> + int ret;
>>> + struct crat_header *crat_table = (struct crat_header *)crat_image;
>>> + uint16_t num_nodes;
>>> + uint32_t image_len;
>>> +
>>> + if (!crat_image)
>>> + return -EINVAL;
>>> +
>>> + num_nodes = crat_table->num_domains;
>>> + image_len = crat_table->length;
>>> +
>>> + pr_info("Parsing CRAT table with %d nodes\n", num_nodes);
>>> +
>>> + for (node_id = 0; node_id < num_nodes; node_id++) {
>>> + top_dev = kfd_create_topology_device();
>>> + if (!top_dev) {
>>> + kfd_release_live_view();
>>> + return -ENOMEM;
>>> + }
>>> + }
>>> +
>>> + sys_props.platform_id = (*((uint64_t *)crat_table->oem_id)) &
>>> CRAT_OEMID_64BIT_MASK;
>>> + sys_props.platform_oem = *((uint64_t *)crat_table->oem_table_id);
>>> + sys_props.platform_rev = crat_table->revision;
>>> +
>>> + sub_type_hdr = (struct crat_subtype_generic *)(crat_table+1);
>>> + while ((char *)sub_type_hdr + sizeof(struct crat_subtype_generic) <
>>> + ((char *)crat_image) + image_len) {
>>> + if (sub_type_hdr->flags & CRAT_SUBTYPE_FLAGS_ENABLED) {
>>> + ret = kfd_parse_subtype(sub_type_hdr);
>>> + if (ret != 0) {
>>> + kfd_release_live_view();
>>> + return ret;
>>> + }
>>> + }
>>> +
>>> + sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
>>> + sub_type_hdr->length);
>>> + }
>>> +
>>> + sys_props.generation_count++;
>>> + topology_crat_parsed = 1;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +
>>> +#define sysfs_show_gen_prop(buffer, fmt, ...) \
>>> + snprintf(buffer, PAGE_SIZE, "%s"fmt, buffer, __VA_ARGS__)
>>> +#define sysfs_show_32bit_prop(buffer, name, value) \
>>> + sysfs_show_gen_prop(buffer, "%s %u\n", name, value)
>>> +#define sysfs_show_64bit_prop(buffer, name, value) \
>>> + sysfs_show_gen_prop(buffer, "%s %llu\n", name, value)
>>> +#define sysfs_show_32bit_val(buffer, value) \
>>> + sysfs_show_gen_prop(buffer, "%u\n", value)
>>> +#define sysfs_show_str_val(buffer, value) \
>>> + sysfs_show_gen_prop(buffer, "%s\n", value)
>>> +
>>> +static ssize_t sysprops_show(struct kobject *kobj, struct attribute *attr,
>>> + char *buffer)
>>> +{
>>> + ssize_t ret;
>>> +
>>> + /* Making sure that the buffer is an empty string */
>>> + buffer[0] = 0;
>>> +
>>> + if (attr == &sys_props.attr_genid) {
>>> + ret = sysfs_show_32bit_val(buffer, sys_props.generation_count);
>>> + } else if (attr == &sys_props.attr_props) {
>>> + sysfs_show_64bit_prop(buffer, "platform_oem",
>>> + sys_props.platform_oem);
>>> + sysfs_show_64bit_prop(buffer, "platform_id",
>>> + sys_props.platform_id);
>>> + ret = sysfs_show_64bit_prop(buffer, "platform_rev",
>>> + sys_props.platform_rev);
>>> + } else {
>>> + ret = -EINVAL;
>>> + }
>>> +
>>> + return ret;
>>> +}
>>> +
>>> +static const struct sysfs_ops sysprops_ops = {
>>> + .show = sysprops_show,
>>> +};
>>> +
>>> +static struct kobj_type sysprops_type = {
>>> + .sysfs_ops = &sysprops_ops,
>>> +};
>>> +
>>> +static ssize_t iolink_show(struct kobject *kobj, struct attribute *attr,
>>> + char *buffer)
>>> +{
>>> + ssize_t ret;
>>> + struct kfd_iolink_properties *iolink;
>>> +
>>> + /* Making sure that the buffer is an empty string */
>>> + buffer[0] = 0;
>>> +
>>> + iolink = container_of(attr, struct kfd_iolink_properties, attr);
>>> + sysfs_show_32bit_prop(buffer, "type", iolink->iolink_type);
>>> + sysfs_show_32bit_prop(buffer, "version_major", iolink->ver_maj);
>>> + sysfs_show_32bit_prop(buffer, "version_minor", iolink->ver_min);
>>> + sysfs_show_32bit_prop(buffer, "node_from", iolink->node_from);
>>> + sysfs_show_32bit_prop(buffer, "node_to", iolink->node_to);
>>> + sysfs_show_32bit_prop(buffer, "weight", iolink->weight);
>>> + sysfs_show_32bit_prop(buffer, "min_latency", iolink->min_latency);
>>> + sysfs_show_32bit_prop(buffer, "max_latency", iolink->max_latency);
>>> + sysfs_show_32bit_prop(buffer, "min_bandwidth", iolink->min_bandwidth);
>>> + sysfs_show_32bit_prop(buffer, "max_bandwidth", iolink->max_bandwidth);
>>> + sysfs_show_32bit_prop(buffer, "recommended_transfer_size",
>>> + iolink->rec_transfer_size);
>>> + ret = sysfs_show_32bit_prop(buffer, "flags", iolink->flags);
>>> +
>>> + return ret;
>>> +}
>>> +
>>> +static const struct sysfs_ops iolink_ops = {
>>> + .show = iolink_show,
>>> +};
>>> +
>>> +static struct kobj_type iolink_type = {
>>> + .sysfs_ops = &iolink_ops,
>>> +};
>>> +
>>> +static ssize_t mem_show(struct kobject *kobj, struct attribute *attr,
>>> + char *buffer)
>>> +{
>>> + ssize_t ret;
>>> + struct kfd_mem_properties *mem;
>>> +
>>> + /* Making sure that the buffer is an empty string */
>>> + buffer[0] = 0;
>>> +
>>> + mem = container_of(attr, struct kfd_mem_properties, attr);
>>> + sysfs_show_32bit_prop(buffer, "heap_type", mem->heap_type);
>>> + sysfs_show_64bit_prop(buffer, "size_in_bytes", mem->size_in_bytes);
>>> + sysfs_show_32bit_prop(buffer, "flags", mem->flags);
>>> + sysfs_show_32bit_prop(buffer, "width", mem->width);
>>> + ret = sysfs_show_32bit_prop(buffer, "mem_clk_max", mem->mem_clk_max);
>>> +
>>> + return ret;
>>> +}
>>> +
>>> +static const struct sysfs_ops mem_ops = {
>>> + .show = mem_show,
>>> +};
>>> +
>>> +static struct kobj_type mem_type = {
>>> + .sysfs_ops = &mem_ops,
>>> +};
>>> +
>>> +static ssize_t kfd_cache_show(struct kobject *kobj, struct attribute *attr,
>>> + char *buffer)
>>> +{
>>> + ssize_t ret;
>>> + uint32_t i;
>>> + struct kfd_cache_properties *cache;
>>> +
>>> + /* Making sure that the buffer is an empty string */
>>> + buffer[0] = 0;
>>> +
>>> + cache = container_of(attr, struct kfd_cache_properties, attr);
>>> + sysfs_show_32bit_prop(buffer, "processor_id_low",
>>> + cache->processor_id_low);
>>> + sysfs_show_32bit_prop(buffer, "level", cache->cache_level);
>>> + sysfs_show_32bit_prop(buffer, "size", cache->cache_size);
>>> + sysfs_show_32bit_prop(buffer, "cache_line_size", cache->cacheline_size);
>>> + sysfs_show_32bit_prop(buffer, "cache_lines_per_tag",
>>> + cache->cachelines_per_tag);
>>> + sysfs_show_32bit_prop(buffer, "association", cache->cache_assoc);
>>> + sysfs_show_32bit_prop(buffer, "latency", cache->cache_latency);
>>> + sysfs_show_32bit_prop(buffer, "type", cache->cache_type);
>>> + snprintf(buffer, PAGE_SIZE, "%ssibling_map ", buffer);
>>> + for (i = 0; i < KFD_TOPOLOGY_CPU_SIBLINGS; i++)
>>> + ret = snprintf(buffer, PAGE_SIZE, "%s%d%s",
>>> + buffer, cache->sibling_map[i],
>>> + (i == KFD_TOPOLOGY_CPU_SIBLINGS-1) ?
>>> + "\n" : ",");
>>> +
>>> + return ret;
>>> +}
>>> +
>>> +static const struct sysfs_ops cache_ops = {
>>> + .show = kfd_cache_show,
>>> +};
>>> +
>>> +static struct kobj_type cache_type = {
>>> + .sysfs_ops = &cache_ops,
>>> +};
>>> +
>>> +static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
>>> + char *buffer)
>>> +{
>>> + ssize_t ret;
>>> + struct kfd_topology_device *dev;
>>> + char public_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
>>> + uint32_t i;
>>> +
>>> + /* Making sure that the buffer is an empty string */
>>> + buffer[0] = 0;
>>> +
>>> + if (strcmp(attr->name, "gpu_id") == 0) {
>>> + dev = container_of(attr, struct kfd_topology_device,
>>> + attr_gpuid);
>>> + ret = sysfs_show_32bit_val(buffer, dev->gpu_id);
>>> + } else if (strcmp(attr->name, "name") == 0) {
>>> + dev = container_of(attr, struct kfd_topology_device,
>>> + attr_name);
>>> + for (i = 0; i < KFD_TOPOLOGY_PUBLIC_NAME_SIZE; i++) {
>>> + public_name[i] =
>>> + (char)dev->node_props.marketing_name[i];
>>> + if (dev->node_props.marketing_name[i] == 0)
>>> + break;
>>> + }
>>> + public_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE-1] = 0x0;
>>> + ret = sysfs_show_str_val(buffer, public_name);
>>> + } else {
>>> + dev = container_of(attr, struct kfd_topology_device,
>>> + attr_props);
>>> + sysfs_show_32bit_prop(buffer, "cpu_cores_count",
>>> + dev->node_props.cpu_cores_count);
>>> + sysfs_show_32bit_prop(buffer, "simd_count",
>>> + dev->node_props.simd_count);
>>> + sysfs_show_32bit_prop(buffer, "mem_banks_count",
>>> + dev->node_props.mem_banks_count);
>>> + sysfs_show_32bit_prop(buffer, "caches_count",
>>> + dev->node_props.caches_count);
>>> + sysfs_show_32bit_prop(buffer, "io_links_count",
>>> + dev->node_props.io_links_count);
>>> + sysfs_show_32bit_prop(buffer, "cpu_core_id_base",
>>> + dev->node_props.cpu_core_id_base);
>>> + sysfs_show_32bit_prop(buffer, "simd_id_base",
>>> + dev->node_props.simd_id_base);
>>> + sysfs_show_32bit_prop(buffer, "capability",
>>> + dev->node_props.capability);
>>> + sysfs_show_32bit_prop(buffer, "max_waves_per_simd",
>>> + dev->node_props.max_waves_per_simd);
>>> + sysfs_show_32bit_prop(buffer, "lds_size_in_kb",
>>> + dev->node_props.lds_size_in_kb);
>>> + sysfs_show_32bit_prop(buffer, "gds_size_in_kb",
>>> + dev->node_props.gds_size_in_kb);
>>> + sysfs_show_32bit_prop(buffer, "wave_front_size",
>>> + dev->node_props.wave_front_size);
>>> + sysfs_show_32bit_prop(buffer, "array_count",
>>> + dev->node_props.array_count);
>>> + sysfs_show_32bit_prop(buffer, "simd_arrays_per_engine",
>>> + dev->node_props.simd_arrays_per_engine);
>>> + sysfs_show_32bit_prop(buffer, "cu_per_simd_array",
>>> + dev->node_props.cu_per_simd_array);
>>> + sysfs_show_32bit_prop(buffer, "simd_per_cu",
>>> + dev->node_props.simd_per_cu);
>>> + sysfs_show_32bit_prop(buffer, "max_slots_scratch_cu",
>>> + dev->node_props.max_slots_scratch_cu);
>>> + sysfs_show_32bit_prop(buffer, "engine_id",
>>> + dev->node_props.engine_id);
>>> + sysfs_show_32bit_prop(buffer, "vendor_id",
>>> + dev->node_props.vendor_id);
>>> + sysfs_show_32bit_prop(buffer, "device_id",
>>> + dev->node_props.device_id);
>>> + sysfs_show_32bit_prop(buffer, "location_id",
>>> + dev->node_props.location_id);
>>> + sysfs_show_32bit_prop(buffer, "max_engine_clk_fcompute",
>>> + kfd2kgd->get_max_engine_clock_in_mhz(
>>> + dev->gpu->kgd));
>>> + sysfs_show_64bit_prop(buffer, "local_mem_size",
>>> + kfd2kgd->get_vmem_size(dev->gpu->kgd));
>>> + ret = sysfs_show_32bit_prop(buffer, "max_engine_clk_ccompute",
>>> + cpufreq_quick_get_max(0)/1000);
>>> + }
>>> +
>>> + return ret;
>>> +}
>>> +
>>> +static const struct sysfs_ops node_ops = {
>>> + .show = node_show,
>>> +};
>>> +
>>> +static struct kobj_type node_type = {
>>> + .sysfs_ops = &node_ops,
>>> +};
>>> +
>>> +static void kfd_remove_sysfs_file(struct kobject *kobj, struct attribute *attr)
>>> +{
>>> + sysfs_remove_file(kobj, attr);
>>> + kobject_del(kobj);
>>> + kobject_put(kobj);
>>> +}
>>> +
>>> +static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>>> +{
>>> + struct kfd_iolink_properties *iolink;
>>> + struct kfd_cache_properties *cache;
>>> + struct kfd_mem_properties *mem;
>>> +
>>> + BUG_ON(!dev);
>>> +
>>> + if (dev->kobj_iolink) {
>>> + list_for_each_entry(iolink, &dev->io_link_props, list)
>>> + if (iolink->kobj) {
>>> + kfd_remove_sysfs_file(iolink->kobj, &iolink->attr);
>>> + iolink->kobj = 0;
>>> + }
>>> + kobject_del(dev->kobj_iolink);
>>> + kobject_put(dev->kobj_iolink);
>>> + dev->kobj_iolink = 0;
>>> + }
>>> +
>>> + if (dev->kobj_cache) {
>>> + list_for_each_entry(cache, &dev->cache_props, list)
>>> + if (cache->kobj) {
>>> + kfd_remove_sysfs_file(cache->kobj, &cache->attr);
>>> + cache->kobj = 0;
>>> + }
>>> + kobject_del(dev->kobj_cache);
>>> + kobject_put(dev->kobj_cache);
>>> + dev->kobj_cache = 0;
>>> + }
>>> +
>>> + if (dev->kobj_mem) {
>>> + list_for_each_entry(mem, &dev->mem_props, list)
>>> + if (mem->kobj) {
>>> + kfd_remove_sysfs_file(mem->kobj, &mem->attr);
>>> + mem->kobj = 0;
>>> + }
>>> + kobject_del(dev->kobj_mem);
>>> + kobject_put(dev->kobj_mem);
>>> + dev->kobj_mem = 0;
>>> + }
>>> +
>>> + if (dev->kobj_node) {
>>> + sysfs_remove_file(dev->kobj_node, &dev->attr_gpuid);
>>> + sysfs_remove_file(dev->kobj_node, &dev->attr_name);
>>> + sysfs_remove_file(dev->kobj_node, &dev->attr_props);
>>> + kobject_del(dev->kobj_node);
>>> + kobject_put(dev->kobj_node);
>>> + dev->kobj_node = 0;
>>> + }
>>> +}
>>> +
>>> +static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>> + uint32_t id)
>>> +{
>>> + struct kfd_iolink_properties *iolink;
>>> + struct kfd_cache_properties *cache;
>>> + struct kfd_mem_properties *mem;
>>> + int ret;
>>> + uint32_t i;
>>> +
>>> + BUG_ON(!dev);
>>> +
>>> + /*
>>> + * Creating the sysfs folders
>>> + */
>>> + BUG_ON(dev->kobj_node);
>>> + dev->kobj_node = kfd_alloc_struct(dev->kobj_node);
>>> + if (!dev->kobj_node)
>>> + return -ENOMEM;
>>> +
>>> + ret = kobject_init_and_add(dev->kobj_node, &node_type,
>>> + sys_props.kobj_nodes, "%d", id);
>>> + if (ret < 0)
>>> + return ret;
>>> +
>>> + dev->kobj_mem = kobject_create_and_add("mem_banks", dev->kobj_node);
>>> + if (!dev->kobj_mem)
>>> + return -ENOMEM;
>>> +
>>> + dev->kobj_cache = kobject_create_and_add("caches", dev->kobj_node);
>>> + if (!dev->kobj_cache)
>>> + return -ENOMEM;
>>> +
>>> + dev->kobj_iolink = kobject_create_and_add("io_links", dev->kobj_node);
>>> + if (!dev->kobj_iolink)
>>> + return -ENOMEM;
>>> +
>>> + /*
>>> + * Creating sysfs files for node properties
>>> + */
>>> + dev->attr_gpuid.name = "gpu_id";
>>> + dev->attr_gpuid.mode = KFD_SYSFS_FILE_MODE;
>>> + sysfs_attr_init(&dev->attr_gpuid);
>>> + dev->attr_name.name = "name";
>>> + dev->attr_name.mode = KFD_SYSFS_FILE_MODE;
>>> + sysfs_attr_init(&dev->attr_name);
>>> + dev->attr_props.name = "properties";
>>> + dev->attr_props.mode = KFD_SYSFS_FILE_MODE;
>>> + sysfs_attr_init(&dev->attr_props);
>>> + ret = sysfs_create_file(dev->kobj_node, &dev->attr_gpuid);
>>> + if (ret < 0)
>>> + return ret;
>>> + ret = sysfs_create_file(dev->kobj_node, &dev->attr_name);
>>> + if (ret < 0)
>>> + return ret;
>>> + ret = sysfs_create_file(dev->kobj_node, &dev->attr_props);
>>> + if (ret < 0)
>>> + return ret;
>>> +
>>> + i = 0;
>>> + list_for_each_entry(mem, &dev->mem_props, list) {
>>> + mem->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
>>> + if (!mem->kobj)
>>> + return -ENOMEM;
>>> + ret = kobject_init_and_add(mem->kobj, &mem_type,
>>> + dev->kobj_mem, "%d", i);
>>> + if (ret < 0)
>>> + return ret;
>>> +
>>> + mem->attr.name = "properties";
>>> + mem->attr.mode = KFD_SYSFS_FILE_MODE;
>>> + sysfs_attr_init(&mem->attr);
>>> + ret = sysfs_create_file(mem->kobj, &mem->attr);
>>> + if (ret < 0)
>>> + return ret;
>>> + i++;
>>> + }
>>> +
>>> + i = 0;
>>> + list_for_each_entry(cache, &dev->cache_props, list) {
>>> + cache->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
>>> + if (!cache->kobj)
>>> + return -ENOMEM;
>>> + ret = kobject_init_and_add(cache->kobj, &cache_type,
>>> + dev->kobj_cache, "%d", i);
>>> + if (ret < 0)
>>> + return ret;
>>> +
>>> + cache->attr.name = "properties";
>>> + cache->attr.mode = KFD_SYSFS_FILE_MODE;
>>> + sysfs_attr_init(&cache->attr);
>>> + ret = sysfs_create_file(cache->kobj, &cache->attr);
>>> + if (ret < 0)
>>> + return ret;
>>> + i++;
>>> + }
>>> +
>>> + i = 0;
>>> + list_for_each_entry(iolink, &dev->io_link_props, list) {
>>> + iolink->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
>>> + if (!iolink->kobj)
>>> + return -ENOMEM;
>>> + ret = kobject_init_and_add(iolink->kobj, &iolink_type,
>>> + dev->kobj_iolink, "%d", i);
>>> + if (ret < 0)
>>> + return ret;
>>> +
>>> + iolink->attr.name = "properties";
>>> + iolink->attr.mode = KFD_SYSFS_FILE_MODE;
>>> + sysfs_attr_init(&iolink->attr);
>>> + ret = sysfs_create_file(iolink->kobj, &iolink->attr);
>>> + if (ret < 0)
>>> + return ret;
>>> + i++;
>>> +}
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int kfd_build_sysfs_node_tree(void)
>>> +{
>>> + struct kfd_topology_device *dev;
>>> + int ret;
>>> + uint32_t i = 0;
>>> +
>>> + list_for_each_entry(dev, &topology_device_list, list) {
>>> + ret = kfd_build_sysfs_node_entry(dev, 0);
>>> + if (ret < 0)
>>> + return ret;
>>> + i++;
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static void kfd_remove_sysfs_node_tree(void)
>>> +{
>>> + struct kfd_topology_device *dev;
>>> +
>>> + list_for_each_entry(dev, &topology_device_list, list)
>>> + kfd_remove_sysfs_node_entry(dev);
>>> +}
>>> +
>>> +static int kfd_topology_update_sysfs(void)
>>> +{
>>> + int ret;
>>> +
>>> + pr_info("Creating topology SYSFS entries\n");
>>> + if (sys_props.kobj_topology == 0) {
>>> + sys_props.kobj_topology = kfd_alloc_struct(sys_props.kobj_topology);
>>> + if (!sys_props.kobj_topology)
>>> + return -ENOMEM;
>>> +
>>> + ret = kobject_init_and_add(sys_props.kobj_topology,
>>> + &sysprops_type, &kfd_device->kobj,
>>> + "topology");
>>> + if (ret < 0)
>>> + return ret;
>>> +
>>> + sys_props.kobj_nodes = kobject_create_and_add("nodes",
>>> + sys_props.kobj_topology);
>>> + if (!sys_props.kobj_nodes)
>>> + return -ENOMEM;
>>> +
>>> + sys_props.attr_genid.name = "generation_id";
>>> + sys_props.attr_genid.mode = KFD_SYSFS_FILE_MODE;
>>> + sysfs_attr_init(&sys_props.attr_genid);
>>> + ret = sysfs_create_file(sys_props.kobj_topology,
>>> + &sys_props.attr_genid);
>>> + if (ret < 0)
>>> + return ret;
>>> +
>>> + sys_props.attr_props.name = "system_properties";
>>> + sys_props.attr_props.mode = KFD_SYSFS_FILE_MODE;
>>> + sysfs_attr_init(&sys_props.attr_props);
>>> + ret = sysfs_create_file(sys_props.kobj_topology,
>>> + &sys_props.attr_props);
>>> + if (ret < 0)
>>> + return ret;
>>> + }
>>> +
>>> + kfd_remove_sysfs_node_tree();
>>> +
>>> + return kfd_build_sysfs_node_tree();
>>> +}
>>> +
>>> +static void kfd_topology_release_sysfs(void)
>>> +{
>>> + kfd_remove_sysfs_node_tree();
>>> + if (sys_props.kobj_topology) {
>>> + sysfs_remove_file(sys_props.kobj_topology,
>>> + &sys_props.attr_genid);
>>> + sysfs_remove_file(sys_props.kobj_topology,
>>> + &sys_props.attr_props);
>>> + if (sys_props.kobj_nodes) {
>>> + kobject_del(sys_props.kobj_nodes);
>>> + kobject_put(sys_props.kobj_nodes);
>>> + sys_props.kobj_nodes = 0;
>>> + }
>>> + kobject_del(sys_props.kobj_topology);
>>> + kobject_put(sys_props.kobj_topology);
>>> + sys_props.kobj_topology = 0;
>>> + }
>>> +}
>>> +
>>> +int kfd_topology_init(void)
>>> +{
>>> + void *crat_image = 0;
>>> + size_t image_size = 0;
>>> + int ret;
>>> +
>>> + /*
>>> + * Initialize the head for the topology device list
>>> + */
>>> + INIT_LIST_HEAD(&topology_device_list);
>>> + init_rwsem(&topology_lock);
>>> + topology_crat_parsed = 0;
>>> +
>>> + memset(&sys_props, 0, sizeof(sys_props));
>>> +
>>> + /*
>>> + * Get the CRAT image from the ACPI
>>> + */
>>> + ret = kfd_topology_get_crat_acpi(crat_image, &image_size);
>>> + if (ret == 0 && image_size > 0) {
>>> + pr_info("Found CRAT image with size=%zd\n", image_size);
>>> + crat_image = kmalloc(image_size, GFP_KERNEL);
>>> + if (!crat_image) {
>>> + ret = -ENOMEM;
>>> + pr_err("No memory for allocating CRAT image\n");
>>> + goto err;
>>> + }
>>> + ret = kfd_topology_get_crat_acpi(crat_image, &image_size);
>>> +
>>> + if (ret == 0) {
>>> + down_write(&topology_lock);
>>> + ret = kfd_parse_crat_table(crat_image);
>>> + if (ret == 0)
>>> + ret = kfd_topology_update_sysfs();
>>> + up_write(&topology_lock);
>>> + } else {
>>> + pr_err("Couldn't get CRAT table size from ACPI\n");
>>> + }
>>> + kfree(crat_image);
>>> + } else if (ret == -ENODATA) {
>>> + ret = 0;
>>> + } else {
>>> + pr_err("Couldn't get CRAT table size from ACPI\n");
>>> + }
>>> +
>>> +err:
>>> + pr_info("Finished initializing topology ret=%d\n", ret);
>>> + return ret;
>>> +}
>>> +
>>> +void kfd_topology_shutdown(void)
>>> +{
>>> + kfd_topology_release_sysfs();
>>> + kfd_release_live_view();
>>> +}
>>> +
>>> +static void kfd_debug_print_topology(void)
>>> +{
>>> + struct kfd_topology_device *dev;
>>> + uint32_t i = 0;
>>> +
>>> + pr_info("DEBUG PRINT OF TOPOLOGY:");
>>> + list_for_each_entry(dev, &topology_device_list, list) {
>>> + pr_info("Node: %d\n", i);
>>> + pr_info("\tGPU assigned: %s\n", (dev->gpu ? "yes" : "no"));
>>> + pr_info("\tCPU count: %d\n", dev->node_props.cpu_cores_count);
>>> + pr_info("\tSIMD count: %d", dev->node_props.simd_count);
>>> + i++;
>>> + }
>>> +}
>>> +
>>> +static uint32_t kfd_generate_gpu_id(struct kfd_dev *gpu)
>>> +{
>>> + uint32_t hashout;
>>> + uint32_t buf[7];
>>> + int i;
>>> +
>>> + if (!gpu)
>>> + return 0;
>>> +
>>> + buf[0] = gpu->pdev->devfn;
>>> + buf[1] = gpu->pdev->subsystem_vendor;
>>> + buf[2] = gpu->pdev->subsystem_device;
>>> + buf[3] = gpu->pdev->device;
>>> + buf[4] = gpu->pdev->bus->number;
>>> + buf[5] = (uint32_t)(kfd2kgd->get_vmem_size(gpu->kgd) & 0xffffffff);
>>> + buf[6] = (uint32_t)(kfd2kgd->get_vmem_size(gpu->kgd) >> 32);
>>> +
>>> + for (i = 0, hashout = 0; i < 7; i++)
>>> + hashout ^= hash_32(buf[i], KFD_GPU_ID_HASH_WIDTH);
>>> +
>>> + return hashout;
>>> +}
>>> +
>>> +static struct kfd_topology_device *kfd_assign_gpu(struct kfd_dev *gpu)
>>> +{
>>> + struct kfd_topology_device *dev;
>>> + struct kfd_topology_device *out_dev = 0;
>>> +
>>> + BUG_ON(!gpu);
>>> +
>>> + list_for_each_entry(dev, &topology_device_list, list)
>>> + if (dev->gpu == 0 && dev->node_props.simd_count > 0) {
>>> + dev->gpu = gpu;
>>> + out_dev = dev;
>>> + break;
>>> + }
>>> +
>>> + return out_dev;
>>> +}
>>> +
>>> +static void kfd_notify_gpu_change(uint32_t gpu_id, int arrival)
>>> +{
>>> + /*
>>> + * TODO: Generate an event for thunk about the arrival/removal
>>> + * of the GPU
>>> + */
>>> +}
>>> +
>>> +int kfd_topology_add_device(struct kfd_dev *gpu)
>>> +{
>>> + uint32_t gpu_id;
>>> + struct kfd_topology_device *dev;
>>> + int res;
>>> +
>>> + BUG_ON(!gpu);
>>> +
>>> + gpu_id = kfd_generate_gpu_id(gpu);
>>> +
>>> + pr_debug("kfd: Adding new GPU (ID: 0x%x) to topology\n", gpu_id);
>>> +
>>> + down_write(&topology_lock);
>>> + /*
>>> + * Try to assign the GPU to existing topology device (generated from
>>> + * CRAT table
>>> + */
>>> + dev = kfd_assign_gpu(gpu);
>>> + if (!dev) {
>>> + pr_info("GPU was not found in the current topology. Extending.\n");
>>> + kfd_debug_print_topology();
>>> + dev = kfd_create_topology_device();
>>> + if (!dev) {
>>> + res = -ENOMEM;
>>> + goto err;
>>> + }
>>> + dev->gpu = gpu;
>>> +
>>> + /*
>>> + * TODO: Make a call to retrieve topology information from the
>>> + * GPU vBIOS
>>> + */
>>> +
>>> + /*
>>> + * Update the SYSFS tree, since we added another topology device
>>> + */
>>> + if (kfd_topology_update_sysfs() < 0)
>>> + kfd_topology_release_sysfs();
>>> +
>>> + }
>>> +
>>> + dev->gpu_id = gpu_id;
>>> + gpu->id = gpu_id;
>>> + dev->node_props.vendor_id = gpu->pdev->vendor;
>>> + dev->node_props.device_id = gpu->pdev->device;
>>> + dev->node_props.location_id = (gpu->pdev->bus->number << 24) +
>>> + (gpu->pdev->devfn & 0xffffff);
>>> + /*
>>> + * TODO: Retrieve max engine clock values from KGD
>>> + */
>>> +
>>> + res = 0;
>>> +
>>> +err:
>>> + up_write(&topology_lock);
>>> +
>>> + if (res == 0)
>>> + kfd_notify_gpu_change(gpu_id, 1);
>>> +
>>> + return res;
>>> +}
>>> +
>>> +int kfd_topology_remove_device(struct kfd_dev *gpu)
>>> +{
>>> + struct kfd_topology_device *dev;
>>> + uint32_t gpu_id;
>>> + int res = -ENODEV;
>>> +
>>> + BUG_ON(!gpu);
>>> +
>>> + down_write(&topology_lock);
>>> +
>>> + list_for_each_entry(dev, &topology_device_list, list)
>>> + if (dev->gpu == gpu) {
>>> + gpu_id = dev->gpu_id;
>>> + kfd_remove_sysfs_node_entry(dev);
>>> + kfd_release_topology_device(dev);
>>> + res = 0;
>>> + if (kfd_topology_update_sysfs() < 0)
>>> + kfd_topology_release_sysfs();
>>> + break;
>>> + }
>>> +
>>> + up_write(&topology_lock);
>>> +
>>> + if (res == 0)
>>> + kfd_notify_gpu_change(gpu_id, 0);
>>> +
>>> + return res;
>>> +}
>>> +
>>> +/*
>>> + * When idx is out of bounds, the function will return NULL
>>> + */
>>> +struct kfd_dev *kfd_topology_enum_kfd_devices(uint8_t idx)
>>> +{
>>> +
>>> + struct kfd_topology_device *top_dev;
>>> + struct kfd_dev *device = NULL;
>>> + uint8_t device_idx = 0;
>>> +
>>> + down_read(&topology_lock);
>>> +
>>> + list_for_each_entry(top_dev, &topology_device_list, list) {
>>> + if (device_idx == idx) {
>>> + device = top_dev->gpu;
>>> + break;
>>> + }
>>> +
>>> + device_idx++;
>>> + }
>>> +
>>> + up_read(&topology_lock);
>>> +
>>> + return device;
>>> +
>>> +}
>>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_topology.h
>>> b/drivers/gpu/drm/radeon/amdkfd/kfd_topology.h
>>> new file mode 100644
>>> index 0000000..989624b
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_topology.h
>>> @@ -0,0 +1,168 @@
>>> +/*
>>> + * Copyright 2014 Advanced Micro Devices, Inc.
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>> + * copy of this software and associated documentation files (the "Software"),
>>> + * to deal in the Software without restriction, including without limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>> + * Software is furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice shall be included in
>>> + * all copies or substantial portions of the Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>>> + * OTHER DEALINGS IN THE SOFTWARE.
>>> + */
>>> +
>>> +#ifndef __KFD_TOPOLOGY_H__
>>> +#define __KFD_TOPOLOGY_H__
>>> +
>>> +#include <linux/types.h>
>>> +#include <linux/list.h>
>>> +#include "kfd_priv.h"
>>> +
>>> +#define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 128
>>> +
>>> +#define HSA_CAP_HOT_PLUGGABLE 0x00000001
>>> +#define HSA_CAP_ATS_PRESENT 0x00000002
>>> +#define HSA_CAP_SHARED_WITH_GRAPHICS 0x00000004
>>> +#define HSA_CAP_QUEUE_SIZE_POW2 0x00000008
>>> +#define HSA_CAP_QUEUE_SIZE_32BIT 0x00000010
>>> +#define HSA_CAP_QUEUE_IDLE_EVENT 0x00000020
>>> +#define HSA_CAP_VA_LIMIT 0x00000040
>>> +#define HSA_CAP_WATCH_POINTS_SUPPORTED 0x00000080
>>> +#define HSA_CAP_WATCH_POINTS_TOTALBITS_MASK 0x00000f00
>>> +#define HSA_CAP_WATCH_POINTS_TOTALBITS_SHIFT 8
>>> +#define HSA_CAP_RESERVED 0xfffff000
>>> +
>>> +struct kfd_node_properties {
>>> + uint32_t cpu_cores_count;
>>> + uint32_t simd_count;
>>> + uint32_t mem_banks_count;
>>> + uint32_t caches_count;
>>> + uint32_t io_links_count;
>>> + uint32_t cpu_core_id_base;
>>> + uint32_t simd_id_base;
>>> + uint32_t capability;
>>> + uint32_t max_waves_per_simd;
>>> + uint32_t lds_size_in_kb;
>>> + uint32_t gds_size_in_kb;
>>> + uint32_t wave_front_size;
>>> + uint32_t array_count;
>>> + uint32_t simd_arrays_per_engine;
>>> + uint32_t cu_per_simd_array;
>>> + uint32_t simd_per_cu;
>>> + uint32_t max_slots_scratch_cu;
>>> + uint32_t engine_id;
>>> + uint32_t vendor_id;
>>> + uint32_t device_id;
>>> + uint32_t location_id;
>>> + uint32_t max_engine_clk_fcompute;
>>> + uint32_t max_engine_clk_ccompute;
>>> + uint16_t marketing_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
>>> +};
>>> +
>>> +#define HSA_MEM_HEAP_TYPE_SYSTEM 0
>>> +#define HSA_MEM_HEAP_TYPE_FB_PUBLIC 1
>>> +#define HSA_MEM_HEAP_TYPE_FB_PRIVATE 2
>>> +#define HSA_MEM_HEAP_TYPE_GPU_GDS 3
>>> +#define HSA_MEM_HEAP_TYPE_GPU_LDS 4
>>> +#define HSA_MEM_HEAP_TYPE_GPU_SCRATCH 5
>>> +
>>> +#define HSA_MEM_FLAGS_HOT_PLUGGABLE 0x00000001
>>> +#define HSA_MEM_FLAGS_NON_VOLATILE 0x00000002
>>> +#define HSA_MEM_FLAGS_RESERVED 0xfffffffc
>>> +
>>> +struct kfd_mem_properties {
>>> + struct list_head list;
>>> + uint32_t heap_type;
>>> + uint64_t size_in_bytes;
>>> + uint32_t flags;
>>> + uint32_t width;
>>> + uint32_t mem_clk_max;
>>> + struct kobject *kobj;
>>> + struct attribute attr;
>>> +};
>>> +
>>> +#define KFD_TOPOLOGY_CPU_SIBLINGS 256
>>> +
>>> +#define HSA_CACHE_TYPE_DATA 0x00000001
>>> +#define HSA_CACHE_TYPE_INSTRUCTION 0x00000002
>>> +#define HSA_CACHE_TYPE_CPU 0x00000004
>>> +#define HSA_CACHE_TYPE_HSACU 0x00000008
>>> +#define HSA_CACHE_TYPE_RESERVED 0xfffffff0
>>> +
>>> +struct kfd_cache_properties {
>>> + struct list_head list;
>>> + uint32_t processor_id_low;
>>> + uint32_t cache_level;
>>> + uint32_t cache_size;
>>> + uint32_t cacheline_size;
>>> + uint32_t cachelines_per_tag;
>>> + uint32_t cache_assoc;
>>> + uint32_t cache_latency;
>>> + uint32_t cache_type;
>>> + uint8_t sibling_map[KFD_TOPOLOGY_CPU_SIBLINGS];
>>> + struct kobject *kobj;
>>> + struct attribute attr;
>>> +};
>>> +
>>> +struct kfd_iolink_properties {
>>> + struct list_head list;
>>> + uint32_t iolink_type;
>>> + uint32_t ver_maj;
>>> + uint32_t ver_min;
>>> + uint32_t node_from;
>>> + uint32_t node_to;
>>> + uint32_t weight;
>>> + uint32_t min_latency;
>>> + uint32_t max_latency;
>>> + uint32_t min_bandwidth;
>>> + uint32_t max_bandwidth;
>>> + uint32_t rec_transfer_size;
>>> + uint32_t flags;
>>> + struct kobject *kobj;
>>> + struct attribute attr;
>>> +};
>>> +
>>> +struct kfd_topology_device {
>>> + struct list_head list;
>>> + uint32_t gpu_id;
>>> + struct kfd_node_properties node_props;
>>> + uint32_t mem_bank_count;
>>> + struct list_head mem_props;
>>> + uint32_t cache_count;
>>> + struct list_head cache_props;
>>> + uint32_t io_link_count;
>>> + struct list_head io_link_props;
>>> + struct kfd_dev *gpu;
>>> + struct kobject *kobj_node;
>>> + struct kobject *kobj_mem;
>>> + struct kobject *kobj_cache;
>>> + struct kobject *kobj_iolink;
>>> + struct attribute attr_gpuid;
>>> + struct attribute attr_name;
>>> + struct attribute attr_props;
>>> +};
>>> +
>>> +struct kfd_system_properties {
>>> + uint32_t num_devices; /* Number of H-NUMA nodes */
>>> + uint32_t generation_count;
>>> + uint64_t platform_oem;
>>> + uint64_t platform_id;
>>> + uint64_t platform_rev;
>>> + struct kobject *kobj_topology;
>>> + struct kobject *kobj_nodes;
>>> + struct attribute attr_genid;
>>> + struct attribute attr_props;
>>> +};
>>> +
>>> +
>>> +
>>> +#endif /* __KFD_TOPOLOGY_H__ */
>>> --
>>> 1.9.1
>>>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 10/25] amdkfd: Add topology module to amdkfd
2014-07-20 22:37 ` Jerome Glisse
2014-07-27 11:15 ` Oded Gabbay
@ 2014-07-27 11:26 ` Oded Gabbay
1 sibling, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-07-27 11:26 UTC (permalink / raw)
To: Jerome Glisse
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, dri-devel,
Alexey Skidanov, Andrew Morton
On 21/07/14 01:37, Jerome Glisse wrote:
> On Thu, Jul 17, 2014 at 04:29:17PM +0300, Oded Gabbay wrote:
>> From: Evgeny Pinchuk <evgeny.pinchuk@amd.com>
>>
>> This patch adds the topology module to the driver. The topology is exposed to
>> userspace through the sysfs.
>>
>> The calls to add and remove a device to/from topology are done by the radeon
>> driver.
>
> So overall we already said that we do not want to see the cpu architecture
> re-expose by hsa in its own format. This pacth is NACK. Only expose additional
> non existent information and also follow the number one rules of sysfs which
> is one value -> one file.
>
> I understand the temptation to rexpose the cpu topology in your own way to
> make life simpler but there is already api for this so please use what exist
> today and if there is short coming than i am sure they can be fixed.
>
> See :
>
> /sys/devices/system/cpu/cpu*
>
>>
>> Signed-off-by: Evgeny Pinchuk <evgeny.pinchuk@amd.com>
>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>> ---
>> drivers/gpu/drm/radeon/amdkfd/Makefile | 2 +-
>> drivers/gpu/drm/radeon/amdkfd/kfd_crat.h | 294 +++++++
>> drivers/gpu/drm/radeon/amdkfd/kfd_device.c | 7 +
>> drivers/gpu/drm/radeon/amdkfd/kfd_module.c | 7 +
>> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 17 +
>> drivers/gpu/drm/radeon/amdkfd/kfd_topology.c | 1207 ++++++++++++++++++++++++++
>> drivers/gpu/drm/radeon/amdkfd/kfd_topology.h | 168 ++++
>> 7 files changed, 1701 insertions(+), 1 deletion(-)
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_crat.h
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_topology.c
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_topology.h
>>
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
>> index 9564e75..08ecfcd 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/Makefile
>> +++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
>> @@ -4,6 +4,6 @@
>>
>> ccflags-y := -Iinclude/drm
>>
>> -amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o
>> +amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o
>>
>> obj-$(CONFIG_HSA_RADEON) += amdkfd.o
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_crat.h b/drivers/gpu/drm/radeon/amdkfd/kfd_crat.h
>> new file mode 100644
>> index 0000000..a374fa3
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_crat.h
>> @@ -0,0 +1,294 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#ifndef KFD_CRAT_H_INCLUDED
>> +#define KFD_CRAT_H_INCLUDED
>> +
>> +#include <linux/types.h>
>> +
>> +#pragma pack(1)
>
> No pragma
>
Why no pragma?
The structures here describe H/W tables in the BIOS, so we must pack them with
pragma pack(1). See, for example, the following line in the file atombios.h in radeon:
#pragma pack(1) /* BIOS data must use byte aligment */
Oded
>> +
>> +/*
>> + * 4CC signature values for the CRAT and CDIT ACPI tables
>> + */
>> +
>> +#define CRAT_SIGNATURE "CRAT"
>> +#define CDIT_SIGNATURE "CDIT"
>> +
>> +/*
>> + * Component Resource Association Table (CRAT)
>> + */
>> +
>> +#define CRAT_OEMID_LENGTH 6
>> +#define CRAT_OEMTABLEID_LENGTH 8
>> +#define CRAT_RESERVED_LENGTH 6
>> +
>> +#define CRAT_OEMID_64BIT_MASK ((1ULL << (CRAT_OEMID_LENGTH * 8)) - 1)
>> +
>> +struct crat_header {
>> + uint32_t signature;
>> + uint32_t length;
>> + uint8_t revision;
>> + uint8_t checksum;
>> + uint8_t oem_id[CRAT_OEMID_LENGTH];
>> + uint8_t oem_table_id[CRAT_OEMTABLEID_LENGTH];
>> + uint32_t oem_revision;
>> + uint32_t creator_id;
>> + uint32_t creator_revision;
>> + uint32_t total_entries;
>> + uint16_t num_domains;
>> + uint8_t reserved[CRAT_RESERVED_LENGTH];
>> +};
>> +
>> +/*
>> + * The header structure is immediately followed by total_entries of the
>> + * data definitions
>> + */
>> +
>> +/*
>> + * The currently defined subtype entries in the CRAT
>> + */
>> +#define CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY 0
>> +#define CRAT_SUBTYPE_MEMORY_AFFINITY 1
>> +#define CRAT_SUBTYPE_CACHE_AFFINITY 2
>> +#define CRAT_SUBTYPE_TLB_AFFINITY 3
>> +#define CRAT_SUBTYPE_CCOMPUTE_AFFINITY 4
>> +#define CRAT_SUBTYPE_IOLINK_AFFINITY 5
>> +#define CRAT_SUBTYPE_MAX 6
>> +
>> +#define CRAT_SIBLINGMAP_SIZE 32
>> +
>> +/*
>> + * ComputeUnit Affinity structure and definitions
>> + */
>> +#define CRAT_CU_FLAGS_ENABLED 0x00000001
>> +#define CRAT_CU_FLAGS_HOT_PLUGGABLE 0x00000002
>> +#define CRAT_CU_FLAGS_CPU_PRESENT 0x00000004
>> +#define CRAT_CU_FLAGS_GPU_PRESENT 0x00000008
>> +#define CRAT_CU_FLAGS_IOMMU_PRESENT 0x00000010
>> +#define CRAT_CU_FLAGS_RESERVED 0xffffffe0
>> +
>> +#define CRAT_COMPUTEUNIT_RESERVED_LENGTH 4
>> +
>> +struct crat_subtype_computeunit {
>> + uint8_t type;
>> + uint8_t length;
>> + uint16_t reserved;
>> + uint32_t flags;
>> + uint32_t proximity_domain;
>> + uint32_t processor_id_low;
>> + uint16_t num_cpu_cores;
>> + uint16_t num_simd_cores;
>> + uint16_t max_waves_simd;
>> + uint16_t io_count;
>> + uint16_t hsa_capability;
>> + uint16_t lds_size_in_kb;
>> + uint8_t wave_front_size;
>> + uint8_t num_banks;
>> + uint16_t micro_engine_id;
>> + uint8_t num_arrays;
>> + uint8_t num_cu_per_array;
>> + uint8_t num_simd_per_cu;
>> + uint8_t max_slots_scatch_cu;
>> + uint8_t reserved2[CRAT_COMPUTEUNIT_RESERVED_LENGTH];
>> +};
>> +
>> +/*
>> + * HSA Memory Affinity structure and definitions
>> + */
>> +#define CRAT_MEM_FLAGS_ENABLED 0x00000001
>> +#define CRAT_MEM_FLAGS_HOT_PLUGGABLE 0x00000002
>> +#define CRAT_MEM_FLAGS_NON_VOLATILE 0x00000004
>> +#define CRAT_MEM_FLAGS_RESERVED 0xfffffff8
>> +
>> +#define CRAT_MEMORY_RESERVED_LENGTH 8
>> +
>> +struct crat_subtype_memory {
>> + uint8_t type;
>> + uint8_t length;
>> + uint16_t reserved;
>> + uint32_t flags;
>> + uint32_t promixity_domain;
>> + uint32_t base_addr_low;
>> + uint32_t base_addr_high;
>> + uint32_t length_low;
>> + uint32_t length_high;
>> + uint32_t width;
>> + uint8_t reserved2[CRAT_MEMORY_RESERVED_LENGTH];
>> +};
>> +
>> +/*
>> + * HSA Cache Affinity structure and definitions
>> + */
>> +#define CRAT_CACHE_FLAGS_ENABLED 0x00000001
>> +#define CRAT_CACHE_FLAGS_DATA_CACHE 0x00000002
>> +#define CRAT_CACHE_FLAGS_INST_CACHE 0x00000004
>> +#define CRAT_CACHE_FLAGS_CPU_CACHE 0x00000008
>> +#define CRAT_CACHE_FLAGS_SIMD_CACHE 0x00000010
>> +#define CRAT_CACHE_FLAGS_RESERVED 0xffffffe0
>> +
>> +#define CRAT_CACHE_RESERVED_LENGTH 8
>> +
>> +struct crat_subtype_cache {
>> + uint8_t type;
>> + uint8_t length;
>> + uint16_t reserved;
>> + uint32_t flags;
>> + uint32_t processor_id_low;
>> + uint8_t sibling_map[CRAT_SIBLINGMAP_SIZE];
>> + uint32_t cache_size;
>> + uint8_t cache_level;
>> + uint8_t lines_per_tag;
>> + uint16_t cache_line_size;
>> + uint8_t associativity;
>> + uint8_t cache_properties;
>> + uint16_t cache_latency;
>> + uint8_t reserved2[CRAT_CACHE_RESERVED_LENGTH];
>> +};
>> +
>> +/*
>> + * HSA TLB Affinity structure and definitions
>> + */
>> +#define CRAT_TLB_FLAGS_ENABLED 0x00000001
>> +#define CRAT_TLB_FLAGS_DATA_TLB 0x00000002
>> +#define CRAT_TLB_FLAGS_INST_TLB 0x00000004
>> +#define CRAT_TLB_FLAGS_CPU_TLB 0x00000008
>> +#define CRAT_TLB_FLAGS_SIMD_TLB 0x00000010
>> +#define CRAT_TLB_FLAGS_RESERVED 0xffffffe0
>> +
>> +#define CRAT_TLB_RESERVED_LENGTH 4
>> +
>> +struct crat_subtype_tlb {
>> + uint8_t type;
>> + uint8_t length;
>> + uint16_t reserved;
>> + uint32_t flags;
>> + uint32_t processor_id_low;
>> + uint8_t sibling_map[CRAT_SIBLINGMAP_SIZE];
>> + uint32_t tlb_level;
>> + uint8_t data_tlb_associativity_2mb;
>> + uint8_t data_tlb_size_2mb;
>> + uint8_t instruction_tlb_associativity_2mb;
>> + uint8_t instruction_tlb_size_2mb;
>> + uint8_t data_tlb_associativity_4k;
>> + uint8_t data_tlb_size_4k;
>> + uint8_t instruction_tlb_associativity_4k;
>> + uint8_t instruction_tlb_size_4k;
>> + uint8_t data_tlb_associativity_1gb;
>> + uint8_t data_tlb_size_1gb;
>> + uint8_t instruction_tlb_associativity_1gb;
>> + uint8_t instruction_tlb_size_1gb;
>> + uint8_t reserved2[CRAT_TLB_RESERVED_LENGTH];
>> +};
>> +
>> +/*
>> + * HSA CCompute/APU Affinity structure and definitions
>> + */
>> +#define CRAT_CCOMPUTE_FLAGS_ENABLED 0x00000001
>> +#define CRAT_CCOMPUTE_FLAGS_RESERVED 0xfffffffe
>> +
>> +#define CRAT_CCOMPUTE_RESERVED_LENGTH 16
>> +
>> +struct crat_subtype_ccompute {
>> + uint8_t type;
>> + uint8_t length;
>> + uint16_t reserved;
>> + uint32_t flags;
>> + uint32_t processor_id_low;
>> + uint8_t sibling_map[CRAT_SIBLINGMAP_SIZE];
>> + uint32_t apu_size;
>> + uint8_t reserved2[CRAT_CCOMPUTE_RESERVED_LENGTH];
>> +};
>> +
>> +/*
>> + * HSA IO Link Affinity structure and definitions
>> + */
>> +#define CRAT_IOLINK_FLAGS_ENABLED 0x00000001
>> +#define CRAT_IOLINK_FLAGS_COHERENCY 0x00000002
>> +#define CRAT_IOLINK_FLAGS_RESERVED 0xfffffffc
>> +
>> +/*
>> + * IO interface types
>> + */
>> +#define CRAT_IOLINK_TYPE_UNDEFINED 0
>> +#define CRAT_IOLINK_TYPE_HYPERTRANSPORT 1
>> +#define CRAT_IOLINK_TYPE_PCIEXPRESS 2
>> +#define CRAT_IOLINK_TYPE_OTHER 3
>> +#define CRAT_IOLINK_TYPE_MAX 255
>> +
>> +#define CRAT_IOLINK_RESERVED_LENGTH 24
>> +
>> +struct crat_subtype_iolink {
>> + uint8_t type;
>> + uint8_t length;
>> + uint16_t reserved;
>> + uint32_t flags;
>> + uint32_t proximity_domain_from;
>> + uint32_t proximity_domain_to;
>> + uint8_t io_interface_type;
>> + uint8_t version_major;
>> + uint16_t version_minor;
>> + uint32_t minimum_latency;
>> + uint32_t maximum_latency;
>> + uint32_t minimum_bandwidth_mbs;
>> + uint32_t maximum_bandwidth_mbs;
>> + uint32_t recommended_transfer_size;
>> + uint8_t reserved2[CRAT_IOLINK_RESERVED_LENGTH];
>> +};
>> +
>> +/*
>> + * HSA generic sub-type header
>> + */
>> +
>> +#define CRAT_SUBTYPE_FLAGS_ENABLED 0x00000001
>> +
>> +struct crat_subtype_generic {
>> + uint8_t type;
>> + uint8_t length;
>> + uint16_t reserved;
>> + uint32_t flags;
>> +};
>> +
>> +/*
>> + * Component Locality Distance Information Table (CDIT)
>> + */
>> +#define CDIT_OEMID_LENGTH 6
>> +#define CDIT_OEMTABLEID_LENGTH 8
>> +
>> +struct cdit_header {
>> + uint32_t signature;
>> + uint32_t length;
>> + uint8_t revision;
>> + uint8_t checksum;
>> + uint8_t oem_id[CDIT_OEMID_LENGTH];
>> + uint8_t oem_table_id[CDIT_OEMTABLEID_LENGTH];
>> + uint32_t oem_revision;
>> + uint32_t creator_id;
>> + uint32_t creator_revision;
>> + uint32_t total_entries;
>> + uint16_t num_domains;
>> + uint8_t entry[1];
>> +};
>> +
>> +#pragma pack()
>> +
>> +#endif /* KFD_CRAT_H_INCLUDED */
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>> index dd63ce09..4138694 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>> @@ -100,6 +100,9 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>> {
>> kfd->shared_resources = *gpu_resources;
>>
>> + if (kfd_topology_add_device(kfd) != 0)
>> + return false;
>> +
>> kfd->init_complete = true;
>> dev_info(kfd_device, "added device (%x:%x)\n", kfd->pdev->vendor,
>> kfd->pdev->device);
>> @@ -109,6 +112,10 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>>
>> void kgd2kfd_device_exit(struct kfd_dev *kfd)
>> {
>> + int err = kfd_topology_remove_device(kfd);
>> +
>> + BUG_ON(err != 0);
>> +
>> kfree(kfd);
>> }
>>
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>> index c7faac6..c51f981 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>> @@ -73,16 +73,23 @@ static int __init kfd_module_init(void)
>> if (err < 0)
>> goto err_ioctl;
>>
>> + err = kfd_topology_init();
>> + if (err < 0)
>> + goto err_topology;
>> +
>> dev_info(kfd_device, "Initialized module\n");
>>
>> return 0;
>>
>> +err_topology:
>> + kfd_chardev_exit();
>> err_ioctl:
>> return err;
>> }
>>
>> static void __exit kfd_module_exit(void)
>> {
>> + kfd_topology_shutdown();
>> kfd_chardev_exit();
>> dev_info(kfd_device, "Removed module\n");
>> }
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> index 05e892f..b391e24 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> @@ -32,6 +32,14 @@
>> #include <linux/spinlock.h>
>> #include "../radeon_kfd.h"
>>
>> +#define KFD_SYSFS_FILE_MODE 0444
>> +
>> +/* GPU ID hash width in bits */
>> +#define KFD_GPU_ID_HASH_WIDTH 16
>> +
>> +/* Macro for allocating structures */
>> +#define kfd_alloc_struct(ptr_to_struct) ((typeof(ptr_to_struct)) kzalloc(sizeof(*ptr_to_struct), GFP_KERNEL))
>> +
>> struct kfd_device_info {
>> const struct kfd_scheduler_class *scheduler_class;
>> unsigned int max_pasid_bits;
>> @@ -71,6 +79,15 @@ struct kfd_process {
>>
>> extern struct device *kfd_device;
>>
>> +/* Topology */
>> +int kfd_topology_init(void);
>> +void kfd_topology_shutdown(void);
>> +int kfd_topology_add_device(struct kfd_dev *gpu);
>> +int kfd_topology_remove_device(struct kfd_dev *gpu);
>> +struct kfd_dev *kfd_device_by_id(uint32_t gpu_id);
>> +struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev);
>> +struct kfd_dev *kfd_topology_enum_kfd_devices(uint8_t idx);
>> +
>> /* Interrupts */
>> void kgd2kfd_interrupt(struct kfd_dev *dev, const void *ih_ring_entry);
>>
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_topology.c b/drivers/gpu/drm/radeon/amdkfd/kfd_topology.c
>> new file mode 100644
>> index 0000000..30da4c3
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_topology.c
>> @@ -0,0 +1,1207 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#include <linux/types.h>
>> +#include <linux/kernel.h>
>> +#include <linux/pci.h>
>> +#include <linux/errno.h>
>> +#include <linux/acpi.h>
>> +#include <linux/hash.h>
>> +#include <linux/cpufreq.h>
>> +
>> +#include "kfd_priv.h"
>> +#include "kfd_crat.h"
>> +#include "kfd_topology.h"
>> +
>> +static struct list_head topology_device_list;
>> +static int topology_crat_parsed;
>> +static struct kfd_system_properties sys_props;
>> +
>> +static DECLARE_RWSEM(topology_lock);
>> +
>> +struct kfd_dev *kfd_device_by_id(uint32_t gpu_id)
>> +{
>> + struct kfd_topology_device *top_dev;
>> + struct kfd_dev *device = NULL;
>> +
>> + down_read(&topology_lock);
>> +
>> + list_for_each_entry(top_dev, &topology_device_list, list)
>> + if (top_dev->gpu_id == gpu_id) {
>> + device = top_dev->gpu;
>> + break;
>> + }
>> +
>> + up_read(&topology_lock);
>> +
>> + return device;
>> +}
>> +
>> +struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev)
>> +{
>> + struct kfd_topology_device *top_dev;
>> + struct kfd_dev *device = NULL;
>> +
>> + down_read(&topology_lock);
>> +
>> + list_for_each_entry(top_dev, &topology_device_list, list)
>> + if (top_dev->gpu->pdev == pdev) {
>> + device = top_dev->gpu;
>> + break;
>> + }
>> +
>> + up_read(&topology_lock);
>> +
>> + return device;
>> +}
>> +
>> +static int kfd_topology_get_crat_acpi(void *crat_image, size_t *size)
>> +{
>> + struct acpi_table_header *crat_table;
>> + acpi_status status;
>> +
>> + if (!size)
>> + return -EINVAL;
>> +
>> + /*
>> + * Fetch the CRAT table from ACPI
>> + */
>> + status = acpi_get_table(CRAT_SIGNATURE, 0, &crat_table);
>> + if (status == AE_NOT_FOUND) {
>> + pr_warn("CRAT table not found\n");
>> + return -ENODATA;
>> + } else if (ACPI_FAILURE(status)) {
>> + const char *err = acpi_format_exception(status);
>> +
>> + pr_err("CRAT table error: %s\n", err);
>> + return -EINVAL;
>> + }
>> +
>> + if (*size >= crat_table->length && crat_image != 0)
>> + memcpy(crat_image, crat_table, crat_table->length);
>> +
>> + *size = crat_table->length;
>> +
>> + return 0;
>> +}
>> +
>> +static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
>> + struct crat_subtype_computeunit *cu)
>> +{
>> + BUG_ON(!dev);
>> + BUG_ON(!cu);
>> +
>> + dev->node_props.cpu_cores_count = cu->num_cpu_cores;
>> + dev->node_props.cpu_core_id_base = cu->processor_id_low;
>> + if (cu->hsa_capability & CRAT_CU_FLAGS_IOMMU_PRESENT)
>> + dev->node_props.capability |= HSA_CAP_ATS_PRESENT;
>> +
>> + pr_info("CU CPU: cores=%d id_base=%d\n", cu->num_cpu_cores,
>> + cu->processor_id_low);
>> +}
>> +
>> +static void kfd_populated_cu_info_gpu(struct kfd_topology_device *dev,
>> + struct crat_subtype_computeunit *cu)
>> +{
>> + BUG_ON(!dev);
>> + BUG_ON(!cu);
>> +
>> + dev->node_props.simd_id_base = cu->processor_id_low;
>> + dev->node_props.simd_count = cu->num_simd_cores;
>> + dev->node_props.lds_size_in_kb = cu->lds_size_in_kb;
>> + dev->node_props.max_waves_per_simd = cu->max_waves_simd;
>> + dev->node_props.wave_front_size = cu->wave_front_size;
>> + dev->node_props.mem_banks_count = cu->num_banks;
>> + dev->node_props.array_count = cu->num_arrays;
>> + dev->node_props.cu_per_simd_array = cu->num_cu_per_array;
>> + dev->node_props.simd_per_cu = cu->num_simd_per_cu;
>> + dev->node_props.max_slots_scratch_cu = cu->max_slots_scatch_cu;
>> + if (cu->hsa_capability & CRAT_CU_FLAGS_HOT_PLUGGABLE)
>> + dev->node_props.capability |= HSA_CAP_HOT_PLUGGABLE;
>> + pr_info("CU GPU: simds=%d id_base=%d\n", cu->num_simd_cores,
>> + cu->processor_id_low);
>> +}
>> +
>> +/* kfd_parse_subtype_cu is called when the topology mutex is already acquired */
>> +static int kfd_parse_subtype_cu(struct crat_subtype_computeunit *cu)
>> +{
>> + struct kfd_topology_device *dev;
>> + int i = 0;
>> +
>> + BUG_ON(!cu);
>> +
>> + pr_info("Found CU entry in CRAT table with proximity_domain=%d caps=%x\n",
>> + cu->proximity_domain, cu->hsa_capability);
>> + list_for_each_entry(dev, &topology_device_list, list) {
>> + if (cu->proximity_domain == i) {
>> + if (cu->flags & CRAT_CU_FLAGS_CPU_PRESENT)
>> + kfd_populated_cu_info_cpu(dev, cu);
>> +
>> + if (cu->flags & CRAT_CU_FLAGS_GPU_PRESENT)
>> + kfd_populated_cu_info_gpu(dev, cu);
>> + break;
>> + }
>> + i++;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/* kfd_parse_subtype_mem is called when the topology mutex is already acquired */
>> +static int kfd_parse_subtype_mem(struct crat_subtype_memory *mem)
>> +{
>> + struct kfd_mem_properties *props;
>> + struct kfd_topology_device *dev;
>> + int i = 0;
>> +
>> + BUG_ON(!mem);
>> +
>> + pr_info("Found memory entry in CRAT table with proximity_domain=%d\n",
>> + mem->promixity_domain);
>> + list_for_each_entry(dev, &topology_device_list, list) {
>> + if (mem->promixity_domain == i) {
>> + props = kfd_alloc_struct(props);
>> + if (props == 0)
>> + return -ENOMEM;
>> +
>> + if (dev->node_props.cpu_cores_count == 0)
>> + props->heap_type = HSA_MEM_HEAP_TYPE_FB_PRIVATE;
>> + else
>> + props->heap_type = HSA_MEM_HEAP_TYPE_SYSTEM;
>> +
>> + if (mem->flags & CRAT_MEM_FLAGS_HOT_PLUGGABLE)
>> + props->flags |= HSA_MEM_FLAGS_HOT_PLUGGABLE;
>> + if (mem->flags & CRAT_MEM_FLAGS_NON_VOLATILE)
>> + props->flags |= HSA_MEM_FLAGS_NON_VOLATILE;
>> +
>> + props->size_in_bytes = ((uint64_t)mem->length_high << 32) +
>> + mem->length_low;
>> + props->width = mem->width;
>> +
>> + dev->mem_bank_count++;
>> + list_add_tail(&props->list, &dev->mem_props);
>> +
>> + break;
>> + }
>> + i++;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/* kfd_parse_subtype_cache is called when the topology mutex is already acquired */
>> +static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache)
>> +{
>> + struct kfd_cache_properties *props;
>> + struct kfd_topology_device *dev;
>> + uint32_t id;
>> +
>> + BUG_ON(!cache);
>> +
>> + id = cache->processor_id_low;
>> +
>> + pr_info("Found cache entry in CRAT table with processor_id=%d\n", id);
>> + list_for_each_entry(dev, &topology_device_list, list)
>> + if (id == dev->node_props.cpu_core_id_base ||
>> + id == dev->node_props.simd_id_base) {
>> + props = kfd_alloc_struct(props);
>> + if (props == 0)
>> + return -ENOMEM;
>> +
>> + props->processor_id_low = id;
>> + props->cache_level = cache->cache_level;
>> + props->cache_size = cache->cache_size;
>> + props->cacheline_size = cache->cache_line_size;
>> + props->cachelines_per_tag = cache->lines_per_tag;
>> + props->cache_assoc = cache->associativity;
>> + props->cache_latency = cache->cache_latency;
>> +
>> + if (cache->flags & CRAT_CACHE_FLAGS_DATA_CACHE)
>> + props->cache_type |= HSA_CACHE_TYPE_DATA;
>> + if (cache->flags & CRAT_CACHE_FLAGS_INST_CACHE)
>> + props->cache_type |= HSA_CACHE_TYPE_INSTRUCTION;
>> + if (cache->flags & CRAT_CACHE_FLAGS_CPU_CACHE)
>> + props->cache_type |= HSA_CACHE_TYPE_CPU;
>> + if (cache->flags & CRAT_CACHE_FLAGS_SIMD_CACHE)
>> + props->cache_type |= HSA_CACHE_TYPE_HSACU;
>> +
>> + dev->cache_count++;
>> + dev->node_props.caches_count++;
>> + list_add_tail(&props->list, &dev->cache_props);
>> +
>> + break;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/* kfd_parse_subtype_iolink is called when the topology mutex is already acquired */
>> +static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink)
>> +{
>> + struct kfd_iolink_properties *props;
>> + struct kfd_topology_device *dev;
>> + uint32_t i = 0;
>> + uint32_t id_from;
>> + uint32_t id_to;
>> +
>> + BUG_ON(!iolink);
>> +
>> + id_from = iolink->proximity_domain_from;
>> + id_to = iolink->proximity_domain_to;
>> +
>> + pr_info("Found IO link entry in CRAT table with id_from=%d\n", id_from);
>> + list_for_each_entry(dev, &topology_device_list, list) {
>> + if (id_from == i) {
>> + props = kfd_alloc_struct(props);
>> + if (props == 0)
>> + return -ENOMEM;
>> +
>> + props->node_from = id_from;
>> + props->node_to = id_to;
>> + props->ver_maj = iolink->version_major;
>> + props->ver_min = iolink->version_minor;
>> +
>> + /*
>> + * weight factor (derived from CDIR), currently always 1
>> + */
>> + props->weight = 1;
>> +
>> + props->min_latency = iolink->minimum_latency;
>> + props->max_latency = iolink->maximum_latency;
>> + props->min_bandwidth = iolink->minimum_bandwidth_mbs;
>> + props->max_bandwidth = iolink->maximum_bandwidth_mbs;
>> + props->rec_transfer_size =
>> + iolink->recommended_transfer_size;
>> +
>> + dev->io_link_count++;
>> + dev->node_props.io_links_count++;
>> + list_add_tail(&props->list, &dev->io_link_props);
>> +
>> + break;
>> + }
>> + i++;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int kfd_parse_subtype(struct crat_subtype_generic *sub_type_hdr)
>> +{
>> + struct crat_subtype_computeunit *cu;
>> + struct crat_subtype_memory *mem;
>> + struct crat_subtype_cache *cache;
>> + struct crat_subtype_iolink *iolink;
>> + int ret = 0;
>> +
>> + BUG_ON(!sub_type_hdr);
>> +
>> + switch (sub_type_hdr->type) {
>> + case CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY:
>> + cu = (struct crat_subtype_computeunit *)sub_type_hdr;
>> + ret = kfd_parse_subtype_cu(cu);
>> + break;
>> + case CRAT_SUBTYPE_MEMORY_AFFINITY:
>> + mem = (struct crat_subtype_memory *)sub_type_hdr;
>> + ret = kfd_parse_subtype_mem(mem);
>> + break;
>> + case CRAT_SUBTYPE_CACHE_AFFINITY:
>> + cache = (struct crat_subtype_cache *)sub_type_hdr;
>> + ret = kfd_parse_subtype_cache(cache);
>> + break;
>> + case CRAT_SUBTYPE_TLB_AFFINITY:
>> + /*
>> + * For now, nothing to do here
>> + */
>> + pr_info("Found TLB entry in CRAT table (not processing)\n");
>> + break;
>> + case CRAT_SUBTYPE_CCOMPUTE_AFFINITY:
>> + /*
>> + * For now, nothing to do here
>> + */
>> + pr_info("Found CCOMPUTE entry in CRAT table (not processing)\n");
>> + break;
>> + case CRAT_SUBTYPE_IOLINK_AFFINITY:
>> + iolink = (struct crat_subtype_iolink *)sub_type_hdr;
>> + ret = kfd_parse_subtype_iolink(iolink);
>> + break;
>> + default:
>> + pr_warn("Unknown subtype (%d) in CRAT\n",
>> + sub_type_hdr->type);
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +static void kfd_release_topology_device(struct kfd_topology_device *dev)
>> +{
>> + struct kfd_mem_properties *mem;
>> + struct kfd_cache_properties *cache;
>> + struct kfd_iolink_properties *iolink;
>> +
>> + BUG_ON(!dev);
>> +
>> + list_del(&dev->list);
>> +
>> + while (dev->mem_props.next != &dev->mem_props) {
>> + mem = container_of(dev->mem_props.next,
>> + struct kfd_mem_properties, list);
>> + list_del(&mem->list);
>> + kfree(mem);
>> + }
>> +
>> + while (dev->cache_props.next != &dev->cache_props) {
>> + cache = container_of(dev->cache_props.next,
>> + struct kfd_cache_properties, list);
>> + list_del(&cache->list);
>> + kfree(cache);
>> + }
>> +
>> + while (dev->io_link_props.next != &dev->io_link_props) {
>> + iolink = container_of(dev->io_link_props.next,
>> + struct kfd_iolink_properties, list);
>> + list_del(&iolink->list);
>> + kfree(iolink);
>> + }
>> +
>> + kfree(dev);
>> +
>> + sys_props.num_devices--;
>> +}
>> +
>> +static void kfd_release_live_view(void)
>> +{
>> + struct kfd_topology_device *dev;
>> +
>> + while (topology_device_list.next != &topology_device_list) {
>> + dev = container_of(topology_device_list.next,
>> + struct kfd_topology_device, list);
>> + kfd_release_topology_device(dev);
>> +}
>> +
>> + memset(&sys_props, 0, sizeof(sys_props));
>> +}
>> +
>> +static struct kfd_topology_device *kfd_create_topology_device(void)
>> +{
>> + struct kfd_topology_device *dev;
>> +
>> + dev = kfd_alloc_struct(dev);
>> + if (dev == 0) {
>> + pr_err("No memory to allocate a topology device");
>> + return 0;
>> + }
>> +
>> + INIT_LIST_HEAD(&dev->mem_props);
>> + INIT_LIST_HEAD(&dev->cache_props);
>> + INIT_LIST_HEAD(&dev->io_link_props);
>> +
>> + list_add_tail(&dev->list, &topology_device_list);
>> + sys_props.num_devices++;
>> +
>> + return dev;
>> + }
>> +
>> +static int kfd_parse_crat_table(void *crat_image)
>> +{
>> + struct kfd_topology_device *top_dev;
>> + struct crat_subtype_generic *sub_type_hdr;
>> + uint16_t node_id;
>> + int ret;
>> + struct crat_header *crat_table = (struct crat_header *)crat_image;
>> + uint16_t num_nodes;
>> + uint32_t image_len;
>> +
>> + if (!crat_image)
>> + return -EINVAL;
>> +
>> + num_nodes = crat_table->num_domains;
>> + image_len = crat_table->length;
>> +
>> + pr_info("Parsing CRAT table with %d nodes\n", num_nodes);
>> +
>> + for (node_id = 0; node_id < num_nodes; node_id++) {
>> + top_dev = kfd_create_topology_device();
>> + if (!top_dev) {
>> + kfd_release_live_view();
>> + return -ENOMEM;
>> + }
>> + }
>> +
>> + sys_props.platform_id = (*((uint64_t *)crat_table->oem_id)) & CRAT_OEMID_64BIT_MASK;
>> + sys_props.platform_oem = *((uint64_t *)crat_table->oem_table_id);
>> + sys_props.platform_rev = crat_table->revision;
>> +
>> + sub_type_hdr = (struct crat_subtype_generic *)(crat_table+1);
>> + while ((char *)sub_type_hdr + sizeof(struct crat_subtype_generic) <
>> + ((char *)crat_image) + image_len) {
>> + if (sub_type_hdr->flags & CRAT_SUBTYPE_FLAGS_ENABLED) {
>> + ret = kfd_parse_subtype(sub_type_hdr);
>> + if (ret != 0) {
>> + kfd_release_live_view();
>> + return ret;
>> + }
>> + }
>> +
>> + sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
>> + sub_type_hdr->length);
>> + }
>> +
>> + sys_props.generation_count++;
>> + topology_crat_parsed = 1;
>> +
>> + return 0;
>> +}
>> +
>> +
>> +#define sysfs_show_gen_prop(buffer, fmt, ...) \
>> + snprintf(buffer, PAGE_SIZE, "%s"fmt, buffer, __VA_ARGS__)
>> +#define sysfs_show_32bit_prop(buffer, name, value) \
>> + sysfs_show_gen_prop(buffer, "%s %u\n", name, value)
>> +#define sysfs_show_64bit_prop(buffer, name, value) \
>> + sysfs_show_gen_prop(buffer, "%s %llu\n", name, value)
>> +#define sysfs_show_32bit_val(buffer, value) \
>> + sysfs_show_gen_prop(buffer, "%u\n", value)
>> +#define sysfs_show_str_val(buffer, value) \
>> + sysfs_show_gen_prop(buffer, "%s\n", value)
>> +
>> +static ssize_t sysprops_show(struct kobject *kobj, struct attribute *attr,
>> + char *buffer)
>> +{
>> + ssize_t ret;
>> +
>> + /* Making sure that the buffer is an empty string */
>> + buffer[0] = 0;
>> +
>> + if (attr == &sys_props.attr_genid) {
>> + ret = sysfs_show_32bit_val(buffer, sys_props.generation_count);
>> + } else if (attr == &sys_props.attr_props) {
>> + sysfs_show_64bit_prop(buffer, "platform_oem",
>> + sys_props.platform_oem);
>> + sysfs_show_64bit_prop(buffer, "platform_id",
>> + sys_props.platform_id);
>> + ret = sysfs_show_64bit_prop(buffer, "platform_rev",
>> + sys_props.platform_rev);
>> + } else {
>> + ret = -EINVAL;
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +static const struct sysfs_ops sysprops_ops = {
>> + .show = sysprops_show,
>> +};
>> +
>> +static struct kobj_type sysprops_type = {
>> + .sysfs_ops = &sysprops_ops,
>> +};
>> +
>> +static ssize_t iolink_show(struct kobject *kobj, struct attribute *attr,
>> + char *buffer)
>> +{
>> + ssize_t ret;
>> + struct kfd_iolink_properties *iolink;
>> +
>> + /* Making sure that the buffer is an empty string */
>> + buffer[0] = 0;
>> +
>> + iolink = container_of(attr, struct kfd_iolink_properties, attr);
>> + sysfs_show_32bit_prop(buffer, "type", iolink->iolink_type);
>> + sysfs_show_32bit_prop(buffer, "version_major", iolink->ver_maj);
>> + sysfs_show_32bit_prop(buffer, "version_minor", iolink->ver_min);
>> + sysfs_show_32bit_prop(buffer, "node_from", iolink->node_from);
>> + sysfs_show_32bit_prop(buffer, "node_to", iolink->node_to);
>> + sysfs_show_32bit_prop(buffer, "weight", iolink->weight);
>> + sysfs_show_32bit_prop(buffer, "min_latency", iolink->min_latency);
>> + sysfs_show_32bit_prop(buffer, "max_latency", iolink->max_latency);
>> + sysfs_show_32bit_prop(buffer, "min_bandwidth", iolink->min_bandwidth);
>> + sysfs_show_32bit_prop(buffer, "max_bandwidth", iolink->max_bandwidth);
>> + sysfs_show_32bit_prop(buffer, "recommended_transfer_size",
>> + iolink->rec_transfer_size);
>> + ret = sysfs_show_32bit_prop(buffer, "flags", iolink->flags);
>> +
>> + return ret;
>> +}
>> +
>> +static const struct sysfs_ops iolink_ops = {
>> + .show = iolink_show,
>> +};
>> +
>> +static struct kobj_type iolink_type = {
>> + .sysfs_ops = &iolink_ops,
>> +};
>> +
>> +static ssize_t mem_show(struct kobject *kobj, struct attribute *attr,
>> + char *buffer)
>> +{
>> + ssize_t ret;
>> + struct kfd_mem_properties *mem;
>> +
>> + /* Making sure that the buffer is an empty string */
>> + buffer[0] = 0;
>> +
>> + mem = container_of(attr, struct kfd_mem_properties, attr);
>> + sysfs_show_32bit_prop(buffer, "heap_type", mem->heap_type);
>> + sysfs_show_64bit_prop(buffer, "size_in_bytes", mem->size_in_bytes);
>> + sysfs_show_32bit_prop(buffer, "flags", mem->flags);
>> + sysfs_show_32bit_prop(buffer, "width", mem->width);
>> + ret = sysfs_show_32bit_prop(buffer, "mem_clk_max", mem->mem_clk_max);
>> +
>> + return ret;
>> +}
>> +
>> +static const struct sysfs_ops mem_ops = {
>> + .show = mem_show,
>> +};
>> +
>> +static struct kobj_type mem_type = {
>> + .sysfs_ops = &mem_ops,
>> +};
>> +
>> +static ssize_t kfd_cache_show(struct kobject *kobj, struct attribute *attr,
>> + char *buffer)
>> +{
>> + ssize_t ret;
>> + uint32_t i;
>> + struct kfd_cache_properties *cache;
>> +
>> + /* Making sure that the buffer is an empty string */
>> + buffer[0] = 0;
>> +
>> + cache = container_of(attr, struct kfd_cache_properties, attr);
>> + sysfs_show_32bit_prop(buffer, "processor_id_low",
>> + cache->processor_id_low);
>> + sysfs_show_32bit_prop(buffer, "level", cache->cache_level);
>> + sysfs_show_32bit_prop(buffer, "size", cache->cache_size);
>> + sysfs_show_32bit_prop(buffer, "cache_line_size", cache->cacheline_size);
>> + sysfs_show_32bit_prop(buffer, "cache_lines_per_tag",
>> + cache->cachelines_per_tag);
>> + sysfs_show_32bit_prop(buffer, "association", cache->cache_assoc);
>> + sysfs_show_32bit_prop(buffer, "latency", cache->cache_latency);
>> + sysfs_show_32bit_prop(buffer, "type", cache->cache_type);
>> + snprintf(buffer, PAGE_SIZE, "%ssibling_map ", buffer);
>> + for (i = 0; i < KFD_TOPOLOGY_CPU_SIBLINGS; i++)
>> + ret = snprintf(buffer, PAGE_SIZE, "%s%d%s",
>> + buffer, cache->sibling_map[i],
>> + (i == KFD_TOPOLOGY_CPU_SIBLINGS-1) ?
>> + "\n" : ",");
>> +
>> + return ret;
>> +}
>> +
>> +static const struct sysfs_ops cache_ops = {
>> + .show = kfd_cache_show,
>> +};
>> +
>> +static struct kobj_type cache_type = {
>> + .sysfs_ops = &cache_ops,
>> +};
>> +
>> +static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
>> + char *buffer)
>> +{
>> + ssize_t ret;
>> + struct kfd_topology_device *dev;
>> + char public_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
>> + uint32_t i;
>> +
>> + /* Making sure that the buffer is an empty string */
>> + buffer[0] = 0;
>> +
>> + if (strcmp(attr->name, "gpu_id") == 0) {
>> + dev = container_of(attr, struct kfd_topology_device,
>> + attr_gpuid);
>> + ret = sysfs_show_32bit_val(buffer, dev->gpu_id);
>> + } else if (strcmp(attr->name, "name") == 0) {
>> + dev = container_of(attr, struct kfd_topology_device,
>> + attr_name);
>> + for (i = 0; i < KFD_TOPOLOGY_PUBLIC_NAME_SIZE; i++) {
>> + public_name[i] =
>> + (char)dev->node_props.marketing_name[i];
>> + if (dev->node_props.marketing_name[i] == 0)
>> + break;
>> + }
>> + public_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE-1] = 0x0;
>> + ret = sysfs_show_str_val(buffer, public_name);
>> + } else {
>> + dev = container_of(attr, struct kfd_topology_device,
>> + attr_props);
>> + sysfs_show_32bit_prop(buffer, "cpu_cores_count",
>> + dev->node_props.cpu_cores_count);
>> + sysfs_show_32bit_prop(buffer, "simd_count",
>> + dev->node_props.simd_count);
>> + sysfs_show_32bit_prop(buffer, "mem_banks_count",
>> + dev->node_props.mem_banks_count);
>> + sysfs_show_32bit_prop(buffer, "caches_count",
>> + dev->node_props.caches_count);
>> + sysfs_show_32bit_prop(buffer, "io_links_count",
>> + dev->node_props.io_links_count);
>> + sysfs_show_32bit_prop(buffer, "cpu_core_id_base",
>> + dev->node_props.cpu_core_id_base);
>> + sysfs_show_32bit_prop(buffer, "simd_id_base",
>> + dev->node_props.simd_id_base);
>> + sysfs_show_32bit_prop(buffer, "capability",
>> + dev->node_props.capability);
>> + sysfs_show_32bit_prop(buffer, "max_waves_per_simd",
>> + dev->node_props.max_waves_per_simd);
>> + sysfs_show_32bit_prop(buffer, "lds_size_in_kb",
>> + dev->node_props.lds_size_in_kb);
>> + sysfs_show_32bit_prop(buffer, "gds_size_in_kb",
>> + dev->node_props.gds_size_in_kb);
>> + sysfs_show_32bit_prop(buffer, "wave_front_size",
>> + dev->node_props.wave_front_size);
>> + sysfs_show_32bit_prop(buffer, "array_count",
>> + dev->node_props.array_count);
>> + sysfs_show_32bit_prop(buffer, "simd_arrays_per_engine",
>> + dev->node_props.simd_arrays_per_engine);
>> + sysfs_show_32bit_prop(buffer, "cu_per_simd_array",
>> + dev->node_props.cu_per_simd_array);
>> + sysfs_show_32bit_prop(buffer, "simd_per_cu",
>> + dev->node_props.simd_per_cu);
>> + sysfs_show_32bit_prop(buffer, "max_slots_scratch_cu",
>> + dev->node_props.max_slots_scratch_cu);
>> + sysfs_show_32bit_prop(buffer, "engine_id",
>> + dev->node_props.engine_id);
>> + sysfs_show_32bit_prop(buffer, "vendor_id",
>> + dev->node_props.vendor_id);
>> + sysfs_show_32bit_prop(buffer, "device_id",
>> + dev->node_props.device_id);
>> + sysfs_show_32bit_prop(buffer, "location_id",
>> + dev->node_props.location_id);
>> + sysfs_show_32bit_prop(buffer, "max_engine_clk_fcompute",
>> + kfd2kgd->get_max_engine_clock_in_mhz(
>> + dev->gpu->kgd));
>> + sysfs_show_64bit_prop(buffer, "local_mem_size",
>> + kfd2kgd->get_vmem_size(dev->gpu->kgd));
>> + ret = sysfs_show_32bit_prop(buffer, "max_engine_clk_ccompute",
>> + cpufreq_quick_get_max(0)/1000);
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +static const struct sysfs_ops node_ops = {
>> + .show = node_show,
>> +};
>> +
>> +static struct kobj_type node_type = {
>> + .sysfs_ops = &node_ops,
>> +};
>> +
>> +static void kfd_remove_sysfs_file(struct kobject *kobj, struct attribute *attr)
>> +{
>> + sysfs_remove_file(kobj, attr);
>> + kobject_del(kobj);
>> + kobject_put(kobj);
>> +}
>> +
>> +static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>> +{
>> + struct kfd_iolink_properties *iolink;
>> + struct kfd_cache_properties *cache;
>> + struct kfd_mem_properties *mem;
>> +
>> + BUG_ON(!dev);
>> +
>> + if (dev->kobj_iolink) {
>> + list_for_each_entry(iolink, &dev->io_link_props, list)
>> + if (iolink->kobj) {
>> + kfd_remove_sysfs_file(iolink->kobj, &iolink->attr);
>> + iolink->kobj = 0;
>> + }
>> + kobject_del(dev->kobj_iolink);
>> + kobject_put(dev->kobj_iolink);
>> + dev->kobj_iolink = 0;
>> + }
>> +
>> + if (dev->kobj_cache) {
>> + list_for_each_entry(cache, &dev->cache_props, list)
>> + if (cache->kobj) {
>> + kfd_remove_sysfs_file(cache->kobj, &cache->attr);
>> + cache->kobj = 0;
>> + }
>> + kobject_del(dev->kobj_cache);
>> + kobject_put(dev->kobj_cache);
>> + dev->kobj_cache = 0;
>> + }
>> +
>> + if (dev->kobj_mem) {
>> + list_for_each_entry(mem, &dev->mem_props, list)
>> + if (mem->kobj) {
>> + kfd_remove_sysfs_file(mem->kobj, &mem->attr);
>> + mem->kobj = 0;
>> + }
>> + kobject_del(dev->kobj_mem);
>> + kobject_put(dev->kobj_mem);
>> + dev->kobj_mem = 0;
>> + }
>> +
>> + if (dev->kobj_node) {
>> + sysfs_remove_file(dev->kobj_node, &dev->attr_gpuid);
>> + sysfs_remove_file(dev->kobj_node, &dev->attr_name);
>> + sysfs_remove_file(dev->kobj_node, &dev->attr_props);
>> + kobject_del(dev->kobj_node);
>> + kobject_put(dev->kobj_node);
>> + dev->kobj_node = 0;
>> + }
>> +}
>> +
>> +static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>> + uint32_t id)
>> +{
>> + struct kfd_iolink_properties *iolink;
>> + struct kfd_cache_properties *cache;
>> + struct kfd_mem_properties *mem;
>> + int ret;
>> + uint32_t i;
>> +
>> + BUG_ON(!dev);
>> +
>> + /*
>> + * Creating the sysfs folders
>> + */
>> + BUG_ON(dev->kobj_node);
>> + dev->kobj_node = kfd_alloc_struct(dev->kobj_node);
>> + if (!dev->kobj_node)
>> + return -ENOMEM;
>> +
>> + ret = kobject_init_and_add(dev->kobj_node, &node_type,
>> + sys_props.kobj_nodes, "%d", id);
>> + if (ret < 0)
>> + return ret;
>> +
>> + dev->kobj_mem = kobject_create_and_add("mem_banks", dev->kobj_node);
>> + if (!dev->kobj_mem)
>> + return -ENOMEM;
>> +
>> + dev->kobj_cache = kobject_create_and_add("caches", dev->kobj_node);
>> + if (!dev->kobj_cache)
>> + return -ENOMEM;
>> +
>> + dev->kobj_iolink = kobject_create_and_add("io_links", dev->kobj_node);
>> + if (!dev->kobj_iolink)
>> + return -ENOMEM;
>> +
>> + /*
>> + * Creating sysfs files for node properties
>> + */
>> + dev->attr_gpuid.name = "gpu_id";
>> + dev->attr_gpuid.mode = KFD_SYSFS_FILE_MODE;
>> + sysfs_attr_init(&dev->attr_gpuid);
>> + dev->attr_name.name = "name";
>> + dev->attr_name.mode = KFD_SYSFS_FILE_MODE;
>> + sysfs_attr_init(&dev->attr_name);
>> + dev->attr_props.name = "properties";
>> + dev->attr_props.mode = KFD_SYSFS_FILE_MODE;
>> + sysfs_attr_init(&dev->attr_props);
>> + ret = sysfs_create_file(dev->kobj_node, &dev->attr_gpuid);
>> + if (ret < 0)
>> + return ret;
>> + ret = sysfs_create_file(dev->kobj_node, &dev->attr_name);
>> + if (ret < 0)
>> + return ret;
>> + ret = sysfs_create_file(dev->kobj_node, &dev->attr_props);
>> + if (ret < 0)
>> + return ret;
>> +
>> + i = 0;
>> + list_for_each_entry(mem, &dev->mem_props, list) {
>> + mem->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
>> + if (!mem->kobj)
>> + return -ENOMEM;
>> + ret = kobject_init_and_add(mem->kobj, &mem_type,
>> + dev->kobj_mem, "%d", i);
>> + if (ret < 0)
>> + return ret;
>> +
>> + mem->attr.name = "properties";
>> + mem->attr.mode = KFD_SYSFS_FILE_MODE;
>> + sysfs_attr_init(&mem->attr);
>> + ret = sysfs_create_file(mem->kobj, &mem->attr);
>> + if (ret < 0)
>> + return ret;
>> + i++;
>> + }
>> +
>> + i = 0;
>> + list_for_each_entry(cache, &dev->cache_props, list) {
>> + cache->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
>> + if (!cache->kobj)
>> + return -ENOMEM;
>> + ret = kobject_init_and_add(cache->kobj, &cache_type,
>> + dev->kobj_cache, "%d", i);
>> + if (ret < 0)
>> + return ret;
>> +
>> + cache->attr.name = "properties";
>> + cache->attr.mode = KFD_SYSFS_FILE_MODE;
>> + sysfs_attr_init(&cache->attr);
>> + ret = sysfs_create_file(cache->kobj, &cache->attr);
>> + if (ret < 0)
>> + return ret;
>> + i++;
>> + }
>> +
>> + i = 0;
>> + list_for_each_entry(iolink, &dev->io_link_props, list) {
>> + iolink->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
>> + if (!iolink->kobj)
>> + return -ENOMEM;
>> + ret = kobject_init_and_add(iolink->kobj, &iolink_type,
>> + dev->kobj_iolink, "%d", i);
>> + if (ret < 0)
>> + return ret;
>> +
>> + iolink->attr.name = "properties";
>> + iolink->attr.mode = KFD_SYSFS_FILE_MODE;
>> + sysfs_attr_init(&iolink->attr);
>> + ret = sysfs_create_file(iolink->kobj, &iolink->attr);
>> + if (ret < 0)
>> + return ret;
>> + i++;
>> +}
>> +
>> + return 0;
>> +}
>> +
>> +static int kfd_build_sysfs_node_tree(void)
>> +{
>> + struct kfd_topology_device *dev;
>> + int ret;
>> + uint32_t i = 0;
>> +
>> + list_for_each_entry(dev, &topology_device_list, list) {
>> + ret = kfd_build_sysfs_node_entry(dev, 0);
>> + if (ret < 0)
>> + return ret;
>> + i++;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static void kfd_remove_sysfs_node_tree(void)
>> +{
>> + struct kfd_topology_device *dev;
>> +
>> + list_for_each_entry(dev, &topology_device_list, list)
>> + kfd_remove_sysfs_node_entry(dev);
>> +}
>> +
>> +static int kfd_topology_update_sysfs(void)
>> +{
>> + int ret;
>> +
>> + pr_info("Creating topology SYSFS entries\n");
>> + if (sys_props.kobj_topology == 0) {
>> + sys_props.kobj_topology = kfd_alloc_struct(sys_props.kobj_topology);
>> + if (!sys_props.kobj_topology)
>> + return -ENOMEM;
>> +
>> + ret = kobject_init_and_add(sys_props.kobj_topology,
>> + &sysprops_type, &kfd_device->kobj,
>> + "topology");
>> + if (ret < 0)
>> + return ret;
>> +
>> + sys_props.kobj_nodes = kobject_create_and_add("nodes",
>> + sys_props.kobj_topology);
>> + if (!sys_props.kobj_nodes)
>> + return -ENOMEM;
>> +
>> + sys_props.attr_genid.name = "generation_id";
>> + sys_props.attr_genid.mode = KFD_SYSFS_FILE_MODE;
>> + sysfs_attr_init(&sys_props.attr_genid);
>> + ret = sysfs_create_file(sys_props.kobj_topology,
>> + &sys_props.attr_genid);
>> + if (ret < 0)
>> + return ret;
>> +
>> + sys_props.attr_props.name = "system_properties";
>> + sys_props.attr_props.mode = KFD_SYSFS_FILE_MODE;
>> + sysfs_attr_init(&sys_props.attr_props);
>> + ret = sysfs_create_file(sys_props.kobj_topology,
>> + &sys_props.attr_props);
>> + if (ret < 0)
>> + return ret;
>> + }
>> +
>> + kfd_remove_sysfs_node_tree();
>> +
>> + return kfd_build_sysfs_node_tree();
>> +}
>> +
>> +static void kfd_topology_release_sysfs(void)
>> +{
>> + kfd_remove_sysfs_node_tree();
>> + if (sys_props.kobj_topology) {
>> + sysfs_remove_file(sys_props.kobj_topology,
>> + &sys_props.attr_genid);
>> + sysfs_remove_file(sys_props.kobj_topology,
>> + &sys_props.attr_props);
>> + if (sys_props.kobj_nodes) {
>> + kobject_del(sys_props.kobj_nodes);
>> + kobject_put(sys_props.kobj_nodes);
>> + sys_props.kobj_nodes = 0;
>> + }
>> + kobject_del(sys_props.kobj_topology);
>> + kobject_put(sys_props.kobj_topology);
>> + sys_props.kobj_topology = 0;
>> + }
>> +}
>> +
>> +int kfd_topology_init(void)
>> +{
>> + void *crat_image = 0;
>> + size_t image_size = 0;
>> + int ret;
>> +
>> + /*
>> + * Initialize the head for the topology device list
>> + */
>> + INIT_LIST_HEAD(&topology_device_list);
>> + init_rwsem(&topology_lock);
>> + topology_crat_parsed = 0;
>> +
>> + memset(&sys_props, 0, sizeof(sys_props));
>> +
>> + /*
>> + * Get the CRAT image from the ACPI
>> + */
>> + ret = kfd_topology_get_crat_acpi(crat_image, &image_size);
>> + if (ret == 0 && image_size > 0) {
>> + pr_info("Found CRAT image with size=%zd\n", image_size);
>> + crat_image = kmalloc(image_size, GFP_KERNEL);
>> + if (!crat_image) {
>> + ret = -ENOMEM;
>> + pr_err("No memory for allocating CRAT image\n");
>> + goto err;
>> + }
>> + ret = kfd_topology_get_crat_acpi(crat_image, &image_size);
>> +
>> + if (ret == 0) {
>> + down_write(&topology_lock);
>> + ret = kfd_parse_crat_table(crat_image);
>> + if (ret == 0)
>> + ret = kfd_topology_update_sysfs();
>> + up_write(&topology_lock);
>> + } else {
>> + pr_err("Couldn't get CRAT table size from ACPI\n");
>> + }
>> + kfree(crat_image);
>> + } else if (ret == -ENODATA) {
>> + ret = 0;
>> + } else {
>> + pr_err("Couldn't get CRAT table size from ACPI\n");
>> + }
>> +
>> +err:
>> + pr_info("Finished initializing topology ret=%d\n", ret);
>> + return ret;
>> +}
>> +
>> +void kfd_topology_shutdown(void)
>> +{
>> + kfd_topology_release_sysfs();
>> + kfd_release_live_view();
>> +}
>> +
>> +static void kfd_debug_print_topology(void)
>> +{
>> + struct kfd_topology_device *dev;
>> + uint32_t i = 0;
>> +
>> + pr_info("DEBUG PRINT OF TOPOLOGY:");
>> + list_for_each_entry(dev, &topology_device_list, list) {
>> + pr_info("Node: %d\n", i);
>> + pr_info("\tGPU assigned: %s\n", (dev->gpu ? "yes" : "no"));
>> + pr_info("\tCPU count: %d\n", dev->node_props.cpu_cores_count);
>> + pr_info("\tSIMD count: %d", dev->node_props.simd_count);
>> + i++;
>> + }
>> +}
>> +
>> +static uint32_t kfd_generate_gpu_id(struct kfd_dev *gpu)
>> +{
>> + uint32_t hashout;
>> + uint32_t buf[7];
>> + int i;
>> +
>> + if (!gpu)
>> + return 0;
>> +
>> + buf[0] = gpu->pdev->devfn;
>> + buf[1] = gpu->pdev->subsystem_vendor;
>> + buf[2] = gpu->pdev->subsystem_device;
>> + buf[3] = gpu->pdev->device;
>> + buf[4] = gpu->pdev->bus->number;
>> + buf[5] = (uint32_t)(kfd2kgd->get_vmem_size(gpu->kgd) & 0xffffffff);
>> + buf[6] = (uint32_t)(kfd2kgd->get_vmem_size(gpu->kgd) >> 32);
>> +
>> + for (i = 0, hashout = 0; i < 7; i++)
>> + hashout ^= hash_32(buf[i], KFD_GPU_ID_HASH_WIDTH);
>> +
>> + return hashout;
>> +}
>> +
>> +static struct kfd_topology_device *kfd_assign_gpu(struct kfd_dev *gpu)
>> +{
>> + struct kfd_topology_device *dev;
>> + struct kfd_topology_device *out_dev = 0;
>> +
>> + BUG_ON(!gpu);
>> +
>> + list_for_each_entry(dev, &topology_device_list, list)
>> + if (dev->gpu == 0 && dev->node_props.simd_count > 0) {
>> + dev->gpu = gpu;
>> + out_dev = dev;
>> + break;
>> + }
>> +
>> + return out_dev;
>> +}
>> +
>> +static void kfd_notify_gpu_change(uint32_t gpu_id, int arrival)
>> +{
>> + /*
>> + * TODO: Generate an event for thunk about the arrival/removal
>> + * of the GPU
>> + */
>> +}
>> +
>> +int kfd_topology_add_device(struct kfd_dev *gpu)
>> +{
>> + uint32_t gpu_id;
>> + struct kfd_topology_device *dev;
>> + int res;
>> +
>> + BUG_ON(!gpu);
>> +
>> + gpu_id = kfd_generate_gpu_id(gpu);
>> +
>> + pr_debug("kfd: Adding new GPU (ID: 0x%x) to topology\n", gpu_id);
>> +
>> + down_write(&topology_lock);
>> + /*
>> + * Try to assign the GPU to existing topology device (generated from
>> + * CRAT table
>> + */
>> + dev = kfd_assign_gpu(gpu);
>> + if (!dev) {
>> + pr_info("GPU was not found in the current topology. Extending.\n");
>> + kfd_debug_print_topology();
>> + dev = kfd_create_topology_device();
>> + if (!dev) {
>> + res = -ENOMEM;
>> + goto err;
>> + }
>> + dev->gpu = gpu;
>> +
>> + /*
>> + * TODO: Make a call to retrieve topology information from the
>> + * GPU vBIOS
>> + */
>> +
>> + /*
>> + * Update the SYSFS tree, since we added another topology device
>> + */
>> + if (kfd_topology_update_sysfs() < 0)
>> + kfd_topology_release_sysfs();
>> +
>> + }
>> +
>> + dev->gpu_id = gpu_id;
>> + gpu->id = gpu_id;
>> + dev->node_props.vendor_id = gpu->pdev->vendor;
>> + dev->node_props.device_id = gpu->pdev->device;
>> + dev->node_props.location_id = (gpu->pdev->bus->number << 24) +
>> + (gpu->pdev->devfn & 0xffffff);
>> + /*
>> + * TODO: Retrieve max engine clock values from KGD
>> + */
>> +
>> + res = 0;
>> +
>> +err:
>> + up_write(&topology_lock);
>> +
>> + if (res == 0)
>> + kfd_notify_gpu_change(gpu_id, 1);
>> +
>> + return res;
>> +}
>> +
>> +int kfd_topology_remove_device(struct kfd_dev *gpu)
>> +{
>> + struct kfd_topology_device *dev;
>> + uint32_t gpu_id;
>> + int res = -ENODEV;
>> +
>> + BUG_ON(!gpu);
>> +
>> + down_write(&topology_lock);
>> +
>> + list_for_each_entry(dev, &topology_device_list, list)
>> + if (dev->gpu == gpu) {
>> + gpu_id = dev->gpu_id;
>> + kfd_remove_sysfs_node_entry(dev);
>> + kfd_release_topology_device(dev);
>> + res = 0;
>> + if (kfd_topology_update_sysfs() < 0)
>> + kfd_topology_release_sysfs();
>> + break;
>> + }
>> +
>> + up_write(&topology_lock);
>> +
>> + if (res == 0)
>> + kfd_notify_gpu_change(gpu_id, 0);
>> +
>> + return res;
>> +}
>> +
>> +/*
>> + * When idx is out of bounds, the function will return NULL
>> + */
>> +struct kfd_dev *kfd_topology_enum_kfd_devices(uint8_t idx)
>> +{
>> +
>> + struct kfd_topology_device *top_dev;
>> + struct kfd_dev *device = NULL;
>> + uint8_t device_idx = 0;
>> +
>> + down_read(&topology_lock);
>> +
>> + list_for_each_entry(top_dev, &topology_device_list, list) {
>> + if (device_idx == idx) {
>> + device = top_dev->gpu;
>> + break;
>> + }
>> +
>> + device_idx++;
>> + }
>> +
>> + up_read(&topology_lock);
>> +
>> + return device;
>> +
>> +}
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_topology.h b/drivers/gpu/drm/radeon/amdkfd/kfd_topology.h
>> new file mode 100644
>> index 0000000..989624b
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_topology.h
>> @@ -0,0 +1,168 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#ifndef __KFD_TOPOLOGY_H__
>> +#define __KFD_TOPOLOGY_H__
>> +
>> +#include <linux/types.h>
>> +#include <linux/list.h>
>> +#include "kfd_priv.h"
>> +
>> +#define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 128
>> +
>> +#define HSA_CAP_HOT_PLUGGABLE 0x00000001
>> +#define HSA_CAP_ATS_PRESENT 0x00000002
>> +#define HSA_CAP_SHARED_WITH_GRAPHICS 0x00000004
>> +#define HSA_CAP_QUEUE_SIZE_POW2 0x00000008
>> +#define HSA_CAP_QUEUE_SIZE_32BIT 0x00000010
>> +#define HSA_CAP_QUEUE_IDLE_EVENT 0x00000020
>> +#define HSA_CAP_VA_LIMIT 0x00000040
>> +#define HSA_CAP_WATCH_POINTS_SUPPORTED 0x00000080
>> +#define HSA_CAP_WATCH_POINTS_TOTALBITS_MASK 0x00000f00
>> +#define HSA_CAP_WATCH_POINTS_TOTALBITS_SHIFT 8
>> +#define HSA_CAP_RESERVED 0xfffff000
>> +
>> +struct kfd_node_properties {
>> + uint32_t cpu_cores_count;
>> + uint32_t simd_count;
>> + uint32_t mem_banks_count;
>> + uint32_t caches_count;
>> + uint32_t io_links_count;
>> + uint32_t cpu_core_id_base;
>> + uint32_t simd_id_base;
>> + uint32_t capability;
>> + uint32_t max_waves_per_simd;
>> + uint32_t lds_size_in_kb;
>> + uint32_t gds_size_in_kb;
>> + uint32_t wave_front_size;
>> + uint32_t array_count;
>> + uint32_t simd_arrays_per_engine;
>> + uint32_t cu_per_simd_array;
>> + uint32_t simd_per_cu;
>> + uint32_t max_slots_scratch_cu;
>> + uint32_t engine_id;
>> + uint32_t vendor_id;
>> + uint32_t device_id;
>> + uint32_t location_id;
>> + uint32_t max_engine_clk_fcompute;
>> + uint32_t max_engine_clk_ccompute;
>> + uint16_t marketing_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
>> +};
>> +
>> +#define HSA_MEM_HEAP_TYPE_SYSTEM 0
>> +#define HSA_MEM_HEAP_TYPE_FB_PUBLIC 1
>> +#define HSA_MEM_HEAP_TYPE_FB_PRIVATE 2
>> +#define HSA_MEM_HEAP_TYPE_GPU_GDS 3
>> +#define HSA_MEM_HEAP_TYPE_GPU_LDS 4
>> +#define HSA_MEM_HEAP_TYPE_GPU_SCRATCH 5
>> +
>> +#define HSA_MEM_FLAGS_HOT_PLUGGABLE 0x00000001
>> +#define HSA_MEM_FLAGS_NON_VOLATILE 0x00000002
>> +#define HSA_MEM_FLAGS_RESERVED 0xfffffffc
>> +
>> +struct kfd_mem_properties {
>> + struct list_head list;
>> + uint32_t heap_type;
>> + uint64_t size_in_bytes;
>> + uint32_t flags;
>> + uint32_t width;
>> + uint32_t mem_clk_max;
>> + struct kobject *kobj;
>> + struct attribute attr;
>> +};
>> +
>> +#define KFD_TOPOLOGY_CPU_SIBLINGS 256
>> +
>> +#define HSA_CACHE_TYPE_DATA 0x00000001
>> +#define HSA_CACHE_TYPE_INSTRUCTION 0x00000002
>> +#define HSA_CACHE_TYPE_CPU 0x00000004
>> +#define HSA_CACHE_TYPE_HSACU 0x00000008
>> +#define HSA_CACHE_TYPE_RESERVED 0xfffffff0
>> +
>> +struct kfd_cache_properties {
>> + struct list_head list;
>> + uint32_t processor_id_low;
>> + uint32_t cache_level;
>> + uint32_t cache_size;
>> + uint32_t cacheline_size;
>> + uint32_t cachelines_per_tag;
>> + uint32_t cache_assoc;
>> + uint32_t cache_latency;
>> + uint32_t cache_type;
>> + uint8_t sibling_map[KFD_TOPOLOGY_CPU_SIBLINGS];
>> + struct kobject *kobj;
>> + struct attribute attr;
>> +};
>> +
>> +struct kfd_iolink_properties {
>> + struct list_head list;
>> + uint32_t iolink_type;
>> + uint32_t ver_maj;
>> + uint32_t ver_min;
>> + uint32_t node_from;
>> + uint32_t node_to;
>> + uint32_t weight;
>> + uint32_t min_latency;
>> + uint32_t max_latency;
>> + uint32_t min_bandwidth;
>> + uint32_t max_bandwidth;
>> + uint32_t rec_transfer_size;
>> + uint32_t flags;
>> + struct kobject *kobj;
>> + struct attribute attr;
>> +};
>> +
>> +struct kfd_topology_device {
>> + struct list_head list;
>> + uint32_t gpu_id;
>> + struct kfd_node_properties node_props;
>> + uint32_t mem_bank_count;
>> + struct list_head mem_props;
>> + uint32_t cache_count;
>> + struct list_head cache_props;
>> + uint32_t io_link_count;
>> + struct list_head io_link_props;
>> + struct kfd_dev *gpu;
>> + struct kobject *kobj_node;
>> + struct kobject *kobj_mem;
>> + struct kobject *kobj_cache;
>> + struct kobject *kobj_iolink;
>> + struct attribute attr_gpuid;
>> + struct attribute attr_name;
>> + struct attribute attr_props;
>> +};
>> +
>> +struct kfd_system_properties {
>> + uint32_t num_devices; /* Number of H-NUMA nodes */
>> + uint32_t generation_count;
>> + uint64_t platform_oem;
>> + uint64_t platform_id;
>> + uint64_t platform_rev;
>> + struct kobject *kobj_topology;
>> + struct kobject *kobj_nodes;
>> + struct attribute attr_genid;
>> + struct attribute attr_props;
>> +};
>> +
>> +
>> +
>> +#endif /* __KFD_TOPOLOGY_H__ */
>> --
>> 1.9.1
>>
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v2 11/25] amdkfd: Add basic modules to amdkfd
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
` (6 preceding siblings ...)
2014-07-17 13:29 ` [PATCH v2 10/25] amdkfd: Add topology module to amdkfd Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-20 23:02 ` Jerome Glisse
2014-07-17 13:29 ` [PATCH v2 12/25] amdkfd: Add binding/unbinding calls to amd_iommu driver Oded Gabbay
` (14 subsequent siblings)
22 siblings, 1 reply; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
From: Andrew Lewycky <Andrew.Lewycky@amd.com>
This patch adds the process module and 4 helper modules:
- kfd_process, which handles process which open /dev/kfd
- kfd_doorbell, which provides helper functions for doorbell allocation, release and mapping to userspace
- kfd_pasid, which provides helper functions for pasid allocation and release
- kfd_vidmem, which provides helper functions for allocation and release of memory from the gfx driver
- kfd_aperture, which provides helper functions for managing the LDS, Local GPU memory and Scratch memory apertures of the process
This patch only contains the basic kfd_process module, which doesn't contain the reference to the queue scheduler. This was done to allow easier code review.
Also, this patch doesn't contain the calls to the IOMMU driver for binding the pasid to the device. Again, this was done to allow easier code review
The kfd_process object is created when a process opens /dev/kfd and is closed when the mm_struct of that process is torn down.
Signed-off-by: Andrew Lewycky <Andrew.Lewycky@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/amdkfd/Makefile | 4 +-
drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c | 123 +++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c | 36 ++-
drivers/gpu/drm/radeon/amdkfd/kfd_device.c | 2 +
drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c | 264 +++++++++++++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_module.c | 22 ++
drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c | 97 +++++++
drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 148 +++++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_process.c | 374 +++++++++++++++++++++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c | 96 +++++++
10 files changed, 1163 insertions(+), 3 deletions(-)
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_process.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c
diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
index 08ecfcd..daf75a8 100644
--- a/drivers/gpu/drm/radeon/amdkfd/Makefile
+++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
@@ -4,6 +4,8 @@
ccflags-y := -Iinclude/drm
-amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o
+amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
+ kfd_pasid.o kfd_doorbell.o kfd_vidmem.o kfd_aperture.o \
+ kfd_process.o
obj-$(CONFIG_HSA_RADEON) += amdkfd.o
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c b/drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c
new file mode 100644
index 0000000..0468114
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c
@@ -0,0 +1,123 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include <linux/device.h>
+#include <linux/export.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/compat.h>
+#include <uapi/linux/kfd_ioctl.h>
+#include <linux/time.h>
+#include "kfd_priv.h"
+#include <linux/mm.h>
+#include <uapi/asm-generic/mman-common.h>
+#include <asm/processor.h>
+
+
+#define MAKE_GPUVM_APP_BASE(gpu_num) (((uint64_t)(gpu_num) << 61) + 0x1000000000000)
+#define MAKE_GPUVM_APP_LIMIT(base) (((uint64_t)(base) & 0xFFFFFF0000000000) | 0xFFFFFFFFFF)
+#define MAKE_SCRATCH_APP_BASE(gpu_num) (((uint64_t)(gpu_num) << 61) + 0x100000000)
+#define MAKE_SCRATCH_APP_LIMIT(base) (((uint64_t)base & 0xFFFFFFFF00000000) | 0xFFFFFFFF)
+#define MAKE_LDS_APP_BASE(gpu_num) (((uint64_t)(gpu_num) << 61) + 0x0)
+#define MAKE_LDS_APP_LIMIT(base) (((uint64_t)(base) & 0xFFFFFFFF00000000) | 0xFFFFFFFF)
+
+#define HSA_32BIT_LDS_APP_SIZE 0x10000
+#define HSA_32BIT_LDS_APP_ALIGNMENT 0x10000
+
+static unsigned long kfd_reserve_aperture(struct kfd_process *process, unsigned long len, unsigned long alignment)
+{
+
+ unsigned long addr = 0;
+ unsigned long start_address;
+
+ /*
+ * Go bottom up and find the first available aligned address.
+ * We may narrow space to scan by getting mmap range limits.
+ */
+ for (start_address = alignment; start_address < (TASK_SIZE - alignment); start_address += alignment) {
+ addr = vm_mmap(NULL, start_address, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, 0);
+ if (!IS_ERR_VALUE(addr)) {
+ if (addr == start_address)
+ return addr;
+ vm_munmap(addr, len);
+ }
+ }
+ return 0;
+
+}
+
+int kfd_init_apertures(struct kfd_process *process)
+{
+ uint8_t id = 0;
+ struct kfd_dev *dev;
+ struct kfd_process_device *pdd;
+
+ mutex_lock(&process->mutex);
+
+ /*Iterating over all devices*/
+ while ((dev = kfd_topology_enum_kfd_devices(id)) != NULL && id < NUM_OF_SUPPORTED_GPUS) {
+
+ pdd = kfd_get_process_device_data(dev, process);
+
+ /*for 64 bit process aperture will be statically reserved in the non canonical process address space
+ *for 32 bit process the aperture will be reserved in the process address space
+ */
+ if (process->is_32bit_user_mode) {
+ /*try to reserve aperture. continue on failure, just put the aperture size to be 0*/
+ pdd->lds_base = kfd_reserve_aperture(
+ process,
+ HSA_32BIT_LDS_APP_SIZE,
+ HSA_32BIT_LDS_APP_ALIGNMENT);
+
+ if (pdd->lds_base)
+ pdd->lds_limit = pdd->lds_base + HSA_32BIT_LDS_APP_SIZE - 1;
+ else
+ pdd->lds_limit = 0;
+
+ /*GPUVM and Scratch apertures are not supported*/
+ pdd->gpuvm_base = pdd->gpuvm_limit = pdd->scratch_base = pdd->scratch_limit = 0;
+ } else {
+ /*node id couldn't be 0 - the three MSB bits of aperture shoudn't be 0*/
+ pdd->lds_base = MAKE_LDS_APP_BASE(id + 1);
+ pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
+ pdd->gpuvm_base = MAKE_GPUVM_APP_BASE(id + 1);
+ pdd->gpuvm_limit = MAKE_GPUVM_APP_LIMIT(pdd->gpuvm_base);
+ pdd->scratch_base = MAKE_SCRATCH_APP_BASE(id + 1);
+ pdd->scratch_limit = MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
+ }
+
+ dev_dbg(kfd_device, "node id %u, gpu id %u, lds_base %llX lds_limit %llX gpuvm_base %llX gpuvm_limit %llX scratch_base %llX scratch_limit %llX",
+ id, pdd->dev->id, pdd->lds_base, pdd->lds_limit, pdd->gpuvm_base, pdd->gpuvm_limit, pdd->scratch_base, pdd->scratch_limit);
+
+ id++;
+ }
+
+ mutex_unlock(&process->mutex);
+
+ return 0;
+}
+
+
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
index b98bcb7..d6580a6 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
@@ -38,6 +38,7 @@
static long kfd_ioctl(struct file *, unsigned int, unsigned long);
static int kfd_open(struct inode *, struct file *);
+static int kfd_mmap(struct file *, struct vm_area_struct *);
static const char kfd_dev_name[] = "kfd";
@@ -46,6 +47,7 @@ static const struct file_operations kfd_fops = {
.unlocked_ioctl = kfd_ioctl,
.compat_ioctl = kfd_ioctl,
.open = kfd_open,
+ .mmap = kfd_mmap,
};
static int kfd_char_dev_major = -1;
@@ -96,9 +98,22 @@ struct device *kfd_chardev(void)
static int kfd_open(struct inode *inode, struct file *filep)
{
+ struct kfd_process *process;
+
if (iminor(inode) != 0)
return -ENODEV;
+ process = kfd_create_process(current);
+ if (IS_ERR(process))
+ return PTR_ERR(process);
+
+ process->is_32bit_user_mode = is_compat_task();
+
+ dev_dbg(kfd_device, "process %d opened, compat mode (32 bit) - %d\n",
+ process->pasid, process->is_32bit_user_mode);
+
+ kfd_init_apertures(process);
+
return 0;
}
@@ -152,8 +167,9 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
"ioctl cmd 0x%x (#%d), arg 0x%lx\n",
cmd, _IOC_NR(cmd), arg);
- /* TODO: add function that retrieves process */
- process = NULL;
+ process = kfd_get_process(current);
+ if (IS_ERR(process))
+ return PTR_ERR(process);
switch (cmd) {
case KFD_IOC_CREATE_QUEUE:
@@ -201,3 +217,19 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
return err;
}
+
+static int
+kfd_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+ unsigned long pgoff = vma->vm_pgoff;
+ struct kfd_process *process;
+
+ process = kfd_get_process(current);
+ if (IS_ERR(process))
+ return PTR_ERR(process);
+
+ if (pgoff >= KFD_MMAP_DOORBELL_START && pgoff < KFD_MMAP_DOORBELL_END)
+ return kfd_doorbell_mmap(process, vma);
+
+ return -EINVAL;
+}
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
index 4138694..f6a7cf7 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
@@ -100,6 +100,8 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
{
kfd->shared_resources = *gpu_resources;
+ kfd_doorbell_init(kfd);
+
if (kfd_topology_add_device(kfd) != 0)
return false;
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c
new file mode 100644
index 0000000..972eaea
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c
@@ -0,0 +1,264 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include "kfd_priv.h"
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/slab.h>
+
+/*
+ * This extension supports kernel-level doorbell management for
+ * the kernel queues.
+ * Basically, the last doorbell page is devoted to kernel queues,
+ * which ensures that no user process can get access to the
+ * kernel doorbells page.
+ */
+static DEFINE_MUTEX(doorbell_mutex);
+static unsigned long doorbell_available_index[DIV_ROUND_UP(MAX_PROCESS_QUEUES, BITS_PER_LONG)] = { 0 };
+#define KERNEL_DOORBELL_PASID 1
+
+/*
+ * Each device exposes a doorbell aperture, a PCI MMIO aperture that
+ * receives 32-bit writes that are passed to queues as wptr values.
+ * The doorbells are intended to be written by applications as part
+ * of queueing work on user-mode queues.
+ * We assign doorbells to applications in PAGE_SIZE-sized and aligned chunks.
+ * We map the doorbell address space into user-mode when a process creates
+ * its first queue on each device.
+ * Although the mapping is done by KFD, it is equivalent to an mmap of
+ * the /dev/kfd with the particular device encoded in the mmap offset.
+ * There will be other uses for mmap of /dev/kfd, so only a range of
+ * offsets (KFD_MMAP_DOORBELL_START-END) is used for doorbells.
+ */
+
+/* # of doorbell bytes allocated for each process. */
+static inline size_t doorbell_process_allocation(void)
+{
+ return roundup(sizeof(doorbell_t) * MAX_PROCESS_QUEUES, PAGE_SIZE);
+}
+
+/* Doorbell calculations for device init. */
+void kfd_doorbell_init(struct kfd_dev *kfd)
+{
+ size_t doorbell_start_offset;
+ size_t doorbell_aperture_size;
+ size_t doorbell_process_limit;
+
+ /*
+ * We start with calculations in bytes because the input data might
+ * only be byte-aligned.
+ * Only after we have done the rounding can we assume any alignment.
+ */
+
+ doorbell_start_offset = roundup(kfd->shared_resources.doorbell_start_offset,
+ doorbell_process_allocation());
+ doorbell_aperture_size = rounddown(kfd->shared_resources.doorbell_aperture_size,
+ doorbell_process_allocation());
+
+ if (doorbell_aperture_size > doorbell_start_offset)
+ doorbell_process_limit =
+ (doorbell_aperture_size - doorbell_start_offset) / doorbell_process_allocation();
+ else
+ doorbell_process_limit = 0;
+
+ kfd->doorbell_base = kfd->shared_resources.doorbell_physical_address + doorbell_start_offset;
+ kfd->doorbell_id_offset = doorbell_start_offset / sizeof(doorbell_t);
+ kfd->doorbell_process_limit = doorbell_process_limit - 1;
+
+ kfd->doorbell_kernel_ptr = ioremap(kfd->doorbell_base, doorbell_process_allocation());
+ BUG_ON(!kfd->doorbell_kernel_ptr);
+
+ pr_debug("kfd: doorbell initialization\n"
+ " doorbell base == 0x%08lX\n"
+ " doorbell_id_offset == 0x%08lu\n"
+ " doorbell_process_limit == 0x%08lu\n"
+ " doorbell_kernel_offset == 0x%08lX\n"
+ " doorbell aperture size == 0x%08lX\n"
+ " doorbell kernel address == 0x%08lX\n",
+ (uintptr_t)kfd->doorbell_base,
+ kfd->doorbell_id_offset,
+ doorbell_process_limit,
+ (uintptr_t)kfd->doorbell_base,
+ kfd->shared_resources.doorbell_aperture_size,
+ (uintptr_t)kfd->doorbell_kernel_ptr);
+
+}
+
+/*
+ * This is the /dev/kfd mmap (for doorbell) implementation.
+ * We intend that this is only called through map_doorbells, not through
+ * user-mode mmap of /dev/kfd
+ */
+int kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma)
+{
+ unsigned int device_index;
+ struct kfd_dev *dev;
+ phys_addr_t start;
+
+ BUG_ON(vma->vm_pgoff < KFD_MMAP_DOORBELL_START || vma->vm_pgoff >= KFD_MMAP_DOORBELL_END);
+
+ /* For simplicity we only allow mapping of the entire doorbell allocation of a single device & process. */
+ if (vma->vm_end - vma->vm_start != doorbell_process_allocation())
+ return -EINVAL;
+
+ /* device_index must be GPU ID!! */
+ device_index = vma->vm_pgoff - KFD_MMAP_DOORBELL_START;
+
+ dev = kfd_device_by_id(device_index);
+ if (dev == NULL)
+ return -EINVAL;
+
+ vma->vm_flags |= VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE | VM_DONTDUMP | VM_PFNMAP;
+ vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+
+ start = dev->doorbell_base + process->pasid * doorbell_process_allocation();
+
+ pr_debug("kfd: mapping doorbell page in kfd_doorbell_mmap\n"
+ " target user address == 0x%016llX\n"
+ " physical address == 0x%016llX\n"
+ " vm_flags == 0x%08lX\n"
+ " size == 0x%08lX\n",
+ (long long unsigned int) vma->vm_start, start, vma->vm_flags,
+ doorbell_process_allocation());
+
+ return io_remap_pfn_range(vma,
+ vma->vm_start,
+ start >> PAGE_SHIFT,
+ doorbell_process_allocation(),
+ vma->vm_page_prot);
+}
+
+/*
+ * Map the doorbells for a single process & device.
+ * This will indirectly call kfd_doorbell_mmap.
+ * This assumes that the process mutex is being held.
+ */
+static int map_doorbells(struct file *devkfd, struct kfd_process *process,
+ struct kfd_dev *dev)
+{
+ struct kfd_process_device *pdd = kfd_get_process_device_data(dev, process);
+
+ if (pdd == NULL)
+ return -ENOMEM;
+
+ if (pdd->doorbell_mapping == NULL) {
+ unsigned long offset = (KFD_MMAP_DOORBELL_START + dev->id) << PAGE_SHIFT;
+ doorbell_t __user *doorbell_mapping;
+
+ doorbell_mapping = (doorbell_t __user *)vm_mmap(devkfd, 0, doorbell_process_allocation(), PROT_WRITE,
+ MAP_SHARED, offset);
+ if (IS_ERR(doorbell_mapping))
+ return PTR_ERR(doorbell_mapping);
+
+ pdd->doorbell_mapping = doorbell_mapping;
+ }
+
+ return 0;
+}
+
+/* get kernel iomem pointer for a doorbell */
+u32 __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd, unsigned int *doorbell_off)
+{
+ u32 inx;
+
+ BUG_ON(!kfd || !doorbell_off);
+
+ mutex_lock(&doorbell_mutex);
+ inx = find_first_zero_bit(doorbell_available_index, MAX_PROCESS_QUEUES);
+ __set_bit(inx, doorbell_available_index);
+ mutex_unlock(&doorbell_mutex);
+
+ if (inx >= MAX_PROCESS_QUEUES)
+ return NULL;
+
+ /* calculating the kernel doorbell offset using a "faked" kernel pasid that is allocated for kernel queues only */
+ *doorbell_off = KERNEL_DOORBELL_PASID * (doorbell_process_allocation()/sizeof(doorbell_t)) + inx;
+
+ pr_debug("kfd: get kernel queue doorbell\n"
+ " doorbell offset == 0x%08d\n"
+ " kernel address == 0x%08lX\n",
+ *doorbell_off, (uintptr_t)(kfd->doorbell_kernel_ptr + inx));
+
+ return kfd->doorbell_kernel_ptr + inx;
+}
+
+void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr)
+{
+ unsigned int inx;
+
+ BUG_ON(!kfd || !db_addr);
+
+ inx = (unsigned int)(db_addr - kfd->doorbell_kernel_ptr);
+
+ mutex_lock(&doorbell_mutex);
+ __clear_bit(inx, doorbell_available_index);
+ mutex_unlock(&doorbell_mutex);
+}
+
+inline void write_kernel_doorbell(u32 __iomem *db, u32 value)
+{
+ if (db) {
+ writel(value, db);
+ pr_debug("writing %d to doorbell address 0x%p\n", value, db);
+ }
+}
+
+/*
+ * Get the user-mode address of a doorbell.
+ * Assumes that the process mutex is being held.
+ */
+doorbell_t __user *kfd_get_doorbell(struct file *devkfd,
+ struct kfd_process *process,
+ struct kfd_dev *dev,
+ unsigned int doorbell_index)
+{
+ struct kfd_process_device *pdd;
+ int err;
+
+ BUG_ON(doorbell_index > MAX_DOORBELL_INDEX);
+
+ err = map_doorbells(devkfd, process, dev);
+ if (err)
+ return ERR_PTR(err);
+
+ pdd = kfd_get_process_device_data(dev, process);
+ BUG_ON(pdd == NULL); /* map_doorbells would have failed otherwise */
+
+ pr_debug("doorbell value on creation 0x%x\n", pdd->doorbell_mapping[doorbell_index]);
+
+ return &pdd->doorbell_mapping[doorbell_index];
+}
+
+/*
+ * queue_ids are in the range [0,MAX_PROCESS_QUEUES) and are mapped 1:1
+ * to doorbells within the process's doorbell page
+ */
+unsigned int kfd_queue_id_to_doorbell(struct kfd_dev *kfd, struct kfd_process *process, unsigned int queue_id)
+{
+ /*
+ * doorbell_id_offset accounts for doorbells taken by KGD.
+ * pasid * doorbell_process_allocation/sizeof(doorbell_t) adjusts
+ * to the process's doorbells
+ */
+ return kfd->doorbell_id_offset + process->pasid * (doorbell_process_allocation()/sizeof(doorbell_t)) + queue_id;
+}
+
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
index c51f981..dc08f51 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
@@ -65,14 +65,30 @@ void kgd2kfd_exit(void)
{
}
+extern int kfd_process_exit(struct notifier_block *nb,
+ unsigned long action, void *data);
+
+static struct notifier_block kfd_mmput_nb = {
+ .notifier_call = kfd_process_exit,
+ .priority = 3,
+};
+
static int __init kfd_module_init(void)
{
int err;
+ err = kfd_pasid_init();
+ if (err < 0)
+ goto err_pasid;
+
err = kfd_chardev_init();
if (err < 0)
goto err_ioctl;
+ err = mmput_register_notifier(&kfd_mmput_nb);
+ if (err)
+ goto err_mmu_notifier;
+
err = kfd_topology_init();
if (err < 0)
goto err_topology;
@@ -82,15 +98,21 @@ static int __init kfd_module_init(void)
return 0;
err_topology:
+ mmput_unregister_notifier(&kfd_mmput_nb);
+err_mmu_notifier:
kfd_chardev_exit();
err_ioctl:
+ kfd_pasid_exit();
+err_pasid:
return err;
}
static void __exit kfd_module_exit(void)
{
kfd_topology_shutdown();
+ mmput_unregister_notifier(&kfd_mmput_nb);
kfd_chardev_exit();
+ kfd_pasid_exit();
dev_info(kfd_device, "Removed module\n");
}
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c b/drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c
new file mode 100644
index 0000000..0b594e4
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c
@@ -0,0 +1,97 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/slab.h>
+#include <linux/types.h>
+#include "kfd_priv.h"
+
+#define INITIAL_PASID_LIMIT (1<<20)
+
+static unsigned long *pasid_bitmap;
+static pasid_t pasid_limit;
+static DEFINE_MUTEX(pasid_mutex);
+
+int kfd_pasid_init(void)
+{
+ pasid_limit = INITIAL_PASID_LIMIT;
+
+ pasid_bitmap = kzalloc(DIV_ROUND_UP(INITIAL_PASID_LIMIT, BITS_PER_BYTE), GFP_KERNEL);
+ if (!pasid_bitmap)
+ return -ENOMEM;
+
+ set_bit(0, pasid_bitmap); /* PASID 0 is reserved. */
+
+ return 0;
+}
+
+void kfd_pasid_exit(void)
+{
+ kfree(pasid_bitmap);
+}
+
+bool kfd_set_pasid_limit(pasid_t new_limit)
+{
+ if (new_limit < pasid_limit) {
+ bool ok;
+
+ mutex_lock(&pasid_mutex);
+
+ /* ensure that no pasids >= new_limit are in-use */
+ ok = (find_next_bit(pasid_bitmap, pasid_limit, new_limit) == pasid_limit);
+ if (ok)
+ pasid_limit = new_limit;
+
+ mutex_unlock(&pasid_mutex);
+
+ return ok;
+ }
+
+ return true;
+}
+
+inline pasid_t kfd_get_pasid_limit(void)
+{
+ return pasid_limit;
+}
+
+pasid_t kfd_pasid_alloc(void)
+{
+ pasid_t found;
+
+ mutex_lock(&pasid_mutex);
+
+ found = find_first_zero_bit(pasid_bitmap, pasid_limit);
+ if (found == pasid_limit)
+ found = 0;
+ else
+ set_bit(found, pasid_bitmap);
+
+ mutex_unlock(&pasid_mutex);
+
+ return found;
+}
+
+void kfd_pasid_free(pasid_t pasid)
+{
+ BUG_ON(pasid == 0 || pasid >= pasid_limit);
+ clear_bit(pasid, pasid_bitmap);
+}
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
index b391e24..af5a5e4 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
@@ -32,14 +32,39 @@
#include <linux/spinlock.h>
#include "../radeon_kfd.h"
+/*
+ * Per-process limit. Each process can only
+ * create MAX_PROCESS_QUEUES across all devices
+ */
+#define MAX_PROCESS_QUEUES 1024
+
+#define MAX_DOORBELL_INDEX MAX_PROCESS_QUEUES
#define KFD_SYSFS_FILE_MODE 0444
+/*
+ * We multiplex different sorts of mmap-able memory onto /dev/kfd.
+ * We figure out what type of memory the caller wanted by comparing
+ * the mmap page offset to known ranges.
+ */
+#define KFD_MMAP_DOORBELL_START (((1ULL << 32)*1) >> PAGE_SHIFT)
+#define KFD_MMAP_DOORBELL_END (((1ULL << 32)*2) >> PAGE_SHIFT)
+
/* GPU ID hash width in bits */
#define KFD_GPU_ID_HASH_WIDTH 16
/* Macro for allocating structures */
#define kfd_alloc_struct(ptr_to_struct) ((typeof(ptr_to_struct)) kzalloc(sizeof(*ptr_to_struct), GFP_KERNEL))
+/*
+ * Large enough to hold the maximum usable pasid + 1.
+ * It must also be able to store the number of doorbells
+ * reported by a KFD device.
+ */
+typedef unsigned int pasid_t;
+
+/* Type that represents a HW doorbell slot. */
+typedef u32 doorbell_t;
+
struct kfd_device_info {
const struct kfd_scheduler_class *scheduler_class;
unsigned int max_pasid_bits;
@@ -56,6 +81,17 @@ struct kfd_dev {
unsigned int id; /* topology stub index */
+ phys_addr_t doorbell_base; /* Start of actual doorbells used by
+ * KFD. It is aligned for mapping
+ * into user mode
+ */
+ size_t doorbell_id_offset; /* Doorbell offset (from KFD doorbell
+ * to HW doorbell, GFX reserved some
+ * at the start)
+ */
+ size_t doorbell_process_limit; /* Number of processes we have doorbell space for. */
+ u32 __iomem *doorbell_kernel_ptr; /* this is a pointer for a doorbells page used by kernel queue */
+
struct kgd2kfd_shared_resources shared_resources;
};
@@ -68,15 +104,124 @@ void kgd2kfd_device_exit(struct kfd_dev *kfd);
extern const struct kfd2kgd_calls *kfd2kgd;
+/* Dummy struct just to make kfd_mem_obj* a unique pointer type. */
+struct kfd_mem_obj_s;
+typedef struct kfd_mem_obj_s *kfd_mem_obj;
+
+enum kfd_mempool {
+ KFD_MEMPOOL_SYSTEM_CACHEABLE = 1,
+ KFD_MEMPOOL_SYSTEM_WRITECOMBINE = 2,
+ KFD_MEMPOOL_FRAMEBUFFER = 3,
+};
+
+
+int kfd_vidmem_alloc(struct kfd_dev *kfd, size_t size, size_t alignment,
+ enum kfd_mempool pool, kfd_mem_obj *mem_obj);
+void kfd_vidmem_free(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
+int kfd_vidmem_gpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj, uint64_t *vmid0_address);
+void kfd_vidmem_ungpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
+int kfd_vidmem_kmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj, void **ptr);
+void kfd_vidmem_unkmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
+int kfd_vidmem_alloc_map(struct kfd_dev *kfd, kfd_mem_obj *mem_obj, void **ptr,
+ uint64_t *vmid0_address, size_t size);
+void kfd_vidmem_free_unmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
/* Character device interface */
int kfd_chardev_init(void);
void kfd_chardev_exit(void);
struct device *kfd_chardev(void);
+
+/* Data that is per-process-per device. */
+struct kfd_process_device {
+ /*
+ * List of all per-device data for a process.
+ * Starts from kfd_process.per_device_data.
+ */
+ struct list_head per_device_list;
+
+ /* The device that owns this data. */
+ struct kfd_dev *dev;
+
+ /* The user-mode address of the doorbell mapping for this device. */
+ doorbell_t __user *doorbell_mapping;
+
+ /* Is this process/pasid bound to this device? (amd_iommu_bind_pasid) */
+ bool bound;
+
+ /*Apertures*/
+ uint64_t lds_base;
+ uint64_t lds_limit;
+ uint64_t gpuvm_base;
+ uint64_t gpuvm_limit;
+ uint64_t scratch_base;
+ uint64_t scratch_limit;
+};
+
/* Process data */
struct kfd_process {
+ struct list_head processes_list;
+
+ struct mm_struct *mm;
+
+ struct mutex mutex;
+
+ /*
+ * In any process, the thread that started main() is the lead
+ * thread and outlives the rest.
+ * It is here because amd_iommu_bind_pasid wants a task_struct.
+ */
+ struct task_struct *lead_thread;
+
+ pasid_t pasid;
+
+ /*
+ * List of kfd_process_device structures,
+ * one for each device the process is using.
+ */
+ struct list_head per_device_data;
+
+ /* The process's queues. */
+ size_t queue_array_size;
+
+ /* Size is queue_array_size, up to MAX_PROCESS_QUEUES. */
+ struct kfd_queue **queues;
+
+ unsigned long allocated_queue_bitmap[DIV_ROUND_UP(MAX_PROCESS_QUEUES, BITS_PER_LONG)];
+
+ /*Is the user space process 32 bit?*/
+ bool is_32bit_user_mode;
};
+struct kfd_process *kfd_create_process(const struct task_struct *);
+struct kfd_process *kfd_get_process(const struct task_struct *);
+
+struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
+ struct kfd_process *p);
+
+/* PASIDs */
+int kfd_pasid_init(void);
+void kfd_pasid_exit(void);
+bool kfd_set_pasid_limit(pasid_t new_limit);
+pasid_t kfd_get_pasid_limit(void);
+pasid_t kfd_pasid_alloc(void);
+void kfd_pasid_free(pasid_t pasid);
+
+/* Doorbells */
+void kfd_doorbell_init(struct kfd_dev *kfd);
+int kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma);
+doorbell_t __user *kfd_get_doorbell(struct file *devkfd,
+ struct kfd_process *process,
+ struct kfd_dev *dev,
+ unsigned int doorbell_index);
+u32 __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
+ unsigned int *doorbell_off);
+void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr);
+u32 read_kernel_doorbell(u32 __iomem *db);
+void write_kernel_doorbell(u32 __iomem *db, u32 value);
+unsigned int kfd_queue_id_to_doorbell(struct kfd_dev *kfd,
+ struct kfd_process *process,
+ unsigned int queue_id);
+
extern struct device *kfd_device;
/* Topology */
@@ -95,4 +240,7 @@ void kgd2kfd_interrupt(struct kfd_dev *dev, const void *ih_ring_entry);
void kgd2kfd_suspend(struct kfd_dev *dev);
int kgd2kfd_resume(struct kfd_dev *dev);
+/* amdkfd Apertures */
+int kfd_init_apertures(struct kfd_process *process);
+
#endif
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_process.c b/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
new file mode 100644
index 0000000..5efbce0
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
@@ -0,0 +1,374 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/mutex.h>
+#include <linux/log2.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/notifier.h>
+struct mm_struct;
+
+#include "kfd_priv.h"
+
+/*
+ * Initial size for the array of queues.
+ * The allocated size is doubled each time
+ * it is exceeded up to MAX_PROCESS_QUEUES.
+ */
+#define INITIAL_QUEUE_ARRAY_SIZE 16
+
+/* List of struct kfd_process */
+static struct list_head kfd_processes_list = LIST_HEAD_INIT(kfd_processes_list);
+
+static DEFINE_MUTEX(kfd_processes_mutex);
+
+static struct kfd_process *create_process(const struct task_struct *thread);
+
+struct kfd_process *kfd_create_process(const struct task_struct *thread)
+{
+ struct kfd_process *process;
+
+ if (thread->mm == NULL)
+ return ERR_PTR(-EINVAL);
+
+ /* Only the pthreads threading model is supported. */
+ if (thread->group_leader->mm != thread->mm)
+ return ERR_PTR(-EINVAL);
+
+ /*
+ * take kfd processes mutex before starting of process creation
+ * so there won't be a case where two threads of the same process
+ * create two kfd_process structures
+ */
+ mutex_lock(&kfd_processes_mutex);
+
+ /* A prior open of /dev/kfd could have already created the process. */
+ process = thread->mm->kfd_process;
+ if (process)
+ pr_debug("kfd: process already found\n");
+
+ if (!process)
+ process = create_process(thread);
+
+ mutex_unlock(&kfd_processes_mutex);
+
+ return process;
+}
+
+struct kfd_process *kfd_get_process(const struct task_struct *thread)
+{
+ struct kfd_process *process;
+
+ if (thread->mm == NULL)
+ return ERR_PTR(-EINVAL);
+
+ /* Only the pthreads threading model is supported. */
+ if (thread->group_leader->mm != thread->mm)
+ return ERR_PTR(-EINVAL);
+
+ process = thread->mm->kfd_process;
+
+ return process;
+}
+
+static void free_process(struct kfd_process *p)
+{
+ struct kfd_process_device *pdd, *temp;
+
+ BUG_ON(p == NULL);
+
+ list_for_each_entry_safe(pdd, temp, &p->per_device_data, per_device_list) {
+ list_del(&pdd->per_device_list);
+ kfree(pdd);
+ }
+
+ kfd_pasid_free(p->pasid);
+
+ mutex_destroy(&p->mutex);
+
+ kfree(p->queues);
+
+ list_del(&p->processes_list);
+
+ kfree(p);
+}
+
+int kfd_process_exit(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct mm_struct *mm = data;
+ struct kfd_process *p;
+
+ mutex_lock(&kfd_processes_mutex);
+
+ p = mm->kfd_process;
+ if (p) {
+ free_process(p);
+ mm->kfd_process = NULL;
+ }
+
+ mutex_unlock(&kfd_processes_mutex);
+
+ return 0;
+}
+
+static struct kfd_process *create_process(const struct task_struct *thread)
+{
+ struct kfd_process *process;
+ int err = -ENOMEM;
+
+ process = kzalloc(sizeof(*process), GFP_KERNEL);
+
+ if (!process)
+ goto err_alloc_process;
+
+ process->queues = kmalloc_array(INITIAL_QUEUE_ARRAY_SIZE, sizeof(process->queues[0]), GFP_KERNEL);
+ if (!process->queues)
+ goto err_alloc_queues;
+
+ process->pasid = kfd_pasid_alloc();
+ if (process->pasid == 0)
+ goto err_alloc_pasid;
+
+ mutex_init(&process->mutex);
+
+ process->mm = thread->mm;
+ thread->mm->kfd_process = process;
+ list_add_tail(&process->processes_list, &kfd_processes_list);
+
+ process->lead_thread = thread->group_leader;
+
+ process->queue_array_size = INITIAL_QUEUE_ARRAY_SIZE;
+
+ INIT_LIST_HEAD(&process->per_device_data);
+
+ return process;
+
+err_alloc_pasid:
+ kfree(process->queues);
+err_alloc_queues:
+ kfree(process);
+err_alloc_process:
+ return ERR_PTR(err);
+}
+
+struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
+ struct kfd_process *p)
+{
+ struct kfd_process_device *pdd;
+
+ list_for_each_entry(pdd, &p->per_device_data, per_device_list)
+ if (pdd->dev == dev)
+ return pdd;
+
+ pdd = kzalloc(sizeof(*pdd), GFP_KERNEL);
+ if (pdd != NULL) {
+ pdd->dev = dev;
+ list_add(&pdd->per_device_list, &p->per_device_data);
+ }
+
+ return pdd;
+}
+
+/*
+ * Direct the IOMMU to bind the process (specifically the pasid->mm) to the device.
+ * Unbinding occurs when the process dies or the device is removed.
+ *
+ * Assumes that the process lock is held.
+ */
+struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
+ struct kfd_process *p)
+{
+ struct kfd_process_device *pdd = kfd_get_process_device_data(dev, p);
+
+ if (pdd == NULL)
+ return ERR_PTR(-ENOMEM);
+
+ if (pdd->bound)
+ return pdd;
+
+ pdd->bound = true;
+
+ return pdd;
+}
+
+void kfd_unbind_process_from_device(struct kfd_dev *dev, pasid_t pasid)
+{
+ struct kfd_process *p;
+ struct kfd_process_device *pdd;
+
+ BUG_ON(dev == NULL);
+
+ mutex_lock(&kfd_processes_mutex);
+
+ list_for_each_entry(p, &kfd_processes_list, processes_list)
+ if (p->pasid == pasid)
+ break;
+
+ mutex_unlock(&kfd_processes_mutex);
+
+ BUG_ON(p->pasid != pasid);
+
+ pdd = kfd_get_process_device_data(dev, p);
+
+ BUG_ON(pdd == NULL);
+
+ mutex_lock(&p->mutex);
+
+ /*
+ * Just mark pdd as unbound, because we still need it to call
+ * amd_iommu_unbind_pasid() when the process exits.
+ * We don't call amd_iommu_unbind_pasid() here
+ * because the IOMMU called us.
+ */
+ pdd->bound = false;
+
+ mutex_unlock(&p->mutex);
+}
+
+/*
+ * Ensure that the process's queue array is large enough to hold
+ * the queue at queue_id.
+ * Assumes that the process lock is held.
+ */
+static bool ensure_queue_array_size(struct kfd_process *p, unsigned int queue_id)
+{
+ size_t desired_size;
+ struct kfd_queue **new_queues;
+
+ compiletime_assert(INITIAL_QUEUE_ARRAY_SIZE > 0, "INITIAL_QUEUE_ARRAY_SIZE must not be 0");
+ compiletime_assert(INITIAL_QUEUE_ARRAY_SIZE <= MAX_PROCESS_QUEUES,
+ "INITIAL_QUEUE_ARRAY_SIZE must be less than MAX_PROCESS_QUEUES");
+ /* Ensure that doubling the current size won't ever overflow. */
+ compiletime_assert(MAX_PROCESS_QUEUES < SIZE_MAX / 2, "MAX_PROCESS_QUEUES must be less than SIZE_MAX/2");
+
+ /*
+ * These & queue_id < MAX_PROCESS_QUEUES guarantee that
+ * the desired_size calculation will end up <= MAX_PROCESS_QUEUES
+ */
+ compiletime_assert(is_power_of_2(INITIAL_QUEUE_ARRAY_SIZE), "INITIAL_QUEUE_ARRAY_SIZE must be power of 2.");
+ compiletime_assert(MAX_PROCESS_QUEUES % INITIAL_QUEUE_ARRAY_SIZE == 0,
+ "MAX_PROCESS_QUEUES must be multiple of INITIAL_QUEUE_ARRAY_SIZE.");
+ compiletime_assert(is_power_of_2(MAX_PROCESS_QUEUES / INITIAL_QUEUE_ARRAY_SIZE),
+ "MAX_PROCESS_QUEUES must be a power-of-2 multiple of INITIAL_QUEUE_ARRAY_SIZE.");
+
+ if (queue_id < p->queue_array_size)
+ return true;
+
+ if (queue_id >= MAX_PROCESS_QUEUES)
+ return false;
+
+ desired_size = p->queue_array_size;
+ while (desired_size <= queue_id)
+ desired_size *= 2;
+
+ BUG_ON(desired_size < queue_id || desired_size > MAX_PROCESS_QUEUES);
+ BUG_ON(desired_size % INITIAL_QUEUE_ARRAY_SIZE != 0 || !is_power_of_2(desired_size / INITIAL_QUEUE_ARRAY_SIZE));
+
+ new_queues = kmalloc_array(desired_size, sizeof(p->queues[0]), GFP_KERNEL);
+ if (!new_queues)
+ return false;
+
+ memcpy(new_queues, p->queues, p->queue_array_size * sizeof(p->queues[0]));
+
+ kfree(p->queues);
+ p->queues = new_queues;
+ p->queue_array_size = desired_size;
+
+ return true;
+}
+
+/* Assumes that the process lock is held. */
+bool kfd_allocate_queue_id(struct kfd_process *p, unsigned int *queue_id)
+{
+ unsigned int qid = find_first_zero_bit(p->allocated_queue_bitmap, MAX_PROCESS_QUEUES);
+
+ if (qid >= MAX_PROCESS_QUEUES)
+ return false;
+
+ if (!ensure_queue_array_size(p, qid))
+ return false;
+
+ __set_bit(qid, p->allocated_queue_bitmap);
+
+ p->queues[qid] = NULL;
+ *queue_id = qid;
+
+ return true;
+}
+
+/*
+ * Install a queue into a previously-allocated queue id.
+ * Assumes that the process lock is held.
+ */
+void kfd_install_queue(struct kfd_process *p, unsigned int queue_id, struct kfd_queue *queue)
+{
+ /* Have to call allocate_queue_id before install_queue. */
+ BUG_ON(queue_id >= p->queue_array_size);
+ BUG_ON(queue == NULL);
+
+ p->queues[queue_id] = queue;
+}
+
+/*
+ * Remove a queue from the open queue list and deallocate the queue id.
+ * This can be called whether or not a queue was installed.
+ * Assumes that the process lock is held.
+ */
+void kfd_remove_queue(struct kfd_process *p, unsigned int queue_id)
+{
+ BUG_ON(!test_bit(queue_id, p->allocated_queue_bitmap));
+ BUG_ON(queue_id >= p->queue_array_size);
+
+ __clear_bit(queue_id, p->allocated_queue_bitmap);
+}
+
+/* Assumes that the process lock is held. */
+struct kfd_queue *kfd_get_queue(struct kfd_process *p, unsigned int queue_id)
+{
+ /*
+ * test_bit because the contents of unallocated
+ * queue slots are undefined.
+ * Otherwise ensure_queue_array_size would have to clear new entries and
+ * remove_queue would have to NULL removed queues.
+ */
+ return (queue_id < p->queue_array_size &&
+ test_bit(queue_id, p->allocated_queue_bitmap)) ?
+ p->queues[queue_id] : NULL;
+}
+
+struct kfd_process_device *kfd_get_first_process_device_data(struct kfd_process *p)
+{
+ return list_first_entry(&p->per_device_data, struct kfd_process_device, per_device_list);
+}
+
+struct kfd_process_device *kfd_get_next_process_device_data(struct kfd_process *p, struct kfd_process_device *pdd)
+{
+ if (list_is_last(&pdd->per_device_list, &p->per_device_data))
+ return NULL;
+ return list_next_entry(pdd, per_device_list);
+}
+
+bool kfd_has_process_device_data(struct kfd_process *p)
+{
+ return !(list_empty(&p->per_device_data));
+}
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c b/drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c
new file mode 100644
index 0000000..a2c4d30
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c
@@ -0,0 +1,96 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include "kfd_priv.h"
+
+int kfd_vidmem_alloc(struct kfd_dev *kfd, size_t size, size_t alignment,
+ enum kfd_mempool pool, kfd_mem_obj *mem_obj)
+{
+ return kfd2kgd->allocate_mem(kfd->kgd,
+ size,
+ alignment,
+ (enum kgd_memory_pool)pool,
+ (struct kgd_mem **)mem_obj);
+}
+
+void kfd_vidmem_free(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
+{
+ kfd2kgd->free_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
+}
+
+int kfd_vidmem_gpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj,
+ uint64_t *vmid0_address)
+{
+ return kfd2kgd->gpumap_mem(kfd->kgd,
+ (struct kgd_mem *)mem_obj,
+ vmid0_address);
+}
+
+void kfd_vidmem_ungpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
+{
+ kfd2kgd->ungpumap_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
+}
+
+int kfd_vidmem_kmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj, void **ptr)
+{
+ return kfd2kgd->kmap_mem(kfd->kgd, (struct kgd_mem *)mem_obj, ptr);
+}
+
+void kfd_vidmem_unkmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
+{
+ kfd2kgd->unkmap_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
+}
+
+int kfd_vidmem_alloc_map(struct kfd_dev *kfd, kfd_mem_obj *mem_obj,
+ void **ptr, uint64_t *vmid0_address, size_t size)
+{
+ int retval;
+
+ retval = kfd_vidmem_alloc(kfd, size, PAGE_SIZE,
+ KFD_MEMPOOL_SYSTEM_WRITECOMBINE, mem_obj);
+ if (retval != 0)
+ goto fail_vidmem_alloc;
+
+ retval = kfd_vidmem_kmap(kfd, *mem_obj, ptr);
+ if (retval != 0)
+ goto fail_vidmem_kmap;
+
+ retval = kfd_vidmem_gpumap(kfd, *mem_obj, vmid0_address);
+ if (retval != 0)
+ goto fail_vidmem_gpumap;
+
+ return 0;
+
+fail_vidmem_gpumap:
+ kfd_vidmem_unkmap(kfd, *mem_obj);
+fail_vidmem_kmap:
+ kfd_vidmem_free(kfd, *mem_obj);
+fail_vidmem_alloc:
+ return retval;
+}
+
+void kfd_vidmem_free_unmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
+{
+ kfd_vidmem_ungpumap(kfd, mem_obj);
+ kfd_vidmem_unkmap(kfd, mem_obj);
+ kfd_vidmem_free(kfd, mem_obj);
+}
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* Re: [PATCH v2 11/25] amdkfd: Add basic modules to amdkfd
2014-07-17 13:29 ` [PATCH v2 11/25] amdkfd: Add basic modules " Oded Gabbay
@ 2014-07-20 23:02 ` Jerome Glisse
2014-08-02 19:25 ` Oded Gabbay
0 siblings, 1 reply; 46+ messages in thread
From: Jerome Glisse @ 2014-07-20 23:02 UTC (permalink / raw)
To: Oded Gabbay
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, dri-devel,
Evgeny Pinchuk, Alexey Skidanov, Alex Deucher, Andrew Morton,
Christian König
On Thu, Jul 17, 2014 at 04:29:18PM +0300, Oded Gabbay wrote:
> From: Andrew Lewycky <Andrew.Lewycky@amd.com>
>
> This patch adds the process module and 4 helper modules:
>
> - kfd_process, which handles process which open /dev/kfd
> - kfd_doorbell, which provides helper functions for doorbell allocation, release and mapping to userspace
> - kfd_pasid, which provides helper functions for pasid allocation and release
> - kfd_vidmem, which provides helper functions for allocation and release of memory from the gfx driver
> - kfd_aperture, which provides helper functions for managing the LDS, Local GPU memory and Scratch memory apertures of the process
>
> This patch only contains the basic kfd_process module, which doesn't contain the reference to the queue scheduler. This was done to allow easier code review.
>
> Also, this patch doesn't contain the calls to the IOMMU driver for binding the pasid to the device. Again, this was done to allow easier code review
>
> The kfd_process object is created when a process opens /dev/kfd and is closed when the mm_struct of that process is teared-down.
So, valid arguments were made to have one file per device, and because this is not
a common HSA architecture, I am rather reluctant to add the /dev/kfd directory just
for a temporary solution until people inside the HSA Foundation get their act
together and work on a common API.
So I would rather have the whole temporary kfd solution inside the radeon driver under the
drm folder. I think we have enough ioctls left to accommodate you.
>
> Signed-off-by: Andrew Lewycky <Andrew.Lewycky@amd.com>
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> ---
> drivers/gpu/drm/radeon/amdkfd/Makefile | 4 +-
> drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c | 123 +++++++++
> drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c | 36 ++-
> drivers/gpu/drm/radeon/amdkfd/kfd_device.c | 2 +
> drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c | 264 +++++++++++++++++++
> drivers/gpu/drm/radeon/amdkfd/kfd_module.c | 22 ++
> drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c | 97 +++++++
> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 148 +++++++++++
> drivers/gpu/drm/radeon/amdkfd/kfd_process.c | 374 +++++++++++++++++++++++++++
> drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c | 96 +++++++
> 10 files changed, 1163 insertions(+), 3 deletions(-)
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_process.c
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c
>
> diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
> index 08ecfcd..daf75a8 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/Makefile
> +++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
> @@ -4,6 +4,8 @@
>
> ccflags-y := -Iinclude/drm
>
> -amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o
> +amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
> + kfd_pasid.o kfd_doorbell.o kfd_vidmem.o kfd_aperture.o \
> + kfd_process.o
>
> obj-$(CONFIG_HSA_RADEON) += amdkfd.o
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c b/drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c
> new file mode 100644
> index 0000000..0468114
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c
> @@ -0,0 +1,123 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#include <linux/device.h>
> +#include <linux/export.h>
> +#include <linux/err.h>
> +#include <linux/fs.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/uaccess.h>
> +#include <linux/compat.h>
> +#include <uapi/linux/kfd_ioctl.h>
> +#include <linux/time.h>
> +#include "kfd_priv.h"
> +#include <linux/mm.h>
> +#include <uapi/asm-generic/mman-common.h>
> +#include <asm/processor.h>
> +
> +
> +#define MAKE_GPUVM_APP_BASE(gpu_num) (((uint64_t)(gpu_num) << 61) + 0x1000000000000)
> +#define MAKE_GPUVM_APP_LIMIT(base) (((uint64_t)(base) & 0xFFFFFF0000000000) | 0xFFFFFFFFFF)
> +#define MAKE_SCRATCH_APP_BASE(gpu_num) (((uint64_t)(gpu_num) << 61) + 0x100000000)
> +#define MAKE_SCRATCH_APP_LIMIT(base) (((uint64_t)base & 0xFFFFFFFF00000000) | 0xFFFFFFFF)
> +#define MAKE_LDS_APP_BASE(gpu_num) (((uint64_t)(gpu_num) << 61) + 0x0)
> +#define MAKE_LDS_APP_LIMIT(base) (((uint64_t)(base) & 0xFFFFFFFF00000000) | 0xFFFFFFFF)
> +
> +#define HSA_32BIT_LDS_APP_SIZE 0x10000
> +#define HSA_32BIT_LDS_APP_ALIGNMENT 0x10000
> +
> +static unsigned long kfd_reserve_aperture(struct kfd_process *process, unsigned long len, unsigned long alignment)
> +{
> +
> + unsigned long addr = 0;
> + unsigned long start_address;
> +
> + /*
> + * Go bottom up and find the first available aligned address.
> + * We may narrow space to scan by getting mmap range limits.
> + */
> + for (start_address = alignment; start_address < (TASK_SIZE - alignment); start_address += alignment) {
> + addr = vm_mmap(NULL, start_address, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, 0);
So forcing apertures into the process address space like this is not really
welcome. Userspace has no idea this will happen, and a valid existing
program may already statically allocate those addresses through mmap,
either before or after it triggers this code.
As I said in the general answer, I think the best option here is to use the
kernel reserved area to map this. You can work around the gate
page if the gate page matters to you.
This of course begs the question: what happens if the GPU tries to access
the kernel region? Does the IOMMU respect the system flag
of the page table, or does it just happily allow the GPU to access
the whole kernel area?
I guess I should go dive into the IOMMUv2 datasheet to find out.
> + if (!IS_ERR_VALUE(addr)) {
> + if (addr == start_address)
> + return addr;
> + vm_munmap(addr, len);
> + }
> + }
> + return 0;
> +
> +}
> +
> +int kfd_init_apertures(struct kfd_process *process)
> +{
> + uint8_t id = 0;
> + struct kfd_dev *dev;
> + struct kfd_process_device *pdd;
> +
> + mutex_lock(&process->mutex);
> +
> + /*Iterating over all devices*/
> + while ((dev = kfd_topology_enum_kfd_devices(id)) != NULL && id < NUM_OF_SUPPORTED_GPUS) {
> +
> + pdd = kfd_get_process_device_data(dev, process);
> +
> + /*for 64 bit process aperture will be statically reserved in the non canonical process address space
What does "non-canonical process address space" mean? Is this the x86-64 terminology,
or something else?
> + *for 32 bit process the aperture will be reserved in the process address space
> + */
> + if (process->is_32bit_user_mode) {
> + /*try to reserve aperture. continue on failure, just put the aperture size to be 0*/
> + pdd->lds_base = kfd_reserve_aperture(
> + process,
> + HSA_32BIT_LDS_APP_SIZE,
> + HSA_32BIT_LDS_APP_ALIGNMENT);
> +
> + if (pdd->lds_base)
> + pdd->lds_limit = pdd->lds_base + HSA_32BIT_LDS_APP_SIZE - 1;
> + else
> + pdd->lds_limit = 0;
> +
> + /*GPUVM and Scratch apertures are not supported*/
> + pdd->gpuvm_base = pdd->gpuvm_limit = pdd->scratch_base = pdd->scratch_limit = 0;
> + } else {
> + /*node id couldn't be 0 - the three MSB bits of aperture shoudn't be 0*/
> + pdd->lds_base = MAKE_LDS_APP_BASE(id + 1);
> + pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
> + pdd->gpuvm_base = MAKE_GPUVM_APP_BASE(id + 1);
> + pdd->gpuvm_limit = MAKE_GPUVM_APP_LIMIT(pdd->gpuvm_base);
> + pdd->scratch_base = MAKE_SCRATCH_APP_BASE(id + 1);
> + pdd->scratch_limit = MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
> + }
> +
> + dev_dbg(kfd_device, "node id %u, gpu id %u, lds_base %llX lds_limit %llX gpuvm_base %llX gpuvm_limit %llX scratch_base %llX scratch_limit %llX",
> + id, pdd->dev->id, pdd->lds_base, pdd->lds_limit, pdd->gpuvm_base, pdd->gpuvm_limit, pdd->scratch_base, pdd->scratch_limit);
Break this debug output into several debug messages. Not all of us have a 30"
monitor.
> +
> + id++;
> + }
> +
> + mutex_unlock(&process->mutex);
> +
> + return 0;
> +}
> +
> +
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
> index b98bcb7..d6580a6 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
> @@ -38,6 +38,7 @@
>
> static long kfd_ioctl(struct file *, unsigned int, unsigned long);
> static int kfd_open(struct inode *, struct file *);
> +static int kfd_mmap(struct file *, struct vm_area_struct *);
>
> static const char kfd_dev_name[] = "kfd";
>
> @@ -46,6 +47,7 @@ static const struct file_operations kfd_fops = {
> .unlocked_ioctl = kfd_ioctl,
> .compat_ioctl = kfd_ioctl,
> .open = kfd_open,
> + .mmap = kfd_mmap,
> };
>
> static int kfd_char_dev_major = -1;
> @@ -96,9 +98,22 @@ struct device *kfd_chardev(void)
>
> static int kfd_open(struct inode *inode, struct file *filep)
> {
> + struct kfd_process *process;
> +
> if (iminor(inode) != 0)
> return -ENODEV;
>
> + process = kfd_create_process(current);
> + if (IS_ERR(process))
> + return PTR_ERR(process);
> +
> + process->is_32bit_user_mode = is_compat_task();
> +
> + dev_dbg(kfd_device, "process %d opened, compat mode (32 bit) - %d\n",
> + process->pasid, process->is_32bit_user_mode);
> +
> + kfd_init_apertures(process);
> +
> return 0;
> }
>
> @@ -152,8 +167,9 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
> "ioctl cmd 0x%x (#%d), arg 0x%lx\n",
> cmd, _IOC_NR(cmd), arg);
>
> - /* TODO: add function that retrieves process */
> - process = NULL;
> + process = kfd_get_process(current);
> + if (IS_ERR(process))
> + return PTR_ERR(process);
>
> switch (cmd) {
> case KFD_IOC_CREATE_QUEUE:
> @@ -201,3 +217,19 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>
> return err;
> }
> +
> +static int
> +kfd_mmap(struct file *filp, struct vm_area_struct *vma)
> +{
> + unsigned long pgoff = vma->vm_pgoff;
> + struct kfd_process *process;
> +
> + process = kfd_get_process(current);
> + if (IS_ERR(process))
> + return PTR_ERR(process);
> +
> + if (pgoff >= KFD_MMAP_DOORBELL_START && pgoff < KFD_MMAP_DOORBELL_END)
> + return kfd_doorbell_mmap(process, vma);
> +
> + return -EINVAL;
> +}
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
> index 4138694..f6a7cf7 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
> @@ -100,6 +100,8 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
> {
> kfd->shared_resources = *gpu_resources;
>
> + kfd_doorbell_init(kfd);
> +
> if (kfd_topology_add_device(kfd) != 0)
> return false;
>
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c
> new file mode 100644
> index 0000000..972eaea
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c
> @@ -0,0 +1,264 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include "kfd_priv.h"
> +#include <linux/mm.h>
> +#include <linux/mman.h>
> +#include <linux/slab.h>
> +
> +/*
> + * This extension supports a kernel level doorbells management for
> + * the kernel queues.
> + * Basically the last doorbells page is devoted to kernel queues
> + * and that's assures that any user process won't get access to the
> + * kernel doorbells page
> + */
> +static DEFINE_MUTEX(doorbell_mutex);
> +static unsigned long doorbell_available_index[DIV_ROUND_UP(MAX_PROCESS_QUEUES, BITS_PER_LONG)] = { 0 };
> +#define KERNEL_DOORBELL_PASID 1
> +
> +/*
> + * Each device exposes a doorbell aperture, a PCI MMIO aperture that
> + * receives 32-bit writes that are passed to queues as wptr values.
> + * The doorbells are intended to be written by applications as part
> + * of queueing work on user-mode queues.
> + * We assign doorbells to applications in PAGE_SIZE-sized and aligned chunks.
> + * We map the doorbell address space into user-mode when a process creates
> + * its first queue on each device.
> + * Although the mapping is done by KFD, it is equivalent to an mmap of
> + * the /dev/kfd with the particular device encoded in the mmap offset.
> + * There will be other uses for mmap of /dev/kfd, so only a range of
> + * offsets (KFD_MMAP_DOORBELL_START-END) is used for doorbells.
> + */
The mapping should not be done by the driver; instead, you should provide the
offset to userspace and have userspace call mmap with the proper arguments.
I do not think having the device driver do an mmap behind an ioctl
would be a welcome idea.
> +
> +/* # of doorbell bytes allocated for each process. */
> +static inline size_t doorbell_process_allocation(void)
> +{
> + return roundup(sizeof(doorbell_t) * MAX_PROCESS_QUEUES, PAGE_SIZE);
> +}
This whole doorbell situation needs some cleanup. Instead of passing everything
as bytes and byte offsets, you should rather pass everything as pfns and
page offsets, so it is clear that doorbells are at page granularity and you will
not have to clutter the code with all kinds of align and round-up operations.
Just cleaner and safer.
> +
> +/* Doorbell calculations for device init. */
> +void kfd_doorbell_init(struct kfd_dev *kfd)
> +{
> + size_t doorbell_start_offset;
> + size_t doorbell_aperture_size;
> + size_t doorbell_process_limit;
> +
> + /*
> + * We start with calculations in bytes because the input data might
> + * only be byte-aligned.
> + * Only after we have done the rounding can we assume any alignment.
> + */
> +
> + doorbell_start_offset = roundup(kfd->shared_resources.doorbell_start_offset,
> + doorbell_process_allocation());
> + doorbell_aperture_size = rounddown(kfd->shared_resources.doorbell_aperture_size,
> + doorbell_process_allocation());
> +
> + if (doorbell_aperture_size > doorbell_start_offset)
> + doorbell_process_limit =
> + (doorbell_aperture_size - doorbell_start_offset) / doorbell_process_allocation();
> + else
> + doorbell_process_limit = 0;
> +
> + kfd->doorbell_base = kfd->shared_resources.doorbell_physical_address + doorbell_start_offset;
> + kfd->doorbell_id_offset = doorbell_start_offset / sizeof(doorbell_t);
> + kfd->doorbell_process_limit = doorbell_process_limit - 1;
> +
> + kfd->doorbell_kernel_ptr = ioremap(kfd->doorbell_base, doorbell_process_allocation());
> + BUG_ON(!kfd->doorbell_kernel_ptr);
> +
> + pr_debug("kfd: doorbell initialization\n"
> + " doorbell base == 0x%08lX\n"
> + " doorbell_id_offset == 0x%08lu\n"
> + " doorbell_process_limit == 0x%08lu\n"
> + " doorbell_kernel_offset == 0x%08lX\n"
> + " doorbell aperture size == 0x%08lX\n"
> + " doorbell kernel address == 0x%08lX\n",
> + (uintptr_t)kfd->doorbell_base,
> + kfd->doorbell_id_offset,
> + doorbell_process_limit,
> + (uintptr_t)kfd->doorbell_base,
> + kfd->shared_resources.doorbell_aperture_size,
> + (uintptr_t)kfd->doorbell_kernel_ptr);
Kind of ugly; it will break some kernel log managers. You need to do one
pr_debug call per line.
> +
> +}
> +
> +/*
> + * This is the /dev/kfd mmap (for doorbell) implementation.
> + * We intend that this is only called through map_doorbells, not through
> + * user-mode mmap of /dev/kfd
> + */
> +int kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma)
> +{
> + unsigned int device_index;
> + struct kfd_dev *dev;
> + phys_addr_t start;
> +
> + BUG_ON(vma->vm_pgoff < KFD_MMAP_DOORBELL_START || vma->vm_pgoff >= KFD_MMAP_DOORBELL_END);
> +
> + /* For simplicitly we only allow mapping of the entire doorbell allocation of a single device & process. */
> + if (vma->vm_end - vma->vm_start != doorbell_process_allocation())
> + return -EINVAL;
> +
> + /* device_index must be GPU ID!! */
> + device_index = vma->vm_pgoff - KFD_MMAP_DOORBELL_START;
> +
> + dev = kfd_device_by_id(device_index);
> + if (dev == NULL)
> + return -EINVAL;
> +
> + vma->vm_flags |= VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE | VM_DONTDUMP | VM_PFNMAP;
> + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> +
> + start = dev->doorbell_base + process->pasid * doorbell_process_allocation();
> +
> + pr_debug("kfd: mapping doorbell page in kfd_doorbell_mmap\n"
> + " target user address == 0x%016llX\n"
> + " physical address == 0x%016llX\n"
> + " vm_flags == 0x%08lX\n"
> + " size == 0x%08lX\n",
> + (long long unsigned int) vma->vm_start, start, vma->vm_flags,
> + doorbell_process_allocation());
> +
> + return io_remap_pfn_range(vma,
> + vma->vm_start,
> + start >> PAGE_SHIFT,
> + doorbell_process_allocation(),
> + vma->vm_page_prot);
> +}
> +
> +/*
> + * Map the doorbells for a single process & device.
> + * This will indirectly call kfd_doorbell_mmap.
> + * This assumes that the process mutex is being held.
> + */
> +static int map_doorbells(struct file *devkfd, struct kfd_process *process,
> + struct kfd_dev *dev)
> +{
> + struct kfd_process_device *pdd = kfd_get_process_device_data(dev, process);
> +
> + if (pdd == NULL)
> + return -ENOMEM;
> +
> + if (pdd->doorbell_mapping == NULL) {
> + unsigned long offset = (KFD_MMAP_DOORBELL_START + dev->id) << PAGE_SHIFT;
> + doorbell_t __user *doorbell_mapping;
> +
> + doorbell_mapping = (doorbell_t __user *)vm_mmap(devkfd, 0, doorbell_process_allocation(), PROT_WRITE,
> + MAP_SHARED, offset);
Like I said above, have userspace do that. Do not do it inside
the kernel.
> + if (IS_ERR(doorbell_mapping))
> + return PTR_ERR(doorbell_mapping);
> +
> + pdd->doorbell_mapping = doorbell_mapping;
> + }
> +
> + return 0;
> +}
> +
> +/* get kernel iomem pointer for a doorbell */
> +u32 __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd, unsigned int *doorbell_off)
> +{
> + u32 inx;
> +
> + BUG_ON(!kfd || !doorbell_off);
> +
> + mutex_lock(&doorbell_mutex);
> + inx = find_first_zero_bit(doorbell_available_index, MAX_PROCESS_QUEUES);
> + __set_bit(inx, doorbell_available_index);
> + mutex_unlock(&doorbell_mutex);
> +
> + if (inx >= MAX_PROCESS_QUEUES)
> + return NULL;
> +
> + /* caluculating the kernel doorbell offset using "faked" kernel pasid that allocated for kernel queues only */
> + *doorbell_off = KERNEL_DOORBELL_PASID * (doorbell_process_allocation()/sizeof(doorbell_t)) + inx;
> +
> + pr_debug("kfd: get kernel queue doorbell\n"
> + " doorbell offset == 0x%08d\n"
> + " kernel address == 0x%08lX\n",
> + *doorbell_off, (uintptr_t)(kfd->doorbell_kernel_ptr + inx));
> +
> + return kfd->doorbell_kernel_ptr + inx;
> +}
> +
> +void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr)
> +{
> + unsigned int inx;
> +
> + BUG_ON(!kfd || !db_addr);
> +
> + inx = (unsigned int)(db_addr - kfd->doorbell_kernel_ptr);
> +
> + mutex_lock(&doorbell_mutex);
> + __clear_bit(inx, doorbell_available_index);
> + mutex_unlock(&doorbell_mutex);
> +}
> +
> +inline void write_kernel_doorbell(u32 __iomem *db, u32 value)
> +{
> + if (db) {
> + writel(value, db);
> + pr_debug("writing %d to doorbell address 0x%p\n", value, db);
> + }
> +}
> +
> +/*
> + * Get the user-mode address of a doorbell.
> + * Assumes that the process mutex is being held.
> + */
> +doorbell_t __user *kfd_get_doorbell(struct file *devkfd,
> + struct kfd_process *process,
> + struct kfd_dev *dev,
> + unsigned int doorbell_index)
> +{
> + struct kfd_process_device *pdd;
> + int err;
> +
> + BUG_ON(doorbell_index > MAX_DOORBELL_INDEX);
> +
> + err = map_doorbells(devkfd, process, dev);
> + if (err)
> + return ERR_PTR(err);
> +
> + pdd = kfd_get_process_device_data(dev, process);
> + BUG_ON(pdd == NULL); /* map_doorbells would have failed otherwise */
> +
> + pr_debug("doorbell value on creation 0x%x\n", pdd->doorbell_mapping[doorbell_index]);
> +
> + return &pdd->doorbell_mapping[doorbell_index];
> +}
> +
> +/*
> + * queue_ids are in the range [0,MAX_PROCESS_QUEUES) and are mapped 1:1
> + * to doorbells with the process's doorbell page
> + */
> +unsigned int kfd_queue_id_to_doorbell(struct kfd_dev *kfd, struct kfd_process *process, unsigned int queue_id)
> +{
> + /*
> + * doorbell_id_offset accounts for doorbells taken by KGD.
> + * pasid * doorbell_process_allocation/sizeof(doorbell_t) adjusts
> + * to the process's doorbells
> + */
> + return kfd->doorbell_id_offset + process->pasid * (doorbell_process_allocation()/sizeof(doorbell_t)) + queue_id;
> +}
> +
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
> index c51f981..dc08f51 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
> @@ -65,14 +65,30 @@ void kgd2kfd_exit(void)
> {
> }
>
> +extern int kfd_process_exit(struct notifier_block *nb,
> + unsigned long action, void *data);
> +
> +static struct notifier_block kfd_mmput_nb = {
> + .notifier_call = kfd_process_exit,
> + .priority = 3,
> +};
> +
> static int __init kfd_module_init(void)
> {
> int err;
>
> + err = kfd_pasid_init();
> + if (err < 0)
> + goto err_pasid;
> +
> err = kfd_chardev_init();
> if (err < 0)
> goto err_ioctl;
>
> + err = mmput_register_notifier(&kfd_mmput_nb);
> + if (err)
> + goto err_mmu_notifier;
> +
> err = kfd_topology_init();
> if (err < 0)
> goto err_topology;
> @@ -82,15 +98,21 @@ static int __init kfd_module_init(void)
> return 0;
>
> err_topology:
> + mmput_unregister_notifier(&kfd_mmput_nb);
> +err_mmu_notifier:
> kfd_chardev_exit();
> err_ioctl:
> + kfd_pasid_exit();
> +err_pasid:
> return err;
> }
>
> static void __exit kfd_module_exit(void)
> {
> kfd_topology_shutdown();
> + mmput_unregister_notifier(&kfd_mmput_nb);
> kfd_chardev_exit();
> + kfd_pasid_exit();
> dev_info(kfd_device, "Removed module\n");
> }
>
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c b/drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c
> new file mode 100644
> index 0000000..0b594e4
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c
> @@ -0,0 +1,97 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +#include "kfd_priv.h"
> +
> +#define INITIAL_PASID_LIMIT (1<<20)
> +
> +static unsigned long *pasid_bitmap;
> +static pasid_t pasid_limit;
> +static DEFINE_MUTEX(pasid_mutex);
> +
> +int kfd_pasid_init(void)
> +{
> + pasid_limit = INITIAL_PASID_LIMIT;
> +
> + pasid_bitmap = kzalloc(DIV_ROUND_UP(INITIAL_PASID_LIMIT, BITS_PER_BYTE), GFP_KERNEL);
> + if (!pasid_bitmap)
> + return -ENOMEM;
> +
> + set_bit(0, pasid_bitmap); /* PASID 0 is reserved. */
> +
> + return 0;
> +}
> +
> +void kfd_pasid_exit(void)
> +{
> + kfree(pasid_bitmap);
> +}
> +
> +bool kfd_set_pasid_limit(pasid_t new_limit)
> +{
> + if (new_limit < pasid_limit) {
> + bool ok;
> +
> + mutex_lock(&pasid_mutex);
> +
> + /* ensure that no pasids >= new_limit are in-use */
> + ok = (find_next_bit(pasid_bitmap, pasid_limit, new_limit) == pasid_limit);
> + if (ok)
> + pasid_limit = new_limit;
> +
> + mutex_unlock(&pasid_mutex);
> +
> + return ok;
> + }
> +
> + return true;
> +}
> +
> +inline pasid_t kfd_get_pasid_limit(void)
> +{
> + return pasid_limit;
> +}
> +
> +pasid_t kfd_pasid_alloc(void)
> +{
> + pasid_t found;
> +
> + mutex_lock(&pasid_mutex);
> +
> + found = find_first_zero_bit(pasid_bitmap, pasid_limit);
> + if (found == pasid_limit)
> + found = 0;
> + else
> + set_bit(found, pasid_bitmap);
> +
> + mutex_unlock(&pasid_mutex);
> +
> + return found;
> +}
> +
> +void kfd_pasid_free(pasid_t pasid)
> +{
> + BUG_ON(pasid == 0 || pasid >= pasid_limit);
> + clear_bit(pasid, pasid_bitmap);
> +}
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> index b391e24..af5a5e4 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> @@ -32,14 +32,39 @@
> #include <linux/spinlock.h>
> #include "../radeon_kfd.h"
>
> +/*
> + * Per-process limit. Each process can only
> + * create MAX_PROCESS_QUEUES across all devices
> + */
> +#define MAX_PROCESS_QUEUES 1024
> +
> +#define MAX_DOORBELL_INDEX MAX_PROCESS_QUEUES
> #define KFD_SYSFS_FILE_MODE 0444
>
> +/*
> + * We multiplex different sorts of mmap-able memory onto /dev/kfd.
> + * We figure out what type of memory the caller wanted by comparing
> + * the mmap page offset to known ranges.
> + */
> +#define KFD_MMAP_DOORBELL_START (((1ULL << 32)*1) >> PAGE_SHIFT)
> +#define KFD_MMAP_DOORBELL_END (((1ULL << 32)*2) >> PAGE_SHIFT)
> +
> /* GPU ID hash width in bits */
> #define KFD_GPU_ID_HASH_WIDTH 16
>
> /* Macro for allocating structures */
> #define kfd_alloc_struct(ptr_to_struct) ((typeof(ptr_to_struct)) kzalloc(sizeof(*ptr_to_struct), GFP_KERNEL))
>
> +/*
> + * Large enough to hold the maximum usable pasid + 1.
> + * It must also be able to store the number of doorbells
> + * reported by a KFD device.
> + */
> +typedef unsigned int pasid_t;
> +
> +/* Type that represents a HW doorbell slot. */
> +typedef u32 doorbell_t;
> +
> struct kfd_device_info {
> const struct kfd_scheduler_class *scheduler_class;
> unsigned int max_pasid_bits;
> @@ -56,6 +81,17 @@ struct kfd_dev {
>
> unsigned int id; /* topology stub index */
>
> + phys_addr_t doorbell_base; /* Start of actual doorbells used by
> + * KFD. It is aligned for mapping
> + * into user mode
> + */
> + size_t doorbell_id_offset; /* Doorbell offset (from KFD doorbell
> + * to HW doorbell, GFX reserved some
> + * at the start)
> + */
> + size_t doorbell_process_limit; /* Number of processes we have doorbell space for. */
> + u32 __iomem *doorbell_kernel_ptr; /* this is a pointer for a doorbells page used by kernel queue */
> +
> struct kgd2kfd_shared_resources shared_resources;
> };
>
> @@ -68,15 +104,124 @@ void kgd2kfd_device_exit(struct kfd_dev *kfd);
>
> extern const struct kfd2kgd_calls *kfd2kgd;
>
> +/* Dummy struct just to make kfd_mem_obj* a unique pointer type. */
> +struct kfd_mem_obj_s;
> +typedef struct kfd_mem_obj_s *kfd_mem_obj;
IIRC the rule is no more typedefs in the kernel. Or maybe I just dreamt
that rule.
> +
> +enum kfd_mempool {
> + KFD_MEMPOOL_SYSTEM_CACHEABLE = 1,
> + KFD_MEMPOOL_SYSTEM_WRITECOMBINE = 2,
> + KFD_MEMPOOL_FRAMEBUFFER = 3,
> +};
> +
> +
> +int kfd_vidmem_alloc(struct kfd_dev *kfd, size_t size, size_t alignment,
> + enum kfd_mempool pool, kfd_mem_obj *mem_obj);
> +void kfd_vidmem_free(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
> +int kfd_vidmem_gpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj, uint64_t *vmid0_address);
> +void kfd_vidmem_ungpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
> +int kfd_vidmem_kmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj, void **ptr);
> +void kfd_vidmem_unkmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
> +int kfd_vidmem_alloc_map(struct kfd_dev *kfd, kfd_mem_obj *mem_obj, void **ptr,
> + uint64_t *vmid0_address, size_t size);
> +void kfd_vidmem_free_unmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
> /* Character device interface */
> int kfd_chardev_init(void);
> void kfd_chardev_exit(void);
> struct device *kfd_chardev(void);
>
> +
> +/* Data that is per-process-per device. */
> +struct kfd_process_device {
> + /*
> + * List of all per-device data for a process.
> + * Starts from kfd_process.per_device_data.
> + */
> + struct list_head per_device_list;
> +
> + /* The device that owns this data. */
> + struct kfd_dev *dev;
> +
> + /* The user-mode address of the doorbell mapping for this device. */
> + doorbell_t __user *doorbell_mapping;
> +
> + /* Is this process/pasid bound to this device? (amd_iommu_bind_pasid) */
> + bool bound;
Best to put the boolean at the end of the structure ...
> +
> + /*Apertures*/
> + uint64_t lds_base;
> + uint64_t lds_limit;
> + uint64_t gpuvm_base;
> + uint64_t gpuvm_limit;
> + uint64_t scratch_base;
> + uint64_t scratch_limit;
> +};
> +
> /* Process data */
> struct kfd_process {
> + struct list_head processes_list;
> +
> + struct mm_struct *mm;
> +
> + struct mutex mutex;
> +
> + /*
> + * In any process, the thread that started main() is the lead
> + * thread and outlives the rest.
> + * It is here because amd_iommu_bind_pasid wants a task_struct.
> + */
> + struct task_struct *lead_thread;
> +
> + pasid_t pasid;
> +
> + /*
> + * List of kfd_process_device structures,
> + * one for each device the process is using.
> + */
> + struct list_head per_device_data;
> +
> + /* The process's queues. */
> + size_t queue_array_size;
> +
> + /* Size is queue_array_size, up to MAX_PROCESS_QUEUES. */
> + struct kfd_queue **queues;
> +
> + unsigned long allocated_queue_bitmap[DIV_ROUND_UP(MAX_PROCESS_QUEUES, BITS_PER_LONG)];
> +
> + /*Is the user space process 32 bit?*/
> + bool is_32bit_user_mode;
> };
>
> +struct kfd_process *kfd_create_process(const struct task_struct *);
> +struct kfd_process *kfd_get_process(const struct task_struct *);
> +
> +struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
> + struct kfd_process *p);
> +
> +/* PASIDs */
> +int kfd_pasid_init(void);
> +void kfd_pasid_exit(void);
> +bool kfd_set_pasid_limit(pasid_t new_limit);
> +pasid_t kfd_get_pasid_limit(void);
> +pasid_t kfd_pasid_alloc(void);
> +void kfd_pasid_free(pasid_t pasid);
> +
> +/* Doorbells */
> +void kfd_doorbell_init(struct kfd_dev *kfd);
> +int kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma);
> +doorbell_t __user *kfd_get_doorbell(struct file *devkfd,
> + struct kfd_process *process,
> + struct kfd_dev *dev,
> + unsigned int doorbell_index);
> +u32 __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
> + unsigned int *doorbell_off);
> +void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr);
> +u32 read_kernel_doorbell(u32 __iomem *db);
> +void write_kernel_doorbell(u32 __iomem *db, u32 value);
> +unsigned int kfd_queue_id_to_doorbell(struct kfd_dev *kfd,
> + struct kfd_process *process,
> + unsigned int queue_id);
> +
> extern struct device *kfd_device;
>
> /* Topology */
> @@ -95,4 +240,7 @@ void kgd2kfd_interrupt(struct kfd_dev *dev, const void *ih_ring_entry);
> void kgd2kfd_suspend(struct kfd_dev *dev);
> int kgd2kfd_resume(struct kfd_dev *dev);
>
> +/* amdkfd Apertures */
> +int kfd_init_apertures(struct kfd_process *process);
> +
> #endif
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_process.c b/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
> new file mode 100644
> index 0000000..5efbce0
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
> @@ -0,0 +1,374 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/mutex.h>
> +#include <linux/log2.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/notifier.h>
> +struct mm_struct;
> +
> +#include "kfd_priv.h"
> +
> +/*
> + * Initial size for the array of queues.
> + * The allocated size is doubled each time
> + * it is exceeded up to MAX_PROCESS_QUEUES.
> + */
> +#define INITIAL_QUEUE_ARRAY_SIZE 16
> +
> +/* List of struct kfd_process */
> +static struct list_head kfd_processes_list = LIST_HEAD_INIT(kfd_processes_list);
> +
> +static DEFINE_MUTEX(kfd_processes_mutex);
> +
> +static struct kfd_process *create_process(const struct task_struct *thread);
> +
> +struct kfd_process *kfd_create_process(const struct task_struct *thread)
> +{
> + struct kfd_process *process;
> +
> + if (thread->mm == NULL)
> + return ERR_PTR(-EINVAL);
> +
> + /* Only the pthreads threading model is supported. */
> + if (thread->group_leader->mm != thread->mm)
> + return ERR_PTR(-EINVAL);
> +
> + /*
> + * take kfd processes mutex before starting of process creation
> + * so there won't be a case where two threads of the same process
> + * create two kfd_process structures
> + */
> + mutex_lock(&kfd_processes_mutex);
> +
> + /* A prior open of /dev/kfd could have already created the process. */
> + process = thread->mm->kfd_process;
> + if (process)
> + pr_debug("kfd: process already found\n");
> +
> + if (!process)
> + process = create_process(thread);
> +
> + mutex_unlock(&kfd_processes_mutex);
> +
> + return process;
> +}
> +
> +struct kfd_process *kfd_get_process(const struct task_struct *thread)
> +{
> + struct kfd_process *process;
> +
> + if (thread->mm == NULL)
> + return ERR_PTR(-EINVAL);
> +
> + /* Only the pthreads threading model is supported. */
> + if (thread->group_leader->mm != thread->mm)
> + return ERR_PTR(-EINVAL);
> +
> + process = thread->mm->kfd_process;
> +
> + return process;
> +}
> +
> +static void free_process(struct kfd_process *p)
> +{
> + struct kfd_process_device *pdd, *temp;
> +
> + BUG_ON(p == NULL);
> +
> + list_for_each_entry_safe(pdd, temp, &p->per_device_data, per_device_list) {
> + list_del(&pdd->per_device_list);
> + kfree(pdd);
> + }
> +
> + kfd_pasid_free(p->pasid);
> +
> + mutex_destroy(&p->mutex);
> +
> + kfree(p->queues);
> +
> + list_del(&p->processes_list);
> +
> + kfree(p);
> +}
> +
> +int kfd_process_exit(struct notifier_block *nb,
> + unsigned long action, void *data)
> +{
> + struct mm_struct *mm = data;
> + struct kfd_process *p;
> +
> + mutex_lock(&kfd_processes_mutex);
> +
> + p = mm->kfd_process;
> + if (p) {
> + free_process(p);
> + mm->kfd_process = NULL;
> + }
> +
> + mutex_unlock(&kfd_processes_mutex);
> +
> + return 0;
> +}
> +
> +static struct kfd_process *create_process(const struct task_struct *thread)
> +{
> + struct kfd_process *process;
> + int err = -ENOMEM;
> +
> + process = kzalloc(sizeof(*process), GFP_KERNEL);
> +
> + if (!process)
> + goto err_alloc_process;
> +
> + process->queues = kmalloc_array(INITIAL_QUEUE_ARRAY_SIZE, sizeof(process->queues[0]), GFP_KERNEL);
> + if (!process->queues)
> + goto err_alloc_queues;
> +
> + process->pasid = kfd_pasid_alloc();
> + if (process->pasid == 0)
> + goto err_alloc_pasid;
> +
> + mutex_init(&process->mutex);
> +
> + process->mm = thread->mm;
> + thread->mm->kfd_process = process;
> + list_add_tail(&process->processes_list, &kfd_processes_list);
> +
> + process->lead_thread = thread->group_leader;
> +
> + process->queue_array_size = INITIAL_QUEUE_ARRAY_SIZE;
> +
> + INIT_LIST_HEAD(&process->per_device_data);
> +
> + return process;
> +
> +err_alloc_pasid:
> + kfree(process->queues);
> +err_alloc_queues:
> + kfree(process);
> +err_alloc_process:
> + return ERR_PTR(err);
> +}
> +
> +struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
> + struct kfd_process *p)
> +{
> + struct kfd_process_device *pdd;
> +
> + list_for_each_entry(pdd, &p->per_device_data, per_device_list)
> + if (pdd->dev == dev)
> + return pdd;
> +
> + pdd = kzalloc(sizeof(*pdd), GFP_KERNEL);
> + if (pdd != NULL) {
> + pdd->dev = dev;
> + list_add(&pdd->per_device_list, &p->per_device_data);
> + }
> +
> + return pdd;
> +}
> +
> +/*
> + * Direct the IOMMU to bind the process (specifically the pasid->mm) to the device.
> + * Unbinding occurs when the process dies or the device is removed.
> + *
> + * Assumes that the process lock is held.
> + */
> +struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
> + struct kfd_process *p)
> +{
> + struct kfd_process_device *pdd = kfd_get_process_device_data(dev, p);
> +
> + if (pdd == NULL)
> + return ERR_PTR(-ENOMEM);
> +
> + if (pdd->bound)
> + return pdd;
> +
> + pdd->bound = true;
> +
> + return pdd;
> +}
> +
> +void kfd_unbind_process_from_device(struct kfd_dev *dev, pasid_t pasid)
> +{
> + struct kfd_process *p;
> + struct kfd_process_device *pdd;
> +
> + BUG_ON(dev == NULL);
> +
> + mutex_lock(&kfd_processes_mutex);
> +
> + list_for_each_entry(p, &kfd_processes_list, processes_list)
> + if (p->pasid == pasid)
> + break;
> +
> + mutex_unlock(&kfd_processes_mutex);
> +
> + BUG_ON(p->pasid != pasid);
> +
> + pdd = kfd_get_process_device_data(dev, p);
> +
> + BUG_ON(pdd == NULL);
> +
> + mutex_lock(&p->mutex);
> +
> + /*
> + * Just mark pdd as unbound, because we still need it to call
> + * amd_iommu_unbind_pasid() in when the process exits.
> + * We don't call amd_iommu_unbind_pasid() here
> + * because the IOMMU called us.
> + */
> + pdd->bound = false;
> +
> + mutex_unlock(&p->mutex);
> +}
> +
> +/*
> + * Ensure that the process's queue array is large enough to hold
> + * the queue at queue_id.
> + * Assumes that the process lock is held.
> + */
> +static bool ensure_queue_array_size(struct kfd_process *p, unsigned int queue_id)
> +{
> + size_t desired_size;
> + struct kfd_queue **new_queues;
> +
> + compiletime_assert(INITIAL_QUEUE_ARRAY_SIZE > 0, "INITIAL_QUEUE_ARRAY_SIZE must not be 0");
> + compiletime_assert(INITIAL_QUEUE_ARRAY_SIZE <= MAX_PROCESS_QUEUES,
> + "INITIAL_QUEUE_ARRAY_SIZE must be less than MAX_PROCESS_QUEUES");
> + /* Ensure that doubling the current size won't ever overflow. */
> + compiletime_assert(MAX_PROCESS_QUEUES < SIZE_MAX / 2, "MAX_PROCESS_QUEUES must be less than SIZE_MAX/2");
> +
> + /*
> + * These & queue_id < MAX_PROCESS_QUEUES guarantee that
> + * the desired_size calculation will end up <= MAX_PROCESS_QUEUES
> + */
> + compiletime_assert(is_power_of_2(INITIAL_QUEUE_ARRAY_SIZE), "INITIAL_QUEUE_ARRAY_SIZE must be power of 2.");
> + compiletime_assert(MAX_PROCESS_QUEUES % INITIAL_QUEUE_ARRAY_SIZE == 0,
> + "MAX_PROCESS_QUEUES must be multiple of INITIAL_QUEUE_ARRAY_SIZE.");
> + compiletime_assert(is_power_of_2(MAX_PROCESS_QUEUES / INITIAL_QUEUE_ARRAY_SIZE),
> + "MAX_PROCESS_QUEUES must be a power-of-2 multiple of INITIAL_QUEUE_ARRAY_SIZE.");
> +
> + if (queue_id < p->queue_array_size)
> + return true;
> +
> + if (queue_id >= MAX_PROCESS_QUEUES)
> + return false;
> +
> + desired_size = p->queue_array_size;
> + while (desired_size <= queue_id)
> + desired_size *= 2;
> +
> + BUG_ON(desired_size < queue_id || desired_size > MAX_PROCESS_QUEUES);
> + BUG_ON(desired_size % INITIAL_QUEUE_ARRAY_SIZE != 0 || !is_power_of_2(desired_size / INITIAL_QUEUE_ARRAY_SIZE));
> +
> + new_queues = kmalloc_array(desired_size, sizeof(p->queues[0]), GFP_KERNEL);
> + if (!new_queues)
> + return false;
> +
> + memcpy(new_queues, p->queues, p->queue_array_size * sizeof(p->queues[0]));
> +
> + kfree(p->queues);
> + p->queues = new_queues;
> + p->queue_array_size = desired_size;
> +
> + return true;
> +}
> +
> +/* Assumes that the process lock is held. */
> +bool kfd_allocate_queue_id(struct kfd_process *p, unsigned int *queue_id)
> +{
> + unsigned int qid = find_first_zero_bit(p->allocated_queue_bitmap, MAX_PROCESS_QUEUES);
> +
> + if (qid >= MAX_PROCESS_QUEUES)
> + return false;
> +
> + if (!ensure_queue_array_size(p, qid))
> + return false;
> +
> + __set_bit(qid, p->allocated_queue_bitmap);
> +
> + p->queues[qid] = NULL;
> + *queue_id = qid;
> +
> + return true;
> +}
> +
> +/*
> + * Install a queue into a previously-allocated queue id.
> + * Assumes that the process lock is held.
> + */
> +void kfd_install_queue(struct kfd_process *p, unsigned int queue_id, struct kfd_queue *queue)
> +{
> + /* Have to call allocate_queue_id before install_queue. */
> + BUG_ON(queue_id >= p->queue_array_size);
> + BUG_ON(queue == NULL);
> +
> + p->queues[queue_id] = queue;
> +}
> +
> +/*
> + * Remove a queue from the open queue list and deallocate the queue id.
> + * This can be called whether or not a queue was installed.
> + * Assumes that the process lock is held.
> + */
> +void kfd_remove_queue(struct kfd_process *p, unsigned int queue_id)
> +{
> + BUG_ON(!test_bit(queue_id, p->allocated_queue_bitmap));
> + BUG_ON(queue_id >= p->queue_array_size);
> +
> + __clear_bit(queue_id, p->allocated_queue_bitmap);
> +}
> +
> +/* Assumes that the process lock is held. */
> +struct kfd_queue *kfd_get_queue(struct kfd_process *p, unsigned int queue_id)
> +{
> + /*
> + * test_bit because the contents of unallocated
> + * queue slots are undefined.
> + * Otherwise ensure_queue_array_size would have to clear new entries and
> + * remove_queue would have to NULL removed queues.
> + */
> + return (queue_id < p->queue_array_size &&
> + test_bit(queue_id, p->allocated_queue_bitmap)) ?
> + p->queues[queue_id] : NULL;
> +}
> +
> +struct kfd_process_device *kfd_get_first_process_device_data(struct kfd_process *p)
> +{
> + return list_first_entry(&p->per_device_data, struct kfd_process_device, per_device_list);
> +}
> +
> +struct kfd_process_device *kfd_get_next_process_device_data(struct kfd_process *p, struct kfd_process_device *pdd)
> +{
> + if (list_is_last(&pdd->per_device_list, &p->per_device_data))
> + return NULL;
> + return list_next_entry(pdd, per_device_list);
> +}
> +
> +bool kfd_has_process_device_data(struct kfd_process *p)
> +{
> + return !(list_empty(&p->per_device_data));
> +}
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c b/drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c
> new file mode 100644
> index 0000000..a2c4d30
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c
> @@ -0,0 +1,96 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include "kfd_priv.h"
> +
> +int kfd_vidmem_alloc(struct kfd_dev *kfd, size_t size, size_t alignment,
> + enum kfd_mempool pool, kfd_mem_obj *mem_obj)
> +{
> + return kfd2kgd->allocate_mem(kfd->kgd,
> + size,
> + alignment,
> + (enum kgd_memory_pool)pool,
> + (struct kgd_mem **)mem_obj);
> +}
> +
> +void kfd_vidmem_free(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
> +{
> + kfd2kgd->free_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
> +}
> +
> +int kfd_vidmem_gpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj,
> + uint64_t *vmid0_address)
> +{
> + return kfd2kgd->gpumap_mem(kfd->kgd,
> + (struct kgd_mem *)mem_obj,
> + vmid0_address);
As discussed previously, this will not fly; pinning GPU memory is a big NACK.
> +}
> +
> +void kfd_vidmem_ungpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
> +{
> + kfd2kgd->ungpumap_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
> +}
> +
> +int kfd_vidmem_kmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj, void **ptr)
> +{
> + return kfd2kgd->kmap_mem(kfd->kgd, (struct kgd_mem *)mem_obj, ptr);
> +}
> +
> +void kfd_vidmem_unkmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
> +{
> + kfd2kgd->unkmap_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
> +}
> +
> +int kfd_vidmem_alloc_map(struct kfd_dev *kfd, kfd_mem_obj *mem_obj,
> + void **ptr, uint64_t *vmid0_address, size_t size)
> +{
> + int retval;
> +
> + retval = kfd_vidmem_alloc(kfd, size, PAGE_SIZE,
> + KFD_MEMPOOL_SYSTEM_WRITECOMBINE, mem_obj);
> + if (retval != 0)
> + goto fail_vidmem_alloc;
> +
> + retval = kfd_vidmem_kmap(kfd, *mem_obj, ptr);
> + if (retval != 0)
> + goto fail_vidmem_kmap;
> +
> + retval = kfd_vidmem_gpumap(kfd, *mem_obj, vmid0_address);
> + if (retval != 0)
> + goto fail_vidmem_gpumap;
> +
> + return 0;
> +
> +fail_vidmem_gpumap:
> + kfd_vidmem_unkmap(kfd, *mem_obj);
> +fail_vidmem_kmap:
> + kfd_vidmem_free(kfd, *mem_obj);
> +fail_vidmem_alloc:
> + return retval;
> +}
> +
> +void kfd_vidmem_free_unmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
> +{
> + kfd_vidmem_ungpumap(kfd, mem_obj);
> + kfd_vidmem_unkmap(kfd, mem_obj);
> + kfd_vidmem_free(kfd, mem_obj);
> +}
> --
> 1.9.1
>
^ permalink raw reply [flat|nested] 46+ messages in thread* Re: [PATCH v2 11/25] amdkfd: Add basic modules to amdkfd
2014-07-20 23:02 ` Jerome Glisse
@ 2014-08-02 19:25 ` Oded Gabbay
0 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-08-02 19:25 UTC (permalink / raw)
To: Jerome Glisse, linux-kernel, dri-devel
Cc: Andrew Lewycky, Michel Dänzer, Alexey Skidanov,
Andrew Morton
On 21/07/14 02:02, Jerome Glisse wrote:
> On Thu, Jul 17, 2014 at 04:29:18PM +0300, Oded Gabbay wrote:
>> From: Andrew Lewycky <Andrew.Lewycky@amd.com>
>>
>> This patch adds the process module and 4 helper modules:
>>
>> - kfd_process, which handles process which open /dev/kfd
>> - kfd_doorbell, which provides helper functions for doorbell allocation, release and mapping to userspace
>> - kfd_pasid, which provides helper functions for pasid allocation and release
>> - kfd_vidmem, which provides helper functions for allocation and release of memory from the gfx driver
>> - kfd_aperture, which provides helper functions for managing the LDS, Local GPU memory and Scratch memory apertures of the process
>>
>> This patch only contains the basic kfd_process module, which doesn't contain the reference to the queue scheduler. This was done to allow easier code review.
>>
>> Also, this patch doesn't contain the calls to the IOMMU driver for binding the pasid to the device. Again, this was done to allow easier code review
>>
>> The kfd_process object is created when a process opens /dev/kfd and is closed when the mm_struct of that process is teared-down.
>
> So i valid argument were made to have one file per device and because this is not
> a common hsa architecture i am rather reluctant to add the /dev/kfd directory just
> for a temporary solution until people inside the HSA foundation get there act to-
> gether and work on a common API.
>
> So i rather have all kfd temporary solution inside the radeon driver under the
> drm folder. I think we have enough ioctl left to accomodate you.
>
>>
>> Signed-off-by: Andrew Lewycky <Andrew.Lewycky@amd.com>
>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>> ---
>> drivers/gpu/drm/radeon/amdkfd/Makefile | 4 +-
>> drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c | 123 +++++++++
>> drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c | 36 ++-
>> drivers/gpu/drm/radeon/amdkfd/kfd_device.c | 2 +
>> drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c | 264 +++++++++++++++++++
>> drivers/gpu/drm/radeon/amdkfd/kfd_module.c | 22 ++
>> drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c | 97 +++++++
>> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 148 +++++++++++
>> drivers/gpu/drm/radeon/amdkfd/kfd_process.c | 374 +++++++++++++++++++++++++++
>> drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c | 96 +++++++
>> 10 files changed, 1163 insertions(+), 3 deletions(-)
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_process.c
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c
>>
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
>> index 08ecfcd..daf75a8 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/Makefile
>> +++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
>> @@ -4,6 +4,8 @@
>>
>> ccflags-y := -Iinclude/drm
>>
>> -amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o
>> +amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
>> + kfd_pasid.o kfd_doorbell.o kfd_vidmem.o kfd_aperture.o \
>> + kfd_process.o
>>
>> obj-$(CONFIG_HSA_RADEON) += amdkfd.o
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c b/drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c
>> new file mode 100644
>> index 0000000..0468114
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_aperture.c
>> @@ -0,0 +1,123 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#include <linux/device.h>
>> +#include <linux/export.h>
>> +#include <linux/err.h>
>> +#include <linux/fs.h>
>> +#include <linux/sched.h>
>> +#include <linux/slab.h>
>> +#include <linux/uaccess.h>
>> +#include <linux/compat.h>
>> +#include <uapi/linux/kfd_ioctl.h>
>> +#include <linux/time.h>
>> +#include "kfd_priv.h"
>> +#include <linux/mm.h>
>> +#include <uapi/asm-generic/mman-common.h>
>> +#include <asm/processor.h>
>> +
>> +
>> +#define MAKE_GPUVM_APP_BASE(gpu_num) (((uint64_t)(gpu_num) << 61) + 0x1000000000000)
>> +#define MAKE_GPUVM_APP_LIMIT(base) (((uint64_t)(base) & 0xFFFFFF0000000000) | 0xFFFFFFFFFF)
>> +#define MAKE_SCRATCH_APP_BASE(gpu_num) (((uint64_t)(gpu_num) << 61) + 0x100000000)
>> +#define MAKE_SCRATCH_APP_LIMIT(base) (((uint64_t)base & 0xFFFFFFFF00000000) | 0xFFFFFFFF)
>> +#define MAKE_LDS_APP_BASE(gpu_num) (((uint64_t)(gpu_num) << 61) + 0x0)
>> +#define MAKE_LDS_APP_LIMIT(base) (((uint64_t)(base) & 0xFFFFFFFF00000000) | 0xFFFFFFFF)
>> +
>> +#define HSA_32BIT_LDS_APP_SIZE 0x10000
>> +#define HSA_32BIT_LDS_APP_ALIGNMENT 0x10000
>> +
>> +static unsigned long kfd_reserve_aperture(struct kfd_process *process, unsigned long len, unsigned long alignment)
>> +{
>> +
>> + unsigned long addr = 0;
>> + unsigned long start_address;
>> +
>> + /*
>> + * Go bottom up and find the first available aligned address.
>> + * We may narrow space to scan by getting mmap range limits.
>> + */
>> + for (start_address = alignment; start_address < (TASK_SIZE - alignment); start_address += alignment) {
>> + addr = vm_mmap(NULL, start_address, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, 0);
>
> So this forcing aperture into address space process is not really
> welcome. Userspace have no idea this will happen and valid existing
> program may already staticly allocate those address through mmap
> either after or before they might trigger this code.
>
> As i said in the general answer, i think best here is to use the
> kernel reserved area to map this. You can work around the gate
> page if gate page matter to you.
We talked about it in another thread, but to sum it up, I removed the
support for the LDS aperture in 32-bit mode (which is the mode that uses
the above function).
>
> This of course beg the question what happen if gpu try to access
> inside the kernel region ? Does the iommu respect the system flag
> of the page table ? Or does it just happily allow the gpu to access
> the whole kernel area ?
>
> I guess i should go dive into the iommuv2 datasheet to find out.
>
>> + if (!IS_ERR_VALUE(addr)) {
>> + if (addr == start_address)
>> + return addr;
>> + vm_munmap(addr, len);
>> + }
>> + }
>> + return 0;
>> +
>> +}
>> +
>> +int kfd_init_apertures(struct kfd_process *process)
>> +{
>> + uint8_t id = 0;
>> + struct kfd_dev *dev;
>> + struct kfd_process_device *pdd;
>> +
>> + mutex_lock(&process->mutex);
>> +
>> + /*Iterating over all devices*/
>> + while ((dev = kfd_topology_enum_kfd_devices(id)) != NULL && id < NUM_OF_SUPPORTED_GPUS) {
>> +
>> + pdd = kfd_get_process_device_data(dev, process);
>> +
>> + /*for 64 bit process aperture will be statically reserved in the non canonical process address space
>
> What does non canonical process address space means ? This is the x86-64 terminology
> or something else ?
This is the x86_64 terminology. In v3 I will add a detailed explanation of
this subject.
>
>> + *for 32 bit process the aperture will be reserved in the process address space
>> + */
>> + if (process->is_32bit_user_mode) {
>> + /*try to reserve aperture. continue on failure, just put the aperture size to be 0*/
>> + pdd->lds_base = kfd_reserve_aperture(
>> + process,
>> + HSA_32BIT_LDS_APP_SIZE,
>> + HSA_32BIT_LDS_APP_ALIGNMENT);
>> +
>> + if (pdd->lds_base)
>> + pdd->lds_limit = pdd->lds_base + HSA_32BIT_LDS_APP_SIZE - 1;
>> + else
>> + pdd->lds_limit = 0;
>> +
>> + /*GPUVM and Scratch apertures are not supported*/
>> + pdd->gpuvm_base = pdd->gpuvm_limit = pdd->scratch_base = pdd->scratch_limit = 0;
>> + } else {
>> + /*node id couldn't be 0 - the three MSB bits of aperture shoudn't be 0*/
>> + pdd->lds_base = MAKE_LDS_APP_BASE(id + 1);
>> + pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
>> + pdd->gpuvm_base = MAKE_GPUVM_APP_BASE(id + 1);
>> + pdd->gpuvm_limit = MAKE_GPUVM_APP_LIMIT(pdd->gpuvm_base);
>> + pdd->scratch_base = MAKE_SCRATCH_APP_BASE(id + 1);
>> + pdd->scratch_limit = MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
>> + }
>> +
>> + dev_dbg(kfd_device, "node id %u, gpu id %u, lds_base %llX lds_limit %llX gpuvm_base %llX gpuvm_limit %llX scratch_base %llX scratch_limit %llX",
>> + id, pdd->dev->id, pdd->lds_base, pdd->lds_limit, pdd->gpuvm_base, pdd->gpuvm_limit, pdd->scratch_base, pdd->scratch_limit);
>
> Break this debug output into several debug message. Not all of us have 30"
> monitor.
Done in v3
>
>> +
>> + id++;
>> + }
>> +
>> + mutex_unlock(&process->mutex);
>> +
>> + return 0;
>> +}
>> +
>> +
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
>> index b98bcb7..d6580a6 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
>> @@ -38,6 +38,7 @@
>>
>> static long kfd_ioctl(struct file *, unsigned int, unsigned long);
>> static int kfd_open(struct inode *, struct file *);
>> +static int kfd_mmap(struct file *, struct vm_area_struct *);
>>
>> static const char kfd_dev_name[] = "kfd";
>>
>> @@ -46,6 +47,7 @@ static const struct file_operations kfd_fops = {
>> .unlocked_ioctl = kfd_ioctl,
>> .compat_ioctl = kfd_ioctl,
>> .open = kfd_open,
>> + .mmap = kfd_mmap,
>> };
>>
>> static int kfd_char_dev_major = -1;
>> @@ -96,9 +98,22 @@ struct device *kfd_chardev(void)
>>
>> static int kfd_open(struct inode *inode, struct file *filep)
>> {
>> + struct kfd_process *process;
>> +
>> if (iminor(inode) != 0)
>> return -ENODEV;
>>
>> + process = kfd_create_process(current);
>> + if (IS_ERR(process))
>> + return PTR_ERR(process);
>> +
>> + process->is_32bit_user_mode = is_compat_task();
>> +
>> + dev_dbg(kfd_device, "process %d opened, compat mode (32 bit) - %d\n",
>> + process->pasid, process->is_32bit_user_mode);
>> +
>> + kfd_init_apertures(process);
>> +
>> return 0;
>> }
>>
>> @@ -152,8 +167,9 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>> "ioctl cmd 0x%x (#%d), arg 0x%lx\n",
>> cmd, _IOC_NR(cmd), arg);
>>
>> - /* TODO: add function that retrieves process */
>> - process = NULL;
>> + process = kfd_get_process(current);
>> + if (IS_ERR(process))
>> + return PTR_ERR(process);
>>
>> switch (cmd) {
>> case KFD_IOC_CREATE_QUEUE:
>> @@ -201,3 +217,19 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>>
>> return err;
>> }
>> +
>> +static int
>> +kfd_mmap(struct file *filp, struct vm_area_struct *vma)
>> +{
>> + unsigned long pgoff = vma->vm_pgoff;
>> + struct kfd_process *process;
>> +
>> + process = kfd_get_process(current);
>> + if (IS_ERR(process))
>> + return PTR_ERR(process);
>> +
>> + if (pgoff >= KFD_MMAP_DOORBELL_START && pgoff < KFD_MMAP_DOORBELL_END)
>> + return kfd_doorbell_mmap(process, vma);
>> +
>> + return -EINVAL;
>> +}
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>> index 4138694..f6a7cf7 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>> @@ -100,6 +100,8 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>> {
>> kfd->shared_resources = *gpu_resources;
>>
>> + kfd_doorbell_init(kfd);
>> +
>> if (kfd_topology_add_device(kfd) != 0)
>> return false;
>>
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c
>> new file mode 100644
>> index 0000000..972eaea
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_doorbell.c
>> @@ -0,0 +1,264 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#include "kfd_priv.h"
>> +#include <linux/mm.h>
>> +#include <linux/mman.h>
>> +#include <linux/slab.h>
>> +
>> +/*
>> + * This extension supports a kernel level doorbells management for
>> + * the kernel queues.
>> + * Basically the last doorbells page is devoted to kernel queues
>> + * and that's assures that any user process won't get access to the
>> + * kernel doorbells page
>> + */
>> +static DEFINE_MUTEX(doorbell_mutex);
>> +static unsigned long doorbell_available_index[DIV_ROUND_UP(MAX_PROCESS_QUEUES, BITS_PER_LONG)] = { 0 };
>> +#define KERNEL_DOORBELL_PASID 1
>> +
>> +/*
>> + * Each device exposes a doorbell aperture, a PCI MMIO aperture that
>> + * receives 32-bit writes that are passed to queues as wptr values.
>> + * The doorbells are intended to be written by applications as part
>> + * of queueing work on user-mode queues.
>> + * We assign doorbells to applications in PAGE_SIZE-sized and aligned chunks.
>> + * We map the doorbell address space into user-mode when a process creates
>> + * its first queue on each device.
>> + * Although the mapping is done by KFD, it is equivalent to an mmap of
>> + * the /dev/kfd with the particular device encoded in the mmap offset.
>> + * There will be other uses for mmap of /dev/kfd, so only a range of
>> + * offsets (KFD_MMAP_DOORBELL_START-END) is used for doorbells.
>> + */
>
> Mapping should not be done by the driver instead you should provide the
> offset to userspace and have userspace call mmap with proper argument.
> I do not think having device driver doing mmap in the back of an ioctl
> would be a welcome idea.
>
Done in v3
>> +
>> +/* # of doorbell bytes allocated for each process. */
>> +static inline size_t doorbell_process_allocation(void)
>> +{
>> + return roundup(sizeof(doorbell_t) * MAX_PROCESS_QUEUES, PAGE_SIZE);
>> +}
>
> This whole doorbell situation needs some cleanup. Instead of passing
> everything as bytes and byte offsets, you should rather pass everything as
> pfn and pgoffset so it is clear that a doorbell is on page granularity and
> you will not have to clutter all kinds of align and round up across the
> code. Just cleaner and safer.
>
Done in v3
>> +
>> +/* Doorbell calculations for device init. */
>> +void kfd_doorbell_init(struct kfd_dev *kfd)
>> +{
>> + size_t doorbell_start_offset;
>> + size_t doorbell_aperture_size;
>> + size_t doorbell_process_limit;
>> +
>> + /*
>> + * We start with calculations in bytes because the input data might
>> + * only be byte-aligned.
>> + * Only after we have done the rounding can we assume any alignment.
>> + */
>> +
>> + doorbell_start_offset = roundup(kfd->shared_resources.doorbell_start_offset,
>> + doorbell_process_allocation());
>> + doorbell_aperture_size = rounddown(kfd->shared_resources.doorbell_aperture_size,
>> + doorbell_process_allocation());
>> +
>> + if (doorbell_aperture_size > doorbell_start_offset)
>> + doorbell_process_limit =
>> + (doorbell_aperture_size - doorbell_start_offset) / doorbell_process_allocation();
>> + else
>> + doorbell_process_limit = 0;
>> +
>> + kfd->doorbell_base = kfd->shared_resources.doorbell_physical_address + doorbell_start_offset;
>> + kfd->doorbell_id_offset = doorbell_start_offset / sizeof(doorbell_t);
>> + kfd->doorbell_process_limit = doorbell_process_limit - 1;
>> +
>> + kfd->doorbell_kernel_ptr = ioremap(kfd->doorbell_base, doorbell_process_allocation());
>> + BUG_ON(!kfd->doorbell_kernel_ptr);
>> +
>> + pr_debug("kfd: doorbell initialization\n"
>> + " doorbell base == 0x%08lX\n"
>> + " doorbell_id_offset == 0x%08lu\n"
>> + " doorbell_process_limit == 0x%08lu\n"
>> + " doorbell_kernel_offset == 0x%08lX\n"
>> + " doorbell aperture size == 0x%08lX\n"
>> + " doorbell kernel address == 0x%08lX\n",
>> + (uintptr_t)kfd->doorbell_base,
>> + kfd->doorbell_id_offset,
>> + doorbell_process_limit,
>> + (uintptr_t)kfd->doorbell_base,
>> + kfd->shared_resources.doorbell_aperture_size,
>> + (uintptr_t)kfd->doorbell_kernel_ptr);
>
> Kind of ugly, will break some of the kernel log manager, you need to do one
> pr_debug call per line.
>
Done in v3
>> +
>> +}
>> +
>> +/*
>> + * This is the /dev/kfd mmap (for doorbell) implementation.
>> + * We intend that this is only called through map_doorbells, not through
>> + * user-mode mmap of /dev/kfd
>> + */
>> +int kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma)
>> +{
>> + unsigned int device_index;
>> + struct kfd_dev *dev;
>> + phys_addr_t start;
>> +
>> + BUG_ON(vma->vm_pgoff < KFD_MMAP_DOORBELL_START || vma->vm_pgoff >= KFD_MMAP_DOORBELL_END);
>> +
>> + /* For simplicitly we only allow mapping of the entire doorbell allocation of a single device & process. */
>> + if (vma->vm_end - vma->vm_start != doorbell_process_allocation())
>> + return -EINVAL;
>> +
>> + /* device_index must be GPU ID!! */
>> + device_index = vma->vm_pgoff - KFD_MMAP_DOORBELL_START;
>> +
>> + dev = kfd_device_by_id(device_index);
>> + if (dev == NULL)
>> + return -EINVAL;
>> +
>> + vma->vm_flags |= VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE | VM_DONTDUMP | VM_PFNMAP;
>> + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>> +
>> + start = dev->doorbell_base + process->pasid * doorbell_process_allocation();
>> +
>> + pr_debug("kfd: mapping doorbell page in kfd_doorbell_mmap\n"
>> + " target user address == 0x%016llX\n"
>> + " physical address == 0x%016llX\n"
>> + " vm_flags == 0x%08lX\n"
>> + " size == 0x%08lX\n",
>> + (long long unsigned int) vma->vm_start, start, vma->vm_flags,
>> + doorbell_process_allocation());
>> +
>> + return io_remap_pfn_range(vma,
>> + vma->vm_start,
>> + start >> PAGE_SHIFT,
>> + doorbell_process_allocation(),
>> + vma->vm_page_prot);
>> +}
>> +
>> +/*
>> + * Map the doorbells for a single process & device.
>> + * This will indirectly call kfd_doorbell_mmap.
>> + * This assumes that the process mutex is being held.
>> + */
>> +static int map_doorbells(struct file *devkfd, struct kfd_process *process,
>> + struct kfd_dev *dev)
>> +{
>> + struct kfd_process_device *pdd = kfd_get_process_device_data(dev, process);
>> +
>> + if (pdd == NULL)
>> + return -ENOMEM;
>> +
>> + if (pdd->doorbell_mapping == NULL) {
>> + unsigned long offset = (KFD_MMAP_DOORBELL_START + dev->id) << PAGE_SHIFT;
>> + doorbell_t __user *doorbell_mapping;
>> +
>> + doorbell_mapping = (doorbell_t __user *)vm_mmap(devkfd, 0, doorbell_process_allocation(), PROT_WRITE,
>> + MAP_SHARED, offset);
>
> Like said above have the userspace do that. Do not do it inside
> the kernel.
>
Done in v3
>> + if (IS_ERR(doorbell_mapping))
>> + return PTR_ERR(doorbell_mapping);
>> +
>> + pdd->doorbell_mapping = doorbell_mapping;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/* get kernel iomem pointer for a doorbell */
>> +u32 __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd, unsigned int *doorbell_off)
>> +{
>> + u32 inx;
>> +
>> + BUG_ON(!kfd || !doorbell_off);
>> +
>> + mutex_lock(&doorbell_mutex);
>> + inx = find_first_zero_bit(doorbell_available_index, MAX_PROCESS_QUEUES);
>> + __set_bit(inx, doorbell_available_index);
>> + mutex_unlock(&doorbell_mutex);
>> +
>> + if (inx >= MAX_PROCESS_QUEUES)
>> + return NULL;
>> +
>> + /* caluculating the kernel doorbell offset using "faked" kernel pasid that allocated for kernel queues only */
>> + *doorbell_off = KERNEL_DOORBELL_PASID * (doorbell_process_allocation()/sizeof(doorbell_t)) + inx;
>> +
>> + pr_debug("kfd: get kernel queue doorbell\n"
>> + " doorbell offset == 0x%08d\n"
>> + " kernel address == 0x%08lX\n",
>> + *doorbell_off, (uintptr_t)(kfd->doorbell_kernel_ptr + inx));
>> +
>> + return kfd->doorbell_kernel_ptr + inx;
>> +}
>> +
>> +void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr)
>> +{
>> + unsigned int inx;
>> +
>> + BUG_ON(!kfd || !db_addr);
>> +
>> + inx = (unsigned int)(db_addr - kfd->doorbell_kernel_ptr);
>> +
>> + mutex_lock(&doorbell_mutex);
>> + __clear_bit(inx, doorbell_available_index);
>> + mutex_unlock(&doorbell_mutex);
>> +}
>> +
>> +inline void write_kernel_doorbell(u32 __iomem *db, u32 value)
>> +{
>> + if (db) {
>> + writel(value, db);
>> + pr_debug("writing %d to doorbell address 0x%p\n", value, db);
>> + }
>> +}
>> +
>> +/*
>> + * Get the user-mode address of a doorbell.
>> + * Assumes that the process mutex is being held.
>> + */
>> +doorbell_t __user *kfd_get_doorbell(struct file *devkfd,
>> + struct kfd_process *process,
>> + struct kfd_dev *dev,
>> + unsigned int doorbell_index)
>> +{
>> + struct kfd_process_device *pdd;
>> + int err;
>> +
>> + BUG_ON(doorbell_index > MAX_DOORBELL_INDEX);
>> +
>> + err = map_doorbells(devkfd, process, dev);
>> + if (err)
>> + return ERR_PTR(err);
>> +
>> + pdd = kfd_get_process_device_data(dev, process);
>> + BUG_ON(pdd == NULL); /* map_doorbells would have failed otherwise */
>> +
>> + pr_debug("doorbell value on creation 0x%x\n", pdd->doorbell_mapping[doorbell_index]);
>> +
>> + return &pdd->doorbell_mapping[doorbell_index];
>> +}
>> +
>> +/*
>> + * queue_ids are in the range [0,MAX_PROCESS_QUEUES) and are mapped 1:1
>> + * to doorbells with the process's doorbell page
>> + */
>> +unsigned int kfd_queue_id_to_doorbell(struct kfd_dev *kfd, struct kfd_process *process, unsigned int queue_id)
>> +{
>> + /*
>> + * doorbell_id_offset accounts for doorbells taken by KGD.
>> + * pasid * doorbell_process_allocation/sizeof(doorbell_t) adjusts
>> + * to the process's doorbells
>> + */
>> + return kfd->doorbell_id_offset + process->pasid * (doorbell_process_allocation()/sizeof(doorbell_t)) + queue_id;
>> +}
>> +
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>> index c51f981..dc08f51 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>> @@ -65,14 +65,30 @@ void kgd2kfd_exit(void)
>> {
>> }
>>
>> +extern int kfd_process_exit(struct notifier_block *nb,
>> + unsigned long action, void *data);
>> +
>> +static struct notifier_block kfd_mmput_nb = {
>> + .notifier_call = kfd_process_exit,
>> + .priority = 3,
>> +};
>> +
>> static int __init kfd_module_init(void)
>> {
>> int err;
>>
>> + err = kfd_pasid_init();
>> + if (err < 0)
>> + goto err_pasid;
>> +
>> err = kfd_chardev_init();
>> if (err < 0)
>> goto err_ioctl;
>>
>> + err = mmput_register_notifier(&kfd_mmput_nb);
>> + if (err)
>> + goto err_mmu_notifier;
>> +
>> err = kfd_topology_init();
>> if (err < 0)
>> goto err_topology;
>> @@ -82,15 +98,21 @@ static int __init kfd_module_init(void)
>> return 0;
>>
>> err_topology:
>> + mmput_unregister_notifier(&kfd_mmput_nb);
>> +err_mmu_notifier:
>> kfd_chardev_exit();
>> err_ioctl:
>> + kfd_pasid_exit();
>> +err_pasid:
>> return err;
>> }
>>
>> static void __exit kfd_module_exit(void)
>> {
>> kfd_topology_shutdown();
>> + mmput_unregister_notifier(&kfd_mmput_nb);
>> kfd_chardev_exit();
>> + kfd_pasid_exit();
>> dev_info(kfd_device, "Removed module\n");
>> }
>>
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c b/drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c
>> new file mode 100644
>> index 0000000..0b594e4
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_pasid.c
>> @@ -0,0 +1,97 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#include <linux/slab.h>
>> +#include <linux/types.h>
>> +#include "kfd_priv.h"
>> +
>> +#define INITIAL_PASID_LIMIT (1<<20)
>> +
>> +static unsigned long *pasid_bitmap;
>> +static pasid_t pasid_limit;
>> +static DEFINE_MUTEX(pasid_mutex);
>> +
>> +int kfd_pasid_init(void)
>> +{
>> + pasid_limit = INITIAL_PASID_LIMIT;
>> +
>> + pasid_bitmap = kzalloc(DIV_ROUND_UP(INITIAL_PASID_LIMIT, BITS_PER_BYTE), GFP_KERNEL);
>> + if (!pasid_bitmap)
>> + return -ENOMEM;
>> +
>> + set_bit(0, pasid_bitmap); /* PASID 0 is reserved. */
>> +
>> + return 0;
>> +}
>> +
>> +void kfd_pasid_exit(void)
>> +{
>> + kfree(pasid_bitmap);
>> +}
>> +
>> +bool kfd_set_pasid_limit(pasid_t new_limit)
>> +{
>> + if (new_limit < pasid_limit) {
>> + bool ok;
>> +
>> + mutex_lock(&pasid_mutex);
>> +
>> + /* ensure that no pasids >= new_limit are in-use */
>> + ok = (find_next_bit(pasid_bitmap, pasid_limit, new_limit) == pasid_limit);
>> + if (ok)
>> + pasid_limit = new_limit;
>> +
>> + mutex_unlock(&pasid_mutex);
>> +
>> + return ok;
>> + }
>> +
>> + return true;
>> +}
>> +
>> +inline pasid_t kfd_get_pasid_limit(void)
>> +{
>> + return pasid_limit;
>> +}
>> +
>> +pasid_t kfd_pasid_alloc(void)
>> +{
>> + pasid_t found;
>> +
>> + mutex_lock(&pasid_mutex);
>> +
>> + found = find_first_zero_bit(pasid_bitmap, pasid_limit);
>> + if (found == pasid_limit)
>> + found = 0;
>> + else
>> + set_bit(found, pasid_bitmap);
>> +
>> + mutex_unlock(&pasid_mutex);
>> +
>> + return found;
>> +}
>> +
>> +void kfd_pasid_free(pasid_t pasid)
>> +{
>> + BUG_ON(pasid == 0 || pasid >= pasid_limit);
>> + clear_bit(pasid, pasid_bitmap);
>> +}
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> index b391e24..af5a5e4 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> @@ -32,14 +32,39 @@
>> #include <linux/spinlock.h>
>> #include "../radeon_kfd.h"
>>
>> +/*
>> + * Per-process limit. Each process can only
>> + * create MAX_PROCESS_QUEUES across all devices
>> + */
>> +#define MAX_PROCESS_QUEUES 1024
>> +
>> +#define MAX_DOORBELL_INDEX MAX_PROCESS_QUEUES
>> #define KFD_SYSFS_FILE_MODE 0444
>>
>> +/*
>> + * We multiplex different sorts of mmap-able memory onto /dev/kfd.
>> + * We figure out what type of memory the caller wanted by comparing
>> + * the mmap page offset to known ranges.
>> + */
>> +#define KFD_MMAP_DOORBELL_START (((1ULL << 32)*1) >> PAGE_SHIFT)
>> +#define KFD_MMAP_DOORBELL_END (((1ULL << 32)*2) >> PAGE_SHIFT)
>> +
>> /* GPU ID hash width in bits */
>> #define KFD_GPU_ID_HASH_WIDTH 16
>>
>> /* Macro for allocating structures */
>> #define kfd_alloc_struct(ptr_to_struct) ((typeof(ptr_to_struct)) kzalloc(sizeof(*ptr_to_struct), GFP_KERNEL))
>>
>> +/*
>> + * Large enough to hold the maximum usable pasid + 1.
>> + * It must also be able to store the number of doorbells
>> + * reported by a KFD device.
>> + */
>> +typedef unsigned int pasid_t;
>> +
>> +/* Type that represents a HW doorbell slot. */
>> +typedef u32 doorbell_t;
>> +
>> struct kfd_device_info {
>> const struct kfd_scheduler_class *scheduler_class;
>> unsigned int max_pasid_bits;
>> @@ -56,6 +81,17 @@ struct kfd_dev {
>>
>> unsigned int id; /* topology stub index */
>>
>> + phys_addr_t doorbell_base; /* Start of actual doorbells used by
>> + * KFD. It is aligned for mapping
>> + * into user mode
>> + */
>> + size_t doorbell_id_offset; /* Doorbell offset (from KFD doorbell
>> + * to HW doorbell, GFX reserved some
>> + * at the start)
>> + */
>> + size_t doorbell_process_limit; /* Number of processes we have doorbell space for. */
>> + u32 __iomem *doorbell_kernel_ptr; /* this is a pointer for a doorbells page used by kernel queue */
>> +
>> struct kgd2kfd_shared_resources shared_resources;
>> };
>>
>> @@ -68,15 +104,124 @@ void kgd2kfd_device_exit(struct kfd_dev *kfd);
>>
>> extern const struct kfd2kgd_calls *kfd2kgd;
>>
>> +/* Dummy struct just to make kfd_mem_obj* a unique pointer type. */
>> +struct kfd_mem_obj_s;
>> +typedef struct kfd_mem_obj_s *kfd_mem_obj;
>
> IIRC the rule is no more typedefs in the kernel. Or maybe I just dreamt
> that rule.
>
Removed all typedefs in v3
>> +
>> +enum kfd_mempool {
>> + KFD_MEMPOOL_SYSTEM_CACHEABLE = 1,
>> + KFD_MEMPOOL_SYSTEM_WRITECOMBINE = 2,
>> + KFD_MEMPOOL_FRAMEBUFFER = 3,
>> +};
>> +
>> +
>> +int kfd_vidmem_alloc(struct kfd_dev *kfd, size_t size, size_t alignment,
>> + enum kfd_mempool pool, kfd_mem_obj *mem_obj);
>> +void kfd_vidmem_free(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
>> +int kfd_vidmem_gpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj, uint64_t *vmid0_address);
>> +void kfd_vidmem_ungpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
>> +int kfd_vidmem_kmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj, void **ptr);
>> +void kfd_vidmem_unkmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
>> +int kfd_vidmem_alloc_map(struct kfd_dev *kfd, kfd_mem_obj *mem_obj, void **ptr,
>> + uint64_t *vmid0_address, size_t size);
>> +void kfd_vidmem_free_unmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
>> /* Character device interface */
>> int kfd_chardev_init(void);
>> void kfd_chardev_exit(void);
>> struct device *kfd_chardev(void);
>>
>> +
>> +/* Data that is per-process-per device. */
>> +struct kfd_process_device {
>> + /*
>> + * List of all per-device data for a process.
>> + * Starts from kfd_process.per_device_data.
>> + */
>> + struct list_head per_device_list;
>> +
>> + /* The device that owns this data. */
>> + struct kfd_dev *dev;
>> +
>> + /* The user-mode address of the doorbell mapping for this device. */
>> + doorbell_t __user *doorbell_mapping;
>> +
>> + /* Is this process/pasid bound to this device? (amd_iommu_bind_pasid) */
>> + bool bound;
>
> Best to put the boolean at the end of the structure ...
>
Done in v3
>> +
>> + /*Apertures*/
>> + uint64_t lds_base;
>> + uint64_t lds_limit;
>> + uint64_t gpuvm_base;
>> + uint64_t gpuvm_limit;
>> + uint64_t scratch_base;
>> + uint64_t scratch_limit;
>> +};
>> +
>> /* Process data */
>> struct kfd_process {
>> + struct list_head processes_list;
>> +
>> + struct mm_struct *mm;
>> +
>> + struct mutex mutex;
>> +
>> + /*
>> + * In any process, the thread that started main() is the lead
>> + * thread and outlives the rest.
>> + * It is here because amd_iommu_bind_pasid wants a task_struct.
>> + */
>> + struct task_struct *lead_thread;
>> +
>> + pasid_t pasid;
>> +
>> + /*
>> + * List of kfd_process_device structures,
>> + * one for each device the process is using.
>> + */
>> + struct list_head per_device_data;
>> +
>> + /* The process's queues. */
>> + size_t queue_array_size;
>> +
>> + /* Size is queue_array_size, up to MAX_PROCESS_QUEUES. */
>> + struct kfd_queue **queues;
>> +
>> + unsigned long allocated_queue_bitmap[DIV_ROUND_UP(MAX_PROCESS_QUEUES, BITS_PER_LONG)];
>> +
>> + /*Is the user space process 32 bit?*/
>> + bool is_32bit_user_mode;
>> };
>>
>> +struct kfd_process *kfd_create_process(const struct task_struct *);
>> +struct kfd_process *kfd_get_process(const struct task_struct *);
>> +
>> +struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
>> + struct kfd_process *p);
>> +
>> +/* PASIDs */
>> +int kfd_pasid_init(void);
>> +void kfd_pasid_exit(void);
>> +bool kfd_set_pasid_limit(pasid_t new_limit);
>> +pasid_t kfd_get_pasid_limit(void);
>> +pasid_t kfd_pasid_alloc(void);
>> +void kfd_pasid_free(pasid_t pasid);
>> +
>> +/* Doorbells */
>> +void kfd_doorbell_init(struct kfd_dev *kfd);
>> +int kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma);
>> +doorbell_t __user *kfd_get_doorbell(struct file *devkfd,
>> + struct kfd_process *process,
>> + struct kfd_dev *dev,
>> + unsigned int doorbell_index);
>> +u32 __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
>> + unsigned int *doorbell_off);
>> +void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr);
>> +u32 read_kernel_doorbell(u32 __iomem *db);
>> +void write_kernel_doorbell(u32 __iomem *db, u32 value);
>> +unsigned int kfd_queue_id_to_doorbell(struct kfd_dev *kfd,
>> + struct kfd_process *process,
>> + unsigned int queue_id);
>> +
>> extern struct device *kfd_device;
>>
>> /* Topology */
>> @@ -95,4 +240,7 @@ void kgd2kfd_interrupt(struct kfd_dev *dev, const void *ih_ring_entry);
>> void kgd2kfd_suspend(struct kfd_dev *dev);
>> int kgd2kfd_resume(struct kfd_dev *dev);
>>
>> +/* amdkfd Apertures */
>> +int kfd_init_apertures(struct kfd_process *process);
>> +
>> #endif
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_process.c b/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
>> new file mode 100644
>> index 0000000..5efbce0
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
>> @@ -0,0 +1,374 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#include <linux/mutex.h>
>> +#include <linux/log2.h>
>> +#include <linux/sched.h>
>> +#include <linux/slab.h>
>> +#include <linux/notifier.h>
>> +struct mm_struct;
>> +
>> +#include "kfd_priv.h"
>> +
>> +/*
>> + * Initial size for the array of queues.
>> + * The allocated size is doubled each time
>> + * it is exceeded up to MAX_PROCESS_QUEUES.
>> + */
>> +#define INITIAL_QUEUE_ARRAY_SIZE 16
>> +
>> +/* List of struct kfd_process */
>> +static struct list_head kfd_processes_list = LIST_HEAD_INIT(kfd_processes_list);
>> +
>> +static DEFINE_MUTEX(kfd_processes_mutex);
>> +
>> +static struct kfd_process *create_process(const struct task_struct *thread);
>> +
>> +struct kfd_process *kfd_create_process(const struct task_struct *thread)
>> +{
>> + struct kfd_process *process;
>> +
>> + if (thread->mm == NULL)
>> + return ERR_PTR(-EINVAL);
>> +
>> + /* Only the pthreads threading model is supported. */
>> + if (thread->group_leader->mm != thread->mm)
>> + return ERR_PTR(-EINVAL);
>> +
>> + /*
>> + * take kfd processes mutex before starting of process creation
>> + * so there won't be a case where two threads of the same process
>> + * create two kfd_process structures
>> + */
>> + mutex_lock(&kfd_processes_mutex);
>> +
>> + /* A prior open of /dev/kfd could have already created the process. */
>> + process = thread->mm->kfd_process;
>> + if (process)
>> + pr_debug("kfd: process already found\n");
>> +
>> + if (!process)
>> + process = create_process(thread);
>> +
>> + mutex_unlock(&kfd_processes_mutex);
>> +
>> + return process;
>> +}
>> +
>> +struct kfd_process *kfd_get_process(const struct task_struct *thread)
>> +{
>> + struct kfd_process *process;
>> +
>> + if (thread->mm == NULL)
>> + return ERR_PTR(-EINVAL);
>> +
>> + /* Only the pthreads threading model is supported. */
>> + if (thread->group_leader->mm != thread->mm)
>> + return ERR_PTR(-EINVAL);
>> +
>> + process = thread->mm->kfd_process;
>> +
>> + return process;
>> +}
>> +
>> +static void free_process(struct kfd_process *p)
>> +{
>> + struct kfd_process_device *pdd, *temp;
>> +
>> + BUG_ON(p == NULL);
>> +
>> + list_for_each_entry_safe(pdd, temp, &p->per_device_data, per_device_list) {
>> + list_del(&pdd->per_device_list);
>> + kfree(pdd);
>> + }
>> +
>> + kfd_pasid_free(p->pasid);
>> +
>> + mutex_destroy(&p->mutex);
>> +
>> + kfree(p->queues);
>> +
>> + list_del(&p->processes_list);
>> +
>> + kfree(p);
>> +}
>> +
>> +int kfd_process_exit(struct notifier_block *nb,
>> + unsigned long action, void *data)
>> +{
>> + struct mm_struct *mm = data;
>> + struct kfd_process *p;
>> +
>> + mutex_lock(&kfd_processes_mutex);
>> +
>> + p = mm->kfd_process;
>> + if (p) {
>> + free_process(p);
>> + mm->kfd_process = NULL;
>> + }
>> +
>> + mutex_unlock(&kfd_processes_mutex);
>> +
>> + return 0;
>> +}
>> +
>> +static struct kfd_process *create_process(const struct task_struct *thread)
>> +{
>> + struct kfd_process *process;
>> + int err = -ENOMEM;
>> +
>> + process = kzalloc(sizeof(*process), GFP_KERNEL);
>> +
>> + if (!process)
>> + goto err_alloc_process;
>> +
>> + process->queues = kmalloc_array(INITIAL_QUEUE_ARRAY_SIZE, sizeof(process->queues[0]), GFP_KERNEL);
>> + if (!process->queues)
>> + goto err_alloc_queues;
>> +
>> + process->pasid = kfd_pasid_alloc();
>> + if (process->pasid == 0)
>> + goto err_alloc_pasid;
>> +
>> + mutex_init(&process->mutex);
>> +
>> + process->mm = thread->mm;
>> + thread->mm->kfd_process = process;
>> + list_add_tail(&process->processes_list, &kfd_processes_list);
>> +
>> + process->lead_thread = thread->group_leader;
>> +
>> + process->queue_array_size = INITIAL_QUEUE_ARRAY_SIZE;
>> +
>> + INIT_LIST_HEAD(&process->per_device_data);
>> +
>> + return process;
>> +
>> +err_alloc_pasid:
>> + kfree(process->queues);
>> +err_alloc_queues:
>> + kfree(process);
>> +err_alloc_process:
>> + return ERR_PTR(err);
>> +}
>> +
>> +struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
>> + struct kfd_process *p)
>> +{
>> + struct kfd_process_device *pdd;
>> +
>> + list_for_each_entry(pdd, &p->per_device_data, per_device_list)
>> + if (pdd->dev == dev)
>> + return pdd;
>> +
>> + pdd = kzalloc(sizeof(*pdd), GFP_KERNEL);
>> + if (pdd != NULL) {
>> + pdd->dev = dev;
>> + list_add(&pdd->per_device_list, &p->per_device_data);
>> + }
>> +
>> + return pdd;
>> +}
>> +
>> +/*
>> + * Direct the IOMMU to bind the process (specifically the pasid->mm) to the device.
>> + * Unbinding occurs when the process dies or the device is removed.
>> + *
>> + * Assumes that the process lock is held.
>> + */
>> +struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
>> + struct kfd_process *p)
>> +{
>> + struct kfd_process_device *pdd = kfd_get_process_device_data(dev, p);
>> +
>> + if (pdd == NULL)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + if (pdd->bound)
>> + return pdd;
>> +
>> + pdd->bound = true;
>> +
>> + return pdd;
>> +}
>> +
>> +void kfd_unbind_process_from_device(struct kfd_dev *dev, pasid_t pasid)
>> +{
>> + struct kfd_process *p;
>> + struct kfd_process_device *pdd;
>> +
>> + BUG_ON(dev == NULL);
>> +
>> + mutex_lock(&kfd_processes_mutex);
>> +
>> + list_for_each_entry(p, &kfd_processes_list, processes_list)
>> + if (p->pasid == pasid)
>> + break;
>> +
>> + mutex_unlock(&kfd_processes_mutex);
>> +
>> + BUG_ON(p->pasid != pasid);
>> +
>> + pdd = kfd_get_process_device_data(dev, p);
>> +
>> + BUG_ON(pdd == NULL);
>> +
>> + mutex_lock(&p->mutex);
>> +
>> + /*
>> + * Just mark pdd as unbound, because we still need it to call
>> + * amd_iommu_unbind_pasid() in when the process exits.
>> + * We don't call amd_iommu_unbind_pasid() here
>> + * because the IOMMU called us.
>> + */
>> + pdd->bound = false;
>> +
>> + mutex_unlock(&p->mutex);
>> +}
>> +
>> +/*
>> + * Ensure that the process's queue array is large enough to hold
>> + * the queue at queue_id.
>> + * Assumes that the process lock is held.
>> + */
>> +static bool ensure_queue_array_size(struct kfd_process *p, unsigned int queue_id)
>> +{
>> + size_t desired_size;
>> + struct kfd_queue **new_queues;
>> +
>> + compiletime_assert(INITIAL_QUEUE_ARRAY_SIZE > 0, "INITIAL_QUEUE_ARRAY_SIZE must not be 0");
>> + compiletime_assert(INITIAL_QUEUE_ARRAY_SIZE <= MAX_PROCESS_QUEUES,
>> + "INITIAL_QUEUE_ARRAY_SIZE must be less than MAX_PROCESS_QUEUES");
>> + /* Ensure that doubling the current size won't ever overflow. */
>> + compiletime_assert(MAX_PROCESS_QUEUES < SIZE_MAX / 2, "MAX_PROCESS_QUEUES must be less than SIZE_MAX/2");
>> +
>> + /*
>> + * These & queue_id < MAX_PROCESS_QUEUES guarantee that
>> + * the desired_size calculation will end up <= MAX_PROCESS_QUEUES
>> + */
>> + compiletime_assert(is_power_of_2(INITIAL_QUEUE_ARRAY_SIZE), "INITIAL_QUEUE_ARRAY_SIZE must be power of 2.");
>> + compiletime_assert(MAX_PROCESS_QUEUES % INITIAL_QUEUE_ARRAY_SIZE == 0,
>> + "MAX_PROCESS_QUEUES must be multiple of INITIAL_QUEUE_ARRAY_SIZE.");
>> + compiletime_assert(is_power_of_2(MAX_PROCESS_QUEUES / INITIAL_QUEUE_ARRAY_SIZE),
>> + "MAX_PROCESS_QUEUES must be a power-of-2 multiple of INITIAL_QUEUE_ARRAY_SIZE.");
>> +
>> + if (queue_id < p->queue_array_size)
>> + return true;
>> +
>> + if (queue_id >= MAX_PROCESS_QUEUES)
>> + return false;
>> +
>> + desired_size = p->queue_array_size;
>> + while (desired_size <= queue_id)
>> + desired_size *= 2;
>> +
>> + BUG_ON(desired_size < queue_id || desired_size > MAX_PROCESS_QUEUES);
>> + BUG_ON(desired_size % INITIAL_QUEUE_ARRAY_SIZE != 0 || !is_power_of_2(desired_size / INITIAL_QUEUE_ARRAY_SIZE));
>> +
>> + new_queues = kmalloc_array(desired_size, sizeof(p->queues[0]), GFP_KERNEL);
>> + if (!new_queues)
>> + return false;
>> +
>> + memcpy(new_queues, p->queues, p->queue_array_size * sizeof(p->queues[0]));
>> +
>> + kfree(p->queues);
>> + p->queues = new_queues;
>> + p->queue_array_size = desired_size;
>> +
>> + return true;
>> +}
>> +
>> +/* Assumes that the process lock is held. */
>> +bool kfd_allocate_queue_id(struct kfd_process *p, unsigned int *queue_id)
>> +{
>> + unsigned int qid = find_first_zero_bit(p->allocated_queue_bitmap, MAX_PROCESS_QUEUES);
>> +
>> + if (qid >= MAX_PROCESS_QUEUES)
>> + return false;
>> +
>> + if (!ensure_queue_array_size(p, qid))
>> + return false;
>> +
>> + __set_bit(qid, p->allocated_queue_bitmap);
>> +
>> + p->queues[qid] = NULL;
>> + *queue_id = qid;
>> +
>> + return true;
>> +}
>> +
>> +/*
>> + * Install a queue into a previously-allocated queue id.
>> + * Assumes that the process lock is held.
>> + */
>> +void kfd_install_queue(struct kfd_process *p, unsigned int queue_id, struct kfd_queue *queue)
>> +{
>> + /* Have to call allocate_queue_id before install_queue. */
>> + BUG_ON(queue_id >= p->queue_array_size);
>> + BUG_ON(queue == NULL);
>> +
>> + p->queues[queue_id] = queue;
>> +}
>> +
>> +/*
>> + * Remove a queue from the open queue list and deallocate the queue id.
>> + * This can be called whether or not a queue was installed.
>> + * Assumes that the process lock is held.
>> + */
>> +void kfd_remove_queue(struct kfd_process *p, unsigned int queue_id)
>> +{
>> + BUG_ON(!test_bit(queue_id, p->allocated_queue_bitmap));
>> + BUG_ON(queue_id >= p->queue_array_size);
>> +
>> + __clear_bit(queue_id, p->allocated_queue_bitmap);
>> +}
>> +
>> +/* Assumes that the process lock is held. */
>> +struct kfd_queue *kfd_get_queue(struct kfd_process *p, unsigned int queue_id)
>> +{
>> + /*
>> + * test_bit because the contents of unallocated
>> + * queue slots are undefined.
>> + * Otherwise ensure_queue_array_size would have to clear new entries and
>> + * remove_queue would have to NULL removed queues.
>> + */
>> + return (queue_id < p->queue_array_size &&
>> + test_bit(queue_id, p->allocated_queue_bitmap)) ?
>> + p->queues[queue_id] : NULL;
>> +}
>> +
>> +struct kfd_process_device *kfd_get_first_process_device_data(struct kfd_process *p)
>> +{
>> + return list_first_entry(&p->per_device_data, struct kfd_process_device, per_device_list);
>> +}
>> +
>> +struct kfd_process_device *kfd_get_next_process_device_data(struct kfd_process *p, struct kfd_process_device *pdd)
>> +{
>> + if (list_is_last(&pdd->per_device_list, &p->per_device_data))
>> + return NULL;
>> + return list_next_entry(pdd, per_device_list);
>> +}
>> +
>> +bool kfd_has_process_device_data(struct kfd_process *p)
>> +{
>> + return !(list_empty(&p->per_device_data));
>> +}
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c b/drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c
>> new file mode 100644
>> index 0000000..a2c4d30
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_vidmem.c
>> @@ -0,0 +1,96 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#include "kfd_priv.h"
>> +
>> +int kfd_vidmem_alloc(struct kfd_dev *kfd, size_t size, size_t alignment,
>> + enum kfd_mempool pool, kfd_mem_obj *mem_obj)
>> +{
>> + return kfd2kgd->allocate_mem(kfd->kgd,
>> + size,
>> + alignment,
>> + (enum kgd_memory_pool)pool,
>> + (struct kgd_mem **)mem_obj);
>> +}
>> +
>> +void kfd_vidmem_free(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
>> +{
>> + kfd2kgd->free_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
>> +}
>> +
>> +int kfd_vidmem_gpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj,
>> + uint64_t *vmid0_address)
>> +{
>> + return kfd2kgd->gpumap_mem(kfd->kgd,
>> + (struct kgd_mem *)mem_obj,
>> + vmid0_address);
>
> As discussed previously this will not fly, pinning gpu memory is a big NACK.
>
>> +}
>> +
>> +void kfd_vidmem_ungpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
>> +{
>> + kfd2kgd->ungpumap_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
>> +}
>> +
>> +int kfd_vidmem_kmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj, void **ptr)
>> +{
>> + return kfd2kgd->kmap_mem(kfd->kgd, (struct kgd_mem *)mem_obj, ptr);
>> +}
>> +
>> +void kfd_vidmem_unkmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
>> +{
>> + kfd2kgd->unkmap_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
>> +}
>> +
>> +int kfd_vidmem_alloc_map(struct kfd_dev *kfd, kfd_mem_obj *mem_obj,
>> + void **ptr, uint64_t *vmid0_address, size_t size)
>> +{
>> + int retval;
>> +
>> + retval = kfd_vidmem_alloc(kfd, size, PAGE_SIZE,
>> + KFD_MEMPOOL_SYSTEM_WRITECOMBINE, mem_obj);
>> + if (retval != 0)
>> + goto fail_vidmem_alloc;
>> +
>> + retval = kfd_vidmem_kmap(kfd, *mem_obj, ptr);
>> + if (retval != 0)
>> + goto fail_vidmem_kmap;
>> +
>> + retval = kfd_vidmem_gpumap(kfd, *mem_obj, vmid0_address);
>> + if (retval != 0)
>> + goto fail_vidmem_gpumap;
>> +
>> + return 0;
>> +
>> +fail_vidmem_gpumap:
>> + kfd_vidmem_unkmap(kfd, *mem_obj);
>> +fail_vidmem_kmap:
>> + kfd_vidmem_free(kfd, *mem_obj);
>> +fail_vidmem_alloc:
>> + return retval;
>> +}
>> +
>> +void kfd_vidmem_free_unmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
>> +{
>> + kfd_vidmem_ungpumap(kfd, mem_obj);
>> + kfd_vidmem_unkmap(kfd, mem_obj);
>> + kfd_vidmem_free(kfd, mem_obj);
>> +}
>> --
>> 1.9.1
>>
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v2 12/25] amdkfd: Add binding/unbinding calls to amd_iommu driver
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
` (7 preceding siblings ...)
2014-07-17 13:29 ` [PATCH v2 11/25] amdkfd: Add basic modules " Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-20 23:04 ` Jerome Glisse
2014-07-17 13:29 ` [PATCH v2 13/25] amdkfd: Add queue module Oded Gabbay
` (13 subsequent siblings)
22 siblings, 1 reply; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
This patch adds the functions to bind and unbind pasid from a device through the amd_iommu driver.
The unbind function is called when the mm_struct of the process is released.
The bind function is not called here because it is called only in the IOCTLs which are not yet implemented at this stage of the patchset.
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/amdkfd/kfd_device.c | 80 ++++++++++++++++++++++++++++-
drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 1 +
drivers/gpu/drm/radeon/amdkfd/kfd_process.c | 12 +++++
3 files changed, 92 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
index f6a7cf7..7c4c836 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
@@ -95,6 +95,59 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev)
return kfd;
}
+static bool device_iommu_pasid_init(struct kfd_dev *kfd)
+{
+ const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP | AMD_IOMMU_DEVICE_FLAG_PRI_SUP
+ | AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
+
+ struct amd_iommu_device_info iommu_info;
+ pasid_t pasid_limit;
+ int err;
+
+ err = amd_iommu_device_info(kfd->pdev, &iommu_info);
+ if (err < 0) {
+ dev_err(kfd_device, "error getting iommu info. is the iommu enabled?\n");
+ return false;
+ }
+
+ if ((iommu_info.flags & required_iommu_flags) != required_iommu_flags) {
+ dev_err(kfd_device, "error required iommu flags ats(%i), pri(%i), pasid(%i)\n",
+ (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_ATS_SUP) != 0,
+ (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PRI_SUP) != 0,
+ (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PASID_SUP) != 0);
+ return false;
+ }
+
+ pasid_limit = min_t(pasid_t, (pasid_t)1 << kfd->device_info->max_pasid_bits, iommu_info.max_pasids);
+ /*
+ * last pasid is used for kernel queues doorbells
+ * in the future the last pasid might be used for a kernel thread.
+ */
+ pasid_limit = min_t(pasid_t, pasid_limit, kfd->doorbell_process_limit - 1);
+
+ err = amd_iommu_init_device(kfd->pdev, pasid_limit);
+ if (err < 0) {
+ dev_err(kfd_device, "error initializing iommu device\n");
+ return false;
+ }
+
+ if (!kfd_set_pasid_limit(pasid_limit)) {
+ dev_err(kfd_device, "error setting pasid limit\n");
+ amd_iommu_free_device(kfd->pdev);
+ return false;
+ }
+
+ return true;
+}
+
+static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
+{
+ struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
+
+ if (dev)
+ kfd_unbind_process_from_device(dev, pasid);
+}
+
bool kgd2kfd_device_init(struct kfd_dev *kfd,
const struct kgd2kfd_shared_resources *gpu_resources)
{
@@ -102,8 +155,15 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
kfd_doorbell_init(kfd);
- if (kfd_topology_add_device(kfd) != 0)
+ if (!device_iommu_pasid_init(kfd))
+ return false;
+
+ if (kfd_topology_add_device(kfd) != 0) {
+ amd_iommu_free_device(kfd->pdev);
return false;
+ }
+
+ amd_iommu_set_invalidate_ctx_cb(kfd->pdev, iommu_pasid_shutdown_callback);
kfd->init_complete = true;
dev_info(kfd_device, "added device (%x:%x)\n", kfd->pdev->vendor,
@@ -118,18 +178,36 @@ void kgd2kfd_device_exit(struct kfd_dev *kfd)
BUG_ON(err != 0);
+ if (kfd->init_complete)
+ amd_iommu_free_device(kfd->pdev);
+
kfree(kfd);
}
void kgd2kfd_suspend(struct kfd_dev *kfd)
{
BUG_ON(kfd == NULL);
+
+ if (kfd->init_complete)
+ amd_iommu_free_device(kfd->pdev);
}
int kgd2kfd_resume(struct kfd_dev *kfd)
{
+ pasid_t pasid_limit;
+ int err;
+
BUG_ON(kfd == NULL);
+ pasid_limit = kfd_get_pasid_limit();
+
+ if (kfd->init_complete) {
+ err = amd_iommu_init_device(kfd->pdev, pasid_limit);
+ if (err < 0)
+ return -ENXIO;
+ amd_iommu_set_invalidate_ctx_cb(kfd->pdev, iommu_pasid_shutdown_callback);
+ }
+
return 0;
}
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
index af5a5e4..604c317 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
@@ -195,6 +195,7 @@ struct kfd_process {
struct kfd_process *kfd_create_process(const struct task_struct *);
struct kfd_process *kfd_get_process(const struct task_struct *);
+void kfd_unbind_process_from_device(struct kfd_dev *dev, pasid_t pasid);
struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
struct kfd_process *p);
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_process.c b/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
index 5efbce0..908b3b7 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
@@ -24,6 +24,7 @@
#include <linux/log2.h>
#include <linux/sched.h>
#include <linux/slab.h>
+#include <linux/amd-iommu.h>
#include <linux/notifier.h>
struct mm_struct;
@@ -97,6 +98,7 @@ static void free_process(struct kfd_process *p)
BUG_ON(p == NULL);
list_for_each_entry_safe(pdd, temp, &p->per_device_data, per_device_list) {
+ amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
list_del(&pdd->per_device_list);
kfree(pdd);
}
@@ -199,6 +201,7 @@ struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
struct kfd_process *p)
{
struct kfd_process_device *pdd = kfd_get_process_device_data(dev, p);
+ int err;
if (pdd == NULL)
return ERR_PTR(-ENOMEM);
@@ -206,6 +209,15 @@ struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
if (pdd->bound)
return pdd;
+ err = amd_iommu_bind_pasid(dev->pdev, p->pasid, p->lead_thread);
+ if (err < 0)
+ return ERR_PTR(err);
+
+ if (err < 0) {
+ amd_iommu_unbind_pasid(dev->pdev, p->pasid);
+ return ERR_PTR(err);
+ }
+
pdd->bound = true;
return pdd;
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* Re: [PATCH v2 12/25] amdkfd: Add binding/unbinding calls to amd_iommu driver
2014-07-17 13:29 ` [PATCH v2 12/25] amdkfd: Add binding/unbinding calls to amd_iommu driver Oded Gabbay
@ 2014-07-20 23:04 ` Jerome Glisse
2014-07-27 11:11 ` Oded Gabbay
0 siblings, 1 reply; 46+ messages in thread
From: Jerome Glisse @ 2014-07-20 23:04 UTC (permalink / raw)
To: Oded Gabbay
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, dri-devel,
Evgeny Pinchuk, Alexey Skidanov, Alex Deucher, Andrew Morton,
Christian König
On Thu, Jul 17, 2014 at 04:29:19PM +0300, Oded Gabbay wrote:
> This patch adds the functions to bind and unbind pasid from a device through the amd_iommu driver.
>
> The unbind function is called when the mm_struct of the process is released.
>
> The bind function is not called here because it is called only in the IOCTLs which are not yet implemented at this stage of the patchset.
Commit message should follow the 80 char per line rules too IIRC
No other comment than apply the 80char for the patch too.
>
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> ---
> drivers/gpu/drm/radeon/amdkfd/kfd_device.c | 80 ++++++++++++++++++++++++++++-
> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 1 +
> drivers/gpu/drm/radeon/amdkfd/kfd_process.c | 12 +++++
> 3 files changed, 92 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
> index f6a7cf7..7c4c836 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
> @@ -95,6 +95,59 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev)
> return kfd;
> }
>
> +static bool device_iommu_pasid_init(struct kfd_dev *kfd)
> +{
> + const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP | AMD_IOMMU_DEVICE_FLAG_PRI_SUP
> + | AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
> +
> + struct amd_iommu_device_info iommu_info;
> + pasid_t pasid_limit;
> + int err;
> +
> + err = amd_iommu_device_info(kfd->pdev, &iommu_info);
> + if (err < 0) {
> + dev_err(kfd_device, "error getting iommu info. is the iommu enabled?\n");
> + return false;
> + }
> +
> + if ((iommu_info.flags & required_iommu_flags) != required_iommu_flags) {
> + dev_err(kfd_device, "error required iommu flags ats(%i), pri(%i), pasid(%i)\n",
> + (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_ATS_SUP) != 0,
> + (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PRI_SUP) != 0,
> + (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PASID_SUP) != 0);
> + return false;
> + }
> +
> + pasid_limit = min_t(pasid_t, (pasid_t)1 << kfd->device_info->max_pasid_bits, iommu_info.max_pasids);
> + /*
> + * last pasid is used for kernel queues doorbells
> + * in the future the last pasid might be used for a kernel thread.
> + */
> + pasid_limit = min_t(pasid_t, pasid_limit, kfd->doorbell_process_limit - 1);
> +
> + err = amd_iommu_init_device(kfd->pdev, pasid_limit);
> + if (err < 0) {
> + dev_err(kfd_device, "error initializing iommu device\n");
> + return false;
> + }
> +
> + if (!kfd_set_pasid_limit(pasid_limit)) {
> + dev_err(kfd_device, "error setting pasid limit\n");
> + amd_iommu_free_device(kfd->pdev);
> + return false;
> + }
> +
> + return true;
> +}
> +
> +static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
> +{
> + struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
> +
> + if (dev)
> + kfd_unbind_process_from_device(dev, pasid);
> +}
> +
> bool kgd2kfd_device_init(struct kfd_dev *kfd,
> const struct kgd2kfd_shared_resources *gpu_resources)
> {
> @@ -102,8 +155,15 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>
> kfd_doorbell_init(kfd);
>
> - if (kfd_topology_add_device(kfd) != 0)
> + if (!device_iommu_pasid_init(kfd))
> + return false;
> +
> + if (kfd_topology_add_device(kfd) != 0) {
> + amd_iommu_free_device(kfd->pdev);
> return false;
> + }
> +
> + amd_iommu_set_invalidate_ctx_cb(kfd->pdev, iommu_pasid_shutdown_callback);
>
> kfd->init_complete = true;
> dev_info(kfd_device, "added device (%x:%x)\n", kfd->pdev->vendor,
> @@ -118,18 +178,36 @@ void kgd2kfd_device_exit(struct kfd_dev *kfd)
>
> BUG_ON(err != 0);
>
> + if (kfd->init_complete)
> + amd_iommu_free_device(kfd->pdev);
> +
> kfree(kfd);
> }
>
> void kgd2kfd_suspend(struct kfd_dev *kfd)
> {
> BUG_ON(kfd == NULL);
> +
> + if (kfd->init_complete)
> + amd_iommu_free_device(kfd->pdev);
> }
>
> int kgd2kfd_resume(struct kfd_dev *kfd)
> {
> + pasid_t pasid_limit;
> + int err;
> +
> BUG_ON(kfd == NULL);
>
> + pasid_limit = kfd_get_pasid_limit();
> +
> + if (kfd->init_complete) {
> + err = amd_iommu_init_device(kfd->pdev, pasid_limit);
> + if (err < 0)
> + return -ENXIO;
> + amd_iommu_set_invalidate_ctx_cb(kfd->pdev, iommu_pasid_shutdown_callback);
> + }
> +
> return 0;
> }
>
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> index af5a5e4..604c317 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> @@ -195,6 +195,7 @@ struct kfd_process {
> struct kfd_process *kfd_create_process(const struct task_struct *);
> struct kfd_process *kfd_get_process(const struct task_struct *);
>
> +void kfd_unbind_process_from_device(struct kfd_dev *dev, pasid_t pasid);
> struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
> struct kfd_process *p);
>
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_process.c b/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
> index 5efbce0..908b3b7 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
> @@ -24,6 +24,7 @@
> #include <linux/log2.h>
> #include <linux/sched.h>
> #include <linux/slab.h>
> +#include <linux/amd-iommu.h>
> #include <linux/notifier.h>
> struct mm_struct;
>
> @@ -97,6 +98,7 @@ static void free_process(struct kfd_process *p)
> BUG_ON(p == NULL);
>
> list_for_each_entry_safe(pdd, temp, &p->per_device_data, per_device_list) {
> + amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
> list_del(&pdd->per_device_list);
> kfree(pdd);
> }
> @@ -199,6 +201,7 @@ struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
> struct kfd_process *p)
> {
> struct kfd_process_device *pdd = kfd_get_process_device_data(dev, p);
> + int err;
>
> if (pdd == NULL)
> return ERR_PTR(-ENOMEM);
> @@ -206,6 +209,15 @@ struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
> if (pdd->bound)
> return pdd;
>
> + err = amd_iommu_bind_pasid(dev->pdev, p->pasid, p->lead_thread);
> + if (err < 0)
> + return ERR_PTR(err);
> +
> + if (err < 0) {
> + amd_iommu_unbind_pasid(dev->pdev, p->pasid);
> + return ERR_PTR(err);
> + }
> +
> pdd->bound = true;
>
> return pdd;
> --
> 1.9.1
>
^ permalink raw reply [flat|nested] 46+ messages in thread* Re: [PATCH v2 12/25] amdkfd: Add binding/unbinding calls to amd_iommu driver
2014-07-20 23:04 ` Jerome Glisse
@ 2014-07-27 11:11 ` Oded Gabbay
0 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-07-27 11:11 UTC (permalink / raw)
To: Jerome Glisse
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, dri-devel,
Alexey Skidanov, Andrew Morton
On 21/07/14 02:04, Jerome Glisse wrote:
> On Thu, Jul 17, 2014 at 04:29:19PM +0300, Oded Gabbay wrote:
>> This patch adds the functions to bind and unbind pasid from a device through the amd_iommu driver.
>>
>> The unbind function is called when the mm_struct of the process is released.
>>
>> The bind function is not called here because it is called only in the IOCTLs which are not yet implemented at this stage of the patchset.
>
> Commit message should follow the 80 char per line rules too IIRC
> No other comment than apply the 80char for the patch too.
Done in v3 for commit msg.
For the patch, I tried to minimize the lines above 80 char as much as I could
without harming readability of the code.
Oded
>
>>
>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>> ---
>> drivers/gpu/drm/radeon/amdkfd/kfd_device.c | 80 ++++++++++++++++++++++++++++-
>> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 1 +
>> drivers/gpu/drm/radeon/amdkfd/kfd_process.c | 12 +++++
>> 3 files changed, 92 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>> index f6a7cf7..7c4c836 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
>> @@ -95,6 +95,59 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev)
>> return kfd;
>> }
>>
>> +static bool device_iommu_pasid_init(struct kfd_dev *kfd)
>> +{
>> + const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP | AMD_IOMMU_DEVICE_FLAG_PRI_SUP
>> + | AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
>> +
>> + struct amd_iommu_device_info iommu_info;
>> + pasid_t pasid_limit;
>> + int err;
>> +
>> + err = amd_iommu_device_info(kfd->pdev, &iommu_info);
>> + if (err < 0) {
>> + dev_err(kfd_device, "error getting iommu info. is the iommu enabled?\n");
>> + return false;
>> + }
>> +
>> + if ((iommu_info.flags & required_iommu_flags) != required_iommu_flags) {
>> + dev_err(kfd_device, "error required iommu flags ats(%i), pri(%i), pasid(%i)\n",
>> + (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_ATS_SUP) != 0,
>> + (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PRI_SUP) != 0,
>> + (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PASID_SUP) != 0);
>> + return false;
>> + }
>> +
>> + pasid_limit = min_t(pasid_t, (pasid_t)1 << kfd->device_info->max_pasid_bits, iommu_info.max_pasids);
>> + /*
>> + * last pasid is used for kernel queues doorbells
>> + * in the future the last pasid might be used for a kernel thread.
>> + */
>> + pasid_limit = min_t(pasid_t, pasid_limit, kfd->doorbell_process_limit - 1);
>> +
>> + err = amd_iommu_init_device(kfd->pdev, pasid_limit);
>> + if (err < 0) {
>> + dev_err(kfd_device, "error initializing iommu device\n");
>> + return false;
>> + }
>> +
>> + if (!kfd_set_pasid_limit(pasid_limit)) {
>> + dev_err(kfd_device, "error setting pasid limit\n");
>> + amd_iommu_free_device(kfd->pdev);
>> + return false;
>> + }
>> +
>> + return true;
>> +}
>> +
>> +static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
>> +{
>> + struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
>> +
>> + if (dev)
>> + kfd_unbind_process_from_device(dev, pasid);
>> +}
>> +
>> bool kgd2kfd_device_init(struct kfd_dev *kfd,
>> const struct kgd2kfd_shared_resources *gpu_resources)
>> {
>> @@ -102,8 +155,15 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>>
>> kfd_doorbell_init(kfd);
>>
>> - if (kfd_topology_add_device(kfd) != 0)
>> + if (!device_iommu_pasid_init(kfd))
>> + return false;
>> +
>> + if (kfd_topology_add_device(kfd) != 0) {
>> + amd_iommu_free_device(kfd->pdev);
>> return false;
>> + }
>> +
>> + amd_iommu_set_invalidate_ctx_cb(kfd->pdev, iommu_pasid_shutdown_callback);
>>
>> kfd->init_complete = true;
>> dev_info(kfd_device, "added device (%x:%x)\n", kfd->pdev->vendor,
>> @@ -118,18 +178,36 @@ void kgd2kfd_device_exit(struct kfd_dev *kfd)
>>
>> BUG_ON(err != 0);
>>
>> + if (kfd->init_complete)
>> + amd_iommu_free_device(kfd->pdev);
>> +
>> kfree(kfd);
>> }
>>
>> void kgd2kfd_suspend(struct kfd_dev *kfd)
>> {
>> BUG_ON(kfd == NULL);
>> +
>> + if (kfd->init_complete)
>> + amd_iommu_free_device(kfd->pdev);
>> }
>>
>> int kgd2kfd_resume(struct kfd_dev *kfd)
>> {
>> + pasid_t pasid_limit;
>> + int err;
>> +
>> BUG_ON(kfd == NULL);
>>
>> + pasid_limit = kfd_get_pasid_limit();
>> +
>> + if (kfd->init_complete) {
>> + err = amd_iommu_init_device(kfd->pdev, pasid_limit);
>> + if (err < 0)
>> + return -ENXIO;
>> + amd_iommu_set_invalidate_ctx_cb(kfd->pdev, iommu_pasid_shutdown_callback);
>> + }
>> +
>> return 0;
>> }
>>
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> index af5a5e4..604c317 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> @@ -195,6 +195,7 @@ struct kfd_process {
>> struct kfd_process *kfd_create_process(const struct task_struct *);
>> struct kfd_process *kfd_get_process(const struct task_struct *);
>>
>> +void kfd_unbind_process_from_device(struct kfd_dev *dev, pasid_t pasid);
>> struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
>> struct kfd_process *p);
>>
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_process.c b/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
>> index 5efbce0..908b3b7 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
>> @@ -24,6 +24,7 @@
>> #include <linux/log2.h>
>> #include <linux/sched.h>
>> #include <linux/slab.h>
>> +#include <linux/amd-iommu.h>
>> #include <linux/notifier.h>
>> struct mm_struct;
>>
>> @@ -97,6 +98,7 @@ static void free_process(struct kfd_process *p)
>> BUG_ON(p == NULL);
>>
>> list_for_each_entry_safe(pdd, temp, &p->per_device_data, per_device_list) {
>> + amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
>> list_del(&pdd->per_device_list);
>> kfree(pdd);
>> }
>> @@ -199,6 +201,7 @@ struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
>> struct kfd_process *p)
>> {
>> struct kfd_process_device *pdd = kfd_get_process_device_data(dev, p);
>> + int err;
>>
>> if (pdd == NULL)
>> return ERR_PTR(-ENOMEM);
>> @@ -206,6 +209,15 @@ struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
>> if (pdd->bound)
>> return pdd;
>>
>> + err = amd_iommu_bind_pasid(dev->pdev, p->pasid, p->lead_thread);
>> + if (err < 0)
>> + return ERR_PTR(err);
>> +
>> + if (err < 0) {
>> + amd_iommu_unbind_pasid(dev->pdev, p->pasid);
>> + return ERR_PTR(err);
>> + }
>> +
>> pdd->bound = true;
>>
>> return pdd;
>> --
>> 1.9.1
>>
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v2 13/25] amdkfd: Add queue module
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
` (8 preceding siblings ...)
2014-07-17 13:29 ` [PATCH v2 12/25] amdkfd: Add binding/unbinding calls to amd_iommu driver Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-20 23:06 ` Jerome Glisse
2014-07-17 13:29 ` [PATCH v2 14/25] amdkfd: Add mqd_manager module Oded Gabbay
` (12 subsequent siblings)
22 siblings, 1 reply; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
From: Ben Goz <ben.goz@amd.com>
The queue module enables allocating and initializing queues uniformly.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/amdkfd/Makefile | 2 +-
drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 48 +++++++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_queue.c | 109 ++++++++++++++++++++++++++++++
3 files changed, 158 insertions(+), 1 deletion(-)
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_queue.c
diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
index daf75a8..dbff147 100644
--- a/drivers/gpu/drm/radeon/amdkfd/Makefile
+++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
@@ -6,6 +6,6 @@ ccflags-y := -Iinclude/drm
amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
kfd_pasid.o kfd_doorbell.o kfd_vidmem.o kfd_aperture.o \
- kfd_process.o
+ kfd_process.o kfd_queue.o
obj-$(CONFIG_HSA_RADEON) += amdkfd.o
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
index 604c317..94ff1c3 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
@@ -65,6 +65,9 @@ typedef unsigned int pasid_t;
/* Type that represents a HW doorbell slot. */
typedef u32 doorbell_t;
+/* Type that represents queue pointer */
+typedef u32 qptr_t;
+
struct kfd_device_info {
const struct kfd_scheduler_class *scheduler_class;
unsigned int max_pasid_bits;
@@ -125,12 +128,57 @@ void kfd_vidmem_unkmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
int kfd_vidmem_alloc_map(struct kfd_dev *kfd, kfd_mem_obj *mem_obj, void **ptr,
uint64_t *vmid0_address, size_t size);
void kfd_vidmem_free_unmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
+
/* Character device interface */
int kfd_chardev_init(void);
void kfd_chardev_exit(void);
struct device *kfd_chardev(void);
+enum kfd_queue_type {
+ KFD_QUEUE_TYPE_COMPUTE,
+ KFD_QUEUE_TYPE_SDMA,
+ KFD_QUEUE_TYPE_HIQ,
+ KFD_QUEUE_TYPE_DIQ
+};
+
+struct queue_properties {
+ enum kfd_queue_type type;
+ unsigned int queue_id;
+ uint64_t queue_address;
+ uint64_t queue_size;
+ uint32_t priority;
+ uint32_t queue_percent;
+ qptr_t *read_ptr;
+ qptr_t *write_ptr;
+ qptr_t *doorbell_ptr;
+ qptr_t doorbell_off;
+ bool is_interop;
+ bool is_active;
+ /* Not relevant for user mode queues in cp scheduling */
+ unsigned int vmid;
+};
+
+struct queue {
+ struct list_head list;
+ void *mqd;
+ /* kfd_mem_obj contains the mqd */
+ kfd_mem_obj mqd_mem_obj;
+ uint64_t gart_mqd_addr; /* needed for cp scheduling */
+ struct queue_properties properties;
+
+ /*
+ * Used by the queue device manager to track the hqd slot per queue
+ * when using no cp scheduling
+ */
+ uint32_t mec;
+ uint32_t pipe;
+ uint32_t queue;
+
+ struct kfd_process *process;
+ struct kfd_dev *device;
+};
+
/* Data that is per-process-per device. */
struct kfd_process_device {
/*
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_queue.c b/drivers/gpu/drm/radeon/amdkfd/kfd_queue.c
new file mode 100644
index 0000000..646b6d1
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_queue.c
@@ -0,0 +1,109 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include <linux/slab.h>
+#include "kfd_priv.h"
+
+void print_queue_properties(struct queue_properties *q)
+{
+ if (!q)
+ return;
+
+ pr_debug("Printing queue properties\n"
+ "Queue Type: %u\n"
+ "Queue Size: %llu\n"
+ "Queue percent: %u\n"
+ "Queue Address: 0x%llX\n"
+ "Queue Id: %u\n"
+ "Queue Process Vmid: %u\n"
+ "Queue Read Pointer: 0x%p\n"
+ "Queue Write Pointer: 0x%p\n"
+ "Queue Doorbell Pointer: 0x%p\n"
+ "Queue Doorbell Offset: %u\n", q->type,
+ q->queue_size,
+ q->queue_percent,
+ q->queue_address,
+ q->queue_id,
+ q->vmid,
+ q->read_ptr,
+ q->write_ptr,
+ q->doorbell_ptr,
+ q->doorbell_off);
+}
+
+void print_queue(struct queue *q)
+{
+ if (!q)
+ return;
+ pr_debug("Printing queue\n"
+ "Queue Type: %u\n"
+ "Queue Size: %llu\n"
+ "Queue percent: %u\n"
+ "Queue Address: 0x%llX\n"
+ "Queue Id: %u\n"
+ "Queue Process Vmid: %u\n"
+ "Queue Read Pointer: 0x%p\n"
+ "Queue Write Pointer: 0x%p\n"
+ "Queue Doorbell Pointer: 0x%p\n"
+ "Queue Doorbell Offset: %u\n"
+ "Queue MQD Address: 0x%p\n"
+ "Queue MQD Gart: 0x%llX\n"
+ "Queue Process Address: 0x%p\n"
+ "Queue Device Address: 0x%p\n",
+ q->properties.type,
+ q->properties.queue_size,
+ q->properties.queue_percent,
+ q->properties.queue_address,
+ q->properties.queue_id,
+ q->properties.vmid,
+ q->properties.read_ptr,
+ q->properties.write_ptr,
+ q->properties.doorbell_ptr,
+ q->properties.doorbell_off,
+ q->mqd,
+ q->gart_mqd_addr,
+ q->process,
+ q->device);
+}
+
+int init_queue(struct queue **q, struct queue_properties properties)
+{
+ struct queue *tmp;
+
+ BUG_ON(!q);
+
+ tmp = kzalloc(sizeof(struct queue), GFP_KERNEL);
+ if (!tmp)
+ return -ENOMEM;
+
+ memset(&tmp->properties, 0, sizeof(struct queue_properties));
+ memcpy(&tmp->properties, &properties, sizeof(struct queue_properties));
+
+ *q = tmp;
+ return 0;
+}
+
+void uninit_queue(struct queue *q)
+{
+ kfree(q);
+}
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* Re: [PATCH v2 13/25] amdkfd: Add queue module
2014-07-17 13:29 ` [PATCH v2 13/25] amdkfd: Add queue module Oded Gabbay
@ 2014-07-20 23:06 ` Jerome Glisse
2014-07-27 11:09 ` Oded Gabbay
0 siblings, 1 reply; 46+ messages in thread
From: Jerome Glisse @ 2014-07-20 23:06 UTC (permalink / raw)
To: Oded Gabbay
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, dri-devel,
Evgeny Pinchuk, Alexey Skidanov, Alex Deucher, Andrew Morton,
Christian König
On Thu, Jul 17, 2014 at 04:29:20PM +0300, Oded Gabbay wrote:
> From: Ben Goz <ben.goz@amd.com>
>
> The queue module enables allocating and initializing queues uniformly.
>
> Signed-off-by: Ben Goz <ben.goz@amd.com>
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> ---
> drivers/gpu/drm/radeon/amdkfd/Makefile | 2 +-
> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 48 +++++++++++++
> drivers/gpu/drm/radeon/amdkfd/kfd_queue.c | 109 ++++++++++++++++++++++++++++++
> 3 files changed, 158 insertions(+), 1 deletion(-)
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_queue.c
>
> diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
> index daf75a8..dbff147 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/Makefile
> +++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
> @@ -6,6 +6,6 @@ ccflags-y := -Iinclude/drm
>
> amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
> kfd_pasid.o kfd_doorbell.o kfd_vidmem.o kfd_aperture.o \
> - kfd_process.o
> + kfd_process.o kfd_queue.o
>
> obj-$(CONFIG_HSA_RADEON) += amdkfd.o
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> index 604c317..94ff1c3 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> @@ -65,6 +65,9 @@ typedef unsigned int pasid_t;
> /* Type that represents a HW doorbell slot. */
> typedef u32 doorbell_t;
>
> +/* Type that represents queue pointer */
> +typedef u32 qptr_t;
> +
> struct kfd_device_info {
> const struct kfd_scheduler_class *scheduler_class;
> unsigned int max_pasid_bits;
> @@ -125,12 +128,57 @@ void kfd_vidmem_unkmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
> int kfd_vidmem_alloc_map(struct kfd_dev *kfd, kfd_mem_obj *mem_obj, void **ptr,
> uint64_t *vmid0_address, size_t size);
> void kfd_vidmem_free_unmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
> +
> /* Character device interface */
> int kfd_chardev_init(void);
> void kfd_chardev_exit(void);
> struct device *kfd_chardev(void);
>
>
> +enum kfd_queue_type {
> + KFD_QUEUE_TYPE_COMPUTE,
> + KFD_QUEUE_TYPE_SDMA,
> + KFD_QUEUE_TYPE_HIQ,
> + KFD_QUEUE_TYPE_DIQ
> +};
> +
> +struct queue_properties {
> + enum kfd_queue_type type;
> + unsigned int queue_id;
> + uint64_t queue_address;
> + uint64_t queue_size;
> + uint32_t priority;
> + uint32_t queue_percent;
> + qptr_t *read_ptr;
> + qptr_t *write_ptr;
> + qptr_t *doorbell_ptr;
> + qptr_t doorbell_off;
> + bool is_interop;
> + bool is_active;
> + /* Not relevant for user mode queues in cp scheduling */
> + unsigned int vmid;
> +};
> +
> +struct queue {
> + struct list_head list;
> + void *mqd;
> + /* kfd_mem_obj contains the mqd */
> + kfd_mem_obj mqd_mem_obj;
> + uint64_t gart_mqd_addr; /* needed for cp scheduling */
> + struct queue_properties properties;
> +
> + /*
> + * Used by the queue device manager to track the hqd slot per queue
> + * when using no cp scheduling
> + */
> + uint32_t mec;
> + uint32_t pipe;
> + uint32_t queue;
> +
> + struct kfd_process *process;
> + struct kfd_dev *device;
> +};
> +
> /* Data that is per-process-per device. */
> struct kfd_process_device {
> /*
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_queue.c b/drivers/gpu/drm/radeon/amdkfd/kfd_queue.c
> new file mode 100644
> index 0000000..646b6d1
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_queue.c
> @@ -0,0 +1,109 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#include <linux/slab.h>
> +#include "kfd_priv.h"
> +
> +void print_queue_properties(struct queue_properties *q)
> +{
> + if (!q)
> + return;
> +
> + pr_debug("Printing queue properties\n"
> + "Queue Type: %u\n"
> + "Queue Size: %llu\n"
> + "Queue percent: %u\n"
> + "Queue Address: 0x%llX\n"
> + "Queue Id: %u\n"
> + "Queue Process Vmid: %u\n"
> + "Queue Read Pointer: 0x%p\n"
> + "Queue Write Pointer: 0x%p\n"
> + "Queue Doorbell Pointer: 0x%p\n"
> + "Queue Doorbell Offset: %u\n", q->type,
> + q->queue_size,
> + q->queue_percent,
> + q->queue_address,
> + q->queue_id,
> + q->vmid,
> + q->read_ptr,
> + q->write_ptr,
> + q->doorbell_ptr,
> + q->doorbell_off);
One pr_debug call per line.
> +}
> +
> +void print_queue(struct queue *q)
> +{
> + if (!q)
> + return;
> + pr_debug("Printing queue\n"
> + "Queue Type: %u\n"
> + "Queue Size: %llu\n"
> + "Queue percent: %u\n"
> + "Queue Address: 0x%llX\n"
> + "Queue Id: %u\n"
> + "Queue Process Vmid: %u\n"
> + "Queue Read Pointer: 0x%p\n"
> + "Queue Write Pointer: 0x%p\n"
> + "Queue Doorbell Pointer: 0x%p\n"
> + "Queue Doorbell Offset: %u\n"
> + "Queue MQD Address: 0x%p\n"
> + "Queue MQD Gart: 0x%llX\n"
> + "Queue Process Address: 0x%p\n"
> + "Queue Device Address: 0x%p\n",
> + q->properties.type,
> + q->properties.queue_size,
> + q->properties.queue_percent,
> + q->properties.queue_address,
> + q->properties.queue_id,
> + q->properties.vmid,
> + q->properties.read_ptr,
> + q->properties.write_ptr,
> + q->properties.doorbell_ptr,
> + q->properties.doorbell_off,
> + q->mqd,
> + q->gart_mqd_addr,
> + q->process,
> + q->device);
Ditto
> +}
> +
> +int init_queue(struct queue **q, struct queue_properties properties)
> +{
> + struct queue *tmp;
> +
> + BUG_ON(!q);
> +
> + tmp = kzalloc(sizeof(struct queue), GFP_KERNEL);
> + if (!tmp)
> + return -ENOMEM;
> +
> + memset(&tmp->properties, 0, sizeof(struct queue_properties));
memset useless because of the memcpy below.
> + memcpy(&tmp->properties, &properties, sizeof(struct queue_properties));
> +
> + *q = tmp;
> + return 0;
> +}
> +
> +void uninit_queue(struct queue *q)
> +{
> + kfree(q);
> +}
> --
> 1.9.1
>
^ permalink raw reply [flat|nested] 46+ messages in thread* Re: [PATCH v2 13/25] amdkfd: Add queue module
2014-07-20 23:06 ` Jerome Glisse
@ 2014-07-27 11:09 ` Oded Gabbay
0 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-07-27 11:09 UTC (permalink / raw)
To: Jerome Glisse
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, dri-devel,
Alexey Skidanov, Andrew Morton
On 21/07/14 02:06, Jerome Glisse wrote:
> On Thu, Jul 17, 2014 at 04:29:20PM +0300, Oded Gabbay wrote:
>> From: Ben Goz <ben.goz@amd.com>
>>
>> The queue module enables allocating and initializing queues uniformly.
>>
>> Signed-off-by: Ben Goz <ben.goz@amd.com>
>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>> ---
>> drivers/gpu/drm/radeon/amdkfd/Makefile | 2 +-
>> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 48 +++++++++++++
>> drivers/gpu/drm/radeon/amdkfd/kfd_queue.c | 109 ++++++++++++++++++++++++++++++
>> 3 files changed, 158 insertions(+), 1 deletion(-)
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_queue.c
>>
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
>> index daf75a8..dbff147 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/Makefile
>> +++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
>> @@ -6,6 +6,6 @@ ccflags-y := -Iinclude/drm
>>
>> amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
>> kfd_pasid.o kfd_doorbell.o kfd_vidmem.o kfd_aperture.o \
>> - kfd_process.o
>> + kfd_process.o kfd_queue.o
>>
>> obj-$(CONFIG_HSA_RADEON) += amdkfd.o
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> index 604c317..94ff1c3 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> @@ -65,6 +65,9 @@ typedef unsigned int pasid_t;
>> /* Type that represents a HW doorbell slot. */
>> typedef u32 doorbell_t;
>>
>> +/* Type that represents queue pointer */
>> +typedef u32 qptr_t;
>> +
>> struct kfd_device_info {
>> const struct kfd_scheduler_class *scheduler_class;
>> unsigned int max_pasid_bits;
>> @@ -125,12 +128,57 @@ void kfd_vidmem_unkmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
>> int kfd_vidmem_alloc_map(struct kfd_dev *kfd, kfd_mem_obj *mem_obj, void **ptr,
>> uint64_t *vmid0_address, size_t size);
>> void kfd_vidmem_free_unmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
>> +
>> /* Character device interface */
>> int kfd_chardev_init(void);
>> void kfd_chardev_exit(void);
>> struct device *kfd_chardev(void);
>>
>>
>> +enum kfd_queue_type {
>> + KFD_QUEUE_TYPE_COMPUTE,
>> + KFD_QUEUE_TYPE_SDMA,
>> + KFD_QUEUE_TYPE_HIQ,
>> + KFD_QUEUE_TYPE_DIQ
>> +};
>> +
>> +struct queue_properties {
>> + enum kfd_queue_type type;
>> + unsigned int queue_id;
>> + uint64_t queue_address;
>> + uint64_t queue_size;
>> + uint32_t priority;
>> + uint32_t queue_percent;
>> + qptr_t *read_ptr;
>> + qptr_t *write_ptr;
>> + qptr_t *doorbell_ptr;
>> + qptr_t doorbell_off;
>> + bool is_interop;
>> + bool is_active;
>> + /* Not relevant for user mode queues in cp scheduling */
>> + unsigned int vmid;
>> +};
>> +
>> +struct queue {
>> + struct list_head list;
>> + void *mqd;
>> + /* kfd_mem_obj contains the mqd */
>> + kfd_mem_obj mqd_mem_obj;
>> + uint64_t gart_mqd_addr; /* needed for cp scheduling */
>> + struct queue_properties properties;
>> +
>> + /*
>> + * Used by the queue device manager to track the hqd slot per queue
>> + * when using no cp scheduling
>> + */
>> + uint32_t mec;
>> + uint32_t pipe;
>> + uint32_t queue;
>> +
>> + struct kfd_process *process;
>> + struct kfd_dev *device;
>> +};
>> +
>> /* Data that is per-process-per device. */
>> struct kfd_process_device {
>> /*
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_queue.c b/drivers/gpu/drm/radeon/amdkfd/kfd_queue.c
>> new file mode 100644
>> index 0000000..646b6d1
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_queue.c
>> @@ -0,0 +1,109 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#include <linux/slab.h>
>> +#include "kfd_priv.h"
>> +
>> +void print_queue_properties(struct queue_properties *q)
>> +{
>> + if (!q)
>> + return;
>> +
>> + pr_debug("Printing queue properties\n"
>> + "Queue Type: %u\n"
>> + "Queue Size: %llu\n"
>> + "Queue percent: %u\n"
>> + "Queue Address: 0x%llX\n"
>> + "Queue Id: %u\n"
>> + "Queue Process Vmid: %u\n"
>> + "Queue Read Pointer: 0x%p\n"
>> + "Queue Write Pointer: 0x%p\n"
>> + "Queue Doorbell Pointer: 0x%p\n"
>> + "Queue Doorbell Offset: %u\n", q->type,
>> + q->queue_size,
>> + q->queue_percent,
>> + q->queue_address,
>> + q->queue_id,
>> + q->vmid,
>> + q->read_ptr,
>> + q->write_ptr,
>> + q->doorbell_ptr,
>> + q->doorbell_off);
>
> One pr_debug call per line.
Done in v3
>
>> +}
>> +
>> +void print_queue(struct queue *q)
>> +{
>> + if (!q)
>> + return;
>> + pr_debug("Printing queue\n"
>> + "Queue Type: %u\n"
>> + "Queue Size: %llu\n"
>> + "Queue percent: %u\n"
>> + "Queue Address: 0x%llX\n"
>> + "Queue Id: %u\n"
>> + "Queue Process Vmid: %u\n"
>> + "Queue Read Pointer: 0x%p\n"
>> + "Queue Write Pointer: 0x%p\n"
>> + "Queue Doorbell Pointer: 0x%p\n"
>> + "Queue Doorbell Offset: %u\n"
>> + "Queue MQD Address: 0x%p\n"
>> + "Queue MQD Gart: 0x%llX\n"
>> + "Queue Process Address: 0x%p\n"
>> + "Queue Device Address: 0x%p\n",
>> + q->properties.type,
>> + q->properties.queue_size,
>> + q->properties.queue_percent,
>> + q->properties.queue_address,
>> + q->properties.queue_id,
>> + q->properties.vmid,
>> + q->properties.read_ptr,
>> + q->properties.write_ptr,
>> + q->properties.doorbell_ptr,
>> + q->properties.doorbell_off,
>> + q->mqd,
>> + q->gart_mqd_addr,
>> + q->process,
>> + q->device);
>
> Ditto
Done in v3
>
>> +}
>> +
>> +int init_queue(struct queue **q, struct queue_properties properties)
>> +{
>> + struct queue *tmp;
>> +
>> + BUG_ON(!q);
>> +
>> + tmp = kzalloc(sizeof(struct queue), GFP_KERNEL);
>> + if (!tmp)
>> + return -ENOMEM;
>> +
>> + memset(&tmp->properties, 0, sizeof(struct queue_properties));
>
> memset is useless because of the memcpy below.
Removed in v3.
Oded
>
>> + memcpy(&tmp->properties, &properties, sizeof(struct queue_properties));
>> +
>> + *q = tmp;
>> + return 0;
>> +}
>> +
>> +void uninit_queue(struct queue *q)
>> +{
>> + kfree(q);
>> +}
>> --
>> 1.9.1
>>
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v2 14/25] amdkfd: Add mqd_manager module
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
` (9 preceding siblings ...)
2014-07-17 13:29 ` [PATCH v2 13/25] amdkfd: Add queue module Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-21 2:33 ` Jerome Glisse
2014-07-17 13:29 ` [PATCH v2 15/25] amdkfd: Add kernel queue module Oded Gabbay
` (11 subsequent siblings)
22 siblings, 1 reply; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
From: Ben Goz <ben.goz@amd.com>
The mqd_manager module handles MQD data structures. MQD stands for Memory Queue Descriptor, which is used by the H/W to keep the usermode queue state in memory.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/amdkfd/Makefile | 2 +-
drivers/gpu/drm/radeon/amdkfd/cik_mqds.h | 185 +++++++++++++++
drivers/gpu/drm/radeon/amdkfd/cik_regs.h | 220 ++++++++++++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c | 291 ++++++++++++++++++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h | 54 +++++
drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 8 +
6 files changed, 759 insertions(+), 1 deletion(-)
create mode 100644 drivers/gpu/drm/radeon/amdkfd/cik_mqds.h
create mode 100644 drivers/gpu/drm/radeon/amdkfd/cik_regs.h
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h
diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
index dbff147..b5201f4 100644
--- a/drivers/gpu/drm/radeon/amdkfd/Makefile
+++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
@@ -6,6 +6,6 @@ ccflags-y := -Iinclude/drm
amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
kfd_pasid.o kfd_doorbell.o kfd_vidmem.o kfd_aperture.o \
- kfd_process.o kfd_queue.o
+ kfd_process.o kfd_queue.o kfd_mqd_manager.o
obj-$(CONFIG_HSA_RADEON) += amdkfd.o
diff --git a/drivers/gpu/drm/radeon/amdkfd/cik_mqds.h b/drivers/gpu/drm/radeon/amdkfd/cik_mqds.h
new file mode 100644
index 0000000..ce75604
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/cik_mqds.h
@@ -0,0 +1,185 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef CIK_MQDS_H_
+#define CIK_MQDS_H_
+
+#pragma pack(push, 4)
+
+struct cik_hpd_registers {
+ u32 cp_hpd_roq_offsets;
+ u32 cp_hpd_eop_base_addr;
+ u32 cp_hpd_eop_base_addr_hi;
+ u32 cp_hpd_eop_vmid;
+ u32 cp_hpd_eop_control;
+};
+
+/* This structure represents mqd used for cp scheduling queue
+ * taken from Gfx72_cp_program_spec.pdf
+ */
+struct cik_compute_mqd {
+ u32 header;
+ u32 compute_dispatch_initiator;
+ u32 compute_dim_x;
+ u32 compute_dim_y;
+ u32 compute_dim_z;
+ u32 compute_start_x;
+ u32 compute_start_y;
+ u32 compute_start_z;
+ u32 compute_num_thread_x;
+ u32 compute_num_thread_y;
+ u32 compute_num_thread_z;
+ u32 compute_pipelinestat_enable;
+ u32 compute_perfcount_enable;
+ u32 compute_pgm_lo;
+ u32 compute_pgm_hi;
+ u32 compute_tba_lo;
+ u32 compute_tba_hi;
+ u32 compute_tma_lo;
+ u32 compute_tma_hi;
+ u32 compute_pgm_rsrc1;
+ u32 compute_pgm_rsrc2;
+ u32 compute_vmid;
+ u32 compute_resource_limits;
+ u32 compute_static_thread_mgmt_se0;
+ u32 compute_static_thread_mgmt_se1;
+ u32 compute_tmpring_size;
+ u32 compute_static_thread_mgmt_se2;
+ u32 compute_static_thread_mgmt_se3;
+ u32 compute_restart_x;
+ u32 compute_restart_y;
+ u32 compute_restart_z;
+ u32 compute_thread_trace_enable;
+ u32 compute_misc_reserved;
+ u32 compute_user_data[16];
+ u32 vgt_csinvoc_count_lo;
+ u32 vgt_csinvoc_count_hi;
+ u32 cp_mqd_base_addr51;
+ u32 cp_mqd_base_addr_hi;
+ u32 cp_hqd_active;
+ u32 cp_hqd_vmid;
+ u32 cp_hqd_persistent_state;
+ u32 cp_hqd_pipe_priority;
+ u32 cp_hqd_queue_priority;
+ u32 cp_hqd_quantum;
+ u32 cp_hqd_pq_base;
+ u32 cp_hqd_pq_base_hi;
+ u32 cp_hqd_pq_rptr;
+ u32 cp_hqd_pq_rptr_report_addr;
+ u32 cp_hqd_pq_rptr_report_addr_hi;
+ u32 cp_hqd_pq_wptr_poll_addr;
+ u32 cp_hqd_pq_wptr_poll_addr_hi;
+ u32 cp_hqd_pq_doorbell_control;
+ u32 cp_hqd_pq_wptr;
+ u32 cp_hqd_pq_control;
+ u32 cp_hqd_ib_base_addr;
+ u32 cp_hqd_ib_base_addr_hi;
+ u32 cp_hqd_ib_rptr;
+ u32 cp_hqd_ib_control;
+ u32 cp_hqd_iq_timer;
+ u32 cp_hqd_iq_rptr;
+ u32 cp_hqd_dequeue_request;
+ u32 cp_hqd_dma_offload;
+ u32 cp_hqd_sema_cmd;
+ u32 cp_hqd_msg_type;
+ u32 cp_hqd_atomic0_preop_lo;
+ u32 cp_hqd_atomic0_preop_hi;
+ u32 cp_hqd_atomic1_preop_lo;
+ u32 cp_hqd_atomic1_preop_hi;
+ u32 cp_hqd_hq_scheduler0;
+ u32 cp_hqd_hq_scheduler1;
+ u32 cp_mqd_control;
+ u32 reserved1[10];
+ u32 cp_mqd_query_time_lo;
+ u32 cp_mqd_query_time_hi;
+ u32 reserved2[4];
+ u32 cp_mqd_connect_start_time_lo;
+ u32 cp_mqd_connect_start_time_hi;
+ u32 cp_mqd_connect_end_time_lo;
+ u32 cp_mqd_connect_end_time_hi;
+ u32 cp_mqd_connect_end_wf_count;
+ u32 cp_mqd_connect_end_pq_rptr;
+ u32 cp_mqd_connect_end_pq_wptr;
+ u32 cp_mqd_connect_end_ib_rptr;
+ u32 reserved3[18];
+};
+
+/* This structure represents all *IQs
+ * Taken from Gfx73_CPC_Eng_Init_Prog.pdf
+ */
+struct cik_interface_mqd {
+ u32 reserved1[128];
+ u32 cp_mqd_base_addr;
+ u32 cp_mqd_base_addr_hi;
+ u32 cp_hqd_active;
+ u32 cp_hqd_vmid;
+ u32 cp_hqd_persistent_state;
+ u32 cp_hqd_pipe_priority;
+ u32 cp_hqd_queue_priority;
+ u32 cp_hqd_quantum;
+ u32 cp_hqd_pq_base;
+ u32 cp_hqd_pq_base_hi;
+ u32 cp_hqd_pq_rptr;
+ u32 cp_hqd_pq_rptr_report_addr;
+ u32 cp_hqd_pq_rptr_report_addr_hi;
+ u32 cp_hqd_pq_wptr_poll_addr;
+ u32 cp_hqd_pq_wptr_poll_addr_hi;
+ u32 cp_hqd_pq_doorbell_control;
+ u32 cp_hqd_pq_wptr;
+ u32 cp_hqd_pq_control;
+ u32 cp_hqd_ib_base_addr;
+ u32 cp_hqd_ib_base_addr_hi;
+ u32 cp_hqd_ib_rptr;
+ u32 cp_hqd_ib_control;
+ u32 cp_hqd_iq_timer;
+ u32 cp_hqd_iq_rptr;
+ u32 cp_hqd_dequeue_request;
+ u32 cp_hqd_dma_offload;
+ u32 cp_hqd_sema_cmd;
+ u32 cp_hqd_msg_type;
+ u32 cp_hqd_atomic0_preop_lo;
+ u32 cp_hqd_atomic0_preop_hi;
+ u32 cp_hqd_atomic1_preop_lo;
+ u32 cp_hqd_atomic1_preop_hi;
+ u32 cp_hqd_hq_status0;
+ u32 cp_hqd_hq_control0;
+ u32 cp_mqd_control;
+ u32 reserved2[3];
+ u32 cp_hqd_hq_status1;
+ u32 cp_hqd_hq_control1;
+ u32 reserved3[16];
+ u32 cp_hqd_hq_status2;
+ u32 cp_hqd_hq_control2;
+ u32 cp_hqd_hq_status3;
+ u32 cp_hqd_hq_control3;
+ u32 reserved4[2];
+ u32 cp_mqd_query_time_lo;
+ u32 cp_mqd_query_time_hi;
+ u32 reserved5[48];
+ u32 cp_mqd_skip_process[16];
+};
+
+#pragma pack(pop)
+
+
+#endif /* CIK_MQDS_H_ */
diff --git a/drivers/gpu/drm/radeon/amdkfd/cik_regs.h b/drivers/gpu/drm/radeon/amdkfd/cik_regs.h
new file mode 100644
index 0000000..a6404e3
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/cik_regs.h
@@ -0,0 +1,220 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef CIK_REGS_H
+#define CIK_REGS_H
+
+#define IH_VMID_0_LUT 0x3D40u
+
+#define BIF_DOORBELL_CNTL 0x530Cu
+
+#define SRBM_GFX_CNTL 0xE44
+#define PIPEID(x) ((x) << 0)
+#define MEID(x) ((x) << 2)
+#define VMID(x) ((x) << 4)
+#define QUEUEID(x) ((x) << 8)
+
+#define SQ_CONFIG 0x8C00
+
+#define SH_MEM_BASES 0x8C28
+/* if PTR32, these are the bases for scratch and lds */
+#define PRIVATE_BASE(x) ((x) << 0) /* scratch */
+#define SHARED_BASE(x) ((x) << 16) /* LDS */
+#define SH_MEM_APE1_BASE 0x8C2C
+/* if PTR32, this is the base location of GPUVM */
+#define SH_MEM_APE1_LIMIT 0x8C30
+/* if PTR32, this is the upper limit of GPUVM */
+#define SH_MEM_CONFIG 0x8C34
+#define PTR32 (1 << 0)
+#define PRIVATE_ATC (1 << 1)
+#define ALIGNMENT_MODE(x) ((x) << 2)
+#define SH_MEM_ALIGNMENT_MODE_DWORD 0
+#define SH_MEM_ALIGNMENT_MODE_DWORD_STRICT 1
+#define SH_MEM_ALIGNMENT_MODE_STRICT 2
+#define SH_MEM_ALIGNMENT_MODE_UNALIGNED 3
+#define DEFAULT_MTYPE(x) ((x) << 4)
+#define APE1_MTYPE(x) ((x) << 7)
+
+/* valid for both DEFAULT_MTYPE and APE1_MTYPE */
+#define MTYPE_CACHED 0
+#define MTYPE_NONCACHED 3
+
+
+#define SH_STATIC_MEM_CONFIG 0x9604u
+
+#define TC_CFG_L1_LOAD_POLICY0 0xAC68
+#define TC_CFG_L1_LOAD_POLICY1 0xAC6C
+#define TC_CFG_L1_STORE_POLICY 0xAC70
+#define TC_CFG_L2_LOAD_POLICY0 0xAC74
+#define TC_CFG_L2_LOAD_POLICY1 0xAC78
+#define TC_CFG_L2_STORE_POLICY0 0xAC7C
+#define TC_CFG_L2_STORE_POLICY1 0xAC80
+#define TC_CFG_L2_ATOMIC_POLICY 0xAC84
+#define TC_CFG_L1_VOLATILE 0xAC88
+#define TC_CFG_L2_VOLATILE 0xAC8C
+
+#define CP_PQ_WPTR_POLL_CNTL 0xC20C
+#define WPTR_POLL_EN (1 << 31)
+
+#define CPC_INT_CNTL 0xC2D0
+#define CP_ME1_PIPE0_INT_CNTL 0xC214
+#define CP_ME1_PIPE1_INT_CNTL 0xC218
+#define CP_ME1_PIPE2_INT_CNTL 0xC21C
+#define CP_ME1_PIPE3_INT_CNTL 0xC220
+#define CP_ME2_PIPE0_INT_CNTL 0xC224
+#define CP_ME2_PIPE1_INT_CNTL 0xC228
+#define CP_ME2_PIPE2_INT_CNTL 0xC22C
+#define CP_ME2_PIPE3_INT_CNTL 0xC230
+#define DEQUEUE_REQUEST_INT_ENABLE (1 << 13)
+#define WRM_POLL_TIMEOUT_INT_ENABLE (1 << 17)
+#define PRIV_REG_INT_ENABLE (1 << 23)
+#define TIME_STAMP_INT_ENABLE (1 << 26)
+#define GENERIC2_INT_ENABLE (1 << 29)
+#define GENERIC1_INT_ENABLE (1 << 30)
+#define GENERIC0_INT_ENABLE (1 << 31)
+#define CP_ME1_PIPE0_INT_STATUS 0xC214
+#define CP_ME1_PIPE1_INT_STATUS 0xC218
+#define CP_ME1_PIPE2_INT_STATUS 0xC21C
+#define CP_ME1_PIPE3_INT_STATUS 0xC220
+#define CP_ME2_PIPE0_INT_STATUS 0xC224
+#define CP_ME2_PIPE1_INT_STATUS 0xC228
+#define CP_ME2_PIPE2_INT_STATUS 0xC22C
+#define CP_ME2_PIPE3_INT_STATUS 0xC230
+#define DEQUEUE_REQUEST_INT_STATUS (1 << 13)
+#define WRM_POLL_TIMEOUT_INT_STATUS (1 << 17)
+#define PRIV_REG_INT_STATUS (1 << 23)
+#define TIME_STAMP_INT_STATUS (1 << 26)
+#define GENERIC2_INT_STATUS (1 << 29)
+#define GENERIC1_INT_STATUS (1 << 30)
+#define GENERIC0_INT_STATUS (1 << 31)
+
+#define CP_HPD_EOP_BASE_ADDR 0xC904
+#define CP_HPD_EOP_BASE_ADDR_HI 0xC908
+#define CP_HPD_EOP_VMID 0xC90C
+#define CP_HPD_EOP_CONTROL 0xC910
+#define EOP_SIZE(x) ((x) << 0)
+#define EOP_SIZE_MASK (0x3f << 0)
+#define CP_MQD_BASE_ADDR 0xC914
+#define CP_MQD_BASE_ADDR_HI 0xC918
+#define CP_HQD_ACTIVE 0xC91C
+#define CP_HQD_VMID 0xC920
+
+#define CP_HQD_PERSISTENT_STATE 0xC924u
+#define DEFAULT_CP_HQD_PERSISTENT_STATE (0x33U << 8)
+
+#define CP_HQD_PIPE_PRIORITY 0xC928u
+#define CP_HQD_QUEUE_PRIORITY 0xC92Cu
+#define CP_HQD_QUANTUM 0xC930u
+#define QUANTUM_EN 1U
+#define QUANTUM_SCALE_1MS (1U << 4)
+#define QUANTUM_DURATION(x) ((x) << 8)
+
+#define CP_HQD_PQ_BASE 0xC934
+#define CP_HQD_PQ_BASE_HI 0xC938
+#define CP_HQD_PQ_RPTR 0xC93C
+#define CP_HQD_PQ_RPTR_REPORT_ADDR 0xC940
+#define CP_HQD_PQ_RPTR_REPORT_ADDR_HI 0xC944
+#define CP_HQD_PQ_WPTR_POLL_ADDR 0xC948
+#define CP_HQD_PQ_WPTR_POLL_ADDR_HI 0xC94C
+#define CP_HQD_PQ_DOORBELL_CONTROL 0xC950
+#define DOORBELL_OFFSET(x) ((x) << 2)
+#define DOORBELL_OFFSET_MASK (0x1fffff << 2)
+#define DOORBELL_SOURCE (1 << 28)
+#define DOORBELL_SCHD_HIT (1 << 29)
+#define DOORBELL_EN (1 << 30)
+#define DOORBELL_HIT (1 << 31)
+#define CP_HQD_PQ_WPTR 0xC954
+#define CP_HQD_PQ_CONTROL 0xC958
+#define QUEUE_SIZE(x) ((x) << 0)
+#define QUEUE_SIZE_MASK (0x3f << 0)
+#define RPTR_BLOCK_SIZE(x) ((x) << 8)
+#define RPTR_BLOCK_SIZE_MASK (0x3f << 8)
+#define MIN_AVAIL_SIZE(x) ((x) << 20)
+#define PQ_ATC_EN (1 << 23)
+#define PQ_VOLATILE (1 << 26)
+#define NO_UPDATE_RPTR (1 << 27)
+#define UNORD_DISPATCH (1 << 28)
+#define ROQ_PQ_IB_FLIP (1 << 29)
+#define PRIV_STATE (1 << 30)
+#define KMD_QUEUE (1 << 31)
+
+#define DEFAULT_RPTR_BLOCK_SIZE RPTR_BLOCK_SIZE(5)
+#define DEFAULT_MIN_AVAIL_SIZE MIN_AVAIL_SIZE(3)
+
+#define CP_HQD_IB_BASE_ADDR 0xC95Cu
+#define CP_HQD_IB_BASE_ADDR_HI 0xC960u
+#define CP_HQD_IB_RPTR 0xC964u
+#define CP_HQD_IB_CONTROL 0xC968u
+#define IB_ATC_EN (1U << 23)
+#define DEFAULT_MIN_IB_AVAIL_SIZE (3U << 20)
+
+#define CP_HQD_DEQUEUE_REQUEST 0xC974
+#define DEQUEUE_REQUEST_DRAIN 1
+#define DEQUEUE_REQUEST_RESET 2
+#define DEQUEUE_INT (1U << 8)
+
+#define CP_HQD_SEMA_CMD 0xC97Cu
+#define CP_HQD_MSG_TYPE 0xC980u
+#define CP_HQD_ATOMIC0_PREOP_LO 0xC984u
+#define CP_HQD_ATOMIC0_PREOP_HI 0xC988u
+#define CP_HQD_ATOMIC1_PREOP_LO 0xC98Cu
+#define CP_HQD_ATOMIC1_PREOP_HI 0xC990u
+#define CP_HQD_HQ_SCHEDULER0 0xC994u
+#define CP_HQD_HQ_SCHEDULER1 0xC998u
+
+
+#define CP_MQD_CONTROL 0xC99C
+#define MQD_VMID(x) ((x) << 0)
+#define MQD_VMID_MASK (0xf << 0)
+#define MQD_CONTROL_PRIV_STATE_EN (1U << 8)
+
+#define GRBM_GFX_INDEX 0x30800
+#define INSTANCE_INDEX(x) ((x) << 0)
+#define SH_INDEX(x) ((x) << 8)
+#define SE_INDEX(x) ((x) << 16)
+#define SH_BROADCAST_WRITES (1 << 29)
+#define INSTANCE_BROADCAST_WRITES (1 << 30)
+#define SE_BROADCAST_WRITES (1 << 31)
+
+#define SQC_CACHES 0x30d20
+#define SQC_POLICY 0x8C38u
+#define SQC_VOLATILE 0x8C3Cu
+
+#define CP_PERFMON_CNTL 0x36020
+
+#define ATC_VMID0_PASID_MAPPING 0x339Cu
+#define ATC_VMID_PASID_MAPPING_UPDATE_STATUS 0x3398u
+#define ATC_VMID_PASID_MAPPING_VALID (1U << 31)
+
+#define ATC_VM_APERTURE0_CNTL 0x3310u
+#define ATS_ACCESS_MODE_NEVER 0
+#define ATS_ACCESS_MODE_ALWAYS 1
+
+#define ATC_VM_APERTURE0_CNTL2 0x3318u
+#define ATC_VM_APERTURE0_HIGH_ADDR 0x3308u
+#define ATC_VM_APERTURE0_LOW_ADDR 0x3300u
+#define ATC_VM_APERTURE1_CNTL 0x3314u
+#define ATC_VM_APERTURE1_CNTL2 0x331Cu
+#define ATC_VM_APERTURE1_HIGH_ADDR 0x330Cu
+#define ATC_VM_APERTURE1_LOW_ADDR 0x3304u
+
+#endif
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c b/drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c
new file mode 100644
index 0000000..5f9f9b9
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c
@@ -0,0 +1,291 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include <linux/printk.h>
+#include <linux/slab.h>
+#include "kfd_priv.h"
+#include "kfd_mqd_manager.h"
+#include "cik_mqds.h"
+#include "cik_regs.h"
+#include "../cik_reg.h"
+
+inline uint32_t lower_32(uint64_t x)
+{
+ return (uint32_t)x;
+}
+
+inline uint32_t upper_32(uint64_t x)
+{
+ return (uint32_t)(x >> 32);
+}
+
+inline void busy_wait(unsigned long ms)
+{
+ while (time_before(jiffies, ms))
+ cpu_relax();
+}
+
+static inline struct cik_mqd *get_mqd(void *mqd)
+{
+ return (struct cik_mqd *)mqd;
+}
+
+static int init_mqd(struct mqd_manager *mm, void **mqd, kfd_mem_obj *mqd_mem_obj,
+ uint64_t *gart_addr, struct queue_properties *q)
+{
+ uint64_t addr;
+ struct cik_mqd *m;
+ int retval;
+
+ BUG_ON(!mm || !q || !mqd);
+
+ pr_debug("kfd: In func %s\n", __func__);
+
+ retval = kfd_vidmem_alloc_map(
+ mm->dev,
+ mqd_mem_obj,
+ (void **)&m,
+ &addr,
+ ALIGN(sizeof(struct cik_mqd), 256));
+
+ if (retval != 0)
+ return -ENOMEM;
+
+ memset(m, 0, sizeof(struct cik_mqd));
+
+ m->header = 0xC0310800;
+ m->pipeline_stat_enable = 1;
+ m->static_thread_mgmt01[0] = 0xFFFFFFFF;
+ m->static_thread_mgmt01[1] = 0xFFFFFFFF;
+ m->static_thread_mgmt23[0] = 0xFFFFFFFF;
+ m->static_thread_mgmt23[1] = 0xFFFFFFFF;
+
+ m->queue_state.cp_hqd_persistent_state = DEFAULT_CP_HQD_PERSISTENT_STATE;
+
+ m->queue_state.cp_mqd_control = MQD_CONTROL_PRIV_STATE_EN;
+ m->queue_state.cp_mqd_base_addr = lower_32(addr);
+ m->queue_state.cp_mqd_base_addr_hi = upper_32(addr);
+
+ m->queue_state.cp_hqd_ib_control = DEFAULT_MIN_IB_AVAIL_SIZE | IB_ATC_EN;
+ /* Although WinKFD writes this, I suspect it should not be necessary. */
+ m->queue_state.cp_hqd_ib_control = IB_ATC_EN | DEFAULT_MIN_IB_AVAIL_SIZE;
+
+ m->queue_state.cp_hqd_quantum = QUANTUM_EN | QUANTUM_SCALE_1MS | QUANTUM_DURATION(10);
+
+ m->queue_state.cp_hqd_pipe_priority = 1;
+ m->queue_state.cp_hqd_queue_priority = 15;
+
+ *mqd = m;
+ if (gart_addr != NULL)
+ *gart_addr = addr;
+ retval = mm->update_mqd(mm, m, q);
+
+ return retval;
+}
+
+static void uninit_mqd(struct mqd_manager *mm, void *mqd, kfd_mem_obj mqd_mem_obj)
+{
+ BUG_ON(!mm || !mqd);
+ kfd_vidmem_free_unmap(mm->dev, mqd_mem_obj);
+}
+
+static int load_mqd(struct mqd_manager *mm, void *mqd, uint32_t pipe_id, uint32_t queue_id, uint32_t __user *wptr)
+{
+ return kfd2kgd->hqd_load(mm->dev->kgd, mqd, pipe_id, queue_id, wptr);
+
+}
+
+static int update_mqd(struct mqd_manager *mm, void *mqd, struct queue_properties *q)
+{
+ struct cik_mqd *m;
+
+ BUG_ON(!mm || !q || !mqd);
+
+ pr_debug("kfd: In func %s\n", __func__);
+
+ m = get_mqd(mqd);
+ m->queue_state.cp_hqd_pq_control = DEFAULT_RPTR_BLOCK_SIZE | DEFAULT_MIN_AVAIL_SIZE | PQ_ATC_EN;
+ /* calculating queue size which is log base 2 of actual queue size -1 dwords and another -1 for ffs */
+ m->queue_state.cp_hqd_pq_control |= ffs(q->queue_size / sizeof(unsigned int)) - 1 - 1;
+ m->queue_state.cp_hqd_pq_base = lower_32((uint64_t)q->queue_address >> 8);
+ m->queue_state.cp_hqd_pq_base_hi = upper_32((uint64_t)q->queue_address >> 8);
+ m->queue_state.cp_hqd_pq_rptr_report_addr = lower_32((uint64_t)q->read_ptr);
+ m->queue_state.cp_hqd_pq_rptr_report_addr_hi = upper_32((uint64_t)q->read_ptr);
+ m->queue_state.cp_hqd_pq_doorbell_control = DOORBELL_EN | DOORBELL_OFFSET(q->doorbell_off);
+
+ m->queue_state.cp_hqd_vmid = q->vmid;
+
+ m->queue_state.cp_hqd_active = 0;
+ q->is_active = false;
+ if (q->queue_size > 0 &&
+ q->queue_address != 0 &&
+ q->queue_percent > 0) {
+ m->queue_state.cp_hqd_active = 1;
+ q->is_active = true;
+ }
+
+ return 0;
+}
+
+static int destroy_mqd(struct mqd_manager *mm, bool is_reset, unsigned int timeout, uint32_t pipe_id, uint32_t queue_id)
+{
+ return kfd2kgd->hqd_destroy(mm->dev->kgd, is_reset, timeout, pipe_id, queue_id);
+}
+
+bool is_occupied(struct mqd_manager *mm, uint64_t queue_address, uint32_t pipe_id, uint32_t queue_id)
+{
+
+ return kfd2kgd->hqd_is_occupies(mm->dev->kgd, queue_address, pipe_id, queue_id);
+
+}
+
+/*
+ * HIQ MQD Implementation
+ */
+
+static int init_mqd_hiq(struct mqd_manager *mm, void **mqd, kfd_mem_obj *mqd_mem_obj,
+ uint64_t *gart_addr, struct queue_properties *q)
+{
+ uint64_t addr;
+ struct cik_mqd *m;
+ int retval;
+
+ BUG_ON(!mm || !q || !mqd || !mqd_mem_obj);
+
+ pr_debug("kfd: In func %s\n", __func__);
+
+ retval = kfd_vidmem_alloc_map(
+ mm->dev,
+ mqd_mem_obj,
+ (void **)&m,
+ &addr,
+ ALIGN(sizeof(struct cik_mqd), PAGE_SIZE));
+
+ if (retval != 0)
+ return -ENOMEM;
+
+ memset(m, 0, sizeof(struct cik_mqd));
+
+ m->header = 0xC0310800;
+ m->pipeline_stat_enable = 1;
+ m->static_thread_mgmt01[0] = 0xFFFFFFFF;
+ m->static_thread_mgmt01[1] = 0xFFFFFFFF;
+ m->static_thread_mgmt23[0] = 0xFFFFFFFF;
+ m->static_thread_mgmt23[1] = 0xFFFFFFFF;
+
+ m->queue_state.cp_hqd_persistent_state = DEFAULT_CP_HQD_PERSISTENT_STATE;
+
+ m->queue_state.cp_mqd_control = MQD_CONTROL_PRIV_STATE_EN;
+ m->queue_state.cp_mqd_base_addr = lower_32(addr);
+ m->queue_state.cp_mqd_base_addr_hi = upper_32(addr);
+
+ m->queue_state.cp_hqd_ib_control = DEFAULT_MIN_IB_AVAIL_SIZE;
+
+ m->queue_state.cp_hqd_quantum = QUANTUM_EN | QUANTUM_SCALE_1MS | QUANTUM_DURATION(10);
+
+ m->queue_state.cp_hqd_pipe_priority = 1;
+ m->queue_state.cp_hqd_queue_priority = 15;
+
+ *mqd = m;
+ if (gart_addr)
+ *gart_addr = addr;
+ retval = mm->update_mqd(mm, m, q);
+
+ return retval;
+}
+
+static int update_mqd_hiq(struct mqd_manager *mm, void *mqd, struct queue_properties *q)
+{
+ struct cik_mqd *m;
+
+ BUG_ON(!mm || !q || !mqd);
+
+ pr_debug("kfd: In func %s\n", __func__);
+
+ m = get_mqd(mqd);
+ m->queue_state.cp_hqd_pq_control = DEFAULT_RPTR_BLOCK_SIZE | DEFAULT_MIN_AVAIL_SIZE | PRIV_STATE | KMD_QUEUE;
+ /* calculating queue size which is log base 2 of actual queue size -1 dwords */
+ m->queue_state.cp_hqd_pq_control |= ffs(q->queue_size / sizeof(unsigned int)) - 1 - 1;
+ m->queue_state.cp_hqd_pq_base = lower_32((uint64_t)q->queue_address >> 8);
+ m->queue_state.cp_hqd_pq_base_hi = upper_32((uint64_t)q->queue_address >> 8);
+ m->queue_state.cp_hqd_pq_rptr_report_addr = lower_32((uint64_t)q->read_ptr);
+ m->queue_state.cp_hqd_pq_rptr_report_addr_hi = upper_32((uint64_t)q->read_ptr);
+ m->queue_state.cp_hqd_pq_doorbell_control = DOORBELL_EN | DOORBELL_OFFSET(q->doorbell_off);
+
+ m->queue_state.cp_hqd_vmid = q->vmid;
+
+ m->queue_state.cp_hqd_active = 0;
+ q->is_active = false;
+ if (q->queue_size > 0 &&
+ q->queue_address != 0 &&
+ q->queue_percent > 0) {
+ m->queue_state.cp_hqd_active = 1;
+ q->is_active = true;
+ }
+
+ return 0;
+}
+
+struct mqd_manager *mqd_manager_init(enum KFD_MQD_TYPE type, struct kfd_dev *dev)
+{
+ struct mqd_manager *mqd;
+
+ BUG_ON(!dev);
+ BUG_ON(type >= KFD_MQD_TYPE_MAX);
+
+ pr_debug("kfd: In func %s\n", __func__);
+
+ mqd = kzalloc(sizeof(struct mqd_manager), GFP_KERNEL);
+ if (!mqd)
+ return NULL;
+
+ mqd->dev = dev;
+
+ switch (type) {
+ case KFD_MQD_TYPE_CIK_CP:
+ case KFD_MQD_TYPE_CIK_COMPUTE:
+ mqd->init_mqd = init_mqd;
+ mqd->uninit_mqd = uninit_mqd;
+ mqd->load_mqd = load_mqd;
+ mqd->update_mqd = update_mqd;
+ mqd->destroy_mqd = destroy_mqd;
+ mqd->is_occupied = is_occupied;
+ break;
+ case KFD_MQD_TYPE_CIK_HIQ:
+ mqd->init_mqd = init_mqd_hiq;
+ mqd->uninit_mqd = uninit_mqd;
+ mqd->load_mqd = load_mqd;
+ mqd->update_mqd = update_mqd_hiq;
+ mqd->destroy_mqd = destroy_mqd;
+ mqd->is_occupied = is_occupied;
+ break;
+ default:
+ kfree(mqd);
+ return NULL;
+ break;
+ }
+
+ return mqd;
+}
+
+/* SDMA queues should be implemented here when the cp will supports them */
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h b/drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h
new file mode 100644
index 0000000..a6b0007
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h
@@ -0,0 +1,54 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef KFD_MQD_MANAGER_H_
+#define KFD_MQD_MANAGER_H_
+
+#include "kfd_priv.h"
+
+struct mqd_manager {
+ int (*init_mqd)(struct mqd_manager *mm, void **mqd,
+ kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
+ struct queue_properties *q);
+
+ int (*load_mqd)(struct mqd_manager *mm, void *mqd,
+ uint32_t pipe_id, uint32_t queue_id,
+ uint32_t __user *wptr);
+
+ int (*update_mqd)(struct mqd_manager *mm, void *mqd,
+ struct queue_properties *q);
+
+ int (*destroy_mqd)(struct mqd_manager *mm, bool is_reset,
+ unsigned int timeout, uint32_t pipe_id,
+ uint32_t queue_id);
+
+ void (*uninit_mqd)(struct mqd_manager *mm, void *mqd,
+ kfd_mem_obj mqd_mem_obj);
+ bool (*is_occupied)(struct mqd_manager *mm, uint64_t queue_address,
+ uint32_t pipe_id, uint32_t queue_id);
+
+ struct mutex mqd_mutex;
+ struct kfd_dev *dev;
+};
+
+#endif /* KFD_MQD_MANAGER_H_ */
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
index 94ff1c3..76494757 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
@@ -179,6 +179,14 @@ struct queue {
struct kfd_dev *device;
};
+enum KFD_MQD_TYPE {
+ KFD_MQD_TYPE_CIK_COMPUTE = 0, /* for no cp scheduling */
+ KFD_MQD_TYPE_CIK_HIQ, /* for hiq */
+ KFD_MQD_TYPE_CIK_CP, /* for cp queues and diq */
+ KFD_MQD_TYPE_CIK_SDMA, /* for sdma queues */
+ KFD_MQD_TYPE_MAX
+};
+
/* Data that is per-process-per device. */
struct kfd_process_device {
/*
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* Re: [PATCH v2 14/25] amdkfd: Add mqd_manager module
2014-07-17 13:29 ` [PATCH v2 14/25] amdkfd: Add mqd_manager module Oded Gabbay
@ 2014-07-21 2:33 ` Jerome Glisse
2014-08-02 19:18 ` Oded Gabbay
0 siblings, 1 reply; 46+ messages in thread
From: Jerome Glisse @ 2014-07-21 2:33 UTC (permalink / raw)
To: Oded Gabbay
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, dri-devel,
Evgeny Pinchuk, Alexey Skidanov, Alex Deucher, Andrew Morton,
Christian König
On Thu, Jul 17, 2014 at 04:29:21PM +0300, Oded Gabbay wrote:
> From: Ben Goz <ben.goz@amd.com>
>
> The mqd_manager module handles MQD data structures. MQD stands for Memory Queue Descriptor, which is used by the H/W to keep the usermode queue state in memory.
>
> Signed-off-by: Ben Goz <ben.goz@amd.com>
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> ---
> drivers/gpu/drm/radeon/amdkfd/Makefile | 2 +-
> drivers/gpu/drm/radeon/amdkfd/cik_mqds.h | 185 +++++++++++++++
> drivers/gpu/drm/radeon/amdkfd/cik_regs.h | 220 ++++++++++++++++++
> drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c | 291 ++++++++++++++++++++++++
> drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h | 54 +++++
> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 8 +
> 6 files changed, 759 insertions(+), 1 deletion(-)
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/cik_mqds.h
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/cik_regs.h
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h
>
> diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
> index dbff147..b5201f4 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/Makefile
> +++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
> @@ -6,6 +6,6 @@ ccflags-y := -Iinclude/drm
>
> amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
> kfd_pasid.o kfd_doorbell.o kfd_vidmem.o kfd_aperture.o \
> - kfd_process.o kfd_queue.o
> + kfd_process.o kfd_queue.o kfd_mqd_manager.o
>
> obj-$(CONFIG_HSA_RADEON) += amdkfd.o
> diff --git a/drivers/gpu/drm/radeon/amdkfd/cik_mqds.h b/drivers/gpu/drm/radeon/amdkfd/cik_mqds.h
> new file mode 100644
> index 0000000..ce75604
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/cik_mqds.h
> @@ -0,0 +1,185 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef CIK_MQDS_H_
> +#define CIK_MQDS_H_
> +
> +#pragma pack(push, 4)
No pragma pack.
> +
> +struct cik_hpd_registers {
> + u32 cp_hpd_roq_offsets;
> + u32 cp_hpd_eop_base_addr;
> + u32 cp_hpd_eop_base_addr_hi;
> + u32 cp_hpd_eop_vmid;
> + u32 cp_hpd_eop_control;
> +};
> +
> +/* This structure represents mqd used for cp scheduling queue
> + * taken from Gfx72_cp_program_spec.pdf
> + */
> +struct cik_compute_mqd {
> + u32 header;
> + u32 compute_dispatch_initiator;
> + u32 compute_dim_x;
> + u32 compute_dim_y;
> + u32 compute_dim_z;
> + u32 compute_start_x;
> + u32 compute_start_y;
> + u32 compute_start_z;
> + u32 compute_num_thread_x;
> + u32 compute_num_thread_y;
> + u32 compute_num_thread_z;
> + u32 compute_pipelinestat_enable;
> + u32 compute_perfcount_enable;
> + u32 compute_pgm_lo;
> + u32 compute_pgm_hi;
> + u32 compute_tba_lo;
> + u32 compute_tba_hi;
> + u32 compute_tma_lo;
> + u32 compute_tma_hi;
> + u32 compute_pgm_rsrc1;
> + u32 compute_pgm_rsrc2;
> + u32 compute_vmid;
> + u32 compute_resource_limits;
> + u32 compute_static_thread_mgmt_se0;
> + u32 compute_static_thread_mgmt_se1;
> + u32 compute_tmpring_size;
> + u32 compute_static_thread_mgmt_se2;
> + u32 compute_static_thread_mgmt_se3;
> + u32 compute_restart_x;
> + u32 compute_restart_y;
> + u32 compute_restart_z;
> + u32 compute_thread_trace_enable;
> + u32 compute_misc_reserved;
> + u32 compute_user_data[16];
> + u32 vgt_csinvoc_count_lo;
> + u32 vgt_csinvoc_count_hi;
> + u32 cp_mqd_base_addr51;
> + u32 cp_mqd_base_addr_hi;
> + u32 cp_hqd_active;
> + u32 cp_hqd_vmid;
> + u32 cp_hqd_persistent_state;
> + u32 cp_hqd_pipe_priority;
> + u32 cp_hqd_queue_priority;
> + u32 cp_hqd_quantum;
> + u32 cp_hqd_pq_base;
> + u32 cp_hqd_pq_base_hi;
> + u32 cp_hqd_pq_rptr;
> + u32 cp_hqd_pq_rptr_report_addr;
> + u32 cp_hqd_pq_rptr_report_addr_hi;
> + u32 cp_hqd_pq_wptr_poll_addr;
> + u32 cp_hqd_pq_wptr_poll_addr_hi;
> + u32 cp_hqd_pq_doorbell_control;
> + u32 cp_hqd_pq_wptr;
> + u32 cp_hqd_pq_control;
> + u32 cp_hqd_ib_base_addr;
> + u32 cp_hqd_ib_base_addr_hi;
> + u32 cp_hqd_ib_rptr;
> + u32 cp_hqd_ib_control;
> + u32 cp_hqd_iq_timer;
> + u32 cp_hqd_iq_rptr;
> + u32 cp_hqd_dequeue_request;
> + u32 cp_hqd_dma_offload;
> + u32 cp_hqd_sema_cmd;
> + u32 cp_hqd_msg_type;
> + u32 cp_hqd_atomic0_preop_lo;
> + u32 cp_hqd_atomic0_preop_hi;
> + u32 cp_hqd_atomic1_preop_lo;
> + u32 cp_hqd_atomic1_preop_hi;
> + u32 cp_hqd_hq_scheduler0;
> + u32 cp_hqd_hq_scheduler1;
> + u32 cp_mqd_control;
> + u32 reserved1[10];
> + u32 cp_mqd_query_time_lo;
> + u32 cp_mqd_query_time_hi;
> + u32 reserved2[4];
> + u32 cp_mqd_connect_start_time_lo;
> + u32 cp_mqd_connect_start_time_hi;
> + u32 cp_mqd_connect_end_time_lo;
> + u32 cp_mqd_connect_end_time_hi;
> + u32 cp_mqd_connect_end_wf_count;
> + u32 cp_mqd_connect_end_pq_rptr;
> + u32 cp_mqd_connect_end_pq_wptr;
> + u32 cp_mqd_connect_end_ib_rptr;
> + u32 reserved3[18];
> +};
> +
> +/* This structure represents all *IQs
> + * Taken from Gfx73_CPC_Eng_Init_Prog.pdf
> + */
> +struct cik_interface_mqd {
> + u32 reserved1[128];
> + u32 cp_mqd_base_addr;
> + u32 cp_mqd_base_addr_hi;
> + u32 cp_hqd_active;
> + u32 cp_hqd_vmid;
> + u32 cp_hqd_persistent_state;
> + u32 cp_hqd_pipe_priority;
> + u32 cp_hqd_queue_priority;
> + u32 cp_hqd_quantum;
> + u32 cp_hqd_pq_base;
> + u32 cp_hqd_pq_base_hi;
> + u32 cp_hqd_pq_rptr;
> + u32 cp_hqd_pq_rptr_report_addr;
> + u32 cp_hqd_pq_rptr_report_addr_hi;
> + u32 cp_hqd_pq_wptr_poll_addr;
> + u32 cp_hqd_pq_wptr_poll_addr_hi;
> + u32 cp_hqd_pq_doorbell_control;
> + u32 cp_hqd_pq_wptr;
> + u32 cp_hqd_pq_control;
> + u32 cp_hqd_ib_base_addr;
> + u32 cp_hqd_ib_base_addr_hi;
> + u32 cp_hqd_ib_rptr;
> + u32 cp_hqd_ib_control;
> + u32 cp_hqd_iq_timer;
> + u32 cp_hqd_iq_rptr;
> + u32 cp_hqd_dequeue_request;
> + u32 cp_hqd_dma_offload;
> + u32 cp_hqd_sema_cmd;
> + u32 cp_hqd_msg_type;
> + u32 cp_hqd_atomic0_preop_lo;
> + u32 cp_hqd_atomic0_preop_hi;
> + u32 cp_hqd_atomic1_preop_lo;
> + u32 cp_hqd_atomic1_preop_hi;
> + u32 cp_hqd_hq_status0;
> + u32 cp_hqd_hq_control0;
> + u32 cp_mqd_control;
> + u32 reserved2[3];
> + u32 cp_hqd_hq_status1;
> + u32 cp_hqd_hq_control1;
> + u32 reserved3[16];
> + u32 cp_hqd_hq_status2;
> + u32 cp_hqd_hq_control2;
> + u32 cp_hqd_hq_status3;
> + u32 cp_hqd_hq_control3;
> + u32 reserved4[2];
> + u32 cp_mqd_query_time_lo;
> + u32 cp_mqd_query_time_hi;
> + u32 reserved5[48];
> + u32 cp_mqd_skip_process[16];
> +};
I have not fully checked, but very few of the above fields are used. So please
strip this structure down to only the fields that are used, to keep stack use as
low as possible. Moreover, the whole reserved* business suggests that this
structure is laid out to match the register layout, which I would rather avoid
being used as a struct.
> +
> +#pragma pack(pop)
> +
> +
> +#endif /* CIK_MQDS_H_ */
> diff --git a/drivers/gpu/drm/radeon/amdkfd/cik_regs.h b/drivers/gpu/drm/radeon/amdkfd/cik_regs.h
> new file mode 100644
> index 0000000..a6404e3
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/cik_regs.h
> @@ -0,0 +1,220 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef CIK_REGS_H
> +#define CIK_REGS_H
> +
> +#define IH_VMID_0_LUT 0x3D40u
> +
> +#define BIF_DOORBELL_CNTL 0x530Cu
> +
> +#define SRBM_GFX_CNTL 0xE44
> +#define PIPEID(x) ((x) << 0)
> +#define MEID(x) ((x) << 2)
> +#define VMID(x) ((x) << 4)
> +#define QUEUEID(x) ((x) << 8)
> +
> +#define SQ_CONFIG 0x8C00
> +
> +#define SH_MEM_BASES 0x8C28
> +/* if PTR32, these are the bases for scratch and lds */
> +#define PRIVATE_BASE(x) ((x) << 0) /* scratch */
> +#define SHARED_BASE(x) ((x) << 16) /* LDS */
> +#define SH_MEM_APE1_BASE 0x8C2C
> +/* if PTR32, this is the base location of GPUVM */
> +#define SH_MEM_APE1_LIMIT 0x8C30
> +/* if PTR32, this is the upper limit of GPUVM */
> +#define SH_MEM_CONFIG 0x8C34
> +#define PTR32 (1 << 0)
> +#define PRIVATE_ATC (1 << 1)
> +#define ALIGNMENT_MODE(x) ((x) << 2)
> +#define SH_MEM_ALIGNMENT_MODE_DWORD 0
> +#define SH_MEM_ALIGNMENT_MODE_DWORD_STRICT 1
> +#define SH_MEM_ALIGNMENT_MODE_STRICT 2
> +#define SH_MEM_ALIGNMENT_MODE_UNALIGNED 3
> +#define DEFAULT_MTYPE(x) ((x) << 4)
> +#define APE1_MTYPE(x) ((x) << 7)
> +
> +/* valid for both DEFAULT_MTYPE and APE1_MTYPE */
> +#define MTYPE_CACHED 0
> +#define MTYPE_NONCACHED 3
> +
> +
> +#define SH_STATIC_MEM_CONFIG 0x9604u
> +
> +#define TC_CFG_L1_LOAD_POLICY0 0xAC68
> +#define TC_CFG_L1_LOAD_POLICY1 0xAC6C
> +#define TC_CFG_L1_STORE_POLICY 0xAC70
> +#define TC_CFG_L2_LOAD_POLICY0 0xAC74
> +#define TC_CFG_L2_LOAD_POLICY1 0xAC78
> +#define TC_CFG_L2_STORE_POLICY0 0xAC7C
> +#define TC_CFG_L2_STORE_POLICY1 0xAC80
> +#define TC_CFG_L2_ATOMIC_POLICY 0xAC84
> +#define TC_CFG_L1_VOLATILE 0xAC88
> +#define TC_CFG_L2_VOLATILE 0xAC8C
> +
> +#define CP_PQ_WPTR_POLL_CNTL 0xC20C
> +#define WPTR_POLL_EN (1 << 31)
> +
> +#define CPC_INT_CNTL 0xC2D0
> +#define CP_ME1_PIPE0_INT_CNTL 0xC214
> +#define CP_ME1_PIPE1_INT_CNTL 0xC218
> +#define CP_ME1_PIPE2_INT_CNTL 0xC21C
> +#define CP_ME1_PIPE3_INT_CNTL 0xC220
> +#define CP_ME2_PIPE0_INT_CNTL 0xC224
> +#define CP_ME2_PIPE1_INT_CNTL 0xC228
> +#define CP_ME2_PIPE2_INT_CNTL 0xC22C
> +#define CP_ME2_PIPE3_INT_CNTL 0xC230
> +#define DEQUEUE_REQUEST_INT_ENABLE (1 << 13)
> +#define WRM_POLL_TIMEOUT_INT_ENABLE (1 << 17)
> +#define PRIV_REG_INT_ENABLE (1 << 23)
> +#define TIME_STAMP_INT_ENABLE (1 << 26)
> +#define GENERIC2_INT_ENABLE (1 << 29)
> +#define GENERIC1_INT_ENABLE (1 << 30)
> +#define GENERIC0_INT_ENABLE (1 << 31)
> +#define CP_ME1_PIPE0_INT_STATUS 0xC214
> +#define CP_ME1_PIPE1_INT_STATUS 0xC218
> +#define CP_ME1_PIPE2_INT_STATUS 0xC21C
> +#define CP_ME1_PIPE3_INT_STATUS 0xC220
> +#define CP_ME2_PIPE0_INT_STATUS 0xC224
> +#define CP_ME2_PIPE1_INT_STATUS 0xC228
> +#define CP_ME2_PIPE2_INT_STATUS 0xC22C
> +#define CP_ME2_PIPE3_INT_STATUS 0xC230
> +#define DEQUEUE_REQUEST_INT_STATUS (1 << 13)
> +#define WRM_POLL_TIMEOUT_INT_STATUS (1 << 17)
> +#define PRIV_REG_INT_STATUS (1 << 23)
> +#define TIME_STAMP_INT_STATUS (1 << 26)
> +#define GENERIC2_INT_STATUS (1 << 29)
> +#define GENERIC1_INT_STATUS (1 << 30)
> +#define GENERIC0_INT_STATUS (1 << 31)
> +
> +#define CP_HPD_EOP_BASE_ADDR 0xC904
> +#define CP_HPD_EOP_BASE_ADDR_HI 0xC908
> +#define CP_HPD_EOP_VMID 0xC90C
> +#define CP_HPD_EOP_CONTROL 0xC910
> +#define EOP_SIZE(x) ((x) << 0)
> +#define EOP_SIZE_MASK (0x3f << 0)
> +#define CP_MQD_BASE_ADDR 0xC914
> +#define CP_MQD_BASE_ADDR_HI 0xC918
> +#define CP_HQD_ACTIVE 0xC91C
> +#define CP_HQD_VMID 0xC920
> +
> +#define CP_HQD_PERSISTENT_STATE 0xC924u
> +#define DEFAULT_CP_HQD_PERSISTENT_STATE (0x33U << 8)
> +
> +#define CP_HQD_PIPE_PRIORITY 0xC928u
> +#define CP_HQD_QUEUE_PRIORITY 0xC92Cu
> +#define CP_HQD_QUANTUM 0xC930u
> +#define QUANTUM_EN 1U
> +#define QUANTUM_SCALE_1MS (1U << 4)
> +#define QUANTUM_DURATION(x) ((x) << 8)
> +
> +#define CP_HQD_PQ_BASE 0xC934
> +#define CP_HQD_PQ_BASE_HI 0xC938
> +#define CP_HQD_PQ_RPTR 0xC93C
> +#define CP_HQD_PQ_RPTR_REPORT_ADDR 0xC940
> +#define CP_HQD_PQ_RPTR_REPORT_ADDR_HI 0xC944
> +#define CP_HQD_PQ_WPTR_POLL_ADDR 0xC948
> +#define CP_HQD_PQ_WPTR_POLL_ADDR_HI 0xC94C
> +#define CP_HQD_PQ_DOORBELL_CONTROL 0xC950
> +#define DOORBELL_OFFSET(x) ((x) << 2)
> +#define DOORBELL_OFFSET_MASK (0x1fffff << 2)
> +#define DOORBELL_SOURCE (1 << 28)
> +#define DOORBELL_SCHD_HIT (1 << 29)
> +#define DOORBELL_EN (1 << 30)
> +#define DOORBELL_HIT (1 << 31)
> +#define CP_HQD_PQ_WPTR 0xC954
> +#define CP_HQD_PQ_CONTROL 0xC958
> +#define QUEUE_SIZE(x) ((x) << 0)
> +#define QUEUE_SIZE_MASK (0x3f << 0)
> +#define RPTR_BLOCK_SIZE(x) ((x) << 8)
> +#define RPTR_BLOCK_SIZE_MASK (0x3f << 8)
> +#define MIN_AVAIL_SIZE(x) ((x) << 20)
> +#define PQ_ATC_EN (1 << 23)
> +#define PQ_VOLATILE (1 << 26)
> +#define NO_UPDATE_RPTR (1 << 27)
> +#define UNORD_DISPATCH (1 << 28)
> +#define ROQ_PQ_IB_FLIP (1 << 29)
> +#define PRIV_STATE (1 << 30)
> +#define KMD_QUEUE (1 << 31)
> +
> +#define DEFAULT_RPTR_BLOCK_SIZE RPTR_BLOCK_SIZE(5)
> +#define DEFAULT_MIN_AVAIL_SIZE MIN_AVAIL_SIZE(3)
> +
> +#define CP_HQD_IB_BASE_ADDR 0xC95Cu
> +#define CP_HQD_IB_BASE_ADDR_HI 0xC960u
> +#define CP_HQD_IB_RPTR 0xC964u
> +#define CP_HQD_IB_CONTROL 0xC968u
> +#define IB_ATC_EN (1U << 23)
> +#define DEFAULT_MIN_IB_AVAIL_SIZE (3U << 20)
> +
> +#define CP_HQD_DEQUEUE_REQUEST 0xC974
> +#define DEQUEUE_REQUEST_DRAIN 1
> +#define DEQUEUE_REQUEST_RESET 2
> +#define DEQUEUE_INT (1U << 8)
> +
> +#define CP_HQD_SEMA_CMD 0xC97Cu
> +#define CP_HQD_MSG_TYPE 0xC980u
> +#define CP_HQD_ATOMIC0_PREOP_LO 0xC984u
> +#define CP_HQD_ATOMIC0_PREOP_HI 0xC988u
> +#define CP_HQD_ATOMIC1_PREOP_LO 0xC98Cu
> +#define CP_HQD_ATOMIC1_PREOP_HI 0xC990u
> +#define CP_HQD_HQ_SCHEDULER0 0xC994u
> +#define CP_HQD_HQ_SCHEDULER1 0xC998u
> +
> +
> +#define CP_MQD_CONTROL 0xC99C
> +#define MQD_VMID(x) ((x) << 0)
> +#define MQD_VMID_MASK (0xf << 0)
> +#define MQD_CONTROL_PRIV_STATE_EN (1U << 8)
> +
> +#define GRBM_GFX_INDEX 0x30800
> +#define INSTANCE_INDEX(x) ((x) << 0)
> +#define SH_INDEX(x) ((x) << 8)
> +#define SE_INDEX(x) ((x) << 16)
> +#define SH_BROADCAST_WRITES (1 << 29)
> +#define INSTANCE_BROADCAST_WRITES (1 << 30)
> +#define SE_BROADCAST_WRITES (1 << 31)
> +
> +#define SQC_CACHES 0x30d20
> +#define SQC_POLICY 0x8C38u
> +#define SQC_VOLATILE 0x8C3Cu
> +
> +#define CP_PERFMON_CNTL 0x36020
> +
> +#define ATC_VMID0_PASID_MAPPING 0x339Cu
> +#define ATC_VMID_PASID_MAPPING_UPDATE_STATUS 0x3398u
> +#define ATC_VMID_PASID_MAPPING_VALID (1U << 31)
> +
> +#define ATC_VM_APERTURE0_CNTL 0x3310u
> +#define ATS_ACCESS_MODE_NEVER 0
> +#define ATS_ACCESS_MODE_ALWAYS 1
> +
> +#define ATC_VM_APERTURE0_CNTL2 0x3318u
> +#define ATC_VM_APERTURE0_HIGH_ADDR 0x3308u
> +#define ATC_VM_APERTURE0_LOW_ADDR 0x3300u
> +#define ATC_VM_APERTURE1_CNTL 0x3314u
> +#define ATC_VM_APERTURE1_CNTL2 0x331Cu
> +#define ATC_VM_APERTURE1_HIGH_ADDR 0x330Cu
> +#define ATC_VM_APERTURE1_LOW_ADDR 0x3304u
> +
> +#endif
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c b/drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c
> new file mode 100644
> index 0000000..5f9f9b9
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c
> @@ -0,0 +1,291 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#include <linux/printk.h>
> +#include <linux/slab.h>
> +#include "kfd_priv.h"
> +#include "kfd_mqd_manager.h"
> +#include "cik_mqds.h"
> +#include "cik_regs.h"
> +#include "../cik_reg.h"
> +
> +inline uint32_t lower_32(uint64_t x)
> +{
> + return (uint32_t)x;
> +}
> +
> +inline uint32_t upper_32(uint64_t x)
> +{
> + return (uint32_t)(x >> 32);
> +}
Do use the kernel macros upper_32_bits or lower_32_bits. Each time you write something
like that, check first for an existing macro.
> +
> +inline void busy_wait(unsigned long ms)
> +{
> + while (time_before(jiffies, ms))
> + cpu_relax();
> +}
> +
> +static inline struct cik_mqd *get_mqd(void *mqd)
> +{
> + return (struct cik_mqd *)mqd;
> +}
> +
> +static int init_mqd(struct mqd_manager *mm, void **mqd, kfd_mem_obj *mqd_mem_obj,
> + uint64_t *gart_addr, struct queue_properties *q)
> +{
> + uint64_t addr;
> + struct cik_mqd *m;
> + int retval;
> +
> + BUG_ON(!mm || !q || !mqd);
> +
> + pr_debug("kfd: In func %s\n", __func__);
> +
> + retval = kfd_vidmem_alloc_map(
> + mm->dev,
> + mqd_mem_obj,
> + (void **)&m,
> + &addr,
> + ALIGN(sizeof(struct cik_mqd), 256));
> +
> + if (retval != 0)
> + return -ENOMEM;
> +
> + memset(m, 0, sizeof(struct cik_mqd));
> +
> + m->header = 0xC0310800;
> + m->pipeline_stat_enable = 1;
> + m->static_thread_mgmt01[0] = 0xFFFFFFFF;
> + m->static_thread_mgmt01[1] = 0xFFFFFFFF;
> + m->static_thread_mgmt23[0] = 0xFFFFFFFF;
> + m->static_thread_mgmt23[1] = 0xFFFFFFFF;
> +
> + m->queue_state.cp_hqd_persistent_state = DEFAULT_CP_HQD_PERSISTENT_STATE;
> +
> + m->queue_state.cp_mqd_control = MQD_CONTROL_PRIV_STATE_EN;
> + m->queue_state.cp_mqd_base_addr = lower_32(addr);
> + m->queue_state.cp_mqd_base_addr_hi = upper_32(addr);
> +
> + m->queue_state.cp_hqd_ib_control = DEFAULT_MIN_IB_AVAIL_SIZE | IB_ATC_EN;
> + /* Although WinKFD writes this, I suspect it should not be necessary. */
> + m->queue_state.cp_hqd_ib_control = IB_ATC_EN | DEFAULT_MIN_IB_AVAIL_SIZE;
> +
> + m->queue_state.cp_hqd_quantum = QUANTUM_EN | QUANTUM_SCALE_1MS | QUANTUM_DURATION(10);
> +
> + m->queue_state.cp_hqd_pipe_priority = 1;
> + m->queue_state.cp_hqd_queue_priority = 15;
> +
> + *mqd = m;
> + if (gart_addr != NULL)
> + *gart_addr = addr;
> + retval = mm->update_mqd(mm, m, q);
> +
> + return retval;
> +}
> +
> +static void uninit_mqd(struct mqd_manager *mm, void *mqd, kfd_mem_obj mqd_mem_obj)
> +{
> + BUG_ON(!mm || !mqd);
> + kfd_vidmem_free_unmap(mm->dev, mqd_mem_obj);
> +}
> +
> +static int load_mqd(struct mqd_manager *mm, void *mqd, uint32_t pipe_id, uint32_t queue_id, uint32_t __user *wptr)
> +{
> + return kfd2kgd->hqd_load(mm->dev->kgd, mqd, pipe_id, queue_id, wptr);
> +
> +}
> +
> +static int update_mqd(struct mqd_manager *mm, void *mqd, struct queue_properties *q)
> +{
> + struct cik_mqd *m;
> +
> + BUG_ON(!mm || !q || !mqd);
> +
> + pr_debug("kfd: In func %s\n", __func__);
> +
> + m = get_mqd(mqd);
> + m->queue_state.cp_hqd_pq_control = DEFAULT_RPTR_BLOCK_SIZE | DEFAULT_MIN_AVAIL_SIZE | PQ_ATC_EN;
> + /* calculating queue size which is log base 2 of actual queue size -1 dwords and another -1 for ffs */
> + m->queue_state.cp_hqd_pq_control |= ffs(q->queue_size / sizeof(unsigned int)) - 1 - 1;
> + m->queue_state.cp_hqd_pq_base = lower_32((uint64_t)q->queue_address >> 8);
> + m->queue_state.cp_hqd_pq_base_hi = upper_32((uint64_t)q->queue_address >> 8);
> + m->queue_state.cp_hqd_pq_rptr_report_addr = lower_32((uint64_t)q->read_ptr);
> + m->queue_state.cp_hqd_pq_rptr_report_addr_hi = upper_32((uint64_t)q->read_ptr);
> + m->queue_state.cp_hqd_pq_doorbell_control = DOORBELL_EN | DOORBELL_OFFSET(q->doorbell_off);
> +
> + m->queue_state.cp_hqd_vmid = q->vmid;
> +
> + m->queue_state.cp_hqd_active = 0;
> + q->is_active = false;
> + if (q->queue_size > 0 &&
> + q->queue_address != 0 &&
> + q->queue_percent > 0) {
> + m->queue_state.cp_hqd_active = 1;
> + q->is_active = true;
> + }
> +
> + return 0;
> +}
> +
> +static int destroy_mqd(struct mqd_manager *mm, bool is_reset, unsigned int timeout, uint32_t pipe_id, uint32_t queue_id)
> +{
> + return kfd2kgd->hqd_destroy(mm->dev->kgd, is_reset, timeout, pipe_id, queue_id);
> +}
> +
> +bool is_occupied(struct mqd_manager *mm, uint64_t queue_address, uint32_t pipe_id, uint32_t queue_id)
> +{
> +
> + return kfd2kgd->hqd_is_occupies(mm->dev->kgd, queue_address, pipe_id, queue_id);
> +
> +}
> +
> +/*
> + * HIQ MQD Implementation
> + */
A more useful comment than that.
> +
> +static int init_mqd_hiq(struct mqd_manager *mm, void **mqd, kfd_mem_obj *mqd_mem_obj,
> + uint64_t *gart_addr, struct queue_properties *q)
> +{
> + uint64_t addr;
> + struct cik_mqd *m;
> + int retval;
> +
> + BUG_ON(!mm || !q || !mqd || !mqd_mem_obj);
> +
> + pr_debug("kfd: In func %s\n", __func__);
> +
> + retval = kfd_vidmem_alloc_map(
> + mm->dev,
> + mqd_mem_obj,
> + (void **)&m,
> + &addr,
> + ALIGN(sizeof(struct cik_mqd), PAGE_SIZE));
> +
> + if (retval != 0)
> + return -ENOMEM;
> +
> + memset(m, 0, sizeof(struct cik_mqd));
> +
> + m->header = 0xC0310800;
> + m->pipeline_stat_enable = 1;
> + m->static_thread_mgmt01[0] = 0xFFFFFFFF;
> + m->static_thread_mgmt01[1] = 0xFFFFFFFF;
> + m->static_thread_mgmt23[0] = 0xFFFFFFFF;
> + m->static_thread_mgmt23[1] = 0xFFFFFFFF;
> +
> + m->queue_state.cp_hqd_persistent_state = DEFAULT_CP_HQD_PERSISTENT_STATE;
> +
> + m->queue_state.cp_mqd_control = MQD_CONTROL_PRIV_STATE_EN;
> + m->queue_state.cp_mqd_base_addr = lower_32(addr);
> + m->queue_state.cp_mqd_base_addr_hi = upper_32(addr);
> +
> + m->queue_state.cp_hqd_ib_control = DEFAULT_MIN_IB_AVAIL_SIZE;
> +
> + m->queue_state.cp_hqd_quantum = QUANTUM_EN | QUANTUM_SCALE_1MS | QUANTUM_DURATION(10);
> +
> + m->queue_state.cp_hqd_pipe_priority = 1;
> + m->queue_state.cp_hqd_queue_priority = 15;
> +
> + *mqd = m;
> + if (gart_addr)
> + *gart_addr = addr;
> + retval = mm->update_mqd(mm, m, q);
> +
> + return retval;
> +}
> +
> +static int update_mqd_hiq(struct mqd_manager *mm, void *mqd, struct queue_properties *q)
> +{
> + struct cik_mqd *m;
> +
> + BUG_ON(!mm || !q || !mqd);
> +
> + pr_debug("kfd: In func %s\n", __func__);
> +
> + m = get_mqd(mqd);
> + m->queue_state.cp_hqd_pq_control = DEFAULT_RPTR_BLOCK_SIZE | DEFAULT_MIN_AVAIL_SIZE | PRIV_STATE | KMD_QUEUE;
> + /* calculating queue size which is log base 2 of actual queue size -1 dwords */
> + m->queue_state.cp_hqd_pq_control |= ffs(q->queue_size / sizeof(unsigned int)) - 1 - 1;
> + m->queue_state.cp_hqd_pq_base = lower_32((uint64_t)q->queue_address >> 8);
> + m->queue_state.cp_hqd_pq_base_hi = upper_32((uint64_t)q->queue_address >> 8);
> + m->queue_state.cp_hqd_pq_rptr_report_addr = lower_32((uint64_t)q->read_ptr);
> + m->queue_state.cp_hqd_pq_rptr_report_addr_hi = upper_32((uint64_t)q->read_ptr);
> + m->queue_state.cp_hqd_pq_doorbell_control = DOORBELL_EN | DOORBELL_OFFSET(q->doorbell_off);
> +
> + m->queue_state.cp_hqd_vmid = q->vmid;
> +
> + m->queue_state.cp_hqd_active = 0;
> + q->is_active = false;
> + if (q->queue_size > 0 &&
> + q->queue_address != 0 &&
> + q->queue_percent > 0) {
> + m->queue_state.cp_hqd_active = 1;
> + q->is_active = true;
> + }
> +
> + return 0;
> +}
> +
> +struct mqd_manager *mqd_manager_init(enum KFD_MQD_TYPE type, struct kfd_dev *dev)
> +{
> + struct mqd_manager *mqd;
> +
> + BUG_ON(!dev);
> + BUG_ON(type >= KFD_MQD_TYPE_MAX);
> +
> + pr_debug("kfd: In func %s\n", __func__);
> +
> + mqd = kzalloc(sizeof(struct mqd_manager), GFP_KERNEL);
> + if (!mqd)
> + return NULL;
> +
> + mqd->dev = dev;
> +
> + switch (type) {
> + case KFD_MQD_TYPE_CIK_CP:
> + case KFD_MQD_TYPE_CIK_COMPUTE:
> + mqd->init_mqd = init_mqd;
> + mqd->uninit_mqd = uninit_mqd;
> + mqd->load_mqd = load_mqd;
> + mqd->update_mqd = update_mqd;
> + mqd->destroy_mqd = destroy_mqd;
> + mqd->is_occupied = is_occupied;
> + break;
> + case KFD_MQD_TYPE_CIK_HIQ:
> + mqd->init_mqd = init_mqd_hiq;
> + mqd->uninit_mqd = uninit_mqd;
> + mqd->load_mqd = load_mqd;
> + mqd->update_mqd = update_mqd_hiq;
> + mqd->destroy_mqd = destroy_mqd;
> + mqd->is_occupied = is_occupied;
> + break;
> + default:
> + kfree(mqd);
> + return NULL;
> + break;
> + }
> +
> + return mqd;
> +}
> +
> +/* SDMA queues should be implemented here when the CP supports them */
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h b/drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h
> new file mode 100644
> index 0000000..a6b0007
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h
> @@ -0,0 +1,54 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef KFD_MQD_MANAGER_H_
> +#define KFD_MQD_MANAGER_H_
> +
> +#include "kfd_priv.h"
> +
> +struct mqd_manager {
> + int (*init_mqd)(struct mqd_manager *mm, void **mqd,
> + kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
> + struct queue_properties *q);
> +
> + int (*load_mqd)(struct mqd_manager *mm, void *mqd,
> + uint32_t pipe_id, uint32_t queue_id,
> + uint32_t __user *wptr);
> +
> + int (*update_mqd)(struct mqd_manager *mm, void *mqd,
> + struct queue_properties *q);
> +
> + int (*destroy_mqd)(struct mqd_manager *mm, bool is_reset,
> + unsigned int timeout, uint32_t pipe_id,
> + uint32_t queue_id);
> +
> + void (*uninit_mqd)(struct mqd_manager *mm, void *mqd,
> + kfd_mem_obj mqd_mem_obj);
> + bool (*is_occupied)(struct mqd_manager *mm, uint64_t queue_address,
> + uint32_t pipe_id, uint32_t queue_id);
> +
> + struct mutex mqd_mutex;
> + struct kfd_dev *dev;
> +};
It would be nice to have this interface documented. For reference, see how ttm
documents things (include/drm/ttm/*.h).
> +
> +#endif /* KFD_MQD_MANAGER_H_ */
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> index 94ff1c3..76494757 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> @@ -179,6 +179,14 @@ struct queue {
> struct kfd_dev *device;
> };
>
> +enum KFD_MQD_TYPE {
> + KFD_MQD_TYPE_CIK_COMPUTE = 0, /* for no cp scheduling */
> + KFD_MQD_TYPE_CIK_HIQ, /* for hiq */
> + KFD_MQD_TYPE_CIK_CP, /* for cp queues and diq */
> + KFD_MQD_TYPE_CIK_SDMA, /* for sdma queues */
> + KFD_MQD_TYPE_MAX
> +};
> +
> /* Data that is per-process-per device. */
> struct kfd_process_device {
> /*
> --
> 1.9.1
>
^ permalink raw reply [flat|nested] 46+ messages in thread* Re: [PATCH v2 14/25] amdkfd: Add mqd_manager module
2014-07-21 2:33 ` Jerome Glisse
@ 2014-08-02 19:18 ` Oded Gabbay
0 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-08-02 19:18 UTC (permalink / raw)
To: Jerome Glisse, linux-kernel, dri-devel
Cc: Andrew Lewycky, Michel Dänzer, Alexey Skidanov,
Andrew Morton
On 21/07/14 05:33, Jerome Glisse wrote:
> On Thu, Jul 17, 2014 at 04:29:21PM +0300, Oded Gabbay wrote:
>> From: Ben Goz <ben.goz@amd.com>
>>
>> The mqd_manager module handles MQD data structures. MQD stands for Memory Queue Descriptor, which is used by the H/W to keep the usermode queue state in memory.
>>
>> Signed-off-by: Ben Goz <ben.goz@amd.com>
>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>> ---
>> drivers/gpu/drm/radeon/amdkfd/Makefile | 2 +-
>> drivers/gpu/drm/radeon/amdkfd/cik_mqds.h | 185 +++++++++++++++
>> drivers/gpu/drm/radeon/amdkfd/cik_regs.h | 220 ++++++++++++++++++
>> drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c | 291 ++++++++++++++++++++++++
>> drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h | 54 +++++
>> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 8 +
>> 6 files changed, 759 insertions(+), 1 deletion(-)
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/cik_mqds.h
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/cik_regs.h
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h
>>
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
>> index dbff147..b5201f4 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/Makefile
>> +++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
>> @@ -6,6 +6,6 @@ ccflags-y := -Iinclude/drm
>>
>> amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
>> kfd_pasid.o kfd_doorbell.o kfd_vidmem.o kfd_aperture.o \
>> - kfd_process.o kfd_queue.o
>> + kfd_process.o kfd_queue.o kfd_mqd_manager.o
>>
>> obj-$(CONFIG_HSA_RADEON) += amdkfd.o
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/cik_mqds.h b/drivers/gpu/drm/radeon/amdkfd/cik_mqds.h
>> new file mode 100644
>> index 0000000..ce75604
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/cik_mqds.h
>> @@ -0,0 +1,185 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#ifndef CIK_MQDS_H_
>> +#define CIK_MQDS_H_
>> +
>> +#pragma pack(push, 4)
>
> No pragma pack.
>
Fixed in v3.
>> +
>> +struct cik_hpd_registers {
>> + u32 cp_hpd_roq_offsets;
>> + u32 cp_hpd_eop_base_addr;
>> + u32 cp_hpd_eop_base_addr_hi;
>> + u32 cp_hpd_eop_vmid;
>> + u32 cp_hpd_eop_control;
>> +};
>> +
>> +/* This structure represents mqd used for cp scheduling queue
>> + * taken from Gfx72_cp_program_spec.pdf
>> + */
>> +struct cik_compute_mqd {
>> + u32 header;
>> + u32 compute_dispatch_initiator;
>> + u32 compute_dim_x;
>> + u32 compute_dim_y;
>> + u32 compute_dim_z;
>> + u32 compute_start_x;
>> + u32 compute_start_y;
>> + u32 compute_start_z;
>> + u32 compute_num_thread_x;
>> + u32 compute_num_thread_y;
>> + u32 compute_num_thread_z;
>> + u32 compute_pipelinestat_enable;
>> + u32 compute_perfcount_enable;
>> + u32 compute_pgm_lo;
>> + u32 compute_pgm_hi;
>> + u32 compute_tba_lo;
>> + u32 compute_tba_hi;
>> + u32 compute_tma_lo;
>> + u32 compute_tma_hi;
>> + u32 compute_pgm_rsrc1;
>> + u32 compute_pgm_rsrc2;
>> + u32 compute_vmid;
>> + u32 compute_resource_limits;
>> + u32 compute_static_thread_mgmt_se0;
>> + u32 compute_static_thread_mgmt_se1;
>> + u32 compute_tmpring_size;
>> + u32 compute_static_thread_mgmt_se2;
>> + u32 compute_static_thread_mgmt_se3;
>> + u32 compute_restart_x;
>> + u32 compute_restart_y;
>> + u32 compute_restart_z;
>> + u32 compute_thread_trace_enable;
>> + u32 compute_misc_reserved;
>> + u32 compute_user_data[16];
>> + u32 vgt_csinvoc_count_lo;
>> + u32 vgt_csinvoc_count_hi;
>> + u32 cp_mqd_base_addr51;
>> + u32 cp_mqd_base_addr_hi;
>> + u32 cp_hqd_active;
>> + u32 cp_hqd_vmid;
>> + u32 cp_hqd_persistent_state;
>> + u32 cp_hqd_pipe_priority;
>> + u32 cp_hqd_queue_priority;
>> + u32 cp_hqd_quantum;
>> + u32 cp_hqd_pq_base;
>> + u32 cp_hqd_pq_base_hi;
>> + u32 cp_hqd_pq_rptr;
>> + u32 cp_hqd_pq_rptr_report_addr;
>> + u32 cp_hqd_pq_rptr_report_addr_hi;
>> + u32 cp_hqd_pq_wptr_poll_addr;
>> + u32 cp_hqd_pq_wptr_poll_addr_hi;
>> + u32 cp_hqd_pq_doorbell_control;
>> + u32 cp_hqd_pq_wptr;
>> + u32 cp_hqd_pq_control;
>> + u32 cp_hqd_ib_base_addr;
>> + u32 cp_hqd_ib_base_addr_hi;
>> + u32 cp_hqd_ib_rptr;
>> + u32 cp_hqd_ib_control;
>> + u32 cp_hqd_iq_timer;
>> + u32 cp_hqd_iq_rptr;
>> + u32 cp_hqd_dequeue_request;
>> + u32 cp_hqd_dma_offload;
>> + u32 cp_hqd_sema_cmd;
>> + u32 cp_hqd_msg_type;
>> + u32 cp_hqd_atomic0_preop_lo;
>> + u32 cp_hqd_atomic0_preop_hi;
>> + u32 cp_hqd_atomic1_preop_lo;
>> + u32 cp_hqd_atomic1_preop_hi;
>> + u32 cp_hqd_hq_scheduler0;
>> + u32 cp_hqd_hq_scheduler1;
>> + u32 cp_mqd_control;
>> + u32 reserved1[10];
>> + u32 cp_mqd_query_time_lo;
>> + u32 cp_mqd_query_time_hi;
>> + u32 reserved2[4];
>> + u32 cp_mqd_connect_start_time_lo;
>> + u32 cp_mqd_connect_start_time_hi;
>> + u32 cp_mqd_connect_end_time_lo;
>> + u32 cp_mqd_connect_end_time_hi;
>> + u32 cp_mqd_connect_end_wf_count;
>> + u32 cp_mqd_connect_end_pq_rptr;
>> + u32 cp_mqd_connect_end_pq_wptr;
>> + u32 cp_mqd_connect_end_ib_rptr;
>> + u32 reserved3[18];
>> +};
>> +
>> +/* This structure represents all *IQs
>> + * Taken from Gfx73_CPC_Eng_Init_Prog.pdf
>> + */
>> +struct cik_interface_mqd {
>> + u32 reserved1[128];
>> + u32 cp_mqd_base_addr;
>> + u32 cp_mqd_base_addr_hi;
>> + u32 cp_hqd_active;
>> + u32 cp_hqd_vmid;
>> + u32 cp_hqd_persistent_state;
>> + u32 cp_hqd_pipe_priority;
>> + u32 cp_hqd_queue_priority;
>> + u32 cp_hqd_quantum;
>> + u32 cp_hqd_pq_base;
>> + u32 cp_hqd_pq_base_hi;
>> + u32 cp_hqd_pq_rptr;
>> + u32 cp_hqd_pq_rptr_report_addr;
>> + u32 cp_hqd_pq_rptr_report_addr_hi;
>> + u32 cp_hqd_pq_wptr_poll_addr;
>> + u32 cp_hqd_pq_wptr_poll_addr_hi;
>> + u32 cp_hqd_pq_doorbell_control;
>> + u32 cp_hqd_pq_wptr;
>> + u32 cp_hqd_pq_control;
>> + u32 cp_hqd_ib_base_addr;
>> + u32 cp_hqd_ib_base_addr_hi;
>> + u32 cp_hqd_ib_rptr;
>> + u32 cp_hqd_ib_control;
>> + u32 cp_hqd_iq_timer;
>> + u32 cp_hqd_iq_rptr;
>> + u32 cp_hqd_dequeue_request;
>> + u32 cp_hqd_dma_offload;
>> + u32 cp_hqd_sema_cmd;
>> + u32 cp_hqd_msg_type;
>> + u32 cp_hqd_atomic0_preop_lo;
>> + u32 cp_hqd_atomic0_preop_hi;
>> + u32 cp_hqd_atomic1_preop_lo;
>> + u32 cp_hqd_atomic1_preop_hi;
>> + u32 cp_hqd_hq_status0;
>> + u32 cp_hqd_hq_control0;
>> + u32 cp_mqd_control;
>> + u32 reserved2[3];
>> + u32 cp_hqd_hq_status1;
>> + u32 cp_hqd_hq_control1;
>> + u32 reserved3[16];
>> + u32 cp_hqd_hq_status2;
>> + u32 cp_hqd_hq_control2;
>> + u32 cp_hqd_hq_status3;
>> + u32 cp_hqd_hq_control3;
>> + u32 reserved4[2];
>> + u32 cp_mqd_query_time_lo;
>> + u32 cp_mqd_query_time_hi;
>> + u32 reserved5[48];
>> + u32 cp_mqd_skip_process[16];
>> +};
>
> I have not fully check but very few of the above fields are use. So please
> do strip this structure to only used field we need to keep stack use as low
> as possible. Moreover the whole reserved* business kind of tell me that this
> is done to match register layout which i would rather avoid being use as a
> struct.
>
The struct cik_mqd, which also includes struct cik_hqd_registers,
describes the mqd itself. The mqd is not registers per se, but rather a
structure that is a common interface between the CPU and GPU. Although we
don't initialize all its members (as some of them are for the GPU's usage),
I believe this is the proper way to use it. Do you have another suggestion?
>> +
>> +#pragma pack(pop)
>> +
>> +
>> +#endif /* CIK_MQDS_H_ */
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/cik_regs.h b/drivers/gpu/drm/radeon/amdkfd/cik_regs.h
>> new file mode 100644
>> index 0000000..a6404e3
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/cik_regs.h
>> @@ -0,0 +1,220 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#ifndef CIK_REGS_H
>> +#define CIK_REGS_H
>> +
>> +#define IH_VMID_0_LUT 0x3D40u
>> +
>> +#define BIF_DOORBELL_CNTL 0x530Cu
>> +
>> +#define SRBM_GFX_CNTL 0xE44
>> +#define PIPEID(x) ((x) << 0)
>> +#define MEID(x) ((x) << 2)
>> +#define VMID(x) ((x) << 4)
>> +#define QUEUEID(x) ((x) << 8)
>> +
>> +#define SQ_CONFIG 0x8C00
>> +
>> +#define SH_MEM_BASES 0x8C28
>> +/* if PTR32, these are the bases for scratch and lds */
>> +#define PRIVATE_BASE(x) ((x) << 0) /* scratch */
>> +#define SHARED_BASE(x) ((x) << 16) /* LDS */
>> +#define SH_MEM_APE1_BASE 0x8C2C
>> +/* if PTR32, this is the base location of GPUVM */
>> +#define SH_MEM_APE1_LIMIT 0x8C30
>> +/* if PTR32, this is the upper limit of GPUVM */
>> +#define SH_MEM_CONFIG 0x8C34
>> +#define PTR32 (1 << 0)
>> +#define PRIVATE_ATC (1 << 1)
>> +#define ALIGNMENT_MODE(x) ((x) << 2)
>> +#define SH_MEM_ALIGNMENT_MODE_DWORD 0
>> +#define SH_MEM_ALIGNMENT_MODE_DWORD_STRICT 1
>> +#define SH_MEM_ALIGNMENT_MODE_STRICT 2
>> +#define SH_MEM_ALIGNMENT_MODE_UNALIGNED 3
>> +#define DEFAULT_MTYPE(x) ((x) << 4)
>> +#define APE1_MTYPE(x) ((x) << 7)
>> +
>> +/* valid for both DEFAULT_MTYPE and APE1_MTYPE */
>> +#define MTYPE_CACHED 0
>> +#define MTYPE_NONCACHED 3
>> +
>> +
>> +#define SH_STATIC_MEM_CONFIG 0x9604u
>> +
>> +#define TC_CFG_L1_LOAD_POLICY0 0xAC68
>> +#define TC_CFG_L1_LOAD_POLICY1 0xAC6C
>> +#define TC_CFG_L1_STORE_POLICY 0xAC70
>> +#define TC_CFG_L2_LOAD_POLICY0 0xAC74
>> +#define TC_CFG_L2_LOAD_POLICY1 0xAC78
>> +#define TC_CFG_L2_STORE_POLICY0 0xAC7C
>> +#define TC_CFG_L2_STORE_POLICY1 0xAC80
>> +#define TC_CFG_L2_ATOMIC_POLICY 0xAC84
>> +#define TC_CFG_L1_VOLATILE 0xAC88
>> +#define TC_CFG_L2_VOLATILE 0xAC8C
>> +
>> +#define CP_PQ_WPTR_POLL_CNTL 0xC20C
>> +#define WPTR_POLL_EN (1 << 31)
>> +
>> +#define CPC_INT_CNTL 0xC2D0
>> +#define CP_ME1_PIPE0_INT_CNTL 0xC214
>> +#define CP_ME1_PIPE1_INT_CNTL 0xC218
>> +#define CP_ME1_PIPE2_INT_CNTL 0xC21C
>> +#define CP_ME1_PIPE3_INT_CNTL 0xC220
>> +#define CP_ME2_PIPE0_INT_CNTL 0xC224
>> +#define CP_ME2_PIPE1_INT_CNTL 0xC228
>> +#define CP_ME2_PIPE2_INT_CNTL 0xC22C
>> +#define CP_ME2_PIPE3_INT_CNTL 0xC230
>> +#define DEQUEUE_REQUEST_INT_ENABLE (1 << 13)
>> +#define WRM_POLL_TIMEOUT_INT_ENABLE (1 << 17)
>> +#define PRIV_REG_INT_ENABLE (1 << 23)
>> +#define TIME_STAMP_INT_ENABLE (1 << 26)
>> +#define GENERIC2_INT_ENABLE (1 << 29)
>> +#define GENERIC1_INT_ENABLE (1 << 30)
>> +#define GENERIC0_INT_ENABLE (1 << 31)
>> +#define CP_ME1_PIPE0_INT_STATUS 0xC214
>> +#define CP_ME1_PIPE1_INT_STATUS 0xC218
>> +#define CP_ME1_PIPE2_INT_STATUS 0xC21C
>> +#define CP_ME1_PIPE3_INT_STATUS 0xC220
>> +#define CP_ME2_PIPE0_INT_STATUS 0xC224
>> +#define CP_ME2_PIPE1_INT_STATUS 0xC228
>> +#define CP_ME2_PIPE2_INT_STATUS 0xC22C
>> +#define CP_ME2_PIPE3_INT_STATUS 0xC230
>> +#define DEQUEUE_REQUEST_INT_STATUS (1 << 13)
>> +#define WRM_POLL_TIMEOUT_INT_STATUS (1 << 17)
>> +#define PRIV_REG_INT_STATUS (1 << 23)
>> +#define TIME_STAMP_INT_STATUS (1 << 26)
>> +#define GENERIC2_INT_STATUS (1 << 29)
>> +#define GENERIC1_INT_STATUS (1 << 30)
>> +#define GENERIC0_INT_STATUS (1 << 31)
>> +
>> +#define CP_HPD_EOP_BASE_ADDR 0xC904
>> +#define CP_HPD_EOP_BASE_ADDR_HI 0xC908
>> +#define CP_HPD_EOP_VMID 0xC90C
>> +#define CP_HPD_EOP_CONTROL 0xC910
>> +#define EOP_SIZE(x) ((x) << 0)
>> +#define EOP_SIZE_MASK (0x3f << 0)
>> +#define CP_MQD_BASE_ADDR 0xC914
>> +#define CP_MQD_BASE_ADDR_HI 0xC918
>> +#define CP_HQD_ACTIVE 0xC91C
>> +#define CP_HQD_VMID 0xC920
>> +
>> +#define CP_HQD_PERSISTENT_STATE 0xC924u
>> +#define DEFAULT_CP_HQD_PERSISTENT_STATE (0x33U << 8)
>> +
>> +#define CP_HQD_PIPE_PRIORITY 0xC928u
>> +#define CP_HQD_QUEUE_PRIORITY 0xC92Cu
>> +#define CP_HQD_QUANTUM 0xC930u
>> +#define QUANTUM_EN 1U
>> +#define QUANTUM_SCALE_1MS (1U << 4)
>> +#define QUANTUM_DURATION(x) ((x) << 8)
>> +
>> +#define CP_HQD_PQ_BASE 0xC934
>> +#define CP_HQD_PQ_BASE_HI 0xC938
>> +#define CP_HQD_PQ_RPTR 0xC93C
>> +#define CP_HQD_PQ_RPTR_REPORT_ADDR 0xC940
>> +#define CP_HQD_PQ_RPTR_REPORT_ADDR_HI 0xC944
>> +#define CP_HQD_PQ_WPTR_POLL_ADDR 0xC948
>> +#define CP_HQD_PQ_WPTR_POLL_ADDR_HI 0xC94C
>> +#define CP_HQD_PQ_DOORBELL_CONTROL 0xC950
>> +#define DOORBELL_OFFSET(x) ((x) << 2)
>> +#define DOORBELL_OFFSET_MASK (0x1fffff << 2)
>> +#define DOORBELL_SOURCE (1 << 28)
>> +#define DOORBELL_SCHD_HIT (1 << 29)
>> +#define DOORBELL_EN (1 << 30)
>> +#define DOORBELL_HIT (1 << 31)
>> +#define CP_HQD_PQ_WPTR 0xC954
>> +#define CP_HQD_PQ_CONTROL 0xC958
>> +#define QUEUE_SIZE(x) ((x) << 0)
>> +#define QUEUE_SIZE_MASK (0x3f << 0)
>> +#define RPTR_BLOCK_SIZE(x) ((x) << 8)
>> +#define RPTR_BLOCK_SIZE_MASK (0x3f << 8)
>> +#define MIN_AVAIL_SIZE(x) ((x) << 20)
>> +#define PQ_ATC_EN (1 << 23)
>> +#define PQ_VOLATILE (1 << 26)
>> +#define NO_UPDATE_RPTR (1 << 27)
>> +#define UNORD_DISPATCH (1 << 28)
>> +#define ROQ_PQ_IB_FLIP (1 << 29)
>> +#define PRIV_STATE (1 << 30)
>> +#define KMD_QUEUE (1 << 31)
>> +
>> +#define DEFAULT_RPTR_BLOCK_SIZE RPTR_BLOCK_SIZE(5)
>> +#define DEFAULT_MIN_AVAIL_SIZE MIN_AVAIL_SIZE(3)
>> +
>> +#define CP_HQD_IB_BASE_ADDR 0xC95Cu
>> +#define CP_HQD_IB_BASE_ADDR_HI 0xC960u
>> +#define CP_HQD_IB_RPTR 0xC964u
>> +#define CP_HQD_IB_CONTROL 0xC968u
>> +#define IB_ATC_EN (1U << 23)
>> +#define DEFAULT_MIN_IB_AVAIL_SIZE (3U << 20)
>> +
>> +#define CP_HQD_DEQUEUE_REQUEST 0xC974
>> +#define DEQUEUE_REQUEST_DRAIN 1
>> +#define DEQUEUE_REQUEST_RESET 2
>> +#define DEQUEUE_INT (1U << 8)
>> +
>> +#define CP_HQD_SEMA_CMD 0xC97Cu
>> +#define CP_HQD_MSG_TYPE 0xC980u
>> +#define CP_HQD_ATOMIC0_PREOP_LO 0xC984u
>> +#define CP_HQD_ATOMIC0_PREOP_HI 0xC988u
>> +#define CP_HQD_ATOMIC1_PREOP_LO 0xC98Cu
>> +#define CP_HQD_ATOMIC1_PREOP_HI 0xC990u
>> +#define CP_HQD_HQ_SCHEDULER0 0xC994u
>> +#define CP_HQD_HQ_SCHEDULER1 0xC998u
>> +
>> +
>> +#define CP_MQD_CONTROL 0xC99C
>> +#define MQD_VMID(x) ((x) << 0)
>> +#define MQD_VMID_MASK (0xf << 0)
>> +#define MQD_CONTROL_PRIV_STATE_EN (1U << 8)
>> +
>> +#define GRBM_GFX_INDEX 0x30800
>> +#define INSTANCE_INDEX(x) ((x) << 0)
>> +#define SH_INDEX(x) ((x) << 8)
>> +#define SE_INDEX(x) ((x) << 16)
>> +#define SH_BROADCAST_WRITES (1 << 29)
>> +#define INSTANCE_BROADCAST_WRITES (1 << 30)
>> +#define SE_BROADCAST_WRITES (1 << 31)
>> +
>> +#define SQC_CACHES 0x30d20
>> +#define SQC_POLICY 0x8C38u
>> +#define SQC_VOLATILE 0x8C3Cu
>> +
>> +#define CP_PERFMON_CNTL 0x36020
>> +
>> +#define ATC_VMID0_PASID_MAPPING 0x339Cu
>> +#define ATC_VMID_PASID_MAPPING_UPDATE_STATUS 0x3398u
>> +#define ATC_VMID_PASID_MAPPING_VALID (1U << 31)
>> +
>> +#define ATC_VM_APERTURE0_CNTL 0x3310u
>> +#define ATS_ACCESS_MODE_NEVER 0
>> +#define ATS_ACCESS_MODE_ALWAYS 1
>> +
>> +#define ATC_VM_APERTURE0_CNTL2 0x3318u
>> +#define ATC_VM_APERTURE0_HIGH_ADDR 0x3308u
>> +#define ATC_VM_APERTURE0_LOW_ADDR 0x3300u
>> +#define ATC_VM_APERTURE1_CNTL 0x3314u
>> +#define ATC_VM_APERTURE1_CNTL2 0x331Cu
>> +#define ATC_VM_APERTURE1_HIGH_ADDR 0x330Cu
>> +#define ATC_VM_APERTURE1_LOW_ADDR 0x3304u
>> +
>> +#endif
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c b/drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c
>> new file mode 100644
>> index 0000000..5f9f9b9
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.c
>> @@ -0,0 +1,291 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#include <linux/printk.h>
>> +#include <linux/slab.h>
>> +#include "kfd_priv.h"
>> +#include "kfd_mqd_manager.h"
>> +#include "cik_mqds.h"
>> +#include "cik_regs.h"
>> +#include "../cik_reg.h"
>> +
>> +inline uint32_t lower_32(uint64_t x)
>> +{
>> + return (uint32_t)x;
>> +}
>> +
>> +inline uint32_t upper_32(uint64_t x)
>> +{
>> + return (uint32_t)(x >> 32);
>> +}
>
> Do use kernel macro upper_32_bits or lower_32_bits. Each time you do something
> like that go check for existing macro.
>
Done in v3
>> +
>> +inline void busy_wait(unsigned long ms)
>> +{
>> + while (time_before(jiffies, ms))
>> + cpu_relax();
>> +}
>> +
>> +static inline struct cik_mqd *get_mqd(void *mqd)
>> +{
>> + return (struct cik_mqd *)mqd;
>> +}
>> +
>> +static int init_mqd(struct mqd_manager *mm, void **mqd, kfd_mem_obj *mqd_mem_obj,
>> + uint64_t *gart_addr, struct queue_properties *q)
>> +{
>> + uint64_t addr;
>> + struct cik_mqd *m;
>> + int retval;
>> +
>> + BUG_ON(!mm || !q || !mqd);
>> +
>> + pr_debug("kfd: In func %s\n", __func__);
>> +
>> + retval = kfd_vidmem_alloc_map(
>> + mm->dev,
>> + mqd_mem_obj,
>> + (void **)&m,
>> + &addr,
>> + ALIGN(sizeof(struct cik_mqd), 256));
>> +
>> + if (retval != 0)
>> + return -ENOMEM;
>> +
>> + memset(m, 0, sizeof(struct cik_mqd));
>> +
>> + m->header = 0xC0310800;
>> + m->pipeline_stat_enable = 1;
>> + m->static_thread_mgmt01[0] = 0xFFFFFFFF;
>> + m->static_thread_mgmt01[1] = 0xFFFFFFFF;
>> + m->static_thread_mgmt23[0] = 0xFFFFFFFF;
>> + m->static_thread_mgmt23[1] = 0xFFFFFFFF;
>> +
>> + m->queue_state.cp_hqd_persistent_state = DEFAULT_CP_HQD_PERSISTENT_STATE;
>> +
>> + m->queue_state.cp_mqd_control = MQD_CONTROL_PRIV_STATE_EN;
>> + m->queue_state.cp_mqd_base_addr = lower_32(addr);
>> + m->queue_state.cp_mqd_base_addr_hi = upper_32(addr);
>> +
>> + m->queue_state.cp_hqd_ib_control = DEFAULT_MIN_IB_AVAIL_SIZE | IB_ATC_EN;
>> + /* Although WinKFD writes this, I suspect it should not be necessary. */
>> + m->queue_state.cp_hqd_ib_control = IB_ATC_EN | DEFAULT_MIN_IB_AVAIL_SIZE;
>> +
>> + m->queue_state.cp_hqd_quantum = QUANTUM_EN | QUANTUM_SCALE_1MS | QUANTUM_DURATION(10);
>> +
>> + m->queue_state.cp_hqd_pipe_priority = 1;
>> + m->queue_state.cp_hqd_queue_priority = 15;
>> +
>> + *mqd = m;
>> + if (gart_addr != NULL)
>> + *gart_addr = addr;
>> + retval = mm->update_mqd(mm, m, q);
>> +
>> + return retval;
>> +}
>> +
>> +static void uninit_mqd(struct mqd_manager *mm, void *mqd, kfd_mem_obj mqd_mem_obj)
>> +{
>> + BUG_ON(!mm || !mqd);
>> + kfd_vidmem_free_unmap(mm->dev, mqd_mem_obj);
>> +}
>> +
>> +static int load_mqd(struct mqd_manager *mm, void *mqd, uint32_t pipe_id, uint32_t queue_id, uint32_t __user *wptr)
>> +{
>> + return kfd2kgd->hqd_load(mm->dev->kgd, mqd, pipe_id, queue_id, wptr);
>> +
>> +}
>> +
>> +static int update_mqd(struct mqd_manager *mm, void *mqd, struct queue_properties *q)
>> +{
>> + struct cik_mqd *m;
>> +
>> + BUG_ON(!mm || !q || !mqd);
>> +
>> + pr_debug("kfd: In func %s\n", __func__);
>> +
>> + m = get_mqd(mqd);
>> + m->queue_state.cp_hqd_pq_control = DEFAULT_RPTR_BLOCK_SIZE | DEFAULT_MIN_AVAIL_SIZE | PQ_ATC_EN;
>> + /* calculating queue size which is log base 2 of actual queue size -1 dwords and another -1 for ffs */
>> + m->queue_state.cp_hqd_pq_control |= ffs(q->queue_size / sizeof(unsigned int)) - 1 - 1;
>> + m->queue_state.cp_hqd_pq_base = lower_32((uint64_t)q->queue_address >> 8);
>> + m->queue_state.cp_hqd_pq_base_hi = upper_32((uint64_t)q->queue_address >> 8);
>> + m->queue_state.cp_hqd_pq_rptr_report_addr = lower_32((uint64_t)q->read_ptr);
>> + m->queue_state.cp_hqd_pq_rptr_report_addr_hi = upper_32((uint64_t)q->read_ptr);
>> + m->queue_state.cp_hqd_pq_doorbell_control = DOORBELL_EN | DOORBELL_OFFSET(q->doorbell_off);
>> +
>> + m->queue_state.cp_hqd_vmid = q->vmid;
>> +
>> + m->queue_state.cp_hqd_active = 0;
>> + q->is_active = false;
>> + if (q->queue_size > 0 &&
>> + q->queue_address != 0 &&
>> + q->queue_percent > 0) {
>> + m->queue_state.cp_hqd_active = 1;
>> + q->is_active = true;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int destroy_mqd(struct mqd_manager *mm, bool is_reset, unsigned int timeout, uint32_t pipe_id, uint32_t queue_id)
>> +{
>> + return kfd2kgd->hqd_destroy(mm->dev->kgd, is_reset, timeout, pipe_id, queue_id);
>> +}
>> +
>> +bool is_occupied(struct mqd_manager *mm, uint64_t queue_address, uint32_t pipe_id, uint32_t queue_id)
>> +{
>> +
>> + return kfd2kgd->hqd_is_occupies(mm->dev->kgd, queue_address, pipe_id, queue_id);
>> +
>> +}
>> +
>> +/*
>> + * HIQ MQD Implementation
>> + */
>
> A more useful comment than that.
Done in v3
>
>> +
>> +static int init_mqd_hiq(struct mqd_manager *mm, void **mqd, kfd_mem_obj *mqd_mem_obj,
>> + uint64_t *gart_addr, struct queue_properties *q)
>> +{
>> + uint64_t addr;
>> + struct cik_mqd *m;
>> + int retval;
>> +
>> + BUG_ON(!mm || !q || !mqd || !mqd_mem_obj);
>> +
>> + pr_debug("kfd: In func %s\n", __func__);
>> +
>> + retval = kfd_vidmem_alloc_map(
>> + mm->dev,
>> + mqd_mem_obj,
>> + (void **)&m,
>> + &addr,
>> + ALIGN(sizeof(struct cik_mqd), PAGE_SIZE));
>> +
>> + if (retval != 0)
>> + return -ENOMEM;
>> +
>> + memset(m, 0, sizeof(struct cik_mqd));
>> +
>> + m->header = 0xC0310800;
>> + m->pipeline_stat_enable = 1;
>> + m->static_thread_mgmt01[0] = 0xFFFFFFFF;
>> + m->static_thread_mgmt01[1] = 0xFFFFFFFF;
>> + m->static_thread_mgmt23[0] = 0xFFFFFFFF;
>> + m->static_thread_mgmt23[1] = 0xFFFFFFFF;
>> +
>> + m->queue_state.cp_hqd_persistent_state = DEFAULT_CP_HQD_PERSISTENT_STATE;
>> +
>> + m->queue_state.cp_mqd_control = MQD_CONTROL_PRIV_STATE_EN;
>> + m->queue_state.cp_mqd_base_addr = lower_32(addr);
>> + m->queue_state.cp_mqd_base_addr_hi = upper_32(addr);
>> +
>> + m->queue_state.cp_hqd_ib_control = DEFAULT_MIN_IB_AVAIL_SIZE;
>> +
>> + m->queue_state.cp_hqd_quantum = QUANTUM_EN | QUANTUM_SCALE_1MS | QUANTUM_DURATION(10);
>> +
>> + m->queue_state.cp_hqd_pipe_priority = 1;
>> + m->queue_state.cp_hqd_queue_priority = 15;
>> +
>> + *mqd = m;
>> + if (gart_addr)
>> + *gart_addr = addr;
>> + retval = mm->update_mqd(mm, m, q);
>> +
>> + return retval;
>> +}
>> +
>> +static int update_mqd_hiq(struct mqd_manager *mm, void *mqd, struct queue_properties *q)
>> +{
>> + struct cik_mqd *m;
>> +
>> + BUG_ON(!mm || !q || !mqd);
>> +
>> + pr_debug("kfd: In func %s\n", __func__);
>> +
>> + m = get_mqd(mqd);
>> + m->queue_state.cp_hqd_pq_control = DEFAULT_RPTR_BLOCK_SIZE | DEFAULT_MIN_AVAIL_SIZE | PRIV_STATE | KMD_QUEUE;
>> + /* calculating queue size which is log base 2 of actual queue size -1 dwords */
>> + m->queue_state.cp_hqd_pq_control |= ffs(q->queue_size / sizeof(unsigned int)) - 1 - 1;
>> + m->queue_state.cp_hqd_pq_base = lower_32((uint64_t)q->queue_address >> 8);
>> + m->queue_state.cp_hqd_pq_base_hi = upper_32((uint64_t)q->queue_address >> 8);
>> + m->queue_state.cp_hqd_pq_rptr_report_addr = lower_32((uint64_t)q->read_ptr);
>> + m->queue_state.cp_hqd_pq_rptr_report_addr_hi = upper_32((uint64_t)q->read_ptr);
>> + m->queue_state.cp_hqd_pq_doorbell_control = DOORBELL_EN | DOORBELL_OFFSET(q->doorbell_off);
>> +
>> + m->queue_state.cp_hqd_vmid = q->vmid;
>> +
>> + m->queue_state.cp_hqd_active = 0;
>> + q->is_active = false;
>> + if (q->queue_size > 0 &&
>> + q->queue_address != 0 &&
>> + q->queue_percent > 0) {
>> + m->queue_state.cp_hqd_active = 1;
>> + q->is_active = true;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +struct mqd_manager *mqd_manager_init(enum KFD_MQD_TYPE type, struct kfd_dev *dev)
>> +{
>> + struct mqd_manager *mqd;
>> +
>> + BUG_ON(!dev);
>> + BUG_ON(type >= KFD_MQD_TYPE_MAX);
>> +
>> + pr_debug("kfd: In func %s\n", __func__);
>> +
>> + mqd = kzalloc(sizeof(struct mqd_manager), GFP_KERNEL);
>> + if (!mqd)
>> + return NULL;
>> +
>> + mqd->dev = dev;
>> +
>> + switch (type) {
>> + case KFD_MQD_TYPE_CIK_CP:
>> + case KFD_MQD_TYPE_CIK_COMPUTE:
>> + mqd->init_mqd = init_mqd;
>> + mqd->uninit_mqd = uninit_mqd;
>> + mqd->load_mqd = load_mqd;
>> + mqd->update_mqd = update_mqd;
>> + mqd->destroy_mqd = destroy_mqd;
>> + mqd->is_occupied = is_occupied;
>> + break;
>> + case KFD_MQD_TYPE_CIK_HIQ:
>> + mqd->init_mqd = init_mqd_hiq;
>> + mqd->uninit_mqd = uninit_mqd;
>> + mqd->load_mqd = load_mqd;
>> + mqd->update_mqd = update_mqd_hiq;
>> + mqd->destroy_mqd = destroy_mqd;
>> + mqd->is_occupied = is_occupied;
>> + break;
>> + default:
>> + kfree(mqd);
>> + return NULL;
>> + break;
>> + }
>> +
>> + return mqd;
>> +}
>> +
>> +/* SDMA queues should be implemented here when the cp will supports them */
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h b/drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h
>> new file mode 100644
>> index 0000000..a6b0007
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_mqd_manager.h
>> @@ -0,0 +1,54 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#ifndef KFD_MQD_MANAGER_H_
>> +#define KFD_MQD_MANAGER_H_
>> +
>> +#include "kfd_priv.h"
>> +
>> +struct mqd_manager {
>> + int (*init_mqd)(struct mqd_manager *mm, void **mqd,
>> + kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
>> + struct queue_properties *q);
>> +
>> + int (*load_mqd)(struct mqd_manager *mm, void *mqd,
>> + uint32_t pipe_id, uint32_t queue_id,
>> + uint32_t __user *wptr);
>> +
>> + int (*update_mqd)(struct mqd_manager *mm, void *mqd,
>> + struct queue_properties *q);
>> +
>> + int (*destroy_mqd)(struct mqd_manager *mm, bool is_reset,
>> + unsigned int timeout, uint32_t pipe_id,
>> + uint32_t queue_id);
>> +
>> + void (*uninit_mqd)(struct mqd_manager *mm, void *mqd,
>> + kfd_mem_obj mqd_mem_obj);
>> + bool (*is_occupied)(struct mqd_manager *mm, uint64_t queue_address,
>> + uint32_t pipe_id, uint32_t queue_id);
>> +
>> + struct mutex mqd_mutex;
>> + struct kfd_dev *dev;
>> +};
>
> Would be nice to have this interface documented. For reference see how ttm
> document things (include/drm/ttm/*.h)
>
Done in v3
Oded
>> +
>> +#endif /* KFD_MQD_MANAGER_H_ */
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> index 94ff1c3..76494757 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> @@ -179,6 +179,14 @@ struct queue {
>> struct kfd_dev *device;
>> };
>>
>> +enum KFD_MQD_TYPE {
>> + KFD_MQD_TYPE_CIK_COMPUTE = 0, /* for no cp scheduling */
>> + KFD_MQD_TYPE_CIK_HIQ, /* for hiq */
>> + KFD_MQD_TYPE_CIK_CP, /* for cp queues and diq */
>> + KFD_MQD_TYPE_CIK_SDMA, /* for sdma queues */
>> + KFD_MQD_TYPE_MAX
>> +};
>> +
>> /* Data that is per-process-per device. */
>> struct kfd_process_device {
>> /*
>> --
>> 1.9.1
>>
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v2 15/25] amdkfd: Add kernel queue module
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
` (10 preceding siblings ...)
2014-07-17 13:29 ` [PATCH v2 14/25] amdkfd: Add mqd_manager module Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-21 2:42 ` Jerome Glisse
2014-07-17 13:29 ` [PATCH v2 16/25] amdkfd: Add module parameter of scheduling policy Oded Gabbay
` (10 subsequent siblings)
22 siblings, 1 reply; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
From: Ben Goz <ben.goz@amd.com>
The kernel queue module enables the amdkfd to establish kernel queues, not exposed to user space.
The kernel queues are used for HIQ (HSA Interface Queue) and DIQ (Debug Interface Queue) operations
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/amdkfd/Makefile | 3 +-
.../drm/radeon/amdkfd/kfd_device_queue_manager.h | 101 +++
drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c | 305 +++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h | 66 ++
drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h    | 682 +++++++++++++++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h | 107 ++++
drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 32 +
7 files changed, 1295 insertions(+), 1 deletion(-)
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_device_queue_manager.h
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h
diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
index b5201f4..bead1be 100644
--- a/drivers/gpu/drm/radeon/amdkfd/Makefile
+++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
@@ -6,6 +6,7 @@ ccflags-y := -Iinclude/drm
amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
kfd_pasid.o kfd_doorbell.o kfd_vidmem.o kfd_aperture.o \
- kfd_process.o kfd_queue.o kfd_mqd_manager.o
+ kfd_process.o kfd_queue.o kfd_mqd_manager.o \
+ kfd_kernel_queue.o
obj-$(CONFIG_HSA_RADEON) += amdkfd.o
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/radeon/amdkfd/kfd_device_queue_manager.h
new file mode 100644
index 0000000..037eaf8
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_device_queue_manager.h
@@ -0,0 +1,101 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef KFD_DEVICE_QUEUE_MANAGER_H_
+#define KFD_DEVICE_QUEUE_MANAGER_H_
+
+#include <linux/rwsem.h>
+#include <linux/list.h>
+#include "kfd_priv.h"
+#include "kfd_mqd_manager.h"
+
+#define QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS (500)
+#define QUEUES_PER_PIPE (8)
+#define PIPE_PER_ME_CP_SCHEDULING (3)
+#define CIK_VMID_NUM (8)
+#define KFD_VMID_START_OFFSET (8)
+#define VMID_PER_DEVICE CIK_VMID_NUM
+#define KFD_DQM_FIRST_PIPE (0)
+
+struct device_process_node {
+ struct qcm_process_device *qpd;
+ struct list_head list;
+};
+
+struct device_queue_manager {
+ int (*create_queue)(struct device_queue_manager *dqm,
+ struct queue *q,
+ struct qcm_process_device *qpd,
+ int *allocate_vmid);
+ int (*destroy_queue)(struct device_queue_manager *dqm,
+ struct qcm_process_device *qpd,
+ struct queue *q);
+ int (*update_queue)(struct device_queue_manager *dqm,
+ struct queue *q);
+ int (*destroy_queues)(struct device_queue_manager *dqm);
+ struct mqd_manager * (*get_mqd_manager)(struct device_queue_manager *dqm,
+ enum KFD_MQD_TYPE type);
+ int (*execute_queues)(struct device_queue_manager *dqm);
+ int (*register_process)(struct device_queue_manager *dqm,
+ struct qcm_process_device *qpd);
+ int (*unregister_process)(struct device_queue_manager *dqm,
+ struct qcm_process_device *qpd);
+ int (*initialize)(struct device_queue_manager *dqm);
+ int (*start)(struct device_queue_manager *dqm);
+ int (*stop)(struct device_queue_manager *dqm);
+ void (*uninitialize)(struct device_queue_manager *dqm);
+ int (*create_kernel_queue)(struct device_queue_manager *dqm,
+ struct kernel_queue *kq,
+ struct qcm_process_device *qpd);
+ void (*destroy_kernel_queue)(struct device_queue_manager *dqm,
+ struct kernel_queue *kq,
+ struct qcm_process_device *qpd);
+ bool (*set_cache_memory_policy)(struct device_queue_manager *dqm,
+ struct qcm_process_device *qpd,
+ enum cache_policy default_policy,
+ enum cache_policy alternate_policy,
+ void __user *alternate_aperture_base,
+ uint64_t alternate_aperture_size);
+
+
+ struct mqd_manager *mqds[KFD_MQD_TYPE_MAX];
+ struct packet_manager packets;
+ struct kfd_dev *dev;
+ struct mutex lock;
+ struct list_head queues;
+ unsigned int processes_count;
+ unsigned int queue_count;
+ unsigned int next_pipe_to_allocate;
+ unsigned int *allocated_queues;
+ unsigned int vmid_bitmap;
+ uint64_t pipelines_addr;
+ kfd_mem_obj pipeline_mem;
+ uint64_t fence_gpu_addr;
+ unsigned int *fence_addr;
+ kfd_mem_obj fence_mem;
+ bool active_runlist;
+};
+
+
+
+#endif /* KFD_DEVICE_QUEUE_MANAGER_H_ */
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c
new file mode 100644
index 0000000..b212524
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c
@@ -0,0 +1,305 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include <linux/types.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/printk.h>
+#include "kfd_kernel_queue.h"
+#include "kfd_priv.h"
+#include "kfd_device_queue_manager.h"
+#include "kfd_pm4_headers.h"
+#include "kfd_pm4_opcodes.h"
+
+#define PM4_COUNT_ZERO (((1 << 15) - 1) << 16)
+
+static bool initialize(struct kernel_queue *kq, struct kfd_dev *dev,
+ enum kfd_queue_type type, unsigned int queue_size)
+{
+ struct queue_properties prop;
+ int retval;
+ union PM4_TYPE_3_HEADER nop;
+
+ BUG_ON(!kq || !dev);
+ BUG_ON(type != KFD_QUEUE_TYPE_DIQ && type != KFD_QUEUE_TYPE_HIQ);
+
+ pr_debug("kfd: In func %s initializing queue type %d size %d\n", __func__, KFD_QUEUE_TYPE_HIQ, queue_size);
+
+ nop.opcode = IT_NOP;
+ nop.type = PM4_TYPE_3;
+ nop.u32all |= PM4_COUNT_ZERO;
+
+ kq->dev = dev;
+ kq->nop_packet = nop.u32all;
+ switch (type) {
+ case KFD_QUEUE_TYPE_DIQ:
+ case KFD_QUEUE_TYPE_HIQ:
+ kq->mqd = dev->dqm->get_mqd_manager(dev->dqm, KFD_MQD_TYPE_CIK_HIQ);
+ break;
+ default:
+ BUG();
+ break;
+ }
+
+ if (kq->mqd == NULL)
+ return false;
+
+ prop.doorbell_ptr = (qptr_t *)kfd_get_kernel_doorbell(dev, &prop.doorbell_off);
+ if (prop.doorbell_ptr == NULL)
+ goto err_get_kernel_doorbell;
+
+ retval = kfd_vidmem_alloc_map(dev, &kq->pq, (void **)&kq->pq_kernel_addr, &kq->pq_gpu_addr, queue_size);
+ if (retval != 0)
+ goto err_pq_allocate_vidmem;
+
+ retval = kfd_vidmem_alloc_map(kq->dev, &kq->rptr_mem, (void **)&kq->rptr_kernel, &kq->rptr_gpu_addr,
+ sizeof(*kq->rptr_kernel));
+ if (retval != 0)
+ goto err_rptr_allocate_vidmem;
+
+ retval = kfd_vidmem_alloc_map(kq->dev, &kq->wptr_mem, (void **)&kq->wptr_kernel, &kq->wptr_gpu_addr,
+ sizeof(*kq->rptr_kernel));
+ if (retval != 0)
+ goto err_wptr_allocate_vidmem;
+
+ prop.queue_size = queue_size;
+ prop.is_interop = false;
+ prop.priority = 1;
+ prop.queue_percent = 100;
+ prop.type = type;
+ prop.vmid = 0;
+ prop.queue_address = kq->pq_gpu_addr;
+ prop.read_ptr = (qptr_t *) kq->rptr_gpu_addr;
+ prop.write_ptr = (qptr_t *) kq->wptr_gpu_addr;
+
+ if (init_queue(&kq->queue, prop) != 0)
+ goto err_init_queue;
+
+ kq->queue->device = dev;
+ kq->queue->process = kfd_get_process(current);
+
+ retval = kq->mqd->init_mqd(kq->mqd, &kq->queue->mqd, &kq->queue->mqd_mem_obj,
+ &kq->queue->gart_mqd_addr, &kq->queue->properties);
+ if (retval != 0)
+ goto err_init_mqd;
+
+ /* assign HIQ to HQD */
+ if (type == KFD_QUEUE_TYPE_HIQ) {
+ pr_debug("assigning hiq to hqd\n");
+ kq->queue->pipe = KFD_CIK_HIQ_PIPE;
+ kq->queue->queue = KFD_CIK_HIQ_QUEUE;
+ kq->mqd->load_mqd(kq->mqd, kq->queue->mqd, kq->queue->pipe, kq->queue->queue, NULL);
+ } else {
+ /* allocate fence for DIQ */
+ retval = kfd_vidmem_alloc_map(
+ dev,
+ &kq->fence_mem_obj,
+ &kq->fence_kernel_address,
+ &kq->fence_gpu_addr,
+ sizeof(uint32_t));
+
+ if (retval != 0)
+ goto err_alloc_fence;
+ }
+
+ print_queue(kq->queue);
+
+ return true;
+err_alloc_fence:
+err_init_mqd:
+ uninit_queue(kq->queue);
+err_init_queue:
+ kfd_vidmem_free_unmap(kq->dev, kq->wptr_mem);
+err_wptr_allocate_vidmem:
+ kfd_vidmem_free_unmap(kq->dev, kq->rptr_mem);
+err_rptr_allocate_vidmem:
+ kfd_vidmem_free_unmap(kq->dev, kq->pq);
+err_pq_allocate_vidmem:
+ pr_err("kfd: error init pq\n");
+ kfd_release_kernel_doorbell(dev, (u32 *)prop.doorbell_ptr);
+err_get_kernel_doorbell:
+ pr_err("kfd: error init doorbell");
+ return false;
+
+}
+
+static void uninitialize(struct kernel_queue *kq)
+{
+ BUG_ON(!kq);
+
+ if (kq->queue->properties.type == KFD_QUEUE_TYPE_HIQ)
+ kq->mqd->destroy_mqd(kq->mqd,
+ false,
+ QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS,
+ kq->queue->pipe,
+ kq->queue->queue);
+
+ kfd_vidmem_free_unmap(kq->dev, kq->rptr_mem);
+ kfd_vidmem_free_unmap(kq->dev, kq->wptr_mem);
+ kfd_vidmem_free_unmap(kq->dev, kq->pq);
+ kfd_release_kernel_doorbell(kq->dev, (u32 *)kq->queue->properties.doorbell_ptr);
+ uninit_queue(kq->queue);
+}
+
+static int acquire_packet_buffer(struct kernel_queue *kq,
+ size_t packet_size_in_dwords, unsigned int **buffer_ptr)
+{
+ size_t available_size;
+ size_t queue_size_dwords;
+ qptr_t wptr, rptr;
+ unsigned int *queue_address;
+
+ BUG_ON(!kq || !buffer_ptr);
+
+ rptr = *kq->rptr_kernel;
+ wptr = *kq->wptr_kernel;
+ queue_address = (unsigned int *)kq->pq_kernel_addr;
+ queue_size_dwords = kq->queue->properties.queue_size / sizeof(uint32_t);
+
+ pr_debug("kfd: In func %s\nrptr: %d\nwptr: %d\nqueue_address 0x%p\n", __func__, rptr, wptr, queue_address);
+
+ available_size = (rptr - 1 - wptr + queue_size_dwords) % queue_size_dwords;
+
+ if (packet_size_in_dwords >= queue_size_dwords ||
+ packet_size_in_dwords >= available_size)
+ return -ENOMEM;
+
+ if (wptr + packet_size_in_dwords > queue_size_dwords) {
+ while (wptr > 0) {
+ queue_address[wptr] = kq->nop_packet;
+ wptr = (wptr + 1) % queue_size_dwords;
+ }
+ }
+
+ *buffer_ptr = &queue_address[wptr];
+ kq->pending_wptr = wptr + packet_size_in_dwords;
+
+ return 0;
+}
+
+static void submit_packet(struct kernel_queue *kq)
+{
+#ifdef DEBUG
+ int i;
+#endif
+
+ BUG_ON(!kq);
+
+#ifdef DEBUG
+ for (i = *kq->wptr_kernel; i < kq->pending_wptr; i++) {
+ pr_debug("0x%2X ", kq->pq_kernel_addr[i]);
+ if (i % 15 == 0)
+ pr_debug("\n");
+ }
+ pr_debug("\n");
+#endif
+
+ *kq->wptr_kernel = kq->pending_wptr;
+ write_kernel_doorbell((u32 *)kq->queue->properties.doorbell_ptr, kq->pending_wptr);
+}
+
+static int sync_with_hw(struct kernel_queue *kq, unsigned long timeout_ms)
+{
+ unsigned long org_timeout_ms;
+
+ BUG_ON(!kq);
+
+ org_timeout_ms = timeout_ms;
+ timeout_ms += jiffies * 1000 / HZ;
+ while (*kq->wptr_kernel != *kq->rptr_kernel) {
+ if (time_after(jiffies * 1000 / HZ, timeout_ms)) {
+ pr_err("kfd: kernel_queue %s timeout expired %lu\n",
+ __func__, org_timeout_ms);
+ pr_err("kfd: wptr: %d rptr: %d\n",
+ *kq->wptr_kernel, *kq->rptr_kernel);
+ return -ETIME;
+ }
+ cpu_relax();
+ }
+
+ return 0;
+}
+
+static void rollback_packet(struct kernel_queue *kq)
+{
+ BUG_ON(!kq);
+ kq->pending_wptr = *kq->queue->properties.write_ptr;
+}
+
+struct kernel_queue *kernel_queue_init(struct kfd_dev *dev, enum kfd_queue_type type)
+{
+ struct kernel_queue *kq;
+
+ BUG_ON(!dev);
+
+ kq = kzalloc(sizeof(struct kernel_queue), GFP_KERNEL);
+ if (!kq)
+ return NULL;
+
+ kq->initialize = initialize;
+ kq->uninitialize = uninitialize;
+ kq->acquire_packet_buffer = acquire_packet_buffer;
+ kq->submit_packet = submit_packet;
+ kq->sync_with_hw = sync_with_hw;
+ kq->rollback_packet = rollback_packet;
+
+ if (kq->initialize(kq, dev, type, 2048) == false) {
+ pr_err("kfd: failed to init kernel queue\n");
+ kfree(kq);
+ return NULL;
+ }
+ return kq;
+}
+
+void kernel_queue_uninit(struct kernel_queue *kq)
+{
+ BUG_ON(!kq);
+
+ kq->uninitialize(kq);
+ kfree(kq);
+}
+
+void test_kq(struct kfd_dev *dev)
+{
+ struct kernel_queue *kq;
+ uint32_t *buffer, i;
+ int retval;
+
+ BUG_ON(!dev);
+
+ pr_debug("kfd: starting kernel queue test\n");
+
+ kq = kernel_queue_init(dev, KFD_QUEUE_TYPE_HIQ);
+ BUG_ON(!kq);
+
+ retval = kq->acquire_packet_buffer(kq, 5, &buffer);
+ BUG_ON(retval != 0);
+ for (i = 0; i < 5; i++)
+ buffer[i] = kq->nop_packet;
+ kq->submit_packet(kq);
+ kq->sync_with_hw(kq, 1000);
+
+ pr_debug("kfd: ending kernel queue test\n");
+}
+
+
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h b/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h
new file mode 100644
index 0000000..abfb9c8
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h
@@ -0,0 +1,66 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef KFD_KERNEL_QUEUE_H_
+#define KFD_KERNEL_QUEUE_H_
+
+#include <linux/list.h>
+#include <linux/types.h>
+#include "kfd_priv.h"
+
+struct kernel_queue {
+ /* interface */
+ bool (*initialize)(struct kernel_queue *kq, struct kfd_dev *dev,
+ enum kfd_queue_type type, unsigned int queue_size);
+ void (*uninitialize)(struct kernel_queue *kq);
+ int (*acquire_packet_buffer)(struct kernel_queue *kq,
+ size_t packet_size_in_dwords, unsigned int **buffer_ptr);
+ void (*submit_packet)(struct kernel_queue *kq);
+ int (*sync_with_hw)(struct kernel_queue *kq, unsigned long timeout_ms);
+ void (*rollback_packet)(struct kernel_queue *kq);
+
+ /* data */
+ struct kfd_dev *dev;
+ struct mqd_manager *mqd;
+ struct queue *queue;
+ qptr_t pending_wptr;
+ unsigned int nop_packet;
+
+ kfd_mem_obj rptr_mem;
+ qptr_t *rptr_kernel;
+ uint64_t rptr_gpu_addr;
+ kfd_mem_obj wptr_mem;
+ qptr_t *wptr_kernel;
+ uint64_t wptr_gpu_addr;
+ kfd_mem_obj pq;
+ uint64_t pq_gpu_addr;
+ qptr_t *pq_kernel_addr;
+
+ kfd_mem_obj fence_mem_obj;
+ uint64_t fence_gpu_addr;
+ void *fence_kernel_address;
+
+ struct list_head list;
+};
+
+#endif /* KFD_KERNEL_QUEUE_H_ */
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h b/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h
new file mode 100644
index 0000000..95e46f8
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h
@@ -0,0 +1,682 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef KFD_PM4_HEADERS_H_
+#define KFD_PM4_HEADERS_H_
+
+#ifndef PM4_HEADER_DEFINED
+#define PM4_HEADER_DEFINED
+
+union PM4_TYPE_3_HEADER {
+ struct {
+ unsigned int predicate:1; /* < 0 for diq packets */
+ unsigned int shader_type:1; /* < 0 for diq packets */
+ unsigned int reserved1:6; /* < reserved */
+ unsigned int opcode:8; /* < IT opcode */
+ unsigned int count:14; /* < number of DWORDs - 1 in the information body. */
+ unsigned int type:2; /* < packet identifier. It should be 3 for type 3 packets */
+ };
+ unsigned int u32all;
+};
+#endif
+
+/*
+ * --------------------_MAP_QUEUES--------------------
+ */
+
+#ifndef _PM4__MAP_QUEUES_DEFINED
+#define _PM4__MAP_QUEUES_DEFINED
+enum _map_queues_queue_sel_enum {
+ queue_sel___map_queues__map_to_specified_queue_slots = 0,
+ queue_sel___map_queues__map_to_hws_determined_queue_slots = 1,
+ queue_sel___map_queues__enable_process_queues = 2,
+ queue_sel___map_queues__reserved = 3 };
+
+enum _map_queues_vidmem_enum {
+ vidmem___map_queues__uses_no_video_memory = 0,
+ vidmem___map_queues__uses_video_memory = 1 };
+
+enum _map_queues_alloc_format_enum {
+ alloc_format___map_queues__one_per_pipe = 0,
+ alloc_format___map_queues__all_on_one_pipe = 1 };
+
+enum _map_queues_engine_sel_enum {
+ engine_sel___map_queues__compute = 0,
+ engine_sel___map_queues__sdma0_queue = 2,
+ engine_sel___map_queues__sdma1_queue = 3 };
+
+struct pm4_map_queues {
+ union {
+ union PM4_TYPE_3_HEADER header;
+ unsigned int ordinal1;
+ };
+
+ union {
+ struct {
+ unsigned int reserved1:4;
+ enum _map_queues_queue_sel_enum queue_sel:2;
+ unsigned int reserved2:2;
+ unsigned int vmid:4;
+ unsigned int reserved3:4;
+ enum _map_queues_vidmem_enum vidmem:2;
+ unsigned int reserved4:6;
+ enum _map_queues_alloc_format_enum alloc_format:2;
+ enum _map_queues_engine_sel_enum engine_sel:3;
+ unsigned int num_queues:3;
+ } bitfields2;
+ unsigned int ordinal2;
+ };
+
+ struct {
+ union {
+ struct {
+ unsigned int reserved5:2;
+ unsigned int doorbell_offset:21;
+ unsigned int reserved6:3;
+ unsigned int queue:6;
+ } bitfields3;
+ unsigned int ordinal3;
+ };
+
+ unsigned int mqd_addr_lo;
+ unsigned int mqd_addr_hi;
+ unsigned int wptr_addr_lo;
+ unsigned int wptr_addr_hi;
+
+ } _map_queues_ordinals[1]; /* 1..N of these ordinal groups */
+
+};
+#endif
+
+/*
+ * --------------------_QUERY_STATUS--------------------
+ */
+
+#ifndef _PM4__QUERY_STATUS_DEFINED
+#define _PM4__QUERY_STATUS_DEFINED
+enum _query_status_interrupt_sel_enum {
+ interrupt_sel___query_status__completion_status = 0,
+ interrupt_sel___query_status__process_status = 1,
+ interrupt_sel___query_status__queue_status = 2,
+ interrupt_sel___query_status__reserved = 3 };
+
+enum _query_status_command_enum {
+ command___query_status__interrupt_only = 0,
+ command___query_status__fence_only_immediate = 1,
+ command___query_status__fence_only_after_write_ack = 2,
+ command___query_status__fence_wait_for_write_ack_send_interrupt = 3 };
+
+enum _query_status_engine_sel_enum {
+ engine_sel___query_status__compute = 0,
+ engine_sel___query_status__sdma0 = 2,
+ engine_sel___query_status__sdma1 = 3 };
+
+struct pm4_query_status {
+ union {
+ union PM4_TYPE_3_HEADER header;
+ unsigned int ordinal1;
+ };
+
+ union {
+ struct {
+ unsigned int context_id:28;
+ enum _query_status_interrupt_sel_enum interrupt_sel:2;
+ enum _query_status_command_enum command:2;
+ } bitfields2;
+ unsigned int ordinal2;
+ };
+
+ union {
+ struct {
+ unsigned int pasid:16;
+ unsigned int reserved1:16;
+ } bitfields3;
+ struct {
+ unsigned int reserved2:2;
+ unsigned int doorbell_offset:21;
+ unsigned int reserved3:3;
+ enum _query_status_engine_sel_enum engine_sel:3;
+ unsigned int reserved4:3;
+ } bitfields4;
+ unsigned int ordinal3;
+ };
+
+ unsigned int addr_lo;
+ unsigned int addr_hi;
+ unsigned int data_lo;
+ unsigned int data_hi;
+
+};
+#endif
+
+/*
+ * --------------------_UNMAP_QUEUES--------------------
+ */
+
+#ifndef _PM4__UNMAP_QUEUES_DEFINED
+#define _PM4__UNMAP_QUEUES_DEFINED
+enum _unmap_queues_action_enum {
+ action___unmap_queues__preempt_queues = 0,
+ action___unmap_queues__reset_queues = 1,
+ action___unmap_queues__disable_process_queues = 2,
+ action___unmap_queues__reserved = 3 };
+
+enum _unmap_queues_queue_sel_enum {
+ queue_sel___unmap_queues__perform_request_on_specified_queues = 0,
+ queue_sel___unmap_queues__perform_request_on_pasid_queues = 1,
+ queue_sel___unmap_queues__perform_request_on_all_active_queues = 2,
+ queue_sel___unmap_queues__reserved = 3 };
+
+enum _unmap_queues_engine_sel_enum {
+ engine_sel___unmap_queues__compute = 0,
+ engine_sel___unmap_queues__sdma0 = 2,
+ engine_sel___unmap_queues__sdma1 = 3 };
+
+struct pm4_unmap_queues {
+ union {
+ union PM4_TYPE_3_HEADER header;
+ unsigned int ordinal1;
+ };
+
+ union {
+ struct {
+ enum _unmap_queues_action_enum action:2;
+ unsigned int reserved1:2;
+ enum _unmap_queues_queue_sel_enum queue_sel:2;
+ unsigned int reserved2:20;
+ enum _unmap_queues_engine_sel_enum engine_sel:3;
+ unsigned int num_queues:3;
+ } bitfields2;
+ unsigned int ordinal2;
+ };
+
+ union {
+ struct {
+ unsigned int pasid:16;
+ unsigned int reserved3:16;
+ } bitfields3;
+ struct {
+ unsigned int reserved4:2;
+ unsigned int doorbell_offset0:21;
+ unsigned int reserved5:9;
+ } bitfields4;
+ unsigned int ordinal3;
+ };
+
+ union {
+ struct {
+ unsigned int reserved6:2;
+ unsigned int doorbell_offset1:21;
+ unsigned int reserved7:9;
+ } bitfields5;
+ unsigned int ordinal4;
+ };
+
+ union {
+ struct {
+ unsigned int reserved8:2;
+ unsigned int doorbell_offset2:21;
+ unsigned int reserved9:9;
+ } bitfields6;
+ unsigned int ordinal5;
+ };
+
+ union {
+ struct {
+ unsigned int reserved10:2;
+ unsigned int doorbell_offset3:21;
+ unsigned int reserved11:9;
+ } bitfields7;
+ unsigned int ordinal6;
+ };
+
+};
+#endif
+
+/*
+ * --------------------_SET_RESOURCES--------------------
+ */
+
+#ifndef _PM4__SET_RESOURCES_DEFINED
+#define _PM4__SET_RESOURCES_DEFINED
+enum _set_resources_queue_type_enum {
+ queue_type___set_resources__hsa_interface_queue_hiq = 1,
+ queue_type___set_resources__hsa_debug_interface_queue = 4 };
+
+struct pm4_set_resources {
+ union {
+ union PM4_TYPE_3_HEADER header;
+ unsigned int ordinal1;
+ };
+
+ union {
+ struct {
+
+ unsigned int vmid_mask:16;
+ unsigned int unmap_latency:8;
+ unsigned int reserved1:5;
+ enum _set_resources_queue_type_enum queue_type:3;
+ } bitfields2;
+ unsigned int ordinal2;
+ };
+
+ unsigned int queue_mask_lo;
+ unsigned int queue_mask_hi;
+ unsigned int gws_mask_lo;
+ unsigned int gws_mask_hi;
+
+ union {
+ struct {
+ unsigned int oac_mask:16;
+ unsigned int reserved2:16;
+ } bitfields3;
+ unsigned int ordinal7;
+ };
+
+ union {
+ struct {
+ unsigned int gds_heap_base:6;
+ unsigned int reserved3:5;
+ unsigned int gds_heap_size:6;
+ unsigned int reserved4:15;
+ } bitfields4;
+ unsigned int ordinal8;
+ };
+
+};
+#endif
+
+/*
+ * --------------------_RUN_LIST--------------------
+ */
+
+#ifndef _PM4__RUN_LIST_DEFINED
+#define _PM4__RUN_LIST_DEFINED
+
+struct pm4_runlist {
+ union {
+ union PM4_TYPE_3_HEADER header;
+ unsigned int ordinal1;
+ };
+
+ union {
+ struct {
+ unsigned int reserved1:2;
+ unsigned int ib_base_lo:30;
+ } bitfields2;
+ unsigned int ordinal2;
+ };
+
+ union {
+ struct {
+ unsigned int ib_base_hi:16;
+ unsigned int reserved2:16;
+ } bitfields3;
+ unsigned int ordinal3;
+ };
+
+ union {
+ struct {
+ unsigned int ib_size:20;
+ unsigned int chain:1;
+ unsigned int offload_polling:1;
+ unsigned int reserved3:1;
+ unsigned int valid:1;
+ unsigned int vmid:4;
+ unsigned int reserved4:4;
+ } bitfields4;
+ unsigned int ordinal4;
+ };
+
+};
+#endif
+
+/*
+ * --------------------_MAP_PROCESS--------------------
+ */
+
+#ifndef _PM4__MAP_PROCESS_DEFINED
+#define _PM4__MAP_PROCESS_DEFINED
+
+struct pm4_map_process {
+ union {
+ union PM4_TYPE_3_HEADER header;
+ unsigned int ordinal1;
+ };
+
+ union {
+ struct {
+ unsigned int pasid:16;
+ unsigned int reserved1:8;
+ unsigned int diq_enable:1;
+ unsigned int reserved2:7;
+ } bitfields2;
+ unsigned int ordinal2;
+ };
+
+ union {
+ struct {
+ unsigned int page_table_base:28;
+ unsigned int reserved3:4;
+ } bitfields3;
+ unsigned int ordinal3;
+ };
+
+ unsigned int sh_mem_bases;
+ unsigned int sh_mem_ape1_base;
+ unsigned int sh_mem_ape1_limit;
+ unsigned int sh_mem_config;
+ unsigned int gds_addr_lo;
+ unsigned int gds_addr_hi;
+
+ union {
+ struct {
+ unsigned int num_gws:6;
+ unsigned int reserved4:2;
+ unsigned int num_oac:4;
+ unsigned int reserved5:4;
+ unsigned int gds_size:6;
+ unsigned int reserved6:10;
+ } bitfields4;
+ unsigned int ordinal10;
+ };
+
+};
+#endif
+
+/*--------------------_MAP_QUEUES--------------------*/
+
+#ifndef _PM4__MAP_QUEUES_DEFINED
+#define _PM4__MAP_QUEUES_DEFINED
+enum _MAP_QUEUES_queue_sel_enum {
+ queue_sel___map_queues__map_to_specified_queue_slots = 0,
+ queue_sel___map_queues__map_to_hws_determined_queue_slots = 1,
+ queue_sel___map_queues__enable_process_queues = 2,
+ queue_sel___map_queues__reserved = 3 };
+
+enum _MAP_QUEUES_vidmem_enum {
+ vidmem___map_queues__uses_no_video_memory = 0,
+ vidmem___map_queues__uses_video_memory = 1 };
+
+enum _MAP_QUEUES_alloc_format_enum {
+ alloc_format___map_queues__one_per_pipe = 0,
+ alloc_format___map_queues__all_on_one_pipe = 1 };
+
+enum _MAP_QUEUES_engine_sel_enum {
+ engine_sel___map_queues__compute = 0,
+ engine_sel___map_queues__sdma0_queue = 2,
+ engine_sel___map_queues__sdma1_queue = 3 };
+
+
+struct _PM4__MAP_QUEUES {
+ union {
+ PM4_TYPE_3_HEADER header;
+ unsigned int ordinal1;
+ };
+
+ union {
+ struct {
+ unsigned int reserved1:4;
+ enum _MAP_QUEUES_queue_sel_enum queue_sel:2;
+ unsigned int reserved2:2;
+ unsigned int vmid:4;
+ unsigned int reserved3:4;
+ enum _MAP_QUEUES_vidmem_enum vidmem:2;
+ unsigned int reserved4:6;
+ enum _MAP_QUEUES_alloc_format_enum alloc_format:2;
+ enum _MAP_QUEUES_engine_sel_enum engine_sel:3;
+ unsigned int num_queues:3;
+ } bitfields2;
+ unsigned int ordinal2;
+ };
+
+ struct {
+ union {
+ struct {
+ unsigned int reserved5:2;
+ unsigned int doorbell_offset:21;
+ unsigned int reserved6:3;
+ unsigned int queue:6;
+ } bitfields3;
+ unsigned int ordinal3;
+ };
+
+ unsigned int mqd_addr_lo;
+
+ unsigned int mqd_addr_hi;
+
+ unsigned int wptr_addr_lo;
+
+ unsigned int wptr_addr_hi;
+
+ } _map_queues_ordinals[1]; /* 1..N of these ordinal groups */
+
+};
+#endif
+
+/*--------------------_QUERY_STATUS--------------------*/
+
+#ifndef _PM4__QUERY_STATUS_DEFINED
+#define _PM4__QUERY_STATUS_DEFINED
+enum _QUERY_STATUS_interrupt_sel_enum {
+ interrupt_sel___query_status__completion_status = 0,
+ interrupt_sel___query_status__process_status = 1,
+ interrupt_sel___query_status__queue_status = 2,
+ interrupt_sel___query_status__reserved = 3 };
+
+enum _QUERY_STATUS_command_enum {
+ command___query_status__interrupt_only = 0,
+ command___query_status__fence_only_immediate = 1,
+ command___query_status__fence_only_after_write_ack = 2,
+ command___query_status__fence_wait_for_write_ack_send_interrupt = 3 };
+
+enum _QUERY_STATUS_engine_sel_enum {
+ engine_sel___query_status__compute = 0,
+ engine_sel___query_status__sdma0 = 2,
+ engine_sel___query_status__sdma1 = 3 };
+
+
+struct _PM4__QUERY_STATUS {
+ union {
+ PM4_TYPE_3_HEADER header;
+ unsigned int ordinal1;
+ };
+
+ union {
+ struct {
+ unsigned int context_id:28;
+ enum _QUERY_STATUS_interrupt_sel_enum interrupt_sel:2;
+ enum _QUERY_STATUS_command_enum command:2;
+ } bitfields2;
+ unsigned int ordinal2;
+ };
+
+ union {
+ struct {
+ unsigned int pasid:16;
+ unsigned int reserved1:16;
+ } bitfields3;
+ struct {
+ unsigned int reserved2:2;
+ unsigned int doorbell_offset:21;
+ unsigned int reserved3:3;
+ enum _QUERY_STATUS_engine_sel_enum engine_sel:3;
+ unsigned int reserved4:3;
+ } bitfields4;
+ unsigned int ordinal3;
+ };
+
+ unsigned int addr_lo;
+
+ unsigned int addr_hi;
+
+ unsigned int data_lo;
+
+ unsigned int data_hi;
+
+};
+#endif
+
+/*
+ * --------------------UNMAP_QUEUES--------------------
+ */
+
+#ifndef _PM4__UNMAP_QUEUES_DEFINED
+#define _PM4__UNMAP_QUEUES_DEFINED
+enum _unmap_queues_action_enum {
+ action___unmap_queues__preempt_queues = 0,
+ action___unmap_queues__reset_queues = 1,
+ action___unmap_queues__disable_process_queues = 2,
+ action___unmap_queues__reserved = 3 };
+
+enum _unmap_queues_queue_sel_enum {
+ queue_sel___unmap_queues__perform_request_on_specified_queues = 0,
+ queue_sel___unmap_queues__perform_request_on_pasid_queues = 1,
+ queue_sel___unmap_queues__perform_request_on_all_active_queues = 2,
+ queue_sel___unmap_queues__reserved = 3 };
+
+enum _unmap_queues_engine_sel_enum {
+ engine_sel___unmap_queues__compute = 0,
+ engine_sel___unmap_queues__sdma0 = 2,
+ engine_sel___unmap_queues__sdma1 = 3 };
+
+
+struct pm4_unmap_queues {
+ union {
+ PM4_TYPE_3_HEADER header;
+ unsigned int ordinal1;
+ };
+
+ union {
+ struct {
+ _unmap_queues_action_enum action:2;
+ unsigned int reserved1:2;
+
+ _unmap_queues_queue_sel_enum queue_sel:2;
+ unsigned int reserved2:20;
+
+ _unmap_queues_engine_sel_enum engine_sel:3;
+ unsigned int num_queues:3;
+ } bitfields2;
+ unsigned int ordinal2;
+ };
+
+ union {
+ struct {
+ unsigned int pasid:16;
+ unsigned int reserved3:16;
+ } bitfields3;
+ struct {
+ unsigned int reserved4:2;
+ unsigned int doorbell_offset0:21;
+ unsigned int reserved5:9;
+ } bitfields4;
+ unsigned int ordinal3;
+ };
+
+ union {
+ struct {
+ unsigned int reserved6:2;
+ unsigned int doorbell_offset1:21;
+ unsigned int reserved7:9;
+ } bitfields5;
+ unsigned int ordinal4;
+ };
+
+ union {
+ struct {
+ unsigned int reserved8:2;
+ unsigned int doorbell_offset2:21;
+ unsigned int reserved9:9;
+ } bitfields6;
+ unsigned int ordinal5;
+ };
+
+ union {
+ struct {
+ unsigned int reserved10:2;
+ unsigned int doorbell_offset3:21;
+ unsigned int reserved11:9;
+ } bitfields7;
+ unsigned int ordinal6;
+ };
+
+};
+#endif
+
+/* --------------------_SET_SH_REG--------------------*/
+
+#ifndef _PM4__SET_SH_REG_DEFINED
+#define _PM4__SET_SH_REG_DEFINED
+
+struct _PM4__SET_SH_REG {
+ union {
+ union PM4_TYPE_3_HEADER header;
+ unsigned int ordinal1;
+ };
+
+ union {
+ struct {
+ unsigned int reg_offset:16;
+ unsigned int reserved1:8;
+ unsigned int vmid_shift:5;
+ unsigned int insert_vmid:1;
+ unsigned int reserved2:1;
+ unsigned int non_incr_addr:1;
+ } bitfields2;
+ unsigned int ordinal2;
+ };
+
+ unsigned int reg_data[1]; /* 1..N of these fields */
+
+};
+#endif
+
+/*--------------------_SET_CONFIG_REG--------------------*/
+
+#ifndef _PM4__SET_CONFIG_REG_DEFINED
+#define _PM4__SET_CONFIG_REG_DEFINED
+
+struct pm4__set_config_reg {
+ union {
+ union PM4_TYPE_3_HEADER header;
+ unsigned int ordinal1;
+ };
+
+ union {
+ struct {
+ unsigned int reg_offset:16;
+ unsigned int reserved1:8;
+ unsigned int vmid_shift:5;
+ unsigned int insert_vmid:1;
+ unsigned int reserved2:2;
+ } bitfields2;
+ unsigned int ordinal2;
+ };
+
+ unsigned int reg_data[1]; /* 1..N of these fields */
+
+};
+#endif
+
+#endif /* KFD_PM4_HEADERS_H_ */
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h b/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h
new file mode 100644
index 0000000..b72fa3b
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h
@@ -0,0 +1,107 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+
+#ifndef KFD_PM4_OPCODES_H
+#define KFD_PM4_OPCODES_H
+
+enum it_opcode_type {
+ IT_NOP = 0x10,
+ IT_SET_BASE = 0x11,
+ IT_CLEAR_STATE = 0x12,
+ IT_INDEX_BUFFER_SIZE = 0x13,
+ IT_DISPATCH_DIRECT = 0x15,
+ IT_DISPATCH_INDIRECT = 0x16,
+ IT_ATOMIC_GDS = 0x1D,
+ IT_OCCLUSION_QUERY = 0x1F,
+ IT_SET_PREDICATION = 0x20,
+ IT_REG_RMW = 0x21,
+ IT_COND_EXEC = 0x22,
+ IT_PRED_EXEC = 0x23,
+ IT_DRAW_INDIRECT = 0x24,
+ IT_DRAW_INDEX_INDIRECT = 0x25,
+ IT_INDEX_BASE = 0x26,
+ IT_DRAW_INDEX_2 = 0x27,
+ IT_CONTEXT_CONTROL = 0x28,
+ IT_INDEX_TYPE = 0x2A,
+ IT_DRAW_INDIRECT_MULTI = 0x2C,
+ IT_DRAW_INDEX_AUTO = 0x2D,
+ IT_NUM_INSTANCES = 0x2F,
+ IT_DRAW_INDEX_MULTI_AUTO = 0x30,
+ IT_INDIRECT_BUFFER_CNST = 0x33,
+ IT_STRMOUT_BUFFER_UPDATE = 0x34,
+ IT_DRAW_INDEX_OFFSET_2 = 0x35,
+ IT_DRAW_PREAMBLE = 0x36,
+ IT_WRITE_DATA = 0x37,
+ IT_DRAW_INDEX_INDIRECT_MULTI = 0x38,
+ IT_MEM_SEMAPHORE = 0x39,
+ IT_COPY_DW = 0x3B,
+ IT_WAIT_REG_MEM = 0x3C,
+ IT_INDIRECT_BUFFER = 0x3F,
+ IT_COPY_DATA = 0x40,
+ IT_PFP_SYNC_ME = 0x42,
+ IT_SURFACE_SYNC = 0x43,
+ IT_COND_WRITE = 0x45,
+ IT_EVENT_WRITE = 0x46,
+ IT_EVENT_WRITE_EOP = 0x47,
+ IT_EVENT_WRITE_EOS = 0x48,
+ IT_RELEASE_MEM = 0x49,
+ IT_PREAMBLE_CNTL = 0x4A,
+ IT_DMA_DATA = 0x50,
+ IT_ACQUIRE_MEM = 0x58,
+ IT_REWIND = 0x59,
+ IT_LOAD_UCONFIG_REG = 0x5E,
+ IT_LOAD_SH_REG = 0x5F,
+ IT_LOAD_CONFIG_REG = 0x60,
+ IT_LOAD_CONTEXT_REG = 0x61,
+ IT_SET_CONFIG_REG = 0x68,
+ IT_SET_CONTEXT_REG = 0x69,
+ IT_SET_CONTEXT_REG_INDIRECT = 0x73,
+ IT_SET_SH_REG = 0x76,
+ IT_SET_SH_REG_OFFSET = 0x77,
+ IT_SET_QUEUE_REG = 0x78,
+ IT_SET_UCONFIG_REG = 0x79,
+ IT_SCRATCH_RAM_WRITE = 0x7D,
+ IT_SCRATCH_RAM_READ = 0x7E,
+ IT_LOAD_CONST_RAM = 0x80,
+ IT_WRITE_CONST_RAM = 0x81,
+ IT_DUMP_CONST_RAM = 0x83,
+ IT_INCREMENT_CE_COUNTER = 0x84,
+ IT_INCREMENT_DE_COUNTER = 0x85,
+ IT_WAIT_ON_CE_COUNTER = 0x86,
+ IT_WAIT_ON_DE_COUNTER_DIFF = 0x88,
+ IT_SWITCH_BUFFER = 0x8B,
+ IT_SET_RESOURCES = 0xA0,
+ IT_MAP_PROCESS = 0xA1,
+ IT_MAP_QUEUES = 0xA2,
+ IT_UNMAP_QUEUES = 0xA3,
+ IT_QUERY_STATUS = 0xA4,
+ IT_RUN_LIST = 0xA5,
+};
+
+#define PM4_TYPE_0 0
+#define PM4_TYPE_2 2
+#define PM4_TYPE_3 3
+
+#endif /* KFD_PM4_OPCODES_H */
+
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
index 76494757..25f23c5 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
@@ -49,6 +49,15 @@
#define KFD_MMAP_DOORBELL_START (((1ULL << 32)*1) >> PAGE_SHIFT)
#define KFD_MMAP_DOORBELL_END (((1ULL << 32)*2) >> PAGE_SHIFT)
+/*
+ * When working with the cp scheduler, we should assign the HIQ manually or via the radeon
+ * driver to a fixed hqd slot; these are the fixed HIQ hqd slot definitions for Kaveri.
+ * In Kaveri, only the first ME's queues participate in the cp scheduling; with that in mind,
+ * we set the HIQ slot in the second ME.
+ */
+#define KFD_CIK_HIQ_PIPE 4
+#define KFD_CIK_HIQ_QUEUE 0
+
/* GPU ID hash width in bits */
#define KFD_GPU_ID_HASH_WIDTH 16
@@ -68,6 +77,11 @@ typedef u32 doorbell_t;
/* Type that represents queue pointer */
typedef u32 qptr_t;
+enum cache_policy {
+ cache_policy_coherent,
+ cache_policy_noncoherent
+};
+
struct kfd_device_info {
const struct kfd_scheduler_class *scheduler_class;
unsigned int max_pasid_bits;
@@ -96,6 +110,9 @@ struct kfd_dev {
u32 __iomem *doorbell_kernel_ptr; /* this is a pointer for a doorbells page used by kernel queue */
struct kgd2kfd_shared_resources shared_resources;
+
+ /* QCM Device instance */
+ struct device_queue_manager *dqm;
};
/* KGD2KFD callbacks */
@@ -300,4 +317,19 @@ int kgd2kfd_resume(struct kfd_dev *dev);
/* amdkfd Apertures */
int kfd_init_apertures(struct kfd_process *process);
+/* Queue Context Management */
+int init_queue(struct queue **q, struct queue_properties properties);
+void uninit_queue(struct queue *q);
+void print_queue(struct queue *q);
+
+/* Packet Manager */
+
+struct packet_manager {
+ struct device_queue_manager *dqm;
+ struct kernel_queue *priv_queue;
+ struct mutex lock;
+ bool allocated;
+ kfd_mem_obj ib_buffer_obj;
+};
+
#endif
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* Re: [PATCH v2 15/25] amdkfd: Add kernel queue module
2014-07-17 13:29 ` [PATCH v2 15/25] amdkfd: Add kernel queue module Oded Gabbay
@ 2014-07-21 2:42 ` Jerome Glisse
2014-07-27 11:05 ` Oded Gabbay
0 siblings, 1 reply; 46+ messages in thread
From: Jerome Glisse @ 2014-07-21 2:42 UTC (permalink / raw)
To: Oded Gabbay
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, dri-devel,
Evgeny Pinchuk, Alexey Skidanov, Alex Deucher, Andrew Morton,
Christian König
On Thu, Jul 17, 2014 at 04:29:22PM +0300, Oded Gabbay wrote:
> From: Ben Goz <ben.goz@amd.com>
>
> The kernel queue module enables the amdkfd to establish kernel queues, not exposed to user space.
>
> The kernel queues are used for HIQ (HSA Interface Queue) and DIQ (Debug Interface Queue) operations
>
> Signed-off-by: Ben Goz <ben.goz@amd.com>
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> ---
> drivers/gpu/drm/radeon/amdkfd/Makefile | 3 +-
> .../drm/radeon/amdkfd/kfd_device_queue_manager.h | 101 +++
> drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c | 305 +++++++++
> drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h | 66 ++
> drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h | 682 +++++++++++++++++++++
> drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h | 107 ++++
> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 32 +
> 7 files changed, 1295 insertions(+), 1 deletion(-)
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_device_queue_manager.h
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h
> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h
>
> diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
> index b5201f4..bead1be 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/Makefile
> +++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
> @@ -6,6 +6,7 @@ ccflags-y := -Iinclude/drm
>
> amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
> kfd_pasid.o kfd_doorbell.o kfd_vidmem.o kfd_aperture.o \
> - kfd_process.o kfd_queue.o kfd_mqd_manager.o
> + kfd_process.o kfd_queue.o kfd_mqd_manager.o \
> + kfd_kernel_queue.o
>
> obj-$(CONFIG_HSA_RADEON) += amdkfd.o
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/radeon/amdkfd/kfd_device_queue_manager.h
> new file mode 100644
> index 0000000..037eaf8
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_device_queue_manager.h
> @@ -0,0 +1,101 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef KFD_DEVICE_QUEUE_MANAGER_H_
> +#define KFD_DEVICE_QUEUE_MANAGER_H_
> +
> +#include <linux/rwsem.h>
> +#include <linux/list.h>
> +#include "kfd_priv.h"
> +#include "kfd_mqd_manager.h"
> +
> +#define QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS (500)
> +#define QUEUES_PER_PIPE (8)
> +#define PIPE_PER_ME_CP_SCHEDULING (3)
> +#define CIK_VMID_NUM (8)
> +#define KFD_VMID_START_OFFSET (8)
> +#define VMID_PER_DEVICE CIK_VMID_NUM
> +#define KFD_DQM_FIRST_PIPE (0)
> +
> +struct device_process_node {
> + struct qcm_process_device *qpd;
> + struct list_head list;
> +};
> +
> +struct device_queue_manager {
> + int (*create_queue)(struct device_queue_manager *dqm,
> + struct queue *q,
> + struct qcm_process_device *qpd,
> + int *allocate_vmid);
> + int (*destroy_queue)(struct device_queue_manager *dqm,
> + struct qcm_process_device *qpd,
> + struct queue *q);
> + int (*update_queue)(struct device_queue_manager *dqm,
> + struct queue *q);
> + int (*destroy_queues)(struct device_queue_manager *dqm);
> + struct mqd_manager * (*get_mqd_manager)(struct device_queue_manager *dqm,
> + enum KFD_MQD_TYPE type);
> + int (*execute_queues)(struct device_queue_manager *dqm);
> + int (*register_process)(struct device_queue_manager *dqm,
> + struct qcm_process_device *qpd);
> + int (*unregister_process)(struct device_queue_manager *dqm,
> + struct qcm_process_device *qpd);
> + int (*initialize)(struct device_queue_manager *dqm);
> + int (*start)(struct device_queue_manager *dqm);
> + int (*stop)(struct device_queue_manager *dqm);
> + void (*uninitialize)(struct device_queue_manager *dqm);
> + int (*create_kernel_queue)(struct device_queue_manager *dqm,
> + struct kernel_queue *kq,
> + struct qcm_process_device *qpd);
> + void (*destroy_kernel_queue)(struct device_queue_manager *dqm,
> + struct kernel_queue *kq,
> + struct qcm_process_device *qpd);
> + bool (*set_cache_memory_policy)(struct device_queue_manager *dqm,
> + struct qcm_process_device *qpd,
> + enum cache_policy default_policy,
> + enum cache_policy alternate_policy,
> + void __user *alternate_aperture_base,
> + uint64_t alternate_aperture_size);
> +
> +
> + struct mqd_manager *mqds[KFD_MQD_TYPE_MAX];
> + struct packet_manager packets;
> + struct kfd_dev *dev;
> + struct mutex lock;
> + struct list_head queues;
> + unsigned int processes_count;
> + unsigned int queue_count;
> + unsigned int next_pipe_to_allocate;
> + unsigned int *allocated_queues;
> + unsigned int vmid_bitmap;
> + uint64_t pipelines_addr;
> + kfd_mem_obj pipeline_mem;
> + uint64_t fence_gpu_addr;
> + unsigned int *fence_addr;
> + kfd_mem_obj fence_mem;
> + bool active_runlist;
> +};
> +
> +
> +
> +#endif /* KFD_DEVICE_QUEUE_MANAGER_H_ */
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c
> new file mode 100644
> index 0000000..b212524
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c
> @@ -0,0 +1,305 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#include <linux/types.h>
> +#include <linux/mutex.h>
> +#include <linux/slab.h>
> +#include <linux/printk.h>
> +#include "kfd_kernel_queue.h"
> +#include "kfd_priv.h"
> +#include "kfd_device_queue_manager.h"
> +#include "kfd_pm4_headers.h"
> +#include "kfd_pm4_opcodes.h"
> +
> +#define PM4_COUNT_ZERO (((1 << 15) - 1) << 16)
> +
> +static bool initialize(struct kernel_queue *kq, struct kfd_dev *dev,
> + enum kfd_queue_type type, unsigned int queue_size)
> +{
> + struct queue_properties prop;
> + int retval;
> + union PM4_TYPE_3_HEADER nop;
> +
> + BUG_ON(!kq || !dev);
> + BUG_ON(type != KFD_QUEUE_TYPE_DIQ && type != KFD_QUEUE_TYPE_HIQ);
> +
> + pr_debug("kfd: In func %s initializing queue type %d size %d\n", __func__, KFD_QUEUE_TYPE_HIQ, queue_size);
> +
> + nop.opcode = IT_NOP;
> + nop.type = PM4_TYPE_3;
> + nop.u32all |= PM4_COUNT_ZERO;
> +
> + kq->dev = dev;
> + kq->nop_packet = nop.u32all;
> + switch (type) {
> + case KFD_QUEUE_TYPE_DIQ:
> + case KFD_QUEUE_TYPE_HIQ:
> + kq->mqd = dev->dqm->get_mqd_manager(dev->dqm, KFD_MQD_TYPE_CIK_HIQ);
> + break;
> + default:
> + BUG();
> + break;
> + }
> +
> + if (kq->mqd == NULL)
> + return false;
> +
> + prop.doorbell_ptr = (qptr_t *)kfd_get_kernel_doorbell(dev, &prop.doorbell_off);
> + if (prop.doorbell_ptr == NULL)
> + goto err_get_kernel_doorbell;
> +
> + retval = kfd_vidmem_alloc_map(dev, &kq->pq, (void **)&kq->pq_kernel_addr, &kq->pq_gpu_addr, queue_size);
> + if (retval != 0)
> + goto err_pq_allocate_vidmem;
> +
> + retval = kfd_vidmem_alloc_map(kq->dev, &kq->rptr_mem, (void **)&kq->rptr_kernel, &kq->rptr_gpu_addr,
> + sizeof(*kq->rptr_kernel));
> + if (retval != 0)
> + goto err_rptr_allocate_vidmem;
> +
> + retval = kfd_vidmem_alloc_map(kq->dev, &kq->wptr_mem, (void **)&kq->wptr_kernel, &kq->wptr_gpu_addr,
> + sizeof(*kq->rptr_kernel));
> + if (retval != 0)
> + goto err_wptr_allocate_vidmem;
> +
> + prop.queue_size = queue_size;
> + prop.is_interop = false;
> + prop.priority = 1;
> + prop.queue_percent = 100;
> + prop.type = type;
> + prop.vmid = 0;
> + prop.queue_address = kq->pq_gpu_addr;
> + prop.read_ptr = (qptr_t *) kq->rptr_gpu_addr;
> + prop.write_ptr = (qptr_t *) kq->wptr_gpu_addr;
> +
> + if (init_queue(&kq->queue, prop) != 0)
> + goto err_init_queue;
> +
> + kq->queue->device = dev;
> + kq->queue->process = kfd_get_process(current);
> +
> + retval = kq->mqd->init_mqd(kq->mqd, &kq->queue->mqd, &kq->queue->mqd_mem_obj,
> + &kq->queue->gart_mqd_addr, &kq->queue->properties);
> + if (retval != 0)
> + goto err_init_mqd;
> +
> + /* assign HIQ to HQD */
> + if (type == KFD_QUEUE_TYPE_HIQ) {
> + pr_debug("assigning hiq to hqd\n");
> + kq->queue->pipe = KFD_CIK_HIQ_PIPE;
> + kq->queue->queue = KFD_CIK_HIQ_QUEUE;
> + kq->mqd->load_mqd(kq->mqd, kq->queue->mqd, kq->queue->pipe, kq->queue->queue, NULL);
> + } else {
> + /* allocate fence for DIQ */
> + retval = kfd_vidmem_alloc_map(
> + dev,
> + &kq->fence_mem_obj,
> + &kq->fence_kernel_address,
> + &kq->fence_gpu_addr,
> + sizeof(uint32_t));
> +
> + if (retval != 0)
> + goto err_alloc_fence;
> + }
> +
> + print_queue(kq->queue);
> +
> + return true;
> +err_alloc_fence:
> +err_init_mqd:
> + uninit_queue(kq->queue);
> +err_init_queue:
> + kfd_vidmem_free_unmap(kq->dev, kq->wptr_mem);
> +err_wptr_allocate_vidmem:
> + kfd_vidmem_free_unmap(kq->dev, kq->rptr_mem);
> +err_rptr_allocate_vidmem:
> + kfd_vidmem_free_unmap(kq->dev, kq->pq);
> +err_pq_allocate_vidmem:
> + pr_err("kfd: error init pq\n");
> + kfd_release_kernel_doorbell(dev, (u32 *)prop.doorbell_ptr);
> +err_get_kernel_doorbell:
> + pr_err("kfd: error init doorbell");
> + return false;
> +
> +}
> +
> +static void uninitialize(struct kernel_queue *kq)
> +{
> + BUG_ON(!kq);
> +
> + if (kq->queue->properties.type == KFD_QUEUE_TYPE_HIQ)
> + kq->mqd->destroy_mqd(kq->mqd,
> + false,
> + QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS,
> + kq->queue->pipe,
> + kq->queue->queue);
> +
> + kfd_vidmem_free_unmap(kq->dev, kq->rptr_mem);
> + kfd_vidmem_free_unmap(kq->dev, kq->wptr_mem);
> + kfd_vidmem_free_unmap(kq->dev, kq->pq);
> + kfd_release_kernel_doorbell(kq->dev, (u32 *)kq->queue->properties.doorbell_ptr);
> + uninit_queue(kq->queue);
> +}
> +
> +static int acquire_packet_buffer(struct kernel_queue *kq,
> + size_t packet_size_in_dwords, unsigned int **buffer_ptr)
> +{
> + size_t available_size;
> + size_t queue_size_dwords;
> + qptr_t wptr, rptr;
> + unsigned int *queue_address;
> +
> + BUG_ON(!kq || !buffer_ptr);
> +
> + rptr = *kq->rptr_kernel;
> + wptr = *kq->wptr_kernel;
> + queue_address = (unsigned int *)kq->pq_kernel_addr;
> + queue_size_dwords = kq->queue->properties.queue_size / sizeof(uint32_t);
> +
> + pr_debug("kfd: In func %s\nrptr: %d\nwptr: %d\nqueue_address 0x%p\n", __func__, rptr, wptr, queue_address);
> +
> + available_size = (rptr - 1 - wptr + queue_size_dwords) % queue_size_dwords;
> +
> + if (packet_size_in_dwords >= queue_size_dwords ||
> + packet_size_in_dwords >= available_size)
> + return -ENOMEM;
> +
> + if (wptr + packet_size_in_dwords > queue_size_dwords) {
> + while (wptr > 0) {
> + queue_address[wptr] = kq->nop_packet;
> + wptr = (wptr + 1) % queue_size_dwords;
> + }
> + }
> +
> + *buffer_ptr = &queue_address[wptr];
> + kq->pending_wptr = wptr + packet_size_in_dwords;
> +
> + return 0;
> +}
> +
> +static void submit_packet(struct kernel_queue *kq)
> +{
> +#ifdef DEBUG
> + int i;
> +#endif
> +
> + BUG_ON(!kq);
> +
> +#ifdef DEBUG
> + for (i = *kq->wptr_kernel; i < kq->pending_wptr; i++) {
> + pr_debug("0x%2X ", kq->pq_kernel_addr[i]);
> + if (i % 15 == 0)
> + pr_debug("\n");
> + }
> + pr_debug("\n");
> +#endif
> +
> + *kq->wptr_kernel = kq->pending_wptr;
> + write_kernel_doorbell((u32 *)kq->queue->properties.doorbell_ptr, kq->pending_wptr);
> +}
> +
> +static int sync_with_hw(struct kernel_queue *kq, unsigned long timeout_ms)
> +{
> + unsigned long org_timeout_ms;
> +
> + BUG_ON(!kq);
> +
> + org_timeout_ms = timeout_ms;
> + timeout_ms += jiffies * 1000 / HZ;
> + while (*kq->wptr_kernel != *kq->rptr_kernel) {
I am not a fan of this kind of busy wait, even with the cpu_relax below. Won't
there be some interrupt you can wait on (signaled through a wait queue, perhaps)?
> + if (time_after(jiffies * 1000 / HZ, timeout_ms)) {
> + pr_err("kfd: kernel_queue %s timeout expired %lu\n",
> + __func__, org_timeout_ms);
> + pr_err("kfd: wptr: %d rptr: %d\n",
> + *kq->wptr_kernel, *kq->rptr_kernel);
> + return -ETIME;
> + }
> + cpu_relax();
> + }
> +
> + return 0;
> +}
> +
> +static void rollback_packet(struct kernel_queue *kq)
> +{
> + BUG_ON(!kq);
> + kq->pending_wptr = *kq->queue->properties.write_ptr;
> +}
> +
> +struct kernel_queue *kernel_queue_init(struct kfd_dev *dev, enum kfd_queue_type type)
> +{
> + struct kernel_queue *kq;
> +
> + BUG_ON(!dev);
> +
> + kq = kzalloc(sizeof(struct kernel_queue), GFP_KERNEL);
> + if (!kq)
> + return NULL;
> +
> + kq->initialize = initialize;
> + kq->uninitialize = uninitialize;
> + kq->acquire_packet_buffer = acquire_packet_buffer;
> + kq->submit_packet = submit_packet;
> + kq->sync_with_hw = sync_with_hw;
> + kq->rollback_packet = rollback_packet;
> +
> + if (kq->initialize(kq, dev, type, 2048) == false) {
> + pr_err("kfd: failed to init kernel queue\n");
> + kfree(kq);
> + return NULL;
> + }
> + return kq;
> +}
> +
> +void kernel_queue_uninit(struct kernel_queue *kq)
> +{
> + BUG_ON(!kq);
> +
> + kq->uninitialize(kq);
> + kfree(kq);
> +}
> +
> +void test_kq(struct kfd_dev *dev)
> +{
> + struct kernel_queue *kq;
> + uint32_t *buffer, i;
> + int retval;
> +
> + BUG_ON(!dev);
> +
> + pr_debug("kfd: starting kernel queue test\n");
> +
> + kq = kernel_queue_init(dev, KFD_QUEUE_TYPE_HIQ);
> + BUG_ON(!kq);
> +
> + retval = kq->acquire_packet_buffer(kq, 5, &buffer);
> + BUG_ON(retval != 0);
> + for (i = 0; i < 5; i++)
> + buffer[i] = kq->nop_packet;
> + kq->submit_packet(kq);
> + kq->sync_with_hw(kq, 1000);
> +
> + pr_debug("kfd: ending kernel queue test\n");
> +}
> +
> +
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h b/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h
> new file mode 100644
> index 0000000..abfb9c8
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h
> @@ -0,0 +1,66 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef KFD_KERNEL_QUEUE_H_
> +#define KFD_KERNEL_QUEUE_H_
> +
> +#include <linux/list.h>
> +#include <linux/types.h>
> +#include "kfd_priv.h"
> +
> +struct kernel_queue {
> + /* interface */
> + bool (*initialize)(struct kernel_queue *kq, struct kfd_dev *dev,
> + enum kfd_queue_type type, unsigned int queue_size);
> + void (*uninitialize)(struct kernel_queue *kq);
> + int (*acquire_packet_buffer)(struct kernel_queue *kq,
> + size_t packet_size_in_dwords, unsigned int **buffer_ptr);
> + void (*submit_packet)(struct kernel_queue *kq);
> + int (*sync_with_hw)(struct kernel_queue *kq, unsigned long timeout_ms);
> + void (*rollback_packet)(struct kernel_queue *kq);
> +
> + /* data */
> + struct kfd_dev *dev;
> + struct mqd_manager *mqd;
> + struct queue *queue;
> + qptr_t pending_wptr;
> + unsigned int nop_packet;
> +
> + kfd_mem_obj rptr_mem;
> + qptr_t *rptr_kernel;
> + uint64_t rptr_gpu_addr;
> + kfd_mem_obj wptr_mem;
> + qptr_t *wptr_kernel;
> + uint64_t wptr_gpu_addr;
> + kfd_mem_obj pq;
> + uint64_t pq_gpu_addr;
> + qptr_t *pq_kernel_addr;
> +
> + kfd_mem_obj fence_mem_obj;
> + uint64_t fence_gpu_addr;
> + void *fence_kernel_address;
> +
> + struct list_head list;
> +};
> +
> +#endif /* KFD_KERNEL_QUEUE_H_ */
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h b/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h
> new file mode 100644
> index 0000000..95e46f8
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h
> @@ -0,0 +1,682 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#ifndef KFD_PM4_HEADERS_H_
> +#define KFD_PM4_HEADERS_H_
> +
> +#ifndef PM4_HEADER_DEFINED
> +#define PM4_HEADER_DEFINED
> +
> +union PM4_TYPE_3_HEADER {
> + struct {
> + unsigned int predicate:1; /* < 0 for diq packets */
> + unsigned int shader_type:1; /* < 0 for diq packets */
> + unsigned int reserved1:6; /* < reserved */
> + unsigned int opcode:8; /* < IT opcode */
> + unsigned int count:14; /* < number of DWORDs - 1 in the information body. */
> + unsigned int type:2; /* < packet identifier. It should be 3 for type 3 packets */
> + };
> + unsigned int u32all;
> +};
Do not build packets that way; this will be broken on PPC. You might
not care, but we try to be endian-safe. Refer to radeon for how to
build packets. So the whole union stuff below is broken.
> +#endif
> +
> +/*
> + * --------------------_MAP_QUEUES--------------------
> + */
> +
> +#ifndef _PM4__MAP_QUEUES_DEFINED
> +#define _PM4__MAP_QUEUES_DEFINED
> +enum _map_queues_queue_sel_enum {
> + queue_sel___map_queues__map_to_specified_queue_slots = 0,
> + queue_sel___map_queues__map_to_hws_determined_queue_slots = 1,
> + queue_sel___map_queues__enable_process_queues = 2,
> + queue_sel___map_queues__reserved = 3 };
> +
> +enum _map_queues_vidmem_enum {
> + vidmem___map_queues__uses_no_video_memory = 0,
> + vidmem___map_queues__uses_video_memory = 1 };
> +
> +enum _map_queues_alloc_format_enum {
> + alloc_format___map_queues__one_per_pipe = 0,
> + alloc_format___map_queues__all_on_one_pipe = 1 };
> +
> +enum _map_queues_engine_sel_enum {
> + engine_sel___map_queues__compute = 0,
> + engine_sel___map_queues__sdma0_queue = 2,
> + engine_sel___map_queues__sdma1_queue = 3 };
> +
> +struct pm4_map_queues {
> + union {
> + union PM4_TYPE_3_HEADER header;
> + unsigned int ordinal1;
> + };
> +
> + union {
> + struct {
> + unsigned int reserved1:4;
> + enum _map_queues_queue_sel_enum queue_sel:2;
> + unsigned int reserved2:2;
> + unsigned int vmid:4;
> + unsigned int reserved3:4;
> + enum _map_queues_vidmem_enum vidmem:2;
> + unsigned int reserved4:6;
> + enum _map_queues_alloc_format_enum alloc_format:2;
> + enum _map_queues_engine_sel_enum engine_sel:3;
> + unsigned int num_queues:3;
> + } bitfields2;
> + unsigned int ordinal2;
> + };
> +
> + struct {
> + union {
> + struct {
> + unsigned int reserved5:2;
> + unsigned int doorbell_offset:21;
> + unsigned int reserved6:3;
> + unsigned int queue:6;
> + } bitfields3;
> + unsigned int ordinal3;
> + };
> +
> + unsigned int mqd_addr_lo;
> + unsigned int mqd_addr_hi;
> + unsigned int wptr_addr_lo;
> + unsigned int wptr_addr_hi;
> +
> + } _map_queues_ordinals[1]; /* 1..N of these ordinal groups */
> +
> +};
> +#endif
> +
> +/*
> + * --------------------_QUERY_STATUS--------------------
> + */
> +
> +#ifndef _PM4__QUERY_STATUS_DEFINED
> +#define _PM4__QUERY_STATUS_DEFINED
> +enum _query_status_interrupt_sel_enum {
> + interrupt_sel___query_status__completion_status = 0,
> + interrupt_sel___query_status__process_status = 1,
> + interrupt_sel___query_status__queue_status = 2,
> + interrupt_sel___query_status__reserved = 3 };
> +
> +enum _query_status_command_enum {
> + command___query_status__interrupt_only = 0,
> + command___query_status__fence_only_immediate = 1,
> + command___query_status__fence_only_after_write_ack = 2,
> + command___query_status__fence_wait_for_write_ack_send_interrupt = 3 };
> +
> +enum _query_status_engine_sel_enum {
> + engine_sel___query_status__compute = 0,
> + engine_sel___query_status__sdma0 = 2,
> + engine_sel___query_status__sdma1 = 3 };
> +
> +struct pm4_query_status {
> + union {
> + union PM4_TYPE_3_HEADER header;
> + unsigned int ordinal1;
> + };
> +
> + union {
> + struct {
> + unsigned int context_id:28;
> + enum _query_status_interrupt_sel_enum interrupt_sel:2;
> + enum _query_status_command_enum command:2;
> + } bitfields2;
> + unsigned int ordinal2;
> + };
> +
> + union {
> + struct {
> + unsigned int pasid:16;
> + unsigned int reserved1:16;
> + } bitfields3;
> + struct {
> + unsigned int reserved2:2;
> + unsigned int doorbell_offset:21;
> + unsigned int reserved3:3;
> + enum _query_status_engine_sel_enum engine_sel:3;
> + unsigned int reserved4:3;
> + } bitfields4;
> + unsigned int ordinal3;
> + };
> +
> + unsigned int addr_lo;
> + unsigned int addr_hi;
> + unsigned int data_lo;
> + unsigned int data_hi;
> +
> +};
> +#endif
> +
> +/*
> + * --------------------_UNMAP_QUEUES--------------------
> + */
> +
> +#ifndef _PM4__UNMAP_QUEUES_DEFINED
> +#define _PM4__UNMAP_QUEUES_DEFINED
> +enum _unmap_queues_action_enum {
> + action___unmap_queues__preempt_queues = 0,
> + action___unmap_queues__reset_queues = 1,
> + action___unmap_queues__disable_process_queues = 2,
> + action___unmap_queues__reserved = 3 };
> +
> +enum _unmap_queues_queue_sel_enum {
> + queue_sel___unmap_queues__perform_request_on_specified_queues = 0,
> + queue_sel___unmap_queues__perform_request_on_pasid_queues = 1,
> + queue_sel___unmap_queues__perform_request_on_all_active_queues = 2,
> + queue_sel___unmap_queues__reserved = 3 };
> +
> +enum _unmap_queues_engine_sel_enum {
> + engine_sel___unmap_queues__compute = 0,
> + engine_sel___unmap_queues__sdma0 = 2,
> + engine_sel___unmap_queues__sdma1 = 3 };
> +
> +struct pm4_unmap_queues {
> + union {
> + union PM4_TYPE_3_HEADER header;
> + unsigned int ordinal1;
> + };
> +
> + union {
> + struct {
> + enum _unmap_queues_action_enum action:2;
> + unsigned int reserved1:2;
> + enum _unmap_queues_queue_sel_enum queue_sel:2;
> + unsigned int reserved2:20;
> + enum _unmap_queues_engine_sel_enum engine_sel:3;
> + unsigned int num_queues:3;
> + } bitfields2;
> + unsigned int ordinal2;
> + };
> +
> + union {
> + struct {
> + unsigned int pasid:16;
> + unsigned int reserved3:16;
> + } bitfields3;
> + struct {
> + unsigned int reserved4:2;
> + unsigned int doorbell_offset0:21;
> + unsigned int reserved5:9;
> + } bitfields4;
> + unsigned int ordinal3;
> + };
> +
> + union {
> + struct {
> + unsigned int reserved6:2;
> + unsigned int doorbell_offset1:21;
> + unsigned int reserved7:9;
> + } bitfields5;
> + unsigned int ordinal4;
> + };
> +
> + union {
> + struct {
> + unsigned int reserved8:2;
> + unsigned int doorbell_offset2:21;
> + unsigned int reserved9:9;
> + } bitfields6;
> + unsigned int ordinal5;
> + };
> +
> + union {
> + struct {
> + unsigned int reserved10:2;
> + unsigned int doorbell_offset3:21;
> + unsigned int reserved11:9;
> + } bitfields7;
> + unsigned int ordinal6;
> + };
> +
> +};
> +#endif
> +
> +/*
> + * --------------------_SET_RESOURCES--------------------
> + */
> +
> +#ifndef _PM4__SET_RESOURCES_DEFINED
> +#define _PM4__SET_RESOURCES_DEFINED
> +enum _set_resources_queue_type_enum {
> + queue_type___set_resources__hsa_interface_queue_hiq = 1,
> + queue_type___set_resources__hsa_debug_interface_queue = 4 };
> +
> +struct pm4_set_resources {
> + union {
> + union PM4_TYPE_3_HEADER header;
> + unsigned int ordinal1;
> + };
> +
> + union {
> + struct {
> +
> + unsigned int vmid_mask:16;
> + unsigned int unmap_latency:8;
> + unsigned int reserved1:5;
> + enum _set_resources_queue_type_enum queue_type:3;
> + } bitfields2;
> + unsigned int ordinal2;
> + };
> +
> + unsigned int queue_mask_lo;
> + unsigned int queue_mask_hi;
> + unsigned int gws_mask_lo;
> + unsigned int gws_mask_hi;
> +
> + union {
> + struct {
> + unsigned int oac_mask:16;
> + unsigned int reserved2:16;
> + } bitfields3;
> + unsigned int ordinal7;
> + };
> +
> + union {
> + struct {
> + unsigned int gds_heap_base:6;
> + unsigned int reserved3:5;
> + unsigned int gds_heap_size:6;
> + unsigned int reserved4:15;
> + } bitfields4;
> + unsigned int ordinal8;
> + };
> +
> +};
> +#endif
> +
> +/*
> + * --------------------_RUN_LIST--------------------
> + */
> +
> +#ifndef _PM4__RUN_LIST_DEFINED
> +#define _PM4__RUN_LIST_DEFINED
> +
> +struct pm4_runlist {
> + union {
> + union PM4_TYPE_3_HEADER header;
> + unsigned int ordinal1;
> + };
> +
> + union {
> + struct {
> + unsigned int reserved1:2;
> + unsigned int ib_base_lo:30;
> + } bitfields2;
> + unsigned int ordinal2;
> + };
> +
> + union {
> + struct {
> + unsigned int ib_base_hi:16;
> + unsigned int reserved2:16;
> + } bitfields3;
> + unsigned int ordinal3;
> + };
> +
> + union {
> + struct {
> + unsigned int ib_size:20;
> + unsigned int chain:1;
> + unsigned int offload_polling:1;
> + unsigned int reserved3:1;
> + unsigned int valid:1;
> + unsigned int vmid:4;
> + unsigned int reserved4:4;
> + } bitfields4;
> + unsigned int ordinal4;
> + };
> +
> +};
> +#endif
> +
> +/*
> + * --------------------_MAP_PROCESS--------------------
> + */
> +
> +#ifndef _PM4__MAP_PROCESS_DEFINED
> +#define _PM4__MAP_PROCESS_DEFINED
> +
> +struct pm4_map_process {
> + union {
> + union PM4_TYPE_3_HEADER header;
> + unsigned int ordinal1;
> + };
> +
> + union {
> + struct {
> + unsigned int pasid:16;
> + unsigned int reserved1:8;
> + unsigned int diq_enable:1;
> + unsigned int reserved2:7;
> + } bitfields2;
> + unsigned int ordinal2;
> + };
> +
> + union {
> + struct {
> + unsigned int page_table_base:28;
> + unsigned int reserved3:4;
> + } bitfields3;
> + unsigned int ordinal3;
> + };
> +
> + unsigned int sh_mem_bases;
> + unsigned int sh_mem_ape1_base;
> + unsigned int sh_mem_ape1_limit;
> + unsigned int sh_mem_config;
> + unsigned int gds_addr_lo;
> + unsigned int gds_addr_hi;
> +
> + union {
> + struct {
> + unsigned int num_gws:6;
> + unsigned int reserved4:2;
> + unsigned int num_oac:4;
> + unsigned int reserved5:4;
> + unsigned int gds_size:6;
> + unsigned int reserved6:10;
> + } bitfields4;
> + unsigned int ordinal10;
> + };
> +
> +};
> +#endif
> +
> +/*--------------------_MAP_QUEUES--------------------*/
> +
> +#ifndef _PM4__MAP_QUEUES_DEFINED
> +#define _PM4__MAP_QUEUES_DEFINED
> +enum _MAP_QUEUES_queue_sel_enum {
> + queue_sel___map_queues__map_to_specified_queue_slots = 0,
> + queue_sel___map_queues__map_to_hws_determined_queue_slots = 1,
> + queue_sel___map_queues__enable_process_queues = 2,
> + queue_sel___map_queues__reserved = 3 };
> +
> +enum _MAP_QUEUES_vidmem_enum {
> + vidmem___map_queues__uses_no_video_memory = 0,
> + vidmem___map_queues__uses_video_memory = 1 };
> +
> +enum _MAP_QUEUES_alloc_format_enum {
> + alloc_format___map_queues__one_per_pipe = 0,
> + alloc_format___map_queues__all_on_one_pipe = 1 };
> +
> +enum _MAP_QUEUES_engine_sel_enum {
> + engine_sel___map_queues__compute = 0,
> + engine_sel___map_queues__sdma0_queue = 2,
> + engine_sel___map_queues__sdma1_queue = 3 };
> +
> +
> +struct _PM4__MAP_QUEUES {
> + union {
> + PM4_TYPE_3_HEADER header;
> + unsigned int ordinal1;
> + };
> +
> + union {
> + struct {
> + unsigned int reserved1:4;
> + enum _MAP_QUEUES_queue_sel_enum queue_sel:2;
> + unsigned int reserved2:2;
> + unsigned int vmid:4;
> + unsigned int reserved3:4;
> + enum _MAP_QUEUES_vidmem_enum vidmem:2;
> + unsigned int reserved4:6;
> + enum _MAP_QUEUES_alloc_format_enum alloc_format:2;
> + enum _MAP_QUEUES_engine_sel_enum engine_sel:3;
> + unsigned int num_queues:3;
> + } bitfields2;
> + unsigned int ordinal2;
> + };
> +
> + struct {
> + union {
> + struct {
> + unsigned int reserved5:2;
> + unsigned int doorbell_offset:21;
> + unsigned int reserved6:3;
> + unsigned int queue:6;
> + } bitfields3;
> + unsigned int ordinal3;
> + };
> +
> + unsigned int mqd_addr_lo;
> +
> + unsigned int mqd_addr_hi;
> +
> + unsigned int wptr_addr_lo;
> +
> + unsigned int wptr_addr_hi;
> +
> + } _map_queues_ordinals[1]; /* 1..N of these ordinal groups */
> +
> +};
> +#endif
> +
> +/*--------------------_QUERY_STATUS--------------------*/
> +
> +#ifndef _PM4__QUERY_STATUS_DEFINED
> +#define _PM4__QUERY_STATUS_DEFINED
> +enum _QUERY_STATUS_interrupt_sel_enum {
> + interrupt_sel___query_status__completion_status = 0,
> + interrupt_sel___query_status__process_status = 1,
> + interrupt_sel___query_status__queue_status = 2,
> + interrupt_sel___query_status__reserved = 3 };
> +
> +enum _QUERY_STATUS_command_enum {
> + command___query_status__interrupt_only = 0,
> + command___query_status__fence_only_immediate = 1,
> + command___query_status__fence_only_after_write_ack = 2,
> + command___query_status__fence_wait_for_write_ack_send_interrupt = 3 };
> +
> +enum _QUERY_STATUS_engine_sel_enum {
> + engine_sel___query_status__compute = 0,
> + engine_sel___query_status__sdma0 = 2,
> + engine_sel___query_status__sdma1 = 3 };
> +
> +
> +struct _PM4__QUERY_STATUS {
> + union {
> + PM4_TYPE_3_HEADER header;
> + unsigned int ordinal1;
> + };
> +
> + union {
> + struct {
> + unsigned int context_id:28;
> + enum _QUERY_STATUS_interrupt_sel_enum interrupt_sel:2;
> + enum _QUERY_STATUS_command_enum command:2;
> + } bitfields2;
> + unsigned int ordinal2;
> + };
> +
> + union {
> + struct {
> + unsigned int pasid:16;
> + unsigned int reserved1:16;
> + } bitfields3;
> + struct {
> + unsigned int reserved2:2;
> + unsigned int doorbell_offset:21;
> + unsigned int reserved3:3;
> + enum _QUERY_STATUS_engine_sel_enum engine_sel:3;
> + unsigned int reserved4:3;
> + } bitfields4;
> + unsigned int ordinal3;
> + };
> +
> + unsigned int addr_lo;
> +
> + unsigned int addr_hi;
> +
> + unsigned int data_lo;
> +
> + unsigned int data_hi;
> +
> +};
> +#endif
> +
> +/*
> + * --------------------UNMAP_QUEUES--------------------
> + */
> +
> +#ifndef _PM4__UNMAP_QUEUES_DEFINED
> +#define _PM4__UNMAP_QUEUES_DEFINED
> +enum _unmap_queues_action_enum {
> + action___unmap_queues__preempt_queues = 0,
> + action___unmap_queues__reset_queues = 1,
> + action___unmap_queues__disable_process_queues = 2,
> + action___unmap_queues__reserved = 3 };
> +
> +enum _unmap_queues_queue_sel_enum {
> + queue_sel___unmap_queues__perform_request_on_specified_queues = 0,
> + queue_sel___unmap_queues__perform_request_on_pasid_queues = 1,
> + queue_sel___unmap_queues__perform_request_on_all_active_queues = 2,
> + queue_sel___unmap_queues__reserved = 3 };
> +
> +enum _unmap_queues_engine_sel_enum {
> + engine_sel___unmap_queues__compute = 0,
> + engine_sel___unmap_queues__sdma0 = 2,
> + engine_sel___unmap_queues__sdma1 = 3 };
> +
> +
> +struct pm4_unmap_queues {
> + union {
> + PM4_TYPE_3_HEADER header;
> + unsigned int ordinal1;
> + };
> +
> + union {
> + struct {
> + _unmap_queues_action_enum action:2;
> + unsigned int reserved1:2;
> +
> + _unmap_queues_queue_sel_enum queue_sel:2;
> + unsigned int reserved2:20;
> +
> + _unmap_queues_engine_sel_enum engine_sel:3;
> + unsigned int num_queues:3;
> + } bitfields2;
> + unsigned int ordinal2;
> + };
> +
> + union {
> + struct {
> + unsigned int pasid:16;
> + unsigned int reserved3:16;
> + } bitfields3;
> + struct {
> + unsigned int reserved4:2;
> + unsigned int doorbell_offset0:21;
> + unsigned int reserved5:9;
> + } bitfields4;
> + unsigned int ordinal3;
> + };
> +
> + union {
> + struct {
> + unsigned int reserved6:2;
> + unsigned int doorbell_offset1:21;
> + unsigned int reserved7:9;
> + } bitfields5;
> + unsigned int ordinal4;
> + };
> +
> + union {
> + struct {
> + unsigned int reserved8:2;
> + unsigned int doorbell_offset2:21;
> + unsigned int reserved9:9;
> + } bitfields6;
> + unsigned int ordinal5;
> + };
> +
> + union {
> + struct {
> + unsigned int reserved10:2;
> + unsigned int doorbell_offset3:21;
> + unsigned int reserved11:9;
> + } bitfields7;
> + unsigned int ordinal6;
> + };
> +
> +};
> +#endif
> +
> +/* --------------------_SET_SH_REG--------------------*/
> +
> +#ifndef _PM4__SET_SH_REG_DEFINED
> +#define _PM4__SET_SH_REG_DEFINED
> +
> +struct _PM4__SET_SH_REG {
> + union {
> + union PM4_TYPE_3_HEADER header;
> + unsigned int ordinal1;
> + };
> +
> + union {
> + struct {
> + unsigned int reg_offset:16;
> + unsigned int reserved1:8;
> + unsigned int vmid_shift:5;
> + unsigned int insert_vmid:1;
> + unsigned int reserved2:1;
> + unsigned int non_incr_addr:1;
> + } bitfields2;
> + unsigned int ordinal2;
> + };
> +
> + unsigned int reg_data[1]; /* 1..N of these fields */
> +
> +};
> +#endif
> +
> +/*--------------------_SET_CONFIG_REG--------------------*/
> +
> +#ifndef _PM4__SET_CONFIG_REG_DEFINED
> +#define _PM4__SET_CONFIG_REG_DEFINED
> +
> +struct pm4__set_config_reg {
> + union {
> + union PM4_TYPE_3_HEADER header;
> + unsigned int ordinal1;
> + };
> +
> + union {
> + struct {
> + unsigned int reg_offset:16;
> + unsigned int reserved1:8;
> + unsigned int vmid_shift:5;
> + unsigned int insert_vmid:1;
> + unsigned int reserved2:2;
> + } bitfields2;
> + unsigned int ordinal2;
> + };
> +
> + unsigned int reg_data[1]; /* 1..N of these fields */
> +
> +};
> +#endif
> +
> +#endif /* KFD_PM4_HEADERS_H_ */
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h b/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h
> new file mode 100644
> index 0000000..b72fa3b
> --- /dev/null
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h
> @@ -0,0 +1,107 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +
> +#ifndef KFD_PM4_OPCODES_H
> +#define KFD_PM4_OPCODES_H
> +
> +enum it_opcode_type {
> + IT_NOP = 0x10,
> + IT_SET_BASE = 0x11,
> + IT_CLEAR_STATE = 0x12,
> + IT_INDEX_BUFFER_SIZE = 0x13,
> + IT_DISPATCH_DIRECT = 0x15,
> + IT_DISPATCH_INDIRECT = 0x16,
> + IT_ATOMIC_GDS = 0x1D,
> + IT_OCCLUSION_QUERY = 0x1F,
> + IT_SET_PREDICATION = 0x20,
> + IT_REG_RMW = 0x21,
> + IT_COND_EXEC = 0x22,
> + IT_PRED_EXEC = 0x23,
> + IT_DRAW_INDIRECT = 0x24,
> + IT_DRAW_INDEX_INDIRECT = 0x25,
> + IT_INDEX_BASE = 0x26,
> + IT_DRAW_INDEX_2 = 0x27,
> + IT_CONTEXT_CONTROL = 0x28,
> + IT_INDEX_TYPE = 0x2A,
> + IT_DRAW_INDIRECT_MULTI = 0x2C,
> + IT_DRAW_INDEX_AUTO = 0x2D,
> + IT_NUM_INSTANCES = 0x2F,
> + IT_DRAW_INDEX_MULTI_AUTO = 0x30,
> + IT_INDIRECT_BUFFER_CNST = 0x33,
> + IT_STRMOUT_BUFFER_UPDATE = 0x34,
> + IT_DRAW_INDEX_OFFSET_2 = 0x35,
> + IT_DRAW_PREAMBLE = 0x36,
> + IT_WRITE_DATA = 0x37,
> + IT_DRAW_INDEX_INDIRECT_MULTI = 0x38,
> + IT_MEM_SEMAPHORE = 0x39,
> + IT_COPY_DW = 0x3B,
> + IT_WAIT_REG_MEM = 0x3C,
> + IT_INDIRECT_BUFFER = 0x3F,
> + IT_COPY_DATA = 0x40,
> + IT_PFP_SYNC_ME = 0x42,
> + IT_SURFACE_SYNC = 0x43,
> + IT_COND_WRITE = 0x45,
> + IT_EVENT_WRITE = 0x46,
> + IT_EVENT_WRITE_EOP = 0x47,
> + IT_EVENT_WRITE_EOS = 0x48,
> + IT_RELEASE_MEM = 0x49,
> + IT_PREAMBLE_CNTL = 0x4A,
> + IT_DMA_DATA = 0x50,
> + IT_ACQUIRE_MEM = 0x58,
> + IT_REWIND = 0x59,
> + IT_LOAD_UCONFIG_REG = 0x5E,
> + IT_LOAD_SH_REG = 0x5F,
> + IT_LOAD_CONFIG_REG = 0x60,
> + IT_LOAD_CONTEXT_REG = 0x61,
> + IT_SET_CONFIG_REG = 0x68,
> + IT_SET_CONTEXT_REG = 0x69,
> + IT_SET_CONTEXT_REG_INDIRECT = 0x73,
> + IT_SET_SH_REG = 0x76,
> + IT_SET_SH_REG_OFFSET = 0x77,
> + IT_SET_QUEUE_REG = 0x78,
> + IT_SET_UCONFIG_REG = 0x79,
> + IT_SCRATCH_RAM_WRITE = 0x7D,
> + IT_SCRATCH_RAM_READ = 0x7E,
> + IT_LOAD_CONST_RAM = 0x80,
> + IT_WRITE_CONST_RAM = 0x81,
> + IT_DUMP_CONST_RAM = 0x83,
> + IT_INCREMENT_CE_COUNTER = 0x84,
> + IT_INCREMENT_DE_COUNTER = 0x85,
> + IT_WAIT_ON_CE_COUNTER = 0x86,
> + IT_WAIT_ON_DE_COUNTER_DIFF = 0x88,
> + IT_SWITCH_BUFFER = 0x8B,
> + IT_SET_RESOURCES = 0xA0,
> + IT_MAP_PROCESS = 0xA1,
> + IT_MAP_QUEUES = 0xA2,
> + IT_UNMAP_QUEUES = 0xA3,
> + IT_QUERY_STATUS = 0xA4,
> + IT_RUN_LIST = 0xA5,
> +};
> +
> +#define PM4_TYPE_0 0
> +#define PM4_TYPE_2 2
> +#define PM4_TYPE_3 3
Reusing existing radeon define sounds like a good idea here.
> +
> +#endif /* KFD_PM4_OPCODES_H */
> +
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> index 76494757..25f23c5 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> @@ -49,6 +49,15 @@
> #define KFD_MMAP_DOORBELL_START (((1ULL << 32)*1) >> PAGE_SHIFT)
> #define KFD_MMAP_DOORBELL_END (((1ULL << 32)*2) >> PAGE_SHIFT)
Both of these defines do not seem to be used, which is somewhat of a relief when I
look at them.
>
> +/*
> + * When working with cp scheduler we should assign the HIQ manually or via the radeon driver
> + * to a fixed hqd slot, here are the fixed HIQ hqd slot definitions for Kaveri.
> + * In Kaveri only the first ME queues participates in the cp scheduling taking that in mind
> + * we set the HIQ slot in the second ME.
> + */
> +#define KFD_CIK_HIQ_PIPE 4
> +#define KFD_CIK_HIQ_QUEUE 0
> +
> /* GPU ID hash width in bits */
> #define KFD_GPU_ID_HASH_WIDTH 16
>
> @@ -68,6 +77,11 @@ typedef u32 doorbell_t;
> /* Type that represents queue pointer */
> typedef u32 qptr_t;
>
> +enum cache_policy {
> + cache_policy_coherent,
> + cache_policy_noncoherent
> +};
> +
> struct kfd_device_info {
> const struct kfd_scheduler_class *scheduler_class;
> unsigned int max_pasid_bits;
> @@ -96,6 +110,9 @@ struct kfd_dev {
> u32 __iomem *doorbell_kernel_ptr; /* this is a pointer for a doorbells page used by kernel queue */
>
> struct kgd2kfd_shared_resources shared_resources;
> +
> + /* QCM Device instance */
> + struct device_queue_manager *dqm;
> };
>
> /* KGD2KFD callbacks */
> @@ -300,4 +317,19 @@ int kgd2kfd_resume(struct kfd_dev *dev);
> /* amdkfd Apertures */
> int kfd_init_apertures(struct kfd_process *process);
>
> +/* Queue Context Management */
> +int init_queue(struct queue **q, struct queue_properties properties);
> +void uninit_queue(struct queue *q);
> +void print_queue(struct queue *q);
> +
> +/* Packet Manager */
> +
> +struct packet_manager {
> + struct device_queue_manager *dqm;
> + struct kernel_queue *priv_queue;
> + struct mutex lock;
> + bool allocated;
> + kfd_mem_obj ib_buffer_obj;
> +};
> +
> #endif
> --
> 1.9.1
>
^ permalink raw reply [flat|nested] 46+ messages in thread* Re: [PATCH v2 15/25] amdkfd: Add kernel queue module
2014-07-21 2:42 ` Jerome Glisse
@ 2014-07-27 11:05 ` Oded Gabbay
2014-07-27 12:40 ` Christian König
0 siblings, 1 reply; 46+ messages in thread
From: Oded Gabbay @ 2014-07-27 11:05 UTC (permalink / raw)
To: Jerome Glisse
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, dri-devel,
Alexey Skidanov, Andrew Morton
On 21/07/14 05:42, Jerome Glisse wrote:
> On Thu, Jul 17, 2014 at 04:29:22PM +0300, Oded Gabbay wrote:
>> From: Ben Goz <ben.goz@amd.com>
>>
>> The kernel queue module enables the amdkfd to establish kernel queues, not exposed to user space.
>>
>> The kernel queues are used for HIQ (HSA Interface Queue) and DIQ (Debug Interface Queue) operations
>>
>> Signed-off-by: Ben Goz <ben.goz@amd.com>
>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>> ---
>> drivers/gpu/drm/radeon/amdkfd/Makefile | 3 +-
>> .../drm/radeon/amdkfd/kfd_device_queue_manager.h | 101 +++
>> drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c | 305 +++++++++
>> drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h | 66 ++
>> drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h | 682 +++++++++++++++++++++
>> drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h | 107 ++++
>> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 32 +
>> 7 files changed, 1295 insertions(+), 1 deletion(-)
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_device_queue_manager.h
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h
>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h
>>
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c
>> new file mode 100644
>> index 0000000..b212524
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c
>> +
>> +static int sync_with_hw(struct kernel_queue *kq, unsigned long timeout_ms)
>> +{
>> + unsigned long org_timeout_ms;
>> +
>> + BUG_ON(!kq);
>> +
>> + org_timeout_ms = timeout_ms;
>> + timeout_ms += jiffies * 1000 / HZ;
>> + while (*kq->wptr_kernel != *kq->rptr_kernel) {
>
> I am not a fan of this kind of busy wait even with the cpu_relax below. Won't
> there be some interrupt you can wait on (signaled through a wait queue perhaps) ?
>
So there is an interrupt but we don't use it for two reasons:
1. According to our thunk spec (the thunk is the userspace bits of amdkfd), all
ioctl calls to amdkfd must be synchronous, meaning that when the ioctl returns
to the thunk, the operation has completed. The sync_with_hw function is called
during the create/destroy/update queue ioctls and we must wait for its completion
before returning from the ioctl. Therefore, there is no point in using an interrupt
here as we would also need to wait for the interrupt before returning. It is
especially important in the destroy path, as the runtime library above the thunk
releases the memory of the queue once it returns from the thunk's destroy function.
2. Simpler code. The operations of adding/destroying a queue require allocations
and releases of various memory objects (runlists, indirect buffers). Adding an
interrupt context in the middle of this would make the code a lot more complex
than it should be, IMO.
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h b/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h
>> new file mode 100644
>> index 0000000..abfb9c8
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h
>> @@ -0,0 +1,66 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#ifndef KFD_KERNEL_QUEUE_H_
>> +#define KFD_KERNEL_QUEUE_H_
>> +
>> +#include <linux/list.h>
>> +#include <linux/types.h>
>> +#include "kfd_priv.h"
>> +
>> +struct kernel_queue {
>> + /* interface */
>> + bool (*initialize)(struct kernel_queue *kq, struct kfd_dev *dev,
>> + enum kfd_queue_type type, unsigned int queue_size);
>> + void (*uninitialize)(struct kernel_queue *kq);
>> + int (*acquire_packet_buffer)(struct kernel_queue *kq,
>> + size_t packet_size_in_dwords, unsigned int **buffer_ptr);
>> + void (*submit_packet)(struct kernel_queue *kq);
>> + int (*sync_with_hw)(struct kernel_queue *kq, unsigned long timeout_ms);
>> + void (*rollback_packet)(struct kernel_queue *kq);
>> +
>> + /* data */
>> + struct kfd_dev *dev;
>> + struct mqd_manager *mqd;
>> + struct queue *queue;
>> + qptr_t pending_wptr;
>> + unsigned int nop_packet;
>> +
>> + kfd_mem_obj rptr_mem;
>> + qptr_t *rptr_kernel;
>> + uint64_t rptr_gpu_addr;
>> + kfd_mem_obj wptr_mem;
>> + qptr_t *wptr_kernel;
>> + uint64_t wptr_gpu_addr;
>> + kfd_mem_obj pq;
>> + uint64_t pq_gpu_addr;
>> + qptr_t *pq_kernel_addr;
>> +
>> + kfd_mem_obj fence_mem_obj;
>> + uint64_t fence_gpu_addr;
>> + void *fence_kernel_address;
>> +
>> + struct list_head list;
>> +};
>> +
>> +#endif /* KFD_KERNEL_QUEUE_H_ */
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h b/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h
>> new file mode 100644
>> index 0000000..95e46f8
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h
>> @@ -0,0 +1,682 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#ifndef KFD_PM4_HEADERS_H_
>> +#define KFD_PM4_HEADERS_H_
>> +
>> +#ifndef PM4_HEADER_DEFINED
>> +#define PM4_HEADER_DEFINED
>> +
>> +union PM4_TYPE_3_HEADER {
>> + struct {
>> + unsigned int predicate:1; /* < 0 for diq packets */
>> + unsigned int shader_type:1; /* < 0 for diq packets */
>> + unsigned int reserved1:6; /* < reserved */
>> + unsigned int opcode:8; /* < IT opcode */
>> + unsigned int count:14; /* < number of DWORDs - 1 in the information body. */
>> + unsigned int type:2; /* < packet identifier. It should be 3 for type 3 packets */
>> + };
>> + unsigned int u32all;
>> +};
>
> Do not build packet that way this will be broken on PPC you might
> not care but we try to be little endian safe. Refer to radeon on
> how to build packet. So the whole union stuff below is broken.
>
Agreed, but I would like to postpone this fix, if possible, to a later stage (the rc stage).
>> +#endif
>> +
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h b/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h
>> new file mode 100644
>> index 0000000..b72fa3b
>> --- /dev/null
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h
>> @@ -0,0 +1,107 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +
>> +#ifndef KFD_PM4_OPCODES_H
>> +#define KFD_PM4_OPCODES_H
>> +
>> +enum it_opcode_type {
>> + IT_NOP = 0x10,
>> + IT_SET_BASE = 0x11,
>> + IT_CLEAR_STATE = 0x12,
>> + IT_INDEX_BUFFER_SIZE = 0x13,
>> + IT_DISPATCH_DIRECT = 0x15,
>> + IT_DISPATCH_INDIRECT = 0x16,
>> + IT_ATOMIC_GDS = 0x1D,
>> + IT_OCCLUSION_QUERY = 0x1F,
>> + IT_SET_PREDICATION = 0x20,
>> + IT_REG_RMW = 0x21,
>> + IT_COND_EXEC = 0x22,
>> + IT_PRED_EXEC = 0x23,
>> + IT_DRAW_INDIRECT = 0x24,
>> + IT_DRAW_INDEX_INDIRECT = 0x25,
>> + IT_INDEX_BASE = 0x26,
>> + IT_DRAW_INDEX_2 = 0x27,
>> + IT_CONTEXT_CONTROL = 0x28,
>> + IT_INDEX_TYPE = 0x2A,
>> + IT_DRAW_INDIRECT_MULTI = 0x2C,
>> + IT_DRAW_INDEX_AUTO = 0x2D,
>> + IT_NUM_INSTANCES = 0x2F,
>> + IT_DRAW_INDEX_MULTI_AUTO = 0x30,
>> + IT_INDIRECT_BUFFER_CNST = 0x33,
>> + IT_STRMOUT_BUFFER_UPDATE = 0x34,
>> + IT_DRAW_INDEX_OFFSET_2 = 0x35,
>> + IT_DRAW_PREAMBLE = 0x36,
>> + IT_WRITE_DATA = 0x37,
>> + IT_DRAW_INDEX_INDIRECT_MULTI = 0x38,
>> + IT_MEM_SEMAPHORE = 0x39,
>> + IT_COPY_DW = 0x3B,
>> + IT_WAIT_REG_MEM = 0x3C,
>> + IT_INDIRECT_BUFFER = 0x3F,
>> + IT_COPY_DATA = 0x40,
>> + IT_PFP_SYNC_ME = 0x42,
>> + IT_SURFACE_SYNC = 0x43,
>> + IT_COND_WRITE = 0x45,
>> + IT_EVENT_WRITE = 0x46,
>> + IT_EVENT_WRITE_EOP = 0x47,
>> + IT_EVENT_WRITE_EOS = 0x48,
>> + IT_RELEASE_MEM = 0x49,
>> + IT_PREAMBLE_CNTL = 0x4A,
>> + IT_DMA_DATA = 0x50,
>> + IT_ACQUIRE_MEM = 0x58,
>> + IT_REWIND = 0x59,
>> + IT_LOAD_UCONFIG_REG = 0x5E,
>> + IT_LOAD_SH_REG = 0x5F,
>> + IT_LOAD_CONFIG_REG = 0x60,
>> + IT_LOAD_CONTEXT_REG = 0x61,
>> + IT_SET_CONFIG_REG = 0x68,
>> + IT_SET_CONTEXT_REG = 0x69,
>> + IT_SET_CONTEXT_REG_INDIRECT = 0x73,
>> + IT_SET_SH_REG = 0x76,
>> + IT_SET_SH_REG_OFFSET = 0x77,
>> + IT_SET_QUEUE_REG = 0x78,
>> + IT_SET_UCONFIG_REG = 0x79,
>> + IT_SCRATCH_RAM_WRITE = 0x7D,
>> + IT_SCRATCH_RAM_READ = 0x7E,
>> + IT_LOAD_CONST_RAM = 0x80,
>> + IT_WRITE_CONST_RAM = 0x81,
>> + IT_DUMP_CONST_RAM = 0x83,
>> + IT_INCREMENT_CE_COUNTER = 0x84,
>> + IT_INCREMENT_DE_COUNTER = 0x85,
>> + IT_WAIT_ON_CE_COUNTER = 0x86,
>> + IT_WAIT_ON_DE_COUNTER_DIFF = 0x88,
>> + IT_SWITCH_BUFFER = 0x8B,
>> + IT_SET_RESOURCES = 0xA0,
>> + IT_MAP_PROCESS = 0xA1,
>> + IT_MAP_QUEUES = 0xA2,
>> + IT_UNMAP_QUEUES = 0xA3,
>> + IT_QUERY_STATUS = 0xA4,
>> + IT_RUN_LIST = 0xA5,
>> +};
>> +
>> +#define PM4_TYPE_0 0
>> +#define PM4_TYPE_2 2
>> +#define PM4_TYPE_3 3
>
> Reusing existing radeon define sounds like a good idea here.
Agreed, but I would like to postpone this fix, if possible, to a later stage (the rc stage).
>
>> +
>> +#endif /* KFD_PM4_OPCODES_H */
>> +
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> index 76494757..25f23c5 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> @@ -49,6 +49,15 @@
>> #define KFD_MMAP_DOORBELL_START (((1ULL << 32)*1) >> PAGE_SHIFT)
>> #define KFD_MMAP_DOORBELL_END (((1ULL << 32)*2) >> PAGE_SHIFT)
>
> Both of this define do not seems to be use, which is somewhat of a relief when i
> look at them.
>
Actually they are in use, in kfd_mmap(), kfd_doorbell_mmap() and map_doorbells()
Oded
^ permalink raw reply [flat|nested] 46+ messages in thread* Re: [PATCH v2 15/25] amdkfd: Add kernel queue module
2014-07-27 11:05 ` Oded Gabbay
@ 2014-07-27 12:40 ` Christian König
0 siblings, 0 replies; 46+ messages in thread
From: Christian König @ 2014-07-27 12:40 UTC (permalink / raw)
To: Oded Gabbay, Jerome Glisse
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, dri-devel,
Alexey Skidanov, Andrew Morton
Am 27.07.2014 um 13:05 schrieb Oded Gabbay:
> On 21/07/14 05:42, Jerome Glisse wrote:
>> On Thu, Jul 17, 2014 at 04:29:22PM +0300, Oded Gabbay wrote:
>>> From: Ben Goz <ben.goz@amd.com>
>>>
>>> The kernel queue module enables the amdkfd to establish kernel
>>> queues, not exposed to user space.
>>>
>>> The kernel queues are used for HIQ (HSA Interface Queue) and DIQ
>>> (Debug Interface Queue) operations
>>>
>>> Signed-off-by: Ben Goz <ben.goz@amd.com>
>>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>>> ---
>>> drivers/gpu/drm/radeon/amdkfd/Makefile | 3 +-
>>> .../drm/radeon/amdkfd/kfd_device_queue_manager.h | 101 +++
>>> drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c | 305 +++++++++
>>> drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h | 66 ++
>>> drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h | 682
>>> +++++++++++++++++++++
>>> drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h | 107 ++++
>>> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 32 +
>>> 7 files changed, 1295 insertions(+), 1 deletion(-)
>>> create mode 100644
>>> drivers/gpu/drm/radeon/amdkfd/kfd_device_queue_manager.h
>>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c
>>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h
>>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h
>>> create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h
>>>
>>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c
>>> b/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c
>>> new file mode 100644
>>> index 0000000..b212524
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.c
>>> +
>>> +static int sync_with_hw(struct kernel_queue *kq, unsigned long
>>> timeout_ms)
>>> +{
>>> + unsigned long org_timeout_ms;
>>> +
>>> + BUG_ON(!kq);
>>> +
>>> + org_timeout_ms = timeout_ms;
>>> + timeout_ms += jiffies * 1000 / HZ;
>>> + while (*kq->wptr_kernel != *kq->rptr_kernel) {
>>
>> I am not a fan of this kind of busy wait even with the cpu_relax
>> below. Won't
>> there be some interrupt you can wait on (signaled through a wait
>> queue perhaps) ?
>>
> So there is an interrupt but we don't use it for two reasons:
> 1. According to our thunk spec (thunk is the userspace bits of
> amdkfd), all ioctls calls to amdkfd must be synchronous, meaning that
> when the ioctl returns to thunk, the operation has completed. The
> sync_with_hw function is called during the create/destroy/update queue
> ioctls and we must wait to its completion before returning from the
> ioctl. Therefore, there is no point in using interrupt here as we will
> also need to wait for the interrupt before returning. It is especially
> important in the destroy path, as the runtime library above the thunk
> release the memory of the queue once it returns from the thunk's
> destroy function.
>
> 2. Simpler code. The operations of adding/destroying queue require
> allocations and releases of various memory objects (runlists, indirect
> buffers). Adding an interrupt context in the middle of this would make
> the code a lot more complex than it should be, IMO.
How about using a wait_queue? That would allow the IOCTL to be
synchronous, is rather simple to implement and still doesn't busy wait for
any hardware state to be reached.
Regards,
Christian.
>
>>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h
>>> b/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h
>>> new file mode 100644
>>> index 0000000..abfb9c8
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_kernel_queue.h
>>> @@ -0,0 +1,66 @@
>>> +/*
>>> + * Copyright 2014 Advanced Micro Devices, Inc.
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person
>>> obtaining a
>>> + * copy of this software and associated documentation files (the
>>> "Software"),
>>> + * to deal in the Software without restriction, including without
>>> limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute,
>>> sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to
>>> whom the
>>> + * Software is furnished to do so, subject to the following
>>> conditions:
>>> + *
>>> + * The above copyright notice and this permission notice shall be
>>> included in
>>> + * all copies or substantial portions of the Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>>> EXPRESS OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>>> MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO
>>> EVENT SHALL
>>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM,
>>> DAMAGES OR
>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
>>> OTHERWISE,
>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
>>> USE OR
>>> + * OTHER DEALINGS IN THE SOFTWARE.
>>> + *
>>> + */
>>> +
>>> +#ifndef KFD_KERNEL_QUEUE_H_
>>> +#define KFD_KERNEL_QUEUE_H_
>>> +
>>> +#include <linux/list.h>
>>> +#include <linux/types.h>
>>> +#include "kfd_priv.h"
>>> +
>>> +struct kernel_queue {
>>> + /* interface */
>>> + bool (*initialize)(struct kernel_queue *kq, struct kfd_dev
>>> *dev,
>>> + enum kfd_queue_type type, unsigned int queue_size);
>>> + void (*uninitialize)(struct kernel_queue *kq);
>>> + int (*acquire_packet_buffer)(struct kernel_queue *kq,
>>> + size_t packet_size_in_dwords, unsigned int **buffer_ptr);
>>> + void (*submit_packet)(struct kernel_queue *kq);
>>> + int (*sync_with_hw)(struct kernel_queue *kq, unsigned long
>>> timeout_ms);
>>> + void (*rollback_packet)(struct kernel_queue *kq);
>>> +
>>> + /* data */
>>> + struct kfd_dev *dev;
>>> + struct mqd_manager *mqd;
>>> + struct queue *queue;
>>> + qptr_t pending_wptr;
>>> + unsigned int nop_packet;
>>> +
>>> + kfd_mem_obj rptr_mem;
>>> + qptr_t *rptr_kernel;
>>> + uint64_t rptr_gpu_addr;
>>> + kfd_mem_obj wptr_mem;
>>> + qptr_t *wptr_kernel;
>>> + uint64_t wptr_gpu_addr;
>>> + kfd_mem_obj pq;
>>> + uint64_t pq_gpu_addr;
>>> + qptr_t *pq_kernel_addr;
>>> +
>>> + kfd_mem_obj fence_mem_obj;
>>> + uint64_t fence_gpu_addr;
>>> + void *fence_kernel_address;
>>> +
>>> + struct list_head list;
>>> +};
>>> +
>>> +#endif /* KFD_KERNEL_QUEUE_H_ */
>>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h
>>> b/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h
>>> new file mode 100644
>>> index 0000000..95e46f8
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_headers.h
>>> @@ -0,0 +1,682 @@
>>> +/*
>>> + * Copyright 2014 Advanced Micro Devices, Inc.
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person
>>> obtaining a
>>> + * copy of this software and associated documentation files (the
>>> "Software"),
>>> + * to deal in the Software without restriction, including without
>>> limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute,
>>> sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to
>>> whom the
>>> + * Software is furnished to do so, subject to the following
>>> conditions:
>>> + *
>>> + * The above copyright notice and this permission notice shall be
>>> included in
>>> + * all copies or substantial portions of the Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>>> EXPRESS OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>>> MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO
>>> EVENT SHALL
>>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM,
>>> DAMAGES OR
>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
>>> OTHERWISE,
>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
>>> USE OR
>>> + * OTHER DEALINGS IN THE SOFTWARE.
>>> + *
>>> + */
>>> +
>>> +#ifndef KFD_PM4_HEADERS_H_
>>> +#define KFD_PM4_HEADERS_H_
>>> +
>>> +#ifndef PM4_HEADER_DEFINED
>>> +#define PM4_HEADER_DEFINED
>>> +
>>> +union PM4_TYPE_3_HEADER {
>>> + struct {
>>> + unsigned int predicate:1; /* < 0 for diq packets */
>>> + unsigned int shader_type:1; /* < 0 for diq packets */
>>> + unsigned int reserved1:6; /* < reserved */
>>> + unsigned int opcode:8; /* < IT opcode */
>>> + unsigned int count:14; /* < number of DWORDs - 1 in
>>> the information body. */
>>> + unsigned int type:2; /* < packet identifier. It
>>> should be 3 for type 3 packets */
>>> + };
>>> + unsigned int u32all;
>>> +};
>>
>> Do not build packets that way; this will be broken on PPC. You might
>> not care, but we try to be little-endian safe. Refer to radeon for
>> how to build packets. So the whole union stuff below is broken.
>>
> Agreed, but I would like to postpone this fix, if possible, to a later
> stage (the rc stage).
>>> +#endif
>>> +
>
>>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h
>>> b/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h
>>> new file mode 100644
>>> index 0000000..b72fa3b
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_pm4_opcodes.h
>>> @@ -0,0 +1,107 @@
>>> +/*
>>> + * Copyright 2014 Advanced Micro Devices, Inc.
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person
>>> obtaining a
>>> + * copy of this software and associated documentation files (the
>>> "Software"),
>>> + * to deal in the Software without restriction, including without
>>> limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute,
>>> sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to
>>> whom the
>>> + * Software is furnished to do so, subject to the following
>>> conditions:
>>> + *
>>> + * The above copyright notice and this permission notice shall be
>>> included in
>>> + * all copies or substantial portions of the Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>>> EXPRESS OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>>> MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO
>>> EVENT SHALL
>>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM,
>>> DAMAGES OR
>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
>>> OTHERWISE,
>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
>>> USE OR
>>> + * OTHER DEALINGS IN THE SOFTWARE.
>>> + *
>>> + */
>>> +
>>> +
>>> +#ifndef KFD_PM4_OPCODES_H
>>> +#define KFD_PM4_OPCODES_H
>>> +
>>> +enum it_opcode_type {
>>> + IT_NOP = 0x10,
>>> + IT_SET_BASE = 0x11,
>>> + IT_CLEAR_STATE = 0x12,
>>> + IT_INDEX_BUFFER_SIZE = 0x13,
>>> + IT_DISPATCH_DIRECT = 0x15,
>>> + IT_DISPATCH_INDIRECT = 0x16,
>>> + IT_ATOMIC_GDS = 0x1D,
>>> + IT_OCCLUSION_QUERY = 0x1F,
>>> + IT_SET_PREDICATION = 0x20,
>>> + IT_REG_RMW = 0x21,
>>> + IT_COND_EXEC = 0x22,
>>> + IT_PRED_EXEC = 0x23,
>>> + IT_DRAW_INDIRECT = 0x24,
>>> + IT_DRAW_INDEX_INDIRECT = 0x25,
>>> + IT_INDEX_BASE = 0x26,
>>> + IT_DRAW_INDEX_2 = 0x27,
>>> + IT_CONTEXT_CONTROL = 0x28,
>>> + IT_INDEX_TYPE = 0x2A,
>>> + IT_DRAW_INDIRECT_MULTI = 0x2C,
>>> + IT_DRAW_INDEX_AUTO = 0x2D,
>>> + IT_NUM_INSTANCES = 0x2F,
>>> + IT_DRAW_INDEX_MULTI_AUTO = 0x30,
>>> + IT_INDIRECT_BUFFER_CNST = 0x33,
>>> + IT_STRMOUT_BUFFER_UPDATE = 0x34,
>>> + IT_DRAW_INDEX_OFFSET_2 = 0x35,
>>> + IT_DRAW_PREAMBLE = 0x36,
>>> + IT_WRITE_DATA = 0x37,
>>> + IT_DRAW_INDEX_INDIRECT_MULTI = 0x38,
>>> + IT_MEM_SEMAPHORE = 0x39,
>>> + IT_COPY_DW = 0x3B,
>>> + IT_WAIT_REG_MEM = 0x3C,
>>> + IT_INDIRECT_BUFFER = 0x3F,
>>> + IT_COPY_DATA = 0x40,
>>> + IT_PFP_SYNC_ME = 0x42,
>>> + IT_SURFACE_SYNC = 0x43,
>>> + IT_COND_WRITE = 0x45,
>>> + IT_EVENT_WRITE = 0x46,
>>> + IT_EVENT_WRITE_EOP = 0x47,
>>> + IT_EVENT_WRITE_EOS = 0x48,
>>> + IT_RELEASE_MEM = 0x49,
>>> + IT_PREAMBLE_CNTL = 0x4A,
>>> + IT_DMA_DATA = 0x50,
>>> + IT_ACQUIRE_MEM = 0x58,
>>> + IT_REWIND = 0x59,
>>> + IT_LOAD_UCONFIG_REG = 0x5E,
>>> + IT_LOAD_SH_REG = 0x5F,
>>> + IT_LOAD_CONFIG_REG = 0x60,
>>> + IT_LOAD_CONTEXT_REG = 0x61,
>>> + IT_SET_CONFIG_REG = 0x68,
>>> + IT_SET_CONTEXT_REG = 0x69,
>>> + IT_SET_CONTEXT_REG_INDIRECT = 0x73,
>>> + IT_SET_SH_REG = 0x76,
>>> + IT_SET_SH_REG_OFFSET = 0x77,
>>> + IT_SET_QUEUE_REG = 0x78,
>>> + IT_SET_UCONFIG_REG = 0x79,
>>> + IT_SCRATCH_RAM_WRITE = 0x7D,
>>> + IT_SCRATCH_RAM_READ = 0x7E,
>>> + IT_LOAD_CONST_RAM = 0x80,
>>> + IT_WRITE_CONST_RAM = 0x81,
>>> + IT_DUMP_CONST_RAM = 0x83,
>>> + IT_INCREMENT_CE_COUNTER = 0x84,
>>> + IT_INCREMENT_DE_COUNTER = 0x85,
>>> + IT_WAIT_ON_CE_COUNTER = 0x86,
>>> + IT_WAIT_ON_DE_COUNTER_DIFF = 0x88,
>>> + IT_SWITCH_BUFFER = 0x8B,
>>> + IT_SET_RESOURCES = 0xA0,
>>> + IT_MAP_PROCESS = 0xA1,
>>> + IT_MAP_QUEUES = 0xA2,
>>> + IT_UNMAP_QUEUES = 0xA3,
>>> + IT_QUERY_STATUS = 0xA4,
>>> + IT_RUN_LIST = 0xA5,
>>> +};
>>> +
>>> +#define PM4_TYPE_0 0
>>> +#define PM4_TYPE_2 2
>>> +#define PM4_TYPE_3 3
>>
>> Reusing existing radeon define sounds like a good idea here.
> Agreed, but I would like to postpone this fix, if possible, to a later
> stage (the rc stage).
>
>>
>>> +
>>> +#endif /* KFD_PM4_OPCODES_H */
>>> +
>>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>>> b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>>> index 76494757..25f23c5 100644
>>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>>> @@ -49,6 +49,15 @@
>>> #define KFD_MMAP_DOORBELL_START (((1ULL << 32)*1) >> PAGE_SHIFT)
>>> #define KFD_MMAP_DOORBELL_END (((1ULL << 32)*2) >> PAGE_SHIFT)
>>
>> Both of these defines do not seem to be used, which is somewhat of a
>> relief when I
>> look at them.
>>
> Actually they are in use, in kfd_mmap(), kfd_doorbell_mmap() and
> map_doorbells()
>
> Oded
>
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v2 16/25] amdkfd: Add module parameter of scheduling policy
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
` (11 preceding siblings ...)
2014-07-17 13:29 ` [PATCH v2 15/25] amdkfd: Add kernel queue module Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-21 2:45 ` Jerome Glisse
2014-07-17 13:29 ` [PATCH v2 17/25] amdkfd: Add packet manager module Oded Gabbay
` (9 subsequent siblings)
22 siblings, 1 reply; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
From: Ben Goz <ben.goz@amd.com>
This patch adds a new parameter to the amdkfd driver. This parameter enables the user to select the scheduling policy of the CP. The choices are:
* CP Scheduling with support for over-subscription
* CP Scheduling without support for over-subscription
* Without CP Scheduling
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/amdkfd/kfd_module.c | 4 ++++
drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 9 +++++++++
2 files changed, 13 insertions(+)
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
index dc08f51..fe5e39d 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
@@ -46,6 +46,10 @@ static const struct kgd2kfd_calls kgd2kfd = {
.resume = kgd2kfd_resume,
};
+int sched_policy = KFD_SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION;
+module_param(sched_policy, int, S_IRUSR | S_IWUSR);
+MODULE_PARM_DESC(sched_policy, "Kernel cmdline parameter define the kfd scheduling policy");
+
bool kgd2kfd_init(unsigned interface_version,
const struct kfd2kgd_calls *f2g,
const struct kgd2kfd_calls **g2f)
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
index 25f23c5..8be07a1 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
@@ -64,6 +64,15 @@
/* Macro for allocating structures */
#define kfd_alloc_struct(ptr_to_struct) ((typeof(ptr_to_struct)) kzalloc(sizeof(*ptr_to_struct), GFP_KERNEL))
+/* Kernel module parameter to specify the scheduling policy */
+extern int sched_policy;
+
+enum kfd_sched_policy {
+ KFD_SCHED_POLICY_HWS = 0,
+ KFD_SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION,
+ KFD_SCHED_POLICY_NO_HWS
+};
+
/*
* Large enough to hold the maximum usable pasid + 1.
* It must also be able to store the number of doorbells
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* Re: [PATCH v2 16/25] amdkfd: Add module parameter of scheduling policy
2014-07-17 13:29 ` [PATCH v2 16/25] amdkfd: Add module parameter of scheduling policy Oded Gabbay
@ 2014-07-21 2:45 ` Jerome Glisse
2014-07-27 10:21 ` Oded Gabbay
0 siblings, 1 reply; 46+ messages in thread
From: Jerome Glisse @ 2014-07-21 2:45 UTC (permalink / raw)
To: Oded Gabbay
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, dri-devel,
Evgeny Pinchuk, Alexey Skidanov, Alex Deucher, Andrew Morton,
Christian König
On Thu, Jul 17, 2014 at 04:29:23PM +0300, Oded Gabbay wrote:
> From: Ben Goz <ben.goz@amd.com>
>
> This patch adds a new parameter to the amdkfd driver. This parameter enables the user to select the scheduling policy of the CP. The choices are:
>
> * CP Scheduling with support for over-subscription
> * CP Scheduling without support for over-subscription
> * Without CP Scheduling
Is this property per process ? And again 80 chars line please.
>
> Signed-off-by: Ben Goz <ben.goz@amd.com>
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> ---
> drivers/gpu/drm/radeon/amdkfd/kfd_module.c | 4 ++++
> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 9 +++++++++
> 2 files changed, 13 insertions(+)
>
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
> index dc08f51..fe5e39d 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
> @@ -46,6 +46,10 @@ static const struct kgd2kfd_calls kgd2kfd = {
> .resume = kgd2kfd_resume,
> };
>
> +int sched_policy = KFD_SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION;
> +module_param(sched_policy, int, S_IRUSR | S_IWUSR);
> +MODULE_PARM_DESC(sched_policy, "Kernel cmdline parameter define the kfd scheduling policy");
> +
> bool kgd2kfd_init(unsigned interface_version,
> const struct kfd2kgd_calls *f2g,
> const struct kgd2kfd_calls **g2f)
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> index 25f23c5..8be07a1 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> @@ -64,6 +64,15 @@
> /* Macro for allocating structures */
> #define kfd_alloc_struct(ptr_to_struct) ((typeof(ptr_to_struct)) kzalloc(sizeof(*ptr_to_struct), GFP_KERNEL))
>
> +/* Kernel module parameter to specify the scheduling policy */
> +extern int sched_policy;
> +
> +enum kfd_sched_policy {
> + KFD_SCHED_POLICY_HWS = 0,
> + KFD_SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION,
> + KFD_SCHED_POLICY_NO_HWS
> +};
> +
> /*
> * Large enough to hold the maximum usable pasid + 1.
> * It must also be able to store the number of doorbells
> --
> 1.9.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 46+ messages in thread* Re: [PATCH v2 16/25] amdkfd: Add module parameter of scheduling policy
2014-07-21 2:45 ` Jerome Glisse
@ 2014-07-27 10:21 ` Oded Gabbay
0 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-07-27 10:21 UTC (permalink / raw)
To: Jerome Glisse
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, dri-devel,
Alexey Skidanov, Deucher, Alexander, Andrew Morton,
Christian König
On 21/07/14 05:45, Jerome Glisse wrote:
> On Thu, Jul 17, 2014 at 04:29:23PM +0300, Oded Gabbay wrote:
>> From: Ben Goz <ben.goz@amd.com>
>>
>> This patch adds a new parameter to the amdkfd driver. This parameter enables the user to select the scheduling policy of the CP. The choices are:
>>
>> * CP Scheduling with support for over-subscription
>> * CP Scheduling without support for over-subscription
>> * Without CP Scheduling
>
> Is this property per process ?
No, this is the general scheduling mode for all of amdkfd.
The runlist that we feed to the GPU contains queues from all HSA processes.
Furthermore, the number of hardware queues is a total number of the GPU.
Therefore, there is no option to operate in different modes (and I see no point
in that).
Also, I see I forgot to write in the commit msg that the third option (without
CP Scheduling) is only for debug purposes and bringup of new H/W. As such, it is
_not_ guaranteed to work at all times on all H/W versions.
Added this in v3.
> And again 80 chars line please.
Fixed in v3.
Oded
>
>>
>> Signed-off-by: Ben Goz <ben.goz@amd.com>
>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>> ---
>> drivers/gpu/drm/radeon/amdkfd/kfd_module.c | 4 ++++
>> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 9 +++++++++
>> 2 files changed, 13 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>> index dc08f51..fe5e39d 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_module.c
>> @@ -46,6 +46,10 @@ static const struct kgd2kfd_calls kgd2kfd = {
>> .resume = kgd2kfd_resume,
>> };
>>
>> +int sched_policy = KFD_SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION;
>> +module_param(sched_policy, int, S_IRUSR | S_IWUSR);
>> +MODULE_PARM_DESC(sched_policy, "Kernel cmdline parameter define the kfd scheduling policy");
>> +
>> bool kgd2kfd_init(unsigned interface_version,
>> const struct kfd2kgd_calls *f2g,
>> const struct kgd2kfd_calls **g2f)
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> index 25f23c5..8be07a1 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> @@ -64,6 +64,15 @@
>> /* Macro for allocating structures */
>> #define kfd_alloc_struct(ptr_to_struct) ((typeof(ptr_to_struct)) kzalloc(sizeof(*ptr_to_struct), GFP_KERNEL))
>>
>> +/* Kernel module parameter to specify the scheduling policy */
>> +extern int sched_policy;
>> +
>> +enum kfd_sched_policy {
>> + KFD_SCHED_POLICY_HWS = 0,
>> + KFD_SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION,
>> + KFD_SCHED_POLICY_NO_HWS
>> +};
>> +
>> /*
>> * Large enough to hold the maximum usable pasid + 1.
>> * It must also be able to store the number of doorbells
>> --
>> 1.9.1
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v2 17/25] amdkfd: Add packet manager module
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
` (12 preceding siblings ...)
2014-07-17 13:29 ` [PATCH v2 16/25] amdkfd: Add module parameter of scheduling policy Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-17 13:29 ` [PATCH v2 18/25] amdkfd: Add process queue " Oded Gabbay
` (8 subsequent siblings)
22 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
From: Ben Goz <ben.goz@amd.com>
The packet manager module builds PM4 packets for the sole use of the CP scheduler. Those packets are used by the HIQ to submit runlists to the CP.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/amdkfd/Makefile | 2 +-
drivers/gpu/drm/radeon/amdkfd/kfd_packet_manager.c | 488 +++++++++++++++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 62 +++
3 files changed, 551 insertions(+), 1 deletion(-)
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_packet_manager.c
diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
index bead1be..4083f28 100644
--- a/drivers/gpu/drm/radeon/amdkfd/Makefile
+++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
@@ -7,6 +7,6 @@ ccflags-y := -Iinclude/drm
amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
kfd_pasid.o kfd_doorbell.o kfd_vidmem.o kfd_aperture.o \
kfd_process.o kfd_queue.o kfd_mqd_manager.o \
- kfd_kernel_queue.o
+ kfd_kernel_queue.o kfd_packet_manager.o
obj-$(CONFIG_HSA_RADEON) += amdkfd.o
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_packet_manager.c b/drivers/gpu/drm/radeon/amdkfd/kfd_packet_manager.c
new file mode 100644
index 0000000..394fbd9
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_packet_manager.c
@@ -0,0 +1,488 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include <linux/slab.h>
+#include <linux/mutex.h>
+#include "kfd_device_queue_manager.h"
+#include "kfd_kernel_queue.h"
+#include "kfd_priv.h"
+#include "kfd_pm4_headers.h"
+#include "kfd_pm4_opcodes.h"
+#include "cik_mqds.h"
+
+static inline void inc_wptr(unsigned int *wptr, unsigned int increment_bytes, unsigned int buffer_size_bytes)
+{
+ unsigned int temp = *wptr + increment_bytes / sizeof(uint32_t);
+
+ BUG_ON((temp * sizeof(uint32_t)) > buffer_size_bytes);
+ *wptr = temp;
+}
+
+static unsigned int build_pm4_header(unsigned int opcode, size_t packet_size)
+{
+ union PM4_TYPE_3_HEADER header;
+
+ header.u32all = 0;
+ header.opcode = opcode;
+ header.count = packet_size/sizeof(uint32_t) - 2;
+ header.type = PM4_TYPE_3;
+
+ return header.u32all;
+}
+
+static void pm_calc_rlib_size(struct packet_manager *pm, unsigned int *rlib_size, bool *over_subscription)
+{
+ unsigned int process_count, queue_count;
+
+ BUG_ON(!pm || !rlib_size || !over_subscription);
+
+ process_count = pm->dqm->processes_count;
+ queue_count = pm->dqm->queue_count;
+
+ /* check if there is over subscription*/
+ *over_subscription = false;
+ if ((process_count >= VMID_PER_DEVICE) ||
+ queue_count > PIPE_PER_ME_CP_SCHEDULING * QUEUES_PER_PIPE) {
+ *over_subscription = true;
+ pr_debug("kfd: over subscribed runlist\n");
+ }
+
+ /* calculate run list ib allocation size */
+ *rlib_size = process_count * sizeof(struct pm4_map_process) +
+ queue_count * sizeof(struct pm4_map_queues);
+
+ /* increase the allocation size in case we need a chained run list when over subscription */
+ if (*over_subscription)
+ *rlib_size += sizeof(struct pm4_runlist);
+
+ pr_debug("kfd: runlist ib size %d\n", *rlib_size);
+}
+
+static int pm_allocate_runlist_ib(struct packet_manager *pm, unsigned int **rl_buffer, uint64_t *rl_gpu_buffer,
+ unsigned int *rl_buffer_size, bool *is_over_subscription)
+{
+ int retval;
+
+ BUG_ON(!pm);
+ BUG_ON(pm->allocated == true);
+ BUG_ON(is_over_subscription == NULL);
+
+ pm_calc_rlib_size(pm, rl_buffer_size, is_over_subscription);
+
+ retval = kfd_vidmem_alloc_map(pm->dqm->dev, &pm->ib_buffer_obj, (void **)rl_buffer,
+ rl_gpu_buffer, ALIGN(*rl_buffer_size, PAGE_SIZE));
+ if (retval != 0) {
+ pr_err("kfd: failed to allocate runlist IB\n");
+ return retval;
+ }
+
+ memset(*rl_buffer, 0, *rl_buffer_size);
+ pm->allocated = true;
+ return retval;
+}
+
+static int pm_create_runlist(struct packet_manager *pm, uint32_t *buffer,
+ uint64_t ib, size_t ib_size_in_dwords, bool chain)
+{
+ struct pm4_runlist *packet;
+
+ BUG_ON(!pm || !buffer || !ib);
+
+ packet = (struct pm4_runlist *)buffer;
+
+ memset(buffer, 0, sizeof(struct pm4_runlist));
+ packet->header.u32all = build_pm4_header(IT_RUN_LIST, sizeof(struct pm4_runlist));
+
+ packet->bitfields4.ib_size = ib_size_in_dwords;
+ packet->bitfields4.chain = chain ? 1 : 0;
+ packet->bitfields4.offload_polling = 0;
+ packet->bitfields4.valid = 1;
+ packet->bitfields4.vmid = 0;
+ packet->ordinal2 = lower_32(ib);
+ packet->bitfields3.ib_base_hi = upper_32(ib);
+
+ return 0;
+}
+
+static int pm_create_map_process(struct packet_manager *pm, uint32_t *buffer, struct qcm_process_device *qpd)
+{
+ struct pm4_map_process *packet;
+
+ BUG_ON(!pm || !buffer || !qpd);
+
+ packet = (struct pm4_map_process *)buffer;
+
+ pr_debug("kfd: In func %s\n", __func__);
+
+ memset(buffer, 0, sizeof(struct pm4_map_process));
+
+ packet->header.u32all = build_pm4_header(IT_MAP_PROCESS, sizeof(struct pm4_map_process));
+ packet->bitfields2.diq_enable = (qpd->is_debug) ? 1 : 0;
+ packet->bitfields2.pasid = qpd->pqm->process->pasid;
+ packet->bitfields3.page_table_base = qpd->page_table_base;
+ packet->bitfields4.gds_size = qpd->gds_size;
+ packet->bitfields4.num_gws = qpd->num_gws;
+ packet->bitfields4.num_oac = qpd->num_oac;
+
+ packet->sh_mem_config = qpd->sh_mem_config;
+ packet->sh_mem_bases = qpd->sh_mem_bases;
+ packet->sh_mem_ape1_base = qpd->sh_mem_ape1_base;
+ packet->sh_mem_ape1_limit = qpd->sh_mem_ape1_limit;
+
+ packet->gds_addr_lo = lower_32(qpd->gds_context_area);
+ packet->gds_addr_hi = upper_32(qpd->gds_context_area);
+
+ return 0;
+}
+
+static int pm_create_map_queue(struct packet_manager *pm, uint32_t *buffer, struct queue *q)
+{
+ struct pm4_map_queues *packet;
+
+ BUG_ON(!pm || !buffer || !q);
+
+ pr_debug("kfd: In func %s\n", __func__);
+
+ packet = (struct pm4_map_queues *)buffer;
+ memset(buffer, 0, sizeof(struct pm4_map_queues));
+
+ packet->header.u32all = build_pm4_header(IT_MAP_QUEUES, sizeof(struct pm4_map_queues));
+ packet->bitfields2.alloc_format = alloc_format___map_queues__one_per_pipe;
+ packet->bitfields2.num_queues = 1;
+ packet->bitfields2.queue_sel = queue_sel___map_queues__map_to_hws_determined_queue_slots;
+ packet->bitfields2.vidmem = (q->properties.is_interop) ? vidmem___map_queues__uses_video_memory :
+ vidmem___map_queues__uses_no_video_memory;
+
+ switch (q->properties.type) {
+ case KFD_QUEUE_TYPE_COMPUTE:
+ case KFD_QUEUE_TYPE_DIQ:
+ packet->bitfields2.engine_sel = engine_sel___map_queues__compute;
+ break;
+ case KFD_QUEUE_TYPE_SDMA:
+ packet->bitfields2.engine_sel = engine_sel___map_queues__sdma0_queue;
+ break;
+ default:
+ BUG();
+ break;
+ }
+
+ packet->_map_queues_ordinals[0].bitfields3.doorbell_offset = q->properties.doorbell_off;
+ packet->_map_queues_ordinals[0].mqd_addr_lo = lower_32(q->gart_mqd_addr);
+ packet->_map_queues_ordinals[0].mqd_addr_hi = upper_32(q->gart_mqd_addr);
+ packet->_map_queues_ordinals[0].wptr_addr_lo = lower_32((uint64_t)q->properties.write_ptr);
+ packet->_map_queues_ordinals[0].wptr_addr_hi = upper_32((uint64_t)q->properties.write_ptr);
+
+ return 0;
+}
+
+/*
+ * pm_create_runlist_ib - build the runlist indirect buffer: one MAP_PROCESS
+ * packet per process followed by MAP_QUEUES packets for each of its active
+ * kernel and user queues.
+ *
+ * @pm: packet manager (owns the IB allocation state)
+ * @queues: list of device_process_node entries to encode
+ * @rl_gpu_addr: out, GPU address of the allocated runlist IB
+ * @rl_size_bytes: out, allocated IB size in bytes
+ *
+ * On any failure after pm_allocate_runlist_ib() succeeds, the IB is released
+ * before returning (previously these paths returned directly and leaked it).
+ * Returns 0 on success or a negative errno.
+ */
+static int pm_create_runlist_ib(struct packet_manager *pm, struct list_head *queues,
+ uint64_t *rl_gpu_addr, size_t *rl_size_bytes)
+{
+ unsigned int alloc_size_bytes;
+ unsigned int *rl_buffer, rl_wptr, i;
+ int retval, processes_mapped;
+ struct device_process_node *cur;
+ struct qcm_process_device *qpd;
+ struct queue *q;
+ struct kernel_queue *kq;
+ bool is_over_subscription;
+
+ BUG_ON(!pm || !queues || !rl_size_bytes || !rl_gpu_addr);
+
+ rl_wptr = retval = processes_mapped = 0;
+
+ retval = pm_allocate_runlist_ib(pm, &rl_buffer, rl_gpu_addr, &alloc_size_bytes, &is_over_subscription);
+ if (retval != 0)
+ return retval;
+
+ *rl_size_bytes = alloc_size_bytes;
+
+ pr_debug("kfd: In func %s\n", __func__);
+ pr_debug("kfd: building runlist ib process count: %d queues count %d\n", pm->dqm->processes_count,
+ pm->dqm->queue_count);
+
+ /* build the run list ib packet */
+ list_for_each_entry(cur, queues, list) {
+ qpd = cur->qpd;
+ /* build map process packet */
+ if (processes_mapped >= pm->dqm->processes_count) {
+ pr_debug("kfd: not enough space left in runlist IB\n");
+ retval = -ENOMEM;
+ goto fail_create_ib;
+ }
+ retval = pm_create_map_process(pm, &rl_buffer[rl_wptr], qpd);
+ if (retval != 0)
+ goto fail_create_ib;
+ processes_mapped++;
+ inc_wptr(&rl_wptr, sizeof(struct pm4_map_process), alloc_size_bytes);
+ /* kernel queues (HIQ/DIQ) first, then the user compute queues */
+ list_for_each_entry(kq, &qpd->priv_queue_list, list) {
+ if (kq->queue->properties.is_active != true)
+ continue;
+ retval = pm_create_map_queue(pm, &rl_buffer[rl_wptr], kq->queue);
+ if (retval != 0)
+ goto fail_create_ib;
+ inc_wptr(&rl_wptr, sizeof(struct pm4_map_queues), alloc_size_bytes);
+ }
+
+ list_for_each_entry(q, &qpd->queues_list, list) {
+ if (q->properties.is_active != true)
+ continue;
+ retval = pm_create_map_queue(pm, &rl_buffer[rl_wptr], q);
+ if (retval != 0)
+ goto fail_create_ib;
+ inc_wptr(&rl_wptr, sizeof(struct pm4_map_queues), alloc_size_bytes);
+ }
+ }
+
+ pr_debug("kfd: finished map process and queues to runlist\n");
+
+ /* chain the IB back to itself so the scheduler time-slices processes */
+ if (is_over_subscription)
+ pm_create_runlist(pm, &rl_buffer[rl_wptr], *rl_gpu_addr, alloc_size_bytes / sizeof(uint32_t), true);
+
+ for (i = 0; i < alloc_size_bytes / sizeof(uint32_t); i++)
+ pr_debug("0x%2X ", rl_buffer[i]);
+ pr_debug("\n");
+
+ return 0;
+
+fail_create_ib:
+ /* release the IB allocated above; the old code leaked it here */
+ pm_release_ib(pm);
+ return retval;
+}
+
+/*
+ * pm_init - initialize a packet manager for a device queue manager.
+ *
+ * Creates the HIQ kernel queue used to submit scheduler packets.
+ * Returns 0 on success, -ENOMEM if the HIQ could not be created.
+ */
+int pm_init(struct packet_manager *pm, struct device_queue_manager *dqm)
+{
+ BUG_ON(!dqm);
+
+ pm->dqm = dqm;
+ mutex_init(&pm->lock);
+ pm->priv_queue = kernel_queue_init(dqm->dev, KFD_QUEUE_TYPE_HIQ);
+ if (pm->priv_queue == NULL) {
+ /* undo mutex_init so the caller can retry or free pm safely */
+ mutex_destroy(&pm->lock);
+ return -ENOMEM;
+ }
+ /* no runlist IB allocated yet */
+ pm->allocated = false;
+
+ return 0;
+}
+
+/*
+ * pm_uninit - tear down a packet manager initialized with pm_init().
+ *
+ * NOTE(review): the mutex is destroyed before the HIQ is torn down, the
+ * reverse of pm_init() order — presumably safe because no other user can
+ * hold the lock at uninit time; confirm against callers.
+ */
+void pm_uninit(struct packet_manager *pm)
+{
+ BUG_ON(!pm);
+
+ mutex_destroy(&pm->lock);
+ kernel_queue_uninit(pm->priv_queue);
+}
+
+/*
+ * pm_send_set_resources - submit a PM4 SET_RESOURCES packet on the HIQ,
+ * handing the HW scheduler its VMID/queue/GWS/OAC/GDS resource masks.
+ *
+ * Returns 0 on success, -ENOMEM if no space could be acquired on the HIQ.
+ */
+int pm_send_set_resources(struct packet_manager *pm, struct scheduling_resources *res)
+{
+ struct pm4_set_resources *packet;
+
+ BUG_ON(!pm || !res);
+
+ pr_debug("kfd: In func %s\n", __func__);
+
+ mutex_lock(&pm->lock);
+ /*
+ * Initialize packet before the call: acquire_packet_buffer() may fail
+ * without writing through the out-pointer, and the old code then read
+ * an uninitialized pointer in the NULL check below.
+ */
+ packet = NULL;
+ pm->priv_queue->acquire_packet_buffer(pm->priv_queue, sizeof(*packet) / sizeof(uint32_t),
+ (unsigned int **)&packet);
+ if (packet == NULL) {
+ mutex_unlock(&pm->lock);
+ pr_err("kfd: failed to allocate buffer on kernel queue\n");
+ return -ENOMEM;
+ }
+
+ memset(packet, 0, sizeof(struct pm4_set_resources));
+ packet->header.u32all = build_pm4_header(IT_SET_RESOURCES, sizeof(struct pm4_set_resources));
+
+ packet->bitfields2.queue_type = queue_type___set_resources__hsa_interface_queue_hiq;
+ packet->bitfields2.vmid_mask = res->vmid_mask;
+ packet->bitfields2.unmap_latency = KFD_UNMAP_LATENCY;
+ packet->bitfields3.oac_mask = res->oac_mask;
+ packet->bitfields4.gds_heap_base = res->gds_heap_base;
+ packet->bitfields4.gds_heap_size = res->gds_heap_size;
+
+ packet->gws_mask_lo = lower_32(res->gws_mask);
+ packet->gws_mask_hi = upper_32(res->gws_mask);
+
+ packet->queue_mask_lo = lower_32(res->queue_mask);
+ packet->queue_mask_hi = upper_32(res->queue_mask);
+
+ pm->priv_queue->submit_packet(pm->priv_queue);
+ pm->priv_queue->sync_with_hw(pm->priv_queue, KFD_HIQ_TIMEOUT);
+
+ mutex_unlock(&pm->lock);
+
+ return 0;
+}
+
+/*
+ * pm_send_runlist - build the runlist IB for all queues in @dqm_queues and
+ * submit a PM4 RUN_LIST packet on the HIQ pointing at it.
+ *
+ * Returns 0 on success or a negative errno; on failure any allocated
+ * runlist IB is released and any acquired HIQ space is rolled back.
+ */
+int pm_send_runlist(struct packet_manager *pm, struct list_head *dqm_queues)
+{
+ uint64_t rl_gpu_ib_addr;
+ uint32_t *rl_buffer;
+ size_t rl_ib_size, packet_size_dwords;
+ int retval;
+
+ BUG_ON(!pm || !dqm_queues);
+
+ retval = pm_create_runlist_ib(pm, dqm_queues, &rl_gpu_ib_addr, &rl_ib_size);
+ if (retval != 0)
+ goto fail_create_runlist_ib;
+
+ pr_debug("kfd: runlist IB address: 0x%llX\n", rl_gpu_ib_addr);
+
+ packet_size_dwords = sizeof(struct pm4_runlist) / sizeof(uint32_t);
+ mutex_lock(&pm->lock);
+
+ retval = pm->priv_queue->acquire_packet_buffer(pm->priv_queue, packet_size_dwords, &rl_buffer);
+ if (retval != 0)
+ goto fail_acquire_packet_buffer;
+
+ /* chained=false: this RUN_LIST lives on the HIQ, not inside an IB */
+ retval = pm_create_runlist(pm, rl_buffer, rl_gpu_ib_addr, rl_ib_size / sizeof(uint32_t), false);
+ if (retval != 0)
+ goto fail_create_runlist;
+
+ pm->priv_queue->submit_packet(pm->priv_queue);
+ pm->priv_queue->sync_with_hw(pm->priv_queue, KFD_HIQ_TIMEOUT);
+
+ mutex_unlock(&pm->lock);
+
+ return retval;
+
+fail_create_runlist:
+ pm->priv_queue->rollback_packet(pm->priv_queue);
+fail_acquire_packet_buffer:
+ mutex_unlock(&pm->lock);
+fail_create_runlist_ib:
+ if (pm->allocated == true)
+ pm_release_ib(pm);
+ return retval;
+}
+
+/*
+ * pm_send_query_status - submit a PM4 QUERY_STATUS packet that makes the
+ * scheduler write @fence_value to @fence_address once preceding work is
+ * acknowledged.
+ *
+ * NOTE(review): @fence_value is 32-bit, so upper_32() of it is always 0;
+ * data_hi is effectively constant — presumably intentional, confirm.
+ *
+ * Returns 0 on success or the acquire_packet_buffer() error.
+ */
+int pm_send_query_status(struct packet_manager *pm, uint64_t fence_address, uint32_t fence_value)
+{
+ int retval;
+ struct pm4_query_status *packet;
+
+ BUG_ON(!pm || !fence_address);
+
+ mutex_lock(&pm->lock);
+ retval = pm->priv_queue->acquire_packet_buffer(pm->priv_queue,
+ sizeof(struct pm4_query_status) / sizeof(uint32_t), (unsigned int **)&packet);
+ if (retval != 0)
+ goto fail_acquire_packet_buffer;
+
+ packet->header.u32all = build_pm4_header(IT_QUERY_STATUS, sizeof(struct pm4_query_status));
+
+ /* interrupt on completion; fence written only after write-ack */
+ packet->bitfields2.context_id = 0;
+ packet->bitfields2.interrupt_sel = interrupt_sel___query_status__completion_status;
+ packet->bitfields2.command = command___query_status__fence_only_after_write_ack;
+
+ packet->addr_hi = upper_32((uint64_t)fence_address);
+ packet->addr_lo = lower_32((uint64_t)fence_address);
+ packet->data_hi = upper_32((uint64_t)fence_value);
+ packet->data_lo = lower_32((uint64_t)fence_value);
+
+ pm->priv_queue->submit_packet(pm->priv_queue);
+ pm->priv_queue->sync_with_hw(pm->priv_queue, KFD_HIQ_TIMEOUT);
+ mutex_unlock(&pm->lock);
+
+ return 0;
+
+fail_acquire_packet_buffer:
+ mutex_unlock(&pm->lock);
+ return retval;
+}
+
+/*
+ * pm_send_unmap_queue - submit a PM4 UNMAP_QUEUES packet on the HIQ.
+ *
+ * @type: which engine's queues to address (compute/DIQ vs. SDMA)
+ * @mode: filter — a single queue (by doorbell), all queues of a PASID,
+ * or every active queue
+ * @filter_param: doorbell offset or PASID, depending on @mode
+ * @reset: true to hard-reset the queues instead of preempting them
+ *
+ * Returns 0 on success or the acquire_packet_buffer() error.
+ */
+int pm_send_unmap_queue(struct packet_manager *pm, enum kfd_queue_type type,
+ enum kfd_preempt_type_filter mode, uint32_t filter_param, bool reset)
+{
+ int retval;
+ uint32_t *buffer;
+ struct pm4_unmap_queues *packet;
+
+ BUG_ON(!pm);
+
+ mutex_lock(&pm->lock);
+ retval = pm->priv_queue->acquire_packet_buffer(pm->priv_queue,
+ sizeof(struct pm4_unmap_queues) / sizeof(uint32_t), &buffer);
+ if (retval != 0)
+ goto err_acquire_packet_buffer;
+
+ packet = (struct pm4_unmap_queues *)buffer;
+ memset(buffer, 0, sizeof(struct pm4_unmap_queues));
+
+ packet->header.u32all = build_pm4_header(IT_UNMAP_QUEUES, sizeof(struct pm4_unmap_queues));
+ switch (type) {
+ case KFD_QUEUE_TYPE_COMPUTE:
+ case KFD_QUEUE_TYPE_DIQ:
+ packet->bitfields2.engine_sel = engine_sel___unmap_queues__compute;
+ break;
+ case KFD_QUEUE_TYPE_SDMA:
+ packet->bitfields2.engine_sel = engine_sel___unmap_queues__sdma0;
+ break;
+ default:
+ BUG();
+ break;
+ }
+
+ if (reset)
+ packet->bitfields2.action = action___unmap_queues__reset_queues;
+ else
+ packet->bitfields2.action = action___unmap_queues__preempt_queues;
+
+ switch (mode) {
+ case KFD_PREEMPT_TYPE_FILTER_SINGLE_QUEUE:
+ packet->bitfields2.queue_sel = queue_sel___unmap_queues__perform_request_on_specified_queues;
+ packet->bitfields2.num_queues = 1;
+ packet->bitfields4.doorbell_offset0 = filter_param;
+ break;
+ case KFD_PRERMPT_TYPE_FILTER_BY_PASID:
+ packet->bitfields2.queue_sel = queue_sel___unmap_queues__perform_request_on_pasid_queues;
+ packet->bitfields3.pasid = filter_param;
+ break;
+ case KFD_PRERMPT_TYPE_FILTER_ALL_QUEUES:
+ packet->bitfields2.queue_sel = queue_sel___unmap_queues__perform_request_on_all_active_queues;
+ break;
+ default:
+ BUG();
+ break;
+ } /* was "};" — dropped the stray empty statement after the switch */
+
+ pm->priv_queue->submit_packet(pm->priv_queue);
+ pm->priv_queue->sync_with_hw(pm->priv_queue, KFD_HIQ_TIMEOUT);
+
+ mutex_unlock(&pm->lock);
+ return 0;
+
+err_acquire_packet_buffer:
+ mutex_unlock(&pm->lock);
+ return retval;
+}
+
+/*
+ * pm_release_ib - free the runlist indirect buffer, if one is allocated.
+ * Safe to call when no IB is outstanding (pm->allocated is false).
+ */
+void pm_release_ib(struct packet_manager *pm)
+{
+ BUG_ON(!pm);
+
+ mutex_lock(&pm->lock);
+ if (pm->allocated) {
+ kfd_vidmem_free_unmap(pm->dqm->dev, pm->ib_buffer_obj);
+ pm->allocated = false;
+ }
+ mutex_unlock(&pm->lock);
+}
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
index 8be07a1..63e492a 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
@@ -160,6 +160,11 @@ int kfd_chardev_init(void);
void kfd_chardev_exit(void);
struct device *kfd_chardev(void);
+enum kfd_preempt_type_filter {
+ KFD_PREEMPT_TYPE_FILTER_SINGLE_QUEUE,
+ KFD_PRERMPT_TYPE_FILTER_ALL_QUEUES,
+ KFD_PRERMPT_TYPE_FILTER_BY_PASID
+};
enum kfd_queue_type {
KFD_QUEUE_TYPE_COMPUTE,
@@ -213,6 +218,51 @@ enum KFD_MQD_TYPE {
KFD_MQD_TYPE_MAX
};
+struct scheduling_resources {
+ unsigned int vmid_mask;
+ enum kfd_queue_type type;
+ uint64_t queue_mask;
+ uint64_t gws_mask;
+ uint32_t oac_mask;
+ uint32_t gds_heap_base;
+ uint32_t gds_heap_size;
+};
+
+struct process_queue_manager {
+ /* data */
+ struct kfd_process *process;
+ unsigned int num_concurrent_processes;
+ struct list_head queues;
+ unsigned long *queue_slot_bitmap;
+};
+
+struct qcm_process_device {
+ /* The Device Queue Manager that owns this data */
+ struct device_queue_manager *dqm;
+ struct process_queue_manager *pqm;
+ /* Device Queue Manager lock */
+ struct mutex *lock;
+ /* Queues list */
+ struct list_head queues_list;
+ struct list_head priv_queue_list;
+
+ unsigned int queue_count;
+ unsigned int vmid;
+ bool is_debug;
+ /*
+ * All the memory management data should be here too
+ */
+ uint64_t gds_context_area;
+ uint32_t sh_mem_config;
+ uint32_t sh_mem_bases;
+ uint32_t sh_mem_ape1_base;
+ uint32_t sh_mem_ape1_limit;
+ uint32_t page_table_base;
+ uint32_t gds_size;
+ uint32_t num_gws;
+ uint32_t num_oac;
+};
+
/* Data that is per-process-per device. */
struct kfd_process_device {
/*
@@ -327,12 +377,22 @@ int kgd2kfd_resume(struct kfd_dev *dev);
int kfd_init_apertures(struct kfd_process *process);
/* Queue Context Management */
+inline uint32_t lower_32(uint64_t x);
+inline uint32_t upper_32(uint64_t x);
+
int init_queue(struct queue **q, struct queue_properties properties);
void uninit_queue(struct queue *q);
void print_queue(struct queue *q);
+struct kernel_queue *kernel_queue_init(struct kfd_dev *dev, enum kfd_queue_type type);
+void kernel_queue_uninit(struct kernel_queue *kq);
+
/* Packet Manager */
+#define KFD_HIQ_TIMEOUT (500)
+
+#define KFD_UNMAP_LATENCY (15)
+
struct packet_manager {
struct device_queue_manager *dqm;
struct kernel_queue *priv_queue;
@@ -341,4 +401,6 @@ struct packet_manager {
kfd_mem_obj ib_buffer_obj;
};
+void pm_release_ib(struct packet_manager *pm);
+
#endif
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* [PATCH v2 18/25] amdkfd: Add process queue manager module
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
` (13 preceding siblings ...)
2014-07-17 13:29 ` [PATCH v2 17/25] amdkfd: Add packet manager module Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-17 13:29 ` [PATCH v2 19/25] amdkfd: Add device " Oded Gabbay
` (7 subsequent siblings)
22 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
From: Ben Goz <ben.goz@amd.com>
The queue scheduler is divided into two sections: one section is process-bound and the other is device-bound.
The process-bound section is handled by this module. The PQM handles usermode queue setup, updates and tear-down.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/amdkfd/Makefile | 3 +-
drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 17 +
drivers/gpu/drm/radeon/amdkfd/kfd_process.c | 13 +
.../drm/radeon/amdkfd/kfd_process_queue_manager.c | 343 +++++++++++++++++++++
4 files changed, 375 insertions(+), 1 deletion(-)
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_process_queue_manager.c
diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
index 4083f28..eacef85 100644
--- a/drivers/gpu/drm/radeon/amdkfd/Makefile
+++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
@@ -7,6 +7,7 @@ ccflags-y := -Iinclude/drm
amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
kfd_pasid.o kfd_doorbell.o kfd_vidmem.o kfd_aperture.o \
kfd_process.o kfd_queue.o kfd_mqd_manager.o \
- kfd_kernel_queue.o kfd_packet_manager.o
+ kfd_kernel_queue.o kfd_packet_manager.o \
+ kfd_process_queue_manager.o
obj-$(CONFIG_HSA_RADEON) += amdkfd.o
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
index 63e492a..c444b38 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
@@ -277,6 +277,9 @@ struct kfd_process_device {
/* The user-mode address of the doorbell mapping for this device. */
doorbell_t __user *doorbell_mapping;
+ /* per-process-per device QCM data structure */
+ struct qcm_process_device qpd;
+
/* Is this process/pasid bound to this device? (amd_iommu_bind_pasid) */
bool bound;
@@ -312,6 +315,8 @@ struct kfd_process {
*/
struct list_head per_device_data;
+ struct process_queue_manager pqm;
+
/* The process's queues. */
size_t queue_array_size;
@@ -382,11 +387,23 @@ inline uint32_t upper_32(uint64_t x);
int init_queue(struct queue **q, struct queue_properties properties);
void uninit_queue(struct queue *q);
+void print_queue_properties(struct queue_properties *q);
void print_queue(struct queue *q);
struct kernel_queue *kernel_queue_init(struct kfd_dev *dev, enum kfd_queue_type type);
void kernel_queue_uninit(struct kernel_queue *kq);
+/* Process Queue Manager */
+struct process_queue_node {
+ struct queue *q;
+ struct kernel_queue *kq;
+ struct list_head process_queue_list;
+};
+
+int pqm_init(struct process_queue_manager *pqm, struct kfd_process *p);
+void pqm_uninit(struct process_queue_manager *pqm);
+int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid);
+
/* Packet Manager */
#define KFD_HIQ_TIMEOUT (500)
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_process.c b/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
index 908b3b7..bcc004f 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
@@ -163,8 +163,16 @@ static struct kfd_process *create_process(const struct task_struct *thread)
INIT_LIST_HEAD(&process->per_device_data);
+ err = pqm_init(&process->pqm, process);
+ if (err != 0)
+ goto err_process_pqm_init;
+
return process;
+err_process_pqm_init:
+ kfd_pasid_free(process->pasid);
+ list_del(&process->processes_list);
+ thread->mm->kfd_process = NULL;
err_alloc_pasid:
kfree(process->queues);
err_alloc_queues:
@@ -185,6 +193,9 @@ struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
pdd = kzalloc(sizeof(*pdd), GFP_KERNEL);
if (pdd != NULL) {
pdd->dev = dev;
+ INIT_LIST_HEAD(&pdd->qpd.queues_list);
+ INIT_LIST_HEAD(&pdd->qpd.priv_queue_list);
+ pdd->qpd.dqm = dev->dqm;
list_add(&pdd->per_device_list, &p->per_device_data);
}
@@ -246,6 +257,8 @@ void kfd_unbind_process_from_device(struct kfd_dev *dev, pasid_t pasid)
mutex_lock(&p->mutex);
+ pqm_uninit(&p->pqm);
+
/*
* Just mark pdd as unbound, because we still need it to call
* amd_iommu_unbind_pasid() in when the process exits.
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/radeon/amdkfd/kfd_process_queue_manager.c
new file mode 100644
index 0000000..f54df3c
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_process_queue_manager.c
@@ -0,0 +1,343 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include <linux/slab.h>
+#include <linux/list.h>
+#include "kfd_device_queue_manager.h"
+#include "kfd_priv.h"
+#include "kfd_kernel_queue.h"
+
+/*
+ * get_queue_by_qid - find the process_queue_node whose user queue or
+ * kernel queue carries queue id @qid. Returns NULL if no match.
+ */
+static inline struct process_queue_node *get_queue_by_qid(struct process_queue_manager *pqm, unsigned int qid)
+{
+ struct process_queue_node *pqn;
+
+ BUG_ON(!pqm);
+
+ list_for_each_entry(pqn, &pqm->queues, process_queue_list) {
+ /* a node holds either a user queue (q) or a kernel queue (kq) */
+ if (pqn->q && pqn->q->properties.queue_id == qid)
+ return pqn;
+ if (pqn->kq && pqn->kq->queue->properties.queue_id == qid)
+ return pqn;
+ }
+
+ return NULL;
+}
+
+/*
+ * find_available_queue_slot - allocate the lowest free queue id from the
+ * per-process bitmap.
+ *
+ * On success marks the slot used, stores the id in *qid and returns 0;
+ * returns -ENOMEM when all MAX_PROCESS_QUEUES slots are taken.
+ */
+static int find_available_queue_slot(struct process_queue_manager *pqm, unsigned int *qid)
+{
+ unsigned long found;
+
+ BUG_ON(!pqm || !qid);
+
+ pr_debug("kfd: in %s\n", __func__);
+
+ found = find_first_zero_bit(pqm->queue_slot_bitmap, MAX_PROCESS_QUEUES);
+
+ pr_debug("kfd: the new slot id %lu\n", found);
+
+ if (found >= MAX_PROCESS_QUEUES)
+ return -ENOMEM;
+
+ set_bit(found, pqm->queue_slot_bitmap);
+ *qid = found;
+
+ return 0;
+}
+
+/*
+ * pqm_init - initialize a per-process queue manager: empty queue list plus
+ * a zeroed qid allocation bitmap of MAX_PROCESS_QUEUES bits.
+ *
+ * Returns 0 on success, -ENOMEM if the bitmap allocation fails.
+ */
+int pqm_init(struct process_queue_manager *pqm, struct kfd_process *p)
+{
+ BUG_ON(!pqm);
+
+ INIT_LIST_HEAD(&pqm->queues);
+ pqm->queue_slot_bitmap = kzalloc(DIV_ROUND_UP(MAX_PROCESS_QUEUES, BITS_PER_BYTE), GFP_KERNEL);
+ if (pqm->queue_slot_bitmap == NULL)
+ return -ENOMEM;
+ pqm->process = p;
+
+ return 0;
+}
+
+/*
+ * pqm_uninit - destroy every remaining queue of the process and free the
+ * qid bitmap.
+ *
+ * A failed queue destruction is logged and stops further destruction, but
+ * the bitmap is still freed — the old code returned early here and leaked
+ * queue_slot_bitmap.
+ */
+void pqm_uninit(struct process_queue_manager *pqm)
+{
+ int retval;
+ struct process_queue_node *pqn, *next;
+
+ BUG_ON(!pqm);
+
+ pr_debug("In func %s\n", __func__);
+
+ list_for_each_entry_safe(pqn, next, &pqm->queues, process_queue_list) {
+ retval = pqm_destroy_queue(
+ pqm,
+ (pqn->q != NULL) ?
+ pqn->q->properties.queue_id :
+ pqn->kq->queue->properties.queue_id);
+
+ if (retval != 0) {
+ pr_err("kfd: failed to destroy queue\n");
+ break;
+ }
+ }
+ kfree(pqm->queue_slot_bitmap);
+ pqm->queue_slot_bitmap = NULL;
+}
+
+/*
+ * create_cp_queue - prepare a user compute (CP) queue: assign its doorbell
+ * mapping and id, then allocate the queue object via init_queue().
+ *
+ * The VMID is left 0 for the DQM to assign later. Returns 0 on success,
+ * -ENOMEM if no doorbell is available, or the init_queue() error.
+ */
+static int create_cp_queue(struct process_queue_manager *pqm, struct kfd_dev *dev, struct queue **q,
+ struct queue_properties *q_properties, struct file *f, unsigned int qid)
+{
+ int retval;
+
+ q_properties->doorbell_ptr = kfd_get_doorbell(f, pqm->process, dev, qid);
+ if (!q_properties->doorbell_ptr)
+ return -ENOMEM;
+
+ q_properties->doorbell_off = kfd_queue_id_to_doorbell(dev, pqm->process, qid);
+
+ /* let DQM handle it*/
+ q_properties->vmid = 0;
+ q_properties->queue_id = qid;
+ q_properties->type = KFD_QUEUE_TYPE_COMPUTE;
+
+ /*
+ * Dropped the dead "err_init_queue:" label (it only returned retval)
+ * and the redundant "retval = 0" pre-initialization.
+ */
+ retval = init_queue(q, *q_properties);
+ if (retval != 0)
+ return retval;
+
+ (*q)->device = dev;
+ (*q)->process = pqm->process;
+
+ pr_debug("kfd: PQM After init queue");
+
+ return retval;
+}
+
+/*
+ * pqm_create_queue - create a user compute queue or a debug DIQ for the
+ * process, register it with the device queue manager and execute the
+ * updated runlist.
+ *
+ * On success *qid holds the new queue id and *properties is updated from
+ * the created queue. Returns 0 on success or a negative errno; all
+ * partially-created state (qid slot, pqn, DQM queue) is unwound on error.
+ *
+ * Fixes vs. the original: the DIQ failure path no longer calls
+ * kernel_queue_uninit(NULL) and now sets retval = -ENOMEM before jumping
+ * to cleanup (the old code fell through with retval == 0, reporting
+ * success to the caller).
+ */
+int pqm_create_queue(struct process_queue_manager *pqm,
+ struct kfd_dev *dev,
+ struct file *f,
+ struct queue_properties *properties,
+ unsigned int flags,
+ enum kfd_queue_type type,
+ unsigned int *qid)
+{
+ int retval;
+ struct kfd_process_device *pdd;
+ struct queue_properties q_properties;
+ struct queue *q;
+ struct process_queue_node *pqn;
+ struct kernel_queue *kq;
+
+ BUG_ON(!pqm || !dev || !properties || !qid);
+
+ /* work on a private copy; *properties is rewritten only on success */
+ memset(&q_properties, 0, sizeof(struct queue_properties));
+ memcpy(&q_properties, properties, sizeof(struct queue_properties));
+
+ pdd = kfd_get_process_device_data(dev, pqm->process);
+ BUG_ON(!pdd);
+
+ retval = find_available_queue_slot(pqm, qid);
+ if (retval != 0)
+ return retval;
+
+ /* first queue of this process on this device: register with the DQM */
+ if (list_empty(&pqm->queues)) {
+ pdd->qpd.pqm = pqm;
+ dev->dqm->register_process(dev->dqm, &pdd->qpd);
+ }
+
+ pqn = kzalloc(sizeof(struct process_queue_node), GFP_KERNEL);
+ if (!pqn) {
+ retval = -ENOMEM;
+ goto err_allocate_pqn;
+ }
+
+ switch (type) {
+ case KFD_QUEUE_TYPE_COMPUTE:
+ /* check if there is over subscription */
+ if ((sched_policy == KFD_SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION) &&
+ ((dev->dqm->processes_count >= VMID_PER_DEVICE) ||
+ (dev->dqm->queue_count >= PIPE_PER_ME_CP_SCHEDULING * QUEUES_PER_PIPE))) {
+ pr_err("kfd: over-subscription is not allowed in radeon_kfd.sched_policy == 1\n");
+ retval = -EPERM;
+ goto err_create_queue;
+ }
+
+ retval = create_cp_queue(pqm, dev, &q, &q_properties, f, *qid);
+ if (retval != 0)
+ goto err_create_queue;
+ pqn->q = q;
+ pqn->kq = NULL;
+ retval = dev->dqm->create_queue(dev->dqm, q, &pdd->qpd, &q->properties.vmid);
+ print_queue(q);
+ break;
+ case KFD_QUEUE_TYPE_DIQ:
+ kq = kernel_queue_init(dev, KFD_QUEUE_TYPE_DIQ);
+ if (kq == NULL) {
+ /* was: kernel_queue_uninit(NULL) and no error code set */
+ retval = -ENOMEM;
+ goto err_create_queue;
+ }
+ kq->queue->properties.queue_id = *qid;
+ pqn->kq = kq;
+ pqn->q = NULL;
+ retval = dev->dqm->create_kernel_queue(dev->dqm, kq, &pdd->qpd);
+ break;
+ default:
+ BUG();
+ break;
+ }
+
+ if (retval != 0) {
+ pr_err("kfd: error dqm create queue\n");
+ goto err_create_queue;
+ }
+
+ pr_debug("kfd: PQM After DQM create queue\n");
+
+ list_add(&pqn->process_queue_list, &pqm->queues);
+
+ retval = dev->dqm->execute_queues(dev->dqm);
+ if (retval != 0) {
+ /* unwind whichever DQM-side queue was just created */
+ if (pqn->kq)
+ dev->dqm->destroy_kernel_queue(dev->dqm, pqn->kq, &pdd->qpd);
+ if (pqn->q)
+ dev->dqm->destroy_queue(dev->dqm, &pdd->qpd, pqn->q);
+
+ goto err_execute_runlist;
+ }
+
+ *properties = q->properties;
+ pr_debug("kfd: PQM done creating queue\n");
+ print_queue_properties(properties);
+
+ return retval;
+
+err_execute_runlist:
+ list_del(&pqn->process_queue_list);
+err_create_queue:
+ kfree(pqn);
+err_allocate_pqn:
+ clear_bit(*qid, pqm->queue_slot_bitmap);
+ return retval;
+}
+
+/*
+ * pqm_destroy_queue - tear down the queue identified by @qid: destroy it
+ * on the DQM side, free the PQM bookkeeping, release the qid slot, and
+ * re-execute the runlist.
+ *
+ * Unregisters the process from the DQM when its last queue is removed.
+ * Returns 0 on success or the DQM destroy/execute error.
+ */
+int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid)
+{
+ struct process_queue_node *pqn;
+ struct kfd_process_device *pdd;
+ struct device_queue_manager *dqm;
+ struct kfd_dev *dev;
+ int retval;
+
+ dqm = NULL;
+
+ BUG_ON(!pqm);
+ retval = 0;
+
+ pr_debug("kfd: In Func %s\n", __func__);
+
+ pqn = get_queue_by_qid(pqm, qid);
+ BUG_ON(!pqn);
+
+ /* the node holds either a kernel queue or a user queue */
+ dev = NULL;
+ if (pqn->kq)
+ dev = pqn->kq->dev;
+ if (pqn->q)
+ dev = pqn->q->device;
+ BUG_ON(!dev);
+
+ pdd = kfd_get_process_device_data(dev, pqm->process);
+ BUG_ON(!pdd);
+
+ if (pqn->kq) {
+ /* destroy kernel queue (DIQ) */
+ dqm = pqn->kq->dev->dqm;
+ dqm->destroy_kernel_queue(dqm, pqn->kq, &pdd->qpd);
+ kernel_queue_uninit(pqn->kq);
+ }
+
+ if (pqn->q) {
+ dqm = pqn->q->device->dqm;
+ retval = dqm->destroy_queue(dqm, &pdd->qpd, pqn->q);
+ /* NOTE(review): on failure the pqn and qid slot are kept — the
+ * queue is presumably still live; confirm callers expect this. */
+ if (retval != 0)
+ return retval;
+
+ uninit_queue(pqn->q);
+ }
+
+ list_del(&pqn->process_queue_list);
+ kfree(pqn);
+ clear_bit(qid, pqm->queue_slot_bitmap);
+
+ if (list_empty(&pqm->queues))
+ dqm->unregister_process(dqm, &pdd->qpd);
+
+ retval = dqm->execute_queues(dqm);
+
+ return retval;
+}
+
+/*
+ * pqm_update_queue - update ring address/size/percent/priority of queue
+ * @qid, then preempt all queues, push the new MQD values via the DQM and
+ * re-execute the runlist.
+ *
+ * NOTE(review): pqn->q is dereferenced without a NULL check — for a DIQ
+ * node pqn->q is NULL; presumably callers only pass user-queue qids,
+ * confirm against the chardev ioctl path.
+ *
+ * Returns 0 on success or the first failing DQM call's error.
+ */
+int pqm_update_queue(struct process_queue_manager *pqm, unsigned int qid, struct queue_properties *p)
+{
+ int retval;
+ struct process_queue_node *pqn;
+
+ BUG_ON(!pqm);
+
+ pqn = get_queue_by_qid(pqm, qid);
+ BUG_ON(!pqn);
+
+ pqn->q->properties.queue_address = p->queue_address;
+ pqn->q->properties.queue_size = p->queue_size;
+ pqn->q->properties.queue_percent = p->queue_percent;
+ pqn->q->properties.priority = p->priority;
+
+ /* queues must be off the HW before their MQDs are rewritten */
+ retval = pqn->q->device->dqm->destroy_queues(pqn->q->device->dqm);
+ if (retval != 0)
+ return retval;
+
+ retval = pqn->q->device->dqm->update_queue(pqn->q->device->dqm, pqn->q);
+ if (retval != 0)
+ return retval;
+
+ retval = pqn->q->device->dqm->execute_queues(pqn->q->device->dqm);
+ if (retval != 0)
+ return retval;
+
+ return 0;
+}
+
+/*
+ * pqm_get_kernel_queue - look up the kernel queue (DIQ) registered under
+ * @qid. Returns NULL if the qid is unknown or refers to a user queue.
+ */
+struct kernel_queue *pqm_get_kernel_queue(struct process_queue_manager *pqm, unsigned int qid)
+{
+ struct process_queue_node *pqn;
+
+ BUG_ON(!pqm);
+
+ pqn = get_queue_by_qid(pqm, qid);
+ if (pqn && pqn->kq)
+ return pqn->kq;
+
+ return NULL;
+}
+
+
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* [PATCH v2 19/25] amdkfd: Add device queue manager module
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
` (14 preceding siblings ...)
2014-07-17 13:29 ` [PATCH v2 18/25] amdkfd: Add process queue " Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-17 13:29 ` [PATCH v2 20/25] amdkfd: Add interrupt handling module Oded Gabbay
` (6 subsequent siblings)
22 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
From: Ben Goz <ben.goz@amd.com>
The queue scheduler is divided into two sections: one section is process-bound and the other is device-bound.
The device-bound section is handled by this module.
The DQM module handles queue setup, updates and tear-down from the device side.
It also supports suspend/resume operation.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/amdkfd/Makefile | 2 +-
drivers/gpu/drm/radeon/amdkfd/kfd_device.c | 26 +-
.../drm/radeon/amdkfd/kfd_device_queue_manager.c | 985 +++++++++++++++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 13 +
4 files changed, 1023 insertions(+), 3 deletions(-)
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_device_queue_manager.c
diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
index eacef85..44639f2 100644
--- a/drivers/gpu/drm/radeon/amdkfd/Makefile
+++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
@@ -8,6 +8,6 @@ amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
kfd_pasid.o kfd_doorbell.o kfd_vidmem.o kfd_aperture.o \
kfd_process.o kfd_queue.o kfd_mqd_manager.o \
kfd_kernel_queue.o kfd_packet_manager.o \
- kfd_process_queue_manager.o
+ kfd_process_queue_manager.o kfd_device_queue_manager.o
obj-$(CONFIG_HSA_RADEON) += amdkfd.o
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
index 7c4c836..f5e9f39 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
@@ -25,6 +25,7 @@
#include <linux/pci.h>
#include <linux/slab.h>
#include "kfd_priv.h"
+#include "kfd_device_queue_manager.h"
static const struct kfd_device_info kaveri_device_info = {
.max_pasid_bits = 16,
@@ -165,10 +166,26 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
amd_iommu_set_invalidate_ctx_cb(kfd->pdev, iommu_pasid_shutdown_callback);
+ kfd->dqm = device_queue_manager_init(kfd);
+ if (!kfd->dqm) {
+ kfd_topology_remove_device(kfd);
+ amd_iommu_free_device(kfd->pdev);
+ return false;
+ }
+
+ if (kfd->dqm->start(kfd->dqm) != 0) {
+ device_queue_manager_uninit(kfd->dqm);
+ kfd_topology_remove_device(kfd);
+ amd_iommu_free_device(kfd->pdev);
+ return false;
+ }
+
kfd->init_complete = true;
dev_info(kfd_device, "added device (%x:%x)\n", kfd->pdev->vendor,
kfd->pdev->device);
+ pr_debug("kfd: Starting kfd with the following scheduling policy %d\n", sched_policy);
+
return true;
}
@@ -178,8 +195,10 @@ void kgd2kfd_device_exit(struct kfd_dev *kfd)
BUG_ON(err != 0);
- if (kfd->init_complete)
+ if (kfd->init_complete) {
+ device_queue_manager_uninit(kfd->dqm);
amd_iommu_free_device(kfd->pdev);
+ }
kfree(kfd);
}
@@ -188,8 +207,10 @@ void kgd2kfd_suspend(struct kfd_dev *kfd)
{
BUG_ON(kfd == NULL);
- if (kfd->init_complete)
+ if (kfd->init_complete) {
+ kfd->dqm->stop(kfd->dqm);
amd_iommu_free_device(kfd->pdev);
+ }
}
int kgd2kfd_resume(struct kfd_dev *kfd)
@@ -206,6 +227,7 @@ int kgd2kfd_resume(struct kfd_dev *kfd)
if (err < 0)
return -ENXIO;
amd_iommu_set_invalidate_ctx_cb(kfd->pdev, iommu_pasid_shutdown_callback);
+ kfd->dqm->start(kfd->dqm);
}
return 0;
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/radeon/amdkfd/kfd_device_queue_manager.c
new file mode 100644
index 0000000..d875d00
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_device_queue_manager.c
@@ -0,0 +1,985 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include <linux/slab.h>
+#include <linux/list.h>
+#include <linux/types.h>
+#include <linux/printk.h>
+#include <linux/bitops.h>
+#include "kfd_priv.h"
+#include "kfd_device_queue_manager.h"
+#include "kfd_mqd_manager.h"
+#include "cik_regs.h"
+#include "kfd_kernel_queue.h"
+
+#define CIK_HPD_SIZE_LOG2 11
+#define CIK_HPD_SIZE (1U << CIK_HPD_SIZE_LOG2)
+
+static bool is_mem_initialized;
+
+static int init_memory(struct device_queue_manager *dqm);
+static int
+set_pasid_vmid_mapping(struct device_queue_manager *dqm, unsigned int pasid, unsigned int vmid);
+
+static inline unsigned int get_pipes_num(struct device_queue_manager *dqm)
+{
+ BUG_ON(!dqm || !dqm->dev);
+ return dqm->dev->shared_resources.compute_pipe_count;
+}
+
+static inline unsigned int get_first_pipe(struct device_queue_manager *dqm)
+{
+ BUG_ON(!dqm);
+ return dqm->dev->shared_resources.first_compute_pipe;
+}
+
+static inline unsigned int get_pipes_num_cpsch(void)
+{
+ return PIPE_PER_ME_CP_SCHEDULING;
+}
+
+static unsigned int get_sh_mem_bases_nybble_64(struct kfd_process *process, struct kfd_dev *dev)
+{
+ struct kfd_process_device *pdd;
+ uint32_t nybble;
+
+ pdd = kfd_get_process_device_data(dev, process);
+ nybble = (pdd->lds_base >> 60) & 0x0E;
+
+ return nybble;
+
+}
+
+static unsigned int get_sh_mem_bases_32(struct kfd_process *process, struct kfd_dev *dev)
+{
+ struct kfd_process_device *pdd;
+ unsigned int shared_base;
+
+ pdd = kfd_get_process_device_data(dev, process);
+ shared_base = (pdd->lds_base >> 16) & 0xFF;
+
+ return shared_base;
+}
+
+static uint32_t compute_sh_mem_bases_64bit(unsigned int top_address_nybble);
+static void init_process_memory(struct device_queue_manager *dqm, struct qcm_process_device *qpd)
+{
+ unsigned int temp;
+
+ BUG_ON(!dqm || !qpd);
+
+ /* check if sh_mem_config register already configured */
+ if (qpd->sh_mem_config == 0) {
+ qpd->sh_mem_config =
+ ALIGNMENT_MODE(SH_MEM_ALIGNMENT_MODE_UNALIGNED) |
+ DEFAULT_MTYPE(MTYPE_NONCACHED) |
+ APE1_MTYPE(MTYPE_NONCACHED);
+ qpd->sh_mem_ape1_limit = 0;
+ qpd->sh_mem_ape1_base = 0;
+ }
+
+ if (qpd->pqm->process->is_32bit_user_mode) {
+ temp = get_sh_mem_bases_32(qpd->pqm->process, dqm->dev);
+ qpd->sh_mem_bases = SHARED_BASE(temp);
+ qpd->sh_mem_config |= PTR32;
+ } else {
+ temp = get_sh_mem_bases_nybble_64(qpd->pqm->process, dqm->dev);
+ qpd->sh_mem_bases = compute_sh_mem_bases_64bit(temp);
+ }
+
+ pr_debug("kfd: is32bit process: %d sh_mem_bases nybble: 0x%X and register 0x%X\n",
+ qpd->pqm->process->is_32bit_user_mode, temp, qpd->sh_mem_bases);
+}
+
+static void program_sh_mem_settings(struct device_queue_manager *dqm, struct qcm_process_device *qpd)
+{
+ return kfd2kgd->program_sh_mem_settings(dqm->dev->kgd, qpd->vmid, qpd->sh_mem_config,
+ qpd->sh_mem_ape1_base, qpd->sh_mem_ape1_limit, qpd->sh_mem_bases);
+}
+
+static int create_queue_nocpsch(struct device_queue_manager *dqm, struct queue *q,
+ struct qcm_process_device *qpd, int *allocate_vmid)
+{
+ bool set, is_new_vmid;
+ int bit, retval, pipe, i;
+ struct mqd_manager *mqd;
+
+ BUG_ON(!dqm || !q || !qpd || !allocate_vmid);
+ retval = 0;
+
+ pr_debug("kfd: In func %s\n", __func__);
+ print_queue(q);
+
+ mutex_lock(&dqm->lock);
+
+ if (dqm->vmid_bitmap == 0 && qpd->vmid == 0) {
+ retval = -ENOMEM;
+ goto no_vmid;
+ }
+
+ is_new_vmid = false;
+ if (qpd->vmid == 0) {
+ bit = find_first_bit((unsigned long *)&dqm->vmid_bitmap, CIK_VMID_NUM);
+ clear_bit(bit, (unsigned long *)&dqm->vmid_bitmap);
+
+ /* On Kaveri, KFD VMIDs start from VMID 8 */
+ *allocate_vmid = qpd->vmid = bit + KFD_VMID_START_OFFSET;
+ q->properties.vmid = *allocate_vmid;
+
+
+ pr_debug("kfd: vmid allocation %d\n", *allocate_vmid);
+ set_pasid_vmid_mapping(dqm, q->process->pasid, q->properties.vmid);
+ qpd->vmid = *allocate_vmid;
+ is_new_vmid = true;
+
+ program_sh_mem_settings(dqm, qpd);
+ }
+ q->properties.vmid = qpd->vmid;
+
+ set = false;
+ for (i = 0, pipe = dqm->next_pipe_to_allocate; i < get_pipes_num(dqm);
+ pipe = (pipe + i++) % get_pipes_num(dqm)) {
+ if (dqm->allocated_queues[pipe] != 0) {
+ bit = find_first_bit((unsigned long *)&dqm->allocated_queues[pipe], QUEUES_PER_PIPE);
+ clear_bit(bit, (unsigned long *)&dqm->allocated_queues[pipe]);
+ q->pipe = pipe;
+ q->queue = bit;
+ set = true;
+ break;
+ }
+ }
+
+ if (set == false) {
+ retval = -EBUSY;
+ goto no_hqd;
+ }
+ pr_debug("kfd: DQM %s hqd slot - pipe (%d) queue(%d)\n",
+ __func__, q->pipe, q->queue);
+ dqm->next_pipe_to_allocate = (pipe + 1) % get_pipes_num(dqm);
+
+ mqd = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_COMPUTE);
+ if (mqd == NULL) {
+ retval = -ENOMEM;
+ goto fail_get_mqd_manager;
+ }
+
+ retval = mqd->init_mqd(mqd, &q->mqd, &q->mqd_mem_obj, &q->gart_mqd_addr, &q->properties);
+ if (retval != 0) {
+ set_bit(q->queue, (unsigned long *)&dqm->allocated_queues[q->pipe]);
+ goto init_mqd_failed;
+ }
+
+ list_add(&q->list, &qpd->queues_list);
+ dqm->queue_count++;
+
+ mutex_unlock(&dqm->lock);
+ return 0;
+
+init_mqd_failed:
+fail_get_mqd_manager:
+no_hqd:
+ if (is_new_vmid == true) {
+ set_bit(*allocate_vmid - KFD_VMID_START_OFFSET, (unsigned long *)&dqm->vmid_bitmap);
+ *allocate_vmid = qpd->vmid = q->properties.vmid = 0;
+ }
+no_vmid:
+ mutex_unlock(&dqm->lock);
+ return retval;
+}
+
+static int destroy_queue_nocpsch(struct device_queue_manager *dqm, struct qcm_process_device *qpd, struct queue *q)
+{
+ int retval;
+ struct mqd_manager *mqd;
+
+ BUG_ON(!dqm || !q || !q->mqd || !qpd);
+
+ retval = 0;
+
+ pr_debug("kfd: In Func %s\n", __func__);
+
+ mutex_lock(&dqm->lock);
+ mqd = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_COMPUTE);
+ if (mqd == NULL) {
+ retval = -ENOMEM;
+ goto out;
+ }
+ retval = mqd->destroy_mqd(mqd, false, QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS, q->pipe, q->queue);
+ if (retval != 0)
+ goto out;
+
+ mqd->uninit_mqd(mqd, q->mqd, q->mqd_mem_obj);
+
+ set_bit(q->queue, (unsigned long *)&dqm->allocated_queues[q->pipe]);
+ q->queue = q->pipe = 0;
+ list_del(&q->list);
+ if (list_empty(&qpd->queues_list)) {
+ set_bit(qpd->vmid - 8, (unsigned long *)&dqm->vmid_bitmap);
+ qpd->vmid = 0;
+ }
+ dqm->queue_count--;
+out:
+ mutex_unlock(&dqm->lock);
+ return retval;
+}
+
+static int update_queue_nocpsch(struct device_queue_manager *dqm, struct queue *q)
+{
+ int retval;
+ struct mqd_manager *mqd;
+
+ BUG_ON(!dqm || !q || !q->mqd);
+
+ mutex_lock(&dqm->lock);
+ mqd = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_COMPUTE);
+ if (mqd == NULL) {
+ mutex_unlock(&dqm->lock);
+ return -ENOMEM;
+ }
+ retval = mqd->update_mqd(mqd, q->mqd, &q->properties);
+ if (q->properties.is_active == true)
+ dqm->queue_count++;
+ else
+ dqm->queue_count--;
+
+ mutex_unlock(&dqm->lock);
+ return 0;
+}
+
+static int destroy_queues_nocpsch(struct device_queue_manager *dqm)
+{
+ struct device_process_node *cur;
+ struct mqd_manager *mqd;
+ struct queue *q;
+
+ BUG_ON(!dqm);
+
+ mutex_lock(&dqm->lock);
+ mqd = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_COMPUTE);
+ if (mqd == NULL) {
+ mutex_unlock(&dqm->lock);
+ return -ENOMEM;
+ }
+
+ list_for_each_entry(cur, &dqm->queues, list) {
+ list_for_each_entry(q, &cur->qpd->queues_list, list) {
+ mqd->destroy_mqd(mqd, false, QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS, q->pipe, q->queue);
+ }
+ }
+
+ mutex_unlock(&dqm->lock);
+
+ return 0;
+}
+
+static struct mqd_manager *get_mqd_manager_nocpsch(struct device_queue_manager *dqm, enum KFD_MQD_TYPE type)
+{
+ struct mqd_manager *mqd;
+
+ BUG_ON(!dqm || type >= KFD_MQD_TYPE_MAX);
+
+ pr_debug("kfd: In func %s mqd type %d\n", __func__, type);
+
+ mqd = dqm->mqds[type];
+ if (!mqd) {
+ mqd = mqd_manager_init(type, dqm->dev);
+ if (mqd == NULL)
+ pr_err("kfd: mqd manager is NULL");
+ dqm->mqds[type] = mqd;
+ }
+
+ return mqd;
+}
+
+static int execute_queues_nocpsch(struct device_queue_manager *dqm)
+{
+ struct qcm_process_device *qpd;
+ struct device_process_node *node;
+ struct queue *q;
+ struct mqd_manager *mqd;
+
+ BUG_ON(!dqm);
+
+ mutex_lock(&dqm->lock);
+ mqd = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_COMPUTE);
+ if (mqd == NULL) {
+ mutex_unlock(&dqm->lock);
+ return -ENOMEM;
+ }
+
+ list_for_each_entry(node, &dqm->queues, list) {
+ qpd = node->qpd;
+ list_for_each_entry(q, &qpd->queues_list, list) {
+ pr_debug("kfd: executing queue (%d, %d)\n", q->pipe, q->queue);
+ if (mqd->is_occupied(mqd, q->properties.queue_address, q->pipe, q->queue) == false &&
+ q->properties.is_active == true)
+ mqd->load_mqd(mqd, q->mqd, q->pipe, q->queue, q->properties.write_ptr);
+ }
+ }
+
+ mutex_unlock(&dqm->lock);
+
+ return 0;
+}
+
+static int register_process_nocpsch(struct device_queue_manager *dqm, struct qcm_process_device *qpd)
+{
+ struct device_process_node *n;
+
+ BUG_ON(!dqm || !qpd);
+
+ pr_debug("kfd: In func %s\n", __func__);
+
+ n = kzalloc(sizeof(struct device_process_node), GFP_KERNEL);
+ if (!n)
+ return -ENOMEM;
+
+ n->qpd = qpd;
+
+ mutex_lock(&dqm->lock);
+ list_add(&n->list, &dqm->queues);
+
+ init_process_memory(dqm, qpd);
+ dqm->processes_count++;
+
+ mutex_unlock(&dqm->lock);
+
+ return 0;
+}
+
+static int unregister_process_nocpsch(struct device_queue_manager *dqm, struct qcm_process_device *qpd)
+{
+ int retval;
+ struct device_process_node *cur, *next;
+
+ BUG_ON(!dqm || !qpd);
+
+ BUG_ON(!list_empty(&qpd->queues_list));
+
+ pr_debug("kfd: In func %s\n", __func__);
+
+ retval = 0;
+ mutex_lock(&dqm->lock);
+
+ list_for_each_entry_safe(cur, next, &dqm->queues, list) {
+ if (qpd == cur->qpd) {
+ list_del(&cur->list);
+ dqm->processes_count--;
+ goto out;
+ }
+ }
+ /* qpd not found in dqm list */
+ retval = 1;
+out:
+ mutex_unlock(&dqm->lock);
+ return retval;
+}
+
+static int
+set_pasid_vmid_mapping(struct device_queue_manager *dqm, unsigned int pasid, unsigned int vmid)
+{
+ uint32_t pasid_mapping;
+
+ pasid_mapping = (pasid == 0) ? 0 : (uint32_t)pasid | ATC_VMID_PASID_MAPPING_VALID;
+ return kfd2kgd->set_pasid_vmid_mapping(dqm->dev->kgd, pasid_mapping, vmid);
+}
+
+static uint32_t compute_sh_mem_bases_64bit(unsigned int top_address_nybble)
+{
+ /* In 64-bit mode, we can only control the top 3 bits of the LDS, scratch and GPUVM apertures.
+ * The hardware fills in the remaining 59 bits according to the following pattern:
+ * LDS: X0000000'00000000 - X0000001'00000000 (4GB)
+ * Scratch: X0000001'00000000 - X0000002'00000000 (4GB)
+ * GPUVM: Y0010000'00000000 - Y0020000'00000000 (1TB)
+ *
+ * (where X/Y is the configurable nybble with the low-bit 0)
+ *
+ * LDS and scratch will have the same top nybble programmed in the top 3 bits of SH_MEM_BASES.PRIVATE_BASE.
+ * GPUVM can have a different top nybble programmed in the top 3 bits of SH_MEM_BASES.SHARED_BASE.
+ * We don't bother to support different top nybbles for LDS/Scratch and GPUVM.
+ */
+
+ BUG_ON((top_address_nybble & 1) || top_address_nybble > 0xE || top_address_nybble == 0);
+
+ return PRIVATE_BASE(top_address_nybble << 12) | SHARED_BASE(top_address_nybble << 12);
+}
+
+static int init_memory(struct device_queue_manager *dqm)
+{
+ int i, retval;
+
+ for (i = 8; i < 16; i++)
+ set_pasid_vmid_mapping(dqm, 0, i);
+
+ retval = kfd2kgd->init_memory(dqm->dev->kgd);
+ if (retval == 0)
+ is_mem_initialized = true;
+ return retval;
+}
+
+
+static int init_pipelines(struct device_queue_manager *dqm, unsigned int pipes_num, unsigned int first_pipe)
+{
+ void *hpdptr;
+ struct mqd_manager *mqd;
+ unsigned int i, err, inx;
+ uint64_t pipe_hpd_addr;
+
+ BUG_ON(!dqm || !dqm->dev);
+
+ pr_debug("kfd: In func %s\n", __func__);
+
+ /*
+ * Allocate memory for the HPDs. This is hardware-owned per-pipe data.
+ * The driver never accesses this memory after zeroing it. It doesn't even have
+ * to be saved/restored on suspend/resume because it contains no data when there
+ * are no active queues.
+ */
+ err = kfd_vidmem_alloc(dqm->dev,
+ CIK_HPD_SIZE * pipes_num,
+ PAGE_SIZE,
+ KFD_MEMPOOL_SYSTEM_WRITECOMBINE,
+ &dqm->pipeline_mem);
+ if (err) {
+ pr_err("kfd: error allocate vidmem num pipes: %d\n", pipes_num);
+ return -ENOMEM;
+ }
+
+ err = kfd_vidmem_kmap(dqm->dev, dqm->pipeline_mem, &hpdptr);
+ if (err) {
+ pr_err("kfd: err kmap vidmem\n");
+ kfd_vidmem_free(dqm->dev, dqm->pipeline_mem);
+ return -ENOMEM;
+ }
+
+ memset(hpdptr, 0, CIK_HPD_SIZE * pipes_num);
+ kfd_vidmem_unkmap(dqm->dev, dqm->pipeline_mem);
+
+ kfd_vidmem_gpumap(dqm->dev, dqm->pipeline_mem, &dqm->pipelines_addr);
+
+ mqd = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_COMPUTE);
+ if (mqd == NULL) {
+ kfd_vidmem_free(dqm->dev, dqm->pipeline_mem);
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < pipes_num; i++) {
+ inx = i + first_pipe;
+ pipe_hpd_addr = dqm->pipelines_addr + i * CIK_HPD_SIZE;
+ pr_debug("kfd: pipeline address %llX\n", pipe_hpd_addr);
+ kfd2kgd->init_pipeline(dqm->dev->kgd, i, CIK_HPD_SIZE_LOG2, pipe_hpd_addr);
+ }
+
+ return 0;
+}
+
+
+static int init_scheduler(struct device_queue_manager *dqm)
+{
+ int retval;
+
+ BUG_ON(!dqm);
+
+ pr_debug("kfd: In %s\n", __func__);
+
+ retval = init_pipelines(dqm, get_pipes_num(dqm), KFD_DQM_FIRST_PIPE);
+ if (retval != 0)
+ return retval;
+ /* should be later integrated with Evgeny/Alexey memory management code */
+ retval = init_memory(dqm);
+ return retval;
+}
+
+static int initialize_nocpsch(struct device_queue_manager *dqm)
+{
+ int i;
+
+ BUG_ON(!dqm);
+
+ pr_debug("kfd: In func %s num of pipes: %d\n", __func__, get_pipes_num(dqm));
+
+ mutex_init(&dqm->lock);
+ INIT_LIST_HEAD(&dqm->queues);
+ dqm->queue_count = dqm->next_pipe_to_allocate = 0;
+ dqm->allocated_queues = kcalloc(get_pipes_num(dqm), sizeof(unsigned int), GFP_KERNEL);
+ if (!dqm->allocated_queues) {
+ mutex_destroy(&dqm->lock);
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < get_pipes_num(dqm); i++)
+ dqm->allocated_queues[i] = (1 << QUEUES_PER_PIPE) - 1;
+
+ dqm->vmid_bitmap = (1 << VMID_PER_DEVICE) - 1;
+
+ init_scheduler(dqm);
+ return 0;
+}
+
+static void uninitialize_nocpsch(struct device_queue_manager *dqm)
+{
+ BUG_ON(!dqm);
+
+ BUG_ON(dqm->queue_count > 0 || dqm->processes_count > 0);
+
+ kfree(dqm->allocated_queues);
+ mutex_destroy(&dqm->lock);
+ kfd_vidmem_free(dqm->dev, dqm->pipeline_mem);
+}
+
+static int start_nocpsch(struct device_queue_manager *dqm)
+{
+ return 0;
+}
+
+static int stop_nocpsch(struct device_queue_manager *dqm)
+{
+ return 0;
+}
+
+/*
+ * Device Queue Manager implementation for cp scheduler
+ */
+
+static int set_sched_resources(struct device_queue_manager *dqm)
+{
+ struct scheduling_resources res;
+ unsigned int queue_num, queue_mask;
+
+ BUG_ON(!dqm);
+
+ pr_debug("kfd: In func %s\n", __func__);
+
+ queue_num = get_pipes_num_cpsch() * QUEUES_PER_PIPE;
+ queue_mask = (1 << queue_num) - 1;
+ res.vmid_mask = (1 << VMID_PER_DEVICE) - 1;
+ res.vmid_mask <<= KFD_VMID_START_OFFSET;
+ res.queue_mask = queue_mask << (get_first_pipe(dqm) * QUEUES_PER_PIPE);
+ res.gws_mask = res.oac_mask = res.gds_heap_base = res.gds_heap_size = 0;
+
+ pr_debug("kfd: scheduling resources:\n"
+ " vmid mask: 0x%8X\n"
+ " queue mask: 0x%8llX\n", res.vmid_mask, res.queue_mask);
+
+ return pm_send_set_resources(&dqm->packets, &res);
+}
+
+static int initialize_cpsch(struct device_queue_manager *dqm)
+{
+ int retval;
+
+ BUG_ON(!dqm);
+
+ pr_debug("kfd: In func %s num of pipes: %d\n", __func__, get_pipes_num_cpsch());
+
+ mutex_init(&dqm->lock);
+ INIT_LIST_HEAD(&dqm->queues);
+ dqm->queue_count = dqm->processes_count = 0;
+ dqm->active_runlist = false;
+ retval = init_pipelines(dqm, get_pipes_num(dqm), 0);
+ if (retval != 0)
+ goto fail_init_pipelines;
+
+ return 0;
+
+fail_init_pipelines:
+ mutex_destroy(&dqm->lock);
+ return retval;
+}
+
+static int start_cpsch(struct device_queue_manager *dqm)
+{
+ struct device_process_node *node;
+ int retval;
+
+ BUG_ON(!dqm);
+
+ retval = 0;
+
+ retval = pm_init(&dqm->packets, dqm);
+ if (retval != 0)
+ goto fail_packet_manager_init;
+
+ retval = set_sched_resources(dqm);
+ if (retval != 0)
+ goto fail_set_sched_resources;
+
+ pr_debug("kfd: allocating fence memory\n");
+ /* allocate fence memory on the gart */
+ retval = kfd_vidmem_alloc_map(dqm->dev, &dqm->fence_mem,
+ (void **)&dqm->fence_addr,
+ &dqm->fence_gpu_addr,
+ sizeof(*dqm->fence_addr));
+ if (retval != 0)
+ goto fail_allocate_vidmem;
+
+ list_for_each_entry(node, &dqm->queues, list) {
+ if (node->qpd->pqm->process && dqm->dev)
+ kfd_bind_process_to_device(dqm->dev, node->qpd->pqm->process);
+ }
+
+ dqm->execute_queues(dqm);
+
+ return 0;
+fail_allocate_vidmem:
+fail_set_sched_resources:
+ pm_uninit(&dqm->packets);
+fail_packet_manager_init:
+ return retval;
+}
+
+static int stop_cpsch(struct device_queue_manager *dqm)
+{
+ struct device_process_node *node;
+ struct kfd_process_device *pdd;
+
+ BUG_ON(!dqm);
+
+ dqm->destroy_queues(dqm);
+
+ list_for_each_entry(node, &dqm->queues, list) {
+ pdd = kfd_get_process_device_data(dqm->dev, node->qpd->pqm->process);
+ pdd->bound = false;
+ }
+ kfd_vidmem_free_unmap(dqm->dev, dqm->fence_mem);
+ pm_uninit(&dqm->packets);
+
+ return 0;
+}
+
+static int create_kernel_queue_cpsch(struct device_queue_manager *dqm,
+ struct kernel_queue *kq,
+ struct qcm_process_device *qpd)
+{
+ BUG_ON(!dqm || !kq || !qpd);
+
+ pr_debug("kfd: In func %s\n", __func__);
+
+ mutex_lock(&dqm->lock);
+ list_add(&kq->list, &qpd->priv_queue_list);
+ dqm->queue_count++;
+ qpd->is_debug = true;
+ mutex_unlock(&dqm->lock);
+
+ return 0;
+}
+
+static void destroy_kernel_queue_cpsch(struct device_queue_manager *dqm,
+ struct kernel_queue *kq,
+ struct qcm_process_device *qpd)
+{
+ BUG_ON(!dqm || !kq);
+
+ pr_debug("kfd: In %s\n", __func__);
+
+ dqm->destroy_queues(dqm);
+
+ mutex_lock(&dqm->lock);
+ list_del(&kq->list);
+ dqm->queue_count--;
+ qpd->is_debug = false;
+ mutex_unlock(&dqm->lock);
+}
+
+static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
+ struct qcm_process_device *qpd, int *allocate_vmid)
+{
+ int retval;
+ struct mqd_manager *mqd;
+
+ BUG_ON(!dqm || !q || !qpd);
+
+ retval = 0;
+
+ if (allocate_vmid)
+ *allocate_vmid = 0;
+
+ mutex_lock(&dqm->lock);
+
+ mqd = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_CP);
+ if (mqd == NULL) {
+ mutex_unlock(&dqm->lock);
+ return -ENOMEM;
+ }
+
+ retval = mqd->init_mqd(mqd, &q->mqd, &q->mqd_mem_obj, &q->gart_mqd_addr, &q->properties);
+ if (retval != 0)
+ goto out;
+
+ list_add(&q->list, &qpd->queues_list);
+ if (q->properties.is_active)
+ dqm->queue_count++;
+
+out:
+ mutex_unlock(&dqm->lock);
+ return retval;
+}
+
+int fence_wait_timeout(unsigned int *fence_addr, unsigned int fence_value, unsigned long timeout)
+{
+ BUG_ON(!fence_addr);
+ timeout += jiffies;
+
+ while (*fence_addr != fence_value) {
+ if (time_after(jiffies, timeout)) {
+ pr_err("kfd: qcm fence wait loop timeout expired\n");
+ return -ETIME;
+ }
+ cpu_relax();
+ }
+
+ return 0;
+}
+
+static int destroy_queues_cpsch(struct device_queue_manager *dqm)
+{
+ int retval;
+
+ BUG_ON(!dqm);
+
+ retval = 0;
+
+ mutex_lock(&dqm->lock);
+ if (dqm->active_runlist == false)
+ goto out;
+ retval = pm_send_unmap_queue(&dqm->packets, KFD_QUEUE_TYPE_COMPUTE,
+ KFD_PRERMPT_TYPE_FILTER_ALL_QUEUES, 0, false);
+ if (retval != 0)
+ goto out;
+
+ *dqm->fence_addr = KFD_FENCE_INIT;
+ pm_send_query_status(&dqm->packets, dqm->fence_gpu_addr, KFD_FENCE_COMPLETED);
+ /* bounded wait: returns -ETIME if the fence is never signaled */
+ fence_wait_timeout(dqm->fence_addr, KFD_FENCE_COMPLETED, QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS);
+ pm_release_ib(&dqm->packets);
+ dqm->active_runlist = false;
+
+out:
+ mutex_unlock(&dqm->lock);
+ return retval;
+}
+
+static int execute_queues_cpsch(struct device_queue_manager *dqm)
+{
+ int retval;
+
+ BUG_ON(!dqm);
+
+ retval = dqm->destroy_queues(dqm);
+ if (retval != 0) {
+ pr_err("kfd: the cp might be in an unrecoverable state due to an unsuccesful queues premption");
+ return retval;
+ }
+
+ if (dqm->queue_count <= 0 || dqm->processes_count <= 0)
+ return 0;
+
+ mutex_lock(&dqm->lock);
+ if (dqm->active_runlist) {
+ retval = 0;
+ goto out;
+ }
+ retval = pm_send_runlist(&dqm->packets, &dqm->queues);
+ if (retval != 0) {
+ pr_err("kfd: failed to execute runlist");
+ goto out;
+ }
+ dqm->active_runlist = true;
+
+out:
+ mutex_unlock(&dqm->lock);
+ return retval;
+}
+
+static int destroy_queue_cpsch(struct device_queue_manager *dqm, struct qcm_process_device *qpd, struct queue *q)
+{
+ int retval;
+ struct mqd_manager *mqd;
+
+ BUG_ON(!dqm || !qpd || !q);
+
+ retval = 0;
+
+ /* preempt queues before deleting the MQD */
+ dqm->destroy_queues(dqm);
+
+ mutex_lock(&dqm->lock);
+ mqd = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_CP);
+ if (!mqd) {
+ retval = -ENOMEM;
+ goto failed_get_mqd_manager;
+ }
+ list_del(&q->list);
+
+ mqd->uninit_mqd(mqd, q->mqd, q->mqd_mem_obj);
+ dqm->queue_count--;
+ mutex_unlock(&dqm->lock);
+
+ return 0;
+failed_get_mqd_manager:
+ mutex_unlock(&dqm->lock);
+ return retval;
+}
+
+/* Low bits must be 0000/FFFF as required by HW, high bits must be 0 to stay in user mode. */
+#define APE1_FIXED_BITS_MASK 0xFFFF80000000FFFFULL
+#define APE1_LIMIT_ALIGNMENT 0xFFFF /* APE1 limit is inclusive and 64K aligned. */
+
+static bool set_cache_memory_policy(struct device_queue_manager *dqm,
+ struct qcm_process_device *qpd,
+ enum cache_policy default_policy,
+ enum cache_policy alternate_policy,
+ void __user *alternate_aperture_base,
+ uint64_t alternate_aperture_size)
+{
+ uint32_t default_mtype;
+ uint32_t ape1_mtype;
+
+ pr_debug("kfd: In func %s\n", __func__);
+ mutex_lock(&dqm->lock);
+
+ if (alternate_aperture_size == 0) {
+ /* base > limit disables APE1 */
+ qpd->sh_mem_ape1_base = 1;
+ qpd->sh_mem_ape1_limit = 0;
+ } else {
+ /*
+ * In FSA64, APE1_Base[63:0] = { 16{SH_MEM_APE1_BASE[31]}, SH_MEM_APE1_BASE[31:0], 0x0000 }
+ * APE1_Limit[63:0] = { 16{SH_MEM_APE1_LIMIT[31]}, SH_MEM_APE1_LIMIT[31:0], 0xFFFF }
+ * Verify that the base and size parameters can be represented in this format
+ * and convert them. Additionally restrict APE1 to user-mode addresses.
+ */
+
+ uint64_t base = (uintptr_t)alternate_aperture_base;
+ uint64_t limit = base + alternate_aperture_size - 1;
+
+ if (limit <= base)
+ goto out;
+
+ if ((base & APE1_FIXED_BITS_MASK) != 0)
+ goto out;
+
+ if ((limit & APE1_FIXED_BITS_MASK) != APE1_LIMIT_ALIGNMENT)
+ goto out;
+
+ qpd->sh_mem_ape1_base = base >> 16;
+ qpd->sh_mem_ape1_limit = limit >> 16;
+
+ }
+
+ default_mtype = (default_policy == cache_policy_coherent) ?
+ MTYPE_NONCACHED :
+ MTYPE_CACHED;
+
+ ape1_mtype = (alternate_policy == cache_policy_coherent) ?
+ MTYPE_NONCACHED :
+ MTYPE_CACHED;
+
+ qpd->sh_mem_config = (qpd->sh_mem_config & PTR32)
+ | ALIGNMENT_MODE(SH_MEM_ALIGNMENT_MODE_UNALIGNED)
+ | DEFAULT_MTYPE(default_mtype)
+ | APE1_MTYPE(ape1_mtype);
+
+ if ((sched_policy == KFD_SCHED_POLICY_NO_HWS) && (qpd->vmid != 0))
+ program_sh_mem_settings(dqm, qpd);
+
+ pr_debug("kfd: sh_mem_config: 0x%x, ape1_base: 0x%x, ape1_limit: 0x%x\n",
+ qpd->sh_mem_config, qpd->sh_mem_ape1_base,
+ qpd->sh_mem_ape1_limit);
+
+ mutex_unlock(&dqm->lock);
+ return true;
+
+out:
+ mutex_unlock(&dqm->lock);
+ return false;
+}
+
+struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
+{
+ struct device_queue_manager *dqm;
+
+ BUG_ON(!dev);
+
+ dqm = kzalloc(sizeof(struct device_queue_manager), GFP_KERNEL);
+ if (!dqm)
+ return NULL;
+
+ dqm->dev = dev;
+ switch (sched_policy) {
+ case KFD_SCHED_POLICY_HWS:
+ case KFD_SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION:
+ /* initialize dqm for cp scheduling */
+ dqm->create_queue = create_queue_cpsch;
+ dqm->initialize = initialize_cpsch;
+ dqm->start = start_cpsch;
+ dqm->stop = stop_cpsch;
+ dqm->destroy_queues = destroy_queues_cpsch;
+ dqm->execute_queues = execute_queues_cpsch;
+ dqm->destroy_queue = destroy_queue_cpsch;
+ dqm->update_queue = update_queue_nocpsch;
+ dqm->get_mqd_manager = get_mqd_manager_nocpsch;
+ dqm->register_process = register_process_nocpsch;
+ dqm->unregister_process = unregister_process_nocpsch;
+ dqm->uninitialize = uninitialize_nocpsch;
+ dqm->create_kernel_queue = create_kernel_queue_cpsch;
+ dqm->destroy_kernel_queue = destroy_kernel_queue_cpsch;
+ dqm->set_cache_memory_policy = set_cache_memory_policy;
+ break;
+ case KFD_SCHED_POLICY_NO_HWS:
+ /* initialize dqm for no cp scheduling */
+ dqm->start = start_nocpsch;
+ dqm->stop = stop_nocpsch;
+ dqm->create_queue = create_queue_nocpsch;
+ dqm->destroy_queue = destroy_queue_nocpsch;
+ dqm->update_queue = update_queue_nocpsch;
+ dqm->destroy_queues = destroy_queues_nocpsch;
+ dqm->get_mqd_manager = get_mqd_manager_nocpsch;
+ dqm->execute_queues = execute_queues_nocpsch;
+ dqm->register_process = register_process_nocpsch;
+ dqm->unregister_process = unregister_process_nocpsch;
+ dqm->initialize = initialize_nocpsch;
+ dqm->uninitialize = uninitialize_nocpsch;
+ dqm->set_cache_memory_policy = set_cache_memory_policy;
+ break;
+ default:
+ BUG();
+ break;
+ }
+
+ if (dqm->initialize(dqm) != 0) {
+ kfree(dqm);
+ return NULL;
+ }
+
+ return dqm;
+}
+
+void device_queue_manager_uninit(struct device_queue_manager *dqm)
+{
+ BUG_ON(!dqm);
+
+ dqm->uninitialize(dqm);
+ kfree(dqm);
+}
+
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
index c444b38..9815ead 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
@@ -332,6 +332,7 @@ struct kfd_process {
struct kfd_process *kfd_create_process(const struct task_struct *);
struct kfd_process *kfd_get_process(const struct task_struct *);
+struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev, struct kfd_process *p);
void kfd_unbind_process_from_device(struct kfd_dev *dev, pasid_t pasid);
struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
struct kfd_process *p);
@@ -390,6 +391,9 @@ void uninit_queue(struct queue *q);
void print_queue_properties(struct queue_properties *q);
void print_queue(struct queue *q);
+struct mqd_manager *mqd_manager_init(enum KFD_MQD_TYPE type, struct kfd_dev *dev);
+struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev);
+void device_queue_manager_uninit(struct device_queue_manager *dqm);
struct kernel_queue *kernel_queue_init(struct kfd_dev *dev, enum kfd_queue_type type);
void kernel_queue_uninit(struct kernel_queue *kq);
@@ -408,6 +412,8 @@ int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid);
#define KFD_HIQ_TIMEOUT (500)
+#define KFD_FENCE_COMPLETED (100)
+#define KFD_FENCE_INIT (10)
#define KFD_UNMAP_LATENCY (15)
struct packet_manager {
@@ -418,6 +424,13 @@ struct packet_manager {
kfd_mem_obj ib_buffer_obj;
};
+int pm_init(struct packet_manager *pm, struct device_queue_manager *dqm);
+void pm_uninit(struct packet_manager *pm);
+int pm_send_set_resources(struct packet_manager *pm, struct scheduling_resources *res);
+int pm_send_runlist(struct packet_manager *pm, struct list_head *dqm_queues);
+int pm_send_query_status(struct packet_manager *pm, uint64_t fence_address, uint32_t fence_value);
+int pm_send_unmap_queue(struct packet_manager *pm, enum kfd_queue_type type,
+ enum kfd_preempt_type_filter mode, uint32_t filter_param, bool reset);
void pm_release_ib(struct packet_manager *pm);
#endif
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* [PATCH v2 20/25] amdkfd: Add interrupt handling module
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
` (15 preceding siblings ...)
2014-07-17 13:29 ` [PATCH v2 19/25] amdkfd: Add device " Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-17 13:29 ` [PATCH v2 21/25] amdkfd: Implement the create/destroy/update queue IOCTLs Oded Gabbay
` (5 subsequent siblings)
22 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
From: Andrew Lewycky <Andrew.Lewycky@amd.com>
This patch adds the interrupt handling module, in kfd_interrupt.c, and its related members in different data structures to the amdkfd driver.
The amdkfd interrupt module maintains an internal interrupt ring per amdkfd device. The internal interrupt ring contains interrupts that need further handling. The extra handling is deferred to a later time through a workqueue.
There's no acknowledgment for the interrupts we use. The hardware simply queues a new interrupt each time without waiting.
The fixed-size internal queue means that it's possible for us to lose interrupts because we have no back-pressure to the hardware.
Signed-off-by: Andrew Lewycky <Andrew.Lewycky@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/amdkfd/Makefile | 3 +-
drivers/gpu/drm/radeon/amdkfd/kfd_device.c | 16 ++-
drivers/gpu/drm/radeon/amdkfd/kfd_interrupt.c | 161 ++++++++++++++++++++++++++
drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 18 ++-
4 files changed, 193 insertions(+), 5 deletions(-)
create mode 100644 drivers/gpu/drm/radeon/amdkfd/kfd_interrupt.c
diff --git a/drivers/gpu/drm/radeon/amdkfd/Makefile b/drivers/gpu/drm/radeon/amdkfd/Makefile
index 44639f2..e634681 100644
--- a/drivers/gpu/drm/radeon/amdkfd/Makefile
+++ b/drivers/gpu/drm/radeon/amdkfd/Makefile
@@ -8,6 +8,7 @@ amdkfd-y := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
kfd_pasid.o kfd_doorbell.o kfd_vidmem.o kfd_aperture.o \
kfd_process.o kfd_queue.o kfd_mqd_manager.o \
kfd_kernel_queue.o kfd_packet_manager.o \
- kfd_process_queue_manager.o kfd_device_queue_manager.o
+ kfd_process_queue_manager.o kfd_device_queue_manager.o \
+ kfd_interrupt.o
obj-$(CONFIG_HSA_RADEON) += amdkfd.o
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
index f5e9f39..6a7a8b2 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
@@ -29,6 +29,7 @@
static const struct kfd_device_info kaveri_device_info = {
.max_pasid_bits = 16,
+ .ih_ring_entry_size = 4 * sizeof(uint32_t)
};
struct kfd_deviceid {
@@ -156,6 +157,9 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
kfd_doorbell_init(kfd);
+ if (kfd_interrupt_init(kfd))
+ return false;
+
if (!device_iommu_pasid_init(kfd))
return false;
@@ -195,6 +199,8 @@ void kgd2kfd_device_exit(struct kfd_dev *kfd)
BUG_ON(err != 0);
+ kfd_interrupt_exit(kfd);
+
if (kfd->init_complete) {
device_queue_manager_uninit(kfd->dqm);
amd_iommu_free_device(kfd->pdev);
@@ -233,6 +239,14 @@ int kgd2kfd_resume(struct kfd_dev *kfd)
return 0;
}
-void kgd2kfd_interrupt(struct kfd_dev *dev, const void *ih_ring_entry)
+/* This is called directly from KGD at ISR. */
+void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
{
+ spin_lock(&kfd->interrupt_lock);
+
+ if (kfd->interrupts_active
+ && enqueue_ih_ring_entry(kfd, ih_ring_entry))
+ schedule_work(&kfd->interrupt_work);
+
+ spin_unlock(&kfd->interrupt_lock);
}
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_interrupt.c b/drivers/gpu/drm/radeon/amdkfd/kfd_interrupt.c
new file mode 100644
index 0000000..eed43a7
--- /dev/null
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_interrupt.c
@@ -0,0 +1,161 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/*
+ * KFD Interrupts.
+ *
+ * AMD GPUs deliver interrupts by pushing an interrupt description onto the
+ * interrupt ring and then sending an interrupt. KGD receives the interrupt
+ * in ISR and sends us a pointer to each new entry on the interrupt ring.
+ *
+ * We generally can't process interrupt-signaled events from ISR, so we call
+ * out to each interrupt client module (currently only the scheduler) to ask if
+ * each interrupt is interesting. If they return true, then it requires further
+ * processing so we copy it to an internal interrupt ring and call each
+ * interrupt client again from a work-queue.
+ *
+ * There's no acknowledgment for the interrupts we use. The hardware simply
+ * queues a new interrupt each time without waiting.
+ *
+ * The fixed-size internal queue means that it's possible for us to lose
+ * interrupts because we have no back-pressure to the hardware.
+ */
+
+#include <linux/slab.h>
+#include <linux/device.h>
+#include "kfd_priv.h"
+
+#define KFD_INTERRUPT_RING_SIZE 256
+
+static void interrupt_wq(struct work_struct *);
+
+int kfd_interrupt_init(struct kfd_dev *kfd)
+{
+ void *interrupt_ring = kmalloc_array(KFD_INTERRUPT_RING_SIZE,
+ kfd->device_info->ih_ring_entry_size,
+ GFP_KERNEL);
+ if (!interrupt_ring)
+ return -ENOMEM;
+
+ kfd->interrupt_ring = interrupt_ring;
+ kfd->interrupt_ring_size =
+ KFD_INTERRUPT_RING_SIZE * kfd->device_info->ih_ring_entry_size;
+ atomic_set(&kfd->interrupt_ring_wptr, 0);
+ atomic_set(&kfd->interrupt_ring_rptr, 0);
+
+ spin_lock_init(&kfd->interrupt_lock);
+
+ INIT_WORK(&kfd->interrupt_work, interrupt_wq);
+
+ kfd->interrupts_active = true;
+
+ /*
+ * After this function returns, the interrupt will be enabled. This
+ * barrier ensures that the interrupt running on a different processor
+ * sees all the above writes.
+ */
+ smp_wmb();
+
+ return 0;
+}
+
+void kfd_interrupt_exit(struct kfd_dev *kfd)
+{
+ /*
+ * Stop the interrupt handler from writing to the ring and scheduling
+ * workqueue items. The spinlock ensures that any interrupt running
+ * after we have unlocked sees interrupts_active = false.
+ */
+ unsigned long flags;
+
+ spin_lock_irqsave(&kfd->interrupt_lock, flags);
+ kfd->interrupts_active = false;
+ spin_unlock_irqrestore(&kfd->interrupt_lock, flags);
+
+ /*
+ * Flush_scheduled_work ensures that there are no outstanding work-queue
+ * items that will access interrupt_ring. New work items can't be
+ * created because we stopped interrupt handling above.
+ */
+ flush_scheduled_work();
+
+ kfree(kfd->interrupt_ring);
+}
+
+/*
+ * This assumes that it can't be called concurrently with itself
+ * but only with dequeue_ih_ring_entry.
+ */
+bool enqueue_ih_ring_entry(struct kfd_dev *kfd, const void *ih_ring_entry)
+{
+ unsigned int rptr = atomic_read(&kfd->interrupt_ring_rptr);
+ unsigned int wptr = atomic_read(&kfd->interrupt_ring_wptr);
+
+ if ((rptr - wptr) % kfd->interrupt_ring_size == kfd->device_info->ih_ring_entry_size) {
+ /* This is very bad, the system is likely to hang. */
+ dev_err_ratelimited(kfd_chardev(),
+ "Interrupt ring overflow, dropping interrupt.\n");
+ return false;
+ }
+
+ memcpy(kfd->interrupt_ring + wptr, ih_ring_entry, kfd->device_info->ih_ring_entry_size);
+ wptr = (wptr + kfd->device_info->ih_ring_entry_size) % kfd->interrupt_ring_size;
+ smp_wmb(); /* Ensure memcpy'd data is visible before wptr update. */
+ atomic_set(&kfd->interrupt_ring_wptr, wptr);
+
+ return true;
+}
+
+/*
+ * This assumes that it can't be called concurrently with itself
+ * but only with enqueue_ih_ring_entry.
+ */
+static bool dequeue_ih_ring_entry(struct kfd_dev *kfd, void *ih_ring_entry)
+{
+ /*
+ * Assume that wait queues have an implicit barrier, i.e. anything that
+ * happened in the ISR before it queued work is visible.
+ */
+
+ unsigned int wptr = atomic_read(&kfd->interrupt_ring_wptr);
+ unsigned int rptr = atomic_read(&kfd->interrupt_ring_rptr);
+
+ if (rptr == wptr)
+ return false;
+
+ memcpy(ih_ring_entry, kfd->interrupt_ring + rptr, kfd->device_info->ih_ring_entry_size);
+ rptr = (rptr + kfd->device_info->ih_ring_entry_size) % kfd->interrupt_ring_size;
+ smp_mb(); /* Ensure the rptr write update is not visible until memcpy has finished reading. */
+ atomic_set(&kfd->interrupt_ring_rptr, rptr);
+
+ return true;
+}
+
+static void interrupt_wq(struct work_struct *work)
+{
+ struct kfd_dev *dev = container_of(work, struct kfd_dev, interrupt_work);
+
+ uint32_t ih_ring_entry[DIV_ROUND_UP(dev->device_info->ih_ring_entry_size, sizeof(uint32_t))];
+
+ while (dequeue_ih_ring_entry(dev, ih_ring_entry))
+ ;
+}
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
index 9815ead..8a1de68 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
@@ -120,6 +120,15 @@ struct kfd_dev {
struct kgd2kfd_shared_resources shared_resources;
+ /* Interrupts of interest to KFD are copied from the HW ring into a SW ring. */
+ bool interrupts_active;
+ void *interrupt_ring;
+ size_t interrupt_ring_size;
+ atomic_t interrupt_ring_rptr;
+ atomic_t interrupt_ring_wptr;
+ struct work_struct interrupt_work;
+ spinlock_t interrupt_lock;
+
/* QCM Device instance */
struct device_queue_manager *dqm;
};
@@ -373,11 +382,14 @@ struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev);
struct kfd_dev *kfd_topology_enum_kfd_devices(uint8_t idx);
/* Interrupts */
-void kgd2kfd_interrupt(struct kfd_dev *dev, const void *ih_ring_entry);
+int kfd_interrupt_init(struct kfd_dev *dev);
+void kfd_interrupt_exit(struct kfd_dev *dev);
+void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry);
+bool enqueue_ih_ring_entry(struct kfd_dev *kfd, const void *ih_ring_entry);
/* Power Management */
-void kgd2kfd_suspend(struct kfd_dev *dev);
-int kgd2kfd_resume(struct kfd_dev *dev);
+void kgd2kfd_suspend(struct kfd_dev *kfd);
+int kgd2kfd_resume(struct kfd_dev *kfd);
/* amdkfd Apertures */
int kfd_init_apertures(struct kfd_process *process);
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* [PATCH v2 21/25] amdkfd: Implement the create/destroy/update queue IOCTLs
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
` (16 preceding siblings ...)
2014-07-17 13:29 ` [PATCH v2 20/25] amdkfd: Add interrupt handling module Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-20 23:09 ` Jerome Glisse
2014-07-17 13:29 ` [PATCH v2 22/25] amdkfd: Implement the Set Memory Policy IOCTL Oded Gabbay
` (4 subsequent siblings)
22 siblings, 1 reply; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
From: Ben Goz <ben.goz@amd.com>
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c | 133 +++++++++++++++++++++++++++-
drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 8 ++
2 files changed, 138 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
index d6580a6..a74693a 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
@@ -119,17 +119,144 @@ static int kfd_open(struct inode *inode, struct file *filep)
static long kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p, void __user *arg)
{
- return -ENODEV;
+ struct kfd_ioctl_create_queue_args args;
+ struct kfd_dev *dev;
+ int err = 0;
+ unsigned int queue_id;
+ struct kfd_process_device *pdd;
+ struct queue_properties q_properties;
+
+ memset(&q_properties, 0, sizeof(struct queue_properties));
+
+ if (copy_from_user(&args, arg, sizeof(args)))
+ return -EFAULT;
+
+ if (!access_ok(VERIFY_WRITE, args.read_pointer_address, sizeof(qptr_t))) {
+ pr_err("kfd: can't access read pointer");
+ return -EFAULT;
+ }
+
+ if (!access_ok(VERIFY_WRITE, args.write_pointer_address, sizeof(qptr_t))) {
+ pr_err("kfd: can't access write pointer");
+ return -EFAULT;
+ }
+
+ q_properties.is_interop = false;
+ q_properties.queue_percent = args.queue_percentage;
+ q_properties.priority = args.queue_priority;
+ q_properties.queue_address = args.ring_base_address;
+ q_properties.queue_size = args.ring_size;
+ q_properties.read_ptr = (qptr_t *) args.read_pointer_address;
+ q_properties.write_ptr = (qptr_t *) args.write_pointer_address;
+
+
+ pr_debug("%s Arguments: Queue Percentage (%d, %d)\n"
+ "Queue Priority (%d, %d)\n"
+ "Queue Address (0x%llX, 0x%llX)\n"
+ "Queue Size (0x%llX, %u)\n"
+ "Queue r/w Pointers (0x%llX, 0x%llX)\n",
+ __func__,
+ q_properties.queue_percent, args.queue_percentage,
+ q_properties.priority, args.queue_priority,
+ q_properties.queue_address, args.ring_base_address,
+ q_properties.queue_size, args.ring_size,
+ (uint64_t) q_properties.read_ptr,
+ (uint64_t) q_properties.write_ptr);
+
+ dev = kfd_device_by_id(args.gpu_id);
+ if (dev == NULL)
+ return -EINVAL;
+
+ mutex_lock(&p->mutex);
+
+ pdd = kfd_bind_process_to_device(dev, p);
+ if (IS_ERR(pdd) < 0) {
+ err = PTR_ERR(pdd);
+ goto err_bind_process;
+ }
+
+ pr_debug("kfd: creating queue for PASID %d on GPU 0x%x\n",
+ p->pasid,
+ dev->id);
+
+ err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, 0, KFD_QUEUE_TYPE_COMPUTE, &queue_id);
+ if (err != 0)
+ goto err_create_queue;
+
+ args.queue_id = queue_id;
+ args.doorbell_address = (uint64_t)q_properties.doorbell_ptr;
+
+ if (copy_to_user(arg, &args, sizeof(args))) {
+ err = -EFAULT;
+ goto err_copy_args_out;
+ }
+
+ mutex_unlock(&p->mutex);
+
+ pr_debug("kfd: queue id %d was created successfully.\n"
+ " ring buffer address == 0x%016llX\n"
+ " read ptr address == 0x%016llX\n"
+ " write ptr address == 0x%016llX\n"
+ " doorbell address == 0x%016llX\n",
+ args.queue_id,
+ args.ring_base_address,
+ args.read_pointer_address,
+ args.write_pointer_address,
+ args.doorbell_address);
+
+ return 0;
+
+err_copy_args_out:
+ pqm_destroy_queue(&p->pqm, queue_id);
+err_create_queue:
+err_bind_process:
+ mutex_unlock(&p->mutex);
+ return err;
}
static int kfd_ioctl_destroy_queue(struct file *filp, struct kfd_process *p, void __user *arg)
{
- return -ENODEV;
+ int retval;
+ struct kfd_ioctl_destroy_queue_args args;
+
+ if (copy_from_user(&args, arg, sizeof(args)))
+ return -EFAULT;
+
+ pr_debug("kfd: destroying queue id %d for PASID %d\n",
+ args.queue_id,
+ p->pasid);
+
+ mutex_lock(&p->mutex);
+
+ retval = pqm_destroy_queue(&p->pqm, args.queue_id);
+
+ mutex_unlock(&p->mutex);
+ return retval;
}
static int kfd_ioctl_update_queue(struct file *filp, struct kfd_process *p, void __user *arg)
{
- return -ENODEV;
+ int retval;
+ struct kfd_ioctl_update_queue_args args;
+ struct queue_properties properties;
+
+ if (copy_from_user(&args, arg, sizeof(args)))
+ return -EFAULT;
+
+ properties.queue_address = args.ring_base_address;
+ properties.queue_size = args.ring_size;
+ properties.queue_percent = args.queue_percentage;
+ properties.priority = args.queue_priority;
+
+ pr_debug("kfd: updating queue id %d for PASID %d\n", args.queue_id, p->pasid);
+
+ mutex_lock(&p->mutex);
+
+ retval = pqm_update_queue(&p->pqm, args.queue_id, &properties);
+
+ mutex_unlock(&p->mutex);
+
+ return retval;
}
static long kfd_ioctl_set_memory_policy(struct file *filep, struct kfd_process *p, void __user *arg)
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
index 8a1de68..7ea0e81 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
@@ -418,7 +418,15 @@ struct process_queue_node {
int pqm_init(struct process_queue_manager *pqm, struct kfd_process *p);
void pqm_uninit(struct process_queue_manager *pqm);
+int pqm_create_queue(struct process_queue_manager *pqm,
+ struct kfd_dev *dev,
+ struct file *f,
+ struct queue_properties *properties,
+ unsigned int flags,
+ enum kfd_queue_type type,
+ unsigned int *qid);
int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid);
+int pqm_update_queue(struct process_queue_manager *pqm, unsigned int qid, struct queue_properties *p);
/* Packet Manager */
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* Re: [PATCH v2 21/25] amdkfd: Implement the create/destroy/update queue IOCTLs
2014-07-17 13:29 ` [PATCH v2 21/25] amdkfd: Implement the create/destroy/update queue IOCTLs Oded Gabbay
@ 2014-07-20 23:09 ` Jerome Glisse
2014-07-27 10:15 ` Oded Gabbay
0 siblings, 1 reply; 46+ messages in thread
From: Jerome Glisse @ 2014-07-20 23:09 UTC (permalink / raw)
To: Oded Gabbay
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, dri-devel,
Evgeny Pinchuk, Alexey Skidanov, Alex Deucher, Andrew Morton,
Christian König
On Thu, Jul 17, 2014 at 04:29:28PM +0300, Oded Gabbay wrote:
> From: Ben Goz <ben.goz@amd.com>
>
> Signed-off-by: Ben Goz <ben.goz@amd.com>
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> ---
> drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c | 133 +++++++++++++++++++++++++++-
> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 8 ++
> 2 files changed, 138 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
> index d6580a6..a74693a 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
> @@ -119,17 +119,144 @@ static int kfd_open(struct inode *inode, struct file *filep)
>
> static long kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p, void __user *arg)
> {
> - return -ENODEV;
> + struct kfd_ioctl_create_queue_args args;
> + struct kfd_dev *dev;
> + int err = 0;
> + unsigned int queue_id;
> + struct kfd_process_device *pdd;
> + struct queue_properties q_properties;
> +
> + memset(&q_properties, 0, sizeof(struct queue_properties));
> +
> + if (copy_from_user(&args, arg, sizeof(args)))
> + return -EFAULT;
> +
> + if (!access_ok(VERIFY_WRITE, args.read_pointer_address, sizeof(qptr_t))) {
> + pr_err("kfd: can't access read pointer");
> + return -EFAULT;
> + }
> +
> + if (!access_ok(VERIFY_WRITE, args.write_pointer_address, sizeof(qptr_t))) {
> + pr_err("kfd: can't access write pointer");
> + return -EFAULT;
> + }
> +
> + q_properties.is_interop = false;
> + q_properties.queue_percent = args.queue_percentage;
> + q_properties.priority = args.queue_priority;
> + q_properties.queue_address = args.ring_base_address;
> + q_properties.queue_size = args.ring_size;
> + q_properties.read_ptr = (qptr_t *) args.read_pointer_address;
> + q_properties.write_ptr = (qptr_t *) args.write_pointer_address;
> +
So there is still no sanity check on any of the arguments, especially the queue_size.
I might have missed it; if so, I think it really should be here inside the ioctl
function, as it is simpler to find.
> +
> + pr_debug("%s Arguments: Queue Percentage (%d, %d)\n"
> + "Queue Priority (%d, %d)\n"
> + "Queue Address (0x%llX, 0x%llX)\n"
> + "Queue Size (0x%llX, %u)\n"
> + "Queue r/w Pointers (0x%llX, 0x%llX)\n",
> + __func__,
> + q_properties.queue_percent, args.queue_percentage,
> + q_properties.priority, args.queue_priority,
> + q_properties.queue_address, args.ring_base_address,
> + q_properties.queue_size, args.ring_size,
> + (uint64_t) q_properties.read_ptr,
> + (uint64_t) q_properties.write_ptr);
One pr_debug call per line.
> +
> + dev = kfd_device_by_id(args.gpu_id);
> + if (dev == NULL)
> + return -EINVAL;
> +
> + mutex_lock(&p->mutex);
> +
> + pdd = kfd_bind_process_to_device(dev, p);
> + if (IS_ERR(pdd) < 0) {
> + err = PTR_ERR(pdd);
> + goto err_bind_process;
> + }
> +
> + pr_debug("kfd: creating queue for PASID %d on GPU 0x%x\n",
> + p->pasid,
> + dev->id);
> +
> + err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, 0, KFD_QUEUE_TYPE_COMPUTE, &queue_id);
> + if (err != 0)
> + goto err_create_queue;
> +
> + args.queue_id = queue_id;
> + args.doorbell_address = (uint64_t)q_properties.doorbell_ptr;
> +
> + if (copy_to_user(arg, &args, sizeof(args))) {
> + err = -EFAULT;
> + goto err_copy_args_out;
> + }
> +
> + mutex_unlock(&p->mutex);
> +
> + pr_debug("kfd: queue id %d was created successfully.\n"
> + " ring buffer address == 0x%016llX\n"
> + " read ptr address == 0x%016llX\n"
> + " write ptr address == 0x%016llX\n"
> + " doorbell address == 0x%016llX\n",
> + args.queue_id,
> + args.ring_base_address,
> + args.read_pointer_address,
> + args.write_pointer_address,
> + args.doorbell_address);
> +
Ditto
> + return 0;
> +
> +err_copy_args_out:
> + pqm_destroy_queue(&p->pqm, queue_id);
> +err_create_queue:
> +err_bind_process:
> + mutex_unlock(&p->mutex);
> + return err;
> }
>
> static int kfd_ioctl_destroy_queue(struct file *filp, struct kfd_process *p, void __user *arg)
> {
> - return -ENODEV;
> + int retval;
> + struct kfd_ioctl_destroy_queue_args args;
> +
> + if (copy_from_user(&args, arg, sizeof(args)))
> + return -EFAULT;
> +
> + pr_debug("kfd: destroying queue id %d for PASID %d\n",
> + args.queue_id,
> + p->pasid);
> +
> + mutex_lock(&p->mutex);
> +
> + retval = pqm_destroy_queue(&p->pqm, args.queue_id);
> +
> + mutex_unlock(&p->mutex);
> + return retval;
> }
>
> static int kfd_ioctl_update_queue(struct file *filp, struct kfd_process *p, void __user *arg)
> {
> - return -ENODEV;
> + int retval;
> + struct kfd_ioctl_update_queue_args args;
> + struct queue_properties properties;
> +
> + if (copy_from_user(&args, arg, sizeof(args)))
> + return -EFAULT;
> +
> + properties.queue_address = args.ring_base_address;
> + properties.queue_size = args.ring_size;
> + properties.queue_percent = args.queue_percentage;
> + properties.priority = args.queue_priority;
> +
Would need sanity check on argument.
> + pr_debug("kfd: updating queue id %d for PASID %d\n", args.queue_id, p->pasid);
> +
> + mutex_lock(&p->mutex);
> +
> + retval = pqm_update_queue(&p->pqm, args.queue_id, &properties);
> +
> + mutex_unlock(&p->mutex);
> +
> + return retval;
> }
>
> static long kfd_ioctl_set_memory_policy(struct file *filep, struct kfd_process *p, void __user *arg)
> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> index 8a1de68..7ea0e81 100644
> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
> @@ -418,7 +418,15 @@ struct process_queue_node {
>
> int pqm_init(struct process_queue_manager *pqm, struct kfd_process *p);
> void pqm_uninit(struct process_queue_manager *pqm);
> +int pqm_create_queue(struct process_queue_manager *pqm,
> + struct kfd_dev *dev,
> + struct file *f,
> + struct queue_properties *properties,
> + unsigned int flags,
> + enum kfd_queue_type type,
> + unsigned int *qid);
> int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid);
> +int pqm_update_queue(struct process_queue_manager *pqm, unsigned int qid, struct queue_properties *p);
>
> /* Packet Manager */
>
> --
> 1.9.1
>
^ permalink raw reply [flat|nested] 46+ messages in thread* Re: [PATCH v2 21/25] amdkfd: Implement the create/destroy/update queue IOCTLs
2014-07-20 23:09 ` Jerome Glisse
@ 2014-07-27 10:15 ` Oded Gabbay
0 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-07-27 10:15 UTC (permalink / raw)
To: Jerome Glisse
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, dri-devel,
Alex Deucher, Alexey Skidanov, Andrew Morton,
Christian König
On 21/07/14 02:09, Jerome Glisse wrote:
> On Thu, Jul 17, 2014 at 04:29:28PM +0300, Oded Gabbay wrote:
>> From: Ben Goz <ben.goz@amd.com>
>>
>> Signed-off-by: Ben Goz <ben.goz@amd.com>
>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>> ---
>> drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c | 133 +++++++++++++++++++++++++++-
>> drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 8 ++
>> 2 files changed, 138 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
>> index d6580a6..a74693a 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
>> @@ -119,17 +119,144 @@ static int kfd_open(struct inode *inode, struct file *filep)
>>
>> static long kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p, void __user *arg)
>> {
>> - return -ENODEV;
>> + struct kfd_ioctl_create_queue_args args;
>> + struct kfd_dev *dev;
>> + int err = 0;
>> + unsigned int queue_id;
>> + struct kfd_process_device *pdd;
>> + struct queue_properties q_properties;
>> +
>> + memset(&q_properties, 0, sizeof(struct queue_properties));
>> +
>> + if (copy_from_user(&args, arg, sizeof(args)))
>> + return -EFAULT;
>> +
>> + if (!access_ok(VERIFY_WRITE, args.read_pointer_address, sizeof(qptr_t))) {
>> + pr_err("kfd: can't access read pointer");
>> + return -EFAULT;
>> + }
>> +
>> + if (!access_ok(VERIFY_WRITE, args.write_pointer_address, sizeof(qptr_t))) {
>> + pr_err("kfd: can't access write pointer");
>> + return -EFAULT;
>> + }
>> +
>> + q_properties.is_interop = false;
>> + q_properties.queue_percent = args.queue_percentage;
>> + q_properties.priority = args.queue_priority;
>> + q_properties.queue_address = args.ring_base_address;
>> + q_properties.queue_size = args.ring_size;
>> + q_properties.read_ptr = (qptr_t *) args.read_pointer_address;
>> + q_properties.write_ptr = (qptr_t *) args.write_pointer_address;
>> +
>
> So there is still no sanity check on any of the arguments, especially the queue_size.
> I might have missed it; if so, I think it really should be here inside the ioctl
> function, as it is simpler to find.
>
Fixed in v3.
>> +
>> + pr_debug("%s Arguments: Queue Percentage (%d, %d)\n"
>> + "Queue Priority (%d, %d)\n"
>> + "Queue Address (0x%llX, 0x%llX)\n"
>> + "Queue Size (0x%llX, %u)\n"
>> + "Queue r/w Pointers (0x%llX, 0x%llX)\n",
>> + __func__,
>> + q_properties.queue_percent, args.queue_percentage,
>> + q_properties.priority, args.queue_priority,
>> + q_properties.queue_address, args.ring_base_address,
>> + q_properties.queue_size, args.ring_size,
>> + (uint64_t) q_properties.read_ptr,
>> + (uint64_t) q_properties.write_ptr);
>
> One pr_debug call per line.
>
Fixed in v3.
>> +
>> + dev = kfd_device_by_id(args.gpu_id);
>> + if (dev == NULL)
>> + return -EINVAL;
>> +
>> + mutex_lock(&p->mutex);
>> +
>> + pdd = kfd_bind_process_to_device(dev, p);
>> + if (IS_ERR(pdd) < 0) {
>> + err = PTR_ERR(pdd);
>> + goto err_bind_process;
>> + }
>> +
>> + pr_debug("kfd: creating queue for PASID %d on GPU 0x%x\n",
>> + p->pasid,
>> + dev->id);
>> +
>> + err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, 0, KFD_QUEUE_TYPE_COMPUTE, &queue_id);
>> + if (err != 0)
>> + goto err_create_queue;
>> +
>> + args.queue_id = queue_id;
>> + args.doorbell_address = (uint64_t)q_properties.doorbell_ptr;
>> +
>> + if (copy_to_user(arg, &args, sizeof(args))) {
>> + err = -EFAULT;
>> + goto err_copy_args_out;
>> + }
>> +
>> + mutex_unlock(&p->mutex);
>> +
>> + pr_debug("kfd: queue id %d was created successfully.\n"
>> + " ring buffer address == 0x%016llX\n"
>> + " read ptr address == 0x%016llX\n"
>> + " write ptr address == 0x%016llX\n"
>> + " doorbell address == 0x%016llX\n",
>> + args.queue_id,
>> + args.ring_base_address,
>> + args.read_pointer_address,
>> + args.write_pointer_address,
>> + args.doorbell_address);
>> +
>
> Ditto
Fixed in v3.
>
>> + return 0;
>> +
>> +err_copy_args_out:
>> + pqm_destroy_queue(&p->pqm, queue_id);
>> +err_create_queue:
>> +err_bind_process:
>> + mutex_unlock(&p->mutex);
>> + return err;
>> }
>>
>> static int kfd_ioctl_destroy_queue(struct file *filp, struct kfd_process *p, void __user *arg)
>> {
>> - return -ENODEV;
>> + int retval;
>> + struct kfd_ioctl_destroy_queue_args args;
>> +
>> + if (copy_from_user(&args, arg, sizeof(args)))
>> + return -EFAULT;
>> +
>> + pr_debug("kfd: destroying queue id %d for PASID %d\n",
>> + args.queue_id,
>> + p->pasid);
>> +
>> + mutex_lock(&p->mutex);
>> +
>> + retval = pqm_destroy_queue(&p->pqm, args.queue_id);
>> +
>> + mutex_unlock(&p->mutex);
>> + return retval;
>> }
>>
>> static int kfd_ioctl_update_queue(struct file *filp, struct kfd_process *p, void __user *arg)
>> {
>> - return -ENODEV;
>> + int retval;
>> + struct kfd_ioctl_update_queue_args args;
>> + struct queue_properties properties;
>> +
>> + if (copy_from_user(&args, arg, sizeof(args)))
>> + return -EFAULT;
>> +
>> + properties.queue_address = args.ring_base_address;
>> + properties.queue_size = args.ring_size;
>> + properties.queue_percent = args.queue_percentage;
>> + properties.priority = args.queue_priority;
>> +
>
> Would need sanity check on argument.
fixed in v3.
Oded
>
>> + pr_debug("kfd: updating queue id %d for PASID %d\n", args.queue_id, p->pasid);
>> +
>> + mutex_lock(&p->mutex);
>> +
>> + retval = pqm_update_queue(&p->pqm, args.queue_id, &properties);
>> +
>> + mutex_unlock(&p->mutex);
>> +
>> + return retval;
>> }
>>
>> static long kfd_ioctl_set_memory_policy(struct file *filep, struct kfd_process *p, void __user *arg)
>> diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> index 8a1de68..7ea0e81 100644
>> --- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> +++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
>> @@ -418,7 +418,15 @@ struct process_queue_node {
>>
>> int pqm_init(struct process_queue_manager *pqm, struct kfd_process *p);
>> void pqm_uninit(struct process_queue_manager *pqm);
>> +int pqm_create_queue(struct process_queue_manager *pqm,
>> + struct kfd_dev *dev,
>> + struct file *f,
>> + struct queue_properties *properties,
>> + unsigned int flags,
>> + enum kfd_queue_type type,
>> + unsigned int *qid);
>> int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid);
>> +int pqm_update_queue(struct process_queue_manager *pqm, unsigned int qid, struct queue_properties *p);
>>
>> /* Packet Manager */
>>
>> --
>> 1.9.1
>>
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v2 22/25] amdkfd: Implement the Set Memory Policy IOCTL
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
` (17 preceding siblings ...)
2014-07-17 13:29 ` [PATCH v2 21/25] amdkfd: Implement the create/destroy/update queue IOCTLs Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-17 13:29 ` [PATCH v2 23/25] amdkfd: Implement the Get Clock Counters IOCTL Oded Gabbay
` (3 subsequent siblings)
22 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
From: Andrew Lewycky <Andrew.Lewycky@amd.com>
Signed-off-by: Andrew Lewycky <Andrew.Lewycky@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c | 51 ++++++++++++++++++++++++++++-
1 file changed, 50 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
index a74693a..085bd91 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
@@ -35,6 +35,7 @@
#include <uapi/asm-generic/mman-common.h>
#include <asm/processor.h>
#include "kfd_priv.h"
+#include "kfd_device_queue_manager.h"
static long kfd_ioctl(struct file *, unsigned int, unsigned long);
static int kfd_open(struct inode *, struct file *);
@@ -261,7 +262,55 @@ static int kfd_ioctl_update_queue(struct file *filp, struct kfd_process *p, void
static long kfd_ioctl_set_memory_policy(struct file *filep, struct kfd_process *p, void __user *arg)
{
- return -ENODEV;
+ struct kfd_ioctl_set_memory_policy_args args;
+ struct kfd_dev *dev;
+ int err = 0;
+ struct kfd_process_device *pdd;
+ enum cache_policy default_policy, alternate_policy;
+
+ if (copy_from_user(&args, arg, sizeof(args)))
+ return -EFAULT;
+
+ if (args.default_policy != KFD_IOC_CACHE_POLICY_COHERENT
+ && args.default_policy != KFD_IOC_CACHE_POLICY_NONCOHERENT) {
+ return -EINVAL;
+ }
+
+ if (args.alternate_policy != KFD_IOC_CACHE_POLICY_COHERENT
+ && args.alternate_policy != KFD_IOC_CACHE_POLICY_NONCOHERENT) {
+ return -EINVAL;
+ }
+
+ dev = kfd_device_by_id(args.gpu_id);
+ if (dev == NULL)
+ return -EINVAL;
+
+ mutex_lock(&p->mutex);
+
+ pdd = kfd_bind_process_to_device(dev, p);
+ if (IS_ERR(pdd) < 0) {
+ err = PTR_ERR(pdd);
+ goto out;
+ }
+
+ default_policy = (args.default_policy == KFD_IOC_CACHE_POLICY_COHERENT)
+ ? cache_policy_coherent : cache_policy_noncoherent;
+
+ alternate_policy = (args.alternate_policy == KFD_IOC_CACHE_POLICY_COHERENT)
+ ? cache_policy_coherent : cache_policy_noncoherent;
+
+ if (!dev->dqm->set_cache_memory_policy(dev->dqm,
+ &pdd->qpd,
+ default_policy,
+ alternate_policy,
+ (void __user *)args.alternate_aperture_base,
+ args.alternate_aperture_size))
+ err = -EINVAL;
+
+out:
+ mutex_unlock(&p->mutex);
+
+ return err;
}
static long kfd_ioctl_get_clock_counters(struct file *filep, struct kfd_process *p, void __user *arg)
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* [PATCH v2 23/25] amdkfd: Implement the Get Clock Counters IOCTL
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
` (18 preceding siblings ...)
2014-07-17 13:29 ` [PATCH v2 22/25] amdkfd: Implement the Set Memory Policy IOCTL Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-17 13:29 ` [PATCH v2 24/25] amdkfd: Implement the Get Process Aperture IOCTL Oded Gabbay
` (2 subsequent siblings)
22 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
From: Evgeny Pinchuk <evgeny.pinchuk@amd.com>
Signed-off-by: Evgeny Pinchuk <evgeny.pinchuk@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c | 29 ++++++++++++++++++++++++++++-
1 file changed, 28 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
index 085bd91..72d8e79 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
@@ -315,7 +315,34 @@ out:
/*
 * Report the GPU, CPU and system clock counters for one device.
 * The GPU counter is read through the KGD interface; the CPU and
 * system counters are taken from the kernel's monotonic clocks, so
 * the advertised frequency is fixed at 1 GHz (nanosecond resolution).
 */
static long kfd_ioctl_get_clock_counters(struct file *filep, struct kfd_process *p, void __user *arg)
{
	struct kfd_ioctl_get_clock_counters_args args;
	struct kfd_dev *dev;
	struct timespec ts;

	if (copy_from_user(&args, arg, sizeof(args)))
		return -EFAULT;

	dev = kfd_device_by_id(args.gpu_id);
	if (!dev)
		return -EINVAL;

	/* GPU clock counter is fetched from KGD */
	args.gpu_clock_counter = kfd2kgd->get_gpu_clock_counter(dev->kgd);

	/* No access to rdtsc. Using raw monotonic time */
	getrawmonotonic(&ts);
	args.cpu_clock_counter = (uint64_t)timespec_to_ns(&ts);

	get_monotonic_boottime(&ts);
	args.system_clock_counter = (uint64_t)timespec_to_ns(&ts);

	/* Since the counter is in nano-seconds we use 1GHz frequency */
	args.system_clock_freq = 1000000000;

	return copy_to_user(arg, &args, sizeof(args)) ? -EFAULT : 0;
}
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* [PATCH v2 24/25] amdkfd: Implement the Get Process Aperture IOCTL
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
` (19 preceding siblings ...)
2014-07-17 13:29 ` [PATCH v2 23/25] amdkfd: Implement the Get Clock Counters IOCTL Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
2014-07-17 13:29 ` [PATCH v2 25/25] amdkfd: Implement the PMC Acquire/Release IOCTLs Oded Gabbay
[not found] ` <1405603773-32688-9-git-send-email-oded.gabbay@amd.com>
22 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
From: Alexey Skidanov <Alexey.Skidanov@amd.com>
Signed-off-by: Alexey Skidanov <Alexey.Skidanov@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c | 40 ++++++++++++++++++++++++++++-
drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 5 ++++
2 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
index 72d8e79..1e19504 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
@@ -348,7 +348,45 @@ static long kfd_ioctl_get_clock_counters(struct file *filep, struct kfd_process
/*
 * Fill the caller's args with one aperture record (LDS, GPUVM, scratch)
 * per device the process is bound to, up to NUM_OF_SUPPORTED_GPUS.
 * args.num_of_nodes is set to the number of records written.
 */
static int kfd_ioctl_get_process_apertures(struct file *filp, struct kfd_process *p, void __user *arg)
{
	struct kfd_ioctl_get_process_apertures_args args;
	struct kfd_process_device *pdd;

	dev_dbg(kfd_device, "get apertures for PASID %d", p->pasid);

	if (copy_from_user(&args, arg, sizeof(args)))
		return -EFAULT;

	args.num_of_nodes = 0;

	mutex_lock(&p->mutex);

	/* Walk the process-device list, one output node per device */
	if (kfd_has_process_device_data(p)) {
		for (pdd = kfd_get_first_process_device_data(p);
		     pdd != NULL && args.num_of_nodes < NUM_OF_SUPPORTED_GPUS;
		     pdd = kfd_get_next_process_device_data(p, pdd)) {

			args.process_apertures[args.num_of_nodes].gpu_id = pdd->dev->id;
			args.process_apertures[args.num_of_nodes].lds_base = pdd->lds_base;
			args.process_apertures[args.num_of_nodes].lds_limit = pdd->lds_limit;
			args.process_apertures[args.num_of_nodes].gpuvm_base = pdd->gpuvm_base;
			args.process_apertures[args.num_of_nodes].gpuvm_limit = pdd->gpuvm_limit;
			args.process_apertures[args.num_of_nodes].scratch_base = pdd->scratch_base;
			args.process_apertures[args.num_of_nodes].scratch_limit = pdd->scratch_limit;

			dev_dbg(kfd_device, "node id %u, gpu id %u, lds_base %llX lds_limit %llX gpuvm_base %llX gpuvm_limit %llX scratch_base %llX scratch_limit %llX",
				args.num_of_nodes, pdd->dev->id, pdd->lds_base, pdd->lds_limit, pdd->gpuvm_base, pdd->gpuvm_limit, pdd->scratch_base, pdd->scratch_limit);

			args.num_of_nodes++;
		}
	}

	mutex_unlock(&p->mutex);

	if (copy_to_user(arg, &args, sizeof(args)))
		return -EFAULT;

	return 0;
}
static long kfd_ioctl_pmc_acquire_access(struct file *filp, struct kfd_process *p, void __user *arg)
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
index 7ea0e81..1db1ede 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
@@ -346,6 +346,11 @@ void kfd_unbind_process_from_device(struct kfd_dev *dev, pasid_t pasid);
struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
struct kfd_process *p);
+/* Process device data iterator */
+struct kfd_process_device *kfd_get_first_process_device_data(struct kfd_process *p);
+struct kfd_process_device *kfd_get_next_process_device_data(struct kfd_process *p, struct kfd_process_device *pdd);
+bool kfd_has_process_device_data(struct kfd_process *p);
+
/* PASIDs */
int kfd_pasid_init(void);
void kfd_pasid_exit(void);
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread* [PATCH v2 25/25] amdkfd: Implement the PMC Acquire/Release IOCTLs
[not found] <1405603773-32688-1-git-send-email-oded.gabbay@amd.com>
` (20 preceding siblings ...)
2014-07-17 13:29 ` [PATCH v2 24/25] amdkfd: Implement the Get Process Aperture IOCTL Oded Gabbay
@ 2014-07-17 13:29 ` Oded Gabbay
[not found] ` <1405603773-32688-9-git-send-email-oded.gabbay@amd.com>
22 siblings, 0 replies; 46+ messages in thread
From: Oded Gabbay @ 2014-07-17 13:29 UTC (permalink / raw)
To: David Airlie, Jerome Glisse, Alex Deucher, Andrew Morton
Cc: Andrew Lewycky, Michel Dänzer, linux-kernel, Evgeny Pinchuk,
Alexey Skidanov, dri-devel, Alex Deucher, Christian König
From: Evgeny Pinchuk <evgeny.pinchuk@amd.com>
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c | 46 +++++++++++++++++++++++++++--
drivers/gpu/drm/radeon/amdkfd/kfd_device.c | 2 ++
drivers/gpu/drm/radeon/amdkfd/kfd_priv.h | 5 ++++
drivers/gpu/drm/radeon/amdkfd/kfd_process.c | 6 ++++
4 files changed, 57 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
index 1e19504..be90ab9 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_chardev.c
@@ -391,12 +391,54 @@ static int kfd_ioctl_get_process_apertures(struct file *filp, struct kfd_process
/*
 * Grant a process/trace pair exclusive access to the device's
 * performance counters. Succeeds when the device is free, or when the
 * same process re-requests access for the same trace id; otherwise the
 * counters are held by someone else and -EBUSY is returned.
 */
static long kfd_ioctl_pmc_acquire_access(struct file *filp, struct kfd_process *p, void __user *arg)
{
	struct kfd_ioctl_pmc_acquire_access_args args;
	struct kfd_dev *dev;
	int ret = -EBUSY;

	if (copy_from_user(&args, arg, sizeof(args)))
		return -EFAULT;

	dev = kfd_device_by_id(args.gpu_id);
	if (!dev)
		return -EINVAL;

	spin_lock(&dev->pmc_access_lock);
	if (dev->pmc_locking_process == NULL) {
		/* Counters are free: record the new owner */
		dev->pmc_locking_process = p;
		dev->pmc_locking_trace = args.trace_id;
		ret = 0;
	} else if (dev->pmc_locking_process == p &&
		   dev->pmc_locking_trace == args.trace_id) {
		/* Same trace already holds access; report success */
		ret = 0;
	}
	spin_unlock(&dev->pmc_access_lock);

	return ret;
}
/*
 * Release performance-counter access previously taken with
 * kfd_ioctl_pmc_acquire_access. Only the owning process with the
 * matching trace id may release; any other caller gets -EINVAL.
 */
static long kfd_ioctl_pmc_release_access(struct file *filp, struct kfd_process *p, void __user *arg)
{
	struct kfd_ioctl_pmc_release_access_args args;
	struct kfd_dev *dev;
	int ret = -EINVAL;

	if (copy_from_user(&args, arg, sizeof(args)))
		return -EFAULT;

	dev = kfd_device_by_id(args.gpu_id);
	if (!dev)
		return -EINVAL;

	spin_lock(&dev->pmc_access_lock);
	if (dev->pmc_locking_process == p &&
	    dev->pmc_locking_trace == args.trace_id) {
		/* Caller is the recorded owner: clear the reservation */
		dev->pmc_locking_process = NULL;
		dev->pmc_locking_trace = 0;
		ret = 0;
	}
	spin_unlock(&dev->pmc_access_lock);

	return ret;
}
static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
index 6a7a8b2..f1cbc46 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_device.c
@@ -184,6 +184,8 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
return false;
}
+ spin_lock_init(&kfd->pmc_access_lock);
+
kfd->init_complete = true;
dev_info(kfd_device, "added device (%x:%x)\n", kfd->pdev->vendor,
kfd->pdev->device);
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
index 1db1ede..a5356d1 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_priv.h
@@ -131,6 +131,11 @@ struct kfd_dev {
/* QCM Device instance */
struct device_queue_manager *dqm;
+
+ /* Performance counters exclusivity lock */
+ spinlock_t pmc_access_lock;
+ struct kfd_process *pmc_locking_process;
+ uint64_t pmc_locking_trace;
};
/* KGD2KFD callbacks */
diff --git a/drivers/gpu/drm/radeon/amdkfd/kfd_process.c b/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
index bcc004f..a67c239 100644
--- a/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/radeon/amdkfd/kfd_process.c
@@ -98,6 +98,12 @@ static void free_process(struct kfd_process *p)
BUG_ON(p == NULL);
list_for_each_entry_safe(pdd, temp, &p->per_device_data, per_device_list) {
+ spin_lock(&pdd->dev->pmc_access_lock);
+ if (pdd->dev->pmc_locking_process == p) {
+ pdd->dev->pmc_locking_process = NULL;
+ pdd->dev->pmc_locking_trace = 0;
+ }
+ spin_unlock(&pdd->dev->pmc_access_lock);
amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
list_del(&pdd->per_device_list);
kfree(pdd);
--
1.9.1
^ permalink raw reply related [flat|nested] 46+ messages in thread[parent not found: <1405603773-32688-9-git-send-email-oded.gabbay@amd.com>]