All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: intel-gpu-tools patches for read/write MMIO
       [not found]         ` <CALNAZXoC-Ss_2uxV+3Fc=SoStn+t_9pkBAdfMZ-ReVLHHGvu3g@mail.gmail.com>
@ 2013-01-29  8:16           ` Cheah, Vincent Beng Keat
  2013-01-29 20:01             ` Jesse Barnes
  0 siblings, 1 reply; 13+ messages in thread
From: Cheah, Vincent Beng Keat @ 2013-01-29  8:16 UTC (permalink / raw)
  To: intel-gfx@lists.freedesktop.org
  Cc: Vetter, Daniel, Barnes, Jesse, Ung, Teng En,
	Teres Alexis, Alan Previn, Widawsky, Benjamin

[-- Attachment #1: Type: text/plain, Size: 7475 bytes --]

Hi 

Attached refers to two different patches that I have made for Benjamin Windawsky’s branch (bwidawsk_branch.patch) and intel-gpu-tools (master branch - intel-gpu-tools_master.patch). Alternative link: (\\pglvm2008-v03.png.intel.com\automation\binary\Linux\Automation\patches )

patches: 
	•	intel-gpu-tools-1.3_master.patch 
		o	To be applied on latest intel-gpu-tools-1.3 (git clone git://anongit.freedesktop.org/xorg/app/intel-gpu-tools ) 
		o	The patches added are VLV chipset support + correcting intel_read_reg.c, intel_reg_wirte.c and intel_gtt.c
		o	Web link: http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/
	•	bwidawsk_branch.patch
		o	To be applied on Benjamin Windawsky’s branch (git clone git://people.freedesktop.org/~bwidawsk/intel-gpu-tools -b dump_util
		o	The patches added are VLV chipset support + correcting intel_read_reg.c, intel_reg_wirte.c and intel_gtt.c + merge in change(s) from intel-gpu-tools-1.3
		o	Web link: http://cgit.freedesktop.org/~bwidawsk/intel-gpu-tools/?h=dump_util

Could somebody you please help to upstream this? 

Thanks.

Best regards, 
Vincent 


-----Original Message-----
From: Ben Widawsky [mailto:benjamin.widawsky@intel.com] 
Sent: Tuesday, January 15, 2013 2:55 PM
To: Teres Alexis, Alan Previn
Cc: Barnes, Jesse; Cheah, Vincent Beng Keat; Vetter, Daniel
Subject: Re: intel-gpu-tools patches for read/write MMIO

On Mon, Jan 14, 2013 at 10:42 PM, Teres Alexis, Alan Previn <alan.previn.teres.alexis@intel.com> wrote:
> Ben, point us to that infrastructure ur working on - and since ur currently maintaining the intel-gpu-tools, let us know if that framework is still being worked on for VLV support or if someone else is working on adding VLV support in some form into the intel-gpu-tools.
> Vincent is already starting to work on adding IS_DISPLAY_REG for VLV. Don’t want any overlap - let us know if so.
>

I am too lazy to find the mailing list post, but here it is:
http://cgit.freedesktop.org/~bwidawsk/intel-gpu-tools/log/?h=dump_util

I made some changed during PO which I probably never pushed. I'd have to look. IMO, this is the way to go though. (see vlv_display.txt)

> On the intel_reg_read/write should only do what the user asks - I agree with that. But if that function is being re-used by other internal tests like "dump display regs" or something, then an internal function could pass in that value - i.e. the option to explicitly say if its display or not should still be there.

We don't have the kind of capability you're referring to there. It would be nice to have, but not there yet. Anyway, I agree with you.

> Also, the option to have a text file define the range sounds excellent 
> - but should stop the one-off cmd line drive reg read / write - which 
> I am sure is not being removed by anyone in any branch for any reason 
> :P

Yeah, I think Daniel gave up arguing against it, I forget if I was supposed to resubmit the patch. It came up at our London meeting.
Anyone remember?

>
> ...alan
>
>
> -----Original Message-----
> From: Ben Widawsky [mailto:benjamin.widawsky@intel.com]
> Sent: Tuesday, January 15, 2013 12:49 PM
> To: Teres Alexis, Alan Previn
> Cc: Barnes, Jesse; Cheah, Vincent Beng Keat; Vetter, Daniel
> Subject: Re: intel-gpu-tools patches for read/write MMIO
>
> This is what that infrastructure I worked on was meant to do (where a text file defines the registers you want to read), you know, the one Daniel more or less nak'd ;-) ... intel_reg_read/write shouldn't ever do anything except what the user asked. Personally, I think the dump range never belonged in read/write, but that predated me.
> intel_reg_dumper is a bit of another story though, see first sentenc.
>
> There is no need to work with Daniel directly if you don't want.
> Simply submit them to the intel-gfx mailing lists. If we have patches that cannot be me public yet, we have an internal list for that which we can point you to (and I am currently maintaining that intel-gpu-tools repository).
>
> Anyway, I wasn't directly addressed, so I'll butt out having left my 
> $.02 :-)
>
> On Mon, Jan 14, 2013 at 6:19 PM, Teres Alexis, Alan Previn <alan.previn.teres.alexis@intel.com> wrote:
>> Hey Jesse and Daniel,
>> Looks like our team mate didn't add VLV support into the whole intel_gpu_tools suite, he only added VLV support intel_reg_read and intel_reg_write - where the 0x180000 was hard coded for manual user register reads and register writes.
>> The other tests would pass or fail depending. For example, intel_reg_dumper.c might fail (in most cases), because its mostly display regs and needs the 0x18000 but intel_gem_blahblah tests would pass because I belive most of them don't touch display regs.
>> But any tests that want to verify GTT might fail because the gtt mapping was not modded to support VLV.
>>
>> Jesse, Daniel,  do u have someone on OTC enabling full support of VLV for intel-gpu-tools??? If not, then  then Vincent has volunteered to enable this and upstream thru Daniel - I will help him add explicit support on test-case by test-case basis as I summarized above.
>>
>> For generic reading / writing regs, I would propose an additional param (that is defaulted to zero) that means "is_display_reg" so the user could explicitly request to read or write a register and tell the tool that it IS_DISPLAY or  IS_NOT_DISPLAY. And in other cases, this tool will decide based on the same IS_DISPLAY macro in the kernel driver. (the optional override is important since we have overlapping IRQ and some other registers that have the same offset for both render and display and those cases require explicit mention).
>>
>> ...alan
>>
>>
>> -----Original Message-----
>> From: Teres Alexis, Alan Previn
>> Sent: Tuesday, January 15, 2013 7:13 AM
>> To: Barnes, Jesse; Cheah, Vincent Beng Keat
>> Cc: Vetter, Daniel; Widawsky, Benjamin
>> Subject: RE: intel-gpu-tools patches for read/write MMIO
>>
>> Vincent - lets review this offline - if intel-gpu-tools holds register names and addresses, then we can add that driver IS_VLV_DISPLAY_REG macro into that tool (which handles the optional need to add - or not to add - the 0x180000 offset).
>> Else we should remove it and just ensure the MMIO BAR ranges can cover the larger range.
>> ...alan
>>
>> -----Original Message-----
>> From: Barnes, Jesse
>> Sent: Monday, January 14, 2013 11:38 PM
>> To: Cheah, Vincent Beng Keat
>> Cc: Vetter, Daniel; Teres Alexis, Alan Previn; Widawsky, Benjamin
>> Subject: Re: intel-gpu-tools patches for read/write MMIO
>>
>> On Mon, 14 Jan 2013 00:57:15 -0800
>> "Cheah, Vincent Beng Keat" <vincent.beng.keat.cheah@intel.com> wrote:
>>
>>> Hi Daniel.
>>> Attached refers to the patches  that I have done on intel-gpu-tools-1.3 to read and write MMIO register for VLV platform specific.
>>>
>>> Could help me to make this  upstream.
>>
>> I don't think this is quite right.  Not all of the regs are above 0x180000, just the display ones.
>>
>> Also, I think we should drop the comments about "PO boards" and just call them VLV_D, VLV_M, and VLV_T to match the SKUs we have.
>>
>> I don't think we need to add the offset to _read & _write either; those are just bare tools and users can just add the offset themselves.
>>
>> But yes, we do have permission to publish this stuff, so you can publish an updated patch to the mailing list.
>>
>> Thanks,
>> Jesse

[-- Attachment #2: bwidawsk_branch.patch --]
[-- Type: application/octet-stream, Size: 277171 bytes --]

diff -rupN dump_1/lib/drmtest.c dump/lib/drmtest.c
--- dump_1/lib/drmtest.c	2001-01-14 08:11:40.055619273 +0800
+++ dump/lib/drmtest.c	2001-01-14 08:10:57.831625937 +0800
@@ -36,6 +36,7 @@
 #include <signal.h>
 #include <pciaccess.h>
 #include <math.h>
+#include <getopt.h>
 
 #include "drmtest.h"
 #include "i915_drm.h"
@@ -45,6 +46,23 @@
 /* This file contains a bunch of wrapper functions to directly use gem ioctls.
  * Mostly useful to write kernel tests. */
 
+drm_intel_bo *
+gem_handle_to_libdrm_bo(drm_intel_bufmgr *bufmgr, int fd, const char *name, uint32_t handle)
+{
+	struct drm_gem_flink flink;
+	int ret;
+	drm_intel_bo *bo;
+
+	flink.handle = handle;
+	ret = ioctl(fd, DRM_IOCTL_GEM_FLINK, &flink);
+	assert(ret == 0);
+
+	bo = drm_intel_bo_gem_create_from_name(bufmgr, name, flink.name);
+	assert(bo);
+
+	return bo;
+}
+
 static int
 is_intel(int fd)
 {
@@ -171,7 +189,6 @@ int drm_get_card(int master)
 			continue;
 
 		if (is_intel(fd) && master == 0) {
-			gem_quiescent_gpu(fd);
 			close(fd);
 			break;
 		}
@@ -497,6 +514,65 @@ void drmtest_stop_signal_helper(void)
 	signal_helper = -1;
 }
 
+/* subtests helpers */
+static bool list_subtests = false;
+static char *run_single_subtest = NULL;
+
+void drmtest_subtest_init(int argc, char **argv)
+{
+	int c, option_index = 0;
+	static struct option long_options[] = {
+		{"list-subtests", 0, 0, 'l'},
+		{"run-subtest", 1, 0, 'r'},
+		{NULL, 0, 0, 0,}
+	};
+
+	/* supress getopt errors about unknown options */
+	opterr = 0;
+	while((c = getopt_long(argc, argv, "",
+			       long_options, &option_index)) != -1) {
+		switch(c) {
+		case 'l':
+			list_subtests = true;
+			goto out;
+		case 'r':
+			run_single_subtest = strdup(optarg);
+			goto out;
+		}
+	}
+
+out:
+	/* reset opt parsing */
+	optind = 1;
+}
+
+/*
+ * Note: Testcases which use these helpers MUST NOT output anything to stdout
+ * outside of places protected by drmtest_run_subtest checks - the piglit
+ * runner adds every line to the subtest list.
+ */
+bool drmtest_run_subtest(const char *subtest_name)
+{
+	if (list_subtests) {
+		printf("%s\n", subtest_name);
+		return false;
+	}
+
+	if (!run_single_subtest) {
+		return true;
+	} else {
+		if (strcmp(subtest_name, run_single_subtest) == 0)
+			return true;
+
+		return false;
+	}
+}
+
+bool drmtest_only_list_subtests(void)
+{
+	return list_subtests;
+}
+
 /* other helpers */
 void drmtest_exchange_int(void *array, unsigned i, unsigned j)
 {
@@ -525,13 +601,21 @@ void drmtest_permute_array(void *array,
 
 void drmtest_progress(const char *header, uint64_t i, uint64_t total)
 {
+	int divider = 200;
+
+	if (!isatty(fileno(stderr)))
+		return;
+
 	if (i+1 >= total) {
 		fprintf(stderr, "\r%s100%%\n", header);
 		return;
 	}
 
+	if (total / 200 == 0)
+		divider = 1;
+
 	/* only bother updating about every 0.5% */
-	if (i % (total / 200) == 0 || i+1 >= total) {
+	if (i % (total / divider) == 0 || i+1 >= total) {
 		fprintf(stderr, "\r%s%3llu%%", header,
 			(long long unsigned) i * 100 / total);
 	}
@@ -768,7 +852,6 @@ unsigned int kmstest_create_fb(int fd, i
 	cairo_status_t status;
 	cairo_t *cr;
 	char buf[128];
-	int ret;
 	unsigned int fb_id;
 
 	surface = paint_allocate_surface(fd, width, height, depth, bpp,
@@ -798,11 +881,10 @@ unsigned int kmstest_create_fb(int fd, i
 	assert(!status);
 	cairo_destroy(cr);
 
-	ret = drmModeAddFB(fd, width, height, depth, bpp,
-			   fb_info->stride,
-			   fb_info->gem_handle, &fb_id);
+	do_or_die(drmModeAddFB(fd, width, height, depth, bpp,
+			       fb_info->stride,
+			       fb_info->gem_handle, &fb_id));
 
-	assert(ret == 0);
 	cairo_surface_destroy(surface);
 
 	fb_info->fb_id = fb_id;
@@ -810,6 +892,11 @@ unsigned int kmstest_create_fb(int fd, i
 	return fb_id;
 }
 
+void kmstest_remove_fb(int fd, int fb_id)
+{
+	do_or_die(drmModeRmFB(fd, fb_id));
+}
+
 void kmstest_dump_mode(drmModeModeInfo *mode)
 {
 	printf("  %s %d %d %d %d %d %d %d %d %d 0x%x 0x%x %d\n",
@@ -829,3 +916,16 @@ void kmstest_dump_mode(drmModeModeInfo *
 	fflush(stdout);
 }
 
+int kmstest_get_pipe_from_crtc_id(int fd, int crtc_id)
+{
+	struct drm_i915_get_pipe_from_crtc_id pfci;
+	int ret;
+
+	memset(&pfci, 0, sizeof(pfci));
+	pfci.crtc_id = crtc_id;
+	ret = drmIoctl(fd, DRM_IOCTL_I915_GET_PIPE_FROM_CRTC_ID, &pfci);
+	assert(ret == 0);
+
+	return pfci.pipe;
+}
+
diff -rupN dump_1/lib/drmtest.h dump/lib/drmtest.h
--- dump_1/lib/drmtest.h	2001-01-14 08:11:40.056619273 +0800
+++ dump/lib/drmtest.h	2001-01-14 08:10:57.832625811 +0800
@@ -37,6 +37,9 @@
 #include "xf86drmMode.h"
 #include "intel_batchbuffer.h"
 
+drm_intel_bo * gem_handle_to_libdrm_bo(drm_intel_bufmgr *bufmgr, int fd,
+				       const char *name, uint32_t handle);
+
 int drm_get_card(int master);
 int drm_open_any(void);
 int drm_open_any_master(void);
@@ -81,6 +84,9 @@ void drmtest_permute_array(void *array,
 						 unsigned i,
 						 unsigned j));
 void drmtest_progress(const char *header, uint64_t i, uint64_t total);
+void drmtest_subtest_init(int argc, char **argv);
+bool drmtest_run_subtest(const char *subtest_name);
+bool drmtest_only_list_subtests(void);
 
 /* helpers based upon the libdrm buffer manager */
 void drmtest_init_aperture_trashers(drm_intel_bufmgr *bufmgr);
@@ -102,7 +108,9 @@ unsigned int kmstest_create_fb(int fd, i
 			       struct kmstest_fb *fb_info,
 			       kmstest_paint_func paint_func,
 			       void *func_arg);
+void kmstest_remove_fb(int fd, int fb_id);
 void kmstest_dump_mode(drmModeModeInfo *mode);
+int kmstest_get_pipe_from_crtc_id(int fd, int crtc_id);
 
 inline static void _do_or_die(const char *function, int line, int ret)
 {
diff -rupN dump_1/lib/gen7_render.h dump/lib/gen7_render.h
--- dump_1/lib/gen7_render.h	2001-01-14 08:11:40.057619273 +0800
+++ dump/lib/gen7_render.h	2001-01-14 08:10:57.836625323 +0800
@@ -1,222 +1,1364 @@
 #ifndef GEN7_RENDER_H
 #define GEN7_RENDER_H
 
-#include "gen6_render.h"
+#define INTEL_MASK(high, low) (((1 << ((high) - (low) + 1)) - 1) << (low))
 
-#define GEN7_3DSTATE_URB_VS (0x7830 << 16)
-#define GEN7_3DSTATE_URB_HS (0x7831 << 16)
-#define GEN7_3DSTATE_URB_DS (0x7832 << 16)
-#define GEN7_3DSTATE_URB_GS (0x7833 << 16)
-
-#define GEN6_3DSTATE_SCISSOR_STATE_POINTERS	GEN6_3D(3, 0, 0xf)
-#define GEN7_3DSTATE_CLEAR_PARAMS		GEN6_3D(3, 0, 0x04)
-#define GEN7_3DSTATE_DEPTH_BUFFER		GEN6_3D(3, 0, 0x05)
-#define GEN7_3DSTATE_STENCIL_BUFFER		GEN6_3D(3, 0, 0x06)
-#define GEN7_3DSTATE_HIER_DEPTH_BUFFER		GEN6_3D(3, 0, 0x07)
-
-#define GEN7_3DSTATE_GS				GEN6_3D(3, 0, 0x11)
-#define GEN7_3DSTATE_CONSTANT_GS		GEN6_3D(3, 0, 0x16)
-#define GEN7_3DSTATE_CONSTANT_HS		GEN6_3D(3, 0, 0x19)
-#define GEN7_3DSTATE_CONSTANT_DS		GEN6_3D(3, 0, 0x1a)
-#define GEN7_3DSTATE_HS				GEN6_3D(3, 0, 0x1b)
-#define GEN7_3DSTATE_TE				GEN6_3D(3, 0, 0x1c)
-#define GEN7_3DSTATE_DS				GEN6_3D(3, 0, 0x1d)
-#define GEN7_3DSTATE_STREAMOUT			GEN6_3D(3, 0, 0x1e)
-#define GEN7_3DSTATE_SBE			GEN6_3D(3, 0, 0x1f)
-#define GEN7_3DSTATE_PS				GEN6_3D(3, 0, 0x20)
-#define GEN7_3DSTATE_VIEWPORT_STATE_POINTERS_SF_CLIP	\
-						GEN6_3D(3, 0, 0x21)
-#define GEN7_3DSTATE_VIEWPORT_STATE_POINTERS_CC	GEN6_3D(3, 0, 0x23)
-#define GEN7_3DSTATE_BLEND_STATE_POINTERS	GEN6_3D(3, 0, 0x24)
-#define GEN7_3DSTATE_DS_STATE_POINTERS		GEN6_3D(3, 0, 0x25)
-#define GEN7_3DSTATE_BINDING_TABLE_POINTERS_VS	GEN6_3D(3, 0, 0x26)
-#define GEN7_3DSTATE_BINDING_TABLE_POINTERS_HS	GEN6_3D(3, 0, 0x27)
-#define GEN7_3DSTATE_BINDING_TABLE_POINTERS_DS	GEN6_3D(3, 0, 0x28)
-#define GEN7_3DSTATE_BINDING_TABLE_POINTERS_GS	GEN6_3D(3, 0, 0x29)
-#define GEN7_3DSTATE_BINDING_TABLE_POINTERS_PS	GEN6_3D(3, 0, 0x2a)
-
-#define GEN7_3DSTATE_SAMPLER_STATE_POINTERS_VS	GEN6_3D(3, 0, 0x2b)
-#define GEN7_3DSTATE_SAMPLER_STATE_POINTERS_HS	GEN6_3D(3, 0, 0x2c)
-#define GEN7_3DSTATE_SAMPLER_STATE_POINTERS_DS	GEN6_3D(3, 0, 0x2d)
-#define GEN7_3DSTATE_SAMPLER_STATE_POINTERS_GS	GEN6_3D(3, 0, 0x2e)
-#define GEN7_3DSTATE_SAMPLER_STATE_POINTERS_PS	GEN6_3D(3, 0, 0x2f)
-
-#define GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_VS	GEN6_3D(3, 1, 0x12)
-#define GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_HS	GEN6_3D(3, 1, 0x13)
-#define GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_DS	GEN6_3D(3, 1, 0x14)
-#define GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_GS	GEN6_3D(3, 1, 0x15)
-#define GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_PS	GEN6_3D(3, 1, 0x16)
-
-/* Some random bits that we care about */
-#define GEN7_VB0_BUFFER_ADDR_MOD_EN		(1 << 14)
-#define GEN7_WM_DISPATCH_ENABLE			(1 << 29)
-#define GEN7_3DSTATE_PS_PERSPECTIVE_PIXEL_BARYCENTRIC (1 << 11)
-#define GEN7_3DSTATE_PS_ATTRIBUTE_ENABLED	 (1 << 10)
-
-/* Random shifts */
-#define GEN7_3DSTATE_WM_MAX_THREADS_SHIFT 24
-#define HSW_3DSTATE_WM_MAX_THREADS_SHIFT 23
-
-/* Shamelessly ripped from mesa */
-struct gen7_surface_state
-{
-	struct {
-		uint32_t cube_pos_z:1;
-		uint32_t cube_neg_z:1;
-		uint32_t cube_pos_y:1;
-		uint32_t cube_neg_y:1;
-		uint32_t cube_pos_x:1;
-		uint32_t cube_neg_x:1;
-		uint32_t pad2:2;
-		uint32_t render_cache_read_write:1;
+#define GEN7_3D(Pipeline,Opcode,Subopcode) ((3 << 29) | \
+					   ((Pipeline) << 27) | \
+					   ((Opcode) << 24) | \
+					   ((Subopcode) << 16))
+
+#define GEN7_STATE_BASE_ADDRESS			GEN7_3D(0, 1, 1)
+#define GEN7_STATE_SIP				GEN7_3D(0, 1, 2)
+
+#define GEN7_PIPELINE_SELECT			GEN7_3D(1, 1, 4)
+
+#define GEN7_MEDIA_STATE_POINTERS		GEN7_3D(2, 0, 0)
+#define GEN7_MEDIA_OBJECT			GEN7_3D(2, 1, 0)
+
+#define GEN7_3DSTATE_VERTEX_BUFFERS		GEN7_3D(3, 0, 8)
+#define GEN7_3DSTATE_VERTEX_ELEMENTS		GEN7_3D(3, 0, 9)
+#define GEN7_3DSTATE_INDEX_BUFFER		GEN7_3D(3, 0, 0xa)
+#define GEN7_3DSTATE_VF_STATISTICS		GEN7_3D(3, 0, 0xb)
+
+#define GEN7_3DSTATE_DRAWING_RECTANGLE		GEN7_3D(3, 1, 0)
+#define GEN7_3DSTATE_CONSTANT_COLOR		GEN7_3D(3, 1, 1)
+#define GEN7_3DSTATE_SAMPLER_PALETTE_LOAD	GEN7_3D(3, 1, 2)
+#define GEN7_3DSTATE_CHROMA_KEY			GEN7_3D(3, 1, 4)
+
+#define GEN7_3DSTATE_POLY_STIPPLE_OFFSET		GEN7_3D(3, 1, 6)
+#define GEN7_3DSTATE_POLY_STIPPLE_PATTERN	GEN7_3D(3, 1, 7)
+#define GEN7_3DSTATE_LINE_STIPPLE		GEN7_3D(3, 1, 8)
+#define GEN7_3DSTATE_GLOBAL_DEPTH_OFFSET_CLAMP	GEN7_3D(3, 1, 9)
+/* These two are BLC and CTG only, not BW or CL */
+#define GEN7_3DSTATE_AA_LINE_PARAMS		GEN7_3D(3, 1, 0xa)
+#define GEN7_3DSTATE_GS_SVB_INDEX		GEN7_3D(3, 1, 0xb)
+
+#define GEN7_3DPRIMITIVE				GEN7_3D(3, 3, 0)
+
+#define GEN7_3DSTATE_SAMPLER_STATE_POINTERS	GEN7_3D(3, 0, 0x02)
+# define GEN7_3DSTATE_SAMPLER_STATE_MODIFY_PS	(1 << 12)
+# define GEN7_3DSTATE_SAMPLER_STATE_MODIFY_GS	(1 << 9)
+# define GEN7_3DSTATE_SAMPLER_STATE_MODIFY_VS	(1 << 8)
+
+#define GEN7_3DSTATE_URB			GEN7_3D(3, 0, 0x05)
+/* DW1 */
+# define GEN7_3DSTATE_URB_VS_SIZE_SHIFT		16
+# define GEN7_3DSTATE_URB_VS_ENTRIES_SHIFT	0
+/* DW2 */
+# define GEN7_3DSTATE_URB_GS_ENTRIES_SHIFT	8
+# define GEN7_3DSTATE_URB_GS_SIZE_SHIFT		0
+
+#define GEN7_3DSTATE_VIEWPORT_STATE_POINTERS	GEN7_3D(3, 0, 0x0d)
+# define GEN7_3DSTATE_VIEWPORT_STATE_MODIFY_CC		(1 << 12)
+# define GEN7_3DSTATE_VIEWPORT_STATE_MODIFY_SF		(1 << 11)
+# define GEN7_3DSTATE_VIEWPORT_STATE_MODIFY_CLIP	(1 << 10)
+
+#define GEN7_3DSTATE_CC_STATE_POINTERS		GEN7_3D(3, 0, 0x0e)
+
+#define GEN7_3DSTATE_VS				GEN7_3D(3, 0, 0x10)
+
+#define GEN7_3DSTATE_GS				GEN7_3D(3, 0, 0x11)
+/* DW4 */
+# define GEN7_3DSTATE_GS_DISPATCH_START_GRF_SHIFT	0
+
+#define GEN7_3DSTATE_CLIP			GEN7_3D(3, 0, 0x12)
+
+#define GEN7_3DSTATE_SF				GEN7_3D(3, 0, 0x13)
+/* DW1 */
+# define GEN7_3DSTATE_SF_NUM_OUTPUTS_SHIFT		22
+# define GEN7_3DSTATE_SF_URB_ENTRY_READ_LENGTH_SHIFT	11
+# define GEN7_3DSTATE_SF_URB_ENTRY_READ_OFFSET_SHIFT	4
+/* DW2 */
+/* DW3 */
+# define GEN7_3DSTATE_SF_CULL_BOTH			(0 << 29)
+# define GEN7_3DSTATE_SF_CULL_NONE			(1 << 29)
+# define GEN7_3DSTATE_SF_CULL_FRONT			(2 << 29)
+# define GEN7_3DSTATE_SF_CULL_BACK			(3 << 29)
+/* DW4 */
+# define GEN7_3DSTATE_SF_TRI_PROVOKE_SHIFT		29
+# define GEN7_3DSTATE_SF_LINE_PROVOKE_SHIFT		27
+# define GEN7_3DSTATE_SF_TRIFAN_PROVOKE_SHIFT		25
+
+#define GEN7_3DSTATE_WM				GEN7_3D(3, 0, 0x14)
+/* DW1 */
+# define GEN7_WM_STATISTICS_ENABLE                              (1 << 31)
+# define GEN7_WM_DEPTH_CLEAR                                    (1 << 30)
+# define GEN7_WM_DISPATCH_ENABLE                                (1 << 29)
+# define GEN7_WM_DEPTH_RESOLVE                                  (1 << 28)
+# define GEN7_WM_HIERARCHICAL_DEPTH_RESOLVE                     (1 << 27)
+# define GEN7_WM_KILL_ENABLE                                    (1 << 25)
+# define GEN7_WM_PSCDEPTH_OFF                                   (0 << 23)
+# define GEN7_WM_PSCDEPTH_ON                                    (1 << 23)
+# define GEN7_WM_PSCDEPTH_ON_GE                                 (2 << 23)
+# define GEN7_WM_PSCDEPTH_ON_LE                                 (3 << 23)
+# define GEN7_WM_USES_SOURCE_DEPTH                              (1 << 20)
+# define GEN7_WM_USES_SOURCE_W                                  (1 << 19)
+# define GEN7_WM_POSITION_ZW_PIXEL                              (0 << 17)
+# define GEN7_WM_POSITION_ZW_CENTROID                           (2 << 17)
+# define GEN7_WM_POSITION_ZW_SAMPLE                             (3 << 17)
+# define GEN7_WM_NONPERSPECTIVE_SAMPLE_BARYCENTRIC              (1 << 16)
+# define GEN7_WM_NONPERSPECTIVE_CENTROID_BARYCENTRIC            (1 << 15)
+# define GEN7_WM_NONPERSPECTIVE_PIXEL_BARYCENTRIC               (1 << 14)
+# define GEN7_WM_PERSPECTIVE_SAMPLE_BARYCENTRIC                 (1 << 13)
+# define GEN7_WM_PERSPECTIVE_CENTROID_BARYCENTRIC               (1 << 12)
+# define GEN7_WM_PERSPECTIVE_PIXEL_BARYCENTRIC                  (1 << 11)
+# define GEN7_WM_USES_INPUT_COVERAGE_MASK                       (1 << 10)
+# define GEN7_WM_LINE_END_CAP_AA_WIDTH_0_5                      (0 << 8)
+# define GEN7_WM_LINE_END_CAP_AA_WIDTH_1_0                      (1 << 8)
+# define GEN7_WM_LINE_END_CAP_AA_WIDTH_2_0                      (2 << 8)
+# define GEN7_WM_LINE_END_CAP_AA_WIDTH_4_0                      (3 << 8)
+# define GEN7_WM_LINE_AA_WIDTH_0_5                              (0 << 6)
+# define GEN7_WM_LINE_AA_WIDTH_1_0                              (1 << 6)
+# define GEN7_WM_LINE_AA_WIDTH_2_0                              (2 << 6)
+# define GEN7_WM_LINE_AA_WIDTH_4_0                              (3 << 6)
+# define GEN7_WM_POLYGON_STIPPLE_ENABLE                         (1 << 4)
+# define GEN7_WM_LINE_STIPPLE_ENABLE                            (1 << 3)
+# define GEN7_WM_POINT_RASTRULE_UPPER_RIGHT                     (1 << 2)
+# define GEN7_WM_MSRAST_OFF_PIXEL                               (0 << 0)
+# define GEN7_WM_MSRAST_OFF_PATTERN                             (1 << 0)
+# define GEN7_WM_MSRAST_ON_PIXEL                                (2 << 0)
+# define GEN7_WM_MSRAST_ON_PATTERN                              (3 << 0)
+/* DW2 */
+# define GEN7_WM_MSDISPMODE_PERPIXEL                            (1 << 31)
+
+
+#define GEN7_3DSTATE_CONSTANT_VS		GEN7_3D(3, 0, 0x15)
+#define GEN7_3DSTATE_CONSTANT_GS		GEN7_3D(3, 0, 0x16)
+#define GEN7_3DSTATE_CONSTANT_PS		GEN7_3D(3, 0, 0x17)
+
+#define GEN7_3DSTATE_SAMPLE_MASK		GEN7_3D(3, 0, 0x18)
+
+#define GEN7_3DSTATE_MULTISAMPLE		GEN7_3D(3, 1, 0x0d)
+/* DW1 */
+# define GEN7_3DSTATE_MULTISAMPLE_PIXEL_LOCATION_CENTER		(0 << 4)
+# define GEN7_3DSTATE_MULTISAMPLE_PIXEL_LOCATION_UPPER_LEFT	(1 << 4)
+# define GEN7_3DSTATE_MULTISAMPLE_NUMSAMPLES_1			(0 << 1)
+# define GEN7_3DSTATE_MULTISAMPLE_NUMSAMPLES_4			(2 << 1)
+# define GEN7_3DSTATE_MULTISAMPLE_NUMSAMPLES_8			(3 << 1)
+
+#define PIPELINE_SELECT_3D		0
+#define PIPELINE_SELECT_MEDIA		1
+
+/* for GEN7_STATE_BASE_ADDRESS */
+#define BASE_ADDRESS_MODIFY		(1 << 0)
+
+/* for GEN7_PIPE_CONTROL */
+#define GEN7_PIPE_CONTROL			GEN7_3D(3, 2, 0)
+#define GEN7_PIPE_CONTROL_CS_STALL      (1 << 20)
+#define GEN7_PIPE_CONTROL_NOWRITE       (0 << 14)
+#define GEN7_PIPE_CONTROL_WRITE_QWORD   (1 << 14)
+#define GEN7_PIPE_CONTROL_WRITE_DEPTH   (2 << 14)
+#define GEN7_PIPE_CONTROL_WRITE_TIME    (3 << 14)
+#define GEN7_PIPE_CONTROL_DEPTH_STALL   (1 << 13)
+#define GEN7_PIPE_CONTROL_WC_FLUSH      (1 << 12)
+#define GEN7_PIPE_CONTROL_IS_FLUSH      (1 << 11)
+#define GEN7_PIPE_CONTROL_TC_FLUSH      (1 << 10)
+#define GEN7_PIPE_CONTROL_NOTIFY_ENABLE (1 << 8)
+#define GEN7_PIPE_CONTROL_GLOBAL_GTT    (1 << 2)
+#define GEN7_PIPE_CONTROL_LOCAL_PGTT    (0 << 2)
+#define GEN7_PIPE_CONTROL_STALL_AT_SCOREBOARD   (1 << 1)
+#define GEN7_PIPE_CONTROL_DEPTH_CACHE_FLUSH	(1 << 0)
+
+/* VERTEX_BUFFER_STATE Structure */
+#define GEN7_VB0_BUFFER_INDEX_SHIFT	26
+#define GEN7_VB0_VERTEXDATA		(0 << 20)
+#define GEN7_VB0_INSTANCEDATA		(1 << 20)
+#define GEN7_VB0_BUFFER_PITCH_SHIFT	0
+#define GEN7_VB0_ADDRESS_MODIFY_ENABLE	(1 << 14)
+
+/* VERTEX_ELEMENT_STATE Structure */
+#define GEN7_VE0_VERTEX_BUFFER_INDEX_SHIFT		26
+#define GEN7_VE0_VALID					(1 << 25)
+#define GEN7_VE0_FORMAT_SHIFT				16
+#define GEN7_VE0_OFFSET_SHIFT				0
+#define GEN7_VE1_VFCOMPONENT_0_SHIFT			28
+#define GEN7_VE1_VFCOMPONENT_1_SHIFT			24
+#define GEN7_VE1_VFCOMPONENT_2_SHIFT			20
+#define GEN7_VE1_VFCOMPONENT_3_SHIFT			16
+#define GEN7_VE1_DESTINATION_ELEMENT_OFFSET_SHIFT	0
+
+/* 3DPRIMITIVE bits */
+#define GEN7_3DPRIMITIVE_VERTEX_SEQUENTIAL (0 << 15)
+#define GEN7_3DPRIMITIVE_VERTEX_RANDOM	  (1 << 15)
+
+#define GEN7_SVG_CTL		       0x7400
+
+#define GEN7_SVG_CTL_GS_BA	       (0 << 8)
+#define GEN7_SVG_CTL_SS_BA	       (1 << 8)
+#define GEN7_SVG_CTL_IO_BA	       (2 << 8)
+#define GEN7_SVG_CTL_GS_AUB	       (3 << 8)
+#define GEN7_SVG_CTL_IO_AUB	       (4 << 8)
+#define GEN7_SVG_CTL_SIP		       (5 << 8)
+
+#define GEN7_VF_CTL_SNAPSHOT_COMPLETE		   (1 << 31)
+#define GEN7_VF_CTL_SNAPSHOT_MUX_SELECT_THREADID	   (0 << 8)
+#define GEN7_VF_CTL_SNAPSHOT_MUX_SELECT_VF_DEBUG	   (1 << 8)
+#define GEN7_VF_CTL_SNAPSHOT_TYPE_VERTEX_SEQUENCE   (0 << 4)
+#define GEN7_VF_CTL_SNAPSHOT_TYPE_VERTEX_INDEX	   (1 << 4)
+#define GEN7_VF_CTL_SKIP_INITIAL_PRIMITIVES	   (1 << 3)
+#define GEN7_VF_CTL_MAX_PRIMITIVES_LIMIT_ENABLE	   (1 << 2)
+#define GEN7_VF_CTL_VERTEX_RANGE_LIMIT_ENABLE	   (1 << 1)
+#define GEN7_VF_CTL_SNAPSHOT_ENABLE		   (1 << 0)
+
+#define GEN7_VF_STRG_VAL		       0x7504
+#define GEN7_VF_STR_VL_OVR	       0x7508
+#define GEN7_VF_VC_OVR		       0x750c
+#define GEN7_VF_STR_PSKIP	       0x7510
+#define GEN7_VF_MAX_PRIM		       0x7514
+#define GEN7_VF_RDATA		       0x7518
+
+#define GEN7_VS_CTL		       0x7600
+#define GEN7_VS_CTL_SNAPSHOT_COMPLETE		   (1 << 31)
+#define GEN7_VS_CTL_SNAPSHOT_MUX_VERTEX_0	   (0 << 8)
+#define GEN7_VS_CTL_SNAPSHOT_MUX_VERTEX_1	   (1 << 8)
+#define GEN7_VS_CTL_SNAPSHOT_MUX_VALID_COUNT	   (2 << 8)
+#define GEN7_VS_CTL_SNAPSHOT_MUX_VS_KERNEL_POINTER  (3 << 8)
+#define GEN7_VS_CTL_SNAPSHOT_ALL_THREADS		   (1 << 2)
+#define GEN7_VS_CTL_THREAD_SNAPSHOT_ENABLE	   (1 << 1)
+#define GEN7_VS_CTL_SNAPSHOT_ENABLE		   (1 << 0)
+
+#define GEN7_VS_STRG_VAL		       0x7604
+#define GEN7_VS_RDATA		       0x7608
+
+#define GEN7_SF_CTL		       0x7b00
+#define GEN7_SF_CTL_SNAPSHOT_COMPLETE		   (1 << 31)
+#define GEN7_SF_CTL_SNAPSHOT_MUX_VERTEX_0_FF_ID	   (0 << 8)
+#define GEN7_SF_CTL_SNAPSHOT_MUX_VERTEX_0_REL_COUNT (1 << 8)
+#define GEN7_SF_CTL_SNAPSHOT_MUX_VERTEX_1_FF_ID	   (2 << 8)
+#define GEN7_SF_CTL_SNAPSHOT_MUX_VERTEX_1_REL_COUNT (3 << 8)
+#define GEN7_SF_CTL_SNAPSHOT_MUX_VERTEX_2_FF_ID	   (4 << 8)
+#define GEN7_SF_CTL_SNAPSHOT_MUX_VERTEX_2_REL_COUNT (5 << 8)
+#define GEN7_SF_CTL_SNAPSHOT_MUX_VERTEX_COUNT	   (6 << 8)
+#define GEN7_SF_CTL_SNAPSHOT_MUX_SF_KERNEL_POINTER  (7 << 8)
+#define GEN7_SF_CTL_MIN_MAX_PRIMITIVE_RANGE_ENABLE  (1 << 4)
+#define GEN7_SF_CTL_DEBUG_CLIP_RECTANGLE_ENABLE	   (1 << 3)
+#define GEN7_SF_CTL_SNAPSHOT_ALL_THREADS		   (1 << 2)
+#define GEN7_SF_CTL_THREAD_SNAPSHOT_ENABLE	   (1 << 1)
+#define GEN7_SF_CTL_SNAPSHOT_ENABLE		   (1 << 0)
+
+#define GEN7_SF_STRG_VAL		       0x7b04
+#define GEN7_SF_RDATA		       0x7b18
+
+#define GEN7_WIZ_CTL		       0x7c00
+#define GEN7_WIZ_CTL_SNAPSHOT_COMPLETE		   (1 << 31)
+#define GEN7_WIZ_CTL_SUBSPAN_INSTANCE_SHIFT	   16
+#define GEN7_WIZ_CTL_SNAPSHOT_MUX_WIZ_KERNEL_POINTER   (0 << 8)
+#define GEN7_WIZ_CTL_SNAPSHOT_MUX_SUBSPAN_INSTANCE     (1 << 8)
+#define GEN7_WIZ_CTL_SNAPSHOT_MUX_PRIMITIVE_SEQUENCE   (2 << 8)
+#define GEN7_WIZ_CTL_SINGLE_SUBSPAN_DISPATCH	      (1 << 6)
+#define GEN7_WIZ_CTL_IGNORE_COLOR_SCOREBOARD_STALLS    (1 << 5)
+#define GEN7_WIZ_CTL_ENABLE_SUBSPAN_INSTANCE_COMPARE   (1 << 4)
+#define GEN7_WIZ_CTL_USE_UPSTREAM_SNAPSHOT_FLAG	      (1 << 3)
+#define GEN7_WIZ_CTL_SNAPSHOT_ALL_THREADS	      (1 << 2)
+#define GEN7_WIZ_CTL_THREAD_SNAPSHOT_ENABLE	      (1 << 1)
+#define GEN7_WIZ_CTL_SNAPSHOT_ENABLE		      (1 << 0)
+
+#define GEN7_WIZ_STRG_VAL			      0x7c04
+#define GEN7_WIZ_RDATA				      0x7c18
+
+#define GEN7_TS_CTL		       0x7e00
+#define GEN7_TS_CTL_SNAPSHOT_COMPLETE		   (1 << 31)
+#define GEN7_TS_CTL_SNAPSHOT_MESSAGE_ERROR	   (0 << 8)
+#define GEN7_TS_CTL_SNAPSHOT_INTERFACE_DESCRIPTOR   (3 << 8)
+#define GEN7_TS_CTL_SNAPSHOT_ALL_CHILD_THREADS	   (1 << 2)
+#define GEN7_TS_CTL_SNAPSHOT_ALL_ROOT_THREADS	   (1 << 1)
+#define GEN7_TS_CTL_SNAPSHOT_ENABLE		   (1 << 0)
+
+#define GEN7_TS_STRG_VAL		       0x7e04
+#define GEN7_TS_RDATA		       0x7e08
+
+#define GEN7_TD_CTL		       0x8000
+#define GEN7_TD_CTL_MUX_SHIFT	       8
+#define GEN7_TD_CTL_EXTERNAL_HALT_R0_DEBUG_MATCH	   (1 << 7)
+#define GEN7_TD_CTL_FORCE_EXTERNAL_HALT		   (1 << 6)
+#define GEN7_TD_CTL_EXCEPTION_MASK_OVERRIDE	   (1 << 5)
+#define GEN7_TD_CTL_FORCE_THREAD_BREAKPOINT_ENABLE  (1 << 4)
+#define GEN7_TD_CTL_BREAKPOINT_ENABLE		   (1 << 2)
+#define GEN7_TD_CTL2		       0x8004
+#define GEN7_TD_CTL2_ILLEGAL_OPCODE_EXCEPTION_OVERRIDE (1 << 28)
+#define GEN7_TD_CTL2_MASKSTACK_EXCEPTION_OVERRIDE      (1 << 26)
+#define GEN7_TD_CTL2_SOFTWARE_EXCEPTION_OVERRIDE	      (1 << 25)
+#define GEN7_TD_CTL2_ACTIVE_THREAD_LIMIT_SHIFT	      16
+#define GEN7_TD_CTL2_ACTIVE_THREAD_LIMIT_ENABLE	      (1 << 8)
+#define GEN7_TD_CTL2_THREAD_SPAWNER_EXECUTION_MASK_ENABLE (1 << 7)
+#define GEN7_TD_CTL2_WIZ_EXECUTION_MASK_ENABLE	      (1 << 6)
+#define GEN7_TD_CTL2_SF_EXECUTION_MASK_ENABLE	      (1 << 5)
+#define GEN7_TD_CTL2_CLIPPER_EXECUTION_MASK_ENABLE     (1 << 4)
+#define GEN7_TD_CTL2_GS_EXECUTION_MASK_ENABLE	      (1 << 3)
+#define GEN7_TD_CTL2_VS_EXECUTION_MASK_ENABLE	      (1 << 0)
+#define GEN7_TD_VF_VS_EMSK	       0x8008
+#define GEN7_TD_GS_EMSK		       0x800c
+#define GEN7_TD_CLIP_EMSK	       0x8010
+#define GEN7_TD_SF_EMSK		       0x8014
+#define GEN7_TD_WIZ_EMSK		       0x8018
+#define GEN7_TD_0_6_EHTRG_VAL	       0x801c
+#define GEN7_TD_0_7_EHTRG_VAL	       0x8020
+#define GEN7_TD_0_6_EHTRG_MSK           0x8024
+#define GEN7_TD_0_7_EHTRG_MSK	       0x8028
+#define GEN7_TD_RDATA		       0x802c
+#define GEN7_TD_TS_EMSK		       0x8030
+
+#define GEN7_EU_CTL		       0x8800
+#define GEN7_EU_CTL_SELECT_SHIFT	       16
+#define GEN7_EU_CTL_DATA_MUX_SHIFT      8
+#define GEN7_EU_ATT_0		       0x8810
+#define GEN7_EU_ATT_1		       0x8814
+#define GEN7_EU_ATT_DATA_0	       0x8820
+#define GEN7_EU_ATT_DATA_1	       0x8824
+#define GEN7_EU_ATT_CLR_0	       0x8830
+#define GEN7_EU_ATT_CLR_1	       0x8834
+#define GEN7_EU_RDATA		       0x8840
+
+#define _3DPRIM_POINTLIST         0x01
+#define _3DPRIM_LINELIST          0x02
+#define _3DPRIM_LINESTRIP         0x03
+#define _3DPRIM_TRILIST           0x04
+#define _3DPRIM_TRISTRIP          0x05
+#define _3DPRIM_TRIFAN            0x06
+#define _3DPRIM_QUADLIST          0x07
+#define _3DPRIM_QUADSTRIP         0x08
+#define _3DPRIM_LINELIST_ADJ      0x09
+#define _3DPRIM_LINESTRIP_ADJ     0x0A
+#define _3DPRIM_TRILIST_ADJ       0x0B
+#define _3DPRIM_TRISTRIP_ADJ      0x0C
+#define _3DPRIM_TRISTRIP_REVERSE  0x0D
+#define _3DPRIM_POLYGON           0x0E
+#define _3DPRIM_RECTLIST          0x0F
+#define _3DPRIM_LINELOOP          0x10
+#define _3DPRIM_POINTLIST_BF      0x11
+#define _3DPRIM_LINESTRIP_CONT    0x12
+#define _3DPRIM_LINESTRIP_BF      0x13
+#define _3DPRIM_LINESTRIP_CONT_BF 0x14
+#define _3DPRIM_TRIFAN_NOSTIPPLE  0x15
+
+#define _3DPRIM_VERTEXBUFFER_ACCESS_SEQUENTIAL 0
+#define _3DPRIM_VERTEXBUFFER_ACCESS_RANDOM     1
+
+#define GEN7_ANISORATIO_2     0
+#define GEN7_ANISORATIO_4     1
+#define GEN7_ANISORATIO_6     2
+#define GEN7_ANISORATIO_8     3
+#define GEN7_ANISORATIO_10    4
+#define GEN7_ANISORATIO_12    5
+#define GEN7_ANISORATIO_14    6
+#define GEN7_ANISORATIO_16    7
+
+#define GEN7_BLENDFACTOR_ONE                 0x1
+#define GEN7_BLENDFACTOR_SRC_COLOR           0x2
+#define GEN7_BLENDFACTOR_SRC_ALPHA           0x3
+#define GEN7_BLENDFACTOR_DST_ALPHA           0x4
+#define GEN7_BLENDFACTOR_DST_COLOR           0x5
+#define GEN7_BLENDFACTOR_SRC_ALPHA_SATURATE  0x6
+#define GEN7_BLENDFACTOR_CONST_COLOR         0x7
+#define GEN7_BLENDFACTOR_CONST_ALPHA         0x8
+#define GEN7_BLENDFACTOR_SRC1_COLOR          0x9
+#define GEN7_BLENDFACTOR_SRC1_ALPHA          0x0A
+#define GEN7_BLENDFACTOR_ZERO                0x11
+#define GEN7_BLENDFACTOR_INV_SRC_COLOR       0x12
+#define GEN7_BLENDFACTOR_INV_SRC_ALPHA       0x13
+#define GEN7_BLENDFACTOR_INV_DST_ALPHA       0x14
+#define GEN7_BLENDFACTOR_INV_DST_COLOR       0x15
+#define GEN7_BLENDFACTOR_INV_CONST_COLOR     0x17
+#define GEN7_BLENDFACTOR_INV_CONST_ALPHA     0x18
+#define GEN7_BLENDFACTOR_INV_SRC1_COLOR      0x19
+#define GEN7_BLENDFACTOR_INV_SRC1_ALPHA      0x1A
+
+#define GEN7_BLENDFUNCTION_ADD               0
+#define GEN7_BLENDFUNCTION_SUBTRACT          1
+#define GEN7_BLENDFUNCTION_REVERSE_SUBTRACT  2
+#define GEN7_BLENDFUNCTION_MIN               3
+#define GEN7_BLENDFUNCTION_MAX               4
+
+#define GEN7_ALPHATEST_FORMAT_UNORM8         0
+#define GEN7_ALPHATEST_FORMAT_FLOAT32        1
+
+#define GEN7_CHROMAKEY_KILL_ON_ANY_MATCH  0
+#define GEN7_CHROMAKEY_REPLACE_BLACK      1
+
+#define GEN7_CLIP_API_OGL     0
+#define GEN7_CLIP_API_DX      1
+
+#define GEN7_CLIPMODE_NORMAL              0
+#define GEN7_CLIPMODE_CLIP_ALL            1
+#define GEN7_CLIPMODE_CLIP_NON_REJECTED   2
+#define GEN7_CLIPMODE_REJECT_ALL          3
+#define GEN7_CLIPMODE_ACCEPT_ALL          4
+
+#define GEN7_CLIP_NDCSPACE     0
+#define GEN7_CLIP_SCREENSPACE  1
+
+#define GEN7_COMPAREFUNCTION_ALWAYS       0
+#define GEN7_COMPAREFUNCTION_NEVER        1
+#define GEN7_COMPAREFUNCTION_LESS         2
+#define GEN7_COMPAREFUNCTION_EQUAL        3
+#define GEN7_COMPAREFUNCTION_LEQUAL       4
+#define GEN7_COMPAREFUNCTION_GREATER      5
+#define GEN7_COMPAREFUNCTION_NOTEQUAL     6
+#define GEN7_COMPAREFUNCTION_GEQUAL       7
+
+#define GEN7_COVERAGE_PIXELS_HALF     0
+#define GEN7_COVERAGE_PIXELS_1        1
+#define GEN7_COVERAGE_PIXELS_2        2
+#define GEN7_COVERAGE_PIXELS_4        3
+
+#define GEN7_CULLMODE_BOTH        0
+#define GEN7_CULLMODE_NONE        1
+#define GEN7_CULLMODE_FRONT       2
+#define GEN7_CULLMODE_BACK        3
+
+#define GEN7_DEFAULTCOLOR_R8G8B8A8_UNORM      0
+#define GEN7_DEFAULTCOLOR_R32G32B32A32_FLOAT  1
+
+#define GEN7_DEPTHFORMAT_D32_FLOAT_S8X24_UINT     0
+#define GEN7_DEPTHFORMAT_D32_FLOAT                1
+#define GEN7_DEPTHFORMAT_D24_UNORM_S8_UINT        2
+#define GEN7_DEPTHFORMAT_D16_UNORM                5
+
+#define GEN7_FLOATING_POINT_IEEE_754        0
+#define GEN7_FLOATING_POINT_NON_IEEE_754    1
+
+#define GEN7_FRONTWINDING_CW      0
+#define GEN7_FRONTWINDING_CCW     1
+
+#define GEN7_INDEX_BYTE     0
+#define GEN7_INDEX_WORD     1
+#define GEN7_INDEX_DWORD    2
+
+#define GEN7_LOGICOPFUNCTION_CLEAR            0
+#define GEN7_LOGICOPFUNCTION_NOR              1
+#define GEN7_LOGICOPFUNCTION_AND_INVERTED     2
+#define GEN7_LOGICOPFUNCTION_COPY_INVERTED    3
+#define GEN7_LOGICOPFUNCTION_AND_REVERSE      4
+#define GEN7_LOGICOPFUNCTION_INVERT           5
+#define GEN7_LOGICOPFUNCTION_XOR              6
+#define GEN7_LOGICOPFUNCTION_NAND             7
+#define GEN7_LOGICOPFUNCTION_AND              8
+#define GEN7_LOGICOPFUNCTION_EQUIV            9
+#define GEN7_LOGICOPFUNCTION_NOOP             10
+#define GEN7_LOGICOPFUNCTION_OR_INVERTED      11
+#define GEN7_LOGICOPFUNCTION_COPY             12
+#define GEN7_LOGICOPFUNCTION_OR_REVERSE       13
+#define GEN7_LOGICOPFUNCTION_OR               14
+#define GEN7_LOGICOPFUNCTION_SET              15
+
+#define GEN7_MAPFILTER_NEAREST        0x0
+#define GEN7_MAPFILTER_LINEAR         0x1
+#define GEN7_MAPFILTER_ANISOTROPIC    0x2
+
+#define GEN7_MIPFILTER_NONE        0
+#define GEN7_MIPFILTER_NEAREST     1
+#define GEN7_MIPFILTER_LINEAR      3
+
+#define GEN7_POLYGON_FRONT_FACING     0
+#define GEN7_POLYGON_BACK_FACING      1
+
+#define GEN7_PREFILTER_ALWAYS     0x0
+#define GEN7_PREFILTER_NEVER      0x1
+#define GEN7_PREFILTER_LESS       0x2
+#define GEN7_PREFILTER_EQUAL      0x3
+#define GEN7_PREFILTER_LEQUAL     0x4
+#define GEN7_PREFILTER_GREATER    0x5
+#define GEN7_PREFILTER_NOTEQUAL   0x6
+#define GEN7_PREFILTER_GEQUAL     0x7
+
+#define GEN7_PROVOKING_VERTEX_0    0
+#define GEN7_PROVOKING_VERTEX_1    1
+#define GEN7_PROVOKING_VERTEX_2    2
+
+#define GEN7_RASTRULE_UPPER_LEFT  0
+#define GEN7_RASTRULE_UPPER_RIGHT 1
+
+#define GEN7_RENDERTARGET_CLAMPRANGE_UNORM    0
+#define GEN7_RENDERTARGET_CLAMPRANGE_SNORM    1
+#define GEN7_RENDERTARGET_CLAMPRANGE_FORMAT   2
+
+#define GEN7_STENCILOP_KEEP               0
+#define GEN7_STENCILOP_ZERO               1
+#define GEN7_STENCILOP_REPLACE            2
+#define GEN7_STENCILOP_INCRSAT            3
+#define GEN7_STENCILOP_DECRSAT            4
+#define GEN7_STENCILOP_INCR               5
+#define GEN7_STENCILOP_DECR               6
+#define GEN7_STENCILOP_INVERT             7
+
+#define GEN7_SURFACE_MIPMAPLAYOUT_BELOW   0
+#define GEN7_SURFACE_MIPMAPLAYOUT_RIGHT   1
+
+#define GEN7_SURFACEFORMAT_R32G32B32A32_FLOAT             0x000
+#define GEN7_SURFACEFORMAT_R32G32B32A32_SINT              0x001
+#define GEN7_SURFACEFORMAT_R32G32B32A32_UINT              0x002
+#define GEN7_SURFACEFORMAT_R32G32B32A32_UNORM             0x003
+#define GEN7_SURFACEFORMAT_R32G32B32A32_SNORM             0x004
+#define GEN7_SURFACEFORMAT_R64G64_FLOAT                   0x005
+#define GEN7_SURFACEFORMAT_R32G32B32X32_FLOAT             0x006
+#define GEN7_SURFACEFORMAT_R32G32B32A32_SSCALED           0x007
+#define GEN7_SURFACEFORMAT_R32G32B32A32_USCALED           0x008
+#define GEN7_SURFACEFORMAT_R32G32B32_FLOAT                0x040
+#define GEN7_SURFACEFORMAT_R32G32B32_SINT                 0x041
+#define GEN7_SURFACEFORMAT_R32G32B32_UINT                 0x042
+#define GEN7_SURFACEFORMAT_R32G32B32_UNORM                0x043
+#define GEN7_SURFACEFORMAT_R32G32B32_SNORM                0x044
+#define GEN7_SURFACEFORMAT_R32G32B32_SSCALED              0x045
+#define GEN7_SURFACEFORMAT_R32G32B32_USCALED              0x046
+#define GEN7_SURFACEFORMAT_R16G16B16A16_UNORM             0x080
+#define GEN7_SURFACEFORMAT_R16G16B16A16_SNORM             0x081
+#define GEN7_SURFACEFORMAT_R16G16B16A16_SINT              0x082
+#define GEN7_SURFACEFORMAT_R16G16B16A16_UINT              0x083
+#define GEN7_SURFACEFORMAT_R16G16B16A16_FLOAT             0x084
+#define GEN7_SURFACEFORMAT_R32G32_FLOAT                   0x085
+#define GEN7_SURFACEFORMAT_R32G32_SINT                    0x086
+#define GEN7_SURFACEFORMAT_R32G32_UINT                    0x087
+#define GEN7_SURFACEFORMAT_R32_FLOAT_X8X24_TYPELESS       0x088
+#define GEN7_SURFACEFORMAT_X32_TYPELESS_G8X24_UINT        0x089
+#define GEN7_SURFACEFORMAT_L32A32_FLOAT                   0x08A
+#define GEN7_SURFACEFORMAT_R32G32_UNORM                   0x08B
+#define GEN7_SURFACEFORMAT_R32G32_SNORM                   0x08C
+#define GEN7_SURFACEFORMAT_R64_FLOAT                      0x08D
+#define GEN7_SURFACEFORMAT_R16G16B16X16_UNORM             0x08E
+#define GEN7_SURFACEFORMAT_R16G16B16X16_FLOAT             0x08F
+#define GEN7_SURFACEFORMAT_A32X32_FLOAT                   0x090
+#define GEN7_SURFACEFORMAT_L32X32_FLOAT                   0x091
+#define GEN7_SURFACEFORMAT_I32X32_FLOAT                   0x092
+#define GEN7_SURFACEFORMAT_R16G16B16A16_SSCALED           0x093
+#define GEN7_SURFACEFORMAT_R16G16B16A16_USCALED           0x094
+#define GEN7_SURFACEFORMAT_R32G32_SSCALED                 0x095
+#define GEN7_SURFACEFORMAT_R32G32_USCALED                 0x096
+#define GEN7_SURFACEFORMAT_B8G8R8A8_UNORM                 0x0C0
+#define GEN7_SURFACEFORMAT_B8G8R8A8_UNORM_SRGB            0x0C1
+#define GEN7_SURFACEFORMAT_R10G10B10A2_UNORM              0x0C2
+#define GEN7_SURFACEFORMAT_R10G10B10A2_UNORM_SRGB         0x0C3
+#define GEN7_SURFACEFORMAT_R10G10B10A2_UINT               0x0C4
+#define GEN7_SURFACEFORMAT_R10G10B10_SNORM_A2_UNORM       0x0C5
+#define GEN7_SURFACEFORMAT_R8G8B8A8_UNORM                 0x0C7
+#define GEN7_SURFACEFORMAT_R8G8B8A8_UNORM_SRGB            0x0C8
+#define GEN7_SURFACEFORMAT_R8G8B8A8_SNORM                 0x0C9
+#define GEN7_SURFACEFORMAT_R8G8B8A8_SINT                  0x0CA
+#define GEN7_SURFACEFORMAT_R8G8B8A8_UINT                  0x0CB
+#define GEN7_SURFACEFORMAT_R16G16_UNORM                   0x0CC
+#define GEN7_SURFACEFORMAT_R16G16_SNORM                   0x0CD
+#define GEN7_SURFACEFORMAT_R16G16_SINT                    0x0CE
+#define GEN7_SURFACEFORMAT_R16G16_UINT                    0x0CF
+#define GEN7_SURFACEFORMAT_R16G16_FLOAT                   0x0D0
+#define GEN7_SURFACEFORMAT_B10G10R10A2_UNORM              0x0D1
+#define GEN7_SURFACEFORMAT_B10G10R10A2_UNORM_SRGB         0x0D2
+#define GEN7_SURFACEFORMAT_R11G11B10_FLOAT                0x0D3
+#define GEN7_SURFACEFORMAT_R32_SINT                       0x0D6
+#define GEN7_SURFACEFORMAT_R32_UINT                       0x0D7
+#define GEN7_SURFACEFORMAT_R32_FLOAT                      0x0D8
+#define GEN7_SURFACEFORMAT_R24_UNORM_X8_TYPELESS          0x0D9
+#define GEN7_SURFACEFORMAT_X24_TYPELESS_G8_UINT           0x0DA
+#define GEN7_SURFACEFORMAT_L16A16_UNORM                   0x0DF
+#define GEN7_SURFACEFORMAT_I24X8_UNORM                    0x0E0
+#define GEN7_SURFACEFORMAT_L24X8_UNORM                    0x0E1
+#define GEN7_SURFACEFORMAT_A24X8_UNORM                    0x0E2
+#define GEN7_SURFACEFORMAT_I32_FLOAT                      0x0E3
+#define GEN7_SURFACEFORMAT_L32_FLOAT                      0x0E4
+#define GEN7_SURFACEFORMAT_A32_FLOAT                      0x0E5
+#define GEN7_SURFACEFORMAT_B8G8R8X8_UNORM                 0x0E9
+#define GEN7_SURFACEFORMAT_B8G8R8X8_UNORM_SRGB            0x0EA
+#define GEN7_SURFACEFORMAT_R8G8B8X8_UNORM                 0x0EB
+#define GEN7_SURFACEFORMAT_R8G8B8X8_UNORM_SRGB            0x0EC
+#define GEN7_SURFACEFORMAT_R9G9B9E5_SHAREDEXP             0x0ED
+#define GEN7_SURFACEFORMAT_B10G10R10X2_UNORM              0x0EE
+#define GEN7_SURFACEFORMAT_L16A16_FLOAT                   0x0F0
+#define GEN7_SURFACEFORMAT_R32_UNORM                      0x0F1
+#define GEN7_SURFACEFORMAT_R32_SNORM                      0x0F2
+#define GEN7_SURFACEFORMAT_R10G10B10X2_USCALED            0x0F3
+#define GEN7_SURFACEFORMAT_R8G8B8A8_SSCALED               0x0F4
+#define GEN7_SURFACEFORMAT_R8G8B8A8_USCALED               0x0F5
+#define GEN7_SURFACEFORMAT_R16G16_SSCALED                 0x0F6
+#define GEN7_SURFACEFORMAT_R16G16_USCALED                 0x0F7
+#define GEN7_SURFACEFORMAT_R32_SSCALED                    0x0F8
+#define GEN7_SURFACEFORMAT_R32_USCALED                    0x0F9
+#define GEN7_SURFACEFORMAT_B5G6R5_UNORM                   0x100
+#define GEN7_SURFACEFORMAT_B5G6R5_UNORM_SRGB              0x101
+#define GEN7_SURFACEFORMAT_B5G5R5A1_UNORM                 0x102
+#define GEN7_SURFACEFORMAT_B5G5R5A1_UNORM_SRGB            0x103
+#define GEN7_SURFACEFORMAT_B4G4R4A4_UNORM                 0x104
+#define GEN7_SURFACEFORMAT_B4G4R4A4_UNORM_SRGB            0x105
+#define GEN7_SURFACEFORMAT_R8G8_UNORM                     0x106
+#define GEN7_SURFACEFORMAT_R8G8_SNORM                     0x107
+#define GEN7_SURFACEFORMAT_R8G8_SINT                      0x108
+#define GEN7_SURFACEFORMAT_R8G8_UINT                      0x109
+#define GEN7_SURFACEFORMAT_R16_UNORM                      0x10A
+#define GEN7_SURFACEFORMAT_R16_SNORM                      0x10B
+#define GEN7_SURFACEFORMAT_R16_SINT                       0x10C
+#define GEN7_SURFACEFORMAT_R16_UINT                       0x10D
+#define GEN7_SURFACEFORMAT_R16_FLOAT                      0x10E
+#define GEN7_SURFACEFORMAT_I16_UNORM                      0x111
+#define GEN7_SURFACEFORMAT_L16_UNORM                      0x112
+#define GEN7_SURFACEFORMAT_A16_UNORM                      0x113
+#define GEN7_SURFACEFORMAT_L8A8_UNORM                     0x114
+#define GEN7_SURFACEFORMAT_I16_FLOAT                      0x115
+#define GEN7_SURFACEFORMAT_L16_FLOAT                      0x116
+#define GEN7_SURFACEFORMAT_A16_FLOAT                      0x117
+#define GEN7_SURFACEFORMAT_R5G5_SNORM_B6_UNORM            0x119
+#define GEN7_SURFACEFORMAT_B5G5R5X1_UNORM                 0x11A
+#define GEN7_SURFACEFORMAT_B5G5R5X1_UNORM_SRGB            0x11B
+#define GEN7_SURFACEFORMAT_R8G8_SSCALED                   0x11C
+#define GEN7_SURFACEFORMAT_R8G8_USCALED                   0x11D
+#define GEN7_SURFACEFORMAT_R16_SSCALED                    0x11E
+#define GEN7_SURFACEFORMAT_R16_USCALED                    0x11F
+#define GEN7_SURFACEFORMAT_R8_UNORM                       0x140
+#define GEN7_SURFACEFORMAT_R8_SNORM                       0x141
+#define GEN7_SURFACEFORMAT_R8_SINT                        0x142
+#define GEN7_SURFACEFORMAT_R8_UINT                        0x143
+#define GEN7_SURFACEFORMAT_A8_UNORM                       0x144
+#define GEN7_SURFACEFORMAT_I8_UNORM                       0x145
+#define GEN7_SURFACEFORMAT_L8_UNORM                       0x146
+#define GEN7_SURFACEFORMAT_P4A4_UNORM                     0x147
+#define GEN7_SURFACEFORMAT_A4P4_UNORM                     0x148
+#define GEN7_SURFACEFORMAT_R8_SSCALED                     0x149
+#define GEN7_SURFACEFORMAT_R8_USCALED                     0x14A
+#define GEN7_SURFACEFORMAT_R1_UINT                        0x181
+#define GEN7_SURFACEFORMAT_YCRCB_NORMAL                   0x182
+#define GEN7_SURFACEFORMAT_YCRCB_SWAPUVY                  0x183
+#define GEN7_SURFACEFORMAT_BC1_UNORM                      0x186
+#define GEN7_SURFACEFORMAT_BC2_UNORM                      0x187
+#define GEN7_SURFACEFORMAT_BC3_UNORM                      0x188
+#define GEN7_SURFACEFORMAT_BC4_UNORM                      0x189
+#define GEN7_SURFACEFORMAT_BC5_UNORM                      0x18A
+#define GEN7_SURFACEFORMAT_BC1_UNORM_SRGB                 0x18B
+#define GEN7_SURFACEFORMAT_BC2_UNORM_SRGB                 0x18C
+#define GEN7_SURFACEFORMAT_BC3_UNORM_SRGB                 0x18D
+#define GEN7_SURFACEFORMAT_MONO8                          0x18E
+#define GEN7_SURFACEFORMAT_YCRCB_SWAPUV                   0x18F
+#define GEN7_SURFACEFORMAT_YCRCB_SWAPY                    0x190
+#define GEN7_SURFACEFORMAT_DXT1_RGB                       0x191
+#define GEN7_SURFACEFORMAT_FXT1                           0x192
+#define GEN7_SURFACEFORMAT_R8G8B8_UNORM                   0x193
+#define GEN7_SURFACEFORMAT_R8G8B8_SNORM                   0x194
+#define GEN7_SURFACEFORMAT_R8G8B8_SSCALED                 0x195
+#define GEN7_SURFACEFORMAT_R8G8B8_USCALED                 0x196
+#define GEN7_SURFACEFORMAT_R64G64B64A64_FLOAT             0x197
+#define GEN7_SURFACEFORMAT_R64G64B64_FLOAT                0x198
+#define GEN7_SURFACEFORMAT_BC4_SNORM                      0x199
+#define GEN7_SURFACEFORMAT_BC5_SNORM                      0x19A
+#define GEN7_SURFACEFORMAT_R16G16B16_UNORM                0x19C
+#define GEN7_SURFACEFORMAT_R16G16B16_SNORM                0x19D
+#define GEN7_SURFACEFORMAT_R16G16B16_SSCALED              0x19E
+#define GEN7_SURFACEFORMAT_R16G16B16_USCALED              0x19F
+
+#define GEN7_SURFACERETURNFORMAT_FLOAT32  0
+#define GEN7_SURFACERETURNFORMAT_S1       1
+
+#define GEN7_SURFACE_1D      0
+#define GEN7_SURFACE_2D      1
+#define GEN7_SURFACE_3D      2
+#define GEN7_SURFACE_CUBE    3
+#define GEN7_SURFACE_BUFFER  4
+#define GEN7_SURFACE_NULL    7
+
+#define GEN7_BORDER_COLOR_MODE_DEFAULT	0
+#define GEN7_BORDER_COLOR_MODE_LEGACY	1
+
+#define GEN7_TEXCOORDMODE_WRAP            0
+#define GEN7_TEXCOORDMODE_MIRROR          1
+#define GEN7_TEXCOORDMODE_CLAMP           2
+#define GEN7_TEXCOORDMODE_CUBE            3
+#define GEN7_TEXCOORDMODE_CLAMP_BORDER    4
+#define GEN7_TEXCOORDMODE_MIRROR_ONCE     5
+
+#define GEN7_THREAD_PRIORITY_NORMAL   0
+#define GEN7_THREAD_PRIORITY_HIGH     1
+
+#define GEN7_TILEWALK_XMAJOR                 0
+#define GEN7_TILEWALK_YMAJOR                 1
+
+#define GEN7_VERTEX_SUBPIXEL_PRECISION_8BITS  0
+#define GEN7_VERTEX_SUBPIXEL_PRECISION_4BITS  1
+
+#define GEN7_VERTEXBUFFER_ACCESS_VERTEXDATA     0
+#define GEN7_VERTEXBUFFER_ACCESS_INSTANCEDATA   1
+
+#define GEN7_VFCOMPONENT_NOSTORE      0
+#define GEN7_VFCOMPONENT_STORE_SRC    1
+#define GEN7_VFCOMPONENT_STORE_0      2
+#define GEN7_VFCOMPONENT_STORE_1_FLT  3
+#define GEN7_VFCOMPONENT_STORE_1_INT  4
+#define GEN7_VFCOMPONENT_STORE_VID    5
+#define GEN7_VFCOMPONENT_STORE_IID    6
+#define GEN7_VFCOMPONENT_STORE_PID    7
+
+
+/* Execution Unit (EU) defines
+ */
+
+#define GEN7_ALIGN_1   0
+#define GEN7_ALIGN_16  1
+
+#define GEN7_ADDRESS_DIRECT                        0
+#define GEN7_ADDRESS_REGISTER_INDIRECT_REGISTER    1
+
+#define GEN7_CHANNEL_X     0
+#define GEN7_CHANNEL_Y     1
+#define GEN7_CHANNEL_Z     2
+#define GEN7_CHANNEL_W     3
+
+#define GEN7_COMPRESSION_NONE          0
+#define GEN7_COMPRESSION_2NDHALF       1
+#define GEN7_COMPRESSION_COMPRESSED    2
+
+#define GEN7_CONDITIONAL_NONE  0
+#define GEN7_CONDITIONAL_Z     1
+#define GEN7_CONDITIONAL_NZ    2
+#define GEN7_CONDITIONAL_EQ    1	/* Z */
+#define GEN7_CONDITIONAL_NEQ   2	/* NZ */
+#define GEN7_CONDITIONAL_G     3
+#define GEN7_CONDITIONAL_GE    4
+#define GEN7_CONDITIONAL_L     5
+#define GEN7_CONDITIONAL_LE    6
+#define GEN7_CONDITIONAL_C     7
+#define GEN7_CONDITIONAL_O     8
+
+#define GEN7_DEBUG_NONE        0
+#define GEN7_DEBUG_BREAKPOINT  1
+
+#define GEN7_DEPENDENCY_NORMAL         0
+#define GEN7_DEPENDENCY_NOTCLEARED     1
+#define GEN7_DEPENDENCY_NOTCHECKED     2
+#define GEN7_DEPENDENCY_DISABLE        3
+
+#define GEN7_EXECUTE_1     0
+#define GEN7_EXECUTE_2     1
+#define GEN7_EXECUTE_4     2
+#define GEN7_EXECUTE_8     3
+#define GEN7_EXECUTE_16    4
+#define GEN7_EXECUTE_32    5
+
+#define GEN7_HORIZONTAL_STRIDE_0   0
+#define GEN7_HORIZONTAL_STRIDE_1   1
+#define GEN7_HORIZONTAL_STRIDE_2   2
+#define GEN7_HORIZONTAL_STRIDE_4   3
+
+#define GEN7_INSTRUCTION_NORMAL    0
+#define GEN7_INSTRUCTION_SATURATE  1
+
+#define INTEL_MASK_ENABLE   0
+#define INTEL_MASK_DISABLE  1
+
+#define GEN7_OPCODE_MOV        1
+#define GEN7_OPCODE_SEL        2
+#define GEN7_OPCODE_NOT        4
+#define GEN7_OPCODE_AND        5
+#define GEN7_OPCODE_OR         6
+#define GEN7_OPCODE_XOR        7
+#define GEN7_OPCODE_SHR        8
+#define GEN7_OPCODE_SHL        9
+#define GEN7_OPCODE_RSR        10
+#define GEN7_OPCODE_RSL        11
+#define GEN7_OPCODE_ASR        12
+#define GEN7_OPCODE_CMP        16
+#define GEN7_OPCODE_JMPI       32
+#define GEN7_OPCODE_IF         34
+#define GEN7_OPCODE_IFF        35
+#define GEN7_OPCODE_ELSE       36
+#define GEN7_OPCODE_ENDIF      37
+#define GEN7_OPCODE_DO         38
+#define GEN7_OPCODE_WHILE      39
+#define GEN7_OPCODE_BREAK      40
+#define GEN7_OPCODE_CONTINUE   41
+#define GEN7_OPCODE_HALT       42
+#define GEN7_OPCODE_MSAVE      44
+#define GEN7_OPCODE_MRESTORE   45
+#define GEN7_OPCODE_PUSH       46
+#define GEN7_OPCODE_POP        47
+#define GEN7_OPCODE_WAIT       48
+#define GEN7_OPCODE_SEND       49
+#define GEN7_OPCODE_ADD        64
+#define GEN7_OPCODE_MUL        65
+#define GEN7_OPCODE_AVG        66
+#define GEN7_OPCODE_FRC        67
+#define GEN7_OPCODE_RNDU       68
+#define GEN7_OPCODE_RNDD       69
+#define GEN7_OPCODE_RNDE       70
+#define GEN7_OPCODE_RNDZ       71
+#define GEN7_OPCODE_MAC        72
+#define GEN7_OPCODE_MACH       73
+#define GEN7_OPCODE_LZD        74
+#define GEN7_OPCODE_SAD2       80
+#define GEN7_OPCODE_SADA2      81
+#define GEN7_OPCODE_DP4        84
+#define GEN7_OPCODE_DPH        85
+#define GEN7_OPCODE_DP3        86
+#define GEN7_OPCODE_DP2        87
+#define GEN7_OPCODE_DPA2       88
+#define GEN7_OPCODE_LINE       89
+#define GEN7_OPCODE_NOP        126
+
+#define GEN7_PREDICATE_NONE             0
+#define GEN7_PREDICATE_NORMAL           1
+#define GEN7_PREDICATE_ALIGN1_ANYV             2
+#define GEN7_PREDICATE_ALIGN1_ALLV             3
+#define GEN7_PREDICATE_ALIGN1_ANY2H            4
+#define GEN7_PREDICATE_ALIGN1_ALL2H            5
+#define GEN7_PREDICATE_ALIGN1_ANY4H            6
+#define GEN7_PREDICATE_ALIGN1_ALL4H            7
+#define GEN7_PREDICATE_ALIGN1_ANY8H            8
+#define GEN7_PREDICATE_ALIGN1_ALL8H            9
+#define GEN7_PREDICATE_ALIGN1_ANY16H           10
+#define GEN7_PREDICATE_ALIGN1_ALL16H           11
+#define GEN7_PREDICATE_ALIGN16_REPLICATE_X     2
+#define GEN7_PREDICATE_ALIGN16_REPLICATE_Y     3
+#define GEN7_PREDICATE_ALIGN16_REPLICATE_Z     4
+#define GEN7_PREDICATE_ALIGN16_REPLICATE_W     5
+#define GEN7_PREDICATE_ALIGN16_ANY4H           6
+#define GEN7_PREDICATE_ALIGN16_ALL4H           7
+
+#define GEN7_ARCHITECTURE_REGISTER_FILE    0
+#define GEN7_GENERAL_REGISTER_FILE         1
+#define GEN7_MESSAGE_REGISTER_FILE         2
+#define GEN7_IMMEDIATE_VALUE               3
+
+#define GEN7_REGISTER_TYPE_UD  0
+#define GEN7_REGISTER_TYPE_D   1
+#define GEN7_REGISTER_TYPE_UW  2
+#define GEN7_REGISTER_TYPE_W   3
+#define GEN7_REGISTER_TYPE_UB  4
+#define GEN7_REGISTER_TYPE_B   5
+#define GEN7_REGISTER_TYPE_VF  5	/* packed float vector, immediates only? */
+#define GEN7_REGISTER_TYPE_HF  6
+#define GEN7_REGISTER_TYPE_V   6	/* packed int vector, immediates only, uword dest only */
+#define GEN7_REGISTER_TYPE_F   7
+
+#define GEN7_ARF_NULL                  0x00
+#define GEN7_ARF_ADDRESS               0x10
+#define GEN7_ARF_ACCUMULATOR           0x20
+#define GEN7_ARF_FLAG                  0x30
+#define GEN7_ARF_MASK                  0x40
+#define GEN7_ARF_MASK_STACK            0x50
+#define GEN7_ARF_MASK_STACK_DEPTH      0x60
+#define GEN7_ARF_STATE                 0x70
+#define GEN7_ARF_CONTROL               0x80
+#define GEN7_ARF_NOTIFICATION_COUNT    0x90
+#define GEN7_ARF_IP                    0xA0
+
+#define GEN7_AMASK   0
+#define GEN7_IMASK   1
+#define GEN7_LMASK   2
+#define GEN7_CMASK   3
+
+#define GEN7_THREAD_NORMAL     0
+#define GEN7_THREAD_ATOMIC     1
+#define GEN7_THREAD_SWITCH     2
+
+#define GEN7_VERTICAL_STRIDE_0                 0
+#define GEN7_VERTICAL_STRIDE_1                 1
+#define GEN7_VERTICAL_STRIDE_2                 2
+#define GEN7_VERTICAL_STRIDE_4                 3
+#define GEN7_VERTICAL_STRIDE_8                 4
+#define GEN7_VERTICAL_STRIDE_16                5
+#define GEN7_VERTICAL_STRIDE_32                6
+#define GEN7_VERTICAL_STRIDE_64                7
+#define GEN7_VERTICAL_STRIDE_128               8
+#define GEN7_VERTICAL_STRIDE_256               9
+#define GEN7_VERTICAL_STRIDE_ONE_DIMENSIONAL   0xF
+
+#define GEN7_WIDTH_1       0
+#define GEN7_WIDTH_2       1
+#define GEN7_WIDTH_4       2
+#define GEN7_WIDTH_8       3
+#define GEN7_WIDTH_16      4
+
+#define GEN7_STATELESS_BUFFER_BOUNDARY_1K      0
+#define GEN7_STATELESS_BUFFER_BOUNDARY_2K      1
+#define GEN7_STATELESS_BUFFER_BOUNDARY_4K      2
+#define GEN7_STATELESS_BUFFER_BOUNDARY_8K      3
+#define GEN7_STATELESS_BUFFER_BOUNDARY_16K     4
+#define GEN7_STATELESS_BUFFER_BOUNDARY_32K     5
+#define GEN7_STATELESS_BUFFER_BOUNDARY_64K     6
+#define GEN7_STATELESS_BUFFER_BOUNDARY_128K    7
+#define GEN7_STATELESS_BUFFER_BOUNDARY_256K    8
+#define GEN7_STATELESS_BUFFER_BOUNDARY_512K    9
+#define GEN7_STATELESS_BUFFER_BOUNDARY_1M      10
+#define GEN7_STATELESS_BUFFER_BOUNDARY_2M      11
+
+#define GEN7_POLYGON_FACING_FRONT      0
+#define GEN7_POLYGON_FACING_BACK       1
+
+#define GEN7_MESSAGE_TARGET_NULL               0
+#define GEN7_MESSAGE_TARGET_MATH               1
+#define GEN7_MESSAGE_TARGET_SAMPLER            2
+#define GEN7_MESSAGE_TARGET_GATEWAY            3
+#define GEN7_MESSAGE_TARGET_DATAPORT_READ      4
+#define GEN7_MESSAGE_TARGET_DATAPORT_WRITE     5
+#define GEN7_MESSAGE_TARGET_URB                6
+#define GEN7_MESSAGE_TARGET_THREAD_SPAWNER     7
+
+#define GEN7_SAMPLER_RETURN_FORMAT_FLOAT32     0
+#define GEN7_SAMPLER_RETURN_FORMAT_UINT32      2
+#define GEN7_SAMPLER_RETURN_FORMAT_SINT32      3
+
+#define GEN7_SAMPLER_MESSAGE_SIMD8_SAMPLE              0
+#define GEN7_SAMPLER_MESSAGE_SIMD16_SAMPLE             0
+#define GEN7_SAMPLER_MESSAGE_SIMD16_SAMPLE_BIAS        0
+#define GEN7_SAMPLER_MESSAGE_SIMD8_KILLPIX             1
+#define GEN7_SAMPLER_MESSAGE_SIMD4X2_SAMPLE_LOD        1
+#define GEN7_SAMPLER_MESSAGE_SIMD16_SAMPLE_LOD         1
+#define GEN7_SAMPLER_MESSAGE_SIMD4X2_SAMPLE_GRADIENTS  2
+#define GEN7_SAMPLER_MESSAGE_SIMD8_SAMPLE_GRADIENTS    2
+#define GEN7_SAMPLER_MESSAGE_SIMD4X2_SAMPLE_COMPARE    0
+#define GEN7_SAMPLER_MESSAGE_SIMD16_SAMPLE_COMPARE     2
+#define GEN7_SAMPLER_MESSAGE_SIMD4X2_RESINFO           2
+#define GEN7_SAMPLER_MESSAGE_SIMD8_RESINFO             2
+#define GEN7_SAMPLER_MESSAGE_SIMD16_RESINFO            2
+#define GEN7_SAMPLER_MESSAGE_SIMD4X2_LD                3
+#define GEN7_SAMPLER_MESSAGE_SIMD8_LD                  3
+#define GEN7_SAMPLER_MESSAGE_SIMD16_LD                 3
+
+#define GEN7_DATAPORT_OWORD_BLOCK_1_OWORDLOW   0
+#define GEN7_DATAPORT_OWORD_BLOCK_1_OWORDHIGH  1
+#define GEN7_DATAPORT_OWORD_BLOCK_2_OWORDS     2
+#define GEN7_DATAPORT_OWORD_BLOCK_4_OWORDS     3
+#define GEN7_DATAPORT_OWORD_BLOCK_8_OWORDS     4
+
+#define GEN7_DATAPORT_OWORD_DUAL_BLOCK_1OWORD     0
+#define GEN7_DATAPORT_OWORD_DUAL_BLOCK_4OWORDS    2
+
+#define GEN7_DATAPORT_DWORD_SCATTERED_BLOCK_8DWORDS   2
+#define GEN7_DATAPORT_DWORD_SCATTERED_BLOCK_16DWORDS  3
+
+#define GEN7_DATAPORT_READ_MESSAGE_OWORD_BLOCK_READ          0
+#define GEN7_DATAPORT_READ_MESSAGE_OWORD_DUAL_BLOCK_READ     1
+#define GEN7_DATAPORT_READ_MESSAGE_DWORD_BLOCK_READ          2
+#define GEN7_DATAPORT_READ_MESSAGE_DWORD_SCATTERED_READ      3
+
+#define GEN7_DATAPORT_READ_TARGET_DATA_CACHE      0
+#define GEN7_DATAPORT_READ_TARGET_RENDER_CACHE    1
+#define GEN7_DATAPORT_READ_TARGET_SAMPLER_CACHE   2
+
+#define GEN7_DATAPORT_RENDER_TARGET_WRITE_SIMD16_SINGLE_SOURCE                0
+#define GEN7_DATAPORT_RENDER_TARGET_WRITE_SIMD16_SINGLE_SOURCE_REPLICATED     1
+#define GEN7_DATAPORT_RENDER_TARGET_WRITE_SIMD8_DUAL_SOURCE_SUBSPAN01         2
+#define GEN7_DATAPORT_RENDER_TARGET_WRITE_SIMD8_DUAL_SOURCE_SUBSPAN23         3
+#define GEN7_DATAPORT_RENDER_TARGET_WRITE_SIMD8_SINGLE_SOURCE_SUBSPAN01       4
+
+#define GEN7_DATAPORT_WRITE_MESSAGE_OWORD_BLOCK_WRITE                0
+#define GEN7_DATAPORT_WRITE_MESSAGE_OWORD_DUAL_BLOCK_WRITE           1
+#define GEN7_DATAPORT_WRITE_MESSAGE_DWORD_BLOCK_WRITE                2
+#define GEN7_DATAPORT_WRITE_MESSAGE_DWORD_SCATTERED_WRITE            3
+#define GEN7_DATAPORT_WRITE_MESSAGE_RENDER_TARGET_WRITE              4
+#define GEN7_DATAPORT_WRITE_MESSAGE_STREAMED_VERTEX_BUFFER_WRITE     5
+#define GEN7_DATAPORT_WRITE_MESSAGE_FLUSH_RENDER_CACHE               7
+
+#define GEN7_MATH_FUNCTION_INV                              1
+#define GEN7_MATH_FUNCTION_LOG                              2
+#define GEN7_MATH_FUNCTION_EXP                              3
+#define GEN7_MATH_FUNCTION_SQRT                             4
+#define GEN7_MATH_FUNCTION_RSQ                              5
+#define GEN7_MATH_FUNCTION_SIN                              6 /* was 7 */
+#define GEN7_MATH_FUNCTION_COS                              7 /* was 8 */
+#define GEN7_MATH_FUNCTION_SINCOS                           8 /* was 6 */
+#define GEN7_MATH_FUNCTION_TAN                              9
+#define GEN7_MATH_FUNCTION_POW                              10
+#define GEN7_MATH_FUNCTION_INT_DIV_QUOTIENT_AND_REMAINDER   11
+#define GEN7_MATH_FUNCTION_INT_DIV_QUOTIENT                 12
+#define GEN7_MATH_FUNCTION_INT_DIV_REMAINDER                13
+
+#define GEN7_MATH_INTEGER_UNSIGNED     0
+#define GEN7_MATH_INTEGER_SIGNED       1
+
+#define GEN7_MATH_PRECISION_FULL        0
+#define GEN7_MATH_PRECISION_PARTIAL     1
+
+#define GEN7_MATH_SATURATE_NONE         0
+#define GEN7_MATH_SATURATE_SATURATE     1
+
+#define GEN7_MATH_DATA_VECTOR  0
+#define GEN7_MATH_DATA_SCALAR  1
+
+#define GEN7_URB_OPCODE_WRITE  0
+
+#define GEN7_URB_SWIZZLE_NONE          0
+#define GEN7_URB_SWIZZLE_INTERLEAVE    1
+#define GEN7_URB_SWIZZLE_TRANSPOSE     2
+
+#define GEN7_SCRATCH_SPACE_SIZE_1K     0
+#define GEN7_SCRATCH_SPACE_SIZE_2K     1
+#define GEN7_SCRATCH_SPACE_SIZE_4K     2
+#define GEN7_SCRATCH_SPACE_SIZE_8K     3
+#define GEN7_SCRATCH_SPACE_SIZE_16K    4
+#define GEN7_SCRATCH_SPACE_SIZE_32K    5
+#define GEN7_SCRATCH_SPACE_SIZE_64K    6
+#define GEN7_SCRATCH_SPACE_SIZE_128K   7
+#define GEN7_SCRATCH_SPACE_SIZE_256K   8
+#define GEN7_SCRATCH_SPACE_SIZE_512K   9
+#define GEN7_SCRATCH_SPACE_SIZE_1M     10
+#define GEN7_SCRATCH_SPACE_SIZE_2M     11
+
+/* The hardware supports two different modes for border color. The
+ * default (OpenGL) mode uses floating-point color channels, while the
+ * legacy mode uses 4 bytes.
+ *
+ * More significantly, the legacy mode respects the components of the
+ * border color for channels not present in the source, (whereas the
+ * default mode will ignore the border color's alpha channel and use
+ * alpha==1 for an RGB source, for example).
+ *
+ * The legacy mode matches the semantics specified by the Render
+ * extension.
+ */
+struct gen7_sampler_default_border_color {
+   float color[4];
+};
+
+struct gen7_sampler_legacy_border_color {
+   uint8_t color[4];
+};
+
+struct gen7_blend_state {
+	struct {
+		uint32_t dest_blend_factor:5;
+		uint32_t source_blend_factor:5;
+		uint32_t pad3:1;
+		uint32_t blend_func:3;
+		uint32_t pad2:1;
+		uint32_t ia_dest_blend_factor:5;
+		uint32_t ia_source_blend_factor:5;
 		uint32_t pad1:1;
-		uint32_t surface_array_spacing:1;
-		uint32_t vert_line_stride_ofs:1;
-		uint32_t vert_line_stride:1;
-		uint32_t tile_walk:1;
-		uint32_t tiled_surface:1;
-		uint32_t horizontal_alignment:1;
-		uint32_t vertical_alignment:2;
-		uint32_t surface_format:9;     /**< BRW_SURFACEFORMAT_x */
+		uint32_t ia_blend_func:3;
 		uint32_t pad0:1;
-		uint32_t is_array:1;
-		uint32_t surface_type:3;       /**< BRW_SURFACE_1D/2D/3D/CUBE */
+		uint32_t ia_blend_enable:1;
+		uint32_t blend_enable:1;
+	} blend0;
+
+	struct {
+		uint32_t post_blend_clamp_enable:1;
+		uint32_t pre_blend_clamp_enable:1;
+		uint32_t clamp_range:2;
+		uint32_t pad0:4;
+		uint32_t x_dither_offset:2;
+		uint32_t y_dither_offset:2;
+		uint32_t dither_enable:1;
+		uint32_t alpha_test_func:3;
+		uint32_t alpha_test_enable:1;
+		uint32_t pad1:1;
+		uint32_t logic_op_func:4;
+		uint32_t logic_op_enable:1;
+		uint32_t pad2:1;
+		uint32_t write_disable_b:1;
+		uint32_t write_disable_g:1;
+		uint32_t write_disable_r:1;
+		uint32_t write_disable_a:1;
+		uint32_t pad3:1;
+		uint32_t alpha_to_coverage_dither:1;
+		uint32_t alpha_to_one:1;
+		uint32_t alpha_to_coverage:1;
+	} blend1;
+};
+
+struct gen7_color_calc_state {
+	struct {
+		uint32_t alpha_test_format:1;
+		uint32_t pad0:14;
+		uint32_t round_disable:1;
+		uint32_t bf_stencil_ref:8;
+		uint32_t stencil_ref:8;
+	} cc0;
+
+	union {
+		float alpha_ref_f;
+		struct {
+			uint32_t ui:8;
+			uint32_t pad0:24;
+		} alpha_ref_fi;
+	} cc1;
+
+	float constant_r;
+	float constant_g;
+	float constant_b;
+	float constant_a;
+};
+
+struct gen7_depth_stencil_state {
+	struct {
+		uint32_t pad0:3;
+		uint32_t bf_stencil_pass_depth_pass_op:3;
+		uint32_t bf_stencil_pass_depth_fail_op:3;
+		uint32_t bf_stencil_fail_op:3;
+		uint32_t bf_stencil_func:3;
+		uint32_t bf_stencil_enable:1;
+		uint32_t pad1:2;
+		uint32_t stencil_write_enable:1;
+		uint32_t stencil_pass_depth_pass_op:3;
+		uint32_t stencil_pass_depth_fail_op:3;
+		uint32_t stencil_fail_op:3;
+		uint32_t stencil_func:3;
+		uint32_t stencil_enable:1;
+	} ds0;
+
+	struct {
+		uint32_t bf_stencil_write_mask:8;
+		uint32_t bf_stencil_test_mask:8;
+		uint32_t stencil_write_mask:8;
+		uint32_t stencil_test_mask:8;
+	} ds1;
+
+	struct {
+		uint32_t pad0:26;
+		uint32_t depth_write_enable:1;
+		uint32_t depth_test_func:3;
+		uint32_t pad1:1;
+		uint32_t depth_test_enable:1;
+	} ds2;
+};
+
+struct gen7_surface_state {
+	struct {
+		unsigned int cube_pos_z:1;
+		unsigned int cube_neg_z:1;
+		unsigned int cube_pos_y:1;
+		unsigned int cube_neg_y:1;
+		unsigned int cube_pos_x:1;
+		unsigned int cube_neg_x:1;
+		unsigned int pad2:2;
+		unsigned int render_cache_read_write:1;
+		unsigned int pad1:1;
+		unsigned int surface_array_spacing:1;
+		unsigned int vert_line_stride_ofs:1;
+		unsigned int vert_line_stride:1;
+		unsigned int tile_walk:1;
+		unsigned int tiled_surface:1;
+		unsigned int horizontal_alignment:1;
+		unsigned int vertical_alignment:2;
+		unsigned int surface_format:9;     /**< BRW_SURFACEFORMAT_x */
+		unsigned int pad0:1;
+		unsigned int is_array:1;
+		unsigned int surface_type:3;       /**< BRW_SURFACE_1D/2D/3D/CUBE */
 	} ss0;
 
 	struct {
-		uint32_t base_addr;
+		unsigned int base_addr;
 	} ss1;
 
 	struct {
-		uint32_t width:14;
-		uint32_t pad1:2;
-		uint32_t height:14;
-		uint32_t pad0:2;
+		unsigned int width:14;
+		unsigned int pad1:2;
+		unsigned int height:14;
+		unsigned int pad0:2;
 	} ss2;
 
 	struct {
-		uint32_t pitch:18;
-		uint32_t pad:3;
-		uint32_t depth:11;
+		unsigned int pitch:18;
+		unsigned int pad:3;
+		unsigned int depth:11;
 	} ss3;
 
 	struct {
-		uint32_t multisample_position_palette_index:3;
-		uint32_t num_multisamples:3;
-		uint32_t multisampled_surface_storage_format:1;
-		uint32_t render_target_view_extent:11;
-		uint32_t min_array_elt:11;
-		uint32_t rotation:2;
-		uint32_t pad0:1;
+		unsigned int multisample_position_palette_index:3;
+		unsigned int num_multisamples:3;
+		unsigned int multisampled_surface_storage_format:1;
+		unsigned int render_target_view_extent:11;
+		unsigned int min_array_elt:11;
+		unsigned int rotation:2;
+		unsigned int pad0:1;
 	} ss4;
 
 	struct {
-		uint32_t mip_count:4;
-		uint32_t min_lod:4;
-		uint32_t pad1:12;
-		uint32_t y_offset:4;
-		uint32_t pad0:1;
-		uint32_t x_offset:7;
+		unsigned int mip_count:4;
+		unsigned int min_lod:4;
+		unsigned int pad1:12;
+		unsigned int y_offset:4;
+		unsigned int pad0:1;
+		unsigned int x_offset:7;
 	} ss5;
 
 	struct {
-		uint32_t pad; /* Multisample Control Surface stuff */
+		unsigned int pad; /* Multisample Control Surface stuff */
 	} ss6;
 
 	struct {
-		uint32_t resource_min_lod:12;
-
-		/* Only on Haswell */
-		uint32_t pad0:4;
-		uint32_t shader_chanel_select_a:3;
-		uint32_t shader_chanel_select_b:3;
-		uint32_t shader_chanel_select_g:3;
-		uint32_t shader_chanel_select_r:3;
-
-		uint32_t alpha_clear_color:1;
-		uint32_t blue_clear_color:1;
-		uint32_t green_clear_color:1;
-		uint32_t red_clear_color:1;
+		unsigned int resource_min_lod:12;
+		unsigned int pad0:16;
+		unsigned int alpha_clear_color:1;
+		unsigned int blue_clear_color:1;
+		unsigned int green_clear_color:1;
+		unsigned int red_clear_color:1;
 	} ss7;
 };
 
-struct gen7_sampler_state
-{
-	struct
-	{
-		uint32_t aniso_algorithm:1;
-		uint32_t lod_bias:13;
-		uint32_t min_filter:3;
-		uint32_t mag_filter:3;
-		uint32_t mip_filter:2;
-		uint32_t base_level:5;
-		uint32_t pad1:1;
-		uint32_t lod_preclamp:1;
-		uint32_t default_color_mode:1;
-		uint32_t pad0:1;
-		uint32_t disable:1;
+struct gen7_sampler_state {
+	struct {
+		unsigned int aniso_algorithm:1;
+		unsigned int lod_bias:13;
+		unsigned int min_filter:3;
+		unsigned int mag_filter:3;
+		unsigned int mip_filter:2;
+		unsigned int base_level:5;
+		unsigned int pad1:1;
+		unsigned int lod_preclamp:1;
+		unsigned int default_color_mode:1;
+		unsigned int pad0:1;
+		unsigned int disable:1;
 	} ss0;
 
-	struct
-	{
-		uint32_t cube_control_mode:1;
-		uint32_t shadow_function:3;
-		uint32_t pad:4;
-		uint32_t max_lod:12;
-		uint32_t min_lod:12;
+	struct {
+		unsigned int cube_control_mode:1;
+		unsigned int shadow_function:3;
+		unsigned int pad:4;
+		unsigned int max_lod:12;
+		unsigned int min_lod:12;
 	} ss1;
 
-	struct
-	{
-		uint32_t pad:5;
-		uint32_t default_color_pointer:27;
+	struct {
+		unsigned int pad:5;
+		unsigned int default_color_pointer:27;
 	} ss2;
 
-	struct
-	{
-		uint32_t r_wrap_mode:3;
-		uint32_t t_wrap_mode:3;
-		uint32_t s_wrap_mode:3;
-		uint32_t pad:1;
-		uint32_t non_normalized_coord:1;
-		uint32_t trilinear_quality:2;
-		uint32_t address_round:6;
-		uint32_t max_aniso:3;
-		uint32_t chroma_key_mode:1;
-		uint32_t chroma_key_index:2;
-		uint32_t chroma_key_enable:1;
-		uint32_t pad0:6;
+	struct {
+		unsigned int r_wrap_mode:3;
+		unsigned int t_wrap_mode:3;
+		unsigned int s_wrap_mode:3;
+		unsigned int pad:1;
+		unsigned int non_normalized_coord:1;
+		unsigned int trilinear_quality:2;
+		unsigned int address_round:6;
+		unsigned int max_aniso:3;
+		unsigned int chroma_key_mode:1;
+		unsigned int chroma_key_index:2;
+		unsigned int chroma_key_enable:1;
+		unsigned int pad0:6;
 	} ss3;
 };
 
-struct gen7_sf_clip_viewport {
-	struct {
-		float m00;
-		float m11;
-		float m22;
-		float m30;
-		float m31;
-		float m32;
-	} viewport;
-
-	uint32_t pad0[2];
-
-	struct {
-		float xmin;
-		float xmax;
-		float ymin;
-		float ymax;
-	} guardband;
-
-	float pad1[4];
-};
-
-struct gen6_scissor_rect
-{
-	uint32_t xmin:16;
-	uint32_t ymin:16;
-	uint32_t xmax:16;
-	uint32_t ymax:16;
+/* Surface state DW0 */
+#define GEN7_SURFACE_RC_READ_WRITE	(1 << 8)
+#define GEN7_SURFACE_TILED		(1 << 14)
+#define GEN7_SURFACE_TILED_Y		(1 << 13)
+#define GEN7_SURFACE_FORMAT_SHIFT	18
+#define GEN7_SURFACE_TYPE_SHIFT		29
+
+/* Surface state DW2 */
+#define GEN7_SURFACE_HEIGHT_SHIFT        16
+#define GEN7_SURFACE_WIDTH_SHIFT         0
+
+/* Surface state DW3 */
+#define GEN7_SURFACE_DEPTH_SHIFT         21
+#define GEN7_SURFACE_PITCH_SHIFT         0
+
+#define HSW_SWIZZLE_ZERO		0
+#define HSW_SWIZZLE_ONE			1
+#define HSW_SWIZZLE_RED			4
+#define HSW_SWIZZLE_GREEN		5
+#define HSW_SWIZZLE_BLUE		6
+#define HSW_SWIZZLE_ALPHA		7
+#define __HSW_SURFACE_SWIZZLE(r,g,b,a) \
+	((a) << 16 | (b) << 19 | (g) << 22 | (r) << 25)
+#define HSW_SURFACE_SWIZZLE(r,g,b,a) \
+	__HSW_SURFACE_SWIZZLE(HSW_SWIZZLE_##r, HSW_SWIZZLE_##g, HSW_SWIZZLE_##b, HSW_SWIZZLE_##a)
+
+/* _3DSTATE_VERTEX_BUFFERS on GEN7*/
+/* DW1 */
+#define GEN7_VB0_ADDRESS_MODIFYENABLE   (1 << 14)
+
+/* _3DPRIMITIVE on GEN7 */
+/* DW1 */
+# define GEN7_3DPRIM_VERTEXBUFFER_ACCESS_SEQUENTIAL     (0 << 8)
+# define GEN7_3DPRIM_VERTEXBUFFER_ACCESS_RANDOM         (1 << 8)
+
+#define GEN7_3DSTATE_CLEAR_PARAMS               GEN7_3D(3, 0, 0x04)
+#define GEN7_3DSTATE_DEPTH_BUFFER               GEN7_3D(3, 0, 0x05)
+# define GEN7_3DSTATE_DEPTH_BUFFER_TYPE_SHIFT	29
+# define GEN7_3DSTATE_DEPTH_BUFFER_FORMAT_SHIFT	18
+/* DW1 */
+# define GEN7_3DSTATE_DEPTH_CLEAR_VALID		(1 << 15)
+
+#define GEN7_3DSTATE_CONSTANT_HS                GEN7_3D(3, 0, 0x19)
+#define GEN7_3DSTATE_CONSTANT_DS                GEN7_3D(3, 0, 0x1a)
+
+#define GEN7_3DSTATE_HS                         GEN7_3D(3, 0, 0x1b)
+#define GEN7_3DSTATE_TE                         GEN7_3D(3, 0, 0x1c)
+#define GEN7_3DSTATE_DS                         GEN7_3D(3, 0, 0x1d)
+#define GEN7_3DSTATE_STREAMOUT                  GEN7_3D(3, 0, 0x1e)
+#define GEN7_3DSTATE_SBE                        GEN7_3D(3, 0, 0x1f)
+
+/* DW1 */
+# define GEN7_SBE_SWIZZLE_CONTROL_MODE          (1 << 28)
+# define GEN7_SBE_NUM_OUTPUTS_SHIFT             22
+# define GEN7_SBE_SWIZZLE_ENABLE                (1 << 21)
+# define GEN7_SBE_POINT_SPRITE_LOWERLEFT        (1 << 20)
+# define GEN7_SBE_URB_ENTRY_READ_LENGTH_SHIFT   11
+# define GEN7_SBE_URB_ENTRY_READ_OFFSET_SHIFT   4
+
+#define GEN7_3DSTATE_PS                                 GEN7_3D(3, 0, 0x20)
+/* DW1: kernel pointer */
+/* DW2 */
+# define GEN7_PS_SPF_MODE                               (1 << 31)
+# define GEN7_PS_VECTOR_MASK_ENABLE                     (1 << 30)
+# define GEN7_PS_SAMPLER_COUNT_SHIFT                    27
+# define GEN7_PS_BINDING_TABLE_ENTRY_COUNT_SHIFT        18
+# define GEN7_PS_FLOATING_POINT_MODE_IEEE_754           (0 << 16)
+# define GEN7_PS_FLOATING_POINT_MODE_ALT                (1 << 16)
+/* DW3: scratch space */
+/* DW4 */
+# define IVB_PS_MAX_THREADS_SHIFT                      24
+# define HSW_PS_MAX_THREADS_SHIFT                      23
+# define HSW_PS_SAMPLE_MASK_SHIFT                      12
+# define GEN7_PS_PUSH_CONSTANT_ENABLE                   (1 << 11)
+# define GEN7_PS_ATTRIBUTE_ENABLE                       (1 << 10)
+# define GEN7_PS_OMASK_TO_RENDER_TARGET                 (1 << 9)
+# define GEN7_PS_DUAL_SOURCE_BLEND_ENABLE               (1 << 7)
+# define GEN7_PS_POSOFFSET_NONE                         (0 << 3)
+# define GEN7_PS_POSOFFSET_CENTROID                     (2 << 3)
+# define GEN7_PS_POSOFFSET_SAMPLE                       (3 << 3)
+# define GEN7_PS_32_DISPATCH_ENABLE                     (1 << 2)
+# define GEN7_PS_16_DISPATCH_ENABLE                     (1 << 1)
+# define GEN7_PS_8_DISPATCH_ENABLE                      (1 << 0)
+/* DW5 */
+# define GEN7_PS_DISPATCH_START_GRF_SHIFT_0             16
+# define GEN7_PS_DISPATCH_START_GRF_SHIFT_1             8
+# define GEN7_PS_DISPATCH_START_GRF_SHIFT_2             0
+/* DW6: kernel 1 pointer */
+/* DW7: kernel 2 pointer */
+
+#define GEN7_3DSTATE_VIEWPORT_STATE_POINTERS_SF_CL      GEN7_3D(3, 0, 0x21)
+#define GEN7_3DSTATE_VIEWPORT_STATE_POINTERS_CC         GEN7_3D(3, 0, 0x23)
+
+#define GEN7_3DSTATE_BLEND_STATE_POINTERS               GEN7_3D(3, 0, 0x24)
+#define GEN7_3DSTATE_DEPTH_STENCIL_STATE_POINTERS       GEN7_3D(3, 0, 0x25)
+
+#define GEN7_3DSTATE_BINDING_TABLE_POINTERS_VS          GEN7_3D(3, 0, 0x26)
+#define GEN7_3DSTATE_BINDING_TABLE_POINTERS_HS          GEN7_3D(3, 0, 0x27)
+#define GEN7_3DSTATE_BINDING_TABLE_POINTERS_DS          GEN7_3D(3, 0, 0x28)
+#define GEN7_3DSTATE_BINDING_TABLE_POINTERS_GS          GEN7_3D(3, 0, 0x29)
+#define GEN7_3DSTATE_BINDING_TABLE_POINTERS_PS          GEN7_3D(3, 0, 0x2a)
+
+#define GEN7_3DSTATE_SAMPLER_STATE_POINTERS_VS          GEN7_3D(3, 0, 0x2b)
+#define GEN7_3DSTATE_SAMPLER_STATE_POINTERS_GS          GEN7_3D(3, 0, 0x2e)
+#define GEN7_3DSTATE_SAMPLER_STATE_POINTERS_PS          GEN7_3D(3, 0, 0x2f)
+
+#define GEN7_3DSTATE_URB_VS                             GEN7_3D(3, 0, 0x30)
+#define GEN7_3DSTATE_URB_HS                             GEN7_3D(3, 0, 0x31)
+#define GEN7_3DSTATE_URB_DS                             GEN7_3D(3, 0, 0x32)
+#define GEN7_3DSTATE_URB_GS                             GEN7_3D(3, 0, 0x33)
+/* DW1 */
+# define GEN7_URB_ENTRY_NUMBER_SHIFT            0
+# define GEN7_URB_ENTRY_SIZE_SHIFT              16
+# define GEN7_URB_STARTING_ADDRESS_SHIFT        25
+
+#define GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_VS             GEN7_3D(3, 1, 0x12)
+#define GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_PS             GEN7_3D(3, 1, 0x16)
+/* DW1 */
+# define GEN7_PUSH_CONSTANT_BUFFER_OFFSET_SHIFT 16
+
+struct gen7_cc_viewport {
+	float min_depth;
+	float max_depth;
 };
 
+typedef enum {
+	SAMPLER_FILTER_NEAREST = 0,
+	SAMPLER_FILTER_BILINEAR,
+	FILTER_COUNT
+} sampler_filter_t;
+
+typedef enum {
+	SAMPLER_EXTEND_NONE = 0,
+	SAMPLER_EXTEND_REPEAT,
+	SAMPLER_EXTEND_PAD,
+	SAMPLER_EXTEND_REFLECT,
+	EXTEND_COUNT
+} sampler_extend_t;
+
 #endif
diff -rupN dump_1/lib/intel_batchbuffer.h dump/lib/intel_batchbuffer.h
--- dump_1/lib/intel_batchbuffer.h	2001-01-14 08:11:40.059619273 +0800
+++ dump/lib/intel_batchbuffer.h	2001-01-14 08:10:57.836625323 +0800
@@ -15,6 +15,7 @@ struct intel_batchbuffer {
 
 	uint8_t buffer[BATCH_SZ];
 	uint8_t *ptr;
+	uint8_t *state;
 };
 
 struct intel_batchbuffer *intel_batchbuffer_alloc(drm_intel_bufmgr *bufmgr,
diff -rupN dump_1/lib/intel_chipset.h dump/lib/intel_chipset.h
--- dump_1/lib/intel_chipset.h	2001-01-14 08:11:40.060619273 +0800
+++ dump/lib/intel_chipset.h	2001-01-14 08:10:57.837625206 +0800
@@ -122,7 +122,11 @@
 #define PCI_CHIP_HASWELL_CRW_S_GT2      0x0D2A
 #define PCI_CHIP_HASWELL_CRW_S_GT2_PLUS 0x0D3A
 
-#define PCI_CHIP_VALLEYVIEW_PO		0x0f30 /* VLV PO board */
+#define PCI_CHIP_VALLEYVIEWO		0x0f30 /* VLV PO board */
+#define PCI_CHIP_VALLEYVIEW1            0x0f31
+#define PCI_CHIP_VALLEYVIEW2            0x0f32
+#define PCI_CHIP_VALLEYVIEW3            0x0f33
+
 
 #define IS_MOBILE(devid)	(devid == PCI_CHIP_I855_GM || \
 				 devid == PCI_CHIP_I915_GM || \
@@ -194,9 +198,16 @@
 				 dev == PCI_CHIP_IVYBRIDGE_M_GT2 || \
 				 dev == PCI_CHIP_IVYBRIDGE_S || \
 				 dev == PCI_CHIP_IVYBRIDGE_S_GT2 || \
-				 dev == PCI_CHIP_VALLEYVIEW_PO)
+				 dev == PCI_CHIP_VALLEYVIEWO || \
+				 dev == PCI_CHIP_VALLEYVIEW1 || \
+                                 dev == PCI_CHIP_VALLEYVIEW2 || \
+                                 dev == PCI_CHIP_VALLEYVIEW3 )
+
+#define IS_VALLEYVIEW(devid)	(devid == PCI_CHIP_VALLEYVIEWO || \
+				 devid == PCI_CHIP_VALLEYVIEW1 || \
+                                 devid == PCI_CHIP_VALLEYVIEW2 || \
+                                 devid == PCI_CHIP_VALLEYVIEW3 )
 
-#define IS_VALLEYVIEW(devid)	(devid == PCI_CHIP_VALLEYVIEW_PO)
 
 #define IS_HSW_GT1(devid)       (devid == PCI_CHIP_HASWELL_GT1 || \
 				 devid == PCI_CHIP_HASWELL_M_GT1 || \
diff -rupN dump_1/lib/intel_mmio.c dump/lib/intel_mmio.c
--- dump_1/lib/intel_mmio.c	2001-01-14 08:11:40.060619273 +0800
+++ dump/lib/intel_mmio.c	2001-01-14 08:10:57.838625089 +0800
@@ -92,7 +92,7 @@ intel_get_mmio(struct pci_device *pci_de
 
 	gen = intel_gen(devid);
 	if (gen < 3)
-		mmio_size = 64*1024;
+		mmio_size = 512*1024;
 	else if (gen < 5)
 		mmio_size = 512*1024;
 	else
diff -rupN dump_1/lib/intel_pci.c dump/lib/intel_pci.c
--- dump_1/lib/intel_pci.c	2001-01-14 08:11:40.061619273 +0800
+++ dump/lib/intel_pci.c	2001-01-14 08:10:57.838625089 +0800
@@ -54,8 +54,27 @@ intel_get_pci_device(void)
 		exit(1);
 	}
 
-	/* Grab the graphics card */
+	/* Grab the graphics card. Try the canonical slot first, then
+	 * walk the entire PCI bus for a matching device. */
 	pci_dev = pci_device_find_by_slot(0, 0, 2, 0);
+	if (pci_dev == NULL || pci_dev->vendor_id != 0x8086) {
+		struct pci_device_iterator *iter;
+		struct pci_id_match match;
+
+		match.vendor_id = 0x8086; /* Intel */
+		match.device_id = PCI_MATCH_ANY;
+		match.subvendor_id = PCI_MATCH_ANY;
+		match.subdevice_id = PCI_MATCH_ANY;
+
+		match.device_class = 0x3 << 16;
+		match.device_class_mask = 0xff << 16;
+
+		match.match_data = 0;
+
+		iter = pci_id_match_iterator_create(&match);
+		pci_dev = pci_device_next(iter);
+		pci_iterator_destroy(iter);
+	}
 	if (pci_dev == NULL)
 		errx(1, "Couldn't find graphics card");
 
diff -rupN dump_1/lib/intel_reg.h dump/lib/intel_reg.h
--- dump_1/lib/intel_reg.h	2001-01-14 08:11:40.063619273 +0800
+++ dump/lib/intel_reg.h	2001-01-14 08:10:57.841624755 +0800
@@ -1577,6 +1577,9 @@ SOFTWARE OR THE USE OR OTHER DEALINGS IN
 #define ADPA_HSYNC_ACTIVE_HIGH	(1<<3)
 #define ADPA_HSYNC_ACTIVE_LOW	0
 
+#define PCH_DSP_CHICKEN1	0x42000
+#define PCH_DSP_CHICKEN2	0x42004
+#define PCH_DSP_CHICKEN3	0x4200c
 #define PCH_DSPCLK_GATE_D	0x42020
 #define PCH_DSPRAMCLK_GATE_D	0x42024
 #define PCH_3DCGDIS0		0x46020
@@ -3341,6 +3344,12 @@ typedef enum {
 #define  FDI_SCRAMBLING_ENABLE		(0<<7)
 #define  FDI_SCRAMBLING_DISABLE		(1<<7)
 
+/* Additional cpu TX control regs, from ivb bspec */
+#define DPAFE_BMFUNC		0x6c024
+#define DPAFE_DL_IREFCAL0	0x6c02c
+#define DPAFE_DL_IREFCAL1	0x6c030
+#define DPAFE_DP_IREFCAL	0x6c034
+
 /* FDI_RX, FDI_X is hard-wired to Transcoder_X */
 #define FDI_RXA_CTL		0xf000c
 #define FDI_RXB_CTL		0xf100c
diff -rupN dump_1/lib/intel_vlv.h dump/lib/intel_vlv.h
--- dump_1/lib/intel_vlv.h	1970-01-01 07:30:00.000000000 +0730
+++ dump/lib/intel_vlv.h	2001-01-14 08:10:57.842624647 +0800
@@ -0,0 +1,145 @@
+/* VLV specific header */
+
+#ifndef _INTEL_VLV_H_
+#define _INTEL_VLV_H_
+
+#define false 0
+#define true 1
+
+#define VLV_DISPLAY_BASE                0x180000
+#define RENDER_RING_BASE                0x02000
+#define BLT_RING_BASE                   0x22000
+#define GFX_MODE_GEN7                   0x0229c
+#define RENDER_HWS_PGA_GEN7            (0x04080)
+#define BSD_HWS_PGA_GEN7               (0x04180)
+#define BLT_HWS_PGA_GEN7               (0x04280)
+#define GEN6_BSD_SLEEP_PSMI_CONTROL     0x12050
+#define GEN6_BSD_RNCID                  0x12198
+#define GEN6_BLITTER_ECOSKPD            0x221d0
+#define VLV_MASTER_IER                  0x4400c /* Gunit master IER */
+#define GEN6_PMIER                      0x4402C
+#define VLV_IIR_RW                      0x182084
+#define VLV_ISR                         0x1820ac
+#define FORCEWAKE_VLV                   0x1300b0
+#define FORCEWAKE_ACK_VLV               0x1300b4
+#define GEN6_GDRST                      0x941c
+#define _3D_CHICKEN3                    0x02090
+#define IVB_CHICKEN3                    0x4200c
+#define GEN7_HALF_SLICE_CHICKEN1        0xe100 /* IVB GT1 + VLV */
+#define GEN7_L3CNTLREG1                 0xB01C
+#define GEN7_L3_CHICKEN_MODE_REGISTER   0xB030
+#define GEN7_ROW_CHICKEN2               0xe4f4
+#define GEN7_L3SQCREG4                  0xb034
+#define GEN7_SQ_CHICKEN_MBCUNIT_CONFIG  0x9030
+#define GEN6_MBCTL                      0x0907c
+#define GEN6_UCGCTL2                    0x9404
+#define GEN7_UCGCTL4                    0x940c
+#define FENCE_REG_SANDYBRIDGE_0         0x100000
+#define GEN6_BSD_RING_BASE              0x12000
+#define GEN7_COMMON_SLICE_CHICKEN1      0x7010
+
+
+
+static int IS_DISPLAYREG(uint32_t reg)
+{
+
+	if (reg >= VLV_DISPLAY_BASE)
+		return false;
+
+	if (reg >= RENDER_RING_BASE &&
+                        reg < RENDER_RING_BASE + 0xff)
+		return false;
+
+
+	if (reg >= GEN6_BSD_RING_BASE &&
+                        reg < GEN6_BSD_RING_BASE + 0xff)
+		return false;
+
+	if (reg >= BLT_RING_BASE &&
+                        reg < BLT_RING_BASE + 0xff)
+                return false;
+
+	if (reg == PGTBL_ER)
+                return false;
+
+        if (reg >= IPEIR_I965 &&
+                        reg < HWSTAM)
+                return false;
+
+	if (reg == MI_MODE)
+                return false;
+
+        if (reg == GFX_MODE_GEN7)
+                return false;
+
+        if (reg == RENDER_HWS_PGA_GEN7 ||
+                        reg == BSD_HWS_PGA_GEN7 ||
+                        reg == BLT_HWS_PGA_GEN7)
+                return false;
+
+        if (reg == GEN6_BSD_SLEEP_PSMI_CONTROL ||
+                        reg == GEN6_BSD_RNCID)
+                return false;
+
+        if (reg == GEN6_BLITTER_ECOSKPD)
+                return false;
+
+        if (reg >= 0x4000c &&
+                        reg <= 0x4002c)
+                return false;
+
+        if (reg >= 0x4f000 &&
+                        reg <= 0x4f08f)
+                return false;
+
+        if (reg >= 0x4f100 &&
+                        reg <= 0x4f11f)
+                return false;
+
+        if (reg >= VLV_MASTER_IER &&
+                        reg <= GEN6_PMIER)
+                return false;
+
+	if (reg >= FENCE_REG_SANDYBRIDGE_0 &&
+                        reg < (FENCE_REG_SANDYBRIDGE_0 + (16*8)))
+                return false;
+
+        if (reg >= VLV_IIR_RW &&
+                        reg <= VLV_ISR)
+                return false;
+
+        if (reg == FORCEWAKE_VLV ||
+                        reg == FORCEWAKE_ACK_VLV ||
+                        reg == 0x130090)
+                return false;
+
+        if (reg == GEN6_GDRST)
+                return false;
+
+        if(reg > 0x9400 && reg <= 0x9418){
+                return false;
+        }
+
+	  switch (reg) {
+               case _3D_CHICKEN3:
+               case IVB_CHICKEN3:
+               case GEN7_HALF_SLICE_CHICKEN1:
+               case GEN7_COMMON_SLICE_CHICKEN1:
+               case GEN7_L3CNTLREG1:
+               case GEN7_L3_CHICKEN_MODE_REGISTER:
+               case GEN7_ROW_CHICKEN2:
+               case GEN7_L3SQCREG4:
+               case GEN7_SQ_CHICKEN_MBCUNIT_CONFIG:
+               case GEN6_MBCTL:
+               case GEN6_UCGCTL2:
+               case GEN7_UCGCTL4:
+                      return false;
+               default:
+                      break;
+        }
+
+        return true;
+}
+
+#endif
+
diff -rupN dump_1/lib/rendercopy_gen7.c dump/lib/rendercopy_gen7.c
--- dump_1/lib/rendercopy_gen7.c	2001-01-14 08:11:40.064619273 +0800
+++ dump/lib/rendercopy_gen7.c	2001-01-14 08:10:57.844624435 +0800
@@ -4,53 +4,22 @@
 #include <assert.h>
 
 #define ALIGN(x, y) (((x) + (y)-1) & ~((y)-1))
-#define VERTEX_SIZE (3*4)
 
-#if DEBUG_RENDERCPY
-static void dump_batch(struct intel_batchbuffer *batch)
-#else
-#define dump_batch(x) do { } while(0)
-#endif
-
-struct {
-	uint32_t cc_state;
-	uint32_t blend_state;
-	uint32_t ds_state;
-} cc;
-
-struct {
-	uint32_t cc_state;
-	uint32_t sf_clip_state;
-} viewport;
-
-/* see shaders/ps/blit.g7a */
 static const uint32_t ps_kernel[][4] = {
-#if 1
-   { 0x0060005a, 0x214077bd, 0x000000c0, 0x008d0040 },
-   { 0x0060005a, 0x216077bd, 0x000000c0, 0x008d0080 },
-   { 0x0060005a, 0x218077bd, 0x000000d0, 0x008d0040 },
-   { 0x0060005a, 0x21a077bd, 0x000000d0, 0x008d0080 },
-   { 0x02800031, 0x2e001e3d, 0x00000140, 0x08840001 },
-   { 0x05800031, 0x20001e3c, 0x00000e00, 0x90031000 },
-
-#else
-   /* Write all -1 */
-   { 0x00600001, 0x2e000061, 0x00000000, 0x3f800000 },
-   { 0x00600001, 0x2e200061, 0x00000000, 0x3f800000 },
-   { 0x00600001, 0x2e400061, 0x00000000, 0x3f800000 },
-   { 0x00600001, 0x2e600061, 0x00000000, 0x3f800000 },
-   { 0x00600001, 0x2e800061, 0x00000000, 0x3f800000 },
-   { 0x00600001, 0x2ea00061, 0x00000000, 0x3f800000 },
-   { 0x00600001, 0x2ec00061, 0x00000000, 0x3f800000 },
-   { 0x00600001, 0x2ee00061, 0x00000000, 0x3f800000 },
-   { 0x05800031, 0x20001e3c, 0x00000e00, 0x90031000 },
-#endif
+	{ 0x0080005a, 0x2e2077bd, 0x000000c0, 0x008d0040 },
+	{ 0x0080005a, 0x2e6077bd, 0x000000d0, 0x008d0040 },
+	{ 0x02800031, 0x21801fa9, 0x008d0e20, 0x08840001 },
+	{ 0x00800001, 0x2e2003bd, 0x008d0180, 0x00000000 },
+	{ 0x00800001, 0x2e6003bd, 0x008d01c0, 0x00000000 },
+	{ 0x00800001, 0x2ea003bd, 0x008d0200, 0x00000000 },
+	{ 0x00800001, 0x2ee003bd, 0x008d0240, 0x00000000 },
+	{ 0x05800031, 0x20001fa8, 0x008d0e20, 0x90031000 },
 };
 
 static uint32_t
 batch_used(struct intel_batchbuffer *batch)
 {
-	return batch->ptr - batch->buffer;
+	return batch->state - batch->buffer;
 }
 
 static uint32_t
@@ -58,7 +27,7 @@ batch_align(struct intel_batchbuffer *ba
 {
 	uint32_t offset = batch_used(batch);
 	offset = ALIGN(offset, align);
-	batch->ptr = batch->buffer + offset;
+	batch->state = batch->buffer + offset;
 	return offset;
 }
 
@@ -66,7 +35,7 @@ static void *
 batch_alloc(struct intel_batchbuffer *batch, uint32_t size, uint32_t align)
 {
 	uint32_t offset = batch_align(batch, align);
-	batch->ptr += size;
+	batch->state += size;
 	return memset(batch->buffer + offset, 0, size);
 }
 
@@ -83,7 +52,7 @@ batch_copy(struct intel_batchbuffer *bat
 }
 
 static void
-gen6_render_flush(struct intel_batchbuffer *batch, uint32_t batch_end)
+gen7_render_flush(struct intel_batchbuffer *batch, uint32_t batch_end)
 {
 	int ret;
 
@@ -94,11 +63,24 @@ gen6_render_flush(struct intel_batchbuff
 	assert(ret == 0);
 }
 
-/* Mostly copy+paste from gen6, except height, width, pitch moved */
 static uint32_t
-gen7_bind_buf(struct intel_batchbuffer *batch, struct scratch_buf *buf,
-	      uint32_t format, int is_dst) {
-	struct gen7_surface_state *ss;
+gen7_tiling_bits(uint32_t tiling)
+{
+	switch (tiling) {
+	default: assert(0);
+	case I915_TILING_NONE: return 0;
+	case I915_TILING_X: return GEN7_SURFACE_TILED;
+	case I915_TILING_Y: return GEN7_SURFACE_TILED | GEN7_SURFACE_TILED_Y;
+	}
+}
+
+static uint32_t
+gen7_bind_buf(struct intel_batchbuffer *batch,
+	      struct scratch_buf *buf,
+	      uint32_t format,
+	      int is_dst)
+{
+	uint32_t *ss;
 	uint32_t write_domain, read_domain;
 	int ret;
 
@@ -110,13 +92,20 @@ gen7_bind_buf(struct intel_batchbuffer *
 	}
 
 	ss = batch_alloc(batch, sizeof(*ss), 32);
-	ss->ss0.surface_type = GEN6_SURFACE_2D;
-	ss->ss0.surface_format = format;
-	ss->ss0.render_cache_read_write = 1; /* GEN7+ */
-	ss->ss0.tiled_surface = buf->tiling != I915_TILING_NONE;
-	ss->ss0.tile_walk     = buf->tiling == I915_TILING_Y;
 
-	ss->ss1.base_addr = buf->bo->offset;
+	ss[0] = (GEN7_SURFACE_2D << GEN7_SURFACE_TYPE_SHIFT |
+		 gen7_tiling_bits(buf->tiling) |
+		format << GEN7_SURFACE_FORMAT_SHIFT);
+	ss[1] = buf->bo->offset;
+	ss[2] = ((buf_width(buf) - 1)  << GEN7_SURFACE_WIDTH_SHIFT |
+		 (buf_height(buf) - 1) << GEN7_SURFACE_HEIGHT_SHIFT);
+	ss[3] = (buf->stride - 1) << GEN7_SURFACE_PITCH_SHIFT;
+	ss[4] = 0;
+	ss[5] = 0;
+	ss[6] = 0;
+	ss[7] = 0;
+	if (IS_HASWELL(batch->devid))
+		ss[7] |= HSW_SURFACE_SWIZZLE(RED, GREEN, BLUE, ALPHA);
 
 	ret = drm_intel_bo_emit_reloc(batch->bo,
 				      batch_offset(batch, ss) + 4,
@@ -124,574 +113,403 @@ gen7_bind_buf(struct intel_batchbuffer *
 				      read_domain, write_domain);
 	assert(ret == 0);
 
-	ss->ss2.height = buf_height(buf) - 1;
-	ss->ss2.width  = buf_width(buf) - 1;
-	ss->ss3.pitch  = buf->stride - 1;
-
-	if (IS_HASWELL(batch->devid)) {
-		ss->ss7.shader_chanel_select_a = 4;
-		ss->ss7.shader_chanel_select_g = 5;
-		ss->ss7.shader_chanel_select_b = 6;
-		ss->ss7.shader_chanel_select_a = 7;
-	}
-
 	return batch_offset(batch, ss);
 }
 
-static uint32_t
-gen7_bind_surfaces(struct intel_batchbuffer *batch,
-		   struct scratch_buf *src,
-		   struct scratch_buf *dst) {
-	uint32_t *binding_table;
-
-	binding_table = batch_alloc(batch, 8, 32);
-
-	binding_table[0] =
-		gen7_bind_buf(batch, dst, GEN6_SURFACEFORMAT_B8G8R8A8_UNORM, 1);
-	binding_table[1] =
-		gen7_bind_buf(batch, src, GEN6_SURFACEFORMAT_B8G8R8A8_UNORM, 0);
+static void
+gen7_emit_vertex_elements(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_VERTEX_ELEMENTS |
+		  ((2 * (1 + 2)) + 1 - 2));
 
-	return batch_offset(batch, binding_table);
+	OUT_BATCH(0 << GEN7_VE0_VERTEX_BUFFER_INDEX_SHIFT | GEN7_VE0_VALID |
+		  GEN7_SURFACEFORMAT_R32G32B32A32_FLOAT << GEN7_VE0_FORMAT_SHIFT |
+		  0 << GEN7_VE0_OFFSET_SHIFT);
+
+	OUT_BATCH(GEN7_VFCOMPONENT_STORE_0 << GEN7_VE1_VFCOMPONENT_0_SHIFT |
+		  GEN7_VFCOMPONENT_STORE_0 << GEN7_VE1_VFCOMPONENT_1_SHIFT |
+		  GEN7_VFCOMPONENT_STORE_0 << GEN7_VE1_VFCOMPONENT_2_SHIFT |
+		  GEN7_VFCOMPONENT_STORE_0 << GEN7_VE1_VFCOMPONENT_3_SHIFT);
+
+	/* x,y */
+	OUT_BATCH(0 << GEN7_VE0_VERTEX_BUFFER_INDEX_SHIFT | GEN7_VE0_VALID |
+		  GEN7_SURFACEFORMAT_R16G16_SSCALED << GEN7_VE0_FORMAT_SHIFT |
+		  0 << GEN7_VE0_OFFSET_SHIFT); /* offsets vb in bytes */
+	OUT_BATCH(GEN7_VFCOMPONENT_STORE_SRC << GEN7_VE1_VFCOMPONENT_0_SHIFT |
+		  GEN7_VFCOMPONENT_STORE_SRC << GEN7_VE1_VFCOMPONENT_1_SHIFT |
+		  GEN7_VFCOMPONENT_STORE_0 << GEN7_VE1_VFCOMPONENT_2_SHIFT |
+		  GEN7_VFCOMPONENT_STORE_1_FLT << GEN7_VE1_VFCOMPONENT_3_SHIFT);
+
+	/* s,t */
+	OUT_BATCH(0 << GEN7_VE0_VERTEX_BUFFER_INDEX_SHIFT | GEN7_VE0_VALID |
+		  GEN7_SURFACEFORMAT_R16G16_SSCALED << GEN7_VE0_FORMAT_SHIFT |
+		  4 << GEN7_VE0_OFFSET_SHIFT);  /* offset vb in bytes */
+	OUT_BATCH(GEN7_VFCOMPONENT_STORE_SRC << GEN7_VE1_VFCOMPONENT_0_SHIFT |
+		  GEN7_VFCOMPONENT_STORE_SRC << GEN7_VE1_VFCOMPONENT_1_SHIFT |
+		  GEN7_VFCOMPONENT_STORE_0 << GEN7_VE1_VFCOMPONENT_2_SHIFT |
+		  GEN7_VFCOMPONENT_STORE_1_FLT << GEN7_VE1_VFCOMPONENT_3_SHIFT);
 }
 
-/* Mostly copy+paste from gen6, except wrap modes moved */
 static uint32_t
-gen7_create_sampler(struct intel_batchbuffer *batch) {
-	struct gen7_sampler_state *ss;
-
-	ss = batch_alloc(batch, sizeof(*ss), 32);
+gen7_create_vertex_buffer(struct intel_batchbuffer *batch,
+			  uint32_t src_x, uint32_t src_y,
+			  uint32_t dst_x, uint32_t dst_y,
+			  uint32_t width, uint32_t height)
+{
+	uint16_t *v;
+
+	v = batch_alloc(batch, 12*sizeof(*v), 8);
+
+	v[0] = dst_x + width;
+	v[1] = dst_y + height;
+	v[2] = src_x + width;
+	v[3] = src_y + height;
+
+	v[4] = dst_x;
+	v[5] = dst_y + height;
+	v[6] = src_x;
+	v[7] = src_y + height;
+
+	v[8] = dst_x;
+	v[9] = dst_y;
+	v[10] = src_x;
+	v[11] = src_y;
 
-	ss->ss0.min_filter = GEN6_MAPFILTER_NEAREST;
-	ss->ss0.mag_filter = GEN6_MAPFILTER_NEAREST;
-	ss->ss3.r_wrap_mode = GEN6_TEXCOORDMODE_CLAMP;
-	ss->ss3.s_wrap_mode = GEN6_TEXCOORDMODE_CLAMP;
-	ss->ss3.t_wrap_mode = GEN6_TEXCOORDMODE_CLAMP;
-
-	/* I've experimented with non-normalized coordinates and using the LD
-	 * sampler fetch, but couldn't make it work. */
-	ss->ss3.non_normalized_coord = 0;
-
-	return batch_offset(batch, ss);
+	return batch_offset(batch, v);
 }
 
-/**
- * gen7_fill_vertex_buffer_data populate vertex buffer with data.
- *
- * The vertex buffer consists of 3 vertices to construct a RECTLIST. The 4th
- * vertex is implied (automatically derived by the HW). Each element has the
- * destination offset, and the normalized texture offset (src). The rectangle
- * itself will span the entire subsurface to be copied.
- *
- * see gen6_emit_vertex_elements
- */
-static uint32_t
-gen7_fill_vertex_buffer_data(struct intel_batchbuffer *batch,
-			     struct scratch_buf *src,
-			     uint32_t src_x, uint32_t src_y,
-			     uint32_t dst_x, uint32_t dst_y,
-			     uint32_t width, uint32_t height) {
-	void *ret;
-
-	ret = batch->ptr;
-
-	emit_vertex_2s(batch, dst_x + width, dst_y + height);
-	emit_vertex_normalized(batch, src_x + width, buf_width(src));
-	emit_vertex_normalized(batch, src_y + height, buf_height(src));
-
-	emit_vertex_2s(batch, dst_x, dst_y + height);
-	emit_vertex_normalized(batch, src_x, buf_width(src));
-	emit_vertex_normalized(batch, src_y + height, buf_height(src));
-
-	emit_vertex_2s(batch, dst_x, dst_y);
-	emit_vertex_normalized(batch, src_x, buf_width(src));
-	emit_vertex_normalized(batch, src_y, buf_height(src));
-
-	return batch_offset(batch, ret);
-}
-
-/**
- * gen6_emit_vertex_elements - The vertex elements describe the contents of the
- * vertex buffer. We pack the vertex buffer in a semi weird way, conforming to
- * what gen6_rendercopy did. The most straightforward would be to store
- * everything as floats.
- *
- * see gen7_fill_vertex_buffer_data() for where the corresponding elements are
- * packed.
- */
-static void
-gen6_emit_vertex_elements(struct intel_batchbuffer *batch) {
-	/*
-	 * The VUE layout
-	 *    dword 0-3: pad (0, 0, 0. 0)
-	 *    dword 4-7: position (x, y, 0, 1.0),
-	 *    dword 8-11: texture coordinate 0 (u0, v0, 0, 1.0)
-	 */
-	OUT_BATCH(GEN6_3DSTATE_VERTEX_ELEMENTS | (3 * 2 + 1 - 2));
-
-	/* Element state 0. These are 4 dwords of 0 required for the VUE format.
-	 * We don't really know or care what they do.
-	 */
-	OUT_BATCH(0 << VE0_VERTEX_BUFFER_INDEX_SHIFT | VE0_VALID |
-		  GEN6_SURFACEFORMAT_R32G32B32A32_FLOAT << VE0_FORMAT_SHIFT |
-		  0 << VE0_OFFSET_SHIFT); /* we specify 0, but it's really does not exist */
-	OUT_BATCH(GEN6_VFCOMPONENT_STORE_0 << VE1_VFCOMPONENT_0_SHIFT |
-		  GEN6_VFCOMPONENT_STORE_0 << VE1_VFCOMPONENT_1_SHIFT |
-		  GEN6_VFCOMPONENT_STORE_0 << VE1_VFCOMPONENT_2_SHIFT |
-		  GEN6_VFCOMPONENT_STORE_0 << VE1_VFCOMPONENT_3_SHIFT);
-
-	/* Element state 1 - Our "destination" vertices. These are passed down
-	 * through the pipeline, and eventually make it to the pixel shader as
-	 * the offsets in the destination surface. It's packed as the 16
-	 * signed/scaled because of gen6 rendercopy. I see no particular reason
-	 * for doing this though.
-	 */
-	OUT_BATCH(0 << VE0_VERTEX_BUFFER_INDEX_SHIFT | VE0_VALID |
-		  GEN6_SURFACEFORMAT_R16G16_SSCALED << VE0_FORMAT_SHIFT |
-		  0 << VE0_OFFSET_SHIFT); /* offsets vb in bytes */
-	OUT_BATCH(GEN6_VFCOMPONENT_STORE_SRC << VE1_VFCOMPONENT_0_SHIFT |
-		  GEN6_VFCOMPONENT_STORE_SRC << VE1_VFCOMPONENT_1_SHIFT |
-		  GEN6_VFCOMPONENT_STORE_0 << VE1_VFCOMPONENT_2_SHIFT |
-		  GEN6_VFCOMPONENT_STORE_1_FLT << VE1_VFCOMPONENT_3_SHIFT);
-
-	/* Element state 2. Last but not least we store the U,V components as
-	 * normalized floats. These will be used in the pixel shader to sample
-	 * from the source buffer.
-	 */
-	OUT_BATCH(0 << VE0_VERTEX_BUFFER_INDEX_SHIFT | VE0_VALID |
-		  GEN6_SURFACEFORMAT_R32G32_FLOAT << VE0_FORMAT_SHIFT |
-		  4 << VE0_OFFSET_SHIFT);	/* offset vb in bytes */
-	OUT_BATCH(GEN6_VFCOMPONENT_STORE_SRC << VE1_VFCOMPONENT_0_SHIFT |
-		  GEN6_VFCOMPONENT_STORE_SRC << VE1_VFCOMPONENT_1_SHIFT |
-		  GEN6_VFCOMPONENT_STORE_0 << VE1_VFCOMPONENT_2_SHIFT |
-		  GEN6_VFCOMPONENT_STORE_1_FLT << VE1_VFCOMPONENT_3_SHIFT);
-}
-
-/**
- * gen7_emit_vertex_buffer emit the vertex buffers command
- *
- * @batch
- * @offset - bytw offset within the @batch where the vertex buffer starts.
- */
 static void gen7_emit_vertex_buffer(struct intel_batchbuffer *batch,
-				    uint32_t offset) {
-	OUT_BATCH(GEN6_3DSTATE_VERTEX_BUFFERS | (4 * 1 - 1));
-	OUT_BATCH(0 << VB0_BUFFER_INDEX_SHIFT | /* VB 0th index */
-		  VB0_VERTEXDATA |
-		  GEN7_VB0_BUFFER_ADDR_MOD_EN | /* Address Modify Enable */
-		  VERTEX_SIZE << VB0_BUFFER_PITCH_SHIFT);
+				    int src_x, int src_y,
+				    int dst_x, int dst_y,
+				    int width, int height)
+{
+	uint32_t offset;
+
+	offset = gen7_create_vertex_buffer(batch,
+					   src_x, src_y,
+					   dst_x, dst_y,
+					   width, height);
+
+	OUT_BATCH(GEN7_3DSTATE_VERTEX_BUFFERS | (5 - 2));
+	OUT_BATCH(0 << GEN7_VB0_BUFFER_INDEX_SHIFT |
+		  GEN7_VB0_VERTEXDATA |
+		  GEN7_VB0_ADDRESS_MODIFY_ENABLE |
+		  4*2 << GEN7_VB0_BUFFER_PITCH_SHIFT);
+
 	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_VERTEX, 0, offset);
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_VERTEX, 0, offset + (VERTEX_SIZE * 3) - 1);
+	OUT_BATCH(~0);
 	OUT_BATCH(0);
 }
 
 static uint32_t
-gen6_create_cc_state(struct intel_batchbuffer *batch)
+gen7_bind_surfaces(struct intel_batchbuffer *batch,
+		   struct scratch_buf *src,
+		   struct scratch_buf *dst)
 {
-	struct gen6_color_calc_state *cc_state;
-	cc_state = batch_alloc(batch, sizeof(*cc_state), 64);
-	return batch_offset(batch, cc_state);
+	uint32_t *binding_table;
+
+	binding_table = batch_alloc(batch, 8, 32);
+
+	binding_table[0] =
+		gen7_bind_buf(batch, dst, GEN7_SURFACEFORMAT_B8G8R8A8_UNORM, 1);
+	binding_table[1] =
+		gen7_bind_buf(batch, src, GEN7_SURFACEFORMAT_B8G8R8A8_UNORM, 0);
+
+	return batch_offset(batch, binding_table);
 }
 
-static uint32_t
-gen6_create_depth_stencil_state(struct intel_batchbuffer *batch)
+static void
+gen7_emit_binding_table(struct intel_batchbuffer *batch,
+			struct scratch_buf *src,
+			struct scratch_buf *dst)
+{
+	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_POINTERS_PS | (2 - 2));
+	OUT_BATCH(gen7_bind_surfaces(batch, src, dst));
+}
+
+static void
+gen7_emit_drawing_rectangle(struct intel_batchbuffer *batch, struct scratch_buf *dst)
 {
-	struct gen6_depth_stencil_state *depth;
-	depth = batch_alloc(batch, sizeof(*depth), 64);
-	depth->ds0.stencil_enable = 0;
-	return batch_offset(batch, depth);
+	OUT_BATCH(GEN7_3DSTATE_DRAWING_RECTANGLE | (4 - 2));
+	OUT_BATCH(0);
+	OUT_BATCH((buf_height(dst) - 1) << 16 | (buf_width(dst) - 1));
+	OUT_BATCH(0);
 }
 
 static uint32_t
-gen6_create_blend_state(struct intel_batchbuffer *batch)
+gen7_create_blend_state(struct intel_batchbuffer *batch)
 {
-	struct gen6_blend_state *blend;
+	struct gen7_blend_state *blend;
+
 	blend = batch_alloc(batch, sizeof(*blend), 64);
-	blend->blend0.blend_enable = 0;
+
+	blend->blend0.dest_blend_factor = GEN7_BLENDFACTOR_ZERO;
+	blend->blend0.source_blend_factor = GEN7_BLENDFACTOR_ONE;
+	blend->blend0.blend_func = GEN7_BLENDFUNCTION_ADD;
+	blend->blend1.post_blend_clamp_enable = 1;
 	blend->blend1.pre_blend_clamp_enable = 1;
+
 	return batch_offset(batch, blend);
 }
 
+static void
+gen7_emit_state_base_address(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_STATE_BASE_ADDRESS | (10 - 2));
+	OUT_BATCH(0);
+	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
+	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
+	OUT_BATCH(0);
+	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
+
+	OUT_BATCH(0);
+	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
+	OUT_BATCH(0);
+	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
+}
+
 static uint32_t
-gen6_create_cc_viewport(struct intel_batchbuffer *batch)
+gen7_create_cc_viewport(struct intel_batchbuffer *batch)
 {
-	struct gen6_cc_viewport *vp;
+	struct gen7_cc_viewport *vp;
 
 	vp = batch_alloc(batch, sizeof(*vp), 32);
-	/* XXX I don't understand this */
 	vp->min_depth = -1.e35;
 	vp->max_depth = 1.e35;
+
 	return batch_offset(batch, vp);
 }
 
-static uint32_t
-gen7_create_sf_clip_viewport(struct intel_batchbuffer *batch) {
-	/* XXX these are likely not needed */
-	struct gen7_sf_clip_viewport *scv_state;
-	scv_state = batch_alloc(batch, sizeof(*scv_state), 64);
-	scv_state->guardband.xmin = 0;
-	scv_state->guardband.xmax = 1.0f;
-	scv_state->guardband.ymin = 0;
-	scv_state->guardband.ymax = 1.0f;
-	return batch_offset(batch, scv_state);
+static void
+gen7_emit_cc(struct intel_batchbuffer *batch)
+{
+        OUT_BATCH(GEN7_3DSTATE_BLEND_STATE_POINTERS | (2 - 2));
+        OUT_BATCH(gen7_create_blend_state(batch));
+
+        OUT_BATCH(GEN7_3DSTATE_VIEWPORT_STATE_POINTERS_CC | (2 - 2));
+	OUT_BATCH(gen7_create_cc_viewport(batch));
 }
 
 static uint32_t
-gen6_create_scissor_rect(struct intel_batchbuffer *batch)
+gen7_create_sampler(struct intel_batchbuffer *batch)
 {
-	struct gen6_scissor_rect *scissor;
-	scissor = batch_alloc(batch, sizeof(*scissor), 64);
-	return batch_offset(batch, scissor);
-}
+	struct gen7_sampler_state *ss;
 
+	ss = batch_alloc(batch, sizeof(*ss), 32);
 
+	ss->ss0.min_filter = GEN7_MAPFILTER_NEAREST;
+	ss->ss0.mag_filter = GEN7_MAPFILTER_NEAREST;
 
+	ss->ss3.r_wrap_mode = GEN7_TEXCOORDMODE_CLAMP;
+	ss->ss3.s_wrap_mode = GEN7_TEXCOORDMODE_CLAMP;
+	ss->ss3.t_wrap_mode = GEN7_TEXCOORDMODE_CLAMP;
 
+	ss->ss3.non_normalized_coord = 1;
 
-static void
-gen6_emit_sip(struct intel_batchbuffer *batch) {
-	OUT_BATCH(GEN6_STATE_SIP | 0);
-	OUT_BATCH(0);
+	return batch_offset(batch, ss);
 }
 
 static void
-gen7_emit_push_constants(struct intel_batchbuffer *batch) {
-	OUT_BATCH(GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_VS);
-	OUT_BATCH(0);
-	OUT_BATCH(GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_HS);
-	OUT_BATCH(0);
-	OUT_BATCH(GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_DS);
-	OUT_BATCH(0);
-	OUT_BATCH(GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_GS);
-	OUT_BATCH(0);
-	OUT_BATCH(GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_PS);
-	OUT_BATCH(0);
+gen7_emit_sampler(struct intel_batchbuffer *batch)
+{
+        OUT_BATCH(GEN7_3DSTATE_SAMPLER_STATE_POINTERS_PS | (2 - 2));
+        OUT_BATCH(gen7_create_sampler(batch));
 }
 
 static void
-gen7_emit_state_base_address(struct intel_batchbuffer *batch) {
-	OUT_BATCH(GEN6_STATE_BASE_ADDRESS | (10 - 2));
-	/* general (stateless) */
-	/* surface */
-	/* instruction */
-	/* indirect */
-	/* dynamic */
-	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_SAMPLER, 0, BASE_ADDRESS_MODIFY);
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION,
-		  0, BASE_ADDRESS_MODIFY);
-	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
-	OUT_RELOC(batch->bo, I915_GEM_DOMAIN_INSTRUCTION, 0, BASE_ADDRESS_MODIFY);
+gen7_emit_multisample(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_MULTISAMPLE | (4 - 2));
+	OUT_BATCH(GEN7_3DSTATE_MULTISAMPLE_PIXEL_LOCATION_CENTER |
+		  GEN7_3DSTATE_MULTISAMPLE_NUMSAMPLES_1); /* 1 sample/pixel */
+	OUT_BATCH(0);
+	OUT_BATCH(0);
 
-	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
-	OUT_BATCH(0xfffff000 | BASE_ADDRESS_MODIFY); // copied from mesa
-	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
-	OUT_BATCH(0 | BASE_ADDRESS_MODIFY);
+	OUT_BATCH(GEN7_3DSTATE_SAMPLE_MASK | (2 - 2));
+	OUT_BATCH(1);
 }
 
 static void
-gen7_emit_urb(struct intel_batchbuffer *batch) {
-	/* XXX: Min valid values from mesa */
-	const int vs_entries = 32;
-	const int vs_size = 2;
-	const int vs_start = 2;
+gen7_emit_urb(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_PUSH_CONSTANT_ALLOC_PS | (2 - 2));
+	OUT_BATCH(8); /* in 1KBs */
 
-	OUT_BATCH(GEN7_3DSTATE_URB_VS);
-	OUT_BATCH(vs_entries | ((vs_size - 1) << 16) | (vs_start << 25));
-	OUT_BATCH(GEN7_3DSTATE_URB_GS);
-	OUT_BATCH(vs_start << 25);
-	OUT_BATCH(GEN7_3DSTATE_URB_HS);
-	OUT_BATCH(vs_start << 25);
-	OUT_BATCH(GEN7_3DSTATE_URB_DS);
-	OUT_BATCH(vs_start << 25);
-}
+	/* num of VS entries must be divisible by 8 if size < 9 */
+	OUT_BATCH(GEN7_3DSTATE_URB_VS | (2 - 2));
+	OUT_BATCH((64 << GEN7_URB_ENTRY_NUMBER_SHIFT) |
+		  (2 - 1) << GEN7_URB_ENTRY_SIZE_SHIFT |
+		  (1 << GEN7_URB_STARTING_ADDRESS_SHIFT));
 
-static void
-gen7_emit_cc(struct intel_batchbuffer *batch) {
-	OUT_BATCH(GEN7_3DSTATE_BLEND_STATE_POINTERS);
-	OUT_BATCH(cc.blend_state | 1);
+	OUT_BATCH(GEN7_3DSTATE_URB_HS | (2 - 2));
+	OUT_BATCH((0 << GEN7_URB_ENTRY_SIZE_SHIFT) |
+		  (2 << GEN7_URB_STARTING_ADDRESS_SHIFT));
 
-	OUT_BATCH(GEN6_3DSTATE_CC_STATE_POINTERS);
-	OUT_BATCH(cc.cc_state | 1);
+	OUT_BATCH(GEN7_3DSTATE_URB_DS | (2 - 2));
+	OUT_BATCH((0 << GEN7_URB_ENTRY_SIZE_SHIFT) |
+		  (2 << GEN7_URB_STARTING_ADDRESS_SHIFT));
 
-	OUT_BATCH(GEN7_3DSTATE_DS_STATE_POINTERS);
-	OUT_BATCH(cc.ds_state | 1);
+	OUT_BATCH(GEN7_3DSTATE_URB_GS | (2 - 2));
+	OUT_BATCH((0 << GEN7_URB_ENTRY_SIZE_SHIFT) |
+		  (1 << GEN7_URB_STARTING_ADDRESS_SHIFT));
 }
 
 static void
-gen7_emit_multisample(struct intel_batchbuffer *batch) {
-	OUT_BATCH(GEN6_3DSTATE_MULTISAMPLE | 2);
+gen7_emit_vs(struct intel_batchbuffer *batch)
+{
+	OUT_BATCH(GEN7_3DSTATE_VS | (6 - 2));
+	OUT_BATCH(0); /* no VS kernel */
 	OUT_BATCH(0);
 	OUT_BATCH(0);
 	OUT_BATCH(0);
-
-	OUT_BATCH(GEN6_3DSTATE_SAMPLE_MASK);
-	OUT_BATCH(1);
+	OUT_BATCH(0); /* pass-through */
 }
 
 static void
-gen7_emit_vs(struct intel_batchbuffer *batch) {
-	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_POINTERS_VS);
-	OUT_BATCH(0);
-
-	OUT_BATCH(GEN7_3DSTATE_SAMPLER_STATE_POINTERS_VS);
-	OUT_BATCH(0);
-
-	OUT_BATCH(GEN6_3DSTATE_CONSTANT_VS | (7-2));
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-
-	OUT_BATCH(GEN6_3DSTATE_VS | (6-2));
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
+gen7_emit_hs(struct intel_batchbuffer *batch)
+{
+        OUT_BATCH(GEN7_3DSTATE_HS | (7 - 2));
+        OUT_BATCH(0); /* no HS kernel */
+        OUT_BATCH(0);
+        OUT_BATCH(0);
+        OUT_BATCH(0);
+        OUT_BATCH(0);
+        OUT_BATCH(0); /* pass-through */
 }
 
 static void
-gen7_emit_hs(struct intel_batchbuffer *batch) {
-	OUT_BATCH(GEN7_3DSTATE_CONSTANT_HS | (7-2));
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-
-	OUT_BATCH(GEN7_3DSTATE_HS | (7-2));
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-
-	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_POINTERS_HS);
-	OUT_BATCH(0);
-
-	OUT_BATCH(GEN7_3DSTATE_SAMPLER_STATE_POINTERS_HS);
-	OUT_BATCH(0);
+gen7_emit_te(struct intel_batchbuffer *batch)
+{
+        OUT_BATCH(GEN7_3DSTATE_TE | (4 - 2));
+        OUT_BATCH(0);
+        OUT_BATCH(0);
+        OUT_BATCH(0);
 }
 
 static void
-gen7_emit_gs(struct intel_batchbuffer *batch) {
-	OUT_BATCH(GEN7_3DSTATE_CONSTANT_GS | (7-2));
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-
-	OUT_BATCH(GEN7_3DSTATE_GS | (7-2));
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-
-	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_POINTERS_GS);
-	OUT_BATCH(0);
-
-	OUT_BATCH(GEN7_3DSTATE_SAMPLER_STATE_POINTERS_GS);
-	OUT_BATCH(0);
+gen7_emit_ds(struct intel_batchbuffer *batch)
+{
+        OUT_BATCH(GEN7_3DSTATE_DS | (6 - 2));
+        OUT_BATCH(0);
+        OUT_BATCH(0);
+        OUT_BATCH(0);
+        OUT_BATCH(0);
+        OUT_BATCH(0);
 }
 
 static void
-gen7_emit_ds(struct intel_batchbuffer *batch) {
-	OUT_BATCH(GEN7_3DSTATE_CONSTANT_DS | (7-2));
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-
-	OUT_BATCH(GEN7_3DSTATE_DS | (6-2));
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-
-	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_POINTERS_DS);
-	OUT_BATCH(0);
-
-	OUT_BATCH(GEN7_3DSTATE_SAMPLER_STATE_POINTERS_DS);
-	OUT_BATCH(0);
+gen7_emit_gs(struct intel_batchbuffer *batch)
+{
+        OUT_BATCH(GEN7_3DSTATE_GS | (7 - 2));
+        OUT_BATCH(0); /* no GS kernel */
+        OUT_BATCH(0);
+        OUT_BATCH(0);
+        OUT_BATCH(0);
+        OUT_BATCH(0);
+        OUT_BATCH(0); /* pass-through  */
 }
 
 static void
-gen7_emit_null_state(struct intel_batchbuffer *batch) {
-	gen7_emit_hs(batch);
-	OUT_BATCH(GEN7_3DSTATE_TE | (4-2));
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	gen7_emit_gs(batch);
-	gen7_emit_ds(batch);
-	gen7_emit_vs(batch);
+gen7_emit_streamout(struct intel_batchbuffer *batch)
+{
+        OUT_BATCH(GEN7_3DSTATE_STREAMOUT | (3 - 2));
+        OUT_BATCH(0);
+        OUT_BATCH(0);
 }
 
 static void
-gen7_emit_clip(struct intel_batchbuffer *batch) {
-	OUT_BATCH(GEN6_3DSTATE_CLIP | (4 - 2));
-	OUT_BATCH(0); 
-	OUT_BATCH(0); /*  pass-through */
-	OUT_BATCH(0);
+gen7_emit_sf(struct intel_batchbuffer *batch)
+{
+        OUT_BATCH(GEN7_3DSTATE_SF | (7 - 2));
+        OUT_BATCH(0);
+        OUT_BATCH(GEN7_3DSTATE_SF_CULL_NONE);
+        OUT_BATCH(2 << GEN7_3DSTATE_SF_TRIFAN_PROVOKE_SHIFT);
+        OUT_BATCH(0);
+        OUT_BATCH(0);
+        OUT_BATCH(0);
 }
 
 static void
-gen7_emit_sf(struct intel_batchbuffer *batch) {
+gen7_emit_sbe(struct intel_batchbuffer *batch)
+{
 	OUT_BATCH(GEN7_3DSTATE_SBE | (14 - 2));
-#ifdef GPU_HANG
-	OUT_BATCH(0 << 22 | 1 << 11 | 1 << 4);
-#else
-	OUT_BATCH(1 << 22 | 1 << 11 | 1 << 4);
-#endif
-	OUT_BATCH(0);
+	OUT_BATCH(1 << GEN7_SBE_NUM_OUTPUTS_SHIFT |
+		  1 << GEN7_SBE_URB_ENTRY_READ_LENGTH_SHIFT |
+		  1 << GEN7_SBE_URB_ENTRY_READ_OFFSET_SHIFT);
 	OUT_BATCH(0);
+	OUT_BATCH(0); /* dw4 */
 	OUT_BATCH(0);
 	OUT_BATCH(0);
 	OUT_BATCH(0);
+	OUT_BATCH(0); /* dw8 */
 	OUT_BATCH(0);
 	OUT_BATCH(0);
 	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-
-	OUT_BATCH(GEN6_3DSTATE_SF | (7 - 2));
-	OUT_BATCH(0);
-	OUT_BATCH(GEN6_3DSTATE_SF_CULL_NONE);
-//	OUT_BATCH(2 << GEN6_3DSTATE_SF_TRIFAN_PROVOKE_SHIFT);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
+	OUT_BATCH(0); /* dw12 */
 	OUT_BATCH(0);
 	OUT_BATCH(0);
 }
 
 static void
-gen7_emit_ps(struct intel_batchbuffer *batch, uint32_t kernel) {
-	const int max_threads = 86;
+gen7_emit_ps(struct intel_batchbuffer *batch)
+{
+	int threads;
 
-	OUT_BATCH(GEN6_3DSTATE_WM | (3 - 2));
-	OUT_BATCH(GEN7_WM_DISPATCH_ENABLE |
-		  /* XXX: I don't understand the BARYCENTRIC stuff, but it
-		   * appears we need it to put our setup data in the place we
-		   * expect (g6, see below) */
-		  GEN7_3DSTATE_PS_PERSPECTIVE_PIXEL_BARYCENTRIC);
-	OUT_BATCH(0);
+	if (IS_HASWELL(batch->devid))
+		threads = 40 << HSW_PS_MAX_THREADS_SHIFT | 1 << HSW_PS_SAMPLE_MASK_SHIFT;
+	else
+		threads = 40 << IVB_PS_MAX_THREADS_SHIFT;
 
-	OUT_BATCH(GEN6_3DSTATE_CONSTANT_PS | (7-2));
-	OUT_BATCH(0);
+	OUT_BATCH(GEN7_3DSTATE_PS | (8 - 2));
+	OUT_BATCH(batch_copy(batch, ps_kernel, sizeof(ps_kernel), 64));
+	OUT_BATCH(1 << GEN7_PS_SAMPLER_COUNT_SHIFT |
+		  2 << GEN7_PS_BINDING_TABLE_ENTRY_COUNT_SHIFT);
+	OUT_BATCH(0); /* scratch address */
+	OUT_BATCH(threads |
+		  GEN7_PS_16_DISPATCH_ENABLE |
+		  GEN7_PS_ATTRIBUTE_ENABLE);
+	OUT_BATCH(6 << GEN7_PS_DISPATCH_START_GRF_SHIFT_0);
 	OUT_BATCH(0);
 	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-
-	OUT_BATCH(GEN7_3DSTATE_PS | (8-2));
-	OUT_BATCH(kernel);
-	OUT_BATCH(1 << GEN6_3DSTATE_WM_SAMPLER_COUNT_SHITF |
-		  2 << GEN6_3DSTATE_WM_BINDING_TABLE_ENTRY_COUNT_SHIFT);
-	OUT_BATCH(0); /* scratch space stuff */
-	if (IS_HASWELL(batch->devid)) {
-		OUT_BATCH((max_threads - 1) << GEN7_3DSTATE_WM_MAX_THREADS_SHIFT |
-			  GEN7_3DSTATE_PS_ATTRIBUTE_ENABLED |
-			  GEN6_3DSTATE_WM_16_DISPATCH_ENABLE);
-	} else {
-		OUT_BATCH((max_threads - 1) << HSW_3DSTATE_WM_MAX_THREADS_SHIFT |
-			  GEN7_3DSTATE_PS_ATTRIBUTE_ENABLED |
-			  GEN6_3DSTATE_WM_16_DISPATCH_ENABLE);
-	}
-	OUT_BATCH(6 << GEN6_3DSTATE_WM_DISPATCH_START_GRF_0_SHIFT);
-	OUT_BATCH(0); // kernel 1
-	OUT_BATCH(0); // kernel 2
 }
 
 static void
-gen7_emit_depth(struct intel_batchbuffer *batch) {
-	OUT_BATCH(GEN7_3DSTATE_DEPTH_BUFFER | (7-2));
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-
-	OUT_BATCH(GEN7_3DSTATE_HIER_DEPTH_BUFFER | (3-2));
-	OUT_BATCH(0);
-	OUT_BATCH(0);
+gen7_emit_clip(struct intel_batchbuffer *batch)
+{
+        OUT_BATCH(GEN7_3DSTATE_CLIP | (4 - 2));
+        OUT_BATCH(0);
+        OUT_BATCH(0); /* pass-through */
+        OUT_BATCH(0);
 
-	OUT_BATCH(GEN7_3DSTATE_STENCIL_BUFFER | (3-2));
-	OUT_BATCH(0);
-	OUT_BATCH(0);
+        OUT_BATCH(GEN7_3DSTATE_VIEWPORT_STATE_POINTERS_SF_CL | (2 - 2));
+        OUT_BATCH(0);
 }
 
 static void
-gen7_emit_clear(struct intel_batchbuffer *batch) {
-	OUT_BATCH(GEN7_3DSTATE_CLEAR_PARAMS | (3-2));
-	OUT_BATCH(0);
-	OUT_BATCH(1); // clear valid
+gen7_emit_wm(struct intel_batchbuffer *batch)
+{
+        OUT_BATCH(GEN7_3DSTATE_WM | (3 - 2));
+        OUT_BATCH(GEN7_WM_DISPATCH_ENABLE |
+                  GEN7_WM_PERSPECTIVE_PIXEL_BARYCENTRIC);
+        OUT_BATCH(0);
 }
 
 static void
-gen6_emit_drawing_rectangle(struct intel_batchbuffer *batch, struct scratch_buf *dst)
+gen7_emit_null_depth_buffer(struct intel_batchbuffer *batch)
 {
-	OUT_BATCH(GEN6_3DSTATE_DRAWING_RECTANGLE | (4 - 2));
-	OUT_BATCH(0);
-	OUT_BATCH((buf_height(dst) - 1) << 16 | (buf_width(dst) - 1));
-	OUT_BATCH(0);
-}
+        OUT_BATCH(GEN7_3DSTATE_DEPTH_BUFFER | (7 - 2));
+        OUT_BATCH(GEN7_SURFACE_NULL << GEN7_3DSTATE_DEPTH_BUFFER_TYPE_SHIFT |
+                  GEN7_DEPTHFORMAT_D32_FLOAT << GEN7_3DSTATE_DEPTH_BUFFER_FORMAT_SHIFT);
+        OUT_BATCH(0); /* disable depth, stencil and hiz */
+        OUT_BATCH(0);
+        OUT_BATCH(0);
+        OUT_BATCH(0);
+        OUT_BATCH(0);
 
-/* Vertex elements MUST be defined before this according to spec */
-static void gen7_emit_primitive(struct intel_batchbuffer *batch, uint32_t offset)
-{
-	OUT_BATCH(GEN6_3DPRIMITIVE | (7-2));
-	OUT_BATCH(_3DPRIM_RECTLIST);
-	OUT_BATCH(3);	/* vertex count */
-	OUT_BATCH(0);	/*  We're specifying this instead with offset in GEN6_3DSTATE_VERTEX_BUFFERS */
-	OUT_BATCH(1);	/* single instance */
-	OUT_BATCH(0);	/* start instance location */
-	OUT_BATCH(0);	/* index buffer offset, ignored */
-}
-
-/* The general rule is if it's named gen6 it is directly copied from
- * gen6_render_copyfunc.
- *
- * This sets up most of the 3d pipeline, and most of that to NULL state. The
- * docs aren't specific about exactly what must be set up NULL, but the general
- * rule is we could be run at any time, and so the most state we set to NULL,
- * the better our odds of success.
- *
- * +---------------+ <---- 4096
- * |       ^       |
- * |       |       |
- * |    various    |
- * |      state    |
- * |       |       |
- * |_______|_______| <---- 2048 + ?
- * |       ^       |
- * |       |       |
- * |   batch       |
- * |    commands   |
- * |       |       |
- * |       |       |
- * +---------------+ <---- 0 + ?
- *
- * The batch commands point to state within tthe batch, so all state offsets should be
- * 0 < offset < 4096. Both commands and state build upwards, and are constructed
- * in that order. This means too many batch commands can delete state if not
- * careful.
- *
- */
+        OUT_BATCH(GEN7_3DSTATE_CLEAR_PARAMS | (3 - 2));
+        OUT_BATCH(0);
+        OUT_BATCH(0);
+}
 
 #define BATCH_STATE_SPLIT 2048
 void gen7_render_copyfunc(struct intel_batchbuffer *batch,
@@ -699,103 +517,52 @@ void gen7_render_copyfunc(struct intel_b
 			  unsigned width, unsigned height,
 			  struct scratch_buf *dst, unsigned dst_x, unsigned dst_y)
 {
-	uint32_t ps_sampler_state, ps_kernel_off, ps_binding_table;
-	uint32_t scissor_state;
-	uint32_t vertex_buffer;
 	uint32_t batch_end;
 
 	intel_batchbuffer_flush(batch);
 
-	batch_align(batch, 8);
-
-	batch->ptr = &batch->buffer[BATCH_STATE_SPLIT];
-
-	ps_binding_table  = gen7_bind_surfaces(batch, src, dst);
-	ps_sampler_state  = gen7_create_sampler(batch);
-	ps_kernel_off = batch_copy(batch, ps_kernel, sizeof(ps_kernel), 64);
-	vertex_buffer = gen7_fill_vertex_buffer_data(batch, src, src_x, src_y, dst_x, dst_y, width, height);
-	cc.cc_state = gen6_create_cc_state(batch);
-	cc.ds_state = gen6_create_depth_stencil_state(batch);
-	cc.blend_state = gen6_create_blend_state(batch);
-	viewport.cc_state = gen6_create_cc_viewport(batch);
-	viewport.sf_clip_state = gen7_create_sf_clip_viewport(batch);
-	scissor_state = gen6_create_scissor_rect(batch);
-	/* TODO: theree is other state which isn't setup */
-
-	assert(batch->ptr < &batch->buffer[4095]);
-
-	batch->ptr = batch->buffer;
-
-	/* Start emitting the commands. The order roughly follows the mesa blorp
-	 * order */
-	OUT_BATCH(GEN6_PIPELINE_SELECT | PIPELINE_SELECT_3D);
+	batch->state = &batch->buffer[BATCH_STATE_SPLIT];
 
-	gen6_emit_sip(batch);
-
-	gen7_emit_push_constants(batch);
+	OUT_BATCH(GEN7_PIPELINE_SELECT | PIPELINE_SELECT_3D);
 
 	gen7_emit_state_base_address(batch);
-
-	OUT_BATCH(GEN7_3DSTATE_VIEWPORT_STATE_POINTERS_CC);
-	OUT_BATCH(viewport.cc_state);
-	OUT_BATCH(GEN7_3DSTATE_VIEWPORT_STATE_POINTERS_SF_CLIP);
-	OUT_BATCH(viewport.sf_clip_state);
-
-	gen7_emit_urb(batch);
-
-	gen7_emit_cc(batch);
-
 	gen7_emit_multisample(batch);
-
-	gen7_emit_null_state(batch);
-
-	OUT_BATCH(GEN7_3DSTATE_STREAMOUT | 1);
-	OUT_BATCH(0);
-	OUT_BATCH(0);
-
+	gen7_emit_urb(batch);
+	gen7_emit_vs(batch);
+	gen7_emit_hs(batch);
+	gen7_emit_te(batch);
+	gen7_emit_ds(batch);
+	gen7_emit_gs(batch);
 	gen7_emit_clip(batch);
-
 	gen7_emit_sf(batch);
+	gen7_emit_wm(batch);
+	gen7_emit_streamout(batch);
+	gen7_emit_null_depth_buffer(batch);
 
-	OUT_BATCH(GEN7_3DSTATE_BINDING_TABLE_POINTERS_PS);
-	OUT_BATCH(ps_binding_table);
-
-	OUT_BATCH(GEN7_3DSTATE_SAMPLER_STATE_POINTERS_PS);
-	OUT_BATCH(ps_sampler_state);
-
-	gen7_emit_ps(batch, ps_kernel_off);
-
-	OUT_BATCH(GEN6_3DSTATE_SCISSOR_STATE_POINTERS);
-	OUT_BATCH(scissor_state);
-
-	gen7_emit_depth(batch);
-
-	gen7_emit_clear(batch);
-
-	gen6_emit_drawing_rectangle(batch, dst);
-
-	gen7_emit_vertex_buffer(batch, vertex_buffer);
-	gen6_emit_vertex_elements(batch);
-
-	gen7_emit_primitive(batch, vertex_buffer);
+	gen7_emit_cc(batch);
+        gen7_emit_sampler(batch);
+        gen7_emit_sbe(batch);
+        gen7_emit_ps(batch);
+        gen7_emit_vertex_elements(batch);
+        gen7_emit_vertex_buffer(batch,
+				src_x, src_y, dst_x, dst_y, width, height);
+	gen7_emit_binding_table(batch, src, dst);
+	gen7_emit_drawing_rectangle(batch, dst);
+
+        OUT_BATCH(GEN7_3DPRIMITIVE | (7- 2));
+        OUT_BATCH(GEN7_3DPRIMITIVE_VERTEX_SEQUENTIAL | _3DPRIM_RECTLIST);
+        OUT_BATCH(3);
+        OUT_BATCH(0);
+        OUT_BATCH(1);   /* single instance */
+        OUT_BATCH(0);   /* start instance location */
+        OUT_BATCH(0);   /* index buffer offset, ignored */
 
 	OUT_BATCH(MI_BATCH_BUFFER_END);
 
-	batch_end = batch_align(batch, 8);
+	batch_end = batch->ptr - batch->buffer;
+	batch_end = ALIGN(batch_end, 8);
 	assert(batch_end < BATCH_STATE_SPLIT);
 
-	dump_batch(batch);
-
-	gen6_render_flush(batch, batch_end);
+	gen7_render_flush(batch, batch_end);
 	intel_batchbuffer_reset(batch);
 }
-
-#if DEBUG_RENDERCPY
-static void dump_batch(struct intel_batchbuffer *batch) {
-	int fd = open("/tmp/i965-batchbuffers.dump", O_WRONLY | O_CREAT,  0666);
-	if (fd != -1) {
-		write(fd, batch->buffer, 4096);
-		fd = close(fd);
-	}
-}
-#endif
diff -rupN dump_1/lib/rendercopy.h dump/lib/rendercopy.h
--- dump_1/lib/rendercopy.h	2001-01-14 08:11:40.064619273 +0800
+++ dump/lib/rendercopy.h	2001-01-14 08:10:57.845624331 +0800
@@ -63,6 +63,8 @@ typedef void (*render_copyfunc_t)(struct
 				  unsigned width, unsigned height,
 				  struct scratch_buf *dst, unsigned dst_x, unsigned dst_y);
 
+render_copyfunc_t get_render_copyfunc(int devid);
+
 void gen7_render_copyfunc(struct intel_batchbuffer *batch,
 			  struct scratch_buf *src, unsigned src_x, unsigned src_y,
 			  unsigned width, unsigned height,
diff -rupN dump_1/lib/rendercopy_i830.c dump/lib/rendercopy_i830.c
--- dump_1/lib/rendercopy_i830.c	2001-01-14 08:11:40.065619273 +0800
+++ dump/lib/rendercopy_i830.c	2001-01-14 08:10:57.845624331 +0800
@@ -227,3 +227,19 @@ void gen2_render_copyfunc(struct intel_b
 
 	intel_batchbuffer_flush(batch);
 }
+
+render_copyfunc_t get_render_copyfunc(int devid)
+{
+	render_copyfunc_t copy = NULL;
+
+	if (IS_GEN2(devid))
+		copy = gen2_render_copyfunc;
+	else if (IS_GEN3(devid))
+		copy = gen3_render_copyfunc;
+	else if (IS_GEN6(devid))
+		copy = gen6_render_copyfunc;
+	else if (IS_GEN7(devid))
+		copy = gen7_render_copyfunc;
+
+	return copy;
+}
diff -rupN dump_1/README dump/README
--- dump_1/README	2001-01-14 08:11:40.049619271 +0800
+++ dump/README	2001-01-14 08:10:57.846624230 +0800
@@ -36,6 +36,30 @@ tests/
 	  options to test different kms functionality, again read the source of
 	  the details.
 
+	The more comfortable way to run tests is with piglit. First grab piglit
+	from
+
+	git://anongit.freedesktop.org/piglit
+
+	and build it (no need to install anything). Then we need to link up the
+	i-g-t sources with piglit
+
+	piglit-sources $ cd bin
+	piglit-sources/bin $ ln $i-g-t-sources igt -s
+
+	The tests in the i-g-t sources need to have been built already. Then we
+	can run the testcases with (as usual as root, no other drm clients
+	running):
+
+	piglit-sources # ./piglit-run.py tests/igt.tests <results-file>
+
+	The testlist is built at runtime, so no need to update anything in
+	piglit when adding new tests. See
+
+	piglit-sources $ ./piglit-run.py -h
+
+	for some useful options.
+
 lib/
 	Common helper functions and headers used by the other tools.
 
diff -rupN dump_1/tests/gem_basic.c dump/tests/gem_basic.c
--- dump_1/tests/gem_basic.c	2001-01-14 08:11:40.073619273 +0800
+++ dump/tests/gem_basic.c	2001-01-14 08:10:57.847624131 +0800
@@ -80,11 +80,16 @@ int main(int argc, char **argv)
 {
 	int fd;
 
+	drmtest_subtest_init(argc, argv);
+
 	fd = drm_open_any();
 
-	test_bad_close(fd);
-	test_create_close(fd);
-	test_create_fd_close(fd);
+	if (drmtest_run_subtest("bad-close"))
+		test_bad_close(fd);
+	if (drmtest_run_subtest("create-close"))
+		test_create_close(fd);
+	if (drmtest_run_subtest("create-fd-close"))
+		test_create_fd_close(fd);
 
 	return 0;
 }
diff -rupN dump_1/tests/gem_cacheing.c dump/tests/gem_cacheing.c
--- dump_1/tests/gem_cacheing.c	2001-01-14 08:11:40.073619273 +0800
+++ dump/tests/gem_cacheing.c	2001-01-14 08:10:57.848624033 +0800
@@ -110,21 +110,27 @@ int main(int argc, char **argv)
 	int i, j;
 	uint8_t *cpu_ptr;
 	uint8_t *gtt_ptr;
+	bool skipped_all = true;
+
+	drmtest_subtest_init(argc, argv);
 
 	srandom(0xdeadbeef);
 
 	fd = drm_open_any();
 
-	if (!gem_has_cacheing(fd))
+	if (!gem_has_cacheing(fd)) {
+		printf("no set_caching support detected\n");
 		return 77;
+	}
 
 	devid = intel_get_drm_devid(fd);
 	if (IS_GEN2(devid)) /* chipset only handles cached -> uncached */
 		flags &= ~TEST_READ;
-	if (IS_965(devid)) /* chipset is completely fubar */
+	if (IS_BROADWATER(devid) || IS_CRESTLINE(devid)) {
+		/* chipset is completely fubar */
+		printf("coherency broken on i965g/gm\n");
 		flags = 0;
-	if (flags == 0)
-		return 77;
+	}
 
 	bufmgr = drm_intel_bufmgr_gem_init(fd, 4096);
 	batch = intel_batchbuffer_alloc(bufmgr, devid);
@@ -138,8 +144,10 @@ int main(int argc, char **argv)
 	drmtest_init_aperture_trashers(bufmgr);
 	mappable_gtt_limit = gem_mappable_aperture_size();
 
-	if (flags & TEST_READ) {
+	if (drmtest_run_subtest("reads") && (flags & TEST_READ)) {
 		printf("checking partial reads\n");
+		skipped_all = false;
+
 		for (i = 0; i < ROUNDS; i++) {
 			uint8_t val0 = i;
 			int start, len;
@@ -164,8 +172,10 @@ int main(int argc, char **argv)
 		}
 	}
 
-	if (flags & TEST_WRITE) {
+	if (drmtest_run_subtest("writes") && (flags & TEST_WRITE)) {
 		printf("checking partial writes\n");
+		skipped_all = false;
+
 		for (i = 0; i < ROUNDS; i++) {
 			uint8_t val0 = i, val1;
 			int start, len;
@@ -212,8 +222,10 @@ int main(int argc, char **argv)
 		}
 	}
 
-	if ((flags & TEST_BOTH) == TEST_BOTH) {
+	if (drmtest_run_subtest("read-writes") && (flags & TEST_BOTH) == TEST_BOTH) {
 		printf("checking partial writes after partial reads\n");
+		skipped_all = false;
+
 		for (i = 0; i < ROUNDS; i++) {
 			uint8_t val0 = i, val1, val2;
 			int start, len;
@@ -286,5 +298,5 @@ int main(int argc, char **argv)
 
 	close(fd);
 
-	return 0;
+	return skipped_all ? 77 : 0;
 }
diff -rupN dump_1/tests/gem_cpu_concurrent_blit.c dump/tests/gem_cpu_concurrent_blit.c
--- dump_1/tests/gem_cpu_concurrent_blit.c	2001-01-14 08:11:40.073619273 +0800
+++ dump/tests/gem_cpu_concurrent_blit.c	2001-01-14 08:10:57.848624033 +0800
@@ -93,11 +93,13 @@ main(int argc, char **argv)
 	drm_intel_bufmgr *bufmgr;
 	struct intel_batchbuffer *batch;
 	int num_buffers = 128, max;
-	drm_intel_bo *src[128], *dst[128], *dummy;
+	drm_intel_bo *src[128], *dst[128], *dummy = NULL;
 	int width = 512, height = 512;
 	int fd;
 	int i;
 
+	drmtest_subtest_init(argc, argv);
+
 	fd = drm_open_any();
 
 	max = gem_aperture_size (fd) / (1024 * 1024) / 2;
@@ -108,35 +110,45 @@ main(int argc, char **argv)
 	drm_intel_bufmgr_gem_enable_reuse(bufmgr);
 	batch = intel_batchbuffer_alloc(bufmgr, intel_get_drm_devid(fd));
 
-	for (i = 0; i < num_buffers; i++) {
-		src[i] = create_bo(bufmgr, i, width, height);
-		dst[i] = create_bo(bufmgr, ~i, width, height);
+	if (!drmtest_only_list_subtests()) {
+		for (i = 0; i < num_buffers; i++) {
+			src[i] = create_bo(bufmgr, i, width, height);
+			dst[i] = create_bo(bufmgr, ~i, width, height);
+		}
+		dummy = create_bo(bufmgr, 0, width, height);
 	}
-	dummy = create_bo(bufmgr, 0, width, height);
 
 	/* try to overwrite the source values */
-	for (i = 0; i < num_buffers; i++)
-		intel_copy_bo(batch, dst[i], src[i], width, height);
-	for (i = num_buffers; i--; )
-		set_bo(src[i], 0xdeadbeef, width, height);
-	for (i = 0; i < num_buffers; i++)
-		cmp_bo(dst[i], i, width, height);
+	if (drmtest_run_subtest("overwrite-source")) {
+		for (i = 0; i < num_buffers; i++)
+			intel_copy_bo(batch, dst[i], src[i], width, height);
+		for (i = num_buffers; i--; )
+			set_bo(src[i], 0xdeadbeef, width, height);
+		for (i = 0; i < num_buffers; i++)
+			cmp_bo(dst[i], i, width, height);
+	}
 
 	/* try to read the results before the copy completes */
-	for (i = 0; i < num_buffers; i++)
-		intel_copy_bo(batch, dst[i], src[i], width, height);
-	for (i = num_buffers; i--; )
-		cmp_bo(dst[i], 0xdeadbeef, width, height);
+	if (drmtest_run_subtest("early-read")) {
+		for (i = num_buffers; i--; )
+			set_bo(src[i], 0xdeadbeef, width, height);
+		for (i = 0; i < num_buffers; i++)
+			intel_copy_bo(batch, dst[i], src[i], width, height);
+		for (i = num_buffers; i--; )
+			cmp_bo(dst[i], 0xdeadbeef, width, height);
+	}
 
 	/* and finally try to trick the kernel into loosing the pending write */
-	for (i = num_buffers; i--; )
-		set_bo(src[i], 0xabcdabcd, width, height);
-	for (i = 0; i < num_buffers; i++)
-		intel_copy_bo(batch, dst[i], src[i], width, height);
-	for (i = num_buffers; i--; )
-		intel_copy_bo(batch, dummy, dst[i], width, height);
-	for (i = num_buffers; i--; )
-		cmp_bo(dst[i], 0xabcdabcd, width, height);
+	if (drmtest_run_subtest("gpu-read-after-write")) {
+		for (i = num_buffers; i--; )
+			set_bo(src[i], 0xabcdabcd, width, height);
+		for (i = 0; i < num_buffers; i++)
+			intel_copy_bo(batch, dst[i], src[i], width, height);
+		for (i = num_buffers; i--; )
+			intel_copy_bo(batch, dummy, dst[i], width, height);
+		for (i = num_buffers; i--; )
+			cmp_bo(dst[i], 0xabcdabcd, width, height);
+	}
 
 	return 0;
 }
diff -rupN dump_1/tests/gem_cpu_reloc.c dump/tests/gem_cpu_reloc.c
--- dump_1/tests/gem_cpu_reloc.c	1970-01-01 07:30:00.000000000 +0730
+++ dump/tests/gem_cpu_reloc.c	2001-01-14 08:10:57.849623937 +0800
@@ -0,0 +1,216 @@
+/*
+ * Copyright © 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Chris Wilson <chris@chris-wilson.co.uk>
+ *
+ */
+
+/*
+ * Testcase: Test the relocations through the CPU domain
+ *
+ * Attempt to stress test performing relocations whilst the batch is in the
+ * CPU domain.
+ *
+ * A freshly allocated buffer starts in the CPU domain, and the pwrite
+ * should also be performed whilst in the CPU domain and so we should
+ * execute the relocations within the CPU domain. If for any reason one of
+ * those steps should land it in the GTT domain, we take the secondary
+ * precaution of filling the mappable portion of the GATT.
+ *
+ * In order to detect whether a relocation fails, we first fill a target
+ * buffer with a sequence of invalid commands that would cause the GPU to
+ * immediate hang, and then attempt to overwrite them with a legal, if
+ * short, batchbuffer using a BLT. Then we come to execute the bo, if the
+ * relocation fail and we either copy across all zeros or garbage, then the
+ * GPU will hang.
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <assert.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include "drm.h"
+#include "i915_drm.h"
+#include "drmtest.h"
+#include "intel_bufmgr.h"
+#include "intel_batchbuffer.h"
+#include "intel_gpu_tools.h"
+
+static uint32_t use_blt;
+
+static void copy(int fd, uint32_t batch, uint32_t src, uint32_t dst)
+{
+	struct drm_i915_gem_execbuffer2 execbuf;
+	struct drm_i915_gem_relocation_entry gem_reloc[2];
+	struct drm_i915_gem_exec_object2 gem_exec[3];
+
+	gem_reloc[0].offset = 4 * sizeof(uint32_t);
+	gem_reloc[0].delta = 0;
+	gem_reloc[0].target_handle = dst;
+	gem_reloc[0].read_domains = I915_GEM_DOMAIN_RENDER;
+	gem_reloc[0].write_domain = I915_GEM_DOMAIN_RENDER;
+	gem_reloc[0].presumed_offset = 0;
+
+	gem_reloc[1].offset = 7 * sizeof(uint32_t);
+	gem_reloc[1].delta = 0;
+	gem_reloc[1].target_handle = src;
+	gem_reloc[1].read_domains = I915_GEM_DOMAIN_RENDER;
+	gem_reloc[1].write_domain = 0;
+	gem_reloc[1].presumed_offset = 0;
+
+	memset(gem_exec, 0, sizeof(gem_exec));
+	gem_exec[0].handle = src;
+	gem_exec[1].handle = dst;
+	gem_exec[2].handle = batch;
+	gem_exec[2].relocation_count = 2;
+	gem_exec[2].relocs_ptr = (uintptr_t)gem_reloc;
+
+	memset(&execbuf, 0, sizeof(execbuf));
+	execbuf.buffers_ptr = (uintptr_t)gem_exec;
+	execbuf.buffer_count = 3;
+	execbuf.batch_len = 4096;
+	execbuf.flags = use_blt;
+
+	do_or_die(drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf));
+}
+
+static void exec(int fd, uint32_t handle)
+{
+	struct drm_i915_gem_execbuffer2 execbuf;
+	struct drm_i915_gem_exec_object2 gem_exec;
+
+	memset(&gem_exec, 0, sizeof(gem_exec));
+	gem_exec.handle = handle;
+
+	memset(&execbuf, 0, sizeof(execbuf));
+	execbuf.buffers_ptr = (uintptr_t)&gem_exec;
+	execbuf.buffer_count = 1;
+	execbuf.batch_len = 4096;
+
+	do_or_die(drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf));
+}
+
+int main(int argc, char **argv)
+{
+	const uint32_t batch[] = {
+		(XY_SRC_COPY_BLT_CMD |
+		 XY_SRC_COPY_BLT_WRITE_ALPHA |
+		 XY_SRC_COPY_BLT_WRITE_RGB),
+		(3 << 24 | /* 32 bits */
+		 0xcc << 16 | /* copy ROP */
+		 4096),
+		0 << 16 | 0, /* dst x1, y1 */
+		1 << 16 | 2,
+		0, /* dst relocation */
+		0 << 16 | 0, /* src x1, y1 */
+		4096,
+		0, /* src relocation */
+		MI_BATCH_BUFFER_END,
+	};
+	const uint32_t hang[] = {-1, -1, -1, -1};
+	const uint32_t end[] = {MI_BATCH_BUFFER_END, 0};
+	uint64_t aper_size;
+	uint32_t noop;
+	uint32_t *handles;
+	int fd, i, count;
+
+	fd = drm_open_any();
+	noop = intel_get_drm_devid(fd);
+
+	use_blt = 0;
+	if (intel_gen(noop) >= 6)
+		use_blt = I915_EXEC_BLT;
+
+	aper_size = gem_mappable_aperture_size();
+	if (intel_get_total_ram_mb() < aper_size / (1024*1024) * 2) {
+		fprintf(stderr, "not enough mem to run test\n");
+		return 77;
+	}
+
+	count = aper_size / 4096 * 2;
+	handles = malloc (count * sizeof(uint32_t));
+	assert(handles);
+
+	noop = gem_create(fd, 4096);
+	gem_write(fd, noop, 0, end, sizeof(end));
+
+	/* fill the entire gart with batches and run them */
+	for (i = 0; i < count; i++) {
+		uint32_t bad;
+
+		handles[i] = gem_create(fd, 4096);
+		gem_write(fd, handles[i], 0, batch, sizeof(batch));
+
+		bad = gem_create(fd, 4096);
+		gem_write(fd, bad, 0, hang, sizeof(hang));
+
+		/* launch the newly created batch */
+		copy(fd, handles[i], noop, bad);
+		exec(fd, bad);
+		gem_close(fd, bad);
+
+		drmtest_progress("gem_cpu_reloc: ", i, 2*count);
+	}
+
+	/* And again in reverse to try and catch the relocation code out */
+	for (i = 0; i < count; i++) {
+		uint32_t bad;
+
+		bad = gem_create(fd, 4096);
+		gem_write(fd, bad, 0, hang, sizeof(hang));
+
+		/* launch the newly created batch */
+		copy(fd, handles[count-i-1], noop, bad);
+		exec(fd, bad);
+		gem_close(fd, bad);
+
+		drmtest_progress("gem_cpu_reloc: ", count+i, 3*count);
+	}
+
+	/* Third time lucky? */
+	for (i = 0; i < count; i++) {
+		uint32_t bad;
+
+		bad = gem_create(fd, 4096);
+		gem_write(fd, bad, 0, hang, sizeof(hang));
+
+		/* launch the newly created batch */
+		gem_set_domain(fd, handles[i],
+			       I915_GEM_DOMAIN_CPU, I915_GEM_DOMAIN_CPU);
+		copy(fd, handles[i], noop, bad);
+		exec(fd, bad);
+		gem_close(fd, bad);
+
+		drmtest_progress("gem_cpu_reloc: ", 2*count+i, 3*count);
+	}
+
+	printf("Test suceeded, cleanup up - this might take a while.\n");
+	close(fd);
+
+	return 0;
+}
diff -rupN dump_1/tests/gem_cs_prefetch.c dump/tests/gem_cs_prefetch.c
--- dump_1/tests/gem_cs_prefetch.c	2001-01-14 08:11:40.074619273 +0800
+++ dump/tests/gem_cs_prefetch.c	2001-01-14 08:10:57.850623842 +0800
@@ -31,6 +31,9 @@
  * Historically the batch prefetcher doesn't check whether it's crossing page
  * boundaries and likes to throw up when it gets a pagefault in return for his
  * over-eager behaviour. Check for this.
+ *
+ * This test for a bug where we've failed to plug a scratch pte entry into the
+ * very last gtt pte.
  */
 #include <stdlib.h>
 #include <stdio.h>
@@ -160,7 +163,7 @@ int main(int argc, char **argv)
 		drmtest_progress("gem_cs_prefetch: ", i, count);
 	}
 
-	fprintf(stderr, "Test suceeded, cleanup up - this might take a while.\n");
+	printf("Test suceeded, cleanup up - this might take a while.\n");
 	drm_intel_bufmgr_destroy(bufmgr);
 
 	close(fd);
diff -rupN dump_1/tests/gem_cs_tlb.c dump/tests/gem_cs_tlb.c
--- dump_1/tests/gem_cs_tlb.c	1970-01-01 07:30:00.000000000 +0730
+++ dump/tests/gem_cs_tlb.c	2001-01-14 08:10:57.850623842 +0800
@@ -0,0 +1,179 @@
+/*
+ * Copyright © 2011,2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Chris Wilson <chris@chris-wilson.co.uk>
+ *    Daniel Vetter <daniel.vetter@ffwll.ch>
+ *
+ */
+
+/*
+ * Testcase: Check whether we correctly invalidate the cs tlb
+ *
+ * Motivated by a strange bug on launchpad where *acth != ipehr, on snb notably
+ * where everything should be coherent by default.
+ *
+ * https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1063252
+ */
+
+#include <unistd.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <assert.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <sys/time.h>
+#include "drm.h"
+#include "i915_drm.h"
+#include "drmtest.h"
+#include "intel_gpu_tools.h"
+
+#define BATCH_SIZE (1024*1024)
+bool skipped_all = true;
+
+static int exec(int fd, uint32_t handle, int split,
+		uint64_t *gtt_ofs, unsigned ring_id)
+{
+	struct drm_i915_gem_execbuffer2 execbuf;
+	struct drm_i915_gem_exec_object2 gem_exec[1];
+	int ret = 0;
+
+	gem_exec[0].handle = handle;
+	gem_exec[0].relocation_count = 0;
+	gem_exec[0].relocs_ptr = 0;
+	gem_exec[0].alignment = 0;
+	gem_exec[0].offset = 0x00100000;
+	gem_exec[0].flags = 0;
+	gem_exec[0].rsvd1 = 0;
+	gem_exec[0].rsvd2 = 0;
+
+	execbuf.buffers_ptr = (uintptr_t)gem_exec;
+	execbuf.buffer_count = 1;
+	execbuf.batch_start_offset = 0;
+	execbuf.batch_len = 8*(split+1);
+	execbuf.cliprects_ptr = 0;
+	execbuf.num_cliprects = 0;
+	execbuf.DR1 = 0;
+	execbuf.DR4 = 0;
+	execbuf.flags = ring_id;
+	i915_execbuffer2_set_context_id(execbuf, 0);
+	execbuf.rsvd2 = 0;
+
+	ret = drmIoctl(fd,
+		       DRM_IOCTL_I915_GEM_EXECBUFFER2,
+		       &execbuf);
+
+	*gtt_ofs = gem_exec[0].offset;
+
+	return ret;
+}
+
+static void run_on_ring(int fd, unsigned ring_id, const char *ring_name)
+{
+	uint32_t handle, handle_new;
+	uint64_t gtt_offset, gtt_offset_new;
+	uint32_t *batch_ptr, *batch_ptr_old;
+	unsigned split;
+	char buf[100];
+	int i;
+
+	sprintf(buf, "testing %s cs tlb coherency: ", ring_name);
+	skipped_all = false;
+
+	/* Shut up gcc, too stupid. */
+	batch_ptr_old = NULL;
+	handle = 0;
+	gtt_offset = 0;
+
+	for (split = 0; split < BATCH_SIZE/8 - 1; split += 2) {
+		drmtest_progress(buf, split, BATCH_SIZE/8 - 1);
+
+		handle_new = gem_create(fd, BATCH_SIZE);
+		batch_ptr = gem_mmap__cpu(fd, handle_new, BATCH_SIZE,
+					  PROT_READ | PROT_WRITE);
+		batch_ptr[split*2] = MI_BATCH_BUFFER_END;
+
+		for (i = split*2 + 2; i < BATCH_SIZE/8; i++)
+			batch_ptr[i] = 0xffffffff;
+
+		if (split > 0) {
+			gem_sync(fd, handle);
+			gem_close(fd, handle);
+		}
+
+		if (exec(fd, handle_new, split, &gtt_offset_new, 0))
+			exit(1);
+
+		if (split > 0) {
+			/* Check that we've managed to collide in the tlb. */
+			assert(gtt_offset == gtt_offset_new);
+
+			/* We hang onto the storage of the old batch by keeping
+			 * the cpu mmap around. */
+			munmap(batch_ptr_old, BATCH_SIZE);
+		}
+
+		handle = handle_new;
+		gtt_offset = gtt_offset_new;
+		batch_ptr_old = batch_ptr;
+	}
+
+}
+
+int main(int argc, char **argv)
+{
+	int fd;
+	uint32_t devid;
+
+	drmtest_subtest_init(argc, argv);
+
+	fd = drm_open_any();
+	devid = intel_get_drm_devid(fd);
+
+	if (!drmtest_only_list_subtests()) {
+		/* This test is very sensitive to residual gtt_mm noise from previous
+		 * tests. Try to quiet thing down first. */
+		gem_quiescent_gpu(fd);
+		sleep(5); /* needs more serious ducttape */
+	}
+
+	if (drmtest_run_subtest("render"))
+		run_on_ring(fd, I915_EXEC_RENDER, "render");
+
+	if (drmtest_run_subtest("bsd"))
+		if (HAS_BSD_RING(devid))
+			run_on_ring(fd, I915_EXEC_BSD, "bsd");
+
+	if (drmtest_run_subtest("blt"))
+		if (HAS_BLT_RING(devid))
+			run_on_ring(fd, I915_EXEC_BLT, "blt");
+
+	close(fd);
+
+	return skipped_all ? 77 : 0;
+}
diff -rupN dump_1/tests/gem_ctx_basic.c dump/tests/gem_ctx_basic.c
--- dump_1/tests/gem_ctx_basic.c	2001-01-14 08:11:40.074619273 +0800
+++ dump/tests/gem_ctx_basic.c	2001-01-14 08:10:57.851623749 +0800
@@ -58,6 +58,7 @@ static void init_buffer(drm_intel_bufmgr
 static void *work(void *arg)
 {
 	struct intel_batchbuffer *batch;
+	render_copyfunc_t rendercopy = get_render_copyfunc(devid);
 	drm_intel_context *context;
 	drm_intel_bufmgr *bufmgr;
 	int thread_id = *(int *)arg;
@@ -88,7 +89,8 @@ static void *work(void *arg)
 
 
 		if (uncontexted) {
-			gen6_render_copyfunc(batch, &src, 0, 0, 0, 0, &dst, 0, 0);
+			assert(rendercopy);
+			rendercopy(batch, &src, 0, 0, 0, 0, &dst, 0, 0);
 		} else {
 			int ret;
 			ret = drm_intel_bo_subdata(batch->bo, 0, 4096, batch->buffer);
diff -rupN dump_1/tests/gem_ctx_create.c dump/tests/gem_ctx_create.c
--- dump_1/tests/gem_ctx_create.c	2001-01-14 08:11:40.074619273 +0800
+++ dump/tests/gem_ctx_create.c	2001-01-14 08:10:57.851623749 +0800
@@ -49,7 +49,7 @@ int main(int argc, char *argv[])
 
 	ret = drmIoctl(fd, CONTEXT_CREATE_IOCTL, &create);
 	if (ret != 0 && (errno == ENODEV || errno == EINVAL)) {
-		fprintf(stderr, "Kernel is too old, or contexts not supported: %s\n",
+		printf("Kernel is too old, or contexts not supported: %s\n",
 			strerror(errno));
 		exit(77);
 	} else if (ret != 0) {
diff -rupN dump_1/tests/gem_dummy_reloc_loop.c dump/tests/gem_dummy_reloc_loop.c
--- dump_1/tests/gem_dummy_reloc_loop.c	2001-01-14 08:11:40.075619273 +0800
+++ dump/tests/gem_dummy_reloc_loop.c	2001-01-14 08:10:57.852623658 +0800
@@ -127,10 +127,7 @@ int main(int argc, char **argv)
 	int fd;
 	int devid;
 
-	if (argc != 1) {
-		fprintf(stderr, "usage: %s\n", argv[0]);
-		exit(-1);
-	}
+	drmtest_subtest_init(argc, argv);
 
 	fd = drm_open_any();
 	devid = intel_get_drm_devid(fd);
@@ -158,32 +155,39 @@ int main(int argc, char **argv)
 		exit(-1);
 	}
 
-	fprintf(stderr, "running dummy loop on render\n");
-	dummy_reloc_loop(I915_EXEC_RENDER);
-	fprintf(stderr, "dummy loop run on render completed\n");
-
-	if (!HAS_BSD_RING(devid))
-		goto skip;
-
-	sleep(2);
-	fprintf(stderr, "running dummy loop on bsd\n");
-	dummy_reloc_loop(I915_EXEC_BSD);
-	fprintf(stderr, "dummy loop run on bsd completed\n");
-
-	if (!HAS_BLT_RING(devid))
-		goto skip;
-
-	sleep(2);
-	fprintf(stderr, "running dummy loop on blt\n");
-	dummy_reloc_loop(I915_EXEC_BLT);
-	fprintf(stderr, "dummy loop run on blt completed\n");
-
-	sleep(2);
-	fprintf(stderr, "running dummy loop on random rings\n");
-	dummy_reloc_loop_random_ring();
-	fprintf(stderr, "dummy loop run on random rings completed\n");
+	if (drmtest_run_subtest("render")) {
+		printf("running dummy loop on render\n");
+		dummy_reloc_loop(I915_EXEC_RENDER);
+		printf("dummy loop run on render completed\n");
+	}
+
+	if (drmtest_run_subtest("bsd")) {
+		if (HAS_BSD_RING(devid)) {
+			sleep(2);
+			printf("running dummy loop on bsd\n");
+			dummy_reloc_loop(I915_EXEC_BSD);
+			printf("dummy loop run on bsd completed\n");
+		}
+	}
+
+	if (drmtest_run_subtest("blt")) {
+		if (HAS_BLT_RING(devid)) {
+			sleep(2);
+			printf("running dummy loop on blt\n");
+			dummy_reloc_loop(I915_EXEC_BLT);
+			printf("dummy loop run on blt completed\n");
+		}
+	}
+
+	if (drmtest_run_subtest("mixed")) {
+		if (HAS_BLT_RING(devid) && HAS_BSD_RING(devid)) {
+			sleep(2);
+			printf("running dummy loop on random rings\n");
+			dummy_reloc_loop_random_ring();
+			printf("dummy loop run on random rings completed\n");
+		}
+	}
 
-skip:
 	drm_intel_bo_unreference(target_buffer);
 	intel_batchbuffer_free(batch);
 	drm_intel_bufmgr_destroy(bufmgr);
diff -rupN dump_1/tests/gem_exec_bad_domains.c dump/tests/gem_exec_bad_domains.c
--- dump_1/tests/gem_exec_bad_domains.c	2001-01-14 08:11:40.075619273 +0800
+++ dump/tests/gem_exec_bad_domains.c	2001-01-14 08:10:57.853623567 +0800
@@ -83,11 +83,81 @@ run_batch(void)
 	return ret;
 }
 
+#define I915_GEM_GPU_DOMAINS \
+	(I915_GEM_DOMAIN_RENDER | \
+	 I915_GEM_DOMAIN_SAMPLER | \
+	 I915_GEM_DOMAIN_COMMAND | \
+	 I915_GEM_DOMAIN_INSTRUCTION | \
+	 I915_GEM_DOMAIN_VERTEX)
+
+static void multi_write_domain(int fd)
+{
+	struct drm_i915_gem_execbuffer2 execbuf;
+	struct drm_i915_gem_exec_object2 exec[2];
+	struct drm_i915_gem_relocation_entry reloc[1];
+	uint32_t handle, handle_target;
+	int ret;
+
+	handle = gem_create(fd, 4096);
+	handle_target = gem_create(fd, 4096);
+
+	exec[0].handle = handle_target;
+	exec[0].relocation_count = 0;
+	exec[0].relocs_ptr = 0;
+	exec[0].alignment = 0;
+	exec[0].offset = 0;
+	exec[0].flags = 0;
+	exec[0].rsvd1 = 0;
+	exec[0].rsvd2 = 0;
+
+	exec[1].handle = handle;
+	exec[1].relocation_count = 1;
+	exec[1].relocs_ptr = (uintptr_t) reloc;
+	exec[1].alignment = 0;
+	exec[1].offset = 0;
+	exec[1].flags = 0;
+	exec[1].rsvd1 = 0;
+	exec[1].rsvd2 = 0;
+
+	reloc[0].offset = 4;
+	reloc[0].delta = 0;
+	reloc[0].target_handle = handle_target;
+	reloc[0].read_domains = I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION;
+	reloc[0].write_domain = I915_GEM_DOMAIN_RENDER | I915_GEM_DOMAIN_INSTRUCTION;
+	reloc[0].presumed_offset = 0;
+
+	execbuf.buffers_ptr = (uintptr_t)exec;
+	execbuf.buffer_count = 2;
+	execbuf.batch_start_offset = 0;
+	execbuf.batch_len = 8;
+	execbuf.cliprects_ptr = 0;
+	execbuf.num_cliprects = 0;
+	execbuf.DR1 = 0;
+	execbuf.DR4 = 0;
+	execbuf.flags = 0;
+	i915_execbuffer2_set_context_id(execbuf, 0);
+	execbuf.rsvd2 = 0;
+
+	ret = drmIoctl(fd,
+		       DRM_IOCTL_I915_GEM_EXECBUFFER2,
+		       &execbuf);
+
+	gem_close(fd, handle);
+	gem_close(fd, handle_target);
+
+	if (ret == 0 || errno != EINVAL) {
+		fprintf(stderr, "multiple write domains not rejected\n");
+		exit(1);
+	}
+}
+
 int main(int argc, char **argv)
 {
 	int fd, ret;
 	drm_intel_bo *tmp;
 
+	drmtest_subtest_init(argc, argv);
+
 	fd = drm_open_any();
 
 	bufmgr = drm_intel_bufmgr_gem_init(fd, 4096);
@@ -96,44 +166,94 @@ int main(int argc, char **argv)
 
 	tmp = drm_intel_bo_alloc(bufmgr, "tmp", 128 * 128, 4096);
 
-	BEGIN_BATCH(2);
-	OUT_BATCH(0);
-	OUT_RELOC(tmp, I915_GEM_DOMAIN_CPU, 0, 0);
-	ADVANCE_BATCH();
-	ret = run_batch();
-	if (ret != -EINVAL) {
-		fprintf(stderr, "(cpu, 0) reloc not rejected\n");
-		exit(1);
+	if (drmtest_run_subtest("cpu-domain")) {
+		BEGIN_BATCH(2);
+		OUT_BATCH(0);
+		OUT_RELOC(tmp, I915_GEM_DOMAIN_CPU, 0, 0);
+		ADVANCE_BATCH();
+		ret = run_batch();
+		if (ret != -EINVAL) {
+			fprintf(stderr, "(cpu, 0) reloc not rejected\n");
+			exit(1);
+		}
+
+		BEGIN_BATCH(2);
+		OUT_BATCH(0);
+		OUT_RELOC(tmp, I915_GEM_DOMAIN_CPU, I915_GEM_DOMAIN_CPU, 0);
+		ADVANCE_BATCH();
+		ret = run_batch();
+		if (ret != -EINVAL) {
+			fprintf(stderr, "(cpu, cpu) reloc not rejected\n");
+			exit(1);
+		}
 	}
 
-	BEGIN_BATCH(2);
-	OUT_BATCH(0);
-	OUT_RELOC(tmp, I915_GEM_DOMAIN_CPU, I915_GEM_DOMAIN_CPU, 0);
-	ADVANCE_BATCH();
-	ret = run_batch();
-	if (ret != -EINVAL) {
-		fprintf(stderr, "(cpu, cpu) reloc not rejected\n");
-		exit(1);
+	if (drmtest_run_subtest("gtt-domain")) {
+		BEGIN_BATCH(2);
+		OUT_BATCH(0);
+		OUT_RELOC(tmp, I915_GEM_DOMAIN_GTT, 0, 0);
+		ADVANCE_BATCH();
+		ret = run_batch();
+		if (ret != -EINVAL) {
+			fprintf(stderr, "(gtt, 0) reloc not rejected\n");
+			exit(1);
+		}
+
+		BEGIN_BATCH(2);
+		OUT_BATCH(0);
+		OUT_RELOC(tmp, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT, 0);
+		ADVANCE_BATCH();
+		ret = run_batch();
+		if (ret != -EINVAL) {
+			fprintf(stderr, "(gtt, gtt) reloc not rejected\n");
+			exit(1);
+		}
 	}
 
-	BEGIN_BATCH(2);
-	OUT_BATCH(0);
-	OUT_RELOC(tmp, I915_GEM_DOMAIN_GTT, 0, 0);
-	ADVANCE_BATCH();
-	ret = run_batch();
-	if (ret != -EINVAL) {
-		fprintf(stderr, "(gtt, 0) reloc not rejected\n");
-		exit(1);
+#if 0 /* kernel checks have been eased, doesn't reject conflicting write domains
+	 any more */
+	if (drmtest_run_subtest("conflicting-write-domain")) {
+		BEGIN_BATCH(4);
+		OUT_BATCH(0);
+		OUT_RELOC(tmp, I915_GEM_DOMAIN_RENDER,
+			  I915_GEM_DOMAIN_RENDER, 0);
+		OUT_BATCH(0);
+		OUT_RELOC(tmp, I915_GEM_DOMAIN_INSTRUCTION,
+			  I915_GEM_DOMAIN_INSTRUCTION, 0);
+		ADVANCE_BATCH();
+		ret = run_batch();
+		if (ret != -EINVAL) {
+			fprintf(stderr, "conflicting write domains not rejected\n");
+			exit(1);
+		}
 	}
+#endif
 
-	BEGIN_BATCH(2);
-	OUT_BATCH(0);
-	OUT_RELOC(tmp, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT, 0);
-	ADVANCE_BATCH();
-	ret = run_batch();
-	if (ret != -EINVAL) {
-		fprintf(stderr, "(gtt, gtt) reloc not rejected\n");
-		exit(1);
+	if (drmtest_run_subtest("double-write-domain"))
+		multi_write_domain(fd);
+
+	if (drmtest_run_subtest("invalid-gpu-domain")) {
+		BEGIN_BATCH(2);
+		OUT_BATCH(0);
+		OUT_RELOC(tmp, ~(I915_GEM_GPU_DOMAINS | I915_GEM_DOMAIN_GTT | I915_GEM_DOMAIN_CPU),
+			  0, 0);
+		ADVANCE_BATCH();
+		ret = run_batch();
+		if (ret != -EINVAL) {
+			fprintf(stderr, "invalid gpu read domains not rejected\n");
+			exit(1);
+		}
+
+		BEGIN_BATCH(2);
+		OUT_BATCH(0);
+		OUT_RELOC(tmp, I915_GEM_DOMAIN_GTT << 1,
+			  I915_GEM_DOMAIN_GTT << 1, 0);
+		ADVANCE_BATCH();
+		ret = run_batch();
+		if (ret != -EINVAL) {
+			fprintf(stderr, "invalid gpu domain not rejected\n");
+			exit(1);
+		}
 	}
 
 	intel_batchbuffer_free(batch);
diff -rupN dump_1/tests/gem_exec_big.c dump/tests/gem_exec_big.c
--- dump_1/tests/gem_exec_big.c	1970-01-01 07:30:00.000000000 +0730
+++ dump/tests/gem_exec_big.c	2001-01-14 08:10:57.854623477 +0800
@@ -0,0 +1,127 @@
+/*
+ * Copyright © 2011,2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Chris Wilson <chris@chris-wilson.co.uk>
+ *    Daniel Vetter <daniel.vetter@ffwll.ch>
+ *
+ */
+
+/*
+ * Testcase: run a nop batch which is really big
+ *
+ * Mostly useful to stress-test the error-capture code
+ */
+
+#include <unistd.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <assert.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <sys/time.h>
+#include "drm.h"
+#include "i915_drm.h"
+#include "drmtest.h"
+
+#define MI_BATCH_BUFFER_END	(0xA<<23)
+#define BATCH_SIZE		(1024*1024)
+
+static int exec(int fd, uint32_t handle, uint32_t reloc_ofs)
+{
+	struct drm_i915_gem_execbuffer2 execbuf;
+	struct drm_i915_gem_exec_object2 gem_exec[1];
+	struct drm_i915_gem_relocation_entry gem_reloc[1];
+	uint32_t tmp;
+	int ret = 0;
+
+	gem_reloc[0].offset = reloc_ofs;
+	gem_reloc[0].delta = 0;
+	gem_reloc[0].target_handle = handle;
+	gem_reloc[0].read_domains = I915_GEM_DOMAIN_RENDER;
+	gem_reloc[0].write_domain = 0;
+	gem_reloc[0].presumed_offset = 0;
+
+	gem_exec[0].handle = handle;
+	gem_exec[0].relocation_count = 1;
+	gem_exec[0].relocs_ptr = (uintptr_t) gem_reloc;
+	gem_exec[0].alignment = 0;
+	gem_exec[0].offset = 0;
+	gem_exec[0].flags = 0;
+	gem_exec[0].rsvd1 = 0;
+	gem_exec[0].rsvd2 = 0;
+
+	execbuf.buffers_ptr = (uintptr_t)gem_exec;
+	execbuf.buffer_count = 1;
+	execbuf.batch_start_offset = 0;
+	execbuf.batch_len = 8;
+	execbuf.cliprects_ptr = 0;
+	execbuf.num_cliprects = 0;
+	execbuf.DR1 = 0;
+	execbuf.DR4 = 0;
+	execbuf.flags = 0;
+	i915_execbuffer2_set_context_id(execbuf, 0);
+	execbuf.rsvd2 = 0;
+
+	ret = drmIoctl(fd,
+		       DRM_IOCTL_I915_GEM_EXECBUFFER2,
+		       &execbuf);
+	gem_sync(fd, handle);
+
+	gem_read(fd, handle, reloc_ofs, &tmp, 4);
+
+	assert(tmp == gem_reloc[0].presumed_offset);
+
+	return ret;
+}
+
+int main(int argc, char **argv)
+{
+	uint32_t batch[2] = {MI_BATCH_BUFFER_END};
+	uint32_t handle;
+	int fd;
+	uint32_t reloc_ofs;
+	unsigned batch_size;
+
+	fd = drm_open_any();
+
+	for (batch_size = BATCH_SIZE/4; batch_size <= BATCH_SIZE; batch_size += 4096) {
+		handle = gem_create(fd, batch_size);
+		gem_write(fd, handle, 0, batch, sizeof(batch));
+
+		for (reloc_ofs = 4096; reloc_ofs < batch_size; reloc_ofs += 4096)
+			if (exec(fd, handle, reloc_ofs))
+				exit(1);
+	}
+
+	gem_close(fd, handle);
+
+	close(fd);
+
+	return 0;
+}
diff -rupN dump_1/tests/gem_exec_nop.c dump/tests/gem_exec_nop.c
--- dump_1/tests/gem_exec_nop.c	2001-01-14 08:11:40.076619273 +0800
+++ dump/tests/gem_exec_nop.c	2001-01-14 08:10:57.854623477 +0800
@@ -41,8 +41,9 @@
 #include "drm.h"
 #include "i915_drm.h"
 #include "drmtest.h"
+#include "intel_gpu_tools.h"
 
-#define MI_BATCH_BUFFER_END	(0xA<<23)
+bool skipped_all = true;
 
 static double elapsed(const struct timeval *start,
 		      const struct timeval *end,
@@ -51,7 +52,7 @@ static double elapsed(const struct timev
 	return (1e6*(end->tv_sec - start->tv_sec) + (end->tv_usec - start->tv_usec))/loop;
 }
 
-static int exec(int fd, uint32_t handle, int loops)
+static int exec(int fd, uint32_t handle, int loops, unsigned ring_id)
 {
 	struct drm_i915_gem_execbuffer2 execbuf;
 	struct drm_i915_gem_exec_object2 gem_exec[1];
@@ -74,7 +75,7 @@ static int exec(int fd, uint32_t handle,
 	execbuf.num_cliprects = 0;
 	execbuf.DR1 = 0;
 	execbuf.DR4 = 0;
-	execbuf.flags = 0;
+	execbuf.flags = ring_id;
 	i915_execbuffer2_set_context_id(execbuf, 0);
 	execbuf.rsvd2 = 0;
 
@@ -88,32 +89,55 @@ static int exec(int fd, uint32_t handle,
 	return ret;
 }
 
-int main(int argc, char **argv)
+static void loop(int fd, uint32_t handle, unsigned ring_id, const char *ring_name)
 {
-	uint32_t batch[2] = {MI_BATCH_BUFFER_END};
-	uint32_t handle;
 	int count;
-	int fd;
 
-	fd = drm_open_any();
-
-	handle = gem_create(fd, 4096);
-	gem_write(fd, handle, 0, batch, sizeof(batch));
+	skipped_all = false;
 
 	for (count = 1; count <= 1<<17; count <<= 1) {
 		struct timeval start, end;
 
 		gettimeofday(&start, NULL);
-		if (exec(fd, handle, count))
+		if (exec(fd, handle, count, ring_id))
 			exit(1);
 		gettimeofday(&end, NULL);
-		printf("Time to exec x %d:		%7.3fµs\n",
-		       count, elapsed(&start, &end, count));
+		printf("Time to exec x %d:		%7.3fµs (ring=%s)\n",
+		       count, elapsed(&start, &end, count), ring_name);
 		fflush(stdout);
 	}
+
+}
+int main(int argc, char **argv)
+{
+	uint32_t batch[2] = {MI_BATCH_BUFFER_END};
+	uint32_t handle;
+	uint32_t devid;
+	int fd;
+
+	drmtest_subtest_init(argc, argv);
+
+	fd = drm_open_any();
+	devid = intel_get_drm_devid(fd);
+
+	handle = gem_create(fd, 4096);
+	gem_write(fd, handle, 0, batch, sizeof(batch));
+
+	if (drmtest_run_subtest("render"))
+		loop(fd, handle, I915_EXEC_RENDER, "render");
+
+	if (drmtest_run_subtest("bsd"))
+		if (HAS_BSD_RING(devid))
+			loop(fd, handle, I915_EXEC_BSD, "bsd");
+
+	if (drmtest_run_subtest("blt"))
+		if (HAS_BLT_RING(devid))
+			loop(fd, handle, I915_EXEC_BLT, "blt");
+
+
 	gem_close(fd, handle);
 
 	close(fd);
 
-	return 0;
+	return skipped_all ? 77 : 0;
 }
diff -rupN dump_1/tests/gem_flink.c dump/tests/gem_flink.c
--- dump_1/tests/gem_flink.c	2001-01-14 08:11:40.076619273 +0800
+++ dump/tests/gem_flink.c	2001-01-14 08:10:57.855623389 +0800
@@ -115,16 +115,59 @@ test_bad_open(int fd)
 	assert(ret == -1 && errno == ENOENT);
 }
 
+static void
+test_flink_lifetime(int fd)
+{
+	struct drm_i915_gem_create create;
+	struct drm_gem_flink flink;
+	struct drm_gem_open gem_open;
+	int ret, fd2;
+
+	printf("Testing flink lifetime.\n");
+
+	fd2 = drm_open_any();
+
+	memset(&create, 0, sizeof(create));
+	create.size = 16 * 1024;
+	ret = ioctl(fd2, DRM_IOCTL_I915_GEM_CREATE, &create);
+	assert(ret == 0);
+
+	flink.handle = create.handle;
+	ret = ioctl(fd2, DRM_IOCTL_GEM_FLINK, &flink);
+	assert(ret == 0);
+
+	gem_open.name = flink.name;
+	ret = ioctl(fd, DRM_IOCTL_GEM_OPEN, &gem_open);
+	assert(ret == 0);
+	assert(gem_open.handle != 0);
+
+	close(fd2);
+	fd2 = drm_open_any();
+
+	gem_open.name = flink.name;
+	ret = ioctl(fd2, DRM_IOCTL_GEM_OPEN, &gem_open);
+	assert(ret == 0);
+	assert(gem_open.handle != 0);
+}
+
 int main(int argc, char **argv)
 {
 	int fd;
 
+	drmtest_subtest_init(argc, argv);
+
 	fd = drm_open_any();
 
-	test_flink(fd);
-	test_double_flink(fd);
-	test_bad_flink(fd);
-	test_bad_open(fd);
+	if (drmtest_run_subtest("basic"))
+		test_flink(fd);
+	if (drmtest_run_subtest("double-flink"))
+		test_double_flink(fd);
+	if (drmtest_run_subtest("bad-flink"))
+		test_bad_flink(fd);
+	if (drmtest_run_subtest("bad-open"))
+		test_bad_open(fd);
+	if (drmtest_run_subtest("flink-lifetime"))
+		test_flink_lifetime(fd);
 
 	return 0;
 }
diff -rupN dump_1/tests/gem_gtt_concurrent_blit.c dump/tests/gem_gtt_concurrent_blit.c
--- dump_1/tests/gem_gtt_concurrent_blit.c	2001-01-14 08:11:40.077619273 +0800
+++ dump/tests/gem_gtt_concurrent_blit.c	2001-01-14 08:10:57.856623303 +0800
@@ -96,11 +96,13 @@ main(int argc, char **argv)
 	drm_intel_bufmgr *bufmgr;
 	struct intel_batchbuffer *batch;
 	int num_buffers = 128, max;
-	drm_intel_bo *src[128], *dst[128], *dummy;
+	drm_intel_bo *src[128], *dst[128], *dummy = NULL;
 	int width = 512, height = 512;
 	int fd;
 	int i;
 
+	drmtest_subtest_init(argc, argv);
+
 	fd = drm_open_any();
 
 	max = gem_aperture_size (fd) / (1024 * 1024) / 2;
@@ -111,35 +113,45 @@ main(int argc, char **argv)
 	drm_intel_bufmgr_gem_enable_reuse(bufmgr);
 	batch = intel_batchbuffer_alloc(bufmgr, intel_get_drm_devid(fd));
 
-	for (i = 0; i < num_buffers; i++) {
-		src[i] = create_bo(bufmgr, i, width, height);
-		dst[i] = create_bo(bufmgr, ~i, width, height);
+	if (!drmtest_only_list_subtests()) {
+		for (i = 0; i < num_buffers; i++) {
+			src[i] = create_bo(bufmgr, i, width, height);
+			dst[i] = create_bo(bufmgr, ~i, width, height);
+		}
+		dummy = create_bo(bufmgr, 0, width, height);
 	}
-	dummy = create_bo(bufmgr, 0, width, height);
 
 	/* try to overwrite the source values */
-	for (i = 0; i < num_buffers; i++)
-		intel_copy_bo(batch, dst[i], src[i], width, height);
-	for (i = num_buffers; i--; )
-		set_bo(src[i], 0xdeadbeef, width, height);
-	for (i = 0; i < num_buffers; i++)
-		cmp_bo(dst[i], i, width, height);
+	if (drmtest_run_subtest("overwrite-source")) {
+		for (i = 0; i < num_buffers; i++)
+			intel_copy_bo(batch, dst[i], src[i], width, height);
+		for (i = num_buffers; i--; )
+			set_bo(src[i], 0xdeadbeef, width, height);
+		for (i = 0; i < num_buffers; i++)
+			cmp_bo(dst[i], i, width, height);
+	}
 
 	/* try to read the results before the copy completes */
-	for (i = 0; i < num_buffers; i++)
-		intel_copy_bo(batch, dst[i], src[i], width, height);
-	for (i = num_buffers; i--; )
-		cmp_bo(dst[i], 0xdeadbeef, width, height);
+	if (drmtest_run_subtest("early-read")) {
+		for (i = num_buffers; i--; )
+			set_bo(src[i], 0xdeadbeef, width, height);
+		for (i = 0; i < num_buffers; i++)
+			intel_copy_bo(batch, dst[i], src[i], width, height);
+		for (i = num_buffers; i--; )
+			cmp_bo(dst[i], 0xdeadbeef, width, height);
+	}
 
 	/* and finally try to trick the kernel into loosing the pending write */
-	for (i = num_buffers; i--; )
-		set_bo(src[i], 0xabcdabcd, width, height);
-	for (i = 0; i < num_buffers; i++)
-		intel_copy_bo(batch, dst[i], src[i], width, height);
-	for (i = num_buffers; i--; )
-		intel_copy_bo(batch, dummy, dst[i], width, height);
-	for (i = num_buffers; i--; )
-		cmp_bo(dst[i], 0xabcdabcd, width, height);
+	if (drmtest_run_subtest("gpu-read-after-write")) {
+		for (i = num_buffers; i--; )
+			set_bo(src[i], 0xabcdabcd, width, height);
+		for (i = 0; i < num_buffers; i++)
+			intel_copy_bo(batch, dst[i], src[i], width, height);
+		for (i = num_buffers; i--; )
+			intel_copy_bo(batch, dummy, dst[i], width, height);
+		for (i = num_buffers; i--; )
+			cmp_bo(dst[i], 0xabcdabcd, width, height);
+	}
 
 	return 0;
 }
diff -rupN dump_1/tests/gem_mmap_gtt.c dump/tests/gem_mmap_gtt.c
--- dump_1/tests/gem_mmap_gtt.c	2001-01-14 08:11:40.078619273 +0800
+++ dump/tests/gem_mmap_gtt.c	2001-01-14 08:10:57.856623303 +0800
@@ -148,12 +148,18 @@ int main(int argc, char **argv)
 {
 	int fd;
 
+	drmtest_subtest_init(argc, argv);
+
 	fd = drm_open_any();
 
-	test_copy(fd);
-	test_read(fd);
-	test_write(fd);
-	test_write_gtt(fd);
+	if (drmtest_run_subtest("copy"))
+		test_copy(fd);
+	if (drmtest_run_subtest("read"))
+		test_read(fd);
+	if (drmtest_run_subtest("write"))
+		test_write(fd);
+	if (drmtest_run_subtest("write-gtt"))
+		test_write_gtt(fd);
 
 	close(fd);
 
diff -rupN dump_1/tests/gem_non_secure_batch.c dump/tests/gem_non_secure_batch.c
--- dump_1/tests/gem_non_secure_batch.c	1970-01-01 07:30:00.000000000 +0730
+++ dump/tests/gem_non_secure_batch.c	2001-01-14 08:10:57.857623218 +0800
@@ -0,0 +1,124 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Daniel Vetter <daniel.vetter@ffwll.ch> (based on gem_storedw_*.c)
+ *
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <assert.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include "drm.h"
+#include "i915_drm.h"
+#include "drmtest.h"
+#include "intel_bufmgr.h"
+#include "intel_batchbuffer.h"
+#include "intel_gpu_tools.h"
+#include "i830_reg.h"
+
+static drm_intel_bufmgr *bufmgr;
+struct intel_batchbuffer *batch;
+
+/*
+ * Testcase: Basic check of non-secure batches
+ *
+ * This test tries to stop the render ring with a MI_LOAD_REG command, which
+ * should fail if the non-secure handling works correctly.
+ */
+
+#define MI_LOAD_REGISTER_IMM                 (0x22<<23)
+
+static int num_rings = 1;
+
+static void
+mi_lri_loop(void)
+{
+	int i;
+
+	srandom(0xdeadbeef);
+
+	for (i = 0; i < 0x100; i++) {
+		int ring = random() % num_rings + 1;
+
+		BEGIN_BATCH(4);
+		OUT_BATCH(MI_LOAD_REGISTER_IMM | 1);
+		OUT_BATCH(0x203c); /* RENDER RING CTL */
+		OUT_BATCH(0); /* try to stop the ring */
+		OUT_BATCH(MI_NOOP);
+		ADVANCE_BATCH();
+
+		intel_batchbuffer_flush_on_ring(batch, ring);
+	}
+}
+
+int main(int argc, char **argv)
+{
+	int fd;
+	int devid;
+
+	if (argc != 1) {
+		fprintf(stderr, "usage: %s\n", argv[0]);
+		exit(-1);
+	}
+
+	fd = drm_open_any();
+	devid = intel_get_drm_devid(fd);
+
+	if (HAS_BSD_RING(devid))
+		num_rings++;
+
+	if (HAS_BLT_RING(devid))
+		num_rings++;
+
+
+	printf("num rings detected: %i\n", num_rings);
+
+	bufmgr = drm_intel_bufmgr_gem_init(fd, 4096);
+	if (!bufmgr) {
+		fprintf(stderr, "failed to init libdrm\n");
+		exit(-1);
+	}
+	drm_intel_bufmgr_gem_enable_reuse(bufmgr);
+
+	batch = intel_batchbuffer_alloc(bufmgr, devid);
+	if (!batch) {
+		fprintf(stderr, "failed to create batch buffer\n");
+		exit(-1);
+	}
+
+	mi_lri_loop();
+	gem_quiescent_gpu(fd);
+
+	intel_batchbuffer_free(batch);
+	drm_intel_bufmgr_destroy(bufmgr);
+
+	close(fd);
+
+	return 0;
+}
diff -rupN dump_1/tests/gem_partial_pwrite_pread.c dump/tests/gem_partial_pwrite_pread.c
--- dump_1/tests/gem_partial_pwrite_pread.c	2001-01-14 08:11:40.079619273 +0800
+++ dump/tests/gem_partial_pwrite_pread.c	2001-01-14 08:10:57.858623135 +0800
@@ -103,27 +103,11 @@ blt_bo_fill(drm_intel_bo *tmp_bo, drm_in
 
 #define MAX_BLT_SIZE 128
 #define ROUNDS 1000
-int main(int argc, char **argv)
+uint8_t tmp[BO_SIZE];
+
+static void test_partial_reads(void)
 {
 	int i, j;
-	uint8_t tmp[BO_SIZE];
-	uint8_t *gtt_ptr;
-
-	srandom(0xdeadbeef);
-
-	fd = drm_open_any();
-
-	bufmgr = drm_intel_bufmgr_gem_init(fd, 4096);
-	//drm_intel_bufmgr_gem_enable_reuse(bufmgr);
-	devid = intel_get_drm_devid(fd);
-	batch = intel_batchbuffer_alloc(bufmgr, devid);
-
-	/* overallocate the buffers we're actually using because */
-	scratch_bo = drm_intel_bo_alloc(bufmgr, "scratch bo", BO_SIZE, 4096);
-	staging_bo = drm_intel_bo_alloc(bufmgr, "staging bo", BO_SIZE, 4096);
-
-	drmtest_init_aperture_trashers(bufmgr);
-	mappable_gtt_limit = gem_mappable_aperture_size();
 
 	printf("checking partial reads\n");
 	for (i = 0; i < ROUNDS; i++) {
@@ -147,6 +131,13 @@ int main(int argc, char **argv)
 		drmtest_progress("partial reads test: ", i, ROUNDS);
 	}
 
+}
+
+static void test_partial_writes(void)
+{
+	int i, j;
+	uint8_t *gtt_ptr;
+
 	printf("checking partial writes\n");
 	for (i = 0; i < ROUNDS; i++) {
 		int start, len;
@@ -191,6 +182,13 @@ int main(int argc, char **argv)
 		drmtest_progress("partial writes test: ", i, ROUNDS);
 	}
 
+}
+
+static void test_partial_read_writes(void)
+{
+	int i, j;
+	uint8_t *gtt_ptr;
+
 	printf("checking partial writes after partial reads\n");
 	for (i = 0; i < ROUNDS; i++) {
 		int start, len;
@@ -253,6 +251,36 @@ int main(int argc, char **argv)
 
 		drmtest_progress("partial read/writes test: ", i, ROUNDS);
 	}
+}
+
+int main(int argc, char **argv)
+{
+	srandom(0xdeadbeef);
+
+	drmtest_subtest_init(argc, argv);
+
+	fd = drm_open_any();
+
+	bufmgr = drm_intel_bufmgr_gem_init(fd, 4096);
+	//drm_intel_bufmgr_gem_enable_reuse(bufmgr);
+	devid = intel_get_drm_devid(fd);
+	batch = intel_batchbuffer_alloc(bufmgr, devid);
+
+	/* overallocate the buffers we're actually using because */
+	scratch_bo = drm_intel_bo_alloc(bufmgr, "scratch bo", BO_SIZE, 4096);
+	staging_bo = drm_intel_bo_alloc(bufmgr, "staging bo", BO_SIZE, 4096);
+
+	drmtest_init_aperture_trashers(bufmgr);
+	mappable_gtt_limit = gem_mappable_aperture_size();
+
+	if (drmtest_run_subtest("reads"))
+		test_partial_reads();
+
+	if (drmtest_run_subtest("writes"))
+		test_partial_writes();
+
+	if (drmtest_run_subtest("writes-after-reads"))
+		test_partial_read_writes();
 
 	drmtest_cleanup_aperture_trashers();
 	drm_intel_bufmgr_destroy(bufmgr);
diff -rupN dump_1/tests/gem_pin.c dump/tests/gem_pin.c
--- dump_1/tests/gem_pin.c	1970-01-01 07:30:00.000000000 +0730
+++ dump/tests/gem_pin.c	2001-01-14 08:10:57.859623052 +0800
@@ -0,0 +1,246 @@
+/*
+ * Copyright © 20013 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Chris Wilson <chris@chris-wilson.co.uk>
+ *
+ */
+
+/* Exercises pinning of small bo */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <assert.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include "drm.h"
+#include "i915_drm.h"
+#include "drmtest.h"
+#include "intel_chipset.h"
+#include "intel_gpu_tools.h"
+
+#define COPY_BLT_CMD            (2<<29|0x53<<22|0x6)
+#define BLT_WRITE_ALPHA         (1<<21)
+#define BLT_WRITE_RGB           (1<<20)
+
+static void exec(int fd, uint32_t handle, uint32_t offset)
+{
+	struct drm_i915_gem_execbuffer2 execbuf;
+	struct drm_i915_gem_exec_object2 gem_exec[1];
+	struct drm_i915_gem_relocation_entry gem_reloc[1];
+
+	gem_reloc[0].offset = 1024;
+	gem_reloc[0].delta = 0;
+	gem_reloc[0].target_handle = handle;
+	gem_reloc[0].read_domains = I915_GEM_DOMAIN_RENDER;
+	gem_reloc[0].write_domain = 0;
+	gem_reloc[0].presumed_offset = 0;
+
+	gem_exec[0].handle = handle;
+	gem_exec[0].relocation_count = 1;
+	gem_exec[0].relocs_ptr = (uintptr_t) gem_reloc;
+	gem_exec[0].alignment = 0;
+	gem_exec[0].offset = 0;
+	gem_exec[0].flags = 0;
+	gem_exec[0].rsvd1 = 0;
+	gem_exec[0].rsvd2 = 0;
+
+	execbuf.buffers_ptr = (uintptr_t)gem_exec;
+	execbuf.buffer_count = 1;
+	execbuf.batch_start_offset = 0;
+	execbuf.batch_len = 8;
+	execbuf.cliprects_ptr = 0;
+	execbuf.num_cliprects = 0;
+	execbuf.DR1 = 0;
+	execbuf.DR4 = 0;
+	execbuf.flags = 0;
+	i915_execbuffer2_set_context_id(execbuf, 0);
+	execbuf.rsvd2 = 0;
+
+	do_or_die(drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf));
+	assert(gem_exec[0].offset == offset);
+}
+
+static int gem_linear_blt(uint32_t *batch,
+			  uint32_t src,
+			  uint32_t dst,
+			  uint32_t length,
+			  struct drm_i915_gem_relocation_entry *reloc)
+{
+	uint32_t *b = batch;
+
+	b[0] = COPY_BLT_CMD | BLT_WRITE_ALPHA | BLT_WRITE_RGB;
+	b[1] = 0x66 << 16 | 1 << 25 | 1 << 24 | (4*1024);
+	b[2] = 0;
+	b[3] = (length / (4*1024)) << 16 | 1024;
+	b[4] = 0;
+	reloc->offset = (b-batch+4) * sizeof(uint32_t);
+	reloc->delta = 0;
+	reloc->target_handle = dst;
+	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
+	reloc->write_domain = I915_GEM_DOMAIN_RENDER;
+	reloc->presumed_offset = 0;
+	reloc++;
+
+	b[5] = 0;
+	b[6] = 4*1024;
+	b[7] = 0;
+	reloc->offset = (b-batch+7) * sizeof(uint32_t);
+	reloc->delta = 0;
+	reloc->target_handle = src;
+	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
+	reloc->write_domain = 0;
+	reloc->presumed_offset = 0;
+	reloc++;
+
+	b += 8;
+
+	b[0] = MI_BATCH_BUFFER_END;
+	b[1] = 0;
+
+	return (b+2 - batch) * sizeof(uint32_t);
+}
+
+static void make_busy(int fd, uint32_t handle)
+{
+	struct drm_i915_gem_execbuffer2 execbuf;
+	struct drm_i915_gem_exec_object2 obj[2];
+	struct drm_i915_gem_relocation_entry reloc[2];
+	uint32_t batch[20];
+	uint32_t tmp;
+	int count;
+
+	tmp = gem_create(fd, 1024*1024);
+
+	obj[0].handle = tmp;
+	obj[0].relocation_count = 0;
+	obj[0].relocs_ptr = 0;
+	obj[0].alignment = 0;
+	obj[0].offset = 0;
+	obj[0].flags = 0;
+	obj[0].rsvd1 = 0;
+	obj[0].rsvd2 = 0;
+
+	obj[1].handle = handle;
+	obj[1].relocation_count = 2;
+	obj[1].relocs_ptr = (uintptr_t) reloc;
+	obj[1].alignment = 0;
+	obj[1].offset = 0;
+	obj[1].flags = 0;
+	obj[1].rsvd1 = 0;
+	obj[1].rsvd2 = 0;
+
+	execbuf.buffers_ptr = (uintptr_t)obj;
+	execbuf.buffer_count = 2;
+	execbuf.batch_start_offset = 0;
+	execbuf.batch_len = gem_linear_blt(batch, tmp, tmp, 1024*1024,reloc);
+	execbuf.cliprects_ptr = 0;
+	execbuf.num_cliprects = 0;
+	execbuf.DR1 = 0;
+	execbuf.DR4 = 0;
+	execbuf.flags = 0;
+	if (HAS_BLT_RING(intel_get_drm_devid(fd)))
+		execbuf.flags |= I915_EXEC_BLT;
+	i915_execbuffer2_set_context_id(execbuf, 0);
+	execbuf.rsvd2 = 0;
+
+	gem_write(fd, handle, 0, batch, execbuf.batch_len);
+	for (count = 0; count < 10; count++)
+		do_or_die(drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf));
+	gem_close(fd, tmp);
+}
+
+static int test_can_pin(int fd)
+{
+	struct drm_i915_gem_pin pin;
+	int ret;
+
+	pin.handle = gem_create(fd, 4096);;
+	pin.alignment = 0;
+	ret = drmIoctl(fd, DRM_IOCTL_I915_GEM_PIN, &pin);
+	gem_close(fd, pin.handle);
+
+	return ret == 0;;
+}
+
+static uint32_t gem_pin(int fd, int handle, int alignment)
+{
+	struct drm_i915_gem_pin pin;
+
+	pin.handle = handle;
+	pin.alignment = alignment;
+	do_ioctl(fd, DRM_IOCTL_I915_GEM_PIN, &pin);
+	return pin.offset;
+}
+
+int main(int argc, char **argv)
+{
+	const uint32_t batch[2] = {MI_BATCH_BUFFER_END};
+	struct timeval start, now;
+	uint32_t *handle, *offset;
+	int fd, i;
+
+	fd = drm_open_any();
+
+	if (!test_can_pin(fd))
+		return 77;
+
+	handle = malloc(sizeof(uint32_t)*100);
+	offset = malloc(sizeof(uint32_t)*100);
+
+	/* Race creation/use against interrupts */
+	drmtest_fork_signal_helper();
+	gettimeofday(&start, NULL);
+	do {
+		for (i = 0; i < 100; i++) {
+			if (i & 1) {
+				/* pin anidle bo */
+				handle[i] = gem_create(fd, 4096);
+				offset[i] = gem_pin(fd, handle[i], 0);
+				assert(offset[i]);
+				gem_write(fd, handle[i], 0, batch, sizeof(batch));
+			} else {
+				/* try to pin an anidle bo */
+				handle[i] = gem_create(fd, 4096);
+				make_busy(fd, handle[i]);
+				offset[i] = gem_pin(fd, handle[i], 256*1024);
+				assert(offset[i]);
+				assert((offset[i] & (256*1024-1)) == 0);
+				gem_write(fd, handle[i], 0, batch, sizeof(batch));
+			}
+		}
+		for (i = 0; i < 1000; i++) {
+			int j = rand() % 100;
+			exec(fd, handle[j], offset[j]);
+		}
+		for (i = 0; i < 100; i++)
+			gem_close(fd, handle[i]);
+		gettimeofday(&now, NULL);
+	} while ((now.tv_sec - start.tv_sec)*1000 + (now.tv_usec - start.tv_usec) / 1000 < 10000);
+	drmtest_stop_signal_helper();
+
+	return 0;
+}
diff -rupN dump_1/tests/gem_readwrite.c dump/tests/gem_readwrite.c
--- dump_1/tests/gem_readwrite.c	2001-01-14 08:11:40.080619273 +0800
+++ dump/tests/gem_readwrite.c	2001-01-14 08:10:57.859623052 +0800
@@ -44,34 +44,34 @@
 static int
 do_read(int fd, int handle, void *buf, int offset, int size)
 {
-	struct drm_i915_gem_pread read;
+	struct drm_i915_gem_pread gem_pread;
 
 	/* Ensure that we don't have any convenient data in buf in case
 	 * we fail.
 	 */
 	memset(buf, 0xd0, size);
 
-	memset(&read, 0, sizeof(read));
-	read.handle = handle;
-	read.data_ptr = (uintptr_t)buf;
-	read.size = size;
-	read.offset = offset;
+	memset(&gem_pread, 0, sizeof(gem_pread));
+	gem_pread.handle = handle;
+	gem_pread.data_ptr = (uintptr_t)buf;
+	gem_pread.size = size;
+	gem_pread.offset = offset;
 
-	return ioctl(fd, DRM_IOCTL_I915_GEM_PREAD, &read);
+	return ioctl(fd, DRM_IOCTL_I915_GEM_PREAD, &gem_pread);
 }
 
 static int
 do_write(int fd, int handle, void *buf, int offset, int size)
 {
-	struct drm_i915_gem_pwrite write;
+	struct drm_i915_gem_pwrite gem_pwrite;
 
-	memset(&write, 0, sizeof(write));
-	write.handle = handle;
-	write.data_ptr = (uintptr_t)buf;
-	write.size = size;
-	write.offset = offset;
+	memset(&gem_pwrite, 0, sizeof(gem_pwrite));
+	gem_pwrite.handle = handle;
+	gem_pwrite.data_ptr = (uintptr_t)buf;
+	gem_pwrite.size = size;
+	gem_pwrite.offset = offset;
 
-	return ioctl(fd, DRM_IOCTL_I915_GEM_PWRITE, &write);
+	return ioctl(fd, DRM_IOCTL_I915_GEM_PWRITE, &gem_pwrite);
 }
 
 int main(int argc, char **argv)
diff -rupN dump_1/tests/gem_reg_read.c dump/tests/gem_reg_read.c
--- dump_1/tests/gem_reg_read.c	2001-01-14 08:11:40.080619273 +0800
+++ dump/tests/gem_reg_read.c	2001-01-14 08:10:57.860622971 +0800
@@ -52,29 +52,29 @@ static void handle_bad(int ret, int lerr
 
 static uint64_t timer_query(int fd)
 {
-	struct local_drm_i915_reg_read read;
+	struct local_drm_i915_reg_read reg_read;
 	int ret;
 
-	read.offset = 0x2358;
-	ret = drmIoctl(fd, REG_READ_IOCTL, &read);
+	reg_read.offset = 0x2358;
+	ret = drmIoctl(fd, REG_READ_IOCTL, &reg_read);
 	if (ret) {
 		perror("positive test case failed: ");
 		exit(EXIT_FAILURE);
 	}
 
-	return read.val;
+	return reg_read.val;
 }
 
 int main(int argc, char *argv[])
 {
-	struct local_drm_i915_reg_read read;
+	struct local_drm_i915_reg_read reg_read;
 	int ret, fd;
 	uint64_t val;
 
 	fd = drm_open_any();
 
-	read.offset = 0x2358;
-	ret = drmIoctl(fd, REG_READ_IOCTL, &read);
+	reg_read.offset = 0x2358;
+	ret = drmIoctl(fd, REG_READ_IOCTL, &reg_read);
 	if (errno == EINVAL)
 		exit(77);
 	else if (ret)
@@ -88,8 +88,8 @@ int main(int argc, char *argv[])
 	}
 
 	/* bad reg */
-	read.offset = 0x12345678;
-	ret = drmIoctl(fd, REG_READ_IOCTL, &read);
+	reg_read.offset = 0x12345678;
+	ret = drmIoctl(fd, REG_READ_IOCTL, &reg_read);
 	handle_bad(ret, errno, EINVAL, "bad register");
 
 	close(fd);
diff -rupN dump_1/tests/gem_render_linear_blits.c dump/tests/gem_render_linear_blits.c
--- dump_1/tests/gem_render_linear_blits.c	1970-01-01 07:30:00.000000000 +0730
+++ dump/tests/gem_render_linear_blits.c	2001-01-14 08:10:57.861622892 +0800
@@ -0,0 +1,171 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Chris Wilson <chris@chris-wilson.co.uk>
+ *
+ */
+
+/** @file gem_linear_render_blits.c
+ *
+ * This is a test of doing many blits, with a working set
+ * larger than the aperture size.
+ *
+ * The goal is to simply ensure the basics work.
+ */
+
+#include "rendercopy.h"
+
+#define WIDTH 512
+#define STRIDE (WIDTH*4)
+#define HEIGHT 512
+#define SIZE (HEIGHT*STRIDE)
+
+static uint32_t linear[WIDTH*HEIGHT];
+static render_copyfunc_t render_copy;
+
+static void
+check_bo(int fd, uint32_t handle, uint32_t val)
+{
+	int i;
+
+	gem_read(fd, handle, 0, linear, sizeof(linear));
+	for (i = 0; i < WIDTH*HEIGHT; i++) {
+		if (linear[i] != val) {
+			fprintf(stderr, "Expected 0x%08x, found 0x%08x "
+				"at offset 0x%08x\n",
+				val, linear[i], i * 4);
+			abort();
+		}
+		val++;
+	}
+}
+
+int main(int argc, char **argv)
+{
+	drm_intel_bufmgr *bufmgr;
+	struct intel_batchbuffer *batch;
+	uint32_t *start_val;
+	drm_intel_bo **bo;
+	uint32_t start = 0;
+	int i, j, fd, count;
+
+	fd = drm_open_any();
+
+	render_copy = get_render_copyfunc(intel_get_drm_devid(fd));
+	if (render_copy == NULL) {
+		printf("no render-copy function, doing nothing\n");
+		return 77;
+	}
+
+	bufmgr = drm_intel_bufmgr_gem_init(fd, 4096);
+	batch = intel_batchbuffer_alloc(bufmgr, intel_get_drm_devid(fd));
+
+	count = 0;
+	if (argc > 1)
+		count = atoi(argv[1]);
+	if (count == 0)
+		count = 3 * gem_aperture_size(fd) / SIZE / 2;
+	printf("Using %d 1MiB buffers\n", count);
+
+	bo = malloc(sizeof(*bo)*count);
+	start_val = malloc(sizeof(*start_val)*count);
+
+	for (i = 0; i < count; i++) {
+		bo[i] = drm_intel_bo_alloc(bufmgr, "", SIZE, 4096);
+		start_val[i] = start;
+		for (j = 0; j < WIDTH*HEIGHT; j++)
+			linear[j] = start++;
+		gem_write(fd, bo[i]->handle, 0, linear, sizeof(linear));
+	}
+
+	printf("Verifying initialisation...\n");
+	for (i = 0; i < count; i++)
+		check_bo(fd, bo[i]->handle, start_val[i]);
+
+	printf("Cyclic blits, forward...\n");
+	for (i = 0; i < count * 4; i++) {
+		struct scratch_buf src, dst;
+
+		src.bo = bo[i % count];
+		src.stride = STRIDE;
+		src.tiling = I915_TILING_NONE;
+		src.size = SIZE;
+
+		dst.bo = bo[(i + 1) % count];
+		dst.stride = STRIDE;
+		dst.tiling = I915_TILING_NONE;
+		dst.size = SIZE;
+
+		render_copy(batch, &src, 0, 0, WIDTH, HEIGHT, &dst, 0, 0);
+		start_val[(i + 1) % count] = start_val[i % count];
+	}
+	for (i = 0; i < count; i++)
+		check_bo(fd, bo[i]->handle, start_val[i]);
+
+	printf("Cyclic blits, backward...\n");
+	for (i = 0; i < count * 4; i++) {
+		struct scratch_buf src, dst;
+
+		src.bo = bo[(i + 1) % count];
+		src.stride = STRIDE;
+		src.tiling = I915_TILING_NONE;
+		src.size = SIZE;
+
+		dst.bo = bo[i % count];
+		dst.stride = STRIDE;
+		dst.tiling = I915_TILING_NONE;
+		dst.size = SIZE;
+
+		render_copy(batch, &src, 0, 0, WIDTH, HEIGHT, &dst, 0, 0);
+		start_val[i % count] = start_val[(i + 1) % count];
+	}
+	for (i = 0; i < count; i++)
+		check_bo(fd, bo[i]->handle, start_val[i]);
+
+	printf("Random blits...\n");
+	for (i = 0; i < count * 4; i++) {
+		struct scratch_buf src, dst;
+		int s = random() % count;
+		int d = random() % count;
+
+		if (s == d)
+			continue;
+
+		src.bo = bo[s];
+		src.stride = STRIDE;
+		src.tiling = I915_TILING_NONE;
+		src.size = SIZE;
+
+		dst.bo = bo[d];
+		dst.stride = STRIDE;
+		dst.tiling = I915_TILING_NONE;
+		dst.size = SIZE;
+
+		render_copy(batch, &src, 0, 0, WIDTH, HEIGHT, &dst, 0, 0);
+		start_val[d] = start_val[s];
+	}
+	for (i = 0; i < count; i++)
+		check_bo(fd, bo[i]->handle, start_val[i]);
+
+	return 0;
+}
diff -rupN dump_1/tests/gem_render_tiled_blits.c dump/tests/gem_render_tiled_blits.c
--- dump_1/tests/gem_render_tiled_blits.c	1970-01-01 07:30:00.000000000 +0730
+++ dump/tests/gem_render_tiled_blits.c	2001-01-14 08:10:57.861622892 +0800
@@ -0,0 +1,158 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Chris Wilson <chris@chris-wilson.co.uk>
+ *
+ */
+
+/** @file gem_linear_render_blits.c
+ *
+ * This is a test of doing many blits, with a working set
+ * larger than the aperture size.
+ *
+ * The goal is to simply ensure the basics work.
+ */
+
+#include "rendercopy.h"
+
+#define WIDTH 512
+#define STRIDE (WIDTH*4)
+#define HEIGHT 512
+#define SIZE (HEIGHT*STRIDE)
+
+static render_copyfunc_t render_copy;
+
+static void
+check_bo(drm_intel_bo *bo, uint32_t val)
+{
+	uint32_t *ptr;
+	int i;
+
+	do_or_die(drm_intel_gem_bo_map_gtt(bo));
+	ptr = bo->virtual;
+	for (i = 0; i < WIDTH*HEIGHT; i++) {
+		if (ptr[i] != val) {
+			fprintf(stderr, "Expected 0x%08x, found 0x%08x "
+				"at offset 0x%08x\n",
+				val, ptr[i], i * 4);
+			abort();
+		}
+		val++;
+	}
+	drm_intel_gem_bo_unmap_gtt(bo);
+}
+
+int main(int argc, char **argv)
+{
+	drm_intel_bufmgr *bufmgr;
+	struct intel_batchbuffer *batch;
+	uint32_t *start_val;
+	struct scratch_buf *buf;
+	uint32_t start = 0;
+	int i, j, fd, count;
+
+	fd = drm_open_any();
+
+	render_copy = get_render_copyfunc(intel_get_drm_devid(fd));
+	if (render_copy == NULL) {
+		printf("no render-copy function, doing nothing\n");
+		return 77;
+	}
+
+	bufmgr = drm_intel_bufmgr_gem_init(fd, 4096);
+	drm_intel_bufmgr_gem_set_vma_cache_size(bufmgr, 32);
+	batch = intel_batchbuffer_alloc(bufmgr, intel_get_drm_devid(fd));
+
+	count = 0;
+	if (argc > 1)
+		count = atoi(argv[1]);
+	if (count == 0)
+		count = 3 * gem_aperture_size(fd) / SIZE / 2;
+	printf("Using %d 1MiB buffers\n", count);
+
+	buf = malloc(sizeof(*buf)*count);
+	start_val = malloc(sizeof(*start_val)*count);
+
+	for (i = 0; i < count; i++) {
+		uint32_t tiling = I915_TILING_X + (random() & 1);
+		unsigned long pitch = STRIDE;
+		uint32_t *ptr;
+
+		buf[i].bo = drm_intel_bo_alloc_tiled(bufmgr, "",
+						     WIDTH, HEIGHT, 4,
+						     &tiling, &pitch, 0);
+		buf[i].stride = pitch;
+		buf[i].tiling = tiling;
+		buf[i].size = SIZE;
+
+		start_val[i] = start;
+
+		do_or_die(drm_intel_gem_bo_map_gtt(buf[i].bo));
+		ptr = buf[i].bo->virtual;
+		for (j = 0; j < WIDTH*HEIGHT; j++)
+			ptr[j] = start++;
+		drm_intel_gem_bo_unmap_gtt(buf[i].bo);
+	}
+
+	printf("Verifying initialisation...\n");
+	for (i = 0; i < count; i++)
+		check_bo(buf[i].bo, start_val[i]);
+
+	printf("Cyclic blits, forward...\n");
+	for (i = 0; i < count * 4; i++) {
+		int src = i % count;
+		int dst = (i + 1) % count;
+
+		render_copy(batch, buf+src, 0, 0, WIDTH, HEIGHT, buf+dst, 0, 0);
+		start_val[dst] = start_val[src];
+	}
+	for (i = 0; i < count; i++)
+		check_bo(buf[i].bo, start_val[i]);
+
+	printf("Cyclic blits, backward...\n");
+	for (i = 0; i < count * 4; i++) {
+		int src = (i + 1) % count;
+		int dst = i % count;
+
+		render_copy(batch, buf+src, 0, 0, WIDTH, HEIGHT, buf+dst, 0, 0);
+		start_val[dst] = start_val[src];
+	}
+	for (i = 0; i < count; i++)
+		check_bo(buf[i].bo, start_val[i]);
+
+	printf("Random blits...\n");
+	for (i = 0; i < count * 4; i++) {
+		int src = random() % count;
+		int dst = random() % count;
+
+		if (src == dst)
+			continue;
+
+		render_copy(batch, buf+src, 0, 0, WIDTH, HEIGHT, buf+dst, 0, 0);
+		start_val[dst] = start_val[src];
+	}
+	for (i = 0; i < count; i++)
+		check_bo(buf[i].bo, start_val[i]);
+
+	return 0;
+}
diff -rupN dump_1/tests/gem_ringfill.c dump/tests/gem_ringfill.c
--- dump_1/tests/gem_ringfill.c	2001-01-14 08:11:40.080619273 +0800
+++ dump/tests/gem_ringfill.c	2001-01-14 08:10:57.862622813 +0800
@@ -55,6 +55,7 @@ struct bo {
 };
 
 static const int width = 512, height = 512;
+static bool skipped_all = true;
 
 static void create_bo(drm_intel_bufmgr *bufmgr,
 		      struct bo *b,
@@ -122,6 +123,7 @@ static int check_ring(drm_intel_bufmgr *
 	int i;
 
 	snprintf(output, 100, "filling %s ring: ", ring);
+	skipped_all = false;
 
 	create_bo(bufmgr, &bo, ring);
 
@@ -203,25 +205,23 @@ int main(int argc, char **argv)
 	render_copyfunc_t copy;
 	int fd, fails = 0;
 
+	drmtest_subtest_init(argc, argv);
+
 	fd = drm_open_any();
 
 	bufmgr = drm_intel_bufmgr_gem_init(fd, 4096);
 	drm_intel_bufmgr_gem_enable_reuse(bufmgr);
 	batch = intel_batchbuffer_alloc(bufmgr, intel_get_drm_devid(fd));
 
-	fails += check_ring(bufmgr, batch, "blt", blt_copy);
+	if (drmtest_run_subtest("blitter"))
+		fails += check_ring(bufmgr, batch, "blt", blt_copy);
 
 	/* Strictly only required on architectures with a separate BLT ring,
 	 * but lets stress everybody.
 	 */
-	copy = NULL;
-	if (IS_GEN2(batch->devid))
-		copy = gen2_render_copyfunc;
-	else if (IS_GEN3(batch->devid))
-		copy = gen3_render_copyfunc;
-	else if (IS_GEN6(batch->devid))
-		copy = gen6_render_copyfunc;
-	if (copy)
+	copy = get_render_copyfunc(batch->devid);
+
+	if (drmtest_run_subtest("render") && copy)
 		fails += check_ring(bufmgr, batch, "render", copy);
 
 	intel_batchbuffer_free(batch);
@@ -229,5 +229,5 @@ int main(int argc, char **argv)
 
 	close(fd);
 
-	return fails != 0;
+	return skipped_all ? 77 : fails != 0;
 }
diff -rupN dump_1/tests/gem_seqno_wrap.c dump/tests/gem_seqno_wrap.c
--- dump_1/tests/gem_seqno_wrap.c	1970-01-01 07:30:00.000000000 +0730
+++ dump/tests/gem_seqno_wrap.c	2001-01-14 08:10:57.863622736 +0800
@@ -0,0 +1,672 @@
+/*
+ * Copyright (c) 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Mika Kuoppala <mika.kuoppala@intel.com>
+ *
+ */
+
+/*
+ * This test runs blitcopy -> rendercopy with multiple buffers over wrap
+ * boundary.
+ */
+
+#include <stdlib.h>
+#include <string.h>
+#include <time.h>
+#include <assert.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <limits.h>
+#include <wordexp.h>
+#include <signal.h>
+
+#include "i915_drm.h"
+#include "intel_bufmgr.h"
+#include "intel_batchbuffer.h"
+#include "intel_gpu_tools.h"
+#include "rendercopy.h"
+
+static int devid;
+static int card_index = 0;
+static uint32_t last_seqno = 0;
+
+static struct intel_batchbuffer *batch_blt;
+static struct intel_batchbuffer *batch_3d;
+
+struct option_struct {
+	int rounds;
+	int background;
+	char cmd[1024];
+	int verbose;
+	int timeout;
+	int dontwrap;
+	int prewrap_space;
+	int random;
+	int buffers;
+};
+
+static struct option_struct options;
+
+static void init_buffer(drm_intel_bufmgr *bufmgr,
+			struct scratch_buf *buf,
+			drm_intel_bo *bo,
+			int width, int height)
+{
+	/* buf->bo = drm_intel_bo_alloc(bufmgr, "", size, 4096); */
+	buf->bo = bo;
+	buf->size = width * height * 4;
+	assert(buf->bo);
+	buf->tiling = I915_TILING_NONE;
+	buf->data = buf->cpu_mapping = NULL;
+	buf->num_tiles = width * height * 4;
+	buf->stride = width * 4;
+}
+
+static void
+set_bo(drm_intel_bo *bo, uint32_t val, int width, int height)
+{
+	int size = width * height;
+	uint32_t *vaddr;
+
+	drm_intel_gem_bo_start_gtt_access(bo, true);
+	vaddr = bo->virtual;
+	while (size--)
+		*vaddr++ = val;
+}
+
+static int
+cmp_bo(drm_intel_bo *bo, uint32_t val, int width, int height)
+{
+	int size = width * height;
+	uint32_t *vaddr;
+
+	drm_intel_gem_bo_start_gtt_access(bo, false);
+	vaddr = bo->virtual;
+	while (size--) {
+		if (*vaddr++ != val) {
+			printf("%d: 0x%x differs from assumed 0x%x\n",
+			       width * height - size, *vaddr-1, val);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+static drm_intel_bo *
+create_bo(drm_intel_bufmgr *bufmgr, uint32_t val, int width, int height)
+{
+	drm_intel_bo *bo;
+
+	bo = drm_intel_bo_alloc(bufmgr, "bo", width * height * 4, 0);
+	assert(bo);
+
+	/* gtt map doesn't have a write parameter, so just keep the mapping
+	 * around (to avoid the set_domain with the gtt write domain set) and
+	 * manually tell the kernel when we start access the gtt. */
+	drm_intel_gem_bo_map_gtt(bo);
+
+	set_bo(bo, val, width, height);
+
+	return bo;
+}
+
+static void release_bo(drm_intel_bo *bo)
+{
+	drm_intel_gem_bo_unmap_gtt(bo);
+	drm_intel_bo_unreference(bo);
+}
+
+static void render_copyfunc(struct scratch_buf *src,
+			    struct scratch_buf *dst,
+			    int width,
+			    int height)
+{
+	const int src_x = 0, src_y = 0, dst_x = 0, dst_y = 0;
+	render_copyfunc_t rendercopy = get_render_copyfunc(devid);
+	static int warned = 0;
+
+	if (rendercopy) {
+		rendercopy(batch_3d,
+			   src, src_x, src_y,
+			   width, height,
+			   dst, dst_x, dst_y);
+		intel_batchbuffer_flush(batch_3d);
+	} else {
+		if (!warned) {
+			printf("No render copy found for this gen, "
+			       "test is shallow!\n");
+			warned = 1;
+		}
+		assert(dst->bo);
+		assert(src->bo);
+		intel_copy_bo(batch_blt, dst->bo, src->bo, width, height);
+		intel_batchbuffer_flush(batch_blt);
+	}
+}
+
+static void exchange_uint(void *array, unsigned i, unsigned j)
+{
+	unsigned *i_arr = array;
+	unsigned i_tmp;
+
+	i_tmp = i_arr[i];
+	i_arr[i] = i_arr[j];
+	i_arr[j] = i_tmp;
+}
+
+static int run_sync_test(int num_buffers, bool verify)
+{
+	drm_intel_bufmgr *bufmgr;
+	int max;
+	drm_intel_bo **src, **dst1, **dst2;
+	int width = 128, height = 128;
+	int fd;
+	int i;
+	int r = -1;
+	int failed = 0;
+	unsigned int *p_dst1, *p_dst2;
+	struct scratch_buf *s_src, *s_dst;
+
+	fd = drm_open_any();
+	assert(fd >= 0);
+
+	gem_quiescent_gpu(fd);
+
+	devid = intel_get_drm_devid(fd);
+
+	max = gem_aperture_size (fd) / (1024 * 1024) / 2;
+	if (num_buffers > max)
+		num_buffers = max;
+
+	bufmgr = drm_intel_bufmgr_gem_init(fd, 4096);
+	drm_intel_bufmgr_gem_enable_reuse(bufmgr);
+	batch_blt = intel_batchbuffer_alloc(bufmgr, intel_get_drm_devid(fd));
+	assert(batch_blt);
+	batch_3d = intel_batchbuffer_alloc(bufmgr, intel_get_drm_devid(fd));
+	assert(batch_3d);
+
+	src = malloc(num_buffers * sizeof(**src));
+	assert(src);
+
+	dst1 = malloc(num_buffers * sizeof(**dst1));
+	assert(dst1);
+
+	dst2 = malloc(num_buffers * sizeof(**dst2));
+	assert(dst2);
+
+	s_src = malloc(num_buffers * sizeof(*s_src));
+	assert(s_src);
+
+	s_dst = malloc(num_buffers * sizeof(*s_dst));
+	assert(s_dst);
+
+	p_dst1 = malloc(num_buffers * sizeof(unsigned int));
+	if (p_dst1 == NULL)
+		return -ENOMEM;
+
+	p_dst2 = malloc(num_buffers * sizeof(unsigned int));
+	if (p_dst2 == NULL)
+		return -ENOMEM;
+
+	for (i = 0; i < num_buffers; i++) {
+		p_dst1[i] = p_dst2[i] = i;
+		src[i] = create_bo(bufmgr, i, width, height);
+		assert(src[i]);
+		dst1[i] = create_bo(bufmgr, ~i, width, height);
+		assert(dst1[i]);
+		dst2[i] = create_bo(bufmgr, ~i, width, height);
+		assert(dst2[i]);
+		init_buffer(bufmgr, &s_src[i], src[i], width, height);
+		init_buffer(bufmgr, &s_dst[i], dst1[i], width, height);
+	}
+
+	drmtest_permute_array(p_dst1, num_buffers, exchange_uint);
+	drmtest_permute_array(p_dst2, num_buffers, exchange_uint);
+
+	for (i = 0; i < num_buffers; i++)
+		render_copyfunc(&s_src[i], &s_dst[p_dst1[i]], width, height);
+
+	/* Only sync between buffers if this is actual test run and
+	 * not a seqno filler */
+	if (verify) {
+		for (i = 0; i < num_buffers; i++)
+			intel_copy_bo(batch_blt, dst2[p_dst2[i]], dst1[p_dst1[i]],
+				      width, height);
+
+		for (i = 0; i < num_buffers; i++) {
+			r = cmp_bo(dst2[p_dst2[i]], i, width, height);
+			if (r) {
+				printf("buffer %d differs, seqno_before_test 0x%x, "
+				       " approximated seqno on test fail 0x%x\n",
+				       i, last_seqno, last_seqno + i * 2);
+				failed = -1;
+			}
+		}
+	}
+
+	for (i = 0; i < num_buffers; i++) {
+		release_bo(src[i]);
+		release_bo(dst1[i]);
+		release_bo(dst2[i]);
+	}
+
+	intel_batchbuffer_free(batch_3d);
+	intel_batchbuffer_free(batch_blt);
+	drm_intel_bufmgr_destroy(bufmgr);
+
+	free(p_dst1);
+	free(p_dst2);
+	free(s_dst);
+	free(s_src);
+	free(dst2);
+	free(dst1);
+	free(src);
+
+	gem_quiescent_gpu(fd);
+
+	close(fd);
+
+	return failed;
+}
+
+static int run_cmd(char *s)
+{
+	int pid;
+	int r = -1;
+	int status = 0;
+	wordexp_t wexp;
+	int i;
+	r = wordexp(s, &wexp, 0);
+	if (r != 0) {
+		printf("can't parse %s\n", s);
+		return r;
+	}
+
+	for(i = 0; i < wexp.we_wordc; i++)
+		printf("argv[%d] = %s\n", i, wexp.we_wordv[i]);
+
+	pid = fork();
+
+	if (pid == 0) {
+		char path[PATH_MAX];
+		char full_path[PATH_MAX];
+
+		if (getcwd(path, PATH_MAX) == NULL)
+			perror("getcwd");
+
+		assert(snprintf(full_path, PATH_MAX, "%s/%s", path, wexp.we_wordv[0]) > 0);
+
+		/* if (!options.verbose) {
+			close(STDOUT_FILENO);
+			close(STDERR_FILENO);
+		}
+		*/
+
+		r = execv(full_path, wexp.we_wordv);
+		if (r == -1)
+			perror("execv failed");
+	} else {
+		int waitcount = options.timeout;
+
+		while(waitcount-- > 0) {
+			r = waitpid(pid, &status, WNOHANG);
+			if (r == pid) {
+				if(WIFEXITED(status)) {
+					if (WEXITSTATUS(status))
+						fprintf(stderr,
+						    "child returned with %d\n",
+							WEXITSTATUS(status));
+					return WEXITSTATUS(status);
+				}
+			} else if (r != 0) {
+				perror("waitpid");
+				return -errno;
+			}
+
+			sleep(3);
+		}
+
+		kill(pid, SIGKILL);
+		return -ETIMEDOUT;
+	}
+
+	return r;
+}
+
+static const char *dfs_base = "/sys/kernel/debug/dri";
+static const char *dfs_entry = "i915_next_seqno";
+
+static int dfs_open(int mode)
+{
+	char fname[FILENAME_MAX];
+	int fh;
+
+	snprintf(fname, FILENAME_MAX, "%s/%i/%s",
+		 dfs_base, card_index, dfs_entry);
+
+	fh = open(fname, mode);
+	if (fh == -1) {
+		fprintf(stderr,
+			"error %d opening '%s/%d/%s'. too old kernel?\n",
+			errno, dfs_base, card_index, dfs_entry);
+		exit(77);
+	}
+
+	return fh;
+}
+
+static int __read_seqno(uint32_t *seqno)
+{
+	int fh;
+	char buf[32];
+	int r;
+	char *p;
+	unsigned long int tmp;
+
+	fh = dfs_open(O_RDONLY);
+
+	r = read(fh, buf, sizeof(buf) - 1);
+	close(fh);
+	if (r < 0) {
+		perror("read");
+		return -errno;
+	}
+
+	buf[r] = 0;
+
+	p = strstr(buf, "0x");
+	if (!p)
+		p = buf;
+
+	errno = 0;
+	tmp = strtoul(p, NULL, 0);
+	if (tmp == ULONG_MAX && errno) {
+		perror("strtoul");
+		return -errno;
+	}
+
+	*seqno = tmp;
+
+	if (options.verbose)
+		printf("next_seqno: 0x%x\n", *seqno);
+
+	return 0;
+}
+
+static int read_seqno(void)
+{
+	uint32_t seqno = 0;
+	int r;
+	int wrap = 0;
+
+	r = __read_seqno(&seqno);
+	assert(r == 0);
+
+	if (last_seqno > seqno)
+		wrap++;
+
+	last_seqno = seqno;
+
+	return wrap;
+}
+
+static int write_seqno(uint32_t seqno)
+{
+	int fh;
+	char buf[32];
+	int r;
+
+	if (options.dontwrap)
+		return 0;
+
+	fh = dfs_open(O_RDWR);
+	assert(snprintf(buf, sizeof(buf), "0x%x", seqno) > 0);
+
+	r = write(fh, buf, strnlen(buf, sizeof(buf)));
+	close(fh);
+	if (r < 0)
+		return r;
+
+	assert(r == strnlen(buf, sizeof(buf)));
+
+	last_seqno = seqno;
+
+	if (options.verbose)
+		printf("next_seqno set to: 0x%x\n", seqno);
+
+	return 0;
+}
+
+static uint32_t calc_prewrap_val(void)
+{
+	const int pval = options.prewrap_space;
+
+	if (options.random == 0)
+		return pval;
+
+	if (pval == 0)
+		return 0;
+
+	return (random() % pval);
+}
+
+static int run_test(void)
+{
+	int r;
+
+	if (strnlen(options.cmd, sizeof(options.cmd)) > 0) {
+		r = run_cmd(options.cmd);
+	} else {
+		r = run_sync_test(options.buffers, true);
+	}
+
+	return r;
+}
+
+static void preset_run_once(void)
+{
+	assert(write_seqno(1) == 0);
+	assert(run_test() == 0);
+
+	assert(write_seqno(0x7fffffff) == 0);
+	assert(run_test() == 0);
+
+	assert(write_seqno(0xffffffff) == 0);
+	assert(run_test() == 0);
+
+	assert(write_seqno(0xfffffff0) == 0);
+	assert(run_test() == 0);
+}
+
+static void random_run_once(void)
+{
+	uint32_t val;
+
+	do {
+		val = random() % UINT32_MAX;
+		if (RAND_MAX < UINT32_MAX)
+			val += random();
+	} while (val == 0);
+
+	assert(write_seqno(val) == 0);
+	assert(run_test() == 0);
+}
+
+static void wrap_run_once(void)
+{
+	const uint32_t pw_val = calc_prewrap_val();
+
+	assert(write_seqno(UINT32_MAX - pw_val) == 0);
+
+	while(!read_seqno())
+		assert(run_test() == 0);
+}
+
+static void background_run_once(void)
+{
+	const uint32_t pw_val = calc_prewrap_val();
+
+	assert(write_seqno(UINT32_MAX - pw_val) == 0);
+
+	while(!read_seqno())
+		sleep(3);
+}
+
+static void print_usage(const char *s)
+{
+	printf("%s: [OPTION]...\n", s);
+	printf("    where options are:\n");
+	printf("    -b --background       run in background inducing wraps\n");
+	printf("    -c --cmd=cmdstring    use cmdstring to cross wrap\n");
+	printf("    -n --rounds=num       run num times across wrap boundary, 0 == forever\n");
+	printf("    -t --timeout=sec      set timeout to wait for testrun to sec seconds\n");
+	printf("    -d --dontwrap         don't wrap just run the test\n");
+	printf("    -p --prewrap=n        set seqno to WRAP - n for each testrun\n");
+	printf("    -r --norandom         dont randomize prewrap space\n");
+	printf("    -i --buffers          number of buffers to copy\n");
+	exit(-1);
+}
+
+static void parse_options(int argc, char **argv)
+{
+	int c;
+	int option_index = 0;
+	static struct option long_options[] = {
+		{"cmd", required_argument, 0, 'c'},
+		{"rounds", required_argument, 0, 'n'},
+		{"background", no_argument, 0, 'b'},
+		{"timeout", required_argument, 0, 't'},
+		{"dontwrap", no_argument, 0, 'd'},
+		{"verbose", no_argument, 0, 'v'},
+		{"prewrap", required_argument, 0, 'p'},
+		{"norandom", no_argument, 0, 'r'},
+		{"buffers", required_argument, 0, 'i'},
+	};
+
+	strcpy(options.cmd, "");
+	options.rounds = 50;
+	options.background = 0;
+	options.dontwrap = 0;
+	options.timeout = 20;
+	options.verbose = 0;
+	options.random = 1;
+	options.prewrap_space = 21;
+	options.buffers = 10;
+
+	while((c = getopt_long(argc, argv, "c:n:bvt:dp:ri:",
+			       long_options, &option_index)) != -1) {
+		switch(c) {
+		case 'b':
+			options.background = 1;
+			printf("running in background inducing wraps\n");
+			break;
+		case 'd':
+			options.dontwrap = 1;
+			printf("won't wrap after testruns\n");
+			break;
+		case 'n':
+			options.rounds = atoi(optarg);
+			printf("running %d rounds\n", options.rounds);
+			break;
+		case 'c':
+			strncpy(options.cmd, optarg, sizeof(options.cmd) - 1);
+			options.cmd[sizeof(options.cmd) - 1] = 0;
+			printf("cmd set to %s\n", options.cmd);
+			break;
+		case 'i':
+			options.buffers = atoi(optarg);
+			printf("buffers %d\n", options.buffers);
+			break;
+		case 't':
+			options.timeout = atoi(optarg);
+			if (options.timeout == 0)
+				options.timeout = 10;
+			printf("setting timeout to %d seconds\n",
+			       options.timeout);
+			break;
+		case 'v':
+			options.verbose = 1;
+			break;
+		case 'r':
+			options.random = 0;
+			break;
+		case 'p':
+			options.prewrap_space = atoi(optarg);
+			printf("prewrap set to %d (0x%x)\n",
+			       options.prewrap_space, UINT32_MAX -
+			       options.prewrap_space);
+			break;
+		default:
+			printf("unkown command options\n");
+			print_usage(argv[0]);
+			break;
+		}
+	}
+
+	if (optind < argc) {
+		printf("unkown command options\n");
+		print_usage(argv[0]);
+	}
+}
+
+int main(int argc, char **argv)
+{
+	int wcount = 0;
+	int r = -1;
+
+	parse_options(argc, argv);
+
+	card_index = drm_get_card(0);
+	assert(card_index != -1);
+
+	srandom(time(NULL));
+
+	while(options.rounds == 0 || wcount < options.rounds) {
+		if (options.background) {
+			background_run_once();
+		} else {
+			preset_run_once();
+			random_run_once();
+			wrap_run_once();
+		}
+
+		wcount++;
+
+		if (options.verbose) {
+			printf("%s done: %d\n",
+			       options.dontwrap ? "tests" : "wraps", wcount);
+			fflush(stdout);
+		}
+	}
+
+	if (options.rounds == wcount) {
+		if (options.verbose)
+			printf("done %d wraps successfully\n", wcount);
+		return 0;
+	}
+
+	return r;
+}
diff -rupN dump_1/tests/gem_set_tiling_vs_blt.c dump/tests/gem_set_tiling_vs_blt.c
--- dump_1/tests/gem_set_tiling_vs_blt.c	2001-01-14 08:11:40.081619273 +0800
+++ dump/tests/gem_set_tiling_vs_blt.c	2001-01-14 08:10:57.864622660 +0800
@@ -233,6 +233,8 @@ int main(int argc, char **argv)
 	int i, fd;
 	uint32_t tiling, tiling_after;
 
+	drmtest_subtest_init(argc, argv);
+
 	for (i = 0; i < 1024*256; i++)
 		data[i] = i;
 
@@ -243,27 +245,32 @@ int main(int argc, char **argv)
 	devid = intel_get_drm_devid(fd);
 	batch = intel_batchbuffer_alloc(bufmgr, devid);
 
-
-	printf("testing untiled->tiled transisition:\n");
-	tiling = I915_TILING_NONE;
-	tiling_after = I915_TILING_X;
-	do_test(tiling, TEST_STRIDE, tiling_after, TEST_STRIDE);
-	assert(tiling == I915_TILING_NONE);
-	assert(tiling_after == I915_TILING_X);
-
-	printf("testing tiled->untiled transisition:\n");
-	tiling = I915_TILING_X;
-	tiling_after = I915_TILING_NONE;
-	do_test(tiling, TEST_STRIDE, tiling_after, TEST_STRIDE);
-	assert(tiling == I915_TILING_X);
-	assert(tiling_after == I915_TILING_NONE);
-
-	printf("testing tiled->tiled transisition:\n");
-	tiling = I915_TILING_X;
-	tiling_after = I915_TILING_X;
-	do_test(tiling, TEST_STRIDE/2, tiling_after, TEST_STRIDE);
-	assert(tiling == I915_TILING_X);
-	assert(tiling_after == I915_TILING_X);
+	if (drmtest_run_subtest("untiled-to-tiled")) {
+		printf("testing untiled->tiled transisition:\n");
+		tiling = I915_TILING_NONE;
+		tiling_after = I915_TILING_X;
+		do_test(tiling, TEST_STRIDE, tiling_after, TEST_STRIDE);
+		assert(tiling == I915_TILING_NONE);
+		assert(tiling_after == I915_TILING_X);
+	}
+
+	if (drmtest_run_subtest("tiled-to-untiled")) {
+		printf("testing tiled->untiled transisition:\n");
+		tiling = I915_TILING_X;
+		tiling_after = I915_TILING_NONE;
+		do_test(tiling, TEST_STRIDE, tiling_after, TEST_STRIDE);
+		assert(tiling == I915_TILING_X);
+		assert(tiling_after == I915_TILING_NONE);
+	}
+
+	if (drmtest_run_subtest("tiled-to-tiled")) {
+		printf("testing tiled->tiled transisition:\n");
+		tiling = I915_TILING_X;
+		tiling_after = I915_TILING_X;
+		do_test(tiling, TEST_STRIDE/2, tiling_after, TEST_STRIDE);
+		assert(tiling == I915_TILING_X);
+		assert(tiling_after == I915_TILING_X);
+	}
 
 	return 0;
 }
diff -rupN dump_1/tests/gem_storedw_batches_loop.c dump/tests/gem_storedw_batches_loop.c
--- dump_1/tests/gem_storedw_batches_loop.c	2001-01-14 08:11:40.081619273 +0800
+++ dump/tests/gem_storedw_batches_loop.c	2001-01-14 08:10:57.865622585 +0800
@@ -48,12 +48,14 @@ static int has_ppgtt = 0;
 
 /* Like the store dword test, but we create new command buffers each time */
 static void
-store_dword_loop(void)
+store_dword_loop(int divider)
 {
 	int cmd, i, val = 0, ret;
 	uint32_t *buf;
 	drm_intel_bo *cmd_bo;
 
+	printf("running storedw loop with stall every %i batch\n", divider);
+
 	cmd = MI_STORE_DWORD_IMM;
 	if (!has_ppgtt)
 		cmd |= MI_MEM_VIRTUAL;
@@ -104,6 +106,9 @@ store_dword_loop(void)
 			exit(-1);
 		}
 
+		if (i % divider != 0)
+			goto cont;
+
 		drm_intel_bo_wait_rendering(cmd_bo);
 
 		drm_intel_bo_map(target_bo, 1);
@@ -118,6 +123,7 @@ store_dword_loop(void)
 		buf[0] = 0; /* let batch write it again */
 		drm_intel_bo_unmap(target_bo);
 
+cont:
 		drm_intel_bo_unreference(cmd_bo);
 
 		val++;
@@ -162,7 +168,10 @@ int main(int argc, char **argv)
 		exit(-1);
 	}
 
-	store_dword_loop();
+	store_dword_loop(1);
+	store_dword_loop(2);
+	store_dword_loop(3);
+	store_dword_loop(5);
 
 	drm_intel_bo_unreference(target_bo);
 	drm_intel_bufmgr_destroy(bufmgr);
diff -rupN dump_1/tests/gem_storedw_loop_blt.c dump/tests/gem_storedw_loop_blt.c
--- dump_1/tests/gem_storedw_loop_blt.c	2001-01-14 08:11:40.081619273 +0800
+++ dump/tests/gem_storedw_loop_blt.c	2001-01-14 08:10:57.865622585 +0800
@@ -52,11 +52,13 @@ static int has_ppgtt = 0;
  */
 
 static void
-store_dword_loop(void)
+store_dword_loop(int divider)
 {
 	int cmd, i, val = 0;
 	uint32_t *buf;
 
+	printf("running storedw loop on blt with stall every %i batch\n", divider);
+
 	cmd = MI_STORE_DWORD_IMM;
 	if (!has_ppgtt)
 		cmd |= MI_MEM_VIRTUAL;
@@ -72,6 +74,9 @@ store_dword_loop(void)
 
 		intel_batchbuffer_flush_on_ring(batch, I915_EXEC_BLT);
 
+		if (i % divider != 0)
+			goto cont;
+
 		drm_intel_bo_map(target_buffer, 0);
 
 		buf = target_buffer->virtual;
@@ -84,6 +89,7 @@ store_dword_loop(void)
 
 		drm_intel_bo_unmap(target_buffer);
 
+cont:
 		val++;
 	}
 
@@ -142,7 +148,10 @@ int main(int argc, char **argv)
 		exit(-1);
 	}
 
-	store_dword_loop();
+	store_dword_loop(1);
+	store_dword_loop(2);
+	store_dword_loop(3);
+	store_dword_loop(5);
 
 	drm_intel_bo_unreference(target_buffer);
 	intel_batchbuffer_free(batch);
diff -rupN dump_1/tests/gem_storedw_loop_bsd.c dump/tests/gem_storedw_loop_bsd.c
--- dump_1/tests/gem_storedw_loop_bsd.c	2001-01-14 08:11:40.081619273 +0800
+++ dump/tests/gem_storedw_loop_bsd.c	2001-01-14 08:10:57.866622512 +0800
@@ -52,11 +52,13 @@ static int has_ppgtt = 0;
  */
 
 static void
-store_dword_loop(void)
+store_dword_loop(int divider)
 {
 	int cmd, i, val = 0;
 	uint32_t *buf;
 
+	printf("running storedw loop on bsd with stall every %i batch\n", divider);
+
 	cmd = MI_STORE_DWORD_IMM;
 	if (!has_ppgtt)
 		cmd |= MI_MEM_VIRTUAL;
@@ -72,6 +74,9 @@ store_dword_loop(void)
 
 		intel_batchbuffer_flush_on_ring(batch, I915_EXEC_BSD);
 
+		if (i % divider != 0)
+			goto cont;
+
 		drm_intel_bo_map(target_buffer, 0);
 
 		buf = target_buffer->virtual;
@@ -84,6 +89,7 @@ store_dword_loop(void)
 
 		drm_intel_bo_unmap(target_buffer);
 
+cont:
 		val++;
 	}
 
@@ -148,7 +154,10 @@ int main(int argc, char **argv)
 		exit(-1);
 	}
 
-	store_dword_loop();
+	store_dword_loop(1);
+	store_dword_loop(2);
+	store_dword_loop(3);
+	store_dword_loop(5);
 
 	drm_intel_bo_unreference(target_buffer);
 	intel_batchbuffer_free(batch);
diff -rupN dump_1/tests/gem_storedw_loop_render.c dump/tests/gem_storedw_loop_render.c
--- dump_1/tests/gem_storedw_loop_render.c	2001-01-14 08:11:40.081619273 +0800
+++ dump/tests/gem_storedw_loop_render.c	2001-01-14 08:10:57.867622439 +0800
@@ -52,11 +52,13 @@ static int has_ppgtt = 0;
  */
 
 static void
-store_dword_loop(void)
+store_dword_loop(int divider)
 {
 	int cmd, i, val = 0;
 	uint32_t *buf;
 
+	printf("running storedw loop on render with stall every %i batch\n", divider);
+
 	cmd = MI_STORE_DWORD_IMM;
 	if (!has_ppgtt)
 		cmd |= MI_MEM_VIRTUAL;
@@ -72,6 +74,9 @@ store_dword_loop(void)
 
 		intel_batchbuffer_flush_on_ring(batch, 0);
 
+		if (i % divider != 0)
+			goto cont;
+
 		drm_intel_bo_map(target_buffer, 0);
 
 		buf = target_buffer->virtual;
@@ -84,6 +89,7 @@ store_dword_loop(void)
 
 		drm_intel_bo_unmap(target_buffer);
 
+cont:
 		val++;
 	}
 
@@ -136,7 +142,10 @@ int main(int argc, char **argv)
 		exit(-1);
 	}
 
-	store_dword_loop();
+	store_dword_loop(1);
+	store_dword_loop(2);
+	store_dword_loop(3);
+	store_dword_loop(5);
 
 	drm_intel_bo_unreference(target_buffer);
 	intel_batchbuffer_free(batch);
diff -rupN dump_1/tests/gem_stress.c dump/tests/gem_stress.c
--- dump_1/tests/gem_stress.c	2001-01-14 08:11:40.082619273 +0800
+++ dump/tests/gem_stress.c	2001-01-14 08:10:57.868622368 +0800
@@ -323,31 +323,16 @@ static void render_copyfunc(struct scrat
 			    unsigned logical_tile_no)
 {
 	static unsigned keep_gpu_busy_counter = 0;
+	render_copyfunc_t rendercopy = get_render_copyfunc(devid);
 
 	/* check both edges of the fence usage */
 	if (keep_gpu_busy_counter & 1)
 		keep_gpu_busy();
 
-	if (IS_GEN2(devid))
-		gen2_render_copyfunc(batch,
-				     src, src_x, src_y,
-				     options.tile_size, options.tile_size,
-				     dst, dst_x, dst_y);
-	else if (IS_GEN3(devid))
-		gen3_render_copyfunc(batch,
-				     src, src_x, src_y,
-				     options.tile_size, options.tile_size,
-				     dst, dst_x, dst_y);
-	else if (IS_GEN6(devid))
-		gen6_render_copyfunc(batch,
-				     src, src_x, src_y,
-				     options.tile_size, options.tile_size,
-				     dst, dst_x, dst_y);
-	else if (IS_GEN7(devid))
-		gen7_render_copyfunc(batch,
-				     src, src_x, src_y,
-				     options.tile_size, options.tile_size,
-				     dst, dst_x, dst_y);
+	if (rendercopy)
+		rendercopy(batch, src, src_x, src_y,
+		     options.tile_size, options.tile_size,
+		     dst, dst_x, dst_y);
 	else
 		blitter_copyfunc(src, src_x, src_y,
 				 dst, dst_x, dst_y,
@@ -665,6 +650,7 @@ static void parse_options(int argc, char
 		{"tile-size", 1, 0, TILESZ},
 #define CHCK_RENDER 0xdead0003
 		{"check-render-cpyfn", 0, 0, CHCK_RENDER},
+		{NULL, 0, 0, 0},
 	};
 
 	options.scratch_buf_size = 256*4096;
diff -rupN dump_1/tests/gem_threaded_access_tiled.c dump/tests/gem_threaded_access_tiled.c
--- dump_1/tests/gem_threaded_access_tiled.c	1970-01-01 07:30:00.000000000 +0730
+++ dump/tests/gem_threaded_access_tiled.c	2001-01-14 08:10:57.868622368 +0800
@@ -0,0 +1,123 @@
+/*
+ * Copyright (c) 2012 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Mika Kuoppala <mika.kuoppala@intel.com>
+ */
+
+#include <stdlib.h>
+#include <string.h>
+#include <assert.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <pthread.h>
+
+#include "drmtest.h"
+#include "i915_drm.h"
+#include "intel_bufmgr.h"
+
+/* Testcase: check parallel access to tiled memory
+ *
+ * Parallel access to tiled memory caused sigbus
+ */
+
+#define NUM_THREADS 2
+#define WIDTH 4096
+#define HEIGHT 4096
+
+struct thread_ctx {
+	drm_intel_bo *bo;
+};
+
+static drm_intel_bufmgr *bufmgr;
+static struct thread_ctx tctx[NUM_THREADS];
+
+static void *copy_fn(void *p)
+{
+	unsigned char *buf;
+	struct thread_ctx *c = p;
+
+	buf = malloc(WIDTH * HEIGHT);
+	if (buf == NULL)
+		return (void *)1;
+
+	memcpy(buf, c->bo->virtual, WIDTH * HEIGHT);
+
+	free(buf);
+	return (void *)0;
+}
+
+static int copy_tile_threaded(drm_intel_bo *bo)
+{
+	int i;
+	int r;
+	pthread_t thr[NUM_THREADS];
+	void *status;
+
+	for (i = 0; i < NUM_THREADS; i++) {
+		tctx[i].bo = bo;
+		r = pthread_create(&thr[i], NULL, copy_fn, (void *)&tctx[i]);
+		assert(r == 0);
+	}
+
+	for (i = 0;  i < NUM_THREADS; i++) {
+		pthread_join(thr[i], &status);
+		assert(status == 0);
+	}
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	int fd;
+	drm_intel_bo *bo;
+	uint32_t tiling_mode = I915_TILING_Y;
+	unsigned long pitch = 0;
+	int r;
+
+	fd = drm_open_any();
+	assert(fd >= 0);
+
+	bufmgr = drm_intel_bufmgr_gem_init(fd, 4096);
+	assert(bufmgr);
+
+	bo = drm_intel_bo_alloc_tiled(bufmgr, "mmap bo", WIDTH, HEIGHT, 1,
+				      &tiling_mode, &pitch, 0);
+	assert(bo);
+
+	r = drm_intel_gem_bo_map_gtt(bo);
+	assert(!r);
+
+	r = copy_tile_threaded(bo);
+	assert(!r);
+
+	r = drm_intel_gem_bo_unmap_gtt(bo);
+	assert(!r);
+
+	drm_intel_bo_unreference(bo);
+	drm_intel_bufmgr_destroy(bufmgr);
+
+	close(fd);
+
+	return 0;
+}
diff -rupN dump_1/tests/gem_tiled_blits.c dump/tests/gem_tiled_blits.c
--- dump_1/tests/gem_tiled_blits.c	2001-01-14 08:11:40.082619273 +0800
+++ dump/tests/gem_tiled_blits.c	2001-01-14 08:10:57.869622297 +0800
@@ -67,17 +67,16 @@ create_bo(uint32_t start_val)
 	drm_intel_bo *bo, *linear_bo;
 	uint32_t *linear;
 	uint32_t tiling = I915_TILING_X;
-	int ret, i;
+	int i;
 
 	bo = drm_intel_bo_alloc(bufmgr, "tiled bo", 1024 * 1024, 4096);
-	ret = drm_intel_bo_set_tiling(bo, &tiling, width * 4);
-	assert(ret == 0);
+	do_or_die(drm_intel_bo_set_tiling(bo, &tiling, width * 4));
 	assert(tiling == I915_TILING_X);
 
 	linear_bo = drm_intel_bo_alloc(bufmgr, "linear src", 1024 * 1024, 4096);
 
 	/* Fill the BO with dwords starting at start_val */
-	drm_intel_bo_map(linear_bo, 1);
+	do_or_die(drm_intel_bo_map(linear_bo, 1));
 	linear = linear_bo->virtual;
 	for (i = 0; i < 1024 * 1024 / 4; i++)
 		linear[i] = start_val++;
@@ -101,7 +100,7 @@ check_bo(drm_intel_bo *bo, uint32_t star
 
 	intel_copy_bo(batch, linear_bo, bo, width, height);
 
-	drm_intel_bo_map(linear_bo, 0);
+	do_or_die(drm_intel_bo_map(linear_bo, 0));
 	linear = linear_bo->virtual;
 
 	for (i = 0; i < 1024 * 1024 / 4; i++) {
@@ -147,6 +146,7 @@ int main(int argc, char **argv)
 
 	bufmgr = drm_intel_bufmgr_gem_init(fd, 4096);
 	drm_intel_bufmgr_gem_enable_reuse(bufmgr);
+	drm_intel_bufmgr_gem_set_vma_cache_size(bufmgr, 32);
 	batch = intel_batchbuffer_alloc(bufmgr, intel_get_drm_devid(fd));
 
 	for (i = 0; i < count; i++) {
diff -rupN dump_1/tests/gem_tiled_partial_pwrite_pread.c dump/tests/gem_tiled_partial_pwrite_pread.c
--- dump_1/tests/gem_tiled_partial_pwrite_pread.c	2001-01-14 08:11:40.083619273 +0800
+++ dump/tests/gem_tiled_partial_pwrite_pread.c	2001-01-14 08:10:57.870622228 +0800
@@ -125,36 +125,12 @@ blt_bo_fill(drm_intel_bo *tmp_bo, drm_in
 
 #define MAX_BLT_SIZE 128
 #define ROUNDS 200
-int main(int argc, char **argv)
+uint8_t tmp[BO_SIZE];
+uint8_t compare_tmp[BO_SIZE];
+
+static void test_partial_reads(void)
 {
 	int i, j;
-	uint8_t tmp[BO_SIZE];
-	uint8_t compare_tmp[BO_SIZE];
-	uint32_t tiling_mode = I915_TILING_X;
-
-	srandom(0xdeadbeef);
-
-	fd = drm_open_any();
-
-	bufmgr = drm_intel_bufmgr_gem_init(fd, 4096);
-	//drm_intel_bufmgr_gem_enable_reuse(bufmgr);
-	devid = intel_get_drm_devid(fd);
-	batch = intel_batchbuffer_alloc(bufmgr, devid);
-
-	/* overallocate the buffers we're actually using because */
-	scratch_bo = drm_intel_bo_alloc_tiled(bufmgr, "scratch bo", 1024, 
-					      BO_SIZE/4096, 4,
-					      &tiling_mode, &scratch_pitch, 0);
-	assert(tiling_mode == I915_TILING_X);
-	assert(scratch_pitch == 4096);
-	staging_bo = drm_intel_bo_alloc(bufmgr, "staging bo", BO_SIZE, 4096);
-	tiled_staging_bo = drm_intel_bo_alloc_tiled(bufmgr, "scratch bo", 1024,
-						    BO_SIZE/4096, 4,
-						    &tiling_mode,
-						    &scratch_pitch, 0);
-
-	drmtest_init_aperture_trashers(bufmgr);
-	mappable_gtt_limit = gem_mappable_aperture_size();
 
 	printf("checking partial reads\n");
 	for (i = 0; i < ROUNDS; i++) {
@@ -177,6 +153,11 @@ int main(int argc, char **argv)
 
 		drmtest_progress("partial reads test: ", i, ROUNDS);
 	}
+}
+
+static void test_partial_writes(void)
+{
+	int i, j;
 
 	printf("checking partial writes\n");
 	for (i = 0; i < ROUNDS; i++) {
@@ -221,6 +202,11 @@ int main(int argc, char **argv)
 
 		drmtest_progress("partial writes test: ", i, ROUNDS);
 	}
+}
+
+static void test_partial_read_writes(void)
+{
+	int i, j;
 
 	printf("checking partial writes after partial reads\n");
 	for (i = 0; i < ROUNDS; i++) {
@@ -284,6 +270,46 @@ int main(int argc, char **argv)
 
 		drmtest_progress("partial read/writes test: ", i, ROUNDS);
 	}
+}
+
+int main(int argc, char **argv)
+{
+	uint32_t tiling_mode = I915_TILING_X;
+
+	drmtest_subtest_init(argc, argv);
+
+	srandom(0xdeadbeef);
+
+	fd = drm_open_any();
+
+	bufmgr = drm_intel_bufmgr_gem_init(fd, 4096);
+	//drm_intel_bufmgr_gem_enable_reuse(bufmgr);
+	devid = intel_get_drm_devid(fd);
+	batch = intel_batchbuffer_alloc(bufmgr, devid);
+
+	/* overallocate the buffers we're actually using because */
+	scratch_bo = drm_intel_bo_alloc_tiled(bufmgr, "scratch bo", 1024,
+					      BO_SIZE/4096, 4,
+					      &tiling_mode, &scratch_pitch, 0);
+	assert(tiling_mode == I915_TILING_X);
+	assert(scratch_pitch == 4096);
+	staging_bo = drm_intel_bo_alloc(bufmgr, "staging bo", BO_SIZE, 4096);
+	tiled_staging_bo = drm_intel_bo_alloc_tiled(bufmgr, "scratch bo", 1024,
+						    BO_SIZE/4096, 4,
+						    &tiling_mode,
+						    &scratch_pitch, 0);
+
+	drmtest_init_aperture_trashers(bufmgr);
+	mappable_gtt_limit = gem_mappable_aperture_size();
+
+	if (drmtest_run_subtest("reads"))
+		test_partial_reads();
+
+	if (drmtest_run_subtest("writes"))
+		test_partial_writes();
+
+	if (drmtest_run_subtest("writes-after-reads"))
+		test_partial_read_writes();
 
 	drmtest_cleanup_aperture_trashers();
 	drm_intel_bufmgr_destroy(bufmgr);
diff -rupN dump_1/tests/gem_tiled_pread_pwrite.c dump/tests/gem_tiled_pread_pwrite.c
--- dump_1/tests/gem_tiled_pread_pwrite.c	2001-01-14 08:11:40.083619273 +0800
+++ dump/tests/gem_tiled_pread_pwrite.c	2001-01-14 08:10:57.870622228 +0800
@@ -149,6 +149,8 @@ main(int argc, char **argv)
 		munmap(data, sizeof(linear));
 
 		/* Leak both bos so that we use all of system mem! */
+		gem_madvise(fd, handle_target, I915_MADV_DONTNEED);
+		gem_madvise(fd, handle, I915_MADV_DONTNEED);
 
 		drmtest_progress("gem_tiled_pread_pwrite: ", i, count/2);
 	}
diff -rupN dump_1/tests/gem_tiled_swapping.c dump/tests/gem_tiled_swapping.c
--- dump_1/tests/gem_tiled_swapping.c	2001-01-14 08:11:40.083619273 +0800
+++ dump/tests/gem_tiled_swapping.c	2001-01-14 08:10:57.871622160 +0800
@@ -79,6 +79,9 @@ create_bo_and_fill(int fd)
 
 	/* Fill the BO with dwords starting at start_val */
 	data = gem_mmap(fd, handle, LINEAR_DWORDS, PROT_READ | PROT_WRITE);
+	if (data == NULL && errno == ENOSPC)
+		return 0;
+
 	for (i = 0; i < WIDTH*HEIGHT; i++)
 		data[i] = i;
 	munmap(data, LINEAR_DWORDS);
@@ -117,8 +120,13 @@ main(int argc, char **argv)
 		return 77;
 	}
 
-	for (i = 0; i < count; i++)
+	for (i = 0; i < count; i++) {
 		bo_handles[i] = create_bo_and_fill(fd);
+		if (bo_handles[i] == 0) {
+			printf("insufficient address space\n");
+			return 77;
+		}
+	}
 
 	for (i = 0; i < count; i++)
 		idx_arr[i] = i;
diff -rupN dump_1/tests/gem_wait_render_timeout.c dump/tests/gem_wait_render_timeout.c
--- dump_1/tests/gem_wait_render_timeout.c	2001-01-14 08:11:40.084619273 +0800
+++ dump/tests/gem_wait_render_timeout.c	2001-01-14 08:10:57.872622092 +0800
@@ -133,7 +133,7 @@ int main(int argc, char **argv)
 
 	if (gem_bo_wait_timeout(fd, dst->handle, &timeout) == -EINVAL) {
 		printf("kernel doesn't support wait_timeout, skipping test\n");
-		return -77;
+		return 77;
 	}
 	timeout = ENOUGH_WORK_IN_SECONDS * NSEC_PER_SEC;
 
diff -rupN dump_1/tests/kms_flip.c dump/tests/kms_flip.c
--- dump_1/tests/kms_flip.c	1970-01-01 07:30:00.000000000 +0730
+++ dump/tests/kms_flip.c	2001-01-14 08:10:57.874621933 +0800
@@ -0,0 +1,1052 @@
+/*
+ * Copyright 2012 Intel Corporation
+ *   Jesse Barnes <jesse.barnes@intel.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "config.h"
+
+#include <assert.h>
+#include <cairo.h>
+#include <errno.h>
+#include <math.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <sys/poll.h>
+#include <sys/time.h>
+#include <sys/mman.h>
+#include <sys/ioctl.h>
+
+#include "i915_drm.h"
+#include "drmtest.h"
+#include "testdisplay.h"
+#include "intel_bufmgr.h"
+#include "intel_batchbuffer.h"
+#include "intel_gpu_tools.h"
+
+#define TEST_DPMS		(1 << 0)
+#define TEST_WITH_DUMMY_LOAD	(1 << 1)
+#define TEST_PAN		(1 << 2)
+#define TEST_MODESET		(1 << 3)
+#define TEST_CHECK_TS		(1 << 4)
+#define TEST_EBUSY		(1 << 5)
+#define TEST_EINVAL		(1 << 6)
+#define TEST_FLIP		(1 << 7)
+#define TEST_VBLANK		(1 << 8)
+#define TEST_VBLANK_BLOCK	(1 << 9)
+#define TEST_VBLANK_ABSOLUTE	(1 << 10)
+#define TEST_VBLANK_EXPIRED_SEQ	(1 << 11)
+#define TEST_FB_RECREATE	(1 << 12)
+#define TEST_RMFB		(1 << 13)
+
+#define EVENT_FLIP		(1 << 0)
+#define EVENT_VBLANK		(1 << 1)
+
+#ifndef DRM_CAP_TIMESTAMP_MONOTONIC
+#define DRM_CAP_TIMESTAMP_MONOTONIC 6
+#endif
+
+drmModeRes *resources;
+int drm_fd;
+static drm_intel_bufmgr *bufmgr;
+struct intel_batchbuffer *batch;
+uint32_t devid;
+int test_time = 3;
+static bool monotonic_timestamp;
+
+uint32_t *fb_ptr;
+
+struct type_name {
+	int type;
+	const char *name;
+};
+
+struct event_state {
+	const char *name;
+
+	/*
+	 * Event data for the last event that has already passed our check.
+	 * Updated using the below current_* vars in update_state().
+	 */
+	struct timeval last_ts;			/* kernel reported timestamp */
+	struct timeval last_received_ts;	/* the moment we received it */
+	unsigned int last_seq;			/* kernel reported seq. num */
+
+	/*
+	 * Event data for for the current event that we just received and
+	 * going to check for validity. Set in event_handler().
+	 */
+	struct timeval current_ts;		/* kernel reported timestamp */
+	struct timeval current_received_ts;	/* the moment we received it */
+	unsigned int current_seq;		/* kernel reported seq. num */
+
+	int count;				/* # of events of this type */
+
+	/* Step between the current and next 'target' sequence number. */
+	int seq_step;
+};
+
+struct test_output {
+	const char *test_name;
+	uint32_t id;
+	int mode_valid;
+	drmModeModeInfo mode;
+	drmModeEncoder *encoder;
+	drmModeConnector *connector;
+	int crtc;
+	int pipe;
+	int flags;
+	unsigned int current_fb_id;
+	unsigned int fb_width;
+	unsigned int fb_height;
+	unsigned int fb_ids[2];
+	int bpp, depth;
+	struct kmstest_fb fb_info[2];
+
+	struct event_state flip_state;
+	struct event_state vblank_state;
+	unsigned int pending_events;
+};
+
+
+static unsigned long gettime_us(void)
+{
+	struct timespec ts;
+
+	clock_gettime(CLOCK_MONOTONIC, &ts);
+
+	return ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
+}
+
+static void emit_dummy_load(struct test_output *o)
+{
+	int i, limit;
+	drm_intel_bo *dummy_bo, *target_bo, *tmp_bo;
+	struct kmstest_fb *fb_info = &o->fb_info[o->current_fb_id];
+	unsigned pitch = fb_info->stride;
+
+	limit = intel_gen(devid) < 6 ? 500 : 5000;
+
+	dummy_bo = drm_intel_bo_alloc(bufmgr, "dummy_bo", fb_info->size, 4096);
+	assert(dummy_bo);
+	target_bo = gem_handle_to_libdrm_bo(bufmgr, drm_fd, "imported", fb_info->gem_handle);
+	assert(target_bo);
+
+	for (i = 0; i < limit; i++) {
+		BEGIN_BATCH(8);
+		OUT_BATCH(XY_SRC_COPY_BLT_CMD |
+			  XY_SRC_COPY_BLT_WRITE_ALPHA |
+			  XY_SRC_COPY_BLT_WRITE_RGB);
+		OUT_BATCH((3 << 24) | /* 32 bits */
+			  (0xcc << 16) | /* copy ROP */
+			  pitch);
+		OUT_BATCH(0 << 16 | 0);
+		OUT_BATCH((o->mode.vdisplay) << 16 | (o->mode.hdisplay));
+		OUT_RELOC_FENCED(dummy_bo, I915_GEM_DOMAIN_RENDER, I915_GEM_DOMAIN_RENDER, 0);
+		OUT_BATCH(0 << 16 | 0);
+		OUT_BATCH(pitch);
+		OUT_RELOC_FENCED(target_bo, I915_GEM_DOMAIN_RENDER, 0, 0);
+		ADVANCE_BATCH();
+
+		if (IS_GEN6(devid) || IS_GEN7(devid)) {
+			BEGIN_BATCH(3);
+			OUT_BATCH(XY_SETUP_CLIP_BLT_CMD);
+			OUT_BATCH(0);
+			OUT_BATCH(0);
+			ADVANCE_BATCH();
+		}
+
+		tmp_bo = dummy_bo;
+		dummy_bo = target_bo;
+		target_bo = tmp_bo;
+	}
+	intel_batchbuffer_flush(batch);
+
+	drm_intel_bo_unreference(dummy_bo);
+	drm_intel_bo_unreference(target_bo);
+}
+
+static int set_dpms(struct test_output *o, int mode)
+{
+	int i, dpms = 0;
+
+	for (i = 0; i < o->connector->count_props; i++) {
+		struct drm_mode_get_property prop;
+
+		prop.prop_id = o->connector->props[i];
+		prop.count_values = 0;
+		prop.count_enum_blobs = 0;
+		if (drmIoctl(drm_fd, DRM_IOCTL_MODE_GETPROPERTY, &prop))
+			continue;
+
+		if (strcmp(prop.name, "DPMS"))
+			continue;
+
+		dpms = prop.prop_id;
+		break;
+	}
+	if (!dpms) {
+		fprintf(stderr, "DPMS property not found on %d\n", o->id);
+		errno = ENOENT;
+		return -1;
+	}
+
+	return drmModeConnectorSetProperty(drm_fd, o->id, dpms, mode);
+}
+
+static void set_flag(unsigned int *v, unsigned int flag)
+{
+	assert(!(*v & flag));
+	*v |= flag;
+}
+
+static void clear_flag(unsigned int *v, unsigned int flag)
+{
+	assert(*v & flag);
+	*v &= ~flag;
+}
+
+static int do_page_flip(struct test_output *o, int fb_id)
+{
+	int ret;
+
+	ret = drmModePageFlip(drm_fd, o->crtc, fb_id, DRM_MODE_PAGE_FLIP_EVENT,
+				o);
+	if (ret == 0)
+		set_flag(&o->pending_events, EVENT_FLIP);
+
+	return ret;
+}
+
+struct vblank_reply {
+	unsigned int sequence;
+	struct timeval ts;
+};
+
+static int __wait_for_vblank(unsigned int flags, int crtc_idx,
+			      int target_seq, unsigned long ret_data,
+			      struct vblank_reply *reply)
+{
+	drmVBlank wait_vbl;
+	int ret;
+	unsigned crtc_idx_mask;
+	bool event = !(flags & TEST_VBLANK_BLOCK);
+
+	memset(&wait_vbl, 0, sizeof(wait_vbl));
+
+	crtc_idx_mask = crtc_idx << DRM_VBLANK_HIGH_CRTC_SHIFT;
+	assert(!(crtc_idx_mask & ~DRM_VBLANK_HIGH_CRTC_MASK));
+
+	wait_vbl.request.type = crtc_idx_mask;
+	if (flags & TEST_VBLANK_ABSOLUTE)
+		wait_vbl.request.type |= DRM_VBLANK_ABSOLUTE;
+	else
+		wait_vbl.request.type |= DRM_VBLANK_RELATIVE;
+	if (event) {
+		wait_vbl.request.type |= DRM_VBLANK_EVENT;
+		wait_vbl.request.signal = ret_data;
+	}
+	wait_vbl.request.sequence = target_seq;
+
+	ret = drmWaitVBlank(drm_fd, &wait_vbl);
+
+	if (ret == 0) {
+		reply->ts.tv_sec = wait_vbl.reply.tval_sec;
+		reply->ts.tv_usec = wait_vbl.reply.tval_usec;
+		reply->sequence = wait_vbl.reply.sequence;
+	} else
+		ret = -errno;
+
+	return ret;
+}
+
+static int do_wait_for_vblank(struct test_output *o, int pipe_id,
+			      int target_seq, struct vblank_reply *reply)
+{
+	int ret;
+
+	ret = __wait_for_vblank(o->flags, pipe_id, target_seq, (unsigned long)o,
+				reply);
+	if (ret == 0 && !(o->flags & TEST_VBLANK_BLOCK))
+		set_flag(&o->pending_events, EVENT_VBLANK);
+
+	return ret;
+}
+
+static bool
+analog_tv_connector(struct test_output *o)
+{
+	uint32_t connector_type = o->connector->connector_type;
+
+	return connector_type == DRM_MODE_CONNECTOR_TV ||
+		connector_type == DRM_MODE_CONNECTOR_9PinDIN ||
+		connector_type == DRM_MODE_CONNECTOR_SVIDEO ||
+		connector_type == DRM_MODE_CONNECTOR_Composite;
+}
+
+static void event_handler(struct event_state *es, unsigned int frame,
+			  unsigned int sec, unsigned int usec)
+{
+	struct timeval now;
+
+	if (monotonic_timestamp) {
+		struct timespec ts;
+
+		clock_gettime(CLOCK_MONOTONIC, &ts);
+		now.tv_sec = ts.tv_sec;
+		now.tv_usec = ts.tv_nsec / 1000;
+	} else {
+		gettimeofday(&now, NULL);
+	}
+	es->current_received_ts = now;
+
+	es->current_ts.tv_sec = sec;
+	es->current_ts.tv_usec = usec;
+	es->current_seq = frame;
+}
+
+static void page_flip_handler(int fd, unsigned int frame, unsigned int sec,
+			      unsigned int usec, void *data)
+{
+	struct test_output *o = data;
+
+	clear_flag(&o->pending_events, EVENT_FLIP);
+	event_handler(&o->flip_state, frame, sec, usec);
+}
+
+static double frame_time(struct test_output *o)
+{
+	return 1000.0 * 1000.0 / o->mode.vrefresh;
+}
+
+static void fixup_premature_vblank_ts(struct test_output *o,
+				      struct event_state *es)
+{
+	/*
+	 * In case a power off event preempts the completion of a
+	 * wait-for-vblank event the kernel will return a wf-vblank event with
+	 * a zeroed-out timestamp. In order that check_state() doesn't
+	 * complain replace this ts with a valid ts. As we can't calculate the
+	 * exact timestamp, just use the time we received the event.
+	 */
+	struct timeval tv;
+
+	if (!(o->flags & (TEST_DPMS | TEST_MODESET)))
+		return;
+
+	if (o->vblank_state.current_ts.tv_sec != 0 ||
+	    o->vblank_state.current_ts.tv_usec != 0)
+		return;
+
+	tv.tv_sec = 0;
+	tv.tv_usec = 1;
+	timersub(&es->current_received_ts, &tv, &es->current_ts);
+}
+
+static void vblank_handler(int fd, unsigned int frame, unsigned int sec,
+			      unsigned int usec, void *data)
+{
+	struct test_output *o = data;
+
+	clear_flag(&o->pending_events, EVENT_VBLANK);
+	event_handler(&o->vblank_state, frame, sec, usec);
+	fixup_premature_vblank_ts(o, &o->vblank_state);
+}
+
+static void check_state(struct test_output *o, struct event_state *es)
+{
+	struct timeval diff;
+	double usec_interflip;
+
+	timersub(&es->current_ts, &es->current_received_ts, &diff);
+	if ((!analog_tv_connector(o)) &&
+	    (diff.tv_sec > 0 || (diff.tv_sec == 0 && diff.tv_usec > 2000))) {
+		fprintf(stderr, "%s ts delayed for too long: %is, %iusec\n",
+			es->name, (int)diff.tv_sec, (int)diff.tv_usec);
+		exit(5);
+	}
+
+	if (es->count == 0)
+		return;
+
+	if (!timercmp(&es->last_received_ts, &es->current_ts, <)) {
+		fprintf(stderr, "%s ts before the %s was issued!\n",
+				es->name, es->name);
+
+		timersub(&es->current_ts, &es->last_received_ts, &diff);
+		fprintf(stderr, "timerdiff %is, %ius\n",
+			(int) diff.tv_sec, (int) diff.tv_usec);
+		exit(6);
+	}
+
+	/* This bounding matches the one in DRM_IOCTL_WAIT_VBLANK. */
+	if (!(o->flags & (TEST_DPMS | TEST_MODESET))) {
+		/* check only valid if no modeset happens in between, that
+		 * increments by (1 << 23) on each step. */
+		if (es->current_seq - (es->last_seq + es->seq_step) > 1UL << 23) {
+			fprintf(stderr, "unexpected %s seq %u, should be >= %u\n",
+				es->name, es->current_seq, es->last_seq + es->seq_step);
+			exit(10);
+		}
+	}
+
+	if ((o->flags & TEST_CHECK_TS) && (!analog_tv_connector(o))) {
+		timersub(&es->current_ts, &es->last_ts, &diff);
+		usec_interflip = (double)es->seq_step * frame_time(o);
+		if (fabs((((double) diff.tv_usec) - usec_interflip) /
+		    usec_interflip) > 0.005) {
+			fprintf(stderr, "inter-%s ts jitter: %is, %ius\n",
+				es->name,
+				(int) diff.tv_sec, (int) diff.tv_usec);
+			exit(9);
+		}
+
+		if (es->current_seq != es->last_seq + es->seq_step) {
+			fprintf(stderr, "unexpected %s seq %u, expected %u\n",
+					es->name, es->current_seq,
+					es->last_seq + es->seq_step);
+			exit(9);
+		}
+	}
+}
+
+static void check_state_correlation(struct test_output *o,
+				    struct event_state *es1,
+				    struct event_state *es2)
+{
+	struct timeval tv_diff;
+	double ftime;
+	double usec_diff;
+	int seq_diff;
+
+	if (es1->count == 0 || es2->count == 0)
+		return;
+
+	timersub(&es2->current_ts, &es1->current_ts, &tv_diff);
+	usec_diff = tv_diff.tv_sec * 1000 * 1000 + tv_diff.tv_usec;
+
+	seq_diff = es2->current_seq - es1->current_seq;
+	ftime = frame_time(o);
+	usec_diff -= seq_diff * ftime;
+
+	if (fabs(usec_diff) / ftime > 0.005) {
+		fprintf(stderr,
+			"timestamp mismatch between %s and %s (diff %.4f sec)\n",
+			es1->name, es2->name, usec_diff / 1000 / 1000);
+		exit(14);
+	}
+}
+
+static void check_all_state(struct test_output *o,
+			    unsigned int completed_events)
+{
+	bool flip, vblank;
+
+	flip = completed_events & EVENT_FLIP;
+	vblank = completed_events & EVENT_VBLANK;
+
+	if (flip)
+		check_state(o, &o->flip_state);
+	if (vblank)
+		check_state(o, &o->vblank_state);
+
+	if (flip && vblank)
+		check_state_correlation(o, &o->flip_state, &o->vblank_state);
+}
+
+static void recreate_fb(struct test_output *o)
+{
+	drmModeFBPtr r;
+	struct kmstest_fb *fb_info = &o->fb_info[o->current_fb_id];
+	uint32_t new_fb_id;
+
+	/* Call rmfb/getfb/addfb to ensure those don't introduce stalls */
+	r = drmModeGetFB(drm_fd, fb_info->fb_id);
+	assert(r);
+
+	do_or_die(drmModeAddFB(drm_fd, o->fb_width, o->fb_height, o->depth,
+			       o->bpp, fb_info->stride,
+			       r->handle, &new_fb_id));
+
+	drmFree(r);
+	gem_close(drm_fd, r->handle);
+	do_or_die(drmModeRmFB(drm_fd, fb_info->fb_id));
+
+	o->fb_ids[o->current_fb_id] = new_fb_id;
+	o->fb_info[o->current_fb_id].fb_id = new_fb_id;
+}
+
+/* Return mask of completed events. */
+static unsigned int run_test_step(struct test_output *o)
+{
+	unsigned int new_fb_id;
+	/* for funny reasons page_flip returns -EBUSY on disabled crtcs ... */
+	int expected_einval = o->flags & TEST_MODESET ? -EBUSY : -EINVAL;
+	unsigned int completed_events = 0;
+	bool do_flip;
+	bool do_vblank;
+	struct vblank_reply vbl_reply;
+	unsigned int target_seq;
+
+	target_seq = o->vblank_state.seq_step;
+	if (o->flags & TEST_VBLANK_ABSOLUTE)
+		target_seq += o->vblank_state.last_seq;
+
+	/*
+	 * It's possible that we don't have a pending flip here, in case both
+	 * wf-vblank and flip were scheduled and the wf-vblank event was
+	 * delivered earlier. The same applies to vblank events w.r.t flip.
+	 */
+	do_flip = (o->flags & TEST_FLIP) && !(o->pending_events & EVENT_FLIP);
+	do_vblank = (o->flags & TEST_VBLANK) &&
+		    !(o->pending_events & EVENT_VBLANK);
+
+	if (o->flags & TEST_WITH_DUMMY_LOAD)
+		emit_dummy_load(o);
+
+
+	o->current_fb_id = !o->current_fb_id;
+	if (o->flags & TEST_FB_RECREATE)
+		recreate_fb(o);
+	new_fb_id = o->fb_ids[o->current_fb_id];
+
+	if ((o->flags & TEST_VBLANK_EXPIRED_SEQ) &&
+	    !(o->pending_events & EVENT_VBLANK) && o->flip_state.count > 0) {
+		struct vblank_reply reply;
+		unsigned int exp_seq;
+		unsigned long start;
+
+		exp_seq = o->flip_state.current_seq;
+		start = gettime_us();
+		do_or_die(__wait_for_vblank(TEST_VBLANK_ABSOLUTE |
+					    TEST_VBLANK_BLOCK, o->pipe, exp_seq,
+					    0, &reply));
+		assert(gettime_us() - start < 500);
+		assert(reply.sequence == exp_seq);
+		assert(timercmp(&reply.ts, &o->flip_state.last_ts, ==));
+	}
+
+	if (do_flip && (o->flags & TEST_EINVAL) && o->flip_state.count > 0)
+		assert(do_page_flip(o, new_fb_id) == expected_einval);
+
+	if (do_vblank && (o->flags & TEST_EINVAL) && o->vblank_state.count > 0)
+		assert(do_wait_for_vblank(o, o->pipe, target_seq, &vbl_reply)
+		       == -EINVAL);
+
+	if (o->flags & TEST_MODESET) {
+		if (drmModeSetCrtc(drm_fd, o->crtc,
+				   o->fb_ids[o->current_fb_id],
+				   0, 0,
+				   &o->id, 1, &o->mode)) {
+			fprintf(stderr, "failed to restore output mode: %s\n",
+				strerror(errno));
+			exit(7);
+		}
+	}
+
+	if (o->flags & TEST_DPMS)
+		do_or_die(set_dpms(o, DRM_MODE_DPMS_ON));
+
+	printf("."); fflush(stdout);
+
+	if (do_flip)
+		do_or_die(do_page_flip(o, new_fb_id));
+
+	if (do_vblank) {
+		do_or_die(do_wait_for_vblank(o, o->pipe, target_seq,
+					     &vbl_reply));
+		if (o->flags & TEST_VBLANK_BLOCK) {
+			event_handler(&o->vblank_state, vbl_reply.sequence,
+				      vbl_reply.ts.tv_sec,
+				      vbl_reply.ts.tv_usec);
+			completed_events = EVENT_VBLANK;
+		}
+	}
+
+	if (do_flip && (o->flags & TEST_EBUSY))
+		assert(do_page_flip(o, new_fb_id) == -EBUSY);
+
+	if (do_flip && (o->flags & TEST_RMFB))
+		recreate_fb(o);
+
+	/* pan before the flip completes */
+	if (o->flags & TEST_PAN) {
+		int count = do_flip ?
+			o->flip_state.count : o->vblank_state.count;
+		int x_ofs = count * 10 > o->mode.hdisplay ?
+			    o->mode.hdisplay : count * 10;
+
+		if (drmModeSetCrtc(drm_fd, o->crtc, o->fb_ids[o->current_fb_id],
+				   x_ofs, 0, &o->id, 1, &o->mode)) {
+			fprintf(stderr, "failed to pan (%dx%d@%dHz): %s\n",
+				o->fb_width, o->fb_height,
+				o->mode.vrefresh, strerror(errno));
+			exit(7);
+		}
+	}
+
+	if (o->flags & TEST_DPMS)
+		do_or_die(set_dpms(o, DRM_MODE_DPMS_OFF));
+
+	if (o->flags & TEST_MODESET && !(o->flags & TEST_RMFB)) {
+		if (drmModeSetCrtc(drm_fd, o->crtc,
+				   0, /* no fb */
+				   0, 0,
+				   NULL, 0, NULL)) {
+			fprintf(stderr, "failed to disable output: %s\n",
+				strerror(errno));
+			exit(7);
+		}
+	}
+
+	if (do_vblank && (o->flags & TEST_EINVAL) && o->vblank_state.count > 0)
+		assert(do_wait_for_vblank(o, o->pipe, target_seq, &vbl_reply)
+		       == -EINVAL);
+
+	if (do_flip && (o->flags & TEST_EINVAL))
+		assert(do_page_flip(o, new_fb_id) == expected_einval);
+
+	return completed_events;
+}
+
+static void update_state(struct event_state *es)
+{
+	es->last_received_ts = es->current_received_ts;
+	es->last_ts = es->current_ts;
+	es->last_seq = es->current_seq;
+	es->count++;
+}
+
+static void update_all_state(struct test_output *o,
+			     unsigned int completed_events)
+{
+	if (completed_events & EVENT_FLIP)
+		update_state(&o->flip_state);
+
+	if (completed_events & EVENT_VBLANK)
+		update_state(&o->vblank_state);
+}
+
+static void connector_find_preferred_mode(struct test_output *o, int crtc_id)
+{
+	drmModeConnector *connector;
+	drmModeEncoder *encoder = NULL;
+	int i, j;
+
+	/* First, find the connector & mode */
+	o->mode_valid = 0;
+	o->crtc = 0;
+	connector = drmModeGetConnector(drm_fd, o->id);
+	assert(connector);
+
+	if (connector->connection != DRM_MODE_CONNECTED) {
+		drmModeFreeConnector(connector);
+		return;
+	}
+
+	if (!connector->count_modes) {
+		fprintf(stderr, "connector %d has no modes\n", o->id);
+		drmModeFreeConnector(connector);
+		return;
+	}
+
+	if (connector->connector_id != o->id) {
+		fprintf(stderr, "connector id doesn't match (%d != %d)\n",
+			connector->connector_id, o->id);
+		drmModeFreeConnector(connector);
+		return;
+	}
+
+	for (j = 0; j < connector->count_modes; j++) {
+		o->mode = connector->modes[j];
+		if (o->mode.type & DRM_MODE_TYPE_PREFERRED) {
+			o->mode_valid = 1;
+			break;
+		}
+	}
+
+	if (!o->mode_valid) {
+		if (connector->count_modes > 0) {
+			/* use the first mode as test mode */
+			o->mode = connector->modes[0];
+			o->mode_valid = 1;
+		}
+		else {
+			fprintf(stderr, "failed to find any modes on connector %d\n",
+				o->id);
+			return;
+		}
+	}
+
+	/* Now get the encoder */
+	for (i = 0; i < connector->count_encoders; i++) {
+		encoder = drmModeGetEncoder(drm_fd, connector->encoders[i]);
+
+		if (!encoder) {
+			fprintf(stderr, "could not get encoder %i: %s\n",
+				resources->encoders[i], strerror(errno));
+			drmModeFreeEncoder(encoder);
+			continue;
+		}
+
+		break;
+	}
+
+	o->encoder = encoder;
+
+	if (i == resources->count_encoders) {
+		fprintf(stderr, "failed to find encoder\n");
+		o->mode_valid = 0;
+		return;
+	}
+
+	/* Find first CRTC not in use */
+	for (i = 0; i < resources->count_crtcs; i++) {
+		if (resources->crtcs[i] != crtc_id)
+			continue;
+		if (resources->crtcs[i] &&
+		    (o->encoder->possible_crtcs & (1<<i))) {
+			o->crtc = resources->crtcs[i];
+			break;
+		}
+	}
+
+	if (!o->crtc) {
+		fprintf(stderr, "could not find requested crtc %d\n", crtc_id);
+		o->mode_valid = 0;
+		return;
+	}
+
+	o->connector = connector;
+}
+
+static void
+paint_flip_mode(cairo_t *cr, int width, int height, void *priv)
+{
+	bool odd_frame = (bool) priv;
+
+	if (odd_frame)
+		cairo_rectangle(cr, width/4, height/2, width/4, height/8);
+	else
+		cairo_rectangle(cr, width/2, height/2, width/4, height/8);
+
+	cairo_set_source_rgb(cr, 1, 1, 1);
+	cairo_fill(cr);
+}
+
+static int
+fb_is_bound(struct test_output *o, int fb)
+{
+	struct drm_mode_crtc mode;
+
+	mode.crtc_id = o->crtc;
+	if (drmIoctl(drm_fd, DRM_IOCTL_MODE_GETCRTC, &mode))
+		return 0;
+
+	return mode.mode_valid && mode.fb_id == fb;
+}
+
+static void check_final_state(struct test_output *o, struct event_state *es,
+			      unsigned int ellapsed)
+{
+	if (es->count == 0) {
+		fprintf(stderr, "no %s event received\n", es->name);
+		exit(12);
+	}
+
+	/* Verify we drop no frames, but only if it's not a TV encoder, since
+	 * those use some funny fake timings behind userspace's back. */
+	if (o->flags & TEST_CHECK_TS && !analog_tv_connector(o)) {
+		int expected;
+		int count = es->count;
+
+		count *= es->seq_step;
+		expected = ellapsed * o->mode.vrefresh / (1000 * 1000);
+		if (count < expected * 99/100) {
+			fprintf(stderr, "dropped frames, expected %d, counted %d, encoder type %d\n",
+				expected, count, o->encoder->encoder_type);
+			exit(3);
+		}
+	}
+}
+
+/*
+ * Wait until at least one pending event completes. Return mask of completed
+ * events.
+ */
+static unsigned int wait_for_events(struct test_output *o)
+{
+	drmEventContext evctx;
+	struct timeval timeout = { .tv_sec = 3, .tv_usec = 0 };
+	fd_set fds;
+	unsigned int event_mask;
+	int ret;
+
+	event_mask = o->pending_events;
+	assert(event_mask);
+
+	memset(&evctx, 0, sizeof evctx);
+	evctx.version = DRM_EVENT_CONTEXT_VERSION;
+	evctx.vblank_handler = vblank_handler;
+	evctx.page_flip_handler = page_flip_handler;
+
+	/* make timeout lax with the dummy load */
+	if (o->flags & TEST_WITH_DUMMY_LOAD)
+		timeout.tv_sec *= 10;
+
+	FD_ZERO(&fds);
+	FD_SET(drm_fd, &fds);
+	ret = select(drm_fd + 1, &fds, NULL, NULL, &timeout);
+
+	if (ret <= 0) {
+		fprintf(stderr, "select timed out or error (ret %d)\n",
+				ret);
+		exit(1);
+	} else if (FD_ISSET(0, &fds)) {
+		fprintf(stderr, "no fds active, breaking\n");
+		exit(2);
+	}
+
+	do_or_die(drmHandleEvent(drm_fd, &evctx));
+
+	event_mask ^= o->pending_events;
+	assert(event_mask);
+
+	return event_mask;
+}
+
+/* Returned the ellapsed time in us */
+static unsigned event_loop(struct test_output *o, unsigned duration_sec)
+{
+	unsigned long start, end;
+
+	start = gettime_us();
+
+	while (1) {
+		unsigned int completed_events;
+
+		completed_events = run_test_step(o);
+		if (o->pending_events)
+			completed_events |= wait_for_events(o);
+		check_all_state(o, completed_events);
+		update_all_state(o, completed_events);
+
+		if ((gettime_us() - start) / 1000000 >= duration_sec)
+			break;
+	}
+
+	end = gettime_us();
+
+	/* Flush any remaining events */
+	if (o->pending_events)
+		wait_for_events(o);
+
+	return end - start;
+}
+
+static void run_test_on_crtc(struct test_output *o, int crtc, int duration)
+{
+	unsigned ellapsed;
+
+	o->bpp = 32;
+	o->depth = 24;
+
+	connector_find_preferred_mode(o, crtc);
+	if (!o->mode_valid)
+		return;
+
+	fprintf(stdout, "Beginning %s on crtc %d, connector %d\n",
+		o->test_name, crtc, o->id);
+
+	o->fb_width = o->mode.hdisplay;
+	o->fb_height = o->mode.vdisplay;
+
+	if (o->flags & TEST_PAN)
+		o->fb_width *= 2;
+
+	o->fb_ids[0] = kmstest_create_fb(drm_fd, o->fb_width, o->fb_height,
+					 o->bpp, o->depth, false, &o->fb_info[0],
+					 paint_flip_mode, (void *)false);
+	o->fb_ids[1] = kmstest_create_fb(drm_fd, o->fb_width, o->fb_height,
+					 o->bpp, o->depth, false, &o->fb_info[1],
+					 paint_flip_mode, (void *)true);
+
+	if (!o->fb_ids[0] || !o->fb_ids[1]) {
+		fprintf(stderr, "failed to create fbs\n");
+		exit(3);
+	}
+
+	kmstest_dump_mode(&o->mode);
+	if (drmModeSetCrtc(drm_fd, o->crtc, o->fb_ids[0], 0, 0,
+			   &o->id, 1, &o->mode)) {
+		fprintf(stderr, "failed to set mode (%dx%d@%dHz): %s\n",
+			o->fb_width, o->fb_height, o->mode.vrefresh,
+			strerror(errno));
+		exit(3);
+	}
+	assert(fb_is_bound(o, o->fb_ids[0]));
+
+	/* quiescent the hw a bit so ensure we don't miss a single frame */
+	if (o->flags & TEST_CHECK_TS)
+		sleep(1);
+
+	if (do_page_flip(o, o->fb_ids[1])) {
+		fprintf(stderr, "failed to page flip: %s\n", strerror(errno));
+		exit(4);
+	}
+	wait_for_events(o);
+
+	o->current_fb_id = 1;
+	o->flip_state.seq_step = 1;
+	if (o->flags & TEST_VBLANK_ABSOLUTE)
+		o->vblank_state.seq_step = 5;
+	else
+		o->vblank_state.seq_step = 1;
+
+	ellapsed = event_loop(o, duration);
+
+	if (o->flags & TEST_FLIP)
+		check_final_state(o, &o->flip_state, ellapsed);
+	if (o->flags & TEST_VBLANK)
+		check_final_state(o, &o->vblank_state, ellapsed);
+
+	fprintf(stdout, "\n%s on crtc %d, connector %d: PASSED\n\n",
+		o->test_name, crtc, o->id);
+
+	kmstest_remove_fb(drm_fd, o->fb_ids[1]);
+	kmstest_remove_fb(drm_fd, o->fb_ids[0]);
+
+	drmModeFreeEncoder(o->encoder);
+	drmModeFreeConnector(o->connector);
+}
+
+static int run_test(int duration, int flags, const char *test_name)
+{
+	struct test_output o;
+	int c, i;
+
+	resources = drmModeGetResources(drm_fd);
+	if (!resources) {
+		fprintf(stderr, "drmModeGetResources failed: %s\n",
+			strerror(errno));
+		exit(5);
+	}
+
+	/* Find any connected displays */
+	for (c = 0; c < resources->count_connectors; c++) {
+		for (i = 0; i < resources->count_crtcs; i++) {
+			int crtc;
+
+			memset(&o, 0, sizeof(o));
+			o.test_name = test_name;
+			o.id = resources->connectors[c];
+			o.flags = flags;
+			o.flip_state.name = "flip";
+			o.vblank_state.name = "vblank";
+			crtc = resources->crtcs[i];
+			o.pipe = kmstest_get_pipe_from_crtc_id(drm_fd, crtc);
+
+			run_test_on_crtc(&o, crtc, duration);
+		}
+	}
+
+	drmModeFreeResources(resources);
+	return 1;
+}
+
+static void get_timestamp_format(void)
+{
+	uint64_t cap_mono;
+	int ret;
+
+	ret = drmGetCap(drm_fd, DRM_CAP_TIMESTAMP_MONOTONIC, &cap_mono);
+	assert(ret == 0 || errno == EINVAL);
+	monotonic_timestamp = ret == 0 && cap_mono == 1;
+	printf("Using %s timestamps\n",
+		monotonic_timestamp ? "monotonic" : "real");
+}
+
+int main(int argc, char **argv)
+{
+	struct {
+		int duration;
+		int flags;
+		const char *name;
+	} tests[] = {
+		{ 15, TEST_VBLANK, "wf_vblank" },
+		{ 15, TEST_VBLANK | TEST_CHECK_TS, "wf_vblank-ts-check" },
+		{ 15, TEST_VBLANK | TEST_VBLANK_BLOCK | TEST_CHECK_TS,
+					"blocking-wf_vblank" },
+		{ 5,  TEST_VBLANK | TEST_VBLANK_ABSOLUTE,
+					"absolute-wf_vblank" },
+		{ 5,  TEST_VBLANK | TEST_VBLANK_BLOCK | TEST_VBLANK_ABSOLUTE,
+					"blocking-absolute-wf_vblank" },
+		{ 30,  TEST_VBLANK | TEST_DPMS | TEST_EINVAL, "wf_vblank-vs-dpms" },
+		{ 30,  TEST_VBLANK | TEST_DPMS | TEST_WITH_DUMMY_LOAD,
+					"delayed-wf_vblank-vs-dpms" },
+		{ 30,  TEST_VBLANK | TEST_MODESET | TEST_EINVAL, "wf_vblank-vs-modeset" },
+		{ 30,  TEST_VBLANK | TEST_MODESET | TEST_WITH_DUMMY_LOAD,
+					"delayed-wf_vblank-vs-modeset" },
+
+		{ 15, TEST_FLIP | TEST_EBUSY , "plain-flip" },
+		{ 15, TEST_FLIP | TEST_CHECK_TS | TEST_EBUSY , "plain-flip-ts-check" },
+		{ 15, TEST_FLIP | TEST_CHECK_TS | TEST_EBUSY | TEST_FB_RECREATE,
+			"plain-flip-fb-recreate" },
+		{ 15, TEST_FLIP | TEST_EBUSY | TEST_RMFB | TEST_MODESET , "flip-vs-rmfb" },
+		{ 30, TEST_FLIP | TEST_DPMS | TEST_EINVAL, "flip-vs-dpms" },
+		{ 30, TEST_FLIP | TEST_DPMS | TEST_WITH_DUMMY_LOAD, "delayed-flip-vs-dpms" },
+		{ 5,  TEST_FLIP | TEST_PAN, "flip-vs-panning" },
+		{ 30, TEST_FLIP | TEST_PAN | TEST_WITH_DUMMY_LOAD, "delayed-flip-vs-panning" },
+		{ 30, TEST_FLIP | TEST_MODESET | TEST_EINVAL, "flip-vs-modeset" },
+		{ 30, TEST_FLIP | TEST_MODESET | TEST_WITH_DUMMY_LOAD, "delayed-flip-vs-modeset" },
+		{ 5,  TEST_FLIP | TEST_VBLANK_EXPIRED_SEQ,
+					"flip-vs-expired-vblank" },
+
+		{ 15, TEST_FLIP | TEST_VBLANK | TEST_VBLANK_ABSOLUTE |
+		      TEST_CHECK_TS, "flip-vs-absolute-wf_vblank" },
+		{ 15, TEST_FLIP | TEST_VBLANK | TEST_CHECK_TS,
+					"flip-vs-wf_vblank" },
+		{ 15, TEST_FLIP | TEST_VBLANK | TEST_VBLANK_BLOCK |
+			TEST_CHECK_TS, "flip-vs-blocking-wf-vblank" },
+	};
+	int i;
+
+	drmtest_subtest_init(argc, argv);
+
+	drm_fd = drm_open_any();
+
+	if (!drmtest_only_list_subtests())
+		get_timestamp_format();
+
+	bufmgr = drm_intel_bufmgr_gem_init(drm_fd, 4096);
+	devid = intel_get_drm_devid(drm_fd);
+	batch = intel_batchbuffer_alloc(bufmgr, devid);
+
+	for (i = 0; i < sizeof(tests) / sizeof (tests[0]); i++) {
+		if (drmtest_run_subtest(tests[i].name)) {
+			printf("running testcase: %s\n", tests[i].name);
+			run_test(tests[i].duration, tests[i].flags, tests[i].name);
+		}
+	}
+
+	close(drm_fd);
+
+	return 0;
+}
diff -rupN dump_1/tests/Makefile.am dump/tests/Makefile.am
--- dump_1/tests/Makefile.am	2001-01-14 08:11:40.069619273 +0800
+++ dump/tests/Makefile.am	2001-01-14 08:10:57.875621814 +0800
@@ -1,46 +1,60 @@
 noinst_PROGRAMS = \
 	gem_stress \
 	$(TESTS_progs) \
+	$(TESTS_progs_M) \
 	$(HANG) \
 	$(NULL)
 
 if HAVE_NOUVEAU
+NOUVEAU_TESTS_M = \
+	prime_nv_test \
+	prime_nv_pcopy
 NOUVEAU_TESTS = \
-	prime_nv_api  \
-	prime_nv_pcopy \
-	prime_nv_test
+	prime_nv_api
 endif
 
-TESTS_progs = \
-	getversion \
-	getclient \
-	getstats \
+TESTS_progs_M = \
 	gem_basic \
 	gem_cacheing \
 	gem_cpu_concurrent_blit \
-	gem_gtt_concurrent_blit \
+	gem_cs_tlb \
+	gem_dummy_reloc_loop \
+	gem_exec_bad_domains \
 	gem_exec_nop \
+	gem_flink \
+	gem_gtt_concurrent_blit \
+	gem_mmap_gtt \
+	gem_partial_pwrite_pread \
+	gem_ringfill \
+	gem_set_tiling_vs_blt \
+	gem_tiled_partial_pwrite_pread \
+	$(NOUVEAU_TESTS_M) \
+	kms_flip \
+	$(NULL)
+
+TESTS_progs = \
+	getversion \
+	getclient \
+	getstats \
+	gem_exec_big \
 	gem_exec_blt \
-	gem_exec_bad_domains \
 	gem_exec_faulting_reloc \
-	gem_flink \
 	gem_readwrite \
-	gem_ringfill \
 	gem_mmap \
-	gem_mmap_gtt \
 	gem_mmap_offset_exhaustion \
+	gem_hangcheck_forcewake \
+	gem_pin \
 	gem_pwrite \
 	gem_pread_after_blit \
-	gem_set_tiling_vs_blt \
 	gem_set_tiling_vs_gtt \
 	gem_set_tiling_vs_pwrite \
 	gem_tiled_pread \
 	gem_tiled_pread_pwrite \
-	gem_tiled_partial_pwrite_pread \
 	gem_tiled_swapping \
-	gem_partial_pwrite_pread \
 	gem_linear_blits \
 	gem_vmap_blits \
+	gem_threaded_access_tiled \
+	gem_seqno_wrap \
 	gem_tiled_blits \
 	gem_tiled_fence_blits \
 	gem_largeobject \
@@ -50,16 +64,18 @@ TESTS_progs = \
 	gem_gtt_speed \
 	gem_gtt_cpu_tlb \
 	gem_cs_prefetch \
+	gem_cpu_reloc \
 	gen3_render_linear_blits \
 	gen3_render_tiledx_blits \
 	gen3_render_tiledy_blits \
 	gen3_render_mixed_blits \
 	gen3_mixed_blits \
+	gem_render_linear_blits \
+	gem_render_tiled_blits \
 	gem_storedw_loop_render \
 	gem_storedw_loop_blt \
 	gem_storedw_loop_bsd \
 	gem_storedw_batches_loop \
-	gem_dummy_reloc_loop \
 	gem_double_irq_loop \
 	gem_ring_sync_loop \
 	gem_pipe_control_store_loop \
@@ -72,7 +88,6 @@ TESTS_progs = \
 	drm_vma_limiter_cached \
 	sysfs_rc6_residency \
 	sysfs_rps \
-	flip_test \
 	gem_wait_render_timeout \
 	gem_ctx_create \
 	gem_ctx_bad_destroy \
@@ -82,42 +97,70 @@ TESTS_progs = \
 	gem_reg_read \
 	$(NOUVEAU_TESTS) \
 	prime_self_import \
+	prime_udl \
 	$(NULL)
 
 # IMPORTANT: The ZZ_ tests need to be run last!
 # ... and make can't deal with inlined comments ...
+TESTS_scripts_M = \
+	$(NULL)
+
 TESTS_scripts = \
+	test_rte_check
 	debugfs_reader \
 	debugfs_emon_crash \
 	sysfs_l3_parity \
 	sysfs_edid_timing \
 	module_reload \
-	ZZ_check_dmesg \
 	ZZ_hangman \
 	$(NULL)
 
-kernel_tests = \
+# This target contains testcases which support automagic subtest enumeration
+# from the piglit testrunner with --list-subtests and running individual
+# subtests with --run-subtest <testname>
+multi_kernel_tests = \
+	$(TESTS_progs_M) \
+	$(TESTS_scripts_M) \
+	$(NULL)
+
+single_kernel_tests = \
 	$(TESTS_progs) \
 	$(TESTS_scripts) \
 	$(NULL)
 
+kernel_tests = \
+	$(single_kernel_tests) \
+	$(multi_kernel_tests) \
+	$(NULL)
+
 TESTS = \
 	$(NULL)
 
 test:
-	whoami | grep root || ( echo ERROR: not running as root; exit 1 )
-	./check_drm_clients
-	make TESTS="${kernel_tests}" check
+	@whoami | grep root || ( echo ERROR: not running as root; exit 1 )
+	@./check_drm_clients
+	@make TESTS="${kernel_tests}" check
+
+list-single-tests:
+	@echo TESTLIST
+	@echo ${single_kernel_tests}
+	@echo END TESTLIST
+
+list-multi-tests:
+	@echo TESTLIST
+	@echo ${multi_kernel_tests}
+	@echo END TESTLIST
 
 HANG = \
 	gem_bad_batch \
 	gem_hang \
 	gem_bad_blit \
 	gem_bad_address \
+	gem_non_secure_batch \
 	$(NULL)
 
-EXTRA_PROGRAMS = $(TESTS_progs) $(HANG)
-EXTRA_DIST = $(TESTS_scripts) drm_lib.sh check_drm_clients debugfs_wedged
+EXTRA_PROGRAMS = $(TESTS_progs) $(TESTS_progs_M) $(HANG)
+EXTRA_DIST = $(TESTS_scripts) $(TESTS_scripts_M) drm_lib.sh check_drm_clients debugfs_wedged
 CLEANFILES = $(EXTRA_PROGRAMS)
 
 AM_CFLAGS = $(DRM_CFLAGS) $(CWARNFLAGS) \
@@ -137,8 +180,10 @@ AM_CFLAGS += $(CAIRO_CFLAGS) $(LIBUDEV_C
 
 gem_fence_thrash_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
 gem_fence_thrash_LDADD = $(LDADD) -lpthread
+gem_threaded_access_tiled_LDADD = $(LDADD) -lpthread
 
 gem_wait_render_timeout_LDADD = $(LDADD) -lrt
+kms_flip_LDADD = $(LDADD) -lrt
 
 gem_ctx_basic_LDADD = $(LDADD) -lpthread
 
diff -rupN dump_1/tests/prime_nv_api.c dump/tests/prime_nv_api.c
--- dump_1/tests/prime_nv_api.c	2001-01-14 08:11:40.086619273 +0800
+++ dump/tests/prime_nv_api.c	2001-01-14 08:10:57.875621814 +0800
@@ -38,6 +38,8 @@ static int find_and_open_devices(void)
 	char vendor_id[8];
 	int venid;
 	for (i = 0; i < 9; i++) {
+		char *ret;
+
 		sprintf(path, "/sys/class/drm/card%d/device/vendor", i);
 		if (stat(path, &buf))
 			break;
@@ -46,7 +48,8 @@ static int find_and_open_devices(void)
 		if (!fl)
 			break;
 
-		fgets(vendor_id, 8, fl);
+		ret = fgets(vendor_id, 8, fl);
+		assert(ret);
 		fclose(fl);
 
 		venid = strtoul(vendor_id, NULL, 16);
diff -rupN dump_1/tests/prime_nv_pcopy.c dump/tests/prime_nv_pcopy.c
--- dump_1/tests/prime_nv_pcopy.c	2001-01-14 08:11:40.087619273 +0800
+++ dump/tests/prime_nv_pcopy.c	2001-01-14 08:10:57.877621571 +0800
@@ -28,6 +28,7 @@
 #include "nouveau.h"
 #include "intel_gpu_tools.h"
 #include "intel_batchbuffer.h"
+#include "drmtest.h"
 
 static int intel_fd = -1, nouveau_fd = -1;
 static drm_intel_bufmgr *bufmgr;
@@ -534,6 +535,7 @@ static int perform_copy(struct nouveau_b
 	uint32_t cpp = 1, exec = 0x00003000; /* QUERY|QUERY_SHORT|FORMAT */
 	uint32_t src_off = 0, dst_off = 0;
 	struct nouveau_pushbuf *push = npush;
+	int ret;
 
 	if (nvbi->config.nv50.tile_mode == tile_intel_y)
 		dbg("src is y-tiled\n");
@@ -591,9 +593,9 @@ static int perform_copy(struct nouveau_b
 	BEGIN_NVXX(push, SUBC_COPY(0x0300), 1);
 	PUSH_DATA (push, exec);
 
-	nouveau_pushbuf_kick(push, push->channel);
-	while (*query < query_counter) { usleep(1000); }
-	return 0;
+	ret = nouveau_pushbuf_kick(push, push->channel);
+	while (!ret && *query < query_counter) { usleep(1000); }
+	return ret;
 }
 
 static int check1_macro(uint32_t *p, uint32_t w, uint32_t h)
@@ -1265,13 +1267,16 @@ int main(int argc, char **argv)
 {
 	int ret, failed = 0, run = 0;
 
+	drmtest_subtest_init(argc, argv);
+
 	ret = find_and_open_devices();
 	if (ret < 0)
 		return ret;
 
 	if (nouveau_fd == -1 || intel_fd == -1) {
 		fprintf(stderr,"failed to find intel and nouveau GPU\n");
-		return 77;
+		if (!drmtest_only_list_subtests())
+			return 77;
 	}
 
 	/* set up intel bufmgr */
@@ -1291,6 +1296,7 @@ int main(int argc, char **argv)
 	batch = intel_batchbuffer_alloc(bufmgr, devid);
 
 #define xtest(x, args...) do { \
+	if (!drmtest_run_subtest( #x )) break; \
 	ret = ((x)(args)); \
 	++run; \
 	if (ret) { \
@@ -1324,6 +1330,8 @@ int main(int argc, char **argv)
 	close(intel_fd);
 	close(nouveau_fd);
 
-	printf("Tests: %u run, %u failed\n", run, failed);
+	if (!drmtest_only_list_subtests())
+		printf("Tests: %u run, %u failed\n", run, failed);
+
 	return failed;
 }
diff -rupN dump_1/tests/prime_nv_test.c dump/tests/prime_nv_test.c
--- dump_1/tests/prime_nv_test.c	2001-01-14 08:11:40.087619273 +0800
+++ dump/tests/prime_nv_test.c	2001-01-14 08:10:57.878621453 +0800
@@ -27,6 +27,7 @@
 #include "nouveau.h"
 #include "intel_gpu_tools.h"
 #include "intel_batchbuffer.h"
+#include "drmtest.h"
 
 int intel_fd = -1, nouveau_fd = -1;
 drm_intel_bufmgr *bufmgr;
@@ -46,6 +47,8 @@ static int find_and_open_devices(void)
 	char vendor_id[8];
 	int venid;
 	for (i = 0; i < 9; i++) {
+		char *ret;
+
 		sprintf(path, "/sys/class/drm/card%d/device/vendor", i);
 		if (stat(path, &buf))
 			break;
@@ -54,7 +57,8 @@ static int find_and_open_devices(void)
 		if (!fl)
 			break;
 
-		fgets(vendor_id, 8, fl);
+		ret = fgets(vendor_id, 8, fl);
+		assert(ret);
 		fclose(fl);
 
 		venid = strtoul(vendor_id, NULL, 16);
@@ -264,8 +268,7 @@ static int test5(void)
 
 	ret = drm_intel_bo_map(test_intel_bo, 0);
 	if (ret != 0) {
-		/* failed to map the bo is expected */
-		ret = 0;
+		fprintf(stderr,"failed to map imported bo on intel side\n");
 		goto out;
 	}
 	if (!test_intel_bo->virtual) {
@@ -404,18 +407,17 @@ static int test7(void)
 	*ptr = 0xdeadbeef;
 
 	ret = do_read(intel_fd, test_intel_bo->handle, buf, 0, 256);
-	if (ret != -1) {
-		fprintf(stderr,"pread succeedded %d\n", ret);
+	if (ret) {
+		fprintf(stderr,"pread failed %d\n", errno);
 		goto out;
 	}
 	buf[0] = 0xabcdef55;
 
 	ret = do_write(intel_fd, test_intel_bo->handle, buf, 0, 4);
-	if (ret != -1) {
-		fprintf(stderr,"pwrite succeedded\n");
+	if (ret) {
+		fprintf(stderr,"pwrite failed %d\n", errno);
 		goto out;
 	}
-	ret = 0;
  out:
 	nouveau_bo_ref(NULL, &nvbo);
 	drm_intel_bo_unreference(test_intel_bo);
@@ -502,15 +504,18 @@ out:
 
 int main(int argc, char **argv)
 {
-	int ret;
+	int ret = 0;
 
 	ret = find_and_open_devices();
 	if (ret < 0)
 		return ret;
 
+	drmtest_subtest_init(argc, argv);
+
 	if (nouveau_fd == -1 || intel_fd == -1) {
 		fprintf(stderr,"failed to find intel and nouveau GPU\n");
-		return 77;
+		if (!drmtest_only_list_subtests())
+			return 77;
 	}
 
 	/* set up intel bufmgr */
@@ -538,37 +543,37 @@ int main(int argc, char **argv)
 	intel_batch = intel_batchbuffer_alloc(bufmgr, devid);
 
 	/* create an object on the i915 */
-	ret = test1();
-	if (ret)
-		fprintf(stderr,"prime_test: failed test 1\n");
-
-	ret = test2();
-	if (ret)
-		fprintf(stderr,"prime_test: failed test 2\n");
-
-	ret = test3();
-	if (ret)
-		fprintf(stderr,"prime_test: failed test 3\n");
-
-	ret = test4();
-	if (ret)
-		fprintf(stderr,"prime_test: failed test 4\n");
-
-	ret = test5();
-	if (ret)
-		fprintf(stderr,"prime_test: failed test 5\n");
-
-	ret = test6();
-	if (ret)
-		fprintf(stderr,"prime_test: failed test 6\n");
-
-	ret = test7();
-	if (ret)
-		fprintf(stderr,"prime_test: failed test 7\n");
-
-	ret = test8();
-	if (ret)
-		fprintf(stderr,"prime_test: failed test 8\n");
+	if (drmtest_run_subtest("i915-nouveau-sharing"))
+		if (test1())
+			exit(1);
+
+	if (drmtest_run_subtest("nouveau-i915-sharing"))
+		if (test2())
+			exit(1);
+
+	if (drmtest_run_subtest("nouveau-write-i915-shmem-read"))
+		if (test3())
+			exit(1);
+
+	if (drmtest_run_subtest("nouveau-write-i915-gtt-read"))
+		if (test4())
+			exit(1);
+
+	if (drmtest_run_subtest("i915-import-shmem-mmap"))
+		if (test5())
+			exit(1);
+
+	if (drmtest_run_subtest("i915-import-gtt-mmap"))
+		if (test6())
+			exit(1);
+
+	if (drmtest_run_subtest("i915-import-pread-pwrite"))
+		if (test7())
+			exit(1);
+
+	if (drmtest_run_subtest("i915-blt-fill-nouveau-read"))
+		if (test8())
+			exit(1);
 
 	intel_batchbuffer_free(intel_batch);
 
diff -rupN dump_1/tests/prime_udl.c dump/tests/prime_udl.c
--- dump_1/tests/prime_udl.c	1970-01-01 07:30:00.000000000 +0730
+++ dump/tests/prime_udl.c	2001-01-14 08:10:57.879621337 +0800
@@ -0,0 +1,187 @@
+/* basic set of prime tests between intel and nouveau */
+
+/* test list - 
+   1. share buffer from intel -> nouveau.
+   2. share buffer from nouveau -> intel
+   3. share intel->nouveau, map on both, write intel, read nouveau
+   4. share intel->nouveau, blit intel fill, readback on nouveau
+   test 1 + map buffer, read/write, map other size.
+   do some hw actions on the buffer
+   some illegal operations -
+       close prime fd try and map
+
+   TODO add some nouveau rendering tests
+*/
+
+   
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <errno.h>
+
+#include "xf86drm.h"
+#include "xf86drmMode.h"
+#include "i915_drm.h"
+#include "intel_bufmgr.h"
+#include "intel_gpu_tools.h"
+#include "intel_batchbuffer.h"
+
+int intel_fd = -1, udl_fd = -1;
+drm_intel_bufmgr *bufmgr;
+uint32_t devid;
+struct intel_batchbuffer *intel_batch;
+
+#define BO_SIZE (640*480*2)
+
+static int find_and_open_devices(void)
+{
+	int i;
+	char path[80];
+	struct stat buf;
+	FILE *fl;
+	char vendor_id[8];
+	int venid;
+	for (i = 0; i < 9; i++) {
+		sprintf(path, "/sys/class/drm/card%d/device/vendor", i);
+		if (stat(path, &buf)) {
+			/* look for usb dev */
+			sprintf(path, "/sys/class/drm/card%d/device/idVendor", i);
+			if (stat(path, &buf))
+				break;
+		}
+
+		fl = fopen(path, "r");
+		if (!fl)
+			break;
+
+		fgets(vendor_id, 8, fl);
+		fclose(fl);
+
+		venid = strtoul(vendor_id, NULL, 16);
+		sprintf(path, "/dev/dri/card%d", i);
+		if (venid == 0x8086) {
+			intel_fd = open(path, O_RDWR);
+			if (!intel_fd)
+				return -1;
+		} else if (venid == 0x17e9) {
+			udl_fd = open(path, O_RDWR);
+			if (!udl_fd)
+				return -1;
+		}
+	}
+	return 0;
+}
+
+static int dumb_bo_destroy(int fd, uint32_t handle)
+{
+
+	struct drm_mode_destroy_dumb arg;
+	int ret;
+	memset(&arg, 0, sizeof(arg));
+	arg.handle = handle;
+	ret = drmIoctl(fd, DRM_IOCTL_MODE_DESTROY_DUMB, &arg);
+	if (ret)
+		return -errno;
+	return 0;
+
+}
+
+/*
+ * simple share and import
+ */
+static int test1(void)
+{
+	drm_intel_bo *test_intel_bo;
+	int prime_fd;
+	int ret;
+	uint32_t udl_handle;
+
+	test_intel_bo = drm_intel_bo_alloc(bufmgr, "test bo", BO_SIZE, 4096);
+
+	drm_intel_bo_gem_export_to_prime(test_intel_bo, &prime_fd);
+
+	ret = drmPrimeFDToHandle(udl_fd, prime_fd, &udl_handle);
+
+	dumb_bo_destroy(udl_fd, udl_handle);
+	drm_intel_bo_unreference(test_intel_bo);
+	return ret;
+}
+
+static int test2(void)
+{
+	drm_intel_bo *test_intel_bo;
+	uint32_t fb_id;
+	drmModeClip clip;
+	int prime_fd;
+	uint32_t udl_handle;
+	int ret;
+
+	test_intel_bo = drm_intel_bo_alloc(bufmgr, "test bo", BO_SIZE, 4096);
+
+	drm_intel_bo_gem_export_to_prime(test_intel_bo, &prime_fd);
+
+	ret = drmPrimeFDToHandle(udl_fd, prime_fd, &udl_handle);
+	if (ret)
+		goto out;
+
+	ret = drmModeAddFB(udl_fd, 640, 480, 16, 16, 640, udl_handle, &fb_id);
+	if (ret)
+		goto out;
+
+	clip.x1 = 0;
+	clip.y1 = 0;
+	clip.x2 = 10;
+	clip.y2 = 10;
+	ret = drmModeDirtyFB(udl_fd, fb_id, &clip, 1);
+	if (ret) {
+		return ret;
+	}
+out:
+	dumb_bo_destroy(udl_fd, udl_handle);
+	drm_intel_bo_unreference(test_intel_bo);
+	return ret;
+}
+
+int main(int argc, char **argv)
+{
+	int ret;
+
+	ret = find_and_open_devices();
+	if (ret < 0)
+		return ret;
+
+	if (udl_fd == -1 && intel_fd == -1) {
+		fprintf(stderr,"failed to find intel and udl GPU\n");
+		return -1;
+	}
+
+	/* set up intel bufmgr */
+	bufmgr = drm_intel_bufmgr_gem_init(intel_fd, 4096);
+	drm_intel_bufmgr_gem_enable_reuse(bufmgr);
+
+	/* set up an intel batch buffer */
+	devid = intel_get_drm_devid(intel_fd);
+	intel_batch = intel_batchbuffer_alloc(bufmgr, devid);
+
+	/* create an object on the i915 */
+	ret = test1();
+	if (ret)
+		fprintf(stderr,"prime_test: failed test 1\n");
+
+	ret = test2();
+	if (ret)
+		fprintf(stderr,"prime_test: failed test 2 %d\n", ret);
+
+	intel_batchbuffer_free(intel_batch);
+
+	drm_intel_bufmgr_destroy(bufmgr);
+
+	close(intel_fd);
+	close(udl_fd);
+
+	return ret;
+}
diff -rupN dump_1/tests/sysfs_rc6_residency.c dump/tests/sysfs_rc6_residency.c
--- dump_1/tests/sysfs_rc6_residency.c	2001-01-14 08:11:40.088619273 +0800
+++ dump/tests/sysfs_rc6_residency.c	2001-01-14 08:10:57.879621337 +0800
@@ -38,6 +38,7 @@
 static unsigned int readit(const char *path)
 {
 	unsigned int ret;
+	int scanned;
 
 	FILE *file;
 	file = fopen(path, "r");
@@ -45,7 +46,9 @@ static unsigned int readit(const char *p
 		fprintf(stderr, "Couldn't open %s (%d)\n", path, errno);
 		abort();
 	}
-	fscanf(file, "%u", &ret);
+	scanned = fscanf(file, "%u", &ret);
+	assert(scanned == 1);
+
 	fclose(file);
 
 	return ret;
diff -rupN dump_1/tests/sysfs_rps.c dump/tests/sysfs_rps.c
--- dump_1/tests/sysfs_rps.c	2001-01-14 08:11:40.088619273 +0800
+++ dump/tests/sysfs_rps.c	2001-01-14 08:10:57.880621228 +0800
@@ -64,18 +64,30 @@ struct junk {
 static int readval(FILE *filp)
 {
 	int val;
+	int scanned;
+
 	fflush(filp);
 	rewind(filp);
-	fscanf(filp, "%d", &val);
+	scanned = fscanf(filp, "%d", &val);
+	assert(scanned == 1);
+
 	return val;
 }
 
-static void writeval(FILE *filp, int val)
+static int do_writeval(FILE *filp, int val, int lerrno)
 {
+	/* Must write twice to sysfs since the first one simply calculates the size and won't return the error */
+	int ret;
 	rewind(filp);
-	fprintf(filp, "%d", val);
+	ret = fprintf(filp, "%d", val);
+	rewind(filp);
+	ret = fprintf(filp, "%d", val);
+	if (ret && lerrno)
+		assert(errno = lerrno);
 	fflush(filp);
+	return ret;
 }
+#define writeval(filp, val) do_writeval(filp, val, 0)
 
 #define fcur (readval(stuff[CUR].filp))
 #define fmin (readval(stuff[MIN].filp))
@@ -135,7 +147,7 @@ int main(int argc, char *argv[])
 		assert(ret != -1);
 		junk->filp = fopen(path, junk->mode);
 		if (junk->filp == NULL) {
-			fprintf(stderr, "Kernel is too old. GTFO\n");
+			printf("Kernel is too old. GTFO\n");
 			exit(77);
 		}
 		val = readval(junk->filp);
@@ -172,6 +184,9 @@ int main(int argc, char *argv[])
 	writeval(stuff[MAX].filp, fmin - 1);
 	checkit();
 
+	do_writeval(stuff[MIN].filp, 0x11111110, EINVAL);
+	do_writeval(stuff[MAX].filp, 0, EINVAL);
+
 	writeval(stuff[MIN].filp, origmin);
 	writeval(stuff[MAX].filp, origmax);
 
diff -rupN dump_1/tests/testdisplay.c dump/tests/testdisplay.c
--- dump_1/tests/testdisplay.c	2001-01-14 08:11:40.089619273 +0800
+++ dump/tests/testdisplay.c	2001-01-14 08:10:57.881621122 +0800
@@ -72,6 +72,7 @@ int dump_info = 0, test_all_modes =0, te
 int sleep_between_modes = 5;
 uint32_t depth = 24, stride, bpp;
 int qr_code = 0;
+int only_one_mode = 0, specified_mode_num = 0, specified_disp_id = 0;
 
 drmModeModeInfo force_timing;
 
@@ -193,8 +194,10 @@ static void dump_connectors_fd(int drmfd
 		printf("  modes:\n");
 		printf("  name refresh (Hz) hdisp hss hse htot vdisp "
 		       "vss vse vtot flags type clock\n");
-		for (j = 0; j < connector->count_modes; j++)
+		for (j = 0; j < connector->count_modes; j++){
+			fprintf(stdout, "[%d]", j );
 			kmstest_dump_mode(&connector->modes[j]);
+		}
 
 		drmModeFreeConnector(connector);
 	}
@@ -275,6 +278,12 @@ static void connector_find_preferred_mod
 		}
 	}
 
+	if ( only_one_mode ){
+		c->mode = connector->modes[specified_mode_num];
+		if (c->mode.type & DRM_MODE_TYPE_PREFERRED)
+			c->mode_valid = 1;
+	}
+
 	if (!c->mode_valid) {
 		if (connector->count_modes > 0) {
 			/* use the first mode as test mode */
@@ -318,14 +327,14 @@ static void connector_find_preferred_mod
 	c->crtc = resources->crtcs[i];
 	c->pipe = i;
 
-	if(test_preferred_mode || force_mode)
+	if(test_preferred_mode || force_mode || only_one_mode)
 		resources->crtcs[i] = 0;
 
 	c->connector = connector;
 }
 
 static void
-paint_color_key(void)
+paint_color_key(struct kmstest_fb *fb_info)
 {
 	int i, j;
 
@@ -333,7 +342,7 @@ paint_color_key(void)
 		for (j = crtc_x; j < crtc_x + crtc_w; j++) {
 			uint32_t offset;
 
-			offset = (i * width) + j;
+			offset = (i * fb_info->stride / 4) + j;
 			fb_ptr[offset] = SPRITE_COLOR_KEY;
 		}
 }
@@ -521,11 +530,11 @@ set_mode(struct connector *c)
 		fb_ptr = gem_mmap(drm_fd, fb_info.gem_handle,
 				  fb_info.size, PROT_READ | PROT_WRITE);
 		assert(fb_ptr);
-		paint_color_key();
+		paint_color_key(&fb_info);
 
 		gem_close(drm_fd, fb_info.gem_handle);
 
-		fprintf(stdout, "CRTS(%u):",c->crtc);
+		fprintf(stdout, "CRTS(%u):[%d]",c->crtc, j);
 		kmstest_dump_mode(&c->mode);
 		if (drmModeSetCrtc(drm_fd, c->crtc, fb_id, 0, 0,
 				   &c->id, 1, &c->mode)) {
@@ -585,10 +594,13 @@ int update_display(void)
 		dump_crtcs_fd(drm_fd);
 	}
 
-	if (test_preferred_mode || test_all_modes || force_mode) {
+	if (test_preferred_mode || test_all_modes || force_mode || only_one_mode) {
 		/* Find any connected displays */
 		for (c = 0; c < resources->count_connectors; c++) {
 			connectors[c].id = resources->connectors[c];
+			if ( only_one_mode == 1 && connectors[c].id != specified_disp_id )
+				continue;
+
 			set_mode(&connectors[c]);
 		}
 	}
@@ -596,7 +608,7 @@ int update_display(void)
 	return 1;
 }
 
-static char optstr[] = "hiaf:s:d:p:mrt";
+static char optstr[] = "hiaf:s:d:p:mrto:";
 
 static void __attribute__((noreturn)) usage(char *name)
 {
@@ -609,6 +621,7 @@ static void __attribute__((noreturn)) us
 	fprintf(stderr, "\t-m\ttest the preferred mode\n");
 	fprintf(stderr, "\t-t\tuse a tiled framebuffer\n");
 	fprintf(stderr, "\t-r\tprint a QR code on the screen whose content is \"pass\" for the automatic test\n");
+	fprintf(stderr, "\t-o\t<id of the display>,<number of the mode>\tonly test specified mode on the specified display\n");
 	fprintf(stderr, "\t-f\t<clock MHz>,<hdisp>,<hsync-start>,<hsync-end>,<htotal>,\n");
 	fprintf(stderr, "\t\t<vdisp>,<vsync-start>,<vsync-end>,<vtotal>\n");
 	fprintf(stderr, "\t\ttest force mode\n");
@@ -637,6 +650,7 @@ static void enter_exec_path( char **argv
 	char *exec_path = NULL;
 	char *pos = NULL;
 	short len_path = 0;
+	int ret;
 
 	len_path = strlen( argv[0] );
 	exec_path = (char*) malloc(len_path);
@@ -646,7 +660,8 @@ static void enter_exec_path( char **argv
 	if (pos != NULL)
 		*(pos+1) = '\0';
 
-	chdir(exec_path);
+	ret = chdir(exec_path);
+	assert(ret == 0);
 	free(exec_path);
 }
 
@@ -701,6 +716,10 @@ int main(int argc, char **argv)
 		case 'r':
 			qr_code = 1;
 			break;
+		case 'o':
+			only_one_mode = 1;
+			sscanf(optarg, "%d,%d", &specified_disp_id, &specified_mode_num);
+			break;
 		default:
 			fprintf(stderr, "unknown option %c\n", c);
 			/* fall through */
@@ -710,7 +729,7 @@ int main(int argc, char **argv)
 		}
 	}
 	if (!test_all_modes && !force_mode && !dump_info &&
-	    !test_preferred_mode)
+	    !test_preferred_mode && !only_one_mode)
 		test_all_modes = 1;
 
 	drm_fd = drm_open_any();
diff -rupN dump_1/tests/ZZ_hangman dump/tests/ZZ_hangman
--- dump_1/tests/ZZ_hangman	2001-01-14 08:11:40.069619273 +0800
+++ dump/tests/ZZ_hangman	2001-01-14 08:10:57.881621122 +0800
@@ -27,7 +27,7 @@ fi
 echo 0xf > i915_ring_stop
 echo "rings stopped"
 
-(cd $oldpath; $SOURCE_DIR/gem_exec_nop) > /dev/null
+(cd $oldpath; $SOURCE_DIR/gem_exec_big) > /dev/null
 
 if cat i915_error_state | grep -v "no error state collected" > /dev/null ; then
 	echo "gpu hang correctly dectected"
diff -rupN dump_1/tools/intel_error_decode.c dump/tools/intel_error_decode.c
--- dump_1/tools/intel_error_decode.c	2001-01-14 08:11:40.093619273 +0800
+++ dump/tools/intel_error_decode.c	2001-01-14 08:10:57.882621019 +0800
@@ -49,6 +49,7 @@
 #include <errno.h>
 #include <sys/stat.h>
 #include <err.h>
+#include <assert.h>
 #include <intel_bufmgr.h>
 
 #include "intel_chipset.h"
@@ -479,13 +480,18 @@ main (int argc, char *argv[])
     }
 
     if (S_ISDIR (st.st_mode)) {
-	asprintf (&filename, "%s/i915_error_state", path);
+	int ret;
+
+	ret = asprintf (&filename, "%s/i915_error_state", path);
+	assert(ret > 0);
 	file = fopen(filename, "r");
 	if (!file) {
 	    int minor;
 	    for (minor = 0; minor < 64; minor++) {
 		free(filename);
-		asprintf(&filename, "%s/%d/i915_error_state", path, minor);
+		ret = asprintf(&filename, "%s/%d/i915_error_state", path, minor);
+		assert(ret > 0);
+
 		file = fopen(filename, "r");
 		if (file)
 		    break;
diff -rupN dump_1/tools/intel_gpu_abrt dump/tools/intel_gpu_abrt
--- dump_1/tools/intel_gpu_abrt	2001-01-14 08:11:40.093619273 +0800
+++ dump/tools/intel_gpu_abrt	2001-01-14 08:10:57.883620919 +0800
@@ -1,5 +1,21 @@
 #!/bin/sh
 
+if [[ $UID -ne 0 ]]; then
+    echo "$0 must be run as root"
+    exit 1
+fi
+
+get(){
+    if [ ! -e $tardir/${@:$#} ] ; then
+	mkdir -p $tardir/${@:$#}
+    fi
+    if [ -e $1 ] ; then
+	cp -a ${@:1:$#-1} $tardir/${@:$#} 2>/dev/null
+    fi
+}
+
+igtdir=`dirname $0`
+
 if [ -d /debug/dri ] ; then
 	debugfs_path=/debug_dri
 fi
@@ -25,21 +41,62 @@ tmpdir=`mktemp -d`
 tardir=$tmpdir/intel_gpu_abrt
 mkdir $tardir
 
-mkdir $tardir/debugfs
-cp $i915_debugfs/* $tardir/debugfs
+get $i915_debugfs/* debugfs
 
-mkdir $tardir/mod_opts
-cp /sys/module/i915/parameters/* $tardir/mod_opts
+get /sys/module/i915/parameters/* mod_opts
 
 mkdir $tardir/X
-cp /var/log/Xorg.*.log $tardir/X
-cp /etc/X11/xorg.conf $tardir/X
+xrandr --verbose > $tardir/X/xrandr
+get /var/log/Xorg.0.log X
+get /var/log/Xorg.0.log.old X
+get /etc/X11/xorg.conf X
+get /etc/X11/xorg.conf.d/ X
 
 dmesg > $tardir/dmesg
 lspci -nn > $tardir/lspci
 
+$igtdir/intel_reg_dumper > $tardir/intel_reg_dumper.txt
+$igtdir/intel_bios_dumper $tardir/intel_bios_dump
+$igtdir/intel_stepping > $tardir/intel_stepping
+
+echo 1 > /sys/devices/pci0000:00/0000:00:02.0/rom
+cat /sys/devices/pci0000:00/0000:00:02.0/rom > $tardir/vbios.dump
+echo 0 > /sys/devices/pci0000:00/0000:00:02.0/rom
+
 (cd $tmpdir; tar -c intel_gpu_abrt ) > intel_gpu_abrt.tar
 
 rm $tmpdir -Rf
 
+if [ -f intel_gpu_abrt.tar ] ; then
+	cat <<EOF
+intel_gpu_abrt.tar has been created.
+
+Please attach it to https://bugs.freedesktop.org
+with a good bug description as suggested in this template:
+
+System environment:
+-- chipset:
+-- system architecture: `uname -m`
+-- xf86-video-intel:
+-- xserver: `grep "X.Org X Server" /var/log/Xorg.0.log | awk '{print $NF}'`
+-- mesa:
+-- libdrm: `pkg-config --modversion libdrm`
+-- kernel: `uname -r`
+-- Linux distribution:
+-- Machine or mobo model:
+-- Display connector:
+
+Reproducing steps:
+
+Additional info:
+
+EOF
 exit 0
+else
+cat <<EOF
+Error on tarball generation.
+For bug report, please follow manual instructions available at:
+https://01.org/linuxgraphics/documentation/how-report-bugs-0
+EOF
+exit 1
+fi
diff -rupN dump_1/tools/intel_gtt.c dump/tools/intel_gtt.c
--- dump_1/tools/intel_gtt.c	2001-01-14 08:11:40.094619273 +0800
+++ dump/tools/intel_gtt.c	2001-01-14 08:10:57.883620919 +0800
@@ -70,7 +70,7 @@ int main(int argc, char **argv)
 				break;
 		} else {
 			int offset;
-			if (IS_G4X(devid) || IS_GEN5(devid))
+			if (IS_G4X(devid) || IS_GEN5(devid) || IS_VALLEYVIEW(devid))
 				offset = MB(2);
 			else
 				offset = KB(512);
diff -rupN dump_1/tools/intel_infoframes.c dump/tools/intel_infoframes.c
--- dump_1/tools/intel_infoframes.c	2001-01-14 08:11:40.095619273 +0800
+++ dump/tools/intel_infoframes.c	2001-01-14 08:10:57.885620728 +0800
@@ -125,6 +125,8 @@ typedef enum {
 #define SPD_INFOFRAME_VERSION 0x01
 #define SPD_INFOFRAME_LENGTH  0x19
 
+#define VENDOR_ID_HDMI	0x000c03
+
 typedef struct {
 	uint8_t type;
 	uint8_t version;
@@ -175,6 +177,21 @@ typedef union {
 	} __attribute__((packed)) spd;
 	struct {
 		DipInfoFrameHeader header;
+		uint8_t checksum;
+
+		uint8_t id[3];
+
+		uint8_t Rsvd0        :5;
+		uint8_t video_format :3;
+
+		uint8_t Rsvd1         :4;
+		uint8_t s3d_structure :4;
+
+		uint8_t Rsvd2        :4;
+		uint8_t s3d_ext_data :4;
+	} __attribute__((packed)) vendor;
+	struct {
+		DipInfoFrameHeader header;
 		uint8_t body[27];
 	} generic;
 	uint8_t data8[128];
@@ -424,10 +441,45 @@ static void dump_avi_info(Transcoder tra
 		printf("Invalid InfoFrame checksum!\n");
 }
 
+static const char *vendor_id_to_string(uint32_t id)
+{
+	switch (id) {
+	case VENDOR_ID_HDMI:
+		return "HDMI";
+	default:
+		return "Unknown";
+	}
+}
+
+static const char *s3d_structure_to_string(int format)
+{
+	switch (format) {
+	case 0:
+		return "Frame Packing";
+	case 6:
+		return "Top Bottom";
+	case 8:
+		return "Side By Side (half)";
+	default:
+		return "Reserved";
+	}
+}
+
+static void dump_vendor_hdmi(DipInfoFrame *frame)
+{
+	int s3d_present = frame->vendor.video_format & 0x2;
+
+	printf("- video format: 0x%03x %s\n", frame->vendor.video_format,
+	       s3d_present ? "(3D)" : "");
+	if (s3d_present)
+		printf("- 3D Format: %s\n",
+		       s3d_structure_to_string(frame->vendor.s3d_structure));
+}
+
 static void dump_vendor_info(Transcoder transcoder)
 {
 	Register reg = get_dip_ctl_reg(transcoder);
-	uint32_t val;
+	uint32_t val, vendor_id;
 	DipFrequency freq;
 	DipInfoFrame frame;
 
@@ -446,6 +498,15 @@ static void dump_vendor_info(Transcoder
 
 	dump_raw_infoframe(&frame);
 
+	vendor_id = frame.vendor.id[2] << 16 | frame.vendor.id[1] << 8 |
+		    frame.vendor.id[0];
+
+	printf("- vendor Id: 0x%06x (%s)\n", vendor_id,
+	       vendor_id_to_string(vendor_id));
+
+	if (vendor_id == VENDOR_ID_HDMI)
+		dump_vendor_hdmi(&frame);
+
 	if (!infoframe_valid_checksum(&frame))
 		printf("Invalid InfoFrame checksum!\n");
 }
@@ -772,7 +833,7 @@ static void change_spd_infoframe(Transco
 	val = INREG(reg);
 
 	while (1) {
-		rc = sscanf(current, "%31s%n", option, &read);
+		rc = sscanf(current, "%15s%n", option, &read);
 		current = &current[read];
 		if (rc == EOF) {
 			break;
diff -rupN dump_1/tools/intel_reg_dumper.c dump/tools/intel_reg_dumper.c
--- dump_1/tools/intel_reg_dumper.c	2001-01-14 08:11:40.097619273 +0800
+++ dump/tools/intel_reg_dumper.c	2001-01-14 08:10:57.886620637 +0800
@@ -1777,6 +1777,16 @@ static struct reg_debug ironlake_debug_r
 	DEFINEREG2(FDI_RXB_CTL, ironlake_debug_fdi_rx_ctl),
 	DEFINEREG2(FDI_RXC_CTL, ironlake_debug_fdi_rx_ctl),
 
+	DEFINEREG(DPAFE_BMFUNC),
+	DEFINEREG(DPAFE_DL_IREFCAL0),
+	DEFINEREG(DPAFE_DL_IREFCAL1),
+	DEFINEREG(DPAFE_DP_IREFCAL),
+
+	DEFINEREG(PCH_DSPCLK_GATE_D),
+	DEFINEREG(PCH_DSP_CHICKEN1),
+	DEFINEREG(PCH_DSP_CHICKEN2),
+	DEFINEREG(PCH_DSP_CHICKEN3),
+
 	DEFINEREG2(FDI_RXA_MISC, ironlake_debug_fdi_rx_misc),
 	DEFINEREG2(FDI_RXB_MISC, ironlake_debug_fdi_rx_misc),
 	DEFINEREG2(FDI_RXC_MISC, ironlake_debug_fdi_rx_misc),
@@ -1918,7 +1928,6 @@ _intel_dump_reg(struct reg_debug *reg, u
 {
 	char debug[1024];
 
-#if 0
 	if (reg->debug_output != NULL) {
 		reg->debug_output(debug, sizeof(debug), reg->reg, val);
 		printf("%30.30s: 0x%08x (%s)\n",
@@ -1926,8 +1935,6 @@ _intel_dump_reg(struct reg_debug *reg, u
 	} else {
 		printf("%30.30s: 0x%08x\n", reg->name, val);
 	}
-#endif
-	printf("%30.30s: 0x%08x\n", reg->name, reg->reg);
 }
 
 #define intel_dump_regs(regs) _intel_dump_regs(regs, ARRAY_SIZE(regs))
diff -rupN dump_1/tools/intel_reg_read.c dump/tools/intel_reg_read.c
--- dump_1/tools/intel_reg_read.c	2001-01-14 08:11:40.097619273 +0800
+++ dump/tools/intel_reg_read.c	2001-01-14 08:11:17.683616867 +0800
@@ -31,6 +31,7 @@
 #include <err.h>
 #include <string.h>
 #include "intel_gpu_tools.h"
+#include "intel_vlv.h"
 
 static void bit_decode(uint32_t reg)
 {
@@ -48,10 +49,21 @@ static void bit_decode(uint32_t reg)
 static void dump_range(uint32_t start, uint32_t end)
 {
 	int i;
+	uint32_t offset = 0;
+	struct pci_device *pci_dev;
+	pci_dev = intel_get_pci_device();
+
+
+	for (i = start; i < end; i += 4){
+		if (IS_VALLEYVIEW(pci_dev->device_id) && IS_DISPLAYREG(start))
+                        offset = 0x180000;
+                else
+                        offset=0x0;
+        	
 
-	for (i = start; i < end; i += 4)
 		printf("0x%X : 0x%X\n", i,
-		       *(volatile uint32_t *)((volatile char*)mmio + i));
+		       *(volatile uint32_t *)((volatile char*)mmio + i + offset));
+	}
 }
 
 static void usage(char *cmdname)
@@ -68,6 +80,8 @@ int main(int argc, char** argv)
 {
 	int ret = 0;
 	uint32_t reg;
+	struct pci_device *pci_dev;
+
 	int i, ch;
 	char *cmdname = strdup(argv[0]);
 	int full_dump = 0;
@@ -123,6 +137,7 @@ int main(int argc, char** argv)
 		dump_range(0x60000, 0x6ffff);   /* display engine pipeline registers */
 		dump_range(0x70000, 0x72fff);   /* display and cursor registers */
 		dump_range(0x73000, 0x73fff);   /* performance counters */
+
 	} else {
 		for (i=0; i < argc; i++) {
 			sscanf(argv[i], "0x%x", &reg);
diff -rupN dump_1/tools/intel_reg_snapshot.c dump/tools/intel_reg_snapshot.c
--- dump_1/tools/intel_reg_snapshot.c	2001-01-14 08:11:40.097619273 +0800
+++ dump/tools/intel_reg_snapshot.c	2001-01-14 08:10:57.888620464 +0800
@@ -25,6 +25,7 @@
  */
 
 #include <unistd.h>
+#include <assert.h>
 #include "intel_gpu_tools.h"
 
 int main(int argc, char** argv)
@@ -32,6 +33,7 @@ int main(int argc, char** argv)
 	struct pci_device *pci_dev;
 	uint32_t devid;
 	int mmio_bar;
+	int ret;
 
 	pci_dev = intel_get_pci_device();
 	devid = pci_dev->device_id;
@@ -42,7 +44,8 @@ int main(int argc, char** argv)
 	else
 		mmio_bar = 0;
 
-	write(1, mmio, pci_dev->regions[mmio_bar].size);
+	ret = write(1, mmio, pci_dev->regions[mmio_bar].size);
+	assert(ret > 0);
 
 	return 0;
 }
diff -rupN dump_1/tools/intel_reg_write.c dump/tools/intel_reg_write.c
--- dump_1/tools/intel_reg_write.c	2001-01-14 08:11:40.097619273 +0800
+++ dump/tools/intel_reg_write.c	2001-01-14 08:10:57.888620464 +0800
@@ -30,11 +30,13 @@
 #include <stdio.h>
 #include <err.h>
 #include "intel_gpu_tools.h"
+#include "intel_vlv.h"
 
 int main(int argc, char** argv)
 {
-	uint32_t reg, value;
+	uint32_t reg, value, offset = 0;
 	volatile uint32_t *ptr;
+	struct pci_device *pci_dev;
 
 	if (argc < 3) {
 		printf("Usage: %s addr value\n", argv[0]);
@@ -44,9 +46,16 @@ int main(int argc, char** argv)
 	}
 
 	intel_register_access_init(intel_get_pci_device(), 0);
+	pci_dev = intel_get_pci_device();
+
 	sscanf(argv[1], "0x%x", &reg);
 	sscanf(argv[2], "0x%x", &value);
-	ptr = (volatile uint32_t *)((volatile char *)mmio + reg);
+
+
+	if (IS_VALLEYVIEW(pci_dev->device_id) && IS_DISPLAYREG(reg))
+		offset = 0x180000;
+
+	ptr = (volatile uint32_t *)((volatile char *)mmio + reg + offset);
 
 	printf("Value before: 0x%X\n", *ptr);
 	*ptr = value;

[-- Attachment #3: intel-gpu-tools_master.patch --]
[-- Type: application/octet-stream, Size: 9107 bytes --]

diff -rupN intel-gpu-tools/lib/intel_chipset.h intel/lib/intel_chipset.h
--- intel-gpu-tools/lib/intel_chipset.h	2001-01-14 08:29:30.929654461 +0800
+++ intel/lib/intel_chipset.h	2001-01-14 08:30:33.033651024 +0800
@@ -122,7 +122,10 @@
 #define PCI_CHIP_HASWELL_CRW_S_GT2      0x0D2A
 #define PCI_CHIP_HASWELL_CRW_S_GT2_PLUS 0x0D3A
 
-#define PCI_CHIP_VALLEYVIEW_PO		0x0f30 /* VLV PO board */
+#define PCI_CHIP_VALLEYVIEWO		0x0f30 /* VLV PO board */
+#define PCI_CHIP_VALLEYVIEW1		0x0f31 
+#define PCI_CHIP_VALLEYVIEW2		0x0f32 
+#define PCI_CHIP_VALLEYVIEW3		0x0f33 
 
 #define IS_MOBILE(devid)	(devid == PCI_CHIP_I855_GM || \
 				 devid == PCI_CHIP_I915_GM || \
@@ -194,9 +197,16 @@
 				 dev == PCI_CHIP_IVYBRIDGE_M_GT2 || \
 				 dev == PCI_CHIP_IVYBRIDGE_S || \
 				 dev == PCI_CHIP_IVYBRIDGE_S_GT2 || \
-				 dev == PCI_CHIP_VALLEYVIEW_PO)
-
-#define IS_VALLEYVIEW(devid)	(devid == PCI_CHIP_VALLEYVIEW_PO)
+				 dev == PCI_CHIP_VALLEYVIEWO ||\
+				 dev == PCI_CHIP_VALLEYVIEW1 ||\
+				 dev == PCI_CHIP_VALLEYVIEW2 ||\
+				 dev == PCI_CHIP_VALLEYVIEW3 )
+
+#define IS_VALLEYVIEW(devid)	(devid == PCI_CHIP_VALLEYVIEWO || \
+				 devid == PCI_CHIP_VALLEYVIEW1 || \
+				 devid == PCI_CHIP_VALLEYVIEW2 || \
+				 devid == PCI_CHIP_VALLEYVIEW3 )
+				
 
 #define IS_HSW_GT1(devid)       (devid == PCI_CHIP_HASWELL_GT1 || \
 				 devid == PCI_CHIP_HASWELL_M_GT1 || \
diff -rupN intel-gpu-tools/lib/intel_vlv.h intel/lib/intel_vlv.h
--- intel-gpu-tools/lib/intel_vlv.h	1970-01-01 07:30:00.000000000 +0730
+++ intel/lib/intel_vlv.h	2001-01-14 08:30:33.034651111 +0800
@@ -0,0 +1,145 @@
+/* VLV specific header */
+
+#ifndef _INTEL_VLV_H_
+#define _INTEL_VLV_H_
+
+#define false 0
+#define true 1
+
+#define VLV_DISPLAY_BASE                0x180000
+#define RENDER_RING_BASE                0x02000
+#define BLT_RING_BASE                   0x22000
+#define GFX_MODE_GEN7                   0x0229c
+#define RENDER_HWS_PGA_GEN7            (0x04080)
+#define BSD_HWS_PGA_GEN7               (0x04180)
+#define BLT_HWS_PGA_GEN7               (0x04280)
+#define GEN6_BSD_SLEEP_PSMI_CONTROL     0x12050
+#define GEN6_BSD_RNCID                  0x12198
+#define GEN6_BLITTER_ECOSKPD            0x221d0
+#define VLV_MASTER_IER                  0x4400c /* Gunit master IER */
+#define GEN6_PMIER                      0x4402C
+#define VLV_IIR_RW                      0x182084
+#define VLV_ISR                         0x1820ac
+#define FORCEWAKE_VLV                   0x1300b0
+#define FORCEWAKE_ACK_VLV               0x1300b4
+#define GEN6_GDRST                      0x941c
+#define _3D_CHICKEN3                    0x02090
+#define IVB_CHICKEN3                    0x4200c
+#define GEN7_HALF_SLICE_CHICKEN1        0xe100 /* IVB GT1 + VLV */
+#define GEN7_L3CNTLREG1                 0xB01C
+#define GEN7_L3_CHICKEN_MODE_REGISTER   0xB030
+#define GEN7_ROW_CHICKEN2               0xe4f4
+#define GEN7_L3SQCREG4                  0xb034
+#define GEN7_SQ_CHICKEN_MBCUNIT_CONFIG  0x9030
+#define GEN6_MBCTL                      0x0907c
+#define GEN6_UCGCTL2                    0x9404
+#define GEN7_UCGCTL4                    0x940c
+#define FENCE_REG_SANDYBRIDGE_0         0x100000
+#define GEN6_BSD_RING_BASE              0x12000
+#define GEN7_COMMON_SLICE_CHICKEN1      0x7010
+
+
+
+static int IS_DISPLAYREG(uint32_t reg)
+{
+
+	if (reg >= VLV_DISPLAY_BASE)
+		return false;
+
+	if (reg >= RENDER_RING_BASE &&
+                        reg < RENDER_RING_BASE + 0xff)
+		return false;
+
+
+	if (reg >= GEN6_BSD_RING_BASE &&
+                        reg < GEN6_BSD_RING_BASE + 0xff)
+		return false;
+
+	if (reg >= BLT_RING_BASE &&
+                        reg < BLT_RING_BASE + 0xff)
+                return false;
+
+	if (reg == PGTBL_ER)
+                return false;
+
+        if (reg >= IPEIR_I965 &&
+                        reg < HWSTAM)
+                return false;
+
+	if (reg == MI_MODE)
+                return false;
+
+        if (reg == GFX_MODE_GEN7)
+                return false;
+
+        if (reg == RENDER_HWS_PGA_GEN7 ||
+                        reg == BSD_HWS_PGA_GEN7 ||
+                        reg == BLT_HWS_PGA_GEN7)
+                return false;
+
+        if (reg == GEN6_BSD_SLEEP_PSMI_CONTROL ||
+                        reg == GEN6_BSD_RNCID)
+                return false;
+
+        if (reg == GEN6_BLITTER_ECOSKPD)
+                return false;
+
+        if (reg >= 0x4000c &&
+                        reg <= 0x4002c)
+                return false;
+
+        if (reg >= 0x4f000 &&
+                        reg <= 0x4f08f)
+                return false;
+
+        if (reg >= 0x4f100 &&
+                        reg <= 0x4f11f)
+                return false;
+
+        if (reg >= VLV_MASTER_IER &&
+                        reg <= GEN6_PMIER)
+                return false;
+
+	if (reg >= FENCE_REG_SANDYBRIDGE_0 &&
+                        reg < (FENCE_REG_SANDYBRIDGE_0 + (16*8)))
+                return false;
+
+        if (reg >= VLV_IIR_RW &&
+                        reg <= VLV_ISR)
+                return false;
+
+        if (reg == FORCEWAKE_VLV ||
+                        reg == FORCEWAKE_ACK_VLV ||
+                        reg == 0x130090)
+                return false;
+
+        if (reg == GEN6_GDRST)
+                return false;
+
+        if(reg > 0x9400 && reg <= 0x9418){
+                return false;
+        }
+
+	  switch (reg) {
+               case _3D_CHICKEN3:
+               case IVB_CHICKEN3:
+               case GEN7_HALF_SLICE_CHICKEN1:
+               case GEN7_COMMON_SLICE_CHICKEN1:
+               case GEN7_L3CNTLREG1:
+               case GEN7_L3_CHICKEN_MODE_REGISTER:
+               case GEN7_ROW_CHICKEN2:
+               case GEN7_L3SQCREG4:
+               case GEN7_SQ_CHICKEN_MBCUNIT_CONFIG:
+               case GEN6_MBCTL:
+               case GEN6_UCGCTL2:
+               case GEN7_UCGCTL4:
+                      return false;
+               default:
+                      break;
+        }
+
+        return true;
+}
+
+#endif
+
diff -rupN intel-gpu-tools/tools/intel_gtt.c intel/tools/intel_gtt.c
--- intel-gpu-tools/tools/intel_gtt.c	2001-01-14 08:29:30.968654461 +0800
+++ intel/tools/intel_gtt.c	2001-01-14 08:30:33.035651197 +0800
@@ -70,7 +70,7 @@ int main(int argc, char **argv)
 				break;
 		} else {
 			int offset;
-			if (IS_G4X(devid) || IS_GEN5(devid))
+			if (IS_G4X(devid) || IS_GEN5(devid) || IS_VALLEYVIEW(devid))
 				offset = MB(2);
 			else
 				offset = KB(512);
diff -rupN intel-gpu-tools/tools/intel_reg_read.c intel/tools/intel_reg_read.c
--- intel-gpu-tools/tools/intel_reg_read.c	2001-01-14 08:29:30.970654461 +0800
+++ intel/tools/intel_reg_read.c	2001-01-14 08:31:18.525658454 +0800
@@ -31,6 +31,7 @@
 #include <err.h>
 #include <string.h>
 #include "intel_gpu_tools.h"
+#include "intel_vlv.h"
 
 static void bit_decode(uint32_t reg)
 {
@@ -48,10 +49,21 @@ static void bit_decode(uint32_t reg)
 static void dump_range(uint32_t start, uint32_t end)
 {
 	int i;
+	uint32_t offset = 0;
+	struct pci_device *pci_dev;
+	pci_dev = intel_get_pci_device();
+
+
+	for (i = start; i < end; i += 4){
+		if (IS_VALLEYVIEW(pci_dev->device_id) && IS_DISPLAYREG(start))
+                        offset = 0x180000;
+                else
+                        offset=0x0;
+        	
 
-	for (i = start; i < end; i += 4)
 		printf("0x%X : 0x%X\n", i,
-		       *(volatile uint32_t *)((volatile char*)mmio + i));
+		       *(volatile uint32_t *)((volatile char*)mmio + i + offset));
+	}
 }
 
 static void usage(char *cmdname)
@@ -68,6 +80,8 @@ int main(int argc, char** argv)
 {
 	int ret = 0;
 	uint32_t reg;
+	struct pci_device *pci_dev;
+
 	int i, ch;
 	char *cmdname = strdup(argv[0]);
 	int full_dump = 0;
@@ -123,6 +137,7 @@ int main(int argc, char** argv)
 		dump_range(0x60000, 0x6ffff);   /* display engine pipeline registers */
 		dump_range(0x70000, 0x72fff);   /* display and cursor registers */
 		dump_range(0x73000, 0x73fff);   /* performance counters */
+
 	} else {
 		for (i=0; i < argc; i++) {
 			sscanf(argv[i], "0x%x", &reg);
diff -rupN intel-gpu-tools/tools/intel_reg_write.c intel/tools/intel_reg_write.c
--- intel-gpu-tools/tools/intel_reg_write.c	2001-01-14 08:29:30.971654461 +0800
+++ intel/tools/intel_reg_write.c	2001-01-14 08:31:25.451662334 +0800
@@ -30,11 +30,13 @@
 #include <stdio.h>
 #include <err.h>
 #include "intel_gpu_tools.h"
+#include "intel_vlv.h"
 
 int main(int argc, char** argv)
 {
-	uint32_t reg, value;
+	uint32_t reg, value, offset = 0;
 	volatile uint32_t *ptr;
+	struct pci_device *pci_dev;
 
 	if (argc < 3) {
 		printf("Usage: %s addr value\n", argv[0]);
@@ -44,9 +46,16 @@ int main(int argc, char** argv)
 	}
 
 	intel_register_access_init(intel_get_pci_device(), 0);
+	pci_dev = intel_get_pci_device();
+
 	sscanf(argv[1], "0x%x", &reg);
 	sscanf(argv[2], "0x%x", &value);
-	ptr = (volatile uint32_t *)((volatile char *)mmio + reg);
+
+
+	if (IS_VALLEYVIEW(pci_dev->device_id) && IS_DISPLAYREG(reg))
+		offset = 0x180000;
+
+	ptr = (volatile uint32_t *)((volatile char *)mmio + reg + offset);
 
 	printf("Value before: 0x%X\n", *ptr);
 	*ptr = value;

[-- Attachment #4: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: intel-gpu-tools patches for read/write MMIO
  2013-01-29  8:16           ` intel-gpu-tools patches for read/write MMIO Cheah, Vincent Beng Keat
@ 2013-01-29 20:01             ` Jesse Barnes
  2013-01-29 20:15               ` Daniel Vetter
  0 siblings, 1 reply; 13+ messages in thread
From: Jesse Barnes @ 2013-01-29 20:01 UTC (permalink / raw)
  To: Cheah, Vincent Beng Keat
  Cc: Vetter, Daniel, intel-gfx@lists.freedesktop.org, Ung, Teng En,
	Teres Alexis, Alan Previn, Widawsky, Benjamin

Can you just post them externally to intel-gfx@lists.freedesktop.org?
It's best to use git send-email to do it, that way the changelogs are
preserved and posted to the ml along with the patches.

Not sure if there's a bunch of duplication between the two, but you
could split them up a bit.

I still don't like the idea of silently adding the display offset on
vlv; these are just debug tools and the developer should get the
absolute offset they asked for no matter what.

Jesse

On Tue, 29 Jan 2013 00:16:51 -0800
"Cheah, Vincent Beng Keat" <vincent.beng.keat.cheah@intel.com> wrote:

> Hi 
> 
> Attached refers to two different patches that I have made for Benjamin Windawsky’s branch (bwidawsk_branch.patch) and intel-gpu-tools (master branch - intel-gpu-tools_master.patch). Alternative link: (\\pglvm2008-v03.png.intel.com\automation\binary\Linux\Automation\patches )
> 
> patches: 
> 	•	intel-gpu-tools-1.3_master.patch 
> 		o	To be applied on latest intel-gpu-tools-1.3 (git clone git://anongit.freedesktop.org/xorg/app/intel-gpu-tools ) 
> 		o	The patches added are VLV chipset support + correcting intel_read_reg.c, intel_reg_wirte.c and intel_gtt.c
> 		o	Web link: http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/
> 	•	bwidawsk_branch.patch
> 		o	To be applied on Benjamin Windawsky’s branch (git clone git://people.freedesktop.org/~bwidawsk/intel-gpu-tools -b dump_util
> 		o	The patches added are VLV chipset support + correcting intel_read_reg.c, intel_reg_wirte.c and intel_gtt.c + merge in change(s) from intel-gpu-tools-1.3
> 		o	Web link: http://cgit.freedesktop.org/~bwidawsk/intel-gpu-tools/?h=dump_util
> 
> Could somebody you please help to upstream this? 
> 
> Thanks.
> 
> Best regards, 
> Vincent 
> 
> 
> -----Original Message-----
> From: Ben Widawsky [mailto:benjamin.widawsky@intel.com] 
> Sent: Tuesday, January 15, 2013 2:55 PM
> To: Teres Alexis, Alan Previn
> Cc: Barnes, Jesse; Cheah, Vincent Beng Keat; Vetter, Daniel
> Subject: Re: intel-gpu-tools patches for read/write MMIO
> 
> On Mon, Jan 14, 2013 at 10:42 PM, Teres Alexis, Alan Previn <alan.previn.teres.alexis@intel.com> wrote:
> > Ben, point us to that infrastructure ur working on - and since ur currently maintaining the intel-gpu-tools, let us know if that framework is still being worked on for VLV support or if someone else is working on adding VLV support in some form into the intel-gpu-tools.
> > Vincent is already starting to work on adding IS_DISPLAY_REG for VLV. Don’t want any overlap - let us know if so.
> >
> 
> I am too lazy to find the mailing list post, but here it is:
> http://cgit.freedesktop.org/~bwidawsk/intel-gpu-tools/log/?h=dump_util
> 
> I made some changed during PO which I probably never pushed. I'd have to look. IMO, this is the way to go though. (see vlv_display.txt)
> 
> > On the intel_reg_read/write should only do what the user asks - I agree with that. But if that function is being re-used by other internal tests like "dump display regs" or something, then an internal function could pass in that value - i.e. the option to explicitly say if its display or not should still be there.
> 
> We don't have the kind of capability you're referring to there. It would be nice to have, but not there yet. Anyway, I agree with you.
> 
> > Also, the option to have a text file define the range sounds excellent 
> > - but should stop the one-off cmd line drive reg read / write - which 
> > I am sure is not being removed by anyone in any branch for any reason 
> > :P
> 
> Yeah, I think Daniel gave up arguing against it, I forget if I was supposed to resubmit the patch. It came up at our London meeting.
> Anyone remember?
> 
> >
> > ...alan
> >
> >
> > -----Original Message-----
> > From: Ben Widawsky [mailto:benjamin.widawsky@intel.com]
> > Sent: Tuesday, January 15, 2013 12:49 PM
> > To: Teres Alexis, Alan Previn
> > Cc: Barnes, Jesse; Cheah, Vincent Beng Keat; Vetter, Daniel
> > Subject: Re: intel-gpu-tools patches for read/write MMIO
> >
> > This is what that infrastructure I worked on was meant to do (where a text file defines the registers you want to read), you know, the one Daniel more or less nak'd ;-) ... intel_reg_read/write shouldn't ever do anything except what the user asked. Personally, I think the dump range never belonged in read/write, but that predated me.
> > intel_reg_dumper is a bit of another story though, see first sentenc.
> >
> > There is no need to work with Daniel directly if you don't want.
> > Simply submit them to the intel-gfx mailing lists. If we have patches that cannot be me public yet, we have an internal list for that which we can point you to (and I am currently maintaining that intel-gpu-tools repository).
> >
> > Anyway, I wasn't directly addressed, so I'll butt out having left my 
> > $.02 :-)
> >
> > On Mon, Jan 14, 2013 at 6:19 PM, Teres Alexis, Alan Previn <alan.previn.teres.alexis@intel.com> wrote:
> >> Hey Jesse and Daniel,
> >> Looks like our team mate didn't add VLV support into the whole intel_gpu_tools suite, he only added VLV support intel_reg_read and intel_reg_write - where the 0x180000 was hard coded for manual user register reads and register writes.
> >> The other tests would pass or fail depending. For example, intel_reg_dumper.c might fail (in most cases), because its mostly display regs and needs the 0x18000 but intel_gem_blahblah tests would pass because I belive most of them don't touch display regs.
> >> But any tests that want to verify GTT might fail because the gtt mapping was not modded to support VLV.
> >>
> >> Jesse, Daniel,  do u have someone on OTC enabling full support of VLV for intel-gpu-tools??? If not, then  then Vincent has volunteered to enable this and upstream thru Daniel - I will help him add explicit support on test-case by test-case basis as I summarized above.
> >>
> >> For generic reading / writing regs, I would propose an additional param (that is defaulted to zero) that means "is_display_reg" so the user could explicitly request to read or write a register and tell the tool that it IS_DISPLAY or  IS_NOT_DISPLAY. And in other cases, this tool will decide based on the same IS_DISPLAY macro in the kernel driver. (the optional override is important since we have overlapping IRQ and some other registers that have the same offset for both render and display and those cases require explicit mention).
> >>
> >> ...alan
> >>
> >>
> >> -----Original Message-----
> >> From: Teres Alexis, Alan Previn
> >> Sent: Tuesday, January 15, 2013 7:13 AM
> >> To: Barnes, Jesse; Cheah, Vincent Beng Keat
> >> Cc: Vetter, Daniel; Widawsky, Benjamin
> >> Subject: RE: intel-gpu-tools patches for read/write MMIO
> >>
> >> Vincent - lets review this offline - if intel-gpu-tools holds register names and addresses, then we can add that driver IS_VLV_DISPLAY_REG macro into that tool (which handles the optional need to add - or not to add - the 0x180000 offset).
> >> Else we should remove it and just ensure the MMIO BAR ranges can cover the larger range.
> >> ...alan
> >>
> >> -----Original Message-----
> >> From: Barnes, Jesse
> >> Sent: Monday, January 14, 2013 11:38 PM
> >> To: Cheah, Vincent Beng Keat
> >> Cc: Vetter, Daniel; Teres Alexis, Alan Previn; Widawsky, Benjamin
> >> Subject: Re: intel-gpu-tools patches for read/write MMIO
> >>
> >> On Mon, 14 Jan 2013 00:57:15 -0800
> >> "Cheah, Vincent Beng Keat" <vincent.beng.keat.cheah@intel.com> wrote:
> >>
> >>> Hi Daniel.
> >>> Attached refers to the patches  that I have done on intel-gpu-tools-1.3 to read and write MMIO register for VLV platform specific.
> >>>
> >>> Could help me to make this  upstream.
> >>
> >> I don't think this is quite right.  Not all of the regs are above 0x180000, just the display ones.
> >>
> >> Also, I think we should drop the comments about "PO boards" and just call them VLV_D, VLV_M, and VLV_T to match the SKUs we have.
> >>
> >> I don't think we need to add the offset to _read & _write either; those are just bare tools and users can just add the offset themselves.
> >>
> >> But yes, we do have permission to publish this stuff, so you can publish an updated patch to the mailing list.
> >>
> >> Thanks,
> >> Jesse
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: intel-gpu-tools patches for read/write MMIO
  2013-01-29 20:01             ` Jesse Barnes
@ 2013-01-29 20:15               ` Daniel Vetter
  2013-01-30  1:12                 ` Ben Widawsky
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Vetter @ 2013-01-29 20:15 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: intel-gfx@lists.freedesktop.org, Cheah, Vincent Beng Keat,
	Ung, Teng En, Teres Alexis, Alan Previn, Widawsky, Benjamin

On 29/01/2013 21:01, Jesse Barnes wrote:
> Can you just post them externally tointel-gfx@lists.freedesktop.org?
> It's best to use git send-email to do it, that way the changelogs are
> preserved and posted to the ml along with the patches.
Public intel-gfx is already on the cc list, just in case you get the 
urge to spill some secrets ;-)
> Not sure if there's a bunch of duplication between the two, but you
> could split them up a bit.
>
> I still don't like the idea of silently adding the display offset on
> vlv; these are just debug tools and the developer should get the
> absolute offset they asked for no matter what.
On that topic of silently adding display offset - with Ville's kernel 
work we'll have switched away completely from such tricks in the kernel. 
So I think i-g-t shouldn't automatically add the offset.

Which essentially just leaves us with intel_reg_dumper. Now for that I'm 
somewhat hopefully that we will be able to (eventually) dump registers 
using the bspec xml sources (there should be bspec xmls around for just 
the open-source approved parts). In the meantime, can't we just adjust 
the relevant offsets of the register blocks? IIrc their all somewhat 
usefully grouped together, so this would amount to adding a quick 
function to add the offset to a given table (put keep all the names) and 
then feed the adjusted table to the dumper functions ...
-Daniel
Intel Semiconductor AG
Registered No. 020.30.913.786-7
Registered Office: World Trade Center, Leutschenbachstrasse 95, 8050 Zurich, Switzerland

This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: intel-gpu-tools patches for read/write MMIO
  2013-01-29 20:15               ` Daniel Vetter
@ 2013-01-30  1:12                 ` Ben Widawsky
  2013-01-30  1:39                   ` Teres Alexis, Alan Previn
  2013-01-30 17:13                   ` Jesse Barnes
  0 siblings, 2 replies; 13+ messages in thread
From: Ben Widawsky @ 2013-01-30  1:12 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Cheah, Vincent Beng Keat, Ung, Teng En, Teres Alexis, Alan Previn,
	intel-gfx@lists.freedesktop.org, Jesse Barnes, Widawsky, Benjamin

On Tue, Jan 29, 2013 at 09:15:22PM +0100, Daniel Vetter wrote:
> On 29/01/2013 21:01, Jesse Barnes wrote:
> >Can you just post them externally tointel-gfx@lists.freedesktop.org?
> >It's best to use git send-email to do it, that way the changelogs are
> >preserved and posted to the ml along with the patches.
> Public intel-gfx is already on the cc list, just in case you get the
> urge to spill some secrets ;-)
> >Not sure if there's a bunch of duplication between the two, but you
> >could split them up a bit.
> >
> >I still don't like the idea of silently adding the display offset on
> >vlv; these are just debug tools and the developer should get the
> >absolute offset they asked for no matter what.
> On that topic of silently adding display offset - with Ville's
> kernel work we'll have switched away completely from such tricks in
> the kernel. So I think i-g-t shouldn't automatically add the offset.
> 
> Which essentially just leaves us with intel_reg_dumper. Now for that
> I'm somewhat hopefully that we will be able to (eventually) dump
> registers using the bspec xml sources (there should be bspec xmls
> around for just the open-source approved parts). In the meantime,
> can't we just adjust the relevant offsets of the register blocks?
> IIrc their all somewhat usefully grouped together, so this would
> amount to adding a quick function to add the offset to a given table
> (put keep all the names) and then feed the adjusted table to the
> dumper functions ...
> -Daniel

As we discussed in private, even if we get to the point of having bspec
xml, we would still want a tool similar to the one that was proposed for
parsing the XML (as opposed to the text). Reg dumper as has been
mentioned in several threads is pretty inflexible, and a pain to modify
for person use.

As we also discussed in private, I'd like Jesse to either fight or not
for this because I don't think he has to butt heads with you enough.

-- 
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: intel-gpu-tools patches for read/write MMIO
  2013-01-30  1:12                 ` Ben Widawsky
@ 2013-01-30  1:39                   ` Teres Alexis, Alan Previn
  2013-01-30  3:27                     ` Teres Alexis, Alan Previn
  2013-01-30 17:13                   ` Jesse Barnes
  1 sibling, 1 reply; 13+ messages in thread
From: Teres Alexis, Alan Previn @ 2013-01-30  1:39 UTC (permalink / raw)
  To: Ben Widawsky, Vetter, Daniel
  Cc: intel-gfx@lists.freedesktop.org, Cheah, Vincent Beng Keat,
	Barnes, Jesse, Ung, Teng En, Widawsky, Benjamin

Hey folks, 

Putting previous work aside, I have to agree with Ben about getting the user to provide the absolute register offset - the adding of the 0x180000 into the igt tool patch below was based on some prior work we had (some internal ULTs we have running with this tool).
This makes more sense considering the fact that there are some registers that have the same B-Spec register offset for both Render and Display - except one needs the 180000 offset and the other doesn't - i.e. for those cases, u cant silently just add the 0x180000 by simply inspecting what offset it is.
Actually, I emailed about this previously and I remember that in that thread our team's incarnation of igt would have an additional input param for intel-reg-read/write to explicitly say if we want to dictate it as a display register or not (which is a NO-OP for non-VLV). But I see that got lost from the patch.

Ben, Daniel, please let us the final verdict - if the 0x180000 should NOT be added, we'll re-do the patches and require use explicit addition of the 180000 (and probably need to re-do our internal ULTs in some near future).

However, WRT to the xml file parsing - this patch has no relationship whatsoever to that. I agree that XML file parsing is a good idea but this patch is about enabling VLV support of existing i-g-t functions.
On the xml concept, should we perhaps consider chipset HW abstraction for the kernel driver too??? i.e. the registers header files?? (i915_regs.h --> PIPEA_STATUS_GEN vs ivlv_regs.h --> PIPEA_STATUS_VLV for example - with the 180000 already added to the latter)?

...alan




-----Original Message-----
From: Ben Widawsky [mailto:ben@bwidawsk.net] 
Sent: Wednesday, January 30, 2013 9:13 AM
To: Vetter, Daniel
Cc: Barnes, Jesse; intel-gfx@lists.freedesktop.org; Cheah, Vincent Beng Keat; Ung, Teng En; Teres Alexis, Alan Previn; Widawsky, Benjamin
Subject: Re: [Intel-gfx] intel-gpu-tools patches for read/write MMIO

On Tue, Jan 29, 2013 at 09:15:22PM +0100, Daniel Vetter wrote:
> On 29/01/2013 21:01, Jesse Barnes wrote:
> >Can you just post them externally tointel-gfx@lists.freedesktop.org?
> >It's best to use git send-email to do it, that way the changelogs are 
> >preserved and posted to the ml along with the patches.
> Public intel-gfx is already on the cc list, just in case you get the 
> urge to spill some secrets ;-)
> >Not sure if there's a bunch of duplication between the two, but you 
> >could split them up a bit.
> >
> >I still don't like the idea of silently adding the display offset on 
> >vlv; these are just debug tools and the developer should get the 
> >absolute offset they asked for no matter what.
> On that topic of silently adding display offset - with Ville's kernel 
> work we'll have switched away completely from such tricks in the 
> kernel. So I think i-g-t shouldn't automatically add the offset.
> 
> Which essentially just leaves us with intel_reg_dumper. Now for that 
> I'm somewhat hopefully that we will be able to (eventually) dump 
> registers using the bspec xml sources (there should be bspec xmls 
> around for just the open-source approved parts). In the meantime, 
> can't we just adjust the relevant offsets of the register blocks?
> IIrc their all somewhat usefully grouped together, so this would 
> amount to adding a quick function to add the offset to a given table 
> (put keep all the names) and then feed the adjusted table to the 
> dumper functions ...
> -Daniel

As we discussed in private, even if we get to the point of having bspec xml, we would still want a tool similar to the one that was proposed for parsing the XML (as opposed to the text). Reg dumper as has been mentioned in several threads is pretty inflexible, and a pain to modify for person use.

As we also discussed in private, I'd like Jesse to either fight or not for this because I don't think he has to butt heads with you enough.

--
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: intel-gpu-tools patches for read/write MMIO
  2013-01-30  1:39                   ` Teres Alexis, Alan Previn
@ 2013-01-30  3:27                     ` Teres Alexis, Alan Previn
  2013-01-30  3:39                       ` Ben Widawsky
  0 siblings, 1 reply; 13+ messages in thread
From: Teres Alexis, Alan Previn @ 2013-01-30  3:27 UTC (permalink / raw)
  To: 'Ben Widawsky', Vetter, Daniel, Cheah, Vincent Beng Keat
  Cc: 'intel-gfx@lists.freedesktop.org', Barnes, Jesse,
	Ung, Teng En, Widawsky, Benjamin

Vincent, quick realization: 

If we patch VLV support on Ben's branch, (i.e. QuickDump style - separate txt-file tables of register lists that have the correct absolute register offsets), then we should ensure intel_reg_dumper / intel_panel_fitter is removed (since that one is broken for VLV in its current incarnation)... which is what was wrong with our patch. Our patch for Ben's branch - we added the +180000 into the primitive functions and though that worked for the intel_reg_dumper / intel_panel_fitter, we ended up breaking Quick-dump . But this would be okay for main 1.3 tree. Basically these approaches are mutually exclusive (Quick-dump vs the intel_reg_dumper and alike tools). 

So what needs to happen is:

1. Daniel / Ben needs agree on what to go with for master igt (i.e. remove intel_panel_fitter, intel_reg_dumper etc  and replace with Quick-dump). I think the future is probably Quick-dump.
2. If they go with Quick-Dump, we can use Ben's stuff as is - no changes required - but we'll have to ensure that for individual register read / write, user uses the absolute MMIO offset.
	- in this case, no patch is required from our side (except the BAR memory mapping fixes for VLV - there was some other things here).
	- over time, we can add, additional VLV tables for QuickDump (since base_interrupt, base_power doesn't include other VLV IRQ/Power register sets not part of current base tables).
3. If they go with maintaining the intel_reg_dumper (for now), 
	- then we need to either add the +180000 in our primitive functions - which we don't really like OR modify the intel_reg_dumper / intel_panel_fitter/ etc with more "if(VALLEYVIEW)" and reference a new set of register tables with the VLV absolute offsets.

Lets wait for Ben and Daniel to give the go-ahead (I doubt we'll have to decide on #3, I think they'll lean towards #2).

Daniel, Ben - let us know when QuickDump is going live - we'll skip our patches for now - but we'll probably maintain our own internal bad version of intel_reg_read / write for now so we can carry on with our internal testing.

...alan


-----Original Message-----
From: Teres Alexis, Alan Previn 

Hey folks, 

Putting previous work aside, I have to agree with Ben about getting the user to provide the absolute register offset - the adding of the 0x180000 into the igt tool patch below was based on some prior work we had (some internal ULTs we have running with this tool).
This makes more sense considering the fact that there are some registers that have the same B-Spec register offset for both Render and Display - except one needs the 180000 offset and the other doesn't - i.e. for those cases, u cant silently just add the 0x180000 by simply inspecting what offset it is.
Actually, I emailed about this previously and I remember that in that thread our team's incarnation of igt would have an additional input param for intel-reg-read/write to explicitly say if we want to dictate it as a display register or not (which is a NO-OP for non-VLV). But I see that got lost from the patch.

Ben, Daniel, please let us the final verdict - if the 0x180000 should NOT be added, we'll re-do the patches and require use explicit addition of the 180000 (and probably need to re-do our internal ULTs in some near future).

However, WRT to the xml file parsing - this patch has no relationship whatsoever to that. I agree that XML file parsing is a good idea but this patch is about enabling VLV support of existing i-g-t functions.
On the xml concept, should we perhaps consider chipset HW abstraction for the kernel driver too??? i.e. the registers header files?? (i915_regs.h --> PIPEA_STATUS_GEN vs ivlv_regs.h --> PIPEA_STATUS_VLV for example - with the 180000 already added to the latter)?

...alan




-----Original Message-----
From: Ben Widawsky [mailto:ben@bwidawsk.net]
Sent: Wednesday, January 30, 2013 9:13 AM
To: Vetter, Daniel
Cc: Barnes, Jesse; intel-gfx@lists.freedesktop.org; Cheah, Vincent Beng Keat; Ung, Teng En; Teres Alexis, Alan Previn; Widawsky, Benjamin
Subject: Re: [Intel-gfx] intel-gpu-tools patches for read/write MMIO

On Tue, Jan 29, 2013 at 09:15:22PM +0100, Daniel Vetter wrote:
> On 29/01/2013 21:01, Jesse Barnes wrote:
> >Can you just post them externally tointel-gfx@lists.freedesktop.org?
> >It's best to use git send-email to do it, that way the changelogs are 
> >preserved and posted to the ml along with the patches.
> Public intel-gfx is already on the cc list, just in case you get the 
> urge to spill some secrets ;-)
> >Not sure if there's a bunch of duplication between the two, but you 
> >could split them up a bit.
> >
> >I still don't like the idea of silently adding the display offset on 
> >vlv; these are just debug tools and the developer should get the 
> >absolute offset they asked for no matter what.
> On that topic of silently adding display offset - with Ville's kernel 
> work we'll have switched away completely from such tricks in the 
> kernel. So I think i-g-t shouldn't automatically add the offset.
> 
> Which essentially just leaves us with intel_reg_dumper. Now for that 
> I'm somewhat hopefully that we will be able to (eventually) dump 
> registers using the bspec xml sources (there should be bspec xmls 
> around for just the open-source approved parts). In the meantime, 
> can't we just adjust the relevant offsets of the register blocks?
> IIrc their all somewhat usefully grouped together, so this would 
> amount to adding a quick function to add the offset to a given table 
> (put keep all the names) and then feed the adjusted table to the 
> dumper functions ...
> -Daniel

As we discussed in private, even if we get to the point of having bspec xml, we would still want a tool similar to the one that was proposed for parsing the XML (as opposed to the text). Reg dumper as has been mentioned in several threads is pretty inflexible, and a pain to modify for person use.

As we also discussed in private, I'd like Jesse to either fight or not for this because I don't think he has to butt heads with you enough.

--
Ben Widawsky, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: intel-gpu-tools patches for read/write MMIO
  2013-01-30  3:27                     ` Teres Alexis, Alan Previn
@ 2013-01-30  3:39                       ` Ben Widawsky
  0 siblings, 0 replies; 13+ messages in thread
From: Ben Widawsky @ 2013-01-30  3:39 UTC (permalink / raw)
  To: Teres Alexis, Alan Previn
  Cc: Vetter, Daniel, intel-gfx@lists.freedesktop.org,
	Cheah, Vincent Beng Keat, Barnes, Jesse, Ung, Teng En

On Tue, Jan 29, 2013 at 7:27 PM, Teres Alexis, Alan Previn
<alan.previn.teres.alexis@intel.com> wrote:
> Vincent, quick realization:
>
> If we patch VLV support on Ben's branch, (i.e. QuickDump style - separate txt-file tables of register lists that have the correct absolute register offsets), then we should ensure intel_reg_dumper / intel_panel_fitter is removed (since that one is broken for VLV in its current incarnation)... which is what was wrong with our patch. Our patch for Ben's branch - we added the +180000 into the primitive functions and though that worked for the intel_reg_dumper / intel_panel_fitter, we ended up breaking Quick-dump . But this would be okay for main 1.3 tree. Basically these approaches are mutually exclusive (Quick-dump vs the intel_reg_dumper and alike tools).
>
> So what needs to happen is:
>
> 1. Daniel / Ben needs agree on what to go with for master igt (i.e. remove intel_panel_fitter, intel_reg_dumper etc  and replace with Quick-dump). I think the future is probably Quick-dump.
> 2. If they go with Quick-Dump, we can use Ben's stuff as is - no changes required - but we'll have to ensure that for individual register read / write, user uses the absolute MMIO offset.
>         - in this case, no patch is required from our side (except the BAR memory mapping fixes for VLV - there was some other things here).
>         - over time, we can add, additional VLV tables for QuickDump (since base_interrupt, base_power doesn't include other VLV IRQ/Power register sets not part of current base tables).
> 3. If they go with maintaining the intel_reg_dumper (for now),
>         - then we need to either add the +180000 in our primitive functions - which we don't really like OR modify the intel_reg_dumper / intel_panel_fitter/ etc with more "if(VALLEYVIEW)" and reference a new set of register tables with the VLV absolute offsets.
>
> Lets wait for Ben and Daniel to give the go-ahead (I doubt we'll have to decide on #3, I think they'll lean towards #2).
>
> Daniel, Ben - let us know when QuickDump is going live - we'll skip our patches for now - but we'll probably maintain our own internal bad version of intel_reg_read / write for now so we can carry on with our internal testing.
>
> ...alan
>
WRT  #1 = it seems to be a useful, easy to maintain tool, where groups
like ISG can easily carry around there on stuff. I also suspect it
will just be overlapped with xml parsing if we ever get bspec XML.
When discussing this today.

In regards to #2, and #3 though - because it's out of the main
directory and relies on fairly standard tools (reg, dpio read/write) -
there is no reason one couldn't easily maintain it out of tree. I
don't think that's ideal, but it should be almost trivial (it doesn't
even require any Makefile merging IIRC). I haven't rebased it in a few
months, so take that with a grain of salt.

You should probably wait until Daniel puts his foot firmly down. I've
asked him in that case that he find the resource to fix up reg_dumper
to his ideas.

Daniel: Honestly, I'm fine with you putting your foot down as long as
you have a plan to make this stuff what we need it to be in the near
term.

[snip]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: intel-gpu-tools patches for read/write MMIO
  2013-01-30  1:12                 ` Ben Widawsky
  2013-01-30  1:39                   ` Teres Alexis, Alan Previn
@ 2013-01-30 17:13                   ` Jesse Barnes
  2013-01-30 17:25                     ` Daniel Vetter
  1 sibling, 1 reply; 13+ messages in thread
From: Jesse Barnes @ 2013-01-30 17:13 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: Cheah, Vincent Beng Keat, Ung, Teng En, Teres Alexis, Alan Previn,
	intel-gfx@lists.freedesktop.org, Widawsky, Benjamin,
	Daniel Vetter

On Tue, 29 Jan 2013 17:12:53 -0800
Ben Widawsky <ben@bwidawsk.net> wrote:

> On Tue, Jan 29, 2013 at 09:15:22PM +0100, Daniel Vetter wrote:
> > On 29/01/2013 21:01, Jesse Barnes wrote:
> > >Can you just post them externally tointel-gfx@lists.freedesktop.org?
> > >It's best to use git send-email to do it, that way the changelogs are
> > >preserved and posted to the ml along with the patches.
> > Public intel-gfx is already on the cc list, just in case you get the
> > urge to spill some secrets ;-)
> > >Not sure if there's a bunch of duplication between the two, but you
> > >could split them up a bit.
> > >
> > >I still don't like the idea of silently adding the display offset on
> > >vlv; these are just debug tools and the developer should get the
> > >absolute offset they asked for no matter what.
> > On that topic of silently adding display offset - with Ville's
> > kernel work we'll have switched away completely from such tricks in
> > the kernel. So I think i-g-t shouldn't automatically add the offset.
> > 
> > Which essentially just leaves us with intel_reg_dumper. Now for that
> > I'm somewhat hopefully that we will be able to (eventually) dump
> > registers using the bspec xml sources (there should be bspec xmls
> > around for just the open-source approved parts). In the meantime,
> > can't we just adjust the relevant offsets of the register blocks?
> > IIrc their all somewhat usefully grouped together, so this would
> > amount to adding a quick function to add the offset to a given table
> > (put keep all the names) and then feed the adjusted table to the
> > dumper functions ...

The big downside of using the bspec stuff is it'll be a huge rename
effort for us, and will likely get renamed and changed in the bspec
over time, breaking things.

> As we discussed in private, even if we get to the point of having bspec
> xml, we would still want a tool similar to the one that was proposed for
> parsing the XML (as opposed to the text). Reg dumper as has been
> mentioned in several threads is pretty inflexible, and a pain to modify
> for person use.
> 
> As we also discussed in private, I'd like Jesse to either fight or not
> for this because I don't think he has to butt heads with you enough.

For reg_dumper I'd prefer something like Ben's work, which just takes
text files describing what's being dumped, so we can better handle
dumping subsets of regs and have different files for different
platforms.

Jesse

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: intel-gpu-tools patches for read/write MMIO
  2013-01-30 17:13                   ` Jesse Barnes
@ 2013-01-30 17:25                     ` Daniel Vetter
  2013-01-30 17:30                       ` Jesse Barnes
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Vetter @ 2013-01-30 17:25 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Ben Widawsky, Cheah, Vincent Beng Keat, Ung, Teng En,
	Teres Alexis, Alan Previn, intel-gfx@lists.freedesktop.org,
	Widawsky, Benjamin

On 30/01/2013 18:13, Jesse Barnes wrote:
> On Tue, 29 Jan 2013 17:12:53 -0800
> Ben Widawsky <ben@bwidawsk.net> wrote:
>
>> On Tue, Jan 29, 2013 at 09:15:22PM +0100, Daniel Vetter wrote:
>>> On 29/01/2013 21:01, Jesse Barnes wrote:
>>>> Can you just post them externally tointel-gfx@lists.freedesktop.org?
>>>> It's best to use git send-email to do it, that way the changelogs are
>>>> preserved and posted to the ml along with the patches.
>>> Public intel-gfx is already on the cc list, just in case you get the
>>> urge to spill some secrets ;-)
>>>> Not sure if there's a bunch of duplication between the two, but you
>>>> could split them up a bit.
>>>>
>>>> I still don't like the idea of silently adding the display offset on
>>>> vlv; these are just debug tools and the developer should get the
>>>> absolute offset they asked for no matter what.
>>> On that topic of silently adding display offset - with Ville's
>>> kernel work we'll have switched away completely from such tricks in
>>> the kernel. So I think i-g-t shouldn't automatically add the offset.
>>>
>>> Which essentially just leaves us with intel_reg_dumper. Now for that
>>> I'm somewhat hopefully that we will be able to (eventually) dump
>>> registers using the bspec xml sources (there should be bspec xmls
>>> around for just the open-source approved parts). In the meantime,
>>> can't we just adjust the relevant offsets of the register blocks?
>>> IIrc their all somewhat usefully grouped together, so this would
>>> amount to adding a quick function to add the offset to a given table
>>> (put keep all the names) and then feed the adjusted table to the
>>> dumper functions ...
> The big downside of using the bspec stuff is it'll be a huge rename
> effort for us, and will likely get renamed and changed in the bspec
> over time, breaking things.
Which is why autogenerating headers makes imo no sense. But register 
dumping and decoding for debug purposes is a different thing and I'm 
hopeful that using bspec xmls cut allow us to cut down a lot of boring 
work in that area ...

>
>> As we discussed in private, even if we get to the point of having bspec
>> xml, we would still want a tool similar to the one that was proposed for
>> parsing the XML (as opposed to the text). Reg dumper as has been
>> mentioned in several threads is pretty inflexible, and a pain to modify
>> for person use.
>>
>> As we also discussed in private, I'd like Jesse to either fight or not
>> for this because I don't think he has to butt heads with you enough.
> For reg_dumper I'd prefer something like Ben's work, which just takes
> text files describing what's being dumped, so we can better handle
> dumping subsets of regs and have different files for different
> platforms.
Essentially I'm only against the magic register offset adjustment, since 
that doesn't work due to some aliased registers. I'm happy with any of 
the other ideas tossed around here. If we come up with different tools, 
maybe adding a wrapper script to pick the right one (e.g. binary 
reg_dumper for older platfroms, textfile-driven dumper for newer 
platforms) would be nice. Otherwise we'll inevitably have a few 
unnecessary round-trips in bug reports.

Cheers, Daniel
Intel Semiconductor AG
Registered No. 020.30.913.786-7
Registered Office: World Trade Center, Leutschenbachstrasse 95, 8050 Zurich, Switzerland

This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: intel-gpu-tools patches for read/write MMIO
  2013-01-30 17:25                     ` Daniel Vetter
@ 2013-01-30 17:30                       ` Jesse Barnes
  2013-01-30 17:52                         ` Ben Widawsky
  0 siblings, 1 reply; 13+ messages in thread
From: Jesse Barnes @ 2013-01-30 17:30 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Ben Widawsky, Cheah, Vincent Beng Keat, Ung, Teng En,
	Teres Alexis, Alan Previn, intel-gfx@lists.freedesktop.org,
	Widawsky, Benjamin

On Wed, 30 Jan 2013 18:25:39 +0100
Daniel Vetter <daniel.vetter@intel.com> wrote:

> On 30/01/2013 18:13, Jesse Barnes wrote:
> > On Tue, 29 Jan 2013 17:12:53 -0800
> > Ben Widawsky <ben@bwidawsk.net> wrote:
> >
> >> On Tue, Jan 29, 2013 at 09:15:22PM +0100, Daniel Vetter wrote:
> >>> On 29/01/2013 21:01, Jesse Barnes wrote:
> >>>> Can you just post them externally tointel-gfx@lists.freedesktop.org?
> >>>> It's best to use git send-email to do it, that way the changelogs are
> >>>> preserved and posted to the ml along with the patches.
> >>> Public intel-gfx is already on the cc list, just in case you get the
> >>> urge to spill some secrets ;-)
> >>>> Not sure if there's a bunch of duplication between the two, but you
> >>>> could split them up a bit.
> >>>>
> >>>> I still don't like the idea of silently adding the display offset on
> >>>> vlv; these are just debug tools and the developer should get the
> >>>> absolute offset they asked for no matter what.
> >>> On that topic of silently adding display offset - with Ville's
> >>> kernel work we'll have switched away completely from such tricks in
> >>> the kernel. So I think i-g-t shouldn't automatically add the offset.
> >>>
> >>> Which essentially just leaves us with intel_reg_dumper. Now for that
> >>> I'm somewhat hopefully that we will be able to (eventually) dump
> >>> registers using the bspec xml sources (there should be bspec xmls
> >>> around for just the open-source approved parts). In the meantime,
> >>> can't we just adjust the relevant offsets of the register blocks?
> >>> IIrc their all somewhat usefully grouped together, so this would
> >>> amount to adding a quick function to add the offset to a given table
> >>> (put keep all the names) and then feed the adjusted table to the
> >>> dumper functions ...
> > The big downside of using the bspec stuff is it'll be a huge rename
> > effort for us, and will likely get renamed and changed in the bspec
> > over time, breaking things.
> Which is why autogenerating headers makes imo no sense. But register 
> dumping and decoding for debug purposes is a different thing and I'm 
> hopeful that using bspec xmls cut allow us to cut down a lot of boring 
> work in that area ...

Ah ok I didn't catch that distinction.  I think I agree, though we'll
be stuck with mapping the bspec regs back to the other names we're
familiar with too.  But it's definitely easier to deal with going
forward.

> 
> >
> >> As we discussed in private, even if we get to the point of having bspec
> >> xml, we would still want a tool similar to the one that was proposed for
> >> parsing the XML (as opposed to the text). Reg dumper as has been
> >> mentioned in several threads is pretty inflexible, and a pain to modify
> >> for person use.
> >>
> >> As we also discussed in private, I'd like Jesse to either fight or not
> >> for this because I don't think he has to butt heads with you enough.
> > For reg_dumper I'd prefer something like Ben's work, which just takes
> > text files describing what's being dumped, so we can better handle
> > dumping subsets of regs and have different files for different
> > platforms.
> Essentially I'm only against the magic register offset adjustment, since 
> that doesn't work due to some aliased registers. I'm happy with any of 
> the other ideas tossed around here. If we come up with different tools, 
> maybe adding a wrapper script to pick the right one (e.g. binary 
> reg_dumper for older platfroms, textfile-driven dumper for newer 
> platforms) would be nice. Otherwise we'll inevitably have a few 
> unnecessary round-trips in bug reports.

Ok so let's get Ben to push his stuff.  If we get bspec xml bits we can
extend his tool to use them as well.

Jesse

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: intel-gpu-tools patches for read/write MMIO
  2013-01-30 17:30                       ` Jesse Barnes
@ 2013-01-30 17:52                         ` Ben Widawsky
  2013-01-31  2:40                           ` Cheah, Vincent Beng Keat
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Widawsky @ 2013-01-30 17:52 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Ben Widawsky, Cheah, Vincent Beng Keat, Ung, Teng En,
	Teres Alexis, Alan Previn, intel-gfx@lists.freedesktop.org,
	Daniel Vetter

On Wed, Jan 30, 2013 at 9:30 AM, Jesse Barnes <jesse.barnes@intel.com> wrote:
> On Wed, 30 Jan 2013 18:25:39 +0100
> Daniel Vetter <daniel.vetter@intel.com> wrote:
>
>> On 30/01/2013 18:13, Jesse Barnes wrote:
>> > On Tue, 29 Jan 2013 17:12:53 -0800
>> > Ben Widawsky <ben@bwidawsk.net> wrote:
>> >
>> >> On Tue, Jan 29, 2013 at 09:15:22PM +0100, Daniel Vetter wrote:
>> >>> On 29/01/2013 21:01, Jesse Barnes wrote:
>> >>>> Can you just post them externally tointel-gfx@lists.freedesktop.org?
>> >>>> It's best to use git send-email to do it, that way the changelogs are
>> >>>> preserved and posted to the ml along with the patches.
>> >>> Public intel-gfx is already on the cc list, just in case you get the
>> >>> urge to spill some secrets ;-)
>> >>>> Not sure if there's a bunch of duplication between the two, but you
>> >>>> could split them up a bit.
>> >>>>
>> >>>> I still don't like the idea of silently adding the display offset on
>> >>>> vlv; these are just debug tools and the developer should get the
>> >>>> absolute offset they asked for no matter what.
>> >>> On that topic of silently adding display offset - with Ville's
>> >>> kernel work we'll have switched away completely from such tricks in
>> >>> the kernel. So I think i-g-t shouldn't automatically add the offset.
>> >>>
>> >>> Which essentially just leaves us with intel_reg_dumper. Now for that
>> >>> I'm somewhat hopefully that we will be able to (eventually) dump
>> >>> registers using the bspec xml sources (there should be bspec xmls
>> >>> around for just the open-source approved parts). In the meantime,
>> >>> can't we just adjust the relevant offsets of the register blocks?
>> >>> IIrc their all somewhat usefully grouped together, so this would
>> >>> amount to adding a quick function to add the offset to a given table
>> >>> (put keep all the names) and then feed the adjusted table to the
>> >>> dumper functions ...
>> > The big downside of using the bspec stuff is it'll be a huge rename
>> > effort for us, and will likely get renamed and changed in the bspec
>> > over time, breaking things.
>> Which is why autogenerating headers makes imo no sense. But register
>> dumping and decoding for debug purposes is a different thing and I'm
>> hopeful that using bspec xmls cut allow us to cut down a lot of boring
>> work in that area ...
>
> Ah ok I didn't catch that distinction.  I think I agree, though we'll
> be stuck with mapping the bspec regs back to the other names we're
> familiar with too.  But it's definitely easier to deal with going
> forward.
>
>>
>> >
>> >> As we discussed in private, even if we get to the point of having bspec
>> >> xml, we would still want a tool similar to the one that was proposed for
>> >> parsing the XML (as opposed to the text). Reg dumper as has been
>> >> mentioned in several threads is pretty inflexible, and a pain to modify
>> >> for person use.
>> >>
>> >> As we also discussed in private, I'd like Jesse to either fight or not
>> >> for this because I don't think he has to butt heads with you enough.
>> > For reg_dumper I'd prefer something like Ben's work, which just takes
>> > text files describing what's being dumped, so we can better handle
>> > dumping subsets of regs and have different files for different
>> > platforms.
>> Essentially I'm only against the magic register offset adjustment, since
>> that doesn't work due to some aliased registers. I'm happy with any of
>> the other ideas tossed around here. If we come up with different tools,
>> maybe adding a wrapper script to pick the right one (e.g. binary
>> reg_dumper for older platfroms, textfile-driven dumper for newer
>> platforms) would be nice. Otherwise we'll inevitably have a few
>> unnecessary round-trips in bug reports.
>
> Ok so let's get Ben to push his stuff.  If we get bspec xml bits we can
> extend his tool to use them as well.
>
> Jesse

As I've said previous, having some kind of tool like this makes it
easier for others to maintain their own, potentially confidential
register sets/bit decoding.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: intel-gpu-tools patches for read/write MMIO
  2013-01-30 17:52                         ` Ben Widawsky
@ 2013-01-31  2:40                           ` Cheah, Vincent Beng Keat
  2013-01-31  3:27                             ` Ben Widawsky
  0 siblings, 1 reply; 13+ messages in thread
From: Cheah, Vincent Beng Keat @ 2013-01-31  2:40 UTC (permalink / raw)
  To: Widawsky, Benjamin, Barnes, Jesse
  Cc: Vetter, Daniel, Ben Widawsky, intel-gfx@lists.freedesktop.org,
	Teres Alexis, Alan Previn, Ung, Teng En

Hey Ben, 

For your quick dump mechanism yours, there are also some consideration that you might have left out - overlapping IRQ and some other registers that have the same offset for both render and display (since base_interrupt, base_power doesn’t include other VLV IRQ/Power register sets not part of current base tables. On top of that there are also some registers that are still missing in accordance bspecs. Are you going to add that in upon pushing it to the master branch? If not, then I have volunteered to help you get all the registers ready setup for you. 


I don’t think there is any other patches required on our side. Oh yes before I forget, there is some BAR memory mapping fixes for VLV in order for it to work - intel_gtt.c
       
         for (f = 0; flag[f] != 0; f++) {
                       if (IS_GEN3(devid)) {
                               /* 915/945 chips has GTT range in bar 3 */
                               if (pci_device_map_range(pci_dev,
                                                        pci_dev->regions[3].base_addr,
                                                        pci_dev->regions[3].size,
                                                        flag[f],
                                                        (void **)&gtt) == 0)
                                       break;
                       } else {
                               int offset;
                               if (IS_G4X(devid) || IS_GEN5(devid) || IS_VALLEYVIEW(devid))
                                       offset = MB(2);
                               else
                                       offset = KB(512);
                               if (pci_device_map_range(pci_dev,
                                                        pci_dev->regions[0].base_addr + offset,
                                                        offset,
                                                        flag[f],
                                                        (void **)&gtt) == 0)
                                       break;
                       }
               }


In between, I'm also in the process of creating a personal repo to keep our own internal bad version of intel_reg_read / write as for now so we can carry on with our internal testing and providing a quick solution temporary for our customer to read/write MMIO regs. I have a problem, creating an account in people.freedesktop.org. The link below simply does not provide me much information. By the way, I did apply membership in Xorg foundation based to some other link where I came across, but I don't seems to be able help neither. Am I missing anything? 

(http://www.freedesktop.org/wiki/Infrastructure/git/RepositoryAdmin) 

Thanks.

...vincent

-----Original Message-----
From: Ben Widawsky [mailto:benjamin.widawsky@intel.com] 
Sent: Thursday, January 31, 2013 1:53 AM
To: Barnes, Jesse
Cc: Vetter, Daniel; Ben Widawsky; intel-gfx@lists.freedesktop.org; Cheah, Vincent Beng Keat; Ung, Teng En; Teres Alexis, Alan Previn
Subject: Re: [Intel-gfx] intel-gpu-tools patches for read/write MMIO

On Wed, Jan 30, 2013 at 9:30 AM, Jesse Barnes <jesse.barnes@intel.com> wrote:
> On Wed, 30 Jan 2013 18:25:39 +0100
> Daniel Vetter <daniel.vetter@intel.com> wrote:
>
>> On 30/01/2013 18:13, Jesse Barnes wrote:
>> > On Tue, 29 Jan 2013 17:12:53 -0800
>> > Ben Widawsky <ben@bwidawsk.net> wrote:
>> >
>> >> On Tue, Jan 29, 2013 at 09:15:22PM +0100, Daniel Vetter wrote:
>> >>> On 29/01/2013 21:01, Jesse Barnes wrote:
>> >>>> Can you just post them externally tointel-gfx@lists.freedesktop.org?
>> >>>> It's best to use git send-email to do it, that way the 
>> >>>> changelogs are preserved and posted to the ml along with the patches.
>> >>> Public intel-gfx is already on the cc list, just in case you get 
>> >>> the urge to spill some secrets ;-)
>> >>>> Not sure if there's a bunch of duplication between the two, but 
>> >>>> you could split them up a bit.
>> >>>>
>> >>>> I still don't like the idea of silently adding the display 
>> >>>> offset on vlv; these are just debug tools and the developer 
>> >>>> should get the absolute offset they asked for no matter what.
>> >>> On that topic of silently adding display offset - with Ville's 
>> >>> kernel work we'll have switched away completely from such tricks 
>> >>> in the kernel. So I think i-g-t shouldn't automatically add the offset.
>> >>>
>> >>> Which essentially just leaves us with intel_reg_dumper. Now for 
>> >>> that I'm somewhat hopefully that we will be able to (eventually) 
>> >>> dump registers using the bspec xml sources (there should be bspec 
>> >>> xmls around for just the open-source approved parts). In the 
>> >>> meantime, can't we just adjust the relevant offsets of the register blocks?
>> >>> IIrc their all somewhat usefully grouped together, so this would 
>> >>> amount to adding a quick function to add the offset to a given 
>> >>> table (put keep all the names) and then feed the adjusted table 
>> >>> to the dumper functions ...
>> > The big downside of using the bspec stuff is it'll be a huge rename 
>> > effort for us, and will likely get renamed and changed in the bspec 
>> > over time, breaking things.
>> Which is why autogenerating headers makes imo no sense. But register 
>> dumping and decoding for debug purposes is a different thing and I'm 
>> hopeful that using bspec xmls cut allow us to cut down a lot of 
>> boring work in that area ...
>
> Ah ok I didn't catch that distinction.  I think I agree, though we'll 
> be stuck with mapping the bspec regs back to the other names we're 
> familiar with too.  But it's definitely easier to deal with going 
> forward.
>
>>
>> >
>> >> As we discussed in private, even if we get to the point of having 
>> >> bspec xml, we would still want a tool similar to the one that was 
>> >> proposed for parsing the XML (as opposed to the text). Reg dumper 
>> >> as has been mentioned in several threads is pretty inflexible, and 
>> >> a pain to modify for person use.
>> >>
>> >> As we also discussed in private, I'd like Jesse to either fight or 
>> >> not for this because I don't think he has to butt heads with you enough.
>> > For reg_dumper I'd prefer something like Ben's work, which just 
>> > takes text files describing what's being dumped, so we can better 
>> > handle dumping subsets of regs and have different files for 
>> > different platforms.
>> Essentially I'm only against the magic register offset adjustment, 
>> since that doesn't work due to some aliased registers. I'm happy with 
>> any of the other ideas tossed around here. If we come up with 
>> different tools, maybe adding a wrapper script to pick the right one 
>> (e.g. binary reg_dumper for older platfroms, textfile-driven dumper 
>> for newer
>> platforms) would be nice. Otherwise we'll inevitably have a few 
>> unnecessary round-trips in bug reports.
>
> Ok so let's get Ben to push his stuff.  If we get bspec xml bits we 
> can extend his tool to use them as well.
>
> Jesse

As I've said previous, having some kind of tool like this makes it easier for others to maintain their own, potentially confidential register sets/bit decoding.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: intel-gpu-tools patches for read/write MMIO
  2013-01-31  2:40                           ` Cheah, Vincent Beng Keat
@ 2013-01-31  3:27                             ` Ben Widawsky
  0 siblings, 0 replies; 13+ messages in thread
From: Ben Widawsky @ 2013-01-31  3:27 UTC (permalink / raw)
  To: Cheah, Vincent Beng Keat
  Cc: Ung, Teng En, Teres Alexis, Alan Previn,
	intel-gfx@lists.freedesktop.org, Barnes, Jesse,
	Widawsky, Benjamin, Vetter, Daniel

First of all, I managed to go and get quite sick, so please bare with
any slowness.

On Thu, Jan 31, 2013 at 02:40:20AM +0000, Cheah, Vincent Beng Keat wrote:
> Hey Ben, 
> 
> For your quick dump mechanism yours, there are also some consideration that you might have left out - overlapping IRQ and some other registers that have the same offset for both render and display (since base_interrupt, base_power doesn’t include other VLV IRQ/Power register sets not part of current base tables. On top of that there are also some registers that are still missing in accordance bspecs. Are you going to add that in upon pushing it to the master branch? If not, then I have volunteered to help you get all the registers ready setup for you. 
> 
> 

The work I did thus far was very much just hacking something to work to
submit RFC patches and enable us for power-on. Depending on my health, I
plan to rebase what was there, maybe add the appropriate automake
cleanups (if any are needed) and get it upstream. I also realize Jesse
added the DPIO support which I may need to cleanup or review or
whatever.

I don't understand your concern, so maybe ask me again once the current
stuff is upstreamed?

As a side note, I never pushed Jesse's additions, which I've just done.
http://cgit.freedesktop.org/~bwidawsk/intel-gpu-tools/log/?h=dump_util2


> I don’t think there is any other patches required on our side. Oh yes before I forget, there is some BAR memory mapping fixes for VLV in order for it to work - intel_gtt.c
>        
>          for (f = 0; flag[f] != 0; f++) {
>                        if (IS_GEN3(devid)) {
>                                /* 915/945 chips has GTT range in bar 3 */
>                                if (pci_device_map_range(pci_dev,
>                                                         pci_dev->regions[3].base_addr,
>                                                         pci_dev->regions[3].size,
>                                                         flag[f],
>                                                         (void **)&gtt) == 0)
>                                        break;
>                        } else {
>                                int offset;
>                                if (IS_G4X(devid) || IS_GEN5(devid) || IS_VALLEYVIEW(devid))
>                                        offset = MB(2);
>                                else
>                                        offset = KB(512);
>                                if (pci_device_map_range(pci_dev,
>                                                         pci_dev->regions[0].base_addr + offset,
>                                                         offset,
>                                                         flag[f],
>                                                         (void **)&gtt) == 0)
>                                        break;
>                        }
>                }
> 
> 

Can you please send this again as a unified diff (also a new email
thread would be appropriate).

> In between, I'm also in the process of creating a personal repo to keep our own internal bad version of intel_reg_read / write as for now so we can carry on with our internal testing and providing a quick solution temporary for our customer to read/write MMIO regs. I have a problem, creating an account in people.freedesktop.org. The link below simply does not provide me much information. By the way, I did apply membership in Xorg foundation based to some other link where I came across, but I don't seems to be able help neither. Am I missing anything? 
> 
> (http://www.freedesktop.org/wiki/Infrastructure/git/RepositoryAdmin) 
> 
> Thanks.
> 
> ...vincent

The admins are slow. They're all part-time or less. I'd recommending
just using the internal git services for this kind of stuff.

[snip]

-- 
Ben Widawsky, Intel Open Source Technology Center
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-01-31  3:25 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <D4B999590D2ACC499FE6A1F7CF2D454AAA23F8@PGSMSX103.gar.corp.intel.com>
     [not found] ` <20130114073735.60c1f0c7@jbarnes-desktop>
     [not found]   ` <67A6A5BE6078AA49887BBA3935A429881D60A9@PGSMSX103.gar.corp.intel.com>
     [not found]     ` <CALNAZXqD3WLKa_L_z4RffCuTX0J8C28x2b1AjaFWwF=U29fwZA@mail.gmail.com>
     [not found]       ` <67A6A5BE6078AA49887BBA3935A429881D636B@PGSMSX103.gar.corp.intel.com>
     [not found]         ` <CALNAZXoC-Ss_2uxV+3Fc=SoStn+t_9pkBAdfMZ-ReVLHHGvu3g@mail.gmail.com>
2013-01-29  8:16           ` intel-gpu-tools patches for read/write MMIO Cheah, Vincent Beng Keat
2013-01-29 20:01             ` Jesse Barnes
2013-01-29 20:15               ` Daniel Vetter
2013-01-30  1:12                 ` Ben Widawsky
2013-01-30  1:39                   ` Teres Alexis, Alan Previn
2013-01-30  3:27                     ` Teres Alexis, Alan Previn
2013-01-30  3:39                       ` Ben Widawsky
2013-01-30 17:13                   ` Jesse Barnes
2013-01-30 17:25                     ` Daniel Vetter
2013-01-30 17:30                       ` Jesse Barnes
2013-01-30 17:52                         ` Ben Widawsky
2013-01-31  2:40                           ` Cheah, Vincent Beng Keat
2013-01-31  3:27                             ` Ben Widawsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.