From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mario Kleiner Subject: Re: Funky new vblank counter regressions in Linux 4.4-rc1 Date: Mon, 23 Nov 2015 17:02:11 +0100 Message-ID: <56533883.8020706@gmail.com> References: <564E0AF4.3050406@gmail.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------060507040107020704090403" Return-path: Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTP id A769C6E65F for ; Mon, 30 Nov 2015 12:13:21 -0800 (PST) Resent-Message-ID: <20151130201259.GX4437@intel.com> Resent-To: dri-devel@lists.freedesktop.org In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: Alex Deucher , Harry Wentland Cc: =?UTF-8?Q?Michel_D=c3=a4nzer?= List-Id: dri-devel@lists.freedesktop.org This is a multi-part message in MIME format. --------------060507040107020704090403 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable On 11/20/2015 04:42 AM, Alex Deucher wrote: > On Thu, Nov 19, 2015 at 12:46 PM, Mario Kleiner > wrote: >> Hi Alex and Michel and Ville, >> >> it's "fix vblank stuff" time again ;-) > > Adding Harry from our display team. He might be able to fill in the > blanks of on some of this better than I can. It might also be worth > checking to see how our DAL (our new display code which is being > developed directly by our display team) code handles this. It could > be that we are just missing register settings: Thanks Alex! And hello Harry :) > http://cgit.freedesktop.org/~agd5f/linux/log/?h=3DDAL-wip I'll have a look at this. > Additionally we've published full registers headers for the display blo= ck: > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/dri= vers/gpu/drm/amd/include/asic_reg/dce > The DCE8 stuff should generally apply back to DCE4. If you have > questions about registers older asics not covered in the hw docs, let > me know. Note the new headers are dword aligned rather than byte > aligned. > I've tested now with two different progressive modes on DCE3 and one=20 progressive mode on DCE4, the only cards i have atm. So far it seems=20 that the framecounter indeed increments when the vpos from the scanout=20 position query jumps back to zero. Attached for reference is my current=20 patch to radeon-kms. This one seems to work reliably so far, also if i=20 enable the immediate vblank irq disable which we so far only used on=20 intel-kms. But according to this patch the framecounter increment happens somewhere=20 in the middle of vblank. Essentially from the vpos extracted from EVERGREEN_CRTC_STATUS_POSITION=20 which defines start of vblank ("Start" part of=20 EVERGREEN_CRTC_V_BLANK_START_END) until maximum ie. VTOTAL-1 the=20 framecounter stays on the count of the old previous scanout cycle. Then=20 when vpos wraps to zero the framecounter increments by 1. And then we=20 have another couple of dozen lines inside vblank until reaching the=20 "End" part of EVERGREEN_CRTC_V_BLANK_START_END and entering active=20 scanout for the new frame. So the position of observed framecounter increment seems to be not close=20 to the end of vblank ("start of frame" indicator?), but a couple of=20 scanlines after start of vblank. E.g., for a 2560x1440 video mode at 60 Hz, start of vblank is 1478,=20 vtotal is 1481, end of vblank is 38. So i enter the vblank and see the=20 old framecounter for vpos =3D 1478, 1479, 1480, then it wraps to 0 and th= e=20 framecounter increments by 1, then 38 scanlines later the vblank ends. So i seem to have something that seems to work in practice and this=20 "increment framecounter if vpos wraps back to zero" behavior makes some=20 sense. It just doesn't conform to what those descriptions for start_line=20 and "start of frame" indicator describe? I'll test with a few more video modes. thanks, -mario >> >> Ville's changes to the DRM's drm_handle_vblank() / drm_update_vblank_c= ount() >> code in Linux 4.4 not only made that code more elegant, but also remov= ed the >> robustness against the vblank irq quirks in AMD hw and similar hardwar= e. So >> now i get tons of off-by-one errors and >> >> "[ 432.345] (WW) RADEON(1): radeon_dri2_flip_event_handler: Pageflip >> completion event has impossible msc 24803 < target_msc 24804" XOrg mes= sages >> from that kernel. >> >> One of the reasons for trouble is that AMD hw quirk where the hw fires= an >> extra vblank irq shortly after vblank irq's get enabled, not synchroni= zed to >> vblank, but typically in the middle of active scanout, so we get a red= undant >> call to drm_handle_vblank in the middle of scanout. >> >> To fix that i have a minor patch to make drm_update_vblank_count() aga= in >> robust against such redundant calls, which i will send out later to th= e >> mailing list. Diff attached for reference. >> >> The second quirk of AMD hw is that the vblank interrupt fires a few >> scanlines before start of vblank, so drm_handle_vblank -> >> drm_update_vblank_count() -> dev->driver->get_vblank_counter() gets ca= lled >> before the start of the vblank for which the new vblank count should b= e >> queried. >> >> The third problem is that the DRM vblank handling always had the assum= ption >> that hardware vblank counters would essentially increment at leading e= dge of >> vblank - basically in sync with the firing of the vblank irq, so that = a hw >> counter readout from within the vblank irq handler would always delive= r the >> new incremented value. If this assumption is violated then the countin= g by >> use of the hw counter gets unreliable, because depending on random sma= ll >> delays in irq handling the code may end up sampling the hw counter pre= - or >> post-increment, leading to inconsistent updating and funky bugs. It ju= st so >> happens that AMD hardware doesn't increment the hw counter at leading = edge >> of vblank, so stuff falls apart. >> >> So to fix those two problems i'm tinkering with cooking the hw vblank >> counter value returned by radeon_get_vblank_counter_kms() to make it a= ppear >> as if the counter incremented at leading edge of vblank in sync with v= blank >> irq. >> >> It almost sort of works on the rs600 code path, but i need a bit of in= fo >> from you: >> >> 1. There's this register from the old specs for m76.pdf, which is not = part >> of the current register defines for radeon-kms: >> >> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]" >> >> It contains the lower 16 bits of framecounter and the 13 bits of verti= cal >> scanout position. It seems to give the same readings as the 24 bit >> R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This wou= ld >> come handy. >> >> Does Evergreen and later have a same/similar register and where is it? > > Yes. CRTC_STATUS_VF_COUNT > > CRTC:CRTC_STATUS_VF_COUNT =C2=B7 [R/W] =C2=B7 32 bits =C2=B7 Acce= ss: 8/16/32 =C2=B7 > [INST0] GpuF0MMReg:0x6e9c, [INST1] GpuF0MMReg:0x7a9c, [INST2] > GpuF0MMReg:0x1069c, [INST3] GpuF0MMReg:0x1129c, [INST4] > GpuF0MMReg:0x11e9c, [INST5] GpuF0MMReg:0x12a9c > DESCRIPTION: Current composite vertical and frame count for CRTC > Field Name Bits Default Description > CRTC_VF_COUNT > (Access: R) 28:0 0x0 Reports current vertical and frame count > >> >> 2. The hw framecounter seems to increment when the vertical scanout po= sition >> wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3 gpu i tes= ted so >> far. Is this so on all asics? And is the hw counter increment happenin= g >> exactly at the moment that vertical scanout position jumps back to zer= o, ie. >> both events are driven by the same signal? Or is the framecounter incr= ement >> just happening somewhere inside either scanline VTOTAL-1 or scanline 0= ? > > Unless Harry knows, we can ask the hw team, but I doubt they would > have changed it. I think it's tied to the start line which is > controlled by this register: > CRTC:CRTC_START_LINE_CONTROL =C2=B7 [R/W] =C2=B7 32 bits =C2=B7 A= ccess: 8/16/32 > =C2=B7 [INST0] GpuF0MMReg:0x6ecc, [INST1] GpuF0MMReg:0x7acc, [INST2] > GpuF0MMReg:0x106cc, [INST3] GpuF0MMReg:0x112cc, [INST4] > GpuF0MMReg:0x11ecc, [INST5] GpuF0MMReg:0x12acc > DESCRIPTION: move start_line signal earlier by 1 line in CRTC > Field Name Bits Default Description > CRTC_PROGRESSIVE_START_LINE_EARLY 0 0x0 move start_line > signal by 1 line eariler in progressive mode > > CRTC_INTERLACE_START_LINE_EARLY 8 0x1 move start_line > signal by 1 line earlier in interlaced timing mode > > CRTC_ADVANCED_START_LINE_POSITION 19:16 0x4 Advanced > Start Line position respect to not VBLANK. Default of 4 means the > Advanced Start Line is 4 lines before the first non VBLANK line > > The following info I dug up internally may be useful: > > The position of the CRTC output timing generator when the =E2=80=9Cstar= t of > frame=E2=80=9D indicator occurs depends on several conditions. These > conditions are whether the timing generator is outputting timing for a > progressive or interlaced display mode, whether the scaler is enabled, > and if so, whether the register to select an earlier than normal > =E2=80=9Cstart of frame=E2=80=9D indicator is set. The =E2=80=9Cstart = of frame=E2=80=9D indicator > typically occurs 2 lines before the vertical blank end position > (DxCRTC_V_BLANK_END) is reached > > When interlaced output display modes are enabled > (DxCRTC_INTERLACE_ENABLE =3D 1) and the CRTC timing generator is enable= d > (DxCRTC_MASTER_EN =3D 1), the timing generator=E2=80=99s vertical count= er counts > by 2 for both the even fields and odd fields of each frame. For both > the even fields and the odd fields of interlaced output modes, the > =E2=80=9Cstart of frame=E2=80=9D indicator occurs 2 lines before the en= d of the > vertical blank occurs (DxCRTC_V_BLANK_END =E2=80=93 4 or DxCRTC_V_BLANK= _END =E2=80=93 > 3 depending on the field since the vertical counter counts by 2 in > interlaced), since the vertical counter counts by 2 in this mode). > There is one exception to the previous statement. If scaling is > enabled (DxSCL_SCALE_EN =3D 1) and the early start line is enabled > (DxCRTC_INTERLACE_START_LINE_EARLY =3D 1) in interlaced output mode the= n > the =E2=80=9Cstart of frame=E2=80=9D indicator happens 3 lines before t= he end of the > vertical blank (DxCRTC_V_BLANK_END =E2=80=93 6 or DxCRTC_V_BLANK_END =E2= =80=93 5 > depending on the field since the vertical counter counts by 2 in > interlaced). > > When progressive output display modes are enabled > (DxCRTC_INTERLACE_ENABLE =3D 0) and the CRTC timing generator is enable= d > (DxCRTC_MASTER_EN =3D 1), the =E2=80=9Cstart of frame=E2=80=9D indicato= r occurs 2 lines > before the end of the vertical blank occurs (DxCRTC_V_BLANK_END - 2). > Similar to interlaced output mode, there is one exception to the > previous statement. If scaling is enabled (DxSCL_SCALE_EN =3D 1) and > the early start line is enabled (DxCRTC_PROGRESSIVE_START_LINE_EARLY =3D > 1) in progressive output mode then the =E2=80=9Cstart of frame=E2=80=9D= indicator > happens 3 lines before the end of the vertical blank > (DxCRTC_V_BLANK_END =E2=80=93 3) > > I hope this helps, > > Alex > >> >> >> If we can fix this and get it into rc2 or rc3 then we could avoid a ba= d >> regression and with a bit of luck at the same time improve by being ab= le to >> set dev->vblank_disable_immediate =3D true then and allow vblank irqs = to get >> turned off more aggressively for a bit of extra power saving. >> >> thanks, >> -mario --------------060507040107020704090403 Content-Type: text/x-patch; name="radeonframecounter.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="radeonframecounter.patch" diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c index 7f33767..4357f8f 100644 --- a/drivers/gpu/drm/radeon/evergreen.c +++ b/drivers/gpu/drm/radeon/evergreen.c @@ -4377,6 +4377,10 @@ int evergreen_rlc_resume(struct radeon_device *rdev) u32 evergreen_get_vblank_counter(struct radeon_device *rdev, int crtc) { + u32 adl = RREG32(0x6ecc + crtc_offsets[crtc]); + u32 fvr = RREG32(CRTC_STATUS_FRAME_COUNT + 4 + crtc_offsets[crtc]); + DRM_DEBUG_VBL("crtc %d: adl0: %d adl8: %d adls: %d Fd %d Vd %d\n", crtc, adl & (1 << 0), adl & (1 << 8), (adl >> 16) & 0xf, fvr & 0xffff, fvr >> 16); + if (crtc >= rdev->num_crtc) return 0; else diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index a8d9927..76b3449 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -1910,11 +1910,22 @@ int radeon_get_crtc_scanoutpos(struct drm_device *dev, unsigned int pipe, vbl_start = mode->crtc_vdisplay; vbl_end = 0; } - + DRM_DEBUG_VBL("crtc %d: vpos %d : vs %d , ve %d, vs2 %d, vt %d\n", pipe, *vpos, vbl_start, vbl_end, mode->crtc_vdisplay, mode->crtc_vtotal); /* Test scanout position against vblank region. */ if ((*vpos < vbl_start) && (*vpos >= vbl_end)) in_vbl = false; + /* In vblank? */ + if (in_vbl) + ret |= DRM_SCANOUTPOS_IN_VBLANK; + + /* Called from driver internal vblank counter query code? */ + if (flags & 0x80000000) { + /* Caller wants distance from vbl_start */ + *vpos -= vbl_start; + return ret; + } + /* Check if inside vblank area and apply corrective offsets: * vpos will then be >=0 in video scanout area, but negative * within vblank area, counting down the number of lines until @@ -1930,10 +1941,6 @@ int radeon_get_crtc_scanoutpos(struct drm_device *dev, unsigned int pipe, /* Correct for shifted end of vbl at vbl_end. */ *vpos = *vpos - vbl_end; - /* In vblank? */ - if (in_vbl) - ret |= DRM_SCANOUTPOS_IN_VBLANK; - /* Is vpos outside nominal vblank area, but less than * 1/100 of a frame height away from start of vblank? * If so, assume this isn't a massively delayed vblank diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c index 0ec6fcc..885f934 100644 --- a/drivers/gpu/drm/radeon/radeon_kms.c +++ b/drivers/gpu/drm/radeon/radeon_kms.c @@ -755,6 +755,8 @@ void radeon_driver_preclose_kms(struct drm_device *dev, */ u32 radeon_get_vblank_counter_kms(struct drm_device *dev, int crtc) { + int vpos, hpos, stat, bump_line; + u32 count; struct radeon_device *rdev = dev->dev_private; if (crtc < 0 || crtc >= rdev->num_crtc) { @@ -762,7 +764,57 @@ u32 radeon_get_vblank_counter_kms(struct drm_device *dev, int crtc) return -EINVAL; } - return radeon_get_vblank_counter(rdev, crtc); + /* If called from hw vblank irq then we need hw quirk treatment. + * The hw fires its vblank irq typically 2-3 scanlines before + * the vblank, triggering sampling of our hw counter by DRM + * core before it increments at start of vblank, which is bad + * for vblank counting and timestamping, causing frequent + * off-by-one errors. Try to cook the hw count here to make it + * appear to the caller as if it was sampled after it incremented. + */ + if (rdev->mode_info.crtcs[crtc]) { + do { + count = radeon_get_vblank_counter(rdev, crtc); + /* Ask function to return vpos as distance to start of + * vblank, instead of regular vertical scanout pos. + */ + stat = radeon_get_crtc_scanoutpos( + dev, crtc, + 0x80000000, &vpos, &hpos, NULL, NULL, + &rdev->mode_info.crtcs[crtc]->base.hwmode); + } while (count != radeon_get_vblank_counter(rdev, crtc)); + + if (((stat & (DRM_SCANOUTPOS_VALID | DRM_SCANOUTPOS_ACCURATE)) != + (DRM_SCANOUTPOS_VALID | DRM_SCANOUTPOS_ACCURATE))) { + DRM_DEBUG_VBL("Query failed! stat %d\n", stat); + } + else { + /* Bump counter if we are at >= leading edge of vblank, + * but before wraparound to zero where the hw counter + * increments. Bump counter already 10 scanlines before + * vblank if we are called from vblank irq handler, as + * that one fires a few scanlines before vblank but + * should get a count as if it was called inside vblank. + */ + bump_line = in_irq() ? -10 : 0; + if (vpos >= bump_line) { + count++; + DRM_DEBUG_VBL("crtc %d: vpos %d >= bump_line %d -> Bumped.\n", + crtc, vpos, bump_line); + } + else { + DRM_DEBUG_VBL("crtc %d: vpos %d\n", + crtc, vpos); + } + } + } + else { + /* Fallback to use value as is. */ + count = radeon_get_vblank_counter(rdev, crtc); + DRM_DEBUG_VBL("NULL mode info!\n"); + } + + return count; } /** diff --git a/drivers/gpu/drm/radeon/rs600.c b/drivers/gpu/drm/radeon/rs600.c index 97a9048..460ca2f 100644 --- a/drivers/gpu/drm/radeon/rs600.c +++ b/drivers/gpu/drm/radeon/rs600.c @@ -835,6 +835,9 @@ int rs600_irq_process(struct radeon_device *rdev) u32 rs600_get_vblank_counter(struct radeon_device *rdev, int crtc) { + u32 fvr = RREG32((crtc == 0) ? (R_0060A4_D1CRTC_STATUS_FRAME_COUNT + 4) : (R_0068A4_D2CRTC_STATUS_FRAME_COUNT + 4)); + DRM_DEBUG_VBL("crtc %d: Fd%d Vd %d\n", crtc, fvr & 0xffff, fvr >> 16); + if (crtc == 0) return RREG32(R_0060A4_D1CRTC_STATUS_FRAME_COUNT); else --------------060507040107020704090403 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHA6Ly9saXN0 cy5mcmVlZGVza3RvcC5vcmcvbWFpbG1hbi9saXN0aW5mby9kcmktZGV2ZWwK --------------060507040107020704090403--