From mboxrd@z Thu Jan 1 00:00:00 1970 From: Eric Anholt Subject: Re: [PATCH v2 3/3] drm/vc4: Add a load tracker to prevent HVS underflow errors Date: Tue, 30 Oct 2018 16:12:55 -0700 Message-ID: <87bm7bm3k8.fsf@anholt.net> References: <20181025124546.22145-1-boris.brezillon@bootlin.com> <20181025124546.22145-4-boris.brezillon@bootlin.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0078336996==" Return-path: Received: from anholt.net (anholt.net [50.246.234.109]) by gabe.freedesktop.org (Postfix) with ESMTP id ACF576E1EF for ; Tue, 30 Oct 2018 23:12:59 +0000 (UTC) In-Reply-To: <20181025124546.22145-4-boris.brezillon@bootlin.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: Daniel Vetter Cc: Boris Brezillon , dri-devel@lists.freedesktop.org List-Id: dri-devel@lists.freedesktop.org --===============0078336996== Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha512; protocol="application/pgp-signature" --=-=-= Content-Type: text/plain Content-Transfer-Encoding: quoted-printable Boris Brezillon writes: > The HVS block is supposed to fill the pixelvalve FIFOs fast enough to > meet the requested framerate. The problem is, the HVS and memory bus > bandwidths are limited, and if we don't take these limitations into > account we might end up with HVS underflow errors. > > This patch is trying to model the per-plane HVS and memory bus bandwidth > consumption and take a decision at atomic_check() time whether the > estimated load will fit in the HVS and membus budget. > > Note that we take an extra margin on the memory bus consumption to let > the system run smoothly when other blocks are doing heavy use of the > memory bus. Same goes for the HVS limit, except the margin is smaller in > this case, since the HVS is not used by external components. > > Signed-off-by: Boris Brezillon > --- > Changes in v2: > - Remove an unused var in vc4_load_tracker_atomic_check() > --- > drivers/gpu/drm/vc4/vc4_drv.c | 1 + > drivers/gpu/drm/vc4/vc4_drv.h | 11 ++++ > drivers/gpu/drm/vc4/vc4_kms.c | 103 +++++++++++++++++++++++++++++++- > drivers/gpu/drm/vc4/vc4_plane.c | 60 +++++++++++++++++++ > 4 files changed, 174 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/vc4/vc4_drv.c b/drivers/gpu/drm/vc4/vc4_drv.c > index f6f5cd80c04d..7195a0bcceb3 100644 > --- a/drivers/gpu/drm/vc4/vc4_drv.c > +++ b/drivers/gpu/drm/vc4/vc4_drv.c > @@ -313,6 +313,7 @@ static void vc4_drm_unbind(struct device *dev) >=20=20 > drm_mode_config_cleanup(drm); >=20=20 > + drm_atomic_private_obj_fini(&vc4->load_tracker); > drm_atomic_private_obj_fini(&vc4->ctm_manager); >=20=20 > drm_dev_put(drm); > diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h > index a6c67c65ffbc..11369da944b6 100644 > --- a/drivers/gpu/drm/vc4/vc4_drv.h > +++ b/drivers/gpu/drm/vc4/vc4_drv.h > @@ -200,6 +200,7 @@ struct vc4_dev { >=20=20 > struct drm_modeset_lock ctm_state_lock; > struct drm_private_obj ctm_manager; > + struct drm_private_obj load_tracker; > }; >=20=20 > static inline struct vc4_dev * > @@ -369,6 +370,16 @@ struct vc4_plane_state { > * to enable background color fill. > */ > bool needs_bg_fill; > + > + /* Load of this plane on the HVS block. The load is expressed in HVS > + * cycles/sec. > + */ > + u64 hvs_load; > + > + /* Memory bandwidth needed for this plane. This is expressed in > + * bytes/sec. > + */ > + u64 membus_load; > }; >=20=20 > static inline struct vc4_plane_state * > diff --git a/drivers/gpu/drm/vc4/vc4_kms.c b/drivers/gpu/drm/vc4/vc4_kms.c > index 8e0183b1e8bb..b905a19c1470 100644 > --- a/drivers/gpu/drm/vc4/vc4_kms.c > +++ b/drivers/gpu/drm/vc4/vc4_kms.c > @@ -34,6 +34,18 @@ static struct vc4_ctm_state *to_vc4_ctm_state(struct d= rm_private_state *priv) > return container_of(priv, struct vc4_ctm_state, base); > } >=20=20 > +struct vc4_load_tracker_state { > + struct drm_private_state base; > + u64 hvs_load; > + u64 membus_load; > +}; > + > +static struct vc4_load_tracker_state * > +to_vc4_load_tracker_state(struct drm_private_state *priv) > +{ > + return container_of(priv, struct vc4_load_tracker_state, base); > +} > + > static struct vc4_ctm_state *vc4_get_ctm_state(struct drm_atomic_state *= state, > struct drm_private_obj *manager) > { > @@ -383,6 +395,81 @@ vc4_ctm_atomic_check(struct drm_device *dev, struct = drm_atomic_state *state) > return 0; > } >=20=20 > +static int vc4_load_tracker_atomic_check(struct drm_atomic_state *state) > +{ > + struct drm_plane_state *old_plane_state, *new_plane_state; > + struct vc4_dev *vc4 =3D to_vc4_dev(state->dev); > + struct vc4_load_tracker_state *load_state; > + struct drm_private_state *priv_state; > + struct drm_plane *plane; > + int i; > + > + priv_state =3D drm_atomic_get_private_obj_state(state, > + &vc4->load_tracker); > + if (IS_ERR(priv_state)) > + return PTR_ERR(priv_state); > + > + load_state =3D to_vc4_load_tracker_state(priv_state); > + for_each_oldnew_plane_in_state(state, plane, old_plane_state, > + new_plane_state, i) { > + struct vc4_plane_state *vc4_plane_state; > + > + if (old_plane_state->fb && old_plane_state->crtc) { > + vc4_plane_state =3D to_vc4_plane_state(old_plane_state); > + load_state->membus_load -=3D vc4_plane_state->membus_load; > + load_state->hvs_load -=3D vc4_plane_state->hvs_load; > + } > + > + if (new_plane_state->fb && new_plane_state->crtc) { > + vc4_plane_state =3D to_vc4_plane_state(new_plane_state); > + load_state->membus_load +=3D vc4_plane_state->membus_load; > + load_state->hvs_load +=3D vc4_plane_state->hvs_load; > + } > + } > + > + /* The abolsute limit is 2Gbyte/sec, but let's take a margin to let "absolute" > + * the system work when other blocks are accessing the memory. > + */ > + if (load_state->membus_load > SZ_1G + SZ_512M) > + return -ENOSPC; > + /* HVS clock is supposed to run @ 250Mhz, let's take a margin and > + * consider the maximum number of cycles is 240M. > + */ > + if (load_state->hvs_load > 240000000ULL) > + return -ENOSPC; Some day we should probably be using the HVS's actual clock from DT if available instead of a hardcoded number here. Good enough for now. > +static void vc4_load_tracker_destroy_state(struct drm_private_obj *obj, > + struct drm_private_state *state) > +{ > + struct vc4_load_tracker_state *load_state; > + > + load_state =3D to_vc4_load_tracker_state(state); > + kfree(load_state); > +} Optional: just kfree(state) for simplicity. > +static void vc4_plane_calc_load(struct drm_plane_state *state) > +{ > + unsigned int hvs_load_shift, vrefresh, i; > + struct drm_framebuffer *fb =3D state->fb; > + struct vc4_plane_state *vc4_state; > + struct drm_crtc_state *crtc_state; > + unsigned int vscale_factor; > + > + vc4_state =3D to_vc4_plane_state(state); > + crtc_state =3D drm_atomic_get_existing_crtc_state(state->state, > + state->crtc); > + vrefresh =3D drm_mode_vrefresh(&crtc_state->adjusted_mode); > + > + /* The HVS is able to process 2 pixels/cycle when scaling the source, > + * 4 pixels/cycle otherwise. > + * Alpha blending step seems to be pipelined and it's always operating > + * at 4 pixels/cycle, so the limiting aspect here seems to be the > + * scaler block. > + * HVS load is expressed in clk-cycles/sec (AKA Hz). > + */ > + if (vc4_state->x_scaling[0] !=3D VC4_SCALING_NONE || > + vc4_state->x_scaling[1] !=3D VC4_SCALING_NONE || > + vc4_state->y_scaling[0] !=3D VC4_SCALING_NONE || > + vc4_state->y_scaling[1] !=3D VC4_SCALING_NONE) > + hvs_load_shift =3D 1; > + else > + hvs_load_shift =3D 2; > + > + vc4_state->membus_load =3D 0; > + vc4_state->hvs_load =3D 0; > + for (i =3D 0; i < fb->format->num_planes; i++) { > + unsigned long pixels_load; I'm scared any time I see longs. Do you want 32 or 64 bits here? > + /* Even if the bandwidth/plane required for a single frame is > + * > + * vc4_state->src_w[i] * vc4_state->src_h[i] * cpp * vrefresh > + * > + * when downscaling, we have to read more pixels per line in > + * the time frame reserved for a single line, so the bandwidth > + * demand can be punctually higher. To account for that, we > + * calculate the down-scaling factor and multiply the plane > + * load by this number. We're likely over-estimating the read > + * demand, but that's better than under-estimating it. > + */ > + vscale_factor =3D DIV_ROUND_UP(vc4_state->src_h[i], > + vc4_state->crtc_h); > + pixels_load =3D vc4_state->src_w[i] * vc4_state->src_h[i] * > + vscale_factor; If we're upscaling (common for video, right?), aren't we under-counting the cost? You need to scale/colorspace-convert crtc_w * crtc_h at 2 pixels per cycle. Overall, this is simpler than I expected it to be, and looks really promising. Thanks for working on it! > + vc4_state->membus_load +=3D fb->format->cpp[i] * pixels_load; > + vc4_state->hvs_load +=3D pixels_load; > + } > + > + vc4_state->hvs_load *=3D vrefresh; > + vc4_state->hvs_load >>=3D hvs_load_shift; > + vc4_state->membus_load *=3D vrefresh; > +} > + --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEE/JuuFDWp9/ZkuCBXtdYpNtH8nugFAlvY5XcACgkQtdYpNtH8 nuibEQ/9GUT013TpqltsPO0ebX0Oj1dvsxTVRAs+7DyzNmb98lMI90EcEV1WKyjB 83kp2KtUS6b7duFDDBlTp1FQ+9jENQPfXFqR8G6AAu2w3Mt6B8Kxd0+KKv7hQOA7 dsQanN4OdHz9lSSvbaEQE7YfihtSjn7k39mv4VkQnrAq81pprEykupt0AW3F+Y0p LZjsOeck15RoMur8En9eRTtkw9uC0sQinqFCxtvZrRs7OOn87A6FmiNfg/HF61T2 Np2dzH43ap2oEmOx84Wl5fuUwNw57QcpY5fBNpZn+P1KYMP/8FbG0IHrbtfYzmNu DD1hrVR8O6j7U4zO7Ch+wlND/bS1sSXta3oyNvPV5nIC6CggwdRUmqcyVvMyTNfK 6pOb/zNgDRFM3b1twwlrASwfXyIPBM6fQa0omDGoYXCnZo725RsPtC+S8hclDVgE bbX7+MTIKAYwFWpL1U5ZYo683iZ8+vsUHccIi0GWVQ0MwFB9uPbKsPWo/mkQ1HSP nw4vlWUoRIUhAHREx691/rWUCt7S1oKhbI8ROpYu5BYL6X1Kib6If2vj8vU0VmnZ Q0LUGXBdb+m0WXRQP8ZrKTN8T0/bC5Z2NYGpCO67ShnYXaGGMVojgxS2hfO4yBvc oMlIsC7up5bDutHz7lrywQw0H60Ak5J2MAQETxHO4yFQt0Ffp8w= =yqy+ -----END PGP SIGNATURE----- --=-=-=-- --===============0078336996== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg== --===============0078336996==--