From: Dmitry Osipenko <digetx@gmail.com>
To: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: Thierry Reding <thierry.reding@gmail.com>,
Jonathan Hunter <jonathanh@nvidia.com>,
Matt Merhar <mattmerhar@protonmail.com>,
Peter Geis <pgwipeout@gmail.com>,
Nicolas Chauvet <kwizart@gmail.com>,
linux-tegra@vger.kernel.org, linux-pm@vger.kernel.org,
linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v15 1/2] drm/tegra: dc: Support memory bandwidth management
Date: Mon, 15 Mar 2021 21:39:06 +0300 [thread overview]
Message-ID: <1158bbca-8880-918d-7564-e2e30552e6b3@gmail.com> (raw)
In-Reply-To: <20210314223119.GC2733@qmqm.qmqm.pl>
15.03.2021 01:31, Michał Mirosław пишет:
> On Thu, Mar 11, 2021 at 08:22:54PM +0300, Dmitry Osipenko wrote:
>> Display controller (DC) performs isochronous memory transfers, and thus,
>> has a requirement for a minimum memory bandwidth that shall be fulfilled,
>> otherwise framebuffer data can't be fetched fast enough and this results
>> in a DC's data-FIFO underflow that follows by a visual corruption.
> [...]
>> +static unsigned long
>> +tegra_plane_overlap_mask(struct drm_crtc_state *state,
>> + const struct drm_plane_state *plane_state)
>> +{
>> + const struct drm_plane_state *other_state;
>> + const struct tegra_plane *tegra;
>> + unsigned long overlap_mask = 0;
>> + struct drm_plane *plane;
>> + struct drm_rect rect;
>> +
>> + if (!plane_state->visible || !plane_state->fb)
>> + return 0;
>> +
>> + drm_atomic_crtc_state_for_each_plane_state(plane, other_state, state) {
> [...]
>> + /*
>> + * Data-prefetch FIFO will easily help to overcome temporal memory
>> + * pressure if other plane overlaps with the cursor plane.
>> + */
>> + if (tegra_plane_is_cursor(plane_state) && overlap_mask)
>> + return 0;
>> +
>> + return overlap_mask;
>> +}
>
> Since for cursor plane this always returns 0, you could test
> tegra_plane_is_cursor() at the start of the function.
Yes, thanks.
>> +static int tegra_crtc_calculate_memory_bandwidth(struct drm_crtc *crtc,
>> + struct drm_atomic_state *state)
> [...]
>> + /*
>> + * For overlapping planes pixel's data is fetched for each plane at
>> + * the same time, hence bandwidths are accumulated in this case.
>> + * This needs to be taken into account for calculating total bandwidth
>> + * consumed by all planes.
>> + *
>> + * Here we get the overlapping state of each plane, which is a
>> + * bitmask of plane indices telling with what planes there is an
>> + * overlap. Note that bitmask[plane] includes BIT(plane) in order
>> + * to make further code nicer and simpler.
>> + */
>> + drm_atomic_crtc_state_for_each_plane_state(plane, plane_state, new_state) {
>> + tegra_state = to_const_tegra_plane_state(plane_state);
>> + tegra = to_tegra_plane(plane);
>> +
>> + if (WARN_ON_ONCE(tegra->index >= TEGRA_DC_LEGACY_PLANES_NUM))
>> + return -EINVAL;
>> +
>> + plane_peak_bw[tegra->index] = tegra_state->peak_memory_bandwidth;
>> + mask = tegra_plane_overlap_mask(new_state, plane_state);
>> + overlap_mask[tegra->index] = mask;
>> +
>> + if (hweight_long(mask) != 3)
>> + all_planes_overlap_simultaneously = false;
>> + }
>> +
>> + old_state = drm_atomic_get_old_crtc_state(state, crtc);
>> + old_dc_state = to_const_dc_state(old_state);
>> +
>> + /*
>> + * Then we calculate maximum bandwidth of each plane state.
>> + * The bandwidth includes the plane BW + BW of the "simultaneously"
>> + * overlapping planes, where "simultaneously" means areas where DC
>> + * fetches from the planes simultaneously during of scan-out process.
>> + *
>> + * For example, if plane A overlaps with planes B and C, but B and C
>> + * don't overlap, then the peak bandwidth will be either in area where
>> + * A-and-B or A-and-C planes overlap.
>> + *
>> + * The plane_peak_bw[] contains peak memory bandwidth values of
>> + * each plane, this information is needed by interconnect provider
>> + * in order to set up latency allowness based on the peak BW, see
>> + * tegra_crtc_update_memory_bandwidth().
>> + */
>> + for (i = 0; i < ARRAY_SIZE(plane_peak_bw); i++) {
>> + overlap_bw = 0;
>> +
>> + for_each_set_bit(k, &overlap_mask[i], 3) {
>> + if (k == i)
>> + continue;
>> +
>> + if (all_planes_overlap_simultaneously)
>> + overlap_bw += plane_peak_bw[k];
>> + else
>> + overlap_bw = max(overlap_bw, plane_peak_bw[k]);
>> + }
>> +
>> + new_dc_state->plane_peak_bw[i] = plane_peak_bw[i] + overlap_bw;
>> +
>> + /*
>> + * If plane's peak bandwidth changed (for example plane isn't
>> + * overlapped anymore) and plane isn't in the atomic state,
>> + * then add plane to the state in order to have the bandwidth
>> + * updated.
>> + */
>> + if (old_dc_state->plane_peak_bw[i] !=
>> + new_dc_state->plane_peak_bw[i]) {
>> + plane = tegra_crtc_get_plane_by_index(crtc, i);
>> + if (!plane)
>> + continue;
>> +
>> + plane_state = drm_atomic_get_plane_state(state, plane);
>> + if (IS_ERR(plane_state))
>> + return PTR_ERR(plane_state);
>> + }
>> + }
> [...]
>
> Does it matter to which channel (plane) the peak bw is attached? Would
> it still work if the first channel specified max(peak_bw of overlaps)
> and others only zeroes?
The latency allowance will be configured incorrectly for the case of
zeroes by the memory driver, hence it does matter.
>> + /*
>> + * Horizontal downscale needs a lower memory latency, which roughly
>> + * depends on the scaled width. Trying to tune latency of a memory
>> + * client alone will likely result in a strong negative impact on
>> + * other memory clients, hence we will request a higher bandwidth
>> + * since latency depends on bandwidth. This allows to prevent memory
>> + * FIFO underflows for a large plane downscales, meanwhile allowing
>> + * display to share bandwidth fairly with other memory clients.
>> + */
>> + if (src_w > dst_w)
>> + mul = (src_w - dst_w) * bpp / 2048 + 1;
>> + else
>> + mul = 1;
> [...]
>
> One point is unexplained yet: why is the multiplier proportional to a
> *difference* between src and dst widths? Also, I would expect max (worst
> case) is pixclock * read_size when src_w/dst_w >= read_size.
IIRC, the difference gives a more adequate/practical result than the
proportion. Although, downstream driver uses proportion. I'll try to
revisit this for the next version of the patch.
> BTW, you could move this below and : if (src > dst_w) peak_bandwidth *= ...
Indeed, and should be a bit nicer to use 'mul' in both cases.
WARNING: multiple messages have this Message-ID (diff)
From: Dmitry Osipenko <digetx@gmail.com>
To: "Michał Mirosław" <mirq-linux@rere.qmqm.pl>
Cc: linux-pm@vger.kernel.org, Nicolas Chauvet <kwizart@gmail.com>,
linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
Jonathan Hunter <jonathanh@nvidia.com>,
Thierry Reding <thierry.reding@gmail.com>,
Matt Merhar <mattmerhar@protonmail.com>,
Peter Geis <pgwipeout@gmail.com>,
linux-tegra@vger.kernel.org
Subject: Re: [PATCH v15 1/2] drm/tegra: dc: Support memory bandwidth management
Date: Mon, 15 Mar 2021 21:39:06 +0300 [thread overview]
Message-ID: <1158bbca-8880-918d-7564-e2e30552e6b3@gmail.com> (raw)
In-Reply-To: <20210314223119.GC2733@qmqm.qmqm.pl>
15.03.2021 01:31, Michał Mirosław пишет:
> On Thu, Mar 11, 2021 at 08:22:54PM +0300, Dmitry Osipenko wrote:
>> Display controller (DC) performs isochronous memory transfers, and thus,
>> has a requirement for a minimum memory bandwidth that shall be fulfilled,
>> otherwise framebuffer data can't be fetched fast enough and this results
>> in a DC's data-FIFO underflow that follows by a visual corruption.
> [...]
>> +static unsigned long
>> +tegra_plane_overlap_mask(struct drm_crtc_state *state,
>> + const struct drm_plane_state *plane_state)
>> +{
>> + const struct drm_plane_state *other_state;
>> + const struct tegra_plane *tegra;
>> + unsigned long overlap_mask = 0;
>> + struct drm_plane *plane;
>> + struct drm_rect rect;
>> +
>> + if (!plane_state->visible || !plane_state->fb)
>> + return 0;
>> +
>> + drm_atomic_crtc_state_for_each_plane_state(plane, other_state, state) {
> [...]
>> + /*
>> + * Data-prefetch FIFO will easily help to overcome temporal memory
>> + * pressure if other plane overlaps with the cursor plane.
>> + */
>> + if (tegra_plane_is_cursor(plane_state) && overlap_mask)
>> + return 0;
>> +
>> + return overlap_mask;
>> +}
>
> Since for cursor plane this always returns 0, you could test
> tegra_plane_is_cursor() at the start of the function.
Yes, thanks.
>> +static int tegra_crtc_calculate_memory_bandwidth(struct drm_crtc *crtc,
>> + struct drm_atomic_state *state)
> [...]
>> + /*
>> + * For overlapping planes pixel's data is fetched for each plane at
>> + * the same time, hence bandwidths are accumulated in this case.
>> + * This needs to be taken into account for calculating total bandwidth
>> + * consumed by all planes.
>> + *
>> + * Here we get the overlapping state of each plane, which is a
>> + * bitmask of plane indices telling with what planes there is an
>> + * overlap. Note that bitmask[plane] includes BIT(plane) in order
>> + * to make further code nicer and simpler.
>> + */
>> + drm_atomic_crtc_state_for_each_plane_state(plane, plane_state, new_state) {
>> + tegra_state = to_const_tegra_plane_state(plane_state);
>> + tegra = to_tegra_plane(plane);
>> +
>> + if (WARN_ON_ONCE(tegra->index >= TEGRA_DC_LEGACY_PLANES_NUM))
>> + return -EINVAL;
>> +
>> + plane_peak_bw[tegra->index] = tegra_state->peak_memory_bandwidth;
>> + mask = tegra_plane_overlap_mask(new_state, plane_state);
>> + overlap_mask[tegra->index] = mask;
>> +
>> + if (hweight_long(mask) != 3)
>> + all_planes_overlap_simultaneously = false;
>> + }
>> +
>> + old_state = drm_atomic_get_old_crtc_state(state, crtc);
>> + old_dc_state = to_const_dc_state(old_state);
>> +
>> + /*
>> + * Then we calculate maximum bandwidth of each plane state.
>> + * The bandwidth includes the plane BW + BW of the "simultaneously"
>> + * overlapping planes, where "simultaneously" means areas where DC
>> + * fetches from the planes simultaneously during of scan-out process.
>> + *
>> + * For example, if plane A overlaps with planes B and C, but B and C
>> + * don't overlap, then the peak bandwidth will be either in area where
>> + * A-and-B or A-and-C planes overlap.
>> + *
>> + * The plane_peak_bw[] contains peak memory bandwidth values of
>> + * each plane, this information is needed by interconnect provider
>> + * in order to set up latency allowness based on the peak BW, see
>> + * tegra_crtc_update_memory_bandwidth().
>> + */
>> + for (i = 0; i < ARRAY_SIZE(plane_peak_bw); i++) {
>> + overlap_bw = 0;
>> +
>> + for_each_set_bit(k, &overlap_mask[i], 3) {
>> + if (k == i)
>> + continue;
>> +
>> + if (all_planes_overlap_simultaneously)
>> + overlap_bw += plane_peak_bw[k];
>> + else
>> + overlap_bw = max(overlap_bw, plane_peak_bw[k]);
>> + }
>> +
>> + new_dc_state->plane_peak_bw[i] = plane_peak_bw[i] + overlap_bw;
>> +
>> + /*
>> + * If plane's peak bandwidth changed (for example plane isn't
>> + * overlapped anymore) and plane isn't in the atomic state,
>> + * then add plane to the state in order to have the bandwidth
>> + * updated.
>> + */
>> + if (old_dc_state->plane_peak_bw[i] !=
>> + new_dc_state->plane_peak_bw[i]) {
>> + plane = tegra_crtc_get_plane_by_index(crtc, i);
>> + if (!plane)
>> + continue;
>> +
>> + plane_state = drm_atomic_get_plane_state(state, plane);
>> + if (IS_ERR(plane_state))
>> + return PTR_ERR(plane_state);
>> + }
>> + }
> [...]
>
> Does it matter to which channel (plane) the peak bw is attached? Would
> it still work if the first channel specified max(peak_bw of overlaps)
> and others only zeroes?
The latency allowance will be configured incorrectly for the case of
zeroes by the memory driver, hence it does matter.
>> + /*
>> + * Horizontal downscale needs a lower memory latency, which roughly
>> + * depends on the scaled width. Trying to tune latency of a memory
>> + * client alone will likely result in a strong negative impact on
>> + * other memory clients, hence we will request a higher bandwidth
>> + * since latency depends on bandwidth. This allows to prevent memory
>> + * FIFO underflows for a large plane downscales, meanwhile allowing
>> + * display to share bandwidth fairly with other memory clients.
>> + */
>> + if (src_w > dst_w)
>> + mul = (src_w - dst_w) * bpp / 2048 + 1;
>> + else
>> + mul = 1;
> [...]
>
> One point is unexplained yet: why is the multiplier proportional to a
> *difference* between src and dst widths? Also, I would expect max (worst
> case) is pixclock * read_size when src_w/dst_w >= read_size.
IIRC, the difference gives a more adequate/practical result than the
proportion. Although, downstream driver uses proportion. I'll try to
revisit this for the next version of the patch.
> BTW, you could move this below and : if (src > dst_w) peak_bandwidth *= ...
Indeed, and should be a bit nicer to use 'mul' in both cases.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
next prev parent reply other threads:[~2021-03-15 18:39 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-11 17:22 [PATCH v15 0/2] Add memory bandwidth management to NVIDIA Tegra DRM driver Dmitry Osipenko
2021-03-11 17:22 ` Dmitry Osipenko
2021-03-11 17:22 ` [PATCH v15 1/2] drm/tegra: dc: Support memory bandwidth management Dmitry Osipenko
2021-03-11 17:22 ` Dmitry Osipenko
2021-03-14 22:31 ` Michał Mirosław
2021-03-14 22:31 ` Michał Mirosław
2021-03-15 18:39 ` Dmitry Osipenko [this message]
2021-03-15 18:39 ` Dmitry Osipenko
2021-03-16 22:56 ` Dmitry Osipenko
2021-03-16 22:56 ` Dmitry Osipenko
2021-03-11 17:22 ` [PATCH v15 2/2] drm/tegra: dc: Extend debug stats with total number of events Dmitry Osipenko
2021-03-11 17:22 ` Dmitry Osipenko
2021-03-14 22:11 ` Michał Mirosław
2021-03-14 22:11 ` Michał Mirosław
2021-03-17 17:44 ` Dmitry Osipenko
2021-03-17 17:44 ` Dmitry Osipenko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1158bbca-8880-918d-7564-e2e30552e6b3@gmail.com \
--to=digetx@gmail.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=jonathanh@nvidia.com \
--cc=kwizart@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=linux-tegra@vger.kernel.org \
--cc=mattmerhar@protonmail.com \
--cc=mirq-linux@rere.qmqm.pl \
--cc=pgwipeout@gmail.com \
--cc=thierry.reding@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.