* Re: [PATCH] Enable '-Werror' by default for all kernel builds [not found] ` <CAHk-=wgZkQ+eZ02TaCpAWo_ffiLMwA2tYNHyL+B1dQ4YB0qfmA@mail.gmail.com> @ 2021-09-07 17:33 ` Linus Torvalds 2021-09-07 21:07 ` Harry Wentland 0 siblings, 1 reply; 16+ messages in thread From: Linus Torvalds @ 2021-09-07 17:33 UTC (permalink / raw) To: Arnd Bergmann, Harry Wentland, Leo Li, Alex Deucher, Christian König, Pan, Xinhui Cc: Nathan Chancellor, Guenter Roeck, Linux Kernel Mailing List, llvm, Nick Desaulniers, amd-gfx [-- Attachment #1: Type: text/plain, Size: 2203 bytes --] On Tue, Sep 7, 2021 at 10:10 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Do I know why? No. I do note that that code is disgusting. > > It's passing one of those structs around by value, for example. That's > a 72-byte structure that is copied on the stack due to stupid calling > conventions. Maybe clang generates a few extra temporaries for it as > part of the function call stack setup? Who knows.. Ooh, yes. This attached patch is crap - it converts the helper functions to use const pointers instead of passing the whole structure, but it then only converts that one file that *uses* them. So the end result will not compile in general, but you can do make drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn30/display_rq_dlg_calc_30.o and it compiles for me. And while gcc doesn't care that much - it will apparently either generate the argument stack every call - clang cares deeply. The nasty 720-byte stack frame that clang generates turns into just a 320-byte one, and code generation in general looks a *lot* better. Now, as mentioned, this patch is broken and incomplete. But I really think the AMD GPU people need to do this. It makes those functions go from practically unusable to not horribly disgusting. So Harry/Leo/Alex/Christian and amd-gfx list - can you look into making this ugly "make one file compile better" patch actually work properly? It *looks* lto me ike that code was perhaps written for a C++ compiler and the helpers have been written as a "pass by reference", and the arguments used to be const display_data_rq_misc_params_st& rq_misc_param and then the compiler will pass the argument as a pointer. And then it was converted to C, and the "pass by reference" in the function declaration was turned into "pass by value", to avoid changing "." to "->" in the use. But a '&arg' thing in C++ really is a '*arg' pointer in C, and should have been done as that. Of course, it's also possible that that code was simply written by somebody who didn't understand just *how* horrible it is to pass structures bigger than a word or two by value. Do we have a compiler warning for passing big structures by value? Linus [-- Attachment #2: patch.diff --] [-- Type: text/x-patch, Size: 17439 bytes --] .../display/dc/dml/dcn30/display_rq_dlg_calc_30.c | 142 ++++++++++----------- .../display/dc/dml/dcn30/display_rq_dlg_calc_30.h | 2 +- .../amd/display/dc/dml/display_rq_dlg_helpers.h | 20 +-- 3 files changed, 82 insertions(+), 82 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.c b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.c index 0d934fae1c3a..9b5ee53d2de3 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.c @@ -92,7 +92,7 @@ static void extract_rq_sizing_regs(struct display_mode_lib *mode_lib, const display_data_rq_sizing_params_st rq_sizing) { dml_print("DML_DLG: %s: rq_sizing param\n", __func__); - print__data_rq_sizing_params_st(mode_lib, rq_sizing); + print__data_rq_sizing_params_st(mode_lib, &rq_sizing); rq_regs->chunk_size = dml_log2(rq_sizing.chunk_bytes) - 10; @@ -113,28 +113,28 @@ static void extract_rq_sizing_regs(struct display_mode_lib *mode_lib, static void extract_rq_regs(struct display_mode_lib *mode_lib, display_rq_regs_st *rq_regs, - const display_rq_params_st rq_param) + const display_rq_params_st *rq_param) { unsigned int detile_buf_size_in_bytes = mode_lib->ip.det_buffer_size_kbytes * 1024; unsigned int detile_buf_plane1_addr = 0; - extract_rq_sizing_regs(mode_lib, &(rq_regs->rq_regs_l), rq_param.sizing.rq_l); + extract_rq_sizing_regs(mode_lib, &(rq_regs->rq_regs_l), rq_param->sizing.rq_l); - rq_regs->rq_regs_l.pte_row_height_linear = dml_floor(dml_log2(rq_param.dlg.rq_l.dpte_row_height), + rq_regs->rq_regs_l.pte_row_height_linear = dml_floor(dml_log2(rq_param->dlg.rq_l.dpte_row_height), 1) - 3; - if (rq_param.yuv420) { - extract_rq_sizing_regs(mode_lib, &(rq_regs->rq_regs_c), rq_param.sizing.rq_c); - rq_regs->rq_regs_c.pte_row_height_linear = dml_floor(dml_log2(rq_param.dlg.rq_c.dpte_row_height), + if (rq_param->yuv420) { + extract_rq_sizing_regs(mode_lib, &(rq_regs->rq_regs_c), rq_param->sizing.rq_c); + rq_regs->rq_regs_c.pte_row_height_linear = dml_floor(dml_log2(rq_param->dlg.rq_c.dpte_row_height), 1) - 3; } - rq_regs->rq_regs_l.swath_height = dml_log2(rq_param.dlg.rq_l.swath_height); - rq_regs->rq_regs_c.swath_height = dml_log2(rq_param.dlg.rq_c.swath_height); + rq_regs->rq_regs_l.swath_height = dml_log2(rq_param->dlg.rq_l.swath_height); + rq_regs->rq_regs_c.swath_height = dml_log2(rq_param->dlg.rq_c.swath_height); // FIXME: take the max between luma, chroma chunk size? // okay for now, as we are setting chunk_bytes to 8kb anyways - if (rq_param.sizing.rq_l.chunk_bytes >= 32 * 1024 || (rq_param.yuv420 && rq_param.sizing.rq_c.chunk_bytes >= 32 * 1024)) { //32kb + if (rq_param->sizing.rq_l.chunk_bytes >= 32 * 1024 || (rq_param->yuv420 && rq_param->sizing.rq_c.chunk_bytes >= 32 * 1024)) { //32kb rq_regs->drq_expansion_mode = 0; } else { rq_regs->drq_expansion_mode = 2; @@ -143,9 +143,9 @@ static void extract_rq_regs(struct display_mode_lib *mode_lib, rq_regs->mrq_expansion_mode = 1; rq_regs->crq_expansion_mode = 1; - if (rq_param.yuv420) { - if ((double)rq_param.misc.rq_l.stored_swath_bytes - / (double)rq_param.misc.rq_c.stored_swath_bytes <= 1.5) { + if (rq_param->yuv420) { + if ((double)rq_param->misc.rq_l.stored_swath_bytes + / (double)rq_param->misc.rq_c.stored_swath_bytes <= 1.5) { detile_buf_plane1_addr = (detile_buf_size_in_bytes / 2.0 / 64.0); // half to chroma } else { detile_buf_plane1_addr = dml_round_to_multiple((unsigned int)((2.0 * detile_buf_size_in_bytes) / 3.0), @@ -747,7 +747,7 @@ static void get_surf_rq_param(struct display_mode_lib *mode_lib, display_data_rq_sizing_params_st *rq_sizing_param, display_data_rq_dlg_params_st *rq_dlg_param, display_data_rq_misc_params_st *rq_misc_param, - const display_pipe_params_st pipe_param, + const display_pipe_params_st *pipe_param, bool is_chroma, bool is_alpha) { @@ -761,32 +761,32 @@ static void get_surf_rq_param(struct display_mode_lib *mode_lib, // FIXME check if ppe apply for both luma and chroma in 422 case if (is_chroma | is_alpha) { - vp_width = pipe_param.src.viewport_width_c / ppe; - vp_height = pipe_param.src.viewport_height_c; - data_pitch = pipe_param.src.data_pitch_c; - meta_pitch = pipe_param.src.meta_pitch_c; - surface_height = pipe_param.src.surface_height_y / 2.0; + vp_width = pipe_param->src.viewport_width_c / ppe; + vp_height = pipe_param->src.viewport_height_c; + data_pitch = pipe_param->src.data_pitch_c; + meta_pitch = pipe_param->src.meta_pitch_c; + surface_height = pipe_param->src.surface_height_y / 2.0; } else { - vp_width = pipe_param.src.viewport_width / ppe; - vp_height = pipe_param.src.viewport_height; - data_pitch = pipe_param.src.data_pitch; - meta_pitch = pipe_param.src.meta_pitch; - surface_height = pipe_param.src.surface_height_y; + vp_width = pipe_param->src.viewport_width / ppe; + vp_height = pipe_param->src.viewport_height; + data_pitch = pipe_param->src.data_pitch; + meta_pitch = pipe_param->src.meta_pitch; + surface_height = pipe_param->src.surface_height_y; } - if (pipe_param.dest.odm_combine) { + if (pipe_param->dest.odm_combine) { unsigned int access_dir = 0; unsigned int full_src_vp_width = 0; unsigned int hactive_odm = 0; unsigned int src_hactive_odm = 0; - access_dir = (pipe_param.src.source_scan == dm_vert); // vp access direction: horizontal or vertical accessed - hactive_odm = pipe_param.dest.hactive / ((unsigned int)pipe_param.dest.odm_combine*2); + access_dir = (pipe_param->src.source_scan == dm_vert); // vp access direction: horizontal or vertical accessed + hactive_odm = pipe_param->dest.hactive / ((unsigned int)pipe_param->dest.odm_combine*2); if (is_chroma) { - full_src_vp_width = pipe_param.scale_ratio_depth.hscl_ratio_c * pipe_param.dest.full_recout_width; - src_hactive_odm = pipe_param.scale_ratio_depth.hscl_ratio_c * hactive_odm; + full_src_vp_width = pipe_param->scale_ratio_depth.hscl_ratio_c * pipe_param->dest.full_recout_width; + src_hactive_odm = pipe_param->scale_ratio_depth.hscl_ratio_c * hactive_odm; } else { - full_src_vp_width = pipe_param.scale_ratio_depth.hscl_ratio * pipe_param.dest.full_recout_width; - src_hactive_odm = pipe_param.scale_ratio_depth.hscl_ratio * hactive_odm; + full_src_vp_width = pipe_param->scale_ratio_depth.hscl_ratio * pipe_param->dest.full_recout_width; + src_hactive_odm = pipe_param->scale_ratio_depth.hscl_ratio * hactive_odm; } if (access_dir == 0) { @@ -815,7 +815,7 @@ static void get_surf_rq_param(struct display_mode_lib *mode_lib, rq_sizing_param->meta_chunk_bytes = 2048; rq_sizing_param->min_meta_chunk_bytes = 256; - if (pipe_param.src.hostvm) + if (pipe_param->src.hostvm) rq_sizing_param->mpte_group_bytes = 512; else rq_sizing_param->mpte_group_bytes = 2048; @@ -828,28 +828,28 @@ static void get_surf_rq_param(struct display_mode_lib *mode_lib, vp_height, data_pitch, meta_pitch, - pipe_param.src.source_format, - pipe_param.src.sw_mode, - pipe_param.src.macro_tile_size, - pipe_param.src.source_scan, - pipe_param.src.hostvm, + pipe_param->src.source_format, + pipe_param->src.sw_mode, + pipe_param->src.macro_tile_size, + pipe_param->src.source_scan, + pipe_param->src.hostvm, is_chroma, surface_height); } static void dml_rq_dlg_get_rq_params(struct display_mode_lib *mode_lib, display_rq_params_st *rq_param, - const display_pipe_params_st pipe_param) + const display_pipe_params_st *pipe_param) { // get param for luma surface - rq_param->yuv420 = pipe_param.src.source_format == dm_420_8 - || pipe_param.src.source_format == dm_420_10 - || pipe_param.src.source_format == dm_rgbe_alpha - || pipe_param.src.source_format == dm_420_12; + rq_param->yuv420 = pipe_param->src.source_format == dm_420_8 + || pipe_param->src.source_format == dm_420_10 + || pipe_param->src.source_format == dm_rgbe_alpha + || pipe_param->src.source_format == dm_420_12; - rq_param->yuv420_10bpc = pipe_param.src.source_format == dm_420_10; + rq_param->yuv420_10bpc = pipe_param->src.source_format == dm_420_10; - rq_param->rgbe_alpha = (pipe_param.src.source_format == dm_rgbe_alpha)?1:0; + rq_param->rgbe_alpha = (pipe_param->src.source_format == dm_rgbe_alpha)?1:0; get_surf_rq_param(mode_lib, &(rq_param->sizing.rq_l), @@ -859,7 +859,7 @@ static void dml_rq_dlg_get_rq_params(struct display_mode_lib *mode_lib, 0, 0); - if (is_dual_plane((enum source_format_class)(pipe_param.src.source_format))) { + if (is_dual_plane((enum source_format_class)(pipe_param->src.source_format))) { // get param for chroma surface get_surf_rq_param(mode_lib, &(rq_param->sizing.rq_c), @@ -871,21 +871,21 @@ static void dml_rq_dlg_get_rq_params(struct display_mode_lib *mode_lib, } // calculate how to split the det buffer space between luma and chroma - handle_det_buf_split(mode_lib, rq_param, pipe_param.src); - print__rq_params_st(mode_lib, *rq_param); + handle_det_buf_split(mode_lib, rq_param, pipe_param->src); + print__rq_params_st(mode_lib, rq_param); } void dml30_rq_dlg_get_rq_reg(struct display_mode_lib *mode_lib, display_rq_regs_st *rq_regs, - const display_pipe_params_st pipe_param) + const display_pipe_params_st *pipe_param) { display_rq_params_st rq_param = { 0 }; memset(rq_regs, 0, sizeof(*rq_regs)); dml_rq_dlg_get_rq_params(mode_lib, &rq_param, pipe_param); - extract_rq_regs(mode_lib, rq_regs, rq_param); + extract_rq_regs(mode_lib, rq_regs, &rq_param); - print__rq_regs_st(mode_lib, *rq_regs); + print__rq_regs_st(mode_lib, rq_regs); } static void calculate_ttu_cursor(struct display_mode_lib *mode_lib, @@ -984,8 +984,8 @@ static void dml_rq_dlg_get_dlg_params(struct display_mode_lib *mode_lib, const unsigned int pipe_idx, display_dlg_regs_st *disp_dlg_regs, display_ttu_regs_st *disp_ttu_regs, - const display_rq_dlg_params_st rq_dlg_param, - const display_dlg_sys_params_st dlg_sys_param, + const display_rq_dlg_params_st *rq_dlg_param, + const display_dlg_sys_params_st *dlg_sys_param, const bool cstate_en, const bool pstate_en, const bool vm_en, @@ -1137,7 +1137,7 @@ static void dml_rq_dlg_get_dlg_params(struct display_mode_lib *mode_lib, * dml_pow(2, 8)); disp_dlg_regs->dlg_vblank_end = interlaced ? (vblank_end / 2) : vblank_end; // 15 bits - min_dcfclk_mhz = dlg_sys_param.deepsleep_dcfclk_mhz; + min_dcfclk_mhz = dlg_sys_param->deepsleep_dcfclk_mhz; t_calc_us = get_tcalc(mode_lib, e2e_pipe_param, num_pipes); min_ttu_vblank = get_min_ttu_vblank(mode_lib, e2e_pipe_param, num_pipes, pipe_idx); @@ -1191,13 +1191,13 @@ static void dml_rq_dlg_get_dlg_params(struct display_mode_lib *mode_lib, scl_enable = scl->scl_enable; line_time_in_us = (htotal / pclk_freq_in_mhz); - swath_width_ub_l = rq_dlg_param.rq_l.swath_width_ub; - dpte_groups_per_row_ub_l = rq_dlg_param.rq_l.dpte_groups_per_row_ub; - swath_width_ub_c = rq_dlg_param.rq_c.swath_width_ub; - dpte_groups_per_row_ub_c = rq_dlg_param.rq_c.dpte_groups_per_row_ub; + swath_width_ub_l = rq_dlg_param->rq_l.swath_width_ub; + dpte_groups_per_row_ub_l = rq_dlg_param->rq_l.dpte_groups_per_row_ub; + swath_width_ub_c = rq_dlg_param->rq_c.swath_width_ub; + dpte_groups_per_row_ub_c = rq_dlg_param->rq_c.dpte_groups_per_row_ub; - meta_chunks_per_row_ub_l = rq_dlg_param.rq_l.meta_chunks_per_row_ub; - meta_chunks_per_row_ub_c = rq_dlg_param.rq_c.meta_chunks_per_row_ub; + meta_chunks_per_row_ub_l = rq_dlg_param->rq_l.meta_chunks_per_row_ub; + meta_chunks_per_row_ub_c = rq_dlg_param->rq_c.meta_chunks_per_row_ub; vupdate_offset = dst->vupdate_offset; vupdate_width = dst->vupdate_width; vready_offset = dst->vready_offset; @@ -1379,16 +1379,16 @@ static void dml_rq_dlg_get_dlg_params(struct display_mode_lib *mode_lib, dml_print("DML_DLG: %s: vratio_pre_c=%3.2f\n", __func__, vratio_pre_c); // Active - req_per_swath_ub_l = rq_dlg_param.rq_l.req_per_swath_ub; - req_per_swath_ub_c = rq_dlg_param.rq_c.req_per_swath_ub; - meta_row_height_l = rq_dlg_param.rq_l.meta_row_height; - meta_row_height_c = rq_dlg_param.rq_c.meta_row_height; + req_per_swath_ub_l = rq_dlg_param->rq_l.req_per_swath_ub; + req_per_swath_ub_c = rq_dlg_param->rq_c.req_per_swath_ub; + meta_row_height_l = rq_dlg_param->rq_l.meta_row_height; + meta_row_height_c = rq_dlg_param->rq_c.meta_row_height; swath_width_pixels_ub_l = 0; swath_width_pixels_ub_c = 0; scaler_rec_in_width_l = 0; scaler_rec_in_width_c = 0; - dpte_row_height_l = rq_dlg_param.rq_l.dpte_row_height; - dpte_row_height_c = rq_dlg_param.rq_c.dpte_row_height; + dpte_row_height_l = rq_dlg_param->rq_l.dpte_row_height; + dpte_row_height_c = rq_dlg_param->rq_c.dpte_row_height; if (mode_422) { swath_width_pixels_ub_l = swath_width_ub_l * 2; // *2 for 2 pixel per element @@ -1824,8 +1824,8 @@ static void dml_rq_dlg_get_dlg_params(struct display_mode_lib *mode_lib, disp_ttu_regs->min_ttu_vblank = min_ttu_vblank * refclk_freq_in_mhz; ASSERT(disp_ttu_regs->min_ttu_vblank < dml_pow(2, 24)); - print__ttu_regs_st(mode_lib, *disp_ttu_regs); - print__dlg_regs_st(mode_lib, *disp_dlg_regs); + print__ttu_regs_st(mode_lib, disp_ttu_regs); + print__dlg_regs_st(mode_lib, disp_dlg_regs); } void dml30_rq_dlg_get_dlg_reg(struct display_mode_lib *mode_lib, @@ -1861,20 +1861,20 @@ void dml30_rq_dlg_get_dlg_reg(struct display_mode_lib *mode_lib, dlg_sys_param.t_srx_delay_us = mode_lib->ip.dcfclk_cstate_latency / dlg_sys_param.deepsleep_dcfclk_mhz; // TODO: Deprecated - print__dlg_sys_params_st(mode_lib, dlg_sys_param); + print__dlg_sys_params_st(mode_lib, &dlg_sys_param); // system parameter calculation done dml_print("DML_DLG: Calculation for pipe[%d] start\n\n", pipe_idx); - dml_rq_dlg_get_rq_params(mode_lib, &rq_param, e2e_pipe_param[pipe_idx].pipe); + dml_rq_dlg_get_rq_params(mode_lib, &rq_param, &e2e_pipe_param[pipe_idx].pipe); dml_rq_dlg_get_dlg_params(mode_lib, e2e_pipe_param, num_pipes, pipe_idx, dlg_regs, ttu_regs, - rq_param.dlg, - dlg_sys_param, + &rq_param.dlg, + &dlg_sys_param, cstate_en, pstate_en, vm_en, diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.h b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.h index c04965cceff3..b40abc41828a 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.h +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.h @@ -41,7 +41,7 @@ struct display_mode_lib; // See also: <display_rq_regs_st> void dml30_rq_dlg_get_rq_reg(struct display_mode_lib *mode_lib, display_rq_regs_st *rq_regs, - const display_pipe_params_st pipe_param); + const display_pipe_params_st *pipe_param); // Function: dml_rq_dlg_get_dlg_reg // Calculate and return DLG and TTU register struct given the system setting diff --git a/drivers/gpu/drm/amd/display/dc/dml/display_rq_dlg_helpers.h b/drivers/gpu/drm/amd/display/dc/dml/display_rq_dlg_helpers.h index 2555ef0358c2..fb61a0b1470f 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/display_rq_dlg_helpers.h +++ b/drivers/gpu/drm/amd/display/dc/dml/display_rq_dlg_helpers.h @@ -31,16 +31,16 @@ /* Function: Printer functions * Print various struct */ -void print__rq_params_st(struct display_mode_lib *mode_lib, display_rq_params_st rq_param); -void print__data_rq_sizing_params_st(struct display_mode_lib *mode_lib, display_data_rq_sizing_params_st rq_sizing); -void print__data_rq_dlg_params_st(struct display_mode_lib *mode_lib, display_data_rq_dlg_params_st rq_dlg_param); -void print__data_rq_misc_params_st(struct display_mode_lib *mode_lib, display_data_rq_misc_params_st rq_misc_param); -void print__rq_dlg_params_st(struct display_mode_lib *mode_lib, display_rq_dlg_params_st rq_dlg_param); -void print__dlg_sys_params_st(struct display_mode_lib *mode_lib, display_dlg_sys_params_st dlg_sys_param); +void print__rq_params_st(struct display_mode_lib *mode_lib, const display_rq_params_st *rq_param); +void print__data_rq_sizing_params_st(struct display_mode_lib *mode_lib, const display_data_rq_sizing_params_st *rq_sizing); +void print__data_rq_dlg_params_st(struct display_mode_lib *mode_lib, const display_data_rq_dlg_params_st *rq_dlg_param); +void print__data_rq_misc_params_st(struct display_mode_lib *mode_lib, const display_data_rq_misc_params_st *rq_misc_param); +void print__rq_dlg_params_st(struct display_mode_lib *mode_lib, const display_rq_dlg_params_st *rq_dlg_param); +void print__dlg_sys_params_st(struct display_mode_lib *mode_lib, const display_dlg_sys_params_st *dlg_sys_param); -void print__data_rq_regs_st(struct display_mode_lib *mode_lib, display_data_rq_regs_st data_rq_regs); -void print__rq_regs_st(struct display_mode_lib *mode_lib, display_rq_regs_st rq_regs); -void print__dlg_regs_st(struct display_mode_lib *mode_lib, display_dlg_regs_st dlg_regs); -void print__ttu_regs_st(struct display_mode_lib *mode_lib, display_ttu_regs_st ttu_regs); +void print__data_rq_regs_st(struct display_mode_lib *mode_lib, const display_data_rq_regs_st *data_rq_regs); +void print__rq_regs_st(struct display_mode_lib *mode_lib, const display_rq_regs_st *rq_regs); +void print__dlg_regs_st(struct display_mode_lib *mode_lib, const display_dlg_regs_st *dlg_regs); +void print__ttu_regs_st(struct display_mode_lib *mode_lib, const display_ttu_regs_st *ttu_regs); #endif ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] Enable '-Werror' by default for all kernel builds 2021-09-07 17:33 ` [PATCH] Enable '-Werror' by default for all kernel builds Linus Torvalds @ 2021-09-07 21:07 ` Harry Wentland 2021-09-08 3:52 ` Harry Wentland 0 siblings, 1 reply; 16+ messages in thread From: Harry Wentland @ 2021-09-07 21:07 UTC (permalink / raw) To: Linus Torvalds, Arnd Bergmann, Leo Li, Alex Deucher, Christian König, Pan, Xinhui Cc: Nathan Chancellor, Guenter Roeck, Linux Kernel Mailing List, llvm, Nick Desaulniers, amd-gfx On 2021-09-07 1:33 p.m., Linus Torvalds wrote: > On Tue, Sep 7, 2021 at 10:10 AM Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> >> Do I know why? No. I do note that that code is disgusting. >> >> It's passing one of those structs around by value, for example. That's >> a 72-byte structure that is copied on the stack due to stupid calling >> conventions. Maybe clang generates a few extra temporaries for it as >> part of the function call stack setup? Who knows.. > > Ooh, yes. > > This attached patch is crap - it converts the helper functions to use > const pointers instead of passing the whole structure, but it then > only converts that one file that *uses* them. > > So the end result will not compile in general, but you can do > > make drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn30/display_rq_dlg_calc_30.o > > and it compiles for me. > > And while gcc doesn't care that much - it will apparently either > generate the argument stack every call - clang cares deeply. > > The nasty 720-byte stack frame that clang generates turns into just a > 320-byte one, and code generation in general looks a *lot* better. > > Now, as mentioned, this patch is broken and incomplete. But I really > think the AMD GPU people need to do this. It makes those functions go > from practically unusable to not horribly disgusting. > > So Harry/Leo/Alex/Christian and amd-gfx list - can you look into > making this ugly "make one file compile better" patch actually work > properly? > Yes, will take a look at this tonight. We definitely shouldn't be passing large structs by value. Harry > It *looks* lto me ike that code was perhaps written for a C++ compiler > and the helpers have been written as a "pass by reference", and the > arguments used to be > > const display_data_rq_misc_params_st& rq_misc_param > > and then the compiler will pass the argument as a pointer. And then it > was converted to C, and the "pass by reference" in the function > declaration was turned into "pass by value", to avoid changing "." to > "->" in the use. > > But a '&arg' thing in C++ really is a '*arg' pointer in C, and should > have been done as that. > > Of course, it's also possible that that code was simply written by > somebody who didn't understand just *how* horrible it is to pass > structures bigger than a word or two by value. > > Do we have a compiler warning for passing big structures by value? > > Linus > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Enable '-Werror' by default for all kernel builds 2021-09-07 21:07 ` Harry Wentland @ 2021-09-08 3:52 ` Harry Wentland 2021-09-08 4:41 ` Linus Torvalds 0 siblings, 1 reply; 16+ messages in thread From: Harry Wentland @ 2021-09-08 3:52 UTC (permalink / raw) To: Linus Torvalds, Arnd Bergmann, Leo Li, Alex Deucher, Christian König, Pan, Xinhui Cc: Nathan Chancellor, Guenter Roeck, Linux Kernel Mailing List, llvm, Nick Desaulniers, amd-gfx [-- Attachment #1: Type: text/plain, Size: 3680 bytes --] On 2021-09-07 5:07 p.m., Harry Wentland wrote: > > > On 2021-09-07 1:33 p.m., Linus Torvalds wrote: >> On Tue, Sep 7, 2021 at 10:10 AM Linus Torvalds >> <torvalds@linux-foundation.org> wrote: >>> >>> Do I know why? No. I do note that that code is disgusting. >>> >>> It's passing one of those structs around by value, for example. That's >>> a 72-byte structure that is copied on the stack due to stupid calling >>> conventions. Maybe clang generates a few extra temporaries for it as >>> part of the function call stack setup? Who knows.. >> >> Ooh, yes. >> >> This attached patch is crap - it converts the helper functions to use >> const pointers instead of passing the whole structure, but it then >> only converts that one file that *uses* them. >> >> So the end result will not compile in general, but you can do >> >> make drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn30/display_rq_dlg_calc_30.o >> >> and it compiles for me. >> >> And while gcc doesn't care that much - it will apparently either >> generate the argument stack every call - clang cares deeply. >> >> The nasty 720-byte stack frame that clang generates turns into just a >> 320-byte one, and code generation in general looks a *lot* better. >> >> Now, as mentioned, this patch is broken and incomplete. But I really >> think the AMD GPU people need to do this. It makes those functions go >> from practically unusable to not horribly disgusting. >> >> So Harry/Leo/Alex/Christian and amd-gfx list - can you look into >> making this ugly "make one file compile better" patch actually work >> properly? >> > > Yes, will take a look at this tonight. We definitely shouldn't be passing > large structs by value. > Attached patches fix these x86_64 ones reported by Nick: x86_64-alpine.log:drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:452:13: error: stack frame size (1800) exceeds limit (1280) in function 'dcn_bw_calc_rq_dlg_ttu' [-Werror,-Wframe-larger-than] x86_64-alpine.log:drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn21/display_rq_dlg_calc_21.c:1657:6: error: stack frame size (1336) exceeds limit (1280) in function 'dml21_rq_dlg_get_dlg_reg' [-Werror,-Wframe-larger-than] x86_64-alpine.log:drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn30/display_rq_dlg_calc_30.c:1831:6: error: stack frame size (1352) exceeds limit (1280) in function 'dml30_rq_dlg_get_dlg_reg' [-Werror,-Wframe-larger-than] I'm also seeing one more that might be more challenging to fix but is nearly at 1024: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn21/display_mode_vba_21.c:3397:6: error: stack frame size of 1064 bytes in function 'dml21_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than=] The attached patches build and boot without error or warning on a Radeon RX 5500 XT. Harry > Harry > >> It *looks* lto me ike that code was perhaps written for a C++ compiler >> and the helpers have been written as a "pass by reference", and the >> arguments used to be >> >> const display_data_rq_misc_params_st& rq_misc_param >> >> and then the compiler will pass the argument as a pointer. And then it >> was converted to C, and the "pass by reference" in the function >> declaration was turned into "pass by value", to avoid changing "." to >> "->" in the use. >> >> But a '&arg' thing in C++ really is a '*arg' pointer in C, and should >> have been done as that. >> >> Of course, it's also possible that that code was simply written by >> somebody who didn't understand just *how* horrible it is to pass >> structures bigger than a word or two by value. >> >> Do we have a compiler warning for passing big structures by value? >> >> Linus >> > [-- Attachment #2: 0001-drm-amd-display-Pass-display_pipe_params_st-as-const.patch --] [-- Type: text/x-patch, Size: 30495 bytes --] From 353a23b14584705cb194d9bfc91010caacaf42f8 Mon Sep 17 00:00:00 2001 From: Harry Wentland <harry.wentland@amd.com> Date: Tue, 7 Sep 2021 19:40:06 -0400 Subject: [PATCH 1/2] drm/amd/display: Pass display_pipe_params_st as const in DML [Why] This neither needs to be on the stack nor passed by value to each function call. In fact, when building with clang it seems to break the Linux's default 1024 byte stack frame limit. [How] We can simply pass this as a const pointer. Signed-off-by: Harry Wentland <harry.wentland@amd.com> --- .../drm/amd/display/dc/dcn20/dcn20_resource.c | 2 +- .../dc/dml/dcn20/display_rq_dlg_calc_20.c | 6 +- .../dc/dml/dcn20/display_rq_dlg_calc_20.h | 4 +- .../dc/dml/dcn20/display_rq_dlg_calc_20v2.c | 6 +- .../dc/dml/dcn20/display_rq_dlg_calc_20v2.h | 4 +- .../dc/dml/dcn21/display_rq_dlg_calc_21.c | 62 ++++++++-------- .../dc/dml/dcn21/display_rq_dlg_calc_21.h | 4 +- .../dc/dml/dcn30/display_rq_dlg_calc_30.c | 72 +++++++++---------- .../dc/dml/dcn30/display_rq_dlg_calc_30.h | 4 +- .../dc/dml/dcn31/display_rq_dlg_calc_31.c | 68 +++++++++--------- .../dc/dml/dcn31/display_rq_dlg_calc_31.h | 4 +- .../drm/amd/display/dc/dml/display_mode_lib.h | 4 +- 12 files changed, 120 insertions(+), 120 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c index e3e01b17c164..4389b36f0760 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c @@ -3152,7 +3152,7 @@ void dcn20_calculate_dlg_params( context->bw_ctx.dml.funcs.rq_dlg_get_rq_reg(&context->bw_ctx.dml, &context->res_ctx.pipe_ctx[i].rq_regs, - pipes[pipe_idx].pipe); + &pipes[pipe_idx].pipe); pipe_idx++; } } diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c index 2091dd8c252d..8c168f348a27 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c @@ -768,12 +768,12 @@ static void dml20_rq_dlg_get_rq_params(struct display_mode_lib *mode_lib, void dml20_rq_dlg_get_rq_reg(struct display_mode_lib *mode_lib, display_rq_regs_st *rq_regs, - const display_pipe_params_st pipe_param) + const display_pipe_params_st *pipe_param) { display_rq_params_st rq_param = {0}; memset(rq_regs, 0, sizeof(*rq_regs)); - dml20_rq_dlg_get_rq_params(mode_lib, &rq_param, pipe_param.src); + dml20_rq_dlg_get_rq_params(mode_lib, &rq_param, pipe_param->src); extract_rq_regs(mode_lib, rq_regs, rq_param); print__rq_regs_st(mode_lib, *rq_regs); @@ -1549,7 +1549,7 @@ static void dml20_rq_dlg_get_dlg_params(struct display_mode_lib *mode_lib, void dml20_rq_dlg_get_dlg_reg(struct display_mode_lib *mode_lib, display_dlg_regs_st *dlg_regs, display_ttu_regs_st *ttu_regs, - display_e2e_pipe_params_st *e2e_pipe_param, + const display_e2e_pipe_params_st *e2e_pipe_param, const unsigned int num_pipes, const unsigned int pipe_idx, const bool cstate_en, diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.h b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.h index d0b90947f540..8b23867e97c1 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.h +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.h @@ -43,7 +43,7 @@ struct display_mode_lib; void dml20_rq_dlg_get_rq_reg( struct display_mode_lib *mode_lib, display_rq_regs_st *rq_regs, - const display_pipe_params_st pipe_param); + const display_pipe_params_st *pipe_param); // Function: dml_rq_dlg_get_dlg_reg @@ -61,7 +61,7 @@ void dml20_rq_dlg_get_dlg_reg( struct display_mode_lib *mode_lib, display_dlg_regs_st *dlg_regs, display_ttu_regs_st *ttu_regs, - display_e2e_pipe_params_st *e2e_pipe_param, + const display_e2e_pipe_params_st *e2e_pipe_param, const unsigned int num_pipes, const unsigned int pipe_idx, const bool cstate_en, diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c index 1a0c14e465fa..26ececfd40cd 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.c @@ -768,12 +768,12 @@ static void dml20v2_rq_dlg_get_rq_params(struct display_mode_lib *mode_lib, void dml20v2_rq_dlg_get_rq_reg(struct display_mode_lib *mode_lib, display_rq_regs_st *rq_regs, - const display_pipe_params_st pipe_param) + const display_pipe_params_st *pipe_param) { display_rq_params_st rq_param = {0}; memset(rq_regs, 0, sizeof(*rq_regs)); - dml20v2_rq_dlg_get_rq_params(mode_lib, &rq_param, pipe_param.src); + dml20v2_rq_dlg_get_rq_params(mode_lib, &rq_param, pipe_param->src); extract_rq_regs(mode_lib, rq_regs, rq_param); print__rq_regs_st(mode_lib, *rq_regs); @@ -1550,7 +1550,7 @@ static void dml20v2_rq_dlg_get_dlg_params(struct display_mode_lib *mode_lib, void dml20v2_rq_dlg_get_dlg_reg(struct display_mode_lib *mode_lib, display_dlg_regs_st *dlg_regs, display_ttu_regs_st *ttu_regs, - display_e2e_pipe_params_st *e2e_pipe_param, + const display_e2e_pipe_params_st *e2e_pipe_param, const unsigned int num_pipes, const unsigned int pipe_idx, const bool cstate_en, diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.h b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.h index 27cf8bed9376..2b4e46ea1c3d 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.h +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20v2.h @@ -43,7 +43,7 @@ struct display_mode_lib; void dml20v2_rq_dlg_get_rq_reg( struct display_mode_lib *mode_lib, display_rq_regs_st *rq_regs, - const display_pipe_params_st pipe_param); + const display_pipe_params_st *pipe_param); // Function: dml_rq_dlg_get_dlg_reg @@ -61,7 +61,7 @@ void dml20v2_rq_dlg_get_dlg_reg( struct display_mode_lib *mode_lib, display_dlg_regs_st *dlg_regs, display_ttu_regs_st *ttu_regs, - display_e2e_pipe_params_st *e2e_pipe_param, + const display_e2e_pipe_params_st *e2e_pipe_param, const unsigned int num_pipes, const unsigned int pipe_idx, const bool cstate_en, diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c b/drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c index 287e31052b30..736978c4d40a 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.c @@ -694,7 +694,7 @@ static void get_surf_rq_param( display_data_rq_sizing_params_st *rq_sizing_param, display_data_rq_dlg_params_st *rq_dlg_param, display_data_rq_misc_params_st *rq_misc_param, - const display_pipe_params_st pipe_param, + const display_pipe_params_st *pipe_param, bool is_chroma) { bool mode_422 = false; @@ -706,30 +706,30 @@ static void get_surf_rq_param( // FIXME check if ppe apply for both luma and chroma in 422 case if (is_chroma) { - vp_width = pipe_param.src.viewport_width_c / ppe; - vp_height = pipe_param.src.viewport_height_c; - data_pitch = pipe_param.src.data_pitch_c; - meta_pitch = pipe_param.src.meta_pitch_c; + vp_width = pipe_param->src.viewport_width_c / ppe; + vp_height = pipe_param->src.viewport_height_c; + data_pitch = pipe_param->src.data_pitch_c; + meta_pitch = pipe_param->src.meta_pitch_c; } else { - vp_width = pipe_param.src.viewport_width / ppe; - vp_height = pipe_param.src.viewport_height; - data_pitch = pipe_param.src.data_pitch; - meta_pitch = pipe_param.src.meta_pitch; + vp_width = pipe_param->src.viewport_width / ppe; + vp_height = pipe_param->src.viewport_height; + data_pitch = pipe_param->src.data_pitch; + meta_pitch = pipe_param->src.meta_pitch; } - if (pipe_param.dest.odm_combine) { + if (pipe_param->dest.odm_combine) { unsigned int access_dir; unsigned int full_src_vp_width; unsigned int hactive_half; unsigned int src_hactive_half; - access_dir = (pipe_param.src.source_scan == dm_vert); // vp access direction: horizontal or vertical accessed - hactive_half = pipe_param.dest.hactive / 2; + access_dir = (pipe_param->src.source_scan == dm_vert); // vp access direction: horizontal or vertical accessed + hactive_half = pipe_param->dest.hactive / 2; if (is_chroma) { - full_src_vp_width = pipe_param.scale_ratio_depth.hscl_ratio_c * pipe_param.dest.full_recout_width; - src_hactive_half = pipe_param.scale_ratio_depth.hscl_ratio_c * hactive_half; + full_src_vp_width = pipe_param->scale_ratio_depth.hscl_ratio_c * pipe_param->dest.full_recout_width; + src_hactive_half = pipe_param->scale_ratio_depth.hscl_ratio_c * hactive_half; } else { - full_src_vp_width = pipe_param.scale_ratio_depth.hscl_ratio * pipe_param.dest.full_recout_width; - src_hactive_half = pipe_param.scale_ratio_depth.hscl_ratio * hactive_half; + full_src_vp_width = pipe_param->scale_ratio_depth.hscl_ratio * pipe_param->dest.full_recout_width; + src_hactive_half = pipe_param->scale_ratio_depth.hscl_ratio * hactive_half; } if (access_dir == 0) { @@ -754,7 +754,7 @@ static void get_surf_rq_param( rq_sizing_param->meta_chunk_bytes = 2048; rq_sizing_param->min_meta_chunk_bytes = 256; - if (pipe_param.src.hostvm) + if (pipe_param->src.hostvm) rq_sizing_param->mpte_group_bytes = 512; else rq_sizing_param->mpte_group_bytes = 2048; @@ -768,23 +768,23 @@ static void get_surf_rq_param( vp_height, data_pitch, meta_pitch, - pipe_param.src.source_format, - pipe_param.src.sw_mode, - pipe_param.src.macro_tile_size, - pipe_param.src.source_scan, - pipe_param.src.hostvm, + pipe_param->src.source_format, + pipe_param->src.sw_mode, + pipe_param->src.macro_tile_size, + pipe_param->src.source_scan, + pipe_param->src.hostvm, is_chroma); } static void dml_rq_dlg_get_rq_params( struct display_mode_lib *mode_lib, display_rq_params_st *rq_param, - const display_pipe_params_st pipe_param) + const display_pipe_params_st *pipe_param) { // get param for luma surface - rq_param->yuv420 = pipe_param.src.source_format == dm_420_8 - || pipe_param.src.source_format == dm_420_10; - rq_param->yuv420_10bpc = pipe_param.src.source_format == dm_420_10; + rq_param->yuv420 = pipe_param->src.source_format == dm_420_8 + || pipe_param->src.source_format == dm_420_10; + rq_param->yuv420_10bpc = pipe_param->src.source_format == dm_420_10; get_surf_rq_param( mode_lib, @@ -794,7 +794,7 @@ static void dml_rq_dlg_get_rq_params( pipe_param, 0); - if (is_dual_plane((enum source_format_class) (pipe_param.src.source_format))) { + if (is_dual_plane((enum source_format_class) (pipe_param->src.source_format))) { // get param for chroma surface get_surf_rq_param( mode_lib, @@ -806,14 +806,14 @@ static void dml_rq_dlg_get_rq_params( } // calculate how to split the det buffer space between luma and chroma - handle_det_buf_split(mode_lib, rq_param, pipe_param.src); + handle_det_buf_split(mode_lib, rq_param, pipe_param->src); print__rq_params_st(mode_lib, *rq_param); } void dml21_rq_dlg_get_rq_reg( struct display_mode_lib *mode_lib, display_rq_regs_st *rq_regs, - const display_pipe_params_st pipe_param) + const display_pipe_params_st *pipe_param) { display_rq_params_st rq_param = {0}; @@ -1658,7 +1658,7 @@ void dml21_rq_dlg_get_dlg_reg( struct display_mode_lib *mode_lib, display_dlg_regs_st *dlg_regs, display_ttu_regs_st *ttu_regs, - display_e2e_pipe_params_st *e2e_pipe_param, + const display_e2e_pipe_params_st *e2e_pipe_param, const unsigned int num_pipes, const unsigned int pipe_idx, const bool cstate_en, @@ -1696,7 +1696,7 @@ void dml21_rq_dlg_get_dlg_reg( // system parameter calculation done dml_print("DML_DLG: Calculation for pipe[%d] start\n\n", pipe_idx); - dml_rq_dlg_get_rq_params(mode_lib, &rq_param, e2e_pipe_param[pipe_idx].pipe); + dml_rq_dlg_get_rq_params(mode_lib, &rq_param, &e2e_pipe_param[pipe_idx].pipe); dml_rq_dlg_get_dlg_params( mode_lib, e2e_pipe_param, diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.h b/drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.h index e8f7785e3fc6..af6ad0ca9cf8 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.h +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn21/display_rq_dlg_calc_21.h @@ -44,7 +44,7 @@ struct display_mode_lib; void dml21_rq_dlg_get_rq_reg( struct display_mode_lib *mode_lib, display_rq_regs_st *rq_regs, - const display_pipe_params_st pipe_param); + const display_pipe_params_st *pipe_param); // Function: dml_rq_dlg_get_dlg_reg // Calculate and return DLG and TTU register struct given the system setting @@ -61,7 +61,7 @@ void dml21_rq_dlg_get_dlg_reg( struct display_mode_lib *mode_lib, display_dlg_regs_st *dlg_regs, display_ttu_regs_st *ttu_regs, - display_e2e_pipe_params_st *e2e_pipe_param, + const display_e2e_pipe_params_st *e2e_pipe_param, const unsigned int num_pipes, const unsigned int pipe_idx, const bool cstate_en, diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.c b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.c index 0d934fae1c3a..2120e0941a09 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.c @@ -747,7 +747,7 @@ static void get_surf_rq_param(struct display_mode_lib *mode_lib, display_data_rq_sizing_params_st *rq_sizing_param, display_data_rq_dlg_params_st *rq_dlg_param, display_data_rq_misc_params_st *rq_misc_param, - const display_pipe_params_st pipe_param, + const display_pipe_params_st *pipe_param, bool is_chroma, bool is_alpha) { @@ -761,32 +761,32 @@ static void get_surf_rq_param(struct display_mode_lib *mode_lib, // FIXME check if ppe apply for both luma and chroma in 422 case if (is_chroma | is_alpha) { - vp_width = pipe_param.src.viewport_width_c / ppe; - vp_height = pipe_param.src.viewport_height_c; - data_pitch = pipe_param.src.data_pitch_c; - meta_pitch = pipe_param.src.meta_pitch_c; - surface_height = pipe_param.src.surface_height_y / 2.0; + vp_width = pipe_param->src.viewport_width_c / ppe; + vp_height = pipe_param->src.viewport_height_c; + data_pitch = pipe_param->src.data_pitch_c; + meta_pitch = pipe_param->src.meta_pitch_c; + surface_height = pipe_param->src.surface_height_y / 2.0; } else { - vp_width = pipe_param.src.viewport_width / ppe; - vp_height = pipe_param.src.viewport_height; - data_pitch = pipe_param.src.data_pitch; - meta_pitch = pipe_param.src.meta_pitch; - surface_height = pipe_param.src.surface_height_y; + vp_width = pipe_param->src.viewport_width / ppe; + vp_height = pipe_param->src.viewport_height; + data_pitch = pipe_param->src.data_pitch; + meta_pitch = pipe_param->src.meta_pitch; + surface_height = pipe_param->src.surface_height_y; } - if (pipe_param.dest.odm_combine) { + if (pipe_param->dest.odm_combine) { unsigned int access_dir = 0; unsigned int full_src_vp_width = 0; unsigned int hactive_odm = 0; unsigned int src_hactive_odm = 0; - access_dir = (pipe_param.src.source_scan == dm_vert); // vp access direction: horizontal or vertical accessed - hactive_odm = pipe_param.dest.hactive / ((unsigned int)pipe_param.dest.odm_combine*2); + access_dir = (pipe_param->src.source_scan == dm_vert); // vp access direction: horizontal or vertical accessed + hactive_odm = pipe_param->dest.hactive / ((unsigned int) pipe_param->dest.odm_combine*2); if (is_chroma) { - full_src_vp_width = pipe_param.scale_ratio_depth.hscl_ratio_c * pipe_param.dest.full_recout_width; - src_hactive_odm = pipe_param.scale_ratio_depth.hscl_ratio_c * hactive_odm; + full_src_vp_width = pipe_param->scale_ratio_depth.hscl_ratio_c * pipe_param->dest.full_recout_width; + src_hactive_odm = pipe_param->scale_ratio_depth.hscl_ratio_c * hactive_odm; } else { - full_src_vp_width = pipe_param.scale_ratio_depth.hscl_ratio * pipe_param.dest.full_recout_width; - src_hactive_odm = pipe_param.scale_ratio_depth.hscl_ratio * hactive_odm; + full_src_vp_width = pipe_param->scale_ratio_depth.hscl_ratio * pipe_param->dest.full_recout_width; + src_hactive_odm = pipe_param->scale_ratio_depth.hscl_ratio * hactive_odm; } if (access_dir == 0) { @@ -815,7 +815,7 @@ static void get_surf_rq_param(struct display_mode_lib *mode_lib, rq_sizing_param->meta_chunk_bytes = 2048; rq_sizing_param->min_meta_chunk_bytes = 256; - if (pipe_param.src.hostvm) + if (pipe_param->src.hostvm) rq_sizing_param->mpte_group_bytes = 512; else rq_sizing_param->mpte_group_bytes = 2048; @@ -828,28 +828,28 @@ static void get_surf_rq_param(struct display_mode_lib *mode_lib, vp_height, data_pitch, meta_pitch, - pipe_param.src.source_format, - pipe_param.src.sw_mode, - pipe_param.src.macro_tile_size, - pipe_param.src.source_scan, - pipe_param.src.hostvm, + pipe_param->src.source_format, + pipe_param->src.sw_mode, + pipe_param->src.macro_tile_size, + pipe_param->src.source_scan, + pipe_param->src.hostvm, is_chroma, surface_height); } static void dml_rq_dlg_get_rq_params(struct display_mode_lib *mode_lib, display_rq_params_st *rq_param, - const display_pipe_params_st pipe_param) + const display_pipe_params_st *pipe_param) { // get param for luma surface - rq_param->yuv420 = pipe_param.src.source_format == dm_420_8 - || pipe_param.src.source_format == dm_420_10 - || pipe_param.src.source_format == dm_rgbe_alpha - || pipe_param.src.source_format == dm_420_12; + rq_param->yuv420 = pipe_param->src.source_format == dm_420_8 + || pipe_param->src.source_format == dm_420_10 + || pipe_param->src.source_format == dm_rgbe_alpha + || pipe_param->src.source_format == dm_420_12; - rq_param->yuv420_10bpc = pipe_param.src.source_format == dm_420_10; + rq_param->yuv420_10bpc = pipe_param->src.source_format == dm_420_10; - rq_param->rgbe_alpha = (pipe_param.src.source_format == dm_rgbe_alpha)?1:0; + rq_param->rgbe_alpha = (pipe_param->src.source_format == dm_rgbe_alpha)?1:0; get_surf_rq_param(mode_lib, &(rq_param->sizing.rq_l), @@ -859,7 +859,7 @@ static void dml_rq_dlg_get_rq_params(struct display_mode_lib *mode_lib, 0, 0); - if (is_dual_plane((enum source_format_class)(pipe_param.src.source_format))) { + if (is_dual_plane((enum source_format_class)(pipe_param->src.source_format))) { // get param for chroma surface get_surf_rq_param(mode_lib, &(rq_param->sizing.rq_c), @@ -871,13 +871,13 @@ static void dml_rq_dlg_get_rq_params(struct display_mode_lib *mode_lib, } // calculate how to split the det buffer space between luma and chroma - handle_det_buf_split(mode_lib, rq_param, pipe_param.src); + handle_det_buf_split(mode_lib, rq_param, pipe_param->src); print__rq_params_st(mode_lib, *rq_param); } void dml30_rq_dlg_get_rq_reg(struct display_mode_lib *mode_lib, display_rq_regs_st *rq_regs, - const display_pipe_params_st pipe_param) + const display_pipe_params_st *pipe_param) { display_rq_params_st rq_param = { 0 }; @@ -1831,7 +1831,7 @@ static void dml_rq_dlg_get_dlg_params(struct display_mode_lib *mode_lib, void dml30_rq_dlg_get_dlg_reg(struct display_mode_lib *mode_lib, display_dlg_regs_st *dlg_regs, display_ttu_regs_st *ttu_regs, - display_e2e_pipe_params_st *e2e_pipe_param, + const display_e2e_pipe_params_st *e2e_pipe_param, const unsigned int num_pipes, const unsigned int pipe_idx, const bool cstate_en, @@ -1866,7 +1866,7 @@ void dml30_rq_dlg_get_dlg_reg(struct display_mode_lib *mode_lib, // system parameter calculation done dml_print("DML_DLG: Calculation for pipe[%d] start\n\n", pipe_idx); - dml_rq_dlg_get_rq_params(mode_lib, &rq_param, e2e_pipe_param[pipe_idx].pipe); + dml_rq_dlg_get_rq_params(mode_lib, &rq_param, &e2e_pipe_param[pipe_idx].pipe); dml_rq_dlg_get_dlg_params(mode_lib, e2e_pipe_param, num_pipes, diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.h b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.h index c04965cceff3..625e41f8d575 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.h +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn30/display_rq_dlg_calc_30.h @@ -41,7 +41,7 @@ struct display_mode_lib; // See also: <display_rq_regs_st> void dml30_rq_dlg_get_rq_reg(struct display_mode_lib *mode_lib, display_rq_regs_st *rq_regs, - const display_pipe_params_st pipe_param); + const display_pipe_params_st *pipe_param); // Function: dml_rq_dlg_get_dlg_reg // Calculate and return DLG and TTU register struct given the system setting @@ -57,7 +57,7 @@ void dml30_rq_dlg_get_rq_reg(struct display_mode_lib *mode_lib, void dml30_rq_dlg_get_dlg_reg(struct display_mode_lib *mode_lib, display_dlg_regs_st *dlg_regs, display_ttu_regs_st *ttu_regs, - display_e2e_pipe_params_st *e2e_pipe_param, + const display_e2e_pipe_params_st *e2e_pipe_param, const unsigned int num_pipes, const unsigned int pipe_idx, const bool cstate_en, diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_rq_dlg_calc_31.c b/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_rq_dlg_calc_31.c index c23905bc733a..57bd4e3f8a82 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_rq_dlg_calc_31.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_rq_dlg_calc_31.c @@ -738,7 +738,7 @@ static void get_surf_rq_param( display_data_rq_sizing_params_st *rq_sizing_param, display_data_rq_dlg_params_st *rq_dlg_param, display_data_rq_misc_params_st *rq_misc_param, - const display_pipe_params_st pipe_param, + const display_pipe_params_st *pipe_param, bool is_chroma, bool is_alpha) { @@ -752,33 +752,33 @@ static void get_surf_rq_param( // FIXME check if ppe apply for both luma and chroma in 422 case if (is_chroma | is_alpha) { - vp_width = pipe_param.src.viewport_width_c / ppe; - vp_height = pipe_param.src.viewport_height_c; - data_pitch = pipe_param.src.data_pitch_c; - meta_pitch = pipe_param.src.meta_pitch_c; - surface_height = pipe_param.src.surface_height_y / 2.0; + vp_width = pipe_param->src.viewport_width_c / ppe; + vp_height = pipe_param->src.viewport_height_c; + data_pitch = pipe_param->src.data_pitch_c; + meta_pitch = pipe_param->src.meta_pitch_c; + surface_height = pipe_param->src.surface_height_y / 2.0; } else { - vp_width = pipe_param.src.viewport_width / ppe; - vp_height = pipe_param.src.viewport_height; - data_pitch = pipe_param.src.data_pitch; - meta_pitch = pipe_param.src.meta_pitch; - surface_height = pipe_param.src.surface_height_y; + vp_width = pipe_param->src.viewport_width / ppe; + vp_height = pipe_param->src.viewport_height; + data_pitch = pipe_param->src.data_pitch; + meta_pitch = pipe_param->src.meta_pitch; + surface_height = pipe_param->src.surface_height_y; } - if (pipe_param.dest.odm_combine) { + if (pipe_param->dest.odm_combine) { unsigned int access_dir; unsigned int full_src_vp_width; unsigned int hactive_odm; unsigned int src_hactive_odm; - access_dir = (pipe_param.src.source_scan == dm_vert); // vp access direction: horizontal or vertical accessed - hactive_odm = pipe_param.dest.hactive / ((unsigned int) pipe_param.dest.odm_combine * 2); + access_dir = (pipe_param->src.source_scan == dm_vert); // vp access direction: horizontal or vertical accessed + hactive_odm = pipe_param->dest.hactive / ((unsigned int) pipe_param->dest.odm_combine * 2); if (is_chroma) { - full_src_vp_width = pipe_param.scale_ratio_depth.hscl_ratio_c * pipe_param.dest.full_recout_width; - src_hactive_odm = pipe_param.scale_ratio_depth.hscl_ratio_c * hactive_odm; + full_src_vp_width = pipe_param->scale_ratio_depth.hscl_ratio_c * pipe_param->dest.full_recout_width; + src_hactive_odm = pipe_param->scale_ratio_depth.hscl_ratio_c * hactive_odm; } else { - full_src_vp_width = pipe_param.scale_ratio_depth.hscl_ratio * pipe_param.dest.full_recout_width; - src_hactive_odm = pipe_param.scale_ratio_depth.hscl_ratio * hactive_odm; + full_src_vp_width = pipe_param->scale_ratio_depth.hscl_ratio * pipe_param->dest.full_recout_width; + src_hactive_odm = pipe_param->scale_ratio_depth.hscl_ratio * hactive_odm; } if (access_dir == 0) { @@ -808,7 +808,7 @@ static void get_surf_rq_param( rq_sizing_param->meta_chunk_bytes = 2048; rq_sizing_param->min_meta_chunk_bytes = 256; - if (pipe_param.src.hostvm) + if (pipe_param->src.hostvm) rq_sizing_param->mpte_group_bytes = 512; else rq_sizing_param->mpte_group_bytes = 2048; @@ -822,38 +822,38 @@ static void get_surf_rq_param( vp_height, data_pitch, meta_pitch, - pipe_param.src.source_format, - pipe_param.src.sw_mode, - pipe_param.src.macro_tile_size, - pipe_param.src.source_scan, - pipe_param.src.hostvm, + pipe_param->src.source_format, + pipe_param->src.sw_mode, + pipe_param->src.macro_tile_size, + pipe_param->src.source_scan, + pipe_param->src.hostvm, is_chroma, surface_height); } -static void dml_rq_dlg_get_rq_params(struct display_mode_lib *mode_lib, display_rq_params_st *rq_param, const display_pipe_params_st pipe_param) +static void dml_rq_dlg_get_rq_params(struct display_mode_lib *mode_lib, display_rq_params_st *rq_param, const display_pipe_params_st *pipe_param) { // get param for luma surface - rq_param->yuv420 = pipe_param.src.source_format == dm_420_8 || pipe_param.src.source_format == dm_420_10 || pipe_param.src.source_format == dm_rgbe_alpha - || pipe_param.src.source_format == dm_420_12; + rq_param->yuv420 = pipe_param->src.source_format == dm_420_8 || pipe_param->src.source_format == dm_420_10 || pipe_param->src.source_format == dm_rgbe_alpha + || pipe_param->src.source_format == dm_420_12; - rq_param->yuv420_10bpc = pipe_param.src.source_format == dm_420_10; + rq_param->yuv420_10bpc = pipe_param->src.source_format == dm_420_10; - rq_param->rgbe_alpha = (pipe_param.src.source_format == dm_rgbe_alpha) ? 1 : 0; + rq_param->rgbe_alpha = (pipe_param->src.source_format == dm_rgbe_alpha) ? 1 : 0; get_surf_rq_param(mode_lib, &(rq_param->sizing.rq_l), &(rq_param->dlg.rq_l), &(rq_param->misc.rq_l), pipe_param, 0, 0); - if (is_dual_plane((enum source_format_class) (pipe_param.src.source_format))) { + if (is_dual_plane((enum source_format_class) (pipe_param->src.source_format))) { // get param for chroma surface get_surf_rq_param(mode_lib, &(rq_param->sizing.rq_c), &(rq_param->dlg.rq_c), &(rq_param->misc.rq_c), pipe_param, 1, rq_param->rgbe_alpha); } // calculate how to split the det buffer space between luma and chroma - handle_det_buf_split(mode_lib, rq_param, pipe_param.src); + handle_det_buf_split(mode_lib, rq_param, pipe_param->src); print__rq_params_st(mode_lib, *rq_param); } -void dml31_rq_dlg_get_rq_reg(struct display_mode_lib *mode_lib, display_rq_regs_st *rq_regs, const display_pipe_params_st pipe_param) +void dml31_rq_dlg_get_rq_reg(struct display_mode_lib *mode_lib, display_rq_regs_st *rq_regs, const display_pipe_params_st *pipe_param) { display_rq_params_st rq_param = {0}; @@ -1677,7 +1677,7 @@ void dml31_rq_dlg_get_dlg_reg( struct display_mode_lib *mode_lib, display_dlg_regs_st *dlg_regs, display_ttu_regs_st *ttu_regs, - display_e2e_pipe_params_st *e2e_pipe_param, + const display_e2e_pipe_params_st *e2e_pipe_param, const unsigned int num_pipes, const unsigned int pipe_idx, const bool cstate_en, @@ -1704,7 +1704,7 @@ void dml31_rq_dlg_get_dlg_reg( // system parameter calculation done dml_print("DML_DLG: Calculation for pipe[%d] start\n\n", pipe_idx); - dml_rq_dlg_get_rq_params(mode_lib, &rq_param, e2e_pipe_param[pipe_idx].pipe); + dml_rq_dlg_get_rq_params(mode_lib, &rq_param, &e2e_pipe_param[pipe_idx].pipe); dml_rq_dlg_get_dlg_params( mode_lib, e2e_pipe_param, diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_rq_dlg_calc_31.h b/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_rq_dlg_calc_31.h index adf8518f761f..8ee991351699 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_rq_dlg_calc_31.h +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn31/display_rq_dlg_calc_31.h @@ -41,7 +41,7 @@ struct display_mode_lib; // See also: <display_rq_regs_st> void dml31_rq_dlg_get_rq_reg(struct display_mode_lib *mode_lib, display_rq_regs_st *rq_regs, - const display_pipe_params_st pipe_param); + const display_pipe_params_st *pipe_param); // Function: dml_rq_dlg_get_dlg_reg // Calculate and return DLG and TTU register struct given the system setting @@ -57,7 +57,7 @@ void dml31_rq_dlg_get_rq_reg(struct display_mode_lib *mode_lib, void dml31_rq_dlg_get_dlg_reg(struct display_mode_lib *mode_lib, display_dlg_regs_st *dlg_regs, display_ttu_regs_st *ttu_regs, - display_e2e_pipe_params_st *e2e_pipe_param, + const display_e2e_pipe_params_st *e2e_pipe_param, const unsigned int num_pipes, const unsigned int pipe_idx, const bool cstate_en, diff --git a/drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.h b/drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.h index d42a0aeca6be..72b1957022aa 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.h +++ b/drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.h @@ -49,7 +49,7 @@ struct dml_funcs { struct display_mode_lib *mode_lib, display_dlg_regs_st *dlg_regs, display_ttu_regs_st *ttu_regs, - display_e2e_pipe_params_st *e2e_pipe_param, + const display_e2e_pipe_params_st *e2e_pipe_param, const unsigned int num_pipes, const unsigned int pipe_idx, const bool cstate_en, @@ -60,7 +60,7 @@ struct dml_funcs { void (*rq_dlg_get_rq_reg)( struct display_mode_lib *mode_lib, display_rq_regs_st *rq_regs, - const display_pipe_params_st pipe_param); + const display_pipe_params_st *pipe_param); void (*recalculate)(struct display_mode_lib *mode_lib); void (*validate)(struct display_mode_lib *mode_lib); }; -- 2.33.0 [-- Attachment #3: 0002-drm-amd-display-Allocate-structs-needed-by-dcn_bw_ca.patch --] [-- Type: text/x-patch, Size: 22557 bytes --] From eece72e18fc9afb237f11077c0c57d82c59b1d1f Mon Sep 17 00:00:00 2001 From: Harry Wentland <harry.wentland@amd.com> Date: Tue, 7 Sep 2021 20:22:13 -0400 Subject: [PATCH 2/2] drm/amd/display: Allocate structs needed by dcn_bw_calc_rq_dlg_ttu in pipe_ctx [Why & How] dcn_bw_calc_rq_dlg_ttu uses a stack frame great than 1024. To solve this we could allocate the rq_param, dlg_sys_param, and input structs dynamically. Since this function is inside a kernel_fpu_begin()/end() call we want to avoid memory allocation. Instead it's much safer to pre-allocate these on the pipe_ctx. Signed-off-by: Harry Wentland <harry.wentland@amd.com> --- .../gpu/drm/amd/display/dc/calcs/dcn_calcs.c | 55 +++--- .../display/dc/dml/dml1_display_rq_dlg_calc.c | 170 +++++++++--------- .../display/dc/dml/dml1_display_rq_dlg_calc.h | 8 +- .../gpu/drm/amd/display/dc/inc/core_types.h | 3 + 4 files changed, 121 insertions(+), 115 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c index 0e18df1283b6..e64145b0a32b 100644 --- a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c +++ b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c @@ -459,9 +459,9 @@ static void dcn_bw_calc_rq_dlg_ttu( struct _vcs_dpi_display_dlg_regs_st *dlg_regs = &pipe->dlg_regs; struct _vcs_dpi_display_ttu_regs_st *ttu_regs = &pipe->ttu_regs; struct _vcs_dpi_display_rq_regs_st *rq_regs = &pipe->rq_regs; - struct _vcs_dpi_display_rq_params_st rq_param = {0}; - struct _vcs_dpi_display_dlg_sys_params_st dlg_sys_param = {0}; - struct _vcs_dpi_display_e2e_pipe_params_st input = { { { 0 } } }; + struct _vcs_dpi_display_rq_params_st *rq_param = &pipe->dml_rq_param; + struct _vcs_dpi_display_dlg_sys_params_st *dlg_sys_param = &pipe->dml_dlg_sys_param; + struct _vcs_dpi_display_e2e_pipe_params_st *input = &pipe->dml_input; float total_active_bw = 0; float total_prefetch_bw = 0; int total_flip_bytes = 0; @@ -470,45 +470,48 @@ static void dcn_bw_calc_rq_dlg_ttu( memset(dlg_regs, 0, sizeof(*dlg_regs)); memset(ttu_regs, 0, sizeof(*ttu_regs)); memset(rq_regs, 0, sizeof(*rq_regs)); + memset(rq_param, 0, sizeof(*rq_param)); + memset(dlg_sys_param, 0, sizeof(*dlg_sys_param)); + memset(input, 0, sizeof(*input)); for (i = 0; i < number_of_planes; i++) { total_active_bw += v->read_bandwidth[i]; total_prefetch_bw += v->prefetch_bandwidth[i]; total_flip_bytes += v->total_immediate_flip_bytes[i]; } - dlg_sys_param.total_flip_bw = v->return_bw - dcn_bw_max2(total_active_bw, total_prefetch_bw); - if (dlg_sys_param.total_flip_bw < 0.0) - dlg_sys_param.total_flip_bw = 0; - - dlg_sys_param.t_mclk_wm_us = v->dram_clock_change_watermark; - dlg_sys_param.t_sr_wm_us = v->stutter_enter_plus_exit_watermark; - dlg_sys_param.t_urg_wm_us = v->urgent_watermark; - dlg_sys_param.t_extra_us = v->urgent_extra_latency; - dlg_sys_param.deepsleep_dcfclk_mhz = v->dcf_clk_deep_sleep; - dlg_sys_param.total_flip_bytes = total_flip_bytes; - - pipe_ctx_to_e2e_pipe_params(pipe, &input.pipe); - input.clks_cfg.dcfclk_mhz = v->dcfclk; - input.clks_cfg.dispclk_mhz = v->dispclk; - input.clks_cfg.dppclk_mhz = v->dppclk; - input.clks_cfg.refclk_mhz = dc->res_pool->ref_clocks.dchub_ref_clock_inKhz / 1000.0; - input.clks_cfg.socclk_mhz = v->socclk; - input.clks_cfg.voltage = v->voltage_level; + dlg_sys_param->total_flip_bw = v->return_bw - dcn_bw_max2(total_active_bw, total_prefetch_bw); + if (dlg_sys_param->total_flip_bw < 0.0) + dlg_sys_param->total_flip_bw = 0; + + dlg_sys_param->t_mclk_wm_us = v->dram_clock_change_watermark; + dlg_sys_param->t_sr_wm_us = v->stutter_enter_plus_exit_watermark; + dlg_sys_param->t_urg_wm_us = v->urgent_watermark; + dlg_sys_param->t_extra_us = v->urgent_extra_latency; + dlg_sys_param->deepsleep_dcfclk_mhz = v->dcf_clk_deep_sleep; + dlg_sys_param->total_flip_bytes = total_flip_bytes; + + pipe_ctx_to_e2e_pipe_params(pipe, &input->pipe); + input->clks_cfg.dcfclk_mhz = v->dcfclk; + input->clks_cfg.dispclk_mhz = v->dispclk; + input->clks_cfg.dppclk_mhz = v->dppclk; + input->clks_cfg.refclk_mhz = dc->res_pool->ref_clocks.dchub_ref_clock_inKhz / 1000.0; + input->clks_cfg.socclk_mhz = v->socclk; + input->clks_cfg.voltage = v->voltage_level; // dc->dml.logger = pool->base.logger; - input.dout.output_format = (v->output_format[in_idx] == dcn_bw_420) ? dm_420 : dm_444; - input.dout.output_type = (v->output[in_idx] == dcn_bw_hdmi) ? dm_hdmi : dm_dp; + input->dout.output_format = (v->output_format[in_idx] == dcn_bw_420) ? dm_420 : dm_444; + input->dout.output_type = (v->output[in_idx] == dcn_bw_hdmi) ? dm_hdmi : dm_dp; //input[in_idx].dout.output_standard; /*todo: soc->sr_enter_plus_exit_time??*/ - dlg_sys_param.t_srx_delay_us = dc->dcn_ip->dcfclk_cstate_latency / v->dcf_clk_deep_sleep; + dlg_sys_param->t_srx_delay_us = dc->dcn_ip->dcfclk_cstate_latency / v->dcf_clk_deep_sleep; - dml1_rq_dlg_get_rq_params(dml, &rq_param, input.pipe.src); + dml1_rq_dlg_get_rq_params(dml, rq_param, input->pipe.src); dml1_extract_rq_regs(dml, rq_regs, rq_param); dml1_rq_dlg_get_dlg_params( dml, dlg_regs, ttu_regs, - rq_param.dlg, + &rq_param->dlg, dlg_sys_param, input, true, diff --git a/drivers/gpu/drm/amd/display/dc/dml/dml1_display_rq_dlg_calc.c b/drivers/gpu/drm/amd/display/dc/dml/dml1_display_rq_dlg_calc.c index 8f2b1684c231..0561374cdada 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dml1_display_rq_dlg_calc.c +++ b/drivers/gpu/drm/amd/display/dc/dml/dml1_display_rq_dlg_calc.c @@ -231,22 +231,22 @@ static void extract_rq_sizing_regs( void dml1_extract_rq_regs( struct display_mode_lib *mode_lib, struct _vcs_dpi_display_rq_regs_st *rq_regs, - const struct _vcs_dpi_display_rq_params_st rq_param) + const struct _vcs_dpi_display_rq_params_st *rq_param) { unsigned int detile_buf_size_in_bytes = mode_lib->ip.det_buffer_size_kbytes * 1024; unsigned int detile_buf_plane1_addr = 0; - extract_rq_sizing_regs(mode_lib, &(rq_regs->rq_regs_l), rq_param.sizing.rq_l); - if (rq_param.yuv420) - extract_rq_sizing_regs(mode_lib, &(rq_regs->rq_regs_c), rq_param.sizing.rq_c); + extract_rq_sizing_regs(mode_lib, &(rq_regs->rq_regs_l), rq_param->sizing.rq_l); + if (rq_param->yuv420) + extract_rq_sizing_regs(mode_lib, &(rq_regs->rq_regs_c), rq_param->sizing.rq_c); - rq_regs->rq_regs_l.swath_height = dml_log2(rq_param.dlg.rq_l.swath_height); - rq_regs->rq_regs_c.swath_height = dml_log2(rq_param.dlg.rq_c.swath_height); + rq_regs->rq_regs_l.swath_height = dml_log2(rq_param->dlg.rq_l.swath_height); + rq_regs->rq_regs_c.swath_height = dml_log2(rq_param->dlg.rq_c.swath_height); /* TODO: take the max between luma, chroma chunk size? * okay for now, as we are setting chunk_bytes to 8kb anyways */ - if (rq_param.sizing.rq_l.chunk_bytes >= 32 * 1024) { /*32kb */ + if (rq_param->sizing.rq_l.chunk_bytes >= 32 * 1024) { /*32kb */ rq_regs->drq_expansion_mode = 0; } else { rq_regs->drq_expansion_mode = 2; @@ -255,9 +255,9 @@ void dml1_extract_rq_regs( rq_regs->mrq_expansion_mode = 1; rq_regs->crq_expansion_mode = 1; - if (rq_param.yuv420) { - if ((double) rq_param.misc.rq_l.stored_swath_bytes - / (double) rq_param.misc.rq_c.stored_swath_bytes <= 1.5) { + if (rq_param->yuv420) { + if ((double) rq_param->misc.rq_l.stored_swath_bytes + / (double) rq_param->misc.rq_c.stored_swath_bytes <= 1.5) { detile_buf_plane1_addr = (detile_buf_size_in_bytes / 2.0 / 64.0); /* half to chroma */ } else { detile_buf_plane1_addr = dml_round_to_multiple( @@ -998,26 +998,26 @@ void dml1_rq_dlg_get_dlg_params( struct display_mode_lib *mode_lib, struct _vcs_dpi_display_dlg_regs_st *disp_dlg_regs, struct _vcs_dpi_display_ttu_regs_st *disp_ttu_regs, - const struct _vcs_dpi_display_rq_dlg_params_st rq_dlg_param, - const struct _vcs_dpi_display_dlg_sys_params_st dlg_sys_param, - const struct _vcs_dpi_display_e2e_pipe_params_st e2e_pipe_param, + const struct _vcs_dpi_display_rq_dlg_params_st *rq_dlg_param, + const struct _vcs_dpi_display_dlg_sys_params_st *dlg_sys_param, + const struct _vcs_dpi_display_e2e_pipe_params_st *e2e_pipe_param, const bool cstate_en, const bool pstate_en, const bool vm_en, const bool iflip_en) { /* Timing */ - unsigned int htotal = e2e_pipe_param.pipe.dest.htotal; - unsigned int hblank_end = e2e_pipe_param.pipe.dest.hblank_end; - unsigned int vblank_start = e2e_pipe_param.pipe.dest.vblank_start; - unsigned int vblank_end = e2e_pipe_param.pipe.dest.vblank_end; - bool interlaced = e2e_pipe_param.pipe.dest.interlaced; + unsigned int htotal = e2e_pipe_param->pipe.dest.htotal; + unsigned int hblank_end = e2e_pipe_param->pipe.dest.hblank_end; + unsigned int vblank_start = e2e_pipe_param->pipe.dest.vblank_start; + unsigned int vblank_end = e2e_pipe_param->pipe.dest.vblank_end; + bool interlaced = e2e_pipe_param->pipe.dest.interlaced; unsigned int min_vblank = mode_lib->ip.min_vblank_lines; - double pclk_freq_in_mhz = e2e_pipe_param.pipe.dest.pixel_rate_mhz; - double refclk_freq_in_mhz = e2e_pipe_param.clks_cfg.refclk_mhz; - double dppclk_freq_in_mhz = e2e_pipe_param.clks_cfg.dppclk_mhz; - double dispclk_freq_in_mhz = e2e_pipe_param.clks_cfg.dispclk_mhz; + double pclk_freq_in_mhz = e2e_pipe_param->pipe.dest.pixel_rate_mhz; + double refclk_freq_in_mhz = e2e_pipe_param->clks_cfg.refclk_mhz; + double dppclk_freq_in_mhz = e2e_pipe_param->clks_cfg.dppclk_mhz; + double dispclk_freq_in_mhz = e2e_pipe_param->clks_cfg.dispclk_mhz; double ref_freq_to_pix_freq; double prefetch_xy_calc_in_dcfclk; @@ -1160,13 +1160,13 @@ void dml1_rq_dlg_get_dlg_params( disp_dlg_regs->dlg_vblank_end = interlaced ? (vblank_end / 2) : vblank_end; /* 15 bits */ prefetch_xy_calc_in_dcfclk = 24.0; /* TODO: ip_param */ - min_dcfclk_mhz = dlg_sys_param.deepsleep_dcfclk_mhz; + min_dcfclk_mhz = dlg_sys_param->deepsleep_dcfclk_mhz; t_calc_us = prefetch_xy_calc_in_dcfclk / min_dcfclk_mhz; - min_ttu_vblank = dlg_sys_param.t_urg_wm_us; + min_ttu_vblank = dlg_sys_param->t_urg_wm_us; if (cstate_en) - min_ttu_vblank = dml_max(dlg_sys_param.t_sr_wm_us, min_ttu_vblank); + min_ttu_vblank = dml_max(dlg_sys_param->t_sr_wm_us, min_ttu_vblank); if (pstate_en) - min_ttu_vblank = dml_max(dlg_sys_param.t_mclk_wm_us, min_ttu_vblank); + min_ttu_vblank = dml_max(dlg_sys_param->t_mclk_wm_us, min_ttu_vblank); min_ttu_vblank = min_ttu_vblank + t_calc_us; min_dst_y_ttu_vblank = min_ttu_vblank * pclk_freq_in_mhz / (double) htotal; @@ -1197,59 +1197,59 @@ void dml1_rq_dlg_get_dlg_params( /* ------------------------- */ /* Prefetch Calc */ /* Source */ - dcc_en = e2e_pipe_param.pipe.src.dcc; + dcc_en = e2e_pipe_param->pipe.src.dcc; dual_plane = is_dual_plane( - (enum source_format_class) e2e_pipe_param.pipe.src.source_format); + (enum source_format_class) e2e_pipe_param->pipe.src.source_format); mode_422 = 0; /* TODO */ - access_dir = (e2e_pipe_param.pipe.src.source_scan == dm_vert); /* vp access direction: horizontal or vertical accessed */ + access_dir = (e2e_pipe_param->pipe.src.source_scan == dm_vert); /* vp access direction: horizontal or vertical accessed */ bytes_per_element_l = get_bytes_per_element( - (enum source_format_class) e2e_pipe_param.pipe.src.source_format, + (enum source_format_class) e2e_pipe_param->pipe.src.source_format, 0); bytes_per_element_c = get_bytes_per_element( - (enum source_format_class) e2e_pipe_param.pipe.src.source_format, + (enum source_format_class) e2e_pipe_param->pipe.src.source_format, 1); - vp_height_l = e2e_pipe_param.pipe.src.viewport_height; - vp_width_l = e2e_pipe_param.pipe.src.viewport_width; - vp_height_c = e2e_pipe_param.pipe.src.viewport_height_c; - vp_width_c = e2e_pipe_param.pipe.src.viewport_width_c; + vp_height_l = e2e_pipe_param->pipe.src.viewport_height; + vp_width_l = e2e_pipe_param->pipe.src.viewport_width; + vp_height_c = e2e_pipe_param->pipe.src.viewport_height_c; + vp_width_c = e2e_pipe_param->pipe.src.viewport_width_c; /* Scaling */ - htaps_l = e2e_pipe_param.pipe.scale_taps.htaps; - htaps_c = e2e_pipe_param.pipe.scale_taps.htaps_c; - hratios_l = e2e_pipe_param.pipe.scale_ratio_depth.hscl_ratio; - hratios_c = e2e_pipe_param.pipe.scale_ratio_depth.hscl_ratio_c; - vratio_l = e2e_pipe_param.pipe.scale_ratio_depth.vscl_ratio; - vratio_c = e2e_pipe_param.pipe.scale_ratio_depth.vscl_ratio_c; + htaps_l = e2e_pipe_param->pipe.scale_taps.htaps; + htaps_c = e2e_pipe_param->pipe.scale_taps.htaps_c; + hratios_l = e2e_pipe_param->pipe.scale_ratio_depth.hscl_ratio; + hratios_c = e2e_pipe_param->pipe.scale_ratio_depth.hscl_ratio_c; + vratio_l = e2e_pipe_param->pipe.scale_ratio_depth.vscl_ratio; + vratio_c = e2e_pipe_param->pipe.scale_ratio_depth.vscl_ratio_c; line_time_in_us = (htotal / pclk_freq_in_mhz); - vinit_l = e2e_pipe_param.pipe.scale_ratio_depth.vinit; - vinit_c = e2e_pipe_param.pipe.scale_ratio_depth.vinit_c; - vinit_bot_l = e2e_pipe_param.pipe.scale_ratio_depth.vinit_bot; - vinit_bot_c = e2e_pipe_param.pipe.scale_ratio_depth.vinit_bot_c; - - swath_height_l = rq_dlg_param.rq_l.swath_height; - swath_width_ub_l = rq_dlg_param.rq_l.swath_width_ub; - dpte_bytes_per_row_ub_l = rq_dlg_param.rq_l.dpte_bytes_per_row_ub; - dpte_groups_per_row_ub_l = rq_dlg_param.rq_l.dpte_groups_per_row_ub; - meta_pte_bytes_per_frame_ub_l = rq_dlg_param.rq_l.meta_pte_bytes_per_frame_ub; - meta_bytes_per_row_ub_l = rq_dlg_param.rq_l.meta_bytes_per_row_ub; - - swath_height_c = rq_dlg_param.rq_c.swath_height; - swath_width_ub_c = rq_dlg_param.rq_c.swath_width_ub; - dpte_bytes_per_row_ub_c = rq_dlg_param.rq_c.dpte_bytes_per_row_ub; - dpte_groups_per_row_ub_c = rq_dlg_param.rq_c.dpte_groups_per_row_ub; - - meta_chunks_per_row_ub_l = rq_dlg_param.rq_l.meta_chunks_per_row_ub; - vupdate_offset = e2e_pipe_param.pipe.dest.vupdate_offset; - vupdate_width = e2e_pipe_param.pipe.dest.vupdate_width; - vready_offset = e2e_pipe_param.pipe.dest.vready_offset; + vinit_l = e2e_pipe_param->pipe.scale_ratio_depth.vinit; + vinit_c = e2e_pipe_param->pipe.scale_ratio_depth.vinit_c; + vinit_bot_l = e2e_pipe_param->pipe.scale_ratio_depth.vinit_bot; + vinit_bot_c = e2e_pipe_param->pipe.scale_ratio_depth.vinit_bot_c; + + swath_height_l = rq_dlg_param->rq_l.swath_height; + swath_width_ub_l = rq_dlg_param->rq_l.swath_width_ub; + dpte_bytes_per_row_ub_l = rq_dlg_param->rq_l.dpte_bytes_per_row_ub; + dpte_groups_per_row_ub_l = rq_dlg_param->rq_l.dpte_groups_per_row_ub; + meta_pte_bytes_per_frame_ub_l = rq_dlg_param->rq_l.meta_pte_bytes_per_frame_ub; + meta_bytes_per_row_ub_l = rq_dlg_param->rq_l.meta_bytes_per_row_ub; + + swath_height_c = rq_dlg_param->rq_c.swath_height; + swath_width_ub_c = rq_dlg_param->rq_c.swath_width_ub; + dpte_bytes_per_row_ub_c = rq_dlg_param->rq_c.dpte_bytes_per_row_ub; + dpte_groups_per_row_ub_c = rq_dlg_param->rq_c.dpte_groups_per_row_ub; + + meta_chunks_per_row_ub_l = rq_dlg_param->rq_l.meta_chunks_per_row_ub; + vupdate_offset = e2e_pipe_param->pipe.dest.vupdate_offset; + vupdate_width = e2e_pipe_param->pipe.dest.vupdate_width; + vready_offset = e2e_pipe_param->pipe.dest.vready_offset; dppclk_delay_subtotal = mode_lib->ip.dppclk_delay_subtotal; dispclk_delay_subtotal = mode_lib->ip.dispclk_delay_subtotal; pixel_rate_delay_subtotal = dppclk_delay_subtotal * pclk_freq_in_mhz / dppclk_freq_in_mhz + dispclk_delay_subtotal * pclk_freq_in_mhz / dispclk_freq_in_mhz; - vstartup_start = e2e_pipe_param.pipe.dest.vstartup_start; + vstartup_start = e2e_pipe_param->pipe.dest.vstartup_start; if (interlaced) vstartup_start = vstartup_start / 2; @@ -1276,13 +1276,13 @@ void dml1_rq_dlg_get_dlg_params( dst_x_after_scaler = 0; dst_y_after_scaler = 0; - if (e2e_pipe_param.pipe.src.is_hsplit) + if (e2e_pipe_param->pipe.src.is_hsplit) dst_x_after_scaler = pixel_rate_delay_subtotal - + e2e_pipe_param.pipe.dest.recout_width; + + e2e_pipe_param->pipe.dest.recout_width; else dst_x_after_scaler = pixel_rate_delay_subtotal; - if (e2e_pipe_param.dout.output_format == dm_420) + if (e2e_pipe_param->dout.output_format == dm_420) dst_y_after_scaler = 1; else dst_y_after_scaler = 0; @@ -1334,7 +1334,7 @@ void dml1_rq_dlg_get_dlg_params( DTRACE( "DLG: %s: t_srx_delay_us = %3.2f", __func__, - (double) dlg_sys_param.t_srx_delay_us); + (double) dlg_sys_param->t_srx_delay_us); DTRACE("DLG: %s: line_time_in_us = %3.2f", __func__, (double) line_time_in_us); DTRACE("DLG: %s: vupdate_offset = %d", __func__, vupdate_offset); DTRACE("DLG: %s: vupdate_width = %d", __func__, vupdate_width); @@ -1408,12 +1408,12 @@ void dml1_rq_dlg_get_dlg_params( DTRACE("DLG: %s: dpte_row_bytes = %d", __func__, dpte_row_bytes); prefetch_bw = (vm_bytes + 2 * dpte_row_bytes + 2 * meta_row_bytes + sw_bytes) / t_pre_us; - flip_bw = ((vm_bytes + dpte_row_bytes + meta_row_bytes) * dlg_sys_param.total_flip_bw) - / (double) dlg_sys_param.total_flip_bytes; + flip_bw = ((vm_bytes + dpte_row_bytes + meta_row_bytes) * dlg_sys_param->total_flip_bw) + / (double) dlg_sys_param->total_flip_bytes; t_vm_us = line_time_in_us / 4.0; if (vm_en && dcc_en) { t_vm_us = dml_max( - dlg_sys_param.t_extra_us, + dlg_sys_param->t_extra_us, dml_max((double) vm_bytes / prefetch_bw, t_vm_us)); if (iflip_en && !dual_plane) { @@ -1423,12 +1423,12 @@ void dml1_rq_dlg_get_dlg_params( } } - t_r0_us = dml_max(dlg_sys_param.t_extra_us - t_vm_us, line_time_in_us - t_vm_us); + t_r0_us = dml_max(dlg_sys_param->t_extra_us - t_vm_us, line_time_in_us - t_vm_us); if (vm_en || dcc_en) { t_r0_us = dml_max( (double) (dpte_row_bytes + meta_row_bytes) / prefetch_bw, - dlg_sys_param.t_extra_us); + dlg_sys_param->t_extra_us); t_r0_us = dml_max((double) (line_time_in_us - t_vm_us), t_r0_us); if (iflip_en && !dual_plane) { @@ -1550,15 +1550,15 @@ void dml1_rq_dlg_get_dlg_params( disp_dlg_regs->refcyc_per_meta_chunk_vblank_l;/* dcc for 4:2:0 is not supported in dcn1.0. assigned to be the same as _l for now */ /* Active */ - req_per_swath_ub_l = rq_dlg_param.rq_l.req_per_swath_ub; - req_per_swath_ub_c = rq_dlg_param.rq_c.req_per_swath_ub; - meta_row_height_l = rq_dlg_param.rq_l.meta_row_height; + req_per_swath_ub_l = rq_dlg_param->rq_l.req_per_swath_ub; + req_per_swath_ub_c = rq_dlg_param->rq_c.req_per_swath_ub; + meta_row_height_l = rq_dlg_param->rq_l.meta_row_height; swath_width_pixels_ub_l = 0; swath_width_pixels_ub_c = 0; scaler_rec_in_width_l = 0; scaler_rec_in_width_c = 0; - dpte_row_height_l = rq_dlg_param.rq_l.dpte_row_height; - dpte_row_height_c = rq_dlg_param.rq_c.dpte_row_height; + dpte_row_height_l = rq_dlg_param->rq_l.dpte_row_height; + dpte_row_height_c = rq_dlg_param->rq_c.dpte_row_height; disp_dlg_regs->dst_y_per_pte_row_nom_l = (unsigned int) ((double) dpte_row_height_l / (double) vratio_l * dml_pow(2, 2)); @@ -1650,14 +1650,14 @@ void dml1_rq_dlg_get_dlg_params( refcyc_per_req_delivery_cur0 = 0.; full_recout_width = 0; - if (e2e_pipe_param.pipe.src.is_hsplit) { - if (e2e_pipe_param.pipe.dest.full_recout_width == 0) { + if (e2e_pipe_param->pipe.src.is_hsplit) { + if (e2e_pipe_param->pipe.dest.full_recout_width == 0) { DTRACE("DLG: %s: Warningfull_recout_width not set in hsplit mode", __func__); - full_recout_width = e2e_pipe_param.pipe.dest.recout_width * 2; /* assume half split for dcn1 */ + full_recout_width = e2e_pipe_param->pipe.dest.recout_width * 2; /* assume half split for dcn1 */ } else - full_recout_width = e2e_pipe_param.pipe.dest.full_recout_width; + full_recout_width = e2e_pipe_param->pipe.dest.full_recout_width; } else - full_recout_width = e2e_pipe_param.pipe.dest.recout_width; + full_recout_width = e2e_pipe_param->pipe.dest.recout_width; refcyc_per_line_delivery_pre_l = get_refcyc_per_delivery( mode_lib, @@ -1824,9 +1824,9 @@ void dml1_rq_dlg_get_dlg_params( } /* TTU - Cursor */ - hratios_cur0 = e2e_pipe_param.pipe.scale_ratio_depth.hscl_ratio; - cur0_src_width = e2e_pipe_param.pipe.src.cur0_src_width; /* cursor source width */ - cur0_bpp = (enum cursor_bpp) e2e_pipe_param.pipe.src.cur0_bpp; + hratios_cur0 = e2e_pipe_param->pipe.scale_ratio_depth.hscl_ratio; + cur0_src_width = e2e_pipe_param->pipe.src.cur0_src_width; /* cursor source width */ + cur0_bpp = (enum cursor_bpp) e2e_pipe_param->pipe.src.cur0_bpp; cur0_req_size = 0; cur0_req_width = 0; cur0_width_ub = 0.0; diff --git a/drivers/gpu/drm/amd/display/dc/dml/dml1_display_rq_dlg_calc.h b/drivers/gpu/drm/amd/display/dc/dml/dml1_display_rq_dlg_calc.h index 9c06913ad767..23d4f30826ff 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/dml1_display_rq_dlg_calc.h +++ b/drivers/gpu/drm/amd/display/dc/dml/dml1_display_rq_dlg_calc.h @@ -33,7 +33,7 @@ struct display_mode_lib; void dml1_extract_rq_regs( struct display_mode_lib *mode_lib, struct _vcs_dpi_display_rq_regs_st *rq_regs, - const struct _vcs_dpi_display_rq_params_st rq_param); + const struct _vcs_dpi_display_rq_params_st *rq_param); /* Function: dml_rq_dlg_get_rq_params * Calculate requestor related parameters that register definition agnostic * (i.e. this layer does try to separate real values from register definition) @@ -55,9 +55,9 @@ void dml1_rq_dlg_get_dlg_params( struct display_mode_lib *mode_lib, struct _vcs_dpi_display_dlg_regs_st *dlg_regs, struct _vcs_dpi_display_ttu_regs_st *ttu_regs, - const struct _vcs_dpi_display_rq_dlg_params_st rq_dlg_param, - const struct _vcs_dpi_display_dlg_sys_params_st dlg_sys_param, - const struct _vcs_dpi_display_e2e_pipe_params_st e2e_pipe_param, + const struct _vcs_dpi_display_rq_dlg_params_st *rq_dlg_param, + const struct _vcs_dpi_display_dlg_sys_params_st *dlg_sys_param, + const struct _vcs_dpi_display_e2e_pipe_params_st *e2e_pipe_param, const bool cstate_en, const bool pstate_en, const bool vm_en, diff --git a/drivers/gpu/drm/amd/display/dc/inc/core_types.h b/drivers/gpu/drm/amd/display/dc/inc/core_types.h index 45a6216dfa2a..4dca14b598dd 100644 --- a/drivers/gpu/drm/amd/display/dc/inc/core_types.h +++ b/drivers/gpu/drm/amd/display/dc/inc/core_types.h @@ -366,6 +366,9 @@ struct pipe_ctx { struct _vcs_dpi_display_ttu_regs_st ttu_regs; struct _vcs_dpi_display_rq_regs_st rq_regs; struct _vcs_dpi_display_pipe_dest_params_st pipe_dlg_param; + struct _vcs_dpi_display_rq_params_st dml_rq_param; + struct _vcs_dpi_display_dlg_sys_params_st dml_dlg_sys_param; + struct _vcs_dpi_display_e2e_pipe_params_st dml_input; int det_buffer_size_kb; bool unbounded_req; #endif -- 2.33.0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] Enable '-Werror' by default for all kernel builds 2021-09-08 3:52 ` Harry Wentland @ 2021-09-08 4:41 ` Linus Torvalds 2021-09-09 0:48 ` Harry Wentland 0 siblings, 1 reply; 16+ messages in thread From: Linus Torvalds @ 2021-09-08 4:41 UTC (permalink / raw) To: Harry Wentland Cc: Arnd Bergmann, Leo Li, Alex Deucher, Christian König, Pan, Xinhui, Nathan Chancellor, Guenter Roeck, Linux Kernel Mailing List, llvm, Nick Desaulniers, amd-gfx On Tue, Sep 7, 2021 at 8:52 PM Harry Wentland <harry.wentland@amd.com> wrote: > > Attached patches fix these x86_64 ones reported by Nick: Hmm. You didn't seem to fix up the calling convention for print__xyz(), which still take those xyz structs as pass-by-value. Obviously it would be good to do things incrementally, so if that attached patch was just [1/N] I won't complain.. > I'm also seeing one more that might be more challenging to fix but is nearly at 1024: > > drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn21/display_mode_vba_21.c:3397:6: error: stack frame size of 1064 bytes in function 'dml21_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than=] Oh Gods, that function is truly something else.. Is there some reason why it's one humongous function, with the occasional single-line comment? Because it really looks to me like pretty much everywhere I see one of those rare comments, I would go "this part should be a function of its own", and then there would be one caller fuynction that just calls each of those sub-functions one after the other. That would - I think - make the code easier to read, and then it would also make it very obvious where it magically uses a lot of stack. My suspicion is actually "nowhere". The stack use is just hugely spread out, and the compiler has just kept accumulating more spill variables on the frame with no single big reason. Yes, I see a couple of local structures: Pipe myPipe; HostVM myHostVM; but more than that I see several function calls that have basically 62 arguments. And I wish I was making that number up. I'm not. That "CalculatePrefetchSchedule()" call literally has 62 arguments. But *all* of the top-level loops in that function literally look like they could - and should - be functions in their own right. Some of them would be fairly complex even so (ie that code under the comment //Prefetch Check would be quite the big function all of its own. We have a coding style thing: Documentation/process/coding-style.rst that says that you should strive to have functions that are "short and sweet" and fit on one or two screenfuls of text. That one function from hell is 1832 lines of code. It really could be improved upon. Linus ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Enable '-Werror' by default for all kernel builds 2021-09-08 4:41 ` Linus Torvalds @ 2021-09-09 0:48 ` Harry Wentland 0 siblings, 0 replies; 16+ messages in thread From: Harry Wentland @ 2021-09-09 0:48 UTC (permalink / raw) To: Linus Torvalds Cc: Arnd Bergmann, Leo Li, Alex Deucher, Christian König, Pan, Xinhui, Nathan Chancellor, Guenter Roeck, Linux Kernel Mailing List, llvm, Nick Desaulniers, amd-gfx On 2021-09-08 12:41 a.m., Linus Torvalds wrote: > On Tue, Sep 7, 2021 at 8:52 PM Harry Wentland <harry.wentland@amd.com> wrote: >> >> Attached patches fix these x86_64 ones reported by Nick: > > Hmm. > > You didn't seem to fix up the calling convention for print__xyz(), > which still take those xyz structs as pass-by-value. > > Obviously it would be good to do things incrementally, so if that > attached patch was just [1/N] I won't complain.. > You're right. I was focussed on the stack frame limit but fixed up the rest as well now and sent the series out. https://lkml.org/lkml/2021/9/8/933 >> I'm also seeing one more that might be more challenging to fix but is nearly at 1024: >> >> drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn21/display_mode_vba_21.c:3397:6: error: stack frame size of 1064 bytes in function 'dml21_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than=] > > Oh Gods, that function is truly something else.. > > Is there some reason why it's one humongous function, with the > occasional single-line comment? > > Because it really looks to me like pretty much everywhere I see one of > those rare comments, I would go "this part should be a function of its > own", and then there would be one caller fuynction that just calls > each of those sub-functions one after the other. > Yeah, that's what I'm thinking as well. It would likely fix the stack size, even without dynamically allocating the two structs you mention below. > That would - I think - make the code easier to read, and then it would > also make it very obvious where it magically uses a lot of stack. > > My suspicion is actually "nowhere". The stack use is just hugely > spread out, and the compiler has just kept accumulating more spill > variables on the frame with no single big reason. > > Yes, I see a couple of local structures: > > Pipe myPipe; > HostVM myHostVM; > > but more than that I see several function calls that have basically 62 > arguments. And I wish I was making that number up. I'm not. That > "CalculatePrefetchSchedule()" call literally has 62 arguments. > > But *all* of the top-level loops in that function literally look like > they could - and should - be functions in their own right. Some of > them would be fairly complex even so (ie that code under the comment > > //Prefetch Check > > would be quite the big function all of its own. > > We have a coding style thing: > > Documentation/process/coding-style.rst > > that says that you should strive to have functions that are "short and > sweet" and fit on one or two screenfuls of text. > > That one function from hell is 1832 lines of code. > > It really could be improved upon. > Absolutely. The file comes with an (easy to miss) disclaimer that it is "gcc-parseable HW gospel": https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_mode_vba_20.c?h=v5.14#n32 The display_mode_vba* stuff deals with bandwidth formulas from HW designers. At some point in the past we attempted to convert them to something more readable and elegant but would often run into difficulties getting support from the right people when things wouldn't work. Using the HW designer's code directly tends to short circuit any arguments about SW correctness. In short, I don't really like this code but it works. It helps prevent black screens and underflows on the display. We try to follow the coding-style.rst for the most part elsewhere, though there are still plenty of areas where we can improve. Harry > Linus > ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <YTkjJPCdR1VGaaVm@archlinux-ax161>]
[parent not found: <75a10e8b-9f11-64c4-460b-9f3ac09965e2@roeck-us.net>]
[parent not found: <YTkyIAevt7XOd+8j@elver.google.com>]
* Re: [PATCH] Enable '-Werror' by default for all kernel builds [not found] ` <YTkyIAevt7XOd+8j@elver.google.com> @ 2021-09-09 5:58 ` Christoph Hellwig 2021-09-09 6:07 ` Guenter Roeck ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Christoph Hellwig @ 2021-09-09 5:58 UTC (permalink / raw) To: Marco Elver Cc: Guenter Roeck, Nathan Chancellor, Arnd Bergmann, Linus Torvalds, Linux Kernel Mailing List, llvm, Nick Desaulniers, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, Andrey Konovalov, kasan-dev, Christian =?unknown-8bit?B?S8O2bmln?=, Pan, Xinhui, amd-gfx [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset=unknown-8bit, Size: 949 bytes --] On Wed, Sep 08, 2021 at 11:58:56PM +0200, Marco Elver wrote: > It'd be good to avoid. It has helped uncover build issues with KASAN in > the past. Or at least make it dependent on the problematic architecture. > For example if arm is a problem, something like this: I'm also seeing quite a few stack size warnings with KASAN on x86_64 without COMPILT_TEST using gcc 10.2.1 from Debian. In fact there are a few warnings without KASAN, but with KASAN there are a lot more. I'll try to find some time to dig into them. While we're at it, with -Werror something like this is really futile: drivers/gpu/drm/amd/amdgpu/amdgpu_object.c: In function ‘amdgpu_bo_support_uswc’: drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:493:2: warning: #warning Please enable CONFIG_MTRR and CONFIG_X86_PAT for better performance thanks to write-combining [-Wcpp 493 | #warning Please enable CONFIG_MTRR and CONFIG_X86_PAT for better performance \ | ^~~~~~~ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Enable '-Werror' by default for all kernel builds 2021-09-09 5:58 ` Christoph Hellwig @ 2021-09-09 6:07 ` Guenter Roeck 2021-09-09 7:30 ` Christian König 2021-09-09 10:53 ` Marco Elver 2021-09-09 16:46 ` Linus Torvalds 2 siblings, 1 reply; 16+ messages in thread From: Guenter Roeck @ 2021-09-09 6:07 UTC (permalink / raw) To: Christoph Hellwig, Marco Elver Cc: Nathan Chancellor, Arnd Bergmann, Linus Torvalds, Linux Kernel Mailing List, llvm, Nick Desaulniers, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, Andrey Konovalov, kasan-dev, Christian =?unknown-8bit?B?S8O2bmln?=, Pan, Xinhui, amd-gfx On 9/8/21 10:58 PM, Christoph Hellwig wrote: > On Wed, Sep 08, 2021 at 11:58:56PM +0200, Marco Elver wrote: >> It'd be good to avoid. It has helped uncover build issues with KASAN in >> the past. Or at least make it dependent on the problematic architecture. >> For example if arm is a problem, something like this: > > I'm also seeing quite a few stack size warnings with KASAN on x86_64 > without COMPILT_TEST using gcc 10.2.1 from Debian. In fact there are a > few warnings without KASAN, but with KASAN there are a lot more. > I'll try to find some time to dig into them. > > While we're at it, with -Werror something like this is really futile: > > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c: In function ‘amdgpu_bo_support_uswc’: > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:493:2: warning: #warning > Please enable CONFIG_MTRR and CONFIG_X86_PAT for better performance thanks to write-combining [-Wcpp > 493 | #warning Please enable CONFIG_MTRR and CONFIG_X86_PAT for better performance \ > | ^~~~~~~ > I have been wondering if all those #warning "errors" should either be removed or be replaced with "#pragma message". Guenter ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Enable '-Werror' by default for all kernel builds 2021-09-09 6:07 ` Guenter Roeck @ 2021-09-09 7:30 ` Christian König 2021-09-09 14:59 ` Guenter Roeck 0 siblings, 1 reply; 16+ messages in thread From: Christian König @ 2021-09-09 7:30 UTC (permalink / raw) To: Guenter Roeck, Christoph Hellwig, Marco Elver Cc: Nathan Chancellor, Arnd Bergmann, Linus Torvalds, Linux Kernel Mailing List, llvm, Nick Desaulniers, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, Andrey Konovalov, kasan-dev, Pan, Xinhui, amd-gfx Am 09.09.21 um 08:07 schrieb Guenter Roeck: > On 9/8/21 10:58 PM, Christoph Hellwig wrote: >> On Wed, Sep 08, 2021 at 11:58:56PM +0200, Marco Elver wrote: >>> It'd be good to avoid. It has helped uncover build issues with KASAN in >>> the past. Or at least make it dependent on the problematic >>> architecture. >>> For example if arm is a problem, something like this: >> >> I'm also seeing quite a few stack size warnings with KASAN on x86_64 >> without COMPILT_TEST using gcc 10.2.1 from Debian. In fact there are a >> few warnings without KASAN, but with KASAN there are a lot more. >> I'll try to find some time to dig into them. >> >> While we're at it, with -Werror something like this is really futile: >> >> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c: In function >> ‘amdgpu_bo_support_uswc’: >> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:493:2: warning: #warning >> Please enable CONFIG_MTRR and CONFIG_X86_PAT for better performance >> thanks to write-combining [-Wcpp >> 493 | #warning Please enable CONFIG_MTRR and CONFIG_X86_PAT for >> better performance \ >> | ^~~~~~~ Ah, yes good point! > > I have been wondering if all those #warning "errors" should either > be removed or be replaced with "#pragma message". Well we started to add those warnings because people compiled their kernel with CONFIG_MTRR and CONFIG_X86_PAT and was then wondering why the performance of the display driver was so crappy. When those warning now generate an error which you have to disable explicitly then that might not be bad at all. It at least points people to this setting and makes it really clear that they are doing something very unusual and need to keep in mind that it might not have the desired result. Regards, Christian. > > Guenter ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Enable '-Werror' by default for all kernel builds 2021-09-09 7:30 ` Christian König @ 2021-09-09 14:59 ` Guenter Roeck 0 siblings, 0 replies; 16+ messages in thread From: Guenter Roeck @ 2021-09-09 14:59 UTC (permalink / raw) To: Christian König, Christoph Hellwig, Marco Elver Cc: Nathan Chancellor, Arnd Bergmann, Linus Torvalds, Linux Kernel Mailing List, llvm, Nick Desaulniers, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, Andrey Konovalov, kasan-dev, Pan, Xinhui, amd-gfx On 9/9/21 12:30 AM, Christian König wrote: > Am 09.09.21 um 08:07 schrieb Guenter Roeck: >> On 9/8/21 10:58 PM, Christoph Hellwig wrote: >>> On Wed, Sep 08, 2021 at 11:58:56PM +0200, Marco Elver wrote: >>>> It'd be good to avoid. It has helped uncover build issues with KASAN in >>>> the past. Or at least make it dependent on the problematic architecture. >>>> For example if arm is a problem, something like this: >>> >>> I'm also seeing quite a few stack size warnings with KASAN on x86_64 >>> without COMPILT_TEST using gcc 10.2.1 from Debian. In fact there are a >>> few warnings without KASAN, but with KASAN there are a lot more. >>> I'll try to find some time to dig into them. >>> >>> While we're at it, with -Werror something like this is really futile: >>> >>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c: In function ‘amdgpu_bo_support_uswc’: >>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:493:2: warning: #warning >>> Please enable CONFIG_MTRR and CONFIG_X86_PAT for better performance thanks to write-combining [-Wcpp >>> 493 | #warning Please enable CONFIG_MTRR and CONFIG_X86_PAT for better performance \ >>> | ^~~~~~~ > > Ah, yes good point! > >> >> I have been wondering if all those #warning "errors" should either >> be removed or be replaced with "#pragma message". > > Well we started to add those warnings because people compiled their kernel with CONFIG_MTRR and CONFIG_X86_PAT and was then wondering why the performance of the display driver was so crappy. > > When those warning now generate an error which you have to disable explicitly then that might not be bad at all. > > It at least points people to this setting and makes it really clear that they are doing something very unusual and need to keep in mind that it might not have the desired result. > That specific warning is surrounded with "#ifndef CONFIG_COMPILE_TEST" so it doesn't really matter because it doesn't cause test build failures. Of course, we could do the same for any #warning which does now cause a test build failure. Guenter ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Enable '-Werror' by default for all kernel builds 2021-09-09 5:58 ` Christoph Hellwig 2021-09-09 6:07 ` Guenter Roeck @ 2021-09-09 10:53 ` Marco Elver 2021-09-09 11:00 ` Arnd Bergmann 2021-09-09 16:46 ` Linus Torvalds 2 siblings, 1 reply; 16+ messages in thread From: Marco Elver @ 2021-09-09 10:53 UTC (permalink / raw) To: Christoph Hellwig Cc: Guenter Roeck, Nathan Chancellor, Arnd Bergmann, Linus Torvalds, Linux Kernel Mailing List, llvm, Nick Desaulniers, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, Andrey Konovalov, kasan-dev, Christian König, Pan, Xinhui, amd-gfx On Thu, 9 Sept 2021 at 07:59, Christoph Hellwig <hch@infradead.org> wrote: > On Wed, Sep 08, 2021 at 11:58:56PM +0200, Marco Elver wrote: > > It'd be good to avoid. It has helped uncover build issues with KASAN in > > the past. Or at least make it dependent on the problematic architecture. > > For example if arm is a problem, something like this: > > I'm also seeing quite a few stack size warnings with KASAN on x86_64 > without COMPILT_TEST using gcc 10.2.1 from Debian. In fact there are a > few warnings without KASAN, but with KASAN there are a lot more. > I'll try to find some time to dig into them. Right, this reminded me that we actually at least double the real stack size for KASAN builds, because it inherently requires more stack space. I think we need Wframe-larger-than to match that, otherwise we'll just keep having this problem: https://lkml.kernel.org/r/20210909104925.809674-1-elver@google.com > While we're at it, with -Werror something like this is really futile: > > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c: In function ‘amdgpu_bo_support_uswc’: > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:493:2: warning: #warning > Please enable CONFIG_MTRR and CONFIG_X86_PAT for better performance thanks to write-combining [-Wcpp > 493 | #warning Please enable CONFIG_MTRR and CONFIG_X86_PAT for better performance \ > | ^~~~~~~ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Enable '-Werror' by default for all kernel builds 2021-09-09 10:53 ` Marco Elver @ 2021-09-09 11:00 ` Arnd Bergmann 2021-09-09 11:43 ` Marco Elver 0 siblings, 1 reply; 16+ messages in thread From: Arnd Bergmann @ 2021-09-09 11:00 UTC (permalink / raw) To: Marco Elver Cc: Christoph Hellwig, Guenter Roeck, Nathan Chancellor, Linus Torvalds, Linux Kernel Mailing List, llvm, Nick Desaulniers, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, Andrey Konovalov, kasan-dev, Christian König, Pan, Xinhui, amd-gfx list On Thu, Sep 9, 2021 at 12:54 PM Marco Elver <elver@google.com> wrote: > On Thu, 9 Sept 2021 at 07:59, Christoph Hellwig <hch@infradead.org> wrote: > > On Wed, Sep 08, 2021 at 11:58:56PM +0200, Marco Elver wrote: > > > It'd be good to avoid. It has helped uncover build issues with KASAN in > > > the past. Or at least make it dependent on the problematic architecture. > > > For example if arm is a problem, something like this: > > > > I'm also seeing quite a few stack size warnings with KASAN on x86_64 > > without COMPILT_TEST using gcc 10.2.1 from Debian. In fact there are a > > few warnings without KASAN, but with KASAN there are a lot more. > > I'll try to find some time to dig into them. > > Right, this reminded me that we actually at least double the real > stack size for KASAN builds, because it inherently requires more stack > space. I think we need Wframe-larger-than to match that, otherwise > we'll just keep having this problem: > > https://lkml.kernel.org/r/20210909104925.809674-1-elver@google.com The problem with this is that it completely defeats the point of the stack size warnings in allmodconfig kernels when they have KASAN enabled and end up missing obvious code bugs in drivers that put large structures on the stack. Let's not go there. Arnd ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Enable '-Werror' by default for all kernel builds 2021-09-09 11:00 ` Arnd Bergmann @ 2021-09-09 11:43 ` Marco Elver 2021-09-09 12:55 ` Arnd Bergmann 2021-09-09 16:53 ` Linus Torvalds 0 siblings, 2 replies; 16+ messages in thread From: Marco Elver @ 2021-09-09 11:43 UTC (permalink / raw) To: Arnd Bergmann Cc: Christoph Hellwig, Guenter Roeck, Nathan Chancellor, Linus Torvalds, Linux Kernel Mailing List, llvm, Nick Desaulniers, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, Andrey Konovalov, kasan-dev, Christian König, Pan, Xinhui, amd-gfx list On Thu, 9 Sept 2021 at 13:00, Arnd Bergmann <arnd@kernel.org> wrote: > On Thu, Sep 9, 2021 at 12:54 PM Marco Elver <elver@google.com> wrote: > > On Thu, 9 Sept 2021 at 07:59, Christoph Hellwig <hch@infradead.org> wrote: > > > On Wed, Sep 08, 2021 at 11:58:56PM +0200, Marco Elver wrote: > > > > It'd be good to avoid. It has helped uncover build issues with KASAN in > > > > the past. Or at least make it dependent on the problematic architecture. > > > > For example if arm is a problem, something like this: > > > > > > I'm also seeing quite a few stack size warnings with KASAN on x86_64 > > > without COMPILT_TEST using gcc 10.2.1 from Debian. In fact there are a > > > few warnings without KASAN, but with KASAN there are a lot more. > > > I'll try to find some time to dig into them. > > > > Right, this reminded me that we actually at least double the real > > stack size for KASAN builds, because it inherently requires more stack > > space. I think we need Wframe-larger-than to match that, otherwise > > we'll just keep having this problem: > > > > https://lkml.kernel.org/r/20210909104925.809674-1-elver@google.com > > The problem with this is that it completely defeats the point of the > stack size warnings in allmodconfig kernels when they have KASAN > enabled and end up missing obvious code bugs in drivers that put > large structures on the stack. Let's not go there. Sure, but the reality is that the real stack size is already doubled for KASAN. And that should be reflected in Wframe-larger-than. Either that, or we just have to live with the occasional warning (that is likely benign). But with WERROR we're now forced to make the defaults as sane as possible. If the worry is allmodconfig, maybe we do have to make KASAN dependent on !COMPILE_TEST, even though that's not great either because it has caught real issues in the past (it'll also mean doing the same for all other instrumentation-based tools, like KCSAN, UBSAN, etc.). ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Enable '-Werror' by default for all kernel builds 2021-09-09 11:43 ` Marco Elver @ 2021-09-09 12:55 ` Arnd Bergmann 2021-09-09 16:53 ` Linus Torvalds 1 sibling, 0 replies; 16+ messages in thread From: Arnd Bergmann @ 2021-09-09 12:55 UTC (permalink / raw) To: Marco Elver Cc: Christoph Hellwig, Guenter Roeck, Nathan Chancellor, Linus Torvalds, Linux Kernel Mailing List, llvm, Nick Desaulniers, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, Andrey Konovalov, kasan-dev, Christian König, Pan, Xinhui, amd-gfx list On Thu, Sep 9, 2021 at 1:43 PM Marco Elver <elver@google.com> wrote: > On Thu, 9 Sept 2021 at 13:00, Arnd Bergmann <arnd@kernel.org> wrote: > > On Thu, Sep 9, 2021 at 12:54 PM Marco Elver <elver@google.com> wrote: > > > On Thu, 9 Sept 2021 at 07:59, Christoph Hellwig <hch@infradead.org> wrote: > > > > On Wed, Sep 08, 2021 at 11:58:56PM +0200, Marco Elver wrote: > > > > > It'd be good to avoid. It has helped uncover build issues with KASAN in > > > > > the past. Or at least make it dependent on the problematic architecture. > > > > > For example if arm is a problem, something like this: > > > > > > > > I'm also seeing quite a few stack size warnings with KASAN on x86_64 > > > > without COMPILT_TEST using gcc 10.2.1 from Debian. In fact there are a > > > > few warnings without KASAN, but with KASAN there are a lot more. > > > > I'll try to find some time to dig into them. > > > > > > Right, this reminded me that we actually at least double the real > > > stack size for KASAN builds, because it inherently requires more stack > > > space. I think we need Wframe-larger-than to match that, otherwise > > > we'll just keep having this problem: > > > > > > https://lkml.kernel.org/r/20210909104925.809674-1-elver@google.com > > > > The problem with this is that it completely defeats the point of the > > stack size warnings in allmodconfig kernels when they have KASAN > > enabled and end up missing obvious code bugs in drivers that put > > large structures on the stack. Let's not go there. > > Sure, but the reality is that the real stack size is already doubled > for KASAN. And that should be reflected in Wframe-larger-than. I don't think "double" is an accurate description of what is going on, it's much more complex than this. There are some functions that completely explode with KASAN_STACK enabled on clang, and many other functions instances that don't grow much at all. I've been building randconfig kernels for a long time with KASAN_STACK enabled on gcc, and the limit increased to 1440 bytes for 32-bit and not increased beyond the normal 2048 bytes for 64-bit. I have some patches to address the outliers and should go through and resend some of those. With the same limits and patches using clang, and KASAN=y but KASAN_STACK=n I also get no warnings in randconfig builds, but KASAN_STACK on clang doesn't really seem to have a good limit that would make an allmodconfig kernel build with no warnings. These are the worst offenders I see based on configuration, using an 32-bit ARM allmodconfig with my fixups: gcc-11, KASAN, no KASAN_STACK, FRAME_WARN=1024: (nothing) gcc-11, KASAN_STACK: drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c:782:1: warning: the frame size of 1416 bytes is larger than 1024 bytes [-Wframe-larger-than=] drivers/media/dvb-frontends/mxl5xx.c:1575:1: warning: the frame size of 1240 bytes is larger than 1024 bytes [-Wframe-larger-than=] drivers/mtd/nftlcore.c:468:1: warning: the frame size of 1232 bytes is larger than 1024 bytes [-Wframe-larger-than=] drivers/char/ipmi/ipmi_msghandler.c:4880:1: warning: the frame size of 1232 bytes is larger than 1024 bytes [-Wframe-larger-than=] drivers/mtd/chips/cfi_cmdset_0001.c:1870:1: warning: the frame size of 1224 bytes is larger than 1024 bytes [-Wframe-larger-than=] drivers/net/wireless/ath/ath9k/ar9003_paprd.c:749:1: warning: the frame size of 1216 bytes is larger than 1024 bytes [-Wframe-larger-than=] drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c:136:1: warning: the frame size of 1216 bytes is larger than 1024 bytes [-Wframe-larger-than=] drivers/ntb/hw/idt/ntb_hw_idt.c:1116:1: warning: the frame size of 1200 bytes is larger than 1024 bytes [-Wframe-larger-than=] net/dcb/dcbnl.c:1172:1: warning: the frame size of 1192 bytes is larger than 1024 bytes [-Wframe-larger-than=] fs/select.c:1042:1: warning: the frame size of 1192 bytes is larger than 1024 bytes [-Wframe-larger-than=] clang-12 KASAN, no KASAN_STACK, FRAME_WARN=1024: kernel/trace/trace_events_hist.c:4601:13: error: stack frame size 1384 exceeds limit 1024 in function 'hist_trigger_print_key' [-Werror,-Wframe-larger-than] drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dce_calcs.c:3045:6: error: stack frame size 1384 exceeds limit 1024 in function 'bw_calcs' [-Werror,-Wframe-larger-than] drivers/staging/fbtft/fbtft-core.c:992:5: error: stack frame size 1208 exceeds limit 1024 in function 'fbtft_init_display' [-Werror,-Wframe-larger-than] crypto/wp512.c:782:13: error: stack frame size 1176 exceeds limit 1024 in function 'wp512_process_buffer' [-Werror,-Wframe-larger-than] drivers/staging/fbtft/fbtft-core.c:902:12: error: stack frame size 1080 exceeds limit 1024 in function 'fbtft_init_display_from_property' [-Werror,-Wframe-larger-than] drivers/mtd/chips/cfi_cmdset_0001.c:1872:12: error: stack frame size 1064 exceeds limit 1024 in function 'cfi_intelext_writev' [-Werror,-Wframe-larger-than] drivers/staging/rtl8723bs/core/rtw_security.c:1288:5: error: stack frame size 1040 exceeds limit 1024 in function 'rtw_aes_decrypt' [-Werror,-Wframe-larger-than] drivers/ntb/hw/idt/ntb_hw_idt.c:1041:27: error: stack frame size 1032 exceeds limit 1024 in function 'idt_scan_mws' [-Werror,-Wframe-larger-than] clang-12, KASAN_STACK: drivers/infiniband/hw/ocrdma/ocrdma_stats.c:686:16: error: stack frame size 20608 exceeds limit 1024 in function 'ocrdma_dbgfs_ops_read' [-Werror,-Wframe-larger-than] lib/bitfield_kunit.c:60:20: error: stack frame size 10336 exceeds limit 10240 in function 'test_bitfields_constants' [-Werror,-Wframe-larger-than] drivers/net/wireless/ralink/rt2x00/rt2800lib.c:9012:13: error: stack frame size 9952 exceeds limit 1024 in function 'rt2800_init_rfcsr' [-Werror,-Wframe-larger-than] drivers/net/usb/r8152.c:7486:13: error: stack frame size 8768 exceeds limit 1024 in function 'r8156b_hw_phy_cfg' [-Werror,-Wframe-larger-than] drivers/media/dvb-frontends/nxt200x.c:915:12: error: stack frame size 8192 exceeds limit 1024 in function 'nxt2004_init' [-Werror,-Wframe-larger-than] drivers/net/wan/slic_ds26522.c:203:12: error: stack frame size 8064 exceeds limit 1024 in function 'slic_ds26522_probe' [-Werror,-Wframe-larger-than] drivers/firmware/broadcom/bcm47xx_sprom.c:188:13: error: stack frame size 8064 exceeds limit 1024 in function 'bcm47xx_sprom_fill_auto' [-Werror,-Wframe-larger-than] drivers/media/dvb-frontends/drxd_hard.c:2857:12: error: stack frame size 7584 exceeds limit 1024 in function 'drxd_set_frontend' [-Werror,-Wframe-larger-than] drivers/media/dvb-frontends/nxt200x.c:519:12: error: stack frame size 6848 exceeds limit 1024 in function 'nxt200x_setup_frontend_parameters' [-Werror,-Wframe-larger-than] drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:17019:13: error: stack frame size 6560 exceeds limit 1024 in function 'wlc_phy_workarounds_nphy' [-Werror,-Wframe-larger-than] > Either that, or we just have to live with the occasional warning (that > is likely benign). But with WERROR we're now forced to make the > defaults as sane as possible. If the worry is allmodconfig, maybe we > do have to make KASAN dependent on !COMPILE_TEST, even though that's > not great either because it has caught real issues in the past (it'll > also mean doing the same for all other instrumentation-based tools, > like KCSAN, UBSAN, etc.). I would prefer going back to marking KASAN_STACK as broken on clang, it does not seem like the warnings on the symbol were enough to stop people from attempting to using it, and the remaining warnings seem fixable with a small increase of the FRAME_WARN when using KASAN with clang but no KASAN_STACK, or when using KASAN_STACK with gcc. Arnd ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Enable '-Werror' by default for all kernel builds 2021-09-09 11:43 ` Marco Elver 2021-09-09 12:55 ` Arnd Bergmann @ 2021-09-09 16:53 ` Linus Torvalds 1 sibling, 0 replies; 16+ messages in thread From: Linus Torvalds @ 2021-09-09 16:53 UTC (permalink / raw) To: Marco Elver Cc: Arnd Bergmann, Christoph Hellwig, Guenter Roeck, Nathan Chancellor, Linux Kernel Mailing List, llvm, Nick Desaulniers, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, Andrey Konovalov, kasan-dev, Christian König, Pan, Xinhui, amd-gfx list On Thu, Sep 9, 2021 at 4:43 AM Marco Elver <elver@google.com> wrote: > > Sure, but the reality is that the real stack size is already doubled > for KASAN. And that should be reflected in Wframe-larger-than. I don't think that's true. Quite the reverse, in fact. Yes, the *dynamic* stack size is doubled due to KASAN, because it will cause much deeper callchains. But the individual frames don't grow that much apart from compilers doing stupid things (ie apparently clang and KASAN_STACK), and if anything, the deeper dynamic call chains means that the individual frame size being small is even *more* important, but we do compensate for the deeper stacks by making THREAD_SIZE_ORDER bigger at least on x86. Honestly, I am not even happy with the current "2048 bytes for 64-bit". The excuse has been that 64-bit needs more stack, but all it ever did was clearly to just allow people to just do bad things. Because a 1kB stack frame is horrendous even in 64-bit. That's not "spill some registers" kind of stack frame. That's "put a big structure on the stack" kind of stack frame regardless of any other issues. And no, "but we have 16kB of stack and we'll switch stacks on interrupts" is not an excuse for one single level to use up 1kB, much less 2kB. Does anybody seriously believe that we don't quite normally have stacks that are easily tens of frames deep? Without having some true "this is the full callchain" information, the best we can do is just limit individual stack frames. And 2kB is *excessive*. Linus ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Enable '-Werror' by default for all kernel builds 2021-09-09 5:58 ` Christoph Hellwig 2021-09-09 6:07 ` Guenter Roeck 2021-09-09 10:53 ` Marco Elver @ 2021-09-09 16:46 ` Linus Torvalds 2 siblings, 0 replies; 16+ messages in thread From: Linus Torvalds @ 2021-09-09 16:46 UTC (permalink / raw) To: Christoph Hellwig Cc: Marco Elver, Guenter Roeck, Nathan Chancellor, Arnd Bergmann, Linux Kernel Mailing List, llvm, Nick Desaulniers, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, Andrey Konovalov, kasan-dev, Christian König, Pan, Xinhui, amd-gfx On Wed, Sep 8, 2021 at 10:59 PM Christoph Hellwig <hch@infradead.org> wrote: > > While we're at it, with -Werror something like this is really futile: Yeah, I'm thinking we could do -Wno-error=cpp to at least allow the cpp warnings to come through without being fatal. Because while they can be annoying too, they are most definitely under our direct control, so.. I didn't actually test that, but I think it should work. That said, maybe they should just be removed. They might be better off just as Kconfig rules, rather than as a "hey, you screwed up your Kconfig" warning after the fact. Linus ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] Enable '-Werror' by default for all kernel builds [not found] ` <YTkjJPCdR1VGaaVm@archlinux-ax161> [not found] ` <75a10e8b-9f11-64c4-460b-9f3ac09965e2@roeck-us.net> @ 2021-09-21 15:41 ` Arnd Bergmann 1 sibling, 0 replies; 16+ messages in thread From: Arnd Bergmann @ 2021-09-21 15:41 UTC (permalink / raw) To: Nathan Chancellor Cc: Linus Torvalds, Guenter Roeck, Linux Kernel Mailing List, llvm, Nick Desaulniers, Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv, Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, Andrey Konovalov, kasan-dev, Harry Wentland, Alex Deucher, Christian König, xinhui pan, amd-gfx list On Wed, Sep 8, 2021 at 10:55 PM Nathan Chancellor <nathan@kernel.org> wrote: > On Tue, Sep 07, 2021 at 11:11:17AM +0200, Arnd Bergmann wrote: > > On Tue, Sep 7, 2021 at 4:32 AM Nathan Chancellor <nathan@kernel.org> wrote: function 'rtw_aes_decrypt' [-Werror,-Wframe-larger-than] > > > arm32-fedora.log: drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dce_calcs.c:3043:6: error: stack frame size (1376) exceeds limit (1024) in function 'bw_calcs' [-Werror,-Wframe-larger-than] > > > arm32-fedora.log: drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dce_calcs.c:77:13: error: stack frame size (5384) exceeds limit (1024) in function 'calculate_bandwidth' [-Werror,-Wframe-larger-than] > > > > > > Aside from the dce_calcs.c warnings, these do not seem too bad. I > > > believe allmodconfig turns on UBSAN but it could also be aggressive > > > inlining by clang. I intend to look at all -Wframe-large-than warnings > > > closely later. > > > > I've had them close to zero in the past, but a couple of new ones came in. > > > > The amdgpu ones are probably not fixable unless they stop using 64-bit > > floats in the kernel for > > random calculations. The crypto/* ones tend to be compiler bugs, but hard to fix > > I have started taking a look at these. Most of the allmodconfig ones > appear to be related to CONFIG_KASAN, which is now supported for > CONFIG_ARM. > > The two in bpmp-debugfs.c appear regardless of CONFIG_KASAN and it turns > out that you actually submitted a patch for these: > > https://lore.kernel.org/r/20201204193714.3134651-1-arnd@kernel.org/ > > Is it worth resending or pinging that? I'm now restarting from a clean tree for my randconfig patches to see which ones are actually needed, will hopefully get to that. > The dce_calcs.c ones also appear without CONFIG_KASAN, which you noted > is probably unavoidable. (adding amdgpu folks to Cc here) Harry Wentland did a nice rework for dcn_calcs.c that should also be portable to dce_calcs.c, I hope that he will be able to get to that as well. Looking at my older patches now, I found that I had only suppressed that one and given up fixing it, but I did put my analysis into https://bugs.llvm.org/show_bug.cgi?id=42551, which should be helpful for addressing it in either the kernel or the compiler. Arnd ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2021-09-21 15:42 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20210906142615.GA1917503@roeck-us.net>
[not found] ` <CAHk-=wgjTePY1v_D-jszz4NrpTso0CdvB9PcdroPS=TNU1oZMQ@mail.gmail.com>
[not found] ` <YTbOs13waorzamZ6@Ryzen-9-3900X.localdomain>
[not found] ` <CAK8P3a3_Tdc-XVPXrJ69j3S9048uzmVJGrNcvi0T6yr6OrHkPw@mail.gmail.com>
[not found] ` <CAHk-=wgZkQ+eZ02TaCpAWo_ffiLMwA2tYNHyL+B1dQ4YB0qfmA@mail.gmail.com>
2021-09-07 17:33 ` [PATCH] Enable '-Werror' by default for all kernel builds Linus Torvalds
2021-09-07 21:07 ` Harry Wentland
2021-09-08 3:52 ` Harry Wentland
2021-09-08 4:41 ` Linus Torvalds
2021-09-09 0:48 ` Harry Wentland
[not found] ` <YTkjJPCdR1VGaaVm@archlinux-ax161>
[not found] ` <75a10e8b-9f11-64c4-460b-9f3ac09965e2@roeck-us.net>
[not found] ` <YTkyIAevt7XOd+8j@elver.google.com>
2021-09-09 5:58 ` Christoph Hellwig
2021-09-09 6:07 ` Guenter Roeck
2021-09-09 7:30 ` Christian König
2021-09-09 14:59 ` Guenter Roeck
2021-09-09 10:53 ` Marco Elver
2021-09-09 11:00 ` Arnd Bergmann
2021-09-09 11:43 ` Marco Elver
2021-09-09 12:55 ` Arnd Bergmann
2021-09-09 16:53 ` Linus Torvalds
2021-09-09 16:46 ` Linus Torvalds
2021-09-21 15:41 ` Arnd Bergmann
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox