On 19/02/2025 10:02, Juri Lelli wrote:
> On 19/02/25 10:29, Dietmar Eggemann wrote:
>
> ...
>
>> I did now.
>
> Thanks!
>
>> Patch-wise I have:
>>
>> (1) Putting 'fair_server's __dl_server_[de|at]tach_root() under if
>>     '(cpumask_test_cpu(rq->cpu, [old_rd->online|cpu_active_mask))' in
>>     rq_attach_root()
>>
>>     https://lkml.kernel.org/r/Z7RhNmLpOb7SLImW@jlelli-thinkpadt14gen4.remote.csb
>>
>> (2) Create __dl_server_detach_root() and call it in rq_attach_root()
>>
>>     https://lkml.kernel.org/r/Z4fd_6M2vhSMSR0i@jlelli-thinkpadt14gen4.remote.csb
>>
>> plus debug patch:
>>
>>     https://lkml.kernel.org/r/Z6M5fQB9P1_bDF7A@jlelli-thinkpadt14gen4.remote.csb
>>
>> plus additional debug.
>
> So you don't have the one with which we ignore special tasks while
> rebuilding domains?
>
> https://lore.kernel.org/all/Z6spnwykg6YSXBX_@jlelli-thinkpadt14gen4.remote.csb/
>
> Could you please double check again against
>
> git@github.com:jlelli/linux.git experimental/dl-debug
>
>> The suspend issue still persists.
>>
>> My hunch is that it's rather an issue with having 0 CPUs left in DEF
>> while deactivating the last isol CPU (CPU3) so we set overflow = 1 w/o
>> calling __dl_overflow(). We want to account fair_server_bw=52428
>> against 0 CPUs.
>>
>>     l B B l l l
>>
>>           ^^^
>>           isolcpus=[3,4]
>>
>>
>>   cpumask_and(mask, rd->span, cpu_active_mask)
>>
>>   mask = [3-5] & [0-3] = [3] -> dl_bw_cpus(3) = 1
>>
>> ---
>>
>> dl_bw_deactivate() called cpu=5
>>
>> dl_bw_deactivate() called cpu=4
>>
>> dl_bw_deactivate() called cpu=3
>>
>> dl_bw_cpus() cpu=6 rd->span=3-5 cpu_active_mask=0-3 cpus=1 type=DEF
>>                    ^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^
>>   cpumask_subset(rd->span, cpu_active_mask) is false
>>
>>   for_each_cpu_and(i, rd->span, cpu_active_mask)
>>     cpus++                                   <-- cpus is 1 !!!
>>
>> dl_bw_manage: cpu=3 cap=0 fair_server_bw=52428 total_bw=104856 dl_bw_cpus=1 type=DEF span=3-5
>                                                          ^^^^^^
> This still looks wrong: with a single cpu remaining we should only have
> the corresponding dl server bandwidth present (unless there is some
> other DL task running.
>
> If you already had the patch ignoring sugovs bandwidth in your set, could
> you please share the full dmesg?

Attached is the full dmesg from my board with your latest branch. I have
not been able to get to the traces yet, because I am using the same
board to debug another issue.

Cheers
Jon

--
nvpublic