* [PATCH 0/5] Fix and cleanups to page-writeback @ 2024-01-23 18:33 Kemeng Shi 2024-01-23 18:33 ` [PATCH 1/5] mm: enable __wb_calc_thresh to calculate dirty background threshold Kemeng Shi ` (5 more replies) 0 siblings, 6 replies; 15+ messages in thread From: Kemeng Shi @ 2024-01-23 18:33 UTC (permalink / raw) To: willy, akpm Cc: tj, hcochran, mszeredi, axboe, linux-fsdevel, linux-mm, linux-kernel This series contains some random cleanups and a fix to correct the calculation of a cgroup wb's bg_thresh. More details can be found in the respective patches. Thanks! Kemeng Shi (5): mm: enable __wb_calc_thresh to calculate dirty background threshold mm: correct calculation of cgroup wb's bg_thresh in wb_over_bg_thresh mm: call __wb_calc_thresh instead of wb_calc_thresh in wb_over_bg_thresh mm: remove redundant check in wb_min_max_ratio mm: remove stale comment __folio_mark_dirty mm/page-writeback.c | 43 +++++++++++++++++++++---------------------- 1 file changed, 21 insertions(+), 22 deletions(-) -- 2.30.0 ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/5] mm: enable __wb_calc_thresh to calculate dirty background threshold 2024-01-23 18:33 [PATCH 0/5] Fix and cleanups to page-writeback Kemeng Shi @ 2024-01-23 18:33 ` Kemeng Shi 2024-01-23 18:33 ` [PATCH 2/5] mm: correct calculation of cgroup wb's bg_thresh in wb_over_bg_thresh Kemeng Shi ` (4 subsequent siblings) 5 siblings, 0 replies; 15+ messages in thread From: Kemeng Shi @ 2024-01-23 18:33 UTC (permalink / raw) To: willy, akpm Cc: tj, hcochran, mszeredi, axboe, linux-fsdevel, linux-mm, linux-kernel Originally, __wb_calc_thresh always calculates wb's share of the dirty throttling threshold. By taking the thresh of the wb_domain from the caller, __wb_calc_thresh can be used for both the dirty throttling and dirty background thresholds. This is a preparation to correct the threshold calculation of a cgroup wb. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> --- mm/page-writeback.c | 31 +++++++++++++++++-------------- 1 file changed, 17 insertions(+), 14 deletions(-) diff --git a/mm/page-writeback.c b/mm/page-writeback.c index cd4e4ae77c40..9268859722c4 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -838,13 +838,15 @@ static void mdtc_calc_avail(struct dirty_throttle_control *mdtc, } /** - * __wb_calc_thresh - @wb's share of dirty throttling threshold + * __wb_calc_thresh - @wb's share of dirty threshold * @dtc: dirty_throttle_context of interest + * @thresh: dirty throttling or dirty background threshold of wb_domain in @dtc * - * Note that balance_dirty_pages() will only seriously take it as a hard limit - * when sleeping max_pause per page is not enough to keep the dirty pages under - * control. For example, when the device is completely stalled due to some error - * conditions, or when there are 1000 dd tasks writing to a slow 10MB/s USB key. + * Note that balance_dirty_pages() will only seriously take dirty throttling + * threshold as a hard limit when sleeping max_pause per page is not enough + * to keep the dirty pages under control. For example, when the device is + * completely stalled due to some error conditions, or when there are 1000 + * dd tasks writing to a slow 10MB/s USB key. * In the other normal situations, it acts more gently by throttling the tasks * more (rather than completely block them) when the wb dirty pages go high. * @@ -855,19 +857,20 @@ static void mdtc_calc_avail(struct dirty_throttle_control *mdtc, * The wb's share of dirty limit will be adapting to its throughput and * bounded by the bdi->min_ratio and/or bdi->max_ratio parameters, if set. * - * Return: @wb's dirty limit in pages. The term "dirty" in the context of - * dirty balancing includes all PG_dirty and PG_writeback pages. + * Return: @wb's dirty limit in pages. For dirty throttling limit, the term + * "dirty" in the context of dirty balancing includes all PG_dirty and + * PG_writeback pages. */ -static unsigned long __wb_calc_thresh(struct dirty_throttle_control *dtc) +static unsigned long __wb_calc_thresh(struct dirty_throttle_control *dtc, + unsigned long thresh) { struct wb_domain *dom = dtc_dom(dtc); - unsigned long thresh = dtc->thresh; u64 wb_thresh; unsigned long numerator, denominator; unsigned long wb_min_ratio, wb_max_ratio; /* - * Calculate this BDI's share of the thresh ratio. + * Calculate this wb's share of the thresh ratio. 
*/ fprop_fraction_percpu(&dom->completions, dtc->wb_completions, &numerator, &denominator); @@ -887,9 +890,9 @@ static unsigned long __wb_calc_thresh(struct dirty_throttle_control *dtc) unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh) { - struct dirty_throttle_control gdtc = { GDTC_INIT(wb), - .thresh = thresh }; - return __wb_calc_thresh(&gdtc); + struct dirty_throttle_control gdtc = { GDTC_INIT(wb) }; + + return __wb_calc_thresh(&gdtc, thresh); } /* @@ -1636,7 +1639,7 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc) * wb_position_ratio() will let the dirtier task progress * at some rate <= (write_bw / 2) for bringing down wb_dirty. */ - dtc->wb_thresh = __wb_calc_thresh(dtc); + dtc->wb_thresh = __wb_calc_thresh(dtc, dtc->thresh); dtc->wb_bg_thresh = dtc->thresh ? div_u64((u64)dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0; -- 2.30.0 ^ permalink raw reply related [flat|nested] 15+ messages in thread
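To make the refactor above easier to follow, here is a minimal user-space sketch of the idea: a wb's share of a domain threshold is that threshold scaled by the wb's fraction of the domain's recent writeback completions, bounded by the bdi's min/max ratios. The helper name, the percentage-based clamping and the sample numbers are illustrative assumptions, not the kernel implementation; the point is that, with the thresh passed in explicitly, the same computation can serve both the dirty throttling threshold and the dirty background threshold.

#include <stdio.h>

/*
 * Toy model of "a wb's share of a domain threshold" (illustration only,
 * not the kernel code): scale thresh by the wb's fraction of completions
 * in its domain, then clamp to the bdi's min/max ratios (in percent).
 */
static unsigned long wb_share(unsigned long thresh,
			      unsigned long wb_completions,
			      unsigned long dom_completions,
			      unsigned long min_ratio, unsigned long max_ratio)
{
	unsigned long share = thresh * wb_completions / dom_completions;
	unsigned long lo = thresh * min_ratio / 100;
	unsigned long hi = thresh * max_ratio / 100;

	if (share < lo)
		share = lo;
	if (share > hi)
		share = hi;
	return share;
}

int main(void)
{
	unsigned long thresh = 1000, bg_thresh = 500;	/* pages, made up */

	/* One helper, two thresholds: the caller decides which one to split. */
	printf("throttling share: %lu\n", wb_share(thresh, 1, 2, 0, 100));
	printf("background share: %lu\n", wb_share(bg_thresh, 1, 2, 0, 100));
	return 0;
}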
* [PATCH 2/5] mm: correct calculation of cgroup wb's bg_thresh in wb_over_bg_thresh 2024-01-23 18:33 [PATCH 0/5] Fix and cleanups to page-writeback Kemeng Shi 2024-01-23 18:33 ` [PATCH 1/5] mm: enable __wb_calc_thresh to calculate dirty background threshold Kemeng Shi @ 2024-01-23 18:33 ` Kemeng Shi 2024-01-23 20:43 ` Tejun Heo 2024-01-23 18:33 ` [PATCH 3/5] mm: call __wb_calc_thresh instead of wb_calc_thresh " Kemeng Shi ` (3 subsequent siblings) 5 siblings, 1 reply; 15+ messages in thread From: Kemeng Shi @ 2024-01-23 18:33 UTC (permalink / raw) To: willy, akpm Cc: tj, hcochran, mszeredi, axboe, linux-fsdevel, linux-mm, linux-kernel wb_calc_thresh calculates the wb's share in the global wb domain, but for the mdtc we need the wb's share in the mem_cgroup_wb_domain. Call __wb_calc_thresh instead of wb_calc_thresh to fix this. Fixes: 74d369443325 ("writeback: Fix performance regression in wb_over_bg_thresh()") Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> --- mm/page-writeback.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 9268859722c4..f6c7f3b0f495 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -2118,7 +2118,7 @@ bool wb_over_bg_thresh(struct bdi_writeback *wb) if (mdtc->dirty > mdtc->bg_thresh) return true; - thresh = wb_calc_thresh(mdtc->wb, mdtc->bg_thresh); + thresh = __wb_calc_thresh(mdtc, mdtc->bg_thresh); if (thresh < 2 * wb_stat_error()) reclaimable = wb_stat_sum(wb, WB_RECLAIMABLE); else -- 2.30.0 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 2/5] mm: correct calculation of cgroup wb's bg_thresh in wb_over_bg_thresh 2024-01-23 18:33 ` [PATCH 2/5] mm: correct calculation of cgroup wb's bg_thresh in wb_over_bg_thresh Kemeng Shi @ 2024-01-23 20:43 ` Tejun Heo 2024-01-24 2:01 ` Kemeng Shi 0 siblings, 1 reply; 15+ messages in thread From: Tejun Heo @ 2024-01-23 20:43 UTC (permalink / raw) To: Kemeng Shi Cc: willy, akpm, hcochran, mszeredi, axboe, linux-fsdevel, linux-mm, linux-kernel On Wed, Jan 24, 2024 at 02:33:29AM +0800, Kemeng Shi wrote: > The wb_calc_thresh will calculate wb's share in global wb domain. We need > to wb's share in mem_cgroup_wb_domain for mdtc. Call __wb_calc_thresh > instead of wb_calc_thresh to fix this. That function is calculating the wb's portion of the writeback in the whole system so that the threshold can be distributed accordingly. So, it has to be compared in the global domain. If you look at the comment on top of struct wb_domain, it says: /* * A wb_domain represents a domain that wb's (bdi_writeback's) belong to * and are measured against each other in. There always is one global * domain, global_wb_domain, that every wb in the system is a member of. * This allows measuring the relative bandwidth of each wb to distribute * dirtyable memory accordingly. */ Also, how is this tested? Was there a case where the existing code misbehaved that's improved by this patch? Or is this just from reading code? Thanks. -- tejun ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/5] mm: correct calculation of cgroup wb's bg_thresh in wb_over_bg_thresh 2024-01-23 20:43 ` Tejun Heo @ 2024-01-24 2:01 ` Kemeng Shi 2024-01-29 21:00 ` Tejun Heo 0 siblings, 1 reply; 15+ messages in thread From: Kemeng Shi @ 2024-01-24 2:01 UTC (permalink / raw) To: Tejun Heo Cc: willy, akpm, hcochran, mszeredi, axboe, linux-fsdevel, linux-mm, linux-kernel on 1/24/2024 4:43 AM, Tejun Heo wrote: > On Wed, Jan 24, 2024 at 02:33:29AM +0800, Kemeng Shi wrote: >> The wb_calc_thresh will calculate wb's share in global wb domain. We need >> to wb's share in mem_cgroup_wb_domain for mdtc. Call __wb_calc_thresh >> instead of wb_calc_thresh to fix this. > > That function is calculating the wb's portion of wb portion in the whole > system so that threshold can be distributed accordingly. So, it has to be > compared in the global domain. If you look at the comment on top of struct > wb_domain, it says: > > /* > * A wb_domain represents a domain that wb's (bdi_writeback's) belong to > * and are measured against each other in. There always is one global > * domain, global_wb_domain, that every wb in the system is a member of. > * This allows measuring the relative bandwidth of each wb to distribute > * dirtyable memory accordingly. > */ > Hi Tejun, thanks for the reply. A cgroup wb belongs to both the global wb domain and a cgroup domain. I agree with the way we calculate the wb's threshold in the global domain, as you described above. This patch tries to fix the calculation of the wb's threshold in the cgroup domain, which is currently wb_calc_thresh(mdtc->wb, mdtc->bg_thresh), meaning: (wb bandwidth) / (system bandwidth) * (*cgroup domain threshold*) The cgroup domain threshold is (memory of cgroup domain) / (memory of system) * (system threshold). Then the wb's threshold in the cgroup domain will be smaller than expected. Consider the following domain hierarchy: global domain (100G) / \ cgroup domain1(50G) cgroup domain2(50G) | | bdi wb1 wb2 Assume wb1 and wb2 have the same bandwidth. We have a global domain bg_thresh of 10G and a cgroup domain bg_thresh of 5G. Then we have: wb's thresh in global domain = 10G * (wb bandwidth) / (system bandwidth) = 10G * 1/2 = 5G wb's thresh in cgroup domain = 5G * (wb bandwidth) / (system bandwidth) = 5G * 1/2 = 2.5G As a result, wb1 and wb2 will each be limited at 2.5G, and the system will be limited at 5G, which is less than the global domain bg_thresh of 10G. After the fix, the threshold in the cgroup domain will be: (wb bandwidth) / (cgroup bandwidth) * (cgroup domain threshold) Then wb1 and wb2 will each be limited at 5G, and the system will be limited at 10G, which equals the global domain bg_thresh of 10G. As I haven't taken a deep look into the memory cgroup code, please correct me if anything is wrong. Thanks! > Also, how is this tested? Was there a case where the existing code > misbehaved that's improved by this patch? Or is this just from reading code? This is just from reading the code. Does the case shown above convince you a bit? Looking forward to your reply, thanks! > > Thanks. > ^ permalink raw reply [flat|nested] 15+ messages in thread
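The numbers in the example above can be checked with a few lines of standalone C. This is only a sketch of that example: the 100G/50G split, the equal wb bandwidths and the 10G/5G background thresholds are taken from the message, and the proportional-share formula is the simplified one used there, not the kernel code.

#include <stdio.h>

int main(void)
{
	/* Figures from the example above, in GiB. */
	double global_bg_thresh = 10.0;		/* global domain bg_thresh */
	double cgroup_bg_thresh = 5.0;		/* bg_thresh of each 50G cgroup domain */
	double wb_share_of_system = 0.5;	/* wb1 and wb2 have equal bandwidth */
	double wb_share_of_cgroup = 1.0;	/* each wb is the only one in its cgroup */

	/* before the fix: cgroup bg_thresh split by the wb's global share */
	double old_limit = cgroup_bg_thresh * wb_share_of_system;
	/* after the fix: split by the wb's share within its own cgroup domain */
	double new_limit = cgroup_bg_thresh * wb_share_of_cgroup;

	printf("before: each wb %.1fG, both wbs %.1fG (global bg_thresh %.1fG)\n",
	       old_limit, 2 * old_limit, global_bg_thresh);
	printf("after : each wb %.1fG, both wbs %.1fG (global bg_thresh %.1fG)\n",
	       new_limit, 2 * new_limit, global_bg_thresh);
	return 0;
}

Running this prints 2.5G per wb (5G total) before the fix and 5G per wb (10G total) after it, matching the 10G global background threshold in the example.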
* Re: [PATCH 2/5] mm: correct calculation of cgroup wb's bg_thresh in wb_over_bg_thresh 2024-01-24 2:01 ` Kemeng Shi @ 2024-01-29 21:00 ` Tejun Heo 2024-02-08 9:26 ` Kemeng Shi 0 siblings, 1 reply; 15+ messages in thread From: Tejun Heo @ 2024-01-29 21:00 UTC (permalink / raw) To: Kemeng Shi Cc: willy, akpm, hcochran, mszeredi, axboe, linux-fsdevel, linux-mm, linux-kernel Hello, On Wed, Jan 24, 2024 at 10:01:47AM +0800, Kemeng Shi wrote: > Hi Tejun, thanks for reply. For cgroup wb, it will belongs to a global wb > domain and a cgroup domain. I agree the way how we calculate wb's threshold > in global domain as you described above. This patch tries to fix calculation > of wb's threshold in cgroup domain which now is wb_calc_thresh(mdtc->wb, > mdtc->bg_thresh)), means: > (wb bandwidth) / (system bandwidth) * (*cgroup domain threshold*) > The cgroup domain threshold is > (memory of cgroup domain) / (memory of system) * (system threshold). > Then the wb's threshold in cgroup will be smaller than expected. > > Consider following domain hierarchy: > global domain (100G) > / \ > cgroup domain1(50G) cgroup domain2(50G) > | | > bdi wb1 wb2 > Assume wb1 and wb2 has the same bandwidth. > We have global domain bg_thresh 10G, cgroup domain bg_thresh 5G. > Then we have: > wb's thresh in global domain = 10G * (wb bandwidth) / (system bandwidth) > = 10G * 1/2 = 5G > wb's thresh in cgroup domain = 5G * (wb bandwidth) / (system bandwidth) > = 5G * 1/2 = 2.5G > At last, wb1 and wb2 will be limited at 2.5G, the system will be limited > at 5G which is less than global domain bg_thresh 10G. > > After the fix, threshold in cgroup domain will be: > (wb bandwidth) / (cgroup bandwidth) * (cgroup domain threshold) > The wb1 and wb2 will be limited at 5G, the system will be limited at > 10G which equals to global domain bg_thresh 10G. > > As I didn't take a deep look into memory cgroup, please correct me if > anything is wrong. Thanks! > > Also, how is this tested? Was there a case where the existing code > > misbehaved that's improved by this patch? Or is this just from reading code? > > This is jut from reading code. Would the case showed above convince you > a bit. Look forward to your reply, thanks!. So, the explanation makes some sense to me but can you please construct a case to actually demonstrate the problem and fix? I don't think it'd be wise to apply the change without actually observing the code change does what it says it does. Thanks. -- tejun ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/5] mm: correct calculation of cgroup wb's bg_thresh in wb_over_bg_thresh 2024-01-29 21:00 ` Tejun Heo @ 2024-02-08 9:26 ` Kemeng Shi 2024-02-08 19:32 ` Tejun Heo 0 siblings, 1 reply; 15+ messages in thread From: Kemeng Shi @ 2024-02-08 9:26 UTC (permalink / raw) To: Tejun Heo Cc: willy, akpm, hcochran, mszeredi, axboe, linux-fsdevel, linux-mm, linux-kernel on 1/30/2024 5:00 AM, Tejun Heo wrote: > Hello, > > On Wed, Jan 24, 2024 at 10:01:47AM +0800, Kemeng Shi wrote: >> Hi Tejun, thanks for reply. For cgroup wb, it will belongs to a global wb >> domain and a cgroup domain. I agree the way how we calculate wb's threshold >> in global domain as you described above. This patch tries to fix calculation >> of wb's threshold in cgroup domain which now is wb_calc_thresh(mdtc->wb, >> mdtc->bg_thresh)), means: >> (wb bandwidth) / (system bandwidth) * (*cgroup domain threshold*) >> The cgroup domain threshold is >> (memory of cgroup domain) / (memory of system) * (system threshold). >> Then the wb's threshold in cgroup will be smaller than expected. >> >> Consider following domain hierarchy: >> global domain (100G) >> / \ >> cgroup domain1(50G) cgroup domain2(50G) >> | | >> bdi wb1 wb2 >> Assume wb1 and wb2 has the same bandwidth. >> We have global domain bg_thresh 10G, cgroup domain bg_thresh 5G. >> Then we have: >> wb's thresh in global domain = 10G * (wb bandwidth) / (system bandwidth) >> = 10G * 1/2 = 5G >> wb's thresh in cgroup domain = 5G * (wb bandwidth) / (system bandwidth) >> = 5G * 1/2 = 2.5G >> At last, wb1 and wb2 will be limited at 2.5G, the system will be limited >> at 5G which is less than global domain bg_thresh 10G. >> >> After the fix, threshold in cgroup domain will be: >> (wb bandwidth) / (cgroup bandwidth) * (cgroup domain threshold) >> The wb1 and wb2 will be limited at 5G, the system will be limited at >> 10G which equals to global domain bg_thresh 10G. >> >> As I didn't take a deep look into memory cgroup, please correct me if >> anything is wrong. Thanks! >>> Also, how is this tested? Was there a case where the existing code >>> misbehaved that's improved by this patch? Or is this just from reading code? >> >> This is jut from reading code. Would the case showed above convince you >> a bit. Look forward to your reply, thanks!. > > So, the explanation makes some sense to me but can you please construct a > case to actually demonstrate the problem and fix? I don't think it'd be wise > to apply the change without actually observing the code change does what it > says it does. Hi Tejun, sorry for the delay: I found an issue that keeps triggering writeback even when the dirty pages are under the dirty background threshold. That issue makes it difficult to observe the expected improvement from this patch. I tried to fix it in [1] and tested this patch on top of the fix patches. 
Run the test as follows: /* make background writeback easier to observe */ echo 300000 > /proc/sys/vm/dirty_expire_centisecs echo 100 > /proc/sys/vm/dirty_writeback_centisecs /* enable memory and io cgroup */ echo "+memory +io" > /sys/fs/cgroup/cgroup.subtree_control /* run fio in group1 with shell */ cd /sys/fs/cgroup mkdir group1 cd group1 echo 10G > memory.high echo 10G > memory.max echo $$ > cgroup.procs mkfs.ext4 -F /dev/vdb mount /dev/vdb /bdi1/ fio -name test -filename=/bdi1/file -size=800M -ioengine=libaio -bs=4K -iodepth=1 -rw=write -direct=0 --time_based -runtime=60 -invalidate=0 /* run another fio in group2 with another shell */ cd /sys/fs/cgroup mkdir group2 cd group2 echo 10G > memory.high echo 10G > memory.max echo $$ > cgroup.procs mkfs.ext4 -F /dev/vdc mount /dev/vdc /bdi2/ fio -name test -filename=/bdi2/file -size=800M -ioengine=libaio -bs=4K -iodepth=1 -rw=write -direct=0 --time_based -runtime=60 -invalidate=0 Before the fix we got (results of three runs): fio1 WRITE: bw=1304MiB/s (1367MB/s), 1304MiB/s-1304MiB/s (1367MB/s-1367MB/s), io=76.4GiB (82.0GB), run=60001-60001msec WRITE: bw=1351MiB/s (1417MB/s), 1351MiB/s-1351MiB/s (1417MB/s-1417MB/s), io=79.2GiB (85.0GB), run=60001-60001msec WRITE: bw=1373MiB/s (1440MB/s), 1373MiB/s-1373MiB/s (1440MB/s-1440MB/s), io=80.5GiB (86.4GB), run=60001-60001msec fio2 WRITE: bw=1134MiB/s (1190MB/s), 1134MiB/s-1134MiB/s (1190MB/s-1190MB/s), io=66.5GiB (71.4GB), run=60001-60001msec WRITE: bw=1414MiB/s (1483MB/s), 1414MiB/s-1414MiB/s (1483MB/s-1483MB/s), io=82.8GiB (88.0GB), run=60001-60001msec WRITE: bw=1469MiB/s (1540MB/s), 1469MiB/s-1469MiB/s (1540MB/s-1540MB/s), io=86.0GiB (92.4GB), run=60001-60001msec After the fix we got (results of three runs): fio1 WRITE: bw=1719MiB/s (1802MB/s), 1719MiB/s-1719MiB/s (1802MB/s-1802MB/s), io=101GiB (108GB), run=60001-60001msec WRITE: bw=1723MiB/s (1806MB/s), 1723MiB/s-1723MiB/s (1806MB/s-1806MB/s), io=101GiB (108GB), run=60001-60001msec WRITE: bw=1691MiB/s (1774MB/s), 1691MiB/s-1691MiB/s (1774MB/s-1774MB/s), io=99.2GiB (106GB), run=60036-60036msec fio2 WRITE: bw=1692MiB/s (1774MB/s), 1692MiB/s-1692MiB/s (1774MB/s-1774MB/s), io=99.1GiB (106GB), run=60001-60001msec WRITE: bw=1681MiB/s (1763MB/s), 1681MiB/s-1681MiB/s (1763MB/s-1763MB/s), io=98.5GiB (106GB), run=60001-60001msec WRITE: bw=1671MiB/s (1752MB/s), 1671MiB/s-1671MiB/s (1752MB/s-1752MB/s), io=97.9GiB (105GB), run=60001-60001msec I also added code to print the number of pages written by background writeback; after this fix, the pages written in writeback are greatly reduced and become rare. [1] https://lore.kernel.org/linux-fsdevel/20240208172024.23625-2-shikemeng@huaweicloud.com/T/#u > > Thanks. > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/5] mm: correct calculation of cgroup wb's bg_thresh in wb_over_bg_thresh 2024-02-08 9:26 ` Kemeng Shi @ 2024-02-08 19:32 ` Tejun Heo 2024-02-18 2:35 ` Kemeng Shi 0 siblings, 1 reply; 15+ messages in thread From: Tejun Heo @ 2024-02-08 19:32 UTC (permalink / raw) To: Kemeng Shi Cc: willy, akpm, hcochran, mszeredi, axboe, linux-fsdevel, linux-mm, linux-kernel Hello, Kemeng. On Thu, Feb 08, 2024 at 05:26:10PM +0800, Kemeng Shi wrote: > Hi Tejun, sorry for the delay as I found there is a issue that keep triggering > writeback even the dirty page is under dirty background threshold. The issue > make it difficult to observe the expected improvment from this patch. I try to > fix it in [1] and test this patch based on the fix patches. > Run test as following: Ah, that looks promising and thanks a lot for looking into this. It's great to have someone actually poring over the code and behavior. Understanding the wb and cgroup wb behaviors has always been challenging because the only thing we have is the tracepoints, and it's really tedious and difficult to build an overall understanding from the trace outputs. Can I persuade you into writing a drgn monitoring script similar to e.g. tools/workqueues/wq_monitor.py? I think there's a pretty good chance the visibility can be improved substantially. Thanks. -- tejun ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/5] mm: correct calculation of cgroup wb's bg_thresh in wb_over_bg_thresh 2024-02-08 19:32 ` Tejun Heo @ 2024-02-18 2:35 ` Kemeng Shi 2024-02-20 17:34 ` Tejun Heo 0 siblings, 1 reply; 15+ messages in thread From: Kemeng Shi @ 2024-02-18 2:35 UTC (permalink / raw) To: Tejun Heo Cc: willy, akpm, hcochran, mszeredi, axboe, linux-fsdevel, linux-mm, linux-kernel on 2/9/2024 3:32 AM, Tejun Heo wrote: > Hello, Kemeng. > > On Thu, Feb 08, 2024 at 05:26:10PM +0800, Kemeng Shi wrote: >> Hi Tejun, sorry for the delay as I found there is a issue that keep triggering >> writeback even the dirty page is under dirty background threshold. The issue >> make it difficult to observe the expected improvment from this patch. I try to >> fix it in [1] and test this patch based on the fix patches. >> Run test as following: > > Ah, that looks promising and thanks a lot for looking into this. It's great > to have someone actually poring over the code and behavior. Understanding > the wb and cgroup wb behaviors have always been challenging because the only > thing we have is the tracepoints and it's really tedious and difficult to > build an overall understanding from the trace outputs. Can I persuade you > into writing a drgn monitoring script similar to e.g. > tools/workqueues/wq_monitor.py? I think there's a pretty good chance the > visibility can be improved substantially. Hi Tejun, sorry for the late reply as I was on vacation these days. I agree that visibility is poor; I had to add some printks to debug this. Actually, I have added per-wb stats to improve the visibility, as we only have per-bdi stats (/sys/kernel/debug/bdi/xxx:x/stats) now, and I plan to submit them in a new series. I'd like to add a script to improve visibility further, but I can't guarantee when I'll find the time to do it. I will submit the monitoring script together with the per-wb stats if the timing is not a problem for you. Thanks. > > Thanks. > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/5] mm: correct calculation of cgroup wb's bg_thresh in wb_over_bg_thresh 2024-02-18 2:35 ` Kemeng Shi @ 2024-02-20 17:34 ` Tejun Heo 0 siblings, 0 replies; 15+ messages in thread From: Tejun Heo @ 2024-02-20 17:34 UTC (permalink / raw) To: Kemeng Shi Cc: willy, akpm, hcochran, mszeredi, axboe, linux-fsdevel, linux-mm, linux-kernel Hello, On Sun, Feb 18, 2024 at 10:35:41AM +0800, Kemeng Shi wrote: > Hi Tejun, sorry for the late reply as I was on vacation these days. > I agree that visibility is poor so I have to add some printks to debug. > Actually, I have added per wb stats to improve the visibility as we > only have per bdi stats (/sys/kernel/debug/bdi/xxx:x/stats) now. And I > plan to submit it in a new series. > I'd like to add a script to improve visibility more but I can't ensure > the time to do it. I would submit the monitoring script with the per wb > stats if the time problem does not bother you. It has had poor visibility for many, many years; I don't think we're in any hurry. Thanks. -- tejun ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 3/5] mm: call __wb_calc_thresh instead of wb_calc_thresh in wb_over_bg_thresh 2024-01-23 18:33 [PATCH 0/5] Fix and cleanups to page-writeback Kemeng Shi 2024-01-23 18:33 ` [PATCH 1/5] mm: enable __wb_calc_thresh to calculate dirty background threshold Kemeng Shi 2024-01-23 18:33 ` [PATCH 2/5] mm: correct calculation of cgroup wb's bg_thresh in wb_over_bg_thresh Kemeng Shi @ 2024-01-23 18:33 ` Kemeng Shi 2024-01-23 18:33 ` [PATCH 4/5] mm: remove redundant check in wb_min_max_ratio Kemeng Shi ` (2 subsequent siblings) 5 siblings, 0 replies; 15+ messages in thread From: Kemeng Shi @ 2024-01-23 18:33 UTC (permalink / raw) To: willy, akpm Cc: tj, hcochran, mszeredi, axboe, linux-fsdevel, linux-mm, linux-kernel Call __wb_calc_thresh directly to remove the unnecessary wrapping by wb_calc_thresh. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> --- mm/page-writeback.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/page-writeback.c b/mm/page-writeback.c index f6c7f3b0f495..5c19ebffe5be 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -2098,7 +2098,7 @@ bool wb_over_bg_thresh(struct bdi_writeback *wb) if (gdtc->dirty > gdtc->bg_thresh) return true; - thresh = wb_calc_thresh(gdtc->wb, gdtc->bg_thresh); + thresh = __wb_calc_thresh(gdtc, gdtc->bg_thresh); if (thresh < 2 * wb_stat_error()) reclaimable = wb_stat_sum(wb, WB_RECLAIMABLE); else -- 2.30.0 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 4/5] mm: remove redundant check in wb_min_max_ratio 2024-01-23 18:33 [PATCH 0/5] Fix and cleanups to page-writeback Kemeng Shi ` (2 preceding siblings ...) 2024-01-23 18:33 ` [PATCH 3/5] mm: call __wb_calc_thresh instead of wb_calc_thresh " Kemeng Shi @ 2024-01-23 18:33 ` Kemeng Shi 2024-01-23 18:33 ` [PATCH 5/5] mm: remove stale comment __folio_mark_dirty Kemeng Shi 2024-01-23 20:46 ` [PATCH 0/5] Fix and cleanups to page-writeback Tejun Heo 5 siblings, 0 replies; 15+ messages in thread From: Kemeng Shi @ 2024-01-23 18:33 UTC (permalink / raw) To: willy, akpm Cc: tj, hcochran, mszeredi, axboe, linux-fsdevel, linux-mm, linux-kernel We initialize bdi->max_ratio with a valid value (100 * BDI_RATIO_SCALE) in bdi_init and we always set bdi->max_ratio with a valid value (< 100 * BDI_RATIO_SCALE) in __bdi_set_max_ratio. So the validation of max_ratio in wb_min_max_ratio is redundant. Just remove it. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> --- mm/page-writeback.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 5c19ebffe5be..f25393034c76 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -198,10 +198,8 @@ static void wb_min_max_ratio(struct bdi_writeback *wb, min *= this_bw; min = div64_ul(min, tot_bw); } - if (max < 100 * BDI_RATIO_SCALE) { - max *= this_bw; - max = div64_ul(max, tot_bw); - } + max *= this_bw; + max = div64_ul(max, tot_bw); } *minp = min; -- 2.30.0 ^ permalink raw reply related [flat|nested] 15+ messages in thread
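For context on what this function computes, here is a small standalone sketch of the scaling shown in the hunk above; the BDI_RATIO_SCALE value and the sample bandwidths are assumptions for illustration, not taken from the kernel. The bdi-wide min/max ratios are scaled down to this wb's share of the bdi's writeout bandwidth; before the patch, a max_ratio still sitting at its initial 100 * BDI_RATIO_SCALE would have been left unscaled, and the commit message argues that guard is redundant given how max_ratio is initialized and set.

#include <stdio.h>

#define BDI_RATIO_SCALE 10000UL		/* assumed value, for illustration only */

int main(void)
{
	/* a wb doing 1/4 of its bdi's writeout bandwidth (made-up numbers) */
	unsigned long this_bw = 25, tot_bw = 100;
	unsigned long min = 1 * BDI_RATIO_SCALE;	/* bdi->min_ratio of 1% */
	unsigned long max = 100 * BDI_RATIO_SCALE;	/* bdi->max_ratio default */

	/* scaling as done after the patch: always proportional to bandwidth */
	min = min * this_bw / tot_bw;
	max = max * this_bw / tot_bw;

	printf("per-wb min=%lu max=%lu (both now 1/4 of the bdi-wide ratios)\n",
	       min, max);
	return 0;
}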
* [PATCH 5/5] mm: remove stale comment __folio_mark_dirty 2024-01-23 18:33 [PATCH 0/5] Fix and cleanups to page-writeback Kemeng Shi ` (3 preceding siblings ...) 2024-01-23 18:33 ` [PATCH 4/5] mm: remove redundant check in wb_min_max_ratio Kemeng Shi @ 2024-01-23 18:33 ` Kemeng Shi 2024-01-23 14:07 ` Matthew Wilcox 2024-01-23 20:46 ` [PATCH 0/5] Fix and cleanups to page-writeback Tejun Heo 5 siblings, 1 reply; 15+ messages in thread From: Kemeng Shi @ 2024-01-23 18:33 UTC (permalink / raw) To: willy, akpm Cc: tj, hcochran, mszeredi, axboe, linux-fsdevel, linux-mm, linux-kernel __folio_mark_dirty no longer marks the inode dirty. Remove the stale comment about it. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> --- mm/page-writeback.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/mm/page-writeback.c b/mm/page-writeback.c index f25393034c76..03e2797bc983 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -2647,8 +2647,7 @@ void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb) } /* - * Mark the folio dirty, and set it dirty in the page cache, and mark - * the inode dirty. + * Mark the folio dirty, and set it dirty in the page cache. * * If warn is true, then emit a warning if the folio is not uptodate and has * not been truncated. -- 2.30.0 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 5/5] mm: remove stale comment __folio_mark_dirty 2024-01-23 18:33 ` [PATCH 5/5] mm: remove stale comment __folio_mark_dirty Kemeng Shi @ 2024-01-23 14:07 ` Matthew Wilcox 0 siblings, 0 replies; 15+ messages in thread From: Matthew Wilcox @ 2024-01-23 14:07 UTC (permalink / raw) To: Kemeng Shi Cc: akpm, tj, hcochran, mszeredi, axboe, linux-fsdevel, linux-mm, linux-kernel On Wed, Jan 24, 2024 at 02:33:32AM +0800, Kemeng Shi wrote: > The __folio_mark_dirty will not mark inode dirty any longer. Remove the > stale comment of it. > > Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/5] Fix and cleanups to page-writeback 2024-01-23 18:33 [PATCH 0/5] Fix and cleanups to page-writeback Kemeng Shi ` (4 preceding siblings ...) 2024-01-23 18:33 ` [PATCH 5/5] mm: remove stale comment __folio_mark_dirty Kemeng Shi @ 2024-01-23 20:46 ` Tejun Heo 5 siblings, 0 replies; 15+ messages in thread From: Tejun Heo @ 2024-01-23 20:46 UTC (permalink / raw) To: Kemeng Shi Cc: willy, akpm, hcochran, mszeredi, axboe, linux-fsdevel, linux-mm, linux-kernel On Wed, Jan 24, 2024 at 02:33:27AM +0800, Kemeng Shi wrote: > This series contains some random cleanups and a fix to correct > calculation of cgroup wb's bg_thresh. More details can be found > respective patches. Thanks! > > Kemeng Shi (5): > mm: enable __wb_calc_thresh to calculate dirty background threshold > mm: correct calculation of cgroup wb's bg_thresh in wb_over_bg_thresh > mm: call __wb_calc_thresh instead of wb_calc_thresh in > wb_over_bg_thresh > mm: remove redundant check in wb_min_max_ratio > mm: remove stale comment __folio_mark_dirty I don't have the fifth patch in my inbox and I find the patchset difficult to review because it's changing subtle behaviors without sufficient justifications or concrete cases which motivated the changes. For now, I think it'd be prudent to not apply this patchset. Thanks. -- tejun ^ permalink raw reply [flat|nested] 15+ messages in thread