From: Mel Gorman
To: Andrew Morton
Cc: NeilBrown, Theodore Ts'o, Andreas Dilger, "Darrick J . Wong",
    Matthew Wilcox, Michal Hocko, Dave Chinner, Rik van Riel,
    Vlastimil Babka, Johannes Weiner, Jonathan Corbet, Linux-MM,
    Linux-fsdevel, LKML, Mel Gorman
Subject: [PATCH v5 0/8] Remove dependency on congestion_wait in mm/
Date: Fri, 22 Oct 2021 15:46:43 +0100
Message-Id: <20211022144651.19914-1-mgorman@techsingularity.net>

This series replaces the v4 version in mmotm as the changes caused
excessive conflicts.

This series is also available at

git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-reclaimcongest-v5r4

Changelog since v4
o Cosmetic changes						(neilb)
o Correct number of writeback throttled tasks			(neilb)
o Use wake_up							(neilb)

Changelog since v3
o Count writeback completions for NR_THROTTLED_WRITTEN only
o Use IRQ-safe inc_node_page_state
o Remove redundant throttling

This series removes all calls to congestion_wait in mm/ and deletes
wait_iff_congested. It's not a clever implementation but
congestion_wait has been broken for a long time
(https://lore.kernel.org/linux-mm/45d8b7a6-8548-65f5-cccf-9f451d4ae3d4@kernel.dk/).
Even if congestion throttling worked, it was never a great idea. While
excessive dirty/writeback pages at the tail of the LRU is one reason
that reclaim may be slow, there is also the problem of too many pages
being isolated and reclaim failing for other reasons (elevated
references, too many pages isolated, excessive LRU contention etc).

This series replaces the "congestion" throttling with 3 different types.

o If there are too many dirty/writeback pages, sleep until a timeout
  or enough pages get cleaned
o If too many pages are isolated, sleep until enough isolated pages
  are either reclaimed or put back on the LRU
o If no progress is being made, direct reclaim tasks sleep until
  another task makes progress with acceptable efficiency.

This was initially tested with a mix of workloads that used to trigger
corner cases but no longer do. A new test case was created called
"stutterp" (pagereclaim-stutterp-noreaders in mmtests) using a freshly
created XFS filesystem. Note that it may be necessary to increase the
timeout of ssh if executing remotely as ssh itself can get throttled
and the connection may time out.

stutterp varies the number of "worker" processes from 4 up to NR_CPUS*4
to check the impact as the number of direct reclaimers increases. It has
four types of worker.

o One "anon latency" worker creates small mappings with mmap() and
  times how long it takes to fault the mapping reading it 4K at a time
o X file writers which is fio randomly writing X files where the total
  size of the files adds up to the allowed dirty_ratio. fio is allowed
  to run for a warmup period to allow some file-backed pages to
  accumulate. The duration of the warmup is based on the best-case
  linear write speed of the storage.
o Y file readers which is fio randomly reading small files
o Z anon memory hogs which continually map (100-dirty_ratio)% of memory
o Total estimated WSS = (100+dirty_ratio) percentage of memory

X+Y+Z+1 == NR_WORKERS varying from 4 up to NR_CPUS*4

The intent is to maximise the total WSS with a mix of file and anon
memory where some anonymous memory must be swapped and there is a high
likelihood of dirty/writeback pages reaching the end of the LRU.

The test can be configured to have no background readers to stress
dirty/writeback pages. The results below are based on having zero
readers.

The short summary of the results is that the series works and stalls
until some event occurs but the timeouts may need adjustment.

The test results are not broken down by patch as the series should be
treated as one block that replaces a broken throttling mechanism with a
working one.

Finally, three machines were tested but I'm reporting the worst set of
results. The other two machines had much better latencies, for example.
First, the latency results of the "anon latency" worker

stutterp
                               5.15.0-rc1                 5.15.0-rc1
                                  vanilla     mm-reclaimcongest-v5r4
Amean     mmap-4         31.4003 (   0.00%)      2661.0198 (-8374.52%)
Amean     mmap-7         38.1641 (   0.00%)       149.2891 (-291.18%)
Amean     mmap-12        60.0981 (   0.00%)       187.8105 (-212.51%)
Amean     mmap-21       161.2699 (   0.00%)       213.9107 ( -32.64%)
Amean     mmap-30       174.5589 (   0.00%)       377.7548 (-116.41%)
Amean     mmap-48      8106.8160 (   0.00%)      1070.5616 (  86.79%)
Stddev    mmap-4         41.3455 (   0.00%)     27573.9676 (-66591.66%)
Stddev    mmap-7         53.5556 (   0.00%)      4608.5860 (-8505.23%)
Stddev    mmap-12       171.3897 (   0.00%)      5559.4542 (-3143.75%)
Stddev    mmap-21      1506.6752 (   0.00%)      5746.2507 (-281.39%)
Stddev    mmap-30       557.5806 (   0.00%)      7678.1624 (-1277.05%)
Stddev    mmap-48     61681.5718 (   0.00%)     14507.2830 (  76.48%)
Max-90    mmap-4         31.4243 (   0.00%)        83.1457 (-164.59%)
Max-90    mmap-7         41.0410 (   0.00%)        41.0720 (  -0.08%)
Max-90    mmap-12        66.5255 (   0.00%)        53.9073 (  18.97%)
Max-90    mmap-21       146.7479 (   0.00%)       105.9540 (  27.80%)
Max-90    mmap-30       193.9513 (   0.00%)        64.3067 (  66.84%)
Max-90    mmap-48       277.9137 (   0.00%)       591.0594 (-112.68%)
Max       mmap-4       1913.8009 (   0.00%)    299623.9695 (-15555.96%)
Max       mmap-7       2423.9665 (   0.00%)    204453.1708 (-8334.65%)
Max       mmap-12      6845.6573 (   0.00%)    221090.3366 (-3129.64%)
Max       mmap-21     56278.6508 (   0.00%)    213877.3496 (-280.03%)
Max       mmap-30     19716.2990 (   0.00%)    216287.6229 (-997.00%)
Max       mmap-48    477923.9400 (   0.00%)    245414.8238 (  48.65%)

For most thread counts, the time to mmap() is unfortunately increased.
In earlier versions of the series, this was lower but a large number of
throttling events were reaching their timeout, increasing the amount of
inefficient scanning of the LRU. There is no prioritisation of reclaim
tasks making progress based on each task's rate of page allocation
versus progress of reclaim. The variance is also impacted for high
worker counts but in all cases, the differences in latency are not
statistically significant due to very large maximum outliers.
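The bracketed percentages in the table appear to be the gain relative to
the vanilla kernel, so a negative value means the patched kernel was
slower. The sketch below re-derives a few of the Amean entries under
that assumption. The helper name pct_gain is made up for illustration,
and small differences in the last digit are expected because the table
values are already rounded.

```python
def pct_gain(vanilla, patched):
    """Gain relative to vanilla in percent; negative means latency got worse.

    Assumed formula, inferred from the table rather than stated in the text.
    """
    return (vanilla - patched) / vanilla * 100.0


# (vanilla, patched) Amean latencies copied from the table above.
amean = {
    "mmap-4": (31.4003, 2661.0198),
    "mmap-7": (38.1641, 149.2891),
    "mmap-21": (161.2699, 213.9107),
    "mmap-48": (8106.8160, 1070.5616),
}

for name, (vanilla, patched) in amean.items():
    print(f"{name:8s} {pct_gain(vanilla, patched):10.2f}%")
```

Only mmap-48 comes out positive among these rows, which matches the
observation that latency increased for most thread counts.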
Max-90 shows that 90% of the stalls are comparable but the Max results
show the massive outliers which are increased due to stalling.

It is expected that this will be very machine dependent. Due to the
test design, reclaim is difficult so allocations stall and there are
variances depending on whether THPs can be allocated or not. The amount
of memory will affect exactly how bad the corner cases are and how often
they trigger. The warmup period calculation is not ideal as it's based
on linear writes whereas fio is randomly writing multiple files from
multiple tasks so the start state of the test is variable. For example,
these are the latencies on a single-socket machine that had more memory

Amean     mmap-4         42.2287 (   0.00%)        49.6838 * -17.65%*
Amean     mmap-7        216.4326 (   0.00%)        47.4451 *  78.08%*
Amean     mmap-12      2412.0588 (   0.00%)        51.7497 (  97.85%)
Amean     mmap-21      5546.2548 (   0.00%)        51.8862 (  99.06%)
Amean     mmap-30      1085.3121 (   0.00%)        72.1004 (  93.36%)

The overall system CPU usage and elapsed time is as follows

                    5.15.0-rc3             5.15.0-rc3
                       vanilla mm-reclaimcongest-v5r4
Duration User          6989.03                 983.42
Duration System        7308.12                 799.68
Duration Elapsed       2277.67                2092.98

The patches reduce system CPU usage by 89% as the vanilla kernel is
rarely stalling.

The high-level /proc/vmstat stats show

                                     5.15.0-rc1             5.15.0-rc1
                                        vanilla mm-reclaimcongest-v5r2
Ops Direct pages scanned          1056608451.00           503594991.00
Ops Kswapd pages scanned           109795048.00           147289810.00
Ops Kswapd pages reclaimed          63269243.00            31036005.00
Ops Direct pages reclaimed          10803973.00             6328887.00
Ops Kswapd efficiency %                   57.62                  21.07
Ops Kswapd velocity                    48204.98               57572.86
Ops Direct efficiency %                    1.02                   1.26
Ops Direct velocity                   463898.83              196845.97

Kswapd scanned fewer pages but the detailed pattern is different. The
vanilla kernel scans slowly over time whereas the patches exhibit
burst patterns of scan activity. Direct reclaim scanning is reduced by
52% due to stalling.

The pattern for stealing pages is also slightly different.
Both kernels exhibit spikes but the vanilla kernel when reclaiming shows
pages being reclaimed over a period of time whereas the patches tend to
reclaim in spikes. The difference is that vanilla is not throttling and
is instead scanning constantly, finding some pages over time, whereas
the patched kernel throttles and reclaims in spikes.

Ops Percentage direct scans               90.59                  77.37

With the vanilla kernel, 90.59% of the pages scanned were scanned by
direct reclaim whereas with the patches, 77.37% were, due to throttling.

Ops Page writes by reclaim           2613590.00             1687131.00

Page writes from reclaim context are reduced.

Ops Page writes anon                 2932752.00             1917048.00

And there is less swapping.

Ops Page reclaim immediate         996248528.00           107664764.00

The number of pages encountered at the tail of the LRU tagged for
immediate reclaim but still dirty/writeback is reduced by 89%.

Ops Slabs scanned                     164284.00              153608.00

Slab scan activity is similar.

ftrace was used to gather stall activity

Vanilla
-------
      1 writeback_wait_iff_congested: usec_timeout=100000 usec_delayed=16000
      2 writeback_wait_iff_congested: usec_timeout=100000 usec_delayed=12000
      8 writeback_wait_iff_congested: usec_timeout=100000 usec_delayed=8000
     29 writeback_wait_iff_congested: usec_timeout=100000 usec_delayed=4000
  82394 writeback_wait_iff_congested: usec_timeout=100000 usec_delayed=0

The vast majority of wait_iff_congested calls do not stall at all. What
is likely happening is that cond_resched() reschedules the task for a
short period when the BDI is not registering congestion (which it never
will in this test setup).
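The percentages quoted for the vmstat comparison and the
wait_iff_congested histogram can be re-derived from the raw figures.
This is only a sanity-check sketch: the helper names are made up for
illustration, and the "Percentage direct scans" formula (direct scans
over direct plus kswapd scans) is an assumption that happens to
reproduce the reported values.

```python
def reduction_pct(before, after):
    """How much smaller 'after' is than 'before', in percent."""
    return (before - after) / before * 100.0


def share_pct(part, total):
    """Part as a percentage of total."""
    return part / total * 100.0


# Figures copied from the tables above (vanilla first, patched second).
system_cpu = reduction_pct(7308.12, 799.68)
direct_scanned = reduction_pct(1056608451, 503594991)
reclaim_immediate = reduction_pct(996248528, 107664764)

# Assumed definition: direct / (direct + kswapd) pages scanned.
direct_share_vanilla = share_pct(1056608451, 1056608451 + 109795048)
direct_share_patched = share_pct(503594991, 503594991 + 147289810)

# wait_iff_congested histogram: usec_delayed -> number of events.
wait_iff_congested = {16000: 1, 12000: 2, 8000: 8, 4000: 29, 0: 82394}
no_stall = share_pct(wait_iff_congested[0], sum(wait_iff_congested.values()))

print(f"system CPU reduction        {system_cpu:6.2f}%")
print(f"direct scan reduction       {direct_scanned:6.2f}%")
print(f"immediate reclaim reduction {reclaim_immediate:6.2f}%")
print(f"direct scan share           {direct_share_vanilla:6.2f}% vs {direct_share_patched:6.2f}%")
print(f"wait_iff_congested no stall {no_stall:6.2f}%")
```

The results agree with the 89%, 52% and 89% reductions quoted in the
text, and nearly all wait_iff_congested calls report a delay of zero.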
      1 writeback_congestion_wait: usec_timeout=100000 usec_delayed=120000
      2 writeback_congestion_wait: usec_timeout=100000 usec_delayed=132000
      4 writeback_congestion_wait: usec_timeout=100000 usec_delayed=112000
    380 writeback_congestion_wait: usec_timeout=100000 usec_delayed=108000
    778 writeback_congestion_wait: usec_timeout=100000 usec_delayed=104000

congestion_wait, if called, always exceeds the timeout as there is no
trigger to wake it up.

Bottom line: Vanilla will throttle but it's not effective.

Patch series
------------

Kswapd throttle activity was always due to scanning pages tagged for
immediate reclaim at the tail of the LRU

      1 usec_timeout=100000 usect_delayed=72000 reason=VMSCAN_THROTTLE_WRITEBACK
      4 usec_timeout=100000 usect_delayed=20000 reason=VMSCAN_THROTTLE_WRITEBACK
      5 usec_timeout=100000 usect_delayed=12000 reason=VMSCAN_THROTTLE_WRITEBACK
      6 usec_timeout=100000 usect_delayed=16000 reason=VMSCAN_THROTTLE_WRITEBACK
     11 usec_timeout=100000 usect_delayed=100000 reason=VMSCAN_THROTTLE_WRITEBACK
     11 usec_timeout=100000 usect_delayed=8000 reason=VMSCAN_THROTTLE_WRITEBACK
     94 usec_timeout=100000 usect_delayed=0 reason=VMSCAN_THROTTLE_WRITEBACK
    112 usec_timeout=100000 usect_delayed=4000 reason=VMSCAN_THROTTLE_WRITEBACK

The majority of events did not stall or stalled for a short period.
Roughly 16% of stalls reached the timeout before expiry. For direct
reclaim, the number of times stalled for each reason was

   6624 reason=VMSCAN_THROTTLE_ISOLATED
  93246 reason=VMSCAN_THROTTLE_NOPROGRESS
  96934 reason=VMSCAN_THROTTLE_WRITEBACK

The most common reason to stall was excessive pages tagged for
immediate reclaim at the tail of the LRU, followed by a failure to make
forward progress.
A relatively small number were due to too many pages isolated from the
LRU by parallel threads.

For VMSCAN_THROTTLE_ISOLATED, the breakdown of delays was

      9 usec_timeout=20000 usect_delayed=4000 reason=VMSCAN_THROTTLE_ISOLATED
     12 usec_timeout=20000 usect_delayed=16000 reason=VMSCAN_THROTTLE_ISOLATED
     83 usec_timeout=20000 usect_delayed=20000 reason=VMSCAN_THROTTLE_ISOLATED
   6520 usec_timeout=20000 usect_delayed=0 reason=VMSCAN_THROTTLE_ISOLATED

Most did not stall at all. A small number reached the timeout.

For VMSCAN_THROTTLE_NOPROGRESS, the breakdown of stalls was all over
the map

      1 usec_timeout=500000 usect_delayed=324000 reason=VMSCAN_THROTTLE_NOPROGRESS
      1 usec_timeout=500000 usect_delayed=332000 reason=VMSCAN_THROTTLE_NOPROGRESS
      1 usec_timeout=500000 usect_delayed=348000 reason=VMSCAN_THROTTLE_NOPROGRESS
      1 usec_timeout=500000 usect_delayed=360000 reason=VMSCAN_THROTTLE_NOPROGRESS
      2 usec_timeout=500000 usect_delayed=228000 reason=VMSCAN_THROTTLE_NOPROGRESS
      2 usec_timeout=500000 usect_delayed=260000 reason=VMSCAN_THROTTLE_NOPROGRESS
      2 usec_timeout=500000 usect_delayed=340000 reason=VMSCAN_THROTTLE_NOPROGRESS
      2 usec_timeout=500000 usect_delayed=364000 reason=VMSCAN_THROTTLE_NOPROGRESS
      2 usec_timeout=500000 usect_delayed=372000 reason=VMSCAN_THROTTLE_NOPROGRESS
      2 usec_timeout=500000 usect_delayed=428000 reason=VMSCAN_THROTTLE_NOPROGRESS
      2 usec_timeout=500000 usect_delayed=460000 reason=VMSCAN_THROTTLE_NOPROGRESS
      2 usec_timeout=500000 usect_delayed=464000 reason=VMSCAN_THROTTLE_NOPROGRESS
      3 usec_timeout=500000 usect_delayed=244000 reason=VMSCAN_THROTTLE_NOPROGRESS
      3 usec_timeout=500000 usect_delayed=252000 reason=VMSCAN_THROTTLE_NOPROGRESS
      3 usec_timeout=500000 usect_delayed=272000 reason=VMSCAN_THROTTLE_NOPROGRESS
      4 usec_timeout=500000 usect_delayed=188000 reason=VMSCAN_THROTTLE_NOPROGRESS
      4 usec_timeout=500000 usect_delayed=268000 reason=VMSCAN_THROTTLE_NOPROGRESS
      4 usec_timeout=500000 usect_delayed=328000 reason=VMSCAN_THROTTLE_NOPROGRESS
      4 usec_timeout=500000 usect_delayed=380000 reason=VMSCAN_THROTTLE_NOPROGRESS
      4 usec_timeout=500000 usect_delayed=392000 reason=VMSCAN_THROTTLE_NOPROGRESS
      4 usec_timeout=500000 usect_delayed=432000 reason=VMSCAN_THROTTLE_NOPROGRESS
      5 usec_timeout=500000 usect_delayed=204000 reason=VMSCAN_THROTTLE_NOPROGRESS
      5 usec_timeout=500000 usect_delayed=220000 reason=VMSCAN_THROTTLE_NOPROGRESS
      5 usec_timeout=500000 usect_delayed=412000 reason=VMSCAN_THROTTLE_NOPROGRESS
      5 usec_timeout=500000 usect_delayed=436000 reason=VMSCAN_THROTTLE_NOPROGRESS
      6 usec_timeout=500000 usect_delayed=488000 reason=VMSCAN_THROTTLE_NOPROGRESS
      7 usec_timeout=500000 usect_delayed=212000 reason=VMSCAN_THROTTLE_NOPROGRESS
      7 usec_timeout=500000 usect_delayed=300000 reason=VMSCAN_THROTTLE_NOPROGRESS
      7 usec_timeout=500000 usect_delayed=316000 reason=VMSCAN_THROTTLE_NOPROGRESS
      7 usec_timeout=500000 usect_delayed=472000 reason=VMSCAN_THROTTLE_NOPROGRESS
      8 usec_timeout=500000 usect_delayed=248000 reason=VMSCAN_THROTTLE_NOPROGRESS
      8 usec_timeout=500000 usect_delayed=356000 reason=VMSCAN_THROTTLE_NOPROGRESS
      8 usec_timeout=500000 usect_delayed=456000 reason=VMSCAN_THROTTLE_NOPROGRESS
      9 usec_timeout=500000 usect_delayed=124000 reason=VMSCAN_THROTTLE_NOPROGRESS
      9 usec_timeout=500000 usect_delayed=376000 reason=VMSCAN_THROTTLE_NOPROGRESS
      9 usec_timeout=500000 usect_delayed=484000 reason=VMSCAN_THROTTLE_NOPROGRESS
     10 usec_timeout=500000 usect_delayed=172000 reason=VMSCAN_THROTTLE_NOPROGRESS
     10 usec_timeout=500000 usect_delayed=420000 reason=VMSCAN_THROTTLE_NOPROGRESS
     10 usec_timeout=500000 usect_delayed=452000 reason=VMSCAN_THROTTLE_NOPROGRESS
     11 usec_timeout=500000 usect_delayed=256000 reason=VMSCAN_THROTTLE_NOPROGRESS
     12 usec_timeout=500000 usect_delayed=112000 reason=VMSCAN_THROTTLE_NOPROGRESS
     12 usec_timeout=500000 usect_delayed=116000 reason=VMSCAN_THROTTLE_NOPROGRESS
     12 usec_timeout=500000 usect_delayed=144000 reason=VMSCAN_THROTTLE_NOPROGRESS
     12 usec_timeout=500000 usect_delayed=152000 reason=VMSCAN_THROTTLE_NOPROGRESS
     12 usec_timeout=500000 usect_delayed=264000 reason=VMSCAN_THROTTLE_NOPROGRESS
     12 usec_timeout=500000 usect_delayed=384000 reason=VMSCAN_THROTTLE_NOPROGRESS
     12 usec_timeout=500000 usect_delayed=424000 reason=VMSCAN_THROTTLE_NOPROGRESS
     12 usec_timeout=500000 usect_delayed=492000 reason=VMSCAN_THROTTLE_NOPROGRESS
     13 usec_timeout=500000 usect_delayed=184000 reason=VMSCAN_THROTTLE_NOPROGRESS
     13 usec_timeout=500000 usect_delayed=444000 reason=VMSCAN_THROTTLE_NOPROGRESS
     14 usec_timeout=500000 usect_delayed=308000 reason=VMSCAN_THROTTLE_NOPROGRESS
     14 usec_timeout=500000 usect_delayed=440000 reason=VMSCAN_THROTTLE_NOPROGRESS
     14 usec_timeout=500000 usect_delayed=476000 reason=VMSCAN_THROTTLE_NOPROGRESS
     16 usec_timeout=500000 usect_delayed=140000 reason=VMSCAN_THROTTLE_NOPROGRESS
     17 usec_timeout=500000 usect_delayed=232000 reason=VMSCAN_THROTTLE_NOPROGRESS
     17 usec_timeout=500000 usect_delayed=240000 reason=VMSCAN_THROTTLE_NOPROGRESS
     17 usec_timeout=500000 usect_delayed=280000 reason=VMSCAN_THROTTLE_NOPROGRESS
     18 usec_timeout=500000 usect_delayed=404000 reason=VMSCAN_THROTTLE_NOPROGRESS
     20 usec_timeout=500000 usect_delayed=148000 reason=VMSCAN_THROTTLE_NOPROGRESS
     20 usec_timeout=500000 usect_delayed=216000 reason=VMSCAN_THROTTLE_NOPROGRESS
     20 usec_timeout=500000 usect_delayed=468000 reason=VMSCAN_THROTTLE_NOPROGRESS
     21 usec_timeout=500000 usect_delayed=448000 reason=VMSCAN_THROTTLE_NOPROGRESS
     23 usec_timeout=500000 usect_delayed=168000 reason=VMSCAN_THROTTLE_NOPROGRESS
     23 usec_timeout=500000 usect_delayed=296000 reason=VMSCAN_THROTTLE_NOPROGRESS
     25 usec_timeout=500000 usect_delayed=132000 reason=VMSCAN_THROTTLE_NOPROGRESS
     25 usec_timeout=500000 usect_delayed=352000 reason=VMSCAN_THROTTLE_NOPROGRESS
     26 usec_timeout=500000 usect_delayed=180000 reason=VMSCAN_THROTTLE_NOPROGRESS
     27 usec_timeout=500000 usect_delayed=284000 reason=VMSCAN_THROTTLE_NOPROGRESS
     28 usec_timeout=500000 usect_delayed=164000 reason=VMSCAN_THROTTLE_NOPROGRESS
     29 usec_timeout=500000 usect_delayed=136000 reason=VMSCAN_THROTTLE_NOPROGRESS
     30 usec_timeout=500000 usect_delayed=200000 reason=VMSCAN_THROTTLE_NOPROGRESS
     30 usec_timeout=500000 usect_delayed=400000 reason=VMSCAN_THROTTLE_NOPROGRESS
     31 usec_timeout=500000 usect_delayed=196000 reason=VMSCAN_THROTTLE_NOPROGRESS
     32 usec_timeout=500000 usect_delayed=156000 reason=VMSCAN_THROTTLE_NOPROGRESS
     33 usec_timeout=500000 usect_delayed=224000 reason=VMSCAN_THROTTLE_NOPROGRESS
     35 usec_timeout=500000 usect_delayed=128000 reason=VMSCAN_THROTTLE_NOPROGRESS
     35 usec_timeout=500000 usect_delayed=176000 reason=VMSCAN_THROTTLE_NOPROGRESS
     36 usec_timeout=500000 usect_delayed=368000 reason=VMSCAN_THROTTLE_NOPROGRESS
     36 usec_timeout=500000 usect_delayed=496000 reason=VMSCAN_THROTTLE_NOPROGRESS
     37 usec_timeout=500000 usect_delayed=312000 reason=VMSCAN_THROTTLE_NOPROGRESS
     38 usec_timeout=500000 usect_delayed=304000 reason=VMSCAN_THROTTLE_NOPROGRESS
     40 usec_timeout=500000 usect_delayed=288000 reason=VMSCAN_THROTTLE_NOPROGRESS
     43 usec_timeout=500000 usect_delayed=408000 reason=VMSCAN_THROTTLE_NOPROGRESS
     55 usec_timeout=500000 usect_delayed=416000 reason=VMSCAN_THROTTLE_NOPROGRESS
     56 usec_timeout=500000 usect_delayed=76000 reason=VMSCAN_THROTTLE_NOPROGRESS
     58 usec_timeout=500000 usect_delayed=120000 reason=VMSCAN_THROTTLE_NOPROGRESS
     59 usec_timeout=500000 usect_delayed=208000 reason=VMSCAN_THROTTLE_NOPROGRESS
     61 usec_timeout=500000 usect_delayed=68000 reason=VMSCAN_THROTTLE_NOPROGRESS
     71 usec_timeout=500000 usect_delayed=192000 reason=VMSCAN_THROTTLE_NOPROGRESS
     71 usec_timeout=500000 usect_delayed=480000 reason=VMSCAN_THROTTLE_NOPROGRESS
     79 usec_timeout=500000 usect_delayed=60000 reason=VMSCAN_THROTTLE_NOPROGRESS
     82 usec_timeout=500000 usect_delayed=320000 reason=VMSCAN_THROTTLE_NOPROGRESS
     82 usec_timeout=500000 usect_delayed=92000 reason=VMSCAN_THROTTLE_NOPROGRESS
     85 usec_timeout=500000 usect_delayed=64000 reason=VMSCAN_THROTTLE_NOPROGRESS
     85 usec_timeout=500000 usect_delayed=80000 reason=VMSCAN_THROTTLE_NOPROGRESS
     88 usec_timeout=500000 usect_delayed=84000 reason=VMSCAN_THROTTLE_NOPROGRESS
     90 usec_timeout=500000 usect_delayed=160000 reason=VMSCAN_THROTTLE_NOPROGRESS
     90 usec_timeout=500000 usect_delayed=292000 reason=VMSCAN_THROTTLE_NOPROGRESS
     94 usec_timeout=500000 usect_delayed=56000 reason=VMSCAN_THROTTLE_NOPROGRESS
    118 usec_timeout=500000 usect_delayed=88000 reason=VMSCAN_THROTTLE_NOPROGRESS
    119 usec_timeout=500000 usect_delayed=72000 reason=VMSCAN_THROTTLE_NOPROGRESS
    126 usec_timeout=500000 usect_delayed=108000 reason=VMSCAN_THROTTLE_NOPROGRESS
    146 usec_timeout=500000 usect_delayed=52000 reason=VMSCAN_THROTTLE_NOPROGRESS
    148 usec_timeout=500000 usect_delayed=36000 reason=VMSCAN_THROTTLE_NOPROGRESS
    148 usec_timeout=500000 usect_delayed=48000 reason=VMSCAN_THROTTLE_NOPROGRESS
    159 usec_timeout=500000 usect_delayed=28000 reason=VMSCAN_THROTTLE_NOPROGRESS
    178 usec_timeout=500000 usect_delayed=44000 reason=VMSCAN_THROTTLE_NOPROGRESS
    183 usec_timeout=500000 usect_delayed=40000 reason=VMSCAN_THROTTLE_NOPROGRESS
    237 usec_timeout=500000 usect_delayed=100000 reason=VMSCAN_THROTTLE_NOPROGRESS
    266 usec_timeout=500000 usect_delayed=32000 reason=VMSCAN_THROTTLE_NOPROGRESS
    313 usec_timeout=500000 usect_delayed=24000 reason=VMSCAN_THROTTLE_NOPROGRESS
    347 usec_timeout=500000 usect_delayed=96000 reason=VMSCAN_THROTTLE_NOPROGRESS
    470 usec_timeout=500000 usect_delayed=20000 reason=VMSCAN_THROTTLE_NOPROGRESS
    559 usec_timeout=500000 usect_delayed=16000 reason=VMSCAN_THROTTLE_NOPROGRESS
    964 usec_timeout=500000 usect_delayed=12000 reason=VMSCAN_THROTTLE_NOPROGRESS
   2001 usec_timeout=500000 usect_delayed=104000 reason=VMSCAN_THROTTLE_NOPROGRESS
   2447 usec_timeout=500000 usect_delayed=8000 reason=VMSCAN_THROTTLE_NOPROGRESS
   7888 usec_timeout=500000 usect_delayed=4000 reason=VMSCAN_THROTTLE_NOPROGRESS
  22727 usec_timeout=500000 usect_delayed=0 reason=VMSCAN_THROTTLE_NOPROGRESS
  51305 usec_timeout=500000 usect_delayed=500000 reason=VMSCAN_THROTTLE_NOPROGRESS

The full timeout is often hit but a large number also do not stall at
all. The remainder slept a little, allowing other reclaim tasks to make
progress.

While this timeout could be further increased, it could also negatively
impact worst-case behaviour when there is no prioritisation of what
task should make progress.
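To put numbers on "often" and "a large number", the two extremes of each
histogram can be expressed as a share of the per-reason totals reported
for direct reclaim. This is a small sketch using only counts quoted in
this mail. The function name stall_shares and the dictionary layout are
illustrative, and the "partial" share assumes the histogram rows add up
to the per-reason total.

```python
# Total direct reclaim throttle events per reason, as reported above.
reason_totals = {"ISOLATED": 6624, "NOPROGRESS": 93246}

# Per reason: (events with usect_delayed=0, events that slept the full timeout).
extremes = {
    "ISOLATED": (6520, 83),        # usec_timeout=20000
    "NOPROGRESS": (22727, 51305),  # usec_timeout=500000
}


def stall_shares(reason):
    """Return percentages of events that never slept, slept partially, or timed out."""
    total = reason_totals[reason]
    no_stall, timed_out = extremes[reason]
    partial = total - no_stall - timed_out
    return {
        "no_stall": 100.0 * no_stall / total,
        "partial": 100.0 * partial / total,
        "timed_out": 100.0 * timed_out / total,
    }


for reason in reason_totals:
    shares = stall_shares(reason)
    print(reason, {key: round(value, 1) for key, value in shares.items()})
```

On these figures, a little over half of the NOPROGRESS events sleep for
the full 500000 usec while about a quarter do not sleep at all, whereas
almost all ISOLATED events return immediately.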
For VMSCAN_THROTTLE_WRITEBACK, the breakdown was

      1 usec_timeout=100000 usect_delayed=44000 reason=VMSCAN_THROTTLE_WRITEBACK
      2 usec_timeout=100000 usect_delayed=76000 reason=VMSCAN_THROTTLE_WRITEBACK
      3 usec_timeout=100000 usect_delayed=80000 reason=VMSCAN_THROTTLE_WRITEBACK
      5 usec_timeout=100000 usect_delayed=48000 reason=VMSCAN_THROTTLE_WRITEBACK
      5 usec_timeout=100000 usect_delayed=84000 reason=VMSCAN_THROTTLE_WRITEBACK
      6 usec_timeout=100000 usect_delayed=72000 reason=VMSCAN_THROTTLE_WRITEBACK
      7 usec_timeout=100000 usect_delayed=88000 reason=VMSCAN_THROTTLE_WRITEBACK
     11 usec_timeout=100000 usect_delayed=56000 reason=VMSCAN_THROTTLE_WRITEBACK
     12 usec_timeout=100000 usect_delayed=64000 reason=VMSCAN_THROTTLE_WRITEBACK
     16 usec_timeout=100000 usect_delayed=92000 reason=VMSCAN_THROTTLE_WRITEBACK
     24 usec_timeout=100000 usect_delayed=68000 reason=VMSCAN_THROTTLE_WRITEBACK
     28 usec_timeout=100000 usect_delayed=32000 reason=VMSCAN_THROTTLE_WRITEBACK
     30 usec_timeout=100000 usect_delayed=60000 reason=VMSCAN_THROTTLE_WRITEBACK
     30 usec_timeout=100000 usect_delayed=96000 reason=VMSCAN_THROTTLE_WRITEBACK
     32 usec_timeout=100000 usect_delayed=52000 reason=VMSCAN_THROTTLE_WRITEBACK
     42 usec_timeout=100000 usect_delayed=40000 reason=VMSCAN_THROTTLE_WRITEBACK
     77 usec_timeout=100000 usect_delayed=28000 reason=VMSCAN_THROTTLE_WRITEBACK
     99 usec_timeout=100000 usect_delayed=36000 reason=VMSCAN_THROTTLE_WRITEBACK
    137 usec_timeout=100000 usect_delayed=24000 reason=VMSCAN_THROTTLE_WRITEBACK
    190 usec_timeout=100000 usect_delayed=20000 reason=VMSCAN_THROTTLE_WRITEBACK
    339 usec_timeout=100000 usect_delayed=16000 reason=VMSCAN_THROTTLE_WRITEBACK
    518 usec_timeout=100000 usect_delayed=12000 reason=VMSCAN_THROTTLE_WRITEBACK
    852 usec_timeout=100000 usect_delayed=8000 reason=VMSCAN_THROTTLE_WRITEBACK
   3359 usec_timeout=100000 usect_delayed=4000 reason=VMSCAN_THROTTLE_WRITEBACK
   7147 usec_timeout=100000 usect_delayed=0 reason=VMSCAN_THROTTLE_WRITEBACK
  83962 usec_timeout=100000 usect_delayed=100000 reason=VMSCAN_THROTTLE_WRITEBACK

The majority hit the timeout in direct reclaim context although a
sizable number did not stall at all. This is very different to kswapd
where only a tiny percentage of stalls due to writeback reached the
timeout.

Bottom line, the throttling appears to work and the wakeup events may
limit worst case stalls. There might be some grounds for adjusting
timeouts but it's likely futile as the worst-case scenarios depend on
the workload, memory size and the speed of the storage. A better
approach to improve the series further would be to prioritise tasks
based on their rate of allocation with the caveat that it may be very
expensive to track.

 include/linux/backing-dev.h      |   1 -
 include/linux/mmzone.h           |  15 +++
 include/trace/events/vmscan.h    |  38 ++++++++
 include/trace/events/writeback.h |   7 --
 mm/backing-dev.c                 |  48 ----------
 mm/compaction.c                  |  10 +-
 mm/filemap.c                     |   1 +
 mm/internal.h                    |  21 +++++
 mm/memcontrol.c                  |  10 +-
 mm/page-writeback.c              |  11 ++-
 mm/page_alloc.c                  |  26 ++----
 mm/vmscan.c                      | 151 ++++++++++++++++++++++++++++---
 mm/vmstat.c                      |   1 +
 13 files changed, 237 insertions(+), 103 deletions(-)

--
2.31.1