From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 13561C48260 for ; Fri, 9 Feb 2024 02:31:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2497E6B0071; Thu, 8 Feb 2024 21:31:41 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 1F9A26B0072; Thu, 8 Feb 2024 21:31:41 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0C14A6B0074; Thu, 8 Feb 2024 21:31:41 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id F2A6A6B0071 for ; Thu, 8 Feb 2024 21:31:40 -0500 (EST) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 72D2EA2618 for ; Fri, 9 Feb 2024 02:31:40 +0000 (UTC) X-FDA: 81770689560.10.2E59121 Received: from mail-wm1-f43.google.com (mail-wm1-f43.google.com [209.85.128.43]) by imf22.hostedemail.com (Postfix) with ESMTP id 80388C0011 for ; Fri, 9 Feb 2024 02:31:38 +0000 (UTC) Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=chrisdown.name header.s=google header.b=cSPhc+7B; spf=pass (imf22.hostedemail.com: domain of chris@chrisdown.name designates 209.85.128.43 as permitted sender) smtp.mailfrom=chris@chrisdown.name; dmarc=pass (policy=quarantine) header.from=chrisdown.name ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1707445898; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding:in-reply-to: references:dkim-signature; bh=1MlJk4a+ACUd7zc2tyVc4R+Cusd9XX87Ftg32HapFTY=; b=gvZOhhQSVyo1hK8dUcw+cRnWflKRPcZdP027OsyfAmSLK8toVYPhRZNm7iL+/rgXwKRhQN aKi9wToPPQySUFhrUYpoBqE1P0SrcS/G0LhJNe8XZrpWh8tEDDd28IQWjXPf/HldmB1Fy+ b3+UHPlcsuF2jdP3NqRRrxdCJRFcjDo= ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=pass header.d=chrisdown.name header.s=google header.b=cSPhc+7B; spf=pass (imf22.hostedemail.com: domain of chris@chrisdown.name designates 209.85.128.43 as permitted sender) smtp.mailfrom=chris@chrisdown.name; dmarc=pass (policy=quarantine) header.from=chrisdown.name ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1707445898; a=rsa-sha256; cv=none; b=vIb3gkc7pw0U83aAKCEwQEUq7xuqSMGBlaKnVkJKYW5C2CF6TEDwtxig56fRxkxkDLBV8n hoQSXtbf60+5sO+BpW0Rtcl8EEEpxDLYa/w2j6drDjSEYYvnPk11hjPASaXJvhpOcAqvaq 5b7M1GVPx8D4O9jicTcfQmzb/gMdaV4= Received: by mail-wm1-f43.google.com with SMTP id 5b1f17b1804b1-41061f068cdso2103665e9.0 for ; Thu, 08 Feb 2024 18:31:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chrisdown.name; s=google; t=1707445897; x=1708050697; darn=kvack.org; h=user-agent:content-disposition:mime-version:message-id:subject:cc :to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=1MlJk4a+ACUd7zc2tyVc4R+Cusd9XX87Ftg32HapFTY=; b=cSPhc+7BuqCf2u7XHjXs94JZcLdKegOFEOpayK16MhzjtCYHW+0+97lo+t9DQSK2Ck 1hEK+9ddPmcZbaWP9pmaAJjG70rovbidwUhbRegKJuQklCMLgMaR6X5l791hd/nmVUXT tNNlHYhT1Q8NzP70qAlfaemjanQ9h+MEwrZA0= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707445897; x=1708050697; h=user-agent:content-disposition:mime-version:message-id:subject:cc :to:from:date:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=1MlJk4a+ACUd7zc2tyVc4R+Cusd9XX87Ftg32HapFTY=; b=Qo/vcuDk0tqk9TSyTuEYvM8APw892NKqLQghhetaqXgo4M5qDQWUymHpMGJC61xtQQ Iaxttn3/97gQPa/dKXOE3rPRcZOBOwnn4hQfE5Xfo4MDy77qIq/odqmjLZRyDgaCH8Nr xOcg4vUGDV4hvMKEtD8jSRhs8wp/hYv+qgEMR3UF3eWF2Z4v7J1iqpmbr5HMd59NxgnO ryU3bm9cy13aEJoXtXbPCffj7ZrkRUxqV3H7q+P5C5io/8XyKGMs/zdjeSI4SuHHdZUT YsiWERwJuBxHkZJCFrF6HrbwgTK8LCeO8/GFctwU0mdqdpkNWmOTx2TojPJ5ub+iCElx vrdw== X-Gm-Message-State: AOJu0Yy2RZK2h6qTjJyFFmkF7TBM4GW0eu655BNt2/E/fR40q7j+4RB9 MUtVBspJgCZV2WAKjcirTHR1tfeU1UfkPN3jnSwk/FTJ5XsluxbGlfF2oDaxqze51M3ptuNAP0G 2 X-Google-Smtp-Source: AGHT+IFGD6CCwF3dUnWu7TcoZqtOrRS9SOm0ra5hH2MasIfjHAO3fyGCUZ0Iur4YNANfLYFf5Tx5Pg== X-Received: by 2002:a05:600c:4f13:b0:410:5876:63bf with SMTP id l19-20020a05600c4f1300b00410587663bfmr112973wmq.17.1707445896386; Thu, 08 Feb 2024 18:31:36 -0800 (PST) X-Forwarded-Encrypted: i=1; AJvYcCV2GGD974d9AiB3cRsdIOSmFYa/arZhxt8Xb/7r/BBmbbHjtsDFQXsQzAnJViS5YIOEtMkKH/ITq75WJk4sdENI5wAM6KO3Qd3FaNq9WVrEByG9U/7lWlO7Aw3+NZc5+I1x8vjewMR79e6o7Jqpcubyw4snGnCw6XEQoPuExFXLpGfkjw== Received: from localhost ([2a01:4b00:8432:8600:5ee4:2aff:fe50:f48d]) by smtp.gmail.com with ESMTPSA id g6-20020a05600c4ec600b0040ff583e17csm1044254wmq.9.2024.02.08.18.31.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Feb 2024 18:31:35 -0800 (PST) Date: Fri, 9 Feb 2024 02:31:35 +0000 From: Chris Down To: Yu Zhao Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@fb.com, Johannes Weiner Subject: MGLRU premature memcg OOM on slow writes Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline User-Agent: Mutt/2.2.12 (2023-09-09) X-Rspamd-Queue-Id: 80388C0011 X-Rspam-User: X-Stat-Signature: 6xgb8zbefkyhbfkp5gz7dwputdc5dyjc X-Rspamd-Server: rspam01 X-HE-Tag: 1707445898-312478 X-HE-Meta: U2FsdGVkX19bilSaKC8kziIUpFOd+MWWE6sZ4q7aigLn8TqDBcWpRSi3TbC8M2rkM8FCbIFG/BqfFpSsTP1A1oMWnzgZwXNQbi/e5ucCKXj/tfS2K8Fb5ueXgtQHNjO8VOLDP4i9mXquDZc+JxBSortXB3+HZrTQqvdQ6I40xU1hhkJ+ojXsDh/zMpU7PwVlIpnOQQM/Z7ELVNcpbiTMaf5etpuisnWX/LPh3OMv02TqOd3hT2mDOay7mP8q9SYsujUjPEOkr0jrQZNddOgGpfnSx4b/BJEYtYBLj6Knaqw/RI3phuv2wm9bZrQr+w/PBioRZKU+iW8gzBdivCGI85mWvXpU7JdyvSc0MtA6vYpNzPkKw3GEwEiB8yACtNL9i3VSlgHj85UZSmIxhZmAhP7ZiKx+95qC+kYmFttAZsWM/WXkx6+amJXczlNfM5SHo8/9yj6PrG5Df5L9duBi3CKOOn1/a/juAFHDLdQm3+g+aRF6BFk+6POyYR14hvapAOEtnW+H8rqbJFrx0AhHT+l61sQoTo45Gl445lQ9xEnYg1qsYCQyaALeg7tARwvn0Bg/y3n9lOEmxdYmIIYP3j1gNykRMbw7OZKVEf4hXr/fs6ryEAiY70r002hWUyecZrz1BljcaeZxt58nliajhIZiFH/oJaVPpGWGfO7+FEI+kZHDQsRhxueQsljFj5pAvNAMcRIiRNy552h5UDwyx15RbZdtJTeh1XOJSyt99DR8R4zEmbhorsb8WS9INgkMKTlRwl1L8zQguJIJDs4NYN0WPxIFgiPZ0JroUHpiZdr88kQ+K0cCOI0XgzR2igvAZYQGllXaGyaEnwvQDlgKghGyZjhLs4dbHG1uFMs+Yejt+HOoeyFsGEqYaoYje8wVu90ckh+1iZpHKFC8G0jvNPp9cRoEAGmsRFATo91C62KXhSHrKdGwu6eS/gEpCS/xrojqIU+xtsfalgvc+M+ g4FMUY/q WYFfUOGBZPEXEA9i/5BqKKMrPlK/gV0A2f0D6qCo46xHEbYl7oKMUdZ0s4/RtX71EeYHHomOD0w8uEFsF591k/iZC/G+U5kNK66MWkc0DNNQgExPKwEMpiHv+5tI1P3IEjszvcU/hGW3eIBEIwipRY9J990PoVjE/wk30XYQ6GUxWJ7yaUO44WZg7zUWvUJJm/KBQ5An9KHKDXzWRHcULdz0PcI1kJ5E18BGo8na1VgdJDVmn6EfbSi822r76QBA+9nTC81RNr1wLWd5CuojSpOMv7wep4C/anmCyw6SaaBJYM983B4UXgX+169Zt5bZWQY5kdoxFTzDYYF8DRqZBX7k3ipcqnCu2/30B8QjjnbsFp0aFVlY7J4/Dzk2iXFmv2PYYMC0Xv5uG49jh+X9UERnqzzS1j0Yc70S7VSCS+R7d6Pu8L4tFHGGmQ5waR1+g0TAYqsR+024DlglwZ+BiqccWG69eUr7UYdVjFK8LDd9gRxNAa8+TjRGwofQeYmijMlE2 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hi Yu, When running with MGLRU I'm encountering premature OOMs when transferring files to a slow disk. On non-MGLRU setups, writeback flushers are awakened and get to work. But on MGLRU, one can see OOM killer outputs like the following when doing an rsync with a memory.max of 32M: --- % systemd-run --user -t -p MemoryMax=32M -- rsync -rv ... /mnt/usb Running as unit: run-u640.service Press ^] three times within 1s to disconnect TTY. sending incremental file list ... rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(713) [generator=3.2.7] --- [41368.535735] Memory cgroup out of memory: Killed process 128824 (rsync) total-vm:14008kB, anon-rss:256kB, file-rss:5504kB, shmem-rss:0kB, UID:1000 pgtables:64kB oom_score_adj:200 [41369.847965] rsync invoked oom-killer: gfp_mask=0x408d40(GFP_NOFS|__GFP_NOFAIL|__GFP_ZERO|__GFP_ACCOUNT), order=0, oom_score_adj=200 [41369.847972] CPU: 1 PID: 128826 Comm: rsync Tainted: G S OE 6.7.4-arch1-1 #1 20d30c48b78a04be2046f4b305b40455f0b5b38b [41369.847975] Hardware name: LENOVO 20WNS23A0G/20WNS23A0G, BIOS N35ET53W (1.53 ) 03/22/2023 [41369.847977] Call Trace: [41369.847978] [41369.847980] dump_stack_lvl+0x47/0x60 [41369.847985] dump_header+0x45/0x1b0 [41369.847988] oom_kill_process+0xfa/0x200 [41369.847990] out_of_memory+0x244/0x590 [41369.847992] mem_cgroup_out_of_memory+0x134/0x150 [41369.847995] try_charge_memcg+0x76d/0x870 [41369.847998] ? try_charge_memcg+0xcd/0x870 [41369.848000] obj_cgroup_charge+0xb8/0x1b0 [41369.848002] kmem_cache_alloc+0xaa/0x310 [41369.848005] ? alloc_buffer_head+0x1e/0x80 [41369.848007] alloc_buffer_head+0x1e/0x80 [41369.848009] folio_alloc_buffers+0xab/0x180 [41369.848012] ? __pfx_fat_get_block+0x10/0x10 [fat 0a109de409393851f8a884f020fb5682aab8dcd1] [41369.848021] create_empty_buffers+0x1d/0xb0 [41369.848023] __block_write_begin_int+0x524/0x600 [41369.848026] ? __pfx_fat_get_block+0x10/0x10 [fat 0a109de409393851f8a884f020fb5682aab8dcd1] [41369.848031] ? __filemap_get_folio+0x168/0x2e0 [41369.848033] ? __pfx_fat_get_block+0x10/0x10 [fat 0a109de409393851f8a884f020fb5682aab8dcd1] [41369.848038] block_write_begin+0x52/0x120 [41369.848040] fat_write_begin+0x34/0x80 [fat 0a109de409393851f8a884f020fb5682aab8dcd1] [41369.848046] ? __pfx_fat_get_block+0x10/0x10 [fat 0a109de409393851f8a884f020fb5682aab8dcd1] [41369.848051] generic_perform_write+0xd6/0x240 [41369.848054] generic_file_write_iter+0x65/0xd0 [41369.848056] vfs_write+0x23a/0x400 [41369.848060] ksys_write+0x6f/0xf0 [41369.848063] do_syscall_64+0x61/0xe0 [41369.848065] ? do_user_addr_fault+0x304/0x670 [41369.848069] ? exc_page_fault+0x7f/0x180 [41369.848071] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [41369.848074] RIP: 0033:0x7965df71a184 [41369.848116] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d c5 3e 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48 [41369.848117] RSP: 002b:00007fffee661738 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 [41369.848119] RAX: ffffffffffffffda RBX: 0000570f66343bb0 RCX: 00007965df71a184 [41369.848121] RDX: 0000000000040000 RSI: 0000570f66343bb0 RDI: 0000000000000003 [41369.848122] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000570f66343b20 [41369.848122] R10: 0000000000000008 R11: 0000000000000202 R12: 0000000000000649 [41369.848123] R13: 0000570f651f8b40 R14: 0000000000008000 R15: 0000570f6633bba0 [41369.848125] [41369.848126] memory: usage 32768kB, limit 32768kB, failcnt 21239 [41369.848126] swap: usage 2112kB, limit 9007199254740988kB, failcnt 0 [41369.848127] Memory cgroup stats for /user.slice/user-1000.slice/user@1000.service/app.slice/run-u640.service: [41369.848174] anon 0 [41369.848175] file 26927104 [41369.848176] kernel 6615040 [41369.848176] kernel_stack 32768 [41369.848177] pagetables 122880 [41369.848177] sec_pagetables 0 [41369.848177] percpu 480 [41369.848178] sock 0 [41369.848178] vmalloc 0 [41369.848178] shmem 0 [41369.848179] zswap 312451 [41369.848179] zswapped 1458176 [41369.848179] file_mapped 0 [41369.848180] file_dirty 26923008 [41369.848180] file_writeback 0 [41369.848180] swapcached 12288 [41369.848181] anon_thp 0 [41369.848181] file_thp 0 [41369.848181] shmem_thp 0 [41369.848182] inactive_anon 0 [41369.848182] active_anon 12288 [41369.848182] inactive_file 15908864 [41369.848183] active_file 11014144 [41369.848183] unevictable 0 [41369.848183] slab_reclaimable 5963640 [41369.848184] slab_unreclaimable 89048 [41369.848184] slab 6052688 [41369.848185] workingset_refault_anon 4031 [41369.848185] workingset_refault_file 9236 [41369.848185] workingset_activate_anon 691 [41369.848186] workingset_activate_file 2553 [41369.848186] workingset_restore_anon 691 [41369.848186] workingset_restore_file 0 [41369.848187] workingset_nodereclaim 0 [41369.848187] pgscan 40473 [41369.848187] pgsteal 20881 [41369.848188] pgscan_kswapd 0 [41369.848188] pgscan_direct 40473 [41369.848188] pgscan_khugepaged 0 [41369.848189] pgsteal_kswapd 0 [41369.848189] pgsteal_direct 20881 [41369.848190] pgsteal_khugepaged 0 [41369.848190] pgfault 6019 [41369.848190] pgmajfault 4033 [41369.848191] pgrefill 30578988 [41369.848191] pgactivate 2925 [41369.848191] pgdeactivate 0 [41369.848192] pglazyfree 0 [41369.848192] pglazyfreed 0 [41369.848192] zswpin 1520 [41369.848193] zswpout 1141 [41369.848193] thp_fault_alloc 0 [41369.848193] thp_collapse_alloc 0 [41369.848194] thp_swpout 0 [41369.848194] thp_swpout_fallback 0 [41369.848194] Tasks state (memory values in pages): [41369.848195] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name [41369.848195] [ 128825] 1000 128825 3449 864 65536 192 200 rsync [41369.848198] [ 128826] 1000 128826 3523 288 57344 288 200 rsync [41369.848199] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/user.slice/user-1000.slice/user@1000.service/app.slice/run-u640.service,task_memcg=/user.slice/user-1000.slice/user@1000.service/app.slice/run-u640.service,task=rsync,pid=128825,uid=1000 [41369.848207] Memory cgroup out of memory: Killed process 128825 (rsync) total-vm:13796kB, anon-rss:0kB, file-rss:3456kB, shmem-rss:0kB, UID:1000 pgtables:64kB oom_score_adj:200 --- Importantly, note that there appears to be no attempt to write back before declaring OOM -- file_writeback is 0 when file_dirty is 26923008. The issue is consistently reproducible (and thanks Johannes for looking at this with me). On non-MGLRU, flushers are active and are making forward progress in preventing OOM. This is writing to a slow disk with about ~10MiB/s available write speed, so the CPU and read speed is far faster than the write speed the disk can take. Is this a known problem in MGLRU? If not, could you point me to where MGLRU tries to handle flusher wakeup on slow I/O? I didn't immediately find it. Thanks, Chris