Date: Thu, 22 Jun 2023 16:11:39 +0200
From: Petr Mladek
To: Tetsuo Handa
Cc: Sebastian Andrzej Siewior, linux-mm@kvack.org, "Luis Claudio R. Goncalves", Andrew Morton, Mel Gorman, Michal Hocko, Thomas Gleixner
Subject: Re: [PATCH] mm/page_alloc: Use write_seqlock_irqsave() instead write_seqlock() + local_irq_save().
References: <20230621104034.HT6QnNkQ@linutronix.de> <0e9fc992-8e05-2e63-b3b1-d8d3ce89fc16@I-love.SAKURA.ne.jp> <20230621130641.-5iueY1I@linutronix.de> <20230621143421.BgHjJklo@linutronix.de> <01031ffe-c81f-9cec-76fb-e70d548429cf@I-love.SAKURA.ne.jp> <8b6d3f39-c573-ca2b-957b-8c48c2fa68ad@I-love.SAKURA.ne.jp>

On Thu 2023-06-22 22:36:27, Tetsuo Handa wrote:
> On 2023/06/22 8:24, Tetsuo Handa wrote:
> > By the way, given
> >
> >   write_seqlock_irqsave(&zonelist_update_seq, flags);
> >   <>
> >   some_timer_function() {
> >     kmalloc(GFP_ATOMIC);
> >   }
> >   <>
> >   printk_deferred_enter();
> >
> > scenario in CONFIG_PREEMPT_RT=y case is handled by executing some_timer_function()
> > on a dedicated kernel thread for IRQs, what guarantees that the kernel thread for
> > IRQs gives up CPU and the user thread which called write_seqlock() gains CPU until
> > write_sequnlock() is called? How can the kernel figure out that executing the user
> > thread needs higher priority than the kernel thread?
>
> I haven't got response on this question.
>
> Several years ago, I demonstrated that a SCHED_IDLE priority userspace thread holding
> oom_lock causes other concurrently allocating !SCHED_IDLE priority threads to
> misunderstand that mutex_trylock(&oom_lock) failure implies we are making forward
> progress (despite the SCHED_IDLE priority userspace thread was unable to wake up for
> minutes).
>
> If a SCHED_IDLE priority thread which called write_seqlock_irqsave() is preempted by
> some other !SCHED_IDLE priority threads (especially realtime priority threads), and
> such !SCHED_IDLE priority thread calls kmalloc(GFP_ATOMIC) or printk(), a similar thing
> (misunderstand that spinning on read_seqbegin() from zonelist_iter_begin() can make
> forward progress despite a thread which called write_seqlock_irqsave() cannot make
> progress due to preemption) can happen.
>
> Question to Sebastian:
> To make sure that such thing cannot happen, we should make sure that
> a thread which entered write_seqcount_begin(&zonelist_update_seq.seqcount) from
> write_seqlock_irqsave(&zonelist_update_seq, flags) can continue using CPU until
> write_seqcount_end(&zonelist_update_seq.seqcount) from
> write_seqlock_irqrestore(&zonelist_update_seq, flags).
> Does adding preempt_disable() before write_seqlock(&zonelist_update_seq, flags) help?
>
> Question to Peter:
> Even if local_irq_save(flags) disables IRQ, NMI context can enqueue message via printk().
> When does the message enqueued from NMI context gets printed?

They are flushed to the console either by irq_work or by another printk().
The irq_work cannot be processed while IRQs are disabled. But another
non-deferred printk() would try to flush them immediately.

> If there is a possibility
> that the message enqueued from NMI context gets printed between
> "write_seqlock_irqsave(&zonelist_update_seq, flags) and printk_deferred_enter()" or
> "printk_deferred_exit() and write_sequnlock_irqrestore(&zonelist_update_seq, flags)" ?
> If yes, we can't increment zonelist_update_seq.seqcount before printk_deferred_enter()...

It might happen if a printk() is called in one of these holes.

Best Regards,
Petr
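
The two holes can be illustrated with a small userspace toy model. This is only a sketch under stated assumptions, not kernel code: struct model, the model_*() helpers and run_scenario() are all hypothetical names, and the model assumes (as the thread suggests) that a console flush may end up reading zonelist_update_seq and would spin if the sequence count is odd.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; none of this is the kernel's real implementation. */
struct model {
    unsigned int seq;    /* odd while the writer is inside the write section */
    bool deferred;       /* stands in for printk_deferred_enter()/exit() */
    int pending;         /* messages queued but not yet flushed */
    int unsafe_flushes;  /* flushes attempted while seq was odd */
};

static void model_write_begin(struct model *m)
{
    m->seq++;                /* stands in for write_seqlock_irqsave() */
}

static void model_write_end(struct model *m)
{
    assert(m->seq & 1);      /* must still be inside the write section */
    m->seq++;                /* stands in for write_sequnlock_irqrestore() */
}

/* NMI context only enqueues; it never flushes by itself. */
static void model_nmi_printk(struct model *m)
{
    m->pending++;
}

/* printk() from the writer's own context. */
static void model_printk(struct model *m)
{
    m->pending++;
    if (m->deferred)
        return;              /* flushing is left to irq_work later */
    /* Non-deferred: tries to flush everything that is queued right now. */
    if (m->seq & 1)
        m->unsafe_flushes++; /* assumed: a reader of seq would spin here */
    m->pending = 0;
}

/* hole: 0 = none, 1 = before deferred enter, 2 = after deferred exit */
static int run_scenario(int hole)
{
    struct model m = {0};

    model_write_begin(&m);
    model_nmi_printk(&m);    /* NMI enqueues a message; IRQs are off */
    if (hole == 1)
        model_printk(&m);    /* printk() in the first hole */
    m.deferred = true;
    model_printk(&m);        /* inside the deferred section: only queued */
    m.deferred = false;
    if (hole == 2)
        model_printk(&m);    /* printk() in the second hole */
    model_write_end(&m);
    return m.unsafe_flushes;
}
```

In this model run_scenario(0) reports no unsafe flush, while run_scenario(1) and run_scenario(2) each report one: the queued NMI message is only flushed inside the write section when a non-deferred printk() lands in one of the holes.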