From: "Fabio M. De Francesco"
To: Jonathan Corbet, Jonathan Cameron, Linus Walleij, Mike Rapoport,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Andrew Morton, Bagas Sanjaya, Matthew Wilcox, Randy Dunlap
Subject: Re: [RFC PATCH v2] Documentation/page_tables: MMU, TLB, and Page Faults
Date: Sun, 23 Jul 2023 15:11:38 +0200
Message-ID: <5974032.lOV4Wx5bFT@suse>
In-Reply-To: <20230723120100.5891-1-fmdefrancesco@gmail.com>
References: <20230723120100.5891-1-fmdefrancesco@gmail.com>

On Sunday, 23 July 2023 13:56:38 CEST Fabio M. De Francesco wrote:
> Extend page_tables.rst by adding a small introductory section about
> the role of the MMU and TLB in translating between virtual addresses and
> physical page frames. Furthermore, explain the concepts behind
> page fault exceptions and how Linux handles them.

Please discard this RFC because I sent it by mistake.

The real RFC is "[RFC PATCH v2] Documentation/page_tables: Add info about
MMU/TLB and Page Faults" at
https://lore.kernel.org/lkml/20230723120721.7139-1-fmdefrancesco@gmail.com/

Sorry for the noise.

Fabio

> Cc: Andrew Morton
> Cc: Bagas Sanjaya
> Cc: Jonathan Cameron
> Cc: Jonathan Corbet
> Cc: Linus Walleij
> Cc: Matthew Wilcox
> Cc: Mike Rapoport
> Cc: Randy Dunlap
> Signed-off-by: Fabio M. De Francesco
> ---
>
> v1->v2: Add further information about lower-level functions in the page
> fault handler and add information about how and why to disable/enable
> the page fault handler (with a link to a patch by Ira that makes use
> of pagefault_disable() to prevent deadlocks).
>
> This is an RFC PATCH for two reasons:
>
> 1) I've heard that there is consensus about the need to revise and
> extend the MM documentation, but I'm not sure whether developers
> need this kind of introductory information.
>
> 2) While preparing this little patch I decided to take a quick look at
> the code and found that it is currently not how I remembered it.
> I'm especially speaking about the x86 case. I'm not sure that I've
> properly understood what I described as a difference in workflow
> compared to most of the other architectures.
>
> Therefore, for the two reasons explained above, I'd like to hear from
> people actively involved in MM. If this is not what you want, feel free
> to throw it away. Otherwise I'd be happy to write more on this and other
> MM topics. I'm looking forward to comments on this small work.
>
>  Documentation/mm/page_tables.rst | 61 ++++++++++++++++++++++++++++++++
>  1 file changed, 61 insertions(+)
>
> diff --git a/Documentation/mm/page_tables.rst b/Documentation/mm/page_tables.rst
> index 7840c1891751..fa617894fda8 100644
> --- a/Documentation/mm/page_tables.rst
> +++ b/Documentation/mm/page_tables.rst
> @@ -152,3 +152,64 @@ Page table handling code that wishes to be
> architecture-neutral, such as the virtual memory manager, will need to be
> written so that it traverses all of the currently five levels. This style
> should also be preferred for
> architecture-specific code, so as to be robust to future changes.
> +
> +
> +MMU, TLB, and Page Faults
> +=========================
> +
> +The Memory Management Unit (MMU) is a hardware component that handles virtual
> +to physical address translations. It uses a relatively small cache in hardware
> +called the Translation Lookaside Buffer (TLB) to speed up these translations.
> +When a process wants to access a memory location, the CPU provides a virtual
> +address to the MMU, which then uses the TLB to quickly find the corresponding
> +physical address.
> +
> +However, sometimes the MMU can't find a valid translation in the TLB. This
> +could be because the process is trying to access a range of memory that it's
> +not allowed to access, or because the memory hasn't been loaded into RAM yet.
> +When this happens, the MMU triggers a page fault, which is a type of interrupt
> +that signals the CPU to pause the current process and run a special function
> +to handle the fault.
> +
> +One cause of page faults is bugs (or maliciously crafted addresses): a process
> +tries to access a range of memory that it doesn't have permission to access.
> +This could be because the memory is reserved for the kernel or for another
> +process, or because the process is trying to write to a read-only
> +section of memory.
> +When this happens, the kernel sends a Segmentation Fault
> +(SIGSEGV) signal to the process, which usually causes the process to terminate.
> +
> +An expected and more common cause of page faults is "lazy allocation". This is
> +a technique used by the kernel to improve memory efficiency and reduce
> +footprint. Instead of allocating physical memory to a process as soon as it's
> +requested, the kernel waits until the process actually tries to use the memory.
> +This can save a significant amount of memory in cases where a process requests
> +a large block but only uses a small portion of it.
> +
> +A related technique is "Copy-on-Write" (COW), where the kernel allows multiple
> +processes to share the same physical memory as long as they're only reading
> +from it. If a process tries to write to the shared memory, the write triggers
> +a page fault and the kernel allocates a separate copy of the memory for the
> +process. This allows the kernel to save memory and avoid unnecessary data
> +copying and, by doing so, reduces latency.
> +
> +Now, let's see how the Linux kernel handles these page faults:
> +
> +1. For most architectures, `do_page_fault()` is the primary interrupt handler
> +   for page faults. It delegates the actual handling of the page fault to
> +   `handle_mm_fault()`. This function checks the cause of the page fault and
> +   takes the appropriate action, such as loading the required page into
> +   memory, granting the process the necessary permissions, or sending a
> +   SIGSEGV signal to the process.
> +
> +2. In the specific case of the x86 architecture, the interrupt handler is
> +   defined by the `DEFINE_IDTENTRY_RAW_ERRORCODE()` macro, which calls
> +   `handle_page_fault()`. This function then calls either
> +   `do_user_addr_fault()` or `do_kern_addr_fault()`, depending on whether
> +   the fault occurred in user space or kernel space. Both of these functions
> +   eventually lead to `handle_mm_fault()`, similar to the workflow in other
> +   architectures.
> +
> +The actual implementation of the workflow is very complex. Its design allows
> +Linux to handle page faults in a way that is tailored to the specific
> +characteristics of each architecture, while still sharing a common overall
> +structure.
> --
> 2.41.0
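
For readers of the quoted text, the three outcomes it describes (SIGSEGV on
an invalid access, lazy allocation on first touch, and a private copy on a
write to a copy-on-write page) can be illustrated with a toy model. This is
only a sketch under stated assumptions: it is not kernel code, it does not
mirror handle_mm_fault(), and every name in it (Outcome, Page,
ToyAddressSpace, access) is hypothetical and invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, Optional


class Outcome(Enum):
    """Hypothetical labels for what happens on one memory access."""
    SIGSEGV = auto()      # no mapping, or write to a read-only mapping
    LAZY_ALLOC = auto()   # first touch of a mapped page with no frame yet
    COW_COPY = auto()     # write to a frame shared copy-on-write
    NO_FAULT = auto()     # translation present and access permitted


@dataclass
class Page:
    writable: bool                # does the mapping permit writes at all?
    frame: Optional[int] = None   # toy "physical frame"; None until touched
    shared_cow: bool = False      # frame is shared and must be copied on write


@dataclass
class ToyAddressSpace:
    pages: Dict[int, Page] = field(default_factory=dict)  # page number -> Page
    next_frame: int = 0

    def _new_frame(self) -> int:
        frame = self.next_frame
        self.next_frame += 1
        return frame

    def access(self, vpn: int, write: bool) -> Outcome:
        page = self.pages.get(vpn)
        # Invalid access: nothing mapped here, or a write that is not allowed.
        if page is None or (write and not page.writable):
            return Outcome.SIGSEGV
        # Lazy allocation: the mapping exists but has no backing frame yet.
        if page.frame is None:
            page.frame = self._new_frame()
            return Outcome.LAZY_ALLOC
        # Copy-on-write: give the writer its own private frame.
        if write and page.shared_cow:
            page.frame = self._new_frame()
            page.shared_cow = False
            return Outcome.COW_COPY
        return Outcome.NO_FAULT
```

For example, with a writable page that has no frame yet, the first access
returns Outcome.LAZY_ALLOC and a repeated access returns Outcome.NO_FAULT,
which is the "pay only when used" behaviour that lazy allocation is meant to
provide.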