From mboxrd@z Thu Jan  1 00:00:00 1970
From:   "Fabio M. De Francesco" <fmdefrancesco@gmail.com>
To:     Jonathan Corbet <corbet@lwn.net>,
        Jonathan Cameron <Jonathan.Cameron@huawei.com>,
        Linus Walleij <linus.walleij@linaro.org>,
        Mike Rapoport <rppt@kernel.org>, linux-doc@vger.kernel.org,
        linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc:     Andrew Morton <akpm@linux-foundation.org>,
        Bagas Sanjaya <bagasdotme@gmail.com>,
        Matthew Wilcox <willy@infradead.org>,
        Randy Dunlap <rdunlap@infradead.org>
Subject: Re: [RFC PATCH v2] Documentation/page_tables: MMU, TLB,
 and Page Faults
Date:   Sun, 23 Jul 2023 15:11:38 +0200
Message-ID: <5974032.lOV4Wx5bFT@suse>
In-Reply-To: <20230723120100.5891-1-fmdefrancesco@gmail.com>
References: <20230723120100.5891-1-fmdefrancesco@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
Precedence: bulk
List-ID: <linux-doc.vger.kernel.org>
X-Mailing-List: linux-doc@vger.kernel.org

On Sunday, 23 July 2023 13:56:38 CEST Fabio M. De Francesco wrote:
> Extend page_tables.rst by adding a small introductory section about
> the role of the MMU and TLB in translating virtual addresses to
> physical page frames. Furthermore, explain the concepts behind
> page fault exceptions and how Linux handles them.

Please discard this RFC because I sent it by mistake.
The real RFC is "[RFC PATCH v2] Documentation/page_tables: Add info about
MMU/TLB and Page Faults" at
https://lore.kernel.org/lkml/20230723120721.7139-1-fmdefrancesco@gmail.com/

Sorry for the noise.

Fabio

> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Bagas Sanjaya <bagasdotme@gmail.com>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Linus Walleij <linus.walleij@linaro.org>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
> ---
> 
> v1->v2: Add further information about lower level functions in the page
> fault handler and add information about how and why to disable / enable
> the page fault handler (with a link to a patch by Ira that makes use
> of pagefault_disable() to prevent deadlocks).
> 
> This is an RFC PATCH for two reasons:
> 
> 1) I've heard that there is consensus about the need to revise and
> extend the MM documentation, but I'm not sure about whether or not
> developers need this kind of introductory information.
> 
> 2) While preparing this little patch I decided to take a quick look at
> the code and found out it currently is not how I thought I remembered
> it. I'm especially speaking about the x86 case. I'm not sure that I've
> been able to properly understand what I described as a difference in
> workflow compared to most of the other architectures.
> 
> Therefore, for the two reasons explained above, I'd like to hear from
> people actively involved in MM. If this is not what you want, feel free
> to throw it away. Otherwise I'd be happy to write more on this and other
> MM topics. I'm looking forward to comments on this small work.
> 
>  Documentation/mm/page_tables.rst | 61 ++++++++++++++++++++++++++++++++
>  1 file changed, 61 insertions(+)
> 
> diff --git a/Documentation/mm/page_tables.rst b/Documentation/mm/page_tables.rst
> index 7840c1891751..fa617894fda8 100644
> --- a/Documentation/mm/page_tables.rst
> +++ b/Documentation/mm/page_tables.rst
> @@ -152,3 +152,64 @@ Page table handling code that wishes to be architecture-neutral, such as the
>  virtual memory manager, will need to be written so that it traverses all of the
>  currently five levels. This style should also be preferred for
>  architecture-specific code, so as to be robust to future changes.
> +
> +
> +MMU, TLB, and Page Faults
> +=========================
> +
> +The Memory Management Unit (MMU) is a hardware component that handles virtual to
> +physical address translations. It uses a relatively small cache in hardware
> +called the Translation Lookaside Buffer (TLB) to speed up these translations.
> +When a process wants to access a memory location, the CPU provides a virtual
> +address to the MMU, which then uses the TLB to quickly find the corresponding
> +physical address.
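
To make the translation step above a bit more concrete, here is a minimal
user-space sketch in C. It is only a toy model, not kernel code: the
single-level page table, the 4 KiB page size, the tiny direct-mapped TLB and
all the toy_* names are assumptions made up for illustration. On a TLB miss
the model falls back to a page table lookup and refills the TLB entry; when
no mapping exists, it reports failure, which is where real hardware would
raise a page fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT   12u                         /* 4 KiB pages (assumption) */
#define TOY_OFFSET_MASK  ((1u << TOY_PAGE_SHIFT) - 1u)
#define TOY_NR_PAGES     16u                         /* size of the toy address space */
#define TOY_TLB_ENTRIES  4u                          /* tiny direct-mapped TLB */
#define TOY_NO_FRAME     UINT32_MAX                  /* marks an unmapped page */

struct toy_tlb_entry {
	bool     valid;
	uint32_t vpn;                                /* virtual page number */
	uint32_t pfn;                                /* physical frame number */
};

struct toy_mmu {
	uint32_t page_table[TOY_NR_PAGES];           /* vpn -> pfn, single level */
	struct toy_tlb_entry tlb[TOY_TLB_ENTRIES];
	unsigned tlb_hits;
	unsigned tlb_misses;
};

/* Start with an empty page table and an empty TLB. */
void toy_mmu_init(struct toy_mmu *mmu)
{
	for (uint32_t i = 0; i < TOY_NR_PAGES; i++)
		mmu->page_table[i] = TOY_NO_FRAME;
	for (uint32_t i = 0; i < TOY_TLB_ENTRIES; i++)
		mmu->tlb[i].valid = false;
	mmu->tlb_hits = 0;
	mmu->tlb_misses = 0;
}

/* Install a vpn -> pfn mapping in the page table (the TLB is not touched). */
void toy_mmu_map(struct toy_mmu *mmu, uint32_t vpn, uint32_t pfn)
{
	assert(vpn < TOY_NR_PAGES);
	mmu->page_table[vpn] = pfn;
}

/*
 * Translate vaddr. Returns true and stores the physical address in *paddr
 * on success; returns false when no valid mapping exists.
 */
bool toy_mmu_translate(struct toy_mmu *mmu, uint32_t vaddr, uint32_t *paddr)
{
	uint32_t vpn = vaddr >> TOY_PAGE_SHIFT;
	uint32_t offset = vaddr & TOY_OFFSET_MASK;
	struct toy_tlb_entry *e = &mmu->tlb[vpn % TOY_TLB_ENTRIES];

	if (e->valid && e->vpn == vpn) {
		mmu->tlb_hits++;                     /* fast path */
	} else {
		mmu->tlb_misses++;                   /* slow path: consult the page table */
		if (vpn >= TOY_NR_PAGES || mmu->page_table[vpn] == TOY_NO_FRAME)
			return false;
		e->valid = true;
		e->vpn = vpn;
		e->pfn = mmu->page_table[vpn];
	}
	*paddr = (e->pfn << TOY_PAGE_SHIFT) | offset;
	return true;
}
```

Translating two addresses within the same page should count one miss
followed by one hit, since the first lookup refills the TLB entry.
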
> +
> +However, sometimes the MMU can't find a valid translation, neither in the TLB
> +nor by walking the page tables. This could be because the process is trying to
> +access a range of memory that it's not allowed to, or because the memory hasn't
> +been loaded into RAM yet. When this happens, the MMU triggers a page fault,
> +which is a type of exception that signals the CPU to pause the current process
> +and run a special function to handle the fault.
> +
> +One cause of page faults is bugs (or maliciously crafted addresses): a process
> +tries to access a range of memory that it doesn't have permission to access.
> +This could be because the memory is reserved for the kernel or for another
> +process, or because the process is trying to write to a read-only section of
> +memory. When this happens, the kernel sends a Segmentation Fault (SIGSEGV)
> +signal to the process, which usually causes the process to terminate.
> +
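
The SIGSEGV case can be observed from user space. The following is a
minimal sketch, assuming a Linux/POSIX system: it maps one anonymous page
with the requested protection, installs a temporary SIGSEGV handler and
tries to write to the page. The function name write_raises_sigsegv() is
made up for this example; mmap(), sigaction() and sigsetjmp() are the
standard calls. Catching the signal is done here only to report the
result, since by default the process would be terminated.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS with strict -std= flags */
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf segv_env;

static void on_segv(int sig)
{
	(void)sig;
	siglongjmp(segv_env, 1);        /* jump back instead of terminating */
}

/*
 * Map one anonymous page with protection `prot` and try to write to it.
 * Returns 1 if SIGSEGV was delivered, 0 if the write went through,
 * -1 on setup errors.
 */
int write_raises_sigsegv(int prot)
{
	struct sigaction sa, old;
	long page = sysconf(_SC_PAGESIZE);
	volatile char *p;
	int faulted;

	assert(page > 0);
	p = mmap(NULL, (size_t)page, prot, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	sa.sa_handler = on_segv;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	if (sigaction(SIGSEGV, &sa, &old) != 0) {
		munmap((void *)p, (size_t)page);
		return -1;
	}

	if (sigsetjmp(segv_env, 1) == 0) {
		p[0] = 42;              /* faults if the page is not writable */
		faulted = 0;
	} else {
		faulted = 1;            /* we got here through the handler */
	}

	sigaction(SIGSEGV, &old, NULL); /* restore the previous handler */
	munmap((void *)p, (size_t)page);
	return faulted;
}
```

Calling it with PROT_READ should report a fault, while PROT_READ |
PROT_WRITE should not.
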
> +An expected and more common cause of page faults is "lazy allocation". This is
> +a technique used by the kernel to improve memory efficiency and reduce
> +footprint. Instead of allocating physical memory to a process as soon as it's
> +requested, the kernel waits until the process actually tries to use the memory.
> +This can save a significant amount of memory in cases where a process requests
> +a large block but only uses a small portion of it.
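
Lazy allocation can also be made visible from user space. The sketch below
assumes Linux and uses mmap() together with mincore(), which reports which
pages of a mapping are resident in RAM. The helper names count_resident()
and lazy_allocation_demo() are hypothetical. On a typical system I would
expect no resident pages right after mmap() and at least the touched ones
afterwards, but the exact numbers are not guaranteed (transparent huge
pages, for instance, may make more pages resident than were touched).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for mincore() and MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count how many of the `npages` pages starting at `addr` are resident. */
long count_resident(void *addr, size_t npages, size_t page_size)
{
	unsigned char *vec = malloc(npages);
	long resident = 0;

	if (vec == NULL)
		return -1;
	if (mincore(addr, npages * page_size, vec) != 0) {
		free(vec);
		return -1;
	}
	for (size_t i = 0; i < npages; i++)
		resident += vec[i] & 1;     /* bit 0: page is resident */
	free(vec);
	return resident;
}

/*
 * Reserve `npages` of anonymous memory, write to the first `touch` pages
 * and report residency before and after the writes.
 * Returns 0 on success, -1 on error.
 */
int lazy_allocation_demo(size_t npages, size_t touch, long *before, long *after)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t page_size, len;
	char *p;

	assert(before != NULL && after != NULL);
	if (page <= 0 || npages == 0 || touch > npages)
		return -1;
	page_size = (size_t)page;
	len = npages * page_size;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	*before = count_resident(p, npages, page_size);

	/* The first write to each page is what makes the kernel back it. */
	for (size_t i = 0; i < touch; i++)
		((volatile char *)p)[i * page_size] = 1;

	*after = count_resident(p, npages, page_size);

	munmap(p, len);
	return (*before < 0 || *after < 0) ? -1 : 0;
}
```
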
> +
> +A related technique is "Copy-on-Write" (COW), where the kernel allows multiple
> +processes to share the same physical memory as long as they're only reading
> +from it. If a process tries to write to the shared memory, the MMU raises
> +a page fault and the kernel allocates a separate copy of the page for that
> +process. This allows the kernel to save memory and avoid unnecessary data
> +copying and, by doing so, it reduces latency.
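
The visible effect of COW is easy to show with fork(). The sketch below is
a minimal example assuming a POSIX system; the function name
parent_unaffected_by_child_write() is made up. It only demonstrates the
semantics (the child's writes never reach the parent's view of the buffer).
That the pages are physically shared until the first write is kernel
behaviour this program relies on but cannot observe directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fill a buffer in the parent, fork, and let the child overwrite it.
 * Returns 1 if the parent's buffer is unchanged afterwards, 0 if it
 * changed, -1 on error.
 */
int parent_unaffected_by_child_write(size_t len)
{
	volatile char *view;
	char *buf;
	int status, unchanged;
	pid_t pid;

	assert(len > 0);
	buf = malloc(len);
	if (buf == NULL)
		return -1;
	memset(buf, 'P', len);          /* make sure the pages exist before fork() */

	pid = fork();
	if (pid < 0) {
		free(buf);
		return -1;
	}
	if (pid == 0) {
		/* Child: writing faults and the child gets private copies. */
		memset(buf, 'C', len);
		_exit(buf[0] == 'C' && buf[len - 1] == 'C' ? 0 : 1);
	}

	if (waitpid(pid, &status, 0) != pid ||
	    !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
		free(buf);
		return -1;
	}

	/* Read through a volatile pointer so the check is really performed. */
	view = buf;
	unchanged = (view[0] == 'P' && view[len - 1] == 'P');
	free(buf);
	return unchanged;
}
```
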
> +
> +Now, let's see how the Linux kernel handles these page faults:
> +
> +1. For most architectures, `do_page_fault()` is the primary interrupt handler
> +   for page faults. It delegates the actual handling of the page fault to
> +   `handle_mm_fault()`. This function checks the cause of the page fault and
> +   takes the appropriate action, such as loading the required page into
> +   memory, granting the process the necessary permissions, or sending a
> +   SIGSEGV signal to the process.
> +
> +2. In the specific case of the x86 architecture, the interrupt handler is
> +   defined by the `DEFINE_IDTENTRY_RAW_ERRORCODE()` macro, which calls
> +   `handle_page_fault()`. This function then calls either
> +   `do_user_addr_fault()` or `do_kern_addr_fault()`, depending on whether
> +   the faulting address lies in user space or kernel space. Of the two,
> +   `do_user_addr_fault()` eventually leads to `handle_mm_fault()`, similar to
> +   the workflow in other architectures.
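
The overall shape of the two workflows above (an architecture-specific entry
point that classifies the fault, then a common decision about what to do)
can be sketched as a toy model in plain C. None of this is kernel code: the
toy_* names, the address split at TOY_KERNEL_BASE and the simplified
"memory area" structure are assumptions made up for the example, and the
real handlers check far more conditions than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical split between user and kernel addresses in the toy model. */
#define TOY_KERNEL_BASE 0x80000000u

enum toy_outcome {
	TOY_PAGE_MAPPED,        /* fault resolved, the access can be retried */
	TOY_SIGSEGV,            /* no mapping or insufficient permissions */
	TOY_KERNEL_PATH,        /* handed over to the kernel-address path */
};

struct toy_vma {                /* loosely modelled on a memory area */
	uint32_t start;         /* inclusive */
	uint32_t end;           /* exclusive */
	bool     writable;
};

/* Find the memory area that contains addr, or NULL if there is none. */
static const struct toy_vma *toy_find_vma(const struct toy_vma *vmas,
					  size_t n, uint32_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= vmas[i].start && addr < vmas[i].end)
			return &vmas[i];
	return NULL;
}

/* User-address path: decide between resolving the fault and SIGSEGV. */
enum toy_outcome toy_user_addr_fault(const struct toy_vma *vmas, size_t n,
				     uint32_t addr, bool is_write)
{
	const struct toy_vma *vma;

	assert(vmas != NULL || n == 0);
	vma = toy_find_vma(vmas, n, addr);
	if (vma == NULL)
		return TOY_SIGSEGV;             /* address not mapped at all */
	if (is_write && !vma->writable)
		return TOY_SIGSEGV;             /* write to a read-only area */
	return TOY_PAGE_MAPPED;                 /* e.g. lazy allocation or COW */
}

/* Entry point: classify by faulting address, then dispatch. */
enum toy_outcome toy_handle_page_fault(const struct toy_vma *vmas, size_t n,
				       uint32_t addr, bool is_write)
{
	if (addr >= TOY_KERNEL_BASE)
		return TOY_KERNEL_PATH;
	return toy_user_addr_fault(vmas, n, addr, is_write);
}
```

With one read-only and one writable area, a write into the read-only area
or an access outside both areas should end in TOY_SIGSEGV, while the other
user-space accesses are resolved.
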
> +
> +The actual implementation of the workflow is very complex. Its design allows
> +Linux to handle page faults in a way that is tailored to the specific
> +characteristics of each architecture, while still sharing a common overall
> +structure.
> --
> 2.41.0