From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-4316.protonmail.ch (mail-4316.protonmail.ch [185.70.43.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 54363613D; Wed, 24 Jul 2024 19:36:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=185.70.43.16 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1721849806; cv=none; b=EMuFst+xXzFryBPSYoIEA5FRq4gC4WUceW9WwvyRv9i2X1UPKKCHmAQU040FfDPS5ERX3CffwmuB9D/B28jo4dh6w5hYsShJRnI9ZjEnR1BQzXxpfKhONAzFpI3HUCCgSBGXcQZCNM9JM9Rhm9LEPYEvxqeEYzKZdChwe/0Dwi4= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1721849806; c=relaxed/simple; bh=G65BQ8ZETSnWP5F7kWFRz1hJIoEwZ48ofdPadlg5AdM=; h=Date:To:From:Cc:Subject:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=byJzfQVxl0J3f5f6XEmGU8LaW/jvMiCpyEw6s6VlnQkc5tJ6HqOOZE/H0ZPXWutHp3Q8CwYpmSbta0D0s+TNsML2JqznLqdMd+x+QGiQrUR8ze0L9+VvraWRZ7j8+0oGqFl9HiFJrWotHyrLbkpb6m5ypXqS+WWifmNHaesaLJY= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=proton.me; spf=pass smtp.mailfrom=proton.me; dkim=pass (2048-bit key) header.d=proton.me header.i=@proton.me header.b=hccTp/14; arc=none smtp.client-ip=185.70.43.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=proton.me Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=proton.me Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=proton.me header.i=@proton.me header.b="hccTp/14" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=proton.me; s=protonmail; t=1721849796; x=1722108996; bh=Ex+yutQkGp2wbHRJ1q0Nvfaduezw0qpsrEkXrC2Hz2U=; h=Date:To:From:Cc:Subject:Message-ID:In-Reply-To:References: Feedback-ID:From:To:Cc:Date:Subject:Reply-To:Feedback-ID: Message-ID:BIMI-Selector; b=hccTp/14FsvEzv57iz5I7FOaS8KpfbobNd3BGo7ySqOJ7hKXTcDw7xQUiuQqQgy1Y I9JGME8jJOdL7De84zZJfXpQvm60n5bpGQComZ3fR5YM8OgIMtoO7eLZ0iqHDgenfA 0l/ZvwXSL+vjs0ZYuLanTzhedI6N3Sj0R2VFZ0UZO4+yZALwIg3xyt9631VzFEbLKP Ij5SnoHjva8jBbdA1LOXrO2GL7Gy0DCOOnu443uHW938XdyqL0CxOJGT/++2Utw4Nj VjECOVyWlCj9y+3itmnFeZODyVdRhygmSLAjfotdPYD5hP3TmoaY/It7A2X+vHJ+xK GMq8HwEv1gTKg== Date: Wed, 24 Jul 2024 19:36:29 +0000 To: Alice Ryhl From: Benno Lossin Cc: Jonathan Corbet , Miguel Ojeda , Alex Gaynor , Wedson Almeida Filho , Boqun Feng , Gary Guo , =?utf-8?Q?Bj=C3=B6rn_Roy_Baron?= , Andreas Hindborg , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org Subject: Re: [RFC PATCH 1/5] doc: rust: create safety standard Message-ID: <4ed475d3-37b6-49e7-8b43-df3160f4ebbc@proton.me> In-Reply-To: References: <20240717221133.459589-1-benno.lossin@proton.me> <20240717221133.459589-2-benno.lossin@proton.me> Feedback-ID: 71780778:user:proton X-Pm-Message-ID: 06ca8bbf045bb4f4659d593a55d9d1f164b44cc2 Precedence: bulk X-Mailing-List: rust-for-linux@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 18.07.24 14:20, Alice Ryhl wrote: > On Thu, Jul 18, 2024 at 12:12=E2=80=AFAM Benno Lossin wrote: >> +Soundness >> +=3D=3D=3D=3D=3D=3D=3D=3D=3D >> + >> +``unsafe`` operations (e.g. ``unsafe`` functions, dereferencing raw poi= nters etc.) have certain >> +conditions that need to be fulfilled in order for the operation to not = be UB. >> +To evaluate if the ``unsafe`` code usage is correct, one needs to consi= der the API that wraps said >> +``unsafe`` code. If under all possible safe uses of the API, the condit= ions for the ``unsafe`` >> +operation are fulfilled, the API is *sound*. Otherwise it is *unsound*.= Here is a simple example:: >> + >> + pub struct Data { >> + a: usize, >> + } >> + >> + pub fn access_a(data: *mut Data) -> usize { >> + unsafe { (*data).a } >> + } >> + >> + fn main() { >> + let mut d =3D Data { a: 42 }; >> + println!("{}", access_a(&mut d)); >> + } >> + >> +While this example has no UB, the function ``access_a`` is unsound. Thi= s is because one could just >> +write the following safe usage:: >> + >> + println!("{}", access_a(core::ptr::null_mut())); >> + >> +And this would result in a dereference of a null pointer. >> + >> +In its essence, a sound API means that if someone only writes safe code= , they can never encounter UB >> +even if they call safe code that calls ``unsafe`` code behind the scene= s. >=20 > I think this section on soundness could be more clear. You never > really give a definition for what soundness is; you just jump into an > example immediately. I would argue that the sentence "If under all possible safe uses of the API, the conditions for the ``unsafe`` operation are fulfilled, the API is *sound*. Otherwise it is *unsound*." is a definition. Is this lacking anything, or do you have an idea to improve the wording? I am not really understanding what kind of additional definition you would like to see. > It may be nice to talk about how a sound API must prevent memory > safety issues, even if the safe code required to do so is silly or > unrealistic. Doing so is necessary to maintain the "safe code never > has memory safety bugs" guarantee. That is a good way to phrase things, I will add that. >> +Because unsoundness issues have the potential for allowing safe code to= experience UB, they are >> +treated similarly to actual bugs with UB. Their fixes should also be in= cluded in the stable tree. >> + >> +Safety Documentation >> +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> + >> +After trying to minimize and remove as much ``unsafe`` code as possible= , there still is some left. >> +This is because some things are just not possible in only safe code. Th= is last part of ``unsafe`` >> +code must still be correct. Helping with that is the safety documentati= on: it meticulously documents >> +the various requirements and justifications for every line of ``unsafe`= ` code. That way it can be >> +ensured that all ``unsafe`` code is sound without anyone needing to kno= w the whole kernel at once. >> +The gist of the idea is this: every ``unsafe`` operation documents its = requirements and every >> +location that uses an ``unsafe`` operation documents for every requirem= ent a justification why they >> +are fulfilled. If now all requirements and justifications are correct, = then there can only be sound >> +``unsafe`` code. >> + >> +The ``unsafe`` keywords has two different meanings depending on the con= text it is used in: >> + >> +* granting access to an unchecked operation, >> +* declaring that something is an unchecked operation. >> + >> +In both cases we have to add safety documentation. In the first case, w= e have to justify why we can >> +always guarantee that the requirements of the unchecked operation are f= ulfilled. In the second case, >> +we have to list the requirements that have to be fulfilled for the oper= ation to be sound. >> + >> +In the following sections we will go over each location where ``unsafe`= ` can be used. >> + >> +.. _unsafe-Functions: >> + >> +``unsafe`` Functions >> +-------------------- >> + >> +``unsafe`` on function declarations is used to state that this function= has special requirements >> +that callers have to ensure when calling the function:: >> + >> + unsafe fn foo() { >> + // ... >> + } >> + >> +These requirements are called the safety requirements of the function. = These requirements can take >> +any shape and range from simple requirements like "``ptr_arg`` is valid= " (``ptr_arg`` refers to some >> +argument with the type matching ``*mut T`` or ``*const T``) to more com= plex requirements like >> +"``ptr`` must be valid, point to a ``NUL``-terminated C string, and it = must be valid for at least >> +``'a``. While the returned value is alive, the memory at ``ptr`` must n= ot be mutated.". >> + >> +The safety requirements have to be documented in the so called safety s= ection:: >> + >> + /// >> + /// >> + /// >> + /// >> + /// # Safety >> + /// >> + /// >> + unsafe fn foo() { >> + // ... >> + } >=20 > I would love to see a proper example here. Depending on how the discussion goes with this series, I plan to go through the tree and try to improve the safety comments. That way I will have a good source for examples, since otherwise they are extremely hard to come up with. (I am also open to any suggestions :) >> +.. _unsafe-Blocks: >> + >> +``unsafe`` Blocks >> +----------------- >> + >> +``unsafe`` code blocks are used to call ``unsafe`` functions and perfor= m built-in ``unsafe`` >> +operations such as dereferencing a raw pointer:: >> + >> + unsafe { foo() }; >> + >> +In order to ensure that all safety requirements of ``unsafe`` operation= s are upheld, a safety >> +comment is mandatory for all ``unsafe`` blocks. This safety comment nee= ds to provide a correct >> +justification for every safety requirements of every operation within t= he block:: >> + >> + // SAFETY: >> + unsafe { foo() }; >=20 > I think it is worth explicitly pointing out that the safety comment > must explain why the preconditions are satisfied, *not* what the > preconditions are. It's a really really common mistake to mix up > these, and it probably even makes sense to include two examples > showing the difference. Good idea, but I will put that into justifications.rst. >> +For transparency it is best practice to have only a single ``unsafe`` o= peration per ``unsafe`` >> +block, since then it is more clear what the justifications are trying t= o justify. Safe operations >> +should not be included in the block, since it adds confusion as to whic= h operation is the ``unsafe`` >> +one. In certain cases however it makes it easier to understand if there= is only a single ``unsafe`` >> +block. For example:: >> + >> + // SAFETY: `ptr` is valid for writes. >> + unsafe { >> + (*ptr).field1 =3D 42; >> + (*ptr).field2 =3D 24; >> + (*ptr).field3 =3D 2442; >> + } >> + >> +In this case it is more readable to not split the block into multiple p= arts. >=20 > It would be nice with an example of a completely normal safety comment > example. Also would be nice to call out that the semicolon goes > outside the unsafe block to improve formatting, when it's a single > operation. Will add. > It would also be nice with an example that shows how to reference the > safety requirements of the current function. (That is, an example that > combines unsafe fn with unsafe block.) See above. --- Cheers, Benno