linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: David Howells <dhowells@redhat.com>
Cc: viro@zeniv.linux.org.uk, raven@themaw.net,
	linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
	linux-block@vger.kernel.org, keyrings@vger.kernel.org,
	linux-security-module@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 02/13] uapi: General notification ring definitions [ver #4]
Date: Fri, 7 Jun 2019 08:12:28 -0700	[thread overview]
Message-ID: <20190607151228.GA1872258@magnolia> (raw)
In-Reply-To: <155991706083.15579.16359443779582362339.stgit@warthog.procyon.org.uk>

On Fri, Jun 07, 2019 at 03:17:40PM +0100, David Howells wrote:
> Add UAPI definitions for the general notification ring, including the
> following pieces:
> 
>  (1) struct watch_notification.
> 
>      This is the metadata header for each entry in the ring.  It includes a
>      type and subtype that indicate the source of the message
>      (eg. WATCH_TYPE_MOUNT_NOTIFY) and the kind of the message
>      (eg. NOTIFY_MOUNT_NEW_MOUNT).
> 
>      The header also contains an information field that conveys the
>      following information:
> 
> 	- WATCH_INFO_LENGTH.  The size of the entry (entries are variable
>           length).
> 
> 	- WATCH_INFO_OVERRUN.  If preceding messages were lost due to ring
> 	  overrun or lack of memory.
> 
> 	- WATCH_INFO_ENOMEM.  If preceding messages were lost due to lack
>           of memory.
> 
> 	- WATCH_INFO_RECURSIVE.  If the event detected was applied to
>           multiple objects (eg. a recursive change to mount attributes).
> 
> 	- WATCH_INFO_IN_SUBTREE.  If the event didn't happen at the watched
>           object, but rather to some related object (eg. a subtree mount
>           watch saw a mount happen somewhere within the subtree).
> 
> 	- WATCH_INFO_TYPE_FLAGS.  Eight flags whose meanings depend on the
>           message type.
> 
> 	- WATCH_INFO_ID.  The watch ID specified when the watchpoint was
>           set.
> 
>      All the information in the header can be used in filtering messages at
>      the point of writing into the buffer.
> 
>  (2) struct watch_queue_buffer.
> 
>      This describes the layout of the ring.  Note that the first slots in
>      the ring contain a special metadata entry that contains the ring
>      pointers.  The producer in the kernel knows to skip this and it has a
>      proper header (WATCH_TYPE_META, WATCH_META_SKIP_NOTIFICATION) that
>      indicates the size so that the ring consumer can handle it the same as
>      any other record and just skip it.
> 
>      Note that this means that ring entries can never be split over the end
>      of the ring, so if an entry would need to be split, a skip record is
>      inserted to wrap the ring first; this is also WATCH_TYPE_META,
>      WATCH_META_SKIP_NOTIFICATION.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>

I'm starting by reading the uapi changes and the sample program...

> ---
> 
>  include/uapi/linux/watch_queue.h |   63 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 63 insertions(+)
>  create mode 100644 include/uapi/linux/watch_queue.h
> 
> diff --git a/include/uapi/linux/watch_queue.h b/include/uapi/linux/watch_queue.h
> new file mode 100644
> index 000000000000..c3a88fa5f62a
> --- /dev/null
> +++ b/include/uapi/linux/watch_queue.h
> @@ -0,0 +1,63 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +#ifndef _UAPI_LINUX_WATCH_QUEUE_H
> +#define _UAPI_LINUX_WATCH_QUEUE_H
> +
> +#include <linux/types.h>
> +
> +enum watch_notification_type {
> +	WATCH_TYPE_META		= 0,	/* Special record */
> +	WATCH_TYPE_MOUNT_NOTIFY	= 1,	/* Mount notification record */
> +	WATCH_TYPE_SB_NOTIFY	= 2,	/* Superblock notification */
> +	WATCH_TYPE_KEY_NOTIFY	= 3,	/* Key/keyring change notification */
> +	WATCH_TYPE_BLOCK_NOTIFY	= 4,	/* Block layer notifications */
> +#define WATCH_TYPE___NR 5

Given the way enums work I think you can just make WATCH_TYPE___NR the
last element in the enum?

> +};
> +
> +enum watch_meta_notification_subtype {
> +	WATCH_META_SKIP_NOTIFICATION	= 0,	/* Just skip this record */
> +	WATCH_META_REMOVAL_NOTIFICATION	= 1,	/* Watched object was removed */
> +};
> +
> +/*
> + * Notification record
> + */
> +struct watch_notification {

Kind of a long name...

> +	__u32			type:24;	/* enum watch_notification_type */
> +	__u32			subtype:8;	/* Type-specific subtype (filterable) */

16777216 diferent types and 256 different subtypes?  My gut instinct
wants a better balance, though I don't know where I'd draw the line.
Probably 12 bits for type and 10 for subtype?  OTOH I don't have a good
sense of how many distinct notification types an XFS would want to send
back to userspace, and maybe 256 subtypes is fine.  We could always
reserve another watch_notification_type if we need > 256.

Ok, no objections. :)

> +	__u32			info;
> +#define WATCH_INFO_OVERRUN	0x00000001	/* Event(s) lost due to overrun */
> +#define WATCH_INFO_ENOMEM	0x00000002	/* Event(s) lost due to ENOMEM */
> +#define WATCH_INFO_RECURSIVE	0x00000004	/* Change was recursive */
> +#define WATCH_INFO_LENGTH	0x000001f8	/* Length of record / sizeof(watch_notification) */

This is a mask, isn't it?  Could we perhaps have some helpers here?
Something along the lines of...

#define WATCH_INFO_LENGTH_MASK	0x000001f8
#define WATCH_INFO_LENGTH_SHIFT	3

static inline size_t watch_notification_length(struct watch_notification *wn)
{
	return (wn->info & WATCH_INFO_LENGTH_MASK) >> WATCH_INFO_LENGTH_SHIFT *
			sizeof(struct watch_notification);
}

static inline struct watch_notification *watch_notification_next(
		struct watch_notification *wn)
{
	return wn + ((wn->info & WATCH_INFO_LENGTH_MASK) >>
			WATCH_INFO_LENGTH_SHIFT);
}

...so that we don't have to opencode all of the ring buffer walking
magic and stuff?

(I might also shorten the namespace from WATCH_INFO_ to WNI_ ...)

Hmm so the length field is 6 bits and therefore the maximum size of a
notification record is ... 63 * (sizeof(u32)  * 2) = 504 bytes?  Which
means that kernel users can send back a maximum payload of 496 bytes?
That's probably big enough for random fs notifications (bad metadata
detected, media errors, etc.)

Judging from the sample program I guess all that userspace does is
allocate a memory buffer and toss it into the kernel, which then
initializes the ring management variables, and from there we just scan
around the ring buffer every time poll(watch_fd) says there's something
to do?  How does userspace tell the kernel the size of the ring buffer?

Does (watch_notification->info & WATCH_INFO_LENGTH) == 0 have any
meaning besides apparently "stop looking at me"?

> +#define WATCH_INFO_IN_SUBTREE	0x00000200	/* Change was not at watched root */
> +#define WATCH_INFO_TYPE_FLAGS	0x00ff0000	/* Type-specific flags */

WATCH_INFO_FLAG_MASK ?

> +#define WATCH_INFO_FLAG_0	0x00010000
> +#define WATCH_INFO_FLAG_1	0x00020000
> +#define WATCH_INFO_FLAG_2	0x00040000
> +#define WATCH_INFO_FLAG_3	0x00080000
> +#define WATCH_INFO_FLAG_4	0x00100000
> +#define WATCH_INFO_FLAG_5	0x00200000
> +#define WATCH_INFO_FLAG_6	0x00400000
> +#define WATCH_INFO_FLAG_7	0x00800000
> +#define WATCH_INFO_ID		0xff000000	/* ID of watchpoint */

WATCH_INFO_ID_MASK ?

> +#define WATCH_INFO_ID__SHIFT	24

Why double underscore here?

> +};
> +
> +#define WATCH_LENGTH_SHIFT	3
> +
> +struct watch_queue_buffer {
> +	union {
> +		/* The first few entries are special, containing the
> +		 * ring management variables.

The first /two/ entries, correct?

Also, weird multiline comment style.

> +		 */
> +		struct {
> +			struct watch_notification watch; /* WATCH_TYPE_META */

Do these structures have to be cache-aligned for the atomic reads and
writes to work?

--D

> +			__u32		head;		/* Ring head index */
> +			__u32		tail;		/* Ring tail index */
> +			__u32		mask;		/* Ring index mask */
> +		} meta;
> +		struct watch_notification slots[0];
> +	};
> +};
> +
> +#endif /* _UAPI_LINUX_WATCH_QUEUE_H */
> 

  reply	other threads:[~2019-06-07 15:15 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-07 14:17 [RFC][PATCH 00/13] Mount, FS, Block and Keyrings notifications [ver #4] David Howells
2019-06-07 14:17 ` [PATCH 01/13] security: Override creds in __fput() with last fputter's creds " David Howells
2019-06-07 14:17 ` [PATCH 02/13] uapi: General notification ring definitions " David Howells
2019-06-07 15:12   ` Darrick J. Wong [this message]
2019-06-07 15:30   ` David Howells
2019-06-07 15:51   ` David Howells
2019-06-09  4:35     ` Randy Dunlap
2019-06-13 13:34     ` David Howells
2019-06-13 14:49       ` Randy Dunlap
2019-06-07 14:17 ` [PATCH 03/13] security: Add hooks to rule on setting a watch " David Howells
2019-06-07 14:17 ` [PATCH 04/13] security: Add a hook for the point of notification insertion " David Howells
2019-06-07 14:18 ` [PATCH 05/13] General notification queue with user mmap()'able ring buffer " David Howells
2019-06-07 14:18 ` [PATCH 06/13] keys: Add a notification facility " David Howells
2019-06-10 17:11   ` Jonathan Corbet
2019-06-10 17:47   ` David Howells
2019-06-07 14:18 ` [PATCH 07/13] vfs: Add a mount-notification " David Howells
2019-06-07 14:18 ` [PATCH 08/13] vfs: Add superblock notifications " David Howells
2019-06-07 14:18 ` [PATCH 09/13] fsinfo: Export superblock notification counter " David Howells
2019-06-07 14:18 ` [PATCH 10/13] Add a general, global device notification watch list " David Howells
2019-06-07 14:19 ` [PATCH 11/13] block: Add block layer notifications " David Howells
2019-06-07 14:19 ` [PATCH 12/13] usb: Add USB subsystem " David Howells
2019-06-07 14:19 ` [PATCH 13/13] Add sample notification program " David Howells
2019-06-10 15:21 ` [RFC][PATCH 00/13] Mount, FS, Block and Keyrings notifications " Stephen Smalley
2019-06-10 16:33   ` Casey Schaufler
2019-06-10 16:42     ` Andy Lutomirski
2019-06-10 18:01       ` Casey Schaufler
2019-06-10 18:22         ` Andy Lutomirski
2019-06-10 19:33           ` Casey Schaufler
2019-06-10 19:53             ` Andy Lutomirski
2019-06-10 21:25               ` Casey Schaufler
2019-06-11  0:13                 ` Andy Lutomirski
2019-06-11 14:32                   ` Stephen Smalley
2019-06-12  8:55                   ` David Howells
2019-06-10 22:07               ` David Howells
2019-06-11 14:21 ` What do LSMs *actually* need for checks on notifications? David Howells
2019-06-11 15:57   ` Stephen Smalley
2019-06-11 16:22   ` Casey Schaufler
2019-06-12 11:43   ` David Howells
2019-06-13 18:46     ` Stephen Smalley
2019-06-12 17:41   ` David Howells
2019-06-12 18:14     ` Casey Schaufler
2019-06-12 18:36     ` David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190607151228.GA1872258@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=dhowells@redhat.com \
    --cc=keyrings@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=raven@themaw.net \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).