[PATCH 0/2] get rid of PF_MEMALLOC

linux-security-module.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2] get rid of PF_MEMALLOC_NORECLAIM
@ 2024-08-26  8:47 Michal Hocko
  2024-08-26  8:47 ` [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM Michal Hocko
  2024-08-26  8:47 ` [PATCH 2/2] mm: drop PF_MEMALLOC_NORECLAIM Michal Hocko
  0 siblings, 2 replies; 78+ messages in thread
From: Michal Hocko @ 2024-08-26  8:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, Yafang Shao, Kent Overstreet, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

https://lore.kernel.org/all/20240812090525.80299-1-laoar.shao@gmail.com/T/#u
attempted to build on top of PF_MEMALLOC_NORECLAIM introduced by 
eab0af905bfc ("mm: introduce PF_MEMALLOC_NORECLAIM,
PF_MEMALLOC_NOWARN"). This flag has been merged even though there was an
explicit push back from the MM people - https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/

These two patches are dropping the flag and use an explicit GFP_NOWAIT
allocation context in the bcache inode allocation code. This required to
push gfp mask down to inode_init_always and its LSM hooks. I am not really
familiar with this code so please give it a thorough review.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-26  8:47 [PATCH 0/2] get rid of PF_MEMALLOC_NORECLAIM Michal Hocko
@ 2024-08-26  8:47 ` Michal Hocko
  2024-08-26 13:11   ` Matthew Wilcox
                     ` (6 more replies)
  2024-08-26  8:47 ` [PATCH 2/2] mm: drop PF_MEMALLOC_NORECLAIM Michal Hocko
  1 sibling, 7 replies; 78+ messages in thread
From: Michal Hocko @ 2024-08-26  8:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, Yafang Shao, Kent Overstreet, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new
inode to achieve GFP_NOWAIT semantic while holding locks. If this
allocation fails it will drop locks and use GFP_NOFS allocation context.

We would like to drop PF_MEMALLOC_NORECLAIM because it is really
dangerous to use if the caller doesn't control the full call chain with
this flag set. E.g. if any of the function down the chain needed
GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and
cause unexpected failure.

While this is not the case in this particular case using the scoped gfp
semantic is not really needed bacause we can easily pus the allocation
context down the chain without too much clutter.

Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/bcachefs/fs.c          | 14 ++++++--------
 fs/inode.c                |  6 +++---
 include/linux/fs.h        |  7 ++++++-
 include/linux/lsm_hooks.h |  2 +-
 include/linux/security.h  |  4 ++--
 security/security.c       |  8 ++++----
 6 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c
index 15fc41e63b6c..7a55167b9133 100644
--- a/fs/bcachefs/fs.c
+++ b/fs/bcachefs/fs.c
@@ -231,9 +231,9 @@ static struct inode *bch2_alloc_inode(struct super_block *sb)
 	BUG();
 }
 
-static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c)
+static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c, gfp_t gfp)
 {
-	struct bch_inode_info *inode = kmem_cache_alloc(bch2_inode_cache, GFP_NOFS);
+	struct bch_inode_info *inode = kmem_cache_alloc(bch2_inode_cache, gfp);
 	if (!inode)
 		return NULL;
 
@@ -245,7 +245,7 @@ static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c)
 	mutex_init(&inode->ei_quota_lock);
 	memset(&inode->ei_devs_need_flush, 0, sizeof(inode->ei_devs_need_flush));
 
-	if (unlikely(inode_init_always(c->vfs_sb, &inode->v))) {
+	if (unlikely(inode_init_always_gfp(c->vfs_sb, &inode->v), gfp)) {
 		kmem_cache_free(bch2_inode_cache, inode);
 		return NULL;
 	}
@@ -258,12 +258,10 @@ static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c)
  */
 static struct bch_inode_info *bch2_new_inode(struct btree_trans *trans)
 {
-	struct bch_inode_info *inode =
-		memalloc_flags_do(PF_MEMALLOC_NORECLAIM|PF_MEMALLOC_NOWARN,
-				  __bch2_new_inode(trans->c));
+	struct bch_inode_info *inode = __bch2_new_inode(trans->c, GFP_NOWARN | GFP_NOWAIT);
 
 	if (unlikely(!inode)) {
-		int ret = drop_locks_do(trans, (inode = __bch2_new_inode(trans->c)) ? 0 : -ENOMEM);
+		int ret = drop_locks_do(trans, (inode = __bch2_new_inode(trans->c, GFP_NOFS)) ? 0 : -ENOMEM);
 		if (ret && inode) {
 			__destroy_inode(&inode->v);
 			kmem_cache_free(bch2_inode_cache, inode);
@@ -328,7 +326,7 @@ __bch2_create(struct mnt_idmap *idmap,
 	if (ret)
 		return ERR_PTR(ret);
 #endif
-	inode = __bch2_new_inode(c);
+	inode = __bch2_new_inode(c, GFP_NOFS);
 	if (unlikely(!inode)) {
 		inode = ERR_PTR(-ENOMEM);
 		goto err;
diff --git a/fs/inode.c b/fs/inode.c
index 86670941884b..95fd67a6cac3 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -153,7 +153,7 @@ static int no_open(struct inode *inode, struct file *file)
  * These are initializations that need to be done on every inode
  * allocation as the fields are not initialised by slab allocation.
  */
-int inode_init_always(struct super_block *sb, struct inode *inode)
+int inode_init_always(struct super_block *sb, struct inode *inode, gfp_t gfp)
 {
 	static const struct inode_operations empty_iops;
 	static const struct file_operations no_open_fops = {.open = no_open};
@@ -230,14 +230,14 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
 #endif
 	inode->i_flctx = NULL;
 
-	if (unlikely(security_inode_alloc(inode)))
+	if (unlikely(security_inode_alloc(inode, gfp)))
 		return -ENOMEM;
 
 	this_cpu_inc(nr_inodes);
 
 	return 0;
 }
-EXPORT_SYMBOL(inode_init_always);
+EXPORT_SYMBOL(inode_init_always_gfp);
 
 void free_inode_nonrcu(struct inode *inode)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fd34b5755c0b..d46ca71a7855 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3027,7 +3027,12 @@ extern loff_t default_llseek(struct file *file, loff_t offset, int whence);
 
 extern loff_t vfs_llseek(struct file *file, loff_t offset, int whence);
 
-extern int inode_init_always(struct super_block *, struct inode *);
+extern int inode_init_always_gfp(struct super_block *, struct inode *, gfp_t);
+static inline int inode_init_always(struct super_block *sb, struct inode *inode)
+{
+	return inode_init_always_gfp(sb, inode, GFP_NOFS);
+}
+
 extern void inode_init_once(struct inode *);
 extern void address_space_init_once(struct address_space *mapping);
 extern struct inode * igrab(struct inode *);
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index a2ade0ffe9e7..b08472d64765 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -150,6 +150,6 @@ extern struct lsm_info __start_early_lsm_info[], __end_early_lsm_info[];
 		__used __section(".early_lsm_info.init")		\
 		__aligned(sizeof(unsigned long))
 
-extern int lsm_inode_alloc(struct inode *inode);
+extern int lsm_inode_alloc(struct inode *inode, gfp_t gfp);
 
 #endif /* ! __LINUX_LSM_HOOKS_H */
diff --git a/include/linux/security.h b/include/linux/security.h
index 1390f1efb4f0..7c6b9b038a0d 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -336,7 +336,7 @@ int security_dentry_create_files_as(struct dentry *dentry, int mode,
 					struct cred *new);
 int security_path_notify(const struct path *path, u64 mask,
 					unsigned int obj_type);
-int security_inode_alloc(struct inode *inode);
+int security_inode_alloc(struct inode *inode, gfp_t gfp);
 void security_inode_free(struct inode *inode);
 int security_inode_init_security(struct inode *inode, struct inode *dir,
 				 const struct qstr *qstr,
@@ -769,7 +769,7 @@ static inline int security_path_notify(const struct path *path, u64 mask,
 	return 0;
 }
 
-static inline int security_inode_alloc(struct inode *inode)
+static inline int security_inode_alloc(struct inode *inode, gfp_t gfp)
 {
 	return 0;
 }
diff --git a/security/security.c b/security/security.c
index 8cee5b6c6e6d..3581262da5ee 100644
--- a/security/security.c
+++ b/security/security.c
@@ -660,14 +660,14 @@ static int lsm_file_alloc(struct file *file)
  *
  * Returns 0, or -ENOMEM if memory can't be allocated.
  */
-int lsm_inode_alloc(struct inode *inode)
+int lsm_inode_alloc(struct inode *inode, gfp_t gfp)
 {
 	if (!lsm_inode_cache) {
 		inode->i_security = NULL;
 		return 0;
 	}
 
-	inode->i_security = kmem_cache_zalloc(lsm_inode_cache, GFP_NOFS);
+	inode->i_security = kmem_cache_zalloc(lsm_inode_cache, gfp);
 	if (inode->i_security == NULL)
 		return -ENOMEM;
 	return 0;
@@ -1582,9 +1582,9 @@ int security_path_notify(const struct path *path, u64 mask,
  *
  * Return: Return 0 if operation was successful.
  */
-int security_inode_alloc(struct inode *inode)
+int security_inode_alloc(struct inode *inode, gfp_t gfp)
 {
-	int rc = lsm_inode_alloc(inode);
+	int rc = lsm_inode_alloc(inode, gfp);
 
 	if (unlikely(rc))
 		return rc;
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH 2/2] mm: drop PF_MEMALLOC_NORECLAIM
  2024-08-26  8:47 [PATCH 0/2] get rid of PF_MEMALLOC_NORECLAIM Michal Hocko
  2024-08-26  8:47 ` [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM Michal Hocko
@ 2024-08-26  8:47 ` Michal Hocko
  2024-08-26 13:48   ` Yafang Shao
                     ` (3 more replies)
  1 sibling, 4 replies; 78+ messages in thread
From: Michal Hocko @ 2024-08-26  8:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, Yafang Shao, Kent Overstreet, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

There is no existing user of the flag and the flag is dangerous because
a nested allocation context can use GFP_NOFAIL which could cause
unexpected failure. Such a code would be hard to maintain because it
could be deeper in the call chain.

PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1]
that such a allocation contex is inherently unsafe if the context
doesn't fully control all allocations called from this context.

[1] https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/sched.h    | 1 -
 include/linux/sched/mm.h | 7 ++-----
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f8d150343d42..72dad3a6317a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1657,7 +1657,6 @@ extern struct pid *cad_pid;
 						 * I am cleaning dirty pages from some other bdi. */
 #define PF_KTHREAD		0x00200000	/* I am a kernel thread */
 #define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
-#define PF_MEMALLOC_NORECLAIM	0x00800000	/* All allocation requests will clear __GFP_DIRECT_RECLAIM */
 #define PF_MEMALLOC_NOWARN	0x01000000	/* All allocation requests will inherit __GFP_NOWARN */
 #define PF__HOLE__02000000	0x02000000
 #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 91546493c43d..c49f2b24acb9 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -260,16 +260,13 @@ static inline gfp_t current_gfp_context(gfp_t flags)
 
 	if (unlikely(pflags & (PF_MEMALLOC_NOIO |
 			       PF_MEMALLOC_NOFS |
-			       PF_MEMALLOC_NORECLAIM |
 			       PF_MEMALLOC_NOWARN |
 			       PF_MEMALLOC_PIN))) {
 		/*
 		 * Stronger flags before weaker flags:
-		 * NORECLAIM implies NOIO, which in turn implies NOFS
+		 * NOIO implies NOFS
 		 */
-		if (pflags & PF_MEMALLOC_NORECLAIM)
-			flags &= ~__GFP_DIRECT_RECLAIM;
-		else if (pflags & PF_MEMALLOC_NOIO)
+		if (pflags & PF_MEMALLOC_NOIO)
 			flags &= ~(__GFP_IO | __GFP_FS);
 		else if (pflags & PF_MEMALLOC_NOFS)
 			flags &= ~__GFP_FS;
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-26  8:47 ` [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM Michal Hocko
@ 2024-08-26 13:11   ` Matthew Wilcox
  2024-08-26 16:48     ` Michal Hocko
  2024-08-26 19:39   ` Kent Overstreet
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 78+ messages in thread
From: Matthew Wilcox @ 2024-08-26 13:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, Kent Overstreet,
	jack, Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

On Mon, Aug 26, 2024 at 10:47:12AM +0200, Michal Hocko wrote:
> @@ -258,12 +258,10 @@ static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c)
>   */
>  static struct bch_inode_info *bch2_new_inode(struct btree_trans *trans)
>  {
> -	struct bch_inode_info *inode =
> -		memalloc_flags_do(PF_MEMALLOC_NORECLAIM|PF_MEMALLOC_NOWARN,
> -				  __bch2_new_inode(trans->c));
> +	struct bch_inode_info *inode = __bch2_new_inode(trans->c, GFP_NOWARN | GFP_NOWAIT);

GFP_NOWAIT include GFP_NOWARN these days (since 16f5dfbc851b)

> +++ b/fs/inode.c
> @@ -153,7 +153,7 @@ static int no_open(struct inode *inode, struct file *file)
>   * These are initializations that need to be done on every inode
>   * allocation as the fields are not initialised by slab allocation.
>   */
> -int inode_init_always(struct super_block *sb, struct inode *inode)
> +int inode_init_always(struct super_block *sb, struct inode *inode, gfp_t gfp)

Did you send the right version of this patch?  There should be a "_gfp"
appended to this function name.

> +++ b/include/linux/fs.h
> @@ -3027,7 +3027,12 @@ extern loff_t default_llseek(struct file *file, loff_t offset, int whence);
>  
>  extern loff_t vfs_llseek(struct file *file, loff_t offset, int whence);
>  
> -extern int inode_init_always(struct super_block *, struct inode *);
> +extern int inode_init_always_gfp(struct super_block *, struct inode *, gfp_t);

You can drop the "extern" while you're changing this line.

> +static inline int inode_init_always(struct super_block *sb, struct inode *inode)
> +{
> +	return inode_init_always_gfp(sb, inode, GFP_NOFS);
> +}

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 2/2] mm: drop PF_MEMALLOC_NORECLAIM
  2024-08-26  8:47 ` [PATCH 2/2] mm: drop PF_MEMALLOC_NORECLAIM Michal Hocko
@ 2024-08-26 13:48   ` Yafang Shao
  2024-08-26 16:54     ` Michal Hocko
  2024-08-26 13:59   ` Matthew Wilcox
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 78+ messages in thread
From: Yafang Shao @ 2024-08-26 13:48 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Hellwig, Kent Overstreet, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

On Mon, Aug 26, 2024 at 4:53 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> From: Michal Hocko <mhocko@suse.com>
>
> There is no existing user of the flag and the flag is dangerous because
> a nested allocation context can use GFP_NOFAIL which could cause
> unexpected failure. Such a code would be hard to maintain because it
> could be deeper in the call chain.
>
> PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1]
> that such a allocation contex is inherently unsafe if the context
> doesn't fully control all allocations called from this context.
>
> [1] https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/
>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  include/linux/sched.h    | 1 -
>  include/linux/sched/mm.h | 7 ++-----
>  2 files changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index f8d150343d42..72dad3a6317a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1657,7 +1657,6 @@ extern struct pid *cad_pid;
>                                                  * I am cleaning dirty pages from some other bdi. */
>  #define PF_KTHREAD             0x00200000      /* I am a kernel thread */
>  #define PF_RANDOMIZE           0x00400000      /* Randomize virtual address space */
> -#define PF_MEMALLOC_NORECLAIM  0x00800000      /* All allocation requests will clear __GFP_DIRECT_RECLAIM */

To maintain consistency with the other unused bits, it would be better
to define PF__HOLE__00800000 instead.

--
Regards

Yafang

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 2/2] mm: drop PF_MEMALLOC_NORECLAIM
  2024-08-26  8:47 ` [PATCH 2/2] mm: drop PF_MEMALLOC_NORECLAIM Michal Hocko
  2024-08-26 13:48   ` Yafang Shao
@ 2024-08-26 13:59   ` Matthew Wilcox
  2024-08-26 16:51     ` Michal Hocko
  2024-08-26 19:04   ` Kent Overstreet
  2024-08-27 12:29   ` Christoph Hellwig
  3 siblings, 1 reply; 78+ messages in thread
From: Matthew Wilcox @ 2024-08-26 13:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, Kent Overstreet,
	jack, Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

On Mon, Aug 26, 2024 at 10:47:13AM +0200, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> There is no existing user of the flag and the flag is dangerous because
> a nested allocation context can use GFP_NOFAIL which could cause
> unexpected failure. Such a code would be hard to maintain because it
> could be deeper in the call chain.
> 
> PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1]
> that such a allocation contex is inherently unsafe if the context
> doesn't fully control all allocations called from this context.

Wouldn't a straight-up revert of eab0af905bfc be cleaner?  Or is there
a reason to keep PF_MEMALLOC_NOWARN?

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-26 13:11   ` Matthew Wilcox
@ 2024-08-26 16:48     ` Michal Hocko
  0 siblings, 0 replies; 78+ messages in thread
From: Michal Hocko @ 2024-08-26 16:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, Kent Overstreet,
	jack, Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On Mon 26-08-24 14:11:39, Matthew Wilcox wrote:
> On Mon, Aug 26, 2024 at 10:47:12AM +0200, Michal Hocko wrote:
> > @@ -258,12 +258,10 @@ static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c)
> >   */
> >  static struct bch_inode_info *bch2_new_inode(struct btree_trans *trans)
> >  {
> > -	struct bch_inode_info *inode =
> > -		memalloc_flags_do(PF_MEMALLOC_NORECLAIM|PF_MEMALLOC_NOWARN,
> > -				  __bch2_new_inode(trans->c));
> > +	struct bch_inode_info *inode = __bch2_new_inode(trans->c, GFP_NOWARN | GFP_NOWAIT);
> 
> GFP_NOWAIT include GFP_NOWARN these days (since 16f5dfbc851b)

Ohh, I was not aware of that. I will drop NOWARN then.

> > +++ b/fs/inode.c
> > @@ -153,7 +153,7 @@ static int no_open(struct inode *inode, struct file *file)
> >   * These are initializations that need to be done on every inode
> >   * allocation as the fields are not initialised by slab allocation.
> >   */
> > -int inode_init_always(struct super_block *sb, struct inode *inode)
> > +int inode_init_always(struct super_block *sb, struct inode *inode, gfp_t gfp)
> 
> Did you send the right version of this patch?  There should be a "_gfp"
> appended to this function name.

yes, screw up on my end.

> > +++ b/include/linux/fs.h
> > @@ -3027,7 +3027,12 @@ extern loff_t default_llseek(struct file *file, loff_t offset, int whence);
> >  
> >  extern loff_t vfs_llseek(struct file *file, loff_t offset, int whence);
> >  
> > -extern int inode_init_always(struct super_block *, struct inode *);
> > +extern int inode_init_always_gfp(struct super_block *, struct inode *, gfp_t);
> 
> You can drop the "extern" while you're changing this line.

OK, I can. I just kept the usual style in this file.

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 2/2] mm: drop PF_MEMALLOC_NORECLAIM
  2024-08-26 13:59   ` Matthew Wilcox
@ 2024-08-26 16:51     ` Michal Hocko
  2024-08-26 17:49       ` Matthew Wilcox
  0 siblings, 1 reply; 78+ messages in thread
From: Michal Hocko @ 2024-08-26 16:51 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, Kent Overstreet,
	jack, Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On Mon 26-08-24 14:59:29, Matthew Wilcox wrote:
> On Mon, Aug 26, 2024 at 10:47:13AM +0200, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > There is no existing user of the flag and the flag is dangerous because
> > a nested allocation context can use GFP_NOFAIL which could cause
> > unexpected failure. Such a code would be hard to maintain because it
> > could be deeper in the call chain.
> > 
> > PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1]
> > that such a allocation contex is inherently unsafe if the context
> > doesn't fully control all allocations called from this context.
> 
> Wouldn't a straight-up revert of eab0af905bfc be cleaner?  Or is there
> a reason to keep PF_MEMALLOC_NOWARN?

I wanted to make it PF_MEMALLOC_NORECLAIM specific. I do not have a
strong case against PF_MEMALLOC_NOWARN TBH. It is a hack because the
scope is claiming something about all allocations within the scope
without necessarily knowing all of them (including potential future
changes). But NOWARN is not really harmful so I do not care strongly.

If a plan revert is preferably, I will go with it.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 2/2] mm: drop PF_MEMALLOC_NORECLAIM
  2024-08-26 13:48   ` Yafang Shao
@ 2024-08-26 16:54     ` Michal Hocko
  0 siblings, 0 replies; 78+ messages in thread
From: Michal Hocko @ 2024-08-26 16:54 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Andrew Morton, Christoph Hellwig, Kent Overstreet, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On Mon 26-08-24 21:48:34, Yafang Shao wrote:
> On Mon, Aug 26, 2024 at 4:53 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > From: Michal Hocko <mhocko@suse.com>
> >
> > There is no existing user of the flag and the flag is dangerous because
> > a nested allocation context can use GFP_NOFAIL which could cause
> > unexpected failure. Such a code would be hard to maintain because it
> > could be deeper in the call chain.
> >
> > PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1]
> > that such a allocation contex is inherently unsafe if the context
> > doesn't fully control all allocations called from this context.
> >
> > [1] https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  include/linux/sched.h    | 1 -
> >  include/linux/sched/mm.h | 7 ++-----
> >  2 files changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index f8d150343d42..72dad3a6317a 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1657,7 +1657,6 @@ extern struct pid *cad_pid;
> >                                                  * I am cleaning dirty pages from some other bdi. */
> >  #define PF_KTHREAD             0x00200000      /* I am a kernel thread */
> >  #define PF_RANDOMIZE           0x00400000      /* Randomize virtual address space */
> > -#define PF_MEMALLOC_NORECLAIM  0x00800000      /* All allocation requests will clear __GFP_DIRECT_RECLAIM */
> 
> To maintain consistency with the other unused bits, it would be better
> to define PF__HOLE__00800000 instead.

OK

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 2/2] mm: drop PF_MEMALLOC_NORECLAIM
  2024-08-26 16:51     ` Michal Hocko
@ 2024-08-26 17:49       ` Matthew Wilcox
  2024-08-26 19:18         ` Michal Hocko
  0 siblings, 1 reply; 78+ messages in thread
From: Matthew Wilcox @ 2024-08-26 17:49 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, Kent Overstreet,
	jack, Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On Mon, Aug 26, 2024 at 06:51:55PM +0200, Michal Hocko wrote:
> On Mon 26-08-24 14:59:29, Matthew Wilcox wrote:
> > On Mon, Aug 26, 2024 at 10:47:13AM +0200, Michal Hocko wrote:
> > > From: Michal Hocko <mhocko@suse.com>
> > > 
> > > There is no existing user of the flag and the flag is dangerous because
> > > a nested allocation context can use GFP_NOFAIL which could cause
> > > unexpected failure. Such a code would be hard to maintain because it
> > > could be deeper in the call chain.
> > > 
> > > PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1]
> > > that such a allocation contex is inherently unsafe if the context
> > > doesn't fully control all allocations called from this context.
> > 
> > Wouldn't a straight-up revert of eab0af905bfc be cleaner?  Or is there
> > a reason to keep PF_MEMALLOC_NOWARN?
> 
> I wanted to make it PF_MEMALLOC_NORECLAIM specific. I do not have a
> strong case against PF_MEMALLOC_NOWARN TBH. It is a hack because the
> scope is claiming something about all allocations within the scope
> without necessarily knowing all of them (including potential future
> changes). But NOWARN is not really harmful so I do not care strongly.
> 
> If a plan revert is preferably, I will go with it.

There aren't any other users of PF_MEMALLOC_NOWARN and it definitely
seems like something you want at a callsite rather than blanket for every
allocation below this point.  We don't seem to have many PF_ flags left,
so let's not keep it around if there's no immediate plans for it.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 2/2] mm: drop PF_MEMALLOC_NORECLAIM
  2024-08-26  8:47 ` [PATCH 2/2] mm: drop PF_MEMALLOC_NORECLAIM Michal Hocko
  2024-08-26 13:48   ` Yafang Shao
  2024-08-26 13:59   ` Matthew Wilcox
@ 2024-08-26 19:04   ` Kent Overstreet
  2024-08-27 12:29   ` Christoph Hellwig
  3 siblings, 0 replies; 78+ messages in thread
From: Kent Overstreet @ 2024-08-26 19:04 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

On Mon, Aug 26, 2024 at 10:47:13AM GMT, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> There is no existing user of the flag and the flag is dangerous because
> a nested allocation context can use GFP_NOFAIL which could cause
> unexpected failure. Such a code would be hard to maintain because it
> could be deeper in the call chain.
> 
> PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1]
> that such a allocation contex is inherently unsafe if the context
> doesn't fully control all allocations called from this context.

I don't really buy the unsafety argument; if it applies to anything, it
applies to GFP_NOFAIL - but we recently grew warnings about unsafe uses
for it, so I don't see it as a great concern.

GFP_NORECLAIM is frequently desirable as a hint about the latency
requirements of a codepath; "don't try too hard, I've got fallbacks and
I'm in a codepath where I don't want to block too long".

I expect PF_MEMALLOC_NORECLAIM will find legitimate uses.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 2/2] mm: drop PF_MEMALLOC_NORECLAIM
  2024-08-26 17:49       ` Matthew Wilcox
@ 2024-08-26 19:18         ` Michal Hocko
  2024-08-26 19:20           ` Matthew Wilcox
                             ` (2 more replies)
  0 siblings, 3 replies; 78+ messages in thread
From: Michal Hocko @ 2024-08-26 19:18 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, Kent Overstreet,
	jack, Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On Mon 26-08-24 18:49:23, Matthew Wilcox wrote:
> On Mon, Aug 26, 2024 at 06:51:55PM +0200, Michal Hocko wrote:
[...]
> > If a plan revert is preferably, I will go with it.
> 
> There aren't any other users of PF_MEMALLOC_NOWARN and it definitely
> seems like something you want at a callsite rather than blanket for every
> allocation below this point.  We don't seem to have many PF_ flags left,
> so let's not keep it around if there's no immediate plans for it.

Good point. What about this?
--- 
From 923cd429d4b1a3520c93bcf46611ae74a3158865 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Mon, 26 Aug 2024 21:15:02 +0200
Subject: [PATCH] Revert "mm: introduce PF_MEMALLOC_NORECLAIM,
 PF_MEMALLOC_NOWARN"

This reverts commit eab0af905bfc3e9c05da2ca163d76a1513159aa4.

There is no existing user of those flags. PF_MEMALLOC_NOWARN is
dangerous because a nested allocation context can use GFP_NOFAIL which
could cause unexpected failure. Such a code would be hard to maintain
because it could be deeper in the call chain.

PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1]
that such a allocation contex is inherently unsafe if the context
doesn't fully control all allocations called from this context.

While PF_MEMALLOC_NOWARN is not dangerous the way PF_MEMALLOC_NORECLAIM
is it doesn't have any user and as Matthew has pointed out we are
running out of those flags so better reclaim it without any real users.

[1] https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/sched.h    |  4 ++--
 include/linux/sched/mm.h | 17 ++++-------------
 2 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f8d150343d42..731ff1078c9e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1657,8 +1657,8 @@ extern struct pid *cad_pid;
 						 * I am cleaning dirty pages from some other bdi. */
 #define PF_KTHREAD		0x00200000	/* I am a kernel thread */
 #define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
-#define PF_MEMALLOC_NORECLAIM	0x00800000	/* All allocation requests will clear __GFP_DIRECT_RECLAIM */
-#define PF_MEMALLOC_NOWARN	0x01000000	/* All allocation requests will inherit __GFP_NOWARN */
+#define PF__HOLE__00800000	0x00800000
+#define PF__HOLE__01000000	0x01000000
 #define PF__HOLE__02000000	0x02000000
 #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
 #define PF_MCE_EARLY		0x08000000      /* Early kill for mce process policy */
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 91546493c43d..07c4fde32827 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -258,25 +258,16 @@ static inline gfp_t current_gfp_context(gfp_t flags)
 {
 	unsigned int pflags = READ_ONCE(current->flags);
 
-	if (unlikely(pflags & (PF_MEMALLOC_NOIO |
-			       PF_MEMALLOC_NOFS |
-			       PF_MEMALLOC_NORECLAIM |
-			       PF_MEMALLOC_NOWARN |
-			       PF_MEMALLOC_PIN))) {
+	if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | PF_MEMALLOC_PIN))) {
 		/*
-		 * Stronger flags before weaker flags:
-		 * NORECLAIM implies NOIO, which in turn implies NOFS
+		 * NOIO implies both NOIO and NOFS and it is a weaker context
+		 * so always make sure it makes precedence
 		 */
-		if (pflags & PF_MEMALLOC_NORECLAIM)
-			flags &= ~__GFP_DIRECT_RECLAIM;
-		else if (pflags & PF_MEMALLOC_NOIO)
+		if (pflags & PF_MEMALLOC_NOIO)
 			flags &= ~(__GFP_IO | __GFP_FS);
 		else if (pflags & PF_MEMALLOC_NOFS)
 			flags &= ~__GFP_FS;
 
-		if (pflags & PF_MEMALLOC_NOWARN)
-			flags |= __GFP_NOWARN;
-
 		if (pflags & PF_MEMALLOC_PIN)
 			flags &= ~__GFP_MOVABLE;
 	}
-- 
2.46.0


-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH 2/2] mm: drop PF_MEMALLOC_NORECLAIM
  2024-08-26 19:18         ` Michal Hocko
@ 2024-08-26 19:20           ` Matthew Wilcox
  2024-08-28  4:11           ` Dave Chinner
  2024-08-29 21:45           ` Vlastimil Babka
  2 siblings, 0 replies; 78+ messages in thread
From: Matthew Wilcox @ 2024-08-26 19:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, Kent Overstreet,
	jack, Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On Mon, Aug 26, 2024 at 09:18:01PM +0200, Michal Hocko wrote:
> On Mon 26-08-24 18:49:23, Matthew Wilcox wrote:
> > On Mon, Aug 26, 2024 at 06:51:55PM +0200, Michal Hocko wrote:
> [...]
> > > If a plan revert is preferably, I will go with it.
> > 
> > There aren't any other users of PF_MEMALLOC_NOWARN and it definitely
> > seems like something you want at a callsite rather than blanket for every
> > allocation below this point.  We don't seem to have many PF_ flags left,
> > so let's not keep it around if there's no immediate plans for it.
> 
> Good point. What about this?

Looks clean to me.

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>

> >From 923cd429d4b1a3520c93bcf46611ae74a3158865 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Mon, 26 Aug 2024 21:15:02 +0200
> Subject: [PATCH] Revert "mm: introduce PF_MEMALLOC_NORECLAIM,
>  PF_MEMALLOC_NOWARN"
> 
> This reverts commit eab0af905bfc3e9c05da2ca163d76a1513159aa4.
> 
> There is no existing user of those flags. PF_MEMALLOC_NOWARN is
> dangerous because a nested allocation context can use GFP_NOFAIL which
> could cause unexpected failure. Such a code would be hard to maintain
> because it could be deeper in the call chain.
> 
> PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1]
> that such a allocation contex is inherently unsafe if the context
> doesn't fully control all allocations called from this context.
> 
> While PF_MEMALLOC_NOWARN is not dangerous the way PF_MEMALLOC_NORECLAIM
> is it doesn't have any user and as Matthew has pointed out we are
> running out of those flags so better reclaim it without any real users.
> 
> [1] https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  include/linux/sched.h    |  4 ++--
>  include/linux/sched/mm.h | 17 ++++-------------
>  2 files changed, 6 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index f8d150343d42..731ff1078c9e 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1657,8 +1657,8 @@ extern struct pid *cad_pid;
>  						 * I am cleaning dirty pages from some other bdi. */
>  #define PF_KTHREAD		0x00200000	/* I am a kernel thread */
>  #define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
> -#define PF_MEMALLOC_NORECLAIM	0x00800000	/* All allocation requests will clear __GFP_DIRECT_RECLAIM */
> -#define PF_MEMALLOC_NOWARN	0x01000000	/* All allocation requests will inherit __GFP_NOWARN */
> +#define PF__HOLE__00800000	0x00800000
> +#define PF__HOLE__01000000	0x01000000
>  #define PF__HOLE__02000000	0x02000000
>  #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
>  #define PF_MCE_EARLY		0x08000000      /* Early kill for mce process policy */
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 91546493c43d..07c4fde32827 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -258,25 +258,16 @@ static inline gfp_t current_gfp_context(gfp_t flags)
>  {
>  	unsigned int pflags = READ_ONCE(current->flags);
>  
> -	if (unlikely(pflags & (PF_MEMALLOC_NOIO |
> -			       PF_MEMALLOC_NOFS |
> -			       PF_MEMALLOC_NORECLAIM |
> -			       PF_MEMALLOC_NOWARN |
> -			       PF_MEMALLOC_PIN))) {
> +	if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | PF_MEMALLOC_PIN))) {
>  		/*
> -		 * Stronger flags before weaker flags:
> -		 * NORECLAIM implies NOIO, which in turn implies NOFS
> +		 * NOIO implies both NOIO and NOFS and it is a weaker context
> +		 * so always make sure it makes precedence
>  		 */
> -		if (pflags & PF_MEMALLOC_NORECLAIM)
> -			flags &= ~__GFP_DIRECT_RECLAIM;
> -		else if (pflags & PF_MEMALLOC_NOIO)
> +		if (pflags & PF_MEMALLOC_NOIO)
>  			flags &= ~(__GFP_IO | __GFP_FS);
>  		else if (pflags & PF_MEMALLOC_NOFS)
>  			flags &= ~__GFP_FS;
>  
> -		if (pflags & PF_MEMALLOC_NOWARN)
> -			flags |= __GFP_NOWARN;
> -
>  		if (pflags & PF_MEMALLOC_PIN)
>  			flags &= ~__GFP_MOVABLE;
>  	}
> -- 
> 2.46.0
> 
> 
> -- 
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-26  8:47 ` [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM Michal Hocko
  2024-08-26 13:11   ` Matthew Wilcox
@ 2024-08-26 19:39   ` Kent Overstreet
  2024-08-26 19:41     ` Matthew Wilcox
  2024-08-26 19:58     ` Michal Hocko
  2024-08-26 19:52   ` kernel test robot
                     ` (4 subsequent siblings)
  6 siblings, 2 replies; 78+ messages in thread
From: Kent Overstreet @ 2024-08-26 19:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

On Mon, Aug 26, 2024 at 10:47:12AM GMT, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new
> inode to achieve GFP_NOWAIT semantic while holding locks. If this
> allocation fails it will drop locks and use GFP_NOFS allocation context.
> 
> We would like to drop PF_MEMALLOC_NORECLAIM because it is really
> dangerous to use if the caller doesn't control the full call chain with
> this flag set. E.g. if any of the function down the chain needed
> GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and
> cause unexpected failure.
> 
> While this is not the case in this particular case using the scoped gfp
> semantic is not really needed bacause we can easily pus the allocation
> context down the chain without too much clutter.

yeah, eesh, nack.

Given the amount of plumbing required here, it's clear that passing gfp
flags is the less safe way of doing it, and this really does belong in
the allocation context.

Failure to pass gfp flags correctly (which we know is something that
happens today, e.g. vmalloc -> pte allocation) means you're introducing
a deadlock.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-26 19:39   ` Kent Overstreet
@ 2024-08-26 19:41     ` Matthew Wilcox
  2024-08-26 19:42       ` Kent Overstreet
  2024-08-26 19:44       ` Kent Overstreet
  2024-08-26 19:58     ` Michal Hocko
  1 sibling, 2 replies; 78+ messages in thread
From: Matthew Wilcox @ 2024-08-26 19:41 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Michal Hocko, Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

On Mon, Aug 26, 2024 at 03:39:47PM -0400, Kent Overstreet wrote:
> Given the amount of plumbing required here, it's clear that passing gfp
> flags is the less safe way of doing it, and this really does belong in
> the allocation context.
> 
> Failure to pass gfp flags correctly (which we know is something that
> happens today, e.g. vmalloc -> pte allocation) means you're introducing
> a deadlock.

The problem with vmalloc is that the page table allocation _doesn't_
take a GFP parameter.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-26 19:41     ` Matthew Wilcox
@ 2024-08-26 19:42       ` Kent Overstreet
  2024-08-26 19:47         ` Matthew Wilcox
  2024-08-26 19:44       ` Kent Overstreet
  1 sibling, 1 reply; 78+ messages in thread
From: Kent Overstreet @ 2024-08-26 19:42 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Michal Hocko, Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

On Mon, Aug 26, 2024 at 08:41:42PM GMT, Matthew Wilcox wrote:
> On Mon, Aug 26, 2024 at 03:39:47PM -0400, Kent Overstreet wrote:
> > Given the amount of plumbing required here, it's clear that passing gfp
> > flags is the less safe way of doing it, and this really does belong in
> > the allocation context.
> > 
> > Failure to pass gfp flags correctly (which we know is something that
> > happens today, e.g. vmalloc -> pte allocation) means you're introducing
> > a deadlock.
> 
> The problem with vmalloc is that the page table allocation _doesn't_
> take a GFP parameter.

yeah, I know. I posted patches to plumb it through, which were nacked by
Linus.

And we're trying to get away from passing gfp flags directly, are we
not? I just don't buy the GFP_NOFAIL unsafety argument.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-26 19:41     ` Matthew Wilcox
  2024-08-26 19:42       ` Kent Overstreet
@ 2024-08-26 19:44       ` Kent Overstreet
  1 sibling, 0 replies; 78+ messages in thread
From: Kent Overstreet @ 2024-08-26 19:44 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Michal Hocko, Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

On Mon, Aug 26, 2024 at 08:41:42PM GMT, Matthew Wilcox wrote:
> On Mon, Aug 26, 2024 at 03:39:47PM -0400, Kent Overstreet wrote:
> > Given the amount of plumbing required here, it's clear that passing gfp
> > flags is the less safe way of doing it, and this really does belong in
> > the allocation context.
> > 
> > Failure to pass gfp flags correctly (which we know is something that
> > happens today, e.g. vmalloc -> pte allocation) means you're introducing
> > a deadlock.
> 
> The problem with vmalloc is that the page table allocation _doesn't_
> take a GFP parameter.

and given the increasing usage of kvmalloc() this is something we really
should be thinking about

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-26 19:42       ` Kent Overstreet
@ 2024-08-26 19:47         ` Matthew Wilcox
  2024-08-26 19:54           ` Kent Overstreet
  0 siblings, 1 reply; 78+ messages in thread
From: Matthew Wilcox @ 2024-08-26 19:47 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Michal Hocko, Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

On Mon, Aug 26, 2024 at 03:42:59PM -0400, Kent Overstreet wrote:
> On Mon, Aug 26, 2024 at 08:41:42PM GMT, Matthew Wilcox wrote:
> > On Mon, Aug 26, 2024 at 03:39:47PM -0400, Kent Overstreet wrote:
> > > Given the amount of plumbing required here, it's clear that passing gfp
> > > flags is the less safe way of doing it, and this really does belong in
> > > the allocation context.
> > > 
> > > Failure to pass gfp flags correctly (which we know is something that
> > > happens today, e.g. vmalloc -> pte allocation) means you're introducing
> > > a deadlock.
> > 
> > The problem with vmalloc is that the page table allocation _doesn't_
> > take a GFP parameter.
> 
> yeah, I know. I posted patches to plumb it through, which were nacked by
> Linus.
> 
> And we're trying to get away from passing gfp flags directly, are we
> not? I just don't buy the GFP_NOFAIL unsafety argument.

The problem with the giant invasive change of "getting away from passing
GFP flags directly" is that you need to build consensus for what it
looks like and convince everyone that you have a solution that solves
all the problems, or at least doesn't make any of those problems worse.
You haven't done that, you've just committed code that the MM people hate
(indeed already rejected), and set back the idea.

Look, it's not your job to fix it, but if you want to do it, do it
properly.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-26  8:47 ` [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM Michal Hocko
  2024-08-26 13:11   ` Matthew Wilcox
  2024-08-26 19:39   ` Kent Overstreet
@ 2024-08-26 19:52   ` kernel test robot
  2024-08-26 20:53   ` kernel test robot
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 78+ messages in thread
From: kernel test robot @ 2024-08-26 19:52 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: oe-kbuild-all, Linux Memory Management List, Christoph Hellwig,
	Yafang Shao, Kent Overstreet, jack, Christian Brauner,
	Alexander Viro, Paul Moore, James Morris, Serge E. Hallyn,
	linux-fsdevel, linux-bcachefs, linux-security-module,
	linux-kernel, Michal Hocko

Hi Michal,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on tip/sched/core brauner-vfs/vfs.all linus/master v6.11-rc5]
[cannot apply to next-20240826]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Michal-Hocko/bcachefs-do-not-use-PF_MEMALLOC_NORECLAIM/20240826-171013
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20240826085347.1152675-2-mhocko%40kernel.org
patch subject: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
config: arc-randconfig-001-20240827 (https://download.01.org/0day-ci/archive/20240827/202408270304.AUPHM0xo-lkp@intel.com/config)
compiler: arc-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240827/202408270304.AUPHM0xo-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408270304.AUPHM0xo-lkp@intel.com/

All errors (new ones prefixed by >>):

   fs/bcachefs/fs.c: In function '__bch2_new_inode':
>> fs/bcachefs/fs.c:248:70: error: macro "unlikely" passed 2 arguments, but takes just 1
     248 |         if (unlikely(inode_init_always_gfp(c->vfs_sb, &inode->v), gfp)) {
         |                                                                      ^
   In file included from include/linux/build_bug.h:5,
                    from include/linux/container_of.h:5,
                    from include/linux/list.h:5,
                    from include/linux/backing-dev-defs.h:5,
                    from fs/bcachefs/bcachefs.h:186,
                    from fs/bcachefs/fs.c:4:
   include/linux/compiler.h:77: note: macro "unlikely" defined here
      77 | # define unlikely(x)    __builtin_expect(!!(x), 0)
         | 
>> fs/bcachefs/fs.c:248:13: error: 'unlikely' undeclared (first use in this function)
     248 |         if (unlikely(inode_init_always_gfp(c->vfs_sb, &inode->v), gfp)) {
         |             ^~~~~~~~
   fs/bcachefs/fs.c:248:13: note: each undeclared identifier is reported only once for each function it appears in
   fs/bcachefs/fs.c: In function 'bch2_new_inode':
>> fs/bcachefs/fs.c:261:67: error: 'GFP_NOWARN' undeclared (first use in this function); did you mean 'GFP_NOWAIT'?
     261 |         struct bch_inode_info *inode = __bch2_new_inode(trans->c, GFP_NOWARN | GFP_NOWAIT);
         |                                                                   ^~~~~~~~~~
         |                                                                   GFP_NOWAIT


vim +/unlikely +248 fs/bcachefs/fs.c

   233	
   234	static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c, gfp_t gfp)
   235	{
   236		struct bch_inode_info *inode = kmem_cache_alloc(bch2_inode_cache, gfp);
   237		if (!inode)
   238			return NULL;
   239	
   240		inode_init_once(&inode->v);
   241		mutex_init(&inode->ei_update_lock);
   242		two_state_lock_init(&inode->ei_pagecache_lock);
   243		INIT_LIST_HEAD(&inode->ei_vfs_inode_list);
   244		inode->ei_flags = 0;
   245		mutex_init(&inode->ei_quota_lock);
   246		memset(&inode->ei_devs_need_flush, 0, sizeof(inode->ei_devs_need_flush));
   247	
 > 248		if (unlikely(inode_init_always_gfp(c->vfs_sb, &inode->v), gfp)) {
   249			kmem_cache_free(bch2_inode_cache, inode);
   250			return NULL;
   251		}
   252	
   253		return inode;
   254	}
   255	
   256	/*
   257	 * Allocate a new inode, dropping/retaking btree locks if necessary:
   258	 */
   259	static struct bch_inode_info *bch2_new_inode(struct btree_trans *trans)
   260	{
 > 261		struct bch_inode_info *inode = __bch2_new_inode(trans->c, GFP_NOWARN | GFP_NOWAIT);
   262	
   263		if (unlikely(!inode)) {
   264			int ret = drop_locks_do(trans, (inode = __bch2_new_inode(trans->c, GFP_NOFS)) ? 0 : -ENOMEM);
   265			if (ret && inode) {
   266				__destroy_inode(&inode->v);
   267				kmem_cache_free(bch2_inode_cache, inode);
   268			}
   269			if (ret)
   270				return ERR_PTR(ret);
   271		}
   272	
   273		return inode;
   274	}
   275	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-26 19:47         ` Matthew Wilcox
@ 2024-08-26 19:54           ` Kent Overstreet
  0 siblings, 0 replies; 78+ messages in thread
From: Kent Overstreet @ 2024-08-26 19:54 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Michal Hocko, Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

On Mon, Aug 26, 2024 at 08:47:03PM GMT, Matthew Wilcox wrote:
> On Mon, Aug 26, 2024 at 03:42:59PM -0400, Kent Overstreet wrote:
> > On Mon, Aug 26, 2024 at 08:41:42PM GMT, Matthew Wilcox wrote:
> > > On Mon, Aug 26, 2024 at 03:39:47PM -0400, Kent Overstreet wrote:
> > > > Given the amount of plumbing required here, it's clear that passing gfp
> > > > flags is the less safe way of doing it, and this really does belong in
> > > > the allocation context.
> > > > 
> > > > Failure to pass gfp flags correctly (which we know is something that
> > > > happens today, e.g. vmalloc -> pte allocation) means you're introducing
> > > > a deadlock.
> > > 
> > > The problem with vmalloc is that the page table allocation _doesn't_
> > > take a GFP parameter.
> > 
> > yeah, I know. I posted patches to plumb it through, which were nacked by
> > Linus.
> > 
> > And we're trying to get away from passing gfp flags directly, are we
> > not? I just don't buy the GFP_NOFAIL unsafety argument.
> 
> The problem with the giant invasive change of "getting away from passing
> GFP flags directly" is that you need to build consensus for what it
> looks like and convince everyone that you have a solution that solves
> all the problems, or at least doesn't make any of those problems worse.
> You haven't done that, you've just committed code that the MM people hate
> (indeed already rejected), and set back the idea.

Err, what? I'm not the one who originated the idae of getting away from
passing gfp flags directly. That's been the consensus for _awhile_;
you've been talking about it, Linus talked about it re: vmalloc pte
allocation.

So this looks to me like the mm people just trying to avoid having to
deal with that. GFP_NOFAIL conflicting with other memalloc/gfp flags is
not a new issue (you really shouldn't be combining GFP_NOFAIL with
GFP_NOFS or GFP_NOIO!), and it's something we have warnings for.

But the issue of gfp flags not being passed correctly is not something
we can check for at runtime with any reliability - which means this
patchset look like a net negative to me.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-26 19:39   ` Kent Overstreet
  2024-08-26 19:41     ` Matthew Wilcox
@ 2024-08-26 19:58     ` Michal Hocko
  2024-08-26 20:00       ` Kent Overstreet
  1 sibling, 1 reply; 78+ messages in thread
From: Michal Hocko @ 2024-08-26 19:58 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On Mon 26-08-24 15:39:47, Kent Overstreet wrote:
> On Mon, Aug 26, 2024 at 10:47:12AM GMT, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new
> > inode to achieve GFP_NOWAIT semantic while holding locks. If this
> > allocation fails it will drop locks and use GFP_NOFS allocation context.
> > 
> > We would like to drop PF_MEMALLOC_NORECLAIM because it is really
> > dangerous to use if the caller doesn't control the full call chain with
> > this flag set. E.g. if any of the function down the chain needed
> > GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and
> > cause unexpected failure.
> > 
> > While this is not the case in this particular case using the scoped gfp
> > semantic is not really needed bacause we can easily pus the allocation
> > context down the chain without too much clutter.
> 
> yeah, eesh, nack.

Sure, you can NAK this but then deal with the lack of the PF flag by
other means. We have made it clear that PF_MEMALLOC_NORECLAIM is not we
are going to support at the MM level. 

I have done your homework and shown that it is really easy
to use gfp flags directly. The net result is passing gfp flag down to
two functions. Sure part of it is ugglier by having several different
callbacks implementing it but still manageable. Without too much churn.

So do whatever you like in the bcache code but do not rely on something
that is unsupported by the MM layer which you have sneaked in without an
agreement.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-26 19:58     ` Michal Hocko
@ 2024-08-26 20:00       ` Kent Overstreet
  2024-08-26 20:27         ` Michal Hocko
  0 siblings, 1 reply; 78+ messages in thread
From: Kent Overstreet @ 2024-08-26 20:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On Mon, Aug 26, 2024 at 09:58:08PM GMT, Michal Hocko wrote:
> On Mon 26-08-24 15:39:47, Kent Overstreet wrote:
> > On Mon, Aug 26, 2024 at 10:47:12AM GMT, Michal Hocko wrote:
> > > From: Michal Hocko <mhocko@suse.com>
> > > 
> > > bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new
> > > inode to achieve GFP_NOWAIT semantic while holding locks. If this
> > > allocation fails it will drop locks and use GFP_NOFS allocation context.
> > > 
> > > We would like to drop PF_MEMALLOC_NORECLAIM because it is really
> > > dangerous to use if the caller doesn't control the full call chain with
> > > this flag set. E.g. if any of the function down the chain needed
> > > GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and
> > > cause unexpected failure.
> > > 
> > > While this is not the case in this particular case using the scoped gfp
> > > semantic is not really needed bacause we can easily pus the allocation
> > > context down the chain without too much clutter.
> > 
> > yeah, eesh, nack.
> 
> Sure, you can NAK this but then deal with the lack of the PF flag by
> other means. We have made it clear that PF_MEMALLOC_NORECLAIM is not we
> are going to support at the MM level. 
> 
> I have done your homework and shown that it is really easy
> to use gfp flags directly. The net result is passing gfp flag down to
> two functions. Sure part of it is ugglier by having several different
> callbacks implementing it but still manageable. Without too much churn.
> 
> So do whatever you like in the bcache code but do not rely on something
> that is unsupported by the MM layer which you have sneaked in without an
> agreement.

Michal, you're being damned hostile, while posting code you haven't even
tried to compile. Seriously, dude?

How about sticking to the technical issues at hand instead of saying
"this is mm, so my way or the highway?". We're all kernel developers
here, this is not what we do.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-26 20:00       ` Kent Overstreet
@ 2024-08-26 20:27         ` Michal Hocko
  2024-08-26 20:43           ` Kent Overstreet
  0 siblings, 1 reply; 78+ messages in thread
From: Michal Hocko @ 2024-08-26 20:27 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On Mon 26-08-24 16:00:56, Kent Overstreet wrote:
> On Mon, Aug 26, 2024 at 09:58:08PM GMT, Michal Hocko wrote:
> > On Mon 26-08-24 15:39:47, Kent Overstreet wrote:
> > > On Mon, Aug 26, 2024 at 10:47:12AM GMT, Michal Hocko wrote:
> > > > From: Michal Hocko <mhocko@suse.com>
> > > > 
> > > > bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new
> > > > inode to achieve GFP_NOWAIT semantic while holding locks. If this
> > > > allocation fails it will drop locks and use GFP_NOFS allocation context.
> > > > 
> > > > We would like to drop PF_MEMALLOC_NORECLAIM because it is really
> > > > dangerous to use if the caller doesn't control the full call chain with
> > > > this flag set. E.g. if any of the function down the chain needed
> > > > GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and
> > > > cause unexpected failure.
> > > > 
> > > > While this is not the case in this particular case using the scoped gfp
> > > > semantic is not really needed bacause we can easily pus the allocation
> > > > context down the chain without too much clutter.
> > > 
> > > yeah, eesh, nack.
> > 
> > Sure, you can NAK this but then deal with the lack of the PF flag by
> > other means. We have made it clear that PF_MEMALLOC_NORECLAIM is not we
> > are going to support at the MM level. 
> > 
> > I have done your homework and shown that it is really easy
> > to use gfp flags directly. The net result is passing gfp flag down to
> > two functions. Sure part of it is ugglier by having several different
> > callbacks implementing it but still manageable. Without too much churn.
> > 
> > So do whatever you like in the bcache code but do not rely on something
> > that is unsupported by the MM layer which you have sneaked in without an
> > agreement.
> 
> Michal, you're being damned hostile, while posting code you haven't even
> tried to compile. Seriously, dude?
> 
> How about sticking to the technical issues at hand instead of saying
> "this is mm, so my way or the highway?". We're all kernel developers
> here, this is not what we do.

Kent, we do respect review feedback. You are clearly fine ignoring it
when you feels like it (eab0af905bfc ("mm: introduce
PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN") is a clear example of it).

I have already made my arguments (repeatedly) why implicit nowait
allocation context is tricky and problematic. Your response is that you
simply "do no buy it" which is a highly technical argument.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-26 20:27         ` Michal Hocko
@ 2024-08-26 20:43           ` Kent Overstreet
  2024-08-26 21:10             ` Kent Overstreet
  2024-08-27  6:01             ` Michal Hocko
  0 siblings, 2 replies; 78+ messages in thread
From: Kent Overstreet @ 2024-08-26 20:43 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On Mon, Aug 26, 2024 at 10:27:44PM GMT, Michal Hocko wrote:
> On Mon 26-08-24 16:00:56, Kent Overstreet wrote:
> > On Mon, Aug 26, 2024 at 09:58:08PM GMT, Michal Hocko wrote:
> > > On Mon 26-08-24 15:39:47, Kent Overstreet wrote:
> > > > On Mon, Aug 26, 2024 at 10:47:12AM GMT, Michal Hocko wrote:
> > > > > From: Michal Hocko <mhocko@suse.com>
> > > > > 
> > > > > bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new
> > > > > inode to achieve GFP_NOWAIT semantic while holding locks. If this
> > > > > allocation fails it will drop locks and use GFP_NOFS allocation context.
> > > > > 
> > > > > We would like to drop PF_MEMALLOC_NORECLAIM because it is really
> > > > > dangerous to use if the caller doesn't control the full call chain with
> > > > > this flag set. E.g. if any of the function down the chain needed
> > > > > GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and
> > > > > cause unexpected failure.
> > > > > 
> > > > > While this is not the case in this particular case using the scoped gfp
> > > > > semantic is not really needed bacause we can easily pus the allocation
> > > > > context down the chain without too much clutter.
> > > > 
> > > > yeah, eesh, nack.
> > > 
> > > Sure, you can NAK this but then deal with the lack of the PF flag by
> > > other means. We have made it clear that PF_MEMALLOC_NORECLAIM is not we
> > > are going to support at the MM level. 
> > > 
> > > I have done your homework and shown that it is really easy
> > > to use gfp flags directly. The net result is passing gfp flag down to
> > > two functions. Sure part of it is ugglier by having several different
> > > callbacks implementing it but still manageable. Without too much churn.
> > > 
> > > So do whatever you like in the bcache code but do not rely on something
> > > that is unsupported by the MM layer which you have sneaked in without an
> > > agreement.
> > 
> > Michal, you're being damned hostile, while posting code you haven't even
> > tried to compile. Seriously, dude?
> > 
> > How about sticking to the technical issues at hand instead of saying
> > "this is mm, so my way or the highway?". We're all kernel developers
> > here, this is not what we do.
> 
> Kent, we do respect review feedback. You are clearly fine ignoring it
> when you feels like it (eab0af905bfc ("mm: introduce
> PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN") is a clear example of it).
> 
> I have already made my arguments (repeatedly) why implicit nowait
> allocation context is tricky and problematic. Your response is that you
> simply "do no buy it" which is a highly technical argument.

No, I explained why GFP_NORECLAIM/PF_MEMALLOC_NORECLAIM can absolutely
apply to a context, not a callsite, and why vmalloc() and kvmalloc()
ignoring gfp flags is a much more serious issue.

If you want to do something useful, figure out what we're going to do
about _that_. If you really don't want PF_MEMALLOC_NORECLAIM to exist,
then see if Linus will let you plumb gfp flags down to pte allocation -
and beware, that's arch code that you'll have to fix.

Reminder: kvmalloc() is a thing, and it's steadily seeing wider use.

Otherwise, PF_MEMALLOC_NORECLAIM needs to stay; and thank you for
bringing this to my attention, because it's made me realize all the
other places in bcachefs that use gfp flags for allocating memory with
btree locks held need to be switch to memalloc_flags_do().

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-26  8:47 ` [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM Michal Hocko
                     ` (2 preceding siblings ...)
  2024-08-26 19:52   ` kernel test robot
@ 2024-08-26 20:53   ` kernel test robot
  2024-08-27  2:23   ` kernel test robot
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 78+ messages in thread
From: kernel test robot @ 2024-08-26 20:53 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: llvm, oe-kbuild-all, Linux Memory Management List,
	Christoph Hellwig, Yafang Shao, Kent Overstreet, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

Hi Michal,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on tip/sched/core brauner-vfs/vfs.all linus/master v6.11-rc5]
[cannot apply to next-20240826]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Michal-Hocko/bcachefs-do-not-use-PF_MEMALLOC_NORECLAIM/20240826-171013
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20240826085347.1152675-2-mhocko%40kernel.org
patch subject: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
config: arm-randconfig-003-20240827 (https://download.01.org/0day-ci/archive/20240827/202408270446.6Ew7FPHA-lkp@intel.com/config)
compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project 08e5a1de8227512d4774a534b91cb2353cef6284)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240827/202408270446.6Ew7FPHA-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408270446.6Ew7FPHA-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from fs/bcachefs/fs.c:4:
   In file included from fs/bcachefs/bcachefs.h:188:
   In file included from include/linux/bio.h:10:
   In file included from include/linux/blk_types.h:10:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:8:
   In file included from include/linux/cacheflush.h:5:
   In file included from arch/arm/include/asm/cacheflush.h:10:
   In file included from include/linux/mm.h:2198:
   include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     518 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
>> fs/bcachefs/fs.c:248:60: error: too many arguments provided to function-like macro invocation
     248 |         if (unlikely(inode_init_always_gfp(c->vfs_sb, &inode->v), gfp)) {
         |                                                                   ^
   include/linux/compiler.h:77:10: note: macro 'unlikely' defined here
      77 | # define unlikely(x)    __builtin_expect(!!(x), 0)
         |          ^
>> fs/bcachefs/fs.c:248:6: error: use of undeclared identifier 'unlikely'
     248 |         if (unlikely(inode_init_always_gfp(c->vfs_sb, &inode->v), gfp)) {
         |             ^
>> fs/bcachefs/fs.c:261:60: error: use of undeclared identifier 'GFP_NOWARN'
     261 |         struct bch_inode_info *inode = __bch2_new_inode(trans->c, GFP_NOWARN | GFP_NOWAIT);
         |                                                                   ^
   1 warning and 3 errors generated.


vim +248 fs/bcachefs/fs.c

   233	
   234	static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c, gfp_t gfp)
   235	{
   236		struct bch_inode_info *inode = kmem_cache_alloc(bch2_inode_cache, gfp);
   237		if (!inode)
   238			return NULL;
   239	
   240		inode_init_once(&inode->v);
   241		mutex_init(&inode->ei_update_lock);
   242		two_state_lock_init(&inode->ei_pagecache_lock);
   243		INIT_LIST_HEAD(&inode->ei_vfs_inode_list);
   244		inode->ei_flags = 0;
   245		mutex_init(&inode->ei_quota_lock);
   246		memset(&inode->ei_devs_need_flush, 0, sizeof(inode->ei_devs_need_flush));
   247	
 > 248		if (unlikely(inode_init_always_gfp(c->vfs_sb, &inode->v), gfp)) {
   249			kmem_cache_free(bch2_inode_cache, inode);
   250			return NULL;
   251		}
   252	
   253		return inode;
   254	}
   255	
   256	/*
   257	 * Allocate a new inode, dropping/retaking btree locks if necessary:
   258	 */
   259	static struct bch_inode_info *bch2_new_inode(struct btree_trans *trans)
   260	{
 > 261		struct bch_inode_info *inode = __bch2_new_inode(trans->c, GFP_NOWARN | GFP_NOWAIT);
   262	
   263		if (unlikely(!inode)) {
   264			int ret = drop_locks_do(trans, (inode = __bch2_new_inode(trans->c, GFP_NOFS)) ? 0 : -ENOMEM);
   265			if (ret && inode) {
   266				__destroy_inode(&inode->v);
   267				kmem_cache_free(bch2_inode_cache, inode);
   268			}
   269			if (ret)
   270				return ERR_PTR(ret);
   271		}
   272	
   273		return inode;
   274	}
   275	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-26 20:43           ` Kent Overstreet
@ 2024-08-26 21:10             ` Kent Overstreet
  2024-08-27  6:01             ` Michal Hocko
  1 sibling, 0 replies; 78+ messages in thread
From: Kent Overstreet @ 2024-08-26 21:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On Mon, Aug 26, 2024 at 04:44:01PM GMT, Kent Overstreet wrote:
> No, I explained why GFP_NORECLAIM/PF_MEMALLOC_NORECLAIM can absolutely
> apply to a context, not a callsite, and why vmalloc() and kvmalloc()
> ignoring gfp flags is a much more serious issue.
> 
> If you want to do something useful, figure out what we're going to do
> about _that_. If you really don't want PF_MEMALLOC_NORECLAIM to exist,
> then see if Linus will let you plumb gfp flags down to pte allocation -
> and beware, that's arch code that you'll have to fix.
> 
> Reminder: kvmalloc() is a thing, and it's steadily seeing wider use.
> 
> Otherwise, PF_MEMALLOC_NORECLAIM needs to stay; and thank you for
> bringing this to my attention, because it's made me realize all the
> other places in bcachefs that use gfp flags for allocating memory with
> btree locks held need to be switch to memalloc_flags_do().

Additionally: plumbing gfp flags to pte allocation is something we do
need to do. I proposed it before kvmalloc() was a thing, but now it's
become much more of a lurking landmine.

Even with that I'd still be against this series, though. GFP_NOFAIL
interacting badly with other gfp/memalloc flags is going to continue to
be an issue, and I think the only answer to that is stricter runtime
checks (which, thank you mm guys for adding recently).

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-26  8:47 ` [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM Michal Hocko
                     ` (3 preceding siblings ...)
  2024-08-26 20:53   ` kernel test robot
@ 2024-08-27  2:23   ` kernel test robot
  2024-08-27  6:15   ` [PATCH 1/2 v2] " Michal Hocko
  2024-08-29  9:37   ` [PATCH 1/2] " Jan Kara
  6 siblings, 0 replies; 78+ messages in thread
From: kernel test robot @ 2024-08-27  2:23 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton
  Cc: oe-kbuild-all, Linux Memory Management List, Christoph Hellwig,
	Yafang Shao, Kent Overstreet, jack, Christian Brauner,
	Alexander Viro, Paul Moore, James Morris, Serge E. Hallyn,
	linux-fsdevel, linux-bcachefs, linux-security-module,
	linux-kernel, Michal Hocko

Hi Michal,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on tip/sched/core brauner-vfs/vfs.all linus/master v6.11-rc5]
[cannot apply to next-20240826]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Michal-Hocko/bcachefs-do-not-use-PF_MEMALLOC_NORECLAIM/20240826-171013
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20240826085347.1152675-2-mhocko%40kernel.org
patch subject: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
config: i386-buildonly-randconfig-003-20240827 (https://download.01.org/0day-ci/archive/20240827/202408271041.5IWf4ZQC-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240827/202408271041.5IWf4ZQC-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408271041.5IWf4ZQC-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> security/security.c:664: warning: Function parameter or struct member 'gfp' not described in 'lsm_inode_alloc'
>> security/security.c:1586: warning: Function parameter or struct member 'gfp' not described in 'security_inode_alloc'


vim +664 security/security.c

33bf60cabcc768 Casey Schaufler 2018-11-12  654  
afb1cbe37440c7 Casey Schaufler 2018-09-21  655  /**
afb1cbe37440c7 Casey Schaufler 2018-09-21  656   * lsm_inode_alloc - allocate a composite inode blob
afb1cbe37440c7 Casey Schaufler 2018-09-21  657   * @inode: the inode that needs a blob
afb1cbe37440c7 Casey Schaufler 2018-09-21  658   *
afb1cbe37440c7 Casey Schaufler 2018-09-21  659   * Allocate the inode blob for all the modules
afb1cbe37440c7 Casey Schaufler 2018-09-21  660   *
afb1cbe37440c7 Casey Schaufler 2018-09-21  661   * Returns 0, or -ENOMEM if memory can't be allocated.
afb1cbe37440c7 Casey Schaufler 2018-09-21  662   */
b2ce84652b3193 Michal Hocko    2024-08-26  663  int lsm_inode_alloc(struct inode *inode, gfp_t gfp)
afb1cbe37440c7 Casey Schaufler 2018-09-21 @664  {
afb1cbe37440c7 Casey Schaufler 2018-09-21  665  	if (!lsm_inode_cache) {
afb1cbe37440c7 Casey Schaufler 2018-09-21  666  		inode->i_security = NULL;
afb1cbe37440c7 Casey Schaufler 2018-09-21  667  		return 0;
afb1cbe37440c7 Casey Schaufler 2018-09-21  668  	}
afb1cbe37440c7 Casey Schaufler 2018-09-21  669  
b2ce84652b3193 Michal Hocko    2024-08-26  670  	inode->i_security = kmem_cache_zalloc(lsm_inode_cache, gfp);
afb1cbe37440c7 Casey Schaufler 2018-09-21  671  	if (inode->i_security == NULL)
afb1cbe37440c7 Casey Schaufler 2018-09-21  672  		return -ENOMEM;
afb1cbe37440c7 Casey Schaufler 2018-09-21  673  	return 0;
afb1cbe37440c7 Casey Schaufler 2018-09-21  674  }
afb1cbe37440c7 Casey Schaufler 2018-09-21  675  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-26 20:43           ` Kent Overstreet
  2024-08-26 21:10             ` Kent Overstreet
@ 2024-08-27  6:01             ` Michal Hocko
  2024-08-27  6:40               ` Kent Overstreet
  1 sibling, 1 reply; 78+ messages in thread
From: Michal Hocko @ 2024-08-27  6:01 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On Mon 26-08-24 16:43:55, Kent Overstreet wrote:
> On Mon, Aug 26, 2024 at 10:27:44PM GMT, Michal Hocko wrote:
> > On Mon 26-08-24 16:00:56, Kent Overstreet wrote:
> > > On Mon, Aug 26, 2024 at 09:58:08PM GMT, Michal Hocko wrote:
> > > > On Mon 26-08-24 15:39:47, Kent Overstreet wrote:
> > > > > On Mon, Aug 26, 2024 at 10:47:12AM GMT, Michal Hocko wrote:
> > > > > > From: Michal Hocko <mhocko@suse.com>
> > > > > > 
> > > > > > bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new
> > > > > > inode to achieve GFP_NOWAIT semantic while holding locks. If this
> > > > > > allocation fails it will drop locks and use GFP_NOFS allocation context.
> > > > > > 
> > > > > > We would like to drop PF_MEMALLOC_NORECLAIM because it is really
> > > > > > dangerous to use if the caller doesn't control the full call chain with
> > > > > > this flag set. E.g. if any of the function down the chain needed
> > > > > > GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and
> > > > > > cause unexpected failure.
> > > > > > 
> > > > > > While this is not the case in this particular case using the scoped gfp
> > > > > > semantic is not really needed bacause we can easily pus the allocation
> > > > > > context down the chain without too much clutter.
> > > > > 
> > > > > yeah, eesh, nack.
> > > > 
> > > > Sure, you can NAK this but then deal with the lack of the PF flag by
> > > > other means. We have made it clear that PF_MEMALLOC_NORECLAIM is not we
> > > > are going to support at the MM level. 
> > > > 
> > > > I have done your homework and shown that it is really easy
> > > > to use gfp flags directly. The net result is passing gfp flag down to
> > > > two functions. Sure part of it is ugglier by having several different
> > > > callbacks implementing it but still manageable. Without too much churn.
> > > > 
> > > > So do whatever you like in the bcache code but do not rely on something
> > > > that is unsupported by the MM layer which you have sneaked in without an
> > > > agreement.
> > > 
> > > Michal, you're being damned hostile, while posting code you haven't even
> > > tried to compile. Seriously, dude?
> > > 
> > > How about sticking to the technical issues at hand instead of saying
> > > "this is mm, so my way or the highway?". We're all kernel developers
> > > here, this is not what we do.
> > 
> > Kent, we do respect review feedback. You are clearly fine ignoring it
> > when you feels like it (eab0af905bfc ("mm: introduce
> > PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN") is a clear example of it).
> > 
> > I have already made my arguments (repeatedly) why implicit nowait
> > allocation context is tricky and problematic. Your response is that you
> > simply "do no buy it" which is a highly technical argument.
> 
> No, I explained why GFP_NORECLAIM/PF_MEMALLOC_NORECLAIM can absolutely
> apply to a context, not a callsite, and why vmalloc() and kvmalloc()
> ignoring gfp flags is a much more serious issue.

You are not really answering the main concern I have brought up though.
I.e. GFP_NOFAIL being fundamentally incompatible with NORECLAIM semantic
because the page allocator doesn't and will not support this allocation
mode.  Scoped noreclaim semantic makes such a use much less visible
because it can be deep in the scoped context there more error prone to
introduce thus making the code harder to maintain. 

I do see why you would like to have NOWAIT kvmalloc support available
and I also do see challenges in achieving that. But I completely fail to
see why you are bring that up _here_ as that is not really relevant to
PF_MEMALLOC_NORECLAIM use by bcachefs because it demonstrably doesn't
need that. There is no other user of the flag at the moment so dropping
the flag before there is more misuse is a reasonable goal. If you want
to bring up vmalloc NOWAIT support then feel free to do that in another
context and we can explore potential ways to achieve that.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-26  8:47 ` [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM Michal Hocko
                     ` (4 preceding siblings ...)
  2024-08-27  2:23   ` kernel test robot
@ 2024-08-27  6:15   ` Michal Hocko
  2024-08-27 12:29     ` Christoph Hellwig
  2024-08-28  4:09     ` Dave Chinner
  2024-08-29  9:37   ` [PATCH 1/2] " Jan Kara
  6 siblings, 2 replies; 78+ messages in thread
From: Michal Hocko @ 2024-08-27  6:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, Yafang Shao, Kent Overstreet, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new
inode to achieve GFP_NOWAIT semantic while holding locks. If this
allocation fails it will drop locks and use GFP_NOFS allocation context.

We would like to drop PF_MEMALLOC_NORECLAIM because it is really
dangerous to use if the caller doesn't control the full call chain with
this flag set. E.g. if any of the function down the chain needed
GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and
cause unexpected failure.

While this is not the case in this particular case using the scoped gfp
semantic is not really needed bacause we can easily pus the allocation
context down the chain without too much clutter.

Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/bcachefs/fs.c          | 14 ++++++--------
 fs/inode.c                |  6 +++---
 include/linux/fs.h        |  7 ++++++-
 include/linux/lsm_hooks.h |  2 +-
 include/linux/security.h  |  4 ++--
 security/security.c       |  8 ++++----
 6 files changed, 22 insertions(+), 19 deletions(-)

Chancges since v1
- compile errors fixed 
- dropped GFP_NOWARN as it is part of GFP_NOWAIT now

diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c
index 15fc41e63b6c..d151a2f28d12 100644
--- a/fs/bcachefs/fs.c
+++ b/fs/bcachefs/fs.c
@@ -231,9 +231,9 @@ static struct inode *bch2_alloc_inode(struct super_block *sb)
 	BUG();
 }
 
-static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c)
+static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c, gfp_t gfp)
 {
-	struct bch_inode_info *inode = kmem_cache_alloc(bch2_inode_cache, GFP_NOFS);
+	struct bch_inode_info *inode = kmem_cache_alloc(bch2_inode_cache, gfp);
 	if (!inode)
 		return NULL;
 
@@ -245,7 +245,7 @@ static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c)
 	mutex_init(&inode->ei_quota_lock);
 	memset(&inode->ei_devs_need_flush, 0, sizeof(inode->ei_devs_need_flush));
 
-	if (unlikely(inode_init_always(c->vfs_sb, &inode->v))) {
+	if (unlikely(inode_init_always_gfp(c->vfs_sb, &inode->v, gfp))) {
 		kmem_cache_free(bch2_inode_cache, inode);
 		return NULL;
 	}
@@ -258,12 +258,10 @@ static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c)
  */
 static struct bch_inode_info *bch2_new_inode(struct btree_trans *trans)
 {
-	struct bch_inode_info *inode =
-		memalloc_flags_do(PF_MEMALLOC_NORECLAIM|PF_MEMALLOC_NOWARN,
-				  __bch2_new_inode(trans->c));
+	struct bch_inode_info *inode = __bch2_new_inode(trans->c, GFP_NOWAIT);
 
 	if (unlikely(!inode)) {
-		int ret = drop_locks_do(trans, (inode = __bch2_new_inode(trans->c)) ? 0 : -ENOMEM);
+		int ret = drop_locks_do(trans, (inode = __bch2_new_inode(trans->c, GFP_NOFS)) ? 0 : -ENOMEM);
 		if (ret && inode) {
 			__destroy_inode(&inode->v);
 			kmem_cache_free(bch2_inode_cache, inode);
@@ -328,7 +326,7 @@ __bch2_create(struct mnt_idmap *idmap,
 	if (ret)
 		return ERR_PTR(ret);
 #endif
-	inode = __bch2_new_inode(c);
+	inode = __bch2_new_inode(c, GFP_NOFS);
 	if (unlikely(!inode)) {
 		inode = ERR_PTR(-ENOMEM);
 		goto err;
diff --git a/fs/inode.c b/fs/inode.c
index 86670941884b..a2aabbcffbe4 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -153,7 +153,7 @@ static int no_open(struct inode *inode, struct file *file)
  * These are initializations that need to be done on every inode
  * allocation as the fields are not initialised by slab allocation.
  */
-int inode_init_always(struct super_block *sb, struct inode *inode)
+int inode_init_always_gfp(struct super_block *sb, struct inode *inode, gfp_t gfp)
 {
 	static const struct inode_operations empty_iops;
 	static const struct file_operations no_open_fops = {.open = no_open};
@@ -230,14 +230,14 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
 #endif
 	inode->i_flctx = NULL;
 
-	if (unlikely(security_inode_alloc(inode)))
+	if (unlikely(security_inode_alloc(inode, gfp)))
 		return -ENOMEM;
 
 	this_cpu_inc(nr_inodes);
 
 	return 0;
 }
-EXPORT_SYMBOL(inode_init_always);
+EXPORT_SYMBOL(inode_init_always_gfp);
 
 void free_inode_nonrcu(struct inode *inode)
 {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fd34b5755c0b..d46ca71a7855 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3027,7 +3027,12 @@ extern loff_t default_llseek(struct file *file, loff_t offset, int whence);
 
 extern loff_t vfs_llseek(struct file *file, loff_t offset, int whence);
 
-extern int inode_init_always(struct super_block *, struct inode *);
+extern int inode_init_always_gfp(struct super_block *, struct inode *, gfp_t);
+static inline int inode_init_always(struct super_block *sb, struct inode *inode)
+{
+	return inode_init_always_gfp(sb, inode, GFP_NOFS);
+}
+
 extern void inode_init_once(struct inode *);
 extern void address_space_init_once(struct address_space *mapping);
 extern struct inode * igrab(struct inode *);
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index a2ade0ffe9e7..b08472d64765 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -150,6 +150,6 @@ extern struct lsm_info __start_early_lsm_info[], __end_early_lsm_info[];
 		__used __section(".early_lsm_info.init")		\
 		__aligned(sizeof(unsigned long))
 
-extern int lsm_inode_alloc(struct inode *inode);
+extern int lsm_inode_alloc(struct inode *inode, gfp_t gfp);
 
 #endif /* ! __LINUX_LSM_HOOKS_H */
diff --git a/include/linux/security.h b/include/linux/security.h
index 1390f1efb4f0..7c6b9b038a0d 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -336,7 +336,7 @@ int security_dentry_create_files_as(struct dentry *dentry, int mode,
 					struct cred *new);
 int security_path_notify(const struct path *path, u64 mask,
 					unsigned int obj_type);
-int security_inode_alloc(struct inode *inode);
+int security_inode_alloc(struct inode *inode, gfp_t gfp);
 void security_inode_free(struct inode *inode);
 int security_inode_init_security(struct inode *inode, struct inode *dir,
 				 const struct qstr *qstr,
@@ -769,7 +769,7 @@ static inline int security_path_notify(const struct path *path, u64 mask,
 	return 0;
 }
 
-static inline int security_inode_alloc(struct inode *inode)
+static inline int security_inode_alloc(struct inode *inode, gfp_t gfp)
 {
 	return 0;
 }
diff --git a/security/security.c b/security/security.c
index 8cee5b6c6e6d..3581262da5ee 100644
--- a/security/security.c
+++ b/security/security.c
@@ -660,14 +660,14 @@ static int lsm_file_alloc(struct file *file)
  *
  * Returns 0, or -ENOMEM if memory can't be allocated.
  */
-int lsm_inode_alloc(struct inode *inode)
+int lsm_inode_alloc(struct inode *inode, gfp_t gfp)
 {
 	if (!lsm_inode_cache) {
 		inode->i_security = NULL;
 		return 0;
 	}
 
-	inode->i_security = kmem_cache_zalloc(lsm_inode_cache, GFP_NOFS);
+	inode->i_security = kmem_cache_zalloc(lsm_inode_cache, gfp);
 	if (inode->i_security == NULL)
 		return -ENOMEM;
 	return 0;
@@ -1582,9 +1582,9 @@ int security_path_notify(const struct path *path, u64 mask,
  *
  * Return: Return 0 if operation was successful.
  */
-int security_inode_alloc(struct inode *inode)
+int security_inode_alloc(struct inode *inode, gfp_t gfp)
 {
-	int rc = lsm_inode_alloc(inode);
+	int rc = lsm_inode_alloc(inode, gfp);
 
 	if (unlikely(rc))
 		return rc;
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-27  6:01             ` Michal Hocko
@ 2024-08-27  6:40               ` Kent Overstreet
  2024-08-27  6:58                 ` Michal Hocko
  0 siblings, 1 reply; 78+ messages in thread
From: Kent Overstreet @ 2024-08-27  6:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On Tue, Aug 27, 2024 at 08:01:32AM GMT, Michal Hocko wrote:
> You are not really answering the main concern I have brought up though.
> I.e. GFP_NOFAIL being fundamentally incompatible with NORECLAIM semantic
> because the page allocator doesn't and will not support this allocation
> mode.  Scoped noreclaim semantic makes such a use much less visible
> because it can be deep in the scoped context there more error prone to
> introduce thus making the code harder to maintain. 

You're too attached to GFP_NOFAIL.

GFP_NOFAIL is something we very rarely use, and it's not something we
want to use. Furthermore, GFP_NOFAIL allocations can fail regardless of
this patch - e.g. if it's more than 2 pages, it's not going to be
GFP_NOFAIL. In general you can't avoid having error paths for memory
allocations, and GFP_NOFAIL can't fundamentally change that.

Stop trying to design things around GFP_NOFAIL.

> I do see why you would like to have NOWAIT kvmalloc support available
> and I also do see challenges in achieving that. But I completely fail to
> see why you are bring that up _here_ as that is not really relevant to
> PF_MEMALLOC_NORECLAIM use by bcachefs because it demonstrably doesn't
> need that. There is no other user of the flag at the moment so dropping
> the flag before there is more misuse is a reasonable goal. If you want
> to bring up vmalloc NOWAIT support then feel free to do that in another
> context and we can explore potential ways to achieve that.

The inconsistencies between what PF_MEMALLOC supports and what gfp flags
support are quite relevant. If gfp flags aren't plumbed everywhere, and
aren't going to be plumbed everywhere (and that is the current
decision), then we must have PF_MEMALLOC support.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-27  6:40               ` Kent Overstreet
@ 2024-08-27  6:58                 ` Michal Hocko
  2024-08-27  7:05                   ` Kent Overstreet
  0 siblings, 1 reply; 78+ messages in thread
From: Michal Hocko @ 2024-08-27  6:58 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On Tue 27-08-24 02:40:16, Kent Overstreet wrote:
> On Tue, Aug 27, 2024 at 08:01:32AM GMT, Michal Hocko wrote:
> > You are not really answering the main concern I have brought up though.
> > I.e. GFP_NOFAIL being fundamentally incompatible with NORECLAIM semantic
> > because the page allocator doesn't and will not support this allocation
> > mode.  Scoped noreclaim semantic makes such a use much less visible
> > because it can be deep in the scoped context there more error prone to
> > introduce thus making the code harder to maintain. 
> 
> You're too attached to GFP_NOFAIL.

Unfortunatelly GFP_NOFAIL is there and we need to support it. We cannot
just close eyes and pretend it doesn't exist and hope for the best.

> GFP_NOFAIL is something we very rarely use, and it's not something we
> want to use. Furthermore, GFP_NOFAIL allocations can fail regardless of
> this patch - e.g. if it's more than 2 pages, it's not going to be
> GFP_NOFAIL.

We can reasonably assume we do not have any of those users in the tree
though. We know that because we have a warning to tell us about that.
We still have legit GFP_NOFAIL users and we can safely assume we will
have some in the future though. And they have no way to handle the
failure. If they did they wouldn't have used GFP_NOFAIL in the first
place. So they do not check for NULL and they would either blow up or
worse fail in subtle and harder to detect way.
 
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-27  6:58                 ` Michal Hocko
@ 2024-08-27  7:05                   ` Kent Overstreet
  2024-08-27  7:35                     ` Michal Hocko
  0 siblings, 1 reply; 78+ messages in thread
From: Kent Overstreet @ 2024-08-27  7:05 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On Tue, Aug 27, 2024 at 08:58:39AM GMT, Michal Hocko wrote:
> On Tue 27-08-24 02:40:16, Kent Overstreet wrote:
> > On Tue, Aug 27, 2024 at 08:01:32AM GMT, Michal Hocko wrote:
> > > You are not really answering the main concern I have brought up though.
> > > I.e. GFP_NOFAIL being fundamentally incompatible with NORECLAIM semantic
> > > because the page allocator doesn't and will not support this allocation
> > > mode.  Scoped noreclaim semantic makes such a use much less visible
> > > because it can be deep in the scoped context there more error prone to
> > > introduce thus making the code harder to maintain. 
> > 
> > You're too attached to GFP_NOFAIL.
> 
> Unfortunatelly GFP_NOFAIL is there and we need to support it. We cannot
> just close eyes and pretend it doesn't exist and hope for the best.

You need to notice when you're trying to do something immpossible.

> > GFP_NOFAIL is something we very rarely use, and it's not something we
> > want to use. Furthermore, GFP_NOFAIL allocations can fail regardless of
> > this patch - e.g. if it's more than 2 pages, it's not going to be
> > GFP_NOFAIL.
> 
> We can reasonably assume we do not have any of those users in the tree
> though. We know that because we have a warning to tell us about that.
> We still have legit GFP_NOFAIL users and we can safely assume we will
> have some in the future though. And they have no way to handle the
> failure. If they did they wouldn't have used GFP_NOFAIL in the first
> place. So they do not check for NULL and they would either blow up or
> worse fail in subtle and harder to detect way.

No, because not all GFP_NOFAIL allocations are statically sized.

And the problem of the dynamic context overriding GFP_NOFAIL is more
general - if you use GFP_NOFAIL from nonblocking context (interrupt
context or preemption disabled) - the allocation has to fail, or
something even worse will happen.

Just because we don't track that with PF_MEMALLOC flags doesn't mean the
problem isn't htere.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-27  7:05                   ` Kent Overstreet
@ 2024-08-27  7:35                     ` Michal Hocko
  0 siblings, 0 replies; 78+ messages in thread
From: Michal Hocko @ 2024-08-27  7:35 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On Tue 27-08-24 03:05:29, Kent Overstreet wrote:
> On Tue, Aug 27, 2024 at 08:58:39AM GMT, Michal Hocko wrote:
> > On Tue 27-08-24 02:40:16, Kent Overstreet wrote:
> > > On Tue, Aug 27, 2024 at 08:01:32AM GMT, Michal Hocko wrote:
> > > > You are not really answering the main concern I have brought up though.
> > > > I.e. GFP_NOFAIL being fundamentally incompatible with NORECLAIM semantic
> > > > because the page allocator doesn't and will not support this allocation
> > > > mode.  Scoped noreclaim semantic makes such a use much less visible
> > > > because it can be deep in the scoped context there more error prone to
> > > > introduce thus making the code harder to maintain. 
> > > 
> > > You're too attached to GFP_NOFAIL.
> > 
> > Unfortunatelly GFP_NOFAIL is there and we need to support it. We cannot
> > just close eyes and pretend it doesn't exist and hope for the best.
> 
> You need to notice when you're trying to do something immpossible.

Agreed! And GFP_NOFAIL for allocations <= order 1 in the page allocator or 
kvmalloc(GFP_NOFAIL) for reasonable sizes is a supported setup. And it
should work as documented and shouldn't create any surprises. Like
returning unexpected failure because you have been called from withing a
NORECLAIM scope which you as an author of the code are not even aware of
because that has happened somewhere detached from your code and you
happen to be in a callchain.

> > > GFP_NOFAIL is something we very rarely use, and it's not something we
> > > want to use. Furthermore, GFP_NOFAIL allocations can fail regardless of
> > > this patch - e.g. if it's more than 2 pages, it's not going to be
> > > GFP_NOFAIL.
> > 
> > We can reasonably assume we do not have any of those users in the tree
> > though. We know that because we have a warning to tell us about that.
> > We still have legit GFP_NOFAIL users and we can safely assume we will
> > have some in the future though. And they have no way to handle the
> > failure. If they did they wouldn't have used GFP_NOFAIL in the first
> > place. So they do not check for NULL and they would either blow up or
> > worse fail in subtle and harder to detect way.
> 
> No, because not all GFP_NOFAIL allocations are statically sized.

This is a runtime check warning.
rmqueue:
        WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));

> And the problem of the dynamic context overriding GFP_NOFAIL is more
> general - if you use GFP_NOFAIL from nonblocking context (interrupt
> context or preemption disabled) - the allocation has to fail, or
> something even worse will happen.

If you use __GFP_NOFAIL | GFP_KERNEL from an atomic context then you are
screwed the same way as if you used GFP_KERNEL alone - sleeping while
atomic or worse. The allocator doesn't even try to deal with this and
protect the caller by not sleeping and returning NULL.

More fundamentally, GFP_NOFAIL from non-blocking context is an incorrect
an unsupported use of the flag. This is the crux of the whole
discussion. GFP_NOWAIT | __GFP_NOFAIL or GFP_ATOMIC | __GFP_NOFAIL is
just a bug. We can git grep for those, and surprisingly found one instance
which already has a patch waiting to be merged.

We cannot enforce that at a compile time and that sucks but such is a
life. But we can grep for this at least. Now consider a scoped
(implicit) NOWAIT context which makes even seeemingly correct GFP_NOFAIL
use a bug.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 2/2] mm: drop PF_MEMALLOC_NORECLAIM
  2024-08-26  8:47 ` [PATCH 2/2] mm: drop PF_MEMALLOC_NORECLAIM Michal Hocko
                     ` (2 preceding siblings ...)
  2024-08-26 19:04   ` Kent Overstreet
@ 2024-08-27 12:29   ` Christoph Hellwig
  3 siblings, 0 replies; 78+ messages in thread
From: Christoph Hellwig @ 2024-08-27 12:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, Kent Overstreet,
	jack, Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-27  6:15   ` [PATCH 1/2 v2] " Michal Hocko
@ 2024-08-27 12:29     ` Christoph Hellwig
  2024-08-28  4:09     ` Dave Chinner
  1 sibling, 0 replies; 78+ messages in thread
From: Christoph Hellwig @ 2024-08-27 12:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, Kent Overstreet,
	jack, Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

Thanks for fixing this up!

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-27  6:15   ` [PATCH 1/2 v2] " Michal Hocko
  2024-08-27 12:29     ` Christoph Hellwig
@ 2024-08-28  4:09     ` Dave Chinner
  2024-08-29 10:02       ` Kent Overstreet
  1 sibling, 1 reply; 78+ messages in thread
From: Dave Chinner @ 2024-08-28  4:09 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, Kent Overstreet,
	jack, Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

On Tue, Aug 27, 2024 at 08:15:43AM +0200, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new
> inode to achieve GFP_NOWAIT semantic while holding locks. If this
> allocation fails it will drop locks and use GFP_NOFS allocation context.
> 
> We would like to drop PF_MEMALLOC_NORECLAIM because it is really
> dangerous to use if the caller doesn't control the full call chain with
> this flag set. E.g. if any of the function down the chain needed
> GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and
> cause unexpected failure.
> 
> While this is not the case in this particular case using the scoped gfp
> semantic is not really needed bacause we can easily pus the allocation
> context down the chain without too much clutter.
> 
> Acked-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Looks good to me.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 2/2] mm: drop PF_MEMALLOC_NORECLAIM
  2024-08-26 19:18         ` Michal Hocko
  2024-08-26 19:20           ` Matthew Wilcox
@ 2024-08-28  4:11           ` Dave Chinner
  2024-08-29 21:45           ` Vlastimil Babka
  2 siblings, 0 replies; 78+ messages in thread
From: Dave Chinner @ 2024-08-28  4:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Matthew Wilcox, Andrew Morton, Christoph Hellwig, Yafang Shao,
	Kent Overstreet, jack, Christian Brauner, Alexander Viro,
	Paul Moore, James Morris, Serge E. Hallyn, linux-fsdevel,
	linux-mm, linux-bcachefs, linux-security-module, linux-kernel

On Mon, Aug 26, 2024 at 09:18:01PM +0200, Michal Hocko wrote:
> On Mon 26-08-24 18:49:23, Matthew Wilcox wrote:
> > On Mon, Aug 26, 2024 at 06:51:55PM +0200, Michal Hocko wrote:
> [...]
> > > If a plan revert is preferably, I will go with it.
> > 
> > There aren't any other users of PF_MEMALLOC_NOWARN and it definitely
> > seems like something you want at a callsite rather than blanket for every
> > allocation below this point.  We don't seem to have many PF_ flags left,
> > so let's not keep it around if there's no immediate plans for it.
> 
> Good point. What about this?
> --- 
> From 923cd429d4b1a3520c93bcf46611ae74a3158865 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Mon, 26 Aug 2024 21:15:02 +0200
> Subject: [PATCH] Revert "mm: introduce PF_MEMALLOC_NORECLAIM,
>  PF_MEMALLOC_NOWARN"
> 
> This reverts commit eab0af905bfc3e9c05da2ca163d76a1513159aa4.
> 
> There is no existing user of those flags. PF_MEMALLOC_NOWARN is
> dangerous because a nested allocation context can use GFP_NOFAIL which
> could cause unexpected failure. Such a code would be hard to maintain
> because it could be deeper in the call chain.
> 
> PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1]
> that such a allocation contex is inherently unsafe if the context
> doesn't fully control all allocations called from this context.
> 
> While PF_MEMALLOC_NOWARN is not dangerous the way PF_MEMALLOC_NORECLAIM
> is it doesn't have any user and as Matthew has pointed out we are
> running out of those flags so better reclaim it without any real users.
> 
> [1] https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Looks good to me.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-26  8:47 ` [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM Michal Hocko
                     ` (5 preceding siblings ...)
  2024-08-27  6:15   ` [PATCH 1/2 v2] " Michal Hocko
@ 2024-08-29  9:37   ` Jan Kara
  6 siblings, 0 replies; 78+ messages in thread
From: Jan Kara @ 2024-08-29  9:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, Kent Overstreet,
	jack, Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

On Mon 26-08-24 10:47:12, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new
> inode to achieve GFP_NOWAIT semantic while holding locks. If this
> allocation fails it will drop locks and use GFP_NOFS allocation context.
> 
> We would like to drop PF_MEMALLOC_NORECLAIM because it is really
> dangerous to use if the caller doesn't control the full call chain with
> this flag set. E.g. if any of the function down the chain needed
> GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and
> cause unexpected failure.
> 
> While this is not the case in this particular case using the scoped gfp
> semantic is not really needed bacause we can easily pus the allocation
> context down the chain without too much clutter.
> 
> Acked-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

For the VFS changes feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/bcachefs/fs.c          | 14 ++++++--------
>  fs/inode.c                |  6 +++---
>  include/linux/fs.h        |  7 ++++++-
>  include/linux/lsm_hooks.h |  2 +-
>  include/linux/security.h  |  4 ++--
>  security/security.c       |  8 ++++----
>  6 files changed, 22 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/bcachefs/fs.c b/fs/bcachefs/fs.c
> index 15fc41e63b6c..7a55167b9133 100644
> --- a/fs/bcachefs/fs.c
> +++ b/fs/bcachefs/fs.c
> @@ -231,9 +231,9 @@ static struct inode *bch2_alloc_inode(struct super_block *sb)
>  	BUG();
>  }
>  
> -static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c)
> +static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c, gfp_t gfp)
>  {
> -	struct bch_inode_info *inode = kmem_cache_alloc(bch2_inode_cache, GFP_NOFS);
> +	struct bch_inode_info *inode = kmem_cache_alloc(bch2_inode_cache, gfp);
>  	if (!inode)
>  		return NULL;
>  
> @@ -245,7 +245,7 @@ static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c)
>  	mutex_init(&inode->ei_quota_lock);
>  	memset(&inode->ei_devs_need_flush, 0, sizeof(inode->ei_devs_need_flush));
>  
> -	if (unlikely(inode_init_always(c->vfs_sb, &inode->v))) {
> +	if (unlikely(inode_init_always_gfp(c->vfs_sb, &inode->v), gfp)) {
>  		kmem_cache_free(bch2_inode_cache, inode);
>  		return NULL;
>  	}
> @@ -258,12 +258,10 @@ static struct bch_inode_info *__bch2_new_inode(struct bch_fs *c)
>   */
>  static struct bch_inode_info *bch2_new_inode(struct btree_trans *trans)
>  {
> -	struct bch_inode_info *inode =
> -		memalloc_flags_do(PF_MEMALLOC_NORECLAIM|PF_MEMALLOC_NOWARN,
> -				  __bch2_new_inode(trans->c));
> +	struct bch_inode_info *inode = __bch2_new_inode(trans->c, GFP_NOWARN | GFP_NOWAIT);
>  
>  	if (unlikely(!inode)) {
> -		int ret = drop_locks_do(trans, (inode = __bch2_new_inode(trans->c)) ? 0 : -ENOMEM);
> +		int ret = drop_locks_do(trans, (inode = __bch2_new_inode(trans->c, GFP_NOFS)) ? 0 : -ENOMEM);
>  		if (ret && inode) {
>  			__destroy_inode(&inode->v);
>  			kmem_cache_free(bch2_inode_cache, inode);
> @@ -328,7 +326,7 @@ __bch2_create(struct mnt_idmap *idmap,
>  	if (ret)
>  		return ERR_PTR(ret);
>  #endif
> -	inode = __bch2_new_inode(c);
> +	inode = __bch2_new_inode(c, GFP_NOFS);
>  	if (unlikely(!inode)) {
>  		inode = ERR_PTR(-ENOMEM);
>  		goto err;
> diff --git a/fs/inode.c b/fs/inode.c
> index 86670941884b..95fd67a6cac3 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -153,7 +153,7 @@ static int no_open(struct inode *inode, struct file *file)
>   * These are initializations that need to be done on every inode
>   * allocation as the fields are not initialised by slab allocation.
>   */
> -int inode_init_always(struct super_block *sb, struct inode *inode)
> +int inode_init_always(struct super_block *sb, struct inode *inode, gfp_t gfp)
>  {
>  	static const struct inode_operations empty_iops;
>  	static const struct file_operations no_open_fops = {.open = no_open};
> @@ -230,14 +230,14 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
>  #endif
>  	inode->i_flctx = NULL;
>  
> -	if (unlikely(security_inode_alloc(inode)))
> +	if (unlikely(security_inode_alloc(inode, gfp)))
>  		return -ENOMEM;
>  
>  	this_cpu_inc(nr_inodes);
>  
>  	return 0;
>  }
> -EXPORT_SYMBOL(inode_init_always);
> +EXPORT_SYMBOL(inode_init_always_gfp);
>  
>  void free_inode_nonrcu(struct inode *inode)
>  {
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index fd34b5755c0b..d46ca71a7855 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3027,7 +3027,12 @@ extern loff_t default_llseek(struct file *file, loff_t offset, int whence);
>  
>  extern loff_t vfs_llseek(struct file *file, loff_t offset, int whence);
>  
> -extern int inode_init_always(struct super_block *, struct inode *);
> +extern int inode_init_always_gfp(struct super_block *, struct inode *, gfp_t);
> +static inline int inode_init_always(struct super_block *sb, struct inode *inode)
> +{
> +	return inode_init_always_gfp(sb, inode, GFP_NOFS);
> +}
> +
>  extern void inode_init_once(struct inode *);
>  extern void address_space_init_once(struct address_space *mapping);
>  extern struct inode * igrab(struct inode *);
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index a2ade0ffe9e7..b08472d64765 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -150,6 +150,6 @@ extern struct lsm_info __start_early_lsm_info[], __end_early_lsm_info[];
>  		__used __section(".early_lsm_info.init")		\
>  		__aligned(sizeof(unsigned long))
>  
> -extern int lsm_inode_alloc(struct inode *inode);
> +extern int lsm_inode_alloc(struct inode *inode, gfp_t gfp);
>  
>  #endif /* ! __LINUX_LSM_HOOKS_H */
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 1390f1efb4f0..7c6b9b038a0d 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -336,7 +336,7 @@ int security_dentry_create_files_as(struct dentry *dentry, int mode,
>  					struct cred *new);
>  int security_path_notify(const struct path *path, u64 mask,
>  					unsigned int obj_type);
> -int security_inode_alloc(struct inode *inode);
> +int security_inode_alloc(struct inode *inode, gfp_t gfp);
>  void security_inode_free(struct inode *inode);
>  int security_inode_init_security(struct inode *inode, struct inode *dir,
>  				 const struct qstr *qstr,
> @@ -769,7 +769,7 @@ static inline int security_path_notify(const struct path *path, u64 mask,
>  	return 0;
>  }
>  
> -static inline int security_inode_alloc(struct inode *inode)
> +static inline int security_inode_alloc(struct inode *inode, gfp_t gfp)
>  {
>  	return 0;
>  }
> diff --git a/security/security.c b/security/security.c
> index 8cee5b6c6e6d..3581262da5ee 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -660,14 +660,14 @@ static int lsm_file_alloc(struct file *file)
>   *
>   * Returns 0, or -ENOMEM if memory can't be allocated.
>   */
> -int lsm_inode_alloc(struct inode *inode)
> +int lsm_inode_alloc(struct inode *inode, gfp_t gfp)
>  {
>  	if (!lsm_inode_cache) {
>  		inode->i_security = NULL;
>  		return 0;
>  	}
>  
> -	inode->i_security = kmem_cache_zalloc(lsm_inode_cache, GFP_NOFS);
> +	inode->i_security = kmem_cache_zalloc(lsm_inode_cache, gfp);
>  	if (inode->i_security == NULL)
>  		return -ENOMEM;
>  	return 0;
> @@ -1582,9 +1582,9 @@ int security_path_notify(const struct path *path, u64 mask,
>   *
>   * Return: Return 0 if operation was successful.
>   */
> -int security_inode_alloc(struct inode *inode)
> +int security_inode_alloc(struct inode *inode, gfp_t gfp)
>  {
> -	int rc = lsm_inode_alloc(inode);
> +	int rc = lsm_inode_alloc(inode, gfp);
>  
>  	if (unlikely(rc))
>  		return rc;
> -- 
> 2.46.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-28  4:09     ` Dave Chinner
@ 2024-08-29 10:02       ` Kent Overstreet
  2024-08-29 13:12         ` Dave Chinner
  0 siblings, 1 reply; 78+ messages in thread
From: Kent Overstreet @ 2024-08-29 10:02 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Michal Hocko, Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

On Wed, Aug 28, 2024 at 02:09:57PM GMT, Dave Chinner wrote:
> On Tue, Aug 27, 2024 at 08:15:43AM +0200, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new
> > inode to achieve GFP_NOWAIT semantic while holding locks. If this
> > allocation fails it will drop locks and use GFP_NOFS allocation context.
> > 
> > We would like to drop PF_MEMALLOC_NORECLAIM because it is really
> > dangerous to use if the caller doesn't control the full call chain with
> > this flag set. E.g. if any of the function down the chain needed
> > GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and
> > cause unexpected failure.
> > 
> > While this is not the case in this particular case using the scoped gfp
> > semantic is not really needed bacause we can easily pus the allocation
> > context down the chain without too much clutter.
> > 
> > Acked-by: Christoph Hellwig <hch@lst.de>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> Looks good to me.
> 
> Reviewed-by: Dave Chinner <dchinner@redhat.com>

Reposting what I wrote in the other thread:

This series is based on a fundamental misunderstanding of what a safe
API is: a safe API is not one that doesn't return errors, it's one that
never invokes undefined behaviour.

It was decided years ago that the scoped APIs were the better
replacement for the gfp APIs, and they need to exist.

This "GFP_NOFAIL exists therefore we can't tell the memory allocator
about situations wehre it would have to fail" is utter insanity - it's
the exact opposite of defining a safe API.

A safe API would be one where we /did/ always tell the memory allocator
when we're in non-sleepable context, and the allocator always returned
failure instead of context switching.

This is utter brain damage; rule #1 of kernel programming is that _you
check for errors_. If you don't know that your GFP_NOFAIL usage is in a
safe context (and that's not just memory reclaim context, it's also the
size of your alloction) then you have to check for errors.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-29 10:02       ` Kent Overstreet
@ 2024-08-29 13:12         ` Dave Chinner
  2024-08-29 13:22           ` Kent Overstreet
  2024-08-29 13:32           ` Kent Overstreet
  0 siblings, 2 replies; 78+ messages in thread
From: Dave Chinner @ 2024-08-29 13:12 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Michal Hocko, Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

On Thu, Aug 29, 2024 at 06:02:32AM -0400, Kent Overstreet wrote:
> On Wed, Aug 28, 2024 at 02:09:57PM GMT, Dave Chinner wrote:
> > On Tue, Aug 27, 2024 at 08:15:43AM +0200, Michal Hocko wrote:
> > > From: Michal Hocko <mhocko@suse.com>
> > > 
> > > bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new
> > > inode to achieve GFP_NOWAIT semantic while holding locks. If this
> > > allocation fails it will drop locks and use GFP_NOFS allocation context.
> > > 
> > > We would like to drop PF_MEMALLOC_NORECLAIM because it is really
> > > dangerous to use if the caller doesn't control the full call chain with
> > > this flag set. E.g. if any of the function down the chain needed
> > > GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and
> > > cause unexpected failure.
> > > 
> > > While this is not the case in this particular case using the scoped gfp
> > > semantic is not really needed bacause we can easily pus the allocation
> > > context down the chain without too much clutter.
> > > 
> > > Acked-by: Christoph Hellwig <hch@lst.de>
> > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > 
> > Looks good to me.
> > 
> > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 
> Reposting what I wrote in the other thread:

I've read the thread. I've heard what you have had to say. Like
several other people, I think your position is just not practical or
reasonable.

I don't care about the purity or the safety of the API - the
practical result of PF_MEMALLOC_NORECLAIM is that __GFP_NOFAIL
allocation can now fail and that will cause unexpected kernel
crashes.  Keeping existing code and API semantics working correctly
(i.e. regression free) takes precedence over new functionality or
API features that people want to introduce.

That's all there is to it. This is not a hill you need to die on.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-29 13:12         ` Dave Chinner
@ 2024-08-29 13:22           ` Kent Overstreet
  2024-08-29 13:32           ` Kent Overstreet
  1 sibling, 0 replies; 78+ messages in thread
From: Kent Overstreet @ 2024-08-29 13:22 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Michal Hocko, Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

On Thu, Aug 29, 2024 at 11:12:18PM GMT, Dave Chinner wrote:
> On Thu, Aug 29, 2024 at 06:02:32AM -0400, Kent Overstreet wrote:
> > On Wed, Aug 28, 2024 at 02:09:57PM GMT, Dave Chinner wrote:
> > > On Tue, Aug 27, 2024 at 08:15:43AM +0200, Michal Hocko wrote:
> > > > From: Michal Hocko <mhocko@suse.com>
> > > > 
> > > > bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new
> > > > inode to achieve GFP_NOWAIT semantic while holding locks. If this
> > > > allocation fails it will drop locks and use GFP_NOFS allocation context.
> > > > 
> > > > We would like to drop PF_MEMALLOC_NORECLAIM because it is really
> > > > dangerous to use if the caller doesn't control the full call chain with
> > > > this flag set. E.g. if any of the function down the chain needed
> > > > GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and
> > > > cause unexpected failure.
> > > > 
> > > > While this is not the case in this particular case using the scoped gfp
> > > > semantic is not really needed bacause we can easily pus the allocation
> > > > context down the chain without too much clutter.
> > > > 
> > > > Acked-by: Christoph Hellwig <hch@lst.de>
> > > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > 
> > > Looks good to me.
> > > 
> > > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > 
> > Reposting what I wrote in the other thread:
> 
> I've read the thread. I've heard what you have had to say. Like
> several other people, I think your position is just not practical or
> reasonable.
> 
> I don't care about the purity or the safety of the API - the
> practical result of PF_MEMALLOC_NORECLAIM is that __GFP_NOFAIL
> allocation can now fail and that will cause unexpected kernel
> crashes.  Keeping existing code and API semantics working correctly
> (i.e. regression free) takes precedence over new functionality or
> API features that people want to introduce.
> 
> That's all there is to it. This is not a hill you need to die on.

If you use GFP_NOFAIL in a context where you're not allowed to sleep,
that's a bug, same as any other bug where you get the gfp flags wrong
(e.g. GFP_KERNEL in non sleepable context).

This isn't going to affect you unless you start going around inserting
PF_MEMALLOC_NORECLAIM where it doesn't need to be. Why would you do
that?

But the lack of gfp flags for pte allocation means that this actually is
a serious gap we need to be fixing.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-29 13:12         ` Dave Chinner
  2024-08-29 13:22           ` Kent Overstreet
@ 2024-08-29 13:32           ` Kent Overstreet
  2024-08-29 14:03             ` Yafang Shao
  2024-09-02  0:23             ` Dave Chinner
  1 sibling, 2 replies; 78+ messages in thread
From: Kent Overstreet @ 2024-08-29 13:32 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Michal Hocko, Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

On Thu, Aug 29, 2024 at 11:12:18PM GMT, Dave Chinner wrote:
> On Thu, Aug 29, 2024 at 06:02:32AM -0400, Kent Overstreet wrote:
> > On Wed, Aug 28, 2024 at 02:09:57PM GMT, Dave Chinner wrote:
> > > On Tue, Aug 27, 2024 at 08:15:43AM +0200, Michal Hocko wrote:
> > > > From: Michal Hocko <mhocko@suse.com>
> > > > 
> > > > bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new
> > > > inode to achieve GFP_NOWAIT semantic while holding locks. If this
> > > > allocation fails it will drop locks and use GFP_NOFS allocation context.
> > > > 
> > > > We would like to drop PF_MEMALLOC_NORECLAIM because it is really
> > > > dangerous to use if the caller doesn't control the full call chain with
> > > > this flag set. E.g. if any of the function down the chain needed
> > > > GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and
> > > > cause unexpected failure.
> > > > 
> > > > While this is not the case in this particular case using the scoped gfp
> > > > semantic is not really needed bacause we can easily pus the allocation
> > > > context down the chain without too much clutter.
> > > > 
> > > > Acked-by: Christoph Hellwig <hch@lst.de>
> > > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > 
> > > Looks good to me.
> > > 
> > > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > 
> > Reposting what I wrote in the other thread:
> 
> I've read the thread. I've heard what you have had to say. Like
> several other people, I think your position is just not practical or
> reasonable.
> 
> I don't care about the purity or the safety of the API - the
> practical result of PF_MEMALLOC_NORECLAIM is that __GFP_NOFAIL
> allocation can now fail and that will cause unexpected kernel
> crashes.  Keeping existing code and API semantics working correctly
> (i.e. regression free) takes precedence over new functionality or
> API features that people want to introduce.
> 
> That's all there is to it. This is not a hill you need to die on.

And more than that, this is coming from you saying "We didn't have to
handle memory allocation failures in IRIX, why can't we be like IRIX?
All those error paths are a pain to test, why can't we get rid of them?"

Except that's bullshit; at the very least any dynamically sized
allocation _definitely_ has to have an error path that's tested, and if
there's questions about the context a code path might run in, that
that's another reason.

GFP_NOFAIL is the problem here, and if it's encouraging this brain
damaged "why can't we just get rid of error paths?" thinking, then it
should be removed.

Error paths have to exist, and they have to be tested.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-29 13:32           ` Kent Overstreet
@ 2024-08-29 14:03             ` Yafang Shao
  2024-09-02  0:23             ` Dave Chinner
  1 sibling, 0 replies; 78+ messages in thread
From: Yafang Shao @ 2024-08-29 14:03 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Dave Chinner, Michal Hocko, Andrew Morton, Christoph Hellwig,
	jack, Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

On Thu, Aug 29, 2024 at 9:32 PM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
>
> On Thu, Aug 29, 2024 at 11:12:18PM GMT, Dave Chinner wrote:
> > On Thu, Aug 29, 2024 at 06:02:32AM -0400, Kent Overstreet wrote:
> > > On Wed, Aug 28, 2024 at 02:09:57PM GMT, Dave Chinner wrote:
> > > > On Tue, Aug 27, 2024 at 08:15:43AM +0200, Michal Hocko wrote:
> > > > > From: Michal Hocko <mhocko@suse.com>
> > > > >
> > > > > bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new
> > > > > inode to achieve GFP_NOWAIT semantic while holding locks. If this
> > > > > allocation fails it will drop locks and use GFP_NOFS allocation context.
> > > > >
> > > > > We would like to drop PF_MEMALLOC_NORECLAIM because it is really
> > > > > dangerous to use if the caller doesn't control the full call chain with
> > > > > this flag set. E.g. if any of the function down the chain needed
> > > > > GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and
> > > > > cause unexpected failure.
> > > > >
> > > > > While this is not the case in this particular case using the scoped gfp
> > > > > semantic is not really needed bacause we can easily pus the allocation
> > > > > context down the chain without too much clutter.
> > > > >
> > > > > Acked-by: Christoph Hellwig <hch@lst.de>
> > > > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > >
> > > > Looks good to me.
> > > >
> > > > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > >
> > > Reposting what I wrote in the other thread:
> >
> > I've read the thread. I've heard what you have had to say. Like
> > several other people, I think your position is just not practical or
> > reasonable.
> >
> > I don't care about the purity or the safety of the API - the
> > practical result of PF_MEMALLOC_NORECLAIM is that __GFP_NOFAIL
> > allocation can now fail and that will cause unexpected kernel
> > crashes.  Keeping existing code and API semantics working correctly
> > (i.e. regression free) takes precedence over new functionality or
> > API features that people want to introduce.
> >
> > That's all there is to it. This is not a hill you need to die on.
>
> And more than that, this is coming from you saying "We didn't have to
> handle memory allocation failures in IRIX, why can't we be like IRIX?
> All those error paths are a pain to test, why can't we get rid of them?"
>
> Except that's bullshit; at the very least any dynamically sized
> allocation _definitely_ has to have an error path that's tested, and if
> there's questions about the context a code path might run in, that
> that's another reason.
>
> GFP_NOFAIL is the problem here, and if it's encouraging this brain
> damaged "why can't we just get rid of error paths?" thinking, then it
> should be removed.
>
> Error paths have to exist, and they have to be tested.

I completely agree.
Adding a dead loop in the core of the page allocator just to bypass
error handling is a reckless idea.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 2/2] mm: drop PF_MEMALLOC_NORECLAIM
  2024-08-26 19:18         ` Michal Hocko
  2024-08-26 19:20           ` Matthew Wilcox
  2024-08-28  4:11           ` Dave Chinner
@ 2024-08-29 21:45           ` Vlastimil Babka
  2 siblings, 0 replies; 78+ messages in thread
From: Vlastimil Babka @ 2024-08-29 21:45 UTC (permalink / raw)
  To: Michal Hocko, Matthew Wilcox
  Cc: Andrew Morton, Christoph Hellwig, Yafang Shao, Kent Overstreet,
	jack, Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On 8/26/24 21:18, Michal Hocko wrote:
> On Mon 26-08-24 18:49:23, Matthew Wilcox wrote:
>> On Mon, Aug 26, 2024 at 06:51:55PM +0200, Michal Hocko wrote:
> [...]
>> > If a plan revert is preferably, I will go with it.
>> 
>> There aren't any other users of PF_MEMALLOC_NOWARN and it definitely
>> seems like something you want at a callsite rather than blanket for every
>> allocation below this point.  We don't seem to have many PF_ flags left,
>> so let's not keep it around if there's no immediate plans for it.
> 
> Good point. What about this?
> --- 
> From 923cd429d4b1a3520c93bcf46611ae74a3158865 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Mon, 26 Aug 2024 21:15:02 +0200
> Subject: [PATCH] Revert "mm: introduce PF_MEMALLOC_NORECLAIM,
>  PF_MEMALLOC_NOWARN"
> 
> This reverts commit eab0af905bfc3e9c05da2ca163d76a1513159aa4.
> 
> There is no existing user of those flags. PF_MEMALLOC_NOWARN is
> dangerous because a nested allocation context can use GFP_NOFAIL which
> could cause unexpected failure. Such a code would be hard to maintain
> because it could be deeper in the call chain.
> 
> PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1]
> that such a allocation contex is inherently unsafe if the context
> doesn't fully control all allocations called from this context.
> 
> While PF_MEMALLOC_NOWARN is not dangerous the way PF_MEMALLOC_NORECLAIM
> is it doesn't have any user and as Matthew has pointed out we are
> running out of those flags so better reclaim it without any real users.
> 
> [1] https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  include/linux/sched.h    |  4 ++--
>  include/linux/sched/mm.h | 17 ++++-------------
>  2 files changed, 6 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index f8d150343d42..731ff1078c9e 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1657,8 +1657,8 @@ extern struct pid *cad_pid;
>  						 * I am cleaning dirty pages from some other bdi. */
>  #define PF_KTHREAD		0x00200000	/* I am a kernel thread */
>  #define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
> -#define PF_MEMALLOC_NORECLAIM	0x00800000	/* All allocation requests will clear __GFP_DIRECT_RECLAIM */
> -#define PF_MEMALLOC_NOWARN	0x01000000	/* All allocation requests will inherit __GFP_NOWARN */
> +#define PF__HOLE__00800000	0x00800000
> +#define PF__HOLE__01000000	0x01000000
>  #define PF__HOLE__02000000	0x02000000
>  #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
>  #define PF_MCE_EARLY		0x08000000      /* Early kill for mce process policy */
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 91546493c43d..07c4fde32827 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -258,25 +258,16 @@ static inline gfp_t current_gfp_context(gfp_t flags)
>  {
>  	unsigned int pflags = READ_ONCE(current->flags);
>  
> -	if (unlikely(pflags & (PF_MEMALLOC_NOIO |
> -			       PF_MEMALLOC_NOFS |
> -			       PF_MEMALLOC_NORECLAIM |
> -			       PF_MEMALLOC_NOWARN |
> -			       PF_MEMALLOC_PIN))) {
> +	if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | PF_MEMALLOC_PIN))) {
>  		/*
> -		 * Stronger flags before weaker flags:
> -		 * NORECLAIM implies NOIO, which in turn implies NOFS
> +		 * NOIO implies both NOIO and NOFS and it is a weaker context
> +		 * so always make sure it makes precedence
>  		 */
> -		if (pflags & PF_MEMALLOC_NORECLAIM)
> -			flags &= ~__GFP_DIRECT_RECLAIM;
> -		else if (pflags & PF_MEMALLOC_NOIO)
> +		if (pflags & PF_MEMALLOC_NOIO)
>  			flags &= ~(__GFP_IO | __GFP_FS);
>  		else if (pflags & PF_MEMALLOC_NOFS)
>  			flags &= ~__GFP_FS;
>  
> -		if (pflags & PF_MEMALLOC_NOWARN)
> -			flags |= __GFP_NOWARN;
> -
>  		if (pflags & PF_MEMALLOC_PIN)
>  			flags &= ~__GFP_MOVABLE;
>  	}


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-08-29 13:32           ` Kent Overstreet
  2024-08-29 14:03             ` Yafang Shao
@ 2024-09-02  0:23             ` Dave Chinner
  2024-09-02  1:35               ` Kent Overstreet
  1 sibling, 1 reply; 78+ messages in thread
From: Dave Chinner @ 2024-09-02  0:23 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Michal Hocko, Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

On Thu, Aug 29, 2024 at 09:32:52AM -0400, Kent Overstreet wrote:
> On Thu, Aug 29, 2024 at 11:12:18PM GMT, Dave Chinner wrote:
> > On Thu, Aug 29, 2024 at 06:02:32AM -0400, Kent Overstreet wrote:
> > > On Wed, Aug 28, 2024 at 02:09:57PM GMT, Dave Chinner wrote:
> > > > On Tue, Aug 27, 2024 at 08:15:43AM +0200, Michal Hocko
> > > > wrote:
> > > > > From: Michal Hocko <mhocko@suse.com>
> > > > > 
> > > > > bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to
> > > > > allocate a new inode to achieve GFP_NOWAIT semantic while
> > > > > holding locks. If this allocation fails it will drop locks
> > > > > and use GFP_NOFS allocation context.
> > > > > 
> > > > > We would like to drop PF_MEMALLOC_NORECLAIM because it is
> > > > > really dangerous to use if the caller doesn't control the
> > > > > full call chain with this flag set. E.g. if any of the
> > > > > function down the chain needed GFP_NOFAIL request the
> > > > > PF_MEMALLOC_NORECLAIM would override this and cause
> > > > > unexpected failure.
> > > > > 
> > > > > While this is not the case in this particular case using
> > > > > the scoped gfp semantic is not really needed bacause we
> > > > > can easily pus the allocation context down the chain
> > > > > without too much clutter.
> > > > > 
> > > > > Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by:
> > > > > Michal Hocko <mhocko@suse.com>
> > > > 
> > > > Looks good to me.
> > > > 
> > > > Reviewed-by: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Reposting what I wrote in the other thread:
> > 
> > I've read the thread. I've heard what you have had to say. Like
> > several other people, I think your position is just not
> > practical or reasonable.
> > 
> > I don't care about the purity or the safety of the API - the
> > practical result of PF_MEMALLOC_NORECLAIM is that __GFP_NOFAIL
> > allocation can now fail and that will cause unexpected kernel
> > crashes.  Keeping existing code and API semantics working
> > correctly (i.e. regression free) takes precedence over new
> > functionality or API features that people want to introduce.
> > 
> > That's all there is to it. This is not a hill you need to die
> > on.
> 
> And more than that, this is coming from you saying "We didn't have
> to handle memory allocation failures in IRIX, why can't we be like
> IRIX?  All those error paths are a pain to test, why can't we get
> rid of them?"
>
You're not listening, Kent. We are not eliding error paths because
they aren't (or cannot be) tested.

It's a choice between whether a transient error (e.g. ENOMEM) should
be considered a fatal error or not. The architectural choice that
was made for XFS back in the 1990s was that the filesystem should
never fail when transient errors occur. The choice was to wait for
the transient error to go away and then continue on. The rest of the
filesystem was build around these fundamental behavioural choices.

This goes beyond memory allocation - we do it for IO errors, too.
e.g.  metadata writeback keeps trying to write back the metadata
repeatedly on -EIO.  On every EIO from a metadata write, we will
immediately attempt a rewrite without a backoff. If that rewrite
then fails, wei requeue the write for later resubmission. That means
we back off and wait for up to 30s before attempting the next
rewrite. 

Hence -EIO  on async metadata writeback won't fail/shutdown the
filesystem until a (configurable) number of repeated failures occurs
or the filesystem unmounts before the metadata could be written back
successfully.

There's good reason for this "wait for transients to resolve" method
of error handling - go back to the late 1990s and early 2000s and
high-end multi-path FC SAN based storage was well known to have
transient path failures that can take minutes to resolve before a
secondary path takes over. That was the sort of storage environment
XFS was designed to operate in, and those users expected the
filesystem to be extremely tolerant of transient failure conditions.

Hence failing an IO and shutting down the filesystem because there
are transient errors occuring in either the storage or the OS was
absolutely the wrong thing to be doing. It still is the wrong thing
to be doing - we want to wait until the transient error has
progressed to being classified as a permanent error before we take
drastic action like denying service to the filesystem.

Memory allocation failure has always been considered a transient
error by XFS that the OS will resolve in one way or another in a
realtively short period of time. If we're prepared to wait minutes
for IO path failures to resolve, waiting a couple of seconds for
transient low memory situations to resolve isn't a big deal.

Ranting about how we handle errors without understanding the error
model we are working within is not productive. bcachefs has a
different error handling model to almost every other filesystem out
there, but that doesn't mean every other filesystem must work the
same way that bcachefs does.

If changing this transient error handling model was as simple as
detecting an allocation failure and returning -ENOMEM, we would have
done that 20 years ago. But it isn't - the error handling model is
"block until transients resolve" so that the error handling paths
only need to handle fatal errors.

Therein lies the problem - those error handling paths need to be
substantially changed to be able to handle transient errors such as
ENOMEM. We'd need to either be able to back out of a dirty
transaction or restart the transaction in some way rather than
shutting down the filesystem.

Put simply: reclassifying ENOMEM from a "wait for transient to
resolve" handler to a "back out and restart" mechanism like bcachefs
uses requires re-architecting the entire log item architecture for
metadata modification tracking and journal space management.

Even if I knew how to implement this right now, it would require
years worth of engineering effort and resources before it would be
completed and ready for merge. Then it will take years more for all
the existing kernels to cycle out of production.

Further: this "ENOMEM is transient so retry" model has been used
without any significant issues in production systems for mission
critical infrastructure for the past 25+ years. There's a very
strong "if it ain't broke, don't fix it" argument to be made here.
The cost-benefit analysis comes out very strongly on the side of
keeping __GFP_NOFAIL semantics as they currently stand.

> Except that's bullshit; at the very least any dynamically sized
> allocation _definitely_ has to have an error path that's tested, and if
> there's questions about the context a code path might run in, that
> that's another reason.

We know the context we run __GFP_NOFAIL allocations in - transaction
context absolutely requires a task context because we take sleeping
locks, submit and wait on IO, do blocking memory allocation, etc. We
also know the size of the allocations because we've bounds checked
everything before we do an allocation.

Hence this argument of "__GFP_NOFAIL aboslutely requires error
checking because an invalid size or wonrg context might be used"
is completely irrelevant to XFS. If you call into the filesytsem
from an atomic context, you've lost long before we get to memory
allocation because filesystems take sleeping locks....

> GFP_NOFAIL is the problem here, and if it's encouraging this brain
> damaged "why can't we just get rid of error paths?" thinking, then it
> should be removed.
>
> Error paths have to exist, and they have to be tested.

__GFP_NOFAIL is *not the problem*, and we are not "avoiding error
handling".  Backing off, looping and trying again is a valid
mechanism for handling transient failure conditions. Having a flag
that tells the allocator to "backoff, loop and try again" is a
perfectly good way of providing a generic error handling mechanism.

IOWs, Using __GFP_NOFAIL doesn't mean we are "not handling errors";
it simply means we have moved the error handling we were doing
inside the allocator.  And yes, we test the hell out of this error
handling path....

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-09-02  0:23             ` Dave Chinner
@ 2024-09-02  1:35               ` Kent Overstreet
  2024-09-02  8:41                 ` Michal Hocko
  0 siblings, 1 reply; 78+ messages in thread
From: Kent Overstreet @ 2024-09-02  1:35 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Michal Hocko, Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Michal Hocko

On Mon, Sep 02, 2024 at 10:23:06AM GMT, Dave Chinner wrote:
> On Thu, Aug 29, 2024 at 09:32:52AM -0400, Kent Overstreet wrote:
> > And more than that, this is coming from you saying "We didn't have
> > to handle memory allocation failures in IRIX, why can't we be like
> > IRIX?  All those error paths are a pain to test, why can't we get
> > rid of them?"
> >
> You're not listening, Kent. We are not eliding error paths because
> they aren't (or cannot be) tested.
> 
> It's a choice between whether a transient error (e.g. ENOMEM) should
> be considered a fatal error or not. The architectural choice that
> was made for XFS back in the 1990s was that the filesystem should
> never fail when transient errors occur. The choice was to wait for
> the transient error to go away and then continue on. The rest of the
> filesystem was build around these fundamental behavioural choices.

<snipped>

> Hence failing an IO and shutting down the filesystem because there
> are transient errors occuring in either the storage or the OS was
> absolutely the wrong thing to be doing. It still is the wrong thing
> to be doing - we want to wait until the transient error has
> progressed to being classified as a permanent error before we take
> drastic action like denying service to the filesystem.

Two different things are getting conflated here.

I'm saying forcefully that __GFP_NOFAIL isn't a good design decision,
and it's bad for system behaviour when we're thrashing, because you,
willy and others have been talking openly about making something like
that the norm, and that concerns me.

I'm _not_ saying that you should have to rearchitect your codebase to
deal with transient -ENOMEMs... that's clearly not going to happen.

But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL
in the case of bugs, because that's going to be an improvement w.r.t.
system robustness, in exactly the same way we don't use BUG_ON() if it's
something that we can't guarantee won't happen in the wild - we WARN()
and try to handle the error as best we can.

If we could someday get PF_MEMALLOC_NORECLAIM annotated everywhere we're
not in a sleepable context (or more likely, teach kmalloc() about
preempt count) - that turns a whole class of "scheduling while atomic"
bugs into recoverable errors. That'd be a good thing.

In XFS's case, the only thing you'd ever want to do on failure of a
__GFP_NOFAIL allocation is do an emergency shutdown - the code is buggy,
and those bugs tend to be quickly found and fixed (even quicker when the
system stays usable and the user can collect a backtrace).

> > Except that's bullshit; at the very least any dynamically sized
> > allocation _definitely_ has to have an error path that's tested, and if
> > there's questions about the context a code path might run in, that
> > that's another reason.
> 
> We know the context we run __GFP_NOFAIL allocations in - transaction
> context absolutely requires a task context because we take sleeping
> locks, submit and wait on IO, do blocking memory allocation, etc. We
> also know the size of the allocations because we've bounds checked
> everything before we do an allocation.

*nod*

> Hence this argument of "__GFP_NOFAIL aboslutely requires error
> checking because an invalid size or wonrg context might be used"
> is completely irrelevant to XFS. If you call into the filesytsem
> from an atomic context, you've lost long before we get to memory
> allocation because filesystems take sleeping locks....

Yeah, if you know the context of your allocation and you have bounds on
the allocation size that's all fine - that's your call.

_As a general policy_, I will say that even __GFP_NOFAIL allocations
should have error paths - becuase lots of allocations have sizes that
userspace can control, so it becomes a security issue and we need to be
careful what we tell people.

> > GFP_NOFAIL is the problem here, and if it's encouraging this brain
> > damaged "why can't we just get rid of error paths?" thinking, then it
> > should be removed.
> >
> > Error paths have to exist, and they have to be tested.
> 
> __GFP_NOFAIL is *not the problem*, and we are not "avoiding error
> handling".  Backing off, looping and trying again is a valid
> mechanism for handling transient failure conditions. Having a flag
> that tells the allocator to "backoff, loop and try again" is a
> perfectly good way of providing a generic error handling mechanism.

But we do have to differentiate between transient allocation failures
and failures that are a result of bugs. Because just looping means a
buggy allocation call becomes, at best, a hung task you can't kill.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-09-02  1:35               ` Kent Overstreet
@ 2024-09-02  8:41                 ` Michal Hocko
  2024-09-02  8:52                   ` Kent Overstreet
  0 siblings, 1 reply; 78+ messages in thread
From: Michal Hocko @ 2024-09-02  8:41 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Dave Chinner, Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On Sun 01-09-24 21:35:30, Kent Overstreet wrote:
[...]
> But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL
> in the case of bugs, because that's going to be an improvement w.r.t.
> system robustness, in exactly the same way we don't use BUG_ON() if it's
> something that we can't guarantee won't happen in the wild - we WARN()
> and try to handle the error as best we can.

We have discussed that in a different email thread. And I have to say
that I am not convinced that returning NULL makes a broken code much
better. Why? Because we can expect that broken NOFAIL users will not have a
error checking path. Even valid NOFAIL users will not have one because
they _know_ they do not have a different than retry for ever recovery
path. 

That means that an unexpected NULL return either means OOPS or a subtle
silent error - e.g. memory corruption. The former is a actually a saner
recovery model because the execution is stopped before more harm can be
done. I suspect most of those buggy users will simply OOPS but
systematically checking for latter is a lot of work and needs to be
constantly because code evolves...

I have tried to argue that if allocator cannot or refuse to satisfy
GFP_NOFAIL request because it is trying to use unsupported allocation
mode or size then we should terminate the allocation context. That would
make the API more predictable and therefore safer to use.

This is not what the allocator does today though. Atomic NOFAIL
allocations fail same as kvmalloc requests which are clearly overflows.
Especially the later could become a risk if they are reachable from the
userspace with controlable allocation size.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-09-02  8:41                 ` Michal Hocko
@ 2024-09-02  8:52                   ` Kent Overstreet
  2024-09-02  9:39                     ` Michal Hocko
  0 siblings, 1 reply; 78+ messages in thread
From: Kent Overstreet @ 2024-09-02  8:52 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Dave Chinner, Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote:
> On Sun 01-09-24 21:35:30, Kent Overstreet wrote:
> [...]
> > But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL
> > in the case of bugs, because that's going to be an improvement w.r.t.
> > system robustness, in exactly the same way we don't use BUG_ON() if it's
> > something that we can't guarantee won't happen in the wild - we WARN()
> > and try to handle the error as best we can.
> 
> We have discussed that in a different email thread. And I have to say
> that I am not convinced that returning NULL makes a broken code much
> better. Why? Because we can expect that broken NOFAIL users will not have a
> error checking path. Even valid NOFAIL users will not have one because
> they _know_ they do not have a different than retry for ever recovery
> path. 

You mean where I asked you for a link to the discussion and rationale
you claimed had happened? Still waiting on that

> That means that an unexpected NULL return either means OOPS or a subtle
> silent error - e.g. memory corruption. The former is a actually a saner
> recovery model because the execution is stopped before more harm can be
> done. I suspect most of those buggy users will simply OOPS but
> systematically checking for latter is a lot of work and needs to be
> constantly because code evolves...
> 
> I have tried to argue that if allocator cannot or refuse to satisfy
> GFP_NOFAIL request because it is trying to use unsupported allocation
> mode or size then we should terminate the allocation context. That would
> make the API more predictable and therefore safer to use.

You're arguing that we treat it like an oops?

Yeah, enough of this insanity.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-09-02  8:52                   ` Kent Overstreet
@ 2024-09-02  9:39                     ` Michal Hocko
  2024-09-02  9:51                       ` Kent Overstreet
  0 siblings, 1 reply; 78+ messages in thread
From: Michal Hocko @ 2024-09-02  9:39 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Dave Chinner, Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On Mon 02-09-24 04:52:49, Kent Overstreet wrote:
> On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote:
> > On Sun 01-09-24 21:35:30, Kent Overstreet wrote:
> > [...]
> > > But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL
> > > in the case of bugs, because that's going to be an improvement w.r.t.
> > > system robustness, in exactly the same way we don't use BUG_ON() if it's
> > > something that we can't guarantee won't happen in the wild - we WARN()
> > > and try to handle the error as best we can.
> > 
> > We have discussed that in a different email thread. And I have to say
> > that I am not convinced that returning NULL makes a broken code much
> > better. Why? Because we can expect that broken NOFAIL users will not have a
> > error checking path. Even valid NOFAIL users will not have one because
> > they _know_ they do not have a different than retry for ever recovery
> > path. 
> 
> You mean where I asked you for a link to the discussion and rationale
> you claimed had happened? Still waiting on that

I am not your assistent to be tasked and search through lore archives.
Find one if you need that.

Anyway, if you read the email and even tried to understand what is
written there rather than immediately started shouting a response then
you would have noticed I have put actual arguments here. You are free to
disagree with them and lay down your arguments. You have decided to

[...]

> Yeah, enough of this insanity.

so I do not think you are able to do that. Again...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-09-02  9:39                     ` Michal Hocko
@ 2024-09-02  9:51                       ` Kent Overstreet
  2024-09-02 14:07                         ` Jonathan Corbet
                                           ` (2 more replies)
  0 siblings, 3 replies; 78+ messages in thread
From: Kent Overstreet @ 2024-09-02  9:51 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Dave Chinner, Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote:
> On Mon 02-09-24 04:52:49, Kent Overstreet wrote:
> > On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote:
> > > On Sun 01-09-24 21:35:30, Kent Overstreet wrote:
> > > [...]
> > > > But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL
> > > > in the case of bugs, because that's going to be an improvement w.r.t.
> > > > system robustness, in exactly the same way we don't use BUG_ON() if it's
> > > > something that we can't guarantee won't happen in the wild - we WARN()
> > > > and try to handle the error as best we can.
> > > 
> > > We have discussed that in a different email thread. And I have to say
> > > that I am not convinced that returning NULL makes a broken code much
> > > better. Why? Because we can expect that broken NOFAIL users will not have a
> > > error checking path. Even valid NOFAIL users will not have one because
> > > they _know_ they do not have a different than retry for ever recovery
> > > path. 
> > 
> > You mean where I asked you for a link to the discussion and rationale
> > you claimed had happened? Still waiting on that
> 
> I am not your assistent to be tasked and search through lore archives.
> Find one if you need that.
> 
> Anyway, if you read the email and even tried to understand what is
> written there rather than immediately started shouting a response then
> you would have noticed I have put actual arguments here. You are free to
> disagree with them and lay down your arguments. You have decided to
> 
> [...]
> 
> > Yeah, enough of this insanity.
> 
> so I do not think you are able to do that. Again...

Michal, if you think crashing processes is an acceptable alternative to
error handling _you have no business writing kernel code_.

You have been stridently arguing for one bad idea after another, and
it's an insult to those of us who do give a shit about writing reliable
software.

You're arguing against basic precepts of kernel programming.

Get your head examined. And get the fuck out of here with this shit.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-09-02  9:51                       ` Kent Overstreet
@ 2024-09-02 14:07                         ` Jonathan Corbet
  2024-09-04 18:01                         ` Shuah Khan
  2024-11-22 21:48                         ` Dan Williams
  2 siblings, 0 replies; 78+ messages in thread
From: Jonathan Corbet @ 2024-09-02 14:07 UTC (permalink / raw)
  To: Kent Overstreet, Michal Hocko
  Cc: Dave Chinner, Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel

Kent Overstreet <kent.overstreet@linux.dev> writes:

> You're arguing against basic precepts of kernel programming.
>
> Get your head examined. And get the fuck out of here with this shit.

Kent, this is not the way to deal with your peers in this community, no
matter how strongly you disagree with them.  Might I suggest calming
down a bit, followed by an apology?

Thanks,

jon

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-09-02  9:51                       ` Kent Overstreet
  2024-09-02 14:07                         ` Jonathan Corbet
@ 2024-09-04 18:01                         ` Shuah Khan
  2024-11-20 20:34                           ` Kent Overstreet
  2024-11-22 21:48                         ` Dan Williams
  2 siblings, 1 reply; 78+ messages in thread
From: Shuah Khan @ 2024-09-04 18:01 UTC (permalink / raw)
  To: Kent Overstreet, Michal Hocko
  Cc: Dave Chinner, Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, Shuah Khan,
	conduct@kernel.org

On 9/2/24 03:51, Kent Overstreet wrote:
> On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote:
>> On Mon 02-09-24 04:52:49, Kent Overstreet wrote:
>>> On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote:
>>>> On Sun 01-09-24 21:35:30, Kent Overstreet wrote:
>>>> [...]
>>>>> But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL
>>>>> in the case of bugs, because that's going to be an improvement w.r.t.
>>>>> system robustness, in exactly the same way we don't use BUG_ON() if it's
>>>>> something that we can't guarantee won't happen in the wild - we WARN()
>>>>> and try to handle the error as best we can.
>>>>
>>>> We have discussed that in a different email thread. And I have to say
>>>> that I am not convinced that returning NULL makes a broken code much
>>>> better. Why? Because we can expect that broken NOFAIL users will not have a
>>>> error checking path. Even valid NOFAIL users will not have one because
>>>> they _know_ they do not have a different than retry for ever recovery
>>>> path.
>>>
>>> You mean where I asked you for a link to the discussion and rationale
>>> you claimed had happened? Still waiting on that
>>
>> I am not your assistent to be tasked and search through lore archives.
>> Find one if you need that.
>>
>> Anyway, if you read the email and even tried to understand what is
>> written there rather than immediately started shouting a response then
>> you would have noticed I have put actual arguments here. You are free to
>> disagree with them and lay down your arguments. You have decided to
>>
>> [...]
>>
>>> Yeah, enough of this insanity.
>>
>> so I do not think you are able to do that. Again...
> 
> Michal, if you think crashing processes is an acceptable alternative to
> error handling _you have no business writing kernel code_.
> 
> You have been stridently arguing for one bad idea after another, and
> it's an insult to those of us who do give a shit about writing reliable
> software.
> 
> You're arguing against basic precepts of kernel programming.
> 
> Get your head examined. And get the fuck out of here with this shit.
> 

Kent,

Using language like this is clearly unacceptable and violates the
Code of Conduct. This type of language doesn't promote respectful
and productive discussions and is detrimental to the health of the
community.

You should be well aware that this type of language and personal
attack is a clear violation of the Linux kernel Contributor Covenant
Code of Conduct as outlined in the following:

https://www.kernel.org/doc/html/latest/process/code-of-conduct.html

Refer to the Code of Conduct and refrain from violating the Code of
Conduct in the future.

thanks,
-- Shuah (On behalf of the Code of Conduct Committee)

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-09-04 18:01                         ` Shuah Khan
@ 2024-11-20 20:34                           ` Kent Overstreet
  2024-11-20 21:12                             ` Shuah Khan
  0 siblings, 1 reply; 78+ messages in thread
From: Kent Overstreet @ 2024-11-20 20:34 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Michal Hocko, Dave Chinner, Andrew Morton, Christoph Hellwig,
	Yafang Shao, jack, Christian Brauner, Alexander Viro, Paul Moore,
	James Morris, Serge E. Hallyn, linux-fsdevel, linux-mm,
	linux-bcachefs, linux-security-module, linux-kernel,
	conduct@kernel.org

On Wed, Sep 04, 2024 at 12:01:50PM -0600, Shuah Khan wrote:
> On 9/2/24 03:51, Kent Overstreet wrote:
> > On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote:
> > > On Mon 02-09-24 04:52:49, Kent Overstreet wrote:
> > > > On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote:
> > > > > On Sun 01-09-24 21:35:30, Kent Overstreet wrote:
> > > > > [...]
> > > > > > But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL
> > > > > > in the case of bugs, because that's going to be an improvement w.r.t.
> > > > > > system robustness, in exactly the same way we don't use BUG_ON() if it's
> > > > > > something that we can't guarantee won't happen in the wild - we WARN()
> > > > > > and try to handle the error as best we can.
> > > > > 
> > > > > We have discussed that in a different email thread. And I have to say
> > > > > that I am not convinced that returning NULL makes a broken code much
> > > > > better. Why? Because we can expect that broken NOFAIL users will not have a
> > > > > error checking path. Even valid NOFAIL users will not have one because
> > > > > they _know_ they do not have a different than retry for ever recovery
> > > > > path.
> > > > 
> > > > You mean where I asked you for a link to the discussion and rationale
> > > > you claimed had happened? Still waiting on that
> > > 
> > > I am not your assistent to be tasked and search through lore archives.
> > > Find one if you need that.
> > > 
> > > Anyway, if you read the email and even tried to understand what is
> > > written there rather than immediately started shouting a response then
> > > you would have noticed I have put actual arguments here. You are free to
> > > disagree with them and lay down your arguments. You have decided to
> > > 
> > > [...]
> > > 
> > > > Yeah, enough of this insanity.
> > > 
> > > so I do not think you are able to do that. Again...
> > 
> > Michal, if you think crashing processes is an acceptable alternative to
> > error handling _you have no business writing kernel code_.
> > 
> > You have been stridently arguing for one bad idea after another, and
> > it's an insult to those of us who do give a shit about writing reliable
> > software.
> > 
> > You're arguing against basic precepts of kernel programming.
> > 
> > Get your head examined. And get the fuck out of here with this shit.
> > 
> 
> Kent,
> 
> Using language like this is clearly unacceptable and violates the
> Code of Conduct. This type of language doesn't promote respectful
> and productive discussions and is detrimental to the health of the
> community.
> 
> You should be well aware that this type of language and personal
> attack is a clear violation of the Linux kernel Contributor Covenant
> Code of Conduct as outlined in the following:
> 
> https://www.kernel.org/doc/html/latest/process/code-of-conduct.html
> 
> Refer to the Code of Conduct and refrain from violating the Code of
> Conduct in the future.

I believe Michal and I have more or less worked this out privately (and
you guys have been copied on that as well).

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-20 20:34                           ` Kent Overstreet
@ 2024-11-20 21:12                             ` Shuah Khan
  2024-11-20 21:20                               ` Kent Overstreet
  0 siblings, 1 reply; 78+ messages in thread
From: Shuah Khan @ 2024-11-20 21:12 UTC (permalink / raw)
  To: Kent Overstreet, Michal Hocko
  Cc: Dave Chinner, Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, conduct@kernel.org,
	Shuah Khan

On 11/20/24 13:34, Kent Overstreet wrote:
> On Wed, Sep 04, 2024 at 12:01:50PM -0600, Shuah Khan wrote:
>> On 9/2/24 03:51, Kent Overstreet wrote:
>>> On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote:
>>>> On Mon 02-09-24 04:52:49, Kent Overstreet wrote:
>>>>> On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote:
>>>>>> On Sun 01-09-24 21:35:30, Kent Overstreet wrote:
>>>>>> [...]
>>>>>>> But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL
>>>>>>> in the case of bugs, because that's going to be an improvement w.r.t.
>>>>>>> system robustness, in exactly the same way we don't use BUG_ON() if it's
>>>>>>> something that we can't guarantee won't happen in the wild - we WARN()
>>>>>>> and try to handle the error as best we can.
>>>>>>
>>>>>> We have discussed that in a different email thread. And I have to say
>>>>>> that I am not convinced that returning NULL makes a broken code much
>>>>>> better. Why? Because we can expect that broken NOFAIL users will not have a
>>>>>> error checking path. Even valid NOFAIL users will not have one because
>>>>>> they _know_ they do not have a different than retry for ever recovery
>>>>>> path.
>>>>>
>>>>> You mean where I asked you for a link to the discussion and rationale
>>>>> you claimed had happened? Still waiting on that
>>>>
>>>> I am not your assistent to be tasked and search through lore archives.
>>>> Find one if you need that.
>>>>
>>>> Anyway, if you read the email and even tried to understand what is
>>>> written there rather than immediately started shouting a response then
>>>> you would have noticed I have put actual arguments here. You are free to
>>>> disagree with them and lay down your arguments. You have decided to
>>>>
>>>> [...]
>>>>
>>>>> Yeah, enough of this insanity.
>>>>
>>>> so I do not think you are able to do that. Again...
>>>
>>> Michal, if you think crashing processes is an acceptable alternative to
>>> error handling _you have no business writing kernel code_.
>>>
>>> You have been stridently arguing for one bad idea after another, and
>>> it's an insult to those of us who do give a shit about writing reliable
>>> software.
>>>
>>> You're arguing against basic precepts of kernel programming.
>>>
>>> Get your head examined. And get the fuck out of here with this shit.
>>>
>>
>> Kent,
>>
>> Using language like this is clearly unacceptable and violates the
>> Code of Conduct. This type of language doesn't promote respectful
>> and productive discussions and is detrimental to the health of the
>> community.
>>
>> You should be well aware that this type of language and personal
>> attack is a clear violation of the Linux kernel Contributor Covenant
>> Code of Conduct as outlined in the following:
>>
>> https://www.kernel.org/doc/html/latest/process/code-of-conduct.html
>>
>> Refer to the Code of Conduct and refrain from violating the Code of
>> Conduct in the future.
> 
> I believe Michal and I have more or less worked this out privately (and
> you guys have been copied on that as well).

Thank you for updating us on the behind the scenes work between you
and Michal.

I will make one correction to your statement, "you guys have been copied on
that as well" - which is inaccurate. You have shared your email exchanges
with Michal with us to let us know that the issue has been sorted out.

You might have your reasons and concerns about the direction of the code
and design that pertains to the discussion in this email thread. You might
have your reasons for expressing your frustration. However, those need to be
worked out as separate from this Code of Conduct violation.

In the case of unacceptable behaviors as defined in the Code of Conduct
document, the process is to work towards restoring productive and
respectful discussions. It is reasonable to ask for an apology to help
us get to the goal as soon as possible.

I urge you once again to apologize for using language that negatively impacts
productive discussions.

thanks,
-- Shuah (On behalf of the Code of Conduct Committee)
  


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-20 21:12                             ` Shuah Khan
@ 2024-11-20 21:20                               ` Kent Overstreet
  2024-11-20 21:37                                 ` Shuah Khan
  0 siblings, 1 reply; 78+ messages in thread
From: Kent Overstreet @ 2024-11-20 21:20 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Michal Hocko, Dave Chinner, Andrew Morton, Christoph Hellwig,
	Yafang Shao, jack, Christian Brauner, Alexander Viro, Paul Moore,
	James Morris, Serge E. Hallyn, linux-fsdevel, linux-mm,
	linux-bcachefs, linux-security-module, linux-kernel,
	conduct@kernel.org

On Wed, Nov 20, 2024 at 02:12:12PM -0700, Shuah Khan wrote:
> On 11/20/24 13:34, Kent Overstreet wrote:
> > On Wed, Sep 04, 2024 at 12:01:50PM -0600, Shuah Khan wrote:
> > > On 9/2/24 03:51, Kent Overstreet wrote:
> > > > On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote:
> > > > > On Mon 02-09-24 04:52:49, Kent Overstreet wrote:
> > > > > > On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote:
> > > > > > > On Sun 01-09-24 21:35:30, Kent Overstreet wrote:
> > > > > > > [...]
> > > > > > > > But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL
> > > > > > > > in the case of bugs, because that's going to be an improvement w.r.t.
> > > > > > > > system robustness, in exactly the same way we don't use BUG_ON() if it's
> > > > > > > > something that we can't guarantee won't happen in the wild - we WARN()
> > > > > > > > and try to handle the error as best we can.
> > > > > > > 
> > > > > > > We have discussed that in a different email thread. And I have to say
> > > > > > > that I am not convinced that returning NULL makes a broken code much
> > > > > > > better. Why? Because we can expect that broken NOFAIL users will not have a
> > > > > > > error checking path. Even valid NOFAIL users will not have one because
> > > > > > > they _know_ they do not have a different than retry for ever recovery
> > > > > > > path.
> > > > > > 
> > > > > > You mean where I asked you for a link to the discussion and rationale
> > > > > > you claimed had happened? Still waiting on that
> > > > > 
> > > > > I am not your assistent to be tasked and search through lore archives.
> > > > > Find one if you need that.
> > > > > 
> > > > > Anyway, if you read the email and even tried to understand what is
> > > > > written there rather than immediately started shouting a response then
> > > > > you would have noticed I have put actual arguments here. You are free to
> > > > > disagree with them and lay down your arguments. You have decided to
> > > > > 
> > > > > [...]
> > > > > 
> > > > > > Yeah, enough of this insanity.
> > > > > 
> > > > > so I do not think you are able to do that. Again...
> > > > 
> > > > Michal, if you think crashing processes is an acceptable alternative to
> > > > error handling _you have no business writing kernel code_.
> > > > 
> > > > You have been stridently arguing for one bad idea after another, and
> > > > it's an insult to those of us who do give a shit about writing reliable
> > > > software.
> > > > 
> > > > You're arguing against basic precepts of kernel programming.
> > > > 
> > > > Get your head examined. And get the fuck out of here with this shit.
> > > > 
> > > 
> > > Kent,
> > > 
> > > Using language like this is clearly unacceptable and violates the
> > > Code of Conduct. This type of language doesn't promote respectful
> > > and productive discussions and is detrimental to the health of the
> > > community.
> > > 
> > > You should be well aware that this type of language and personal
> > > attack is a clear violation of the Linux kernel Contributor Covenant
> > > Code of Conduct as outlined in the following:
> > > 
> > > https://www.kernel.org/doc/html/latest/process/code-of-conduct.html
> > > 
> > > Refer to the Code of Conduct and refrain from violating the Code of
> > > Conduct in the future.
> > 
> > I believe Michal and I have more or less worked this out privately (and
> > you guys have been copied on that as well).
> 
> Thank you for updating us on the behind the scenes work between you
> and Michal.
> 
> I will make one correction to your statement, "you guys have been copied on
> that as well" - which is inaccurate. You have shared your email exchanges
> with Michal with us to let us know that the issue has been sorted out.

That seems to be what I just said.

> You might have your reasons and concerns about the direction of the code
> and design that pertains to the discussion in this email thread. You might
> have your reasons for expressing your frustration. However, those need to be
> worked out as separate from this Code of Conduct violation.
> 
> In the case of unacceptable behaviors as defined in the Code of Conduct
> document, the process is to work towards restoring productive and
> respectful discussions. It is reasonable to ask for an apology to help
> us get to the goal as soon as possible.
> 
> I urge you once again to apologize for using language that negatively impacts
> productive discussions.

Shuah, I'd be happy to give you that after the discussion I suggested.
Failing that, I urge you to stick to what we agreed to last night.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-20 21:20                               ` Kent Overstreet
@ 2024-11-20 21:37                                 ` Shuah Khan
  2024-11-20 22:21                                   ` Shuah Khan
  0 siblings, 1 reply; 78+ messages in thread
From: Shuah Khan @ 2024-11-20 21:37 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Michal Hocko, Dave Chinner, Andrew Morton, Christoph Hellwig,
	Yafang Shao, jack, Christian Brauner, Alexander Viro, Paul Moore,
	James Morris, Serge E. Hallyn, linux-fsdevel, linux-mm,
	linux-bcachefs, linux-security-module, linux-kernel,
	conduct@kernel.org

On 11/20/24 14:20, Kent Overstreet wrote:
> On Wed, Nov 20, 2024 at 02:12:12PM -0700, Shuah Khan wrote:
>> On 11/20/24 13:34, Kent Overstreet wrote:
>>> On Wed, Sep 04, 2024 at 12:01:50PM -0600, Shuah Khan wrote:
>>>> On 9/2/24 03:51, Kent Overstreet wrote:
>>>>> On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote:
>>>>>> On Mon 02-09-24 04:52:49, Kent Overstreet wrote:
>>>>>>> On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote:
>>>>>>>> On Sun 01-09-24 21:35:30, Kent Overstreet wrote:
>>>>>>>> [...]
>>>>>>>>> But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL
>>>>>>>>> in the case of bugs, because that's going to be an improvement w.r.t.
>>>>>>>>> system robustness, in exactly the same way we don't use BUG_ON() if it's
>>>>>>>>> something that we can't guarantee won't happen in the wild - we WARN()
>>>>>>>>> and try to handle the error as best we can.
>>>>>>>>
>>>>>>>> We have discussed that in a different email thread. And I have to say
>>>>>>>> that I am not convinced that returning NULL makes a broken code much
>>>>>>>> better. Why? Because we can expect that broken NOFAIL users will not have a
>>>>>>>> error checking path. Even valid NOFAIL users will not have one because
>>>>>>>> they _know_ they do not have a different than retry for ever recovery
>>>>>>>> path.
>>>>>>>
>>>>>>> You mean where I asked you for a link to the discussion and rationale
>>>>>>> you claimed had happened? Still waiting on that
>>>>>>
>>>>>> I am not your assistent to be tasked and search through lore archives.
>>>>>> Find one if you need that.
>>>>>>
>>>>>> Anyway, if you read the email and even tried to understand what is
>>>>>> written there rather than immediately started shouting a response then
>>>>>> you would have noticed I have put actual arguments here. You are free to
>>>>>> disagree with them and lay down your arguments. You have decided to
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>> Yeah, enough of this insanity.
>>>>>>
>>>>>> so I do not think you are able to do that. Again...
>>>>>
>>>>> Michal, if you think crashing processes is an acceptable alternative to
>>>>> error handling _you have no business writing kernel code_.
>>>>>
>>>>> You have been stridently arguing for one bad idea after another, and
>>>>> it's an insult to those of us who do give a shit about writing reliable
>>>>> software.
>>>>>
>>>>> You're arguing against basic precepts of kernel programming.
>>>>>
>>>>> Get your head examined. And get the fuck out of here with this shit.
>>>>>
>>>>
>>>> Kent,
>>>>
>>>> Using language like this is clearly unacceptable and violates the
>>>> Code of Conduct. This type of language doesn't promote respectful
>>>> and productive discussions and is detrimental to the health of the
>>>> community.
>>>>
>>>> You should be well aware that this type of language and personal
>>>> attack is a clear violation of the Linux kernel Contributor Covenant
>>>> Code of Conduct as outlined in the following:
>>>>
>>>> https://www.kernel.org/doc/html/latest/process/code-of-conduct.html
>>>>
>>>> Refer to the Code of Conduct and refrain from violating the Code of
>>>> Conduct in the future.
>>>
>>> I believe Michal and I have more or less worked this out privately (and
>>> you guys have been copied on that as well).
>>
>> Thank you for updating us on the behind the scenes work between you
>> and Michal.
>>
>> I will make one correction to your statement, "you guys have been copied on
>> that as well" - which is inaccurate. You have shared your email exchanges
>> with Michal with us to let us know that the issue has been sorted out.
> 
> That seems to be what I just said.
> 
>> You might have your reasons and concerns about the direction of the code
>> and design that pertains to the discussion in this email thread. You might
>> have your reasons for expressing your frustration. However, those need to be
>> worked out as separate from this Code of Conduct violation.
>>
>> In the case of unacceptable behaviors as defined in the Code of Conduct
>> document, the process is to work towards restoring productive and
>> respectful discussions. It is reasonable to ask for an apology to help
>> us get to the goal as soon as possible.
>>
>> I urge you once again to apologize for using language that negatively impacts
>> productive discussions.
> 
> Shuah, I'd be happy to give you that after the discussion I suggested.
> Failing that, I urge you to stick to what we agreed to last night.

You have the right to the discussion and debate and engage the community
in the discussion. Discuss and debate away.

thanks,
-- Shuah (On behalf of the Code of Conduct Committee)

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-20 21:37                                 ` Shuah Khan
@ 2024-11-20 22:21                                   ` Shuah Khan
  2024-11-20 22:39                                     ` Kent Overstreet
  2024-11-21 21:26                                     ` Martin Steigerwald
  0 siblings, 2 replies; 78+ messages in thread
From: Shuah Khan @ 2024-11-20 22:21 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Michal Hocko, Dave Chinner, Andrew Morton, Christoph Hellwig,
	Yafang Shao, jack, Christian Brauner, Alexander Viro, Paul Moore,
	James Morris, Serge E. Hallyn, linux-fsdevel, linux-mm,
	linux-bcachefs, linux-security-module, linux-kernel,
	conduct@kernel.org, Shuah Khan

On 11/20/24 14:37, Shuah Khan wrote:
> On 11/20/24 14:20, Kent Overstreet wrote:
>> On Wed, Nov 20, 2024 at 02:12:12PM -0700, Shuah Khan wrote:
>>> On 11/20/24 13:34, Kent Overstreet wrote:
>>>> On Wed, Sep 04, 2024 at 12:01:50PM -0600, Shuah Khan wrote:
>>>>> On 9/2/24 03:51, Kent Overstreet wrote:
>>>>>> On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote:
>>>>>>> On Mon 02-09-24 04:52:49, Kent Overstreet wrote:
>>>>>>>> On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote:
>>>>>>>>> On Sun 01-09-24 21:35:30, Kent Overstreet wrote:
>>>>>>>>> [...]
>>>>>>>>>> But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL
>>>>>>>>>> in the case of bugs, because that's going to be an improvement w.r.t.
>>>>>>>>>> system robustness, in exactly the same way we don't use BUG_ON() if it's
>>>>>>>>>> something that we can't guarantee won't happen in the wild - we WARN()
>>>>>>>>>> and try to handle the error as best we can.
>>>>>>>>>
>>>>>>>>> We have discussed that in a different email thread. And I have to say
>>>>>>>>> that I am not convinced that returning NULL makes a broken code much
>>>>>>>>> better. Why? Because we can expect that broken NOFAIL users will not have a
>>>>>>>>> error checking path. Even valid NOFAIL users will not have one because
>>>>>>>>> they _know_ they do not have a different than retry for ever recovery
>>>>>>>>> path.
>>>>>>>>
>>>>>>>> You mean where I asked you for a link to the discussion and rationale
>>>>>>>> you claimed had happened? Still waiting on that
>>>>>>>
>>>>>>> I am not your assistent to be tasked and search through lore archives.
>>>>>>> Find one if you need that.
>>>>>>>
>>>>>>> Anyway, if you read the email and even tried to understand what is
>>>>>>> written there rather than immediately started shouting a response then
>>>>>>> you would have noticed I have put actual arguments here. You are free to
>>>>>>> disagree with them and lay down your arguments. You have decided to
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>>> Yeah, enough of this insanity.
>>>>>>>
>>>>>>> so I do not think you are able to do that. Again...
>>>>>>
>>>>>> Michal, if you think crashing processes is an acceptable alternative to
>>>>>> error handling _you have no business writing kernel code_.
>>>>>>
>>>>>> You have been stridently arguing for one bad idea after another, and
>>>>>> it's an insult to those of us who do give a shit about writing reliable
>>>>>> software.
>>>>>>
>>>>>> You're arguing against basic precepts of kernel programming.
>>>>>>
>>>>>> Get your head examined. And get the fuck out of here with this shit.
>>>>>>
>>>>>
>>>>> Kent,
>>>>>
>>>>> Using language like this is clearly unacceptable and violates the
>>>>> Code of Conduct. This type of language doesn't promote respectful
>>>>> and productive discussions and is detrimental to the health of the
>>>>> community.
>>>>>
>>>>> You should be well aware that this type of language and personal
>>>>> attack is a clear violation of the Linux kernel Contributor Covenant
>>>>> Code of Conduct as outlined in the following:
>>>>>
>>>>> https://www.kernel.org/doc/html/latest/process/code-of-conduct.html
>>>>>
>>>>> Refer to the Code of Conduct and refrain from violating the Code of
>>>>> Conduct in the future.
>>>>
>>>> I believe Michal and I have more or less worked this out privately (and
>>>> you guys have been copied on that as well).
>>>
>>> Thank you for updating us on the behind the scenes work between you
>>> and Michal.
>>>
>>> I will make one correction to your statement, "you guys have been copied on
>>> that as well" - which is inaccurate. You have shared your email exchanges
>>> with Michal with us to let us know that the issue has been sorted out.
>>
>> That seems to be what I just said.
>>
>>> You might have your reasons and concerns about the direction of the code
>>> and design that pertains to the discussion in this email thread. You might
>>> have your reasons for expressing your frustration. However, those need to be
>>> worked out as separate from this Code of Conduct violation.
>>>
>>> In the case of unacceptable behaviors as defined in the Code of Conduct
>>> document, the process is to work towards restoring productive and
>>> respectful discussions. It is reasonable to ask for an apology to help
>>> us get to the goal as soon as possible.
>>>
>>> I urge you once again to apologize for using language that negatively impacts
>>> productive discussions.
>>
>> Shuah, I'd be happy to give you that after the discussion I suggested.
>> Failing that, I urge you to stick to what we agreed to last night.
The only thing we agreed upon is that you would respond the thread
to update your sorting things out with Michal.

As for the discussion, I will repeat what I said in our conversation
that the discussion will be lot more productive after making amends
with the community. I stand by that assessment.

I will also repeat what I said that the discussion and debate is
outside the scope of the current issue the Code of Conduct Committee
is trying to resolve.

I didn't pick up on your desire to apologize after the discussion in
our conversation.

Are you saying you will be happy to make amends with an apology after
the discussion and debate?

thanks,
-- Shuah (On behalf of the Code of Conduct Committee)

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-20 22:21                                   ` Shuah Khan
@ 2024-11-20 22:39                                     ` Kent Overstreet
  2024-11-20 22:55                                       ` Kent Overstreet
                                                         ` (2 more replies)
  2024-11-21 21:26                                     ` Martin Steigerwald
  1 sibling, 3 replies; 78+ messages in thread
From: Kent Overstreet @ 2024-11-20 22:39 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Michal Hocko, Dave Chinner, Andrew Morton, Christoph Hellwig,
	Yafang Shao, jack, Christian Brauner, Alexander Viro, Paul Moore,
	James Morris, Serge E. Hallyn, linux-fsdevel, linux-mm,
	linux-bcachefs, linux-security-module, linux-kernel,
	conduct@kernel.org

On Wed, Nov 20, 2024 at 03:21:06PM -0700, Shuah Khan wrote:
> On 11/20/24 14:37, Shuah Khan wrote:
> > On 11/20/24 14:20, Kent Overstreet wrote:
> > > On Wed, Nov 20, 2024 at 02:12:12PM -0700, Shuah Khan wrote:
> > > > On 11/20/24 13:34, Kent Overstreet wrote:
> > > > > On Wed, Sep 04, 2024 at 12:01:50PM -0600, Shuah Khan wrote:
> > > > > > On 9/2/24 03:51, Kent Overstreet wrote:
> > > > > > > On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote:
> > > > > > > > On Mon 02-09-24 04:52:49, Kent Overstreet wrote:
> > > > > > > > > On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote:
> > > > > > > > > > On Sun 01-09-24 21:35:30, Kent Overstreet wrote:
> > > > > > > > > > [...]
> > > > > > > > > > > But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL
> > > > > > > > > > > in the case of bugs, because that's going to be an improvement w.r.t.
> > > > > > > > > > > system robustness, in exactly the same way we don't use BUG_ON() if it's
> > > > > > > > > > > something that we can't guarantee won't happen in the wild - we WARN()
> > > > > > > > > > > and try to handle the error as best we can.
> > > > > > > > > > 
> > > > > > > > > > We have discussed that in a different email thread. And I have to say
> > > > > > > > > > that I am not convinced that returning NULL makes a broken code much
> > > > > > > > > > better. Why? Because we can expect that broken NOFAIL users will not have a
> > > > > > > > > > error checking path. Even valid NOFAIL users will not have one because
> > > > > > > > > > they _know_ they do not have a different than retry for ever recovery
> > > > > > > > > > path.
> > > > > > > > > 
> > > > > > > > > You mean where I asked you for a link to the discussion and rationale
> > > > > > > > > you claimed had happened? Still waiting on that
> > > > > > > > 
> > > > > > > > I am not your assistent to be tasked and search through lore archives.
> > > > > > > > Find one if you need that.
> > > > > > > > 
> > > > > > > > Anyway, if you read the email and even tried to understand what is
> > > > > > > > written there rather than immediately started shouting a response then
> > > > > > > > you would have noticed I have put actual arguments here. You are free to
> > > > > > > > disagree with them and lay down your arguments. You have decided to
> > > > > > > > 
> > > > > > > > [...]
> > > > > > > > 
> > > > > > > > > Yeah, enough of this insanity.
> > > > > > > > 
> > > > > > > > so I do not think you are able to do that. Again...
> > > > > > > 
> > > > > > > Michal, if you think crashing processes is an acceptable alternative to
> > > > > > > error handling _you have no business writing kernel code_.
> > > > > > > 
> > > > > > > You have been stridently arguing for one bad idea after another, and
> > > > > > > it's an insult to those of us who do give a shit about writing reliable
> > > > > > > software.
> > > > > > > 
> > > > > > > You're arguing against basic precepts of kernel programming.
> > > > > > > 
> > > > > > > Get your head examined. And get the fuck out of here with this shit.
> > > > > > > 
> > > > > > 
> > > > > > Kent,
> > > > > > 
> > > > > > Using language like this is clearly unacceptable and violates the
> > > > > > Code of Conduct. This type of language doesn't promote respectful
> > > > > > and productive discussions and is detrimental to the health of the
> > > > > > community.
> > > > > > 
> > > > > > You should be well aware that this type of language and personal
> > > > > > attack is a clear violation of the Linux kernel Contributor Covenant
> > > > > > Code of Conduct as outlined in the following:
> > > > > > 
> > > > > > https://www.kernel.org/doc/html/latest/process/code-of-conduct.html
> > > > > > 
> > > > > > Refer to the Code of Conduct and refrain from violating the Code of
> > > > > > Conduct in the future.
> > > > > 
> > > > > I believe Michal and I have more or less worked this out privately (and
> > > > > you guys have been copied on that as well).
> > > > 
> > > > Thank you for updating us on the behind the scenes work between you
> > > > and Michal.
> > > > 
> > > > I will make one correction to your statement, "you guys have been copied on
> > > > that as well" - which is inaccurate. You have shared your email exchanges
> > > > with Michal with us to let us know that the issue has been sorted out.
> > > 
> > > That seems to be what I just said.
> > > 
> > > > You might have your reasons and concerns about the direction of the code
> > > > and design that pertains to the discussion in this email thread. You might
> > > > have your reasons for expressing your frustration. However, those need to be
> > > > worked out as separate from this Code of Conduct violation.
> > > > 
> > > > In the case of unacceptable behaviors as defined in the Code of Conduct
> > > > document, the process is to work towards restoring productive and
> > > > respectful discussions. It is reasonable to ask for an apology to help
> > > > us get to the goal as soon as possible.
> > > > 
> > > > I urge you once again to apologize for using language that negatively impacts
> > > > productive discussions.
> > > 
> > > Shuah, I'd be happy to give you that after the discussion I suggested.
> > > Failing that, I urge you to stick to what we agreed to last night.
> The only thing we agreed upon is that you would respond the thread
> to update your sorting things out with Michal.

...Shall I quote you?

> 
> As for the discussion, I will repeat what I said in our conversation
> that the discussion will be lot more productive after making amends
> with the community. I stand by that assessment.
> 
> I will also repeat what I said that the discussion and debate is
> outside the scope of the current issue the Code of Conduct Committee
> is trying to resolve.
> 
> I didn't pick up on your desire to apologize after the discussion in
> our conversation.
> 
> Are you saying you will be happy to make amends with an apology after
> the discussion and debate?

Look, I just want to be done with this, so let me lay it all out as I
see it, starting from the beginning of where things went off the rails
between myself and Michal:

Michal's (as well as Steve's) behaviour in the memory allocation
profiling review process was, in my view, unacceptable (this included
such things as crashing our LSF presentation with ideas they'd come up
with that morning, and persistent dismissive axegrinding on the list).
The project was nearly killed because of his inability to listen to the
reasons for a design and being stubbornly stuck on his right to be heard
as the maintainer.

In my view, being a good maintainer has a lot more to do with
stewardship and leadership, than stubbornly insisting for - whatever
that was. In any event, that was where I came to the conclusion "I just
cannot work that guy".

Next up, PF_MEMALLOC_NORECLAIM over Michal's nack - I was wrong there, I
only did it because it really seemed to me that Michal was axe grinding
against _anything_ I was posting, but I still shouldn't have and that
was more serious infraction in my view; that sort of thing causes a real
loss of trust, and no I will not do it again.

The subsequent PF_MEMALLOC_NORECLAIM discussion was such a trainwreck
that I don't think I will go into it. Except to say that yes, if it
makes you happy, I shouldn't have used that language and I won't do it
again.

But I do have to call out you, the CoC board's behaviour, and I think
that ony fair since you call out other people's behaviour publically.

Greg's behaviour when he approached me at Plumbers was beyond
unprofessional, and since it wasn't exactly public and you guys have
already heard about it privately I won't repeat exactly what happened,
but it is an issue.

Shuah, you weren't much better.

There were concerns raised in the recent CoC enforcement thread, by
someone with experience in such matters, that your aproach seemed
extremeely heavy handed and I find myself in 100% agreement.

The approach you take is that of a bad HR department: all about image,
no understanding. When tensions arise, it's important get to the bottom
of things, to at least try to take the time to listen with an open mind.
People have real frustrations, and it's amazing what you can learn and
what you can accomplish by having real conversations.

But that's not what you guys do: you say "Ok, if someone's being too
much of an asshole, we'll just be an even bigger asshole!".

No. Cut that out.

I've done the hard work of stepping in and building bridges when
relations have broken down (on quite a large scale), so I'm offended by
what you guys do.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-20 22:39                                     ` Kent Overstreet
@ 2024-11-20 22:55                                       ` Kent Overstreet
  2024-11-20 23:21                                         ` Kent Overstreet
                                                           ` (2 more replies)
  2024-11-21  8:43                                       ` review process (was: underalated stuff) Michal Hocko
  2024-11-21 20:17                                       ` [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM Simona Vetter
  2 siblings, 3 replies; 78+ messages in thread
From: Kent Overstreet @ 2024-11-20 22:55 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Michal Hocko, Dave Chinner, Andrew Morton, Christoph Hellwig,
	Yafang Shao, jack, Christian Brauner, Alexander Viro, Paul Moore,
	James Morris, Serge E. Hallyn, linux-fsdevel, linux-mm,
	linux-bcachefs, linux-security-module, linux-kernel,
	conduct@kernel.org

On Wed, Nov 20, 2024 at 05:39:19PM -0500, Kent Overstreet wrote:
> On Wed, Nov 20, 2024 at 03:21:06PM -0700, Shuah Khan wrote:
> > On 11/20/24 14:37, Shuah Khan wrote:
> > > On 11/20/24 14:20, Kent Overstreet wrote:
> > > > On Wed, Nov 20, 2024 at 02:12:12PM -0700, Shuah Khan wrote:
> > > > > On 11/20/24 13:34, Kent Overstreet wrote:
> > > > > > On Wed, Sep 04, 2024 at 12:01:50PM -0600, Shuah Khan wrote:
> > > > > > > On 9/2/24 03:51, Kent Overstreet wrote:
> > > > > > > > On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote:
> > > > > > > > > On Mon 02-09-24 04:52:49, Kent Overstreet wrote:
> > > > > > > > > > On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote:
> > > > > > > > > > > On Sun 01-09-24 21:35:30, Kent Overstreet wrote:
> > > > > > > > > > > [...]
> > > > > > > > > > > > But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL
> > > > > > > > > > > > in the case of bugs, because that's going to be an improvement w.r.t.
> > > > > > > > > > > > system robustness, in exactly the same way we don't use BUG_ON() if it's
> > > > > > > > > > > > something that we can't guarantee won't happen in the wild - we WARN()
> > > > > > > > > > > > and try to handle the error as best we can.
> > > > > > > > > > > 
> > > > > > > > > > > We have discussed that in a different email thread. And I have to say
> > > > > > > > > > > that I am not convinced that returning NULL makes a broken code much
> > > > > > > > > > > better. Why? Because we can expect that broken NOFAIL users will not have a
> > > > > > > > > > > error checking path. Even valid NOFAIL users will not have one because
> > > > > > > > > > > they _know_ they do not have a different than retry for ever recovery
> > > > > > > > > > > path.
> > > > > > > > > > 
> > > > > > > > > > You mean where I asked you for a link to the discussion and rationale
> > > > > > > > > > you claimed had happened? Still waiting on that
> > > > > > > > > 
> > > > > > > > > I am not your assistent to be tasked and search through lore archives.
> > > > > > > > > Find one if you need that.
> > > > > > > > > 
> > > > > > > > > Anyway, if you read the email and even tried to understand what is
> > > > > > > > > written there rather than immediately started shouting a response then
> > > > > > > > > you would have noticed I have put actual arguments here. You are free to
> > > > > > > > > disagree with them and lay down your arguments. You have decided to
> > > > > > > > > 
> > > > > > > > > [...]
> > > > > > > > > 
> > > > > > > > > > Yeah, enough of this insanity.
> > > > > > > > > 
> > > > > > > > > so I do not think you are able to do that. Again...
> > > > > > > > 
> > > > > > > > Michal, if you think crashing processes is an acceptable alternative to
> > > > > > > > error handling _you have no business writing kernel code_.
> > > > > > > > 
> > > > > > > > You have been stridently arguing for one bad idea after another, and
> > > > > > > > it's an insult to those of us who do give a shit about writing reliable
> > > > > > > > software.
> > > > > > > > 
> > > > > > > > You're arguing against basic precepts of kernel programming.
> > > > > > > > 
> > > > > > > > Get your head examined. And get the fuck out of here with this shit.
> > > > > > > > 
> > > > > > > 
> > > > > > > Kent,
> > > > > > > 
> > > > > > > Using language like this is clearly unacceptable and violates the
> > > > > > > Code of Conduct. This type of language doesn't promote respectful
> > > > > > > and productive discussions and is detrimental to the health of the
> > > > > > > community.
> > > > > > > 
> > > > > > > You should be well aware that this type of language and personal
> > > > > > > attack is a clear violation of the Linux kernel Contributor Covenant
> > > > > > > Code of Conduct as outlined in the following:
> > > > > > > 
> > > > > > > https://www.kernel.org/doc/html/latest/process/code-of-conduct.html
> > > > > > > 
> > > > > > > Refer to the Code of Conduct and refrain from violating the Code of
> > > > > > > Conduct in the future.
> > > > > > 
> > > > > > I believe Michal and I have more or less worked this out privately (and
> > > > > > you guys have been copied on that as well).
> > > > > 
> > > > > Thank you for updating us on the behind the scenes work between you
> > > > > and Michal.
> > > > > 
> > > > > I will make one correction to your statement, "you guys have been copied on
> > > > > that as well" - which is inaccurate. You have shared your email exchanges
> > > > > with Michal with us to let us know that the issue has been sorted out.
> > > > 
> > > > That seems to be what I just said.
> > > > 
> > > > > You might have your reasons and concerns about the direction of the code
> > > > > and design that pertains to the discussion in this email thread. You might
> > > > > have your reasons for expressing your frustration. However, those need to be
> > > > > worked out as separate from this Code of Conduct violation.
> > > > > 
> > > > > In the case of unacceptable behaviors as defined in the Code of Conduct
> > > > > document, the process is to work towards restoring productive and
> > > > > respectful discussions. It is reasonable to ask for an apology to help
> > > > > us get to the goal as soon as possible.
> > > > > 
> > > > > I urge you once again to apologize for using language that negatively impacts
> > > > > productive discussions.
> > > > 
> > > > Shuah, I'd be happy to give you that after the discussion I suggested.
> > > > Failing that, I urge you to stick to what we agreed to last night.
> > The only thing we agreed upon is that you would respond the thread
> > to update your sorting things out with Michal.
> 
> ...Shall I quote you?
> 
> > 
> > As for the discussion, I will repeat what I said in our conversation
> > that the discussion will be lot more productive after making amends
> > with the community. I stand by that assessment.
> > 
> > I will also repeat what I said that the discussion and debate is
> > outside the scope of the current issue the Code of Conduct Committee
> > is trying to resolve.
> > 
> > I didn't pick up on your desire to apologize after the discussion in
> > our conversation.
> > 
> > Are you saying you will be happy to make amends with an apology after
> > the discussion and debate?
> 
> Look, I just want to be done with this, so let me lay it all out as I
> see it, starting from the beginning of where things went off the rails
> between myself and Michal:
> 
> Michal's (as well as Steve's) behaviour in the memory allocation
> profiling review process was, in my view, unacceptable (this included
> such things as crashing our LSF presentation with ideas they'd come up
> with that morning, and persistent dismissive axegrinding on the list).
> The project was nearly killed because of his inability to listen to the
> reasons for a design and being stubbornly stuck on his right to be heard
> as the maintainer.
> 
> In my view, being a good maintainer has a lot more to do with
> stewardship and leadership, than stubbornly insisting for - whatever
> that was. In any event, that was where I came to the conclusion "I just
> cannot work that guy".
> 
> Next up, PF_MEMALLOC_NORECLAIM over Michal's nack - I was wrong there, I
> only did it because it really seemed to me that Michal was axe grinding
> against _anything_ I was posting, but I still shouldn't have and that
> was more serious infraction in my view; that sort of thing causes a real
> loss of trust, and no I will not do it again.
> 
> The subsequent PF_MEMALLOC_NORECLAIM discussion was such a trainwreck
> that I don't think I will go into it. Except to say that yes, if it
> makes you happy, I shouldn't have used that language and I won't do it
> again.
> 
> But I do have to call out you, the CoC board's behaviour, and I think
> that ony fair since you call out other people's behaviour publically.
> 
> Greg's behaviour when he approached me at Plumbers was beyond
> unprofessional, and since it wasn't exactly public and you guys have
> already heard about it privately I won't repeat exactly what happened,
> but it is an issue.
> 
> Shuah, you weren't much better.
> 
> There were concerns raised in the recent CoC enforcement thread, by
> someone with experience in such matters, that your aproach seemed
> extremeely heavy handed and I find myself in 100% agreement.
> 
> The approach you take is that of a bad HR department: all about image,
> no understanding. When tensions arise, it's important get to the bottom
> of things, to at least try to take the time to listen with an open mind.
> People have real frustrations, and it's amazing what you can learn and
> what you can accomplish by having real conversations.
> 
> But that's not what you guys do: you say "Ok, if someone's being too
> much of an asshole, we'll just be an even bigger asshole!".
> 
> No. Cut that out.
> 
> I've done the hard work of stepping in and building bridges when
> relations have broken down (on quite a large scale), so I'm offended by
> what you guys do.

Now, I've said two things I'll do differently, or not do in the future.

Michal, would you be willing to consider changing your approach a bit in
similar situations? Try to lead a little bit less by "I'm the mainainer,
my concerns must be addressed" and a little bit more by incorporating
the best of everyone's ideas, and showing respect to others who have
studied their problems, as you want to be respected as maintainer?

Shuah, would you be willing to entertain the notion of modifying your
approach a bit as well? More in the direction of genuine conversations
and understanding, less of just following a process and slapping people
if they don't comply?

We've got people in the community who are good at this sort of thing,
and might be willing to help if they were asked - it doesn't have to
just be you guys, and if we started encouraging this sort of thing it
could be a real learning experience for everyone.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-20 22:55                                       ` Kent Overstreet
@ 2024-11-20 23:21                                         ` Kent Overstreet
  2024-11-20 23:47                                         ` Theodore Ts'o
  2024-11-21  5:51                                         ` Kent Overstreet
  2 siblings, 0 replies; 78+ messages in thread
From: Kent Overstreet @ 2024-11-20 23:21 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Michal Hocko, Dave Chinner, Andrew Morton, Christoph Hellwig,
	Yafang Shao, jack, Christian Brauner, Alexander Viro, Paul Moore,
	James Morris, Serge E. Hallyn, linux-fsdevel, linux-mm,
	linux-bcachefs, linux-security-module, linux-kernel,
	conduct@kernel.org

Lastly, the thing that motivated me to make an issue out of this was
several recent complaints, by my funders, that it's gotten increasingly
difficult to get work done on the lists lately without showing up at
conferences and shmoozing with the right people. I've noticed that as
well.

That's something we do need to address, and I see a common thread
between that and dismissive/authoritarian behaviour, and I think those
of us at the highest level (i.e. CoC board members) should be mindful of
how we set the tone for everyone else.

Yes, we're all Busy Important People (TM), but doing our jobs well
requires us to engage well with people.

I think that should be prioritized at least as much as "language". It's
not just about what words we use to communicate, it's about whether
we're able to communicate effectively or at all.

Couple's therapists say they can tell in a few minutes if a relationship
is worth salvaging or if it's beyond repair - and it comes down to if
they come in displaying anger or dismissiventess. Anger can be worked
through, dismissiveness means they no longer care.

I find the same is true with engineers. When people are pissed off about
something, that anger is often pointing to some important issue
underneath that, and getting to the bottom of it is going to hava a big
payoff. But when teams stop being able to work together - when people
start getting silod, afraid to stick their head up - that's really bad.

Vannevar Bush said that all he did was get people to talk to each other.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-20 22:55                                       ` Kent Overstreet
  2024-11-20 23:21                                         ` Kent Overstreet
@ 2024-11-20 23:47                                         ` Theodore Ts'o
  2024-11-20 23:57                                           ` Kent Overstreet
                                                             ` (4 more replies)
  2024-11-21  5:51                                         ` Kent Overstreet
  2 siblings, 5 replies; 78+ messages in thread
From: Theodore Ts'o @ 2024-11-20 23:47 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Shuah Khan, Michal Hocko, Dave Chinner, Andrew Morton,
	Christoph Hellwig, Yafang Shao, jack, Christian Brauner,
	Alexander Viro, Paul Moore, James Morris, Serge E. Hallyn,
	linux-fsdevel, linux-mm, linux-bcachefs, linux-security-module,
	linux-kernel, conduct@kernel.org

On Wed, Nov 20, 2024 at 05:55:03PM -0500, Kent Overstreet wrote:
> Shuah, would you be willing to entertain the notion of modifying your...

Kent, I'd like to gently remind you that Shuah is not speaking in her
personal capacity, but as a representative of the Code of Conduct
Committee[1], as she has noted in her signature.  The Code of Conduct
Committee is appointed by, and reports to, the TAB[2], which is an
elected body composed of kernel developers and maintainers.

[1] https://www.kernel.org/code-of-conduct.html
[2] https://www.kernel.org/doc/html/latest/process/code-of-conduct-interpretation.html

Speaking purely in a personal capacity, and not as a member of the TAB
(although I do serve as vice-chair of that body) I am extremely
grateful of the work of Shuah and her colleages (listed in [1]).  I
believe that their work is important in terms of establishing guard
rails regarding the minimum standards of behavior in our community.

If you look at the git history of the kernel sources, you will see
that a large number of your fellow maintainers assented to this
approach --- for example by providing their Acked-by in commit
1279dbeed36f ("Code of Conduct Interpretation: Add document explaining
how the Code of Conduct is to be interpreted").

Best regards,

						- Ted

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-20 23:47                                         ` Theodore Ts'o
@ 2024-11-20 23:57                                           ` Kent Overstreet
  2024-11-21  0:10                                           ` Kent Overstreet
                                                             ` (3 subsequent siblings)
  4 siblings, 0 replies; 78+ messages in thread
From: Kent Overstreet @ 2024-11-20 23:57 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Shuah Khan, Michal Hocko, Dave Chinner, Andrew Morton,
	Christoph Hellwig, Yafang Shao, jack, Christian Brauner,
	Alexander Viro, Paul Moore, James Morris, Serge E. Hallyn,
	linux-fsdevel, linux-mm, linux-bcachefs, linux-security-module,
	linux-kernel, conduct@kernel.org

On Wed, Nov 20, 2024 at 03:47:59PM -0800, Theodore Ts'o wrote:
> On Wed, Nov 20, 2024 at 05:55:03PM -0500, Kent Overstreet wrote:
> > Shuah, would you be willing to entertain the notion of modifying your...
> 
> Kent, I'd like to gently remind you that Shuah is not speaking in her
> personal capacity, but as a representative of the Code of Conduct
> Committee[1], as she has noted in her signature.  The Code of Conduct
> Committee is appointed by, and reports to, the TAB[2], which is an
> elected body composed of kernel developers and maintainers.
> 
> [1] https://www.kernel.org/code-of-conduct.html
> [2] https://www.kernel.org/doc/html/latest/process/code-of-conduct-interpretation.html
> 
> Speaking purely in a personal capacity, and not as a member of the TAB
> (although I do serve as vice-chair of that body) I am extremely
> grateful of the work of Shuah and her colleages (listed in [1]).  I
> believe that their work is important in terms of establishing guard
> rails regarding the minimum standards of behavior in our community.
> 
> If you look at the git history of the kernel sources, you will see
> that a large number of your fellow maintainers assented to this
> approach --- for example by providing their Acked-by in commit
> 1279dbeed36f ("Code of Conduct Interpretation: Add document explaining
> how the Code of Conduct is to be interpreted").

Authored by Shuah, so I do seem to be talking to the correct person.

Regardless, we talked privately about opening some sort of public
discussion, this is pursuant to that.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-20 23:47                                         ` Theodore Ts'o
  2024-11-20 23:57                                           ` Kent Overstreet
@ 2024-11-21  0:10                                           ` Kent Overstreet
  2024-11-21  4:25                                           ` Christoph Hellwig
                                                             ` (2 subsequent siblings)
  4 siblings, 0 replies; 78+ messages in thread
From: Kent Overstreet @ 2024-11-21  0:10 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Shuah Khan, Michal Hocko, Dave Chinner, Andrew Morton,
	Christoph Hellwig, Yafang Shao, jack, Christian Brauner,
	Alexander Viro, Paul Moore, James Morris, Serge E. Hallyn,
	linux-fsdevel, linux-mm, linux-bcachefs, linux-security-module,
	linux-kernel, conduct@kernel.org

On Wed, Nov 20, 2024 at 03:47:59PM -0800, Theodore Ts'o wrote:
> On Wed, Nov 20, 2024 at 05:55:03PM -0500, Kent Overstreet wrote:
> > Shuah, would you be willing to entertain the notion of modifying your...
> 
> Kent, I'd like to gently remind you that Shuah is not speaking in her
> personal capacity, but as a representative of the Code of Conduct
> Committee[1], as she has noted in her signature.  The Code of Conduct
> Committee is appointed by, and reports to, the TAB[2], which is an
> elected body composed of kernel developers and maintainers.
> 
> [1] https://www.kernel.org/code-of-conduct.html
> [2] https://www.kernel.org/doc/html/latest/process/code-of-conduct-interpretation.html
> 
> Speaking purely in a personal capacity, and not as a member of the TAB
> (although I do serve as vice-chair of that body) I am extremely
> grateful of the work of Shuah and her colleages (listed in [1]).  I
> believe that their work is important in terms of establishing guard
> rails regarding the minimum standards of behavior in our community.
> 
> If you look at the git history of the kernel sources, you will see
> that a large number of your fellow maintainers assented to this
> approach --- for example by providing their Acked-by in commit
> 1279dbeed36f ("Code of Conduct Interpretation: Add document explaining
> how the Code of Conduct is to be interpreted").

And Ted, I don't think you realize just how at my limit I am here with
what I'm willing to put up with.

I'm coming off of, what, 6+ months of getting roasted and having my work
quested by Linus every pull request (and he did stop that, but not
before it had done real damage, both completely changing the tone of
public conversations and nearly scaring off people I've been trying to
hire).

It's gotten harder and harder, not easier, for me to get work done in
other parts of the kernel; I gave up long ago in the block layer after
the two people in charge there had repeatedly introduced silent data
corruption bugs into core block layer code that I'd written, without
CCing me, which I then had to debug, which they then ignored or put up
ridiculous fights over when reported, and now have turned petty on
subsequent block layer patches.

Filesystem people have been good to work with, thank god, but now
getting anything done in mm is looking like more and more of what the
block layer has turned into.

And you guys, because the system works for you, keep saying "nah,
everything is fine and this has already been decided, you don't get any
say".

Meanwhile, I'm seeing more and more heisenbugs in the rest of the kernel
as I'm stabilizing bcachefs, and my users are reporting the same - in
compaction, in the block layer, now in the scheduler or locking, I'm not
sure on the last one.

And I'm sitting here wondering how the hell I'm supposed to debug my own
code when I don't even have a stable base to work on anymore.

This is turning into an utter farce.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-20 23:47                                         ` Theodore Ts'o
  2024-11-20 23:57                                           ` Kent Overstreet
  2024-11-21  0:10                                           ` Kent Overstreet
@ 2024-11-21  4:25                                           ` Christoph Hellwig
  2024-11-21  4:53                                             ` Kent Overstreet
  2024-11-21 23:53                                             ` Shuah Khan
  2024-11-21 21:32                                           ` Martin Steigerwald
  2024-11-22  9:47                                           ` Geert Uytterhoeven
  4 siblings, 2 replies; 78+ messages in thread
From: Christoph Hellwig @ 2024-11-21  4:25 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Kent Overstreet, Shuah Khan, Michal Hocko, Dave Chinner,
	Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, conduct@kernel.org

On Wed, Nov 20, 2024 at 03:47:59PM -0800, Theodore Ts'o wrote:
> On Wed, Nov 20, 2024 at 05:55:03PM -0500, Kent Overstreet wrote:
> > Shuah, would you be willing to entertain the notion of modifying your...
> 
> Kent, I'd like to gently remind you that Shuah is not speaking in her
> personal capacity, but as a representative of the Code of Conduct
> Committee[1], as she has noted in her signature.  The Code of Conduct
> Committee is appointed by, and reports to, the TAB[2], which is an
> elected body composed of kernel developers and maintainers.

FYI, without taking any stance on the issue under debate here, I find the
language used by Shuah on behalf of the Code of Conduct committee
extremely patronising and passive aggressive.  This might be because I
do not have an American academic class background, but I would suggest
that the code of conduct committee should educate itself about
communicating without projecting this implicit cultural and class bias
so blatantly.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-21  4:25                                           ` Christoph Hellwig
@ 2024-11-21  4:53                                             ` Kent Overstreet
  2024-11-21 23:53                                             ` Shuah Khan
  1 sibling, 0 replies; 78+ messages in thread
From: Kent Overstreet @ 2024-11-21  4:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Theodore Ts'o, Shuah Khan, Michal Hocko, Dave Chinner,
	Andrew Morton, Yafang Shao, jack, Christian Brauner,
	Alexander Viro, Paul Moore, James Morris, Serge E. Hallyn,
	linux-fsdevel, linux-mm, linux-bcachefs, linux-security-module,
	linux-kernel, conduct@kernel.org

On Thu, Nov 21, 2024 at 05:25:58AM +0100, Christoph Hellwig wrote:
> On Wed, Nov 20, 2024 at 03:47:59PM -0800, Theodore Ts'o wrote:
> > On Wed, Nov 20, 2024 at 05:55:03PM -0500, Kent Overstreet wrote:
> > > Shuah, would you be willing to entertain the notion of modifying your...
> > 
> > Kent, I'd like to gently remind you that Shuah is not speaking in her
> > personal capacity, but as a representative of the Code of Conduct
> > Committee[1], as she has noted in her signature.  The Code of Conduct
> > Committee is appointed by, and reports to, the TAB[2], which is an
> > elected body composed of kernel developers and maintainers.
> 
> FYI, without taking any stance on the issue under debate here, I find the
> language used by Shuah on behalf of the Code of Conduct committee
> extremely patronising and passive aggressive.  This might be because I
> do not have an American academic class background, but I would suggest
> that the code of conduct committee should educate itself about
> communicating without projecting this implicit cultural and class bias
> so blatantly.

I wasn't even thinking about the language issue, but I think that's a
very good point.

We have a very strong culture of personal responsibility and taking
responsibility for our work here, and I think that's one of the main
thing that enables us to function and work together even when we're
constantly butting heads.

Broadly speaking, what that means to me is: I will justify, explain and
give the reasoning for my actions if asked, and if I can't because I
made a mistake, then I will make it right or at least acknowledge my
mistake.

So, the very passive language from Shuah (i.e. "I'm not the one 'doing'
this, I'm just implementing the policy that we all decided on, even
though I totally wrote and advocated for that policy") does seem
problematic to me as well.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-20 22:55                                       ` Kent Overstreet
  2024-11-20 23:21                                         ` Kent Overstreet
  2024-11-20 23:47                                         ` Theodore Ts'o
@ 2024-11-21  5:51                                         ` Kent Overstreet
  2 siblings, 0 replies; 78+ messages in thread
From: Kent Overstreet @ 2024-11-21  5:51 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Michal Hocko, Dave Chinner, Andrew Morton, Christoph Hellwig,
	Yafang Shao, jack, Christian Brauner, Alexander Viro, Paul Moore,
	James Morris, Serge E. Hallyn, linux-fsdevel, linux-mm,
	linux-bcachefs, linux-security-module, linux-kernel,
	conduct@kernel.org

On Wed, Nov 20, 2024 at 05:55:10PM -0500, Kent Overstreet wrote:
> On Wed, Nov 20, 2024 at 05:39:19PM -0500, Kent Overstreet wrote:
> > On Wed, Nov 20, 2024 at 03:21:06PM -0700, Shuah Khan wrote:
> > > On 11/20/24 14:37, Shuah Khan wrote:
> > > > On 11/20/24 14:20, Kent Overstreet wrote:
> > > > > On Wed, Nov 20, 2024 at 02:12:12PM -0700, Shuah Khan wrote:
> > > > > > On 11/20/24 13:34, Kent Overstreet wrote:
> > > > > > > On Wed, Sep 04, 2024 at 12:01:50PM -0600, Shuah Khan wrote:
> > > > > > > > On 9/2/24 03:51, Kent Overstreet wrote:
> > > > > > > > > On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote:
> > > > > > > > > > On Mon 02-09-24 04:52:49, Kent Overstreet wrote:
> > > > > > > > > > > On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote:
> > > > > > > > > > > > On Sun 01-09-24 21:35:30, Kent Overstreet wrote:
> > > > > > > > > > > > [...]
> > > > > > > > > > > > > But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL
> > > > > > > > > > > > > in the case of bugs, because that's going to be an improvement w.r.t.
> > > > > > > > > > > > > system robustness, in exactly the same way we don't use BUG_ON() if it's
> > > > > > > > > > > > > something that we can't guarantee won't happen in the wild - we WARN()
> > > > > > > > > > > > > and try to handle the error as best we can.
> > > > > > > > > > > > 
> > > > > > > > > > > > We have discussed that in a different email thread. And I have to say
> > > > > > > > > > > > that I am not convinced that returning NULL makes a broken code much
> > > > > > > > > > > > better. Why? Because we can expect that broken NOFAIL users will not have a
> > > > > > > > > > > > error checking path. Even valid NOFAIL users will not have one because
> > > > > > > > > > > > they _know_ they do not have a different than retry for ever recovery
> > > > > > > > > > > > path.
> > > > > > > > > > > 
> > > > > > > > > > > You mean where I asked you for a link to the discussion and rationale
> > > > > > > > > > > you claimed had happened? Still waiting on that
> > > > > > > > > > 
> > > > > > > > > > I am not your assistent to be tasked and search through lore archives.
> > > > > > > > > > Find one if you need that.
> > > > > > > > > > 
> > > > > > > > > > Anyway, if you read the email and even tried to understand what is
> > > > > > > > > > written there rather than immediately started shouting a response then
> > > > > > > > > > you would have noticed I have put actual arguments here. You are free to
> > > > > > > > > > disagree with them and lay down your arguments. You have decided to
> > > > > > > > > > 
> > > > > > > > > > [...]
> > > > > > > > > > 
> > > > > > > > > > > Yeah, enough of this insanity.
> > > > > > > > > > 
> > > > > > > > > > so I do not think you are able to do that. Again...
> > > > > > > > > 
> > > > > > > > > Michal, if you think crashing processes is an acceptable alternative to
> > > > > > > > > error handling _you have no business writing kernel code_.
> > > > > > > > > 
> > > > > > > > > You have been stridently arguing for one bad idea after another, and
> > > > > > > > > it's an insult to those of us who do give a shit about writing reliable
> > > > > > > > > software.
> > > > > > > > > 
> > > > > > > > > You're arguing against basic precepts of kernel programming.
> > > > > > > > > 
> > > > > > > > > Get your head examined. And get the fuck out of here with this shit.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Kent,
> > > > > > > > 
> > > > > > > > Using language like this is clearly unacceptable and violates the
> > > > > > > > Code of Conduct. This type of language doesn't promote respectful
> > > > > > > > and productive discussions and is detrimental to the health of the
> > > > > > > > community.
> > > > > > > > 
> > > > > > > > You should be well aware that this type of language and personal
> > > > > > > > attack is a clear violation of the Linux kernel Contributor Covenant
> > > > > > > > Code of Conduct as outlined in the following:
> > > > > > > > 
> > > > > > > > https://www.kernel.org/doc/html/latest/process/code-of-conduct.html
> > > > > > > > 
> > > > > > > > Refer to the Code of Conduct and refrain from violating the Code of
> > > > > > > > Conduct in the future.
> > > > > > > 
> > > > > > > I believe Michal and I have more or less worked this out privately (and
> > > > > > > you guys have been copied on that as well).
> > > > > > 
> > > > > > Thank you for updating us on the behind the scenes work between you
> > > > > > and Michal.
> > > > > > 
> > > > > > I will make one correction to your statement, "you guys have been copied on
> > > > > > that as well" - which is inaccurate. You have shared your email exchanges
> > > > > > with Michal with us to let us know that the issue has been sorted out.
> > > > > 
> > > > > That seems to be what I just said.
> > > > > 
> > > > > > You might have your reasons and concerns about the direction of the code
> > > > > > and design that pertains to the discussion in this email thread. You might
> > > > > > have your reasons for expressing your frustration. However, those need to be
> > > > > > worked out as separate from this Code of Conduct violation.
> > > > > > 
> > > > > > In the case of unacceptable behaviors as defined in the Code of Conduct
> > > > > > document, the process is to work towards restoring productive and
> > > > > > respectful discussions. It is reasonable to ask for an apology to help
> > > > > > us get to the goal as soon as possible.
> > > > > > 
> > > > > > I urge you once again to apologize for using language that negatively impacts
> > > > > > productive discussions.
> > > > > 
> > > > > Shuah, I'd be happy to give you that after the discussion I suggested.
> > > > > Failing that, I urge you to stick to what we agreed to last night.
> > > The only thing we agreed upon is that you would respond the thread
> > > to update your sorting things out with Michal.
> > 
> > ...Shall I quote you?
> > 
> > > 
> > > As for the discussion, I will repeat what I said in our conversation
> > > that the discussion will be lot more productive after making amends
> > > with the community. I stand by that assessment.
> > > 
> > > I will also repeat what I said that the discussion and debate is
> > > outside the scope of the current issue the Code of Conduct Committee
> > > is trying to resolve.
> > > 
> > > I didn't pick up on your desire to apologize after the discussion in
> > > our conversation.
> > > 
> > > Are you saying you will be happy to make amends with an apology after
> > > the discussion and debate?
> > 
> > Look, I just want to be done with this, so let me lay it all out as I
> > see it, starting from the beginning of where things went off the rails
> > between myself and Michal:
> > 
> > Michal's (as well as Steve's) behaviour in the memory allocation
> > profiling review process was, in my view, unacceptable (this included
> > such things as crashing our LSF presentation with ideas they'd come up
> > with that morning, and persistent dismissive axegrinding on the list).
> > The project was nearly killed because of his inability to listen to the
> > reasons for a design and being stubbornly stuck on his right to be heard
> > as the maintainer.
> > 
> > In my view, being a good maintainer has a lot more to do with
> > stewardship and leadership, than stubbornly insisting for - whatever
> > that was. In any event, that was where I came to the conclusion "I just
> > cannot work that guy".
> > 
> > Next up, PF_MEMALLOC_NORECLAIM over Michal's nack - I was wrong there, I
> > only did it because it really seemed to me that Michal was axe grinding
> > against _anything_ I was posting, but I still shouldn't have and that
> > was more serious infraction in my view; that sort of thing causes a real
> > loss of trust, and no I will not do it again.
> > 
> > The subsequent PF_MEMALLOC_NORECLAIM discussion was such a trainwreck
> > that I don't think I will go into it. Except to say that yes, if it
> > makes you happy, I shouldn't have used that language and I won't do it
> > again.
> > 
> > But I do have to call out you, the CoC board's behaviour, and I think
> > that ony fair since you call out other people's behaviour publically.
> > 
> > Greg's behaviour when he approached me at Plumbers was beyond
> > unprofessional, and since it wasn't exactly public and you guys have
> > already heard about it privately I won't repeat exactly what happened,
> > but it is an issue.
> > 
> > Shuah, you weren't much better.
> > 
> > There were concerns raised in the recent CoC enforcement thread, by
> > someone with experience in such matters, that your aproach seemed
> > extremeely heavy handed and I find myself in 100% agreement.
> > 
> > The approach you take is that of a bad HR department: all about image,
> > no understanding. When tensions arise, it's important get to the bottom
> > of things, to at least try to take the time to listen with an open mind.
> > People have real frustrations, and it's amazing what you can learn and
> > what you can accomplish by having real conversations.
> > 
> > But that's not what you guys do: you say "Ok, if someone's being too
> > much of an asshole, we'll just be an even bigger asshole!".
> > 
> > No. Cut that out.
> > 
> > I've done the hard work of stepping in and building bridges when
> > relations have broken down (on quite a large scale), so I'm offended by
> > what you guys do.
> 
> Now, I've said two things I'll do differently, or not do in the future.
> 
> Michal, would you be willing to consider changing your approach a bit in
> similar situations? Try to lead a little bit less by "I'm the mainainer,
> my concerns must be addressed" and a little bit more by incorporating
> the best of everyone's ideas, and showing respect to others who have
> studied their problems, as you want to be respected as maintainer?
> 
> Shuah, would you be willing to entertain the notion of modifying your
> approach a bit as well? More in the direction of genuine conversations
> and understanding, less of just following a process and slapping people
> if they don't comply?
> 
> We've got people in the community who are good at this sort of thing,
> and might be willing to help if they were asked - it doesn't have to
> just be you guys, and if we started encouraging this sort of thing it
> could be a real learning experience for everyone.

Also, let's rehash a bit:

- This was worked out months ago, at Linus's behest
- Scanning back through the original thread, I'm reminded by how very
  much not one sided this was
- And there hasn't been any hope of intelligent conversation with Shuah,
  just broken record "we need you to do to this", so...

This is no longer a process I can take seriously. Linus will pull my
code, or he won't; it's out of my hands...

Night all.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* review process (was: underalated stuff)
  2024-11-20 22:39                                     ` Kent Overstreet
  2024-11-20 22:55                                       ` Kent Overstreet
@ 2024-11-21  8:43                                       ` Michal Hocko
  2024-11-21  9:03                                         ` Kent Overstreet
  2024-11-21 20:17                                       ` [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM Simona Vetter
  2 siblings, 1 reply; 78+ messages in thread
From: Michal Hocko @ 2024-11-21  8:43 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Shuah Khan, Dave Chinner, Andrew Morton, Christoph Hellwig,
	Yafang Shao, jack, Christian Brauner, Alexander Viro, Paul Moore,
	James Morris, Serge E. Hallyn, linux-fsdevel, linux-mm,
	linux-bcachefs, linux-security-module, linux-kernel,
	conduct@kernel.org

On Wed 20-11-24 17:39:09, Kent Overstreet wrote:
> Michal's (as well as Steve's) behaviour in the memory allocation
> profiling review process was, in my view, unacceptable (this included
> such things as crashing our LSF presentation with ideas they'd come up
> with that morning, and persistent dismissive axegrinding on the list).
> The project was nearly killed because of his inability to listen to the
> reasons for a design and being stubbornly stuck on his right to be heard
> as the maintainer.

Couple of entry points that might be helful for that.
https://lore.kernel.org/all/YxBc1xuGbB36f8zC@dhcp22.suse.cz/
I have expressed my concerns and set expectations to move the
work forward. I've had couple of back and forth with Suren about
specifics of overhead assumptions from the stack unwinding IIRC. 

For the first non-RFC version my feedback was
https://lore.kernel.org/all/ZFIMaflxeHS3uR%2FA@dhcp22.suse.cz/#t
not really "maintenance burden only" but a request to show that alternative
approaches have been explored. It was not particularly helpful that you
had expected tracing people would implement the feature for you.
https://lore.kernel.org/all/20230503092128.1a120845@gandalf.local.home/
Other people have also expressed that this is not completely impossible
https://lore.kernel.org/all/ZFKNZZwC8EUbOLMv@slm.duckdns.org/
The rest of the email thread is mostly a combat zone that I have avoided
participating as much as possible.

I didn't have any reaction to v2 at all.

v3 was aiming to be merged and I've stepped up as there was no single
review at the time https://lore.kernel.org/all/Zctfa2DvmlTYSfe8@tiehlicka/

I admit that I was really open that I do not like the solution and I've
said reasons for that. Allocator APIs have always been a large mess of
macros, static inlines that makes it really far from free to maintain
and alternative ways should be considered before going that route.

I was also clear that support by MM people was necessary to get this
merged. I have explicitly _not_ NAKed the series and backed off for you
guys to gain that support. 

So essentially there was a clear outline for you and Sure how to achieve
that.

I would really like to hear from other maintainers. Is tnis really
unacceptable maintainer behavior? I am OK to apologize but the above is
in line of my understanding of how to ack properly.

[...]

> Next up, PF_MEMALLOC_NORECLAIM over Michal's nack - I was wrong there, I
> only did it because it really seemed to me that Michal was axe grinding
> against _anything_ I was posting, but I still shouldn't have and that
> was more serious infraction in my view; that sort of thing causes a real
> loss of trust, and no I will not do it again.

Yes, this is simply unacceptable! Just to put full context. We are
talking about eab0af905bfc ("mm: introduce PF_MEMALLOC_NORECLAIM,
PF_MEMALLOC_NOWARN"). I have pushed back on that https://lore.kernel.org/all/Zbu_yyChbCO6b2Lj@tiehlicka/
Rather than searching for support elsewhere you have completely bypassed
the whole MM community including Andrew and pushed that hidden in
bcachefs PR. This is breaking trust model we are all relying on.

I am cutting the rest of as something that is not really material to
maintainership discussion.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: review process (was: underalated stuff)
  2024-11-21  8:43                                       ` review process (was: underalated stuff) Michal Hocko
@ 2024-11-21  9:03                                         ` Kent Overstreet
  0 siblings, 0 replies; 78+ messages in thread
From: Kent Overstreet @ 2024-11-21  9:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Shuah Khan, Dave Chinner, Andrew Morton, Christoph Hellwig,
	Yafang Shao, jack, Christian Brauner, Alexander Viro, Paul Moore,
	James Morris, Serge E. Hallyn, linux-fsdevel, linux-mm,
	linux-bcachefs, linux-security-module, linux-kernel,
	conduct@kernel.org

On Thu, Nov 21, 2024 at 09:43:21AM +0100, Michal Hocko wrote:
> On Wed 20-11-24 17:39:09, Kent Overstreet wrote:
> > Michal's (as well as Steve's) behaviour in the memory allocation
> > profiling review process was, in my view, unacceptable (this included
> > such things as crashing our LSF presentation with ideas they'd come up
> > with that morning, and persistent dismissive axegrinding on the list).
> > The project was nearly killed because of his inability to listen to the
> > reasons for a design and being stubbornly stuck on his right to be heard
> > as the maintainer.
> 
> Couple of entry points that might be helful for that.
> https://lore.kernel.org/all/YxBc1xuGbB36f8zC@dhcp22.suse.cz/
> I have expressed my concerns and set expectations to move the
> work forward. I've had couple of back and forth with Suren about
> specifics of overhead assumptions from the stack unwinding IIRC. 
> 
> For the first non-RFC version my feedback was
> https://lore.kernel.org/all/ZFIMaflxeHS3uR%2FA@dhcp22.suse.cz/#t
> not really "maintenance burden only" but a request to show that alternative
> approaches have been explored. It was not particularly helpful that you
> had expected tracing people would implement the feature for you.
> https://lore.kernel.org/all/20230503092128.1a120845@gandalf.local.home/
> Other people have also expressed that this is not completely impossible
> https://lore.kernel.org/all/ZFKNZZwC8EUbOLMv@slm.duckdns.org/
> The rest of the email thread is mostly a combat zone that I have avoided
> participating as much as possible.
> 
> I didn't have any reaction to v2 at all.
> 
> v3 was aiming to be merged and I've stepped up as there was no single
> review at the time https://lore.kernel.org/all/Zctfa2DvmlTYSfe8@tiehlicka/
> 
> I admit that I was really open that I do not like the solution and I've
> said reasons for that. Allocator APIs have always been a large mess of
> macros, static inlines that makes it really far from free to maintain
> and alternative ways should be considered before going that route.
> 
> I was also clear that support by MM people was necessary to get this
> merged. I have explicitly _not_ NAKed the series and backed off for you
> guys to gain that support. 
> 
> So essentially there was a clear outline for you and Sure how to achieve
> that.
> 
> I would really like to hear from other maintainers. Is tnis really
> unacceptable maintainer behavior? I am OK to apologize but the above is
> in line of my understanding of how to ack properly.

I wonder if I was reading too much into what you were saying in the
off-list thread, when I said "argument to authority". Can you tell me if
this might be closer to the mark?

If I read you correctly, when you said you were "voicing your concerns",
I took it as you being pushy because that was the effects: I need you to
take me at my word, because you didn't see everything behind the scenes,
that this did derail and nearly kill the project.

But I should've been taking you at your word, that you just really were
that concerned.

I think the best way I can explain this issue is with an analogy to
parenting: when we're parenting, our first and foremost job isn't really
to make sure there's a roof, shelter, food - that part is easy in
today's world. The main job really is to make sure that people feel
safe, that they can go out and explore the world without being
terrified.

In order to do that, we have to master our own fears, we can't be
projecting them all the time.

Being a maintainer, or any kind of leader, is the exact same thing. My
big lesson on this was watching Andrew, back during the process of
merging MGLRU - I couldn't believe at the time how chill he was about
it. But he understood the process, and he's a master at this.

Part of mastering our fears in a group setting like this is learning to
trust other people.

It's not that your concerns didn't have any validity - but we were
already doing as much as we could to address them, and you didn't show
any trust that we, I especially, knew what we were doing. And that led
to a _lot_ of wasted effort down blind alleys that I already knew
weren't going to work.

I think that may be what this is really about, sometimes you do have to
trust that people know what they're doing; you share your knowledge if
you have knowledge to share, otherwise you have to back off and let
people do their work. Otherwise the whole thing breaks down and no one
can get anything done.

The main thing is I just want to ask yourself if you could have done
anything better in the memory allocation profiling process. I don't need
you to apologize for anything specific, if you can just try to work
better together in the future that's all I need.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-20 22:39                                     ` Kent Overstreet
  2024-11-20 22:55                                       ` Kent Overstreet
  2024-11-21  8:43                                       ` review process (was: underalated stuff) Michal Hocko
@ 2024-11-21 20:17                                       ` Simona Vetter
  2 siblings, 0 replies; 78+ messages in thread
From: Simona Vetter @ 2024-11-21 20:17 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Shuah Khan, Michal Hocko, Dave Chinner, Andrew Morton,
	Christoph Hellwig, Yafang Shao, jack, Christian Brauner,
	Alexander Viro, Paul Moore, James Morris, Serge E. Hallyn,
	linux-fsdevel, linux-mm, linux-bcachefs, linux-security-module,
	linux-kernel, conduct@kernel.org, DRI Development, Dave Airlie

On Wed, Nov 20, 2024 at 05:39:09PM -0500, Kent Overstreet wrote:
> There were concerns raised in the recent CoC enforcement thread, by
> someone with experience in such matters, that your aproach seemed
> extremeely heavy handed and I find myself in 100% agreement.

Ehrm ...

Yes, I did quite strongly criticize the new coc enforcement process.

No, you would not appreciate what I'd do instead, not at all.
-Sima
-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-20 22:21                                   ` Shuah Khan
  2024-11-20 22:39                                     ` Kent Overstreet
@ 2024-11-21 21:26                                     ` Martin Steigerwald
  1 sibling, 0 replies; 78+ messages in thread
From: Martin Steigerwald @ 2024-11-21 21:26 UTC (permalink / raw)
  To: Kent Overstreet, Shuah Khan
  Cc: Michal Hocko, Dave Chinner, Andrew Morton, Christoph Hellwig,
	Yafang Shao, jack, Christian Brauner, Alexander Viro, Paul Moore,
	James Morris, Serge E. Hallyn, linux-fsdevel, linux-mm,
	linux-bcachefs, linux-security-module, linux-kernel,
	conduct@kernel.org, Shuah Khan

Hi Shuah, hi everyone.

Shuah, I appreciate your effort to resolve the Code of Conduct issue.

Also I make no judgment about the technical matter at hand. Basically I do 
not even have a clear idea on what it is about. So I am just commenting on 
the Code of Conduct enforcement process:

Shuah Khan - 20.11.24, 23:21:06 MEZ:
> I didn't pick up on your desire to apologize after the discussion in
> our conversation.
> 
> Are you saying you will be happy to make amends with an apology after
> the discussion and debate?

Do you really think that power-playing Kent into submission by doing a 
public apology is doing anything good to resolve the issue at hand?

While it may not really compare to some of the wording Linus has used 
before having been convinced to change his behavior… I do not agree with 
the wording Kent has used. I certainly do not condone it.

But this forced public apology approach in my point of view is very likely 
just to cement the division instead of heal it. While I publicly disagreed 
with Kent before, I also publicly disagree with this kind of Code of 
Conduct enforcement. I have seen similar patterns within the Debian 
community and in my point of view this lead to the loss of several Debian 
developers who contributed a lot to the project while leaving behind 
frustration and unresolved conflict.

No amount of power play is going to resolve this. Just exercising 
authority is not doing any good in here. This needs mediation, not forced 
public humiliation.

To me, honestly written, this whole interaction feels a bit like I'd 
imagine children may be fighting over a toy. With a majority of the 
children grouping together to single out someone who does not appear to fit 
in at first glance. I mean no offense with that. This is just the impression 
I got so far. The whole interaction just does not remind me of respectful 
communication between adult human beings. I have seen it with myself… in 
situations where it was challenging for me to access what I learned, for 
whatever reason, I had been acting similarly to a child. So really no 
offense meant. This is just an impression I got and wanted to mirror back 
to you for your consideration.

I'd make three changes to the current approach regarding Kent's behavior:

1) Take it to private mediation.

2) Move it from mail to actually talking with one another. Resolving 
conflicts by mail exchange is hard. Maybe voice / video chat. Or meeting in 
person, in case it possible. In other words: *Talk to each other*! Mail is 
really very bad for things like that.

3) Assume good intentions!

And the best first step for everyone involved may just be: Take a deep 
breath and let it sit for a while. Maybe there is something to learn from 
this for everyone involved, including myself.

I have and claim no standing in kernel community. So take this for 
whatever it is worth for you.

Best,
-- 
Martin

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-20 23:47                                         ` Theodore Ts'o
                                                             ` (2 preceding siblings ...)
  2024-11-21  4:25                                           ` Christoph Hellwig
@ 2024-11-21 21:32                                           ` Martin Steigerwald
  2024-11-22  9:47                                           ` Geert Uytterhoeven
  4 siblings, 0 replies; 78+ messages in thread
From: Martin Steigerwald @ 2024-11-21 21:32 UTC (permalink / raw)
  To: Kent Overstreet, Theodore Ts'o
  Cc: Shuah Khan, Michal Hocko, Dave Chinner, Andrew Morton,
	Christoph Hellwig, Yafang Shao, jack, Christian Brauner,
	Alexander Viro, Paul Moore, James Morris, Serge E. Hallyn,
	linux-fsdevel, linux-mm, linux-bcachefs, linux-security-module,
	linux-kernel, conduct@kernel.org

Hi Ted, hi everyone.

Theodore Ts'o - 21.11.24, 00:47:59 MEZ:
> If you look at the git history of the kernel sources, you will see
> that a large number of your fellow maintainers assented to this
> approach --- for example by providing their Acked-by in commit
> 1279dbeed36f ("Code of Conduct Interpretation: Add document explaining
> how the Code of Conduct is to be interpreted").

A large number of people agreeing on a process like this does not 
automatically make it an effective idea for resolving conflict. As I 
outlined in my other mail, this kind of forced public apology approach in 
my point of view is just serving to escalate matters. And actually it 
seems that exactly that just happened right now. See my other mail for 
suggestions on what I think might work better.

A large number of people agreeing on anything does not automatically make 
it right.

I'd suggest to avoid any kind of power-play like "we are more than you" in 
here. What would respectful communication would look like? What does 
happen if *everyone* involved considers how it might feel in the shoes of 
the other one?

I have and claim no standing in kernel community. So take this for 
whatever it is worth for you. I won't be offended in case you disregard it. 
Also I do not need any reply.

And again, just for clarity: I certainly do not condone of the tone Kent 
has used.

Best,
-- 
Martin

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-21  4:25                                           ` Christoph Hellwig
  2024-11-21  4:53                                             ` Kent Overstreet
@ 2024-11-21 23:53                                             ` Shuah Khan
  2024-11-22  6:51                                               ` Kent Overstreet
  2024-11-22 12:06                                               ` Christoph Hellwig
  1 sibling, 2 replies; 78+ messages in thread
From: Shuah Khan @ 2024-11-21 23:53 UTC (permalink / raw)
  To: Christoph Hellwig, Theodore Ts'o
  Cc: Kent Overstreet, Michal Hocko, Dave Chinner, Andrew Morton,
	Yafang Shao, jack, Christian Brauner, Alexander Viro, Paul Moore,
	James Morris, Serge E. Hallyn, linux-fsdevel, linux-mm,
	linux-bcachefs, linux-security-module, linux-kernel,
	conduct@kernel.org, Shuah Khan

On 11/20/24 21:25, Christoph Hellwig wrote:
> On Wed, Nov 20, 2024 at 03:47:59PM -0800, Theodore Ts'o wrote:
>> On Wed, Nov 20, 2024 at 05:55:03PM -0500, Kent Overstreet wrote:
>>> Shuah, would you be willing to entertain the notion of modifying your...
>>
>> Kent, I'd like to gently remind you that Shuah is not speaking in her
>> personal capacity, but as a representative of the Code of Conduct
>> Committee[1], as she has noted in her signature.  The Code of Conduct
>> Committee is appointed by, and reports to, the TAB[2], which is an
>> elected body composed of kernel developers and maintainers.
> 
> FYI, without taking any stance on the issue under debate here, I find the
> language used by Shuah on behalf of the Code of Conduct committee
> extremely patronising and passive aggressive.  This might be because I
> do not have an American academic class background, but I would suggest
> that the code of conduct committee should educate itself about
> communicating without projecting this implicit cultural and class bias
> so blatantly.

I tend to use formal style when I speak on behalf of the Code of Conduct
committee. I would label it as very formal bordering on pedantic.
  
Would you prefer less formal style in the CoC communication?

I am open to educating myself.

thanks,
-- Shuah



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-21 23:53                                             ` Shuah Khan
@ 2024-11-22  6:51                                               ` Kent Overstreet
  2024-11-22 12:06                                               ` Christoph Hellwig
  1 sibling, 0 replies; 78+ messages in thread
From: Kent Overstreet @ 2024-11-22  6:51 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Christoph Hellwig, Theodore Ts'o, Michal Hocko, Dave Chinner,
	Andrew Morton, Yafang Shao, jack, Christian Brauner,
	Alexander Viro, Paul Moore, James Morris, Serge E. Hallyn,
	linux-fsdevel, linux-mm, linux-bcachefs, linux-security-module,
	linux-kernel, conduct@kernel.org

On Thu, Nov 21, 2024 at 04:53:48PM -0700, Shuah Khan wrote:
> On 11/20/24 21:25, Christoph Hellwig wrote:
> > On Wed, Nov 20, 2024 at 03:47:59PM -0800, Theodore Ts'o wrote:
> > > On Wed, Nov 20, 2024 at 05:55:03PM -0500, Kent Overstreet wrote:
> > > > Shuah, would you be willing to entertain the notion of modifying your...
> > > 
> > > Kent, I'd like to gently remind you that Shuah is not speaking in her
> > > personal capacity, but as a representative of the Code of Conduct
> > > Committee[1], as she has noted in her signature.  The Code of Conduct
> > > Committee is appointed by, and reports to, the TAB[2], which is an
> > > elected body composed of kernel developers and maintainers.
> > 
> > FYI, without taking any stance on the issue under debate here, I find the
> > language used by Shuah on behalf of the Code of Conduct committee
> > extremely patronising and passive aggressive.  This might be because I
> > do not have an American academic class background, but I would suggest
> > that the code of conduct committee should educate itself about
> > communicating without projecting this implicit cultural and class bias
> > so blatantly.
> 
> I tend to use formal style when I speak on behalf of the Code of Conduct
> committee. I would label it as very formal bordering on pedantic.
> Would you prefer less formal style in the CoC communication?

Personally, it's always easier to take requests from a human, than a
faceless process that I have no input into.

I'll always rebell against the latter, but I'll always work to help out
or make things right with people in the community.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-20 23:47                                         ` Theodore Ts'o
                                                             ` (3 preceding siblings ...)
  2024-11-21 21:32                                           ` Martin Steigerwald
@ 2024-11-22  9:47                                           ` Geert Uytterhoeven
  4 siblings, 0 replies; 78+ messages in thread
From: Geert Uytterhoeven @ 2024-11-22  9:47 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Kent Overstreet, Shuah Khan, Michal Hocko, Dave Chinner,
	Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, conduct@kernel.org

On Thu, Nov 21, 2024 at 12:49 AM Theodore Ts'o <tytso@mit.edu> wrote:
> If you look at the git history of the kernel sources, you will see
> that a large number of your fellow maintainers assented to this
> approach --- for example by providing their Acked-by in commit
> 1279dbeed36f ("Code of Conduct Interpretation: Add document explaining
> how the Code of Conduct is to be interpreted").

404

The correct reference is commit 79dbeed36f7335ec ("Code of Conduct
Interpretation: Add document explaining how the Code of Conduct is to
be interpreted").

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-21 23:53                                             ` Shuah Khan
  2024-11-22  6:51                                               ` Kent Overstreet
@ 2024-11-22 12:06                                               ` Christoph Hellwig
  1 sibling, 0 replies; 78+ messages in thread
From: Christoph Hellwig @ 2024-11-22 12:06 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Christoph Hellwig, Theodore Ts'o, Kent Overstreet,
	Michal Hocko, Dave Chinner, Andrew Morton, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, conduct@kernel.org

On Thu, Nov 21, 2024 at 04:53:48PM -0700, Shuah Khan wrote:
> I tend to use formal style when I speak on behalf of the Code of Conduct
> committee. I would label it as very formal bordering on pedantic.
>  Would you prefer less formal style in the CoC communication?

I would prefer a less passive agressive and more to the point one.


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-09-02  9:51                       ` Kent Overstreet
  2024-09-02 14:07                         ` Jonathan Corbet
  2024-09-04 18:01                         ` Shuah Khan
@ 2024-11-22 21:48                         ` Dan Williams
  2024-11-22 22:02                           ` Kent Overstreet
       [not found]                           ` <1592065022.1379875.1732602282945@fidget.co-bxl>
  2 siblings, 2 replies; 78+ messages in thread
From: Dan Williams @ 2024-11-22 21:48 UTC (permalink / raw)
  To: Kent Overstreet, Michal Hocko
  Cc: Dave Chinner, Andrew Morton, Christoph Hellwig, Yafang Shao, jack,
	Christian Brauner, Alexander Viro, Paul Moore, James Morris,
	Serge E. Hallyn, linux-fsdevel, linux-mm, linux-bcachefs,
	linux-security-module, linux-kernel, maintainers

Kent Overstreet wrote:
> On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote:
> > On Mon 02-09-24 04:52:49, Kent Overstreet wrote:
> > > On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote:
> > > > On Sun 01-09-24 21:35:30, Kent Overstreet wrote:
[..]

Kent,

The Code of Conduct Committee received reports about your conduct in
this email discussion.

Link to email where the violation took place:

https://lore.kernel.org/citv2v6f33hoidq75xd2spaqxf7nl5wbmmzma4wgmrwpoqidhj@k453tmq7vdrk

Our community works on trust and respect and has agreed to abide by the
Code of Conduct:

Reference: https://docs.kernel.org/process/code-of-conduct.html

The code of Conduct Committee has determined that your written abuse
of another community member required action on your part to repair the
damage to the individual and the community. You took insufficient action
to restore the community's faith in having otherwise productive technical
discussions without the fear of personal attacks.

Following the Code of Conduct Interpretation process the TAB has approved
has approved the following recommendation:

-- Restrict Kent Overstreet's participation in the kernel development
   process during the Linux 6.13 kernel development cycle.

       - Scope: Decline all pull requests from Kent Overstreet during
         the Linux 6.13 kernel development cycle.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
  2024-11-22 21:48                         ` Dan Williams
@ 2024-11-22 22:02                           ` Kent Overstreet
       [not found]                           ` <1592065022.1379875.1732602282945@fidget.co-bxl>
  1 sibling, 0 replies; 78+ messages in thread
From: Kent Overstreet @ 2024-11-22 22:02 UTC (permalink / raw)
  To: Dan Williams
  Cc: Michal Hocko, Dave Chinner, Andrew Morton, Christoph Hellwig,
	Yafang Shao, jack, Christian Brauner, Alexander Viro, Paul Moore,
	James Morris, Serge E. Hallyn, linux-fsdevel, linux-mm,
	linux-bcachefs, linux-security-module, linux-kernel, maintainers

On Fri, Nov 22, 2024 at 01:48:42PM -0800, Dan Williams wrote:
> Kent Overstreet wrote:
> > On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote:
> > > On Mon 02-09-24 04:52:49, Kent Overstreet wrote:
> > > > On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote:
> > > > > On Sun 01-09-24 21:35:30, Kent Overstreet wrote:
> [..]
> 
> Kent,
> 
> The Code of Conduct Committee received reports about your conduct in
> this email discussion.
> 
> Link to email where the violation took place:
> 
> https://lore.kernel.org/citv2v6f33hoidq75xd2spaqxf7nl5wbmmzma4wgmrwpoqidhj@k453tmq7vdrk
> 
> Our community works on trust and respect and has agreed to abide by the
> Code of Conduct:
> 
> Reference: https://docs.kernel.org/process/code-of-conduct.html
> 
> The code of Conduct Committee has determined that your written abuse
> of another community member required action on your part to repair the
> damage to the individual and the community. You took insufficient action
> to restore the community's faith in having otherwise productive technical
> discussions without the fear of personal attacks.
> 
> Following the Code of Conduct Interpretation process the TAB has approved
> has approved the following recommendation:
> 
> -- Restrict Kent Overstreet's participation in the kernel development
>    process during the Linux 6.13 kernel development cycle.
> 
>        - Scope: Decline all pull requests from Kent Overstreet during
>          the Linux 6.13 kernel development cycle.

Ok.

Just to be clear on what this is about though, I'm going to post
publically what I wrote Michal back in September.

This is about a CoC board that on the one hand, doesn't wish to follow
its own rules, and on the other - I can't even make sense of.

From kent.overstreet@linux.dev Wed Sep  4 14:22:39 2024
Date: Wed, 4 Sep 2024 14:22:39 -0400
From: Kent Overstreet <kent.overstreet@linux.dev>
To: Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
Message-ID: <5qukfetlxkadrdjp443xlzbyjhw26l3zzgcdlmfkxgbkpkrv65@hobl73hoc5ip>
References: <20240827061543.1235703-1-mhocko@kernel.org>
 <Zs6jFb953AR2Raec@dread.disaster.area>
 <ylycajqc6yx633f4sh5g3mdbco7zrjdc5bg267sox2js6ok4qb@7j7zut5drbyy>
 <ZtBzstXltxowPOhR@dread.disaster.area>
 <myb6fw5v2l2byxn4raxlaqozwfdpezdmn3mnacry3y2qxmdxtl@bxbsf4v4qbmg>
 <ZtUFaq3vD+zo0gfC@dread.disaster.area>
 <nawltogcoffous3zv4kd2eerrrwhihbulz7pi2qyfjvslp6g3f@j3qkqftra2qm>
 <ZtV6OwlFRu4ZEuSG@tiehlicka>
 <v664cj6evwv7zu3b77gf2lx6dv5sp4qp2rm7jjysddi2wc2uzl@qvnj4kmc6xhq>
 <ZtWH3SkiIEed4NDc@tiehlicka>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <ZtWH3SkiIEed4NDc@tiehlicka>
Status: RO
Content-Length: 4224
Lines: 88

On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote:
> On Mon 02-09-24 04:52:49, Kent Overstreet wrote:
> > On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote:
> > > On Sun 01-09-24 21:35:30, Kent Overstreet wrote:
> > > [...]
> > > > But I am saying that kmalloc(__GFP_NOFAIL) _should_ fail and return NULL
> > > > in the case of bugs, because that's going to be an improvement w.r.t.
> > > > system robustness, in exactly the same way we don't use BUG_ON() if it's
> > > > something that we can't guarantee won't happen in the wild - we WARN()
> > > > and try to handle the error as best we can.
> > > 
> > > We have discussed that in a different email thread. And I have to say
> > > that I am not convinced that returning NULL makes a broken code much
> > > better. Why? Because we can expect that broken NOFAIL users will not have a
> > > error checking path. Even valid NOFAIL users will not have one because
> > > they _know_ they do not have a different than retry for ever recovery
> > > path. 
> > 
> > You mean where I asked you for a link to the discussion and rationale
> > you claimed had happened? Still waiting on that
> 
> I am not your assistent to be tasked and search through lore archives.
> Find one if you need that.
> 
> Anyway, if you read the email and even tried to understand what is
> written there rather than immediately started shouting a response then
> you would have noticed I have put actual arguments here. You are free to
> disagree with them and lay down your arguments. You have decided to
> 
> [...]
> 
> > Yeah, enough of this insanity.
> 
> so I do not think you are able to do that. Again...

BTW - (after getting emails for three different people about this, heh)

I do want to apologize for things getting this heated the other day, but
I need to also tell you why I reacted the way I did.

Firstly, it's nothing personal: I'm not axe grinding against you
(although you were a major source of frustration for myself and Suren in
the memory allocation profiling discussions, and I hope you can
recognize that as well).

But I do take correctness issues very seriously, and I will get frosty
or genuinely angry if they're being ignored or brushed aside.

The reality as that experience, and to be frank standards of
professionalism, do vary within the kernel community, and I have had
some _outrageous_ fights over things as bad as silent data corruption
bugs (introduced in code I wrote by people who did not CC me, no less;
it was _bad_, and yes it has happened more than once). So - I am _not_
inclined to let things slide, even if it means being the asshole at
times.

Thankfully, most people aren't like that. Dave, Willy, Linus - we can be
shouting at each other, but we still listen, and we know how not to take
it personally and focus on the technical when there's something serious
going on.

Usually when one of us is shouting, you'll find there's a good reason
and some history behind it, even if we also recognize the need to try to
tone things down and not be _too_ much of an asshole. Linus was
reminding me of that yesterday...

So for the record: I'm not trying to roadblock you or anyone else, I'm
just trying to make sure we all have shit that _works_.

And I have been noticing you stepping up in discussions more, and I'd
like to encourage that, if I may. MM has been lacking in strong
technical leadership for a _long_ time - Andrew's great on the process
side, he makes sure things move along, but we haven't had anyone who's
trying to keep everything important in their heads, who's able to point
out to people where their work fits in the larger picture, and there's
been some messy things that have built up over time.

And a word on technical leadership - it doesn't mean being the one who's
"right" all the time, although it does involve a lot of saying "no" to
people. The people who seem the smartest - it's not raw IQ that they've
got (although that helps!), it's the ability to listen and quickly
incorporate other people's ideas without drama or attachment, and the
ability to maintain perspective; notice what people are blocked on,
without getting stuck on it, and think about how it fits into the wider
perspective.

Cheers,
Kent

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM
       [not found]                           ` <1592065022.1379875.1732602282945@fidget.co-bxl>
@ 2024-11-26  6:27                             ` Dylan ‎ ‎
  0 siblings, 0 replies; 78+ messages in thread
From: Dylan ‎ ‎ @ 2024-11-26  6:27 UTC (permalink / raw)
  To: Michal Hocko, Dan Williams, Kent Overstreet
  Cc: linux-kernel, linux-security-module

When do you plan on addressing the Code of Conduct violation done by Greg Kroah-Hartman and the rest of the Linux Foundation by banning all Russians from contributing to the Linux kernel and in doing so discriminating against nationality and ethnicity?

*Apologies in advance for not replying to everyone, my service provider has a cap on recipients I can reply to.

On Nov 22, 2024 at 9:48 PM, Dan Williams <dan.j.williams@intel.com> wrote:Kent Overstreet wrote:
> On Mon, Sep 02, 2024 at 11:39:41AM GMT, Michal Hocko wrote:
> > On Mon 02-09-24 04:52:49, Kent Overstreet wrote:
> > > On Mon, Sep 02, 2024 at 10:41:31AM GMT, Michal Hocko wrote:
> > > > On Sun 01-09-24 21:35:30, Kent Overstreet wrote:
[..]

Kent,

The Code of Conduct Committee received reports about your conduct in
this email discussion.

Link to email where the violation took place:

https://lore.kernel.org/citv2v6f33hoidq75xd2spaqxf7nl5wbmmzma4wgmrwpoqidhj@k453tmq7vdrk

Our community works on trust and respect and has agreed to abide by the
Code of Conduct:

Reference: https://docs.kernel.org/process/code-of-conduct.html

The code of Conduct Committee has determined that your written abuse
of another community member required action on your part to repair the
damage to the individual and the community. You took insufficient action
to restore the community's faith in having otherwise productive technical
discussions without the fear of personal attacks.

Following the Code of Conduct Interpretation process the TAB has approved
has approved the following recommendation:

-- Restrict Kent Overstreet's participation in the kernel development
   process during the Linux 6.13 kernel development cycle.

       - Scope: Decline all pull requests from Kent Overstreet during
         the Linux 6.13 kernel development cycle.

-- 
Sent with https://mailfence.com  
Secure and private email

-- 
Sent with https://mailfence.com  
Secure and private email

^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread, other threads:[~2024-11-26  6:27 UTC | newest]

Thread overview: 78+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-26  8:47 [PATCH 0/2] get rid of PF_MEMALLOC_NORECLAIM Michal Hocko
2024-08-26  8:47 ` [PATCH 1/2] bcachefs: do not use PF_MEMALLOC_NORECLAIM Michal Hocko
2024-08-26 13:11   ` Matthew Wilcox
2024-08-26 16:48     ` Michal Hocko
2024-08-26 19:39   ` Kent Overstreet
2024-08-26 19:41     ` Matthew Wilcox
2024-08-26 19:42       ` Kent Overstreet
2024-08-26 19:47         ` Matthew Wilcox
2024-08-26 19:54           ` Kent Overstreet
2024-08-26 19:44       ` Kent Overstreet
2024-08-26 19:58     ` Michal Hocko
2024-08-26 20:00       ` Kent Overstreet
2024-08-26 20:27         ` Michal Hocko
2024-08-26 20:43           ` Kent Overstreet
2024-08-26 21:10             ` Kent Overstreet
2024-08-27  6:01             ` Michal Hocko
2024-08-27  6:40               ` Kent Overstreet
2024-08-27  6:58                 ` Michal Hocko
2024-08-27  7:05                   ` Kent Overstreet
2024-08-27  7:35                     ` Michal Hocko
2024-08-26 19:52   ` kernel test robot
2024-08-26 20:53   ` kernel test robot
2024-08-27  2:23   ` kernel test robot
2024-08-27  6:15   ` [PATCH 1/2 v2] " Michal Hocko
2024-08-27 12:29     ` Christoph Hellwig
2024-08-28  4:09     ` Dave Chinner
2024-08-29 10:02       ` Kent Overstreet
2024-08-29 13:12         ` Dave Chinner
2024-08-29 13:22           ` Kent Overstreet
2024-08-29 13:32           ` Kent Overstreet
2024-08-29 14:03             ` Yafang Shao
2024-09-02  0:23             ` Dave Chinner
2024-09-02  1:35               ` Kent Overstreet
2024-09-02  8:41                 ` Michal Hocko
2024-09-02  8:52                   ` Kent Overstreet
2024-09-02  9:39                     ` Michal Hocko
2024-09-02  9:51                       ` Kent Overstreet
2024-09-02 14:07                         ` Jonathan Corbet
2024-09-04 18:01                         ` Shuah Khan
2024-11-20 20:34                           ` Kent Overstreet
2024-11-20 21:12                             ` Shuah Khan
2024-11-20 21:20                               ` Kent Overstreet
2024-11-20 21:37                                 ` Shuah Khan
2024-11-20 22:21                                   ` Shuah Khan
2024-11-20 22:39                                     ` Kent Overstreet
2024-11-20 22:55                                       ` Kent Overstreet
2024-11-20 23:21                                         ` Kent Overstreet
2024-11-20 23:47                                         ` Theodore Ts'o
2024-11-20 23:57                                           ` Kent Overstreet
2024-11-21  0:10                                           ` Kent Overstreet
2024-11-21  4:25                                           ` Christoph Hellwig
2024-11-21  4:53                                             ` Kent Overstreet
2024-11-21 23:53                                             ` Shuah Khan
2024-11-22  6:51                                               ` Kent Overstreet
2024-11-22 12:06                                               ` Christoph Hellwig
2024-11-21 21:32                                           ` Martin Steigerwald
2024-11-22  9:47                                           ` Geert Uytterhoeven
2024-11-21  5:51                                         ` Kent Overstreet
2024-11-21  8:43                                       ` review process (was: underalated stuff) Michal Hocko
2024-11-21  9:03                                         ` Kent Overstreet
2024-11-21 20:17                                       ` [PATCH 1/2 v2] bcachefs: do not use PF_MEMALLOC_NORECLAIM Simona Vetter
2024-11-21 21:26                                     ` Martin Steigerwald
2024-11-22 21:48                         ` Dan Williams
2024-11-22 22:02                           ` Kent Overstreet
     [not found]                           ` <1592065022.1379875.1732602282945@fidget.co-bxl>
2024-11-26  6:27                             ` Dylan ‎ ‎
2024-08-29  9:37   ` [PATCH 1/2] " Jan Kara
2024-08-26  8:47 ` [PATCH 2/2] mm: drop PF_MEMALLOC_NORECLAIM Michal Hocko
2024-08-26 13:48   ` Yafang Shao
2024-08-26 16:54     ` Michal Hocko
2024-08-26 13:59   ` Matthew Wilcox
2024-08-26 16:51     ` Michal Hocko
2024-08-26 17:49       ` Matthew Wilcox
2024-08-26 19:18         ` Michal Hocko
2024-08-26 19:20           ` Matthew Wilcox
2024-08-28  4:11           ` Dave Chinner
2024-08-29 21:45           ` Vlastimil Babka
2024-08-26 19:04   ` Kent Overstreet
2024-08-27 12:29   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).